* [PATCH 7/10] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin
@ 2014-09-23 18:20 Julian Brown
From: Julian Brown @ 2014-09-23 18:20 UTC
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 7750 bytes --]

This patch contains the bulk of the OpenACC 2.0 runtime support,
building on top of the OpenMP 4.0 support (as previously posted, or
already upstream) wherever we could. Several things are naturally new,
though; I will try to run through a few of them here.

* A new header file, gomp-constants.h, has been introduced containing
  several magic values (mapping codes used by both OpenMP and OpenACC,
  and the target-type codes), with the intent that it be used by both
  GCC (on the producing side) and libgomp (on the consuming side). It
  is not yet used everywhere it could be, though.

* Plugin support has been fleshed out somewhat, so that plugins can now
  implement hooks supporting OpenMP, OpenACC, or both. A concept of
  "capabilities" has been added to tell the runtime what each device
  supports, along with some meta-information such as whether the
  device can run "native" host code or operates using shared memory.
  A small number of libgomp support routines (gomp_*) are exported to
  plugins as gomp_plugin_*.

* The variable mapping code in target.c has been extended to allow
  asynchronous behaviour, since OpenACC permits it. This allows
  copy-in, execution and copy-back to be queued on a device at one
  point, and the host-side book-keeping structures to be tidied up at
  some arbitrary later point, once the offloaded computation has
  completed.

* OpenACC needs 16 bits, rather than 8, to encode the "kind" of each
  mapped variable. This is abstracted (slightly clumsily) by the
  get_kind helper in target.c.

* OpenACC and OpenMP both offer an enumerated "type" for each supported
  accelerator device. For various reasons it is helpful for these
  numbers to map one-to-one onto each other, so this patch arranges
  that (the canonical list is the GOMP_TARGET_* codes in
  gomp-constants.h). This will require a little ongoing care and
  attention as more device types are added.

* The OpenACC runtime generally, and the NVPTX plugin in particular,
  are designed to work with multiple devices and multiple concurrent
  host threads (at least in theory!). The OpenMP and OpenACC
  implementations diverge in one or two places because of that,
  particularly in the location of the memory map. That particular
  divergence was not actually necessary, though, and could probably be
  cleaned up with a follow-on patch.
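
The mapping-code and "kind" points above can be made concrete with a small C sketch. The GOMP_MAP_* values and predicates are copied from the new include/gomp-constants.h, and get_kind reads 8-bit (OpenMP) or 16-bit (OpenACC) entries from a void * array as described. The split of each kind into a map type in the low bits (3 bits for OpenMP, 8 for OpenACC) and a log2 alignment in the remaining high bits is an assumption made for illustration, as are the helper names map_type, map_alignment, needs_copy_in and needs_copy_out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mapping codes and predicates copied from include/gomp-constants.h.  */
#define GOMP_MAP_ALLOC_TO     0x01
#define GOMP_MAP_ALLOC_FROM   0x02
#define GOMP_MAP_ALLOC_TOFROM 0x03
#define GOMP_MAP_FORCE_TO     0x09
#define GOMP_MAP_FORCE_FROM   0x0a
#define GOMP_MAP_FORCE_TOFROM 0x0b

#define GOMP_MAP_COPYTO_P(X) \
  ((X) == GOMP_MAP_ALLOC_TO || (X) == GOMP_MAP_FORCE_TO)
#define GOMP_MAP_COPYFROM_P(X) \
  ((X) == GOMP_MAP_ALLOC_FROM || (X) == GOMP_MAP_FORCE_FROM)
#define GOMP_MAP_TOFROM_P(X) \
  ((X) == GOMP_MAP_ALLOC_TOFROM || (X) == GOMP_MAP_FORCE_TOFROM)

/* Entry IDX of KINDS: unsigned char for OpenMP, unsigned short for
   OpenACC (which needs 16 bits per kind).  */
static inline int
get_kind (bool is_openacc, const void *kinds, int idx)
{
  assert (kinds != NULL && idx >= 0);
  return is_openacc ? ((const unsigned short *) kinds)[idx]
                    : ((const unsigned char *) kinds)[idx];
}

/* Assumed layout: map type in the low 3 (OpenMP) or 8 (OpenACC) bits.  */
static inline int
map_type (bool is_openacc, int kind)
{
  return kind & (is_openacc ? 0xff : 0x07);
}

/* Assumed layout: log2 of the required alignment in the high bits.  */
static inline size_t
map_alignment (bool is_openacc, int kind)
{
  return (size_t) 1 << (kind >> (is_openacc ? 8 : 3));
}

/* Host data must be copied to the device before the region runs.  */
static inline bool
needs_copy_in (int type)
{
  return GOMP_MAP_COPYTO_P (type) || GOMP_MAP_TOFROM_P (type);
}

/* Device data must be copied back to the host afterwards.  */
static inline bool
needs_copy_out (int type)
{
  return GOMP_MAP_COPYFROM_P (type) || GOMP_MAP_TOFROM_P (type);
}
```

For example, under the assumed layout an OpenACC entry of (3 << 8) | GOMP_MAP_FORCE_TO decodes to a type that needs a copy-in but no copy-out, with 8-byte alignment. Widening OpenACC entries to 16 bits leaves room for the FORCE_* codes (0x08 and above), which would not fit in a 3-bit type field.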

This code has undergone a couple of refactorings before appearing here;
in particular, the OpenACC support originally formed a separate library
(libgoacc) rather than being integrated with libgomp, so some vestiges
of previous incarnations may remain.
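
The asynchronous-mapping bullet above describes queuing device work at one point and tidying host-side book-keeping later. One way to picture this is a per-queue FIFO of deferred cleanups that runs once the device reports completion. This is a minimal, hypothetical sketch: struct async_queue, async_queue_defer and async_queue_complete are illustrative names, not the patch's API, and a real runtime would learn of completion by synchronizing with the device (for example a CUDA stream) rather than through an explicit call.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record of host-side book-keeping to release later.  */
struct deferred_unmap
{
  void (*cleanup) (void *data);  /* Tidies the host-side structures.  */
  void *data;
  struct deferred_unmap *next;
};

/* Hypothetical async queue: cleanups run in FIFO order once the
   work queued on the device has finished.  */
struct async_queue
{
  struct deferred_unmap *head, *tail;
};

/* Record CLEANUP (DATA) to run after the work queued so far completes.
   Returns 0 on success, -1 if allocation fails.  */
static int
async_queue_defer (struct async_queue *q, void (*cleanup) (void *), void *data)
{
  struct deferred_unmap *d;

  assert (q != NULL && cleanup != NULL);
  d = malloc (sizeof *d);
  if (d == NULL)
    return -1;
  d->cleanup = cleanup;
  d->data = data;
  d->next = NULL;
  if (q->tail != NULL)
    q->tail->next = d;
  else
    q->head = d;
  q->tail = d;
  return 0;
}

/* Called once the device signals completion: run and free every
   deferred cleanup in order.  Returns the number of cleanups run.  */
static int
async_queue_complete (struct async_queue *q)
{
  struct deferred_unmap *d;
  int n = 0;

  assert (q != NULL);
  d = q->head;
  while (d != NULL)
    {
      struct deferred_unmap *next = d->next;
      d->cleanup (d->data);
      free (d);
      d = next;
      n++;
    }
  q->head = q->tail = NULL;
  return n;
}
```

A caller would map variables and launch the kernel, then pass async_queue_defer the function that releases the host-side descriptors; a later wait on that queue calls async_queue_complete. Keeping the cleanups in FIFO order releases book-keeping in the same order the work was queued.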

Thanks,

Julian

xxxx-xx-xx  Nathan Sidwell  <nathan@codesourcery.com>
	    James Norris  <jnorris@codesourcery.com>
	    Thomas Schwinge  <thomas@codesourcery.com>
	    Tom de Vries  <tom@codesourcery.com>
	    Julian Brown  <julian@codesourcery.com>

	include/
	* gomp-constants.h: New file.

	libgomp/
	* Makefile.am (AM_CPPFLAGS): Search in ../include also.
	(libgomp_plugin_nvptx_version_info,
	libgomp_plugin_nvptx_la_SOURCES)
	(libgomp_plugin_nvptx_la_CPPFLAGS,
	libgomp_plugin_nvptx_la_LDFLAGS)
	(libgomp_plugin_nvptx_la_LIBADD,
	libgomp_plugin_nvptx_la_LIBTOOLFLAGS): Set variables if
	PLUGIN_NVPTX is defined. (toolexeclib_LTLIBRARIES): Add
	nonshm-host and (conditionally) nvidia plugins.
	(libgomp_plugin_nonshm_host_version_info)
	(libgomp_plugin_nonshm_host_la_SOURCES)
	(libgomp_plugin_nonshm_host_la_CPPFLAGS)
	(libgomp_plugin_nonshm_host_la_LDFLAGS)
	(libgomp_plugin_nonshm_host_la_LIBTOOLFLAGS): Set variables.
	(libgomp_la_SOURCES): Add oacc-parallel.c, splay-tree.c,
	oacc-fortran.c, oacc-host.c, oacc-init.c, oacc-mem.c,
	oacc-async.c, oacc-plugin.c, oacc-cuda.c, libgomp-plugin.c.
	(nodist_libsubinclude_HEADERS): Add
	openacc.h, ../include/gomp-constants.h.
	* Makefile.in: Regenerate.
	* config.h.in: Regenerate.
	* configure.ac: Add TODOs for OpenACC in various places.
	(CUDA_DRIVER_CPPFLAGS, CUDA_DRIVER_LDFLAGS): Initialize.
	(--with-cuda-driver, --with-cuda-driver-include)
	(--with-cuda-driver-lib, --enable-accelerator): Implement new
	options. (PLUGIN_NVPTX, PLUGIN_NVPTX_CPPFLAGS,
	PLUGIN_NVPTX_LDFLAGS) (PLUGIN_NVPTX_LIBS): Initialize variables.
	* configure: Regenerate.
	* configure.tgt: Add TODOs for OpenACC.
	* env.c (target.h): Include.
	(goacc_device_num, goacc_device_type): New globals.
	(goacc_parse_device_num, goacc_parse_device_type): New
	functions. (initialize_env): Parse GCC_ACC_NOTIFY,
	ACC_DEVICE_TYPE, ACC_DEVICE_NUM environment variables.
	* error.c (gomp_verror, gomp_vfatal, gomp_vnotify,
	gomp_notify): New functions.
	(gomp_fatal): Make global.
	* libgomp.h (stdarg.h): Include.
	(struct gomp_memory_mapping): Forward declaration.
	(struct gomp_task_icv): Add acc_notify_var member.
	(goacc_device_num, goacc_device_type): Add extern declarations.
	(gomp_vnotify, gomp_notify, gomp_verror, gomp_vfatal): Add
	prototypes. (gomp_init_targets_once): Add prototype.
	* libgomp.map (OACC_2.0): New symbol version. Add public acc_*
	interface functions.
	(PLUGIN_1.0): New symbol version. Add gomp plugin interface
	functions.
	* libgomp_g.h (GOACC_data_start, GOACC_data_end, GOACC_kernels)
	(GOACC_parallel, GOACC_wait): Add prototypes.
	* target.c (limits.h, stdbool.h, stdlib.h): Don't include.
	(oacc-plugin.h, gomp-constants.h, stdio.h, assert.h): Include.
	(splay_tree_node, splay_tree, splay_tree_key, target_mem_desc)
	(splay_tree_key_s, enum target_type, gomp_device_descr): Don't
	declare here.
	(splay-tree.h): Include.
	(target.h): Include.
	(splay_compare): Change linkage to hidden not static.
	(gomp_init_targets_once): New function.
	(gomp_get_num_devices): Use above.
	(dump_mappings): New function (for debugging).
	(get_kind): New function.
	(gomp_map_vars): Add gomp_memory_mapping (mm), is_openacc
	parameters. Change KINDS to void *. Use lock from memory map
	not device. Use macros from gomp-constants.h instead of
	hard-coded values. Support OpenACC-specific mappings.
	(gomp_copy_from_async): New function.
	(gomp_unmap_vars): Add DO_COPYFROM argument. Only copy memory
	back from device if it is true. Use lock from memory map not
	device. (gomp_update): Add mm, is_openacc args. Use lock from
	memory map not device. Use macros from gomp-constants.h not
	hard-coded values. (gomp_register_image_for_device): Add
	forward declaration. (GOMP_offload_register): Change
	TARGET_DATA type to void **. Check realloc result.
	(gomp_init_device): Change linkage to hidden not static. Tweak
	mem map location.
	(gomp_fini_device): New function.
	(GOMP_target): Adjust lazy initialization, check target
	capabilities for OpenMP 4.0 support. Add locking around splay
	tree lookup. Add new arg to gomp_unmap_vars call.
	(GOMP_target_data): Tweak lazy initialization. Add new args to
	gomp_map_vars, gomp_unmap_vars calls.
	(GOMP_target_update): Tweak lazy initialization. Add new args to
	gomp_update call.
	(gomp_load_plugin_for_device): Initialize device_fini and
	OpenACC-specific plugin hooks.
	(gomp_register_images_for_device): Rename to...
	(gomp_register_image_for_device): This, and register a single
	device only, and only if it has not already had images
	registered. (gomp_find_available_plugins): Rearrange to fix
	plugin loading and initialization for OpenACC.
	* target.h: New file.
	* splay-tree.h: Move bulk of implementation to...
	* splay-tree.c: New file.
	* libgomp-plugin.c: New file.
	* libgomp-plugin.h: New file.
	* oacc-async.c: New file.
	* oacc-cuda.c: New file.
	* oacc-fortran.c: New file.
	* oacc-host.c: New file.
	* oacc-init.c: New file.
	* oacc-mem.c: New file.
	* oacc-parallel.c: New file.
	* oacc-plugin.c: New file.
	* plugin-nvptx.c: New file.
	* oacc-int.h: New file.
	* openacc.f90: New file.
	* openacc.h: New file.
	* openacc_lib.h: New file.
	* testsuite/Makefile.in: Regenerate.
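
The env.c entries above describe parsing the ACC_DEVICE_TYPE and ACC_DEVICE_NUM environment variables. A minimal sketch of such parsing follows. parse_device_num and parse_device_type are hypothetical names, and the device-type spellings in the table are assumptions rather than the names libgomp actually accepts. Because the patch keeps OpenACC and OpenMP device-type numbers identical, the sketch returns the GOMP_TARGET_* codes from include/gomp-constants.h directly.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Target-type codes copied from include/gomp-constants.h.  */
#define GOMP_TARGET_NONE        0
#define GOMP_TARGET_HOST        2
#define GOMP_TARGET_NONSHM_HOST 3
#define GOMP_TARGET_NOT_HOST    4
#define GOMP_TARGET_NVIDIA_PTX  5

/* Parse an ACC_DEVICE_NUM value, expected to be a non-negative decimal
   integer.  Returns the number, or -1 if S is unset or malformed.  */
static int
parse_device_num (const char *s)
{
  char *end;
  long v;

  if (s == NULL || *s == '\0')
    return -1;
  errno = 0;
  v = strtol (s, &end, 10);
  if (errno != 0 || *end != '\0' || v < 0 || v > INT_MAX)
    return -1;
  return (int) v;
}

/* Map an ACC_DEVICE_TYPE value onto a GOMP_TARGET_* code.  The accepted
   spellings are hypothetical; unset or unknown names give
   GOMP_TARGET_NONE.  */
static int
parse_device_type (const char *s)
{
  static const struct
  {
    const char *name;
    int code;
  } table[] = {
    { "host", GOMP_TARGET_HOST },
    { "host_nonshm", GOMP_TARGET_NONSHM_HOST },
    { "not_host", GOMP_TARGET_NOT_HOST },
    { "nvidia", GOMP_TARGET_NVIDIA_PTX },
  };
  size_t i;

  if (s == NULL)
    return GOMP_TARGET_NONE;
  for (i = 0; i < sizeof table / sizeof table[0]; i++)
    if (strcmp (s, table[i].name) == 0)
      return table[i].code;
  return GOMP_TARGET_NONE;
}
```

A caller would pass getenv ("ACC_DEVICE_NUM") and getenv ("ACC_DEVICE_TYPE"); an unset or unrecognized value yields -1 or GOMP_TARGET_NONE respectively, leaving the runtime free to choose a default device.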

[-- Attachment #2: 07-libgomp-openacc-support-1.diff --]
[-- Type: text/x-patch, Size: 212547 bytes --]

commit da6810c7f2a3e56f77fa589ba4777b68b5751fd4
Author: Julian Brown <julian@codesourcery.com>
Date:   Mon Sep 22 02:55:12 2014 -0700

    OpenACC support for libgomp.

diff --git a/include/gomp-constants.h b/include/gomp-constants.h
new file mode 100644
index 0000000..3ee275b
--- /dev/null
+++ b/include/gomp-constants.h
@@ -0,0 +1,45 @@
+#ifndef GOMP_CONSTANTS_H
+#define GOMP_CONSTANTS_H 1
+
+/* Enumerated variable mapping types used to communicate between GCC and
+   libgomp.  These values are used for both OpenMP and OpenACC.  */
+
+#define GOMP_MAP_ALLOC			0x00
+#define GOMP_MAP_ALLOC_TO		0x01
+#define GOMP_MAP_ALLOC_FROM		0x02
+#define GOMP_MAP_ALLOC_TOFROM		0x03
+#define GOMP_MAP_POINTER		0x04
+#define GOMP_MAP_TO_PSET		0x05
+#define GOMP_MAP_FORCE_ALLOC		0x08
+#define GOMP_MAP_FORCE_TO		0x09
+#define GOMP_MAP_FORCE_FROM		0x0a
+#define GOMP_MAP_FORCE_TOFROM		0x0b
+#define GOMP_MAP_FORCE_PRESENT		0x0c
+#define GOMP_MAP_FORCE_DEALLOC		0x0d
+#define GOMP_MAP_FORCE_DEVICEPTR	0x0e
+#define GOMP_MAP_FORCE_PRIVATE		0x18
+#define GOMP_MAP_FORCE_FIRSTPRIVATE	0x19
+
+#define GOMP_MAP_COPYTO_P(X) \
+  ((X) == GOMP_MAP_ALLOC_TO || (X) == GOMP_MAP_FORCE_TO)
+
+#define GOMP_MAP_COPYFROM_P(X) \
+  ((X) == GOMP_MAP_ALLOC_FROM || (X) == GOMP_MAP_FORCE_FROM)
+
+#define GOMP_MAP_TOFROM_P(X) \
+  ((X) == GOMP_MAP_ALLOC_TOFROM || (X) == GOMP_MAP_FORCE_TOFROM)
+
+#define GOMP_MAP_POINTER_P(X) \
+  ((X) == GOMP_MAP_POINTER)
+
+#define GOMP_IF_CLAUSE_FALSE		-2
+
+/* Canonical list of target type codes for OpenMP/OpenACC.  */
+#define GOMP_TARGET_NONE		0
+#define GOMP_TARGET_HOST		2
+#define GOMP_TARGET_NONSHM_HOST		3
+#define GOMP_TARGET_NOT_HOST		4
+#define GOMP_TARGET_NVIDIA_PTX		5
+#define GOMP_TARGET_INTEL_MIC		6
+
+#endif
diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index 427415e..b54998a 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -14,13 +14,35 @@ libsubincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include
 
 vpath % $(strip $(search_path))
 
-AM_CPPFLAGS = $(addprefix -I, $(search_path))
+AM_CPPFLAGS = $(addprefix -I, $(search_path)) \
+	$(addprefix -I, $(search_path)/../include)
 AM_CFLAGS = $(XCFLAGS)
 AM_LDFLAGS = $(XLDFLAGS) $(SECTION_LDFLAGS) $(OPT_LDFLAGS)
 
 toolexeclib_LTLIBRARIES = libgomp.la
 nodist_toolexeclib_HEADERS = libgomp.spec
 
+if PLUGIN_NVPTX
+# Nvidia PTX OpenACC plugin.
+libgomp_plugin_nvptx_version_info = -version-info $(libtool_VERSION)
+toolexeclib_LTLIBRARIES += libgomp-plugin-nvptx.la
+libgomp_plugin_nvptx_la_SOURCES = plugin-nvptx.c
+libgomp_plugin_nvptx_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_NVPTX_CPPFLAGS)
+libgomp_plugin_nvptx_la_LDFLAGS = $(libgomp_plugin_nvptx_version_info) \
+	$(lt_host_flags)
+libgomp_plugin_nvptx_la_LDFLAGS += $(PLUGIN_NVPTX_LDFLAGS)
+libgomp_plugin_nvptx_la_LIBADD = $(PLUGIN_NVPTX_LIBS)
+libgomp_plugin_nvptx_la_LIBTOOLFLAGS = --tag=disable-static
+endif
+
+libgomp_plugin_nonshm_host_version_info = -version-info $(libtool_VERSION)
+toolexeclib_LTLIBRARIES += libgomp-plugin-nonshm-host.la
+libgomp_plugin_nonshm_host_la_SOURCES = oacc-host.c
+libgomp_plugin_nonshm_host_la_CPPFLAGS = $(AM_CPPFLAGS) -DNONSHM_HOST_PLUGIN
+libgomp_plugin_nonshm_host_la_LDFLAGS = \
+	$(libgomp_plugin_nonshm_host_version_info) $(lt_host_flags)
+libgomp_plugin_nonshm_host_la_LIBTOOLFLAGS = --tag=disable-static
+
 if LIBGOMP_BUILD_VERSIONED_SHLIB
 # -Wc is only a libtool option.
 comma = ,
@@ -60,10 +82,12 @@ libgomp_la_LINK = $(LINK) $(libgomp_la_LDFLAGS)
 libgomp_la_SOURCES = alloc.c barrier.c critical.c env.c error.c iter.c \
 	iter_ull.c loop.c loop_ull.c ordered.c parallel.c sections.c single.c \
 	task.c team.c work.c lock.c mutex.c proc.c sem.c bar.c ptrlock.c \
-	time.c fortran.c affinity.c target.c
+	time.c fortran.c affinity.c target.c oacc-parallel.c splay-tree.c \
+	oacc-fortran.c oacc-host.c oacc-init.c oacc-mem.c oacc-async.c \
+	oacc-plugin.c oacc-cuda.c libgomp-plugin.c
 
 nodist_noinst_HEADERS = libgomp_f.h
-nodist_libsubinclude_HEADERS = omp.h
+nodist_libsubinclude_HEADERS = omp.h openacc.h ../include/gomp-constants.h
 if USE_FORTRAN
 nodist_finclude_HEADERS = omp_lib.h omp_lib.f90 omp_lib.mod omp_lib_kinds.mod
 endif
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 5cd666f..6e37f5a 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -36,6 +36,7 @@ POST_UNINSTALL = :
 build_triplet = @build@
 host_triplet = @host@
 target_triplet = @target@
+@PLUGIN_NVPTX_TRUE@am__append_1 = libgomp-plugin-nvptx.la
 subdir = .
 DIST_COMMON = ChangeLog $(srcdir)/Makefile.in $(srcdir)/Makefile.am \
 	$(top_srcdir)/configure $(am__configure_deps) \
@@ -91,12 +92,37 @@ am__installdirs = "$(DESTDIR)$(toolexeclibdir)" "$(DESTDIR)$(infodir)" \
 	"$(DESTDIR)$(fincludedir)" "$(DESTDIR)$(libsubincludedir)" \
 	"$(DESTDIR)$(toolexeclibdir)"
 LTLIBRARIES = $(toolexeclib_LTLIBRARIES)
+libgomp_plugin_nonshm_host_la_LIBADD =
+am_libgomp_plugin_nonshm_host_la_OBJECTS =  \
+	libgomp_plugin_nonshm_host_la-oacc-host.lo
+libgomp_plugin_nonshm_host_la_OBJECTS =  \
+	$(am_libgomp_plugin_nonshm_host_la_OBJECTS)
+libgomp_plugin_nonshm_host_la_LINK = $(LIBTOOL) --tag=CC \
+	$(libgomp_plugin_nonshm_host_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
+	--mode=link $(CCLD) $(AM_CFLAGS) $(CFLAGS) \
+	$(libgomp_plugin_nonshm_host_la_LDFLAGS) $(LDFLAGS) -o $@
+am__DEPENDENCIES_1 =
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_DEPENDENCIES =  \
+@PLUGIN_NVPTX_TRUE@	$(am__DEPENDENCIES_1)
+@PLUGIN_NVPTX_TRUE@am_libgomp_plugin_nvptx_la_OBJECTS =  \
+@PLUGIN_NVPTX_TRUE@	libgomp_plugin_nvptx_la-plugin-nvptx.lo
+libgomp_plugin_nvptx_la_OBJECTS =  \
+	$(am_libgomp_plugin_nvptx_la_OBJECTS)
+libgomp_plugin_nvptx_la_LINK = $(LIBTOOL) --tag=CC \
+	$(libgomp_plugin_nvptx_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
+	--mode=link $(CCLD) $(AM_CFLAGS) $(CFLAGS) \
+	$(libgomp_plugin_nvptx_la_LDFLAGS) $(LDFLAGS) -o $@
+@PLUGIN_NVPTX_TRUE@am_libgomp_plugin_nvptx_la_rpath = -rpath \
+@PLUGIN_NVPTX_TRUE@	$(toolexeclibdir)
 libgomp_la_LIBADD =
 am_libgomp_la_OBJECTS = alloc.lo barrier.lo critical.lo env.lo \
 	error.lo iter.lo iter_ull.lo loop.lo loop_ull.lo ordered.lo \
 	parallel.lo sections.lo single.lo task.lo team.lo work.lo \
 	lock.lo mutex.lo proc.lo sem.lo bar.lo ptrlock.lo time.lo \
-	fortran.lo affinity.lo target.lo
+	fortran.lo affinity.lo target.lo oacc-parallel.lo \
+	splay-tree.lo oacc-fortran.lo oacc-host.lo oacc-init.lo \
+	oacc-mem.lo oacc-async.lo oacc-plugin.lo oacc-cuda.lo \
+	libgomp-plugin.lo
 libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS)
 DEFAULT_INCLUDES = -I.@am__isrc@
 depcomp = $(SHELL) $(top_srcdir)/../depcomp
@@ -108,7 +134,8 @@ LTCOMPILE = $(LIBTOOL) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
 	--mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) \
 	$(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS)
 CCLD = $(CC)
-SOURCES = $(libgomp_la_SOURCES)
+SOURCES = $(libgomp_plugin_nonshm_host_la_SOURCES) \
+	$(libgomp_plugin_nvptx_la_SOURCES) $(libgomp_la_SOURCES)
 MULTISRCTOP = 
 MULTIBUILDTOP = 
 MULTIDIRS = 
@@ -213,6 +240,10 @@ PACKAGE_URL = @PACKAGE_URL@
 PACKAGE_VERSION = @PACKAGE_VERSION@
 PATH_SEPARATOR = @PATH_SEPARATOR@
 PERL = @PERL@
+PLUGIN_NVPTX = @PLUGIN_NVPTX@
+PLUGIN_NVPTX_CPPFLAGS = @PLUGIN_NVPTX_CPPFLAGS@
+PLUGIN_NVPTX_LDFLAGS = @PLUGIN_NVPTX_LDFLAGS@
+PLUGIN_NVPTX_LIBS = @PLUGIN_NVPTX_LIBS@
 RANLIB = @RANLIB@
 SECTION_LDFLAGS = @SECTION_LDFLAGS@
 SED = @SED@
@@ -293,12 +324,32 @@ gcc_version := $(shell cat $(top_srcdir)/../gcc/BASE-VER)
 search_path = $(addprefix $(top_srcdir)/config/, $(config_path)) $(top_srcdir)
 fincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/finclude
 libsubincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include
-AM_CPPFLAGS = $(addprefix -I, $(search_path))
+AM_CPPFLAGS = $(addprefix -I, $(search_path)) \
+	$(addprefix -I, $(search_path)/../include)
+
 AM_CFLAGS = $(XCFLAGS)
 AM_LDFLAGS = $(XLDFLAGS) $(SECTION_LDFLAGS) $(OPT_LDFLAGS)
-toolexeclib_LTLIBRARIES = libgomp.la
+toolexeclib_LTLIBRARIES = libgomp.la $(am__append_1) \
+	libgomp-plugin-nonshm-host.la
 nodist_toolexeclib_HEADERS = libgomp.spec
 
+# Nvidia PTX OpenACC plugin.
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info $(libtool_VERSION)
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_SOURCES = plugin-nvptx.c
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_NVPTX_CPPFLAGS)
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LDFLAGS =  \
+@PLUGIN_NVPTX_TRUE@	$(libgomp_plugin_nvptx_version_info) \
+@PLUGIN_NVPTX_TRUE@	$(lt_host_flags) $(PLUGIN_NVPTX_LDFLAGS)
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LIBADD = $(PLUGIN_NVPTX_LIBS)
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LIBTOOLFLAGS = --tag=disable-static
+libgomp_plugin_nonshm_host_version_info = -version-info $(libtool_VERSION)
+libgomp_plugin_nonshm_host_la_SOURCES = oacc-host.c
+libgomp_plugin_nonshm_host_la_CPPFLAGS = $(AM_CPPFLAGS) -DNONSHM_HOST_PLUGIN
+libgomp_plugin_nonshm_host_la_LDFLAGS = \
+	$(libgomp_plugin_nonshm_host_version_info) $(lt_host_flags)
+
+libgomp_plugin_nonshm_host_la_LIBTOOLFLAGS = --tag=disable-static
+
 # -Wc is only a libtool option.
 @LIBGOMP_BUILD_VERSIONED_SHLIB_TRUE@comma = ,
 @LIBGOMP_BUILD_VERSIONED_SHLIB_TRUE@PREPROCESS = $(subst -Wc$(comma), , $(COMPILE)) -E
@@ -317,10 +368,12 @@ libgomp_la_LINK = $(LINK) $(libgomp_la_LDFLAGS)
 libgomp_la_SOURCES = alloc.c barrier.c critical.c env.c error.c iter.c \
 	iter_ull.c loop.c loop_ull.c ordered.c parallel.c sections.c single.c \
 	task.c team.c work.c lock.c mutex.c proc.c sem.c bar.c ptrlock.c \
-	time.c fortran.c affinity.c target.c
+	time.c fortran.c affinity.c target.c oacc-parallel.c splay-tree.c \
+	oacc-fortran.c oacc-host.c oacc-init.c oacc-mem.c oacc-async.c \
+	oacc-plugin.c oacc-cuda.c libgomp-plugin.c
 
 nodist_noinst_HEADERS = libgomp_f.h
-nodist_libsubinclude_HEADERS = omp.h
+nodist_libsubinclude_HEADERS = omp.h openacc.h ../include/gomp-constants.h
 @USE_FORTRAN_TRUE@nodist_finclude_HEADERS = omp_lib.h omp_lib.f90 omp_lib.mod omp_lib_kinds.mod
 LTLDFLAGS = $(shell $(SHELL) $(top_srcdir)/../libtool-ldflags $(LDFLAGS))
 LINK = $(LIBTOOL) --tag CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=link \
@@ -444,6 +497,10 @@ clean-toolexeclibLTLIBRARIES:
 	  echo "rm -f \"$${dir}/so_locations\""; \
 	  rm -f "$${dir}/so_locations"; \
 	done
+libgomp-plugin-nonshm-host.la: $(libgomp_plugin_nonshm_host_la_OBJECTS) $(libgomp_plugin_nonshm_host_la_DEPENDENCIES) 
+	$(libgomp_plugin_nonshm_host_la_LINK) -rpath $(toolexeclibdir) $(libgomp_plugin_nonshm_host_la_OBJECTS) $(libgomp_plugin_nonshm_host_la_LIBADD) $(LIBS)
+libgomp-plugin-nvptx.la: $(libgomp_plugin_nvptx_la_OBJECTS) $(libgomp_plugin_nvptx_la_DEPENDENCIES) 
+	$(libgomp_plugin_nvptx_la_LINK) $(am_libgomp_plugin_nvptx_la_rpath) $(libgomp_plugin_nvptx_la_OBJECTS) $(libgomp_plugin_nvptx_la_LIBADD) $(LIBS)
 libgomp.la: $(libgomp_la_OBJECTS) $(libgomp_la_DEPENDENCIES) 
 	$(libgomp_la_LINK) -rpath $(toolexeclibdir) $(libgomp_la_OBJECTS) $(libgomp_la_LIBADD) $(LIBS)
 
@@ -463,10 +520,21 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/fortran.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/iter.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/iter_ull.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libgomp-plugin.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libgomp_plugin_nonshm_host_la-oacc-host.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libgomp_plugin_nvptx_la-plugin-nvptx.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/lock.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/loop.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/loop_ull.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/mutex.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-async.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-cuda.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-fortran.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-host.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-init.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-mem.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-parallel.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-plugin.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ordered.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/parallel.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/proc.Plo@am__quote@
@@ -474,6 +542,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/sections.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/sem.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/single.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/splay-tree.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/target.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/task.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/team.Plo@am__quote@
@@ -501,6 +570,20 @@ distclean-compile:
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
 @am__fastdepCC_FALSE@	$(LTCOMPILE) -c -o $@ $<
 
+libgomp_plugin_nonshm_host_la-oacc-host.lo: oacc-host.c
+@am__fastdepCC_TRUE@	$(LIBTOOL)  --tag=CC $(libgomp_plugin_nonshm_host_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libgomp_plugin_nonshm_host_la_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -MT libgomp_plugin_nonshm_host_la-oacc-host.lo -MD -MP -MF $(DEPDIR)/libgomp_plugin_nonshm_host_la-oacc-host.Tpo -c -o libgomp_plugin_nonshm_host_la-oacc-host.lo `test -f 'oacc-host.c' || echo '$(srcdir)/'`oacc-host.c
+@am__fastdepCC_TRUE@	$(am__mv) $(DEPDIR)/libgomp_plugin_nonshm_host_la-oacc-host.Tpo $(DEPDIR)/libgomp_plugin_nonshm_host_la-oacc-host.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	source='oacc-host.c' object='libgomp_plugin_nonshm_host_la-oacc-host.lo' libtool=yes @AMDEPBACKSLASH@
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
+@am__fastdepCC_FALSE@	$(LIBTOOL)  --tag=CC $(libgomp_plugin_nonshm_host_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libgomp_plugin_nonshm_host_la_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o libgomp_plugin_nonshm_host_la-oacc-host.lo `test -f 'oacc-host.c' || echo '$(srcdir)/'`oacc-host.c
+
+libgomp_plugin_nvptx_la-plugin-nvptx.lo: plugin-nvptx.c
+@am__fastdepCC_TRUE@	$(LIBTOOL)  --tag=CC $(libgomp_plugin_nvptx_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libgomp_plugin_nvptx_la_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -MT libgomp_plugin_nvptx_la-plugin-nvptx.lo -MD -MP -MF $(DEPDIR)/libgomp_plugin_nvptx_la-plugin-nvptx.Tpo -c -o libgomp_plugin_nvptx_la-plugin-nvptx.lo `test -f 'plugin-nvptx.c' || echo '$(srcdir)/'`plugin-nvptx.c
+@am__fastdepCC_TRUE@	$(am__mv) $(DEPDIR)/libgomp_plugin_nvptx_la-plugin-nvptx.Tpo $(DEPDIR)/libgomp_plugin_nvptx_la-plugin-nvptx.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	source='plugin-nvptx.c' object='libgomp_plugin_nvptx_la-plugin-nvptx.lo' libtool=yes @AMDEPBACKSLASH@
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
+@am__fastdepCC_FALSE@	$(LIBTOOL)  --tag=CC $(libgomp_plugin_nvptx_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libgomp_plugin_nvptx_la_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o libgomp_plugin_nvptx_la-plugin-nvptx.lo `test -f 'plugin-nvptx.c' || echo '$(srcdir)/'`plugin-nvptx.c
+
 mostlyclean-libtool:
 	-rm -f *.lo
 
diff --git a/libgomp/config.h.in b/libgomp/config.h.in
index 67f5420..1aa6454 100644
--- a/libgomp/config.h.in
+++ b/libgomp/config.h.in
@@ -82,7 +82,7 @@
 /* Define to 1 if you have the <unistd.h> header file. */
 #undef HAVE_UNISTD_H
 
-/* Define to 1 if GNU symbol versioning is used for libgomp. */
+/* Define to 1 if GNU symbol versioning is used. */
 #undef LIBGOMP_GNU_SYMBOL_VERSIONING
 
 /* Define to the sub-directory in which libtool stores uninstalled libraries.
@@ -110,6 +110,9 @@
 /* Define to the version of this package. */
 #undef PACKAGE_VERSION
 
+/* Define to 1 if the NVIDIA plugin is built, 0 if not. */
+#undef PLUGIN_NVPTX
+
 /* Define if all infrastructure, needed for plugins, is supported. */
 #undef PLUGIN_SUPPORT
 
diff --git a/libgomp/configure b/libgomp/configure
index 704f22a..262cc89 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -627,6 +627,12 @@ LIBGOMP_BUILD_VERSIONED_SHLIB_FALSE
 LIBGOMP_BUILD_VERSIONED_SHLIB_TRUE
 OPT_LDFLAGS
 SECTION_LDFLAGS
+PLUGIN_NVPTX_FALSE
+PLUGIN_NVPTX_TRUE
+PLUGIN_NVPTX_LIBS
+PLUGIN_NVPTX_LDFLAGS
+PLUGIN_NVPTX_CPPFLAGS
+PLUGIN_NVPTX
 libtool_VERSION
 ac_ct_FC
 FCFLAGS
@@ -758,6 +764,9 @@ ac_user_opts='
 enable_option_checking
 enable_version_specific_runtime_libs
 enable_generated_files_in_srcdir
+with_cuda_driver
+with_cuda_driver_include
+with_cuda_driver_lib
 enable_multilib
 enable_dependency_tracking
 enable_shared
@@ -1425,6 +1434,16 @@ Optional Features:
 Optional Packages:
   --with-PACKAGE[=ARG]    use PACKAGE [ARG=yes]
   --without-PACKAGE       do not use PACKAGE (same as --with-PACKAGE=no)
+  --with-cuda-driver=PATH specify prefix directory for installed CUDA driver
+                          package. Equivalent to
+                          --with-cuda-driver-include=PATH/include plus
+                          --with-cuda-driver-lib=PATH/lib
+  --with-cuda-driver-include=PATH
+                          specify directory for installed CUDA driver include
+                          files
+  --with-cuda-driver-lib=PATH
+                          specify directory for the installed CUDA driver
+                          library
   --with-pic              try to use only PIC/non-PIC objects [default=use
                           both]
   --with-gnu-ld           assume the C compiler uses GNU ld [default=no]
@@ -2596,6 +2615,38 @@ else
 fi
 
 
+# Look for the CUDA driver package.
+CUDA_DRIVER_CPPFLAGS=
+CUDA_DRIVER_LDFLAGS=
+
+# Check whether --with-cuda-driver was given.
+if test "${with_cuda_driver+set}" = set; then :
+  withval=$with_cuda_driver;
+fi
+
+
+# Check whether --with-cuda-driver-include was given.
+if test "${with_cuda_driver_include+set}" = set; then :
+  withval=$with_cuda_driver_include;
+fi
+
+
+# Check whether --with-cuda-driver-lib was given.
+if test "${with_cuda_driver_lib+set}" = set; then :
+  withval=$with_cuda_driver_lib;
+fi
+
+if test "x$with_cuda_driver" != x; then
+  CUDA_DRIVER_CPPFLAGS=-I$with_cuda_driver/include
+  CUDA_DRIVER_LDFLAGS=-L$with_cuda_driver/lib
+fi
+if test "x$with_cuda_driver_include" != x; then
+  CUDA_DRIVER_CPPFLAGS=-I$with_cuda_driver_include
+fi
+if test "x$with_cuda_driver_lib" != x; then
+  CUDA_DRIVER_LDFLAGS=-L$with_cuda_driver_lib
+fi
+
 
 # -------
 # -------
@@ -11094,7 +11145,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11097 "configure"
+#line 11148 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11200,7 +11251,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11203 "configure"
+#line 11254 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -15009,6 +15060,7 @@ ac_config_commands="$ac_config_commands gstdint.h"
 
 
 
+# TODO: not for OpenACC?
 # Check to see if -pthread or -lpthread is needed.  Prefer the former.
 # In case the pthread.h system header is not found, this test will fail.
 XPCFLAGS=""
@@ -15113,7 +15165,78 @@ if test x$plugin_support = xyes; then
 
 $as_echo "#define PLUGIN_SUPPORT 1" >>confdefs.h
 
+elif test "x$enable_accelerator" != xno; then
+  as_fn_error "Can't have support for accelerators without support for plugins" "$LINENO" 5
+fi
+
+PLUGIN_NVPTX=0
+PLUGIN_NVPTX_CPPFLAGS=
+PLUGIN_NVPTX_LDFLAGS=
+PLUGIN_NVPTX_LIBS=
+
+
+
+
+# enable_accelerator has already been validated at top level.
+# No need to do it again.
+case $enable_accelerator in
+  auto-nvptx*|nvptx*)
+    PLUGIN_NVPTX=$enable_accelerator
+    PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
+    PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
+    PLUGIN_NVPTX_LIBS='-lcuda'
+
+    PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
+    CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
+    PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
+    LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
+    PLUGIN_NVPTX_save_LIBS=$LIBS
+    LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
+    cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include "cuda.h"
+int
+main ()
+{
+CUresult r = cuCtxPushCurrent (NULL);
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  PLUGIN_NVPTX=1
 fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+    CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
+    LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
+    LIBS=$PLUGIN_NVPTX_save_LIBS
+    case $PLUGIN_NVPTX in
+      auto-nvptx*)
+	PLUGIN_NVPTX=0
+	{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: CUDA driver package required for nvptx support; disabling" >&5
+$as_echo "$as_me: WARNING: CUDA driver package required for nvptx support; disabling" >&2;}
+	;;
+      nvptx*)
+	PLUGIN_NVPTX=0
+	as_fn_error "CUDA driver package required for nvptx support" "$LINENO" 5
+	;;
+    esac
+    ;;
+esac
+ if test $PLUGIN_NVPTX = 1; then
+  PLUGIN_NVPTX_TRUE=
+  PLUGIN_NVPTX_FALSE='#'
+else
+  PLUGIN_NVPTX_TRUE='#'
+  PLUGIN_NVPTX_FALSE=
+fi
+
+
+cat >>confdefs.h <<_ACEOF
+#define PLUGIN_NVPTX $PLUGIN_NVPTX
+_ACEOF
+
 
 # Check for functions needed.
 for ac_func in getloadavg clock_gettime strtoull
@@ -15278,6 +15401,7 @@ fi
 rm -f core conftest.err conftest.$ac_objext \
     conftest$ac_exeext conftest.$ac_ext
 
+# TODO: not for OpenACC?
 # At least for glibc, clock_gettime is in librt.  But don't pull that
 # in if it still doesn't give us the function we want.
 if test $ac_cv_func_clock_gettime = no; then
@@ -16127,6 +16251,7 @@ $as_echo "#define HAVE_SYNC_BUILTINS 1" >>confdefs.h
 
   fi
 
+# TODO: not for OpenACC?
 XCFLAGS="$XCFLAGS$XPCFLAGS"
 
 
@@ -16241,6 +16366,7 @@ fi
 # the underscore here and update the PREREQ.  If it doesn't, then we'll
 # need to copy this macro to our acinclude.m4.
 save_CFLAGS="$CFLAGS"
+# TODO: not for OpenACC?
 for i in $config_path; do
   if test -f $srcdir/config/$i/omp-lock.h; then
     CFLAGS="$CFLAGS -include confdefs.h -include $srcdir/config/$i/omp-lock.h"
@@ -16458,6 +16584,10 @@ if test -z "${MAINTAINER_MODE_TRUE}" && test -z "${MAINTAINER_MODE_FALSE}"; then
   as_fn_error "conditional \"MAINTAINER_MODE\" was never defined.
 Usually this means the macro was only invoked conditionally." "$LINENO" 5
 fi
+if test -z "${PLUGIN_NVPTX_TRUE}" && test -z "${PLUGIN_NVPTX_FALSE}"; then
+  as_fn_error "conditional \"PLUGIN_NVPTX\" was never defined.
+Usually this means the macro was only invoked conditionally." "$LINENO" 5
+fi
 if test -z "${LIBGOMP_BUILD_VERSIONED_SHLIB_TRUE}" && test -z "${LIBGOMP_BUILD_VERSIONED_SHLIB_FALSE}"; then
   as_fn_error "conditional \"LIBGOMP_BUILD_VERSIONED_SHLIB\" was never defined.
 Usually this means the macro was only invoked conditionally." "$LINENO" 5
diff --git a/libgomp/configure.ac b/libgomp/configure.ac
index da06426..db67a21 100644
--- a/libgomp/configure.ac
+++ b/libgomp/configure.ac
@@ -2,6 +2,8 @@
 # aclocal -I ../config && autoconf && autoheader && automake
 
 AC_PREREQ(2.64)
+# TODO: Update for OpenACC?  But then the copyright notices in all source
+# files would also have to be updated.
 AC_INIT([GNU OpenMP Runtime Library], 1.0,,[libgomp])
 AC_CONFIG_HEADER(config.h)
 
@@ -28,6 +30,31 @@ LIBGOMP_ENABLE(generated-files-in-srcdir, no, ,
 AC_MSG_RESULT($enable_generated_files_in_srcdir)
 AM_CONDITIONAL(GENINSRC, test "$enable_generated_files_in_srcdir" = yes)
 
+# Look for the CUDA driver package.
+CUDA_DRIVER_CPPFLAGS=
+CUDA_DRIVER_LDFLAGS=
+AC_ARG_WITH(cuda-driver,
+	[AS_HELP_STRING([--with-cuda-driver=PATH],
+		[specify prefix directory for installed CUDA driver package.
+		 Equivalent to --with-cuda-driver-include=PATH/include
+		 plus --with-cuda-driver-lib=PATH/lib])])
+AC_ARG_WITH(cuda-driver-include,
+	[AS_HELP_STRING([--with-cuda-driver-include=PATH],
+		[specify directory for installed CUDA driver include files])])
+AC_ARG_WITH(cuda-driver-lib,
+	[AS_HELP_STRING([--with-cuda-driver-lib=PATH],
+		[specify directory for the installed CUDA driver library])])
+if test "x$with_cuda_driver" != x; then
+  CUDA_DRIVER_CPPFLAGS=-I$with_cuda_driver/include
+  CUDA_DRIVER_LDFLAGS=-L$with_cuda_driver/lib
+fi
+if test "x$with_cuda_driver_include" != x; then
+  CUDA_DRIVER_CPPFLAGS=-I$with_cuda_driver_include
+fi
+if test "x$with_cuda_driver_lib" != x; then
+  CUDA_DRIVER_LDFLAGS=-L$with_cuda_driver_lib
+fi
+
 
 # -------
 # -------
@@ -174,6 +201,7 @@ AC_CHECK_HEADERS(unistd.h semaphore.h sys/loadavg.h sys/time.h sys/time.h)
 
 GCC_HEADER_STDINT(gstdint.h)
 
+# TODO: not for OpenACC?
 # Check to see if -pthread or -lpthread is needed.  Prefer the former.
 # In case the pthread.h system header is not found, this test will fail.
 XPCFLAGS=""
@@ -200,8 +228,57 @@ AC_CHECK_HEADER(dirent.h, , [plugin_support=no])
 if test x$plugin_support = xyes; then
   AC_DEFINE(PLUGIN_SUPPORT, 1,
     [Define if all infrastructure, needed for plugins, is supported.])
+elif test "x$enable_accelerator" != xno; then
+  AC_MSG_ERROR([Can't have support for accelerators without support for plugins])
 fi
 
+PLUGIN_NVPTX=0
+PLUGIN_NVPTX_CPPFLAGS=
+PLUGIN_NVPTX_LDFLAGS=
+PLUGIN_NVPTX_LIBS=
+AC_SUBST(PLUGIN_NVPTX)
+AC_SUBST(PLUGIN_NVPTX_CPPFLAGS)
+AC_SUBST(PLUGIN_NVPTX_LDFLAGS)
+AC_SUBST(PLUGIN_NVPTX_LIBS)
+# enable_accelerator has already been validated at top level.
+# No need to do it again.
+case $enable_accelerator in
+  auto-nvptx*|nvptx*)
+    PLUGIN_NVPTX=$enable_accelerator
+    PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
+    PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
+    PLUGIN_NVPTX_LIBS='-lcuda'
+
+    PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
+    CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
+    PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
+    LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
+    PLUGIN_NVPTX_save_LIBS=$LIBS
+    LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
+    AC_LINK_IFELSE(
+      [AC_LANG_PROGRAM(
+	[#include "cuda.h"],
+	[CUresult r = cuCtxPushCurrent (NULL);])],
+      [PLUGIN_NVPTX=1])
+    CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
+    LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
+    LIBS=$PLUGIN_NVPTX_save_LIBS
+    case $PLUGIN_NVPTX in
+      auto-nvptx*)
+	PLUGIN_NVPTX=0
+	AC_MSG_WARN([CUDA driver package required for nvptx support; disabling])
+	;;
+      nvptx*)
+	PLUGIN_NVPTX=0
+	AC_MSG_ERROR([CUDA driver package required for nvptx support])
+	;;
+    esac
+    ;;
+esac
+AM_CONDITIONAL([PLUGIN_NVPTX], [test $PLUGIN_NVPTX = 1])
+AC_DEFINE_UNQUOTED([PLUGIN_NVPTX], [$PLUGIN_NVPTX],
+		  [Define to 1 if the NVIDIA plugin is built, 0 if not.])
+
 # Check for functions needed.
 AC_CHECK_FUNCS(getloadavg clock_gettime strtoull)
 
@@ -235,6 +312,7 @@ AC_LINK_IFELSE(
   AC_DEFINE(HAVE_PTHREAD_AFFINITY_NP, 1,
 [	Define if pthread_{,attr_}{g,s}etaffinity_np is supported.]))
 
+# TODO: not for OpenACC?
 # At least for glibc, clock_gettime is in librt.  But don't pull that
 # in if it still doesn't give us the function we want.
 if test $ac_cv_func_clock_gettime = no; then
@@ -255,7 +333,7 @@ LIBGOMP_ENABLE_SYMVERS
 
 if test $enable_symvers = gnu; then
   AC_DEFINE(LIBGOMP_GNU_SYMBOL_VERSIONING, 1,
-	    [Define to 1 if GNU symbol versioning is used for libgomp.])
+	    [Define to 1 if GNU symbol versioning is used.])
 fi
 
 # Get target configury.
@@ -266,6 +344,7 @@ CFLAGS="$save_CFLAGS $XCFLAGS"
 # had a chance to set XCFLAGS.
 LIBGOMP_CHECK_SYNC_BUILTINS
 
+# TODO: not for OpenACC?
 XCFLAGS="$XCFLAGS$XPCFLAGS"
 
 AC_SUBST(config_path)
@@ -300,6 +379,7 @@ AM_CONDITIONAL([USE_FORTRAN], [test "$ac_cv_fc_compiler_gnu" = yes])
 # the underscore here and update the PREREQ.  If it doesn't, then we'll
 # need to copy this macro to our acinclude.m4.
 save_CFLAGS="$CFLAGS"
+# TODO: not for OpenACC?
 for i in $config_path; do
   if test -f $srcdir/config/$i/omp-lock.h; then
     CFLAGS="$CFLAGS -include confdefs.h -include $srcdir/config/$i/omp-lock.h"
diff --git a/libgomp/configure.tgt b/libgomp/configure.tgt
index 8b18417..b315834 100644
--- a/libgomp/configure.tgt
+++ b/libgomp/configure.tgt
@@ -116,12 +116,14 @@ case "${target}" in
 	case "${target}" in
 	  *-*-hpux11*)
 	     # HPUX v11.x requires -lrt to resolve sem_init in libgomp.la
+	     # TODO: not for OpenACC?
 	     XLDFLAGS="${XLDFLAGS} -lrt"
 	     ;;
 	esac
 	case "${target}" in
 	  hppa[12]*-*-hpux*)
 	    # PA 32 HP-UX needs -frandom-seed for bootstrap compare.
+	    # TODO: not for OpenACC?
 	    XCFLAGS="${XCFLAGS} -frandom-seed=fixed-seed"
 	    ;;
 	esac
@@ -137,6 +139,7 @@ case "${target}" in
 
   *-*-freebsd*)
 	# Need to link with -lpthread so libgomp.so is self-contained.
+	# TODO: not for OpenACC?
 	XLDFLAGS="${XLDFLAGS} -lpthread"
 	;;
 
diff --git a/libgomp/env.c b/libgomp/env.c
index 94c72a3..32fb92c 100644
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -27,6 +27,7 @@
 
 #include "libgomp.h"
 #include "libgomp_f.h"
+#include "target.h"
 #include <ctype.h>
 #include <stdlib.h>
 #include <stdio.h>
@@ -77,6 +78,9 @@ unsigned long gomp_bind_var_list_len;
 void **gomp_places_list;
 unsigned long gomp_places_list_len;
 
+int goacc_device_num;
+char *goacc_device_type;
+
 /* Parse the OMP_SCHEDULE environment variable.  */
 
 static void
@@ -1013,6 +1017,37 @@ parse_affinity (bool ignore)
 
 
 static void
+goacc_parse_device_num (void)
+{
+  const char *env = getenv ("ACC_DEVICE_NUM");
+  int default_num = -1;
+  
+  if (env && *env != '\0')
+    {
+      char *end;
+      default_num = strtol (env, &end, 0);
+      
+      if (*end || default_num < 0)
+        default_num = 0;
+    }
+  else
+    default_num = 0;
+  
+  goacc_device_num = default_num;
+}
+
+static void
+goacc_parse_device_type (void)
+{
+  const char *env = getenv ("ACC_DEVICE_TYPE");
+  
+  if (env && *env != '\0')
+    goacc_device_type = strdup (env);
+  else
+    goacc_device_type = NULL;
+}
+
+static void
 handle_omp_display_env (unsigned long stacksize, int wait_policy)
 {
   const char *env;
@@ -1181,6 +1216,7 @@ initialize_env (void)
       gomp_global_icv.thread_limit_var
 	= thread_limit_var > INT_MAX ? UINT_MAX : thread_limit_var;
     }
+  parse_int ("GCC_ACC_NOTIFY", &gomp_global_icv.acc_notify_var, true);
 #ifndef HAVE_SYNC_BUILTINS
   gomp_mutex_init (&gomp_managed_threads_lock);
 #endif
@@ -1271,6 +1307,13 @@ initialize_env (void)
     }
 
   handle_omp_display_env (stacksize, wait_policy);
+  
+  /* Look for OpenACC-specific environment variables.  */
+  goacc_parse_device_num ();
+  goacc_parse_device_type ();
+
+  /* Initialize OpenACC-specific internal state.  */
+  ACC_runtime_initialize ();
 }
 
 \f
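To make the ACC_DEVICE_NUM rules above easy to check in isolation, here is a small sketch that is not part of the patch. parse_acc_device_num is a hypothetical helper that takes the string directly instead of calling getenv. It follows goacc_parse_device_num (unset, empty, malformed or negative values all give device 0; base 0 means "010" is read as octal and "0x10" as hex), plus one assumed extra: an INT_MAX check, since the patch stores strtol's long result straight into an int.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper mirroring goacc_parse_device_num; it takes the
   value as a string so it can be exercised without touching the
   environment.  */
static int
parse_acc_device_num (const char *env)
{
  char *end;
  long num;

  /* Unset or empty: default to device 0.  */
  if (env == NULL || *env == '\0')
    return 0;

  /* Base 0 accepts decimal, octal ("010") and hex ("0x10") spellings.  */
  num = strtol (env, &end, 0);

  /* Trailing garbage or a negative value falls back to 0, as in the
     patch.  The INT_MAX check is an extra guard not present there.  */
  if (*end != '\0' || num < 0 || num > INT_MAX)
    return 0;

  return (int) num;
}
```

So ACC_DEVICE_NUM=2x or ACC_DEVICE_NUM=-1 silently selects device 0 rather than producing a diagnostic.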
diff --git a/libgomp/error.c b/libgomp/error.c
index d9b28f1..5f400cc 100644
--- a/libgomp/error.c
+++ b/libgomp/error.c
@@ -35,7 +35,7 @@
 #include <stdlib.h>
 
 
-static void
+void
 gomp_verror (const char *fmt, va_list list)
 {
   fputs ("\nlibgomp: ", stderr);
@@ -54,13 +54,40 @@ gomp_error (const char *fmt, ...)
 }
 
 void
+gomp_vfatal (const char *fmt, va_list list)
+{
+  gomp_verror (fmt, list);
+  exit (EXIT_FAILURE);
+}
+
+void
 gomp_fatal (const char *fmt, ...)
 {
   va_list list;
 
   va_start (list, fmt);
-  gomp_verror (fmt, list);
+  gomp_vfatal (fmt, list);
   va_end (list);
 
-  exit (EXIT_FAILURE);
+  /* Unreachable.  */
+  abort ();
+}
+
+void
+gomp_vnotify (const char *msg, va_list list)
+{
+  struct gomp_task_icv *icv = gomp_icv (false);
+  if (icv->acc_notify_var)
+    vfprintf (stderr, msg, list);
+}
+
+void
+gomp_notify (const char *msg, ...)
+{
+  va_list list;
+  
+  va_start (list, msg);
+  gomp_vnotify (msg, list);
+  va_end (list);
 }
+
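The error.c change splits each reporter into a va_list worker (gomp_verror, gomp_vfatal, gomp_vnotify) and a thin variadic front end; that split is what lets libgomp-plugin.c forward the gomp_plugin_* calls without duplicating any formatting logic. A minimal sketch of the pattern, with hypothetical demo_* names and a buffer standing in for stderr so the result can be inspected:

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins: a flag playing the role of acc_notify_var
   and a buffer replacing stderr so the output can be inspected.  */
static int notify_enabled;
static char notify_buf[256];

/* The va_list worker does the real work, like gomp_vnotify.  */
static void
demo_vnotify (const char *fmt, va_list ap)
{
  if (notify_enabled)
    vsnprintf (notify_buf, sizeof notify_buf, fmt, ap);
}

/* Internal variadic front end, analogous to gomp_notify.  */
static void __attribute__ ((format (printf, 1, 2)))
demo_notify (const char *fmt, ...)
{
  va_list ap;

  va_start (ap, fmt);
  demo_vnotify (fmt, ap);
  va_end (ap);
}

/* Exported-style front end, analogous to gomp_plugin_notify: it only
   packages its arguments and reuses the same worker.  */
static void __attribute__ ((format (printf, 1, 2)))
demo_plugin_notify (const char *fmt, ...)
{
  va_list ap;

  va_start (ap, fmt);
  demo_vnotify (fmt, ap);
  va_end (ap);
}
```

A va_list cannot be re-expanded into "...", so without the worker each exported wrapper would need its own copy of the formatting code.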
diff --git a/libgomp/libgomp-plugin.c b/libgomp/libgomp-plugin.c
new file mode 100644
index 0000000..73c8765
--- /dev/null
+++ b/libgomp/libgomp-plugin.c
@@ -0,0 +1,106 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Exported (non-hidden) wrappers exposing libgomp internals to plugins.  */
+
+#include <stdlib.h>
+
+#include "libgomp.h"
+#include "libgomp-plugin.h"
+#include "target.h"
+
+void *
+gomp_plugin_malloc (size_t size)
+{
+  return gomp_malloc (size);
+}
+
+void *
+gomp_plugin_malloc_cleared (size_t size)
+{
+  return gomp_malloc_cleared (size);
+}
+
+void *
+gomp_plugin_realloc (void *ptr, size_t size)
+{
+  return gomp_realloc (ptr, size);
+}
+
+void
+gomp_plugin_error (const char *msg, ...)
+{
+  va_list ap;
+  
+  va_start (ap, msg);
+  gomp_verror (msg, ap);
+  va_end (ap);
+}
+
+void
+gomp_plugin_notify (const char *msg, ...)
+{
+  va_list ap;
+  
+  va_start (ap, msg);
+  gomp_vnotify (msg, ap);
+  va_end (ap);
+}
+
+void
+gomp_plugin_fatal (const char *msg, ...)
+{
+  va_list ap;
+  
+  va_start (ap, msg);
+  gomp_vfatal (msg, ap);
+  va_end (ap);
+  
+  /* Unreachable.  */
+  abort ();
+}
+
+void
+gomp_plugin_mutex_init (gomp_mutex_t *mutex)
+{
+  gomp_mutex_init (mutex);
+}
+
+void
+gomp_plugin_mutex_destroy (gomp_mutex_t *mutex)
+{
+  gomp_mutex_destroy (mutex);
+}
+
+void
+gomp_plugin_mutex_lock (gomp_mutex_t *mutex)
+{
+  gomp_mutex_lock (mutex);
+}
+
+void
+gomp_plugin_mutex_unlock (gomp_mutex_t *mutex)
+{
+  gomp_mutex_unlock (mutex);
+}
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
new file mode 100644
index 0000000..ea4d89a
--- /dev/null
+++ b/libgomp/libgomp-plugin.h
@@ -0,0 +1,57 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* An interface to various libgomp-internal functions for use by plugins.  */
+
+#ifndef LIBGOMP_PLUGIN_H
+#define LIBGOMP_PLUGIN_H 1
+
+#include "mutex.h"
+
+/* alloc.c */
+
+extern void *gomp_plugin_malloc (size_t) __attribute__((malloc));
+extern void *gomp_plugin_malloc_cleared (size_t) __attribute__((malloc));
+extern void *gomp_plugin_realloc (void *, size_t);
+
+/* error.c */
+
+extern void gomp_plugin_notify (const char *msg, ...);
+extern void gomp_plugin_error (const char *, ...)
+	__attribute__((format (printf, 1, 2)));
+extern void gomp_plugin_fatal (const char *, ...)
+	__attribute__((noreturn, format (printf, 1, 2)));
+
+/* mutex.c */
+
+extern void gomp_plugin_mutex_init (gomp_mutex_t *mutex);
+extern void gomp_plugin_mutex_destroy (gomp_mutex_t *mutex);
+extern void gomp_plugin_mutex_lock (gomp_mutex_t *mutex);
+extern void gomp_plugin_mutex_unlock (gomp_mutex_t *mutex);
+
+/* target.c */
+
+extern void gomp_plugin_async_unmap_vars (void *ptr);
+
+#endif
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index d53a326..2f3131d 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -40,6 +40,7 @@
 #include <pthread.h>
 #include <stdbool.h>
 #include <stdlib.h>
+#include <stdarg.h>
 
 #ifdef HAVE_ATTRIBUTE_VISIBILITY
 # pragma GCC visibility push(hidden)
@@ -220,6 +221,7 @@ struct gomp_team_state
 };
 
 struct target_mem_desc;
+struct gomp_memory_mapping;
 
 /* These are the OpenMP 4.0 Internal Control Variables described in
    section 2.3.1.  Those described as having one copy per task are
@@ -236,6 +238,7 @@ struct gomp_task_icv
   bool dyn_var;
   bool nest_var;
   char bind_var;
+  int acc_notify_var;
   /* Internal ICV.  */
   struct target_mem_desc *target_data;
 };
@@ -254,6 +257,9 @@ extern unsigned long gomp_bind_var_list_len;
 extern void **gomp_places_list;
 extern unsigned long gomp_places_list_len;
 
+extern int goacc_device_num;
+extern char *goacc_device_type;
+
 enum gomp_task_kind
 {
   GOMP_TASK_IMPLICIT,
@@ -532,8 +538,12 @@ extern void *gomp_realloc (void *, size_t);
 
 /* error.c */
 
+extern void gomp_vnotify (const char *, va_list);
+extern void gomp_notify (const char *msg, ...);
+extern void gomp_verror (const char *, va_list);
 extern void gomp_error (const char *, ...)
 	__attribute__((format (printf, 1, 2)));
+extern void gomp_vfatal (const char *, va_list);
 extern void gomp_fatal (const char *, ...)
 	__attribute__((noreturn, format (printf, 1, 2)));
 
@@ -610,6 +620,7 @@ extern int gomp_get_num_devices (void);
 
 /* target.c */
 
+extern void gomp_init_targets_once (void);
 extern int gomp_get_num_devices (void);
 
 /* work.c */
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index f36df23..03ba281 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -232,3 +232,66 @@ GOMP_4.0.1 {
   global:
 	GOMP_offload_register;
 } GOMP_4.0;
+
+OACC_2.0 {
+  global:
+	acc_get_num_devices;
+	acc_get_num_devices_;
+	acc_set_device_type;
+	acc_set_device_type_;
+	acc_get_device_type;
+	acc_get_device_type_;
+	acc_set_device_num;
+	acc_set_device_num_;
+	acc_get_device_num;
+	acc_get_device_num_;
+	acc_init;
+	acc_init_;
+	acc_shutdown;
+	acc_shutdown_;
+	acc_on_device;
+	acc_on_device_;
+	acc_malloc;
+	acc_free;
+	acc_copyin;
+	acc_present_or_copyin;
+	acc_create;
+	acc_present_or_create;
+	acc_copyout;
+	acc_delete;
+	acc_update_device;
+	acc_update_self;
+	acc_map_data;
+	acc_unmap_data;
+	acc_deviceptr;
+	acc_hostptr;
+	acc_is_present;
+	acc_memcpy_to_device;
+	acc_memcpy_from_device;
+	acc_async_test;
+	acc_async_test_all;
+	acc_wait;
+	acc_wait_async;
+	acc_wait_all;
+	acc_wait_all_async;
+	acc_get_current_cuda_device;
+	acc_get_current_cuda_context;
+	acc_get_cuda_stream;
+	acc_set_cuda_stream;
+};
+
+# FIXME: Hygiene/grouping/naming?
+PLUGIN_1.0 {
+  global:
+	gomp_plugin_malloc;
+	gomp_plugin_malloc_cleared;
+	gomp_plugin_realloc;
+	gomp_plugin_error;
+	gomp_plugin_notify;
+	gomp_plugin_fatal;
+	gomp_plugin_mutex_init;
+	gomp_plugin_mutex_destroy;
+	gomp_plugin_mutex_lock;
+	gomp_plugin_mutex_unlock;
+	gomp_plugin_async_unmap_vars;
+};
diff --git a/libgomp/libgomp_g.h b/libgomp/libgomp_g.h
index be0c6ea..44f200c 100644
--- a/libgomp/libgomp_g.h
+++ b/libgomp/libgomp_g.h
@@ -214,4 +214,17 @@ extern void GOMP_target_update (int, const void *,
 				size_t, void **, size_t *, unsigned char *);
 extern void GOMP_teams (unsigned int, unsigned int);
 
+/* oacc-parallel.c */
+
+extern void GOACC_data_start (int, const void *,
+			      size_t, void **, size_t *, unsigned short *);
+extern void GOACC_data_end (void);
+extern void GOACC_kernels (int, void (*) (void *), const void *,
+			   size_t, void **, size_t *, unsigned short *,
+			   int, int, int, int, int, ...);
+extern void GOACC_parallel (int, void (*) (void *), const void *,
+			    size_t, void **, size_t *, unsigned short *,
+			    int, int, int, int, int, ...);
+extern void GOACC_wait (int, int, ...);
+
 #endif /* LIBGOMP_G_H */
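GOACC_wait is declared as (int, int, ...). Assuming, as the name suggests but this header does not spell out, that the first argument is the async queue and the second counts the int queue numbers that follow, a consumer would walk the variadic list like the hypothetical demo_goacc_wait below, which only records the queue numbers it reads.

```c
#include <stdarg.h>

/* Hypothetical record of the queue numbers seen by the last call.  */
static int demo_waited[8];
static int demo_nwaited;

/* Same shape as GOACC_wait (int, int, ...).  Assumed meaning: ASYNC is
   the queue to attach the wait to (unused here) and NUM_WAITS is the
   number of int queue numbers that follow.  */
static void
demo_goacc_wait (int async, int num_waits, ...)
{
  va_list ap;

  (void) async;
  demo_nwaited = 0;

  va_start (ap, num_waits);
  /* Read exactly NUM_WAITS ints, capped at the size of the record.  */
  while (num_waits-- > 0 && demo_nwaited < 8)
    demo_waited[demo_nwaited++] = va_arg (ap, int);
  va_end (ap);
}
```

The explicit count is what makes the variadic tail safe to read: va_arg has no way of detecting the end of the list on its own.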
diff --git a/libgomp/oacc-async.c b/libgomp/oacc-async.c
new file mode 100644
index 0000000..e6b6ebf
--- /dev/null
+++ b/libgomp/oacc-async.c
@@ -0,0 +1,80 @@
+/* OpenACC Runtime Library Definitions.
+
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by Nathan Sidwell <nathan@codesourcery.com>.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#include "openacc.h"
+#include "libgomp.h"
+#include "target.h"
+
+int
+acc_async_test (int async)
+{
+  if (async < acc_async_sync)
+    gomp_fatal ("invalid async argument: %d", async);
+
+  return ACC_dev->openacc.async_test_func (async);
+}
+
+int
+acc_async_test_all (void)
+{
+  return ACC_dev->openacc.async_test_all_func ();
+}
+
+void
+acc_wait (int async)
+{
+  if (async < acc_async_sync)
+    gomp_fatal ("invalid async argument: %d", async);
+
+  ACC_dev->openacc.async_wait_func (async);
+  return;
+}
+
+void
+acc_wait_async (int async1, int async2)
+{
+  ACC_dev->openacc.async_wait_async_func (async1, async2);
+  return;
+}
+
+void
+acc_wait_all (void)
+{
+  ACC_dev->openacc.async_wait_all_func ();
+  return;
+}
+
+void
+acc_wait_all_async (int async)
+{
+  if (async < acc_async_sync)
+    gomp_fatal ("invalid async argument: %d", async);
+
+  ACC_dev->openacc.async_wait_all_async_func (async);
+  return;
+}
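The oacc-async.c entry points all share one shape: validate the async argument, then forward to the current device's function table. A self-contained sketch of that dispatch, assuming the usual openacc.h values acc_async_noval = -1 and acc_async_sync = -2. The demo_* types and names are hypothetical, and the sketch returns -1 where the patch calls gomp_fatal so that it can be tested.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumed to match acc_async_noval and acc_async_sync in openacc.h.  */
enum { demo_async_noval = -1, demo_async_sync = -2 };

/* Hypothetical, cut-down version of the per-device dispatch table.  */
struct demo_acc_ops
{
  int (*async_test_func) (int);
};

/* Anything below acc_async_sync is rejected; acc_async_sync,
   acc_async_noval and any non-negative queue number pass.  */
static bool
demo_async_valid (int async)
{
  return async >= demo_async_sync;
}

/* Stub device: every queue is always idle, like the host device.  */
static int
demo_host_async_test (int async)
{
  (void) async;
  return 1;
}

static const struct demo_acc_ops demo_host_ops = { demo_host_async_test };

/* Validate, then forward to the device.  */
static int
demo_acc_async_test (const struct demo_acc_ops *dev, int async)
{
  if (!demo_async_valid (async))
    {
      fprintf (stderr, "invalid async argument: %d\n", async);
      return -1;
    }
  return dev->async_test_func (async);
}
```

Note that acc_wait_async and acc_async_test_all in the patch skip this validation and forward directly.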
diff --git a/libgomp/oacc-cuda.c b/libgomp/oacc-cuda.c
new file mode 100644
index 0000000..f587325
--- /dev/null
+++ b/libgomp/oacc-cuda.c
@@ -0,0 +1,81 @@
+/* OpenACC Runtime Library: CUDA support glue.
+
+   Copyright (C) 2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "openacc.h"
+#include "config.h"
+#include "libgomp.h"
+#include "target.h"
+
+void *
+acc_get_current_cuda_device (void)
+{
+  void *p = NULL;
+
+  if (ACC_dev && ACC_dev->openacc.cuda.get_current_device_func)
+    p = ACC_dev->openacc.cuda.get_current_device_func ();
+
+  return p;
+}
+
+void *
+acc_get_current_cuda_context (void)
+{
+  void *p = NULL;
+
+  if (ACC_dev && ACC_dev->openacc.cuda.get_current_context_func)
+    p = ACC_dev->openacc.cuda.get_current_context_func ();
+
+  return p;
+}
+
+void *
+acc_get_cuda_stream (int async)
+{
+  void *p = NULL;
+
+  if (async < 0)
+    return p;
+
+  if (ACC_dev && ACC_dev->openacc.cuda.get_stream_func)
+    p = ACC_dev->openacc.cuda.get_stream_func (async);
+
+  return p;
+}
+
+int
+acc_set_cuda_stream (int async, void *stream)
+{
+  int s = -1;
+
+  if (async < 0 || stream == NULL)
+    return 0;
+
+  if (ACC_dev && ACC_dev->openacc.cuda.set_stream_func)
+    s = ACC_dev->openacc.cuda.set_stream_func (async, stream);
+
+  return s;
+}
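The CUDA interop routines in oacc-cuda.c are written to degrade gracefully: the host device (host_dispatch in oacc-host.c) sets every cuda hook to NULL, so on it they simply return NULL. A hypothetical, cut-down sketch of that optional-hook check, with demo_* names standing in for the real descriptor:

```c
#include <stddef.h>

/* Hypothetical cut-down device descriptor with one optional hook.  */
struct demo_cuda_hooks
{
  void *(*get_stream_func) (int async);
};

struct demo_device
{
  struct demo_cuda_hooks cuda;
};

/* Stand-in for a CUDA-capable plugin: hands back a fixed token.  */
static int demo_stream_token;

static void *
demo_plugin_get_stream (int async)
{
  (void) async;
  return &demo_stream_token;
}

/* A host-like device leaves the hook NULL; a plugin fills it in.  */
static const struct demo_device demo_host_dev = { { NULL } };
static const struct demo_device demo_gpu_dev = { { demo_plugin_get_stream } };

/* Same shape as acc_get_cuda_stream: reject negative queues, then call
   the hook only if both the device and the hook exist.  */
static void *
demo_get_cuda_stream (const struct demo_device *dev, int async)
{
  if (async < 0)
    return NULL;
  if (dev != NULL && dev->cuda.get_stream_func != NULL)
    return dev->cuda.get_stream_func (async);
  return NULL;
}
```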
diff --git a/libgomp/oacc-fortran.c b/libgomp/oacc-fortran.c
new file mode 100644
index 0000000..c047f3d
--- /dev/null
+++ b/libgomp/oacc-fortran.c
@@ -0,0 +1,89 @@
+/* Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by Thomas Schwinge <thomas@codesourcery.com>.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file contains Fortran wrapper routines.  */
+
+#include "openacc.h"
+#include <stdint.h>
+
+#ifdef HAVE_ATTRIBUTE_ALIAS
+/* Use internal aliases if possible.  */
+ialias_redirect (acc_get_num_devices)
+ialias_redirect (acc_set_device_type)
+ialias_redirect (acc_get_device_type)
+ialias_redirect (acc_set_device_num)
+ialias_redirect (acc_get_device_num)
+ialias_redirect (acc_init)
+ialias_redirect (acc_shutdown)
+ialias_redirect (acc_on_device)
+#endif
+
+int32_t
+acc_get_num_devices_ (const int32_t *dev)
+{
+  return acc_get_num_devices (*dev);
+}
+
+void
+acc_set_device_type_ (const int32_t *dev)
+{
+  acc_set_device_type (*dev);
+}
+
+int32_t
+acc_get_device_type_ (void)
+{
+  return acc_get_device_type ();
+}
+
+void
+acc_set_device_num_ (const int32_t *num, const int32_t *dev)
+{
+  acc_set_device_num (*num, *dev);
+}
+
+int32_t
+acc_get_device_num_ (const int32_t *dev)
+{
+  return acc_get_device_num (*dev);
+}
+
+void
+acc_init_ (const int32_t *dev)
+{
+  acc_init (*dev);
+}
+
+void
+acc_shutdown_ (const int32_t *dev)
+{
+  acc_shutdown (*dev);
+}
+
+int32_t
+acc_on_device_ (const acc_device_t *dev)
+{
+  return acc_on_device (*dev);
+}
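The Fortran wrappers rely on gfortran's default conventions for external procedures: arguments are passed by reference, and names are lower-cased with a single trailing underscore appended, which is why libgomp.map exports both acc_init and acc_init_. A sketch with a hypothetical function (the device-type code 2 is arbitrary):

```c
#include <stdint.h>

/* Hypothetical C API function taking its argument by value.  */
static int32_t
demo_get_num_devices (int32_t dev_type)
{
  return dev_type == 2 ? 1 : 0;
}

/* Fortran-callable wrapper: the argument arrives as a pointer, and the
   trailing underscore matches gfortran's default external name for a
   reference to demo_get_num_devices.  */
int32_t
demo_get_num_devices_ (const int32_t *dev_type)
{
  return demo_get_num_devices (*dev_type);
}
```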
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
new file mode 100644
index 0000000..b51058b
--- /dev/null
+++ b/libgomp/oacc-host.c
@@ -0,0 +1,416 @@
+/* OpenACC Runtime Library: acc_device_host.
+
+   Copyright (C) 2013 Free Software Foundation, Inc.
+
+   Contributed by Thomas Schwinge <thomas@codesourcery.com>.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Simple implementation of support routines for a shared-memory
+   acc_device_host, and a non-shared memory acc_device_nonshm_host, with the
+   latter built as a plugin.  */
+
+#include "openacc.h"
+#include "config.h"
+#include "libgomp.h"
+#include "target.h"
+#ifdef NONSHM_HOST_PLUGIN
+#include "libgomp-plugin.h"
+#endif
+
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdio.h>
+
+#undef DEBUG
+
+#ifdef NONSHM_HOST_PLUGIN
+#define STATIC
+#define GOMP(X) gomp_plugin_##X
+#define SELF "non-SHM host plugin: "
+#else
+#define STATIC static
+#define GOMP(X) gomp_##X
+#define SELF "host: "
+#endif
+
+#ifndef NONSHM_HOST_PLUGIN
+static struct gomp_device_descr host_dispatch;
+#endif
+
+STATIC const char *
+get_name (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+#ifdef NONSHM_HOST_PLUGIN
+  return "nonshm-host";
+#else
+  return "host";
+#endif
+}
+
+STATIC int
+get_type (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+#ifdef NONSHM_HOST_PLUGIN
+  return TARGET_TYPE_NONSHM_HOST;
+#else
+  return TARGET_TYPE_HOST;
+#endif
+}
+
+STATIC unsigned int
+get_caps (void)
+{
+  unsigned int caps = TARGET_CAP_OPENACC_200 | TARGET_CAP_OPENMP_400
+		      | TARGET_CAP_NATIVE_EXEC;
+
+#ifndef NONSHM_HOST_PLUGIN
+  caps |= TARGET_CAP_SHARED_MEM;
+#endif
+
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s: 0x%x\n", __FILE__, __FUNCTION__, caps);
+#endif
+
+  return caps;
+}
+
+STATIC int
+get_num_devices (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+  return 1;
+}
+
+STATIC void
+offload_register (void *host_table, void *target_data)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%p, %p)\n", __FILE__, __FUNCTION__, host_table,
+	   target_data);
+#endif
+}
+
+STATIC int
+device_init (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+  return get_num_devices ();
+}
+
+STATIC int
+device_fini (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+  return 0;
+}
+
+STATIC int
+device_get_table (void *table)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%p)\n", __FILE__, __FUNCTION__, table);
+#endif
+
+  return 0;
+}
+
+STATIC bool
+openacc_avail (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+  return true;
+}
+
+STATIC void *
+openacc_open_device (int n)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%d)\n", __FILE__, __FUNCTION__, n);
+#endif
+
+  return (void *) (intptr_t) n;
+}
+
+STATIC int
+openacc_close_device (void *hnd)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%p)\n", __FILE__, __FUNCTION__, hnd);
+#endif
+
+  return 0;
+}
+
+STATIC int
+openacc_get_device_num (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+  return 0;
+}
+
+STATIC void
+openacc_set_device_num (int n)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%d)\n", __FILE__, __FUNCTION__, n);
+#endif
+
+  if (n > 0)
+    GOMP(fatal) ("device number %d out of range for host execution", n);
+}
+
+STATIC void *
+device_alloc (size_t s)
+{
+  void *ptr = GOMP(malloc) (s);
+
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%zu): %p\n", __FILE__, __FUNCTION__, s, ptr);
+#endif
+
+  return ptr;
+}
+
+STATIC void
+device_free (void *p)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%p)\n", __FILE__, __FUNCTION__, p);
+#endif
+
+  free (p);
+}
+
+STATIC void *
+device_host2dev (void *d, const void *h, size_t s)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%p, %p, %zu)\n", __FILE__, __FUNCTION__, d, h,
+	   s);
+#endif
+
+  memcpy (d, h, s);
+
+  return 0;
+}
+
+STATIC void *
+device_dev2host (void *h, const void *d, size_t s)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%p, %p, %zu)\n", __FILE__, __FUNCTION__, h, d,
+	   s);
+#endif
+
+  memcpy (h, d, s);
+
+  return 0;
+}
+
+STATIC void
+device_run (void *fn_ptr, void *vars)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%p, %p)\n", __FILE__, __FUNCTION__, fn_ptr,
+	   vars);
+#endif
+
+  void (*fn)(void *) = (void (*)(void *)) fn_ptr;
+
+  fn (vars);
+}
+
+STATIC void
+openacc_parallel (void (*fn) (void *), size_t mapnum __attribute__((unused)),
+		  void **hostaddrs, void **devaddrs __attribute__((unused)),
+		  size_t *sizes __attribute__((unused)),
+		  unsigned short *kinds __attribute__((unused)),
+		  int num_gangs __attribute__((unused)),
+		  int num_workers __attribute__((unused)),
+		  int vector_length __attribute__((unused)),
+		  int async __attribute__((unused)),
+		  void *targ_mem_desc __attribute__((unused)))
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%p, %zu, %p, %p, %p, %d, %d, %d, %d, %p)\n",
+	   __FILE__, __FUNCTION__, fn, mapnum, hostaddrs, sizes, kinds,
+	   num_gangs, num_workers, vector_length, async, targ_mem_desc);
+#endif
+
+  fn (hostaddrs);
+}
+
+STATIC void
+openacc_async_set_async (int async __attribute__((unused)))
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%d)\n", __FILE__, __FUNCTION__, async);
+#endif
+}
+
+STATIC int
+openacc_async_test (int async __attribute__((unused)))
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%d)\n", __FILE__, __FUNCTION__, async);
+#endif
+
+  return 1;
+}
+
+STATIC int
+openacc_async_test_all (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+  return 1;
+}
+
+STATIC void
+openacc_async_wait (int async __attribute__((unused)))
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%d)\n", __FILE__, __FUNCTION__, async);
+#endif
+}
+
+STATIC void
+openacc_async_wait_all (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
+#endif
+}
+
+STATIC void
+openacc_async_wait_async (int async1 __attribute__((unused)),
+			  int async2 __attribute__((unused)))
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%d, %d)\n", __FILE__, __FUNCTION__, async1,
+	   async2);
+#endif
+}
+
+STATIC void
+openacc_async_wait_all_async (int async __attribute__((unused)))
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%d)\n", __FILE__, __FUNCTION__, async);
+#endif
+}
+
+#ifndef NONSHM_HOST_PLUGIN
+static struct gomp_device_descr host_dispatch =
+  {
+    .name = "host",
+
+    .type = TARGET_TYPE_HOST,
+    .capabilities = TARGET_CAP_OPENACC_200 | TARGET_CAP_NATIVE_EXEC
+		    | TARGET_CAP_SHARED_MEM,
+    .id = 0,
+
+    .is_initialized = false,
+    .offload_regions_registered = false,
+
+    .get_name_func = get_name,
+    .get_type_func = get_type,
+    .get_caps_func = get_caps,
+
+    .device_init_func = device_init,
+    .device_fini_func = device_fini,
+    .get_num_devices_func = get_num_devices,
+    .offload_register_func = offload_register,
+    .device_get_table_func = device_get_table,
+
+    .device_alloc_func = device_alloc,
+    .device_free_func = device_free,
+    .device_host2dev_func = device_host2dev,
+    .device_dev2host_func = device_dev2host,
+    
+    .device_run_func = device_run,
+
+    .openacc = {
+      .open_device_func = openacc_open_device,
+      .close_device_func = openacc_close_device,
+
+      .get_device_num_func = openacc_get_device_num,
+      .set_device_num_func = openacc_set_device_num,
+
+      /* Device available.  */
+      .avail_func = openacc_avail,
+
+      .exec_func = openacc_parallel,
+
+      .async_set_async_func = openacc_async_set_async,
+      .async_test_func = openacc_async_test,
+      .async_test_all_func = openacc_async_test_all,
+      .async_wait_func = openacc_async_wait,
+      .async_wait_async_func = openacc_async_wait_async,
+      .async_wait_all_func = openacc_async_wait_all,
+      .async_wait_all_async_func = openacc_async_wait_all_async,
+      
+      .cuda = {
+	.get_current_device_func = NULL,
+	.get_current_context_func = NULL,
+	.get_stream_func = NULL,
+	.set_stream_func = NULL,
+      }
+    }
+  };
+
+/* Register this device type.  */
+static __attribute__ ((constructor))
+void ACC_host_init (void)
+{
+  gomp_mutex_init (&host_dispatch.mem_map.lock);
+  ACC_register (&host_dispatch);
+}
+#endif
+
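The host plugin above runs offloaded code as a plain synchronous call (`fn (hostaddrs)`). That is why its asynchronous hooks can be no-ops and its test hooks can always report completion. Below is a minimal sketch of that dispatch-table idea. `struct demo_openacc_hooks` and every `demo_*` function are hypothetical stand-ins, far smaller than the real `gomp_device_descr`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down hook table (NOT the real gomp_device_descr).  */
struct demo_openacc_hooks
{
  void (*exec_func) (void (*fn) (void *), void **hostaddrs);
  int (*async_test_func) (int async);
  void (*async_wait_func) (int async);
};

/* On the host, "offloading" is just a direct, synchronous call.  */
static void
demo_host_exec (void (*fn) (void *), void **hostaddrs)
{
  fn (hostaddrs);
}

/* Nothing is ever queued, so every async queue is always complete.  */
static int
demo_host_async_test (int async)
{
  (void) async;
  return 1;
}

/* ...and waiting on a queue has nothing to do.  */
static void
demo_host_async_wait (int async)
{
  (void) async;
}

static const struct demo_openacc_hooks demo_host_hooks = {
  .exec_func = demo_host_exec,
  .async_test_func = demo_host_async_test,
  .async_wait_func = demo_host_async_wait,
};

/* Example region body: increments the int that hostaddrs[0] points to.  */
static void
demo_kernel (void *arg)
{
  void **hostaddrs = arg;
  int *counter = hostaddrs[0];
  (*counter)++;
}
```

Because each hook is reached through a function pointer, the runtime can call `exec_func` without knowing whether the selected device is the host or an accelerator.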
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
new file mode 100644
index 0000000..af2d2aa
--- /dev/null
+++ b/libgomp/oacc-init.c
@@ -0,0 +1,507 @@
+/* OpenACC Runtime initialization routines
+
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by Nathan Sidwell <nathan@codesourcery.com>.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "libgomp.h"
+#include "target.h"
+#include <assert.h>
+#include <stdlib.h>
+#include <strings.h>
+#include <stdbool.h>
+#include <sys/queue.h>
+#include <stdio.h>
+
+gomp_mutex_t acc_device_lock;
+
+/* Current dispatcher, and how it was initialized */
+static acc_device_t init_key = _ACC_device_hwm;
+
+/* The dispatch table for the current accelerator device.  This is currently
+   global, so you can only have one type of device open at any given time in a
+   program.  */
+struct gomp_device_descr const *ACC_dev;
+
+/* Handle for current thread.  */
+__thread  void *ACC_handle;
+static __thread int handle_num = -1;
+
+/* This context structure associates the handle for a physical device with
+   memory-mapping information for that device, and is used to associate new
+   host threads with previously-opened devices.  Note that it's not directly
+   connected with the CUDA "context" concept as used by the NVidia plugin.  */
+struct ACC_context {
+  struct memmap_t *ACC_memmap;
+  void *ACC_handle;
+  SLIST_ENTRY(ACC_context) next;
+};
+
+static SLIST_HEAD(_ACC_contexts, ACC_context) _ACC_contexts;
+static struct _ACC_contexts *ACC_contexts;
+
+static struct gomp_device_descr const *dispatchers[_ACC_device_hwm] = { 0 };
+
+void
+ACC_register (struct gomp_device_descr const *disp)
+{
+  gomp_mutex_lock (&acc_device_lock);
+
+  assert (acc_device_type (disp->type) != acc_device_none
+	  && acc_device_type (disp->type) != acc_device_default
+	  && acc_device_type (disp->type) != acc_device_not_host);
+  assert (!dispatchers[disp->type]);
+  dispatchers[disp->type] = disp;
+
+  gomp_mutex_unlock (&acc_device_lock);
+}
+
+static void
+close_handle (void)
+{
+  if (ACC_memmap)
+    {
+      if (ACC_mem_close (ACC_handle, ACC_memmap))
+        {
+          if (ACC_dev->openacc.close_device_func (ACC_handle) < 0)
+            gomp_fatal ("failed to close device");
+        }
+
+      ACC_memmap = 0;
+    }
+}
+
+static struct gomp_device_descr const *
+resolve_device (acc_device_t d)
+{
+  acc_device_t d_arg = d;
+
+  switch (d)
+    {
+    case acc_device_default:
+      {
+	if (goacc_device_type)
+	  {
+	    /* Lookup the named device.  */
+	    while (++d != _ACC_device_hwm)
+	      if (dispatchers[d]
+		  && !strcasecmp (goacc_device_type, dispatchers[d]->name)
+		  && dispatchers[d]->openacc.avail_func ())
+		goto found;
+
+	    gomp_fatal ("device type %s not supported", goacc_device_type);
+	  }
+
+	/* No default device specified, so start scanning for any non-host
+	   device that is available.  */
+	d = acc_device_not_host;
+      }
+      /* FALLTHROUGH */
+
+    case acc_device_not_host:
+      /* Find the first available device after acc_device_not_host.  */
+      while (++d != _ACC_device_hwm)
+	if (dispatchers[d] && dispatchers[d]->openacc.avail_func ())
+	  goto found;
+      if (d_arg == acc_device_default)
+	{	  
+	  d = acc_device_host;
+	  goto found;
+	}
+      gomp_fatal ("no device found");
+      break;
+
+    case acc_device_host:
+      break;
+
+    default:
+      if (d >= _ACC_device_hwm)
+	gomp_fatal ("device %u out of range", (unsigned)d);
+      break;
+    }
+ found:
+
+  assert (d != acc_device_none
+	  && d != acc_device_default
+	  && d != acc_device_not_host);
+
+  return dispatchers[d];
+}
+
+static struct gomp_device_descr const *
+_acc_init (acc_device_t d)
+{
+  struct gomp_device_descr const *acc_dev;
+
+  if (ACC_dev)
+    gomp_fatal ("device already active");
+
+  init_key = d;  /* We need to remember what we were initialized as, to
+		    check shutdown etc.  */
+
+  acc_dev = resolve_device (d);
+  if (!acc_dev || !acc_dev->openacc.avail_func ())
+    gomp_fatal ("device %u not supported", (unsigned)d);
+
+  if (!acc_dev->is_initialized)
+    gomp_init_device ((struct gomp_device_descr *) acc_dev);
+
+  return acc_dev;
+}
+
+/* Open the ORD'th device of the currently-active type (ACC_dev must be
+   initialised before calling).  If ORD is < 0, open the default-numbered
+   device (set by the ACC_DEVICE_NUM environment variable or a call to
+   acc_set_device_num), or leave any currently-opened device as is.  "Opening"
+   consists of calling the device's open_device_func hook, and either creating
+   a new memory mapping or associating a new thread with an existing such
+   mapping (that matches ACC_handle, i.e. which corresponds to the same
+   physical device).  */
+
+static void
+lazy_open (int ord)
+{
+  struct ACC_context *acc_ctx;
+
+  if (ACC_memmap)
+    {
+      assert (ord < 0 || ord == handle_num);
+      return;
+    }
+
+  assert (ACC_dev);
+
+  if (ord < 0)
+    ord = goacc_device_num;
+
+  ACC_handle = ACC_dev->openacc.open_device_func (ord);
+  handle_num = ord;
+
+  SLIST_FOREACH(acc_ctx, ACC_contexts, next)
+    {
+      if (acc_ctx->ACC_handle == ACC_handle)
+        {
+          ACC_memmap = acc_ctx->ACC_memmap;
+	  ACC_dev->openacc.async_set_async_func (acc_async_sync);
+
+          return;
+        }
+    }
+
+  ACC_memmap = ACC_mem_open (ACC_handle, NULL, handle_num);
+
+  ACC_dev->openacc.async_set_async_func (acc_async_sync);
+
+  acc_ctx = gomp_malloc (sizeof (struct ACC_context));
+  acc_ctx->ACC_handle = ACC_handle;
+  acc_ctx->ACC_memmap = ACC_memmap;
+
+  SLIST_INSERT_HEAD(ACC_contexts, acc_ctx, next);
+}
+
+/* OpenACC 2.0a (3.2.12, 3.2.13) doesn't specify whether the serialization of
+   init/shutdown is per-process or per-thread.  We choose per-process.  */
+
+void
+acc_init (acc_device_t d)
+{
+  if (!ACC_dev)
+    gomp_init_targets_once ();
+
+  gomp_mutex_lock (&acc_device_lock);
+
+  ACC_dev = _acc_init (d);
+
+  lazy_open (-1);
+
+  gomp_mutex_unlock (&acc_device_lock);
+}
+
+ialias (acc_init)
+
+void
+_acc_shutdown (acc_device_t d)
+{
+  /* We don't check whether d matches the actual device found, because
+     OpenACC 2.0 (3.2.12) says the parameters to the init and this
+     call must match (for the shutdown call anyway, it's silent on
+     others).  */
+
+  if (!ACC_dev)
+    gomp_fatal ("no device initialized");
+  if (init_key != d)
+    gomp_fatal ("device %u(%u) is initialized",
+	       (unsigned)init_key, (unsigned)ACC_dev->type);
+
+  close_handle ();
+
+  while (SLIST_FIRST(ACC_contexts) != NULL)
+    {
+      struct ACC_context *c;
+
+      c = SLIST_FIRST(ACC_contexts);
+      SLIST_REMOVE_HEAD(ACC_contexts, next);
+      free (c);
+    }
+
+  gomp_fini_device ((struct gomp_device_descr *) ACC_dev);
+
+  ACC_dev = 0;
+  ACC_handle = 0;
+  handle_num = -1;
+}
+
+void
+acc_shutdown (acc_device_t d)
+{
+  gomp_mutex_lock (&acc_device_lock);
+
+  _acc_shutdown (d);
+
+  gomp_mutex_unlock (&acc_device_lock);
+}
+
+ialias (acc_shutdown)
+
+static struct gomp_device_descr const *
+lazy_init (acc_device_t d)
+{
+  if (ACC_dev)
+    {
+      /* Re-initializing the same device, do nothing.  */
+      if (d == init_key)
+	return ACC_dev;
+
+      _acc_shutdown (init_key);
+    }
+
+  assert (!ACC_dev);
+
+  return _acc_init (d);
+}
+
+static void
+lazy_init_and_open (acc_device_t d)
+{
+  if (!ACC_dev)
+    gomp_init_targets_once ();
+
+  gomp_mutex_lock (&acc_device_lock);
+
+  ACC_dev = lazy_init (d);
+
+  lazy_open (-1);
+
+  gomp_mutex_unlock (&acc_device_lock);
+}
+
+int
+acc_get_num_devices (acc_device_t d)
+{
+  int n = 0;
+  struct gomp_device_descr const *acc_dev;
+
+  if (d == acc_device_none)
+    return 0;
+
+  if (!ACC_dev)
+    gomp_init_targets_once ();
+
+  acc_dev = resolve_device (d);
+  if (!acc_dev)
+    return 0;
+
+  n = acc_dev->device_init_func ();
+  if (n < 0)
+    n = 0;
+
+  return n;
+}
+
+ialias (acc_get_num_devices)
+
+void
+acc_set_device_type (acc_device_t d)
+{
+  lazy_init_and_open (d);
+}
+
+ialias (acc_set_device_type)
+
+acc_device_t
+acc_get_device_type (void)
+{
+  acc_device_t res = acc_device_none;
+  const struct gomp_device_descr *dev;
+
+  if (ACC_dev)
+    res = acc_device_type (ACC_dev->type);
+  else
+    {
+      gomp_init_targets_once ();
+
+      dev = resolve_device (acc_device_default);
+      res = acc_device_type (dev->type);
+    }
+
+  assert (res != acc_device_default
+	  && res != acc_device_not_host);
+
+  return res;
+}
+
+ialias (acc_get_device_type)
+
+int
+acc_get_device_num (acc_device_t d)
+{
+  const struct gomp_device_descr *dev;
+  int num;
+
+  if (d >= _ACC_device_hwm)
+    gomp_fatal ("device %u out of range", (unsigned)d);
+
+  if (!ACC_dev)
+    gomp_init_targets_once ();
+
+  dev = resolve_device (d);
+  if (!dev)
+    gomp_fatal ("no devices of type %u", d);
+
+  /* We might not have called lazy_open for this host thread yet, in which case
+     the get_device_num_func hook will return -1.  */
+  num = dev->openacc.get_device_num_func ();
+  if (num < 0)
+    num = goacc_device_num;
+  
+  return num;
+}
+
+ialias (acc_get_device_num)
+
+void
+acc_set_device_num (int n, acc_device_t d)
+{
+  const struct gomp_device_descr *dev;
+  int num_devices;
+
+  if (!ACC_dev)
+    gomp_init_targets_once ();
+  
+  if ((int) d == 0)
+    {
+      int i;
+      
+      /* A device setting of zero sets all device types on the system to use
+         the Nth instance of that device type.  Only attempt it for initialized
+	 devices though.  */
+      for (i = acc_device_not_host + 1; i < _ACC_device_hwm; i++)
+        {
+	  dev = dispatchers[i];
+	  if (dev && dev->is_initialized)
+	    dev->openacc.set_device_num_func (n);
+	}
+
+      /* ...and for future calls to acc_init/acc_set_device_type, etc.  */
+      goacc_device_num = n;
+    }
+  else
+    {
+      gomp_mutex_lock (&acc_device_lock);
+
+      ACC_dev = lazy_init (d);
+
+      num_devices = ACC_dev->get_num_devices_func ();
+
+      if (n >= num_devices)
+        gomp_fatal ("device %u out of range", n);
+
+      if (n != handle_num)
+	close_handle ();
+
+      lazy_open (n);
+
+      gomp_mutex_unlock (&acc_device_lock);
+    }
+}
+
+ialias (acc_set_device_num)
+
+int
+acc_on_device (acc_device_t dev)
+{
+  /* Just rely on the compiler builtin.  */
+  return __builtin_acc_on_device (dev);
+}
+ialias (acc_on_device)
+
+attribute_hidden void
+ACC_runtime_initialize (void)
+{
+  gomp_mutex_init (&acc_device_lock);
+
+  ACC_contexts = &_ACC_contexts;
+  SLIST_INIT (ACC_contexts);
+}
+
+/* Compiler helper functions */
+
+static __thread struct gomp_device_descr const *saved_bound_dev;
+
+void
+ACC_save_and_set_bind (acc_device_t d)
+{
+  assert (!saved_bound_dev);
+
+  saved_bound_dev = ACC_dev;
+  ACC_dev = dispatchers[d];
+}
+
+void
+ACC_restore_bind (void)
+{
+  ACC_dev = saved_bound_dev;
+  saved_bound_dev = NULL;
+}
+
+/* This is called from any OpenACC support function that may need to implicitly
+   initialize the libgomp runtime.  On exit all such initialization will have
+   been done, and both the global ACC_dev and the per-host-thread ACC_memmap
+   pointers will be valid.  */
+
+void
+ACC_lazy_initialize (void)
+{
+  if (ACC_dev && ACC_memmap)
+    return;
+
+  if (!ACC_dev)
+    lazy_init_and_open (acc_device_default);
+  else
+    {
+      gomp_mutex_lock (&acc_device_lock);
+      lazy_open (-1);
+      gomp_mutex_unlock (&acc_device_lock);
+    }
+}
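`resolve_device` above treats `acc_device_default` and `acc_device_not_host` as search requests rather than real device types. It scans the slots after `acc_device_not_host` for the first available device, and only in the default case does it fall back to the host. The sketch below models that search with a hypothetical enum (`demo_*`). Its ordering mirrors what the code assumes: pseudo-types first, then the host, then accelerators. Unlike the patch, it returns NULL instead of calling `gomp_fatal`, and it omits the by-name lookup through `goacc_device_type`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device-type numbering; only the relative order matters.  */
enum demo_device
{
  demo_none,
  demo_default,
  demo_host,
  demo_not_host,
  demo_nvidia,
  demo_hwm
};

struct demo_dispatch
{
  const char *name;
  bool (*avail_func) (void);
};

/* One slot per device type, filled in by registration.  */
static const struct demo_dispatch *demo_dispatchers[demo_hwm];

static bool demo_yes (void) { return true; }
static bool demo_no (void) { return false; }

static const struct demo_dispatch demo_host_disp = { "host", demo_yes };
static const struct demo_dispatch demo_gpu_up = { "nvidia", demo_yes };
static const struct demo_dispatch demo_gpu_down = { "nvidia", demo_no };

/* DEFAULT and NOT_HOST search the slots after NOT_HOST for the first
   available device; only DEFAULT falls back to the host.  */
static const struct demo_dispatch *
demo_resolve (enum demo_device d)
{
  if (d == demo_default || d == demo_not_host)
    {
      for (int i = demo_not_host + 1; i < demo_hwm; i++)
        if (demo_dispatchers[i] && demo_dispatchers[i]->avail_func ())
          return demo_dispatchers[i];
      return d == demo_default ? demo_dispatchers[demo_host] : NULL;
    }

  /* Explicit types index the table directly; reject out-of-range values.  */
  assert (d != demo_none && d < demo_hwm);
  return demo_dispatchers[d];
}
```

Keeping the pseudo-types below the host and the host below every accelerator is what lets a single `++d` scan serve both search cases.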
diff --git a/libgomp/oacc-int.h b/libgomp/oacc-int.h
new file mode 100644
index 0000000..470774b
--- /dev/null
+++ b/libgomp/oacc-int.h
@@ -0,0 +1,127 @@
+/* OpenACC Runtime - internal declarations
+
+   Copyright (C) 2005-2014 Free Software Foundation, Inc.
+
+   Contributed by Nathan Sidwell <nathan@codesourcery.com> and Thomas Schwinge
+   <thomas@codesourcery.com>.  In parts based on libgomp.h contributed by
+   Richard Henderson <rth@redhat.com>.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file contains data types and function declarations that are not
+   part of the official OpenACC user interface.  There are declarations
+   in here that are part of the GNU OpenACC ABI, in that the compiler is
+   required to know about them and use them.
+
+   The convention is that the all-caps prefix "GOACC" is used to group items
+   that are part of the external ABI, and the lower-case prefix "goacc" is
+   used to group items that are completely private to the library.  */
+
+#ifndef _OACC_INT_H
+#define _OACC_INT_H 1
+
+#include "openacc.h"
+#include "config.h"
+#include <stddef.h>
+#include <stdbool.h>
+#include <stdarg.h>
+
+#ifdef HAVE_ATTRIBUTE_VISIBILITY
+# pragma GCC visibility push(hidden)
+#endif
+
+typedef struct ACC_dispatch_t
+{
+  /* open or close a device instance.  */
+  void *(*open_device_func) (int n);
+  int (*close_device_func) (void *h);
+
+  /* set or get the device number.  */
+  int (*get_device_num_func) (void);
+  void (*set_device_num_func) (int);
+
+  /* availability */
+  bool (*avail_func) (void);
+
+  /* execute */
+  void (*exec_func) (void (*) (void *), size_t, void **, void **, size_t *,
+		     unsigned short *, int, int, int, int, void *);
+
+  /* asynchronous routines  */
+  int (*async_test_func) (int);
+  int (*async_test_all_func) (void);
+  void (*async_wait_func) (int);
+  void (*async_wait_async_func) (int, int);
+  void (*async_wait_all_func) (void);
+  void (*async_wait_all_async_func) (int);
+  void (*async_set_async_func) (int);
+
+  /* NVIDIA target specific routines  */
+  struct {
+    void *(*get_current_device_func) (void);
+    void *(*get_current_context_func) (void);
+    void *(*get_stream_func) (int);
+    int (*set_stream_func) (int, void *);
+  } cuda;
+} ACC_dispatch_t;
+
+typedef enum ACC_dispatch_f
+  {
+    ACC_unified_mem_f = 1 << 0,
+  }
+ACC_dispatch_f;
+
+struct gomp_device_descr;
+
+void ACC_register (struct gomp_device_descr const *) __GOACC_NOTHROW;
+
+/* Memory routines.  */
+struct memmap_t *ACC_mem_open (void *, struct memmap_t *, int) __GOACC_NOTHROW;
+bool ACC_mem_close (void *, struct memmap_t *) __GOACC_NOTHROW;
+struct gomp_device_descr *ACC_resolve_device(int) __GOACC_NOTHROW;
+
+/* Current dispatcher */
+extern struct gomp_device_descr const *ACC_dev;
+
+/* Device handle for current thread.  */
+extern __thread void *ACC_handle;
+
+typedef struct memmap_t
+{
+  unsigned live;
+  struct target_mem_desc *tlist;
+  struct gomp_memory_mapping mem_map;
+} memmap_t;
+
+/* Memory mapping */
+extern __thread struct memmap_t *ACC_memmap;
+
+void ACC_runtime_initialize (void);
+void ACC_save_and_set_bind (acc_device_t);
+void ACC_restore_bind (void);
+void ACC_lazy_initialize (void);
+
+#ifdef HAVE_ATTRIBUTE_VISIBILITY
+# pragma GCC visibility pop
+#endif
+
+#endif /* _OACC_INT_H */
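The `live` field of `memmap_t` is a use count. `ACC_mem_open` creates the map on first use and increments the count. `ACC_mem_close` returns true only when the count drops to zero, and that is when `close_handle` goes on to close the device. Below is a minimal model of just that counting, using a hypothetical `demo_memmap` type. The splay tree, target-descriptor list and lock are deliberately left out. The map is not freed on the last close, because in the patch the per-context list still points at it until shutdown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for memmap_t: only the use count is modelled.  */
struct demo_memmap
{
  unsigned live;
};

/* Create the map on first use, otherwise share SRC; count the new user.  */
static struct demo_memmap *
demo_mem_open (struct demo_memmap *src)
{
  if (!src)
    {
      src = malloc (sizeof *src);
      if (!src)
        abort ();
      src->live = 0;
    }
  src->live++;
  return src;
}

/* Drop one user; report true only when the last user has gone, i.e. when
   the caller should also close the underlying device.  */
static bool
demo_mem_close (struct demo_memmap *mm)
{
  assert (mm->live > 0);
  return --mm->live == 0;
}
```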
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
new file mode 100644
index 0000000..6aed63e
--- /dev/null
+++ b/libgomp/oacc-mem.c
@@ -0,0 +1,515 @@
+/* OpenACC Runtime memory-management routines
+
+   Copyright (C) 2013 Free Software Foundation, Inc.
+
+   Contributed by Nathan Sidwell <nathan@codesourcery.com>.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "openacc.h"
+#include "config.h"
+#include "libgomp.h"
+#include "gomp-constants.h"
+#include "target.h"
+#include <stdio.h>
+#include <stdint.h>
+
+#include "splay-tree.h"
+
+/* Although this pointer is local to each host thread, it points to a memmap_t
+   that is stored per-context (different host threads may be associated with
+   different contexts, and each context is associated with a physical
+   device).  */
+__thread struct memmap_t *ACC_memmap;
+
+memmap_t *
+ACC_mem_open (void *handle, memmap_t *src, int handle_num)
+{
+  if (!src)
+    {
+      src = gomp_malloc (sizeof (*src));
+      src->live = 0;
+      src->mem_map.splay_tree.root = NULL;
+      src->tlist = NULL;
+      gomp_mutex_init (&src->mem_map.lock);
+    }
+
+  src->live++;
+
+  return src;
+}
+
+bool
+ACC_mem_close (void *handle, memmap_t *mm)
+{
+  bool closed = 0;
+
+  if (!--mm->live)
+    {
+      struct target_mem_desc *t;
+
+      for (t = mm->tlist; t != NULL; t = t->prev)
+        {
+          ACC_dev->device_free_func (t->to_free);
+
+          t->tgt_end = 0;
+          t->to_free = 0;
+
+          gomp_unmap_vars (t, true);
+        }
+
+       closed = 1;
+    }
+
+  gomp_mutex_destroy (&mm->mem_map.lock);
+
+  return closed;
+}
+
+/* Return the block containing [H,H+S), or NULL if not contained.  */
+
+attribute_hidden splay_tree_key
+lookup_host (memmap_t *mm, void *h, size_t s)
+{
+  struct splay_tree_key_s node;
+  splay_tree_key key;
+  struct gomp_memory_mapping *mem_map = &mm->mem_map;
+
+  node.host_start = (uintptr_t) h;
+  node.host_end = (uintptr_t) h + s;
+
+  gomp_mutex_lock (&mem_map->lock);
+
+  key = splay_tree_lookup (&mem_map->splay_tree, &node);
+
+  gomp_mutex_unlock (&mem_map->lock);
+
+  return key;
+}
+
+/* Return the block containing [D,D+S), or NULL if not contained.
+   The target descriptor list isn't ordered by device address, so we
+   have to walk all of it.  This is not expected to be a common
+   operation.  */
+
+static splay_tree_key
+lookup_dev (memmap_t *b, void *d, size_t s)
+{
+  int i;
+  struct target_mem_desc *t;
+
+  gomp_mutex_lock (&b->mem_map.lock);
+
+  for (t = b->tlist; t != NULL; t = t->prev)
+    {
+      if (t->tgt_start <= (uintptr_t) d && t->tgt_end >= (uintptr_t) d + s)
+        break;
+    }
+
+  gomp_mutex_unlock (&b->mem_map.lock);
+
+  if (!t)
+    return NULL;
+
+  for (i = 0; i < t->refcount; i++)
+    {
+      void * offset;
+
+      splay_tree_key k = &t->array[i].key;
+      offset = d - t->tgt_start + k->tgt_offset;
+
+      if (k->host_start + offset <= (void *) k->host_end)
+        return k;
+    }
+ 
+  return NULL;
+}
+
+/* OpenACC is silent on how memory exhaustion is indicated.  We return
+   NULL.  */
+
+void *
+acc_malloc (size_t s)
+{
+  if (!s)
+    return NULL;
+
+  ACC_lazy_initialize ();
+
+  return ACC_dev->device_alloc_func (s);
+}
+
+/* OpenACC 2.0a (3.2.16) doesn't specify what to do in the event
+   the device address is mapped.  We choose to check whether it is
+   mapped, and if it is, to unmap it.  */
+void
+acc_free (void *d)
+{
+  splay_tree_key k;
+
+  if (!d)
+    return;
+
+  /* We don't have to call lazy open here, as the ptr value must have
+     been returned by acc_malloc.  It's not permitted to pass NULL in
+     (unless you got that null from acc_malloc).  */
+  if ((k = lookup_dev (ACC_memmap, d, 1)))
+   {
+     void *offset;
+
+     offset = d - k->tgt->tgt_start + k->tgt_offset;
+
+     acc_unmap_data((void *)(k->host_start + offset));
+   }
+
+  ACC_dev->device_free_func (d);
+}
+
+void
+acc_memcpy_to_device (void *d, void *h, size_t s)
+{
+  /* No need to call lazy open here, as the device pointer must have
+     been obtained from a routine that did that.  */
+  ACC_dev->device_host2dev_func (d, h, s);
+}
+
+void
+acc_memcpy_from_device (void *h, void *d, size_t s)
+{
+  /* No need to call lazy open here, as the device pointer must have
+     been obtained from a routine that did that.  */
+  ACC_dev->device_dev2host_func (h, d, s);
+}
+
+/* Return the device pointer that corresponds to host data H, or NULL
+   if there is no mapping.  */
+
+void *
+acc_deviceptr (void *h)
+{
+  splay_tree_key n;
+  void *d;
+  void *offset;
+
+  ACC_lazy_initialize ();
+
+  n = lookup_host (ACC_memmap, h, 1);
+
+  if (!n)
+    return NULL;
+
+  offset = h - n->host_start;
+
+  d = n->tgt->tgt_start + n->tgt_offset + offset;
+
+  return d;
+}
+
+/* Return the host pointer that corresponds to device data D, or NULL
+   if there is no mapping.  */
+
+void *
+acc_hostptr (void *d)
+{
+  splay_tree_key n;
+  void *h;
+  void *offset;
+
+  ACC_lazy_initialize ();
+
+  n = lookup_dev (ACC_memmap, d, 1);
+
+  if (!n)
+    return NULL;
+
+  offset = d - n->tgt->tgt_start + n->tgt_offset;
+
+  h = n->host_start + offset;
+
+  return h;
+}
+
+/* Return 1 if host data [H,+S] is present on the device.  */
+
+int
+acc_is_present (void *h, size_t s)
+{
+  splay_tree_key n;
+
+  if (!s || !h)
+    return 0;
+
+  ACC_lazy_initialize ();
+
+  n = lookup_host (ACC_memmap, h, s);
+
+  if (n && (((uintptr_t)h < n->host_start) ||
+	((uintptr_t)h + s > n->host_end) || (s > n->host_end - n->host_start)))
+    n = NULL;
+
+  return n != NULL;
+}
+
+/* Create a mapping for host [H,+S] -> device [D,+S] */
+
+void
+acc_map_data (void *h, void *d, size_t s)
+{
+  struct target_mem_desc *tgt;
+  size_t mapnum = 1;
+  void *hostaddrs = h;
+  void *devaddrs = d;
+  size_t sizes = s;
+  unsigned short kinds = GOMP_MAP_ALLOC;
+
+  ACC_lazy_initialize ();
+
+  if (!d || !h || !s)
+    gomp_fatal ("[%p,+%d]->[%p,+%d] is a bad map",
+                (void *)h, (int)s, (void *)d, (int)s);
+
+  if (lookup_host (ACC_memmap, h, s))
+    gomp_fatal ("host address [%p, +%d] is already mapped", (void *)h, (int)s);
+
+  if (lookup_dev (ACC_memmap, d, s))
+    gomp_fatal ("device address [%p, +%d] is already mapped", (void *)d, (int)s);
+
+  tgt = gomp_map_vars ((struct gomp_device_descr *) ACC_dev,
+		       &ACC_memmap->mem_map, mapnum, &hostaddrs,
+		       &devaddrs, &sizes, &kinds, true, false);
+
+  tgt->prev = ACC_memmap->tlist;
+  ACC_memmap->tlist = tgt;
+}
+
+void
+acc_unmap_data (void *h)
+{
+  /* No need to call lazy open, as the address must already have been
+     mapped.  */
+
+  size_t host_size;
+  splay_tree_key n = lookup_host (ACC_memmap, h, 1);
+  struct target_mem_desc *t;
+
+  if (!n)
+    gomp_fatal ("%p is not a mapped block", (void *)h);
+
+  host_size = n->host_end - n->host_start;
+
+  if (n->host_start != (uintptr_t) h)
+    gomp_fatal ("[%p,%d] surrounds1 %p",
+            (void *)n->host_start, (int)host_size, (void *)h);
+
+  t = n->tgt;
+
+  if (t->refcount == 2)
+    {
+      struct target_mem_desc *tp;
+
+      /* This is the last reference, so pull the descriptor off the
+         chain.  Clearing tgt_end and to_free prevents gomp_unmap_vars
+         (via gomp_unmap_tgt) from freeing the device memory.  */
+      t->tgt_end = 0;
+      t->to_free = 0;
+
+      gomp_mutex_lock (&ACC_memmap->mem_map.lock);
+
+      for (tp = NULL, t = ACC_memmap->tlist; t != NULL; tp = t, t = t->prev)
+        {
+          if (n->tgt == t)
+            {
+              if (tp)
+                tp->prev = t->prev;
+              else
+                ACC_memmap->tlist = t->prev;
+
+              break; 
+            }
+        }
+
+      gomp_mutex_unlock (&ACC_memmap->mem_map.lock);
+    }
+
+  gomp_unmap_vars (t, true);
+}
+
+#define PCC_Present (1 << 0)
+#define PCC_Create (1 << 1)
+#define PCC_Copy (1 << 2)
+
+attribute_hidden void *
+present_create_copy (unsigned f, void *h, size_t s)
+{
+  void *d;
+  splay_tree_key n;
+
+  if (!h || !s)
+    gomp_fatal ("[%p,+%d] is a bad range", (void *)h, (int)s);
+
+  ACC_lazy_initialize ();
+
+  n = lookup_host (ACC_memmap, h, s);
+  if (n)
+    {
+      /* Present. */
+      d = (void *) (n->tgt->tgt_start + n->tgt_offset);
+
+      if (!(f & PCC_Present))
+        gomp_fatal ("[%p,+%d] already mapped to [%p,+%d]",
+            (void *)h, (int)s, (void *)d, (int)s);
+      if ((h + s) > (void *)n->host_end)    
+        gomp_fatal ("[%p,+%d] not mapped", (void *)h, (int)s);
+    }
+  else if (!(f & PCC_Create))
+    {
+      gomp_fatal ("[%p,+%d] not mapped", (void *)h, (int)s);
+    }
+  else
+    {
+      struct target_mem_desc *tgt;
+      size_t mapnum = 1;
+      unsigned short kinds;
+      void *hostaddrs = h;
+
+      if (f & PCC_Copy)
+        kinds = GOMP_MAP_ALLOC_TO;
+      else
+        kinds = GOMP_MAP_ALLOC;
+
+      tgt = gomp_map_vars ((struct gomp_device_descr *) ACC_dev,
+			   &ACC_memmap->mem_map, mapnum, &hostaddrs,
+			   NULL, &s, &kinds, true, false);
+
+      d = tgt->to_free;
+      tgt->prev = ACC_memmap->tlist;
+      ACC_memmap->tlist = tgt;
+    }
+  
+  return d;
+}
+
+void *
+acc_create (void *h, size_t s)
+{
+  return present_create_copy (PCC_Create, h, s);
+}
+
+void *
+acc_copyin (void *h, size_t s)
+{
+  return present_create_copy (PCC_Create | PCC_Copy, h, s);
+}
+
+void *
+acc_present_or_create (void *h, size_t s)
+{
+  return present_create_copy (PCC_Present | PCC_Create, h, s);
+}
+
+void *
+acc_present_or_copyin (void *h, size_t s)
+{
+  return present_create_copy (PCC_Present | PCC_Create | PCC_Copy, h, s);
+}
+
+#define DC_Copyout (1 << 0)
+
+static void
+delete_copyout (unsigned f, void *h, size_t s)
+{
+  size_t host_size;
+  splay_tree_key n;
+  void *d;
+
+  n = lookup_host (ACC_memmap, h, s);
+
+  /* No need to call lazy open, as the data must already have been
+     mapped.  */
+
+  if (!n)
+    gomp_fatal ("[%p,%d] is not mapped", (void *)h, (int)s);
+
+  d = (void *) (n->tgt->tgt_start + n->tgt_offset);
+
+  host_size = n->host_end - n->host_start;
+
+  if (n->host_start != (uintptr_t) h || host_size != s)
+    gomp_fatal ("[%p,%d] surrounds2 [%p,+%d]",
+            (void *)n->host_start, (int)host_size, (void *)h, (int)s);
+
+  if (f & DC_Copyout)
+    ACC_dev->device_dev2host_func (h, d, s);
+  
+  acc_unmap_data(h);
+
+  ACC_dev->device_free_func (d);
+}
+
+void
+acc_delete (void *h, size_t s)
+{
+  delete_copyout (0, h, s);
+}
+
+void acc_copyout (void *h, size_t s)
+{
+  delete_copyout (DC_Copyout, h, s);
+}
+
+static void
+update_dev_host (int is_dev, void *h, size_t s)
+{
+  splay_tree_key n;
+  void *d;
+
+  if (!ACC_memmap)
+    gomp_fatal ("[%p,%d] is not mapped", h, (int)s);
+
+  n = lookup_host (ACC_memmap, h, s);
+
+  /* No need to call lazy open, as the data must already have been
+     mapped.  */
+
+  if (!n)
+    gomp_fatal ("[%p,%d] is not mapped", h, (int)s);
+
+  d = (void *) (n->tgt->tgt_start + n->tgt_offset);
+
+  if (is_dev)
+    ACC_dev->device_host2dev_func (d, h, s);
+  else
+    ACC_dev->device_dev2host_func (h, d, s);
+
+}
+
+void
+acc_update_device (void *h, size_t s)
+{
+  update_dev_host (1, h, s);
+}
+
+void
+acc_update_self (void *h, size_t s)
+{
+  update_dev_host (0, h, s);
+}
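Both `acc_deviceptr` and `acc_is_present` begin by looking up the host address in the mapping tree. The device pointer is the block's device start plus the offset of H within the block. Presence additionally requires the whole range [H,H+S) to lie inside that one block. The sketch below makes three simplifications:

- It replaces the splay tree with a linear scan over a small array of hypothetical `demo_mapping` records.
- It assumes the lookup matches any block that overlaps the queried range.
- It works on `uintptr_t` values, so a missing mapping is reported as 0 rather than NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened mapping record.  */
struct demo_mapping
{
  uintptr_t host_start, host_end;  /* Host range [start, end).  */
  uintptr_t dev_start;             /* Device address of host_start.  */
};

/* Return the first mapping overlapping [h, h+s), or NULL.  */
static const struct demo_mapping *
demo_lookup (const struct demo_mapping *maps, size_t n, uintptr_t h, size_t s)
{
  for (size_t i = 0; i < n; i++)
    if (h < maps[i].host_end && h + s > maps[i].host_start)
      return &maps[i];
  return NULL;
}

/* Like acc_deviceptr: translate H by its offset within its block.  */
static uintptr_t
demo_deviceptr (const struct demo_mapping *maps, size_t n, uintptr_t h)
{
  const struct demo_mapping *m = demo_lookup (maps, n, h, 1);
  return m ? m->dev_start + (h - m->host_start) : 0;
}

/* Like acc_is_present: [h, h+s) must lie entirely inside one block.  */
static int
demo_is_present (const struct demo_mapping *maps, size_t n,
                 uintptr_t h, size_t s)
{
  const struct demo_mapping *m;

  if (s == 0)
    return 0;
  m = demo_lookup (maps, n, h, s);
  return m && h >= m->host_start && h + s <= m->host_end;
}
```

A range that only partly overlaps a block still finds that block, but it is then rejected by the containment test. That is the same two-step shape as `lookup_host` followed by the bounds check in `acc_is_present`.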
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
new file mode 100644
index 0000000..6508e4f
--- /dev/null
+++ b/libgomp/oacc-parallel.c
@@ -0,0 +1,386 @@
+/* Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by Thomas Schwinge <thomas@codesourcery.com>.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file handles OpenACC constructs.  */
+
+#include "openacc.h"
+#include "libgomp.h"
+#include "libgomp_g.h"
+#include "gomp-constants.h"
+#include "target.h"
+#include <stdio.h>
+#include <string.h>
+#include <stdarg.h>
+#include <assert.h>
+#include <alloca.h>
+
+#ifdef FUTURE
+// device geometry per device type
+struct devgeom
+{
+  int gangs;
+  int workers;
+  int vectors;
+};
+  
+
+// XXX: acceptable defaults?
+static __thread struct devgeom devgeom = { 1, 1, 1 };
+#endif
+
+#ifdef LATER
+static void
+dump_devaddrs(void)
+{
+  int i;
+  struct devaddr *dp;
+
+  gomp_notify("++++ num_devaddrs %d\n", num_devaddrs);
+  for (dp = devaddrs, i = 1; dp != 0; dp = dp->next, i++)
+    {
+      gomp_notify("++++ %.02d) %p\n", i, dp->d);
+    }
+}
+#endif
+
+static void
+dump_var(char *s, size_t idx, void *hostaddr, size_t size, unsigned short kind)
+{
+  gomp_notify(" %2zu: %3s 0x%.2x -", idx, s, kind & 0xff);
+
+  switch (kind & 0xff)
+    {
+      case 0x00: gomp_notify(" ALLOC              "); break;
+      case 0x01: gomp_notify(" ALLOC TO           "); break;
+      case 0x02: gomp_notify(" ALLOC FROM         "); break;
+      case 0x03: gomp_notify(" ALLOC TOFROM       "); break;
+      case 0x04: gomp_notify(" POINTER            "); break;
+      case 0x05: gomp_notify(" TO_PSET            "); break;
+
+      case 0x08: gomp_notify(" FORCE_ALLOC        "); break;
+      case 0x09: gomp_notify(" FORCE_TO           "); break;
+      case 0x0a: gomp_notify(" FORCE_FROM         "); break;
+      case 0x0b: gomp_notify(" FORCE_TOFROM       "); break;
+      case 0x0c: gomp_notify(" FORCE_PRESENT      "); break;
+      case 0x0d: gomp_notify(" FORCE_DEALLOC      "); break;
+      case 0x0e: gomp_notify(" FORCE_DEVICEPTR    "); break;
+
+      case 0x18: gomp_notify(" FORCE_PRIVATE      "); break;
+      case 0x19: gomp_notify(" FORCE_FIRSTPRIVATE "); break;
+
+      case (unsigned char) -1: gomp_notify(" DUMMY              "); break;
+      default: gomp_notify(" UNKNOWN            "); break;
+    }
+    
+  gomp_notify("- %d - %4d/0x%04x ", 1 << (kind >> 8), (int)size, (int)size);
+  gomp_notify("- %p\n", hostaddr);
+
+  return;
+}
+
+/* Ensure that the target device for DEVICE_TYPE is initialized (and that
+   plugins have been loaded if appropriate).  On return, the current
+   thread's ACC_dev is set for the given device type, unless DEVICE_TYPE is
+   GOMP_IF_CLAUSE_FALSE, in which case nothing is done.  */
+
+attribute_hidden void
+select_acc_device (int device_type)
+{
+  if (device_type == GOMP_IF_CLAUSE_FALSE)
+    return;
+
+  if (device_type == acc_device_none)
+    device_type = acc_device_host;
+
+  if (device_type >= 0)
+    {
+      /* NOTE: this will go badly if the surrounding data environment is set up
+         to use a different device type.  We'll just have to trust that users
+	 know what they're doing...  */
+      acc_set_device_type (device_type);
+    }
+
+  ACC_lazy_initialize ();
+}
+
+void goacc_wait (int async, int num_waits, va_list ap);
+
+void
+GOACC_parallel (int device, void (*fn) (void *), const void *openmp_target,
+		size_t mapnum, void **hostaddrs, size_t *sizes,
+		unsigned short *kinds,
+		int num_gangs, int num_workers, int vector_length,
+		int async, int num_waits, ...)
+{
+  bool if_clause_condition_value = device != GOMP_IF_CLAUSE_FALSE;
+  va_list ap;
+  struct target_mem_desc *tgt;
+  void **devaddrs;
+  unsigned int i;
+
+  if (num_gangs != 1)
+    gomp_fatal ("num_gangs (%d) different from one is not yet supported",
+		num_gangs);
+  if (num_workers != 1)
+    gomp_fatal ("num_workers (%d) different from one is not yet supported",
+		num_workers);
+
+  gomp_notify ("%s: mapnum=%zu, hostaddrs=%p, sizes=%p, kinds=%p, async=%d\n",
+	       __FUNCTION__, mapnum, hostaddrs, sizes, kinds, async);
+
+  select_acc_device (device);
+
+  /* Host fallback if "if" clause is false or if the current device is set to
+     the host.  */
+  if (!if_clause_condition_value)
+    {
+      ACC_save_and_set_bind (acc_device_host);
+      fn (hostaddrs);
+      ACC_restore_bind ();
+      return;
+    }
+  else if (acc_device_type (ACC_dev->type) == acc_device_host)
+    {
+      fn (hostaddrs);
+      return;
+    }
+
+  va_start (ap, num_waits);
+  
+  if (num_waits > 0)
+    goacc_wait (async, num_waits, ap);
+
+  va_end (ap);
+
+  ACC_dev->openacc.async_set_async_func (async);
+
+  tgt = gomp_map_vars ((struct gomp_device_descr *) ACC_dev,
+		       &ACC_memmap->mem_map, mapnum, hostaddrs,
+		       NULL, sizes, kinds, true, false);
+
+  devaddrs = alloca (sizeof (void *) * mapnum);
+  for (i = 0; i < mapnum; i++)
+    devaddrs[i] = (void *) (tgt->list[i]->tgt->tgt_start
+			    + tgt->list[i]->tgt_offset);
+
+  ACC_dev->openacc.exec_func (fn, mapnum, hostaddrs, devaddrs, sizes, kinds,
+			      num_gangs, num_workers, vector_length, async,
+			      tgt);
+
+  /* If running synchronously, unmap immediately.  */
+  if (async < acc_async_noval)
+    gomp_unmap_vars (tgt, true);
+  else
+    gomp_copy_from_async (tgt);
+
+  ACC_dev->openacc.async_set_async_func (acc_async_sync);
+}
+
+static __thread struct target_mem_desc *mapped_data = NULL;
+
+void
+GOACC_data_start (int device, const void *openmp_target, size_t mapnum,
+		  void **hostaddrs, size_t *sizes, unsigned short *kinds)
+{
+  bool if_clause_condition_value = device != GOMP_IF_CLAUSE_FALSE;
+  struct target_mem_desc *tgt;
+
+  gomp_notify ("%s: mapnum=%zu, hostaddrs=%p, sizes=%p, kinds=%p\n",
+	       __FUNCTION__, mapnum, hostaddrs, sizes, kinds);
+
+  select_acc_device (device);
+
+  /* Host fallback or shared memory: nothing to map; push an empty entry.  */
+  if (!if_clause_condition_value
+      || (ACC_dev->capabilities & TARGET_CAP_SHARED_MEM))
+    {
+      tgt = gomp_map_vars (NULL, NULL, 0, NULL, NULL, NULL, NULL, true, false);
+      tgt->prev = mapped_data;
+      mapped_data = tgt;
+
+      return;
+    }
+
+  gomp_notify ("  %s: prepare mappings\n", __FUNCTION__);
+  tgt = gomp_map_vars ((struct gomp_device_descr *) ACC_dev,
+		       &ACC_memmap->mem_map, mapnum, hostaddrs,
+		       NULL, sizes, kinds, true, false);
+  gomp_notify ("  %s: mappings prepared\n", __FUNCTION__);
+  tgt->prev = mapped_data;
+  mapped_data = tgt;
+}
+
+void
+GOACC_data_end (void)
+{
+  struct target_mem_desc *tgt = mapped_data;
+
+  gomp_notify ("  %s: restore mappings\n", __FUNCTION__);
+  mapped_data = tgt->prev;
+  gomp_unmap_vars (tgt, true);
+  gomp_notify ("  %s: mappings restored\n", __FUNCTION__);
+}
+
+
+void
+GOACC_kernels (int device, void (*fn) (void *), const void *openmp_target,
+	       size_t mapnum, void **hostaddrs, size_t *sizes,
+	       unsigned short *kinds,
+	       int num_gangs, int num_workers, int vector_length,
+	       int async, int num_waits, ...)
+{
+  gomp_notify ("%s: mapnum=%zu, hostaddrs=%p, sizes=%p, kinds=%p\n",
+	       __FUNCTION__, mapnum, hostaddrs, sizes, kinds);
+
+  va_list ap;
+
+  select_acc_device (device);
+
+  va_start (ap, num_waits);
+
+  if (num_waits > 0)
+    goacc_wait (async, num_waits, ap);
+
+  va_end (ap);
+
+  /* TODO: for now, run kernels as parallel; the waits were done above.  */
+  GOACC_parallel (device, fn, openmp_target, mapnum, hostaddrs, sizes, kinds,
+		  num_gangs, num_workers, vector_length, async, 0);
+}
+
+void
+goacc_wait (int async, int num_waits, va_list ap)
+{
+  int i;
+
+  assert (num_waits >= 0);
+
+  if (async == acc_async_sync && num_waits == 0)
+    {
+      acc_wait_all ();
+      return;
+    }
+
+  if (async == acc_async_sync && num_waits)
+    {
+      for (i = 0; i < num_waits; i++)
+        {
+          int qid = va_arg (ap, int);
+
+          if (acc_async_test (qid))
+            continue;
+
+          acc_wait (qid);
+        }
+      return;
+    }
+
+  if (async == acc_async_noval && num_waits == 0)
+    {
+      ACC_dev->openacc.async_wait_all_async_func (acc_async_noval);
+      return;
+    }
+
+  for (i = 0; i < num_waits; i++)
+    {
+      int qid = va_arg (ap, int);
+
+      if (acc_async_test (qid))
+	continue;
+
+      /* If we're waiting on the same asynchronous queue as we're launching on,
+         the queue itself will order work as required, so there's no need to
+	 wait explicitly.  */
+      if (qid != async)
+	ACC_dev->openacc.async_wait_async_func (qid, async);
+    }
+}
+
+void
+GOACC_update (int device, const void *openmp_target, size_t mapnum,
+	      void **hostaddrs, size_t *sizes, unsigned short *kinds,
+	      int async, int num_waits, ...)
+{
+  bool if_clause_condition_value = device != GOMP_IF_CLAUSE_FALSE;
+  size_t i;
+
+  select_acc_device (device);
+
+  if (!if_clause_condition_value
+      || (ACC_dev->capabilities & TARGET_CAP_SHARED_MEM))
+    return;
+
+  if (num_waits > 0)
+    {
+      va_list ap;
+
+      va_start (ap, num_waits);
+
+      goacc_wait (async, num_waits, ap);
+
+      va_end (ap);
+    }
+
+  ACC_dev->openacc.async_set_async_func (async);
+
+  for (i = 0; i < mapnum; ++i)
+    {
+      unsigned char kind = kinds[i] & 0xff;
+
+      dump_var("UPD", i, hostaddrs[i], sizes[i], kinds[i]);
+
+      switch (kind)
+	{
+	  case GOMP_MAP_POINTER:
+	     break;
+
+	  case GOMP_MAP_FORCE_TO:
+	     acc_update_device (hostaddrs[i], sizes[i]);
+	     break;
+
+	  case GOMP_MAP_FORCE_FROM:
+	     acc_update_self (hostaddrs[i], sizes[i]);
+	     break;
+
+	  default:
+	     gomp_fatal ("GOACC_update: unhandled kind 0x%.2x", kind);
+	     break;
+	}
+    }
+
+  ACC_dev->openacc.async_set_async_func (acc_async_sync);
+}
+
+void
+GOACC_wait (int async, int num_waits, ...)
+{
+  va_list ap;
+
+  va_start (ap, num_waits);
+
+  goacc_wait (async, num_waits, ap);
+
+  va_end (ap);
+}
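goacc_wait chooses one of four strategies. The choice depends on whether the construct is synchronous (acc_async_sync) and on whether any wait arguments were given. A minimal sketch of that decision as a pure function follows, so that each case can be checked in isolation. enum wait_action and classify_wait are hypothetical names. ASYNC_NOVAL and ASYNC_SYNC mirror the acc_async_t values in openacc.h (-1 and -2).

```c
#include <assert.h>
#include <stdbool.h>

/* Mirror acc_async_noval and acc_async_sync from openacc.h.  */
enum { ASYNC_NOVAL = -1, ASYNC_SYNC = -2 };

/* Hypothetical labels for the action goacc_wait takes.  */
enum wait_action
{
  WAIT_ALL_BLOCKING,    /* acc_wait_all ()  */
  WAIT_ALL_ASYNC,       /* async_wait_all_async_func (acc_async_noval)  */
  WAIT_QUEUE_BLOCKING,  /* acc_wait (qid)  */
  WAIT_QUEUE_ASYNC,     /* async_wait_async_func (qid, async)  */
  WAIT_SKIP             /* Nothing to do.  */
};

/* Classify what goacc_wait does for ASYNC and NUM_WAITS.  When NUM_WAITS
   is nonzero, the answer is for a single wait argument QID, and QID_DONE
   is what acc_async_test (qid) would report.  */
static enum wait_action
classify_wait (int async, int num_waits, int qid, bool qid_done)
{
  assert (num_waits >= 0);

  if (num_waits == 0)
    {
      if (async == ASYNC_SYNC)
        return WAIT_ALL_BLOCKING;
      if (async == ASYNC_NOVAL)
        return WAIT_ALL_ASYNC;
      return WAIT_SKIP;         /* The wait loop has no iterations.  */
    }

  if (qid_done)
    return WAIT_SKIP;           /* Queue already idle.  */
  if (async == ASYNC_SYNC)
    return WAIT_QUEUE_BLOCKING;
  /* A queue is already ordered with respect to itself.  */
  return qid == async ? WAIT_SKIP : WAIT_QUEUE_ASYNC;
}
```

goacc_wait itself interleaves these decisions with the calls. Separating the two here is only a convenience for testing.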
diff --git a/libgomp/oacc-plugin.c b/libgomp/oacc-plugin.c
new file mode 100644
index 0000000..c335b51
--- /dev/null
+++ b/libgomp/oacc-plugin.c
@@ -0,0 +1,44 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Initialize and register OpenACC dispatch table from libgomp plugin.  */
+
+#include "libgomp.h"
+#include "oacc-plugin.h"
+#include "target.h"
+
+void
+ACC_plugin_register (struct gomp_device_descr *device)
+{
+  ACC_register (device);
+}
+
+
+void
+gomp_plugin_async_unmap_vars (void *ptr)
+{
+  struct target_mem_desc *tgt = ptr;
+  
+  gomp_unmap_vars (tgt, false);
+}
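gomp_plugin_async_unmap_vars lets a plugin hand back the struct target_mem_desc it received at launch. GOACC_parallel passes tgt to exec_func, and once the asynchronous work has finished, the plugin calls back so the host-side book-keeping can be released. It passes false as the second argument to gomp_unmap_vars, presumably because GOACC_parallel has already queued the copy-back with gomp_copy_from_async. The sketch below shows one way a plugin might drive such a callback by polling. struct pending, reap_finished and count_unmap are hypothetical names, and a boolean flag stands in for a real device event query.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of work queued on an asynchronous stream.  */
struct pending
{
  bool done;      /* Stands in for a device event query.  */
  void *cookie;   /* Stands in for the struct target_mem_desc *.  */
};

/* Same shape as gomp_plugin_async_unmap_vars (void *).  */
typedef void (*unmap_fn) (void *);

/* Hand the cookie of every finished entry to UNMAP, compact the
   unfinished entries to the front of P, and return how many remain.  */
static size_t
reap_finished (struct pending *p, size_t n, unmap_fn unmap)
{
  size_t kept = 0;

  assert (p != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      if (p[i].done)
        unmap (p[i].cookie);
      else
        p[kept++] = p[i];
    }
  return kept;
}

/* Demo callback: treats the cookie as a counter of releases.  */
static void
count_unmap (void *cookie)
{
  ++*(int *) cookie;
}
```

Polling keeps the sketch short. A real plugin might check completion when new work is queued or when the user waits on a queue, rather than in a dedicated loop.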
diff --git a/libgomp/oacc-plugin.h b/libgomp/oacc-plugin.h
new file mode 100644
index 0000000..0493a12
--- /dev/null
+++ b/libgomp/oacc-plugin.h
@@ -0,0 +1,32 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _OACC_PLUGIN_H
+#define _OACC_PLUGIN_H 1
+
+#include "target.h"
+
+extern void ACC_plugin_register (struct gomp_device_descr *dev);
+
+#endif
diff --git a/libgomp/openacc.f90 b/libgomp/openacc.f90
new file mode 100644
index 0000000..fe7f5ab
--- /dev/null
+++ b/libgomp/openacc.f90
@@ -0,0 +1,108 @@
+!  OpenACC Runtime Library Definitions.
+
+!  Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+!  Contributed by Thomas Schwinge <thomas@codesourcery.com>.
+
+!  This file is part of the GNU OpenMP Library (libgomp).
+
+!  Libgomp is free software; you can redistribute it and/or modify it
+!  under the terms of the GNU General Public License as published by
+!  the Free Software Foundation; either version 3, or (at your option)
+!  any later version.
+
+!  Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+!  WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+!  FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+!  more details.
+
+!  Under Section 7 of GPL version 3, you are granted additional
+!  permissions described in the GCC Runtime Library Exception, version
+!  3.1, as published by the Free Software Foundation.
+
+!  You should have received a copy of the GNU General Public License and
+!  a copy of the GCC Runtime Library Exception along with this program;
+!  see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+!  <http://www.gnu.org/licenses/>.
+
+module openacc_kinds
+  implicit none
+
+  integer, parameter :: acc_device_kind = 4
+
+end module openacc_kinds
+
+module openacc
+  use openacc_kinds
+  implicit none
+
+  integer, parameter :: openacc_version = 201306
+
+  integer (acc_device_kind), parameter :: acc_device_none = 0
+  integer (acc_device_kind), parameter :: acc_device_default = 1
+  integer (acc_device_kind), parameter :: acc_device_host = 2
+  integer (acc_device_kind), parameter :: acc_device_nonshm_host = 3
+  integer (acc_device_kind), parameter :: acc_device_not_host = 4
+  integer (acc_device_kind), parameter :: acc_device_nvidia = 5
+
+  interface
+     function acc_get_num_devices (dev)
+       use openacc_kinds
+       integer (4) :: acc_get_num_devices
+       integer (acc_device_kind), intent (in) :: dev
+     end function acc_get_num_devices
+  end interface
+
+  interface
+     subroutine acc_set_device_type (dev)
+       use openacc_kinds
+       integer (acc_device_kind), intent (in) :: dev
+     end subroutine acc_set_device_type
+  end interface
+
+  interface
+     function acc_get_device_type ()
+       use openacc_kinds
+       integer (acc_device_kind) :: acc_get_device_type
+     end function acc_get_device_type
+  end interface
+
+  interface acc_set_device_num
+     subroutine acc_set_device_num (num, dev)
+       use openacc_kinds
+       integer (4), intent (in) :: num
+       integer (acc_device_kind), intent (in) :: dev
+     end subroutine acc_set_device_num
+  end interface acc_set_device_num
+
+  interface
+     function acc_get_device_num (dev)
+       use openacc_kinds
+       integer (4) :: acc_get_device_num
+       integer (acc_device_kind), intent (in) :: dev
+     end function acc_get_device_num
+  end interface
+
+  interface
+     subroutine acc_init (dev)
+       use openacc_kinds
+       integer (acc_device_kind), intent (in) :: dev
+     end subroutine acc_init
+  end interface
+
+  interface
+     subroutine acc_shutdown (dev)
+       use openacc_kinds
+       integer (acc_device_kind), intent (in) :: dev
+     end subroutine acc_shutdown
+  end interface
+
+  interface
+     function acc_on_device (dev)
+       use openacc_kinds
+       logical (4) :: acc_on_device
+       integer (acc_device_kind), intent (in) :: dev
+     end function acc_on_device
+  end interface
+
+end module openacc
diff --git a/libgomp/openacc.h b/libgomp/openacc.h
new file mode 100644
index 0000000..e712d7b
--- /dev/null
+++ b/libgomp/openacc.h
@@ -0,0 +1,127 @@
+/* OpenACC Runtime Library User-facing Declarations
+
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by Thomas Schwinge <thomas@codesourcery.com>.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _OPENACC_H
+#define _OPENACC_H 1
+
+#include "gomp-constants.h"
+
+/* The OpenACC standard is silent on whether openacc.h may, must, or
+   must not include other header files.  We choose to include some
+   (stddef.h provides size_t for the declarations below).  */
+#include <stddef.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#if __cplusplus >= 201103
+# define __GOACC_NOTHROW noexcept ()
+#elif __cplusplus
+# define __GOACC_NOTHROW throw ()
+#else /* Not C++ */
+# define __GOACC_NOTHROW __attribute__ ((__nothrow__))
+#endif
+
+  /* Types */
+  typedef enum acc_device_t
+    {
+      acc_device_none = 0,
+      acc_device_default, /* This has to be a distinct value, as no
+			     return value can match it.  */
+      acc_device_host = GOMP_TARGET_HOST,
+      acc_device_nonshm_host = GOMP_TARGET_NONSHM_HOST,
+      acc_device_not_host,
+      acc_device_nvidia = GOMP_TARGET_NVIDIA_PTX,
+      _ACC_device_hwm
+    } acc_device_t;
+
+  typedef enum acc_async_t
+    {
+      acc_async_noval = -1,
+      acc_async_sync  = -2
+    } acc_async_t;
+
+  int acc_get_num_devices (acc_device_t __dev) __GOACC_NOTHROW;
+  void acc_set_device_type (acc_device_t __dev) __GOACC_NOTHROW;
+  acc_device_t acc_get_device_type (void) __GOACC_NOTHROW;
+  void acc_set_device_num (int __num, acc_device_t __dev) __GOACC_NOTHROW;
+  int acc_get_device_num (acc_device_t __dev) __GOACC_NOTHROW;
+  int acc_async_test (int __async) __GOACC_NOTHROW;
+  int acc_async_test_all (void) __GOACC_NOTHROW;
+  void acc_wait (int __async) __GOACC_NOTHROW;
+  void acc_wait_async (int __async1, int __async2) __GOACC_NOTHROW;
+  void acc_wait_all (void) __GOACC_NOTHROW;
+  void acc_wait_all_async (int __async) __GOACC_NOTHROW;
+  void acc_init (acc_device_t __dev) __GOACC_NOTHROW;
+  void acc_shutdown (acc_device_t __dev) __GOACC_NOTHROW;
+  int acc_on_device (acc_device_t __dev) __GOACC_NOTHROW;
+  void *acc_malloc (size_t) __GOACC_NOTHROW;
+  void acc_free (void *) __GOACC_NOTHROW;
+  /* Some of these would be more correct with const qualifiers, but
+     the standard specifies otherwise.  */
+  void *acc_copyin (void *, size_t) __GOACC_NOTHROW;
+  void *acc_present_or_copyin (void *, size_t) __GOACC_NOTHROW;
+  void *acc_create (void *, size_t) __GOACC_NOTHROW;
+  void *acc_present_or_create (void *, size_t) __GOACC_NOTHROW;
+  void acc_copyout (void *, size_t) __GOACC_NOTHROW;
+  void acc_delete (void *, size_t) __GOACC_NOTHROW;
+  void acc_update_device (void *, size_t) __GOACC_NOTHROW;
+  void acc_update_self (void *, size_t) __GOACC_NOTHROW;
+  void acc_map_data (void *, void *, size_t) __GOACC_NOTHROW;
+  void acc_unmap_data (void *) __GOACC_NOTHROW;
+  void *acc_deviceptr (void *) __GOACC_NOTHROW;
+  void *acc_hostptr (void *) __GOACC_NOTHROW;
+  int acc_is_present (void *, size_t) __GOACC_NOTHROW;
+  void acc_memcpy_to_device (void *, void *, size_t) __GOACC_NOTHROW;
+  void acc_memcpy_from_device (void *, void *, size_t) __GOACC_NOTHROW;
+
+  void ACC_target (int, void (*) (void *), const void *,
+	     size_t, void **, size_t *, unsigned char *, int *) __GOACC_NOTHROW;
+  void ACC_parallel (int, void (*) (void *), const void *,
+	     size_t, void **, size_t *, unsigned char *) __GOACC_NOTHROW;
+  void ACC_add_device_code (void const *, char const *) __GOACC_NOTHROW;
+
+  void ACC_async_copy(int) __GOACC_NOTHROW;
+  void ACC_async_kern(int) __GOACC_NOTHROW;
+
+  /* Old names.  OpenACC does not specify whether these can or must
+     not be macros, inlines or aliases for the new names.  */
+  #define acc_pcreate acc_present_or_create
+  #define acc_pcopyin acc_present_or_copyin
+
+  /* CUDA-specific routines.  */
+  void *acc_get_current_cuda_device (void) __GOACC_NOTHROW;
+  void *acc_get_current_cuda_context (void) __GOACC_NOTHROW;
+  void *acc_get_cuda_stream (int __async) __GOACC_NOTHROW;
+  int acc_set_cuda_stream (int __async, void *__stream) __GOACC_NOTHROW;
+  
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _OPENACC_H */
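The trailing _ACC_device_hwm enumerator in acc_device_t is a high-water mark. Its value is one past the largest device type, so it can size tables indexed by device type. A minimal sketch of that pattern follows. The enum is re-declared locally so that the example compiles without gomp-constants.h. Its values are assumed from the Fortran constants in openacc.f90 (host = 2, nonshm_host = 3, not_host = 4, nvidia = 5). device_name and lookup_device_name are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for acc_device_t; numeric values assumed from
   openacc.f90 rather than taken from gomp-constants.h.  */
typedef enum
{
  dev_none = 0,
  dev_default = 1,
  dev_host = 2,
  dev_nonshm_host = 3,
  dev_not_host = 4,
  dev_nvidia = 5,
  dev_hwm               /* One past the largest device type.  */
} device_t;

static_assert (dev_hwm == dev_nvidia + 1,
               "the high-water mark must follow the last device type");

/* Hypothetical per-device-type table, sized by the high-water mark so
   that every valid enumerator is a valid index.  */
static const char *device_name[dev_hwm] = {
  [dev_none] = "none",
  [dev_default] = "default",
  [dev_host] = "host",
  [dev_nonshm_host] = "nonshm_host",
  [dev_not_host] = "not_host",
  [dev_nvidia] = "nvidia",
};

/* Bounds-checked lookup: anything outside [0, dev_hwm) is rejected.  */
static const char *
lookup_device_name (int d)
{
  if (d < 0 || d >= dev_hwm)
    return NULL;
  return device_name[d];
}
```

The bounds test checks against the high-water mark rather than a hard-coded count. It therefore stays correct when a new device type is added just before _ACC_device_hwm.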
diff --git a/libgomp/openacc_lib.h b/libgomp/openacc_lib.h
new file mode 100644
index 0000000..3ce1d8a
--- /dev/null
+++ b/libgomp/openacc_lib.h
@@ -0,0 +1,64 @@
+!  OpenACC Runtime Library Definitions.                   -*- mode: fortran -*-
+
+!  Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+!  Contributed by Thomas Schwinge <thomas@codesourcery.com>.
+
+!  This file is part of the GNU OpenMP Library (libgomp).
+
+!  Libgomp is free software; you can redistribute it and/or modify it
+!  under the terms of the GNU General Public License as published by
+!  the Free Software Foundation; either version 3, or (at your option)
+!  any later version.
+
+!  Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+!  WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+!  FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+!  more details.
+
+!  Under Section 7 of GPL version 3, you are granted additional
+!  permissions described in the GCC Runtime Library Exception, version
+!  3.1, as published by the Free Software Foundation.
+
+!  You should have received a copy of the GNU General Public License and
+!  a copy of the GCC Runtime Library Exception along with this program;
+!  see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+!  <http://www.gnu.org/licenses/>.
+
+      integer openacc_version
+      parameter (openacc_version = 201306)
+
+      integer acc_device_kind
+      parameter (acc_device_kind = 4)
+      integer (acc_device_kind) acc_device_none
+      parameter (acc_device_none = 0)
+      integer (acc_device_kind) acc_device_default
+      parameter (acc_device_default = 1)
+      integer (acc_device_kind) acc_device_host
+      parameter (acc_device_host = 2)
+      integer (acc_device_kind) acc_device_nonshm_host
+      parameter (acc_device_nonshm_host = 3)
+      integer (acc_device_kind) acc_device_not_host
+      parameter (acc_device_not_host = 4)
+      integer (acc_device_kind) acc_device_nvidia
+      parameter (acc_device_nvidia = 5)
+
+      external acc_get_num_devices
+      integer (4) acc_get_num_devices
+
+      external acc_set_device_type
+
+      external acc_get_device_type
+      integer (acc_device_kind) acc_get_device_type
+
+      external acc_set_device_num
+
+      external acc_get_device_num
+      integer (4) acc_get_device_num
+
+      external acc_init
+
+      external acc_shutdown
+
+      external acc_on_device
+      logical (4) acc_on_device
diff --git a/libgomp/plugin-nvptx.c b/libgomp/plugin-nvptx.c
new file mode 100644
index 0000000..f65292e
--- /dev/null
+++ b/libgomp/plugin-nvptx.c
@@ -0,0 +1,1854 @@
+/* Plugin for NVPTX execution.
+
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by CodeSourcery.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Nvidia PTX-specific parts of OpenACC support.  The CUDA driver
+   library appears to hold some implicit state, but the documentation
+   is not clear as to what that state might be, or how one might
+   propagate it from one thread to another.  */
+
+//#define DEBUG
+//#define DISABLE_ASYNC
+
+#include "openacc.h"
+#include "config.h"
+#include "libgomp.h"
+#include "target.h"
+#include "libgomp-plugin.h"
+
+#include <cuda.h>
+#include <sys/queue.h>
+#include <stdint.h>
+#include <string.h>
+#include <stdio.h>
+#include <dlfcn.h>
+#include <unistd.h>
+#include <assert.h>
+
+#define CUERRORS 49
+static struct _errlist
+{
+  CUresult r;
+  char *m;
+} cuErrorList[CUERRORS] = {
+    { CUDA_ERROR_INVALID_VALUE, "invalid value" },
+    { CUDA_ERROR_OUT_OF_MEMORY, "out of memory" },
+    { CUDA_ERROR_NOT_INITIALIZED, "not initialized" },
+    { CUDA_ERROR_DEINITIALIZED, "deinitialized" },
+    { CUDA_ERROR_PROFILER_DISABLED, "profiler disabled" },
+    { CUDA_ERROR_PROFILER_NOT_INITIALIZED, "profiler not initialized" },
+    { CUDA_ERROR_PROFILER_ALREADY_STARTED, "already started" },
+    { CUDA_ERROR_PROFILER_ALREADY_STOPPED, "already stopped" },
+    { CUDA_ERROR_NO_DEVICE, "no device" },
+    { CUDA_ERROR_INVALID_DEVICE, "invalid device" },
+    { CUDA_ERROR_INVALID_IMAGE, "invalid image" },
+    { CUDA_ERROR_INVALID_CONTEXT, "invalid context" },
+    { CUDA_ERROR_CONTEXT_ALREADY_CURRENT, "context already current" },
+    { CUDA_ERROR_MAP_FAILED, "map error" },
+    { CUDA_ERROR_UNMAP_FAILED, "unmap error" },
+    { CUDA_ERROR_ARRAY_IS_MAPPED, "array is mapped" },
+    { CUDA_ERROR_ALREADY_MAPPED, "already mapped" },
+    { CUDA_ERROR_NO_BINARY_FOR_GPU, "no binary for gpu" },
+    { CUDA_ERROR_ALREADY_ACQUIRED, "already acquired" },
+    { CUDA_ERROR_NOT_MAPPED, "not mapped" },
+    { CUDA_ERROR_NOT_MAPPED_AS_ARRAY, "not mapped as array" },
+    { CUDA_ERROR_NOT_MAPPED_AS_POINTER, "not mapped as pointer" },
+    { CUDA_ERROR_ECC_UNCORRECTABLE, "ecc uncorrectable" },
+    { CUDA_ERROR_UNSUPPORTED_LIMIT, "unsupported limit" },
+    { CUDA_ERROR_CONTEXT_ALREADY_IN_USE, "context already in use" },
+    { CUDA_ERROR_PEER_ACCESS_UNSUPPORTED, "peer access unsupported" },
+    { CUDA_ERROR_INVALID_SOURCE, "invalid source" },
+    { CUDA_ERROR_FILE_NOT_FOUND, "file not found" },
+    { CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND,
+                                            "shared object symbol not found" },
+    { CUDA_ERROR_SHARED_OBJECT_INIT_FAILED, "shared object init error" },
+    { CUDA_ERROR_OPERATING_SYSTEM, "operating system" },
+    { CUDA_ERROR_INVALID_HANDLE, "invalid handle" },
+    { CUDA_ERROR_NOT_FOUND, "not found" },
+    { CUDA_ERROR_NOT_READY, "not ready" },
+    { CUDA_ERROR_LAUNCH_FAILED, "launch error" },
+    { CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES, "launch out of resources" },
+    { CUDA_ERROR_LAUNCH_TIMEOUT, "launch timeout" },
+    { CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING,
+                                            "launch incompatible texturing" },
+    { CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED, "peer access already enabled" },
+    { CUDA_ERROR_PEER_ACCESS_NOT_ENABLED, "peer access not enabled" },
+    { CUDA_ERROR_PRIMARY_CONTEXT_ACTIVE, "primary context active" },
+    { CUDA_ERROR_CONTEXT_IS_DESTROYED, "context is destroyed" },
+    { CUDA_ERROR_ASSERT, "assert" },
+    { CUDA_ERROR_TOO_MANY_PEERS, "too many peers" },
+    { CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED,
+                                            "host memory already registered" },
+    { CUDA_ERROR_HOST_MEMORY_NOT_REGISTERED, "host memory not registered" },
+    { CUDA_ERROR_NOT_PERMITTED, "not permitted" },
+    { CUDA_ERROR_NOT_SUPPORTED, "not supported" },
+    { CUDA_ERROR_UNKNOWN, "unknown" }
+};
+
+static char errmsg[128];
+
+static char *
+cuErrorMsg (CUresult r)
+{
+  int i;
+
+  for (i = 0; i < CUERRORS; i++)
+    {
+      if (cuErrorList[i].r == r)
+	return &cuErrorList[i].m[0];
+    }
+
+  sprintf (&errmsg[0], "unknown result code: %5d", r);
+
+  return &errmsg[0];
+}
+
+static bool PTX_inited = false;
+
+struct PTX_stream
+{
+  CUstream stream;
+  pthread_t host_thread;
+  bool multithreaded;
+
+  CUdeviceptr d;
+  void *h;
+  void *h_begin;
+  void *h_end;
+  void *h_next;
+  void *h_prev;
+  void *h_tail;
+
+  SLIST_ENTRY(PTX_stream) next;
+};
+
+SLIST_HEAD(PTX_streams, PTX_stream);
+
+/* Each thread may select a stream (also specific to a device/context).  */
+static __thread struct PTX_stream *current_stream;
+
+struct map
+{
+  int     async;
+  size_t  size;
+  char    mappings[0];
+};
+
+static void
+map_init (struct PTX_stream *s)
+{
+  CUresult r;
+
+  int size = getpagesize ();
+
+  assert (s);
+  assert (!s->d);
+  assert (!s->h);
+
+  r = cuMemAllocHost (&s->h, size);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuMemAllocHost error: %s", cuErrorMsg (r));
+
+  r = cuMemHostGetDevicePointer (&s->d, s->h, 0);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuMemHostGetDevicePointer error: %s", cuErrorMsg (r));
+
+  assert (s->h);
+
+  s->h_begin = s->h;
+  s->h_end = s->h_begin + size;
+  s->h_next = s->h_prev = s->h_tail = s->h_begin;
+
+  assert (s->h_next);
+  assert (s->h_end);
+}
+
+static void
+map_fini (struct PTX_stream *s)
+{
+  CUresult r;
+  
+  r = cuMemFreeHost (s->h);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuMemFreeHost error: %s", cuErrorMsg (r));
+}
+
+static void
+map_pop (struct PTX_stream *s)
+{
+  struct map *m;
+
+  assert (s != NULL);
+  assert (s->h_next);
+  assert (s->h_prev);
+  assert (s->h_tail);
+
+  m = s->h_tail;
+
+  s->h_tail += m->size;
+
+  if (s->h_tail >= s->h_end)
+    s->h_tail = s->h_begin + (int) (s->h_tail - s->h_end);
+
+  if (s->h_next == s->h_tail)
+    s->h_prev = s->h_next;
+
+  assert (s->h_next >= s->h_begin);
+  assert (s->h_tail >= s->h_begin);
+  assert (s->h_prev >= s->h_begin);
+
+  assert (s->h_next <= s->h_end);
+  assert (s->h_tail <= s->h_end);
+  assert (s->h_prev <= s->h_end);
+}
+
+static void
+map_push (struct PTX_stream *s, int async, size_t size, void **h, void **d)
+{
+  int left;
+  int offset;
+  struct map *m;
+
+  assert (s != NULL);
+
+  left = s->h_end - s->h_next;
+  size += sizeof (struct map);
+
+  assert (s->h_prev);
+  assert (s->h_next);
+
+  if (size >= left)
+    {
+      m = s->h_prev;
+      m->size += left;
+      s->h_next = s->h_begin;
+
+      if (s->h_next + size > s->h_end)
+	gomp_plugin_fatal ("unable to push map");
+    }
+
+  assert (s->h_next);
+
+  m = s->h_next;
+  m->async = async;
+  m->size = size;
+
+  offset = (void *)&m->mappings[0] - s->h;
+
+  *d = (void *)(s->d + offset);
+  *h = (void *)(s->h + offset);
+
+  s->h_prev = s->h_next;
+  s->h_next += size;
+
+  assert (s->h_prev);
+  assert (s->h_next);
+
+  assert (s->h_next >= s->h_begin);
+  assert (s->h_tail >= s->h_begin);
+  assert (s->h_prev >= s->h_begin);
+  assert (s->h_next <= s->h_end);
+  assert (s->h_tail <= s->h_end);
+  assert (s->h_prev <= s->h_end);
+
+  return;
+}
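map_push and map_pop treat the pinned, host-mapped page as a FIFO ring of variable-sized chunks: a push that does not fit before h_end folds the leftover gap into the previous chunk and wraps to h_begin, so that popping that chunk later skips the gap as well. The sketch below models that discipline on an ordinary host buffer, with hypothetical names (struct chunk, struct ring, ring_push, ring_pop) and no CUDA calls. Unlike the patch, it rounds chunk sizes so headers stay aligned, returns NULL instead of aborting, refuses to overwrite chunks that have not been popped yet, and restarts at the front whenever the ring is empty; treat it as an illustration under those assumptions, not a drop-in replacement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the patch's "struct map" header.  */
struct chunk
{
  size_t size;  /* Bytes occupied by this chunk, header included.  */
};

/* Hypothetical ring mirroring h_begin/h_end/h_next/h_prev/h_tail.  */
struct ring
{
  char *begin, *end;  /* Backing storage (must be size_t-aligned).  */
  char *next;         /* Where the next chunk will be placed.  */
  char *prev;         /* Most recently pushed chunk.  */
  char *tail;         /* Oldest chunk not yet popped.  */
  int live;           /* Chunks pushed but not yet popped.  */
};

static void
ring_init (struct ring *r, char *buf, size_t len)
{
  r->begin = buf;
  r->end = buf + len;
  r->next = r->prev = r->tail = buf;
  r->live = 0;
}

/* Reserve PAYLOAD bytes after a header; return the payload or NULL.  */
static void *
ring_push (struct ring *r, size_t payload)
{
  const size_t h = sizeof (struct chunk);
  size_t size = (payload + h + h - 1) / h * h;  /* Keep headers aligned.  */
  struct chunk *c;

  if (r->live == 0)
    r->next = r->prev = r->tail = r->begin;  /* Empty: restart at front.  */

  if (r->next >= r->tail)
    {
      /* Free space is [next, end) followed by [begin, tail).  */
      size_t left = (size_t) (r->end - r->next);
      if (size > left)
	{
	  /* Wrapping needs room strictly before TAIL, so NEXT never
	     catches up with TAIL while chunks are live.  */
	  if (size >= (size_t) (r->tail - r->begin))
	    return NULL;
	  /* Fold the unusable gap into the previous chunk, then wrap.  */
	  ((struct chunk *) r->prev)->size += left;
	  r->next = r->begin;
	}
    }
  else if (size >= (size_t) (r->tail - r->next))
    return NULL;  /* Already wrapped: free space is only [next, tail).  */

  c = (struct chunk *) r->next;
  c->size = size;
  r->prev = r->next;
  r->next += size;
  r->live++;
  return c + 1;
}

/* Release the oldest chunk (FIFO order, like map_pop).  */
static void
ring_pop (struct ring *r)
{
  struct chunk *c = (struct chunk *) r->tail;

  assert (r->live > 0);
  r->tail += c->size;
  if (r->tail >= r->end)
    r->tail = r->begin + (r->tail - r->end);
  r->live--;
}
```

In the plugin, one chunk is pushed per kernel launch in PTX_exec and released either right after a synchronous launch or from event_gc once the launch's event has completed. Because work on a single stream completes in submission order, releasing the oldest chunk first appears to be sufficient.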
+
+struct PTX_device
+{
+  CUcontext ctx;
+  bool ctx_shared;
+  CUdevice dev;
+  struct PTX_stream *null_stream;
+  /* All non-null streams associated with this device (actually context),
+     either created implicitly or passed in from the user (via
+     acc_set_cuda_stream).  */
+  struct PTX_streams active_streams;
+  struct {
+    struct PTX_stream **arr;
+    int size;
+  } async_streams;
+  /* A lock for use when manipulating the above stream list and array.  */
+  gomp_mutex_t stream_lock;
+  int ord;
+  bool overlap;
+  bool map;
+  bool concur;
+  int  mode;
+  bool mkern;
+  SLIST_ENTRY(PTX_device) next;
+};
+
+static __thread struct PTX_device *PTX_dev;
+static SLIST_HEAD(_PTX_devices, PTX_device) _PTX_devices;
+static struct _PTX_devices *PTX_devices;
+
+enum PTX_event_type
+{
+  PTX_EVT_MEM,
+  PTX_EVT_KNL,
+  PTX_EVT_SYNC
+};
+
+struct PTX_event
+{
+  CUevent *evt;
+  int type;
+  void *addr;
+  void *tgt;
+  int ord;
+  SLIST_ENTRY(PTX_event) next;
+};
+
+static gomp_mutex_t PTX_event_lock;
+static SLIST_HEAD(_PTX_events, PTX_event) _PTX_events;
+static struct _PTX_events *PTX_events;
+
+#define _XSTR(s) _STR(s)
+#define _STR(s) #s
+
+static struct _synames
+{
+  const char *n;
+} cuSymNames[] =
+{
+  { _XSTR(cuCtxCreate) },
+  { _XSTR(cuCtxDestroy) },
+  { _XSTR(cuCtxGetCurrent) },
+  { _XSTR(cuCtxPushCurrent) },
+  { _XSTR(cuCtxSynchronize) },
+  { _XSTR(cuDeviceGet) },
+  { _XSTR(cuDeviceGetAttribute) },
+  { _XSTR(cuDeviceGetCount) },
+  { _XSTR(cuEventCreate) },
+  { _XSTR(cuEventDestroy) },
+  { _XSTR(cuEventQuery) },
+  { _XSTR(cuEventRecord) },
+  { _XSTR(cuInit) },
+  { _XSTR(cuLaunchKernel) },
+  { _XSTR(cuLinkAddData) },
+  { _XSTR(cuLinkComplete) },
+  { _XSTR(cuLinkCreate) },
+  { _XSTR(cuLinkDestroy) },
+  { _XSTR(cuMemAlloc) },
+  { _XSTR(cuMemAllocHost) },
+  { _XSTR(cuMemcpy) },
+  { _XSTR(cuMemcpyDtoH) },
+  { _XSTR(cuMemcpyDtoHAsync) },
+  { _XSTR(cuMemcpyHtoD) },
+  { _XSTR(cuMemcpyHtoDAsync) },
+  { _XSTR(cuMemFree) },
+  { _XSTR(cuMemFreeHost) },
+  { _XSTR(cuMemGetAddressRange) },
+  { _XSTR(cuMemHostGetDevicePointer) },
+  { _XSTR(cuMemHostRegister) },
+  { _XSTR(cuMemHostUnregister) },
+  { _XSTR(cuModuleGetFunction) },
+  { _XSTR(cuModuleLoadData) },
+  { _XSTR(cuStreamCreate) },
+  { _XSTR(cuStreamDestroy) },
+  { _XSTR(cuStreamQuery) },
+  { _XSTR(cuStreamSynchronize) },
+  { _XSTR(cuStreamWaitEvent) }
+};
+
+/* Derived from the table, so it cannot drift out of sync with it.  */
+#define CUSYMS ((int) (sizeof (cuSymNames) / sizeof (cuSymNames[0])))
+
+static int
+verify_device_library (void)
+{
+  int i;
+  void *dh, *ds;
+
+  dh = dlopen ("libcuda.so", RTLD_LAZY);
+  if (!dh)
+    return -1;
+
+  for (i = 0; i < CUSYMS; i++)
+    {
+      ds = dlsym (dh, cuSymNames[i].n);
+      if (!ds)
+	{
+	  dlclose (dh);
+	  return -1;
+	}
+    }
+
+  dlclose (dh);
+
+  return 0;
+}
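cuSymNames is filled in with the two-level _XSTR/_STR macros rather than with _STR alone, so each entry is stringified after macro expansion. A likely reason (an inference; the patch does not say so) is that cuda.h of this era #defines several driver entry points, such as cuCtxCreate, to versioned symbols like cuCtxCreate_v2, and it is the versioned name that dlsym has to find in libcuda.so. The sketch shows the difference using hypothetical names (XSTR, STR, myCtxCreate):

```c
#include <assert.h>
#include <string.h>

#define XSTR(s) STR(s)  /* Expand S first, then stringify it.  */
#define STR(s) #s       /* Stringify S exactly as written.  */

/* Hypothetical stand-in for a header that remaps an entry point to a
   versioned symbol.  */
#define myCtxCreate myCtxCreate_v2

static const char *const as_written = STR (myCtxCreate);
static const char *const as_linked = XSTR (myCtxCreate);
```

With the single-level macro, the lookup would resolve the unversioned name, which is not necessarily the entry point the rest of the plugin is compiled against.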
+
+static void
+init_streams_for_device (struct PTX_device *ptx_dev, int concurrency)
+{
+  int i;
+  struct PTX_stream *null_stream
+    = gomp_plugin_malloc (sizeof (struct PTX_stream));
+
+  null_stream->stream = NULL;
+  null_stream->host_thread = pthread_self ();
+  null_stream->multithreaded = true;
+  null_stream->d = (CUdeviceptr) NULL;
+  null_stream->h = NULL;
+  map_init (null_stream);
+  ptx_dev->null_stream = null_stream;
+  
+  SLIST_INIT (&ptx_dev->active_streams);
+  gomp_plugin_mutex_init (&ptx_dev->stream_lock);
+  
+  if (concurrency < 1)
+    concurrency = 1;
+  
+  /* This is just a guess -- make space for as many async streams as the
+     current device is capable of concurrently executing.  This can grow
+     later as necessary.  No streams are created yet.  */
+  ptx_dev->async_streams.arr
+    = gomp_plugin_malloc (concurrency * sizeof (struct PTX_stream *));
+  ptx_dev->async_streams.size = concurrency;
+  
+  for (i = 0; i < concurrency; i++)
+    ptx_dev->async_streams.arr[i] = NULL;
+}
+
+static void
+fini_streams_for_device (struct PTX_device *ptx_dev)
+{
+  struct PTX_stream *s;
+  free (ptx_dev->async_streams.arr);
+  
+  while (!SLIST_EMPTY (&ptx_dev->active_streams))
+    {
+      s = SLIST_FIRST (&ptx_dev->active_streams);
+      SLIST_REMOVE_HEAD (&ptx_dev->active_streams, next);
+      cuStreamDestroy (s->stream);
+      map_fini (s);
+      free (s);
+    }
+  
+  map_fini (ptx_dev->null_stream);
+  free (ptx_dev->null_stream);
+}
+
+/* Select a stream for the (OpenACC-semantics) ASYNC argument, for thread
+   THREAD and the current device/context.  If CREATE is true, create the
+   stream if it does not exist (or use EXISTING if it is non-NULL), and
+   associate the stream with THREAD.  Return the selected stream; if CREATE is
+   false and no stream exists yet for ASYNC, return NULL.  */
+
+static struct PTX_stream *
+select_stream_for_async (int async, pthread_t thread, bool create,
+			 CUstream existing)
+{
+  /* Local copy of TLS variable.  */
+  struct PTX_device *ptx_dev = PTX_dev;
+  struct PTX_stream *stream = NULL;
+  int orig_async = async;
+  
+  /* The special value acc_async_noval (-1) maps (for now) to an
+     implicitly-created stream, which is then handled the same as any other
+     numbered async stream.  Other options are available, e.g. using the null
+     stream for anonymous async operations, or choosing an idle stream from an
+     active set.  But, stick with this for now.  */
+  if (async > acc_async_sync)
+    async++;
+  
+  if (create)
+    gomp_plugin_mutex_lock (&ptx_dev->stream_lock);
+
+  /* NOTE: AFAICT there's no particular need for acc_async_sync to map to the
+     null stream, and in fact better performance may be obtainable if it doesn't
+     (because the null stream enforces overly-strict synchronisation with
+     respect to other streams for legacy reasons, and that's probably not
+     needed with OpenACC).  Maybe investigate later.  */
+  if (async == acc_async_sync)
+    stream = ptx_dev->null_stream;
+  else if (async >= 0 && async < ptx_dev->async_streams.size
+	   && ptx_dev->async_streams.arr[async] && !(create && existing))
+    stream = ptx_dev->async_streams.arr[async];
+  else if (async >= 0 && create)
+    {
+      if (async >= ptx_dev->async_streams.size)
+        {
+	  int i, newsize = ptx_dev->async_streams.size * 2;
+	  
+	  if (async >= newsize)
+	    newsize = async + 1;
+	  
+	  ptx_dev->async_streams.arr
+	    = gomp_plugin_realloc (ptx_dev->async_streams.arr,
+				   newsize * sizeof (struct PTX_stream *));
+	  
+	  for (i = ptx_dev->async_streams.size; i < newsize; i++)
+	    ptx_dev->async_streams.arr[i] = NULL;
+	  
+	  ptx_dev->async_streams.size = newsize;
+	}
+
+      /* Create a new stream on-demand if there isn't one already, or if we're
+	 setting a particular async value to an existing (externally-provided)
+	 stream.  */
+      if (!ptx_dev->async_streams.arr[async] || existing)
+        {
+	  CUresult r;
+	  struct PTX_stream *s
+	    = gomp_plugin_malloc (sizeof (struct PTX_stream));
+
+	  if (existing)
+	    s->stream = existing;
+	  else
+	    {
+	      r = cuStreamCreate (&s->stream, CU_STREAM_DEFAULT);
+	      if (r != CUDA_SUCCESS)
+		gomp_plugin_fatal ("cuStreamCreate error: %s", cuErrorMsg (r));
+	    }
+	  
+	  /* If CREATE is true, we're going to be queueing some work on this
+	     stream.  Associate it with the current host thread.  */
+	  s->host_thread = thread;
+	  s->multithreaded = false;
+	  
+	  s->d = (CUdeviceptr) NULL;
+	  s->h = NULL;
+	  map_init (s);
+	  
+	  SLIST_INSERT_HEAD (&ptx_dev->active_streams, s, next);
+	  ptx_dev->async_streams.arr[async] = s;
+	}
+
+      stream = ptx_dev->async_streams.arr[async];
+    }
+  else if (async < 0)
+    gomp_plugin_fatal ("bad async %d", async);
+
+  if (create)
+    {
+      assert (stream != NULL);
+
+      /* If we're trying to use the same stream from different threads
+	 simultaneously, set stream->multithreaded to true.  This affects the
+	 behaviour of acc_async_test_all and acc_wait_all, which are supposed to
+	 only wait for asynchronous launches from the same host thread they are
+	 invoked on.  If multiple threads use the same async value, we make note
+	 of that here and fall back to testing/waiting for all threads in those
+	 functions.  */
+      if (thread != stream->host_thread)
+        stream->multithreaded = true;
+
+      gomp_plugin_mutex_unlock (&ptx_dev->stream_lock);
+    }
+  else if (stream && !stream->multithreaded
+	   && !pthread_equal (stream->host_thread, thread))
+    gomp_plugin_fatal ("async %d used on wrong thread", orig_async);
+
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s using stream %p (CUDA stream %p) "
+	   "for async %d\n", __FILE__, __FUNCTION__, stream,
+	   stream ? stream->stream : NULL, orig_async);
+#endif
+
+  return stream;
+}
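select_stream_for_async shifts every async value above acc_async_sync up by one, so that acc_async_noval gets its own slot 0 in async_streams.arr, and grows that array by doubling it (or to exactly the requested index plus one, if doubling is not enough). The arithmetic in isolation, assuming acc_async_noval == -1 (as the comment in the function states) and acc_async_sync == -2 (an assumption here); the helper names are hypothetical:

```c
#include <assert.h>

/* Assumed values of the openacc.h constants.  */
enum { ACC_ASYNC_SYNC = -2, ACC_ASYNC_NOVAL = -1 };

/* Map an OpenACC async argument to an index into async_streams.arr.
   ACC_ASYNC_SYNC (null stream) and anything below it pass through.  */
static int
async_slot (int async)
{
  return async > ACC_ASYNC_SYNC ? async + 1 : async;
}

/* Capacity needed so that SLOT fits, starting from CUR entries:
   unchanged if it already fits, else doubled, else exactly SLOT + 1.  */
static int
grown_capacity (int cur, int slot)
{
  int newsize;

  if (slot < cur)
    return cur;
  newsize = cur * 2;
  if (slot >= newsize)
    newsize = slot + 1;
  return newsize;
}
```

Values below acc_async_sync are left unchanged by the shift and end up in the "bad async" check.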
+
+static int PTX_get_num_devices (void);
+
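+/* Initialize the CUDA driver API (only once) and return the number of
+   devices, or -1 if libcuda.so or a required symbol cannot be found.  */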
+static int
+PTX_init (void)
+{
+  CUresult r;
+  int rc;
+
+  if (PTX_inited)
+    return PTX_get_num_devices ();
+
+  rc = verify_device_library ();
+  if (rc < 0)
+    return -1;
+
+  r = cuInit (0);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuInit error: %s", cuErrorMsg (r));
+
+  PTX_devices = &_PTX_devices;
+  PTX_events = &_PTX_events;
+
+  SLIST_INIT(PTX_devices);
+  SLIST_INIT(PTX_events);
+
+  gomp_plugin_mutex_init (&PTX_event_lock);
+
+  PTX_inited = true;
+
+  return PTX_get_num_devices ();
+}
+
+static int
+PTX_fini (void)
+{
+  PTX_inited = false;
+
+  return 0;
+}
+
+static void *
+PTX_open_device (int n)
+{
+  CUdevice dev;
+  CUresult r;
+  int async_engines, pi;
+
+  if (PTX_devices)
+    {
+      struct PTX_device *ptx_device;
+
+      SLIST_FOREACH(ptx_device, PTX_devices, next)
+        {
+          if (ptx_device->ord == n)
+            {
+              PTX_dev = ptx_device;
+
+              if (PTX_dev->ctx)
+                {
+                  r = cuCtxPushCurrent (PTX_dev->ctx);
+                  if (r != CUDA_SUCCESS)
+                    gomp_plugin_fatal ("cuCtxPushCurrent error: %s",
+				       cuErrorMsg (r));
+                }
+
+              return (void *)PTX_dev;
+            }
+        }
+    }
+
+  r = cuDeviceGet (&dev, n);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuDeviceGet error: %s", cuErrorMsg (r));
+
+  PTX_dev = gomp_plugin_malloc (sizeof (struct PTX_device));
+  PTX_dev->ord = n;
+  PTX_dev->dev = dev;
+  PTX_dev->ctx_shared = false;
+
+  SLIST_INSERT_HEAD(PTX_devices, PTX_dev, next);
+
+  r = cuCtxGetCurrent (&PTX_dev->ctx);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuCtxGetCurrent error: %s", cuErrorMsg (r));
+
+  if (!PTX_dev->ctx)
+    {
+      r = cuCtxCreate (&PTX_dev->ctx, CU_CTX_SCHED_AUTO, dev);
+      if (r != CUDA_SUCCESS)
+	gomp_plugin_fatal ("cuCtxCreate error: %s", cuErrorMsg (r));
+    }
+  else
+    {
+      PTX_dev->ctx_shared = true;
+    }
+   
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_GPU_OVERLAP, dev);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuDeviceGetAttribute error: %s", cuErrorMsg (r));
+
+  PTX_dev->overlap = pi;
+
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_CAN_MAP_HOST_MEMORY, dev);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuDeviceGetAttribute error: %s", cuErrorMsg (r));
+
+  PTX_dev->map = pi;
+
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_CONCURRENT_KERNELS, dev);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuDeviceGetAttribute error: %s", cuErrorMsg (r));
+
+  PTX_dev->concur = pi;
+
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_COMPUTE_MODE, dev);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuDeviceGetAttribute error: %s", cuErrorMsg (r));
+
+  PTX_dev->mode = pi;
+
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_INTEGRATED, dev);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuDeviceGetAttribute error: %s", cuErrorMsg (r));
+
+  PTX_dev->mkern = pi;
+
+  r = cuDeviceGetAttribute (&async_engines,
+			    CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT, dev);
+  if (r != CUDA_SUCCESS)
+    async_engines = 1;
+
+  init_streams_for_device (PTX_dev, async_engines);
+
+  current_stream = PTX_dev->null_stream;
+
+  return (void *)PTX_dev;
+}
+
+static int
+PTX_close_device (void *h __attribute__((unused)))
+{
+  CUresult r;
+
+  if (!PTX_dev)
+    return 0;
+  
+  fini_streams_for_device (PTX_dev);
+
+  if (!PTX_dev->ctx_shared)
+    {
+      r = cuCtxDestroy (PTX_dev->ctx);
+      if (r != CUDA_SUCCESS)
+	gomp_plugin_fatal ("cuCtxDestroy error: %s", cuErrorMsg (r));
+    }
+
+  SLIST_REMOVE(PTX_devices, PTX_dev, PTX_device, next);
+  free (PTX_dev);
+
+  PTX_dev = NULL;
+
+  return 0;
+}
+
+static int
+PTX_get_num_devices (void)
+{
+  int n;
+  CUresult r;
+
+  assert (PTX_inited);
+
+  r = cuDeviceGetCount (&n);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuDeviceGetCount error: %s", cuErrorMsg (r));
+
+  return n;
+}
+
+static bool
+PTX_avail (void)
+{
+  bool avail = false;
+
+  if (PTX_init () > 0)
+    avail = true;
+
+  return avail;
+}
+
+#define ABORT_PTX				\
+  ".version 3.1\n"				\
+  ".target sm_30\n"				\
+  ".address_size 64\n"				\
+  ".visible .func abort;\n"			\
+  ".visible .func abort\n"			\
+  "{\n"						\
+  "trap;\n"					\
+  "ret;\n"					\
+  "}\n"						\
+  ".visible .func _gfortran_abort;\n"		\
+  ".visible .func _gfortran_abort\n"		\
+  "{\n"						\
+  "trap;\n"					\
+  "ret;\n"					\
+  "}\n"
+
+/* Generated with:
+
+   $ echo 'int acc_on_device(int d) { return __builtin_acc_on_device(d); } int acc_on_device_(int *d) { return acc_on_device(*d); }' | accel-gcc/xgcc -Baccel-gcc -x c - -o - -S -m64 -O3 -fno-builtin-acc_on_device -fno-inline
+*/
+#define ACC_ON_DEVICE_PTX						\
+  "        .version        3.1\n"					\
+  "        .target sm_30\n"						\
+  "        .address_size 64\n"						\
+  ".visible .func (.param.u32 %out_retval)acc_on_device(.param.u32 %in_ar1);\n" \
+  ".visible .func (.param.u32 %out_retval)acc_on_device(.param.u32 %in_ar1)\n" \
+  "{\n"									\
+  "        .reg.u32 %ar1;\n"						\
+  ".reg.u32 %retval;\n"							\
+  "        .reg.u64 %hr10;\n"						\
+  "        .reg.u32 %r24;\n"						\
+  "        .reg.u32 %r25;\n"						\
+  "        .reg.pred %r27;\n"						\
+  "        .reg.u32 %r30;\n"						\
+  "        ld.param.u32 %ar1, [%in_ar1];\n"				\
+  "                mov.u32 %r24, %ar1;\n"				\
+  "                setp.ne.u32 %r27,%r24,4;\n"				\
+  "                set.u32.eq.u32 %r30,%r24,5;\n"			\
+  "                neg.s32 %r25, %r30;\n"				\
+  "        @%r27   bra     $L3;\n"					\
+  "                mov.u32 %r25, 1;\n"					\
+  "$L3:\n"								\
+  "                mov.u32 %retval, %r25;\n"				\
+  "        st.param.u32    [%out_retval], %retval;\n"			\
+  "        ret;\n"							\
+  "        }\n"								\
+  ".visible .func (.param.u32 %out_retval)acc_on_device_(.param.u64 %in_ar1);\n" \
+  ".visible .func (.param.u32 %out_retval)acc_on_device_(.param.u64 %in_ar1)\n" \
+  "{\n"									\
+  "        .reg.u64 %ar1;\n"						\
+  ".reg.u32 %retval;\n"							\
+  "        .reg.u64 %hr10;\n"						\
+  "        .reg.u64 %r25;\n"						\
+  "        .reg.u32 %r26;\n"						\
+  "        .reg.u32 %r27;\n"						\
+  "        ld.param.u64 %ar1, [%in_ar1];\n"				\
+  "                mov.u64 %r25, %ar1;\n"				\
+  "                ld.u32  %r26, [%r25];\n"				\
+  "        {\n"								\
+  "                .param.u32 %retval_in;\n"				\
+  "        {\n"								\
+  "                .param.u32 %out_arg0;\n"				\
+  "                st.param.u32 [%out_arg0], %r26;\n"			\
+  "                call (%retval_in), acc_on_device, (%out_arg0);\n"	\
+  "        }\n"								\
+  "                ld.param.u32    %r27, [%retval_in];\n"		\
+  "}\n"									\
+  "                mov.u32 %retval, %r27;\n"				\
+  "        st.param.u32    [%out_retval], %retval;\n"			\
+  "        ret;\n"							\
+  "        }"
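Read as C, the acc_on_device PTX above returns 1 for device types 4 and 5 and 0 otherwise: setp.ne sets the branch predicate when the argument is not 4, and set.u32.eq followed by neg.s32 yields 1 exactly when it is 5. Assuming the device-type numbering of libgomp's openacc.h at the time (4 = acc_device_not_host, 5 = acc_device_nvidia, 2 = acc_device_host), a C equivalent, with hypothetical function names, is:

```c
#include <assert.h>

/* Assumed device-type values from libgomp's openacc.h of this period.  */
enum
{
  ACC_DEVICE_HOST = 2,
  ACC_DEVICE_NOT_HOST = 4,
  ACC_DEVICE_NVIDIA = 5
};

/* What the hand-written PTX appears to compute on the device.  */
static int
acc_on_device_ptx_equiv (int d)
{
  return d == ACC_DEVICE_NOT_HOST || d == ACC_DEVICE_NVIDIA;
}

/* Fortran binding: same result, argument passed by reference.  */
static int
acc_on_device_ptx_equiv_ (const int *d)
{
  return acc_on_device_ptx_equiv (*d);
}
```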
+
+static void
+link_ptx (CUmodule *module, char *ptx_code)
+{
+  CUjit_option opts[7];
+  void *optvals[7];
+  float elapsed = 0.0;
+#define LOGSIZE 8192
+  char elog[LOGSIZE];
+  char ilog[LOGSIZE];
+  unsigned long logsize = LOGSIZE;
+  CUlinkState linkstate;
+  CUresult r;
+  void *linkout;
+  size_t linkoutsize __attribute__((unused));
+
+  gomp_plugin_notify ("attempting to load:\n---\n%s\n---\n", ptx_code);
+
+  opts[0] = CU_JIT_WALL_TIME;
+  optvals[0] = &elapsed;
+
+  opts[1] = CU_JIT_INFO_LOG_BUFFER;
+  optvals[1] = &ilog[0];
+
+  opts[2] = CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES;
+  optvals[2] = (void *) logsize;
+
+  opts[3] = CU_JIT_ERROR_LOG_BUFFER;
+  optvals[3] = &elog[0];
+
+  opts[4] = CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES;
+  optvals[4] = (void *) logsize;
+
+  opts[5] = CU_JIT_LOG_VERBOSE;
+  optvals[5] = (void *) 1;
+
+  opts[6] = CU_JIT_TARGET;
+  optvals[6] = (void *) CU_TARGET_COMPUTE_30;
+
+  r = cuLinkCreate (7, opts, optvals, &linkstate);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuLinkCreate error: %s", cuErrorMsg (r));
+
+  char *abort_ptx = ABORT_PTX;
+  r = cuLinkAddData (linkstate, CU_JIT_INPUT_PTX, abort_ptx,
+		     strlen (abort_ptx) + 1, 0, 0, 0, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      gomp_plugin_error ("Link error log %s\n", &elog[0]);
+      gomp_plugin_fatal ("cuLinkAddData (abort) error: %s", cuErrorMsg (r));
+    }
+
+  char *acc_on_device_ptx = ACC_ON_DEVICE_PTX;
+  r = cuLinkAddData (linkstate, CU_JIT_INPUT_PTX, acc_on_device_ptx,
+		     strlen (acc_on_device_ptx) + 1, 0, 0, 0, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      gomp_plugin_error ("Link error log %s\n", &elog[0]);
+      gomp_plugin_fatal ("cuLinkAddData (acc_on_device) error: %s",
+			 cuErrorMsg (r));
+    }
+
+  r = cuLinkAddData (linkstate, CU_JIT_INPUT_PTX, ptx_code,
+		     strlen (ptx_code) + 1, 0, 0, 0, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      gomp_plugin_error ("Link error log %s\n", &elog[0]);
+      gomp_plugin_fatal ("cuLinkAddData (ptx_code) error: %s", cuErrorMsg (r));
+    }
+
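+  r = cuLinkComplete (linkstate, &linkout, &linkoutsize);
+  if (r != CUDA_SUCCESS)
+    {
+      gomp_plugin_error ("Link error log %s\n", &elog[0]);
+      gomp_plugin_fatal ("cuLinkComplete error: %s", cuErrorMsg (r));
+    }
+
+  gomp_plugin_notify ("Link complete: %fms\n", elapsed);
+  gomp_plugin_notify ("Link log %s\n", &ilog[0]);
+
+  r = cuModuleLoadData (module, linkout);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuModuleLoadData error: %s", cuErrorMsg (r));
+
+  /* LINKOUT is owned by LINKSTATE, so destroy the latter only after the
+     module has been loaded from it.  */
+  r = cuLinkDestroy (linkstate);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuLinkDestroy error: %s", cuErrorMsg (r));
+}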
+
+static void
+event_gc (void)
+{
+  struct PTX_event *ptx_event, *next;
+
+  gomp_plugin_mutex_lock (&PTX_event_lock);
+
+  for (ptx_event = SLIST_FIRST (PTX_events); ptx_event; ptx_event = next)
+    {
+      CUresult r;
+      CUevent *te;
+
+      /* Fetch the successor first: PTX_EVENT may be freed below, and every
+	 path through the loop body must advance to the next element.  */
+      next = SLIST_NEXT (ptx_event, next);
+
+      if (ptx_event->ord != PTX_dev->ord)
+	continue;
+
+      r = cuEventQuery (*ptx_event->evt);
+      if (r != CUDA_SUCCESS)
+	continue;
+
+      te = ptx_event->evt;
+
+      switch (ptx_event->type)
+	{
+	case PTX_EVT_MEM:
+	case PTX_EVT_SYNC:
+	  break;
+
+	case PTX_EVT_KNL:
+	  map_pop (ptx_event->addr);
+	  if (ptx_event->tgt)
+	    gomp_plugin_async_unmap_vars (ptx_event->tgt);
+	  break;
+	}
+
+      cuEventDestroy (*te);
+      free ((void *)te);
+
+      SLIST_REMOVE (PTX_events, ptx_event, PTX_event, next);
+
+      free (ptx_event);
+    }
+
+  gomp_plugin_mutex_unlock (&PTX_event_lock);
+}
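event_gc has to advance to the next list element on every path, including when an event belongs to another device or has not completed yet; otherwise the sweep never terminates. A self-contained sketch of the same sweep over a hypothetical, simplified event list (struct evt, gc_events and push_evt are illustrative names), using a pointer-to-link cursor so that each removal is O(1) rather than the O(n) search SLIST_REMOVE performs:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified version of struct PTX_event.  */
struct evt
{
  int ord;           /* Device ordinal the event belongs to.  */
  int done;          /* Stand-in for cuEventQuery () == CUDA_SUCCESS.  */
  struct evt *next;
};

/* Unlink and free every completed event of device ORD.  Every path
   advances the cursor, so the loop always terminates.  Returns the
   number of events freed.  */
static int
gc_events (struct evt **head, int ord)
{
  struct evt **link = head;
  int freed = 0;

  while (*link)
    {
      struct evt *e = *link;

      if (e->ord == ord && e->done)
	{
	  *link = e->next;  /* Unlink without searching for the parent.  */
	  free (e);
	  freed++;
	}
      else
	link = &e->next;
    }
  return freed;
}

/* Prepend a new event, like SLIST_INSERT_HEAD.  */
static struct evt *
push_evt (struct evt *head, int ord, int done)
{
  struct evt *e = malloc (sizeof *e);

  if (!e)
    abort ();
  e->ord = ord;
  e->done = done;
  e->next = head;
  return e;
}
```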
+
+static void
+event_add (enum PTX_event_type type, CUevent *e, void *h, void *tgt)
+{
+  struct PTX_event *ptx_event;
+
+  assert (type == PTX_EVT_MEM || type == PTX_EVT_KNL || type == PTX_EVT_SYNC);
+
+  ptx_event = gomp_plugin_malloc (sizeof (struct PTX_event));
+  ptx_event->type = type;
+  ptx_event->evt = e;
+  ptx_event->addr = h;
+  ptx_event->tgt = tgt;
+  ptx_event->ord = PTX_dev->ord;
+
+  gomp_plugin_mutex_lock (&PTX_event_lock);
+
+  SLIST_INSERT_HEAD(PTX_events, ptx_event, next);
+
+  gomp_plugin_mutex_unlock (&PTX_event_lock);
+}
+
+static void **kernel_target_data;
+static void **kernel_host_table;
+
+void
+PTX_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
+	  size_t *sizes, unsigned short *kinds, int num_gangs, int num_workers,
+	  int vector_length, int async, void *targ_mem_desc)
+{
+  CUfunction function;
+  CUmodule module;
+  CUresult r;
+  char *kernel_name = NULL;
+  char **fn_names;
+  int i;
+  int fn_entries;
+  struct PTX_stream *dev_str;
+  void **fn_table;
+  void *kargs[1];
+  void *hp, *dp;
+  unsigned int nthreads_in_block;
+
+  if (kernel_target_data == NULL)
+    gomp_plugin_fatal ("Could not find module with kernel functions\n"
+		       "Perhaps -fopenacc was used without -flto ?");
+
+  link_ptx (&module, kernel_target_data[0]);
+
+  /* kernel_target_data[0] -> ptx code
+     kernel_target_data[1] -> variable mappings
+     kernel_target_data[2] -> array of kernel names in ascii
+
+     kernel_host_table[0] -> start of function addresses (_omp_func_table)
+     kernel_host_table[1] -> end of function addresses (_omp_funcs_end)
+
+     The array of kernel names and the array of function addresses correspond
+     one-to-one.  */
+
+  fn_table = kernel_host_table[0];
+  fn_names = (char **) kernel_target_data[2];
+  fn_entries = (kernel_host_table[1] - kernel_host_table[0]) / sizeof (void *);
+
+  for (i = 0; i < fn_entries; i++)
+    {
+      if (fn_table[i] == fn)
+	{
+	  kernel_name = fn_names[i];
+	  break;
+	}
+    }
+
+  if (!kernel_name)
+    gomp_plugin_fatal ("Could not find kernel name matching function %p", fn);
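PTX_exec recovers a kernel's PTX entry name by finding the host function address in one table and taking the name at the same index in a parallel table. In the patch the entry count is obtained by dividing a byte difference by sizeof (void *), which relies on GNU C's void * arithmetic; with typed pointers, plain pointer subtraction already yields the element count. A hypothetical standalone version of the lookup (kernel_name_for is not part of the plugin):

```c
#include <assert.h>
#include <stddef.h>

/* Return the name paired with FN in two parallel tables, or NULL.
   TABLE_BEGIN/TABLE_END bound the function-address table; NAMES has
   one entry per address.  */
static const char *
kernel_name_for (void *const *table_begin, void *const *table_end,
		 const char *const *names, const void *fn)
{
  size_t n = (size_t) (table_end - table_begin);  /* Entries, not bytes.  */
  size_t i;

  for (i = 0; i < n; i++)
    if (table_begin[i] == fn)
      return names[i];
  return NULL;
}
```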
+
+  r = cuModuleGetFunction (&function, module, kernel_name);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuModuleGetFunction error: %s", cuErrorMsg (r));
+  
+  dev_str = select_stream_for_async (async, pthread_self (), false, NULL);
+  assert (dev_str == current_stream);
+
+  /* This reserves a chunk of a pre-allocated page of memory mapped on both
+     the host and the device. HP is a host pointer to the new chunk, and DP is
+     the corresponding device pointer.  */
+  map_push (dev_str, async, mapnum * sizeof (void *), &hp, &dp);
+
+  gomp_plugin_notify ("  %s: prepare mappings\n", __FUNCTION__);
+
+  /* Copy the array of arguments to the mapped page.  */
+  for (i = 0; i < mapnum; i++)
+    ((void **) hp)[i] = devaddrs[i];
+
+  /* Copy the (device) pointers to arguments to the device (dp and hp might in
+     fact have the same value on a unified-memory system).  */
+  r = cuMemcpy ((CUdeviceptr)dp, (CUdeviceptr)hp, mapnum * sizeof (void *));
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuMemcpy failed: %s", cuErrorMsg (r));
+
+  gomp_plugin_notify ("  %s: kernel %s: launch\n", __FUNCTION__, kernel_name);
+
+  // XXX: possible geometry mappings??
+  //
+  // OpenACC		CUDA
+  //
+  // num_gangs		blocks
+  // num_workers	warps (where a warp is equivalent to 32 threads)
+  // vector length	threads
+  //
+
+  /* The OpenACC vector_length clause "determines the vector length to use for
+     vector or SIMD operations".  The question is how to map this to CUDA.
+
+     In CUDA, the warp size is effectively the vector length of the device.
+     However, kernel launches are expressed in threads rather than warps, and
+     the limit that applies here is the maximum number of threads per block
+     (a device attribute, and a multiple of the warp size).
+
+     We choose to map the OpenACC vector_length directly onto the number of
+     threads in a block, in the x dimension.  This is reflected in GCC code
+     generation, which uses threadIdx.x to access vector elements.
+
+     Attempting to use an OpenACC vector_length greater than the maximum
+     number of threads per block will result in a CUDA error.  */
+  nthreads_in_block = vector_length;
+
+  kargs[0] = &dp;
+  r = cuLaunchKernel (function,
+			1, 1, 1,
+			nthreads_in_block, 1, 1,
+			0, dev_str->stream, kargs, 0);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuLaunchKernel error: %s", cuErrorMsg (r));
+
+#ifndef DISABLE_ASYNC
+  if (async < acc_async_noval)
+    {
+      r = cuStreamSynchronize (dev_str->stream);
+      if (r != CUDA_SUCCESS)
+        gomp_plugin_fatal ("cuStreamSynchronize error: %s", cuErrorMsg (r));
+    }
+  else
+    {
+      CUevent *e;
+
+      e = (CUevent *)gomp_plugin_malloc (sizeof (CUevent));
+
+      r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+      if (r != CUDA_SUCCESS)
+        gomp_plugin_fatal ("cuEventCreate error: %s", cuErrorMsg (r));
+
+      event_gc ();
+
+      r = cuEventRecord (*e, dev_str->stream);
+      if (r != CUDA_SUCCESS)
+        gomp_plugin_fatal ("cuEventRecord error: %s", cuErrorMsg (r));
+
+      event_add (PTX_EVT_KNL, e, (void *)dev_str, targ_mem_desc);
+    }
+#else
+  r = cuCtxSynchronize ();
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuCtxSynchronize error: %s", cuErrorMsg (r));
+#endif
+
+  gomp_plugin_notify ("  %s: kernel %s: finished\n", __FUNCTION__,
+		      kernel_name);
+
+#ifndef DISABLE_ASYNC
+  if (async < acc_async_noval)
+#endif
+    map_pop (dev_str);
+}
+
+void *openacc_get_current_cuda_context (void);
+
+static void *
+PTX_alloc (size_t s)
+{
+  CUdeviceptr d;
+  CUresult r;
+
+  r = cuMemAlloc (&d, s);
+  if (r == CUDA_ERROR_OUT_OF_MEMORY)
+    return 0;
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuMemAlloc error: %s", cuErrorMsg (r));
+  return (void *)d;
+}
+
+static void
+PTX_free (void *p)
+{
+  CUresult r;
+  CUdeviceptr pb;
+  size_t ps;
+
+  r = cuMemGetAddressRange (&pb, &ps, (CUdeviceptr)p);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuMemGetAddressRange error: %s", cuErrorMsg (r));
+
+  if ((CUdeviceptr)p != pb)
+    gomp_plugin_fatal ("invalid device address");
+
+  r = cuMemFree ((CUdeviceptr)p);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuMemFree error: %s", cuErrorMsg (r));
+}
+
+static void *
+PTX_host2dev (void *d, const void *h, size_t s)
+{
+  CUresult r;
+  CUdeviceptr pb;
+  size_t ps;
+
+  if (!s)
+    return 0;
+
+  if (!d)
+    gomp_plugin_fatal ("invalid device address");
+
+  r = cuMemGetAddressRange (&pb, &ps, (CUdeviceptr)d);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuMemGetAddressRange error: %s", cuErrorMsg (r));
+
+  if (!pb)
+    gomp_plugin_fatal ("invalid device address");
+
+  if (!h)
+    gomp_plugin_fatal ("invalid host address");
+
+  if (d == h)
+    gomp_plugin_fatal ("invalid host or device address");
+
+  if ((void *)(d + s) > (void *)(pb + ps))
+    gomp_plugin_fatal ("invalid size");
+
+#ifndef DISABLE_ASYNC
+  if (current_stream != PTX_dev->null_stream)
+    {
+      CUevent *e;
+
+      e = (CUevent *)gomp_plugin_malloc (sizeof (CUevent));
+
+      r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+      if (r != CUDA_SUCCESS)
+        gomp_plugin_fatal ("cuEventCreate error: %s", cuErrorMsg (r));
+
+      event_gc ();
+
+      r = cuMemcpyHtoDAsync ((CUdeviceptr)d, h, s, current_stream->stream);
+      if (r != CUDA_SUCCESS)
+        gomp_plugin_fatal ("cuMemcpyHtoDAsync error: %s", cuErrorMsg (r));
+
+      r = cuEventRecord (*e, current_stream->stream);
+      if (r != CUDA_SUCCESS)
+        gomp_plugin_fatal ("cuEventRecord error: %s", cuErrorMsg (r));
+
+      event_add (PTX_EVT_MEM, e, (void *)h, NULL);
+    }
+  else
+#endif
+    {
+      r = cuMemcpyHtoD ((CUdeviceptr)d, h, s);
+      if (r != CUDA_SUCCESS)
+        gomp_plugin_fatal ("cuMemcpyHtoD error: %s", cuErrorMsg (r));
+    }
+
+  return 0;
+}
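PTX_host2dev (and likewise PTX_dev2host) asks cuMemGetAddressRange for the allocation containing d and then rejects transfers whose end, d + s, lies past the end of that allocation; the lower bound holds because the returned range contains d. The same containment test on plain integers, written so that it cannot overflow, might look like this (range_within is a hypothetical helper, not part of the plugin):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if [ADDR, ADDR + LEN) lies inside [BASE, BASE + SIZE).
   Compares offsets instead of forming ADDR + LEN, so it cannot wrap.  */
static int
range_within (uintptr_t base, size_t size, uintptr_t addr, size_t len)
{
  if (addr < base || addr - base > size)
    return 0;
  return len <= size - (addr - base);
}
```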
+
+static void *
+PTX_dev2host (void *h, const void *d, size_t s)
+{
+  CUresult r;
+  CUdeviceptr pb;
+  size_t ps;
+
+  if (!s)
+    return 0;
+
+  if (!d)
+    gomp_plugin_fatal ("invalid device address");
+
+  r = cuMemGetAddressRange (&pb, &ps, (CUdeviceptr)d);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuMemGetAddressRange error: %s", cuErrorMsg (r));
+
+  if (!pb)
+    gomp_plugin_fatal ("invalid device address");
+
+  if (!h)
+    gomp_plugin_fatal ("invalid host address");
+
+  if (d == h)
+    gomp_plugin_fatal ("invalid host or device address");
+
+  if ((void *)(d + s) > (void *)(pb + ps))
+    gomp_plugin_fatal ("invalid size");
+
+#ifndef DISABLE_ASYNC
+  if (current_stream != PTX_dev->null_stream)
+    {
+      CUevent *e;
+
+      e = (CUevent *)gomp_plugin_malloc (sizeof (CUevent));
+
+      r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+      if (r != CUDA_SUCCESS)
+        gomp_plugin_fatal ("cuEventCreate error: %s", cuErrorMsg (r));
+
+      event_gc ();
+
+      r = cuMemcpyDtoHAsync (h, (CUdeviceptr)d, s, current_stream->stream);
+      if (r != CUDA_SUCCESS)
+        gomp_plugin_fatal ("cuMemcpyDtoHAsync error: %s", cuErrorMsg (r));
+
+      r = cuEventRecord (*e, current_stream->stream);
+      if (r != CUDA_SUCCESS)
+        gomp_plugin_fatal ("cuEventRecord error: %s", cuErrorMsg (r));
+
+      event_add (PTX_EVT_MEM, e, (void *)h, NULL);
+    }
+  else
+#endif
+    {
+      r = cuMemcpyDtoH (h, (CUdeviceptr)d, s);
+      if (r != CUDA_SUCCESS)
+        gomp_plugin_fatal ("cuMemcpyDtoH error: %s", cuErrorMsg (r));
+    }
+
+  return 0;
+}
+
+static void
+PTX_set_async (int async)
+{
+  current_stream = select_stream_for_async (async, pthread_self (), true, NULL);
+}
+
+static int
+PTX_async_test (int async)
+{
+  CUresult r;
+  struct PTX_stream *s;
+  
+  s = select_stream_for_async (async, pthread_self (), false, NULL);
+
+  if (!s)
+    gomp_plugin_fatal ("unknown async %d", async);
+
+  r = cuStreamQuery (s->stream);
+  if (r == CUDA_SUCCESS)
+    return 1;
+  else if (r == CUDA_ERROR_NOT_READY)
+    return 0;
+
+  gomp_plugin_fatal ("cuStreamQuery error: %s", cuErrorMsg (r));
+
+  return 0;
+}
+
+static int
+PTX_async_test_all (void)
+{
+  struct PTX_stream *s;
+  pthread_t self = pthread_self ();
+
+  gomp_plugin_mutex_lock (&PTX_dev->stream_lock);
+
+  SLIST_FOREACH (s, &PTX_dev->active_streams, next)
+    {
+      if ((s->multithreaded || pthread_equal (s->host_thread, self))
+	  && cuStreamQuery (s->stream) == CUDA_ERROR_NOT_READY)
+	{
+	  gomp_plugin_mutex_unlock (&PTX_dev->stream_lock);
+	  return 0;
+	}
+    }
+
+  gomp_plugin_mutex_unlock (&PTX_dev->stream_lock);
+
+  return 1;
+}
+
+static void
+PTX_wait (int async)
+{
+  CUresult r;
+  struct PTX_stream *s;
+  
+  s = select_stream_for_async (async, pthread_self (), false, NULL);
+
+  if (!s)
+    gomp_plugin_fatal ("unknown async %d", async);
+
+  r = cuStreamSynchronize (s->stream);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuStreamSynchronize error: %s", cuErrorMsg (r));
+  
+  event_gc ();
+}
+
+static void
+PTX_wait_async (int async1, int async2)
+{
+  CUresult r;
+  CUevent *e;
+  struct PTX_stream *s1, *s2;
+  pthread_t self = pthread_self ();
+
+  /* The stream that is waiting (rather than being waited for) doesn't
+     necessarily have to exist already.  */
+  s2 = select_stream_for_async (async2, self, true, NULL);
+
+  s1 = select_stream_for_async (async1, self, false, NULL);
+  if (!s1)
+    gomp_plugin_fatal ("unknown async %d", async1);
+
+  if (s1 == s2)
+    gomp_plugin_fatal ("identical parameters");
+
+  e = (CUevent *)gomp_plugin_malloc (sizeof (CUevent));
+
+  r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuEventCreate error: %s", cuErrorMsg (r));
+
+  event_gc ();
+
+  r = cuEventRecord (*e, s1->stream);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuEventRecord error: %s", cuErrorMsg (r));
+
+  /* Queue the wait before handing E to event_add: once E is on the event
+     list, event_gc on another thread may destroy and free it.  */
+  r = cuStreamWaitEvent (s2->stream, *e, 0);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuStreamWaitEvent error: %s", cuErrorMsg (r));
+
+  event_add (PTX_EVT_SYNC, e, NULL, NULL);
+}
+
+static void
+PTX_wait_all (void)
+{
+  CUresult r;
+  struct PTX_stream *s;
+  pthread_t self = pthread_self ();
+
+  gomp_plugin_mutex_lock (&PTX_dev->stream_lock);
+
+  /* Wait for active streams initiated by this thread (or by multiple threads)
+     to complete.  */
+  SLIST_FOREACH (s, &PTX_dev->active_streams, next)
+    {
+      if (s->multithreaded || pthread_equal (s->host_thread, self))
+        {
+	  r = cuStreamQuery (s->stream);
+	  if (r == CUDA_SUCCESS)
+	    continue;
+	  else if (r != CUDA_ERROR_NOT_READY)
+	    gomp_plugin_fatal ("cuStreamQuery error: %s", cuErrorMsg (r));
+
+	  r = cuStreamSynchronize (s->stream);
+	  if (r != CUDA_SUCCESS)
+	    gomp_plugin_fatal ("cuStreamSynchronize error: %s", cuErrorMsg (r));
+	}
+    }
+
+  gomp_plugin_mutex_unlock (&PTX_dev->stream_lock);
+
+  event_gc ();
+}
+
+static void
+PTX_wait_all_async (int async)
+{
+  CUresult r;
+  struct PTX_stream *waiting_stream, *other_stream;
+  CUevent *e;
+  pthread_t self = pthread_self ();
+
+  /* The stream doing the waiting.  This could be the first mention of the
+     stream, so create it if necessary.  */
+  waiting_stream
+    = select_stream_for_async (async, self, true, NULL);
+
+  /* Launches on the null stream already block on other streams in the
+     context.  */
+  if (!waiting_stream || waiting_stream == PTX_dev->null_stream)
+    return;
+
+  event_gc ();
+
+  gomp_plugin_mutex_lock (&PTX_dev->stream_lock);
+
+  SLIST_FOREACH (other_stream, &PTX_dev->active_streams, next)
+    {
+      if (!other_stream->multithreaded
+	  && !pthread_equal (other_stream->host_thread, self))
+	continue;
+
+      e = (CUevent *) gomp_plugin_malloc (sizeof (CUevent));
+
+      r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+      if (r != CUDA_SUCCESS)
+	gomp_plugin_fatal ("cuEventCreate error: %s", cuErrorMsg (r));
+
+      /* Record an event on the waited-for stream.  */
+      r = cuEventRecord (*e, other_stream->stream);
+      if (r != CUDA_SUCCESS)
+	gomp_plugin_fatal ("cuEventRecord error: %s", cuErrorMsg (r));
+
+      event_add (PTX_EVT_SYNC, e, NULL, NULL);
+
+      r = cuStreamWaitEvent (waiting_stream->stream, *e, 0);
+      if (r != CUDA_SUCCESS)
+	gomp_plugin_fatal ("cuStreamWaitEvent error: %s", cuErrorMsg (r));
+    }
+
+  gomp_plugin_mutex_unlock (&PTX_dev->stream_lock);
+}
+
+static void *
+PTX_get_current_cuda_device (void)
+{
+  if (!PTX_dev)
+    return NULL;
+
+  return &PTX_dev->dev;
+}
+
+static void *
+PTX_get_current_cuda_context (void)
+{
+  if (!PTX_dev)
+    return NULL;
+
+  return PTX_dev->ctx;
+}
+
+static void *
+PTX_get_cuda_stream (int async)
+{
+  struct PTX_stream *s;
+
+  if (!PTX_dev)
+    return NULL;
+
+  s = select_stream_for_async (async, pthread_self (), false, NULL);
+
+  return s ? s->stream : NULL;
+}
+
+static int
+PTX_set_cuda_stream (int async, void *stream)
+{
+  struct PTX_stream *oldstream;
+  pthread_t self = pthread_self ();
+
+  gomp_plugin_mutex_lock (&PTX_dev->stream_lock);
+
+  if (async < 0)
+    gomp_plugin_fatal ("bad async %d", async);
+
+  /* We have a list of active streams and an array mapping async values to
+     entries of that list.  We take "ownership" of the passed-in stream and
+     add it to our list, first removing the previous entry (if there was
+     one) to prevent resource leaks.  Note the potential for surprise: the
+     previous stream is destroyed even if the user supplied it.  Keeping
+     track of user-supplied streams and leaving their cleanup to the user
+     would not work for handles returned by acc_get_cuda_stream, though.  */
+
+  oldstream = select_stream_for_async (async, self, false, NULL);
+
+  if (oldstream)
+    {
+      SLIST_REMOVE (&PTX_dev->active_streams, oldstream, PTX_stream, next);
+
+      cuStreamDestroy (oldstream->stream);
+      map_fini (oldstream);
+      free (oldstream);
+    }
+
+  gomp_plugin_mutex_unlock (&PTX_dev->stream_lock);
+
+  (void) select_stream_for_async (async, self, true, (CUstream) stream);
+
+  return 1;
+}
+
+/* Plugin entry points.  */
+
+
+int
+get_type (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+  return TARGET_TYPE_NVIDIA_PTX;
+}
+
+unsigned int
+get_caps (void)
+{
+  return TARGET_CAP_OPENACC_200;
+}
+
+const char *
+get_name (void)
+{
+  return "nvidia";
+}
+
+int
+get_num_devices (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+  return PTX_get_num_devices ();
+}
+
+void
+offload_register (void *host_table, void *target_data)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%p, %p)\n", __FILE__, __FUNCTION__,
+	   host_table, target_data);
+#endif
+
+  kernel_target_data = target_data;
+  kernel_host_table = host_table;
+}
+
+int
+device_init (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+  return PTX_init ();
+}
+
+int
+device_fini (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+  return PTX_fini ();
+}
+
+int
+device_get_table (void *table)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%p)\n", __FILE__, __FUNCTION__,
+	   table);
+#endif
+
+  /* There are no fixed host-target address mappings for NVPTX.  */
+  return 0;
+}
+
+void *
+device_alloc (size_t size)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%zu)\n", __FILE__, __FUNCTION__,
+	   size);
+#endif
+
+  return PTX_alloc (size);
+}
+
+void
+device_free (void *ptr)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%p)\n", __FILE__, __FUNCTION__, ptr);
+#endif
+
+  PTX_free (ptr);
+}
+
+void *
+device_dev2host (void *dst, const void *src, size_t n)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%p, %p, %zu)\n", __FILE__,
+	   __FUNCTION__, dst, src, n);
+#endif
+
+  return PTX_dev2host (dst, src, n);
+}
+
+void *
+device_host2dev (void *dst, const void *src, size_t n)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%p, %p, %zu)\n", __FILE__,
+	   __FUNCTION__, dst, src, n);
+#endif
+
+  return PTX_host2dev (dst, src, n);
+}
+
+void (*device_run) (void *fn_ptr, void *vars) = NULL;
+
+void
+openacc_parallel (void (*fn) (void *), size_t mapnum, void **hostaddrs,
+		  void **devaddrs, size_t *sizes, unsigned short *kinds,
+		  int num_gangs, int num_workers, int vector_length,
+		  int async, void *targ_mem_desc)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%p, %zu, %p, %p, %p, %p, %d, %d, "
+	   "%d, %d, %p)\n", __FILE__, __FUNCTION__, fn, mapnum, hostaddrs,
+	   devaddrs, sizes, kinds, num_gangs, num_workers, vector_length,
+	   async, targ_mem_desc);
+#endif
+
+  PTX_exec (fn, mapnum, hostaddrs, devaddrs, sizes, kinds, num_gangs,
+	    num_workers, vector_length, async, targ_mem_desc);
+}
+
+void *
+openacc_open_device (int n)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%d)\n", __FILE__, __FUNCTION__, n);
+#endif
+  return PTX_open_device (n);
+}
+
+int
+openacc_close_device (void *h)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%p)\n", __FILE__, __FUNCTION__, h);
+#endif
+  return PTX_close_device (h);
+}
+
+void
+openacc_set_device_num (int n)
+{
+  assert (n >= 0);
+
+  if (!PTX_dev || PTX_dev->ord != n)
+    (void) PTX_open_device (n);
+}
+
+/* This can be called before the device is "opened" for the current thread, in
+   which case we can't tell which device number should be returned.  We don't
+   actually want to open the device here, so just return -1 and let the caller
+   (oacc-init.c:acc_get_device_num) handle it.  */
+
+int
+openacc_get_device_num (void)
+{
+  if (PTX_dev)
+    return PTX_dev->ord;
+  else
+    return -1;
+}
+
+bool
+openacc_avail (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s\n", __FILE__, __FUNCTION__);
+#endif
+  return PTX_avail ();
+}
+
+int
+openacc_async_test (int async)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%d)\n", __FILE__, __FUNCTION__,
+	   async);
+#endif
+  return PTX_async_test (async);
+}
+
+int
+openacc_async_test_all (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s\n", __FILE__, __FUNCTION__);
+#endif
+  return PTX_async_test_all ();
+}
+
+void
+openacc_async_wait (int async)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%d)\n", __FILE__, __FUNCTION__,
+	   async);
+#endif
+  PTX_wait (async);
+}
+
+void
+openacc_async_wait_async (int async1, int async2)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%d, %d)\n", __FILE__, __FUNCTION__,
+	   async1, async2);
+#endif
+  PTX_wait_async (async1, async2);
+}
+
+void
+openacc_async_wait_all (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s\n", __FILE__, __FUNCTION__);
+#endif
+  PTX_wait_all ();
+}
+
+void
+openacc_async_wait_all_async (int async)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%d)\n", __FILE__, __FUNCTION__,
+	   async);
+#endif
+  PTX_wait_all_async (async);
+}
+
+void
+openacc_async_set_async (int async)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%d)\n", __FILE__, __FUNCTION__,
+	   async);
+#endif
+  PTX_set_async (async);
+}
+
+void *
+openacc_get_current_cuda_device (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s\n", __FILE__, __FUNCTION__);
+#endif
+  return PTX_get_current_cuda_device ();
+}
+
+void *
+openacc_get_current_cuda_context (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s\n", __FILE__, __FUNCTION__);
+#endif
+  return PTX_get_current_cuda_context ();
+}
+
+/* NOTE: This returns a CUstream, not a PTX_stream pointer.  */
+
+void *
+openacc_get_cuda_stream (int async)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%d)\n", __FILE__, __FUNCTION__,
+	   async);
+#endif
+  return PTX_get_cuda_stream (async);
+}
+
+/* NOTE: This takes a CUstream, not a PTX_stream pointer.  */
+
+int
+openacc_set_cuda_stream (int async, void *stream)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%d, %p)\n", __FILE__, __FUNCTION__,
+	   async, stream);
+#endif
+  return PTX_set_cuda_stream (async, stream);
+}
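PTX_set_cuda_stream above takes ownership of a user-supplied stream and destroys whatever stream was previously bound to the same async value. The following is a minimal host-only sketch of that ownership transfer, not the plugin's code. It assumes a BSD-style <sys/queue.h> (as shipped with glibc and the BSDs), models the CUstream handle as a plain int so it runs without a GPU, and finds entries by walking the list rather than through the plugin's async-to-stream array. struct stream, find_stream, set_stream, count_streams and free_streams are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct PTX_stream: the CUstream handle is
   modelled as a plain int so the sketch runs without a GPU.  */
struct stream
{
  int async;                    /* async queue this stream serves */
  int handle;                   /* stand-in for the CUstream */
  SLIST_ENTRY (stream) next;
};

SLIST_HEAD (stream_list, stream);

/* Return the entry bound to ASYNC, or NULL if there is none.  */
static struct stream *
find_stream (struct stream_list *list, int async)
{
  struct stream *s;
  SLIST_FOREACH (s, list, next)
    if (s->async == async)
      return s;
  return NULL;
}

/* Bind HANDLE to ASYNC, taking ownership of it.  Any previous entry for
   ASYNC is unlinked and freed first; the real plugin would additionally
   call cuStreamDestroy on the old CUstream at this point.  */
static void
set_stream (struct stream_list *list, int async, int handle)
{
  struct stream *old, *s;

  assert (async >= 0);
  old = find_stream (list, async);
  if (old)
    {
      SLIST_REMOVE (list, old, stream, next);
      free (old);
    }

  s = malloc (sizeof *s);
  if (!s)
    abort ();
  s->async = async;
  s->handle = handle;
  SLIST_INSERT_HEAD (list, s, next);
}

/* Count the entries currently on the list.  */
static int
count_streams (struct stream_list *list)
{
  struct stream *s;
  int n = 0;
  SLIST_FOREACH (s, list, next)
    n++;
  return n;
}

/* Unlink and free every entry.  */
static void
free_streams (struct stream_list *list)
{
  while (!SLIST_EMPTY (list))
    {
      struct stream *s = SLIST_FIRST (list);
      SLIST_REMOVE_HEAD (list, next);
      free (s);
    }
}
```

Binding async 1 twice leaves a single entry for it, holding the newer handle; the older one has been released, which is exactly the behaviour the comment in PTX_set_cuda_stream warns may surprise a caller who still expects to own it.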
diff --git a/libgomp/splay-tree.c b/libgomp/splay-tree.c
new file mode 100644
index 0000000..14b03ac
--- /dev/null
+++ b/libgomp/splay-tree.c
@@ -0,0 +1,224 @@
+/* A splay-tree datatype.
+   Copyright 1998-2013
+   Free Software Foundation, Inc.
+   Contributed by Mark Mitchell (mark@markmitchell.com).
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* The splay tree code was copied from include/splay-tree.h and adjusted
+   so that all the data lives directly in the splay_tree_node_s
+   structure and no extra allocations are needed.
+
+   The splay_tree_node, splay_tree and splay_tree_key typedefs and the
+   splay_tree_key_s structure are defined in splay-tree.h.  The
+   splay_compare function, declared extern below, must be provided
+   elsewhere (libgomp defines it in target.c); it returns a negative
+   value, zero or a positive value when comparing two keys, in the
+   manner of strcmp.  */
+
+/* For an easily readable description of splay-trees, see:
+
+     Lewis, Harry R. and Denenberg, Larry.  Data Structures and Their
+     Algorithms.  Harper-Collins, Inc.  1991.
+
+   The major feature of splay trees is that all basic tree operations
+   are amortized O(log n) time for a tree with n nodes.  */
+
+#include "libgomp.h"
+#include "splay-tree.h"
+
+extern int splay_compare (splay_tree_key, splay_tree_key);
+
+/* Rotate the edge joining the left child N with its parent P.  PP is the
+   grandparent's pointer to P.  */
+
+static inline void
+rotate_left (splay_tree_node *pp, splay_tree_node p, splay_tree_node n)
+{
+  splay_tree_node tmp;
+  tmp = n->right;
+  n->right = p;
+  p->left = tmp;
+  *pp = n;
+}
+
+/* Rotate the edge joining the right child N with its parent P.  PP is the
+   grandparent's pointer to P.  */
+
+static inline void
+rotate_right (splay_tree_node *pp, splay_tree_node p, splay_tree_node n)
+{
+  splay_tree_node tmp;
+  tmp = n->left;
+  n->left = p;
+  p->right = tmp;
+  *pp = n;
+}
+
+/* Bottom up splay of KEY.  */
+
+static void
+splay_tree_splay (splay_tree sp, splay_tree_key key)
+{
+  if (sp->root == NULL)
+    return;
+
+  do {
+    int cmp1, cmp2;
+    splay_tree_node n, c;
+
+    n = sp->root;
+    cmp1 = splay_compare (key, &n->key);
+
+    /* Found.  */
+    if (cmp1 == 0)
+      return;
+
+    /* Left or right?  If no child, then we're done.  */
+    if (cmp1 < 0)
+      c = n->left;
+    else
+      c = n->right;
+    if (!c)
+      return;
+
+    /* Next one left or right?  If found or no child, we're done
+       after one rotation.  */
+    cmp2 = splay_compare (key, &c->key);
+    if (cmp2 == 0
+	|| (cmp2 < 0 && !c->left)
+	|| (cmp2 > 0 && !c->right))
+      {
+	if (cmp1 < 0)
+	  rotate_left (&sp->root, n, c);
+	else
+	  rotate_right (&sp->root, n, c);
+	return;
+      }
+
+    /* Now we have the four cases of double-rotation.  */
+    if (cmp1 < 0 && cmp2 < 0)
+      {
+	rotate_left (&n->left, c, c->left);
+	rotate_left (&sp->root, n, n->left);
+      }
+    else if (cmp1 > 0 && cmp2 > 0)
+      {
+	rotate_right (&n->right, c, c->right);
+	rotate_right (&sp->root, n, n->right);
+      }
+    else if (cmp1 < 0 && cmp2 > 0)
+      {
+	rotate_right (&n->left, c, c->right);
+	rotate_left (&sp->root, n, n->left);
+      }
+    else if (cmp1 > 0 && cmp2 < 0)
+      {
+	rotate_left (&n->right, c, c->left);
+	rotate_right (&sp->root, n, n->right);
+      }
+  } while (1);
+}
+
+/* Insert a new NODE into SP.  The NODE shouldn't exist in the tree.  */
+
+attribute_hidden void
+splay_tree_insert (splay_tree sp, splay_tree_node node)
+{
+  int comparison = 0;
+
+  splay_tree_splay (sp, &node->key);
+
+  if (sp->root)
+    comparison = splay_compare (&sp->root->key, &node->key);
+
+  if (sp->root && comparison == 0)
+    gomp_fatal ("Duplicate node");
+  else
+    {
+      /* Insert it at the root.  */
+      if (sp->root == NULL)
+	node->left = node->right = NULL;
+      else if (comparison < 0)
+	{
+	  node->left = sp->root;
+	  node->right = node->left->right;
+	  node->left->right = NULL;
+	}
+      else
+	{
+	  node->right = sp->root;
+	  node->left = node->right->left;
+	  node->right->left = NULL;
+	}
+
+      sp->root = node;
+    }
+}
+
+/* Remove node with KEY from SP.  It is not an error if it did not exist.  */
+
+attribute_hidden void
+splay_tree_remove (splay_tree sp, splay_tree_key key)
+{
+  splay_tree_splay (sp, key);
+
+  if (sp->root && splay_compare (&sp->root->key, key) == 0)
+    {
+      splay_tree_node left, right;
+
+      left = sp->root->left;
+      right = sp->root->right;
+
+      /* One of the children is now the root.  Doesn't matter much
+	 which, so long as we preserve the properties of the tree.  */
+      if (left)
+	{
+	  sp->root = left;
+
+	  /* If there was a right child as well, hang it off the
+	     right-most leaf of the left child.  */
+	  if (right)
+	    {
+	      while (left->right)
+		left = left->right;
+	      left->right = right;
+	    }
+	}
+      else
+	sp->root = right;
+    }
+}
+
+/* Look up KEY in SP, returning a pointer to the matching node's key if
+   present, and NULL otherwise.  */
+
+attribute_hidden splay_tree_key
+splay_tree_lookup (splay_tree sp, splay_tree_key key)
+{
+  splay_tree_splay (sp, key);
+
+  if (sp->root && splay_compare (&sp->root->key, key) == 0)
+    return &sp->root->key;
+  else
+    return NULL;
+}
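The splay tree above is keyed on host address ranges and delegates ordering to splay_compare. The sketch below illustrates why a lookup of a sub-range finds the mapping that contains it: overlapping ranges compare equal, and because stored mappings never overlap, the ordering stays consistent. The comparator is an assumption modelled on target.c's splay_compare (only partly visible in this patch), and for brevity the tree is a plain unbalanced binary search tree with no splaying. range_key, range_node, range_compare, range_insert and range_lookup are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down key: a half-open host range [start, end).  */
struct range_key
{
  uintptr_t host_start;
  uintptr_t host_end;
};

struct range_node
{
  struct range_key key;
  struct range_node *left, *right;
};

/* Overlapping ranges compare equal, as do two empty ranges; otherwise
   order by position in the address space.  */
static int
range_compare (const struct range_key *x, const struct range_key *y)
{
  if (x->host_start == x->host_end && y->host_start == y->host_end)
    return 0;
  if (x->host_end <= y->host_start)
    return -1;
  if (x->host_start >= y->host_end)
    return 1;
  return 0;
}

/* Insert NODE below *ROOT.  Returns 0 on success, or -1 if NODE overlaps
   an existing entry (the splay tree code treats that as fatal).  */
static int
range_insert (struct range_node **root, struct range_node *node)
{
  while (*root)
    {
      int c = range_compare (&node->key, &(*root)->key);
      if (c == 0)
	return -1;
      root = c < 0 ? &(*root)->left : &(*root)->right;
    }
  node->left = node->right = NULL;
  *root = node;
  return 0;
}

/* Return the stored key overlapping KEY, or NULL if there is none.  */
static const struct range_key *
range_lookup (const struct range_node *root, const struct range_key *key)
{
  while (root)
    {
      int c = range_compare (key, &root->key);
      if (c == 0)
	return &root->key;
      root = c < 0 ? root->left : root->right;
    }
  return NULL;
}
```

With this comparator, an empty key sitting exactly at the start of a mapping compares as "before" it rather than equal, which is presumably why the POINTER case in gomp_map_vars first looks up a one-byte range and only then retries with shorter ones.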
diff --git a/libgomp/splay-tree.h b/libgomp/splay-tree.h
index 04a71d1..d98ee9e 100644
--- a/libgomp/splay-tree.h
+++ b/libgomp/splay-tree.h
@@ -43,6 +43,30 @@ typedef struct splay_tree_key_s *splay_tree_key;
    The major feature of splay trees is that all basic tree operations
    are amortized O(log n) time for a tree with n nodes.  */
 
+#ifndef _SPLAY_TREE_H
+#define _SPLAY_TREE_H 1
+
+typedef struct splay_tree_node_s *splay_tree_node;
+typedef struct splay_tree_s *splay_tree;
+typedef struct splay_tree_key_s *splay_tree_key;
+
+struct splay_tree_key_s {
+  /* Address of the host object.  */
+  uintptr_t host_start;
+  /* Address immediately after the host object.  */
+  uintptr_t host_end;
+  /* Descriptor of the target memory.  */
+  struct target_mem_desc *tgt;
+  /* Offset from tgt->tgt_start to the start of the target object.  */
+  uintptr_t tgt_offset;
+  /* Reference count.  */
+  uintptr_t refcount;
+  /* Asynchronous reference count.  */
+  uintptr_t async_refcount;
+  /* True if data should be copied from device to host at the end.  */
+  bool copy_from;
+};
+
 /* The nodes in the splay tree.  */
 struct splay_tree_node_s {
   struct splay_tree_key_s key;
@@ -56,177 +80,8 @@ struct splay_tree_s {
   splay_tree_node root;
 };
 
-/* Rotate the edge joining the left child N with its parent P.  PP is the
-   grandparents' pointer to P.  */
-
-static inline void
-rotate_left (splay_tree_node *pp, splay_tree_node p, splay_tree_node n)
-{
-  splay_tree_node tmp;
-  tmp = n->right;
-  n->right = p;
-  p->left = tmp;
-  *pp = n;
-}
-
-/* Rotate the edge joining the right child N with its parent P.  PP is the
-   grandparents' pointer to P.  */
-
-static inline void
-rotate_right (splay_tree_node *pp, splay_tree_node p, splay_tree_node n)
-{
-  splay_tree_node tmp;
-  tmp = n->left;
-  n->left = p;
-  p->right = tmp;
-  *pp = n;
-}
-
-/* Bottom up splay of KEY.  */
-
-static void
-splay_tree_splay (splay_tree sp, splay_tree_key key)
-{
-  if (sp->root == NULL)
-    return;
-
-  do {
-    int cmp1, cmp2;
-    splay_tree_node n, c;
-
-    n = sp->root;
-    cmp1 = splay_compare (key, &n->key);
-
-    /* Found.  */
-    if (cmp1 == 0)
-      return;
-
-    /* Left or right?  If no child, then we're done.  */
-    if (cmp1 < 0)
-      c = n->left;
-    else
-      c = n->right;
-    if (!c)
-      return;
-
-    /* Next one left or right?  If found or no child, we're done
-       after one rotation.  */
-    cmp2 = splay_compare (key, &c->key);
-    if (cmp2 == 0
-	|| (cmp2 < 0 && !c->left)
-	|| (cmp2 > 0 && !c->right))
-      {
-	if (cmp1 < 0)
-	  rotate_left (&sp->root, n, c);
-	else
-	  rotate_right (&sp->root, n, c);
-	return;
-      }
-
-    /* Now we have the four cases of double-rotation.  */
-    if (cmp1 < 0 && cmp2 < 0)
-      {
-	rotate_left (&n->left, c, c->left);
-	rotate_left (&sp->root, n, n->left);
-      }
-    else if (cmp1 > 0 && cmp2 > 0)
-      {
-	rotate_right (&n->right, c, c->right);
-	rotate_right (&sp->root, n, n->right);
-      }
-    else if (cmp1 < 0 && cmp2 > 0)
-      {
-	rotate_right (&n->left, c, c->right);
-	rotate_left (&sp->root, n, n->left);
-      }
-    else if (cmp1 > 0 && cmp2 < 0)
-      {
-	rotate_left (&n->right, c, c->left);
-	rotate_right (&sp->root, n, n->right);
-      }
-  } while (1);
-}
-
-/* Insert a new NODE into SP.  The NODE shouldn't exist in the tree.  */
-
-static void
-splay_tree_insert (splay_tree sp, splay_tree_node node)
-{
-  int comparison = 0;
-
-  splay_tree_splay (sp, &node->key);
-
-  if (sp->root)
-    comparison = splay_compare (&sp->root->key, &node->key);
-
-  if (sp->root && comparison == 0)
-    abort ();
-  else
-    {
-      /* Insert it at the root.  */
-      if (sp->root == NULL)
-	node->left = node->right = NULL;
-      else if (comparison < 0)
-	{
-	  node->left = sp->root;
-	  node->right = node->left->right;
-	  node->left->right = NULL;
-	}
-      else
-	{
-	  node->right = sp->root;
-	  node->left = node->right->left;
-	  node->right->left = NULL;
-	}
-
-      sp->root = node;
-    }
-}
-
-/* Remove node with KEY from SP.  It is not an error if it did not exist.  */
-
-static void
-splay_tree_remove (splay_tree sp, splay_tree_key key)
-{
-  splay_tree_splay (sp, key);
-
-  if (sp->root && splay_compare (&sp->root->key, key) == 0)
-    {
-      splay_tree_node left, right;
-
-      left = sp->root->left;
-      right = sp->root->right;
-
-      /* One of the children is now the root.  Doesn't matter much
-	 which, so long as we preserve the properties of the tree.  */
-      if (left)
-	{
-	  sp->root = left;
-
-	  /* If there was a right child as well, hang it off the
-	     right-most leaf of the left child.  */
-	  if (right)
-	    {
-	      while (left->right)
-		left = left->right;
-	      left->right = right;
-	    }
-	}
-      else
-	sp->root = right;
-    }
-}
-
-/* Lookup KEY in SP, returning NODE if present, and NULL
-   otherwise.  */
-
-static splay_tree_key
-splay_tree_lookup (splay_tree sp, splay_tree_key key)
-{
-  splay_tree_splay (sp, key);
-
-  if (sp->root && splay_compare (&sp->root->key, key) == 0)
-    return &sp->root->key;
-  else
-    return NULL;
-}
+attribute_hidden splay_tree_key splay_tree_lookup (splay_tree, splay_tree_key);
+attribute_hidden void splay_tree_insert (splay_tree, splay_tree_node);
+attribute_hidden void splay_tree_remove (splay_tree, splay_tree_key);
+
+#endif /* _SPLAY_TREE_H */
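The gomp_map_vars changes in target.c read each map kind through get_kind and split it with typemask and rshift: OpenMP's unsigned char kinds keep the map type in the low 3 bits, OpenACC's unsigned short kinds keep it in the low 8 bits, and the remaining high bits hold log2 of the required alignment. Each object's offset is then rounded up with (tgt_size + align - 1) & ~(align - 1). A minimal sketch of that arithmetic follows; kind_type, kind_align and round_up are hypothetical helper names, and the kind values used to exercise them are arbitrary rather than real GOMP_MAP_* codes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Map-type bits of KIND: 3 bits for OpenMP, 8 bits for OpenACC.  */
static int
kind_type (bool is_openacc, int kind)
{
  const int typemask = is_openacc ? 0xff : 0x7;
  return kind & typemask;
}

/* Alignment encoded in the high bits of KIND as a power of two.  */
static size_t
kind_align (bool is_openacc, int kind)
{
  const int rshift = is_openacc ? 8 : 3;
  return (size_t) 1 << (kind >> rshift);
}

/* Round SIZE up to a multiple of ALIGN, which must be a power of two;
   same expression gomp_map_vars uses when laying out the target block.  */
static size_t
round_up (size_t size, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (size + align - 1) & ~(align - 1);
}
```

For example, an OpenMP kind of (3 << 3) | 1 decodes to type 1 with 8-byte alignment, and a running size of 13 bytes would be rounded up to offset 16 before the next object is placed.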
diff --git a/libgomp/target.c b/libgomp/target.c
index 64b787e..bd6af4c 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -26,10 +26,11 @@
    creation and termination.  */
 
 #include "libgomp.h"
-#include <limits.h>
-#include <stdbool.h>
-#include <stdlib.h>
+#include "oacc-plugin.h"
+#include "gomp-constants.h"
 #include <string.h>
+#include <stdio.h>
+#include <assert.h>
 
 #ifdef PLUGIN_SUPPORT
 # include <dlfcn.h>
@@ -40,54 +41,7 @@ static void gomp_target_init (void);
 
 static pthread_once_t gomp_is_initialized = PTHREAD_ONCE_INIT;
 
-/* Forward declaration for a node in the tree.  */
-typedef struct splay_tree_node_s *splay_tree_node;
-typedef struct splay_tree_s *splay_tree;
-typedef struct splay_tree_key_s *splay_tree_key;
-
-struct target_mem_desc {
-  /* Reference count.  */
-  uintptr_t refcount;
-  /* All the splay nodes allocated together.  */
-  splay_tree_node array;
-  /* Start of the target region.  */
-  uintptr_t tgt_start;
-  /* End of the targer region.  */
-  uintptr_t tgt_end;
-  /* Handle to free.  */
-  void *to_free;
-  /* Previous target_mem_desc.  */
-  struct target_mem_desc *prev;
-  /* Number of items in following list.  */
-  size_t list_count;
-
-  /* Corresponding target device descriptor.  */
-  struct gomp_device_descr *device_descr;
-
-  /* List of splay keys to remove (or decrease refcount)
-     at the end of region.  */
-  splay_tree_key list[];
-};
-
-struct splay_tree_key_s {
-  /* Address of the host object.  */
-  uintptr_t host_start;
-  /* Address immediately after the host object.  */
-  uintptr_t host_end;
-  /* Descriptor of the target memory.  */
-  struct target_mem_desc *tgt;
-  /* Offset from tgt->tgt_start to the start of the target object.  */
-  uintptr_t tgt_offset;
-  /* Reference count.  */
-  uintptr_t refcount;
-  /* True if data should be copied from device to host at the end.  */
-  bool copy_from;
-};
-
-enum target_type {
-  TARGET_TYPE_HOST,
-  TARGET_TYPE_INTEL_MIC
-};
+#include "splay-tree.h"
 
 /* This structure describes an offload image.
    It contains type of the target, pointer to host table descriptor, and pointer
@@ -112,7 +66,7 @@ static int num_devices;
 
 /* The comparison function.  */
 
-static int
+attribute_hidden int
 splay_compare (splay_tree_key x, splay_tree_key y)
 {
   if (x->host_start == x->host_end
@@ -125,45 +79,7 @@ splay_compare (splay_tree_key x, splay_tree_key y)
   return 0;
 }
 
-#include "splay-tree.h"
-
-/* This structure describes accelerator device.
-   It contains name of the corresponding libgomp plugin, function handlers for
-   interaction with the device, ID-number of the device, and information about
-   mapped memory.  */
-struct gomp_device_descr
-{
-  /* This is the ID number of device.  It could be specified in DEVICE-clause of
-     TARGET construct.  */
-  int id;
-
-  /* This is the TYPE of device.  */
-  enum target_type type;
-
-  /* Set to true when device is initialized.  */
-  bool is_initialized;
-
-  /* Plugin file handler.  */
-  void *plugin_handle;
-
-  /* Function handlers.  */
-  int (*get_type_func) (void);
-  int (*get_num_devices_func) (void);
-  void (*offload_register_func) (void *, void *);
-  void (*device_init_func) (void);
-  int (*device_get_table_func) (void *);
-  void *(*device_alloc_func) (size_t);
-  void (*device_free_func) (void *);
-  void *(*device_dev2host_func) (void *, const void *, size_t);
-  void *(*device_host2dev_func) (void *, const void *, size_t);
-  void (*device_run_func) (void *, void *);
-
-  /* Splay tree containing information about mapped memory regions.  */
-  struct splay_tree_s dev_splay_tree;
-
-  /* Mutex for operating with the splay tree and other shared structures.  */
-  gomp_mutex_t dev_env_lock;
-};
+#include "target.h"
 
 struct mapping_table {
   uintptr_t host_start;
@@ -172,10 +88,16 @@ struct mapping_table {
   uintptr_t tgt_end;
 };
 
+attribute_hidden void
+gomp_init_targets_once (void)
+{
+  (void) pthread_once (&gomp_is_initialized, gomp_target_init);
+}
+
 attribute_hidden int
 gomp_get_num_devices (void)
 {
-  (void) pthread_once (&gomp_is_initialized, gomp_target_init);
+  gomp_init_targets_once ();
   return num_devices;
 }
 
@@ -194,6 +116,33 @@ resolve_device (int device_id)
   return &devices[device_id];
 }
 
+__attribute__((used)) static void
+dump_mappings (FILE *f, splay_tree_node node)
+{
+  size_t i;
+  splay_tree_key k;
+
+  if (!node)
+    return;
+
+  k = &node->key;
+  fprintf (f, "key %p: host_start %p, host_end %p, tgt_offset %p, refcount %d, "
+	   "copy_from %s\n", (void *) k, (void *) k->host_start,
+	   (void *) k->host_end, (void *) k->tgt_offset, (int) k->refcount,
+	   k->copy_from ? "true" : "false");
+  fprintf (f, "tgt->refcount %d, tgt->tgt_start %p, tgt->tgt_end %p, "
+	   "tgt->to_free %p, tgt->prev %p, tgt->list_count %d, "
+	   "tgt->device_descr %p\n", (int) k->tgt->refcount,
+	   (void *) k->tgt->tgt_start, (void *) k->tgt->tgt_end,
+	   k->tgt->to_free, (void *) k->tgt->prev, (int) k->tgt->list_count,
+	   (void *) k->tgt->device_descr);
+
+  for (i = 0; i < k->tgt->list_count; i++)
+    fprintf (f, "item %d: %p\n", (int) i, (void *) k->tgt->list[i]);
+
+  dump_mappings (f, node->left);
+  dump_mappings (f, node->right);
+}
 
 /* Handle the case where splay_tree_lookup found oldn for newn.
    Helper function of gomp_map_vars.  */
@@ -211,18 +160,29 @@ gomp_map_vars_existing (splay_tree_key oldn, splay_tree_key newn,
   oldn->refcount++;
 }
 
-static struct target_mem_desc *
-gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
-	       void **hostaddrs, size_t *sizes, unsigned char *kinds,
-	       bool is_target)
+static int
+get_kind (bool is_openacc, void *kinds, int idx)
+{
+  return is_openacc ? ((unsigned short *) kinds)[idx]
+		    : ((unsigned char *) kinds)[idx];
+}
+
+attribute_hidden struct target_mem_desc *
+gomp_map_vars (struct gomp_device_descr *devicep,
+	       struct gomp_memory_mapping *mm, size_t mapnum,
+	       void **hostaddrs, void **devaddrs, size_t *sizes,
+	       void *kinds, bool is_openacc, bool is_target)
 {
   size_t i, tgt_align, tgt_size, not_found_cnt = 0;
+  const int rshift = is_openacc ? 8 : 3;
+  const int typemask = is_openacc ? 0xff : 0x7;
   struct splay_tree_key_s cur_node;
   struct target_mem_desc *tgt
     = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
   tgt->list_count = mapnum;
   tgt->refcount = 1;
   tgt->device_descr = devicep;
+  tgt->mem_map = mm;
 
   if (mapnum == 0)
     return tgt;
@@ -235,40 +195,41 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       tgt_align = align;
       tgt_size = mapnum * sizeof (void *);
     }
-  gomp_mutex_lock (&devicep->dev_env_lock);
+  gomp_mutex_lock (&mm->lock);
   for (i = 0; i < mapnum; i++)
     {
+      int kind = get_kind (is_openacc, kinds, i);
       if (hostaddrs[i] == NULL)
 	{
 	  tgt->list[i] = NULL;
 	  continue;
 	}
       cur_node.host_start = (uintptr_t) hostaddrs[i];
-      if ((kinds[i] & 7) != 4)
+      if (!GOMP_MAP_POINTER_P (kind & typemask))
 	cur_node.host_end = cur_node.host_start + sizes[i];
       else
 	cur_node.host_end = cur_node.host_start + sizeof (void *);
-      splay_tree_key n = splay_tree_lookup (&devicep->dev_splay_tree,
-					    &cur_node);
+      splay_tree_key n = splay_tree_lookup (&mm->splay_tree, &cur_node);
       if (n)
 	{
 	  tgt->list[i] = n;
-	  gomp_map_vars_existing (n, &cur_node, kinds[i]);
+	  gomp_map_vars_existing (n, &cur_node, kind);
 	}
       else
 	{
-	  size_t align = (size_t) 1 << (kinds[i] >> 3);
+	  size_t align = (size_t) 1 << (kind >> rshift);
 	  tgt->list[i] = NULL;
 	  not_found_cnt++;
 	  if (tgt_align < align)
 	    tgt_align = align;
 	  tgt_size = (tgt_size + align - 1) & ~(align - 1);
 	  tgt_size += cur_node.host_end - cur_node.host_start;
-	  if ((kinds[i] & 7) == 5)
+	  if ((kind & typemask) == GOMP_MAP_TO_PSET)
 	    {
 	      size_t j;
 	      for (j = i + 1; j < mapnum; j++)
-		if ((kinds[j] & 7) != 4)
+		if (!GOMP_MAP_POINTER_P (get_kind (is_openacc, kinds, j)
+					 & typemask))
 		  break;
 		else if ((uintptr_t) hostaddrs[j] < cur_node.host_start
 			 || ((uintptr_t) hostaddrs[j] + sizeof (void *)
@@ -283,7 +244,15 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	}
     }
 
-  if (not_found_cnt || is_target)
+  if (devaddrs)
+    {
+      if (mapnum != 1)
+	gomp_fatal ("unexpected aggregation");
+      tgt->to_free = devaddrs[0];
+      tgt->tgt_start = (uintptr_t) tgt->to_free;
+      tgt->tgt_end = tgt->tgt_start + sizes[0];
+    }
+  else if (not_found_cnt || is_target)
     {
       /* Allocate tgt_align aligned tgt_size block of memory.  */
       /* FIXME: Perhaps change interface to allocate properly aligned
@@ -293,11 +262,18 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       tgt->tgt_start = (tgt->tgt_start + tgt_align - 1) & ~(tgt_align - 1);
       tgt->tgt_end = tgt->tgt_start + tgt_size;
     }
+  else
+    {
+      tgt->to_free = NULL;
+      tgt->tgt_start = 0;
+      tgt->tgt_end = 0;
+    }
 
   tgt_size = 0;
   if (is_target)
     tgt_size = mapnum * sizeof (void *);
 
+  tgt->array = NULL;
   if (not_found_cnt)
     {
       tgt->array = gomp_malloc (not_found_cnt * sizeof (*tgt->array));
@@ -307,43 +283,51 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       for (i = 0; i < mapnum; i++)
 	if (tgt->list[i] == NULL)
 	  {
+	    int kind = get_kind (is_openacc, kinds, i);
 	    if (hostaddrs[i] == NULL)
 	      continue;
 	    splay_tree_key k = &array->key;
 	    k->host_start = (uintptr_t) hostaddrs[i];
-	    if ((kinds[i] & 7) != 4)
+	    if (!GOMP_MAP_POINTER_P (kind & typemask))
 	      k->host_end = k->host_start + sizes[i];
 	    else
 	      k->host_end = k->host_start + sizeof (void *);
-	    splay_tree_key n
-	      = splay_tree_lookup (&devicep->dev_splay_tree, k);
+	    splay_tree_key n = splay_tree_lookup (&mm->splay_tree, k);
 	    if (n)
 	      {
 		tgt->list[i] = n;
-		gomp_map_vars_existing (n, k, kinds[i]);
+		gomp_map_vars_existing (n, k, kind);
 	      }
 	    else
 	      {
-		size_t align = (size_t) 1 << (kinds[i] >> 3);
+		size_t align = (size_t) 1 << (kind >> rshift);
 		tgt->list[i] = k;
 		tgt_size = (tgt_size + align - 1) & ~(align - 1);
 		k->tgt = tgt;
 		k->tgt_offset = tgt_size;
 		tgt_size += k->host_end - k->host_start;
-		if ((kinds[i] & 7) == 2 || (kinds[i] & 7) == 3)
-		  k->copy_from = true;
+		k->copy_from = GOMP_MAP_COPYFROM_P (kind & typemask)
+			       || GOMP_MAP_TOFROM_P (kind & typemask);
 		k->refcount = 1;
+		k->async_refcount = 0;
 		tgt->refcount++;
 		array->left = NULL;
 		array->right = NULL;
-		splay_tree_insert (&devicep->dev_splay_tree, array);
-		switch (kinds[i] & 7)
+		splay_tree_insert (&mm->splay_tree, array);
+		switch (kind & typemask)
 		  {
-		  case 0: /* ALLOC */
-		  case 2: /* FROM */
+		  case GOMP_MAP_FORCE_ALLOC:
+		  case GOMP_MAP_FORCE_FROM:
+		    /* FIXME: No special handling (see comment in
+		       oacc-parallel.c).  */
+		  case GOMP_MAP_ALLOC:
+		  case GOMP_MAP_ALLOC_FROM:
 		    break;
-		  case 1: /* TO */
-		  case 3: /* TOFROM */
+		  case GOMP_MAP_FORCE_TO:
+		  case GOMP_MAP_FORCE_TOFROM:
+		    /* FIXME: No special handling, as above.  */
+		  case GOMP_MAP_ALLOC_TO:
+		  case GOMP_MAP_ALLOC_TOFROM:
 		    /* Copy from host to device memory.  */
 		    /* FIXME: Perhaps add some smarts, like if copying
 		       several adjacent fields from host to target, use some
@@ -353,7 +337,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		       (void *) k->host_start,
 		       k->host_end - k->host_start);
 		    break;
-		  case 4: /* POINTER */
+		  case GOMP_MAP_POINTER:
 		    cur_node.host_start
 		      = (uintptr_t) *(void **) k->host_start;
 		    if (cur_node.host_start == (uintptr_t) NULL)
@@ -370,19 +354,16 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		    /* Add bias to the pointer value.  */
 		    cur_node.host_start += sizes[i];
 		    cur_node.host_end = cur_node.host_start + 1;
-		    n = splay_tree_lookup (&devicep->dev_splay_tree,
-					   &cur_node);
+		    n = splay_tree_lookup (&mm->splay_tree, &cur_node);
 		    if (n == NULL)
 		      {
 			/* Could be possibly zero size array section.  */
 			cur_node.host_end--;
-			n = splay_tree_lookup (&devicep->dev_splay_tree,
-					       &cur_node);
+			n = splay_tree_lookup (&mm->splay_tree, &cur_node);
 			if (n == NULL)
 			  {
 			    cur_node.host_start--;
-			    n = splay_tree_lookup (&devicep->dev_splay_tree,
-						   &cur_node);
+			    n = splay_tree_lookup (&mm->splay_tree, &cur_node);
 			    cur_node.host_start++;
 			  }
 		      }
@@ -403,7 +384,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		       (void *) &cur_node.tgt_offset,
 		       sizeof (void *));
 		    break;
-		  case 5: /* TO_PSET */
+		  case GOMP_MAP_TO_PSET:
 		    /* Copy from host to device memory.  */
 		    /* FIXME: see above FIXME comment.  */
 		    devicep->device_host2dev_func
@@ -411,7 +392,8 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		       (void *) k->host_start,
 		       (k->host_end - k->host_start));
 		    for (j = i + 1; j < mapnum; j++)
-		      if ((kinds[j] & 7) != 4)
+		      if (!GOMP_MAP_POINTER_P (get_kind (is_openacc, kinds, j)
+					       & typemask))
 			break;
 		      else if ((uintptr_t) hostaddrs[j] < k->host_start
 			       || ((uintptr_t) hostaddrs[j] + sizeof (void *)
@@ -440,19 +422,18 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 			  /* Add bias to the pointer value.  */
 			  cur_node.host_start += sizes[j];
 			  cur_node.host_end = cur_node.host_start + 1;
-			  n = splay_tree_lookup (&devicep->dev_splay_tree,
-						 &cur_node);
+			  n = splay_tree_lookup (&mm->splay_tree, &cur_node);
 			  if (n == NULL)
 			    {
 			      /* Could be possibly zero size array section.  */
 			      cur_node.host_end--;
-			      n = splay_tree_lookup (&devicep->dev_splay_tree,
+			      n = splay_tree_lookup (&mm->splay_tree,
 						     &cur_node);
 			      if (n == NULL)
 				{
 				  cur_node.host_start--;
-				  n = splay_tree_lookup
-					(&devicep->dev_splay_tree, &cur_node);
+				  n = splay_tree_lookup (&mm->splay_tree,
+							 &cur_node);
 				  cur_node.host_start++;
 				}
 			    }
@@ -478,6 +459,31 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 			  i++;
 			}
 		      break;
+		    case GOMP_MAP_FORCE_PRESENT:
+		      {
+		        /* We already looked up the memory region above and it
+			   was missing.  */
+			size_t size = k->host_end - k->host_start;
+			gomp_fatal ("present clause: !acc_is_present (%p, "
+				    "%zd (0x%zx))", (void *) k->host_start,
+				    size, size);
+		      }
+		      break;
+		    case GOMP_MAP_FORCE_DEVICEPTR:
+		      assert (k->host_end - k->host_start == sizeof (void *));
+
+		      devicep->device_host2dev_func
+		        ((void *) (tgt->tgt_start + k->tgt_offset),
+			 (void *) k->host_start,
+			 sizeof (void *));
+		      break;
+		    case GOMP_MAP_FORCE_PRIVATE:
+		      abort ();
+		    case GOMP_MAP_FORCE_FIRSTPRIVATE:
+		      abort ();
+		    default:
+		      gomp_fatal ("%s: unhandled kind 0x%.2x", __FUNCTION__,
+				  kind);
 		  }
 		array++;
 	      }
@@ -501,7 +507,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	}
     }
 
-  gomp_mutex_unlock (&devicep->dev_env_lock);
+  gomp_mutex_unlock (&mm->lock);
   return tgt;
 }
 
@@ -516,10 +522,52 @@ gomp_unmap_tgt (struct target_mem_desc *tgt)
   free (tgt);
 }
 
-static void
-gomp_unmap_vars (struct target_mem_desc *tgt)
+/* Decrease the refcount for a set of mapped variables, and queue asynchronous
+   copies from the device back to the host after any work that has been issued.
+   Because the regions are still "live", increment an asynchronous reference
+   count to indicate that they should not be unmapped from host-side data
+   structures until the asynchronous copy has completed.  */
+
+attribute_hidden void
+gomp_copy_from_async (struct target_mem_desc *tgt)
 {
   struct gomp_device_descr *devicep = tgt->device_descr;
+  struct gomp_memory_mapping *mm = tgt->mem_map;
+  size_t i;
+
+  gomp_mutex_lock (&mm->lock);
+
+  for (i = 0; i < tgt->list_count; i++)
+    if (tgt->list[i] == NULL)
+      ;
+    else if (tgt->list[i]->refcount > 1)
+      {
+	tgt->list[i]->refcount--;
+	tgt->list[i]->async_refcount++;
+      }
+    else
+      {
+	splay_tree_key k = tgt->list[i];
+	if (k->copy_from)
+	  /* Copy from device to host memory.  */
+	  devicep->device_dev2host_func
+	    ((void *) k->host_start,
+	     (void *) (k->tgt->tgt_start + k->tgt_offset),
+	     k->host_end - k->host_start);
+      }
+
+  gomp_mutex_unlock (&mm->lock);
+}
+
+/* Unmap variables described by TGT.  If DO_COPYFROM is true, copy relevant
+   variables back from device to host; if it is false, it is assumed that this
+   has already been done, e.g. by gomp_copy_from_async above.  */
+
+attribute_hidden void
+gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
+{
+  struct gomp_device_descr *devicep = tgt->device_descr;
+  struct gomp_memory_mapping *mm = tgt->mem_map;
 
   if (tgt->list_count == 0)
     {
@@ -528,22 +576,24 @@ gomp_unmap_vars (struct target_mem_desc *tgt)
     }
 
   size_t i;
-  gomp_mutex_lock (&devicep->dev_env_lock);
+  gomp_mutex_lock (&mm->lock);
   for (i = 0; i < tgt->list_count; i++)
     if (tgt->list[i] == NULL)
       ;
     else if (tgt->list[i]->refcount > 1)
       tgt->list[i]->refcount--;
+    else if (tgt->list[i]->async_refcount > 0)
+      tgt->list[i]->async_refcount--;
     else
       {
 	splay_tree_key k = tgt->list[i];
-	if (k->copy_from)
+	if (k->copy_from && do_copyfrom)
 	  /* Copy from device to host memory.  */
 	  devicep->device_dev2host_func
 	    ((void *) k->host_start,
 	     (void *) (k->tgt->tgt_start + k->tgt_offset),
 	     k->host_end - k->host_start);
-	splay_tree_remove (&devicep->dev_splay_tree, k);
+	splay_tree_remove (&mm->splay_tree, k);
 	if (k->tgt->refcount > 1)
 	  k->tgt->refcount--;
 	else
@@ -554,15 +604,17 @@ gomp_unmap_vars (struct target_mem_desc *tgt)
     tgt->refcount--;
   else
     gomp_unmap_tgt (tgt);
-  gomp_mutex_unlock (&devicep->dev_env_lock);
+  gomp_mutex_unlock (&mm->lock);
 }
 
 static void
-gomp_update (struct gomp_device_descr *devicep, size_t mapnum,
-	     void **hostaddrs, size_t *sizes, unsigned char *kinds)
+gomp_update (struct gomp_device_descr *devicep, struct gomp_memory_mapping *mm,
+	     size_t mapnum, void **hostaddrs, size_t *sizes, void *kinds,
+	     bool is_openacc)
 {
   size_t i;
   struct splay_tree_key_s cur_node;
+  const int typemask = is_openacc ? 0xff : 0x7;
 
   if (!devicep)
     return;
@@ -570,16 +622,17 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum,
   if (mapnum == 0)
     return;
 
-  gomp_mutex_lock (&devicep->dev_env_lock);
+  gomp_mutex_lock (&mm->lock);
   for (i = 0; i < mapnum; i++)
     if (sizes[i])
       {
 	cur_node.host_start = (uintptr_t) hostaddrs[i];
 	cur_node.host_end = cur_node.host_start + sizes[i];
-	splay_tree_key n = splay_tree_lookup (&devicep->dev_splay_tree,
+	splay_tree_key n = splay_tree_lookup (&mm->splay_tree,
 					      &cur_node);
 	if (n)
 	  {
+	    int kind = get_kind (is_openacc, kinds, i);
 	    if (n->host_start > cur_node.host_start
 		|| n->host_end < cur_node.host_end)
 	      gomp_fatal ("Trying to update [%p..%p) object when"
@@ -588,7 +641,7 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum,
 			  (void *) cur_node.host_end,
 			  (void *) n->host_start,
 			  (void *) n->host_end);
-	    if ((kinds[i] & 7) == 1)
+	    if (GOMP_MAP_COPYTO_P (kind & typemask))
 	      /* Copy from host to device memory.  */
 	      devicep->device_host2dev_func
 		((void *) (n->tgt->tgt_start
@@ -597,7 +650,7 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum,
 			   - n->host_start),
 		 (void *) cur_node.host_start,
 		 cur_node.host_end - cur_node.host_start);
-	    else if ((kinds[i] & 7) == 2)
+	    else if (GOMP_MAP_COPYFROM_P (kind & typemask))
 	      /* Copy from device to host memory.  */
 	      devicep->device_dev2host_func
 		((void *) cur_node.host_start,
@@ -612,20 +665,25 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum,
 		      (void *) cur_node.host_start,
 		      (void *) cur_node.host_end);
       }
-  gomp_mutex_unlock (&devicep->dev_env_lock);
+  gomp_mutex_unlock (&mm->lock);
 }
 
+static void gomp_register_image_for_device (struct gomp_device_descr *device,
+					    struct offload_image_descr *image);
 
 /* This function should be called from every offload image.  It gets the
    descriptor of the host func and var tables HOST_TABLE, TYPE of the target,
    and TARGET_DATA needed by target plugin (target tables, etc.)  */
 void
-GOMP_offload_register (void *host_table, int type, void *target_data)
+GOMP_offload_register (void *host_table, int type, void **target_data)
 {
   offload_images = gomp_realloc (offload_images,
 				 (num_offload_images + 1)
 				 * sizeof (struct offload_image_descr));
 
+  if (offload_images == NULL)
+    return;
+
   offload_images[num_offload_images].type = type;
   offload_images[num_offload_images].host_table = host_table;
   offload_images[num_offload_images].target_data = target_data;
@@ -633,18 +691,20 @@ GOMP_offload_register (void *host_table, int type, void *target_data)
   num_offload_images++;
 }
 
-static void
+attribute_hidden void
 gomp_init_device (struct gomp_device_descr *devicep)
 {
+  int i;
+
   /* Initialize the target device.  */
   devicep->device_init_func ();
 
   /* Get address mapping table for device.  */
   struct mapping_table *table = NULL;
   int num_entries = devicep->device_get_table_func (&table);
+  struct gomp_memory_mapping *mm = &devicep->mem_map;
 
   /* Insert host-target address mapping into dev_splay_tree.  */
-  int i;
   for (i = 0; i < num_entries; i++)
     {
       struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
@@ -663,13 +723,32 @@ gomp_init_device (struct gomp_device_descr *devicep)
       k->tgt = tgt;
       node->left = NULL;
       node->right = NULL;
-      splay_tree_insert (&devicep->dev_splay_tree, node);
+      splay_tree_insert (&mm->splay_tree, node);
     }
 
   free (table);
   devicep->is_initialized = true;
 }
 
+attribute_hidden void
+gomp_fini_device (struct gomp_device_descr *devicep)
+{
+  struct gomp_memory_mapping *mm = &devicep->mem_map;
+
+  if (devicep->is_initialized)
+    devicep->device_fini_func ();
+
+  while (mm->splay_tree.root)
+    {
+      struct target_mem_desc *tgt = mm->splay_tree.root->key.tgt;
+      splay_tree_remove (&mm->splay_tree, &mm->splay_tree.root->key);
+      free (tgt->array);
+      free (tgt);
+    }
+
+  devicep->is_initialized = false;
+}
+
 /* Called when encountering a target directive.  If DEVICE
    is -1, it means use device-var ICV.  If it is -2 (or any other value
    larger than last available hw device, use host fallback.
@@ -686,7 +765,12 @@ GOMP_target (int device, void (*fn) (void *), const void *openmp_target,
 	     unsigned char *kinds)
 {
   struct gomp_device_descr *devicep = resolve_device (device);
-  if (devicep == NULL)
+  struct gomp_memory_mapping *mm = devicep ? &devicep->mem_map : NULL;
+
+  if (devicep != NULL && !devicep->is_initialized)
+    gomp_init_device (devicep);
+
+  if (devicep == NULL || !(devicep->capabilities & TARGET_CAP_OPENMP_400))
     {
       /* Host fallback.  */
       struct gomp_thread old_thr, *thr = gomp_thread ();
@@ -703,24 +787,24 @@ GOMP_target (int device, void (*fn) (void *), const void *openmp_target,
       return;
     }
 
-  if (!devicep->is_initialized)
-    gomp_init_device (devicep);
-
   struct splay_tree_key_s k;
   k.host_start = (uintptr_t) fn;
   k.host_end = k.host_start + 1;
-  splay_tree_key tgt_fn = splay_tree_lookup (&devicep->dev_splay_tree, &k);
-  if (tgt_fn == NULL && devicep->type != TARGET_TYPE_HOST)
+  gomp_mutex_lock (&mm->lock);
+  splay_tree_key tgt_fn = splay_tree_lookup (&mm->splay_tree, &k);
+  if (tgt_fn == NULL && !(devicep->capabilities & TARGET_CAP_NATIVE_EXEC))
     gomp_fatal ("Target function wasn't mapped");
+  gomp_mutex_unlock (&mm->lock);
 
   struct target_mem_desc *tgt_vars
-    = gomp_map_vars (devicep, mapnum, hostaddrs, sizes, kinds, true);
-  if (devicep->type == TARGET_TYPE_HOST)
+    = gomp_map_vars (devicep, &devicep->mem_map, mapnum, hostaddrs, NULL,
+		     sizes, kinds, false, true);
+  if (devicep->capabilities & TARGET_CAP_NATIVE_EXEC)
     devicep->device_run_func (fn, (void *) tgt_vars->tgt_start);
   else
     devicep->device_run_func ((void *) tgt_fn->tgt->tgt_start,
 			      (void *) tgt_vars->tgt_start);
-  gomp_unmap_vars (tgt_vars);
+  gomp_unmap_vars (tgt_vars, true);
 }
 
 void
@@ -728,7 +812,11 @@ GOMP_target_data (int device, const void *openmp_target, size_t mapnum,
 		  void **hostaddrs, size_t *sizes, unsigned char *kinds)
 {
   struct gomp_device_descr *devicep = resolve_device (device);
-  if (devicep == NULL)
+
+  if (devicep != NULL && !devicep->is_initialized)
+    gomp_init_device (devicep);
+
+  if (devicep == NULL || !(devicep->capabilities & TARGET_CAP_OPENMP_400))
     {
       /* Host fallback.  */
       struct gomp_task_icv *icv = gomp_icv (false);
@@ -739,18 +827,17 @@ GOMP_target_data (int device, const void *openmp_target, size_t mapnum,
 	     new #pragma omp target data, otherwise GOMP_target_end_data
 	     would get out of sync.  */
 	  struct target_mem_desc *tgt
-	    = gomp_map_vars (NULL, 0, NULL, NULL, NULL, false);
+	    = gomp_map_vars (NULL, NULL, 0, NULL, NULL, NULL, NULL, false,
+			     false);
 	  tgt->prev = icv->target_data;
 	  icv->target_data = tgt;
 	}
       return;
     }
 
-  if (!devicep->is_initialized)
-    gomp_init_device (devicep);
-
   struct target_mem_desc *tgt
-    = gomp_map_vars (devicep, mapnum, hostaddrs, sizes, kinds, false);
+    = gomp_map_vars (devicep, &devicep->mem_map, mapnum, hostaddrs, NULL, sizes,
+		     kinds, false, false);
   struct gomp_task_icv *icv = gomp_icv (true);
   tgt->prev = icv->target_data;
   icv->target_data = tgt;
@@ -764,7 +851,7 @@ GOMP_target_end_data (void)
     {
       struct target_mem_desc *tgt = icv->target_data;
       icv->target_data = tgt->prev;
-      gomp_unmap_vars (tgt);
+      gomp_unmap_vars (tgt, true);
     }
 }
 
@@ -773,13 +860,15 @@ GOMP_target_update (int device, const void *openmp_target, size_t mapnum,
 		    void **hostaddrs, size_t *sizes, unsigned char *kinds)
 {
   struct gomp_device_descr *devicep = resolve_device (device);
-  if (devicep == NULL)
-    return;
 
-  if (!devicep->is_initialized)
+  if (devicep != NULL && !devicep->is_initialized)
     gomp_init_device (devicep);
 
-  gomp_update (devicep, mapnum, hostaddrs, sizes, kinds);
+  if (devicep == NULL || !(devicep->capabilities & TARGET_CAP_OPENMP_400))
+    return;
+
+  gomp_update (devicep, &devicep->mem_map, mapnum, hostaddrs, sizes, kinds,
+	       false);
 }
 
 void
@@ -822,7 +911,8 @@ static bool
 gomp_load_plugin_for_device (struct gomp_device_descr *device,
 			     const char *plugin_name)
 {
-  char *err = NULL;
+  char *err = NULL, *last_missing = NULL;
+  int optional_present, optional_total;
 
   /* Clear any existing error.  */
   dlerror ();
@@ -845,40 +935,98 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
 	goto out;							\
     }									\
   while (0)
+  /* Similar, but missing functions are not an error.  */
+#define DLSYM_OPT(f,n) \
+  do									\
+    {									\
+      char *tmp_err;							\
+      device->f##_func = dlsym (device->plugin_handle, #n);		\
+      tmp_err = dlerror ();						\
+      if (tmp_err == NULL)						\
+        optional_present++;						\
+      else								\
+        last_missing = #n;						\
+      optional_total++;							\
+    }									\
+  while (0)
+
+  DLSYM (get_name);
+  DLSYM (get_caps);
   DLSYM (get_type);
   DLSYM (get_num_devices);
   DLSYM (offload_register);
   DLSYM (device_init);
+  DLSYM (device_fini);
   DLSYM (device_get_table);
   DLSYM (device_alloc);
   DLSYM (device_free);
   DLSYM (device_dev2host);
   DLSYM (device_host2dev);
-  DLSYM (device_run);
+  if (device->get_caps_func () & TARGET_CAP_OPENMP_400)
+    DLSYM (device_run);
+  if (device->get_caps_func () & TARGET_CAP_OPENACC_200)
+    {
+      optional_present = optional_total = 0;
+      DLSYM_OPT (openacc.exec, openacc_parallel);
+      DLSYM_OPT (openacc.open_device, openacc_open_device);
+      DLSYM_OPT (openacc.close_device, openacc_close_device);
+      DLSYM_OPT (openacc.get_device_num, openacc_get_device_num);
+      DLSYM_OPT (openacc.set_device_num, openacc_set_device_num);
+      DLSYM_OPT (openacc.avail, openacc_avail);
+      DLSYM_OPT (openacc.async_test, openacc_async_test);
+      DLSYM_OPT (openacc.async_test_all, openacc_async_test_all);
+      DLSYM_OPT (openacc.async_wait, openacc_async_wait);
+      DLSYM_OPT (openacc.async_wait_async, openacc_async_wait_async);
+      DLSYM_OPT (openacc.async_wait_all, openacc_async_wait_all);
+      DLSYM_OPT (openacc.async_wait_all_async, openacc_async_wait_all_async);
+      DLSYM_OPT (openacc.async_set_async, openacc_async_set_async);
+      /* Require all the OpenACC handlers if we have TARGET_CAP_OPENACC_200.  */
+      if (optional_present != optional_total)
+	{
+	  err = "plugin missing OpenACC handler function";
+	  goto out;
+	}
+      optional_present = optional_total = 0;
+      DLSYM_OPT (openacc.cuda.get_current_device,
+		 openacc_get_current_cuda_device);
+      DLSYM_OPT (openacc.cuda.get_current_context,
+		 openacc_get_current_cuda_context);
+      DLSYM_OPT (openacc.cuda.get_stream, openacc_get_cuda_stream);
+      DLSYM_OPT (openacc.cuda.set_stream, openacc_set_cuda_stream);
+      /* Make sure all the CUDA functions are there if any of them are.  */
+      if (optional_present && optional_present != optional_total)
+	{
+	  err = "plugin missing OpenACC CUDA handler function";
+	  goto out;
+	}
+    }
 #undef DLSYM
+#undef DLSYM_OPT
 
  out:
   if (err != NULL)
     {
       gomp_error ("while loading %s: %s", plugin_name, err);
+      if (last_missing)
+        gomp_error ("missing function was %s", last_missing);
       if (device->plugin_handle)
 	dlclose (device->plugin_handle);
     }
   return err == NULL;
 }
 
-/* This function finds OFFLOAD_IMAGES corresponding to DEVICE type, and
-   registers them in the plugin.  */
+/* This function adds a compatible offload image IMAGE to an accelerator device
+   DEVICE.  */
+
 static void
-gomp_register_images_for_device (struct gomp_device_descr *device)
+gomp_register_image_for_device (struct gomp_device_descr *device,
+				struct offload_image_descr *image)
 {
-  int i;
-  for (i = 0; i < num_offload_images; i++)
+  if (!device->offload_regions_registered
+      && (device->type == image->type || device->type == TARGET_TYPE_HOST))
     {
-      struct offload_image_descr *image = &offload_images[i];
-
-      if (device->type == image->type || device->type == TARGET_TYPE_HOST)
-	device->offload_register_func (image->host_table, image->target_data);
+      device->offload_register_func (image->host_table, image->target_data);
+      device->offload_regions_registered = true;
     }
 }
 
@@ -895,6 +1043,7 @@ gomp_find_available_plugins (void)
   DIR *dir = NULL;
   struct dirent *ent;
   char plugin_name[PATH_MAX];
+  int i;
 
   num_devices = 0;
   devices = NULL;
@@ -909,7 +1058,7 @@ gomp_find_available_plugins (void)
 
   while ((ent = readdir (dir)) != NULL)
     {
-      struct gomp_device_descr current_device;
+      struct gomp_device_descr current_device, *devicep;
       if (!gomp_check_plugin_file_name (ent->d_name))
 	continue;
       if (strlen (plugin_path) + 1 + strlen (ent->d_name) >= PATH_MAX)
@@ -919,7 +1068,7 @@ gomp_find_available_plugins (void)
       strcat (plugin_name, ent->d_name);
       if (!gomp_load_plugin_for_device (&current_device, plugin_name))
 	continue;
-      devices = realloc (devices, (num_devices + 1)
+      devices = gomp_realloc (devices, (num_devices + 1)
 				  * sizeof (struct gomp_device_descr));
       if (devices == NULL)
 	{
@@ -927,18 +1076,31 @@ gomp_find_available_plugins (void)
 	  goto out;
 	}
 
-      /* FIXME: Properly handle multiple devices of the same type.  */
-      if (current_device.get_num_devices_func () >= 1)
-	{
-	  current_device.id = num_devices + 1;
-	  current_device.type = current_device.get_type_func ();
-	  current_device.is_initialized = false;
-	  current_device.dev_splay_tree.root = NULL;
-	  gomp_register_images_for_device (&current_device);
-	  devices[num_devices] = current_device;
-	  gomp_mutex_init (&devices[num_devices].dev_env_lock);
-	  num_devices++;
-	}
+      devices[num_devices] = current_device;
+      devicep = &devices[num_devices];
+
+      devicep->is_initialized = false;
+      devicep->offload_regions_registered = false;
+      devicep->mem_map.splay_tree.root = NULL;
+      devicep->type = devicep->get_type_func ();
+      devicep->name = devicep->get_name_func ();
+      devicep->capabilities = devicep->get_caps_func ();
+      gomp_mutex_init (&devicep->mem_map.lock);
+      devicep->id = ++num_devices;
+    }
+
+  for (i = 0; i < num_devices; i++)
+    {
+      int j;
+
+      for (j = 0; j < num_offload_images; j++)
+	gomp_register_image_for_device (&devices[i], &offload_images[j]);
+
+      /* The 'devices' array can be moved (by the realloc call) until we have
+	 found all the plugins, so registering with the OpenACC runtime (which
+	 takes a copy of the pointer argument) must be delayed until now.  */
+      if (devices[i].capabilities & TARGET_CAP_OPENACC_200)
+	ACC_plugin_register (&devices[i]);
     }
 
  out:
diff --git a/libgomp/target.h b/libgomp/target.h
new file mode 100644
index 0000000..4206548
--- /dev/null
+++ b/libgomp/target.h
@@ -0,0 +1,164 @@
+/* Copyright (C) 2013-2014 Free Software Foundation, Inc.
+   Contributed by Jakub Jelinek <jakub@redhat.com>.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file contains the data structures and declarations used by the
+   libgomp target (offloading) support.  */
+
+#ifndef _TARGET_H
+#define _TARGET_H 1
+
+#include <stdarg.h>
+#include "splay-tree.h"
+#include "gomp-constants.h"
+
+struct target_mem_desc {
+  /* Reference count.  */
+  uintptr_t refcount;
+  /* All the splay nodes allocated together.  */
+  splay_tree_node array;
+  /* Start of the target region.  */
+  uintptr_t tgt_start;
+  /* End of the target region.  */
+  uintptr_t tgt_end;
+  /* Handle to free.  */
+  void *to_free;
+  /* Previous target_mem_desc.  */
+  struct target_mem_desc *prev;
+  /* Number of items in following list.  */
+  size_t list_count;
+
+  /* Corresponding target device descriptor.  */
+  struct gomp_device_descr *device_descr;
+
+  /* Memory mapping info for the thread that created this descriptor.  */
+  struct gomp_memory_mapping *mem_map;
+
+  /* List of splay keys to remove (or decrease refcount)
+     at the end of region.  */
+  splay_tree_key list[];
+};
+
+/* Keep in sync with openacc.h:acc_device_t.  */
+
+enum target_type {
+  TARGET_TYPE_HOST = GOMP_TARGET_HOST,
+  TARGET_TYPE_NONSHM_HOST = GOMP_TARGET_NONSHM_HOST,
+  TARGET_TYPE_NVIDIA_PTX = GOMP_TARGET_NVIDIA_PTX,
+  TARGET_TYPE_INTEL_MIC = GOMP_TARGET_INTEL_MIC,
+};
+
+#define TARGET_CAP_SHARED_MEM	1
+#define TARGET_CAP_NATIVE_EXEC	2
+#define TARGET_CAP_OPENMP_400	4
+#define TARGET_CAP_OPENACC_200	8
+
+/* Information about mapped memory regions (per device/context).  */
+
+struct gomp_memory_mapping
+{
+  /* Splay tree containing information about mapped memory regions.  */
+  struct splay_tree_s splay_tree;
+
+  /* Mutex for operating with the splay tree and other shared structures.  */
+  gomp_mutex_t lock;
+};
+
+#include "oacc-int.h"
+
+static inline enum acc_device_t
+acc_device_type (enum target_type type)
+{
+  return (enum acc_device_t) type;
+}
+
+/* This structure describes an accelerator device.
+   It contains the name of the corresponding libgomp plugin, function handlers
+   for interaction with the device, the ID number of the device, and
+   information about mapped memory.  */
+struct gomp_device_descr
+{
+  /* The name of the device.  */
+  const char *name;
+
+  /* Capabilities of device (supports OpenACC, OpenMP).  */
+  unsigned int capabilities;
+
+  /* This is the ID number of the device.  It can be specified in the DEVICE
+     clause of a TARGET construct.  */
+  int id;
+
+  /* This is the TYPE of device.  */
+  enum target_type type;
+
+  /* Set to true when device is initialized.  */
+  bool is_initialized;
+
+  /* True when offload regions have been registered with this device.  */
+  bool offload_regions_registered;
+
+  /* Plugin file handler.  */
+  void *plugin_handle;
+
+  /* Function handlers.  */
+  const char *(*get_name_func) (void);
+  unsigned int (*get_caps_func) (void);
+  int (*get_type_func) (void);
+  int (*get_num_devices_func) (void);
+  void (*offload_register_func) (void *, void *);
+  int (*device_init_func) (void);
+  int (*device_fini_func) (void);
+  int (*device_get_table_func) (void *);
+  void *(*device_alloc_func) (size_t);
+  void (*device_free_func) (void *);
+  void *(*device_dev2host_func) (void *, const void *, size_t);
+  void *(*device_host2dev_func) (void *, const void *, size_t);
+  void (*device_run_func) (void *, void *);
+
+  /* OpenACC-specific functions.  */
+  ACC_dispatch_t openacc;
+
+  /* Memory-mapping info (only for OpenMP -- mappings are stored per-thread
+     for OpenACC. It's not clear if that's a useful distinction).  */
+  struct gomp_memory_mapping mem_map;
+};
+
+extern struct target_mem_desc *
+gomp_map_vars (struct gomp_device_descr *devicep,
+	       struct gomp_memory_mapping *mm, size_t mapnum,
+	       void **hostaddrs, void **devaddrs, size_t *sizes,
+	       void *kinds, bool is_openacc, bool is_target);
+
+extern void
+gomp_copy_from_async (struct target_mem_desc *tgt);
+
+extern void
+gomp_unmap_vars (struct target_mem_desc *tgt, bool);
+
+extern attribute_hidden void
+gomp_init_device (struct gomp_device_descr *devicep);
+
+extern attribute_hidden void
+gomp_fini_device (struct gomp_device_descr *devicep);
+
+#endif /* _TARGET_H */
diff --git a/libgomp/testsuite/Makefile.in b/libgomp/testsuite/Makefile.in
index 5273eaa..77b365e 100644
--- a/libgomp/testsuite/Makefile.in
+++ b/libgomp/testsuite/Makefile.in
@@ -129,6 +129,10 @@ PACKAGE_URL = @PACKAGE_URL@
 PACKAGE_VERSION = @PACKAGE_VERSION@
 PATH_SEPARATOR = @PATH_SEPARATOR@
 PERL = @PERL@
+PLUGIN_NVPTX = @PLUGIN_NVPTX@
+PLUGIN_NVPTX_CPPFLAGS = @PLUGIN_NVPTX_CPPFLAGS@
+PLUGIN_NVPTX_LDFLAGS = @PLUGIN_NVPTX_LDFLAGS@
+PLUGIN_NVPTX_LIBS = @PLUGIN_NVPTX_LIBS@
 RANLIB = @RANLIB@
 SECTION_LDFLAGS = @SECTION_LDFLAGS@
 SED = @SED@

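The deferred clean-up implemented by gomp_copy_from_async and gomp_unmap_vars depends on two counters per mapping. The following is a minimal, simplified sketch of that per-key decision logic, not the libgomp code itself. struct mapping, copy_from_async and unmap are hypothetical names, the splay tree is replaced by a "removed" flag, and device-to-host copies are only counted, not performed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for libgomp's splay_tree_key_s, reduced to the
   fields that decide what happens when a region ends.  */
struct mapping
{
  unsigned refcount;       /* Ordinary references to this mapping.  */
  unsigned async_refcount; /* References deferred by a queued copy-back.  */
  bool copy_from;          /* Copy device data back to the host on release?  */
  bool removed;            /* Stands in for splay_tree_remove.  */
  int copies;              /* Counts simulated device-to-host copies.  */
};

/* Per-key logic of gomp_copy_from_async: a key still shared with another
   region trades one ordinary reference for an asynchronous one; a key on its
   last reference gets its copy-back issued now.  */
static void
copy_from_async (struct mapping *k)
{
  assert (k->refcount >= 1);
  if (k->refcount > 1)
    {
      k->refcount--;
      k->async_refcount++;
    }
  else if (k->copy_from)
    k->copies++;  /* The real code calls device_dev2host_func here.  */
}

/* Per-key logic of gomp_unmap_vars (tgt, do_copyfrom).  */
static void
unmap (struct mapping *k, bool do_copyfrom)
{
  assert (k->refcount >= 1);
  if (k->refcount > 1)
    k->refcount--;
  else if (k->async_refcount > 0)
    k->async_refcount--;
  else
    {
      if (k->copy_from && do_copyfrom)
        k->copies++;
      k->removed = true;  /* The real code calls splay_tree_remove.  */
    }
}
```

With these rules, a key holding its last reference is copied back by copy_from_async and then removed by unmap (k, false) without a second copy. A key that is still shared with an enclosing data region instead trades its ordinary reference for an asynchronous one, which the later unmap drops, leaving the mapping live.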

* [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost)
  2014-09-23 18:20 [PATCH 7/10] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin Julian Brown
@ 2014-11-11 13:54 ` Julian Brown
  2014-11-12 10:10   ` Jakub Jelinek
                     ` (6 more replies)
  2014-12-22 16:41 ` [PATCH 7/10] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin Thomas Schwinge
                   ` (4 subsequent siblings)
  5 siblings, 7 replies; 36+ messages in thread
From: Julian Brown @ 2014-11-11 13:54 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek, Thomas Schwinge, Ilya Verbin

[-- Attachment #1: Type: text/plain, Size: 7869 bytes --]

On Tue, 23 Sep 2014 19:19:31 +0100
Julian Brown <julian@codesourcery.com> wrote:

> This patch contains the bulk of the OpenACC 2.0 runtime support,
> building around, or on top of, the OpenMP 4.0 support (as previously
> posted or already extant upstream) where we could. [...]

Here is a new version of the OpenACC support patch for libgomp, rebased
on top of a version of Ilya Verbin's patches that I merged to a local
clone of trunk, and tested as far as possible without the
middle/front-end pieces, since those are not ready yet. This patch
brings the OpenACC support in libgomp up to date with the various fixes
that I have been making on the gomp4 branch. In particular, I believe
all of Jakub's earlier comments in the following email have been
addressed:

https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02095.html

Since Ilya's most recently posted patches, the OpenMP and OpenACC parts
of libgomp have diverged somewhat in how they handle multiple devices of
the same type. For OpenACC, this is mostly handled by the "open" and
"close" hooks (plus per-thread state that tracks the active device
number). For OpenMP, it is now handled by the "init" hook (which OpenACC
uses only for overall initialisation/shutdown) and by explicit target_id
parameters to several of the plugin hooks. This only matters for
hypothetical plugins that support multiple devices as well as both
OpenMP and OpenACC. No such plugins exist so far, but sooner or later we
may need to think about how to unify the divergent approaches to
multiple devices/multiple threads.

A few OpenMP tests fail with the new host_nonshm plugin (with failures
of the form "libgomp: Trying to update [0x605820..0x605824) object that
is not mapped"), probably because of middle-end bugs. I haven't
investigated those in detail.
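
To make that error concrete: gomp_update looks the requested host range up in the mapping splay tree and then requires the region it finds to cover the whole range. The sketch below models only those two checks. It uses a linear scan over a plain array instead of libgomp's splay tree, assumes a non-zero size (gomp_update skips zero-sized entries), and struct region, lookup and check_update are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of one mapped host range [start, end).  */
struct region
{
  uintptr_t start, end;
};

enum update_status { UPDATE_OK, UPDATE_NOT_MAPPED, UPDATE_PARTIAL };

/* Return the first region overlapping [start, end), or NULL.  A linear scan
   stands in for the splay-tree lookup used by libgomp.  */
static const struct region *
lookup (const struct region *map, size_t n, uintptr_t start, uintptr_t end)
{
  for (size_t i = 0; i < n; i++)
    if (start < map[i].end && map[i].start < end)
      return &map[i];
  return NULL;
}

/* Classify an update request the way gomp_update's checks do: it is only
   valid if the region found fully contains the requested range.  */
static enum update_status
check_update (const struct region *map, size_t n, uintptr_t start, size_t size)
{
  assert (size > 0);
  uintptr_t end = start + size;
  const struct region *r = lookup (map, n, start, end);
  if (r == NULL)
    return UPDATE_NOT_MAPPED;   /* "... object that is not mapped".  */
  if (r->start > start || r->end < end)
    return UPDATE_PARTIAL;      /* Overlaps, but only partly mapped.  */
  return UPDATE_OK;
}
```

With a single mapped region [0x1000..0x1010), updating 4 bytes at 0x1000 is accepted, and updating 8 bytes at 0x100c overlaps only partially and would hit the partial-mapping error. An address such as 0x605820 that overlaps no mapped region corresponds to the "not mapped" failure quoted above.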

OK for mainline?

Thanks,

Julian

ChangeLog

xxxx-xx-xx  Nathan Sidwell  <nathan@codesourcery.com>
	    James Norris  <jnorris@codesourcery.com>
	    Thomas Schwinge  <thomas@codesourcery.com>
	    Tom de Vries  <tom@codesourcery.com>
	    Julian Brown  <julian@codesourcery.com>
	    Bernd Schmidt  <bernds@codesourcery.com>
	    Cesar Philippidis  <cesar@codesourcery.com>

    include/
    * gomp-constants.h: New file.

    libgomp/
    * Makefile.am (search_path): Search in $(top_srcdir)/../include also.
    (libgomp_la_SOURCES): Add oacc-parallel.c, splay-tree.c,
    oacc-host.c, oacc-init.c, oacc-mem.c, oacc-async.c, oacc-plugin.c,
    oacc-cuda.c, libgomp-plugin.c.
    (plugin/Makefrag.am): Include.
    (libgomp_la_SOURCES): Add openacc.f90 if USE_FORTRAN is true.
    (nodist_libsubinclude_HEADERS): Add openacc.h, ../include/gomp-constants.h.
    (nodist_finclude_HEADERS): Add openacc_lib.h, openacc.f90, openacc.mod,
    openacc_kinds.mod.
    * configure.ac (plugin_support): Add check for accelerators if attempting
    to build plugins.
    (plugin/configfrag.ac): Include.
    (offload_targets): Add host_nonshm target by default, nvptx target
    conditionally if the corresponding offload target is enabled.
    (testsuite/libgomp-test-support.exp): Add to AC_CONFIG_FILES.
    * env.c (libgomp_target.h, oacc-int.h): Include.
    (goacc_notify_var, goacc_device_num, goacc_device_type): New globals.
    (goacc_parse_device_type): New function.
    (initialize_env): Parse GCC_ACC_NOTIFY, ACC_DEVICE_TYPE, ACC_DEVICE_NUM
    environment variables. Call ACC_runtime_initialize.
    * error.c (gomp_verror): Make global.
    (gomp_vfatal, gomp_vnotify, gomp_notify): New functions.
    (gomp_fatal): Use gomp_vfatal instead of gomp_verror.
    * libgomp.h (stdarg.h): Include.
    (struct gomp_memory_mapping): Forward declaration.
    (goacc_notify_var, goacc_device_num, goacc_device_type): Add extern
    declarations.
    (gomp_vnotify, gomp_notify, gomp_verror, gomp_vfatal): Add
    prototypes.
    (gomp_init_targets_once): Add prototype.
    * libgomp.map (OACC_2.0): New symbol version. Add public acc_*
    interface functions.
    (PLUGIN_1.0): New symbol version. Add gomp plugin interface functions.
    * libgomp_g.h (GOACC_data_start, GOACC_data_end, GOACC_kernels)
    (GOACC_parallel, GOACC_wait): Add prototypes.
    * libgomp_target.h (gomp-constants.h, splay-tree.h): Include.
    (offload_target_type): Set enumeration values from constants in
    gomp-constants.h. Add OFFLOAD_TARGET_TYPE_HOST_NONSHM and
    OFFLOAD_TARGET_TYPE_NVIDIA_PTX.
    (struct target_mem_desc): Move to here.
    (TARGET_CAP_SHARED_MEM, TARGET_CAP_NATIVE_EXEC, TARGET_CAP_OPENMP_400)
    (TARGET_CAP_OPENACC_200): Define macros.
    (struct gomp_memory_mapping): New.
    (struct ACC_dispatch_t): New.
    (struct gomp_device_descr): Move here. Add offload_regions_registered,
    openacc dispatch functions, target_data.
    (gomp_map_vars, gomp_copy_from_async, gomp_unmap_vars, gomp_init_device)
    (gomp_init_tables, gomp_fini_device, gomp_free_memmap): Add prototypes.
    * target.c (oacc-plugin.h, gomp-constants.h, oacc-int.h, stdio.h)
    (assert.h): Include.
    (splay_tree_node, splay_tree, splay_tree_key, target_mem_desc)
    (splay_tree_key_s, gomp_device_descr): Don't declare here.
    (splay_compare): Change linkage to hidden not static.
    (gomp_init_targets_once): New function.
    (gomp_get_num_devices): Use above.
    (get_kind): New function.
    (gomp_map_vars): Add is_openacc parameter. Change KINDS to void *. Use lock
    from memory map not device. Use macros from gomp-constants.h instead of
    hard-coded values. Support OpenACC-specific mappings.
    (gomp_copy_from_async): New function.
    (gomp_unmap_vars): Add DO_COPYFROM argument. Only copy memory
    back from device if it is true. Use lock from memory map not
    device.
    (gomp_update): Add is_openacc parameter. Use lock from memory map not
    device. Use macros from gomp-constants.h instead of hard-coded values.
    (gomp_register_image_for_device): Add forward declaration.
    (GOMP_offload_register): Check realloc result.
    (gomp_init_device): Change linkage to hidden not static.
    (gomp_init_tables, gomp_init_dev_tables, gomp_free_memmap)
    (gomp_fini_device): New functions.
    (GOMP_target): Adjust lazy initialization, check target
    capabilities for OpenMP 4.0 support. Update call to gomp_map_vars,
    gomp_unmap_vars.
    (GOMP_target_data): Adjust lazy initialization. Update call to
    gomp_map_vars.
    (GOMP_target_end_data): Update call to gomp_unmap_vars.
    (GOMP_target_update): Tweak lazy initialization. Add new args to
    gomp_update call.
    (gomp_load_plugin_for_device): Initialize get_name, get_caps, device_fini
    and OpenACC-specific plugin hooks.
    (gomp_register_images_for_device): Rename to...
    (gomp_register_image_for_device): This, and register a single
    device only, and only if it has not already had images
    registered.
    (gomp_find_available_plugins): Initialize OpenACC-specific bits, offload
    image registration, and other new device member data. Prefer device with
    TARGET_CAP_OPENMP_400 if more than one plugin is available.
    * libgomp-plugin.c: New file.
    * libgomp-plugin.h: New file.
    * oacc-async.c: New file.
    * oacc-cuda.c: New file.
    * oacc-host.c: New file.
    * oacc-init.c: New file.
    * oacc-int.h: New file.
    * oacc-mem.c: New file.
    * oacc-parallel.c: New file.
    * oacc-plugin.c: New file.
    * oacc-plugin.h: New file.
    * openacc.f90: New file.
    * openacc.h: New file.
    * openacc_lib.h: New file.
    * splay-tree.h: Move bulk of implementation to...
    * splay-tree.c: New file.
    * Makefile.in: Regenerate.
    * config.h.in: Regenerate.
    * configure: Regenerate.
    * plugin/Makefrag.am: New file.
    * plugin/configfrag.ac: New file.
    * plugin/plugin-host.c: New file.
    * plugin/plugin-nvptx.c: New file.
    * testsuite/libgomp-test-support.exp.in: New file.

[-- Attachment #2: 0001-OpenACC-support-for-libgomp-3.patch --]
[-- Type: text/x-patch, Size: 242180 bytes --]

commit ee628c1e4014164b184eabad53bace13472e0d19
Author: Julian Brown <julian@codesourcery.com>
Date:   Mon Sep 22 02:55:12 2014 -0700

    OpenACC support for libgomp.
    
    xxxx-xx-xx  Nathan Sidwell  <nathan@codesourcery.com>
    	    James Norris  <jnorris@codesourcery.com>
    	    Thomas Schwinge  <thomas@codesourcery.com>
    	    Tom de Vries  <tom@codesourcery.com>
    	    Julian Brown  <julian@codesourcery.com>
    	    Bernd Schmidt  <bernds@codesourcery.com>
    	    Cesar Philippidis  <cesar@codesourcery.com>
    
        include/
        * gomp-constants.h: New file.
    
        libgomp/
        * Makefile.am (search_path): Search in $(top_srcdir)/../include also.
        (libgomp_la_SOURCES): Add oacc-parallel.c, splay-tree.c,
        oacc-fortran.c, oacc-host.c, oacc-init.c, oacc-mem.c,
        oacc-async.c, oacc-plugin.c, oacc-cuda.c, libgomp-plugin.c.
        (plugin/Makefrag.am): Include.
        (libgomp_la_SOURCES): Add openacc.f90 if USE_FORTRAN is true.
        (nodist_libsubinclude_HEADERS): Add openacc.h, ../include/gomp-constants.h.
        (nodist_finclude_HEADERS): Add openacc_lib.h, openacc.f90, openacc.mod,
        openacc_kinds.mod.
        * configure.ac (plugin_support): Add check for accelerators if attempting
        to build plugins.
        (plugin/configfrag.ac): Include.
        (offload_targets): Add host_nonshm target by default, nvptx target
        conditionally if the corresponding offload target is enabled.
        (testsuite/libgomp-test-support.exp): Add to AC_CONFIG_FILES.
        * env.c (libgomp_target.h, oacc-int.h): Include.
        (goacc_notify_var, goacc_device_num, goacc_device_type): New globals.
        (goacc_parse_device_type): New function.
        (initialize_env): Parse GCC_ACC_NOTIFY, ACC_DEVICE_TYPE, ACC_DEVICE_NUM
        environment variables. Call ACC_runtime_initialize.
        * error.c (gomp_verror): Make global.
        (gomp_vfatal, gomp_vnotify, gomp_notify): New functions.
        (gomp_fatal): Use gomp_vfatal instead of gomp_verror.
        * libgomp.h (stdarg.h): Include.
        (struct gomp_memory_mapping): Forward declaration.
        (goacc_notify_var, goacc_device_num, goacc_device_type): Add extern
        declarations.
        (gomp_vnotify, gomp_notify, gomp_verror, gomp_vfatal): Add
        prototypes.
        (gomp_init_targets_once): Add prototype.
        * libgomp.map (OACC_2.0): New symbol version. Add public acc_*
        interface functions.
        (PLUGIN_1.0): New symbol version. Add gomp plugin interface functions.
        * libgomp_g.h (GOACC_data_start, GOACC_data_end, GOACC_kernels)
        (GOACC_parallel, GOACC_wait): Add prototypes.
        * libgomp_target.h (gomp-constants.h, splay-tree.h): Include.
        (offload_target_type): Set enumeration values from constants in
        gomp-constants.h. Add OFFLOAD_TARGET_TYPE_HOST_NONSHM and
        OFFLOAD_TARGET_TYPE_NVIDIA_PTX.
        (struct target_mem_desc): Move to here.
        (TARGET_CAP_SHARED_MEM, TARGET_CAP_NATIVE_EXEC, TARGET_CAP_OPENMP_400)
        (TARGET_CAP_OPENACC_200): Define macros.
        (struct gomp_memory_mapping): New.
        (struct ACC_dispatch_t): New.
        (struct gomp_device_descr): Move here. Add offload_regions_registered,
        openacc dispatch functions, target_data.
        (gomp_map_vars, gomp_copy_from_async, gomp_unmap_vars, gomp_init_device)
        (gomp_init_tables, gomp_fini_device, gomp_free_memmap): Add prototypes.
        * target.c (oacc-plugin.h, gomp-constants.h, oacc-int.h, stdio.h)
        (assert.h): Include.
        (splay_tree_node, splay_tree, splay_tree_key, target_mem_desc)
        (splay_tree_key_s, gomp_device_descr): Don't declare here.
        (splay_compare): Change linkage to hidden not static.
        (gomp_init_targets_once): New function.
        (gomp_get_num_devices): Use above.
        (get_kind): New function.
        (gomp_map_vars): Add is_openacc parameter. Change KINDS to void *. Use lock
        from memory map not device. Use macros from gomp-constants.h instead of
        hard-coded values. Support OpenACC-specific mappings.
        (gomp_copy_from_async): New function.
        (gomp_unmap_vars): Add DO_COPYFROM argument. Only copy memory
        back from device if it is true. Use lock from memory map not
        device.
        (gomp_update): Add is_openacc parameter. Use lock from memory map not
        device. Use macros from gomp-constants.h instead of hard-coded values.
        (gomp_register_image_for_device): Add forward declaration.
        (GOMP_offload_register): Check realloc result.
        (gomp_init_device): Change linkage to hidden not static.
        (gomp_init_tables, gomp_init_dev_tables, gomp_free_memmap)
        (gomp_fini_device): New functions.
        (GOMP_target): Adjust lazy initialization, check target
        capabilities for OpenMP 4.0 support. Update call to gomp_map_vars,
        gomp_unmap_vars.
        (GOMP_target_data): Adjust lazy initialization. Update call to
        gomp_map_vars.
        (GOMP_target_end_data): Update call to gomp_unmap_vars.
        (GOMP_target_update): Tweak lazy initialization. Add new args to
        gomp_update call.
        (gomp_load_plugin_for_device): Initialize get_name, get_caps, device_fini
        and OpenACC-specific plugin hooks.
        (gomp_register_images_for_device): Rename to...
        (gomp_register_image_for_device): This, and register a single
        device only, and only if it has not already had images
        registered.
        (gomp_find_available_plugins): Initialize OpenACC-specific bits, offload
        image registration, and other new device member data. Prefer device with
        TARGET_CAP_OPENMP_400 if more than one plugin is available.
        * libgomp-plugin.c: New file.
        * libgomp-plugin.h: New file.
        * oacc-async.c: New file.
        * oacc-cuda.c: New file.
        * oacc-fortran.c: New file.
        * oacc-host.c: New file.
        * oacc-init.c: New file.
        * oacc-int.h: New file.
        * oacc-mem.c: New file.
        * oacc-parallel.c: New file.
        * oacc-plugin.c: New file.
        * oacc-plugin.h: New file.
        * openacc.f90: New file.
        * openacc.h: New file.
        * openacc_lib.h: New file.
        * splay-tree.h: Move bulk of implementation to...
        * splay-tree.c: New file.
        * Makefile.in: Regenerate.
        * config.h.in: Regenerate.
        * configure: Regenerate.
        * plugin/Makefrag.am: New file.
        * plugin/configfrag.ac: New file.
        * plugin/plugin-host.c: New file.
        * plugin/plugin-nvptx.c: New file.
        * testsuite/libgomp-test-support.exp.in: New file.

diff --git a/include/gomp-constants.h b/include/gomp-constants.h
new file mode 100644
index 0000000..7ef5c88
--- /dev/null
+++ b/include/gomp-constants.h
@@ -0,0 +1,45 @@
+#ifndef GOMP_CONSTANTS_H
+#define GOMP_CONSTANTS_H 1
+
+/* Enumerated variable mapping types used to communicate between GCC and
+   libgomp.  These values are used for both OpenMP and OpenACC.  */
+
+#define GOMP_MAP_ALLOC			0x00
+#define GOMP_MAP_ALLOC_TO		0x01
+#define GOMP_MAP_ALLOC_FROM		0x02
+#define GOMP_MAP_ALLOC_TOFROM		0x03
+#define GOMP_MAP_POINTER		0x04
+#define GOMP_MAP_TO_PSET		0x05
+#define GOMP_MAP_FORCE_ALLOC		0x08
+#define GOMP_MAP_FORCE_TO		0x09
+#define GOMP_MAP_FORCE_FROM		0x0a
+#define GOMP_MAP_FORCE_TOFROM		0x0b
+#define GOMP_MAP_FORCE_PRESENT		0x0c
+#define GOMP_MAP_FORCE_DEALLOC		0x0d
+#define GOMP_MAP_FORCE_DEVICEPTR	0x0e
+#define GOMP_MAP_FORCE_PRIVATE		0x18
+#define GOMP_MAP_FORCE_FIRSTPRIVATE	0x19
+
+#define GOMP_MAP_COPYTO_P(X) \
+  ((X) == GOMP_MAP_ALLOC_TO || (X) == GOMP_MAP_FORCE_TO)
+
+#define GOMP_MAP_COPYFROM_P(X) \
+  ((X) == GOMP_MAP_ALLOC_FROM || (X) == GOMP_MAP_FORCE_FROM)
+
+#define GOMP_MAP_TOFROM_P(X) \
+  ((X) == GOMP_MAP_ALLOC_TOFROM || (X) == GOMP_MAP_FORCE_TOFROM)
+
+#define GOMP_MAP_POINTER_P(X) \
+  ((X) == GOMP_MAP_POINTER)
+
+#define GOMP_IF_CLAUSE_FALSE		-2
+
+/* Canonical list of target type codes for OpenMP/OpenACC.  */
+#define GOMP_TARGET_NONE		0
+#define GOMP_TARGET_HOST		2
+#define GOMP_TARGET_HOST_NONSHM		3
+#define GOMP_TARGET_NOT_HOST		4
+#define GOMP_TARGET_NVIDIA_PTX		5
+#define GOMP_TARGET_INTEL_MIC		6
+
+#endif
diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index 427415e..4c73c7a 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -7,7 +7,8 @@ SUBDIRS = testsuite
 gcc_version := $(shell cat $(top_srcdir)/../gcc/BASE-VER)
 
 config_path = @config_path@
-search_path = $(addprefix $(top_srcdir)/config/, $(config_path)) $(top_srcdir)
+search_path = $(addprefix $(top_srcdir)/config/, $(config_path)) $(top_srcdir) \
+	      $(top_srcdir)/../include
 
 fincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/finclude
 libsubincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include
@@ -60,12 +61,21 @@ libgomp_la_LINK = $(LINK) $(libgomp_la_LDFLAGS)
 libgomp_la_SOURCES = alloc.c barrier.c critical.c env.c error.c iter.c \
 	iter_ull.c loop.c loop_ull.c ordered.c parallel.c sections.c single.c \
 	task.c team.c work.c lock.c mutex.c proc.c sem.c bar.c ptrlock.c \
-	time.c fortran.c affinity.c target.c
+	time.c fortran.c affinity.c target.c oacc-parallel.c splay-tree.c \
+	oacc-host.c oacc-init.c oacc-mem.c oacc-async.c oacc-plugin.c \
+	oacc-cuda.c libgomp-plugin.c
+
+include $(top_srcdir)/plugin/Makefrag.am
+
+if USE_FORTRAN
+libgomp_la_SOURCES += openacc.f90
+endif
 
 nodist_noinst_HEADERS = libgomp_f.h
-nodist_libsubinclude_HEADERS = omp.h
+nodist_libsubinclude_HEADERS = omp.h openacc.h ../include/gomp-constants.h
 if USE_FORTRAN
-nodist_finclude_HEADERS = omp_lib.h omp_lib.f90 omp_lib.mod omp_lib_kinds.mod
+nodist_finclude_HEADERS = omp_lib.h omp_lib.f90 omp_lib.mod omp_lib_kinds.mod \
+	openacc_lib.h openacc.f90 openacc.mod openacc_kinds.mod
 endif
 
 LTLDFLAGS = $(shell $(SHELL) $(top_srcdir)/../libtool-ldflags $(LDFLAGS))
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 5cd666f..88a4f46 100644
diff --git a/libgomp/config.h.in b/libgomp/config.h.in
index 94a2b3b..309962d 100644
diff --git a/libgomp/configure b/libgomp/configure
index 97c9be6..d325306 100755
diff --git a/libgomp/configure.ac b/libgomp/configure.ac
index 3f34ff8..2b701ca 100644
--- a/libgomp/configure.ac
+++ b/libgomp/configure.ac
@@ -2,6 +2,8 @@
 # aclocal -I ../config && autoconf && autoheader && automake
 
 AC_PREREQ(2.64)
+#TODO: Update for OpenACC?  But then also have to update copyright notices in
+#all source files...
 AC_INIT([GNU OpenMP Runtime Library], 1.0,,[libgomp])
 AC_CONFIG_HEADER(config.h)
 
@@ -28,7 +30,6 @@ LIBGOMP_ENABLE(generated-files-in-srcdir, no, ,
 AC_MSG_RESULT($enable_generated_files_in_srcdir)
 AM_CONDITIONAL(GENINSRC, test "$enable_generated_files_in_srcdir" = yes)
 
-
 # -------
 # -------
 
@@ -198,8 +199,12 @@ AC_CHECK_LIB(dl, dlsym, , [plugin_support=no])
 if test x"$plugin_support" = xyes; then
   AC_DEFINE(PLUGIN_SUPPORT, 1,
     [Define if all infrastructure, needed for plugins, is supported.])
+elif test "x$enable_accelerator" != xno; then
+  AC_MSG_ERROR([Can't have support for accelerators without support for plugins])
 fi
 
+m4_include([plugin/configfrag.ac])
+
 # Check for functions needed.
 AC_CHECK_FUNCS(getloadavg clock_gettime strtoull)
 
@@ -280,13 +285,15 @@ else
   multilib_arg=
 fi
 
-offload_targets=
+offload_targets=host_nonshm
 if test x"$enable_offload_targets" != x; then
   for tgt in `echo $enable_offload_targets | sed -e 's#,# #g'`; do
     tgt=`echo $tgt | sed 's/=.*//'`
     case $tgt in
       *-intelmic-* | *-intelmicemul-*)
 	tgt_name="intelmic" ;;
+      nvptx-*)
+	tgt_name="nvptx" ;;
       *)
 	AC_MSG_ERROR([unknown offload target specified]) ;;
     esac
@@ -374,4 +381,5 @@ CFLAGS="$save_CFLAGS"
 
 AC_CONFIG_FILES(omp.h omp_lib.h omp_lib.f90 libgomp_f.h)
 AC_CONFIG_FILES(Makefile testsuite/Makefile libgomp.spec)
+AC_CONFIG_FILES([testsuite/libgomp-test-support.exp])
 AC_OUTPUT
diff --git a/libgomp/env.c b/libgomp/env.c
index 94c72a3..26d2149 100644
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -27,6 +27,8 @@
 
 #include "libgomp.h"
 #include "libgomp_f.h"
+#include "libgomp_target.h"
+#include "oacc-int.h"
 #include <ctype.h>
 #include <stdlib.h>
 #include <stdio.h>
@@ -77,6 +79,10 @@ unsigned long gomp_bind_var_list_len;
 void **gomp_places_list;
 unsigned long gomp_places_list_len;
 
+int goacc_notify_var;
+int goacc_device_num;
+char *goacc_device_type;
+
 /* Parse the OMP_SCHEDULE environment variable.  */
 
 static void
@@ -1011,6 +1017,16 @@ parse_affinity (bool ignore)
   return false;
 }
 
+static void
+goacc_parse_device_type (void)
+{
+  const char *env = getenv ("ACC_DEVICE_TYPE");
+  
+  if (env && *env != '\0')
+    goacc_device_type = strdup (env);
+  else
+    goacc_device_type = NULL;
+}
 
 static void
 handle_omp_display_env (unsigned long stacksize, int wait_policy)
@@ -1181,6 +1197,7 @@ initialize_env (void)
       gomp_global_icv.thread_limit_var
 	= thread_limit_var > INT_MAX ? UINT_MAX : thread_limit_var;
     }
+  parse_int ("GCC_ACC_NOTIFY", &goacc_notify_var, true);
 #ifndef HAVE_SYNC_BUILTINS
   gomp_mutex_init (&gomp_managed_threads_lock);
 #endif
@@ -1271,6 +1288,15 @@ initialize_env (void)
     }
 
   handle_omp_display_env (stacksize, wait_policy);
+  
+  /* Look for OpenACC-specific environment variables.  */
+  if (!parse_int ("ACC_DEVICE_NUM", &goacc_device_num, true))
+    goacc_device_num = 0;
+
+  goacc_parse_device_type ();
+
+  /* Initialize OpenACC-specific internal state.  */
+  ACC_runtime_initialize ();
 }
 
 \f
diff --git a/libgomp/error.c b/libgomp/error.c
index d9b28f1..320b4d2 100644
--- a/libgomp/error.c
+++ b/libgomp/error.c
@@ -35,7 +35,7 @@
 #include <stdlib.h>
 
 
-static void
+void
 gomp_verror (const char *fmt, va_list list)
 {
   fputs ("\nlibgomp: ", stderr);
@@ -54,13 +54,39 @@ gomp_error (const char *fmt, ...)
 }
 
 void
+gomp_vfatal (const char *fmt, va_list list)
+{
+  gomp_verror (fmt, list);
+  exit (EXIT_FAILURE);
+}
+
+void
 gomp_fatal (const char *fmt, ...)
 {
   va_list list;
 
   va_start (list, fmt);
-  gomp_verror (fmt, list);
+  gomp_vfatal (fmt, list);
   va_end (list);
 
-  exit (EXIT_FAILURE);
+  /* Unreachable.  */
+  abort ();
+}
+
+void
+gomp_vnotify (const char *msg, va_list list)
+{
+  if (goacc_notify_var)
+    vfprintf (stderr, msg, list);
+}
+
+void
+gomp_notify (const char *msg, ...)
+{
+  va_list list;
+  
+  va_start (list, msg);
+  gomp_vnotify (msg, list);
+  va_end (list);
 }
+
diff --git a/libgomp/libgomp-plugin.c b/libgomp/libgomp-plugin.c
new file mode 100644
index 0000000..f0e35d6
--- /dev/null
+++ b/libgomp/libgomp-plugin.c
@@ -0,0 +1,107 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Exported (non-hidden) functions exposing libgomp interface for plugins.  */
+
+#include <stdlib.h>
+
+#include "libgomp.h"
+#include "libgomp-plugin.h"
+#include "target.h"
+
+void *
+GOMP_PLUGIN_malloc (size_t size)
+{
+  return gomp_malloc (size);
+}
+
+void *
+GOMP_PLUGIN_malloc_cleared (size_t size)
+{
+  return gomp_malloc_cleared (size);
+}
+
+void *
+GOMP_PLUGIN_realloc (void *ptr, size_t size)
+{
+  return gomp_realloc (ptr, size);
+}
+
+void
+GOMP_PLUGIN_error (const char *msg, ...)
+{
+  va_list ap;
+  
+  va_start (ap, msg);
+  gomp_verror (msg, ap);
+  va_end (ap);
+}
+
+void
+GOMP_PLUGIN_notify (const char *msg, ...)
+{
+  va_list ap;
+  
+  va_start (ap, msg);
+  gomp_vnotify (msg, ap);
+  va_end (ap);
+}
+
+void
+GOMP_PLUGIN_fatal (const char *msg, ...)
+{
+  va_list ap;
+  
+  va_start (ap, msg);
+  gomp_vfatal (msg, ap);
+  va_end (ap);
+  
+  /* Unreachable.  */
+  abort ();
+}
+
+void
+GOMP_PLUGIN_mutex_init (gomp_mutex_t *mutex)
+{
+  gomp_mutex_init (mutex);
+}
+
+void
+GOMP_PLUGIN_mutex_destroy (gomp_mutex_t *mutex)
+{
+  gomp_mutex_destroy (mutex);
+}
+
+void
+GOMP_PLUGIN_mutex_lock (gomp_mutex_t *mutex)
+{
+  gomp_mutex_lock (mutex);
+}
+
+void
+GOMP_PLUGIN_mutex_unlock (gomp_mutex_t *mutex)
+{
+  gomp_mutex_unlock (mutex);
+}
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
new file mode 100644
index 0000000..87367e3
--- /dev/null
+++ b/libgomp/libgomp-plugin.h
@@ -0,0 +1,54 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* An interface to various libgomp-internal functions for use by plugins.  */
+
+#ifndef LIBGOMP_PLUGIN_H
+#define LIBGOMP_PLUGIN_H 1
+
+#include "mutex.h"
+
+/* alloc.c */
+
+extern void *GOMP_PLUGIN_malloc (size_t) __attribute__((malloc));
+extern void *GOMP_PLUGIN_malloc_cleared (size_t) __attribute__((malloc));
+extern void *GOMP_PLUGIN_realloc (void *, size_t);
+
+/* error.c */
+
+extern void GOMP_PLUGIN_notify (const char *msg, ...);
+extern void GOMP_PLUGIN_error (const char *, ...)
+	__attribute__((format (printf, 1, 2)));
+extern void GOMP_PLUGIN_fatal (const char *, ...)
+	__attribute__((noreturn, format (printf, 1, 2)));
+
+/* mutex.c */
+
+extern void GOMP_PLUGIN_mutex_init (gomp_mutex_t *mutex);
+extern void GOMP_PLUGIN_mutex_destroy (gomp_mutex_t *mutex);
+extern void GOMP_PLUGIN_mutex_lock (gomp_mutex_t *mutex);
+extern void GOMP_PLUGIN_mutex_unlock (gomp_mutex_t *mutex);
+
+#endif
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index a1482cc..251e61b 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -40,6 +40,7 @@
 #include <pthread.h>
 #include <stdbool.h>
 #include <stdlib.h>
+#include <stdarg.h>
 
 #ifdef HAVE_ATTRIBUTE_VISIBILITY
 # pragma GCC visibility push(hidden)
@@ -220,6 +221,7 @@ struct gomp_team_state
 };
 
 struct target_mem_desc;
+struct gomp_memory_mapping;
 
 /* These are the OpenMP 4.0 Internal Control Variables described in
    section 2.3.1.  Those described as having one copy per task are
@@ -254,6 +256,10 @@ extern unsigned long gomp_bind_var_list_len;
 extern void **gomp_places_list;
 extern unsigned long gomp_places_list_len;
 
+extern int goacc_notify_var;
+extern int goacc_device_num;
+extern char *goacc_device_type;
+
 enum gomp_task_kind
 {
   GOMP_TASK_IMPLICIT,
@@ -532,8 +538,12 @@ extern void *gomp_realloc (void *, size_t);
 
 /* error.c */
 
+extern void gomp_vnotify (const char *, va_list);
+extern void gomp_notify (const char *msg, ...);
+extern void gomp_verror (const char *, va_list);
 extern void gomp_error (const char *, ...)
 	__attribute__((format (printf, 1, 2)));
+extern void gomp_vfatal (const char *, va_list);
 extern void gomp_fatal (const char *, ...)
 	__attribute__((noreturn, format (printf, 1, 2)));
 
@@ -606,6 +616,7 @@ extern void gomp_free_thread (void *);
 
 /* target.c */
 
+extern void gomp_init_targets_once (void);
 extern int gomp_get_num_devices (void);
 
 /* work.c */
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index f36df23..938f6bf 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -232,3 +232,115 @@ GOMP_4.0.1 {
   global:
 	GOMP_offload_register;
 } GOMP_4.0;
+
+OACC_2.0 {
+  global:
+	acc_get_num_devices;
+	acc_get_num_devices_h_;
+	acc_set_device_type;
+	acc_set_device_type_h_;
+	acc_get_device_type;
+	acc_get_device_type_h_;
+	acc_set_device_num;
+	acc_set_device_num_h_;
+	acc_get_device_num;
+	acc_get_device_num_h_;
+	acc_async_test;
+	acc_async_test_h_;
+	acc_async_test_all;
+	acc_async_test_all_h_;
+	acc_wait;
+	acc_wait_h_;
+	acc_wait_async;
+	acc_wait_async_h_;
+	acc_wait_all;
+	acc_wait_all_h_;
+	acc_wait_all_async;
+	acc_wait_all_async_h_;
+	acc_init;
+	acc_init_h_;
+	acc_shutdown;
+	acc_shutdown_h_;
+	acc_on_device;
+	acc_on_device_h_;
+	acc_malloc;
+	acc_free;
+	acc_copyin;
+	acc_copyin_32_h_;
+	acc_copyin_64_h_;
+	acc_copyin_array_h_;
+	acc_present_or_copyin;
+	acc_present_or_copyin_32_h_;
+	acc_present_or_copyin_64_h_;
+	acc_present_or_copyin_array_h_;
+	acc_create;
+	acc_create_32_h_;
+	acc_create_64_h_;
+	acc_create_array_h_;
+	acc_present_or_create;
+	acc_present_or_create_32_h_;
+	acc_present_or_create_64_h_;
+	acc_present_or_create_array_h_;
+	acc_copyout;
+	acc_copyout_32_h_;
+	acc_copyout_64_h_;
+	acc_copyout_array_h_;
+	acc_delete;
+	acc_delete_32_h_;
+	acc_delete_64_h_;
+	acc_delete_array_h_;
+	acc_update_device;
+	acc_update_device_32_h_;
+	acc_update_device_64_h_;
+	acc_update_device_array_h_;
+	acc_update_self;
+	acc_update_self_32_h_;
+	acc_update_self_64_h_;
+	acc_update_self_array_h_;
+	acc_map_data;
+	acc_unmap_data;
+	acc_deviceptr;
+	acc_hostptr;
+	acc_is_present;
+	acc_is_present_32_h_;
+	acc_is_present_64_h_;
+	acc_is_present_array_h_;
+	acc_memcpy_to_device;
+	acc_memcpy_from_device;
+	acc_get_current_cuda_device;
+	acc_get_current_cuda_context;
+	acc_get_cuda_stream;
+	acc_set_cuda_stream;
+};
+
+GOACC_2.0 {
+  global:
+	GOACC_data_end;
+	GOACC_data_start;
+	GOACC_kernels;
+	GOACC_parallel;
+	GOACC_update;
+	GOACC_wait;
+};
+
+PLUGIN_1.0 {
+  global:
+	GOMP_PLUGIN_malloc;
+	GOMP_PLUGIN_malloc_cleared;
+	GOMP_PLUGIN_realloc;
+	GOMP_PLUGIN_error;
+	GOMP_PLUGIN_notify;
+	GOMP_PLUGIN_fatal;
+	GOMP_PLUGIN_mutex_init;
+	GOMP_PLUGIN_mutex_destroy;
+	GOMP_PLUGIN_mutex_lock;
+	GOMP_PLUGIN_mutex_unlock;
+	GOMP_PLUGIN_async_unmap_vars;
+	GOMP_PLUGIN_acc_thread;
+};
+
+# TODO.  See testsuite/lib/libgomp.exp:libgomp_init.
+INTERNAL {
+  global:
+	initialize_env;
+};
diff --git a/libgomp/libgomp_g.h b/libgomp/libgomp_g.h
index be0c6ea..44f200c 100644
--- a/libgomp/libgomp_g.h
+++ b/libgomp/libgomp_g.h
@@ -214,4 +214,17 @@ extern void GOMP_target_update (int, const void *,
 				size_t, void **, size_t *, unsigned char *);
 extern void GOMP_teams (unsigned int, unsigned int);
 
+/* oacc-parallel.c */
+
+extern void GOACC_data_start (int, const void *,
+			      size_t, void **, size_t *, unsigned short *);
+extern void GOACC_data_end (void);
+extern void GOACC_kernels (int, void (*) (void *), const void *,
+			   size_t, void **, size_t *, unsigned short *,
+			   int, int, int, int, int, ...);
+extern void GOACC_parallel (int, void (*) (void *), const void *,
+			    size_t, void **, size_t *, unsigned short *,
+			    int, int, int, int, int, ...);
+extern void GOACC_wait (int, int, ...);
+
 #endif /* LIBGOMP_G_H */
diff --git a/libgomp/libgomp_target.h b/libgomp/libgomp_target.h
index f7d19d0..9d6b7e4 100644
--- a/libgomp/libgomp_target.h
+++ b/libgomp/libgomp_target.h
@@ -24,11 +24,15 @@
 #ifndef LIBGOMP_TARGET_H
 #define LIBGOMP_TARGET_H 1
 
-/* Type of offload target device.  */
+#include "gomp-constants.h"
+
+/* Type of offload target device.  Keep in sync with openacc.h:acc_device_t.  */
 enum offload_target_type
 {
-  OFFLOAD_TARGET_TYPE_HOST,
-  OFFLOAD_TARGET_TYPE_INTEL_MIC
+  OFFLOAD_TARGET_TYPE_HOST = GOMP_TARGET_HOST,
+  OFFLOAD_TARGET_TYPE_HOST_NONSHM = GOMP_TARGET_HOST_NONSHM,
+  OFFLOAD_TARGET_TYPE_NVIDIA_PTX = GOMP_TARGET_NVIDIA_PTX,
+  OFFLOAD_TARGET_TYPE_INTEL_MIC = GOMP_TARGET_INTEL_MIC
 };
 
 /* Auxiliary struct, used for transferring a host-target address range mapping
@@ -41,4 +45,177 @@ struct mapping_table
   uintptr_t tgt_end;
 };
 
+#include "splay-tree.h"
+
+struct target_mem_desc {
+  /* Reference count.  */
+  uintptr_t refcount;
+  /* All the splay nodes allocated together.  */
+  splay_tree_node array;
+  /* Start of the target region.  */
+  uintptr_t tgt_start;
+  /* End of the targer region.  */
+  uintptr_t tgt_end;
+  /* Handle to free.  */
+  void *to_free;
+  /* Previous target_mem_desc.  */
+  struct target_mem_desc *prev;
+  /* Number of items in following list.  */
+  size_t list_count;
+
+  /* Corresponding target device descriptor.  */
+  struct gomp_device_descr *device_descr;
+  
+  /* Memory mapping info for the thread that created this descriptor.  */
+  struct gomp_memory_mapping *mem_map;
+
+  /* List of splay keys to remove (or decrease refcount)
+     at the end of region.  */
+  splay_tree_key list[];
+};
+
+#define TARGET_CAP_SHARED_MEM	1
+#define TARGET_CAP_NATIVE_EXEC	2
+#define TARGET_CAP_OPENMP_400	4
+#define TARGET_CAP_OPENACC_200	8
+
+/* Information about mapped memory regions (per device/context).  */
+
+struct gomp_memory_mapping
+{
+  /* Splay tree containing information about mapped memory regions.  */
+  struct splay_tree_s splay_tree;
+
+  /* Mutex for operating with the splay tree and other shared structures.  */
+  gomp_mutex_t lock;
+  
+  /* True when tables have been added to this memory map.  */
+  bool is_initialized;
+};
+
+typedef struct ACC_dispatch_t
+{
+  /* This is a linked list of data mapped using the
+     acc_map_data/acc_unmap_data or "acc enter data"/"acc exit data" pragmas
+     (TODO).  Unlike mapped_data in the goacc_thread struct, unmapping can
+     happen out-of-order with respect to mapping.  */
+  struct target_mem_desc *data_environ;
+
+  /* Open or close a device instance.  */
+  void *(*open_device_func) (int n);
+  int (*close_device_func) (void *h);
+
+  /* Set or get the device number.  */
+  int (*get_device_num_func) (void);
+  void (*set_device_num_func) (int);
+
+  /* Execute.  */
+  void (*exec_func) (void (*) (void *), size_t, void **, void **, size_t *,
+		     unsigned short *, int, int, int, int, void *);
+
+  /* Async cleanup callback registration.  */
+  void (*register_async_cleanup_func) (void *);
+
+  /* Asynchronous routines.  */
+  int (*async_test_func) (int);
+  int (*async_test_all_func) (void);
+  void (*async_wait_func) (int);
+  void (*async_wait_async_func) (int, int);
+  void (*async_wait_all_func) (void);
+  void (*async_wait_all_async_func) (int);
+  void (*async_set_async_func) (int);
+
+  /* Create/destroy TLS data.  */
+  void *(*create_thread_data_func) (void *);
+  void (*destroy_thread_data_func) (void *);
+
+  /* NVIDIA target specific routines.  */
+  struct {
+    void *(*get_current_device_func) (void);
+    void *(*get_current_context_func) (void);
+    void *(*get_stream_func) (int);
+    int (*set_stream_func) (int, void *);
+  } cuda;
+} ACC_dispatch_t;
+
+/* This structure describes an accelerator device.
+   It contains the name of the corresponding libgomp plugin, function handlers
+   for interaction with the device, the ID number of the device, and
+   information about mapped memory.  */
+struct gomp_device_descr
+{
+  /* The name of the device.  */
+  const char *name;
+
+  /* Capabilities of the device (TARGET_CAP_* mask: OpenACC, OpenMP, ...).  */
+  unsigned int capabilities;
+
+  /* The ID number of the device.  It can be specified in the DEVICE clause
+     of a TARGET construct.  */
+  int id;
+
+  /* The ID number of the device among devices of the same type.  */
+  int target_id;
+
+  /* The type of the device.  */
+  enum offload_target_type type;
+
+  /* Set to true when the device is initialized.  */
+  bool is_initialized;
+
+  /* True when offload regions have been registered with this device.  */
+  bool offload_regions_registered;
+
+  /* Plugin file handler.  */
+  void *plugin_handle;
+
+  /* Function handlers.  */
+  const char *(*get_name_func) (void);
+  unsigned int (*get_caps_func) (void);
+  int (*get_type_func) (void);
+  int (*get_num_devices_func) (void);
+  void (*register_image_func) (void *, void *);
+  void (*init_device_func) (int);
+  void (*fini_device_func) (int);
+  int (*get_table_func) (int, struct mapping_table **);
+  void *(*alloc_func) (int, size_t);
+  void (*free_func) (int, void *);
+  void *(*dev2host_func) (int, void *, const void *, size_t);
+  void *(*host2dev_func) (int, void *, const void *, size_t);
+  void (*run_func) (int, void *, void *);
+
+  /* OpenACC-specific functions.  */
+  ACC_dispatch_t openacc;
+
+  /* Memory-mapping info for this device instance.  */
+  struct gomp_memory_mapping mem_map;
+
+  /* Extra information required for a device instance by a given target.  */
+  void *target_data;
+};
+
+extern struct target_mem_desc *
+gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
+	       void **hostaddrs, void **devaddrs, size_t *sizes, void *kinds,
+	       bool is_openacc, bool is_target);
+
+extern void
+gomp_copy_from_async (struct target_mem_desc *tgt);
+
+extern void
+gomp_unmap_vars (struct target_mem_desc *tgt, bool);
+
+extern attribute_hidden void
+gomp_init_device (struct gomp_device_descr *devicep);
+
+extern attribute_hidden void
+gomp_init_tables (const struct gomp_device_descr *devicep,
+		  struct gomp_memory_mapping *mm);
+
+extern attribute_hidden void
+gomp_fini_device (struct gomp_device_descr *devicep);
+
+extern attribute_hidden void
+gomp_free_memmap (struct gomp_device_descr *devicep);
+
 #endif /* LIBGOMP_TARGET_H */
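The capabilities field of gomp_device_descr holds a mask of the TARGET_CAP_* bits declared above. The following is a minimal standalone sketch of how such a mask can be queried. The helper names (caps_support_openacc, caps_need_data_copies, caps_have_all) are hypothetical and are not part of libgomp. Reading TARGET_CAP_SHARED_MEM as "no explicit copies needed" is an assumption about how the runtime might use that bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Capability bits, copied from libgomp_target.h as posted.  */
#define TARGET_CAP_SHARED_MEM	1
#define TARGET_CAP_NATIVE_EXEC	2
#define TARGET_CAP_OPENMP_400	4
#define TARGET_CAP_OPENACC_200	8

/* Hypothetical helper: does a device with these capabilities accept
   OpenACC 2.0 offload regions?  */
static bool
caps_support_openacc (unsigned int caps)
{
  return (caps & TARGET_CAP_OPENACC_200) != 0;
}

/* Hypothetical helper (assumption): a device that shares the host's
   memory would not need explicit host<->device copies when mapping.  */
static bool
caps_need_data_copies (unsigned int caps)
{
  return (caps & TARGET_CAP_SHARED_MEM) == 0;
}

/* Hypothetical helper: check that every bit in WANTED is present.  */
static bool
caps_have_all (unsigned int caps, unsigned int wanted)
{
  assert (wanted != 0);
  return (caps & wanted) == wanted;
}
```

Because each capability is a distinct power of two, a plugin's get_caps_func can report any combination of them by OR-ing the bits together.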
diff --git a/libgomp/oacc-async.c b/libgomp/oacc-async.c
new file mode 100644
index 0000000..94c62d8
--- /dev/null
+++ b/libgomp/oacc-async.c
@@ -0,0 +1,77 @@
+/* OpenACC Runtime Library Definitions.
+
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#include "openacc.h"
+#include "libgomp.h"
+#include "libgomp_target.h"
+#include "oacc-int.h"
+
+int
+acc_async_test (int async)
+{
+  if (async < acc_async_sync)
+    gomp_fatal ("invalid async argument: %d", async);
+
+  return base_dev->openacc.async_test_func (async);
+}
+
+int
+acc_async_test_all (void)
+{
+  return base_dev->openacc.async_test_all_func ();
+}
+
+void
+acc_wait (int async)
+{
+  if (async < acc_async_sync)
+    gomp_fatal ("invalid async argument: %d", async);
+
+  base_dev->openacc.async_wait_func (async);
+}
+
+void
+acc_wait_async (int async1, int async2)
+{
+  base_dev->openacc.async_wait_async_func (async1, async2);
+}
+
+void
+acc_wait_all (void)
+{
+  base_dev->openacc.async_wait_all_func ();
+}
+
+void
+acc_wait_all_async (int async)
+{
+  if (async < acc_async_sync)
+    gomp_fatal ("invalid async argument: %d", async);
+
+  base_dev->openacc.async_wait_all_async_func (async);
+}
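Each routine in oacc-async.c follows the same pattern. It optionally rejects an async argument below acc_async_sync, then forwards the call through a function pointer in base_dev->openacc. The sketch below reproduces that shape in isolation. The demo_* names and the fake plugin hook are hypothetical. The constants mirror the acc_async_noval (-1) and acc_async_sync (-2) values from openacc.h and are redefined locally so the example stands alone.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Values mirroring openacc.h, redefined so the sketch is self-contained.  */
enum { demo_async_noval = -1, demo_async_sync = -2 };

/* A cut-down, hypothetical version of ACC_dispatch_t holding only the hook
   used below.  */
struct demo_dispatch
{
  int (*async_test_func) (int);
};

/* Fake plugin hook: pretend queues 0..3 are idle and all others busy.  */
static int
demo_async_test (int async)
{
  return async >= 0 && async < 4;
}

static const struct demo_dispatch demo_dev = { demo_async_test };
static const struct demo_dispatch *demo_base_dev = &demo_dev;

/* Same shape as acc_async_test: reject arguments below acc_async_sync,
   then forward to the plugin through the dispatch table.  */
static int
demo_acc_async_test (int async)
{
  if (async < demo_async_sync)
    {
      fprintf (stderr, "invalid async argument: %d\n", async);
      abort ();
    }
  return demo_base_dev->async_test_func (async);
}
```

Keeping the validation in the generic layer means every plugin sees only in-range async values for the routines that check them.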
diff --git a/libgomp/oacc-cuda.c b/libgomp/oacc-cuda.c
new file mode 100644
index 0000000..9965d5c
--- /dev/null
+++ b/libgomp/oacc-cuda.c
@@ -0,0 +1,84 @@
+/* OpenACC Runtime Library: CUDA support glue.
+
+   Copyright (C) 2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "openacc.h"
+#include "config.h"
+#include "libgomp.h"
+#include "libgomp_target.h"
+#include "oacc-int.h"
+
+void *
+acc_get_current_cuda_device (void)
+{
+  void *p = NULL;
+
+  if (base_dev && base_dev->openacc.cuda.get_current_device_func)
+    p = base_dev->openacc.cuda.get_current_device_func ();
+
+  return p;
+}
+
+void *
+acc_get_current_cuda_context (void)
+{
+  void *p = NULL;
+
+  if (base_dev && base_dev->openacc.cuda.get_current_context_func)
+    p = base_dev->openacc.cuda.get_current_context_func ();
+
+  return p;
+}
+
+void *
+acc_get_cuda_stream (int async)
+{
+  void *p = NULL;
+
+  if (async < 0)
+    return p;
+
+  if (base_dev && base_dev->openacc.cuda.get_stream_func)
+    p = base_dev->openacc.cuda.get_stream_func (async);
+
+  return p;
+}
+
+int
+acc_set_cuda_stream (int async, void *stream)
+{
+  int s = -1;
+
+  if (async < 0 || stream == NULL)
+    return 0;
+
+  ACC_lazy_initialize ();
+
+  if (base_dev && base_dev->openacc.cuda.set_stream_func)
+    s = base_dev->openacc.cuda.set_stream_func (async, stream);
+
+  return s;
+}
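The CUDA glue routines treat the hooks in openacc.cuda as optional. If no device is active, or the active plugin does not provide the hook, they return NULL (or a failure code) instead of crashing. This lets the same entry points link and run on non-NVIDIA targets. The following sketch shows the guard used by acc_get_cuda_stream. The structure and function names are hypothetical, and the device is passed as a parameter here rather than read from the global base_dev.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the "cuda" sub-struct of ACC_dispatch_t.  */
struct demo_cuda_hooks
{
  void *(*get_stream_func) (int);
};

struct demo_device
{
  struct demo_cuda_hooks cuda;
};

static int demo_stream_token;

/* Fake plugin hook returning a dummy stream handle.  */
static void *
demo_get_stream (int async)
{
  (void) async;
  return &demo_stream_token;
}

/* Mirrors acc_get_cuda_stream: negative async values, a missing device
   and a missing hook all yield NULL rather than an error.  */
static void *
demo_acc_get_cuda_stream (const struct demo_device *dev, int async)
{
  if (async < 0)
    return NULL;
  if (dev && dev->cuda.get_stream_func)
    return dev->cuda.get_stream_func (async);
  return NULL;
}
```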
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
new file mode 100644
index 0000000..079ba3c
--- /dev/null
+++ b/libgomp/oacc-host.c
@@ -0,0 +1,30 @@
+/* OpenACC Runtime Library: acc_device_host, acc_device_host_nonshm.
+
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This shares much of the implementation of the plugin-host.c "host_nonshm"
+   plugin.  */
+#include "plugin/plugin-host.c"
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
new file mode 100644
index 0000000..12d52e7
--- /dev/null
+++ b/libgomp/oacc-init.c
@@ -0,0 +1,613 @@
+/* OpenACC Runtime initialization routines
+
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "libgomp.h"
+#include "libgomp_target.h"
+#include "oacc-int.h"
+#include "openacc.h"
+#include <assert.h>
+#include <stdlib.h>
+#include <strings.h>
+#include <stdbool.h>
+#include <stdio.h>
+
+static gomp_mutex_t acc_device_lock;
+
+/* The dispatch table for the current accelerator device.  This is global, so
+   you can only have one type of device open at any given time in a program.
+   This is the "base" device in that several devices that use the same
+   dispatch table may be active concurrently: this one (the "zeroth") is used
+   for overall initialisation/shutdown, and other instances -- not necessarily
+   including this one -- may be opened and closed once the base device has
+   been initialized.  */
+struct gomp_device_descr const *base_dev;
+
+#ifdef HAVE_TLS
+__thread struct goacc_thread *goacc_tls_data;
+#else
+pthread_key_t goacc_tls_key;
+#endif
+static pthread_key_t goacc_cleanup_key;
+
+/* Device type argument the current dispatcher was initialized with.  */
+static acc_device_t init_key = _ACC_device_hwm;
+
+static struct goacc_thread *goacc_threads;
+static gomp_mutex_t goacc_thread_lock;
+
+/* An array of dispatchers for device types, indexed by the type.  This array
+   only references "base" devices, and other instances of the same type are
+   found by simply indexing from each such device (which are stored linearly,
+   grouped by device in target.c:devices).  */
+static struct gomp_device_descr const *dispatchers[_ACC_device_hwm] = { 0 };
+
+void
+ACC_register (struct gomp_device_descr const *disp)
+{
+  /* Only register the 0th device here.  */
+  if (disp->target_id != 0)
+    return;
+
+  gomp_mutex_lock (&acc_device_lock);
+
+  assert (acc_device_type (disp->type) != acc_device_none
+	  && acc_device_type (disp->type) != acc_device_default
+	  && acc_device_type (disp->type) != acc_device_not_host);
+  assert (!dispatchers[disp->type]);
+  dispatchers[disp->type] = disp;
+
+  gomp_mutex_unlock (&acc_device_lock);
+}
+
+static struct gomp_device_descr const *
+resolve_device (acc_device_t d)
+{
+  acc_device_t d_arg = d;
+
+  switch (d)
+    {
+    case acc_device_default:
+      {
+	if (goacc_device_type)
+	  {
+	    /* Lookup the named device.  */
+	    while (++d != _ACC_device_hwm)
+	      if (dispatchers[d]
+		  && !strcasecmp (goacc_device_type, dispatchers[d]->name)
+		  && dispatchers[d]->get_num_devices_func () > 0)
+		goto found;
+
+	    gomp_fatal ("device type %s not supported", goacc_device_type);
+	  }
+
+	/* No default device specified, so start scanning for any non-host
+	   device that is available.  */
+	d = acc_device_not_host;
+      }
+      /* FALLTHROUGH */
+
+    case acc_device_not_host:
+      /* Find the first available device after acc_device_not_host.  */
+      while (++d != _ACC_device_hwm)
+	if (dispatchers[d] && dispatchers[d]->get_num_devices_func () > 0)
+	  goto found;
+      if (d_arg == acc_device_default)
+	{
+	  d = acc_device_host;
+	  goto found;
+	}
+      gomp_fatal ("no device found");
+      break;
+
+    case acc_device_host:
+      break;
+
+    default:
+      if (d >= _ACC_device_hwm)
+	gomp_fatal ("device %u out of range", (unsigned)d);
+      break;
+    }
+ found:
+
+  assert (d != acc_device_none
+	  && d != acc_device_default
+	  && d != acc_device_not_host);
+
+  return dispatchers[d];
+}
+
+/* This is called when plugins have been initialized, and serves to call
+   (indirectly) the target's device_init hook.  Calling multiple times without
+   an intervening _acc_shutdown call is an error.  */
+
+static struct gomp_device_descr const *
+_acc_init (acc_device_t d)
+{
+  struct gomp_device_descr const *acc_dev;
+
+  acc_dev = resolve_device (d);
+
+  if (!acc_dev || acc_dev->get_num_devices_func () <= 0)
+    gomp_fatal ("device %u not supported", (unsigned)d);
+
+  if (acc_dev->is_initialized)
+    gomp_fatal ("device already active");
+
+  /* We need to remember what we were initialized as, to check shutdown etc.  */
+  init_key = d;
+
+  gomp_init_device ((struct gomp_device_descr *) acc_dev);
+
+  return acc_dev;
+}
+
+static struct goacc_thread *
+goacc_new_thread (void)
+{
+  struct goacc_thread *thr = gomp_malloc (sizeof (struct goacc_thread));
+
+#ifdef HAVE_TLS
+  goacc_tls_data = thr;
+#else
+  pthread_setspecific (goacc_tls_key, thr);
+#endif
+
+  pthread_setspecific (goacc_cleanup_key, thr);
+
+  gomp_mutex_lock (&goacc_thread_lock);
+  thr->next = goacc_threads;
+  goacc_threads = thr;
+  gomp_mutex_unlock (&goacc_thread_lock);
+
+  return thr;
+}
+
+static void
+goacc_destroy_thread (void *data)
+{
+  struct goacc_thread *thr = data, *walk, *prev;
+
+  gomp_mutex_lock (&goacc_thread_lock);
+
+  if (thr)
+    {
+      if (base_dev && thr->target_tls)
+	{
+	  base_dev->openacc.destroy_thread_data_func (thr->target_tls);
+	  thr->target_tls = NULL;
+	}
+
+      assert (!thr->mapped_data);
+
+      /* Remove from thread list.  */
+      for (prev = NULL, walk = goacc_threads; walk;
+	   prev = walk, walk = walk->next)
+	if (walk == thr)
+	  {
+	    if (prev == NULL)
+	      goacc_threads = walk->next;
+	    else
+	      prev->next = walk->next;
+
+	    free (thr);
+
+	    break;
+	  }
+
+      assert (walk);
+    }
+
+  gomp_mutex_unlock (&goacc_thread_lock);
+}
+
+/* Open the ORD'th device of the currently-active type (base_dev must be
+   initialised before calling).  If ORD is < 0, open the default-numbered
+   device (set by the ACC_DEVICE_NUM environment variable or a call to
+   acc_set_device_num), or leave any currently-opened device as is.  "Opening"
+   consists of calling the device's open_device_func hook, and setting up
+   thread-local data (maybe allocating, then initializing with information
+   pertaining to the newly-opened or previously-opened device).  */
+
+static void
+lazy_open (int ord)
+{
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev;
+
+  if (thr && thr->dev)
+    {
+      assert (ord < 0 || ord == thr->dev->target_id);
+      return;
+    }
+
+  assert (base_dev);
+
+  if (ord < 0)
+    ord = goacc_device_num;
+
+  if (ord >= base_dev->get_num_devices_func ())
+    gomp_fatal ("device %u does not exist", ord);
+
+  if (!thr)
+    thr = goacc_new_thread ();
+
+  acc_dev = thr->dev = (struct gomp_device_descr *) &base_dev[ord];
+
+  assert (acc_dev->target_id == ord);
+
+  thr->saved_bound_dev = NULL;
+  thr->mapped_data = NULL;
+
+  if (!acc_dev->target_data)
+    acc_dev->target_data = acc_dev->openacc.open_device_func (ord);
+
+  thr->target_tls
+    = acc_dev->openacc.create_thread_data_func (acc_dev->target_data);
+
+  acc_dev->openacc.async_set_async_func (acc_async_sync);
+
+  if (!acc_dev->mem_map.is_initialized)
+    gomp_init_tables (acc_dev, &acc_dev->mem_map);
+}
+
+/* OpenACC 2.0a (3.2.12, 3.2.13) doesn't specify whether the serialization of
+   init/shutdown is per-process or per-thread.  We choose per-process.  */
+
+void
+acc_init (acc_device_t d)
+{
+  if (!base_dev)
+    gomp_init_targets_once ();
+
+  gomp_mutex_lock (&acc_device_lock);
+
+  base_dev = _acc_init (d);
+
+  lazy_open (-1);
+
+  gomp_mutex_unlock (&acc_device_lock);
+}
+
+ialias (acc_init)
+
+void
+_acc_shutdown (acc_device_t d)
+{
+  struct goacc_thread *walk;
+
+  /* We don't check whether d matches the actual device found, because
+     OpenACC 2.0 (3.2.12) says the parameters to the init and this
+     call must match (for the shutdown call anyway, it's silent on
+     others).  */
+
+  if (!base_dev)
+    gomp_fatal ("no device initialized");
+  if (d != init_key)
+    gomp_fatal ("device %u(%u) is initialized",
+		(unsigned) init_key, (unsigned) base_dev->type);
+
+  gomp_mutex_lock (&goacc_thread_lock);
+
+  /* Free target-specific TLS data and close all devices.  */
+  for (walk = goacc_threads; walk != NULL; walk = walk->next)
+    {
+      if (walk->target_tls)
+	base_dev->openacc.destroy_thread_data_func (walk->target_tls);
+
+      walk->target_tls = NULL;
+
+      /* This would mean the user is shutting down OpenACC in the middle of an
+         "acc data" pragma.  Likely not intentional.  */
+      if (walk->mapped_data)
+	gomp_fatal ("shutdown in 'acc data' region");
+
+      if (walk->dev)
+	{
+          if (walk->dev->openacc.close_device_func (walk->dev->target_data) < 0)
+	    gomp_fatal ("failed to close device");
+
+	  walk->dev->target_data = NULL;
+
+	  gomp_free_memmap (walk->dev);
+
+	  walk->dev = NULL;
+	}
+    }
+
+  gomp_mutex_unlock (&goacc_thread_lock);
+
+  gomp_fini_device ((struct gomp_device_descr *) base_dev);
+
+  base_dev = NULL;
+}
+
+void
+acc_shutdown (acc_device_t d)
+{
+  gomp_mutex_lock (&acc_device_lock);
+
+  _acc_shutdown (d);
+
+  gomp_mutex_unlock (&acc_device_lock);
+}
+
+ialias (acc_shutdown)
+
+/* This function is called after plugins have been initialized.  It deals with
+   the "base" device, and is used to prepare the runtime for dealing with a
+   number of such devices (as implemented by some particular plugin).  If the
+   argument device type D matches a previous call to the function, return the
+   current base device, else shut the old device down and re-initialize with
+   the new device type.  */
+
+static struct gomp_device_descr const *
+lazy_init (acc_device_t d)
+{
+  if (base_dev)
+    {
+      /* Re-initializing the same device, do nothing.  */
+      if (d == init_key)
+	return base_dev;
+
+      _acc_shutdown (init_key);
+    }
+
+  assert (!base_dev);
+
+  return _acc_init (d);
+}
+
+/* Ensure that plugins are loaded, initialize and open the (default-numbered)
+   device.  */
+
+static void
+lazy_init_and_open (acc_device_t d)
+{
+  if (!base_dev)
+    gomp_init_targets_once ();
+
+  gomp_mutex_lock (&acc_device_lock);
+
+  base_dev = lazy_init (d);
+
+  lazy_open (-1);
+
+  gomp_mutex_unlock (&acc_device_lock);
+}
+
+int
+acc_get_num_devices (acc_device_t d)
+{
+  int n = 0;
+  struct gomp_device_descr const *acc_dev;
+
+  if (d == acc_device_none)
+    return 0;
+
+  if (!base_dev)
+    gomp_init_targets_once ();
+
+  acc_dev = resolve_device (d);
+  if (!acc_dev)
+    return 0;
+
+  n = acc_dev->get_num_devices_func ();
+  if (n < 0)
+    n = 0;
+
+  return n;
+}
+
+ialias (acc_get_num_devices)
+
+void
+acc_set_device_type (acc_device_t d)
+{
+  lazy_init_and_open (d);
+}
+
+ialias (acc_set_device_type)
+
+acc_device_t
+acc_get_device_type (void)
+{
+  acc_device_t res = acc_device_none;
+  const struct gomp_device_descr *dev;
+
+  if (base_dev)
+    res = acc_device_type (base_dev->type);
+  else
+    {
+      gomp_init_targets_once ();
+
+      dev = resolve_device (acc_device_default);
+      res = acc_device_type (dev->type);
+    }
+
+  assert (res != acc_device_default
+	  && res != acc_device_not_host);
+
+  return res;
+}
+
+ialias (acc_get_device_type)
+
+int
+acc_get_device_num (acc_device_t d)
+{
+  const struct gomp_device_descr *dev;
+  int num;
+
+  if (d >= _ACC_device_hwm)
+    gomp_fatal ("device %u out of range", (unsigned)d);
+
+  if (!base_dev)
+    gomp_init_targets_once ();
+
+  dev = resolve_device (d);
+  if (!dev)
+    gomp_fatal ("no devices of type %u", d);
+
+  /* We might not have called lazy_open for this host thread yet, in which case
+     the get_device_num_func hook will return -1.  */
+  num = dev->openacc.get_device_num_func ();
+  if (num < 0)
+    num = goacc_device_num;
+
+  return num;
+}
+
+ialias (acc_get_device_num)
+
+void
+acc_set_device_num (int n, acc_device_t d)
+{
+  const struct gomp_device_descr *dev;
+  int num_devices;
+
+  if (!base_dev)
+    gomp_init_targets_once ();
+
+  if ((int) d == 0)
+    {
+      int i;
+
+      /* A device setting of zero sets all device types on the system to use
+         the Nth instance of that device type.  Only attempt it for initialized
+	 devices though.  */
+      for (i = acc_device_not_host + 1; i < _ACC_device_hwm; i++)
+        {
+	  dev = resolve_device ((acc_device_t) i);
+	  if (dev && dev->is_initialized)
+	    dev->openacc.set_device_num_func (n);
+	}
+
+      /* ...and for future calls to acc_init/acc_set_device_type, etc.  */
+      goacc_device_num = n;
+    }
+  else
+    {
+      struct goacc_thread *thr = goacc_thread ();
+
+      gomp_mutex_lock (&acc_device_lock);
+
+      base_dev = lazy_init (d);
+
+      num_devices = base_dev->get_num_devices_func ();
+
+      if (n >= num_devices)
+        gomp_fatal ("device %u out of range", n);
+
+      /* If we're changing the device number, dissociate this thread from
+	 the device (but don't close the device, since it may be in use by
+	 other threads).  */
+      if (thr && thr->dev && n != thr->dev->target_id)
+	thr->dev = NULL;
+
+      lazy_open (n);
+
+      gomp_mutex_unlock (&acc_device_lock);
+    }
+}
+
+ialias (acc_set_device_num)
+
+int
+acc_on_device (acc_device_t dev)
+{
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (thr && thr->dev
+      && acc_device_type (thr->dev->type) == acc_device_host_nonshm)
+    return dev == acc_device_host_nonshm || dev == acc_device_not_host;
+
+  /* Just rely on the compiler builtin.  */
+  return __builtin_acc_on_device (dev);
+}
+ialias (acc_on_device)
+
+attribute_hidden void
+ACC_runtime_initialize (void)
+{
+  gomp_mutex_init (&acc_device_lock);
+
+#ifndef HAVE_TLS
+  pthread_key_create (&goacc_tls_key, NULL);
+#endif
+
+  pthread_key_create (&goacc_cleanup_key, goacc_destroy_thread);
+
+  base_dev = NULL;
+
+  goacc_threads = NULL;
+  gomp_mutex_init (&goacc_thread_lock);
+}
+
+/* Compiler helper functions */
+
+void
+ACC_save_and_set_bind (acc_device_t d)
+{
+  struct goacc_thread *thr = goacc_thread ();
+
+  assert (!thr->saved_bound_dev);
+
+  thr->saved_bound_dev = thr->dev;
+  thr->dev = (struct gomp_device_descr *) dispatchers[d];
+}
+
+void
+ACC_restore_bind (void)
+{
+  struct goacc_thread *thr = goacc_thread ();
+
+  thr->dev = thr->saved_bound_dev;
+  thr->saved_bound_dev = NULL;
+}
+
+/* This is called from any OpenACC support function that may need to implicitly
+   initialize the libgomp runtime.  On exit all such initialization will have
+   been done, and both the global base_dev and the per-host-thread device
+   pointer (goacc_thread ()->dev) will be valid.  */
+
+void
+ACC_lazy_initialize (void)
+{
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (thr && thr->dev)
+    return;
+
+  if (!base_dev)
+    lazy_init_and_open (acc_device_default);
+  else
+    {
+      gomp_mutex_lock (&acc_device_lock);
+      lazy_open (-1);
+      gomp_mutex_unlock (&acc_device_lock);
+    }
+}
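The device-selection order implemented by resolve_device can be hard to follow through its fall-through and goto. For acc_device_default, the order is as follows:

* If ACC_DEVICE_TYPE names a device, use the first registered type with that name (case-insensitive) that has at least one device.
* Otherwise, use the first available type after acc_device_not_host.
* Failing that, fall back to the host.

The sketch below restates that order with a plain return value. The enum, struct and function names are hypothetical. The ordering (accelerators after the not_host marker) is assumed to match acc_device_t, and DEMO_NONE is returned where the patch calls gomp_fatal. The range check from the default case is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical device enumeration, ordered like acc_device_t in this
   patch: real accelerators come after the not_host marker.  */
enum demo_dev
{
  DEMO_NONE, DEMO_DEFAULT, DEMO_HOST, DEMO_HOST_NONSHM, DEMO_NOT_HOST,
  DEMO_NVIDIA, DEMO_HWM
};

struct demo_disp
{
  const char *name;
  int num_devices;
};

/* NAME plays the role of ACC_DEVICE_TYPE (goacc_device_type).  Returns
   DEMO_NONE where the real code would call gomp_fatal.  */
static enum demo_dev
demo_resolve (const struct demo_disp *disp[DEMO_HWM], enum demo_dev d,
	      const char *name)
{
  enum demo_dev arg = d;

  if (d == DEMO_DEFAULT)
    {
      if (name)
	{
	  /* Look up the named device type.  */
	  for (d = DEMO_DEFAULT + 1; d != DEMO_HWM; d++)
	    if (disp[d] && !strcasecmp (name, disp[d]->name)
		&& disp[d]->num_devices > 0)
	      return d;
	  return DEMO_NONE;	/* "device type %s not supported".  */
	}
      d = DEMO_NOT_HOST;
    }

  if (d == DEMO_NOT_HOST)
    {
      /* First type after not_host that actually has devices.  */
      for (d = DEMO_NOT_HOST + 1; d != DEMO_HWM; d++)
	if (disp[d] && disp[d]->num_devices > 0)
	  return d;
      /* Only acc_device_default may fall back to the host.  */
      return arg == DEMO_DEFAULT ? DEMO_HOST : DEMO_NONE;
    }

  return d;
}
```

Note the asymmetry: an explicit acc_device_not_host request with no accelerator present is an error, while acc_device_default quietly falls back to the host.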
diff --git a/libgomp/oacc-int.h b/libgomp/oacc-int.h
new file mode 100644
index 0000000..aa955bd
--- /dev/null
+++ b/libgomp/oacc-int.h
@@ -0,0 +1,106 @@
+/* OpenACC Runtime - internal declarations
+
+   Copyright (C) 2005-2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file contains data types and function declarations that are not
+   part of the official OpenACC user interface.  There are declarations
+   in here that are part of the GNU OpenACC ABI, in that the compiler is
+   required to know about them and use them.
+
+   The convention is that the all-caps prefix "GOACC" is used to group items
+   that are part of the external ABI, and the lower-case prefix "goacc" is
+   used to group items that are completely private to the library.  */
+
+#ifndef _OACC_INT_H
+#define _OACC_INT_H 1
+
+#include "openacc.h"
+#include "config.h"
+#include <stddef.h>
+#include <stdbool.h>
+#include <stdarg.h>
+
+#ifdef HAVE_ATTRIBUTE_VISIBILITY
+# pragma GCC visibility push(hidden)
+#endif
+
+static inline enum acc_device_t
+acc_device_type (enum offload_target_type type)
+{
+  return (enum acc_device_t) type;
+}
+
+struct goacc_thread
+{
+  /* The device for the current thread.  */
+  struct gomp_device_descr *dev;
+  /* Device saved by ACC_save_and_set_bind, restored by ACC_restore_bind.  */
+  struct gomp_device_descr *saved_bound_dev;
+
+  /* This is a linked list of data mapped by the "acc data" pragma, following
+     strictly push/pop semantics according to lexical scope.  */
+  struct target_mem_desc *mapped_data;
+
+  /* These structures form a list: this is the next thread in that list.  */
+  struct goacc_thread *next;
+
+  /* Target-specific data (used by plugin).  */
+  void *target_tls;
+};
+
+#ifdef HAVE_TLS
+extern __thread struct goacc_thread *goacc_tls_data;
+static inline struct goacc_thread *
+goacc_thread (void)
+{
+  return goacc_tls_data;
+}
+#else
+extern pthread_key_t goacc_tls_key;
+static inline struct goacc_thread *
+goacc_thread (void)
+{
+  return pthread_getspecific (goacc_tls_key);
+}
+#endif
+
+struct gomp_device_descr;
+
+void ACC_register (struct gomp_device_descr const *) __GOACC_NOTHROW;
+
+/* Current dispatcher.  */
+extern struct gomp_device_descr const *base_dev;
+
+void ACC_runtime_initialize (void);
+void ACC_save_and_set_bind (acc_device_t);
+void ACC_restore_bind (void);
+void ACC_lazy_initialize (void);
+
+#ifdef HAVE_ATTRIBUTE_VISIBILITY
+# pragma GCC visibility pop
+#endif
+
+#endif /* _OACC_INT_H */
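When HAVE_TLS is not defined, goacc_thread () reads a pthread key. A second key (goacc_cleanup_key) has a destructor that tears down the per-thread record when the thread exits. The sketch below shows only that pthread-key path: lazy per-thread allocation plus a destructor. All names are hypothetical and the record is trimmed to a single field. With HAVE_TLS, the patch instead stores the pointer in a __thread variable.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down per-thread record (compare goacc_thread).  */
struct demo_thread
{
  int device_id;
};

static pthread_key_t demo_tls_key;

/* Key destructor, run at thread exit for non-NULL values; plays the role
   of goacc_destroy_thread.  */
static void
demo_destroy_thread (void *data)
{
  free (data);
}

/* One-time setup, like the pthread_key_create calls in
   ACC_runtime_initialize.  */
static void
demo_runtime_initialize (void)
{
  int rc = pthread_key_create (&demo_tls_key, demo_destroy_thread);
  assert (rc == 0);
  (void) rc;
}

/* The !HAVE_TLS flavour of goacc_thread ().  */
static struct demo_thread *
demo_thread (void)
{
  return pthread_getspecific (demo_tls_key);
}

/* Allocate this thread's record on first use, like goacc_new_thread.  */
static struct demo_thread *
demo_get_or_create (int device_id)
{
  struct demo_thread *thr = demo_thread ();

  if (!thr)
    {
      /* sizeof *thr always matches the pointed-to type.  */
      thr = malloc (sizeof *thr);
      assert (thr != NULL);
      thr->device_id = device_id;
      pthread_setspecific (demo_tls_key, thr);
    }
  return thr;
}

/* Thread body: the record created by the first call is returned again on
   later calls from the same thread.  */
static void *
demo_worker (void *arg)
{
  int id = *(const int *) arg;
  struct demo_thread *thr = demo_get_or_create (id);

  assert (demo_get_or_create (id + 100) == thr);
  return (void *) (intptr_t) thr->device_id;
}
```

Each thread sees only its own record. The main thread, which never calls demo_get_or_create, still gets NULL from demo_thread ().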
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
new file mode 100644
index 0000000..39e14a7
--- /dev/null
+++ b/libgomp/oacc-mem.c
@@ -0,0 +1,510 @@
+/* OpenACC Runtime memory-management routines
+
+   Copyright (C) 2013 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "openacc.h"
+#include "config.h"
+#include "libgomp.h"
+#include "gomp-constants.h"
+#include "libgomp_target.h"
+#include "oacc-int.h"
+#include <stdio.h>
+#include <stdint.h>
+#include <assert.h>
+
+#include "splay-tree.h"
+
+/* Return the block containing [H, H+S), or NULL if not contained.  */
+
+attribute_hidden splay_tree_key
+lookup_host (struct gomp_memory_mapping *mem_map, void *h, size_t s)
+{
+  struct splay_tree_key_s node;
+  splay_tree_key key;
+
+  node.host_start = (uintptr_t) h;
+  node.host_end = (uintptr_t) h + s;
+
+  gomp_mutex_lock (&mem_map->lock);
+
+  key = splay_tree_lookup (&mem_map->splay_tree, &node);
+
+  gomp_mutex_unlock (&mem_map->lock);
+
+  return key;
+}
+
+/* Return the block containing [D, D+S), or NULL if not contained.
+   The list isn't ordered by device address, so we have to iterate
+   over the whole array.  This is not expected to be a common
+   operation.  */
+
+static splay_tree_key
+lookup_dev (struct target_mem_desc *tgt, void *d, size_t s)
+{
+  int i;
+  struct target_mem_desc *t;
+  struct gomp_memory_mapping *mem_map;
+
+  if (!tgt)
+    return NULL;
+
+  mem_map = tgt->mem_map;
+
+  gomp_mutex_lock (&mem_map->lock);
+
+  for (t = tgt; t != NULL; t = t->prev)
+    {
+      if (t->tgt_start <= (uintptr_t) d && t->tgt_end >= (uintptr_t) d + s)
+        break;
+    }
+
+  gomp_mutex_unlock (&mem_map->lock);
+
+  if (!t)
+    return NULL;
+
+  for (i = 0; i < t->list_count; i++)
+    {
+      void * offset;
+
+      splay_tree_key k = &t->array[i].key;
+      offset = d - t->tgt_start + k->tgt_offset;
+
+      if (k->host_start + offset <= (void *) k->host_end)
+        return k;
+    }
+
+  return NULL;
+}
+
+/* OpenACC is silent on how memory exhaustion is indicated.  We return
+   NULL.  */
+
+void *
+acc_malloc (size_t s)
+{
+  if (!s)
+    return NULL;
+
+  ACC_lazy_initialize ();
+
+  struct goacc_thread *thr = goacc_thread ();
+
+  return base_dev->alloc_func (thr->dev->target_id, s);
+}
+
+/* OpenACC 2.0a (3.2.16) doesn't specify what to do in the event
+   the device address is mapped.  We choose to check whether it is
+   mapped, and if it is, to unmap it.  */
+void
+acc_free (void *d)
+{
+  splay_tree_key k;
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!d)
+    return;
+
+  /* We don't have to call lazy open here, as the ptr value must have
+     been returned by acc_malloc.  It's not permitted to pass NULL in
+     (unless you got that null from acc_malloc).  */
+  if ((k = lookup_dev (thr->dev->openacc.data_environ, d, 1)))
+    {
+      void *offset;
+
+      offset = d - k->tgt->tgt_start + k->tgt_offset;
+
+      acc_unmap_data ((void *)(k->host_start + offset));
+    }
+
+  base_dev->free_func (thr->dev->target_id, d);
+}
+
+void
+acc_memcpy_to_device (void *d, void *h, size_t s)
+{
+  /* No need to call lazy open here, as the device pointer must have
+     been obtained from a routine that did that.  */
+  struct goacc_thread *thr = goacc_thread ();
+
+  base_dev->host2dev_func (thr->dev->target_id, d, h, s);
+}
+
+void
+acc_memcpy_from_device (void *h, void *d, size_t s)
+{
+  /* No need to call lazy open here, as the device pointer must have
+     been obtained from a routine that did that.  */
+  struct goacc_thread *thr = goacc_thread ();
+
+  base_dev->dev2host_func (thr->dev->target_id, h, d, s);
+}
+
+/* Return the device pointer that corresponds to host data H.  Or NULL
+   if no mapping.  */
+
+void *
+acc_deviceptr (void *h)
+{
+  splay_tree_key n;
+  void *d;
+  void *offset;
+
+  ACC_lazy_initialize ();
+
+  struct goacc_thread *thr = goacc_thread ();
+
+  n = lookup_host (&thr->dev->mem_map, h, 1);
+
+  if (!n)
+    return NULL;
+
+  offset = h - n->host_start;
+
+  d = n->tgt->tgt_start + n->tgt_offset + offset;
+
+  return d;
+}
+
+/* Return the host pointer that corresponds to device data D.  Or NULL
+   if no mapping.  */
+
+void *
+acc_hostptr (void *d)
+{
+  splay_tree_key n;
+  void *h;
+  void *offset;
+
+  ACC_lazy_initialize ();
+
+  struct goacc_thread *thr = goacc_thread ();
+
+  n = lookup_dev (thr->dev->openacc.data_environ, d, 1);
+
+  if (!n)
+    return NULL;
+
+  offset = d - n->tgt->tgt_start + n->tgt_offset;
+
+  h = n->host_start + offset;
+
+  return h;
+}
+
+/* Return 1 if host data [H,+S] is present on the device.  */
+
+int
+acc_is_present (void *h, size_t s)
+{
+  splay_tree_key n;
+
+  if (!s || !h)
+    return 0;
+
+  ACC_lazy_initialize ();
+
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+
+  n = lookup_host (&acc_dev->mem_map, h, s);
+
+  if (n && ((uintptr_t)h < n->host_start
+	    || (uintptr_t)h + s > n->host_end
+	    || s > n->host_end - n->host_start))
+    n = NULL;
+
+  return n != NULL;
+}
+
+/* Create a mapping for host [H,+S] -> device [D,+S].  */
+
+void
+acc_map_data (void *h, void *d, size_t s)
+{
+  struct target_mem_desc *tgt;
+  size_t mapnum = 1;
+  void *hostaddrs = h;
+  void *devaddrs = d;
+  size_t sizes = s;
+  unsigned short kinds = GOMP_MAP_ALLOC;
+
+  ACC_lazy_initialize ();
+
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+
+  if (acc_dev->capabilities & TARGET_CAP_SHARED_MEM)
+    {
+      if (d != h)
+        gomp_fatal ("cannot map data on shared-memory system");
+
+      tgt = gomp_map_vars (NULL, 0, NULL, NULL, NULL, NULL, true, false);
+    }
+  else
+    {
+      /* Not shared memory: reject bad or overlapping mappings.  */
+
+      if (!d || !h || !s)
+	gomp_fatal ("[%p,+%d]->[%p,+%d] is a bad map",
+                    (void *)h, (int)s, (void *)d, (int)s);
+
+      if (lookup_host (&acc_dev->mem_map, h, s))
+	gomp_fatal ("host address [%p, +%d] is already mapped", (void *)h,
+		    (int)s);
+
+      if (lookup_dev (thr->dev->openacc.data_environ, d, s))
+	gomp_fatal ("device address [%p, +%d] is already mapped", (void *)d,
+		    (int)s);
+
+      tgt = gomp_map_vars (acc_dev, mapnum, &hostaddrs, &devaddrs, &sizes,
+			   &kinds, true, false);
+    }
+
+  tgt->prev = acc_dev->openacc.data_environ;
+  acc_dev->openacc.data_environ = tgt;
+}
+
+void
+acc_unmap_data (void *h)
+{
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+
+  /* No need to call lazy open, as the address must have been mapped.  */
+
+  size_t host_size;
+  splay_tree_key n = lookup_host (&acc_dev->mem_map, h, 1);
+  struct target_mem_desc *t;
+
+  if (!n)
+    gomp_fatal ("%p is not a mapped block", (void *)h);
+
+  host_size = n->host_end - n->host_start;
+
+  if (n->host_start != (uintptr_t) h)
+    gomp_fatal ("[%p,+%d] surrounds %p",
+		(void *) n->host_start, (int) host_size, (void *) h);
+
+  t = n->tgt;
+
+  if (t->refcount == 2)
+    {
+      struct target_mem_desc *tp;
+
+      /* This is the last reference, so pull the descriptor off the
+	 chain.  This prevents gomp_unmap_vars (via gomp_unmap_tgt) from
+	 freeing the device memory.  */
+      t->tgt_end = 0;
+      t->to_free = 0;
+
+      gomp_mutex_lock (&acc_dev->mem_map.lock);
+
+      for (tp = NULL, t = acc_dev->openacc.data_environ; t != NULL;
+	   tp = t, t = t->prev)
+        if (n->tgt == t)
+          {
+            if (tp)
+              tp->prev = t->prev;
+            else
+              acc_dev->openacc.data_environ = t->prev;
+
+            break;
+          }
+
+      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+    }
+
+  gomp_unmap_vars (t, true);
+}
+
+#define PCC_Present (1 << 0)
+#define PCC_Create (1 << 1)
+#define PCC_Copy (1 << 2)
+
+attribute_hidden void *
+present_create_copy (unsigned f, void *h, size_t s)
+{
+  void *d;
+  splay_tree_key n;
+
+  if (!h || !s)
+    gomp_fatal ("[%p,+%d] is a bad range", (void *)h, (int)s);
+
+  ACC_lazy_initialize ();
+
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+
+  n = lookup_host (&acc_dev->mem_map, h, s);
+  if (n)
+    {
+      /* Present.  */
+      d = (void *) (n->tgt->tgt_start + n->tgt_offset + h - n->host_start);
+
+      if (!(f & PCC_Present))
+        gomp_fatal ("[%p,+%d] already mapped to [%p,+%d]",
+		    (void *)h, (int)s, (void *)d, (int)s);
+      if ((h + s) > (void *)n->host_end)
+        gomp_fatal ("[%p,+%d] not mapped", (void *)h, (int)s);
+    }
+  else if (!(f & PCC_Create))
+    {
+      gomp_fatal ("[%p,+%d] not mapped", (void *)h, (int)s);
+    }
+  else
+    {
+      struct target_mem_desc *tgt;
+      size_t mapnum = 1;
+      unsigned short kinds;
+      void *hostaddrs = h;
+
+      if (f & PCC_Copy)
+        kinds = GOMP_MAP_ALLOC_TO;
+      else
+        kinds = GOMP_MAP_ALLOC;
+
+      tgt = gomp_map_vars (acc_dev, mapnum, &hostaddrs, NULL, &s, &kinds, true,
+			   false);
+
+      gomp_mutex_lock (&acc_dev->mem_map.lock);
+
+      d = tgt->to_free;
+      tgt->prev = acc_dev->openacc.data_environ;
+      acc_dev->openacc.data_environ = tgt;
+
+      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+    }
+
+  return d;
+}
+
+void *
+acc_create (void *h, size_t s)
+{
+  return present_create_copy (PCC_Create, h, s);
+}
+
+void *
+acc_copyin (void *h, size_t s)
+{
+  return present_create_copy (PCC_Create | PCC_Copy, h, s);
+}
+
+void *
+acc_present_or_create (void *h, size_t s)
+{
+  return present_create_copy (PCC_Present | PCC_Create, h, s);
+}
+
+void *
+acc_present_or_copyin (void *h, size_t s)
+{
+  return present_create_copy (PCC_Present | PCC_Create | PCC_Copy, h, s);
+}
+
+#define DC_Copyout (1 << 0)
+
+static void
+delete_copyout (unsigned f, void *h, size_t s)
+{
+  size_t host_size;
+  splay_tree_key n;
+  void *d;
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+
+  n = lookup_host (&acc_dev->mem_map, h, s);
+
+  /* No need to call lazy open, as the data must already have been
+     mapped.  */
+
+  if (!n)
+    gomp_fatal ("[%p,+%d] is not mapped", (void *)h, (int)s);
+
+  d = (void *) (n->tgt->tgt_start + n->tgt_offset);
+
+  host_size = n->host_end - n->host_start;
+
+  if (n->host_start != (uintptr_t) h || host_size != s)
+    gomp_fatal ("[%p,+%d] surrounds [%p,+%d]",
+		(void *) n->host_start, (int) host_size, (void *) h, (int) s);
+
+  if (f & DC_Copyout)
+    acc_dev->dev2host_func (acc_dev->target_id, h, d, s);
+
+  acc_unmap_data (h);
+
+  acc_dev->free_func (acc_dev->target_id, d);
+}
+
+void
+acc_delete (void *h, size_t s)
+{
+  delete_copyout (0, h, s);
+}
+
+void acc_copyout (void *h, size_t s)
+{
+  delete_copyout (DC_Copyout, h, s);
+}
+
+static void
+update_dev_host (int is_dev, void *h, size_t s)
+{
+  splay_tree_key n;
+  void *d;
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+
+  n = lookup_host (&acc_dev->mem_map, h, s);
+
+  /* No need to call lazy open, as the data must already have been
+     mapped.  */
+
+  if (!n)
+    gomp_fatal ("[%p,+%d] is not mapped", h, (int)s);
+
+  d = (void *) (n->tgt->tgt_start + n->tgt_offset + h - n->host_start);
+
+  if (is_dev)
+    acc_dev->host2dev_func (acc_dev->target_id, d, h, s);
+  else
+    acc_dev->dev2host_func (acc_dev->target_id, h, d, s);
+}
+
+void
+acc_update_device (void *h, size_t s)
+{
+  update_dev_host (1, h, s);
+}
+
+void
+acc_update_self (void *h, size_t s)
+{
+  update_dev_host (0, h, s);
+}
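A note on the address arithmetic in oacc-mem.c. acc_deviceptr translates a host address inside a mapped block with device = tgt_start + tgt_offset + (h - host_start). The reverse direction follows by solving that formula for h. acc_is_present additionally requires [h, h + s) to lie wholly inside the block. The sketch below is a minimal stand-alone model of that arithmetic. It assumes a single block, and the `struct mapping` type and the helper names are hypothetical stand-ins for the splay_tree_key_s / target_mem_desc fields, not libgomp API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the splay_tree_key_s / target_mem_desc
   fields that the translation needs.  */
struct mapping
{
  uintptr_t host_start;  /* Host block is [host_start, host_end).  */
  uintptr_t host_end;
  uintptr_t tgt_start;   /* Start of the device allocation.  */
  uintptr_t tgt_offset;  /* Offset of this block within that allocation.  */
};

/* Host -> device, the same arithmetic as acc_deviceptr.  */
static uintptr_t
host_to_dev (const struct mapping *m, uintptr_t h)
{
  return m->tgt_start + m->tgt_offset + (h - m->host_start);
}

/* Device -> host, obtained by inverting host_to_dev.  */
static uintptr_t
dev_to_host (const struct mapping *m, uintptr_t d)
{
  assert (d >= m->tgt_start + m->tgt_offset);
  return m->host_start + (d - m->tgt_start - m->tgt_offset);
}

/* Nonzero if [h, h + s) lies entirely inside the block, which is the
   condition acc_is_present checks after the splay-tree lookup.  */
static int
range_present (const struct mapping *m, uintptr_t h, size_t s)
{
  return s != 0 && h >= m->host_start && h + s <= m->host_end;
}
```

Take a block [0x1000, 0x1100) placed at tgt_start 0x80000 with tgt_offset 0x40. Host address 0x1010 maps to 0x80050, and dev_to_host maps 0x80050 back to 0x1010. Note that tgt_offset is subtracted, not added, on the way back.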
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
new file mode 100644
index 0000000..0611362
--- /dev/null
+++ b/libgomp/oacc-parallel.c
@@ -0,0 +1,390 @@
+/* Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file handles OpenACC constructs.  */
+
+#include "openacc.h"
+#include "libgomp.h"
+#include "libgomp_g.h"
+#include "gomp-constants.h"
+#include "libgomp_target.h"
+#include "oacc-int.h"
+#include <stdio.h>
+#include <string.h>
+#include <stdarg.h>
+#include <assert.h>
+#include <alloca.h>
+
+static void
+dump_var (char *s, size_t idx, void *hostaddr, size_t size, unsigned short kind)
+{
+  gomp_notify (" %2zu: %3s 0x%.2x -", idx, s, kind & 0xff);
+
+  switch (kind & 0xff)
+    {
+      case 0x00: gomp_notify (" ALLOC              "); break;
+      case 0x01: gomp_notify (" ALLOC TO           "); break;
+      case 0x02: gomp_notify (" ALLOC FROM         "); break;
+      case 0x03: gomp_notify (" ALLOC TOFROM       "); break;
+      case 0x04: gomp_notify (" POINTER            "); break;
+      case 0x05: gomp_notify (" TO_PSET            "); break;
+
+      case 0x08: gomp_notify (" FORCE_ALLOC        "); break;
+      case 0x09: gomp_notify (" FORCE_TO           "); break;
+      case 0x0a: gomp_notify (" FORCE_FROM         "); break;
+      case 0x0b: gomp_notify (" FORCE_TOFROM       "); break;
+      case 0x0c: gomp_notify (" FORCE_PRESENT      "); break;
+      case 0x0d: gomp_notify (" FORCE_DEALLOC      "); break;
+      case 0x0e: gomp_notify (" FORCE_DEVICEPTR    "); break;
+
+      case 0x18: gomp_notify (" FORCE_PRIVATE      "); break;
+      case 0x19: gomp_notify (" FORCE_FIRSTPRIVATE "); break;
+
+      case (unsigned char) -1: gomp_notify (" DUMMY              "); break;
+      default: gomp_notify (" UNKNOWN 0x%.2x       ", kind & 0xff);
+    }
+
+  gomp_notify ("- %d - %4d/0x%04x ", 1 << (kind >> 8), (int)size, (int)size);
+  gomp_notify ("- %p\n", hostaddr);
+
+  return;
+}
+
+/* Ensure that the target device for DEVICE_TYPE is initialized (and that
+   plugins have been loaded if appropriate).  On return, the current
+   thread's device (goacc_thread ()->dev) is set appropriately for the
+   given device type.  */
+
+attribute_hidden void
+select_acc_device (int device_type)
+{
+  ACC_lazy_initialize ();
+
+  if (device_type == GOMP_IF_CLAUSE_FALSE)
+    return;
+
+  if (device_type == acc_device_none)
+    device_type = acc_device_host;
+
+  if (device_type >= 0)
+    {
+      /* NOTE: this will go badly if the surrounding data environment is set up
+         to use a different device type.  We'll just have to trust that users
+	 know what they're doing...  */
+      acc_set_device_type (device_type);
+    }
+}
+
+void goacc_wait (int async, int num_waits, va_list ap);
+
+void
+GOACC_parallel (int device, void (*fn) (void *), const void *openmp_target,
+		size_t mapnum, void **hostaddrs, size_t *sizes,
+		unsigned short *kinds,
+		int num_gangs, int num_workers, int vector_length,
+		int async, int num_waits, ...)
+{
+  bool if_clause_condition_value = device != GOMP_IF_CLAUSE_FALSE;
+  va_list ap;
+  struct goacc_thread *thr;
+  struct gomp_device_descr *acc_dev;
+  struct target_mem_desc *tgt;
+  void **devaddrs;
+  unsigned int i;
+  struct splay_tree_key_s k;
+  splay_tree_key tgt_fn_key;
+  void (*tgt_fn);
+
+  if (num_gangs != 1)
+    gomp_fatal ("num_gangs (%d) different from one is not yet supported",
+		num_gangs);
+  if (num_workers != 1)
+    gomp_fatal ("num_workers (%d) different from one is not yet supported",
+		num_workers);
+
+  gomp_notify ("%s: mapnum=%zd, hostaddrs=%p, sizes=%p, kinds=%p, async=%d\n",
+	       __FUNCTION__, mapnum, hostaddrs, sizes, kinds, async);
+
+  select_acc_device (device);
+
+  thr = goacc_thread ();
+  acc_dev = thr->dev;
+
+  /* Host fallback if "if" clause is false or if the current device is set to
+     the host.  */
+  if (!if_clause_condition_value)
+    {
+      ACC_save_and_set_bind (acc_device_host);
+      fn (hostaddrs);
+      ACC_restore_bind ();
+      return;
+    }
+  else if (acc_device_type (acc_dev->type) == acc_device_host)
+    {
+      fn (hostaddrs);
+      return;
+    }
+
+  va_start (ap, num_waits);
+
+  if (num_waits > 0)
+    goacc_wait (async, num_waits, ap);
+
+  va_end (ap);
+
+  acc_dev->openacc.async_set_async_func (async);
+
+  if (!(acc_dev->capabilities & TARGET_CAP_NATIVE_EXEC))
+    {
+      k.host_start = (uintptr_t) fn;
+      k.host_end = k.host_start + 1;
+      gomp_mutex_lock (&acc_dev->mem_map.lock);
+      tgt_fn_key = splay_tree_lookup (&acc_dev->mem_map.splay_tree, &k);
+      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+
+      if (tgt_fn_key == NULL)
+	gomp_fatal ("target function wasn't mapped: perhaps -fopenacc was "
+		    "used without -flto?");
+
+      tgt_fn = (void (*)) tgt_fn_key->tgt->tgt_start;
+    }
+  else
+    tgt_fn = (void (*)) fn;
+
+  tgt = gomp_map_vars (acc_dev, mapnum, hostaddrs, NULL, sizes, kinds, true,
+		       false);
+
+  devaddrs = alloca (sizeof (void *) * mapnum);
+  for (i = 0; i < mapnum; i++)
+    devaddrs[i] = (void *) (tgt->list[i]->tgt->tgt_start
+			    + tgt->list[i]->tgt_offset);
+
+  acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs, sizes, kinds,
+			      num_gangs, num_workers, vector_length, async,
+			      tgt);
+
+  /* If running synchronously, unmap immediately.  */
+  if (async < acc_async_noval)
+    gomp_unmap_vars (tgt, true);
+  else
+    {
+      gomp_copy_from_async (tgt);
+      acc_dev->openacc.register_async_cleanup_func (tgt);
+    }
+
+  acc_dev->openacc.async_set_async_func (acc_async_sync);
+}
+
+void
+GOACC_data_start (int device, const void *openmp_target, size_t mapnum,
+		  void **hostaddrs, size_t *sizes, unsigned short *kinds)
+{
+  bool if_clause_condition_value = device != GOMP_IF_CLAUSE_FALSE;
+  struct target_mem_desc *tgt;
+
+  gomp_notify ("%s: mapnum=%zd, hostaddrs=%p, sizes=%p, kinds=%p\n",
+	       __FUNCTION__, mapnum, hostaddrs, sizes, kinds);
+
+  select_acc_device (device);
+
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+
+  /* Host fallback or 'do nothing'.  */
+  if ((acc_dev->capabilities & TARGET_CAP_SHARED_MEM)
+      || !if_clause_condition_value)
+    {
+      tgt = gomp_map_vars (NULL, 0, NULL, NULL, NULL, NULL, true, false);
+      tgt->prev = thr->mapped_data;
+      thr->mapped_data = tgt;
+
+      return;
+    }
+
+  gomp_notify ("  %s: prepare mappings\n", __FUNCTION__);
+  tgt = gomp_map_vars (acc_dev, mapnum, hostaddrs, NULL, sizes, kinds, true,
+		       false);
+  gomp_notify ("  %s: mappings prepared\n", __FUNCTION__);
+  tgt->prev = thr->mapped_data;
+  thr->mapped_data = tgt;
+}
+
+void
+GOACC_data_end (void)
+{
+  struct goacc_thread *thr = goacc_thread ();
+  struct target_mem_desc *tgt = thr->mapped_data;
+
+  gomp_notify ("  %s: restore mappings\n", __FUNCTION__);
+  thr->mapped_data = tgt->prev;
+  gomp_unmap_vars (tgt, true);
+  gomp_notify ("  %s: mappings restored\n", __FUNCTION__);
+}
+
+
+void
+GOACC_kernels (int device, void (*fn) (void *), const void *openmp_target,
+	       size_t mapnum, void **hostaddrs, size_t *sizes,
+	       unsigned short *kinds,
+	       int num_gangs, int num_workers, int vector_length,
+	       int async, int num_waits, ...)
+{
+  gomp_notify ("%s: mapnum=%zd, hostaddrs=%p, sizes=%p, kinds=%p\n",
+	       __FUNCTION__, mapnum, hostaddrs, sizes, kinds);
+
+  va_list ap;
+
+  select_acc_device (device);
+
+  va_start (ap, num_waits);
+
+  if (num_waits > 0)
+    goacc_wait (async, num_waits, ap);
+
+  va_end (ap);
+
+  GOACC_parallel (device, fn, openmp_target, mapnum, hostaddrs, sizes, kinds,
+		  num_gangs, num_workers, vector_length, async, 0);
+}
+
+void
+goacc_wait (int async, int num_waits, va_list ap)
+{
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+  int i;
+
+  assert (num_waits >= 0);
+
+  if (async == acc_async_sync && num_waits == 0)
+    {
+      acc_wait_all ();
+      return;
+    }
+
+  if (async == acc_async_sync && num_waits)
+    {
+      for (i = 0; i < num_waits; i++)
+        {
+          int qid = va_arg (ap, int);
+
+          if (acc_async_test (qid))
+            continue;
+
+          acc_wait (qid);
+        }
+      return;
+    }
+
+  if (async == acc_async_noval && num_waits == 0)
+    {
+      acc_dev->openacc.async_wait_all_async_func (acc_async_noval);
+      return;
+    }
+
+  for (i = 0; i < num_waits; i++)
+    {
+      int qid = va_arg (ap, int);
+
+      if (acc_async_test (qid))
+	continue;
+
+      /* If we're waiting on the same asynchronous queue as we're launching on,
+         the queue itself will order work as required, so there's no need to
+	 wait explicitly.  */
+      if (qid != async)
+	acc_dev->openacc.async_wait_async_func (qid, async);
+    }
+}
+
+void
+GOACC_update (int device, const void *openmp_target, size_t mapnum,
+	      void **hostaddrs, size_t *sizes, unsigned short *kinds,
+	      int async, int num_waits, ...)
+{
+  bool if_clause_condition_value = device != GOMP_IF_CLAUSE_FALSE;
+  size_t i;
+
+  select_acc_device (device);
+
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+
+  if ((acc_dev->capabilities & TARGET_CAP_SHARED_MEM)
+      || !if_clause_condition_value)
+    return;
+
+  if (num_waits > 0)
+    {
+      va_list ap;
+
+      va_start (ap, num_waits);
+
+      goacc_wait (async, num_waits, ap);
+
+      va_end (ap);
+    }
+
+  acc_dev->openacc.async_set_async_func (async);
+
+  for (i = 0; i < mapnum; ++i)
+    {
+      unsigned char kind = kinds[i] & 0xff;
+
+      dump_var ("UPD", i, hostaddrs[i], sizes[i], kinds[i]);
+
+      switch (kind)
+	{
+	case GOMP_MAP_POINTER:
+	  break;
+
+	case GOMP_MAP_FORCE_TO:
+	  acc_update_device (hostaddrs[i], sizes[i]);
+	  break;
+
+	case GOMP_MAP_FORCE_FROM:
+	  acc_update_self (hostaddrs[i], sizes[i]);
+	  break;
+
+	default:
+	  gomp_fatal ("GOACC_update: unhandled kind 0x%.2x", kind);
+	  break;
+	}
+    }
+
+  acc_dev->openacc.async_set_async_func (acc_async_sync);
+}
+
+void
+GOACC_wait (int async, int num_waits, ...)
+{
+  va_list ap;
+
+  va_start (ap, num_waits);
+
+  goacc_wait (async, num_waits, ap);
+
+  va_end (ap);
+}
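A note on the map-kind encoding used in oacc-parallel.c. GOACC_update takes the low byte of each 16-bit kind (kinds[i] & 0xff) as the GOMP_MAP_* kind. dump_var prints 1 << (kind >> 8), which reads as an alignment stored as a log2 in the high byte. The sketch below assumes that 8/8 split. It also shows why dump_var's kind parameter needs to be at least 16 bits wide: narrowing to unsigned char silently drops the alignment. make_kind is a hypothetical helper, not a libgomp function.

```c
#include <assert.h>
#include <stddef.h>

/* Low byte of a map kind: the mapping kind itself (GOMP_MAP_*).  */
static unsigned char
map_kind (unsigned short kind)
{
  return kind & 0xff;
}

/* High byte: log2 of the alignment, as printed by dump_var via
   1 << (kind >> 8).  */
static size_t
map_align (unsigned short kind)
{
  return (size_t) 1 << (kind >> 8);
}

/* Hypothetical helper: pack a kind and a log2 alignment together.  */
static unsigned short
make_kind (unsigned char k, unsigned log2_align)
{
  assert (log2_align < 8 * sizeof (size_t));
  return (unsigned short) ((log2_align << 8) | k);
}
```

For example, make_kind (0x09, 3) yields 0x0309: kind 0x09 (FORCE_TO in dump_var's table) with 8-byte alignment. Casting that value to unsigned char leaves 0x09, whose decoded alignment is 1.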
diff --git a/libgomp/oacc-plugin.c b/libgomp/oacc-plugin.c
new file mode 100644
index 0000000..357cb5f
--- /dev/null
+++ b/libgomp/oacc-plugin.c
@@ -0,0 +1,48 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Support routines that libgomp exports for use by OpenACC plugins.  */
+
+#include "libgomp.h"
+#include "oacc-plugin.h"
+#include "libgomp_target.h"
+#include "oacc-int.h"
+
+void
+GOMP_PLUGIN_async_unmap_vars (void *ptr)
+{
+  struct target_mem_desc *tgt = ptr;
+
+  gomp_unmap_vars (tgt, false);
+}
+
+/* Return the target-specific part of the TLS data for the current thread.  */
+
+void *
+GOMP_PLUGIN_acc_thread (void)
+{
+  struct goacc_thread *thr = goacc_thread ();
+  return thr ? thr->target_tls : NULL;
+}
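A note on GOMP_PLUGIN_acc_thread. It hands a plugin the target-specific slot of the calling thread's OpenACC state. For a thread the runtime has never set up, it returns NULL rather than dereferencing a missing record. Below is a minimal model of that pattern using C11 _Thread_local. It is a sketch under stated assumptions: struct acc_thread_rec, attach_thread and plugin_thread_data are hypothetical names, and the real goacc_thread () may be implemented differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct goacc_thread; only the field the
   accessor touches is modelled.  */
struct acc_thread_rec
{
  void *target_tls;  /* Plugin-owned per-thread data.  */
};

/* One record pointer per host thread; NULL until the runtime sets it up.  */
static _Thread_local struct acc_thread_rec *cur_thread;

/* Hypothetical: what the runtime might do when a thread first uses a
   device.  */
static void
attach_thread (struct acc_thread_rec *rec, void *plugin_data)
{
  assert (rec != NULL);
  rec->target_tls = plugin_data;
  cur_thread = rec;
}

/* Same shape as GOMP_PLUGIN_acc_thread: return the plugin's slot, or
   NULL for a thread the runtime has not set up.  */
static void *
plugin_thread_data (void)
{
  return cur_thread ? cur_thread->target_tls : NULL;
}
```

Because the pointer is thread-local, each host thread sees only its own record. The NULL check lets a plugin call the accessor safely from threads that never entered an OpenACC region.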
diff --git a/libgomp/oacc-plugin.h b/libgomp/oacc-plugin.h
new file mode 100644
index 0000000..d05a28f
--- /dev/null
+++ b/libgomp/oacc-plugin.h
@@ -0,0 +1,32 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _OACC_PLUGIN_H
+#define _OACC_PLUGIN_H 1
+
+extern void GOMP_PLUGIN_async_unmap_vars (void *ptr);
+extern void *GOMP_PLUGIN_acc_thread (void);
+
+#endif
diff --git a/libgomp/openacc.f90 b/libgomp/openacc.f90
new file mode 100644
index 0000000..e4d4d8f
--- /dev/null
+++ b/libgomp/openacc.f90
@@ -0,0 +1,953 @@
+!  OpenACC Runtime Library Definitions.
+
+!  Copyright (C) 2014 Free Software Foundation, Inc.
+
+!  Contributed by Tobias Burnus <burnus@net-b.de>
+!              and Mentor Embedded.
+
+!  This file is part of the GNU OpenMP Library (libgomp).
+
+!  Libgomp is free software; you can redistribute it and/or modify it
+!  under the terms of the GNU General Public License as published by
+!  the Free Software Foundation; either version 3, or (at your option)
+!  any later version.
+
+!  Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+!  WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+!  FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+!  more details.
+
+!  Under Section 7 of GPL version 3, you are granted additional
+!  permissions described in the GCC Runtime Library Exception, version
+!  3.1, as published by the Free Software Foundation.
+
+!  You should have received a copy of the GNU General Public License and
+!  a copy of the GCC Runtime Library Exception along with this program;
+!  see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+!  <http://www.gnu.org/licenses/>.
+
+module openacc_kinds
+  use iso_fortran_env, only: int32
+  implicit none
+
+  private :: int32
+  public :: acc_device_kind
+
+  integer, parameter :: acc_device_kind = int32
+
+  public :: acc_device_none, acc_device_default, acc_device_host
+  public :: acc_device_not_host, acc_device_nvidia
+
+  integer (acc_device_kind), parameter :: acc_device_none = 0
+  integer (acc_device_kind), parameter :: acc_device_default = 1
+  integer (acc_device_kind), parameter :: acc_device_host = 2
+  integer (acc_device_kind), parameter :: acc_device_host_nonshm = 3
+  integer (acc_device_kind), parameter :: acc_device_not_host = 4
+  integer (acc_device_kind), parameter :: acc_device_nvidia = 5
+
+  public :: acc_handle_kind
+
+  integer, parameter :: acc_handle_kind = int32
+
+  public :: acc_async_noval, acc_async_sync
+
+  integer (acc_handle_kind), parameter :: acc_async_noval = -1
+  integer (acc_handle_kind), parameter :: acc_async_sync = -2
+
+end module
+
+module openacc_internal
+  use openacc_kinds
+  implicit none
+
+  interface
+    function acc_get_num_devices_h (d)
+      import
+      integer acc_get_num_devices_h
+      integer (acc_device_kind) d
+    end function
+
+    subroutine acc_set_device_type_h (d)
+      import
+      integer (acc_device_kind) d
+    end subroutine
+
+    function acc_get_device_type_h ()
+      import
+      integer (acc_device_kind) acc_get_device_type_h
+    end function
+
+    subroutine acc_set_device_num_h (n, d)
+      import
+      integer n
+      integer (acc_device_kind) d
+    end subroutine
+
+    function acc_get_device_num_h (d)
+      import
+      integer acc_get_device_num_h
+      integer (acc_device_kind) d
+    end function
+
+    function acc_async_test_h (a)
+      logical acc_async_test_h
+      integer a
+    end function
+
+    function acc_async_test_all_h ()
+      logical acc_async_test_all_h
+    end function
+
+    subroutine acc_wait_h (a)
+      integer a
+    end subroutine
+
+    subroutine acc_wait_async_h (a1, a2)
+      integer a1, a2
+    end subroutine
+
+    subroutine acc_wait_all_h ()
+    end subroutine
+
+    subroutine acc_wait_all_async_h (a)
+      integer a
+    end subroutine
+
+    subroutine acc_init_h (d)
+      import
+      integer (acc_device_kind) d
+    end subroutine
+
+    subroutine acc_shutdown_h (d)
+      import
+      integer (acc_device_kind) d
+    end subroutine
+
+    function acc_on_device_h (d)
+      import
+      integer (acc_device_kind) d
+      logical acc_on_device_h
+    end function
+
+    subroutine acc_copyin_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_copyin_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_copyin_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_present_or_copyin_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_present_or_copyin_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_present_or_copyin_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_create_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_create_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_create_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_present_or_create_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_present_or_create_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_present_or_create_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_copyout_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_copyout_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_copyout_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_delete_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_delete_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_delete_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_update_device_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_update_device_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_update_device_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_update_self_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_update_self_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_update_self_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    function acc_is_present_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      logical acc_is_present_32_h
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end function
+
+    function acc_is_present_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      logical acc_is_present_64_h
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end function
+
+    function acc_is_present_array_h (a)
+      logical acc_is_present_array_h
+      type (*), dimension (..), contiguous :: a
+    end function
+  end interface
+
+  interface
+    function acc_get_num_devices_l (d) &
+        bind (C, name = "acc_get_num_devices")
+      use iso_c_binding, only: c_int
+      integer (c_int) :: acc_get_num_devices_l
+      integer (c_int), value :: d
+    end function
+
+    subroutine acc_set_device_type_l (d) &
+        bind (C, name = "acc_set_device_type")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: d
+    end subroutine
+
+    function acc_get_device_type_l () &
+        bind (C, name = "acc_get_device_type")
+      use iso_c_binding, only: c_int
+      integer (c_int) :: acc_get_device_type_l
+    end function
+
+    subroutine acc_set_device_num_l (n, d) &
+        bind (C, name = "acc_set_device_num")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: n, d
+    end subroutine
+
+    function acc_get_device_num_l (d) &
+        bind (C, name = "acc_get_device_num")
+      use iso_c_binding, only: c_int
+      integer (c_int) :: acc_get_device_num_l
+      integer (c_int), value :: d
+    end function
+
+    function acc_async_test_l (a) &
+        bind (C, name = "acc_async_test")
+      use iso_c_binding, only: c_int
+      integer (c_int) :: acc_async_test_l
+      integer (c_int), value :: a
+    end function
+
+    function acc_async_test_all_l () &
+        bind (C, name = "acc_async_test_all")
+      use iso_c_binding, only: c_int
+      integer (c_int) :: acc_async_test_all_l
+    end function
+
+    subroutine acc_wait_l (a) &
+        bind (C, name = "acc_wait")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: a
+    end subroutine
+
+    subroutine acc_wait_async_l (a1, a2) &
+        bind (C, name = "acc_wait_async")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: a1, a2
+    end subroutine
+
+    subroutine acc_wait_all_l () &
+        bind (C, name = "acc_wait_all")
+      use iso_c_binding, only: c_int
+    end subroutine
+
+    subroutine acc_wait_all_async_l (a) &
+        bind (C, name = "acc_wait_all_async")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: a
+    end subroutine
+
+    subroutine acc_init_l (d) &
+        bind (C, name = "acc_init")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: d
+    end subroutine
+
+    subroutine acc_shutdown_l (d) &
+        bind (C, name = "acc_shutdown")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: d
+    end subroutine
+
+    function acc_on_device_l (d) &
+        bind (C, name = "acc_on_device")
+      use iso_c_binding, only: c_int
+      integer (c_int) :: acc_on_device_l
+      integer (c_int), value :: d
+    end function
+
+    subroutine acc_copyin_l (a, len) &
+        bind (C, name = "acc_copyin")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_present_or_copyin_l (a, len) &
+        bind (C, name = "acc_present_or_copyin")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_create_l (a, len) &
+        bind (C, name = "acc_create")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_present_or_create_l (a, len) &
+        bind (C, name = "acc_present_or_create")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_copyout_l (a, len) &
+        bind (C, name = "acc_copyout")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_delete_l (a, len) &
+        bind (C, name = "acc_delete")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_update_device_l (a, len) &
+        bind (C, name = "acc_update_device")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_update_self_l (a, len) &
+        bind (C, name = "acc_update_self")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    function acc_is_present_l (a, len) &
+        bind (C, name = "acc_is_present")
+      use iso_c_binding, only: c_int32_t, c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      integer (c_int32_t) :: acc_is_present_l
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end function
+  end interface
+end module
+
+module openacc
+  use openacc_kinds
+  use openacc_internal
+  implicit none
+
+  public :: openacc_version
+
+  public :: acc_get_num_devices, acc_set_device_type, acc_get_device_type
+  public :: acc_set_device_num, acc_get_device_num, acc_async_test
+  public :: acc_async_test_all, acc_wait, acc_wait_async, acc_wait_all
+  public :: acc_wait_all_async, acc_init, acc_shutdown, acc_on_device
+  public :: acc_copyin, acc_present_or_copyin, acc_pcopyin, acc_create
+  public :: acc_present_or_create, acc_pcreate, acc_copyout, acc_delete
+  public :: acc_update_device, acc_update_self, acc_is_present
+
+  integer, parameter :: openacc_version = 201306
+
+  interface acc_get_num_devices
+    procedure :: acc_get_num_devices_h
+  end interface
+
+  interface acc_set_device_type
+    procedure :: acc_set_device_type_h
+  end interface
+
+  interface acc_get_device_type
+    procedure :: acc_get_device_type_h
+  end interface
+
+  interface acc_set_device_num
+    procedure :: acc_set_device_num_h
+  end interface
+
+  interface acc_get_device_num
+    procedure :: acc_get_device_num_h
+  end interface
+
+  interface acc_async_test
+    procedure :: acc_async_test_h
+  end interface
+
+  interface acc_async_test_all
+    procedure :: acc_async_test_all_h
+  end interface
+
+  interface acc_wait
+    procedure :: acc_wait_h
+  end interface
+
+  interface acc_wait_async
+    procedure :: acc_wait_async_h
+  end interface
+
+  interface acc_wait_all
+    procedure :: acc_wait_all_h
+  end interface
+
+  interface acc_wait_all_async
+    procedure :: acc_wait_all_async_h
+  end interface
+
+  interface acc_init
+    procedure :: acc_init_h
+  end interface
+
+  interface acc_shutdown
+    procedure :: acc_shutdown_h
+  end interface
+
+  interface acc_on_device
+    procedure :: acc_on_device_h
+  end interface
+
+  ! acc_malloc: Only available in C/C++
+  ! acc_free: Only available in C/C++
+
+  ! As a vendor extension, the following code supports both 32-bit and
+  ! 64-bit arguments for "size"; the OpenACC standard only permits
+  ! default-kind integers, which are of kind 4 (i.e. 32 bits).
+  ! Additionally, the two-argument version also accepts arrays, and the
+  ! one-argument version also accepts scalars.  Note that the code assumes
+  ! that the arrays are contiguous.
+
+  interface acc_copyin
+    procedure :: acc_copyin_32_h
+    procedure :: acc_copyin_64_h
+    procedure :: acc_copyin_array_h
+  end interface
+
+  interface acc_present_or_copyin
+    procedure :: acc_present_or_copyin_32_h
+    procedure :: acc_present_or_copyin_64_h
+    procedure :: acc_present_or_copyin_array_h
+  end interface
+
+  interface acc_pcopyin
+    procedure :: acc_present_or_copyin_32_h
+    procedure :: acc_present_or_copyin_64_h
+    procedure :: acc_present_or_copyin_array_h
+  end interface
+
+  interface acc_create
+    procedure :: acc_create_32_h
+    procedure :: acc_create_64_h
+    procedure :: acc_create_array_h
+  end interface
+
+  interface acc_present_or_create
+    procedure :: acc_present_or_create_32_h
+    procedure :: acc_present_or_create_64_h
+    procedure :: acc_present_or_create_array_h
+  end interface
+
+  interface acc_pcreate
+    procedure :: acc_present_or_create_32_h
+    procedure :: acc_present_or_create_64_h
+    procedure :: acc_present_or_create_array_h
+  end interface
+
+  interface acc_copyout
+    procedure :: acc_copyout_32_h
+    procedure :: acc_copyout_64_h
+    procedure :: acc_copyout_array_h
+  end interface
+
+  interface acc_delete
+    procedure :: acc_delete_32_h
+    procedure :: acc_delete_64_h
+    procedure :: acc_delete_array_h
+  end interface
+
+  interface acc_update_device
+    procedure :: acc_update_device_32_h
+    procedure :: acc_update_device_64_h
+    procedure :: acc_update_device_array_h
+  end interface
+
+  interface acc_update_self
+    procedure :: acc_update_self_32_h
+    procedure :: acc_update_self_64_h
+    procedure :: acc_update_self_array_h
+  end interface
+
+  ! acc_map_data: Only available in C/C++
+  ! acc_unmap_data: Only available in C/C++
+  ! acc_deviceptr: Only available in C/C++
+  ! acc_hostptr: Only available in C/C++
+
+  interface acc_is_present
+    procedure :: acc_is_present_32_h
+    procedure :: acc_is_present_64_h
+    procedure :: acc_is_present_array_h
+  end interface
+
+  ! acc_memcpy_to_device: Only available in C/C++
+  ! acc_memcpy_from_device: Only available in C/C++
+
+end module
+
+function acc_get_num_devices_h (d)
+  use openacc_internal, only: acc_get_num_devices_l
+  use openacc_kinds
+  integer acc_get_num_devices_h
+  integer (acc_device_kind) d
+  acc_get_num_devices_h = acc_get_num_devices_l (d)
+end function
+
+subroutine acc_set_device_type_h (d)
+  use openacc_internal, only: acc_set_device_type_l
+  use openacc_kinds
+  integer (acc_device_kind) d
+  call acc_set_device_type_l (d)
+end subroutine
+
+function acc_get_device_type_h ()
+  use openacc_internal, only: acc_get_device_type_l
+  use openacc_kinds
+  integer (acc_device_kind) acc_get_device_type_h
+  acc_get_device_type_h = acc_get_device_type_l ()
+end function
+
+subroutine acc_set_device_num_h (n, d)
+  use openacc_internal, only: acc_set_device_num_l
+  use openacc_kinds
+  integer n
+  integer (acc_device_kind) d
+  call acc_set_device_num_l (n, d)
+end subroutine
+
+function acc_get_device_num_h (d)
+  use openacc_internal, only: acc_get_device_num_l
+  use openacc_kinds
+  integer acc_get_device_num_h
+  integer (acc_device_kind) d
+  acc_get_device_num_h = acc_get_device_num_l (d)
+end function
+
+function acc_async_test_h (a)
+  use openacc_internal, only: acc_async_test_l
+  logical acc_async_test_h
+  integer a
+  if (acc_async_test_l (a) .eq. 1) then
+    acc_async_test_h = .TRUE.
+  else
+    acc_async_test_h = .FALSE.
+  end if
+end function
+
+function acc_async_test_all_h ()
+  use openacc_internal, only: acc_async_test_all_l
+  logical acc_async_test_all_h
+  if (acc_async_test_all_l () .eq. 1) then
+    acc_async_test_all_h = .TRUE.
+  else
+    acc_async_test_all_h = .FALSE.
+  end if
+end function
+
+subroutine acc_wait_h (a)
+  use openacc_internal, only: acc_wait_l
+  integer a
+  call acc_wait_l (a)
+end subroutine
+
+subroutine acc_wait_async_h (a1, a2)
+  use openacc_internal, only: acc_wait_async_l
+  integer a1, a2
+  call acc_wait_async_l (a1, a2)
+end subroutine
+
+subroutine acc_wait_all_h ()
+  use openacc_internal, only: acc_wait_all_l
+  call acc_wait_all_l ()
+end subroutine
+
+subroutine acc_wait_all_async_h (a)
+  use openacc_internal, only: acc_wait_all_async_l
+  integer a
+  call acc_wait_all_async_l (a)
+end subroutine
+
+subroutine acc_init_h (d)
+  use openacc_internal, only: acc_init_l
+  use openacc_kinds
+  integer (acc_device_kind) d
+  call acc_init_l (d)
+end subroutine
+
+subroutine acc_shutdown_h (d)
+  use openacc_internal, only: acc_shutdown_l
+  use openacc_kinds
+  integer (acc_device_kind) d
+  call acc_shutdown_l (d)
+end subroutine
+
+function acc_on_device_h (d)
+  use openacc_internal, only: acc_on_device_l
+  use openacc_kinds
+  integer (acc_device_kind) d
+  logical acc_on_device_h
+  if (acc_on_device_l (d) .eq. 1) then
+    acc_on_device_h = .TRUE.
+  else
+    acc_on_device_h = .FALSE.
+  end if
+end function
+
+subroutine acc_copyin_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_copyin_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_copyin_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_copyin_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_copyin_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_copyin_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_copyin_array_h (a)
+  use openacc_internal, only: acc_copyin_l
+  type (*), dimension (..), contiguous :: a
+  call acc_copyin_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_present_or_copyin_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_present_or_copyin_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_present_or_copyin_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_present_or_copyin_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_present_or_copyin_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_present_or_copyin_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_present_or_copyin_array_h (a)
+  use openacc_internal, only: acc_present_or_copyin_l
+  type (*), dimension (..), contiguous :: a
+  call acc_present_or_copyin_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_create_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_create_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_create_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_create_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_create_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_create_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_create_array_h (a)
+  use openacc_internal, only: acc_create_l
+  type (*), dimension (..), contiguous :: a
+  call acc_create_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_present_or_create_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_present_or_create_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_present_or_create_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_present_or_create_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_present_or_create_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_present_or_create_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_present_or_create_array_h (a)
+  use openacc_internal, only: acc_present_or_create_l
+  type (*), dimension (..), contiguous :: a
+  call acc_present_or_create_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_copyout_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_copyout_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_copyout_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_copyout_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_copyout_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_copyout_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_copyout_array_h (a)
+  use openacc_internal, only: acc_copyout_l
+  type (*), dimension (..), contiguous :: a
+  call acc_copyout_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_delete_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_delete_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_delete_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_delete_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_delete_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_delete_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_delete_array_h (a)
+  use openacc_internal, only: acc_delete_l
+  type (*), dimension (..), contiguous :: a
+  call acc_delete_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_update_device_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_update_device_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_update_device_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_update_device_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_update_device_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_update_device_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_update_device_array_h (a)
+  use openacc_internal, only: acc_update_device_l
+  type (*), dimension (..), contiguous :: a
+  call acc_update_device_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_update_self_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_update_self_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_update_self_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_update_self_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_update_self_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_update_self_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_update_self_array_h (a)
+  use openacc_internal, only: acc_update_self_l
+  type (*), dimension (..), contiguous :: a
+  call acc_update_self_l (a, sizeof (a))
+end subroutine
+
+function acc_is_present_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_is_present_l
+  logical acc_is_present_32_h
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  if (acc_is_present_l (a, int (len, kind = c_size_t)) .eq. 1) then
+    acc_is_present_32_h = .TRUE.
+  else
+    acc_is_present_32_h = .FALSE.
+  end if
+end function
+
+function acc_is_present_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_is_present_l
+  logical acc_is_present_64_h
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  if (acc_is_present_l (a, int (len, kind = c_size_t)) .eq. 1) then
+    acc_is_present_64_h = .TRUE.
+  else
+    acc_is_present_64_h = .FALSE.
+  end if
+end function
+
+function acc_is_present_array_h (a)
+  use openacc_internal, only: acc_is_present_l
+  logical acc_is_present_array_h
+  type (*), dimension (..), contiguous :: a
+  acc_is_present_array_h = acc_is_present_l (a, sizeof (a)) == 1
+end function
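For readers less familiar with Fortran generic interfaces: a call such as acc_copyin (a, len) resolves to acc_copyin_32_h or acc_copyin_64_h according to the integer kind of len, and both specifics forward to the single C entry point with the length converted to size_t (the _array_h specifics pass sizeof (a) instead and are left out here). The C11 fragment below is only a rough analogy of that dispatch, not code from this patch: fake_copyin, copyin_32, copyin_64, copyin_generic and last_len are hypothetical names, and the stub merely records the length it receives instead of touching a device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the single C entry point (acc_copyin in
   libgomp); it only records the byte count it was given.  */
static size_t last_len;

void
fake_copyin (void *a, size_t len)
{
  (void) a;
  last_len = len;
}

/* Analogues of acc_copyin_32_h and acc_copyin_64_h: convert the caller's
   length to size_t, as int (len, kind = c_size_t) does in Fortran.
   Unlike the Fortran wrappers, this sketch rejects negative lengths,
   which would otherwise arrive in C as a huge size_t.  */
void
copyin_32 (void *a, int32_t len)
{
  assert (len >= 0);
  fake_copyin (a, (size_t) len);
}

void
copyin_64 (void *a, int64_t len)
{
  assert (len >= 0);
  fake_copyin (a, (size_t) len);
}

/* Dispatch on the type of the length argument, the way the Fortran
   generic interface acc_copyin picks a specific by integer kind.  */
#define copyin_generic(a, len) \
  _Generic ((len), int32_t: copyin_32, int64_t: copyin_64) (a, len)
```

Calling copyin_generic (buf, (int32_t) sizeof buf) selects copyin_32, and copyin_generic (buf, (int64_t) sizeof buf) selects copyin_64; either way fake_copyin sees the same size_t byte count. A length of any other integer type (say, a long on a platform where int64_t is long long) has no matching _Generic association and is rejected at compile time, much as a Fortran call with an unsupported kind finds no specific procedure.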
diff --git a/libgomp/openacc.h b/libgomp/openacc.h
new file mode 100644
index 0000000..d43978f
--- /dev/null
+++ b/libgomp/openacc.h
@@ -0,0 +1,127 @@
+/* OpenACC Runtime Library User-facing Declarations
+
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _OPENACC_H
+#define _OPENACC_H 1
+
+#include "gomp-constants.h"
+
+/* The OpenACC standard is silent on whether including openacc.h may
+   or must not include other header files.  We chose to include
+   some.  */
+#include <stddef.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#if __cplusplus >= 201103
+# define __GOACC_NOTHROW noexcept
+#elif __cplusplus
+# define __GOACC_NOTHROW throw ()
+#else /* Not C++ */
+# define __GOACC_NOTHROW __attribute__ ((__nothrow__))
+#endif
+
+  /* Types */
+  typedef enum acc_device_t
+    {
+      acc_device_none = 0,
+      acc_device_default, /* This has to be a distinct value, as no
+			     return value can match it.  */
+      acc_device_host = GOMP_TARGET_HOST,
+      acc_device_host_nonshm = GOMP_TARGET_HOST_NONSHM,
+      acc_device_not_host,
+      acc_device_nvidia = GOMP_TARGET_NVIDIA_PTX,
+      _ACC_device_hwm
+    } acc_device_t;
+
+  typedef enum acc_async_t
+    {
+      acc_async_noval = -1,
+      acc_async_sync  = -2
+    } acc_async_t;
+
+  int acc_get_num_devices (acc_device_t __dev) __GOACC_NOTHROW;
+  void acc_set_device_type (acc_device_t __dev) __GOACC_NOTHROW;
+  acc_device_t acc_get_device_type (void) __GOACC_NOTHROW;
+  void acc_set_device_num (int __num, acc_device_t __dev) __GOACC_NOTHROW;
+  int acc_get_device_num (acc_device_t __dev) __GOACC_NOTHROW;
+  int acc_async_test (int __async) __GOACC_NOTHROW;
+  int acc_async_test_all (void) __GOACC_NOTHROW;
+  void acc_wait (int __async) __GOACC_NOTHROW;
+  void acc_wait_async (int __async1, int __async2) __GOACC_NOTHROW;
+  void acc_wait_all (void) __GOACC_NOTHROW;
+  void acc_wait_all_async (int __async) __GOACC_NOTHROW;
+  void acc_init (acc_device_t __dev) __GOACC_NOTHROW;
+  void acc_shutdown (acc_device_t __dev) __GOACC_NOTHROW;
+  int acc_on_device (acc_device_t __dev) __GOACC_NOTHROW;
+  void *acc_malloc (size_t) __GOACC_NOTHROW;
+  void acc_free (void *) __GOACC_NOTHROW;
+  /* Some of these would be more correct with const qualifiers, but
+     the standard specifies otherwise.  */
+  void *acc_copyin (void *, size_t) __GOACC_NOTHROW;
+  void *acc_present_or_copyin (void *, size_t) __GOACC_NOTHROW;
+  void *acc_create (void *, size_t) __GOACC_NOTHROW;
+  void *acc_present_or_create (void *, size_t) __GOACC_NOTHROW;
+  void acc_copyout (void *, size_t) __GOACC_NOTHROW;
+  void acc_delete (void *, size_t) __GOACC_NOTHROW;
+  void acc_update_device (void *, size_t) __GOACC_NOTHROW;
+  void acc_update_self (void *, size_t) __GOACC_NOTHROW;
+  void acc_map_data (void *, void *, size_t) __GOACC_NOTHROW;
+  void acc_unmap_data (void *) __GOACC_NOTHROW;
+  void *acc_deviceptr (void *) __GOACC_NOTHROW;
+  void *acc_hostptr (void *) __GOACC_NOTHROW;
+  int acc_is_present (void *, size_t) __GOACC_NOTHROW;
+  void acc_memcpy_to_device (void *, void *, size_t) __GOACC_NOTHROW;
+  void acc_memcpy_from_device (void *, void *, size_t) __GOACC_NOTHROW;
+
+  void ACC_target (int, void (*) (void *), const void *,
+	     size_t, void **, size_t *, unsigned char *, int *) __GOACC_NOTHROW;
+  void ACC_parallel (int, void (*) (void *), const void *,
+	     size_t, void **, size_t *, unsigned char *) __GOACC_NOTHROW;
+  void ACC_add_device_code (void const *, char const *) __GOACC_NOTHROW;
+
+  void ACC_async_copy (int) __GOACC_NOTHROW;
+  void ACC_async_kern (int) __GOACC_NOTHROW;
+
+  /* Old names.  OpenACC does not specify whether these can or must
+     not be macros, inlines or aliases for the new names.  */
+  #define acc_pcreate acc_present_or_create
+  #define acc_pcopyin acc_present_or_copyin
+
+  /* CUDA-specific routines.  */
+  void *acc_get_current_cuda_device (void) __GOACC_NOTHROW;
+  void *acc_get_current_cuda_context (void) __GOACC_NOTHROW;
+  void *acc_get_cuda_stream (int __async) __GOACC_NOTHROW;
+  int acc_set_cuda_stream (int __async, void *__stream) __GOACC_NOTHROW;
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _OPENACC_H */
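openacc_lib.h hard-codes the acc_device_* values as 0 to 5, while openacc.h derives acc_device_host, acc_device_host_nonshm and acc_device_nvidia from gomp-constants.h and lets acc_device_default and acc_device_not_host take implicit values, so the two headers agree only if the GOMP_TARGET_* constants line up. A minimal C11 sketch of a compile-time cross-check follows; the DEMO_TARGET_* values are assumptions (gomp-constants.h is not shown in this excerpt) picked to match openacc_lib.h, and demo_device_t copies the layout of acc_device_t under hypothetical names.

```c
#include <assert.h>

/* Hypothetical stand-ins for the GOMP_TARGET_* constants; the real
   values live in gomp-constants.h, which this excerpt does not show.  */
#define DEMO_TARGET_HOST 2
#define DEMO_TARGET_HOST_NONSHM 3
#define DEMO_TARGET_NVIDIA_PTX 5

/* Same layout as acc_device_t in openacc.h.  */
typedef enum demo_device_t
  {
    demo_device_none = 0,
    demo_device_default,        /* Implicitly none + 1.  */
    demo_device_host = DEMO_TARGET_HOST,
    demo_device_host_nonshm = DEMO_TARGET_HOST_NONSHM,
    demo_device_not_host,       /* Implicitly host_nonshm + 1.  */
    demo_device_nvidia = DEMO_TARGET_NVIDIA_PTX,
    demo_device_hwm
  } demo_device_t;

/* The values openacc_lib.h declares as acc_device_* parameters.  */
static_assert (demo_device_none == 0, "acc_device_none mismatch");
static_assert (demo_device_default == 1, "acc_device_default mismatch");
static_assert (demo_device_host == 2, "acc_device_host mismatch");
static_assert (demo_device_host_nonshm == 3,
               "acc_device_host_nonshm mismatch");
static_assert (demo_device_not_host == 4, "acc_device_not_host mismatch");
static_assert (demo_device_nvidia == 5, "acc_device_nvidia mismatch");
```

If one of the constants changed, for example DEMO_TARGET_HOST_NONSHM becoming 4, compilation would stop at the static_assert checks instead of leaving Fortran and C callers with silently different device numbers.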
diff --git a/libgomp/openacc_lib.h b/libgomp/openacc_lib.h
new file mode 100644
index 0000000..4e335f2
--- /dev/null
+++ b/libgomp/openacc_lib.h
@@ -0,0 +1,378 @@
+!  OpenACC Runtime Library Definitions.			-*- mode: fortran -*-
+
+!  Copyright (C) 2014 Free Software Foundation, Inc.
+
+!  Contributed by Tobias Burnus <burnus@net-b.de>
+!              and Mentor Embedded.
+
+!  This file is part of the GNU OpenMP Library (libgomp).
+
+!  Libgomp is free software; you can redistribute it and/or modify it
+!  under the terms of the GNU General Public License as published by
+!  the Free Software Foundation; either version 3, or (at your option)
+!  any later version.
+
+!  Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+!  WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+!  FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+!  more details.
+
+!  Under Section 7 of GPL version 3, you are granted additional
+!  permissions described in the GCC Runtime Library Exception, version
+!  3.1, as published by the Free Software Foundation.
+
+!  You should have received a copy of the GNU General Public License and
+!  a copy of the GCC Runtime Library Exception along with this program;
+!  see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+!  <http://www.gnu.org/licenses/>.
+
+! NOTE: Due to the use of dimension (..), the code only works when compiled
+! with -std=f2008ts/gnu/legacy but not with other standard settings.
+! Alternatively, the user can use the module version, which permits
+! compilation with -std=f95.
+
+      integer, parameter :: acc_device_kind = 4
+
+      integer (acc_device_kind), parameter :: acc_device_none = 0
+      integer (acc_device_kind), parameter :: acc_device_default = 1
+      integer (acc_device_kind), parameter :: acc_device_host = 2
+      integer (acc_device_kind), parameter :: acc_device_host_nonshm = 3
+      integer (acc_device_kind), parameter :: acc_device_not_host = 4
+      integer (acc_device_kind), parameter :: acc_device_nvidia = 5
+
+      integer, parameter :: acc_handle_kind = 4
+
+      integer (acc_handle_kind), parameter :: acc_async_noval = -1
+      integer (acc_handle_kind), parameter :: acc_async_sync = -2
+
+      integer, parameter :: openacc_version = 201306
+
+      interface acc_get_num_devices
+        function acc_get_num_devices_h (d)
+          import acc_device_kind
+          integer acc_get_num_devices_h
+          integer (acc_device_kind) d
+        end function
+      end interface
+
+      interface acc_set_device_type
+        subroutine acc_set_device_type_h (d)
+          import acc_device_kind
+          integer (acc_device_kind) d
+        end subroutine
+      end interface
+
+      interface acc_get_device_type
+        function acc_get_device_type_h ()
+          import acc_device_kind
+          integer (acc_device_kind) acc_get_device_type_h
+        end function
+      end interface
+
+      interface acc_set_device_num
+        subroutine acc_set_device_num_h (n, d)
+          import acc_device_kind
+          integer n
+          integer (acc_device_kind) d
+        end subroutine
+      end interface
+
+      interface acc_get_device_num
+        function acc_get_device_num_h (d)
+          import acc_device_kind
+          integer acc_get_device_num_h
+          integer (acc_device_kind) d
+        end function
+      end interface
+
+      interface acc_async_test
+        function acc_async_test_h (a)
+          logical acc_async_test_h
+          integer a
+        end function
+      end interface
+
+      interface acc_async_test_all
+        function acc_async_test_all_h ()
+          logical acc_async_test_all_h
+        end function
+      end interface
+
+      interface acc_wait
+        subroutine acc_wait_h (a)
+          integer a
+        end subroutine
+      end interface
+
+      interface acc_wait_async
+        subroutine acc_wait_async_h (a1, a2)
+          integer a1, a2
+        end subroutine
+      end interface
+
+      interface acc_wait_all
+        subroutine acc_wait_all_h ()
+        end subroutine
+      end interface
+
+      interface acc_wait_all_async
+        subroutine acc_wait_all_async_h (a)
+          integer a
+        end subroutine
+      end interface
+
+      interface acc_init
+        subroutine acc_init_h (devicetype)
+          import acc_device_kind
+          integer (acc_device_kind) devicetype
+        end subroutine
+      end interface
+
+      interface acc_shutdown
+        subroutine acc_shutdown_h (devicetype)
+          import acc_device_kind
+          integer (acc_device_kind) devicetype
+        end subroutine
+      end interface
+
+      interface acc_on_device
+        function acc_on_device_h (devicetype)
+          import acc_device_kind
+          logical acc_on_device_h
+          integer (acc_device_kind) devicetype
+        end function
+      end interface
+
+      ! acc_malloc: Only available in C/C++
+      ! acc_free: Only available in C/C++
+
+      interface acc_copyin
+        subroutine acc_copyin_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_copyin_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_copyin_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      interface acc_present_or_copyin
+        subroutine acc_present_or_copyin_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_present_or_copyin_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_present_or_copyin_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      interface acc_pcopyin
+        subroutine acc_pcopyin_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_pcopyin_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_pcopyin_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      interface acc_create
+        subroutine acc_create_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_create_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_create_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      interface acc_present_or_create
+        subroutine acc_present_or_create_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_present_or_create_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_present_or_create_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      interface acc_pcreate
+        subroutine acc_pcreate_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_pcreate_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_pcreate_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      interface acc_copyout
+        subroutine acc_copyout_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_copyout_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_copyout_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      interface acc_delete
+        subroutine acc_delete_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_delete_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_delete_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      interface acc_update_device
+        subroutine acc_update_device_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_update_device_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_update_device_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      interface acc_update_self
+        subroutine acc_update_self_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_update_self_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_update_self_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      ! acc_map_data: Only available in C/C++
+      ! acc_unmap_data: Only available in C/C++
+      ! acc_deviceptr: Only available in C/C++
+      ! acc_hostptr: Only available in C/C++
+
+      interface acc_is_present
+        function acc_is_present_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          logical acc_is_present_32_h
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end function
+
+        function acc_is_present_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          logical acc_is_present_64_h
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end function
+
+        function acc_is_present_array_h (a)
+          logical acc_is_present_array_h
+          type (*), dimension (..), contiguous :: a
+        end function
+      end interface
+
+      ! acc_memcpy_to_device: Only available in C/C++
+      ! acc_memcpy_from_device: Only available in C/C++
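The Fortran generic interfaces above resolve at compile time. Each call binds to the `_32_h` or `_64_h` specific procedure according to the kind of `len`, or to the `_array_h` form when only an array is passed. The C sketch below is only an analogue and is not part of this patch. It uses C11 `_Generic` to make the same kind-based choice. `copyin_32` and `copyin_64` are hypothetical stand-ins for `acc_copyin_32_h` and `acc_copyin_64_h`. Each one records which variant ran and widens the length to `size_t`, the length type of the C `acc_copyin`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for acc_copyin_32_h and acc_copyin_64_h.  Each one
   records which variant was selected and widens LEN to size_t, the length
   type taken by the C acc_copyin.  */
static int last_variant;
static size_t last_len;

static void
copyin_32 (void *a, int32_t len)
{
  assert (a != NULL && len >= 0);
  last_variant = 32;
  last_len = (size_t) len;
}

static void
copyin_64 (void *a, int64_t len)
{
  assert (a != NULL && len >= 0);
  last_variant = 64;
  last_len = (size_t) len;
}

/* Compile-time selection on the type of LEN, analogous to Fortran generic
   resolution on the kind of the second dummy argument.  */
#define copyin(a, len) \
  _Generic ((len), int32_t: copyin_32, int64_t: copyin_64) ((a), (len))
```

Like the Fortran generic, this selection costs nothing at run time. A length whose type matches neither association is rejected at compile time, much as gfortran rejects a call that matches no specific procedure.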
diff --git a/libgomp/plugin/Makefrag.am b/libgomp/plugin/Makefrag.am
new file mode 100644
index 0000000..d6642d9
--- /dev/null
+++ b/libgomp/plugin/Makefrag.am
@@ -0,0 +1,47 @@
+# Plugins for offload execution, Makefile.am fragment.
+#
+# Copyright (C) 2014 Free Software Foundation, Inc.
+#
+# Contributed by Mentor Embedded.
+#
+# This file is part of the GNU OpenMP Library (libgomp).
+#
+# Libgomp is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+# FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+# more details.
+#
+# Under Section 7 of GPL version 3, you are granted additional
+# permissions described in the GCC Runtime Library Exception, version
+# 3.1, as published by the Free Software Foundation.
+#
+# You should have received a copy of the GNU General Public License and
+# a copy of the GCC Runtime Library Exception along with this program;
+# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+# <http://www.gnu.org/licenses/>.
+
+if PLUGIN_NVPTX
+# Nvidia PTX OpenACC plugin.
+libgomp_plugin_nvptx_version_info = -version-info $(libtool_VERSION)
+toolexeclib_LTLIBRARIES += libgomp-plugin-nvptx.la
+libgomp_plugin_nvptx_la_SOURCES = plugin/plugin-nvptx.c
+libgomp_plugin_nvptx_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_NVPTX_CPPFLAGS)
+libgomp_plugin_nvptx_la_LDFLAGS = $(libgomp_plugin_nvptx_version_info) \
+	$(lt_host_flags)
+libgomp_plugin_nvptx_la_LDFLAGS += $(PLUGIN_NVPTX_LDFLAGS)
+libgomp_plugin_nvptx_la_LIBADD = $(PLUGIN_NVPTX_LIBS)
+libgomp_plugin_nvptx_la_LIBTOOLFLAGS = --tag=disable-static
+endif
+
+libgomp_plugin_host_nonshm_version_info = -version-info $(libtool_VERSION)
+toolexeclib_LTLIBRARIES += libgomp-plugin-host_nonshm.la
+libgomp_plugin_host_nonshm_la_SOURCES = plugin/plugin-host.c
+libgomp_plugin_host_nonshm_la_CPPFLAGS = $(AM_CPPFLAGS) -DHOST_NONSHM_PLUGIN
+libgomp_plugin_host_nonshm_la_LDFLAGS = \
+	$(libgomp_plugin_host_nonshm_version_info) $(lt_host_flags)
+libgomp_plugin_host_nonshm_la_LIBTOOLFLAGS = --tag=disable-static
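Makefrag.am builds each plugin as its own libtool library (`libgomp-plugin-nvptx.la`, `libgomp-plugin-host_nonshm.la`). The libgomp runtime must therefore find a plugin by file name before it can dlopen it. The helper below is a minimal, hypothetical sketch of building that name from a plugin's short name. It assumes a `libgomp-plugin-` prefix, mirroring the library names above, and a plain `.so` suffix. The suffix in particular is an assumption: the installed object may carry a libtool version suffix instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "libgomp-plugin-<NAME>.so" into BUF.
   Returns 0 on success, -1 if BUF is too small (the output is then
   truncated and must not be used).  The ".so" suffix is an assumption.  */
static int
plugin_file_name (char *buf, size_t bufsize, const char *name)
{
  int n;

  assert (buf != NULL && name != NULL);
  n = snprintf (buf, bufsize, "libgomp-plugin-%s.so", name);
  if (n < 0 || (size_t) n >= bufsize)
    return -1;
  return 0;
}
```

For example, `plugin_file_name (buf, sizeof buf, "nvptx")` yields `libgomp-plugin-nvptx.so`.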
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
new file mode 100644
index 0000000..68c7dc7
--- /dev/null
+++ b/libgomp/plugin/configfrag.ac
@@ -0,0 +1,107 @@
+# Plugins for offload execution, configure.ac fragment.
+#
+# Copyright (C) 2014 Free Software Foundation, Inc.
+#
+# Contributed by Mentor Embedded.
+#
+# This file is part of the GNU OpenMP Library (libgomp).
+#
+# Libgomp is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+# FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+# more details.
+#
+# Under Section 7 of GPL version 3, you are granted additional
+# permissions described in the GCC Runtime Library Exception, version
+# 3.1, as published by the Free Software Foundation.
+#
+# You should have received a copy of the GNU General Public License and
+# a copy of the GCC Runtime Library Exception along with this program;
+# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+# <http://www.gnu.org/licenses/>.
+
+# Look for the CUDA driver package.
+CUDA_DRIVER_INCLUDE=
+CUDA_DRIVER_LIB=
+AC_SUBST(CUDA_DRIVER_INCLUDE)
+AC_SUBST(CUDA_DRIVER_LIB)
+CUDA_DRIVER_CPPFLAGS=
+CUDA_DRIVER_LDFLAGS=
+AC_ARG_WITH(cuda-driver,
+	[AS_HELP_STRING([--with-cuda-driver=PATH],
+		[specify prefix directory for installed CUDA driver package.
+		 Equivalent to --with-cuda-driver-include=PATH/include
+		 plus --with-cuda-driver-lib=PATH/lib])])
+AC_ARG_WITH(cuda-driver-include,
+	[AS_HELP_STRING([--with-cuda-driver-include=PATH],
+		[specify directory for installed CUDA driver include files])])
+AC_ARG_WITH(cuda-driver-lib,
+	[AS_HELP_STRING([--with-cuda-driver-lib=PATH],
+		[specify directory for the installed CUDA driver library])])
+if test "x$with_cuda_driver" != x; then
+  CUDA_DRIVER_INCLUDE=$with_cuda_driver/include
+  CUDA_DRIVER_LIB=$with_cuda_driver/lib
+fi
+if test "x$with_cuda_driver_include" != x; then
+  CUDA_DRIVER_INCLUDE=$with_cuda_driver_include
+fi
+if test "x$with_cuda_driver_lib" != x; then
+  CUDA_DRIVER_LIB=$with_cuda_driver_lib
+fi
+if test "x$CUDA_DRIVER_INCLUDE" != x; then
+  CUDA_DRIVER_CPPFLAGS=-I$CUDA_DRIVER_INCLUDE
+fi
+if test "x$CUDA_DRIVER_LIB" != x; then
+  CUDA_DRIVER_LDFLAGS=-L$CUDA_DRIVER_LIB
+fi
+
+PLUGIN_NVPTX=0
+PLUGIN_NVPTX_CPPFLAGS=
+PLUGIN_NVPTX_LDFLAGS=
+PLUGIN_NVPTX_LIBS=
+AC_SUBST(PLUGIN_NVPTX)
+AC_SUBST(PLUGIN_NVPTX_CPPFLAGS)
+AC_SUBST(PLUGIN_NVPTX_LDFLAGS)
+AC_SUBST(PLUGIN_NVPTX_LIBS)
+
+for accel in `echo $enable_offload_targets | sed -e 's#,# #g'`; do
+  case "$accel" in
+    nvptx*)
+      PLUGIN_NVPTX=$accel
+      PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
+      PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
+      PLUGIN_NVPTX_LIBS='-lcuda'
+
+      PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
+      CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
+      PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
+      LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
+      PLUGIN_NVPTX_save_LIBS=$LIBS
+      LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
+      AC_LINK_IFELSE(
+	[AC_LANG_PROGRAM(
+	  [#include "cuda.h"],
+	  [CUresult r = cuCtxPushCurrent (NULL);])],
+	[PLUGIN_NVPTX=1])
+      CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
+      LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
+      LIBS=$PLUGIN_NVPTX_save_LIBS
+      case $PLUGIN_NVPTX in
+	nvptx*)
+	  PLUGIN_NVPTX=0
+	  AC_MSG_ERROR([CUDA driver package required for nvptx support])
+	  ;;
+      esac
+      ;;
+  esac
+done
+AM_CONDITIONAL([PLUGIN_NVPTX], [test $PLUGIN_NVPTX = 1])
+AC_DEFINE_UNQUOTED([PLUGIN_NVPTX], [$PLUGIN_NVPTX],
+		  [Define to 1 if the NVIDIA plugin is built, 0 if not.])
+
+AC_OUTPUT
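The `for accel in ...` loop in configfrag.ac rewrites the comma-separated `--enable-offload-targets` value as a space-separated word list. It then enables the PTX plugin for any word that matches `nvptx*`, provided a test program calling `cuCtxPushCurrent` links against `-lcuda`. If that link fails, configure stops with an error. The sketch below transcribes only the matching step into C, for illustration. `wants_nvptx` is a hypothetical name and does not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if the comma-separated list TARGETS contains an entry that
   starts with "nvptx", mirroring the shell "case ... nvptx*)" pattern.
   Empty entries (e.g. from ",,") are skipped.  */
static bool
wants_nvptx (const char *targets)
{
  const char *p = targets;

  assert (targets != NULL);
  while (*p != '\0')
    {
      size_t len = strcspn (p, ",");   /* Length of the current entry.  */
      if (len >= 5 && strncmp (p, "nvptx", 5) == 0)
	return true;
      p += len;                        /* Skip the entry...  */
      if (*p == ',')
	p++;                           /* ...and its separator.  */
    }
  return false;
}
```

One difference from the shell loop: the shell version also splits on whitespace after the `sed` substitution, whereas this sketch treats only commas as separators.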
diff --git a/libgomp/plugin/plugin-host.c b/libgomp/plugin/plugin-host.c
new file mode 100644
index 0000000..937fc7f
--- /dev/null
+++ b/libgomp/plugin/plugin-host.c
@@ -0,0 +1,340 @@
+/* OpenACC Runtime Library: acc_device_host, acc_device_host_nonshm.
+
+   Copyright (C) 2013 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Simple implementation of support routines for a shared-memory
+   acc_device_host, and a non-shared-memory acc_device_host_nonshm, with the
+   latter built as a plugin.  */
+
+#include "openacc.h"
+#include "config.h"
+#include "libgomp.h"
+#include "libgomp_target.h"
+#ifdef HOST_NONSHM_PLUGIN
+#include "libgomp-plugin.h"
+#include "oacc-plugin.h"
+#else
+#include "oacc-int.h"
+#endif
+
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdio.h>
+
+#ifdef HOST_NONSHM_PLUGIN
+#define STATIC
+#define GOMP(X) GOMP_PLUGIN_##X
+#define SELF "host_nonshm plugin: "
+#else
+#define STATIC static
+#define GOMP(X) gomp_##X
+#define SELF "host: "
+#endif
+
+#ifndef HOST_NONSHM_PLUGIN
+static struct gomp_device_descr host_dispatch;
+#endif
+
+STATIC const char *
+GOMP_OFFLOAD_get_name (void)
+{
+#ifdef HOST_NONSHM_PLUGIN
+  return "host_nonshm";
+#else
+  return "host";
+#endif
+}
+
+STATIC int
+GOMP_OFFLOAD_get_type (void)
+{
+#ifdef HOST_NONSHM_PLUGIN
+  return OFFLOAD_TARGET_TYPE_HOST_NONSHM;
+#else
+  return OFFLOAD_TARGET_TYPE_HOST;
+#endif
+}
+
+STATIC unsigned int
+GOMP_OFFLOAD_get_caps (void)
+{
+  unsigned int caps = TARGET_CAP_OPENACC_200 | TARGET_CAP_OPENMP_400
+		      | TARGET_CAP_NATIVE_EXEC;
+
+#ifndef HOST_NONSHM_PLUGIN
+  caps |= TARGET_CAP_SHARED_MEM;
+#endif
+
+  return caps;
+}
+
+STATIC int
+GOMP_OFFLOAD_get_num_devices (void)
+{
+  return 1;
+}
+
+STATIC void
+GOMP_OFFLOAD_register_image (void *host_table __attribute__((unused)),
+			     void *target_data __attribute__((unused)))
+{
+}
+
+STATIC void
+GOMP_OFFLOAD_init_device (int n __attribute__((unused)))
+{
+}
+
+STATIC void
+GOMP_OFFLOAD_fini_device (int n __attribute__((unused)))
+{
+}
+
+STATIC int
+GOMP_OFFLOAD_get_table (int n __attribute__((unused)),
+			struct mapping_table **table __attribute__((unused)))
+{
+  return 0;
+}
+
+STATIC void *
+GOMP_OFFLOAD_openacc_open_device (int n)
+{
+  return (void *) (intptr_t) n;
+}
+
+STATIC int
+GOMP_OFFLOAD_openacc_close_device (void *hnd)
+{
+  return 0;
+}
+
+STATIC int
+GOMP_OFFLOAD_openacc_get_device_num (void)
+{
+  return 0;
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_set_device_num (int n)
+{
+  if (n > 0)
+    GOMP(fatal) ("device number %d out of range for host execution", n);
+}
+
+STATIC void *
+GOMP_OFFLOAD_alloc (int n __attribute__((unused)), size_t s)
+{
+  return GOMP(malloc) (s);
+}
+
+STATIC void
+GOMP_OFFLOAD_free (int n __attribute__((unused)), void *p)
+{
+  free (p);
+}
+
+STATIC void *
+GOMP_OFFLOAD_host2dev (int n __attribute__((unused)), void *d, const void *h,
+		       size_t s)
+{
+#ifdef HOST_NONSHM_PLUGIN
+  memcpy (d, h, s);
+#endif
+
+  return 0;
+}
+
+STATIC void *
+GOMP_OFFLOAD_dev2host (int n __attribute__((unused)), void *h, const void *d,
+		       size_t s)
+{
+#ifdef HOST_NONSHM_PLUGIN
+  memcpy (h, d, s);
+#endif
+
+  return 0;
+}
+
+STATIC void
+GOMP_OFFLOAD_run (int n __attribute__((unused)), void *fn_ptr, void *vars)
+{
+  void (*fn)(void *) = (void (*)(void *)) fn_ptr;
+
+  fn (vars);
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_parallel (void (*fn) (void *),
+			       size_t mapnum __attribute__((unused)),
+			       void **hostaddrs,
+			       void **devaddrs __attribute__((unused)),
+			       size_t *sizes __attribute__((unused)),
+			       unsigned short *kinds __attribute__((unused)),
+			       int num_gangs __attribute__((unused)),
+			       int num_workers __attribute__((unused)),
+			       int vector_length __attribute__((unused)),
+			       int async __attribute__((unused)),
+			       void *targ_mem_desc __attribute__((unused)))
+{
+#ifdef HOST_NONSHM_PLUGIN
+  fn (devaddrs);
+#else
+  fn (hostaddrs);
+#endif
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_register_async_cleanup (void *targ_mem_desc)
+{
+#ifdef HOST_NONSHM_PLUGIN
+  /* "Asynchronous" launches are executed synchronously on the (non-SHM) host,
+     so there's no point in delaying host-side cleanup -- just do it now.  */
+  GOMP_PLUGIN_async_unmap_vars (targ_mem_desc);
+#endif
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_async_set_async (int async __attribute__((unused)))
+{
+}
+
+STATIC int
+GOMP_OFFLOAD_openacc_async_test (int async __attribute__((unused)))
+{
+  return 1;
+}
+
+STATIC int
+GOMP_OFFLOAD_openacc_async_test_all (void)
+{
+  return 1;
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_async_wait (int async __attribute__((unused)))
+{
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_async_wait_all (void)
+{
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_async_wait_async (int async1 __attribute__((unused)),
+				       int async2 __attribute__((unused)))
+{
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_async_wait_all_async (int async __attribute__((unused)))
+{
+}
+
+STATIC void *
+GOMP_OFFLOAD_openacc_create_thread_data (void *targ_data __attribute__((unused)))
+{
+  return NULL;
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_destroy_thread_data (void *tls_data __attribute__((unused)))
+{
+}
+
+#ifndef HOST_NONSHM_PLUGIN
+static struct gomp_device_descr host_dispatch =
+  {
+    .name = "host",
+
+    .type = OFFLOAD_TARGET_TYPE_HOST,
+    .capabilities = TARGET_CAP_OPENACC_200 | TARGET_CAP_NATIVE_EXEC
+		    | TARGET_CAP_SHARED_MEM,
+    .id = 0,
+
+    .is_initialized = false,
+    .offload_regions_registered = false,
+
+    .get_name_func = GOMP_OFFLOAD_get_name,
+    .get_type_func = GOMP_OFFLOAD_get_type,
+    .get_caps_func = GOMP_OFFLOAD_get_caps,
+
+    .init_device_func = GOMP_OFFLOAD_init_device,
+    .fini_device_func = GOMP_OFFLOAD_fini_device,
+    .get_num_devices_func = GOMP_OFFLOAD_get_num_devices,
+    .register_image_func = GOMP_OFFLOAD_register_image,
+    .get_table_func = GOMP_OFFLOAD_get_table,
+
+    .alloc_func = GOMP_OFFLOAD_alloc,
+    .free_func = GOMP_OFFLOAD_free,
+    .host2dev_func = GOMP_OFFLOAD_host2dev,
+    .dev2host_func = GOMP_OFFLOAD_dev2host,
+    
+    .run_func = GOMP_OFFLOAD_run,
+
+    .openacc = {
+      .open_device_func = GOMP_OFFLOAD_openacc_open_device,
+      .close_device_func = GOMP_OFFLOAD_openacc_close_device,
+
+      .get_device_num_func = GOMP_OFFLOAD_openacc_get_device_num,
+      .set_device_num_func = GOMP_OFFLOAD_openacc_set_device_num,
+
+      .exec_func = GOMP_OFFLOAD_openacc_parallel,
+
+      .register_async_cleanup_func
+        = GOMP_OFFLOAD_openacc_register_async_cleanup,
+
+      .async_set_async_func = GOMP_OFFLOAD_openacc_async_set_async,
+      .async_test_func = GOMP_OFFLOAD_openacc_async_test,
+      .async_test_all_func = GOMP_OFFLOAD_openacc_async_test_all,
+      .async_wait_func = GOMP_OFFLOAD_openacc_async_wait,
+      .async_wait_async_func = GOMP_OFFLOAD_openacc_async_wait_async,
+      .async_wait_all_func = GOMP_OFFLOAD_openacc_async_wait_all,
+      .async_wait_all_async_func = GOMP_OFFLOAD_openacc_async_wait_all_async,
+
+      .create_thread_data_func = GOMP_OFFLOAD_openacc_create_thread_data,
+      .destroy_thread_data_func = GOMP_OFFLOAD_openacc_destroy_thread_data,
+
+      .cuda = {
+	.get_current_device_func = NULL,
+	.get_current_context_func = NULL,
+	.get_stream_func = NULL,
+	.set_stream_func = NULL,
+      }
+    }
+  };
+
+/* Register this device type.  */
+static __attribute__ ((constructor))
+void ACC_host_init (void)
+{
+  gomp_mutex_init (&host_dispatch.mem_map.lock);
+  ACC_register (&host_dispatch);
+}
+#endif
+
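plugin-host.c builds two variants from one source. With `HOST_NONSHM_PLUGIN` defined, it becomes the non-shared-memory plugin: "device" buffers are separate host allocations, `host2dev`/`dev2host` really copy, and `TARGET_CAP_SHARED_MEM` is not reported. Without it, it is the built-in shared-memory host device, and those hooks do nothing. The sketch below models that split at run time with a small table of function pointers. The `CAP_*` values and `struct host_ops` are hypothetical. They do not reproduce the real `TARGET_CAP_*` constants or `struct gomp_device_descr`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capability bits (not the real TARGET_CAP_* values).  */
enum
{
  CAP_OPENACC_200 = 1 << 0,
  CAP_OPENMP_400 = 1 << 1,
  CAP_NATIVE_EXEC = 1 << 2,
  CAP_SHARED_MEM = 1 << 3
};

/* Hypothetical, cut-down analogue of the plugin hook table.  */
struct host_ops
{
  unsigned int (*get_caps) (void);
  void (*host2dev) (void *d, const void *h, size_t s);
};

static unsigned int
caps_common (void)
{
  return CAP_OPENACC_200 | CAP_OPENMP_400 | CAP_NATIVE_EXEC;
}

/* Only the shared-memory variant advertises CAP_SHARED_MEM.  */
static unsigned int caps_shm (void) { return caps_common () | CAP_SHARED_MEM; }
static unsigned int caps_nonshm (void) { return caps_common (); }

/* Shared memory: the "device" address is the host address; nothing to do.  */
static void
host2dev_shm (void *d, const void *h, size_t s)
{
  (void) d; (void) h; (void) s;
}

/* Non-shared memory: device buffers are distinct, so copy the bytes.  */
static void
host2dev_nonshm (void *d, const void *h, size_t s)
{
  memcpy (d, h, s);
}

static const struct host_ops shm_ops = { caps_shm, host2dev_shm };
static const struct host_ops nonshm_ops = { caps_nonshm, host2dev_nonshm };
```

Running the non-SHM variant on the host exercises the runtime's copy-in/copy-out book-keeping without a GPU. This appears to be the point of shipping host_nonshm as a separate plugin.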
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
new file mode 100644
index 0000000..f66633d
--- /dev/null
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -0,0 +1,1851 @@
+/* Plugin for NVPTX execution.
+
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Nvidia PTX-specific parts of OpenACC support.  The CUDA driver
+   library appears to hold some implicit state, but the documentation
+   is not clear as to what that state might be, nor how one might
+   propagate it from one thread to another.  */
+
+#include "openacc.h"
+#include "config.h"
+#include "libgomp.h"
+#include "libgomp_target.h"
+#include "libgomp-plugin.h"
+#include "oacc-plugin.h"
+
+#include <cuda.h>
+#include <stdint.h>
+#include <string.h>
+#include <stdio.h>
+#include <dlfcn.h>
+#include <unistd.h>
+#include <assert.h>
+
+#define	ARRAYSIZE(X) (sizeof (X) / sizeof ((X)[0]))
+
+static struct _errlist
+{
+  CUresult r;
+  char *m;
+} cuErrorList[] = {
+    { CUDA_ERROR_INVALID_VALUE, "invalid value" },
+    { CUDA_ERROR_OUT_OF_MEMORY, "out of memory" },
+    { CUDA_ERROR_NOT_INITIALIZED, "not initialized" },
+    { CUDA_ERROR_DEINITIALIZED, "deinitialized" },
+    { CUDA_ERROR_PROFILER_DISABLED, "profiler disabled" },
+    { CUDA_ERROR_PROFILER_NOT_INITIALIZED, "profiler not initialized" },
+    { CUDA_ERROR_PROFILER_ALREADY_STARTED, "already started" },
+    { CUDA_ERROR_PROFILER_ALREADY_STOPPED, "already stopped" },
+    { CUDA_ERROR_NO_DEVICE, "no device" },
+    { CUDA_ERROR_INVALID_DEVICE, "invalid device" },
+    { CUDA_ERROR_INVALID_IMAGE, "invalid image" },
+    { CUDA_ERROR_INVALID_CONTEXT, "invalid context" },
+    { CUDA_ERROR_CONTEXT_ALREADY_CURRENT, "context already current" },
+    { CUDA_ERROR_MAP_FAILED, "map error" },
+    { CUDA_ERROR_UNMAP_FAILED, "unmap error" },
+    { CUDA_ERROR_ARRAY_IS_MAPPED, "array is mapped" },
+    { CUDA_ERROR_ALREADY_MAPPED, "already mapped" },
+    { CUDA_ERROR_NO_BINARY_FOR_GPU, "no binary for gpu" },
+    { CUDA_ERROR_ALREADY_ACQUIRED, "already acquired" },
+    { CUDA_ERROR_NOT_MAPPED, "not mapped" },
+    { CUDA_ERROR_NOT_MAPPED_AS_ARRAY, "not mapped as array" },
+    { CUDA_ERROR_NOT_MAPPED_AS_POINTER, "not mapped as pointer" },
+    { CUDA_ERROR_ECC_UNCORRECTABLE, "ecc uncorrectable" },
+    { CUDA_ERROR_UNSUPPORTED_LIMIT, "unsupported limit" },
+    { CUDA_ERROR_CONTEXT_ALREADY_IN_USE, "context already in use" },
+    { CUDA_ERROR_PEER_ACCESS_UNSUPPORTED, "peer access unsupported" },
+    { CUDA_ERROR_INVALID_SOURCE, "invalid source" },
+    { CUDA_ERROR_FILE_NOT_FOUND, "file not found" },
+    { CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND,
+                                            "shared object symbol not found" },
+    { CUDA_ERROR_SHARED_OBJECT_INIT_FAILED, "shared object init error" },
+    { CUDA_ERROR_OPERATING_SYSTEM, "operating system" },
+    { CUDA_ERROR_INVALID_HANDLE, "invalid handle" },
+    { CUDA_ERROR_NOT_FOUND, "not found" },
+    { CUDA_ERROR_NOT_READY, "not ready" },
+    { CUDA_ERROR_LAUNCH_FAILED, "launch error" },
+    { CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES, "launch out of resources" },
+    { CUDA_ERROR_LAUNCH_TIMEOUT, "launch timeout" },
+    { CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING,
+                                            "launch incompatible texturing" },
+    { CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED, "peer access already enabled" },
+    { CUDA_ERROR_PEER_ACCESS_NOT_ENABLED, "peer access not enabled" },
+    { CUDA_ERROR_PRIMARY_CONTEXT_ACTIVE, "primary context active" },
+    { CUDA_ERROR_CONTEXT_IS_DESTROYED, "context is destroyed" },
+    { CUDA_ERROR_ASSERT, "assert" },
+    { CUDA_ERROR_TOO_MANY_PEERS, "too many peers" },
+    { CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED,
+                                            "host memory already registered" },
+    { CUDA_ERROR_HOST_MEMORY_NOT_REGISTERED, "host memory not registered" },
+    { CUDA_ERROR_NOT_PERMITTED, "not permitted" },
+    { CUDA_ERROR_NOT_SUPPORTED, "not supported" },
+    { CUDA_ERROR_UNKNOWN, "unknown" }
+};
+
+static char errmsg[128];
+
+static char *
+cuErrorMsg (CUresult r)
+{
+  int i;
+
+  for (i = 0; i < ARRAYSIZE (cuErrorList); i++)
+    {
+      if (cuErrorList[i].r == r)
+	return &cuErrorList[i].m[0];
+    }
+
+  sprintf (&errmsg[0], "unknown result code: %5d", r);
+
+  return &errmsg[0];
+}
+
+struct targ_fn_descriptor
+{
+  CUfunction fn;
+  const char *name;
+};
+
+static bool PTX_inited = false;
+
+struct PTX_stream
+{
+  CUstream stream;
+  pthread_t host_thread;
+  bool multithreaded;
+
+  CUdeviceptr d;
+  void *h;
+  void *h_begin;
+  void *h_end;
+  void *h_next;
+  void *h_prev;
+  void *h_tail;
+
+  struct PTX_stream *next;
+};
+
+/* Thread-specific data for PTX.  */
+
+struct nvptx_thread
+{
+  struct PTX_stream *current_stream;
+  struct PTX_device *ptx_dev;
+};
+
+struct map
+{
+  int     async;
+  size_t  size;
+  char    mappings[0];
+};
+
+static void
+map_init (struct PTX_stream *s)
+{
+  CUresult r;
+
+  int size = getpagesize ();
+
+  assert (s);
+  assert (!s->d);
+  assert (!s->h);
+
+  r = cuMemAllocHost (&s->h, size);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemAllocHost error: %s", cuErrorMsg (r));
+
+  r = cuMemHostGetDevicePointer (&s->d, s->h, 0);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemHostGetDevicePointer error: %s", cuErrorMsg (r));
+
+  assert (s->h);
+
+  s->h_begin = s->h;
+  s->h_end = s->h_begin + size;
+  s->h_next = s->h_prev = s->h_tail = s->h_begin;
+
+  assert (s->h_next);
+  assert (s->h_end);
+}
+
+static void
+map_fini (struct PTX_stream *s)
+{
+  CUresult r;
+  
+  r = cuMemFreeHost (s->h);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemFreeHost error: %s", cuErrorMsg (r));
+}
+
+static void
+map_pop (struct PTX_stream *s)
+{
+  struct map *m;
+
+  assert (s != NULL);
+  assert (s->h_next);
+  assert (s->h_prev);
+  assert (s->h_tail);
+
+  m = s->h_tail;
+
+  s->h_tail += m->size;
+
+  if (s->h_tail >= s->h_end)
+    s->h_tail = s->h_begin + (int) (s->h_tail - s->h_end);
+
+  if (s->h_next == s->h_tail)
+    s->h_prev = s->h_next;
+
+  assert (s->h_next >= s->h_begin);
+  assert (s->h_tail >= s->h_begin);
+  assert (s->h_prev >= s->h_begin);
+
+  assert (s->h_next <= s->h_end);
+  assert (s->h_tail <= s->h_end);
+  assert (s->h_prev <= s->h_end);
+}
+
+static void
+map_push (struct PTX_stream *s, int async, size_t size, void **h, void **d)
+{
+  int left;
+  int offset;
+  struct map *m;
+
+  assert (s != NULL);
+
+  left = s->h_end - s->h_next;
+  size += sizeof (struct map);
+
+  assert (s->h_prev);
+  assert (s->h_next);
+
+  if (size >= left)
+    {
+      m = s->h_prev;
+      m->size += left;
+      s->h_next = s->h_begin;
+
+      if (s->h_next + size > s->h_end)
+	GOMP_PLUGIN_fatal ("unable to push map");
+    }
+
+  assert (s->h_next);
+
+  m = s->h_next;
+  m->async = async;
+  m->size = size;
+
+  offset = (void *)&m->mappings[0] - s->h;
+
+  *d = (void *)(s->d + offset);
+  *h = (void *)(s->h + offset);
+
+  s->h_prev = s->h_next;
+  s->h_next += size;
+
+  assert (s->h_prev);
+  assert (s->h_next);
+
+  assert (s->h_next >= s->h_begin);
+  assert (s->h_tail >= s->h_begin);
+  assert (s->h_prev >= s->h_begin);
+  assert (s->h_next <= s->h_end);
+  assert (s->h_tail <= s->h_end);
+  assert (s->h_prev <= s->h_end);
+
+  return;
+}
+
+struct PTX_device
+{
+  CUcontext ctx;
+  bool ctx_shared;
+  CUdevice dev;
+  struct PTX_stream *null_stream;
+  /* All non-null streams associated with this device (actually context),
+     either created implicitly or passed in from the user (via
+     acc_set_cuda_stream).  */
+  struct PTX_stream *active_streams;
+  struct {
+    struct PTX_stream **arr;
+    int size;
+  } async_streams;
+  /* A lock for use when manipulating the above stream list and array.  */
+  gomp_mutex_t stream_lock;
+  int ord;
+  bool overlap;
+  bool map;
+  bool concur;
+  int  mode;
+  bool mkern;
+
+  struct PTX_device *next;
+};
+
+enum PTX_event_type
+{
+  PTX_EVT_MEM,
+  PTX_EVT_KNL,
+  PTX_EVT_SYNC,
+  PTX_EVT_ASYNC_CLEANUP
+};
+
+struct PTX_event
+{
+  CUevent *evt;
+  int type;
+  void *addr;
+  int ord;
+
+  struct PTX_event *next;
+};
+
+static gomp_mutex_t PTX_event_lock;
+static struct PTX_event *PTX_events;
+
+#define _XSTR(s) _STR(s)
+#define _STR(s) #s
+
+static struct _synames
+{
+  char *n;
+} cuSymNames[] =
+{
+  { _XSTR(cuCtxCreate) },
+  { _XSTR(cuCtxDestroy) },
+  { _XSTR(cuCtxGetCurrent) },
+  { _XSTR(cuCtxPushCurrent) },
+  { _XSTR(cuCtxSynchronize) },
+  { _XSTR(cuDeviceGet) },
+  { _XSTR(cuDeviceGetAttribute) },
+  { _XSTR(cuDeviceGetCount) },
+  { _XSTR(cuEventCreate) },
+  { _XSTR(cuEventDestroy) },
+  { _XSTR(cuEventQuery) },
+  { _XSTR(cuEventRecord) },
+  { _XSTR(cuInit) },
+  { _XSTR(cuLaunchKernel) },
+  { _XSTR(cuLinkAddData) },
+  { _XSTR(cuLinkComplete) },
+  { _XSTR(cuLinkCreate) },
+  { _XSTR(cuMemAlloc) },
+  { _XSTR(cuMemAllocHost) },
+  { _XSTR(cuMemcpy) },
+  { _XSTR(cuMemcpyDtoH) },
+  { _XSTR(cuMemcpyDtoHAsync) },
+  { _XSTR(cuMemcpyHtoD) },
+  { _XSTR(cuMemcpyHtoDAsync) },
+  { _XSTR(cuMemFree) },
+  { _XSTR(cuMemFreeHost) },
+  { _XSTR(cuMemGetAddressRange) },
+  { _XSTR(cuMemHostGetDevicePointer) },
+  { _XSTR(cuMemHostRegister) },
+  { _XSTR(cuMemHostUnregister) },
+  { _XSTR(cuModuleGetFunction) },
+  { _XSTR(cuModuleLoadData) },
+  { _XSTR(cuStreamDestroy) },
+  { _XSTR(cuStreamQuery) },
+  { _XSTR(cuStreamSynchronize) },
+  { _XSTR(cuStreamWaitEvent) }
+};
+
+static int
+verify_device_library (void)
+{
+  int i;
+  void *dh, *ds;
+
+  dh = dlopen ("libcuda.so", RTLD_LAZY);
+  if (!dh)
+    return -1;
+
+  for (i = 0; i < ARRAYSIZE (cuSymNames); i++)
+    {
+      ds = dlsym (dh, cuSymNames[i].n);
+      if (!ds)
+	{
+	  dlclose (dh);
+	  return -1;
+	}
+    }
+
+  dlclose (dh);
+
+  return 0;
+}
+
+static inline struct nvptx_thread *
+nvptx_thread (void)
+{
+  return (struct nvptx_thread *) GOMP_PLUGIN_acc_thread ();
+}
+
+static void
+init_streams_for_device (struct PTX_device *ptx_dev, int concurrency)
+{
+  int i;
+  struct PTX_stream *null_stream
+    = GOMP_PLUGIN_malloc (sizeof (struct PTX_stream));
+
+  null_stream->stream = NULL;
+  null_stream->host_thread = pthread_self ();
+  null_stream->multithreaded = true;
+  null_stream->d = (CUdeviceptr) NULL;
+  null_stream->h = NULL;
+  map_init (null_stream);
+  ptx_dev->null_stream = null_stream;
+  
+  ptx_dev->active_streams = NULL;
+  GOMP_PLUGIN_mutex_init (&ptx_dev->stream_lock);
+  
+  if (concurrency < 1)
+    concurrency = 1;
+  
+  /* This is just a guess -- make space for as many async streams as the
+     current device is capable of concurrently executing.  This can grow
+     later as necessary.  No streams are created yet.  */
+  ptx_dev->async_streams.arr
+    = GOMP_PLUGIN_malloc (concurrency * sizeof (struct PTX_stream *));
+  ptx_dev->async_streams.size = concurrency;
+  
+  for (i = 0; i < concurrency; i++)
+    ptx_dev->async_streams.arr[i] = NULL;
+}
+
+static void
+fini_streams_for_device (struct PTX_device *ptx_dev)
+{
+  free (ptx_dev->async_streams.arr);
+  
+  while (ptx_dev->active_streams != NULL)
+    {
+      struct PTX_stream *s = ptx_dev->active_streams;
+      ptx_dev->active_streams = ptx_dev->active_streams->next;
+
+      cuStreamDestroy (s->stream);
+      map_fini (s);
+      free (s);
+    }
+  
+  map_fini (ptx_dev->null_stream);
+  free (ptx_dev->null_stream);
+}
+
+/* Select a stream for the (OpenACC-semantics) ASYNC argument on behalf of
+   host thread THREAD, on the current device/context.  If CREATE is true,
+   create the stream if it does not already exist, or, if EXISTING is
+   non-NULL, wrap that CUDA stream instead (replacing any stream currently
+   bound to ASYNC), and associate it with THREAD.  Return the selected
+   stream; this is NULL if CREATE is false and no stream exists for ASYNC.  */
+
+static struct PTX_stream *
+select_stream_for_async (int async, pthread_t thread, bool create,
+			 CUstream existing)
+{
+  struct nvptx_thread *nvthd = nvptx_thread ();
+  /* Local copy of TLS variable.  */
+  struct PTX_device *ptx_dev = nvthd->ptx_dev;
+  struct PTX_stream *stream = NULL;
+  int orig_async = async;
+  
+  /* The special value acc_async_noval (-1) maps (for now) to an
+     implicitly-created stream, which is then handled the same as any other
+     numbered async stream.  Other options are available, e.g. using the null
+     stream for anonymous async operations, or choosing an idle stream from an
+     active set.  But, stick with this for now.  */
+  if (async > acc_async_sync)
+    async++;
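+
+  /* After the adjustment above, the caller's ASYNC value selects a stream
+     as follows (assuming the openacc.h values acc_async_sync == -2 and
+     acc_async_noval == -1):
+
+       caller's ASYNC          stream
+       acc_async_sync (-2)     ptx_dev->null_stream
+       acc_async_noval (-1)    ptx_dev->async_streams.arr[0]
+       0                       ptx_dev->async_streams.arr[1]
+       N > 0                   ptx_dev->async_streams.arr[N + 1]
+
+     Values below acc_async_sync are rejected as "bad async".  */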
+  
+  if (create)
+    GOMP_PLUGIN_mutex_lock (&ptx_dev->stream_lock);
+
+  /* NOTE: AFAICT there's no particular need for acc_async_sync to map to the
+     null stream, and in fact better performance may be obtainable if it doesn't
+     (because the null stream enforces overly-strict synchronisation with
+     respect to other streams for legacy reasons, and that's probably not
+     needed with OpenACC).  Maybe investigate later.  */
+  if (async == acc_async_sync)
+    stream = ptx_dev->null_stream;
+  else if (async >= 0 && async < ptx_dev->async_streams.size
+	   && ptx_dev->async_streams.arr[async] && !(create && existing))
+    stream = ptx_dev->async_streams.arr[async];
+  else if (async >= 0 && create)
+    {
+      if (async >= ptx_dev->async_streams.size)
+	{
+	  int i, newsize = ptx_dev->async_streams.size * 2;
+	  
+	  if (async >= newsize)
+	    newsize = async + 1;
+	  
+	  ptx_dev->async_streams.arr
+	    = GOMP_PLUGIN_realloc (ptx_dev->async_streams.arr,
+				   newsize * sizeof (struct PTX_stream *));
+	  
+	  for (i = ptx_dev->async_streams.size; i < newsize; i++)
+	    ptx_dev->async_streams.arr[i] = NULL;
+	  
+	  ptx_dev->async_streams.size = newsize;
+	}
+
+      /* Create a new stream on-demand if there isn't one already, or if we're
+	 setting a particular async value to an existing (externally-provided)
+	 stream.  */
+      if (!ptx_dev->async_streams.arr[async] || existing)
+        {
+	  CUresult r;
+	  struct PTX_stream *s
+	    = GOMP_PLUGIN_malloc (sizeof (struct PTX_stream));
+
+	  if (existing)
+	    s->stream = existing;
+	  else
+	    {
+	      r = cuStreamCreate (&s->stream, CU_STREAM_DEFAULT);
+	      if (r != CUDA_SUCCESS)
+		GOMP_PLUGIN_fatal ("cuStreamCreate error: %s", cuErrorMsg (r));
+	    }
+	  
+	  /* If CREATE is true, we're going to be queueing some work on this
+	     stream.  Associate it with the current host thread.  */
+	  s->host_thread = thread;
+	  s->multithreaded = false;
+	  
+	  s->d = (CUdeviceptr) NULL;
+	  s->h = NULL;
+	  map_init (s);
+	  
+	  s->next = ptx_dev->active_streams;
+	  ptx_dev->active_streams = s;
+	  ptx_dev->async_streams.arr[async] = s;
+	}
+
+      stream = ptx_dev->async_streams.arr[async];
+    }
+  else if (async < 0)
+    GOMP_PLUGIN_fatal ("bad async %d", async);
+
+  if (create)
+    {
+      assert (stream != NULL);
+
+      /* If we're trying to use the same stream from different threads
+	 simultaneously, set stream->multithreaded to true.  This affects the
+	 behaviour of acc_async_test_all and acc_wait_all, which are supposed to
+	 only wait for asynchronous launches from the same host thread they are
+	 invoked on.  If multiple threads use the same async value, we make note
+	 of that here and fall back to testing/waiting for all threads in those
+	 functions.  */
+      if (thread != stream->host_thread)
+        stream->multithreaded = true;
+
+      GOMP_PLUGIN_mutex_unlock (&ptx_dev->stream_lock);
+    }
+  else if (stream && !stream->multithreaded
+	   && !pthread_equal (stream->host_thread, thread))
+    GOMP_PLUGIN_fatal ("async %d used on wrong thread", orig_async);
+
+  return stream;
+}
+
+static int PTX_get_num_devices (void);
+
+/* Initialize the device.  */
+static int
+PTX_init (void)
+{
+  CUresult r;
+  int rc;
+
+  if (PTX_inited)
+    return PTX_get_num_devices ();
+
+  rc = verify_device_library ();
+  if (rc < 0)
+    return -1;
+
+  r = cuInit (0);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuInit error: %s", cuErrorMsg (r));
+
+  PTX_events = NULL;
+
+  GOMP_PLUGIN_mutex_init (&PTX_event_lock);
+
+  PTX_inited = true;
+
+  return PTX_get_num_devices ();
+}
+
+static void
+PTX_fini (void)
+{
+  PTX_inited = false;
+}
+
+static void *
+PTX_open_device (int n)
+{
+  struct PTX_device *ptx_dev;
+  CUdevice dev;
+  CUresult r;
+  int async_engines, pi;
+
+  r = cuDeviceGet (&dev, n);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGet error: %s", cuErrorMsg (r));
+
+  ptx_dev = GOMP_PLUGIN_malloc (sizeof (struct PTX_device));
+
+  ptx_dev->ord = n;
+  ptx_dev->dev = dev;
+  ptx_dev->ctx_shared = false;
+
+  r = cuCtxGetCurrent (&ptx_dev->ctx);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuErrorMsg (r));
+
+  if (!ptx_dev->ctx)
+    {
+      r = cuCtxCreate (&ptx_dev->ctx, CU_CTX_SCHED_AUTO, dev);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuCtxCreate error: %s", cuErrorMsg (r));
+    }
+  else
+    ptx_dev->ctx_shared = true;
+
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_GPU_OVERLAP, dev);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGetAttribute error: %s", cuErrorMsg (r));
+
+  ptx_dev->overlap = pi;
+
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_CAN_MAP_HOST_MEMORY, dev);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGetAttribute error: %s", cuErrorMsg (r));
+
+  ptx_dev->map = pi;
+
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_CONCURRENT_KERNELS, dev);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGetAttribute error: %s", cuErrorMsg (r));
+
+  ptx_dev->concur = pi;
+
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_COMPUTE_MODE, dev);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGetAttribute error: %s", cuErrorMsg (r));
+
+  ptx_dev->mode = pi;
+
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_INTEGRATED, dev);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGetAttribute error: %s", cuErrorMsg (r));
+
+  ptx_dev->mkern = pi;
+
+  r = cuDeviceGetAttribute (&async_engines,
+			    CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT, dev);
+  if (r != CUDA_SUCCESS)
+    async_engines = 1;
+
+  init_streams_for_device (ptx_dev, async_engines);
+
+  return (void *) ptx_dev;
+}
+
+static int
+PTX_close_device (void *targ_data)
+{
+  CUresult r;
+  struct PTX_device *ptx_dev = targ_data;
+
+  if (!ptx_dev)
+    return 0;
+  
+  fini_streams_for_device (ptx_dev);
+
+  if (!ptx_dev->ctx_shared)
+    {
+      r = cuCtxDestroy (ptx_dev->ctx);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuCtxDestroy error: %s", cuErrorMsg (r));
+    }
+
+  free (ptx_dev);
+
+  return 0;
+}
+
+static int
+PTX_get_num_devices (void)
+{
+  int n;
+  CUresult r;
+
+  /* This function will be called before the plugin has been initialized in
+     order to enumerate available devices, but CUDA API routines can't be used
+     until cuInit has been called.  Just call it now (but don't yet do any
+     further initialization).  */
+  if (!PTX_inited)
+    cuInit (0);
+
+  r = cuDeviceGetCount (&n);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGetCount error: %s", cuErrorMsg (r));
+
+  return n;
+}
+
+#define ABORT_PTX				\
+  ".version 3.1\n"				\
+  ".target sm_30\n"				\
+  ".address_size 64\n"				\
+  ".visible .func abort;\n"			\
+  ".visible .func abort\n"			\
+  "{\n"						\
+  "trap;\n"					\
+  "ret;\n"					\
+  "}\n"						\
+  ".visible .func _gfortran_abort;\n"		\
+  ".visible .func _gfortran_abort\n"		\
+  "{\n"						\
+  "trap;\n"					\
+  "ret;\n"					\
+  "}\n"
+
+/* Generated with:
+
+   $ echo 'int acc_on_device(int d) { return __builtin_acc_on_device(d); } int acc_on_device_(int *d) { return acc_on_device(*d); }' | accel-gcc/xgcc -Baccel-gcc -x c - -o - -S -m64 -O3 -fno-builtin-acc_on_device -fno-inline
+*/
+#define ACC_ON_DEVICE_PTX						\
+  "        .version        3.1\n"					\
+  "        .target sm_30\n"						\
+  "        .address_size 64\n"						\
+  ".visible .func (.param.u32 %out_retval)acc_on_device(.param.u32 %in_ar1);\n" \
+  ".visible .func (.param.u32 %out_retval)acc_on_device(.param.u32 %in_ar1)\n" \
+  "{\n"									\
+  "        .reg.u32 %ar1;\n"						\
+  ".reg.u32 %retval;\n"							\
+  "        .reg.u64 %hr10;\n"						\
+  "        .reg.u32 %r24;\n"						\
+  "        .reg.u32 %r25;\n"						\
+  "        .reg.pred %r27;\n"						\
+  "        .reg.u32 %r30;\n"						\
+  "        ld.param.u32 %ar1, [%in_ar1];\n"				\
+  "                mov.u32 %r24, %ar1;\n"				\
+  "                setp.ne.u32 %r27,%r24,4;\n"				\
+  "                set.u32.eq.u32 %r30,%r24,5;\n"			\
+  "                neg.s32 %r25, %r30;\n"				\
+  "        @%r27   bra     $L3;\n"					\
+  "                mov.u32 %r25, 1;\n"					\
+  "$L3:\n"								\
+  "                mov.u32 %retval, %r25;\n"				\
+  "        st.param.u32    [%out_retval], %retval;\n"			\
+  "        ret;\n"							\
+  "        }\n"								\
+  ".visible .func (.param.u32 %out_retval)acc_on_device_(.param.u64 %in_ar1);\n" \
+  ".visible .func (.param.u32 %out_retval)acc_on_device_(.param.u64 %in_ar1)\n" \
+  "{\n"									\
+  "        .reg.u64 %ar1;\n"						\
+  ".reg.u32 %retval;\n"							\
+  "        .reg.u64 %hr10;\n"						\
+  "        .reg.u64 %r25;\n"						\
+  "        .reg.u32 %r26;\n"						\
+  "        .reg.u32 %r27;\n"						\
+  "        ld.param.u64 %ar1, [%in_ar1];\n"				\
+  "                mov.u64 %r25, %ar1;\n"				\
+  "                ld.u32  %r26, [%r25];\n"				\
+  "        {\n"								\
+  "                .param.u32 %retval_in;\n"				\
+  "        {\n"								\
+  "                .param.u32 %out_arg0;\n"				\
+  "                st.param.u32 [%out_arg0], %r26;\n"			\
+  "                call (%retval_in), acc_on_device, (%out_arg0);\n"	\
+  "        }\n"								\
+  "                ld.param.u32    %r27, [%retval_in];\n"		\
+  "}\n"									\
+  "                mov.u32 %retval, %r27;\n"				\
+  "        st.param.u32    [%out_retval], %retval;\n"			\
+  "        ret;\n"							\
+  "        }"
+
+static void
+link_ptx (CUmodule *module, char *ptx_code)
+{
+  CUjit_option opts[7];
+  void *optvals[7];
+  float elapsed = 0.0;
+#define LOGSIZE 8192
+  char elog[LOGSIZE];
+  char ilog[LOGSIZE];
+  unsigned long logsize = LOGSIZE;
+  CUlinkState linkstate;
+  CUresult r;
+  void *linkout;
+  size_t linkoutsize __attribute__((unused));
+
+  GOMP_PLUGIN_notify ("attempting to load:\n---\n%s\n---\n", ptx_code);
+
+  opts[0] = CU_JIT_WALL_TIME;
+  optvals[0] = &elapsed;
+
+  opts[1] = CU_JIT_INFO_LOG_BUFFER;
+  optvals[1] = &ilog[0];
+
+  opts[2] = CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES;
+  optvals[2] = (void *) logsize;
+
+  opts[3] = CU_JIT_ERROR_LOG_BUFFER;
+  optvals[3] = &elog[0];
+
+  opts[4] = CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES;
+  optvals[4] = (void *) logsize;
+
+  opts[5] = CU_JIT_LOG_VERBOSE;
+  optvals[5] = (void *) 1;
+
+  opts[6] = CU_JIT_TARGET;
+  optvals[6] = (void *) CU_TARGET_COMPUTE_30;
+
+  r = cuLinkCreate (7, opts, optvals, &linkstate);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuLinkCreate error: %s", cuErrorMsg (r));
+
+  char *abort_ptx = ABORT_PTX;
+  r = cuLinkAddData (linkstate, CU_JIT_INPUT_PTX, abort_ptx,
+		     strlen (abort_ptx) + 1, 0, 0, 0, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      GOMP_PLUGIN_error ("Link error log %s\n", &elog[0]);
+      GOMP_PLUGIN_fatal ("cuLinkAddData (abort) error: %s", cuErrorMsg (r));
+    }
+
+  char *acc_on_device_ptx = ACC_ON_DEVICE_PTX;
+  r = cuLinkAddData (linkstate, CU_JIT_INPUT_PTX, acc_on_device_ptx,
+		     strlen (acc_on_device_ptx) + 1, 0, 0, 0, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      GOMP_PLUGIN_error ("Link error log %s\n", &elog[0]);
+      GOMP_PLUGIN_fatal ("cuLinkAddData (acc_on_device) error: %s",
+			 cuErrorMsg (r));
+    }
+
+  r = cuLinkAddData (linkstate, CU_JIT_INPUT_PTX, ptx_code,
+		     strlen (ptx_code) + 1, 0, 0, 0, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      GOMP_PLUGIN_error ("Link error log %s\n", &elog[0]);
+      GOMP_PLUGIN_fatal ("cuLinkAddData (ptx_code) error: %s", cuErrorMsg (r));
+    }
+
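+  r = cuLinkComplete (linkstate, &linkout, &linkoutsize);
+  if (r != CUDA_SUCCESS)
+    {
+      GOMP_PLUGIN_error ("Link error log %s\n", &elog[0]);
+      GOMP_PLUGIN_fatal ("cuLinkComplete error: %s", cuErrorMsg (r));
+    }
+
+  GOMP_PLUGIN_notify ("Link complete: %fms\n", elapsed);
+  GOMP_PLUGIN_notify ("Link log %s\n", &ilog[0]);
+
+  r = cuModuleLoadData (module, linkout);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuModuleLoadData error: %s", cuErrorMsg (r));
+
+  /* LINKOUT is owned by LINKSTATE, so only destroy the link state once the
+     module has been loaded from it.  */
+  r = cuLinkDestroy (linkstate);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuLinkDestroy error: %s", cuErrorMsg (r));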
+}
+
+static void
+event_gc (bool memmap_lockable)
+{
+  struct PTX_event *e;
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  GOMP_PLUGIN_mutex_lock (&PTX_event_lock);
+
+  /* Read the list head only once the lock is held.  */
+  e = PTX_events;
+
+  while (e != NULL)
+    {
+      CUresult r;
+
+      if (e->ord != nvthd->ptx_dev->ord)
+	{
+	  e = e->next;
+	  continue;
+	}
+
+      r = cuEventQuery (*e->evt);
+      if (r == CUDA_SUCCESS)
+	{
+	  CUevent *te;
+
+	  te = e->evt;
+
+	  switch (e->type)
+	    {
+	    case PTX_EVT_MEM:
+	    case PTX_EVT_SYNC:
+	      break;
+	    
+	    case PTX_EVT_KNL:
+	      map_pop (e->addr);
+	      break;
+	    
+	    case PTX_EVT_ASYNC_CLEANUP:
+	      {
+		/* GOMP_PLUGIN_async_unmap_vars needs to claim the memory-map
+		   splay tree lock for the current device, so we can't call it
+		   when one of our callers has already claimed the lock.  In
+		   that case, just delay the GC for this event until later.  */
+		if (!memmap_lockable)
+		  {
+		    e = e->next;
+		    continue;
+		  }
+
+		GOMP_PLUGIN_async_unmap_vars (e->addr);
+	      }
+	      break;
+	    }
+
+	  cuEventDestroy (*te);
+	  free ((void *)te);
+
+	  struct PTX_event *next = e->next;
+
+	  if (PTX_events == e)
+	    PTX_events = PTX_events->next;
+	  else
+	    {
+	      struct PTX_event *e_ = PTX_events;
+	      while (e_->next != e)
+		e_ = e_->next;
+	      e_->next = e_->next->next;
+	    }
+
+	  free (e);
+	  e = next;
+        }
+      else
+	e = e->next;
+    }
+
+  GOMP_PLUGIN_mutex_unlock (&PTX_event_lock);
+}
+
+static void
+event_add (enum PTX_event_type type, CUevent *e, void *h)
+{
+  struct PTX_event *ptx_event;
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  assert (type == PTX_EVT_MEM || type == PTX_EVT_KNL || type == PTX_EVT_SYNC
+	  || type == PTX_EVT_ASYNC_CLEANUP);
+
+  ptx_event = GOMP_PLUGIN_malloc (sizeof (struct PTX_event));
+  ptx_event->type = type;
+  ptx_event->evt = e;
+  ptx_event->addr = h;
+  ptx_event->ord = nvthd->ptx_dev->ord;
+
+  GOMP_PLUGIN_mutex_lock (&PTX_event_lock);
+
+  ptx_event->next = PTX_events;
+  PTX_events = ptx_event;
+
+  GOMP_PLUGIN_mutex_unlock (&PTX_event_lock);
+}
+
+void
+PTX_exec (void *fn, size_t mapnum, void **hostaddrs, void **devaddrs,
+	  size_t *sizes, unsigned short *kinds, int num_gangs, int num_workers,
+	  int vector_length, int async, void *targ_mem_desc)
+{
+  struct targ_fn_descriptor *targ_fn = (struct targ_fn_descriptor *) fn;
+  CUfunction function;
+  CUresult r;
+  size_t i;
+  struct PTX_stream *dev_str;
+  void *kargs[1];
+  void *hp, *dp;
+  unsigned int nthreads_in_block;
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  function = targ_fn->fn;
+  
+  dev_str = select_stream_for_async (async, pthread_self (), false, NULL);
+  assert (dev_str == nvthd->current_stream);
+
+  /* This reserves a chunk of a pre-allocated page of memory mapped on both
+     the host and the device. HP is a host pointer to the new chunk, and DP is
+     the corresponding device pointer.  */
+  map_push (dev_str, async, mapnum * sizeof (void *), &hp, &dp);
+
+  GOMP_PLUGIN_notify ("  %s: prepare mappings\n", __FUNCTION__);
+
+  /* Copy the array of arguments to the mapped page.  */
+  for (i = 0; i < mapnum; i++)
+    ((void **) hp)[i] = devaddrs[i];
+
+  /* Copy the (device) pointers to arguments to the device (dp and hp might in
+     fact have the same value on a unified-memory system).  */
+  r = cuMemcpy ((CUdeviceptr)dp, (CUdeviceptr)hp, mapnum * sizeof (void *));
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemcpy failed: %s", cuErrorMsg (r));
+
+  GOMP_PLUGIN_notify ("  %s: kernel %s: launch\n", __FUNCTION__, targ_fn->name);
+
+  /* XXX: Possible geometry mappings:
+
+     OpenACC		CUDA
+
+     num_gangs		blocks
+     num_workers	warps (where a warp is equivalent to 32 threads)
+     vector_length	threads  */
+
+  /* The OpenACC vector_length clause "determines the vector length to use for
+     vector or SIMD operations".  The question is how to map this to CUDA.
+
+     In CUDA, the warp size plays the role of the hardware vector length.
+     However, kernels are launched in terms of threads per block rather than
+     warps, and the number of threads per block is bounded by a per-device
+     limit (CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK), which is normally a
+     multiple of the warp size.
+
+     We choose to map the OpenACC vector_length directly onto the number of
+     threads in a block, in the x dimension.  This is reflected in GCC code
+     generation, which uses the x thread index (threadIdx.x in CUDA terms) to
+     access vector elements.  num_gangs and num_workers are not used yet:
+     every kernel is launched as a single block.
+
+     Attempting to use an OpenACC vector_length larger than the maximum
+     number of threads per block will result in a CUDA launch error.  */
+  nthreads_in_block = vector_length;
+
+  kargs[0] = &dp;
+  r = cuLaunchKernel (function,
+			1, 1, 1,
+			nthreads_in_block, 1, 1,
+			0, dev_str->stream, kargs, 0);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuLaunchKernel error: %s", cuErrorMsg (r));
+
+#ifndef DISABLE_ASYNC
+  if (async < acc_async_noval)
+    {
+      r = cuStreamSynchronize (dev_str->stream);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuStreamSynchronize error: %s", cuErrorMsg (r));
+    }
+  else
+    {
+      CUevent *e;
+
+      e = (CUevent *)GOMP_PLUGIN_malloc (sizeof (CUevent));
+
+      r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuEventCreate error: %s", cuErrorMsg (r));
+
+      event_gc (true);
+
+      r = cuEventRecord (*e, dev_str->stream);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuErrorMsg (r));
+
+      event_add (PTX_EVT_KNL, e, (void *)dev_str);
+    }
+#else
+  r = cuCtxSynchronize ();
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuCtxSynchronize error: %s", cuErrorMsg (r));
+#endif
+
+  GOMP_PLUGIN_notify ("  %s: kernel %s: finished\n", __FUNCTION__,
+		      targ_fn->name);
+
+#ifndef DISABLE_ASYNC
+  if (async < acc_async_noval)
+#endif
+    map_pop (dev_str);
+}
+
+void * openacc_get_current_cuda_context (void);
+
+static void *
+PTX_alloc (size_t s)
+{
+  CUdeviceptr d;
+  CUresult r;
+
+  r = cuMemAlloc (&d, s);
+  if (r == CUDA_ERROR_OUT_OF_MEMORY)
+    return 0;
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemAlloc error: %s", cuErrorMsg (r));
+  return (void *)d;
+}
+
+static void
+PTX_free (void *p)
+{
+  CUresult r;
+  CUdeviceptr pb;
+  size_t ps;
+
+  r = cuMemGetAddressRange (&pb, &ps, (CUdeviceptr)p);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemGetAddressRange error: %s", cuErrorMsg (r));
+
+  if ((CUdeviceptr)p != pb)
+    GOMP_PLUGIN_fatal ("invalid device address");
+
+  r = cuMemFree ((CUdeviceptr)p);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemFree error: %s", cuErrorMsg (r));
+}
+
+static void *
+PTX_host2dev (void *d, const void *h, size_t s)
+{
+  CUresult r;
+  CUdeviceptr pb;
+  size_t ps;
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  if (!s)
+    return 0;
+
+  if (!d)
+    GOMP_PLUGIN_fatal ("invalid device address");
+
+  r = cuMemGetAddressRange (&pb, &ps, (CUdeviceptr)d);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemGetAddressRange error: %s", cuErrorMsg (r));
+
+  if (!pb)
+    GOMP_PLUGIN_fatal ("invalid device address");
+
+  if (!h)
+    GOMP_PLUGIN_fatal ("invalid host address");
+
+  if (d == h)
+    GOMP_PLUGIN_fatal ("invalid host or device address");
+
+  if ((void *)(d + s) > (void *)(pb + ps))
+    GOMP_PLUGIN_fatal ("invalid size");
+
+#ifndef DISABLE_ASYNC
+  if (nvthd->current_stream != nvthd->ptx_dev->null_stream)
+    {
+      CUevent *e;
+
+      e = (CUevent *)GOMP_PLUGIN_malloc (sizeof (CUevent));
+
+      r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuEventCreate error: %s", cuErrorMsg (r));
+
+      event_gc (false);
+
+      r = cuMemcpyHtoDAsync ((CUdeviceptr)d, h, s,
+			     nvthd->current_stream->stream);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuMemcpyHtoDAsync error: %s", cuErrorMsg (r));
+
+      r = cuEventRecord (*e, nvthd->current_stream->stream);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuErrorMsg (r));
+
+      event_add (PTX_EVT_MEM, e, (void *)h);
+    }
+  else
+#endif
+    {
+      r = cuMemcpyHtoD ((CUdeviceptr)d, h, s);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuMemcpyHtoD error: %s", cuErrorMsg (r));
+    }
+
+  return 0;
+}
+
+static void *
+PTX_dev2host (void *h, const void *d, size_t s)
+{
+  CUresult r;
+  CUdeviceptr pb;
+  size_t ps;
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  if (!s)
+    return 0;
+
+  if (!d)
+    GOMP_PLUGIN_fatal ("invalid device address");
+
+  r = cuMemGetAddressRange (&pb, &ps, (CUdeviceptr)d);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemGetAddressRange error: %s", cuErrorMsg (r));
+
+  if (!pb)
+    GOMP_PLUGIN_fatal ("invalid device address");
+
+  if (!h)
+    GOMP_PLUGIN_fatal ("invalid host address");
+
+  if (d == h)
+    GOMP_PLUGIN_fatal ("invalid host or device address");
+
+  if ((void *)(d + s) > (void *)(pb + ps))
+    GOMP_PLUGIN_fatal ("invalid size");
+
+#ifndef DISABLE_ASYNC
+  if (nvthd->current_stream != nvthd->ptx_dev->null_stream)
+    {
+      CUevent *e;
+
+      e = (CUevent *)GOMP_PLUGIN_malloc (sizeof (CUevent));
+
+      r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuEventCreate error: %s", cuErrorMsg (r));
+
+      event_gc (false);
+
+      r = cuMemcpyDtoHAsync (h, (CUdeviceptr)d, s,
+			     nvthd->current_stream->stream);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuMemcpyDtoHAsync error: %s", cuErrorMsg (r));
+
+      r = cuEventRecord (*e, nvthd->current_stream->stream);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuErrorMsg (r));
+
+      event_add (PTX_EVT_MEM, e, (void *)h);
+    }
+  else
+#endif
+    {
+      r = cuMemcpyDtoH (h, (CUdeviceptr)d, s);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuMemcpyDtoH error: %s", cuErrorMsg (r));
+    }
+
+  return 0;
+}
+
+static void
+PTX_set_async (int async)
+{
+  struct nvptx_thread *nvthd = nvptx_thread ();
+  nvthd->current_stream
+    = select_stream_for_async (async, pthread_self (), true, NULL);
+}
+
+static int
+PTX_async_test (int async)
+{
+  CUresult r;
+  struct PTX_stream *s;
+  
+  s = select_stream_for_async (async, pthread_self (), false, NULL);
+
+  if (!s)
+    GOMP_PLUGIN_fatal ("unknown async %d", async);
+
+  r = cuStreamQuery (s->stream);
+  if (r == CUDA_SUCCESS)
+    {
+      /* The oacc-parallel.c:goacc_wait function calls this hook to determine
+	 whether all work has completed on this stream, and if so omits the call
+	 to the wait hook.  If that happens, event_gc might not get called
+	 (which prevents variables from getting unmapped and their associated
+	 device storage freed), so call it here.  */
+      event_gc (true);
+      return 1;
+    }
+  else if (r == CUDA_ERROR_NOT_READY)
+    return 0;
+
+  GOMP_PLUGIN_fatal ("cuStreamQuery error: %s", cuErrorMsg (r));
+
+  return 0;
+}
+
+static int
+PTX_async_test_all (void)
+{
+  struct PTX_stream *s;
+  pthread_t self = pthread_self ();
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  GOMP_PLUGIN_mutex_lock (&nvthd->ptx_dev->stream_lock);
+
+  for (s = nvthd->ptx_dev->active_streams; s != NULL; s = s->next)
+    {
+      if ((s->multithreaded || pthread_equal (s->host_thread, self))
+	  && cuStreamQuery (s->stream) == CUDA_ERROR_NOT_READY)
+	{
+	  GOMP_PLUGIN_mutex_unlock (&nvthd->ptx_dev->stream_lock);
+	  return 0;
+	}
+    }
+
+  GOMP_PLUGIN_mutex_unlock (&nvthd->ptx_dev->stream_lock);
+
+  event_gc (true);
+
+  return 1;
+}
+
+static void
+PTX_wait (int async)
+{
+  CUresult r;
+  struct PTX_stream *s;
+  
+  s = select_stream_for_async (async, pthread_self (), false, NULL);
+
+  if (!s)
+    GOMP_PLUGIN_fatal ("unknown async %d", async);
+
+  r = cuStreamSynchronize (s->stream);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuStreamSynchronize error: %s", cuErrorMsg (r));
+  
+  event_gc (true);
+}
+
+static void
+PTX_wait_async (int async1, int async2)
+{
+  CUresult r;
+  CUevent *e;
+  struct PTX_stream *s1, *s2;
+  pthread_t self = pthread_self ();
+
+  /* The stream that is waiting (rather than being waited for) doesn't
+     necessarily have to exist already.  */
+  s2 = select_stream_for_async (async2, self, true, NULL);
+
+  s1 = select_stream_for_async (async1, self, false, NULL);
+  if (!s1)
+    GOMP_PLUGIN_fatal ("invalid async %d", async1);
+
+  if (s1 == s2)
+    GOMP_PLUGIN_fatal ("identical parameters");
+
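+  /* The steps below make S2 wait for the work queued so far on S1 without
+     blocking the host.  In outline (a sketch, error checking omitted):
+
+       cuEventRecord (ev, s1);          marks the current tail of S1
+       cuStreamWaitEvent (s2, ev, 0);   later work on S2 waits for that mark
+
+     The event is registered with event_add so that event_gc can destroy it
+     once it has completed.  */
+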
+  e = (CUevent *)GOMP_PLUGIN_malloc (sizeof (CUevent));
+
+  r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuEventCreate error: %s", cuErrorMsg (r));
+
+  event_gc (true);
+
+  r = cuEventRecord (*e, s1->stream);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuErrorMsg (r));
+
+  event_add (PTX_EVT_SYNC, e, NULL);
+
+  r = cuStreamWaitEvent (s2->stream, *e, 0);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuStreamWaitEvent error: %s", cuErrorMsg (r));
+}
+
+static void
+PTX_wait_all (void)
+{
+  CUresult r;
+  struct PTX_stream *s;
+  pthread_t self = pthread_self ();
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  GOMP_PLUGIN_mutex_lock (&nvthd->ptx_dev->stream_lock);
+
+  /* Wait for active streams initiated by this thread (or by multiple threads)
+     to complete.  */
+  for (s = nvthd->ptx_dev->active_streams; s != NULL; s = s->next)
+    {
+      if (s->multithreaded || pthread_equal (s->host_thread, self))
+	{
+	  r = cuStreamQuery (s->stream);
+	  if (r == CUDA_SUCCESS)
+	    continue;
+	  else if (r != CUDA_ERROR_NOT_READY)
+	    GOMP_PLUGIN_fatal ("cuStreamQuery error: %s", cuErrorMsg (r));
+
+	  r = cuStreamSynchronize (s->stream);
+	  if (r != CUDA_SUCCESS)
+	    GOMP_PLUGIN_fatal ("cuStreamSynchronize error: %s", cuErrorMsg (r));
+	}
+    }
+
+  GOMP_PLUGIN_mutex_unlock (&nvthd->ptx_dev->stream_lock);
+
+  event_gc (true);
+}
+
+static void
+PTX_wait_all_async (int async)
+{
+  CUresult r;
+  struct PTX_stream *waiting_stream, *other_stream;
+  CUevent *e;
+  struct nvptx_thread *nvthd = nvptx_thread ();
+  pthread_t self = pthread_self ();
+  
+  /* The stream doing the waiting.  This could be the first mention of the
+     stream, so create it if necessary.  */
+  waiting_stream
+    = select_stream_for_async (async, self, true, NULL);
+  
+  /* Launches on the null stream already block on other streams in the
+     context.  */
+  if (!waiting_stream || waiting_stream == nvthd->ptx_dev->null_stream)
+    return;
+
+  event_gc (true);
+
+  GOMP_PLUGIN_mutex_lock (&nvthd->ptx_dev->stream_lock);
+
+  for (other_stream = nvthd->ptx_dev->active_streams;
+       other_stream != NULL;
+       other_stream = other_stream->next)
+    {
+      if (!other_stream->multithreaded
+	  && !pthread_equal (other_stream->host_thread, self))
+	continue;
+
+      e = (CUevent *) GOMP_PLUGIN_malloc (sizeof (CUevent));
+
+      r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuEventCreate error: %s", cuErrorMsg (r));
+
+      /* Record an event on the waited-for stream.  */
+      r = cuEventRecord (*e, other_stream->stream);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuErrorMsg (r));
+
+      event_add (PTX_EVT_SYNC, e, NULL);
+
+      r = cuStreamWaitEvent (waiting_stream->stream, *e, 0);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuStreamWaitEvent error: %s", cuErrorMsg (r));
+    }
+
+  GOMP_PLUGIN_mutex_unlock (&nvthd->ptx_dev->stream_lock);
+}
+
+static void *
+PTX_get_current_cuda_device (void)
+{
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  if (!nvthd || !nvthd->ptx_dev)
+    return NULL;
+
+  return &nvthd->ptx_dev->dev;
+}
+
+static void *
+PTX_get_current_cuda_context (void)
+{
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  if (!nvthd || !nvthd->ptx_dev)
+    return NULL;
+
+  return nvthd->ptx_dev->ctx;
+}
+
+static void *
+PTX_get_cuda_stream (int async)
+{
+  struct PTX_stream *s;
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  if (!nvthd || !nvthd->ptx_dev)
+    return NULL;
+
+  s = select_stream_for_async (async, pthread_self (), false, NULL);
+
+  return s ? s->stream : NULL;
+}
+
+static int
+PTX_set_cuda_stream (int async, void *stream)
+{
+  struct PTX_stream *oldstream;
+  pthread_t self = pthread_self ();
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  GOMP_PLUGIN_mutex_lock (&nvthd->ptx_dev->stream_lock);
+
+  if (async < 0)
+    GOMP_PLUGIN_fatal ("bad async %d", async);
+
+  /* We have a list of active streams and an array mapping async values to
+     entries of that list.  We need to take "ownership" of the passed-in stream,
+     and add it to our list, removing the previous entry also (if there was one)
+     in order to prevent resource leaks.  Note the potential for surprise
+     here: maybe we should keep track of passed-in streams and leave it up to
+     the user to tidy those up, but that doesn't work for stream handles
+     returned from acc_get_cuda_stream above...  */
+
+  oldstream = select_stream_for_async (async, self, false, NULL);
+
+  if (oldstream)
+    {
+      if (nvthd->ptx_dev->active_streams == oldstream)
+	nvthd->ptx_dev->active_streams = nvthd->ptx_dev->active_streams->next;
+      else
+	{
+	  struct PTX_stream *s = nvthd->ptx_dev->active_streams;
+	  while (s->next != oldstream)
+	    s = s->next;
+	  s->next = s->next->next;
+	}
+
+      cuStreamDestroy (oldstream->stream);
+      map_fini (oldstream);
+      free (oldstream);
+    }
+
+  GOMP_PLUGIN_mutex_unlock (&nvthd->ptx_dev->stream_lock);
+
+  (void) select_stream_for_async (async, self, true, (CUstream) stream);
+
+  return 1;
+}
+
+/* Plugin entry points.  */
+
+
+int
+GOMP_OFFLOAD_get_type (void)
+{
+  return OFFLOAD_TARGET_TYPE_NVIDIA_PTX;
+}
+
+unsigned int
+GOMP_OFFLOAD_get_caps (void)
+{
+  return TARGET_CAP_OPENACC_200;
+}
+
+const char *
+GOMP_OFFLOAD_get_name (void)
+{
+  return "nvidia";
+}
+
+int
+GOMP_OFFLOAD_get_num_devices (void)
+{
+  return PTX_get_num_devices ();
+}
+
+static void **kernel_target_data;
+static void **kernel_host_table;
+
+void
+GOMP_OFFLOAD_register_image (void *host_table, void *target_data)
+{
+  kernel_target_data = target_data;
+  kernel_host_table = host_table;
+}
+
+void
+GOMP_OFFLOAD_init_device (int n __attribute__((unused)))
+{
+  (void) PTX_init ();
+}
+
+void
+GOMP_OFFLOAD_fini_device (int n __attribute__((unused)))
+{
+  PTX_fini ();
+}
+
+int
+GOMP_OFFLOAD_get_table (int n __attribute__((unused)),
+			struct mapping_table **tablep)
+{
+  CUmodule module;
+  void **fn_table;
+  char **fn_names;
+  int fn_entries, i;
+  CUresult r;
+  struct targ_fn_descriptor *targ_fns;
+
+  if (PTX_init () <= 0)
+    return 0;
+
+  /* This isn't an error, because an image may legitimately have no offloaded
+     regions and so will not call GOMP_offload_register.  */
+  if (kernel_target_data == NULL)
+    return 0;
+
+  link_ptx (&module, kernel_target_data[0]);
+
+  /* kernel_target_data[0] -> ptx code
+     kernel_target_data[1] -> variable mappings
+     kernel_target_data[2] -> array of kernel names in ascii
+
+     kernel_host_table[0] -> start of function addresses (_omp_func_table)
+     kernel_host_table[1] -> end of function addresses (_omp_funcs_end)
+
+     The array of kernel names and the function addresses form a
+     one-to-one correspondence.  */
+
+  fn_table = kernel_host_table[0];
+  fn_names = (char **) kernel_target_data[2];
+  fn_entries = (void **) kernel_host_table[1] - fn_table;
+
+  *tablep = GOMP_PLUGIN_malloc (sizeof (struct mapping_table) * fn_entries);
+  targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor)
+				 * fn_entries);
+
+  for (i = 0; i < fn_entries; i++)
+    {
+      CUfunction function;
+
+      r = cuModuleGetFunction (&function, module, fn_names[i]);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuModuleGetFunction error: %s", cuErrorMsg (r));
+
+      targ_fns[i].fn = function;
+      targ_fns[i].name = (const char *) fn_names[i];
+
+      (*tablep)[i].host_start = (uintptr_t) fn_table[i];
+      (*tablep)[i].host_end = (*tablep)[i].host_start + 1;
+      (*tablep)[i].tgt_start = (uintptr_t) &targ_fns[i];
+      (*tablep)[i].tgt_end = (*tablep)[i].tgt_start + 1;
+    }
+
+  return fn_entries;
+}
+
+void *
+GOMP_OFFLOAD_alloc (int n __attribute__((unused)), size_t size)
+{
+  return PTX_alloc (size);
+}
+
+void
+GOMP_OFFLOAD_free (int n __attribute__((unused)), void *ptr)
+{
+  PTX_free (ptr);
+}
+
+void *
+GOMP_OFFLOAD_dev2host (int ord __attribute__((unused)), void *dst,
+		       const void *src, size_t n)
+{
+  return PTX_dev2host (dst, src, n);
+}
+
+void *
+GOMP_OFFLOAD_host2dev (int ord __attribute__((unused)), void *dst,
+		       const void *src, size_t n)
+{
+  return PTX_host2dev (dst, src, n);
+}
+
+void (*device_run) (void *fn_ptr, void *vars) = NULL;
+
+void
+GOMP_OFFLOAD_openacc_parallel (void (*fn) (void *), size_t mapnum,
+			      void **hostaddrs, void **devaddrs, size_t *sizes,
+			      unsigned short *kinds, int num_gangs,
+			      int num_workers, int vector_length, int async,
+			      void *targ_mem_desc)
+{
+  PTX_exec (fn, mapnum, hostaddrs, devaddrs, sizes, kinds, num_gangs,
+	    num_workers, vector_length, async, targ_mem_desc);
+}
+
+void *
+GOMP_OFFLOAD_openacc_open_device (int n)
+{
+  return PTX_open_device (n);
+}
+
+int
+GOMP_OFFLOAD_openacc_close_device (void *h)
+{
+  return PTX_close_device (h);
+}
+
+void
+GOMP_OFFLOAD_openacc_set_device_num (int n)
+{
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  assert (n >= 0);
+
+  if (!nvthd->ptx_dev || nvthd->ptx_dev->ord != n)
+    (void) PTX_open_device (n);
+}
+
+/* This can be called before the device is "opened" for the current thread, in
+   which case we can't tell which device number should be returned.  We don't
+   actually want to open the device here, so just return -1 and let the caller
+   (oacc-init.c:acc_get_device_num) handle it.  */
+
+int
+GOMP_OFFLOAD_openacc_get_device_num (void)
+{
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  if (nvthd && nvthd->ptx_dev)
+    return nvthd->ptx_dev->ord;
+  else
+    return -1;
+}
+
+void
+GOMP_OFFLOAD_openacc_register_async_cleanup (void *targ_mem_desc)
+{
+  CUevent *e;
+  CUresult r;
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  e = (CUevent *) GOMP_PLUGIN_malloc (sizeof (CUevent));
+
+  r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuEventCreate error: %s", cuErrorMsg (r));
+
+  r = cuEventRecord (*e, nvthd->current_stream->stream);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuErrorMsg (r));
+
+  event_add (PTX_EVT_ASYNC_CLEANUP, e, targ_mem_desc);
+}
+
+int
+GOMP_OFFLOAD_openacc_async_test (int async)
+{
+  return PTX_async_test (async);
+}
+
+int
+GOMP_OFFLOAD_openacc_async_test_all (void)
+{
+  return PTX_async_test_all ();
+}
+
+void
+GOMP_OFFLOAD_openacc_async_wait (int async)
+{
+  PTX_wait (async);
+}
+
+void
+GOMP_OFFLOAD_openacc_async_wait_async (int async1, int async2)
+{
+  PTX_wait_async (async1, async2);
+}
+
+void
+GOMP_OFFLOAD_openacc_async_wait_all (void)
+{
+  PTX_wait_all ();
+}
+
+void
+GOMP_OFFLOAD_openacc_async_wait_all_async (int async)
+{
+  PTX_wait_all_async (async);
+}
+
+void
+GOMP_OFFLOAD_openacc_async_set_async (int async)
+{
+  PTX_set_async (async);
+}
+
+void *
+GOMP_OFFLOAD_openacc_create_thread_data (void *targ_data)
+{
+  struct PTX_device *ptx_dev = (struct PTX_device *) targ_data;
+  struct nvptx_thread *nvthd
+    = GOMP_PLUGIN_malloc (sizeof (struct nvptx_thread));
+  CUresult r;
+  CUcontext thd_ctx;
+
+  r = cuCtxGetCurrent (&thd_ctx);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuErrorMsg (r));
+
+  assert (ptx_dev->ctx);
+
+  if (!thd_ctx)
+    {
+      r = cuCtxPushCurrent (ptx_dev->ctx);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuCtxPushCurrent error: %s", cuErrorMsg (r));
+    }
+
+  nvthd->current_stream = ptx_dev->null_stream;
+  nvthd->ptx_dev = ptx_dev;
+
+  return (void *) nvthd;
+}
+
+void
+GOMP_OFFLOAD_openacc_destroy_thread_data (void *data)
+{
+  free (data);
+}
+
+void *
+GOMP_OFFLOAD_openacc_get_current_cuda_device (void)
+{
+  return PTX_get_current_cuda_device ();
+}
+
+void *
+GOMP_OFFLOAD_openacc_get_current_cuda_context (void)
+{
+  return PTX_get_current_cuda_context ();
+}
+
+/* NOTE: This returns a CUstream, not a PTX_stream pointer.  */
+
+void *
+GOMP_OFFLOAD_openacc_get_cuda_stream (int async)
+{
+  return PTX_get_cuda_stream (async);
+}
+
+/* NOTE: This takes a CUstream, not a PTX_stream pointer.  */
+
+int
+GOMP_OFFLOAD_openacc_set_cuda_stream (int async, void *stream)
+{
+  return PTX_set_cuda_stream (async, stream);
+}
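GOMP_OFFLOAD_get_table above derives the number of kernels from the distance between the two host table pointers. It then enters each function address as a one-byte host interval, paired with a one-byte interval around its targ_fn_descriptor. The following is a minimal sketch of only that bookkeeping, in plain C with no CUDA calls. struct table_entry, struct fn_desc and build_fn_table are hypothetical stand-ins, not libgomp or plugin interfaces.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mapping_table: one host interval
   and one target interval per offloaded function.  */
struct table_entry
{
  uintptr_t host_start, host_end;
  uintptr_t tgt_start, tgt_end;
};

/* Hypothetical stand-in for struct targ_fn_descriptor (the CUfunction
   handle is omitted).  */
struct fn_desc
{
  const char *name;
};

/* Build one entry per host function address in [START, END).  Entry I
   pairs START[I] with NAMES[I]; both sides become 1-byte intervals so
   an interval-keyed lookup can find them.  Returns the entry count and
   stores malloc'd arrays in *TABLEP and *DESCP.  */
static int
build_fn_table (void *const *start, void *const *end, const char **names,
                struct table_entry **tablep, struct fn_desc **descp)
{
  /* Subtracting two void ** values counts elements, not bytes, so no
     division by sizeof (void *) and no void * arithmetic is needed.  */
  int n = (int) (end - start);
  assert (n >= 0);

  /* Allocate at least one element so a NULL result always means
     out of memory.  */
  size_t count = n > 0 ? (size_t) n : 1;
  struct table_entry *table = malloc (count * sizeof *table);
  struct fn_desc *desc = malloc (count * sizeof *desc);
  if (table == NULL || desc == NULL)
    abort ();

  for (int i = 0; i < n; i++)
    {
      desc[i].name = names[i];
      table[i].host_start = (uintptr_t) start[i];
      table[i].host_end = table[i].host_start + 1;
      table[i].tgt_start = (uintptr_t) &desc[i];
      table[i].tgt_end = table[i].tgt_start + 1;
    }

  *tablep = table;
  *descp = desc;
  return n;
}
```

A one-byte interval is the smallest non-empty range, so an exact function address always falls inside its own entry and never overlaps a neighbour's. The sketch treats the host table as an array of data pointers, which is how the plugin reads fn_table.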
diff --git a/libgomp/splay-tree.c b/libgomp/splay-tree.c
new file mode 100644
index 0000000..14b03ac
--- /dev/null
+++ b/libgomp/splay-tree.c
@@ -0,0 +1,224 @@
+/* A splay-tree datatype.
+   Copyright 1998-2013
+   Free Software Foundation, Inc.
+   Contributed by Mark Mitchell (mark@markmitchell.com).
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This splay tree code was copied from include/splay-tree.h and adjusted
+   so that all the data lives directly in the splay_tree_node_s structure
+   and no extra allocations are needed.
+
+   The splay_tree_node, splay_tree and splay_tree_key typedefs and the
+   splay_tree_key_s structure are declared in splay-tree.h.  The key
+   comparison function splay_compare is not defined here: it is declared
+   extern below and provided by target.c.  It returns a negative, zero or
+   positive value according to whether the first key sorts before,
+   matches or sorts after the second.  */
+
+/* For an easily readable description of splay-trees, see:
+
+     Lewis, Harry R. and Denenberg, Larry.  Data Structures and Their
+     Algorithms.  Harper-Collins, Inc.  1991.
+
+   The major feature of splay trees is that all basic tree operations
+   are amortized O(log n) time for a tree with n nodes.  */
+
+#include "libgomp.h"
+#include "splay-tree.h"
+
+extern int splay_compare (splay_tree_key, splay_tree_key);
+
+/* Rotate the edge joining the left child N with its parent P.  PP is the
+   grandparent's pointer to P.  */
+
+static inline void
+rotate_left (splay_tree_node *pp, splay_tree_node p, splay_tree_node n)
+{
+  splay_tree_node tmp;
+  tmp = n->right;
+  n->right = p;
+  p->left = tmp;
+  *pp = n;
+}
+
+/* Rotate the edge joining the right child N with its parent P.  PP is the
+   grandparent's pointer to P.  */
+
+static inline void
+rotate_right (splay_tree_node *pp, splay_tree_node p, splay_tree_node n)
+{
+  splay_tree_node tmp;
+  tmp = n->left;
+  n->left = p;
+  p->right = tmp;
+  *pp = n;
+}
+
+/* Bottom up splay of KEY.  */
+
+static void
+splay_tree_splay (splay_tree sp, splay_tree_key key)
+{
+  if (sp->root == NULL)
+    return;
+
+  do {
+    int cmp1, cmp2;
+    splay_tree_node n, c;
+
+    n = sp->root;
+    cmp1 = splay_compare (key, &n->key);
+
+    /* Found.  */
+    if (cmp1 == 0)
+      return;
+
+    /* Left or right?  If no child, then we're done.  */
+    if (cmp1 < 0)
+      c = n->left;
+    else
+      c = n->right;
+    if (!c)
+      return;
+
+    /* Next one left or right?  If found or no child, we're done
+       after one rotation.  */
+    cmp2 = splay_compare (key, &c->key);
+    if (cmp2 == 0
+	|| (cmp2 < 0 && !c->left)
+	|| (cmp2 > 0 && !c->right))
+      {
+	if (cmp1 < 0)
+	  rotate_left (&sp->root, n, c);
+	else
+	  rotate_right (&sp->root, n, c);
+	return;
+      }
+
+    /* Now we have the four cases of double-rotation.  */
+    if (cmp1 < 0 && cmp2 < 0)
+      {
+	rotate_left (&n->left, c, c->left);
+	rotate_left (&sp->root, n, n->left);
+      }
+    else if (cmp1 > 0 && cmp2 > 0)
+      {
+	rotate_right (&n->right, c, c->right);
+	rotate_right (&sp->root, n, n->right);
+      }
+    else if (cmp1 < 0 && cmp2 > 0)
+      {
+	rotate_right (&n->left, c, c->right);
+	rotate_left (&sp->root, n, n->left);
+      }
+    else if (cmp1 > 0 && cmp2 < 0)
+      {
+	rotate_left (&n->right, c, c->left);
+	rotate_right (&sp->root, n, n->right);
+      }
+  } while (1);
+}
+
+/* Insert a new NODE into SP.  The NODE shouldn't exist in the tree.  */
+
+attribute_hidden void
+splay_tree_insert (splay_tree sp, splay_tree_node node)
+{
+  int comparison = 0;
+
+  splay_tree_splay (sp, &node->key);
+
+  if (sp->root)
+    comparison = splay_compare (&sp->root->key, &node->key);
+
+  if (sp->root && comparison == 0)
+    gomp_fatal ("Duplicate node");
+  else
+    {
+      /* Insert it at the root.  */
+      if (sp->root == NULL)
+	node->left = node->right = NULL;
+      else if (comparison < 0)
+	{
+	  node->left = sp->root;
+	  node->right = node->left->right;
+	  node->left->right = NULL;
+	}
+      else
+	{
+	  node->right = sp->root;
+	  node->left = node->right->left;
+	  node->right->left = NULL;
+	}
+
+      sp->root = node;
+    }
+}
+
+/* Remove node with KEY from SP.  It is not an error if it did not exist.  */
+
+attribute_hidden void
+splay_tree_remove (splay_tree sp, splay_tree_key key)
+{
+  splay_tree_splay (sp, key);
+
+  if (sp->root && splay_compare (&sp->root->key, key) == 0)
+    {
+      splay_tree_node left, right;
+
+      left = sp->root->left;
+      right = sp->root->right;
+
+      /* One of the children is now the root.  Doesn't matter much
+	 which, so long as we preserve the properties of the tree.  */
+      if (left)
+	{
+	  sp->root = left;
+
+	  /* If there was a right child as well, hang it off the
+	     right-most leaf of the left child.  */
+	  if (right)
+	    {
+	      while (left->right)
+		left = left->right;
+	      left->right = right;
+	    }
+	}
+      else
+	sp->root = right;
+    }
+}
+
+/* Look up KEY in SP, returning a pointer to the matching node's key if
+   present, and NULL otherwise.  */
+
+attribute_hidden splay_tree_key
+splay_tree_lookup (splay_tree sp, splay_tree_key key)
+{
+  splay_tree_splay (sp, key);
+
+  if (sp->root && splay_compare (&sp->root->key, key) == 0)
+    return &sp->root->key;
+  else
+    return NULL;
+}
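splay-tree.c now holds the out-of-line splay tree used for libgomp's address mappings. The following sketch restates the same bottom-up splay, root insertion and lookup over a hypothetical, stripped-down key that holds only a host interval, so that it compiles on its own. The iv_* names and types are not libgomp's. The interval comparison is a simplified assumption in the spirit of splay_compare in target.c (overlapping intervals compare equal), not a copy of it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical simplified types: a key is just a host interval.  */
struct iv_key { uintptr_t host_start, host_end; };
struct iv_node { struct iv_key key; struct iv_node *left, *right; };
struct iv_tree { struct iv_node *root; };

/* Overlapping intervals compare equal; keys must be non-empty.  */
static int
iv_compare (const struct iv_key *x, const struct iv_key *y)
{
  assert (x->host_start < x->host_end && y->host_start < y->host_end);
  if (x->host_end <= y->host_start)
    return -1;
  return x->host_start >= y->host_end ? 1 : 0;
}

/* Rotate child N of P up into P's place; PP is the link that held P.  */
static void
iv_rotate_left (struct iv_node **pp, struct iv_node *p, struct iv_node *n)
{
  p->left = n->right;           /* N was P's left child.  */
  n->right = p;
  *pp = n;
}

static void
iv_rotate_right (struct iv_node **pp, struct iv_node *p, struct iv_node *n)
{
  p->right = n->left;           /* N was P's right child.  */
  n->left = p;
  *pp = n;
}

/* Bottom-up splay using the same case analysis as splay_tree_splay.  */
static void
iv_splay (struct iv_tree *sp, const struct iv_key *key)
{
  while (sp->root)
    {
      struct iv_node *n = sp->root;
      int cmp1 = iv_compare (key, &n->key);
      struct iv_node *c = cmp1 < 0 ? n->left : n->right;
      if (cmp1 == 0 || !c)
        return;
      int cmp2 = iv_compare (key, &c->key);
      if (cmp2 == 0 || (cmp2 < 0 && !c->left) || (cmp2 > 0 && !c->right))
        {
          /* One rotation brings C (last node on the path) to the root.  */
          (cmp1 < 0 ? iv_rotate_left : iv_rotate_right) (&sp->root, n, c);
          return;
        }
      /* Otherwise rotate once below N, then once at the root.  */
      if (cmp1 < 0 && cmp2 < 0)
        iv_rotate_left (&n->left, c, c->left);
      else if (cmp1 > 0 && cmp2 > 0)
        iv_rotate_right (&n->right, c, c->right);
      else if (cmp1 < 0)
        iv_rotate_right (&n->left, c, c->right);
      else
        iv_rotate_left (&n->right, c, c->left);
      if (cmp1 < 0)
        iv_rotate_left (&sp->root, n, n->left);
      else
        iv_rotate_right (&sp->root, n, n->right);
    }
}

/* Insert NODE at the root; its interval must not overlap any key.  */
static void
iv_insert (struct iv_tree *sp, struct iv_node *node)
{
  iv_splay (sp, &node->key);
  struct iv_node *r = sp->root;
  node->left = node->right = NULL;
  if (r)
    {
      int cmp = iv_compare (&r->key, &node->key);
      if (cmp == 0)
        abort ();               /* libgomp calls gomp_fatal here.  */
      if (cmp < 0)              /* R sorts before NODE.  */
        { node->left = r; node->right = r->right; r->right = NULL; }
      else                      /* R sorts after NODE.  */
        { node->right = r; node->left = r->left; r->left = NULL; }
    }
  sp->root = node;
}

/* Return the key overlapping KEY, or NULL; a match ends up at the root.  */
static struct iv_key *
iv_lookup (struct iv_tree *sp, const struct iv_key *key)
{
  iv_splay (sp, key);
  if (sp->root && iv_compare (&sp->root->key, key) == 0)
    return &sp->root->key;
  return NULL;
}
```

Because overlapping intervals compare equal, a one-byte probe anywhere inside a mapped block finds that block. The match is left at the root, so repeated lookups of the same region stay cheap.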
diff --git a/libgomp/splay-tree.h b/libgomp/splay-tree.h
index eb8011a..f29d437 100644
--- a/libgomp/splay-tree.h
+++ b/libgomp/splay-tree.h
@@ -43,6 +43,30 @@ typedef struct splay_tree_key_s *splay_tree_key;
    The major feature of splay trees is that all basic tree operations
    are amortized O(log n) time for a tree with n nodes.  */
 
+#ifndef _SPLAY_TREE_H
+#define _SPLAY_TREE_H 1
+
+typedef struct splay_tree_node_s *splay_tree_node;
+typedef struct splay_tree_s *splay_tree;
+typedef struct splay_tree_key_s *splay_tree_key;
+
+struct splay_tree_key_s {
+  /* Address of the host object.  */
+  uintptr_t host_start;
+  /* Address immediately after the host object.  */
+  uintptr_t host_end;
+  /* Descriptor of the target memory.  */
+  struct target_mem_desc *tgt;
+  /* Offset from tgt->tgt_start to the start of the target object.  */
+  uintptr_t tgt_offset;
+  /* Reference count.  */
+  uintptr_t refcount;
+  /* Asynchronous reference count.  */
+  uintptr_t async_refcount;
+  /* True if data should be copied from device to host at the end.  */
+  bool copy_from;
+};
+
 /* The nodes in the splay tree.  */
 struct splay_tree_node_s {
   struct splay_tree_key_s key;
@@ -56,177 +80,8 @@ struct splay_tree_s {
   splay_tree_node root;
 };
 
-/* Rotate the edge joining the left child N with its parent P.  PP is the
-   grandparents' pointer to P.  */
-
-static inline void
-rotate_left (splay_tree_node *pp, splay_tree_node p, splay_tree_node n)
-{
-  splay_tree_node tmp;
-  tmp = n->right;
-  n->right = p;
-  p->left = tmp;
-  *pp = n;
-}
-
-/* Rotate the edge joining the right child N with its parent P.  PP is the
-   grandparents' pointer to P.  */
-
-static inline void
-rotate_right (splay_tree_node *pp, splay_tree_node p, splay_tree_node n)
-{
-  splay_tree_node tmp;
-  tmp = n->left;
-  n->left = p;
-  p->right = tmp;
-  *pp = n;
-}
-
-/* Bottom up splay of KEY.  */
-
-static void
-splay_tree_splay (splay_tree sp, splay_tree_key key)
-{
-  if (sp->root == NULL)
-    return;
-
-  do {
-    int cmp1, cmp2;
-    splay_tree_node n, c;
-
-    n = sp->root;
-    cmp1 = splay_compare (key, &n->key);
-
-    /* Found.  */
-    if (cmp1 == 0)
-      return;
-
-    /* Left or right?  If no child, then we're done.  */
-    if (cmp1 < 0)
-      c = n->left;
-    else
-      c = n->right;
-    if (!c)
-      return;
-
-    /* Next one left or right?  If found or no child, we're done
-       after one rotation.  */
-    cmp2 = splay_compare (key, &c->key);
-    if (cmp2 == 0
-	|| (cmp2 < 0 && !c->left)
-	|| (cmp2 > 0 && !c->right))
-      {
-	if (cmp1 < 0)
-	  rotate_left (&sp->root, n, c);
-	else
-	  rotate_right (&sp->root, n, c);
-	return;
-      }
-
-    /* Now we have the four cases of double-rotation.  */
-    if (cmp1 < 0 && cmp2 < 0)
-      {
-	rotate_left (&n->left, c, c->left);
-	rotate_left (&sp->root, n, n->left);
-      }
-    else if (cmp1 > 0 && cmp2 > 0)
-      {
-	rotate_right (&n->right, c, c->right);
-	rotate_right (&sp->root, n, n->right);
-      }
-    else if (cmp1 < 0 && cmp2 > 0)
-      {
-	rotate_right (&n->left, c, c->right);
-	rotate_left (&sp->root, n, n->left);
-      }
-    else if (cmp1 > 0 && cmp2 < 0)
-      {
-	rotate_left (&n->right, c, c->left);
-	rotate_right (&sp->root, n, n->right);
-      }
-  } while (1);
-}
-
-/* Insert a new NODE into SP.  The NODE shouldn't exist in the tree.  */
-
-static void
-splay_tree_insert (splay_tree sp, splay_tree_node node)
-{
-  int comparison = 0;
-
-  splay_tree_splay (sp, &node->key);
-
-  if (sp->root)
-    comparison = splay_compare (&sp->root->key, &node->key);
-
-  if (sp->root && comparison == 0)
-    abort ();
-  else
-    {
-      /* Insert it at the root.  */
-      if (sp->root == NULL)
-	node->left = node->right = NULL;
-      else if (comparison < 0)
-	{
-	  node->left = sp->root;
-	  node->right = node->left->right;
-	  node->left->right = NULL;
-	}
-      else
-	{
-	  node->right = sp->root;
-	  node->left = node->right->left;
-	  node->right->left = NULL;
-	}
-
-      sp->root = node;
-    }
-}
-
-/* Remove node with KEY from SP.  It is not an error if it did not exist.  */
-
-static void
-splay_tree_remove (splay_tree sp, splay_tree_key key)
-{
-  splay_tree_splay (sp, key);
-
-  if (sp->root && splay_compare (&sp->root->key, key) == 0)
-    {
-      splay_tree_node left, right;
-
-      left = sp->root->left;
-      right = sp->root->right;
-
-      /* One of the children is now the root.  Doesn't matter much
-	 which, so long as we preserve the properties of the tree.  */
-      if (left)
-	{
-	  sp->root = left;
-
-	  /* If there was a right child as well, hang it off the
-	     right-most leaf of the left child.  */
-	  if (right)
-	    {
-	      while (left->right)
-		left = left->right;
-	      left->right = right;
-	    }
-	}
-      else
-	sp->root = right;
-    }
-}
-
-/* Lookup KEY in SP, returning NODE if present, and NULL
-   otherwise.  */
-
-static splay_tree_key
-splay_tree_lookup (splay_tree sp, splay_tree_key key)
-{
-  splay_tree_splay (sp, key);
-
-  if (sp->root && splay_compare (&sp->root->key, key) == 0)
-    return &sp->root->key;
-  else
-    return NULL;
-}
+attribute_hidden splay_tree_key splay_tree_lookup (splay_tree, splay_tree_key);
+attribute_hidden void splay_tree_insert (splay_tree, splay_tree_node);
+attribute_hidden void splay_tree_remove (splay_tree, splay_tree_key);
+
+#endif /* _SPLAY_TREE_H */
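The target.c changes that follow let gomp_map_vars accept both 8-bit OpenMP kinds and 16-bit OpenACC kinds. get_kind reads an entry of the right width, and rshift/typemask split it into a map type (low bits) and a log2 alignment (high bits), which is then used to round tgt_size up. Below is a standalone sketch of that decoding. get_kind mirrors the new target.c helper, while decode_kind and align_up are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors get_kind in target.c: OpenACC passes unsigned short kinds,
   OpenMP unsigned char ones.  */
static int
get_kind (bool is_openacc, const void *kinds, int idx)
{
  return is_openacc ? ((const unsigned short *) kinds)[idx]
                    : ((const unsigned char *) kinds)[idx];
}

/* Hypothetical helper: split KIND into its map type and alignment,
   using the same shift and mask values as gomp_map_vars.  */
static void
decode_kind (bool is_openacc, int kind, int *type, size_t *align)
{
  const int rshift = is_openacc ? 8 : 3;
  const int typemask = is_openacc ? 0xff : 0x7;

  *type = kind & typemask;
  *align = (size_t) 1 << (kind >> rshift);
}

/* Hypothetical helper: round OFFSET up to a multiple of ALIGN, which
   must be a power of two (the tgt_size computation in gomp_map_vars).  */
static size_t
align_up (size_t offset, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (offset + align - 1) & ~(align - 1);
}
```

For example, an OpenMP kind of (3 << 3) | 2 decodes to type 2 with 8-byte alignment. An OpenACC kind of (4 << 8) | 5 decodes to type 5 with 16-byte alignment. This leaves OpenACC 8 bits of map type instead of OpenMP's 3, enough room for the GOMP_MAP_FORCE_* codes handled in the switch.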
diff --git a/libgomp/target.c b/libgomp/target.c
index 4ace170..a307239 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -30,7 +30,12 @@
 #include <limits.h>
 #include <stdbool.h>
 #include <stdlib.h>
+#include "oacc-plugin.h"
+#include "gomp-constants.h"
+#include "oacc-int.h"
 #include <string.h>
+#include <stdio.h>
+#include <assert.h>
 
 #ifdef PLUGIN_SUPPORT
 #include <dlfcn.h>
@@ -40,50 +45,6 @@ static void gomp_target_init (void);
 
 static pthread_once_t gomp_is_initialized = PTHREAD_ONCE_INIT;
 
-/* Forward declaration for a node in the tree.  */
-typedef struct splay_tree_node_s *splay_tree_node;
-typedef struct splay_tree_s *splay_tree;
-typedef struct splay_tree_key_s *splay_tree_key;
-
-struct target_mem_desc {
-  /* Reference count.  */
-  uintptr_t refcount;
-  /* All the splay nodes allocated together.  */
-  splay_tree_node array;
-  /* Start of the target region.  */
-  uintptr_t tgt_start;
-  /* End of the targer region.  */
-  uintptr_t tgt_end;
-  /* Handle to free.  */
-  void *to_free;
-  /* Previous target_mem_desc.  */
-  struct target_mem_desc *prev;
-  /* Number of items in following list.  */
-  size_t list_count;
-
-  /* Corresponding target device descriptor.  */
-  struct gomp_device_descr *device_descr;
-
-  /* List of splay keys to remove (or decrease refcount)
-     at the end of region.  */
-  splay_tree_key list[];
-};
-
-struct splay_tree_key_s {
-  /* Address of the host object.  */
-  uintptr_t host_start;
-  /* Address immediately after the host object.  */
-  uintptr_t host_end;
-  /* Descriptor of the target memory.  */
-  struct target_mem_desc *tgt;
-  /* Offset from tgt->tgt_start to the start of the target object.  */
-  uintptr_t tgt_offset;
-  /* Reference count.  */
-  uintptr_t refcount;
-  /* True if data should be copied from device to host at the end.  */
-  bool copy_from;
-};
-
 /* This structure describes an offload image.
    It contains type of the target device, pointer to host table descriptor, and
    pointer to target data.  */
@@ -107,7 +68,7 @@ static int num_devices;
 
 /* The comparison function.  */
 
-static int
+attribute_hidden int
 splay_compare (splay_tree_key x, splay_tree_key y)
 {
   if (x->host_start == x->host_end
@@ -122,47 +83,16 @@ splay_compare (splay_tree_key x, splay_tree_key y)
 
 #include "splay-tree.h"
 
-/* This structure describes accelerator device.
-   It contains ID-number of the device, its type, function handlers for
-   interaction with the device, and information about mapped memory.  */
-struct gomp_device_descr
+attribute_hidden void
+gomp_init_targets_once (void)
 {
-  /* This is the ID number of device.  It could be specified in DEVICE-clause of
-     TARGET construct.  */
-  int id;
-
-  /* This is the ID number of device among devices of the same type.  */
-  int target_id;
-
-  /* This is the TYPE of device.  */
-  enum offload_target_type type;
-
-  /* Set to true when device is initialized.  */
-  bool is_initialized;
-
-  /* Function handlers.  */
-  int (*get_type_func) (void);
-  int (*get_num_devices_func) (void);
-  void (*register_image_func) (void *, void *);
-  void (*init_device_func) (int);
-  int (*get_table_func) (int, void *);
-  void *(*alloc_func) (int, size_t);
-  void (*free_func) (int, void *);
-  void *(*host2dev_func) (int, void *, const void *, size_t);
-  void *(*dev2host_func) (int, void *, const void *, size_t);
-  void (*run_func) (int, void *, void *);
-
-  /* Splay tree containing information about mapped memory regions.  */
-  struct splay_tree_s dev_splay_tree;
-
-  /* Mutex for operating with the splay tree and other shared structures.  */
-  gomp_mutex_t dev_env_lock;
-};
+  (void) pthread_once (&gomp_is_initialized, gomp_target_init);
+}
 
 attribute_hidden int
 gomp_get_num_devices (void)
 {
-  (void) pthread_once (&gomp_is_initialized, gomp_target_init);
+  gomp_init_targets_once ();
   return num_devices;
 }
 
@@ -198,18 +128,29 @@ gomp_map_vars_existing (splay_tree_key oldn, splay_tree_key newn,
   oldn->refcount++;
 }
 
-static struct target_mem_desc *
+static int
+get_kind (bool is_openacc, void *kinds, int idx)
+{
+  return is_openacc ? ((unsigned short *) kinds)[idx]
+		    : ((unsigned char *) kinds)[idx];
+}
+
+attribute_hidden struct target_mem_desc *
 gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
-	       void **hostaddrs, size_t *sizes, unsigned char *kinds,
-	       bool is_target)
+	       void **hostaddrs, void **devaddrs, size_t *sizes, void *kinds,
+	       bool is_openacc, bool is_target)
 {
   size_t i, tgt_align, tgt_size, not_found_cnt = 0;
+  const int rshift = is_openacc ? 8 : 3;
+  const int typemask = is_openacc ? 0xff : 0x7;
+  struct gomp_memory_mapping *mm = &devicep->mem_map;
   struct splay_tree_key_s cur_node;
   struct target_mem_desc *tgt
     = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
   tgt->list_count = mapnum;
   tgt->refcount = 1;
   tgt->device_descr = devicep;
+  tgt->mem_map = mm;
 
   if (mapnum == 0)
     return tgt;
@@ -222,41 +163,41 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       tgt_align = align;
       tgt_size = mapnum * sizeof (void *);
     }
-
-  gomp_mutex_lock (&devicep->dev_env_lock);
+  gomp_mutex_lock (&mm->lock);
   for (i = 0; i < mapnum; i++)
     {
+      int kind = get_kind (is_openacc, kinds, i);
       if (hostaddrs[i] == NULL)
 	{
 	  tgt->list[i] = NULL;
 	  continue;
 	}
       cur_node.host_start = (uintptr_t) hostaddrs[i];
-      if ((kinds[i] & 7) != 4)
+      if (!GOMP_MAP_POINTER_P (kind & typemask))
 	cur_node.host_end = cur_node.host_start + sizes[i];
       else
 	cur_node.host_end = cur_node.host_start + sizeof (void *);
-      splay_tree_key n = splay_tree_lookup (&devicep->dev_splay_tree,
-					    &cur_node);
+      splay_tree_key n = splay_tree_lookup (&mm->splay_tree, &cur_node);
       if (n)
 	{
 	  tgt->list[i] = n;
-	  gomp_map_vars_existing (n, &cur_node, kinds[i]);
+	  gomp_map_vars_existing (n, &cur_node, kind);
 	}
       else
 	{
-	  size_t align = (size_t) 1 << (kinds[i] >> 3);
+	  size_t align = (size_t) 1 << (kind >> rshift);
 	  tgt->list[i] = NULL;
 	  not_found_cnt++;
 	  if (tgt_align < align)
 	    tgt_align = align;
 	  tgt_size = (tgt_size + align - 1) & ~(align - 1);
 	  tgt_size += cur_node.host_end - cur_node.host_start;
-	  if ((kinds[i] & 7) == 5)
+	  if ((kind & typemask) == GOMP_MAP_TO_PSET)
 	    {
 	      size_t j;
 	      for (j = i + 1; j < mapnum; j++)
-		if ((kinds[j] & 7) != 4)
+		if (!GOMP_MAP_POINTER_P (get_kind (is_openacc, kinds, j)
+					 & typemask))
 		  break;
 		else if ((uintptr_t) hostaddrs[j] < cur_node.host_start
 			 || ((uintptr_t) hostaddrs[j] + sizeof (void *)
@@ -271,7 +212,15 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	}
     }
 
-  if (not_found_cnt || is_target)
+  if (devaddrs)
+    {
+      if (mapnum != 1)
+	gomp_fatal ("unexpected aggregation");
+      tgt->to_free = devaddrs[0];
+      tgt->tgt_start = (uintptr_t) tgt->to_free;
+      tgt->tgt_end = tgt->tgt_start + sizes[0];
+    }
+  else if (not_found_cnt || is_target)
     {
       /* Allocate tgt_align aligned tgt_size block of memory.  */
       /* FIXME: Perhaps change interface to allocate properly aligned
@@ -303,44 +252,52 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       for (i = 0; i < mapnum; i++)
 	if (tgt->list[i] == NULL)
 	  {
+	    int kind = get_kind (is_openacc, kinds, i);
 	    if (hostaddrs[i] == NULL)
 	      continue;
 	    splay_tree_key k = &array->key;
 	    k->host_start = (uintptr_t) hostaddrs[i];
-	    if ((kinds[i] & 7) != 4)
+	    if (!GOMP_MAP_POINTER_P (kind & typemask))
 	      k->host_end = k->host_start + sizes[i];
 	    else
 	      k->host_end = k->host_start + sizeof (void *);
-	    splay_tree_key n
-	      = splay_tree_lookup (&devicep->dev_splay_tree, k);
+	    splay_tree_key n = splay_tree_lookup (&mm->splay_tree, k);
 	    if (n)
 	      {
 		tgt->list[i] = n;
-		gomp_map_vars_existing (n, k, kinds[i]);
+		gomp_map_vars_existing (n, k, kind);
 	      }
 	    else
 	      {
-		size_t align = (size_t) 1 << (kinds[i] >> 3);
+		size_t align = (size_t) 1 << (kind >> rshift);
 		tgt->list[i] = k;
 		tgt_size = (tgt_size + align - 1) & ~(align - 1);
 		k->tgt = tgt;
 		k->tgt_offset = tgt_size;
 		tgt_size += k->host_end - k->host_start;
-		k->copy_from = false;
-		if ((kinds[i] & 7) == 2 || (kinds[i] & 7) == 3)
-		  k->copy_from = true;
+		k->copy_from = GOMP_MAP_COPYFROM_P (kind & typemask)
+			       || GOMP_MAP_TOFROM_P (kind & typemask);
 		k->refcount = 1;
+		k->async_refcount = 0;
 		tgt->refcount++;
 		array->left = NULL;
 		array->right = NULL;
-		splay_tree_insert (&devicep->dev_splay_tree, array);
-		switch (kinds[i] & 7)
+		splay_tree_insert (&mm->splay_tree, array);
+		switch (kind & typemask)
 		  {
-		  case 0: /* ALLOC */
-		  case 2: /* FROM */
+		  case GOMP_MAP_FORCE_ALLOC:
+		  case GOMP_MAP_FORCE_FROM:
+		    /* FIXME: No special handling (see comment in
+		       oacc-parallel.c).  */
+		  case GOMP_MAP_ALLOC:
+		  case GOMP_MAP_ALLOC_FROM:
 		    break;
-		  case 1: /* TO */
-		  case 3: /* TOFROM */
+		  case GOMP_MAP_FORCE_TO:
+		  case GOMP_MAP_FORCE_TOFROM:
+		    /* FIXME: No special handling, as above.  */
+		  case GOMP_MAP_ALLOC_TO:
+		  case GOMP_MAP_ALLOC_TOFROM:
+		    /* Copy from host to device memory.  */
 		    /* FIXME: Perhaps add some smarts, like if copying
 		       several adjacent fields from host to target, use some
 		       host buffer to avoid sending each var individually.  */
@@ -350,7 +307,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 					    (void *) k->host_start,
 					    k->host_end - k->host_start);
 		    break;
-		  case 4: /* POINTER */
+		  case GOMP_MAP_POINTER:
 		    cur_node.host_start
 		      = (uintptr_t) *(void **) k->host_start;
 		    if (cur_node.host_start == (uintptr_t) NULL)
@@ -366,19 +323,16 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		    /* Add bias to the pointer value.  */
 		    cur_node.host_start += sizes[i];
 		    cur_node.host_end = cur_node.host_start + 1;
-		    n = splay_tree_lookup (&devicep->dev_splay_tree,
-					   &cur_node);
+		    n = splay_tree_lookup (&mm->splay_tree, &cur_node);
 		    if (n == NULL)
 		      {
 			/* Could be possibly zero size array section.  */
 			cur_node.host_end--;
-			n = splay_tree_lookup (&devicep->dev_splay_tree,
-					       &cur_node);
+			n = splay_tree_lookup (&mm->splay_tree, &cur_node);
 			if (n == NULL)
 			  {
 			    cur_node.host_start--;
-			    n = splay_tree_lookup (&devicep->dev_splay_tree,
-						   &cur_node);
+			    n = splay_tree_lookup (&mm->splay_tree, &cur_node);
 			    cur_node.host_start++;
 			  }
 		      }
@@ -398,14 +352,17 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 					    (void *) &cur_node.tgt_offset,
 					    sizeof (void *));
 		    break;
-		  case 5: /* TO_PSET */
-		    devicep->host2dev_func (devicep->target_id,
-					    (void *) (tgt->tgt_start
-						      + k->tgt_offset),
-					    (void *) k->host_start,
-					    k->host_end - k->host_start);
+		  case GOMP_MAP_TO_PSET:
+		    /* Copy from host to device memory.  */
+		    /* FIXME: see above FIXME comment.  */
+		    devicep->host2dev_func
+		      (devicep->target_id,
+		       (void *) (tgt->tgt_start + k->tgt_offset),
+		       (void *) k->host_start,
+		       (k->host_end - k->host_start));
 		    for (j = i + 1; j < mapnum; j++)
-		      if ((kinds[j] & 7) != 4)
+		      if (!GOMP_MAP_POINTER_P (get_kind (is_openacc, kinds, j)
+					       & typemask))
 			break;
 		      else if ((uintptr_t) hostaddrs[j] < k->host_start
 			       || ((uintptr_t) hostaddrs[j] + sizeof (void *)
@@ -432,19 +389,18 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 			  /* Add bias to the pointer value.  */
 			  cur_node.host_start += sizes[j];
 			  cur_node.host_end = cur_node.host_start + 1;
-			  n = splay_tree_lookup (&devicep->dev_splay_tree,
-						 &cur_node);
+			  n = splay_tree_lookup (&mm->splay_tree, &cur_node);
 			  if (n == NULL)
 			    {
 			      /* Could be possibly zero size array section.  */
 			      cur_node.host_end--;
-			      n = splay_tree_lookup (&devicep->dev_splay_tree,
+			      n = splay_tree_lookup (&mm->splay_tree,
 						     &cur_node);
 			      if (n == NULL)
 				{
 				  cur_node.host_start--;
-				  n = splay_tree_lookup
-					(&devicep->dev_splay_tree, &cur_node);
+				  n = splay_tree_lookup (&mm->splay_tree,
+							 &cur_node);
 				  cur_node.host_start++;
 				}
 			    }
@@ -468,6 +424,32 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 			  i++;
 			}
 		      break;
+		    case GOMP_MAP_FORCE_PRESENT:
+		      {
+		        /* We already looked up the memory region above and it
+			   was missing.  */
+			size_t size = k->host_end - k->host_start;
+			gomp_fatal ("present clause: !acc_is_present (%p, "
+				    "%zd (0x%zx))", (void *) k->host_start,
+				    size, size);
+		      }
+		      break;
+		    case GOMP_MAP_FORCE_DEVICEPTR:
+		      assert (k->host_end - k->host_start == sizeof (void *));
+		      
+		      devicep->host2dev_func
+		        (devicep->target_id,
+			 (void *) (tgt->tgt_start + k->tgt_offset),
+			 (void *) k->host_start,
+			 sizeof (void *));
+		      break;
+		    case GOMP_MAP_FORCE_PRIVATE:
+		      abort ();
+		    case GOMP_MAP_FORCE_FIRSTPRIVATE:
+		      abort ();
+		    default:
+		      gomp_fatal ("%s: unhandled kind 0x%.2x", __FUNCTION__,
+				  kind);
 		  }
 		array++;
 	      }
@@ -490,7 +472,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	}
     }
 
-  gomp_mutex_unlock (&devicep->dev_env_lock);
+  gomp_mutex_unlock (&mm->lock);
   return tgt;
 }
 
@@ -505,10 +487,51 @@ gomp_unmap_tgt (struct target_mem_desc *tgt)
   free (tgt);
 }
 
-static void
-gomp_unmap_vars (struct target_mem_desc *tgt)
+/* Decrease the refcount for a set of mapped variables, and queue asynchronous
+   copies from the device back to the host after any work that has been issued.
+   Because the regions are still "live", increment an asynchronous reference
+   count to indicate that they should not be unmapped from host-side data
+   structures until the asynchronous copy has completed.  */
+
+attribute_hidden void
+gomp_copy_from_async (struct target_mem_desc *tgt)
+{
+  struct gomp_device_descr *devicep = tgt->device_descr;
+  struct gomp_memory_mapping *mm = tgt->mem_map;
+  size_t i;
+  
+  gomp_mutex_lock (&mm->lock);
+
+  for (i = 0; i < tgt->list_count; i++)
+    if (tgt->list[i] == NULL)
+      ;
+    else if (tgt->list[i]->refcount > 1)
+      {
+	tgt->list[i]->refcount--;
+	tgt->list[i]->async_refcount++;
+      }
+    else
+      {
+	splay_tree_key k = tgt->list[i];
+	if (k->copy_from)
+	  /* Copy from device to host memory.  */
+	  devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
+				  (void *) (k->tgt->tgt_start + k->tgt_offset),
+				  k->host_end - k->host_start);
+      }
+
+  gomp_mutex_unlock (&mm->lock);
+}
+
+/* Unmap variables described by TGT.  If DO_COPYFROM is true, copy relevant
+   variables back from device to host: if it is false, it is assumed that this
+   has been done already, i.e. by gomp_copy_from_async above.  */
+
+attribute_hidden void
+gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
 {
   struct gomp_device_descr *devicep = tgt->device_descr;
+  struct gomp_memory_mapping *mm = tgt->mem_map;
 
   if (tgt->list_count == 0)
     {
@@ -517,20 +540,23 @@ gomp_unmap_vars (struct target_mem_desc *tgt)
     }
 
   size_t i;
-  gomp_mutex_lock (&devicep->dev_env_lock);
+  gomp_mutex_lock (&mm->lock);
   for (i = 0; i < tgt->list_count; i++)
     if (tgt->list[i] == NULL)
       ;
     else if (tgt->list[i]->refcount > 1)
       tgt->list[i]->refcount--;
+    else if (tgt->list[i]->async_refcount > 0)
+      tgt->list[i]->async_refcount--;
     else
       {
 	splay_tree_key k = tgt->list[i];
-	if (k->copy_from)
+	if (k->copy_from && do_copyfrom)
+	  /* Copy from device to host memory.  */
 	  devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
 				  (void *) (k->tgt->tgt_start + k->tgt_offset),
 				  k->host_end - k->host_start);
-	splay_tree_remove (&devicep->dev_splay_tree, k);
+	splay_tree_remove (&mm->splay_tree, k);
 	if (k->tgt->refcount > 1)
 	  k->tgt->refcount--;
 	else
@@ -541,15 +567,17 @@ gomp_unmap_vars (struct target_mem_desc *tgt)
     tgt->refcount--;
   else
     gomp_unmap_tgt (tgt);
-  gomp_mutex_unlock (&devicep->dev_env_lock);
+  gomp_mutex_unlock (&mm->lock);
 }
 
 static void
-gomp_update (struct gomp_device_descr *devicep, size_t mapnum,
-	     void **hostaddrs, size_t *sizes, unsigned char *kinds)
+gomp_update (struct gomp_device_descr *devicep, struct gomp_memory_mapping *mm,
+	     size_t mapnum, void **hostaddrs, size_t *sizes, void *kinds,
+	     bool is_openacc)
 {
   size_t i;
   struct splay_tree_key_s cur_node;
+  const int typemask = is_openacc ? 0xff : 0x7;
 
   if (!devicep)
     return;
@@ -557,16 +585,17 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum,
   if (mapnum == 0)
     return;
 
-  gomp_mutex_lock (&devicep->dev_env_lock);
+  gomp_mutex_lock (&mm->lock);
   for (i = 0; i < mapnum; i++)
     if (sizes[i])
       {
 	cur_node.host_start = (uintptr_t) hostaddrs[i];
 	cur_node.host_end = cur_node.host_start + sizes[i];
-	splay_tree_key n = splay_tree_lookup (&devicep->dev_splay_tree,
+	splay_tree_key n = splay_tree_lookup (&mm->splay_tree,
 					      &cur_node);
 	if (n)
 	  {
+	    int kind = get_kind (is_openacc, kinds, i);
 	    if (n->host_start > cur_node.host_start
 		|| n->host_end < cur_node.host_end)
 	      gomp_fatal ("Trying to update [%p..%p) object when"
@@ -575,31 +604,38 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum,
 			  (void *) cur_node.host_end,
 			  (void *) n->host_start,
 			  (void *) n->host_end);
-	    if ((kinds[i] & 7) == 1)
-	      devicep->host2dev_func (devicep->target_id,
-				      (void *) (n->tgt->tgt_start
-						+ n->tgt_offset
-						+ cur_node.host_start
-						- n->host_start),
-				      (void *) cur_node.host_start,
-				      cur_node.host_end - cur_node.host_start);
-	    else if ((kinds[i] & 7) == 2)
-	      devicep->dev2host_func (devicep->target_id,
-				      (void *) cur_node.host_start,
-				      (void *) (n->tgt->tgt_start
-						+ n->tgt_offset
-						+ cur_node.host_start
-						- n->host_start),
-				      cur_node.host_end - cur_node.host_start);
+	    if (GOMP_MAP_COPYTO_P (kind & typemask))
+	      /* Copy from host to device memory.  */
+	      devicep->host2dev_func
+		(devicep->target_id, 
+		 (void *) (n->tgt->tgt_start
+			   + n->tgt_offset
+			   + cur_node.host_start
+			   - n->host_start),
+		 (void *) cur_node.host_start,
+		 cur_node.host_end - cur_node.host_start);
+	    else if (GOMP_MAP_COPYFROM_P (kind & typemask))
+	      /* Copy from device to host memory.  */
+	      devicep->dev2host_func
+		(devicep->target_id,
+		 (void *) cur_node.host_start,
+		 (void *) (n->tgt->tgt_start
+			   + n->tgt_offset
+			   + cur_node.host_start
+			   - n->host_start),
+		 cur_node.host_end - cur_node.host_start);
 	  }
 	else
 	  gomp_fatal ("Trying to update [%p..%p) object that is not mapped",
 		      (void *) cur_node.host_start,
 		      (void *) cur_node.host_end);
       }
-  gomp_mutex_unlock (&devicep->dev_env_lock);
+  gomp_mutex_unlock (&mm->lock);
 }
 
+static void gomp_register_image_for_device (struct gomp_device_descr *device,
+					    struct offload_image_descr *image);
+
 /* This function should be called from every offload image.
    It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
    the target, and TARGET_DATA needed by target plugin.  */
@@ -612,6 +648,9 @@ GOMP_offload_register (void *host_table, enum offload_target_type target_type,
 				 (num_offload_images + 1)
 				 * sizeof (struct offload_image_descr));
 
+  if (offload_images == NULL)
+    return;
+
   offload_images[num_offload_images].type = target_type;
   offload_images[num_offload_images].host_table = host_table;
   offload_images[num_offload_images].target_data = target_data;
@@ -621,17 +660,24 @@ GOMP_offload_register (void *host_table, enum offload_target_type target_type,
 
 /* This function initializes the target device, specified by DEVICEP.  */
 
-static void
+attribute_hidden void
 gomp_init_device (struct gomp_device_descr *devicep)
 {
+  /* Initialize the target device.  */
   devicep->init_device_func (devicep->target_id);
+  
+  devicep->is_initialized = true;
+}
 
+attribute_hidden void
+gomp_init_tables (const struct gomp_device_descr *devicep,
+		  struct gomp_memory_mapping *mm)
+{
   /* Get address mapping table for device.  */
   struct mapping_table *table = NULL;
-  int num_entries = devicep->get_table_func (devicep->target_id, &table);
+  int i, num_entries = devicep->get_table_func (devicep->target_id, &table);
 
   /* Insert host-target address mapping into dev_splay_tree.  */
-  int i;
   for (i = 0; i < num_entries; i++)
     {
       struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
@@ -641,7 +687,7 @@ gomp_init_device (struct gomp_device_descr *devicep)
       tgt->tgt_end = table[i].tgt_end;
       tgt->to_free = NULL;
       tgt->list_count = 0;
-      tgt->device_descr = devicep;
+      tgt->device_descr = (struct gomp_device_descr *) devicep;
       splay_tree_node node = tgt->array;
       splay_tree_key k = &node->key;
       k->host_start = table[i].host_start;
@@ -650,11 +696,45 @@ gomp_init_device (struct gomp_device_descr *devicep)
       k->tgt = tgt;
       node->left = NULL;
       node->right = NULL;
-      splay_tree_insert (&devicep->dev_splay_tree, node);
+      splay_tree_insert (&mm->splay_tree, node);
     }
 
   free (table);
-  devicep->is_initialized = true;
+  mm->is_initialized = true;
+}
+
+static void
+gomp_init_dev_tables (struct gomp_device_descr *devicep)
+{
+  gomp_init_device (devicep);
+  gomp_init_tables (devicep, &devicep->mem_map);
+}
+
+
+attribute_hidden void
+gomp_free_memmap (struct gomp_device_descr *devicep)
+{
+  struct gomp_memory_mapping *mm = &devicep->mem_map;
+
+  while (mm->splay_tree.root)
+    {
+      struct target_mem_desc *tgt = mm->splay_tree.root->key.tgt;
+      
+      splay_tree_remove (&mm->splay_tree, &mm->splay_tree.root->key);
+      free (tgt->array);
+      free (tgt);
+    }
+
+  mm->is_initialized = false;
+}
+
+attribute_hidden void
+gomp_fini_device (struct gomp_device_descr *devicep)
+{
+  if (devicep->is_initialized)
+    devicep->fini_device_func (devicep->target_id);
+
+  devicep->is_initialized = false;
 }
 
 /* Called when encountering a target directive.  If DEVICE
@@ -673,7 +753,12 @@ GOMP_target (int device, void (*fn) (void *), const void *openmp_target,
 	     unsigned char *kinds)
 {
   struct gomp_device_descr *devicep = resolve_device (device);
-  if (devicep == NULL)
+  struct gomp_memory_mapping *mm = &devicep->mem_map;
+
+  if (devicep != NULL && !devicep->is_initialized)
+    gomp_init_dev_tables (devicep);
+
+  if (devicep == NULL || !(devicep->capabilities & TARGET_CAP_OPENMP_400))
     {
       /* Host fallback.  */
       struct gomp_thread old_thr, *thr = gomp_thread ();
@@ -690,20 +775,30 @@ GOMP_target (int device, void (*fn) (void *), const void *openmp_target,
       return;
     }
 
-  gomp_mutex_lock (&devicep->dev_env_lock);
-  if (!devicep->is_initialized)
-    gomp_init_device (devicep);
+  void *fn_addr;
 
-  struct splay_tree_key_s k;
-  k.host_start = (uintptr_t) fn;
-  k.host_end = k.host_start + 1;
-  splay_tree_key tgt_fn = splay_tree_lookup (&devicep->dev_splay_tree, &k);
-  if (tgt_fn == NULL)
-    gomp_fatal ("Target function wasn't mapped");
-  gomp_mutex_unlock (&devicep->dev_env_lock);
+  if (devicep->capabilities & TARGET_CAP_NATIVE_EXEC)
+    fn_addr = (void *) fn;
+  else
+    {
+      gomp_mutex_lock (&mm->lock);
+      if (!devicep->is_initialized)
+	gomp_init_dev_tables (devicep);
+      struct splay_tree_key_s k;
+      k.host_start = (uintptr_t) fn;
+      k.host_end = k.host_start + 1;
+      splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map.splay_tree,
+						 &k);
+      if (tgt_fn == NULL)
+	gomp_fatal ("Target function wasn't mapped");
+      gomp_mutex_unlock (&mm->lock);
+      
+      fn_addr = (void *) tgt_fn->tgt->tgt_start;
+    }
 
   struct target_mem_desc *tgt_vars
-    = gomp_map_vars (devicep, mapnum, hostaddrs, sizes, kinds, true);
+    = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
+		     true);
   struct gomp_thread old_thr, *thr = gomp_thread ();
   old_thr = *thr;
   memset (thr, '\0', sizeof (*thr));
@@ -712,11 +807,10 @@ GOMP_target (int device, void (*fn) (void *), const void *openmp_target,
       thr->place = old_thr.place;
       thr->ts.place_partition_len = gomp_places_list_len;
     }
-  devicep->run_func (devicep->target_id, (void *) tgt_fn->tgt->tgt_start,
-		     (void *) tgt_vars->tgt_start);
+  devicep->run_func (devicep->target_id, fn_addr, (void *) tgt_vars->tgt_start);
   gomp_free_thread (thr);
   *thr = old_thr;
-  gomp_unmap_vars (tgt_vars);
+  gomp_unmap_vars (tgt_vars, true);
 }
 
 void
@@ -724,7 +818,12 @@ GOMP_target_data (int device, const void *openmp_target, size_t mapnum,
 		  void **hostaddrs, size_t *sizes, unsigned char *kinds)
 {
   struct gomp_device_descr *devicep = resolve_device (device);
-  if (devicep == NULL)
+  struct gomp_memory_mapping *mm = &devicep->mem_map;
+
+  if (devicep != NULL && !devicep->is_initialized)
+    gomp_init_dev_tables (devicep);
+
+  if (devicep == NULL || !(devicep->capabilities & TARGET_CAP_OPENMP_400))
     {
       /* Host fallback.  */
       struct gomp_task_icv *icv = gomp_icv (false);
@@ -735,20 +834,21 @@ GOMP_target_data (int device, const void *openmp_target, size_t mapnum,
 	     new #pragma omp target data, otherwise GOMP_target_end_data
 	     would get out of sync.  */
 	  struct target_mem_desc *tgt
-	    = gomp_map_vars (NULL, 0, NULL, NULL, NULL, false);
+	    = gomp_map_vars (NULL, 0, NULL, NULL, NULL, NULL, false, false);
 	  tgt->prev = icv->target_data;
 	  icv->target_data = tgt;
 	}
       return;
     }
 
-  gomp_mutex_lock (&devicep->dev_env_lock);
+  gomp_mutex_lock (&mm->lock);
   if (!devicep->is_initialized)
-    gomp_init_device (devicep);
-  gomp_mutex_unlock (&devicep->dev_env_lock);
+    gomp_init_dev_tables (devicep);
+  gomp_mutex_unlock (&mm->lock);
 
   struct target_mem_desc *tgt
-    = gomp_map_vars (devicep, mapnum, hostaddrs, sizes, kinds, false);
+    = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
+		     false);
   struct gomp_task_icv *icv = gomp_icv (true);
   tgt->prev = icv->target_data;
   icv->target_data = tgt;
@@ -762,7 +862,7 @@ GOMP_target_end_data (void)
     {
       struct target_mem_desc *tgt = icv->target_data;
       icv->target_data = tgt->prev;
-      gomp_unmap_vars (tgt);
+      gomp_unmap_vars (tgt, true);
     }
 }
 
@@ -771,15 +871,18 @@ GOMP_target_update (int device, const void *openmp_target, size_t mapnum,
 		    void **hostaddrs, size_t *sizes, unsigned char *kinds)
 {
   struct gomp_device_descr *devicep = resolve_device (device);
-  if (devicep == NULL)
-    return;
+  struct gomp_memory_mapping *mm = &devicep->mem_map;
 
-  gomp_mutex_lock (&devicep->dev_env_lock);
-  if (!devicep->is_initialized)
+  gomp_mutex_lock (&mm->lock);
+  if (devicep != NULL && !devicep->is_initialized)
     gomp_init_device (devicep);
-  gomp_mutex_unlock (&devicep->dev_env_lock);
+  gomp_mutex_unlock (&mm->lock);
 
-  gomp_update (devicep, mapnum, hostaddrs, sizes, kinds);
+  if (devicep == NULL || !(devicep->capabilities & TARGET_CAP_OPENMP_400))
+    return;
+
+  gomp_update (devicep, &devicep->mem_map, mapnum, hostaddrs, sizes, kinds,
+	       false);
 }
 
 void
@@ -806,9 +909,22 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
 			     const char *plugin_name)
 {
   void *plugin_handle = dlopen (plugin_name, RTLD_LAZY);
+  char *err = NULL, *last_missing = NULL;
+  int optional_present, optional_total;
+
   if (!plugin_handle)
     return false;
 
+  /* Clear any existing error.  */
+  dlerror ();
+
+  device->plugin_handle = dlopen (plugin_name, RTLD_LAZY);
+  if (!device->plugin_handle)
+    {
+      err = dlerror ();
+      goto out;
+    }
+
   /* Check if all required functions are available in the plugin and store
      their handlers.  */
 #define DLSYM(f)						    \
@@ -819,33 +935,104 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
 	return false;						    \
     }								    \
   while (0)
+  /* Similar, but missing functions are not an error.  */
+#define DLSYM_OPT(f,n) \
+  do									\
+    {									\
+      char *tmp_err;							\
+      device->f##_func = dlsym (device->plugin_handle,			\
+				"GOMP_OFFLOAD_" #n);			\
+      tmp_err = dlerror ();						\
+      if (tmp_err == NULL)						\
+        optional_present++;						\
+      else								\
+        last_missing = #n;						\
+      optional_total++;							\
+    }									\
+  while (0)
+
+  DLSYM (get_name);
+  DLSYM (get_caps);
   DLSYM (get_type);
   DLSYM (get_num_devices);
   DLSYM (register_image);
   DLSYM (init_device);
+  DLSYM (fini_device);
   DLSYM (get_table);
   DLSYM (alloc);
   DLSYM (free);
   DLSYM (dev2host);
   DLSYM (host2dev);
-  DLSYM (run);
+  device->capabilities = device->get_caps_func ();
+  if (device->capabilities & TARGET_CAP_OPENMP_400)
+    DLSYM (run);
+  if (device->capabilities & TARGET_CAP_OPENACC_200)
+    {
+      optional_present = optional_total = 0;
+      DLSYM_OPT (openacc.exec, openacc_parallel);
+      DLSYM_OPT (openacc.open_device, openacc_open_device);
+      DLSYM_OPT (openacc.close_device, openacc_close_device);
+      DLSYM_OPT (openacc.get_device_num, openacc_get_device_num);
+      DLSYM_OPT (openacc.set_device_num, openacc_set_device_num);
+      DLSYM_OPT (openacc.register_async_cleanup,
+		 openacc_register_async_cleanup);
+      DLSYM_OPT (openacc.async_test, openacc_async_test);
+      DLSYM_OPT (openacc.async_test_all, openacc_async_test_all);
+      DLSYM_OPT (openacc.async_wait, openacc_async_wait);
+      DLSYM_OPT (openacc.async_wait_async, openacc_async_wait_async);
+      DLSYM_OPT (openacc.async_wait_all, openacc_async_wait_all);
+      DLSYM_OPT (openacc.async_wait_all_async, openacc_async_wait_all_async);
+      DLSYM_OPT (openacc.async_set_async, openacc_async_set_async);
+      DLSYM_OPT (openacc.create_thread_data, openacc_create_thread_data);
+      DLSYM_OPT (openacc.destroy_thread_data, openacc_destroy_thread_data);
+      /* Require all the OpenACC handlers if we have TARGET_CAP_OPENACC_200.  */
+      if (optional_present != optional_total)
+	{
+	  err = "plugin missing OpenACC handler function";
+	  goto out;
+	}
+      optional_present = optional_total = 0;
+      DLSYM_OPT (openacc.cuda.get_current_device,
+		 openacc_get_current_cuda_device);
+      DLSYM_OPT (openacc.cuda.get_current_context,
+		 openacc_get_current_cuda_context);
+      DLSYM_OPT (openacc.cuda.get_stream, openacc_get_cuda_stream);
+      DLSYM_OPT (openacc.cuda.set_stream, openacc_set_cuda_stream);
+      /* Make sure all the CUDA functions are there if any of them are.  */
+      if (optional_present && optional_present != optional_total)
+	{
+	  err = "plugin missing OpenACC CUDA handler function";
+	  goto out;
+	}
+    }
 #undef DLSYM
+#undef DLSYM_OPT
 
-  return true;
+ out:
+  if (err != NULL)
+    {
+      gomp_error ("while loading %s: %s", plugin_name, err);
+      if (last_missing)
+        gomp_error ("missing function was %s", last_missing);
+      if (device->plugin_handle)
+	dlclose (device->plugin_handle);
+    }
+  return err == NULL;
 }
 
-/* This function finds OFFLOAD_IMAGES corresponding to DEVICE type, and
-   registers them in the plugin.  */
+/* This function adds a compatible offload image IMAGE to an accelerator device
+   DEVICE.  */
 
 static void
-gomp_register_images_for_device (struct gomp_device_descr *device)
+gomp_register_image_for_device (struct gomp_device_descr *device,
+				struct offload_image_descr *image)
 {
-  int i;
-  for (i = 0; i < num_offload_images; i++)
+  if (!device->offload_regions_registered
+      && (device->type == image->type
+	  || device->type == OFFLOAD_TARGET_TYPE_HOST))
     {
-      struct offload_image_descr *image = &offload_images[i];
-      if (image->type == device->type)
-	device->register_image_func (image->host_table, image->target_data);
+      device->register_image_func (image->host_table, image->target_data);
+      device->offload_regions_registered = true;
     }
 }
 
@@ -901,15 +1088,19 @@ gomp_target_init (void)
 		  }
 
 		current_device.type = current_device.get_type_func ();
+		current_device.name = current_device.get_name_func ();
 		current_device.is_initialized = false;
-		current_device.dev_splay_tree.root = NULL;
-		gomp_register_images_for_device (&current_device);
+		current_device.offload_regions_registered = false;
+		current_device.mem_map.splay_tree.root = NULL;
+		current_device.mem_map.is_initialized = false;
+		current_device.target_data = NULL;
+		current_device.openacc.data_environ = NULL;
 		for (i = 0; i < new_num_devices; i++)
 		  {
 		    current_device.id = num_devices + 1;
 		    current_device.target_id = i;
 		    devices[num_devices] = current_device;
-		    gomp_mutex_init (&devices[num_devices].dev_env_lock);
+		    gomp_mutex_init (&devices[num_devices].mem_map.lock);
 		    num_devices++;
 		  }
 	      }
@@ -920,6 +1111,43 @@ gomp_target_init (void)
       }
     while (next);
 
+  /* Prefer a device with TARGET_CAP_OPENMP_400 for ICV default-device-var.  */
+  if (num_devices > 1)
+    {
+      int d = gomp_icv (false)->default_device_var;
+
+      if (!(devices[d].capabilities & TARGET_CAP_OPENMP_400))
+	{
+	  for (i = 0; i < num_devices; i++)
+	    {
+	      if (devices[i].capabilities & TARGET_CAP_OPENMP_400)
+		{
+		  struct gomp_device_descr device_tmp = devices[d];
+		  devices[d] = devices[i];
+		  devices[d].id = d + 1;
+		  devices[i] = device_tmp;
+		  devices[i].id = i + 1;
+
+		  break;
+		}
+	    }
+	}
+    }
+
+  for (i = 0; i < num_devices; i++)
+    {
+      int j;
+
+      for (j = 0; j < num_offload_images; j++)
+	gomp_register_image_for_device (&devices[i], &offload_images[j]);
+
+      /* The 'devices' array can be moved (by the realloc call) until we have
+	 found all the plugins, so registering with the OpenACC runtime (which
+	 takes a copy of the pointer argument) must be delayed until now.  */
+      if (devices[i].capabilities & TARGET_CAP_OPENACC_200)
+	ACC_register (&devices[i]);
+    }
+
   free (offload_images);
   offload_images = NULL;
   num_offload_images = 0;
diff --git a/libgomp/target.h b/libgomp/target.h
new file mode 100644
index 0000000..e69de29
diff --git a/libgomp/testsuite/Makefile.in b/libgomp/testsuite/Makefile.in
index 5273eaa..634844c 100644
diff --git a/libgomp/testsuite/libgomp-test-support.exp.in b/libgomp/testsuite/libgomp-test-support.exp.in
new file mode 100644
index 0000000..dcadad7
--- /dev/null
+++ b/libgomp/testsuite/libgomp-test-support.exp.in
@@ -0,0 +1,2 @@
+set cuda_driver_include "@CUDA_DRIVER_INCLUDE@"
+set cuda_driver_lib "@CUDA_DRIVER_LIB@"

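The two-phase release introduced in target.c above (gomp_copy_from_async queues the device-to-host copy, and a later gomp_unmap_vars (tgt, false) tidies up the host-side book-keeping) is easiest to follow on a single mapping. The sketch below is a simplified model, not libgomp code: struct mapping and the model_* functions are hypothetical, locking and the splay tree are left out, and a counter stands in for dev2host_func.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a splay-tree key: only the
   counters that the patch manipulates are modelled.  */
struct mapping
{
  int refcount;        /* Outstanding references to the mapping.  */
  int async_refcount;  /* References whose copy-back is still pending.  */
  bool copy_from;      /* Copy device data back to the host on release.  */
  int copies_back;     /* Counts simulated device-to-host copies.  */
  bool removed;        /* Set when the entry would leave the splay tree.  */
};

/* Mirrors the loop body of gomp_copy_from_async: either move one
   reference into async_refcount, or, for the last reference, issue the
   copy-back (here just counted).  */
static void
model_copy_from_async (struct mapping *k)
{
  if (k->refcount > 1)
    {
      k->refcount--;
      k->async_refcount++;
    }
  else if (k->copy_from)
    k->copies_back++;
}

/* Mirrors the loop body of gomp_unmap_vars (tgt, do_copyfrom).  */
static void
model_unmap (struct mapping *k, bool do_copyfrom)
{
  if (k->refcount > 1)
    k->refcount--;
  else if (k->async_refcount > 0)
    k->async_refcount--;
  else
    {
      if (k->copy_from && do_copyfrom)
	k->copies_back++;
      k->removed = true;
    }
}
```

For a mapping with refcount 2, model_copy_from_async moves one reference into async_refcount, so the following model_unmap (k, false) only consumes that asynchronous reference; the entry stays mapped until its other owner releases it, at which point the copy-back happens.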

* Re: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost)
  2014-11-11 13:54 ` [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost) Julian Brown
@ 2014-11-12 10:10   ` Jakub Jelinek
  2014-11-12 10:59     ` Thomas Schwinge
                       ` (5 more replies)
  2014-12-22 17:55   ` Thomas Schwinge
                     ` (5 subsequent siblings)
  6 siblings, 6 replies; 36+ messages in thread
From: Jakub Jelinek @ 2014-11-12 10:10 UTC (permalink / raw)
  To: Julian Brown; +Cc: gcc-patches, Thomas Schwinge, Ilya Verbin

On Tue, Nov 11, 2014 at 01:53:23PM +0000, Julian Brown wrote:
> A few OpenMP tests fail with the new host_nonshm plugin (with failures
> of the form "libgomp: Trying to update [0x605820..0x605824) object that
> is not mapped"), probably because of middle-end bugs. I haven't
> investigated those in detail.

It depends on how exactly your host_nonshm plugin works.  A few tests in
the testsuite use #pragma omp declare target variables, so if the
host_nonshm plugin is something like the hackish device 257 I initially had
on the gomp-4_0-branch, where code is run on the host and map directives
simply malloc/free host memory and memcpy stuff around, then without extra
work the #pragma omp declare target variables indeed can't work.
You'd either need to support a strange partially shared memory model,
where #pragma omp declare target variables would be shared (you'd still
need to populate the mapping data structures with those vars and identity
map them), or a not-so-conforming model where you'd map them on entering
the target regions if they aren't mapped yet (the problem then is that
if the variables are changed on the host between the start of the program
and the target region, you'd use the changed values instead of the values
they were originally assigned), or map them in some constructor (but how
would you know whether a host_nonshm plugin is going to be used later?).

One can always use the intelmicemul plugin to test nonshared-memory stuff
without any HW (provided the host is x86_64/i686), so do we really need
the host_nonshm plugin?

> --- a/libgomp/configure.ac
> +++ b/libgomp/configure.ac
> @@ -2,6 +2,8 @@
>  # aclocal -I ../config && autoconf && autoheader && automake
>  
>  AC_PREREQ(2.64)
> +#TODO: Update for OpenACC?  But then also have to update copyright notices in
> +#all source files...

Please drop this.

> @@ -1181,6 +1197,7 @@ initialize_env (void)
>        gomp_global_icv.thread_limit_var
>  	= thread_limit_var > INT_MAX ? UINT_MAX : thread_limit_var;
>      }
> +  parse_int ("GCC_ACC_NOTIFY", &goacc_notify_var, true);

I would have expected a GACC_NOTIFY (or GOACC_NOTIFY) name instead,
to match GOMP_SPINCOUNT and similar env vars.

> +  /* Initialize OpenACC-specific internal state.  */
> +  ACC_runtime_initialize ();

Is there any need for the capital letters in the function name?

> -static void
> +void
>  gomp_verror (const char *fmt, va_list list)
>  {
>    fputs ("\nlibgomp: ", stderr);
> @@ -54,13 +54,39 @@ gomp_error (const char *fmt, ...)
>  }
>  
>  void
> +gomp_vfatal (const char *fmt, va_list list)
> +{
> +  gomp_verror (fmt, list);
> +  exit (EXIT_FAILURE);
> +}

You should add the noreturn attribute to the gomp_vfatal prototype in the header.

> +
> +void
>  gomp_fatal (const char *fmt, ...)
>  {
>    va_list list;
>  
>    va_start (list, fmt);
> -  gomp_verror (fmt, list);
> +  gomp_vfatal (fmt, list);
>    va_end (list);
>  
> -  exit (EXIT_FAILURE);
> +  /* Unreachable.  */
> +  abort ();

And there is no need for the abort here.

> +extern int goacc_notify_var;
> +extern int goacc_device_num;
> +extern char* goacc_device_type;

See above.

> @@ -532,8 +538,12 @@ extern void *gomp_realloc (void *, size_t);
>  
>  /* error.c */
>  
> +extern void gomp_vnotify (const char *, va_list);
> +extern void gomp_notify (const char *msg, ...);
> +extern void gomp_verror (const char *, va_list);
>  extern void gomp_error (const char *, ...)
>  	__attribute__((format (printf, 1, 2)));
> +extern void gomp_vfatal (const char *, va_list);

See above.  Also, please add format attributes to all the new
prototypes here.

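A sketch of what the requested declarations could look like, using stand-in sketch_* names rather than the real gomp_* functions. For the va_list variants the third argument of the format attribute is 0, which tells GCC to check the format string but not the (unavailable) arguments. The sketch_last_error_len counter is a hypothetical addition so the non-fatal path can be checked.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Characters written by the most recent sketch_verror call (test aid).  */
static int sketch_last_error_len;

/* va_list variants use 0 as the first-to-check argument index.  */
void sketch_verror (const char *, va_list)
  __attribute__ ((format (printf, 1, 0)));
void sketch_error (const char *, ...)
  __attribute__ ((format (printf, 1, 2)));
void sketch_vfatal (const char *, va_list)
  __attribute__ ((noreturn, format (printf, 1, 0)));
void sketch_fatal (const char *, ...)
  __attribute__ ((noreturn, format (printf, 1, 2)));

void
sketch_verror (const char *fmt, va_list list)
{
  fputs ("\nlibgomp: ", stderr);
  sketch_last_error_len = vfprintf (stderr, fmt, list);
  fputc ('\n', stderr);
}

void
sketch_error (const char *fmt, ...)
{
  va_list list;

  va_start (list, fmt);
  sketch_verror (fmt, list);
  va_end (list);
}

void
sketch_vfatal (const char *fmt, va_list list)
{
  sketch_verror (fmt, list);
  exit (EXIT_FAILURE);
}

void
sketch_fatal (const char *fmt, ...)
{
  va_list list;

  va_start (list, fmt);
  sketch_vfatal (fmt, list);
  /* Unreachable: sketch_vfatal is declared noreturn, so no abort () is
     needed here; va_end is kept only to pair with va_start.  */
  va_end (list);
}
```

With the format attributes in place, a call such as sketch_error ("%s", 42) draws a -Wformat warning at compile time.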
>  extern void gomp_fatal (const char *, ...)
>  	__attribute__((noreturn, format (printf, 1, 2)));
>  

> +OACC_2.0 {
> +  global:
> +	acc_get_num_devices;
> +	acc_get_num_devices_h_;

Somebody recently suggested (for OpenMP) that we should just use
bind(C) in the Fortran module.  It is too late for OpenMP, as we
have to keep the *_ entrypoints for compatibility anyway, but
for OpenACC and new OpenMP functions you could presumably avoid
exporting all the *_ wrappers and use * directly.

> +PLUGIN_1.0 {

Perhaps GOMP_PLUGIN_1.0 instead?

> +  global:
> +	GOMP_PLUGIN_malloc;
> +	GOMP_PLUGIN_malloc_cleared;
> +	GOMP_PLUGIN_realloc;
> +	GOMP_PLUGIN_error;
> +	GOMP_PLUGIN_notify;
> +	GOMP_PLUGIN_fatal;
> +	GOMP_PLUGIN_mutex_init;
> +	GOMP_PLUGIN_mutex_destroy;
> +	GOMP_PLUGIN_mutex_lock;
> +	GOMP_PLUGIN_mutex_unlock;
> +	GOMP_PLUGIN_async_unmap_vars;
> +	GOMP_PLUGIN_acc_thread;
> +};
> +
> +# TODO.  See testsuite/lib/libgomp.exp:libgomp_init.
> +INTERNAL {
> +  global:
> +	initialize_env;
> +};

Ugh, I don't like that.  If it is a hack around dejagnu deficiency, then
perhaps dejagnu should be changed or gcc *.exp adjusted, if it is for
all programs, then there should be some way how to communicate passing
state from the host to the target plugin.

> +typedef struct ACC_dispatch_t

Can't you just use acc_dispatch_t ?
I'd prefer capital prefixes only for functions called from
compiler-generated code (like the GOMP_* entrypoints; so GACC_*),
perhaps where the standard mandates that some other functions/structures
be upper-case, or for the functions plugins call in libgomp or that libgomp
looks up in the plugins, but not elsewhere.

> +/* This is called when plugins have been initialized, and serves to call
> +   (indirectly) the target's device_init hook.  Calling multiple times without
> +   an intervening _acc_shutdown call is an error.  */
> +
> +static struct gomp_device_descr const *
> +_acc_init (acc_device_t d)

Why the underscore prefix?  Can't it clash with reserved namespaces?

> +static void
> +dump_var (char *s, size_t idx, void *hostaddr, size_t size, unsigned char kind)
> +{
> +  gomp_notify(" %2zi: %3s 0x%.2x -", idx, s, kind & 0xff);

Formatting, missing space before ( (many times).

> +  gomp_notify("- %d - %4d/0x%04x ", 1 << (kind >> 8), (int)size, (int)size);

And space after (int).
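dump_var prints kind & 0xff as the map type and 1 << (kind >> 8) as the alignment, while gomp_update masks with 0xff for OpenACC but 0x7 for OpenMP. The sketch below decodes one entry of KINDS on that basis. It assumes that OpenACC passes 16-bit kinds and that OpenMP 4.0's 8-bit kinds keep log2 of the alignment above the low three type bits; decode_kind and struct map_kind are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical decoded form of one entry of the KINDS array.  */
struct map_kind
{
  int type;      /* Map type, e.g. a GOMP_MAP_* code.  */
  size_t align;  /* Required alignment in bytes.  */
};

/* Read entry IDX of KINDS and split it into type and alignment.
   Assumed layout: OpenACC uses 16-bit kinds (type in the low byte, log2
   alignment in the high byte); OpenMP uses 8-bit kinds (type in the low
   3 bits, log2 alignment above them).  */
static struct map_kind
decode_kind (bool is_openacc, const void *kinds, size_t idx)
{
  const int typemask = is_openacc ? 0xff : 0x7;
  const int rshift = is_openacc ? 8 : 3;
  int kind = is_openacc ? ((const unsigned short *) kinds)[idx]
			: ((const unsigned char *) kinds)[idx];
  struct map_kind k;

  k.type = kind & typemask;
  k.align = (size_t) 1 << (kind >> rshift);
  return k;
}
```

For example, the OpenACC kind 0x0305 decodes to type 5 with 8-byte alignment, and the OpenMP kind (2 << 3) | 1 to type 1 with 4-byte alignment.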

> +
> +  gomp_notify ("%s: mapnum=%zd, hostaddrs=%p, sizes=%p, kinds=%p\n",
> +	       __FUNCTION__, mapnum, hostaddrs, sizes, kinds);

Isn't such debugging too costly?  Perhaps either enable it only in
debugging builds, or at least guard it (perhaps in a gomp_notify macro)
with
  if (__builtin_expect (goacc_notify_var, 0))
    (gomp_notify) (__VA_ARGS__)
?  I'd think doing it in debugging builds only should be sufficient.
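A minimal, standalone sketch of the guard suggested above, assuming GCC or Clang for __builtin_expect. goacc_notify_var and gomp_notify here are stand-ins for the libgomp ones, and notify_calls is a hypothetical counter added for testing. Because the macro wraps the whole call in the if, the arguments are not even evaluated when notification is off.

```c
#include <stdarg.h>
#include <stdio.h>

/* Stand-ins for the libgomp variable and function.  */
int goacc_notify_var;
static int notify_calls;  /* Hypothetical test counter.  */

void gomp_notify (const char *, ...)
  __attribute__ ((format (printf, 1, 2)));

void
gomp_notify (const char *fmt, ...)
{
  va_list list;

  notify_calls++;
  va_start (list, fmt);
  vfprintf (stderr, fmt, list);
  va_end (list);
}

/* Function-like macro with the same name as the function.  In
   (gomp_notify) the name is followed by ')' rather than '(', so it is not
   a macro invocation and refers to the real function.  The call, and the
   evaluation of its arguments, only happen when notification is on.  */
#define gomp_notify(...)					\
  do								\
    {								\
      if (__builtin_expect (goacc_notify_var, 0))		\
	(gomp_notify) (__VA_ARGS__);				\
    }								\
  while (0)
```

Compiling the macro out entirely in non-debugging builds, as also suggested, would just mean defining it to an empty do/while under a configure-time condition.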
> +
> +  void ACC_async_copy(int) __GOACC_NOTHROW;
> +  void ACC_async_kern(int) __GOACC_NOTHROW;

Formatting.  Please check the missing spaces before ( everywhere.

> +} cuErrorList[] = {

This is GCC code; is there any need for CamelCase?
> +static char *
> +cuErrorMsg (CUresult r)

Ditto.

> +  { _XSTR(cuCtxCreate) },

Missing space before (.
> +  { _XSTR(cuCtxDestroy) },
...

	Jakub


* Re: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost)
  2014-11-12 10:10   ` Jakub Jelinek
@ 2014-11-12 10:59     ` Thomas Schwinge
  2014-11-12 21:11       ` Mike Stump
  2014-11-12 11:06     ` Julian Brown
                       ` (4 subsequent siblings)
  5 siblings, 1 reply; 36+ messages in thread
From: Thomas Schwinge @ 2014-11-12 10:59 UTC (permalink / raw)
  To: Jakub Jelinek, Julian Brown, Ilya Verbin; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1947 bytes --]

Hi!

On Wed, 12 Nov 2014 11:06:26 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Nov 11, 2014 at 01:53:23PM +0000, Julian Brown wrote:
> > +# TODO.  See testsuite/lib/libgomp.exp:libgomp_init.
> > +INTERNAL {
> > +  global:
> > +	initialize_env;
> > +};

This should not have been part of the upstream trunk submission -- it is
an internal change, which also is not present on gomp-4_0-branch.

> Ugh, I don't like that.

Neither do I.  ;-/

> If it is a hack around a DejaGnu deficiency, then
> perhaps DejaGnu should be changed or gcc *.exp adjusted; if it is for
> all programs, then there should be some way to pass
> state from the host to the target plugin.

There is no mechanism in DejaGnu to pass environment variables to remote
boards (which we're using in internal testing), and we currently use that
to cycle through available accelerators/libgomp plugins: by setting the
ACC_DEVICE_TYPE environment variable.  For that, we have to pre-set that
environment variable before libgomp initializes, which is difficult (or,
not possible), because libgomp's initialization is done via a constructor
attribute, and it is not possible from the "main" executable to preempt a
dependent shared library's constructors, so we have to "re-initialize"
later on...
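The ordering problem described above can be reproduced in a single file. A constructor runs before main, so a setenv in main comes too late for it, and the value only takes effect after an explicit re-initialization. In this sketch, initialize_env mirrors the name exported in the INTERNAL version node, but its body and the device_type variable are hypothetical:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the device type chosen from
   ACC_DEVICE_TYPE.  */
static char device_type[32];

/* Read ACC_DEVICE_TYPE, falling back to "default" if unset/empty.  */
static void
initialize_env (void)
{
  const char *s = getenv ("ACC_DEVICE_TYPE");

  strncpy (device_type, s && *s ? s : "default", sizeof device_type - 1);
  device_type[sizeof device_type - 1] = '\0';
}

/* Like a shared library's constructor, this runs before main.  */
static void __attribute__ ((constructor))
library_init (void)
{
  initialize_env ();
}
```

A harness that calls setenv and then initialize_env again gets the new device type. That explicit second call is the "re-initialize" hack described above.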

This clearly is a hack, was never meant for upstream, and should be
replaced by a different scheme: my idea was that GCC should expose to
libgomp the first offloading device, T1, that has been specified with
-foffload=T1,T2,..., (defaulting to a GCC configure-time list), and then
T1 would be used as the default device for OpenACC.  This T1,T2,... list
could also serve to map the numeric device clause ID enumeration in
OpenMP?  (As far as I'm aware, there is no scheme in OpenMP to map from
"descriptive" device names (intelmic, nvptx, ...) to the numeric IDs used
with the device clause?)
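Under the scheme proposed above, a device's position in the -foffload=T1,T2,... list would give both the default OpenACC device (position 0) and a numeric ID. Here is a minimal sketch of that lookup. The function name is hypothetical, and nothing here is an existing GCC or libgomp interface:

```c
#include <assert.h>
#include <string.h>

/* Return the position of NAME in a comma-separated list such as
   "nvptx,intelmic", or -1 if it is absent.  Hypothetical helper.  */
static int
offload_target_index (const char *list, const char *name)
{
  size_t len = strlen (name);
  const char *p = list;
  int idx = 0;

  while (*p)
    {
      const char *end = strchr (p, ',');
      size_t n = end ? (size_t) (end - p) : strlen (p);

      if (n == len && strncmp (p, name, len) == 0)
        return idx;
      if (!end)
        break;
      p = end + 1;
      idx++;
    }
  return -1;
}
```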


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

* Re: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost)
  2014-11-12 10:10   ` Jakub Jelinek
  2014-11-12 10:59     ` Thomas Schwinge
@ 2014-11-12 11:06     ` Julian Brown
  2014-11-12 11:15       ` Jakub Jelinek
  2014-11-12 11:33     ` libgomp: "GNU OpenMP Runtime Library" (was: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost)) Thomas Schwinge
                       ` (3 subsequent siblings)
  5 siblings, 1 reply; 36+ messages in thread
From: Julian Brown @ 2014-11-12 11:06 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Thomas Schwinge, Ilya Verbin

On Wed, 12 Nov 2014 11:06:26 +0100
Jakub Jelinek <jakub@redhat.com> wrote:

> On Tue, Nov 11, 2014 at 01:53:23PM +0000, Julian Brown wrote:
> > A few OpenMP tests fail with the new host_nonshm plugin (with
> > failures of the form "libgomp: Trying to update
> > [0x605820..0x605824) object that is not mapped"), probably because
> > of middle-end bugs. I haven't investigated those in detail.
> 
> Depends how exactly your host_nonshm plugin works.  A few tests in the
> testsuite use #pragma omp declare target variables, so if host_nonshm
> plugin is something like I had on the gomp-4_0-branch initially as
> hackish device 257, where code is run on the host, and map directives
> simply malloc/free host memory and memcpy stuff around, then without
> extra work the #pragma omp declare target variables indeed can't work.
> You'd either need to support a strange partially shared memory model,
> where #pragma omp declare target variables would be shared (you'd
> still need to populate the mapping data structures with those vars
> and identity map them), or not so conforming model where you'd map
> them on entering the target regions if they aren't mapped yet (the
> thing is that then if the variables are changed on the host in
> between the start of the program and the target region, you'd use the
> changed values instead of the values they were originally assigned), or
> map them in some constructor (but, how would you know if a
> host_nonshm plugin is going to be used in the future).
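Here is a much simplified sketch of a host_nonshm-style mapping table, where the "device" copy is just malloc'd host memory. It shows why an update of an unmapped declare target variable fails. It also shows the caveat of the "map on entering the target region" model: the device copy picks up the value as changed on the host, not the originally assigned one. All names are hypothetical, and a flat array replaces libgomp's real data structures:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct mapping
{
  uintptr_t host;
  void *dev;
  size_t size;
};

#define MAX_MAPPINGS 16
static struct mapping table[MAX_MAPPINGS];
static int n_mappings;

/* Find the mapping that fully contains [HOST, HOST + SIZE).  */
static struct mapping *
lookup (const void *host, size_t size)
{
  uintptr_t h = (uintptr_t) host;
  int i;

  for (i = 0; i < n_mappings; i++)
    if (h >= table[i].host && h + size <= table[i].host + table[i].size)
      return &table[i];
  return NULL;
}

/* Map HOST: allocate a "device" copy and copy the host value in.  */
static struct mapping *
map_enter (void *host, size_t size)
{
  struct mapping *m = lookup (host, size);

  if (m)
    return m;
  assert (n_mappings < MAX_MAPPINGS);
  m = &table[n_mappings];
  m->dev = malloc (size);
  if (!m->dev)
    return NULL;
  memcpy (m->dev, host, size);
  m->host = (uintptr_t) host;
  m->size = size;
  n_mappings++;
  return m;
}

/* Copy host data to the device copy; -1 is the "object that is not
   mapped" case.  */
static int
update_to_device (const void *host, size_t size)
{
  struct mapping *m = lookup (host, size);

  if (!m)
    return -1;
  memcpy ((char *) m->dev + ((uintptr_t) host - m->host), host, size);
  return 0;
}

/* Stands in for a "#pragma omp declare target" variable.  */
static int declare_target_var = 1;

/* Lazy model: map declare target variables on first region entry.  */
static void
enter_target_region (void)
{
  map_enter (&declare_target_var, sizeof declare_target_var);
}
```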

Thanks for the review! I'll work on addressing your comments. Your
characterization of the host_nonshm plugin sounds accurate, but OOI,
what does the Intel MIC plugin do differently that means it is not
subject to the same problem with target variables?

> One can always use the intelmicemul plugin to test nonshared-memory
> stuff without any HW (provided the host is x86_64/i686), so do we
> really need host_nonshm plugin?

It might still be useful for testing (non-shm) OpenACC without
hardware, I guess (or for pedagogical purposes) -- perhaps we could
remove the TARGET_CAP_OPENMP_400 flag, if that's not expected to work.

Julian

* Re: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost)
  2014-11-12 11:06     ` Julian Brown
@ 2014-11-12 11:15       ` Jakub Jelinek
  0 siblings, 0 replies; 36+ messages in thread
From: Jakub Jelinek @ 2014-11-12 11:15 UTC (permalink / raw)
  To: Julian Brown; +Cc: gcc-patches, Thomas Schwinge, Ilya Verbin

On Wed, Nov 12, 2014 at 11:03:26AM +0000, Julian Brown wrote:
> Thanks for the review! I'll work on addressing your comments. Your
> characterization of the host_nonshm plugin sounds accurate, but OOI,
> what does the Intel MIC plugin do differently that means it is not
> subject to the same problem with target variables?

For the *-intelmicemul-* offloading target, the plugin uses the offloading
library and runs the offloading region in a separate process.
So, x86_64 (or i?86) ELF shared libraries are embedded into the data
sections of your programs.  When a target region is encountered for the
first time, the plugin extracts them, saves them as shared libraries in a
temporary directory, also saves a short binary there, runs the binary, and
communicates through the offloading library between the host (one process)
and the offloading target (another process on the same host).
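The host/target split described above can be illustrated with two plain POSIX processes. This sketch does not show extracting the embedded shared libraries. It uses pipes only as a hypothetical stand-in for the offloading library's communication channel, which is not shown in the thread:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical "target process": read one int, double it, reply.  */
static void
target_main (int in_fd, int out_fd)
{
  int v;

  if (read (in_fd, &v, sizeof v) == (ssize_t) sizeof v)
    {
      v *= 2;
      if (write (out_fd, &v, sizeof v) != (ssize_t) sizeof v)
        _exit (1);
    }
  _exit (0);
}

/* Host side: run the "offloaded region" in a child process and talk
   to it over two pipes.  Returns -1 on any error.  */
static int
offload_double (int x)
{
  int to_target[2], to_host[2];
  int result;
  pid_t pid;

  if (pipe (to_target) != 0)
    return -1;
  if (pipe (to_host) != 0)
    {
      close (to_target[0]);
      close (to_target[1]);
      return -1;
    }
  pid = fork ();
  if (pid == 0)
    {
      close (to_target[1]);
      close (to_host[0]);
      target_main (to_target[0], to_host[1]);
    }
  close (to_target[0]);
  close (to_host[1]);
  if (pid < 0
      || write (to_target[1], &x, sizeof x) != (ssize_t) sizeof x
      || read (to_host[0], &result, sizeof result) != (ssize_t) sizeof result)
    result = -1;
  close (to_target[1]);
  close (to_host[0]);
  if (pid > 0)
    waitpid (pid, NULL, 0);
  return result;
}
```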

So, you really can use the *-intelmicemul-* offloading yourself too, without
any special hardware, to test OpenACC (at least, as soon as all the
necessary hooks are wired in the plugin if any are missing right now).

	Jakub

* libgomp: "GNU OpenMP Runtime Library" (was: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost))
  2014-11-12 10:10   ` Jakub Jelinek
  2014-11-12 10:59     ` Thomas Schwinge
  2014-11-12 11:06     ` Julian Brown
@ 2014-11-12 11:33     ` Thomas Schwinge
  2014-11-12 11:49       ` Jakub Jelinek
       [not found]     ` <20141113232615.4ff373bf@octopus>
                       ` (2 subsequent siblings)
  5 siblings, 1 reply; 36+ messages in thread
From: Thomas Schwinge @ 2014-11-12 11:33 UTC (permalink / raw)
  To: Jakub Jelinek, gcc; +Cc: gcc-patches, Ilya Verbin, Julian Brown

[-- Attachment #1: Type: text/plain, Size: 1438 bytes --]

Hi!

On Wed, 12 Nov 2014 11:06:26 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Nov 11, 2014 at 01:53:23PM +0000, Julian Brown wrote:
> > --- a/libgomp/configure.ac
> > +++ b/libgomp/configure.ac
> > @@ -2,6 +2,8 @@
> >  # aclocal -I ../config && autoconf && autoheader && automake
> >  
> >  AC_PREREQ(2.64)
> > +#TODO: Update for OpenACC?  But then also have to update copyright notices in
> > +#all source files...
| >  AC_INIT([GNU OpenMP Runtime Library], 1.0,,[libgomp])
> 
> Please drop this.

(I agree to drop the TODO marker, obviously.)  Note that I'm not trying
to drive this into a "bikeshedding" discussion, and neither is my
intention to discredit the lots of pioneering OpenMP work in GCC (which
we're largely basing our OpenACC work on -- thanks!).

The underlying question here is, with offloading generally as well as the
OpenACC Runtime Library also to be living in libgomp, calling it "GNU
OpenMP Runtime Library" is no longer accurate.  (Also, I'm not proposing
to change the libgomp library name -- that would probably be too much of
a hassle?)  Do we want a new "verbose" name for libgomp, "GNU Offloading,
OpenACC, and OpenMP Runtime Library" (sorting alphabetically), or
something else, or no change.  I'm afraid that not changing it will be
confusing to users who are looking for the GCC implementation of the
OpenACC Runtime Library, for example?


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

* Re: libgomp: "GNU OpenMP Runtime Library" (was: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost))
  2014-11-12 11:33     ` libgomp: "GNU OpenMP Runtime Library" (was: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost)) Thomas Schwinge
@ 2014-11-12 11:49       ` Jakub Jelinek
  2014-11-12 13:40         ` David Malcolm
  0 siblings, 1 reply; 36+ messages in thread
From: Jakub Jelinek @ 2014-11-12 11:49 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc, gcc-patches, Ilya Verbin, Julian Brown

On Wed, Nov 12, 2014 at 12:18:13PM +0100, Thomas Schwinge wrote:
> On Wed, 12 Nov 2014 11:06:26 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Tue, Nov 11, 2014 at 01:53:23PM +0000, Julian Brown wrote:
> > > --- a/libgomp/configure.ac
> > > +++ b/libgomp/configure.ac
> > > @@ -2,6 +2,8 @@
> > >  # aclocal -I ../config && autoconf && autoheader && automake
> > >  
> > >  AC_PREREQ(2.64)
> > > +#TODO: Update for OpenACC?  But then also have to update copyright notices in
> > > +#all source files...
> | >  AC_INIT([GNU OpenMP Runtime Library], 1.0,,[libgomp])
> > 
> > Please drop this.
> 
> (I agree to drop the TODO marker, obviously.)  Note that I'm not trying
> to drive this into a "bikeshedding" discussion, and neither is my
> intention to discredit the lots of pioneering OpenMP work in GCC (which
> we're largely basing our OpenACC work on -- thanks!).
> 
> The underlying question here is, with offloading generally as well as the
> OpenACC Runtime Library also to be living in libgomp, calling it "GNU
> OpenMP Runtime Library" is no longer accurate.  (Also, I'm not proposing
> to change the libgomp library name -- that would probably be too much of
> a hassle?)  Do we want a new "verbose" name for libgomp, "GNU Offloading,
> OpenACC, and OpenMP Runtime Library" (sorting alphabetically), or
> something else, or no change.  I'm afraid that not changing it will be
> confusing to users who are looking for the GCC implementation of the
> OpenACC Runtime Library, for example?

Yeah, it is something I wanted to mention in the review of the documentation
patch, calling it just GNU OpenMP Runtime Library is not right after
it handles OpenACC too, but GNU Offloading, OpenACC and OpenMP Runtime
Library sounds bad to me too, because offloading (both OpenMP offloading and
OpenACC offloading) is actually only a small part of what the library is
about, I still view the library primarily as being a runtime for
OpenMP parallelization, tasking etc.; that's how it started and even OpenMP
offloading is just a matter of the last year (and until today in upstream
not even any actual offloading), for OpenACC it is solely about
offloading and directives in the offloaded code, right?

So, I don't want to bikeshed, but I'd call it
GNU OpenMP and OpenACC Runtime Library simply based on what it does
and how it evolved.  I know it isn't alphabetically sorted that way, but
the library has more than 9 years of history now and tons of users
already.

	Jakub

* Re: libgomp: "GNU OpenMP Runtime Library" (was: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost))
  2014-11-12 11:49       ` Jakub Jelinek
@ 2014-11-12 13:40         ` David Malcolm
  2014-11-12 13:49           ` Jakub Jelinek
  0 siblings, 1 reply; 36+ messages in thread
From: David Malcolm @ 2014-11-12 13:40 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Thomas Schwinge, gcc, gcc-patches, Ilya Verbin, Julian Brown

On Wed, 2014-11-12 at 12:33 +0100, Jakub Jelinek wrote:
> On Wed, Nov 12, 2014 at 12:18:13PM +0100, Thomas Schwinge wrote:
> > On Wed, 12 Nov 2014 11:06:26 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> > > On Tue, Nov 11, 2014 at 01:53:23PM +0000, Julian Brown wrote:
> > > > --- a/libgomp/configure.ac
> > > > +++ b/libgomp/configure.ac
> > > > @@ -2,6 +2,8 @@
> > > >  # aclocal -I ../config && autoconf && autoheader && automake
> > > >  
> > > >  AC_PREREQ(2.64)
> > > > +#TODO: Update for OpenACC?  But then also have to update copyright notices in
> > > > +#all source files...
> > | >  AC_INIT([GNU OpenMP Runtime Library], 1.0,,[libgomp])
> > > 
> > > Please drop this.
> > 
> > (I agree to drop the TODO marker, obviously.)  Note that I'm not trying
> > to drive this into a "bikeshedding" discussion, and neither is my
> > intention to discredit the lots of pioneering OpenMP work in GCC (which
> > we're largely basing our OpenACC work on -- thanks!).
> > 
> > The underlying question here is, with offloading generally as well as the
> > OpenACC Runtime Library also to be living in libgomp, calling it "GNU
> > OpenMP Runtime Library" is no longer accurate.  (Also, I'm not proposing
> > to change the libgomp library name -- that would probably be too much of
> > a hassle?)  Do we want a new "verbose" name for libgomp, "GNU Offloading,
> > OpenACC, and OpenMP Runtime Library" (sorting alphabetically), or
> > something else, or no change.  I'm afraid that not changing it will be
> > confusing to users who are looking for the GCC implementation of the
> > OpenACC Runtime Library, for example?
> 
> Yeah, it is something I wanted to mention in the review of the documentation
> patch, calling it just GNU OpenMP Runtime Library is not right after
> it handles OpenACC too, but GNU Offloading, OpenACC and OpenMP Runtime
> Library sounds bad to me too, because offloading (both OpenMP offloading and
> OpenACC offloading) is actually only a small part of what the library is
> about, I still view the library primarily as being a runtime for
> OpenMP parallelization, tasking etc.; that's how it started and even OpenMP
> offloading is just a matter of the last year (and until today in upstream
> not even any actual offloading), for OpenACC it is solely about
> offloading and directives in the offloaded code, right?
> 
> So, I don't want to bikeshed, but I'd call it
> GNU OpenMP and OpenACC Runtime Library simply based on what it does
> and how it evolved.  I know it isn't alphabetically sorted that way, but
> the library has more than 9 years of history now and tons of users
> already.

Apologies for bikeshedding, and I normally dislike "cute" names, but
renaming it to

   "GNU Offloading and Multi Processing library"

would allow a backronym of "libgomp", thus preserving the existing
filenames/SONAME etc.

Dave

* Re: libgomp: "GNU OpenMP Runtime Library" (was: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost))
  2014-11-12 13:40         ` David Malcolm
@ 2014-11-12 13:49           ` Jakub Jelinek
  2014-11-12 20:30             ` David Malcolm
  0 siblings, 1 reply; 36+ messages in thread
From: Jakub Jelinek @ 2014-11-12 13:49 UTC (permalink / raw)
  To: David Malcolm
  Cc: Thomas Schwinge, gcc, gcc-patches, Ilya Verbin, Julian Brown

On Wed, Nov 12, 2014 at 08:33:34AM -0500, David Malcolm wrote:
> Apologies for bikeshedding, and I normally dislike "cute" names, but
> renaming it to
> 
>    "GNU Offloading and Multi Processing library"
> 
> would allow a backronym of "libgomp", thus preserving the existing
> filenames/SONAME etc.

I think this is fine, can you change it both in libgomp/configure.ac
and texi docs?

	Jakub

* Re: libgomp: "GNU OpenMP Runtime Library" (was: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost))
  2014-11-12 13:49           ` Jakub Jelinek
@ 2014-11-12 20:30             ` David Malcolm
  2014-11-12 20:41               ` Jakub Jelinek
  0 siblings, 1 reply; 36+ messages in thread
From: David Malcolm @ 2014-11-12 20:30 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Thomas Schwinge, gcc, gcc-patches, Ilya Verbin, Julian Brown

[-- Attachment #1: Type: text/plain, Size: 672 bytes --]

On Wed, 2014-11-12 at 14:47 +0100, Jakub Jelinek wrote:
> On Wed, Nov 12, 2014 at 08:33:34AM -0500, David Malcolm wrote:
> > Apologies for bikeshedding, and I normally dislike "cute" names, but
> > renaming it to
> > 
> >    "GNU Offloading and Multi Processing library"
> > 
> > would allow a backronym of "libgomp", thus preserving the existing
> > filenames/SONAME etc.
> 
> I think this is fine, can you change it both in libgomp/configure.ac
> and texi docs?

Am attaching a patch that does so, though I suspect the wording in the
texi may need some more work (not my area of expertise).

Bootstrapped on x86_64-unknown-linux-gnu (Fedora 20); regrtest ongoing.

Dave

[-- Attachment #2: 0001-Change-human-name-of-libgomp.patch --]
[-- Type: text/x-patch, Size: 3023 bytes --]

From f52f7d0e2115d3f88e8662cab650f8746a2c147d Mon Sep 17 00:00:00 2001
From: David Malcolm <dmalcolm@redhat.com>
Date: Wed, 12 Nov 2014 12:25:25 -0500
Subject: [PATCH] Change "human" name of libgomp

libgomp/ChangeLog:
	* configure.ac (AC_INIT): Rename from "GNU OpenMP Runtime Library"
	to "GNU Offloading and Multi Processing Runtime Library".
	* libgomp.texi (direntry): Likewise.  Reword to refer to both
	OpenMP and OpenACC.
	(Introduction): Reword.
	(Runtime Library Routines): Reword.
---
 libgomp/configure.ac |  2 +-
 libgomp/libgomp.texi | 14 ++++++++------
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/libgomp/configure.ac b/libgomp/configure.ac
index 84d250f..1a70058 100644
--- a/libgomp/configure.ac
+++ b/libgomp/configure.ac
@@ -2,7 +2,7 @@
 # aclocal -I ../config && autoconf && autoheader && automake
 
 AC_PREREQ(2.64)
-AC_INIT([GNU OpenMP Runtime Library], 1.0,,[libgomp])
+AC_INIT([GNU Offloading and Multi Processing Runtime Library], 1.0,,[libgomp])
 AC_CONFIG_HEADER(config.h)
 
 # -------
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 254be57..78e8404 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -31,11 +31,12 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
 @ifinfo
 @dircategory GNU Libraries
 @direntry
-* libgomp: (libgomp).                    GNU OpenMP runtime library
+* libgomp: (libgomp).   GNU Offloading and Multi Processing Runtime library
 @end direntry
 
-This manual documents the GNU implementation of the OpenMP API for 
-multi-platform shared-memory parallel programming in C/C++ and Fortran.
+This manual documents libgomp, the GNU Offloading and Multi
+Processing Runtime library.  This is the GNU implementation of the OpenMP
+and OpenACC APIs for parallel programming in C/C++ and Fortran.
 
 Published by the Free Software Foundation
 51 Franklin Street, Fifth Floor
@@ -69,7 +70,8 @@ Boston, MA 02110-1301, USA@*
 @top Introduction
 @cindex Introduction
 
-This manual documents the usage of libgomp, the GNU implementation of the 
+This manual documents the usage of libgomp, the GNU Offloading and Multi
+Processing Runtime library.  This is the GNU implementation of the
 @uref{http://www.openmp.org, OpenMP} Application Programming Interface (API)
 for multi-platform shared-memory parallel programming in C/C++ and Fortran.
 
@@ -82,8 +84,8 @@ for multi-platform shared-memory parallel programming in C/C++ and Fortran.
 @comment
 @menu
 * Enabling OpenMP::            How to enable OpenMP for your applications.
-* Runtime Library Routines::   The OpenMP runtime application programming 
-                               interface.
+* Runtime Library Routines::   The offloading and multiprocessing runtime
+                               application programming interface.
 * Environment Variables::      Influencing runtime behavior with environment 
                                variables.
 * The libgomp ABI::            Notes on the external ABI presented by libgomp.
-- 
1.8.5.3


* Re: libgomp: "GNU OpenMP Runtime Library" (was: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost))
  2014-11-12 20:30             ` David Malcolm
@ 2014-11-12 20:41               ` Jakub Jelinek
  2014-11-12 20:50                 ` David Malcolm
  0 siblings, 1 reply; 36+ messages in thread
From: Jakub Jelinek @ 2014-11-12 20:41 UTC (permalink / raw)
  To: David Malcolm
  Cc: Thomas Schwinge, gcc, gcc-patches, Ilya Verbin, Julian Brown

On Wed, Nov 12, 2014 at 03:22:21PM -0500, David Malcolm wrote:
> On Wed, 2014-11-12 at 14:47 +0100, Jakub Jelinek wrote:
> > On Wed, Nov 12, 2014 at 08:33:34AM -0500, David Malcolm wrote:
> > > Apologies for bikeshedding, and I normally dislike "cute" names, but
> > > renaming it to
> > > 
> > >    "GNU Offloading and Multi Processing library"
> > > 
> > > would allow a backronym of "libgomp", thus preserving the existing
> > > filenames/SONAME etc.
> > 
> > I think this is fine, can you change it both in libgomp/configure.ac
> > and texi docs?
> 
> Am attaching a patch that does so, though I suspect the wording in the
> texi may need some more work (not my area of expertise).

Oops, by "you" above I didn't mean you, but the OpenACC folks; sorry for
the confusion.  Anyway, your patch is ok for trunk.  Thanks.

> >From f52f7d0e2115d3f88e8662cab650f8746a2c147d Mon Sep 17 00:00:00 2001
> From: David Malcolm <dmalcolm@redhat.com>
> Date: Wed, 12 Nov 2014 12:25:25 -0500
> Subject: [PATCH] Change "human" name of libgomp
> 
> libgomp/ChangeLog:
> 	* configure.ac (AC_INIT): Rename from "GNU OpenMP Runtime Library"
> 	to "GNU Offloading and Multi Processing Runtime Library".
> 	* libgomp.texi (direntry): Likewise.  Reword to refer to both
> 	OpenMP and OpenACC.
> 	(Introduction): Reword.
> 	(Runtime Library Routines): Reword.
> ---
>  libgomp/configure.ac |  2 +-
>  libgomp/libgomp.texi | 14 ++++++++------
>  2 files changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/libgomp/configure.ac b/libgomp/configure.ac
> index 84d250f..1a70058 100644
> --- a/libgomp/configure.ac
> +++ b/libgomp/configure.ac
> @@ -2,7 +2,7 @@
>  # aclocal -I ../config && autoconf && autoheader && automake
>  
>  AC_PREREQ(2.64)
> -AC_INIT([GNU OpenMP Runtime Library], 1.0,,[libgomp])
> +AC_INIT([GNU Offloading and Multi Processing Runtime Library], 1.0,,[libgomp])
>  AC_CONFIG_HEADER(config.h)
>  
>  # -------
> diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
> index 254be57..78e8404 100644
> --- a/libgomp/libgomp.texi
> +++ b/libgomp/libgomp.texi
> @@ -31,11 +31,12 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
>  @ifinfo
>  @dircategory GNU Libraries
>  @direntry
> -* libgomp: (libgomp).                    GNU OpenMP runtime library
> +* libgomp: (libgomp).   GNU Offloading and Multi Processing Runtime library
>  @end direntry
>  
> -This manual documents the GNU implementation of the OpenMP API for 
> -multi-platform shared-memory parallel programming in C/C++ and Fortran.
> +This manual documents libgomp, the GNU Offloading and Multi
> +Processing Runtime library.  This is the GNU implementation of the OpenMP
> +and OpenACC APIs for parallel programming in C/C++ and Fortran.
>  
>  Published by the Free Software Foundation
>  51 Franklin Street, Fifth Floor
> @@ -69,7 +70,8 @@ Boston, MA 02110-1301, USA@*
>  @top Introduction
>  @cindex Introduction
>  
> -This manual documents the usage of libgomp, the GNU implementation of the 
> +This manual documents the usage of libgomp, the GNU Offloading and Multi
> +Processing Runtime library.  This is the GNU implementation of the
>  @uref{http://www.openmp.org, OpenMP} Application Programming Interface (API)
>  for multi-platform shared-memory parallel programming in C/C++ and Fortran.
>  
> @@ -82,8 +84,8 @@ for multi-platform shared-memory parallel programming in C/C++ and Fortran.
>  @comment
>  @menu
>  * Enabling OpenMP::            How to enable OpenMP for your applications.
> -* Runtime Library Routines::   The OpenMP runtime application programming 
> -                               interface.
> +* Runtime Library Routines::   The offloading and multiprocessing runtime
> +                               application programming interface.
>  * Environment Variables::      Influencing runtime behavior with environment 
>                                 variables.
>  * The libgomp ABI::            Notes on the external ABI presented by libgomp.
> -- 
> 1.8.5.3
> 


	Jakub

* Re: libgomp: "GNU OpenMP Runtime Library" (was: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost))
  2014-11-12 20:41               ` Jakub Jelinek
@ 2014-11-12 20:50                 ` David Malcolm
  2015-01-11  2:18                   ` libgomp: Now known as the GNU Offloading and Multi Processing Runtime Library (was: libgomp: "GNU OpenMP Runtime Library") Thomas Schwinge
  0 siblings, 1 reply; 36+ messages in thread
From: David Malcolm @ 2014-11-12 20:50 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Thomas Schwinge, gcc, gcc-patches, Ilya Verbin, Julian Brown

On Wed, 2014-11-12 at 21:30 +0100, Jakub Jelinek wrote:
> On Wed, Nov 12, 2014 at 03:22:21PM -0500, David Malcolm wrote:
> > On Wed, 2014-11-12 at 14:47 +0100, Jakub Jelinek wrote:
> > > On Wed, Nov 12, 2014 at 08:33:34AM -0500, David Malcolm wrote:
> > > > Apologies for bikeshedding, and I normally dislike "cute" names, but
> > > > renaming it to
> > > > 
> > > >    "GNU Offloading and Multi Processing library"
> > > > 
> > > > would allow a backronym of "libgomp", thus preserving the existing
> > > > filenames/SONAME etc.
> > > 
> > > I think this is fine, can you change it both in libgomp/configure.ac
> > > and texi docs?
> > 
> > Am attaching a patch that does so, though I suspect the wording in the
> > texi may need some more work (not my area of expertise).
> 
> Oops, by "you" above I didn't mean you, but the OpenACC folks; sorry for
> the confusion.  Anyway, your patch is ok for trunk.  Thanks.

Ah, ok :)   Presumably this is conditional on the rest of the OpenACC
work merging?  (AIUI the OpenACC work is not yet on trunk, right?)

If so, perhaps the OpenACC people can adopt the patch and apply it
(possibly in a modified form) when they merge their work?

Sorry if I'm stepping on any toes here.
Dave


> > >From f52f7d0e2115d3f88e8662cab650f8746a2c147d Mon Sep 17 00:00:00 2001
> > From: David Malcolm <dmalcolm@redhat.com>
> > Date: Wed, 12 Nov 2014 12:25:25 -0500
> > Subject: [PATCH] Change "human" name of libgomp
> > 
> > libgomp/ChangeLog:
> > 	* configure.ac (AC_INIT): Rename from "GNU OpenMP Runtime Library"
> > 	to "GNU Offloading and Multi Processing Runtime Library".
> > 	* libgomp.texi (direntry): Likewise.  Reword to refer to both
> > 	OpenMP and OpenACC.
> > 	(Introduction): Reword.
> > 	(Runtime Library Routines): Reword.
> > ---
> >  libgomp/configure.ac |  2 +-
> >  libgomp/libgomp.texi | 14 ++++++++------
> >  2 files changed, 9 insertions(+), 7 deletions(-)
> > 
> > diff --git a/libgomp/configure.ac b/libgomp/configure.ac
> > index 84d250f..1a70058 100644
> > --- a/libgomp/configure.ac
> > +++ b/libgomp/configure.ac
> > @@ -2,7 +2,7 @@
> >  # aclocal -I ../config && autoconf && autoheader && automake
> >  
> >  AC_PREREQ(2.64)
> > -AC_INIT([GNU OpenMP Runtime Library], 1.0,,[libgomp])
> > +AC_INIT([GNU Offloading and Multi Processing Runtime Library], 1.0,,[libgomp])
> >  AC_CONFIG_HEADER(config.h)
> >  
> >  # -------
> > diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
> > index 254be57..78e8404 100644
> > --- a/libgomp/libgomp.texi
> > +++ b/libgomp/libgomp.texi
> > @@ -31,11 +31,12 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
> >  @ifinfo
> >  @dircategory GNU Libraries
> >  @direntry
> > -* libgomp: (libgomp).                    GNU OpenMP runtime library
> > +* libgomp: (libgomp).   GNU Offloading and Multi Processing Runtime library
> >  @end direntry
> >  
> > -This manual documents the GNU implementation of the OpenMP API for 
> > -multi-platform shared-memory parallel programming in C/C++ and Fortran.
> > +This manual documents libgomp, the GNU Offloading and Multi
> > +Processing Runtime library.  This is the GNU implementation of the OpenMP
> > +and OpenACC APIs for parallel programming in C/C++ and Fortran.
> >  
> >  Published by the Free Software Foundation
> >  51 Franklin Street, Fifth Floor
> > @@ -69,7 +70,8 @@ Boston, MA 02110-1301, USA@*
> >  @top Introduction
> >  @cindex Introduction
> >  
> > -This manual documents the usage of libgomp, the GNU implementation of the 
> > +This manual documents the usage of libgomp, the GNU Offloading and Multi
> > +Processing Runtime library.  This is the GNU implementation of the
> >  @uref{http://www.openmp.org, OpenMP} Application Programming Interface (API)
> >  for multi-platform shared-memory parallel programming in C/C++ and Fortran.
> >  
> > @@ -82,8 +84,8 @@ for multi-platform shared-memory parallel programming in C/C++ and Fortran.
> >  @comment
> >  @menu
> >  * Enabling OpenMP::            How to enable OpenMP for your applications.
> > -* Runtime Library Routines::   The OpenMP runtime application programming 
> > -                               interface.
> > +* Runtime Library Routines::   The offloading and multiprocessing runtime
> > +                               application programming interface.
> >  * Environment Variables::      Influencing runtime behavior with environment 
> >                                 variables.
> >  * The libgomp ABI::            Notes on the external ABI presented by libgomp.
> > -- 
> > 1.8.5.3
> > 
> 
> 
> 	Jakub


* Re: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost)
  2014-11-12 10:59     ` Thomas Schwinge
@ 2014-11-12 21:11       ` Mike Stump
  0 siblings, 0 replies; 36+ messages in thread
From: Mike Stump @ 2014-11-12 21:11 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: Jakub Jelinek, Julian Brown, Ilya Verbin, gcc-patches

On Nov 12, 2014, at 2:55 AM, Thomas Schwinge <thomas@codesourcery.com> wrote:
> There is no mechanism in DejaGnu to pass environment variables to remote
> boards (which we're using in internal testing), and we currently use that
> to circle through available accelerators/libgomp plugins

So, two thoughts come to mind.  In my target, I wire up env, and in my simulator, I push all of the environment from the host into the target.  It works very nicely; I can bootstrap gcc and binutils on my target.  The amount of code needed to do this is fairly trivial.  Once done, all the usual environment adjusting that one can do in Tcl is reflected on the target.

Or, you can put in code that will communicate the bits over to the target.  When commands are executed, usually we are pushing them into shells, and in shells, you can do:

  env1=value1 env2=value2 command arg1 arg2

and push over as many variables as you want.  You just need an API to add/manage which variables you want to actively push.  If you don’t have a shell, you still need some way to communicate the variables over.  Ultimately this will have to be put into the board file, and then the abi can use that interface to move the variables.  I don’t think there are existing ways to do this.
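The second approach comes down to prefixing the remote command with NAME=VALUE pairs. A real implementation would live in Tcl in the board file. This C sketch only illustrates the string that would be sent to the remote shell. The helper is hypothetical and assumes the values need no shell quoting:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "NAME=VALUE ... COMMAND" into BUF.
   Returns 0, or -1 if BUF is too small.  */
static int
build_remote_command (char *buf, size_t size, const char *const *vars,
                      size_t nvars, const char *command)
{
  size_t used = 0;
  size_t i;
  int n;

  assert (size > 0);
  for (i = 0; i < nvars; i++)
    {
      n = snprintf (buf + used, size - used, "%s ", vars[i]);
      if (n < 0 || (size_t) n >= size - used)
        return -1;
      used += (size_t) n;
    }
  n = snprintf (buf + used, size - used, "%s", command);
  if (n < 0 || (size_t) n >= size - used)
    return -1;
  return 0;
}
```

For example, the variables ACC_DEVICE_TYPE=nvidia and OMP_NUM_THREADS=4 with the command "./a.out arg1" yield "ACC_DEVICE_TYPE=nvidia OMP_NUM_THREADS=4 ./a.out arg1".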

I like the first, and that’s fine for testing on software simulators with lots of memory.  On actual target hardware with limited memory, it would be less appropriate.

The second might require a DejaGnu update to work.


Another thought: if the program you want to run accepts an argument, --env, that can place environment variables into that program, you can then just add --env args to the command line too, and it will put them into the environment directly.  Benefit: no DejaGnu mods or updates.  It is easy to understand and audit (env variables otherwise kinda disappear, making flaws in them harder to see).
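Here is a minimal sketch of the --env idea on the program side. Leading "--env NAME=VALUE" pairs are consumed and applied with setenv before the program does anything else. The function name and argument convention are hypothetical. For libgomp specifically, this would still have to happen before the library reads its environment, which is the ordering problem described earlier in the thread:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: consume leading "--env NAME=VALUE" pairs from ARGV
   and set each in the environment.  Returns the index of the first
   remaining argument, or -1 on a malformed pair.  */
static int
apply_env_args (int argc, char **argv)
{
  int i = 1;

  while (i + 1 < argc && strcmp (argv[i], "--env") == 0)
    {
      char *eq = strchr (argv[i + 1], '=');
      char name[256];
      size_t len;

      if (!eq || eq == argv[i + 1])
        return -1;
      len = (size_t) (eq - argv[i + 1]);
      if (len >= sizeof name)
        return -1;
      memcpy (name, argv[i + 1], len);
      name[len] = '\0';
      if (setenv (name, eq + 1, 1) != 0)
        return -1;
      i += 2;
    }
  return i;
}
```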


* Fortran/C interfacing (was: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost))
       [not found]     ` <20141113232615.4ff373bf@octopus>
@ 2014-11-14 16:07       ` Thomas Schwinge
  2014-11-14 21:01         ` Fortran/C interfacing Tobias Burnus
  0 siblings, 1 reply; 36+ messages in thread
From: Thomas Schwinge @ 2014-11-14 16:07 UTC (permalink / raw)
  To: Jakub Jelinek, fortran, Tobias Burnus; +Cc: gcc-patches, Julian Brown

[-- Attachment #1: Type: text/plain, Size: 914 bytes --]

Hi!

On Wed, 12 Nov 2014 11:06:26 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Nov 11, 2014 at 01:53:23PM +0000, Julian Brown wrote:
> > [OpenACC libgomp changes]
> > [openacc.f90, and exporting symbols in libgomp.map]

> > +OACC_2.0 {
> > +  global:
> > +	acc_get_num_devices;
> > +	acc_get_num_devices_h_;
> 
> Somebody recently suggested (for OpenMP) that we just should use
> bind(C) in the Fortran module, it is too late for OpenMP, as we
> have to keep the *_ entrypoints for compatibility anyway, but
> for OpenACC and new OpenMP functions supposedly you could avoid
> exporting all the *_ wrappers and use * directly.

Tobias, as our local expert :-) -- how does that "resonate" with the
discussion (and implementation) about Fortran/C interfacing in
<http://news.gmane.org/find-root.php?message_id=%3C20140818135104.GA8943%40physik.fu-berlin.de%3E>?


Regards,
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]


* GOMP_DEBUG environment variable?  (was: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost))
  2014-11-12 10:10   ` Jakub Jelinek
                       ` (3 preceding siblings ...)
       [not found]     ` <20141113232615.4ff373bf@octopus>
@ 2014-11-14 16:38     ` Thomas Schwinge
  2014-11-15  1:04     ` [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost) Julian Brown
  5 siblings, 0 replies; 36+ messages in thread
From: Thomas Schwinge @ 2014-11-14 16:38 UTC (permalink / raw)
  To: Jakub Jelinek, Julian Brown; +Cc: gcc-patches, Ilya Verbin

[-- Attachment #1: Type: text/plain, Size: 1852 bytes --]

Hi!

On Wed, 12 Nov 2014 11:06:26 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Nov 11, 2014 at 01:53:23PM +0000, Julian Brown wrote:
> > @@ -1181,6 +1197,7 @@ initialize_env (void)
> >        gomp_global_icv.thread_limit_var
> >  	= thread_limit_var > INT_MAX ? UINT_MAX : thread_limit_var;
> >      }
> > +  parse_int ("GCC_ACC_NOTIFY", &goacc_notify_var, true);
> 
> I would have expected GACC_NOTIFY name instead (or GOACC_NOTIFY)
> to match GOMP_SPINCOUNT and similar env vars.

GOACC_NOTIFY was a first implementation for an immediate need, but I've
always had in the back of my head the idea to generalize this.  How about
GOMP_DEBUG, which could then be set to a comma-separated list of "classes"
of debugging information?
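One possible way to parse such a variable, assuming its value is a comma-separated list of class names. The class names `device`, `mem` and `offload` are the ones Thomas lists later in this message; the enum, the `all` keyword, the table and `parse_gomp_debug` are hypothetical.

```c
#include <string.h>

/* Hypothetical debug classes, one bit each.  */
enum gomp_debug_class
{
  GOMP_DEBUG_DEVICE = 1 << 0,    /* device scanning */
  GOMP_DEBUG_MEM = 1 << 1,       /* mapping setup */
  GOMP_DEBUG_OFFLOAD = 1 << 2    /* kernel launches */
};

static const struct
{
  const char *name;
  unsigned mask;
} gomp_debug_names[] = {
  { "device", GOMP_DEBUG_DEVICE },
  { "mem", GOMP_DEBUG_MEM },
  { "offload", GOMP_DEBUG_OFFLOAD },
  { "all", GOMP_DEBUG_DEVICE | GOMP_DEBUG_MEM | GOMP_DEBUG_OFFLOAD }
};

/* Turn a string such as "mem,offload" into a bit mask.  Unknown class
   names are ignored; a NULL or empty string yields 0.  */
unsigned
parse_gomp_debug (const char *env)
{
  unsigned mask = 0;
  while (env != NULL && *env != '\0')
    {
      size_t len = strcspn (env, ",");   /* length of the next token */
      for (size_t i = 0;
           i < sizeof gomp_debug_names / sizeof gomp_debug_names[0]; i++)
        if (strlen (gomp_debug_names[i].name) == len
            && strncmp (env, gomp_debug_names[i].name, len) == 0)
          mask |= gomp_debug_names[i].mask;
      env += len;
      if (*env == ',')
        env++;                           /* skip the separator */
    }
  return mask;
}
```

At startup this would be called once, e.g. `gomp_debug_var = parse_gomp_debug (getenv ("GOMP_DEBUG"));`, so later checks are a single bit test.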

> > +  gomp_notify ("%s: mapnum=%zd, hostaddrs=%p, sizes=%p, kinds=%p\n",
> > +	       __FUNCTION__, mapnum, hostaddrs, sizes, kinds);
> 
> Isn't such debugging too costly?  Perhaps either enable it only in
> debugging builds, or at least guard with (perhaps in a gomp_notify macro)
> with
>   if (__builtin_expect (goacc_notify_var, 0))
>     (gomp_notify) (__VA_ARGS__)
> ?  I'd think doing it in debugging builds only should be sufficient.

I think users may occasionally wish to see such "debugging" information,
but guarding it with __builtin_expect is certainly a good thing to do.
How about having an enum gomp_debug, defining several "classes" of
debugging information ("device" (scanning), "mem" (mapping setup),
"offload" (kernel launches), and so on; not all to be added right now,
of course), and making that the first parameter to gomp_debug?  Such
calls certainly can't go in "hot" code paths (too much output), and for
"non-hot" code paths, I don't think the cost of the gomp_debug_var
comparison matters compared to the other code executing nearby.
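A minimal sketch of the enum-plus-guard idea, assuming a global bit mask `gomp_debug_var` (for example filled in from GOMP_DEBUG at startup). The enum values, `gomp_debug_var` and the `gomp_debug` macro/function pair are hypothetical; the macro follows the `__builtin_expect` guard Jakub suggests above, so the disabled case costs one test and a branch, and the arguments are not evaluated at all.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical debug classes, one bit each.  */
enum gomp_debug_kind
{
  GOMP_DEBUG_DEVICE = 1 << 0,    /* device scanning */
  GOMP_DEBUG_MEM = 1 << 1,       /* mapping setup */
  GOMP_DEBUG_OFFLOAD = 1 << 2    /* kernel launches */
};

/* Hypothetical: enabled classes, e.g. parsed from GOMP_DEBUG.  */
unsigned gomp_debug_var;

extern void gomp_debug (unsigned, const char *, ...);

/* Caller-side guard: when the class is disabled, the call and the
   evaluation of its arguments are skipped entirely.  */
#define gomp_debug(KIND, ...)                                     \
  do                                                              \
    {                                                             \
      if (__builtin_expect ((gomp_debug_var & (KIND)) != 0, 0))   \
        (gomp_debug) ((KIND), __VA_ARGS__);                       \
    }                                                             \
  while (0)

/* Out-of-line worker; the parentheses keep the macro from expanding.  */
void
(gomp_debug) (unsigned kind, const char *fmt, ...)
{
  va_list ap;
  va_start (ap, fmt);
  fprintf (stderr, "libgomp [%#x]: ", kind);
  vfprintf (stderr, fmt, ap);
  va_end (ap);
}
```

A call site would read `gomp_debug (GOMP_DEBUG_MEM, "%s: mapnum=%zu\n", __FUNCTION__, mapnum);`, keeping the class next to the message it guards.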


Regards,
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]


* Re: Fortran/C interfacing
  2014-11-14 16:07       ` Fortran/C interfacing (was: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost)) Thomas Schwinge
@ 2014-11-14 21:01         ` Tobias Burnus
  2014-11-14 21:24           ` Jakub Jelinek
  0 siblings, 1 reply; 36+ messages in thread
From: Tobias Burnus @ 2014-11-14 21:01 UTC (permalink / raw)
  To: Thomas Schwinge, Jakub Jelinek, fortran; +Cc: gcc-patches, Julian Brown

On 14.11.2014 at 16:56, Thomas Schwinge wrote:
> Hi!
>
> On Wed, 12 Nov 2014 11:06:26 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Tue, Nov 11, 2014 at 01:53:23PM +0000, Julian Brown wrote:
>>> [OpenACC libgomp changes][openacc.f90, and exporting symbols in libgomp.map]
>>>
>>>
>> Somebody recently suggested (for OpenMP) that we just should use
>> bind(C) in the Fortran module, it is too late for OpenMP, as we
>> have to keep the *_ entrypoints for compatibility anyway, but
>> for OpenACC and new OpenMP functions supposedly you could avoid
>> exporting all the *_ wrappers and use * directly.
> Tobias, as our local expert :-) -- how does that "resonate" with the
> discussion (and implementation) about Fortran/C interfacing

Unfortunately, a wrapper on either the C or the Fortran side is unavoidable.

For C/C++, one has:
   void* acc_copyin(h_void*, size_t );

That matches (if one ignores the return type):
   subroutine acc_copyin(a, len) bind(C)
     use iso_c_binding, only: c_size_t
     type(*) :: a
     integer(c_size_t) :: len

If one looks at Fortran's first interface, it seems to match:
   subroutine acc_copyin(a, len )
     type:: a
     integer :: len

However, a default-kind integer in Fortran is usually* 4 bytes wide,
while size_t on most 64-bit systems** is 8 bytes wide. As Fortran doesn't
automatically convert the integer type, I think there is no way around
additionally providing a function that takes an "int"/default-kind
integer. (* unless one uses -fdefault-integer-8; ** x32 is an exception.)
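To illustrate the C-side variant of such a wrapper: the sketch assumes a 32-bit default INTEGER and shows two hypothetical entry points, `acc_copyin_32_h_` and `acc_copyin_64_h_` (names invented here, not the patch's symbols), that widen the length to `size_t` before calling the C function. `acc_copyin` is stubbed so the example is self-contained; it is not libgomp's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the C entry point  void *acc_copyin (void *, size_t);
   it only records the byte count so the sketch can run on its own.  */
static size_t last_copyin_bytes;

void *
acc_copyin (void *h, size_t bytes)
{
  last_copyin_bytes = bytes;
  return h;   /* a real runtime would return the device address */
}

/* Hypothetical wrapper for the Fortran interface with a default-kind
   (here 32-bit) INTEGER length.  Fortran passes arguments by reference
   unless VALUE is given, hence the pointer.  The return value is
   dropped because the Fortran interface is a subroutine.  */
void
acc_copyin_32_h_ (void *h, int32_t *len)
{
  acc_copyin (h, (size_t) *len);   /* widen to size_t explicitly */
}

/* 64-bit variant; this one also allows arrays larger than 2 GB.  */
void
acc_copyin_64_h_ (void *h, int64_t *len)
{
  acc_copyin (h, (size_t) *len);
}
```

Where `c_size_t == kind(integer)`, one of the two wrappers is redundant, which is the conditional-compilation point made next.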

I think the current version of the patch (I haven't re-checked)
provides two functions, one taking a 32-bit and one taking a 64-bit
integer; the latter also permits arrays larger than 2 GB. In
principle, one of the versions could directly invoke the C function
without the wrapper [where c_size_t == kind(integer)], but that would
require some conditional compilation.
Whether one has a trailing "_" and whether one implements it in C or in
Fortran doesn't really matter.

The second Fortran interface is:

   subroutine acc_copyin(a)
     type, dimension(: [,:]…) :: a

which is best written in modern Fortran as:
    class(*), dimension(..) :: a
and which, knowing the internal implementation, I wrote as
    type(*), dimension(..) :: a
as it avoids some extra code on the caller side, but it is not fully
standard-conforming. On the other hand, "type(*), dimension(..)" can
also be marked as BIND(C).

There is no C equivalent but one can use something like
    call acc_copyin(c_loc(a), size(a)*storage_size(a)/8)
to convert it to the C form of the function.

Again, this conversion can be done either in Fortran or in C. One just 
needs to take the first field of the array descriptor - the address of 
the actual data - and needs to extract the size of an element and the 
number of elements. (Caveat: The current code only works if the array is 
contiguous and the argument is not an assumed-size array.)
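A sketch of that C-side conversion. The descriptor below is a deliberately simplified, hypothetical layout (it is not gfortran's actual array descriptor); `descriptor_to_c_args` returns the data address and the byte count, and refuses non-contiguous arrays, in line with the caveat above.

```c
#include <stddef.h>

/* Hypothetical, simplified array descriptor; NOT gfortran's layout.  */
struct dim_info
{
  ptrdiff_t stride;   /* distance between elements, in elements */
  ptrdiff_t lbound;
  ptrdiff_t ubound;
};

struct simple_descriptor
{
  void *base_addr;            /* first field: address of the actual data */
  size_t elem_len;            /* size of one element, in bytes */
  int rank;
  struct dim_info dim[15];    /* Fortran 2008 allows up to rank 15 */
};

/* Compute the (address, byte length) pair the C form of acc_copyin
   expects.  Returns 1 on success; returns 0 and leaves *ADDR and
   *BYTES untouched if the array is not contiguous.  Assumed-size
   arrays (unknown last extent) cannot be described here.  */
int
descriptor_to_c_args (const struct simple_descriptor *d,
                      void **addr, size_t *bytes)
{
  size_t nelems = 1;
  ptrdiff_t expected_stride = 1;
  for (int i = 0; i < d->rank; i++)
    {
      ptrdiff_t extent = d->dim[i].ubound - d->dim[i].lbound + 1;
      if (extent <= 0)
        {
          nelems = 0;          /* zero-sized array: nothing to copy */
          break;
        }
      if (d->dim[i].stride != expected_stride)
        return 0;              /* strided section: not contiguous */
      nelems *= (size_t) extent;
      expected_stride *= extent;
    }
  *addr = d->base_addr;
  *bytes = nelems * d->elem_len;
  return 1;
}
```

This is the descriptor equivalent of `call acc_copyin(c_loc(a), size(a)*storage_size(a)/8)`: the element count times the element size in bytes.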


Thus, all in all, I think the current implementation is okay. However, 
if someone has a better suggestion, I am interested.

Tobias

PS: I am happy that TYPE(*) and DIMENSION(..) exist (both in the 
standard and in the compiler) as they make life much simpler. Otherwise, 
providing an explicit interface would be tedious for intrinsic types
and impossible for derived types.


* Re: Fortran/C interfacing
  2014-11-14 21:01         ` Fortran/C interfacing Tobias Burnus
@ 2014-11-14 21:24           ` Jakub Jelinek
  0 siblings, 0 replies; 36+ messages in thread
From: Jakub Jelinek @ 2014-11-14 21:24 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: Thomas Schwinge, fortran, gcc-patches, Julian Brown

On Fri, Nov 14, 2014 at 09:47:17PM +0100, Tobias Burnus wrote:
> >>Somebody recently suggested (for OpenMP) that we just should use
> >>bind(C) in the Fortran module, it is too late for OpenMP, as we
> >>have to keep the *_ entrypoints for compatibility anyway, but
> >>for OpenACC and new OpenMP functions supposedly you could avoid
> >>exporting all the *_ wrappers and use * directly.
> >Tobias, as our local expert :-) -- how does that "resonate" with the
> >discussion (and implementation) about Fortran/C interfacing
> 
> Unfortunately, a wrapper on either the C or the Fortran side is unavoidable.

OK, I just wanted to raise it; if it isn't possible, it's fine as is.

	Jakub


* Re: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost)
  2014-11-12 10:10   ` Jakub Jelinek
                       ` (4 preceding siblings ...)
  2014-11-14 16:38     ` GOMP_DEBUG environment variable? (was: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost)) Thomas Schwinge
@ 2014-11-15  1:04     ` Julian Brown
  2014-11-19 19:58       ` Bernd Schmidt
  5 siblings, 1 reply; 36+ messages in thread
From: Julian Brown @ 2014-11-15  1:04 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Thomas Schwinge, Ilya Verbin

[-- Attachment #1: Type: text/plain, Size: 3041 bytes --]

Hi,

On Wed, 12 Nov 2014 11:06:26 +0100
Jakub Jelinek <jakub@redhat.com> wrote:

> On Tue, Nov 11, 2014 at 01:53:23PM +0000, Julian Brown wrote:
> > A few OpenMP tests fail with the new host_nonshm plugin (with
> > failures of the form "libgomp: Trying to update
> > [0x605820..0x605824) object that is not mapped"), probably because
> > of middle-end bugs. I haven't investigated those in detail.
> 
> Depends how exactly your host_nonshm plugin works. [...]
> 
> One can always use the intelmicemul plugin to test nonshared-memory
> stuff without any HW (provided the host is x86_64/i686), so do we
> really need host_nonshm plugin?

This is a new version of the patch with (hopefully) all these
comments addressed. For now, I've left the host_nonshm plugin present,
but removed the OpenMP-support capability from it, since it doesn't
seem likely that we can support the required semantics for that in the
short term.

> > --- a/libgomp/configure.ac
> > +++ b/libgomp/configure.ac
> > @@ -2,6 +2,8 @@
> >  # aclocal -I ../config && autoconf && autoheader && automake
> >  
> >  AC_PREREQ(2.64)
> > +#TODO: Update for OpenACC?  But then also have to update copyright
> > +#notices in all source files...

I've incorporated David Malcolm's suggestion (and patch) into the
configury bits of this patch, and into the documentation bits. I'll be
reposting those too shortly.

> > @@ -1181,6 +1197,7 @@ initialize_env (void)
> >        gomp_global_icv.thread_limit_var
> >  	= thread_limit_var > INT_MAX ? UINT_MAX : thread_limit_var;
> >      }
> > +  parse_int ("GCC_ACC_NOTIFY", &goacc_notify_var, true);
> 
> I would have expected GACC_NOTIFY name instead (or GOACC_NOTIFY)
> to match GOMP_SPINCOUNT and similar env vars.

I've renamed the environment variable, added a configure-time flag to
enable verbose libgomp output (--enable-libgomp-verbose, disabled by
default), and added the suggested __builtin_expect. The configure flag
might be overkill, considering Thomas's later comments; alternatively,
verbose output could be enabled by default.
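One way the compile-time switch and the `__builtin_expect` guard could fit together, loosely mirroring the `LIBGOMP_VERBOSE` conditional in the error.c hunk of this patch. The macro layout is an assumption (the corresponding libgomp.h change is not shown here), and this sketch defines the function under a parenthesized name rather than using `#undef` as error.c does.

```c
#include <stdarg.h>
#include <stdio.h>

/* Set from the GOACC_NOTIFY environment variable at startup.  */
int goacc_notify_var;

#ifdef LIBGOMP_VERBOSE
/* --enable-libgomp-verbose: keep the calls, but make the common
   "notifications off" case a single, predicted-not-taken test.  */
extern void gomp_notify (const char *, ...);
# define gomp_notify(...)                               \
  do                                                    \
    {                                                   \
      if (__builtin_expect (goacc_notify_var, 0))       \
        (gomp_notify) (__VA_ARGS__);                    \
    }                                                   \
  while (0)

/* The parentheses stop the function-like macro from expanding here.  */
void
(gomp_notify) (const char *msg, ...)
{
  va_list list;
  va_start (list, msg);
  vfprintf (stderr, msg, list);
  va_end (list);
}
#else
/* Default build: every call, including its arguments, compiles away.  */
# define gomp_notify(...) do { } while (0)
#endif
```

With the default configuration, a call such as `gomp_notify ("%s: mapnum=%zu\n", __FUNCTION__, mapnum);` costs nothing at all; with `-DLIBGOMP_VERBOSE` it costs one well-predicted branch when GOACC_NOTIFY is unset.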

> > +OACC_2.0 {
> > +  global:
> > +	acc_get_num_devices;
> > +	acc_get_num_devices_h_;
> 
> Somebody recently suggested (for OpenMP) that we just should use
> bind(C) in the Fortran module, it is too late for OpenMP, as we
> have to keep the *_ entrypoints for compatibility anyway, but
> for OpenACC and new OpenMP functions supposedly you could avoid
> exporting all the *_ wrappers and use * directly.

I enlisted Jim Norris's help with this -- I might not understand the
issues with the Fortran bindings fully (discussed in another thread),
but it seems like the interfaces that only accept/return scalars at
least can use the "non-decorated" C function directly. That removes
some of the underscore-suffixed symbols, at least.

I've renamed several other variables that used unnecessary capitals,
and standardized on "goacc_" for internal interfaces, where it seemed
appropriate. I also moved some of the host-specific parts of
plugin/plugin-host.c into oacc-host.c.

Thanks,

Julian

[-- Attachment #2: 0001-OpenACC-support-for-libgomp-4.diff --]
[-- Type: text/x-patch, Size: 240208 bytes --]

commit 48ae7eecfbca988d1bd85e28d2ee52bb2ebb7e27
Author: Julian Brown <julian@codesourcery.com>
Date:   Thu Nov 13 04:21:00 2014 -0800

    OpenACC support for libgomp.
    
    xxxx-xx-xx  Nathan Sidwell  <nathan@codesourcery.com>
    	    James Norris  <jnorris@codesourcery.com>
    	    Thomas Schwinge  <thomas@codesourcery.com>
    	    Tom de Vries  <tom@codesourcery.com>
    	    Julian Brown  <julian@codesourcery.com>
    	    Bernd Schmidt  <bernds@codesourcery.com>
    	    Cesar Philippidis  <cesar@codesourcery.com>
    
        include/
        * gomp-constants.h: New file.
    
        libgomp/
        * Makefile.am (search_path): Search in $(top_srcdir)/../include also.
        (libgomp_la_SOURCES): Add oacc-parallel.c, splay-tree.c,
        oacc-fortran.c, oacc-host.c, oacc-init.c, oacc-mem.c,
        oacc-async.c, oacc-plugin.c, oacc-cuda.c, libgomp-plugin.c.
        (Makefrag.am): Include.
        (libgomp_la_SOURCES): Add openacc.f90 if USE_FORTRAN is true.
        (nodist_libsubinclude_HEADERS): Add openacc.h, ../include/gomp-constants.h.
        (nodist_finclude_HEADERS): Add openacc_lib.h, openacc.f90, openacc.mod,
        openacc_kinds.mod.
        * configure.ac (plugin_support): Add check for accelerators if attempting
        to build plugins.
        (plugin/configfrag.ac): Include.
        (offload_targets): Add host_nonshm target by default, nvptx target
        conditionally if the corresponding offload target is enabled.
        (testsuite/libgomp-test-support.exp): Add to AC_CONFIG_FILES.
        * env.c (libgomp_target.h, oacc-int.h): Include.
        (goacc_notify_var, goacc_device_num, goacc_device_type): New globals.
        (goacc_parse_device_type): New function.
        (initialize_env): Parse GOACC_NOTIFY, ACC_DEVICE_TYPE, ACC_DEVICE_NUM
        environment variables. Call goacc_runtime_initialize.
        * error.c (gomp_verror): Make global.
        (gomp_vfatal, gomp_vnotify, gomp_notify): New functions.
        (gomp_fatal): Use gomp_vfatal instead of gomp_verror.
        * libgomp.h (stdarg.h): Include.
        (struct gomp_memory_mapping): Forward declaration.
        (goacc_notify_var, goacc_device_num, goacc_device_type): Add extern
        declarations.
        (gomp_vnotify, gomp_notify, gomp_verror, gomp_vfatal): Add
        prototypes.
        (gomp_init_targets_once): Add prototype.
        * libgomp.map (OACC_2.0): New symbol version. Add public acc_*
        interface functions.
        (PLUGIN_1.0): New symbol version. Add gomp plugin interface functions.
        * libgomp_g.h (GOACC_data_start, GOACC_data_end, GOACC_kernels)
        (GOACC_parallel, GOACC_wait): Add prototypes.
        * libgomp_target.h (gomp-constants.h, splay-tree.h): Include.
        (offload_target_type): Set enumeration values from constants in
        gomp-constants.h. Add OFFLOAD_TARGET_TYPE_HOST_NONSHM and
        OFFLOAD_TARGET_TYPE_NVIDIA_PTX.
        (struct target_mem_desc): Move to here.
        (TARGET_CAP_SHARED_MEM, TARGET_CAP_NATIVE_EXEC, TARGET_CAP_OPENMP_400)
        (TARGET_CAP_OPENACC_200): Define macros.
        (struct gomp_memory_mapping): New.
        (struct ACC_dispatch_t): New.
        (struct gomp_device_descr): Move here. Add offload_regions_registered,
        openacc dispatch functions, target_data.
        (gomp_map_vars, gomp_copy_from_async, gomp_unmap_vars, gomp_init_device)
        (gomp_init_tables, gomp_fini_device, gomp_free_memmap): Add prototypes.
        * target.c (oacc-plugin.h, gomp-constants.h, oacc-int.h, stdio.h)
        (assert.h): Include.
        (splay_tree_node, splay_tree, splay_tree_key, target_mem_desc)
        (splay_tree_key_s, gomp_device_descr): Don't declare here.
        (splay_compare): Change linkage to hidden not static.
        (gomp_init_targets_once): New function.
        (gomp_get_num_devices): Use above.
        (get_kind): New function.
        (gomp_map_vars): Add is_openacc parameter. Change KINDS to void *. Use lock
        from memory map not device. Use macros from gomp-constants.h instead of
        hard-coded values. Support OpenACC-specific mappings.
        (gomp_copy_from_async): New function.
        (gomp_unmap_vars): Add DO_COPYFROM argument. Only copy memory
        back from device if it is true. Use lock from memory map not
        device.
        (gomp_update): Add is_openacc parameter. Use lock from memory map not
        device. Use macros from gomp-constants.h instead of hard-coded values.
        (gomp_register_image_for_device): Add forward declaration.
        (GOMP_offload_register): Check realloc result.
        (gomp_init_device): Change linkage to hidden not static.
        (gomp_init_tables, gomp_init_dev_tables, gomp_free_memmap)
        (gomp_fini_device): New functions.
        (GOMP_target): Adjust lazy initialization, check target
        capabilities for OpenMP 4.0 support. Update call to gomp_map_vars,
        gomp_unmap_vars.
        (GOMP_target_data): Adjust lazy initialization. Update call to
        gomp_map_vars.
        (GOMP_target_end_data): Update call to gomp_unmap_vars.
        (GOMP_target_update): Tweak lazy initialization. Add new args to
        gomp_update call.
        (gomp_load_plugin_for_device): Initialize get_name, get_caps, device_fini
        and OpenACC-specific plugin hooks.
        (gomp_register_images_for_device): Rename to...
        (gomp_register_image_for_device): This, and register a single
        device only, and only if it has not already had images
        registered.
        (gomp_find_available_plugins): Initialize OpenACC-specific bits, offload
        image registration, and other new device member data. Prefer device with
        TARGET_CAP_OPENMP_400 if more than one plugin is available.
        * libgomp-plugin.c: New file.
        * libgomp-plugin.h: New file.
        * oacc-async.c: New file.
        * oacc-cuda.c: New file.
        * oacc-fortran.c: New file.
        * oacc-host.c: New file.
        * oacc-init.c: New file.
        * oacc-int.h: New file.
        * oacc-mem.c: New file.
        * oacc-parallel.c: New file.
        * oacc-plugin.c: New file.
        * oacc-plugin.h: New file.
        * openacc.f90: New file.
        * openacc.h: New file.
        * openacc_lib.h: New file.
        * splay-tree.h: Move bulk of implementation to...
        * splay-tree.c: New file.
        * Makefile.in: Regenerate.
        * config.h.in: Regenerate.
        * configure: Regenerate.
        * plugin/Makefrag.am: New file.
        * plugin/configfrag.ac: New file.
        * plugin/plugin-host.c: New file.
        * plugin/plugin-nvptx.c: New file.
        * testsuite/libgomp-test-support.exp.in: New file.
    
    add --enable-libgomp-verbose to compile-time disable notify calls
    
    __builtin_expect for gomp_notify, when enabled

diff --git a/include/gomp-constants.h b/include/gomp-constants.h
new file mode 100644
index 0000000..7ef5c88
--- /dev/null
+++ b/include/gomp-constants.h
@@ -0,0 +1,45 @@
+#ifndef GOMP_CONSTANTS_H
+#define GOMP_CONSTANTS_H 1
+
+/* Enumerated variable mapping types used to communicate between GCC and
+   libgomp.  These values are used for both OpenMP and OpenACC.  */
+
+#define GOMP_MAP_ALLOC			0x00
+#define GOMP_MAP_ALLOC_TO		0x01
+#define GOMP_MAP_ALLOC_FROM		0x02
+#define GOMP_MAP_ALLOC_TOFROM		0x03
+#define GOMP_MAP_POINTER		0x04
+#define GOMP_MAP_TO_PSET		0x05
+#define GOMP_MAP_FORCE_ALLOC		0x08
+#define GOMP_MAP_FORCE_TO		0x09
+#define GOMP_MAP_FORCE_FROM		0x0a
+#define GOMP_MAP_FORCE_TOFROM		0x0b
+#define GOMP_MAP_FORCE_PRESENT		0x0c
+#define GOMP_MAP_FORCE_DEALLOC		0x0d
+#define GOMP_MAP_FORCE_DEVICEPTR	0x0e
+#define GOMP_MAP_FORCE_PRIVATE		0x18
+#define GOMP_MAP_FORCE_FIRSTPRIVATE	0x19
+
+#define GOMP_MAP_COPYTO_P(X) \
+  ((X) == GOMP_MAP_ALLOC_TO || (X) == GOMP_MAP_FORCE_TO)
+
+#define GOMP_MAP_COPYFROM_P(X) \
+  ((X) == GOMP_MAP_ALLOC_FROM || (X) == GOMP_MAP_FORCE_FROM)
+
+#define GOMP_MAP_TOFROM_P(X) \
+  ((X) == GOMP_MAP_ALLOC_TOFROM || (X) == GOMP_MAP_FORCE_TOFROM)
+
+#define GOMP_MAP_POINTER_P(X) \
+  ((X) == GOMP_MAP_POINTER)
+
+#define GOMP_IF_CLAUSE_FALSE		-2
+
+/* Canonical list of target type codes for OpenMP/OpenACC.  */
+#define GOMP_TARGET_NONE		0
+#define GOMP_TARGET_HOST		2
+#define GOMP_TARGET_HOST_NONSHM		3
+#define GOMP_TARGET_NOT_HOST		4
+#define GOMP_TARGET_NVIDIA_PTX		5
+#define GOMP_TARGET_INTEL_MIC		6
+
+#endif
diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index 427415e..f48c1ff 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -7,7 +7,8 @@ SUBDIRS = testsuite
 gcc_version := $(shell cat $(top_srcdir)/../gcc/BASE-VER)
 
 config_path = @config_path@
-search_path = $(addprefix $(top_srcdir)/config/, $(config_path)) $(top_srcdir)
+search_path = $(addprefix $(top_srcdir)/config/, $(config_path)) $(top_srcdir) \
+	      $(top_srcdir)/../include
 
 fincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/finclude
 libsubincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include
@@ -18,6 +19,10 @@ AM_CPPFLAGS = $(addprefix -I, $(search_path))
 AM_CFLAGS = $(XCFLAGS)
 AM_LDFLAGS = $(XLDFLAGS) $(SECTION_LDFLAGS) $(OPT_LDFLAGS)
 
+if LIBGOMP_VERBOSE
+AM_CPPFLAGS += -DLIBGOMP_VERBOSE
+endif
+
 toolexeclib_LTLIBRARIES = libgomp.la
 nodist_toolexeclib_HEADERS = libgomp.spec
 
@@ -60,12 +65,21 @@ libgomp_la_LINK = $(LINK) $(libgomp_la_LDFLAGS)
 libgomp_la_SOURCES = alloc.c barrier.c critical.c env.c error.c iter.c \
 	iter_ull.c loop.c loop_ull.c ordered.c parallel.c sections.c single.c \
 	task.c team.c work.c lock.c mutex.c proc.c sem.c bar.c ptrlock.c \
-	time.c fortran.c affinity.c target.c
+	time.c fortran.c affinity.c target.c oacc-parallel.c splay-tree.c \
+	oacc-host.c oacc-init.c oacc-mem.c oacc-async.c oacc-plugin.c \
+	oacc-cuda.c libgomp-plugin.c
+
+include $(top_srcdir)/plugin/Makefrag.am
+
+if USE_FORTRAN
+libgomp_la_SOURCES += openacc.f90
+endif
 
 nodist_noinst_HEADERS = libgomp_f.h
-nodist_libsubinclude_HEADERS = omp.h
+nodist_libsubinclude_HEADERS = omp.h openacc.h ../include/gomp-constants.h
 if USE_FORTRAN
-nodist_finclude_HEADERS = omp_lib.h omp_lib.f90 omp_lib.mod omp_lib_kinds.mod
+nodist_finclude_HEADERS = omp_lib.h omp_lib.f90 omp_lib.mod omp_lib_kinds.mod \
+	openacc_lib.h openacc.f90 openacc.mod openacc_kinds.mod
 endif
 
 LTLDFLAGS = $(shell $(SHELL) $(top_srcdir)/../libtool-ldflags $(LDFLAGS))
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 8e4774f..d2a803a 100644
diff --git a/libgomp/config.h.in b/libgomp/config.h.in
index 94a2b3b..309962d 100644
diff --git a/libgomp/configure b/libgomp/configure
index 19f36c6..83a6a11 100755
diff --git a/libgomp/configure.ac b/libgomp/configure.ac
index cea6366..68bcb27 100644
--- a/libgomp/configure.ac
+++ b/libgomp/configure.ac
@@ -2,7 +2,7 @@
 # aclocal -I ../config && autoconf && autoheader && automake
 
 AC_PREREQ(2.64)
-AC_INIT([GNU OpenMP Runtime Library], 1.0,,[libgomp])
+AC_INIT([GNU Offloading and Multi Processing Runtime Library], 1.0,,[libgomp])
 AC_CONFIG_HEADER(config.h)
 
 # -------
@@ -28,7 +28,6 @@ LIBGOMP_ENABLE(generated-files-in-srcdir, no, ,
 AC_MSG_RESULT($enable_generated_files_in_srcdir)
 AM_CONDITIONAL(GENINSRC, test "$enable_generated_files_in_srcdir" = yes)
 
-
 # -------
 # -------
 
@@ -193,13 +192,28 @@ AC_LINK_IFELSE(
    [],
    [AC_MSG_ERROR([Pthreads are required to build libgomp])])])
 
+# Enable --enable-libgomp-verbose
+AC_ARG_ENABLE(libgomp-verbose,
+[AS_HELP_STRING([--enable-libgomp-verbose],
+                [enable verbose debugging output for libgomp])],
+[case "${enableval}" in
+  yes) libgomp_verbose=true ;;
+  no) libgomp_verbose=false ;;
+  *) AC_MSG_ERROR([bad value ${enableval} for --enable-libgomp-verbose]) ;;
+esac], [libgomp_verbose=false])
+AM_CONDITIONAL([LIBGOMP_VERBOSE], [test x$libgomp_verbose = xtrue])
+
 plugin_support=yes
 AC_CHECK_LIB(dl, dlsym, , [plugin_support=no])
 if test x"$plugin_support" = xyes; then
   AC_DEFINE(PLUGIN_SUPPORT, 1,
     [Define if all infrastructure, needed for plugins, is supported.])
+elif test "x$enable_accelerator" != xno; then
+  AC_MSG_ERROR([Can't have support for accelerators without support for plugins])
 fi
 
+m4_include([plugin/configfrag.ac])
+
 # Check for functions needed.
 AC_CHECK_FUNCS(getloadavg clock_gettime strtoull)
 
@@ -283,7 +297,7 @@ fi
 # Get accel target and path to install tree of accel compiler
 offload_additional_options=
 offload_additional_lib_paths=
-offload_targets=
+offload_targets=host_nonshm
 if test x"$enable_offload_targets" != x; then
   for tgt in `echo $enable_offload_targets | sed -e 's#,# #g'`; do
     tgt_dir=`echo $tgt | grep '=' | sed 's/.*=//'`
@@ -291,6 +305,8 @@ if test x"$enable_offload_targets" != x; then
     case $tgt in
       *-intelmic-* | *-intelmicemul-*)
 	tgt_name="intelmic" ;;
+      nvptx-*)
+	tgt_name="nvptx" ;;
       *)
 	AC_MSG_ERROR([unknown offload target specified]) ;;
     esac
@@ -388,4 +404,5 @@ CFLAGS="$save_CFLAGS"
 
 AC_CONFIG_FILES(omp.h omp_lib.h omp_lib.f90 libgomp_f.h)
 AC_CONFIG_FILES(Makefile testsuite/Makefile libgomp.spec)
+AC_CONFIG_FILES([testsuite/libgomp-test-support.exp])
 AC_OUTPUT
diff --git a/libgomp/env.c b/libgomp/env.c
index 94c72a3..7e32eb7 100644
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -27,6 +27,8 @@
 
 #include "libgomp.h"
 #include "libgomp_f.h"
+#include "libgomp_target.h"
+#include "oacc-int.h"
 #include <ctype.h>
 #include <stdlib.h>
 #include <stdio.h>
@@ -77,6 +79,10 @@ unsigned long gomp_bind_var_list_len;
 void **gomp_places_list;
 unsigned long gomp_places_list_len;
 
+int goacc_notify_var;
+int goacc_device_num;
+char* goacc_device_type;
+
 /* Parse the OMP_SCHEDULE environment variable.  */
 
 static void
@@ -1011,6 +1017,16 @@ parse_affinity (bool ignore)
   return false;
 }
 
+static void
+goacc_parse_device_type (void)
+{
+  const char *env = getenv ("ACC_DEVICE_TYPE");
+  
+  if (env && *env != '\0')
+    goacc_device_type = strdup (env);
+  else
+    goacc_device_type = NULL;
+}
 
 static void
 handle_omp_display_env (unsigned long stacksize, int wait_policy)
@@ -1181,6 +1197,7 @@ initialize_env (void)
       gomp_global_icv.thread_limit_var
 	= thread_limit_var > INT_MAX ? UINT_MAX : thread_limit_var;
     }
+  parse_int ("GOACC_NOTIFY", &goacc_notify_var, true);
 #ifndef HAVE_SYNC_BUILTINS
   gomp_mutex_init (&gomp_managed_threads_lock);
 #endif
@@ -1271,6 +1288,15 @@ initialize_env (void)
     }
 
   handle_omp_display_env (stacksize, wait_policy);
+  
+  /* Look for OpenACC-specific environment variables.  */
+  if (!parse_int ("ACC_DEVICE_NUM", &goacc_device_num, true))
+    goacc_device_num = 0;
+
+  goacc_parse_device_type ();
+
+  /* Initialize OpenACC-specific internal state.  */
+  goacc_runtime_initialize ();
 }
 
 \f
diff --git a/libgomp/error.c b/libgomp/error.c
index d9b28f1..c455f58 100644
--- a/libgomp/error.c
+++ b/libgomp/error.c
@@ -35,7 +35,7 @@
 #include <stdlib.h>
 
 
-static void
+void
 gomp_verror (const char *fmt, va_list list)
 {
   fputs ("\nlibgomp: ", stderr);
@@ -54,13 +54,40 @@ gomp_error (const char *fmt, ...)
 }
 
 void
+gomp_vfatal (const char *fmt, va_list list)
+{
+  gomp_verror (fmt, list);
+  exit (EXIT_FAILURE);
+}
+
+void
 gomp_fatal (const char *fmt, ...)
 {
   va_list list;
 
   va_start (list, fmt);
-  gomp_verror (fmt, list);
+  gomp_vfatal (fmt, list);
   va_end (list);
+}
 
-  exit (EXIT_FAILURE);
+#ifdef LIBGOMP_VERBOSE
+
+#undef gomp_vnotify
+void
+gomp_vnotify (const char *msg, va_list list)
+{
+  if (goacc_notify_var)
+    vfprintf (stderr, msg, list);
+}
+
+#undef gomp_notify
+void
+gomp_notify (const char *msg, ...)
+{
+  va_list list;
+  
+  va_start (list, msg);
+  gomp_vnotify (msg, list);
+  va_end (list);
 }
+#endif
diff --git a/libgomp/libgomp-plugin.c b/libgomp/libgomp-plugin.c
new file mode 100644
index 0000000..f0e35d6
--- /dev/null
+++ b/libgomp/libgomp-plugin.c
@@ -0,0 +1,107 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Exported (non-hidden) functions exposing libgomp interface for plugins.  */
+
+#include <stdlib.h>
+
+#include "libgomp.h"
+#include "libgomp-plugin.h"
+#include "target.h"
+
+void *
+GOMP_PLUGIN_malloc (size_t size)
+{
+  return gomp_malloc (size);
+}
+
+void *
+GOMP_PLUGIN_malloc_cleared (size_t size)
+{
+  return gomp_malloc_cleared (size);
+}
+
+void *
+GOMP_PLUGIN_realloc (void *ptr, size_t size)
+{
+  return gomp_realloc (ptr, size);
+}
+
+void
+GOMP_PLUGIN_error (const char *msg, ...)
+{
+  va_list ap;
+  
+  va_start (ap, msg);
+  gomp_verror (msg, ap);
+  va_end (ap);
+}
+
+void
+GOMP_PLUGIN_notify (const char *msg, ...)
+{
+  va_list ap;
+  
+  va_start (ap, msg);
+  gomp_vnotify (msg, ap);
+  va_end (ap);
+}
+
+void
+GOMP_PLUGIN_fatal (const char *msg, ...)
+{
+  va_list ap;
+  
+  va_start (ap, msg);
+  gomp_vfatal (msg, ap);
+  va_end (ap);
+  
+  /* Unreachable.  */
+  abort ();
+}
+
+void
+GOMP_PLUGIN_mutex_init (gomp_mutex_t *mutex)
+{
+  gomp_mutex_init (mutex);
+}
+
+void
+GOMP_PLUGIN_mutex_destroy (gomp_mutex_t *mutex)
+{
+  gomp_mutex_destroy (mutex);
+}
+
+void
+GOMP_PLUGIN_mutex_lock (gomp_mutex_t *mutex)
+{
+  gomp_mutex_lock (mutex);
+}
+
+void
+GOMP_PLUGIN_mutex_unlock (gomp_mutex_t *mutex)
+{
+  gomp_mutex_unlock (mutex);
+}
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
new file mode 100644
index 0000000..87367e3
--- /dev/null
+++ b/libgomp/libgomp-plugin.h
@@ -0,0 +1,54 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* An interface to various libgomp-internal functions for use by plugins.  */
+
+#ifndef LIBGOMP_PLUGIN_H
+#define LIBGOMP_PLUGIN_H 1
+
+#include "mutex.h"
+
+/* alloc.c */
+
+extern void *GOMP_PLUGIN_malloc (size_t) __attribute__((malloc));
+extern void *GOMP_PLUGIN_malloc_cleared (size_t) __attribute__((malloc));
+extern void *GOMP_PLUGIN_realloc (void *, size_t);
+
+/* error.c */
+
+extern void GOMP_PLUGIN_notify (const char *, ...);
+extern void GOMP_PLUGIN_error (const char *, ...)
+	__attribute__((format (printf, 1, 2)));
+extern void GOMP_PLUGIN_fatal (const char *, ...)
+	__attribute__((noreturn, format (printf, 1, 2)));
+
+/* mutex.c */
+
+extern void GOMP_PLUGIN_mutex_init (gomp_mutex_t *mutex);
+extern void GOMP_PLUGIN_mutex_destroy (gomp_mutex_t *mutex);
+extern void GOMP_PLUGIN_mutex_lock (gomp_mutex_t *mutex);
+extern void GOMP_PLUGIN_mutex_unlock (gomp_mutex_t *mutex);
+
+#endif
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index a1482cc..b86b960 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -40,6 +40,7 @@
 #include <pthread.h>
 #include <stdbool.h>
 #include <stdlib.h>
+#include <stdarg.h>
 
 #ifdef HAVE_ATTRIBUTE_VISIBILITY
 # pragma GCC visibility push(hidden)
@@ -220,6 +221,7 @@ struct gomp_team_state
 };
 
 struct target_mem_desc;
+struct gomp_memory_mapping;
 
 /* These are the OpenMP 4.0 Internal Control Variables described in
    section 2.3.1.  Those described as having one copy per task are
@@ -254,6 +256,10 @@ extern unsigned long gomp_bind_var_list_len;
 extern void **gomp_places_list;
 extern unsigned long gomp_places_list_len;
 
+extern int goacc_notify_var;
+extern int goacc_device_num;
+extern char *goacc_device_type;
+
 enum gomp_task_kind
 {
   GOMP_TASK_IMPLICIT,
@@ -532,8 +538,29 @@ extern void *gomp_realloc (void *, size_t);
 
 /* error.c */
 
+#ifdef LIBGOMP_VERBOSE
+extern void gomp_vnotify (const char *, va_list);
+extern void gomp_notify (const char *msg, ...)
+	__attribute__((format (printf, 1, 2)));
+#define gomp_notify(...) \
+  do { \
+    if (__builtin_expect (goacc_notify_var, 0)) \
+      (gomp_notify) (__VA_ARGS__); \
+  } while (0)
+#define gomp_vnotify(FMT, VALIST) \
+  do { \
+    if (__builtin_expect (goacc_notify_var, 0)) \
+      (gomp_vnotify) ((FMT), (VALIST)); \
+  } while (0)
+#else
+#define gomp_vnotify(FMT, VALIST)
+#define gomp_notify(FMT, ...)
+#endif
+extern void gomp_verror (const char *, va_list);
 extern void gomp_error (const char *, ...)
 	__attribute__((format (printf, 1, 2)));
+extern void gomp_vfatal (const char *, va_list)
+	__attribute__((noreturn));
 extern void gomp_fatal (const char *, ...)
 	__attribute__((noreturn, format (printf, 1, 2)));
 
@@ -606,6 +633,7 @@ extern void gomp_free_thread (void *);
 
 /* target.c */
 
+extern void gomp_init_targets_once (void);
 extern int gomp_get_num_devices (void);
 
 /* work.c */
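When LIBGOMP_VERBOSE is defined, gomp_notify is both a function and a function-like macro with the same name. The macro tests goacc_notify_var first, so the call and the evaluation of its arguments are skipped unless notifications are enabled. Because the macro is function-like, the name is only expanded when it is directly followed by `(`. Writing it as `(gomp_notify)` therefore refers to the real function, which is how a definition of that function can appear after the macro. The reduced sketch below uses the hypothetical names `notify` and `notify_var`, and adds a call counter only so the behaviour can be observed.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

static int notify_var;    /* Stand-in for goacc_notify_var.  */
static int notify_calls;  /* Counts calls that reached the function.  */

/* Same-named macro, as in libgomp.h: only call the function when
   notifications are enabled.  */
#define notify(...) \
  do { \
    if (__builtin_expect (notify_var, 0)) \
      (notify) (__VA_ARGS__); \
  } while (0)

/* The function itself.  The parenthesized name is not followed directly by
   "(", so the preprocessor does not treat it as a macro invocation.  */
static void
(notify) (const char *fmt, ...)
{
  va_list ap;

  notify_calls++;
  va_start (ap, fmt);
  vfprintf (stderr, fmt, ap);
  va_end (ap);
}
```

When LIBGOMP_VERBOSE is not defined, the patch instead defines both macros to expand to nothing, so notification calls cost nothing at all.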
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index f36df23..f6e70e9 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -232,3 +232,98 @@ GOMP_4.0.1 {
   global:
 	GOMP_offload_register;
 } GOMP_4.0;
+
+OACC_2.0 {
+  global:
+	acc_get_num_devices;
+	acc_set_device_type;
+	acc_get_device_type;
+	acc_set_device_num;
+	acc_get_device_num;
+	acc_async_test;
+	acc_async_test_h_;
+	acc_async_test_all;
+	acc_async_test_all_h_;
+	acc_wait;
+	acc_wait_async;
+	acc_wait_all;
+	acc_wait_all_async;
+	acc_init;
+	acc_shutdown;
+	acc_on_device;
+	acc_on_device_h_;
+	acc_malloc;
+	acc_free;
+	acc_copyin;
+	acc_copyin_32_h_;
+	acc_copyin_64_h_;
+	acc_copyin_array_h_;
+	acc_present_or_copyin;
+	acc_present_or_copyin_32_h_;
+	acc_present_or_copyin_64_h_;
+	acc_present_or_copyin_array_h_;
+	acc_create;
+	acc_create_32_h_;
+	acc_create_64_h_;
+	acc_create_array_h_;
+	acc_present_or_create;
+	acc_present_or_create_32_h_;
+	acc_present_or_create_64_h_;
+	acc_present_or_create_array_h_;
+	acc_copyout;
+	acc_copyout_32_h_;
+	acc_copyout_64_h_;
+	acc_copyout_array_h_;
+	acc_delete;
+	acc_delete_32_h_;
+	acc_delete_64_h_;
+	acc_delete_array_h_;
+	acc_update_device;
+	acc_update_device_32_h_;
+	acc_update_device_64_h_;
+	acc_update_device_array_h_;
+	acc_update_self;
+	acc_update_self_32_h_;
+	acc_update_self_64_h_;
+	acc_update_self_array_h_;
+	acc_map_data;
+	acc_unmap_data;
+	acc_deviceptr;
+	acc_hostptr;
+	acc_is_present;
+	acc_is_present_32_h_;
+	acc_is_present_64_h_;
+	acc_is_present_array_h_;
+	acc_memcpy_to_device;
+	acc_memcpy_from_device;
+	acc_get_current_cuda_device;
+	acc_get_current_cuda_context;
+	acc_get_cuda_stream;
+	acc_set_cuda_stream;
+};
+
+GOACC_2.0 {
+  global:
+	GOACC_data_end;
+	GOACC_data_start;
+	GOACC_kernels;
+	GOACC_parallel;
+	GOACC_update;
+	GOACC_wait;
+};
+
+GOMP_PLUGIN_1.0 {
+  global:
+	GOMP_PLUGIN_malloc;
+	GOMP_PLUGIN_malloc_cleared;
+	GOMP_PLUGIN_realloc;
+	GOMP_PLUGIN_error;
+	GOMP_PLUGIN_notify;
+	GOMP_PLUGIN_fatal;
+	GOMP_PLUGIN_mutex_init;
+	GOMP_PLUGIN_mutex_destroy;
+	GOMP_PLUGIN_mutex_lock;
+	GOMP_PLUGIN_mutex_unlock;
+	GOMP_PLUGIN_async_unmap_vars;
+	GOMP_PLUGIN_acc_thread;
+};
diff --git a/libgomp/libgomp_g.h b/libgomp/libgomp_g.h
index be0c6ea..44f200c 100644
--- a/libgomp/libgomp_g.h
+++ b/libgomp/libgomp_g.h
@@ -214,4 +214,17 @@ extern void GOMP_target_update (int, const void *,
 				size_t, void **, size_t *, unsigned char *);
 extern void GOMP_teams (unsigned int, unsigned int);
 
+/* oacc-parallel.c */
+
+extern void GOACC_data_start (int, const void *,
+			      size_t, void **, size_t *, unsigned short *);
+extern void GOACC_data_end (void);
+extern void GOACC_kernels (int, void (*) (void *), const void *,
+			   size_t, void **, size_t *, unsigned short *,
+			   int, int, int, int, int, ...);
+extern void GOACC_parallel (int, void (*) (void *), const void *,
+			    size_t, void **, size_t *, unsigned short *,
+			    int, int, int, int, int, ...);
+extern void GOACC_wait (int, int, ...);
+
 #endif /* LIBGOMP_G_H */
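The GOACC_kernels, GOACC_parallel and GOACC_wait prototypes leave their parameters unnamed. The sketch below assumes that the last named `int` in each prototype is a count of the async-queue ids passed in the variadic tail. Under that assumption, a callee would consume the tail roughly as shown. `collect_waits` is a hypothetical helper and is not part of the patch.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: read NUM_WAITS ids from the variadic tail, store up
   to MAX of them in OUT, and return how many were stored.  */
static int
collect_waits (int *out, int max, int num_waits, ...)
{
  va_list ap;
  int i, n = 0;

  va_start (ap, num_waits);
  for (i = 0; i < num_waits; i++)
    {
      int id = va_arg (ap, int);	/* Each id is passed as an int.  */

      if (n < max)
	out[n++] = id;
    }
  va_end (ap);
  return n;
}
```

All `num_waits` arguments are consumed even when `out` is full, so the `va_list` is always walked consistently with what the caller pushed.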
diff --git a/libgomp/libgomp_target.h b/libgomp/libgomp_target.h
index f7d19d0..679368a 100644
--- a/libgomp/libgomp_target.h
+++ b/libgomp/libgomp_target.h
@@ -24,11 +24,15 @@
 #ifndef LIBGOMP_TARGET_H
 #define LIBGOMP_TARGET_H 1
 
-/* Type of offload target device.  */
+#include "gomp-constants.h"
+
+/* Type of offload target device.  Keep in sync with openacc.h:acc_device_t.  */
 enum offload_target_type
 {
-  OFFLOAD_TARGET_TYPE_HOST,
-  OFFLOAD_TARGET_TYPE_INTEL_MIC
+  OFFLOAD_TARGET_TYPE_HOST = GOMP_TARGET_HOST,
+  OFFLOAD_TARGET_TYPE_HOST_NONSHM = GOMP_TARGET_HOST_NONSHM,
+  OFFLOAD_TARGET_TYPE_NVIDIA_PTX = GOMP_TARGET_NVIDIA_PTX,
+  OFFLOAD_TARGET_TYPE_INTEL_MIC = GOMP_TARGET_INTEL_MIC
 };
 
 /* Auxiliary struct, used for transferring a host-target address range mapping
@@ -41,4 +45,177 @@ struct mapping_table
   uintptr_t tgt_end;
 };
 
+#include "splay-tree.h"
+
+struct target_mem_desc {
+  /* Reference count.  */
+  uintptr_t refcount;
+  /* All the splay nodes allocated together.  */
+  splay_tree_node array;
+  /* Start of the target region.  */
+  uintptr_t tgt_start;
+  /* End of the target region.  */
+  uintptr_t tgt_end;
+  /* Handle to free.  */
+  void *to_free;
+  /* Previous target_mem_desc.  */
+  struct target_mem_desc *prev;
+  /* Number of items in following list.  */
+  size_t list_count;
+
+  /* Corresponding target device descriptor.  */
+  struct gomp_device_descr *device_descr;
+  
+  /* Memory mapping info for the thread that created this descriptor.  */
+  struct gomp_memory_mapping *mem_map;
+
+  /* List of splay keys to remove (or decrease refcount)
+     at the end of region.  */
+  splay_tree_key list[];
+};
+
+#define TARGET_CAP_SHARED_MEM	1
+#define TARGET_CAP_NATIVE_EXEC	2
+#define TARGET_CAP_OPENMP_400	4
+#define TARGET_CAP_OPENACC_200	8
+
+/* Information about mapped memory regions (per device/context).  */
+
+struct gomp_memory_mapping
+{
+  /* Splay tree containing information about mapped memory regions.  */
+  struct splay_tree_s splay_tree;
+
+  /* Mutex for operating with the splay tree and other shared structures.  */
+  gomp_mutex_t lock;
+  
+  /* True when tables have been added to this memory map.  */
+  bool is_initialized;
+};
+
+typedef struct acc_dispatch_t
+{
+  /* This is a linked list of data mapped using the
+     acc_map_data/acc_unmap_data or "acc enter data"/"acc exit data" pragmas
+     (TODO).  Unlike mapped_data in the goacc_thread struct, unmapping can
+     happen out-of-order with respect to mapping.  */
+  struct target_mem_desc *data_environ;
+
+  /* Open or close a device instance.  */
+  void *(*open_device_func) (int n);
+  int (*close_device_func) (void *h);
+
+  /* Set or get the device number.  */
+  int (*get_device_num_func) (void);
+  void (*set_device_num_func) (int);
+
+  /* Execute.  */
+  void (*exec_func) (void (*) (void *), size_t, void **, void **, size_t *,
+		     unsigned short *, int, int, int, int, void *);
+
+  /* Async cleanup callback registration.  */
+  void (*register_async_cleanup_func) (void *);
+
+  /* Asynchronous routines.  */
+  int (*async_test_func) (int);
+  int (*async_test_all_func) (void);
+  void (*async_wait_func) (int);
+  void (*async_wait_async_func) (int, int);
+  void (*async_wait_all_func) (void);
+  void (*async_wait_all_async_func) (int);
+  void (*async_set_async_func) (int);
+
+  /* Create/destroy TLS data.  */
+  void *(*create_thread_data_func) (void *);
+  void (*destroy_thread_data_func) (void *);
+
+  /* NVIDIA target specific routines.  */
+  struct {
+    void *(*get_current_device_func) (void);
+    void *(*get_current_context_func) (void);
+    void *(*get_stream_func) (int);
+    int (*set_stream_func) (int, void *);
+  } cuda;
+} acc_dispatch_t;
+
+/* This structure describes an accelerator device.  It contains the name of
+   the corresponding libgomp plugin, function handlers for interaction with
+   the device, the ID number of the device, and information about mapped
+   memory.  */
+struct gomp_device_descr
+{
+  /* The name of the device.  */
+  const char *name;
+
+  /* Capabilities of the device (a mask of TARGET_CAP_* bits).  */
+  unsigned int capabilities;
+
+  /* The ID number of the device.  It can be specified in the DEVICE clause
+     of a TARGET construct.  */
+  int id;
+
+  /* The ID number of the device among devices of the same type.  */
+  int target_id;
+
+  /* The type of the device.  */
+  enum offload_target_type type;
+
+  /* Set to true when the device is initialized.  */
+  bool is_initialized;
+  
+  /* True when offload regions have been registered with this device.  */
+  bool offload_regions_registered;
+
+  /* Plugin file handler.  */
+  void *plugin_handle;
+
+  /* Function handlers.  */
+  const char *(*get_name_func) (void);
+  unsigned int (*get_caps_func) (void);
+  int (*get_type_func) (void);
+  int (*get_num_devices_func) (void);
+  void (*register_image_func) (void *, void *);
+  void (*init_device_func) (int);
+  void (*fini_device_func) (int);
+  int (*get_table_func) (int, struct mapping_table **);
+  void *(*alloc_func) (int, size_t);
+  void (*free_func) (int, void *);
+  void *(*dev2host_func) (int, void *, const void *, size_t);
+  void *(*host2dev_func) (int, void *, const void *, size_t);
+  void (*run_func) (int, void *, void *);
+
+  /* OpenACC-specific functions.  */
+  acc_dispatch_t openacc;
+  
+  /* Memory-mapping info for this device instance.  */
+  struct gomp_memory_mapping mem_map;
+
+  /* Extra information required for a device instance by a given target.  */
+  void *target_data;
+};
+
+extern struct target_mem_desc *
+gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
+	       void **hostaddrs, void **devaddrs, size_t *sizes, void *kinds,
+	       bool is_openacc, bool is_target);
+
+extern void
+gomp_copy_from_async (struct target_mem_desc *tgt);
+
+extern void
+gomp_unmap_vars (struct target_mem_desc *tgt, bool);
+
+extern attribute_hidden void
+gomp_init_device (struct gomp_device_descr *devicep);
+
+extern attribute_hidden void
+gomp_init_tables (const struct gomp_device_descr *devicep,
+		  struct gomp_memory_mapping *mm);
+
+extern attribute_hidden void
+gomp_fini_device (struct gomp_device_descr *devicep);
+
+extern attribute_hidden void
+gomp_free_memmap (struct gomp_device_descr *devicep);
+
 #endif /* LIBGOMP_TARGET_H */
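The TARGET_CAP_* values are distinct powers of two. A device's `capabilities` word can therefore carry any combination of them, and a caller can test for a required set with a mask. The following is a short sketch. `caps_include` is a hypothetical helper that the patch does not define. The host capability word is copied from the host_dispatch initializer in oacc-host.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from libgomp_target.h.  */
#define TARGET_CAP_SHARED_MEM	1
#define TARGET_CAP_NATIVE_EXEC	2
#define TARGET_CAP_OPENMP_400	4
#define TARGET_CAP_OPENACC_200	8

/* Hypothetical helper: true if every bit in WANT is set in CAPS.  */
static bool
caps_include (unsigned int caps, unsigned int want)
{
  return (caps & want) == want;
}

/* Capability word of the "host" device descriptor in oacc-host.c.  */
static const unsigned int host_caps
  = TARGET_CAP_OPENACC_200 | TARGET_CAP_NATIVE_EXEC | TARGET_CAP_SHARED_MEM;
```

With these values, `caps_include (host_caps, TARGET_CAP_OPENACC_200)` holds, but `caps_include (host_caps, TARGET_CAP_OPENMP_400)` does not.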
diff --git a/libgomp/oacc-async.c b/libgomp/oacc-async.c
new file mode 100644
index 0000000..94c62d8
--- /dev/null
+++ b/libgomp/oacc-async.c
@@ -0,0 +1,77 @@
+/* OpenACC Runtime Library Definitions.
+
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#include "openacc.h"
+#include "libgomp.h"
+#include "libgomp_target.h"
+#include "oacc-int.h"
+
+int
+acc_async_test (int async)
+{
+  if (async < acc_async_sync)
+    gomp_fatal ("invalid async argument: %d", async);
+
+  return base_dev->openacc.async_test_func (async);
+}
+
+int
+acc_async_test_all (void)
+{
+  return base_dev->openacc.async_test_all_func ();
+}
+
+void
+acc_wait (int async)
+{
+  if (async < acc_async_sync)
+    gomp_fatal ("invalid async argument: %d", async);
+
+  base_dev->openacc.async_wait_func (async);
+}
+
+void
+acc_wait_async (int async1, int async2)
+{
+  base_dev->openacc.async_wait_async_func (async1, async2);
+}
+
+void
+acc_wait_all (void)
+{
+  base_dev->openacc.async_wait_all_func ();
+}
+
+void
+acc_wait_all_async (int async)
+{
+  if (async < acc_async_sync)
+    gomp_fatal ("invalid async argument: %d", async);
+
+  base_dev->openacc.async_wait_all_async_func (async);
+}
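acc_async_test, acc_wait and acc_wait_all_async reject any async value below acc_async_sync. As a result, the special negative values and every non-negative queue id pass through to the device hook. The sketch below assumes the enumerator values used in GCC's openacc.h (acc_async_noval = -1, acc_async_sync = -2). `async_arg_valid` is a hypothetical helper that returns a flag instead of calling gomp_fatal.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to match acc_async_noval and acc_async_sync in openacc.h.  */
enum { ASYNC_NOVAL = -1, ASYNC_SYNC = -2 };

/* Hypothetical helper mirroring the "async < acc_async_sync" check.  */
static bool
async_arg_valid (int async)
{
  return async >= ASYNC_SYNC;
}
```

In this version of the patch, acc_wait_async and acc_async_test_all do not perform this check. acc_wait_async passes both of its arguments to the hook unchecked.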
diff --git a/libgomp/oacc-cuda.c b/libgomp/oacc-cuda.c
new file mode 100644
index 0000000..4d0b284
--- /dev/null
+++ b/libgomp/oacc-cuda.c
@@ -0,0 +1,84 @@
+/* OpenACC Runtime Library: CUDA support glue.
+
+   Copyright (C) 2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "openacc.h"
+#include "config.h"
+#include "libgomp.h"
+#include "libgomp_target.h"
+#include "oacc-int.h"
+
+void *
+acc_get_current_cuda_device (void)
+{
+  void *p = NULL;
+
+  if (base_dev && base_dev->openacc.cuda.get_current_device_func)
+    p = base_dev->openacc.cuda.get_current_device_func ();
+
+  return p;
+}
+
+void *
+acc_get_current_cuda_context (void)
+{
+  void *p = NULL;
+
+  if (base_dev && base_dev->openacc.cuda.get_current_context_func)
+    p = base_dev->openacc.cuda.get_current_context_func ();
+
+  return p;
+}
+
+void *
+acc_get_cuda_stream (int async)
+{
+  void *p = NULL;
+
+  if (async < 0)
+    return p;
+
+  if (base_dev && base_dev->openacc.cuda.get_stream_func)
+    p = base_dev->openacc.cuda.get_stream_func (async);
+
+  return p;
+}
+
+int
+acc_set_cuda_stream (int async, void *stream)
+{
+  int s = -1;
+
+  if (async < 0 || stream == NULL)
+    return 0;
+  
+  goacc_lazy_initialize ();
+
+  if (base_dev && base_dev->openacc.cuda.set_stream_func)
+    s = base_dev->openacc.cuda.set_stream_func (async, stream);
+
+  return s;
+}
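The CUDA-interop entry points call through `base_dev->openacc.cuda` only when both a base device and the specific hook exist. Otherwise they return NULL. Devices that set these hooks to NULL, such as the host, therefore need no stub implementations. The following is a reduced sketch of that guard. `struct cuda_hooks`, `struct dev`, `fake_get_stream` and `get_stream` are hypothetical stand-ins for the real dispatch structures and for acc_get_cuda_stream.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced versions of the dispatch structures.  */
struct cuda_hooks
{
  void *(*get_stream_func) (int);
};

struct dev
{
  struct cuda_hooks cuda;
};

static int fake_stream;

/* A device-provided hook; only a PTX-like device would supply one.  */
static void *
fake_get_stream (int async)
{
  return async >= 0 ? &fake_stream : NULL;
}

/* Mirrors acc_get_cuda_stream: reject negative queue ids, then call the
   hook only if both the device and the hook are present.  */
static void *
get_stream (const struct dev *d, int async)
{
  if (async < 0)
    return NULL;
  if (d && d->cuda.get_stream_func)
    return d->cuda.get_stream_func (async);
  return NULL;
}
```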
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
new file mode 100644
index 0000000..0d94465
--- /dev/null
+++ b/libgomp/oacc-host.c
@@ -0,0 +1,99 @@
+/* OpenACC Runtime Library: acc_device_host, acc_device_host_nonshm.
+
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This shares much of the implementation of the plugin-host.c "host_nonshm"
+   plugin.  */
+#include "plugin/plugin-host.c"
+
+static struct gomp_device_descr host_dispatch =
+  {
+    .name = "host",
+
+    .type = OFFLOAD_TARGET_TYPE_HOST,
+    .capabilities = TARGET_CAP_OPENACC_200 | TARGET_CAP_NATIVE_EXEC
+		    | TARGET_CAP_SHARED_MEM,
+    .id = 0,
+
+    .is_initialized = false,
+    .offload_regions_registered = false,
+
+    .get_name_func = GOMP_OFFLOAD_get_name,
+    .get_type_func = GOMP_OFFLOAD_get_type,
+    .get_caps_func = GOMP_OFFLOAD_get_caps,
+
+    .init_device_func = GOMP_OFFLOAD_init_device,
+    .fini_device_func = GOMP_OFFLOAD_fini_device,
+    .get_num_devices_func = GOMP_OFFLOAD_get_num_devices,
+    .register_image_func = GOMP_OFFLOAD_register_image,
+    .get_table_func = GOMP_OFFLOAD_get_table,
+
+    .alloc_func = GOMP_OFFLOAD_alloc,
+    .free_func = GOMP_OFFLOAD_free,
+    .host2dev_func = GOMP_OFFLOAD_host2dev,
+    .dev2host_func = GOMP_OFFLOAD_dev2host,
+    
+    .run_func = GOMP_OFFLOAD_run,
+
+    .openacc = {
+      .open_device_func = GOMP_OFFLOAD_openacc_open_device,
+      .close_device_func = GOMP_OFFLOAD_openacc_close_device,
+
+      .get_device_num_func = GOMP_OFFLOAD_openacc_get_device_num,
+      .set_device_num_func = GOMP_OFFLOAD_openacc_set_device_num,
+
+      .exec_func = GOMP_OFFLOAD_openacc_parallel,
+
+      .register_async_cleanup_func
+        = GOMP_OFFLOAD_openacc_register_async_cleanup,
+
+      .async_set_async_func = GOMP_OFFLOAD_openacc_async_set_async,
+      .async_test_func = GOMP_OFFLOAD_openacc_async_test,
+      .async_test_all_func = GOMP_OFFLOAD_openacc_async_test_all,
+      .async_wait_func = GOMP_OFFLOAD_openacc_async_wait,
+      .async_wait_async_func = GOMP_OFFLOAD_openacc_async_wait_async,
+      .async_wait_all_func = GOMP_OFFLOAD_openacc_async_wait_all,
+      .async_wait_all_async_func = GOMP_OFFLOAD_openacc_async_wait_all_async,
+
+      .create_thread_data_func = GOMP_OFFLOAD_openacc_create_thread_data,
+      .destroy_thread_data_func = GOMP_OFFLOAD_openacc_destroy_thread_data,
+
+      .cuda = {
+	.get_current_device_func = NULL,
+	.get_current_context_func = NULL,
+	.get_stream_func = NULL,
+	.set_stream_func = NULL,
+      }
+    }
+  };
+
+/* Register this device type.  */
+static __attribute__ ((constructor))
+void goacc_host_init (void)
+{
+  gomp_mutex_init (&host_dispatch.mem_map.lock);
+  goacc_register (&host_dispatch);
+}
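oacc-host.c fills in its gomp_device_descr with designated initializers. It then registers the descriptor from a function marked `__attribute__ ((constructor))`, which GCC arranges to run before main. The host device is therefore already in the dispatch table when user code first calls into the runtime. The cut-down sketch below shows that registration pattern, plus a simplified version of resolve_device's scan for the first non-host type that reports devices, with a fallback to the host. The types, `registry`, `register_dev` and `pick_default` are hypothetical stand-ins for gomp_device_descr, dispatchers, goacc_register and resolve_device.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum dev_type { DEV_HOST, DEV_PTX, DEV_HWM };

struct dev_descr
{
  const char *name;
  enum dev_type type;
  int (*get_num_devices_func) (void);
};

/* Like "dispatchers": one base descriptor per device type.  */
static const struct dev_descr *registry[DEV_HWM];

static void
register_dev (const struct dev_descr *d)
{
  assert (registry[d->type] == NULL);	/* Register each type only once.  */
  registry[d->type] = d;
}

static int
host_num_devices (void)
{
  return 1;
}

/* Designated initializers name each field; unnamed members are zeroed.  */
static const struct dev_descr host_descr = {
  .name = "host",
  .type = DEV_HOST,
  .get_num_devices_func = host_num_devices,
};

/* Runs before main, like goacc_host_init.  */
static __attribute__ ((constructor)) void
host_init (void)
{
  register_dev (&host_descr);
}

/* Simplified resolve_device: first non-host type with at least one
   device, otherwise the host.  */
static const struct dev_descr *
pick_default (void)
{
  int t;

  for (t = DEV_HOST + 1; t < DEV_HWM; t++)
    if (registry[t] && registry[t]->get_num_devices_func () > 0)
      return registry[t];
  return registry[DEV_HOST];
}
```

No PTX descriptor is registered in this sketch, so `pick_default` falls back to the host descriptor.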
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
new file mode 100644
index 0000000..ed5deb3
--- /dev/null
+++ b/libgomp/oacc-init.c
@@ -0,0 +1,613 @@
+/* OpenACC Runtime initialization routines
+
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "libgomp.h"
+#include "libgomp_target.h"
+#include "oacc-int.h"
+#include "openacc.h"
+#include <assert.h>
+#include <stdlib.h>
+#include <strings.h>
+#include <stdbool.h>
+#include <stdio.h>
+
+static gomp_mutex_t acc_device_lock;
+
+/* The dispatch table for the current accelerator device.  This is global, so
+   you can only have one type of device open at any given time in a program. 
+   This is the "base" device in that several devices that use the same
+   dispatch table may be active concurrently: this one (the "zeroth") is used
+   for overall initialisation/shutdown, and other instances -- not necessarily
+   including this one -- may be opened and closed once the base device has
+   been initialized.  */
+struct gomp_device_descr const *base_dev;
+
+#ifdef HAVE_TLS
+__thread struct goacc_thread *goacc_tls_data;
+#else
+pthread_key_t goacc_tls_key;
+#endif
+static pthread_key_t goacc_cleanup_key;
+
+/* The acc_device_t argument with which base_dev was initialized.  */
+static acc_device_t init_key = _ACC_device_hwm;
+
+static struct goacc_thread *goacc_threads;
+static gomp_mutex_t goacc_thread_lock;
+
+/* An array of dispatchers for device types, indexed by the type.  This array
+   only references "base" devices, and other instances of the same type are
+   found by simply indexing from each such device (which are stored linearly,
+   grouped by device in target.c:devices).  */
+static struct gomp_device_descr const *dispatchers[_ACC_device_hwm] = { 0 };
+
+attribute_hidden void
+goacc_register (struct gomp_device_descr const *disp)
+{
+  /* Only register the 0th device here.  */
+  if (disp->target_id != 0)
+    return;
+
+  gomp_mutex_lock (&acc_device_lock);
+
+  assert (acc_device_type (disp->type) != acc_device_none
+	  && acc_device_type (disp->type) != acc_device_default
+	  && acc_device_type (disp->type) != acc_device_not_host);
+  assert (!dispatchers[disp->type]);
+  dispatchers[disp->type] = disp;
+
+  gomp_mutex_unlock (&acc_device_lock);
+}
+
+static struct gomp_device_descr const *
+resolve_device (acc_device_t d)
+{
+  acc_device_t d_arg = d;
+
+  switch (d)
+    {
+    case acc_device_default:
+      {
+	if (goacc_device_type)
+	  {
+	    /* Lookup the named device.  */
+	    while (++d != _ACC_device_hwm)
+	      if (dispatchers[d]
+		  && !strcasecmp (goacc_device_type, dispatchers[d]->name)
+		  && dispatchers[d]->get_num_devices_func () > 0)
+		goto found;
+
+	    gomp_fatal ("device type %s not supported", goacc_device_type);
+	  }
+
+	/* No default device specified, so start scanning for any non-host
+	   device that is available.  */
+	d = acc_device_not_host;
+      }
+      /* FALLTHROUGH */
+
+    case acc_device_not_host:
+      /* Find the first available device after acc_device_not_host.  */
+      while (++d != _ACC_device_hwm)
+	if (dispatchers[d] && dispatchers[d]->get_num_devices_func () > 0)
+	  goto found;
+      if (d_arg == acc_device_default)
+	{	  
+	  d = acc_device_host;
+	  goto found;
+	}
+      gomp_fatal ("no device found");
+      break;
+
+    case acc_device_host:
+      break;
+
+    default:
+      if (d >= _ACC_device_hwm)
+	gomp_fatal ("device %u out of range", (unsigned)d);
+      break;
+    }
+ found:
+
+  assert (d != acc_device_none
+	  && d != acc_device_default
+	  && d != acc_device_not_host);
+
+  return dispatchers[d];
+}
+
+/* This is called when plugins have been initialized, and serves to call
+   (indirectly) the target's device_init hook.  Calling multiple times without
+   an intervening acc_shutdown_1 call is an error.  */
+
+static struct gomp_device_descr const *
+acc_init_1 (acc_device_t d)
+{
+  struct gomp_device_descr const *acc_dev;
+
+  acc_dev = resolve_device (d);
+
+  if (!acc_dev || acc_dev->get_num_devices_func () <= 0)
+    gomp_fatal ("device %u not supported", (unsigned)d);
+
+  if (acc_dev->is_initialized)
+    gomp_fatal ("device already active");
+
+  /* Remember what we were initialized as, for checking at shutdown.  */
+  init_key = d;
+
+  gomp_init_device ((struct gomp_device_descr *) acc_dev);
+
+  return acc_dev;
+}
+
+static struct goacc_thread *
+goacc_new_thread (void)
+{
+  struct goacc_thread *thr = gomp_malloc (sizeof (struct goacc_thread));
+
+#ifdef HAVE_TLS
+  goacc_tls_data = thr;
+#else
+  pthread_setspecific (goacc_tls_key, thr);
+#endif
+
+  pthread_setspecific (goacc_cleanup_key, thr);
+
+  gomp_mutex_lock (&goacc_thread_lock);
+  thr->next = goacc_threads;
+  goacc_threads = thr;
+  gomp_mutex_unlock (&goacc_thread_lock);
+
+  return thr;
+}
+
+static void
+goacc_destroy_thread (void *data)
+{
+  struct goacc_thread *thr = data, *walk, *prev;
+  
+  gomp_mutex_lock (&goacc_thread_lock);
+  
+  if (thr)
+    {
+      if (base_dev && thr->target_tls)
+	{
+	  base_dev->openacc.destroy_thread_data_func (thr->target_tls);
+	  thr->target_tls = NULL;
+	}
+
+      assert (!thr->mapped_data);
+
+      /* Remove from thread list.  */
+      for (prev = NULL, walk = goacc_threads; walk;
+	   prev = walk, walk = walk->next)
+	if (walk == thr)
+	  {
+	    if (prev == NULL)
+	      goacc_threads = walk->next;
+	    else
+	      prev->next = walk->next;
+
+	    free (thr);
+
+	    break;
+	  }
+
+      assert (walk);
+    }
+
+  gomp_mutex_unlock (&goacc_thread_lock);
+}
+
+/* Open the ORD'th device of the currently-active type (base_dev must be
+   initialised before calling).  If ORD is < 0, open the default-numbered
+   device (set by the ACC_DEVICE_NUM environment variable or a call to
+   acc_set_device_num), or leave any currently-opened device as is.  "Opening"
+   consists of calling the device's open_device_func hook, and setting up
+   thread-local data (maybe allocating, then initializing with information
+   pertaining to the newly-opened or previously-opened device).  */
+
+static void
+lazy_open (int ord)
+{
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev;
+
+  if (thr && thr->dev)
+    {
+      assert (ord < 0 || ord == thr->dev->target_id);
+      return;
+    }
+
+  assert (base_dev);
+
+  if (ord < 0)
+    ord = goacc_device_num;
+
+  if (ord >= base_dev->get_num_devices_func ())
+    gomp_fatal ("device %u does not exist", ord);
+
+  if (!thr)
+    thr = goacc_new_thread ();
+
+  acc_dev = thr->dev = (struct gomp_device_descr *) &base_dev[ord];
+
+  assert (acc_dev->target_id == ord);
+
+  thr->saved_bound_dev = NULL;
+  thr->mapped_data = NULL;
+
+  if (!acc_dev->target_data)
+    acc_dev->target_data = acc_dev->openacc.open_device_func (ord);
+
+  thr->target_tls
+    = acc_dev->openacc.create_thread_data_func (acc_dev->target_data);
+
+  acc_dev->openacc.async_set_async_func (acc_async_sync);
+
+  if (!acc_dev->mem_map.is_initialized)
+    gomp_init_tables (acc_dev, &acc_dev->mem_map);
+}
+
+/* OpenACC 2.0a (3.2.12, 3.2.13) doesn't specify whether the serialization of
+   init/shutdown is per-process or per-thread.  We choose per-process.  */
+
+void
+acc_init (acc_device_t d)
+{
+  if (!base_dev)
+    gomp_init_targets_once ();
+
+  gomp_mutex_lock (&acc_device_lock);
+
+  base_dev = acc_init_1 (d);
+
+  lazy_open (-1);
+
+  gomp_mutex_unlock (&acc_device_lock);
+}
+
+ialias (acc_init)
+
+void
+acc_shutdown_1 (acc_device_t d)
+{
+  struct goacc_thread *walk;
+
+  /* We don't check whether d matches the actual device found, because
+     OpenACC 2.0 (3.2.12) says the parameters to the init and this
+     call must match (for the shutdown call anyway, it's silent on
+     others).  */
+
+  if (!base_dev)
+    gomp_fatal ("no device initialized");
+  if (d != init_key)
+    gomp_fatal ("device %u(%u) is initialized",
+		(unsigned) init_key, (unsigned) base_dev->type);
+
+  gomp_mutex_lock (&goacc_thread_lock);
+
+  /* Free target-specific TLS data and close all devices.  */
+  for (walk = goacc_threads; walk != NULL; walk = walk->next)
+    {
+      if (walk->target_tls)
+	base_dev->openacc.destroy_thread_data_func (walk->target_tls);
+
+      walk->target_tls = NULL;
+
+      /* This would mean the user is shutting down OpenACC in the middle of an
+         "acc data" pragma.  Likely not intentional.  */
+      if (walk->mapped_data)
+	gomp_fatal ("shutdown in 'acc data' region");
+
+      if (walk->dev)
+	{
+          if (walk->dev->openacc.close_device_func (walk->dev->target_data) < 0)
+	    gomp_fatal ("failed to close device");
+
+	  walk->dev->target_data = NULL;
+
+	  gomp_free_memmap (walk->dev);
+
+	  walk->dev = NULL;
+	}
+    }
+
+  gomp_mutex_unlock (&goacc_thread_lock);
+
+  gomp_fini_device ((struct gomp_device_descr *) base_dev);
+
+  base_dev = NULL;
+}
+
+void
+acc_shutdown (acc_device_t d)
+{
+  gomp_mutex_lock (&acc_device_lock);
+
+  acc_shutdown_1 (d);
+
+  gomp_mutex_unlock (&acc_device_lock);
+}
+
+ialias (acc_shutdown)
+
+/* This function is called after plugins have been initialized.  It deals with
+   the "base" device, and is used to prepare the runtime for dealing with a
+   number of such devices (as implemented by some particular plugin).  If the
+   argument device type D matches a previous call to the function, return the
+   current base device, else shut the old device down and re-initialize with
+   the new device type.  */
+
+static struct gomp_device_descr const *
+lazy_init (acc_device_t d)
+{
+  if (base_dev)
+    {
+      /* Re-initializing the same device, do nothing.  */
+      if (d == init_key)
+	return base_dev;
+
+      acc_shutdown_1 (init_key);
+    }
+
+  assert (!base_dev);
+
+  return acc_init_1 (d);
+}
+
+/* Ensure that plugins are loaded, initialize and open the (default-numbered)
+   device.  */
+
+static void
+lazy_init_and_open (acc_device_t d)
+{
+  if (!base_dev)
+    gomp_init_targets_once ();
+
+  gomp_mutex_lock (&acc_device_lock);
+
+  base_dev = lazy_init (d);
+
+  lazy_open (-1);
+
+  gomp_mutex_unlock (&acc_device_lock);
+}
+
+int
+acc_get_num_devices (acc_device_t d)
+{
+  int n = 0;
+  struct gomp_device_descr const *acc_dev;
+
+  if (d == acc_device_none)
+    return 0;
+
+  if (!base_dev)
+    gomp_init_targets_once ();
+
+  acc_dev = resolve_device (d);
+  if (!acc_dev)
+    return 0;
+
+  n = acc_dev->get_num_devices_func ();
+  if (n < 0)
+    n = 0;
+
+  return n;
+}
+
+ialias (acc_get_num_devices)
+
+void
+acc_set_device_type (acc_device_t d)
+{
+  lazy_init_and_open (d);
+}
+
+ialias (acc_set_device_type)
+
+acc_device_t
+acc_get_device_type (void)
+{
+  acc_device_t res = acc_device_none;
+  const struct gomp_device_descr *dev;
+
+  if (base_dev)
+    res = acc_device_type (base_dev->type);
+  else
+    {
+      gomp_init_targets_once ();
+
+      dev = resolve_device (acc_device_default);
+      res = acc_device_type (dev->type);
+    }
+
+  assert (res != acc_device_default
+	  && res != acc_device_not_host);
+
+  return res;
+}
+
+ialias (acc_get_device_type)
+
+int
+acc_get_device_num (acc_device_t d)
+{
+  const struct gomp_device_descr *dev;
+  int num;
+
+  if (d >= _ACC_device_hwm)
+    gomp_fatal ("device %u out of range", (unsigned)d);
+
+  if (!base_dev)
+    gomp_init_targets_once ();
+
+  dev = resolve_device (d);
+  if (!dev)
+    gomp_fatal ("no devices of type %u", (unsigned) d);
+
+  /* We might not have called lazy_open for this host thread yet, in which case
+     the get_device_num_func hook will return -1.  */
+  num = dev->openacc.get_device_num_func ();
+  if (num < 0)
+    num = goacc_device_num;
+  
+  return num;
+}
+
+ialias (acc_get_device_num)
+
+void
+acc_set_device_num (int n, acc_device_t d)
+{
+  const struct gomp_device_descr *dev;
+  int num_devices;
+
+  if (!base_dev)
+    gomp_init_targets_once ();
+  
+  if ((int) d == 0)
+    {
+      int i;
+      
+      /* A device setting of zero sets all device types on the system to use
+         the Nth instance of that device type.  Only attempt it for initialized
+	 devices though.  */
+      for (i = acc_device_not_host + 1; i < _ACC_device_hwm; i++)
+        {
+	  dev = resolve_device ((acc_device_t) i);
+	  if (dev && dev->is_initialized)
+	    dev->openacc.set_device_num_func (n);
+	}
+
+      /* ...and for future calls to acc_init/acc_set_device_type, etc.  */
+      goacc_device_num = n;
+    }
+  else
+    {
+      struct goacc_thread *thr = goacc_thread ();
+
+      gomp_mutex_lock (&acc_device_lock);
+
+      base_dev = lazy_init (d);
+
+      num_devices = base_dev->get_num_devices_func ();
+
+      if (n >= num_devices)
+        gomp_fatal ("device %d out of range", n);
+
+      /* If we're changing the device number, disassociate this thread from
+	 the device (but don't close the device, since it may be in use by
+	 other threads).  */
+      if (thr && thr->dev && n != thr->dev->target_id)
+	thr->dev = NULL;
+
+      lazy_open (n);
+
+      gomp_mutex_unlock (&acc_device_lock);
+    }
+}
+
+ialias (acc_set_device_num)
+
+int
+acc_on_device (acc_device_t dev)
+{
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (thr && thr->dev
+      && acc_device_type (thr->dev->type) == acc_device_host_nonshm)
+    return dev == acc_device_host_nonshm || dev == acc_device_not_host;
+
+  /* Just rely on the compiler builtin.  */
+  return __builtin_acc_on_device (dev);
+}
+ialias (acc_on_device)
+
+attribute_hidden void
+goacc_runtime_initialize (void)
+{
+  gomp_mutex_init (&acc_device_lock);
+
+#ifndef HAVE_TLS
+  pthread_key_create (&goacc_tls_key, NULL);
+#endif
+
+  pthread_key_create (&goacc_cleanup_key, goacc_destroy_thread);
+
+  base_dev = NULL;
+
+  goacc_threads = NULL;
+  gomp_mutex_init (&goacc_thread_lock);
+}
+
+/* Compiler helper functions */
+
+attribute_hidden void
+goacc_save_and_set_bind (acc_device_t d)
+{
+  struct goacc_thread *thr = goacc_thread ();
+
+  assert (!thr->saved_bound_dev);
+
+  thr->saved_bound_dev = thr->dev;
+  thr->dev = (struct gomp_device_descr *) dispatchers[d];
+}
+
+attribute_hidden void
+goacc_restore_bind (void)
+{
+  struct goacc_thread *thr = goacc_thread ();
+
+  thr->dev = thr->saved_bound_dev;
+  thr->saved_bound_dev = NULL;
+}
+
+/* This is called from any OpenACC support function that may need to implicitly
+   initialize the libgomp runtime.  On exit all such initialization will have
+   been done, and both the global base_dev and the per-host-thread device
+   pointer (goacc_thread ()->dev) will be valid.  */
+
+attribute_hidden void
+goacc_lazy_initialize (void)
+{
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (thr && thr->dev)
+    return;
+
+  if (!base_dev)
+    lazy_init_and_open (acc_device_default);
+  else
+    {
+      gomp_mutex_lock (&acc_device_lock);
+      lazy_open (-1);
+      gomp_mutex_unlock (&acc_device_lock);
+    }
+}
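The device-switching rule in lazy_init above (reuse the base device when the requested type matches init_key, otherwise shut the old device down and initialize the new one) can be exercised in isolation. The following is a minimal standalone sketch, not libgomp code: dev_type, struct device, init_1, shutdown_1 and lazy_init_sketch are hypothetical stand-ins for acc_device_t, struct gomp_device_descr, acc_init_1, acc_shutdown_1 and lazy_init, and all locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for acc_device_t and struct gomp_device_descr.  */
typedef enum { DEV_NONE, DEV_HOST, DEV_NVIDIA } dev_type;

struct device { dev_type type; int initialized; };

static struct device devices[3];
static struct device *base;     /* plays the role of base_dev */
static dev_type init_type;      /* plays the role of init_key */
static int init_count, shutdown_count;

/* Stand-in for acc_init_1: bring up device type D and remember it.  */
static struct device *
init_1 (dev_type d)
{
  struct device *dev = &devices[d];
  dev->type = d;
  dev->initialized = 1;
  init_type = d;
  init_count++;
  return dev;
}

/* Stand-in for acc_shutdown_1: only the initialized type may be shut down.  */
static void
shutdown_1 (dev_type d)
{
  assert (base != NULL && d == init_type);
  base->initialized = 0;
  base = NULL;
  shutdown_count++;
}

/* Same decision as lazy_init: reuse on a matching type, else switch.  */
static struct device *
lazy_init_sketch (dev_type d)
{
  if (base)
    {
      if (d == init_type)
        return base;
      shutdown_1 (init_type);
    }
  assert (base == NULL);
  return init_1 (d);
}
```

As in lazy_init_and_open, the caller stores the result back into the global; in the real runtime that assignment happens with acc_device_lock held.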
diff --git a/libgomp/oacc-int.h b/libgomp/oacc-int.h
new file mode 100644
index 0000000..c333a20
--- /dev/null
+++ b/libgomp/oacc-int.h
@@ -0,0 +1,106 @@
+/* OpenACC Runtime - internal declarations
+
+   Copyright (C) 2005-2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file contains data types and function declarations that are not
+   part of the official OpenACC user interface.  There are declarations
+   in here that are part of the GNU OpenACC ABI, in that the compiler is
+   required to know about them and use them.
+
+   The convention is that the all-caps prefix "GOACC" is used to group items
+   that are part of the external ABI, and the lower-case prefix "goacc"
+   is used to group items that are completely private to the library.  */
+
+#ifndef _OACC_INT_H
+#define _OACC_INT_H 1
+
+#include "openacc.h"
+#include "config.h"
+#include <stddef.h>
+#include <stdbool.h>
+#include <stdarg.h>
+
+#ifdef HAVE_ATTRIBUTE_VISIBILITY
+# pragma GCC visibility push(hidden)
+#endif
+
+static inline enum acc_device_t
+acc_device_type (enum offload_target_type type)
+{
+  return (enum acc_device_t) type;
+}
+
+struct goacc_thread
+{
+  /* The device for the current thread.  */
+  struct gomp_device_descr *dev;
+  
+  struct gomp_device_descr *saved_bound_dev;
+
+  /* This is a linked list of data mapped by the "acc data" pragma, following
+     strictly push/pop semantics according to lexical scope.  */
+  struct target_mem_desc *mapped_data;
+    
+  /* These structures form a list: this is the next thread in that list.  */
+  struct goacc_thread *next;
+  
+  /* Target-specific data (used by plugin).  */
+  void *target_tls;
+};
+
+#ifdef HAVE_TLS
+extern __thread struct goacc_thread *goacc_tls_data;
+static inline struct goacc_thread *
+goacc_thread (void)
+{
+  return goacc_tls_data;
+}
+#else
+extern pthread_key_t goacc_tls_key;
+static inline struct goacc_thread *
+goacc_thread (void)
+{
+  return pthread_getspecific (goacc_tls_key);
+}
+#endif
+
+struct gomp_device_descr;
+
+void goacc_register (struct gomp_device_descr const *) __GOACC_NOTHROW;
+
+/* Current dispatcher.  */
+extern struct gomp_device_descr const *base_dev;
+
+void goacc_runtime_initialize (void);
+void goacc_save_and_set_bind (acc_device_t);
+void goacc_restore_bind (void);
+void goacc_lazy_initialize (void);
+
+#ifdef HAVE_ATTRIBUTE_VISIBILITY
+# pragma GCC visibility pop
+#endif
+
+#endif /* _OACC_INT_H */
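goacc_thread () above reads the per-thread record either from a __thread variable (when HAVE_TLS is defined) or, failing that, through a pthread key created in goacc_runtime_initialize. A cut-down sketch of the same split follows; struct thread_rec, current_rec, set_current_rec, rec_runtime_initialize and the USE_PTHREAD_KEY macro are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down per-thread record (cf. struct goacc_thread).  */
struct thread_rec { int dev_id; struct thread_rec *next; };

#ifdef USE_PTHREAD_KEY
#include <pthread.h>
/* Fallback path: no compiler TLS, so use a POSIX thread-specific key.  */
static pthread_key_t rec_key;

static void
rec_runtime_initialize (void)
{
  pthread_key_create (&rec_key, NULL);
}

static struct thread_rec *
current_rec (void)
{
  return pthread_getspecific (rec_key);
}

static void
set_current_rec (struct thread_rec *r)
{
  pthread_setspecific (rec_key, r);
}
#else
/* TLS path: each thread gets its own copy of this pointer.  */
static __thread struct thread_rec *rec_tls;

static void
rec_runtime_initialize (void)
{
  /* Nothing to create when compiler TLS is available.  */
}

static struct thread_rec *
current_rec (void)
{
  return rec_tls;
}

static void
set_current_rec (struct thread_rec *r)
{
  rec_tls = r;
}
#endif
```

Both variants return NULL for a thread that has not yet been set up, which is why callers such as acc_on_device check the result before dereferencing it.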
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
new file mode 100644
index 0000000..ac1ea47
--- /dev/null
+++ b/libgomp/oacc-mem.c
@@ -0,0 +1,510 @@
+/* OpenACC Runtime memory management routines
+
+   Copyright (C) 2013 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "openacc.h"
+#include "config.h"
+#include "libgomp.h"
+#include "gomp-constants.h"
+#include "libgomp_target.h"
+#include "oacc-int.h"
+#include <stdio.h>
+#include <stdint.h>
+#include <assert.h>
+
+#include "splay-tree.h"
+
+/* Return the block containing [H,+S), or NULL if not contained.  */
+
+attribute_hidden splay_tree_key
+lookup_host (struct gomp_memory_mapping *mem_map, void *h, size_t s)
+{
+  struct splay_tree_key_s node;
+  splay_tree_key key;
+
+  node.host_start = (uintptr_t) h;
+  node.host_end = (uintptr_t) h + s;
+
+  gomp_mutex_lock (&mem_map->lock);
+
+  key = splay_tree_lookup (&mem_map->splay_tree, &node);
+
+  gomp_mutex_unlock (&mem_map->lock);
+
+  return key;
+}
+
+/* Return the block containing [D,+S), or NULL if not contained.
+   The list isn't ordered by device address, so we have to iterate
+   over the whole array.  This is not expected to be a common
+   operation.  */
+
+static splay_tree_key
+lookup_dev (struct target_mem_desc *tgt, void *d, size_t s)
+{
+  int i;
+  struct target_mem_desc *t;
+  struct gomp_memory_mapping *mem_map;
+  
+  if (!tgt)
+    return NULL;
+  
+  mem_map = tgt->mem_map;
+
+  gomp_mutex_lock (&mem_map->lock);
+
+  for (t = tgt; t != NULL; t = t->prev)
+    {
+      if (t->tgt_start <= (uintptr_t) d && t->tgt_end >= (uintptr_t) d + s)
+        break;
+    }
+
+  gomp_mutex_unlock (&mem_map->lock);
+
+  if (!t)
+    return NULL;
+
+  for (i = 0; i < t->list_count; i++)
+    {
+      void * offset;
+
+      splay_tree_key k = &t->array[i].key;
+      offset = d - t->tgt_start - k->tgt_offset;
+
+      if (k->host_start + offset <= (void *) k->host_end)
+        return k;
+    }
+ 
+  return NULL;
+}
+
+/* OpenACC is silent on how memory exhaustion is indicated.  We return
+   NULL.  */
+
+void *
+acc_malloc (size_t s)
+{
+  if (!s)
+    return NULL;
+
+  goacc_lazy_initialize ();
+
+  struct goacc_thread *thr = goacc_thread ();
+
+  return base_dev->alloc_func (thr->dev->target_id, s);
+}
+
+/* OpenACC 2.0a (3.2.16) doesn't specify what to do in the event
+   the device address is mapped.  We choose to check if it is mapped,
+   and if it is, to unmap it.  */
+void
+acc_free (void *d)
+{
+  splay_tree_key k;
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!d)
+    return;
+
+  /* We don't have to call lazy open here, as the ptr value must have
+     been returned by acc_malloc.  It's not permitted to pass NULL in
+     (unless you got that null from acc_malloc).  */
+  if ((k = lookup_dev (thr->dev->openacc.data_environ, d, 1)))
+   {
+     void *offset;
+
+     offset = d - k->tgt->tgt_start - k->tgt_offset;
+
+     acc_unmap_data ((void *)(k->host_start + offset));
+   }
+
+  base_dev->free_func (thr->dev->target_id, d);
+}
+
+void
+acc_memcpy_to_device (void *d, void *h, size_t s)
+{
+  /* No need to call lazy open here, as the device pointer must have
+     been obtained from a routine that did that.  */
+  struct goacc_thread *thr = goacc_thread ();
+
+  base_dev->host2dev_func (thr->dev->target_id, d, h, s);
+}
+
+void
+acc_memcpy_from_device (void *h, void *d, size_t s)
+{
+  /* No need to call lazy open here, as the device pointer must have
+     been obtained from a routine that did that.  */
+  struct goacc_thread *thr = goacc_thread ();
+
+  base_dev->dev2host_func (thr->dev->target_id, h, d, s);
+}
+
+/* Return the device pointer that corresponds to host data H, or NULL
+   if H is not mapped.  */
+
+void *
+acc_deviceptr (void *h)
+{
+  splay_tree_key n;
+  void *d;
+  void *offset;
+
+  goacc_lazy_initialize ();
+
+  struct goacc_thread *thr = goacc_thread ();
+
+  n = lookup_host (&thr->dev->mem_map, h, 1);
+
+  if (!n)
+    return NULL;
+
+  offset = h - n->host_start;
+
+  d = n->tgt->tgt_start + n->tgt_offset + offset;
+
+  return d;
+}
+
+/* Return the host pointer that corresponds to device data D, or NULL
+   if D is not mapped.  */
+
+void *
+acc_hostptr (void *d)
+{
+  splay_tree_key n;
+  void *h;
+  void *offset;
+
+  goacc_lazy_initialize ();
+
+  struct goacc_thread *thr = goacc_thread ();
+
+  n = lookup_dev (thr->dev->openacc.data_environ, d, 1);
+
+  if (!n)
+    return NULL;
+
+  offset = d - n->tgt->tgt_start - n->tgt_offset;
+
+  h = n->host_start + offset;
+
+  return h;
+}
+
+/* Return 1 if host data [H,+S] is present on the device.  */
+
+int
+acc_is_present (void *h, size_t s)
+{
+  splay_tree_key n;
+
+  if (!s || !h)
+    return 0;
+
+  goacc_lazy_initialize ();
+
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+
+  n = lookup_host (&acc_dev->mem_map, h, s);
+
+  if (n && ((uintptr_t)h < n->host_start
+	    || (uintptr_t)h + s > n->host_end
+	    || s > n->host_end - n->host_start))
+    n = NULL;
+
+  return n != NULL;
+}
+
+/* Create a mapping for host [H,+S] -> device [D,+S].  */
+
+void
+acc_map_data (void *h, void *d, size_t s)
+{
+  struct target_mem_desc *tgt;
+  size_t mapnum = 1;
+  void *hostaddrs = h;
+  void *devaddrs = d;
+  size_t sizes = s;
+  unsigned short kinds = GOMP_MAP_ALLOC;
+
+  goacc_lazy_initialize ();
+
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+
+  if (acc_dev->capabilities & TARGET_CAP_SHARED_MEM)
+    {
+      if (d != h)
+        gomp_fatal ("cannot map data on shared-memory system");
+
+      tgt = gomp_map_vars (NULL, 0, NULL, NULL, NULL, NULL, true, false);
+    }
+  else
+    {
+      struct goacc_thread *thr = goacc_thread ();
+
+      if (!d || !h || !s)
+	gomp_fatal ("[%p,+%d]->[%p,+%d] is a bad map",
+                    (void *)h, (int)s, (void *)d, (int)s);
+
+      if (lookup_host (&acc_dev->mem_map, h, s))
+	gomp_fatal ("host address [%p, +%d] is already mapped", (void *)h,
+		    (int)s);
+
+      if (lookup_dev (thr->dev->openacc.data_environ, d, s))
+	gomp_fatal ("device address [%p, +%d] is already mapped", (void *)d,
+		    (int)s);
+
+      tgt = gomp_map_vars (acc_dev, mapnum, &hostaddrs, &devaddrs, &sizes,
+			   &kinds, true, false);
+    }
+
+  tgt->prev = acc_dev->openacc.data_environ;
+  acc_dev->openacc.data_environ = tgt;
+}
+
+void
+acc_unmap_data (void *h)
+{
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+
+  /* No need to call lazy open, as the address must have been mapped.  */
+
+  size_t host_size;
+  splay_tree_key n = lookup_host (&acc_dev->mem_map, h, 1);
+  struct target_mem_desc *t;
+
+  if (!n)
+    gomp_fatal ("%p is not a mapped block", (void *)h);
+
+  host_size = n->host_end - n->host_start;
+
+  if (n->host_start != (uintptr_t) h)
+    gomp_fatal ("[%p,%d] surrounds %p",
+        	(void *) n->host_start, (int) host_size, (void *) h);
+
+  t = n->tgt;
+
+  if (t->refcount == 2)
+    {
+      struct target_mem_desc *tp;
+
+      /* This is the last reference, so pull the descriptor off the
+         chain.  This prevents gomp_unmap_vars (via gomp_unmap_tgt) from
+         freeing the device memory.  */
+      t->tgt_end = 0;
+      t->to_free = 0;
+
+      gomp_mutex_lock (&acc_dev->mem_map.lock);
+
+      for (tp = NULL, t = acc_dev->openacc.data_environ; t != NULL;
+	   tp = t, t = t->prev)
+        if (n->tgt == t)
+          {
+            if (tp)
+              tp->prev = t->prev;
+            else
+              acc_dev->openacc.data_environ = t->prev;
+
+            break; 
+          }
+
+      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+    }
+  
+  gomp_unmap_vars (t, true);
+}
+
+#define PCC_Present (1 << 0)
+#define PCC_Create (1 << 1)
+#define PCC_Copy (1 << 2)
+
+attribute_hidden void *
+present_create_copy (unsigned f, void *h, size_t s)
+{
+  void *d;
+  splay_tree_key n;
+
+  if (!h || !s)
+    gomp_fatal ("[%p,+%d] is a bad range", (void *)h, (int)s);
+
+  goacc_lazy_initialize ();
+
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+
+  n = lookup_host (&acc_dev->mem_map, h, s);
+  if (n)
+    {
+      /* Present. */
+      d = (void *) (n->tgt->tgt_start + n->tgt_offset);
+
+      if (!(f & PCC_Present))
+        gomp_fatal ("[%p,+%d] already mapped to [%p,+%d]",
+            (void *)h, (int)s, (void *)d, (int)s);
+      if ((h + s) > (void *)n->host_end)    
+        gomp_fatal ("[%p,+%d] not mapped", (void *)h, (int)s);
+    }
+  else if (!(f & PCC_Create))
+    {
+      gomp_fatal ("[%p,+%d] not mapped", (void *)h, (int)s);
+    }
+  else
+    {
+      struct target_mem_desc *tgt;
+      size_t mapnum = 1;
+      unsigned short kinds;
+      void *hostaddrs = h;
+
+      if (f & PCC_Copy)
+        kinds = GOMP_MAP_ALLOC_TO;
+      else
+        kinds = GOMP_MAP_ALLOC;
+
+      tgt = gomp_map_vars (acc_dev, mapnum, &hostaddrs, NULL, &s, &kinds, true,
+			   false);
+
+      gomp_mutex_lock (&acc_dev->mem_map.lock);
+
+      d = tgt->to_free;
+      tgt->prev = acc_dev->openacc.data_environ;
+      acc_dev->openacc.data_environ = tgt;
+
+      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+    }
+  
+  return d;
+}
+
+void *
+acc_create (void *h, size_t s)
+{
+  return present_create_copy (PCC_Create, h, s);
+}
+
+void *
+acc_copyin (void *h, size_t s)
+{
+  return present_create_copy (PCC_Create | PCC_Copy, h, s);
+}
+
+void *
+acc_present_or_create (void *h, size_t s)
+{
+  return present_create_copy (PCC_Present | PCC_Create, h, s);
+}
+
+void *
+acc_present_or_copyin (void *h, size_t s)
+{
+  return present_create_copy (PCC_Present | PCC_Create | PCC_Copy, h, s);
+}
+
+#define DC_Copyout (1 << 0)
+
+static void
+delete_copyout (unsigned f, void *h, size_t s)
+{
+  size_t host_size;
+  splay_tree_key n;
+  void *d;
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+
+  n = lookup_host (&acc_dev->mem_map, h, s);
+
+  /* No need to call lazy open, as the data must already have been
+     mapped.  */
+
+  if (!n)
+    gomp_fatal ("[%p,%d] is not mapped", (void *)h, (int)s);
+
+  d = (void *) (n->tgt->tgt_start + n->tgt_offset);
+
+  host_size = n->host_end - n->host_start;
+
+  if (n->host_start != (uintptr_t) h || host_size != s)
+    gomp_fatal ("[%p,%d] surrounds [%p,+%d]",
+        	(void *) n->host_start, (int) host_size, (void *) h, (int) s);
+
+  if (f & DC_Copyout)
+    acc_dev->dev2host_func (acc_dev->target_id, h, d, s);
+  
+  acc_unmap_data (h);
+
+  acc_dev->free_func (acc_dev->target_id, d);
+}
+
+void
+acc_delete (void *h, size_t s)
+{
+  delete_copyout (0, h, s);
+}
+
+void acc_copyout (void *h, size_t s)
+{
+  delete_copyout (DC_Copyout, h, s);
+}
+
+static void
+update_dev_host (int is_dev, void *h, size_t s)
+{
+  splay_tree_key n;
+  void *d;
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+
+  n = lookup_host (&acc_dev->mem_map, h, s);
+
+  /* No need to call lazy open, as the data must already have been
+     mapped.  */
+
+  if (!n)
+    gomp_fatal ("[%p,%d] is not mapped", h, (int)s);
+
+  d = (void *) (n->tgt->tgt_start + n->tgt_offset);
+
+  if (is_dev)
+    acc_dev->host2dev_func (acc_dev->target_id, d, h, s);
+  else
+    acc_dev->dev2host_func (acc_dev->target_id, h, d, s);
+}
+
+void
+acc_update_device (void *h, size_t s)
+{
+  update_dev_host (1, h, s);
+}
+
+void
+acc_update_self (void *h, size_t s)
+{
+  update_dev_host (0, h, s);
+}
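The host/device address arithmetic used by acc_deviceptr and acc_hostptr can be checked with a small model. The sketch below assumes, as acc_deviceptr does, that the device copy of a mapped block starts at tgt->tgt_start + tgt_offset; struct tgt_block, struct map_key, host_to_dev and dev_to_host are hypothetical cut-down names, not the libgomp types. Under that assumption the device-to-host direction has to subtract tgt_offset for the two functions to be inverses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down versions of target_mem_desc and splay_tree_key_s.  */
struct tgt_block { uintptr_t tgt_start; };

struct map_key
{
  uintptr_t host_start, host_end;   /* host range [host_start, host_end) */
  struct tgt_block *tgt;            /* device block holding this variable */
  uintptr_t tgt_offset;             /* variable's offset within that block */
};

/* Host -> device: the same arithmetic as acc_deviceptr.  */
static uintptr_t
host_to_dev (const struct map_key *k, uintptr_t h)
{
  return k->tgt->tgt_start + k->tgt_offset + (h - k->host_start);
}

/* Device -> host: the exact inverse, so tgt_offset is subtracted.  */
static uintptr_t
dev_to_host (const struct map_key *k, uintptr_t d)
{
  return k->host_start + (d - k->tgt->tgt_start - k->tgt_offset);
}
```

With a single variable per descriptor (as acc_map_data and acc_create produce) tgt_offset is typically zero, so the sign of that term only matters once several variables share one device block.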
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
new file mode 100644
index 0000000..0ff44bf
--- /dev/null
+++ b/libgomp/oacc-parallel.c
@@ -0,0 +1,388 @@
+/* Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file handles OpenACC constructs.  */
+
+#include "openacc.h"
+#include "libgomp.h"
+#include "libgomp_g.h"
+#include "gomp-constants.h"
+#include "libgomp_target.h"
+#include "oacc-int.h"
+#include <stdio.h>
+#include <string.h>
+#include <stdarg.h>
+#include <assert.h>
+#include <alloca.h>
+
+static void
+dump_var (char *s, size_t idx, void *hostaddr, size_t size, unsigned short kind)
+{
+  gomp_notify (" %2zu: %3s 0x%.2x -", idx, s, kind & 0xff);
+
+  switch (kind & 0xff)
+    {
+      case 0x00: gomp_notify (" ALLOC              "); break;
+      case 0x01: gomp_notify (" ALLOC TO           "); break;
+      case 0x02: gomp_notify (" ALLOC FROM         "); break;
+      case 0x03: gomp_notify (" ALLOC TOFROM       "); break;
+      case 0x04: gomp_notify (" POINTER            "); break;
+      case 0x05: gomp_notify (" TO_PSET            "); break;
+
+      case 0x08: gomp_notify (" FORCE_ALLOC        "); break;
+      case 0x09: gomp_notify (" FORCE_TO           "); break;
+      case 0x0a: gomp_notify (" FORCE_FROM         "); break;
+      case 0x0b: gomp_notify (" FORCE_TOFROM       "); break;
+      case 0x0c: gomp_notify (" FORCE_PRESENT      "); break;
+      case 0x0d: gomp_notify (" FORCE_DEALLOC      "); break;
+      case 0x0e: gomp_notify (" FORCE_DEVICEPTR    "); break;
+
+      case 0x18: gomp_notify (" FORCE_PRIVATE      "); break;
+      case 0x19: gomp_notify (" FORCE_FIRSTPRIVATE "); break;
+
+      case (unsigned char) -1: gomp_notify (" DUMMY              "); break;
+      default: gomp_notify (" UNKNOWN 0x%.2x       ", kind & 0xff);
+    }
+    
+  gomp_notify ("- %d - %4d/0x%04x ", 1 << (kind >> 8), (int) size, (int) size);
+  gomp_notify ("- %p\n", hostaddr);
+}
+
+/* Ensure that the target device for DEVICE_TYPE is initialized (and that
+   plugins have been loaded if appropriate).  The device for the current
+   thread (goacc_thread ()->dev) will be set appropriately for the given
+   device type on return.  */
+
+attribute_hidden void
+select_acc_device (int device_type)
+{
+  goacc_lazy_initialize ();
+
+  if (device_type == GOMP_IF_CLAUSE_FALSE)
+    return;
+
+  if (device_type == acc_device_none)
+    device_type = acc_device_host;
+
+  if (device_type >= 0)
+    {
+      /* NOTE: this will go badly if the surrounding data environment is set up
+         to use a different device type.  We'll just have to trust that users
+	 know what they're doing...  */
+      acc_set_device_type (device_type);
+    }
+}
+
+void goacc_wait (int async, int num_waits, va_list ap);
+
+void
+GOACC_parallel (int device, void (*fn) (void *), const void *openmp_target,
+		size_t mapnum, void **hostaddrs, size_t *sizes,
+		unsigned short *kinds,
+		int num_gangs, int num_workers, int vector_length,
+		int async, int num_waits, ...)
+{
+  bool if_clause_condition_value = device != GOMP_IF_CLAUSE_FALSE;
+  va_list ap;
+  struct goacc_thread *thr;
+  struct gomp_device_descr *acc_dev;
+  struct target_mem_desc *tgt;
+  void **devaddrs;
+  unsigned int i;
+  struct splay_tree_key_s k;
+  splay_tree_key tgt_fn_key;
+  void (*tgt_fn);
+
+  if (num_gangs != 1)
+    gomp_fatal ("num_gangs (%d) different from one is not yet supported",
+		num_gangs);
+  if (num_workers != 1)
+    gomp_fatal ("num_workers (%d) different from one is not yet supported",
+		num_workers);
+
+  gomp_notify ("%s: mapnum=%zu, hostaddrs=%p, sizes=%p, kinds=%p, async=%d\n",
+	       __FUNCTION__, mapnum, hostaddrs, sizes, kinds, async);
+
+  select_acc_device (device);
+
+  thr = goacc_thread ();
+  acc_dev = thr->dev;
+
+  /* Host fallback if "if" clause is false or if the current device is set to
+     the host.  */
+  if (!if_clause_condition_value)
+    {
+      goacc_save_and_set_bind (acc_device_host);
+      fn (hostaddrs);
+      goacc_restore_bind ();
+      return;
+    }
+  else if (acc_device_type (acc_dev->type) == acc_device_host)
+    {
+      fn (hostaddrs);
+      return;
+    }
+
+  va_start (ap, num_waits);
+  
+  if (num_waits > 0)
+    goacc_wait (async, num_waits, ap);
+
+  va_end (ap);
+
+  acc_dev->openacc.async_set_async_func (async);
+
+  if (!(acc_dev->capabilities & TARGET_CAP_NATIVE_EXEC))
+    {
+      k.host_start = (uintptr_t) fn;
+      k.host_end = k.host_start + 1;
+      gomp_mutex_lock (&acc_dev->mem_map.lock);
+      tgt_fn_key = splay_tree_lookup (&acc_dev->mem_map.splay_tree, &k);
+      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+
+      if (tgt_fn_key == NULL)
+	gomp_fatal ("target function wasn't mapped: perhaps -fopenacc was "
+		    "used without -flto?");
+
+      tgt_fn = (void (*)) tgt_fn_key->tgt->tgt_start;
+    }
+  else
+    tgt_fn = (void (*)) fn;
+
+  tgt = gomp_map_vars (acc_dev, mapnum, hostaddrs, NULL, sizes, kinds, true,
+		       false);
+
+  devaddrs = alloca (sizeof (void *) * mapnum);
+  for (i = 0; i < mapnum; i++)
+    devaddrs[i] = (void *) (tgt->list[i]->tgt->tgt_start
+			    + tgt->list[i]->tgt_offset);
+
+  acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs, sizes, kinds,
+			      num_gangs, num_workers, vector_length, async,
+			      tgt);
+
+  /* If running synchronously, unmap immediately.  */
+  if (async < acc_async_noval)
+    gomp_unmap_vars (tgt, true);
+  else
+    {
+      gomp_copy_from_async (tgt);
+      acc_dev->openacc.register_async_cleanup_func (tgt);
+    }
+
+  acc_dev->openacc.async_set_async_func (acc_async_sync);
+}
+
+void
+GOACC_data_start (int device, const void *openmp_target, size_t mapnum,
+		  void **hostaddrs, size_t *sizes, unsigned short *kinds)
+{
+  bool if_clause_condition_value = device != GOMP_IF_CLAUSE_FALSE;
+  struct target_mem_desc *tgt;
+
+  gomp_notify ("%s: mapnum=%zu, hostaddrs=%p, sizes=%p, kinds=%p\n",
+	       __FUNCTION__, mapnum, hostaddrs, sizes, kinds);
+
+  select_acc_device (device);
+
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+
+  /* Host fallback or 'do nothing'.  */
+  if ((acc_dev->capabilities & TARGET_CAP_SHARED_MEM)
+      || !if_clause_condition_value)
+    {
+      tgt = gomp_map_vars (NULL, 0, NULL, NULL, NULL, NULL, true, false);
+      tgt->prev = thr->mapped_data;
+      thr->mapped_data = tgt;
+
+      return;
+    }
+
+  gomp_notify ("  %s: prepare mappings\n", __FUNCTION__);
+  tgt = gomp_map_vars (acc_dev, mapnum, hostaddrs, NULL, sizes, kinds, true,
+		       false);
+  gomp_notify ("  %s: mappings prepared\n", __FUNCTION__);
+  tgt->prev = thr->mapped_data;
+  thr->mapped_data = tgt;
+}
+
+void
+GOACC_data_end (void)
+{
+  struct goacc_thread *thr = goacc_thread ();
+  struct target_mem_desc *tgt = thr->mapped_data;
+
+  gomp_notify ("  %s: restore mappings\n", __FUNCTION__);
+  thr->mapped_data = tgt->prev;
+  gomp_unmap_vars (tgt, true);
+  gomp_notify ("  %s: mappings restored\n", __FUNCTION__);
+}
+
+
+void
+GOACC_kernels (int device, void (*fn) (void *), const void *openmp_target,
+	       size_t mapnum, void **hostaddrs, size_t *sizes,
+	       unsigned short *kinds,
+	       int num_gangs, int num_workers, int vector_length,
+	       int async, int num_waits, ...)
+{
+  gomp_notify ("%s: mapnum=%zu, hostaddrs=%p, sizes=%p, kinds=%p\n",
+	       __FUNCTION__, mapnum, hostaddrs, sizes, kinds);
+
+  va_list ap;
+
+  select_acc_device (device);
+
+  va_start (ap, num_waits);
+
+  if (num_waits > 0)
+    goacc_wait (async, num_waits, ap);
+
+  va_end (ap);
+
+  GOACC_parallel (device, fn, openmp_target, mapnum, hostaddrs, sizes, kinds,
+		  num_gangs, num_workers, vector_length, async, 0);
+}
+
+void
+goacc_wait (int async, int num_waits, va_list ap)
+{
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+  int i;
+
+  assert (num_waits >= 0);
+
+  if (async == acc_async_sync && num_waits == 0)
+    {
+      acc_wait_all ();
+      return;
+    }
+
+  if (async == acc_async_sync && num_waits)
+    {
+      for (i = 0; i < num_waits; i++)
+        {
+          int qid = va_arg (ap, int);
+
+          if (acc_async_test (qid))
+            continue;
+
+          acc_wait (qid);
+        }
+      return;
+    }
+
+  if (async == acc_async_noval && num_waits == 0)
+    {
+      acc_dev->openacc.async_wait_all_async_func (acc_async_noval);
+      return;
+    }
+
+  for (i = 0; i < num_waits; i++)
+    {
+      int qid = va_arg (ap, int);
+
+      if (acc_async_test (qid))
+	continue;
+
+      /* If we're waiting on the same asynchronous queue as we're launching on,
+         the queue itself will order work as required, so there's no need to
+	 wait explicitly.  */
+      if (qid != async)
+	acc_dev->openacc.async_wait_async_func (qid, async);
+    }
+}
+
+void
+GOACC_update (int device, const void *openmp_target, size_t mapnum,
+	      void **hostaddrs, size_t *sizes, unsigned short *kinds,
+	      int async, int num_waits, ...)
+{
+  bool if_clause_condition_value = device != GOMP_IF_CLAUSE_FALSE;
+  size_t i;
+
+  select_acc_device (device);
+
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+
+  if ((acc_dev->capabilities & TARGET_CAP_SHARED_MEM)
+      || !if_clause_condition_value)
+    return;
+
+  if (num_waits > 0)
+    {
+      va_list ap;
+
+      va_start (ap, num_waits);
+
+      goacc_wait (async, num_waits, ap);
+
+      va_end (ap);
+    }
+
+  acc_dev->openacc.async_set_async_func (async);
+
+  for (i = 0; i < mapnum; ++i)
+    {
+      unsigned char kind = kinds[i] & 0xff;
+
+      dump_var ("UPD", i, hostaddrs[i], sizes[i], kinds[i]);
+
+      switch (kind)
+	{
+	case GOMP_MAP_POINTER:
+	  break;
+
+	case GOMP_MAP_FORCE_TO:
+	  acc_update_device (hostaddrs[i], sizes[i]);
+	  break;
+
+	case GOMP_MAP_FORCE_FROM:
+	  acc_update_self (hostaddrs[i], sizes[i]);
+	  break;
+
+	default:
+	  gomp_fatal ("GOACC_update: unhandled kind 0x%.2x", kind);
+	  break;
+	}
+    }
+
+  acc_dev->openacc.async_set_async_func (acc_async_sync);
+}
+
+void
+GOACC_wait (int async, int num_waits, ...)
+{
+  va_list ap;
+
+  va_start (ap, num_waits);
+
+  goacc_wait (async, num_waits, ap);
+
+  va_end (ap);
+}
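The dispatch in goacc_wait above depends only on the values of async and num_waits, plus two checks made for each queue. As a reading aid, the following standalone C sketch models that decision logic without touching a device. The enum and function names (wait_path, classify_wait, needs_cross_queue_wait) are hypothetical and not part of libgomp. ACC_ASYNC_NOVAL and ACC_ASYNC_SYNC mirror the acc_async_noval (-1) and acc_async_sync (-2) values from openacc.h.

```c
#include <stdbool.h>

/* Sentinel queue values, as defined by acc_async_t in openacc.h.  */
enum { ACC_ASYNC_NOVAL = -1, ACC_ASYNC_SYNC = -2 };

/* Hypothetical labels for the four paths through goacc_wait;
   these are not libgomp identifiers.  */
enum wait_path
{
  WAIT_ALL_BLOCKING,  /* acc_wait_all () on the host.  */
  WAIT_EACH_BLOCKING, /* acc_wait (qid) for each unfinished queue.  */
  WAIT_ALL_ON_NOVAL,  /* async_wait_all_async_func (acc_async_noval).  */
  WAIT_EACH_ON_QUEUE  /* async_wait_async_func (qid, async) per queue.  */
};

/* Select the path goacc_wait takes for a given ASYNC and NUM_WAITS.  */
enum wait_path
classify_wait (int async, int num_waits)
{
  if (async == ACC_ASYNC_SYNC)
    return num_waits == 0 ? WAIT_ALL_BLOCKING : WAIT_EACH_BLOCKING;
  if (async == ACC_ASYNC_NOVAL && num_waits == 0)
    return WAIT_ALL_ON_NOVAL;
  return WAIT_EACH_ON_QUEUE;
}

/* On the per-queue asynchronous path, queue QID gets an explicit
   cross-queue wait only if it still has pending work and is not the
   queue ASYNC that the new operation is launched on.  */
bool
needs_cross_queue_wait (int qid, int async, bool qid_finished)
{
  return !qid_finished && qid != async;
}
```

A non-negative async with num_waits == 0 falls through to the per-queue path with nothing to iterate over. It is therefore effectively a no-op, as in the original function.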
diff --git a/libgomp/oacc-plugin.c b/libgomp/oacc-plugin.c
new file mode 100644
index 0000000..357cb5f
--- /dev/null
+++ b/libgomp/oacc-plugin.c
@@ -0,0 +1,48 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* OpenACC support routines that libgomp exports to device plugins.  */
+
+#include "libgomp.h"
+#include "oacc-plugin.h"
+#include "libgomp_target.h"
+#include "oacc-int.h"
+
+void
+GOMP_PLUGIN_async_unmap_vars (void *ptr)
+{
+  struct target_mem_desc *tgt = ptr;
+
+  gomp_unmap_vars (tgt, false);
+}
+
+/* Return the target-specific part of the TLS data for the current thread.  */
+
+void *
+GOMP_PLUGIN_acc_thread (void)
+{
+  struct goacc_thread *thr = goacc_thread ();
+  return thr ? thr->target_tls : NULL;
+}
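GOMP_PLUGIN_acc_thread returns NULL when the calling thread has no goacc_thread yet, so a plugin that reads its per-thread data must handle that case. The following minimal sketch shows the calling side. It assumes a hypothetical plugin TLS structure (example_plugin_tls), and a local static stub stands in for the libgomp accessor so the sketch compiles on its own. Using 0 as a "default stream" is also an assumption made for illustration.

```c
#include <stddef.h>

/* Hypothetical per-thread state a device plugin might keep in
   goacc_thread::target_tls.  */
struct example_plugin_tls
{
  int current_stream;
};

/* Stand-in for libgomp's GOMP_PLUGIN_acc_thread, so the sketch builds
   alone.  The real routine returns thr->target_tls, or NULL when the
   calling thread has no OpenACC state yet.  */
void *example_target_tls = NULL;

static void *
GOMP_PLUGIN_acc_thread (void)
{
  return example_target_tls;
}

/* Plugin-side helper: check for NULL instead of dereferencing blindly.  */
int
example_current_stream (void)
{
  struct example_plugin_tls *tls = GOMP_PLUGIN_acc_thread ();
  return tls ? tls->current_stream : 0;  /* 0: hypothetical default.  */
}
```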
diff --git a/libgomp/oacc-plugin.h b/libgomp/oacc-plugin.h
new file mode 100644
index 0000000..d05a28f
--- /dev/null
+++ b/libgomp/oacc-plugin.h
@@ -0,0 +1,32 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _OACC_PLUGIN_H
+#define _OACC_PLUGIN_H 1
+
+extern void GOMP_PLUGIN_async_unmap_vars (void *ptr);
+extern void *GOMP_PLUGIN_acc_thread (void);
+
+#endif
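GOMP_PLUGIN_async_unmap_vars lets a plugin release a target_mem_desc only after the asynchronous copy-in/execution/copy-back queued on a device has finished. This is the deferred host-side book-keeping that OpenACC's asynchronous behaviour requires. The sketch below shows one way a plugin might organise it. The fixed-size per-queue list, its capacity, and all example_/pending_ names are assumptions made for illustration. The callback parameter is where a real plugin would pass GOMP_PLUGIN_async_unmap_vars, and example_count_unmap is only a test double.

```c
#include <stddef.h>

/* Same shape as GOMP_PLUGIN_async_unmap_vars (void *).  */
typedef void (*unmap_fn) (void *);

#define EXAMPLE_MAX_PENDING 16  /* Hypothetical fixed capacity.  */

/* Hypothetical per-queue list of target_mem_desc pointers whose
   host-side bookkeeping must wait until the queue's work completes.  */
struct pending_unmaps
{
  void *descs[EXAMPLE_MAX_PENDING];
  size_t count;
};

/* Remember TGT for later; return 0 if the list is full.  */
int
defer_unmap (struct pending_unmaps *p, void *tgt)
{
  if (p->count == EXAMPLE_MAX_PENDING)
    return 0;
  p->descs[p->count++] = tgt;
  return 1;
}

/* Call once the device reports the queue idle: hand each descriptor to
   UNMAP, oldest first, and return how many were released.  */
size_t
drain_unmaps (struct pending_unmaps *p, unmap_fn unmap)
{
  size_t n = p->count;
  for (size_t i = 0; i < n; i++)
    unmap (p->descs[i]);
  p->count = 0;
  return n;
}

/* Test double standing in for GOMP_PLUGIN_async_unmap_vars.  */
int example_unmap_calls = 0;

void
example_count_unmap (void *tgt)
{
  (void) tgt;
  example_unmap_calls++;
}
```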
diff --git a/libgomp/openacc.f90 b/libgomp/openacc.f90
new file mode 100644
index 0000000..a344929
--- /dev/null
+++ b/libgomp/openacc.f90
@@ -0,0 +1,803 @@
+!  OpenACC Runtime Library Definitions.
+
+!  Copyright (C) 2014 Free Software Foundation, Inc.
+
+!  Contributed by Tobias Burnus <burnus@net-b.de>
+!              and Mentor Embedded.
+
+!  This file is part of the GNU OpenMP Library (libgomp).
+
+!  Libgomp is free software; you can redistribute it and/or modify it
+!  under the terms of the GNU General Public License as published by
+!  the Free Software Foundation; either version 3, or (at your option)
+!  any later version.
+
+!  Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+!  WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+!  FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+!  more details.
+
+!  Under Section 7 of GPL version 3, you are granted additional
+!  permissions described in the GCC Runtime Library Exception, version
+!  3.1, as published by the Free Software Foundation.
+
+!  You should have received a copy of the GNU General Public License and
+!  a copy of the GCC Runtime Library Exception along with this program;
+!  see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+!  <http://www.gnu.org/licenses/>.
+
+module openacc_kinds
+  use iso_fortran_env, only: int32
+  implicit none
+
+  private :: int32
+  public :: acc_device_kind
+
+  integer, parameter :: acc_device_kind = int32
+
+  public :: acc_device_none, acc_device_default, acc_device_host
+  public :: acc_device_not_host, acc_device_nvidia
+
+  integer (acc_device_kind), parameter :: acc_device_none = 0
+  integer (acc_device_kind), parameter :: acc_device_default = 1
+  integer (acc_device_kind), parameter :: acc_device_host = 2
+  integer (acc_device_kind), parameter :: acc_device_host_nonshm = 3
+  integer (acc_device_kind), parameter :: acc_device_not_host = 4
+  integer (acc_device_kind), parameter :: acc_device_nvidia = 5
+
+  public :: acc_handle_kind
+
+  integer, parameter :: acc_handle_kind = int32
+
+  public :: acc_async_noval, acc_async_sync
+
+  integer (acc_handle_kind), parameter :: acc_async_noval = -1
+  integer (acc_handle_kind), parameter :: acc_async_sync = -2
+
+end module
+
+module openacc_internal
+  use openacc_kinds
+  implicit none
+
+  interface
+    function acc_async_test_h (a)
+      logical acc_async_test_h
+      integer a
+    end function
+
+    function acc_async_test_all_h ()
+      logical acc_async_test_all_h
+    end function
+
+    function acc_on_device_h (d)
+      import
+      integer (acc_device_kind) d
+      logical acc_on_device_h
+    end function
+
+    subroutine acc_copyin_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_copyin_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_copyin_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_present_or_copyin_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_present_or_copyin_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_present_or_copyin_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_create_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_create_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_create_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_present_or_create_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_present_or_create_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_present_or_create_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_copyout_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_copyout_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_copyout_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_delete_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_delete_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_delete_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_update_device_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_update_device_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_update_device_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_update_self_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_update_self_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_update_self_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    function acc_is_present_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      logical acc_is_present_32_h
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end function
+
+    function acc_is_present_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      logical acc_is_present_64_h
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end function
+
+    function acc_is_present_array_h (a)
+      logical acc_is_present_array_h
+      type (*), dimension (..), contiguous :: a
+    end function
+  end interface
+
+  interface
+    function acc_async_test_l (a) &
+        bind (C, name = "acc_async_test")
+      use iso_c_binding, only: c_int
+      integer (c_int) :: acc_async_test_l
+      integer (c_int), value :: a
+    end function
+
+    function acc_async_test_all_l () &
+        bind (C, name = "acc_async_test_all")
+      use iso_c_binding, only: c_int
+      integer (c_int) :: acc_async_test_all_l
+    end function
+
+    function acc_on_device_l (d) &
+        bind (C, name = "acc_on_device")
+      use iso_c_binding, only: c_int
+      integer (c_int) :: acc_on_device_l
+      integer (c_int), value :: d
+    end function
+
+    subroutine acc_copyin_l (a, len) &
+        bind (C, name = "acc_copyin")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_present_or_copyin_l (a, len) &
+        bind (C, name = "acc_present_or_copyin")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_create_l (a, len) &
+        bind (C, name = "acc_create")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_present_or_create_l (a, len) &
+        bind (C, name = "acc_present_or_create")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_copyout_l (a, len) &
+        bind (C, name = "acc_copyout")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_delete_l (a, len) &
+        bind (C, name = "acc_delete")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_update_device_l (a, len) &
+        bind (C, name = "acc_update_device")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_update_self_l (a, len) &
+        bind (C, name = "acc_update_self")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    function acc_is_present_l (a, len) &
+        bind (C, name = "acc_is_present")
+      use iso_c_binding, only: c_int32_t, c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      integer (c_int32_t) :: acc_is_present_l
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end function
+  end interface
+end module
+
+module openacc
+  use openacc_kinds
+  use openacc_internal
+  implicit none
+
+  public :: openacc_version
+
+  public :: acc_get_num_devices, acc_set_device_type, acc_get_device_type
+  public :: acc_set_device_num, acc_get_device_num, acc_async_test
+  public :: acc_async_test_all, acc_wait, acc_wait_async, acc_wait_all
+  public :: acc_wait_all_async, acc_init, acc_shutdown, acc_on_device
+  public :: acc_copyin, acc_present_or_copyin, acc_pcopyin, acc_create
+  public :: acc_present_or_create, acc_pcreate, acc_copyout, acc_delete
+  public :: acc_update_device, acc_update_self, acc_is_present
+
+  integer, parameter :: openacc_version = 201306
+
+  interface acc_get_num_devices
+    function acc_get_num_devices (d) &
+        bind (C, name = "acc_get_num_devices")
+      use iso_c_binding, only: c_int
+      integer (c_int) :: acc_get_num_devices
+      integer (c_int), value :: d
+    end function
+  end interface
+
+  interface acc_set_device_type
+    subroutine acc_set_device_type (d) &
+        bind (C, name = "acc_set_device_type")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: d
+    end subroutine
+  end interface
+
+  interface acc_get_device_type
+    function acc_get_device_type () &
+        bind (C, name = "acc_get_device_type")
+      use iso_c_binding, only: c_int
+      integer (c_int) :: acc_get_device_type
+    end function
+  end interface
+
+  interface acc_set_device_num
+    subroutine acc_set_device_num (n, d) &
+        bind (C, name = "acc_set_device_num")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: n, d
+    end subroutine
+  end interface
+
+  interface acc_get_device_num
+    function acc_get_device_num (d) &
+        bind (C, name = "acc_get_device_num")
+      use iso_c_binding, only: c_int
+      integer (c_int) :: acc_get_device_num
+      integer (c_int), value :: d
+    end function
+  end interface
+
+  interface acc_async_test
+    procedure :: acc_async_test_h
+  end interface
+
+  interface acc_async_test_all
+    procedure :: acc_async_test_all_h
+  end interface
+
+  interface acc_wait
+    subroutine acc_wait (a) &
+        bind (C, name = "acc_wait")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: a
+    end subroutine
+  end interface
+
+  interface acc_wait_async
+    subroutine acc_wait_async (a1, a2) &
+        bind (C, name = "acc_wait_async")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: a1, a2
+    end subroutine
+  end interface
+
+  interface acc_wait_all
+    subroutine acc_wait_all () &
+        bind (C, name = "acc_wait_all")
+      use iso_c_binding, only: c_int
+    end subroutine
+  end interface
+
+  interface acc_wait_all_async
+    subroutine acc_wait_all_async (a) &
+        bind (C, name = "acc_wait_all_async")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: a
+    end subroutine
+  end interface
+
+  interface acc_init
+    subroutine acc_init (d) &
+        bind (C, name = "acc_init")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: d
+    end subroutine
+  end interface
+
+  interface acc_shutdown
+    subroutine acc_shutdown (d) &
+        bind (C, name = "acc_shutdown")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: d
+    end subroutine
+  end interface
+
+  interface acc_on_device
+    procedure :: acc_on_device_h
+  end interface
+
+  ! acc_malloc: Only available in C/C++
+  ! acc_free: Only available in C/C++
+
+  ! As a vendor extension, the following code supports both 32-bit and
+  ! 64-bit arguments for "len"; the OpenACC standard only permits
+  ! default-kind integers, which are of kind 4 (i.e. 32 bits).
+  ! Additionally, the two-argument versions also accept arrays, and the
+  ! one-argument versions also accept scalars.  Note that the code
+  ! assumes that the arrays are contiguous.
+
+  interface acc_copyin
+    procedure :: acc_copyin_32_h
+    procedure :: acc_copyin_64_h
+    procedure :: acc_copyin_array_h
+  end interface
+
+  interface acc_present_or_copyin
+    procedure :: acc_present_or_copyin_32_h
+    procedure :: acc_present_or_copyin_64_h
+    procedure :: acc_present_or_copyin_array_h
+  end interface
+
+  interface acc_pcopyin
+    procedure :: acc_present_or_copyin_32_h
+    procedure :: acc_present_or_copyin_64_h
+    procedure :: acc_present_or_copyin_array_h
+  end interface
+
+  interface acc_create
+    procedure :: acc_create_32_h
+    procedure :: acc_create_64_h
+    procedure :: acc_create_array_h
+  end interface
+
+  interface acc_present_or_create
+    procedure :: acc_present_or_create_32_h
+    procedure :: acc_present_or_create_64_h
+    procedure :: acc_present_or_create_array_h
+  end interface
+
+  interface acc_pcreate
+    procedure :: acc_present_or_create_32_h
+    procedure :: acc_present_or_create_64_h
+    procedure :: acc_present_or_create_array_h
+  end interface
+
+  interface acc_copyout
+    procedure :: acc_copyout_32_h
+    procedure :: acc_copyout_64_h
+    procedure :: acc_copyout_array_h
+  end interface
+
+  interface acc_delete
+    procedure :: acc_delete_32_h
+    procedure :: acc_delete_64_h
+    procedure :: acc_delete_array_h
+  end interface
+
+  interface acc_update_device
+    procedure :: acc_update_device_32_h
+    procedure :: acc_update_device_64_h
+    procedure :: acc_update_device_array_h
+  end interface
+
+  interface acc_update_self
+    procedure :: acc_update_self_32_h
+    procedure :: acc_update_self_64_h
+    procedure :: acc_update_self_array_h
+  end interface
+
+  ! acc_map_data: Only available in C/C++
+  ! acc_unmap_data: Only available in C/C++
+  ! acc_deviceptr: Only available in C/C++
+  ! acc_hostptr: Only available in C/C++
+
+  interface acc_is_present
+    procedure :: acc_is_present_32_h
+    procedure :: acc_is_present_64_h
+    procedure :: acc_is_present_array_h
+  end interface
+
+  ! acc_memcpy_to_device: Only available in C/C++
+  ! acc_memcpy_from_device: Only available in C/C++
+
+end module
+
+function acc_async_test_h (a)
+  use openacc_internal, only: acc_async_test_l
+  logical acc_async_test_h
+  integer a
+  if (acc_async_test_l (a) .eq. 1) then
+    acc_async_test_h = .TRUE.
+  else
+    acc_async_test_h = .FALSE.
+  end if
+end function
+
+function acc_async_test_all_h ()
+  use openacc_internal, only: acc_async_test_all_l
+  logical acc_async_test_all_h
+  if (acc_async_test_all_l () .eq. 1) then
+    acc_async_test_all_h = .TRUE.
+  else
+    acc_async_test_all_h = .FALSE.
+  end if
+end function
+
+function acc_on_device_h (d)
+  use openacc_internal, only: acc_on_device_l
+  use openacc_kinds
+  integer (acc_device_kind) d
+  logical acc_on_device_h
+  if (acc_on_device_l (d) .eq. 1) then
+    acc_on_device_h = .TRUE.
+  else
+    acc_on_device_h = .FALSE.
+  end if
+end function
+
+subroutine acc_copyin_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_copyin_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_copyin_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_copyin_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_copyin_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_copyin_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_copyin_array_h (a)
+  use openacc_internal, only: acc_copyin_l
+  type (*), dimension (..), contiguous :: a
+  call acc_copyin_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_present_or_copyin_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_present_or_copyin_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_present_or_copyin_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_present_or_copyin_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_present_or_copyin_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_present_or_copyin_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_present_or_copyin_array_h (a)
+  use openacc_internal, only: acc_present_or_copyin_l
+  type (*), dimension (..), contiguous :: a
+  call acc_present_or_copyin_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_create_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_create_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_create_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_create_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_create_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_create_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_create_array_h (a)
+  use openacc_internal, only: acc_create_l
+  type (*), dimension (..), contiguous :: a
+  call acc_create_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_present_or_create_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_present_or_create_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_present_or_create_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_present_or_create_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_present_or_create_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_present_or_create_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_present_or_create_array_h (a)
+  use openacc_internal, only: acc_present_or_create_l
+  type (*), dimension (..), contiguous :: a
+  call acc_present_or_create_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_copyout_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_copyout_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_copyout_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_copyout_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_copyout_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_copyout_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_copyout_array_h (a)
+  use openacc_internal, only: acc_copyout_l
+  type (*), dimension (..), contiguous :: a
+  call acc_copyout_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_delete_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_delete_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_delete_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_delete_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_delete_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_delete_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_delete_array_h (a)
+  use openacc_internal, only: acc_delete_l
+  type (*), dimension (..), contiguous :: a
+  call acc_delete_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_update_device_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_update_device_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_update_device_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_update_device_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_update_device_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_update_device_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_update_device_array_h (a)
+  use openacc_internal, only: acc_update_device_l
+  type (*), dimension (..), contiguous :: a
+  call acc_update_device_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_update_self_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_update_self_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_update_self_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_update_self_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_update_self_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_update_self_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_update_self_array_h (a)
+  use openacc_internal, only: acc_update_self_l
+  type (*), dimension (..), contiguous :: a
+  call acc_update_self_l (a, sizeof (a))
+end subroutine
+
+function acc_is_present_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_is_present_l
+  logical acc_is_present_32_h
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  if (acc_is_present_l (a, int (len, kind = c_size_t)) .eq. 1) then
+    acc_is_present_32_h = .TRUE.
+  else
+    acc_is_present_32_h = .FALSE.
+  end if
+end function
+
+function acc_is_present_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_is_present_l
+  logical acc_is_present_64_h
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  if (acc_is_present_l (a, int (len, kind = c_size_t)) .eq. 1) then
+    acc_is_present_64_h = .TRUE.
+  else
+    acc_is_present_64_h = .FALSE.
+  end if
+end function
+
+function acc_is_present_array_h (a)
+  use openacc_internal, only: acc_is_present_l
+  logical acc_is_present_array_h
+  type (*), dimension (..), contiguous :: a
+  acc_is_present_array_h = acc_is_present_l (a, sizeof (a)) == 1
+end function
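The generic interfaces above (acc_copyin and the related routines) select the 32-bit or 64-bit wrapper based on the kind of the len argument. Each wrapper then widens len to c_size_t before calling the C routine. C11 can express the same overload-by-type idea with _Generic. The sketch below is only an analogy: it uses hypothetical example_* names and a recording stub in place of acc_copyin. It is not how libgomp implements its C entry points, which take size_t directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Records what was forwarded; stands in for acc_copyin_l.  */
size_t example_last_len = 0;
int example_last_kind = 0;

void
example_copyin_l (void *a, size_t len)
{
  (void) a;
  example_last_len = len;
}

/* Analogues of acc_copyin_32_h / acc_copyin_64_h: widen LEN to size_t,
   as the Fortran wrappers do with int (len, kind = c_size_t).  */
void
example_copyin_32 (void *a, int32_t len)
{
  example_last_kind = 32;
  example_copyin_l (a, (size_t) len);
}

void
example_copyin_64 (void *a, int64_t len)
{
  example_last_kind = 64;
  example_copyin_l (a, (size_t) len);
}

/* Resolve on the type of LEN, much as the generic interface acc_copyin
   resolves on the kind of its second argument.  */
#define example_copyin(a, len) \
  _Generic ((len), int32_t: example_copyin_32, \
                   int64_t: example_copyin_64) ((a), (len))
```

As in the Fortran wrappers, a negative length is not rejected here. It would simply wrap when converted to size_t.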
diff --git a/libgomp/openacc.h b/libgomp/openacc.h
new file mode 100644
index 0000000..01e0722
--- /dev/null
+++ b/libgomp/openacc.h
@@ -0,0 +1,127 @@
+/* OpenACC Runtime Library User-facing Declarations
+
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _OPENACC_H
+#define _OPENACC_H 1
+
+#include "gomp-constants.h"
+
+/* The OpenACC standard is silent on whether including openacc.h may,
+   or must not, pull in other header files.  We choose to include some
+   (<stddef.h> provides size_t for the declarations below).  */
+#include <stddef.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#if __cplusplus >= 201103
+# define __GOACC_NOTHROW noexcept ()
+#elif __cplusplus
+# define __GOACC_NOTHROW throw ()
+#else /* Not C++ */
+# define __GOACC_NOTHROW __attribute__ ((__nothrow__))
+#endif
+
+  /* Types */
+  typedef enum acc_device_t
+    {
+      acc_device_none = 0,
+      acc_device_default, /* This has to be a distinct value, as no
+			     return value can match it.  */
+      acc_device_host = GOMP_TARGET_HOST,
+      acc_device_host_nonshm = GOMP_TARGET_HOST_NONSHM,
+      acc_device_not_host,
+      acc_device_nvidia = GOMP_TARGET_NVIDIA_PTX,
+      _ACC_device_hwm
+    } acc_device_t;
+
+  typedef enum acc_async_t
+    {
+      acc_async_noval = -1,
+      acc_async_sync  = -2
+    } acc_async_t;
+
+  int acc_get_num_devices (acc_device_t __dev) __GOACC_NOTHROW;
+  void acc_set_device_type (acc_device_t __dev) __GOACC_NOTHROW;
+  acc_device_t acc_get_device_type (void) __GOACC_NOTHROW;
+  void acc_set_device_num (int __num, acc_device_t __dev) __GOACC_NOTHROW;
+  int acc_get_device_num (acc_device_t __dev) __GOACC_NOTHROW;
+  int acc_async_test (int __async) __GOACC_NOTHROW;
+  int acc_async_test_all (void) __GOACC_NOTHROW;
+  void acc_wait (int __async) __GOACC_NOTHROW;
+  void acc_wait_async (int __async1, int __async2) __GOACC_NOTHROW;
+  void acc_wait_all (void) __GOACC_NOTHROW;
+  void acc_wait_all_async (int __async) __GOACC_NOTHROW;
+  void acc_init (acc_device_t __dev) __GOACC_NOTHROW;
+  void acc_shutdown (acc_device_t __dev) __GOACC_NOTHROW;
+  int acc_on_device (acc_device_t __dev) __GOACC_NOTHROW;
+  void *acc_malloc (size_t) __GOACC_NOTHROW;
+  void acc_free (void *) __GOACC_NOTHROW;
+  /* Some of these would be more correct with const qualifiers, but
+     the standard specifies otherwise.  */
+  void *acc_copyin (void *, size_t) __GOACC_NOTHROW;
+  void *acc_present_or_copyin (void *, size_t) __GOACC_NOTHROW;
+  void *acc_create (void *, size_t) __GOACC_NOTHROW;
+  void *acc_present_or_create (void *, size_t) __GOACC_NOTHROW;
+  void acc_copyout (void *, size_t) __GOACC_NOTHROW;
+  void acc_delete (void *, size_t) __GOACC_NOTHROW;
+  void acc_update_device (void *, size_t) __GOACC_NOTHROW;
+  void acc_update_self (void *, size_t) __GOACC_NOTHROW;
+  void acc_map_data (void *, void *, size_t) __GOACC_NOTHROW;
+  void acc_unmap_data (void *) __GOACC_NOTHROW;
+  void *acc_deviceptr (void *) __GOACC_NOTHROW;
+  void *acc_hostptr (void *) __GOACC_NOTHROW;
+  int acc_is_present (void *, size_t) __GOACC_NOTHROW;
+  void acc_memcpy_to_device (void *, void *, size_t) __GOACC_NOTHROW;
+  void acc_memcpy_from_device (void *, void *, size_t) __GOACC_NOTHROW;
+
+  void ACC_target (int, void (*) (void *), const void *,
+	     size_t, void **, size_t *, unsigned char *, int *) __GOACC_NOTHROW;
+  void ACC_parallel (int, void (*) (void *), const void *,
+	     size_t, void **, size_t *, unsigned char *) __GOACC_NOTHROW;
+  void ACC_add_device_code (void const *, char const *) __GOACC_NOTHROW;
+
+  void ACC_async_copy (int) __GOACC_NOTHROW;
+  void ACC_async_kern (int) __GOACC_NOTHROW;
+
+  /* Old names.  OpenACC does not specify whether these can or must
+     not be macros, inlines or aliases for the new names.  */
+  #define acc_pcreate acc_present_or_create
+  #define acc_pcopyin acc_present_or_copyin
+
+  /* CUDA-specific routines.  */
+  void *acc_get_current_cuda_device (void) __GOACC_NOTHROW;
+  void *acc_get_current_cuda_context (void) __GOACC_NOTHROW;
+  void *acc_get_cuda_stream (int __async) __GOACC_NOTHROW;
+  int acc_set_cuda_stream (int __async, void *__stream) __GOACC_NOTHROW;
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _OPENACC_H */
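openacc.h keeps acc_device_default distinct from every concrete device type because it is only meaningful as a request. A query such as acc_get_device_type reports the device actually selected, so no return value can ever match it. The following is a minimal sketch of that resolution step. It assumes a hypothetical demo_device_t enum, whose values are unrelated to the real GOMP_TARGET_* constants, and an invented fallback policy. None of the demo_* names are part of libgomp.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for acc_device_t; the enumerator values are
   illustrative only and do not match openacc.h.  */
typedef enum
{
  demo_device_none = 0,
  demo_device_default,   /* Only ever a request, never a result.  */
  demo_device_host,
  demo_device_not_host,
  demo_device_nvidia
} demo_device_t;

/* Hypothetical resolver: map a requested device type to a concrete one.
   HAVE_NVIDIA stands in for whatever device probing a runtime performs.
   Returns demo_device_none when the request cannot be satisfied.  */
demo_device_t
demo_resolve_device_type (demo_device_t requested, bool have_nvidia)
{
  switch (requested)
    {
    case demo_device_default:
      /* Assumed policy: prefer an accelerator, else fall back to host.  */
      return have_nvidia ? demo_device_nvidia : demo_device_host;
    case demo_device_not_host:
      return have_nvidia ? demo_device_nvidia : demo_device_none;
    default:
      /* Concrete types (and "none") resolve to themselves.  */
      return requested;
    }
}

/* Query in the style of acc_get_device_type: by construction it never
   reports demo_device_default, which is why that value must be distinct
   from every concrete type.  */
demo_device_t
demo_get_device_type (demo_device_t requested, bool have_nvidia)
{
  demo_device_t t = demo_resolve_device_type (requested, have_nvidia);
  assert (t != demo_device_default);
  return t;
}
```

Because acc_device_default never escapes the resolver, a caller can compare a query result against it and know that the comparison is always false, rather than ambiguously meaning "whatever the default happened to be".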
diff --git a/libgomp/openacc_lib.h b/libgomp/openacc_lib.h
new file mode 100644
index 0000000..13118a7
--- /dev/null
+++ b/libgomp/openacc_lib.h
@@ -0,0 +1,390 @@
+!  OpenACC Runtime Library Definitions.			-*- mode: fortran -*-
+
+!  Copyright (C) 2014 Free Software Foundation, Inc.
+
+!  Contributed by Tobias Burnus <burnus@net-b.de>
+!              and Mentor Embedded.
+
+!  This file is part of the GNU OpenMP Library (libgomp).
+
+!  Libgomp is free software; you can redistribute it and/or modify it
+!  under the terms of the GNU General Public License as published by
+!  the Free Software Foundation; either version 3, or (at your option)
+!  any later version.
+
+!  Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+!  WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+!  FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+!  more details.
+
+!  Under Section 7 of GPL version 3, you are granted additional
+!  permissions described in the GCC Runtime Library Exception, version
+!  3.1, as published by the Free Software Foundation.
+
+!  You should have received a copy of the GNU General Public License and
+!  a copy of the GCC Runtime Library Exception along with this program;
+!  see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+!  <http://www.gnu.org/licenses/>.
+
+! NOTE: Due to the use of dimension (..), the code only works when compiled
+! with -std=f2008ts/gnu/legacy but not with other standard settings.
+! Alternatively, the user can use the module version, which permits
+! compilation with -std=f95.
+
+      integer, parameter :: acc_device_kind = 4
+
+      integer (acc_device_kind), parameter :: acc_device_none = 0
+      integer (acc_device_kind), parameter :: acc_device_default = 1
+      integer (acc_device_kind), parameter :: acc_device_host = 2
+      integer (acc_device_kind), parameter :: acc_device_host_nonshm = 3
+      integer (acc_device_kind), parameter :: acc_device_not_host = 4
+      integer (acc_device_kind), parameter :: acc_device_nvidia = 5
+
+      integer, parameter :: acc_handle_kind = 4
+
+      integer (acc_handle_kind), parameter :: acc_async_noval = -1
+      integer (acc_handle_kind), parameter :: acc_async_sync = -2
+
+      integer, parameter :: openacc_version = 201306
+
+      interface
+	function acc_get_num_devices (d)
+     &    bind (C, name = "acc_get_num_devices")
+	  use iso_c_binding, only: c_int
+	  integer (c_int) :: acc_get_num_devices
+	  integer (c_int), value :: d
+	end function
+      end interface
+
+      interface acc_set_device_type
+	subroutine acc_set_device_type (d)
+     &    bind (C, name = "acc_set_device_type")
+	  use iso_c_binding, only: c_int
+	  integer (c_int), value :: d
+	end subroutine
+      end interface
+
+      interface acc_get_device_type
+	function acc_get_device_type ()
+     &    bind (C, name = "acc_get_device_type")
+	  use iso_c_binding, only: c_int
+	  integer (c_int) :: acc_get_device_type
+	end function
+      end interface
+
+      interface acc_set_device_num
+	subroutine acc_set_device_num (n, d)
+     &    bind (C, name = "acc_set_device_num")
+	  use iso_c_binding, only: c_int
+	  integer (c_int), value :: n, d
+	end subroutine
+      end interface
+
+      interface acc_get_device_num
+	function acc_get_device_num (d)
+     &    bind (C, name = "acc_get_device_num")
+	  use iso_c_binding, only: c_int
+	  integer (c_int) :: acc_get_device_num
+	  integer (c_int), value :: d
+	end function
+      end interface
+
+      interface acc_async_test
+        function acc_async_test_h (a)
+          logical acc_async_test_h
+          integer a
+        end function
+      end interface
+
+      interface acc_async_test_all
+        function acc_async_test_all_h ()
+          logical acc_async_test_all_h
+        end function
+      end interface
+
+      interface acc_wait
+	subroutine acc_wait (a)
+     &    bind (C, name = "acc_wait")
+	  use iso_c_binding, only: c_int
+	  integer (c_int), value :: a
+	end subroutine
+      end interface
+
+      interface acc_wait_async
+	subroutine acc_wait_async (a1, a2)
+     &    bind (C, name = "acc_wait_async")
+	  use iso_c_binding, only: c_int
+	  integer (c_int), value :: a1, a2
+	end subroutine
+      end interface
+
+      interface acc_wait_all
+	subroutine acc_wait_all ()
+     &    bind (C, name = "acc_wait_all")
+	  use iso_c_binding, only: c_int
+	end subroutine
+      end interface
+
+      interface acc_wait_all_async
+	subroutine acc_wait_all_async (a)
+     &    bind (C, name = "acc_wait_all_async")
+	  use iso_c_binding, only: c_int
+	  integer (c_int), value :: a
+	end subroutine
+      end interface
+
+      interface acc_init
+	subroutine acc_init (d)
+     &    bind (C, name = "acc_init")
+	  use iso_c_binding, only: c_int
+	  integer (c_int), value :: d
+	end subroutine
+      end interface
+
+      interface acc_shutdown
+	subroutine acc_shutdown (d)
+     &    bind (C, name = "acc_shutdown")
+	  use iso_c_binding, only: c_int
+	  integer (c_int), value :: d
+	end subroutine
+      end interface
+
+      interface acc_on_device
+        function acc_on_device_h (devicetype)
+          import acc_device_kind
+          logical acc_on_device_h
+          integer (acc_device_kind) devicetype
+        end function
+      end interface
+
+      ! acc_malloc: Only available in C/C++
+      ! acc_free: Only available in C/C++
+
+      interface acc_copyin
+        subroutine acc_copyin_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_copyin_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_copyin_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      interface acc_present_or_copyin
+        subroutine acc_present_or_copyin_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_present_or_copyin_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_present_or_copyin_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      interface acc_pcopyin
+        subroutine acc_pcopyin_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_pcopyin_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_pcopyin_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      interface acc_create
+        subroutine acc_create_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_create_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_create_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      interface acc_present_or_create
+        subroutine acc_present_or_create_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_present_or_create_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_present_or_create_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      interface acc_pcreate
+        subroutine acc_pcreate_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_pcreate_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_pcreate_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      interface acc_copyout
+        subroutine acc_copyout_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_copyout_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_copyout_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      interface acc_delete
+        subroutine acc_delete_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_delete_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_delete_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      interface acc_update_device
+        subroutine acc_update_device_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_update_device_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_update_device_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      interface acc_update_self
+        subroutine acc_update_self_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_update_self_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_update_self_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      ! acc_map_data: Only available in C/C++
+      ! acc_unmap_data: Only available in C/C++
+      ! acc_deviceptr: Only available in C/C++
+      ! acc_hostptr: Only available in C/C++
+
+      interface acc_is_present
+        function acc_is_present_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          logical acc_is_present_32_h
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end function
+
+        function acc_is_present_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          logical acc_is_present_64_h
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end function
+
+        function acc_is_present_array_h (a)
+          logical acc_is_present_array_h
+          type (*), dimension (..), contiguous :: a
+        end function
+      end interface
+
+      ! acc_memcpy_to_device: Only available in C/C++
+      ! acc_memcpy_from_device: Only available in C/C++
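Each Fortran generic interface above (for example acc_copyin) bundles three specifics: one with a 32-bit length, one with a 64-bit length, and an assumed-rank array form. The compiler picks one of these from the kinds, ranks and number of the actual arguments. The array form has no clean C counterpart. The length-kind selection, however, can be sketched in C11 with _Generic. In this sketch, all demo_* names are hypothetical and are not part of libgomp. The common worker only records the length instead of mapping memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Records the last length seen; a real runtime would map LEN bytes at A.  */
static size_t demo_last_len;

/* Hypothetical common worker shared by the width-specific entry points.  */
void
demo_copyin_bytes (void *a, size_t len)
{
  (void) a;
  demo_last_len = len;
}

/* Width-specific entry points, analogous to acc_copyin_32_h and
   acc_copyin_64_h: both funnel into the same worker.  */
void
demo_copyin_32 (void *a, int32_t len)
{
  assert (len >= 0);
  demo_copyin_bytes (a, (size_t) len);
}

void
demo_copyin_64 (void *a, int64_t len)
{
  assert (len >= 0);
  demo_copyin_bytes (a, (size_t) len);
}

/* Generic name: selects the entry point from the type of LEN, much as a
   Fortran generic interface resolves a call by argument kind.  */
#define demo_copyin(a, len)                                     \
  _Generic ((len), int32_t: demo_copyin_32,                     \
                   int64_t: demo_copyin_64) ((a), (len))
```

With this setup, `demo_copyin (buf, (int32_t) 16)` reaches the 32-bit entry point, and a 64-bit length reaches the other one. Both end up in a single implementation, which mirrors how the Fortran wrappers can share one C routine.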
diff --git a/libgomp/plugin/Makefrag.am b/libgomp/plugin/Makefrag.am
new file mode 100644
index 0000000..d6642d9
--- /dev/null
+++ b/libgomp/plugin/Makefrag.am
@@ -0,0 +1,47 @@
+# Plugins for offload execution, Makefile.am fragment.
+#
+# Copyright (C) 2014 Free Software Foundation, Inc.
+#
+# Contributed by Mentor Embedded.
+#
+# This file is part of the GNU OpenMP Library (libgomp).
+#
+# Libgomp is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+# FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+# more details.
+#
+# Under Section 7 of GPL version 3, you are granted additional
+# permissions described in the GCC Runtime Library Exception, version
+# 3.1, as published by the Free Software Foundation.
+#
+# You should have received a copy of the GNU General Public License and
+# a copy of the GCC Runtime Library Exception along with this program;
+# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+# <http://www.gnu.org/licenses/>.
+
+if PLUGIN_NVPTX
+# Nvidia PTX OpenACC plugin.
+libgomp_plugin_nvptx_version_info = -version-info $(libtool_VERSION)
+toolexeclib_LTLIBRARIES += libgomp-plugin-nvptx.la
+libgomp_plugin_nvptx_la_SOURCES = plugin/plugin-nvptx.c
+libgomp_plugin_nvptx_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_NVPTX_CPPFLAGS)
+libgomp_plugin_nvptx_la_LDFLAGS = $(libgomp_plugin_nvptx_version_info) \
+	$(lt_host_flags)
+libgomp_plugin_nvptx_la_LDFLAGS += $(PLUGIN_NVPTX_LDFLAGS)
+libgomp_plugin_nvptx_la_LIBADD = $(PLUGIN_NVPTX_LIBS)
+libgomp_plugin_nvptx_la_LIBTOOLFLAGS = --tag=disable-static
+endif
+
+libgomp_plugin_host_nonshm_version_info = -version-info $(libtool_VERSION)
+toolexeclib_LTLIBRARIES += libgomp-plugin-host_nonshm.la
+libgomp_plugin_host_nonshm_la_SOURCES = plugin/plugin-host.c
+libgomp_plugin_host_nonshm_la_CPPFLAGS = $(AM_CPPFLAGS) -DHOST_NONSHM_PLUGIN
+libgomp_plugin_host_nonshm_la_LDFLAGS = \
+	$(libgomp_plugin_host_nonshm_version_info) $(lt_host_flags)
+libgomp_plugin_host_nonshm_la_LIBTOOLFLAGS = --tag=disable-static
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
new file mode 100644
index 0000000..68c7dc7
--- /dev/null
+++ b/libgomp/plugin/configfrag.ac
@@ -0,0 +1,107 @@
+# Plugins for offload execution, configure.ac fragment.
+#
+# Copyright (C) 2014 Free Software Foundation, Inc.
+#
+# Contributed by Mentor Embedded.
+#
+# This file is part of the GNU OpenMP Library (libgomp).
+#
+# Libgomp is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+# FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+# more details.
+#
+# Under Section 7 of GPL version 3, you are granted additional
+# permissions described in the GCC Runtime Library Exception, version
+# 3.1, as published by the Free Software Foundation.
+#
+# You should have received a copy of the GNU General Public License and
+# a copy of the GCC Runtime Library Exception along with this program;
+# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+# <http://www.gnu.org/licenses/>.
+
+# Look for the CUDA driver package.
+CUDA_DRIVER_INCLUDE=
+CUDA_DRIVER_LIB=
+AC_SUBST(CUDA_DRIVER_INCLUDE)
+AC_SUBST(CUDA_DRIVER_LIB)
+CUDA_DRIVER_CPPFLAGS=
+CUDA_DRIVER_LDFLAGS=
+AC_ARG_WITH(cuda-driver,
+	[AS_HELP_STRING([--with-cuda-driver=PATH],
+		[specify prefix directory for installed CUDA driver package.
+		 Equivalent to --with-cuda-driver-include=PATH/include
+		 plus --with-cuda-driver-lib=PATH/lib])])
+AC_ARG_WITH(cuda-driver-include,
+	[AS_HELP_STRING([--with-cuda-driver-include=PATH],
+		[specify directory for installed CUDA driver include files])])
+AC_ARG_WITH(cuda-driver-lib,
+	[AS_HELP_STRING([--with-cuda-driver-lib=PATH],
+		[specify directory for the installed CUDA driver library])])
+if test "x$with_cuda_driver" != x; then
+  CUDA_DRIVER_INCLUDE=$with_cuda_driver/include
+  CUDA_DRIVER_LIB=$with_cuda_driver/lib
+fi
+if test "x$with_cuda_driver_include" != x; then
+  CUDA_DRIVER_INCLUDE=$with_cuda_driver_include
+fi
+if test "x$with_cuda_driver_lib" != x; then
+  CUDA_DRIVER_LIB=$with_cuda_driver_lib
+fi
+if test "x$CUDA_DRIVER_INCLUDE" != x; then
+  CUDA_DRIVER_CPPFLAGS=-I$CUDA_DRIVER_INCLUDE
+fi
+if test "x$CUDA_DRIVER_LIB" != x; then
+  CUDA_DRIVER_LDFLAGS=-L$CUDA_DRIVER_LIB
+fi
+
+PLUGIN_NVPTX=0
+PLUGIN_NVPTX_CPPFLAGS=
+PLUGIN_NVPTX_LDFLAGS=
+PLUGIN_NVPTX_LIBS=
+AC_SUBST(PLUGIN_NVPTX)
+AC_SUBST(PLUGIN_NVPTX_CPPFLAGS)
+AC_SUBST(PLUGIN_NVPTX_LDFLAGS)
+AC_SUBST(PLUGIN_NVPTX_LIBS)
+
+for accel in `echo $enable_offload_targets | sed -e 's#,# #g'`; do
+  case "$accel" in
+    nvptx*)
+      PLUGIN_NVPTX=$accel
+      PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
+      PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
+      PLUGIN_NVPTX_LIBS='-lcuda'
+
+      PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
+      CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
+      PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
+      LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
+      PLUGIN_NVPTX_save_LIBS=$LIBS
+      LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
+      AC_LINK_IFELSE(
+	[AC_LANG_PROGRAM(
+	  [#include "cuda.h"],
+	  [CUresult r = cuCtxPushCurrent (NULL);])],
+	[PLUGIN_NVPTX=1])
+      CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
+      LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
+      LIBS=$PLUGIN_NVPTX_save_LIBS
+      case $PLUGIN_NVPTX in
+	nvptx*)
+	  PLUGIN_NVPTX=0
+	  AC_MSG_ERROR([CUDA driver package required for nvptx support])
+	  ;;
+      esac
+      ;;
+  esac
+done
+AM_CONDITIONAL([PLUGIN_NVPTX], [test $PLUGIN_NVPTX = 1])
+AC_DEFINE_UNQUOTED([PLUGIN_NVPTX], [$PLUGIN_NVPTX],
+		  [Define to 1 if the NVIDIA plugin is built, 0 if not.])
+
+AC_OUTPUT
diff --git a/libgomp/plugin/plugin-host.c b/libgomp/plugin/plugin-host.c
new file mode 100644
index 0000000..aee3c4e
--- /dev/null
+++ b/libgomp/plugin/plugin-host.c
@@ -0,0 +1,269 @@
+/* OpenACC Runtime Library: acc_device_host, acc_device_host_nonshm.
+
+   Copyright (C) 2013 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Simple implementation of support routines for a shared-memory
+   acc_device_host, and a non-shared memory acc_device_host_nonshm, with the
+   latter built as a plugin.  */
+
+#include "openacc.h"
+#include "config.h"
+#include "libgomp.h"
+#include "libgomp_target.h"
+#ifdef HOST_NONSHM_PLUGIN
+#include "libgomp-plugin.h"
+#include "oacc-plugin.h"
+#else
+#include "oacc-int.h"
+#endif
+
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdio.h>
+
+#ifdef HOST_NONSHM_PLUGIN
+#define STATIC
+#define GOMP(X) GOMP_PLUGIN_##X
+#define SELF "host_nonshm plugin: "
+#else
+#define STATIC static
+#define GOMP(X) gomp_##X
+#define SELF "host: "
+#endif
+
+#ifndef HOST_NONSHM_PLUGIN
+static struct gomp_device_descr host_dispatch;
+#endif
+
+STATIC const char *
+GOMP_OFFLOAD_get_name (void)
+{
+#ifdef HOST_NONSHM_PLUGIN
+  return "host_nonshm";
+#else
+  return "host";
+#endif
+}
+
+STATIC int
+GOMP_OFFLOAD_get_type (void)
+{
+#ifdef HOST_NONSHM_PLUGIN
+  return OFFLOAD_TARGET_TYPE_HOST_NONSHM;
+#else
+  return OFFLOAD_TARGET_TYPE_HOST;
+#endif
+}
+
+STATIC unsigned int
+GOMP_OFFLOAD_get_caps (void)
+{
+  unsigned int caps = TARGET_CAP_OPENACC_200 | TARGET_CAP_NATIVE_EXEC;
+
+#ifndef HOST_NONSHM_PLUGIN
+  caps |= TARGET_CAP_SHARED_MEM;
+#endif
+
+  return caps;
+}
+
+STATIC int
+GOMP_OFFLOAD_get_num_devices (void)
+{
+  return 1;
+}
+
+STATIC void
+GOMP_OFFLOAD_register_image (void *host_table __attribute__((unused)),
+			     void *target_data __attribute__((unused)))
+{
+}
+
+STATIC void
+GOMP_OFFLOAD_init_device (int n __attribute__((unused)))
+{
+}
+
+STATIC void
+GOMP_OFFLOAD_fini_device (int n __attribute__((unused)))
+{
+}
+
+STATIC int
+GOMP_OFFLOAD_get_table (int n __attribute__((unused)),
+			struct mapping_table **table __attribute__((unused)))
+{
+  return 0;
+}
+
+STATIC void *
+GOMP_OFFLOAD_openacc_open_device (int n)
+{
+  return (void *) (intptr_t) n;
+}
+
+STATIC int
+GOMP_OFFLOAD_openacc_close_device (void *hnd)
+{
+  return 0;
+}
+
+STATIC int
+GOMP_OFFLOAD_openacc_get_device_num (void)
+{
+  return 0;
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_set_device_num (int n)
+{
+  if (n > 0)
+    GOMP(fatal) ("device number %d out of range for host execution", n);
+}
+
+STATIC void *
+GOMP_OFFLOAD_alloc (int n __attribute__((unused)), size_t s)
+{
+  return GOMP(malloc) (s);
+}
+
+STATIC void
+GOMP_OFFLOAD_free (int n __attribute__((unused)), void *p)
+{
+  free (p);
+}
+
+STATIC void *
+GOMP_OFFLOAD_host2dev (int n __attribute__((unused)), void *d, const void *h,
+		       size_t s)
+{
+#ifdef HOST_NONSHM_PLUGIN
+  memcpy (d, h, s);
+#endif
+
+  return 0;
+}
+
+STATIC void *
+GOMP_OFFLOAD_dev2host (int n __attribute__((unused)), void *h, const void *d,
+		       size_t s)
+{
+#ifdef HOST_NONSHM_PLUGIN
+  memcpy (h, d, s);
+#endif
+
+  return 0;
+}
+
+STATIC void
+GOMP_OFFLOAD_run (int n __attribute__((unused)), void *fn_ptr, void *vars)
+{
+  void (*fn)(void *) = (void (*)(void *)) fn_ptr;
+
+  fn (vars);
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_parallel (void (*fn) (void *),
+			       size_t mapnum __attribute__((unused)),
+			       void **hostaddrs,
+			       void **devaddrs __attribute__((unused)),
+			       size_t *sizes __attribute__((unused)),
+			       unsigned short *kinds __attribute__((unused)),
+			       int num_gangs __attribute__((unused)),
+			       int num_workers __attribute__((unused)),
+			       int vector_length __attribute__((unused)),
+			       int async __attribute__((unused)),
+			       void *targ_mem_desc __attribute__((unused)))
+{
+#ifdef HOST_NONSHM_PLUGIN
+  fn (devaddrs);
+#else
+  fn (hostaddrs);
+#endif
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_register_async_cleanup (void *targ_mem_desc)
+{
+#ifdef HOST_NONSHM_PLUGIN
+  /* "Asynchronous" launches are executed synchronously on the (non-SHM) host,
+     so there's no point in delaying host-side cleanup -- just do it now.  */
+  GOMP_PLUGIN_async_unmap_vars (targ_mem_desc);
+#endif
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_async_set_async (int async __attribute__((unused)))
+{
+}
+
+STATIC int
+GOMP_OFFLOAD_openacc_async_test (int async __attribute__((unused)))
+{
+  return 1;
+}
+
+STATIC int
+GOMP_OFFLOAD_openacc_async_test_all (void)
+{
+  return 1;
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_async_wait (int async __attribute__((unused)))
+{
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_async_wait_all (void)
+{
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_async_wait_async (int async1 __attribute__((unused)),
+				       int async2 __attribute__((unused)))
+{
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_async_wait_all_async (int async __attribute__((unused)))
+{
+}
+
+STATIC void *
+GOMP_OFFLOAD_openacc_create_thread_data (void *targ_data
+					 __attribute__((unused)))
+{
+  return NULL;
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_destroy_thread_data (void *tls_data
+					  __attribute__((unused)))
+{
+}
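plugin-host.c is built twice from one source. Without HOST_NONSHM_PLUGIN, GOMP_OFFLOAD_get_caps adds TARGET_CAP_SHARED_MEM, the host2dev/dev2host hooks do nothing, and GOMP_OFFLOAD_openacc_parallel hands hostaddrs to the kernel. The host_nonshm plugin omits that capability, really copies with memcpy, and hands over devaddrs instead. The following is a minimal sketch of how a caller might key its mapping on such a capability bit. The DEMO_CAP_* values and the demo_* functions are hypothetical; they do not reproduce libgomp's TARGET_CAP_* constants or its target.c logic.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical capability bits (not the real TARGET_CAP_* values).  */
enum
{
  DEMO_CAP_SHARED_MEM  = 1u << 0,
  DEMO_CAP_NATIVE_EXEC = 1u << 1   /* Irrelevant to mapping; shown for parity.  */
};

/* Return the address offloaded code should use for HOST.  With shared
   memory the host address is used directly; otherwise SIZE bytes are
   copied into a separate "device" buffer.  Returns NULL if allocation
   fails.  */
void *
demo_map_to_device (unsigned int caps, void *host, size_t size)
{
  void *dev;

  if (caps & DEMO_CAP_SHARED_MEM)
    return host;               /* Like acc_device_host: no copy needed.  */

  dev = malloc (size);         /* Like host_nonshm: a distinct copy.  */
  if (dev != NULL)
    memcpy (dev, host, size);
  return dev;
}

/* Undo demo_map_to_device: copy results back and release the buffer,
   but only when a separate copy was actually made.  */
void
demo_unmap_from_device (unsigned int caps, void *host, void *dev, size_t size)
{
  if (caps & DEMO_CAP_SHARED_MEM)
    return;                    /* Nothing was copied, nothing to free.  */

  assert (dev != NULL && dev != host);
  memcpy (host, dev, size);
  free (dev);
}
```

Running the non-shared-memory path on the host, as host_nonshm does, exercises the same copy-in and copy-back logic that a real accelerator needs, without requiring any device hardware.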
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
new file mode 100644
index 0000000..3d1b81b
--- /dev/null
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -0,0 +1,1852 @@
+/* Plugin for NVPTX execution.
+
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Nvidia PTX-specific parts of OpenACC support.  The CUDA driver
+   library appears to hold some implicit state, but the documentation
+   is not clear as to what that state might be, or how one might
+   propagate it from one thread to another.  */
+
+#include "openacc.h"
+#include "config.h"
+#include "libgomp.h"
+#include "libgomp_target.h"
+#include "libgomp-plugin.h"
+#include "oacc-plugin.h"
+
+#include <cuda.h>
+#include <stdint.h>
+#include <string.h>
+#include <stdio.h>
+#include <dlfcn.h>
+#include <unistd.h>
+#include <assert.h>
+
+#define	ARRAYSIZE(X) (sizeof (X) / sizeof ((X)[0]))
+
+static struct
+{
+  CUresult r;
+  char *m;
+} cuda_errlist[]=
+{
+  { CUDA_ERROR_INVALID_VALUE, "invalid value" },
+  { CUDA_ERROR_OUT_OF_MEMORY, "out of memory" },
+  { CUDA_ERROR_NOT_INITIALIZED, "not initialized" },
+  { CUDA_ERROR_DEINITIALIZED, "deinitialized" },
+  { CUDA_ERROR_PROFILER_DISABLED, "profiler disabled" },
+  { CUDA_ERROR_PROFILER_NOT_INITIALIZED, "profiler not initialized" },
+  { CUDA_ERROR_PROFILER_ALREADY_STARTED, "profiler already started" },
+  { CUDA_ERROR_PROFILER_ALREADY_STOPPED, "profiler already stopped" },
+  { CUDA_ERROR_NO_DEVICE, "no device" },
+  { CUDA_ERROR_INVALID_DEVICE, "invalid device" },
+  { CUDA_ERROR_INVALID_IMAGE, "invalid image" },
+  { CUDA_ERROR_INVALID_CONTEXT, "invalid context" },
+  { CUDA_ERROR_CONTEXT_ALREADY_CURRENT, "context already current" },
+  { CUDA_ERROR_MAP_FAILED, "map error" },
+  { CUDA_ERROR_UNMAP_FAILED, "unmap error" },
+  { CUDA_ERROR_ARRAY_IS_MAPPED, "array is mapped" },
+  { CUDA_ERROR_ALREADY_MAPPED, "already mapped" },
+  { CUDA_ERROR_NO_BINARY_FOR_GPU, "no binary for gpu" },
+  { CUDA_ERROR_ALREADY_ACQUIRED, "already acquired" },
+  { CUDA_ERROR_NOT_MAPPED, "not mapped" },
+  { CUDA_ERROR_NOT_MAPPED_AS_ARRAY, "not mapped as array" },
+  { CUDA_ERROR_NOT_MAPPED_AS_POINTER, "not mapped as pointer" },
+  { CUDA_ERROR_ECC_UNCORRECTABLE, "ecc uncorrectable" },
+  { CUDA_ERROR_UNSUPPORTED_LIMIT, "unsupported limit" },
+  { CUDA_ERROR_CONTEXT_ALREADY_IN_USE, "context already in use" },
+  { CUDA_ERROR_PEER_ACCESS_UNSUPPORTED, "peer access unsupported" },
+  { CUDA_ERROR_INVALID_SOURCE, "invalid source" },
+  { CUDA_ERROR_FILE_NOT_FOUND, "file not found" },
+  { CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND,
+                                           "shared object symbol not found" },
+  { CUDA_ERROR_SHARED_OBJECT_INIT_FAILED, "shared object init error" },
+  { CUDA_ERROR_OPERATING_SYSTEM, "operating system" },
+  { CUDA_ERROR_INVALID_HANDLE, "invalid handle" },
+  { CUDA_ERROR_NOT_FOUND, "not found" },
+  { CUDA_ERROR_NOT_READY, "not ready" },
+  { CUDA_ERROR_LAUNCH_FAILED, "launch error" },
+  { CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES, "launch out of resources" },
+  { CUDA_ERROR_LAUNCH_TIMEOUT, "launch timeout" },
+  { CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING,
+                                            "launch incompatible texturing" },
+  { CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED, "peer access already enabled" },
+  { CUDA_ERROR_PEER_ACCESS_NOT_ENABLED, "peer access not enabled" },
+  { CUDA_ERROR_PRIMARY_CONTEXT_ACTIVE, "primary context active" },
+  { CUDA_ERROR_CONTEXT_IS_DESTROYED, "context is destroyed" },
+  { CUDA_ERROR_ASSERT, "assert" },
+  { CUDA_ERROR_TOO_MANY_PEERS, "too many peers" },
+  { CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED,
+                                           "host memory already registered" },
+  { CUDA_ERROR_HOST_MEMORY_NOT_REGISTERED, "host memory not registered" },
+  { CUDA_ERROR_NOT_PERMITTED, "not permitted" },
+  { CUDA_ERROR_NOT_SUPPORTED, "not supported" },
+  { CUDA_ERROR_UNKNOWN, "unknown" }
+};
+
+static char errmsg[128];
+
+static char *
+cuda_error (CUresult r)
+{
+  int i;
+
+  for (i = 0; i < ARRAYSIZE (cuda_errlist); i++)
+    {
+      if (cuda_errlist[i].r == r)
+	return &cuda_errlist[i].m[0];
+    }
+
+  sprintf (&errmsg[0], "unknown result code: %5d", r);
+
+  return &errmsg[0];
+}
+
+struct targ_fn_descriptor
+{
+  CUfunction fn;
+  const char *name;
+};
+
+static bool ptx_inited = false;
+
+struct ptx_stream
+{
+  CUstream stream;
+  pthread_t host_thread;
+  bool multithreaded;
+
+  CUdeviceptr d;
+  void *h;
+  void *h_begin;
+  void *h_end;
+  void *h_next;
+  void *h_prev;
+  void *h_tail;
+
+  struct ptx_stream *next;
+};
+
+/* Thread-specific data for PTX.  */
+
+struct nvptx_thread
+{
+  struct ptx_stream *current_stream;
+  struct ptx_device *ptx_dev;
+};
+
+struct map
+{
+  int     async;
+  size_t  size;
+  char    mappings[0];
+};
+
+static void
+map_init (struct ptx_stream *s)
+{
+  CUresult r;
+
+  int size = getpagesize ();
+
+  assert (s);
+  assert (!s->d);
+  assert (!s->h);
+
+  r = cuMemAllocHost (&s->h, size);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemAllocHost error: %s", cuda_error (r));
+
+  r = cuMemHostGetDevicePointer (&s->d, s->h, 0);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemHostGetDevicePointer error: %s", cuda_error (r));
+
+  assert (s->h);
+
+  s->h_begin = s->h;
+  s->h_end = s->h_begin + size;
+  s->h_next = s->h_prev = s->h_tail = s->h_begin;
+
+  assert (s->h_next);
+  assert (s->h_end);
+}
+
+static void
+map_fini (struct ptx_stream *s)
+{
+  CUresult r;
+  
+  r = cuMemFreeHost (s->h);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemFreeHost error: %s", cuda_error (r));
+}
+
+static void
+map_pop (struct ptx_stream *s)
+{
+  struct map *m;
+
+  assert (s != NULL);
+  assert (s->h_next);
+  assert (s->h_prev);
+  assert (s->h_tail);
+
+  m = s->h_tail;
+
+  s->h_tail += m->size;
+
+  if (s->h_tail >= s->h_end)
+    s->h_tail = s->h_begin + (int) (s->h_tail - s->h_end);
+
+  if (s->h_next == s->h_tail)
+    s->h_prev = s->h_next;
+
+  assert (s->h_next >= s->h_begin);
+  assert (s->h_tail >= s->h_begin);
+  assert (s->h_prev >= s->h_begin);
+
+  assert (s->h_next <= s->h_end);
+  assert (s->h_tail <= s->h_end);
+  assert (s->h_prev <= s->h_end);
+}
+
+static void
+map_push (struct ptx_stream *s, int async, size_t size, void **h, void **d)
+{
+  int left;
+  int offset;
+  struct map *m;
+
+  assert (s != NULL);
+
+  left = s->h_end - s->h_next;
+  size += sizeof (struct map);
+
+  assert (s->h_prev);
+  assert (s->h_next);
+
+  if (size >= left)
+    {
+      m = s->h_prev;
+      m->size += left;
+      s->h_next = s->h_begin;
+
+      if (s->h_next + size > s->h_end)
+	GOMP_PLUGIN_fatal ("unable to push map");
+    }
+
+  assert (s->h_next);
+
+  m = s->h_next;
+  m->async = async;
+  m->size = size;
+
+  offset = (void *)&m->mappings[0] - s->h;
+
+  *d = (void *)(s->d + offset);
+  *h = (void *)(s->h + offset);
+
+  s->h_prev = s->h_next;
+  s->h_next += size;
+
+  assert (s->h_prev);
+  assert (s->h_next);
+
+  assert (s->h_next >= s->h_begin);
+  assert (s->h_tail >= s->h_begin);
+  assert (s->h_prev >= s->h_begin);
+  assert (s->h_next <= s->h_end);
+  assert (s->h_tail <= s->h_end);
+  assert (s->h_prev <= s->h_end);
+
+  return;
+}
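+
+/* A worked example of the wrap-around in map_push, with illustrative numbers
+   only (assuming a 4096-byte page and a 16-byte struct map header, as on a
+   typical LP64 host): if h_next is at offset 4000, LEFT is 96.  Pushing 12
+   argument pointers needs SIZE = 12 * 8 + 16 = 112 bytes, which is >= LEFT,
+   so the entry at h_prev is grown by 96 bytes to end exactly at h_end (map_pop
+   then wraps h_tail back to h_begin when it pops that entry), and the new
+   entry is placed at offset 0, leaving h_next at offset 112.  */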
+
+struct ptx_device
+{
+  CUcontext ctx;
+  bool ctx_shared;
+  CUdevice dev;
+  struct ptx_stream *null_stream;
+  /* All non-null streams associated with this device (actually context),
+     either created implicitly or passed in from the user (via
+     acc_set_cuda_stream).  */
+  struct ptx_stream *active_streams;
+  struct {
+    struct ptx_stream **arr;
+    int size;
+  } async_streams;
+  /* A lock for use when manipulating the above stream list and array.  */
+  gomp_mutex_t stream_lock;
+  int ord;
+  bool overlap;
+  bool map;
+  bool concur;
+  int  mode;
+  bool mkern;
+
+  struct ptx_device *next;
+};
+
+enum PTX_event_type
+{
+  PTX_EVT_MEM,
+  PTX_EVT_KNL,
+  PTX_EVT_SYNC,
+  PTX_EVT_ASYNC_CLEANUP
+};
+
+struct PTX_event
+{
+  CUevent *evt;
+  int type;
+  void *addr;
+  int ord;
+
+  struct PTX_event *next;
+};
+
+static gomp_mutex_t PTX_event_lock;
+static struct PTX_event *PTX_events;
+
+#define _XSTR(s) _STR(s)
+#define _STR(s) #s
+
+static struct _synames
+{
+  const char *n;
+} cuSymNames[] =
+{
+  { _XSTR (cuCtxCreate) },
+  { _XSTR (cuCtxDestroy) },
+  { _XSTR (cuCtxGetCurrent) },
+  { _XSTR (cuCtxPushCurrent) },
+  { _XSTR (cuCtxSynchronize) },
+  { _XSTR (cuDeviceGet) },
+  { _XSTR (cuDeviceGetAttribute) },
+  { _XSTR (cuDeviceGetCount) },
+  { _XSTR (cuEventCreate) },
+  { _XSTR (cuEventDestroy) },
+  { _XSTR (cuEventQuery) },
+  { _XSTR (cuEventRecord) },
+  { _XSTR (cuInit) },
+  { _XSTR (cuLaunchKernel) },
+  { _XSTR (cuLinkAddData) },
+  { _XSTR (cuLinkComplete) },
+  { _XSTR (cuLinkCreate) },
+  { _XSTR (cuMemAlloc) },
+  { _XSTR (cuMemAllocHost) },
+  { _XSTR (cuMemcpy) },
+  { _XSTR (cuMemcpyDtoH) },
+  { _XSTR (cuMemcpyDtoHAsync) },
+  { _XSTR (cuMemcpyHtoD) },
+  { _XSTR (cuMemcpyHtoDAsync) },
+  { _XSTR (cuMemFree) },
+  { _XSTR (cuMemFreeHost) },
+  { _XSTR (cuMemGetAddressRange) },
+  { _XSTR (cuMemHostGetDevicePointer) },
+  { _XSTR (cuMemHostRegister) },
+  { _XSTR (cuMemHostUnregister) },
+  { _XSTR (cuModuleGetFunction) },
+  { _XSTR (cuModuleLoadData) },
+  { _XSTR (cuStreamCreate) },
+  { _XSTR (cuStreamDestroy) },
+  { _XSTR (cuStreamQuery) },
+  { _XSTR (cuStreamSynchronize) },
+  { _XSTR (cuStreamWaitEvent) }
+};
+
+static int
+verify_device_library (void)
+{
+  int i;
+  void *dh, *ds;
+
+  dh = dlopen ("libcuda.so", RTLD_LAZY);
+  if (!dh)
+    return -1;
+
+  for (i = 0; i < ARRAYSIZE (cuSymNames); i++)
+    {
+      ds = dlsym (dh, cuSymNames[i].n);
+      if (!ds)
+	{
+	  dlclose (dh);
+	  return -1;
+	}
+    }
+
+  dlclose (dh);
+  
+  return 0;
+}
+
+static inline struct nvptx_thread *
+nvptx_thread (void)
+{
+  return (struct nvptx_thread *) GOMP_PLUGIN_acc_thread ();
+}
+
+static void
+init_streams_for_device (struct ptx_device *ptx_dev, int concurrency)
+{
+  int i;
+  struct ptx_stream *null_stream
+    = GOMP_PLUGIN_malloc (sizeof (struct ptx_stream));
+
+  null_stream->stream = NULL;
+  null_stream->host_thread = pthread_self ();
+  null_stream->multithreaded = true;
+  null_stream->d = (CUdeviceptr) NULL;
+  null_stream->h = NULL;
+  map_init (null_stream);
+  ptx_dev->null_stream = null_stream;
+  
+  ptx_dev->active_streams = NULL;
+  GOMP_PLUGIN_mutex_init (&ptx_dev->stream_lock);
+  
+  if (concurrency < 1)
+    concurrency = 1;
+  
+  /* This is just a guess -- make space for as many async streams as the
+     current device is capable of concurrently executing.  This can grow
+     later as necessary.  No streams are created yet.  */
+  ptx_dev->async_streams.arr
+    = GOMP_PLUGIN_malloc (concurrency * sizeof (struct ptx_stream *));
+  ptx_dev->async_streams.size = concurrency;
+  
+  for (i = 0; i < concurrency; i++)
+    ptx_dev->async_streams.arr[i] = NULL;
+}
+
+static void
+fini_streams_for_device (struct ptx_device *ptx_dev)
+{
+  free (ptx_dev->async_streams.arr);
+  
+  while (ptx_dev->active_streams != NULL)
+    {
+      struct ptx_stream *s = ptx_dev->active_streams;
+      ptx_dev->active_streams = ptx_dev->active_streams->next;
+
+      cuStreamDestroy (s->stream);
+      map_fini (s);
+      free (s);
+    }
+  
+  map_fini (ptx_dev->null_stream);
+  free (ptx_dev->null_stream);
+}
+
+/* Select a stream for (OpenACC-semantics) ASYNC argument for the current
+   thread THREAD (and also current device/context).  If CREATE is true, create
+   the stream if it does not exist (or use EXISTING if it is non-NULL), and
+   associate the stream with the same thread argument.  Returns stream to use
+   as result.  */
+
+static struct ptx_stream *
+select_stream_for_async (int async, pthread_t thread, bool create,
+			 CUstream existing)
+{
+  struct nvptx_thread *nvthd = nvptx_thread ();
+  /* Local copy of TLS variable.  */
+  struct ptx_device *ptx_dev = nvthd->ptx_dev;
+  struct ptx_stream *stream = NULL;
+  int orig_async = async;
+  
+  /* The special value acc_async_noval (-1) maps (for now) to an
+     implicitly-created stream, which is then handled the same as any other
+     numbered async stream.  Other options are available, e.g. using the null
+     stream for anonymous async operations, or choosing an idle stream from an
+     active set.  But, stick with this for now.  */
+  if (async > acc_async_sync)
+    async++;
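+
+  /* For example, with the OpenACC 2.0 values acc_async_noval == -1 and
+     acc_async_sync == -2, the adjustment above gives:
+
+       acc_async_sync (-2)   ->  null stream
+       acc_async_noval (-1)  ->  async_streams.arr[0]
+       async 0               ->  async_streams.arr[1]
+       async N               ->  async_streams.arr[N + 1]
+
+     Any other negative value is rejected below as a bad async value.  */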
+  
+  if (create)
+    GOMP_PLUGIN_mutex_lock (&ptx_dev->stream_lock);
+
+  /* NOTE: AFAICT there's no particular need for acc_async_sync to map to the
+     null stream, and in fact better performance may be obtainable if it doesn't
+     (because the null stream enforces overly-strict synchronisation with
+     respect to other streams for legacy reasons, and that's probably not
+     needed with OpenACC).  Maybe investigate later.  */
+  if (async == acc_async_sync)
+    stream = ptx_dev->null_stream;
+  else if (async >= 0 && async < ptx_dev->async_streams.size
+	   && ptx_dev->async_streams.arr[async] && !(create && existing))
+    stream = ptx_dev->async_streams.arr[async];
+  else if (async >= 0 && create)
+    {
+      if (async >= ptx_dev->async_streams.size)
+	{
+	  int i, newsize = ptx_dev->async_streams.size * 2;
+	  
+	  if (async >= newsize)
+	    newsize = async + 1;
+	  
+	  ptx_dev->async_streams.arr
+	    = GOMP_PLUGIN_realloc (ptx_dev->async_streams.arr,
+				   newsize * sizeof (struct ptx_stream *));
+	  
+	  for (i = ptx_dev->async_streams.size; i < newsize; i++)
+	    ptx_dev->async_streams.arr[i] = NULL;
+	  
+	  ptx_dev->async_streams.size = newsize;
+	}
+
+      /* Create a new stream on-demand if there isn't one already, or if we're
+	 setting a particular async value to an existing (externally-provided)
+	 stream.  */
+      if (!ptx_dev->async_streams.arr[async] || existing)
+        {
+	  CUresult r;
+	  struct ptx_stream *s
+	    = GOMP_PLUGIN_malloc (sizeof (struct ptx_stream));
+
+	  if (existing)
+	    s->stream = existing;
+	  else
+	    {
+	      r = cuStreamCreate (&s->stream, CU_STREAM_DEFAULT);
+	      if (r != CUDA_SUCCESS)
+		GOMP_PLUGIN_fatal ("cuStreamCreate error: %s", cuda_error (r));
+	    }
+	  
+	  /* If CREATE is true, we're going to be queueing some work on this
+	     stream.  Associate it with the current host thread.  */
+	  s->host_thread = thread;
+	  s->multithreaded = false;
+	  
+	  s->d = (CUdeviceptr) NULL;
+	  s->h = NULL;
+	  map_init (s);
+	  
+	  s->next = ptx_dev->active_streams;
+	  ptx_dev->active_streams = s;
+	  ptx_dev->async_streams.arr[async] = s;
+	}
+
+      stream = ptx_dev->async_streams.arr[async];
+    }
+  else if (async < 0)
+    GOMP_PLUGIN_fatal ("bad async %d", async);
+
+  if (create)
+    {
+      assert (stream != NULL);
+
+      /* If we're trying to use the same stream from different threads
+	 simultaneously, set stream->multithreaded to true.  This affects the
+	 behaviour of acc_async_test_all and acc_wait_all, which are supposed to
+	 only wait for asynchronous launches from the same host thread they are
+	 invoked on.  If multiple threads use the same async value, we make note
+	 of that here and fall back to testing/waiting for all threads in those
+	 functions.  */
+      if (thread != stream->host_thread)
+        stream->multithreaded = true;
+
+      GOMP_PLUGIN_mutex_unlock (&ptx_dev->stream_lock);
+    }
+  else if (stream && !stream->multithreaded
+	   && !pthread_equal (stream->host_thread, thread))
+    GOMP_PLUGIN_fatal ("async %d used on wrong thread", orig_async);
+
+  return stream;
+}
+
+static int PTX_get_num_devices (void);
+
+/* Initialize the device.  */
+static int
+PTX_init (void)
+{
+  CUresult r;
+  int rc;
+
+  if (ptx_inited)
+    return PTX_get_num_devices ();
+
+  rc = verify_device_library ();
+  if (rc < 0)
+    return -1;
+
+  r = cuInit (0);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuInit error: %s", cuda_error (r));
+
+  PTX_events = NULL;
+
+  GOMP_PLUGIN_mutex_init (&PTX_event_lock);
+
+  ptx_inited = true;
+
+  return PTX_get_num_devices ();
+}
+
+static void
+PTX_fini (void)
+{
+  ptx_inited = false;
+}
+
+static void *
+PTX_open_device (int n)
+{
+  struct ptx_device *ptx_dev;
+  CUdevice dev;
+  CUresult r;
+  int async_engines, pi;
+
+  r = cuDeviceGet (&dev, n);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGet error: %s", cuda_error (r));
+
+  ptx_dev = GOMP_PLUGIN_malloc (sizeof (struct ptx_device));
+
+  ptx_dev->ord = n;
+  ptx_dev->dev = dev;
+  ptx_dev->ctx_shared = false;
+
+  r = cuCtxGetCurrent (&ptx_dev->ctx);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuda_error (r));
+
+  if (!ptx_dev->ctx)
+    {
+      r = cuCtxCreate (&ptx_dev->ctx, CU_CTX_SCHED_AUTO, dev);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuCtxCreate error: %s", cuda_error (r));
+    }
+  else
+    ptx_dev->ctx_shared = true;
+
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_GPU_OVERLAP, dev);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGetAttribute error: %s", cuda_error (r));
+
+  ptx_dev->overlap = pi;
+
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_CAN_MAP_HOST_MEMORY, dev);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGetAttribute error: %s", cuda_error (r));
+
+  ptx_dev->map = pi;
+
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_CONCURRENT_KERNELS, dev);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGetAttribute error: %s", cuda_error (r));
+
+  ptx_dev->concur = pi;
+
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_COMPUTE_MODE, dev);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGetAttribute error: %s", cuda_error (r));
+
+  ptx_dev->mode = pi;
+
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_INTEGRATED, dev);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGetAttribute error: %s", cuda_error (r));
+
+  ptx_dev->mkern = pi;
+
+  r = cuDeviceGetAttribute (&async_engines,
+			    CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT, dev);
+  if (r != CUDA_SUCCESS)
+    async_engines = 1;
+
+  init_streams_for_device (ptx_dev, async_engines);
+
+  return (void *) ptx_dev;
+}
+
+static int
+PTX_close_device (void *targ_data)
+{
+  CUresult r;
+  struct ptx_device *ptx_dev = targ_data;
+
+  if (!ptx_dev)
+    return 0;
+  
+  fini_streams_for_device (ptx_dev);
+
+  if (!ptx_dev->ctx_shared)
+    {
+      r = cuCtxDestroy (ptx_dev->ctx);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuCtxDestroy error: %s", cuda_error (r));
+    }
+
+  free (ptx_dev);
+
+  return 0;
+}
+
+static int
+PTX_get_num_devices (void)
+{
+  int n;
+  CUresult r;
+
+  /* This function will be called before the plugin has been initialized in
+     order to enumerate available devices, but CUDA API routines can't be used
+     until cuInit has been called.  Just call it now (but don't yet do any
+     further initialization).  */
+  if (!ptx_inited)
+    cuInit (0);
+
+  r = cuDeviceGetCount (&n);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGetCount error: %s", cuda_error (r));
+
+  return n;
+}
+
+#define ABORT_PTX				\
+  ".version 3.1\n"				\
+  ".target sm_30\n"				\
+  ".address_size 64\n"				\
+  ".visible .func abort;\n"			\
+  ".visible .func abort\n"			\
+  "{\n"						\
+  "trap;\n"					\
+  "ret;\n"					\
+  "}\n"						\
+  ".visible .func _gfortran_abort;\n"		\
+  ".visible .func _gfortran_abort\n"		\
+  "{\n"						\
+  "trap;\n"					\
+  "ret;\n"					\
+  "}\n"
+
+/* Generated with:
+
+   $ echo 'int acc_on_device(int d) { return __builtin_acc_on_device(d); } int acc_on_device_(int *d) { return acc_on_device(*d); }' | accel-gcc/xgcc -Baccel-gcc -x c - -o - -S -m64 -O3 -fno-builtin-acc_on_device -fno-inline
+*/
+#define ACC_ON_DEVICE_PTX						\
+  "        .version        3.1\n"					\
+  "        .target sm_30\n"						\
+  "        .address_size 64\n"						\
+  ".visible .func (.param.u32 %out_retval)acc_on_device(.param.u32 %in_ar1);\n" \
+  ".visible .func (.param.u32 %out_retval)acc_on_device(.param.u32 %in_ar1)\n" \
+  "{\n"									\
+  "        .reg.u32 %ar1;\n"						\
+  ".reg.u32 %retval;\n"							\
+  "        .reg.u64 %hr10;\n"						\
+  "        .reg.u32 %r24;\n"						\
+  "        .reg.u32 %r25;\n"						\
+  "        .reg.pred %r27;\n"						\
+  "        .reg.u32 %r30;\n"						\
+  "        ld.param.u32 %ar1, [%in_ar1];\n"				\
+  "                mov.u32 %r24, %ar1;\n"				\
+  "                setp.ne.u32 %r27,%r24,4;\n"				\
+  "                set.u32.eq.u32 %r30,%r24,5;\n"			\
+  "                neg.s32 %r25, %r30;\n"				\
+  "        @%r27   bra     $L3;\n"					\
+  "                mov.u32 %r25, 1;\n"					\
+  "$L3:\n"								\
+  "                mov.u32 %retval, %r25;\n"				\
+  "        st.param.u32    [%out_retval], %retval;\n"			\
+  "        ret;\n"							\
+  "        }\n"								\
+  ".visible .func (.param.u32 %out_retval)acc_on_device_(.param.u64 %in_ar1);\n" \
+  ".visible .func (.param.u32 %out_retval)acc_on_device_(.param.u64 %in_ar1)\n" \
+  "{\n"									\
+  "        .reg.u64 %ar1;\n"						\
+  ".reg.u32 %retval;\n"							\
+  "        .reg.u64 %hr10;\n"						\
+  "        .reg.u64 %r25;\n"						\
+  "        .reg.u32 %r26;\n"						\
+  "        .reg.u32 %r27;\n"						\
+  "        ld.param.u64 %ar1, [%in_ar1];\n"				\
+  "                mov.u64 %r25, %ar1;\n"				\
+  "                ld.u32  %r26, [%r25];\n"				\
+  "        {\n"								\
+  "                .param.u32 %retval_in;\n"				\
+  "        {\n"								\
+  "                .param.u32 %out_arg0;\n"				\
+  "                st.param.u32 [%out_arg0], %r26;\n"			\
+  "                call (%retval_in), acc_on_device, (%out_arg0);\n"	\
+  "        }\n"								\
+  "                ld.param.u32    %r27, [%retval_in];\n"		\
+  "}\n"									\
+  "                mov.u32 %retval, %r27;\n"				\
+  "        st.param.u32    [%out_retval], %retval;\n"			\
+  "        ret;\n"							\
+  "        }"
+
+static void
+link_ptx (CUmodule *module, char *ptx_code)
+{
+  CUjit_option opts[7];
+  void *optvals[7];
+  float elapsed = 0.0;
+#define LOGSIZE 8192
+  char elog[LOGSIZE];
+  char ilog[LOGSIZE];
+  unsigned long logsize = LOGSIZE;
+  CUlinkState linkstate;
+  CUresult r;
+  void *linkout;
+  size_t linkoutsize __attribute__((unused));
+
+  GOMP_PLUGIN_notify ("attempting to load:\n---\n%s\n---\n", ptx_code);
+
+  opts[0] = CU_JIT_WALL_TIME;
+  optvals[0] = &elapsed;
+
+  opts[1] = CU_JIT_INFO_LOG_BUFFER;
+  optvals[1] = &ilog[0];
+
+  opts[2] = CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES;
+  optvals[2] = (void *) logsize;
+
+  opts[3] = CU_JIT_ERROR_LOG_BUFFER;
+  optvals[3] = &elog[0];
+
+  opts[4] = CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES;
+  optvals[4] = (void *) logsize;
+
+  opts[5] = CU_JIT_LOG_VERBOSE;
+  optvals[5] = (void *) 1;
+
+  opts[6] = CU_JIT_TARGET;
+  optvals[6] = (void *) CU_TARGET_COMPUTE_30;
+
+  r = cuLinkCreate (7, opts, optvals, &linkstate);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuLinkCreate error: %s", cuda_error (r));
+
+  char *abort_ptx = ABORT_PTX;
+  r = cuLinkAddData (linkstate, CU_JIT_INPUT_PTX, abort_ptx,
+		     strlen (abort_ptx) + 1, 0, 0, 0, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      GOMP_PLUGIN_error ("Link error log %s\n", &elog[0]);
+      GOMP_PLUGIN_fatal ("cuLinkAddData (abort) error: %s", cuda_error (r));
+    }
+
+  char *acc_on_device_ptx = ACC_ON_DEVICE_PTX;
+  r = cuLinkAddData (linkstate, CU_JIT_INPUT_PTX, acc_on_device_ptx,
+		     strlen (acc_on_device_ptx) + 1, 0, 0, 0, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      GOMP_PLUGIN_error ("Link error log %s\n", &elog[0]);
+      GOMP_PLUGIN_fatal ("cuLinkAddData (acc_on_device) error: %s",
+			 cuda_error (r));
+    }
+
+  r = cuLinkAddData (linkstate, CU_JIT_INPUT_PTX, ptx_code,
+		     strlen (ptx_code) + 1, 0, 0, 0, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      GOMP_PLUGIN_error ("Link error log %s\n", &elog[0]);
+      GOMP_PLUGIN_fatal ("cuLinkAddData (ptx_code) error: %s", cuda_error (r));
+    }
+
+  r = cuLinkComplete (linkstate, &linkout, &linkoutsize);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuLinkComplete error: %s", cuda_error (r));
+
+  GOMP_PLUGIN_notify ("Link complete: %fms\n", elapsed);
+  GOMP_PLUGIN_notify ("Link log %s\n", &ilog[0]);
+
+  r = cuModuleLoadData (module, linkout);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuModuleLoadData error: %s", cuda_error (r));
+}
+
+static void
+event_gc (bool memmap_lockable)
+{
+  struct PTX_event *e;
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  GOMP_PLUGIN_mutex_lock (&PTX_event_lock);
+  /* Read the list head only once the lock is held.  */
+  e = PTX_events;
+
+  while (e != NULL)
+    {
+      CUresult r;
+
+      if (e->ord != nvthd->ptx_dev->ord)
+	{
+	  e = e->next;
+	  continue;
+	}
+
+      r = cuEventQuery (*e->evt);
+      if (r == CUDA_SUCCESS)
+	{
+	  CUevent *te;
+
+	  te = e->evt;
+
+	  switch (e->type)
+	    {
+	    case PTX_EVT_MEM:
+	    case PTX_EVT_SYNC:
+	      break;
+	    
+	    case PTX_EVT_KNL:
+	      map_pop (e->addr);
+	      break;
+	    
+	    case PTX_EVT_ASYNC_CLEANUP:
+	      {
+		/* The function GOMP_PLUGIN_async_unmap_vars needs to claim the
+		   memory-map splay tree lock for the current device, so we
+		   can't call it when one of our callers has already claimed
+		   the lock.  In that case, just delay the GC for this event
+		   until later.  */
+		if (!memmap_lockable)
+		  {
+		    e = e->next;
+		    continue;
+		  }
+
+		GOMP_PLUGIN_async_unmap_vars (e->addr);
+	      }
+	      break;
+	    }
+
+	  cuEventDestroy (*te);
+	  free ((void *)te);
+
+	  struct PTX_event *next = e->next;
+
+	  if (PTX_events == e)
+	    PTX_events = PTX_events->next;
+	  else
+	    {
+	      struct PTX_event *e_ = PTX_events;
+	      while (e_->next != e)
+		e_ = e_->next;
+	      e_->next = e_->next->next;
+	    }
+
+	  free (e);
+	  e = next;
+        }
+      else
+	e = e->next;
+    }
+
+  GOMP_PLUGIN_mutex_unlock (&PTX_event_lock);
+}
+
+static void
+event_add (enum PTX_event_type type, CUevent *e, void *h)
+{
+  struct PTX_event *ptx_event;
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  assert (type == PTX_EVT_MEM || type == PTX_EVT_KNL || type == PTX_EVT_SYNC
+	  || type == PTX_EVT_ASYNC_CLEANUP);
+
+  ptx_event = GOMP_PLUGIN_malloc (sizeof (struct PTX_event));
+  ptx_event->type = type;
+  ptx_event->evt = e;
+  ptx_event->addr = h;
+  ptx_event->ord = nvthd->ptx_dev->ord;
+
+  GOMP_PLUGIN_mutex_lock (&PTX_event_lock);
+
+  ptx_event->next = PTX_events;
+  PTX_events = ptx_event;
+
+  GOMP_PLUGIN_mutex_unlock (&PTX_event_lock);
+}
+
+void
+PTX_exec (void *fn, size_t mapnum, void **hostaddrs, void **devaddrs,
+	  size_t *sizes, unsigned short *kinds, int num_gangs, int num_workers,
+	  int vector_length, int async, void *targ_mem_desc)
+{
+  struct targ_fn_descriptor *targ_fn = (struct targ_fn_descriptor *) fn;
+  CUfunction function;
+  CUresult r;
+  int i;
+  struct ptx_stream *dev_str;
+  void *kargs[1];
+  void *hp, *dp;
+  unsigned int nthreads_in_block;
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  function = targ_fn->fn;
+  
+  dev_str = select_stream_for_async (async, pthread_self (), false, NULL);
+  assert (dev_str == nvthd->current_stream);
+
+  /* This reserves a chunk of a pre-allocated page of memory mapped on both
+     the host and the device. HP is a host pointer to the new chunk, and DP is
+     the corresponding device pointer.  */
+  map_push (dev_str, async, mapnum * sizeof (void *), &hp, &dp);
+
+  GOMP_PLUGIN_notify ("  %s: prepare mappings\n", __FUNCTION__);
+
+  /* Copy the array of arguments to the mapped page.  */
+  for (i = 0; i < mapnum; i++)
+    ((void **) hp)[i] = devaddrs[i];
+
+  /* Copy the (device) pointers to arguments to the device (dp and hp might in
+     fact have the same value on a unified-memory system).  */
+  r = cuMemcpy ((CUdeviceptr)dp, (CUdeviceptr)hp, mapnum * sizeof (void *));
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemcpy failed: %s", cuda_error (r));
+
+  GOMP_PLUGIN_notify ("  %s: kernel %s: launch\n", __FUNCTION__, targ_fn->name);
+
+  /* XXX: Possible mappings of the OpenACC geometry onto CUDA:
+
+     OpenACC		CUDA
+
+     num_gangs		blocks
+     num_workers	warps (where a warp is equivalent to 32 threads)
+     vector length	threads  */
+
+  /* The OpenACC vector_length clause "determines the vector length to use for
+     vector or SIMD operations".  The question is how to map this to CUDA.
+
+     In CUDA, the warp size is the natural vector length of a device (it can
+     be queried via CU_DEVICE_ATTRIBUTE_WARP_SIZE).  However, kernel launches
+     are not expressed in warps: they take a number of threads per block,
+     bounded by the device's maximum number of threads per block.
+
+     We choose to map OpenACC vector_length directly onto the number of threads
+     in a block, in the x dimension.  This is reflected in GCC code generation,
+     which uses threadIdx.x (%tid.x in PTX) to access vector elements.
+
+     Attempting to use an OpenACC vector_length larger than the maximum number
+     of threads per block will make cuLaunchKernel fail.  */
+  nthreads_in_block = vector_length;
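+
+  /* Note that num_gangs and num_workers are not used yet: the grid below is
+     always a single block (1 x 1 x 1), so e.g. vector_length (32) launches
+     one block of 32 threads whatever the gang and worker counts are.  */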
+
+  kargs[0] = &dp;
+  r = cuLaunchKernel (function,
+			1, 1, 1,
+			nthreads_in_block, 1, 1,
+			0, dev_str->stream, kargs, 0);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuLaunchKernel error: %s", cuda_error (r));
+
+#ifndef DISABLE_ASYNC
+  if (async < acc_async_noval)
+    {
+      r = cuStreamSynchronize (dev_str->stream);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuStreamSynchronize error: %s", cuda_error (r));
+    }
+  else
+    {
+      CUevent *e;
+
+      e = (CUevent *)GOMP_PLUGIN_malloc (sizeof (CUevent));
+
+      r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuEventCreate error: %s", cuda_error (r));
+
+      event_gc (true);
+
+      r = cuEventRecord (*e, dev_str->stream);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuda_error (r));
+
+      event_add (PTX_EVT_KNL, e, (void *)dev_str);
+    }
+#else
+  r = cuCtxSynchronize ();
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuCtxSynchronize error: %s", cuda_error (r));
+#endif
+
+  GOMP_PLUGIN_notify ("  %s: kernel %s: finished\n", __FUNCTION__,
+		      targ_fn->name);
+
+#ifndef DISABLE_ASYNC
+  if (async < acc_async_noval)
+#endif
+    map_pop (dev_str);
+}
+
+void *openacc_get_current_cuda_context (void);
+
+static void *
+PTX_alloc (size_t s)
+{
+  CUdeviceptr d;
+  CUresult r;
+
+  r = cuMemAlloc (&d, s);
+  if (r == CUDA_ERROR_OUT_OF_MEMORY)
+    return 0;
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemAlloc error: %s", cuda_error (r));
+  return (void *)d;
+}
+
+static void
+PTX_free (void *p)
+{
+  CUresult r;
+  CUdeviceptr pb;
+  size_t ps;
+
+  r = cuMemGetAddressRange (&pb, &ps, (CUdeviceptr)p);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemGetAddressRange error: %s", cuda_error (r));
+
+  if ((CUdeviceptr)p != pb)
+    GOMP_PLUGIN_fatal ("invalid device address");
+
+  r = cuMemFree ((CUdeviceptr)p);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemFree error: %s", cuda_error (r));
+}
+
+static void *
+PTX_host2dev (void *d, const void *h, size_t s)
+{
+  CUresult r;
+  CUdeviceptr pb;
+  size_t ps;
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  if (!s)
+    return 0;
+
+  if (!d)
+    GOMP_PLUGIN_fatal ("invalid device address");
+
+  r = cuMemGetAddressRange (&pb, &ps, (CUdeviceptr)d);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemGetAddressRange error: %s", cuda_error (r));
+
+  if (!pb)
+    GOMP_PLUGIN_fatal ("invalid device address");
+
+  if (!h)
+    GOMP_PLUGIN_fatal ("invalid host address");
+
+  if (d == h)
+    GOMP_PLUGIN_fatal ("invalid host or device address");
+
+  if ((void *)(d + s) > (void *)(pb + ps))
+    GOMP_PLUGIN_fatal ("invalid size");
+
+#ifndef DISABLE_ASYNC
+  if (nvthd->current_stream != nvthd->ptx_dev->null_stream)
+    {
+      CUevent *e;
+
+      e = (CUevent *)GOMP_PLUGIN_malloc (sizeof (CUevent));
+
+      r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuEventCreate error: %s", cuda_error (r));
+
+      event_gc (false);
+
+      r = cuMemcpyHtoDAsync ((CUdeviceptr)d, h, s,
+			     nvthd->current_stream->stream);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuMemcpyHtoDAsync error: %s", cuda_error (r));
+
+      r = cuEventRecord (*e, nvthd->current_stream->stream);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuda_error (r));
+
+      event_add (PTX_EVT_MEM, e, (void *)h);
+    }
+  else
+#endif
+    {
+      r = cuMemcpyHtoD ((CUdeviceptr)d, h, s);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuMemcpyHtoD error: %s", cuda_error (r));
+    }
+
+  return 0;
+}
+
+static void *
+PTX_dev2host (void *h, const void *d, size_t s)
+{
+  CUresult r;
+  CUdeviceptr pb;
+  size_t ps;
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  if (!s)
+    return 0;
+
+  if (!d)
+    GOMP_PLUGIN_fatal ("invalid device address");
+
+  r = cuMemGetAddressRange (&pb, &ps, (CUdeviceptr)d);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemGetAddressRange error: %s", cuda_error (r));
+
+  if (!pb)
+    GOMP_PLUGIN_fatal ("invalid device address");
+
+  if (!h)
+    GOMP_PLUGIN_fatal ("invalid host address");
+
+  if (d == h)
+    GOMP_PLUGIN_fatal ("invalid host or device address");
+
+  if ((void *)(d + s) > (void *)(pb + ps))
+    GOMP_PLUGIN_fatal ("invalid size");
+
+#ifndef DISABLE_ASYNC
+  if (nvthd->current_stream != nvthd->ptx_dev->null_stream)
+    {
+      CUevent *e;
+
+      e = (CUevent *)GOMP_PLUGIN_malloc (sizeof (CUevent));
+
+      r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuEventCreate error: %s", cuda_error (r));
+
+      event_gc (false);
+
+      r = cuMemcpyDtoHAsync (h, (CUdeviceptr)d, s,
+			     nvthd->current_stream->stream);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuMemcpyDtoHAsync error: %s", cuda_error (r));
+
+      r = cuEventRecord (*e, nvthd->current_stream->stream);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuda_error (r));
+
+      event_add (PTX_EVT_MEM, e, (void *)h);
+    }
+  else
+#endif
+    {
+      r = cuMemcpyDtoH (h, (CUdeviceptr)d, s);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuMemcpyDtoH error: %s", cuda_error (r));
+    }
+
+  return 0;
+}
+
+static void
+PTX_set_async (int async)
+{
+  struct nvptx_thread *nvthd = nvptx_thread ();
+  nvthd->current_stream
+    = select_stream_for_async (async, pthread_self (), true, NULL);
+}
+
+static int
+PTX_async_test (int async)
+{
+  CUresult r;
+  struct ptx_stream *s;
+  
+  s = select_stream_for_async (async, pthread_self (), false, NULL);
+
+  if (!s)
+    GOMP_PLUGIN_fatal ("unknown async %d", async);
+
+  r = cuStreamQuery (s->stream);
+  if (r == CUDA_SUCCESS)
+    {
+      /* The oacc-parallel.c:goacc_wait function calls this hook to determine
+	 whether all work has completed on this stream, and if so omits the call
+	 to the wait hook.  If that happens, event_gc might not get called
+	 (which prevents variables from getting unmapped and their associated
+	 device storage freed), so call it here.  */
+      event_gc (true);
+      return 1;
+    }
+  else if (r == CUDA_ERROR_NOT_READY)
+    return 0;
+
+  GOMP_PLUGIN_fatal ("cuStreamQuery error: %s", cuda_error (r));
+
+  return 0;
+}
+
+static int
+PTX_async_test_all (void)
+{
+  struct ptx_stream *s;
+  pthread_t self = pthread_self ();
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  GOMP_PLUGIN_mutex_lock (&nvthd->ptx_dev->stream_lock);
+
+  for (s = nvthd->ptx_dev->active_streams; s != NULL; s = s->next)
+    {
+      if ((s->multithreaded || pthread_equal (s->host_thread, self))
+	  && cuStreamQuery (s->stream) == CUDA_ERROR_NOT_READY)
+	{
+	  GOMP_PLUGIN_mutex_unlock (&nvthd->ptx_dev->stream_lock);
+	  return 0;
+	}
+    }
+
+  GOMP_PLUGIN_mutex_unlock (&nvthd->ptx_dev->stream_lock);
+
+  event_gc (true);
+
+  return 1;
+}
+
+static void
+PTX_wait (int async)
+{
+  CUresult r;
+  struct ptx_stream *s;
+
+  s = select_stream_for_async (async, pthread_self (), false, NULL);
+
+  if (!s)
+    GOMP_PLUGIN_fatal ("unknown async %d", async);
+
+  r = cuStreamSynchronize (s->stream);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuStreamSynchronize error: %s", cuda_error (r));
+
+  event_gc (true);
+}
+
+static void
+PTX_wait_async (int async1, int async2)
+{
+  CUresult r;
+  CUevent *e;
+  struct ptx_stream *s1, *s2;
+  pthread_t self = pthread_self ();
+
+  /* The stream that is waiting (rather than being waited for) doesn't
+     necessarily have to exist already.  */
+  s2 = select_stream_for_async (async2, self, true, NULL);
+
+  s1 = select_stream_for_async (async1, self, false, NULL);
+  if (!s1)
+    GOMP_PLUGIN_fatal ("unknown async %d", async1);
+
+  if (s1 == s2)
+    GOMP_PLUGIN_fatal ("identical parameters");
+
+  e = (CUevent *)GOMP_PLUGIN_malloc (sizeof (CUevent));
+
+  r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuEventCreate error: %s", cuda_error (r));
+
+  event_gc (true);
+
+  r = cuEventRecord (*e, s1->stream);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuda_error (r));
+
+  event_add (PTX_EVT_SYNC, e, NULL);
+
+  r = cuStreamWaitEvent (s2->stream, *e, 0);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuStreamWaitEvent error: %s", cuda_error (r));
+}
+
+static void
+PTX_wait_all (void)
+{
+  CUresult r;
+  struct ptx_stream *s;
+  pthread_t self = pthread_self ();
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  GOMP_PLUGIN_mutex_lock (&nvthd->ptx_dev->stream_lock);
+
+  /* Wait for active streams initiated by this thread (or by multiple threads)
+     to complete.  */
+  for (s = nvthd->ptx_dev->active_streams; s != NULL; s = s->next)
+    {
+      if (s->multithreaded || pthread_equal (s->host_thread, self))
+	{
+	  r = cuStreamQuery (s->stream);
+	  if (r == CUDA_SUCCESS)
+	    continue;
+	  else if (r != CUDA_ERROR_NOT_READY)
+	    GOMP_PLUGIN_fatal ("cuStreamQuery error: %s", cuda_error (r));
+
+	  r = cuStreamSynchronize (s->stream);
+	  if (r != CUDA_SUCCESS)
+	    GOMP_PLUGIN_fatal ("cuStreamSynchronize error: %s", cuda_error (r));
+	}
+    }
+
+  GOMP_PLUGIN_mutex_unlock (&nvthd->ptx_dev->stream_lock);
+
+  event_gc (true);
+}
+
+static void
+PTX_wait_all_async (int async)
+{
+  CUresult r;
+  struct ptx_stream *waiting_stream, *other_stream;
+  CUevent *e;
+  struct nvptx_thread *nvthd = nvptx_thread ();
+  pthread_t self = pthread_self ();
+
+  /* The stream doing the waiting.  This could be the first mention of the
+     stream, so create it if necessary.  */
+  waiting_stream
+    = select_stream_for_async (async, self, true, NULL);
+
+  /* Launches on the null stream already block on other streams in the
+     context.  */
+  if (!waiting_stream || waiting_stream == nvthd->ptx_dev->null_stream)
+    return;
+
+  event_gc (true);
+
+  GOMP_PLUGIN_mutex_lock (&nvthd->ptx_dev->stream_lock);
+
+  for (other_stream = nvthd->ptx_dev->active_streams;
+       other_stream != NULL;
+       other_stream = other_stream->next)
+    {
+      if (!other_stream->multithreaded
+	  && !pthread_equal (other_stream->host_thread, self))
+	continue;
+
+      e = (CUevent *) GOMP_PLUGIN_malloc (sizeof (CUevent));
+
+      r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuEventCreate error: %s", cuda_error (r));
+
+      /* Record an event on the waited-for stream.  */
+      r = cuEventRecord (*e, other_stream->stream);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuda_error (r));
+
+      event_add (PTX_EVT_SYNC, e, NULL);
+
+      r = cuStreamWaitEvent (waiting_stream->stream, *e, 0);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuStreamWaitEvent error: %s", cuda_error (r));
+    }
+
+  GOMP_PLUGIN_mutex_unlock (&nvthd->ptx_dev->stream_lock);
+}
+
+static void *
+PTX_get_current_cuda_device (void)
+{
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  if (!nvthd || !nvthd->ptx_dev)
+    return NULL;
+
+  return &nvthd->ptx_dev->dev;
+}
+
+static void *
+PTX_get_current_cuda_context (void)
+{
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  if (!nvthd || !nvthd->ptx_dev)
+    return NULL;
+
+  return nvthd->ptx_dev->ctx;
+}
+
+static void *
+PTX_get_cuda_stream (int async)
+{
+  struct ptx_stream *s;
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  if (!nvthd || !nvthd->ptx_dev)
+    return NULL;
+
+  s = select_stream_for_async (async, pthread_self (), false, NULL);
+
+  return s ? s->stream : NULL;
+}
+
+static int
+PTX_set_cuda_stream (int async, void *stream)
+{
+  struct ptx_stream *oldstream;
+  pthread_t self = pthread_self ();
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  GOMP_PLUGIN_mutex_lock (&nvthd->ptx_dev->stream_lock);
+
+  if (async < 0)
+    GOMP_PLUGIN_fatal ("bad async %d", async);
+
+  /* We have a list of active streams and an array mapping async values to
+     entries of that list.  We need to take "ownership" of the passed-in stream,
+     and add it to our list, removing the previous entry also (if there was one)
+     in order to prevent resource leaks.  Note the potential for surprise
+     here: maybe we should keep track of passed-in streams and leave it up to
+     the user to tidy those up, but that doesn't work for stream handles
+     returned from acc_get_cuda_stream above...  */
+
+  oldstream = select_stream_for_async (async, self, false, NULL);
+
+  if (oldstream)
+    {
+      if (nvthd->ptx_dev->active_streams == oldstream)
+	nvthd->ptx_dev->active_streams = nvthd->ptx_dev->active_streams->next;
+      else
+	{
+	  struct ptx_stream *s = nvthd->ptx_dev->active_streams;
+	  while (s->next != oldstream)
+	    s = s->next;
+	  s->next = s->next->next;
+	}
+
+      cuStreamDestroy (oldstream->stream);
+      map_fini (oldstream);
+      free (oldstream);
+    }
+
+  GOMP_PLUGIN_mutex_unlock (&nvthd->ptx_dev->stream_lock);
+
+  (void) select_stream_for_async (async, self, true, (CUstream) stream);
+
+  return 1;
+}
+
+/* Plugin entry points.  */
+
+
+int
+GOMP_OFFLOAD_get_type (void)
+{
+  return OFFLOAD_TARGET_TYPE_NVIDIA_PTX;
+}
+
+unsigned int
+GOMP_OFFLOAD_get_caps (void)
+{
+  return TARGET_CAP_OPENACC_200;
+}
+
+const char *
+GOMP_OFFLOAD_get_name (void)
+{
+  return "nvidia";
+}
+
+int
+GOMP_OFFLOAD_get_num_devices (void)
+{
+  return PTX_get_num_devices ();
+}
+
+static void **kernel_target_data;
+static void **kernel_host_table;
+
+void
+GOMP_OFFLOAD_register_image (void *host_table, void *target_data)
+{
+  kernel_target_data = target_data;
+  kernel_host_table = host_table;
+}
+
+void
+GOMP_OFFLOAD_init_device (int n __attribute__((unused)))
+{
+  (void) PTX_init ();
+}
+
+void
+GOMP_OFFLOAD_fini_device (int n __attribute__((unused)))
+{
+  PTX_fini ();
+}
+
+int
+GOMP_OFFLOAD_get_table (int n __attribute__((unused)),
+			struct mapping_table **tablep)
+{
+  CUmodule module;
+  void **fn_table;
+  char **fn_names;
+  int fn_entries, i;
+  CUresult r;
+  struct targ_fn_descriptor *targ_fns;
+
+  if (PTX_init () <= 0)
+    return 0;
+
+  /* This isn't an error, because an image may legitimately have no offloaded
+     regions and so will not call GOMP_offload_register.  */
+  if (kernel_target_data == NULL)
+    return 0;
+
+  link_ptx (&module, kernel_target_data[0]);
+
+  /* kernel_target_data[0] -> ptx code
+     kernel_target_data[1] -> variable mappings
+     kernel_target_data[2] -> array of kernel names in ascii
+
+     kernel_host_table[0] -> start of function addresses (_omp_func_table)
+     kernel_host_table[1] -> end of function addresses (_omp_funcs_end)
+
+     The array of kernel names and the function addresses form a
+     one-to-one correspondence.  */
+
+  fn_table = kernel_host_table[0];
+  fn_names = (char **) kernel_target_data[2];
+  fn_entries = (kernel_host_table[1] - kernel_host_table[0]) / sizeof (void *);
+
+  *tablep = GOMP_PLUGIN_malloc (sizeof (struct mapping_table) * fn_entries);
+  targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor)
+				 * fn_entries);
+
+  for (i = 0; i < fn_entries; i++)
+    {
+      CUfunction function;
+
+      r = cuModuleGetFunction (&function, module, fn_names[i]);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuModuleGetFunction error: %s", cuda_error (r));
+
+      targ_fns[i].fn = function;
+      targ_fns[i].name = (const char *) fn_names[i];
+
+      (*tablep)[i].host_start = (uintptr_t) fn_table[i];
+      (*tablep)[i].host_end = (*tablep)[i].host_start + 1;
+      (*tablep)[i].tgt_start = (uintptr_t) &targ_fns[i];
+      (*tablep)[i].tgt_end = (*tablep)[i].tgt_start + 1;
+    }
+
+  return fn_entries;
+}
+
+void *
+GOMP_OFFLOAD_alloc (int n __attribute__((unused)), size_t size)
+{
+  return PTX_alloc (size);
+}
+
+void
+GOMP_OFFLOAD_free (int n __attribute__((unused)), void *ptr)
+{
+  PTX_free (ptr);
+}
+
+void *
+GOMP_OFFLOAD_dev2host (int ord __attribute__((unused)), void *dst,
+		       const void *src, size_t n)
+{
+  return PTX_dev2host (dst, src, n);
+}
+
+void *
+GOMP_OFFLOAD_host2dev (int ord __attribute__((unused)), void *dst,
+		       const void *src, size_t n)
+{
+  return PTX_host2dev (dst, src, n);
+}
+
+void (*device_run) (void *fn_ptr, void *vars) = NULL;
+
+void
+GOMP_OFFLOAD_openacc_parallel (void (*fn) (void *), size_t mapnum,
+			      void **hostaddrs, void **devaddrs, size_t *sizes,
+			      unsigned short *kinds, int num_gangs,
+			      int num_workers, int vector_length, int async,
+			      void *targ_mem_desc)
+{
+  PTX_exec (fn, mapnum, hostaddrs, devaddrs, sizes, kinds, num_gangs,
+	    num_workers, vector_length, async, targ_mem_desc);
+}
+
+void *
+GOMP_OFFLOAD_openacc_open_device (int n)
+{
+  return PTX_open_device (n);
+}
+
+int
+GOMP_OFFLOAD_openacc_close_device (void *h)
+{
+  return PTX_close_device (h);
+}
+
+void
+GOMP_OFFLOAD_openacc_set_device_num (int n)
+{
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  assert (n >= 0);
+
+  if (!nvthd->ptx_dev || nvthd->ptx_dev->ord != n)
+    (void) PTX_open_device (n);
+}
+
+/* This can be called before the device is "opened" for the current thread, in
+   which case we can't tell which device number should be returned.  We don't
+   actually want to open the device here, so just return -1 and let the caller
+   (oacc-init.c:acc_get_device_num) handle it.  */
+
+int
+GOMP_OFFLOAD_openacc_get_device_num (void)
+{
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  if (nvthd && nvthd->ptx_dev)
+    return nvthd->ptx_dev->ord;
+  else
+    return -1;
+}
+
+void
+GOMP_OFFLOAD_openacc_register_async_cleanup (void *targ_mem_desc)
+{
+  CUevent *e;
+  CUresult r;
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  e = (CUevent *) GOMP_PLUGIN_malloc (sizeof (CUevent));
+
+  r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuEventCreate error: %s", cuda_error (r));
+
+  r = cuEventRecord (*e, nvthd->current_stream->stream);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuda_error (r));
+
+  event_add (PTX_EVT_ASYNC_CLEANUP, e, targ_mem_desc);
+}
+
+int
+GOMP_OFFLOAD_openacc_async_test (int async)
+{
+  return PTX_async_test (async);
+}
+
+int
+GOMP_OFFLOAD_openacc_async_test_all (void)
+{
+  return PTX_async_test_all ();
+}
+
+void
+GOMP_OFFLOAD_openacc_async_wait (int async)
+{
+  PTX_wait (async);
+}
+
+void
+GOMP_OFFLOAD_openacc_async_wait_async (int async1, int async2)
+{
+  PTX_wait_async (async1, async2);
+}
+
+void
+GOMP_OFFLOAD_openacc_async_wait_all (void)
+{
+  PTX_wait_all ();
+}
+
+void
+GOMP_OFFLOAD_openacc_async_wait_all_async (int async)
+{
+  PTX_wait_all_async (async);
+}
+
+void
+GOMP_OFFLOAD_openacc_async_set_async (int async)
+{
+  PTX_set_async (async);
+}
+
+void *
+GOMP_OFFLOAD_openacc_create_thread_data (void *targ_data)
+{
+  struct ptx_device *ptx_dev = (struct ptx_device *) targ_data;
+  struct nvptx_thread *nvthd
+    = GOMP_PLUGIN_malloc (sizeof (struct nvptx_thread));
+  CUresult r;
+  CUcontext thd_ctx;
+
+  r = cuCtxGetCurrent (&thd_ctx);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuda_error (r));
+
+  assert (ptx_dev->ctx);
+
+  if (!thd_ctx)
+    {
+      r = cuCtxPushCurrent (ptx_dev->ctx);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuCtxPushCurrent error: %s", cuda_error (r));
+    }
+
+  nvthd->current_stream = ptx_dev->null_stream;
+  nvthd->ptx_dev = ptx_dev;
+
+  return (void *) nvthd;
+}
+
+void
+GOMP_OFFLOAD_openacc_destroy_thread_data (void *data)
+{
+  free (data);
+}
+
+void *
+GOMP_OFFLOAD_openacc_get_current_cuda_device (void)
+{
+  return PTX_get_current_cuda_device ();
+}
+
+void *
+GOMP_OFFLOAD_openacc_get_current_cuda_context (void)
+{
+  return PTX_get_current_cuda_context ();
+}
+
+/* NOTE: This returns a CUstream, not a ptx_stream pointer.  */
+
+void *
+GOMP_OFFLOAD_openacc_get_cuda_stream (int async)
+{
+  return PTX_get_cuda_stream (async);
+}
+
+/* NOTE: This takes a CUstream, not a ptx_stream pointer.  */
+
+int
+GOMP_OFFLOAD_openacc_set_cuda_stream (int async, void *stream)
+{
+  return PTX_set_cuda_stream (async, stream);
+}
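The asynchronous paths in the plugin above never release host-side bookkeeping directly: they record an event after the queued work (`event_add`) and rely on a later `event_gc` call to tidy up once that work has finished, which is why `PTX_async_test` calls `event_gc` itself when it finds a stream idle. The following is a plain-C model of that deferred-cleanup pattern, not the plugin's code: `pending_op`, `pending_add`, `pending_gc` and `mark_cleaned` are hypothetical names, and a `bool` flag stands in for polling the recorded `CUevent`.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a recorded event plus the host-side bookkeeping to
   release once it completes.  In the plugin the completion test would be a
   query on the CUevent; here a bool flag set by the "device" stands in.  */
struct pending_op
{
  const bool *done;          /* Becomes true when the queued work finishes.  */
  void (*cleanup) (void *);  /* Host-side tidy-up to run afterwards.  */
  void *arg;
  struct pending_op *next;
};

static struct pending_op *pending_head;

/* Queue CLEANUP (ARG) to run once *DONE is true (compare event_add).  */
static void
pending_add (const bool *done, void (*cleanup) (void *), void *arg)
{
  struct pending_op *op = malloc (sizeof *op);
  if (op == NULL)
    abort ();
  op->done = done;
  op->cleanup = cleanup;
  op->arg = arg;
  op->next = pending_head;
  pending_head = op;
}

/* Run and unlink every completed entry, leaving unfinished ones queued;
   return how many cleanups ran (compare event_gc).  */
static int
pending_gc (void)
{
  int ran = 0;
  struct pending_op **pp = &pending_head;

  while (*pp != NULL)
    {
      struct pending_op *op = *pp;
      if (*op->done)
        {
          *pp = op->next;        /* Unlink before running the cleanup.  */
          op->cleanup (op->arg);
          free (op);
          ran++;
        }
      else
        pp = &op->next;          /* Still in flight: keep it queued.  */
    }
  return ran;
}

/* Example cleanup callback: flag that ARG's bookkeeping was released.  */
static void
mark_cleaned (void *arg)
{
  *(bool *) arg = true;
}
```

Walking the list through a pointer-to-pointer unlinks completed entries without a special case for the head, and leaving unfinished entries queued means an early return from a test or wait hook loses nothing: a later collection pass picks them up.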
diff --git a/libgomp/splay-tree.c b/libgomp/splay-tree.c
new file mode 100644
index 0000000..14b03ac
--- /dev/null
+++ b/libgomp/splay-tree.c
@@ -0,0 +1,224 @@
+/* A splay-tree datatype.
+   Copyright 1998-2013
+   Free Software Foundation, Inc.
+   Contributed by Mark Mitchell (mark@markmitchell.com).
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* The splay tree code was copied from include/splay-tree.h and adjusted
+   so that all the data lives directly in the splay_tree_node_s structure
+   and no extra allocations are needed.
+
+   splay-tree.h provides the splay_tree_node, splay_tree and
+   splay_tree_key typedefs together with the splay_tree_key_s
+   structure.  The ordering function, splay_compare, is not defined
+   in this file: it is declared extern below and must be supplied by
+   another translation unit (target.c defines it with
+   attribute_hidden).  */
+
+/* For an easily readable description of splay-trees, see:
+
+     Lewis, Harry R. and Denenberg, Larry.  Data Structures and Their
+     Algorithms.  Harper-Collins, Inc.  1991.
+
+   The major feature of splay trees is that all basic tree operations
+   are amortized O(log n) time for a tree with n nodes.  */
+
+#include "libgomp.h"
+#include "splay-tree.h"
+
+extern int splay_compare (splay_tree_key, splay_tree_key);
+
+/* Rotate the edge joining the left child N with its parent P.  PP is the
+   grandparent's pointer to P.  */
+
+static inline void
+rotate_left (splay_tree_node *pp, splay_tree_node p, splay_tree_node n)
+{
+  splay_tree_node tmp;
+  tmp = n->right;
+  n->right = p;
+  p->left = tmp;
+  *pp = n;
+}
+
+/* Rotate the edge joining the right child N with its parent P.  PP is the
+   grandparent's pointer to P.  */
+
+static inline void
+rotate_right (splay_tree_node *pp, splay_tree_node p, splay_tree_node n)
+{
+  splay_tree_node tmp;
+  tmp = n->left;
+  n->left = p;
+  p->right = tmp;
+  *pp = n;
+}
+
+/* Bottom up splay of KEY.  */
+
+static void
+splay_tree_splay (splay_tree sp, splay_tree_key key)
+{
+  if (sp->root == NULL)
+    return;
+
+  do {
+    int cmp1, cmp2;
+    splay_tree_node n, c;
+
+    n = sp->root;
+    cmp1 = splay_compare (key, &n->key);
+
+    /* Found.  */
+    if (cmp1 == 0)
+      return;
+
+    /* Left or right?  If no child, then we're done.  */
+    if (cmp1 < 0)
+      c = n->left;
+    else
+      c = n->right;
+    if (!c)
+      return;
+
+    /* Next one left or right?  If found or no child, we're done
+       after one rotation.  */
+    cmp2 = splay_compare (key, &c->key);
+    if (cmp2 == 0
+	|| (cmp2 < 0 && !c->left)
+	|| (cmp2 > 0 && !c->right))
+      {
+	if (cmp1 < 0)
+	  rotate_left (&sp->root, n, c);
+	else
+	  rotate_right (&sp->root, n, c);
+	return;
+      }
+
+    /* Now we have the four cases of double-rotation.  */
+    if (cmp1 < 0 && cmp2 < 0)
+      {
+	rotate_left (&n->left, c, c->left);
+	rotate_left (&sp->root, n, n->left);
+      }
+    else if (cmp1 > 0 && cmp2 > 0)
+      {
+	rotate_right (&n->right, c, c->right);
+	rotate_right (&sp->root, n, n->right);
+      }
+    else if (cmp1 < 0 && cmp2 > 0)
+      {
+	rotate_right (&n->left, c, c->right);
+	rotate_left (&sp->root, n, n->left);
+      }
+    else if (cmp1 > 0 && cmp2 < 0)
+      {
+	rotate_left (&n->right, c, c->left);
+	rotate_right (&sp->root, n, n->right);
+      }
+  } while (1);
+}
+
+/* Insert a new NODE into SP.  The NODE shouldn't exist in the tree.  */
+
+attribute_hidden void
+splay_tree_insert (splay_tree sp, splay_tree_node node)
+{
+  int comparison = 0;
+
+  splay_tree_splay (sp, &node->key);
+
+  if (sp->root)
+    comparison = splay_compare (&sp->root->key, &node->key);
+
+  if (sp->root && comparison == 0)
+    gomp_fatal ("Duplicate node");
+  else
+    {
+      /* Insert it at the root.  */
+      if (sp->root == NULL)
+	node->left = node->right = NULL;
+      else if (comparison < 0)
+	{
+	  node->left = sp->root;
+	  node->right = node->left->right;
+	  node->left->right = NULL;
+	}
+      else
+	{
+	  node->right = sp->root;
+	  node->left = node->right->left;
+	  node->right->left = NULL;
+	}
+
+      sp->root = node;
+    }
+}
+
+/* Remove node with KEY from SP.  It is not an error if it did not exist.  */
+
+attribute_hidden void
+splay_tree_remove (splay_tree sp, splay_tree_key key)
+{
+  splay_tree_splay (sp, key);
+
+  if (sp->root && splay_compare (&sp->root->key, key) == 0)
+    {
+      splay_tree_node left, right;
+
+      left = sp->root->left;
+      right = sp->root->right;
+
+      /* One of the children is now the root.  Doesn't matter much
+	 which, so long as we preserve the properties of the tree.  */
+      if (left)
+	{
+	  sp->root = left;
+
+	  /* If there was a right child as well, hang it off the
+	     right-most leaf of the left child.  */
+	  if (right)
+	    {
+	      while (left->right)
+		left = left->right;
+	      left->right = right;
+	    }
+	}
+      else
+	sp->root = right;
+    }
+}
+
+/* Look up KEY in SP, returning a pointer to the matching node's key if
+   present, and NULL otherwise.  */
+
+attribute_hidden splay_tree_key
+splay_tree_lookup (splay_tree sp, splay_tree_key key)
+{
+  splay_tree_splay (sp, key);
+
+  if (sp->root && splay_compare (&sp->root->key, key) == 0)
+    return &sp->root->key;
+  else
+    return NULL;
+}
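`splay_tree_lookup` above only asks `splay_compare` whether one key is less than, equal to or greater than another, so what counts as a match is decided entirely by that function. `gomp_map_vars` in target.c builds a temporary key spanning `[hostaddrs[i], hostaddrs[i] + sizes[i])` and looks it up. The sketch below assumes a half-open-interval ordering in which overlapping host ranges compare equal, so a sub-range finds the mapping that contains it. `range_key` and `range_compare` are hypothetical names, and the zero-length special case with which target.c's `splay_compare` begins (`x->host_start == x->host_end`) is deliberately left out.

```c
#include <stdint.h>

/* Hypothetical key mirroring the host_start/host_end fields of
   splay_tree_key_s; the target-side fields are omitted.  */
struct range_key
{
  uintptr_t host_start;  /* First byte of the host object.  */
  uintptr_t host_end;    /* One past the last byte.  */
};

/* Order half-open ranges: X sorts before Y if it ends at or before Y's
   start, after Y if it starts at or after Y's end, and any overlap
   compares equal.  Zero-length ranges are not special-cased here.  */
static int
range_compare (const struct range_key *x, const struct range_key *y)
{
  if (x->host_end <= y->host_start)
    return -1;
  if (x->host_start >= y->host_end)
    return 1;
  return 0;
}
```

With this ordering, ranges that merely touch (one ends exactly where the other starts) are distinct keys, while any shared byte makes them "equal", which is what lets a lookup for an element of a mapped array find the array's mapping.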
diff --git a/libgomp/splay-tree.h b/libgomp/splay-tree.h
index eb8011a..f29d437 100644
--- a/libgomp/splay-tree.h
+++ b/libgomp/splay-tree.h
@@ -43,6 +43,30 @@ typedef struct splay_tree_key_s *splay_tree_key;
    The major feature of splay trees is that all basic tree operations
    are amortized O(log n) time for a tree with n nodes.  */
 
+#ifndef _SPLAY_TREE_H
+#define _SPLAY_TREE_H 1
+
+typedef struct splay_tree_node_s *splay_tree_node;
+typedef struct splay_tree_s *splay_tree;
+typedef struct splay_tree_key_s *splay_tree_key;
+
+struct splay_tree_key_s {
+  /* Address of the host object.  */
+  uintptr_t host_start;
+  /* Address immediately after the host object.  */
+  uintptr_t host_end;
+  /* Descriptor of the target memory.  */
+  struct target_mem_desc *tgt;
+  /* Offset from tgt->tgt_start to the start of the target object.  */
+  uintptr_t tgt_offset;
+  /* Reference count.  */
+  uintptr_t refcount;
+  /* Asynchronous reference count.  */
+  uintptr_t async_refcount;
+  /* True if data should be copied from device to host at the end.  */
+  bool copy_from;
+};
+
 /* The nodes in the splay tree.  */
 struct splay_tree_node_s {
   struct splay_tree_key_s key;
@@ -56,177 +80,8 @@ struct splay_tree_s {
   splay_tree_node root;
 };
 
-/* Rotate the edge joining the left child N with its parent P.  PP is the
-   grandparents' pointer to P.  */
-
-static inline void
-rotate_left (splay_tree_node *pp, splay_tree_node p, splay_tree_node n)
-{
-  splay_tree_node tmp;
-  tmp = n->right;
-  n->right = p;
-  p->left = tmp;
-  *pp = n;
-}
-
-/* Rotate the edge joining the right child N with its parent P.  PP is the
-   grandparents' pointer to P.  */
-
-static inline void
-rotate_right (splay_tree_node *pp, splay_tree_node p, splay_tree_node n)
-{
-  splay_tree_node tmp;
-  tmp = n->left;
-  n->left = p;
-  p->right = tmp;
-  *pp = n;
-}
-
-/* Bottom up splay of KEY.  */
-
-static void
-splay_tree_splay (splay_tree sp, splay_tree_key key)
-{
-  if (sp->root == NULL)
-    return;
-
-  do {
-    int cmp1, cmp2;
-    splay_tree_node n, c;
-
-    n = sp->root;
-    cmp1 = splay_compare (key, &n->key);
-
-    /* Found.  */
-    if (cmp1 == 0)
-      return;
-
-    /* Left or right?  If no child, then we're done.  */
-    if (cmp1 < 0)
-      c = n->left;
-    else
-      c = n->right;
-    if (!c)
-      return;
-
-    /* Next one left or right?  If found or no child, we're done
-       after one rotation.  */
-    cmp2 = splay_compare (key, &c->key);
-    if (cmp2 == 0
-	|| (cmp2 < 0 && !c->left)
-	|| (cmp2 > 0 && !c->right))
-      {
-	if (cmp1 < 0)
-	  rotate_left (&sp->root, n, c);
-	else
-	  rotate_right (&sp->root, n, c);
-	return;
-      }
-
-    /* Now we have the four cases of double-rotation.  */
-    if (cmp1 < 0 && cmp2 < 0)
-      {
-	rotate_left (&n->left, c, c->left);
-	rotate_left (&sp->root, n, n->left);
-      }
-    else if (cmp1 > 0 && cmp2 > 0)
-      {
-	rotate_right (&n->right, c, c->right);
-	rotate_right (&sp->root, n, n->right);
-      }
-    else if (cmp1 < 0 && cmp2 > 0)
-      {
-	rotate_right (&n->left, c, c->right);
-	rotate_left (&sp->root, n, n->left);
-      }
-    else if (cmp1 > 0 && cmp2 < 0)
-      {
-	rotate_left (&n->right, c, c->left);
-	rotate_right (&sp->root, n, n->right);
-      }
-  } while (1);
-}
-
-/* Insert a new NODE into SP.  The NODE shouldn't exist in the tree.  */
-
-static void
-splay_tree_insert (splay_tree sp, splay_tree_node node)
-{
-  int comparison = 0;
-
-  splay_tree_splay (sp, &node->key);
-
-  if (sp->root)
-    comparison = splay_compare (&sp->root->key, &node->key);
-
-  if (sp->root && comparison == 0)
-    abort ();
-  else
-    {
-      /* Insert it at the root.  */
-      if (sp->root == NULL)
-	node->left = node->right = NULL;
-      else if (comparison < 0)
-	{
-	  node->left = sp->root;
-	  node->right = node->left->right;
-	  node->left->right = NULL;
-	}
-      else
-	{
-	  node->right = sp->root;
-	  node->left = node->right->left;
-	  node->right->left = NULL;
-	}
-
-      sp->root = node;
-    }
-}
-
-/* Remove node with KEY from SP.  It is not an error if it did not exist.  */
-
-static void
-splay_tree_remove (splay_tree sp, splay_tree_key key)
-{
-  splay_tree_splay (sp, key);
-
-  if (sp->root && splay_compare (&sp->root->key, key) == 0)
-    {
-      splay_tree_node left, right;
-
-      left = sp->root->left;
-      right = sp->root->right;
-
-      /* One of the children is now the root.  Doesn't matter much
-	 which, so long as we preserve the properties of the tree.  */
-      if (left)
-	{
-	  sp->root = left;
-
-	  /* If there was a right child as well, hang it off the
-	     right-most leaf of the left child.  */
-	  if (right)
-	    {
-	      while (left->right)
-		left = left->right;
-	      left->right = right;
-	    }
-	}
-      else
-	sp->root = right;
-    }
-}
-
-/* Lookup KEY in SP, returning NODE if present, and NULL
-   otherwise.  */
-
-static splay_tree_key
-splay_tree_lookup (splay_tree sp, splay_tree_key key)
-{
-  splay_tree_splay (sp, key);
-
-  if (sp->root && splay_compare (&sp->root->key, key) == 0)
-    return &sp->root->key;
-  else
-    return NULL;
-}
+attribute_hidden splay_tree_key splay_tree_lookup (splay_tree, splay_tree_key);
+attribute_hidden void splay_tree_insert (splay_tree, splay_tree_node);
+attribute_hidden void splay_tree_remove (splay_tree, splay_tree_key);
+
+#endif /* _SPLAY_TREE_H */
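The target.c changes that follow read each map kind through `get_kind` and split it with an `is_openacc`-dependent shift and mask: OpenMP kinds are `unsigned char` with the map type in the low 3 bits, OpenACC kinds are `unsigned short` with the type in the low 8 bits, and in both cases the remaining high bits hold log2 of the required alignment, which is then used to round `tgt_size` up. The sketch reuses `get_kind` as written in the patch; `decode_kind` and `align_up` are hypothetical helpers, and the type values in the test are arbitrary numbers, not the constants from gomp-constants.h.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same selection as get_kind in target.c: OpenACC passes 16-bit kinds,
   OpenMP passes 8-bit kinds.  */
static int
get_kind (bool is_openacc, void *kinds, int idx)
{
  return is_openacc ? ((unsigned short *) kinds)[idx]
                    : ((unsigned char *) kinds)[idx];
}

/* Hypothetical helper: split KIND into its map type and alignment using
   the rshift/typemask pairs from gomp_map_vars.  */
static void
decode_kind (bool is_openacc, int kind, int *type, size_t *align)
{
  const int rshift = is_openacc ? 8 : 3;
  const int typemask = is_openacc ? 0xff : 0x7;

  *type = kind & typemask;
  *align = (size_t) 1 << (kind >> rshift);
}

/* Hypothetical helper: round SIZE up to ALIGN, which must be a power of
   two, exactly as tgt_size is rounded in gomp_map_vars.  */
static size_t
align_up (size_t size, size_t align)
{
  return (size + align - 1) & ~(align - 1);
}
```

`align_up` relies on `align` being a power of two, which the `1 << exponent` encoding guarantees.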
diff --git a/libgomp/target.c b/libgomp/target.c
index 5b4873b..9345ac2 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -30,7 +30,12 @@
 #include <limits.h>
 #include <stdbool.h>
 #include <stdlib.h>
+#include "oacc-plugin.h"
+#include "gomp-constants.h"
+#include "oacc-int.h"
 #include <string.h>
+#include <stdio.h>
+#include <assert.h>
 
 #ifdef PLUGIN_SUPPORT
 #include <dlfcn.h>
@@ -40,50 +45,6 @@ static void gomp_target_init (void);
 
 static pthread_once_t gomp_is_initialized = PTHREAD_ONCE_INIT;
 
-/* Forward declaration for a node in the tree.  */
-typedef struct splay_tree_node_s *splay_tree_node;
-typedef struct splay_tree_s *splay_tree;
-typedef struct splay_tree_key_s *splay_tree_key;
-
-struct target_mem_desc {
-  /* Reference count.  */
-  uintptr_t refcount;
-  /* All the splay nodes allocated together.  */
-  splay_tree_node array;
-  /* Start of the target region.  */
-  uintptr_t tgt_start;
-  /* End of the targer region.  */
-  uintptr_t tgt_end;
-  /* Handle to free.  */
-  void *to_free;
-  /* Previous target_mem_desc.  */
-  struct target_mem_desc *prev;
-  /* Number of items in following list.  */
-  size_t list_count;
-
-  /* Corresponding target device descriptor.  */
-  struct gomp_device_descr *device_descr;
-
-  /* List of splay keys to remove (or decrease refcount)
-     at the end of region.  */
-  splay_tree_key list[];
-};
-
-struct splay_tree_key_s {
-  /* Address of the host object.  */
-  uintptr_t host_start;
-  /* Address immediately after the host object.  */
-  uintptr_t host_end;
-  /* Descriptor of the target memory.  */
-  struct target_mem_desc *tgt;
-  /* Offset from tgt->tgt_start to the start of the target object.  */
-  uintptr_t tgt_offset;
-  /* Reference count.  */
-  uintptr_t refcount;
-  /* True if data should be copied from device to host at the end.  */
-  bool copy_from;
-};
-
 /* This structure describes an offload image.
    It contains type of the target device, pointer to host table descriptor, and
    pointer to target data.  */
@@ -107,7 +68,7 @@ static int num_devices;
 
 /* The comparison function.  */
 
-static int
+attribute_hidden int
 splay_compare (splay_tree_key x, splay_tree_key y)
 {
   if (x->host_start == x->host_end
@@ -122,47 +83,16 @@ splay_compare (splay_tree_key x, splay_tree_key y)
 
 #include "splay-tree.h"
 
-/* This structure describes accelerator device.
-   It contains ID-number of the device, its type, function handlers for
-   interaction with the device, and information about mapped memory.  */
-struct gomp_device_descr
+attribute_hidden void
+gomp_init_targets_once (void)
 {
-  /* This is the ID number of device.  It could be specified in DEVICE-clause of
-     TARGET construct.  */
-  int id;
-
-  /* This is the ID number of device among devices of the same type.  */
-  int target_id;
-
-  /* This is the TYPE of device.  */
-  enum offload_target_type type;
-
-  /* Set to true when device is initialized.  */
-  bool is_initialized;
-
-  /* Function handlers.  */
-  int (*get_type_func) (void);
-  int (*get_num_devices_func) (void);
-  void (*register_image_func) (void *, void *);
-  void (*init_device_func) (int);
-  int (*get_table_func) (int, void *);
-  void *(*alloc_func) (int, size_t);
-  void (*free_func) (int, void *);
-  void *(*host2dev_func) (int, void *, const void *, size_t);
-  void *(*dev2host_func) (int, void *, const void *, size_t);
-  void (*run_func) (int, void *, void *);
-
-  /* Splay tree containing information about mapped memory regions.  */
-  struct splay_tree_s dev_splay_tree;
-
-  /* Mutex for operating with the splay tree and other shared structures.  */
-  gomp_mutex_t dev_env_lock;
-};
+  (void) pthread_once (&gomp_is_initialized, gomp_target_init);
+}
 
 attribute_hidden int
 gomp_get_num_devices (void)
 {
-  (void) pthread_once (&gomp_is_initialized, gomp_target_init);
+  gomp_init_targets_once ();
   return num_devices;
 }
 
@@ -198,18 +128,29 @@ gomp_map_vars_existing (splay_tree_key oldn, splay_tree_key newn,
   oldn->refcount++;
 }
 
-static struct target_mem_desc *
+static int
+get_kind (bool is_openacc, void *kinds, int idx)
+{
+  return is_openacc ? ((unsigned short *) kinds)[idx]
+		    : ((unsigned char *) kinds)[idx];
+}
+
+attribute_hidden struct target_mem_desc *
 gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
-	       void **hostaddrs, size_t *sizes, unsigned char *kinds,
-	       bool is_target)
+	       void **hostaddrs, void **devaddrs, size_t *sizes, void *kinds,
+	       bool is_openacc, bool is_target)
 {
   size_t i, tgt_align, tgt_size, not_found_cnt = 0;
+  const int rshift = is_openacc ? 8 : 3;
+  const int typemask = is_openacc ? 0xff : 0x7;
+  struct gomp_memory_mapping *mm = &devicep->mem_map;
   struct splay_tree_key_s cur_node;
   struct target_mem_desc *tgt
     = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
   tgt->list_count = mapnum;
   tgt->refcount = 1;
   tgt->device_descr = devicep;
+  tgt->mem_map = mm;
 
   if (mapnum == 0)
     return tgt;
@@ -222,41 +163,41 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       tgt_align = align;
       tgt_size = mapnum * sizeof (void *);
     }
-
-  gomp_mutex_lock (&devicep->dev_env_lock);
+  gomp_mutex_lock (&mm->lock);
   for (i = 0; i < mapnum; i++)
     {
+      int kind = get_kind (is_openacc, kinds, i);
       if (hostaddrs[i] == NULL)
 	{
 	  tgt->list[i] = NULL;
 	  continue;
 	}
       cur_node.host_start = (uintptr_t) hostaddrs[i];
-      if ((kinds[i] & 7) != 4)
+      if (!GOMP_MAP_POINTER_P (kind & typemask))
 	cur_node.host_end = cur_node.host_start + sizes[i];
       else
 	cur_node.host_end = cur_node.host_start + sizeof (void *);
-      splay_tree_key n = splay_tree_lookup (&devicep->dev_splay_tree,
-					    &cur_node);
+      splay_tree_key n = splay_tree_lookup (&mm->splay_tree, &cur_node);
       if (n)
 	{
 	  tgt->list[i] = n;
-	  gomp_map_vars_existing (n, &cur_node, kinds[i]);
+	  gomp_map_vars_existing (n, &cur_node, kind);
 	}
       else
 	{
-	  size_t align = (size_t) 1 << (kinds[i] >> 3);
+	  size_t align = (size_t) 1 << (kind >> rshift);
 	  tgt->list[i] = NULL;
 	  not_found_cnt++;
 	  if (tgt_align < align)
 	    tgt_align = align;
 	  tgt_size = (tgt_size + align - 1) & ~(align - 1);
 	  tgt_size += cur_node.host_end - cur_node.host_start;
-	  if ((kinds[i] & 7) == 5)
+	  if ((kind & typemask) == GOMP_MAP_TO_PSET)
 	    {
 	      size_t j;
 	      for (j = i + 1; j < mapnum; j++)
-		if ((kinds[j] & 7) != 4)
+		if (!GOMP_MAP_POINTER_P (get_kind (is_openacc, kinds, j)
+					 & typemask))
 		  break;
 		else if ((uintptr_t) hostaddrs[j] < cur_node.host_start
 			 || ((uintptr_t) hostaddrs[j] + sizeof (void *)
@@ -271,7 +212,15 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	}
     }
 
-  if (not_found_cnt || is_target)
+  if (devaddrs)
+    {
+      if (mapnum != 1)
+        gomp_fatal ("unexpected aggregation");
+      tgt->to_free = devaddrs[0];
+      tgt->tgt_start = (uintptr_t) tgt->to_free;
+      tgt->tgt_end = tgt->tgt_start + sizes[0];
+    }
+  else if (not_found_cnt || is_target)
     {
       /* Allocate tgt_align aligned tgt_size block of memory.  */
       /* FIXME: Perhaps change interface to allocate properly aligned
@@ -303,44 +252,52 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       for (i = 0; i < mapnum; i++)
 	if (tgt->list[i] == NULL)
 	  {
+	    int kind = get_kind (is_openacc, kinds, i);
 	    if (hostaddrs[i] == NULL)
 	      continue;
 	    splay_tree_key k = &array->key;
 	    k->host_start = (uintptr_t) hostaddrs[i];
-	    if ((kinds[i] & 7) != 4)
+	    if (!GOMP_MAP_POINTER_P (kind & typemask))
 	      k->host_end = k->host_start + sizes[i];
 	    else
 	      k->host_end = k->host_start + sizeof (void *);
-	    splay_tree_key n
-	      = splay_tree_lookup (&devicep->dev_splay_tree, k);
+	    splay_tree_key n = splay_tree_lookup (&mm->splay_tree, k);
 	    if (n)
 	      {
 		tgt->list[i] = n;
-		gomp_map_vars_existing (n, k, kinds[i]);
+		gomp_map_vars_existing (n, k, kind);
 	      }
 	    else
 	      {
-		size_t align = (size_t) 1 << (kinds[i] >> 3);
+		size_t align = (size_t) 1 << (kind >> rshift);
 		tgt->list[i] = k;
 		tgt_size = (tgt_size + align - 1) & ~(align - 1);
 		k->tgt = tgt;
 		k->tgt_offset = tgt_size;
 		tgt_size += k->host_end - k->host_start;
-		k->copy_from = false;
-		if ((kinds[i] & 7) == 2 || (kinds[i] & 7) == 3)
-		  k->copy_from = true;
+		k->copy_from = GOMP_MAP_COPYFROM_P (kind & typemask)
+			       || GOMP_MAP_TOFROM_P (kind & typemask);
 		k->refcount = 1;
+		k->async_refcount = 0;
 		tgt->refcount++;
 		array->left = NULL;
 		array->right = NULL;
-		splay_tree_insert (&devicep->dev_splay_tree, array);
-		switch (kinds[i] & 7)
+		splay_tree_insert (&mm->splay_tree, array);
+		switch (kind & typemask)
 		  {
-		  case 0: /* ALLOC */
-		  case 2: /* FROM */
+		  case GOMP_MAP_FORCE_ALLOC:
+		  case GOMP_MAP_FORCE_FROM:
+		    /* FIXME: No special handling (see comment in
+		       oacc-parallel.c).  */
+		  case GOMP_MAP_ALLOC:
+		  case GOMP_MAP_ALLOC_FROM:
 		    break;
-		  case 1: /* TO */
-		  case 3: /* TOFROM */
+		  case GOMP_MAP_FORCE_TO:
+		  case GOMP_MAP_FORCE_TOFROM:
+		    /* FIXME: No special handling, as above.  */
+		  case GOMP_MAP_ALLOC_TO:
+		  case GOMP_MAP_ALLOC_TOFROM:
+		    /* Copy from host to device memory.  */
 		    /* FIXME: Perhaps add some smarts, like if copying
 		       several adjacent fields from host to target, use some
 		       host buffer to avoid sending each var individually.  */
@@ -350,7 +307,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 					    (void *) k->host_start,
 					    k->host_end - k->host_start);
 		    break;
-		  case 4: /* POINTER */
+		  case GOMP_MAP_POINTER:
 		    cur_node.host_start
 		      = (uintptr_t) *(void **) k->host_start;
 		    if (cur_node.host_start == (uintptr_t) NULL)
@@ -366,19 +323,16 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		    /* Add bias to the pointer value.  */
 		    cur_node.host_start += sizes[i];
 		    cur_node.host_end = cur_node.host_start + 1;
-		    n = splay_tree_lookup (&devicep->dev_splay_tree,
-					   &cur_node);
+		    n = splay_tree_lookup (&mm->splay_tree, &cur_node);
 		    if (n == NULL)
 		      {
 			/* Could be possibly zero size array section.  */
 			cur_node.host_end--;
-			n = splay_tree_lookup (&devicep->dev_splay_tree,
-					       &cur_node);
+			n = splay_tree_lookup (&mm->splay_tree, &cur_node);
 			if (n == NULL)
 			  {
 			    cur_node.host_start--;
-			    n = splay_tree_lookup (&devicep->dev_splay_tree,
-						   &cur_node);
+			    n = splay_tree_lookup (&mm->splay_tree, &cur_node);
 			    cur_node.host_start++;
 			  }
 		      }
@@ -398,14 +352,17 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 					    (void *) &cur_node.tgt_offset,
 					    sizeof (void *));
 		    break;
-		  case 5: /* TO_PSET */
-		    devicep->host2dev_func (devicep->target_id,
-					    (void *) (tgt->tgt_start
-						      + k->tgt_offset),
-					    (void *) k->host_start,
-					    k->host_end - k->host_start);
+		  case GOMP_MAP_TO_PSET:
+		    /* Copy from host to device memory.  */
+		    /* FIXME: see above FIXME comment.  */
+		    devicep->host2dev_func
+		      (devicep->target_id,
+		       (void *) (tgt->tgt_start + k->tgt_offset),
+		       (void *) k->host_start,
+		       (k->host_end - k->host_start));
 		    for (j = i + 1; j < mapnum; j++)
-		      if ((kinds[j] & 7) != 4)
+		      if (!GOMP_MAP_POINTER_P (get_kind (is_openacc, kinds, j)
+					       & typemask))
 			break;
 		      else if ((uintptr_t) hostaddrs[j] < k->host_start
 			       || ((uintptr_t) hostaddrs[j] + sizeof (void *)
@@ -432,19 +389,18 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 			  /* Add bias to the pointer value.  */
 			  cur_node.host_start += sizes[j];
 			  cur_node.host_end = cur_node.host_start + 1;
-			  n = splay_tree_lookup (&devicep->dev_splay_tree,
-						 &cur_node);
+			  n = splay_tree_lookup (&mm->splay_tree, &cur_node);
 			  if (n == NULL)
 			    {
 			      /* Could be possibly zero size array section.  */
 			      cur_node.host_end--;
-			      n = splay_tree_lookup (&devicep->dev_splay_tree,
+			      n = splay_tree_lookup (&mm->splay_tree,
 						     &cur_node);
 			      if (n == NULL)
 				{
 				  cur_node.host_start--;
-				  n = splay_tree_lookup
-					(&devicep->dev_splay_tree, &cur_node);
+				  n = splay_tree_lookup (&mm->splay_tree,
+							 &cur_node);
 				  cur_node.host_start++;
 				}
 			    }
@@ -468,6 +424,32 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 			  i++;
 			}
 		      break;
+		    case GOMP_MAP_FORCE_PRESENT:
+		      {
+		        /* We already looked up the memory region above and it
+			   was missing.  */
+			size_t size = k->host_end - k->host_start;
+			gomp_fatal ("present clause: !acc_is_present (%p, "
+				    "%zd (0x%zx))", (void *) k->host_start,
+				    size, size);
+		      }
+		      break;
+		    case GOMP_MAP_FORCE_DEVICEPTR:
+		      assert (k->host_end - k->host_start == sizeof (void *));
+		      
+		      devicep->host2dev_func
+		        (devicep->target_id,
+			 (void *) (tgt->tgt_start + k->tgt_offset),
+			 (void *) k->host_start,
+			 sizeof (void *));
+		      break;
+		    case GOMP_MAP_FORCE_PRIVATE:
+		      abort ();
+		    case GOMP_MAP_FORCE_FIRSTPRIVATE:
+		      abort ();
+		    default:
+		      gomp_fatal ("%s: unhandled kind 0x%.2x", __FUNCTION__,
+				  kind);
 		  }
 		array++;
 	      }
@@ -490,7 +472,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	}
     }
 
-  gomp_mutex_unlock (&devicep->dev_env_lock);
+  gomp_mutex_unlock (&mm->lock);
   return tgt;
 }
 
@@ -505,10 +487,51 @@ gomp_unmap_tgt (struct target_mem_desc *tgt)
   free (tgt);
 }
 
-static void
-gomp_unmap_vars (struct target_mem_desc *tgt)
+/* Decrease the refcount for a set of mapped variables, and queue asynchronous
+   copies from the device back to the host after any work that has been issued.
+   Because the regions are still "live", increment an asynchronous reference
+   count to indicate that they should not be unmapped from host-side data
+   structures until the asynchronous copy has completed.  */
+
+attribute_hidden void
+gomp_copy_from_async (struct target_mem_desc *tgt)
+{
+  struct gomp_device_descr *devicep = tgt->device_descr;
+  struct gomp_memory_mapping *mm = tgt->mem_map;
+  size_t i;
+  
+  gomp_mutex_lock (&mm->lock);
+
+  for (i = 0; i < tgt->list_count; i++)
+    if (tgt->list[i] == NULL)
+      ;
+    else if (tgt->list[i]->refcount > 1)
+      {
+	tgt->list[i]->refcount--;
+	tgt->list[i]->async_refcount++;
+      }
+    else
+      {
+	splay_tree_key k = tgt->list[i];
+	if (k->copy_from)
+	  /* Copy from device to host memory.  */
+	  devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
+				  (void *) (k->tgt->tgt_start + k->tgt_offset),
+				  k->host_end - k->host_start);
+      }
+
+  gomp_mutex_unlock (&mm->lock);
+}
+
+/* Unmap variables described by TGT.  If DO_COPYFROM is true, copy relevant
+   variables back from device to host: if it is false, it is assumed that this
+   has been done already, i.e. by gomp_copy_from_async above.  */
+
+attribute_hidden void
+gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
 {
   struct gomp_device_descr *devicep = tgt->device_descr;
+  struct gomp_memory_mapping *mm = tgt->mem_map;
 
   if (tgt->list_count == 0)
     {
@@ -517,20 +540,23 @@ gomp_unmap_vars (struct target_mem_desc *tgt)
     }
 
   size_t i;
-  gomp_mutex_lock (&devicep->dev_env_lock);
+  gomp_mutex_lock (&mm->lock);
   for (i = 0; i < tgt->list_count; i++)
     if (tgt->list[i] == NULL)
       ;
     else if (tgt->list[i]->refcount > 1)
       tgt->list[i]->refcount--;
+    else if (tgt->list[i]->async_refcount > 0)
+      tgt->list[i]->async_refcount--;
     else
       {
 	splay_tree_key k = tgt->list[i];
-	if (k->copy_from)
+	if (k->copy_from && do_copyfrom)
+	  /* Copy from device to host memory.  */
 	  devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
 				  (void *) (k->tgt->tgt_start + k->tgt_offset),
 				  k->host_end - k->host_start);
-	splay_tree_remove (&devicep->dev_splay_tree, k);
+	splay_tree_remove (&mm->splay_tree, k);
 	if (k->tgt->refcount > 1)
 	  k->tgt->refcount--;
 	else
@@ -541,15 +567,17 @@ gomp_unmap_vars (struct target_mem_desc *tgt)
     tgt->refcount--;
   else
     gomp_unmap_tgt (tgt);
-  gomp_mutex_unlock (&devicep->dev_env_lock);
+  gomp_mutex_unlock (&mm->lock);
 }
 
 static void
-gomp_update (struct gomp_device_descr *devicep, size_t mapnum,
-	     void **hostaddrs, size_t *sizes, unsigned char *kinds)
+gomp_update (struct gomp_device_descr *devicep, struct gomp_memory_mapping *mm,
+	     size_t mapnum, void **hostaddrs, size_t *sizes, void *kinds,
+	     bool is_openacc)
 {
   size_t i;
   struct splay_tree_key_s cur_node;
+  const int typemask = is_openacc ? 0xff : 0x7;
 
   if (!devicep)
     return;
@@ -557,16 +585,17 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum,
   if (mapnum == 0)
     return;
 
-  gomp_mutex_lock (&devicep->dev_env_lock);
+  gomp_mutex_lock (&mm->lock);
   for (i = 0; i < mapnum; i++)
     if (sizes[i])
       {
 	cur_node.host_start = (uintptr_t) hostaddrs[i];
 	cur_node.host_end = cur_node.host_start + sizes[i];
-	splay_tree_key n = splay_tree_lookup (&devicep->dev_splay_tree,
+	splay_tree_key n = splay_tree_lookup (&mm->splay_tree,
 					      &cur_node);
 	if (n)
 	  {
+	    int kind = get_kind (is_openacc, kinds, i);
 	    if (n->host_start > cur_node.host_start
 		|| n->host_end < cur_node.host_end)
 	      gomp_fatal ("Trying to update [%p..%p) object when"
@@ -575,31 +604,38 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum,
 			  (void *) cur_node.host_end,
 			  (void *) n->host_start,
 			  (void *) n->host_end);
-	    if ((kinds[i] & 7) == 1)
-	      devicep->host2dev_func (devicep->target_id,
-				      (void *) (n->tgt->tgt_start
-						+ n->tgt_offset
-						+ cur_node.host_start
-						- n->host_start),
-				      (void *) cur_node.host_start,
-				      cur_node.host_end - cur_node.host_start);
-	    else if ((kinds[i] & 7) == 2)
-	      devicep->dev2host_func (devicep->target_id,
-				      (void *) cur_node.host_start,
-				      (void *) (n->tgt->tgt_start
-						+ n->tgt_offset
-						+ cur_node.host_start
-						- n->host_start),
-				      cur_node.host_end - cur_node.host_start);
+	    if (GOMP_MAP_COPYTO_P (kind & typemask))
+	      /* Copy from host to device memory.  */
+	      devicep->host2dev_func
+		(devicep->target_id, 
+		 (void *) (n->tgt->tgt_start
+			   + n->tgt_offset
+			   + cur_node.host_start
+			   - n->host_start),
+		 (void *) cur_node.host_start,
+		 cur_node.host_end - cur_node.host_start);
+	    else if (GOMP_MAP_COPYFROM_P (kind & typemask))
+	      /* Copy from device to host memory.  */
+	      devicep->dev2host_func
+		(devicep->target_id,
+		 (void *) cur_node.host_start,
+		 (void *) (n->tgt->tgt_start
+			   + n->tgt_offset
+			   + cur_node.host_start
+			   - n->host_start),
+		 cur_node.host_end - cur_node.host_start);
 	  }
 	else
 	  gomp_fatal ("Trying to update [%p..%p) object that is not mapped",
 		      (void *) cur_node.host_start,
 		      (void *) cur_node.host_end);
       }
-  gomp_mutex_unlock (&devicep->dev_env_lock);
+  gomp_mutex_unlock (&mm->lock);
 }
 
+static void gomp_register_image_for_device (struct gomp_device_descr *device,
+					    struct offload_image_descr *image);
+
 /* This function should be called from every offload image.
    It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
    the target, and TARGET_DATA needed by target plugin.  */
@@ -612,6 +648,9 @@ GOMP_offload_register (void *host_table, enum offload_target_type target_type,
 				 (num_offload_images + 1)
 				 * sizeof (struct offload_image_descr));
 
+  if (offload_images == NULL)
+    return;
+
   offload_images[num_offload_images].type = target_type;
   offload_images[num_offload_images].host_table = host_table;
   offload_images[num_offload_images].target_data = target_data;
@@ -621,17 +660,24 @@ GOMP_offload_register (void *host_table, enum offload_target_type target_type,
 
 /* This function initializes the target device, specified by DEVICEP.  */
 
-static void
+attribute_hidden void
 gomp_init_device (struct gomp_device_descr *devicep)
 {
+  /* Initialize the target device.  */
   devicep->init_device_func (devicep->target_id);
+  
+  devicep->is_initialized = true;
+}
 
+attribute_hidden void
+gomp_init_tables (const struct gomp_device_descr *devicep,
+		  struct gomp_memory_mapping *mm)
+{
   /* Get address mapping table for device.  */
   struct mapping_table *table = NULL;
-  int num_entries = devicep->get_table_func (devicep->target_id, &table);
+  int i, num_entries = devicep->get_table_func (devicep->target_id, &table);
 
   /* Insert host-target address mapping into dev_splay_tree.  */
-  int i;
   for (i = 0; i < num_entries; i++)
     {
       struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
@@ -641,7 +687,7 @@ gomp_init_device (struct gomp_device_descr *devicep)
       tgt->tgt_end = table[i].tgt_end;
       tgt->to_free = NULL;
       tgt->list_count = 0;
-      tgt->device_descr = devicep;
+      tgt->device_descr = (struct gomp_device_descr *) devicep;
       splay_tree_node node = tgt->array;
       splay_tree_key k = &node->key;
       k->host_start = table[i].host_start;
@@ -652,11 +698,45 @@ gomp_init_device (struct gomp_device_descr *devicep)
       k->tgt = tgt;
       node->left = NULL;
       node->right = NULL;
-      splay_tree_insert (&devicep->dev_splay_tree, node);
+      splay_tree_insert (&mm->splay_tree, node);
     }
 
   free (table);
-  devicep->is_initialized = true;
+  mm->is_initialized = true;
+}
+
+static void
+gomp_init_dev_tables (struct gomp_device_descr *devicep)
+{
+  gomp_init_device (devicep);
+  gomp_init_tables (devicep, &devicep->mem_map);
+}
+
+
+attribute_hidden void
+gomp_free_memmap (struct gomp_device_descr *devicep)
+{
+  struct gomp_memory_mapping *mm = &devicep->mem_map;
+
+  while (mm->splay_tree.root)
+    {
+      struct target_mem_desc *tgt = mm->splay_tree.root->key.tgt;
+      
+      splay_tree_remove (&mm->splay_tree, &mm->splay_tree.root->key);
+      free (tgt->array);
+      free (tgt);
+    }
+
+  mm->is_initialized = false;
+}
+
+attribute_hidden void
+gomp_fini_device (struct gomp_device_descr *devicep)
+{
+  if (devicep->is_initialized)
+    devicep->fini_device_func (devicep->target_id);
+
+  devicep->is_initialized = false;
 }
 
 /* Called when encountering a target directive.  If DEVICE
@@ -675,7 +755,12 @@ GOMP_target (int device, void (*fn) (void *), const void *openmp_target,
 	     unsigned char *kinds)
 {
   struct gomp_device_descr *devicep = resolve_device (device);
-  if (devicep == NULL)
+  struct gomp_memory_mapping *mm = &devicep->mem_map;
+
+  if (devicep != NULL && !devicep->is_initialized)
+    gomp_init_dev_tables (devicep);
+
+  if (devicep == NULL || !(devicep->capabilities & TARGET_CAP_OPENMP_400))
     {
       /* Host fallback.  */
       struct gomp_thread old_thr, *thr = gomp_thread ();
@@ -692,20 +777,30 @@ GOMP_target (int device, void (*fn) (void *), const void *openmp_target,
       return;
     }
 
-  gomp_mutex_lock (&devicep->dev_env_lock);
-  if (!devicep->is_initialized)
-    gomp_init_device (devicep);
+  void *fn_addr;
 
-  struct splay_tree_key_s k;
-  k.host_start = (uintptr_t) fn;
-  k.host_end = k.host_start + 1;
-  splay_tree_key tgt_fn = splay_tree_lookup (&devicep->dev_splay_tree, &k);
-  if (tgt_fn == NULL)
-    gomp_fatal ("Target function wasn't mapped");
-  gomp_mutex_unlock (&devicep->dev_env_lock);
+  if (devicep->capabilities & TARGET_CAP_NATIVE_EXEC)
+    fn_addr = (void *) fn;
+  else
+    {
+      gomp_mutex_lock (&mm->lock);
+      if (!devicep->is_initialized)
+	gomp_init_dev_tables (devicep);
+      struct splay_tree_key_s k;
+      k.host_start = (uintptr_t) fn;
+      k.host_end = k.host_start + 1;
+      splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map.splay_tree,
+						 &k);
+      if (tgt_fn == NULL)
+	gomp_fatal ("Target function wasn't mapped");
+      gomp_mutex_unlock (&mm->lock);
+      
+      fn_addr = (void *) tgt_fn->tgt->tgt_start;
+    }
 
   struct target_mem_desc *tgt_vars
-    = gomp_map_vars (devicep, mapnum, hostaddrs, sizes, kinds, true);
+    = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
+		     true);
   struct gomp_thread old_thr, *thr = gomp_thread ();
   old_thr = *thr;
   memset (thr, '\0', sizeof (*thr));
@@ -714,11 +809,10 @@ GOMP_target (int device, void (*fn) (void *), const void *openmp_target,
       thr->place = old_thr.place;
       thr->ts.place_partition_len = gomp_places_list_len;
     }
-  devicep->run_func (devicep->target_id, (void *) tgt_fn->tgt->tgt_start,
-		     (void *) tgt_vars->tgt_start);
+  devicep->run_func (devicep->target_id, fn_addr, (void *) tgt_vars->tgt_start);
   gomp_free_thread (thr);
   *thr = old_thr;
-  gomp_unmap_vars (tgt_vars);
+  gomp_unmap_vars (tgt_vars, true);
 }
 
 void
@@ -726,7 +820,12 @@ GOMP_target_data (int device, const void *openmp_target, size_t mapnum,
 		  void **hostaddrs, size_t *sizes, unsigned char *kinds)
 {
   struct gomp_device_descr *devicep = resolve_device (device);
-  if (devicep == NULL)
+  struct gomp_memory_mapping *mm = &devicep->mem_map;
+
+  if (devicep != NULL && !devicep->is_initialized)
+    gomp_init_dev_tables (devicep);
+
+  if (devicep == NULL || !(devicep->capabilities & TARGET_CAP_OPENMP_400))
     {
       /* Host fallback.  */
       struct gomp_task_icv *icv = gomp_icv (false);
@@ -737,20 +836,21 @@ GOMP_target_data (int device, const void *openmp_target, size_t mapnum,
 	     new #pragma omp target data, otherwise GOMP_target_end_data
 	     would get out of sync.  */
 	  struct target_mem_desc *tgt
-	    = gomp_map_vars (NULL, 0, NULL, NULL, NULL, false);
+	    = gomp_map_vars (NULL, 0, NULL, NULL, NULL, NULL, false, false);
 	  tgt->prev = icv->target_data;
 	  icv->target_data = tgt;
 	}
       return;
     }
 
-  gomp_mutex_lock (&devicep->dev_env_lock);
+  gomp_mutex_lock (&mm->lock);
   if (!devicep->is_initialized)
-    gomp_init_device (devicep);
-  gomp_mutex_unlock (&devicep->dev_env_lock);
+    gomp_init_dev_tables (devicep);
+  gomp_mutex_unlock (&mm->lock);
 
   struct target_mem_desc *tgt
-    = gomp_map_vars (devicep, mapnum, hostaddrs, sizes, kinds, false);
+    = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
+		     false);
   struct gomp_task_icv *icv = gomp_icv (true);
   tgt->prev = icv->target_data;
   icv->target_data = tgt;
@@ -764,7 +864,7 @@ GOMP_target_end_data (void)
     {
       struct target_mem_desc *tgt = icv->target_data;
       icv->target_data = tgt->prev;
-      gomp_unmap_vars (tgt);
+      gomp_unmap_vars (tgt, true);
     }
 }
 
@@ -773,15 +873,18 @@ GOMP_target_update (int device, const void *openmp_target, size_t mapnum,
 		    void **hostaddrs, size_t *sizes, unsigned char *kinds)
 {
   struct gomp_device_descr *devicep = resolve_device (device);
-  if (devicep == NULL)
-    return;
+  struct gomp_memory_mapping *mm = &devicep->mem_map;
 
-  gomp_mutex_lock (&devicep->dev_env_lock);
-  if (!devicep->is_initialized)
+  gomp_mutex_lock (&mm->lock);
+  if (devicep != NULL && !devicep->is_initialized)
     gomp_init_device (devicep);
-  gomp_mutex_unlock (&devicep->dev_env_lock);
+  gomp_mutex_unlock (&mm->lock);
 
-  gomp_update (devicep, mapnum, hostaddrs, sizes, kinds);
+  if (devicep == NULL || !(devicep->capabilities & TARGET_CAP_OPENMP_400))
+    return;
+
+  gomp_update (devicep, &devicep->mem_map, mapnum, hostaddrs, sizes, kinds,
+	       false);
 }
 
 void
@@ -808,9 +911,22 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
 			     const char *plugin_name)
 {
   void *plugin_handle = dlopen (plugin_name, RTLD_LAZY);
+  char *err = NULL, *last_missing = NULL;
+  int optional_present, optional_total;
+
   if (!plugin_handle)
     return false;
 
+  /* Clear any existing error.  */
+  dlerror ();
+
+  device->plugin_handle = dlopen (plugin_name, RTLD_LAZY);
+  if (!device->plugin_handle)
+    {
+      err = dlerror ();
+      goto out;
+    }
+
   /* Check if all required functions are available in the plugin and store
      their handlers.  */
 #define DLSYM(f)						    \
@@ -821,33 +937,104 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
 	return false;						    \
     }								    \
   while (0)
+  /* Similar, but missing functions are not an error.  */
+#define DLSYM_OPT(f,n) \
+  do									\
+    {									\
+      char *tmp_err;							\
+      device->f##_func = dlsym (device->plugin_handle,			\
+				"GOMP_OFFLOAD_" #n);			\
+      tmp_err = dlerror ();						\
+      if (tmp_err == NULL)						\
+        optional_present++;						\
+      else								\
+        last_missing = #n;						\
+      optional_total++;							\
+    }									\
+  while (0)
+
+  DLSYM (get_name);
+  DLSYM (get_caps);
   DLSYM (get_type);
   DLSYM (get_num_devices);
   DLSYM (register_image);
   DLSYM (init_device);
+  DLSYM (fini_device);
   DLSYM (get_table);
   DLSYM (alloc);
   DLSYM (free);
   DLSYM (dev2host);
   DLSYM (host2dev);
-  DLSYM (run);
+  device->capabilities = device->get_caps_func ();
+  if (device->capabilities & TARGET_CAP_OPENMP_400)
+    DLSYM (run);
+  if (device->capabilities & TARGET_CAP_OPENACC_200)
+    {
+      optional_present = optional_total = 0;
+      DLSYM_OPT (openacc.exec, openacc_parallel);
+      DLSYM_OPT (openacc.open_device, openacc_open_device);
+      DLSYM_OPT (openacc.close_device, openacc_close_device);
+      DLSYM_OPT (openacc.get_device_num, openacc_get_device_num);
+      DLSYM_OPT (openacc.set_device_num, openacc_set_device_num);
+      DLSYM_OPT (openacc.register_async_cleanup,
+		 openacc_register_async_cleanup);
+      DLSYM_OPT (openacc.async_test, openacc_async_test);
+      DLSYM_OPT (openacc.async_test_all, openacc_async_test_all);
+      DLSYM_OPT (openacc.async_wait, openacc_async_wait);
+      DLSYM_OPT (openacc.async_wait_async, openacc_async_wait_async);
+      DLSYM_OPT (openacc.async_wait_all, openacc_async_wait_all);
+      DLSYM_OPT (openacc.async_wait_all_async, openacc_async_wait_all_async);
+      DLSYM_OPT (openacc.async_set_async, openacc_async_set_async);
+      DLSYM_OPT (openacc.create_thread_data, openacc_create_thread_data);
+      DLSYM_OPT (openacc.destroy_thread_data, openacc_destroy_thread_data);
+      /* Require all the OpenACC handlers if we have TARGET_CAP_OPENACC_200.  */
+      if (optional_present != optional_total)
+	{
+	  err = "plugin missing OpenACC handler function";
+	  goto out;
+	}
+      optional_present = optional_total = 0;
+      DLSYM_OPT (openacc.cuda.get_current_device,
+		 openacc_get_current_cuda_device);
+      DLSYM_OPT (openacc.cuda.get_current_context,
+		 openacc_get_current_cuda_context);
+      DLSYM_OPT (openacc.cuda.get_stream, openacc_get_cuda_stream);
+      DLSYM_OPT (openacc.cuda.set_stream, openacc_set_cuda_stream);
+      /* Make sure all the CUDA functions are there if any of them are.  */
+      if (optional_present && optional_present != optional_total)
+	{
+	  err = "plugin missing OpenACC CUDA handler function";
+	  goto out;
+	}
+    }
 #undef DLSYM
+#undef DLSYM_OPT
 
-  return true;
+ out:
+  if (err != NULL)
+    {
+      gomp_error ("while loading %s: %s", plugin_name, err);
+      if (last_missing)
+        gomp_error ("missing function was %s", last_missing);
+      if (device->plugin_handle)
+	dlclose (device->plugin_handle);
+    }
+  return err == NULL;
 }
 
-/* This function finds OFFLOAD_IMAGES corresponding to DEVICE type, and
-   registers them in the plugin.  */
+/* This function adds a compatible offload image IMAGE to an accelerator device
+   DEVICE.  */
 
 static void
-gomp_register_images_for_device (struct gomp_device_descr *device)
+gomp_register_image_for_device (struct gomp_device_descr *device,
+				struct offload_image_descr *image)
 {
-  int i;
-  for (i = 0; i < num_offload_images; i++)
+  if (!device->offload_regions_registered
+      && (device->type == image->type
+	  || device->type == OFFLOAD_TARGET_TYPE_HOST))
     {
-      struct offload_image_descr *image = &offload_images[i];
-      if (image->type == device->type)
-	device->register_image_func (image->host_table, image->target_data);
+      device->register_image_func (image->host_table, image->target_data);
+      device->offload_regions_registered = true;
     }
 }
 
@@ -903,15 +1090,19 @@ gomp_target_init (void)
 		  }
 
 		current_device.type = current_device.get_type_func ();
+		current_device.name = current_device.get_name_func ();
 		current_device.is_initialized = false;
-		current_device.dev_splay_tree.root = NULL;
-		gomp_register_images_for_device (&current_device);
+		current_device.offload_regions_registered = false;
+		current_device.mem_map.splay_tree.root = NULL;
+		current_device.mem_map.is_initialized = false;
+		current_device.target_data = NULL;
+		current_device.openacc.data_environ = NULL;
 		for (i = 0; i < new_num_devices; i++)
 		  {
 		    current_device.id = num_devices + 1;
 		    current_device.target_id = i;
 		    devices[num_devices] = current_device;
-		    gomp_mutex_init (&devices[num_devices].dev_env_lock);
+		    gomp_mutex_init (&devices[num_devices].mem_map.lock);
 		    num_devices++;
 		  }
 	      }
@@ -922,6 +1113,43 @@ gomp_target_init (void)
       }
     while (next);
 
+  /* Prefer a device with TARGET_CAP_OPENMP_400 for ICV default-device-var.  */
+  if (num_devices > 1)
+    {
+      int d = gomp_icv (false)->default_device_var;
+
+      if (!(devices[d].capabilities & TARGET_CAP_OPENMP_400))
+	{
+	  for (i = 0; i < num_devices; i++)
+	    {
+	      if (devices[i].capabilities & TARGET_CAP_OPENMP_400)
+		{
+		  struct gomp_device_descr device_tmp = devices[d];
+		  devices[d] = devices[i];
+		  devices[d].id = d + 1;
+		  devices[i] = device_tmp;
+		  devices[i].id = i + 1;
+
+		  break;
+		}
+	    }
+	}
+    }
+
+  for (i = 0; i < num_devices; i++)
+    {
+      int j;
+
+      for (j = 0; j < num_offload_images; j++)
+	gomp_register_image_for_device (&devices[i], &offload_images[j]);
+
+      /* The 'devices' array can be moved (by the realloc call) until we have
+	 found all the plugins, so registering with the OpenACC runtime (which
+	 takes a copy of the pointer argument) must be delayed until now.  */
+      if (devices[i].capabilities & TARGET_CAP_OPENACC_200)
+	goacc_register (&devices[i]);
+    }
+
   free (offload_images);
   offload_images = NULL;
   num_offload_images = 0;
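
The interplay between gomp_copy_from_async and the reworked gomp_unmap_vars is easiest to follow on a single mapped key. What follows is a simplified model, not libgomp code. struct region_model and the model_* functions are hypothetical. copies_back stands in for calls to dev2host_func, and clearing mapped stands in for splay_tree_remove.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one splay-tree key's bookkeeping.
   refcount/async_refcount/copy_from mirror the patch's fields; mapped and
   copies_back are stand-ins for splay-tree membership and dev2host_func.  */
struct region_model
{
  int refcount;         /* Active (synchronous) references.  */
  int async_refcount;   /* Releases deferred until the async copy is done.  */
  bool copy_from;       /* Kind requests a device-to-host copy on exit.  */
  bool mapped;          /* Still present in the host-side splay tree.  */
  int copies_back;      /* Device-to-host copies issued so far.  */
};

/* Per-key logic of gomp_copy_from_async.  */
static void
model_copy_from_async (struct region_model *r)
{
  assert (r->mapped && r->refcount >= 1);
  if (r->refcount > 1)
    {
      r->refcount--;
      r->async_refcount++;
    }
  else if (r->copy_from)
    r->copies_back++;   /* Copy queued; the key stays mapped for now.  */
}

/* Per-key logic of gomp_unmap_vars (tgt, do_copyfrom).  */
static void
model_unmap_vars (struct region_model *r, bool do_copyfrom)
{
  assert (r->mapped && r->refcount >= 1);
  if (r->refcount > 1)
    r->refcount--;
  else if (r->async_refcount > 0)
    r->async_refcount--;
  else
    {
      if (r->copy_from && do_copyfrom)
	r->copies_back++;
      r->mapped = false;
    }
}
```

In this model, a key whose last reference is released gets copied back once during the asynchronous step. It is then unmapped without a second copy, because do_copyfrom is false. A shared key sees its refcount drop by exactly one across the pair of calls. The reason is that gomp_unmap_vars consumes the async_refcount increment instead of decrementing refcount a second time.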
diff --git a/libgomp/target.h b/libgomp/target.h
new file mode 100644
index 0000000..e69de29
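
gomp_map_vars now decodes each mapping kind with rshift and typemask in place of the old hard-coded "& 7" and ">> 3". Below is a minimal sketch of that decoding. It assumes, consistent with rshift = 8 and typemask = 0xff, that OpenACC passes 16-bit kinds while OpenMP keeps 8-bit ones. sketch_get_kind is a hypothetical stand-in for the patch's get_kind, whose body is not shown here. sketch_align_up reproduces the tgt_size rounding expression.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for get_kind: OpenMP kinds are assumed to be
   unsigned char, OpenACC kinds unsigned short.  */
static int
sketch_get_kind (bool is_openacc, const void *kinds, size_t idx)
{
  return is_openacc ? ((const unsigned short *) kinds)[idx]
		    : ((const unsigned char *) kinds)[idx];
}

/* Low bits: the mapping type (3 bits for OpenMP, 8 for OpenACC).  */
static int
sketch_map_type (bool is_openacc, int kind)
{
  const int typemask = is_openacc ? 0xff : 0x7;
  return kind & typemask;
}

/* High bits: log2 of the required alignment.  */
static size_t
sketch_map_align (bool is_openacc, int kind)
{
  const int rshift = is_openacc ? 8 : 3;
  return (size_t) 1 << (kind >> rshift);
}

/* Round OFFSET up to a multiple of ALIGN (a power of two), as done for
   tgt_size when laying out the device block.  */
static size_t
sketch_align_up (size_t offset, size_t align)
{
  return (offset + align - 1) & ~(align - 1);
}
```

For example, an OpenMP tofrom mapping with 8-byte alignment encodes as (3 << 3) | 3. The matching OpenACC force_tofrom mapping encodes as (3 << 8) | 0x0b.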
diff --git a/libgomp/testsuite/Makefile.in b/libgomp/testsuite/Makefile.in
index 2f845f0..78b6351 100644
diff --git a/libgomp/testsuite/libgomp-test-support.exp.in b/libgomp/testsuite/libgomp-test-support.exp.in
new file mode 100644
index 0000000..dcadad7
--- /dev/null
+++ b/libgomp/testsuite/libgomp-test-support.exp.in
@@ -0,0 +1,2 @@
+set cuda_driver_include "@CUDA_DRIVER_INCLUDE@"
+set cuda_driver_lib "@CUDA_DRIVER_LIB@"
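
The DLSYM_OPT checks in gomp_load_plugin_for_device apply two different rules to optional entry points. When a plugin reports TARGET_CAP_OPENACC_200, every core OpenACC handler must resolve. The CUDA interop handlers, by contrast, must be either all present or all absent. The sketch below models just that rule. It replaces dlsym with a lookup in a hypothetical list of exported names. The helper names are illustrative, and the handler lists used with it are abbreviated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical replacement for dlsym: is NAME in the exported list?  */
static bool
plugin_exports (const char *const *syms, size_t nsyms, const char *name)
{
  for (size_t i = 0; i < nsyms; i++)
    if (strcmp (syms[i], name) == 0)
      return true;
  return false;
}

/* Counts resolved names, like optional_present vs. optional_total.  */
static size_t
count_present (const char *const *syms, size_t nsyms,
	       const char *const *names, size_t nnames)
{
  size_t present = 0;
  for (size_t i = 0; i < nnames; i++)
    if (plugin_exports (syms, nsyms, names[i]))
      present++;
  return present;
}

/* True if all CORE handlers resolve and the CUDA group is all-or-none.  */
static bool
openacc_plugin_ok (const char *const *syms, size_t nsyms,
		   const char *const *core, size_t ncore,
		   const char *const *cuda, size_t ncuda)
{
  if (count_present (syms, nsyms, core, ncore) != ncore)
    return false;	/* "plugin missing OpenACC handler function" */
  size_t cuda_present = count_present (syms, nsyms, cuda, ncuda);
  return cuda_present == 0 || cuda_present == ncuda;
}
```

Under this rule, a plugin that exports no CUDA entry points still loads. A plugin that exports only some of them is rejected.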

* Re: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost)
  2014-11-15  1:04     ` [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost) Julian Brown
@ 2014-11-19 19:58       ` Bernd Schmidt
  2014-11-19 20:39         ` Cesar Philippidis
  0 siblings, 1 reply; 36+ messages in thread
From: Bernd Schmidt @ 2014-11-19 19:58 UTC (permalink / raw)
  To: Julian Brown, Jakub Jelinek
  Cc: gcc-patches, Thomas Schwinge, Ilya Verbin, Cesar Philippidis

[-- Attachment #1: Type: text/plain, Size: 848 bytes --]

I've had some trouble with this patch as well - parts of it appear 
malformed, and in one instance a file still references the nonexistent 
target.h rather than libgomp_target.h. That was easy to fix, but the patch 
is also missing some changes that Cesar recently made to our local sources, 
which Thomas' middle-end submission requires.

I'm attaching the patch in the form in which I've made it work locally, 
plus Cesar's patch which is needed on top of it. Julian, you'll probably 
want to look for that patch since it also included testsuite changes. 
Cesar - have a look over this please and maybe explain for review 
purposes what your patch does.

On the bright side, I now have a local tree based on gcc trunk with all 
posted patches plus several additional fixes, and it appears to be 
offloading stuff to ptx.


Bernd


[-- Attachment #2: jb-oacc-1115b.diff --]
[-- Type: text/x-patch, Size: 365993 bytes --]

commit 9a93f9ac64add8a4f8fc18f792b211218f7e9e29
Author: Bernd Schmidt <bernds@codesourcery.com>
Date:   Wed Nov 19 00:23:06 2014 +0100

    Julian's libgomp stuff

diff --git a/include/gomp-constants.h b/include/gomp-constants.h
new file mode 100644
index 0000000..7ef5c88
--- /dev/null
+++ b/include/gomp-constants.h
@@ -0,0 +1,45 @@
+#ifndef GOMP_CONSTANTS_H
+#define GOMP_CONSTANTS_H 1
+
+/* Enumerated variable mapping types used to communicate between GCC and
+   libgomp.  These values are used for both OpenMP and OpenACC.  */
+
+#define GOMP_MAP_ALLOC			0x00
+#define GOMP_MAP_ALLOC_TO		0x01
+#define GOMP_MAP_ALLOC_FROM		0x02
+#define GOMP_MAP_ALLOC_TOFROM		0x03
+#define GOMP_MAP_POINTER		0x04
+#define GOMP_MAP_TO_PSET		0x05
+#define GOMP_MAP_FORCE_ALLOC		0x08
+#define GOMP_MAP_FORCE_TO		0x09
+#define GOMP_MAP_FORCE_FROM		0x0a
+#define GOMP_MAP_FORCE_TOFROM		0x0b
+#define GOMP_MAP_FORCE_PRESENT		0x0c
+#define GOMP_MAP_FORCE_DEALLOC		0x0d
+#define GOMP_MAP_FORCE_DEVICEPTR	0x0e
+#define GOMP_MAP_FORCE_PRIVATE		0x18
+#define GOMP_MAP_FORCE_FIRSTPRIVATE	0x19
+
+#define GOMP_MAP_COPYTO_P(X) \
+  ((X) == GOMP_MAP_ALLOC_TO || (X) == GOMP_MAP_FORCE_TO)
+
+#define GOMP_MAP_COPYFROM_P(X) \
+  ((X) == GOMP_MAP_ALLOC_FROM || (X) == GOMP_MAP_FORCE_FROM)
+
+#define GOMP_MAP_TOFROM_P(X) \
+  ((X) == GOMP_MAP_ALLOC_TOFROM || (X) == GOMP_MAP_FORCE_TOFROM)
+
+#define GOMP_MAP_POINTER_P(X) \
+  ((X) == GOMP_MAP_POINTER)
+
+#define GOMP_IF_CLAUSE_FALSE		-2
+
+/* Canonical list of target type codes for OpenMP/OpenACC.  */
+#define GOMP_TARGET_NONE		0
+#define GOMP_TARGET_HOST		2
+#define GOMP_TARGET_HOST_NONSHM		3
+#define GOMP_TARGET_NOT_HOST		4
+#define GOMP_TARGET_NVIDIA_PTX		5
+#define GOMP_TARGET_INTEL_MIC		6
+
+#endif
diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index 427415e..f48c1ff 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -7,7 +7,8 @@ SUBDIRS = testsuite
 gcc_version := $(shell cat $(top_srcdir)/../gcc/BASE-VER)
 
 config_path = @config_path@
-search_path = $(addprefix $(top_srcdir)/config/, $(config_path)) $(top_srcdir)
+search_path = $(addprefix $(top_srcdir)/config/, $(config_path)) $(top_srcdir) \
+	      $(top_srcdir)/../include
 
 fincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/finclude
 libsubincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include
@@ -18,6 +19,10 @@ AM_CPPFLAGS = $(addprefix -I, $(search_path))
 AM_CFLAGS = $(XCFLAGS)
 AM_LDFLAGS = $(XLDFLAGS) $(SECTION_LDFLAGS) $(OPT_LDFLAGS)
 
+if LIBGOMP_VERBOSE
+AM_CPPFLAGS += -DLIBGOMP_VERBOSE
+endif
+
 toolexeclib_LTLIBRARIES = libgomp.la
 nodist_toolexeclib_HEADERS = libgomp.spec
 
@@ -60,12 +65,21 @@ libgomp_la_LINK = $(LINK) $(libgomp_la_LDFLAGS)
 libgomp_la_SOURCES = alloc.c barrier.c critical.c env.c error.c iter.c \
 	iter_ull.c loop.c loop_ull.c ordered.c parallel.c sections.c single.c \
 	task.c team.c work.c lock.c mutex.c proc.c sem.c bar.c ptrlock.c \
-	time.c fortran.c affinity.c target.c
+	time.c fortran.c affinity.c target.c oacc-parallel.c splay-tree.c \
+	oacc-host.c oacc-init.c oacc-mem.c oacc-async.c oacc-plugin.c \
+	oacc-cuda.c libgomp-plugin.c
+
+include $(top_srcdir)/plugin/Makefrag.am
+
+if USE_FORTRAN
+libgomp_la_SOURCES += openacc.f90
+endif
 
 nodist_noinst_HEADERS = libgomp_f.h
-nodist_libsubinclude_HEADERS = omp.h
+nodist_libsubinclude_HEADERS = omp.h openacc.h ../include/gomp-constants.h
 if USE_FORTRAN
-nodist_finclude_HEADERS = omp_lib.h omp_lib.f90 omp_lib.mod omp_lib_kinds.mod
+nodist_finclude_HEADERS = omp_lib.h omp_lib.f90 omp_lib.mod omp_lib_kinds.mod \
+	openacc_lib.h openacc.f90 openacc.mod openacc_kinds.mod
 endif
 
 LTLDFLAGS = $(shell $(SHELL) $(top_srcdir)/../libtool-ldflags $(LDFLAGS))
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 8e4774f..d2a803a 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -15,6 +15,33 @@
 
 @SET_MAKE@
 
+# Plugins for offload execution, Makefile.am fragment.
+#
+# Copyright (C) 2014 Free Software Foundation, Inc.
+#
+# Contributed by Mentor Embedded.
+#
+# This file is part of the GNU OpenMP Library (libgomp).
+#
+# Libgomp is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+# FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+# more details.
+#
+# Under Section 7 of GPL version 3, you are granted additional
+# permissions described in the GCC Runtime Library Exception, version
+# 3.1, as published by the Free Software Foundation.
+#
+# You should have received a copy of the GNU General Public License and
+# a copy of the GCC Runtime Library Exception along with this program;
+# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+# <http://www.gnu.org/licenses/>.
+
 
 VPATH = @srcdir@
 pkgdatadir = $(datadir)/@PACKAGE@
@@ -36,13 +63,17 @@ POST_UNINSTALL = :
 build_triplet = @build@
 host_triplet = @host@
 target_triplet = @target@
-subdir = .
-DIST_COMMON = ChangeLog $(srcdir)/Makefile.in $(srcdir)/Makefile.am \
+@LIBGOMP_VERBOSE_TRUE@am__append_1 = -DLIBGOMP_VERBOSE
+DIST_COMMON = $(top_srcdir)/plugin/Makefrag.am ChangeLog \
+	$(srcdir)/Makefile.in $(srcdir)/Makefile.am \
 	$(top_srcdir)/configure $(am__configure_deps) \
 	$(srcdir)/config.h.in $(srcdir)/../mkinstalldirs \
 	$(srcdir)/omp.h.in $(srcdir)/omp_lib.h.in \
 	$(srcdir)/omp_lib.f90.in $(srcdir)/libgomp_f.h.in \
 	$(srcdir)/libgomp.spec.in $(srcdir)/../depcomp
+@PLUGIN_NVPTX_TRUE@am__append_2 = libgomp-plugin-nvptx.la
+@USE_FORTRAN_TRUE@am__append_3 = openacc.f90
+subdir = .
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
 am__aclocal_m4_deps = $(top_srcdir)/../config/acx.m4 \
 	$(top_srcdir)/../config/depstand.m4 \
@@ -56,7 +87,8 @@ am__aclocal_m4_deps = $(top_srcdir)/../config/acx.m4 \
 	$(top_srcdir)/../config/tls.m4 $(top_srcdir)/../ltoptions.m4 \
 	$(top_srcdir)/../ltsugar.m4 $(top_srcdir)/../ltversion.m4 \
 	$(top_srcdir)/../lt~obsolete.m4 $(top_srcdir)/acinclude.m4 \
-	$(top_srcdir)/../libtool.m4 $(top_srcdir)/configure.ac
+	$(top_srcdir)/../libtool.m4 $(top_srcdir)/plugin/configfrag.ac \
+	$(top_srcdir)/configure.ac
 am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \
 	$(ACLOCAL_M4)
 am__CONFIG_DISTCLEAN_FILES = config.status config.cache config.log \
@@ -91,12 +123,38 @@ am__installdirs = "$(DESTDIR)$(toolexeclibdir)" "$(DESTDIR)$(infodir)" \
 	"$(DESTDIR)$(fincludedir)" "$(DESTDIR)$(libsubincludedir)" \
 	"$(DESTDIR)$(toolexeclibdir)"
 LTLIBRARIES = $(toolexeclib_LTLIBRARIES)
+libgomp_plugin_host_nonshm_la_LIBADD =
+am_libgomp_plugin_host_nonshm_la_OBJECTS =  \
+	libgomp_plugin_host_nonshm_la-plugin-host.lo
+libgomp_plugin_host_nonshm_la_OBJECTS =  \
+	$(am_libgomp_plugin_host_nonshm_la_OBJECTS)
+libgomp_plugin_host_nonshm_la_LINK = $(LIBTOOL) --tag=CC \
+	$(libgomp_plugin_host_nonshm_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
+	--mode=link $(CCLD) $(AM_CFLAGS) $(CFLAGS) \
+	$(libgomp_plugin_host_nonshm_la_LDFLAGS) $(LDFLAGS) -o $@
+am__DEPENDENCIES_1 =
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_DEPENDENCIES =  \
+@PLUGIN_NVPTX_TRUE@	$(am__DEPENDENCIES_1)
+@PLUGIN_NVPTX_TRUE@am_libgomp_plugin_nvptx_la_OBJECTS =  \
+@PLUGIN_NVPTX_TRUE@	libgomp_plugin_nvptx_la-plugin-nvptx.lo
+libgomp_plugin_nvptx_la_OBJECTS =  \
+	$(am_libgomp_plugin_nvptx_la_OBJECTS)
+libgomp_plugin_nvptx_la_LINK = $(LIBTOOL) --tag=CC \
+	$(libgomp_plugin_nvptx_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
+	--mode=link $(CCLD) $(AM_CFLAGS) $(CFLAGS) \
+	$(libgomp_plugin_nvptx_la_LDFLAGS) $(LDFLAGS) -o $@
+@PLUGIN_NVPTX_TRUE@am_libgomp_plugin_nvptx_la_rpath = -rpath \
+@PLUGIN_NVPTX_TRUE@	$(toolexeclibdir)
 libgomp_la_LIBADD =
+@USE_FORTRAN_TRUE@am__objects_1 = openacc.lo
 am_libgomp_la_OBJECTS = alloc.lo barrier.lo critical.lo env.lo \
 	error.lo iter.lo iter_ull.lo loop.lo loop_ull.lo ordered.lo \
 	parallel.lo sections.lo single.lo task.lo team.lo work.lo \
 	lock.lo mutex.lo proc.lo sem.lo bar.lo ptrlock.lo time.lo \
-	fortran.lo affinity.lo target.lo
+	fortran.lo affinity.lo target.lo oacc-parallel.lo \
+	splay-tree.lo oacc-host.lo oacc-init.lo oacc-mem.lo \
+	oacc-async.lo oacc-plugin.lo oacc-cuda.lo libgomp-plugin.lo \
+	$(am__objects_1)
 libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS)
 DEFAULT_INCLUDES = -I.@am__isrc@
 depcomp = $(SHELL) $(top_srcdir)/../depcomp
@@ -108,7 +166,15 @@ LTCOMPILE = $(LIBTOOL) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
 	--mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) \
 	$(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS)
 CCLD = $(CC)
-SOURCES = $(libgomp_la_SOURCES)
+FCCOMPILE = $(FC) $(AM_FCFLAGS) $(FCFLAGS)
+LTFCCOMPILE = $(LIBTOOL) --tag=FC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
+	--mode=compile $(FC) $(AM_FCFLAGS) $(FCFLAGS)
+FCLD = $(FC)
+FCLINK = $(LIBTOOL) --tag=FC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
+	--mode=link $(FCLD) $(AM_FCFLAGS) $(FCFLAGS) $(AM_LDFLAGS) \
+	$(LDFLAGS) -o $@
+SOURCES = $(libgomp_plugin_host_nonshm_la_SOURCES) \
+	$(libgomp_plugin_nvptx_la_SOURCES) $(libgomp_la_SOURCES)
 MULTISRCTOP = 
 MULTIBUILDTOP = 
 MULTIDIRS = 
@@ -155,6 +221,8 @@ CCDEPMODE = @CCDEPMODE@
 CFLAGS = @CFLAGS@
 CPP = @CPP@
 CPPFLAGS = @CPPFLAGS@
+CUDA_DRIVER_INCLUDE = @CUDA_DRIVER_INCLUDE@
+CUDA_DRIVER_LIB = @CUDA_DRIVER_LIB@
 CYGPATH_W = @CYGPATH_W@
 DEFS = @DEFS@
 DEPDIR = @DEPDIR@
@@ -213,6 +281,10 @@ PACKAGE_URL = @PACKAGE_URL@
 PACKAGE_VERSION = @PACKAGE_VERSION@
 PATH_SEPARATOR = @PATH_SEPARATOR@
 PERL = @PERL@
+PLUGIN_NVPTX = @PLUGIN_NVPTX@
+PLUGIN_NVPTX_CPPFLAGS = @PLUGIN_NVPTX_CPPFLAGS@
+PLUGIN_NVPTX_LDFLAGS = @PLUGIN_NVPTX_LDFLAGS@
+PLUGIN_NVPTX_LIBS = @PLUGIN_NVPTX_LIBS@
 RANLIB = @RANLIB@
 SECTION_LDFLAGS = @SECTION_LDFLAGS@
 SED = @SED@
@@ -293,13 +365,16 @@ top_srcdir = @top_srcdir@
 ACLOCAL_AMFLAGS = -I .. -I ../config
 SUBDIRS = testsuite
 gcc_version := $(shell cat $(top_srcdir)/../gcc/BASE-VER)
-search_path = $(addprefix $(top_srcdir)/config/, $(config_path)) $(top_srcdir)
+search_path = $(addprefix $(top_srcdir)/config/, $(config_path)) $(top_srcdir) \
+	      $(top_srcdir)/../include
+
 fincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/finclude
 libsubincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include
-AM_CPPFLAGS = $(addprefix -I, $(search_path))
+AM_CPPFLAGS = $(addprefix -I, $(search_path)) $(am__append_1)
 AM_CFLAGS = $(XCFLAGS)
 AM_LDFLAGS = $(XLDFLAGS) $(SECTION_LDFLAGS) $(OPT_LDFLAGS)
-toolexeclib_LTLIBRARIES = libgomp.la
+toolexeclib_LTLIBRARIES = libgomp.la $(am__append_2) \
+	libgomp-plugin-host_nonshm.la
 nodist_toolexeclib_HEADERS = libgomp.spec
 
 # -Wc is only a libtool option.
@@ -318,13 +393,34 @@ libgomp_la_LDFLAGS = $(libgomp_version_info) $(libgomp_version_script) \
 libgomp_la_DEPENDENCIES = $(libgomp_version_dep)
 libgomp_la_LINK = $(LINK) $(libgomp_la_LDFLAGS)
 libgomp_la_SOURCES = alloc.c barrier.c critical.c env.c error.c iter.c \
-	iter_ull.c loop.c loop_ull.c ordered.c parallel.c sections.c single.c \
-	task.c team.c work.c lock.c mutex.c proc.c sem.c bar.c ptrlock.c \
-	time.c fortran.c affinity.c target.c
-
+	iter_ull.c loop.c loop_ull.c ordered.c parallel.c sections.c \
+	single.c task.c team.c work.c lock.c mutex.c proc.c sem.c \
+	bar.c ptrlock.c time.c fortran.c affinity.c target.c \
+	oacc-parallel.c splay-tree.c oacc-host.c oacc-init.c \
+	oacc-mem.c oacc-async.c oacc-plugin.c oacc-cuda.c \
+	libgomp-plugin.c $(am__append_3)
+
+# Nvidia PTX OpenACC plugin.
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info $(libtool_VERSION)
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_SOURCES = plugin/plugin-nvptx.c
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_NVPTX_CPPFLAGS)
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LDFLAGS =  \
+@PLUGIN_NVPTX_TRUE@	$(libgomp_plugin_nvptx_version_info) \
+@PLUGIN_NVPTX_TRUE@	$(lt_host_flags) $(PLUGIN_NVPTX_LDFLAGS)
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LIBADD = $(PLUGIN_NVPTX_LIBS)
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LIBTOOLFLAGS = --tag=disable-static
+libgomp_plugin_host_nonshm_version_info = -version-info $(libtool_VERSION)
+libgomp_plugin_host_nonshm_la_SOURCES = plugin/plugin-host.c
+libgomp_plugin_host_nonshm_la_CPPFLAGS = $(AM_CPPFLAGS) -DHOST_NONSHM_PLUGIN
+libgomp_plugin_host_nonshm_la_LDFLAGS = \
+	$(libgomp_plugin_host_nonshm_version_info) $(lt_host_flags)
+
+libgomp_plugin_host_nonshm_la_LIBTOOLFLAGS = --tag=disable-static
 nodist_noinst_HEADERS = libgomp_f.h
-nodist_libsubinclude_HEADERS = omp.h
-@USE_FORTRAN_TRUE@nodist_finclude_HEADERS = omp_lib.h omp_lib.f90 omp_lib.mod omp_lib_kinds.mod
+nodist_libsubinclude_HEADERS = omp.h openacc.h ../include/gomp-constants.h
+@USE_FORTRAN_TRUE@nodist_finclude_HEADERS = omp_lib.h omp_lib.f90 omp_lib.mod omp_lib_kinds.mod \
+@USE_FORTRAN_TRUE@	openacc_lib.h openacc.f90 openacc.mod openacc_kinds.mod
+
 LTLDFLAGS = $(shell $(SHELL) $(top_srcdir)/../libtool-ldflags $(LDFLAGS))
 LINK = $(LIBTOOL) --tag CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=link \
 	$(CCLD) $(AM_CFLAGS) $(CFLAGS) $(AM_LDFLAGS) $(LTLDFLAGS) -o $@
@@ -354,10 +450,10 @@ all: config.h
 	$(MAKE) $(AM_MAKEFLAGS) all-recursive
 
 .SUFFIXES:
-.SUFFIXES: .c .dvi .lo .o .obj .ps
+.SUFFIXES: .c .dvi .f90 .lo .o .obj .ps
 am--refresh:
 	@:
-$(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am  $(am__configure_deps)
+$(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am $(top_srcdir)/plugin/Makefrag.am $(am__configure_deps)
 	@for dep in $?; do \
 	  case '$(am__configure_deps)' in \
 	    *$$dep*) \
@@ -447,6 +543,10 @@ clean-toolexeclibLTLIBRARIES:
 	  echo "rm -f \"$${dir}/so_locations\""; \
 	  rm -f "$${dir}/so_locations"; \
 	done
+libgomp-plugin-host_nonshm.la: $(libgomp_plugin_host_nonshm_la_OBJECTS) $(libgomp_plugin_host_nonshm_la_DEPENDENCIES) 
+	$(libgomp_plugin_host_nonshm_la_LINK) -rpath $(toolexeclibdir) $(libgomp_plugin_host_nonshm_la_OBJECTS) $(libgomp_plugin_host_nonshm_la_LIBADD) $(LIBS)
+libgomp-plugin-nvptx.la: $(libgomp_plugin_nvptx_la_OBJECTS) $(libgomp_plugin_nvptx_la_DEPENDENCIES) 
+	$(libgomp_plugin_nvptx_la_LINK) $(am_libgomp_plugin_nvptx_la_rpath) $(libgomp_plugin_nvptx_la_OBJECTS) $(libgomp_plugin_nvptx_la_LIBADD) $(LIBS)
 libgomp.la: $(libgomp_la_OBJECTS) $(libgomp_la_DEPENDENCIES) 
 	$(libgomp_la_LINK) -rpath $(toolexeclibdir) $(libgomp_la_OBJECTS) $(libgomp_la_LIBADD) $(LIBS)
 
@@ -466,10 +566,20 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/fortran.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/iter.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/iter_ull.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libgomp-plugin.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libgomp_plugin_host_nonshm_la-plugin-host.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libgomp_plugin_nvptx_la-plugin-nvptx.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/lock.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/loop.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/loop_ull.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/mutex.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-async.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-cuda.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-host.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-init.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-mem.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-parallel.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-plugin.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ordered.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/parallel.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/proc.Plo@am__quote@
@@ -477,6 +587,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/sections.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/sem.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/single.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/splay-tree.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/target.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/task.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/team.Plo@am__quote@
@@ -504,6 +615,29 @@ distclean-compile:
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
 @am__fastdepCC_FALSE@	$(LTCOMPILE) -c -o $@ $<
 
+libgomp_plugin_host_nonshm_la-plugin-host.lo: plugin/plugin-host.c
+@am__fastdepCC_TRUE@	$(LIBTOOL)  --tag=CC $(libgomp_plugin_host_nonshm_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libgomp_plugin_host_nonshm_la_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -MT libgomp_plugin_host_nonshm_la-plugin-host.lo -MD -MP -MF $(DEPDIR)/libgomp_plugin_host_nonshm_la-plugin-host.Tpo -c -o libgomp_plugin_host_nonshm_la-plugin-host.lo `test -f 'plugin/plugin-host.c' || echo '$(srcdir)/'`plugin/plugin-host.c
+@am__fastdepCC_TRUE@	$(am__mv) $(DEPDIR)/libgomp_plugin_host_nonshm_la-plugin-host.Tpo $(DEPDIR)/libgomp_plugin_host_nonshm_la-plugin-host.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	source='plugin/plugin-host.c' object='libgomp_plugin_host_nonshm_la-plugin-host.lo' libtool=yes @AMDEPBACKSLASH@
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
+@am__fastdepCC_FALSE@	$(LIBTOOL)  --tag=CC $(libgomp_plugin_host_nonshm_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libgomp_plugin_host_nonshm_la_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o libgomp_plugin_host_nonshm_la-plugin-host.lo `test -f 'plugin/plugin-host.c' || echo '$(srcdir)/'`plugin/plugin-host.c
+
+libgomp_plugin_nvptx_la-plugin-nvptx.lo: plugin/plugin-nvptx.c
+@am__fastdepCC_TRUE@	$(LIBTOOL)  --tag=CC $(libgomp_plugin_nvptx_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libgomp_plugin_nvptx_la_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -MT libgomp_plugin_nvptx_la-plugin-nvptx.lo -MD -MP -MF $(DEPDIR)/libgomp_plugin_nvptx_la-plugin-nvptx.Tpo -c -o libgomp_plugin_nvptx_la-plugin-nvptx.lo `test -f 'plugin/plugin-nvptx.c' || echo '$(srcdir)/'`plugin/plugin-nvptx.c
+@am__fastdepCC_TRUE@	$(am__mv) $(DEPDIR)/libgomp_plugin_nvptx_la-plugin-nvptx.Tpo $(DEPDIR)/libgomp_plugin_nvptx_la-plugin-nvptx.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	source='plugin/plugin-nvptx.c' object='libgomp_plugin_nvptx_la-plugin-nvptx.lo' libtool=yes @AMDEPBACKSLASH@
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
+@am__fastdepCC_FALSE@	$(LIBTOOL)  --tag=CC $(libgomp_plugin_nvptx_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libgomp_plugin_nvptx_la_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o libgomp_plugin_nvptx_la-plugin-nvptx.lo `test -f 'plugin/plugin-nvptx.c' || echo '$(srcdir)/'`plugin/plugin-nvptx.c
+
+.f90.o:
+	$(FCCOMPILE) -c -o $@ $<
+
+.f90.obj:
+	$(FCCOMPILE) -c -o $@ `$(CYGPATH_W) '$<'`
+
+.f90.lo:
+	$(LTFCCOMPILE) -c -o $@ $<
+
 mostlyclean-libtool:
 	-rm -f *.lo
 
diff --git a/libgomp/configure b/libgomp/configure
index 19f36c6..83a6a11 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -1,6 +1,6 @@
 #! /bin/sh
 # Guess values for system-dependent variables and create Makefiles.
-# Generated by GNU Autoconf 2.64 for GNU OpenMP Runtime Library 1.0.
+# Generated by GNU Autoconf 2.64 for GNU Offloading and Multi Processing Runtime Library 1.0.
 #
 # Copyright (C) 1992, 1993, 1994, 1995, 1996, 1998, 1999, 2000, 2001,
 # 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009 Free Software
@@ -554,10 +554,10 @@ MFLAGS=
 MAKEFLAGS=
 
 # Identity of this package.
-PACKAGE_NAME='GNU OpenMP Runtime Library'
+PACKAGE_NAME='GNU Offloading and Multi Processing Runtime Library'
 PACKAGE_TARNAME='libgomp'
 PACKAGE_VERSION='1.0'
-PACKAGE_STRING='GNU OpenMP Runtime Library 1.0'
+PACKAGE_STRING='GNU Offloading and Multi Processing Runtime Library 1.0'
 PACKAGE_BUGREPORT=''
 PACKAGE_URL='http://www.gnu.org/software/libgomp/'
 
@@ -597,11 +597,7 @@ ac_includes_default="\
 # include <unistd.h>
 #endif"
 
-ac_subst_vars='am__EXEEXT_FALSE
-am__EXEEXT_TRUE
-LTLIBOBJS
-LIBOBJS
-OMP_NEST_LOCK_25_KIND
+ac_subst_vars='OMP_NEST_LOCK_25_KIND
 OMP_LOCK_25_KIND
 OMP_NEST_LOCK_25_ALIGN
 OMP_NEST_LOCK_25_SIZE
@@ -630,6 +626,20 @@ LIBGOMP_BUILD_VERSIONED_SHLIB_FALSE
 LIBGOMP_BUILD_VERSIONED_SHLIB_TRUE
 OPT_LDFLAGS
 SECTION_LDFLAGS
+am__EXEEXT_FALSE
+am__EXEEXT_TRUE
+LTLIBOBJS
+LIBOBJS
+PLUGIN_NVPTX_FALSE
+PLUGIN_NVPTX_TRUE
+PLUGIN_NVPTX_LIBS
+PLUGIN_NVPTX_LDFLAGS
+PLUGIN_NVPTX_CPPFLAGS
+PLUGIN_NVPTX
+CUDA_DRIVER_LIB
+CUDA_DRIVER_INCLUDE
+LIBGOMP_VERBOSE_FALSE
+LIBGOMP_VERBOSE_TRUE
 libtool_VERSION
 ac_ct_FC
 FCFLAGS
@@ -770,6 +780,10 @@ enable_fast_install
 with_gnu_ld
 enable_libtool_lock
 enable_maintainer_mode
+enable_libgomp_verbose
+with_cuda_driver
+with_cuda_driver_include
+with_cuda_driver_lib
 enable_linux_futex
 enable_tls
 enable_symvers
@@ -1324,7 +1338,7 @@ if test "$ac_init_help" = "long"; then
   # Omit some internal or obsolete options to make the list less imposing.
   # This message is too long to be a string in the A/UX 3.1 sh.
   cat <<_ACEOF
-\`configure' configures GNU OpenMP Runtime Library 1.0 to adapt to many kinds of systems.
+\`configure' configures GNU Offloading and Multi Processing Runtime Library 1.0 to adapt to many kinds of systems.
 
 Usage: $0 [OPTION]... [VAR=VALUE]...
 
@@ -1395,7 +1409,7 @@ fi
 
 if test -n "$ac_init_help"; then
   case $ac_init_help in
-     short | recursive ) echo "Configuration of GNU OpenMP Runtime Library 1.0:";;
+     short | recursive ) echo "Configuration of GNU Offloading and Multi Processing Runtime Library 1.0:";;
    esac
   cat <<\_ACEOF
 
@@ -1420,6 +1434,8 @@ Optional Features:
   --disable-libtool-lock  avoid locking (might break parallel builds)
   --enable-maintainer-mode  enable make rules and dependencies not useful
 			  (and sometimes confusing) to the casual installer
+  --enable-libgomp-verbose
+                          enable verbose debugging output for libgomp
   --enable-linux-futex    use the Linux futex system call [default=default]
   --enable-tls            Use thread-local storage [default=yes]
   --enable-symvers=STYLE  enables symbol versioning of the shared library
@@ -1431,6 +1447,16 @@ Optional Packages:
   --with-pic              try to use only PIC/non-PIC objects [default=use
                           both]
   --with-gnu-ld           assume the C compiler uses GNU ld [default=no]
+  --with-cuda-driver=PATH specify prefix directory for installed CUDA driver
+                          package. Equivalent to
+                          --with-cuda-driver-include=PATH/include plus
+                          --with-cuda-driver-lib=PATH/lib
+  --with-cuda-driver-include=PATH
+                          specify directory for installed CUDA driver include
+                          files
+  --with-cuda-driver-lib=PATH
+                          specify directory for the installed CUDA driver
+                          library
 
 Some influential environment variables:
   CC          C compiler command
@@ -1448,7 +1474,7 @@ Use these variables to override the choices made by `configure' or to help
 it to find libraries and programs with nonstandard names/locations.
 
 Report bugs to the package provider.
-GNU OpenMP Runtime Library home page: <http://www.gnu.org/software/libgomp/>.
+GNU Offloading and Multi Processing Runtime Library home page: <http://www.gnu.org/software/libgomp/>.
 General help using GNU software: <http://www.gnu.org/gethelp/>.
 _ACEOF
 ac_status=$?
@@ -1512,7 +1538,7 @@ fi
 test -n "$ac_init_help" && exit $ac_status
 if $ac_init_version; then
   cat <<\_ACEOF
-GNU OpenMP Runtime Library configure 1.0
+GNU Offloading and Multi Processing Runtime Library configure 1.0
 generated by GNU Autoconf 2.64
 
 Copyright (C) 2009 Free Software Foundation, Inc.
@@ -2193,7 +2219,7 @@ cat >config.log <<_ACEOF
 This file contains any messages produced by compilers while
 running configure, to aid debugging if configure makes a mistake.
 
-It was created by GNU OpenMP Runtime Library $as_me 1.0, which was
+It was created by GNU Offloading and Multi Processing Runtime Library $as_me 1.0, which was
 generated by GNU Autoconf 2.64.  Invocation command line was
 
   $ $0 $@
@@ -2599,7 +2625,6 @@ else
 fi
 
 
-
 # -------
 # -------
 
@@ -11097,7 +11122,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11100 "configure"
+#line 11125 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11203,7 +11228,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11206 "configure"
+#line 11231 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -15055,6 +15080,27 @@ fi
 rm -f core conftest.err conftest.$ac_objext \
     conftest$ac_exeext conftest.$ac_ext
 
+# Enable --enable-libgomp-verbose
+# Check whether --enable-libgomp-verbose was given.
+if test "${enable_libgomp_verbose+set}" = set; then :
+  enableval=$enable_libgomp_verbose; case "${enableval}" in
+  yes) libgomp_verbose=true ;;
+  no) libgomp_verbose=false ;;
+  *) as_fn_error "bad value ${enableval} for --enable-libgomp-verbose" "$LINENO" 5 ;;
+esac
+else
+  libgomp_verbose=false
+fi
+
+ if test x$libgomp_verbose = xtrue; then
+  LIBGOMP_VERBOSE_TRUE=
+  LIBGOMP_VERBOSE_FALSE='#'
+else
+  LIBGOMP_VERBOSE_TRUE='#'
+  LIBGOMP_VERBOSE_FALSE=
+fi
+
+
 plugin_support=yes
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for dlsym in -ldl" >&5
 $as_echo_n "checking for dlsym in -ldl... " >&6; }
@@ -15107,232 +15153,3147 @@ if test x"$plugin_support" = xyes; then
 
 $as_echo "#define PLUGIN_SUPPORT 1" >>confdefs.h
 
+elif test "x$enable_accelerator" != xno; then
+  as_fn_error "Can't have support for accelerators without support for plugins" "$LINENO" 5
 fi
 
-# Check for functions needed.
-for ac_func in getloadavg clock_gettime strtoull
-do :
-  as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
-ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
-eval as_val=\$$as_ac_var
-   if test "x$as_val" = x""yes; then :
-  cat >>confdefs.h <<_ACEOF
-#define `$as_echo "HAVE_$ac_func" | $as_tr_cpp` 1
-_ACEOF
-
-fi
-done
+# Plugins for offload execution, configure.ac fragment.
+#
+# Copyright (C) 2014 Free Software Foundation, Inc.
+#
+# Contributed by Mentor Embedded.
+#
+# This file is part of the GNU OpenMP Library (libgomp).
+#
+# Libgomp is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+# FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+# more details.
+#
+# Under Section 7 of GPL version 3, you are granted additional
+# permissions described in the GCC Runtime Library Exception, version
+# 3.1, as published by the Free Software Foundation.
+#
+# You should have received a copy of the GNU General Public License and
+# a copy of the GCC Runtime Library Exception along with this program;
+# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+# <http://www.gnu.org/licenses/>.
 
+# Look for the CUDA driver package.
+CUDA_DRIVER_INCLUDE=
+CUDA_DRIVER_LIB=
 
-# Check for broken semaphore implementation on darwin.
-# sem_init returns: sem_init error: Function not implemented.
-case "$host" in
-  *-darwin*)
 
-$as_echo "#define HAVE_BROKEN_POSIX_SEMAPHORES 1" >>confdefs.h
+CUDA_DRIVER_CPPFLAGS=
+CUDA_DRIVER_LDFLAGS=
 
-    ;;
-esac
+# Check whether --with-cuda-driver was given.
+if test "${with_cuda_driver+set}" = set; then :
+  withval=$with_cuda_driver;
+fi
 
- # Check whether --enable-linux-futex was given.
-if test "${enable_linux_futex+set}" = set; then :
-  enableval=$enable_linux_futex;
-      case "$enableval" in
-       yes|no|default) ;;
-       *) as_fn_error "Unknown argument to enable/disable linux-futex" "$LINENO" 5 ;;
-                          esac
 
-else
-  enable_linux_futex=default
+# Check whether --with-cuda-driver-include was given.
+if test "${with_cuda_driver_include+set}" = set; then :
+  withval=$with_cuda_driver_include;
 fi
 
 
-case "$target" in
-  *-linux*)
-    case "$enable_linux_futex" in
-      default)
-	# If headers don't have gettid/futex syscalls definition, then
-	# default to no, otherwise there will be compile time failures.
-	# Otherwise, default to yes.  If we don't detect we are
-	# compiled/linked against NPTL and not cross-compiling, check
-	# if programs are run by default against NPTL and if not, issue
-	# a warning.
-	enable_linux_futex=no
-	cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h.  */
-#include <sys/syscall.h>
-	   int lk;
-int
-main ()
-{
-syscall (SYS_gettid); syscall (SYS_futex, &lk, 0, 0, 0);
-  ;
-  return 0;
-}
-_ACEOF
-if ac_fn_c_try_link "$LINENO"; then :
-  save_LIBS="$LIBS"
-	   LIBS="-lpthread $LIBS"
-	   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h.  */
-#ifndef _GNU_SOURCE
-	     #define _GNU_SOURCE 1
-	     #endif
-	     #include <pthread.h>
-	     pthread_t th; void *status;
-int
-main ()
-{
-pthread_tryjoin_np (th, &status);
-  ;
-  return 0;
-}
-_ACEOF
-if ac_fn_c_try_link "$LINENO"; then :
-  enable_linux_futex=yes
-else
-  if test x$cross_compiling = xno; then
-	       if getconf GNU_LIBPTHREAD_VERSION 2>/dev/null \
-		  | LC_ALL=C grep -i NPTL > /dev/null 2>/dev/null; then :; else
-		 { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: The kernel might not support futex or gettid syscalls.
-If so, please configure with --disable-linux-futex" >&5
-$as_echo "$as_me: WARNING: The kernel might not support futex or gettid syscalls.
-If so, please configure with --disable-linux-futex" >&2;}
-	       fi
-	     fi
-	     enable_linux_futex=yes
-fi
-rm -f core conftest.err conftest.$ac_objext \
-    conftest$ac_exeext conftest.$ac_ext
-	   LIBS="$save_LIBS"
+# Check whether --with-cuda-driver-lib was given.
+if test "${with_cuda_driver_lib+set}" = set; then :
+  withval=$with_cuda_driver_lib;
 fi
-rm -f core conftest.err conftest.$ac_objext \
-    conftest$ac_exeext conftest.$ac_ext
-	;;
-      yes)
-	cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h.  */
-#include <sys/syscall.h>
-	   int lk;
-int
-main ()
-{
-syscall (SYS_gettid); syscall (SYS_futex, &lk, 0, 0, 0);
-  ;
-  return 0;
-}
-_ACEOF
-if ac_fn_c_try_link "$LINENO"; then :
 
-else
-  as_fn_error "SYS_gettid and SYS_futex required for --enable-linux-futex" "$LINENO" 5
+if test "x$with_cuda_driver" != x; then
+  CUDA_DRIVER_INCLUDE=$with_cuda_driver/include
+  CUDA_DRIVER_LIB=$with_cuda_driver/lib
 fi
-rm -f core conftest.err conftest.$ac_objext \
-    conftest$ac_exeext conftest.$ac_ext
-	;;
-    esac
-    ;;
-  *)
-    enable_linux_futex=no
-    ;;
-esac
-if test x$enable_linux_futex = xyes; then
-  :
+if test "x$with_cuda_driver_include" != x; then
+  CUDA_DRIVER_INCLUDE=$with_cuda_driver_include
+fi
+if test "x$with_cuda_driver_lib" != x; then
+  CUDA_DRIVER_LIB=$with_cuda_driver_lib
+fi
+if test "x$CUDA_DRIVER_INCLUDE" != x; then
+  CUDA_DRIVER_CPPFLAGS=-I$CUDA_DRIVER_INCLUDE
+fi
+if test "x$CUDA_DRIVER_LIB" != x; then
+  CUDA_DRIVER_LDFLAGS=-L$CUDA_DRIVER_LIB
 fi
 
+PLUGIN_NVPTX=0
+PLUGIN_NVPTX_CPPFLAGS=
+PLUGIN_NVPTX_LDFLAGS=
+PLUGIN_NVPTX_LIBS=
 
-# Check for pthread_{,attr_}[sg]etaffinity_np.
-cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h.  */
-#define _GNU_SOURCE
-   #include <pthread.h>
-int
-main ()
-{
-cpu_set_t cpuset;
-   pthread_attr_t attr;
-   pthread_getaffinity_np (pthread_self (), sizeof (cpu_set_t), &cpuset);
-   if (CPU_ISSET (0, &cpuset))
-     CPU_SET (1, &cpuset);
-   else
-     CPU_ZERO (&cpuset);
-   pthread_setaffinity_np (pthread_self (), sizeof (cpu_set_t), &cpuset);
-   pthread_attr_init (&attr);
-   pthread_attr_getaffinity_np (&attr, sizeof (cpu_set_t), &cpuset);
-   pthread_attr_setaffinity_np (&attr, sizeof (cpu_set_t), &cpuset);
-  ;
-  return 0;
-}
-_ACEOF
-if ac_fn_c_try_link "$LINENO"; then :
 
-$as_echo "#define HAVE_PTHREAD_AFFINITY_NP 1" >>confdefs.h
 
-fi
-rm -f core conftest.err conftest.$ac_objext \
-    conftest$ac_exeext conftest.$ac_ext
 
-# At least for glibc, clock_gettime is in librt.  But don't pull that
-# in if it still doesn't give us the function we want.
-if test $ac_cv_func_clock_gettime = no; then
-  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for clock_gettime in -lrt" >&5
-$as_echo_n "checking for clock_gettime in -lrt... " >&6; }
-if test "${ac_cv_lib_rt_clock_gettime+set}" = set; then :
-  $as_echo_n "(cached) " >&6
-else
-  ac_check_lib_save_LIBS=$LIBS
-LIBS="-lrt  $LIBS"
-cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h.  */
 
-/* Override any GCC internal prototype to avoid an error.
-   Use char because int might match the return type of a GCC
-   builtin and then its argument prototype would still apply.  */
-#ifdef __cplusplus
-extern "C"
-#endif
-char clock_gettime ();
+for accel in `echo $enable_offload_targets | sed -e 's#,# #g'`; do
+  case "$accel" in
+    nvptx*)
+      PLUGIN_NVPTX=$accel
+      PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
+      PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
+      PLUGIN_NVPTX_LIBS='-lcuda'
+
+      PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
+      CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
+      PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
+      LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
+      PLUGIN_NVPTX_save_LIBS=$LIBS
+      LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
+      cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include "cuda.h"
 int
 main ()
 {
-return clock_gettime ();
+CUresult r = cuCtxPushCurrent (NULL);
   ;
   return 0;
 }
 _ACEOF
 if ac_fn_c_try_link "$LINENO"; then :
-  ac_cv_lib_rt_clock_gettime=yes
-else
-  ac_cv_lib_rt_clock_gettime=no
+  PLUGIN_NVPTX=1
 fi
 rm -f core conftest.err conftest.$ac_objext \
     conftest$ac_exeext conftest.$ac_ext
-LIBS=$ac_check_lib_save_LIBS
+      CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
+      LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
+      LIBS=$PLUGIN_NVPTX_save_LIBS
+      case $PLUGIN_NVPTX in
+	nvptx*)
+	  PLUGIN_NVPTX=0
+	  as_fn_error "CUDA driver package required for nvptx support" "$LINENO" 5
+	  ;;
+      esac
+      ;;
+  esac
+done
+ if test $PLUGIN_NVPTX = 1; then
+  PLUGIN_NVPTX_TRUE=
+  PLUGIN_NVPTX_FALSE='#'
+else
+  PLUGIN_NVPTX_TRUE='#'
+  PLUGIN_NVPTX_FALSE=
 fi
-{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_rt_clock_gettime" >&5
-$as_echo "$ac_cv_lib_rt_clock_gettime" >&6; }
-if test "x$ac_cv_lib_rt_clock_gettime" = x""yes; then :
-  LIBS="-lrt $LIBS"
 
-$as_echo "#define HAVE_CLOCK_GETTIME 1" >>confdefs.h
 
-fi
+cat >>confdefs.h <<_ACEOF
+#define PLUGIN_NVPTX $PLUGIN_NVPTX
+_ACEOF
 
-fi
 
-# See if we support thread-local storage.
+cat >confcache <<\_ACEOF
+# This file is a shell script that caches the results of configure
+# tests run on this system so they can be shared between configure
+# scripts and configure runs, see configure's option --config-cache.
+# It is not useful on other systems.  If it contains results you don't
+# want to keep, you may remove or edit it.
+#
+# config.status only pays attention to the cache file if you give it
+# the --recheck option to rerun configure.
+#
+# `ac_cv_env_foo' variables (set or unset) will be overridden when
+# loading this file, other *unset* `ac_cv_foo' will be assigned the
+# following values.
 
+_ACEOF
 
-   # Check whether --enable-tls was given.
-if test "${enable_tls+set}" = set; then :
-  enableval=$enable_tls;
-      case "$enableval" in
-       yes|no) ;;
-       *) as_fn_error "Argument to enable/disable tls must be yes or no" "$LINENO" 5 ;;
+# The following way of writing the cache mishandles newlines in values,
+# but we know of no workaround that is simple, portable, and efficient.
+# So, we kill variables containing newlines.
+# Ultrix sh set writes to stderr and can't be redirected directly,
+# and sets the high bit in the cache file unless we assign to the vars.
+(
+  for ac_var in `(set) 2>&1 | sed -n 's/^\([a-zA-Z_][a-zA-Z0-9_]*\)=.*/\1/p'`; do
+    eval ac_val=\$$ac_var
+    case $ac_val in #(
+    *${as_nl}*)
+      case $ac_var in #(
+      *_cv_*) { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: cache variable $ac_var contains a newline" >&5
+$as_echo "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;;
       esac
+      case $ac_var in #(
+      _ | IFS | as_nl) ;; #(
+      BASH_ARGV | BASH_SOURCE) eval $ac_var= ;; #(
+      *) { eval $ac_var=; unset $ac_var;} ;;
+      esac ;;
+    esac
+  done
 
-else
-  enable_tls=yes
+  (set) 2>&1 |
+    case $as_nl`(ac_space=' '; set) 2>&1` in #(
+    *${as_nl}ac_space=\ *)
+      # `set' does not quote correctly, so add quotes: double-quote
+      # substitution turns \\\\ into \\, and sed turns \\ into \.
+      sed -n \
+	"s/'/'\\\\''/g;
+	  s/^\\([_$as_cr_alnum]*_cv_[_$as_cr_alnum]*\\)=\\(.*\\)/\\1='\\2'/p"
+      ;; #(
+    *)
+      # `set' quotes correctly as required by POSIX, so do not add quotes.
+      sed -n "/^[_$as_cr_alnum]*_cv_[_$as_cr_alnum]*=/p"
+      ;;
+    esac |
+    sort
+) |
+  sed '
+     /^ac_cv_env_/b end
+     t clear
+     :clear
+     s/^\([^=]*\)=\(.*[{}].*\)$/test "${\1+set}" = set || &/
+     t end
+     s/^\([^=]*\)=\(.*\)$/\1=${\1=\2}/
+     :end' >>confcache
+if diff "$cache_file" confcache >/dev/null 2>&1; then :; else
+  if test -w "$cache_file"; then
+    test "x$cache_file" != "x/dev/null" &&
+      { $as_echo "$as_me:${as_lineno-$LINENO}: updating cache $cache_file" >&5
+$as_echo "$as_me: updating cache $cache_file" >&6;}
+    cat confcache >$cache_file
+  else
+    { $as_echo "$as_me:${as_lineno-$LINENO}: not updating unwritable cache $cache_file" >&5
+$as_echo "$as_me: not updating unwritable cache $cache_file" >&6;}
+  fi
+fi
+rm -f confcache
+
+test "x$prefix" = xNONE && prefix=$ac_default_prefix
+# Let make expand exec_prefix.
+test "x$exec_prefix" = xNONE && exec_prefix='${prefix}'
+
+DEFS=-DHAVE_CONFIG_H
+
+ac_libobjs=
+ac_ltlibobjs=
+for ac_i in : $LIBOBJS; do test "x$ac_i" = x: && continue
+  # 1. Remove the extension, and $U if already installed.
+  ac_script='s/\$U\././;s/\.o$//;s/\.obj$//'
+  ac_i=`$as_echo "$ac_i" | sed "$ac_script"`
+  # 2. Prepend LIBOBJDIR.  When used with automake>=1.10 LIBOBJDIR
+  #    will be set to the directory where LIBOBJS objects are built.
+  as_fn_append ac_libobjs " \${LIBOBJDIR}$ac_i\$U.$ac_objext"
+  as_fn_append ac_ltlibobjs " \${LIBOBJDIR}$ac_i"'$U.lo'
+done
+LIBOBJS=$ac_libobjs
+
+LTLIBOBJS=$ac_ltlibobjs
+
+
+if test -z "${GENINSRC_TRUE}" && test -z "${GENINSRC_FALSE}"; then
+  as_fn_error "conditional \"GENINSRC\" was never defined.
+Usually this means the macro was only invoked conditionally." "$LINENO" 5
+fi
+ if test -n "$EXEEXT"; then
+  am__EXEEXT_TRUE=
+  am__EXEEXT_FALSE='#'
+else
+  am__EXEEXT_TRUE='#'
+  am__EXEEXT_FALSE=
+fi
+
+if test -z "${AMDEP_TRUE}" && test -z "${AMDEP_FALSE}"; then
+  as_fn_error "conditional \"AMDEP\" was never defined.
+Usually this means the macro was only invoked conditionally." "$LINENO" 5
+fi
+if test -z "${am__fastdepCC_TRUE}" && test -z "${am__fastdepCC_FALSE}"; then
+  as_fn_error "conditional \"am__fastdepCC\" was never defined.
+Usually this means the macro was only invoked conditionally." "$LINENO" 5
+fi
+if test -z "${BUILD_INFO_TRUE}" && test -z "${BUILD_INFO_FALSE}"; then
+  as_fn_error "conditional \"BUILD_INFO\" was never defined.
+Usually this means the macro was only invoked conditionally." "$LINENO" 5
+fi
+if test -z "${MAINTAINER_MODE_TRUE}" && test -z "${MAINTAINER_MODE_FALSE}"; then
+  as_fn_error "conditional \"MAINTAINER_MODE\" was never defined.
+Usually this means the macro was only invoked conditionally." "$LINENO" 5
+fi
+if test -z "${LIBGOMP_VERBOSE_TRUE}" && test -z "${LIBGOMP_VERBOSE_FALSE}"; then
+  as_fn_error "conditional \"LIBGOMP_VERBOSE\" was never defined.
+Usually this means the macro was only invoked conditionally." "$LINENO" 5
+fi
+if test -z "${PLUGIN_NVPTX_TRUE}" && test -z "${PLUGIN_NVPTX_FALSE}"; then
+  as_fn_error "conditional \"PLUGIN_NVPTX\" was never defined.
+Usually this means the macro was only invoked conditionally." "$LINENO" 5
+fi
+
+: ${CONFIG_STATUS=./config.status}
+ac_write_fail=0
+ac_clean_files_save=$ac_clean_files
+ac_clean_files="$ac_clean_files $CONFIG_STATUS"
+{ $as_echo "$as_me:${as_lineno-$LINENO}: creating $CONFIG_STATUS" >&5
+$as_echo "$as_me: creating $CONFIG_STATUS" >&6;}
+as_write_fail=0
+cat >$CONFIG_STATUS <<_ASEOF || as_write_fail=1
+#! $SHELL
+# Generated by $as_me.
+# Run this file to recreate the current configuration.
+# Compiler output produced by configure, useful for debugging
+# configure, is in config.log if it exists.
+
+debug=false
+ac_cs_recheck=false
+ac_cs_silent=false
+
+SHELL=\${CONFIG_SHELL-$SHELL}
+export SHELL
+_ASEOF
+cat >>$CONFIG_STATUS <<\_ASEOF || as_write_fail=1
+## -------------------- ##
+## M4sh Initialization. ##
+## -------------------- ##
+
+# Be more Bourne compatible
+DUALCASE=1; export DUALCASE # for MKS sh
+if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then :
+  emulate sh
+  NULLCMD=:
+  # Pre-4.2 versions of Zsh do word splitting on ${1+"$@"}, which
+  # is contrary to our usage.  Disable this feature.
+  alias -g '${1+"$@"}'='"$@"'
+  setopt NO_GLOB_SUBST
+else
+  case `(set -o) 2>/dev/null` in #(
+  *posix*) :
+    set -o posix ;; #(
+  *) :
+     ;;
+esac
+fi
+
+
+as_nl='
+'
+export as_nl
+# Printing a long string crashes Solaris 7 /usr/bin/printf.
+as_echo='\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\'
+as_echo=$as_echo$as_echo$as_echo$as_echo$as_echo
+as_echo=$as_echo$as_echo$as_echo$as_echo$as_echo$as_echo
+# Prefer a ksh shell builtin over an external printf program on Solaris,
+# but without wasting forks for bash or zsh.
+if test -z "$BASH_VERSION$ZSH_VERSION" \
+    && (test "X`print -r -- $as_echo`" = "X$as_echo") 2>/dev/null; then
+  as_echo='print -r --'
+  as_echo_n='print -rn --'
+elif (test "X`printf %s $as_echo`" = "X$as_echo") 2>/dev/null; then
+  as_echo='printf %s\n'
+  as_echo_n='printf %s'
+else
+  if test "X`(/usr/ucb/echo -n -n $as_echo) 2>/dev/null`" = "X-n $as_echo"; then
+    as_echo_body='eval /usr/ucb/echo -n "$1$as_nl"'
+    as_echo_n='/usr/ucb/echo -n'
+  else
+    as_echo_body='eval expr "X$1" : "X\\(.*\\)"'
+    as_echo_n_body='eval
+      arg=$1;
+      case $arg in #(
+      *"$as_nl"*)
+	expr "X$arg" : "X\\(.*\\)$as_nl";
+	arg=`expr "X$arg" : ".*$as_nl\\(.*\\)"`;;
+      esac;
+      expr "X$arg" : "X\\(.*\\)" | tr -d "$as_nl"
+    '
+    export as_echo_n_body
+    as_echo_n='sh -c $as_echo_n_body as_echo'
+  fi
+  export as_echo_body
+  as_echo='sh -c $as_echo_body as_echo'
+fi
+
+# The user is always right.
+if test "${PATH_SEPARATOR+set}" != set; then
+  PATH_SEPARATOR=:
+  (PATH='/bin;/bin'; FPATH=$PATH; sh -c :) >/dev/null 2>&1 && {
+    (PATH='/bin:/bin'; FPATH=$PATH; sh -c :) >/dev/null 2>&1 ||
+      PATH_SEPARATOR=';'
+  }
+fi
+
+
+# IFS
+# We need space, tab and new line, in precisely that order.  Quoting is
+# there to prevent editors from complaining about space-tab.
+# (If _AS_PATH_WALK were called with IFS unset, it would disable word
+# splitting by setting IFS to empty value.)
+IFS=" ""	$as_nl"
+
+# Find who we are.  Look in the path if we contain no directory separator.
+case $0 in #((
+  *[\\/]* ) as_myself=$0 ;;
+  *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
+for as_dir in $PATH
+do
+  IFS=$as_save_IFS
+  test -z "$as_dir" && as_dir=.
+    test -r "$as_dir/$0" && as_myself=$as_dir/$0 && break
+  done
+IFS=$as_save_IFS
+
+     ;;
+esac
+# We did not find ourselves, most probably we were run as `sh COMMAND'
+# in which case we are not to be found in the path.
+if test "x$as_myself" = x; then
+  as_myself=$0
+fi
+if test ! -f "$as_myself"; then
+  $as_echo "$as_myself: error: cannot find myself; rerun with an absolute file name" >&2
+  exit 1
+fi
+
+# Unset variables that we do not need and which cause bugs (e.g. in
+# pre-3.0 UWIN ksh).  But do not cause bugs in bash 2.01; the "|| exit 1"
+# suppresses any "Segmentation fault" message there.  '((' could
+# trigger a bug in pdksh 5.2.14.
+for as_var in BASH_ENV ENV MAIL MAILPATH
+do eval test x\${$as_var+set} = xset \
+  && ( (unset $as_var) || exit 1) >/dev/null 2>&1 && unset $as_var || :
+done
+PS1='$ '
+PS2='> '
+PS4='+ '
+
+# NLS nuisances.
+LC_ALL=C
+export LC_ALL
+LANGUAGE=C
+export LANGUAGE
+
+# CDPATH.
+(unset CDPATH) >/dev/null 2>&1 && unset CDPATH
+
+
+# as_fn_error ERROR [LINENO LOG_FD]
+# ---------------------------------
+# Output "`basename $0`: error: ERROR" to stderr. If LINENO and LOG_FD are
+# provided, also output the error to LOG_FD, referencing LINENO. Then exit the
+# script with status $?, using 1 if that was 0.
+as_fn_error ()
+{
+  as_status=$?; test $as_status -eq 0 && as_status=1
+  if test "$3"; then
+    as_lineno=${as_lineno-"$2"} as_lineno_stack=as_lineno_stack=$as_lineno_stack
+    $as_echo "$as_me:${as_lineno-$LINENO}: error: $1" >&$3
+  fi
+  $as_echo "$as_me: error: $1" >&2
+  as_fn_exit $as_status
+} # as_fn_error
+
+
+# as_fn_set_status STATUS
+# -----------------------
+# Set $? to STATUS, without forking.
+as_fn_set_status ()
+{
+  return $1
+} # as_fn_set_status
+
+# as_fn_exit STATUS
+# -----------------
+# Exit the shell with STATUS, even in a "trap 0" or "set -e" context.
+as_fn_exit ()
+{
+  set +e
+  as_fn_set_status $1
+  exit $1
+} # as_fn_exit
+
+# as_fn_unset VAR
+# ---------------
+# Portably unset VAR.
+as_fn_unset ()
+{
+  { eval $1=; unset $1;}
+}
+as_unset=as_fn_unset
+# as_fn_append VAR VALUE
+# ----------------------
+# Append the text in VALUE to the end of the definition contained in VAR. Take
+# advantage of any shell optimizations that allow amortized linear growth over
+# repeated appends, instead of the typical quadratic growth present in naive
+# implementations.
+if (eval "as_var=1; as_var+=2; test x\$as_var = x12") 2>/dev/null; then :
+  eval 'as_fn_append ()
+  {
+    eval $1+=\$2
+  }'
+else
+  as_fn_append ()
+  {
+    eval $1=\$$1\$2
+  }
+fi # as_fn_append
+
+# as_fn_arith ARG...
+# ------------------
+# Perform arithmetic evaluation on the ARGs, and store the result in the
+# global $as_val. Take advantage of shells that can avoid forks. The arguments
+# must be portable across $(()) and expr.
+if (eval "test \$(( 1 + 1 )) = 2") 2>/dev/null; then :
+  eval 'as_fn_arith ()
+  {
+    as_val=$(( $* ))
+  }'
+else
+  as_fn_arith ()
+  {
+    as_val=`expr "$@" || test $? -eq 1`
+  }
+fi # as_fn_arith
+
+
+if expr a : '\(a\)' >/dev/null 2>&1 &&
+   test "X`expr 00001 : '.*\(...\)'`" = X001; then
+  as_expr=expr
+else
+  as_expr=false
+fi
+
+if (basename -- /) >/dev/null 2>&1 && test "X`basename -- / 2>&1`" = "X/"; then
+  as_basename=basename
+else
+  as_basename=false
+fi
+
+if (as_dir=`dirname -- /` && test "X$as_dir" = X/) >/dev/null 2>&1; then
+  as_dirname=dirname
+else
+  as_dirname=false
+fi
+
+as_me=`$as_basename -- "$0" ||
+$as_expr X/"$0" : '.*/\([^/][^/]*\)/*$' \| \
+	 X"$0" : 'X\(//\)$' \| \
+	 X"$0" : 'X\(/\)' \| . 2>/dev/null ||
+$as_echo X/"$0" |
+    sed '/^.*\/\([^/][^/]*\)\/*$/{
+	    s//\1/
+	    q
+	  }
+	  /^X\/\(\/\/\)$/{
+	    s//\1/
+	    q
+	  }
+	  /^X\/\(\/\).*/{
+	    s//\1/
+	    q
+	  }
+	  s/.*/./; q'`
+
+# Avoid depending upon Character Ranges.
+as_cr_letters='abcdefghijklmnopqrstuvwxyz'
+as_cr_LETTERS='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
+as_cr_Letters=$as_cr_letters$as_cr_LETTERS
+as_cr_digits='0123456789'
+as_cr_alnum=$as_cr_Letters$as_cr_digits
+
+ECHO_C= ECHO_N= ECHO_T=
+case `echo -n x` in #(((((
+-n*)
+  case `echo 'xy\c'` in
+  *c*) ECHO_T='	';;	# ECHO_T is single tab character.
+  xy)  ECHO_C='\c';;
+  *)   echo `echo ksh88 bug on AIX 6.1` > /dev/null
+       ECHO_T='	';;
+  esac;;
+*)
+  ECHO_N='-n';;
+esac
+
+rm -f conf$$ conf$$.exe conf$$.file
+if test -d conf$$.dir; then
+  rm -f conf$$.dir/conf$$.file
+else
+  rm -f conf$$.dir
+  mkdir conf$$.dir 2>/dev/null
+fi
+if (echo >conf$$.file) 2>/dev/null; then
+  if ln -s conf$$.file conf$$ 2>/dev/null; then
+    as_ln_s='ln -s'
+    # ... but there are two gotchas:
+    # 1) On MSYS, both `ln -s file dir' and `ln file dir' fail.
+    # 2) DJGPP < 2.04 has no symlinks; `ln -s' creates a wrapper executable.
+    # In both cases, we have to default to `cp -p'.
+    ln -s conf$$.file conf$$.dir 2>/dev/null && test ! -f conf$$.exe ||
+      as_ln_s='cp -p'
+  elif ln conf$$.file conf$$ 2>/dev/null; then
+    as_ln_s=ln
+  else
+    as_ln_s='cp -p'
+  fi
+else
+  as_ln_s='cp -p'
+fi
+rm -f conf$$ conf$$.exe conf$$.dir/conf$$.file conf$$.file
+rmdir conf$$.dir 2>/dev/null
+
+
+# as_fn_mkdir_p
+# -------------
+# Create "$as_dir" as a directory, including parents if necessary.
+as_fn_mkdir_p ()
+{
+
+  case $as_dir in #(
+  -*) as_dir=./$as_dir;;
+  esac
+  test -d "$as_dir" || eval $as_mkdir_p || {
+    as_dirs=
+    while :; do
+      case $as_dir in #(
+      *\'*) as_qdir=`$as_echo "$as_dir" | sed "s/'/'\\\\\\\\''/g"`;; #'(
+      *) as_qdir=$as_dir;;
+      esac
+      as_dirs="'$as_qdir' $as_dirs"
+      as_dir=`$as_dirname -- "$as_dir" ||
+$as_expr X"$as_dir" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \
+	 X"$as_dir" : 'X\(//\)[^/]' \| \
+	 X"$as_dir" : 'X\(//\)$' \| \
+	 X"$as_dir" : 'X\(/\)' \| . 2>/dev/null ||
+$as_echo X"$as_dir" |
+    sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{
+	    s//\1/
+	    q
+	  }
+	  /^X\(\/\/\)[^/].*/{
+	    s//\1/
+	    q
+	  }
+	  /^X\(\/\/\)$/{
+	    s//\1/
+	    q
+	  }
+	  /^X\(\/\).*/{
+	    s//\1/
+	    q
+	  }
+	  s/.*/./; q'`
+      test -d "$as_dir" && break
+    done
+    test -z "$as_dirs" || eval "mkdir $as_dirs"
+  } || test -d "$as_dir" || as_fn_error "cannot create directory $as_dir"
+
+
+} # as_fn_mkdir_p
+if mkdir -p . 2>/dev/null; then
+  as_mkdir_p='mkdir -p "$as_dir"'
+else
+  test -d ./-p && rmdir ./-p
+  as_mkdir_p=false
+fi
+
+if test -x / >/dev/null 2>&1; then
+  as_test_x='test -x'
+else
+  if ls -dL / >/dev/null 2>&1; then
+    as_ls_L_option=L
+  else
+    as_ls_L_option=
+  fi
+  as_test_x='
+    eval sh -c '\''
+      if test -d "$1"; then
+	test -d "$1/.";
+      else
+	case $1 in #(
+	-*)set "./$1";;
+	esac;
+	case `ls -ld'$as_ls_L_option' "$1" 2>/dev/null` in #((
+	???[sx]*):;;*)false;;esac;fi
+    '\'' sh
+  '
+fi
+as_executable_p=$as_test_x
+
+# Sed expression to map a string onto a valid CPP name.
+as_tr_cpp="eval sed 'y%*$as_cr_letters%P$as_cr_LETTERS%;s%[^_$as_cr_alnum]%_%g'"
+
+# Sed expression to map a string onto a valid variable name.
+as_tr_sh="eval sed 'y%*+%pp%;s%[^_$as_cr_alnum]%_%g'"
+
+
+exec 6>&1
+## ----------------------------------- ##
+## Main body of $CONFIG_STATUS script. ##
+## ----------------------------------- ##
+_ASEOF
+test $as_write_fail = 0 && chmod +x $CONFIG_STATUS || ac_write_fail=1
+
+cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
+# Save the log message, to keep $0 and so on meaningful, and to
+# report actual input values of CONFIG_FILES etc. instead of their
+# values after options handling.
+ac_log="
+This file was extended by GNU Offloading and Multi Processing Runtime Library $as_me 1.0, which was
+generated by GNU Autoconf 2.64.  Invocation command line was
+
+  CONFIG_FILES    = $CONFIG_FILES
+  CONFIG_HEADERS  = $CONFIG_HEADERS
+  CONFIG_LINKS    = $CONFIG_LINKS
+  CONFIG_COMMANDS = $CONFIG_COMMANDS
+  $ $0 $@
+
+on `(hostname || uname -n) 2>/dev/null | sed 1q`
+"
+
+_ACEOF
+
+
+case $ac_config_headers in *"
+"*) set x $ac_config_headers; shift; ac_config_headers=$*;;
+esac
+
+
+cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
+# Files that config.status was made for.
+config_headers="$ac_config_headers"
+config_commands="$ac_config_commands"
+
+_ACEOF
+
+cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
+ac_cs_usage="\
+\`$as_me' instantiates files and other configuration actions
+from templates according to the current configuration.  Unless the files
+and actions are specified as TAGs, all are instantiated by default.
+
+Usage: $0 [OPTION]... [TAG]...
+
+  -h, --help       print this help, then exit
+  -V, --version    print version number and configuration settings, then exit
+  -q, --quiet, --silent
+                   do not print progress messages
+  -d, --debug      don't remove temporary files
+      --recheck    update $as_me by reconfiguring in the same conditions
+      --header=FILE[:TEMPLATE]
+                   instantiate the configuration header FILE
+
+Configuration headers:
+$config_headers
+
+Configuration commands:
+$config_commands
+
+Report bugs to the package provider.
+GNU Offloading and Multi Processing Runtime Library home page: <http://www.gnu.org/software/libgomp/>.
+General help using GNU software: <http://www.gnu.org/gethelp/>."
+
+_ACEOF
+cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
+ac_cs_version="\\
+GNU Offloading and Multi Processing Runtime Library config.status 1.0
+configured by $0, generated by GNU Autoconf 2.64,
+  with options \\"`$as_echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/\\\\&/g'`\\"
+
+Copyright (C) 2009 Free Software Foundation, Inc.
+This config.status script is free software; the Free Software Foundation
+gives unlimited permission to copy, distribute and modify it."
+
+ac_pwd='$ac_pwd'
+srcdir='$srcdir'
+INSTALL='$INSTALL'
+MKDIR_P='$MKDIR_P'
+AWK='$AWK'
+test -n "\$AWK" || AWK=awk
+_ACEOF
+
+cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
+# The default lists apply if the user does not specify any file.
+ac_need_defaults=:
+while test $# != 0
+do
+  case $1 in
+  --*=*)
+    ac_option=`expr "X$1" : 'X\([^=]*\)='`
+    ac_optarg=`expr "X$1" : 'X[^=]*=\(.*\)'`
+    ac_shift=:
+    ;;
+  *)
+    ac_option=$1
+    ac_optarg=$2
+    ac_shift=shift
+    ;;
+  esac
+
+  case $ac_option in
+  # Handling of the options.
+  -recheck | --recheck | --rechec | --reche | --rech | --rec | --re | --r)
+    ac_cs_recheck=: ;;
+  --version | --versio | --versi | --vers | --ver | --ve | --v | -V )
+    $as_echo "$ac_cs_version"; exit ;;
+  --debug | --debu | --deb | --de | --d | -d )
+    debug=: ;;
+  --header | --heade | --head | --hea )
+    $ac_shift
+    case $ac_optarg in
+    *\'*) ac_optarg=`$as_echo "$ac_optarg" | sed "s/'/'\\\\\\\\''/g"` ;;
+    esac
+    as_fn_append CONFIG_HEADERS " '$ac_optarg'"
+    ac_need_defaults=false;;
+  --he | --h)
+    # Conflict between --help and --header
+    as_fn_error "ambiguous option: \`$1'
+Try \`$0 --help' for more information.";;
+  --help | --hel | -h )
+    $as_echo "$ac_cs_usage"; exit ;;
+  -q | -quiet | --quiet | --quie | --qui | --qu | --q \
+  | -silent | --silent | --silen | --sile | --sil | --si | --s)
+    ac_cs_silent=: ;;
+
+  # This is an error.
+  -*) as_fn_error "unrecognized option: \`$1'
+Try \`$0 --help' for more information." ;;
+
+  *) as_fn_append ac_config_targets " $1"
+     ac_need_defaults=false ;;
+
+  esac
+  shift
+done
+
+ac_configure_extra_args=
+
+if $ac_cs_silent; then
+  exec 6>/dev/null
+  ac_configure_extra_args="$ac_configure_extra_args --silent"
+fi
+
+_ACEOF
+cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
+if \$ac_cs_recheck; then
+  set X '$SHELL' '$0' $ac_configure_args \$ac_configure_extra_args --no-create --no-recursion
+  shift
+  \$as_echo "running CONFIG_SHELL=$SHELL \$*" >&6
+  CONFIG_SHELL='$SHELL'
+  export CONFIG_SHELL
+  exec "\$@"
+fi
+
+_ACEOF
+cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
+exec 5>>config.log
+{
+  echo
+  sed 'h;s/./-/g;s/^.../## /;s/...$/ ##/;p;x;p;x' <<_ASBOX
+## Running $as_me. ##
+_ASBOX
+  $as_echo "$ac_log"
+} >&5
+
+_ACEOF
+cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
+#
+# INIT-COMMANDS
+#
+
+srcdir="$srcdir"
+host="$host"
+target="$target"
+with_multisubdir="$with_multisubdir"
+with_multisrctop="$with_multisrctop"
+with_target_subdir="$with_target_subdir"
+ac_configure_args="${multilib_arg} ${ac_configure_args}"
+multi_basedir="$multi_basedir"
+CONFIG_SHELL=${CONFIG_SHELL-/bin/sh}
+CC="$CC"
+CXX="$CXX"
+GFORTRAN="$GFORTRAN"
+GCJ="$GCJ"
+AMDEP_TRUE="$AMDEP_TRUE" ac_aux_dir="$ac_aux_dir"
+
+
+# The HP-UX ksh and POSIX shell print the target directory to stdout
+# if CDPATH is set.
+(unset CDPATH) >/dev/null 2>&1 && unset CDPATH
+
+sed_quote_subst='$sed_quote_subst'
+double_quote_subst='$double_quote_subst'
+delay_variable_subst='$delay_variable_subst'
+macro_version='`$ECHO "$macro_version" | $SED "$delay_single_quote_subst"`'
+macro_revision='`$ECHO "$macro_revision" | $SED "$delay_single_quote_subst"`'
+enable_shared='`$ECHO "$enable_shared" | $SED "$delay_single_quote_subst"`'
+enable_static='`$ECHO "$enable_static" | $SED "$delay_single_quote_subst"`'
+pic_mode='`$ECHO "$pic_mode" | $SED "$delay_single_quote_subst"`'
+enable_fast_install='`$ECHO "$enable_fast_install" | $SED "$delay_single_quote_subst"`'
+SHELL='`$ECHO "$SHELL" | $SED "$delay_single_quote_subst"`'
+ECHO='`$ECHO "$ECHO" | $SED "$delay_single_quote_subst"`'
+host_alias='`$ECHO "$host_alias" | $SED "$delay_single_quote_subst"`'
+host='`$ECHO "$host" | $SED "$delay_single_quote_subst"`'
+host_os='`$ECHO "$host_os" | $SED "$delay_single_quote_subst"`'
+build_alias='`$ECHO "$build_alias" | $SED "$delay_single_quote_subst"`'
+build='`$ECHO "$build" | $SED "$delay_single_quote_subst"`'
+build_os='`$ECHO "$build_os" | $SED "$delay_single_quote_subst"`'
+SED='`$ECHO "$SED" | $SED "$delay_single_quote_subst"`'
+Xsed='`$ECHO "$Xsed" | $SED "$delay_single_quote_subst"`'
+GREP='`$ECHO "$GREP" | $SED "$delay_single_quote_subst"`'
+EGREP='`$ECHO "$EGREP" | $SED "$delay_single_quote_subst"`'
+FGREP='`$ECHO "$FGREP" | $SED "$delay_single_quote_subst"`'
+LD='`$ECHO "$LD" | $SED "$delay_single_quote_subst"`'
+NM='`$ECHO "$NM" | $SED "$delay_single_quote_subst"`'
+LN_S='`$ECHO "$LN_S" | $SED "$delay_single_quote_subst"`'
+max_cmd_len='`$ECHO "$max_cmd_len" | $SED "$delay_single_quote_subst"`'
+ac_objext='`$ECHO "$ac_objext" | $SED "$delay_single_quote_subst"`'
+exeext='`$ECHO "$exeext" | $SED "$delay_single_quote_subst"`'
+lt_unset='`$ECHO "$lt_unset" | $SED "$delay_single_quote_subst"`'
+lt_SP2NL='`$ECHO "$lt_SP2NL" | $SED "$delay_single_quote_subst"`'
+lt_NL2SP='`$ECHO "$lt_NL2SP" | $SED "$delay_single_quote_subst"`'
+reload_flag='`$ECHO "$reload_flag" | $SED "$delay_single_quote_subst"`'
+reload_cmds='`$ECHO "$reload_cmds" | $SED "$delay_single_quote_subst"`'
+OBJDUMP='`$ECHO "$OBJDUMP" | $SED "$delay_single_quote_subst"`'
+deplibs_check_method='`$ECHO "$deplibs_check_method" | $SED "$delay_single_quote_subst"`'
+file_magic_cmd='`$ECHO "$file_magic_cmd" | $SED "$delay_single_quote_subst"`'
+AR='`$ECHO "$AR" | $SED "$delay_single_quote_subst"`'
+AR_FLAGS='`$ECHO "$AR_FLAGS" | $SED "$delay_single_quote_subst"`'
+STRIP='`$ECHO "$STRIP" | $SED "$delay_single_quote_subst"`'
+RANLIB='`$ECHO "$RANLIB" | $SED "$delay_single_quote_subst"`'
+old_postinstall_cmds='`$ECHO "$old_postinstall_cmds" | $SED "$delay_single_quote_subst"`'
+old_postuninstall_cmds='`$ECHO "$old_postuninstall_cmds" | $SED "$delay_single_quote_subst"`'
+old_archive_cmds='`$ECHO "$old_archive_cmds" | $SED "$delay_single_quote_subst"`'
+lock_old_archive_extraction='`$ECHO "$lock_old_archive_extraction" | $SED "$delay_single_quote_subst"`'
+CC='`$ECHO "$CC" | $SED "$delay_single_quote_subst"`'
+CFLAGS='`$ECHO "$CFLAGS" | $SED "$delay_single_quote_subst"`'
+compiler='`$ECHO "$compiler" | $SED "$delay_single_quote_subst"`'
+GCC='`$ECHO "$GCC" | $SED "$delay_single_quote_subst"`'
+lt_cv_sys_global_symbol_pipe='`$ECHO "$lt_cv_sys_global_symbol_pipe" | $SED "$delay_single_quote_subst"`'
+lt_cv_sys_global_symbol_to_cdecl='`$ECHO "$lt_cv_sys_global_symbol_to_cdecl" | $SED "$delay_single_quote_subst"`'
+lt_cv_sys_global_symbol_to_c_name_address='`$ECHO "$lt_cv_sys_global_symbol_to_c_name_address" | $SED "$delay_single_quote_subst"`'
+lt_cv_sys_global_symbol_to_c_name_address_lib_prefix='`$ECHO "$lt_cv_sys_global_symbol_to_c_name_address_lib_prefix" | $SED "$delay_single_quote_subst"`'
+objdir='`$ECHO "$objdir" | $SED "$delay_single_quote_subst"`'
+MAGIC_CMD='`$ECHO "$MAGIC_CMD" | $SED "$delay_single_quote_subst"`'
+lt_prog_compiler_no_builtin_flag='`$ECHO "$lt_prog_compiler_no_builtin_flag" | $SED "$delay_single_quote_subst"`'
+lt_prog_compiler_wl='`$ECHO "$lt_prog_compiler_wl" | $SED "$delay_single_quote_subst"`'
+lt_prog_compiler_pic='`$ECHO "$lt_prog_compiler_pic" | $SED "$delay_single_quote_subst"`'
+lt_prog_compiler_static='`$ECHO "$lt_prog_compiler_static" | $SED "$delay_single_quote_subst"`'
+lt_cv_prog_compiler_c_o='`$ECHO "$lt_cv_prog_compiler_c_o" | $SED "$delay_single_quote_subst"`'
+need_locks='`$ECHO "$need_locks" | $SED "$delay_single_quote_subst"`'
+DSYMUTIL='`$ECHO "$DSYMUTIL" | $SED "$delay_single_quote_subst"`'
+NMEDIT='`$ECHO "$NMEDIT" | $SED "$delay_single_quote_subst"`'
+LIPO='`$ECHO "$LIPO" | $SED "$delay_single_quote_subst"`'
+OTOOL='`$ECHO "$OTOOL" | $SED "$delay_single_quote_subst"`'
+OTOOL64='`$ECHO "$OTOOL64" | $SED "$delay_single_quote_subst"`'
+libext='`$ECHO "$libext" | $SED "$delay_single_quote_subst"`'
+shrext_cmds='`$ECHO "$shrext_cmds" | $SED "$delay_single_quote_subst"`'
+extract_expsyms_cmds='`$ECHO "$extract_expsyms_cmds" | $SED "$delay_single_quote_subst"`'
+archive_cmds_need_lc='`$ECHO "$archive_cmds_need_lc" | $SED "$delay_single_quote_subst"`'
+enable_shared_with_static_runtimes='`$ECHO "$enable_shared_with_static_runtimes" | $SED "$delay_single_quote_subst"`'
+export_dynamic_flag_spec='`$ECHO "$export_dynamic_flag_spec" | $SED "$delay_single_quote_subst"`'
+whole_archive_flag_spec='`$ECHO "$whole_archive_flag_spec" | $SED "$delay_single_quote_subst"`'
+compiler_needs_object='`$ECHO "$compiler_needs_object" | $SED "$delay_single_quote_subst"`'
+old_archive_from_new_cmds='`$ECHO "$old_archive_from_new_cmds" | $SED "$delay_single_quote_subst"`'
+old_archive_from_expsyms_cmds='`$ECHO "$old_archive_from_expsyms_cmds" | $SED "$delay_single_quote_subst"`'
+archive_cmds='`$ECHO "$archive_cmds" | $SED "$delay_single_quote_subst"`'
+archive_expsym_cmds='`$ECHO "$archive_expsym_cmds" | $SED "$delay_single_quote_subst"`'
+module_cmds='`$ECHO "$module_cmds" | $SED "$delay_single_quote_subst"`'
+module_expsym_cmds='`$ECHO "$module_expsym_cmds" | $SED "$delay_single_quote_subst"`'
+with_gnu_ld='`$ECHO "$with_gnu_ld" | $SED "$delay_single_quote_subst"`'
+allow_undefined_flag='`$ECHO "$allow_undefined_flag" | $SED "$delay_single_quote_subst"`'
+no_undefined_flag='`$ECHO "$no_undefined_flag" | $SED "$delay_single_quote_subst"`'
+hardcode_libdir_flag_spec='`$ECHO "$hardcode_libdir_flag_spec" | $SED "$delay_single_quote_subst"`'
+hardcode_libdir_flag_spec_ld='`$ECHO "$hardcode_libdir_flag_spec_ld" | $SED "$delay_single_quote_subst"`'
+hardcode_libdir_separator='`$ECHO "$hardcode_libdir_separator" | $SED "$delay_single_quote_subst"`'
+hardcode_direct='`$ECHO "$hardcode_direct" | $SED "$delay_single_quote_subst"`'
+hardcode_direct_absolute='`$ECHO "$hardcode_direct_absolute" | $SED "$delay_single_quote_subst"`'
+hardcode_minus_L='`$ECHO "$hardcode_minus_L" | $SED "$delay_single_quote_subst"`'
+hardcode_shlibpath_var='`$ECHO "$hardcode_shlibpath_var" | $SED "$delay_single_quote_subst"`'
+hardcode_automatic='`$ECHO "$hardcode_automatic" | $SED "$delay_single_quote_subst"`'
+inherit_rpath='`$ECHO "$inherit_rpath" | $SED "$delay_single_quote_subst"`'
+link_all_deplibs='`$ECHO "$link_all_deplibs" | $SED "$delay_single_quote_subst"`'
+fix_srcfile_path='`$ECHO "$fix_srcfile_path" | $SED "$delay_single_quote_subst"`'
+always_export_symbols='`$ECHO "$always_export_symbols" | $SED "$delay_single_quote_subst"`'
+export_symbols_cmds='`$ECHO "$export_symbols_cmds" | $SED "$delay_single_quote_subst"`'
+exclude_expsyms='`$ECHO "$exclude_expsyms" | $SED "$delay_single_quote_subst"`'
+include_expsyms='`$ECHO "$include_expsyms" | $SED "$delay_single_quote_subst"`'
+prelink_cmds='`$ECHO "$prelink_cmds" | $SED "$delay_single_quote_subst"`'
+file_list_spec='`$ECHO "$file_list_spec" | $SED "$delay_single_quote_subst"`'
+variables_saved_for_relink='`$ECHO "$variables_saved_for_relink" | $SED "$delay_single_quote_subst"`'
+need_lib_prefix='`$ECHO "$need_lib_prefix" | $SED "$delay_single_quote_subst"`'
+need_version='`$ECHO "$need_version" | $SED "$delay_single_quote_subst"`'
+version_type='`$ECHO "$version_type" | $SED "$delay_single_quote_subst"`'
+runpath_var='`$ECHO "$runpath_var" | $SED "$delay_single_quote_subst"`'
+shlibpath_var='`$ECHO "$shlibpath_var" | $SED "$delay_single_quote_subst"`'
+shlibpath_overrides_runpath='`$ECHO "$shlibpath_overrides_runpath" | $SED "$delay_single_quote_subst"`'
+libname_spec='`$ECHO "$libname_spec" | $SED "$delay_single_quote_subst"`'
+library_names_spec='`$ECHO "$library_names_spec" | $SED "$delay_single_quote_subst"`'
+soname_spec='`$ECHO "$soname_spec" | $SED "$delay_single_quote_subst"`'
+install_override_mode='`$ECHO "$install_override_mode" | $SED "$delay_single_quote_subst"`'
+postinstall_cmds='`$ECHO "$postinstall_cmds" | $SED "$delay_single_quote_subst"`'
+postuninstall_cmds='`$ECHO "$postuninstall_cmds" | $SED "$delay_single_quote_subst"`'
+finish_cmds='`$ECHO "$finish_cmds" | $SED "$delay_single_quote_subst"`'
+finish_eval='`$ECHO "$finish_eval" | $SED "$delay_single_quote_subst"`'
+hardcode_into_libs='`$ECHO "$hardcode_into_libs" | $SED "$delay_single_quote_subst"`'
+sys_lib_search_path_spec='`$ECHO "$sys_lib_search_path_spec" | $SED "$delay_single_quote_subst"`'
+sys_lib_dlsearch_path_spec='`$ECHO "$sys_lib_dlsearch_path_spec" | $SED "$delay_single_quote_subst"`'
+hardcode_action='`$ECHO "$hardcode_action" | $SED "$delay_single_quote_subst"`'
+enable_dlopen='`$ECHO "$enable_dlopen" | $SED "$delay_single_quote_subst"`'
+enable_dlopen_self='`$ECHO "$enable_dlopen_self" | $SED "$delay_single_quote_subst"`'
+enable_dlopen_self_static='`$ECHO "$enable_dlopen_self_static" | $SED "$delay_single_quote_subst"`'
+old_striplib='`$ECHO "$old_striplib" | $SED "$delay_single_quote_subst"`'
+striplib='`$ECHO "$striplib" | $SED "$delay_single_quote_subst"`'
+compiler_lib_search_dirs='`$ECHO "$compiler_lib_search_dirs" | $SED "$delay_single_quote_subst"`'
+predep_objects='`$ECHO "$predep_objects" | $SED "$delay_single_quote_subst"`'
+postdep_objects='`$ECHO "$postdep_objects" | $SED "$delay_single_quote_subst"`'
+predeps='`$ECHO "$predeps" | $SED "$delay_single_quote_subst"`'
+postdeps='`$ECHO "$postdeps" | $SED "$delay_single_quote_subst"`'
+compiler_lib_search_path='`$ECHO "$compiler_lib_search_path" | $SED "$delay_single_quote_subst"`'
+LD_FC='`$ECHO "$LD_FC" | $SED "$delay_single_quote_subst"`'
+reload_flag_FC='`$ECHO "$reload_flag_FC" | $SED "$delay_single_quote_subst"`'
+reload_cmds_FC='`$ECHO "$reload_cmds_FC" | $SED "$delay_single_quote_subst"`'
+old_archive_cmds_FC='`$ECHO "$old_archive_cmds_FC" | $SED "$delay_single_quote_subst"`'
+compiler_FC='`$ECHO "$compiler_FC" | $SED "$delay_single_quote_subst"`'
+GCC_FC='`$ECHO "$GCC_FC" | $SED "$delay_single_quote_subst"`'
+lt_prog_compiler_no_builtin_flag_FC='`$ECHO "$lt_prog_compiler_no_builtin_flag_FC" | $SED "$delay_single_quote_subst"`'
+lt_prog_compiler_wl_FC='`$ECHO "$lt_prog_compiler_wl_FC" | $SED "$delay_single_quote_subst"`'
+lt_prog_compiler_pic_FC='`$ECHO "$lt_prog_compiler_pic_FC" | $SED "$delay_single_quote_subst"`'
+lt_prog_compiler_static_FC='`$ECHO "$lt_prog_compiler_static_FC" | $SED "$delay_single_quote_subst"`'
+lt_cv_prog_compiler_c_o_FC='`$ECHO "$lt_cv_prog_compiler_c_o_FC" | $SED "$delay_single_quote_subst"`'
+archive_cmds_need_lc_FC='`$ECHO "$archive_cmds_need_lc_FC" | $SED "$delay_single_quote_subst"`'
+enable_shared_with_static_runtimes_FC='`$ECHO "$enable_shared_with_static_runtimes_FC" | $SED "$delay_single_quote_subst"`'
+export_dynamic_flag_spec_FC='`$ECHO "$export_dynamic_flag_spec_FC" | $SED "$delay_single_quote_subst"`'
+whole_archive_flag_spec_FC='`$ECHO "$whole_archive_flag_spec_FC" | $SED "$delay_single_quote_subst"`'
+compiler_needs_object_FC='`$ECHO "$compiler_needs_object_FC" | $SED "$delay_single_quote_subst"`'
+old_archive_from_new_cmds_FC='`$ECHO "$old_archive_from_new_cmds_FC" | $SED "$delay_single_quote_subst"`'
+old_archive_from_expsyms_cmds_FC='`$ECHO "$old_archive_from_expsyms_cmds_FC" | $SED "$delay_single_quote_subst"`'
+archive_cmds_FC='`$ECHO "$archive_cmds_FC" | $SED "$delay_single_quote_subst"`'
+archive_expsym_cmds_FC='`$ECHO "$archive_expsym_cmds_FC" | $SED "$delay_single_quote_subst"`'
+module_cmds_FC='`$ECHO "$module_cmds_FC" | $SED "$delay_single_quote_subst"`'
+module_expsym_cmds_FC='`$ECHO "$module_expsym_cmds_FC" | $SED "$delay_single_quote_subst"`'
+with_gnu_ld_FC='`$ECHO "$with_gnu_ld_FC" | $SED "$delay_single_quote_subst"`'
+allow_undefined_flag_FC='`$ECHO "$allow_undefined_flag_FC" | $SED "$delay_single_quote_subst"`'
+no_undefined_flag_FC='`$ECHO "$no_undefined_flag_FC" | $SED "$delay_single_quote_subst"`'
+hardcode_libdir_flag_spec_FC='`$ECHO "$hardcode_libdir_flag_spec_FC" | $SED "$delay_single_quote_subst"`'
+hardcode_libdir_flag_spec_ld_FC='`$ECHO "$hardcode_libdir_flag_spec_ld_FC" | $SED "$delay_single_quote_subst"`'
+hardcode_libdir_separator_FC='`$ECHO "$hardcode_libdir_separator_FC" | $SED "$delay_single_quote_subst"`'
+hardcode_direct_FC='`$ECHO "$hardcode_direct_FC" | $SED "$delay_single_quote_subst"`'
+hardcode_direct_absolute_FC='`$ECHO "$hardcode_direct_absolute_FC" | $SED "$delay_single_quote_subst"`'
+hardcode_minus_L_FC='`$ECHO "$hardcode_minus_L_FC" | $SED "$delay_single_quote_subst"`'
+hardcode_shlibpath_var_FC='`$ECHO "$hardcode_shlibpath_var_FC" | $SED "$delay_single_quote_subst"`'
+hardcode_automatic_FC='`$ECHO "$hardcode_automatic_FC" | $SED "$delay_single_quote_subst"`'
+inherit_rpath_FC='`$ECHO "$inherit_rpath_FC" | $SED "$delay_single_quote_subst"`'
+link_all_deplibs_FC='`$ECHO "$link_all_deplibs_FC" | $SED "$delay_single_quote_subst"`'
+fix_srcfile_path_FC='`$ECHO "$fix_srcfile_path_FC" | $SED "$delay_single_quote_subst"`'
+always_export_symbols_FC='`$ECHO "$always_export_symbols_FC" | $SED "$delay_single_quote_subst"`'
+export_symbols_cmds_FC='`$ECHO "$export_symbols_cmds_FC" | $SED "$delay_single_quote_subst"`'
+exclude_expsyms_FC='`$ECHO "$exclude_expsyms_FC" | $SED "$delay_single_quote_subst"`'
+include_expsyms_FC='`$ECHO "$include_expsyms_FC" | $SED "$delay_single_quote_subst"`'
+prelink_cmds_FC='`$ECHO "$prelink_cmds_FC" | $SED "$delay_single_quote_subst"`'
+file_list_spec_FC='`$ECHO "$file_list_spec_FC" | $SED "$delay_single_quote_subst"`'
+hardcode_action_FC='`$ECHO "$hardcode_action_FC" | $SED "$delay_single_quote_subst"`'
+compiler_lib_search_dirs_FC='`$ECHO "$compiler_lib_search_dirs_FC" | $SED "$delay_single_quote_subst"`'
+predep_objects_FC='`$ECHO "$predep_objects_FC" | $SED "$delay_single_quote_subst"`'
+postdep_objects_FC='`$ECHO "$postdep_objects_FC" | $SED "$delay_single_quote_subst"`'
+predeps_FC='`$ECHO "$predeps_FC" | $SED "$delay_single_quote_subst"`'
+postdeps_FC='`$ECHO "$postdeps_FC" | $SED "$delay_single_quote_subst"`'
+compiler_lib_search_path_FC='`$ECHO "$compiler_lib_search_path_FC" | $SED "$delay_single_quote_subst"`'
+
+LTCC='$LTCC'
+LTCFLAGS='$LTCFLAGS'
+compiler='$compiler_DEFAULT'
+
+# A function that is used when there is no print builtin or printf.
+func_fallback_echo ()
+{
+  eval 'cat <<_LTECHO_EOF
+\$1
+_LTECHO_EOF'
+}
+
+# Quote evaled strings.
+for var in SHELL \
+ECHO \
+SED \
+GREP \
+EGREP \
+FGREP \
+LD \
+NM \
+LN_S \
+lt_SP2NL \
+lt_NL2SP \
+reload_flag \
+OBJDUMP \
+deplibs_check_method \
+file_magic_cmd \
+AR \
+AR_FLAGS \
+STRIP \
+RANLIB \
+CC \
+CFLAGS \
+compiler \
+lt_cv_sys_global_symbol_pipe \
+lt_cv_sys_global_symbol_to_cdecl \
+lt_cv_sys_global_symbol_to_c_name_address \
+lt_cv_sys_global_symbol_to_c_name_address_lib_prefix \
+lt_prog_compiler_no_builtin_flag \
+lt_prog_compiler_wl \
+lt_prog_compiler_pic \
+lt_prog_compiler_static \
+lt_cv_prog_compiler_c_o \
+need_locks \
+DSYMUTIL \
+NMEDIT \
+LIPO \
+OTOOL \
+OTOOL64 \
+shrext_cmds \
+export_dynamic_flag_spec \
+whole_archive_flag_spec \
+compiler_needs_object \
+with_gnu_ld \
+allow_undefined_flag \
+no_undefined_flag \
+hardcode_libdir_flag_spec \
+hardcode_libdir_flag_spec_ld \
+hardcode_libdir_separator \
+fix_srcfile_path \
+exclude_expsyms \
+include_expsyms \
+file_list_spec \
+variables_saved_for_relink \
+libname_spec \
+library_names_spec \
+soname_spec \
+install_override_mode \
+finish_eval \
+old_striplib \
+striplib \
+compiler_lib_search_dirs \
+predep_objects \
+postdep_objects \
+predeps \
+postdeps \
+compiler_lib_search_path \
+LD_FC \
+reload_flag_FC \
+compiler_FC \
+lt_prog_compiler_no_builtin_flag_FC \
+lt_prog_compiler_wl_FC \
+lt_prog_compiler_pic_FC \
+lt_prog_compiler_static_FC \
+lt_cv_prog_compiler_c_o_FC \
+export_dynamic_flag_spec_FC \
+whole_archive_flag_spec_FC \
+compiler_needs_object_FC \
+with_gnu_ld_FC \
+allow_undefined_flag_FC \
+no_undefined_flag_FC \
+hardcode_libdir_flag_spec_FC \
+hardcode_libdir_flag_spec_ld_FC \
+hardcode_libdir_separator_FC \
+fix_srcfile_path_FC \
+exclude_expsyms_FC \
+include_expsyms_FC \
+file_list_spec_FC \
+compiler_lib_search_dirs_FC \
+predep_objects_FC \
+postdep_objects_FC \
+predeps_FC \
+postdeps_FC \
+compiler_lib_search_path_FC; do
+    case \`eval \\\\\$ECHO \\\\""\\\\\$\$var"\\\\"\` in
+    *[\\\\\\\`\\"\\\$]*)
+      eval "lt_\$var=\\\\\\"\\\`\\\$ECHO \\"\\\$\$var\\" | \\\$SED \\"\\\$sed_quote_subst\\"\\\`\\\\\\""
+      ;;
+    *)
+      eval "lt_\$var=\\\\\\"\\\$\$var\\\\\\""
+      ;;
+    esac
+done
+
+# Double-quote double-evaled strings.
+for var in reload_cmds \
+old_postinstall_cmds \
+old_postuninstall_cmds \
+old_archive_cmds \
+extract_expsyms_cmds \
+old_archive_from_new_cmds \
+old_archive_from_expsyms_cmds \
+archive_cmds \
+archive_expsym_cmds \
+module_cmds \
+module_expsym_cmds \
+export_symbols_cmds \
+prelink_cmds \
+postinstall_cmds \
+postuninstall_cmds \
+finish_cmds \
+sys_lib_search_path_spec \
+sys_lib_dlsearch_path_spec \
+reload_cmds_FC \
+old_archive_cmds_FC \
+old_archive_from_new_cmds_FC \
+old_archive_from_expsyms_cmds_FC \
+archive_cmds_FC \
+archive_expsym_cmds_FC \
+module_cmds_FC \
+module_expsym_cmds_FC \
+export_symbols_cmds_FC \
+prelink_cmds_FC; do
+    case \`eval \\\\\$ECHO \\\\""\\\\\$\$var"\\\\"\` in
+    *[\\\\\\\`\\"\\\$]*)
+      eval "lt_\$var=\\\\\\"\\\`\\\$ECHO \\"\\\$\$var\\" | \\\$SED -e \\"\\\$double_quote_subst\\" -e \\"\\\$sed_quote_subst\\" -e \\"\\\$delay_variable_subst\\"\\\`\\\\\\""
+      ;;
+    *)
+      eval "lt_\$var=\\\\\\"\\\$\$var\\\\\\""
+      ;;
+    esac
+done
+
+ac_aux_dir='$ac_aux_dir'
+xsi_shell='$xsi_shell'
+lt_shell_append='$lt_shell_append'
+
+# See if we are running on zsh, and set the options which allow our
+# commands through without removal of \ escapes INIT.
+if test -n "\${ZSH_VERSION+set}" ; then
+   setopt NO_GLOB_SUBST
+fi
+
+
+    PACKAGE='$PACKAGE'
+    VERSION='$VERSION'
+    TIMESTAMP='$TIMESTAMP'
+    RM='$RM'
+    ofile='$ofile'
+
+
+
+
+
+
+GCC="$GCC"
+CC="$CC"
+acx_cv_header_stdint="$acx_cv_header_stdint"
+acx_cv_type_int8_t="$acx_cv_type_int8_t"
+acx_cv_type_int16_t="$acx_cv_type_int16_t"
+acx_cv_type_int32_t="$acx_cv_type_int32_t"
+acx_cv_type_int64_t="$acx_cv_type_int64_t"
+acx_cv_type_intptr_t="$acx_cv_type_intptr_t"
+ac_cv_type_uintmax_t="$ac_cv_type_uintmax_t"
+ac_cv_type_uintptr_t="$ac_cv_type_uintptr_t"
+ac_cv_type_uint64_t="$ac_cv_type_uint64_t"
+ac_cv_type_u_int64_t="$ac_cv_type_u_int64_t"
+ac_cv_type_u_int32_t="$ac_cv_type_u_int32_t"
+ac_cv_type_int_least32_t="$ac_cv_type_int_least32_t"
+ac_cv_type_int_fast32_t="$ac_cv_type_int_fast32_t"
+ac_cv_sizeof_void_p="$ac_cv_sizeof_void_p"
+
+
+_ACEOF
+
+cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
+
+# Handling of arguments.
+for ac_config_target in $ac_config_targets
+do
+  case $ac_config_target in
+    "config.h") CONFIG_HEADERS="$CONFIG_HEADERS config.h" ;;
+    "default-1") CONFIG_COMMANDS="$CONFIG_COMMANDS default-1" ;;
+    "depfiles") CONFIG_COMMANDS="$CONFIG_COMMANDS depfiles" ;;
+    "libtool") CONFIG_COMMANDS="$CONFIG_COMMANDS libtool" ;;
+    "gstdint.h") CONFIG_COMMANDS="$CONFIG_COMMANDS gstdint.h" ;;
+
+  *) as_fn_error "invalid argument: \`$ac_config_target'" "$LINENO" 5;;
+  esac
+done
+
+
+# If the user did not use the arguments to specify the items to instantiate,
+# then the envvar interface is used.  Set only those that are not.
+# We use the long form for the default assignment because of an extremely
+# bizarre bug on SunOS 4.1.3.
+if $ac_need_defaults; then
+  test "${CONFIG_HEADERS+set}" = set || CONFIG_HEADERS=$config_headers
+  test "${CONFIG_COMMANDS+set}" = set || CONFIG_COMMANDS=$config_commands
+fi
+
+# Have a temporary directory for convenience.  Make it in the build tree
+# simply because there is no reason against having it here, and in addition,
+# creating and moving files from /tmp can sometimes cause problems.
+# Hook for its removal unless debugging.
+# Note that there is a small window in which the directory will not be cleaned:
+# after its creation but before its name has been assigned to `$tmp'.
+$debug ||
+{
+  tmp=
+  trap 'exit_status=$?
+  { test -z "$tmp" || test ! -d "$tmp" || rm -fr "$tmp"; } && exit $exit_status
+' 0
+  trap 'as_fn_exit 1' 1 2 13 15
+}
+# Create a (secure) tmp directory for tmp files.
+
+{
+  tmp=`(umask 077 && mktemp -d "./confXXXXXX") 2>/dev/null` &&
+  test -n "$tmp" && test -d "$tmp"
+}  ||
+{
+  tmp=./conf$$-$RANDOM
+  (umask 077 && mkdir "$tmp")
+} || as_fn_error "cannot create a temporary directory in ." "$LINENO" 5
+
+# Set up the scripts for CONFIG_HEADERS section.
+# No need to generate them if there are no CONFIG_HEADERS.
+# This happens for instance with `./config.status Makefile'.
+if test -n "$CONFIG_HEADERS"; then
+cat >"$tmp/defines.awk" <<\_ACAWK ||
+BEGIN {
+_ACEOF
+
+# Transform confdefs.h into an awk script `defines.awk', embedded as
+# here-document in config.status, that substitutes the proper values into
+# config.h.in to produce config.h.
+
+# Create a delimiter string that does not exist in confdefs.h, to ease
+# handling of long lines.
+ac_delim='%!_!# '
+for ac_last_try in false false :; do
+  ac_t=`sed -n "/$ac_delim/p" confdefs.h`
+  if test -z "$ac_t"; then
+    break
+  elif $ac_last_try; then
+    as_fn_error "could not make $CONFIG_HEADERS" "$LINENO" 5
+  else
+    ac_delim="$ac_delim!$ac_delim _$ac_delim!! "
+  fi
+done
+
+# For the awk script, D is an array of macro values keyed by name,
+# likewise P contains macro parameters if any.  Preserve backslash
+# newline sequences.
+
+ac_word_re=[_$as_cr_Letters][_$as_cr_alnum]*
+sed -n '
+s/.\{148\}/&'"$ac_delim"'/g
+t rset
+:rset
+s/^[	 ]*#[	 ]*define[	 ][	 ]*/ /
+t def
+d
+:def
+s/\\$//
+t bsnl
+s/["\\]/\\&/g
+s/^ \('"$ac_word_re"'\)\(([^()]*)\)[	 ]*\(.*\)/P["\1"]="\2"\
+D["\1"]=" \3"/p
+s/^ \('"$ac_word_re"'\)[	 ]*\(.*\)/D["\1"]=" \2"/p
+d
+:bsnl
+s/["\\]/\\&/g
+s/^ \('"$ac_word_re"'\)\(([^()]*)\)[	 ]*\(.*\)/P["\1"]="\2"\
+D["\1"]=" \3\\\\\\n"\\/p
+t cont
+s/^ \('"$ac_word_re"'\)[	 ]*\(.*\)/D["\1"]=" \2\\\\\\n"\\/p
+t cont
+d
+:cont
+n
+s/.\{148\}/&'"$ac_delim"'/g
+t clear
+:clear
+s/\\$//
+t bsnlc
+s/["\\]/\\&/g; s/^/"/; s/$/"/p
+d
+:bsnlc
+s/["\\]/\\&/g; s/^/"/; s/$/\\\\\\n"\\/p
+b cont
+' <confdefs.h | sed '
+s/'"$ac_delim"'/"\\\
+"/g' >>$CONFIG_STATUS || ac_write_fail=1
+
+cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
+  for (key in D) D_is_set[key] = 1
+  FS = "\a"
+}
+/^[\t ]*#[\t ]*(define|undef)[\t ]+$ac_word_re([\t (]|\$)/ {
+  line = \$ 0
+  split(line, arg, " ")
+  if (arg[1] == "#") {
+    defundef = arg[2]
+    mac1 = arg[3]
+  } else {
+    defundef = substr(arg[1], 2)
+    mac1 = arg[2]
+  }
+  split(mac1, mac2, "(") #)
+  macro = mac2[1]
+  prefix = substr(line, 1, index(line, defundef) - 1)
+  if (D_is_set[macro]) {
+    # Preserve the white space surrounding the "#".
+    print prefix "define", macro P[macro] D[macro]
+    next
+  } else {
+    # Replace #undef with comments.  This is necessary, for example,
+    # in the case of _POSIX_SOURCE, which is predefined and required
+    # on some systems where configure will not decide to define it.
+    if (defundef == "undef") {
+      print "/*", prefix defundef, macro, "*/"
+      next
+    }
+  }
+}
+{ print }
+_ACAWK
+_ACEOF
+cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
+  as_fn_error "could not setup config headers machinery" "$LINENO" 5
+fi # test -n "$CONFIG_HEADERS"
+
+
+eval set X "    :H $CONFIG_HEADERS    :C $CONFIG_COMMANDS"
+shift
+for ac_tag
+do
+  case $ac_tag in
+  :[FHLC]) ac_mode=$ac_tag; continue;;
+  esac
+  case $ac_mode$ac_tag in
+  :[FHL]*:*);;
+  :L* | :C*:*) as_fn_error "invalid tag \`$ac_tag'" "$LINENO" 5;;
+  :[FH]-) ac_tag=-:-;;
+  :[FH]*) ac_tag=$ac_tag:$ac_tag.in;;
+  esac
+  ac_save_IFS=$IFS
+  IFS=:
+  set x $ac_tag
+  IFS=$ac_save_IFS
+  shift
+  ac_file=$1
+  shift
+
+  case $ac_mode in
+  :L) ac_source=$1;;
+  :[FH])
+    ac_file_inputs=
+    for ac_f
+    do
+      case $ac_f in
+      -) ac_f="$tmp/stdin";;
+      *) # Look for the file first in the build tree, then in the source tree
+	 # (if the path is not absolute).  The absolute path cannot be DOS-style,
+	 # because $ac_f cannot contain `:'.
+	 test -f "$ac_f" ||
+	   case $ac_f in
+	   [\\/$]*) false;;
+	   *) test -f "$srcdir/$ac_f" && ac_f="$srcdir/$ac_f";;
+	   esac ||
+	   as_fn_error "cannot find input file: \`$ac_f'" "$LINENO" 5;;
+      esac
+      case $ac_f in *\'*) ac_f=`$as_echo "$ac_f" | sed "s/'/'\\\\\\\\''/g"`;; esac
+      as_fn_append ac_file_inputs " '$ac_f'"
+    done
+
+    # Let's still pretend it is `configure' which instantiates (i.e., don't
+    # use $as_me), people would be surprised to read:
+    #    /* config.h.  Generated by config.status.  */
+    configure_input='Generated from '`
+	  $as_echo "$*" | sed 's|^[^:]*/||;s|:[^:]*/|, |g'
+	`' by configure.'
+    if test x"$ac_file" != x-; then
+      configure_input="$ac_file.  $configure_input"
+      { $as_echo "$as_me:${as_lineno-$LINENO}: creating $ac_file" >&5
+$as_echo "$as_me: creating $ac_file" >&6;}
+    fi
+    # Neutralize special characters interpreted by sed in replacement strings.
+    case $configure_input in #(
+    *\&* | *\|* | *\\* )
+       ac_sed_conf_input=`$as_echo "$configure_input" |
+       sed 's/[\\\\&|]/\\\\&/g'`;; #(
+    *) ac_sed_conf_input=$configure_input;;
+    esac
+
+    case $ac_tag in
+    *:-:* | *:-) cat >"$tmp/stdin" \
+      || as_fn_error "could not create $ac_file" "$LINENO" 5 ;;
+    esac
+    ;;
+  esac
+
+  ac_dir=`$as_dirname -- "$ac_file" ||
+$as_expr X"$ac_file" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \
+	 X"$ac_file" : 'X\(//\)[^/]' \| \
+	 X"$ac_file" : 'X\(//\)$' \| \
+	 X"$ac_file" : 'X\(/\)' \| . 2>/dev/null ||
+$as_echo X"$ac_file" |
+    sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{
+	    s//\1/
+	    q
+	  }
+	  /^X\(\/\/\)[^/].*/{
+	    s//\1/
+	    q
+	  }
+	  /^X\(\/\/\)$/{
+	    s//\1/
+	    q
+	  }
+	  /^X\(\/\).*/{
+	    s//\1/
+	    q
+	  }
+	  s/.*/./; q'`
+  as_dir="$ac_dir"; as_fn_mkdir_p
+  ac_builddir=.
+
+case "$ac_dir" in
+.) ac_dir_suffix= ac_top_builddir_sub=. ac_top_build_prefix= ;;
+*)
+  ac_dir_suffix=/`$as_echo "$ac_dir" | sed 's|^\.[\\/]||'`
+  # A ".." for each directory in $ac_dir_suffix.
+  ac_top_builddir_sub=`$as_echo "$ac_dir_suffix" | sed 's|/[^\\/]*|/..|g;s|/||'`
+  case $ac_top_builddir_sub in
+  "") ac_top_builddir_sub=. ac_top_build_prefix= ;;
+  *)  ac_top_build_prefix=$ac_top_builddir_sub/ ;;
+  esac ;;
+esac
+ac_abs_top_builddir=$ac_pwd
+ac_abs_builddir=$ac_pwd$ac_dir_suffix
+# for backward compatibility:
+ac_top_builddir=$ac_top_build_prefix
+
+case $srcdir in
+  .)  # We are building in place.
+    ac_srcdir=.
+    ac_top_srcdir=$ac_top_builddir_sub
+    ac_abs_top_srcdir=$ac_pwd ;;
+  [\\/]* | ?:[\\/]* )  # Absolute name.
+    ac_srcdir=$srcdir$ac_dir_suffix;
+    ac_top_srcdir=$srcdir
+    ac_abs_top_srcdir=$srcdir ;;
+  *) # Relative name.
+    ac_srcdir=$ac_top_build_prefix$srcdir$ac_dir_suffix
+    ac_top_srcdir=$ac_top_build_prefix$srcdir
+    ac_abs_top_srcdir=$ac_pwd/$srcdir ;;
+esac
+ac_abs_srcdir=$ac_abs_top_srcdir$ac_dir_suffix
+
+
+  case $ac_mode in
+
+  :H)
+  #
+  # CONFIG_HEADER
+  #
+  if test x"$ac_file" != x-; then
+    {
+      $as_echo "/* $configure_input  */" \
+      && eval '$AWK -f "$tmp/defines.awk"' "$ac_file_inputs"
+    } >"$tmp/config.h" \
+      || as_fn_error "could not create $ac_file" "$LINENO" 5
+    if diff "$ac_file" "$tmp/config.h" >/dev/null 2>&1; then
+      { $as_echo "$as_me:${as_lineno-$LINENO}: $ac_file is unchanged" >&5
+$as_echo "$as_me: $ac_file is unchanged" >&6;}
+    else
+      rm -f "$ac_file"
+      mv "$tmp/config.h" "$ac_file" \
+	|| as_fn_error "could not create $ac_file" "$LINENO" 5
+    fi
+  else
+    $as_echo "/* $configure_input  */" \
+      && eval '$AWK -f "$tmp/defines.awk"' "$ac_file_inputs" \
+      || as_fn_error "could not create -" "$LINENO" 5
+  fi
+# Compute "$ac_file"'s index in $config_headers.
+_am_arg="$ac_file"
+_am_stamp_count=1
+for _am_header in $config_headers :; do
+  case $_am_header in
+    $_am_arg | $_am_arg:* )
+      break ;;
+    * )
+      _am_stamp_count=`expr $_am_stamp_count + 1` ;;
+  esac
+done
+echo "timestamp for $_am_arg" >`$as_dirname -- "$_am_arg" ||
+$as_expr X"$_am_arg" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \
+	 X"$_am_arg" : 'X\(//\)[^/]' \| \
+	 X"$_am_arg" : 'X\(//\)$' \| \
+	 X"$_am_arg" : 'X\(/\)' \| . 2>/dev/null ||
+$as_echo X"$_am_arg" |
+    sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{
+	    s//\1/
+	    q
+	  }
+	  /^X\(\/\/\)[^/].*/{
+	    s//\1/
+	    q
+	  }
+	  /^X\(\/\/\)$/{
+	    s//\1/
+	    q
+	  }
+	  /^X\(\/\).*/{
+	    s//\1/
+	    q
+	  }
+	  s/.*/./; q'`/stamp-h$_am_stamp_count
+ ;;
+
+  :C)  { $as_echo "$as_me:${as_lineno-$LINENO}: executing $ac_file commands" >&5
+$as_echo "$as_me: executing $ac_file commands" >&6;}
+ ;;
+  esac
+
+
+  case $ac_file$ac_mode in
+    "default-1":C)
+# Only add multilib support code if we just rebuilt the top-level
+# Makefile.
+case " $CONFIG_FILES " in
+ *" Makefile "*)
+   ac_file=Makefile . ${multi_basedir}/config-ml.in
+   ;;
+esac ;;
+    "depfiles":C) test x"$AMDEP_TRUE" != x"" || {
+  # Autoconf 2.62 quotes --file arguments for eval, but not when files
+  # are listed without --file.  Let's play safe and only enable the eval
+  # if we detect the quoting.
+  case $CONFIG_FILES in
+  *\'*) eval set x "$CONFIG_FILES" ;;
+  *)   set x $CONFIG_FILES ;;
+  esac
+  shift
+  for mf
+  do
+    # Strip MF so we end up with the name of the file.
+    mf=`echo "$mf" | sed -e 's/:.*$//'`
+    # Check whether this is an Automake generated Makefile or not.
+    # We used to match only the files named `Makefile.in', but
+    # some people rename them; so instead we look at the file content.
+    # Grep'ing the first line is not enough: some people post-process
+    # each Makefile.in and add a new line on top of each file to say so.
+    # Grep'ing the whole file is not good either: AIX grep has a line
+    # limit of 2048, but all sed's we know have understand at least 4000.
+    if sed -n 's,^#.*generated by automake.*,X,p' "$mf" | grep X >/dev/null 2>&1; then
+      dirpart=`$as_dirname -- "$mf" ||
+$as_expr X"$mf" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \
+	 X"$mf" : 'X\(//\)[^/]' \| \
+	 X"$mf" : 'X\(//\)$' \| \
+	 X"$mf" : 'X\(/\)' \| . 2>/dev/null ||
+$as_echo X"$mf" |
+    sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{
+	    s//\1/
+	    q
+	  }
+	  /^X\(\/\/\)[^/].*/{
+	    s//\1/
+	    q
+	  }
+	  /^X\(\/\/\)$/{
+	    s//\1/
+	    q
+	  }
+	  /^X\(\/\).*/{
+	    s//\1/
+	    q
+	  }
+	  s/.*/./; q'`
+    else
+      continue
+    fi
+    # Extract the definition of DEPDIR, am__include, and am__quote
+    # from the Makefile without running `make'.
+    DEPDIR=`sed -n 's/^DEPDIR = //p' < "$mf"`
+    test -z "$DEPDIR" && continue
+    am__include=`sed -n 's/^am__include = //p' < "$mf"`
+    test -z "am__include" && continue
+    am__quote=`sed -n 's/^am__quote = //p' < "$mf"`
+    # When using ansi2knr, U may be empty or an underscore; expand it
+    U=`sed -n 's/^U = //p' < "$mf"`
+    # Find all dependency output files, they are included files with
+    # $(DEPDIR) in their names.  We invoke sed twice because it is the
+    # simplest approach to changing $(DEPDIR) to its actual value in the
+    # expansion.
+    for file in `sed -n "
+      s/^$am__include $am__quote\(.*(DEPDIR).*\)$am__quote"'$/\1/p' <"$mf" | \
+	 sed -e 's/\$(DEPDIR)/'"$DEPDIR"'/g' -e 's/\$U/'"$U"'/g'`; do
+      # Make sure the directory exists.
+      test -f "$dirpart/$file" && continue
+      fdir=`$as_dirname -- "$file" ||
+$as_expr X"$file" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \
+	 X"$file" : 'X\(//\)[^/]' \| \
+	 X"$file" : 'X\(//\)$' \| \
+	 X"$file" : 'X\(/\)' \| . 2>/dev/null ||
+$as_echo X"$file" |
+    sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{
+	    s//\1/
+	    q
+	  }
+	  /^X\(\/\/\)[^/].*/{
+	    s//\1/
+	    q
+	  }
+	  /^X\(\/\/\)$/{
+	    s//\1/
+	    q
+	  }
+	  /^X\(\/\).*/{
+	    s//\1/
+	    q
+	  }
+	  s/.*/./; q'`
+      as_dir=$dirpart/$fdir; as_fn_mkdir_p
+      # echo "creating $dirpart/$file"
+      echo '# dummy' > "$dirpart/$file"
+    done
+  done
+}
+ ;;
+    "libtool":C)
+
+    # See if we are running on zsh, and set the options which allow our
+    # commands through without removal of \ escapes.
+    if test -n "${ZSH_VERSION+set}" ; then
+      setopt NO_GLOB_SUBST
+    fi
+
+    cfgfile="${ofile}T"
+    trap "$RM \"$cfgfile\"; exit 1" 1 2 15
+    $RM "$cfgfile"
+
+    cat <<_LT_EOF >> "$cfgfile"
+#! $SHELL
+
+# `$ECHO "$ofile" | sed 's%^.*/%%'` - Provide generalized library-building support services.
+# Generated automatically by $as_me ($PACKAGE$TIMESTAMP) $VERSION
+# Libtool was configured on host `(hostname || uname -n) 2>/dev/null | sed 1q`:
+# NOTE: Changes made to this file will be lost: look at ltmain.sh.
+#
+#   Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001, 2003, 2004, 2005,
+#                 2006, 2007, 2008, 2009 Free Software Foundation, Inc.
+#   Written by Gordon Matzigkeit, 1996
+#
+#   This file is part of GNU Libtool.
+#
+# GNU Libtool is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation; either version 2 of
+# the License, or (at your option) any later version.
+#
+# As a special exception to the GNU General Public License,
+# if you distribute this file as part of a program or library that
+# is built using GNU Libtool, you may include this file under the
+# same distribution terms that you use for the rest of that program.
+#
+# GNU Libtool is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GNU Libtool; see the file COPYING.  If not, a copy
+# can be downloaded from http://www.gnu.org/licenses/gpl.html, or
+# obtained by writing to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+
+
+# The names of the tagged configurations supported by this script.
+available_tags="FC "
+
+# ### BEGIN LIBTOOL CONFIG
+
+# Which release of libtool.m4 was used?
+macro_version=$macro_version
+macro_revision=$macro_revision
+
+# Whether or not to build shared libraries.
+build_libtool_libs=$enable_shared
+
+# Whether or not to build static libraries.
+build_old_libs=$enable_static
+
+# What type of objects to build.
+pic_mode=$pic_mode
+
+# Whether or not to optimize for fast installation.
+fast_install=$enable_fast_install
+
+# Shell to use when invoking shell scripts.
+SHELL=$lt_SHELL
+
+# An echo program that protects backslashes.
+ECHO=$lt_ECHO
+
+# The host system.
+host_alias=$host_alias
+host=$host
+host_os=$host_os
+
+# The build system.
+build_alias=$build_alias
+build=$build
+build_os=$build_os
+
+# A sed program that does not truncate output.
+SED=$lt_SED
+
+# Sed that helps us avoid accidentally triggering echo(1) options like -n.
+Xsed="\$SED -e 1s/^X//"
+
+# A grep program that handles long lines.
+GREP=$lt_GREP
+
+# An ERE matcher.
+EGREP=$lt_EGREP
+
+# A literal string matcher.
+FGREP=$lt_FGREP
+
+# A BSD- or MS-compatible name lister.
+NM=$lt_NM
+
+# Whether we need soft or hard links.
+LN_S=$lt_LN_S
+
+# What is the maximum length of a command?
+max_cmd_len=$max_cmd_len
+
+# Object file suffix (normally "o").
+objext=$ac_objext
+
+# Executable file suffix (normally "").
+exeext=$exeext
+
+# whether the shell understands "unset".
+lt_unset=$lt_unset
+
+# turn spaces into newlines.
+SP2NL=$lt_lt_SP2NL
+
+# turn newlines into spaces.
+NL2SP=$lt_lt_NL2SP
+
+# An object symbol dumper.
+OBJDUMP=$lt_OBJDUMP
+
+# Method to check whether dependent libraries are shared objects.
+deplibs_check_method=$lt_deplibs_check_method
+
+# Command to use when deplibs_check_method == "file_magic".
+file_magic_cmd=$lt_file_magic_cmd
+
+# The archiver.
+AR=$lt_AR
+AR_FLAGS=$lt_AR_FLAGS
+
+# A symbol stripping program.
+STRIP=$lt_STRIP
+
+# Commands used to install an old-style archive.
+RANLIB=$lt_RANLIB
+old_postinstall_cmds=$lt_old_postinstall_cmds
+old_postuninstall_cmds=$lt_old_postuninstall_cmds
+
+# Whether to use a lock for old archive extraction.
+lock_old_archive_extraction=$lock_old_archive_extraction
+
+# A C compiler.
+LTCC=$lt_CC
+
+# LTCC compiler flags.
+LTCFLAGS=$lt_CFLAGS
+
+# Take the output of nm and produce a listing of raw symbols and C names.
+global_symbol_pipe=$lt_lt_cv_sys_global_symbol_pipe
+
+# Transform the output of nm in a proper C declaration.
+global_symbol_to_cdecl=$lt_lt_cv_sys_global_symbol_to_cdecl
+
+# Transform the output of nm in a C name address pair.
+global_symbol_to_c_name_address=$lt_lt_cv_sys_global_symbol_to_c_name_address
+
+# Transform the output of nm in a C name address pair when lib prefix is needed.
+global_symbol_to_c_name_address_lib_prefix=$lt_lt_cv_sys_global_symbol_to_c_name_address_lib_prefix
+
+# The name of the directory that contains temporary libtool files.
+objdir=$objdir
+
+# Used to examine libraries when file_magic_cmd begins with "file".
+MAGIC_CMD=$MAGIC_CMD
+
+# Must we lock files when doing compilation?
+need_locks=$lt_need_locks
+
+# Tool to manipulate archived DWARF debug symbol files on Mac OS X.
+DSYMUTIL=$lt_DSYMUTIL
+
+# Tool to change global to local symbols on Mac OS X.
+NMEDIT=$lt_NMEDIT
+
+# Tool to manipulate fat objects and archives on Mac OS X.
+LIPO=$lt_LIPO
+
+# ldd/readelf like tool for Mach-O binaries on Mac OS X.
+OTOOL=$lt_OTOOL
+
+# ldd/readelf like tool for 64 bit Mach-O binaries on Mac OS X 10.4.
+OTOOL64=$lt_OTOOL64
+
+# Old archive suffix (normally "a").
+libext=$libext
+
+# Shared library suffix (normally ".so").
+shrext_cmds=$lt_shrext_cmds
+
+# The commands to extract the exported symbol list from a shared archive.
+extract_expsyms_cmds=$lt_extract_expsyms_cmds
+
+# Variables whose values should be saved in libtool wrapper scripts and
+# restored at link time.
+variables_saved_for_relink=$lt_variables_saved_for_relink
+
+# Do we need the "lib" prefix for modules?
+need_lib_prefix=$need_lib_prefix
+
+# Do we need a version for libraries?
+need_version=$need_version
+
+# Library versioning type.
+version_type=$version_type
+
+# Shared library runtime path variable.
+runpath_var=$runpath_var
+
+# Shared library path variable.
+shlibpath_var=$shlibpath_var
+
+# Is shlibpath searched before the hard-coded library search path?
+shlibpath_overrides_runpath=$shlibpath_overrides_runpath
+
+# Format of library name prefix.
+libname_spec=$lt_libname_spec
+
+# List of archive names.  First name is the real one, the rest are links.
+# The last name is the one that the linker finds with -lNAME
+library_names_spec=$lt_library_names_spec
+
+# The coded name of the library, if different from the real name.
+soname_spec=$lt_soname_spec
+
+# Permission mode override for installation of shared libraries.
+install_override_mode=$lt_install_override_mode
+
+# Command to use after installation of a shared archive.
+postinstall_cmds=$lt_postinstall_cmds
+
+# Command to use after uninstallation of a shared archive.
+postuninstall_cmds=$lt_postuninstall_cmds
+
+# Commands used to finish a libtool library installation in a directory.
+finish_cmds=$lt_finish_cmds
+
+# As "finish_cmds", except a single script fragment to be evaled but
+# not shown.
+finish_eval=$lt_finish_eval
+
+# Whether we should hardcode library paths into libraries.
+hardcode_into_libs=$hardcode_into_libs
+
+# Compile-time system search path for libraries.
+sys_lib_search_path_spec=$lt_sys_lib_search_path_spec
+
+# Run-time system search path for libraries.
+sys_lib_dlsearch_path_spec=$lt_sys_lib_dlsearch_path_spec
+
+# Whether dlopen is supported.
+dlopen_support=$enable_dlopen
+
+# Whether dlopen of programs is supported.
+dlopen_self=$enable_dlopen_self
+
+# Whether dlopen of statically linked programs is supported.
+dlopen_self_static=$enable_dlopen_self_static
+
+# Commands to strip libraries.
+old_striplib=$lt_old_striplib
+striplib=$lt_striplib
+
+
+# The linker used to build libraries.
+LD=$lt_LD
+
+# How to create reloadable object files.
+reload_flag=$lt_reload_flag
+reload_cmds=$lt_reload_cmds
+
+# Commands used to build an old-style archive.
+old_archive_cmds=$lt_old_archive_cmds
+
+# A language specific compiler.
+CC=$lt_compiler
+
+# Is the compiler the GNU compiler?
+with_gcc=$GCC
+
+# Compiler flag to turn off builtin functions.
+no_builtin_flag=$lt_lt_prog_compiler_no_builtin_flag
+
+# How to pass a linker flag through the compiler.
+wl=$lt_lt_prog_compiler_wl
+
+# Additional compiler flags for building library objects.
+pic_flag=$lt_lt_prog_compiler_pic
+
+# Compiler flag to prevent dynamic linking.
+link_static_flag=$lt_lt_prog_compiler_static
+
+# Does compiler simultaneously support -c and -o options?
+compiler_c_o=$lt_lt_cv_prog_compiler_c_o
+
+# Whether or not to add -lc for building shared libraries.
+build_libtool_need_lc=$archive_cmds_need_lc
+
+# Whether or not to disallow shared libs when runtime libs are static.
+allow_libtool_libs_with_static_runtimes=$enable_shared_with_static_runtimes
+
+# Compiler flag to allow reflexive dlopens.
+export_dynamic_flag_spec=$lt_export_dynamic_flag_spec
+
+# Compiler flag to generate shared objects directly from archives.
+whole_archive_flag_spec=$lt_whole_archive_flag_spec
+
+# Whether the compiler copes with passing no objects directly.
+compiler_needs_object=$lt_compiler_needs_object
+
+# Create an old-style archive from a shared archive.
+old_archive_from_new_cmds=$lt_old_archive_from_new_cmds
+
+# Create a temporary old-style archive to link instead of a shared archive.
+old_archive_from_expsyms_cmds=$lt_old_archive_from_expsyms_cmds
+
+# Commands used to build a shared archive.
+archive_cmds=$lt_archive_cmds
+archive_expsym_cmds=$lt_archive_expsym_cmds
+
+# Commands used to build a loadable module if different from building
+# a shared archive.
+module_cmds=$lt_module_cmds
+module_expsym_cmds=$lt_module_expsym_cmds
+
+# Whether we are building with GNU ld or not.
+with_gnu_ld=$lt_with_gnu_ld
+
+# Flag that allows shared libraries with undefined symbols to be built.
+allow_undefined_flag=$lt_allow_undefined_flag
+
+# Flag that enforces no undefined symbols.
+no_undefined_flag=$lt_no_undefined_flag
+
+# Flag to hardcode \$libdir into a binary during linking.
+# This must work even if \$libdir does not exist
+hardcode_libdir_flag_spec=$lt_hardcode_libdir_flag_spec
+
+# If ld is used when linking, flag to hardcode \$libdir into a binary
+# during linking.  This must work even if \$libdir does not exist.
+hardcode_libdir_flag_spec_ld=$lt_hardcode_libdir_flag_spec_ld
+
+# Whether we need a single "-rpath" flag with a separated argument.
+hardcode_libdir_separator=$lt_hardcode_libdir_separator
+
+# Set to "yes" if using DIR/libNAME\${shared_ext} during linking hardcodes
+# DIR into the resulting binary.
+hardcode_direct=$hardcode_direct
+
+# Set to "yes" if using DIR/libNAME\${shared_ext} during linking hardcodes
+# DIR into the resulting binary and the resulting library dependency is
+# "absolute", i.e. impossible to change by setting \${shlibpath_var} if the
+# library is relocated.
+hardcode_direct_absolute=$hardcode_direct_absolute
+
+# Set to "yes" if using the -LDIR flag during linking hardcodes DIR
+# into the resulting binary.
+hardcode_minus_L=$hardcode_minus_L
+
+# Set to "yes" if using SHLIBPATH_VAR=DIR during linking hardcodes DIR
+# into the resulting binary.
+hardcode_shlibpath_var=$hardcode_shlibpath_var
+
+# Set to "yes" if building a shared library automatically hardcodes DIR
+# into the library and all subsequent libraries and executables linked
+# against it.
+hardcode_automatic=$hardcode_automatic
+
+# Set to yes if linker adds runtime paths of dependent libraries
+# to runtime path list.
+inherit_rpath=$inherit_rpath
+
+# Whether libtool must link a program against all its dependency libraries.
+link_all_deplibs=$link_all_deplibs
+
+# Fix the shell variable \$srcfile for the compiler.
+fix_srcfile_path=$lt_fix_srcfile_path
+
+# Set to "yes" if exported symbols are required.
+always_export_symbols=$always_export_symbols
+
+# The commands to list exported symbols.
+export_symbols_cmds=$lt_export_symbols_cmds
+
+# Symbols that should not be listed in the preloaded symbols.
+exclude_expsyms=$lt_exclude_expsyms
+
+# Symbols that must always be exported.
+include_expsyms=$lt_include_expsyms
+
+# Commands necessary for linking programs (against libraries) with templates.
+prelink_cmds=$lt_prelink_cmds
+
+# Specify filename containing input files.
+file_list_spec=$lt_file_list_spec
+
+# How to hardcode a shared library path into an executable.
+hardcode_action=$hardcode_action
+
+# The directories searched by this compiler when creating a shared library.
+compiler_lib_search_dirs=$lt_compiler_lib_search_dirs
+
+# Dependencies to place before and after the objects being linked to
+# create a shared library.
+predep_objects=$lt_predep_objects
+postdep_objects=$lt_postdep_objects
+predeps=$lt_predeps
+postdeps=$lt_postdeps
+
+# The library search path used internally by the compiler when linking
+# a shared library.
+compiler_lib_search_path=$lt_compiler_lib_search_path
+
+# ### END LIBTOOL CONFIG
+
+_LT_EOF
+
+  case $host_os in
+  aix3*)
+    cat <<\_LT_EOF >> "$cfgfile"
+# AIX sometimes has problems with the GCC collect2 program.  For some
+# reason, if we set the COLLECT_NAMES environment variable, the problems
+# vanish in a puff of smoke.
+if test "X${COLLECT_NAMES+set}" != Xset; then
+  COLLECT_NAMES=
+  export COLLECT_NAMES
+fi
+_LT_EOF
+    ;;
+  esac
+
+
+ltmain="$ac_aux_dir/ltmain.sh"
+
+
+  # We use sed instead of cat because bash on DJGPP gets confused if
+  # it finds mixed CR/LF and LF-only lines.  Since sed operates in
+  # text mode, it properly converts lines to CR/LF.  This bash problem
+  # is reportedly fixed, but why not run on old versions too?
+  sed '/^# Generated shell functions inserted here/q' "$ltmain" >> "$cfgfile" \
+    || (rm -f "$cfgfile"; exit 1)
+
+  case $xsi_shell in
+  yes)
+    cat << \_LT_EOF >> "$cfgfile"
+
+# func_dirname file append nondir_replacement
+# Compute the dirname of FILE.  If nonempty, add APPEND to the result,
+# otherwise set result to NONDIR_REPLACEMENT.
+func_dirname ()
+{
+  case ${1} in
+    */*) func_dirname_result="${1%/*}${2}" ;;
+    *  ) func_dirname_result="${3}" ;;
+  esac
+}
+
+# func_basename file
+func_basename ()
+{
+  func_basename_result="${1##*/}"
+}
+
+# func_dirname_and_basename file append nondir_replacement
+# perform func_basename and func_dirname in a single function
+# call:
+#   dirname:  Compute the dirname of FILE.  If nonempty,
+#             add APPEND to the result, otherwise set result
+#             to NONDIR_REPLACEMENT.
+#             value returned in "$func_dirname_result"
+#   basename: Compute filename of FILE.
+#             value returned in "$func_basename_result"
+# Implementation must be kept synchronized with func_dirname
+# and func_basename. For efficiency, we do not delegate to
+# those functions but instead duplicate the functionality here.
+func_dirname_and_basename ()
+{
+  case ${1} in
+    */*) func_dirname_result="${1%/*}${2}" ;;
+    *  ) func_dirname_result="${3}" ;;
+  esac
+  func_basename_result="${1##*/}"
+}
+
+# func_stripname prefix suffix name
+# strip PREFIX and SUFFIX off of NAME.
+# PREFIX and SUFFIX must not contain globbing or regex special
+# characters, hashes, percent signs, but SUFFIX may contain a leading
+# dot (in which case that matches only a dot).
+func_stripname ()
+{
+  # pdksh 5.2.14 does not do ${X%$Y} correctly if both X and Y are
+  # positional parameters, so assign one to ordinary parameter first.
+  func_stripname_result=${3}
+  func_stripname_result=${func_stripname_result#"${1}"}
+  func_stripname_result=${func_stripname_result%"${2}"}
+}
+
+# func_opt_split
+func_opt_split ()
+{
+  func_opt_split_opt=${1%%=*}
+  func_opt_split_arg=${1#*=}
+}
+
+# func_lo2o object
+func_lo2o ()
+{
+  case ${1} in
+    *.lo) func_lo2o_result=${1%.lo}.${objext} ;;
+    *)    func_lo2o_result=${1} ;;
+  esac
+}
+
+# func_xform libobj-or-source
+func_xform ()
+{
+  func_xform_result=${1%.*}.lo
+}
+
+# func_arith arithmetic-term...
+func_arith ()
+{
+  func_arith_result=$(( $* ))
+}
+
+# func_len string
+# STRING may not start with a hyphen.
+func_len ()
+{
+  func_len_result=${#1}
+}
+
+_LT_EOF
+    ;;
+  *) # Bourne compatible functions.
+    cat << \_LT_EOF >> "$cfgfile"
+
+# func_dirname file append nondir_replacement
+# Compute the dirname of FILE.  If nonempty, add APPEND to the result,
+# otherwise set result to NONDIR_REPLACEMENT.
+func_dirname ()
+{
+  # Extract subdirectory from the argument.
+  func_dirname_result=`$ECHO "${1}" | $SED "$dirname"`
+  if test "X$func_dirname_result" = "X${1}"; then
+    func_dirname_result="${3}"
+  else
+    func_dirname_result="$func_dirname_result${2}"
+  fi
+}
+
+# func_basename file
+func_basename ()
+{
+  func_basename_result=`$ECHO "${1}" | $SED "$basename"`
+}
+
+
+# func_stripname prefix suffix name
+# strip PREFIX and SUFFIX off of NAME.
+# PREFIX and SUFFIX must not contain globbing or regex special
+# characters, hashes, percent signs, but SUFFIX may contain a leading
+# dot (in which case that matches only a dot).
+# func_strip_suffix prefix name
+func_stripname ()
+{
+  case ${2} in
+    .*) func_stripname_result=`$ECHO "${3}" | $SED "s%^${1}%%; s%\\\\${2}\$%%"`;;
+    *)  func_stripname_result=`$ECHO "${3}" | $SED "s%^${1}%%; s%${2}\$%%"`;;
+  esac
+}
+
+# sed scripts:
+my_sed_long_opt='1s/^\(-[^=]*\)=.*/\1/;q'
+my_sed_long_arg='1s/^-[^=]*=//'
+
+# func_opt_split
+func_opt_split ()
+{
+  func_opt_split_opt=`$ECHO "${1}" | $SED "$my_sed_long_opt"`
+  func_opt_split_arg=`$ECHO "${1}" | $SED "$my_sed_long_arg"`
+}
+
+# func_lo2o object
+func_lo2o ()
+{
+  func_lo2o_result=`$ECHO "${1}" | $SED "$lo2o"`
+}
+
+# func_xform libobj-or-source
+func_xform ()
+{
+  func_xform_result=`$ECHO "${1}" | $SED 's/\.[^.]*$/.lo/'`
+}
+
+# func_arith arithmetic-term...
+func_arith ()
+{
+  func_arith_result=`expr "$@"`
+}
+
+# func_len string
+# STRING may not start with a hyphen.
+func_len ()
+{
+  func_len_result=`expr "$1" : ".*" 2>/dev/null || echo $max_cmd_len`
+}
+
+_LT_EOF
+esac
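The XSI helper bodies above rely only on POSIX parameter expansion (${var%pattern}, ${var#pattern}), so they can be exercised directly in any POSIX shell. The following is a minimal sketch that copies three of them unchanged and runs them on hypothetical file names; setting objext=o is an assumption standing in for the per-platform value libtool normally provides.

```shell
#!/bin/sh
# Minimal sketch: three XSI-style helpers copied from the libtool
# output above, exercised with hypothetical file names.

objext=o   # assumption: libtool normally sets this per platform

func_dirname ()
{
  case ${1} in
    */*) func_dirname_result="${1%/*}${2}" ;;
    *  ) func_dirname_result="${3}" ;;
  esac
}

func_stripname ()
{
  func_stripname_result=${3}
  func_stripname_result=${func_stripname_result#"${1}"}
  func_stripname_result=${func_stripname_result%"${2}"}
}

func_lo2o ()
{
  case ${1} in
    *.lo) func_lo2o_result=${1%.lo}.${objext} ;;
    *)    func_lo2o_result=${1} ;;
  esac
}

func_dirname "plugin/plugin-nvptx.c" "/" "./"
echo "$func_dirname_result"     # plugin/
func_dirname "oacc-init.c" "/" "./"
echo "$func_dirname_result"     # ./
func_stripname "lib" ".la" "libgomp.la"
echo "$func_stripname_result"   # gomp
func_lo2o "oacc-init.lo"
echo "$func_lo2o_result"        # oacc-init.o
```

The Bourne fallbacks compute the same results by piping through $SED, which costs a subprocess per call; that is why the XSI variants are preferred when the shell supports them.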
+
+case $lt_shell_append in
+  yes)
+    cat << \_LT_EOF >> "$cfgfile"
+
+# func_append var value
+# Append VALUE to the end of shell variable VAR.
+func_append ()
+{
+  eval "$1+=\$2"
+}
+_LT_EOF
+    ;;
+  *)
+    cat << \_LT_EOF >> "$cfgfile"
+
+# func_append var value
+# Append VALUE to the end of shell variable VAR.
+func_append ()
+{
+  eval "$1=\$$1\$2"
+}
+
+_LT_EOF
+    ;;
+  esac
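Which func_append body gets written depends on $lt_shell_append, i.e. whether the shell accepts the "+=" operator. A minimal sketch of that choice follows; the probe is a simplified assumption for illustration, not the exact test from libtool.m4.

```shell
#!/bin/sh
# Minimal sketch: pick a func_append implementation the way the
# generated libtool does, based on whether the shell accepts "+=".
# The probe below is a simplified assumption, not libtool.m4's code.

if ( foo=bar; eval 'foo+=baz' && test "$foo" = barbaz ) >/dev/null 2>&1
then
  lt_shell_append=yes
else
  lt_shell_append=no
fi

case $lt_shell_append in
  yes)
    # Fast path: the shell appends in place.
    func_append () { eval "$1+=\$2"; }
    ;;
  *)
    # Portable path: rebuild the value as old value plus new text.
    func_append () { eval "$1=\$$1\$2"; }
    ;;
esac

flags="-O2"
func_append flags " -fopenacc"
echo "$lt_shell_append: $flags"   # flags are "-O2 -fopenacc" either way
```

In a shell without "+=" support (dash, for example) the probe's eval tries to run a command named foo+=baz, fails, and the portable body is selected.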
+
+
+  sed -n '/^# Generated shell functions inserted here/,$p' "$ltmain" >> "$cfgfile" \
+    || (rm -f "$cfgfile"; exit 1)
+
+  mv -f "$cfgfile" "$ofile" ||
+    (rm -f "$ofile" && cp "$cfgfile" "$ofile" && rm -f "$cfgfile")
+  chmod +x "$ofile"
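The libtool script is assembled by splicing: ltmain.sh up to and including the "# Generated shell functions inserted here" marker, then the shell-specific helper functions, then ltmain.sh again from the marker to the end. Because sed's q command prints the matching line before quitting, and the second sed range starts at that same line, the marker comment ends up in the output twice, which is harmless. A minimal sketch of the same splice on hypothetical files:

```shell
#!/bin/sh
# Minimal sketch of the ltmain.sh splice: head through the marker,
# then the inserted functions, then the marker through end of file.
marker='^# Generated shell functions inserted here'

splice_at_marker ()
{
  # $1: template (like ltmain.sh), $2: text to insert, $3: output
  sed "/$marker/q" "$1" > "$3" || return 1
  cat "$2" >> "$3" || return 1
  sed -n "/$marker/,\$p" "$1" >> "$3"
}

d=$(mktemp -d)
printf '%s\n' 'head' '# Generated shell functions inserted here' 'tail' \
  > "$d/template"
printf '%s\n' 'func_x () { :; }' > "$d/funcs"
splice_at_marker "$d/template" "$d/funcs" "$d/out"
cat "$d/out"
```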
+
+
+    cat <<_LT_EOF >> "$ofile"
+
+# ### BEGIN LIBTOOL TAG CONFIG: FC
+
+# The linker used to build libraries.
+LD=$lt_LD_FC
+
+# How to create reloadable object files.
+reload_flag=$lt_reload_flag_FC
+reload_cmds=$lt_reload_cmds_FC
+
+# Commands used to build an old-style archive.
+old_archive_cmds=$lt_old_archive_cmds_FC
+
+# A language specific compiler.
+CC=$lt_compiler_FC
+
+# Is the compiler the GNU compiler?
+with_gcc=$GCC_FC
+
+# Compiler flag to turn off builtin functions.
+no_builtin_flag=$lt_lt_prog_compiler_no_builtin_flag_FC
+
+# How to pass a linker flag through the compiler.
+wl=$lt_lt_prog_compiler_wl_FC
+
+# Additional compiler flags for building library objects.
+pic_flag=$lt_lt_prog_compiler_pic_FC
+
+# Compiler flag to prevent dynamic linking.
+link_static_flag=$lt_lt_prog_compiler_static_FC
+
+# Does compiler simultaneously support -c and -o options?
+compiler_c_o=$lt_lt_cv_prog_compiler_c_o_FC
+
+# Whether or not to add -lc for building shared libraries.
+build_libtool_need_lc=$archive_cmds_need_lc_FC
+
+# Whether or not to disallow shared libs when runtime libs are static.
+allow_libtool_libs_with_static_runtimes=$enable_shared_with_static_runtimes_FC
+
+# Compiler flag to allow reflexive dlopens.
+export_dynamic_flag_spec=$lt_export_dynamic_flag_spec_FC
+
+# Compiler flag to generate shared objects directly from archives.
+whole_archive_flag_spec=$lt_whole_archive_flag_spec_FC
+
+# Whether the compiler copes with passing no objects directly.
+compiler_needs_object=$lt_compiler_needs_object_FC
+
+# Create an old-style archive from a shared archive.
+old_archive_from_new_cmds=$lt_old_archive_from_new_cmds_FC
+
+# Create a temporary old-style archive to link instead of a shared archive.
+old_archive_from_expsyms_cmds=$lt_old_archive_from_expsyms_cmds_FC
+
+# Commands used to build a shared archive.
+archive_cmds=$lt_archive_cmds_FC
+archive_expsym_cmds=$lt_archive_expsym_cmds_FC
+
+# Commands used to build a loadable module if different from building
+# a shared archive.
+module_cmds=$lt_module_cmds_FC
+module_expsym_cmds=$lt_module_expsym_cmds_FC
+
+# Whether we are building with GNU ld or not.
+with_gnu_ld=$lt_with_gnu_ld_FC
+
+# Flag that allows shared libraries with undefined symbols to be built.
+allow_undefined_flag=$lt_allow_undefined_flag_FC
+
+# Flag that enforces no undefined symbols.
+no_undefined_flag=$lt_no_undefined_flag_FC
+
+# Flag to hardcode \$libdir into a binary during linking.
+# This must work even if \$libdir does not exist
+hardcode_libdir_flag_spec=$lt_hardcode_libdir_flag_spec_FC
+
+# If ld is used when linking, flag to hardcode \$libdir into a binary
+# during linking.  This must work even if \$libdir does not exist.
+hardcode_libdir_flag_spec_ld=$lt_hardcode_libdir_flag_spec_ld_FC
+
+# Whether we need a single "-rpath" flag with a separated argument.
+hardcode_libdir_separator=$lt_hardcode_libdir_separator_FC
+
+# Set to "yes" if using DIR/libNAME\${shared_ext} during linking hardcodes
+# DIR into the resulting binary.
+hardcode_direct=$hardcode_direct_FC
+
+# Set to "yes" if using DIR/libNAME\${shared_ext} during linking hardcodes
+# DIR into the resulting binary and the resulting library dependency is
+# "absolute", i.e. impossible to change by setting \${shlibpath_var} if the
+# library is relocated.
+hardcode_direct_absolute=$hardcode_direct_absolute_FC
+
+# Set to "yes" if using the -LDIR flag during linking hardcodes DIR
+# into the resulting binary.
+hardcode_minus_L=$hardcode_minus_L_FC
+
+# Set to "yes" if using SHLIBPATH_VAR=DIR during linking hardcodes DIR
+# into the resulting binary.
+hardcode_shlibpath_var=$hardcode_shlibpath_var_FC
+
+# Set to "yes" if building a shared library automatically hardcodes DIR
+# into the library and all subsequent libraries and executables linked
+# against it.
+hardcode_automatic=$hardcode_automatic_FC
+
+# Set to yes if linker adds runtime paths of dependent libraries
+# to runtime path list.
+inherit_rpath=$inherit_rpath_FC
+
+# Whether libtool must link a program against all its dependency libraries.
+link_all_deplibs=$link_all_deplibs_FC
+
+# Fix the shell variable \$srcfile for the compiler.
+fix_srcfile_path=$lt_fix_srcfile_path_FC
+
+# Set to "yes" if exported symbols are required.
+always_export_symbols=$always_export_symbols_FC
+
+# The commands to list exported symbols.
+export_symbols_cmds=$lt_export_symbols_cmds_FC
+
+# Symbols that should not be listed in the preloaded symbols.
+exclude_expsyms=$lt_exclude_expsyms_FC
+
+# Symbols that must always be exported.
+include_expsyms=$lt_include_expsyms_FC
+
+# Commands necessary for linking programs (against libraries) with templates.
+prelink_cmds=$lt_prelink_cmds_FC
+
+# Specify filename containing input files.
+file_list_spec=$lt_file_list_spec_FC
+
+# How to hardcode a shared library path into an executable.
+hardcode_action=$hardcode_action_FC
+
+# The directories searched by this compiler when creating a shared library.
+compiler_lib_search_dirs=$lt_compiler_lib_search_dirs_FC
+
+# Dependencies to place before and after the objects being linked to
+# create a shared library.
+predep_objects=$lt_predep_objects_FC
+postdep_objects=$lt_postdep_objects_FC
+predeps=$lt_predeps_FC
+postdeps=$lt_postdeps_FC
+
+# The library search path used internally by the compiler when linking
+# a shared library.
+compiler_lib_search_path=$lt_compiler_lib_search_path_FC
+
+# ### END LIBTOOL TAG CONFIG: FC
+_LT_EOF
+
+ ;;
+    "gstdint.h":C)
+if test "$GCC" = yes; then
+  echo "/* generated for " `$CC --version | sed 1q` "*/" > tmp-stdint.h
+else
+  echo "/* generated for $CC */" > tmp-stdint.h
+fi
+
+sed 's/^ *//' >> tmp-stdint.h <<EOF
+
+  #ifndef GCC_GENERATED_STDINT_H
+  #define GCC_GENERATED_STDINT_H 1
+
+  #include <sys/types.h>
+EOF
+
+if test "$acx_cv_header_stdint" != stdint.h; then
+  echo "#include <stddef.h>" >> tmp-stdint.h
+fi
+if test "$acx_cv_header_stdint" != stddef.h; then
+  echo "#include <$acx_cv_header_stdint>" >> tmp-stdint.h
+fi
+
+sed 's/^ *//' >> tmp-stdint.h <<EOF
+  /* glibc uses these symbols as guards to prevent redefinitions.  */
+  #ifdef __int8_t_defined
+  #define _INT8_T
+  #define _INT16_T
+  #define _INT32_T
+  #endif
+  #ifdef __uint32_t_defined
+  #define _UINT32_T
+  #endif
+
+EOF
+
+# ----------------- done header, emit basic int types -------------
+if test "$acx_cv_header_stdint" = stddef.h; then
+  sed 's/^ *//' >> tmp-stdint.h <<EOF
+
+    #ifndef _UINT8_T
+    #define _UINT8_T
+    #ifndef __uint8_t_defined
+    #define __uint8_t_defined
+    #ifndef uint8_t
+    typedef unsigned $acx_cv_type_int8_t uint8_t;
+    #endif
+    #endif
+    #endif
+
+    #ifndef _UINT16_T
+    #define _UINT16_T
+    #ifndef __uint16_t_defined
+    #define __uint16_t_defined
+    #ifndef uint16_t
+    typedef unsigned $acx_cv_type_int16_t uint16_t;
+    #endif
+    #endif
+    #endif
+
+    #ifndef _UINT32_T
+    #define _UINT32_T
+    #ifndef __uint32_t_defined
+    #define __uint32_t_defined
+    #ifndef uint32_t
+    typedef unsigned $acx_cv_type_int32_t uint32_t;
+    #endif
+    #endif
+    #endif
+
+    #ifndef _INT8_T
+    #define _INT8_T
+    #ifndef __int8_t_defined
+    #define __int8_t_defined
+    #ifndef int8_t
+    typedef $acx_cv_type_int8_t int8_t;
+    #endif
+    #endif
+    #endif
+
+    #ifndef _INT16_T
+    #define _INT16_T
+    #ifndef __int16_t_defined
+    #define __int16_t_defined
+    #ifndef int16_t
+    typedef $acx_cv_type_int16_t int16_t;
+    #endif
+    #endif
+    #endif
+
+    #ifndef _INT32_T
+    #define _INT32_T
+    #ifndef __int32_t_defined
+    #define __int32_t_defined
+    #ifndef int32_t
+    typedef $acx_cv_type_int32_t int32_t;
+    #endif
+    #endif
+    #endif
+EOF
+elif test "$ac_cv_type_u_int32_t" = yes; then
+  sed 's/^ *//' >> tmp-stdint.h <<EOF
+
+    /* int8_t int16_t int32_t defined by inet code, we do the u_intXX types */
+    #ifndef _INT8_T
+    #define _INT8_T
+    #endif
+    #ifndef _INT16_T
+    #define _INT16_T
+    #endif
+    #ifndef _INT32_T
+    #define _INT32_T
+    #endif
+
+    #ifndef _UINT8_T
+    #define _UINT8_T
+    #ifndef __uint8_t_defined
+    #define __uint8_t_defined
+    #ifndef uint8_t
+    typedef u_int8_t uint8_t;
+    #endif
+    #endif
+    #endif
+
+    #ifndef _UINT16_T
+    #define _UINT16_T
+    #ifndef __uint16_t_defined
+    #define __uint16_t_defined
+    #ifndef uint16_t
+    typedef u_int16_t uint16_t;
+    #endif
+    #endif
+    #endif
+
+    #ifndef _UINT32_T
+    #define _UINT32_T
+    #ifndef __uint32_t_defined
+    #define __uint32_t_defined
+    #ifndef uint32_t
+    typedef u_int32_t uint32_t;
+    #endif
+    #endif
+    #endif
+EOF
+else
+  sed 's/^ *//' >> tmp-stdint.h <<EOF
+
+    /* Some systems have guard macros to prevent redefinitions, define them.  */
+    #ifndef _INT8_T
+    #define _INT8_T
+    #endif
+    #ifndef _INT16_T
+    #define _INT16_T
+    #endif
+    #ifndef _INT32_T
+    #define _INT32_T
+    #endif
+    #ifndef _UINT8_T
+    #define _UINT8_T
+    #endif
+    #ifndef _UINT16_T
+    #define _UINT16_T
+    #endif
+    #ifndef _UINT32_T
+    #define _UINT32_T
+    #endif
+EOF
+fi
+
+# ------------- done basic int types, emit int64_t types ------------
+if test "$ac_cv_type_uint64_t" = yes; then
+  sed 's/^ *//' >> tmp-stdint.h <<EOF
+
+    /* system headers have good uint64_t and int64_t */
+    #ifndef _INT64_T
+    #define _INT64_T
+    #endif
+    #ifndef _UINT64_T
+    #define _UINT64_T
+    #endif
+EOF
+elif test "$ac_cv_type_u_int64_t" = yes; then
+  sed 's/^ *//' >> tmp-stdint.h <<EOF
+
+    /* system headers have a u_int64_t (and int64_t) */
+    #ifndef _INT64_T
+    #define _INT64_T
+    #endif
+    #ifndef _UINT64_T
+    #define _UINT64_T
+    #ifndef __uint64_t_defined
+    #define __uint64_t_defined
+    #ifndef uint64_t
+    typedef u_int64_t uint64_t;
+    #endif
+    #endif
+    #endif
+EOF
+elif test -n "$acx_cv_type_int64_t"; then
+  sed 's/^ *//' >> tmp-stdint.h <<EOF
+
+    /* architecture has a 64-bit type, $acx_cv_type_int64_t */
+    #ifndef _INT64_T
+    #define _INT64_T
+    #ifndef int64_t
+    typedef $acx_cv_type_int64_t int64_t;
+    #endif
+    #endif
+    #ifndef _UINT64_T
+    #define _UINT64_T
+    #ifndef __uint64_t_defined
+    #define __uint64_t_defined
+    #ifndef uint64_t
+    typedef unsigned $acx_cv_type_int64_t uint64_t;
+    #endif
+    #endif
+    #endif
+EOF
+else
+  sed 's/^ *//' >> tmp-stdint.h <<EOF
+
+    /* some common heuristics for int64_t, using compiler-specific tests */
+    #if defined __STDC_VERSION__ && (__STDC_VERSION__-0) >= 199901L
+    #ifndef _INT64_T
+    #define _INT64_T
+    #ifndef __int64_t_defined
+    #ifndef int64_t
+    typedef long long int64_t;
+    #endif
+    #endif
+    #endif
+    #ifndef _UINT64_T
+    #define _UINT64_T
+    #ifndef uint64_t
+    typedef unsigned long long uint64_t;
+    #endif
+    #endif
+
+    #elif defined __GNUC__ && defined (__STDC__) && __STDC__-0
+    /* NextStep 2.0 cc is really gcc 1.93 but it defines __GNUC__ = 2 and
+       does not implement __extension__.  But that compiler doesn't define
+       __GNUC_MINOR__.  */
+    # if __GNUC__ < 2 || (__NeXT__ && !__GNUC_MINOR__)
+    # define __extension__
+    # endif
+
+    # ifndef _INT64_T
+    # define _INT64_T
+    # ifndef int64_t
+    __extension__ typedef long long int64_t;
+    # endif
+    # endif
+    # ifndef _UINT64_T
+    # define _UINT64_T
+    # ifndef uint64_t
+    __extension__ typedef unsigned long long uint64_t;
+    # endif
+    # endif
+
+    #elif !defined __STRICT_ANSI__
+    # if defined _MSC_VER || defined __WATCOMC__ || defined __BORLANDC__
+
+    #  ifndef _INT64_T
+    #  define _INT64_T
+    #  ifndef int64_t
+    typedef __int64 int64_t;
+    #  endif
+    #  endif
+    #  ifndef _UINT64_T
+    #  define _UINT64_T
+    #  ifndef uint64_t
+    typedef unsigned __int64 uint64_t;
+    #  endif
+    #  endif
+    # endif /* compiler */
+
+    #endif /* ANSI version */
+EOF
+fi
+
+# ------------- done int64_t types, emit intptr types ------------
+if test "$ac_cv_type_uintptr_t" != yes; then
+  sed 's/^ *//' >> tmp-stdint.h <<EOF
+
+    /* Define intptr_t based on sizeof(void*) = $ac_cv_sizeof_void_p */
+    #ifndef __uintptr_t_defined
+    #ifndef uintptr_t
+    typedef u$acx_cv_type_intptr_t uintptr_t;
+    #endif
+    #endif
+    #ifndef __intptr_t_defined
+    #ifndef intptr_t
+    typedef $acx_cv_type_intptr_t  intptr_t;
+    #endif
+    #endif
+EOF
+fi
+
+# ------------- done intptr types, emit int_least types ------------
+if test "$ac_cv_type_int_least32_t" != yes; then
+  sed 's/^ *//' >> tmp-stdint.h <<EOF
+
+    /* Define int_least types */
+    typedef int8_t     int_least8_t;
+    typedef int16_t    int_least16_t;
+    typedef int32_t    int_least32_t;
+    #ifdef _INT64_T
+    typedef int64_t    int_least64_t;
+    #endif
+
+    typedef uint8_t    uint_least8_t;
+    typedef uint16_t   uint_least16_t;
+    typedef uint32_t   uint_least32_t;
+    #ifdef _UINT64_T
+    typedef uint64_t   uint_least64_t;
+    #endif
+EOF
+fi
+
+# ------------- done int_least types, emit int_fast types ------------
+if test "$ac_cv_type_int_fast32_t" != yes; then
+      sed 's/^ *//' >> tmp-stdint.h <<EOF
+
+    /* Define int_fast types.  short is often slow */
+    typedef int8_t       int_fast8_t;
+    typedef int          int_fast16_t;
+    typedef int32_t      int_fast32_t;
+    #ifdef _INT64_T
+    typedef int64_t      int_fast64_t;
+    #endif
+
+    typedef uint8_t      uint_fast8_t;
+    typedef unsigned int uint_fast16_t;
+    typedef uint32_t     uint_fast32_t;
+    #ifdef _UINT64_T
+    typedef uint64_t     uint_fast64_t;
+    #endif
+EOF
+fi
+
+if test "$ac_cv_type_uintmax_t" != yes; then
+  sed 's/^ *//' >> tmp-stdint.h <<EOF
+
+    /* Define intmax based on what we found */
+    #ifndef intmax_t
+    #ifdef _INT64_T
+    typedef int64_t       intmax_t;
+    #else
+    typedef long          intmax_t;
+    #endif
+    #endif
+    #ifndef uintmax_t
+    #ifdef _UINT64_T
+    typedef uint64_t      uintmax_t;
+    #else
+    typedef unsigned long uintmax_t;
+    #endif
+    #endif
+EOF
+fi
+
+sed 's/^ *//' >> tmp-stdint.h <<EOF
+
+  #endif /* GCC_GENERATED_STDINT_H */
+EOF
+
+if test -r gstdint.h && cmp -s tmp-stdint.h gstdint.h; then
+  rm -f tmp-stdint.h
+else
+  mv -f tmp-stdint.h gstdint.h
+fi
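gstdint.h is regenerated into tmp-stdint.h and only moved into place when its contents differ, so an unchanged header keeps its timestamp and does not trigger needless rebuilds under make. A minimal sketch of that idiom; the helper name and file contents are hypothetical.

```shell
#!/bin/sh
# Minimal sketch of the "replace only if changed" step used for
# gstdint.h: keep the existing file when the regenerated text is
# identical, so its timestamp does not trigger needless rebuilds.
# The helper name and file names are hypothetical.

update_if_changed ()
{
  # $1: freshly generated temporary file, $2: target header
  if test -r "$2" && cmp -s "$1" "$2"; then
    rm -f "$1"
    echo unchanged
  else
    mv -f "$1" "$2"
    echo updated
  fi
}

d=$(mktemp -d)
printf '#define GCC_GENERATED_STDINT_H 1\n' > "$d/tmp-stdint.h"
first=$(update_if_changed "$d/tmp-stdint.h" "$d/gstdint.h")
printf '#define GCC_GENERATED_STDINT_H 1\n' > "$d/tmp-stdint.h"
second=$(update_if_changed "$d/tmp-stdint.h" "$d/gstdint.h")
echo "$first then $second"   # updated then unchanged
```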
+
+ ;;
+
+  esac
+done # for ac_tag
+
+
+as_fn_exit 0
+_ACEOF
+ac_clean_files=$ac_clean_files_save
+
+test $ac_write_fail = 0 ||
+  as_fn_error "write failure creating $CONFIG_STATUS" "$LINENO" 5
+
+
+# configure is writing to config.log, and then calls config.status.
+# config.status does its own redirection, appending to config.log.
+# Unfortunately, on DOS this fails, as config.log is still kept open
+# by configure, so config.status won't be able to write to it; its
+# output is simply discarded.  So we exec the FD to /dev/null,
+# effectively closing config.log, so it can be properly (re)opened and
+# appended to by config.status.  When coming back to configure, we
+# need to make the FD available again.
+if test "$no_create" != yes; then
+  ac_cs_success=:
+  ac_config_status_args=
+  test "$silent" = yes &&
+    ac_config_status_args="$ac_config_status_args --quiet"
+  exec 5>/dev/null
+  $SHELL $CONFIG_STATUS $ac_config_status_args || ac_cs_success=false
+  exec 5>>config.log
+  # Use ||, not &&, to avoid exiting from the if with $? = 1, which
+  # would make configure fail if this is the last instruction.
+  $ac_cs_success || as_fn_exit $?
+fi
+if test -n "$ac_unrecognized_opts" && test "$enable_option_checking" != no; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: unrecognized options: $ac_unrecognized_opts" >&5
+$as_echo "$as_me: WARNING: unrecognized options: $ac_unrecognized_opts" >&2;}
+fi
+
+
+
+# Check for functions needed.
+for ac_func in getloadavg clock_gettime strtoull
+do :
+  as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
+ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
+eval as_val=\$$as_ac_var
+   if test "x$as_val" = x""yes; then :
+  cat >>confdefs.h <<_ACEOF
+#define `$as_echo "HAVE_$ac_func" | $as_tr_cpp` 1
+_ACEOF
+
+fi
+done
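For each function found, the loop appends a HAVE_<NAME> define to confdefs.h, deriving the macro name with $as_tr_cpp. A minimal sketch of that name mapping; the tr/sed pipeline and the to_cpp_macro name are simplified, hypothetical stand-ins (the real $as_tr_cpp also maps "*" to "P", which is omitted here).

```shell
#!/bin/sh
# Minimal sketch of turning a function name into a config.h macro:
# lower-case letters become upper case and any other character that
# is not a letter, digit or underscore becomes "_".
# to_cpp_macro is a hypothetical simplification of $as_tr_cpp.
to_cpp_macro ()
{
  printf '%s\n' "$1" |
    tr 'abcdefghijklmnopqrstuvwxyz' 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' |
    sed 's/[^_ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789]/_/g'
}

for f in getloadavg clock_gettime strtoull; do
  echo "#define $(to_cpp_macro "HAVE_$f") 1"
done
# #define HAVE_GETLOADAVG 1
# #define HAVE_CLOCK_GETTIME 1
# #define HAVE_STRTOULL 1
```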
+
+
+# Check for broken semaphore implementation on darwin.
+# sem_init returns: sem_init error: Function not implemented.
+case "$host" in
+  *-darwin*)
+
+$as_echo "#define HAVE_BROKEN_POSIX_SEMAPHORES 1" >>confdefs.h
+
+    ;;
+esac
+
+ # Check whether --enable-linux-futex was given.
+if test "${enable_linux_futex+set}" = set; then :
+  enableval=$enable_linux_futex;
+      case "$enableval" in
+       yes|no|default) ;;
+       *) as_fn_error "Unknown argument to enable/disable linux-futex" "$LINENO" 5 ;;
+                          esac
+
+else
+  enable_linux_futex=default
+fi
+
+
+case "$target" in
+  *-linux*)
+    case "$enable_linux_futex" in
+      default)
+	# If headers don't have gettid/futex syscalls definition, then
+	# default to no, otherwise there will be compile time failures.
+	# Otherwise, default to yes.  If we don't detect we are
+	# compiled/linked against NPTL and not cross-compiling, check
+	# if programs are run by default against NPTL and if not, issue
+	# a warning.
+	enable_linux_futex=no
+	cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <sys/syscall.h>
+	   int lk;
+int
+main ()
+{
+syscall (SYS_gettid); syscall (SYS_futex, &lk, 0, 0, 0);
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  save_LIBS="$LIBS"
+	   LIBS="-lpthread $LIBS"
+	   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#ifndef _GNU_SOURCE
+	     #define _GNU_SOURCE 1
+	     #endif
+	     #include <pthread.h>
+	     pthread_t th; void *status;
+int
+main ()
+{
+pthread_tryjoin_np (th, &status);
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  enable_linux_futex=yes
+else
+  if test x$cross_compiling = xno; then
+	       if getconf GNU_LIBPTHREAD_VERSION 2>/dev/null \
+		  | LC_ALL=C grep -i NPTL > /dev/null 2>/dev/null; then :; else
+		 { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: The kernel might not support futex or gettid syscalls.
+If so, please configure with --disable-linux-futex" >&5
+$as_echo "$as_me: WARNING: The kernel might not support futex or gettid syscalls.
+If so, please configure with --disable-linux-futex" >&2;}
+	       fi
+	     fi
+	     enable_linux_futex=yes
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+	   LIBS="$save_LIBS"
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+	;;
+      yes)
+	cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <sys/syscall.h>
+	   int lk;
+int
+main ()
+{
+syscall (SYS_gettid); syscall (SYS_futex, &lk, 0, 0, 0);
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+
+else
+  as_fn_error "SYS_gettid and SYS_futex required for --enable-linux-futex" "$LINENO" 5
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+	;;
+    esac
+    ;;
+  *)
+    enable_linux_futex=no
+    ;;
+esac
+if test x$enable_linux_futex = xyes; then
+  :
+fi
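The --enable-linux-futex handling above reduces to a small decision table: validate the argument, force "no" on non-Linux targets, and on Linux let the SYS_gettid/SYS_futex link test decide the default, or fail hard if "yes" was requested but the test fails. A minimal sketch of that logic; resolve_linux_futex is hypothetical, the link-test result is passed in as a plain argument, and the NPTL warning path is omitted because it also ends in "yes".

```shell
#!/bin/sh
# Minimal sketch of the --enable-linux-futex decision logic.
# $1: target triplet
# $2: value given to --enable-linux-futex (empty if not given)
# $3: "yes" if the SYS_gettid/SYS_futex link test succeeded
#     (hypothetical input standing in for ac_fn_c_try_link)
resolve_linux_futex ()
{
  value=${2:-default}
  case $value in
    yes|no|default) ;;
    *) echo "error: unknown argument to enable/disable linux-futex" >&2
       return 1 ;;
  esac
  case $1 in
    *-linux*)
      case $value in
        default)
          if test "$3" = yes; then value=yes; else value=no; fi ;;
        yes)
          if test "$3" != yes; then
            echo "error: SYS_gettid and SYS_futex required" >&2
            return 1
          fi ;;
      esac ;;
    *) value=no ;;
  esac
  echo "$value"
}

resolve_linux_futex x86_64-pc-linux-gnu "" yes   # yes
```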
+
+
+# Check for pthread_{,attr_}[sg]etaffinity_np.
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#define _GNU_SOURCE
+   #include <pthread.h>
+int
+main ()
+{
+cpu_set_t cpuset;
+   pthread_attr_t attr;
+   pthread_getaffinity_np (pthread_self (), sizeof (cpu_set_t), &cpuset);
+   if (CPU_ISSET (0, &cpuset))
+     CPU_SET (1, &cpuset);
+   else
+     CPU_ZERO (&cpuset);
+   pthread_setaffinity_np (pthread_self (), sizeof (cpu_set_t), &cpuset);
+   pthread_attr_init (&attr);
+   pthread_attr_getaffinity_np (&attr, sizeof (cpu_set_t), &cpuset);
+   pthread_attr_setaffinity_np (&attr, sizeof (cpu_set_t), &cpuset);
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+
+$as_echo "#define HAVE_PTHREAD_AFFINITY_NP 1" >>confdefs.h
+
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+
+# At least for glibc, clock_gettime is in librt.  But don't pull that
+# in if it still doesn't give us the function we want.
+if test $ac_cv_func_clock_gettime = no; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for clock_gettime in -lrt" >&5
+$as_echo_n "checking for clock_gettime in -lrt... " >&6; }
+if test "${ac_cv_lib_rt_clock_gettime+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lrt  $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char clock_gettime ();
+int
+main ()
+{
+return clock_gettime ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_lib_rt_clock_gettime=yes
+else
+  ac_cv_lib_rt_clock_gettime=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_rt_clock_gettime" >&5
+$as_echo "$ac_cv_lib_rt_clock_gettime" >&6; }
+if test "x$ac_cv_lib_rt_clock_gettime" = x""yes; then :
+  LIBS="-lrt $LIBS"
+
+$as_echo "#define HAVE_CLOCK_GETTIME 1" >>confdefs.h
+
+fi
+
+fi
+
+# See if we support thread-local storage.
+
+
+   # Check whether --enable-tls was given.
+if test "${enable_tls+set}" = set; then :
+  enableval=$enable_tls;
+      case "$enableval" in
+       yes|no) ;;
+       *) as_fn_error "Argument to enable/disable tls must be yes or no" "$LINENO" 5 ;;
+      esac
+
+else
+  enable_tls=yes
 fi
 
 
@@ -16213,7 +19174,7 @@ fi
 # Get accel target and path to install tree of accel compiler
 offload_additional_options=
 offload_additional_lib_paths=
-offload_targets=
+offload_targets=host_nonshm
 if test x"$enable_offload_targets" != x; then
   for tgt in `echo $enable_offload_targets | sed -e 's#,# #g'`; do
     tgt_dir=`echo $tgt | grep '=' | sed 's/.*=//'`
@@ -16221,6 +19182,8 @@ if test x"$enable_offload_targets" != x; then
     case $tgt in
       *-intelmic-* | *-intelmicemul-*)
 	tgt_name="intelmic" ;;
+      nvptx-*)
+	tgt_name="nvptx" ;;
       *)
 	as_fn_error "unknown offload target specified" "$LINENO" 5 ;;
     esac
@@ -16364,6 +19327,8 @@ ac_config_files="$ac_config_files omp.h omp_lib.h omp_lib.f90 libgomp_f.h"
 
 ac_config_files="$ac_config_files Makefile testsuite/Makefile libgomp.spec"
 
+ac_config_files="$ac_config_files testsuite/libgomp-test-support.exp"
+
 cat >confcache <<\_ACEOF
 # This file is a shell script that caches the results of configure
 # tests run on this system so they can be shared between configure
@@ -16489,6 +19454,18 @@ if test -z "${MAINTAINER_MODE_TRUE}" && test -z "${MAINTAINER_MODE_FALSE}"; then
   as_fn_error "conditional \"MAINTAINER_MODE\" was never defined.
 Usually this means the macro was only invoked conditionally." "$LINENO" 5
 fi
+if test -z "${LIBGOMP_VERBOSE_TRUE}" && test -z "${LIBGOMP_VERBOSE_FALSE}"; then
+  as_fn_error "conditional \"LIBGOMP_VERBOSE\" was never defined.
+Usually this means the macro was only invoked conditionally." "$LINENO" 5
+fi
+if test -z "${PLUGIN_NVPTX_TRUE}" && test -z "${PLUGIN_NVPTX_FALSE}"; then
+  as_fn_error "conditional \"PLUGIN_NVPTX\" was never defined.
+Usually this means the macro was only invoked conditionally." "$LINENO" 5
+fi
+if test -z "${am__EXEEXT_TRUE}" && test -z "${am__EXEEXT_FALSE}"; then
+  as_fn_error "conditional \"am__EXEEXT\" was never defined.
+Usually this means the macro was only invoked conditionally." "$LINENO" 5
+fi
 if test -z "${LIBGOMP_BUILD_VERSIONED_SHLIB_TRUE}" && test -z "${LIBGOMP_BUILD_VERSIONED_SHLIB_FALSE}"; then
   as_fn_error "conditional \"LIBGOMP_BUILD_VERSIONED_SHLIB\" was never defined.
 Usually this means the macro was only invoked conditionally." "$LINENO" 5
@@ -16913,7 +19890,7 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
 # report actual input values of CONFIG_FILES etc. instead of their
 # values after options handling.
 ac_log="
-This file was extended by GNU OpenMP Runtime Library $as_me 1.0, which was
+This file was extended by GNU Offloading and Multi Processing Runtime Library $as_me 1.0, which was
 generated by GNU Autoconf 2.64.  Invocation command line was
 
   CONFIG_FILES    = $CONFIG_FILES
@@ -16973,13 +19950,13 @@ Configuration commands:
 $config_commands
 
 Report bugs to the package provider.
-GNU OpenMP Runtime Library home page: <http://www.gnu.org/software/libgomp/>.
+GNU Offloading and Multi Processing Runtime Library home page: <http://www.gnu.org/software/libgomp/>.
 General help using GNU software: <http://www.gnu.org/gethelp/>."
 
 _ACEOF
 cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
 ac_cs_version="\\
-GNU OpenMP Runtime Library config.status 1.0
+GNU Offloading and Multi Processing Runtime Library config.status 1.0
 configured by $0, generated by GNU Autoconf 2.64,
   with options \\"`$as_echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/\\\\&/g'`\\"
 
@@ -17463,6 +20440,7 @@ fi
 
 
 
+ac_aux_dir='$ac_aux_dir'
 
 
 
@@ -17504,6 +20482,7 @@ do
     "Makefile") CONFIG_FILES="$CONFIG_FILES Makefile" ;;
     "testsuite/Makefile") CONFIG_FILES="$CONFIG_FILES testsuite/Makefile" ;;
     "libgomp.spec") CONFIG_FILES="$CONFIG_FILES libgomp.spec" ;;
+    "testsuite/libgomp-test-support.exp") CONFIG_FILES="$CONFIG_FILES testsuite/libgomp-test-support.exp" ;;
 
   *) as_fn_error "invalid argument: \`$ac_config_target'" "$LINENO" 5;;
   esac
diff --git a/libgomp/configure.ac b/libgomp/configure.ac
index cea6366..68bcb27 100644
--- a/libgomp/configure.ac
+++ b/libgomp/configure.ac
@@ -2,7 +2,7 @@
 # aclocal -I ../config && autoconf && autoheader && automake
 
 AC_PREREQ(2.64)
-AC_INIT([GNU OpenMP Runtime Library], 1.0,,[libgomp])
+AC_INIT([GNU Offloading and Multi Processing Runtime Library], 1.0,,[libgomp])
 AC_CONFIG_HEADER(config.h)
 
 # -------
@@ -28,7 +28,6 @@ LIBGOMP_ENABLE(generated-files-in-srcdir, no, ,
 AC_MSG_RESULT($enable_generated_files_in_srcdir)
 AM_CONDITIONAL(GENINSRC, test "$enable_generated_files_in_srcdir" = yes)
 
-
 # -------
 # -------
 
@@ -193,13 +192,28 @@ AC_LINK_IFELSE(
    [],
    [AC_MSG_ERROR([Pthreads are required to build libgomp])])])
 
+# Enable --enable-libgomp-verbose
+AC_ARG_ENABLE(libgomp-verbose,
+[AS_HELP_STRING([--enable-libgomp-verbose],
+                [enable verbose debugging output for libgomp])],
+[case "${enableval}" in
+  yes) libgomp_verbose=true ;;
+  no) libgomp_verbose=false ;;
+  *) AC_MSG_ERROR([bad value ${enableval} for --enable-libgomp-verbose]) ;;
+esac], [libgomp_verbose=false])
+AM_CONDITIONAL([LIBGOMP_VERBOSE], [test x$libgomp_verbose = xtrue])
+
 plugin_support=yes
 AC_CHECK_LIB(dl, dlsym, , [plugin_support=no])
 if test x"$plugin_support" = xyes; then
   AC_DEFINE(PLUGIN_SUPPORT, 1,
     [Define if all infrastructure, needed for plugins, is supported.])
+elif test "x$enable_accelerator" != xno; then
+  AC_MSG_ERROR([Can't have support for accelerators without support for plugins])
 fi
 
+m4_include([plugin/configfrag.ac])
+
 # Check for functions needed.
 AC_CHECK_FUNCS(getloadavg clock_gettime strtoull)
 
@@ -283,7 +297,7 @@ fi
 # Get accel target and path to install tree of accel compiler
 offload_additional_options=
 offload_additional_lib_paths=
-offload_targets=
+offload_targets=host_nonshm
 if test x"$enable_offload_targets" != x; then
   for tgt in `echo $enable_offload_targets | sed -e 's#,# #g'`; do
     tgt_dir=`echo $tgt | grep '=' | sed 's/.*=//'`
@@ -291,6 +305,8 @@ if test x"$enable_offload_targets" != x; then
     case $tgt in
       *-intelmic-* | *-intelmicemul-*)
 	tgt_name="intelmic" ;;
+      nvptx-*)
+	tgt_name="nvptx" ;;
       *)
 	AC_MSG_ERROR([unknown offload target specified]) ;;
     esac
@@ -388,4 +404,5 @@ CFLAGS="$save_CFLAGS"
 
 AC_CONFIG_FILES(omp.h omp_lib.h omp_lib.f90 libgomp_f.h)
 AC_CONFIG_FILES(Makefile testsuite/Makefile libgomp.spec)
+AC_CONFIG_FILES([testsuite/libgomp-test-support.exp])
 AC_OUTPUT
diff --git a/libgomp/env.c b/libgomp/env.c
index 94c72a3..7e32eb7 100644
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -27,6 +27,8 @@
 
 #include "libgomp.h"
 #include "libgomp_f.h"
+#include "libgomp_target.h"
+#include "oacc-int.h"
 #include <ctype.h>
 #include <stdlib.h>
 #include <stdio.h>
@@ -77,6 +79,10 @@ unsigned long gomp_bind_var_list_len;
 void **gomp_places_list;
 unsigned long gomp_places_list_len;
 
+int goacc_notify_var;
+int goacc_device_num;
+char* goacc_device_type;
+
 /* Parse the OMP_SCHEDULE environment variable.  */
 
 static void
@@ -1011,6 +1017,16 @@ parse_affinity (bool ignore)
   return false;
 }
 
+static void
+goacc_parse_device_type (void)
+{
+  const char *env = getenv ("ACC_DEVICE_TYPE");
+  
+  if (env && *env != '\0')
+    goacc_device_type = strdup (env);
+  else
+    goacc_device_type = NULL;
+}
 
 static void
 handle_omp_display_env (unsigned long stacksize, int wait_policy)
@@ -1181,6 +1197,7 @@ initialize_env (void)
       gomp_global_icv.thread_limit_var
 	= thread_limit_var > INT_MAX ? UINT_MAX : thread_limit_var;
     }
+  parse_int ("GOACC_NOTIFY", &goacc_notify_var, true);
 #ifndef HAVE_SYNC_BUILTINS
   gomp_mutex_init (&gomp_managed_threads_lock);
 #endif
@@ -1271,6 +1288,15 @@ initialize_env (void)
     }
 
   handle_omp_display_env (stacksize, wait_policy);
+  
+  /* Look for OpenACC-specific environment variables.  */
+  if (!parse_int ("ACC_DEVICE_NUM", &goacc_device_num, true))
+    goacc_device_num = 0;
+
+  goacc_parse_device_type ();
+
+  /* Initialize OpenACC-specific internal state.  */
+  goacc_runtime_initialize ();
 }
 
 \f
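With the env.c changes, ACC_DEVICE_NUM is read (and defaults to 0 when unset or unparsable), and ACC_DEVICE_TYPE is duplicated with strdup (or left NULL when unset or empty). Below is a minimal standalone sketch of that logic. The sketch_* names are hypothetical, and sketch_parse_int is a simplified stand-in for libgomp's internal parse_int, not a copy of it.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup, setenv, unsetenv */
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for goacc_device_num / goacc_device_type.  */
static int sketch_device_num;
static char *sketch_device_type;

/* Simplified parser: store a non-negative decimal int from NAME in *OUT.
   Return 0 when the variable is unset, empty, malformed or out of range.  */
static int
sketch_parse_int (const char *name, int *out)
{
  const char *env = getenv (name);
  char *end;
  long val;

  if (env == NULL || *env == '\0')
    return 0;
  errno = 0;
  val = strtol (env, &end, 10);
  if (errno != 0 || *end != '\0' || val < 0 || val > INT_MAX)
    return 0;
  *out = (int) val;
  return 1;
}

/* Mirrors the OpenACC part of initialize_env: default the device number
   to 0, and keep a private copy of a non-empty ACC_DEVICE_TYPE.  */
static void
sketch_parse_openacc_env (void)
{
  const char *env;

  if (!sketch_parse_int ("ACC_DEVICE_NUM", &sketch_device_num))
    sketch_device_num = 0;

  free (sketch_device_type);
  env = getenv ("ACC_DEVICE_TYPE");
  sketch_device_type = (env != NULL && *env != '\0') ? strdup (env) : NULL;
}
```

The copy matters because getenv's result can be invalidated by later setenv calls.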
diff --git a/libgomp/error.c b/libgomp/error.c
index d9b28f1..c455f58 100644
--- a/libgomp/error.c
+++ b/libgomp/error.c
@@ -35,7 +35,7 @@
 #include <stdlib.h>
 
 
-static void
+void
 gomp_verror (const char *fmt, va_list list)
 {
   fputs ("\nlibgomp: ", stderr);
@@ -54,13 +54,40 @@ gomp_error (const char *fmt, ...)
 }
 
 void
+gomp_vfatal (const char *fmt, va_list list)
+{
+  gomp_verror (fmt, list);
+  exit (EXIT_FAILURE);
+}
+
+void
 gomp_fatal (const char *fmt, ...)
 {
   va_list list;
 
   va_start (list, fmt);
-  gomp_verror (fmt, list);
+  gomp_vfatal (fmt, list);
   va_end (list);
+}
 
-  exit (EXIT_FAILURE);
+#ifdef LIBGOMP_VERBOSE
+
+#undef gomp_vnotify
+void
+gomp_vnotify (const char *msg, va_list list)
+{
+  if (goacc_notify_var)
+    vfprintf (stderr, msg, list);
+}
+
+#undef gomp_notify
+void
+gomp_notify (const char *msg, ...)
+{
+  va_list list;
+  
+  va_start (list, msg);
+  gomp_vnotify (msg, list);
+  va_end (list);
 }
+#endif
diff --git a/libgomp/libgomp-plugin.c b/libgomp/libgomp-plugin.c
new file mode 100644
index 0000000..51f3a38
--- /dev/null
+++ b/libgomp/libgomp-plugin.c
@@ -0,0 +1,107 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Exported (non-hidden) functions exposing libgomp interface for plugins.  */
+
+#include <stdlib.h>
+
+#include "libgomp.h"
+#include "libgomp-plugin.h"
+#include "libgomp_target.h"
+
+void *
+GOMP_PLUGIN_malloc (size_t size)
+{
+  return gomp_malloc (size);
+}
+
+void *
+GOMP_PLUGIN_malloc_cleared (size_t size)
+{
+  return gomp_malloc_cleared (size);
+}
+
+void *
+GOMP_PLUGIN_realloc (void *ptr, size_t size)
+{
+  return gomp_realloc (ptr, size);
+}
+
+void
+GOMP_PLUGIN_error (const char *msg, ...)
+{
+  va_list ap;
+  
+  va_start (ap, msg);
+  gomp_verror (msg, ap);
+  va_end (ap);
+}
+
+void
+GOMP_PLUGIN_notify (const char *msg, ...)
+{
+  va_list ap;
+  
+  va_start (ap, msg);
+  gomp_vnotify (msg, ap);
+  va_end (ap);
+}
+
+void
+GOMP_PLUGIN_fatal (const char *msg, ...)
+{
+  va_list ap;
+  
+  va_start (ap, msg);
+  gomp_vfatal (msg, ap);
+  va_end (ap);
+  
+  /* Unreachable.  */
+  abort ();
+}
+
+void
+GOMP_PLUGIN_mutex_init (gomp_mutex_t *mutex)
+{
+  gomp_mutex_init (mutex);
+}
+
+void
+GOMP_PLUGIN_mutex_destroy (gomp_mutex_t *mutex)
+{
+  gomp_mutex_destroy (mutex);
+}
+
+void
+GOMP_PLUGIN_mutex_lock (gomp_mutex_t *mutex)
+{
+  gomp_mutex_lock (mutex);
+}
+
+void
+GOMP_PLUGIN_mutex_unlock (gomp_mutex_t *mutex)
+{
+  gomp_mutex_unlock (mutex);
+}
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
new file mode 100644
index 0000000..87367e3
--- /dev/null
+++ b/libgomp/libgomp-plugin.h
@@ -0,0 +1,54 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* An interface to various libgomp-internal functions for use by plugins.  */
+
+#ifndef LIBGOMP_PLUGIN_H
+#define LIBGOMP_PLUGIN_H 1
+
+#include "mutex.h"
+
+/* alloc.c */
+
+extern void *GOMP_PLUGIN_malloc (size_t) __attribute__((malloc));
+extern void *GOMP_PLUGIN_malloc_cleared (size_t) __attribute__((malloc));
+extern void *GOMP_PLUGIN_realloc (void *, size_t);
+
+/* error.c */
+
+extern void GOMP_PLUGIN_notify (const char *msg, ...);
+extern void GOMP_PLUGIN_error (const char *, ...)
+	__attribute__((format (printf, 1, 2)));
+extern void GOMP_PLUGIN_fatal (const char *, ...)
+	__attribute__((noreturn, format (printf, 1, 2)));
+
+/* mutex.c */
+
+extern void GOMP_PLUGIN_mutex_init (gomp_mutex_t *mutex);
+extern void GOMP_PLUGIN_mutex_destroy (gomp_mutex_t *mutex);
+extern void GOMP_PLUGIN_mutex_lock (gomp_mutex_t *mutex);
+extern void GOMP_PLUGIN_mutex_unlock (gomp_mutex_t *mutex);
+
+#endif
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index a1482cc..b86b960 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -40,6 +40,7 @@
 #include <pthread.h>
 #include <stdbool.h>
 #include <stdlib.h>
+#include <stdarg.h>
 
 #ifdef HAVE_ATTRIBUTE_VISIBILITY
 # pragma GCC visibility push(hidden)
@@ -220,6 +221,7 @@ struct gomp_team_state
 };
 
 struct target_mem_desc;
+struct gomp_memory_mapping;
 
 /* These are the OpenMP 4.0 Internal Control Variables described in
    section 2.3.1.  Those described as having one copy per task are
@@ -254,6 +256,10 @@ extern unsigned long gomp_bind_var_list_len;
 extern void **gomp_places_list;
 extern unsigned long gomp_places_list_len;
 
+extern int goacc_notify_var;
+extern int goacc_device_num;
+extern char* goacc_device_type;
+
 enum gomp_task_kind
 {
   GOMP_TASK_IMPLICIT,
@@ -532,8 +538,29 @@ extern void *gomp_realloc (void *, size_t);
 
 /* error.c */
 
+#ifdef LIBGOMP_VERBOSE
+extern void gomp_vnotify (const char *, va_list);
+extern void gomp_notify (const char *msg, ...)
+	__attribute__((format (printf, 1, 2)));
+#define gomp_notify(...) \
+  do { \
+    if (__builtin_expect (goacc_notify_var, 0)) \
+      (gomp_notify) (__VA_ARGS__); \
+  } while (0)
+#define gomp_vnotify(FMT, VALIST) \
+  do { \
+    if (__builtin_expect (goacc_notify_var, 0)) \
+      (gomp_vnotify) ((FMT), (VALIST)); \
+  } while (0)
+#else
+#define gomp_vnotify(FMT, VALIST)
+#define gomp_notify(FMT, ...)
+#endif
+extern void gomp_verror (const char *, va_list);
 extern void gomp_error (const char *, ...)
 	__attribute__((format (printf, 1, 2)));
+extern void gomp_vfatal (const char *, va_list)
+	__attribute__((noreturn));
 extern void gomp_fatal (const char *, ...)
 	__attribute__((noreturn, format (printf, 1, 2)));
 
@@ -606,6 +633,7 @@ extern void gomp_free_thread (void *);
 
 /* target.c */
 
+extern void gomp_init_targets_once (void);
 extern int gomp_get_num_devices (void);
 
 /* work.c */
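The gomp_notify/gomp_vnotify macros in libgomp.h share their names with real functions. Each call site tests goacc_notify_var first, so the arguments are not even evaluated when notifications are off. The parenthesized call `(gomp_notify) (...)` bypasses the function-like macro and reaches the function. The sketch below shows the same pattern in isolation. The names notify, notify_enabled and notify_calls are hypothetical. It also defines the function using the parenthesized-name form, where error.c instead uses `#undef` before each definition.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for goacc_notify_var (set from GOACC_NOTIFY).  */
static int notify_enabled;

/* Counts calls that reach the real function, so the gating is visible.  */
static int notify_calls;

static void notify (const char *fmt, ...)
  __attribute__ ((format (printf, 1, 2)));

/* Same name as the function: the flag is tested at every call site, and
   the arguments are not evaluated when notifications are disabled.  */
#define notify(...)                             \
  do                                            \
    {                                           \
      if (__builtin_expect (notify_enabled, 0)) \
        (notify) (__VA_ARGS__);                 \
    }                                           \
  while (0)

/* Parenthesizing the name suppresses expansion of the function-like
   macro, so this defines the real function.  */
static void
(notify) (const char *fmt, ...)
{
  va_list ap;

  notify_calls++;
  va_start (ap, fmt);
  vfprintf (stderr, fmt, ap);
  va_end (ap);
}
```

`__builtin_expect` is a GCC/Clang extension. It only hints that the disabled path is the common one.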
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index f36df23..f6e70e9 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -232,3 +232,98 @@ GOMP_4.0.1 {
   global:
 	GOMP_offload_register;
 } GOMP_4.0;
+
+OACC_2.0 {
+  global:
+	acc_get_num_devices;
+	acc_set_device_type;
+	acc_get_device_type;
+	acc_set_device_num;
+	acc_get_device_num;
+	acc_async_test;
+	acc_async_test_h_;
+	acc_async_test_all;
+	acc_async_test_all_h_;
+	acc_wait;
+	acc_wait_async;
+	acc_wait_all;
+	acc_wait_all_async;
+	acc_init;
+	acc_shutdown;
+	acc_on_device;
+	acc_on_device_h_;
+	acc_malloc;
+	acc_free;
+	acc_copyin;
+	acc_copyin_32_h_;
+	acc_copyin_64_h_;
+	acc_copyin_array_h_;
+	acc_present_or_copyin;
+	acc_present_or_copyin_32_h_;
+	acc_present_or_copyin_64_h_;
+	acc_present_or_copyin_array_h_;
+	acc_create;
+	acc_create_32_h_;
+	acc_create_64_h_;
+	acc_create_array_h_;
+	acc_present_or_create;
+	acc_present_or_create_32_h_;
+	acc_present_or_create_64_h_;
+	acc_present_or_create_array_h_;
+	acc_copyout;
+	acc_copyout_32_h_;
+	acc_copyout_64_h_;
+	acc_copyout_array_h_;
+	acc_delete;
+	acc_delete_32_h_;
+	acc_delete_64_h_;
+	acc_delete_array_h_;
+	acc_update_device;
+	acc_update_device_32_h_;
+	acc_update_device_64_h_;
+	acc_update_device_array_h_;
+	acc_update_self;
+	acc_update_self_32_h_;
+	acc_update_self_64_h_;
+	acc_update_self_array_h_;
+	acc_map_data;
+	acc_unmap_data;
+	acc_deviceptr;
+	acc_hostptr;
+	acc_is_present;
+	acc_is_present_32_h_;
+	acc_is_present_64_h_;
+	acc_is_present_array_h_;
+	acc_memcpy_to_device;
+	acc_memcpy_from_device;
+	acc_get_current_cuda_device;
+	acc_get_current_cuda_context;
+	acc_get_cuda_stream;
+	acc_set_cuda_stream;
+};
+
+GOACC_2.0 {
+  global:
+	GOACC_data_end;
+	GOACC_data_start;
+	GOACC_kernels;
+	GOACC_parallel;
+	GOACC_update;
+	GOACC_wait;
+};
+
+GOMP_PLUGIN_1.0 {
+  global:
+	GOMP_PLUGIN_malloc;
+	GOMP_PLUGIN_malloc_cleared;
+	GOMP_PLUGIN_realloc;
+	GOMP_PLUGIN_error;
+	GOMP_PLUGIN_notify;
+	GOMP_PLUGIN_fatal;
+	GOMP_PLUGIN_mutex_init;
+	GOMP_PLUGIN_mutex_destroy;
+	GOMP_PLUGIN_mutex_lock;
+	GOMP_PLUGIN_mutex_unlock;
+	GOMP_PLUGIN_async_unmap_vars;
+	GOMP_PLUGIN_acc_thread;
+};
diff --git a/libgomp/libgomp_g.h b/libgomp/libgomp_g.h
index be0c6ea..44f200c 100644
--- a/libgomp/libgomp_g.h
+++ b/libgomp/libgomp_g.h
@@ -214,4 +214,17 @@ extern void GOMP_target_update (int, const void *,
 				size_t, void **, size_t *, unsigned char *);
 extern void GOMP_teams (unsigned int, unsigned int);
 
+/* oacc-parallel.c */
+
+extern void GOACC_data_start (int, const void *,
+			      size_t, void **, size_t *, unsigned short *);
+extern void GOACC_data_end (void);
+extern void GOACC_kernels (int, void (*) (void *), const void *,
+			   size_t, void **, size_t *, unsigned short *,
+			   int, int, int, int, int, ...);
+extern void GOACC_parallel (int, void (*) (void *), const void *,
+			    size_t, void **, size_t *, unsigned short *,
+			    int, int, int, int, int, ...);
+extern void GOACC_wait (int, int, ...);
+
 #endif /* LIBGOMP_G_H */
diff --git a/libgomp/libgomp_target.h b/libgomp/libgomp_target.h
index f7d19d0..679368a 100644
--- a/libgomp/libgomp_target.h
+++ b/libgomp/libgomp_target.h
@@ -24,11 +24,15 @@
 #ifndef LIBGOMP_TARGET_H
 #define LIBGOMP_TARGET_H 1
 
-/* Type of offload target device.  */
+#include "gomp-constants.h"
+
+/* Type of offload target device.  Keep in sync with openacc.h:acc_device_t.  */
 enum offload_target_type
 {
-  OFFLOAD_TARGET_TYPE_HOST,
-  OFFLOAD_TARGET_TYPE_INTEL_MIC
+  OFFLOAD_TARGET_TYPE_HOST = GOMP_TARGET_HOST,
+  OFFLOAD_TARGET_TYPE_HOST_NONSHM = GOMP_TARGET_HOST_NONSHM,
+  OFFLOAD_TARGET_TYPE_NVIDIA_PTX = GOMP_TARGET_NVIDIA_PTX,
+  OFFLOAD_TARGET_TYPE_INTEL_MIC = GOMP_TARGET_INTEL_MIC
 };
 
 /* Auxiliary struct, used for transferring a host-target address range mapping
@@ -41,4 +45,177 @@ struct mapping_table
   uintptr_t tgt_end;
 };
 
+#include "splay-tree.h"
+
+struct target_mem_desc {
+  /* Reference count.  */
+  uintptr_t refcount;
+  /* All the splay nodes allocated together.  */
+  splay_tree_node array;
+  /* Start of the target region.  */
+  uintptr_t tgt_start;
+  /* End of the target region.  */
+  uintptr_t tgt_end;
+  /* Handle to free.  */
+  void *to_free;
+  /* Previous target_mem_desc.  */
+  struct target_mem_desc *prev;
+  /* Number of items in following list.  */
+  size_t list_count;
+
+  /* Corresponding target device descriptor.  */
+  struct gomp_device_descr *device_descr;
+  
+  /* Memory mapping info for the thread that created this descriptor.  */
+  struct gomp_memory_mapping *mem_map;
+
+  /* List of splay keys to remove (or decrease refcount)
+     at the end of region.  */
+  splay_tree_key list[];
+};
+
+#define TARGET_CAP_SHARED_MEM	1
+#define TARGET_CAP_NATIVE_EXEC	2
+#define TARGET_CAP_OPENMP_400	4
+#define TARGET_CAP_OPENACC_200	8
+
+/* Information about mapped memory regions (per device/context).  */
+
+struct gomp_memory_mapping
+{
+  /* Splay tree containing information about mapped memory regions.  */
+  struct splay_tree_s splay_tree;
+
+  /* Mutex for operating with the splay tree and other shared structures.  */
+  gomp_mutex_t lock;
+  
+  /* True when tables have been added to this memory map.  */
+  bool is_initialized;
+};
+
+typedef struct acc_dispatch_t
+{
+  /* This is a linked list of data mapped using the
+     acc_map_data/acc_unmap_data or "acc enter data"/"acc exit data" pragmas
+     (TODO).  Unlike mapped_data in the goacc_thread struct, unmapping can
+     happen out-of-order with respect to mapping.  */
+  struct target_mem_desc *data_environ;
+
+  /* Open or close a device instance.  */
+  void *(*open_device_func) (int n);
+  int (*close_device_func) (void *h);
+
+  /* Set or get the device number.  */
+  int (*get_device_num_func) (void);
+  void (*set_device_num_func) (int);
+
+  /* Execute.  */
+  void (*exec_func) (void (*) (void *), size_t, void **, void **, size_t *,
+		     unsigned short *, int, int, int, int, void *);
+
+  /* Async cleanup callback registration.  */
+  void (*register_async_cleanup_func) (void *);
+
+  /* Asynchronous routines.  */
+  int (*async_test_func) (int);
+  int (*async_test_all_func) (void);
+  void (*async_wait_func) (int);
+  void (*async_wait_async_func) (int, int);
+  void (*async_wait_all_func) (void);
+  void (*async_wait_all_async_func) (int);
+  void (*async_set_async_func) (int);
+
+  /* Create/destroy TLS data.  */
+  void *(*create_thread_data_func) (void *);
+  void (*destroy_thread_data_func) (void *);
+
+  /* NVIDIA target specific routines.  */
+  struct {
+    void *(*get_current_device_func) (void);
+    void *(*get_current_context_func) (void);
+    void *(*get_stream_func) (int);
+    int (*set_stream_func) (int, void *);
+  } cuda;
+} acc_dispatch_t;
+
+/* This structure describes an accelerator device.
+   It contains the name of the corresponding libgomp plugin, function handlers
+   for interaction with the device, the ID number of the device, and
+   information about mapped memory.  */
+struct gomp_device_descr
+{
+  /* The name of the device.  */
+  const char *name;
+
+  /* Capabilities of device (supports OpenACC, OpenMP).  */
+  unsigned int capabilities;
+
+  /* This is the ID number of the device.  It can be specified in the DEVICE
+     clause of a TARGET construct.  */
+  int id;
+
+  /* This is the ID number of the device among devices of the same type.  */
+  int target_id;
+
+  /* This is the type of the device.  */
+  enum offload_target_type type;
+
+  /* Set to true when device is initialized.  */
+  bool is_initialized;
+  
+  /* True when offload regions have been registered with this device.  */
+  bool offload_regions_registered;
+
+  /* Plugin file handle.  */
+  void *plugin_handle;
+
+  /* Function handlers.  */
+  const char *(*get_name_func) (void);
+  unsigned int (*get_caps_func) (void);
+  int (*get_type_func) (void);
+  int (*get_num_devices_func) (void);
+  void (*register_image_func) (void *, void *);
+  void (*init_device_func) (int);
+  void (*fini_device_func) (int);
+  int (*get_table_func) (int, struct mapping_table **);
+  void *(*alloc_func) (int, size_t);
+  void (*free_func) (int, void *);
+  void *(*dev2host_func) (int, void *, const void *, size_t);
+  void *(*host2dev_func) (int, void *, const void *, size_t);
+  void (*run_func) (int, void *, void *);
+
+  /* OpenACC-specific functions.  */
+  acc_dispatch_t openacc;
+  
+  /* Memory-mapping info for this device instance.  */
+  struct gomp_memory_mapping mem_map;
+
+  /* Extra information required for a device instance by a given target.  */
+  void *target_data;
+};
+
+extern struct target_mem_desc *
+gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
+	       void **hostaddrs, void **devaddrs, size_t *sizes, void *kinds,
+	       bool is_openacc, bool is_target);
+
+extern void
+gomp_copy_from_async (struct target_mem_desc *tgt);
+
+extern void
+gomp_unmap_vars (struct target_mem_desc *tgt, bool);
+
+extern attribute_hidden void
+gomp_init_device (struct gomp_device_descr *devicep);
+
+extern attribute_hidden void
+gomp_init_tables (const struct gomp_device_descr *devicep,
+		  struct gomp_memory_mapping *mm);
+
+extern attribute_hidden void
+gomp_fini_device (struct gomp_device_descr *devicep);
+
+extern attribute_hidden void
+gomp_free_memmap (struct gomp_device_descr *devicep);
+
 #endif /* LIBGOMP_TARGET_H */
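The TARGET_CAP_* values are distinct bits, so a device's capabilities word can be tested with a mask. For example, the "host" device in oacc-host.c advertises TARGET_CAP_OPENACC_200 | TARGET_CAP_NATIVE_EXEC | TARGET_CAP_SHARED_MEM. Below is a minimal sketch of such tests. The helpers has_caps and first_device_with are hypothetical and do not appear in the patch.

```c
#include <stdbool.h>

/* Bit values copied from the TARGET_CAP_* macros in libgomp_target.h.  */
#define TARGET_CAP_SHARED_MEM	1
#define TARGET_CAP_NATIVE_EXEC	2
#define TARGET_CAP_OPENMP_400	4
#define TARGET_CAP_OPENACC_200	8

/* Hypothetical helper: true when CAPS advertises every bit in WANTED.  */
static bool
has_caps (unsigned int caps, unsigned int wanted)
{
  return (caps & wanted) == wanted;
}

/* Hypothetical helper: index of the first entry of CAPS[0..N) that
   supports all of WANTED, or -1 if none does.  */
static int
first_device_with (const unsigned int *caps, int n, unsigned int wanted)
{
  for (int i = 0; i < n; i++)
    if (has_caps (caps[i], wanted))
      return i;
  return -1;
}
```

Comparing against `wanted`, rather than just testing for a non-zero result, means a combined mask only matches devices that have every requested capability.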
diff --git a/libgomp/oacc-async.c b/libgomp/oacc-async.c
new file mode 100644
index 0000000..94c62d8
--- /dev/null
+++ b/libgomp/oacc-async.c
@@ -0,0 +1,77 @@
+/* OpenACC Runtime Library Definitions.
+
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#include "openacc.h"
+#include "libgomp.h"
+#include "libgomp_target.h"
+#include "oacc-int.h"
+
+int
+acc_async_test (int async)
+{
+  if (async < acc_async_sync)
+    gomp_fatal ("invalid async argument: %d", async);
+
+  return base_dev->openacc.async_test_func (async);
+}
+
+int
+acc_async_test_all (void)
+{
+  return base_dev->openacc.async_test_all_func ();
+}
+
+void
+acc_wait (int async)
+{
+  if (async < acc_async_sync)
+    gomp_fatal ("invalid async argument: %d", async);
+
+  base_dev->openacc.async_wait_func (async);
+}
+
+void
+acc_wait_async (int async1, int async2)
+{
+  base_dev->openacc.async_wait_async_func (async1, async2);
+}
+
+void
+acc_wait_all (void)
+{
+  base_dev->openacc.async_wait_all_func ();
+}
+
+void
+acc_wait_all_async (int async)
+{
+  if (async < acc_async_sync)
+    gomp_fatal ("invalid async argument: %d", async);
+
+  base_dev->openacc.async_wait_all_async_func (async);
+}
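The oacc-async.c entry points are thin wrappers. Most of them reject async values below acc_async_sync and then forward to the matching hook in base_dev->openacc. The sketch below reproduces that check-then-dispatch shape against a fake device. The numeric values -1 and -2 for acc_async_noval and acc_async_sync are an assumption here, and the sketch_* and fake_* names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed values for the special async arguments (acc_async_noval and
   acc_async_sync); both are negative.  */
enum { sketch_async_noval = -1, sketch_async_sync = -2 };

/* Hypothetical, cut-down analogue of acc_dispatch_t.  */
struct sketch_dispatch
{
  int (*async_test_func) (int);
  void (*async_wait_func) (int);
};

/* Fake device hooks: queue 1 is busy, every other queue is idle, and
   the last queue waited on is recorded.  */
static int last_waited = 0;

static int
fake_async_test (int async)
{
  return async != 1;
}

static void
fake_async_wait (int async)
{
  last_waited = async;
}

static struct sketch_dispatch fake_dev = { fake_async_test, fake_async_wait };
static struct sketch_dispatch *sketch_base_dev = &fake_dev;

/* Reject anything below acc_async_sync, as acc_async_test and acc_wait
   do via gomp_fatal.  */
static void
sketch_check_async (int async)
{
  if (async < sketch_async_sync)
    {
      fprintf (stderr, "invalid async argument: %d\n", async);
      exit (EXIT_FAILURE);
    }
}

static int
sketch_async_test (int async)
{
  sketch_check_async (async);
  return sketch_base_dev->async_test_func (async);
}

static void
sketch_wait (int async)
{
  sketch_check_async (async);
  sketch_base_dev->async_wait_func (async);
}
```

Keeping validation in the generic layer means each plugin's hooks only ever see acc_async_sync, acc_async_noval or a non-negative queue number.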
diff --git a/libgomp/oacc-cuda.c b/libgomp/oacc-cuda.c
new file mode 100644
index 0000000..4d0b284
--- /dev/null
+++ b/libgomp/oacc-cuda.c
@@ -0,0 +1,84 @@
+/* OpenACC Runtime Library: CUDA support glue.
+
+   Copyright (C) 2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "openacc.h"
+#include "config.h"
+#include "libgomp.h"
+#include "libgomp_target.h"
+#include "oacc-int.h"
+
+void *
+acc_get_current_cuda_device (void)
+{
+  void *p = NULL;
+
+  if (base_dev && base_dev->openacc.cuda.get_current_device_func)
+    p = base_dev->openacc.cuda.get_current_device_func ();
+
+  return p;
+}
+
+void *
+acc_get_current_cuda_context (void)
+{
+  void *p = NULL;
+
+  if (base_dev && base_dev->openacc.cuda.get_current_context_func)
+    p = base_dev->openacc.cuda.get_current_context_func ();
+
+  return p;
+}
+
+void *
+acc_get_cuda_stream (int async)
+{
+  void *p = NULL;
+
+  if (async < 0)
+    return p;
+
+  if (base_dev && base_dev->openacc.cuda.get_stream_func)
+    p = base_dev->openacc.cuda.get_stream_func (async);
+
+  return p;
+}
+
+int
+acc_set_cuda_stream (int async, void *stream)
+{
+  int s = -1;
+
+  if (async < 0 || stream == NULL)
+    return 0;
+  
+  goacc_lazy_initialize ();
+
+  if (base_dev && base_dev->openacc.cuda.set_stream_func)
+    s = base_dev->openacc.cuda.set_stream_func (async, stream);
+
+  return s;
+}
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
new file mode 100644
index 0000000..0d94465
--- /dev/null
+++ b/libgomp/oacc-host.c
@@ -0,0 +1,99 @@
+/* OpenACC Runtime Library: acc_device_host, acc_device_host_nonshm.
+
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This shares much of the implementation of the plugin-host.c "host_nonshm"
+   plugin.  */
+#include "plugin/plugin-host.c"
+
+static struct gomp_device_descr host_dispatch =
+  {
+    .name = "host",
+
+    .type = OFFLOAD_TARGET_TYPE_HOST,
+    .capabilities = TARGET_CAP_OPENACC_200 | TARGET_CAP_NATIVE_EXEC
+		    | TARGET_CAP_SHARED_MEM,
+    .id = 0,
+
+    .is_initialized = false,
+    .offload_regions_registered = false,
+
+    .get_name_func = GOMP_OFFLOAD_get_name,
+    .get_type_func = GOMP_OFFLOAD_get_type,
+    .get_caps_func = GOMP_OFFLOAD_get_caps,
+
+    .init_device_func = GOMP_OFFLOAD_init_device,
+    .fini_device_func = GOMP_OFFLOAD_fini_device,
+    .get_num_devices_func = GOMP_OFFLOAD_get_num_devices,
+    .register_image_func = GOMP_OFFLOAD_register_image,
+    .get_table_func = GOMP_OFFLOAD_get_table,
+
+    .alloc_func = GOMP_OFFLOAD_alloc,
+    .free_func = GOMP_OFFLOAD_free,
+    .host2dev_func = GOMP_OFFLOAD_host2dev,
+    .dev2host_func = GOMP_OFFLOAD_dev2host,
+
+    .run_func = GOMP_OFFLOAD_run,
+
+    .openacc = {
+      .open_device_func = GOMP_OFFLOAD_openacc_open_device,
+      .close_device_func = GOMP_OFFLOAD_openacc_close_device,
+
+      .get_device_num_func = GOMP_OFFLOAD_openacc_get_device_num,
+      .set_device_num_func = GOMP_OFFLOAD_openacc_set_device_num,
+
+      .exec_func = GOMP_OFFLOAD_openacc_parallel,
+
+      .register_async_cleanup_func
+        = GOMP_OFFLOAD_openacc_register_async_cleanup,
+
+      .async_set_async_func = GOMP_OFFLOAD_openacc_async_set_async,
+      .async_test_func = GOMP_OFFLOAD_openacc_async_test,
+      .async_test_all_func = GOMP_OFFLOAD_openacc_async_test_all,
+      .async_wait_func = GOMP_OFFLOAD_openacc_async_wait,
+      .async_wait_async_func = GOMP_OFFLOAD_openacc_async_wait_async,
+      .async_wait_all_func = GOMP_OFFLOAD_openacc_async_wait_all,
+      .async_wait_all_async_func = GOMP_OFFLOAD_openacc_async_wait_all_async,
+
+      .create_thread_data_func = GOMP_OFFLOAD_openacc_create_thread_data,
+      .destroy_thread_data_func = GOMP_OFFLOAD_openacc_destroy_thread_data,
+
+      .cuda = {
+	.get_current_device_func = NULL,
+	.get_current_context_func = NULL,
+	.get_stream_func = NULL,
+	.set_stream_func = NULL,
+      }
+    }
+  };
+
+/* Register this device type.  */
+static __attribute__ ((constructor))
+void goacc_host_init (void)
+{
+  gomp_mutex_init (&host_dispatch.mem_map.lock);
+  goacc_register (&host_dispatch);
+}
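host_dispatch is registered from a constructor (goacc_host_init), so the host entry in dispatchers[] is filled in when libgomp is loaded. goacc_register keeps only the 0th instance of each device type. The following is a minimal sketch of that self-registration pattern. It assumes a cut-down device enum and descriptor (enum dev_type, struct dev_descr, registry and register_device are hypothetical names) and omits the acc_device_lock mutex that goacc_register takes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of acc_device_t; the real enum is in openacc.h.  */
enum dev_type
{
  DEV_NONE, DEV_DEFAULT, DEV_HOST, DEV_NOT_HOST, DEV_NVIDIA, DEV_HWM
};

/* Hypothetical stand-in for struct gomp_device_descr.  */
struct dev_descr
{
  const char *name;
  enum dev_type type;
  int target_id;
};

/* One "base" descriptor per type, like dispatchers[] in oacc-init.c.  */
static const struct dev_descr *registry[DEV_HWM];

/* Same checks as goacc_register, minus the mutex.  */
static void
register_device (const struct dev_descr *d)
{
  if (d->target_id != 0)
    return;

  assert (d->type != DEV_NONE && d->type != DEV_DEFAULT
	  && d->type != DEV_NOT_HOST && d->type < DEV_HWM);
  assert (registry[d->type] == NULL);
  registry[d->type] = d;
}

static const struct dev_descr host_descr = { "host", DEV_HOST, 0 };
static const struct dev_descr second_host = { "host", DEV_HOST, 1 };

/* Runs before main, as goacc_host_init does for host_dispatch.  */
static void __attribute__ ((constructor))
register_host_devices (void)
{
  register_device (&host_descr);
  register_device (&second_host);   /* ignored: target_id != 0 */
}
```

Only the base descriptor is indexed. Other instances of the same type are reached by indexing from it, as lazy_open does with &base_dev[ord].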
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
new file mode 100644
index 0000000..ed5deb3
--- /dev/null
+++ b/libgomp/oacc-init.c
@@ -0,0 +1,613 @@
+/* OpenACC Runtime initialization routines
+
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "libgomp.h"
+#include "libgomp_target.h"
+#include "oacc-int.h"
+#include "openacc.h"
+#include <assert.h>
+#include <stdlib.h>
+#include <strings.h>
+#include <stdbool.h>
+#include <stdio.h>
+
+static gomp_mutex_t acc_device_lock;
+
+/* The dispatch table for the current accelerator device.  This is global, so
+   you can only have one type of device open at any given time in a program.
+   This is the "base" device in that several devices that use the same
+   dispatch table may be active concurrently: this one (the "zeroth") is used
+   for overall initialization/shutdown, and other instances -- not necessarily
+   including this one -- may be opened and closed once the base device has
+   been initialized.  */
+struct gomp_device_descr const *base_dev;
+
+#ifdef HAVE_TLS
+__thread struct goacc_thread *goacc_tls_data;
+#else
+pthread_key_t goacc_tls_key;
+#endif
+static pthread_key_t goacc_cleanup_key;
+
+/* The (unresolved) device type that base_dev was initialized with.  */
+static acc_device_t init_key = _ACC_device_hwm;
+
+static struct goacc_thread *goacc_threads;
+static gomp_mutex_t goacc_thread_lock;
+
+/* An array of dispatchers for device types, indexed by the type.  This array
+   only references "base" devices, and other instances of the same type are
+   found by simply indexing from each such device (which are stored linearly,
+   grouped by device in target.c:devices).  */
+static struct gomp_device_descr const *dispatchers[_ACC_device_hwm] = { 0 };
+
+attribute_hidden void
+goacc_register (struct gomp_device_descr const *disp)
+{
+  /* Only register the 0th device here.  */
+  if (disp->target_id != 0)
+    return;
+
+  gomp_mutex_lock (&acc_device_lock);
+
+  assert (acc_device_type (disp->type) != acc_device_none
+	  && acc_device_type (disp->type) != acc_device_default
+	  && acc_device_type (disp->type) != acc_device_not_host);
+  assert (!dispatchers[disp->type]);
+  dispatchers[disp->type] = disp;
+
+  gomp_mutex_unlock (&acc_device_lock);
+}
+
+static struct gomp_device_descr const *
+resolve_device (acc_device_t d)
+{
+  acc_device_t d_arg = d;
+
+  switch (d)
+    {
+    case acc_device_default:
+      {
+	if (goacc_device_type)
+	  {
+	    /* Lookup the named device.  */
+	    while (++d != _ACC_device_hwm)
+	      if (dispatchers[d]
+		  && !strcasecmp (goacc_device_type, dispatchers[d]->name)
+		  && dispatchers[d]->get_num_devices_func () > 0)
+		goto found;
+
+	    gomp_fatal ("device type %s not supported", goacc_device_type);
+	  }
+
+	/* No default device specified, so start scanning for any non-host
+	   device that is available.  */
+	d = acc_device_not_host;
+      }
+      /* FALLTHROUGH */
+
+    case acc_device_not_host:
+      /* Find the first available device after acc_device_not_host.  */
+      while (++d != _ACC_device_hwm)
+	if (dispatchers[d] && dispatchers[d]->get_num_devices_func () > 0)
+	  goto found;
+      if (d_arg == acc_device_default)
+	{
+	  d = acc_device_host;
+	  goto found;
+	}
+      gomp_fatal ("no device found");
+      break;
+
+    case acc_device_host:
+      break;
+
+    default:
+      if (d >= _ACC_device_hwm)
+	gomp_fatal ("device %u out of range", (unsigned)d);
+      break;
+    }
+ found:
+
+  assert (d != acc_device_none
+	  && d != acc_device_default
+	  && d != acc_device_not_host);
+
+  return dispatchers[d];
+}
+
+/* This is called when plugins have been initialized, and serves to call
+   (indirectly) the target's device_init hook.  Calling multiple times without
+   an intervening acc_shutdown_1 call is an error.  */
+
+static struct gomp_device_descr const *
+acc_init_1 (acc_device_t d)
+{
+  struct gomp_device_descr const *acc_dev;
+
+  acc_dev = resolve_device (d);
+
+  if (!acc_dev || acc_dev->get_num_devices_func () <= 0)
+    gomp_fatal ("device %u not supported", (unsigned)d);
+
+  if (acc_dev->is_initialized)
+    gomp_fatal ("device already active");
+
+  /* We need to remember what we were initialized as, to check shutdown etc.  */
+  init_key = d;
+
+  gomp_init_device ((struct gomp_device_descr *) acc_dev);
+
+  return acc_dev;
+}
+
+static struct goacc_thread *
+goacc_new_thread (void)
+{
+  struct goacc_thread *thr = gomp_malloc (sizeof (struct goacc_thread));
+
+#ifdef HAVE_TLS
+  goacc_tls_data = thr;
+#else
+  pthread_setspecific (goacc_tls_key, thr);
+#endif
+
+  pthread_setspecific (goacc_cleanup_key, thr);
+
+  gomp_mutex_lock (&goacc_thread_lock);
+  thr->next = goacc_threads;
+  goacc_threads = thr;
+  gomp_mutex_unlock (&goacc_thread_lock);
+
+  return thr;
+}
+
+static void
+goacc_destroy_thread (void *data)
+{
+  struct goacc_thread *thr = data, *walk, *prev;
+
+  gomp_mutex_lock (&goacc_thread_lock);
+
+  if (thr)
+    {
+      if (base_dev && thr->target_tls)
+	{
+	  base_dev->openacc.destroy_thread_data_func (thr->target_tls);
+	  thr->target_tls = NULL;
+	}
+
+      assert (!thr->mapped_data);
+
+      /* Remove from thread list.  */
+      for (prev = NULL, walk = goacc_threads; walk;
+	   prev = walk, walk = walk->next)
+	if (walk == thr)
+	  {
+	    if (prev == NULL)
+	      goacc_threads = walk->next;
+	    else
+	      prev->next = walk->next;
+
+	    free (thr);
+
+	    break;
+	  }
+
+      assert (walk);
+    }
+
+  gomp_mutex_unlock (&goacc_thread_lock);
+}
+
+/* Open the ORD'th device of the currently-active type (base_dev must be
+   initialized before calling).  If ORD is < 0, open the default-numbered
+   device (set by the ACC_DEVICE_NUM environment variable or a call to
+   acc_set_device_num), or leave any currently-opened device as is.  "Opening"
+   consists of calling the device's open_device_func hook, and setting up
+   thread-local data (maybe allocating, then initializing with information
+   pertaining to the newly-opened or previously-opened device).  */
+
+static void
+lazy_open (int ord)
+{
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev;
+
+  if (thr && thr->dev)
+    {
+      assert (ord < 0 || ord == thr->dev->target_id);
+      return;
+    }
+
+  assert (base_dev);
+
+  if (ord < 0)
+    ord = goacc_device_num;
+
+  if (ord >= base_dev->get_num_devices_func ())
+    gomp_fatal ("device %u does not exist", ord);
+
+  if (!thr)
+    thr = goacc_new_thread ();
+
+  acc_dev = thr->dev = (struct gomp_device_descr *) &base_dev[ord];
+
+  assert (acc_dev->target_id == ord);
+
+  thr->saved_bound_dev = NULL;
+  thr->mapped_data = NULL;
+
+  if (!acc_dev->target_data)
+    acc_dev->target_data = acc_dev->openacc.open_device_func (ord);
+
+  thr->target_tls
+    = acc_dev->openacc.create_thread_data_func (acc_dev->target_data);
+
+  acc_dev->openacc.async_set_async_func (acc_async_sync);
+
+  if (!acc_dev->mem_map.is_initialized)
+    gomp_init_tables (acc_dev, &acc_dev->mem_map);
+}
+
+/* OpenACC 2.0a (3.2.12, 3.2.13) doesn't specify whether the serialization of
+   init/shutdown is per-process or per-thread.  We choose per-process.  */
+
+void
+acc_init (acc_device_t d)
+{
+  if (!base_dev)
+    gomp_init_targets_once ();
+
+  gomp_mutex_lock (&acc_device_lock);
+
+  base_dev = acc_init_1 (d);
+
+  lazy_open (-1);
+
+  gomp_mutex_unlock (&acc_device_lock);
+}
+
+ialias (acc_init)
+
+void
+acc_shutdown_1 (acc_device_t d)
+{
+  struct goacc_thread *walk;
+
+  /* We don't check whether d matches the actual device found, because
+     OpenACC 2.0 (3.2.12) says the parameters to the init and this
+     call must match (for the shutdown call anyway, it's silent on
+     others).  */
+
+  if (!base_dev)
+    gomp_fatal ("no device initialized");
+  if (d != init_key)
+    gomp_fatal ("device %u(%u) is initialized",
+		(unsigned) init_key, (unsigned) base_dev->type);
+
+  gomp_mutex_lock (&goacc_thread_lock);
+
+  /* Free target-specific TLS data and close all devices.  */
+  for (walk = goacc_threads; walk != NULL; walk = walk->next)
+    {
+      if (walk->target_tls)
+	base_dev->openacc.destroy_thread_data_func (walk->target_tls);
+
+      walk->target_tls = NULL;
+
+      /* This would mean the user is shutting down OpenACC in the middle of an
+         "acc data" pragma.  Likely not intentional.  */
+      if (walk->mapped_data)
+	gomp_fatal ("shutdown in 'acc data' region");
+
+      if (walk->dev)
+	{
+          if (walk->dev->openacc.close_device_func (walk->dev->target_data) < 0)
+	    gomp_fatal ("failed to close device");
+
+	  walk->dev->target_data = NULL;
+
+	  gomp_free_memmap (walk->dev);
+
+	  walk->dev = NULL;
+	}
+    }
+
+  gomp_mutex_unlock (&goacc_thread_lock);
+
+  gomp_fini_device ((struct gomp_device_descr *) base_dev);
+
+  base_dev = NULL;
+}
+
+void
+acc_shutdown (acc_device_t d)
+{
+  gomp_mutex_lock (&acc_device_lock);
+
+  acc_shutdown_1 (d);
+
+  gomp_mutex_unlock (&acc_device_lock);
+}
+
+ialias (acc_shutdown)
+
+/* This function is called after plugins have been initialized.  It deals with
+   the "base" device, and is used to prepare the runtime for dealing with a
+   number of such devices (as implemented by some particular plugin).  If the
+   argument device type D matches a previous call to the function, return the
+   current base device, else shut the old device down and re-initialize with
+   the new device type.  */
+
+static struct gomp_device_descr const *
+lazy_init (acc_device_t d)
+{
+  if (base_dev)
+    {
+      /* Re-initializing the same device, do nothing.  */
+      if (d == init_key)
+	return base_dev;
+
+      acc_shutdown_1 (init_key);
+    }
+
+  assert (!base_dev);
+
+  return acc_init_1 (d);
+}
+
+/* Ensure that plugins are loaded, initialize and open the (default-numbered)
+   device.  */
+
+static void
+lazy_init_and_open (acc_device_t d)
+{
+  if (!base_dev)
+    gomp_init_targets_once ();
+
+  gomp_mutex_lock (&acc_device_lock);
+
+  base_dev = lazy_init (d);
+
+  lazy_open (-1);
+
+  gomp_mutex_unlock (&acc_device_lock);
+}
+
+int
+acc_get_num_devices (acc_device_t d)
+{
+  int n = 0;
+  struct gomp_device_descr const *acc_dev;
+
+  if (d == acc_device_none)
+    return 0;
+
+  if (!base_dev)
+    gomp_init_targets_once ();
+
+  acc_dev = resolve_device (d);
+  if (!acc_dev)
+    return 0;
+
+  n = acc_dev->get_num_devices_func ();
+  if (n < 0)
+    n = 0;
+
+  return n;
+}
+
+ialias (acc_get_num_devices)
+
+void
+acc_set_device_type (acc_device_t d)
+{
+  lazy_init_and_open (d);
+}
+
+ialias (acc_set_device_type)
+
+acc_device_t
+acc_get_device_type (void)
+{
+  acc_device_t res = acc_device_none;
+  const struct gomp_device_descr *dev;
+
+  if (base_dev)
+    res = acc_device_type (base_dev->type);
+  else
+    {
+      gomp_init_targets_once ();
+
+      dev = resolve_device (acc_device_default);
+      res = acc_device_type (dev->type);
+    }
+
+  assert (res != acc_device_default
+	  && res != acc_device_not_host);
+
+  return res;
+}
+
+ialias (acc_get_device_type)
+
+int
+acc_get_device_num (acc_device_t d)
+{
+  const struct gomp_device_descr *dev;
+  int num;
+
+  if (d >= _ACC_device_hwm)
+    gomp_fatal ("device %u out of range", (unsigned)d);
+
+  if (!base_dev)
+    gomp_init_targets_once ();
+
+  dev = resolve_device (d);
+  if (!dev)
+    gomp_fatal ("no devices of type %u", d);
+
+  /* We might not have called lazy_open for this host thread yet, in which case
+     the get_device_num_func hook will return -1.  */
+  num = dev->openacc.get_device_num_func ();
+  if (num < 0)
+    num = goacc_device_num;
+
+  return num;
+}
+
+ialias (acc_get_device_num)
+
+void
+acc_set_device_num (int n, acc_device_t d)
+{
+  const struct gomp_device_descr *dev;
+  int num_devices;
+
+  if (!base_dev)
+    gomp_init_targets_once ();
+
+  if ((int) d == 0)
+    {
+      int i;
+
+      /* A device setting of zero sets all device types on the system to use
+	 the Nth instance of that device type.  Only attempt it for initialized
+	 devices though.  */
+      for (i = acc_device_not_host + 1; i < _ACC_device_hwm; i++)
+	{
+	  dev = resolve_device ((acc_device_t) i);
+	  if (dev && dev->is_initialized)
+	    dev->openacc.set_device_num_func (n);
+	}
+
+      /* ...and for future calls to acc_init/acc_set_device_type, etc.  */
+      goacc_device_num = n;
+    }
+  else
+    {
+      struct goacc_thread *thr = goacc_thread ();
+
+      gomp_mutex_lock (&acc_device_lock);
+
+      base_dev = lazy_init (d);
+
+      num_devices = base_dev->get_num_devices_func ();
+
+      if (n >= num_devices)
+        gomp_fatal ("device %u out of range", n);
+
+      /* If we're changing the device number, dissociate this thread from
+	 the device (but don't close the device, since it may be in use by
+	 other threads).  */
+      if (thr && thr->dev && n != thr->dev->target_id)
+	thr->dev = NULL;
+
+      lazy_open (n);
+
+      gomp_mutex_unlock (&acc_device_lock);
+    }
+}
+
+ialias (acc_set_device_num)
+
+int
+acc_on_device (acc_device_t dev)
+{
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (thr && thr->dev
+      && acc_device_type (thr->dev->type) == acc_device_host_nonshm)
+    return dev == acc_device_host_nonshm || dev == acc_device_not_host;
+
+  /* Just rely on the compiler builtin.  */
+  return __builtin_acc_on_device (dev);
+}
+ialias (acc_on_device)
+
+attribute_hidden void
+goacc_runtime_initialize (void)
+{
+  gomp_mutex_init (&acc_device_lock);
+
+#ifndef HAVE_TLS
+  pthread_key_create (&goacc_tls_key, NULL);
+#endif
+
+  pthread_key_create (&goacc_cleanup_key, goacc_destroy_thread);
+
+  base_dev = NULL;
+
+  goacc_threads = NULL;
+  gomp_mutex_init (&goacc_thread_lock);
+}
+
+/* Compiler helper functions.  */
+
+attribute_hidden void
+goacc_save_and_set_bind (acc_device_t d)
+{
+  struct goacc_thread *thr = goacc_thread ();
+
+  assert (!thr->saved_bound_dev);
+
+  thr->saved_bound_dev = thr->dev;
+  thr->dev = (struct gomp_device_descr *) dispatchers[d];
+}
+
+attribute_hidden void
+goacc_restore_bind (void)
+{
+  struct goacc_thread *thr = goacc_thread ();
+
+  thr->dev = thr->saved_bound_dev;
+  thr->saved_bound_dev = NULL;
+}
+
+/* This is called from any OpenACC support function that may need to implicitly
+   initialize the libgomp runtime.  On exit all such initialization will have
+   been done, and both the global base_dev pointer and the per-host-thread
+   device pointer (goacc_thread ()->dev) will be valid.  */
+
+attribute_hidden void
+goacc_lazy_initialize (void)
+{
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (thr && thr->dev)
+    return;
+
+  if (!base_dev)
+    lazy_init_and_open (acc_device_default);
+  else
+    {
+      gomp_mutex_lock (&acc_device_lock);
+      lazy_open (-1);
+      gomp_mutex_unlock (&acc_device_lock);
+    }
+}
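resolve_device's search order is easy to lose in its switch/goto structure. An explicit device type in goacc_device_type (presumably set from the ACC_DEVICE_TYPE environment variable) is matched case-insensitively against every registered type after acc_device_default, including the host. Otherwise the first non-host type with at least one device wins, and only acc_device_default falls back to the host. The following restates that order as a pure function. It assumes a simplified enum and table (enum dev_type, struct dev_info and resolve are hypothetical names) and returns DEV_NONE where the patch calls gomp_fatal.

```c
#include <stddef.h>
#include <strings.h>

/* Hypothetical mirror of the acc_device_t ordering resolve_device relies on.  */
enum dev_type
{
  DEV_NONE, DEV_DEFAULT, DEV_HOST, DEV_HOST_NONSHM, DEV_NOT_HOST,
  DEV_NVIDIA, DEV_HWM
};

/* Hypothetical per-type entry: plugin name and its device count.  */
struct dev_info
{
  const char *name;
  int num_devices;
};

/* Resolve D against TABLE in resolve_device's order.  ENV_TYPE plays the
   role of goacc_device_type.  Returns DEV_NONE where gomp_fatal is called.  */
static enum dev_type
resolve (const struct dev_info *table[DEV_HWM], enum dev_type d,
	 const char *env_type)
{
  int i;

  if (d >= DEV_HWM)
    return DEV_NONE;

  if (d == DEV_DEFAULT && env_type != NULL)
    {
      /* Named device: any type after DEV_DEFAULT, host included.  */
      for (i = DEV_DEFAULT + 1; i < DEV_HWM; i++)
	if (table[i] != NULL && strcasecmp (env_type, table[i]->name) == 0
	    && table[i]->num_devices > 0)
	  return (enum dev_type) i;
      return DEV_NONE;
    }

  if (d == DEV_DEFAULT || d == DEV_NOT_HOST)
    {
      /* First available non-host type.  */
      for (i = DEV_NOT_HOST + 1; i < DEV_HWM; i++)
	if (table[i] != NULL && table[i]->num_devices > 0)
	  return (enum dev_type) i;
      /* Only the default request may fall back to the host.  */
      return d == DEV_DEFAULT ? DEV_HOST : DEV_NONE;
    }

  return d;
}
```

With only a host entry registered, resolve (table, DEV_DEFAULT, NULL) gives DEV_HOST while resolve (table, DEV_NOT_HOST, NULL) fails. Once an accelerator type with devices is present, both return that type.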
diff --git a/libgomp/oacc-int.h b/libgomp/oacc-int.h
new file mode 100644
index 0000000..c333a20
--- /dev/null
+++ b/libgomp/oacc-int.h
@@ -0,0 +1,106 @@
+/* OpenACC Runtime - internal declarations
+
+   Copyright (C) 2005-2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file contains data types and function declarations that are not
+   part of the official OpenACC user interface.  There are declarations
+   in here that are part of the GNU OpenACC ABI, in that the compiler is
+   required to know about them and use them.
+
+   The convention is that the all-caps prefix "GOACC" is used to group items
+   that are part of the external ABI, and the lower-case prefix "goacc" is
+   used to group items that are completely private to the library.  */
+
+#ifndef _OACC_INT_H
+#define _OACC_INT_H 1
+
+#include "openacc.h"
+#include "config.h"
+#include <stddef.h>
+#include <stdbool.h>
+#include <stdarg.h>
+
+#ifdef HAVE_ATTRIBUTE_VISIBILITY
+# pragma GCC visibility push(hidden)
+#endif
+
+static inline enum acc_device_t
+acc_device_type (enum offload_target_type type)
+{
+  return (enum acc_device_t) type;
+}
+
+struct goacc_thread
+{
+  /* The device for the current thread.  */
+  struct gomp_device_descr *dev;
+  /* Device saved by goacc_save_and_set_bind, for goacc_restore_bind.  */
+  struct gomp_device_descr *saved_bound_dev;
+
+  /* This is a linked list of data mapped by the "acc data" pragma, following
+     strictly push/pop semantics according to lexical scope.  */
+  struct target_mem_desc *mapped_data;
+
+  /* These structures form a list: this is the next thread in that list.  */
+  struct goacc_thread *next;
+
+  /* Target-specific data (used by plugin).  */
+  void *target_tls;
+};
+
+#ifdef HAVE_TLS
+extern __thread struct goacc_thread *goacc_tls_data;
+static inline struct goacc_thread *
+goacc_thread (void)
+{
+  return goacc_tls_data;
+}
+#else
+extern pthread_key_t goacc_tls_key;
+static inline struct goacc_thread *
+goacc_thread (void)
+{
+  return pthread_getspecific (goacc_tls_key);
+}
+#endif
+
+struct gomp_device_descr;
+
+void goacc_register (struct gomp_device_descr const *) __GOACC_NOTHROW;
+
+/* Current dispatcher.  */
+extern struct gomp_device_descr const *base_dev;
+
+void goacc_runtime_initialize (void);
+void goacc_save_and_set_bind (acc_device_t);
+void goacc_restore_bind (void);
+void goacc_lazy_initialize (void);
+
+#ifdef HAVE_ATTRIBUTE_VISIBILITY
+# pragma GCC visibility pop
+#endif
+
+#endif /* _OACC_INT_H */
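goacc_thread () is a plain thread-local read, or pthread_getspecific without HAVE_TLS. Per-thread state is therefore created lazily (goacc_new_thread, called from lazy_open), and each record is also chained on the global goacc_threads list so that acc_shutdown_1 can walk all of them. The following is a minimal single-threaded sketch of that pairing. It assumes GCC's __thread and uses hypothetical names (struct thread_state, new_thread, destroy_thread). Unlike the patch, it takes no lock and registers no pthread key destructor.

```c
#include <stdlib.h>

/* Hypothetical reduced stand-in for struct goacc_thread.  */
struct thread_state
{
  int device_num;              /* stands in for the DEV pointer */
  struct thread_state *next;   /* link in the global list */
};

/* Per-thread pointer, like goacc_tls_data in the HAVE_TLS configuration.  */
static __thread struct thread_state *tls_state;

/* Every thread's state, like goacc_threads.  The patch guards this list with
   goacc_thread_lock; this single-threaded sketch does not.  */
static struct thread_state *all_threads;

static struct thread_state *
current_thread (void)
{
  return tls_state;
}

/* Allocate, publish in TLS and push onto the list, as goacc_new_thread does.
   sizeof *s ties the allocation size to the pointer's own type.  */
static struct thread_state *
new_thread (void)
{
  struct thread_state *s = calloc (1, sizeof *s);

  if (s == NULL)
    abort ();
  tls_state = s;
  s->next = all_threads;
  all_threads = s;
  return s;
}

/* Unlink S and free it, as goacc_destroy_thread does; returns 1 if S was on
   the list, 0 otherwise.  */
static int
destroy_thread (struct thread_state *s)
{
  struct thread_state **link;

  for (link = &all_threads; *link != NULL; link = &(*link)->next)
    if (*link == s)
      {
	*link = s->next;
	if (tls_state == s)
	  tls_state = NULL;
	free (s);
	return 1;
      }
  return 0;
}
```

Walking a pointer-to-link removes the head and interior cases with one code path. This is the same result as the prev/walk pair in goacc_destroy_thread.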
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
new file mode 100644
index 0000000..ac1ea47
--- /dev/null
+++ b/libgomp/oacc-mem.c
@@ -0,0 +1,510 @@
+/* OpenACC Runtime memory-management routines
+
+   Copyright (C) 2013 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "openacc.h"
+#include "config.h"
+#include "libgomp.h"
+#include "gomp-constants.h"
+#include "libgomp_target.h"
+#include "oacc-int.h"
+#include <stdio.h>
+#include <stdint.h>
+#include <assert.h>
+
+#include "splay-tree.h"
+
+/* Return block containing [H,+S), or NULL if not contained.  */
+
+attribute_hidden splay_tree_key
+lookup_host (struct gomp_memory_mapping *mem_map, void *h, size_t s)
+{
+  struct splay_tree_key_s node;
+  splay_tree_key key;
+
+  node.host_start = (uintptr_t) h;
+  node.host_end = (uintptr_t) h + s;
+
+  gomp_mutex_lock (&mem_map->lock);
+
+  key = splay_tree_lookup (&mem_map->splay_tree, &node);
+
+  gomp_mutex_unlock (&mem_map->lock);
+
+  return key;
+}
+
+/* Return block containing [D,+S), or NULL if not contained.
+   The list isn't ordered by device address, so we have to iterate
+   over the whole array.  This is not expected to be a common
+   operation.  */
+
+static splay_tree_key
+lookup_dev (struct target_mem_desc *tgt, void *d, size_t s)
+{
+  int i;
+  struct target_mem_desc *t;
+  struct gomp_memory_mapping *mem_map;
+
+  if (!tgt)
+    return NULL;
+
+  mem_map = tgt->mem_map;
+
+  gomp_mutex_lock (&mem_map->lock);
+
+  for (t = tgt; t != NULL; t = t->prev)
+    {
+      if (t->tgt_start <= (uintptr_t) d && t->tgt_end >= (uintptr_t) d + s)
+        break;
+    }
+
+  gomp_mutex_unlock (&mem_map->lock);
+
+  if (!t)
+    return NULL;
+
+  for (i = 0; i < t->list_count; i++)
+    {
+      void * offset;
+
+      splay_tree_key k = &t->array[i].key;
+      offset = d - t->tgt_start - k->tgt_offset;
+
+      if (k->host_start + offset <= (void *) k->host_end)
+        return k;
+    }
+
+  return NULL;
+}
+
+/* OpenACC is silent on how memory exhaustion is indicated.  We return
+   NULL.  */
+
+void *
+acc_malloc (size_t s)
+{
+  if (!s)
+    return NULL;
+
+  goacc_lazy_initialize ();
+
+  struct goacc_thread *thr = goacc_thread ();
+
+  return base_dev->alloc_func (thr->dev->target_id, s);
+}
+
+/* OpenACC 2.0a (3.2.16) doesn't specify what to do in the event
+   the device address is mapped.  We choose to check if it is mapped,
+   and if it is, to unmap it.  */
+void
+acc_free (void *d)
+{
+  splay_tree_key k;
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!d)
+    return;
+
+  /* We don't have to call lazy open here, as the ptr value must have
+     been returned by acc_malloc.  It's not permitted to pass NULL in
+     (unless you got that null from acc_malloc).  */
+  if ((k = lookup_dev (thr->dev->openacc.data_environ, d, 1)))
+    {
+      void *offset;
+
+      offset = d - k->tgt->tgt_start - k->tgt_offset;
+
+      acc_unmap_data ((void *)(k->host_start + offset));
+    }
+
+  base_dev->free_func (thr->dev->target_id, d);
+}
+
+void
+acc_memcpy_to_device (void *d, void *h, size_t s)
+{
+  /* No need to call lazy open here, as the device pointer must have
+     been obtained from a routine that did that.  */
+  struct goacc_thread *thr = goacc_thread ();
+
+  base_dev->host2dev_func (thr->dev->target_id, d, h, s);
+}
+
+void
+acc_memcpy_from_device (void *h, void *d, size_t s)
+{
+  /* No need to call lazy open here, as the device pointer must have
+     been obtained from a routine that did that.  */
+  struct goacc_thread *thr = goacc_thread ();
+
+  base_dev->dev2host_func (thr->dev->target_id, h, d, s);
+}
+
+/* Return the device pointer that corresponds to host data H.  Or NULL
+   if no mapping.  */
+
+void *
+acc_deviceptr (void *h)
+{
+  splay_tree_key n;
+  void *d;
+  void *offset;
+
+  goacc_lazy_initialize ();
+
+  struct goacc_thread *thr = goacc_thread ();
+
+  n = lookup_host (&thr->dev->mem_map, h, 1);
+
+  if (!n)
+    return NULL;
+
+  offset = h - n->host_start;
+
+  d = n->tgt->tgt_start + n->tgt_offset + offset;
+
+  return d;
+}
+
+/* Return the host pointer that corresponds to device data D.  Or NULL
+   if no mapping.  */
+
+void *
+acc_hostptr (void *d)
+{
+  splay_tree_key n;
+  void *h;
+  void *offset;
+
+  goacc_lazy_initialize ();
+
+  struct goacc_thread *thr = goacc_thread ();
+
+  n = lookup_dev (thr->dev->openacc.data_environ, d, 1);
+
+  if (!n)
+    return NULL;
+
+  offset = d - n->tgt->tgt_start - n->tgt_offset;
+
+  h = n->host_start + offset;
+
+  return h;
+}
+
+/* Return 1 if host data [H,+S] is present on the device.  */
+
+int
+acc_is_present (void *h, size_t s)
+{
+  splay_tree_key n;
+
+  if (!s || !h)
+    return 0;
+
+  goacc_lazy_initialize ();
+
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+
+  n = lookup_host (&acc_dev->mem_map, h, s);
+
+  if (n && ((uintptr_t)h < n->host_start
+	    || (uintptr_t)h + s > n->host_end
+	    || s > n->host_end - n->host_start))
+    n = NULL;
+
+  return n != NULL;
+}
+
+/* Create a mapping for host [H,+S] -> device [D,+S].  */
+
+void
+acc_map_data (void *h, void *d, size_t s)
+{
+  struct target_mem_desc *tgt;
+  size_t mapnum = 1;
+  void *hostaddrs = h;
+  void *devaddrs = d;
+  size_t sizes = s;
+  unsigned short kinds = GOMP_MAP_ALLOC;
+
+  goacc_lazy_initialize ();
+
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+
+  if (acc_dev->capabilities & TARGET_CAP_SHARED_MEM)
+    {
+      if (d != h)
+        gomp_fatal ("cannot map data on shared-memory system");
+
+      tgt = gomp_map_vars (NULL, 0, NULL, NULL, NULL, NULL, true, false);
+    }
+  else
+    {
+      struct goacc_thread *thr = goacc_thread ();
+
+      if (!d || !h || !s)
+	gomp_fatal ("[%p,+%d]->[%p,+%d] is a bad map",
+                    (void *)h, (int)s, (void *)d, (int)s);
+
+      if (lookup_host (&acc_dev->mem_map, h, s))
+	gomp_fatal ("host address [%p, +%d] is already mapped", (void *)h,
+		    (int)s);
+
+      if (lookup_dev (thr->dev->openacc.data_environ, d, s))
+	gomp_fatal ("device address [%p, +%d] is already mapped", (void *)d,
+		    (int)s);
+
+      tgt = gomp_map_vars (acc_dev, mapnum, &hostaddrs, &devaddrs, &sizes,
+			   &kinds, true, false);
+    }
+
+  tgt->prev = acc_dev->openacc.data_environ;
+  acc_dev->openacc.data_environ = tgt;
+}
+
+void
+acc_unmap_data (void *h)
+{
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+
+  /* No need to call lazy open, as the address must have been mapped.  */
+
+  size_t host_size;
+  splay_tree_key n = lookup_host (&acc_dev->mem_map, h, 1);
+  struct target_mem_desc *t;
+
+  if (!n)
+    gomp_fatal ("%p is not a mapped block", (void *)h);
+
+  host_size = n->host_end - n->host_start;
+
+  if (n->host_start != (uintptr_t) h)
+    gomp_fatal ("[%p,+%d] surrounds %p",
+		(void *) n->host_start, (int) host_size, (void *) h);
+
+  t = n->tgt;
+
+  if (t->refcount == 2)
+    {
+      struct target_mem_desc *tp;
+
+      /* This is the last reference, so pull the descriptor off the
+	 chain.  This prevents gomp_unmap_vars (via gomp_unmap_tgt) from
+	 freeing the device memory.  */
+      t->tgt_end = 0;
+      t->to_free = 0;
+
+      gomp_mutex_lock (&acc_dev->mem_map.lock);
+
+      for (tp = NULL, t = acc_dev->openacc.data_environ; t != NULL;
+	   tp = t, t = t->prev)
+        if (n->tgt == t)
+          {
+            if (tp)
+              tp->prev = t->prev;
+            else
+              acc_dev->openacc.data_environ = t->prev;
+
+            break;
+          }
+
+      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+    }
+
+  gomp_unmap_vars (t, true);
+}
+
+#define PCC_Present (1 << 0)
+#define PCC_Create (1 << 1)
+#define PCC_Copy (1 << 2)
+
+attribute_hidden void *
+present_create_copy (unsigned f, void *h, size_t s)
+{
+  void *d;
+  splay_tree_key n;
+
+  if (!h || !s)
+    gomp_fatal ("[%p,+%d] is a bad range", (void *)h, (int)s);
+
+  goacc_lazy_initialize ();
+
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+
+  n = lookup_host (&acc_dev->mem_map, h, s);
+  if (n)
+    {
+      /* Present. */
+      d = (void *) (n->tgt->tgt_start + n->tgt_offset + h - n->host_start);
+
+      if (!(f & PCC_Present))
+        gomp_fatal ("[%p,+%d] already mapped to [%p,+%d]",
+            (void *)h, (int)s, (void *)d, (int)s);
+      if ((h + s) > (void *)n->host_end)
+        gomp_fatal ("[%p,+%d] not mapped", (void *)h, (int)s);
+    }
+  else if (!(f & PCC_Create))
+    {
+      gomp_fatal ("[%p,+%d] not mapped", (void *)h, (int)s);
+    }
+  else
+    {
+      struct target_mem_desc *tgt;
+      size_t mapnum = 1;
+      unsigned short kinds;
+      void *hostaddrs = h;
+
+      if (f & PCC_Copy)
+        kinds = GOMP_MAP_ALLOC_TO;
+      else
+        kinds = GOMP_MAP_ALLOC;
+
+      tgt = gomp_map_vars (acc_dev, mapnum, &hostaddrs, NULL, &s, &kinds, true,
+			   false);
+
+      gomp_mutex_lock (&acc_dev->mem_map.lock);
+
+      d = tgt->to_free;
+      tgt->prev = acc_dev->openacc.data_environ;
+      acc_dev->openacc.data_environ = tgt;
+
+      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+    }
+
+  return d;
+}
+
+void *
+acc_create (void *h, size_t s)
+{
+  return present_create_copy (PCC_Create, h, s);
+}
+
+void *
+acc_copyin (void *h, size_t s)
+{
+  return present_create_copy (PCC_Create | PCC_Copy, h, s);
+}
+
+void *
+acc_present_or_create (void *h, size_t s)
+{
+  return present_create_copy (PCC_Present | PCC_Create, h, s);
+}
+
+void *
+acc_present_or_copyin (void *h, size_t s)
+{
+  return present_create_copy (PCC_Present | PCC_Create | PCC_Copy, h, s);
+}
+
+#define DC_Copyout (1 << 0)
+
+static void
+delete_copyout (unsigned f, void *h, size_t s)
+{
+  size_t host_size;
+  splay_tree_key n;
+  void *d;
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+
+  n = lookup_host (&acc_dev->mem_map, h, s);
+
+  /* No need to call goacc_lazy_initialize, as the data must already have
+     been mapped.  */
+
+  if (!n)
+    gomp_fatal ("[%p,+%d] is not mapped", (void *)h, (int)s);
+
+  d = (void *) (n->tgt->tgt_start + n->tgt_offset);
+
+  host_size = n->host_end - n->host_start;
+
+  if (n->host_start != (uintptr_t) h || host_size != s)
+    gomp_fatal ("[%p,+%d] surrounds [%p,+%d]",
+		(void *) n->host_start, (int) host_size, (void *) h, (int) s);
+
+  if (f & DC_Copyout)
+    acc_dev->dev2host_func (acc_dev->target_id, h, d, s);
+
+  acc_unmap_data (h);
+
+  acc_dev->free_func (acc_dev->target_id, d);
+}
+
+void
+acc_delete (void *h, size_t s)
+{
+  delete_copyout (0, h, s);
+}
+
+void acc_copyout (void *h, size_t s)
+{
+  delete_copyout (DC_Copyout, h, s);
+}
+
+static void
+update_dev_host (int is_dev, void *h, size_t s)
+{
+  splay_tree_key n;
+  void *d;
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+
+  n = lookup_host (&acc_dev->mem_map, h, s);
+
+  /* No need to call goacc_lazy_initialize, as the data must already have
+     been mapped.  */
+
+  if (!n)
+    gomp_fatal ("[%p,+%d] is not mapped", h, (int)s);
+
+  d = (void *) (n->tgt->tgt_start + n->tgt_offset);
+
+  if (is_dev)
+    acc_dev->host2dev_func (acc_dev->target_id, d, h, s);
+  else
+    acc_dev->dev2host_func (acc_dev->target_id, h, d, s);
+}
+
+void
+acc_update_device (void *h, size_t s)
+{
+  update_dev_host (1, h, s);
+}
+
+void
+acc_update_self (void *h, size_t s)
+{
+  update_dev_host (0, h, s);
+}
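The four data-creation entry points above differ only in the PCC_* flags they pass to present_create_copy. The following is a rough, non-authoritative model of that decision. The enum pcc_outcome and the function pcc_decide are hypothetical names invented for this sketch. It does not model the separate check that an existing mapping covers the whole [h, h+s) range.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in oacc-mem.c.  */
#define PCC_Present (1 << 0)
#define PCC_Create (1 << 1)
#define PCC_Copy (1 << 2)

/* Outcome codes invented for this sketch only: the real function either
   returns a device address or calls gomp_fatal.  */
enum pcc_outcome
{
  PCC_REUSE,			/* Already mapped: reuse the device copy.  */
  PCC_ERR_ALREADY_MAPPED,	/* Mapped, but the caller did not allow it.  */
  PCC_ERR_NOT_MAPPED,		/* Not mapped, and creation not requested.  */
  PCC_NEW_ALLOC,		/* Map with GOMP_MAP_ALLOC (no transfer).  */
  PCC_NEW_ALLOC_TO		/* Map with GOMP_MAP_ALLOC_TO (copy in).  */
};

/* Decide what present_create_copy would do for flag set F, given whether
   the host range is already mapped (PRESENT).  */
static enum pcc_outcome
pcc_decide (unsigned f, bool present)
{
  /* Only the three PCC_* bits are meaningful.  */
  assert ((f & ~(unsigned) (PCC_Present | PCC_Create | PCC_Copy)) == 0);

  if (present)
    return (f & PCC_Present) ? PCC_REUSE : PCC_ERR_ALREADY_MAPPED;
  if (!(f & PCC_Create))
    return PCC_ERR_NOT_MAPPED;
  return (f & PCC_Copy) ? PCC_NEW_ALLOC_TO : PCC_NEW_ALLOC;
}
```

For instance, pcc_decide (PCC_Create | PCC_Copy, true) models acc_copyin on data that is already mapped. It yields PCC_ERR_ALREADY_MAPPED, which corresponds to the "already mapped" gomp_fatal path.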
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
new file mode 100644
index 0000000..0ff44bf
--- /dev/null
+++ b/libgomp/oacc-parallel.c
@@ -0,0 +1,388 @@
+/* Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file handles OpenACC constructs.  */
+
+#include "openacc.h"
+#include "libgomp.h"
+#include "libgomp_g.h"
+#include "gomp-constants.h"
+#include "libgomp_target.h"
+#include "oacc-int.h"
+#include <stdio.h>
+#include <string.h>
+#include <stdarg.h>
+#include <assert.h>
+#include <alloca.h>
+
+static void
+dump_var (char *s, size_t idx, void *hostaddr, size_t size, unsigned short kind)
+{
+  gomp_notify (" %2zu: %3s 0x%.2x -", idx, s, kind & 0xff);
+
+  switch (kind & 0xff)
+    {
+      case 0x00: gomp_notify (" ALLOC              "); break;
+      case 0x01: gomp_notify (" ALLOC TO           "); break;
+      case 0x02: gomp_notify (" ALLOC FROM         "); break;
+      case 0x03: gomp_notify (" ALLOC TOFROM       "); break;
+      case 0x04: gomp_notify (" POINTER            "); break;
+      case 0x05: gomp_notify (" TO_PSET            "); break;
+
+      case 0x08: gomp_notify (" FORCE_ALLOC        "); break;
+      case 0x09: gomp_notify (" FORCE_TO           "); break;
+      case 0x0a: gomp_notify (" FORCE_FROM         "); break;
+      case 0x0b: gomp_notify (" FORCE_TOFROM       "); break;
+      case 0x0c: gomp_notify (" FORCE_PRESENT      "); break;
+      case 0x0d: gomp_notify (" FORCE_DEALLOC      "); break;
+      case 0x0e: gomp_notify (" FORCE_DEVICEPTR    "); break;
+
+      case 0x18: gomp_notify (" FORCE_PRIVATE      "); break;
+      case 0x19: gomp_notify (" FORCE_FIRSTPRIVATE "); break;
+
+      case (unsigned char) -1: gomp_notify (" DUMMY              "); break;
+      default: gomp_notify (" UNKNOWN            ");
+    }
+
+  gomp_notify ("- %d - %4d/0x%04x ", 1 << (kind >> 8), (int) size, (int) size);
+  gomp_notify ("- %p\n", hostaddr);
+}
+
+/* Ensure that the target device for DEVICE_TYPE is initialised (and that
+   plugins have been loaded if appropriate).  On return, the current
+   thread's device (goacc_thread ()->dev) is set appropriately for the
+   given device type.  */
+
+attribute_hidden void
+select_acc_device (int device_type)
+{
+  goacc_lazy_initialize ();
+
+  if (device_type == GOMP_IF_CLAUSE_FALSE)
+    return;
+
+  if (device_type == acc_device_none)
+    device_type = acc_device_host;
+
+  if (device_type >= 0)
+    {
+      /* NOTE: this will go badly if the surrounding data environment is set up
+         to use a different device type.  We'll just have to trust that users
+	 know what they're doing...  */
+      acc_set_device_type (device_type);
+    }
+}
+
+void goacc_wait (int async, int num_waits, va_list ap);
+
+void
+GOACC_parallel (int device, void (*fn) (void *), const void *openmp_target,
+		size_t mapnum, void **hostaddrs, size_t *sizes,
+		unsigned short *kinds,
+		int num_gangs, int num_workers, int vector_length,
+		int async, int num_waits, ...)
+{
+  bool if_clause_condition_value = device != GOMP_IF_CLAUSE_FALSE;
+  va_list ap;
+  struct goacc_thread *thr;
+  struct gomp_device_descr *acc_dev;
+  struct target_mem_desc *tgt;
+  void **devaddrs;
+  unsigned int i;
+  struct splay_tree_key_s k;
+  splay_tree_key tgt_fn_key;
+  void (*tgt_fn);
+
+  if (num_gangs != 1)
+    gomp_fatal ("num_gangs (%d) different from one is not yet supported",
+		num_gangs);
+  if (num_workers != 1)
+    gomp_fatal ("num_workers (%d) different from one is not yet supported",
+		num_workers);
+
+  gomp_notify ("%s: mapnum=%zu, hostaddrs=%p, sizes=%p, kinds=%p, async=%d\n",
+	       __FUNCTION__, mapnum, hostaddrs, sizes, kinds, async);
+
+  select_acc_device (device);
+
+  thr = goacc_thread ();
+  acc_dev = thr->dev;
+
+  /* Host fallback if "if" clause is false or if the current device is set to
+     the host.  */
+  if (!if_clause_condition_value)
+    {
+      goacc_save_and_set_bind (acc_device_host);
+      fn (hostaddrs);
+      goacc_restore_bind ();
+      return;
+    }
+  else if (acc_device_type (acc_dev->type) == acc_device_host)
+    {
+      fn (hostaddrs);
+      return;
+    }
+
+  va_start (ap, num_waits);
+
+  if (num_waits > 0)
+    goacc_wait (async, num_waits, ap);
+
+  va_end (ap);
+
+  acc_dev->openacc.async_set_async_func (async);
+
+  if (!(acc_dev->capabilities & TARGET_CAP_NATIVE_EXEC))
+    {
+      k.host_start = (uintptr_t) fn;
+      k.host_end = k.host_start + 1;
+      gomp_mutex_lock (&acc_dev->mem_map.lock);
+      tgt_fn_key = splay_tree_lookup (&acc_dev->mem_map.splay_tree, &k);
+      gomp_mutex_unlock (&acc_dev->mem_map.lock);
+
+      if (tgt_fn_key == NULL)
+	gomp_fatal ("target function wasn't mapped: perhaps -fopenacc was "
+		    "used without -flto?");
+
+      tgt_fn = (void (*)) tgt_fn_key->tgt->tgt_start;
+    }
+  else
+    tgt_fn = (void (*)) fn;
+
+  tgt = gomp_map_vars (acc_dev, mapnum, hostaddrs, NULL, sizes, kinds, true,
+		       false);
+
+  devaddrs = alloca (sizeof (void *) * mapnum);
+  for (i = 0; i < mapnum; i++)
+    devaddrs[i] = (void *) (tgt->list[i]->tgt->tgt_start
+			    + tgt->list[i]->tgt_offset);
+
+  acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs, sizes, kinds,
+			      num_gangs, num_workers, vector_length, async,
+			      tgt);
+
+  /* If running synchronously, unmap immediately.  */
+  if (async < acc_async_noval)
+    gomp_unmap_vars (tgt, true);
+  else
+    {
+      gomp_copy_from_async (tgt);
+      acc_dev->openacc.register_async_cleanup_func (tgt);
+    }
+
+  acc_dev->openacc.async_set_async_func (acc_async_sync);
+}
+
+void
+GOACC_data_start (int device, const void *openmp_target, size_t mapnum,
+		  void **hostaddrs, size_t *sizes, unsigned short *kinds)
+{
+  bool if_clause_condition_value = device != GOMP_IF_CLAUSE_FALSE;
+  struct target_mem_desc *tgt;
+
+  gomp_notify ("%s: mapnum=%zu, hostaddrs=%p, sizes=%p, kinds=%p\n",
+	       __FUNCTION__, mapnum, hostaddrs, sizes, kinds);
+
+  select_acc_device (device);
+
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+
+  /* Host fallback or 'do nothing'.  */
+  if ((acc_dev->capabilities & TARGET_CAP_SHARED_MEM)
+      || !if_clause_condition_value)
+    {
+      tgt = gomp_map_vars (NULL, 0, NULL, NULL, NULL, NULL, true, false);
+      tgt->prev = thr->mapped_data;
+      thr->mapped_data = tgt;
+
+      return;
+    }
+
+  gomp_notify ("  %s: prepare mappings\n", __FUNCTION__);
+  tgt = gomp_map_vars (acc_dev, mapnum, hostaddrs, NULL, sizes, kinds, true,
+		       false);
+  gomp_notify ("  %s: mappings prepared\n", __FUNCTION__);
+  tgt->prev = thr->mapped_data;
+  thr->mapped_data = tgt;
+}
+
+void
+GOACC_data_end (void)
+{
+  struct goacc_thread *thr = goacc_thread ();
+  struct target_mem_desc *tgt = thr->mapped_data;
+
+  gomp_notify ("  %s: restore mappings\n", __FUNCTION__);
+  thr->mapped_data = tgt->prev;
+  gomp_unmap_vars (tgt, true);
+  gomp_notify ("  %s: mappings restored\n", __FUNCTION__);
+}
+
+
+void
+GOACC_kernels (int device, void (*fn) (void *), const void *openmp_target,
+	       size_t mapnum, void **hostaddrs, size_t *sizes,
+	       unsigned short *kinds,
+	       int num_gangs, int num_workers, int vector_length,
+	       int async, int num_waits, ...)
+{
+  gomp_notify ("%s: mapnum=%zu, hostaddrs=%p, sizes=%p, kinds=%p\n",
+	       __FUNCTION__, mapnum, hostaddrs, sizes, kinds);
+
+  va_list ap;
+
+  select_acc_device (device);
+
+  va_start (ap, num_waits);
+
+  if (num_waits > 0)
+    goacc_wait (async, num_waits, ap);
+
+  va_end (ap);
+
+  GOACC_parallel (device, fn, openmp_target, mapnum, hostaddrs, sizes, kinds,
+		  num_gangs, num_workers, vector_length, async, 0);
+}
+
+void
+goacc_wait (int async, int num_waits, va_list ap)
+{
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+  int i;
+
+  assert (num_waits >= 0);
+
+  if (async == acc_async_sync && num_waits == 0)
+    {
+      acc_wait_all ();
+      return;
+    }
+
+  if (async == acc_async_sync && num_waits)
+    {
+      for (i = 0; i < num_waits; i++)
+        {
+          int qid = va_arg (ap, int);
+
+          if (acc_async_test (qid))
+            continue;
+
+          acc_wait (qid);
+        }
+      return;
+    }
+
+  if (async == acc_async_noval && num_waits == 0)
+    {
+      acc_dev->openacc.async_wait_all_async_func (acc_async_noval);
+      return;
+    }
+
+  for (i = 0; i < num_waits; i++)
+    {
+      int qid = va_arg (ap, int);
+
+      if (acc_async_test (qid))
+	continue;
+
+      /* If we're waiting on the same asynchronous queue as we're launching on,
+         the queue itself will order work as required, so there's no need to
+	 wait explicitly.  */
+      if (qid != async)
+	acc_dev->openacc.async_wait_async_func (qid, async);
+    }
+}
+
+void
+GOACC_update (int device, const void *openmp_target, size_t mapnum,
+	      void **hostaddrs, size_t *sizes, unsigned short *kinds,
+	      int async, int num_waits, ...)
+{
+  bool if_clause_condition_value = device != GOMP_IF_CLAUSE_FALSE;
+  size_t i;
+
+  select_acc_device (device);
+
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+
+  if ((acc_dev->capabilities & TARGET_CAP_SHARED_MEM)
+      || !if_clause_condition_value)
+    return;
+
+  if (num_waits > 0)
+    {
+      va_list ap;
+
+      va_start (ap, num_waits);
+
+      goacc_wait (async, num_waits, ap);
+
+      va_end (ap);
+    }
+
+  acc_dev->openacc.async_set_async_func (async);
+
+  for (i = 0; i < mapnum; ++i)
+    {
+      unsigned char kind = kinds[i] & 0xff;
+
+      dump_var ("UPD", i, hostaddrs[i], sizes[i], kinds[i]);
+
+      switch (kind)
+	{
+	case GOMP_MAP_POINTER:
+	  break;
+
+	case GOMP_MAP_FORCE_TO:
+	  acc_update_device (hostaddrs[i], sizes[i]);
+	  break;
+
+	case GOMP_MAP_FORCE_FROM:
+	  acc_update_self (hostaddrs[i], sizes[i]);
+	  break;
+
+	default:
+	  gomp_fatal ("GOACC_update: unhandled kind 0x%.2x", kind);
+	  break;
+	}
+    }
+
+  acc_dev->openacc.async_set_async_func (acc_async_sync);
+}
+
+void
+GOACC_wait (int async, int num_waits, ...)
+{
+  va_list ap;
+
+  va_start (ap, num_waits);
+
+  goacc_wait (async, num_waits, ap);
+
+  va_end (ap);
+}
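goacc_wait picks its behaviour purely from the async argument and from whether a wait list was given. The sketch below is a hypothetical classification of those branches. The names classify_wait, enum wait_action and unmaps_immediately are invented here and are not libgomp API. It uses the values acc_async_noval = -1 and acc_async_sync = -2 from the openacc_kinds Fortran module in this patch. It also captures the test GOACC_parallel uses to decide whether to unmap variables straight after launch.

```c
#include <assert.h>

/* Special async values, as given in the openacc_kinds module.  */
enum { acc_async_noval = -1, acc_async_sync = -2 };

/* Actions invented for this sketch; each mirrors one branch of goacc_wait.  */
enum wait_action
{
  WAIT_ALL_BLOCKING,	/* acc_wait_all ().  */
  WAIT_EACH_BLOCKING,	/* acc_wait (qid) for each unfinished listed queue.  */
  WAIT_ALL_ASYNC_NOVAL,	/* async_wait_all_async_func (acc_async_noval).  */
  WAIT_EACH_ASYNC,	/* async_wait_async_func (qid, async) if qid != async.  */
  WAIT_NONE		/* Explicit queue, empty wait list: nothing to do.  */
};

static enum wait_action
classify_wait (int async, int num_waits)
{
  assert (num_waits >= 0);

  if (async == acc_async_sync)
    return num_waits == 0 ? WAIT_ALL_BLOCKING : WAIT_EACH_BLOCKING;
  if (num_waits == 0)
    return async == acc_async_noval ? WAIT_ALL_ASYNC_NOVAL : WAIT_NONE;
  return WAIT_EACH_ASYNC;
}

/* GOACC_parallel unmaps at once only for synchronous launches; for any
   other async value the clean-up is deferred to the plugin.  */
static int
unmaps_immediately (int async)
{
  return async < acc_async_noval;
}
```

A launch on an explicit queue with an empty wait list, for example classify_wait (5, 0), is a no-op in goacc_wait. The final loop simply has nothing to iterate over in that case.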
diff --git a/libgomp/oacc-plugin.c b/libgomp/oacc-plugin.c
new file mode 100644
index 0000000..357cb5f
--- /dev/null
+++ b/libgomp/oacc-plugin.c
@@ -0,0 +1,48 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Helper functions exported to libgomp plugins for OpenACC support.  */
+
+#include "libgomp.h"
+#include "oacc-plugin.h"
+#include "libgomp_target.h"
+#include "oacc-int.h"
+
+void
+GOMP_PLUGIN_async_unmap_vars (void *ptr)
+{
+  struct target_mem_desc *tgt = ptr;
+
+  gomp_unmap_vars (tgt, false);
+}
+
+/* Return the target-specific part of the TLS data for the current thread.  */
+
+void *
+GOMP_PLUGIN_acc_thread (void)
+{
+  struct goacc_thread *thr = goacc_thread ();
+  return thr ? thr->target_tls : NULL;
+}
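For asynchronous launches, GOACC_parallel does not unmap variables itself. It calls gomp_copy_from_async and passes the target_mem_desc to the plugin's register_async_cleanup_func. The plugin is then expected to call GOMP_PLUGIN_async_unmap_vars once the queued work has finished. This part of the patch does not show how the NVidia PTX plugin tracks completion. The following is therefore only a minimal sketch, assuming a simple per-queue list that is drained when the device reports completion. struct fake_tgt, struct cleanup_queue, register_cleanup, queue_drained and model_async_unmap_vars are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 16

/* Hypothetical stand-in for struct target_mem_desc.  */
struct fake_tgt
{
  int unmapped;
};

/* Hypothetical per-queue list of clean-ups waiting for queued work.  */
struct cleanup_queue
{
  void *pending[MAX_PENDING];
  size_t count;
};

/* Plays the role of GOMP_PLUGIN_async_unmap_vars: invoked once the work
   that used PTR has completed.  */
static void
model_async_unmap_vars (void *ptr)
{
  struct fake_tgt *tgt = ptr;
  tgt->unmapped = 1;
}

/* Plays the role of register_async_cleanup_func: remember TGT instead of
   unmapping it now.  Returns 0 if the list is full.  */
static int
register_cleanup (struct cleanup_queue *q, void *tgt)
{
  if (q->count == MAX_PENDING)
    return 0;
  q->pending[q->count++] = tgt;
  return 1;
}

/* Called when the device reports that all work on the queue is done:
   run the deferred clean-ups in registration order.  */
static void
queue_drained (struct cleanup_queue *q)
{
  for (size_t i = 0; i < q->count; i++)
    model_async_unmap_vars (q->pending[i]);
  q->count = 0;
}
```

Suppose two descriptors are registered. Both stay mapped until queue_drained is called, and then both are released in registration order. This is the deferred book-keeping that the asynchronous mapping support is meant to enable.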
diff --git a/libgomp/oacc-plugin.h b/libgomp/oacc-plugin.h
new file mode 100644
index 0000000..d05a28f
--- /dev/null
+++ b/libgomp/oacc-plugin.h
@@ -0,0 +1,32 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _OACC_PLUGIN_H
+#define _OACC_PLUGIN_H 1
+
+extern void GOMP_PLUGIN_async_unmap_vars (void *ptr);
+extern void *GOMP_PLUGIN_acc_thread (void);
+
+#endif
diff --git a/libgomp/openacc.f90 b/libgomp/openacc.f90
new file mode 100644
index 0000000..a344929
--- /dev/null
+++ b/libgomp/openacc.f90
@@ -0,0 +1,803 @@
+!  OpenACC Runtime Library Definitions.
+
+!  Copyright (C) 2014 Free Software Foundation, Inc.
+
+!  Contributed by Tobias Burnus <burnus@net-b.de>
+!              and Mentor Embedded.
+
+!  This file is part of the GNU OpenMP Library (libgomp).
+
+!  Libgomp is free software; you can redistribute it and/or modify it
+!  under the terms of the GNU General Public License as published by
+!  the Free Software Foundation; either version 3, or (at your option)
+!  any later version.
+
+!  Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+!  WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+!  FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+!  more details.
+
+!  Under Section 7 of GPL version 3, you are granted additional
+!  permissions described in the GCC Runtime Library Exception, version
+!  3.1, as published by the Free Software Foundation.
+
+!  You should have received a copy of the GNU General Public License and
+!  a copy of the GCC Runtime Library Exception along with this program;
+!  see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+!  <http://www.gnu.org/licenses/>.
+
+module openacc_kinds
+  use iso_fortran_env, only: int32
+  implicit none
+
+  private :: int32
+  public :: acc_device_kind
+
+  integer, parameter :: acc_device_kind = int32
+
+  public :: acc_device_none, acc_device_default, acc_device_host
+  public :: acc_device_not_host, acc_device_nvidia
+
+  integer (acc_device_kind), parameter :: acc_device_none = 0
+  integer (acc_device_kind), parameter :: acc_device_default = 1
+  integer (acc_device_kind), parameter :: acc_device_host = 2
+  integer (acc_device_kind), parameter :: acc_device_host_nonshm = 3
+  integer (acc_device_kind), parameter :: acc_device_not_host = 4
+  integer (acc_device_kind), parameter :: acc_device_nvidia = 5
+
+  public :: acc_handle_kind
+
+  integer, parameter :: acc_handle_kind = int32
+
+  public :: acc_async_noval, acc_async_sync
+
+  integer (acc_handle_kind), parameter :: acc_async_noval = -1
+  integer (acc_handle_kind), parameter :: acc_async_sync = -2
+
+end module
+
+module openacc_internal
+  use openacc_kinds
+  implicit none
+
+  interface
+    function acc_async_test_h (a)
+      logical acc_async_test_h
+      integer a
+    end function
+
+    function acc_async_test_all_h ()
+      logical acc_async_test_all_h
+    end function
+
+    function acc_on_device_h (d)
+      import
+      integer (acc_device_kind) d
+      logical acc_on_device_h
+    end function
+
+    subroutine acc_copyin_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_copyin_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_copyin_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_present_or_copyin_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_present_or_copyin_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_present_or_copyin_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_create_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_create_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_create_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_present_or_create_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_present_or_create_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_present_or_create_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_copyout_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_copyout_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_copyout_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_delete_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_delete_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_delete_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_update_device_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_update_device_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_update_device_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_update_self_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_update_self_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_update_self_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    function acc_is_present_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      logical acc_is_present_32_h
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end function
+
+    function acc_is_present_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      logical acc_is_present_64_h
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end function
+
+    function acc_is_present_array_h (a)
+      logical acc_is_present_array_h
+      type (*), dimension (..), contiguous :: a
+    end function
+  end interface
+
+  interface
+    function acc_async_test_l (a) &
+        bind (C, name = "acc_async_test")
+      use iso_c_binding, only: c_int
+      integer (c_int) :: acc_async_test_l
+      integer (c_int), value :: a
+    end function
+
+    function acc_async_test_all_l () &
+        bind (C, name = "acc_async_test_all")
+      use iso_c_binding, only: c_int
+      integer (c_int) :: acc_async_test_all_l
+    end function
+
+    function acc_on_device_l (d) &
+        bind (C, name = "acc_on_device")
+      use iso_c_binding, only: c_int
+      integer (c_int) :: acc_on_device_l
+      integer (c_int), value :: d
+    end function
+
+    subroutine acc_copyin_l (a, len) &
+        bind (C, name = "acc_copyin")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_present_or_copyin_l (a, len) &
+        bind (C, name = "acc_present_or_copyin")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_create_l (a, len) &
+        bind (C, name = "acc_create")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_present_or_create_l (a, len) &
+        bind (C, name = "acc_present_or_create")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_copyout_l (a, len) &
+        bind (C, name = "acc_copyout")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_delete_l (a, len) &
+        bind (C, name = "acc_delete")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_update_device_l (a, len) &
+        bind (C, name = "acc_update_device")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_update_self_l (a, len) &
+        bind (C, name = "acc_update_self")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    function acc_is_present_l (a, len) &
+        bind (C, name = "acc_is_present")
+      use iso_c_binding, only: c_int32_t, c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      integer (c_int32_t) :: acc_is_present_l
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end function
+  end interface
+end module
+
+module openacc
+  use openacc_kinds
+  use openacc_internal
+  implicit none
+
+  public :: openacc_version
+
+  public :: acc_get_num_devices, acc_set_device_type, acc_get_device_type
+  public :: acc_set_device_num, acc_get_device_num, acc_async_test
+  public :: acc_async_test_all, acc_wait, acc_wait_async, acc_wait_all
+  public :: acc_wait_all_async, acc_init, acc_shutdown, acc_on_device
+  public :: acc_copyin, acc_present_or_copyin, acc_pcopyin, acc_create
+  public :: acc_present_or_create, acc_pcreate, acc_copyout, acc_delete
+  public :: acc_update_device, acc_update_self, acc_is_present
+
+  integer, parameter :: openacc_version = 201306
+
+  interface acc_get_num_devices
+    function acc_get_num_devices (d) &
+        bind (C, name = "acc_get_num_devices")
+      use iso_c_binding, only: c_int
+      integer (c_int) :: acc_get_num_devices
+      integer (c_int), value :: d
+    end function
+  end interface
+
+  interface acc_set_device_type
+    subroutine acc_set_device_type (d) &
+        bind (C, name = "acc_set_device_type")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: d
+    end subroutine
+  end interface
+
+  interface acc_get_device_type
+    function acc_get_device_type () &
+        bind (C, name = "acc_get_device_type")
+      use iso_c_binding, only: c_int
+      integer (c_int) :: acc_get_device_type
+    end function
+  end interface
+
+  interface acc_set_device_num
+    subroutine acc_set_device_num (n, d) &
+        bind (C, name = "acc_set_device_num")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: n, d
+    end subroutine
+  end interface
+
+  interface acc_get_device_num
+    function acc_get_device_num (d) &
+        bind (C, name = "acc_get_device_num")
+      use iso_c_binding, only: c_int
+      integer (c_int) :: acc_get_device_num
+      integer (c_int), value :: d
+    end function
+  end interface
+
+  interface acc_async_test
+    procedure :: acc_async_test_h
+  end interface
+
+  interface acc_async_test_all
+    procedure :: acc_async_test_all_h
+  end interface
+
+  interface acc_wait
+    subroutine acc_wait (a) &
+        bind (C, name = "acc_wait")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: a
+    end subroutine
+  end interface
+
+  interface acc_wait_async
+    subroutine acc_wait_async (a1, a2) &
+        bind (C, name = "acc_wait_async")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: a1, a2
+    end subroutine
+  end interface
+
+  interface acc_wait_all
+    subroutine acc_wait_all () &
+        bind (C, name = "acc_wait_all")
+      use iso_c_binding, only: c_int
+    end subroutine
+  end interface
+
+  interface acc_wait_all_async
+    subroutine acc_wait_all_async (a) &
+        bind (C, name = "acc_wait_all_async")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: a
+    end subroutine
+  end interface
+
+  interface acc_init
+    subroutine acc_init (d) &
+        bind (C, name = "acc_init")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: d
+    end subroutine
+  end interface
+
+  interface acc_shutdown
+    subroutine acc_shutdown (d) &
+        bind (C, name = "acc_shutdown")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: d
+    end subroutine
+  end interface
+
+  interface acc_on_device
+    procedure :: acc_on_device_h
+  end interface
+
+  ! acc_malloc: Only available in C/C++
+  ! acc_free: Only available in C/C++
+
+  ! As a vendor extension, the following code supports both 32-bit and 64-bit
+  ! arguments for "size"; the OpenACC standard only permits default-kind
+  ! integers, which are of kind 4 (i.e. 32 bits).
+  ! Additionally, the two-argument versions also accept arrays as their first
+  ! argument, and the one-argument versions also accept scalars.  Note that
+  ! the code assumes that the arrays are contiguous.
+
+  interface acc_copyin
+    procedure :: acc_copyin_32_h
+    procedure :: acc_copyin_64_h
+    procedure :: acc_copyin_array_h
+  end interface
+
+  interface acc_present_or_copyin
+    procedure :: acc_present_or_copyin_32_h
+    procedure :: acc_present_or_copyin_64_h
+    procedure :: acc_present_or_copyin_array_h
+  end interface
+
+  interface acc_pcopyin
+    procedure :: acc_present_or_copyin_32_h
+    procedure :: acc_present_or_copyin_64_h
+    procedure :: acc_present_or_copyin_array_h
+  end interface
+
+  interface acc_create
+    procedure :: acc_create_32_h
+    procedure :: acc_create_64_h
+    procedure :: acc_create_array_h
+  end interface
+
+  interface acc_present_or_create
+    procedure :: acc_present_or_create_32_h
+    procedure :: acc_present_or_create_64_h
+    procedure :: acc_present_or_create_array_h
+  end interface
+
+  interface acc_pcreate
+    procedure :: acc_present_or_create_32_h
+    procedure :: acc_present_or_create_64_h
+    procedure :: acc_present_or_create_array_h
+  end interface
+
+  interface acc_copyout
+    procedure :: acc_copyout_32_h
+    procedure :: acc_copyout_64_h
+    procedure :: acc_copyout_array_h
+  end interface
+
+  interface acc_delete
+    procedure :: acc_delete_32_h
+    procedure :: acc_delete_64_h
+    procedure :: acc_delete_array_h
+  end interface
+
+  interface acc_update_device
+    procedure :: acc_update_device_32_h
+    procedure :: acc_update_device_64_h
+    procedure :: acc_update_device_array_h
+  end interface
+
+  interface acc_update_self
+    procedure :: acc_update_self_32_h
+    procedure :: acc_update_self_64_h
+    procedure :: acc_update_self_array_h
+  end interface
+
+  ! acc_map_data: Only available in C/C++
+  ! acc_unmap_data: Only available in C/C++
+  ! acc_deviceptr: Only available in C/C++
+  ! acc_hostptr: Only available in C/C++
+
+  interface acc_is_present
+    procedure :: acc_is_present_32_h
+    procedure :: acc_is_present_64_h
+    procedure :: acc_is_present_array_h
+  end interface
+
+  ! acc_memcpy_to_device: Only available in C/C++
+  ! acc_memcpy_from_device: Only available in C/C++
+
+end module
+
+function acc_async_test_h (a)
+  use openacc_internal, only: acc_async_test_l
+  logical acc_async_test_h
+  integer a
+  if (acc_async_test_l (a) .eq. 1) then
+    acc_async_test_h = .TRUE.
+  else
+    acc_async_test_h = .FALSE.
+  end if
+end function
+
+function acc_async_test_all_h ()
+  use openacc_internal, only: acc_async_test_all_l
+  logical acc_async_test_all_h
+  if (acc_async_test_all_l () .eq. 1) then
+    acc_async_test_all_h = .TRUE.
+  else
+    acc_async_test_all_h = .FALSE.
+  end if
+end function
+
+function acc_on_device_h (d)
+  use openacc_internal, only: acc_on_device_l
+  use openacc_kinds
+  integer (acc_device_kind) d
+  logical acc_on_device_h
+  if (acc_on_device_l (d) .eq. 1) then
+    acc_on_device_h = .TRUE.
+  else
+    acc_on_device_h = .FALSE.
+  end if
+end function
+
+subroutine acc_copyin_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_copyin_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_copyin_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_copyin_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_copyin_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_copyin_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_copyin_array_h (a)
+  use openacc_internal, only: acc_copyin_l
+  type (*), dimension (..), contiguous :: a
+  call acc_copyin_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_present_or_copyin_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_present_or_copyin_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_present_or_copyin_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_present_or_copyin_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_present_or_copyin_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_present_or_copyin_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_present_or_copyin_array_h (a)
+  use openacc_internal, only: acc_present_or_copyin_l
+  type (*), dimension (..), contiguous :: a
+  call acc_present_or_copyin_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_create_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_create_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_create_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_create_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_create_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_create_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_create_array_h (a)
+  use openacc_internal, only: acc_create_l
+  type (*), dimension (..), contiguous :: a
+  call acc_create_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_present_or_create_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_present_or_create_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_present_or_create_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_present_or_create_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_present_or_create_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_present_or_create_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_present_or_create_array_h (a)
+  use openacc_internal, only: acc_present_or_create_l
+  type (*), dimension (..), contiguous :: a
+  call acc_present_or_create_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_copyout_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_copyout_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_copyout_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_copyout_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_copyout_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_copyout_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_copyout_array_h (a)
+  use openacc_internal, only: acc_copyout_l
+  type (*), dimension (..), contiguous :: a
+  call acc_copyout_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_delete_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_delete_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_delete_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_delete_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_delete_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_delete_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_delete_array_h (a)
+  use openacc_internal, only: acc_delete_l
+  type (*), dimension (..), contiguous :: a
+  call acc_delete_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_update_device_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_update_device_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_update_device_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_update_device_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_update_device_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_update_device_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_update_device_array_h (a)
+  use openacc_internal, only: acc_update_device_l
+  type (*), dimension (..), contiguous :: a
+  call acc_update_device_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_update_self_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_update_self_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_update_self_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_update_self_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_update_self_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_update_self_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_update_self_array_h (a)
+  use openacc_internal, only: acc_update_self_l
+  type (*), dimension (..), contiguous :: a
+  call acc_update_self_l (a, sizeof (a))
+end subroutine
+
+function acc_is_present_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_is_present_l
+  logical acc_is_present_32_h
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  if (acc_is_present_l (a, int (len, kind = c_size_t)) .eq. 1) then
+    acc_is_present_32_h = .TRUE.
+  else
+    acc_is_present_32_h = .FALSE.
+  end if
+end function
+
+function acc_is_present_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_is_present_l
+  logical acc_is_present_64_h
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  if (acc_is_present_l (a, int (len, kind = c_size_t)) .eq. 1) then
+    acc_is_present_64_h = .TRUE.
+  else
+    acc_is_present_64_h = .FALSE.
+  end if
+end function
+
+function acc_is_present_array_h (a)
+  use openacc_internal, only: acc_is_present_l
+  logical acc_is_present_array_h
+  type (*), dimension (..), contiguous :: a
+  acc_is_present_array_h = acc_is_present_l (a, sizeof (a)) == 1
+end function
diff --git a/libgomp/openacc.h b/libgomp/openacc.h
new file mode 100644
index 0000000..01e0722
--- /dev/null
+++ b/libgomp/openacc.h
@@ -0,0 +1,127 @@
+/* OpenACC Runtime Library User-facing Declarations
+
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _OPENACC_H
+#define _OPENACC_H 1
+
+#include "gomp-constants.h"
+
+/* The OpenACC standard does not say whether including openacc.h may,
+   or must not, pull in other header files.  We include <stddef.h>,
+   which declares size_t.  */
+#include <stddef.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#if __cplusplus >= 201103
+# define __GOACC_NOTHROW noexcept ()
+#elif __cplusplus
+# define __GOACC_NOTHROW throw ()
+#else /* Not C++ */
+# define __GOACC_NOTHROW __attribute__ ((__nothrow__))
+#endif
+
+  /* Types */
+  typedef enum acc_device_t
+    {
+      acc_device_none = 0,
+      acc_device_default, /* This has to be a distinct value, as no
+			     return value can match it.  */
+      acc_device_host = GOMP_TARGET_HOST,
+      acc_device_host_nonshm = GOMP_TARGET_HOST_NONSHM,
+      acc_device_not_host,
+      acc_device_nvidia = GOMP_TARGET_NVIDIA_PTX,
+      _ACC_device_hwm
+    } acc_device_t;
+
+  typedef enum acc_async_t
+    {
+      acc_async_noval = -1,
+      acc_async_sync  = -2
+    } acc_async_t;
+
+  int acc_get_num_devices (acc_device_t __dev) __GOACC_NOTHROW;
+  void acc_set_device_type (acc_device_t __dev) __GOACC_NOTHROW;
+  acc_device_t acc_get_device_type (void) __GOACC_NOTHROW;
+  void acc_set_device_num (int __num, acc_device_t __dev) __GOACC_NOTHROW;
+  int acc_get_device_num (acc_device_t __dev) __GOACC_NOTHROW;
+  int acc_async_test (int __async) __GOACC_NOTHROW;
+  int acc_async_test_all (void) __GOACC_NOTHROW;
+  void acc_wait (int __async) __GOACC_NOTHROW;
+  void acc_wait_async (int __async1, int __async2) __GOACC_NOTHROW;
+  void acc_wait_all (void) __GOACC_NOTHROW;
+  void acc_wait_all_async (int __async) __GOACC_NOTHROW;
+  void acc_init (acc_device_t __dev) __GOACC_NOTHROW;
+  void acc_shutdown (acc_device_t __dev) __GOACC_NOTHROW;
+  int acc_on_device (acc_device_t __dev) __GOACC_NOTHROW;
+  void *acc_malloc (size_t) __GOACC_NOTHROW;
+  void acc_free (void *) __GOACC_NOTHROW;
+  /* Some of these would be more correct with const qualifiers, but
+     the standard specifies otherwise.  */
+  void *acc_copyin (void *, size_t) __GOACC_NOTHROW;
+  void *acc_present_or_copyin (void *, size_t) __GOACC_NOTHROW;
+  void *acc_create (void *, size_t) __GOACC_NOTHROW;
+  void *acc_present_or_create (void *, size_t) __GOACC_NOTHROW;
+  void acc_copyout (void *, size_t) __GOACC_NOTHROW;
+  void acc_delete (void *, size_t) __GOACC_NOTHROW;
+  void acc_update_device (void *, size_t) __GOACC_NOTHROW;
+  void acc_update_self (void *, size_t) __GOACC_NOTHROW;
+  void acc_map_data (void *, void *, size_t) __GOACC_NOTHROW;
+  void acc_unmap_data (void *) __GOACC_NOTHROW;
+  void *acc_deviceptr (void *) __GOACC_NOTHROW;
+  void *acc_hostptr (void *) __GOACC_NOTHROW;
+  int acc_is_present (void *, size_t) __GOACC_NOTHROW;
+  void acc_memcpy_to_device (void *, void *, size_t) __GOACC_NOTHROW;
+  void acc_memcpy_from_device (void *, void *, size_t) __GOACC_NOTHROW;
+
+  void ACC_target (int, void (*) (void *), const void *,
+	     size_t, void **, size_t *, unsigned char *, int *) __GOACC_NOTHROW;
+  void ACC_parallel (int, void (*) (void *), const void *,
+	     size_t, void **, size_t *, unsigned char *) __GOACC_NOTHROW;
+  void ACC_add_device_code (void const *, char const *) __GOACC_NOTHROW;
+
+  void ACC_async_copy (int) __GOACC_NOTHROW;
+  void ACC_async_kern (int) __GOACC_NOTHROW;
+
+  /* Old names.  OpenACC does not specify whether these may, or must
+     not, be macros, inline functions or aliases of the new names.  */
+  #define acc_pcreate acc_present_or_create
+  #define acc_pcopyin acc_present_or_copyin
+
+  /* CUDA-specific routines.  */
+  void *acc_get_current_cuda_device (void) __GOACC_NOTHROW;
+  void *acc_get_current_cuda_context (void) __GOACC_NOTHROW;
+  void *acc_get_cuda_stream (int __async) __GOACC_NOTHROW;
+  int acc_set_cuda_stream (int __async, void *__stream) __GOACC_NOTHROW;
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _OPENACC_H */
diff --git a/libgomp/openacc_lib.h b/libgomp/openacc_lib.h
new file mode 100644
index 0000000..13118a7
--- /dev/null
+++ b/libgomp/openacc_lib.h
@@ -0,0 +1,392 @@
+!  OpenACC Runtime Library Definitions.			-*- mode: fortran -*-
+
+!  Copyright (C) 2014 Free Software Foundation, Inc.
+
+!  Contributed by Tobias Burnus <burnus@net-b.de>
+!              and Mentor Embedded.
+
+!  This file is part of the GNU OpenMP Library (libgomp).
+
+!  Libgomp is free software; you can redistribute it and/or modify it
+!  under the terms of the GNU General Public License as published by
+!  the Free Software Foundation; either version 3, or (at your option)
+!  any later version.
+
+!  Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+!  WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+!  FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+!  more details.
+
+!  Under Section 7 of GPL version 3, you are granted additional
+!  permissions described in the GCC Runtime Library Exception, version
+!  3.1, as published by the Free Software Foundation.
+
+!  You should have received a copy of the GNU General Public License and
+!  a copy of the GCC Runtime Library Exception along with this program;
+!  see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+!  <http://www.gnu.org/licenses/>.
+
+! NOTE: Due to the use of dimension (..), this file only compiles with
+! -std=f2008ts, -std=gnu or -std=legacy, not with other -std= settings.
+! Alternatively, use the module version ("use openacc"), which permits
+! compilation with -std=f95.
+
+      integer, parameter :: acc_device_kind = 4
+
+      integer (acc_device_kind), parameter :: acc_device_none = 0
+      integer (acc_device_kind), parameter :: acc_device_default = 1
+      integer (acc_device_kind), parameter :: acc_device_host = 2
+      integer (acc_device_kind), parameter :: acc_device_host_nonshm = 3
+      integer (acc_device_kind), parameter :: acc_device_not_host = 4
+      integer (acc_device_kind), parameter :: acc_device_nvidia = 5
+
+      integer, parameter :: acc_handle_kind = 4
+
+      integer (acc_handle_kind), parameter :: acc_async_noval = -1
+      integer (acc_handle_kind), parameter :: acc_async_sync = -2
+
+      integer, parameter :: openacc_version = 201306
+
+      interface
+	function acc_get_num_devices (d)
+     &    bind (C, name = "acc_get_num_devices")
+	  use iso_c_binding, only: c_int
+	  integer (c_int) :: acc_get_num_devices
+	  integer (c_int), value :: d
+	end function
+      end interface
+
+      interface acc_set_device_type
+	subroutine acc_set_device_type (d)
+     &    bind (C, name = "acc_set_device_type")
+	  use iso_c_binding, only: c_int
+	  integer (c_int), value :: d
+	end subroutine
+      end interface
+
+      interface acc_get_device_type
+	function acc_get_device_type ()
+     &    bind (C, name = "acc_get_device_type")
+	  use iso_c_binding, only: c_int
+	  integer (c_int) :: acc_get_device_type
+	end function
+      end interface
+
+      interface acc_set_device_num
+	subroutine acc_set_device_num (n, d)
+     &    bind (C, name = "acc_set_device_num")
+	  use iso_c_binding, only: c_int
+	  integer (c_int), value :: n, d
+	end subroutine
+      end interface
+
+      interface acc_get_device_num
+	function acc_get_device_num (d)
+     &    bind (C, name = "acc_get_device_num")
+	  use iso_c_binding, only: c_int
+	  integer (c_int) :: acc_get_device_num
+	  integer (c_int), value :: d
+	end function
+      end interface
+
+      interface acc_async_test
+        function acc_async_test_h (a)
+          logical acc_async_test_h
+          integer a
+        end function
+      end interface
+
+      interface acc_async_test_all
+        function acc_async_test_all_h ()
+          logical acc_async_test_all_h
+        end function
+      end interface
+
+      interface acc_wait
+	subroutine acc_wait (a)
+     &    bind (C, name = "acc_wait")
+	  use iso_c_binding, only: c_int
+	  integer (c_int), value :: a
+	end subroutine
+      end interface
+
+      interface acc_wait_async
+	subroutine acc_wait_async (a1, a2)
+     &    bind (C, name = "acc_wait_async")
+	  use iso_c_binding, only: c_int
+	  integer (c_int), value :: a1, a2
+	end subroutine
+      end interface
+
+      interface acc_wait_all
+	subroutine acc_wait_all ()
+     &    bind (C, name = "acc_wait_all")
+	  use iso_c_binding, only: c_int
+	end subroutine
+      end interface
+
+      interface acc_wait_all_async
+	subroutine acc_wait_all_async (a)
+     &    bind (C, name = "acc_wait_all_async")
+	  use iso_c_binding, only: c_int
+	  integer (c_int), value :: a
+	end subroutine
+      end interface
+
+      interface acc_init
+	subroutine acc_init (d)
+     &    bind (C, name = "acc_init")
+	  use iso_c_binding, only: c_int
+	  integer (c_int), value :: d
+	end subroutine
+      end interface
+
+      interface acc_shutdown
+	subroutine acc_shutdown (d)
+     &    bind (C, name = "acc_shutdown")
+	  use iso_c_binding, only: c_int
+	  integer (c_int), value :: d
+	end subroutine
+      end interface
+
+      interface acc_on_device
+        function acc_on_device_h (devicetype)
+          import acc_device_kind
+          logical acc_on_device_h
+          integer (acc_device_kind) devicetype
+        end function
+      end interface
+
+      ! acc_malloc: Only available in C/C++
+      ! acc_free: Only available in C/C++
+
+      interface acc_copyin
+        subroutine acc_copyin_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_copyin_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_copyin_array_h (a)
+          type (*), dimension (..), contiguous :: a
+          end subroutine
+      end interface
+
+      interface acc_present_or_copyin
+        subroutine acc_present_or_copyin_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_present_or_copyin_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_present_or_copyin_array_h (a)
+          type (*), dimension (..), contiguous :: a
+          end subroutine
+      end interface
+
+      interface acc_pcopyin
+        subroutine acc_pcopyin_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_pcopyin_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_pcopyin_array_h (a)
+          type (*), dimension (..), contiguous :: a
+          end subroutine
+      end interface
+
+      interface acc_create
+        subroutine acc_create_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_create_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_create_array_h (a)
+          type (*), dimension (..), contiguous :: a
+          end subroutine
+      end interface
+
+      interface acc_present_or_create
+        subroutine acc_present_or_create_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_present_or_create_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_present_or_create_array_h (a)
+          type (*), dimension (..), contiguous :: a
+          end subroutine
+      end interface
+
+      interface acc_pcreate
+        subroutine acc_pcreate_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_pcreate_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_pcreate_array_h (a)
+          type (*), dimension (..), contiguous :: a
+          end subroutine
+      end interface
+
+      interface acc_copyout
+        subroutine acc_copyout_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_copyout_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_copyout_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      interface acc_delete
+        subroutine acc_delete_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_delete_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_delete_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      interface acc_update_device
+        subroutine acc_update_device_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_update_device_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_update_device_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      interface acc_update_self
+        subroutine acc_update_self_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_update_self_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_update_self_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      ! acc_map_data: Only available in C/C++
+      ! acc_unmap_data: Only available in C/C++
+      ! acc_deviceptr: Only available in C/C++
+      ! acc_hostptr: Only available in C/C++
+
+      interface acc_is_present
+        function acc_is_present_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          logical acc_is_present_32_h
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end function
+
+        function acc_is_present_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          logical acc_is_present_64_h
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end function
+
+        function acc_is_present_array_h (a)
+          logical acc_is_present_array_h
+          type (*), dimension (..), contiguous :: a
+        end function
+      end interface
+
+      ! acc_memcpy_to_device: Only available in C/C++
+      ! acc_memcpy_from_device: Only available in C/C++
diff --git a/libgomp/plugin/Makefrag.am b/libgomp/plugin/Makefrag.am
new file mode 100644
index 0000000..d6642d9
--- /dev/null
+++ b/libgomp/plugin/Makefrag.am
@@ -0,0 +1,47 @@
+# Plugins for offload execution, Makefile.am fragment.
+#
+# Copyright (C) 2014 Free Software Foundation, Inc.
+#
+# Contributed by Mentor Embedded.
+#
+# This file is part of the GNU OpenMP Library (libgomp).
+#
+# Libgomp is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+# FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+# more details.
+#
+# Under Section 7 of GPL version 3, you are granted additional
+# permissions described in the GCC Runtime Library Exception, version
+# 3.1, as published by the Free Software Foundation.
+#
+# You should have received a copy of the GNU General Public License and
+# a copy of the GCC Runtime Library Exception along with this program;
+# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+# <http://www.gnu.org/licenses/>.
+
+if PLUGIN_NVPTX
+# Nvidia PTX OpenACC plugin.
+libgomp_plugin_nvptx_version_info = -version-info $(libtool_VERSION)
+toolexeclib_LTLIBRARIES += libgomp-plugin-nvptx.la
+libgomp_plugin_nvptx_la_SOURCES = plugin/plugin-nvptx.c
+libgomp_plugin_nvptx_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_NVPTX_CPPFLAGS)
+libgomp_plugin_nvptx_la_LDFLAGS = $(libgomp_plugin_nvptx_version_info) \
+	$(lt_host_flags)
+libgomp_plugin_nvptx_la_LDFLAGS += $(PLUGIN_NVPTX_LDFLAGS)
+libgomp_plugin_nvptx_la_LIBADD = $(PLUGIN_NVPTX_LIBS)
+libgomp_plugin_nvptx_la_LIBTOOLFLAGS = --tag=disable-static
+endif
+
+libgomp_plugin_host_nonshm_version_info = -version-info $(libtool_VERSION)
+toolexeclib_LTLIBRARIES += libgomp-plugin-host_nonshm.la
+libgomp_plugin_host_nonshm_la_SOURCES = plugin/plugin-host.c
+libgomp_plugin_host_nonshm_la_CPPFLAGS = $(AM_CPPFLAGS) -DHOST_NONSHM_PLUGIN
+libgomp_plugin_host_nonshm_la_LDFLAGS = \
+	$(libgomp_plugin_host_nonshm_version_info) $(lt_host_flags)
+libgomp_plugin_host_nonshm_la_LIBTOOLFLAGS = --tag=disable-static
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
new file mode 100644
index 0000000..68c7dc7
--- /dev/null
+++ b/libgomp/plugin/configfrag.ac
@@ -0,0 +1,107 @@
+# Plugins for offload execution, configure.ac fragment.
+#
+# Copyright (C) 2014 Free Software Foundation, Inc.
+#
+# Contributed by Mentor Embedded.
+#
+# This file is part of the GNU OpenMP Library (libgomp).
+#
+# Libgomp is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+# FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+# more details.
+#
+# Under Section 7 of GPL version 3, you are granted additional
+# permissions described in the GCC Runtime Library Exception, version
+# 3.1, as published by the Free Software Foundation.
+#
+# You should have received a copy of the GNU General Public License and
+# a copy of the GCC Runtime Library Exception along with this program;
+# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+# <http://www.gnu.org/licenses/>.
+
+# Look for the CUDA driver package.
+CUDA_DRIVER_INCLUDE=
+CUDA_DRIVER_LIB=
+AC_SUBST(CUDA_DRIVER_INCLUDE)
+AC_SUBST(CUDA_DRIVER_LIB)
+CUDA_DRIVER_CPPFLAGS=
+CUDA_DRIVER_LDFLAGS=
+AC_ARG_WITH(cuda-driver,
+	[AS_HELP_STRING([--with-cuda-driver=PATH],
+		[specify prefix directory for installed CUDA driver package.
+		 Equivalent to --with-cuda-driver-include=PATH/include
+		 plus --with-cuda-driver-lib=PATH/lib])])
+AC_ARG_WITH(cuda-driver-include,
+	[AS_HELP_STRING([--with-cuda-driver-include=PATH],
+		[specify directory for installed CUDA driver include files])])
+AC_ARG_WITH(cuda-driver-lib,
+	[AS_HELP_STRING([--with-cuda-driver-lib=PATH],
+		[specify directory for the installed CUDA driver library])])
+if test "x$with_cuda_driver" != x; then
+  CUDA_DRIVER_INCLUDE=$with_cuda_driver/include
+  CUDA_DRIVER_LIB=$with_cuda_driver/lib
+fi
+if test "x$with_cuda_driver_include" != x; then
+  CUDA_DRIVER_INCLUDE=$with_cuda_driver_include
+fi
+if test "x$with_cuda_driver_lib" != x; then
+  CUDA_DRIVER_LIB=$with_cuda_driver_lib
+fi
+if test "x$CUDA_DRIVER_INCLUDE" != x; then
+  CUDA_DRIVER_CPPFLAGS=-I$CUDA_DRIVER_INCLUDE
+fi
+if test "x$CUDA_DRIVER_LIB" != x; then
+  CUDA_DRIVER_LDFLAGS=-L$CUDA_DRIVER_LIB
+fi
+
+PLUGIN_NVPTX=0
+PLUGIN_NVPTX_CPPFLAGS=
+PLUGIN_NVPTX_LDFLAGS=
+PLUGIN_NVPTX_LIBS=
+AC_SUBST(PLUGIN_NVPTX)
+AC_SUBST(PLUGIN_NVPTX_CPPFLAGS)
+AC_SUBST(PLUGIN_NVPTX_LDFLAGS)
+AC_SUBST(PLUGIN_NVPTX_LIBS)
+
+for accel in `echo $enable_offload_targets | sed -e 's#,# #g'`; do
+  case "$accel" in
+    nvptx*)
+      PLUGIN_NVPTX=$accel
+      PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
+      PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
+      PLUGIN_NVPTX_LIBS='-lcuda'
+
+      PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
+      CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
+      PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
+      LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
+      PLUGIN_NVPTX_save_LIBS=$LIBS
+      LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
+      AC_LINK_IFELSE(
+	[AC_LANG_PROGRAM(
+	  [#include "cuda.h"],
+	  [CUresult r = cuCtxPushCurrent (NULL);])],
+	[PLUGIN_NVPTX=1])
+      CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
+      LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
+      LIBS=$PLUGIN_NVPTX_save_LIBS
+      case $PLUGIN_NVPTX in
+	nvptx*)
+	  PLUGIN_NVPTX=0
+	  AC_MSG_ERROR([CUDA driver package required for nvptx support])
+	  ;;
+      esac
+      ;;
+  esac
+done
+AM_CONDITIONAL([PLUGIN_NVPTX], [test $PLUGIN_NVPTX = 1])
+AC_DEFINE_UNQUOTED([PLUGIN_NVPTX], [$PLUGIN_NVPTX],
+		  [Define to 1 if the NVIDIA plugin is built, 0 if not.])
+
+AC_OUTPUT
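The `for accel in ...` loop in configfrag.ac turns the comma-separated `--enable-offload-targets` value into a word list. It treats any entry that matches the shell pattern `nvptx*` as a request for the NVPTX plugin, and the plugin is only enabled if the CUDA link test then succeeds. The C sketch below mirrors just the matching step, as an illustration. The helper name `wants_nvptx_plugin` and the sample target names are assumptions. Neither is part of libgomp, and the real decision is made by the shell code above.

```c
#define _POSIX_C_SOURCE 200809L  /* for strtok_r */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of libgomp): return 1 if any
   comma-separated entry of TARGETS matches the shell pattern nvptx*,
   mirroring the configure loop; return 0 otherwise.  */
static int
wants_nvptx_plugin (const char *targets)
{
  char buf[256];
  char *tok, *save;

  if (targets == NULL)
    return 0;

  /* strtok_r modifies its argument, so work on a (possibly truncated)
     copy of the list.  */
  snprintf (buf, sizeof buf, "%s", targets);

  for (tok = strtok_r (buf, ",", &save); tok != NULL;
       tok = strtok_r (NULL, ",", &save))
    if (strncmp (tok, "nvptx", 5) == 0)   /* prefix match, like nvptx* */
      return 1;

  return 0;
}
```

For example, `wants_nvptx_plugin ("x86_64-intelmicemul-linux-gnu,nvptx-none")` returns 1, and `wants_nvptx_plugin ("x86_64-intelmicemul-linux-gnu")` returns 0.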
diff --git a/libgomp/plugin/plugin-host.c b/libgomp/plugin/plugin-host.c
new file mode 100644
index 0000000..aee3c4e
--- /dev/null
+++ b/libgomp/plugin/plugin-host.c
@@ -0,0 +1,269 @@
+/* OpenACC Runtime Library: acc_device_host, acc_device_host_nonshm.
+
+   Copyright (C) 2013 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Simple implementation of support routines for a shared-memory
+   acc_device_host, and a non-shared memory acc_device_host_nonshm, with the
+   latter built as a plugin.  */
+
+#include "openacc.h"
+#include "config.h"
+#include "libgomp.h"
+#include "libgomp_target.h"
+#ifdef HOST_NONSHM_PLUGIN
+#include "libgomp-plugin.h"
+#include "oacc-plugin.h"
+#else
+#include "oacc-int.h"
+#endif
+
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdio.h>
+
+#ifdef HOST_NONSHM_PLUGIN
+#define STATIC
+#define GOMP(X) GOMP_PLUGIN_##X
+#define SELF "host_nonshm plugin: "
+#else
+#define STATIC static
+#define GOMP(X) gomp_##X
+#define SELF "host: "
+#endif
+
+#ifndef HOST_NONSHM_PLUGIN
+static struct gomp_device_descr host_dispatch;
+#endif
+
+STATIC const char *
+GOMP_OFFLOAD_get_name (void)
+{
+#ifdef HOST_NONSHM_PLUGIN
+  return "host_nonshm";
+#else
+  return "host";
+#endif
+}
+
+STATIC int
+GOMP_OFFLOAD_get_type (void)
+{
+#ifdef HOST_NONSHM_PLUGIN
+  return OFFLOAD_TARGET_TYPE_HOST_NONSHM;
+#else
+  return OFFLOAD_TARGET_TYPE_HOST;
+#endif
+}
+
+STATIC unsigned int
+GOMP_OFFLOAD_get_caps (void)
+{
+  unsigned int caps = TARGET_CAP_OPENACC_200 | TARGET_CAP_NATIVE_EXEC;
+
+#ifndef HOST_NONSHM_PLUGIN
+  caps |= TARGET_CAP_SHARED_MEM;
+#endif
+
+  return caps;
+}
+
+STATIC int
+GOMP_OFFLOAD_get_num_devices (void)
+{
+  return 1;
+}
+
+STATIC void
+GOMP_OFFLOAD_register_image (void *host_table __attribute__((unused)),
+			     void *target_data __attribute__((unused)))
+{
+}
+
+STATIC void
+GOMP_OFFLOAD_init_device (int n __attribute__((unused)))
+{
+}
+
+STATIC void
+GOMP_OFFLOAD_fini_device (int n __attribute__((unused)))
+{
+}
+
+STATIC int
+GOMP_OFFLOAD_get_table (int n __attribute__((unused)),
+			struct mapping_table **table __attribute__((unused)))
+{
+  return 0;
+}
+
+STATIC void *
+GOMP_OFFLOAD_openacc_open_device (int n)
+{
+  return (void *) (intptr_t) n;
+}
+
+STATIC int
+GOMP_OFFLOAD_openacc_close_device (void *hnd __attribute__((unused)))
+{
+  return 0;
+}
+
+STATIC int
+GOMP_OFFLOAD_openacc_get_device_num (void)
+{
+  return 0;
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_set_device_num (int n)
+{
+  if (n > 0)
+    GOMP(fatal) ("device number %d out of range for host execution", n);
+}
+
+STATIC void *
+GOMP_OFFLOAD_alloc (int n __attribute__((unused)), size_t s)
+{
+  return GOMP(malloc) (s);
+}
+
+STATIC void
+GOMP_OFFLOAD_free (int n __attribute__((unused)), void *p)
+{
+  free (p);
+}
+
+STATIC void *
+GOMP_OFFLOAD_host2dev (int n __attribute__((unused)), void *d, const void *h,
+		       size_t s)
+{
+#ifdef HOST_NONSHM_PLUGIN
+  memcpy (d, h, s);
+#endif
+
+  return 0;
+}
+
+STATIC void *
+GOMP_OFFLOAD_dev2host (int n __attribute__((unused)), void *h, const void *d,
+		       size_t s)
+{
+#ifdef HOST_NONSHM_PLUGIN
+  memcpy (h, d, s);
+#endif
+
+  return 0;
+}
+
+STATIC void
+GOMP_OFFLOAD_run (int n __attribute__((unused)), void *fn_ptr, void *vars)
+{
+  void (*fn)(void *) = (void (*)(void *)) fn_ptr;
+
+  fn (vars);
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_parallel (void (*fn) (void *),
+			       size_t mapnum __attribute__((unused)),
+			       void **hostaddrs,
+			       void **devaddrs __attribute__((unused)),
+			       size_t *sizes __attribute__((unused)),
+			       unsigned short *kinds __attribute__((unused)),
+			       int num_gangs __attribute__((unused)),
+			       int num_workers __attribute__((unused)),
+			       int vector_length __attribute__((unused)),
+			       int async __attribute__((unused)),
+			       void *targ_mem_desc __attribute__((unused)))
+{
+#ifdef HOST_NONSHM_PLUGIN
+  fn (devaddrs);
+#else
+  fn (hostaddrs);
+#endif
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_register_async_cleanup (void *targ_mem_desc)
+{
+#ifdef HOST_NONSHM_PLUGIN
+  /* "Asynchronous" launches are executed synchronously on the (non-SHM) host,
+     so there's no point in delaying host-side cleanup -- just do it now.  */
+  GOMP_PLUGIN_async_unmap_vars (targ_mem_desc);
+#endif
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_async_set_async (int async __attribute__((unused)))
+{
+}
+
+STATIC int
+GOMP_OFFLOAD_openacc_async_test (int async __attribute__((unused)))
+{
+  return 1;
+}
+
+STATIC int
+GOMP_OFFLOAD_openacc_async_test_all (void)
+{
+  return 1;
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_async_wait (int async __attribute__((unused)))
+{
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_async_wait_all (void)
+{
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_async_wait_async (int async1 __attribute__((unused)),
+				       int async2 __attribute__((unused)))
+{
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_async_wait_all_async (int async __attribute__((unused)))
+{
+}
+
+STATIC void *
+GOMP_OFFLOAD_openacc_create_thread_data (void *targ_data
+					 __attribute__((unused)))
+{
+  return NULL;
+}
+
+STATIC void
+GOMP_OFFLOAD_openacc_destroy_thread_data (void *tls_data
+					  __attribute__((unused)))
+{
+}
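plugin-host.c builds the same hooks twice. Compiled in-process, they form the shared-memory `host` device. Compiled with `HOST_NONSHM_PLUGIN` defined, they form the `host_nonshm` plugin. The visible differences are the `TARGET_CAP_SHARED_MEM` capability bit and whether data is actually copied. The shared-memory variant passes `hostaddrs` straight to the offloaded function, so `host2dev` and `dev2host` do nothing. The non-shared variant runs on `devaddrs` and has to `memcpy`. The self-contained sketch below models only that distinction. The `CAP_*` values and the `nonshm` parameter are assumptions for illustration; the real `TARGET_CAP_*` constants come from the libgomp headers and may have different values.

```c
#include <string.h>

/* Hypothetical capability bits standing in for TARGET_CAP_*.  */
enum
{
  CAP_SHARED_MEM  = 1u << 0,
  CAP_NATIVE_EXEC = 1u << 1,
  CAP_OPENACC_200 = 1u << 2
};

/* Models GOMP_OFFLOAD_get_caps: both variants run native host code, but
   only the in-process "host" device shares memory with the program.  */
static unsigned int
host_caps (int nonshm)
{
  unsigned int caps = CAP_OPENACC_200 | CAP_NATIVE_EXEC;

  if (!nonshm)
    caps |= CAP_SHARED_MEM;

  return caps;
}

/* Models GOMP_OFFLOAD_host2dev: with shared memory the "device" copy is
   the host object itself, so nothing is copied.  */
static void
host2dev (int nonshm, void *d, const void *h, size_t s)
{
  if (nonshm)
    memcpy (d, h, s);
}
```

Under these assumptions, `host_caps (0)` includes `CAP_SHARED_MEM` and `host_caps (1)` does not. `host2dev (0, ...)` leaves the destination untouched, while `host2dev (1, ...)` copies into it.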
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
new file mode 100644
index 0000000..3d1b81b
--- /dev/null
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -0,0 +1,1852 @@
+/* Plugin for NVPTX execution.
+
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Nvidia PTX-specific parts of OpenACC support.  The CUDA driver
+   library appears to hold some implicit state, but the documentation
+   is not clear as to what that state might be, or how one might
+   propagate it from one thread to another.  */
+
+#include "openacc.h"
+#include "config.h"
+#include "libgomp.h"
+#include "libgomp_target.h"
+#include "libgomp-plugin.h"
+#include "oacc-plugin.h"
+
+#include <cuda.h>
+#include <stdint.h>
+#include <string.h>
+#include <stdio.h>
+#include <dlfcn.h>
+#include <unistd.h>
+#include <assert.h>
+
+#define	ARRAYSIZE(X) (sizeof (X) / sizeof ((X)[0]))
+
+static struct
+{
+  CUresult r;
+  char *m;
+} cuda_errlist[] =
+{
+  { CUDA_ERROR_INVALID_VALUE, "invalid value" },
+  { CUDA_ERROR_OUT_OF_MEMORY, "out of memory" },
+  { CUDA_ERROR_NOT_INITIALIZED, "not initialized" },
+  { CUDA_ERROR_DEINITIALIZED, "deinitialized" },
+  { CUDA_ERROR_PROFILER_DISABLED, "profiler disabled" },
+  { CUDA_ERROR_PROFILER_NOT_INITIALIZED, "profiler not initialized" },
+  { CUDA_ERROR_PROFILER_ALREADY_STARTED, "profiler already started" },
+  { CUDA_ERROR_PROFILER_ALREADY_STOPPED, "profiler already stopped" },
+  { CUDA_ERROR_NO_DEVICE, "no device" },
+  { CUDA_ERROR_INVALID_DEVICE, "invalid device" },
+  { CUDA_ERROR_INVALID_IMAGE, "invalid image" },
+  { CUDA_ERROR_INVALID_CONTEXT, "invalid context" },
+  { CUDA_ERROR_CONTEXT_ALREADY_CURRENT, "context already current" },
+  { CUDA_ERROR_MAP_FAILED, "map error" },
+  { CUDA_ERROR_UNMAP_FAILED, "unmap error" },
+  { CUDA_ERROR_ARRAY_IS_MAPPED, "array is mapped" },
+  { CUDA_ERROR_ALREADY_MAPPED, "already mapped" },
+  { CUDA_ERROR_NO_BINARY_FOR_GPU, "no binary for gpu" },
+  { CUDA_ERROR_ALREADY_ACQUIRED, "already acquired" },
+  { CUDA_ERROR_NOT_MAPPED, "not mapped" },
+  { CUDA_ERROR_NOT_MAPPED_AS_ARRAY, "not mapped as array" },
+  { CUDA_ERROR_NOT_MAPPED_AS_POINTER, "not mapped as pointer" },
+  { CUDA_ERROR_ECC_UNCORRECTABLE, "ecc uncorrectable" },
+  { CUDA_ERROR_UNSUPPORTED_LIMIT, "unsupported limit" },
+  { CUDA_ERROR_CONTEXT_ALREADY_IN_USE, "context already in use" },
+  { CUDA_ERROR_PEER_ACCESS_UNSUPPORTED, "peer access unsupported" },
+  { CUDA_ERROR_INVALID_SOURCE, "invalid source" },
+  { CUDA_ERROR_FILE_NOT_FOUND, "file not found" },
+  { CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND,
+                                           "shared object symbol not found" },
+  { CUDA_ERROR_SHARED_OBJECT_INIT_FAILED, "shared object init error" },
+  { CUDA_ERROR_OPERATING_SYSTEM, "operating system" },
+  { CUDA_ERROR_INVALID_HANDLE, "invalid handle" },
+  { CUDA_ERROR_NOT_FOUND, "not found" },
+  { CUDA_ERROR_NOT_READY, "not ready" },
+  { CUDA_ERROR_LAUNCH_FAILED, "launch error" },
+  { CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES, "launch out of resources" },
+  { CUDA_ERROR_LAUNCH_TIMEOUT, "launch timeout" },
+  { CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING,
+                                            "launch incompatible texturing" },
+  { CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED, "peer access already enabled" },
+  { CUDA_ERROR_PEER_ACCESS_NOT_ENABLED, "peer access not enabled" },
+  { CUDA_ERROR_PRIMARY_CONTEXT_ACTIVE, "primary context active" },
+  { CUDA_ERROR_CONTEXT_IS_DESTROYED, "context is destroyed" },
+  { CUDA_ERROR_ASSERT, "assert" },
+  { CUDA_ERROR_TOO_MANY_PEERS, "too many peers" },
+  { CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED,
+                                           "host memory already registered" },
+  { CUDA_ERROR_HOST_MEMORY_NOT_REGISTERED, "host memory not registered" },
+  { CUDA_ERROR_NOT_PERMITTED, "not permitted" },
+  { CUDA_ERROR_NOT_SUPPORTED, "not supported" },
+  { CUDA_ERROR_UNKNOWN, "unknown" }
+};
+
+static char errmsg[128];
+
+static char *
+cuda_error (CUresult r)
+{
+  int i;
+
+  for (i = 0; i < ARRAYSIZE (cuda_errlist); i++)
+    {
+      if (cuda_errlist[i].r == r)
+	return &cuda_errlist[i].m[0];
+    }
+
+  sprintf (&errmsg[0], "unknown result code: %5d", r);
+
+  return &errmsg[0];
+}
+
+struct targ_fn_descriptor
+{
+  CUfunction fn;
+  const char *name;
+};
+
+static bool ptx_inited = false;
+
+struct ptx_stream
+{
+  CUstream stream;
+  pthread_t host_thread;
+  bool multithreaded;
+
+  CUdeviceptr d;
+  void *h;
+  void *h_begin;
+  void *h_end;
+  void *h_next;
+  void *h_prev;
+  void *h_tail;
+
+  struct ptx_stream *next;
+};
+
+/* Thread-specific data for PTX.  */
+
+struct nvptx_thread
+{
+  struct ptx_stream *current_stream;
+  struct ptx_device *ptx_dev;
+};
+
+struct map
+{
+  int     async;
+  size_t  size;
+  char    mappings[0];
+};
+
+static void
+map_init (struct ptx_stream *s)
+{
+  CUresult r;
+
+  int size = getpagesize ();
+
+  assert (s);
+  assert (!s->d);
+  assert (!s->h);
+
+  r = cuMemAllocHost (&s->h, size);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemAllocHost error: %s", cuda_error (r));
+
+  r = cuMemHostGetDevicePointer (&s->d, s->h, 0);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemHostGetDevicePointer error: %s", cuda_error (r));
+
+  assert (s->h);
+
+  s->h_begin = s->h;
+  s->h_end = s->h_begin + size;
+  s->h_next = s->h_prev = s->h_tail = s->h_begin;
+
+  assert (s->h_next);
+  assert (s->h_end);
+}
+
+static void
+map_fini (struct ptx_stream *s)
+{
+  CUresult r;
+  
+  r = cuMemFreeHost (s->h);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemFreeHost error: %s", cuda_error (r));
+}
+
+static void
+map_pop (struct ptx_stream *s)
+{
+  struct map *m;
+
+  assert (s != NULL);
+  assert (s->h_next);
+  assert (s->h_prev);
+  assert (s->h_tail);
+
+  m = s->h_tail;
+
+  s->h_tail += m->size;
+
+  if (s->h_tail >= s->h_end)
+    s->h_tail = s->h_begin + (int) (s->h_tail - s->h_end);
+
+  if (s->h_next == s->h_tail)
+    s->h_prev = s->h_next;
+
+  assert (s->h_next >= s->h_begin);
+  assert (s->h_tail >= s->h_begin);
+  assert (s->h_prev >= s->h_begin);
+
+  assert (s->h_next <= s->h_end);
+  assert (s->h_tail <= s->h_end);
+  assert (s->h_prev <= s->h_end);
+}
+
+static void
+map_push (struct ptx_stream *s, int async, size_t size, void **h, void **d)
+{
+  int left;
+  int offset;
+  struct map *m;
+
+  assert (s != NULL);
+
+  left = s->h_end - s->h_next;
+  size += sizeof (struct map);
+
+  assert (s->h_prev);
+  assert (s->h_next);
+
+  if (size >= left)
+    {
+      m = s->h_prev;
+      m->size += left;
+      s->h_next = s->h_begin;
+
+      if (s->h_next + size > s->h_end)
+	GOMP_PLUGIN_fatal ("unable to push map");
+    }
+
+  assert (s->h_next);
+
+  m = s->h_next;
+  m->async = async;
+  m->size = size;
+
+  offset = (void *)&m->mappings[0] - s->h;
+
+  *d = (void *)(s->d + offset);
+  *h = (void *)(s->h + offset);
+
+  s->h_prev = s->h_next;
+  s->h_next += size;
+
+  assert (s->h_prev);
+  assert (s->h_next);
+
+  assert (s->h_next >= s->h_begin);
+  assert (s->h_tail >= s->h_begin);
+  assert (s->h_prev >= s->h_begin);
+  assert (s->h_next <= s->h_end);
+  assert (s->h_tail <= s->h_end);
+  assert (s->h_prev <= s->h_end);
+
+  return;
+}
+
+struct ptx_device
+{
+  CUcontext ctx;
+  bool ctx_shared;
+  CUdevice dev;
+  struct ptx_stream *null_stream;
+  /* All non-null streams associated with this device (actually context),
+     either created implicitly or passed in from the user (via
+     acc_set_cuda_stream).  */
+  struct ptx_stream *active_streams;
+  struct {
+    struct ptx_stream **arr;
+    int size;
+  } async_streams;
+  /* A lock for use when manipulating the above stream list and array.  */
+  gomp_mutex_t stream_lock;
+  int ord;
+  bool overlap;
+  bool map;
+  bool concur;
+  int  mode;
+  bool mkern;
+
+  struct ptx_device *next;
+};
+
+enum PTX_event_type
+{
+  PTX_EVT_MEM,
+  PTX_EVT_KNL,
+  PTX_EVT_SYNC,
+  PTX_EVT_ASYNC_CLEANUP
+};
+
+struct PTX_event
+{
+  CUevent *evt;
+  int type;
+  void *addr;
+  int ord;
+
+  struct PTX_event *next;
+};
+
+static gomp_mutex_t PTX_event_lock;
+static struct PTX_event *PTX_events;
+
+#define _XSTR(s) _STR(s)
+#define _STR(s) #s
+
+static struct _synames
+{
+  char *n;
+} cuSymNames[] =
+{
+  { _XSTR (cuCtxCreate) },
+  { _XSTR (cuCtxDestroy) },
+  { _XSTR (cuCtxGetCurrent) },
+  { _XSTR (cuCtxPushCurrent) },
+  { _XSTR (cuCtxSynchronize) },
+  { _XSTR (cuDeviceGet) },
+  { _XSTR (cuDeviceGetAttribute) },
+  { _XSTR (cuDeviceGetCount) },
+  { _XSTR (cuEventCreate) },
+  { _XSTR (cuEventDestroy) },
+  { _XSTR (cuEventQuery) },
+  { _XSTR (cuEventRecord) },
+  { _XSTR (cuInit) },
+  { _XSTR (cuLaunchKernel) },
+  { _XSTR (cuLinkAddData) },
+  { _XSTR (cuLinkComplete) },
+  { _XSTR (cuLinkCreate) },
+  { _XSTR (cuMemAlloc) },
+  { _XSTR (cuMemAllocHost) },
+  { _XSTR (cuMemcpy) },
+  { _XSTR (cuMemcpyDtoH) },
+  { _XSTR (cuMemcpyDtoHAsync) },
+  { _XSTR (cuMemcpyHtoD) },
+  { _XSTR (cuMemcpyHtoDAsync) },
+  { _XSTR (cuMemFree) },
+  { _XSTR (cuMemFreeHost) },
+  { _XSTR (cuMemGetAddressRange) },
+  { _XSTR (cuMemHostGetDevicePointer) },
+  { _XSTR (cuMemHostRegister) },
+  { _XSTR (cuMemHostUnregister) },
+  { _XSTR (cuModuleGetFunction) },
+  { _XSTR (cuModuleLoadData) },
+  { _XSTR (cuStreamDestroy) },
+  { _XSTR (cuStreamQuery) },
+  { _XSTR (cuStreamSynchronize) },
+  { _XSTR (cuStreamWaitEvent) }
+};
+
+static int
+verify_device_library (void)
+{
+  int i;
+  void *dh, *ds;
+
+  dh = dlopen ("libcuda.so", RTLD_LAZY);
+  if (!dh)
+    return -1;
+
+  for (i = 0; i < ARRAYSIZE (cuSymNames); i++)
+    {
+      ds = dlsym (dh, cuSymNames[i].n);
+      if (!ds)
+        break;
+    }
+
+  dlclose (dh);
+
+  return i < ARRAYSIZE (cuSymNames) ? -1 : 0;
+}
+
+static inline struct nvptx_thread *
+nvptx_thread (void)
+{
+  return (struct nvptx_thread *) GOMP_PLUGIN_acc_thread ();
+}
+
+static void
+init_streams_for_device (struct ptx_device *ptx_dev, int concurrency)
+{
+  int i;
+  struct ptx_stream *null_stream
+    = GOMP_PLUGIN_malloc (sizeof (struct ptx_stream));
+
+  null_stream->stream = NULL;
+  null_stream->host_thread = pthread_self ();
+  null_stream->multithreaded = true;
+  null_stream->d = (CUdeviceptr) NULL;
+  null_stream->h = NULL;
+  map_init (null_stream);
+  ptx_dev->null_stream = null_stream;
+  
+  ptx_dev->active_streams = NULL;
+  GOMP_PLUGIN_mutex_init (&ptx_dev->stream_lock);
+  
+  if (concurrency < 1)
+    concurrency = 1;
+  
+  /* This is just a guess -- make space for as many async streams as the
+     current device is capable of concurrently executing.  This can grow
+     later as necessary.  No streams are created yet.  */
+  ptx_dev->async_streams.arr
+    = GOMP_PLUGIN_malloc (concurrency * sizeof (struct ptx_stream *));
+  ptx_dev->async_streams.size = concurrency;
+  
+  for (i = 0; i < concurrency; i++)
+    ptx_dev->async_streams.arr[i] = NULL;
+}
+
+static void
+fini_streams_for_device (struct ptx_device *ptx_dev)
+{
+  free (ptx_dev->async_streams.arr);
+  
+  while (ptx_dev->active_streams != NULL)
+    {
+      struct ptx_stream *s = ptx_dev->active_streams;
+      ptx_dev->active_streams = ptx_dev->active_streams->next;
+
+      cuStreamDestroy (s->stream);
+      map_fini (s);
+      free (s);
+    }
+  
+  map_fini (ptx_dev->null_stream);
+  free (ptx_dev->null_stream);
+}
+
+/* Select a stream for the (OpenACC-semantics) ASYNC argument for the current
+   thread THREAD (and the current device/context).  If CREATE is true, create
+   the stream if it does not exist (or use EXISTING if it is non-NULL), and
+   associate the stream with THREAD.  Return the stream to use, or NULL if
+   none exists and CREATE is false.  */
+
+static struct ptx_stream *
+select_stream_for_async (int async, pthread_t thread, bool create,
+			 CUstream existing)
+{
+  struct nvptx_thread *nvthd = nvptx_thread ();
+  /* Local copy of TLS variable.  */
+  struct ptx_device *ptx_dev = nvthd->ptx_dev;
+  struct ptx_stream *stream = NULL;
+  int orig_async = async;
+  
+  /* The special value acc_async_noval (-1) maps (for now) to an
+     implicitly-created stream in slot 0 (async N >= 0 uses slot N + 1),
+     which is then handled like any other numbered async stream.  Other
+     options exist, e.g. using the null stream for anonymous async operations
+     or choosing an idle stream from an active set, but stick with this.  */
+  if (async > acc_async_sync)
+    async++;
+  
+  if (create)
+    GOMP_PLUGIN_mutex_lock (&ptx_dev->stream_lock);
+
+  /* NOTE: AFAICT there's no particular need for acc_async_sync to map to the
+     null stream, and in fact better performance may be obtainable if it doesn't
+     (because the null stream enforces overly-strict synchronisation with
+     respect to other streams for legacy reasons, and that's probably not
+     needed with OpenACC).  Maybe investigate later.  */
+  if (async == acc_async_sync)
+    stream = ptx_dev->null_stream;
+  else if (async >= 0 && async < ptx_dev->async_streams.size
+	   && ptx_dev->async_streams.arr[async] && !(create && existing))
+    stream = ptx_dev->async_streams.arr[async];
+  else if (async >= 0 && create)
+    {
+      if (async >= ptx_dev->async_streams.size)
+	{
+	  int i, newsize = ptx_dev->async_streams.size * 2;
+	  
+	  if (async >= newsize)
+	    newsize = async + 1;
+	  
+	  ptx_dev->async_streams.arr
+	    = GOMP_PLUGIN_realloc (ptx_dev->async_streams.arr,
+				   newsize * sizeof (struct ptx_stream *));
+	  
+	  for (i = ptx_dev->async_streams.size; i < newsize; i++)
+	    ptx_dev->async_streams.arr[i] = NULL;
+	  
+	  ptx_dev->async_streams.size = newsize;
+	}
+
+      /* Create a new stream on-demand if there isn't one already, or if we're
+	 setting a particular async value to an existing (externally-provided)
+	 stream.  */
+      if (!ptx_dev->async_streams.arr[async] || existing)
+        {
+	  CUresult r;
+	  struct ptx_stream *s
+	    = GOMP_PLUGIN_malloc (sizeof (struct ptx_stream));
+
+	  if (existing)
+	    s->stream = existing;
+	  else
+	    {
+	      r = cuStreamCreate (&s->stream, CU_STREAM_DEFAULT);
+	      if (r != CUDA_SUCCESS)
+		GOMP_PLUGIN_fatal ("cuStreamCreate error: %s", cuda_error (r));
+	    }
+	  
+	  /* If CREATE is true, we're going to be queueing some work on this
+	     stream.  Associate it with the current host thread.  */
+	  s->host_thread = thread;
+	  s->multithreaded = false;
+	  
+	  s->d = (CUdeviceptr) NULL;
+	  s->h = NULL;
+	  map_init (s);
+	  
+	  s->next = ptx_dev->active_streams;
+	  ptx_dev->active_streams = s;
+	  ptx_dev->async_streams.arr[async] = s;
+	}
+
+      stream = ptx_dev->async_streams.arr[async];
+    }
+  else if (async < 0)
+    GOMP_PLUGIN_fatal ("bad async %d", async);
+
+  if (create)
+    {
+      assert (stream != NULL);
+
+      /* If we're trying to use the same stream from different threads
+	 simultaneously, set stream->multithreaded to true.  This affects the
+	 behaviour of acc_async_test_all and acc_wait_all, which are supposed to
+	 only wait for asynchronous launches from the same host thread they are
+	 invoked on.  If multiple threads use the same async value, we make note
+	 of that here and fall back to testing/waiting for all threads in those
+	 functions.  */
+      if (thread != stream->host_thread)
+        stream->multithreaded = true;
+
+      GOMP_PLUGIN_mutex_unlock (&ptx_dev->stream_lock);
+    }
+  else if (stream && !stream->multithreaded
+	   && !pthread_equal (stream->host_thread, thread))
+    GOMP_PLUGIN_fatal ("async %d used on wrong thread", orig_async);
+
+  return stream;
+}
+
+static int PTX_get_num_devices (void);
+
+/* Initialize the device.  */
+static int
+PTX_init (void)
+{
+  CUresult r;
+  int rc;
+
+  if (ptx_inited)
+    return PTX_get_num_devices ();
+
+  rc = verify_device_library ();
+  if (rc < 0)
+    return -1;
+
+  r = cuInit (0);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuInit error: %s", cuda_error (r));
+
+  PTX_events = NULL;
+
+  GOMP_PLUGIN_mutex_init (&PTX_event_lock);
+
+  ptx_inited = true;
+
+  return PTX_get_num_devices ();
+}
+
+static void
+PTX_fini (void)
+{
+  ptx_inited = false;
+}
+
+static void *
+PTX_open_device (int n)
+{
+  struct ptx_device *ptx_dev;
+  CUdevice dev;
+  CUresult r;
+  int async_engines, pi;
+
+  r = cuDeviceGet (&dev, n);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGet error: %s", cuda_error (r));
+
+  ptx_dev = GOMP_PLUGIN_malloc (sizeof (struct ptx_device));
+
+  ptx_dev->ord = n;
+  ptx_dev->dev = dev;
+  ptx_dev->ctx_shared = false;
+
+  r = cuCtxGetCurrent (&ptx_dev->ctx);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuda_error (r));
+
+  if (!ptx_dev->ctx)
+    {
+      r = cuCtxCreate (&ptx_dev->ctx, CU_CTX_SCHED_AUTO, dev);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuCtxCreate error: %s", cuda_error (r));
+    }
+  else
+    ptx_dev->ctx_shared = true;
+
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_GPU_OVERLAP, dev);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGetAttribute error: %s", cuda_error (r));
+
+  ptx_dev->overlap = pi;
+
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_CAN_MAP_HOST_MEMORY, dev);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGetAttribute error: %s", cuda_error (r));
+
+  ptx_dev->map = pi;
+
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_CONCURRENT_KERNELS, dev);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGetAttribute error: %s", cuda_error (r));
+
+  ptx_dev->concur = pi;
+
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_COMPUTE_MODE, dev);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGetAttribute error: %s", cuda_error (r));
+
+  ptx_dev->mode = pi;
+
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_INTEGRATED, dev);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGetAttribute error: %s", cuda_error (r));
+
+  ptx_dev->mkern = pi;
+
+  r = cuDeviceGetAttribute (&async_engines,
+			    CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT, dev);
+  if (r != CUDA_SUCCESS)
+    async_engines = 1;
+
+  init_streams_for_device (ptx_dev, async_engines);
+
+  return (void *) ptx_dev;
+}
+
+static int
+PTX_close_device (void *targ_data)
+{
+  CUresult r;
+  struct ptx_device *ptx_dev = targ_data;
+
+  if (!ptx_dev)
+    return 0;
+  
+  fini_streams_for_device (ptx_dev);
+
+  if (!ptx_dev->ctx_shared)
+    {
+      r = cuCtxDestroy (ptx_dev->ctx);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuCtxDestroy error: %s", cuda_error (r));
+    }
+
+  free (ptx_dev);
+
+  return 0;
+}
+
+static int
+PTX_get_num_devices (void)
+{
+  int n;
+  CUresult r;
+
+  /* This function will be called before the plugin has been initialized in
+     order to enumerate available devices, but CUDA API routines can't be used
+     until cuInit has been called.  Just call it now (but don't yet do any
+     further initialization).  */
+  if (!ptx_inited)
+    cuInit (0);
+
+  r = cuDeviceGetCount (&n);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGetCount error: %s", cuda_error (r));
+
+  return n;
+}
+
+#define ABORT_PTX				\
+  ".version 3.1\n"				\
+  ".target sm_30\n"				\
+  ".address_size 64\n"				\
+  ".visible .func abort;\n"			\
+  ".visible .func abort\n"			\
+  "{\n"						\
+  "trap;\n"					\
+  "ret;\n"					\
+  "}\n"						\
+  ".visible .func _gfortran_abort;\n"		\
+  ".visible .func _gfortran_abort\n"		\
+  "{\n"						\
+  "trap;\n"					\
+  "ret;\n"					\
+  "}\n"
+
+/* Generated with:
+
+   $ echo 'int acc_on_device(int d) { return __builtin_acc_on_device(d); } int acc_on_device_(int *d) { return acc_on_device(*d); }' | accel-gcc/xgcc -Baccel-gcc -x c - -o - -S -m64 -O3 -fno-builtin-acc_on_device -fno-inline
+*/
+#define ACC_ON_DEVICE_PTX						\
+  "        .version        3.1\n"					\
+  "        .target sm_30\n"						\
+  "        .address_size 64\n"						\
+  ".visible .func (.param.u32 %out_retval)acc_on_device(.param.u32 %in_ar1);\n" \
+  ".visible .func (.param.u32 %out_retval)acc_on_device(.param.u32 %in_ar1)\n" \
+  "{\n"									\
+  "        .reg.u32 %ar1;\n"						\
+  ".reg.u32 %retval;\n"							\
+  "        .reg.u64 %hr10;\n"						\
+  "        .reg.u32 %r24;\n"						\
+  "        .reg.u32 %r25;\n"						\
+  "        .reg.pred %r27;\n"						\
+  "        .reg.u32 %r30;\n"						\
+  "        ld.param.u32 %ar1, [%in_ar1];\n"				\
+  "                mov.u32 %r24, %ar1;\n"				\
+  "                setp.ne.u32 %r27,%r24,4;\n"				\
+  "                set.u32.eq.u32 %r30,%r24,5;\n"			\
+  "                neg.s32 %r25, %r30;\n"				\
+  "        @%r27   bra     $L3;\n"					\
+  "                mov.u32 %r25, 1;\n"					\
+  "$L3:\n"								\
+  "                mov.u32 %retval, %r25;\n"				\
+  "        st.param.u32    [%out_retval], %retval;\n"			\
+  "        ret;\n"							\
+  "        }\n"								\
+  ".visible .func (.param.u32 %out_retval)acc_on_device_(.param.u64 %in_ar1);\n" \
+  ".visible .func (.param.u32 %out_retval)acc_on_device_(.param.u64 %in_ar1)\n" \
+  "{\n"									\
+  "        .reg.u64 %ar1;\n"						\
+  ".reg.u32 %retval;\n"							\
+  "        .reg.u64 %hr10;\n"						\
+  "        .reg.u64 %r25;\n"						\
+  "        .reg.u32 %r26;\n"						\
+  "        .reg.u32 %r27;\n"						\
+  "        ld.param.u64 %ar1, [%in_ar1];\n"				\
+  "                mov.u64 %r25, %ar1;\n"				\
+  "                ld.u32  %r26, [%r25];\n"				\
+  "        {\n"								\
+  "                .param.u32 %retval_in;\n"				\
+  "        {\n"								\
+  "                .param.u32 %out_arg0;\n"				\
+  "                st.param.u32 [%out_arg0], %r26;\n"			\
+  "                call (%retval_in), acc_on_device, (%out_arg0);\n"	\
+  "        }\n"								\
+  "                ld.param.u32    %r27, [%retval_in];\n"		\
+  "}\n"									\
+  "                mov.u32 %retval, %r27;\n"				\
+  "        st.param.u32    [%out_retval], %retval;\n"			\
+  "        ret;\n"							\
+  "        }"
+
+static void
+link_ptx (CUmodule *module, char *ptx_code)
+{
+  CUjit_option opts[7];
+  void *optvals[7];
+  float elapsed = 0.0;
+#define LOGSIZE 8192
+  char elog[LOGSIZE];
+  char ilog[LOGSIZE];
+  unsigned long logsize = LOGSIZE;
+  CUlinkState linkstate;
+  CUresult r;
+  void *linkout;
+  size_t linkoutsize __attribute__((unused));
+
+  GOMP_PLUGIN_notify ("attempting to load:\n---\n%s\n---\n", ptx_code);
+
+  opts[0] = CU_JIT_WALL_TIME;
+  optvals[0] = &elapsed;
+
+  opts[1] = CU_JIT_INFO_LOG_BUFFER;
+  optvals[1] = &ilog[0];
+
+  opts[2] = CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES;
+  optvals[2] = (void *) logsize;
+
+  opts[3] = CU_JIT_ERROR_LOG_BUFFER;
+  optvals[3] = &elog[0];
+
+  opts[4] = CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES;
+  optvals[4] = (void *) logsize;
+
+  opts[5] = CU_JIT_LOG_VERBOSE;
+  optvals[5] = (void *) 1;
+
+  opts[6] = CU_JIT_TARGET;
+  optvals[6] = (void *) CU_TARGET_COMPUTE_30;
+
+  r = cuLinkCreate (7, opts, optvals, &linkstate);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuLinkCreate error: %s", cuda_error (r));
+
+  char *abort_ptx = ABORT_PTX;
+  r = cuLinkAddData (linkstate, CU_JIT_INPUT_PTX, abort_ptx,
+		     strlen (abort_ptx) + 1, 0, 0, 0, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      GOMP_PLUGIN_error ("Link error log %s\n", &elog[0]);
+      GOMP_PLUGIN_fatal ("cuLinkAddData (abort) error: %s", cuda_error (r));
+    }
+
+  char *acc_on_device_ptx = ACC_ON_DEVICE_PTX;
+  r = cuLinkAddData (linkstate, CU_JIT_INPUT_PTX, acc_on_device_ptx,
+		     strlen (acc_on_device_ptx) + 1, 0, 0, 0, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      GOMP_PLUGIN_error ("Link error log %s\n", &elog[0]);
+      GOMP_PLUGIN_fatal ("cuLinkAddData (acc_on_device) error: %s",
+			 cuda_error (r));
+    }
+
+  r = cuLinkAddData (linkstate, CU_JIT_INPUT_PTX, ptx_code,
+		     strlen (ptx_code) + 1, 0, 0, 0, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      GOMP_PLUGIN_error ("Link error log %s\n", &elog[0]);
+      GOMP_PLUGIN_fatal ("cuLinkAddData (ptx_code) error: %s", cuda_error (r));
+    }
+
+  r = cuLinkComplete (linkstate, &linkout, &linkoutsize);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuLinkComplete error: %s", cuda_error (r));
+
+  GOMP_PLUGIN_notify ("Link complete: %fms\n", elapsed);
+  GOMP_PLUGIN_notify ("Link log %s\n", &ilog[0]);
+
+  r = cuModuleLoadData (module, linkout);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuModuleLoadData error: %s", cuda_error (r));
+}
+
+static void
+event_gc (bool memmap_lockable)
+{
+  struct PTX_event *e = PTX_events;
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  GOMP_PLUGIN_mutex_lock (&PTX_event_lock);
+
+  while (e != NULL)
+    {
+      CUresult r;
+
+      if (e->ord != nvthd->ptx_dev->ord)
+	{
+	  e = e->next;
+	  continue;
+	}
+
+      r = cuEventQuery (*e->evt);
+      if (r == CUDA_SUCCESS)
+	{
+	  CUevent *te;
+
+	  te = e->evt;
+
+	  switch (e->type)
+	    {
+	    case PTX_EVT_MEM:
+	    case PTX_EVT_SYNC:
+	      break;
+
+	    case PTX_EVT_KNL:
+	      map_pop (e->addr);
+	      break;
+
+	    case PTX_EVT_ASYNC_CLEANUP:
+	      {
+		/* The function gomp_plugin_async_unmap_vars needs to claim the
+		   memory-map splay tree lock for the current device, so we
+		   can't call it when one of our callers has already claimed
+		   the lock.  In that case, just delay the GC for this event
+		   until later.  */
+		if (!memmap_lockable)
+		  {
+		    e = e->next;
+		    continue;
+		  }
+
+		GOMP_PLUGIN_async_unmap_vars (e->addr);
+	      }
+	      break;
+	    }
+
+	  cuEventDestroy (*te);
+	  free ((void *)te);
+
+	  struct PTX_event *next = e->next;
+
+	  if (PTX_events == e)
+	    PTX_events = PTX_events->next;
+	  else
+	    {
+	      struct PTX_event *e_ = PTX_events;
+	      while (e_->next != e)
+		e_ = e_->next;
+	      e_->next = e_->next->next;
+	    }
+
+	  free (e);
+	  e = next;
+        }
+      else
+	e = e->next;
+    }
+
+  GOMP_PLUGIN_mutex_unlock (&PTX_event_lock);
+}
+
+static void
+event_add (enum PTX_event_type type, CUevent *e, void *h)
+{
+  struct PTX_event *ptx_event;
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  assert (type == PTX_EVT_MEM || type == PTX_EVT_KNL || type == PTX_EVT_SYNC
+	  || type == PTX_EVT_ASYNC_CLEANUP);
+
+  ptx_event = GOMP_PLUGIN_malloc (sizeof (struct PTX_event));
+  ptx_event->type = type;
+  ptx_event->evt = e;
+  ptx_event->addr = h;
+  ptx_event->ord = nvthd->ptx_dev->ord;
+
+  GOMP_PLUGIN_mutex_lock (&PTX_event_lock);
+
+  ptx_event->next = PTX_events;
+  PTX_events = ptx_event;
+
+  GOMP_PLUGIN_mutex_unlock (&PTX_event_lock);
+}
+
+void
+PTX_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
+	  size_t *sizes, unsigned short *kinds, int num_gangs, int num_workers,
+	  int vector_length, int async, void *targ_mem_desc)
+{
+  struct targ_fn_descriptor *targ_fn = (struct targ_fn_descriptor *) fn;
+  CUfunction function;
+  CUresult r;
+  int i;
+  struct ptx_stream *dev_str;
+  void *kargs[1];
+  void *hp, *dp;
+  unsigned int nthreads_in_block;
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  function = targ_fn->fn;
+
+  dev_str = select_stream_for_async (async, pthread_self (), false, NULL);
+  assert (dev_str == nvthd->current_stream);
+
+  /* This reserves a chunk of a pre-allocated page of memory mapped on both
+     the host and the device. HP is a host pointer to the new chunk, and DP is
+     the corresponding device pointer.  */
+  map_push (dev_str, async, mapnum * sizeof (void *), &hp, &dp);
+
+  GOMP_PLUGIN_notify ("  %s: prepare mappings\n", __FUNCTION__);
+
+  /* Copy the array of arguments to the mapped page.  */
+  for (i = 0; i < mapnum; i++)
+    ((void **) hp)[i] = devaddrs[i];
+
+  /* Copy the (device) pointers to arguments to the device (dp and hp might in
+     fact have the same value on a unified-memory system).  */
+  r = cuMemcpy ((CUdeviceptr)dp, (CUdeviceptr)hp, mapnum * sizeof (void *));
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemcpy failed: %s", cuda_error (r));
+
+  GOMP_PLUGIN_notify ("  %s: kernel %s: launch\n", __FUNCTION__, targ_fn->name);
+
+  // XXX: possible geometry mappings??
+  //
+  // OpenACC		CUDA
+  //
+  // num_gangs		blocks
+  // num_workers	warps (where a warp is equivalent to 32 threads)
+  // vector length	threads
+  //
+
+  /* The openacc vector_length clause 'determines the vector length to use for
+     vector or SIMD operations'.  The question is how to map this to CUDA.
+
+     In CUDA, the warp size is the vector length of a CUDA device.  However, the
+     CUDA interface abstracts that away and exposes the warp size only
+     indirectly, via the maximum number of threads per block, which is the
+     warp size multiplied by the number of hyperthreads of a multiprocessor.
+
+     We choose to map openacc vector_length directly onto the number of threads
+     in a block, in the x dimension.  This is reflected in gcc code generation
+     that uses threadIdx.x to access vector elements.
+
+     Attempting to use an openacc vector_length of more than the maximum number
+     of threads per block will result in a cuda error.  */
+  nthreads_in_block = vector_length;
+
+  kargs[0] = &dp;
+  r = cuLaunchKernel (function,
+			1, 1, 1,
+			nthreads_in_block, 1, 1,
+			0, dev_str->stream, kargs, 0);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuLaunchKernel error: %s", cuda_error (r));
+
+#ifndef DISABLE_ASYNC
+  if (async < acc_async_noval)
+    {
+      r = cuStreamSynchronize (dev_str->stream);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuStreamSynchronize error: %s", cuda_error (r));
+    }
+  else
+    {
+      CUevent *e;
+
+      e = (CUevent *)GOMP_PLUGIN_malloc (sizeof (CUevent));
+
+      r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuEventCreate error: %s", cuda_error (r));
+
+      event_gc (true);
+
+      r = cuEventRecord (*e, dev_str->stream);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuda_error (r));
+
+      event_add (PTX_EVT_KNL, e, (void *)dev_str);
+    }
+#else
+  r = cuCtxSynchronize ();
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuCtxSynchronize error: %s", cuda_error (r));
+#endif
+
+  GOMP_PLUGIN_notify ("  %s: kernel %s: finished\n", __FUNCTION__,
+		      targ_fn->name);
+
+#ifndef DISABLE_ASYNC
+  if (async < acc_async_noval)
+#endif
+    map_pop (dev_str);
+}
+
+void * openacc_get_current_cuda_context (void);
+
+static void *
+PTX_alloc (size_t s)
+{
+  CUdeviceptr d;
+  CUresult r;
+
+  r = cuMemAlloc (&d, s);
+  if (r == CUDA_ERROR_OUT_OF_MEMORY)
+    return 0;
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemAlloc error: %s", cuda_error (r));
+  return (void *)d;
+}
+
+static void
+PTX_free (void *p)
+{
+  CUresult r;
+  CUdeviceptr pb;
+  size_t ps;
+
+  r = cuMemGetAddressRange (&pb, &ps, (CUdeviceptr)p);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemGetAddressRange error: %s", cuda_error (r));
+
+  if ((CUdeviceptr)p != pb)
+    GOMP_PLUGIN_fatal ("invalid device address");
+
+  r = cuMemFree ((CUdeviceptr)p);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemFree error: %s", cuda_error (r));
+}
+
+static void *
+PTX_host2dev (void *d, const void *h, size_t s)
+{
+  CUresult r;
+  CUdeviceptr pb;
+  size_t ps;
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  if (!s)
+    return 0;
+
+  if (!d)
+    GOMP_PLUGIN_fatal ("invalid device address");
+
+  r = cuMemGetAddressRange (&pb, &ps, (CUdeviceptr)d);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemGetAddressRange error: %s", cuda_error (r));
+
+  if (!pb)
+    GOMP_PLUGIN_fatal ("invalid device address");
+
+  if (!h)
+    GOMP_PLUGIN_fatal ("invalid host address");
+
+  if (d == h)
+    GOMP_PLUGIN_fatal ("invalid host or device address");
+
+  if ((void *)(d + s) > (void *)(pb + ps))
+    GOMP_PLUGIN_fatal ("invalid size");
+
+#ifndef DISABLE_ASYNC
+  if (nvthd->current_stream != nvthd->ptx_dev->null_stream)
+    {
+      CUevent *e;
+
+      e = (CUevent *)GOMP_PLUGIN_malloc (sizeof (CUevent));
+
+      r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuEventCreate error: %s", cuda_error (r));
+
+      event_gc (false);
+
+      r = cuMemcpyHtoDAsync ((CUdeviceptr)d, h, s,
+			     nvthd->current_stream->stream);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuMemcpyHtoDAsync error: %s", cuda_error (r));
+
+      r = cuEventRecord (*e, nvthd->current_stream->stream);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuda_error (r));
+
+      event_add (PTX_EVT_MEM, e, (void *)h);
+    }
+  else
+#endif
+    {
+      r = cuMemcpyHtoD ((CUdeviceptr)d, h, s);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuMemcpyHtoD error: %s", cuda_error (r));
+    }
+
+  return 0;
+}
+
+static void *
+PTX_dev2host (void *h, const void *d, size_t s)
+{
+  CUresult r;
+  CUdeviceptr pb;
+  size_t ps;
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  if (!s)
+    return 0;
+
+  if (!d)
+    GOMP_PLUGIN_fatal ("invalid device address");
+
+  r = cuMemGetAddressRange (&pb, &ps, (CUdeviceptr)d);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemGetAddressRange error: %s", cuda_error (r));
+
+  if (!pb)
+    GOMP_PLUGIN_fatal ("invalid device address");
+
+  if (!h)
+    GOMP_PLUGIN_fatal ("invalid host address");
+
+  if (d == h)
+    GOMP_PLUGIN_fatal ("invalid host or device address");
+
+  if ((void *)(d + s) > (void *)(pb + ps))
+    GOMP_PLUGIN_fatal ("invalid size");
+
+#ifndef DISABLE_ASYNC
+  if (nvthd->current_stream != nvthd->ptx_dev->null_stream)
+    {
+      CUevent *e;
+
+      e = (CUevent *)GOMP_PLUGIN_malloc (sizeof (CUevent));
+
+      r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuEventCreate error: %s", cuda_error (r));
+
+      event_gc (false);
+
+      r = cuMemcpyDtoHAsync (h, (CUdeviceptr)d, s,
+			     nvthd->current_stream->stream);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuMemcpyDtoHAsync error: %s", cuda_error (r));
+
+      r = cuEventRecord (*e, nvthd->current_stream->stream);
+      if (r != CUDA_SUCCESS)
+        GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuda_error (r));
+
+      event_add (PTX_EVT_MEM, e, (void *)h);
+    }
+  else
+#endif
+    {
+      r = cuMemcpyDtoH (h, (CUdeviceptr)d, s);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuMemcpyDtoH error: %s", cuda_error (r));
+    }
+
+  return 0;
+}
+
+static void
+PTX_set_async (int async)
+{
+  struct nvptx_thread *nvthd = nvptx_thread ();
+  nvthd->current_stream
+    = select_stream_for_async (async, pthread_self (), true, NULL);
+}
+
+static int
+PTX_async_test (int async)
+{
+  CUresult r;
+  struct ptx_stream *s;
+
+  s = select_stream_for_async (async, pthread_self (), false, NULL);
+
+  if (!s)
+    GOMP_PLUGIN_fatal ("unknown async %d", async);
+
+  r = cuStreamQuery (s->stream);
+  if (r == CUDA_SUCCESS)
+    {
+      /* The oacc-parallel.c:goacc_wait function calls this hook to determine
+	 whether all work has completed on this stream, and if so omits the call
+	 to the wait hook.  If that happens, event_gc might not get called
+	 (which prevents variables from getting unmapped and their associated
+	 device storage freed), so call it here.  */
+      event_gc (true);
+      return 1;
+    }
+  else if (r == CUDA_ERROR_NOT_READY)
+    return 0;
+
+  GOMP_PLUGIN_fatal ("cuStreamQuery error: %s", cuda_error (r));
+
+  return 0;
+}
+
+static int
+PTX_async_test_all (void)
+{
+  struct ptx_stream *s;
+  pthread_t self = pthread_self ();
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  GOMP_PLUGIN_mutex_lock (&nvthd->ptx_dev->stream_lock);
+
+  for (s = nvthd->ptx_dev->active_streams; s != NULL; s = s->next)
+    {
+      if ((s->multithreaded || pthread_equal (s->host_thread, self))
+	  && cuStreamQuery (s->stream) == CUDA_ERROR_NOT_READY)
+	{
+	  GOMP_PLUGIN_mutex_unlock (&nvthd->ptx_dev->stream_lock);
+	  return 0;
+	}
+    }
+
+  GOMP_PLUGIN_mutex_unlock (&nvthd->ptx_dev->stream_lock);
+
+  event_gc (true);
+
+  return 1;
+}
+
+static void
+PTX_wait (int async)
+{
+  CUresult r;
+  struct ptx_stream *s;
+
+  s = select_stream_for_async (async, pthread_self (), false, NULL);
+
+  if (!s)
+    GOMP_PLUGIN_fatal ("unknown async %d", async);
+
+  r = cuStreamSynchronize (s->stream);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuStreamSynchronize error: %s", cuda_error (r));
+
+  event_gc (true);
+}
+
+static void
+PTX_wait_async (int async1, int async2)
+{
+  CUresult r;
+  CUevent *e;
+  struct ptx_stream *s1, *s2;
+  pthread_t self = pthread_self ();
+
+  /* The stream that is waiting (rather than being waited for) doesn't
+     necessarily have to exist already.  */
+  s2 = select_stream_for_async (async2, self, true, NULL);
+
+  s1 = select_stream_for_async (async1, self, false, NULL);
+  if (!s1)
+    GOMP_PLUGIN_fatal ("unknown async %d", async1);
+
+  if (s1 == s2)
+    GOMP_PLUGIN_fatal ("identical parameters");
+
+  e = (CUevent *)GOMP_PLUGIN_malloc (sizeof (CUevent));
+
+  r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuEventCreate error: %s", cuda_error (r));
+
+  event_gc (true);
+
+  r = cuEventRecord (*e, s1->stream);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuda_error (r));
+
+  event_add (PTX_EVT_SYNC, e, NULL);
+
+  r = cuStreamWaitEvent (s2->stream, *e, 0);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuStreamWaitEvent error: %s", cuda_error (r));
+}
+
+static void
+PTX_wait_all (void)
+{
+  CUresult r;
+  struct ptx_stream *s;
+  pthread_t self = pthread_self ();
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  GOMP_PLUGIN_mutex_lock (&nvthd->ptx_dev->stream_lock);
+
+  /* Wait for active streams initiated by this thread (or by multiple threads)
+     to complete.  */
+  for (s = nvthd->ptx_dev->active_streams; s != NULL; s = s->next)
+    {
+      if (s->multithreaded || pthread_equal (s->host_thread, self))
+	{
+	  r = cuStreamQuery (s->stream);
+	  if (r == CUDA_SUCCESS)
+	    continue;
+	  else if (r != CUDA_ERROR_NOT_READY)
+	    GOMP_PLUGIN_fatal ("cuStreamQuery error: %s", cuda_error (r));
+
+	  r = cuStreamSynchronize (s->stream);
+	  if (r != CUDA_SUCCESS)
+	    GOMP_PLUGIN_fatal ("cuStreamSynchronize error: %s", cuda_error (r));
+	}
+    }
+
+  GOMP_PLUGIN_mutex_unlock (&nvthd->ptx_dev->stream_lock);
+
+  event_gc (true);
+}
+
+static void
+PTX_wait_all_async (int async)
+{
+  CUresult r;
+  struct ptx_stream *waiting_stream, *other_stream;
+  CUevent *e;
+  struct nvptx_thread *nvthd = nvptx_thread ();
+  pthread_t self = pthread_self ();
+
+  /* The stream doing the waiting.  This could be the first mention of the
+     stream, so create it if necessary.  */
+  waiting_stream
+    = select_stream_for_async (async, pthread_self (), true, NULL);
+
+  /* Launches on the null stream already block on other streams in the
+     context.  */
+  if (!waiting_stream || waiting_stream == nvthd->ptx_dev->null_stream)
+    return;
+
+  event_gc (true);
+
+  GOMP_PLUGIN_mutex_lock (&nvthd->ptx_dev->stream_lock);
+
+  for (other_stream = nvthd->ptx_dev->active_streams;
+       other_stream != NULL;
+       other_stream = other_stream->next)
+    {
+      if (!other_stream->multithreaded
+	  && !pthread_equal (other_stream->host_thread, self))
+	continue;
+
+      e = (CUevent *) GOMP_PLUGIN_malloc (sizeof (CUevent));
+
+      r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuEventCreate error: %s", cuda_error (r));
+
+      /* Record an event on the waited-for stream.  */
+      r = cuEventRecord (*e, other_stream->stream);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuda_error (r));
+
+      event_add (PTX_EVT_SYNC, e, NULL);
+
+      r = cuStreamWaitEvent (waiting_stream->stream, *e, 0);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuStreamWaitEvent error: %s", cuda_error (r));
+    }
+
+  GOMP_PLUGIN_mutex_unlock (&nvthd->ptx_dev->stream_lock);
+}
+
+static void *
+PTX_get_current_cuda_device (void)
+{
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  if (!nvthd || !nvthd->ptx_dev)
+    return NULL;
+
+  return &nvthd->ptx_dev->dev;
+}
+
+static void *
+PTX_get_current_cuda_context (void)
+{
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  if (!nvthd || !nvthd->ptx_dev)
+    return NULL;
+
+  return nvthd->ptx_dev->ctx;
+}
+
+static void *
+PTX_get_cuda_stream (int async)
+{
+  struct ptx_stream *s;
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  if (!nvthd || !nvthd->ptx_dev)
+    return NULL;
+
+  s = select_stream_for_async (async, pthread_self (), false, NULL);
+
+  return s ? s->stream : NULL;
+}
+
+static int
+PTX_set_cuda_stream (int async, void *stream)
+{
+  struct ptx_stream *oldstream;
+  pthread_t self = pthread_self ();
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  GOMP_PLUGIN_mutex_lock (&nvthd->ptx_dev->stream_lock);
+
+  if (async < 0)
+    GOMP_PLUGIN_fatal ("bad async %d", async);
+
+  /* We have a list of active streams and an array mapping async values to
+     entries of that list.  We need to take "ownership" of the passed-in stream,
+     and add it to our list, removing the previous entry also (if there was one)
+     in order to prevent resource leaks.  Note the potential for surprise
+     here: maybe we should keep track of passed-in streams and leave it up to
+     the user to tidy those up, but that doesn't work for stream handles
+     returned from acc_get_cuda_stream above...  */
+
+  oldstream = select_stream_for_async (async, self, false, NULL);
+
+  if (oldstream)
+    {
+      if (nvthd->ptx_dev->active_streams == oldstream)
+	nvthd->ptx_dev->active_streams = nvthd->ptx_dev->active_streams->next;
+      else
+	{
+	  struct ptx_stream *s = nvthd->ptx_dev->active_streams;
+	  while (s->next != oldstream)
+	    s = s->next;
+	  s->next = s->next->next;
+	}
+
+      cuStreamDestroy (oldstream->stream);
+      map_fini (oldstream);
+      free (oldstream);
+    }
+
+  GOMP_PLUGIN_mutex_unlock (&nvthd->ptx_dev->stream_lock);
+
+  (void) select_stream_for_async (async, self, true, (CUstream) stream);
+
+  return 1;
+}
+
+/* Plugin entry points.  */
+
+
+int
+GOMP_OFFLOAD_get_type (void)
+{
+  return OFFLOAD_TARGET_TYPE_NVIDIA_PTX;
+}
+
+unsigned int
+GOMP_OFFLOAD_get_caps (void)
+{
+  return TARGET_CAP_OPENACC_200;
+}
+
+const char *
+GOMP_OFFLOAD_get_name (void)
+{
+  return "nvidia";
+}
+
+int
+GOMP_OFFLOAD_get_num_devices (void)
+{
+  return PTX_get_num_devices ();
+}
+
+static void **kernel_target_data;
+static void **kernel_host_table;
+
+void
+GOMP_OFFLOAD_register_image (void *host_table, void *target_data)
+{
+  kernel_target_data = target_data;
+  kernel_host_table = host_table;
+}
+
+void
+GOMP_OFFLOAD_init_device (int n __attribute__((unused)))
+{
+  (void) PTX_init ();
+}
+
+void
+GOMP_OFFLOAD_fini_device (int n __attribute__((unused)))
+{
+  PTX_fini ();
+}
+
+int
+GOMP_OFFLOAD_get_table (int n __attribute__((unused)),
+			struct mapping_table **tablep)
+{
+  CUmodule module;
+  void **fn_table;
+  char **fn_names;
+  int fn_entries, i;
+  CUresult r;
+  struct targ_fn_descriptor *targ_fns;
+
+  if (PTX_init () <= 0)
+    return 0;
+
+  /* This isn't an error, because an image may legitimately have no offloaded
+     regions and so will not call GOMP_offload_register.  */
+  if (kernel_target_data == NULL)
+    return 0;
+
+  link_ptx (&module, kernel_target_data[0]);
+
+  /* kernel_target_data[0] -> ptx code
+     kernel_target_data[1] -> variable mappings
+     kernel_target_data[2] -> array of kernel names in ascii
+
+     kernel_host_table[0] -> start of function addresses (_omp_func_table)
+     kernel_host_table[1] -> end of function addresses (_omp_funcs_end)
+
+     The array of kernel names and the function addresses form a
+     one-to-one correspondence.  */
+
+  fn_table = kernel_host_table[0];
+  fn_names = (char **) kernel_target_data[2];
+  fn_entries = (kernel_host_table[1] - kernel_host_table[0]) / sizeof (void *);
+
+  *tablep = GOMP_PLUGIN_malloc (sizeof (struct mapping_table) * fn_entries);
+  targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor)
+				 * fn_entries);
+
+  for (i = 0; i < fn_entries; i++)
+    {
+      CUfunction function;
+
+      r = cuModuleGetFunction (&function, module, fn_names[i]);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuModuleGetFunction error: %s", cuda_error (r));
+
+      targ_fns[i].fn = function;
+      targ_fns[i].name = (const char *) fn_names[i];
+
+      (*tablep)[i].host_start = (uintptr_t) fn_table[i];
+      (*tablep)[i].host_end = (*tablep)[i].host_start + 1;
+      (*tablep)[i].tgt_start = (uintptr_t) &targ_fns[i];
+      (*tablep)[i].tgt_end = (*tablep)[i].tgt_start + 1;
+    }
+
+  return fn_entries;
+}
+
+void *
+GOMP_OFFLOAD_alloc (int n __attribute__((unused)), size_t size)
+{
+  return PTX_alloc (size);
+}
+
+void
+GOMP_OFFLOAD_free (int n __attribute__((unused)), void *ptr)
+{
+  PTX_free (ptr);
+}
+
+void *
+GOMP_OFFLOAD_dev2host (int ord __attribute__((unused)), void *dst,
+		       const void *src, size_t n)
+{
+  return PTX_dev2host (dst, src, n);
+}
+
+void *
+GOMP_OFFLOAD_host2dev (int ord __attribute__((unused)), void *dst,
+		       const void *src, size_t n)
+{
+  return PTX_host2dev (dst, src, n);
+}
+
+void (*device_run) (void *fn_ptr, void *vars) = NULL;
+
+void
+GOMP_OFFLOAD_openacc_parallel (void (*fn) (void *), size_t mapnum,
+			      void **hostaddrs, void **devaddrs, size_t *sizes,
+			      unsigned short *kinds, int num_gangs,
+			      int num_workers, int vector_length, int async,
+			      void *targ_mem_desc)
+{
+  PTX_exec (fn, mapnum, hostaddrs, devaddrs, sizes, kinds, num_gangs,
+	    num_workers, vector_length, async, targ_mem_desc);
+}
+
+void *
+GOMP_OFFLOAD_openacc_open_device (int n)
+{
+  return PTX_open_device (n);
+}
+
+int
+GOMP_OFFLOAD_openacc_close_device (void *h)
+{
+  return PTX_close_device (h);
+}
+
+void
+GOMP_OFFLOAD_openacc_set_device_num (int n)
+{
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  assert (n >= 0);
+
+  if (!nvthd->ptx_dev || nvthd->ptx_dev->ord != n)
+    (void) PTX_open_device (n);
+}
+
+/* This can be called before the device is "opened" for the current thread, in
+   which case we can't tell which device number should be returned.  We don't
+   actually want to open the device here, so just return -1 and let the caller
+   (oacc-init.c:acc_get_device_num) handle it.  */
+
+int
+GOMP_OFFLOAD_openacc_get_device_num (void)
+{
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  if (nvthd && nvthd->ptx_dev)
+    return nvthd->ptx_dev->ord;
+  else
+    return -1;
+}
+
+void
+GOMP_OFFLOAD_openacc_register_async_cleanup (void *targ_mem_desc)
+{
+  CUevent *e;
+  CUresult r;
+  struct nvptx_thread *nvthd = nvptx_thread ();
+
+  e = (CUevent *) GOMP_PLUGIN_malloc (sizeof (CUevent));
+
+  r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuEventCreate error: %s", cuda_error (r));
+
+  r = cuEventRecord (*e, nvthd->current_stream->stream);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuda_error (r));
+
+  event_add (PTX_EVT_ASYNC_CLEANUP, e, targ_mem_desc);
+}
+
+int
+GOMP_OFFLOAD_openacc_async_test (int async)
+{
+  return PTX_async_test (async);
+}
+
+int
+GOMP_OFFLOAD_openacc_async_test_all (void)
+{
+  return PTX_async_test_all ();
+}
+
+void
+GOMP_OFFLOAD_openacc_async_wait (int async)
+{
+  PTX_wait (async);
+}
+
+void
+GOMP_OFFLOAD_openacc_async_wait_async (int async1, int async2)
+{
+  PTX_wait_async (async1, async2);
+}
+
+void
+GOMP_OFFLOAD_openacc_async_wait_all (void)
+{
+  PTX_wait_all ();
+}
+
+void
+GOMP_OFFLOAD_openacc_async_wait_all_async (int async)
+{
+  PTX_wait_all_async (async);
+}
+
+void
+GOMP_OFFLOAD_openacc_async_set_async (int async)
+{
+  PTX_set_async (async);
+}
+
+void *
+GOMP_OFFLOAD_openacc_create_thread_data (void *targ_data)
+{
+  struct ptx_device *ptx_dev = (struct ptx_device *) targ_data;
+  struct nvptx_thread *nvthd
+    = GOMP_PLUGIN_malloc (sizeof (struct nvptx_thread));
+  CUresult r;
+  CUcontext thd_ctx;
+
+  r = cuCtxGetCurrent (&thd_ctx);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuCtxGetCurrent error: %s", cuda_error (r));
+
+  assert (ptx_dev->ctx);
+
+  if (!thd_ctx)
+    {
+      r = cuCtxPushCurrent (ptx_dev->ctx);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuCtxPushCurrent error: %s", cuda_error (r));
+    }
+
+  nvthd->current_stream = ptx_dev->null_stream;
+  nvthd->ptx_dev = ptx_dev;
+
+  return (void *) nvthd;
+}
+
+void
+GOMP_OFFLOAD_openacc_destroy_thread_data (void *data)
+{
+  free (data);
+}
+
+void *
+GOMP_OFFLOAD_openacc_get_current_cuda_device (void)
+{
+  return PTX_get_current_cuda_device ();
+}
+
+void *
+GOMP_OFFLOAD_openacc_get_current_cuda_context (void)
+{
+  return PTX_get_current_cuda_context ();
+}
+
+/* NOTE: This returns a CUstream, not a ptx_stream pointer.  */
+
+void *
+GOMP_OFFLOAD_openacc_get_cuda_stream (int async)
+{
+  return PTX_get_cuda_stream (async);
+}
+
+/* NOTE: This takes a CUstream, not a ptx_stream pointer.  */
+
+int
+GOMP_OFFLOAD_openacc_set_cuda_stream (int async, void *stream)
+{
+  return PTX_set_cuda_stream (async, stream);
+}
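The event bookkeeping in event_add and event_gc above follows a common pattern. Each asynchronous operation pushes a record onto a global singly linked list, and a later collection pass walks the list, retires the records whose events have completed, and leaves ASYNC_CLEANUP records queued when the caller already holds the memory-map lock. What follows is a minimal host-only sketch of that pattern under stated assumptions, not the plugin's code. The `sim_event` type, its `completed` flag (a stand-in for `cuEventQuery` returning `CUDA_SUCCESS`) and the `sim_*` functions are hypothetical. The mutex is omitted. Unlinking uses a pointer-to-pointer walk rather than the patch's second scan for the predecessor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for enum PTX_event_type and struct PTX_event.  */
enum sim_event_type
{
  SIM_EVT_MEM, SIM_EVT_KNL, SIM_EVT_SYNC, SIM_EVT_ASYNC_CLEANUP
};

struct sim_event
{
  enum sim_event_type type;
  bool completed;               /* Models cuEventQuery () == CUDA_SUCCESS.  */
  struct sim_event *next;
};

static struct sim_event *sim_events;    /* Head of the pending list.  */
static int cleanups_run;                /* ASYNC_CLEANUP records retired.  */

/* Push a record onto the head of the list, as event_add does (minus the
   mutex).  */
static struct sim_event *
sim_event_add (enum sim_event_type type)
{
  struct sim_event *e = malloc (sizeof *e);

  assert (e != NULL);
  e->type = type;
  e->completed = false;
  e->next = sim_events;
  sim_events = e;
  return e;
}

/* Free every completed record, except that ASYNC_CLEANUP records stay
   queued when MEMMAP_LOCKABLE is false, mirroring event_gc.  Returns the
   number of records freed.  */
static int
sim_event_gc (bool memmap_lockable)
{
  struct sim_event **link = &sim_events;
  int freed = 0;

  while (*link != NULL)
    {
      struct sim_event *e = *link;

      if (!e->completed
          || (e->type == SIM_EVT_ASYNC_CLEANUP && !memmap_lockable))
        {
          link = &e->next;      /* Keep this record for a later pass.  */
          continue;
        }

      if (e->type == SIM_EVT_ASYNC_CLEANUP)
        cleanups_run++;         /* GOMP_PLUGIN_async_unmap_vars goes here.  */

      *link = e->next;          /* Unlink in place; no predecessor scan.  */
      free (e);
      freed++;
    }

  return freed;
}

/* Number of records still pending.  */
static int
sim_event_count (void)
{
  int n = 0;

  for (const struct sim_event *e = sim_events; e != NULL; e = e->next)
    n++;
  return n;
}
```

For example, suppose a KNL record and an ASYNC_CLEANUP record are both marked complete. `sim_event_gc (false)` then frees only the KNL record, and a later `sim_event_gc (true)` retires the deferred cleanup. Passing false models callers such as PTX_host2dev and PTX_dev2host, which call `event_gc (false)`.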
diff --git a/libgomp/splay-tree.c b/libgomp/splay-tree.c
new file mode 100644
index 0000000..14b03ac
--- /dev/null
+++ b/libgomp/splay-tree.c
@@ -0,0 +1,224 @@
+/* A splay-tree datatype.
+   Copyright 1998-2013
+   Free Software Foundation, Inc.
+   Contributed by Mark Mitchell (mark@markmitchell.com).
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* The splay tree code is copied from include/splay-tree.h and adjusted
+   so that all the data lives directly in the splay_tree_node_s structure
+   and no extra allocations are needed.
+
+   Files including splay-tree.h should, before including it, add:
+typedef struct splay_tree_node_s *splay_tree_node;
+typedef struct splay_tree_s *splay_tree;
+typedef struct splay_tree_key_s *splay_tree_key;
+   define splay_tree_key_s structure, and define
+   splay_compare inline function.  */
+
+/* For an easily readable description of splay-trees, see:
+
+     Lewis, Harry R. and Denenberg, Larry.  Data Structures and Their
+     Algorithms.  Harper-Collins, Inc.  1991.
+
+   The major feature of splay trees is that all basic tree operations
+   are amortized O(log n) time for a tree with n nodes.  */
+
+#include "libgomp.h"
+#include "splay-tree.h"
+
+extern int splay_compare (splay_tree_key, splay_tree_key);
+
+/* Rotate the edge joining the left child N with its parent P.  PP is the
+   grandparent's pointer to P.  */
+
+static inline void
+rotate_left (splay_tree_node *pp, splay_tree_node p, splay_tree_node n)
+{
+  splay_tree_node tmp;
+  tmp = n->right;
+  n->right = p;
+  p->left = tmp;
+  *pp = n;
+}
+
+/* Rotate the edge joining the right child N with its parent P.  PP is the
+   grandparent's pointer to P.  */
+
+static inline void
+rotate_right (splay_tree_node *pp, splay_tree_node p, splay_tree_node n)
+{
+  splay_tree_node tmp;
+  tmp = n->left;
+  n->left = p;
+  p->right = tmp;
+  *pp = n;
+}
+
+/* Bottom up splay of KEY.  */
+
+static void
+splay_tree_splay (splay_tree sp, splay_tree_key key)
+{
+  if (sp->root == NULL)
+    return;
+
+  do {
+    int cmp1, cmp2;
+    splay_tree_node n, c;
+
+    n = sp->root;
+    cmp1 = splay_compare (key, &n->key);
+
+    /* Found.  */
+    if (cmp1 == 0)
+      return;
+
+    /* Left or right?  If no child, then we're done.  */
+    if (cmp1 < 0)
+      c = n->left;
+    else
+      c = n->right;
+    if (!c)
+      return;
+
+    /* Next one left or right?  If found or no child, we're done
+       after one rotation.  */
+    cmp2 = splay_compare (key, &c->key);
+    if (cmp2 == 0
+	|| (cmp2 < 0 && !c->left)
+	|| (cmp2 > 0 && !c->right))
+      {
+	if (cmp1 < 0)
+	  rotate_left (&sp->root, n, c);
+	else
+	  rotate_right (&sp->root, n, c);
+	return;
+      }
+
+    /* Now we have the four cases of double-rotation.  */
+    if (cmp1 < 0 && cmp2 < 0)
+      {
+	rotate_left (&n->left, c, c->left);
+	rotate_left (&sp->root, n, n->left);
+      }
+    else if (cmp1 > 0 && cmp2 > 0)
+      {
+	rotate_right (&n->right, c, c->right);
+	rotate_right (&sp->root, n, n->right);
+      }
+    else if (cmp1 < 0 && cmp2 > 0)
+      {
+	rotate_right (&n->left, c, c->right);
+	rotate_left (&sp->root, n, n->left);
+      }
+    else if (cmp1 > 0 && cmp2 < 0)
+      {
+	rotate_left (&n->right, c, c->left);
+	rotate_right (&sp->root, n, n->right);
+      }
+  } while (1);
+}
+
+/* Insert a new NODE into SP.  The NODE shouldn't exist in the tree.  */
+
+attribute_hidden void
+splay_tree_insert (splay_tree sp, splay_tree_node node)
+{
+  int comparison = 0;
+
+  splay_tree_splay (sp, &node->key);
+
+  if (sp->root)
+    comparison = splay_compare (&sp->root->key, &node->key);
+
+  if (sp->root && comparison == 0)
+    gomp_fatal ("Duplicate node");
+  else
+    {
+      /* Insert it at the root.  */
+      if (sp->root == NULL)
+	node->left = node->right = NULL;
+      else if (comparison < 0)
+	{
+	  node->left = sp->root;
+	  node->right = node->left->right;
+	  node->left->right = NULL;
+	}
+      else
+	{
+	  node->right = sp->root;
+	  node->left = node->right->left;
+	  node->right->left = NULL;
+	}
+
+      sp->root = node;
+    }
+}
+
+/* Remove node with KEY from SP.  It is not an error if it did not exist.  */
+
+attribute_hidden void
+splay_tree_remove (splay_tree sp, splay_tree_key key)
+{
+  splay_tree_splay (sp, key);
+
+  if (sp->root && splay_compare (&sp->root->key, key) == 0)
+    {
+      splay_tree_node left, right;
+
+      left = sp->root->left;
+      right = sp->root->right;
+
+      /* One of the children is now the root.  Doesn't matter much
+	 which, so long as we preserve the properties of the tree.  */
+      if (left)
+	{
+	  sp->root = left;
+
+	  /* If there was a right child as well, hang it off the
+	     right-most node of the left subtree.  */
+	  if (right)
+	    {
+	      while (left->right)
+		left = left->right;
+	      left->right = right;
+	    }
+	}
+      else
+	sp->root = right;
+    }
+}
+
+/* Look up KEY in SP, returning the matching node's key if present, and NULL
+   otherwise.  */
+
+attribute_hidden splay_tree_key
+splay_tree_lookup (splay_tree sp, splay_tree_key key)
+{
+  splay_tree_splay (sp, key);
+
+  if (sp->root && splay_compare (&sp->root->key, key) == 0)
+    return &sp->root->key;
+  else
+    return NULL;
+}
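The splay-tree routines that move into splay-tree.c above can be tried out in isolation. What follows is a rough stand-alone sketch and is not part of the patch. It keeps the same rotation, splay, insert and lookup logic, but makes two assumptions. First, it uses a cut-down node holding only a half-open host address range. Second, it assumes an ordering in which overlapping ranges compare equal; splay_compare in target.c is only partly visible in this patch. The names struct node, cmp, rot_left_child, rot_right_child, tree_splay, tree_insert and tree_lookup are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cut-down node: just the half-open host range [start, end).  */
struct node { uintptr_t start, end; struct node *left, *right; };

/* Assumed ordering: disjoint ranges order by address; overlapping ranges
   compare equal, so probing with [p, p + 1) finds the range holding p.  */
static int
cmp (const struct node *x, const struct node *y)
{
  return x->end <= y->start ? -1 : x->start >= y->end ? 1 : 0;
}

/* Rotate the edge from P to its left (right) child N; *PP pointed at P.  */
static void
rot_left_child (struct node **pp, struct node *p, struct node *n)
{
  p->left = n->right;
  n->right = p;
  *pp = n;
}

static void
rot_right_child (struct node **pp, struct node *p, struct node *n)
{
  p->right = n->left;
  n->left = p;
  *pp = n;
}

/* As in splay_tree_splay, bring K's node (or last probed) to the root.  */
static void
tree_splay (struct node **root, const struct node *k)
{
  while (*root)
    {
      struct node *n = *root;
      int c1 = cmp (k, n);
      struct node *c = c1 < 0 ? n->left : n->right;
      if (c1 == 0 || c == NULL)
        return;
      int c2 = cmp (k, c);
      /* C matches, or has no child towards K: one rotation finishes.  */
      if (c2 == 0 || (c2 < 0 && !c->left) || (c2 > 0 && !c->right))
        {
          (c1 < 0 ? rot_left_child : rot_right_child) (root, n, c);
          return;
        }
      /* Zig-zig or zig-zag: rotate C's edge first, then N's.  */
      if (c1 < 0 && c2 < 0)
        rot_left_child (&n->left, c, c->left);
      else if (c1 < 0)
        rot_right_child (&n->left, c, c->right);
      else if (c2 > 0)
        rot_right_child (&n->right, c, c->right);
      else
        rot_left_child (&n->right, c, c->left);
      if (c1 < 0)
        rot_left_child (root, n, n->left);
      else
        rot_right_child (root, n, n->right);
    }
}

/* Insert NODE at the root; -1 on overlap (the patch calls gomp_fatal).  */
static int
tree_insert (struct node **root, struct node *node)
{
  assert (node->start < node->end);  /* sketch: non-empty ranges only */
  tree_splay (root, node);
  struct node *r = *root;
  int c = r ? cmp (r, node) : 0;
  if (r && c == 0)
    return -1;
  node->left = node->right = NULL;
  if (r && c < 0)  /* R and its left subtree precede NODE.  */
    {
      node->left = r;
      node->right = r->right;
      r->right = NULL;
    }
  else if (r)      /* R and its right subtree follow NODE.  */
    {
      node->right = r;
      node->left = r->left;
      r->left = NULL;
    }
  *root = node;
  return 0;
}

/* Return the node whose range overlaps K, or NULL.  */
static struct node *
tree_lookup (struct node **root, const struct node *k)
{
  tree_splay (root, k);
  return *root && cmp (*root, k) == 0 ? *root : NULL;
}
```

Probing with a one-byte range [p, p + 1) finds whichever mapping contains p. gomp_map_vars uses the same trick when it looks up the target of a GOMP_MAP_POINTER entry. Every lookup also restructures the tree, which is why tree_lookup takes a pointer to the root rather than a const tree.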
diff --git a/libgomp/splay-tree.h b/libgomp/splay-tree.h
index eb8011a..f29d437 100644
--- a/libgomp/splay-tree.h
+++ b/libgomp/splay-tree.h
@@ -43,6 +43,30 @@ typedef struct splay_tree_key_s *splay_tree_key;
    The major feature of splay trees is that all basic tree operations
    are amortized O(log n) time for a tree with n nodes.  */
 
+#ifndef _SPLAY_TREE_H
+#define _SPLAY_TREE_H 1
+
+typedef struct splay_tree_node_s *splay_tree_node;
+typedef struct splay_tree_s *splay_tree;
+typedef struct splay_tree_key_s *splay_tree_key;
+
+struct splay_tree_key_s {
+  /* Address of the host object.  */
+  uintptr_t host_start;
+  /* Address immediately after the host object.  */
+  uintptr_t host_end;
+  /* Descriptor of the target memory.  */
+  struct target_mem_desc *tgt;
+  /* Offset from tgt->tgt_start to the start of the target object.  */
+  uintptr_t tgt_offset;
+  /* Reference count.  */
+  uintptr_t refcount;
+  /* Asynchronous reference count.  */
+  uintptr_t async_refcount;
+  /* True if data should be copied from device to host at the end.  */
+  bool copy_from;
+};
+
 /* The nodes in the splay tree.  */
 struct splay_tree_node_s {
   struct splay_tree_key_s key;
@@ -56,177 +80,8 @@ struct splay_tree_s {
   splay_tree_node root;
 };
 
-/* Rotate the edge joining the left child N with its parent P.  PP is the
-   grandparents' pointer to P.  */
-
-static inline void
-rotate_left (splay_tree_node *pp, splay_tree_node p, splay_tree_node n)
-{
-  splay_tree_node tmp;
-  tmp = n->right;
-  n->right = p;
-  p->left = tmp;
-  *pp = n;
-}
-
-/* Rotate the edge joining the right child N with its parent P.  PP is the
-   grandparents' pointer to P.  */
-
-static inline void
-rotate_right (splay_tree_node *pp, splay_tree_node p, splay_tree_node n)
-{
-  splay_tree_node tmp;
-  tmp = n->left;
-  n->left = p;
-  p->right = tmp;
-  *pp = n;
-}
-
-/* Bottom up splay of KEY.  */
-
-static void
-splay_tree_splay (splay_tree sp, splay_tree_key key)
-{
-  if (sp->root == NULL)
-    return;
-
-  do {
-    int cmp1, cmp2;
-    splay_tree_node n, c;
-
-    n = sp->root;
-    cmp1 = splay_compare (key, &n->key);
-
-    /* Found.  */
-    if (cmp1 == 0)
-      return;
-
-    /* Left or right?  If no child, then we're done.  */
-    if (cmp1 < 0)
-      c = n->left;
-    else
-      c = n->right;
-    if (!c)
-      return;
-
-    /* Next one left or right?  If found or no child, we're done
-       after one rotation.  */
-    cmp2 = splay_compare (key, &c->key);
-    if (cmp2 == 0
-	|| (cmp2 < 0 && !c->left)
-	|| (cmp2 > 0 && !c->right))
-      {
-	if (cmp1 < 0)
-	  rotate_left (&sp->root, n, c);
-	else
-	  rotate_right (&sp->root, n, c);
-	return;
-      }
-
-    /* Now we have the four cases of double-rotation.  */
-    if (cmp1 < 0 && cmp2 < 0)
-      {
-	rotate_left (&n->left, c, c->left);
-	rotate_left (&sp->root, n, n->left);
-      }
-    else if (cmp1 > 0 && cmp2 > 0)
-      {
-	rotate_right (&n->right, c, c->right);
-	rotate_right (&sp->root, n, n->right);
-      }
-    else if (cmp1 < 0 && cmp2 > 0)
-      {
-	rotate_right (&n->left, c, c->right);
-	rotate_left (&sp->root, n, n->left);
-      }
-    else if (cmp1 > 0 && cmp2 < 0)
-      {
-	rotate_left (&n->right, c, c->left);
-	rotate_right (&sp->root, n, n->right);
-      }
-  } while (1);
-}
-
-/* Insert a new NODE into SP.  The NODE shouldn't exist in the tree.  */
-
-static void
-splay_tree_insert (splay_tree sp, splay_tree_node node)
-{
-  int comparison = 0;
-
-  splay_tree_splay (sp, &node->key);
-
-  if (sp->root)
-    comparison = splay_compare (&sp->root->key, &node->key);
-
-  if (sp->root && comparison == 0)
-    abort ();
-  else
-    {
-      /* Insert it at the root.  */
-      if (sp->root == NULL)
-	node->left = node->right = NULL;
-      else if (comparison < 0)
-	{
-	  node->left = sp->root;
-	  node->right = node->left->right;
-	  node->left->right = NULL;
-	}
-      else
-	{
-	  node->right = sp->root;
-	  node->left = node->right->left;
-	  node->right->left = NULL;
-	}
-
-      sp->root = node;
-    }
-}
-
-/* Remove node with KEY from SP.  It is not an error if it did not exist.  */
-
-static void
-splay_tree_remove (splay_tree sp, splay_tree_key key)
-{
-  splay_tree_splay (sp, key);
-
-  if (sp->root && splay_compare (&sp->root->key, key) == 0)
-    {
-      splay_tree_node left, right;
-
-      left = sp->root->left;
-      right = sp->root->right;
-
-      /* One of the children is now the root.  Doesn't matter much
-	 which, so long as we preserve the properties of the tree.  */
-      if (left)
-	{
-	  sp->root = left;
-
-	  /* If there was a right child as well, hang it off the
-	     right-most leaf of the left child.  */
-	  if (right)
-	    {
-	      while (left->right)
-		left = left->right;
-	      left->right = right;
-	    }
-	}
-      else
-	sp->root = right;
-    }
-}
-
-/* Lookup KEY in SP, returning NODE if present, and NULL
-   otherwise.  */
-
-static splay_tree_key
-splay_tree_lookup (splay_tree sp, splay_tree_key key)
-{
-  splay_tree_splay (sp, key);
-
-  if (sp->root && splay_compare (&sp->root->key, key) == 0)
-    return &sp->root->key;
-  else
-    return NULL;
-}
+attribute_hidden splay_tree_key splay_tree_lookup (splay_tree, splay_tree_key);
+attribute_hidden void splay_tree_insert (splay_tree, splay_tree_node);
+attribute_hidden void splay_tree_remove (splay_tree, splay_tree_key);
+
+#endif /* _SPLAY_TREE_H */
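The reworked gomp_map_vars in target.c below reads each map kind through get_kind. The two models differ as follows:

- OpenACC passes an array of unsigned short kinds whose low 8 bits are the map type.
- OpenMP passes unsigned char kinds whose low 3 bits are the map type.

In both cases the remaining high bits hold log2 of the required alignment, hence typemask 0xff/0x7 and rshift 8/3.

The rough sketch below is not part of the patch. It decodes kinds that way and gives each newly mapped object an offset in one device block. kind_type, kind_align and layout are hypothetical helpers. The sketch deliberately ignores several things the real code also handles:

- variables that are already mapped;
- pointer-sized entries for GOMP_MAP_POINTER;
- the pointer slots reserved up front in some cases;
- the overall block alignment tgt_align.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Same shape as get_kind in the patch: OpenACC passes an array of
   unsigned short kinds, OpenMP an array of unsigned char kinds.  */
static int
get_kind (bool is_openacc, const void *kinds, size_t idx)
{
  return is_openacc ? ((const unsigned short *) kinds)[idx]
                    : ((const unsigned char *) kinds)[idx];
}

/* Hypothetical helper: the map type lives in the low bits (typemask).  */
static int
kind_type (bool is_openacc, int kind)
{
  return kind & (is_openacc ? 0xff : 0x7);
}

/* Hypothetical helper: the bits above the type (rshift) hold log2 of the
   required alignment.  */
static size_t
kind_align (bool is_openacc, int kind)
{
  int log2_align = kind >> (is_openacc ? 8 : 3);
  /* Refuse a shift as wide as size_t, which would be undefined.  */
  assert (log2_align < (int) (sizeof (size_t) * CHAR_BIT));
  return (size_t) 1 << log2_align;
}

/* Hypothetical helper: give each of N new objects an offset in one device
   block, rounding up to its alignment the way gomp_map_vars computes
   tgt_size.  Returns the total block size.  */
static size_t
layout (bool is_openacc, const void *kinds, const size_t *sizes, size_t n,
        size_t *offsets)
{
  size_t tgt_size = 0;
  for (size_t i = 0; i < n; i++)
    {
      size_t align = kind_align (is_openacc, get_kind (is_openacc, kinds, i));
      tgt_size = (tgt_size + align - 1) & ~(align - 1);
      offsets[i] = tgt_size;
      tgt_size += sizes[i];
    }
  return tgt_size;
}
```

For example, the OpenMP kind (3 << 3) | 3 decodes to type 3 with 8-byte alignment, so an object placed after a 5-byte one starts at offset 8. The kind values used here are arbitrary, not the GOMP_MAP_* constants from gomp-constants.h.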
diff --git a/libgomp/target.c b/libgomp/target.c
index 5b4873b..9345ac2 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -30,7 +30,12 @@
 #include <limits.h>
 #include <stdbool.h>
 #include <stdlib.h>
+#include "oacc-plugin.h"
+#include "gomp-constants.h"
+#include "oacc-int.h"
 #include <string.h>
+#include <stdio.h>
+#include <assert.h>
 
 #ifdef PLUGIN_SUPPORT
 #include <dlfcn.h>
@@ -40,50 +45,6 @@ static void gomp_target_init (void);
 
 static pthread_once_t gomp_is_initialized = PTHREAD_ONCE_INIT;
 
-/* Forward declaration for a node in the tree.  */
-typedef struct splay_tree_node_s *splay_tree_node;
-typedef struct splay_tree_s *splay_tree;
-typedef struct splay_tree_key_s *splay_tree_key;
-
-struct target_mem_desc {
-  /* Reference count.  */
-  uintptr_t refcount;
-  /* All the splay nodes allocated together.  */
-  splay_tree_node array;
-  /* Start of the target region.  */
-  uintptr_t tgt_start;
-  /* End of the targer region.  */
-  uintptr_t tgt_end;
-  /* Handle to free.  */
-  void *to_free;
-  /* Previous target_mem_desc.  */
-  struct target_mem_desc *prev;
-  /* Number of items in following list.  */
-  size_t list_count;
-
-  /* Corresponding target device descriptor.  */
-  struct gomp_device_descr *device_descr;
-
-  /* List of splay keys to remove (or decrease refcount)
-     at the end of region.  */
-  splay_tree_key list[];
-};
-
-struct splay_tree_key_s {
-  /* Address of the host object.  */
-  uintptr_t host_start;
-  /* Address immediately after the host object.  */
-  uintptr_t host_end;
-  /* Descriptor of the target memory.  */
-  struct target_mem_desc *tgt;
-  /* Offset from tgt->tgt_start to the start of the target object.  */
-  uintptr_t tgt_offset;
-  /* Reference count.  */
-  uintptr_t refcount;
-  /* True if data should be copied from device to host at the end.  */
-  bool copy_from;
-};
-
 /* This structure describes an offload image.
    It contains type of the target device, pointer to host table descriptor, and
    pointer to target data.  */
@@ -107,7 +68,7 @@ static int num_devices;
 
 /* The comparison function.  */
 
-static int
+attribute_hidden int
 splay_compare (splay_tree_key x, splay_tree_key y)
 {
   if (x->host_start == x->host_end
@@ -122,47 +83,16 @@ splay_compare (splay_tree_key x, splay_tree_key y)
 
 #include "splay-tree.h"
 
-/* This structure describes accelerator device.
-   It contains ID-number of the device, its type, function handlers for
-   interaction with the device, and information about mapped memory.  */
-struct gomp_device_descr
+attribute_hidden void
+gomp_init_targets_once (void)
 {
-  /* This is the ID number of device.  It could be specified in DEVICE-clause of
-     TARGET construct.  */
-  int id;
-
-  /* This is the ID number of device among devices of the same type.  */
-  int target_id;
-
-  /* This is the TYPE of device.  */
-  enum offload_target_type type;
-
-  /* Set to true when device is initialized.  */
-  bool is_initialized;
-
-  /* Function handlers.  */
-  int (*get_type_func) (void);
-  int (*get_num_devices_func) (void);
-  void (*register_image_func) (void *, void *);
-  void (*init_device_func) (int);
-  int (*get_table_func) (int, void *);
-  void *(*alloc_func) (int, size_t);
-  void (*free_func) (int, void *);
-  void *(*host2dev_func) (int, void *, const void *, size_t);
-  void *(*dev2host_func) (int, void *, const void *, size_t);
-  void (*run_func) (int, void *, void *);
-
-  /* Splay tree containing information about mapped memory regions.  */
-  struct splay_tree_s dev_splay_tree;
-
-  /* Mutex for operating with the splay tree and other shared structures.  */
-  gomp_mutex_t dev_env_lock;
-};
+  (void) pthread_once (&gomp_is_initialized, gomp_target_init);
+}
 
 attribute_hidden int
 gomp_get_num_devices (void)
 {
-  (void) pthread_once (&gomp_is_initialized, gomp_target_init);
+  gomp_init_targets_once ();
   return num_devices;
 }
 
@@ -198,18 +128,29 @@ gomp_map_vars_existing (splay_tree_key oldn, splay_tree_key newn,
   oldn->refcount++;
 }
 
-static struct target_mem_desc *
+static int
+get_kind (bool is_openacc, void *kinds, int idx)
+{
+  return is_openacc ? ((unsigned short *) kinds)[idx]
+		    : ((unsigned char *) kinds)[idx];
+}
+
+attribute_hidden struct target_mem_desc *
 gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
-	       void **hostaddrs, size_t *sizes, unsigned char *kinds,
-	       bool is_target)
+	       void **hostaddrs, void **devaddrs, size_t *sizes, void *kinds,
+	       bool is_openacc, bool is_target)
 {
   size_t i, tgt_align, tgt_size, not_found_cnt = 0;
+  const int rshift = is_openacc ? 8 : 3;
+  const int typemask = is_openacc ? 0xff : 0x7;
+  struct gomp_memory_mapping *mm = &devicep->mem_map;
   struct splay_tree_key_s cur_node;
   struct target_mem_desc *tgt
     = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
   tgt->list_count = mapnum;
   tgt->refcount = 1;
   tgt->device_descr = devicep;
+  tgt->mem_map = mm;
 
   if (mapnum == 0)
     return tgt;
@@ -222,41 +163,41 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       tgt_align = align;
       tgt_size = mapnum * sizeof (void *);
     }
-
-  gomp_mutex_lock (&devicep->dev_env_lock);
+  gomp_mutex_lock (&mm->lock);
   for (i = 0; i < mapnum; i++)
     {
+      int kind = get_kind (is_openacc, kinds, i);
       if (hostaddrs[i] == NULL)
 	{
 	  tgt->list[i] = NULL;
 	  continue;
 	}
       cur_node.host_start = (uintptr_t) hostaddrs[i];
-      if ((kinds[i] & 7) != 4)
+      if (!GOMP_MAP_POINTER_P (kind & typemask))
 	cur_node.host_end = cur_node.host_start + sizes[i];
       else
 	cur_node.host_end = cur_node.host_start + sizeof (void *);
-      splay_tree_key n = splay_tree_lookup (&devicep->dev_splay_tree,
-					    &cur_node);
+      splay_tree_key n = splay_tree_lookup (&mm->splay_tree, &cur_node);
       if (n)
 	{
 	  tgt->list[i] = n;
-	  gomp_map_vars_existing (n, &cur_node, kinds[i]);
+	  gomp_map_vars_existing (n, &cur_node, kind);
 	}
       else
 	{
-	  size_t align = (size_t) 1 << (kinds[i] >> 3);
+	  size_t align = (size_t) 1 << (kind >> rshift);
 	  tgt->list[i] = NULL;
 	  not_found_cnt++;
 	  if (tgt_align < align)
 	    tgt_align = align;
 	  tgt_size = (tgt_size + align - 1) & ~(align - 1);
 	  tgt_size += cur_node.host_end - cur_node.host_start;
-	  if ((kinds[i] & 7) == 5)
+	  if ((kind & typemask) == GOMP_MAP_TO_PSET)
 	    {
 	      size_t j;
 	      for (j = i + 1; j < mapnum; j++)
-		if ((kinds[j] & 7) != 4)
+		if (!GOMP_MAP_POINTER_P (get_kind (is_openacc, kinds, j)
+					 & typemask))
 		  break;
 		else if ((uintptr_t) hostaddrs[j] < cur_node.host_start
 			 || ((uintptr_t) hostaddrs[j] + sizeof (void *)
@@ -271,7 +212,15 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	}
     }
 
-  if (not_found_cnt || is_target)
+  if (devaddrs)
+    {
+      if (mapnum != 1)
+	gomp_fatal ("unexpected aggregation");
+      tgt->to_free = devaddrs[0];
+      tgt->tgt_start = (uintptr_t) tgt->to_free;
+      tgt->tgt_end = tgt->tgt_start + sizes[0];
+    }
+  else if (not_found_cnt || is_target)
     {
       /* Allocate tgt_align aligned tgt_size block of memory.  */
       /* FIXME: Perhaps change interface to allocate properly aligned
@@ -303,44 +252,52 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       for (i = 0; i < mapnum; i++)
 	if (tgt->list[i] == NULL)
 	  {
+	    int kind = get_kind (is_openacc, kinds, i);
 	    if (hostaddrs[i] == NULL)
 	      continue;
 	    splay_tree_key k = &array->key;
 	    k->host_start = (uintptr_t) hostaddrs[i];
-	    if ((kinds[i] & 7) != 4)
+	    if (!GOMP_MAP_POINTER_P (kind & typemask))
 	      k->host_end = k->host_start + sizes[i];
 	    else
 	      k->host_end = k->host_start + sizeof (void *);
-	    splay_tree_key n
-	      = splay_tree_lookup (&devicep->dev_splay_tree, k);
+	    splay_tree_key n = splay_tree_lookup (&mm->splay_tree, k);
 	    if (n)
 	      {
 		tgt->list[i] = n;
-		gomp_map_vars_existing (n, k, kinds[i]);
+		gomp_map_vars_existing (n, k, kind);
 	      }
 	    else
 	      {
-		size_t align = (size_t) 1 << (kinds[i] >> 3);
+		size_t align = (size_t) 1 << (kind >> rshift);
 		tgt->list[i] = k;
 		tgt_size = (tgt_size + align - 1) & ~(align - 1);
 		k->tgt = tgt;
 		k->tgt_offset = tgt_size;
 		tgt_size += k->host_end - k->host_start;
-		k->copy_from = false;
-		if ((kinds[i] & 7) == 2 || (kinds[i] & 7) == 3)
-		  k->copy_from = true;
+		k->copy_from = GOMP_MAP_COPYFROM_P (kind & typemask)
+			       || GOMP_MAP_TOFROM_P (kind & typemask);
 		k->refcount = 1;
+		k->async_refcount = 0;
 		tgt->refcount++;
 		array->left = NULL;
 		array->right = NULL;
-		splay_tree_insert (&devicep->dev_splay_tree, array);
-		switch (kinds[i] & 7)
+		splay_tree_insert (&mm->splay_tree, array);
+		switch (kind & typemask)
 		  {
-		  case 0: /* ALLOC */
-		  case 2: /* FROM */
+		  case GOMP_MAP_FORCE_ALLOC:
+		  case GOMP_MAP_FORCE_FROM:
+		    /* FIXME: No special handling (see comment in
+		       oacc-parallel.c).  */
+		  case GOMP_MAP_ALLOC:
+		  case GOMP_MAP_ALLOC_FROM:
 		    break;
-		  case 1: /* TO */
-		  case 3: /* TOFROM */
+		  case GOMP_MAP_FORCE_TO:
+		  case GOMP_MAP_FORCE_TOFROM:
+		    /* FIXME: No special handling, as above.  */
+		  case GOMP_MAP_ALLOC_TO:
+		  case GOMP_MAP_ALLOC_TOFROM:
+		    /* Copy from host to device memory.  */
 		    /* FIXME: Perhaps add some smarts, like if copying
 		       several adjacent fields from host to target, use some
 		       host buffer to avoid sending each var individually.  */
@@ -350,7 +307,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 					    (void *) k->host_start,
 					    k->host_end - k->host_start);
 		    break;
-		  case 4: /* POINTER */
+		  case GOMP_MAP_POINTER:
 		    cur_node.host_start
 		      = (uintptr_t) *(void **) k->host_start;
 		    if (cur_node.host_start == (uintptr_t) NULL)
@@ -366,19 +323,16 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		    /* Add bias to the pointer value.  */
 		    cur_node.host_start += sizes[i];
 		    cur_node.host_end = cur_node.host_start + 1;
-		    n = splay_tree_lookup (&devicep->dev_splay_tree,
-					   &cur_node);
+		    n = splay_tree_lookup (&mm->splay_tree, &cur_node);
 		    if (n == NULL)
 		      {
 			/* Could be possibly zero size array section.  */
 			cur_node.host_end--;
-			n = splay_tree_lookup (&devicep->dev_splay_tree,
-					       &cur_node);
+			n = splay_tree_lookup (&mm->splay_tree, &cur_node);
 			if (n == NULL)
 			  {
 			    cur_node.host_start--;
-			    n = splay_tree_lookup (&devicep->dev_splay_tree,
-						   &cur_node);
+			    n = splay_tree_lookup (&mm->splay_tree, &cur_node);
 			    cur_node.host_start++;
 			  }
 		      }
@@ -398,14 +352,17 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 					    (void *) &cur_node.tgt_offset,
 					    sizeof (void *));
 		    break;
-		  case 5: /* TO_PSET */
-		    devicep->host2dev_func (devicep->target_id,
-					    (void *) (tgt->tgt_start
-						      + k->tgt_offset),
-					    (void *) k->host_start,
-					    k->host_end - k->host_start);
+		  case GOMP_MAP_TO_PSET:
+		    /* Copy from host to device memory.  */
+		    /* FIXME: see above FIXME comment.  */
+		    devicep->host2dev_func
+		      (devicep->target_id,
+		       (void *) (tgt->tgt_start + k->tgt_offset),
+		       (void *) k->host_start,
+		       (k->host_end - k->host_start));
 		    for (j = i + 1; j < mapnum; j++)
-		      if ((kinds[j] & 7) != 4)
+		      if (!GOMP_MAP_POINTER_P (get_kind (is_openacc, kinds, j)
+					       & typemask))
 			break;
 		      else if ((uintptr_t) hostaddrs[j] < k->host_start
 			       || ((uintptr_t) hostaddrs[j] + sizeof (void *)
@@ -432,19 +389,18 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 			  /* Add bias to the pointer value.  */
 			  cur_node.host_start += sizes[j];
 			  cur_node.host_end = cur_node.host_start + 1;
-			  n = splay_tree_lookup (&devicep->dev_splay_tree,
-						 &cur_node);
+			  n = splay_tree_lookup (&mm->splay_tree, &cur_node);
 			  if (n == NULL)
 			    {
 			      /* Could be possibly zero size array section.  */
 			      cur_node.host_end--;
-			      n = splay_tree_lookup (&devicep->dev_splay_tree,
+			      n = splay_tree_lookup (&mm->splay_tree,
 						     &cur_node);
 			      if (n == NULL)
 				{
 				  cur_node.host_start--;
-				  n = splay_tree_lookup
-					(&devicep->dev_splay_tree, &cur_node);
+				  n = splay_tree_lookup (&mm->splay_tree,
+							 &cur_node);
 				  cur_node.host_start++;
 				}
 			    }
@@ -468,6 +424,32 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 			  i++;
 			}
 		      break;
+		    case GOMP_MAP_FORCE_PRESENT:
+		      {
+		        /* We already looked up the memory region above and it
+			   was missing.  */
+			size_t size = k->host_end - k->host_start;
+			gomp_fatal ("present clause: !acc_is_present (%p, "
+				    "%zu (0x%zx))", (void *) k->host_start,
+				    size, size);
+		      }
+		      break;
+		    case GOMP_MAP_FORCE_DEVICEPTR:
+		      assert (k->host_end - k->host_start == sizeof (void *));
+
+		      devicep->host2dev_func
+		        (devicep->target_id,
+			 (void *) (tgt->tgt_start + k->tgt_offset),
+			 (void *) k->host_start,
+			 sizeof (void *));
+		      break;
+		    case GOMP_MAP_FORCE_PRIVATE:
+		      abort ();
+		    case GOMP_MAP_FORCE_FIRSTPRIVATE:
+		      abort ();
+		    default:
+		      gomp_fatal ("%s: unhandled kind 0x%.2x", __FUNCTION__,
+				  kind);
 		  }
 		array++;
 	      }
@@ -490,7 +472,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	}
     }
 
-  gomp_mutex_unlock (&devicep->dev_env_lock);
+  gomp_mutex_unlock (&mm->lock);
   return tgt;
 }
 
@@ -505,10 +487,51 @@ gomp_unmap_tgt (struct target_mem_desc *tgt)
   free (tgt);
 }
 
-static void
-gomp_unmap_vars (struct target_mem_desc *tgt)
+/* Decrease the refcount for a set of mapped variables, and queue asynchronous
+   copies from the device back to the host after any work that has been issued.
+   Because the regions are still "live", increment an asynchronous reference
+   count to indicate that they should not be unmapped from host-side data
+   structures until the asynchronous copy has completed.  */
+
+attribute_hidden void
+gomp_copy_from_async (struct target_mem_desc *tgt)
+{
+  struct gomp_device_descr *devicep = tgt->device_descr;
+  struct gomp_memory_mapping *mm = tgt->mem_map;
+  size_t i;
+
+  gomp_mutex_lock (&mm->lock);
+
+  for (i = 0; i < tgt->list_count; i++)
+    if (tgt->list[i] == NULL)
+      ;
+    else if (tgt->list[i]->refcount > 1)
+      {
+	tgt->list[i]->refcount--;
+	tgt->list[i]->async_refcount++;
+      }
+    else
+      {
+	splay_tree_key k = tgt->list[i];
+	if (k->copy_from)
+	  /* Copy from device to host memory.  */
+	  devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
+				  (void *) (k->tgt->tgt_start + k->tgt_offset),
+				  k->host_end - k->host_start);
+      }
+
+  gomp_mutex_unlock (&mm->lock);
+}
+
+/* Unmap variables described by TGT.  If DO_COPYFROM is true, copy relevant
+   variables back from device to host; if it is false, assume that this has
+   already been done, e.g. by gomp_copy_from_async above.  */
+
+attribute_hidden void
+gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
 {
   struct gomp_device_descr *devicep = tgt->device_descr;
+  struct gomp_memory_mapping *mm = tgt->mem_map;
 
   if (tgt->list_count == 0)
     {
@@ -517,20 +540,23 @@ gomp_unmap_vars (struct target_mem_desc *tgt)
     }
 
   size_t i;
-  gomp_mutex_lock (&devicep->dev_env_lock);
+  gomp_mutex_lock (&mm->lock);
   for (i = 0; i < tgt->list_count; i++)
     if (tgt->list[i] == NULL)
       ;
     else if (tgt->list[i]->refcount > 1)
       tgt->list[i]->refcount--;
+    else if (tgt->list[i]->async_refcount > 0)
+      tgt->list[i]->async_refcount--;
     else
       {
 	splay_tree_key k = tgt->list[i];
-	if (k->copy_from)
+	if (k->copy_from && do_copyfrom)
+	  /* Copy from device to host memory.  */
 	  devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
 				  (void *) (k->tgt->tgt_start + k->tgt_offset),
 				  k->host_end - k->host_start);
-	splay_tree_remove (&devicep->dev_splay_tree, k);
+	splay_tree_remove (&mm->splay_tree, k);
 	if (k->tgt->refcount > 1)
 	  k->tgt->refcount--;
 	else
@@ -541,15 +567,17 @@ gomp_unmap_vars (struct target_mem_desc *tgt)
     tgt->refcount--;
   else
     gomp_unmap_tgt (tgt);
-  gomp_mutex_unlock (&devicep->dev_env_lock);
+  gomp_mutex_unlock (&mm->lock);
 }
 
 static void
-gomp_update (struct gomp_device_descr *devicep, size_t mapnum,
-	     void **hostaddrs, size_t *sizes, unsigned char *kinds)
+gomp_update (struct gomp_device_descr *devicep, struct gomp_memory_mapping *mm,
+	     size_t mapnum, void **hostaddrs, size_t *sizes, void *kinds,
+	     bool is_openacc)
 {
   size_t i;
   struct splay_tree_key_s cur_node;
+  const int typemask = is_openacc ? 0xff : 0x7;
 
   if (!devicep)
     return;
@@ -557,16 +585,17 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum,
   if (mapnum == 0)
     return;
 
-  gomp_mutex_lock (&devicep->dev_env_lock);
+  gomp_mutex_lock (&mm->lock);
   for (i = 0; i < mapnum; i++)
     if (sizes[i])
       {
 	cur_node.host_start = (uintptr_t) hostaddrs[i];
 	cur_node.host_end = cur_node.host_start + sizes[i];
-	splay_tree_key n = splay_tree_lookup (&devicep->dev_splay_tree,
+	splay_tree_key n = splay_tree_lookup (&mm->splay_tree,
 					      &cur_node);
 	if (n)
 	  {
+	    int kind = get_kind (is_openacc, kinds, i);
 	    if (n->host_start > cur_node.host_start
 		|| n->host_end < cur_node.host_end)
 	      gomp_fatal ("Trying to update [%p..%p) object when"
@@ -575,31 +604,38 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum,
 			  (void *) cur_node.host_end,
 			  (void *) n->host_start,
 			  (void *) n->host_end);
-	    if ((kinds[i] & 7) == 1)
-	      devicep->host2dev_func (devicep->target_id,
-				      (void *) (n->tgt->tgt_start
-						+ n->tgt_offset
-						+ cur_node.host_start
-						- n->host_start),
-				      (void *) cur_node.host_start,
-				      cur_node.host_end - cur_node.host_start);
-	    else if ((kinds[i] & 7) == 2)
-	      devicep->dev2host_func (devicep->target_id,
-				      (void *) cur_node.host_start,
-				      (void *) (n->tgt->tgt_start
-						+ n->tgt_offset
-						+ cur_node.host_start
-						- n->host_start),
-				      cur_node.host_end - cur_node.host_start);
+	    if (GOMP_MAP_COPYTO_P (kind & typemask))
+	      /* Copy from host to device memory.  */
+	      devicep->host2dev_func
+		(devicep->target_id,
+		 (void *) (n->tgt->tgt_start
+			   + n->tgt_offset
+			   + cur_node.host_start
+			   - n->host_start),
+		 (void *) cur_node.host_start,
+		 cur_node.host_end - cur_node.host_start);
+	    else if (GOMP_MAP_COPYFROM_P (kind & typemask))
+	      /* Copy from device to host memory.  */
+	      devicep->dev2host_func
+		(devicep->target_id,
+		 (void *) cur_node.host_start,
+		 (void *) (n->tgt->tgt_start
+			   + n->tgt_offset
+			   + cur_node.host_start
+			   - n->host_start),
+		 cur_node.host_end - cur_node.host_start);
 	  }
 	else
 	  gomp_fatal ("Trying to update [%p..%p) object that is not mapped",
 		      (void *) cur_node.host_start,
 		      (void *) cur_node.host_end);
       }
-  gomp_mutex_unlock (&devicep->dev_env_lock);
+  gomp_mutex_unlock (&mm->lock);
 }
 
+static void gomp_register_image_for_device (struct gomp_device_descr *device,
+					    struct offload_image_descr *image);
+
 /* This function should be called from every offload image.
    It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
    the target, and TARGET_DATA needed by target plugin.  */
@@ -612,6 +648,9 @@ GOMP_offload_register (void *host_table, enum offload_target_type target_type,
 				 (num_offload_images + 1)
 				 * sizeof (struct offload_image_descr));
 
+  if (offload_images == NULL)
+    return;
+
   offload_images[num_offload_images].type = target_type;
   offload_images[num_offload_images].host_table = host_table;
   offload_images[num_offload_images].target_data = target_data;
@@ -621,17 +660,24 @@ GOMP_offload_register (void *host_table, enum offload_target_type target_type,
 
 /* This function initializes the target device, specified by DEVICEP.  */
 
-static void
+attribute_hidden void
 gomp_init_device (struct gomp_device_descr *devicep)
 {
+  /* Initialize the target device.  */
   devicep->init_device_func (devicep->target_id);
+
+  devicep->is_initialized = true;
+}
 
+attribute_hidden void
+gomp_init_tables (const struct gomp_device_descr *devicep,
+		  struct gomp_memory_mapping *mm)
+{
   /* Get address mapping table for device.  */
   struct mapping_table *table = NULL;
-  int num_entries = devicep->get_table_func (devicep->target_id, &table);
+  int i, num_entries = devicep->get_table_func (devicep->target_id, &table);
 
   /* Insert host-target address mapping into dev_splay_tree.  */
-  int i;
   for (i = 0; i < num_entries; i++)
     {
       struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
@@ -641,7 +687,7 @@ gomp_init_device (struct gomp_device_descr *devicep)
       tgt->tgt_end = table[i].tgt_end;
       tgt->to_free = NULL;
       tgt->list_count = 0;
-      tgt->device_descr = devicep;
+      tgt->device_descr = (struct gomp_device_descr *) devicep;
       splay_tree_node node = tgt->array;
       splay_tree_key k = &node->key;
       k->host_start = table[i].host_start;
@@ -652,11 +698,45 @@ gomp_init_device (struct gomp_device_descr *devicep)
       k->tgt = tgt;
       node->left = NULL;
       node->right = NULL;
-      splay_tree_insert (&devicep->dev_splay_tree, node);
+      splay_tree_insert (&mm->splay_tree, node);
     }
 
   free (table);
-  devicep->is_initialized = true;
+  mm->is_initialized = true;
+}
+
+static void
+gomp_init_dev_tables (struct gomp_device_descr *devicep)
+{
+  gomp_init_device (devicep);
+  gomp_init_tables (devicep, &devicep->mem_map);
+}
+
+
+attribute_hidden void
+gomp_free_memmap (struct gomp_device_descr *devicep)
+{
+  struct gomp_memory_mapping *mm = &devicep->mem_map;
+
+  while (mm->splay_tree.root)
+    {
+      struct target_mem_desc *tgt = mm->splay_tree.root->key.tgt;
+
+      splay_tree_remove (&mm->splay_tree, &mm->splay_tree.root->key);
+      free (tgt->array);
+      free (tgt);
+    }
+
+  mm->is_initialized = false;
+}
+
+attribute_hidden void
+gomp_fini_device (struct gomp_device_descr *devicep)
+{
+  if (devicep->is_initialized)
+    devicep->fini_device_func (devicep->target_id);
+
+  devicep->is_initialized = false;
 }
 
 /* Called when encountering a target directive.  If DEVICE
@@ -675,7 +755,12 @@ GOMP_target (int device, void (*fn) (void *), const void *openmp_target,
 	     unsigned char *kinds)
 {
   struct gomp_device_descr *devicep = resolve_device (device);
-  if (devicep == NULL)
+  struct gomp_memory_mapping *mm = devicep ? &devicep->mem_map : NULL;
+
+  if (devicep != NULL && !devicep->is_initialized)
+    gomp_init_dev_tables (devicep);
+
+  if (devicep == NULL || !(devicep->capabilities & TARGET_CAP_OPENMP_400))
     {
       /* Host fallback.  */
       struct gomp_thread old_thr, *thr = gomp_thread ();
@@ -692,20 +777,30 @@ GOMP_target (int device, void (*fn) (void *), const void *openmp_target,
       return;
     }
 
-  gomp_mutex_lock (&devicep->dev_env_lock);
-  if (!devicep->is_initialized)
-    gomp_init_device (devicep);
+  void *fn_addr;
 
-  struct splay_tree_key_s k;
-  k.host_start = (uintptr_t) fn;
-  k.host_end = k.host_start + 1;
-  splay_tree_key tgt_fn = splay_tree_lookup (&devicep->dev_splay_tree, &k);
-  if (tgt_fn == NULL)
-    gomp_fatal ("Target function wasn't mapped");
-  gomp_mutex_unlock (&devicep->dev_env_lock);
+  if (devicep->capabilities & TARGET_CAP_NATIVE_EXEC)
+    fn_addr = (void *) fn;
+  else
+    {
+      gomp_mutex_lock (&mm->lock);
+      if (!devicep->is_initialized)
+	gomp_init_dev_tables (devicep);
+      struct splay_tree_key_s k;
+      k.host_start = (uintptr_t) fn;
+      k.host_end = k.host_start + 1;
+      splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map.splay_tree,
+						 &k);
+      if (tgt_fn == NULL)
+	gomp_fatal ("Target function wasn't mapped");
+      gomp_mutex_unlock (&mm->lock);
+
+      fn_addr = (void *) tgt_fn->tgt->tgt_start;
+    }
 
   struct target_mem_desc *tgt_vars
-    = gomp_map_vars (devicep, mapnum, hostaddrs, sizes, kinds, true);
+    = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
+		     true);
   struct gomp_thread old_thr, *thr = gomp_thread ();
   old_thr = *thr;
   memset (thr, '\0', sizeof (*thr));
@@ -714,11 +809,10 @@ GOMP_target (int device, void (*fn) (void *), const void *openmp_target,
       thr->place = old_thr.place;
       thr->ts.place_partition_len = gomp_places_list_len;
     }
-  devicep->run_func (devicep->target_id, (void *) tgt_fn->tgt->tgt_start,
-		     (void *) tgt_vars->tgt_start);
+  devicep->run_func (devicep->target_id, fn_addr, (void *) tgt_vars->tgt_start);
   gomp_free_thread (thr);
   *thr = old_thr;
-  gomp_unmap_vars (tgt_vars);
+  gomp_unmap_vars (tgt_vars, true);
 }
 
 void
@@ -726,7 +820,12 @@ GOMP_target_data (int device, const void *openmp_target, size_t mapnum,
 		  void **hostaddrs, size_t *sizes, unsigned char *kinds)
 {
   struct gomp_device_descr *devicep = resolve_device (device);
-  if (devicep == NULL)
+  struct gomp_memory_mapping *mm = &devicep->mem_map;
+
+  if (devicep != NULL && !devicep->is_initialized)
+    gomp_init_dev_tables (devicep);
+
+  if (devicep == NULL || !(devicep->capabilities & TARGET_CAP_OPENMP_400))
     {
       /* Host fallback.  */
       struct gomp_task_icv *icv = gomp_icv (false);
@@ -737,20 +836,21 @@ GOMP_target_data (int device, const void *openmp_target, size_t mapnum,
 	     new #pragma omp target data, otherwise GOMP_target_end_data
 	     would get out of sync.  */
 	  struct target_mem_desc *tgt
-	    = gomp_map_vars (NULL, 0, NULL, NULL, NULL, false);
+	    = gomp_map_vars (NULL, 0, NULL, NULL, NULL, NULL, false, false);
 	  tgt->prev = icv->target_data;
 	  icv->target_data = tgt;
 	}
       return;
     }
 
-  gomp_mutex_lock (&devicep->dev_env_lock);
+  gomp_mutex_lock (&mm->lock);
   if (!devicep->is_initialized)
-    gomp_init_device (devicep);
-  gomp_mutex_unlock (&devicep->dev_env_lock);
+    gomp_init_dev_tables (devicep);
+  gomp_mutex_unlock (&mm->lock);
 
   struct target_mem_desc *tgt
-    = gomp_map_vars (devicep, mapnum, hostaddrs, sizes, kinds, false);
+    = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
+		     false);
   struct gomp_task_icv *icv = gomp_icv (true);
   tgt->prev = icv->target_data;
   icv->target_data = tgt;
@@ -764,7 +864,7 @@ GOMP_target_end_data (void)
     {
       struct target_mem_desc *tgt = icv->target_data;
       icv->target_data = tgt->prev;
-      gomp_unmap_vars (tgt);
+      gomp_unmap_vars (tgt, true);
     }
 }
 
@@ -773,15 +873,18 @@ GOMP_target_update (int device, const void *openmp_target, size_t mapnum,
 		    void **hostaddrs, size_t *sizes, unsigned char *kinds)
 {
   struct gomp_device_descr *devicep = resolve_device (device);
-  if (devicep == NULL)
-    return;
+  struct gomp_memory_mapping *mm = &devicep->mem_map;
 
-  gomp_mutex_lock (&devicep->dev_env_lock);
-  if (!devicep->is_initialized)
+  gomp_mutex_lock (&mm->lock);
+  if (devicep != NULL && !devicep->is_initialized)
     gomp_init_device (devicep);
-  gomp_mutex_unlock (&devicep->dev_env_lock);
+  gomp_mutex_unlock (&mm->lock);
 
-  gomp_update (devicep, mapnum, hostaddrs, sizes, kinds);
+  if (devicep == NULL || !(devicep->capabilities & TARGET_CAP_OPENMP_400))
+    return;
+
+  gomp_update (devicep, &devicep->mem_map, mapnum, hostaddrs, sizes, kinds,
+	       false);
 }
 
 void
@@ -808,9 +911,22 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
 			     const char *plugin_name)
 {
   void *plugin_handle = dlopen (plugin_name, RTLD_LAZY);
+  char *err = NULL, *last_missing = NULL;
+  int optional_present, optional_total;
+
   if (!plugin_handle)
     return false;
 
+  /* Clear any existing error.  */
+  dlerror ();
+
+  device->plugin_handle = dlopen (plugin_name, RTLD_LAZY);
+  if (!device->plugin_handle)
+    {
+      err = dlerror ();
+      goto out;
+    }
+
   /* Check if all required functions are available in the plugin and store
      their handlers.  */
 #define DLSYM(f)						    \
@@ -821,33 +937,104 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
 	return false;						    \
     }								    \
   while (0)
+  /* Similar, but missing functions are not an error.  */
+#define DLSYM_OPT(f,n) \
+  do									\
+    {									\
+      char *tmp_err;							\
+      device->f##_func = dlsym (device->plugin_handle,			\
+				"GOMP_OFFLOAD_" #n);			\
+      tmp_err = dlerror ();						\
+      if (tmp_err == NULL)						\
+        optional_present++;						\
+      else								\
+        last_missing = #n;						\
+      optional_total++;							\
+    }									\
+  while (0)
+
+  DLSYM (get_name);
+  DLSYM (get_caps);
   DLSYM (get_type);
   DLSYM (get_num_devices);
   DLSYM (register_image);
   DLSYM (init_device);
+  DLSYM (fini_device);
   DLSYM (get_table);
   DLSYM (alloc);
   DLSYM (free);
   DLSYM (dev2host);
   DLSYM (host2dev);
-  DLSYM (run);
+  device->capabilities = device->get_caps_func ();
+  if (device->capabilities & TARGET_CAP_OPENMP_400)
+    DLSYM (run);
+  if (device->capabilities & TARGET_CAP_OPENACC_200)
+    {
+      optional_present = optional_total = 0;
+      DLSYM_OPT (openacc.exec, openacc_parallel);
+      DLSYM_OPT (openacc.open_device, openacc_open_device);
+      DLSYM_OPT (openacc.close_device, openacc_close_device);
+      DLSYM_OPT (openacc.get_device_num, openacc_get_device_num);
+      DLSYM_OPT (openacc.set_device_num, openacc_set_device_num);
+      DLSYM_OPT (openacc.register_async_cleanup,
+		 openacc_register_async_cleanup);
+      DLSYM_OPT (openacc.async_test, openacc_async_test);
+      DLSYM_OPT (openacc.async_test_all, openacc_async_test_all);
+      DLSYM_OPT (openacc.async_wait, openacc_async_wait);
+      DLSYM_OPT (openacc.async_wait_async, openacc_async_wait_async);
+      DLSYM_OPT (openacc.async_wait_all, openacc_async_wait_all);
+      DLSYM_OPT (openacc.async_wait_all_async, openacc_async_wait_all_async);
+      DLSYM_OPT (openacc.async_set_async, openacc_async_set_async);
+      DLSYM_OPT (openacc.create_thread_data, openacc_create_thread_data);
+      DLSYM_OPT (openacc.destroy_thread_data, openacc_destroy_thread_data);
+      /* Require all the OpenACC handlers if we have TARGET_CAP_OPENACC_200.  */
+      if (optional_present != optional_total)
+	{
+	  err = "plugin missing OpenACC handler function";
+	  goto out;
+	}
+      optional_present = optional_total = 0;
+      DLSYM_OPT (openacc.cuda.get_current_device,
+		 openacc_get_current_cuda_device);
+      DLSYM_OPT (openacc.cuda.get_current_context,
+		 openacc_get_current_cuda_context);
+      DLSYM_OPT (openacc.cuda.get_stream, openacc_get_cuda_stream);
+      DLSYM_OPT (openacc.cuda.set_stream, openacc_set_cuda_stream);
+      /* Make sure all the CUDA functions are there if any of them are.  */
+      if (optional_present && optional_present != optional_total)
+	{
+	  err = "plugin missing OpenACC CUDA handler function";
+	  goto out;
+	}
+    }
 #undef DLSYM
+#undef DLSYM_OPT
 
-  return true;
+ out:
+  if (err != NULL)
+    {
+      gomp_error ("while loading %s: %s", plugin_name, err);
+      if (last_missing)
+        gomp_error ("missing function was %s", last_missing);
+      if (device->plugin_handle)
+	dlclose (device->plugin_handle);
+    }
+  return err == NULL;
 }
 
-/* This function finds OFFLOAD_IMAGES corresponding to DEVICE type, and
-   registers them in the plugin.  */
+/* This function adds a compatible offload image IMAGE to an accelerator device
+   DEVICE.  */
 
 static void
-gomp_register_images_for_device (struct gomp_device_descr *device)
+gomp_register_image_for_device (struct gomp_device_descr *device,
+				struct offload_image_descr *image)
 {
-  int i;
-  for (i = 0; i < num_offload_images; i++)
+  if (!device->offload_regions_registered
+      && (device->type == image->type
+	  || device->type == OFFLOAD_TARGET_TYPE_HOST))
     {
-      struct offload_image_descr *image = &offload_images[i];
-      if (image->type == device->type)
-	device->register_image_func (image->host_table, image->target_data);
+      device->register_image_func (image->host_table, image->target_data);
+      device->offload_regions_registered = true;
     }
 }
 
@@ -903,15 +1090,19 @@ gomp_target_init (void)
 		  }
 
 		current_device.type = current_device.get_type_func ();
+		current_device.name = current_device.get_name_func ();
 		current_device.is_initialized = false;
-		current_device.dev_splay_tree.root = NULL;
-		gomp_register_images_for_device (&current_device);
+		current_device.offload_regions_registered = false;
+		current_device.mem_map.splay_tree.root = NULL;
+		current_device.mem_map.is_initialized = false;
+		current_device.target_data = NULL;
+		current_device.openacc.data_environ = NULL;
 		for (i = 0; i < new_num_devices; i++)
 		  {
 		    current_device.id = num_devices + 1;
 		    current_device.target_id = i;
 		    devices[num_devices] = current_device;
-		    gomp_mutex_init (&devices[num_devices].dev_env_lock);
+		    gomp_mutex_init (&devices[num_devices].mem_map.lock);
 		    num_devices++;
 		  }
 	      }
@@ -922,6 +1113,43 @@ gomp_target_init (void)
       }
     while (next);
 
+  /* Prefer a device with TARGET_CAP_OPENMP_400 for ICV default-device-var.  */
+  if (num_devices > 1)
+    {
+      int d = gomp_icv (false)->default_device_var;
+
+      if (!(devices[d].capabilities & TARGET_CAP_OPENMP_400))
+	{
+	  for (i = 0; i < num_devices; i++)
+	    {
+	      if (devices[i].capabilities & TARGET_CAP_OPENMP_400)
+		{
+		  struct gomp_device_descr device_tmp = devices[d];
+		  devices[d] = devices[i];
+		  devices[d].id = d + 1;
+		  devices[i] = device_tmp;
+		  devices[i].id = i + 1;
+
+		  break;
+		}
+	    }
+	}
+    }
+
+  for (i = 0; i < num_devices; i++)
+    {
+      int j;
+
+      for (j = 0; j < num_offload_images; j++)
+	gomp_register_image_for_device (&devices[i], &offload_images[j]);
+
+      /* The 'devices' array can be moved (by the realloc call) until we have
+	 found all the plugins, so registering with the OpenACC runtime (which
+	 takes a copy of the pointer argument) must be delayed until now.  */
+      if (devices[i].capabilities & TARGET_CAP_OPENACC_200)
+	goacc_register (&devices[i]);
+    }
+
   free (offload_images);
   offload_images = NULL;
   num_offload_images = 0;
diff --git a/libgomp/testsuite/Makefile.in b/libgomp/testsuite/Makefile.in
index 2f845f0..78b6351 100644
--- a/libgomp/testsuite/Makefile.in
+++ b/libgomp/testsuite/Makefile.in
@@ -35,7 +35,8 @@ build_triplet = @build@
 host_triplet = @host@
 target_triplet = @target@
 subdir = testsuite
-DIST_COMMON = $(srcdir)/Makefile.in $(srcdir)/Makefile.am
+DIST_COMMON = $(srcdir)/Makefile.in $(srcdir)/Makefile.am \
+	$(srcdir)/libgomp-test-support.exp.in
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
 am__aclocal_m4_deps = $(top_srcdir)/../config/acx.m4 \
 	$(top_srcdir)/../config/depstand.m4 \
@@ -49,12 +50,13 @@ am__aclocal_m4_deps = $(top_srcdir)/../config/acx.m4 \
 	$(top_srcdir)/../config/tls.m4 $(top_srcdir)/../ltoptions.m4 \
 	$(top_srcdir)/../ltsugar.m4 $(top_srcdir)/../ltversion.m4 \
 	$(top_srcdir)/../lt~obsolete.m4 $(top_srcdir)/acinclude.m4 \
-	$(top_srcdir)/../libtool.m4 $(top_srcdir)/configure.ac
+	$(top_srcdir)/../libtool.m4 $(top_srcdir)/plugin/configfrag.ac \
+	$(top_srcdir)/configure.ac
 am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \
 	$(ACLOCAL_M4)
 mkinstalldirs = $(SHELL) $(top_srcdir)/../mkinstalldirs
 CONFIG_HEADER = $(top_builddir)/config.h
-CONFIG_CLEAN_FILES =
+CONFIG_CLEAN_FILES = libgomp-test-support.exp
 CONFIG_CLEAN_VPATH_FILES =
 SOURCES =
 DEJATOOL = $(PACKAGE)
@@ -71,6 +73,8 @@ CCDEPMODE = @CCDEPMODE@
 CFLAGS = @CFLAGS@
 CPP = @CPP@
 CPPFLAGS = @CPPFLAGS@
+CUDA_DRIVER_INCLUDE = @CUDA_DRIVER_INCLUDE@
+CUDA_DRIVER_LIB = @CUDA_DRIVER_LIB@
 CYGPATH_W = @CYGPATH_W@
 DEFS = @DEFS@
 DEPDIR = @DEPDIR@
@@ -129,6 +133,10 @@ PACKAGE_URL = @PACKAGE_URL@
 PACKAGE_VERSION = @PACKAGE_VERSION@
 PATH_SEPARATOR = @PATH_SEPARATOR@
 PERL = @PERL@
+PLUGIN_NVPTX = @PLUGIN_NVPTX@
+PLUGIN_NVPTX_CPPFLAGS = @PLUGIN_NVPTX_CPPFLAGS@
+PLUGIN_NVPTX_LDFLAGS = @PLUGIN_NVPTX_LDFLAGS@
+PLUGIN_NVPTX_LIBS = @PLUGIN_NVPTX_LIBS@
 RANLIB = @RANLIB@
 SECTION_LDFLAGS = @SECTION_LDFLAGS@
 SED = @SED@
@@ -250,6 +258,8 @@ $(top_srcdir)/configure: @MAINTAINER_MODE_TRUE@ $(am__configure_deps)
 $(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps)
 	cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh
 $(am__aclocal_m4_deps):
+libgomp-test-support.exp: $(top_builddir)/config.status $(srcdir)/libgomp-test-support.exp.in
+	cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@
 
 mostlyclean-libtool:
 	-rm -f *.lo
diff --git a/libgomp/testsuite/libgomp-test-support.exp.in b/libgomp/testsuite/libgomp-test-support.exp.in
new file mode 100644
index 0000000..dcadad7
--- /dev/null
+++ b/libgomp/testsuite/libgomp-test-support.exp.in
@@ -0,0 +1,2 @@
+set cuda_driver_include "@CUDA_DRIVER_INCLUDE@"
+set cuda_driver_lib "@CUDA_DRIVER_LIB@"

[-- Attachment #3: cesar-goacc-b.diff --]
[-- Type: text/x-patch, Size: 20280 bytes --]

commit 4a5e8ad6d5c5fa2e944d1318dbcba28f234abffe
Author: Bernd Schmidt <bernds@codesourcery.com>
Date:   Wed Nov 19 18:35:41 2014 +0100

    Cesar's latest patch

diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index f6e70e9..0fa62ff 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -310,6 +310,8 @@ GOACC_2.0 {
 	GOACC_parallel;
 	GOACC_update;
 	GOACC_wait;
+	GOACC_get_thread_num;
+	GOACC_get_num_threads;
 };
 
 GOMP_PLUGIN_1.0 {
diff --git a/libgomp/libgomp_g.h b/libgomp/libgomp_g.h
index 44f200c..3db5676 100644
--- a/libgomp/libgomp_g.h
+++ b/libgomp/libgomp_g.h
@@ -226,5 +226,7 @@ extern void GOACC_parallel (int, void (*) (void *), const void *,
 			    size_t, void **, size_t *, unsigned short *,
 			    int, int, int, int, int, ...);
 extern void GOACC_wait (int, int, ...);
+extern int GOACC_get_num_threads (void);
+extern int GOACC_get_thread_num (void);
 
 #endif /* LIBGOMP_G_H */
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index 0ff44bf..e142384 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -115,9 +115,6 @@ GOACC_parallel (int device, void (*fn) (void *), const void *openmp_target,
   splay_tree_key tgt_fn_key;
   void (*tgt_fn);
 
-  if (num_gangs != 1)
-    gomp_fatal ("num_gangs (%d) different from one is not yet supported",
-		num_gangs);
   if (num_workers != 1)
     gomp_fatal ("num_workers (%d) different from one is not yet supported",
 		num_workers);
@@ -386,3 +383,15 @@ GOACC_wait (int async, int num_waits, ...)
 
   va_end (ap);
 }
+
+int
+GOACC_get_num_threads (void)
+{
+  return 1;
+}
+
+int
+GOACC_get_thread_num (void)
+{
+  return 0;
+}
diff --git a/libgomp/oacc-ptx.h b/libgomp/oacc-ptx.h
new file mode 100644
index 0000000..1af81b2
--- /dev/null
+++ b/libgomp/oacc-ptx.h
@@ -0,0 +1,400 @@
+#define ABORT_PTX				\
+  ".version 3.1\n"				\
+  ".target sm_30\n"				\
+  ".address_size 64\n"				\
+  ".visible .func abort;\n"			\
+  ".visible .func abort\n"			\
+  "{\n"						\
+  "trap;\n"					\
+  "ret;\n"					\
+  "}\n"						\
+  ".visible .func _gfortran_abort;\n"		\
+  ".visible .func _gfortran_abort\n"		\
+  "{\n"						\
+  "trap;\n"					\
+  "ret;\n"					\
+  "}\n" \
+
+/* Generated with:
+
+   $ echo 'int acc_on_device(int d) { return __builtin_acc_on_device(d); } int acc_on_device_h_(int *d) { return acc_on_device(*d); }' | accel-gcc/xgcc -Baccel-gcc -x c - -o - -S -m64 -O3 -fno-builtin-acc_on_device -fno-inline
+*/
+#define ACC_ON_DEVICE_PTX						\
+  "        .version        3.1\n"					\
+  "        .target sm_30\n"						\
+  "        .address_size 64\n"						\
+  ".visible .func (.param.u32 %out_retval)acc_on_device(.param.u32 %in_ar1);\n" \
+  ".visible .func (.param.u32 %out_retval)acc_on_device(.param.u32 %in_ar1)\n" \
+  "{\n"									\
+  "        .reg.u32 %ar1;\n"						\
+  ".reg.u32 %retval;\n"							\
+  "        .reg.u64 %hr10;\n"						\
+  "        .reg.u32 %r24;\n"						\
+  "        .reg.u32 %r25;\n"						\
+  "        .reg.pred %r27;\n"						\
+  "        .reg.u32 %r30;\n"						\
+  "        ld.param.u32 %ar1, [%in_ar1];\n"				\
+  "                mov.u32 %r24, %ar1;\n"				\
+  "                setp.ne.u32 %r27,%r24,4;\n"				\
+  "                set.u32.eq.u32 %r30,%r24,5;\n"			\
+  "                neg.s32 %r25, %r30;\n"				\
+  "        @%r27   bra     $L3;\n"					\
+  "                mov.u32 %r25, 1;\n"					\
+  "$L3:\n"								\
+  "                mov.u32 %retval, %r25;\n"				\
+  "        st.param.u32    [%out_retval], %retval;\n"			\
+  "        ret;\n"							\
+  "        }\n"								\
+  ".visible .func (.param.u32 %out_retval)acc_on_device_h_(.param.u64 %in_ar1);\n" \
+  ".visible .func (.param.u32 %out_retval)acc_on_device_h_(.param.u64 %in_ar1)\n" \
+  "{\n"									\
+  "        .reg.u64 %ar1;\n"						\
+  ".reg.u32 %retval;\n"							\
+  "        .reg.u64 %hr10;\n"						\
+  "        .reg.u64 %r25;\n"						\
+  "        .reg.u32 %r26;\n"						\
+  "        .reg.u32 %r27;\n"						\
+  "        ld.param.u64 %ar1, [%in_ar1];\n"				\
+  "                mov.u64 %r25, %ar1;\n"				\
+  "                ld.u32  %r26, [%r25];\n"				\
+  "        {\n"								\
+  "                .param.u32 %retval_in;\n"				\
+  "        {\n"								\
+  "                .param.u32 %out_arg0;\n"				\
+  "                st.param.u32 [%out_arg0], %r26;\n"			\
+  "                call (%retval_in), acc_on_device, (%out_arg0);\n"	\
+  "        }\n"								\
+  "                ld.param.u32    %r27, [%retval_in];\n"		\
+  "}\n"									\
+  "                mov.u32 %retval, %r27;\n"				\
+  "        st.param.u32    [%out_retval], %retval;\n"			\
+  "        ret;\n"							\
+  "        }"
+
+ #define GOACC_INTERNAL_PTX						\
+  ".version 3.1\n" \
+  ".target sm_30\n" \
+  ".address_size 64\n" \
+  ".visible .func (.param .u32 %out_retval) GOACC_tid (.param .u32 %in_ar1);\n" \
+  ".visible .func (.param .u32 %out_retval) GOACC_ntid (.param .u32 %in_ar1);\n" \
+  ".visible .func (.param .u32 %out_retval) GOACC_ctaid (.param .u32 %in_ar1);\n" \
+  ".visible .func (.param .u32 %out_retval) GOACC_nctaid (.param .u32 %in_ar1);\n" \
+  ".visible .func (.param .u32 %out_retval) GOACC_get_num_threads;\n" \
+  ".visible .func (.param .u32 %out_retval) GOACC_get_thread_num;\n" \
+  ".extern .func abort;\n" \
+  ".visible .func (.param .u32 %out_retval) GOACC_tid (.param .u32 %in_ar1)\n" \
+  "{\n" \
+  ".reg .u32 %ar1;\n" \
+  ".reg .u32 %retval;\n" \
+  ".reg .u64 %hr10;\n" \
+  ".reg .u32 %r22;\n" \
+  ".reg .u32 %r23;\n" \
+  ".reg .u32 %r24;\n" \
+  ".reg .u32 %r25;\n" \
+  ".reg .u32 %r26;\n" \
+  ".reg .u32 %r27;\n" \
+  ".reg .u32 %r28;\n" \
+  ".reg .u32 %r29;\n" \
+  ".reg .pred %r30;\n" \
+  ".reg .u32 %r31;\n" \
+  ".reg .pred %r32;\n" \
+  ".reg .u32 %r33;\n" \
+  ".reg .pred %r34;\n" \
+  ".local .align 8 .b8 %frame[4];\n" \
+  "ld.param.u32 %ar1,[%in_ar1];\n" \
+  "mov.u32 %r27,%ar1;\n" \
+  "st.local.u32 [%frame],%r27;\n" \
+  "ld.local.u32 %r28,[%frame];\n" \
+  "mov.u32 %r29,1;\n"							\
+  "setp.eq.u32 %r30,%r28,%r29;\n"					\
+  "@%r30 bra $L4;\n"							\
+  "mov.u32 %r31,2;\n"							\
+  "setp.eq.u32 %r32,%r28,%r31;\n"					\
+  "@%r32 bra $L5;\n"							\
+  "mov.u32 %r33,0;\n"							\
+  "setp.eq.u32 %r34,%r28,%r33;\n"					\
+  "@!%r34 bra $L8;\n"							\
+  "mov.u32 %r23,%tid.x;\n"						\
+  "mov.u32 %r22,%r23;\n"						\
+  "bra $L7;\n"								\
+  "$L4:\n"								\
+  "mov.u32 %r24,%tid.y;\n"						\
+  "mov.u32 %r22,%r24;\n"						\
+  "bra $L7;\n"								\
+  "$L5:\n"								\
+  "mov.u32 %r25,%tid.z;\n"						\
+  "mov.u32 %r22,%r25;\n"						\
+  "bra $L7;\n"								\
+  "$L8:\n"								\
+  "{\n"									\
+  "{\n"									\
+  "call abort;\n"							\
+  "}\n"									\
+  "}\n"									\
+  "$L7:\n"								\
+  "mov.u32 %r26,%r22;\n"						\
+  "mov.u32 %retval,%r26;\n"						\
+  "st.param.u32 [%out_retval],%retval;\n"				\
+  "ret;\n"								\
+  "}\n"									\
+  ".visible .func (.param .u32 %out_retval) GOACC_ntid (.param .u32 %in_ar1)\n" \
+  "{\n"									\
+  ".reg .u32 %ar1;\n"							\
+  ".reg .u32 %retval;\n"						\
+  ".reg .u64 %hr10;\n"							\
+  ".reg .u32 %r22;\n"							\
+  ".reg .u32 %r23;\n"							\
+  ".reg .u32 %r24;\n"							\
+  ".reg .u32 %r25;\n"							\
+  ".reg .u32 %r26;\n"							\
+  ".reg .u32 %r27;\n"							\
+  ".reg .u32 %r28;\n"							\
+  ".reg .u32 %r29;\n"							\
+  ".reg .pred %r30;\n"							\
+  ".reg .u32 %r31;\n"							\
+  ".reg .pred %r32;\n"							\
+  ".reg .u32 %r33;\n"							\
+  ".reg .pred %r34;\n"							\
+  ".local .align 8 .b8 %frame[4];\n"					\
+  "ld.param.u32 %ar1,[%in_ar1];\n"					\
+  "mov.u32 %r27,%ar1;\n"						\
+  "st.local.u32 [%frame],%r27;\n"					\
+  "ld.local.u32 %r28,[%frame];\n"					\
+  "mov.u32 %r29,1;\n"							\
+  "setp.eq.u32 %r30,%r28,%r29;\n"					\
+  "@%r30 bra $L11;\n"							\
+  "mov.u32 %r31,2;\n"							\
+  "setp.eq.u32 %r32,%r28,%r31;\n"					\
+  "@%r32 bra $L12;\n"							\
+  "mov.u32 %r33,0;\n"							\
+  "setp.eq.u32 %r34,%r28,%r33;\n"					\
+  "@!%r34 bra $L15;\n"							\
+  "mov.u32 %r23,%ntid.x;\n"						\
+  "mov.u32 %r22,%r23;\n"						\
+  "bra $L14;\n"								\
+  "$L11:\n"								\
+  "mov.u32 %r24,%ntid.y;\n"						\
+  "mov.u32 %r22,%r24;\n"						\
+  "bra $L14;\n"								\
+  "$L12:\n"								\
+  "mov.u32 %r25,%ntid.z;\n"						\
+  "mov.u32 %r22,%r25;\n"						\
+  "bra $L14;\n"								\
+  "$L15:\n"								\
+  "{\n"									\
+  "{\n"									\
+  "call abort;\n"							\
+  "}\n"									\
+  "}\n"									\
+  "$L14:\n"								\
+  "mov.u32 %r26,%r22;\n"						\
+  "mov.u32 %retval,%r26;\n"						\
+  "st.param.u32 [%out_retval],%retval;\n"				\
+  "ret;\n"								\
+  "}\n"									\
+  ".visible .func (.param .u32 %out_retval) GOACC_ctaid (.param .u32 %in_ar1)\n" \
+  "{\n"									\
+  ".reg .u32 %ar1;\n"							\
+  ".reg .u32 %retval;\n"						\
+  ".reg .u64 %hr10;\n"							\
+  ".reg .u32 %r22;\n"							\
+  ".reg .u32 %r23;\n"							\
+  ".reg .u32 %r24;\n"							\
+  ".reg .u32 %r25;\n"							\
+  ".reg .u32 %r26;\n"							\
+  ".reg .u32 %r27;\n"							\
+  ".reg .u32 %r28;\n"							\
+  ".reg .u32 %r29;\n"							\
+  ".reg .pred %r30;\n"							\
+  ".reg .u32 %r31;\n"							\
+  ".reg .pred %r32;\n"							\
+  ".reg .u32 %r33;\n"							\
+  ".reg .pred %r34;\n"							\
+  ".local .align 8 .b8 %frame[4];\n"					\
+  "ld.param.u32 %ar1,[%in_ar1];\n"					\
+  "mov.u32 %r27,%ar1;\n"						\
+  "st.local.u32 [%frame],%r27;\n"					\
+  "ld.local.u32 %r28,[%frame];\n"					\
+  "mov.u32 %r29,1;\n"							\
+  "setp.eq.u32 %r30,%r28,%r29;\n"					\
+  "@%r30 bra $L18;\n"							\
+  "mov.u32 %r31,2;\n"							\
+  "setp.eq.u32 %r32,%r28,%r31;\n"					\
+  "@%r32 bra $L19;\n"							\
+  "mov.u32 %r33,0;\n"							\
+  "setp.eq.u32 %r34,%r28,%r33;\n"					\
+  "@!%r34 bra $L22;\n"							\
+  "mov.u32 %r23,%ctaid.x;\n"						\
+  "mov.u32 %r22,%r23;\n"						\
+  "bra $L21;\n"								\
+  "$L18:\n"								\
+  "mov.u32 %r24,%ctaid.y;\n"						\
+  "mov.u32 %r22,%r24;\n"						\
+  "bra $L21;\n"								\
+  "$L19:\n"								\
+  "mov.u32 %r25,%ctaid.z;\n"						\
+  "mov.u32 %r22,%r25;\n"						\
+  "bra $L21;\n"								\
+  "$L22:\n"								\
+  "{\n"									\
+  "{\n"									\
+  "call abort;\n"							\
+  "}\n"									\
+  "}\n"									\
+  "$L21:\n"								\
+  "mov.u32 %r26,%r22;\n"						\
+  "mov.u32 %retval,%r26;\n"						\
+  "st.param.u32 [%out_retval],%retval;\n"				\
+  "ret;\n"								\
+  "}\n"									\
+  ".visible .func (.param .u32 %out_retval) GOACC_nctaid (.param .u32 %in_ar1)\n" \
+  "{\n"									\
+  ".reg .u32 %ar1;\n"							\
+  ".reg .u32 %retval;\n"						\
+  ".reg .u64 %hr10;\n"							\
+  ".reg .u32 %r22;\n"							\
+  ".reg .u32 %r23;\n"							\
+  ".reg .u32 %r24;\n"							\
+  ".reg .u32 %r25;\n"							\
+  ".reg .u32 %r26;\n"							\
+  ".reg .u32 %r27;\n"							\
+  ".reg .u32 %r28;\n"							\
+  ".reg .u32 %r29;\n"							\
+  ".reg .pred %r30;\n"							\
+  ".reg .u32 %r31;\n"							\
+  ".reg .pred %r32;\n"							\
+  ".reg .u32 %r33;\n"							\
+  ".reg .pred %r34;\n"							\
+  ".local .align 8 .b8 %frame[4];\n"					\
+  "ld.param.u32 %ar1,[%in_ar1];\n"					\
+  "mov.u32 %r27,%ar1;\n"						\
+  "st.local.u32 [%frame],%r27;\n"					\
+  "ld.local.u32 %r28,[%frame];\n"					\
+  "mov.u32 %r29,1;\n"							\
+  "setp.eq.u32 %r30,%r28,%r29;\n"					\
+  "@%r30 bra $L25;\n"							\
+  "mov.u32 %r31,2;\n"							\
+  "setp.eq.u32 %r32,%r28,%r31;\n"					\
+  "@%r32 bra $L26;\n"							\
+  "mov.u32 %r33,0;\n"							\
+  "setp.eq.u32 %r34,%r28,%r33;\n"					\
+  "@!%r34 bra $L29;\n"							\
+  "mov.u32 %r23,%nctaid.x;\n"						\
+  "mov.u32 %r22,%r23;\n"						\
+  "bra $L28;\n"								\
+  "$L25:\n"								\
+  "mov.u32 %r24,%nctaid.y;\n"						\
+  "mov.u32 %r22,%r24;\n"						\
+  "bra $L28;\n"								\
+  "$L26:\n"								\
+  "mov.u32 %r25,%nctaid.z;\n"						\
+  "mov.u32 %r22,%r25;\n"						\
+  "bra $L28;\n"								\
+  "$L29:\n"								\
+  "{\n"									\
+  "{\n"									\
+  "call abort;\n"							\
+  "}\n"									\
+  "}\n"									\
+  "$L28:\n"								\
+  "mov.u32 %r26,%r22;\n"						\
+  "mov.u32 %retval,%r26;\n"						\
+  "st.param.u32 [%out_retval],%retval;\n"				\
+  "ret;\n"								\
+  "}\n"									\
+  ".visible .func (.param .u32 %out_retval) GOACC_get_num_threads\n"	\
+  "{\n"									\
+  ".reg .u32 %retval;\n"						\
+  ".reg .u64 %hr10;\n"							\
+  ".reg .u32 %r22;\n"							\
+  ".reg .u32 %r23;\n"							\
+  ".reg .u32 %r24;\n"							\
+  ".reg .u32 %r25;\n"							\
+  ".reg .u32 %r26;\n"							\
+  ".reg .u32 %r27;\n"							\
+  ".reg .u32 %r28;\n"							\
+  ".reg .u32 %r29;\n"							\
+  "mov.u32 %r26,0;\n"							\
+  "{\n"									\
+  ".param .u32 %retval_in;\n"						\
+  "{\n"									\
+  ".param .u32 %out_arg0;\n"						\
+  "st.param.u32 [%out_arg0],%r26;\n"					\
+  "call (%retval_in),GOACC_ntid,(%out_arg0);\n"				\
+  "}\n"									\
+  "ld.param.u32 %r27,[%retval_in];\n"					\
+  "}\n"									\
+  "mov.u32 %r22,%r27;\n"						\
+  "mov.u32 %r28,0;\n"							\
+  "{\n"									\
+  ".param .u32 %retval_in;\n"						\
+  "{\n"									\
+  ".param .u32 %out_arg0;\n"						\
+  "st.param.u32 [%out_arg0],%r28;\n"					\
+  "call (%retval_in),GOACC_nctaid,(%out_arg0);\n"			\
+  "}\n"									\
+  "ld.param.u32 %r29,[%retval_in];\n"					\
+  "}\n"									\
+  "mov.u32 %r23,%r29;\n"						\
+  "mul.lo.u32 %r24,%r22,%r23;\n"					\
+  "mov.u32 %r25,%r24;\n"						\
+  "mov.u32 %retval,%r25;\n"						\
+  "st.param.u32 [%out_retval],%retval;\n"				\
+  "ret;\n"								\
+  "}\n"									\
+  ".visible .func (.param .u32 %out_retval) GOACC_get_thread_num\n"	\
+  "{\n"									\
+  ".reg .u32 %retval;\n"						\
+  ".reg .u64 %hr10;\n"							\
+  ".reg .u32 %r22;\n"							\
+  ".reg .u32 %r23;\n"							\
+  ".reg .u32 %r24;\n"							\
+  ".reg .u32 %r25;\n"							\
+  ".reg .u32 %r26;\n"							\
+  ".reg .u32 %r27;\n"							\
+  ".reg .u32 %r28;\n"							\
+  ".reg .u32 %r29;\n"							\
+  ".reg .u32 %r30;\n"							\
+  ".reg .u32 %r31;\n"							\
+  ".reg .u32 %r32;\n"							\
+  ".reg .u32 %r33;\n"							\
+  "mov.u32 %r28,0;\n"							\
+  "{\n"									\
+  ".param .u32 %retval_in;\n"						\
+  "{\n"									\
+  ".param .u32 %out_arg0;\n"						\
+  "st.param.u32 [%out_arg0],%r28;\n"					\
+  "call (%retval_in),GOACC_ntid,(%out_arg0);\n"				\
+  "}\n"									\
+  "ld.param.u32 %r29,[%retval_in];\n"					\
+  "}\n"									\
+  "mov.u32 %r22,%r29;\n"						\
+  "mov.u32 %r30,0;\n"							\
+  "{\n"									\
+  ".param .u32 %retval_in;\n"						\
+  "{\n"									\
+  ".param .u32 %out_arg0;\n"						\
+  "st.param.u32 [%out_arg0],%r30;\n"					\
+  "call (%retval_in),GOACC_ctaid,(%out_arg0);\n"			\
+  "}\n"									\
+  "ld.param.u32 %r31,[%retval_in];\n"					\
+  "}\n"									\
+  "mov.u32 %r23,%r31;\n"						\
+  "mul.lo.u32 %r24,%r22,%r23;\n"					\
+  "mov.u32 %r32,0;\n"							\
+  "{\n"									\
+  ".param .u32 %retval_in;\n"						\
+  "{\n"									\
+  ".param .u32 %out_arg0;\n"						\
+  "st.param.u32 [%out_arg0],%r32;\n"					\
+  "call (%retval_in),GOACC_tid,(%out_arg0);\n"				\
+  "}\n"									\
+  "ld.param.u32 %r33,[%retval_in];\n"					\
+  "}\n"									\
+  "mov.u32 %r25,%r33;\n"						\
+  "add.u32 %r26,%r24,%r25;\n"						\
+  "mov.u32 %r27,%r26;\n"						\
+  "mov.u32 %retval,%r27;\n"						\
+  "st.param.u32 [%out_retval],%retval;\n"				\
+  "ret;\n"								\
+  "}\n"
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 3d1b81b..7fedd2d 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -35,6 +35,7 @@
 #include "libgomp.h"
 #include "libgomp_target.h"
 #include "libgomp-plugin.h"
+#include "oacc-ptx.h"
 #include "oacc-plugin.h"
 
 #include <cuda.h>
@@ -722,78 +723,6 @@ PTX_get_num_devices (void)
   return n;
 }
 
-#define ABORT_PTX				\
-  ".version 3.1\n"				\
-  ".target sm_30\n"				\
-  ".address_size 64\n"				\
-  ".visible .func abort;\n"			\
-  ".visible .func abort\n"			\
-  "{\n"						\
-  "trap;\n"					\
-  "ret;\n"					\
-  "}\n"						\
-  ".visible .func _gfortran_abort;\n"		\
-  ".visible .func _gfortran_abort\n"		\
-  "{\n"						\
-  "trap;\n"					\
-  "ret;\n"					\
-  "}\n" \
-
-/* Generated with:
-
-   $ echo 'int acc_on_device(int d) { return __builtin_acc_on_device(d); } int acc_on_device_(int *d) { return acc_on_device(*d); }' | accel-gcc/xgcc -Baccel-gcc -x c - -o - -S -m64 -O3 -fno-builtin-acc_on_device -fno-inline
-*/
-#define ACC_ON_DEVICE_PTX						\
-  "        .version        3.1\n"					\
-  "        .target sm_30\n"						\
-  "        .address_size 64\n"						\
-  ".visible .func (.param.u32 %out_retval)acc_on_device(.param.u32 %in_ar1);\n" \
-  ".visible .func (.param.u32 %out_retval)acc_on_device(.param.u32 %in_ar1)\n" \
-  "{\n"									\
-  "        .reg.u32 %ar1;\n"						\
-  ".reg.u32 %retval;\n"							\
-  "        .reg.u64 %hr10;\n"						\
-  "        .reg.u32 %r24;\n"						\
-  "        .reg.u32 %r25;\n"						\
-  "        .reg.pred %r27;\n"						\
-  "        .reg.u32 %r30;\n"						\
-  "        ld.param.u32 %ar1, [%in_ar1];\n"				\
-  "                mov.u32 %r24, %ar1;\n"				\
-  "                setp.ne.u32 %r27,%r24,4;\n"				\
-  "                set.u32.eq.u32 %r30,%r24,5;\n"			\
-  "                neg.s32 %r25, %r30;\n"				\
-  "        @%r27   bra     $L3;\n"					\
-  "                mov.u32 %r25, 1;\n"					\
-  "$L3:\n"								\
-  "                mov.u32 %retval, %r25;\n"				\
-  "        st.param.u32    [%out_retval], %retval;\n"			\
-  "        ret;\n"							\
-  "        }\n"								\
-  ".visible .func (.param.u32 %out_retval)acc_on_device_(.param.u64 %in_ar1);\n" \
-  ".visible .func (.param.u32 %out_retval)acc_on_device_(.param.u64 %in_ar1)\n" \
-  "{\n"									\
-  "        .reg.u64 %ar1;\n"						\
-  ".reg.u32 %retval;\n"							\
-  "        .reg.u64 %hr10;\n"						\
-  "        .reg.u64 %r25;\n"						\
-  "        .reg.u32 %r26;\n"						\
-  "        .reg.u32 %r27;\n"						\
-  "        ld.param.u64 %ar1, [%in_ar1];\n"				\
-  "                mov.u64 %r25, %ar1;\n"				\
-  "                ld.u32  %r26, [%r25];\n"				\
-  "        {\n"								\
-  "                .param.u32 %retval_in;\n"				\
-  "        {\n"								\
-  "                .param.u32 %out_arg0;\n"				\
-  "                st.param.u32 [%out_arg0], %r26;\n"			\
-  "                call (%retval_in), acc_on_device, (%out_arg0);\n"	\
-  "        }\n"								\
-  "                ld.param.u32    %r27, [%retval_in];\n"		\
-  "}\n"									\
-  "                mov.u32 %retval, %r27;\n"				\
-  "        st.param.u32    [%out_retval], %retval;\n"			\
-  "        ret;\n"							\
-  "        }"
 
 static void
 link_ptx (CUmodule *module, char *ptx_code)
@@ -856,6 +785,16 @@ link_ptx (CUmodule *module, char *ptx_code)
 			 cuda_error (r));
     }
 
+  char *goacc_internal_ptx = GOACC_INTERNAL_PTX;
+  r = cuLinkAddData (linkstate, CU_JIT_INPUT_PTX, goacc_internal_ptx,
+		     strlen (goacc_internal_ptx) + 1, 0, 0, 0, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      GOMP_PLUGIN_error ("Link error log %s\n", &elog[0]);
+      GOMP_PLUGIN_fatal ("cuLinkAddData (goacc_internal_ptx) error: %s",
+			 cuda_error (r));
+    }
+
   r = cuLinkAddData (linkstate, CU_JIT_INPUT_PTX, ptx_code,
               strlen (ptx_code) + 1, 0, 0, 0, 0);
   if (r != CUDA_SUCCESS)
@@ -1043,7 +982,7 @@ PTX_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
 
   kargs[0] = &dp;
   r = cuLaunchKernel (function,
-			1, 1, 1,
+			num_gangs, 1, 1,
 			nthreads_in_block, 1, 1,
 			0, dev_str->stream, kargs, 0);
   if (r != CUDA_SUCCESS)
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90
new file mode 100644
index 0000000..6325431
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90
@@ -0,0 +1,30 @@
+! { dg-do run }
+
+program reduction
+  implicit none
+
+  integer, parameter    :: n = 100
+  integer               :: i, s1, s2, vs1, vs2
+
+  s1 = 0
+  s2 = 0
+  vs1 = 0
+  vs2 = 0
+
+  !$acc parallel vector_length (1000)
+  !$acc loop reduction(+:s1, s2)
+  do i = 1, n
+     s1 = s1 + 1
+     s2 = s2 + 2
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+     vs1 = vs1 + 1
+     vs2 = vs2 + 2
+  end do
+
+  if (s1.ne.vs1) call abort ()
+  if (s2.ne.vs2) call abort ()
+end program reduction

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost)
  2014-11-19 19:58       ` Bernd Schmidt
@ 2014-11-19 20:39         ` Cesar Philippidis
  0 siblings, 0 replies; 36+ messages in thread
From: Cesar Philippidis @ 2014-11-19 20:39 UTC (permalink / raw)
  To: Bernd Schmidt, Julian Brown, Jakub Jelinek
  Cc: gcc-patches, Thomas Schwinge, Ilya Verbin

On 11/19/2014 11:44 AM, Bernd Schmidt wrote:

> I'm attaching the patch in the form in which I've made it work locally,
> plus Cesar's patch which is needed on top of it. Julian, you'll probably
> want to look for that patch since it also included testsuite changes.
> Cesar - have a look over this please and maybe explain for review
> purposes what your patch does.

Julian's initial libgomp patch set diverged somewhat from both our
internal tree and gomp-4_0-branch. I think he was trying to get an
earlier snapshot of gomp-4_0-branch to play nicely with the gomp4-offload
branch, and my patch went in rather late.

Anyway, here's the link to my original patch:

https://gcc.gnu.org/ml/gcc-patches/2014-10/msg03392.html

The patch introduces two new libgomp-internal functions,
GOACC_get_thread_num and GOACC_get_num_thread. There are more details
at the link.

Cesar

* Re: [PATCH 7/10] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin
  2014-09-23 18:20 [PATCH 7/10] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin Julian Brown
  2014-11-11 13:54 ` [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost) Julian Brown
@ 2014-12-22 16:41 ` Thomas Schwinge
  2015-01-12 14:49 ` Thomas Schwinge
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 36+ messages in thread
From: Thomas Schwinge @ 2014-12-22 16:41 UTC (permalink / raw)
  To: gcc-patches; +Cc: Julian Brown

[-- Attachment #1: Type: text/plain, Size: 3213 bytes --]

Hi!

On Tue, 23 Sep 2014 19:19:31 +0100, Julian Brown <julian@codesourcery.com> wrote:
> This patch contains the bulk of the OpenACC 2.0 runtime support,

> --- /dev/null
> +++ b/libgomp/plugin-nvptx.c
> @@ -0,0 +1,1854 @@
> +/* Plugin for NVPTX execution.

> +const char *
> +get_name (void)
> +{
> +  return "nvidia";
> +}

Committed to gomp-4_0-branch in r219018:

commit 56e092991a343484fe3d26b4506587d9bb99c1a9
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Mon Dec 22 16:37:48 2014 +0000

    libgomp: Fix nvptx plugin's GOMP_OFFLOAD_get_name.
    
    This is a function in the "generic" interface, so it should return the
    "generic" name instead of the OpenACC one.
    
    	libgomp/
    	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_name): Return "nvptx".
    	* oacc-init.c (resolve_device): Update for that using...
    	(get_openacc_name): ... this new function.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@219018 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp        |  4 ++++
 libgomp/oacc-init.c           | 15 ++++++++++++++-
 libgomp/plugin/plugin-nvptx.c |  2 +-
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 6653e58..a36ec1f 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,5 +1,9 @@
 2014-12-22  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_name): Return "nvptx".
+	* oacc-init.c (resolve_device): Update for that using...
+	(get_openacc_name): ... this new function.
+
 	* testsuite/libgomp-test-support.exp.in
 	(offload_additional_options, offload_additional_lib_paths): Don't
 	set.
diff --git libgomp/oacc-init.c libgomp/oacc-init.c
index 7298d9a..ff51856 100644
--- libgomp/oacc-init.c
+++ libgomp/oacc-init.c
@@ -35,6 +35,7 @@
 #include <strings.h>
 #include <stdbool.h>
 #include <stdio.h>
+#include <string.h>
 
 static gomp_mutex_t acc_device_lock;
 
@@ -84,6 +85,17 @@ goacc_register (struct gomp_device_descr const *disp)
   gomp_mutex_unlock (&acc_device_lock);
 }
 
+/* OpenACC names some things a little differently.  */
+
+static const char *
+get_openacc_name (const char *name)
+{
+  if (strcmp (name, "nvptx") == 0)
+    return "nvidia";
+  else
+    return name;
+}
+
 static struct gomp_device_descr const *
 resolve_device (acc_device_t d)
 {
@@ -98,7 +110,8 @@ resolve_device (acc_device_t d)
 	    /* Lookup the named device.  */
 	    while (++d != _ACC_device_hwm)
 	      if (dispatchers[d]
-		  && !strcasecmp (goacc_device_type, dispatchers[d]->name)
+		  && !strcasecmp (goacc_device_type,
+				  get_openacc_name (dispatchers[d]->name))
 		  && dispatchers[d]->get_num_devices_func () > 0)
 		goto found;
 
diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index bc5739a..d423d3a 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -1499,7 +1499,7 @@ GOMP_OFFLOAD_get_caps (void)
 const char *
 GOMP_OFFLOAD_get_name (void)
 {
-  return "nvidia";
+  return "nvptx";
 }
 
 int

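A minimal sketch of the resulting name lookup, assuming a trimmed-down device descriptor (struct device_sketch, openacc_name and lookup_device are hypothetical names, not libgomp's). The plugin reports its generic name, and only the OpenACC-facing comparison, as in the quoted resolve_device, translates "nvptx" into the "nvidia" spelling that the ACC_DEVICE_TYPE setting uses:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strcasecmp (POSIX) */

/* Hypothetical, trimmed-down device descriptor: only what this sketch needs. */
struct device_sketch
{
  const char *name;  /* generic plugin name, e.g. "nvptx" */
  int num_devices;   /* what get_num_devices_func would report */
};

/* Map a generic plugin name to its OpenACC spelling (cf. get_openacc_name). */
static const char *
openacc_name (const char *name)
{
  if (strcmp (name, "nvptx") == 0)
    return "nvidia";
  return name;
}

/* Return the index of the first device whose OpenACC name matches TYPE
   case-insensitively and which has at least one device available,
   or -1 if there is none.  */
static int
lookup_device (const struct device_sketch *devs, size_t n, const char *type)
{
  for (size_t i = 0; i < n; i++)
    if (strcasecmp (type, openacc_name (devs[i].name)) == 0
        && devs[i].num_devices > 0)
      return (int) i;
  return -1;
}
```

In this sketch, as in the quoted resolve_device, "NVIDIA" or "nvidia" selects the PTX plugin, while the generic name "nvptx" is only what GOMP_OFFLOAD_get_name reports and is not matched on the OpenACC side.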

Grüße,
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

* Re: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost)
  2014-11-11 13:54 ` [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost) Julian Brown
  2014-11-12 10:10   ` Jakub Jelinek
@ 2014-12-22 17:55   ` Thomas Schwinge
  2014-12-22 18:05   ` Thomas Schwinge
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Thomas Schwinge @ 2014-12-22 17:55 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek; +Cc: Julian Brown

[-- Attachment #1: Type: text/plain, Size: 6408 bytes --]

Hi!

On Tue, 11 Nov 2014 13:53:23 +0000, Julian Brown <julian@codesourcery.com> wrote:
> On Tue, 23 Sep 2014 19:19:31 +0100
> Julian Brown <julian@codesourcery.com> wrote:
> 
> > This patch contains the bulk of the OpenACC 2.0 runtime support,
> > building around, or on top of, the OpenMP 4.0 support (as previously
> > posted or already extant upstream) where we could. [...]
> 
> Here is a new version of the OpenACC support patch for libgomp, rebased
> on top of a version of Ilya Verbin's patches that I merged to a local
> clone of trunk, and tested as far as possible without the
> middle/front-end pieces, since those are not ready yet.

> --- a/libgomp/target.c
> +++ b/libgomp/target.c

Code is split out of gomp_init_device into the new function gomp_init_tables:

>  gomp_init_device (struct gomp_device_descr *devicep)
>  {
> +  /* Initialize the target device.  */
>    devicep->init_device_func (devicep->target_id);
> +  
> +  devicep->is_initialized = true;
> +}
>  
> +attribute_hidden void
> +gomp_init_tables (const struct gomp_device_descr *devicep,
> +		  struct gomp_memory_mapping *mm)
> +{
>    /* Get address mapping table for device.  */
>    struct mapping_table *table = NULL;
> [...]

..., and a new function, gomp_init_dev_tables, is added to call both of them:

> +static void
> +gomp_init_dev_tables (struct gomp_device_descr *devicep)
> +{
> +  gomp_init_device (devicep);
> +  gomp_init_tables (devicep, &devicep->mem_map);
> +}

..., which is then used in GOMP_target:

> @@ -673,7 +753,12 @@ GOMP_target (int device, void (*fn) (void *), const void *openmp_target,

> +  if (devicep != NULL && !devicep->is_initialized)
> +    gomp_init_dev_tables (devicep);

..., and GOMP_target_data:

> @@ -724,7 +818,12 @@ GOMP_target_data (int device, const void *openmp_target, size_t mapnum,

> +  if (devicep != NULL && !devicep->is_initialized)
> +    gomp_init_dev_tables (devicep);

..., but not in GOMP_target_update:

> @@ -771,15 +871,18 @@ GOMP_target_update (int device, const void *openmp_target, size_t mapnum,

> +  if (devicep != NULL && !devicep->is_initialized)
>      gomp_init_device (devicep);

Committed to gomp-4_0-branch in r219023:

commit 0020b7972a6f7f422bd9d4782021ff5b8e2b0f50
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Mon Dec 22 17:40:37 2014 +0000

    libgomp: Restore code that initializes the address mapping tables in GOMP_target_update.
    
    	libgomp/
    	* target.c (GOMP_target_update): To initialize, call
    	gomp_init_dev_tables instead of gomp_init_device.
    
    With Intel MIC offloading (emulation), this fixes cases of "#pragma omp declare
    target" of variables, followed by "#pragma omp target update to([...])" before
    any "#pragma omp target" or "#pragma omp target data" is called:
    
        FAIL: libgomp.c/examples-4/e.53.3.c execution test
        FAIL: libgomp.c/examples-4/e.53.4.c execution test
        FAIL: libgomp.c/examples-4/e.53.5.c execution test
        FAIL: libgomp.fortran/examples-4/e.53.3.f90   -O0  execution test
        FAIL: libgomp.fortran/examples-4/e.53.3.f90   -O1  execution test
        FAIL: libgomp.fortran/examples-4/e.53.3.f90   -O2  execution test
        FAIL: libgomp.fortran/examples-4/e.53.3.f90   -O3 -fomit-frame-pointer  execution test
        FAIL: libgomp.fortran/examples-4/e.53.3.f90   -O3 -fomit-frame-pointer -funroll-loops  execution test
        FAIL: libgomp.fortran/examples-4/e.53.3.f90   -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  execution test
        FAIL: libgomp.fortran/examples-4/e.53.3.f90   -O3 -g  execution test
        FAIL: libgomp.fortran/examples-4/e.53.3.f90   -Os  execution test
        FAIL: libgomp.fortran/examples-4/e.53.4.f90   -O0  execution test
        FAIL: libgomp.fortran/examples-4/e.53.4.f90   -O1  execution test
        FAIL: libgomp.fortran/examples-4/e.53.4.f90   -O2  execution test
        FAIL: libgomp.fortran/examples-4/e.53.4.f90   -O3 -fomit-frame-pointer  execution test
        FAIL: libgomp.fortran/examples-4/e.53.4.f90   -O3 -fomit-frame-pointer -funroll-loops  execution test
        FAIL: libgomp.fortran/examples-4/e.53.4.f90   -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  execution test
        FAIL: libgomp.fortran/examples-4/e.53.4.f90   -O3 -g  execution test
        FAIL: libgomp.fortran/examples-4/e.53.4.f90   -Os  execution test
        FAIL: libgomp.fortran/examples-4/e.53.5.f90   -O0  execution test
        FAIL: libgomp.fortran/examples-4/e.53.5.f90   -O1  execution test
        FAIL: libgomp.fortran/examples-4/e.53.5.f90   -O2  execution test
        FAIL: libgomp.fortran/examples-4/e.53.5.f90   -O3 -fomit-frame-pointer  execution test
        FAIL: libgomp.fortran/examples-4/e.53.5.f90   -O3 -fomit-frame-pointer -funroll-loops  execution test
        FAIL: libgomp.fortran/examples-4/e.53.5.f90   -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  execution test
        FAIL: libgomp.fortran/examples-4/e.53.5.f90   -O3 -g  execution test
        FAIL: libgomp.fortran/examples-4/e.53.5.f90   -Os  execution test
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@219023 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp | 3 +++
 libgomp/target.c       | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 26fdfe6..3aa9bf4 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,5 +1,8 @@
 2014-12-22  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* target.c (GOMP_target_update): To initialize, call
+	gomp_init_dev_tables instead of gomp_init_device.
+
 	* target.c (gomp_map_vars) <GOMP_MAP_TO_PSET>: Revert earlier
 	changes.
 
diff --git libgomp/target.c libgomp/target.c
index 423bbee..8517a84 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -891,7 +891,7 @@ GOMP_target_update (int device, const void *offload_table, size_t mapnum,
   struct gomp_memory_mapping *mm = &devicep->mem_map;
   gomp_mutex_lock (&mm->lock);
   if (!devicep->is_initialized)
-    gomp_init_device (devicep);
+    gomp_init_dev_tables (devicep);
   gomp_mutex_unlock (&mm->lock);
 
   if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))

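The initialization ordering can be modelled in a few lines. This is a sketch under simplifying assumptions: struct dev_sketch and its helpers are hypothetical, and the mm->lock mutex that libgomp holds around the check is omitted. It shows why every entry point has to go through the same helper: is_initialized guards both steps, so an entry point that only brings up the device leaves the address mapping tables unloaded for good.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct gomp_device_descr.  */
struct dev_sketch
{
  bool is_initialized;  /* set once the device itself is up */
  bool tables_loaded;   /* set once the offload address table is mapped */
};

static void
init_device (struct dev_sketch *d)      /* cf. gomp_init_device */
{
  d->is_initialized = true;
}

static void
init_tables (struct dev_sketch *d)      /* cf. gomp_init_tables */
{
  assert (d->is_initialized);           /* tables need a running device */
  d->tables_loaded = true;
}

static void
init_dev_tables (struct dev_sketch *d)  /* cf. gomp_init_dev_tables */
{
  init_device (d);
  init_tables (d);
}

/* Fixed entry point: the pattern GOMP_target, GOMP_target_data and,
   after r219023, GOMP_target_update all follow.  */
static void
ensure_ready (struct dev_sketch *d)
{
  if (!d->is_initialized)
    init_dev_tables (d);
}

/* GOMP_target_update before r219023: device only, no tables.  */
static void
update_entry_before_fix (struct dev_sketch *d)
{
  if (!d->is_initialized)
    init_device (d);
}
```

In this model, calling update_entry_before_fix first and ensure_ready afterwards leaves tables_loaded false, which mirrors the failing scenario in the commit message: a "target update" that runs before any "target" or "target data" construct.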

Grüße,
 Thomas

* Re: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost)
  2014-11-11 13:54 ` [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost) Julian Brown
  2014-11-12 10:10   ` Jakub Jelinek
  2014-12-22 17:55   ` Thomas Schwinge
@ 2014-12-22 18:05   ` Thomas Schwinge
  2014-12-22 18:12   ` Thomas Schwinge
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Thomas Schwinge @ 2014-12-22 18:05 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek; +Cc: Julian Brown

[-- Attachment #1: Type: text/plain, Size: 5185 bytes --]

Hi!

On Tue, 11 Nov 2014 13:53:23 +0000, Julian Brown <julian@codesourcery.com> wrote:
> On Tue, 23 Sep 2014 19:19:31 +0100
> Julian Brown <julian@codesourcery.com> wrote:
> > This patch contains the bulk of the OpenACC 2.0 runtime support,
> > building around, or on top of, the OpenMP 4.0 support (as previously
> > posted or already extant upstream) where we could. [...]
> 
> Here is a new version of the OpenACC support patch for libgomp, [...]

>     libgomp/

>     * libgomp_target.h [...]
>     (struct gomp_device_descr): Move here. Add offload_regions_registered,
>     openacc dispatch functions, target_data.

>     * target.c [...]
>     (splay_tree_key_s, gomp_device_descr): Don't declare here.

> --- a/libgomp/libgomp_target.h
> +++ b/libgomp/libgomp_target.h

> +/* This structure describes accelerator device.
> +   It contains name of the corresponding libgomp plugin, function handlers for
> +   interaction with the device, ID-number of the device, and information about
> +   mapped memory.  */
> +struct gomp_device_descr
> +{
> [...]
> +  /* Extra information required for a device instance by a given target.  */
> +  void *target_data;
> +};

Committed to gomp-4_0-branch in r219024:

commit 095199060ff07ddfd0fb5d5c9fecabfe80ed8eed
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Mon Dec 22 17:57:50 2014 +0000

    libgomp: Move target_data member from struct gomp_device_descr into struct acc_dispatch_t.
    
    It is only used with struct acc_dispatch_t's open_device_func and
    close_device_func, so specific to OpenACC support.
    
    	libgomp/
    	* libgomp.h (struct gomp_device_descr): Move target_data member...
    	(struct acc_dispatch_t): ... into here.  Change all users.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@219024 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp |  3 +++
 libgomp/libgomp.h      |  6 +++---
 libgomp/oacc-init.c    | 11 ++++++-----
 libgomp/target.c       |  2 +-
 4 files changed, 13 insertions(+), 9 deletions(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 3aa9bf4..4eac98c 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,5 +1,8 @@
 2014-12-22  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* libgomp.h (struct gomp_device_descr): Move target_data member...
+	(struct acc_dispatch_t): ... into here.  Change all users.
+
 	* target.c (GOMP_target_update): To initialize, call
 	gomp_init_dev_tables instead of gomp_init_device.
 
diff --git libgomp/libgomp.h libgomp/libgomp.h
index 866f6ca..ec3c52e 100644
--- libgomp/libgomp.h
+++ libgomp/libgomp.h
@@ -684,6 +684,9 @@ typedef struct acc_dispatch_t
      happen out-of-order with respect to mapping.  */
   struct target_mem_desc *data_environ;
 
+  /* Extra information required for a device instance by a given target.  */
+  void *target_data;
+
   /* Open or close a device instance.  */
   void *(*open_device_func) (int n);
   int (*close_device_func) (void *h);
@@ -769,9 +772,6 @@ struct gomp_device_descr
 
   /* Memory-mapping info for this device instance.  */
   struct gomp_memory_mapping mem_map;
-
-  /* Extra information required for a device instance by a given target.  */
-  void *target_data;
 };
 
 extern void gomp_acc_insert_pointer (size_t, void **, size_t *, void *);
diff --git libgomp/oacc-init.c libgomp/oacc-init.c
index ff51856..06039b3 100644
--- libgomp/oacc-init.c
+++ libgomp/oacc-init.c
@@ -279,11 +279,11 @@ lazy_open (int ord)
   thr->saved_bound_dev = NULL;
   thr->mapped_data = NULL;
 
-  if (!acc_dev->target_data)
-    acc_dev->target_data = acc_dev->openacc.open_device_func (ord);
+  if (!acc_dev->openacc.target_data)
+    acc_dev->openacc.target_data = acc_dev->openacc.open_device_func (ord);
 
   thr->target_tls
-    = acc_dev->openacc.create_thread_data_func (acc_dev->target_data);
+    = acc_dev->openacc.create_thread_data_func (acc_dev->openacc.target_data);
 
   acc_dev->openacc.async_set_async_func (acc_async_sync);
 
@@ -344,10 +344,11 @@ acc_shutdown_1 (acc_device_t d)
 
       if (walk->dev)
 	{
-          if (walk->dev->openacc.close_device_func (walk->dev->target_data) < 0)
+	  void *target_data = walk->dev->openacc.target_data;
+	  if (walk->dev->openacc.close_device_func (target_data) < 0)
 	    gomp_fatal ("failed to close device");
 
-	  walk->dev->target_data = NULL;
+	  walk->dev->openacc.target_data = target_data = NULL;
 
 	  gomp_free_memmap (walk->dev);
 
diff --git libgomp/target.c libgomp/target.c
index 8517a84..bf719f8 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -1106,8 +1106,8 @@ gomp_target_init (void)
 		current_device.offload_regions_registered = false;
 		current_device.mem_map.splay_tree.root = NULL;
 		current_device.mem_map.is_initialized = false;
-		current_device.target_data = NULL;
 		current_device.openacc.data_environ = NULL;
+		current_device.openacc.target_data = NULL;
 		for (i = 0; i < new_num_devices; i++)
 		  {
 		    current_device.id = num_devices + 1;

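A minimal sketch of the open-once/close pattern that target_data now belongs to, assuming a hypothetical subset of acc_dispatch_t (struct dispatch_sketch, open_once, close_instance and the toy hooks are illustrative names only). Keeping the opaque handle next to the hooks that produce and consume it is the point of the move:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of acc_dispatch_t after r219024.  */
struct dispatch_sketch
{
  void *target_data;                  /* handle returned by open_device_func */
  void *(*open_device_func) (int n);
  int (*close_device_func) (void *h);
};

/* Open instance ORD on first use only (cf. lazy_open).  */
static void
open_once (struct dispatch_sketch *acc, int ord)
{
  if (!acc->target_data)
    acc->target_data = acc->open_device_func (ord);
}

/* Close the instance and forget its handle (cf. acc_shutdown_1).
   Returns the close hook's result; negative means failure.  */
static int
close_instance (struct dispatch_sketch *acc)
{
  int r = acc->close_device_func (acc->target_data);
  acc->target_data = NULL;
  return r;
}

/* Toy plugin hooks for the usage example.  */
static int opens;
static int handle_storage;

static void *
toy_open (int n)
{
  (void) n;
  opens++;
  return &handle_storage;
}

static int
toy_close (void *h)
{
  return h == &handle_storage ? 0 : -1;
}
```

Unlike this sketch, acc_shutdown_1 in the patch treats a negative result from close_device_func as fatal (gomp_fatal) instead of returning it to the caller.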

Grüße,
 Thomas

* Re: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost)
  2014-11-11 13:54 ` [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost) Julian Brown
                     ` (2 preceding siblings ...)
  2014-12-22 18:05   ` Thomas Schwinge
@ 2014-12-22 18:12   ` Thomas Schwinge
  2014-12-22 18:16   ` Thomas Schwinge
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Thomas Schwinge @ 2014-12-22 18:12 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek; +Cc: Julian Brown

[-- Attachment #1: Type: text/plain, Size: 3913 bytes --]

Hi!

On Tue, 11 Nov 2014 13:53:23 +0000, Julian Brown <julian@codesourcery.com> wrote:
> On Tue, 23 Sep 2014 19:19:31 +0100
> Julian Brown <julian@codesourcery.com> wrote:
> > This patch contains the bulk of the OpenACC 2.0 runtime support,
> > building around, or on top of, the OpenMP 4.0 support (as previously
> > posted or already extant upstream) where we could. [...]
> 
> Here is a new version of the OpenACC support patch for libgomp, [...]

> --- a/libgomp/libgomp_target.h
> +++ b/libgomp/libgomp_target.h

> +extern attribute_hidden void
> +gomp_free_memmap (struct gomp_device_descr *devicep);

> --- a/libgomp/target.c
> +++ b/libgomp/target.c

> +attribute_hidden void
> +gomp_free_memmap (struct gomp_device_descr *devicep)
> +{
> +  struct gomp_memory_mapping *mm = &devicep->mem_map;
> +
> +  while (mm->splay_tree.root)
> +    {
> +      struct target_mem_desc *tgt = mm->splay_tree.root->key.tgt;
> +      
> +      splay_tree_remove (&mm->splay_tree, &mm->splay_tree.root->key);
> +      free (tgt->array);
> +      free (tgt);
> +    }
> +
> +  mm->is_initialized = false;
> +}

Committed to gomp-4_0-branch in r219025:

commit 2eb33739d20c07303c42ed56db0fb925b575f33e
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Mon Dec 22 18:04:41 2014 +0000

    libgomp: gomp_free_memmap interface change.
    
    	libgomp/
    	* libgomp.h (gomp_free_memmap): Take a pointer to a struct
    	gomp_memory_mapping instead of a pointer to a struct
    	gomp_device_descr.  Change all users.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@219025 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp | 4 ++++
 libgomp/libgomp.h      | 4 ++--
 libgomp/oacc-init.c    | 2 +-
 libgomp/target.c       | 4 +---
 4 files changed, 8 insertions(+), 6 deletions(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 4eac98c..383993d 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,5 +1,9 @@
 2014-12-22  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* libgomp.h (gomp_free_memmap): Take a pointer to a struct
+	gomp_memory_mapping instead of a pointer to a struct
+	gomp_device_descr.  Change all users.
+
 	* libgomp.h (struct gomp_device_descr): Move target_data member...
 	(struct acc_dispatch_t): ... into here.  Change all users.
 
diff --git libgomp/libgomp.h libgomp/libgomp.h
index ec3c52e..5897d8f 100644
--- libgomp/libgomp.h
+++ libgomp/libgomp.h
@@ -790,10 +790,10 @@ extern void gomp_init_device (struct gomp_device_descr *);
 extern void gomp_init_tables (const struct gomp_device_descr *,
 			      struct gomp_memory_mapping *);
 
+extern void gomp_free_memmap (struct gomp_memory_mapping *);
+
 extern void gomp_fini_device (struct gomp_device_descr *);
 
-extern void gomp_free_memmap (struct gomp_device_descr *);
-
 /* work.c */
 
 extern void gomp_init_work_share (struct gomp_work_share *, bool, unsigned);
diff --git libgomp/oacc-init.c libgomp/oacc-init.c
index 06039b3..3867ca7 100644
--- libgomp/oacc-init.c
+++ libgomp/oacc-init.c
@@ -350,7 +350,7 @@ acc_shutdown_1 (acc_device_t d)
 
 	  walk->dev->openacc.target_data = target_data = NULL;
 
-	  gomp_free_memmap (walk->dev);
+	  gomp_free_memmap (&walk->dev->mem_map);
 
 	  walk->dev = NULL;
 	}
diff --git libgomp/target.c libgomp/target.c
index bf719f8..788d9fb 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -722,10 +722,8 @@ gomp_init_dev_tables (struct gomp_device_descr *devicep)
 
 
 attribute_hidden void
-gomp_free_memmap (struct gomp_device_descr *devicep)
+gomp_free_memmap (struct gomp_memory_mapping *mm)
 {
-  struct gomp_memory_mapping *mm = &devicep->mem_map;
-
   while (mm->splay_tree.root)
     {
       struct target_mem_desc *tgt = mm->splay_tree.root->key.tgt;

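A sketch of the narrowed interface, under an explicit simplification: a singly linked list stands in for libgomp's splay tree, and every struct and function name here is hypothetical. Because free_memmap receives the mapping itself, it no longer needs to know which device, if any, owns that mapping:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct node_sketch;

/* Descriptor that owns the storage of its node(s) (cf. target_mem_desc).  */
struct tgt_sketch
{
  struct node_sketch *array;
};

/* A linked-list node standing in for a splay tree node.  */
struct node_sketch
{
  struct node_sketch *next;
  struct tgt_sketch *tgt;
};

/* Stand-in for struct gomp_memory_mapping.  */
struct memmap_sketch
{
  struct node_sketch *root;
  bool is_initialized;
};

/* Drain the mapping by repeatedly unlinking the root and freeing the
   descriptor that owns it (cf. gomp_free_memmap after r219025).  */
static void
free_memmap (struct memmap_sketch *mm)
{
  while (mm->root)
    {
      struct tgt_sketch *tgt = mm->root->tgt;
      mm->root = mm->root->next;  /* unlink before freeing the node storage */
      free (tgt->array);
      free (tgt);
    }
  mm->is_initialized = false;
}

/* Helper for the usage example: map one region.  */
static bool
add_mapping (struct memmap_sketch *mm)
{
  struct tgt_sketch *tgt = malloc (sizeof *tgt);
  if (!tgt)
    return false;
  tgt->array = malloc (sizeof *tgt->array);
  if (!tgt->array)
    {
      free (tgt);
      return false;
    }
  tgt->array->tgt = tgt;
  tgt->array->next = mm->root;
  mm->root = tgt->array;
  mm->is_initialized = true;
  return true;
}
```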

Grüße,
 Thomas

* Re: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost)
  2014-11-11 13:54 ` [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost) Julian Brown
                     ` (3 preceding siblings ...)
  2014-12-22 18:12   ` Thomas Schwinge
@ 2014-12-22 18:16   ` Thomas Schwinge
  2014-12-22 18:55   ` Thomas Schwinge
  2014-12-23  0:57   ` Thomas Schwinge
  6 siblings, 0 replies; 36+ messages in thread
From: Thomas Schwinge @ 2014-12-22 18:16 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek; +Cc: Julian Brown

[-- Attachment #1: Type: text/plain, Size: 8044 bytes --]

Hi!

On Tue, 11 Nov 2014 13:53:23 +0000, Julian Brown <julian@codesourcery.com> wrote:
> On Tue, 23 Sep 2014 19:19:31 +0100
> Julian Brown <julian@codesourcery.com> wrote:
> 
> > This patch contains the bulk of the OpenACC 2.0 runtime support,
> > building around, or on top of, the OpenMP 4.0 support (as previously
> > posted or already extant upstream) where we could. [...]
> 
> Here is a new version of the OpenACC support patch for libgomp, [...]

Committed to gomp-4_0-branch in r219026:

commit 9a4509c31bcb89a4eb78d70dba4eb3d1b4709c8b
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Mon Dec 22 18:14:16 2014 +0000

    libgomp: Remove dubious "const casting".
    
    This may be re-instantiated later on, but then "more completely".
    
    	libgomp/
    	* libgomp.h (gomp_init_tables): Remove const qualifier from struct
    	gomp_device_descr.  Change all users.
    	* oacc-int.h (base_dev, goacc_register): Likewise.
    	* oacc-init.c (dispatchers, resolve_device, acc_init_1)
    	(lazy_init): Likewise.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@219026 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp |  6 ++++++
 libgomp/libgomp.h      |  2 +-
 libgomp/oacc-init.c    | 24 ++++++++++++------------
 libgomp/oacc-int.h     |  4 ++--
 libgomp/oacc-mem.c     |  2 +-
 libgomp/target.c       |  4 ++--
 6 files changed, 24 insertions(+), 18 deletions(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 383993d..3439797 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,5 +1,11 @@
 2014-12-22  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* libgomp.h (gomp_init_tables): Remove const qualifier from struct
+	gomp_device_descr.  Change all users.
+	* oacc-int.h (base_dev, goacc_register): Likewise.
+	* oacc-init.c (dispatchers, resolve_device, acc_init_1)
+	(lazy_init): Likewise.
+
 	* libgomp.h (gomp_free_memmap): Take a pointer to a struct
 	gomp_memory_mapping instead of a pointer to a struct
 	gomp_device_descr.  Change all users.
diff --git libgomp/libgomp.h libgomp/libgomp.h
index 5897d8f..440bfce 100644
--- libgomp/libgomp.h
+++ libgomp/libgomp.h
@@ -787,7 +787,7 @@ extern void gomp_unmap_vars (struct target_mem_desc *, bool);
 
 extern void gomp_init_device (struct gomp_device_descr *);
 
-extern void gomp_init_tables (const struct gomp_device_descr *,
+extern void gomp_init_tables (struct gomp_device_descr *,
 			      struct gomp_memory_mapping *);
 
 extern void gomp_free_memmap (struct gomp_memory_mapping *);
diff --git libgomp/oacc-init.c libgomp/oacc-init.c
index 3867ca7..d10b974 100644
--- libgomp/oacc-init.c
+++ libgomp/oacc-init.c
@@ -46,7 +46,7 @@ static gomp_mutex_t acc_device_lock;
    for overall initialisation/shutdown, and other instances -- not necessarily
    including this one -- may be opened and closed once the base device has
    been initialized.  */
-struct gomp_device_descr const *base_dev;
+struct gomp_device_descr *base_dev;
 
 #if defined HAVE_TLS || defined USE_EMUTLS
 __thread struct goacc_thread *goacc_tls_data;
@@ -65,10 +65,10 @@ static gomp_mutex_t goacc_thread_lock;
    only references "base" devices, and other instances of the same type are
    found by simply indexing from each such device (which are stored linearly,
    grouped by device in target.c:devices).  */
-static struct gomp_device_descr const *dispatchers[_ACC_device_hwm] = { 0 };
+static struct gomp_device_descr *dispatchers[_ACC_device_hwm] = { 0 };
 
 attribute_hidden void
-goacc_register (struct gomp_device_descr const *disp)
+goacc_register (struct gomp_device_descr *disp)
 {
   /* Only register the 0th device here.  */
   if (disp->target_id != 0)
@@ -96,7 +96,7 @@ get_openacc_name (const char *name)
     return name;
 }
 
-static struct gomp_device_descr const *
+static struct gomp_device_descr *
 resolve_device (acc_device_t d)
 {
   acc_device_t d_arg = d;
@@ -158,10 +158,10 @@ resolve_device (acc_device_t d)
    (indirectly) the target's device_init hook.  Calling multiple times without
    an intervening acc_shutdown_1 call is an error.  */
 
-static struct gomp_device_descr const *
+static struct gomp_device_descr *
 acc_init_1 (acc_device_t d)
 {
-  struct gomp_device_descr const *acc_dev;
+  struct gomp_device_descr *acc_dev;
 
   acc_dev = resolve_device (d);
 
@@ -174,7 +174,7 @@ acc_init_1 (acc_device_t d)
   /* We need to remember what we were intialized as, to check shutdown etc.  */
   init_key = d;  
 
-  gomp_init_device ((struct gomp_device_descr *) acc_dev);
+  gomp_init_device (acc_dev);
 
   return acc_dev;
 }
@@ -272,7 +272,7 @@ lazy_open (int ord)
   if (!thr)
     thr = goacc_new_thread ();
 
-  acc_dev = thr->dev = (struct gomp_device_descr *) &base_dev[ord];
+  acc_dev = thr->dev = &base_dev[ord];
 
   assert (acc_dev->target_id == ord);
 
@@ -358,7 +358,7 @@ acc_shutdown_1 (acc_device_t d)
 
   gomp_mutex_unlock (&goacc_thread_lock);
 
-  gomp_fini_device ((struct gomp_device_descr *) base_dev);
+  gomp_fini_device (base_dev);
 
   base_dev = NULL;
 }
@@ -382,7 +382,7 @@ ialias (acc_shutdown)
    current base device, else shut the old device down and re-initialize with
    the new device type.  */
 
-static struct gomp_device_descr const *
+static struct gomp_device_descr *
 lazy_init (acc_device_t d)
 {
   if (base_dev)
@@ -421,7 +421,7 @@ int
 acc_get_num_devices (acc_device_t d)
 {
   int n = 0;
-  struct gomp_device_descr const *acc_dev;
+  const struct gomp_device_descr *acc_dev;
 
   if (d == acc_device_none)
     return 0;
@@ -595,7 +595,7 @@ goacc_save_and_set_bind (acc_device_t d)
   assert (!thr->saved_bound_dev);
 
   thr->saved_bound_dev = thr->dev;
-  thr->dev = (struct gomp_device_descr *) dispatchers[d];
+  thr->dev = dispatchers[d];
 }
 
 attribute_hidden void
diff --git libgomp/oacc-int.h libgomp/oacc-int.h
index 3c2c37f..e03cd8d 100644
--- libgomp/oacc-int.h
+++ libgomp/oacc-int.h
@@ -90,10 +90,10 @@ goacc_thread (void)
 
 struct gomp_device_descr;
 
-void goacc_register (struct gomp_device_descr const *) __GOACC_NOTHROW;
+void goacc_register (struct gomp_device_descr *) __GOACC_NOTHROW;
 
 /* Current dispatcher.  */
-extern struct gomp_device_descr const *base_dev;
+extern struct gomp_device_descr *base_dev;
 
 void goacc_runtime_initialize (void);
 void goacc_save_and_set_bind (acc_device_t);
diff --git libgomp/oacc-mem.c libgomp/oacc-mem.c
index 8f7868e..60c4e8b 100644
--- libgomp/oacc-mem.c
+++ libgomp/oacc-mem.c
@@ -519,7 +519,7 @@ gomp_acc_insert_pointer (size_t mapnum, void **hostaddrs, size_t *sizes,
   struct gomp_device_descr *acc_dev = thr->dev;
 
   gomp_debug (0, "  %s: prepare mappings\n", __FUNCTION__);
-  tgt = gomp_map_vars ((struct gomp_device_descr *) acc_dev, mapnum, hostaddrs,
+  tgt = gomp_map_vars (acc_dev, mapnum, hostaddrs,
 		       NULL, sizes, kinds, true, false);
   gomp_debug (0, "  %s: mappings prepared\n", __FUNCTION__);
   tgt->prev = acc_dev->openacc.data_environ;
diff --git libgomp/target.c libgomp/target.c
index 788d9fb..d823045 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -678,7 +678,7 @@ gomp_init_device (struct gomp_device_descr *devicep)
 }
 
 attribute_hidden void
-gomp_init_tables (const struct gomp_device_descr *devicep,
+gomp_init_tables (struct gomp_device_descr *devicep,
 		  struct gomp_memory_mapping *mm)
 {
   /* Get address mapping table for device.  */
@@ -695,7 +695,7 @@ gomp_init_tables (const struct gomp_device_descr *devicep,
       tgt->tgt_end = table[i].tgt_end;
       tgt->to_free = NULL;
       tgt->list_count = 0;
-      tgt->device_descr = (struct gomp_device_descr *) devicep;
+      tgt->device_descr = devicep;
       splay_tree_node node = tgt->array;
       splay_tree_key k = &node->key;
       k->host_start = table[i].host_start;

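Casting away const and then writing through the result is only well-defined when the underlying object was not itself defined const, and it hides the fact that the callee modifies the descriptor. A minimal sketch of the non-const style, assuming a hypothetical descriptor type (descr_sketch, base_dev_sketch and open_instance are illustrative names): instances of one device type are stored contiguously, so instance ORD is simply &base[ORD], with no cast needed:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal device descriptor.  */
struct descr_sketch
{
  int target_id;
  int is_initialized;
};

/* Non-const, as after r219026, so callers that initialize (and therefore
   write to) a descriptor need no cast.  */
static struct descr_sketch *base_dev_sketch;

static void
init_device (struct descr_sketch *d)
{
  d->is_initialized = 1;
}

/* Pick instance ORD of the base device type (cf. lazy_open).  */
static struct descr_sketch *
open_instance (int ord)
{
  struct descr_sketch *d = &base_dev_sketch[ord];  /* no const cast */
  assert (d->target_id == ord);
  init_device (d);
  return d;
}
```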

Grüße,
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]


* Re: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost)
  2014-11-11 13:54 ` [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost) Julian Brown
                     ` (4 preceding siblings ...)
  2014-12-22 18:16   ` Thomas Schwinge
@ 2014-12-22 18:55   ` Thomas Schwinge
  2014-12-23  0:57   ` Thomas Schwinge
  6 siblings, 0 replies; 36+ messages in thread
From: Thomas Schwinge @ 2014-12-22 18:55 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek; +Cc: Julian Brown

[-- Attachment #1: Type: text/plain, Size: 6557 bytes --]

Hi!

On Tue, 11 Nov 2014 13:53:23 +0000, Julian Brown <julian@codesourcery.com> wrote:
> On Tue, 23 Sep 2014 19:19:31 +0100
> Julian Brown <julian@codesourcery.com> wrote:
> 
> > This patch contains the bulk of the OpenACC 2.0 runtime support,
> > building around, or on top of, the OpenMP 4.0 support (as previously
> > posted or already extant upstream) where we could. [...]
> 
> Here is a new version of the OpenACC support patch for libgomp, [...]

> --- a/libgomp/target.c
> +++ b/libgomp/target.c

> @@ -920,6 +1111,43 @@ gomp_target_init (void)
>        }
>      while (next);
>  
> +  /* Prefer a device with TARGET_CAP_OPENMP_400 for ICV default-device-var.  */
> +  if (num_devices > 1)
> +    {
> +      int d = gomp_icv (false)->default_device_var;
> +
> +      if (!(devices[d].capabilities & TARGET_CAP_OPENMP_400))
> +	{
> +	  for (i = 0; i < num_devices; i++)
> +	    {
> +	      if (devices[i].capabilities & TARGET_CAP_OPENMP_400)
> +		{
> +		  struct gomp_device_descr device_tmp = devices[d];
> +		  devices[d] = devices[i];
> +		  devices[d].id = d + 1;
> +		  devices[i] = device_tmp;
> +		  devices[i].id = i + 1;
> +
> +		  break;
> +		}
> +	    }
> +	}
> +    }

A thinko of mine; committed to gomp-4_0-branch in r219029:

commit 806b4f5eed613a43bf52082816469268df0ed9a5
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Mon Dec 22 18:31:42 2014 +0000

    libgomp: For OpenMP offloading, only publicize GOMP_OFFLOAD_CAP_OPENMP_400 devices.
    
    	libgomp/
    	* target.c (num_devices_openmp): New variable.
    	(gomp_get_num_devices): Use it.
    	(gomp_target_init): Initialize it, and sort the devices array
    	appropriately.
    
    With Intel MIC offloading (emulation), this fixes:
    
        FAIL: libgomp.c/examples-4/e.57.2.c execution test
        FAIL: libgomp.fortran/examples-4/e.57.2.f90   -O0  execution test
        FAIL: libgomp.fortran/examples-4/e.57.2.f90   -O1  execution test
        FAIL: libgomp.fortran/examples-4/e.57.2.f90   -O2  execution test
        FAIL: libgomp.fortran/examples-4/e.57.2.f90   -O3 -fomit-frame-pointer  execution test
        FAIL: libgomp.fortran/examples-4/e.57.2.f90   -O3 -fomit-frame-pointer -funroll-loops  execution test
        FAIL: libgomp.fortran/examples-4/e.57.2.f90   -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  execution test
        FAIL: libgomp.fortran/examples-4/e.57.2.f90   -O3 -g  execution test
        FAIL: libgomp.fortran/examples-4/e.57.2.f90   -Os  execution test
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@219029 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp |  5 +++++
 libgomp/target.c       | 52 ++++++++++++++++++++++++++++----------------------
 2 files changed, 34 insertions(+), 23 deletions(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index eabf737..b9bd024 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,5 +1,10 @@
 2014-12-22  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* target.c (num_devices_openmp): New variable.
+	(gomp_get_num_devices): Use it.
+	(gomp_target_init): Initialize it, and sort the devices array
+	appropriately.
+
 	* libgomp.h (struct gomp_device_descr): Remove id member.  Update
 	all users.
 
diff --git libgomp/target.c libgomp/target.c
index 226b95b..bf6edd2 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -67,6 +67,9 @@ static struct gomp_device_descr *devices;
 /* Total number of available devices.  */
 static int num_devices;
 
+/* Number of GOMP_OFFLOAD_CAP_OPENMP_400 devices.  */
+static int num_devices_openmp;
+
 /* The comparison function.  */
 
 attribute_hidden int
@@ -94,7 +97,7 @@ attribute_hidden int
 gomp_get_num_devices (void)
 {
   gomp_init_targets_once ();
-  return num_devices;
+  return num_devices_openmp;
 }
 
 static struct gomp_device_descr *
@@ -1048,9 +1051,11 @@ gomp_register_image_for_device (struct gomp_device_descr *device,
 }
 
 /* This function initializes the runtime needed for offloading.
-   It parses the list of offload targets and tries to load the plugins for these
-   targets.  Result of the function is properly initialized variable NUM_DEVICES
-   and array DEVICES, containing descriptors for corresponding devices.  */
+   It parses the list of offload targets and tries to load the plugins for
+   these targets.  On return, the variables NUM_DEVICES and NUM_DEVICES_OPENMP
+   will be set, and the array DEVICES initialized, containing descriptors for
+   corresponding devices, first the GOMP_OFFLOAD_CAP_OPENMP_400 ones, followed
+   by the others.  */
 
 static void
 gomp_target_init (void)
@@ -1089,6 +1094,8 @@ gomp_target_init (void)
 	    new_num_devices = current_device.get_num_devices_func ();
 	    if (new_num_devices >= 1)
 	      {
+		/* Augment DEVICES and NUM_DEVICES.  */
+
 		devices = realloc (devices, (num_devices + new_num_devices)
 				   * sizeof (struct gomp_device_descr));
 		if (!devices)
@@ -1121,27 +1128,26 @@ gomp_target_init (void)
       }
     while (next);
 
-  /* Prefer a device with GOMP_OFFLOAD_CAP_OPENMP_400 for ICV
-     default-device-var.  */
-  if (num_devices > 1)
+  /* In DEVICES, sort the GOMP_OFFLOAD_CAP_OPENMP_400 ones first, and set
+     NUM_DEVICES_OPENMP.  */
+  struct gomp_device_descr *devices_s
+    = malloc (num_devices * sizeof (struct gomp_device_descr));
+  if (!devices_s)
     {
-      int d = gomp_icv (false)->default_device_var;
-
-      if (!(devices[d].capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
-	{
-	  for (i = 0; i < num_devices; i++)
-	    {
-	      if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
-		{
-		  struct gomp_device_descr device_tmp = devices[d];
-		  devices[d] = devices[i];
-		  devices[i] = device_tmp;
-
-		  break;
-		}
-	    }
-	}
+      num_devices = 0;
+      free (devices);
+      devices = NULL;
     }
+  num_devices_openmp = 0;
+  for (i = 0; i < num_devices; i++)
+    if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+      devices_s[num_devices_openmp++] = devices[i];
+  int num_devices_after_openmp = num_devices_openmp;
+  for (i = 0; i < num_devices; i++)
+    if (!(devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+      devices_s[num_devices_after_openmp++] = devices[i];
+  free (devices);
+  devices = devices_s;
 
   for (i = 0; i < num_devices; i++)
     {

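The new code in gomp_target_init performs a stable partition. It copies the OpenMP-capable devices first and the remaining devices after them. As a result, the first NUM_DEVICES_OPENMP entries of DEVICES are exactly the GOMP_OFFLOAD_CAP_OPENMP_400 ones, which is the count that gomp_get_num_devices now reports.

The self-contained sketch below implements the same idea. `struct device`, `CAP_OPENMP_400` and `partition_openmp_first` are hypothetical names chosen for illustration. The real capability value is defined in libgomp's plugin interface and is not reproduced here.

```c
#include <stdlib.h>

/* Hypothetical capability bit, for illustration only.  */
#define CAP_OPENMP_400 (1u << 0)

struct device
{
  int id;                 /* Original position, to make the order visible.  */
  unsigned capabilities;
};

/* Reorder DEVICES (length N) so that all CAP_OPENMP_400 devices come first,
   keeping the relative order within each group.  Returns the number of
   CAP_OPENMP_400 devices, or -1 if allocation fails (DEVICES unchanged).  */
static int
partition_openmp_first (struct device *devices, int n)
{
  if (n == 0)
    return 0;

  struct device *sorted = malloc ((size_t) n * sizeof *sorted);
  if (!sorted)
    return -1;

  /* First pass: OpenMP-capable devices, in their original order.  */
  int num_openmp = 0;
  for (int i = 0; i < n; i++)
    if (devices[i].capabilities & CAP_OPENMP_400)
      sorted[num_openmp++] = devices[i];

  /* Second pass: everything else, appended after them.  */
  int next = num_openmp;
  for (int i = 0; i < n; i++)
    if (!(devices[i].capabilities & CAP_OPENMP_400))
      sorted[next++] = devices[i];

  for (int i = 0; i < n; i++)
    devices[i] = sorted[i];
  free (sorted);
  return num_openmp;
}
```

The sketch differs from the patch in two ways. It copies the result back into the caller's array instead of swapping the DEVICES pointer. On allocation failure, it reports an error to the caller instead of dropping all devices.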

Grüße,
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]


* Re: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost)
  2014-11-11 13:54 ` [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost) Julian Brown
                     ` (5 preceding siblings ...)
  2014-12-22 18:55   ` Thomas Schwinge
@ 2014-12-23  0:57   ` Thomas Schwinge
  6 siblings, 0 replies; 36+ messages in thread
From: Thomas Schwinge @ 2014-12-23  0:57 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek; +Cc: Julian Brown

[-- Attachment #1: Type: text/plain, Size: 19994 bytes --]

Hi!

On Tue, 11 Nov 2014 13:53:23 +0000, Julian Brown <julian@codesourcery.com> wrote:
> On Tue, 23 Sep 2014 19:19:31 +0100
> Julian Brown <julian@codesourcery.com> wrote:
> 
> > This patch contains the bulk of the OpenACC 2.0 runtime support,
> > building around, or on top of, the OpenMP 4.0 support (as previously
> > posted or already extant upstream) where we could. [...]
> 
> Here is a new version of the OpenACC support patch for libgomp, [...]

> --- a/libgomp/target.c
> +++ b/libgomp/target.c

> @@ -673,7 +753,12 @@ GOMP_target (int device, void (*fn) (void *), const void *openmp_target,
>  	     unsigned char *kinds)
>  {
>    struct gomp_device_descr *devicep = resolve_device (device);
> -  if (devicep == NULL)
> +  struct gomp_memory_mapping *mm = &devicep->mem_map;
> +
> +  if (devicep != NULL && !devicep->is_initialized)
> +    gomp_init_dev_tables (devicep);
> +
> +  if (devicep == NULL || !(devicep->capabilities & TARGET_CAP_OPENMP_400))
>      {
>        /* Host fallback.  */
>        struct gomp_thread old_thr, *thr = gomp_thread ();
> @@ -690,20 +775,30 @@ GOMP_target (int device, void (*fn) (void *), const void *openmp_target,
>        return;
>      }
>  
> -  gomp_mutex_lock (&devicep->dev_env_lock);
> -  if (!devicep->is_initialized)
> -    gomp_init_device (devicep);
> [...]
> -  gomp_mutex_unlock (&devicep->dev_env_lock);

Here, gomp_init_device has (correctly) been replaced with
gomp_init_dev_tables, but the locking of dev_env_lock has also been removed.

Then, shortly after that:

> [...]
> +      gomp_mutex_lock (&mm->lock);
> +      if (!devicep->is_initialized)
> +	gomp_init_dev_tables (devicep);
> [...]
> +      gomp_mutex_unlock (&mm->lock);
> [...]

Why check the device's is_initialized flag again (shouldn't this be the
memory mapping's flag instead?), and why call gomp_init_dev_tables again?
I think what this meant to do is more fine-grained locking and
initialization, distinguishing between the device itself and its memory
mapping tables.

The same applies to GOMP_target_data and GOMP_target_update.

Removing locking from device initialization is certainly not a good idea,
for example:

    $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/ -Bbuild-gcc/x86_64-unknown-linux-gnu/./libgomp/ -Bbuild-gcc/x86_64-unknown-linux-gnu/./libgomp/.libs -Ibuild-gcc/x86_64-unknown-linux-gnu/./libgomp -Isource-gcc/libgomp/testsuite/../../include -Isource-gcc/libgomp/testsuite/.. -Binstall/offload-x86_64-intelmicemul-linux-gnu/libexec/gcc/x86_64-unknown-linux-gnu/5.0.0 -Binstall/offload-x86_64-intelmicemul-linux-gnu/bin -fopenmp -Bbuild-gcc/x86_64-unknown-linux-gnu/./libgomp/../libquadmath/.libs/   -O0   -Bbuild-gcc/x86_64-unknown-linux-gnu/./libgomp/../libgfortran/.libs -fintrinsic-modules-path=build-gcc/x86_64-unknown-linux-gnu/./libgomp   -Lbuild-gcc/x86_64-unknown-linux-gnu/./libgomp/.libs -Lbuild-gcc/x86_64-unknown-linux-gnu/./libgomp/../libquadmath/.libs/ -Lbuild-gcc/x86_64-unknown-linux-gnu/./libgomp/../libgfortran/.libs -lgfortran -lm -g source-gcc/libgomp/testsuite/libgomp.fortran/target7.f90 
    $ LD_LIBRARY_PATH=.:build-gcc/x86_64-unknown-linux-gnu/./libgomp/.libs:build-gcc/x86_64-unknown-linux-gnu/./libgomp/../liboffloadmic/.libs:build-gcc/x86_64-unknown-linux-gnu/./libgomp/../liboffloadmic/plugin/.libs:build-gcc/x86_64-unknown-linux-gnu/./libgomp/../libstdc++-v3/src/.libs:install/offload-x86_64-intelmicemul-linux-gnu/lib64:install/offload-x86_64-intelmicemul-linux-gnu/lib:build-gcc/gcc:build-gcc/x86_64-unknown-linux-gnu/./libgomp/../libgfortran/.libs:build-gcc/x86_64-unknown-linux-gnu/./libgomp/../libquadmath/.libs:.:build-gcc/x86_64-unknown-linux-gnu/./libgomp/.libs:build-gcc/x86_64-unknown-linux-gnu/./libgomp/../liboffloadmic/.libs:build-gcc/x86_64-unknown-linux-gnu/./libgomp/../liboffloadmic/plugin/.libs:build-gcc/x86_64-unknown-linux-gnu/./libgomp/../libstdc++-v3/src/.libs:install/offload-x86_64-intelmicemul-linux-gnu/lib64:install/offload-x86_64-intelmicemul-linux-gnu/lib:build-gcc/gcc:build-gcc/x86_64-unknown-linux-gnu/./libgomp/../libgfortran/.libs:build-gcc/x86_64-unknown-linux-gnu/./libgomp/../libquadmath/.libs: ./a.out
    
    libgomp: 
    libgomp: Duplicate node
    Duplicate node
    libgomp: 
    COI ERROR - TARGET: Cannot read from pipe.
    Success
    
    offload error: wait for process shutdown failed on device 0 (error code 1)

This smells like a concurrency issue, and indeed, with OMP_NUM_THREADS=1
everything is fine.  Confirmed by looking at the source code, and with
GDB:

    #0  gomp_fatal (fmt=fmt@entry=0x7ffff75ac69b "Duplicate node")
        at /home/thomas/tmp/source/gcc/openacc/gomp-4_0-branch-work_/source-gcc/libgomp/error.c:85
    #1  0x00007ffff75a857b in splay_tree_insert (sp=sp@entry=0x7fffe40058e8, node=node@entry=0x608570)
        at /home/thomas/tmp/source/gcc/openacc/gomp-4_0-branch-work_/source-gcc/libgomp/splay-tree.c:156
    #2  0x00007ffff75a5a06 in gomp_init_tables (devicep=devicep@entry=0x7fffe40057c0, mm=mm@entry=0x7fffe40058e8)
        at /home/thomas/tmp/source/gcc/openacc/gomp-4_0-branch-work_/source-gcc/libgomp/target.c:785
    #3  0x00007ffff75a5a66 in gomp_init_dev_tables (devicep=devicep@entry=0x7fffe40057c0)
        at /home/thomas/tmp/source/gcc/openacc/gomp-4_0-branch-work_/source-gcc/libgomp/target.c:796
    #4  0x00007ffff75a5b8b in GOMP_target (device=-1, fn=0x400b9c <MAIN__._omp_fn.2>, offload_table=0x6014c0, mapnum=3, hostaddrs=0x7fffffffbe60, 
        sizes=0x7fffffffbe90, kinds=0x6014a0 <.omp_data_kinds.14> "\003\034\023")
        at /home/thomas/tmp/source/gcc/openacc/gomp-4_0-branch-work_/source-gcc/libgomp/target.c:845
    #5  0x0000000000400dc4 in MAIN__._omp_fn.1 () at source-gcc/libgomp/testsuite/libgomp.fortran/target7.f90:21
    #6  0x00007ffff759deef in gomp_barrier_handle_tasks (state=state@entry=0)
        at /home/thomas/tmp/source/gcc/openacc/gomp-4_0-branch-work_/source-gcc/libgomp/task.c:733
    #7  0x00007ffff75a24cc in gomp_team_barrier_wait_end (bar=0x607190, state=0)
        at /home/thomas/tmp/source/gcc/openacc/gomp-4_0-branch-work_/source-gcc/libgomp/config/linux/bar.c:116
    #8  0x0000000000400dee in MAIN__._omp_fn.0 () at source-gcc/libgomp/testsuite/libgomp.fortran/target7.f90:16
    #9  0x00007ffff759bd3f in GOMP_parallel (fn=0x400dd3 <MAIN__._omp_fn.0>, data=0x7fffffffd050, num_threads=8, flags=0)
        at /home/thomas/tmp/source/gcc/openacc/gomp-4_0-branch-work_/source-gcc/libgomp/parallel.c:168
    #10 0x0000000000400b03 in MAIN__ () at source-gcc/libgomp/testsuite/libgomp.fortran/target7.f90:16

That is, GOMP_target is being called from inside GOMP_parallel.

From a quick look, things don't look too good on the OpenACC side either.
This all will need careful review for concurrency issues, but I'm not
addressing all of that in this patch; primarily, I'm just restoring OpenMP
device initialization locking.  Committed to gomp-4_0-branch in r219036:

commit c20a14aa9800eafcdbe9775cae0e47c2eb4a9769
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Mon Dec 22 22:53:55 2014 +0000

    libgomp: Fix locking in OpenMP GOMP_target* functions.
    
    	libgomp/
    	* libgomp.h (struct gomp_device_descr): Add lock member.
    	* oacc-host.c (goacc_host_init): Initialize it.
    	* target.c (gomp_target_init): Likewise.
    	(gomp_init_dev_tables): Remove function.
    	(GOMP_target, GOMP_target_data, GOMP_target_update): Instead of
    	calling gomp_init_dev_tables, separate device and memory mapping
    	initialization, guarded by appropriate locking.  Check (immutable)
    	device capabilities early.
    
    With Intel MIC offloading (emulation), this fixes:
    
        FAIL: libgomp.fortran/target7.f90   -O0  execution test
        FAIL: libgomp.fortran/target7.f90   -O1  execution test
        FAIL: libgomp.fortran/target7.f90   -O2  execution test
        FAIL: libgomp.fortran/target7.f90   -O3 -fomit-frame-pointer  execution test
        FAIL: libgomp.fortran/target7.f90   -O3 -fomit-frame-pointer -funroll-loops  execution test
        FAIL: libgomp.fortran/target7.f90   -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  execution test
        FAIL: libgomp.fortran/target7.f90   -O3 -g  execution test
        FAIL: libgomp.fortran/target7.f90   -Os  execution test
        FAIL: libgomp.fortran/target8.f90   -O0  execution test
        FAIL: libgomp.fortran/target8.f90   -O1  execution test
        FAIL: libgomp.fortran/target8.f90   -O2  execution test
        FAIL: libgomp.fortran/target8.f90   -O3 -fomit-frame-pointer  execution test
        FAIL: libgomp.fortran/target8.f90   -O3 -fomit-frame-pointer -funroll-loops  execution test
        FAIL: libgomp.fortran/target8.f90   -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  execution test
        FAIL: libgomp.fortran/target8.f90   -O3 -g  execution test
        FAIL: libgomp.fortran/target8.f90   -Os  execution test
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@219036 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp | 12 ++++++++
 libgomp/libgomp.h      | 11 ++++++++
 libgomp/oacc-host.c    |  1 +
 libgomp/oacc-init.c    | 12 ++++++--
 libgomp/target.c       | 77 +++++++++++++++++++++++++++++---------------------
 5 files changed, 78 insertions(+), 35 deletions(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index b9bd024..2f5b4ae 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,5 +1,17 @@
 2014-12-22  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* libgomp.h (struct gomp_device_descr): Add lock member.
+	* oacc-host.c (goacc_host_init): Initialize it.
+	* target.c (gomp_target_init): Likewise.
+	(gomp_init_dev_tables): Remove function.
+	(GOMP_target, GOMP_target_data, GOMP_target_update): Instead of
+	calling gomp_init_dev_tables, separate device and memory mapping
+	initialization, guarded by appropriate locking.  Check (immutable)
+	device capabilities early.
+
+	* oacc-init.c (lazy_open, acc_shutdown_1): Lock mem_map when in
+	use.
+
 	* target.c (num_devices_openmp): New variable.
 	(gomp_get_num_devices): Use it.
 	(gomp_target_init): Initialize it, and sort the devices array
diff --git libgomp/libgomp.h libgomp/libgomp.h
index b6d216b..edcd7bc 100644
--- libgomp/libgomp.h
+++ libgomp/libgomp.h
@@ -682,9 +682,11 @@ typedef struct acc_dispatch_t
      acc_map_data/acc_unmap_data or "acc enter data"/"acc exit data" pragmas
      (TODO).  Unlike mapped_data in the goacc_thread struct, unmapping can
      happen out-of-order with respect to mapping.  */
+  /* This is guarded by the lock in the "outer" struct gomp_device_descr.  */
   struct target_mem_desc *data_environ;
 
   /* Extra information required for a device instance by a given target.  */
+  /* This is guarded by the lock in the "outer" struct gomp_device_descr.  */
   void *target_data;
 
   /* Open or close a device instance.  */
@@ -730,6 +732,9 @@ typedef struct acc_dispatch_t
    mapped memory.  */
 struct gomp_device_descr
 {
+  /* Immutable data, which is only set during initialization, and which is not
+     guarded by the lock.  */
+
   /* The name of the device.  */
   const char *name;
 
@@ -764,10 +769,16 @@ struct gomp_device_descr
   void (*run_func) (int, void *, void *);
 
   /* OpenACC-specific functions.  */
+  /* This is mutable because of its mutable data_environ and target_data
+     members.  */
   acc_dispatch_t openacc;
 
   /* Memory-mapping info for this device instance.  */
+  /* Uses a separate lock.  */
   struct gomp_memory_mapping mem_map;
+
+  /* Mutex for the mutable data.  */
+  gomp_mutex_t lock;
 };
 
 extern void gomp_acc_insert_pointer (size_t, void **, size_t *, void *);
diff --git libgomp/oacc-host.c libgomp/oacc-host.c
index 2a82517..c375463 100644
--- libgomp/oacc-host.c
+++ libgomp/oacc-host.c
@@ -96,5 +96,6 @@ static __attribute__ ((constructor))
 void goacc_host_init (void)
 {
   gomp_mutex_init (&host_dispatch.mem_map.lock);
+  gomp_mutex_init (&host_dispatch.lock);
   goacc_register (&host_dispatch);
 }
diff --git libgomp/oacc-init.c libgomp/oacc-init.c
index d10b974..6acf61b 100644
--- libgomp/oacc-init.c
+++ libgomp/oacc-init.c
@@ -287,8 +287,11 @@ lazy_open (int ord)
 
   acc_dev->openacc.async_set_async_func (acc_async_sync);
 
-  if (!acc_dev->mem_map.is_initialized)
-    gomp_init_tables (acc_dev, &acc_dev->mem_map);
+  struct gomp_memory_mapping *mem_map = &acc_dev->mem_map;
+  gomp_mutex_lock (&mem_map->lock);
+  if (!mem_map->is_initialized)
+    gomp_init_tables (acc_dev, mem_map);
+  gomp_mutex_unlock (&mem_map->lock);
 }
 
 /* OpenACC 2.0a (3.2.12, 3.2.13) doesn't specify whether the serialization of
@@ -350,7 +353,10 @@ acc_shutdown_1 (acc_device_t d)
 
 	  walk->dev->openacc.target_data = target_data = NULL;
 
-	  gomp_free_memmap (&walk->dev->mem_map);
+	  struct gomp_memory_mapping *mem_map = &walk->dev->mem_map;
+	  gomp_mutex_lock (&mem_map->lock);
+	  gomp_free_memmap (mem_map);
+	  gomp_mutex_unlock (&mem_map->lock);
 
 	  walk->dev = NULL;
 	}
diff --git libgomp/target.c libgomp/target.c
index bf6edd2..53e0c7f 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -44,6 +44,7 @@
 
 static void gomp_target_init (void);
 
+/* The whole initialization code for offloading plugins is only run once.  */
 static pthread_once_t gomp_is_initialized = PTHREAD_ONCE_INIT;
 
 /* This structure describes an offload image.
@@ -169,6 +170,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       tgt_size = mapnum * sizeof (void *);
     }
   gomp_mutex_lock (&mm->lock);
+
   for (i = 0; i < mapnum; i++)
     {
       int kind = get_kind (is_openacc, kinds, i);
@@ -552,8 +554,9 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
       return;
     }
 
-  size_t i;
   gomp_mutex_lock (&mm->lock);
+
+  size_t i;
   for (i = 0; i < tgt->list_count; i++)
     if (tgt->list[i] == NULL)
       ;
@@ -580,6 +583,7 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
     tgt->refcount--;
   else
     gomp_unmap_tgt (tgt);
+
   gomp_mutex_unlock (&mm->lock);
 }
 
@@ -669,17 +673,19 @@ GOMP_offload_register (void *host_table, enum offload_target_type target_type,
   num_offload_images++;
 }
 
-/* This function initializes the target device, specified by DEVICEP.  */
+/* This function initializes the target device, specified by DEVICEP.  DEVICEP
+   must be locked on entry, and remains locked on return.  */
 
 attribute_hidden void
 gomp_init_device (struct gomp_device_descr *devicep)
 {
-  /* Initialize the target device.  */
   devicep->init_device_func (devicep->target_id);
-
   devicep->is_initialized = true;
 }
 
+/* Initialize address mapping tables.  MM must be locked on entry, and remains
+   locked on return.  */
+
 attribute_hidden void
 gomp_init_tables (struct gomp_device_descr *devicep,
 		  struct gomp_memory_mapping *mm)
@@ -716,13 +722,8 @@ gomp_init_tables (struct gomp_device_descr *devicep,
   mm->is_initialized = true;
 }
 
-static void
-gomp_init_dev_tables (struct gomp_device_descr *devicep)
-{
-  gomp_init_device (devicep);
-  gomp_init_tables (devicep, &devicep->mem_map);
-}
-
+/* Free address mapping tables.  MM must be locked on entry, and remains locked
+   on return.  */
 
 attribute_hidden void
 gomp_free_memmap (struct gomp_memory_mapping *mm)
@@ -739,6 +740,9 @@ gomp_free_memmap (struct gomp_memory_mapping *mm)
   mm->is_initialized = false;
 }
 
+/* This function de-initializes the target device, specified by DEVICEP.
+   DEVICEP must be locked on entry, and remains locked on return.  */
+
 attribute_hidden void
 gomp_fini_device (struct gomp_device_descr *devicep)
 {
@@ -766,9 +770,6 @@ GOMP_target (int device, void (*fn) (void *), const void *offload_table,
 {
   struct gomp_device_descr *devicep = resolve_device (device);
 
-  if (devicep != NULL && !devicep->is_initialized)
-    gomp_init_dev_tables (devicep);
-
   if (devicep == NULL
       || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
     {
@@ -787,6 +788,11 @@ GOMP_target (int device, void (*fn) (void *), const void *offload_table,
       return;
     }
 
+  gomp_mutex_lock (&devicep->lock);
+  if (!devicep->is_initialized)
+    gomp_init_device (devicep);
+  gomp_mutex_unlock (&devicep->lock);
+
   void *fn_addr;
 
   if (devicep->capabilities & GOMP_OFFLOAD_CAP_NATIVE_EXEC)
@@ -795,15 +801,17 @@ GOMP_target (int device, void (*fn) (void *), const void *offload_table,
     {
       struct gomp_memory_mapping *mm = &devicep->mem_map;
       gomp_mutex_lock (&mm->lock);
-      if (!devicep->is_initialized)
-	gomp_init_dev_tables (devicep);
+
+      if (!mm->is_initialized)
+	gomp_init_tables (devicep, mm);
+
       struct splay_tree_key_s k;
       k.host_start = (uintptr_t) fn;
       k.host_end = k.host_start + 1;
-      splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map.splay_tree,
-						 &k);
+      splay_tree_key tgt_fn = splay_tree_lookup (&mm->splay_tree, &k);
       if (tgt_fn == NULL)
 	gomp_fatal ("Target function wasn't mapped");
+
       gomp_mutex_unlock (&mm->lock);
 
       fn_addr = (void *) tgt_fn->tgt->tgt_start;
@@ -832,9 +840,6 @@ GOMP_target_data (int device, const void *offload_table, size_t mapnum,
 {
   struct gomp_device_descr *devicep = resolve_device (device);
 
-  if (devicep != NULL && !devicep->is_initialized)
-    gomp_init_dev_tables (devicep);
-
   if (devicep == NULL
       || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
     {
@@ -854,10 +859,15 @@ GOMP_target_data (int device, const void *offload_table, size_t mapnum,
       return;
     }
 
+  gomp_mutex_lock (&devicep->lock);
+  if (!devicep->is_initialized)
+    gomp_init_device (devicep);
+  gomp_mutex_unlock (&devicep->lock);
+
   struct gomp_memory_mapping *mm = &devicep->mem_map;
   gomp_mutex_lock (&mm->lock);
-  if (!devicep->is_initialized)
-    gomp_init_dev_tables (devicep);
+  if (!mm->is_initialized)
+    gomp_init_tables (devicep, mm);
   gomp_mutex_unlock (&mm->lock);
 
   struct target_mem_desc *tgt
@@ -886,20 +896,22 @@ GOMP_target_update (int device, const void *offload_table, size_t mapnum,
 {
   struct gomp_device_descr *devicep = resolve_device (device);
 
-  if (devicep == NULL)
+  if (devicep == NULL
+      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
     return;
 
+  gomp_mutex_lock (&devicep->lock);
+  if (!devicep->is_initialized)
+    gomp_init_device (devicep);
+  gomp_mutex_unlock (&devicep->lock);
+
   struct gomp_memory_mapping *mm = &devicep->mem_map;
   gomp_mutex_lock (&mm->lock);
-  if (!devicep->is_initialized)
-    gomp_init_dev_tables (devicep);
+  if (!mm->is_initialized)
+    gomp_init_tables (devicep, mm);
   gomp_mutex_unlock (&mm->lock);
 
-  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
-    return;
-
-  gomp_update (devicep, &devicep->mem_map, mapnum, hostaddrs, sizes, kinds,
-	       false);
+  gomp_update (devicep, mm, mapnum, hostaddrs, sizes, kinds, false);
 }
 
 void
@@ -1035,7 +1047,7 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
 }
 
 /* This function adds a compatible offload image IMAGE to an accelerator device
-   DEVICE.  */
+   DEVICE.  DEVICE must be locked on entry, and remains locked on return.  */
 
 static void
 gomp_register_image_for_device (struct gomp_device_descr *device,
@@ -1118,6 +1130,7 @@ gomp_target_init (void)
 		    current_device.target_id = i;
 		    devices[num_devices] = current_device;
 		    gomp_mutex_init (&devices[num_devices].mem_map.lock);
+		    gomp_mutex_init (&devices[num_devices].lock);
 		    num_devices++;
 		  }
 	      }


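The locking split introduced here can be modeled in a few lines of C with POSIX threads. Each object checks its own initialization flag while holding its own lock. Concurrent callers, such as GOMP_target invoked from several threads inside GOMP_parallel, therefore initialize the device and its mapping tables exactly once.

The types, the `CAP_OFFLOAD` bit and the functions below are hypothetical simplifications, not libgomp's. `pthread_mutex_t` stands in for `gomp_mutex_t`.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability bit, for illustration only.  */
#define CAP_OFFLOAD 1u

struct memory_mapping
{
  pthread_mutex_t lock;       /* Guards the fields below.  */
  bool is_initialized;
  int table_inits;            /* Times the tables were built; should be 1.  */
};

struct device
{
  unsigned capabilities;      /* Immutable after setup; read without locking.  */
  pthread_mutex_t lock;       /* Guards is_initialized and device_inits.  */
  bool is_initialized;
  int device_inits;           /* Times the device was initialized.  */
  struct memory_mapping mem_map;  /* Has its own, separate lock.  */
};

static void
device_setup (struct device *d, unsigned caps)
{
  d->capabilities = caps;
  pthread_mutex_init (&d->lock, NULL);
  d->is_initialized = false;
  d->device_inits = 0;
  pthread_mutex_init (&d->mem_map.lock, NULL);
  d->mem_map.is_initialized = false;
  d->mem_map.table_inits = 0;
}

/* Returns false if the device cannot be used (the caller would fall back
   to the host); otherwise ensures both device and tables are set up.  */
static bool
ensure_initialized (struct device *d)
{
  /* Immutable data: no lock needed, so check it first.  */
  if (!(d->capabilities & CAP_OFFLOAD))
    return false;

  pthread_mutex_lock (&d->lock);
  if (!d->is_initialized)
    {
      d->device_inits++;        /* Stand-in for init_device_func.  */
      d->is_initialized = true;
    }
  pthread_mutex_unlock (&d->lock);

  struct memory_mapping *mm = &d->mem_map;
  pthread_mutex_lock (&mm->lock);
  if (!mm->is_initialized)      /* The mapping's own flag, not the device's.  */
    {
      mm->table_inits++;        /* Stand-in for building the address tables.  */
      mm->is_initialized = true;
    }
  pthread_mutex_unlock (&mm->lock);
  return true;
}

static void *
worker (void *arg)
{
  ensure_initialized (arg);
  return NULL;
}

/* Calls ensure_initialized on D from NTHREADS threads concurrently.
   Returns 0 if all threads were created, -1 otherwise.  */
static int
race_init (struct device *d, int nthreads)
{
  enum { MAX_THREADS = 64 };
  pthread_t tids[MAX_THREADS];
  int created = 0;

  if (nthreads > MAX_THREADS)
    nthreads = MAX_THREADS;
  while (created < nthreads
         && pthread_create (&tids[created], NULL, worker, d) == 0)
    created++;
  for (int i = 0; i < created; i++)
    pthread_join (tids[i], NULL);
  return created == nthreads ? 0 : -1;
}
```

In the earlier code, the first is_initialized check and the initialization it guarded ran without holding any lock. Two threads could therefore both build the mapping tables, which is consistent with the "Duplicate node" failure shown above.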
Grüße,
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]


* libgomp: Now known as the GNU Offloading and Multi Processing Runtime Library (was: libgomp: "GNU OpenMP Runtime Library")
  2014-11-12 20:50                 ` David Malcolm
@ 2015-01-11  2:18                   ` Thomas Schwinge
  2015-01-29 10:25                     ` Thomas Schwinge
  0 siblings, 1 reply; 36+ messages in thread
From: Thomas Schwinge @ 2015-01-11  2:18 UTC (permalink / raw)
  To: gcc-patches
  Cc: gcc, Ilya Verbin, Julian Brown, David Malcolm, Jakub Jelinek, burnus

[-- Attachment #1: Type: text/plain, Size: 60159 bytes --]

Hi!

On Wed, 12 Nov 2014 15:43:06 -0500, David Malcolm <dmalcolm@redhat.com> wrote:
> On Wed, 2014-11-12 at 21:30 +0100, Jakub Jelinek wrote:
> > On Wed, Nov 12, 2014 at 03:22:21PM -0500, David Malcolm wrote:
> > > On Wed, 2014-11-12 at 14:47 +0100, Jakub Jelinek wrote:
> > > > On Wed, Nov 12, 2014 at 08:33:34AM -0500, David Malcolm wrote:
> > > > > Apologies for bikeshedding, and I normally dislike "cute" names, but
> > > > > renaming it to
> > > > > 
> > > > >    "GNU Offloading and Multi Processing library"

Oh, how cute!  ;-P

> > > > > would allow a backronym of "libgomp", thus preserving the existing
> > > > > filenames/SONAME etc.
> > > > 
> > > > I think this is fine, can you change it both in libgomp/configure.ac
> > > > and texi docs?
> > > 
> > > Am attaching a patch that does so, though I suspect the wording in the
> > > texi may need some more work (not my area of expertise).
> > 
> > Oops, I didn't mean by "you" above you, but the OpenACC folks, sorry for
> > confusion.  Anyway, your patch is ok for trunk.  Thanks.
> 
> Ah, ok :)   Presumably this is conditional on the rest of the OpenACC
> work merging?  (AIUI the OpenACC work is not yet on trunk, right?)
> 
> If so, perhaps the OpenACC people can adopt the patch and apply it (or
> in a modified form) when they merge their work?

I already committed this patch to trunk in r219425, see below.

> Sorry if I'm stepping on any toes here

Not at all -- thanks to you and Julian for preparing the patch!  As
pointed out by Tobias in
<https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01430.html>, we'll also
need to update some files outside the GCC source repository, namely in
the web pages repository and, I assume, some wiki pages; I'll do that
next week.

commit c35c9a626070a8660c10a37786cedf2d6e3742c9
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Sat Jan 10 19:10:37 2015 +0000

    libgomp: Now known as the GNU Offloading and Multi Processing Runtime Library.
    
    	libgomp/
    	* configure.ac: Rename libgomp from "GNU OpenMP Runtime Library"
    	to "GNU Offloading and Multi Processing Runtime Library".  Change
    	all users.
    	* configure: Regenerate.
    	* libgomp.texi: Update.
    	gcc/
    	* doc/install.texi: Update for libgomp being renamed from "GNU
    	OpenMP Runtime Library" to "GNU Offloading and Multi Processing
    	Runtime Library".
    	* doc/sourcebuild.texi: Likewise.
    	gcc/fortran/
    	* gfortran.texi: Update for libgomp being renamed from "GNU OpenMP
    	Runtime Library" to "GNU Offloading and Multi Processing Runtime
    	Library".
    	* intrinsic.texi: Likewise.
    	libstdc++-v3/
    	* doc/xml/manual/parallel_mode.xml: Update for libgomp being
    	renamed from "GNU OpenMP Runtime Library" to "GNU Offloading and
    	Multi Processing Runtime Library".
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@219425 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog                                    |  7 +++++++
 gcc/doc/install.texi                             |  3 ++-
 gcc/doc/sourcebuild.texi                         |  2 +-
 gcc/fortran/ChangeLog                            |  7 +++++++
 gcc/fortran/gfortran.texi                        |  5 +++--
 gcc/fortran/intrinsic.texi                       |  3 ++-
 libgomp/ChangeLog                                | 10 ++++++++++
 libgomp/alloc.c                                  |  3 ++-
 libgomp/barrier.c                                |  3 ++-
 libgomp/config/bsd/proc.c                        |  3 ++-
 libgomp/config/linux/affinity.c                  |  3 ++-
 libgomp/config/linux/alpha/futex.h               |  3 ++-
 libgomp/config/linux/bar.c                       |  3 ++-
 libgomp/config/linux/bar.h                       |  3 ++-
 libgomp/config/linux/futex.h                     |  3 ++-
 libgomp/config/linux/ia64/futex.h                |  3 ++-
 libgomp/config/linux/lock.c                      |  3 ++-
 libgomp/config/linux/mips/futex.h                |  3 ++-
 libgomp/config/linux/mutex.c                     |  3 ++-
 libgomp/config/linux/mutex.h                     |  3 ++-
 libgomp/config/linux/powerpc/futex.h             |  3 ++-
 libgomp/config/linux/proc.c                      |  3 ++-
 libgomp/config/linux/proc.h                      |  3 ++-
 libgomp/config/linux/ptrlock.c                   |  3 ++-
 libgomp/config/linux/ptrlock.h                   |  3 ++-
 libgomp/config/linux/s390/futex.h                |  3 ++-
 libgomp/config/linux/sem.c                       |  3 ++-
 libgomp/config/linux/sem.h                       |  3 ++-
 libgomp/config/linux/sparc/futex.h               |  3 ++-
 libgomp/config/linux/tile/futex.h                |  3 ++-
 libgomp/config/linux/wait.h                      |  3 ++-
 libgomp/config/linux/x86/futex.h                 |  3 ++-
 libgomp/config/mingw32/proc.c                    |  3 ++-
 libgomp/config/mingw32/time.c                    |  3 ++-
 libgomp/config/posix/affinity.c                  |  3 ++-
 libgomp/config/posix/bar.c                       |  3 ++-
 libgomp/config/posix/bar.h                       |  3 ++-
 libgomp/config/posix/lock.c                      |  3 ++-
 libgomp/config/posix/mutex.h                     |  3 ++-
 libgomp/config/posix/proc.c                      |  3 ++-
 libgomp/config/posix/ptrlock.h                   |  3 ++-
 libgomp/config/posix/sem.c                       |  3 ++-
 libgomp/config/posix/sem.h                       |  3 ++-
 libgomp/config/posix/time.c                      |  3 ++-
 libgomp/configure                                | 22 +++++++++++-----------
 libgomp/configure.ac                             |  2 +-
 libgomp/critical.c                               |  3 ++-
 libgomp/env.c                                    |  3 ++-
 libgomp/error.c                                  |  3 ++-
 libgomp/fortran.c                                |  3 ++-
 libgomp/iter.c                                   |  3 ++-
 libgomp/iter_ull.c                               |  3 ++-
 libgomp/libgomp.h                                |  3 ++-
 libgomp/libgomp.texi                             | 24 +++++++++++++++---------
 libgomp/libgomp_f.h.in                           |  3 ++-
 libgomp/libgomp_g.h                              |  3 ++-
 libgomp/libgomp_target.h                         |  3 ++-
 libgomp/loop.c                                   |  3 ++-
 libgomp/loop_ull.c                               |  3 ++-
 libgomp/omp.h.in                                 |  3 ++-
 libgomp/omp_lib.f90.in                           |  3 ++-
 libgomp/omp_lib.h.in                             |  3 ++-
 libgomp/ordered.c                                |  3 ++-
 libgomp/parallel.c                               |  3 ++-
 libgomp/sections.c                               |  3 ++-
 libgomp/single.c                                 |  3 ++-
 libgomp/splay-tree.h                             |  3 ++-
 libgomp/target.c                                 |  3 ++-
 libgomp/task.c                                   |  3 ++-
 libgomp/team.c                                   |  3 ++-
 libgomp/work.c                                   |  3 ++-
 liboffloadmic/plugin/Makefile.am                 |  3 ++-
 liboffloadmic/plugin/Makefile.in                 |  3 ++-
 liboffloadmic/plugin/configure.ac                |  3 ++-
 liboffloadmic/plugin/libgomp-plugin-intelmic.cpp |  3 ++-
 liboffloadmic/plugin/offload_target_main.cpp     |  3 ++-
 libstdc++-v3/ChangeLog                           |  6 ++++++
 libstdc++-v3/doc/xml/manual/parallel_mode.xml    |  4 +++-
 78 files changed, 200 insertions(+), 93 deletions(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index 15a47fc..70a8cac 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,10 @@
+2015-01-10  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* doc/install.texi: Update for libgomp being renamed from "GNU
+	OpenMP Runtime Library" to "GNU Offloading and Multi Processing
+	Runtime Library".
+	* doc/sourcebuild.texi: Likewise.
+
 2015-01-10  Anthony Green  <green@moxielogic.com>
 
 	* config/moxie/moxie.c (moxie_option_override): Fix forcing of
diff --git gcc/doc/install.texi gcc/doc/install.texi
index 94e039d..c9e3bf1 100644
--- gcc/doc/install.texi
+++ gcc/doc/install.texi
@@ -1594,7 +1594,8 @@ Specify that the Fortran front end and @code{libgfortran} do not add
 support for @code{libquadmath} on systems supporting it.
 
 @item --disable-libgomp
-Specify that the run-time libraries used by GOMP should not be built.
+Specify that the GNU Offloading and Multi Processing Runtime Library
+should not be built.
 
 @item --disable-libvtv
 Specify that the run-time libraries used by vtable verification
diff --git gcc/doc/sourcebuild.texi gcc/doc/sourcebuild.texi
index 4be383c..b8b6a06 100644
--- gcc/doc/sourcebuild.texi
+++ gcc/doc/sourcebuild.texi
@@ -89,7 +89,7 @@ The Go runtime library.  The bulk of this library is mirrored from the
 @uref{http://code.google.com/@/p/@/go/, master Go repository}.
 
 @item libgomp
-The GNU OpenMP runtime library.
+The GNU Offloading and Multi Processing Runtime Library.
 
 @item libiberty
 The @code{libiberty} library, used for portability and for some
diff --git gcc/fortran/ChangeLog gcc/fortran/ChangeLog
index c10a1db..c21b46e 100644
--- gcc/fortran/ChangeLog
+++ gcc/fortran/ChangeLog
@@ -1,3 +1,10 @@
+2015-01-10  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* gfortran.texi: Update for libgomp being renamed from "GNU OpenMP
+	Runtime Library" to "GNU Offloading and Multi Processing Runtime
+	Library".
+	* intrinsic.texi: Likewise.
+
 2015-01-10  Tobias Burnus  <burnus@net-b.de>
 
 	PR fortran/64522
diff --git gcc/fortran/gfortran.texi gcc/fortran/gfortran.texi
index cf96b0b..5cc624a 100644
--- gcc/fortran/gfortran.texi
+++ gcc/fortran/gfortran.texi
@@ -1910,8 +1910,9 @@ directives in fixed form; the @code{!$} conditional compilation sentinels
 in free form; and the @code{c$}, @code{*$} and @code{!$} sentinels
 in fixed form, @command{gfortran} needs to be invoked with the
 @option{-fopenmp}.  This also arranges for automatic linking of the
-GNU OpenMP runtime library @ref{Top,,libgomp,libgomp,GNU OpenMP
-runtime library}.
+GNU Offloading and Multi Processing Runtime Library
+@ref{Top,,libgomp,libgomp,GNU Offloading and Multi Processing Runtime
+Library}.
 
 The OpenMP Fortran runtime library routines are provided both in a
 form of a Fortran 90 module named @code{omp_lib} and in a form of
diff --git gcc/fortran/intrinsic.texi gcc/fortran/intrinsic.texi
index 41f499e..36c70d9 100644
--- gcc/fortran/intrinsic.texi
+++ gcc/fortran/intrinsic.texi
@@ -14030,7 +14030,8 @@ The OpenMP Fortran runtime library routines are provided both in
 a form of two Fortran 90 modules, named @code{OMP_LIB} and 
 @code{OMP_LIB_KINDS}, and in a form of a Fortran @code{include} file named
 @file{omp_lib.h}. The procedures provided by @code{OMP_LIB} can be found
-in the @ref{Top,,Introduction,libgomp,GNU OpenMP runtime library} manual,
+in the @ref{Top,,Introduction,libgomp,GNU Offloading and Multi
+Processing Runtime Library} manual,
 the named constants defined in the modules are listed
 below.
 
diff --git libgomp/ChangeLog libgomp/ChangeLog
index 11e0086..6e1e141 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,3 +1,13 @@
+2015-01-10  Thomas Schwinge  <thomas@codesourcery.com>
+	    Julian Brown  <julian@codesourcery.com>
+	    David Malcolm  <dmalcolm@redhat.com>
+
+	* configure.ac: Rename libgomp from "GNU OpenMP Runtime Library"
+	to "GNU Offloading and Multi Processing Runtime Library".  Change
+	all users.
+	* configure: Regenerate.
+	* libgomp.texi: Update.
+
 2015-01-08  Thomas Schwinge  <thomas@codesourcery.com>
 
 	* configure.ac [tgt_dir] (offload_additional_lib_paths): Also add
diff --git libgomp/alloc.c libgomp/alloc.c
index de996ee..f738a66 100644
--- libgomp/alloc.c
+++ libgomp/alloc.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/barrier.c libgomp/barrier.c
index c9a3478..c17660c 100644
--- libgomp/barrier.c
+++ libgomp/barrier.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/bsd/proc.c libgomp/config/bsd/proc.c
index ab026f0..9a435e1 100644
--- libgomp/config/bsd/proc.c
+++ libgomp/config/bsd/proc.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/linux/affinity.c libgomp/config/linux/affinity.c
index d909cea8..17b65af 100644
--- libgomp/config/linux/affinity.c
+++ libgomp/config/linux/affinity.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2006-2015 Free Software Foundation, Inc.
    Contributed by Jakub Jelinek <jakub@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/linux/alpha/futex.h libgomp/config/linux/alpha/futex.h
index dd39d84..b8e1066 100644
--- libgomp/config/linux/alpha/futex.h
+++ libgomp/config/linux/alpha/futex.h
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/linux/bar.c libgomp/config/linux/bar.c
index 7ae7388..51fbd99 100644
--- libgomp/config/linux/bar.c
+++ libgomp/config/linux/bar.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/linux/bar.h libgomp/config/linux/bar.h
index 4a48651..3236436 100644
--- libgomp/config/linux/bar.h
+++ libgomp/config/linux/bar.h
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/linux/futex.h libgomp/config/linux/futex.h
index 5e54c41..c99ea37 100644
--- libgomp/config/linux/futex.h
+++ libgomp/config/linux/futex.h
@@ -1,7 +1,8 @@
 /* Copyright (C) 2010-2015 Free Software Foundation, Inc.
    Contributed by ARM Ltd.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/linux/ia64/futex.h libgomp/config/linux/ia64/futex.h
index 03f8ac9..b63cd20 100644
--- libgomp/config/linux/ia64/futex.h
+++ libgomp/config/linux/ia64/futex.h
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/linux/lock.c libgomp/config/linux/lock.c
index b02b880..32cd21d 100644
--- libgomp/config/linux/lock.c
+++ libgomp/config/linux/lock.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/linux/mips/futex.h libgomp/config/linux/mips/futex.h
index 915d9b6..927a702 100644
--- libgomp/config/linux/mips/futex.h
+++ libgomp/config/linux/mips/futex.h
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Ilie Garbacea <ilie@mips.com>, Chao-ying Fu <fu@mips.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/linux/mutex.c libgomp/config/linux/mutex.c
index 4a16754..7ab05a7 100644
--- libgomp/config/linux/mutex.c
+++ libgomp/config/linux/mutex.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/linux/mutex.h libgomp/config/linux/mutex.h
index 93658f7..617195e 100644
--- libgomp/config/linux/mutex.h
+++ libgomp/config/linux/mutex.h
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/linux/powerpc/futex.h libgomp/config/linux/powerpc/futex.h
index 7d8f79b..4c1bcfb 100644
--- libgomp/config/linux/powerpc/futex.h
+++ libgomp/config/linux/powerpc/futex.h
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/linux/proc.c libgomp/config/linux/proc.c
index 18ecfe1..64964ee 100644
--- libgomp/config/linux/proc.c
+++ libgomp/config/linux/proc.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Jakub Jelinek <jakub@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/linux/proc.h libgomp/config/linux/proc.h
index 9132f9e..22ad529 100644
--- libgomp/config/linux/proc.h
+++ libgomp/config/linux/proc.h
@@ -1,7 +1,8 @@
 /* Copyright (C) 2011-2015 Free Software Foundation, Inc.
    Contributed by Uros Bizjak <ubizjak@gmail.com>
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/linux/ptrlock.c libgomp/config/linux/ptrlock.c
index 9a6fb6d..5feb735 100644
--- libgomp/config/linux/ptrlock.c
+++ libgomp/config/linux/ptrlock.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2008-2015 Free Software Foundation, Inc.
    Contributed by Jakub Jelinek <jakub@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/linux/ptrlock.h libgomp/config/linux/ptrlock.h
index c3ff9df..35ac464 100644
--- libgomp/config/linux/ptrlock.h
+++ libgomp/config/linux/ptrlock.h
@@ -1,7 +1,8 @@
 /* Copyright (C) 2008-2015 Free Software Foundation, Inc.
    Contributed by Jakub Jelinek <jakub@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/linux/s390/futex.h libgomp/config/linux/s390/futex.h
index d1daefd..d99eba9 100644
--- libgomp/config/linux/s390/futex.h
+++ libgomp/config/linux/s390/futex.h
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Jakub Jelinek <jakub@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/linux/sem.c libgomp/config/linux/sem.c
index 6436384..7485de2 100644
--- libgomp/config/linux/sem.c
+++ libgomp/config/linux/sem.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/linux/sem.h libgomp/config/linux/sem.h
index 4fb1113..a710c99 100644
--- libgomp/config/linux/sem.h
+++ libgomp/config/linux/sem.h
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/linux/sparc/futex.h libgomp/config/linux/sparc/futex.h
index 483536a..a4d2152 100644
--- libgomp/config/linux/sparc/futex.h
+++ libgomp/config/linux/sparc/futex.h
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Jakub Jelinek <jakub@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/linux/tile/futex.h libgomp/config/linux/tile/futex.h
index db05da1..a9836a7 100644
--- libgomp/config/linux/tile/futex.h
+++ libgomp/config/linux/tile/futex.h
@@ -1,7 +1,8 @@
 /* Copyright (C) 2011-2015 Free Software Foundation, Inc.
    Contributed by Walter Lee (walt@tilera.com)
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/linux/wait.h libgomp/config/linux/wait.h
index 34803db..96d2fbe 100644
--- libgomp/config/linux/wait.h
+++ libgomp/config/linux/wait.h
@@ -1,7 +1,8 @@
 /* Copyright (C) 2008-2015 Free Software Foundation, Inc.
    Contributed by Jakub Jelinek <jakub@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/linux/x86/futex.h libgomp/config/linux/x86/futex.h
index 0072876..6c6d317 100644
--- libgomp/config/linux/x86/futex.h
+++ libgomp/config/linux/x86/futex.h
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/mingw32/proc.c libgomp/config/mingw32/proc.c
index bd68fd8..99766ab 100644
--- libgomp/config/mingw32/proc.c
+++ libgomp/config/mingw32/proc.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2007-2015 Free Software Foundation, Inc.
    Contributed by Danny Smith <dannysmith@users.sourceforge.net>
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/mingw32/time.c libgomp/config/mingw32/time.c
index 58b2c8e..cf0d25c 100644
--- libgomp/config/mingw32/time.c
+++ libgomp/config/mingw32/time.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2006-2015 Free Software Foundation, Inc.
    Contributed by Francois-Xavier Coudert <coudert@clipper.ens.fr>
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/posix/affinity.c libgomp/config/posix/affinity.c
index efd0aac..6840d3a 100644
--- libgomp/config/posix/affinity.c
+++ libgomp/config/posix/affinity.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2006-2015 Free Software Foundation, Inc.
    Contributed by Jakub Jelinek <jakub@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/posix/bar.c libgomp/config/posix/bar.c
index 7f31dfd..de66d6c 100644
--- libgomp/config/posix/bar.c
+++ libgomp/config/posix/bar.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/posix/bar.h libgomp/config/posix/bar.h
index 47f0b10..3b29c31 100644
--- libgomp/config/posix/bar.h
+++ libgomp/config/posix/bar.h
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/posix/lock.c libgomp/config/posix/lock.c
index a77d2af..6cbc1c3 100644
--- libgomp/config/posix/lock.c
+++ libgomp/config/posix/lock.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/posix/mutex.h libgomp/config/posix/mutex.h
index 46deee2..5b46026 100644
--- libgomp/config/posix/mutex.h
+++ libgomp/config/posix/mutex.h
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/posix/proc.c libgomp/config/posix/proc.c
index 2769715..2f1cab9 100644
--- libgomp/config/posix/proc.c
+++ libgomp/config/posix/proc.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/posix/ptrlock.h libgomp/config/posix/ptrlock.h
index 53dcd83..86faad7 100644
--- libgomp/config/posix/ptrlock.h
+++ libgomp/config/posix/ptrlock.h
@@ -1,7 +1,8 @@
 /* Copyright (C) 2008-2015 Free Software Foundation, Inc.
    Contributed by Jakub Jelinek <jakub@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/posix/sem.c libgomp/config/posix/sem.c
index 59e4c45..4b8fb08 100644
--- libgomp/config/posix/sem.c
+++ libgomp/config/posix/sem.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/posix/sem.h libgomp/config/posix/sem.h
index df39b53..51ba379 100644
--- libgomp/config/posix/sem.h
+++ libgomp/config/posix/sem.h
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/config/posix/time.c libgomp/config/posix/time.c
index 26c2f28..cb8b4c3 100644
--- libgomp/config/posix/time.c
+++ libgomp/config/posix/time.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/configure libgomp/configure
index d109fc1..3214e9d 100755
--- libgomp/configure
+++ libgomp/configure
@@ -1,6 +1,6 @@
 #! /bin/sh
 # Guess values for system-dependent variables and create Makefiles.
-# Generated by GNU Autoconf 2.64 for GNU OpenMP Runtime Library 1.0.
+# Generated by GNU Autoconf 2.64 for GNU Offloading and Multi Processing Runtime Library 1.0.
 #
 # Copyright (C) 1992, 1993, 1994, 1995, 1996, 1998, 1999, 2000, 2001,
 # 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009 Free Software
@@ -554,10 +554,10 @@ MFLAGS=
 MAKEFLAGS=
 
 # Identity of this package.
-PACKAGE_NAME='GNU OpenMP Runtime Library'
+PACKAGE_NAME='GNU Offloading and Multi Processing Runtime Library'
 PACKAGE_TARNAME='libgomp'
 PACKAGE_VERSION='1.0'
-PACKAGE_STRING='GNU OpenMP Runtime Library 1.0'
+PACKAGE_STRING='GNU Offloading and Multi Processing Runtime Library 1.0'
 PACKAGE_BUGREPORT=''
 PACKAGE_URL='http://www.gnu.org/software/libgomp/'
 
@@ -1324,7 +1324,7 @@ if test "$ac_init_help" = "long"; then
   # Omit some internal or obsolete options to make the list less imposing.
   # This message is too long to be a string in the A/UX 3.1 sh.
   cat <<_ACEOF
-\`configure' configures GNU OpenMP Runtime Library 1.0 to adapt to many kinds of systems.
+\`configure' configures GNU Offloading and Multi Processing Runtime Library 1.0 to adapt to many kinds of systems.
 
 Usage: $0 [OPTION]... [VAR=VALUE]...
 
@@ -1395,7 +1395,7 @@ fi
 
 if test -n "$ac_init_help"; then
   case $ac_init_help in
-     short | recursive ) echo "Configuration of GNU OpenMP Runtime Library 1.0:";;
+     short | recursive ) echo "Configuration of GNU Offloading and Multi Processing Runtime Library 1.0:";;
    esac
   cat <<\_ACEOF
 
@@ -1448,7 +1448,7 @@ Use these variables to override the choices made by `configure' or to help
 it to find libraries and programs with nonstandard names/locations.
 
 Report bugs to the package provider.
-GNU OpenMP Runtime Library home page: <http://www.gnu.org/software/libgomp/>.
+GNU Offloading and Multi Processing Runtime Library home page: <http://www.gnu.org/software/libgomp/>.
 General help using GNU software: <http://www.gnu.org/gethelp/>.
 _ACEOF
 ac_status=$?
@@ -1512,7 +1512,7 @@ fi
 test -n "$ac_init_help" && exit $ac_status
 if $ac_init_version; then
   cat <<\_ACEOF
-GNU OpenMP Runtime Library configure 1.0
+GNU Offloading and Multi Processing Runtime Library configure 1.0
 generated by GNU Autoconf 2.64
 
 Copyright (C) 2009 Free Software Foundation, Inc.
@@ -2193,7 +2193,7 @@ cat >config.log <<_ACEOF
 This file contains any messages produced by compilers while
 running configure, to aid debugging if configure makes a mistake.
 
-It was created by GNU OpenMP Runtime Library $as_me 1.0, which was
+It was created by GNU Offloading and Multi Processing Runtime Library $as_me 1.0, which was
 generated by GNU Autoconf 2.64.  Invocation command line was
 
   $ $0 $@
@@ -16944,7 +16944,7 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
 # report actual input values of CONFIG_FILES etc. instead of their
 # values after options handling.
 ac_log="
-This file was extended by GNU OpenMP Runtime Library $as_me 1.0, which was
+This file was extended by GNU Offloading and Multi Processing Runtime Library $as_me 1.0, which was
 generated by GNU Autoconf 2.64.  Invocation command line was
 
   CONFIG_FILES    = $CONFIG_FILES
@@ -17004,13 +17004,13 @@ Configuration commands:
 $config_commands
 
 Report bugs to the package provider.
-GNU OpenMP Runtime Library home page: <http://www.gnu.org/software/libgomp/>.
+GNU Offloading and Multi Processing Runtime Library home page: <http://www.gnu.org/software/libgomp/>.
 General help using GNU software: <http://www.gnu.org/gethelp/>."
 
 _ACEOF
 cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
 ac_cs_version="\\
-GNU OpenMP Runtime Library config.status 1.0
+GNU Offloading and Multi Processing Runtime Library config.status 1.0
 configured by $0, generated by GNU Autoconf 2.64,
   with options \\"`$as_echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/\\\\&/g'`\\"
 
diff --git libgomp/configure.ac libgomp/configure.ac
index c8a98f0..8ed1bae 100644
--- libgomp/configure.ac
+++ libgomp/configure.ac
@@ -2,7 +2,7 @@
 # aclocal -I ../config && autoconf && autoheader && automake
 
 AC_PREREQ(2.64)
-AC_INIT([GNU OpenMP Runtime Library], 1.0,,[libgomp])
+AC_INIT([GNU Offloading and Multi Processing Runtime Library], 1.0,,[libgomp])
 AC_CONFIG_HEADER(config.h)
 
 # -------
diff --git libgomp/critical.c libgomp/critical.c
index abd4d66..12b23d5 100644
--- libgomp/critical.c
+++ libgomp/critical.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/env.c libgomp/env.c
index 1fd3461..b05b73a 100644
--- libgomp/env.c
+++ libgomp/env.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/error.c libgomp/error.c
index e9fd595..e61d82f 100644
--- libgomp/error.c
+++ libgomp/error.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/fortran.c libgomp/fortran.c
index 69979ac..993145f 100644
--- libgomp/fortran.c
+++ libgomp/fortran.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Jakub Jelinek <jakub@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/iter.c libgomp/iter.c
index d145bf0..0ceb41d 100644
--- libgomp/iter.c
+++ libgomp/iter.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/iter_ull.c libgomp/iter_ull.c
index 5d542bf..b1cad84 100644
--- libgomp/iter_ull.c
+++ libgomp/iter_ull.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/libgomp.h libgomp/libgomp.h
index 54fa3b0..05f3496 100644
--- libgomp/libgomp.h
+++ libgomp/libgomp.h
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/libgomp.texi libgomp/libgomp.texi
index 52ae0ea..b7306f1 100644
--- libgomp/libgomp.texi
+++ libgomp/libgomp.texi
@@ -31,10 +31,11 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
 @ifinfo
 @dircategory GNU Libraries
 @direntry
-* libgomp: (libgomp).                    GNU OpenMP runtime library
+* libgomp: (libgomp).          GNU Offloading and Multi Processing Runtime Library.
 @end direntry
 
-This manual documents the GNU implementation of the OpenMP API for 
+This manual documents libgomp, the GNU Offloading and Multi Processing
+Runtime library.  This is the GNU implementation of the OpenMP API for
 multi-platform shared-memory parallel programming in C/C++ and Fortran.
 
 Published by the Free Software Foundation
@@ -48,7 +49,8 @@ Boston, MA 02110-1301 USA
 @setchapternewpage odd
 
 @titlepage
-@title The GNU OpenMP Implementation
+@title GNU Offloading and Multi Processing Runtime Library
+@subtitle The GNU OpenMP Implementation
 @page
 @vskip 0pt plus 1filll
 @comment For the @value{version-GCC} Version*
@@ -69,10 +71,13 @@ Boston, MA 02110-1301, USA@*
 @top Introduction
 @cindex Introduction
 
-This manual documents the usage of libgomp, the GNU implementation of the 
+This manual documents the usage of libgomp, the GNU Offloading and
+Multi Processing Runtime Library.  This is the GNU implementation of the
 @uref{http://www.openmp.org, OpenMP} Application Programming Interface (API)
 for multi-platform shared-memory parallel programming in C/C++ and Fortran.
 
+Originally, libgomp was known as the GNU OpenMP Runtime Library.
+
 
 
 @comment
@@ -87,7 +92,8 @@ for multi-platform shared-memory parallel programming in C/C++ and Fortran.
 * Environment Variables::      Influencing runtime behavior with environment 
                                variables.
 * The libgomp ABI::            Notes on the external ABI presented by libgomp.
-* Reporting Bugs::             How to report bugs in GNU OpenMP.
+* Reporting Bugs::             How to report bugs in the GNU Offloading and
+                               Multi Processing Runtime Library.
 * Copying::                    GNU general public license says
                                how you can copy and share libgomp.
 * GNU Free Documentation License::
@@ -1607,7 +1613,7 @@ CPU 2, the fifth to CPU 4, the sixth through tenth to CPUs 6, 8, 10, 12,
 and 14 respectively and then start assigning back from the beginning of
 the list.  @code{GOMP_CPU_AFFINITY=0} binds all threads to CPU 0.
 
-There is no GNU OpenMP library routine to determine whether a CPU affinity 
+There is no libgomp library routine to determine whether a CPU affinity
 specification is in effect.  As a workaround, language-specific library 
 functions, e.g., @code{getenv} in C or @code{GET_ENVIRONMENT_VARIABLE} in 
 Fortran, may be used to query the setting of the @code{GOMP_CPU_AFFINITY} 
@@ -2066,14 +2072,14 @@ becomes
 
 
 @c ---------------------------------------------------------------------
-@c 
+@c Reporting Bugs
 @c ---------------------------------------------------------------------
 
 @node Reporting Bugs
 @chapter Reporting Bugs
 
-Bugs in the GNU OpenMP implementation should be reported via 
-@uref{http://gcc.gnu.org/bugzilla/, Bugzilla}.  For all cases, please add 
+Bugs in the GNU Offloading and Multi Processing Runtime Library should
+be reported via @uref{http://gcc.gnu.org/bugzilla/, Bugzilla}.  Please add
 "openmp" to the keywords field in the bug report.
 
 
diff --git libgomp/libgomp_f.h.in libgomp/libgomp_f.h.in
index 194ea14..d84fd3f 100644
--- libgomp/libgomp_f.h.in
+++ libgomp/libgomp_f.h.in
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Jakub Jelinek <jakub@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/libgomp_g.h libgomp/libgomp_g.h
index 4b0ebd6..56a4a97 100644
--- libgomp/libgomp_g.h
+++ libgomp/libgomp_g.h
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/libgomp_target.h libgomp/libgomp_target.h
index 9c2947e..2e18a64 100644
--- libgomp/libgomp_target.h
+++ libgomp/libgomp_target.h
@@ -1,6 +1,7 @@
 /* Copyright (C) 2014-2015 Free Software Foundation, Inc.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/loop.c libgomp/loop.c
index 11a07c6..27d78db 100644
--- libgomp/loop.c
+++ libgomp/loop.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/loop_ull.c libgomp/loop_ull.c
index e2967ed..de56ae0 100644
--- libgomp/loop_ull.c
+++ libgomp/loop_ull.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/omp.h.in libgomp/omp.h.in
index 18b6f35..dac3e8a 100644
--- libgomp/omp.h.in
+++ libgomp/omp.h.in
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/omp_lib.f90.in libgomp/omp_lib.f90.in
index e37ff4b..122563e 100644
--- libgomp/omp_lib.f90.in
+++ libgomp/omp_lib.f90.in
@@ -1,7 +1,8 @@
 !  Copyright (C) 2005-2015 Free Software Foundation, Inc.
 !  Contributed by Jakub Jelinek <jakub@redhat.com>.
 
-!  This file is part of the GNU OpenMP Library (libgomp).
+!  This file is part of the GNU Offloading and Multi Processing Library
+!  (libgomp).
 
 !  Libgomp is free software; you can redistribute it and/or modify it
 !  under the terms of the GNU General Public License as published by
diff --git libgomp/omp_lib.h.in libgomp/omp_lib.h.in
index c8179a9..d590bc1 100644
--- libgomp/omp_lib.h.in
+++ libgomp/omp_lib.h.in
@@ -1,7 +1,8 @@
 !  Copyright (C) 2005-2015 Free Software Foundation, Inc.
 !  Contributed by Jakub Jelinek <jakub@redhat.com>.
 
-!  This file is part of the GNU OpenMP Library (libgomp).
+!  This file is part of the GNU Offloading and Multi Processing Library
+!  (libgomp).
 
 !  Libgomp is free software; you can redistribute it and/or modify it
 !  under the terms of the GNU General Public License as published by
diff --git libgomp/ordered.c libgomp/ordered.c
index bc0caec..69ca217 100644
--- libgomp/ordered.c
+++ libgomp/ordered.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/parallel.c libgomp/parallel.c
index cf533e4..6d5ef05 100644
--- libgomp/parallel.c
+++ libgomp/parallel.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/sections.c libgomp/sections.c
index 1069cb4..f3a1725 100644
--- libgomp/sections.c
+++ libgomp/sections.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/single.c libgomp/single.c
index c40a86f..7cb6eed3 100644
--- libgomp/single.c
+++ libgomp/single.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/splay-tree.h libgomp/splay-tree.h
index 7fadc0d..1296be6 100644
--- libgomp/splay-tree.h
+++ libgomp/splay-tree.h
@@ -2,7 +2,8 @@
    Copyright (C) 1998-2015 Free Software Foundation, Inc.
    Contributed by Mark Mitchell (mark@markmitchell.com).
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/target.c libgomp/target.c
index 07b7a27..ec097de 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2013-2015 Free Software Foundation, Inc.
    Contributed by Jakub Jelinek <jakub@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/task.c libgomp/task.c
index 1d76f45..74920d5 100644
--- libgomp/task.c
+++ libgomp/task.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2007-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/team.c libgomp/team.c
index 2322ecb..b98b233 100644
--- libgomp/team.c
+++ libgomp/team.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libgomp/work.c libgomp/work.c
index 1d9e340..0570b90 100644
--- libgomp/work.c
+++ libgomp/work.c
@@ -1,7 +1,8 @@
 /* Copyright (C) 2005-2015 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git liboffloadmic/plugin/Makefile.am liboffloadmic/plugin/Makefile.am
index 0baf70d..a814f0c 100644
--- liboffloadmic/plugin/Makefile.am
+++ liboffloadmic/plugin/Makefile.am
@@ -5,7 +5,8 @@
 # Contributed by Ilya Verbin <ilya.verbin@intel.com> and
 # Andrey Turetskiy <andrey.turetskiy@intel.com>.
 #
-# This file is part of the GNU OpenMP Library (libgomp).
+# This file is part of the GNU Offloading and Multi Processing Library
+# (libgomp).
 #
 # Libgomp is free software; you can redistribute it and/or modify it
 # under the terms of the GNU General Public License as published by
diff --git liboffloadmic/plugin/Makefile.in liboffloadmic/plugin/Makefile.in
index 5ba750a..21ce060 100644
--- liboffloadmic/plugin/Makefile.in
+++ liboffloadmic/plugin/Makefile.in
@@ -22,7 +22,8 @@
 # Contributed by Ilya Verbin <ilya.verbin@intel.com> and
 # Andrey Turetskiy <andrey.turetskiy@intel.com>.
 #
-# This file is part of the GNU OpenMP Library (libgomp).
+# This file is part of the GNU Offloading and Multi Processing Library
+# (libgomp).
 #
 # Libgomp is free software; you can redistribute it and/or modify it
 # under the terms of the GNU General Public License as published by
diff --git liboffloadmic/plugin/configure.ac liboffloadmic/plugin/configure.ac
index 283faad..a2dd02d 100644
--- liboffloadmic/plugin/configure.ac
+++ liboffloadmic/plugin/configure.ac
@@ -4,7 +4,8 @@
 #
 # Contributed by Andrey Turetskiy <andrey.turetskiy@intel.com>.
 #
-# This file is part of the GNU OpenMP Library (libgomp).
+# This file is part of the GNU Offloading and Multi Processing Library
+# (libgomp).
 #
 # Libgomp is free software; you can redistribute it and/or modify it
 # under the terms of the GNU General Public License as published by
diff --git liboffloadmic/plugin/libgomp-plugin-intelmic.cpp liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index 28ddbc3..0428b79 100644
--- liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -4,7 +4,8 @@
 
    Contributed by Ilya Verbin <ilya.verbin@intel.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git liboffloadmic/plugin/offload_target_main.cpp liboffloadmic/plugin/offload_target_main.cpp
index 4a2778e..3fead01 100644
--- liboffloadmic/plugin/offload_target_main.cpp
+++ liboffloadmic/plugin/offload_target_main.cpp
@@ -4,7 +4,8 @@
 
    Contributed by Ilya Verbin <ilya.verbin@intel.com>.
 
-   This file is part of the GNU OpenMP Library (libgomp).
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
 
    Libgomp is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by
diff --git libstdc++-v3/ChangeLog libstdc++-v3/ChangeLog
index c27250d..cd666d1 100644
--- libstdc++-v3/ChangeLog
+++ libstdc++-v3/ChangeLog
@@ -1,3 +1,9 @@
+2015-01-10  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* doc/xml/manual/parallel_mode.xml: Update for libgomp being
+	renamed from "GNU OpenMP Runtime Library" to "GNU Offloading and
+	Multi Processing Runtime Library".
+
 2015-01-09  Jonathan Wakely  <jwakely@redhat.com>
 
 	PR libstdc++/64476
diff --git libstdc++-v3/doc/xml/manual/parallel_mode.xml libstdc++-v3/doc/xml/manual/parallel_mode.xml
index 8ddec65..abf63ca 100644
--- libstdc++-v3/doc/xml/manual/parallel_mode.xml
+++ libstdc++-v3/doc/xml/manual/parallel_mode.xml
@@ -106,7 +106,9 @@ It might work with other compilers, though.</para>
   not difficult: just compile your application with the compiler
   flag <literal>-fopenmp</literal>. This will link
   in <code>libgomp</code>, the
-  OpenMP <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/onlinedocs/libgomp/">GNU implementation</link>,
+  <link xmlns:xlink="http://www.w3.org/1999/xlink"
+    xlink:href="http://gcc.gnu.org/onlinedocs/libgomp/">GNU Offloading and
+    Multi Processing Runtime Library</link>,
   whose presence is mandatory.
 </para>
 


Regards,
 Thomas

* Re: [PATCH 7/10] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin
  2014-09-23 18:20 [PATCH 7/10] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin Julian Brown
  2014-11-11 13:54 ` [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost) Julian Brown
  2014-12-22 16:41 ` [PATCH 7/10] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin Thomas Schwinge
@ 2015-01-12 14:49 ` Thomas Schwinge
  2015-01-12 15:07   ` Thomas Schwinge
  2015-01-12 15:00 ` Thomas Schwinge
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 36+ messages in thread
From: Thomas Schwinge @ 2015-01-12 14:49 UTC (permalink / raw)
  To: gcc-patches; +Cc: Julian Brown

Hi!

On Tue, 23 Sep 2014 19:19:31 +0100, Julian Brown <julian@codesourcery.com> wrote:
> This patch contains the bulk of the OpenACC 2.0 runtime support, [...]

> --- /dev/null
> +++ b/libgomp/libgomp-plugin.c
> @@ -0,0 +1,106 @@

> +/* Exported (non-hidden) functions exposing libgomp interface for plugins.  */

> +void
> +gomp_plugin_mutex_init (gomp_mutex_t *mutex)
> +{
> +  gomp_mutex_init (mutex);
> +}
> +
> +void
> +gomp_plugin_mutex_destroy (gomp_mutex_t *mutex)
> +{
> +  gomp_mutex_destroy (mutex);
> +}
> +
> +void
> +gomp_plugin_mutex_lock (gomp_mutex_t *mutex)
> +{
> +  gomp_mutex_lock (mutex);
> +}
> +
> +void
> +gomp_plugin_mutex_unlock (gomp_mutex_t *mutex)
> +{
> +  gomp_mutex_unlock (mutex);
> +}

> --- a/libgomp/libgomp.map
> +++ b/libgomp/libgomp.map

> +PLUGIN_1.0 {
> +  global:

> +	gomp_plugin_mutex_init;
> +	gomp_plugin_mutex_destroy;
> +	gomp_plugin_mutex_lock;
> +	gomp_plugin_mutex_unlock;

> +};

> --- /dev/null
> +++ b/libgomp/plugin-nvptx.c
> @@ -0,0 +1,1854 @@
> +/* Plugin for NVPTX execution.

> +#include "libgomp.h"

Plugins in libgomp are not to depend on libgomp internals (libgomp.h),
and given that...

> +struct PTX_device
> +{

> +  /* A lock for use when manipulating the above stream list and array.  */
> +  gomp_mutex_t stream_lock;

> +};

> +static gomp_mutex_t PTX_event_lock;

> +static void
> +init_streams_for_device (struct PTX_device *ptx_dev, int concurrency)
> +{

> +  gomp_plugin_mutex_init (&ptx_dev->stream_lock);

> +}
> +[...]

... it makes much more sense to just use pthread mutexes here.  Committed
to gomp-4_0-branch in r219467:

commit 4de7ea8222739fa60d6eb81284dac61dc2bae7b2
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Mon Jan 12 14:35:51 2015 +0000

    libgomp: Use pthread mutexes in the nvptx plugin.
    
    ... instead of libgomp's internal mutex implementation.  Plugins aren't to
    depend on internal libgomp interfaces, and how would you instantiate a
    gomp_mutex_t in a plugin without knowing what it is exactly?
    
    	libgomp/
    	* plugin/plugin-nvptx.c (struct ptx_device): Turn stream_lock
    	member into a pthread_mutex_t.  Adjust all users.
    	(ptx_event_lock): Likewise.
    	* libgomp-plugin.c (GOMP_PLUGIN_mutex_init)
    	(GOMP_PLUGIN_mutex_destroy, GOMP_PLUGIN_mutex_lock)
    	(GOMP_PLUGIN_mutex_unlock): Remove.
    	* libgomp-plugin.h (GOMP_PLUGIN_mutex_init)
    	(GOMP_PLUGIN_mutex_destroy, GOMP_PLUGIN_mutex_lock)
    	(GOMP_PLUGIN_mutex_unlock): Likewise.
    	* libgomp.map (GOMP_PLUGIN_1.0): Remove GOMP_PLUGIN_mutex_init,
    	GOMP_PLUGIN_mutex_destroy, GOMP_PLUGIN_mutex_lock,
    	GOMP_PLUGIN_mutex_unlock.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@219467 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp        | 15 +++++++++++++++
 libgomp/libgomp-plugin.c      | 24 ------------------------
 libgomp/libgomp-plugin.h      |  7 -------
 libgomp/libgomp.map           |  4 ----
 libgomp/plugin/plugin-nvptx.c | 39 ++++++++++++++++++++-------------------
 5 files changed, 35 insertions(+), 54 deletions(-)
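
For readers unfamiliar with the pattern this commit adopts, here is a minimal, self-contained sketch (not the actual plugin code) of a plugin-local list guarded by a POSIX mutex, in the spirit of the ptx_event_lock, event_add and event_gc changes in this commit. All demo_* names are hypothetical; the only real interfaces used are standard pthread and C library calls. The point is that the plugin owns a pthread_mutex_t whose type it knows, instead of a gomp_mutex_t whose layout is private to libgomp.

```c
/* Minimal sketch: a plugin-local event list guarded by a POSIX mutex.
   All demo_* names are hypothetical; compile with -pthread.  */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define DEMO_EVENTS_PER_THREAD 100

/* Hypothetical event record, loosely modelled on struct ptx_event.  */
struct demo_event
{
  void *addr;
  struct demo_event *next;
};

/* The lock type comes from <pthread.h>, so the plugin never needs to
   know how libgomp's internal gomp_mutex_t is laid out.  */
static pthread_mutex_t demo_event_lock;
static struct demo_event *demo_events;

/* One-time setup, in the spirit of nvptx_init.  Returns 0 on success.  */
static int
demo_init (void)
{
  demo_events = NULL;
  return pthread_mutex_init (&demo_event_lock, NULL);
}

/* Push an event onto the shared list under the lock, like event_add.  */
static int
demo_event_add (void *h)
{
  struct demo_event *e = malloc (sizeof *e);
  if (e == NULL)
    return -1;
  e->addr = h;

  pthread_mutex_lock (&demo_event_lock);
  e->next = demo_events;
  demo_events = e;
  pthread_mutex_unlock (&demo_event_lock);
  return 0;
}

/* Release every queued event under the lock (a much-simplified
   event_gc) and return how many were freed.  */
static int
demo_event_gc (void)
{
  int n = 0;

  pthread_mutex_lock (&demo_event_lock);
  while (demo_events != NULL)
    {
      struct demo_event *next = demo_events->next;
      free (demo_events);
      demo_events = next;
      n++;
    }
  assert (demo_events == NULL);
  pthread_mutex_unlock (&demo_event_lock);
  return n;
}

/* Thread body: add a fixed number of events concurrently with others.  */
static void *
demo_worker (void *arg)
{
  for (int i = 0; i < DEMO_EVENTS_PER_THREAD; i++)
    if (demo_event_add (arg) != 0)
      abort ();
  return NULL;
}
```

Calling demo_init, running four demo_worker threads via pthread_create and pthread_join, and then calling demo_event_gc should report 400 freed events; without the mutex, concurrent pushes onto the list head could silently lose entries.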

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 745b836..d955a85 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,3 +1,18 @@
+2015-01-12  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* plugin/plugin-nvptx.c (struct ptx_device): Turn stream_lock
+	member into a pthread_mutex_t.  Adjust all users.
+	(ptx_event_lock): Likewise.
+	* libgomp-plugin.c (GOMP_PLUGIN_mutex_init)
+	(GOMP_PLUGIN_mutex_destroy, GOMP_PLUGIN_mutex_lock)
+	(GOMP_PLUGIN_mutex_unlock): Remove.
+	* libgomp-plugin.h (GOMP_PLUGIN_mutex_init)
+	(GOMP_PLUGIN_mutex_destroy, GOMP_PLUGIN_mutex_lock)
+	(GOMP_PLUGIN_mutex_unlock): Likewise.
+	* libgomp.map (GOMP_PLUGIN_1.0): Remove GOMP_PLUGIN_mutex_init,
+	GOMP_PLUGIN_mutex_destroy, GOMP_PLUGIN_mutex_lock,
+	GOMP_PLUGIN_mutex_unlock.
+
 2014-12-22  Thomas Schwinge  <thomas@codesourcery.com>
 
 	* libgomp.c (struct gomp_device_descr): Add lock member.
diff --git libgomp/libgomp-plugin.c libgomp/libgomp-plugin.c
index 0026270..77e250e 100644
--- libgomp/libgomp-plugin.c
+++ libgomp/libgomp-plugin.c
@@ -82,27 +82,3 @@ GOMP_PLUGIN_fatal (const char *msg, ...)
   /* Unreachable.  */
   abort ();
 }
-
-void
-GOMP_PLUGIN_mutex_init (gomp_mutex_t *mutex)
-{
-  gomp_mutex_init (mutex);
-}
-
-void
-GOMP_PLUGIN_mutex_destroy (gomp_mutex_t *mutex)
-{
-  gomp_mutex_destroy (mutex);
-}
-
-void
-GOMP_PLUGIN_mutex_lock (gomp_mutex_t *mutex)
-{
-  gomp_mutex_lock (mutex);
-}
-
-void
-GOMP_PLUGIN_mutex_unlock (gomp_mutex_t *mutex)
-{
-  gomp_mutex_unlock (mutex);
-}
diff --git libgomp/libgomp-plugin.h libgomp/libgomp-plugin.h
index 051d4e2..2e2be1f 100644
--- libgomp/libgomp-plugin.h
+++ libgomp/libgomp-plugin.h
@@ -29,8 +29,6 @@
 #ifndef LIBGOMP_PLUGIN_H
 #define LIBGOMP_PLUGIN_H 1
 
-#include "mutex.h"
-
 extern void *GOMP_PLUGIN_malloc (size_t) __attribute__((malloc));
 extern void *GOMP_PLUGIN_malloc_cleared (size_t) __attribute__((malloc));
 extern void *GOMP_PLUGIN_realloc (void *, size_t);
@@ -42,9 +40,4 @@ extern void GOMP_PLUGIN_error (const char *, ...)
 extern void GOMP_PLUGIN_fatal (const char *, ...)
 	__attribute__((noreturn, format (printf, 1, 2)));
 
-extern void GOMP_PLUGIN_mutex_init (gomp_mutex_t *);
-extern void GOMP_PLUGIN_mutex_destroy (gomp_mutex_t *);
-extern void GOMP_PLUGIN_mutex_lock (gomp_mutex_t *);
-extern void GOMP_PLUGIN_mutex_unlock (gomp_mutex_t *);
-
 #endif
diff --git libgomp/libgomp.map libgomp/libgomp.map
index aa1fdb8..bfdb78c 100644
--- libgomp/libgomp.map
+++ libgomp/libgomp.map
@@ -334,10 +334,6 @@ GOMP_PLUGIN_1.0 {
 	GOMP_PLUGIN_error;
 	GOMP_PLUGIN_debug;
 	GOMP_PLUGIN_fatal;
-	GOMP_PLUGIN_mutex_init;
-	GOMP_PLUGIN_mutex_destroy;
-	GOMP_PLUGIN_mutex_lock;
-	GOMP_PLUGIN_mutex_unlock;
 	GOMP_PLUGIN_async_unmap_vars;
 	GOMP_PLUGIN_acc_thread;
 };
diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index 593b1a9..f92ff40 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -39,6 +39,7 @@
 #include "oacc-ptx.h"
 #include "oacc-plugin.h"
 
+#include <pthread.h>
 #include <cuda.h>
 #include <stdint.h>
 #include <string.h>
@@ -302,7 +303,7 @@ struct ptx_device
     int size;
   } async_streams;
   /* A lock for use when manipulating the above stream list and array.  */
-  gomp_mutex_t stream_lock;
+  pthread_mutex_t stream_lock;
   int ord;
   bool overlap;
   bool map;
@@ -331,7 +332,7 @@ struct ptx_event
   struct ptx_event *next;
 };
 
-static gomp_mutex_t ptx_event_lock;
+static pthread_mutex_t ptx_event_lock;
 static struct ptx_event *ptx_events;
 
 #define _XSTR(s) _STR(s)
@@ -424,7 +425,7 @@ init_streams_for_device (struct ptx_device *ptx_dev, int concurrency)
   ptx_dev->null_stream = null_stream;
 
   ptx_dev->active_streams = NULL;
-  GOMP_PLUGIN_mutex_init (&ptx_dev->stream_lock);
+  pthread_mutex_init (&ptx_dev->stream_lock, NULL);
 
   if (concurrency < 1)
     concurrency = 1;
@@ -484,7 +485,7 @@ select_stream_for_async (int async, pthread_t thread, bool create,
     async++;
 
   if (create)
-    GOMP_PLUGIN_mutex_lock (&ptx_dev->stream_lock);
+    pthread_mutex_lock (&ptx_dev->stream_lock);
 
   /* NOTE: AFAICT there's no particular need for acc_async_sync to map to the
      null stream, and in fact better performance may be obtainable if it doesn't
@@ -566,7 +567,7 @@ select_stream_for_async (int async, pthread_t thread, bool create,
       if (thread != stream->host_thread)
         stream->multithreaded = true;
 
-      GOMP_PLUGIN_mutex_unlock (&ptx_dev->stream_lock);
+      pthread_mutex_unlock (&ptx_dev->stream_lock);
     }
   else if (stream && !stream->multithreaded
 	   && !pthread_equal (stream->host_thread, thread))
@@ -597,7 +598,7 @@ nvptx_init (void)
 
   ptx_events = NULL;
 
-  GOMP_PLUGIN_mutex_init (&ptx_event_lock);
+  pthread_mutex_init (&ptx_event_lock, NULL);
 
   ptx_inited = true;
 
@@ -822,7 +823,7 @@ event_gc (bool memmap_lockable)
   struct ptx_event *ptx_event = ptx_events;
   struct nvptx_thread *nvthd = nvptx_thread ();
 
-  GOMP_PLUGIN_mutex_lock (&ptx_event_lock);
+  pthread_mutex_lock (&ptx_event_lock);
 
   while (ptx_event != NULL)
     {
@@ -883,7 +884,7 @@ event_gc (bool memmap_lockable)
 	}
     }
 
-  GOMP_PLUGIN_mutex_unlock (&ptx_event_lock);
+  pthread_mutex_unlock (&ptx_event_lock);
 }
 
 static void
@@ -901,12 +902,12 @@ event_add (enum ptx_event_type type, CUevent *e, void *h)
   ptx_event->addr = h;
   ptx_event->ord = nvthd->ptx_dev->ord;
 
-  GOMP_PLUGIN_mutex_lock (&ptx_event_lock);
+  pthread_mutex_lock (&ptx_event_lock);
 
   ptx_event->next = ptx_events;
   ptx_events = ptx_event;
 
-  GOMP_PLUGIN_mutex_unlock (&ptx_event_lock);
+  pthread_mutex_unlock (&ptx_event_lock);
 }
 
 void
@@ -1239,19 +1240,19 @@ nvptx_async_test_all (void)
   pthread_t self = pthread_self ();
   struct nvptx_thread *nvthd = nvptx_thread ();
 
-  GOMP_PLUGIN_mutex_lock (&nvthd->ptx_dev->stream_lock);
+  pthread_mutex_lock (&nvthd->ptx_dev->stream_lock);
 
   for (s = nvthd->ptx_dev->active_streams; s != NULL; s = s->next)
     {
       if ((s->multithreaded || pthread_equal (s->host_thread, self))
 	  && cuStreamQuery (s->stream) == CUDA_ERROR_NOT_READY)
 	{
-	  GOMP_PLUGIN_mutex_unlock (&nvthd->ptx_dev->stream_lock);
+	  pthread_mutex_unlock (&nvthd->ptx_dev->stream_lock);
 	  return 0;
 	}
     }
 
-  GOMP_PLUGIN_mutex_unlock (&nvthd->ptx_dev->stream_lock);
+  pthread_mutex_unlock (&nvthd->ptx_dev->stream_lock);
 
   event_gc (true);
 
@@ -1322,7 +1323,7 @@ nvptx_wait_all (void)
   pthread_t self = pthread_self ();
   struct nvptx_thread *nvthd = nvptx_thread ();
 
-  GOMP_PLUGIN_mutex_lock (&nvthd->ptx_dev->stream_lock);
+  pthread_mutex_lock (&nvthd->ptx_dev->stream_lock);
 
   /* Wait for active streams initiated by this thread (or by multiple threads)
      to complete.  */
@@ -1342,7 +1343,7 @@ nvptx_wait_all (void)
 	}
     }
 
-  GOMP_PLUGIN_mutex_unlock (&nvthd->ptx_dev->stream_lock);
+  pthread_mutex_unlock (&nvthd->ptx_dev->stream_lock);
 
   event_gc (true);
 }
@@ -1368,7 +1369,7 @@ nvptx_wait_all_async (int async)
 
   event_gc (true);
 
-  GOMP_PLUGIN_mutex_lock (&nvthd->ptx_dev->stream_lock);
+  pthread_mutex_lock (&nvthd->ptx_dev->stream_lock);
 
   for (other_stream = nvthd->ptx_dev->active_streams;
        other_stream != NULL;
@@ -1396,7 +1397,7 @@ nvptx_wait_all_async (int async)
 	GOMP_PLUGIN_fatal ("cuStreamWaitEvent error: %s", cuda_error (r));
    }
 
-  GOMP_PLUGIN_mutex_unlock (&nvthd->ptx_dev->stream_lock);
+  pthread_mutex_unlock (&nvthd->ptx_dev->stream_lock);
 }
 
 static void *
@@ -1442,7 +1443,7 @@ nvptx_set_cuda_stream (int async, void *stream)
   pthread_t self = pthread_self ();
   struct nvptx_thread *nvthd = nvptx_thread ();
 
-  GOMP_PLUGIN_mutex_lock (&nvthd->ptx_dev->stream_lock);
+  pthread_mutex_lock (&nvthd->ptx_dev->stream_lock);
 
   if (async < 0)
     GOMP_PLUGIN_fatal ("bad async %d", async);
@@ -1474,7 +1475,7 @@ nvptx_set_cuda_stream (int async, void *stream)
       free (oldstream);
     }
 
-  GOMP_PLUGIN_mutex_unlock (&nvthd->ptx_dev->stream_lock);
+  pthread_mutex_unlock (&nvthd->ptx_dev->stream_lock);
 
   (void) select_stream_for_async (async, self, true, (CUstream) stream);
 

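The locking discipline these hunks convert to can be sketched on its own. What follows is a minimal, self-contained C sketch, not the plugin's code. struct event_node, events_init, event_push and events_all_done are hypothetical stand-ins for struct ptx_event, nvptx_init, event_add and nvptx_async_test_all. The sketch shows three things: a lock initialized with pthread_mutex_init at init time, a push onto a global singly-linked list while that lock is held, and a scan that releases the lock before every early return.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the plugin's struct ptx_event.  */
struct event_node
{
  int ord;                  /* device ordinal, as in ptx_event->ord */
  bool done;                /* set once the associated work has completed */
  struct event_node *next;
};

static pthread_mutex_t event_lock;
static struct event_node *events;

/* Analogue of nvptx_init: initialize the list and its lock once.  */
static void
events_init (void)
{
  int r = pthread_mutex_init (&event_lock, NULL);
  assert (r == 0);
  (void) r;
  events = NULL;
}

/* Analogue of event_add: link a new node at the head of the list while
   holding the lock, so concurrent pushes cannot lose nodes.  */
static void
event_push (int ord)
{
  struct event_node *n = malloc (sizeof *n);
  if (n == NULL)
    abort ();
  n->ord = ord;
  n->done = false;

  pthread_mutex_lock (&event_lock);
  n->next = events;
  events = n;
  pthread_mutex_unlock (&event_lock);
}

/* Analogue of nvptx_async_test_all: return 0 as soon as an unfinished
   node is found, 1 otherwise.  The lock is released on both paths.  */
static int
events_all_done (void)
{
  pthread_mutex_lock (&event_lock);
  for (struct event_node *n = events; n != NULL; n = n->next)
    if (!n->done)
      {
        pthread_mutex_unlock (&event_lock);
        return 0;
      }
  pthread_mutex_unlock (&event_lock);
  return 1;
}
```

pthread_mutex_t is a public POSIX type, so a plugin can declare and initialize it without knowing how libgomp's internal gomp_mutex_t is laid out. That is the motivation for the change.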

Grüße,
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 7/10] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin
  2014-09-23 18:20 [PATCH 7/10] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin Julian Brown
                   ` (2 preceding siblings ...)
  2015-01-12 14:49 ` Thomas Schwinge
@ 2015-01-12 15:00 ` Thomas Schwinge
  2017-02-02 14:38 ` libgomp, nvptx plugin: Make "nvptx_exec" static (was: [PATCH 7/10] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin) Thomas Schwinge
  2022-05-12 11:32 ` libgomp plugins: Don't 'AC_SUBST' and 'AC_DEFINE_UNQUOTED' for 'PLUGIN_GCN', 'PLUGIN_NVPTX' " Thomas Schwinge
  5 siblings, 0 replies; 36+ messages in thread
From: Thomas Schwinge @ 2015-01-12 15:00 UTC (permalink / raw)
  To: gcc-patches; +Cc: Julian Brown

[-- Attachment #1: Type: text/plain, Size: 11981 bytes --]

Hi!

On Tue, 23 Sep 2014 19:19:31 +0100, Julian Brown <julian@codesourcery.com> wrote:
> This patch contains the bulk of the OpenACC 2.0 runtime support, [...]

> --- /dev/null
> +++ b/libgomp/libgomp-plugin.h
> @@ -0,0 +1,57 @@

> +/* An interface to various libgomp-internal functions for use by plugins.  */

..., and in parallel, a libgomp_target.h file came into existence.  In
gomp-4_0-branch's r219468, I now merged the two into the one with -- in
my opinion -- the more descriptive name:

commit 5024605e60ed2a42fefaa6882ac0ca7493643460
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Mon Jan 12 14:47:46 2015 +0000

    libgomp: Merge libgomp_target.h into libgomp-plugin.h.
    
    	libgomp/
    	* env.c: Don't include "libgomp_target.h".
    	* libgomp-plugin.c: Likewise.
    	* oacc-async.c: Likewise.
    	* oacc-cuda.c: Likewise.
    	* oacc-init.c: Likewise.
    	* oacc-mem.c: Likewise.
    	* oacc-parallel.c: Likewise.
    	* oacc-plugin.c: Likewise.
    	* plugin/plugin-host.c: Likewise.
    	* plugin/plugin-nvptx.c: Likewise.
    	* target.c: Likewise.
    	* libgomp_target.h: Remove file after merging its content into...
    	* libgomp-plugin.h: ... this file.  Adjust all users.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@219468 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/config/i386/intelmic-mkoffload.c             |  2 +-
 libgomp/ChangeLog.gomp                           | 14 +++++++
 libgomp/env.c                                    |  1 -
 libgomp/libgomp-plugin.c                         |  1 -
 libgomp/libgomp-plugin.h                         | 37 +++++++++++++++++
 libgomp/libgomp.h                                |  2 +-
 libgomp/libgomp_target.h                         | 53 ------------------------
 libgomp/oacc-async.c                             |  1 -
 libgomp/oacc-cuda.c                              |  1 -
 libgomp/oacc-init.c                              |  1 -
 libgomp/oacc-mem.c                               |  1 -
 libgomp/oacc-parallel.c                          |  1 -
 libgomp/oacc-plugin.c                            |  1 -
 libgomp/plugin/plugin-host.c                     |  1 -
 libgomp/plugin/plugin-nvptx.c                    |  1 -
 libgomp/target.c                                 |  1 -
 liboffloadmic/plugin/libgomp-plugin-intelmic.cpp |  2 +-
 17 files changed, 54 insertions(+), 67 deletions(-)

diff --git gcc/config/i386/intelmic-mkoffload.c gcc/config/i386/intelmic-mkoffload.c
index 050f2e6..edc3f92 100644
--- gcc/config/i386/intelmic-mkoffload.c
+++ gcc/config/i386/intelmic-mkoffload.c
@@ -22,13 +22,13 @@
 
 #include "config.h"
 #include <libgen.h>
+#include "libgomp-plugin.h"
 #include "system.h"
 #include "coretypes.h"
 #include "obstack.h"
 #include "intl.h"
 #include "diagnostic.h"
 #include "collect-utils.h"
-#include <libgomp_target.h>
 
 const char tool_name[] = "intelmic mkoffload";
 
diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index d955a85..76f21e6 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,5 +1,19 @@
 2015-01-12  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* env.c: Don't include "libgomp_target.h".
+	* libgomp-plugin.c: Likewise.
+	* oacc-async.c: Likewise.
+	* oacc-cuda.c: Likewise.
+	* oacc-init.c: Likewise.
+	* oacc-mem.c: Likewise.
+	* oacc-parallel.c: Likewise.
+	* oacc-plugin.c: Likewise.
+	* plugin/plugin-host.c: Likewise.
+	* plugin/plugin-nvptx.c: Likewise.
+	* target.c: Likewise.
+	* libgomp_target.h: Remove file after merging its content into...
+	* libgomp-plugin.h: ... this file.  Adjust all users.
+
 	* plugin/plugin-nvptx.c (struct ptx_device): Turn stream_lock
 	member into a pthread_mutex_t.  Adjust all users.
 	(ptx_event_lock): Likewise.
diff --git libgomp/env.c libgomp/env.c
index 81460dc..130c52c 100644
--- libgomp/env.c
+++ libgomp/env.c
@@ -28,7 +28,6 @@
 
 #include "libgomp.h"
 #include "libgomp_f.h"
-#include "libgomp_target.h"
 #include "oacc-int.h"
 #include <ctype.h>
 #include <stdlib.h>
diff --git libgomp/libgomp-plugin.c libgomp/libgomp-plugin.c
index 77e250e..1dd33f5 100644
--- libgomp/libgomp-plugin.c
+++ libgomp/libgomp-plugin.c
@@ -30,7 +30,6 @@
 
 #include "libgomp.h"
 #include "libgomp-plugin.h"
-#include "libgomp_target.h"
 
 void *
 GOMP_PLUGIN_malloc (size_t size)
diff --git libgomp/libgomp-plugin.h libgomp/libgomp-plugin.h
index 2e2be1f..c8383e1 100644
--- libgomp/libgomp-plugin.h
+++ libgomp/libgomp-plugin.h
@@ -29,6 +29,39 @@
 #ifndef LIBGOMP_PLUGIN_H
 #define LIBGOMP_PLUGIN_H 1
 
+#include <stddef.h>
+#include <stdint.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Capabilities of offloading devices.  */
+#define GOMP_OFFLOAD_CAP_SHARED_MEM	(1 << 0)
+#define GOMP_OFFLOAD_CAP_NATIVE_EXEC	(1 << 1)
+#define GOMP_OFFLOAD_CAP_OPENMP_400	(1 << 2)
+#define GOMP_OFFLOAD_CAP_OPENACC_200	(1 << 3)
+
+/* Type of offload target device.  Keep in sync with include/gomp-constants.h.  */
+enum offload_target_type
+{
+  OFFLOAD_TARGET_TYPE_HOST = 2,
+  OFFLOAD_TARGET_TYPE_HOST_NONSHM = 3,
+  OFFLOAD_TARGET_TYPE_NVIDIA_PTX = 5,
+  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
+};
+
+/* Auxiliary struct, used for transferring a host-target address range mapping
+   from plugin to libgomp.  */
+struct mapping_table
+{
+  uintptr_t host_start;
+  uintptr_t host_end;
+  uintptr_t tgt_start;
+  uintptr_t tgt_end;
+};
+
+/* Miscellaneous functions.  */
 extern void *GOMP_PLUGIN_malloc (size_t) __attribute__((malloc));
 extern void *GOMP_PLUGIN_malloc_cleared (size_t) __attribute__((malloc));
 extern void *GOMP_PLUGIN_realloc (void *, size_t);
@@ -40,4 +73,8 @@ extern void GOMP_PLUGIN_error (const char *, ...)
 extern void GOMP_PLUGIN_fatal (const char *, ...)
 	__attribute__((noreturn, format (printf, 1, 2)));
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif
diff --git libgomp/libgomp.h libgomp/libgomp.h
index eff790a..97732a5 100644
--- libgomp/libgomp.h
+++ libgomp/libgomp.h
@@ -38,6 +38,7 @@
 
 #include "config.h"
 #include "gstdint.h"
+#include "libgomp-plugin.h"
 
 #include <pthread.h>
 #include <stdbool.h>
@@ -632,7 +633,6 @@ extern void gomp_free_thread (void *);
 extern void gomp_init_targets_once (void);
 extern int gomp_get_num_devices (void);
 
-#include "libgomp_target.h"
 #include "splay-tree.h"
 
 struct target_mem_desc {
diff --git libgomp/libgomp_target.h libgomp/libgomp_target.h
deleted file mode 100644
index d753dfe..0000000
--- libgomp/libgomp_target.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* Copyright (C) 2014-2015 Free Software Foundation, Inc.
-
-   This file is part of the GNU Offloading and Multi Processing Library
-   (libgomp).
-
-   Libgomp is free software; you can redistribute it and/or modify it
-   under the terms of the GNU General Public License as published by
-   the Free Software Foundation; either version 3, or (at your option)
-   any later version.
-
-   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
-   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
-   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
-   more details.
-
-   Under Section 7 of GPL version 3, you are granted additional
-   permissions described in the GCC Runtime Library Exception, version
-   3.1, as published by the Free Software Foundation.
-
-   You should have received a copy of the GNU General Public License and
-   a copy of the GCC Runtime Library Exception along with this program;
-   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#ifndef LIBGOMP_TARGET_H
-#define LIBGOMP_TARGET_H 1
-
-/* Capabilities of offloading devices.  */
-#define GOMP_OFFLOAD_CAP_SHARED_MEM	(1 << 0)
-#define GOMP_OFFLOAD_CAP_NATIVE_EXEC	(1 << 1)
-#define GOMP_OFFLOAD_CAP_OPENMP_400	(1 << 2)
-#define GOMP_OFFLOAD_CAP_OPENACC_200	(1 << 3)
-
-/* Type of offload target device.  Keep in sync with include/gomp-constants.h.  */
-enum offload_target_type
-{
-  OFFLOAD_TARGET_TYPE_HOST = 2,
-  OFFLOAD_TARGET_TYPE_HOST_NONSHM = 3,
-  OFFLOAD_TARGET_TYPE_NVIDIA_PTX = 5,
-  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
-};
-
-/* Auxiliary struct, used for transferring a host-target address range mapping
-   from plugin to libgomp.  */
-struct mapping_table
-{
-  uintptr_t host_start;
-  uintptr_t host_end;
-  uintptr_t tgt_start;
-  uintptr_t tgt_end;
-};
-
-#endif /* LIBGOMP_TARGET_H */
diff --git libgomp/oacc-async.c libgomp/oacc-async.c
index be59036..08b7c5e 100644
--- libgomp/oacc-async.c
+++ libgomp/oacc-async.c
@@ -29,7 +29,6 @@
 
 #include "openacc.h"
 #include "libgomp.h"
-#include "libgomp_target.h"
 #include "oacc-int.h"
 
 int
diff --git libgomp/oacc-cuda.c libgomp/oacc-cuda.c
index 4dc9e38..6f1a06f 100644
--- libgomp/oacc-cuda.c
+++ libgomp/oacc-cuda.c
@@ -29,7 +29,6 @@
 #include "openacc.h"
 #include "config.h"
 #include "libgomp.h"
-#include "libgomp_target.h"
 #include "oacc-int.h"
 
 void *
diff --git libgomp/oacc-init.c libgomp/oacc-init.c
index 4ca25eb..6f4a32c 100644
--- libgomp/oacc-init.c
+++ libgomp/oacc-init.c
@@ -27,7 +27,6 @@
    <http://www.gnu.org/licenses/>.  */
 
 #include "libgomp.h"
-#include "libgomp_target.h"
 #include "oacc-int.h"
 #include "openacc.h"
 #include <assert.h>
diff --git libgomp/oacc-mem.c libgomp/oacc-mem.c
index eb821b3..674fb76 100644
--- libgomp/oacc-mem.c
+++ libgomp/oacc-mem.c
@@ -30,7 +30,6 @@
 #include "config.h"
 #include "libgomp.h"
 #include "gomp-constants.h"
-#include "libgomp_target.h"
 #include "oacc-int.h"
 #include <stdio.h>
 #include <stdint.h>
diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
index 6bdf674..b6ee7c1 100644
--- libgomp/oacc-parallel.c
+++ libgomp/oacc-parallel.c
@@ -30,7 +30,6 @@
 #include "libgomp.h"
 #include "libgomp_g.h"
 #include "gomp-constants.h"
-#include "libgomp_target.h"
 #include "oacc-int.h"
 #include <stdio.h>
 #include <string.h>
diff --git libgomp/oacc-plugin.c libgomp/oacc-plugin.c
index baa891f..1fd6b2d 100644
--- libgomp/oacc-plugin.c
+++ libgomp/oacc-plugin.c
@@ -28,7 +28,6 @@
 
 #include "libgomp.h"
 #include "oacc-plugin.h"
-#include "libgomp_target.h"
 #include "oacc-int.h"
 
 void
diff --git libgomp/plugin/plugin-host.c libgomp/plugin/plugin-host.c
index 3a8bb48..acf9efd 100644
--- libgomp/plugin/plugin-host.c
+++ libgomp/plugin/plugin-host.c
@@ -33,7 +33,6 @@
 #include "openacc.h"
 #include "config.h"
 #include "libgomp.h"
-#include "libgomp_target.h"
 #ifdef HOST_NONSHM_PLUGIN
 #include "libgomp-plugin.h"
 #include "oacc-plugin.h"
diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index f92ff40..4f0dc9a 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -34,7 +34,6 @@
 #include "openacc.h"
 #include "config.h"
 #include "libgomp.h"
-#include "libgomp_target.h"
 #include "libgomp-plugin.h"
 #include "oacc-ptx.h"
 #include "oacc-plugin.h"
diff --git libgomp/target.c libgomp/target.c
index 8bb0ae9..6871e7b 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -27,7 +27,6 @@
 
 #include "config.h"
 #include "libgomp.h"
-#include "libgomp_target.h"
 #include "oacc-plugin.h"
 #include "oacc-int.h"
 #include "gomp-constants.h"
diff --git liboffloadmic/plugin/libgomp-plugin-intelmic.cpp liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index b5aff92..3e7a958 100644
--- liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -34,7 +34,7 @@
 #include <string.h>
 #include <utility>
 #include <vector>
-#include <libgomp_target.h>
+#include "libgomp-plugin.h"
 #include "compiler_if_host.h"
 #include "main_target_image.h"
 

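As an illustration, here is a minimal C sketch of how a consumer might use the definitions that now live in libgomp-plugin.h. The GOMP_OFFLOAD_CAP_* values and struct mapping_table are copied from the hunk above. device_has_cap and host_to_tgt are hypothetical helpers, not libgomp functions. host_to_tgt also assumes that host_end is exclusive, which the header itself does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Capability bits, copied from libgomp-plugin.h.  */
#define GOMP_OFFLOAD_CAP_SHARED_MEM	(1 << 0)
#define GOMP_OFFLOAD_CAP_NATIVE_EXEC	(1 << 1)
#define GOMP_OFFLOAD_CAP_OPENMP_400	(1 << 2)
#define GOMP_OFFLOAD_CAP_OPENACC_200	(1 << 3)

/* Host-target address range mapping, copied from libgomp-plugin.h.  */
struct mapping_table
{
  uintptr_t host_start;
  uintptr_t host_end;
  uintptr_t tgt_start;
  uintptr_t tgt_end;
};

/* Hypothetical helper: true if CAP is set in the capability mask CAPS.  */
static bool
device_has_cap (unsigned int caps, unsigned int cap)
{
  return (caps & cap) != 0;
}

/* Hypothetical helper: map HOST_ADDR into the target range described by M,
   treating host_end as exclusive (an assumption).  Returns 0 when HOST_ADDR
   lies outside [host_start, host_end).  */
static uintptr_t
host_to_tgt (const struct mapping_table *m, uintptr_t host_addr)
{
  if (host_addr < m->host_start || host_addr >= m->host_end)
    return 0;
  return m->tgt_start + (host_addr - m->host_start);
}
```

The extern "C" guard added to the header matters for liboffloadmic/plugin/libgomp-plugin-intelmic.cpp. That file is compiled as C++ and now includes libgomp-plugin.h directly.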

Grüße,
 Thomas

* Re: [PATCH 7/10] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin
  2015-01-12 14:49 ` Thomas Schwinge
@ 2015-01-12 15:07   ` Thomas Schwinge
  0 siblings, 0 replies; 36+ messages in thread
From: Thomas Schwinge @ 2015-01-12 15:07 UTC (permalink / raw)
  To: gcc-patches; +Cc: Julian Brown

[-- Attachment #1: Type: text/plain, Size: 6847 bytes --]

Hi!

On Mon, 12 Jan 2015 15:37:46 +0100, I wrote:
> On Tue, 23 Sep 2014 19:19:31 +0100, Julian Brown <julian@codesourcery.com> wrote:
> > This patch contains the bulk of the OpenACC 2.0 runtime support, [...]

> > --- /dev/null
> > +++ b/libgomp/plugin-nvptx.c
> > @@ -0,0 +1,1854 @@
> > +/* Plugin for NVPTX execution.
> 
> > +#include "libgomp.h"
> 
> Plugins in libgomp are not to depend on libgomp internals (libgomp.h),

> ... it makes much more sense to just use pthread mutexes here.  Committed
> to gomp-4_0-branch in r219467:
> 
> commit 4de7ea8222739fa60d6eb81284dac61dc2bae7b2
> Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Mon Jan 12 14:35:51 2015 +0000
> 
>     libgomp: Use pthread mutexes in the nvptx plugin.
>     
>     ... instead of libgomp's internal mutex implementation.  Plugins aren't to
>     depend on internal libgomp interfaces, and how would you instantiate a
>     gomp_mutex_t in a plugin without knowing what it is exactly?

Given this, we can then tighten the libgomp plugins' include files;
committed to gomp-4_0-branch in r219469:

commit 7c011e60ec4e056e4c1b054966fd95fb2cb5e44a
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Mon Jan 12 14:53:53 2015 +0000

    libgomp: Don't use internal libgomp.h for plugins.
    
    ..., and explicitly link libgomp plugins against libgomp.
    
    	libgomp/
    	* plugin/plugin-host.c [HOST_NONSHM_PLUGIN]: Don't include "libgomp.h".
    	* plugin/plugin-nvptx.c: Likewise.  Include <stdbool.h>.
    	* plugin/Makefrag.am (libgomp_plugin_nvptx_la_LIBADD)
    	(libgomp_plugin_host_nonshm_la_LIBADD): Append "libgomp.la".
    	* Makefile.in: Regenerate.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@219469 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp        | 6 ++++++
 libgomp/Makefile.in           | 7 ++++---
 libgomp/plugin/Makefrag.am    | 3 ++-
 libgomp/plugin/plugin-host.c  | 2 +-
 libgomp/plugin/plugin-nvptx.c | 2 +-
 5 files changed, 14 insertions(+), 6 deletions(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 76f21e6..c2566cf 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,5 +1,11 @@
 2015-01-12  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* plugin/plugin-host.c [HOST_NONSHM_PLUGIN]: Don't include "libgomp.h".
+	* plugin/plugin-nvptx.c: Likewise.  Include <stdbool.h>.
+	* plugin/Makefrag.am (libgomp_plugin_nvptx_la_LIBADD)
+	(libgomp_plugin_host_nonshm_la_LIBADD): Append "libgomp.la".
+	* Makefile.in: Regenerate.
+
 	* env.c: Don't include "libgomp_target.h".
 	* libgomp-plugin.c: Likewise.
 	* oacc-async.c: Likewise.
diff --git libgomp/Makefile.in libgomp/Makefile.in
index ac34b97..8758989 100644
--- libgomp/Makefile.in
+++ libgomp/Makefile.in
@@ -123,7 +123,7 @@ am__installdirs = "$(DESTDIR)$(toolexeclibdir)" "$(DESTDIR)$(infodir)" \
 	"$(DESTDIR)$(fincludedir)" "$(DESTDIR)$(libsubincludedir)" \
 	"$(DESTDIR)$(toolexeclibdir)"
 LTLIBRARIES = $(toolexeclib_LTLIBRARIES)
-libgomp_plugin_host_nonshm_la_LIBADD =
+libgomp_plugin_host_nonshm_la_DEPENDENCIES = libgomp.la
 am_libgomp_plugin_host_nonshm_la_OBJECTS =  \
 	libgomp_plugin_host_nonshm_la-plugin-host.lo
 libgomp_plugin_host_nonshm_la_OBJECTS =  \
@@ -133,7 +133,7 @@ libgomp_plugin_host_nonshm_la_LINK = $(LIBTOOL) --tag=CC \
 	--mode=link $(CCLD) $(AM_CFLAGS) $(CFLAGS) \
 	$(libgomp_plugin_host_nonshm_la_LDFLAGS) $(LDFLAGS) -o $@
 am__DEPENDENCIES_1 =
-@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_DEPENDENCIES =  \
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_DEPENDENCIES = libgomp.la \
 @PLUGIN_NVPTX_TRUE@	$(am__DEPENDENCIES_1)
 @PLUGIN_NVPTX_TRUE@am_libgomp_plugin_nvptx_la_OBJECTS =  \
 @PLUGIN_NVPTX_TRUE@	libgomp_plugin_nvptx_la-plugin-nvptx.lo
@@ -407,7 +407,7 @@ libgomp_la_SOURCES = alloc.c barrier.c critical.c env.c error.c iter.c \
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LDFLAGS =  \
 @PLUGIN_NVPTX_TRUE@	$(libgomp_plugin_nvptx_version_info) \
 @PLUGIN_NVPTX_TRUE@	$(lt_host_flags) $(PLUGIN_NVPTX_LDFLAGS)
-@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LIBADD = $(PLUGIN_NVPTX_LIBS)
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LIBADD = libgomp.la $(PLUGIN_NVPTX_LIBS)
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LIBTOOLFLAGS = --tag=disable-static
 libgomp_plugin_host_nonshm_version_info = -version-info $(libtool_VERSION)
 libgomp_plugin_host_nonshm_la_SOURCES = plugin/plugin-host.c
@@ -415,6 +415,7 @@ libgomp_plugin_host_nonshm_la_CPPFLAGS = $(AM_CPPFLAGS) -DHOST_NONSHM_PLUGIN
 libgomp_plugin_host_nonshm_la_LDFLAGS = \
 	$(libgomp_plugin_host_nonshm_version_info) $(lt_host_flags)
 
+libgomp_plugin_host_nonshm_la_LIBADD = libgomp.la
 libgomp_plugin_host_nonshm_la_LIBTOOLFLAGS = --tag=disable-static
 nodist_noinst_HEADERS = libgomp_f.h
 nodist_libsubinclude_HEADERS = omp.h openacc.h
diff --git libgomp/plugin/Makefrag.am libgomp/plugin/Makefrag.am
index d2c5428..167485f 100644
--- libgomp/plugin/Makefrag.am
+++ libgomp/plugin/Makefrag.am
@@ -35,7 +35,7 @@ libgomp_plugin_nvptx_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_NVPTX_CPPFLAGS)
 libgomp_plugin_nvptx_la_LDFLAGS = $(libgomp_plugin_nvptx_version_info) \
 	$(lt_host_flags)
 libgomp_plugin_nvptx_la_LDFLAGS += $(PLUGIN_NVPTX_LDFLAGS)
-libgomp_plugin_nvptx_la_LIBADD = $(PLUGIN_NVPTX_LIBS)
+libgomp_plugin_nvptx_la_LIBADD = libgomp.la $(PLUGIN_NVPTX_LIBS)
 libgomp_plugin_nvptx_la_LIBTOOLFLAGS = --tag=disable-static
 endif
 
@@ -45,4 +45,5 @@ libgomp_plugin_host_nonshm_la_SOURCES = plugin/plugin-host.c
 libgomp_plugin_host_nonshm_la_CPPFLAGS = $(AM_CPPFLAGS) -DHOST_NONSHM_PLUGIN
 libgomp_plugin_host_nonshm_la_LDFLAGS = \
 	$(libgomp_plugin_host_nonshm_version_info) $(lt_host_flags)
+libgomp_plugin_host_nonshm_la_LIBADD = libgomp.la
 libgomp_plugin_host_nonshm_la_LIBTOOLFLAGS = --tag=disable-static
diff --git libgomp/plugin/plugin-host.c libgomp/plugin/plugin-host.c
index acf9efd..7437407 100644
--- libgomp/plugin/plugin-host.c
+++ libgomp/plugin/plugin-host.c
@@ -32,11 +32,11 @@
 
 #include "openacc.h"
 #include "config.h"
-#include "libgomp.h"
 #ifdef HOST_NONSHM_PLUGIN
 #include "libgomp-plugin.h"
 #include "oacc-plugin.h"
 #else
+#include "libgomp.h"
 #include "oacc-int.h"
 #endif
 
diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index 4f0dc9a..ee0c818 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -33,13 +33,13 @@
 
 #include "openacc.h"
 #include "config.h"
-#include "libgomp.h"
 #include "libgomp-plugin.h"
 #include "oacc-ptx.h"
 #include "oacc-plugin.h"
 
 #include <pthread.h>
 #include <cuda.h>
+#include <stdbool.h>
 #include <stdint.h>
 #include <string.h>
 #include <stdio.h>


Grüße,
 Thomas

* Re: libgomp: Now known as the GNU Offloading and Multi Processing Runtime Library (was: libgomp: "GNU OpenMP Runtime Library")
  2015-01-11  2:18                   ` libgomp: Now known as the GNU Offloading and Multi Processing Runtime Library (was: libgomp: "GNU OpenMP Runtime Library") Thomas Schwinge
@ 2015-01-29 10:25                     ` Thomas Schwinge
  2019-05-28 21:27                       ` libgomp: long known as the GNU Offloading and Multi Processing Runtime Library (was: libgomp: Now known as the GNU Offloading and Multi Processing Runtime Library (was: libgomp: "GNU OpenMP Runtime Library")) Thomas Schwinge
  0 siblings, 1 reply; 36+ messages in thread
From: Thomas Schwinge @ 2015-01-29 10:25 UTC (permalink / raw)
  To: gcc-patches
  Cc: gcc, Ilya Verbin, Julian Brown, David Malcolm, Jakub Jelinek,
	burnus, gerald

[-- Attachment #1: Type: text/plain, Size: 2654 bytes --]

Hi!

On Sat, 10 Jan 2015 20:21:46 +0100, I wrote:
> On Wed, 12 Nov 2014 15:43:06 -0500, David Malcolm <dmalcolm@redhat.com> wrote:
> > On Wed, 2014-11-12 at 21:30 +0100, Jakub Jelinek wrote:
> > > On Wed, Nov 12, 2014 at 03:22:21PM -0500, David Malcolm wrote:
> > > > On Wed, 2014-11-12 at 14:47 +0100, Jakub Jelinek wrote:
> > > > > On Wed, Nov 12, 2014 at 08:33:34AM -0500, David Malcolm wrote:
> > > > > > Apologies for bikeshedding, and I normally dislike "cute" names, but
> > > > > > renaming it to
> > > > > > 
> > > > > >    "GNU Offloading and Multi Processing library"
> 
> Oh, how cute!  ;-P
> 
> > > > > > would allow a backronym of "libgomp", thus preserving the existing
> > > > > > filenames/SONAME etc.

> As
> pointed out by Tobias in
> <https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01430.html>, we'll also
> need to update some more files outside of the GCC sources repository,
> that is, in the web pages repository as well as some wiki pages, I
> assume, which I'll do next week.
> 
> commit c35c9a626070a8660c10a37786cedf2d6e3742c9
> Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Sat Jan 10 19:10:37 2015 +0000
> 
>     libgomp: Now known as the GNU Offloading and Multi Processing Runtime Library.

Changed in <https://gcc.gnu.org/wiki/openmp?action=diff&rev2=37&rev1=36>,
and committed to wwwdocs:

Index: htdocs/onlinedocs/index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/onlinedocs/index.html,v
retrieving revision 1.145
retrieving revision 1.146
diff -u -p -r1.145 -r1.146
--- htdocs/onlinedocs/index.html	19 Dec 2014 13:24:58 -0000	1.145
+++ htdocs/onlinedocs/index.html	29 Jan 2015 08:26:06 -0000	1.146
@@ -957,8 +957,8 @@ existing release.</p>
            href="https://gcc.gnu.org/onlinedocs/gccgo.ps.gz">PostScript</a> or <a
            href="https://gcc.gnu.org/onlinedocs/gccgo-html.tar.gz">an
            HTML tarball</a>)</li>
-    <li><a href="https://gcc.gnu.org/onlinedocs/libgomp/">GNU OpenMP
-           Manual</a> (<a
+    <li><a href="https://gcc.gnu.org/onlinedocs/libgomp/">GNU Offloading and
+           Multi Processing Runtime Library Manual</a> (<a
            href="https://gcc.gnu.org/onlinedocs/libgomp.pdf">also in
            PDF</a> or <a
            href="https://gcc.gnu.org/onlinedocs/libgomp.ps.gz">PostScript</a> or <a

It remains to be seen if for the GCC 5.0 release, the text is copied from
the 4.9.2 section at the top of the file, or from the current development
section at the bottom (which I just changed).  ;-)


Grüße,
 Thomas

* libgomp, nvptx plugin: Make "nvptx_exec" static (was: [PATCH 7/10] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin)
  2014-09-23 18:20 [PATCH 7/10] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin Julian Brown
                   ` (3 preceding siblings ...)
  2015-01-12 15:00 ` Thomas Schwinge
@ 2017-02-02 14:38 ` Thomas Schwinge
  2022-05-12 11:32 ` libgomp plugins: Don't 'AC_SUBST' and 'AC_DEFINE_UNQUOTED' for 'PLUGIN_GCN', 'PLUGIN_NVPTX' " Thomas Schwinge
  5 siblings, 0 replies; 36+ messages in thread
From: Thomas Schwinge @ 2017-02-02 14:38 UTC (permalink / raw)
  To: gcc-patches

Hi!

On Tue, 23 Sep 2014 19:19:31 +0100, Julian Brown <julian@codesourcery.com> wrote:
> This patch contains the bulk of the OpenACC 2.0 runtime support, [...]

> --- /dev/null
> +++ b/libgomp/plugin-nvptx.c

> +void
> +PTX_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
> +	  size_t *sizes, unsigned short *kinds, int num_gangs, int num_workers,
> +	  int vector_length, int async, void *targ_mem_desc)
> +{

As obvious, committed to trunk in r245127:

commit fbfa5aaf7537a8d4a86873dd993fdd20aaed0298
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Thu Feb 2 14:35:30 2017 +0000

    libgomp, nvptx plugin: Make "nvptx_exec" static
    
            libgomp/
            * plugin/plugin-nvptx.c (nvptx_exec): Make it static.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@245127 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog             | 2 ++
 libgomp/plugin/plugin-nvptx.c | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index 5f05cdb..56dc5bb 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,5 +1,7 @@
 2017-02-02  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* plugin/plugin-nvptx.c (nvptx_exec): Make it static.
+
 	* libgomp-plugin.h (GOMP_OFFLOAD_openacc_parallel): Rename to
 	GOMP_OFFLOAD_openacc_exec.  Adjust all users.
 	(GOMP_OFFLOAD_openacc_get_current_cuda_device): Rename to
diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index 0284c7f..36d447c 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -1041,7 +1041,7 @@ event_add (enum ptx_event_type type, CUevent *e, void *h, int val)
   pthread_mutex_unlock (&ptx_event_lock);
 }
 
-void
+static void
 nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
 	    int async, unsigned *dims, void *targ_mem_desc)
 {


Backported to gomp-4_0-branch in r245128:

commit 18ebacbb8d4a978eee356d6bfb5051a431e6400b
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Thu Feb 2 14:36:31 2017 +0000

    libgomp, nvptx plugin: Make "nvptx_exec" static
    
    Backport from trunk r245127:
    
            libgomp/
            * plugin/plugin-nvptx.c (nvptx_exec): Make it static.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@245128 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp        | 5 +++++
 libgomp/plugin/plugin-nvptx.c | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 17b10ef..062103e 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,5 +1,10 @@
 2017-02-02  Thomas Schwinge  <thomas@codesourcery.com>
 
+	Backport from trunk r245127:
+	2017-02-02  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* plugin/plugin-nvptx.c (nvptx_exec): Make it static.
+
 	Backport from trunk r245125:
 	2017-02-02  Thomas Schwinge  <thomas@codesourcery.com>
 
diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index 79c58c6..15018e5 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -888,7 +888,7 @@ event_add (enum ptx_event_type type, CUevent *e, void *h, int val)
   pthread_mutex_unlock (&ptx_event_lock);
 }
 
-void
+static void
 nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
 	    int async, unsigned *dims, void *targ_mem_desc)
 {


Grüße
 Thomas

* libgomp: long known as the GNU Offloading and Multi Processing Runtime Library (was: libgomp: Now known as the GNU Offloading and Multi Processing Runtime Library (was: libgomp: "GNU OpenMP Runtime Library"))
  2015-01-29 10:25                     ` Thomas Schwinge
@ 2019-05-28 21:27                       ` Thomas Schwinge
  0 siblings, 0 replies; 36+ messages in thread
From: Thomas Schwinge @ 2019-05-28 21:27 UTC (permalink / raw)
  To: gcc-patches
  Cc: gcc, Ilya Verbin, Julian Brown, David Malcolm, Jakub Jelinek,
	burnus, gerald

[-- Attachment #1: Type: text/plain, Size: 21934 bytes --]

Hi!

On Thu, 29 Jan 2015 09:28:38 +0100, I wrote:
> On Sat, 10 Jan 2015 20:21:46 +0100, I wrote:
> > On Wed, 12 Nov 2014 15:43:06 -0500, David Malcolm <dmalcolm@redhat.com> wrote:
> > > On Wed, 2014-11-12 at 21:30 +0100, Jakub Jelinek wrote:
> > > > On Wed, Nov 12, 2014 at 03:22:21PM -0500, David Malcolm wrote:
> > > > > On Wed, 2014-11-12 at 14:47 +0100, Jakub Jelinek wrote:
> > > > > > On Wed, Nov 12, 2014 at 08:33:34AM -0500, David Malcolm wrote:
> > > > > > > Apologies for bikeshedding, and I normally dislike "cute" names, but
> > > > > > > renaming it to
> > > > > > > 
> > > > > > >    "GNU Offloading and Multi Processing library"
> > 
> > Oh, how cute!  ;-P
> > 
> > > > > > > would allow a backronym of "libgomp", thus preserving the existing
> > > > > > > filenames/SONAME etc.

> [...] and committed to wwwdocs:
> 
> Index: htdocs/onlinedocs/index.html
> ===================================================================
> RCS file: /cvs/gcc/wwwdocs/htdocs/onlinedocs/index.html,v
> retrieving revision 1.145
> retrieving revision 1.146
> diff -u -p -r1.145 -r1.146
> --- htdocs/onlinedocs/index.html	19 Dec 2014 13:24:58 -0000	1.145
> +++ htdocs/onlinedocs/index.html	29 Jan 2015 08:26:06 -0000	1.146
> @@ -957,8 +957,8 @@ existing release.</p>
>             href="https://gcc.gnu.org/onlinedocs/gccgo.ps.gz">PostScript</a> or <a
>             href="https://gcc.gnu.org/onlinedocs/gccgo-html.tar.gz">an
>             HTML tarball</a>)</li>
> -    <li><a href="https://gcc.gnu.org/onlinedocs/libgomp/">GNU OpenMP
> -           Manual</a> (<a
> +    <li><a href="https://gcc.gnu.org/onlinedocs/libgomp/">GNU Offloading and
> +           Multi Processing Runtime Library Manual</a> (<a
>             href="https://gcc.gnu.org/onlinedocs/libgomp.pdf">also in
>             PDF</a> or <a
>             href="https://gcc.gnu.org/onlinedocs/libgomp.ps.gz">PostScript</a> or <a
> 
> It remains to be seen if for the GCC 5.0 release, the text is copied from
> the 4.9.2 section at the top of the file, or from the current development
> section at the bottom (which I just changed).  ;-)

Well... ;-) -- that didn't work out.  I now committed the following
changes, "libgomp: long known as the GNU Offloading and Multi Processing
Runtime Library":

Index: htdocs/onlinedocs/index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/onlinedocs/index.html,v
retrieving revision 1.177
retrieving revision 1.178
diff -u -p -r1.177 -r1.178
--- htdocs/onlinedocs/index.html	3 May 2019 09:05:54 -0000	1.177
+++ htdocs/onlinedocs/index.html	28 May 2019 21:01:16 -0000	1.178
@@ -78,7 +78,7 @@
            href="https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gccgo-html.tar.gz">an
            HTML tarball</a>)</li>
     <li><a href="https://gcc.gnu.org/onlinedocs/gcc-9.1.0/libgomp/">GCC 9.1
-         GNU OpenMP Manual</a> (<a
+         GNU Offloading and Multi Processing Runtime Library Manual</a> (<a
          href="https://gcc.gnu.org/onlinedocs/gcc-9.1.0/libgomp.pdf">also in
          PDF</a> or <a
          href="https://gcc.gnu.org/onlinedocs/gcc-9.1.0/libgomp.ps.gz">PostScript</a> or <a
@@ -157,7 +157,7 @@
            href="https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gccgo-html.tar.gz">an
            HTML tarball</a>)</li>
     <li><a href="https://gcc.gnu.org/onlinedocs/gcc-8.3.0/libgomp/">GCC 8.3
-         GNU OpenMP Manual</a> (<a
+         GNU Offloading and Multi Processing Runtime Library Manual</a> (<a
          href="https://gcc.gnu.org/onlinedocs/gcc-8.3.0/libgomp.pdf">also in
          PDF</a> or <a
          href="https://gcc.gnu.org/onlinedocs/gcc-8.3.0/libgomp.ps.gz">PostScript</a> or <a
@@ -236,7 +236,7 @@
            href="https://gcc.gnu.org/onlinedocs/gcc-7.4.0/gccgo-html.tar.gz">an
            HTML tarball</a>)</li>
     <li><a href="https://gcc.gnu.org/onlinedocs/gcc-7.4.0/libgomp/">GCC 7.4
-         GNU OpenMP Manual</a> (<a
+         GNU Offloading and Multi Processing Runtime Library Manual</a> (<a
          href="https://gcc.gnu.org/onlinedocs/gcc-7.4.0/libgomp.pdf">also in
          PDF</a> or <a
          href="https://gcc.gnu.org/onlinedocs/gcc-7.4.0/libgomp.ps.gz">PostScript</a> or <a
@@ -322,7 +322,7 @@
            href="https://gcc.gnu.org/onlinedocs/gcc-6.5.0/gccgo-html.tar.gz">an
            HTML tarball</a>)</li>
     <li><a href="https://gcc.gnu.org/onlinedocs/gcc-6.5.0/libgomp/">GCC 6.5
-         GNU OpenMP Manual</a> (<a
+         GNU Offloading and Multi Processing Runtime Library Manual</a> (<a
          href="https://gcc.gnu.org/onlinedocs/gcc-6.5.0/libgomp.pdf">also in
          PDF</a> or <a
          href="https://gcc.gnu.org/onlinedocs/gcc-6.5.0/libgomp.ps.gz">PostScript</a> or <a
@@ -408,7 +408,7 @@
            href="https://gcc.gnu.org/onlinedocs/gcc-5.5.0/gccgo-html.tar.gz">an
            HTML tarball</a>)</li>
     <li><a href="https://gcc.gnu.org/onlinedocs/gcc-5.5.0/libgomp/">GCC 5.5
-         GNU OpenMP Manual</a> (<a
+         GNU Offloading and Multi Processing Runtime Library Manual</a> (<a
          href="https://gcc.gnu.org/onlinedocs/gcc-5.5.0/libgomp.pdf">also in
          PDF</a> or <a
          href="https://gcc.gnu.org/onlinedocs/gcc-5.5.0/libgomp.ps.gz">PostScript</a> or <a
Index: htdocs/onlinedocs/5.1.0/index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/onlinedocs/5.1.0/index.html,v
retrieving revision 1.5
retrieving revision 1.6
diff -u -p -r1.5 -r1.6
--- htdocs/onlinedocs/5.1.0/index.html	30 Sep 2018 14:38:56 -0000	1.5
+++ htdocs/onlinedocs/5.1.0/index.html	28 May 2019 21:01:16 -0000	1.6
@@ -75,7 +75,7 @@
            href="https://gcc.gnu.org/onlinedocs/gcc-5.1.0/gccgo-html.tar.gz">an
            HTML tarball</a>)</li>
     <li><a href="https://gcc.gnu.org/onlinedocs/gcc-5.1.0/libgomp/">GCC 5.1
-         GNU OpenMP Manual</a> (<a
+         GNU Offloading and Multi Processing Runtime Library Manual</a> (<a
          href="https://gcc.gnu.org/onlinedocs/gcc-5.1.0/libgomp.pdf">also in
          PDF</a> or <a
          href="https://gcc.gnu.org/onlinedocs/gcc-5.1.0/libgomp.ps.gz">PostScript</a> or <a
Index: htdocs/onlinedocs/5.2.0/index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/onlinedocs/5.2.0/index.html,v
retrieving revision 1.4
retrieving revision 1.5
diff -u -p -r1.4 -r1.5
--- htdocs/onlinedocs/5.2.0/index.html	30 Sep 2018 14:38:56 -0000	1.4
+++ htdocs/onlinedocs/5.2.0/index.html	28 May 2019 21:01:16 -0000	1.5
@@ -75,7 +75,7 @@
            href="https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gccgo-html.tar.gz">an
            HTML tarball</a>)</li>
     <li><a href="https://gcc.gnu.org/onlinedocs/gcc-5.2.0/libgomp/">GCC 5.2
-         GNU OpenMP Manual</a> (<a
+         GNU Offloading and Multi Processing Runtime Library Manual</a> (<a
          href="https://gcc.gnu.org/onlinedocs/gcc-5.2.0/libgomp.pdf">also in
          PDF</a> or <a
          href="https://gcc.gnu.org/onlinedocs/gcc-5.2.0/libgomp.ps.gz">PostScript</a> or <a
Index: htdocs/onlinedocs/5.3.0/index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/onlinedocs/5.3.0/index.html,v
retrieving revision 1.4
retrieving revision 1.5
diff -u -p -r1.4 -r1.5
--- htdocs/onlinedocs/5.3.0/index.html	30 Sep 2018 14:38:56 -0000	1.4
+++ htdocs/onlinedocs/5.3.0/index.html	28 May 2019 21:01:16 -0000	1.5
@@ -75,7 +75,7 @@
            href="https://gcc.gnu.org/onlinedocs/gcc-5.3.0/gccgo-html.tar.gz">an
            HTML tarball</a>)</li>
     <li><a href="https://gcc.gnu.org/onlinedocs/gcc-5.3.0/libgomp/">GCC 5.3
-         GNU OpenMP Manual</a> (<a
+         GNU Offloading and Multi Processing Runtime Library Manual</a> (<a
          href="https://gcc.gnu.org/onlinedocs/gcc-5.3.0/libgomp.pdf">also in
          PDF</a> or <a
          href="https://gcc.gnu.org/onlinedocs/gcc-5.3.0/libgomp.ps.gz">PostScript</a> or <a
Index: htdocs/onlinedocs/5.4.0/index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/onlinedocs/5.4.0/index.html,v
retrieving revision 1.4
retrieving revision 1.5
diff -u -p -r1.4 -r1.5
--- htdocs/onlinedocs/5.4.0/index.html	30 Sep 2018 14:38:56 -0000	1.4
+++ htdocs/onlinedocs/5.4.0/index.html	28 May 2019 21:01:16 -0000	1.5
@@ -75,7 +75,7 @@
            href="https://gcc.gnu.org/onlinedocs/gcc-5.4.0/gccgo-html.tar.gz">an
            HTML tarball</a>)</li>
     <li><a href="https://gcc.gnu.org/onlinedocs/gcc-5.4.0/libgomp/">GCC 5.4
-         GNU OpenMP Manual</a> (<a
+         GNU Offloading and Multi Processing Runtime Library Manual</a> (<a
          href="https://gcc.gnu.org/onlinedocs/gcc-5.4.0/libgomp.pdf">also in
          PDF</a> or <a
          href="https://gcc.gnu.org/onlinedocs/gcc-5.4.0/libgomp.ps.gz">PostScript</a> or <a
Index: htdocs/onlinedocs/5.5.0/index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/onlinedocs/5.5.0/index.html,v
retrieving revision 1.4
retrieving revision 1.5
diff -u -p -r1.4 -r1.5
--- htdocs/onlinedocs/5.5.0/index.html	30 Sep 2018 14:38:56 -0000	1.4
+++ htdocs/onlinedocs/5.5.0/index.html	28 May 2019 21:01:16 -0000	1.5
@@ -75,7 +75,7 @@
            href="https://gcc.gnu.org/onlinedocs/gcc-5.5.0/gccgo-html.tar.gz">an
            HTML tarball</a>)</li>
     <li><a href="https://gcc.gnu.org/onlinedocs/gcc-5.5.0/libgomp/">GCC 5.5
-         GNU OpenMP Manual</a> (<a
+         GNU Offloading and Multi Processing Runtime Library Manual</a> (<a
          href="https://gcc.gnu.org/onlinedocs/gcc-5.5.0/libgomp.pdf">also in
          PDF</a> or <a
          href="https://gcc.gnu.org/onlinedocs/gcc-5.5.0/libgomp.ps.gz">PostScript</a> or <a
Index: htdocs/onlinedocs/6.1.0/index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/onlinedocs/6.1.0/index.html,v
retrieving revision 1.4
retrieving revision 1.5
diff -u -p -r1.4 -r1.5
--- htdocs/onlinedocs/6.1.0/index.html	30 Sep 2018 14:38:56 -0000	1.4
+++ htdocs/onlinedocs/6.1.0/index.html	28 May 2019 21:01:17 -0000	1.5
@@ -75,7 +75,7 @@
            href="https://gcc.gnu.org/onlinedocs/gcc-6.1.0/gccgo-html.tar.gz">an
            HTML tarball</a>)</li>
     <li><a href="https://gcc.gnu.org/onlinedocs/gcc-6.1.0/libgomp/">GCC 6.1
-         GNU OpenMP Manual</a> (<a
+         GNU Offloading and Multi Processing Runtime Library Manual</a> (<a
          href="https://gcc.gnu.org/onlinedocs/gcc-6.1.0/libgomp.pdf">also in
          PDF</a> or <a
          href="https://gcc.gnu.org/onlinedocs/gcc-6.1.0/libgomp.ps.gz">PostScript</a> or <a
Index: htdocs/onlinedocs/6.2.0/index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/onlinedocs/6.2.0/index.html,v
retrieving revision 1.4
retrieving revision 1.5
diff -u -p -r1.4 -r1.5
--- htdocs/onlinedocs/6.2.0/index.html	30 Sep 2018 14:38:56 -0000	1.4
+++ htdocs/onlinedocs/6.2.0/index.html	28 May 2019 21:01:17 -0000	1.5
@@ -75,7 +75,7 @@
            href="https://gcc.gnu.org/onlinedocs/gcc-6.2.0/gccgo-html.tar.gz">an
            HTML tarball</a>)</li>
     <li><a href="https://gcc.gnu.org/onlinedocs/gcc-6.2.0/libgomp/">GCC 6.2
-         GNU OpenMP Manual</a> (<a
+         GNU Offloading and Multi Processing Runtime Library Manual</a> (<a
          href="https://gcc.gnu.org/onlinedocs/gcc-6.2.0/libgomp.pdf">also in
          PDF</a> or <a
          href="https://gcc.gnu.org/onlinedocs/gcc-6.2.0/libgomp.ps.gz">PostScript</a> or <a
Index: htdocs/onlinedocs/6.3.0/index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/onlinedocs/6.3.0/index.html,v
retrieving revision 1.4
retrieving revision 1.5
diff -u -p -r1.4 -r1.5
--- htdocs/onlinedocs/6.3.0/index.html	30 Sep 2018 14:38:56 -0000	1.4
+++ htdocs/onlinedocs/6.3.0/index.html	28 May 2019 21:01:17 -0000	1.5
@@ -75,7 +75,7 @@
            href="https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gccgo-html.tar.gz">an
            HTML tarball</a>)</li>
     <li><a href="https://gcc.gnu.org/onlinedocs/gcc-6.3.0/libgomp/">GCC 6.3
-         GNU OpenMP Manual</a> (<a
+         GNU Offloading and Multi Processing Runtime Library Manual</a> (<a
          href="https://gcc.gnu.org/onlinedocs/gcc-6.3.0/libgomp.pdf">also in
          PDF</a> or <a
          href="https://gcc.gnu.org/onlinedocs/gcc-6.3.0/libgomp.ps.gz">PostScript</a> or <a
Index: htdocs/onlinedocs/6.4.0/index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/onlinedocs/6.4.0/index.html,v
retrieving revision 1.4
retrieving revision 1.5
diff -u -p -r1.4 -r1.5
--- htdocs/onlinedocs/6.4.0/index.html	30 Sep 2018 14:38:56 -0000	1.4
+++ htdocs/onlinedocs/6.4.0/index.html	28 May 2019 21:01:17 -0000	1.5
@@ -75,7 +75,7 @@
            href="https://gcc.gnu.org/onlinedocs/gcc-6.4.0/gccgo-html.tar.gz">an
            HTML tarball</a>)</li>
     <li><a href="https://gcc.gnu.org/onlinedocs/gcc-6.4.0/libgomp/">GCC 6.4
-         GNU OpenMP Manual</a> (<a
+         GNU Offloading and Multi Processing Runtime Library Manual</a> (<a
          href="https://gcc.gnu.org/onlinedocs/gcc-6.4.0/libgomp.pdf">also in
          PDF</a> or <a
          href="https://gcc.gnu.org/onlinedocs/gcc-6.4.0/libgomp.ps.gz">PostScript</a> or <a
Index: htdocs/onlinedocs/6.5.0/index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/onlinedocs/6.5.0/index.html,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -p -r1.1 -r1.2
--- htdocs/onlinedocs/6.5.0/index.html	26 Oct 2018 12:03:54 -0000	1.1
+++ htdocs/onlinedocs/6.5.0/index.html	28 May 2019 21:01:17 -0000	1.2
@@ -75,7 +75,7 @@
            href="https://gcc.gnu.org/onlinedocs/gcc-6.5.0/gccgo-html.tar.gz">an
            HTML tarball</a>)</li>
     <li><a href="https://gcc.gnu.org/onlinedocs/gcc-6.5.0/libgomp/">GCC 6.5
-         GNU OpenMP Manual</a> (<a
+         GNU Offloading and Multi Processing Runtime Library Manual</a> (<a
          href="https://gcc.gnu.org/onlinedocs/gcc-6.5.0/libgomp.pdf">also in
          PDF</a> or <a
          href="https://gcc.gnu.org/onlinedocs/gcc-6.5.0/libgomp.ps.gz">PostScript</a> or <a
Index: htdocs/onlinedocs/7.1.0/index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/onlinedocs/7.1.0/index.html,v
retrieving revision 1.5
retrieving revision 1.6
diff -u -p -r1.5 -r1.6
--- htdocs/onlinedocs/7.1.0/index.html	30 Sep 2018 14:38:56 -0000	1.5
+++ htdocs/onlinedocs/7.1.0/index.html	28 May 2019 21:01:17 -0000	1.6
@@ -68,7 +68,7 @@
            href="https://gcc.gnu.org/onlinedocs/gcc-7.1.0/gccgo-html.tar.gz">an
            HTML tarball</a>)</li>
     <li><a href="https://gcc.gnu.org/onlinedocs/gcc-7.1.0/libgomp/">GCC 7.1
-         GNU OpenMP Manual</a> (<a
+         GNU Offloading and Multi Processing Runtime Library Manual</a> (<a
          href="https://gcc.gnu.org/onlinedocs/gcc-7.1.0/libgomp.pdf">also in
          PDF</a> or <a
          href="https://gcc.gnu.org/onlinedocs/gcc-7.1.0/libgomp.ps.gz">PostScript</a> or <a
Index: htdocs/onlinedocs/7.2.0/index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/onlinedocs/7.2.0/index.html,v
retrieving revision 1.4
retrieving revision 1.5
diff -u -p -r1.4 -r1.5
--- htdocs/onlinedocs/7.2.0/index.html	30 Sep 2018 14:38:56 -0000	1.4
+++ htdocs/onlinedocs/7.2.0/index.html	28 May 2019 21:01:17 -0000	1.5
@@ -68,7 +68,7 @@
            href="https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccgo-html.tar.gz">an
            HTML tarball</a>)</li>
     <li><a href="https://gcc.gnu.org/onlinedocs/gcc-7.2.0/libgomp/">GCC 7.2
-         GNU OpenMP Manual</a> (<a
+         GNU Offloading and Multi Processing Runtime Library Manual</a> (<a
          href="https://gcc.gnu.org/onlinedocs/gcc-7.2.0/libgomp.pdf">also in
          PDF</a> or <a
          href="https://gcc.gnu.org/onlinedocs/gcc-7.2.0/libgomp.ps.gz">PostScript</a> or <a
Index: htdocs/onlinedocs/7.3.0/index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/onlinedocs/7.3.0/index.html,v
retrieving revision 1.4
retrieving revision 1.5
diff -u -p -r1.4 -r1.5
--- htdocs/onlinedocs/7.3.0/index.html	30 Sep 2018 14:38:56 -0000	1.4
+++ htdocs/onlinedocs/7.3.0/index.html	28 May 2019 21:01:17 -0000	1.5
@@ -68,7 +68,7 @@
            href="https://gcc.gnu.org/onlinedocs/gcc-7.3.0/gccgo-html.tar.gz">an
            HTML tarball</a>)</li>
     <li><a href="https://gcc.gnu.org/onlinedocs/gcc-7.3.0/libgomp/">GCC 7.3
-         GNU OpenMP Manual</a> (<a
+         GNU Offloading and Multi Processing Runtime Library Manual</a> (<a
          href="https://gcc.gnu.org/onlinedocs/gcc-7.3.0/libgomp.pdf">also in
          PDF</a> or <a
          href="https://gcc.gnu.org/onlinedocs/gcc-7.3.0/libgomp.ps.gz">PostScript</a> or <a
Index: htdocs/onlinedocs/7.4.0/index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/onlinedocs/7.4.0/index.html,v
retrieving revision 1.3
retrieving revision 1.4
diff -u -p -r1.3 -r1.4
--- htdocs/onlinedocs/7.4.0/index.html	5 Feb 2019 09:08:34 -0000	1.3
+++ htdocs/onlinedocs/7.4.0/index.html	28 May 2019 21:01:17 -0000	1.4
@@ -68,7 +68,7 @@
            href="https://gcc.gnu.org/onlinedocs/gcc-7.4.0/gccgo-html.tar.gz">an
            HTML tarball</a>)</li>
     <li><a href="https://gcc.gnu.org/onlinedocs/gcc-7.4.0/libgomp/">GCC 7.4
-         GNU OpenMP Manual</a> (<a
+         GNU Offloading and Multi Processing Runtime Library Manual</a> (<a
          href="https://gcc.gnu.org/onlinedocs/gcc-7.4.0/libgomp.pdf">also in
          PDF</a> or <a
          href="https://gcc.gnu.org/onlinedocs/gcc-7.4.0/libgomp.ps.gz">PostScript</a> or <a
Index: htdocs/onlinedocs/8.1.0/index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/onlinedocs/8.1.0/index.html,v
retrieving revision 1.4
retrieving revision 1.5
diff -u -p -r1.4 -r1.5
--- htdocs/onlinedocs/8.1.0/index.html	30 Sep 2018 14:38:56 -0000	1.4
+++ htdocs/onlinedocs/8.1.0/index.html	28 May 2019 21:01:17 -0000	1.5
@@ -68,7 +68,7 @@
            href="https://gcc.gnu.org/onlinedocs/gcc-8.1.0/gccgo-html.tar.gz">an
            HTML tarball</a>)</li>
     <li><a href="https://gcc.gnu.org/onlinedocs/gcc-8.1.0/libgomp/">GCC 8.1
-         GNU OpenMP Manual</a> (<a
+         GNU Offloading and Multi Processing Runtime Library Manual</a> (<a
          href="https://gcc.gnu.org/onlinedocs/gcc-8.1.0/libgomp.pdf">also in
          PDF</a> or <a
          href="https://gcc.gnu.org/onlinedocs/gcc-8.1.0/libgomp.ps.gz">PostScript</a> or <a
Index: htdocs/onlinedocs/8.2.0/index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/onlinedocs/8.2.0/index.html,v
retrieving revision 1.3
retrieving revision 1.4
diff -u -p -r1.3 -r1.4
--- htdocs/onlinedocs/8.2.0/index.html	30 Sep 2018 14:38:56 -0000	1.3
+++ htdocs/onlinedocs/8.2.0/index.html	28 May 2019 21:01:17 -0000	1.4
@@ -68,7 +68,7 @@
            href="https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gccgo-html.tar.gz">an
            HTML tarball</a>)</li>
     <li><a href="https://gcc.gnu.org/onlinedocs/gcc-8.2.0/libgomp/">GCC 8.2
-         GNU OpenMP Manual</a> (<a
+         GNU Offloading and Multi Processing Runtime Library Manual</a> (<a
          href="https://gcc.gnu.org/onlinedocs/gcc-8.2.0/libgomp.pdf">also in
          PDF</a> or <a
          href="https://gcc.gnu.org/onlinedocs/gcc-8.2.0/libgomp.ps.gz">PostScript</a> or <a
Index: htdocs/onlinedocs/8.3.0/index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/onlinedocs/8.3.0/index.html,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -p -r1.1 -r1.2
--- htdocs/onlinedocs/8.3.0/index.html	22 Feb 2019 15:09:42 -0000	1.1
+++ htdocs/onlinedocs/8.3.0/index.html	28 May 2019 21:01:17 -0000	1.2
@@ -68,7 +68,7 @@
            href="https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gccgo-html.tar.gz">an
            HTML tarball</a>)</li>
     <li><a href="https://gcc.gnu.org/onlinedocs/gcc-8.3.0/libgomp/">GCC 8.3
-         GNU OpenMP Manual</a> (<a
+         GNU Offloading and Multi Processing Runtime Library Manual</a> (<a
          href="https://gcc.gnu.org/onlinedocs/gcc-8.3.0/libgomp.pdf">also in
          PDF</a> or <a
          href="https://gcc.gnu.org/onlinedocs/gcc-8.3.0/libgomp.ps.gz">PostScript</a> or <a
Index: htdocs/onlinedocs/9.1.0/index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/onlinedocs/9.1.0/index.html,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -p -r1.1 -r1.2
--- htdocs/onlinedocs/9.1.0/index.html	3 May 2019 09:05:54 -0000	1.1
+++ htdocs/onlinedocs/9.1.0/index.html	28 May 2019 21:01:17 -0000	1.2
@@ -68,7 +68,7 @@
            href="https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gccgo-html.tar.gz">an
            HTML tarball</a>)</li>
     <li><a href="https://gcc.gnu.org/onlinedocs/gcc-9.1.0/libgomp/">GCC 9.1
-         GNU OpenMP Manual</a> (<a
+         GNU Offloading and Multi Processing Runtime Library Manual</a> (<a
          href="https://gcc.gnu.org/onlinedocs/gcc-9.1.0/libgomp.pdf">also in
          PDF</a> or <a
          href="https://gcc.gnu.org/onlinedocs/gcc-9.1.0/libgomp.ps.gz">PostScript</a> or <a


Regards
 Thomas



* libgomp plugins: Don't 'AC_SUBST' and 'AC_DEFINE_UNQUOTED' for 'PLUGIN_GCN', 'PLUGIN_NVPTX' (was: [PATCH 7/10] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin)
  2014-09-23 18:20 [PATCH 7/10] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin Julian Brown
                   ` (4 preceding siblings ...)
  2017-02-02 14:38 ` libgomp, nvptx plugin: Make "nvptx_exec" static (was: [PATCH 7/10] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin) Thomas Schwinge
@ 2022-05-12 11:32 ` Thomas Schwinge
  5 siblings, 0 replies; 36+ messages in thread
From: Thomas Schwinge @ 2022-05-12 11:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, Andrew Stubbs, Julian Brown, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 1760 bytes --]

Hi!

On 2014-09-23T19:19:31+0100, Julian Brown <julian@codesourcery.com> wrote:
> This patch contains the bulk of the OpenACC 2.0 runtime support,
> building around, or on top of, the OpenMP 4.0 support (as previously
> posted or already extant upstream) where we could. [...]

> --- a/libgomp/Makefile.am
> +++ b/libgomp/Makefile.am

> +if PLUGIN_NVPTX

For this 'if' here (later moved into 'libgomp/plugin/Makefrag.am'), we do
need the 'AM_CONDITIONAL'...

> --- a/libgomp/Makefile.in
> +++ b/libgomp/Makefile.in

> +PLUGIN_NVPTX = @PLUGIN_NVPTX@

..., but this here (and similar elsewhere) due to 'AC_SUBST'...

> --- a/libgomp/config.h.in
> +++ b/libgomp/config.h.in

> +/* Define to 1 if the NVIDIA plugin is built, 0 if not. */
> +#undef PLUGIN_NVPTX

..., and this here due to 'AC_DEFINE_UNQUOTED' have always been unused,
so we may clean those up:

> --- a/libgomp/configure.ac
> +++ b/libgomp/configure.ac

> +AC_SUBST(PLUGIN_NVPTX)

> +AM_CONDITIONAL([PLUGIN_NVPTX], [test $PLUGIN_NVPTX = 1])
> +AC_DEFINE_UNQUOTED([PLUGIN_NVPTX], [$PLUGIN_NVPTX],
> +               [Define to 1 if the NVIDIA plugin is built, 0 if not.])

Later also cargo-culted for other libgomp plugins, where the same
"unused" reasoning applies likewise.
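To illustrate the distinction drawn above, here is a minimal C sketch. The macro values and both helper functions are hypothetical, not taken from libgomp's sources; only the macro names and the "dlopen libcuda.so.1" wording come from the 'config.h.in' hunks quoted above. A macro that C code tests with '#if' (like 'PLUGIN_NVPTX_DYNAMIC', which the cleanup keeps) has to stay, while one that nothing tests can be dropped. Note that the C preprocessor evaluates an undefined identifier in '#if' as 0, so a forgotten consumer of a removed macro would not break the build; its guarded code would just silently compile out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical excerpt of a generated config.h (not libgomp's real one).
   PLUGIN_NVPTX_DYNAMIC is kept; PLUGIN_NVPTX is deliberately left
   undefined, mirroring the cleanup.  */
#define PLUGIN_NVPTX_DYNAMIC 1

/* Hypothetical consumer of the kept macro: reports how the CUDA driver
   library would be reached.  */
static const char *cuda_driver_mode (void)
{
#if PLUGIN_NVPTX_DYNAMIC
  return "dlopen libcuda.so.1";
#else
  return "link against libcuda";
#endif
}

/* Hypothetical consumer of the removed macro.  This still compiles:
   the undefined PLUGIN_NVPTX evaluates to 0 inside '#if', so the
   function simply returns 0.  */
static int nvptx_guarded_code_enabled (void)
{
#if PLUGIN_NVPTX
  return 1;
#else
  return 0;
#endif
}
```

Because of that silent fallback, a clean rebuild on its own does not prove a 'config.h' macro is unused; GCC's '-Wundef' warning flags such '#if' tests.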

Pushed to master branch commit edbd2b1caaa79d68467418a4571c3b09f9602805
"libgomp plugins: Don't 'AC_SUBST' and 'AC_DEFINE_UNQUOTED' for
'PLUGIN_GCN', 'PLUGIN_NVPTX'", see attached.


Regards
 Thomas



[-- Attachment #2: 0001-libgomp-plugins-Don-t-AC_SUBST-and-AC_DEFINE_UNQUOTE.patch --]
[-- Type: text/x-diff, Size: 5187 bytes --]

From edbd2b1caaa79d68467418a4571c3b09f9602805 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Thu, 5 May 2022 23:10:23 +0200
Subject: [PATCH] libgomp plugins: Don't 'AC_SUBST' and 'AC_DEFINE_UNQUOTED'
 for 'PLUGIN_GCN', 'PLUGIN_NVPTX'

Nothing ever used these.

	libgomp/
	* plugin/configfrag.ac: Don't 'AC_SUBST' and 'AC_DEFINE_UNQUOTED'
	for 'PLUGIN_GCN', 'PLUGIN_NVPTX'.
	* Makefile.in: Regenerate.
	* config.h.in: Likewise.
	* configure: Likewise.
	* testsuite/Makefile.in: Likewise.
---
 libgomp/Makefile.in           |  2 --
 libgomp/config.h.in           |  6 ------
 libgomp/configure             | 18 ++----------------
 libgomp/plugin/configfrag.ac  |  6 ------
 libgomp/testsuite/Makefile.in |  2 --
 5 files changed, 2 insertions(+), 32 deletions(-)

diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index f2712aa5133..1d55f4b65e2 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -431,9 +431,7 @@ PACKAGE_URL = @PACKAGE_URL@
 PACKAGE_VERSION = @PACKAGE_VERSION@
 PATH_SEPARATOR = @PATH_SEPARATOR@
 PERL = @PERL@
-PLUGIN_GCN = @PLUGIN_GCN@
 PLUGIN_GCN_LIBS = @PLUGIN_GCN_LIBS@
-PLUGIN_NVPTX = @PLUGIN_NVPTX@
 PLUGIN_NVPTX_CPPFLAGS = @PLUGIN_NVPTX_CPPFLAGS@
 PLUGIN_NVPTX_LDFLAGS = @PLUGIN_NVPTX_LDFLAGS@
 PLUGIN_NVPTX_LIBS = @PLUGIN_NVPTX_LIBS@
diff --git a/libgomp/config.h.in b/libgomp/config.h.in
index e702625ab6e..5611ed925ad 100644
--- a/libgomp/config.h.in
+++ b/libgomp/config.h.in
@@ -170,12 +170,6 @@
 /* Define to the version of this package. */
 #undef PACKAGE_VERSION
 
-/* Define to 1 if the GCN plugin is built, 0 if not. */
-#undef PLUGIN_GCN
-
-/* Define to 1 if the NVIDIA plugin is built, 0 if not. */
-#undef PLUGIN_NVPTX
-
 /* Define to 1 if the NVIDIA plugin should dlopen libcuda.so.1, 0 if it should
    be linked against it. */
 #undef PLUGIN_NVPTX_DYNAMIC
diff --git a/libgomp/configure b/libgomp/configure
index 3de8eb2641f..be675a6b8ab 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -674,11 +674,9 @@ offload_additional_options
 offload_targets
 offload_plugins
 PLUGIN_GCN_LIBS
-PLUGIN_GCN
 PLUGIN_NVPTX_LIBS
 PLUGIN_NVPTX_LDFLAGS
 PLUGIN_NVPTX_CPPFLAGS
-PLUGIN_NVPTX
 CUDA_DRIVER_LIB
 CUDA_DRIVER_INCLUDE
 libtool_VERSION
@@ -11414,7 +11412,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11417 "configure"
+#line 11415 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11520,7 +11518,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11523 "configure"
+#line 11521 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -15220,12 +15218,10 @@ PLUGIN_NVPTX_DYNAMIC=0
 
 
 
-
 PLUGIN_GCN=0
 PLUGIN_GCN_LIBS=
 
 
-
 # Parse '--enable-offload-targets', figure out the corresponding libgomp
 # plugins, and configure to find the corresponding offload compilers.
 # 'offload_plugins' and 'offload_targets' will be populated in the same order.
@@ -15373,11 +15369,6 @@ else
 fi
 
 
-cat >>confdefs.h <<_ACEOF
-#define PLUGIN_NVPTX $PLUGIN_NVPTX
-_ACEOF
-
-
 cat >>confdefs.h <<_ACEOF
 #define PLUGIN_NVPTX_DYNAMIC $PLUGIN_NVPTX_DYNAMIC
 _ACEOF
@@ -15391,11 +15382,6 @@ else
 fi
 
 
-cat >>confdefs.h <<_ACEOF
-#define PLUGIN_GCN $PLUGIN_GCN
-_ACEOF
-
-
 
 # Check for functions needed.
 for ac_func in getloadavg clock_gettime strtoull
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index 9eeac4562e4..1a61db94381 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -86,14 +86,12 @@ PLUGIN_NVPTX_CPPFLAGS=
 PLUGIN_NVPTX_LDFLAGS=
 PLUGIN_NVPTX_LIBS=
 PLUGIN_NVPTX_DYNAMIC=0
-AC_SUBST(PLUGIN_NVPTX)
 AC_SUBST(PLUGIN_NVPTX_CPPFLAGS)
 AC_SUBST(PLUGIN_NVPTX_LDFLAGS)
 AC_SUBST(PLUGIN_NVPTX_LIBS)
 
 PLUGIN_GCN=0
 PLUGIN_GCN_LIBS=
-AC_SUBST(PLUGIN_GCN)
 AC_SUBST(PLUGIN_GCN_LIBS)
 
 # Parse '--enable-offload-targets', figure out the corresponding libgomp
@@ -221,10 +219,6 @@ fi
 AC_DEFINE_UNQUOTED(OFFLOAD_PLUGINS, "$offload_plugins",
   [Define to offload plugins, separated by commas.])
 AM_CONDITIONAL([PLUGIN_NVPTX], [test $PLUGIN_NVPTX = 1])
-AC_DEFINE_UNQUOTED([PLUGIN_NVPTX], [$PLUGIN_NVPTX],
-  [Define to 1 if the NVIDIA plugin is built, 0 if not.])
 AC_DEFINE_UNQUOTED([PLUGIN_NVPTX_DYNAMIC], [$PLUGIN_NVPTX_DYNAMIC],
   [Define to 1 if the NVIDIA plugin should dlopen libcuda.so.1, 0 if it should be linked against it.])
 AM_CONDITIONAL([PLUGIN_GCN], [test $PLUGIN_GCN = 1])
-AC_DEFINE_UNQUOTED([PLUGIN_GCN], [$PLUGIN_GCN],
-  [Define to 1 if the GCN plugin is built, 0 if not.])
diff --git a/libgomp/testsuite/Makefile.in b/libgomp/testsuite/Makefile.in
index 32be337b8fc..2d1bf8f20d7 100644
--- a/libgomp/testsuite/Makefile.in
+++ b/libgomp/testsuite/Makefile.in
@@ -209,9 +209,7 @@ PACKAGE_URL = @PACKAGE_URL@
 PACKAGE_VERSION = @PACKAGE_VERSION@
 PATH_SEPARATOR = @PATH_SEPARATOR@
 PERL = @PERL@
-PLUGIN_GCN = @PLUGIN_GCN@
 PLUGIN_GCN_LIBS = @PLUGIN_GCN_LIBS@
-PLUGIN_NVPTX = @PLUGIN_NVPTX@
 PLUGIN_NVPTX_CPPFLAGS = @PLUGIN_NVPTX_CPPFLAGS@
 PLUGIN_NVPTX_LDFLAGS = @PLUGIN_NVPTX_LDFLAGS@
 PLUGIN_NVPTX_LIBS = @PLUGIN_NVPTX_LIBS@
-- 
2.35.1



Thread overview: 36+ messages
2014-09-23 18:20 [PATCH 7/10] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin Julian Brown
2014-11-11 13:54 ` [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost) Julian Brown
2014-11-12 10:10   ` Jakub Jelinek
2014-11-12 10:59     ` Thomas Schwinge
2014-11-12 21:11       ` Mike Stump
2014-11-12 11:06     ` Julian Brown
2014-11-12 11:15       ` Jakub Jelinek
2014-11-12 11:33     ` libgomp: "GNU OpenMP Runtime Library" (was: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost)) Thomas Schwinge
2014-11-12 11:49       ` Jakub Jelinek
2014-11-12 13:40         ` David Malcolm
2014-11-12 13:49           ` Jakub Jelinek
2014-11-12 20:30             ` David Malcolm
2014-11-12 20:41               ` Jakub Jelinek
2014-11-12 20:50                 ` David Malcolm
2015-01-11  2:18                   ` libgomp: Now known as the GNU Offloading and Multi Processing Runtime Library (was: libgomp: "GNU OpenMP Runtime Library") Thomas Schwinge
2015-01-29 10:25                     ` Thomas Schwinge
2019-05-28 21:27                       ` libgomp: long known as the GNU Offloading and Multi Processing Runtime Library (was: libgomp: Now known as the GNU Offloading and Multi Processing Runtime Library (was: libgomp: "GNU OpenMP Runtime Library")) Thomas Schwinge
     [not found]     ` <20141113232615.4ff373bf@octopus>
2014-11-14 16:07       ` Fortran/C interfacing (was: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost)) Thomas Schwinge
2014-11-14 21:01         ` Fortran/C interfacing Tobias Burnus
2014-11-14 21:24           ` Jakub Jelinek
2014-11-14 16:38     ` GOMP_DEBUG environment variable? (was: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost)) Thomas Schwinge
2014-11-15  1:04     ` [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost) Julian Brown
2014-11-19 19:58       ` Bernd Schmidt
2014-11-19 20:39         ` Cesar Philippidis
2014-12-22 17:55   ` Thomas Schwinge
2014-12-22 18:05   ` Thomas Schwinge
2014-12-22 18:12   ` Thomas Schwinge
2014-12-22 18:16   ` Thomas Schwinge
2014-12-22 18:55   ` Thomas Schwinge
2014-12-23  0:57   ` Thomas Schwinge
2014-12-22 16:41 ` [PATCH 7/10] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin Thomas Schwinge
2015-01-12 14:49 ` Thomas Schwinge
2015-01-12 15:07   ` Thomas Schwinge
2015-01-12 15:00 ` Thomas Schwinge
2017-02-02 14:38 ` libgomp, nvptx plugin: Make "nvptx_exec" static (was: [PATCH 7/10] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin) Thomas Schwinge
2022-05-12 11:32 ` libgomp plugins: Don't 'AC_SUBST' and 'AC_DEFINE_UNQUOTED' for 'PLUGIN_GCN', 'PLUGIN_NVPTX' " Thomas Schwinge
