public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [gomp] [3/3] OpenACC 2.0 support for libgomp - documentation
@ 2014-10-14 16:12   ` Julian Brown
  2014-10-16 17:06     ` [gomp4] [1/3] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin Thomas Schwinge
  2014-11-05 16:13     ` [gomp4] OpenACC documentation updates Thomas Schwinge
  0 siblings, 2 replies; 12+ messages in thread
From: Julian Brown @ 2014-10-14 16:12 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek, Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 335 bytes --]

This is a version of the patch:

https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02024.html

against gomp4 branch instead of mainline.

OK to apply?

Thanks,

Julian

xxxx-xx-xx  Thomas Schwinge  <thomas@codesourcery.com>
	    James Norris  <jnorris@codesourcery.com>

    libgomp/
    * libgomp.texi: Outline documentation for OpenACC.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0002-OpenACC-documentation.patch --]
[-- Type: text/x-patch, Size: 24125 bytes --]

From c58006a7ade2a9556bd73bac9ef45b3bbd62ca37 Mon Sep 17 00:00:00 2001
From: Julian Brown <julian@codesourcery.com>
Date: Wed, 17 Sep 2014 10:26:56 -0700
Subject: [PATCH 2/3] OpenACC documentation

---
 libgomp/libgomp.texi |  661 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 636 insertions(+), 25 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 254be57..9530a2b 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -31,10 +31,12 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
 @ifinfo
 @dircategory GNU Libraries
 @direntry
-* libgomp: (libgomp).                    GNU OpenMP runtime library
+* libgomp: (libgomp).                    GNU OpenACC and OpenMP runtime library
 @end direntry
 
-This manual documents the GNU implementation of the OpenMP API for 
+This manual documents the GNU implementation of the OpenACC API for 
+offloading of code to accelerator devices in C/C++ and Fortran and
+the GNU implementation of the OpenMP API for 
 multi-platform shared-memory parallel programming in C/C++ and Fortran.
 
 Published by the Free Software Foundation
@@ -48,7 +50,7 @@ Boston, MA 02110-1301 USA
 @setchapternewpage odd
 
 @titlepage
-@title The GNU OpenMP Implementation
+@title The GNU OpenACC and OpenMP Implementation
 @page
 @vskip 0pt plus 1filll
 @comment For the @value{version-GCC} Version*
@@ -69,7 +71,10 @@ Boston, MA 02110-1301, USA@*
 @top Introduction
 @cindex Introduction
 
-This manual documents the usage of libgomp, the GNU implementation of the 
+This manual documents the usage of libgomp, the GNU implementation of the
+@uref{http://www.openacc.org/, OpenACC} Application Programming Interface (API)
+for offloading of code to accelerator devices in C/C++ and Fortran, and
+the GNU implementation of the 
 @uref{http://www.openmp.org, OpenMP} Application Programming Interface (API)
 for multi-platform shared-memory parallel programming in C/C++ and Fortran.
 
@@ -81,23 +86,619 @@ for multi-platform shared-memory parallel programming in C/C++ and Fortran.
 @comment  better formatting.
 @comment
 @menu
-* Enabling OpenMP::            How to enable OpenMP for your applications.
-* Runtime Library Routines::   The OpenMP runtime application programming 
-                               interface.
-* Environment Variables::      Influencing runtime behavior with environment 
-                               variables.
-* The libgomp ABI::            Notes on the external ABI presented by libgomp.
-* Reporting Bugs::             How to report bugs in GNU OpenMP.
-* Copying::                    GNU general public license says
-                               how you can copy and share libgomp.
-* GNU Free Documentation License::
-                               How you can copy and share this manual.
-* Funding::                    How to help assure continued work for free 
-                               software.
-* Library Index::              Index of this documentation.
+* Enabling OpenACC::                 How to enable OpenACC for your
+                                     applications.
+* OpenACC Runtime Library Routines:: The OpenACC runtime application
+                                      programming interface.
+* OpenACC Environment Variables::    Influencing OpenACC runtime behavior with
+                                     environment variables.
+* OpenACC Library Interoperability:: OpenACC library interoperability with the
+                                     NVIDIA CUBLAS library.
+* Enabling OpenMP::                  How to enable OpenMP for your
+                                     applications.
+* OpenMP Runtime Library Routines: Runtime Library Routines.
+                                     The OpenMP runtime application programming
+                                     interface.
+* OpenMP Environment Variables: Environment Variables.
+                                     Influencing OpenMP runtime behavior with
+                                     environment variables.
+* The libgomp ABI::                  Notes on the external libgomp ABI.
+* Reporting Bugs::                   How to report bugs.
+* Copying::                          GNU general public license says how you
+                                     can copy and share libgomp.
+* GNU Free Documentation License::   How you can copy and share this manual.
+* Funding::                          How to help assure continued work for free
+                                     software.
+* Library Index::                    Index of this documentation.
 @end menu
 
 
+
+@c ---------------------------------------------------------------------
+@c Enabling OpenACC
+@c ---------------------------------------------------------------------
+
+@node Enabling OpenACC
+@chapter Enabling OpenACC
+
+To activate the OpenACC extensions for C/C++ and Fortran, the compile-time 
+flag @command{-fopenacc} must be specified.  This enables OpenACC, and
+arranges for automatic linking of the OpenACC runtime library 
+(@ref{Runtime Library Routines}).
+
+A complete description of all OpenACC directives accepted may be found in 
+the @uref{http://www.openacc.org/, OpenMP Application Programming
+Interface} manual, version 2.0.
+
+
+@c ---------------------------------------------------------------------
+@c OpenACC Runtime Library Routines
+@c ---------------------------------------------------------------------
+
+@node OpenACC Runtime Library Routines
+@chapter OpenACC Runtime Library Routines
+
+The runtime routines described here are defined by section 3 of the OpenACC
+specifications in version 2.0.
+They have C linkage, and do not throw exceptions.
+Generally, they are available only for the host, with the exception of
+@code{acc_on_device}, which is available for both the host and the
+acceleration device.
+
+@menu
+* acc_get_num_devices::         Get number of devices for the given device type
+* acc_set_device_type::
+* acc_get_device_type::
+* acc_set_device_num::
+* acc_get_device_num::
+* acc_init::
+* acc_shutdown::
+* acc_on_device::               Whether executing on a particular device
+* acc_malloc::
+* acc_free::
+* acc_copyin::
+* acc_present_or_copyin::
+* acc_create::
+* acc_present_or_create::
+* acc_copyout::
+* acc_delete::
+* acc_update_device::
+* acc_update_self::
+* acc_map_data::
+* acc_unmap_data::
+* acc_deviceptr::
+* acc_hostptr::
+* acc_is_present::
+* acc_memcpy_to_device::
+* acc_memcpy_from_device::
+@end menu
+
+API routines for target platforms.
+
+@menu
+* acc_get_current_cuda_device::
+* acc_get_current_cuda_context::
+* acc_get_cuda_stream::
+* acc_set_cuda_stream::
+@end menu
+
+
+
+@node acc_get_num_devices
+@section @code{acc_get_num_devices} -- Get number of devices for given device type
+@table @asis
+item @emph{Description}
+This routine returns a value, between 0 and @emph{n}, indicating the
+number of devices available for the given device type. It determines
+the number of devices in a @emph{passive} manner. In other words, it
+does not alter the state within the runtime environment aside from
+possibly initializing an uninitialized device. This aspect allows
+the routine to be called without concern for altering the interaction
+with an attached accelerator device.
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.1.
+@end table
+
+
+
+@node acc_set_device_type
+@section @code{acc_set_device_type}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.2.
+@end table
+
+
+
+@node acc_get_device_type
+@section @code{acc_get_device_type}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.3.
+@end table
+
+
+
+@node acc_set_device_num
+@section @code{acc_set_device_num}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.4.
+@end table
+
+
+
+@node acc_get_device_num
+@section @code{acc_get_device_num}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.5.
+@end table
+
+
+
+@node acc_init
+@section @code{acc_init}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.12.
+@end table
+
+
+
+@node acc_shutdown
+@section @code{acc_shutdown}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.13.
+@end table
+
+
+
+@node acc_on_device
+@section @code{acc_on_device} -- Whether executing on a particular device
+@table @asis
+@item @emph{Description}:
+This routine tells the program whether it is executing on a particular
+device.  Based on the argument passed, GCC tries to evaluate this to a
+constant at compile time, but library functions are also provided, for
+both the host and the acceleration device.
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.14.
+@end table
+
+
+
+@node acc_malloc
+@section @code{acc_malloc}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.15.
+@end table
+
+
+
+@node acc_free
+@section @code{acc_free}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.16.
+@end table
+
+
+
+@node acc_copyin
+@section @code{acc_copyin}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.17.
+@end table
+
+
+
+@node acc_present_or_copyin
+@section @code{acc_present_or_copyin}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.18.
+@end table
+
+
+
+@node acc_create
+@section @code{acc_create}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.19.
+@end table
+
+
+
+@node acc_present_or_create
+@section @code{acc_present_or_create}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.20.
+@end table
+
+
+
+@node acc_copyout
+@section @code{acc_copyout}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.21.
+@end table
+
+
+
+@node acc_delete
+@section @code{acc_delete}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.22.
+@end table
+
+
+
+@node acc_update_device
+@section @code{acc_update_device}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.23.
+@end table
+
+
+
+@node acc_update_self
+@section @code{acc_update_self}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.24.
+@end table
+
+
+
+@node acc_map_data
+@section @code{acc_map_data}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.25.
+@end table
+
+
+
+@node acc_unmap_data
+@section @code{acc_unmap_data}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.26.
+@end table
+
+
+
+@node acc_deviceptr
+@section @code{acc_deviceptr}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.27.
+@end table
+
+
+
+@node acc_hostptr
+@section @code{acc_hostptr}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.28.
+@end table
+
+
+
+@node acc_is_present
+@section @code{acc_is_present}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.29.
+@end table
+
+
+
+@node acc_memcpy_to_device
+@section @code{acc_memcpy_to_device}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.30.
+@end table
+
+
+
+@node acc_memcpy_from_device
+@section @code{acc_memcpy_from_device}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.31.
+@end table
+
+
+
+@node acc_get_current_cuda_device
+@section @code{acc_get_current_cuda_device}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+A.2.1.1.
+@end table
+
+
+
+@node acc_get_current_cuda_context
+@section @code{acc_get_current_cuda_context}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+A.2.1.2.
+@end table
+
+
+
+@node acc_get_cuda_stream
+@section @code{acc_get_cuda_stream}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+A.2.1.3.
+@end table
+
+
+
+@node acc_set_cuda_stream
+@section @code{acc_set_cuda_stream}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+A.2.1.4.
+@end table
+
+
+
+@c ---------------------------------------------------------------------
+@c OpenACC Environment Variables
+@c ---------------------------------------------------------------------
+
+@node OpenACC Environment Variables
+@chapter OpenACC Environment Variables
+
+The variables @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}
+are defined by section 4 of the OpenACC specification in version 2.0.
+The variable @env{GCC_ACC_NOTIFY} is used for diagnostic purposes.
+
+@menu
+* ACC_DEVICE_TYPE::
+* ACC_DEVICE_NUM::
+* GCC_ACC_NOTIFY::
+@end menu
+
+
+
+@node ACC_DEVICE_TYPE
+@section @code{ACC_DEVICE_TYPE}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+4.1.
+@end table
+
+
+
+@node ACC_DEVICE_NUM
+@section @code{ACC_DEVICE_NUM}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+4.2.
+@end table
+
+
+
+@node GCC_ACC_NOTIFY
+@section @code{GCC_ACC_NOTIFY}
+@table @asis
+@item @emph{Description}:
+Print debug information pertaining to the accelerator.
+@end table
+
+
+@c ---------------------------------------------------------------------
+@c OpenACC Library Interoperability
+@c ---------------------------------------------------------------------
+
+@node OpenACC Library Interoperability
+@chapter OpenACC Library Interoperability
+
+@section Introduction
+
+As the OpenACC library is built using the CUDA Driver API, the question has
+arisen on what impact does using the OpenACC library have on a program that
+uses the Runtime library, or a library based on the Runtime library, e.g.,
+CUBLAS@footnote{Seee section 2.26, "Interactions with the CUDA Driver API" in
+"CUDA Runtime API", Version 5.5, July 2013 and section 2.27, "VDPAU
+Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5,
+July 2013, for additional information on library interoperability.}.
+This chapter will describe the use cases and what changes are
+required in order to use both the OpenACC library and the CUBLAS and Runtime
+libraries within a program.
+
+@section First invocation: NVIDIA CUBLAS library API
+
+In this first use case (see below), a function in the CUBLAS library is called
+prior to any of the functions in the OpenACC library. More specifically, the
+function @code{cublasCreate()}.
+
+When invoked, the function will initialize the library and allocate the
+hardware resources on the host and the device on behalf of the caller. Once
+the initialization and allocation has completed, a handle is returned to the
+caller. The OpenACC library also requires initialization and allocation of
+hardware resources. Since the CUBLAS library has already allocated the
+hardware resources for the device, all that is left to do is to initialize
+the OpenACC library and acquire the hardware resources on the host.
+
+Prior to calling the OpenACC function that will initialize the library and
+allocate the host hardware resources, one needs to acquire the device number
+that was allocated during the call to @code{cublasCreate()}. The invoking of the
+runtime library function @code{cudaGetDevice()} will accomplish this. Once
+acquired, the device number is passed along with the device type as
+parameters to the OpenACC library function @code{acc_set_device_num()}.
+
+Once the call to @code{acc_set_device_num()} has completed, the OpenACC
+library will be using the  context that was created during the call to
+@code{cublasCreate()}. In other words, both libraries will be sharing the
+same context.
+
+@verbatim
+    /* Create the handle */
+    s = cublasCreate(&h);
+    if (s != CUBLAS_STATUS_SUCCESS)
+    {
+        fprintf(stderr, "cublasCreate failed %d\n", s);
+        exit(EXIT_FAILURE);
+    }
+
+    /* Get the device number */
+    e = cudaGetDevice(&dev);
+    if (e != cudaSuccess)
+    {
+        fprintf(stderr, "cudaGetDevice failed %d\n", e);
+        exit(EXIT_FAILURE);
+    }
+
+    /* Initialize OpenACC library and use device 'dev' */
+    acc_set_device_num(dev, acc_device_nvidia);
+
+@end verbatim
+@center Use Case 1 
+
+@section First invocation: OpenACC library API
+
+In this second use case (see below), a function in the OpenACC library is
+called prior to any of the functions in the CUBLAS library. More specificially,
+the function acc_set_device_num().
+
+In the use case presented here, the function @code{acc_set_device_num()}
+is used to both initialize the OpenACC library and allocate the hardware
+resources on the host and the device. In the call to the function, the
+call parameters specify which device to use, i.e., 'dev', and what device
+type to use, i.e., @code{acc_device_nvidia}. It should be noted that this
+is but one method to initialize the OpenACC library and allocate the
+appropriate hardware resources. Other methods are available through the
+use of environment variables and these will be discussed in the next section.
+
+Once the call to @code{acc_set_device_num()} has completed, other OpenACC
+functions can be called as seen with multiple calls being made to
+@code{acc_copyin()}. In addition, calls can be made to functions in the
+CUBLAS library. In the use case a call to @code{cublasCreate()} is made
+subsequent to the calls to @code{acc_copyin()}.
+As seen in the previous use case, a call to @code{cublasCreate()} will
+initialize the CUBLAS library and allocate the hardware resources on the
+host and the device.  However, since the device has already been allocated,
+@code{cublasCreate()} will only initialize the CUBLAS library and allocate
+the appropriate hardware resources on the host. The context that was created
+as part of the OpenACC initialization will be shared with the CUBLAS library,
+similarly to the first use case.
+
+@verbatim
+    dev = 0;
+
+    acc_set_device_num(dev, acc_device_nvidia);
+
+    /* Copy the first set to the device */
+    d_X = acc_copyin(&h_X[0], N * sizeof (float));
+    if (d_X == NULL)
+    { 
+        fprintf(stderr, "copyin error h_X\n");
+        exit(EXIT_FAILURE);
+    }
+
+    /* Copy the second set to the device */
+    d_Y = acc_copyin(&h_Y1[0], N * sizeof (float));
+    if (d_Y == NULL)
+    { 
+        fprintf(stderr, "copyin error h_Y1\n");
+        exit(EXIT_FAILURE);
+    }
+
+    /* Create the handle */
+    s = cublasCreate(&h);
+    if (s != CUBLAS_STATUS_SUCCESS)
+    {
+        fprintf(stderr, "cublasCreate failed %d\n", s);
+        exit(EXIT_FAILURE);
+    }
+
+    /* Perform saxpy using CUBLAS library function */
+    s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1);
+    if (s != CUBLAS_STATUS_SUCCESS)
+    {
+        fprintf(stderr, "cublasSaxpy failed %d\n", s);
+        exit(EXIT_FAILURE);
+    }
+
+    /* Copy the results from the device */
+    acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float));
+
+}
+@end verbatim
+@center Use Case 2
+
+@section OpenACC library and environment variables
+
+There are two environment variables associated with the OpenACC library that
+may be used to control the device type and device number.
+Namely, @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}. In the second
+use case, the device type and device number were specified using
+@code{acc_set_device_num()}. However, @env{ACC_DEVICE_TYPE} and 
+@env{ACC_DEVICE_NUM} could have been defined and the call to
+@code{acc_set_device_num()} would be not be required. At the time of the
+call to @code{acc_copyin()}, these two environment variables would be
+sampled and their values used.
+
+The use of the environment variables is only relevant when an OpenACC function
+is called prior to a call to @code{cudaCreate()}. If @code{cudaCreate()}
+is called prior to a call to an OpenACC function, then a call to
+@code{acc_set_device_num()}, must be done@footnote{More complete information
+about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in
+sections 4.1 and 4.2 of the “The OpenACC
+Application Programming Interface”, Version 2.0, June, 2013.}.
+
+
+
 @c ---------------------------------------------------------------------
 @c Enabling OpenMP
 @c ---------------------------------------------------------------------
@@ -120,11 +721,11 @@ version 4.0.
 
 
 @c ---------------------------------------------------------------------
-@c Runtime Library Routines
+@c OpenMP Runtime Library Routines
 @c ---------------------------------------------------------------------
 
 @node Runtime Library Routines
-@chapter Runtime Library Routines
+@chapter OpenMP Runtime Library Routines
 
 The runtime routines described here are defined by Section 3 of the OpenMP
 specification in version 4.0.  The routines are structured in following
@@ -1281,11 +1882,11 @@ guaranteed not to change during the execution of the program.
 
 
 @c ---------------------------------------------------------------------
-@c Environment Variables
+@c OpenMP Environment Variables
 @c ---------------------------------------------------------------------
 
 @node Environment Variables
-@chapter Environment Variables
+@chapter OpenMP Environment Variables
 
 The environment variables which beginning with @env{OMP_} are defined by
 section 4 of the OpenMP specification in version 4.0, while those
@@ -1701,6 +2302,7 @@ presented by libgomp.  Only maintainers should need them.
 * Implementing ORDERED construct::
 * Implementing SECTIONS construct::
 * Implementing SINGLE construct::
+* Implementing OpenACC's PARALLEL construct::
 @end menu
 
 
@@ -2065,15 +2667,24 @@ becomes
 
 
 
+@node Implementing OpenACC's PARALLEL construct
+@section Implementing OpenACC's PARALLEL construct
+
+@smallexample
+  void GOACC_parallel ()
+@end smallexample
+
+
+
 @c ---------------------------------------------------------------------
-@c 
+@c Reporting Bugs
 @c ---------------------------------------------------------------------
 
 @node Reporting Bugs
 @chapter Reporting Bugs
 
-Bugs in the GNU OpenMP implementation should be reported via 
-@uref{http://gcc.gnu.org/bugzilla/, Bugzilla}.  For all cases, please add 
+Bugs in the GNU OpenACC or OpenMP implementation should be reported via
+@uref{http://gcc.gnu.org/bugzilla/, Bugzilla}.  For OpenMP cases, please add
 "openmp" to the keywords field in the bug report.
 
 
-- 
1.7.10.4


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [gomp4] [1/3] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin
@ 2014-10-14 16:12 Julian Brown
  2014-10-14 16:33 ` [gomp4] [2/3] OpenACC 2.0 support for libgomp - new tests Julian Brown
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Julian Brown @ 2014-10-14 16:12 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek, Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 6193 bytes --]

This is a slightly-updated version of the following patch, but this
time tested (with the aid of a series of patches implementing PTX
support from Bernd Schmidt), and against the gomp4 branch:

https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02022.html

Results (at least for the parts where the middle-end support is on the
branch already) are comparable with our local development branch.

Many of Jakub's initial review comments from the mainline version of
the patch have not yet been addressed, but I have a couple of bits ready
as follow-up patches and will be posting those shortly also. I plan to
address the remainder of the issues directly on the gomp4 branch, if
possible.

OK to apply (to the gomp4 branch)?

Thanks,

Julian

ChangeLog

xxxx-xx-xx  Nathan Sidwell  <nathan@codesourcery.com>
	    James Norris  <jnorris@codesourcery.com>
	    Thomas Schwinge  <thomas@codesourcery.com>
	    Tom de Vries  <tom@codesourcery.com>
	    Julian Brown  <julian@codesourcery.com>

    include/
    * gomp-constants.h: New file.

    libgomp/
    * Makefile.am (AM_CPPFLAGS): Search in ../include also.
    (libgomp_plugin_nvptx_version_info, libgomp_plugin_nvptx_la_SOURCES)
    (libgomp_plugin_nvptx_la_CPPFLAGS, libgomp_plugin_nvptx_la_LDFLAGS)
    (libgomp_plugin_nvptx_la_LIBADD)
    (libgomp_plugin_nvptx_la_LIBTOOLFLAGS): Set variables if
    PLUGIN_NVPTX is defined.
    (toolexeclib_LTLIBRARIES): Add nonshm-host
    and (conditionally) nvidia plugins.
    (libgomp_plugin_nonshm_host_version_info)
    (libgomp_plugin_nonshm_host_la_SOURCES)
    (libgomp_plugin_nonshm_host_la_CPPFLAGS)
    (libgomp_plugin_nonshm_host_la_LDFLAGS)
    (libgomp_plugin_nonshm_host_la_LIBTOOLFLAGS): Set variables.
    (libgomp_la_SOURCES): Add oacc-parallel.c, splay-tree.c,
    oacc-host.c, oacc-init.c, oacc-mem.c, oacc-async.c, oacc-plugin.c,
    oacc-cuda.c, libgomp-plugin.c.
    (nodist_libsubinclude_HEADERS): Add openacc.h,
    ../include/gomp-constants.h.
    * Makefile.in: Regenerate.
    * config.h.in: Regenerate.
    * configure.ac: Add TODOs for OpenACC in various places.
    (CUDA_DRIVER_CPPFLAGS, CUDA_DRIVER_LDFLAGS): Initialize.
    (--with-cuda-driver, --with-cuda-driver-include)
    (--with-cuda-driver-lib, --enable-offload-targets): Implement new
    options.
    (PLUGIN_NVPTX, PLUGIN_NVPTX_CPPFLAGS, PLUGIN_NVPTX_LDFLAGS)
    (PLUGIN_NVPTX_LIBS): Initialize variables.
    * configure: Regenerate.
    * env.c (target.h): Include.
    (goacc_device_num, goacc_device_type): New globals.
    (goacc_parse_device_num, goacc_parse_device_type): New functions.
    (initialize_env): Parse GCC_ACC_NOTIFY, ACC_DEVICE_TYPE,
    ACC_DEVICE_NUM environment variables.
    * error.c (gomp_verror, gomp_vfatal, gomp_vnotify, gomp_notify):
    New functions.
    (gomp_fatal): Make global.
    * libgomp.h (stdarg.h): Include.
    (struct gomp_memory_mapping): Forward declaration.
    (struct gomp_task_icv): Add acc_notify_var member.
    (goacc_device_num, goacc_device_type): Add extern declarations.
    (gomp_vnotify, gomp_notify, gomp_verror, gomp_vfatal): Add
    prototypes.
    (gomp_init_targets_once): Add prototype.
    * libgomp.map (OACC_2.0): New symbol version. Add public acc_*
    interface functions.
    (PLUGIN_1.0): New symbol version. Add gomp plugin interface
    functions.
    * libgomp_g.h (GOACC_kernels, GOACC_parallel): Update prototypes.
    (GOACC_wait): Add prototype.
    * target.c (limits.h, stdbool.h, stdlib.h): Don't include.
    (oacc-plugin.h, gomp-constants.h, stdio.h, assert.h): Include.
    (splay_tree_node, splay_tree, splay_tree_key, target_mem_desc)
    (splay_tree_key_s, enum target_type, gomp_device_descr): Don't
    declare here.
    (splay-tree.h): Include.
    (target.h): Include.
    (splay_compare): Change linkage to hidden not static.
    (gomp_init_targets_once): New function.
    (gomp_get_num_devices): Use above.
    (dump_mappings): New function (for debugging).
    (get_kind): New function.
    (gomp_map_vars): Add gomp_memory_mapping (mm), is_openacc
    parameters. Change KINDS to void *. Use lock from memory map
    not device. Use macros from gomp-constants.h instead of
    hard-coded values. Support OpenACC-specific mappings.
    (gomp_copy_from_async): New function.
    (gomp_unmap_vars): Add DO_COPYFROM argument. Only copy memory
    back from device if it is true. Use lock from memory map not
    device.
    (gomp_update): Add mm, is_openacc args. Use lock from
    memory map not device. Use macros from gomp-constants.h not
    hard-coded values.
    (gomp_register_image_for_device): Add forward
    declaration.
    (GOMP_offload_register): Change TARGET_DATA type to
    void **. Check realloc result.
    (gomp_init_device): Change linkage to hidden not static. Tweak mem
    map location.
    (gomp_fini_device): New function.
    (GOMP_target): Adjust lazy initialization, check target
    capabilities for OpenMP 4.0 support. Add locking around splay tree
    lookup. Add new arg to gomp_unmap_vars call.
    (GOMP_target_data): Tweak lazy initialization. Add new args to
    gomp_map_vars, gomp_unmap_vars calls.
    (GOMP_target_update): Tweak lazy initialization. Add new args to
    gomp_update call.
    (gomp_load_plugin_for_device): Initialize device_fini and
    OpenACC-specific plugin hooks.
    (gomp_register_images_for_device): Rename to...
    (gomp_register_image_for_device): This, and register a single
    device only, and only if it has not already had images registered.
    (gomp_find_available_plugins): Rearrange to fix plugin loading and
    initialization for OpenACC. Prefer a device with
    TARGET_CAP_OPENMP_400 for OpenMP.
    * target.h: New file.
    * splay-tree.h: Move bulk of implementation to...
    * splay-tree.c: New file.
    * libgomp-plugin.c: New file.
    * libgomp-plugin.h: New file.
    * oacc-async.c: New file.
    * oacc-cuda.c: New file.
    * oacc-host.c: New file.
    * oacc-init.c: New file.
    * oacc-mem.c: New file.
    * oacc-parallel.c: New file.
    * oacc-plugin.c: New file.
    * plugin-nvptx.c: New file.
    * oacc-int.h: New file.
    * openacc.f90: New file.
    * openacc.h: New file.
    * openacc_lib.h: New file.


[-- Attachment #2: 0003-libgomp-openacc-support.patch --]
[-- Type: text/x-patch, Size: 255676 bytes --]

From f8be7c084a8a6eb85112195db2b4bc78c3a704e9 Mon Sep 17 00:00:00 2001
From: Julian Brown <julian@codesourcery.com>
Date: Tue, 30 Sep 2014 04:50:08 -0700
Subject: [PATCH 3/3] libgomp openacc support

---
 include/gomp-constants.h |   45 ++
 libgomp/Makefile.am      |   34 +-
 libgomp/Makefile.in      |  119 ++-
 libgomp/config.h.in      |    3 +
 libgomp/configure        |  130 +++-
 libgomp/configure.ac     |   76 ++
 libgomp/env.c            |   43 ++
 libgomp/error.c          |   33 +-
 libgomp/fortran.c        |    8 -
 libgomp/libgomp-plugin.c |  106 +++
 libgomp/libgomp-plugin.h |   57 ++
 libgomp/libgomp.h        |   11 +
 libgomp/libgomp.map      |   93 ++-
 libgomp/libgomp_g.h      |    5 +-
 libgomp/oacc-async.c     |   80 ++
 libgomp/oacc-cuda.c      |   81 ++
 libgomp/oacc-host.c      |  425 +++++++++++
 libgomp/oacc-init.c      |  513 +++++++++++++
 libgomp/oacc-int.h       |  127 ++++
 libgomp/oacc-mem.c       |  528 +++++++++++++
 libgomp/oacc-parallel.c  |  376 +++++++--
 libgomp/oacc-plugin.c    |   44 ++
 libgomp/oacc-plugin.h    |   32 +
 libgomp/openacc.f90      |  929 ++++++++++++++++++++++-
 libgomp/openacc.h        |   97 ++-
 libgomp/openacc_lib.h    |  373 ++++++++-
 libgomp/plugin-nvptx.c   | 1882 ++++++++++++++++++++++++++++++++++++++++++++++
 libgomp/splay-tree.c     |  224 ++++++
 libgomp/splay-tree.h     |  203 +----
 libgomp/target.c         |  785 ++++++++++++-------
 libgomp/target.h         |  178 +++++
 31 files changed, 7073 insertions(+), 567 deletions(-)
 create mode 100644 include/gomp-constants.h
 create mode 100644 libgomp/libgomp-plugin.c
 create mode 100644 libgomp/libgomp-plugin.h
 create mode 100644 libgomp/oacc-async.c
 create mode 100644 libgomp/oacc-cuda.c
 create mode 100644 libgomp/oacc-host.c
 create mode 100644 libgomp/oacc-init.c
 create mode 100644 libgomp/oacc-int.h
 create mode 100644 libgomp/oacc-mem.c
 create mode 100644 libgomp/oacc-plugin.c
 create mode 100644 libgomp/oacc-plugin.h
 create mode 100644 libgomp/plugin-nvptx.c
 create mode 100644 libgomp/splay-tree.c
 create mode 100644 libgomp/target.h

diff --git a/include/gomp-constants.h b/include/gomp-constants.h
new file mode 100644
index 0000000..7ef5c88
--- /dev/null
+++ b/include/gomp-constants.h
@@ -0,0 +1,45 @@
+#ifndef GOMP_CONSTANTS_H
+#define GOMP_CONSTANTS_H 1
+
+/* Enumerated variable mapping types used to communicate between GCC and
+   libgomp.  These values are used for both OpenMP and OpenACC.  */
+
+#define GOMP_MAP_ALLOC			0x00
+#define GOMP_MAP_ALLOC_TO		0x01
+#define GOMP_MAP_ALLOC_FROM		0x02
+#define GOMP_MAP_ALLOC_TOFROM		0x03
+#define GOMP_MAP_POINTER		0x04
+#define GOMP_MAP_TO_PSET		0x05
+#define GOMP_MAP_FORCE_ALLOC		0x08
+#define GOMP_MAP_FORCE_TO		0x09
+#define GOMP_MAP_FORCE_FROM		0x0a
+#define GOMP_MAP_FORCE_TOFROM		0x0b
+#define GOMP_MAP_FORCE_PRESENT		0x0c
+#define GOMP_MAP_FORCE_DEALLOC		0x0d
+#define GOMP_MAP_FORCE_DEVICEPTR	0x0e
+#define GOMP_MAP_FORCE_PRIVATE		0x18
+#define GOMP_MAP_FORCE_FIRSTPRIVATE	0x19
+
+#define GOMP_MAP_COPYTO_P(X) \
+  ((X) == GOMP_MAP_ALLOC_TO || (X) == GOMP_MAP_FORCE_TO)
+
+#define GOMP_MAP_COPYFROM_P(X) \
+  ((X) == GOMP_MAP_ALLOC_FROM || (X) == GOMP_MAP_FORCE_FROM)
+
+#define GOMP_MAP_TOFROM_P(X) \
+  ((X) == GOMP_MAP_ALLOC_TOFROM || (X) == GOMP_MAP_FORCE_TOFROM)
+
+#define GOMP_MAP_POINTER_P(X) \
+  ((X) == GOMP_MAP_POINTER)
+
+#define GOMP_IF_CLAUSE_FALSE		-2
+
+/* Canonical list of target type codes for OpenMP/OpenACC.  */
+#define GOMP_TARGET_NONE		0
+#define GOMP_TARGET_HOST		2
+#define GOMP_TARGET_HOST_NONSHM		3
+#define GOMP_TARGET_NOT_HOST		4
+#define GOMP_TARGET_NVIDIA_PTX		5
+#define GOMP_TARGET_INTEL_MIC		6
+
+#endif
diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index 37b36bd..7ddb0a4 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -14,13 +14,35 @@ libsubincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include
 
 vpath % $(strip $(search_path))
 
-AM_CPPFLAGS = $(addprefix -I, $(search_path))
+AM_CPPFLAGS = $(addprefix -I, $(search_path)) \
+	$(addprefix -I, $(search_path)/../include)
 AM_CFLAGS = $(XCFLAGS)
 AM_LDFLAGS = $(XLDFLAGS) $(SECTION_LDFLAGS) $(OPT_LDFLAGS)
 
 toolexeclib_LTLIBRARIES = libgomp.la
 nodist_toolexeclib_HEADERS = libgomp.spec
 
+if PLUGIN_NVPTX
+# Nvidia PTX OpenACC plugin.
+libgomp_plugin_nvptx_version_info = -version-info $(libtool_VERSION)
+toolexeclib_LTLIBRARIES += libgomp-plugin-nvptx.la
+libgomp_plugin_nvptx_la_SOURCES = plugin-nvptx.c
+libgomp_plugin_nvptx_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_NVPTX_CPPFLAGS)
+libgomp_plugin_nvptx_la_LDFLAGS = $(libgomp_plugin_nvptx_version_info) \
+	$(lt_host_flags)
+libgomp_plugin_nvptx_la_LDFLAGS += $(PLUGIN_NVPTX_LDFLAGS)
+libgomp_plugin_nvptx_la_LIBADD = $(PLUGIN_NVPTX_LIBS)
+libgomp_plugin_nvptx_la_LIBTOOLFLAGS = --tag=disable-static
+endif
+
+libgomp_plugin_host_nonshm_version_info = -version-info $(libtool_VERSION)
+toolexeclib_LTLIBRARIES += libgomp-plugin-host_nonshm.la
+libgomp_plugin_host_nonshm_la_SOURCES = oacc-host.c
+libgomp_plugin_host_nonshm_la_CPPFLAGS = $(AM_CPPFLAGS) -DHOST_NONSHM_PLUGIN
+libgomp_plugin_host_nonshm_la_LDFLAGS = \
+	$(libgomp_plugin_host_nonshm_version_info) $(lt_host_flags)
+libgomp_plugin_host_nonshm_la_LIBTOOLFLAGS = --tag=disable-static
+
 if LIBGOMP_BUILD_VERSIONED_SHLIB
 # -Wc is only a libtool option.
 comma = ,
@@ -60,10 +82,16 @@ libgomp_la_LINK = $(LINK) $(libgomp_la_LDFLAGS)
 libgomp_la_SOURCES = alloc.c barrier.c critical.c env.c error.c iter.c \
 	iter_ull.c loop.c loop_ull.c ordered.c parallel.c sections.c single.c \
 	task.c team.c work.c lock.c mutex.c proc.c sem.c bar.c ptrlock.c \
-	time.c fortran.c affinity.c target.c oacc-parallel.c
+	time.c fortran.c affinity.c target.c oacc-parallel.c splay-tree.c \
+	oacc-host.c oacc-init.c oacc-mem.c oacc-async.c \
+	oacc-plugin.c oacc-cuda.c libgomp-plugin.c
+
+if USE_FORTRAN
+libgomp_la_SOURCES += openacc.f90
+endif
 
 nodist_noinst_HEADERS = libgomp_f.h
-nodist_libsubinclude_HEADERS = omp.h openacc.h
+nodist_libsubinclude_HEADERS = omp.h openacc.h ../include/gomp-constants.h
 if USE_FORTRAN
 nodist_finclude_HEADERS = omp_lib.h omp_lib.f90 omp_lib.mod omp_lib_kinds.mod \
 	openacc_lib.h openacc.f90 openacc.mod openacc_kinds.mod
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index bc60253..4965442 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -36,6 +36,8 @@ POST_UNINSTALL = :
 build_triplet = @build@
 host_triplet = @host@
 target_triplet = @target@
+@PLUGIN_NVPTX_TRUE@am__append_1 = libgomp-plugin-nvptx.la
+@USE_FORTRAN_TRUE@am__append_2 = openacc.f90
 subdir = .
 DIST_COMMON = ChangeLog $(srcdir)/Makefile.in $(srcdir)/Makefile.am \
 	$(top_srcdir)/configure $(am__configure_deps) \
@@ -91,12 +93,38 @@ am__installdirs = "$(DESTDIR)$(toolexeclibdir)" "$(DESTDIR)$(infodir)" \
 	"$(DESTDIR)$(fincludedir)" "$(DESTDIR)$(libsubincludedir)" \
 	"$(DESTDIR)$(toolexeclibdir)"
 LTLIBRARIES = $(toolexeclib_LTLIBRARIES)
+libgomp_plugin_host_nonshm_la_LIBADD =
+am_libgomp_plugin_host_nonshm_la_OBJECTS =  \
+	libgomp_plugin_host_nonshm_la-oacc-host.lo
+libgomp_plugin_host_nonshm_la_OBJECTS =  \
+	$(am_libgomp_plugin_host_nonshm_la_OBJECTS)
+libgomp_plugin_host_nonshm_la_LINK = $(LIBTOOL) --tag=CC \
+	$(libgomp_plugin_host_nonshm_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
+	--mode=link $(CCLD) $(AM_CFLAGS) $(CFLAGS) \
+	$(libgomp_plugin_host_nonshm_la_LDFLAGS) $(LDFLAGS) -o $@
+am__DEPENDENCIES_1 =
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_DEPENDENCIES =  \
+@PLUGIN_NVPTX_TRUE@	$(am__DEPENDENCIES_1)
+@PLUGIN_NVPTX_TRUE@am_libgomp_plugin_nvptx_la_OBJECTS =  \
+@PLUGIN_NVPTX_TRUE@	libgomp_plugin_nvptx_la-plugin-nvptx.lo
+libgomp_plugin_nvptx_la_OBJECTS =  \
+	$(am_libgomp_plugin_nvptx_la_OBJECTS)
+libgomp_plugin_nvptx_la_LINK = $(LIBTOOL) --tag=CC \
+	$(libgomp_plugin_nvptx_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
+	--mode=link $(CCLD) $(AM_CFLAGS) $(CFLAGS) \
+	$(libgomp_plugin_nvptx_la_LDFLAGS) $(LDFLAGS) -o $@
+@PLUGIN_NVPTX_TRUE@am_libgomp_plugin_nvptx_la_rpath = -rpath \
+@PLUGIN_NVPTX_TRUE@	$(toolexeclibdir)
 libgomp_la_LIBADD =
+@USE_FORTRAN_TRUE@am__objects_1 = openacc.lo
 am_libgomp_la_OBJECTS = alloc.lo barrier.lo critical.lo env.lo \
 	error.lo iter.lo iter_ull.lo loop.lo loop_ull.lo ordered.lo \
 	parallel.lo sections.lo single.lo task.lo team.lo work.lo \
 	lock.lo mutex.lo proc.lo sem.lo bar.lo ptrlock.lo time.lo \
-	fortran.lo affinity.lo target.lo oacc-parallel.lo
+	fortran.lo affinity.lo target.lo oacc-parallel.lo \
+	splay-tree.lo oacc-host.lo oacc-init.lo oacc-mem.lo \
+	oacc-async.lo oacc-plugin.lo oacc-cuda.lo libgomp-plugin.lo \
+	$(am__objects_1)
 libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS)
 DEFAULT_INCLUDES = -I.@am__isrc@
 depcomp = $(SHELL) $(top_srcdir)/../depcomp
@@ -108,7 +136,15 @@ LTCOMPILE = $(LIBTOOL) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
 	--mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) \
 	$(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS)
 CCLD = $(CC)
-SOURCES = $(libgomp_la_SOURCES)
+FCCOMPILE = $(FC) $(AM_FCFLAGS) $(FCFLAGS)
+LTFCCOMPILE = $(LIBTOOL) --tag=FC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
+	--mode=compile $(FC) $(AM_FCFLAGS) $(FCFLAGS)
+FCLD = $(FC)
+FCLINK = $(LIBTOOL) --tag=FC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
+	--mode=link $(FCLD) $(AM_FCFLAGS) $(FCFLAGS) $(AM_LDFLAGS) \
+	$(LDFLAGS) -o $@
+SOURCES = $(libgomp_plugin_host_nonshm_la_SOURCES) \
+	$(libgomp_plugin_nvptx_la_SOURCES) $(libgomp_la_SOURCES)
 MULTISRCTOP = 
 MULTIBUILDTOP = 
 MULTIDIRS = 
@@ -213,6 +249,10 @@ PACKAGE_URL = @PACKAGE_URL@
 PACKAGE_VERSION = @PACKAGE_VERSION@
 PATH_SEPARATOR = @PATH_SEPARATOR@
 PERL = @PERL@
+PLUGIN_NVPTX = @PLUGIN_NVPTX@
+PLUGIN_NVPTX_CPPFLAGS = @PLUGIN_NVPTX_CPPFLAGS@
+PLUGIN_NVPTX_LDFLAGS = @PLUGIN_NVPTX_LDFLAGS@
+PLUGIN_NVPTX_LIBS = @PLUGIN_NVPTX_LIBS@
 RANLIB = @RANLIB@
 SECTION_LDFLAGS = @SECTION_LDFLAGS@
 SED = @SED@
@@ -293,12 +333,32 @@ gcc_version := $(shell cat $(top_srcdir)/../gcc/BASE-VER)
 search_path = $(addprefix $(top_srcdir)/config/, $(config_path)) $(top_srcdir)
 fincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/finclude
 libsubincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include
-AM_CPPFLAGS = $(addprefix -I, $(search_path))
+AM_CPPFLAGS = $(addprefix -I, $(search_path)) \
+	$(addprefix -I, $(search_path)/../include)
+
 AM_CFLAGS = $(XCFLAGS)
 AM_LDFLAGS = $(XLDFLAGS) $(SECTION_LDFLAGS) $(OPT_LDFLAGS)
-toolexeclib_LTLIBRARIES = libgomp.la
+toolexeclib_LTLIBRARIES = libgomp.la $(am__append_1) \
+	libgomp-plugin-host_nonshm.la
 nodist_toolexeclib_HEADERS = libgomp.spec
 
+# Nvidia PTX OpenACC plugin.
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info $(libtool_VERSION)
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_SOURCES = plugin-nvptx.c
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_NVPTX_CPPFLAGS)
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LDFLAGS =  \
+@PLUGIN_NVPTX_TRUE@	$(libgomp_plugin_nvptx_version_info) \
+@PLUGIN_NVPTX_TRUE@	$(lt_host_flags) $(PLUGIN_NVPTX_LDFLAGS)
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LIBADD = $(PLUGIN_NVPTX_LIBS)
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LIBTOOLFLAGS = --tag=disable-static
+libgomp_plugin_host_nonshm_version_info = -version-info $(libtool_VERSION)
+libgomp_plugin_host_nonshm_la_SOURCES = oacc-host.c
+libgomp_plugin_host_nonshm_la_CPPFLAGS = $(AM_CPPFLAGS) -DHOST_NONSHM_PLUGIN
+libgomp_plugin_host_nonshm_la_LDFLAGS = \
+	$(libgomp_plugin_host_nonshm_version_info) $(lt_host_flags)
+
+libgomp_plugin_host_nonshm_la_LIBTOOLFLAGS = --tag=disable-static
+
 # -Wc is only a libtool option.
 @LIBGOMP_BUILD_VERSIONED_SHLIB_TRUE@comma = ,
 @LIBGOMP_BUILD_VERSIONED_SHLIB_TRUE@PREPROCESS = $(subst -Wc$(comma), , $(COMPILE)) -E
@@ -315,12 +375,14 @@ libgomp_la_LDFLAGS = $(libgomp_version_info) $(libgomp_version_script) \
 libgomp_la_DEPENDENCIES = $(libgomp_version_dep)
 libgomp_la_LINK = $(LINK) $(libgomp_la_LDFLAGS)
 libgomp_la_SOURCES = alloc.c barrier.c critical.c env.c error.c iter.c \
-	iter_ull.c loop.c loop_ull.c ordered.c parallel.c sections.c single.c \
-	task.c team.c work.c lock.c mutex.c proc.c sem.c bar.c ptrlock.c \
-	time.c fortran.c affinity.c target.c oacc-parallel.c
-
+	iter_ull.c loop.c loop_ull.c ordered.c parallel.c sections.c \
+	single.c task.c team.c work.c lock.c mutex.c proc.c sem.c \
+	bar.c ptrlock.c time.c fortran.c affinity.c target.c \
+	oacc-parallel.c splay-tree.c oacc-host.c oacc-init.c \
+	oacc-mem.c oacc-async.c oacc-plugin.c oacc-cuda.c \
+	libgomp-plugin.c $(am__append_2)
 nodist_noinst_HEADERS = libgomp_f.h
-nodist_libsubinclude_HEADERS = omp.h openacc.h
+nodist_libsubinclude_HEADERS = omp.h openacc.h ../include/gomp-constants.h
 @USE_FORTRAN_TRUE@nodist_finclude_HEADERS = omp_lib.h omp_lib.f90 omp_lib.mod omp_lib_kinds.mod \
 @USE_FORTRAN_TRUE@	openacc_lib.h openacc.f90 openacc.mod openacc_kinds.mod
 
@@ -353,7 +415,7 @@ all: config.h
 	$(MAKE) $(AM_MAKEFLAGS) all-recursive
 
 .SUFFIXES:
-.SUFFIXES: .c .dvi .lo .o .obj .ps
+.SUFFIXES: .c .dvi .f90 .lo .o .obj .ps
 am--refresh:
 	@:
 $(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am  $(am__configure_deps)
@@ -446,6 +508,10 @@ clean-toolexeclibLTLIBRARIES:
 	  echo "rm -f \"$${dir}/so_locations\""; \
 	  rm -f "$${dir}/so_locations"; \
 	done
+libgomp-plugin-host_nonshm.la: $(libgomp_plugin_host_nonshm_la_OBJECTS) $(libgomp_plugin_host_nonshm_la_DEPENDENCIES) 
+	$(libgomp_plugin_host_nonshm_la_LINK) -rpath $(toolexeclibdir) $(libgomp_plugin_host_nonshm_la_OBJECTS) $(libgomp_plugin_host_nonshm_la_LIBADD) $(LIBS)
+libgomp-plugin-nvptx.la: $(libgomp_plugin_nvptx_la_OBJECTS) $(libgomp_plugin_nvptx_la_DEPENDENCIES) 
+	$(libgomp_plugin_nvptx_la_LINK) $(am_libgomp_plugin_nvptx_la_rpath) $(libgomp_plugin_nvptx_la_OBJECTS) $(libgomp_plugin_nvptx_la_LIBADD) $(LIBS)
 libgomp.la: $(libgomp_la_OBJECTS) $(libgomp_la_DEPENDENCIES) 
 	$(libgomp_la_LINK) -rpath $(toolexeclibdir) $(libgomp_la_OBJECTS) $(libgomp_la_LIBADD) $(LIBS)
 
@@ -465,11 +531,20 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/fortran.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/iter.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/iter_ull.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libgomp-plugin.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libgomp_plugin_host_nonshm_la-oacc-host.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libgomp_plugin_nvptx_la-plugin-nvptx.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/lock.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/loop.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/loop_ull.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/mutex.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-async.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-cuda.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-host.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-init.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-mem.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-parallel.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-plugin.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ordered.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/parallel.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/proc.Plo@am__quote@
@@ -477,6 +552,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/sections.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/sem.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/single.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/splay-tree.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/target.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/task.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/team.Plo@am__quote@
@@ -504,6 +580,29 @@ distclean-compile:
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
 @am__fastdepCC_FALSE@	$(LTCOMPILE) -c -o $@ $<
 
+libgomp_plugin_host_nonshm_la-oacc-host.lo: oacc-host.c
+@am__fastdepCC_TRUE@	$(LIBTOOL)  --tag=CC $(libgomp_plugin_host_nonshm_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libgomp_plugin_host_nonshm_la_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -MT libgomp_plugin_host_nonshm_la-oacc-host.lo -MD -MP -MF $(DEPDIR)/libgomp_plugin_host_nonshm_la-oacc-host.Tpo -c -o libgomp_plugin_host_nonshm_la-oacc-host.lo `test -f 'oacc-host.c' || echo '$(srcdir)/'`oacc-host.c
+@am__fastdepCC_TRUE@	$(am__mv) $(DEPDIR)/libgomp_plugin_host_nonshm_la-oacc-host.Tpo $(DEPDIR)/libgomp_plugin_host_nonshm_la-oacc-host.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	source='oacc-host.c' object='libgomp_plugin_host_nonshm_la-oacc-host.lo' libtool=yes @AMDEPBACKSLASH@
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
+@am__fastdepCC_FALSE@	$(LIBTOOL)  --tag=CC $(libgomp_plugin_host_nonshm_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libgomp_plugin_host_nonshm_la_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o libgomp_plugin_host_nonshm_la-oacc-host.lo `test -f 'oacc-host.c' || echo '$(srcdir)/'`oacc-host.c
+
+libgomp_plugin_nvptx_la-plugin-nvptx.lo: plugin-nvptx.c
+@am__fastdepCC_TRUE@	$(LIBTOOL)  --tag=CC $(libgomp_plugin_nvptx_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libgomp_plugin_nvptx_la_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -MT libgomp_plugin_nvptx_la-plugin-nvptx.lo -MD -MP -MF $(DEPDIR)/libgomp_plugin_nvptx_la-plugin-nvptx.Tpo -c -o libgomp_plugin_nvptx_la-plugin-nvptx.lo `test -f 'plugin-nvptx.c' || echo '$(srcdir)/'`plugin-nvptx.c
+@am__fastdepCC_TRUE@	$(am__mv) $(DEPDIR)/libgomp_plugin_nvptx_la-plugin-nvptx.Tpo $(DEPDIR)/libgomp_plugin_nvptx_la-plugin-nvptx.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	source='plugin-nvptx.c' object='libgomp_plugin_nvptx_la-plugin-nvptx.lo' libtool=yes @AMDEPBACKSLASH@
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
+@am__fastdepCC_FALSE@	$(LIBTOOL)  --tag=CC $(libgomp_plugin_nvptx_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libgomp_plugin_nvptx_la_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o libgomp_plugin_nvptx_la-plugin-nvptx.lo `test -f 'plugin-nvptx.c' || echo '$(srcdir)/'`plugin-nvptx.c
+
+.f90.o:
+	$(FCCOMPILE) -c -o $@ $<
+
+.f90.obj:
+	$(FCCOMPILE) -c -o $@ `$(CYGPATH_W) '$<'`
+
+.f90.lo:
+	$(LTFCCOMPILE) -c -o $@ $<
+
 mostlyclean-libtool:
 	-rm -f *.lo
 
diff --git a/libgomp/config.h.in b/libgomp/config.h.in
index 67f5420..13f8952 100644
--- a/libgomp/config.h.in
+++ b/libgomp/config.h.in
@@ -110,6 +110,9 @@
 /* Define to the version of this package. */
 #undef PACKAGE_VERSION
 
+/* Define to 1 if the NVIDIA plugin is built, 0 if not. */
+#undef PLUGIN_NVPTX
+
 /* Define if all infrastructure, needed for plugins, is supported. */
 #undef PLUGIN_SUPPORT
 
diff --git a/libgomp/configure b/libgomp/configure
index 704f22a..e23c1e2 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -627,6 +627,12 @@ LIBGOMP_BUILD_VERSIONED_SHLIB_FALSE
 LIBGOMP_BUILD_VERSIONED_SHLIB_TRUE
 OPT_LDFLAGS
 SECTION_LDFLAGS
+PLUGIN_NVPTX_FALSE
+PLUGIN_NVPTX_TRUE
+PLUGIN_NVPTX_LIBS
+PLUGIN_NVPTX_LDFLAGS
+PLUGIN_NVPTX_CPPFLAGS
+PLUGIN_NVPTX
 libtool_VERSION
 ac_ct_FC
 FCFLAGS
@@ -758,6 +764,9 @@ ac_user_opts='
 enable_option_checking
 enable_version_specific_runtime_libs
 enable_generated_files_in_srcdir
+with_cuda_driver
+with_cuda_driver_include
+with_cuda_driver_lib
 enable_multilib
 enable_dependency_tracking
 enable_shared
@@ -1425,6 +1434,16 @@ Optional Features:
 Optional Packages:
   --with-PACKAGE[=ARG]    use PACKAGE [ARG=yes]
   --without-PACKAGE       do not use PACKAGE (same as --with-PACKAGE=no)
+  --with-cuda-driver=PATH specify prefix directory for installed CUDA driver
+                          package. Equivalent to
+                          --with-cuda-driver-include=PATH/include plus
+                          --with-cuda-driver-lib=PATH/lib
+  --with-cuda-driver-include=PATH
+                          specify directory for installed CUDA driver include
+                          files
+  --with-cuda-driver-lib=PATH
+                          specify directory for the installed CUDA driver
+                          library
   --with-pic              try to use only PIC/non-PIC objects [default=use
                           both]
   --with-gnu-ld           assume the C compiler uses GNU ld [default=no]
@@ -2596,6 +2615,38 @@ else
 fi
 
 
+# Look for the CUDA driver package.
+CUDA_DRIVER_CPPFLAGS=
+CUDA_DRIVER_LDFLAGS=
+
+# Check whether --with-cuda-driver was given.
+if test "${with_cuda_driver+set}" = set; then :
+  withval=$with_cuda_driver;
+fi
+
+
+# Check whether --with-cuda-driver-include was given.
+if test "${with_cuda_driver_include+set}" = set; then :
+  withval=$with_cuda_driver_include;
+fi
+
+
+# Check whether --with-cuda-driver-lib was given.
+if test "${with_cuda_driver_lib+set}" = set; then :
+  withval=$with_cuda_driver_lib;
+fi
+
+if test "x$with_cuda_driver" != x; then
+  CUDA_DRIVER_CPPFLAGS=-I$with_cuda_driver/include
+  CUDA_DRIVER_LDFLAGS=-L$with_cuda_driver/lib
+fi
+if test "x$with_cuda_driver_include" != x; then
+  CUDA_DRIVER_CPPFLAGS=-I$with_cuda_driver_include
+fi
+if test "x$with_cuda_driver_lib" != x; then
+  CUDA_DRIVER_LDFLAGS=-L$with_cuda_driver_lib
+fi
+
 
 # -------
 # -------
@@ -11094,7 +11145,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11097 "configure"
+#line 11148 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11200,7 +11251,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11203 "configure"
+#line 11254 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -15113,7 +15164,78 @@ if test x$plugin_support = xyes; then
 
 $as_echo "#define PLUGIN_SUPPORT 1" >>confdefs.h
 
+elif test "x$enable_accelerator" != xno; then
+  as_fn_error "Can't have support for accelerators without support for plugins" "$LINENO" 5
+fi
+
+PLUGIN_NVPTX=0
+PLUGIN_NVPTX_CPPFLAGS=
+PLUGIN_NVPTX_LDFLAGS=
+PLUGIN_NVPTX_LIBS=
+
+
+
+
+# enable_accelerator has already been validated at top level.
+# No need to do it again.
+case $enable_offload_targets in
+  auto-nvptx*|nvptx*)
+    PLUGIN_NVPTX=yes
+    PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
+    PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
+    PLUGIN_NVPTX_LIBS='-lcuda'
+
+    PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
+    CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
+    PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
+    LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
+    PLUGIN_NVPTX_save_LIBS=$LIBS
+    LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
+    cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include "cuda.h"
+int
+main ()
+{
+CUresult r = cuCtxPushCurrent (NULL);
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  PLUGIN_NVPTX=1
 fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+    CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
+    LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
+    LIBS=$PLUGIN_NVPTX_save_LIBS
+    case $PLUGIN_NVPTX in
+      auto-nvptx*)
+	PLUGIN_NVPTX=0
+	{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: CUDA driver package required for nvptx support; disabling" >&5
+$as_echo "$as_me: WARNING: CUDA driver package required for nvptx support; disabling" >&2;}
+	;;
+      nvptx*)
+	PLUGIN_NVPTX=0
+	as_fn_error "CUDA driver package required for nvptx support" "$LINENO" 5
+	;;
+    esac
+    ;;
+esac
+ if test $PLUGIN_NVPTX = 1; then
+  PLUGIN_NVPTX_TRUE=
+  PLUGIN_NVPTX_FALSE='#'
+else
+  PLUGIN_NVPTX_TRUE='#'
+  PLUGIN_NVPTX_FALSE=
+fi
+
+
+cat >>confdefs.h <<_ACEOF
+#define PLUGIN_NVPTX $PLUGIN_NVPTX
+_ACEOF
+
 
 # Check for functions needed.
 for ac_func in getloadavg clock_gettime strtoull
@@ -16458,6 +16580,10 @@ if test -z "${MAINTAINER_MODE_TRUE}" && test -z "${MAINTAINER_MODE_FALSE}"; then
   as_fn_error "conditional \"MAINTAINER_MODE\" was never defined.
 Usually this means the macro was only invoked conditionally." "$LINENO" 5
 fi
+if test -z "${PLUGIN_NVPTX_TRUE}" && test -z "${PLUGIN_NVPTX_FALSE}"; then
+  as_fn_error "conditional \"PLUGIN_NVPTX\" was never defined.
+Usually this means the macro was only invoked conditionally." "$LINENO" 5
+fi
 if test -z "${LIBGOMP_BUILD_VERSIONED_SHLIB_TRUE}" && test -z "${LIBGOMP_BUILD_VERSIONED_SHLIB_FALSE}"; then
   as_fn_error "conditional \"LIBGOMP_BUILD_VERSIONED_SHLIB\" was never defined.
 Usually this means the macro was only invoked conditionally." "$LINENO" 5
diff --git a/libgomp/configure.ac b/libgomp/configure.ac
index da06426..2633dac 100644
--- a/libgomp/configure.ac
+++ b/libgomp/configure.ac
@@ -2,6 +2,8 @@
 # aclocal -I ../config && autoconf && autoheader && automake
 
 AC_PREREQ(2.64)
+#TODO: Update for OpenACC?  But then also have to update copyright notices in
+#all source files...
 AC_INIT([GNU OpenMP Runtime Library], 1.0,,[libgomp])
 AC_CONFIG_HEADER(config.h)
 
@@ -28,6 +30,31 @@ LIBGOMP_ENABLE(generated-files-in-srcdir, no, ,
 AC_MSG_RESULT($enable_generated_files_in_srcdir)
 AM_CONDITIONAL(GENINSRC, test "$enable_generated_files_in_srcdir" = yes)
 
+# Look for the CUDA driver package.
+CUDA_DRIVER_CPPFLAGS=
+CUDA_DRIVER_LDFLAGS=
+AC_ARG_WITH(cuda-driver,
+	[AS_HELP_STRING([--with-cuda-driver=PATH],
+		[specify prefix directory for installed CUDA driver package.
+		 Equivalent to --with-cuda-driver-include=PATH/include
+		 plus --with-cuda-driver-lib=PATH/lib])])
+AC_ARG_WITH(cuda-driver-include,
+	[AS_HELP_STRING([--with-cuda-driver-include=PATH],
+		[specify directory for installed CUDA driver include files])])
+AC_ARG_WITH(cuda-driver-lib,
+	[AS_HELP_STRING([--with-cuda-driver-lib=PATH],
+		[specify directory for the installed CUDA driver library])])
+if test "x$with_cuda_driver" != x; then
+  CUDA_DRIVER_CPPFLAGS=-I$with_cuda_driver/include
+  CUDA_DRIVER_LDFLAGS=-L$with_cuda_driver/lib
+fi
+if test "x$with_cuda_driver_include" != x; then
+  CUDA_DRIVER_CPPFLAGS=-I$with_cuda_driver_include
+fi
+if test "x$with_cuda_driver_lib" != x; then
+  CUDA_DRIVER_LDFLAGS=-L$with_cuda_driver_lib
+fi
+
 
 # -------
 # -------
@@ -200,8 +227,57 @@ AC_CHECK_HEADER(dirent.h, , [plugin_support=no])
 if test x$plugin_support = xyes; then
   AC_DEFINE(PLUGIN_SUPPORT, 1,
     [Define if all infrastructure, needed for plugins, is supported.])
+elif test "x$enable_accelerator" != xno; then
+  AC_MSG_ERROR([Can't have support for accelerators without support for plugins])
 fi
 
+PLUGIN_NVPTX=0
+PLUGIN_NVPTX_CPPFLAGS=
+PLUGIN_NVPTX_LDFLAGS=
+PLUGIN_NVPTX_LIBS=
+AC_SUBST(PLUGIN_NVPTX)
+AC_SUBST(PLUGIN_NVPTX_CPPFLAGS)
+AC_SUBST(PLUGIN_NVPTX_LDFLAGS)
+AC_SUBST(PLUGIN_NVPTX_LIBS)
+# enable_accelerator has already been validated at top level.
+# No need to do it again.
+case $enable_offload_targets in
+  auto-nvptx*|nvptx*)
+    PLUGIN_NVPTX=yes
+    PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
+    PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
+    PLUGIN_NVPTX_LIBS='-lcuda'
+
+    PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
+    CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
+    PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
+    LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
+    PLUGIN_NVPTX_save_LIBS=$LIBS
+    LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
+    AC_LINK_IFELSE(
+      [AC_LANG_PROGRAM(
+	[#include "cuda.h"],
+	[CUresult r = cuCtxPushCurrent (NULL);])],
+      [PLUGIN_NVPTX=1])
+    CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
+    LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
+    LIBS=$PLUGIN_NVPTX_save_LIBS
+    case $PLUGIN_NVPTX in
+      auto-nvptx*)
+	PLUGIN_NVPTX=0
+	AC_MSG_WARN([CUDA driver package required for nvptx support; disabling])
+	;;
+      nvptx*)
+	PLUGIN_NVPTX=0
+	AC_MSG_ERROR([CUDA driver package required for nvptx support])
+	;;
+    esac
+    ;;
+esac
+AM_CONDITIONAL([PLUGIN_NVPTX], [test $PLUGIN_NVPTX = 1])
+AC_DEFINE_UNQUOTED([PLUGIN_NVPTX], [$PLUGIN_NVPTX],
+		  [Define to 1 if the NVIDIA plugin is built, 0 if not.])
+
 # Check for functions needed.
 AC_CHECK_FUNCS(getloadavg clock_gettime strtoull)
 
diff --git a/libgomp/env.c b/libgomp/env.c
index 94c72a3..32fb92c 100644
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -27,6 +27,7 @@
 
 #include "libgomp.h"
 #include "libgomp_f.h"
+#include "target.h"
 #include <ctype.h>
 #include <stdlib.h>
 #include <stdio.h>
@@ -77,6 +78,9 @@ unsigned long gomp_bind_var_list_len;
 void **gomp_places_list;
 unsigned long gomp_places_list_len;
 
+int goacc_device_num;
+char* goacc_device_type;
+
 /* Parse the OMP_SCHEDULE environment variable.  */
 
 static void
@@ -1013,6 +1017,37 @@ parse_affinity (bool ignore)
 
 
 static void
+goacc_parse_device_num (void)
+{
+  const char *env = getenv ("ACC_DEVICE_NUM");
+  int default_num = -1;
+  
+  if (env && *env != '\0')
+    {
+      char *end;
+      default_num = strtol (env, &end, 0);
+      
+      if (*end || default_num < 0)
+        default_num = 0;
+    }
+  else
+    default_num = 0;
+  
+  goacc_device_num = default_num;
+}
+
+static void
+goacc_parse_device_type (void)
+{
+  const char *env = getenv ("ACC_DEVICE_TYPE");
+  
+  if (env && *env != '\0')
+    goacc_device_type = strdup (env);
+  else
+    goacc_device_type = NULL;
+}
+
+static void
 handle_omp_display_env (unsigned long stacksize, int wait_policy)
 {
   const char *env;
@@ -1181,6 +1216,7 @@ initialize_env (void)
       gomp_global_icv.thread_limit_var
 	= thread_limit_var > INT_MAX ? UINT_MAX : thread_limit_var;
     }
+  parse_int ("GCC_ACC_NOTIFY", &gomp_global_icv.acc_notify_var, true);
 #ifndef HAVE_SYNC_BUILTINS
   gomp_mutex_init (&gomp_managed_threads_lock);
 #endif
@@ -1271,6 +1307,13 @@ initialize_env (void)
     }
 
   handle_omp_display_env (stacksize, wait_policy);
+  
+  /* Look for OpenACC-specific environment variables.  */
+  goacc_parse_device_num ();
+  goacc_parse_device_type ();
+
+  /* Initialize OpenACC-specific internal state.  */
+  ACC_runtime_initialize ();
 }
 
 \f
diff --git a/libgomp/error.c b/libgomp/error.c
index d9b28f1..5f400cc 100644
--- a/libgomp/error.c
+++ b/libgomp/error.c
@@ -35,7 +35,7 @@
 #include <stdlib.h>
 
 
-static void
+void
 gomp_verror (const char *fmt, va_list list)
 {
   fputs ("\nlibgomp: ", stderr);
@@ -54,13 +54,40 @@ gomp_error (const char *fmt, ...)
 }
 
 void
+gomp_vfatal (const char *fmt, va_list list)
+{
+  gomp_verror (fmt, list);
+  exit (EXIT_FAILURE);
+}
+
+void
 gomp_fatal (const char *fmt, ...)
 {
   va_list list;
 
   va_start (list, fmt);
-  gomp_verror (fmt, list);
+  gomp_vfatal (fmt, list);
   va_end (list);
 
-  exit (EXIT_FAILURE);
+  /* Unreachable.  */
+  abort ();
+}
+
+void
+gomp_vnotify (const char *msg, va_list list)
+{
+  struct gomp_task_icv *icv = gomp_icv (false);
+  if (icv->acc_notify_var)
+    vfprintf (stderr, msg, list);
+}
+
+void
+gomp_notify(const char *msg, ...)
+{
+  va_list list;
+  
+  va_start (list, msg);
+  gomp_vnotify (msg, list);
+  va_end (list);
 }
+
diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index 28c83cc..1f30c51 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -26,7 +26,6 @@
 
 #include "libgomp.h"
 #include "libgomp_f.h"
-#include "openacc.h"
 #include <stdlib.h>
 #include <limits.h>
 
@@ -74,7 +73,6 @@ ialias_redirect (omp_get_num_devices)
 ialias_redirect (omp_get_num_teams)
 ialias_redirect (omp_get_team_num)
 ialias_redirect (omp_is_initial_device)
-ialias_redirect (acc_on_device)
 #endif
 
 #ifndef LIBGOMP_GNU_SYMBOL_VERSIONING
@@ -494,9 +492,3 @@ omp_is_initial_device_ (void)
 {
   return omp_is_initial_device ();
 }
-
-int32_t
-acc_on_device_ (const int32_t *dev)
-{
-  return acc_on_device (*dev);
-}
diff --git a/libgomp/libgomp-plugin.c b/libgomp/libgomp-plugin.c
new file mode 100644
index 0000000..73c8765
--- /dev/null
+++ b/libgomp/libgomp-plugin.c
@@ -0,0 +1,106 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Exported (non-hidden) functions exposing libgomp interface for plugins.  */
+
+#include <stdlib.h>
+
+#include "libgomp.h"
+#include "libgomp-plugin.h"
+#include "target.h"
+
+void *
+gomp_plugin_malloc (size_t size)
+{
+  return gomp_malloc (size);
+}
+
+void *
+gomp_plugin_malloc_cleared (size_t size)
+{
+  return gomp_malloc_cleared (size);
+}
+
+void *
+gomp_plugin_realloc (void *ptr, size_t size)
+{
+  return gomp_realloc (ptr, size);
+}
+
+void
+gomp_plugin_error (const char *msg, ...)
+{
+  va_list ap;
+  
+  va_start (ap, msg);
+  gomp_verror (msg, ap);
+  va_end (ap);
+}
+
+void
+gomp_plugin_notify (const char *msg, ...)
+{
+  va_list ap;
+  
+  va_start (ap, msg);
+  gomp_vnotify (msg, ap);
+  va_end (ap);
+}
+
+void
+gomp_plugin_fatal (const char *msg, ...)
+{
+  va_list ap;
+  
+  va_start (ap, msg);
+  gomp_vfatal (msg, ap);
+  va_end (ap);
+  
+  /* Unreachable.  */
+  abort ();
+}
+
+void
+gomp_plugin_mutex_init (gomp_mutex_t *mutex)
+{
+  gomp_mutex_init (mutex);
+}
+
+void
+gomp_plugin_mutex_destroy (gomp_mutex_t *mutex)
+{
+  gomp_mutex_destroy (mutex);
+}
+
+void
+gomp_plugin_mutex_lock (gomp_mutex_t *mutex)
+{
+  gomp_mutex_lock (mutex);
+}
+
+void
+gomp_plugin_mutex_unlock (gomp_mutex_t *mutex)
+{
+  gomp_mutex_unlock (mutex);
+}
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
new file mode 100644
index 0000000..ea4d89a
--- /dev/null
+++ b/libgomp/libgomp-plugin.h
@@ -0,0 +1,57 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* An interface to various libgomp-internal functions for use by plugins.  */
+
+#ifndef LIBGOMP_PLUGIN_H
+#define LIBGOMP_PLUGIN_H 1
+
+#include "mutex.h"
+
+/* alloc.c */
+
+extern void *gomp_plugin_malloc (size_t) __attribute__((malloc));
+extern void *gomp_plugin_malloc_cleared (size_t) __attribute__((malloc));
+extern void *gomp_plugin_realloc (void *, size_t);
+
+/* error.c */
+
+extern void gomp_plugin_notify(const char *msg, ...);
+extern void gomp_plugin_error (const char *, ...)
+	__attribute__((format (printf, 1, 2)));
+extern void gomp_plugin_fatal (const char *, ...)
+	__attribute__((noreturn, format (printf, 1, 2)));
+
+/* mutex.c */
+
+extern void gomp_plugin_mutex_init (gomp_mutex_t *mutex);
+extern void gomp_plugin_mutex_destroy (gomp_mutex_t *mutex);
+extern void gomp_plugin_mutex_lock (gomp_mutex_t *mutex);
+extern void gomp_plugin_mutex_unlock (gomp_mutex_t *mutex);
+
+/* target.c */
+
+extern void gomp_plugin_async_unmap_vars (void *ptr);
+
+#endif
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index a1482cc..8b7327d 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -40,6 +40,7 @@
 #include <pthread.h>
 #include <stdbool.h>
 #include <stdlib.h>
+#include <stdarg.h>
 
 #ifdef HAVE_ATTRIBUTE_VISIBILITY
 # pragma GCC visibility push(hidden)
@@ -220,6 +221,7 @@ struct gomp_team_state
 };
 
 struct target_mem_desc;
+struct gomp_memory_mapping;
 
 /* These are the OpenMP 4.0 Internal Control Variables described in
    section 2.3.1.  Those described as having one copy per task are
@@ -236,6 +238,7 @@ struct gomp_task_icv
   bool dyn_var;
   bool nest_var;
   char bind_var;
+  int acc_notify_var;
   /* Internal ICV.  */
   struct target_mem_desc *target_data;
 };
@@ -254,6 +257,9 @@ extern unsigned long gomp_bind_var_list_len;
 extern void **gomp_places_list;
 extern unsigned long gomp_places_list_len;
 
+extern int goacc_device_num;
+extern char* goacc_device_type;
+
 enum gomp_task_kind
 {
   GOMP_TASK_IMPLICIT,
@@ -532,8 +538,12 @@ extern void *gomp_realloc (void *, size_t);
 
 /* error.c */
 
+extern void gomp_vnotify (const char *, va_list);
+extern void gomp_notify(const char *msg, ...);
+extern void gomp_verror (const char *, va_list);
 extern void gomp_error (const char *, ...)
 	__attribute__((format (printf, 1, 2)));
+extern void gomp_vfatal (const char *, va_list);
 extern void gomp_fatal (const char *, ...)
 	__attribute__((noreturn, format (printf, 1, 2)));
 
@@ -606,6 +616,7 @@ extern void gomp_free_thread (void *);
 
 /* target.c */
 
+extern void gomp_init_targets_once (void);
 extern int gomp_get_num_devices (void);
 
 /* work.c */
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 69a4d83..e1e87d9 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -235,8 +235,82 @@ GOMP_4.0.1 {
 
 OACC_2.0 {
   global:
+	acc_get_num_devices;
+	acc_get_num_devices_h_;
+	acc_set_device_type;
+	acc_set_device_type_h_;
+	acc_get_device_type;
+	acc_get_device_type_h_;
+	acc_set_device_num;
+	acc_set_device_num_h_;
+	acc_get_device_num;
+	acc_get_device_num_h_;
+	acc_async_test;
+	acc_async_test_h_;
+	acc_async_test_all;
+	acc_async_test_all_h_;
+	acc_wait;
+	acc_wait_h_;
+	acc_wait_async;
+	acc_wait_async_h_;
+	acc_wait_all;
+	acc_wait_all_h_;
+	acc_wait_all_async;
+	acc_wait_all_async_h_;
+	acc_init;
+	acc_init_h_;
+	acc_shutdown;
+	acc_shutdown_h_;
 	acc_on_device;
-	acc_on_device_;
+	acc_on_device_h_;
+	acc_malloc;
+	acc_free;
+	acc_copyin;
+	acc_copyin_32_h_;
+	acc_copyin_64_h_;
+	acc_copyin_array_h_;
+	acc_present_or_copyin;
+	acc_present_or_copyin_32_h_;
+	acc_present_or_copyin_64_h_;
+	acc_present_or_copyin_array_h_;
+	acc_create;
+	acc_create_32_h_;
+	acc_create_64_h_;
+	acc_create_array_h_;
+	acc_present_or_create;
+	acc_present_or_create_32_h_;
+	acc_present_or_create_64_h_;
+	acc_present_or_create_array_h_;
+	acc_copyout;
+	acc_copyout_32_h_;
+	acc_copyout_64_h_;
+	acc_copyout_array_h_;
+	acc_delete;
+	acc_delete_32_h_;
+	acc_delete_64_h_;
+	acc_delete_array_h_;
+	acc_update_device;
+	acc_update_device_32_h_;
+	acc_update_device_64_h_;
+	acc_update_device_array_h_;
+	acc_update_self;
+	acc_update_self_32_h_;
+	acc_update_self_64_h_;
+	acc_update_self_array_h_;
+	acc_map_data;
+	acc_unmap_data;
+	acc_deviceptr;
+	acc_hostptr;
+	acc_is_present;
+	acc_is_present_32_h_;
+	acc_is_present_64_h_;
+	acc_is_present_array_h_;
+	acc_memcpy_to_device;
+	acc_memcpy_from_device;
+	acc_get_current_cuda_device;
+	acc_get_current_cuda_context;
+	acc_get_cuda_stream;
+	acc_set_cuda_stream;
 };
 
 GOACC_2.0 {
@@ -246,4 +320,21 @@ GOACC_2.0 {
 	GOACC_kernels;
 	GOACC_parallel;
 	GOACC_update;
+	GOACC_wait;
+};
+
+# FIXME: Hygiene/grouping/naming?
+PLUGIN_1.0 {
+  global:
+	gomp_plugin_malloc;
+	gomp_plugin_malloc_cleared;
+	gomp_plugin_realloc;
+	gomp_plugin_error;
+	gomp_plugin_notify;
+	gomp_plugin_fatal;
+	gomp_plugin_mutex_init;
+	gomp_plugin_mutex_destroy;
+	gomp_plugin_mutex_lock;
+	gomp_plugin_mutex_unlock;
+	gomp_plugin_async_unmap_vars;
 };
diff --git a/libgomp/libgomp_g.h b/libgomp/libgomp_g.h
index 9dca76a9..44f200c 100644
--- a/libgomp/libgomp_g.h
+++ b/libgomp/libgomp_g.h
@@ -221,9 +221,10 @@ extern void GOACC_data_start (int, const void *,
 extern void GOACC_data_end (void);
 extern void GOACC_kernels (int, void (*) (void *), const void *,
 			   size_t, void **, size_t *, unsigned short *,
-			   int, int, int);
+			   int, int, int, int, int, ...);
 extern void GOACC_parallel (int, void (*) (void *), const void *,
 			    size_t, void **, size_t *, unsigned short *,
-			    int, int, int);
+			    int, int, int, int, int, ...);
+extern void GOACC_wait (int, int, ...);
 
 #endif /* LIBGOMP_G_H */
diff --git a/libgomp/oacc-async.c b/libgomp/oacc-async.c
new file mode 100644
index 0000000..e6b6ebf
--- /dev/null
+++ b/libgomp/oacc-async.c
@@ -0,0 +1,80 @@
+/* OpenACC Runtime Library Definitions.
+
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by Nathan Sidwell <nathan@codesourcery.com>.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#include "openacc.h"
+#include "libgomp.h"
+#include "target.h"
+
+int
+acc_async_test (int async)
+{
+  if (async < acc_async_sync)
+    gomp_fatal ("invalid async argument: %d", async);
+
+  return ACC_dev->openacc.async_test_func (async);
+}
+
+int
+acc_async_test_all (void)
+{
+  return ACC_dev->openacc.async_test_all_func ();
+}
+
+void
+acc_wait (int async)
+{
+  if (async < acc_async_sync)
+    gomp_fatal ("invalid async argument: %d", async);
+
+  ACC_dev->openacc.async_wait_func (async);
+  return;
+}
+
+void
+acc_wait_async (int async1, int async2)
+{
+  ACC_dev->openacc.async_wait_async_func (async1, async2);
+  return;
+}
+
+void
+acc_wait_all (void)
+{
+  ACC_dev->openacc.async_wait_all_func ();
+  return;
+}
+
+void
+acc_wait_all_async (int async)
+{
+  if (async < acc_async_sync)
+    gomp_fatal ("invalid async argument: %d", async);
+
+  ACC_dev->openacc.async_wait_all_async_func (async);
+  return;
+}
diff --git a/libgomp/oacc-cuda.c b/libgomp/oacc-cuda.c
new file mode 100644
index 0000000..f587325
--- /dev/null
+++ b/libgomp/oacc-cuda.c
@@ -0,0 +1,81 @@
+/* OpenACC Runtime Library: CUDA support glue.
+
+   Copyright (C) 2014 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "openacc.h"
+#include "config.h"
+#include "libgomp.h"
+#include "target.h"
+
+void *
+acc_get_current_cuda_device (void)
+{
+  void *p = NULL;
+
+  if (ACC_dev && ACC_dev->openacc.cuda.get_current_device_func)
+    p = ACC_dev->openacc.cuda.get_current_device_func ();
+
+  return p;
+}
+
+void *
+acc_get_current_cuda_context (void)
+{
+  void *p = NULL;
+
+  if (ACC_dev && ACC_dev->openacc.cuda.get_current_context_func)
+    p = ACC_dev->openacc.cuda.get_current_context_func ();
+
+  return p;
+}
+
+void *
+acc_get_cuda_stream (int async)
+{
+  void *p = NULL;
+
+  if (async < 0)
+    return p;
+
+  if (ACC_dev && ACC_dev->openacc.cuda.get_stream_func)
+    p = ACC_dev->openacc.cuda.get_stream_func (async);
+
+  return p;
+}
+
+int
+acc_set_cuda_stream (int async, void *stream)
+{
+  int s = -1;
+
+  if (async < 0 || stream == NULL)
+    return 0;
+
+  if (ACC_dev && ACC_dev->openacc.cuda.set_stream_func)
+    s = ACC_dev->openacc.cuda.set_stream_func (async, stream);
+
+  return s;
+}
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
new file mode 100644
index 0000000..27f73b6
--- /dev/null
+++ b/libgomp/oacc-host.c
@@ -0,0 +1,425 @@
+/* OpenACC Runtime Library: acc_device_host, acc_device_host_nonshm.
+
+   Copyright (C) 2013 Free Software Foundation, Inc.
+
+   Contributed by Thomas Schwinge <thomas@codesourcery.com>.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Simple implementation of support routines for a shared-memory
+   acc_device_host, and a non-shared memory acc_device_host_nonshm, with the
+   latter built as a plugin.  */
+
+#include "openacc.h"
+#include "config.h"
+#include "libgomp.h"
+#include "target.h"
+#ifdef HOST_NONSHM_PLUGIN
+#include "libgomp-plugin.h"
+#endif
+
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdio.h>
+
+#undef DEBUG
+
+#ifdef HOST_NONSHM_PLUGIN
+#define STATIC
+#define GOMP(X) gomp_plugin_##X
+#define SELF "host_nonshm plugin: "
+#else
+#define STATIC static
+#define GOMP(X) gomp_##X
+#define SELF "host: "
+#endif
+
+#ifndef HOST_NONSHM_PLUGIN
+static struct gomp_device_descr host_dispatch;
+#endif
+
+STATIC const char *
+get_name (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+#ifdef HOST_NONSHM_PLUGIN
+  return "host_nonshm";
+#else
+  return "host";
+#endif
+}
+
+STATIC int
+get_type (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+#ifdef HOST_NONSHM_PLUGIN
+  return TARGET_TYPE_HOST_NONSHM;
+#else
+  return TARGET_TYPE_HOST;
+#endif
+}
+
+STATIC unsigned int
+get_caps (void)
+{
+  unsigned int caps = TARGET_CAP_OPENACC_200 | TARGET_CAP_OPENMP_400
+		      | TARGET_CAP_NATIVE_EXEC;
+
+#ifndef HOST_NONSHM_PLUGIN
+  caps |= TARGET_CAP_SHARED_MEM;
+#endif
+
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s: 0x%x\n", __FILE__, __FUNCTION__, caps);
+#endif
+
+  return caps;
+}
+
+STATIC int
+get_num_devices (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+  return 1;
+}
+
+STATIC void
+offload_register (void *host_table, void *target_data)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%p, %p)\n", __FILE__, __FUNCTION__, host_table,
+	   target_data);
+#endif
+}
+
+STATIC int
+device_init (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+  return get_num_devices ();
+}
+
+STATIC int
+device_fini (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+  return 0;
+}
+
+STATIC int
+device_get_table (struct mapping_table **table)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%p)\n", __FILE__, __FUNCTION__, table);
+#endif
+
+  return 0;
+}
+
+STATIC bool
+openacc_avail (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+  return 1;
+}
+
+STATIC void *
+openacc_open_device (int n)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%u)\n", __FILE__, __FUNCTION__, n);
+#endif
+
+  return (void *) (intptr_t) n;
+}
+
+STATIC int
+openacc_close_device (void *hnd)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%p)\n", __FILE__, __FUNCTION__, hnd);
+#endif
+
+  return 0;
+}
+
+STATIC int
+openacc_get_device_num (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+  return 0;
+}
+
+STATIC void
+openacc_set_device_num (int n)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%u)\n", __FILE__, __FUNCTION__, n);
+#endif
+
+  if (n > 0)
+    GOMP(fatal) ("device number %u out of range for host execution", n);
+}
+
+STATIC void *
+device_alloc (size_t s)
+{
+  void *ptr = GOMP(malloc) (s);
+
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%zd): %p\n", __FILE__, __FUNCTION__, s, ptr);
+#endif
+
+  return ptr;
+}
+
+STATIC void
+device_free (void *p)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%p)\n", __FILE__, __FUNCTION__, p);
+#endif
+
+  free (p);
+}
+
+STATIC void *
+device_host2dev (void *d, const void *h, size_t s)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%p, %p, %zd)\n", __FILE__, __FUNCTION__, d, h,
+	   s);
+#endif
+
+#ifdef HOST_NONSHM_PLUGIN
+  memcpy (d, h, s);
+#endif
+
+  return 0;
+}
+
+STATIC void *
+device_dev2host (void *h, const void *d, size_t s)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%p, %p, %zd)\n", __FILE__, __FUNCTION__, h, d,
+	   s);
+#endif
+
+#ifdef HOST_NONSHM_PLUGIN
+  memcpy (h, d, s);
+#endif
+
+  return 0;
+}
+
+STATIC void
+device_run (void *fn_ptr, void *vars)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%p, %p)\n", __FILE__, __FUNCTION__, fn_ptr,
+	   vars);
+#endif
+
+  void (*fn)(void *) = (void (*)(void *)) fn_ptr;
+
+  fn (vars);
+}
+
+STATIC void
+openacc_parallel (void (*fn) (void *), size_t mapnum __attribute__((unused)),
+		  void **hostaddrs __attribute__((unused)),
+		  void **devaddrs __attribute__((unused)),
+		  size_t *sizes __attribute__((unused)),
+		  unsigned short *kinds __attribute__((unused)),
+		  int num_gangs __attribute__((unused)),
+		  int num_workers __attribute__((unused)),
+		  int vector_length __attribute__((unused)),
+		  int async __attribute__((unused)),
+		  void *targ_mem_desc __attribute__((unused)))
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%p, %zu, %p, %p, %p, %d, %d, %d, %d, %p)\n",
+	   __FILE__, __FUNCTION__, fn, mapnum, hostaddrs, sizes, kinds,
+	   num_gangs, num_workers, vector_length, async, targ_mem_desc);
+#endif
+
+#ifdef HOST_NONSHM_PLUGIN
+  fn (devaddrs);
+#else
+  fn (hostaddrs);
+#endif
+}
+
+STATIC void
+openacc_async_set_async (int async __attribute__((unused)))
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%d)\n", __FILE__, __FUNCTION__, async);
+#endif
+}
+
+STATIC int
+openacc_async_test (int async __attribute__((unused)))
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%d)\n", __FILE__, __FUNCTION__, async);
+#endif
+
+  return 1;
+}
+
+STATIC int
+openacc_async_test_all (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+  return 1;
+}
+
+STATIC void
+openacc_async_wait (int async __attribute__((unused)))
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%d)\n", __FILE__, __FUNCTION__, async);
+#endif
+}
+
+STATIC void
+openacc_async_wait_all (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
+#endif
+}
+
+STATIC void
+openacc_async_wait_async (int async1 __attribute__((unused)),
+                	  int async2 __attribute__((unused)))
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%d, %d)\n", __FILE__, __FUNCTION__, async1,
+	   async2);
+#endif
+}
+
+STATIC void
+openacc_async_wait_all_async (int async __attribute__((unused)))
+{
+#ifdef DEBUG
+  fprintf (stderr, SELF "%s:%s (%d)\n", __FILE__, __FUNCTION__, async);
+#endif
+}
+
+#ifndef HOST_NONSHM_PLUGIN
+static struct gomp_device_descr host_dispatch =
+  {
+    .name = "host",
+
+    .type = TARGET_TYPE_HOST,
+    .capabilities = TARGET_CAP_OPENACC_200 | TARGET_CAP_NATIVE_EXEC
+		    | TARGET_CAP_SHARED_MEM,
+    .id = 0,
+
+    .is_initialized = false,
+    .offload_regions_registered = false,
+
+    .get_name_func = get_name,
+    .get_type_func = get_type,
+    .get_caps_func = get_caps,
+
+    .device_init_func = device_init,
+    .device_fini_func = device_fini,
+    .get_num_devices_func = get_num_devices,
+    .offload_register_func = offload_register,
+    .device_get_table_func = device_get_table,
+
+    .device_alloc_func = device_alloc,
+    .device_free_func = device_free,
+    .device_host2dev_func = device_host2dev,
+    .device_dev2host_func = device_dev2host,
+    
+    .device_run_func = device_run,
+
+    .openacc = {
+      .open_device_func = openacc_open_device,
+      .close_device_func = openacc_close_device,
+
+      .get_device_num_func = openacc_get_device_num,
+      .set_device_num_func = openacc_set_device_num,
+
+      /* Device available.  */
+      .avail_func = openacc_avail,
+
+      .exec_func = openacc_parallel,
+
+      .async_set_async_func = openacc_async_set_async,
+      .async_test_func = openacc_async_test,
+      .async_test_all_func = openacc_async_test_all,
+      .async_wait_func = openacc_async_wait,
+      .async_wait_async_func = openacc_async_wait_async,
+      .async_wait_all_func = openacc_async_wait_all,
+      .async_wait_all_async_func = openacc_async_wait_all_async,
+      
+      .cuda = {
+	.get_current_device_func = NULL,
+	.get_current_context_func = NULL,
+	.get_stream_func = NULL,
+	.set_stream_func = NULL,
+      }
+    }
+  };
+
+/* Register this device type.  */
+static __attribute__ ((constructor))
+void ACC_host_init (void)
+{
+  gomp_mutex_init (&host_dispatch.mem_map.lock);
+  ACC_register (&host_dispatch);
+}
+#endif
+
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
new file mode 100644
index 0000000..2aa2635
--- /dev/null
+++ b/libgomp/oacc-init.c
@@ -0,0 +1,513 @@
+/* OpenACC Runtime initialization routines
+
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by Nathan Sidwell <nathan@codesourcery.com>.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "libgomp.h"
+#include "target.h"
+#include <assert.h>
+#include <stdlib.h>
+#include <strings.h>
+#include <stdbool.h>
+#include <sys/queue.h>
+#include <stdio.h>
+
+gomp_mutex_t acc_device_lock;
+
+/* Current dispatcher, and how it was initialized */
+static acc_device_t init_key = _ACC_device_hwm;
+
+/* The dispatch table for the current accelerator device.  This is currently
+   global, so you can only have one type of device open at any given time in a
+   program.  */
+struct gomp_device_descr const *ACC_dev;
+
+/* Handle for current thread.  */
+__thread  void *ACC_handle;
+static __thread int handle_num = -1;
+
+/* This context structure associates the handle for a physical device with
+   memory-mapping information for that device, and is used to associate new
+   host threads with previously-opened devices.  Note that it's not directly
+   connected with the CUDA "context" concept as used by the NVidia plugin.  */
+struct ACC_context {
+  struct memmap_t *ACC_memmap;
+  void *ACC_handle;
+  SLIST_ENTRY(ACC_context) next;
+};
+
+static SLIST_HEAD(_ACC_contexts, ACC_context) _ACC_contexts;
+static struct _ACC_contexts *ACC_contexts;
+
+static struct gomp_device_descr const *dispatchers[_ACC_device_hwm] = { 0 };
+
+void
+ACC_register (struct gomp_device_descr const *disp)
+{
+  gomp_mutex_lock (&acc_device_lock);
+
+  assert (acc_device_type (disp->type) != acc_device_none
+	  && acc_device_type (disp->type) != acc_device_default
+	  && acc_device_type (disp->type) != acc_device_not_host);
+  assert (!dispatchers[disp->type]);
+  dispatchers[disp->type] = disp;
+
+  gomp_mutex_unlock (&acc_device_lock);
+}
+
+static void
+close_handle (void)
+{
+  if (ACC_memmap)
+    {
+      if (ACC_mem_close (ACC_handle, ACC_memmap))
+        {
+          if (ACC_dev->openacc.close_device_func (ACC_handle) < 0)
+            gomp_fatal ("failed to close device");
+        }
+
+      ACC_memmap = 0;
+    }
+}
+
+static struct gomp_device_descr const *
+resolve_device (acc_device_t d)
+{
+  acc_device_t d_arg = d;
+
+  switch (d)
+    {
+    case acc_device_default:
+      {
+	if (goacc_device_type)
+	  {
+	    /* Lookup the named device.  */
+	    while (++d != _ACC_device_hwm)
+	      if (dispatchers[d]
+		  && !strcasecmp (goacc_device_type, dispatchers[d]->name)
+		  && dispatchers[d]->openacc.avail_func ())
+		goto found;
+
+	    gomp_fatal ("device type %s not supported", goacc_device_type);
+	  }
+
+	/* No default device specified, so start scanning for any non-host
+	   device that is available.  */
+	d = acc_device_not_host;
+      }
+      /* FALLTHROUGH */
+
+    case acc_device_not_host:
+      /* Find the first available device after acc_device_not_host.  */
+      while (++d != _ACC_device_hwm)
+	if (dispatchers[d] && dispatchers[d]->openacc.avail_func ())
+	  goto found;
+      if (d_arg == acc_device_default)
+	{	  
+	  d = acc_device_host;
+	  goto found;
+	}
+      gomp_fatal ("no device found");
+      break;
+
+    case acc_device_host:
+      break;
+
+    default:
+      if (d > _ACC_device_hwm)
+	gomp_fatal ("device %u out of range", (unsigned)d);
+      break;
+    }
+ found:
+
+  assert (d != acc_device_none
+	  && d != acc_device_default
+	  && d != acc_device_not_host);
+
+  return dispatchers[d];
+}
+
+static struct gomp_device_descr const *
+_acc_init (acc_device_t d)
+{
+  struct gomp_device_descr const *acc_dev;
+
+  if (ACC_dev)
+    gomp_fatal ("device already active");
+
+  init_key = d;  /* We need to remember what we were intialized as, to
+		    check shutdown etc.  */
+
+  acc_dev = resolve_device (d);
+  if (!acc_dev || !acc_dev->openacc.avail_func ())
+    gomp_fatal ("device %u not supported", (unsigned)d);
+
+  if (!acc_dev->is_initialized)
+    gomp_init_device ((struct gomp_device_descr *) acc_dev);
+
+  return acc_dev;
+}
+
+/* Open the ORD'th device of the currently-active type (ACC_dev must be
+   initialised before calling).  If ORD is < 0, open the default-numbered
+   device (set by the ACC_DEVICE_NUM environment variable or a call to
+   acc_set_device_num), or leave any currently-opened device as is.  "Opening"
+   consists of  calling the device's open_device_func hook, and either creating
+   a new memory mapping or associating a new thread with an existing such
+   mapping (that matches ACC_handle, i.e. which corresponds to the same
+   physical device).  */
+
+static void
+lazy_open (int ord)
+{
+  struct ACC_context *acc_ctx;
+
+  if (ACC_memmap)
+    {
+      assert (ord < 0 || ord == handle_num);
+      return;
+    }
+
+  assert (ACC_dev);
+
+  if (ord < 0)
+    ord = goacc_device_num;
+
+  ACC_handle = ACC_dev->openacc.open_device_func (ord);
+  handle_num = ord;
+
+  SLIST_FOREACH(acc_ctx, ACC_contexts, next)
+    {
+      if (acc_ctx->ACC_handle == ACC_handle)
+        {
+          ACC_memmap = acc_ctx->ACC_memmap;
+	  ACC_dev->openacc.async_set_async_func (acc_async_sync);
+
+          return;
+        }
+    }
+
+  ACC_memmap = ACC_mem_open (ACC_handle, NULL, handle_num);
+
+  ACC_dev->openacc.async_set_async_func (acc_async_sync);
+
+  acc_ctx = gomp_malloc (sizeof (struct ACC_context));
+  acc_ctx->ACC_handle = ACC_handle;
+  acc_ctx->ACC_memmap = ACC_memmap;
+
+  if (!ACC_memmap->mem_map.is_initialized)
+    gomp_init_tables (ACC_dev, &ACC_memmap->mem_map);
+
+  SLIST_INSERT_HEAD(ACC_contexts, acc_ctx, next);
+}
+
+/* OpenACC 2.0a (3.2.12, 3.2.13) doesn't specify whether the serialization of
+   init/shutdown is per-process or per-thread.  We choose per-process.  */
+
+void
+acc_init (acc_device_t d)
+{
+  if (!ACC_dev)
+    gomp_init_targets_once ();
+
+  gomp_mutex_lock (&acc_device_lock);
+
+  ACC_dev = _acc_init (d);
+
+  lazy_open (-1);
+
+  gomp_mutex_unlock (&acc_device_lock);
+}
+
+ialias (acc_init)
+
+void
+_acc_shutdown (acc_device_t d)
+{
+  /* We don't check whether d matches the actual device found, because
+     OpenACC 2.0 (3.2.12) says the parameters to the init and this
+     call must match (for the shutdown call anyway, it's silent on
+     others).  */
+
+  if (!ACC_dev)
+    gomp_fatal ("no device initialized");
+  if (init_key != d)
+    gomp_fatal ("device %u(%u) is initialized",
+	       (unsigned)init_key, (unsigned)ACC_dev->type);
+
+  close_handle ();
+
+  while (SLIST_FIRST(ACC_contexts) != NULL)
+    {
+      struct ACC_context *c;
+
+      c = SLIST_FIRST(ACC_contexts);
+      SLIST_REMOVE_HEAD(ACC_contexts, next);
+      free (c);
+    }
+
+  gomp_fini_device ((struct gomp_device_descr *) ACC_dev);
+
+  ACC_dev = 0;
+  ACC_handle = 0;
+  handle_num = -1;
+}
+
+void
+acc_shutdown (acc_device_t d)
+{
+  gomp_mutex_lock (&acc_device_lock);
+
+  _acc_shutdown (d);
+
+  gomp_mutex_unlock (&acc_device_lock);
+}
+
+ialias (acc_shutdown)
+
+static struct gomp_device_descr const *
+lazy_init (acc_device_t d)
+{
+  if (ACC_dev)
+    {
+      /* Re-initializing the same device, do nothing.  */
+      if (d == init_key)
+	return ACC_dev;
+
+      _acc_shutdown (init_key);
+    }
+
+  assert (!ACC_dev);
+
+  return _acc_init (d);
+}
+
+static void
+lazy_init_and_open (acc_device_t d)
+{
+  if (!ACC_dev)
+    gomp_init_targets_once ();
+
+  gomp_mutex_lock (&acc_device_lock);
+
+  ACC_dev = lazy_init (d);
+
+  lazy_open (-1);
+
+  gomp_mutex_unlock (&acc_device_lock);
+}
+
+int
+acc_get_num_devices (acc_device_t d)
+{
+  int n = 0;
+  struct gomp_device_descr const *acc_dev;
+
+  if (d == acc_device_none)
+    return 0;
+
+  if (!ACC_dev)
+    gomp_init_targets_once ();
+
+  acc_dev = resolve_device (d);
+  if (!acc_dev)
+    return 0;
+
+  n = acc_dev->device_init_func ();
+  if (n < 0)
+    n = 0;
+
+  return n;
+}
+
+ialias (acc_get_num_devices)
+
+void
+acc_set_device_type (acc_device_t d)
+{
+  lazy_init_and_open (d);
+}
+
+ialias (acc_set_device_type)
+
+acc_device_t
+acc_get_device_type (void)
+{
+  acc_device_t res = acc_device_none;
+  const struct gomp_device_descr *dev;
+
+  if (ACC_dev)
+    res = acc_device_type (ACC_dev->type);
+  else
+    {
+      gomp_init_targets_once ();
+
+      dev = resolve_device (acc_device_default);
+      res = acc_device_type (dev->type);
+    }
+
+  assert (res != acc_device_default
+	  && res != acc_device_not_host);
+
+  return res;
+}
+
+ialias (acc_get_device_type)
+
+int
+acc_get_device_num (acc_device_t d)
+{
+  const struct gomp_device_descr *dev;
+  int num;
+
+  if (d >= _ACC_device_hwm)
+    gomp_fatal ("device %u out of range", (unsigned)d);
+
+  if (!ACC_dev)
+    gomp_init_targets_once ();
+
+  dev = resolve_device (d);
+  if (!dev)
+    gomp_fatal ("no devices of type %u", d);
+
+  /* We might not have called lazy_open for this host thread yet, in which case
+     the get_device_num_func hook will return -1.  */
+  num = dev->openacc.get_device_num_func ();
+  if (num < 0)
+    num = goacc_device_num;
+  
+  return num;
+}
+
+ialias (acc_get_device_num)
+
+void
+acc_set_device_num (int n, acc_device_t d)
+{
+  const struct gomp_device_descr *dev;
+  int num_devices;
+
+  if (!ACC_dev)
+    gomp_init_targets_once ();
+  
+  if ((int) d == 0)
+    {
+      int i;
+      
+      /* A device setting of zero sets all device types on the system to use
+         the Nth instance of that device type.  Only attempt it for initialized
+	 devices though.  */
+      for (i = acc_device_not_host + 1; i < _ACC_device_hwm; i++)
+        {
+	  dev = resolve_device (d);
+	  if (dev && dev->is_initialized)
+	    dev->openacc.set_device_num_func (n);
+	}
+
+      /* ...and for future calls to acc_init/acc_set_device_type, etc.  */
+      goacc_device_num = n;
+    }
+  else
+    {
+      gomp_mutex_lock (&acc_device_lock);
+
+      ACC_dev = lazy_init (d);
+
+      num_devices = ACC_dev->get_num_devices_func ();
+
+      if (n >= num_devices)
+        gomp_fatal ("device %u out of range", n);
+
+      if (n != handle_num)
+	close_handle ();
+
+      lazy_open (n);
+
+      gomp_mutex_unlock (&acc_device_lock);
+    }
+}
+
+ialias (acc_set_device_num)
+
+int
+acc_on_device (acc_device_t dev)
+{
+  if (ACC_dev && acc_device_type (ACC_dev->type) == acc_device_host_nonshm)
+    return dev == acc_device_host_nonshm || dev == acc_device_not_host;
+    
+  /* Just rely on the compiler builtin.  */
+  return __builtin_acc_on_device (dev);
+}
+ialias (acc_on_device)
+
+attribute_hidden void
+ACC_runtime_initialize (void)
+{
+  gomp_mutex_init (&acc_device_lock);
+
+  ACC_contexts = &_ACC_contexts;
+  SLIST_INIT (ACC_contexts);
+}
+
+/* Compiler helper functions */
+
+static __thread struct gomp_device_descr const *saved_bound_dev;
+
+void
+ACC_save_and_set_bind (acc_device_t d)
+{
+  assert (!saved_bound_dev);
+
+  saved_bound_dev = ACC_dev;
+  ACC_dev = dispatchers[d];
+}
+
+void
+ACC_restore_bind (void)
+{
+  ACC_dev = saved_bound_dev;
+  saved_bound_dev = NULL;
+}
+
+/* This is called from any OpenACC support function that may need to implicitly
+   initialize the libgomp runtime.  On exit all such initialization will have
+   been done, and both the global ACC_dev and the per-host-thread ACC_memmap
+   pointers will be valid.  */
+
+void
+ACC_lazy_initialize (void)
+{
+  if (ACC_dev && ACC_memmap)
+    return;
+
+  if (!ACC_dev)
+    lazy_init_and_open (acc_device_default);
+  else
+    {
+      gomp_mutex_lock (&acc_device_lock);
+      lazy_open (-1);
+      gomp_mutex_unlock (&acc_device_lock);
+    }
+}
diff --git a/libgomp/oacc-int.h b/libgomp/oacc-int.h
new file mode 100644
index 0000000..470774b
--- /dev/null
+++ b/libgomp/oacc-int.h
@@ -0,0 +1,127 @@
+/* OpenACC Runtime - internal declarations
+
+   Copyright (C) 2005-2014 Free Software Foundation, Inc.
+
+   Contributed by Nathan Sidwell <nathan@codesourcery.com> and Thomas Schwinge
+   <thomas@codesourcery.com>.  In parts based on libgomp.h contributed by
+   Richard Henderson <rth@redhat.com>.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file contains data types and function declarations that are not
+   part of the official OpenACC user interface.  There are declarations
+   in here that are part of the GNU OpenACC ABI, in that the compiler is
+   required to know about them and use them.
+
+   The convention is that the all caps prefix "GOACC" is used group items
+   that are part of the external ABI, and the lower case prefix "goacc"
+   is used group items that are completely private to the library.  */
+
+#ifndef _OACC_INT_H
+#define _OACC_INT_H 1
+
+#include "openacc.h"
+#include "config.h"
+#include <stddef.h>
+#include <stdbool.h>
+#include <stdarg.h>
+
+#ifdef HAVE_ATTRIBUTE_VISIBILITY
+# pragma GCC visibility push(hidden)
+#endif
+
+typedef struct ACC_dispatch_t
+{
+  /* open or close a device instance.  */
+  void *(*open_device_func) (int n);
+  int (*close_device_func) (void *h);
+
+  /* set or get the device number.  */
+  int (*get_device_num_func) (void);
+  void (*set_device_num_func) (int);
+
+  /* availability */
+  bool (*avail_func) (void);
+
+  /* execute */
+  void (*exec_func) (void (*) (void *), size_t, void **, void **, size_t *,
+		     unsigned short *, int, int, int, int, void *);
+
+  /* asynchronous routines  */
+  int (*async_test_func) (int);
+  int (*async_test_all_func) (void);
+  void (*async_wait_func) (int);
+  void (*async_wait_async_func) (int, int);
+  void (*async_wait_all_func) (void);
+  void (*async_wait_all_async_func) (int);
+  void (*async_set_async_func) (int);
+
+  /* NVIDIA target specific routines  */
+  struct {
+    void *(*get_current_device_func) (void);
+    void *(*get_current_context_func) (void);
+    void *(*get_stream_func) (int);
+    int (*set_stream_func) (int, void *);
+  } cuda;
+} ACC_dispatch_t;
+
+typedef enum ACC_dispatch_f
+  {
+    ACC_unified_mem_f = 1 << 0,
+  }
+ACC_dispatch_f;
+
+struct gomp_device_descr;
+
+void ACC_register (struct gomp_device_descr const *) __GOACC_NOTHROW;
+
+/* Memory routines.  */
+struct memmap_t *ACC_mem_open (void *, struct memmap_t *, int) __GOACC_NOTHROW;
+bool ACC_mem_close (void *, struct memmap_t *) __GOACC_NOTHROW;
+struct gomp_device_descr *ACC_resolve_device(int) __GOACC_NOTHROW;
+
+/* Current dispatcher */
+extern struct gomp_device_descr const *ACC_dev;
+
+/* Device handle for current thread.  */
+extern __thread void *ACC_handle;
+
+typedef struct memmap_t
+{
+  unsigned live;
+  struct target_mem_desc *tlist;
+  struct gomp_memory_mapping mem_map;
+} memmap_t;
+
+/* Memory mapping */
+extern __thread struct memmap_t *ACC_memmap;
+
+void ACC_runtime_initialize (void);
+void ACC_save_and_set_bind (acc_device_t);
+void ACC_restore_bind (void);
+void ACC_lazy_initialize (void);
+
+#ifdef HAVE_ATTRIBUTE_VISIBILITY
+# pragma GCC visibility pop
+#endif
+
+#endif /* _OACC_INT_H */
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
new file mode 100644
index 0000000..52798cc
--- /dev/null
+++ b/libgomp/oacc-mem.c
@@ -0,0 +1,528 @@
+/* OpenACC Runtime initialization routines
+
+   Copyright (C) 2013 Free Software Foundation, Inc.
+
+   Contributed by Nathan Sidwell <nathan@codesourcery.com>.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "openacc.h"
+#include "config.h"
+#include "libgomp.h"
+#include "gomp-constants.h"
+#include "target.h"
+#include <stdio.h>
+#include <stdint.h>
+
+#include "splay-tree.h"
+
+/* Although this pointer is local to each host thread, it points to a memmap_t
+   that is stored per-context (different host threads may be associated with
+   different contexts, and each context is associated with a physical
+   device).  */
+__thread struct memmap_t *ACC_memmap;
+
+memmap_t *
+ACC_mem_open (void *handle, memmap_t *src, int handle_num)
+{
+  if (!src)
+    {
+      src = gomp_malloc (sizeof (*src));
+      src->live = 0;
+      src->mem_map.splay_tree.root = NULL;
+      src->tlist = NULL;
+      gomp_mutex_init (&src->mem_map.lock);
+      src->mem_map.is_initialized = false;
+    }
+
+  src->live++;
+
+  return src;
+}
+
+bool
+ACC_mem_close (void *handle, memmap_t *mm)
+{
+  bool closed = 0;
+
+  if (!--mm->live)
+    {
+      struct target_mem_desc *t;
+
+      for (t = mm->tlist; t != NULL; t = t->prev)
+        {
+          ACC_dev->device_free_func (t->to_free);
+
+          t->tgt_end = 0;
+          t->to_free = 0;
+
+          gomp_unmap_vars (t, true);
+        }
+
+       closed = 1;
+    }
+
+  gomp_mutex_destroy (&mm->mem_map.lock);
+
+  return closed;
+}
+
+/* Return block containing [H->S), or NULL if not contained.  */
+
+attribute_hidden splay_tree_key
+lookup_host (memmap_t *mm, void *h, size_t s)
+{
+  struct splay_tree_key_s node;
+  splay_tree_key key;
+  struct gomp_memory_mapping *mem_map = &mm->mem_map;
+
+  node.host_start = (uintptr_t) h;
+  node.host_end = (uintptr_t) h + s;
+
+  gomp_mutex_lock (&mem_map->lock);
+
+  key = splay_tree_lookup (&mem_map->splay_tree, &node);
+
+  gomp_mutex_unlock (&mem_map->lock);
+
+  return key;
+}
+
+/* Return block containing [D->S), or NULL if not contained.
+   The list isn't ordered by device address, so we have to iterate
+   over the whole array.  This is not expected to be a common
+   operation.  */
+
+static splay_tree_key
+lookup_dev (memmap_t *b, void *d, size_t s)
+{
+  int i;
+  struct target_mem_desc *t;
+
+  gomp_mutex_lock (&b->mem_map.lock);
+
+  for (t = b->tlist; t != NULL; t = t->prev)
+    {
+      if (t->tgt_start <= (uintptr_t) d && t->tgt_end >= (uintptr_t) d + s)
+        break;
+    }
+
+  gomp_mutex_unlock (&b->mem_map.lock);
+
+  if (!t)
+    return NULL;
+
+  for (i = 0; i < t->refcount; i++)
+    {
+      void * offset;
+
+      splay_tree_key k = &t->array[i].key;
+      offset = d - t->tgt_start + k->tgt_offset;
+
+      if (k->host_start + offset <= (void *) k->host_end)
+        return k;
+    }
+ 
+  return NULL;
+}
+
+/* OpenACC is silent on how memory exhaustion is indicated.  We return
+   NULL.  */
+
+void *
+acc_malloc (size_t s)
+{
+  if (!s)
+    return NULL;
+
+  ACC_lazy_initialize ();
+
+  return ACC_dev->device_alloc_func (s);
+}
+
+/* OpenACC 2.0a (3.2.16) doesn't specify what to do in the event
+   the device address is mapped. We choose to check if it mapped,
+   and if it is, to unmap it. */
+void
+acc_free (void *d)
+{
+  splay_tree_key k;
+
+  if (!d)
+    return;
+
+  /* We don't have to call lazy open here, as the ptr value must have
+     been returned by acc_malloc.  It's not permitted to pass NULL in
+     (unless you got that null from acc_malloc).  */
+  if ((k = lookup_dev (ACC_memmap, d, 1)))
+   {
+     void *offset;
+
+     offset = d - k->tgt->tgt_start + k->tgt_offset;
+
+     acc_unmap_data((void *)(k->host_start + offset));
+   }
+
+  ACC_dev->device_free_func (d);
+}
+
+void
+acc_memcpy_to_device (void *d, void *h, size_t s)
+{
+  /* No need to call lazy open here, as the device pointer must have
+     been obtained from a routine that did that.  */
+  ACC_dev->device_host2dev_func (d, h, s);
+}
+
+void
+acc_memcpy_from_device (void *h, void *d, size_t s)
+{
+  /* No need to call lazy open here, as the device pointer must have
+     been obtained from a routine that did that.  */
+  ACC_dev->device_dev2host_func (h, d, s);
+}
+
+/* Return the device pointer that corresponds to host data H.  Or NULL
+   if no mapping.  */
+
+void *
+acc_deviceptr (void *h)
+{
+  splay_tree_key n;
+  void *d;
+  void *offset;
+
+  ACC_lazy_initialize ();
+
+  n = lookup_host (ACC_memmap, h, 1);
+
+  if (!n)
+    return NULL;
+
+  offset = h - n->host_start;
+
+  d = n->tgt->tgt_start + n->tgt_offset + offset;
+
+  return d;
+}
+
+/* Return the host pointer that corresponds to device data D.  Or NULL
+   if no mapping.  */
+
+void *
+acc_hostptr (void *d)
+{
+  splay_tree_key n;
+  void *h;
+  void *offset;
+
+  ACC_lazy_initialize ();
+
+  n = lookup_dev (ACC_memmap, d, 1);
+
+  if (!n)
+    return NULL;
+
+  offset = d - n->tgt->tgt_start + n->tgt_offset;
+
+  h = n->host_start + offset;
+
+  return h;
+}
+
+/* Return 1 if host data [H,+S] is present on the device.  */
+
+int
+acc_is_present (void *h, size_t s)
+{
+  splay_tree_key n;
+
+  if (!s || !h)
+    return 0;
+
+  ACC_lazy_initialize ();
+
+  n = lookup_host (ACC_memmap, h, s);
+
+  if (n && (((uintptr_t)h < n->host_start) ||
+	((uintptr_t)h + s > n->host_end) || (s > n->host_end - n->host_start)))
+    n = NULL;
+
+  return n != NULL;
+}
+
+/* Create a mapping for host [H,+S] -> device [D,+S] */
+
+void
+acc_map_data (void *h, void *d, size_t s)
+{
+  struct target_mem_desc *tgt;
+  size_t mapnum = 1;
+  void *hostaddrs = h;
+  void *devaddrs = d;
+  size_t sizes = s;
+  unsigned short kinds = GOMP_MAP_ALLOC;
+
+  ACC_lazy_initialize ();
+
+  if (ACC_dev->capabilities & TARGET_CAP_SHARED_MEM)
+    {
+      if (d != h)
+        gomp_fatal ("cannot map data on shared-memory system");
+
+      tgt = gomp_map_vars (NULL, NULL, 0, NULL, NULL, NULL, NULL, true, false);
+    }
+  else
+    {
+      if (!d || !h || !s)
+	gomp_fatal ("[%p,+%d]->[%p,+%d] is a bad map",
+                    (void *)h, (int)s, (void *)d, (int)s);
+
+      if (lookup_host (ACC_memmap, h, s))
+	gomp_fatal ("host address [%p, +%d] is already mapped", (void *)h,
+		    (int)s);
+
+      if (lookup_dev (ACC_memmap, d, s))
+	gomp_fatal ("device address [%p, +%d] is already mapped", (void *)d,
+		    (int)s);
+
+      tgt = gomp_map_vars ((struct gomp_device_descr *) ACC_dev,
+			   &ACC_memmap->mem_map, mapnum, &hostaddrs,
+			   &devaddrs, &sizes, &kinds, true, false);
+    }
+
+  tgt->prev = ACC_memmap->tlist;
+  ACC_memmap->tlist = tgt;
+}
+
+void
+acc_unmap_data (void *h)
+{
+  /* No need to call lazy open, as the address must have been mapped.
+   */
+
+  size_t host_size;
+  splay_tree_key n = lookup_host (ACC_memmap, h, 1);
+  struct target_mem_desc *t;
+
+  if (!n)
+    gomp_fatal ("%p is not a mapped block", (void *)h);
+
+  host_size = n->host_end - n->host_start;
+
+  if (n->host_start != (uintptr_t) h)
+    gomp_fatal ("[%p,%d] surrounds1 %p",
+            (void *)n->host_start, (int)host_size, (void *)h);
+
+  t = n->tgt;
+
+  if (t->refcount == 2)
+    {
+      struct target_mem_desc *tp;
+
+      /* This is the last reference, so pull the descriptor off the 
+         chain. This avoids gomp_unmap_vars via gomp_unmap_tgt from
+         freeing the device memory. */
+      t->tgt_end = 0;
+      t->to_free = 0;
+
+      gomp_mutex_lock (&ACC_memmap->mem_map.lock);
+
+      for (tp = NULL, t = ACC_memmap->tlist; t != NULL; tp = t, t = t->prev)
+        {
+          if (n->tgt == t)
+            {
+              if (tp)
+                tp->prev = t->prev;
+              else
+                ACC_memmap->tlist = t->prev;
+
+              break; 
+            }
+        }
+
+      gomp_mutex_unlock (&ACC_memmap->mem_map.lock);
+    }
+
+  gomp_unmap_vars (t, true);
+}
+
+#define PCC_Present (1 << 0)
+#define PCC_Create (1 << 1)
+#define PCC_Copy (1 << 2)
+
+attribute_hidden void *
+present_create_copy (unsigned f, void *h, size_t s)
+{
+  void *d;
+  splay_tree_key n;
+
+  if (!h || !s)
+    gomp_fatal ("[%p,+%d] is a bad range", (void *)h, (int)s);
+
+  ACC_lazy_initialize ();
+
+  n = lookup_host (ACC_memmap, h, s);
+  if (n)
+    {
+      /* Present. */
+      d = (void *) (n->tgt->tgt_start + n->tgt_offset);
+
+      if (!(f & PCC_Present))
+        gomp_fatal ("[%p,+%d] already mapped to [%p,+%d]",
+            (void *)h, (int)s, (void *)d, (int)s);
+      if ((h + s) > (void *)n->host_end)    
+        gomp_fatal ("[%p,+%d] not mapped", (void *)h, (int)s);
+    }
+  else if (!(f & PCC_Create))
+    {
+      gomp_fatal ("[%p,+%d] not mapped", (void *)h, (int)s);
+    }
+  else
+    {
+      struct target_mem_desc *tgt;
+      size_t mapnum = 1;
+      unsigned short kinds;
+      void *hostaddrs = h;
+
+      if (f & PCC_Copy)
+        kinds = GOMP_MAP_ALLOC_TO;
+      else
+        kinds = GOMP_MAP_ALLOC;
+
+      tgt = gomp_map_vars ((struct gomp_device_descr *) ACC_dev,
+			   &ACC_memmap->mem_map, mapnum, &hostaddrs,
+			   NULL, &s, &kinds, true, false);
+
+      d = tgt->to_free;
+      tgt->prev = ACC_memmap->tlist;
+      ACC_memmap->tlist = tgt;
+    }
+  
+  return d;
+}
+
+void *
+acc_create (void *h, size_t s)
+{
+  return present_create_copy (PCC_Create, h, s);
+}
+
+void *
+acc_copyin (void *h, size_t s)
+{
+  return present_create_copy (PCC_Create | PCC_Copy, h, s);
+}
+
+void *
+acc_present_or_create (void *h, size_t s)
+{
+  return present_create_copy (PCC_Present | PCC_Create, h, s);
+}
+
+void *
+acc_present_or_copyin (void *h, size_t s)
+{
+  return present_create_copy (PCC_Present | PCC_Create | PCC_Copy, h, s);
+}
+
+#define DC_Copyout (1 << 0)
+
+static void
+delete_copyout (unsigned f, void *h, size_t s)
+{
+  size_t host_size;
+  splay_tree_key n;
+  void *d;
+
+  n = lookup_host (ACC_memmap, h, s);
+
+  /* No need to call lazy open, as the data must already have been
+     mapped.  */
+
+  if (!n)
+    gomp_fatal ("[%p,%d] is not mapped", (void *)h, (int)s);
+
+  d = (void *) (n->tgt->tgt_start + n->tgt_offset);
+
+  host_size = n->host_end - n->host_start;
+
+  if (n->host_start != (uintptr_t) h || host_size != s)
+    gomp_fatal ("[%p,%d] surrounds2 [%p,+%d]",
+            (void *)n->host_start, (int)host_size, (void *)h, (int)s);
+
+  if (f & DC_Copyout)
+    ACC_dev->device_dev2host_func (h, d, s);
+  
+  acc_unmap_data(h);
+
+  ACC_dev->device_free_func (d);
+}
+
+void
+acc_delete (void *h , size_t s)
+{
+  delete_copyout (0, h, s);
+}
+
+void acc_copyout (void *h, size_t s)
+{
+  delete_copyout (DC_Copyout, h, s);
+}
+
+static void
+update_dev_host (int is_dev, void *h, size_t s)
+{
+  splay_tree_key n;
+  void *d;
+
+  if (!ACC_memmap)
+    gomp_fatal ("[%p,%d] is not mapped", h, (int)s);
+
+  n = lookup_host (ACC_memmap, h, s);
+
+  /* No need to call lazy open, as the data must already have been
+     mapped.  */
+
+  if (!n)
+    gomp_fatal ("[%p,%d] is not mapped", h, (int)s);
+
+  d = (void *) (n->tgt->tgt_start + n->tgt_offset);
+
+  if (is_dev)
+    ACC_dev->device_host2dev_func (d, h, s);
+  else
+    ACC_dev->device_dev2host_func (h, d, s);
+
+}
+
+void
+acc_update_device (void *h, size_t s)
+{
+  update_dev_host (1, h, s);
+}
+
+void
+acc_update_self (void *h, size_t s)
+{
+  update_dev_host (0, h, s);
+}
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index 02fbb12..57ac8de 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -25,73 +25,242 @@
 
 /* This file handles OpenACC constructs.  */
 
+#include "openacc.h"
 #include "libgomp.h"
 #include "libgomp_g.h"
-#include "openacc.h"
+#include "gomp-constants.h"
+#include "target.h"
+#include <stdio.h>
+#include <string.h>
+#include <stdarg.h>
+#include <assert.h>
+#include <alloca.h>
+
+#ifdef FUTURE
+// device geometry per device type
+struct devgeom
+{
+  int gangs;
+  int workers;
+  int vectors;
+};
+  
+
+// XXX: acceptable defaults?
+static __thread struct devgeom devgeom = { 1, 1, 1 };
+#endif
+
+#ifdef LATER
+static void
+dump_devaddrs(void)
+{
+  int i;
+  struct devaddr *dp;
+
+  gomp_notify("++++ num_devaddrs %d\n", num_devaddrs);
+  for (dp = devaddrs, i = 1; dp != 0; dp = dp->next, i++)
+    {
+      gomp_notify("++++ %.02d) %p\n", i, dp->d);
+    }
+}
+#endif
+
+static void
+dump_var(char *s, size_t idx, void *hostaddr, size_t size, unsigned char kind)
+{
+  gomp_notify(" %2zi: %3s 0x%.2x -", idx, s, kind & 0xff);
+
+  switch (kind & 0xff)
+    {
+      case 0x00: gomp_notify(" ALLOC              "); break;
+      case 0x01: gomp_notify(" ALLOC TO           "); break;
+      case 0x02: gomp_notify(" ALLOC FROM         "); break;
+      case 0x03: gomp_notify(" ALLOC TOFROM       "); break;
+      case 0x04: gomp_notify(" POINTER            "); break;
+      case 0x05: gomp_notify(" TO_PSET            "); break;
+
+      case 0x08: gomp_notify(" FORCE_ALLOC        "); break;
+      case 0x09: gomp_notify(" FORCE_TO           "); break;
+      case 0x0a: gomp_notify(" FORCE_FROM         "); break;
+      case 0x0b: gomp_notify(" FORCE_TOFROM       "); break;
+      case 0x0c: gomp_notify(" FORCE_PRESENT      "); break;
+      case 0x0d: gomp_notify(" FORCE_DEALLOC      "); break;
+      case 0x0e: gomp_notify(" FORCE_DEVICEPTR    "); break;
+
+      case 0x18: gomp_notify(" FORCE_PRIVATE      "); break;
+      case 0x19: gomp_notify(" FORCE_FIRSTPRIVATE "); break;
+
+      case (unsigned char) -1: gomp_notify(" DUMMY              "); break;
+      default: gomp_notify("UGH! 0x%x\n", kind);
+    }
+    
+  gomp_notify("- %d - %4d/0x%04x ", 1 << (kind >> 8), (int)size, (int)size);
+  gomp_notify("- %p\n", hostaddr);
+
+  return;
+}
+
+/* Ensure that the target device for DEVICE_TYPE is initialised (and that
+   plugins have been loaded if appropriate).  The ACC_dev variable for the
+   current thread will be set appropriately for the given device type on
+   return.  */
+
+attribute_hidden void
+select_acc_device (int device_type)
+{
+  if (device_type == GOMP_IF_CLAUSE_FALSE)
+    return;
+
+  if (device_type == acc_device_none)
+    device_type = acc_device_host;
+
+  if (device_type >= 0)
+    {
+      /* NOTE: this will go badly if the surrounding data environment is set up
+         to use a different device type.  We'll just have to trust that users
+	 know what they're doing...  */
+      acc_set_device_type (device_type);
+    }
+
+  ACC_lazy_initialize ();
+}
+
+void goacc_wait (int async, int num_waits, va_list ap);
 
 void
 GOACC_parallel (int device, void (*fn) (void *), const void *openmp_target,
 		size_t mapnum, void **hostaddrs, size_t *sizes,
 		unsigned short *kinds,
-		int num_gangs, int num_workers, int vector_length)
+		int num_gangs, int num_workers, int vector_length,
+		int async, int num_waits, ...)
 {
-  unsigned char kinds_[mapnum];
-  size_t i;
+  bool if_clause_condition_value = device != GOMP_IF_CLAUSE_FALSE;
+  va_list ap;
+  struct target_mem_desc *tgt;
+  void **devaddrs;
+  unsigned int i;
+  struct splay_tree_key_s k;
+  splay_tree_key tgt_fn_key;
+  void (*tgt_fn);
 
-  /* TODO.  Eventually, we'll be interpreting all mapping kinds according to
-     the OpenACC semantics; for now we're re-using what is implemented for
-     OpenMP.  */
-  for (i = 0; i < mapnum; ++i)
-    {
-      unsigned char kind = kinds[i];
-      unsigned char align = kinds[i] >> 8;
-      if (kind > 4)
-	gomp_fatal ("memory mapping kind %x for %zd is not yet supported",
-		    kind, i);
-
-      kinds_[i] = kind | align << 3;
-    }
   if (num_gangs != 1)
     gomp_fatal ("num_gangs (%d) different from one is not yet supported",
 		num_gangs);
   if (num_workers != 1)
     gomp_fatal ("num_workers (%d) different from one is not yet supported",
 		num_workers);
-  if (vector_length != 1)
-    gomp_fatal ("vector_length (%d) different from one is not yet supported",
-		vector_length);
 
-  GOMP_target (device, fn, openmp_target, mapnum, hostaddrs, sizes, kinds_);
+  gomp_notify ("%s: mapnum=%zd, hostaddrs=%p, sizes=%p, kinds=%p, async=%d\n",
+	       __FUNCTION__, mapnum, hostaddrs, sizes, kinds, async);
+
+  select_acc_device (device);
+
+  /* Host fallback if "if" clause is false or if the current device is set to
+     the host.  */
+  if (!if_clause_condition_value)
+    {
+      ACC_save_and_set_bind (acc_device_host);
+      fn (hostaddrs);
+      ACC_restore_bind ();
+      return;
+    }
+  else if (acc_device_type (ACC_dev->type) == acc_device_host)
+    {
+      fn (hostaddrs);
+      return;
+    }
+
+  va_start (ap, num_waits);
+  
+  if (num_waits > 0)
+    goacc_wait (async, num_waits, ap);
+
+  va_end (ap);
+
+  ACC_dev->openacc.async_set_async_func (async);
+
+  if (!(ACC_dev->capabilities & TARGET_CAP_NATIVE_EXEC))
+    {
+      k.host_start = (uintptr_t) fn;
+      k.host_end = k.host_start + 1;
+      gomp_mutex_lock (&ACC_memmap->mem_map.lock);
+      tgt_fn_key = splay_tree_lookup (&ACC_memmap->mem_map.splay_tree, &k);
+      gomp_mutex_unlock (&ACC_memmap->mem_map.lock);
+
+      if (tgt_fn_key == NULL)
+	gomp_fatal ("target function wasn't mapped: perhaps -fopenacc was "
+		    "used without -flto?");
+
+      tgt_fn = (void (*)) tgt_fn_key->tgt->tgt_start;
+    }
+  else
+    tgt_fn = (void (*)) fn;
+
+  tgt = gomp_map_vars ((struct gomp_device_descr *) ACC_dev,
+		       &ACC_memmap->mem_map, mapnum, hostaddrs,
+		       NULL, sizes, kinds, true, false);
+
+  devaddrs = alloca (sizeof (void *) * mapnum);
+  for (i = 0; i < mapnum; i++)
+    devaddrs[i] = (void *) (tgt->list[i]->tgt->tgt_start
+			    + tgt->list[i]->tgt_offset);
+
+  ACC_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs, sizes, kinds,
+			      num_gangs, num_workers, vector_length, async,
+			      tgt);
+
+  /* If running synchronously, unmap immediately.  */
+  if (async < acc_async_noval)
+    gomp_unmap_vars (tgt, true);
+  else
+    gomp_copy_from_async (tgt);
+
+  ACC_dev->openacc.async_set_async_func (acc_async_sync);
 }
 
+static __thread struct target_mem_desc *mapped_data = NULL;
 
 void
 GOACC_data_start (int device, const void *openmp_target, size_t mapnum,
 		  void **hostaddrs, size_t *sizes, unsigned short *kinds)
 {
-  unsigned char kinds_[mapnum];
-  size_t i;
+  bool if_clause_condition_value = device != GOMP_IF_CLAUSE_FALSE;
+  struct target_mem_desc *tgt;
 
-  /* TODO.  Eventually, we'll be interpreting all mapping kinds according to
-     the OpenACC semantics; for now we're re-using what is implemented for
-     OpenMP.  */
-  for (i = 0; i < mapnum; ++i)
+  gomp_notify ("%s: mapnum=%zd, hostaddrs=%p, sizes=%p, kinds=%p\n",
+	       __FUNCTION__, mapnum, hostaddrs, sizes, kinds);
+
+  select_acc_device (device);
+
+  /* Host fallback or 'do nothing'.  */
+  if ((ACC_dev->capabilities & TARGET_CAP_SHARED_MEM)
+      || !if_clause_condition_value)
     {
-      unsigned char kind = kinds[i];
-      unsigned char align = kinds[i] >> 8;
-      if (kind > 4)
-	gomp_fatal ("memory mapping kind %x for %zd is not yet supported",
-		    kind, i);
+      tgt = gomp_map_vars (NULL, NULL, 0, NULL, NULL, NULL, NULL, true, false);
+      tgt->prev = mapped_data;
+      mapped_data = tgt;
 
-      kinds_[i] = kind | align << 3;
+      return;
     }
-  GOMP_target_data (device, openmp_target, mapnum, hostaddrs, sizes, kinds_);
+
+  gomp_notify ("  %s: prepare mappings\n", __FUNCTION__);
+  tgt = gomp_map_vars ((struct gomp_device_descr *) ACC_dev,
+		       &ACC_memmap->mem_map, mapnum, hostaddrs,
+		       NULL, sizes, kinds, true, false);
+  gomp_notify ("  %s: mappings prepared\n", __FUNCTION__);
+  tgt->prev = mapped_data;
+  mapped_data = tgt;
 }
 
 void
 GOACC_data_end (void)
 {
-  GOMP_target_end_data ();
+  struct target_mem_desc *tgt = mapped_data;
+
+  gomp_notify ("  %s: restore mappings\n", __FUNCTION__);
+  mapped_data = tgt->prev;
+  gomp_unmap_vars (tgt, true);
+  gomp_notify ("  %s: mappings restored\n", __FUNCTION__);
 }
 
 
@@ -99,42 +268,139 @@ void
 GOACC_kernels (int device, void (*fn) (void *), const void *openmp_target,
 	       size_t mapnum, void **hostaddrs, size_t *sizes,
 	       unsigned short *kinds,
-	       int num_gangs, int num_workers, int vector_length)
+	       int num_gangs, int num_workers, int vector_length,
+	       int async, int num_waits, ...)
 {
+  gomp_notify ("%s: mapnum=%zd, hostaddrs=%p, sizes=%p, kinds=%p\n", __FUNCTION__,
+	 mapnum, hostaddrs, sizes, kinds);
+
+  va_list ap;
+
+  select_acc_device (device);
+
+  va_start (ap, num_waits);
+
+  if (num_waits > 0)
+    goacc_wait (async, num_waits, ap);
+
+  va_end (ap);
+
   /* TODO.  */
   GOACC_parallel (device, fn, openmp_target, mapnum, hostaddrs, sizes, kinds,
-		  num_gangs, num_workers, vector_length);
+		  num_gangs, num_workers, vector_length, async, num_waits);
 }
 
+void
+goacc_wait (int async, int num_waits, va_list ap)
+{
+  int i;
+
+  assert (num_waits >= 0);
+
+  if (async == acc_async_sync && num_waits == 0)
+    {
+      acc_wait_all ();
+      return;
+    }
+
+  if (async == acc_async_sync && num_waits)
+    {
+      for (i = 0; i < num_waits; i++)
+        {
+          int qid = va_arg (ap, int);
+
+          if (acc_async_test (qid))
+            continue;
+
+          acc_wait (qid);
+        }
+      return;
+    }
+
+  if (async == acc_async_noval && num_waits == 0)
+    {
+      ACC_dev->openacc.async_wait_all_async_func (acc_async_noval);
+      return;
+    }
+
+  for (i = 0; i < num_waits; i++)
+    {
+      int qid = va_arg (ap, int);
+
+      if (acc_async_test (qid))
+	continue;
+
+      /* If we're waiting on the same asynchronous queue as we're launching on,
+         the queue itself will order work as required, so there's no need to
+	 wait explicitly.  */
+      if (qid != async)
+	ACC_dev->openacc.async_wait_async_func (qid, async);
+    }
+}
 
 void
 GOACC_update (int device, const void *openmp_target, size_t mapnum,
-	      void **hostaddrs, size_t *sizes, unsigned short *kinds)
+	      void **hostaddrs, size_t *sizes, unsigned short *kinds,
+	      int async, int num_waits, ...)
 {
-  unsigned char kinds_[mapnum];
+  bool if_clause_condition_value = device != GOMP_IF_CLAUSE_FALSE;
   size_t i;
 
-  /* TODO.  Eventually, we'll be interpreting all mapping kinds according to
-     the OpenACC semantics; for now we're re-using what is implemented for
-     OpenMP.  */
+  select_acc_device (device);
+
+  if ((ACC_dev->capabilities & TARGET_CAP_SHARED_MEM)
+      || !if_clause_condition_value)
+    return;
+
+  if (num_waits > 0)
+    {
+      va_list ap;
+
+      va_start (ap, num_waits);
+
+      goacc_wait (async, num_waits, ap);
+
+      va_end (ap);
+    }
+
+  ACC_dev->openacc.async_set_async_func (async);
+
   for (i = 0; i < mapnum; ++i)
     {
-      unsigned char kind = kinds[i];
-      unsigned char align = kinds[i] >> 8;
-      if (kind > 4)
-	gomp_fatal ("memory mapping kind %x for %zd is not yet supported",
-		    kind, i);
+      unsigned char kind = kinds[i] & 0xff;
+
+      dump_var("UPD", i, hostaddrs[i], sizes[i], kinds[i]);
+
+      switch (kind)
+	{
+	  case GOMP_MAP_POINTER:
+	     break;
+
+	  case GOMP_MAP_FORCE_TO:
+	     acc_update_device (hostaddrs[i], sizes[i]);
+	     break;
 
-      kinds_[i] = kind | align << 3;
+	  case GOMP_MAP_FORCE_FROM:
+	     acc_update_self (hostaddrs[i], sizes[i]);
+	     break;
+
+	  default:
+	     gomp_fatal (">>>> GOACC_update UNHANDLED kind 0x%.2x", kind);
+	     break;
+	}
     }
-  GOMP_target_update (device, openmp_target, mapnum, hostaddrs, sizes, kinds_);
+
+  ACC_dev->openacc.async_set_async_func (acc_async_sync);
 }
 
-/* TODO: Move elsewhere.  */
-int
-acc_on_device (acc_device_t dev)
+void
+GOACC_wait (int async, int num_waits, ...)
 {
-  /* Just rely on the compiler builtin.  */
-  return __builtin_acc_on_device (dev);
+  va_list ap;
+
+  va_start (ap, num_waits);
+
+  goacc_wait (async, num_waits, ap);
+
+  va_end (ap);
 }
-ialias (acc_on_device)
diff --git a/libgomp/oacc-plugin.c b/libgomp/oacc-plugin.c
new file mode 100644
index 0000000..c335b51
--- /dev/null
+++ b/libgomp/oacc-plugin.c
@@ -0,0 +1,44 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Initialize and register OpenACC dispatch table from libgomp plugin.  */
+
+#include "libgomp.h"
+#include "oacc-plugin.h"
+#include "target.h"
+
+void
+ACC_plugin_register (struct gomp_device_descr *device)
+{
+  ACC_register (device);
+}
+
+
+void
+gomp_plugin_async_unmap_vars (void *ptr)
+{
+  struct target_mem_desc *tgt = ptr;
+  
+  gomp_unmap_vars (tgt, false);
+}
diff --git a/libgomp/oacc-plugin.h b/libgomp/oacc-plugin.h
new file mode 100644
index 0000000..0493a12
--- /dev/null
+++ b/libgomp/oacc-plugin.h
@@ -0,0 +1,32 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _OACC_PLUGIN_H
+#define _OACC_PLUGIN_H 1
+
+#include "target.h"
+
+extern void ACC_plugin_register (struct gomp_device_descr *dev);
+
+#endif
diff --git a/libgomp/openacc.f90 b/libgomp/openacc.f90
index 70b58d6..71f376a 100644
--- a/libgomp/openacc.f90
+++ b/libgomp/openacc.f90
@@ -1,8 +1,9 @@
 !  OpenACC Runtime Library Definitions.
 
-!  Copyright (C) 2013-2014 Free Software Foundation, Inc.
+!  Copyright (C) 2014 Free Software Foundation, Inc.
 
-!  Contributed by Thomas Schwinge <thomas@codesourcery.com>.
+!  Contributed by Tobias Burnus <burnus@net-b.de>
+!             and James Norris <jnorris@codesourcery.com>
 
 !  This file is part of the GNU OpenMP Library (libgomp).
 
@@ -26,29 +27,927 @@
 !  <http://www.gnu.org/licenses/>.
 
 module openacc_kinds
+  use iso_fortran_env, only: int32
   implicit none
 
-  integer, parameter :: acc_device_kind = 4
+  private :: int32
+  public :: acc_device_kind
 
-end module openacc_kinds
+  integer, parameter :: acc_device_kind = int32
+
+  public :: acc_device_none, acc_device_default, acc_device_host
+  public :: acc_device_not_host, acc_device_nvidia
+
+  integer (acc_device_kind), parameter :: acc_device_none = 0
+  integer (acc_device_kind), parameter :: acc_device_default = 1
+  integer (acc_device_kind), parameter :: acc_device_host = 2
+  integer (acc_device_kind), parameter :: acc_device_host_nonshm = 3
+  integer (acc_device_kind), parameter :: acc_device_not_host = 4
+  integer (acc_device_kind), parameter :: acc_device_nvidia = 5
+
+  public :: acc_handle_kind
+
+  integer, parameter :: acc_handle_kind = int32
+
+  public :: acc_async_noval, acc_async_sync
+
+  integer (acc_handle_kind), parameter :: acc_async_noval = -1
+  integer (acc_handle_kind), parameter :: acc_async_sync = -2
+
+end module
+
+module openacc_internal
+  use openacc_kinds
+  implicit none
+
+  interface
+    function acc_get_num_devices_h (d)
+      import
+      integer acc_get_num_devices_h
+      integer (acc_device_kind) d
+    end function
+
+    subroutine acc_set_device_type_h (d)
+      import
+      integer (acc_device_kind) d
+    end subroutine
+
+    function acc_get_device_type_h ()
+      import
+      integer (acc_device_kind) acc_get_device_type_h
+    end function
+
+    subroutine acc_set_device_num_h (n, d)
+      import
+      integer n
+      integer (acc_device_kind) d
+    end subroutine
+
+    function acc_get_device_num_h (d)
+      import
+      integer acc_get_device_num_h
+      integer (acc_device_kind) d
+    end function
+
+    function acc_async_test_h (a)
+      logical acc_async_test_h
+      integer a
+    end function
+
+    function acc_async_test_all_h ()
+      logical acc_async_test_all_h
+    end function
+
+    subroutine acc_wait_h (a)
+      integer a
+    end subroutine
+
+    subroutine acc_wait_async_h (a1, a2)
+      integer a1, a2
+    end subroutine
+
+    subroutine acc_wait_all_h ()
+    end subroutine
+
+    subroutine acc_wait_all_async_h (a)
+      integer a
+    end subroutine
+
+    subroutine acc_init_h (d)
+      import
+      integer (acc_device_kind) d
+    end subroutine
+
+    subroutine acc_shutdown_h (d)
+      import
+      integer (acc_device_kind) d
+    end subroutine
+
+    function acc_on_device_h (d)
+      import
+      integer (acc_device_kind) d
+      logical acc_on_device_h
+    end function
+
+    subroutine acc_copyin_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_copyin_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_copyin_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_present_or_copyin_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_present_or_copyin_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_present_or_copyin_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_create_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_create_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_create_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_present_or_create_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_present_or_create_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_present_or_create_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_copyout_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_copyout_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_copyout_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_delete_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_delete_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_delete_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_update_device_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_update_device_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_update_device_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    subroutine acc_update_self_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end subroutine
+
+    subroutine acc_update_self_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end subroutine
+
+    subroutine acc_update_self_array_h (a)
+      type (*), dimension (..), contiguous :: a
+    end subroutine
+
+    function acc_is_present_32_h (a, len)
+      use iso_c_binding, only: c_int32_t
+      logical acc_is_present_32_h
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int32_t) len
+    end function
+
+    function acc_is_present_64_h (a, len)
+      use iso_c_binding, only: c_int64_t
+      logical acc_is_present_64_h
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_int64_t) len
+    end function
+
+    function acc_is_present_array_h (a)
+      logical acc_is_present_array_h
+      type (*), dimension (..), contiguous :: a
+    end function
+  end interface
+
+  interface
+    function acc_get_num_devices_l (d) &
+        bind (C, name = "acc_get_num_devices")
+      use iso_c_binding, only: c_int
+      integer (c_int) :: acc_get_num_devices_l
+      integer (c_int), value :: d
+    end function
+
+    subroutine acc_set_device_type_l (d) &
+        bind (C, name = "acc_set_device_type")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: d
+    end subroutine
+
+    function acc_get_device_type_l () &
+        bind (C, name = "acc_get_device_type")
+      use iso_c_binding, only: c_int
+      integer (c_int) :: acc_get_device_type_l
+    end function
+
+    subroutine acc_set_device_num_l (n, d) &
+        bind (C, name = "acc_set_device_num")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: n, d
+    end subroutine
+
+    function acc_get_device_num_l (d) &
+        bind (C, name = "acc_get_device_num")
+      use iso_c_binding, only: c_int
+      integer (c_int) :: acc_get_device_num_l
+      integer (c_int), value :: d
+    end function
+
+    function acc_async_test_l (a) &
+        bind (C, name = "acc_async_test")
+      use iso_c_binding, only: c_int
+      integer (c_int) :: acc_async_test_l
+      integer (c_int), value :: a
+    end function
+
+    function acc_async_test_all_l () &
+        bind (C, name = "acc_async_test_all")
+      use iso_c_binding, only: c_int
+      integer (c_int) :: acc_async_test_all_l
+    end function
+
+    subroutine acc_wait_l (a) &
+        bind (C, name = "acc_wait")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: a
+    end subroutine
+
+    subroutine acc_wait_async_l (a1, a2) &
+        bind (C, name = "acc_wait_async")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: a1, a2
+    end subroutine
+
+    subroutine acc_wait_all_l () &
+        bind (C, name = "acc_wait_all")
+      use iso_c_binding, only: c_int
+    end subroutine
+
+    subroutine acc_wait_all_async_l (a) &
+        bind (C, name = "acc_wait_all_async")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: a
+    end subroutine
+
+    subroutine acc_init_l (d) &
+        bind (C, name = "acc_init")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: d
+    end subroutine
+
+    subroutine acc_shutdown_l (d) &
+        bind (C, name = "acc_shutdown")
+      use iso_c_binding, only: c_int
+      integer (c_int), value :: d
+    end subroutine
+
+    function acc_on_device_l (d) &
+        bind (C, name = "acc_on_device")
+      use iso_c_binding, only: c_int
+      integer (c_int) :: acc_on_device_l
+      integer (c_int), value :: d
+    end function
+
+    subroutine acc_copyin_l (a, len) &
+        bind (C, name = "acc_copyin")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_present_or_copyin_l (a, len) &
+        bind (C, name = "acc_present_or_copyin")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_create_l (a, len) &
+        bind (C, name = "acc_create")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_present_or_create_l (a, len) &
+        bind (C, name = "acc_present_or_create")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_copyout_l (a, len) &
+        bind (C, name = "acc_copyout")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_delete_l (a, len) &
+        bind (C, name = "acc_delete")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_update_device_l (a, len) &
+        bind (C, name = "acc_update_device")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    subroutine acc_update_self_l (a, len) &
+        bind (C, name = "acc_update_self")
+      use iso_c_binding, only: c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end subroutine
+
+    function acc_is_present_l (a, len) &
+        bind (C, name = "acc_is_present")
+      use iso_c_binding, only: c_int32_t, c_size_t
+      !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+      integer (c_int32_t) :: acc_is_present_l
+      type (*), dimension (*) :: a
+      integer (c_size_t), value :: len
+    end function
+  end interface
+end module
 
 module openacc
   use openacc_kinds
+  use openacc_internal
   implicit none
 
+  public :: openacc_version
+
+  public :: acc_get_num_devices, acc_set_device_type, acc_get_device_type
+  public :: acc_set_device_num, acc_get_device_num, acc_async_test
+  public :: acc_async_test_all, acc_wait, acc_wait_async, acc_wait_all
+  public :: acc_wait_all_async, acc_init, acc_shutdown, acc_on_device
+  public :: acc_copyin, acc_present_or_copyin, acc_pcopyin, acc_create
+  public :: acc_present_or_create, acc_pcreate, acc_copyout, acc_delete
+  public :: acc_update_device, acc_update_self, acc_is_present
+
   integer, parameter :: openacc_version = 201306
 
-  integer (acc_device_kind), parameter :: acc_device_none = 0
-  integer (acc_device_kind), parameter :: acc_device_default = 1
-  integer (acc_device_kind), parameter :: acc_device_host = 2
-  integer (acc_device_kind), parameter :: acc_device_not_host = 3
+  interface acc_get_num_devices
+    procedure :: acc_get_num_devices_h
+  end interface
 
-  interface
-     function acc_on_device (dev)
-       use openacc_kinds
-       logical (4) :: acc_on_device
-       integer (acc_device_kind), intent (in) :: dev
-     end function acc_on_device
+  interface acc_set_device_type
+    procedure :: acc_set_device_type_h
+  end interface
+
+  interface acc_get_device_type
+    procedure :: acc_get_device_type_h
+  end interface
+
+  interface acc_set_device_num
+    procedure :: acc_set_device_num_h
+  end interface
+
+  interface acc_get_device_num
+    procedure :: acc_get_device_num_h
   end interface
 
-end module openacc
+  interface acc_async_test
+    procedure :: acc_async_test_h
+  end interface
+
+  interface acc_async_test_all
+    procedure :: acc_async_test_all_h
+  end interface
+
+  interface acc_wait
+    procedure :: acc_wait_h
+  end interface
+
+  interface acc_wait_async
+    procedure :: acc_wait_async_h
+  end interface
+
+  interface acc_wait_all
+    procedure :: acc_wait_all_h
+  end interface
+
+  interface acc_wait_all_async
+    procedure :: acc_wait_all_async_h
+  end interface
+
+  interface acc_init
+    procedure :: acc_init_h
+  end interface
+
+  interface acc_shutdown
+    procedure :: acc_shutdown_h
+  end interface
+
+  interface acc_on_device
+    procedure :: acc_on_device_h
+  end interface
+
+  ! acc_malloc: Only available in C/C++
+  ! acc_free: Only available in C/C++
+
+  ! As vendor extension, the following code supports both 32bit and 64bit
+  ! arguments for "size"; the OpenACC standard only permits default-kind
+  ! integers, which are of kind 4 (i.e. 32 bits).
+  ! Additionally, the two-argument version also takes arrays as argument.
+  ! and the one argument version also scalars. Note that the code assumes
+  ! that the arrays are contiguous.
+
+  interface acc_copyin
+    procedure :: acc_copyin_32_h
+    procedure :: acc_copyin_64_h
+    procedure :: acc_copyin_array_h
+  end interface
+
+  interface acc_present_or_copyin
+    procedure :: acc_present_or_copyin_32_h
+    procedure :: acc_present_or_copyin_64_h
+    procedure :: acc_present_or_copyin_array_h
+  end interface
+
+  interface acc_pcopyin
+    procedure :: acc_present_or_copyin_32_h
+    procedure :: acc_present_or_copyin_64_h
+    procedure :: acc_present_or_copyin_array_h
+  end interface
+
+  interface acc_create
+    procedure :: acc_create_32_h
+    procedure :: acc_create_64_h
+    procedure :: acc_create_array_h
+  end interface
+
+  interface acc_present_or_create
+    procedure :: acc_present_or_create_32_h
+    procedure :: acc_present_or_create_64_h
+    procedure :: acc_present_or_create_array_h
+  end interface
+
+  interface acc_pcreate
+    procedure :: acc_present_or_create_32_h
+    procedure :: acc_present_or_create_64_h
+    procedure :: acc_present_or_create_array_h
+  end interface
+
+  interface acc_copyout
+    procedure :: acc_copyout_32_h
+    procedure :: acc_copyout_64_h
+    procedure :: acc_copyout_array_h
+  end interface
+
+  interface acc_delete
+    procedure :: acc_delete_32_h
+    procedure :: acc_delete_64_h
+    procedure :: acc_delete_array_h
+  end interface
+
+  interface acc_update_device
+    procedure :: acc_update_device_32_h
+    procedure :: acc_update_device_64_h
+    procedure :: acc_update_device_array_h
+  end interface
+
+  interface acc_update_self
+    procedure :: acc_update_self_32_h
+    procedure :: acc_update_self_64_h
+    procedure :: acc_update_self_array_h
+  end interface
+
+  ! acc_map_data: Only available in C/C++
+  ! acc_unmap_data: Only available in C/C++
+  ! acc_deviceptr: Only available in C/C++
+  ! acc_hostptr: Only available in C/C++
+
+  interface acc_is_present
+    procedure :: acc_is_present_32_h
+    procedure :: acc_is_present_64_h
+    procedure :: acc_is_present_array_h
+  end interface
+
+  ! acc_memcpy_to_device: Only available in C/C++
+  ! acc_memcpy_from_device: Only available in C/C++
+
+end module
+
+function acc_get_num_devices_h (d)
+  use openacc_internal, only: acc_get_num_devices_l
+  use openacc_kinds
+  integer acc_get_num_devices_h
+  integer (acc_device_kind) d
+  acc_get_num_devices_h = acc_get_num_devices_l (d)
+end function
+
+subroutine acc_set_device_type_h (d)
+  use openacc_internal, only: acc_set_device_type_l
+  use openacc_kinds
+  integer (acc_device_kind) d
+  call acc_set_device_type_l (d)
+end subroutine
+
+function acc_get_device_type_h ()
+  use openacc_internal, only: acc_get_device_type_l
+  use openacc_kinds
+  integer (acc_device_kind) acc_get_device_type_h
+  acc_get_device_type_h = acc_get_device_type_l ()
+end function
+
+subroutine acc_set_device_num_h (n, d)
+  use openacc_internal, only: acc_set_device_num_l
+  use openacc_kinds
+  integer n
+  integer (acc_device_kind) d
+  call acc_set_device_num_l (n, d)
+end subroutine
+
+function acc_get_device_num_h (d)
+  use openacc_internal, only: acc_get_device_num_l
+  use openacc_kinds
+  integer acc_get_device_num_h
+  integer (acc_device_kind) d
+  acc_get_device_num_h = acc_get_device_num_l (d)
+end function
+
+function acc_async_test_h (a)
+  use openacc_internal, only: acc_async_test_l
+  logical acc_async_test_h
+  integer a
+  if (acc_async_test_l (a) .eq. 1) then
+    acc_async_test_h = .TRUE.
+  else
+    acc_async_test_h = .FALSE.
+  end if
+end function
+
+function acc_async_test_all_h ()
+  use openacc_internal, only: acc_async_test_all_l
+  logical acc_async_test_all_h
+  if (acc_async_test_all_l () .eq. 1) then
+    acc_async_test_all_h = .TRUE.
+  else
+    acc_async_test_all_h = .FALSE.
+  end if
+end function
+
+subroutine acc_wait_h (a)
+  use openacc_internal, only: acc_wait_l
+  integer a
+  call acc_wait_l (a)
+end subroutine
+
+subroutine acc_wait_async_h (a1, a2)
+  use openacc_internal, only: acc_wait_async_l
+  integer a1, a2
+  call acc_wait_async_l (a1, a2)
+end subroutine
+
+subroutine acc_wait_all_h ()
+  use openacc_internal, only: acc_wait_all_l
+  call acc_wait_all_l ()
+end subroutine
+
+subroutine acc_wait_all_async_h (a)
+  use openacc_internal, only: acc_wait_all_async_l
+  integer a
+  call acc_wait_all_async_l (a)
+end subroutine
+
+subroutine acc_init_h (d)
+  use openacc_internal, only: acc_init_l
+  use openacc_kinds
+  integer (acc_device_kind) d
+  call acc_init_l (d)
+end subroutine
+
+subroutine acc_shutdown_h (d)
+  use openacc_internal, only: acc_shutdown_l
+  use openacc_kinds
+  integer (acc_device_kind) d
+  call acc_shutdown_l (d)
+end subroutine
+
+function acc_on_device_h (d)
+  use openacc_internal, only: acc_on_device_l
+  use openacc_kinds
+  integer (acc_device_kind) d
+  logical acc_on_device_h
+  if (acc_on_device_l (d) .eq. 1) then
+    acc_on_device_h = .TRUE.
+  else
+    acc_on_device_h = .FALSE.
+  end if
+end function
+
+subroutine acc_copyin_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_copyin_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_copyin_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_copyin_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_copyin_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_copyin_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_copyin_array_h (a)
+  use openacc_internal, only: acc_copyin_l
+  type (*), dimension (..), contiguous :: a
+  call acc_copyin_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_present_or_copyin_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_present_or_copyin_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_present_or_copyin_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_present_or_copyin_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_present_or_copyin_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_present_or_copyin_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_present_or_copyin_array_h (a)
+  use openacc_internal, only: acc_present_or_copyin_l
+  type (*), dimension (..), contiguous :: a
+  call acc_present_or_copyin_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_create_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_create_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_create_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_create_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_create_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_create_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_create_array_h (a)
+  use openacc_internal, only: acc_create_l
+  type (*), dimension (..), contiguous :: a
+  call acc_create_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_present_or_create_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_present_or_create_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_present_or_create_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_present_or_create_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_present_or_create_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_present_or_create_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_present_or_create_array_h (a)
+  use openacc_internal, only: acc_present_or_create_l
+  type (*), dimension (..), contiguous :: a
+  call acc_present_or_create_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_copyout_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_copyout_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_copyout_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_copyout_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_copyout_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_copyout_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_copyout_array_h (a)
+  use openacc_internal, only: acc_copyout_l
+  type (*), dimension (..), contiguous :: a
+  call acc_copyout_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_delete_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_delete_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_delete_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_delete_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_delete_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_delete_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_delete_array_h (a)
+  use openacc_internal, only: acc_delete_l
+  type (*), dimension (..), contiguous :: a
+  call acc_delete_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_update_device_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_update_device_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_update_device_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_update_device_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_update_device_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_update_device_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_update_device_array_h (a)
+  use openacc_internal, only: acc_update_device_l
+  type (*), dimension (..), contiguous :: a
+  call acc_update_device_l (a, sizeof (a))
+end subroutine
+
+subroutine acc_update_self_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_update_self_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  call acc_update_self_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_update_self_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_update_self_l
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  call acc_update_self_l (a, int (len, kind = c_size_t))
+end subroutine
+
+subroutine acc_update_self_array_h (a)
+  use openacc_internal, only: acc_update_self_l
+  type (*), dimension (..), contiguous :: a
+  call acc_update_self_l (a, sizeof (a))
+end subroutine
+
+function acc_is_present_32_h (a, len)
+  use iso_c_binding, only: c_int32_t, c_size_t
+  use openacc_internal, only: acc_is_present_l
+  logical acc_is_present_32_h
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int32_t) len
+  if (acc_is_present_l (a, int (len, kind = c_size_t)) .eq. 1) then
+    acc_is_present_32_h = .TRUE.
+  else
+    acc_is_present_32_h = .FALSE.
+  end if
+end function
+
+function acc_is_present_64_h (a, len)
+  use iso_c_binding, only: c_int64_t, c_size_t
+  use openacc_internal, only: acc_is_present_l
+  logical acc_is_present_64_h
+  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+  type (*), dimension (*) :: a
+  integer (c_int64_t) len
+  if (acc_is_present_l (a, int (len, kind = c_size_t)) .eq. 1) then
+    acc_is_present_64_h = .TRUE.
+  else
+    acc_is_present_64_h = .FALSE.
+  end if
+end function
+
+function acc_is_present_array_h (a)
+  use openacc_internal, only: acc_is_present_l
+  logical acc_is_present_array_h
+  type (*), dimension (..), contiguous :: a
+  acc_is_present_array_h = acc_is_present_l (a, sizeof (a)) == 1
+end function
diff --git a/libgomp/openacc.h b/libgomp/openacc.h
index cde7429..cf40d07 100644
--- a/libgomp/openacc.h
+++ b/libgomp/openacc.h
@@ -1,6 +1,6 @@
-/* OpenACC Runtime Library Declarations
+/* OpenACC Runtime Library User-facing Declarations
 
-   Copyright (C) 2013 Free Software Foundation, Inc.
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
 
    Contributed by Thomas Schwinge <thomas@codesourcery.com>.
 
@@ -28,27 +28,98 @@
 #ifndef _OPENACC_H
 #define _OPENACC_H 1
 
+#include "gomp-constants.h"
+
+/* The OpenACC std is silent on whether or not including openacc.h
+   might or must not include other header files.  We chose to include
+   some.  */
+#include <stddef.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#ifdef __cplusplus
+#if __cplusplus >= 201103
+# define __GOACC_NOTHROW noexcept ()
+#elif __cplusplus
 # define __GOACC_NOTHROW throw ()
-#else
+#else /* Not C++ */
 # define __GOACC_NOTHROW __attribute__ ((__nothrow__))
 #endif
 
-typedef enum acc_device_t
-  {
-    acc_device_none = 0,
-    acc_device_default, /* This has to be a distinct value, as no
-			   return value can match it.  */
-    acc_device_host = 2,
-    acc_device_not_host = 3
-  } acc_device_t;
+  /* Types */
+  typedef enum acc_device_t
+    {
+      acc_device_none = 0,
+      acc_device_default, /* This has to be a distinct value, as no
+			     return value can match it.  */
+      acc_device_host = GOMP_TARGET_HOST,
+      acc_device_host_nonshm = GOMP_TARGET_HOST_NONSHM,
+      acc_device_not_host,
+      acc_device_nvidia = GOMP_TARGET_NVIDIA_PTX,
+      _ACC_device_hwm
+    } acc_device_t;
+
+  typedef enum acc_async_t
+    {
+      acc_async_noval = -1,
+      acc_async_sync  = -2
+    } acc_async_t;
+
+  int acc_get_num_devices (acc_device_t __dev) __GOACC_NOTHROW;
+  void acc_set_device_type (acc_device_t __dev) __GOACC_NOTHROW;
+  acc_device_t acc_get_device_type (void) __GOACC_NOTHROW;
+  void acc_set_device_num (int __num, acc_device_t __dev) __GOACC_NOTHROW;
+  int acc_get_device_num (acc_device_t __dev) __GOACC_NOTHROW;
+  int acc_async_test (int __async) __GOACC_NOTHROW;
+  int acc_async_test_all (void) __GOACC_NOTHROW;
+  void acc_wait (int __async) __GOACC_NOTHROW;
+  void acc_wait_async (int __async1, int __async2) __GOACC_NOTHROW;
+  void acc_wait_all (void) __GOACC_NOTHROW;
+  void acc_wait_all_async (int __async) __GOACC_NOTHROW;
+  void acc_init (acc_device_t __dev) __GOACC_NOTHROW;
+  void acc_shutdown (acc_device_t __dev) __GOACC_NOTHROW;
+  int acc_on_device (acc_device_t __dev) __GOACC_NOTHROW;
+  void *acc_malloc (size_t) __GOACC_NOTHROW;
+  void acc_free (void *) __GOACC_NOTHROW;
+  /* Some of these would be more correct with const qualifiers, but
+     the standard specifies otherwise.  */
+  void *acc_copyin (void *, size_t) __GOACC_NOTHROW;
+  void *acc_present_or_copyin (void *, size_t) __GOACC_NOTHROW;
+  void *acc_create (void *, size_t) __GOACC_NOTHROW;
+  void *acc_present_or_create (void *, size_t) __GOACC_NOTHROW;
+  void acc_copyout (void *, size_t) __GOACC_NOTHROW;
+  void acc_delete (void *, size_t) __GOACC_NOTHROW;
+  void acc_update_device (void *, size_t) __GOACC_NOTHROW;
+  void acc_update_self (void *, size_t) __GOACC_NOTHROW;
+  void acc_map_data (void *, void *, size_t) __GOACC_NOTHROW;
+  void acc_unmap_data (void *) __GOACC_NOTHROW;
+  void *acc_deviceptr (void *) __GOACC_NOTHROW;
+  void *acc_hostptr (void *) __GOACC_NOTHROW;
+  int acc_is_present (void *, size_t) __GOACC_NOTHROW;
+  void acc_memcpy_to_device (void *, void *, size_t) __GOACC_NOTHROW;
+  void acc_memcpy_from_device (void *, void *, size_t) __GOACC_NOTHROW;
+
+  void ACC_target (int, void (*) (void *), const void *,
+	     size_t, void **, size_t *, unsigned char *, int *) __GOACC_NOTHROW;
+  void ACC_parallel (int, void (*) (void *), const void *,
+	     size_t, void **, size_t *, unsigned char *) __GOACC_NOTHROW;
+  void ACC_add_device_code (void const *, char const *) __GOACC_NOTHROW;
+
+  void ACC_async_copy(int) __GOACC_NOTHROW;
+  void ACC_async_kern(int) __GOACC_NOTHROW;
 
-int acc_on_device (acc_device_t __dev) __GOACC_NOTHROW;
+  /* Old names.  OpenACC does not specify whether these can or must
+     not be macros, inlines or aliases for the new names.  */
+  #define acc_pcreate acc_present_or_create
+  #define acc_pcopyin acc_present_or_copyin
 
+  /* CUDA-specific routines.  */
+  void *acc_get_current_cuda_device (void) __GOACC_NOTHROW;
+  void *acc_get_current_cuda_context (void) __GOACC_NOTHROW;
+  void *acc_get_cuda_stream (int __async) __GOACC_NOTHROW;
+  int acc_set_cuda_stream (int __async, void *__stream) __GOACC_NOTHROW;
+  
 #ifdef __cplusplus
 }
 #endif
diff --git a/libgomp/openacc_lib.h b/libgomp/openacc_lib.h
index be49100..35ca5a7 100644
--- a/libgomp/openacc_lib.h
+++ b/libgomp/openacc_lib.h
@@ -1,8 +1,9 @@
-!  OpenACC Runtime Library Definitions.                   -*- mode: fortran -*-
+!  OpenACC Runtime Library Definitions.			-*- mode: fortran -*-
 
-!  Copyright (C) 2013-2014 Free Software Foundation, Inc.
+!  Copyright (C) 2014 Free Software Foundation, Inc.
 
-!  Contributed by Thomas Schwinge <thomas@codesourcery.com>.
+!  Contributed by Tobias Burnus <burnus@net-b.de>
+!             and James Norris <jnorris@codesourcery.com>
 
 !  This file is part of the GNU OpenMP Library (libgomp).
 
@@ -25,19 +26,353 @@
 !  see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 !  <http://www.gnu.org/licenses/>.
 
-      integer openacc_version
-      parameter (openacc_version = 201306)
-
-      integer acc_device_kind
-      parameter (acc_device_kind = 4)
-      integer (acc_device_kind) acc_device_none
-      parameter (acc_device_none = 0)
-      integer (acc_device_kind) acc_device_default
-      parameter (acc_device_default = 1)
-      integer (acc_device_kind) acc_device_host
-      parameter (acc_device_host = 2)
-      integer (acc_device_kind) acc_device_not_host
-      parameter (acc_device_not_host = 3)
-
-      external acc_on_device
-      logical (4) acc_on_device
+! NOTE: Due to the use of dimension (..), the code only works when compiled
+! with -std=f2008ts/gnu/legacy but not with other standard settings.
+! Alternatively, the user can use the module version, which permits
+! compilation with -std=f95.
+
+      integer, parameter :: acc_device_kind = 4
+
+      integer (acc_device_kind), parameter :: acc_device_none = 0
+      integer (acc_device_kind), parameter :: acc_device_default = 1
+      integer (acc_device_kind), parameter :: acc_device_host = 2
+      integer (acc_device_kind), parameter :: acc_device_host_nonshm = 3
+      integer (acc_device_kind), parameter :: acc_device_not_host = 4
+      integer (acc_device_kind), parameter :: acc_device_nvidia = 5
+
+      integer, parameter :: acc_handle_kind = 4
+
+      integer (acc_handle_kind), parameter :: acc_async_noval = -1
+      integer (acc_handle_kind), parameter :: acc_async_sync = -2
+
+      integer, parameter :: openacc_version = 201306
+
+      interface acc_get_num_devices
+        function acc_get_num_devices_h (d)
+          import acc_device_kind
+          integer acc_get_num_devices_h
+          integer (acc_device_kind) d
+        end function
+      end interface
+
+      interface acc_set_device_type
+        subroutine acc_set_device_type_h (d)
+          import acc_device_kind
+          integer (acc_device_kind) d
+        end subroutine
+      end interface
+
+      interface acc_get_device_type
+        function acc_get_device_type_h ()
+          import acc_device_kind
+          integer (acc_device_kind) acc_get_device_type_h
+        end function
+      end interface
+
+      interface acc_set_device_num
+        subroutine acc_set_device_num_h (n, d)
+          import acc_device_kind
+          integer n
+          integer (acc_device_kind) d
+        end subroutine
+      end interface
+
+      interface acc_get_device_num
+        function acc_get_device_num_h (d)
+          import acc_device_kind
+          integer acc_get_device_num_h
+          integer (acc_device_kind) d
+        end function
+      end interface
+
+      interface acc_async_test
+        function acc_async_test_h (a)
+          logical acc_async_test_h
+          integer a
+        end function
+      end interface
+
+      interface acc_async_test_all
+        function acc_async_test_all_h ()
+          logical acc_async_test_all_h
+        end function
+      end interface
+
+      interface acc_wait
+        subroutine acc_wait_h (a)
+          integer a
+        end subroutine
+      end interface
+
+      interface acc_wait_async
+        subroutine acc_wait_async_h (a1, a2)
+          integer a1, a2
+        end subroutine
+      end interface
+
+      interface acc_wait_all
+        subroutine acc_wait_all_h ()
+        end subroutine
+      end interface
+
+      interface acc_wait_all_async
+        subroutine acc_wait_all_async_h (a)
+          integer a
+        end subroutine
+      end interface
+
+      interface acc_init
+        subroutine acc_init_h (devicetype)
+          import acc_device_kind
+          integer (acc_device_kind) devicetype
+        end subroutine
+      end interface
+
+      interface acc_shutdown
+        subroutine acc_shutdown_h (devicetype)
+          import acc_device_kind
+          integer (acc_device_kind) devicetype
+        end subroutine
+      end interface
+
+      interface acc_on_device
+        function acc_on_device_h (devicetype)
+          import acc_device_kind
+          logical acc_on_device_h
+          integer (acc_device_kind) devicetype
+        end function
+      end interface
+
+      ! acc_malloc: Only available in C/C++
+      ! acc_free: Only available in C/C++
+
+      interface acc_copyin
+        subroutine acc_copyin_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_copyin_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_copyin_array_h (a)
+          type (*), dimension (..), contiguous :: a
+          end subroutine
+      end interface
+
+      interface acc_present_or_copyin
+        subroutine acc_present_or_copyin_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_present_or_copyin_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_present_or_copyin_array_h (a)
+          type (*), dimension (..), contiguous :: a
+          end subroutine
+      end interface
+
+      interface acc_pcopyin
+        subroutine acc_pcopyin_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_pcopyin_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_pcopyin_array_h (a)
+          type (*), dimension (..), contiguous :: a
+          end subroutine
+      end interface
+
+      interface acc_create
+        subroutine acc_create_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_create_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_create_array_h (a)
+          type (*), dimension (..), contiguous :: a
+          end subroutine
+      end interface
+
+      interface acc_present_or_create
+        subroutine acc_present_or_create_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_present_or_create_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_present_or_create_array_h (a)
+          type (*), dimension (..), contiguous :: a
+          end subroutine
+      end interface
+
+      interface acc_pcreate
+        subroutine acc_pcreate_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_pcreate_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_pcreate_array_h (a)
+          type (*), dimension (..), contiguous :: a
+          end subroutine
+      end interface
+
+      interface acc_copyout
+        subroutine acc_copyout_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_copyout_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_copyout_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      interface acc_delete
+        subroutine acc_delete_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_delete_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_delete_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      interface acc_update_device
+        subroutine acc_update_device_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_update_device_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_update_device_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      interface acc_update_self
+        subroutine acc_update_self_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end subroutine
+
+        subroutine acc_update_self_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end subroutine
+
+        subroutine acc_update_self_array_h (a)
+          type (*), dimension (..), contiguous :: a
+        end subroutine
+      end interface
+
+      ! acc_map_data: Only available in C/C++
+      ! acc_unmap_data: Only available in C/C++
+      ! acc_deviceptr: Only available in C/C++
+      ! acc_ostptr: Only available in C/C++
+
+      interface acc_is_present
+        function acc_is_present_32_h (a, len)
+          use iso_c_binding, only: c_int32_t
+          logical acc_is_present_32_h
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int32_t) len
+        end function
+
+        function acc_is_present_64_h (a, len)
+          use iso_c_binding, only: c_int64_t
+          logical acc_is_present_64_h
+          !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+          type (*), dimension (*) :: a
+          integer (c_int64_t) len
+        end function
+
+        function acc_is_present_array_h (a)
+          logical acc_is_present_array_h
+          type (*), dimension (..), contiguous :: a
+        end function
+      end interface
+
+      ! acc_memcpy_to_device: Only available in C/C++
+      ! acc_memcpy_from_device: Only available in C/C++
diff --git a/libgomp/plugin-nvptx.c b/libgomp/plugin-nvptx.c
new file mode 100644
index 0000000..51c915f
--- /dev/null
+++ b/libgomp/plugin-nvptx.c
@@ -0,0 +1,1882 @@
+/* Plugin for NVPTX execution.
+
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   Contributed by CodeSourcery.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Nvidia PTX-specific parts of OpenACC support.  The cuda driver
+   library appears to hold some implicit state, but the documentation
+   is not clear as to what that state might be.  Or how one might
+   propagate it from one thread to another.  */
+
+//#define DEBUG
+//#define DISABLE_ASYNC
+
+#include "openacc.h"
+#include "config.h"
+#include "libgomp.h"
+#include "target.h"
+#include "libgomp-plugin.h"
+
+#include <cuda.h>
+#include <sys/queue.h>
+#include <stdint.h>
+#include <string.h>
+#include <stdio.h>
+#include <dlfcn.h>
+#include <unistd.h>
+#include <assert.h>
+
+#define	ARRAYSIZE(X) (sizeof (X) / sizeof ((X)[0]))
+
+static struct _errlist
+{
+  CUresult r;
+  char *m;
+} cuErrorList[] = {
+    { CUDA_ERROR_INVALID_VALUE, "invalid value" },
+    { CUDA_ERROR_OUT_OF_MEMORY, "out of memory" },
+    { CUDA_ERROR_NOT_INITIALIZED, "not initialized" },
+    { CUDA_ERROR_DEINITIALIZED, "deinitialized" },
+    { CUDA_ERROR_PROFILER_DISABLED, "profiler disabled" },
+    { CUDA_ERROR_PROFILER_NOT_INITIALIZED, "profiler not initialized" },
+    { CUDA_ERROR_PROFILER_ALREADY_STARTED, "already started" },
+    { CUDA_ERROR_PROFILER_ALREADY_STOPPED, "already stopped" },
+    { CUDA_ERROR_NO_DEVICE, "no device" },
+    { CUDA_ERROR_INVALID_DEVICE, "invalid device" },
+    { CUDA_ERROR_INVALID_IMAGE, "invalid image" },
+    { CUDA_ERROR_INVALID_CONTEXT, "invalid context" },
+    { CUDA_ERROR_CONTEXT_ALREADY_CURRENT, "context already current" },
+    { CUDA_ERROR_MAP_FAILED, "map error" },
+    { CUDA_ERROR_UNMAP_FAILED, "unmap error" },
+    { CUDA_ERROR_ARRAY_IS_MAPPED, "array is mapped" },
+    { CUDA_ERROR_ALREADY_MAPPED, "already mapped" },
+    { CUDA_ERROR_NO_BINARY_FOR_GPU, "no binary for gpu" },
+    { CUDA_ERROR_ALREADY_ACQUIRED, "already acquired" },
+    { CUDA_ERROR_NOT_MAPPED, "not mapped" },
+    { CUDA_ERROR_NOT_MAPPED_AS_ARRAY, "not mapped as array" },
+    { CUDA_ERROR_NOT_MAPPED_AS_POINTER, "not mapped as pointer" },
+    { CUDA_ERROR_ECC_UNCORRECTABLE, "ecc uncorrectable" },
+    { CUDA_ERROR_UNSUPPORTED_LIMIT, "unsupported limit" },
+    { CUDA_ERROR_CONTEXT_ALREADY_IN_USE, "context already in use" },
+    { CUDA_ERROR_PEER_ACCESS_UNSUPPORTED, "peer access unsupported" },
+    { CUDA_ERROR_INVALID_SOURCE, "invalid source" },
+    { CUDA_ERROR_FILE_NOT_FOUND, "file not found" },
+    { CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND,
+                                            "shared object symbol not found" },
+    { CUDA_ERROR_SHARED_OBJECT_INIT_FAILED, "shared object init error" },
+    { CUDA_ERROR_OPERATING_SYSTEM, "operating system" },
+    { CUDA_ERROR_INVALID_HANDLE, "invalid handle" },
+    { CUDA_ERROR_NOT_FOUND, "not found" },
+    { CUDA_ERROR_NOT_READY, "not ready" },
+    { CUDA_ERROR_LAUNCH_FAILED, "launch error" },
+    { CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES, "launch out of resources" },
+    { CUDA_ERROR_LAUNCH_TIMEOUT, "launch timeout" },
+    { CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING,
+                                            "launch incompatibe texturing" },
+    { CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED, "peer access already enabled" },
+    { CUDA_ERROR_PEER_ACCESS_NOT_ENABLED, "peer access not enabled " },
+    { CUDA_ERROR_PRIMARY_CONTEXT_ACTIVE, "primary cotext active" },
+    { CUDA_ERROR_CONTEXT_IS_DESTROYED, "context is destroyed" },
+    { CUDA_ERROR_ASSERT, "assert" },
+    { CUDA_ERROR_TOO_MANY_PEERS, "too many peers" },
+    { CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED,
+                                            "host memory already registered" },
+    { CUDA_ERROR_HOST_MEMORY_NOT_REGISTERED, "host memory not registered" },
+    { CUDA_ERROR_NOT_PERMITTED, "no permitted" },
+    { CUDA_ERROR_NOT_SUPPORTED, "not supported" },
+    { CUDA_ERROR_UNKNOWN, "unknown" }
+};
+
+static char errmsg[128];
+
+static char *
+cuErrorMsg (CUresult r)
+{
+  int i;
+
+  for (i = 0; i < ARRAYSIZE (cuErrorList); i++)
+    {
+      if (cuErrorList[i].r == r)
+	return &cuErrorList[i].m[0];
+    }
+
+  sprintf (&errmsg[0], "unknown result code: %5d", r);
+
+  return &errmsg[0];
+}
+
+struct targ_fn_descriptor
+{
+  CUfunction fn;
+  const char *name;
+};
+
+static bool PTX_inited = false;
+
+struct PTX_stream
+{
+  CUstream stream;
+  pthread_t host_thread;
+  bool multithreaded;
+
+  CUdeviceptr d;
+  void *h;
+  void *h_begin;
+  void *h_end;
+  void *h_next;
+  void *h_prev;
+  void *h_tail;
+
+  SLIST_ENTRY(PTX_stream) next;
+};
+
+SLIST_HEAD(PTX_streams, PTX_stream);
+
+/* Each thread may select a stream (also specific to a device/context).  */
+static __thread struct PTX_stream *current_stream;
+
+struct map
+{
+  int     async;
+  size_t  size;
+  char    mappings[0];
+};
+
+static void
+map_init (struct PTX_stream *s)
+{
+  CUresult r;
+
+  int size = getpagesize ();
+
+  assert (s);
+  assert (!s->d);
+  assert (!s->h);
+
+  r = cuMemAllocHost (&s->h, size);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuMemAllocHost error: %s", cuErrorMsg (r));
+
+  r = cuMemHostGetDevicePointer (&s->d, s->h, 0);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuMemHostGetDevicePointer error: %s", cuErrorMsg (r));
+
+  assert (s->h);
+
+  s->h_begin = s->h;
+  s->h_end = s->h_begin + size;
+  s->h_next = s->h_prev = s->h_tail = s->h_begin;
+
+  assert (s->h_next);
+  assert (s->h_end);
+}
+
+static void
+map_fini (struct PTX_stream *s)
+{
+  CUresult r;
+  
+  r = cuMemFreeHost (s->h);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuMemFreeHost error: %s", cuErrorMsg (r));
+}
+
+static void
+map_pop (struct PTX_stream *s)
+{
+  struct map *m;
+
+  assert (s != NULL);
+  assert (s->h_next);
+  assert (s->h_prev);
+  assert (s->h_tail);
+
+  m = s->h_tail;
+
+  s->h_tail += m->size;
+
+  if (s->h_tail >= s->h_end)
+    s->h_tail = s->h_begin + (int) (s->h_tail - s->h_end);
+
+  if (s->h_next == s->h_tail)
+    s->h_prev = s->h_next;
+
+  assert (s->h_next >= s->h_begin);
+  assert (s->h_tail >= s->h_begin);
+  assert (s->h_prev >= s->h_begin);
+
+  assert (s->h_next <= s->h_end);
+  assert (s->h_tail <= s->h_end);
+  assert (s->h_prev <= s->h_end);
+}
+
+static void
+map_push (struct PTX_stream *s, int async, size_t size, void **h, void **d)
+{
+  int left;
+  int offset;
+  struct map *m;
+
+  assert (s != NULL);
+
+  left = s->h_end - s->h_next;
+  size += sizeof (struct map);
+
+  assert (s->h_prev);
+  assert (s->h_next);
+
+  if (size >= left)
+    {
+      m = s->h_prev;
+      m->size += left;
+      s->h_next = s->h_begin;
+
+      if (s->h_next + size > s->h_end)
+	gomp_plugin_fatal ("unable to push map");
+    }
+
+  assert (s->h_next);
+
+  m = s->h_next;
+  m->async = async;
+  m->size = size;
+
+  offset = (void *)&m->mappings[0] - s->h;
+
+  *d = (void *)(s->d + offset);
+  *h = (void *)(s->h + offset);
+
+  s->h_prev = s->h_next;
+  s->h_next += size;
+
+  assert (s->h_prev);
+  assert (s->h_next);
+
+  assert (s->h_next >= s->h_begin);
+  assert (s->h_tail >= s->h_begin);
+  assert (s->h_prev >= s->h_begin);
+  assert (s->h_next <= s->h_end);
+  assert (s->h_tail <= s->h_end);
+  assert (s->h_prev <= s->h_end);
+
+  return;
+}
+
+struct PTX_device
+{
+  CUcontext ctx;
+  bool ctx_shared;
+  CUdevice dev;
+  struct PTX_stream *null_stream;
+  /* All non-null streams associated with this device (actually context),
+     either created implicitly or passed in from the user (via
+     acc_set_cuda_stream).  */
+  struct PTX_streams active_streams;
+  struct {
+    struct PTX_stream **arr;
+    int size;
+  } async_streams;
+  /* A lock for use when manipulating the above stream list and array.  */
+  gomp_mutex_t stream_lock;
+  int ord;
+  bool overlap;
+  bool map;
+  bool concur;
+  int  mode;
+  bool mkern;
+  SLIST_ENTRY(PTX_device) next;
+};
+
+static __thread struct PTX_device *PTX_dev;
+static SLIST_HEAD(_PTX_devices, PTX_device) _PTX_devices;
+static struct _PTX_devices *PTX_devices;
+
+enum PTX_event_type
+{
+  PTX_EVT_MEM,
+  PTX_EVT_KNL,
+  PTX_EVT_SYNC
+};
+
+struct PTX_event
+{
+  CUevent *evt;
+  int type;
+  void *addr;
+  void *tgt;
+  int ord;
+  SLIST_ENTRY(PTX_event) next;
+};
+
+static gomp_mutex_t PTX_event_lock;
+static SLIST_HEAD(_PTX_events, PTX_event) _PTX_events;
+static struct _PTX_events *PTX_events;
+
+#define _XSTR(s) _STR(s)
+#define _STR(s) #s
+
+static struct _synames
+{
+  char *n;
+} cuSymNames[] =
+{
+  { _XSTR(cuCtxCreate) },
+  { _XSTR(cuCtxDestroy) },
+  { _XSTR(cuCtxGetCurrent) },
+  { _XSTR(cuCtxPushCurrent) },
+  { _XSTR(cuCtxSynchronize) },
+  { _XSTR(cuDeviceGet) },
+  { _XSTR(cuDeviceGetAttribute) },
+  { _XSTR(cuDeviceGetCount) },
+  { _XSTR(cuEventCreate) },
+  { _XSTR(cuEventDestroy) },
+  { _XSTR(cuEventQuery) },
+  { _XSTR(cuEventRecord) },
+  { _XSTR(cuInit) },
+  { _XSTR(cuLaunchKernel) },
+  { _XSTR(cuLinkAddData) },
+  { _XSTR(cuLinkComplete) },
+  { _XSTR(cuLinkCreate) },
+  { _XSTR(cuMemAlloc) },
+  { _XSTR(cuMemAllocHost) },
+  { _XSTR(cuMemcpy) },
+  { _XSTR(cuMemcpyDtoH) },
+  { _XSTR(cuMemcpyDtoHAsync) },
+  { _XSTR(cuMemcpyHtoD) },
+  { _XSTR(cuMemcpyHtoDAsync) },
+  { _XSTR(cuMemFree) },
+  { _XSTR(cuMemFreeHost) },
+  { _XSTR(cuMemGetAddressRange) },
+  { _XSTR(cuMemHostGetDevicePointer) },
+  { _XSTR(cuMemHostRegister) },
+  { _XSTR(cuMemHostUnregister) },
+  { _XSTR(cuModuleGetFunction) },
+  { _XSTR(cuModuleLoadData) },
+  { _XSTR(cuStreamDestroy) },
+  { _XSTR(cuStreamQuery) },
+  { _XSTR(cuStreamSynchronize) },
+  { _XSTR(cuStreamWaitEvent) }
+};
+
+static int
+verify_device_library (void)
+{
+  int i;
+  void *dh, *ds;
+
+  dh = dlopen ("libcuda.so", RTLD_LAZY);
+  if (!dh)
+    return -1;
+
+  for (i = 0; i < ARRAYSIZE (cuSymNames); i++)
+    {
+      ds = dlsym (dh, cuSymNames[i].n);
+      if (!ds)
+        return -1;
+    }
+
+  dlclose (dh);
+  
+  return 0;
+}
+
+static void
+init_streams_for_device (struct PTX_device *ptx_dev, int concurrency)
+{
+  int i;
+  struct PTX_stream *null_stream
+    = gomp_plugin_malloc (sizeof (struct PTX_stream));
+
+  null_stream->stream = NULL;
+  null_stream->host_thread = pthread_self ();
+  null_stream->multithreaded = true;
+  null_stream->d = (CUdeviceptr) NULL;
+  null_stream->h = NULL;
+  map_init (null_stream);
+  ptx_dev->null_stream = null_stream;
+  
+  SLIST_INIT (&ptx_dev->active_streams);
+  gomp_plugin_mutex_init (&ptx_dev->stream_lock);
+  
+  if (concurrency < 1)
+    concurrency = 1;
+  
+  /* This is just a guess -- make space for as many async streams as the
+     current device is capable of concurrently executing.  This can grow
+     later as necessary.  No streams are created yet.  */
+  ptx_dev->async_streams.arr
+    = gomp_plugin_malloc (concurrency * sizeof (struct PTX_stream *));
+  ptx_dev->async_streams.size = concurrency;
+  
+  for (i = 0; i < concurrency; i++)
+    ptx_dev->async_streams.arr[i] = NULL;
+}
+
+static void
+fini_streams_for_device (struct PTX_device *ptx_dev)
+{
+  struct PTX_stream *s;
+  free (ptx_dev->async_streams.arr);
+  
+  while (!SLIST_EMPTY (&ptx_dev->active_streams))
+    {
+      s = SLIST_FIRST (&ptx_dev->active_streams);
+      SLIST_REMOVE_HEAD (&ptx_dev->active_streams, next);
+      cuStreamDestroy (s->stream);
+      map_fini (s);
+      free (s);
+    }
+  
+  map_fini (ptx_dev->null_stream);
+  free (ptx_dev->null_stream);
+}
+
+/* Select a stream for (OpenACC-semantics) ASYNC argument for the current
+   thread THREAD (and also current device/context).  If CREATE is true, create
+   the stream if it does not exist (or use EXISTING if it is non-NULL), and
+   associate the stream with the same thread argument.  Returns stream to use
+   as result.  */
+
+static struct PTX_stream *
+select_stream_for_async (int async, pthread_t thread, bool create,
+			 CUstream existing)
+{
+  /* Local copy of TLS variable.  */
+  struct PTX_device *ptx_dev = PTX_dev;
+  struct PTX_stream *stream = NULL;
+  int orig_async = async;
+  
+  /* The special value acc_async_noval (-1) maps (for now) to an
+     implicitly-created stream, which is then handled the same as any other
+     numbered async stream.  Other options are available, e.g. using the null
+     stream for anonymous async operations, or choosing an idle stream from an
+     active set.  But, stick with this for now.  */
+  if (async > acc_async_sync)
+    async++;
+  
+  if (create)
+    gomp_plugin_mutex_lock (&ptx_dev->stream_lock);
+
+  /* NOTE: AFAICT there's no particular need for acc_async_sync to map to the
+     null stream, and in fact better performance may be obtainable if it doesn't
+     (because the null stream enforces overly-strict synchronisation with
+     respect to other streams for legacy reasons, and that's probably not
+     needed with OpenACC).  Maybe investigate later.  */
+  if (async == acc_async_sync)
+    stream = ptx_dev->null_stream;
+  else if (async >= 0 && async < ptx_dev->async_streams.size
+	   && ptx_dev->async_streams.arr[async] && !(create && existing))
+    stream = ptx_dev->async_streams.arr[async];
+  else if (async >= 0 && create)
+    {
+      if (async >= ptx_dev->async_streams.size)
+        {
+	  int i, newsize = ptx_dev->async_streams.size * 2;
+	  
+	  if (async >= newsize)
+	    newsize = async + 1;
+	  
+	  ptx_dev->async_streams.arr
+	    = gomp_plugin_realloc (ptx_dev->async_streams.arr,
+				   newsize * sizeof (struct PTX_stream *));
+	  
+	  for (i = ptx_dev->async_streams.size; i < newsize; i++)
+	    ptx_dev->async_streams.arr[i] = NULL;
+	  
+	  ptx_dev->async_streams.size = newsize;
+	}
+
+      /* Create a new stream on-demand if there isn't one already, or if we're
+	 setting a particular async value to an existing (externally-provided)
+	 stream.  */
+      if (!ptx_dev->async_streams.arr[async] || existing)
+        {
+	  CUresult r;
+	  struct PTX_stream *s
+	    = gomp_plugin_malloc (sizeof (struct PTX_stream));
+
+	  if (existing)
+	    s->stream = existing;
+	  else
+	    {
+	      r = cuStreamCreate (&s->stream, CU_STREAM_DEFAULT);
+	      if (r != CUDA_SUCCESS)
+		gomp_plugin_fatal ("cuStreamCreate error: %s", cuErrorMsg (r));
+	    }
+	  
+	  /* If CREATE is true, we're going to be queueing some work on this
+	     stream.  Associate it with the current host thread.  */
+	  s->host_thread = thread;
+	  s->multithreaded = false;
+	  
+	  s->d = (CUdeviceptr) NULL;
+	  s->h = NULL;
+	  map_init (s);
+	  
+	  SLIST_INSERT_HEAD (&ptx_dev->active_streams, s, next);
+	  ptx_dev->async_streams.arr[async] = s;
+	}
+
+      stream = ptx_dev->async_streams.arr[async];
+    }
+  else if (async < 0)
+    gomp_plugin_fatal ("bad async %d", async);
+
+  if (create)
+    {
+      assert (stream != NULL);
+
+      /* If we're trying to use the same stream from different threads
+	 simultaneously, set stream->multithreaded to true.  This affects the
+	 behaviour of acc_async_test_all and acc_wait_all, which are supposed to
+	 only wait for asynchronous launches from the same host thread they are
+	 invoked on.  If multiple threads use the same async value, we make note
+	 of that here and fall back to testing/waiting for all threads in those
+	 functions.  */
+      if (thread != stream->host_thread)
+        stream->multithreaded = true;
+
+      gomp_plugin_mutex_unlock (&ptx_dev->stream_lock);
+    }
+  else if (stream && !stream->multithreaded
+	   && !pthread_equal (stream->host_thread, thread))
+    gomp_plugin_fatal ("async %d used on wrong thread", orig_async);
+
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s using stream %p (CUDA stream %p) "
+	   "for async %d\n", __FILE__, __FUNCTION__, stream,
+	   stream ? stream->stream : NULL, orig_async);
+#endif
+
+  return stream;
+}
+
+static int PTX_get_num_devices (void);
+
+/* Initialize the device.  */
+static int
+PTX_init (void)
+{
+  CUresult r;
+  int rc;
+
+  if (PTX_inited)
+    return PTX_get_num_devices ();
+
+  rc = verify_device_library ();
+  if (rc < 0)
+    return -1;
+
+  r = cuInit (0);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuInit error: %s", cuErrorMsg (r));
+
+  PTX_devices = &_PTX_devices;
+  PTX_events = &_PTX_events;
+
+  SLIST_INIT(PTX_devices);
+  SLIST_INIT(PTX_events);
+
+  gomp_plugin_mutex_init (&PTX_event_lock);
+
+  PTX_inited = true;
+
+  return PTX_get_num_devices ();
+}
+
+static int
+PTX_fini (void)
+{
+  PTX_inited = false;
+
+  return 0;
+}
+
+static void *
+PTX_open_device (int n)
+{
+  CUdevice dev;
+  CUresult r;
+  int async_engines, pi;
+
+  if (PTX_devices)
+    {
+      struct PTX_device *ptx_device;
+
+      SLIST_FOREACH(ptx_device, PTX_devices, next)
+        {
+          if (ptx_device->ord == n)
+            {
+              PTX_dev = ptx_device;
+
+              if (PTX_dev->ctx)
+                {
+                  r = cuCtxPushCurrent (PTX_dev->ctx);
+                  if (r != CUDA_SUCCESS)
+                    gomp_plugin_fatal ("cuCtxPushCurrent error: %s",
+				       cuErrorMsg (r));
+                }
+
+              return (void *)PTX_dev;
+            }
+        }
+    }
+
+  r = cuDeviceGet (&dev, n);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuDeviceGet error: %s", cuErrorMsg (r));
+
+  PTX_dev = gomp_plugin_malloc (sizeof (struct PTX_device));
+  PTX_dev->ord = n;
+  PTX_dev->dev = dev;
+  PTX_dev->ctx_shared = false;
+
+  SLIST_INSERT_HEAD(PTX_devices, PTX_dev, next);
+
+  r = cuCtxGetCurrent (&PTX_dev->ctx);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuCtxGetCurrent error: %s", cuErrorMsg (r));
+
+  if (!PTX_dev->ctx)
+    {
+      r = cuCtxCreate (&PTX_dev->ctx, CU_CTX_SCHED_AUTO, dev);
+      if (r != CUDA_SUCCESS)
+	gomp_plugin_fatal ("cuCtxCreate error: %s", cuErrorMsg (r));
+    }
+  else
+    {
+      PTX_dev->ctx_shared = true;
+    }
+   
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_GPU_OVERLAP, dev);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuDeviceGetAttribute error: %s", cuErrorMsg (r));
+
+  PTX_dev->overlap = pi;
+
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_CAN_MAP_HOST_MEMORY, dev);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuDeviceGetAttribute error: %s", cuErrorMsg (r));
+
+  PTX_dev->map = pi;
+
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_CONCURRENT_KERNELS, dev);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuDeviceGetAttribute error: %s", cuErrorMsg (r));
+
+  PTX_dev->concur = pi;
+
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_COMPUTE_MODE, dev);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuDeviceGetAttribute error: %s", cuErrorMsg (r));
+
+  PTX_dev->mode = pi;
+
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_INTEGRATED, dev);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuDeviceGetAttribute error: %s", cuErrorMsg (r));
+
+  PTX_dev->mkern = pi;
+
+  r = cuDeviceGetAttribute (&async_engines,
+			    CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT, dev);
+  if (r != CUDA_SUCCESS)
+    async_engines = 1;
+
+  init_streams_for_device (PTX_dev, async_engines);
+
+  current_stream = PTX_dev->null_stream;
+
+  return (void *)PTX_dev;
+}
+
+static int
+PTX_close_device (void *h __attribute__((unused)))
+{
+  CUresult r;
+
+  if (!PTX_dev)
+    return 0;
+  
+  fini_streams_for_device (PTX_dev);
+
+  if (!PTX_dev->ctx_shared)
+    {
+      r = cuCtxDestroy (PTX_dev->ctx);
+      if (r != CUDA_SUCCESS)
+	gomp_plugin_fatal ("cuCtxDestroy error: %s", cuErrorMsg (r));
+    }
+
+  SLIST_REMOVE(PTX_devices, PTX_dev, PTX_device, next);
+  free (PTX_dev);
+
+  PTX_dev = NULL;
+
+  return 0;
+}
+
+static int
+PTX_get_num_devices (void)
+{
+  int n;
+  CUresult r;
+
+  assert (PTX_inited);
+
+  r = cuDeviceGetCount (&n);
+  if (r!= CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuDeviceGetCount error: %s", cuErrorMsg (r));
+
+  return n;
+}
+
+static bool
+PTX_avail(void)
+{
+  bool avail = false;
+
+  if (PTX_init () > 0)
+    avail = true;
+
+  return avail;
+}
+
+#define ABORT_PTX				\
+  ".version 3.1\n"				\
+  ".target sm_30\n"				\
+  ".address_size 64\n"				\
+  ".visible .func abort;\n"			\
+  ".visible .func abort\n"			\
+  "{\n"						\
+  "trap;\n"					\
+  "ret;\n"					\
+  "}\n"						\
+  ".visible .func _gfortran_abort;\n"		\
+  ".visible .func _gfortran_abort\n"		\
+  "{\n"						\
+  "trap;\n"					\
+  "ret;\n"					\
+  "}\n" \
+
+/* Generated with:
+
+   $ echo 'int acc_on_device(int d) { return __builtin_acc_on_device(d); } int acc_on_device_h_(int *d) { return acc_on_device(*d); }' | accel-gcc/xgcc -Baccel-gcc -x c - -o - -S -m64 -O3 -fno-builtin-acc_on_device -fno-inline
+*/
+#define ACC_ON_DEVICE_PTX						\
+  "        .version        3.1\n"					\
+  "        .target sm_30\n"						\
+  "        .address_size 64\n"						\
+  ".visible .func (.param.u32 %out_retval)acc_on_device(.param.u32 %in_ar1);\n" \
+  ".visible .func (.param.u32 %out_retval)acc_on_device(.param.u32 %in_ar1)\n" \
+  "{\n"									\
+  "        .reg.u32 %ar1;\n"						\
+  ".reg.u32 %retval;\n"							\
+  "        .reg.u64 %hr10;\n"						\
+  "        .reg.u32 %r24;\n"						\
+  "        .reg.u32 %r25;\n"						\
+  "        .reg.pred %r27;\n"						\
+  "        .reg.u32 %r30;\n"						\
+  "        ld.param.u32 %ar1, [%in_ar1];\n"				\
+  "                mov.u32 %r24, %ar1;\n"				\
+  "                setp.ne.u32 %r27,%r24,4;\n"				\
+  "                set.u32.eq.u32 %r30,%r24,5;\n"			\
+  "                neg.s32 %r25, %r30;\n"				\
+  "        @%r27   bra     $L3;\n"					\
+  "                mov.u32 %r25, 1;\n"					\
+  "$L3:\n"								\
+  "                mov.u32 %retval, %r25;\n"				\
+  "        st.param.u32    [%out_retval], %retval;\n"			\
+  "        ret;\n"							\
+  "        }\n"								\
+  ".visible .func (.param.u32 %out_retval)acc_on_device_h_(.param.u64 %in_ar1);\n" \
+  ".visible .func (.param.u32 %out_retval)acc_on_device_h_(.param.u64 %in_ar1)\n" \
+  "{\n"									\
+  "        .reg.u64 %ar1;\n"						\
+  ".reg.u32 %retval;\n"							\
+  "        .reg.u64 %hr10;\n"						\
+  "        .reg.u64 %r25;\n"						\
+  "        .reg.u32 %r26;\n"						\
+  "        .reg.u32 %r27;\n"						\
+  "        ld.param.u64 %ar1, [%in_ar1];\n"				\
+  "                mov.u64 %r25, %ar1;\n"				\
+  "                ld.u32  %r26, [%r25];\n"				\
+  "        {\n"								\
+  "                .param.u32 %retval_in;\n"				\
+  "        {\n"								\
+  "                .param.u32 %out_arg0;\n"				\
+  "                st.param.u32 [%out_arg0], %r26;\n"			\
+  "                call (%retval_in), acc_on_device, (%out_arg0);\n"	\
+  "        }\n"								\
+  "                ld.param.u32    %r27, [%retval_in];\n"		\
+  "}\n"									\
+  "                mov.u32 %retval, %r27;\n"				\
+  "        st.param.u32    [%out_retval], %retval;\n"			\
+  "        ret;\n"							\
+  "        }"
+
+static void
+link_ptx (CUmodule *module, char *ptx_code)
+{
+  CUjit_option opts[7];
+  void *optvals[7];
+  float elapsed = 0.0;
+#define LOGSIZE 8192
+  char elog[LOGSIZE];
+  char ilog[LOGSIZE];
+  unsigned long logsize = LOGSIZE;
+  CUlinkState linkstate;
+  CUresult r;
+  void *linkout;
+  size_t linkoutsize __attribute__((unused));
+
+  gomp_plugin_notify ("attempting to load:\n---\n%s\n---\n", ptx_code);
+
+  opts[0] = CU_JIT_WALL_TIME;
+  optvals[0] = &elapsed;
+
+  opts[1] = CU_JIT_INFO_LOG_BUFFER;
+  optvals[1] = &ilog[0];
+
+  opts[2] = CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES;
+  optvals[2] = (void *) logsize;
+
+  opts[3] = CU_JIT_ERROR_LOG_BUFFER;
+  optvals[3] = &elog[0];
+
+  opts[4] = CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES;
+  optvals[4] = (void *) logsize;
+
+  opts[5] = CU_JIT_LOG_VERBOSE;
+  optvals[5] = (void *) 1;
+
+  opts[6] = CU_JIT_TARGET;
+  optvals[6] = (void *) CU_TARGET_COMPUTE_30;
+
+  r = cuLinkCreate (7, opts, optvals, &linkstate);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuLinkCreate error: %s", cuErrorMsg (r));
+
+  char *abort_ptx = ABORT_PTX;
+  r = cuLinkAddData (linkstate, CU_JIT_INPUT_PTX, abort_ptx,
+		     strlen (abort_ptx) + 1, 0, 0, 0, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      gomp_plugin_error ("Link error log %s\n", &elog[0]);
+      gomp_plugin_fatal ("cuLinkAddData (abort) error: %s", cuErrorMsg (r));
+    }
+
+  char *acc_on_device_ptx = ACC_ON_DEVICE_PTX;
+  r = cuLinkAddData (linkstate, CU_JIT_INPUT_PTX, acc_on_device_ptx,
+		     strlen (acc_on_device_ptx) + 1, 0, 0, 0, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      gomp_plugin_error ("Link error log %s\n", &elog[0]);
+      gomp_plugin_fatal ("cuLinkAddData (acc_on_device) error: %s",
+			 cuErrorMsg (r));
+    }
+
+  r = cuLinkAddData (linkstate, CU_JIT_INPUT_PTX, ptx_code,
+              strlen (ptx_code) + 1, 0, 0, 0, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      gomp_plugin_error ("Link error log %s\n", &elog[0]);
+      gomp_plugin_fatal ("cuLinkAddData (ptx_code) error: %s", cuErrorMsg (r));
+    }
+
+  r = cuLinkComplete (linkstate, &linkout, &linkoutsize);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuLinkComplete error: %s", cuErrorMsg (r));
+
+  gomp_plugin_notify ("Link complete: %fms\n", elapsed);
+  gomp_plugin_notify ("Link log %s\n", &ilog[0]);
+
+  r = cuModuleLoadData (module, linkout);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuModuleLoadData error: %s", cuErrorMsg (r));
+}
+
+static void
+event_gc (bool memmap_lockable)
+{
+  struct PTX_event *ptx_event;
+
+  gomp_plugin_mutex_lock (&PTX_event_lock);
+
+  for (ptx_event = SLIST_FIRST (PTX_events); ptx_event;)
+    {
+      CUresult r;
+      struct PTX_event *next = SLIST_NEXT (ptx_event, next);
+
+      if (ptx_event->ord != PTX_dev->ord)
+        goto next_event;
+
+      r = cuEventQuery (*ptx_event->evt);
+      if (r == CUDA_SUCCESS)
+        {
+          CUevent *te;
+
+          te = ptx_event->evt;
+
+	  switch (ptx_event->type)
+	    {
+	    case PTX_EVT_MEM:
+	    case PTX_EVT_SYNC:
+	      break;
+	    
+	    case PTX_EVT_KNL:
+              {
+	        /* The function gomp_plugin_async_unmap_vars needs to claim the
+		   memory-map splay tree lock for the current device, so we
+		   can't call it when one of our callers has already claimed
+		   the lock.  In that case, just delay the GC for this event
+		   until later.  */
+	        if (!memmap_lockable)
+		  goto next_event;
+
+        	map_pop (ptx_event->addr);
+		if (ptx_event->tgt)
+		  gomp_plugin_async_unmap_vars (ptx_event->tgt);
+              }
+	      break;
+	    }
+
+          cuEventDestroy (*te);
+          free ((void *)te);
+
+          SLIST_REMOVE (PTX_events, ptx_event, PTX_event, next);
+
+          free (ptx_event);
+        }
+
+    next_event:
+      ptx_event = next;
+    }
+
+  gomp_plugin_mutex_unlock (&PTX_event_lock);
+}
+
+static void
+event_add (enum PTX_event_type type, CUevent *e, void *h, void *tgt)
+{
+  struct PTX_event *ptx_event;
+
+  assert (type == PTX_EVT_MEM || type == PTX_EVT_KNL || type == PTX_EVT_SYNC);
+
+  ptx_event = gomp_plugin_malloc (sizeof (struct PTX_event));
+  ptx_event->type = type;
+  ptx_event->evt = e;
+  ptx_event->addr = h;
+  ptx_event->tgt = tgt;
+  ptx_event->ord = PTX_dev->ord;
+
+  gomp_plugin_mutex_lock (&PTX_event_lock);
+
+  SLIST_INSERT_HEAD(PTX_events, ptx_event, next);
+
+  gomp_plugin_mutex_unlock (&PTX_event_lock);
+}
+
+void
+PTX_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
+	  size_t *sizes, unsigned short *kinds, int num_gangs, int num_workers,
+	  int vector_length, int async, void *targ_mem_desc)
+{
+  struct targ_fn_descriptor *targ_fn = (struct targ_fn_descriptor *) fn;
+  CUfunction function;
+  CUresult r;
+  int i;
+  struct PTX_stream *dev_str;
+  void *kargs[1];
+  void *hp, *dp;
+  unsigned int nthreads_in_block;
+
+  function = targ_fn->fn;
+  
+  dev_str = select_stream_for_async (async, pthread_self (), false, NULL);
+  assert (dev_str == current_stream);
+
+  /* This reserves a chunk of a pre-allocated page of memory mapped on both
+     the host and the device. HP is a host pointer to the new chunk, and DP is
+     the corresponding device pointer.  */
+  map_push (dev_str, async, mapnum * sizeof (void *), &hp, &dp);
+
+  gomp_plugin_notify ("  %s: prepare mappings\n", __FUNCTION__);
+
+  /* Copy the array of arguments to the mapped page.  */
+  for (i = 0; i < mapnum; i++)
+    ((void **) hp)[i] = devaddrs[i];
+
+  /* Copy the (device) pointers to arguments to the device (dp and hp might in
+     fact have the same value on a unified-memory system).  */
+  r = cuMemcpy ((CUdeviceptr)dp, (CUdeviceptr)hp, mapnum * sizeof (void *));
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuMemcpy failed: %s", cuErrorMsg (r));
+
+  gomp_plugin_notify ("  %s: kernel %s: launch\n", __FUNCTION__, targ_fn->name);
+
+  // XXX: possible geometry mappings??
+  //
+  // OpenACC		CUDA
+  //
+  // num_gangs		blocks
+  // num_workers	warps (where a warp is equivalent to 32 threads)
+  // vector length	threads
+  //
+
+  /* The openacc vector_length clause 'determines the vector length to use for
+     vector or SIMD operations'.  The question is how to map this to CUDA.
+
+     In CUDA, the warp size is the vector length of a CUDA device.  However, the
+     CUDA interface abstracts away from that, and only shows us warp size
+     indirectly in maximum number of threads per block, which is a product of
+     warp size and the number of hyperthreads of a multiprocessor.
+
+     We choose to map openacc vector_length directly onto the number of threads
+     in a block, in the x dimension.  This is reflected in gcc code generation
+     that uses ThreadIdx.x to access vector elements.
+
+     Attempting to use an openacc vector_length of more than the maximum number
+     of threads per block will result in a cuda error.  */
+  nthreads_in_block = vector_length;
+
+  kargs[0] = &dp;
+  r = cuLaunchKernel (function,
+			1, 1, 1,
+			nthreads_in_block, 1, 1,
+			0, dev_str->stream, kargs, 0);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuLaunchKernel error: %s", cuErrorMsg (r));
+
+#ifndef DISABLE_ASYNC
+  if (async < acc_async_noval)
+    {
+      r = cuStreamSynchronize (dev_str->stream);
+      if (r != CUDA_SUCCESS)
+        gomp_plugin_fatal ("cuStreamSynchronize error: %s", cuErrorMsg (r));
+    }
+  else
+    {
+      CUevent *e;
+
+      e = (CUevent *)gomp_plugin_malloc (sizeof (CUevent));
+
+      r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+      if (r != CUDA_SUCCESS)
+        gomp_plugin_fatal ("cuEventCreate error: %s", cuErrorMsg (r));
+
+      event_gc (true);
+
+      r = cuEventRecord (*e, dev_str->stream);
+      if (r != CUDA_SUCCESS)
+        gomp_plugin_fatal ("cuEventRecord error: %s", cuErrorMsg (r));
+
+      event_add (PTX_EVT_KNL, e, (void *)dev_str, targ_mem_desc);
+    }
+#else
+  r = cuCtxSynchronize ();
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuCtxSynchronize error: %s", cuErrorMsg (r));
+#endif
+
+  gomp_plugin_notify ("  %s: kernel %s: finished\n", __FUNCTION__,
+		      targ_fn->name);
+
+#ifndef DISABLE_ASYNC
+  if (async < acc_async_noval)
+#endif
+    map_pop (dev_str);
+}
+
+void * openacc_get_current_cuda_context (void);
+
+static void *
+PTX_alloc (size_t s)
+{
+  CUdeviceptr d;
+  CUresult r;
+
+  r = cuMemAlloc (&d, s);
+  if (r == CUDA_ERROR_OUT_OF_MEMORY)
+    return 0;
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuMemAlloc error: %s", cuErrorMsg (r));
+  return (void *)d;
+}
+
+static void
+PTX_free (void *p)
+{
+  CUresult r;
+  CUdeviceptr pb;
+  size_t ps;
+
+  r = cuMemGetAddressRange (&pb, &ps, (CUdeviceptr)p);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuMemGetAddressRange error: %s", cuErrorMsg (r));
+
+  if ((CUdeviceptr)p != pb)
+    gomp_plugin_fatal ("invalid device address");
+
+  r = cuMemFree ((CUdeviceptr)p);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuMemFree error: %s", cuErrorMsg (r));
+}
+
+static void *
+PTX_host2dev (void *d, const void *h, size_t s)
+{
+  CUresult r;
+  CUdeviceptr pb;
+  size_t ps;
+
+  if (!s)
+    return 0;
+
+  if (!d)
+    gomp_plugin_fatal ("invalid device address");
+
+  r = cuMemGetAddressRange (&pb, &ps, (CUdeviceptr)d);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuMemGetAddressRange error: %s", cuErrorMsg (r));
+
+  if (!pb)
+    gomp_plugin_fatal ("invalid device address");
+
+  if (!h)
+    gomp_plugin_fatal ("invalid host address");
+
+  if (d == h)
+    gomp_plugin_fatal ("invalid host or device address");
+
+  if ((void *)(d + s) > (void *)(pb + ps))
+    gomp_plugin_fatal ("invalid size");
+
+#ifndef DISABLE_ASYNC
+  if (current_stream != PTX_dev->null_stream)
+    {
+      CUevent *e;
+
+      e = (CUevent *)gomp_plugin_malloc (sizeof (CUevent));
+
+      r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+      if (r != CUDA_SUCCESS)
+        gomp_plugin_fatal ("cuEventCreate error: %s", cuErrorMsg (r));
+
+      event_gc (false);
+
+      r = cuMemcpyHtoDAsync ((CUdeviceptr)d, h, s, current_stream->stream);
+      if (r != CUDA_SUCCESS)
+        gomp_plugin_fatal ("cuMemcpyHtoDAsync error: %s", cuErrorMsg (r));
+
+      r = cuEventRecord (*e, current_stream->stream);
+      if (r != CUDA_SUCCESS)
+        gomp_plugin_fatal ("cuEventRecord error: %s", cuErrorMsg (r));
+
+      event_add (PTX_EVT_MEM, e, (void *)h, NULL);
+    }
+  else
+#endif
+    {
+      r = cuMemcpyHtoD ((CUdeviceptr)d, h, s);
+      if (r != CUDA_SUCCESS)
+        gomp_plugin_fatal ("cuMemcpyHtoD error: %s", cuErrorMsg (r));
+    }
+
+  return 0;
+}
+
+static void *
+PTX_dev2host (void *h, const void *d, size_t s)
+{
+  CUresult r;
+  CUdeviceptr pb;
+  size_t ps;
+
+  if (!s)
+    return 0;
+
+  if (!d)
+    gomp_plugin_fatal ("invalid device address");
+
+  r = cuMemGetAddressRange (&pb, &ps, (CUdeviceptr)d);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuMemGetAddressRange error: %s", cuErrorMsg (r));
+
+  if (!pb)
+    gomp_plugin_fatal ("invalid device address");
+
+  if (!h)
+    gomp_plugin_fatal ("invalid host address");
+
+  if (d == h)
+    gomp_plugin_fatal ("invalid host or device address");
+
+  if ((void *)(d + s) > (void *)(pb + ps))
+    gomp_plugin_fatal ("invalid size");
+
+#ifndef DISABLE_ASYNC
+  if (current_stream != PTX_dev->null_stream)
+    {
+      CUevent *e;
+
+      e = (CUevent *)gomp_plugin_malloc (sizeof (CUevent));
+
+      r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+      if (r != CUDA_SUCCESS)
+        gomp_plugin_fatal ("cuEventCreate error: %s\n", cuErrorMsg (r));
+
+      event_gc (false);
+
+      r = cuMemcpyDtoHAsync (h, (CUdeviceptr)d, s, current_stream->stream);
+      if (r != CUDA_SUCCESS)
+        gomp_plugin_fatal ("cuMemcpyDtoHAsync error: %s", cuErrorMsg (r));
+
+      r = cuEventRecord (*e, current_stream->stream);
+      if (r != CUDA_SUCCESS)
+        gomp_plugin_fatal ("cuEventRecord error: %s", cuErrorMsg (r));
+
+      event_add (PTX_EVT_MEM, e, (void *)h, NULL);
+    }
+  else
+#endif
+    {
+      r = cuMemcpyDtoH (h, (CUdeviceptr)d, s);
+      if (r != CUDA_SUCCESS)
+        gomp_plugin_fatal ("cuMemcpyDtoH error: %s", cuErrorMsg (r));
+    }
+
+  return 0;
+}
+
+static void
+PTX_set_async (int async)
+{
+  current_stream = select_stream_for_async (async, pthread_self (), true, NULL);
+}
+
+static int
+PTX_async_test (int async)
+{
+  CUresult r;
+  struct PTX_stream *s;
+  
+  s = select_stream_for_async (async, pthread_self (), false, NULL);
+
+  if (!s)
+    gomp_plugin_fatal ("unknown async %d", async);
+
+  r = cuStreamQuery (s->stream);
+  if (r == CUDA_SUCCESS)
+    return 1;
+  else if (r == CUDA_ERROR_NOT_READY)
+    return 0;
+
+  gomp_plugin_fatal ("cuStreamQuery error: %s", cuErrorMsg (r));
+
+  return 0;
+}
+
+static int
+PTX_async_test_all (void)
+{
+  struct PTX_stream *s;
+  pthread_t self = pthread_self ();
+
+  gomp_plugin_mutex_lock (&PTX_dev->stream_lock);
+
+  SLIST_FOREACH (s, &PTX_dev->active_streams, next)
+    {
+      if ((s->multithreaded || pthread_equal (s->host_thread, self))
+	  && cuStreamQuery (s->stream) == CUDA_ERROR_NOT_READY)
+	{
+	  gomp_plugin_mutex_unlock (&PTX_dev->stream_lock);
+	  return 0;
+	}
+    }
+
+  gomp_plugin_mutex_unlock (&PTX_dev->stream_lock);
+
+  return 1;
+}
+
+static void
+PTX_wait (int async)
+{
+  CUresult r;
+  struct PTX_stream *s;
+  
+  s = select_stream_for_async (async, pthread_self (), false, NULL);
+
+  if (!s)
+    gomp_plugin_fatal ("unknown async %d", async);
+
+  r = cuStreamSynchronize (s->stream);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuStreamSynchronize error: %s", cuErrorMsg (r));
+  
+  event_gc (true);
+}
+
+static void
+PTX_wait_async (int async1, int async2)
+{
+  CUresult r;
+  CUevent *e;
+  struct PTX_stream *s1, *s2;
+  pthread_t self = pthread_self ();
+
+  /* The stream that is waiting (rather than being waited for) doesn't
+     necessarily have to exist already.  */
+  s2 = select_stream_for_async (async2, self, true, NULL);
+
+  s1 = select_stream_for_async (async1, self, false, NULL);
+  if (!s1)
+    gomp_plugin_fatal ("invalid async 1\n");
+
+  if (s1 == s2)
+    gomp_plugin_fatal ("identical parameters");
+
+  e = (CUevent *)gomp_plugin_malloc (sizeof (CUevent));
+
+  r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuEventCreate error: %s", cuErrorMsg (r));
+
+  event_gc (true);
+
+  r = cuEventRecord (*e, s1->stream);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuEventRecord error: %s", cuErrorMsg (r));
+
+  event_add (PTX_EVT_SYNC, e, NULL, NULL);
+
+  r = cuStreamWaitEvent (s2->stream, *e, 0);
+  if (r != CUDA_SUCCESS)
+    gomp_plugin_fatal ("cuStreamWaitEvent error: %s", cuErrorMsg (r));
+}
+
+static void
+PTX_wait_all (void)
+{
+  CUresult r;
+  struct PTX_stream *s;
+  pthread_t self = pthread_self ();
+
+  gomp_plugin_mutex_lock (&PTX_dev->stream_lock);
+
+  /* Wait for active streams initiated by this thread (or by multiple threads)
+     to complete.  */
+  SLIST_FOREACH (s, &PTX_dev->active_streams, next)
+    {
+      if (s->multithreaded || pthread_equal (s->host_thread, self))
+        {
+	  r = cuStreamQuery (s->stream);
+	  if (r == CUDA_SUCCESS)
+	    continue;
+	  else if (r != CUDA_ERROR_NOT_READY)
+	    gomp_plugin_fatal ("cuStreamQuery error: %s", cuErrorMsg (r));
+
+	  r = cuStreamSynchronize (s->stream);
+	  if (r != CUDA_SUCCESS)
+	    gomp_plugin_fatal ("cuStreamSynchronize error: %s", cuErrorMsg (r));
+	}
+    }
+
+  gomp_plugin_mutex_unlock (&PTX_dev->stream_lock);
+
+  event_gc (true);
+}
+
+static void
+PTX_wait_all_async (int async)
+{
+  CUresult r;
+  struct PTX_stream *waiting_stream, *other_stream;
+  CUevent *e;
+  pthread_t self = pthread_self ();
+  
+  /* The stream doing the waiting.  This could be the first mention of the
+     stream, so create it if necessary.  */
+  waiting_stream
+    = select_stream_for_async (async, pthread_self (), true, NULL);
+  
+  /* Launches on the null stream already block on other streams in the
+     context.  */
+  if (!waiting_stream || waiting_stream == PTX_dev->null_stream)
+    return;
+
+  event_gc (true);
+
+  gomp_plugin_mutex_lock (&PTX_dev->stream_lock);
+
+  SLIST_FOREACH (other_stream, &PTX_dev->active_streams, next)
+    {
+      if (!other_stream->multithreaded
+	  && !pthread_equal (other_stream->host_thread, self))
+	continue;
+
+      e = (CUevent *) gomp_plugin_malloc (sizeof (CUevent));
+
+      r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+      if (r != CUDA_SUCCESS)
+	gomp_plugin_fatal ("cuEventCreate error: %s", cuErrorMsg (r));
+
+      /* Record an event on the waited-for stream.  */
+      r = cuEventRecord (*e, other_stream->stream);
+      if (r != CUDA_SUCCESS)
+	gomp_plugin_fatal ("cuEventRecord error: %s", cuErrorMsg (r));
+
+      event_add (PTX_EVT_SYNC, e, NULL, NULL);
+
+      r = cuStreamWaitEvent (waiting_stream->stream, *e, 0);
+      if (r != CUDA_SUCCESS)
+	gomp_plugin_fatal ("cuStreamWaitEvent error: %s", cuErrorMsg (r));
+   }
+
+  gomp_plugin_mutex_unlock (&PTX_dev->stream_lock);
+}
+
+static void *
+PTX_get_current_cuda_device (void)
+{
+  if (!PTX_dev)
+    return NULL;
+
+  return &PTX_dev->dev;
+}
+
+static void *
+PTX_get_current_cuda_context (void)
+{
+  if (!PTX_dev)
+    return NULL;
+
+  return PTX_dev->ctx;
+}
+
+static void *
+PTX_get_cuda_stream (int async)
+{
+  struct PTX_stream *s;
+
+  if (!PTX_dev)
+    return NULL;
+
+  s = select_stream_for_async (async, pthread_self (), false, NULL);
+
+  return s ? s->stream : NULL;
+}
+
+static int
+PTX_set_cuda_stream (int async, void *stream)
+{
+  struct PTX_stream *oldstream;
+  pthread_t self = pthread_self ();
+
+  gomp_plugin_mutex_lock (&PTX_dev->stream_lock);
+
+  if (async < 0)
+    gomp_plugin_fatal ("bad async %d", async);
+
+  /* We have a list of active streams and an array mapping async values to
+     entries of that list.  We need to take "ownership" of the passed-in stream,
+     and add it to our list, removing the previous entry also (if there was one)
+     in order to prevent resource leaks.  Note the potential for surprise
+     here: maybe we should keep track of passed-in streams and leave it up to
+     the user to tidy those up, but that doesn't work for stream handles
+     returned from acc_get_cuda_stream above...  */
+
+  oldstream = select_stream_for_async (async, self, false, NULL);
+  
+  if (oldstream)
+    {
+      SLIST_REMOVE (&PTX_dev->active_streams, oldstream, PTX_stream, next);
+      
+      cuStreamDestroy (oldstream->stream);
+      map_fini (oldstream);
+      free (oldstream);
+    }
+
+  gomp_plugin_mutex_unlock (&PTX_dev->stream_lock);
+
+  (void) select_stream_for_async (async, self, true, (CUstream) stream);
+
+  return 1;
+}
+
+/* Plugin entry points.  */
+
+
+int
+get_type (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+  return TARGET_TYPE_NVIDIA_PTX;
+}
+
+unsigned int
+get_caps (void)
+{
+  return TARGET_CAP_OPENACC_200;
+}
+
+const char *
+get_name (void)
+{
+  return "nvidia";
+}
+
+int
+get_num_devices (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+  return PTX_get_num_devices ();
+}
+
+static void **kernel_target_data;
+static void **kernel_host_table;
+
+void
+offload_register (void *host_table, void *target_data)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%p, %p)\n", __FILE__, __FUNCTION__,
+	   host_table, target_data);
+#endif
+  
+  kernel_target_data = target_data;
+  kernel_host_table = host_table;
+}
+
+int
+device_init (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+  return PTX_init ();
+}
+
+int
+device_fini (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+  return PTX_fini ();
+}
+
+int
+device_get_table (struct mapping_table **tablep)
+{
+  CUmodule module;
+  void **fn_table;
+  char **fn_names;
+  int fn_entries, i;
+  CUresult r;
+  struct targ_fn_descriptor *targ_fns;
+
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%p)\n", __FILE__, __FUNCTION__,
+	   tablep);
+#endif
+
+  if (PTX_init () <= 0)
+    return 0;
+
+  /* This isn't an error, because an image may legitimately have no offloaded
+     regions and so will not call GOMP_offload_register.  */
+  if (kernel_target_data == NULL)
+    return 0;
+
+  link_ptx (&module, kernel_target_data[0]);
+
+  /* kernel_target_data[0] -> ptx code
+     kernel_target_data[1] -> variable mappings
+     kernel_target_data[2] -> array of kernel names in ascii
+
+     kernel_host_table[0] -> start of function addresses (_omp_func_table)
+     kernel_host_table[1] -> end of function addresses (_omp_funcs_end)
+
+     The array of kernel names and the functions addresses form a
+     one-to-one correspondence.  */
+
+  fn_table = kernel_host_table[0];
+  fn_names = (char **) kernel_target_data[2];
+  fn_entries = (kernel_host_table[1] - kernel_host_table[0]) / sizeof (void *);
+
+  *tablep = gomp_plugin_malloc (sizeof (struct mapping_table) * fn_entries);
+  targ_fns = gomp_plugin_malloc (sizeof (struct targ_fn_descriptor)
+				 * fn_entries);
+
+  for (i = 0; i < fn_entries; i++)
+    {
+      CUfunction function;
+
+      r = cuModuleGetFunction (&function, module, fn_names[i]);
+      if (r != CUDA_SUCCESS)
+	gomp_plugin_fatal ("cuModuleGetFunction error: %s", cuErrorMsg (r));
+
+      targ_fns[i].fn = function;
+      targ_fns[i].name = (const char *) fn_names[i];
+      
+      (*tablep)[i].host_start = (uintptr_t) fn_table[i];
+      (*tablep)[i].host_end = (*tablep)[i].host_start + 1;
+      (*tablep)[i].tgt_start = (uintptr_t) &targ_fns[i];
+      (*tablep)[i].tgt_end = (*tablep)[i].tgt_start + 1;
+    }
+
+  return fn_entries;
+}
+
+void *
+device_alloc (size_t size)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%zu)\n", __FILE__, __FUNCTION__,
+	   size);
+#endif
+
+  return PTX_alloc (size);
+}
+
+void
+device_free (void *ptr)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%p)\n", __FILE__, __FUNCTION__, ptr);
+#endif
+
+  PTX_free (ptr);
+}
+
+void *
+device_dev2host (void *dst, const void *src, size_t n)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%p, %p, %zu)\n", __FILE__,
+	   __FUNCTION__, dst,
+	  src, n);
+#endif
+
+  return PTX_dev2host (dst, src, n);
+}
+
+void *
+device_host2dev (void *dst, const void *src, size_t n)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%p, %p, %zu)\n", __FILE__,
+	   __FUNCTION__, dst, src, n);
+#endif
+
+  return PTX_host2dev (dst, src, n);
+}
+
+void (*device_run) (void *fn_ptr, void *vars) = NULL;
+
+void
+openacc_parallel (void (*fn) (void *), size_t mapnum, void **hostaddrs,
+		  void **devaddrs, size_t *sizes, unsigned short *kinds,
+		  int num_gangs, int num_workers, int vector_length,
+		  int async, void *targ_mem_desc)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%p, %zu, %p, %p, %p, %d, %d, %d, "
+	   "%d, %p)\n", __FILE__, __FUNCTION__, fn, mapnum, hostaddrs, sizes,
+	   kinds, num_gangs, num_workers, vector_length, async, targ_mem_desc);
+#endif
+
+  PTX_exec (fn, mapnum, hostaddrs, devaddrs, sizes, kinds, num_gangs,
+	    num_workers, vector_length, async, targ_mem_desc);
+}
+
+void *
+openacc_open_device (int n)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%d)\n", __FILE__, __FUNCTION__, n);
+#endif
+  return PTX_open_device (n);
+}
+
+int
+openacc_close_device (void *h)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%p)\n", __FILE__, __FUNCTION__, h);
+#endif
+  return PTX_close_device (h);
+}
+
+void
+openacc_set_device_num (int n)
+{
+  assert (n >= 0);
+
+  if (!PTX_dev || PTX_dev->ord != n)
+    (void) PTX_open_device (n);
+}
+
+/* This can be called before the device is "opened" for the current thread, in
+   which case we can't tell which device number should be returned.  We don't
+   actually want to open the device here, so just return -1 and let the caller
+   (oacc-init.c:acc_get_device_num) handle it.  */
+
+int
+openacc_get_device_num (void)
+{
+  if (PTX_dev)
+    return PTX_dev->ord;
+  else
+    return -1;
+}
+
+bool
+openacc_avail (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s\n", __FILE__, __FUNCTION__);
+#endif
+  return PTX_avail ();
+}
+
+int
+openacc_async_test (int async)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%d)\n", __FILE__, __FUNCTION__,
+	   async);
+#endif
+  return PTX_async_test (async);
+}
+
+int
+openacc_async_test_all (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s\n", __FILE__, __FUNCTION__);
+#endif
+  return PTX_async_test_all ();
+}
+
+void
+openacc_async_wait (int async)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%d)\n", __FILE__, __FUNCTION__,
+	   async);
+#endif
+  PTX_wait (async);
+}
+
+void
+openacc_async_wait_async (int async1, int async2)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%d, %d)\n", __FILE__, __FUNCTION__,
+	   async1, async2);
+#endif
+  PTX_wait_async (async1, async2);
+}
+
+void
+openacc_async_wait_all (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s\n", __FILE__, __FUNCTION__);
+#endif
+  PTX_wait_all ();
+}
+
+void
+openacc_async_wait_all_async (int async)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%d)\n", __FILE__, __FUNCTION__,
+	   async);
+#endif
+  PTX_wait_all_async (async);
+}
+
+void
+openacc_async_set_async (int async)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%d)\n", __FILE__, __FUNCTION__,
+	   async);
+#endif
+  PTX_set_async (async);
+}
+
+void *
+openacc_get_current_cuda_device (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s\n", __FILE__, __FUNCTION__);
+#endif
+  return PTX_get_current_cuda_device ();
+}
+
+void *
+openacc_get_current_cuda_context (void)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s\n", __FILE__, __FUNCTION__);
+#endif
+  return PTX_get_current_cuda_context ();
+}
+
+/* NOTE: This returns a CUstream, not a PTX_stream pointer.  */
+
+void *
+openacc_get_cuda_stream (int async)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%d)\n", __FILE__, __FUNCTION__,
+	   async);
+#endif
+  return PTX_get_cuda_stream (async);
+}
+
+/* NOTE: This takes a CUstream, not a PTX_stream pointer.  */
+
+int
+openacc_set_cuda_stream (int async, void *stream)
+{
+#ifdef DEBUG
+  fprintf (stderr, "libgomp plugin: %s:%s (%d, %p)\n", __FILE__, __FUNCTION__,
+	   async, stream);
+#endif
+  return PTX_set_cuda_stream (async, stream);
+}
diff --git a/libgomp/splay-tree.c b/libgomp/splay-tree.c
new file mode 100644
index 0000000..14b03ac
--- /dev/null
+++ b/libgomp/splay-tree.c
@@ -0,0 +1,224 @@
+/* A splay-tree datatype.
+   Copyright 1998-2013
+   Free Software Foundation, Inc.
+   Contributed by Mark Mitchell (mark@markmitchell.com).
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* The splay tree code copied from include/splay-tree.h and adjusted,
+   so that all the data lives directly in splay_tree_node_s structure
+   and no extra allocations are needed.
+
+   Files including this header should before including it add:
+typedef struct splay_tree_node_s *splay_tree_node;
+typedef struct splay_tree_s *splay_tree;
+typedef struct splay_tree_key_s *splay_tree_key;
+   define splay_tree_key_s structure, and define
+   splay_compare inline function.  */
+
+/* For an easily readable description of splay-trees, see:
+
+     Lewis, Harry R. and Denenberg, Larry.  Data Structures and Their
+     Algorithms.  Harper-Collins, Inc.  1991.
+
+   The major feature of splay trees is that all basic tree operations
+   are amortized O(log n) time for a tree with n nodes.  */
+
+#include "libgomp.h"
+#include "splay-tree.h"
+
+extern int splay_compare (splay_tree_key, splay_tree_key);
+
+/* Rotate the edge joining the left child N with its parent P.  PP is the
+   grandparents' pointer to P.  */
+
+static inline void
+rotate_left (splay_tree_node *pp, splay_tree_node p, splay_tree_node n)
+{
+  splay_tree_node tmp;
+  tmp = n->right;
+  n->right = p;
+  p->left = tmp;
+  *pp = n;
+}
+
+/* Rotate the edge joining the right child N with its parent P.  PP is the
+   grandparents' pointer to P.  */
+
+static inline void
+rotate_right (splay_tree_node *pp, splay_tree_node p, splay_tree_node n)
+{
+  splay_tree_node tmp;
+  tmp = n->left;
+  n->left = p;
+  p->right = tmp;
+  *pp = n;
+}
+
+/* Bottom up splay of KEY.  */
+
+static void
+splay_tree_splay (splay_tree sp, splay_tree_key key)
+{
+  if (sp->root == NULL)
+    return;
+
+  do {
+    int cmp1, cmp2;
+    splay_tree_node n, c;
+
+    n = sp->root;
+    cmp1 = splay_compare (key, &n->key);
+
+    /* Found.  */
+    if (cmp1 == 0)
+      return;
+
+    /* Left or right?  If no child, then we're done.  */
+    if (cmp1 < 0)
+      c = n->left;
+    else
+      c = n->right;
+    if (!c)
+      return;
+
+    /* Next one left or right?  If found or no child, we're done
+       after one rotation.  */
+    cmp2 = splay_compare (key, &c->key);
+    if (cmp2 == 0
+	|| (cmp2 < 0 && !c->left)
+	|| (cmp2 > 0 && !c->right))
+      {
+	if (cmp1 < 0)
+	  rotate_left (&sp->root, n, c);
+	else
+	  rotate_right (&sp->root, n, c);
+	return;
+      }
+
+    /* Now we have the four cases of double-rotation.  */
+    if (cmp1 < 0 && cmp2 < 0)
+      {
+	rotate_left (&n->left, c, c->left);
+	rotate_left (&sp->root, n, n->left);
+      }
+    else if (cmp1 > 0 && cmp2 > 0)
+      {
+	rotate_right (&n->right, c, c->right);
+	rotate_right (&sp->root, n, n->right);
+      }
+    else if (cmp1 < 0 && cmp2 > 0)
+      {
+	rotate_right (&n->left, c, c->right);
+	rotate_left (&sp->root, n, n->left);
+      }
+    else if (cmp1 > 0 && cmp2 < 0)
+      {
+	rotate_left (&n->right, c, c->left);
+	rotate_right (&sp->root, n, n->right);
+      }
+  } while (1);
+}
+
+/* Insert a new NODE into SP.  The NODE shouldn't exist in the tree.  */
+
+attribute_hidden void
+splay_tree_insert (splay_tree sp, splay_tree_node node)
+{
+  int comparison = 0;
+
+  splay_tree_splay (sp, &node->key);
+
+  if (sp->root)
+    comparison = splay_compare (&sp->root->key, &node->key);
+
+  if (sp->root && comparison == 0)
+    gomp_fatal ("Duplicate node");
+  else
+    {
+      /* Insert it at the root.  */
+      if (sp->root == NULL)
+	node->left = node->right = NULL;
+      else if (comparison < 0)
+	{
+	  node->left = sp->root;
+	  node->right = node->left->right;
+	  node->left->right = NULL;
+	}
+      else
+	{
+	  node->right = sp->root;
+	  node->left = node->right->left;
+	  node->right->left = NULL;
+	}
+
+      sp->root = node;
+    }
+}
+
+/* Remove node with KEY from SP.  It is not an error if it did not exist.  */
+
+attribute_hidden void
+splay_tree_remove (splay_tree sp, splay_tree_key key)
+{
+  splay_tree_splay (sp, key);
+
+  if (sp->root && splay_compare (&sp->root->key, key) == 0)
+    {
+      splay_tree_node left, right;
+
+      left = sp->root->left;
+      right = sp->root->right;
+
+      /* One of the children is now the root.  Doesn't matter much
+	 which, so long as we preserve the properties of the tree.  */
+      if (left)
+	{
+	  sp->root = left;
+
+	  /* If there was a right child as well, hang it off the
+	     right-most leaf of the left child.  */
+	  if (right)
+	    {
+	      while (left->right)
+		left = left->right;
+	      left->right = right;
+	    }
+	}
+      else
+	sp->root = right;
+    }
+}
+
+/* Lookup KEY in SP, returning NODE if present, and NULL
+   otherwise.  */
+
+attribute_hidden splay_tree_key
+splay_tree_lookup (splay_tree sp, splay_tree_key key)
+{
+  splay_tree_splay (sp, key);
+
+  if (sp->root && splay_compare (&sp->root->key, key) == 0)
+    return &sp->root->key;
+  else
+    return NULL;
+}
diff --git a/libgomp/splay-tree.h b/libgomp/splay-tree.h
index 04a71d1..d98ee9e 100644
--- a/libgomp/splay-tree.h
+++ b/libgomp/splay-tree.h
@@ -43,6 +43,30 @@ typedef struct splay_tree_key_s *splay_tree_key;
    The major feature of splay trees is that all basic tree operations
    are amortized O(log n) time for a tree with n nodes.  */
 
+#ifndef _SPLAY_TREE_H
+#define _SPLAY_TREE_H 1
+
+typedef struct splay_tree_node_s *splay_tree_node;
+typedef struct splay_tree_s *splay_tree;
+typedef struct splay_tree_key_s *splay_tree_key;
+
+struct splay_tree_key_s {
+  /* Address of the host object.  */
+  uintptr_t host_start;
+  /* Address immediately after the host object.  */
+  uintptr_t host_end;
+  /* Descriptor of the target memory.  */
+  struct target_mem_desc *tgt;
+  /* Offset from tgt->tgt_start to the start of the target object.  */
+  uintptr_t tgt_offset;
+  /* Reference count.  */
+  uintptr_t refcount;
+  /* Asynchronous reference count.  */
+  uintptr_t async_refcount;
+  /* True if data should be copied from device to host at the end.  */
+  bool copy_from;
+};
+
 /* The nodes in the splay tree.  */
 struct splay_tree_node_s {
   struct splay_tree_key_s key;
@@ -56,177 +80,8 @@ struct splay_tree_s {
   splay_tree_node root;
 };
 
-/* Rotate the edge joining the left child N with its parent P.  PP is the
-   grandparents' pointer to P.  */
-
-static inline void
-rotate_left (splay_tree_node *pp, splay_tree_node p, splay_tree_node n)
-{
-  splay_tree_node tmp;
-  tmp = n->right;
-  n->right = p;
-  p->left = tmp;
-  *pp = n;
-}
-
-/* Rotate the edge joining the right child N with its parent P.  PP is the
-   grandparents' pointer to P.  */
-
-static inline void
-rotate_right (splay_tree_node *pp, splay_tree_node p, splay_tree_node n)
-{
-  splay_tree_node tmp;
-  tmp = n->left;
-  n->left = p;
-  p->right = tmp;
-  *pp = n;
-}
-
-/* Bottom up splay of KEY.  */
-
-static void
-splay_tree_splay (splay_tree sp, splay_tree_key key)
-{
-  if (sp->root == NULL)
-    return;
-
-  do {
-    int cmp1, cmp2;
-    splay_tree_node n, c;
-
-    n = sp->root;
-    cmp1 = splay_compare (key, &n->key);
-
-    /* Found.  */
-    if (cmp1 == 0)
-      return;
-
-    /* Left or right?  If no child, then we're done.  */
-    if (cmp1 < 0)
-      c = n->left;
-    else
-      c = n->right;
-    if (!c)
-      return;
-
-    /* Next one left or right?  If found or no child, we're done
-       after one rotation.  */
-    cmp2 = splay_compare (key, &c->key);
-    if (cmp2 == 0
-	|| (cmp2 < 0 && !c->left)
-	|| (cmp2 > 0 && !c->right))
-      {
-	if (cmp1 < 0)
-	  rotate_left (&sp->root, n, c);
-	else
-	  rotate_right (&sp->root, n, c);
-	return;
-      }
-
-    /* Now we have the four cases of double-rotation.  */
-    if (cmp1 < 0 && cmp2 < 0)
-      {
-	rotate_left (&n->left, c, c->left);
-	rotate_left (&sp->root, n, n->left);
-      }
-    else if (cmp1 > 0 && cmp2 > 0)
-      {
-	rotate_right (&n->right, c, c->right);
-	rotate_right (&sp->root, n, n->right);
-      }
-    else if (cmp1 < 0 && cmp2 > 0)
-      {
-	rotate_right (&n->left, c, c->right);
-	rotate_left (&sp->root, n, n->left);
-      }
-    else if (cmp1 > 0 && cmp2 < 0)
-      {
-	rotate_left (&n->right, c, c->left);
-	rotate_right (&sp->root, n, n->right);
-      }
-  } while (1);
-}
-
-/* Insert a new NODE into SP.  The NODE shouldn't exist in the tree.  */
-
-static void
-splay_tree_insert (splay_tree sp, splay_tree_node node)
-{
-  int comparison = 0;
-
-  splay_tree_splay (sp, &node->key);
-
-  if (sp->root)
-    comparison = splay_compare (&sp->root->key, &node->key);
-
-  if (sp->root && comparison == 0)
-    abort ();
-  else
-    {
-      /* Insert it at the root.  */
-      if (sp->root == NULL)
-	node->left = node->right = NULL;
-      else if (comparison < 0)
-	{
-	  node->left = sp->root;
-	  node->right = node->left->right;
-	  node->left->right = NULL;
-	}
-      else
-	{
-	  node->right = sp->root;
-	  node->left = node->right->left;
-	  node->right->left = NULL;
-	}
-
-      sp->root = node;
-    }
-}
-
-/* Remove node with KEY from SP.  It is not an error if it did not exist.  */
-
-static void
-splay_tree_remove (splay_tree sp, splay_tree_key key)
-{
-  splay_tree_splay (sp, key);
-
-  if (sp->root && splay_compare (&sp->root->key, key) == 0)
-    {
-      splay_tree_node left, right;
-
-      left = sp->root->left;
-      right = sp->root->right;
-
-      /* One of the children is now the root.  Doesn't matter much
-	 which, so long as we preserve the properties of the tree.  */
-      if (left)
-	{
-	  sp->root = left;
-
-	  /* If there was a right child as well, hang it off the
-	     right-most leaf of the left child.  */
-	  if (right)
-	    {
-	      while (left->right)
-		left = left->right;
-	      left->right = right;
-	    }
-	}
-      else
-	sp->root = right;
-    }
-}
-
-/* Lookup KEY in SP, returning NODE if present, and NULL
-   otherwise.  */
-
-static splay_tree_key
-splay_tree_lookup (splay_tree sp, splay_tree_key key)
-{
-  splay_tree_splay (sp, key);
-
-  if (sp->root && splay_compare (&sp->root->key, key) == 0)
-    return &sp->root->key;
-  else
-    return NULL;
-}
+attribute_hidden splay_tree_key splay_tree_lookup (splay_tree, splay_tree_key);
+attribute_hidden void splay_tree_insert (splay_tree, splay_tree_node);
+attribute_hidden void splay_tree_remove (splay_tree, splay_tree_key);
+
+#endif /* _SPLAY_TREE_H */
diff --git a/libgomp/target.c b/libgomp/target.c
index af74916..418ba61 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -26,10 +26,11 @@
    creation and termination.  */
 
 #include "libgomp.h"
-#include <limits.h>
-#include <stdbool.h>
-#include <stdlib.h>
+#include "oacc-plugin.h"
+#include "gomp-constants.h"
 #include <string.h>
+#include <stdio.h>
+#include <assert.h>
 
 #ifdef PLUGIN_SUPPORT
 # include <dlfcn.h>
@@ -40,54 +41,7 @@ static void gomp_target_init (void);
 
 static pthread_once_t gomp_is_initialized = PTHREAD_ONCE_INIT;
 
-/* Forward declaration for a node in the tree.  */
-typedef struct splay_tree_node_s *splay_tree_node;
-typedef struct splay_tree_s *splay_tree;
-typedef struct splay_tree_key_s *splay_tree_key;
-
-struct target_mem_desc {
-  /* Reference count.  */
-  uintptr_t refcount;
-  /* All the splay nodes allocated together.  */
-  splay_tree_node array;
-  /* Start of the target region.  */
-  uintptr_t tgt_start;
-  /* End of the targer region.  */
-  uintptr_t tgt_end;
-  /* Handle to free.  */
-  void *to_free;
-  /* Previous target_mem_desc.  */
-  struct target_mem_desc *prev;
-  /* Number of items in following list.  */
-  size_t list_count;
-
-  /* Corresponding target device descriptor.  */
-  struct gomp_device_descr *device_descr;
-
-  /* List of splay keys to remove (or decrease refcount)
-     at the end of region.  */
-  splay_tree_key list[];
-};
-
-struct splay_tree_key_s {
-  /* Address of the host object.  */
-  uintptr_t host_start;
-  /* Address immediately after the host object.  */
-  uintptr_t host_end;
-  /* Descriptor of the target memory.  */
-  struct target_mem_desc *tgt;
-  /* Offset from tgt->tgt_start to the start of the target object.  */
-  uintptr_t tgt_offset;
-  /* Reference count.  */
-  uintptr_t refcount;
-  /* True if data should be copied from device to host at the end.  */
-  bool copy_from;
-};
-
-enum target_type {
-  TARGET_TYPE_HOST,
-  TARGET_TYPE_INTEL_MIC
-};
+#include "splay-tree.h"
 
 /* This structure describes an offload image.
    It contains type of the target, pointer to host table descriptor, and pointer
@@ -112,7 +66,7 @@ static int num_devices;
 
 /* The comparison function.  */
 
-static int
+attribute_hidden int
 splay_compare (splay_tree_key x, splay_tree_key y)
 {
   if (x->host_start == x->host_end
@@ -125,57 +79,18 @@ splay_compare (splay_tree_key x, splay_tree_key y)
   return 0;
 }
 
-#include "splay-tree.h"
+#include "target.h"
 
-/* This structure describes accelerator device.
-   It contains name of the corresponding libgomp plugin, function handlers for
-   interaction with the device, ID-number of the device, and information about
-   mapped memory.  */
-struct gomp_device_descr
+attribute_hidden void
+gomp_init_targets_once (void)
 {
-  /* This is the ID number of device.  It could be specified in DEVICE-clause of
-     TARGET construct.  */
-  int id;
-
-  /* This is the TYPE of device.  */
-  enum target_type type;
-
-  /* Set to true when device is initialized.  */
-  bool is_initialized;
-
-  /* Plugin file handler.  */
-  void *plugin_handle;
-
-  /* Function handlers.  */
-  int (*get_type_func) (void);
-  int (*get_num_devices_func) (void);
-  void (*offload_register_func) (void *, void *);
-  void (*device_init_func) (void);
-  int (*device_get_table_func) (void *);
-  void *(*device_alloc_func) (size_t);
-  void (*device_free_func) (void *);
-  void *(*device_dev2host_func) (void *, const void *, size_t);
-  void *(*device_host2dev_func) (void *, const void *, size_t);
-  void (*device_run_func) (void *, void *);
-
-  /* Splay tree containing information about mapped memory regions.  */
-  struct splay_tree_s dev_splay_tree;
-
-  /* Mutex for operating with the splay tree and other shared structures.  */
-  gomp_mutex_t dev_env_lock;
-};
-
-struct mapping_table {
-  uintptr_t host_start;
-  uintptr_t host_end;
-  uintptr_t tgt_start;
-  uintptr_t tgt_end;
-};
+  (void) pthread_once (&gomp_is_initialized, gomp_target_init);
+}
 
 attribute_hidden int
 gomp_get_num_devices (void)
 {
-  (void) pthread_once (&gomp_is_initialized, gomp_target_init);
+  gomp_init_targets_once ();
   return num_devices;
 }
 
@@ -188,12 +103,39 @@ resolve_device (int device_id)
       device_id = icv->default_device_var;
     }
   if (device_id < 0
-      || (device_id >= gomp_get_num_devices ()))
+      || device_id >= gomp_get_num_devices ())
     return NULL;
 
   return &devices[device_id];
 }
 
+__attribute__((used)) static void
+dump_mappings (FILE *f, splay_tree_node node)
+{
+  int i;
+  
+  splay_tree_key k = &node->key;
+  
+  if (!k)
+    return;
+  
+  fprintf (f, "key %p: host_start %p, host_end %p, tgt_offset %p, refcount %d, "
+	   "copy_from %s\n", k, (void *) k->host_start,
+	   (void *) k->host_end, (void *) k->tgt_offset, (int) k->refcount,
+	   k->copy_from ? "true" : "false");
+  fprintf (f, "tgt->refcount %d, tgt->tgt_start %p, tgt->tgt_end %p, "
+	   "tgt->to_free %p, tgt->prev %p, tgt->list_count %d, "
+	   "tgt->device_descr %p\n", (int) k->tgt->refcount,
+	   (void *) k->tgt->tgt_start, (void *) k->tgt->tgt_end,
+	   k->tgt->to_free, k->tgt->prev, (int) k->tgt->list_count,
+	   k->tgt->device_descr);
+
+  for (i = 0; i < k->tgt->list_count; i++)
+    fprintf (f, "item %d: %p\n", i, k->tgt->list[i]);
+  
+  dump_mappings (f, node->left);
+  dump_mappings (f, node->right);
+}
 
 /* Handle the case where splay_tree_lookup found oldn for newn.
    Helper function of gomp_map_vars.  */
@@ -211,18 +153,50 @@ gomp_map_vars_existing (splay_tree_key oldn, splay_tree_key newn,
   oldn->refcount++;
 }
 
-static struct target_mem_desc *
-gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
-	       void **hostaddrs, size_t *sizes, unsigned char *kinds,
-	       bool is_target)
+static int
+get_kind (bool is_openacc, void *kinds, int idx)
+{
+  return is_openacc ? ((unsigned short *) kinds)[idx]
+		    : ((unsigned char *) kinds)[idx];
+}
+
+attribute_hidden struct target_mem_desc *
+gomp_map_vars (struct gomp_device_descr *devicep,
+	       struct gomp_memory_mapping *mm, size_t mapnum,
+	       void **hostaddrs, void **devaddrs, size_t *sizes,
+	       void *kinds, bool is_openacc, bool is_target)
 {
   size_t i, tgt_align, tgt_size, not_found_cnt = 0;
+  const int rshift = is_openacc ? 8 : 3;
+  const int typemask = is_openacc ? 0xff : 0x7;
   struct splay_tree_key_s cur_node;
   struct target_mem_desc *tgt
     = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
   tgt->list_count = mapnum;
   tgt->refcount = 1;
   tgt->device_descr = devicep;
+  tgt->mem_map = mm;
+
+  /* From gcc/fortran/trans-types.c  */
+  struct descriptor_dimension
+    {
+      long stride;
+      long lbound;
+      long ubound;
+    };
+
+   struct gfc_array_descriptor
+     {
+       void *data;
+       long offset;
+       long dtype;
+       struct descriptor_dimension dimension[];
+     };
+
+#define GFC_DTYPE_RANK_MASK     0x07
+#define GFC_DTYPE_TYPE_MASK     0x38
+#define GFC_DTYPE_TYPE_SHIFT    3
+#define GFC_DTYPE_SIZE_SHIFT    6
 
   if (mapnum == 0)
     return tgt;
@@ -235,41 +209,81 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       tgt_align = align;
       tgt_size = mapnum * sizeof (void *);
     }
-
-  gomp_mutex_lock (&devicep->dev_env_lock);
+  gomp_mutex_lock (&mm->lock);
   for (i = 0; i < mapnum; i++)
     {
+      int kind = get_kind (is_openacc, kinds, i);
       if (hostaddrs[i] == NULL)
 	{
 	  tgt->list[i] = NULL;
 	  continue;
 	}
       cur_node.host_start = (uintptr_t) hostaddrs[i];
-      if ((kinds[i] & 7) != 4)
+      if (!GOMP_MAP_POINTER_P (kind & typemask))
 	cur_node.host_end = cur_node.host_start + sizes[i];
       else
 	cur_node.host_end = cur_node.host_start + sizeof (void *);
-      splay_tree_key n = splay_tree_lookup (&devicep->dev_splay_tree,
-					    &cur_node);
+      splay_tree_key n = splay_tree_lookup (&mm->splay_tree, &cur_node);
       if (n)
 	{
 	  tgt->list[i] = n;
-	  gomp_map_vars_existing (n, &cur_node, kinds[i]);
+	  gomp_map_vars_existing (n, &cur_node, kind);
 	}
       else
 	{
-	  size_t align = (size_t) 1 << (kinds[i] >> 3);
 	  tgt->list[i] = NULL;
+
+	  if ((kind & typemask) == GOMP_MAP_TO_PSET)
+	    {
+	      struct gfc_array_descriptor *gad;
+	      size_t rank;
+	      int j;
+              bool alloc_arrays = true;
+
+	      for (j = i - 1; j >= 0; j--)
+		{
+		  if (hostaddrs[j] == *(void**)hostaddrs[i])
+		    {
+		      alloc_arrays = false;
+		      break;
+		    }
+		}
+
+	      gad = (struct gfc_array_descriptor *) cur_node.host_start;
+	      rank = gad->dtype & GFC_DTYPE_RANK_MASK;
+
+	      cur_node.host_start = (uintptr_t)gad->data;
+	      cur_node.host_end = cur_node.host_start +
+				sizeof (struct gfc_array_descriptor) +
+				(sizeof (struct descriptor_dimension) * rank);
+
+	      if (alloc_arrays)
+                {
+                  size_t tsize;
+
+                  tsize = gad->dtype >> GFC_DTYPE_SIZE_SHIFT;
+
+                  for (j = 0; j < rank; j++)
+                    {
+                      cur_node.host_end += tsize *
+                        (gad->dimension[j].ubound -
+                         gad->dimension[j].lbound + 1);
+                    }
+                }
+	    }
+
+	  size_t align = (size_t) 1 << (kind >> rshift);
 	  not_found_cnt++;
 	  if (tgt_align < align)
 	    tgt_align = align;
 	  tgt_size = (tgt_size + align - 1) & ~(align - 1);
 	  tgt_size += cur_node.host_end - cur_node.host_start;
-	  if ((kinds[i] & 7) == 5)
+	  if ((kind & typemask) == GOMP_MAP_TO_PSET)
 	    {
 	      size_t j;
 	      for (j = i + 1; j < mapnum; j++)
-		if ((kinds[j] & 7) != 4)
+		if (!GOMP_MAP_POINTER_P (get_kind (is_openacc, kinds, j)
+					 & typemask))
 		  break;
 		else if ((uintptr_t) hostaddrs[j] < cur_node.host_start
 			 || ((uintptr_t) hostaddrs[j] + sizeof (void *)
@@ -284,7 +298,15 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	}
     }
 
-  if (not_found_cnt || is_target)
+  if (devaddrs)
+    {
+      if (mapnum != 1)
+        gomp_fatal ("unexpected aggregation");
+      tgt->to_free = devaddrs[0];
+      tgt->tgt_start = (uintptr_t) tgt->to_free;
+      tgt->tgt_end = tgt->tgt_start + sizes[0];
+    }
+  else if (not_found_cnt || is_target)
     {
       /* Allocate tgt_align aligned tgt_size block of memory.  */
       /* FIXME: Perhaps change interface to allocate properly aligned
@@ -315,44 +337,53 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       for (i = 0; i < mapnum; i++)
 	if (tgt->list[i] == NULL)
 	  {
+	    int kind = get_kind (is_openacc, kinds, i);
 	    if (hostaddrs[i] == NULL)
 	      continue;
 	    splay_tree_key k = &array->key;
 	    k->host_start = (uintptr_t) hostaddrs[i];
-	    if ((kinds[i] & 7) != 4)
+	    if (!GOMP_MAP_POINTER_P (kind & typemask))
 	      k->host_end = k->host_start + sizes[i];
 	    else
 	      k->host_end = k->host_start + sizeof (void *);
-	    splay_tree_key n
-	      = splay_tree_lookup (&devicep->dev_splay_tree, k);
+	    splay_tree_key n = splay_tree_lookup (&mm->splay_tree, k);
 	    if (n)
 	      {
 		tgt->list[i] = n;
-		gomp_map_vars_existing (n, k, kinds[i]);
+		gomp_map_vars_existing (n, k, kind);
 	      }
 	    else
 	      {
-		size_t align = (size_t) 1 << (kinds[i] >> 3);
+		size_t align = (size_t) 1 << (kind >> rshift);
 		tgt->list[i] = k;
 		tgt_size = (tgt_size + align - 1) & ~(align - 1);
 		k->tgt = tgt;
 		k->tgt_offset = tgt_size;
 		tgt_size += k->host_end - k->host_start;
-		k->copy_from = false;
-		if ((kinds[i] & 7) == 2 || (kinds[i] & 7) == 3)
-		  k->copy_from = true;
+		k->copy_from = GOMP_MAP_COPYFROM_P (kind & typemask)
+			       || GOMP_MAP_TOFROM_P (kind & typemask);
 		k->refcount = 1;
+		k->async_refcount = 0;
 		tgt->refcount++;
 		array->left = NULL;
 		array->right = NULL;
-		splay_tree_insert (&devicep->dev_splay_tree, array);
-		switch (kinds[i] & 7)
+
+		splay_tree_insert (&mm->splay_tree, array);
+
+		switch (kind & typemask)
 		  {
-		  case 0: /* ALLOC */
-		  case 2: /* FROM */
+		  case GOMP_MAP_FORCE_ALLOC:
+		  case GOMP_MAP_FORCE_FROM:
+		    /* FIXME: No special handling (see comment in
+		       oacc-parallel.c).  */
+		  case GOMP_MAP_ALLOC:
+		  case GOMP_MAP_ALLOC_FROM:
 		    break;
-		  case 1: /* TO */
-		  case 3: /* TOFROM */
+		  case GOMP_MAP_FORCE_TO:
+		  case GOMP_MAP_FORCE_TOFROM:
+		    /* FIXME: No special handling, as above.  */
+		  case GOMP_MAP_ALLOC_TO:
+		  case GOMP_MAP_ALLOC_TOFROM:
 		    /* Copy from host to device memory.  */
 		    /* FIXME: Perhaps add some smarts, like if copying
 		       several adjacent fields from host to target, use some
@@ -362,7 +393,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		       (void *) k->host_start,
 		       k->host_end - k->host_start);
 		    break;
-		  case 4: /* POINTER */
+		  case GOMP_MAP_POINTER:
 		    cur_node.host_start
 		      = (uintptr_t) *(void **) k->host_start;
 		    if (cur_node.host_start == (uintptr_t) NULL)
@@ -379,25 +410,23 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		    /* Add bias to the pointer value.  */
 		    cur_node.host_start += sizes[i];
 		    cur_node.host_end = cur_node.host_start + 1;
-		    n = splay_tree_lookup (&devicep->dev_splay_tree,
-					   &cur_node);
+		    n = splay_tree_lookup (&mm->splay_tree, &cur_node);
 		    if (n == NULL)
 		      {
 			/* Could be possibly zero size array section.  */
 			cur_node.host_end--;
-			n = splay_tree_lookup (&devicep->dev_splay_tree,
-					       &cur_node);
+			n = splay_tree_lookup (&mm->splay_tree, &cur_node);
 			if (n == NULL)
 			  {
 			    cur_node.host_start--;
-			    n = splay_tree_lookup (&devicep->dev_splay_tree,
-						   &cur_node);
+			    n = splay_tree_lookup (&mm->splay_tree, &cur_node);
 			    cur_node.host_start++;
 			  }
 		      }
 		    if (n == NULL)
 		      gomp_fatal ("Pointer target of array section "
 				  "wasn't mapped");
+
 		    cur_node.host_start -= n->host_start;
 		    cur_node.tgt_offset = n->tgt->tgt_start + n->tgt_offset
 					  + cur_node.host_start;
@@ -412,86 +441,126 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		       (void *) &cur_node.tgt_offset,
 		       sizeof (void *));
 		    break;
-		  case 5: /* TO_PSET */
-		    /* Copy from host to device memory.  */
-		    /* FIXME: see above FIXME comment.  */
-		    devicep->device_host2dev_func
-		      ((void *) (tgt->tgt_start + k->tgt_offset),
-		       (void *) k->host_start,
-		       (k->host_end - k->host_start));
-		    for (j = i + 1; j < mapnum; j++)
-		      if ((kinds[j] & 7) != 4)
-			break;
-		      else if ((uintptr_t) hostaddrs[j] < k->host_start
+		  case GOMP_MAP_TO_PSET:
+		    {
+		      /* Copy from host to device memory.  */
+		      /* FIXME: see above FIXME comment.  */
+		      devicep->device_host2dev_func
+				((void *) (tgt->tgt_start + k->tgt_offset),
+				(void *) k->host_start,
+				(k->host_end - k->host_start));
+		      devicep->device_host2dev_func
+				((void *) (tgt->tgt_start + k->tgt_offset),
+				(void *) &tgt->tgt_start,
+				sizeof (void *));
+
+		      for (j = i + 1; j < mapnum; j++)
+			if (!GOMP_MAP_POINTER_P (get_kind (is_openacc, kinds, j)
+					       & typemask))
+			  break;
+			else if ((uintptr_t) hostaddrs[j] < k->host_start
 			       || ((uintptr_t) hostaddrs[j] + sizeof (void *)
 				   > k->host_end))
-			break;
-		      else
-			{
-			  tgt->list[j] = k;
-			  k->refcount++;
-			  cur_node.host_start
-			    = (uintptr_t) *(void **) hostaddrs[j];
-			  if (cur_node.host_start == (uintptr_t) NULL)
-			    {
-			      cur_node.tgt_offset = (uintptr_t) NULL;
-			      /* Copy from host to device memory.  */
-			      /* FIXME: see above FIXME comment.  */
-			      devicep->device_host2dev_func
-				((void *) (tgt->tgt_start + k->tgt_offset
+			  break;
+			else
+			  {
+			    tgt->list[j] = k;
+			    k->refcount++;
+			    cur_node.host_start
+			      = (uintptr_t) *(void **) hostaddrs[j];
+			    if (cur_node.host_start == (uintptr_t) NULL)
+			      {
+			        cur_node.tgt_offset = (uintptr_t) NULL;
+			        /* Copy from host to device memory.  */
+			        /* FIXME: see above FIXME comment.  */
+			        devicep->device_host2dev_func
+				  ((void *) (tgt->tgt_start + k->tgt_offset
 					   + ((uintptr_t) hostaddrs[j]
 					      - k->host_start)),
-				 (void *) &cur_node.tgt_offset,
-				 sizeof (void *));
-			      i++;
-			      continue;
-			    }
-			  /* Add bias to the pointer value.  */
-			  cur_node.host_start += sizes[j];
-			  cur_node.host_end = cur_node.host_start + 1;
-			  n = splay_tree_lookup (&devicep->dev_splay_tree,
-						 &cur_node);
-			  if (n == NULL)
-			    {
-			      /* Could be possibly zero size array section.  */
-			      cur_node.host_end--;
-			      n = splay_tree_lookup (&devicep->dev_splay_tree,
+				   (void *) &cur_node.tgt_offset,
+				   sizeof (void *));
+			        i++;
+			        continue;
+			      }
+			    /* Add bias to the pointer value.  */
+			    cur_node.host_start += sizes[j];
+			    cur_node.host_end = cur_node.host_start + 1;
+			    n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+			    if (n == NULL)
+			      {
+			        /* Could be possibly zero size array
+				   section.  */
+			        cur_node.host_end--;
+			        n = splay_tree_lookup (&mm->splay_tree,
 						     &cur_node);
-			      if (n == NULL)
-				{
-				  cur_node.host_start--;
-				  n = splay_tree_lookup
-					(&devicep->dev_splay_tree, &cur_node);
-				  cur_node.host_start++;
-				}
-			    }
-			  if (n == NULL)
-			    gomp_fatal ("Pointer target of array section "
+			        if (n == NULL)
+				  {
+				    cur_node.host_start--;
+				    n = splay_tree_lookup (&mm->splay_tree,
+							 &cur_node);
+				    cur_node.host_start++;
+				  }
+			      }
+			    if (n == NULL)
+				gomp_fatal ("Pointer target of array section "
 					"wasn't mapped");
-			  cur_node.host_start -= n->host_start;
-			  cur_node.tgt_offset = n->tgt->tgt_start
+			    cur_node.host_start -= n->host_start;
+			    cur_node.tgt_offset = n->tgt->tgt_start
 						+ n->tgt_offset
 						+ cur_node.host_start;
-			  /* At this point tgt_offset is target address of the
-			     array section.  Now subtract bias to get what we
-			     want to initialize the pointer with.  */
-			  cur_node.tgt_offset -= sizes[j];
-			  /* Copy from host to device memory.  */
-			  /* FIXME: see above FIXME comment.  */
-			  devicep->device_host2dev_func
-			    ((void *) (tgt->tgt_start + k->tgt_offset
+			    /* At this point tgt_offset is target address of the
+			       array section.  Now subtract bias to get what we
+			       want to initialize the pointer with.  */
+			    cur_node.tgt_offset -= sizes[j];
+			    /* Copy from host to device memory.  */
+			    /* FIXME: see above FIXME comment.  */
+
+			    devicep->device_host2dev_func
+				((void *) (tgt->tgt_start + k->tgt_offset
 				       + ((uintptr_t) hostaddrs[j]
 					  - k->host_start)),
-			     (void *) &cur_node.tgt_offset,
-			     sizeof (void *));
-			  i++;
-			}
+				(void *) &cur_node.tgt_offset,
+				sizeof (void *));
+			    i++;
+			  }
 		      break;
+		      }
+		    case GOMP_MAP_FORCE_PRESENT:
+		      {
+		        /* We already looked up the memory region above and it
+			   was missing.  */
+			size_t size = k->host_end - k->host_start;
+			gomp_fatal ("present clause: !acc_is_present (%p, "
+				    "%zd (0x%zx))", (void *) k->host_start,
+				    size, size);
+		      }
+		      break;
+		    case GOMP_MAP_FORCE_DEVICEPTR:
+		      assert (k->host_end - k->host_start == sizeof (void *));
+		      
+		      devicep->device_host2dev_func
+		        ((void *) (tgt->tgt_start + k->tgt_offset),
+			 (void *) k->host_start,
+			 sizeof (void *));
+		      break;
+		    case GOMP_MAP_FORCE_PRIVATE:
+		      abort ();
+		    case GOMP_MAP_FORCE_FIRSTPRIVATE:
+		      abort ();
+		    default:
+		      gomp_fatal ("%s: unhandled kind 0x%.2x", __FUNCTION__,
+				  kind);
 		  }
 		array++;
 	      }
 	  }
     }
+
+#undef GFC_DTYPE_RANK_MASK
+#undef GFC_DTYPE_TYPE_MASK
+#undef GFC_DTYPE_TYPE_SHIFT
+#undef GFC_DTYPE_SIZE_SHIFT
+	
   if (is_target)
     {
       for (i = 0; i < mapnum; i++)
@@ -510,7 +579,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	}
     }
 
-  gomp_mutex_unlock (&devicep->dev_env_lock);
+  gomp_mutex_unlock (&mm->lock);
   return tgt;
 }
 
@@ -525,10 +594,52 @@ gomp_unmap_tgt (struct target_mem_desc *tgt)
   free (tgt);
 }
 
-static void
-gomp_unmap_vars (struct target_mem_desc *tgt)
+/* Decrease the refcount for a set of mapped variables, and queue asychronous
+   copies from the device back to the host after any work that has been issued. 
+   Because the regions are still "live", increment an asynchronous reference
+   count to indicate that they should not be unmapped from host-side data
+   structures until the asynchronous copy has completed.  */
+
+attribute_hidden void
+gomp_copy_from_async (struct target_mem_desc *tgt)
+{
+  struct gomp_device_descr *devicep = tgt->device_descr;
+  struct gomp_memory_mapping *mm = tgt->mem_map;
+  size_t i;
+  
+  gomp_mutex_lock (&mm->lock);
+
+  for (i = 0; i < tgt->list_count; i++)
+    if (tgt->list[i] == NULL)
+      ;
+    else if (tgt->list[i]->refcount > 1)
+      {
+	tgt->list[i]->refcount--;
+	tgt->list[i]->async_refcount++;
+      }
+    else
+      {
+	splay_tree_key k = tgt->list[i];
+	if (k->copy_from)
+	  /* Copy from device to host memory.  */
+	  devicep->device_dev2host_func
+	    ((void *) k->host_start,
+	     (void *) (k->tgt->tgt_start + k->tgt_offset),
+	     k->host_end - k->host_start);
+      }
+
+  gomp_mutex_unlock (&mm->lock);
+}
+
+/* Unmap variables described by TGT.  If DO_COPYFROM is true, copy relevant
+   variables back from device to host: if it is false, it is assumed that this
+   has been done already, i.e. by gomp_copy_from_async above.  */
+
+attribute_hidden void
+gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
 {
   struct gomp_device_descr *devicep = tgt->device_descr;
+  struct gomp_memory_mapping *mm = tgt->mem_map;
 
   if (tgt->list_count == 0)
     {
@@ -537,22 +648,24 @@ gomp_unmap_vars (struct target_mem_desc *tgt)
     }
 
   size_t i;
-  gomp_mutex_lock (&devicep->dev_env_lock);
+  gomp_mutex_lock (&mm->lock);
   for (i = 0; i < tgt->list_count; i++)
     if (tgt->list[i] == NULL)
       ;
     else if (tgt->list[i]->refcount > 1)
       tgt->list[i]->refcount--;
+    else if (tgt->list[i]->async_refcount > 0)
+      tgt->list[i]->async_refcount--;
     else
       {
 	splay_tree_key k = tgt->list[i];
-	if (k->copy_from)
+	if (k->copy_from && do_copyfrom)
 	  /* Copy from device to host memory.  */
 	  devicep->device_dev2host_func
 	    ((void *) k->host_start,
 	     (void *) (k->tgt->tgt_start + k->tgt_offset),
 	     k->host_end - k->host_start);
-	splay_tree_remove (&devicep->dev_splay_tree, k);
+	splay_tree_remove (&mm->splay_tree, k);
 	if (k->tgt->refcount > 1)
 	  k->tgt->refcount--;
 	else
@@ -563,15 +676,17 @@ gomp_unmap_vars (struct target_mem_desc *tgt)
     tgt->refcount--;
   else
     gomp_unmap_tgt (tgt);
-  gomp_mutex_unlock (&devicep->dev_env_lock);
+  gomp_mutex_unlock (&mm->lock);
 }
 
 static void
-gomp_update (struct gomp_device_descr *devicep, size_t mapnum,
-	     void **hostaddrs, size_t *sizes, unsigned char *kinds)
+gomp_update (struct gomp_device_descr *devicep, struct gomp_memory_mapping *mm,
+	     size_t mapnum, void **hostaddrs, size_t *sizes, void *kinds,
+	     bool is_openacc)
 {
   size_t i;
   struct splay_tree_key_s cur_node;
+  const int typemask = is_openacc ? 0xff : 0x7;
 
   if (!devicep)
     return;
@@ -579,16 +694,17 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum,
   if (mapnum == 0)
     return;
 
-  gomp_mutex_lock (&devicep->dev_env_lock);
+  gomp_mutex_lock (&mm->lock);
   for (i = 0; i < mapnum; i++)
     if (sizes[i])
       {
 	cur_node.host_start = (uintptr_t) hostaddrs[i];
 	cur_node.host_end = cur_node.host_start + sizes[i];
-	splay_tree_key n = splay_tree_lookup (&devicep->dev_splay_tree,
+	splay_tree_key n = splay_tree_lookup (&mm->splay_tree,
 					      &cur_node);
 	if (n)
 	  {
+	    int kind = get_kind (is_openacc, kinds, i);
 	    if (n->host_start > cur_node.host_start
 		|| n->host_end < cur_node.host_end)
 	      gomp_fatal ("Trying to update [%p..%p) object when"
@@ -597,7 +713,7 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum,
 			  (void *) cur_node.host_end,
 			  (void *) n->host_start,
 			  (void *) n->host_end);
-	    if ((kinds[i] & 7) == 1)
+	    if (GOMP_MAP_COPYTO_P (kind & typemask))
 	      /* Copy from host to device memory.  */
 	      devicep->device_host2dev_func
 		((void *) (n->tgt->tgt_start
@@ -606,7 +722,7 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum,
 			   - n->host_start),
 		 (void *) cur_node.host_start,
 		 cur_node.host_end - cur_node.host_start);
-	    else if ((kinds[i] & 7) == 2)
+	    else if (GOMP_MAP_COPYFROM_P (kind & typemask))
 	      /* Copy from device to host memory.  */
 	      devicep->device_dev2host_func
 		((void *) cur_node.host_start,
@@ -621,19 +737,25 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum,
 		      (void *) cur_node.host_start,
 		      (void *) cur_node.host_end);
       }
-  gomp_mutex_unlock (&devicep->dev_env_lock);
+  gomp_mutex_unlock (&mm->lock);
 }
 
+static void gomp_register_image_for_device (struct gomp_device_descr *device,
+					    struct offload_image_descr *image);
+
 /* This function should be called from every offload image.  It gets the
    descriptor of the host func and var tables HOST_TABLE, TYPE of the target,
    and TARGET_DATA needed by target plugin (target tables, etc.)  */
 void
-GOMP_offload_register (void *host_table, int type, void *target_data)
+GOMP_offload_register (void *host_table, int type, void **target_data)
 {
   offload_images = gomp_realloc (offload_images,
 				 (num_offload_images + 1)
 				 * sizeof (struct offload_image_descr));
 
+  if (offload_images == NULL)
+    return;
+
   offload_images[num_offload_images].type = type;
   offload_images[num_offload_images].host_table = host_table;
   offload_images[num_offload_images].target_data = target_data;
@@ -641,18 +763,24 @@ GOMP_offload_register (void *host_table, int type, void *target_data)
   num_offload_images++;
 }
 
-static void
+attribute_hidden void
 gomp_init_device (struct gomp_device_descr *devicep)
 {
   /* Initialize the target device.  */
   devicep->device_init_func ();
 
+  devicep->is_initialized = true;
+}
+
+attribute_hidden void
+gomp_init_tables (const struct gomp_device_descr *devicep,
+		  struct gomp_memory_mapping *mm)
+{
   /* Get address mapping table for device.  */
   struct mapping_table *table = NULL;
-  int num_entries = devicep->device_get_table_func (&table);
+  int i, num_entries = devicep->device_get_table_func (&table);
 
   /* Insert host-target address mapping into dev_splay_tree.  */
-  int i;
   for (i = 0; i < num_entries; i++)
     {
       struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
@@ -662,7 +790,7 @@ gomp_init_device (struct gomp_device_descr *devicep)
       tgt->tgt_end = table[i].tgt_end;
       tgt->to_free = NULL;
       tgt->list_count = 0;
-      tgt->device_descr = devicep;
+      tgt->device_descr = (struct gomp_device_descr *) devicep;
       splay_tree_node node = tgt->array;
       splay_tree_key k = &node->key;
       k->host_start = table[i].host_start;
@@ -671,11 +799,38 @@ gomp_init_device (struct gomp_device_descr *devicep)
       k->tgt = tgt;
       node->left = NULL;
       node->right = NULL;
-      splay_tree_insert (&devicep->dev_splay_tree, node);
+      splay_tree_insert (&mm->splay_tree, node);
     }
 
   free (table);
-  devicep->is_initialized = true;
+  
+  mm->is_initialized = true;
+}
+
+static void
+gomp_init_dev_tables (struct gomp_device_descr *devicep)
+{
+  gomp_init_device (devicep);
+  gomp_init_tables (devicep, &devicep->mem_map);
+}
+
+attribute_hidden void
+gomp_fini_device (struct gomp_device_descr *devicep)
+{
+  struct gomp_memory_mapping *mm = &devicep->mem_map;
+
+  if (devicep->is_initialized)
+    devicep->device_fini_func ();
+
+  while (mm->splay_tree.root)
+    {
+      struct target_mem_desc *tgt = mm->splay_tree.root->key.tgt;
+      free (tgt->array);
+      free (tgt);
+      splay_tree_remove (&mm->splay_tree, &mm->splay_tree.root->key);
+    }
+
+  devicep->is_initialized = false;
 }
 
 /* Called when encountering a target directive.  If DEVICE
@@ -694,7 +849,12 @@ GOMP_target (int device, void (*fn) (void *), const void *openmp_target,
 	     unsigned char *kinds)
 {
   struct gomp_device_descr *devicep = resolve_device (device);
-  if (devicep == NULL)
+  struct gomp_memory_mapping *mm = &devicep->mem_map;
+
+  if (devicep != NULL && !devicep->is_initialized)
+    gomp_init_dev_tables (devicep);
+
+  if (devicep == NULL || !(devicep->capabilities & TARGET_CAP_OPENMP_400))
     {
       /* Host fallback.  */
       struct gomp_thread old_thr, *thr = gomp_thread ();
@@ -711,18 +871,18 @@ GOMP_target (int device, void (*fn) (void *), const void *openmp_target,
       return;
     }
 
-  if (!devicep->is_initialized)
-    gomp_init_device (devicep);
-
   struct splay_tree_key_s k;
   k.host_start = (uintptr_t) fn;
   k.host_end = k.host_start + 1;
-  splay_tree_key tgt_fn = splay_tree_lookup (&devicep->dev_splay_tree, &k);
-  if (tgt_fn == NULL && devicep->type != TARGET_TYPE_HOST)
+  gomp_mutex_lock (&mm->lock);
+  splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map.splay_tree, &k);
+  if (tgt_fn == NULL && !(devicep->capabilities & TARGET_CAP_NATIVE_EXEC))
     gomp_fatal ("Target function wasn't mapped");
+  gomp_mutex_unlock (&mm->lock);
 
   struct target_mem_desc *tgt_vars
-    = gomp_map_vars (devicep, mapnum, hostaddrs, sizes, kinds, true);
+    = gomp_map_vars (devicep, &devicep->mem_map, mapnum, hostaddrs, NULL,
+		     sizes, kinds, false, true);
   struct gomp_thread old_thr, *thr = gomp_thread ();
   old_thr = *thr;
   memset (thr, '\0', sizeof (*thr));
@@ -731,14 +891,14 @@ GOMP_target (int device, void (*fn) (void *), const void *openmp_target,
       thr->place = old_thr.place;
       thr->ts.place_partition_len = gomp_places_list_len;
     }
-  if (devicep->type == TARGET_TYPE_HOST)
+  if (devicep->capabilities & TARGET_CAP_NATIVE_EXEC)
     devicep->device_run_func (fn, (void *) tgt_vars->tgt_start);
   else
     devicep->device_run_func ((void *) tgt_fn->tgt->tgt_start,
 			      (void *) tgt_vars->tgt_start);
   gomp_free_thread (thr);
   *thr = old_thr;
-  gomp_unmap_vars (tgt_vars);
+  gomp_unmap_vars (tgt_vars, true);
 }
 
 void
@@ -746,7 +906,11 @@ GOMP_target_data (int device, const void *openmp_target, size_t mapnum,
 		  void **hostaddrs, size_t *sizes, unsigned char *kinds)
 {
   struct gomp_device_descr *devicep = resolve_device (device);
-  if (devicep == NULL)
+
+  if (devicep != NULL && !devicep->is_initialized)
+    gomp_init_dev_tables (devicep);
+
+  if (devicep == NULL || !(devicep->capabilities & TARGET_CAP_OPENMP_400))
     {
       /* Host fallback.  */
       struct gomp_task_icv *icv = gomp_icv (false);
@@ -757,18 +921,17 @@ GOMP_target_data (int device, const void *openmp_target, size_t mapnum,
 	     new #pragma omp target data, otherwise GOMP_target_end_data
 	     would get out of sync.  */
 	  struct target_mem_desc *tgt
-	    = gomp_map_vars (NULL, 0, NULL, NULL, NULL, false);
+	    = gomp_map_vars (NULL, NULL, 0, NULL, NULL, NULL, NULL, false,
+			     false);
 	  tgt->prev = icv->target_data;
 	  icv->target_data = tgt;
 	}
       return;
     }
 
-  if (!devicep->is_initialized)
-    gomp_init_device (devicep);
-
   struct target_mem_desc *tgt
-    = gomp_map_vars (devicep, mapnum, hostaddrs, sizes, kinds, false);
+    = gomp_map_vars (devicep, &devicep->mem_map, mapnum, hostaddrs, NULL, sizes,
+		     kinds, false, false);
   struct gomp_task_icv *icv = gomp_icv (true);
   tgt->prev = icv->target_data;
   icv->target_data = tgt;
@@ -782,7 +945,7 @@ GOMP_target_end_data (void)
     {
       struct target_mem_desc *tgt = icv->target_data;
       icv->target_data = tgt->prev;
-      gomp_unmap_vars (tgt);
+      gomp_unmap_vars (tgt, true);
     }
 }
 
@@ -791,13 +954,15 @@ GOMP_target_update (int device, const void *openmp_target, size_t mapnum,
 		    void **hostaddrs, size_t *sizes, unsigned char *kinds)
 {
   struct gomp_device_descr *devicep = resolve_device (device);
-  if (devicep == NULL)
-    return;
 
-  if (!devicep->is_initialized)
-    gomp_init_device (devicep);
+  if (devicep != NULL && !devicep->is_initialized)
+    gomp_init_dev_tables (devicep);
 
-  gomp_update (devicep, mapnum, hostaddrs, sizes, kinds);
+  if (devicep == NULL || !(devicep->capabilities & TARGET_CAP_OPENMP_400))
+    return;
+
+  gomp_update (devicep, &devicep->mem_map, mapnum, hostaddrs, sizes, kinds,
+	       false);
 }
 
 void
@@ -840,7 +1005,8 @@ static bool
 gomp_load_plugin_for_device (struct gomp_device_descr *device,
 			     const char *plugin_name)
 {
-  char *err = NULL;
+  char *err = NULL, *last_missing = NULL;
+  int optional_present, optional_total;
 
   /* Clear any existing error.  */
   dlerror ();
@@ -863,40 +1029,98 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
 	goto out;							\
     }									\
   while (0)
+  /* Similar, but missing functions are not an error.  */
+#define DLSYM_OPT(f,n) \
+  do									\
+    {									\
+      char *tmp_err;							\
+      device->f##_func = dlsym (device->plugin_handle, #n);		\
+      tmp_err = dlerror ();						\
+      if (tmp_err == NULL)						\
+        optional_present++;						\
+      else								\
+        last_missing = #n;						\
+      optional_total++;							\
+    }									\
+  while (0)
+
+  DLSYM (get_name);
+  DLSYM (get_caps);
   DLSYM (get_type);
   DLSYM (get_num_devices);
   DLSYM (offload_register);
   DLSYM (device_init);
+  DLSYM (device_fini);
   DLSYM (device_get_table);
   DLSYM (device_alloc);
   DLSYM (device_free);
   DLSYM (device_dev2host);
   DLSYM (device_host2dev);
-  DLSYM (device_run);
+  if (device->get_caps_func () & TARGET_CAP_OPENMP_400)
+    DLSYM (device_run);
+  if (device->get_caps_func () & TARGET_CAP_OPENACC_200)
+    {
+      optional_present = optional_total = 0;
+      DLSYM_OPT (openacc.exec, openacc_parallel);
+      DLSYM_OPT (openacc.open_device, openacc_open_device);
+      DLSYM_OPT (openacc.close_device, openacc_close_device);
+      DLSYM_OPT (openacc.get_device_num, openacc_get_device_num);
+      DLSYM_OPT (openacc.set_device_num, openacc_set_device_num);
+      DLSYM_OPT (openacc.avail, openacc_avail);
+      DLSYM_OPT (openacc.async_test, openacc_async_test);
+      DLSYM_OPT (openacc.async_test_all, openacc_async_test_all);
+      DLSYM_OPT (openacc.async_wait, openacc_async_wait);
+      DLSYM_OPT (openacc.async_wait_async, openacc_async_wait_async);
+      DLSYM_OPT (openacc.async_wait_all, openacc_async_wait_all);
+      DLSYM_OPT (openacc.async_wait_all_async, openacc_async_wait_all_async);
+      DLSYM_OPT (openacc.async_set_async, openacc_async_set_async);
+      /* Require all the OpenACC handlers if we have TARGET_CAP_OPENACC_200.  */
+      if (optional_present != optional_total)
+	{
+	  err = "plugin missing OpenACC handler function";
+	  goto out;
+	}
+      optional_present = optional_total = 0;
+      DLSYM_OPT (openacc.cuda.get_current_device,
+		 openacc_get_current_cuda_device);
+      DLSYM_OPT (openacc.cuda.get_current_context,
+		 openacc_get_current_cuda_context);
+      DLSYM_OPT (openacc.cuda.get_stream, openacc_get_cuda_stream);
+      DLSYM_OPT (openacc.cuda.set_stream, openacc_set_cuda_stream);
+      /* Make sure all the CUDA functions are there if any of them are.  */
+      if (optional_present && optional_present != optional_total)
+	{
+	  err = "plugin missing OpenACC CUDA handler function";
+	  goto out;
+	}
+    }
 #undef DLSYM
+#undef DLSYM_OPT
 
  out:
   if (err != NULL)
     {
       gomp_error ("while loading %s: %s", plugin_name, err);
+      if (last_missing)
+        gomp_error ("missing function was %s", last_missing);
       if (device->plugin_handle)
 	dlclose (device->plugin_handle);
     }
   return err == NULL;
 }
 
-/* This function finds OFFLOAD_IMAGES corresponding to DEVICE type, and
-   registers them in the plugin.  */
+/* This function adds a compatible offload image IMAGE to an accelerator device
+   DEVICE.  */
+
 static void
-gomp_register_images_for_device (struct gomp_device_descr *device)
+gomp_register_image_for_device (struct gomp_device_descr *device,
+				struct offload_image_descr *image)
 {
-  int i;
-  for (i = 0; i < num_offload_images; i++)
+  if (!device->offload_regions_registered
+      && (device->type == image->type || device->type == TARGET_TYPE_HOST))
     {
-      struct offload_image_descr *image = &offload_images[i];
-
-      if (device->type == image->type || device->type == TARGET_TYPE_HOST)
-	device->offload_register_func (image->host_table, image->target_data);
+      device->offload_register_func (image->host_table, image->target_data);
+      device->offload_regions_registered = true;
     }
 }
 
@@ -913,6 +1137,7 @@ gomp_find_available_plugins (void)
   DIR *dir = NULL;
   struct dirent *ent;
   char plugin_name[PATH_MAX];
+  int i;
 
   num_devices = 0;
   devices = NULL;
@@ -927,7 +1152,7 @@ gomp_find_available_plugins (void)
 
   while ((ent = readdir (dir)) != NULL)
     {
-      struct gomp_device_descr current_device;
+      struct gomp_device_descr current_device, *devicep;
       if (!gomp_check_plugin_file_name (ent->d_name))
 	continue;
       if (strlen (plugin_path) + 1 + strlen (ent->d_name) >= PATH_MAX)
@@ -937,7 +1162,7 @@ gomp_find_available_plugins (void)
       strcat (plugin_name, ent->d_name);
       if (!gomp_load_plugin_for_device (&current_device, plugin_name))
 	continue;
-      devices = realloc (devices, (num_devices + 1)
+      devices = gomp_realloc (devices, (num_devices + 1)
 				  * sizeof (struct gomp_device_descr));
       if (devices == NULL)
 	{
@@ -945,20 +1170,56 @@ gomp_find_available_plugins (void)
 	  goto out;
 	}
 
-      /* FIXME: Properly handle multiple devices of the same type.  */
-      if (current_device.get_num_devices_func () >= 1)
+      devices[num_devices] = current_device;
+      devicep = &devices[num_devices];
+
+      devicep->is_initialized = false;
+      devicep->offload_regions_registered = false;
+      devicep->mem_map.splay_tree.root = NULL;
+      devicep->mem_map.is_initialized = false;
+      devicep->type = devicep->get_type_func ();
+      devicep->name = devicep->get_name_func ();
+      devicep->capabilities = devicep->get_caps_func ();
+      gomp_mutex_init (&devicep->mem_map.lock);
+      devicep->id = ++num_devices;
+    }
+  /* Prefer a device with TARGET_CAP_OPENMP_400 for ICV default-device-var.  */
+  if (num_devices > 1)
+    {
+      int d = gomp_icv (false)->default_device_var;
+
+      if (!(devices[d].capabilities & TARGET_CAP_OPENMP_400))
 	{
-	  current_device.id = num_devices + 1;
-	  current_device.type = current_device.get_type_func ();
-	  current_device.is_initialized = false;
-	  current_device.dev_splay_tree.root = NULL;
-	  gomp_register_images_for_device (&current_device);
-	  devices[num_devices] = current_device;
-	  gomp_mutex_init (&devices[num_devices].dev_env_lock);
-	  num_devices++;
+	  for (i = 0; i < num_devices; i++)
+	    {
+	      if (devices[i].capabilities & TARGET_CAP_OPENMP_400)
+		{
+		  struct gomp_device_descr device_tmp = devices[d];
+		  devices[d] = devices[i];
+		  devices[d].id = d + 1;
+		  devices[i] = device_tmp;
+		  devices[i].id = i + 1;
+
+		  break;
+		}
+	    }
 	}
     }
 
+  for (i = 0; i < num_devices; i++)
+    {
+      int j;
+
+      for (j = 0; j < num_offload_images; j++)
+	gomp_register_image_for_device (&devices[i], &offload_images[j]);
+
+      /* The 'devices' array can be moved (by the realloc call) until we have
+	 found all the plugins, so registering with the OpenACC runtime (which
+	 takes a copy of the pointer argument) must be delayed until now.  */
+      if (devices[i].capabilities & TARGET_CAP_OPENACC_200)
+	ACC_plugin_register (&devices[i]);
+    }
+
  out:
   if (dir)
     closedir (dir);
diff --git a/libgomp/target.h b/libgomp/target.h
new file mode 100644
index 0000000..635cc52
--- /dev/null
+++ b/libgomp/target.h
@@ -0,0 +1,178 @@
+/* Copyright (C) 2013-2014 Free Software Foundation, Inc.
+   Contributed by Jakub Jelinek <jakub@redhat.com>.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file handles the maintainence of threads in response to team
+   creation and termination.  */
+
+#ifndef _TARGET_H
+#define _TARGET_H 1
+
+#include <stdarg.h>
+#include "splay-tree.h"
+#include "gomp-constants.h"
+
+struct target_mem_desc {
+  /* Reference count.  */
+  uintptr_t refcount;
+  /* All the splay nodes allocated together.  */
+  splay_tree_node array;
+  /* Start of the target region.  */
+  uintptr_t tgt_start;
+  /* End of the targer region.  */
+  uintptr_t tgt_end;
+  /* Handle to free.  */
+  void *to_free;
+  /* Previous target_mem_desc.  */
+  struct target_mem_desc *prev;
+  /* Number of items in following list.  */
+  size_t list_count;
+
+  /* Corresponding target device descriptor.  */
+  struct gomp_device_descr *device_descr;
+  
+  /* Memory mapping info for the thread that created this descriptor.  */
+  struct gomp_memory_mapping *mem_map;
+
+  /* List of splay keys to remove (or decrease refcount)
+     at the end of region.  */
+  splay_tree_key list[];
+};
+
+/* Keep in sync with openacc.h:acc_device_t.  */
+
+enum target_type {
+  TARGET_TYPE_HOST = GOMP_TARGET_HOST,
+  TARGET_TYPE_HOST_NONSHM = GOMP_TARGET_HOST_NONSHM,
+  TARGET_TYPE_NVIDIA_PTX = GOMP_TARGET_NVIDIA_PTX,
+  TARGET_TYPE_INTEL_MIC = GOMP_TARGET_INTEL_MIC,
+};
+
+#define TARGET_CAP_SHARED_MEM	1
+#define TARGET_CAP_NATIVE_EXEC	2
+#define TARGET_CAP_OPENMP_400	4
+#define TARGET_CAP_OPENACC_200	8
+
+/* Information about mapped memory regions (per device/context).  */
+
+struct gomp_memory_mapping
+{
+  /* Splay tree containing information about mapped memory regions.  */
+  struct splay_tree_s splay_tree;
+
+  /* Mutex for operating with the splay tree and other shared structures.  */
+  gomp_mutex_t lock;
+  
+  /* True when tables have been added to this memory map.  */
+  bool is_initialized;
+};
+
+#include "oacc-int.h"
+
+static inline enum acc_device_t
+acc_device_type (enum target_type type)
+{
+  return (enum acc_device_t) type;
+}
+
+struct mapping_table {
+  uintptr_t host_start;
+  uintptr_t host_end;
+  uintptr_t tgt_start;
+  uintptr_t tgt_end;
+};
+
+/* This structure describes accelerator device.
+   It contains name of the corresponding libgomp plugin, function handlers for
+   interaction with the device, ID-number of the device, and information about
+   mapped memory.  */
+struct gomp_device_descr
+{
+  /* The name of the device.  */
+  const char *name;
+
+  /* Capabilities of device (supports OpenACC, OpenMP).  */
+  unsigned int capabilities;
+
+  /* This is the ID number of device.  It could be specified in DEVICE-clause of
+     TARGET construct.  */
+  int id;
+
+  /* This is the TYPE of device.  */
+  enum target_type type;
+
+  /* Set to true when device is initialized.  */
+  bool is_initialized;
+  
+  /* True when offload regions have been registered with this device.  */
+  bool offload_regions_registered;
+
+  /* Plugin file handler.  */
+  void *plugin_handle;
+
+  /* Function handlers.  */
+  const char *(*get_name_func) (void);
+  unsigned int (*get_caps_func) (void);
+  int (*get_type_func) (void);
+  int (*get_num_devices_func) (void);
+  void (*offload_register_func) (void *, void *);
+  int (*device_init_func) (void);
+  int (*device_fini_func) (void);
+  int (*device_get_table_func) (struct mapping_table **);
+  void *(*device_alloc_func) (size_t);
+  void (*device_free_func) (void *);
+  void *(*device_dev2host_func) (void *, const void *, size_t);
+  void *(*device_host2dev_func) (void *, const void *, size_t);
+  void (*device_run_func) (void *, void *);
+
+  /* OpenACC-specific functions.  */
+  ACC_dispatch_t openacc;
+  
+  /* Memory-mapping info (only for OpenMP -- mappings are stored per-thread
+     for OpenACC. It's not clear if that's a useful distinction).  */
+  struct gomp_memory_mapping mem_map;
+};
+
+extern struct target_mem_desc *
+gomp_map_vars (struct gomp_device_descr *devicep,
+	       struct gomp_memory_mapping *mm, size_t mapnum,
+	       void **hostaddrs, void **devaddrs, size_t *sizes,
+	       void *kinds, bool is_openacc, bool is_target);
+
+extern void
+gomp_copy_from_async (struct target_mem_desc *tgt);
+
+extern void
+gomp_unmap_vars (struct target_mem_desc *tgt, bool);
+
+extern attribute_hidden void
+gomp_init_device (struct gomp_device_descr *devicep);
+
+extern attribute_hidden void
+gomp_init_tables (const struct gomp_device_descr *devicep,
+		  struct gomp_memory_mapping *mm);
+
+extern attribute_hidden void
+gomp_fini_device (struct gomp_device_descr *devicep);
+
+#endif /* _TARGET_H */
-- 
1.7.10.4


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [gomp4] [2/3] OpenACC 2.0 support for libgomp - new tests
@ 2014-10-14 16:33 ` Julian Brown
  2014-10-14 16:12   ` [gomp] [3/3] OpenACC 2.0 support for libgomp - documentation Julian Brown
                     ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Julian Brown @ 2014-10-14 16:33 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek, Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 1165 bytes --]

This is an updated version of the patch:

https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02025.html

but against gomp4 branch instead of mainline. Some tests have been
updated a little since the last patch.

OK to apply (to the gomp4 branch)?

Thanks,

Julian

ChangeLog

xxxx-xx-xx  James Norris  <jnorris@codesourcery.com>
	    Thomas Schwinge  <thomas@codesourcery.com>
	    Tom de Vries  <tom@codesourcery.com>
	    Cesar Philippidis  <cesar@codesourcery.com>

    libgomp/
    * testsuite/Makefile.in: Regenerated.
    * testsuite/lib/libgomp.exp
    (check_effective_target_openacc_nvidia_accel_present)
    (check_effective_target_openacc_nvidia_accel_selected): New
    functions.
    * testsuite/libgomp.oacc-fortran/fortran.exp: New exp file.
    * testsuite/libgomp.oacc-fortran/*.f: New tests.
    * testsuite/libgomp.oacc-fortran/*.f90: Likewise.
    * testsuite/libgomp.oacc-c/c.exp: New exp file.
    * testsuite/libgomp.oacc-c/subr.ptx: New file.
    * testsuite/libgomp.oacc-c/subr.cu: New file.
    * testsuite/libgomp.oacc-c/timer.h: New file.
    * testsuite/libgomp.oacc-c/*.c: New tests.
    * testsuite/libgomp.oacc-c++/c++.exp: New exp file.

[-- Attachment #2: 0001-Tests-for-libgomp-OpenACC-support.patch --]
[-- Type: text/x-patch, Size: 259065 bytes --]

From 18c107c58d42314128e485bb79892672a8feaa6b Mon Sep 17 00:00:00 2001
From: Julian Brown <julian@codesourcery.com>
Date: Mon, 13 Oct 2014 04:40:51 -0700
Subject: [PATCH 1/3] Tests for libgomp OpenACC support.

---
 libgomp/testsuite/Makefile.in                      |    4 +
 libgomp/testsuite/lib/libgomp.exp                  |   30 +
 libgomp/testsuite/libgomp.oacc-c++/c++.exp         |   37 +-
 libgomp/testsuite/libgomp.oacc-c/abort-2.c         |   17 +
 libgomp/testsuite/libgomp.oacc-c/abort.c           |   17 +
 libgomp/testsuite/libgomp.oacc-c/acc_on_device-1.c |   25 +-
 libgomp/testsuite/libgomp.oacc-c/c.exp             |   50 +-
 libgomp/testsuite/libgomp.oacc-c/clauses-1.c       |  623 ++++++++++++++++++
 libgomp/testsuite/libgomp.oacc-c/clauses-2.c       |   67 ++
 libgomp/testsuite/libgomp.oacc-c/context-1.c       |  213 ++++++
 libgomp/testsuite/libgomp.oacc-c/context-2.c       |  223 +++++++
 libgomp/testsuite/libgomp.oacc-c/context-3.c       |  200 ++++++
 libgomp/testsuite/libgomp.oacc-c/context-4.c       |  213 ++++++
 libgomp/testsuite/libgomp.oacc-c/data-1.c          |  112 ++--
 libgomp/testsuite/libgomp.oacc-c/deviceptr-1.c     |   32 +
 libgomp/testsuite/libgomp.oacc-c/goacc_kernels.c   |    3 +-
 libgomp/testsuite/libgomp.oacc-c/goacc_parallel.c  |    3 +-
 libgomp/testsuite/libgomp.oacc-c/if-1.c            |  547 ++++++++++++++++
 libgomp/testsuite/libgomp.oacc-c/kernels-1.c       |   22 +-
 libgomp/testsuite/libgomp.oacc-c/lib-1.c           |   19 +-
 libgomp/testsuite/libgomp.oacc-c/lib-10.c          |   58 ++
 libgomp/testsuite/libgomp.oacc-c/lib-11.c          |   22 +
 libgomp/testsuite/libgomp.oacc-c/lib-12.c          |   37 ++
 libgomp/testsuite/libgomp.oacc-c/lib-13.c          |   60 ++
 libgomp/testsuite/libgomp.oacc-c/lib-14.c          |   61 ++
 libgomp/testsuite/libgomp.oacc-c/lib-15.c          |   33 +
 libgomp/testsuite/libgomp.oacc-c/lib-16.c          |   29 +
 libgomp/testsuite/libgomp.oacc-c/lib-17.c          |   31 +
 libgomp/testsuite/libgomp.oacc-c/lib-18.c          |   34 +
 libgomp/testsuite/libgomp.oacc-c/lib-19.c          |   60 ++
 libgomp/testsuite/libgomp.oacc-c/lib-2.c           |   26 +
 libgomp/testsuite/libgomp.oacc-c/lib-20.c          |   29 +
 libgomp/testsuite/libgomp.oacc-c/lib-21.c          |   29 +
 libgomp/testsuite/libgomp.oacc-c/lib-22.c          |   29 +
 libgomp/testsuite/libgomp.oacc-c/lib-23.c          |   39 ++
 libgomp/testsuite/libgomp.oacc-c/lib-24.c          |   55 ++
 libgomp/testsuite/libgomp.oacc-c/lib-25.c          |   30 +
 libgomp/testsuite/libgomp.oacc-c/lib-26.c          |   26 +
 libgomp/testsuite/libgomp.oacc-c/lib-27.c          |   26 +
 libgomp/testsuite/libgomp.oacc-c/lib-28.c          |   26 +
 libgomp/testsuite/libgomp.oacc-c/lib-29.c          |   26 +
 libgomp/testsuite/libgomp.oacc-c/lib-3.c           |   15 +
 libgomp/testsuite/libgomp.oacc-c/lib-30.c          |   26 +
 libgomp/testsuite/libgomp.oacc-c/lib-31.c          |   27 +
 libgomp/testsuite/libgomp.oacc-c/lib-32.c          |   38 ++
 libgomp/testsuite/libgomp.oacc-c/lib-33.c          |   31 +
 libgomp/testsuite/libgomp.oacc-c/lib-34.c          |   33 +
 libgomp/testsuite/libgomp.oacc-c/lib-35.c          |   26 +
 libgomp/testsuite/libgomp.oacc-c/lib-36.c          |   26 +
 libgomp/testsuite/libgomp.oacc-c/lib-37.c          |   40 ++
 libgomp/testsuite/libgomp.oacc-c/lib-38.c          |   67 ++
 libgomp/testsuite/libgomp.oacc-c/lib-39.c          |   41 ++
 libgomp/testsuite/libgomp.oacc-c/lib-4.c           |   13 +
 libgomp/testsuite/libgomp.oacc-c/lib-40.c          |   42 ++
 libgomp/testsuite/libgomp.oacc-c/lib-41.c          |   43 ++
 libgomp/testsuite/libgomp.oacc-c/lib-42.c          |   35 +
 libgomp/testsuite/libgomp.oacc-c/lib-43.c          |   45 ++
 libgomp/testsuite/libgomp.oacc-c/lib-44.c          |   45 ++
 libgomp/testsuite/libgomp.oacc-c/lib-45.c          |   50 ++
 libgomp/testsuite/libgomp.oacc-c/lib-46.c          |   42 ++
 libgomp/testsuite/libgomp.oacc-c/lib-47.c          |   43 ++
 libgomp/testsuite/libgomp.oacc-c/lib-48.c          |   43 ++
 libgomp/testsuite/libgomp.oacc-c/lib-49.c          |   48 ++
 libgomp/testsuite/libgomp.oacc-c/lib-5.c           |   40 ++
 libgomp/testsuite/libgomp.oacc-c/lib-50.c          |   30 +
 libgomp/testsuite/libgomp.oacc-c/lib-51.c          |   41 ++
 libgomp/testsuite/libgomp.oacc-c/lib-52.c          |   28 +
 libgomp/testsuite/libgomp.oacc-c/lib-53.c          |   28 +
 libgomp/testsuite/libgomp.oacc-c/lib-54.c          |   28 +
 libgomp/testsuite/libgomp.oacc-c/lib-55.c          |   48 ++
 libgomp/testsuite/libgomp.oacc-c/lib-56.c          |   33 +
 libgomp/testsuite/libgomp.oacc-c/lib-57.c          |   28 +
 libgomp/testsuite/libgomp.oacc-c/lib-58.c          |   28 +
 libgomp/testsuite/libgomp.oacc-c/lib-59.c          |   55 ++
 libgomp/testsuite/libgomp.oacc-c/lib-6.c           |   39 ++
 libgomp/testsuite/libgomp.oacc-c/lib-60.c          |   54 ++
 libgomp/testsuite/libgomp.oacc-c/lib-61.c          |   70 ++
 libgomp/testsuite/libgomp.oacc-c/lib-62.c          |   49 ++
 libgomp/testsuite/libgomp.oacc-c/lib-63.c          |   43 ++
 libgomp/testsuite/libgomp.oacc-c/lib-64.c          |   43 ++
 libgomp/testsuite/libgomp.oacc-c/lib-65.c          |   43 ++
 libgomp/testsuite/libgomp.oacc-c/lib-66.c          |   47 ++
 libgomp/testsuite/libgomp.oacc-c/lib-67.c          |   43 ++
 libgomp/testsuite/libgomp.oacc-c/lib-68.c          |   43 ++
 libgomp/testsuite/libgomp.oacc-c/lib-69.c          |  124 ++++
 libgomp/testsuite/libgomp.oacc-c/lib-7.c           |   18 +
 libgomp/testsuite/libgomp.oacc-c/lib-70.c          |  136 ++++
 libgomp/testsuite/libgomp.oacc-c/lib-71.c          |  119 ++++
 libgomp/testsuite/libgomp.oacc-c/lib-72.c          |  121 ++++
 libgomp/testsuite/libgomp.oacc-c/lib-73.c          |  134 ++++
 libgomp/testsuite/libgomp.oacc-c/lib-74.c          |  139 ++++
 libgomp/testsuite/libgomp.oacc-c/lib-75.c          |  141 ++++
 libgomp/testsuite/libgomp.oacc-c/lib-76.c          |  147 +++++
 libgomp/testsuite/libgomp.oacc-c/lib-77.c          |  135 ++++
 libgomp/testsuite/libgomp.oacc-c/lib-78.c          |  140 ++++
 libgomp/testsuite/libgomp.oacc-c/lib-79.c          |  167 +++++
 libgomp/testsuite/libgomp.oacc-c/lib-80.c          |  132 ++++
 libgomp/testsuite/libgomp.oacc-c/lib-81.c          |  211 ++++++
 libgomp/testsuite/libgomp.oacc-c/lib-82.c          |  144 +++++
 libgomp/testsuite/libgomp.oacc-c/lib-83.c          |   58 ++
 libgomp/testsuite/libgomp.oacc-c/lib-84.c          |   66 ++
 libgomp/testsuite/libgomp.oacc-c/lib-85.c          |   52 ++
 libgomp/testsuite/libgomp.oacc-c/lib-86.c          |   42 ++
 libgomp/testsuite/libgomp.oacc-c/lib-87.c          |   42 ++
 libgomp/testsuite/libgomp.oacc-c/lib-88.c          |  111 ++++
 libgomp/testsuite/libgomp.oacc-c/lib-89.c          |  118 ++++
 libgomp/testsuite/libgomp.oacc-c/lib-9.c           |   70 ++
 libgomp/testsuite/libgomp.oacc-c/lib-90.c          |  137 ++++
 libgomp/testsuite/libgomp.oacc-c/lib-91.c          |   84 +++
 libgomp/testsuite/libgomp.oacc-c/lib-92.c          |  112 ++++
 libgomp/testsuite/libgomp.oacc-c/nested-1.c        |  680 ++++++++++++++++++++
 libgomp/testsuite/libgomp.oacc-c/nested-2.c        |   35 +
 libgomp/testsuite/libgomp.oacc-c/offset-1.c        |   97 +++
 libgomp/testsuite/libgomp.oacc-c/parallel-1.c      |   76 ++-
 libgomp/testsuite/libgomp.oacc-c/pointer-align-1.c |   35 +
 libgomp/testsuite/libgomp.oacc-c/present-1.c       |   48 ++
 libgomp/testsuite/libgomp.oacc-c/present-2.c       |   48 ++
 libgomp/testsuite/libgomp.oacc-c/subr.cu           |   64 ++
 libgomp/testsuite/libgomp.oacc-c/subr.ptx          |  148 +++++
 libgomp/testsuite/libgomp.oacc-c/timer.h           |  103 +++
 libgomp/testsuite/libgomp.oacc-c/update-1.c        |  280 ++++++++
 libgomp/testsuite/libgomp.oacc-fortran/abort-1.f90 |   10 +
 libgomp/testsuite/libgomp.oacc-fortran/abort-2.f90 |   13 +
 .../libgomp.oacc-fortran/acc_on_device-1-1.f90     |   17 +-
 .../libgomp.oacc-fortran/acc_on_device-1-2.f       |   17 +-
 .../libgomp.oacc-fortran/acc_on_device-1-3.f       |   17 +-
 libgomp/testsuite/libgomp.oacc-fortran/fortran.exp |   42 +-
 libgomp/testsuite/libgomp.oacc-fortran/lib-1.f90   |   10 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-10.f90  |   82 +++
 libgomp/testsuite/libgomp.oacc-fortran/lib-11.f90  |   82 +++
 libgomp/testsuite/libgomp.oacc-fortran/lib-2.f     |   10 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-3.f     |   10 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-4.f90   |   35 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-5.f90   |   31 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-6.f90   |   35 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-7.f90   |   31 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-8.f90   |   83 +++
 libgomp/testsuite/libgomp.oacc-fortran/lib-9.f90   |   83 +++
 libgomp/testsuite/libgomp.oacc-fortran/map-1.f90   |   97 +++
 .../libgomp.oacc-fortran/pointer-align-1.f90       |   21 +
 libgomp/testsuite/libgomp.oacc-fortran/pset-1.f90  |  229 +++++++
 .../testsuite/libgomp.oacc-fortran/subarrays-1.f90 |   97 +++
 .../testsuite/libgomp.oacc-fortran/subarrays-2.f90 |  100 +++
 143 files changed, 10476 insertions(+), 93 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/abort-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/abort.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/clauses-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/clauses-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/context-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/context-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/context-3.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/context-4.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/deviceptr-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/if-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-10.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-11.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-12.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-13.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-14.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-15.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-16.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-17.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-18.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-19.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-20.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-21.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-22.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-23.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-24.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-25.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-26.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-27.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-28.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-29.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-3.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-30.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-31.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-32.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-33.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-34.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-35.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-36.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-37.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-38.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-39.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-4.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-40.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-41.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-42.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-43.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-44.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-45.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-46.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-47.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-48.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-49.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-5.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-50.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-51.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-52.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-53.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-54.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-55.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-56.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-57.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-58.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-59.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-6.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-60.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-61.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-62.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-63.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-64.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-65.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-66.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-67.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-68.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-69.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-7.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-70.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-71.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-72.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-73.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-74.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-75.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-76.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-77.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-78.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-79.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-80.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-81.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-82.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-83.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-84.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-85.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-86.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-87.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-88.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-89.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-9.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-90.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-91.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-92.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/nested-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/nested-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/offset-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/pointer-align-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/present-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/present-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/subr.cu
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/subr.ptx
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/timer.h
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/update-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/abort-1.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/abort-2.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/lib-10.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/lib-11.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/lib-4.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/lib-5.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/lib-6.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/lib-7.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/lib-8.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/lib-9.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/map-1.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/pointer-align-1.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/pset-1.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/subarrays-1.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/subarrays-2.f90

diff --git a/libgomp/testsuite/Makefile.in b/libgomp/testsuite/Makefile.in
index 5273eaa..77b365e 100644
--- a/libgomp/testsuite/Makefile.in
+++ b/libgomp/testsuite/Makefile.in
@@ -129,6 +129,10 @@ PACKAGE_URL = @PACKAGE_URL@
 PACKAGE_VERSION = @PACKAGE_VERSION@
 PATH_SEPARATOR = @PATH_SEPARATOR@
 PERL = @PERL@
+PLUGIN_NVPTX = @PLUGIN_NVPTX@
+PLUGIN_NVPTX_CPPFLAGS = @PLUGIN_NVPTX_CPPFLAGS@
+PLUGIN_NVPTX_LDFLAGS = @PLUGIN_NVPTX_LDFLAGS@
+PLUGIN_NVPTX_LIBS = @PLUGIN_NVPTX_LIBS@
 RANLIB = @RANLIB@
 SECTION_LDFLAGS = @SECTION_LDFLAGS@
 SED = @SED@
diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 094e5ed..78a14cb 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -139,6 +139,8 @@ proc libgomp_init { args } {
         lappend ALWAYS_CFLAGS "additional_flags=-B${blddir}/.libs"
         lappend ALWAYS_CFLAGS "additional_flags=-I${blddir}"
         lappend ALWAYS_CFLAGS "ldflags=-L${blddir}/.libs"
+	# The top-level include directory, for libgomp-constants.h.
+	lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}/../../include"
     }
     lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}/.."
 
@@ -239,3 +241,31 @@ proc libgomp_option_proc { option } {
 	return 0
     }
 }
+
+# Return 1 if at least one nvidia board is present.
+
+proc check_effective_target_openacc_nvidia_accel_present { } {
+    return [check_runtime openacc_nvidia_accel_present {
+	#include <openacc.h>
+	int main () {
+	return !(acc_get_num_devices (acc_device_nvidia) > 0);
+	}
+    } "" ]
+}
+
+# Return 1 if at least one nvidia board is present, and the nvidia device type
+# is selected by default by means of setting the environment variable
+# ACC_DEVICE_TYPE.
+
+proc check_effective_target_openacc_nvidia_accel_selected { } {
+    if { ![check_effective_target_openacc_nvidia_accel_present] } {
+	return 0;
+    }
+    if { ![info exists ::env(ACC_DEVICE_TYPE)] } {
+	return 0;
+    }
+    if { $::env(ACC_DEVICE_TYPE) == "nvidia" } {
+        return 1;
+    }
+    return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c++/c++.exp b/libgomp/testsuite/libgomp.oacc-c++/c++.exp
index ae8a1d5..164d7d2 100644
--- a/libgomp/testsuite/libgomp.oacc-c++/c++.exp
+++ b/libgomp/testsuite/libgomp.oacc-c++/c++.exp
@@ -17,7 +17,8 @@ if [info exists lang_include_flags] then {
 dg-init
 
 # Turn on OpenACC.
-lappend ALWAYS_CFLAGS "additional_flags=-fopenacc"
+# XXX (TEMPORARY): Remove the -flto once that's properly integrated.
+lappend ALWAYS_CFLAGS "additional_flags=-fopenacc -flto"
 
 set blddir [lookfor_file [get_multilibs] libgomp]
 
@@ -61,8 +62,38 @@ if { $lang_test_file_found } {
 	set libstdcxx_includes ""
     }
 
-    # Main loop.
-    dg-runtest $tests "" $libstdcxx_includes
+    # Todo: get list of accelerators from configure options --enable-accelerator.
+    set accels { "nvidia" "host_nonshm" }
+
+    # Run on host (or fallback) accelerator.
+    lappend accels "host"
+
+    # Test OpenACC with available accelerators.
+    foreach accel $accels {
+	set tagopt "-DACC_DEVICE_TYPE_$accel=1"
+
+	# Todo: Determine shared memory or not using run-time test.
+	switch $accel {
+	    host {
+		set acc_mem_shared 1
+	    }
+	    host_nonshm {
+		set acc_mem_shared 0
+	    }
+	    nvidia {
+		set acc_mem_shared 0
+	    }
+	    default {
+		set acc_mem_shared 0
+	    }
+	}
+	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared"
+
+	# Todo: Verify that this works for both local and remote testing.
+	setenv ACC_DEVICE_TYPE $accel
+
+	dg-runtest $tests "$tagopt" $libstdcxx_includes
+    }
 }
 
 # All done.
diff --git a/libgomp/testsuite/libgomp.oacc-c/abort-2.c b/libgomp/testsuite/libgomp.oacc-c/abort-2.c
new file mode 100644
index 0000000..debb81e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/abort-2.c
@@ -0,0 +1,17 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+
+int
+main (int argc, char **argv)
+{
+
+#pragma acc parallel
+  {
+    if (argc != 1)
+      abort ();
+  }
+
+  return 0;
+}
+
diff --git a/libgomp/testsuite/libgomp.oacc-c/abort.c b/libgomp/testsuite/libgomp.oacc-c/abort.c
new file mode 100644
index 0000000..f88b9e3
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/abort.c
@@ -0,0 +1,17 @@
+/* { dg-do run } */
+/* { dg-shouldfail "" { *-*-* } { "*" } { "" } } */
+
+#include <stdlib.h>
+
+int
+main (void)
+{
+
+#pragma acc parallel
+  {
+    abort ();
+  }
+
+  return 0;
+}
+
diff --git a/libgomp/testsuite/libgomp.oacc-c/acc_on_device-1.c b/libgomp/testsuite/libgomp.oacc-c/acc_on_device-1.c
index f216587..81ea476 100644
--- a/libgomp/testsuite/libgomp.oacc-c/acc_on_device-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c/acc_on_device-1.c
@@ -1,7 +1,6 @@
 /* Disable the acc_on_device builtin; we want to test the libgomp library
    function.  */
-/* TODO: Remove -DACC_DEVICE_TYPE_host once that is set by the test harness.  */
-/* { dg-additional-options "-fno-builtin-acc_on_device -DACC_DEVICE_TYPE_host" } */
+/* { dg-additional-options "-fno-builtin-acc_on_device" } */
 
 #include <stdlib.h>
 #include <openacc.h>
@@ -16,8 +15,12 @@ main (int argc, char *argv[])
       abort ();
     if (!acc_on_device (acc_device_host))
       abort ();
+    if (acc_on_device (acc_device_host_nonshm))
+      abort ();
     if (acc_on_device (acc_device_not_host))
       abort ();
+    if (acc_on_device (acc_device_nvidia))
+      abort ();
   }
 
 
@@ -29,8 +32,12 @@ main (int argc, char *argv[])
       abort ();
     if (!acc_on_device (acc_device_host))
       abort ();
+    if (acc_on_device (acc_device_host_nonshm))
+      abort ();
     if (acc_on_device (acc_device_not_host))
       abort ();
+    if (acc_on_device (acc_device_nvidia))
+      abort ();
   }
 
 
@@ -44,8 +51,22 @@ main (int argc, char *argv[])
       abort ();
     if (acc_on_device (acc_device_host))
       abort ();
+#if ACC_DEVICE_TYPE_host_nonshm
+    if (!acc_on_device (acc_device_host_nonshm))
+      abort ();
+#else
+    if (acc_on_device (acc_device_host_nonshm))
+      abort ();
+#endif
     if (!acc_on_device (acc_device_not_host))
       abort ();
+#if ACC_DEVICE_TYPE_nvidia
+    if (!acc_on_device (acc_device_nvidia))
+      abort ();
+#else
+    if (acc_on_device (acc_device_nvidia))
+      abort ();
+#endif
   }
 
 #endif
diff --git a/libgomp/testsuite/libgomp.oacc-c/c.exp b/libgomp/testsuite/libgomp.oacc-c/c.exp
index 13a478e..553c225 100644
--- a/libgomp/testsuite/libgomp.oacc-c/c.exp
+++ b/libgomp/testsuite/libgomp.oacc-c/c.exp
@@ -23,17 +23,61 @@ if ![info exists DEFAULT_CFLAGS] then {
 dg-init
 
 # Turn on OpenACC.
-lappend ALWAYS_CFLAGS "additional_flags=-fopenacc"
+# XXX (TEMPORARY): Remove the -flto once that's properly integrated.
+lappend ALWAYS_CFLAGS "additional_flags=-fopenacc -flto"
 
 # Gather a list of all tests.
 set tests [lsort [find $srcdir/$subdir *.c]]
 
 set ld_library_path $always_ld_library_path
 append ld_library_path [gcc-set-multilib-library-path $GCC_UNDER_TEST]
+append ld_library_path ":/opt/nvidia/cuda-5.5/lib64"
 set_ld_library_path_env_vars
 
-# Main loop.
-dg-runtest $tests "" $DEFAULT_CFLAGS
+# Todo: get list of accelerators from configure options --enable-accelerator.
+set accels { "nvidia" "host_nonshm" }
+
+# Run on host (or fallback) accelerator.
+lappend accels "host"
+
+# Test OpenACC with available accelerators.
+set SAVE_ALWAYS_CFLAGS "$ALWAYS_CFLAGS"
+foreach accel $accels {
+    set ALWAYS_CFLAGS "$SAVE_ALWAYS_CFLAGS"
+    set tagopt "-DACC_DEVICE_TYPE_$accel=1"
+
+    # Todo: Determine shared memory or not using run-time test.
+    switch $accel {
+	host {
+	    set acc_mem_shared 1
+	}
+	host_nonshm {
+	    set acc_mem_shared 0
+	}
+	nvidia {
+	    # Copy ptx file (TEMPORARY)
+	    remote_download host $srcdir/libgomp.oacc-c/subr.ptx
+
+	    # Where cuda.h lives
+	    # Todo: get that from configure option --with-cuda-driver.
+	    lappend ALWAYS_CFLAGS "additional_flags=-I/opt/nvidia/cuda-5.5/include"
+	    lappend ALWAYS_CFLAGS "additional_flags=-L/opt/nvidia/cuda-5.5/lib64"
+
+	    # Where timer.h lives
+	    lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}"
+	    set acc_mem_shared 0
+	}
+	default {
+	    set acc_mem_shared 0
+	}
+    }
+    set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared"
+
+    # Todo: Verify that this works for both local and remote testing.
+    setenv ACC_DEVICE_TYPE $accel
+
+    dg-runtest $tests "$tagopt" $DEFAULT_CFLAGS
+}
 
 # All done.
 dg-finish
diff --git a/libgomp/testsuite/libgomp.oacc-c/clauses-1.c b/libgomp/testsuite/libgomp.oacc-c/clauses-1.c
new file mode 100644
index 0000000..51c0cf5
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/clauses-1.c
@@ -0,0 +1,623 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include <openacc.h>
+#include <string.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+
+int
+main (int argc, char **argv)
+{
+    int N = 8;
+    float *a, *b, *c, *d;
+    int i;
+
+    a = (float *) malloc (N * sizeof (float));
+    b = (float *) malloc (N * sizeof (float));
+    c = (float *) malloc (N * sizeof (float));
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 3.0;
+        b[i] = 0.0;
+    }
+
+#pragma acc parallel copyin (a[0:N]) copyout (b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = a[ii];
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 3.0)
+            abort ();
+    }
+
+    if (acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 5.0;
+        b[i] = 1.0;
+    }
+
+#pragma acc parallel copyin (a[0:N]) copyout (b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = a[ii];
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 5.0)
+            abort ();
+    }
+
+    if (acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 6.0;
+        b[i] = 0.0;
+    }
+
+    d = (float *) acc_copyin (&a[0], N * sizeof (float));
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 9.0;
+    }
+
+#pragma acc parallel present_or_copyin (a[0:N]) copyout (b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = a[ii];
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 6.0)
+            abort ();
+    }
+
+    if (!acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    acc_free (d);
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 6.0;
+        b[i] = 0.0;
+    }
+
+#pragma acc parallel copyin (a[0:N]) present_or_copyout (b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = a[ii];
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 6.0)
+            abort ();
+    }
+
+    if (acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 5.0;
+        b[i] = 2.0;
+    }
+
+    d = (float *) acc_copyin (&b[0], N * sizeof (float));
+
+#pragma acc parallel copyin (a[0:N]) present_or_copyout (b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = a[ii];
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 5.0)
+            abort ();
+
+        if (b[i] != 2.0)
+            abort ();
+    }
+
+    if (acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (!acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    acc_free (d);
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 3.0;
+        b[i] = 4.0;
+    }
+
+#pragma acc parallel copy (a[0:N]) copyout (b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            a[ii] = a[ii] + 1;
+            b[ii] = a[ii] + 2;
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 4.0)
+            abort ();
+
+        if (b[i] != 6.0)
+            abort ();
+    }
+
+    if (acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 4.0;
+        b[i] = 7.0;
+    }
+
+#pragma acc parallel present_or_copy (a[0:N]) present_or_copy (b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            a[ii] = a[ii] + 1;
+            b[ii] = b[ii] + 2;
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 5.0)
+            abort ();
+
+        if (b[i] != 9.0)
+            abort ();
+    }
+
+    if (acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 3.0;
+        b[i] = 7.0;
+    }
+
+    d = (float *) acc_copyin (&a[0], N * sizeof (float));
+    d = (float *) acc_copyin (&b[0], N * sizeof (float));
+
+#pragma acc parallel present_or_copy (a[0:N]) present_or_copy (b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            a[ii] = a[ii] + 1;
+            b[ii] = b[ii] + 2;
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 3.0)
+            abort ();
+
+        if (b[i] != 7.0)
+            abort ();
+    }
+
+    if (!acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (!acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    d = (float *) acc_deviceptr (&a[0]);
+    acc_unmap_data (&a[0]);
+    acc_free (d);
+
+    d = (float *) acc_deviceptr (&b[0]);
+    acc_unmap_data (&b[0]);
+    acc_free (d);
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 3.0;
+        b[i] = 7.0;
+    }
+
+#pragma acc parallel copyin (a[0:N]) create (c[0:N]) copyout (b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            c[ii] = a[ii];
+            b[ii] = c[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 3.0)
+            abort ();
+
+        if (b[i] != 3.0)
+            abort ();
+    }
+
+    if (acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&c[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 4.0;
+        b[i] = 8.0;
+    }
+
+#pragma acc parallel copyin (a[0:N]) present_or_create (c[0:N]) copyout (b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            c[ii] = a[ii];
+            b[ii] = c[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 4.0)
+            abort ();
+
+        if (b[i] != 4.0)
+            abort ();
+    }
+
+    if (acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&c[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 2.0;
+        b[i] = 5.0;
+    }
+
+    d = (float *) acc_malloc (N * sizeof (float));
+    acc_map_data (c, d, N * sizeof (float));
+
+#pragma acc parallel copyin (a[0:N]) present_or_create (c[0:N]) copyout (b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            c[ii] = a[ii];
+            b[ii] = c[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 2.0)
+            abort ();
+
+        if (b[i] != 2.0)
+            abort ();
+    }
+
+    if (acc_is_present (a, (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (b, (N * sizeof (float))))
+      abort ();
+
+    if (!acc_is_present (c, (N * sizeof (float))))
+      abort ();
+
+    d = (float *) acc_deviceptr (c);
+
+    acc_unmap_data (c);
+
+    acc_free (d);
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 4.0;
+        b[i] = 8.0;
+    }
+
+    d = (float *) acc_malloc (N * sizeof (float));
+    acc_map_data (c, d, N * sizeof (float));
+
+#pragma acc parallel copyin (a[0:N]) present (c[0:N]) copyout (b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            c[ii] = a[ii];
+            b[ii] = c[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 4.0)
+            abort ();
+
+        if (b[i] != 4.0)
+            abort ();
+    }
+
+    if (acc_is_present (a, (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (b, (N * sizeof (float))))
+      abort ();
+
+    if (!acc_is_present (c, (N * sizeof (float))))
+      abort ();
+
+    acc_unmap_data (c);
+
+    acc_free (d);
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 4.0;
+        b[i] = 8.0;
+    }
+
+    acc_copyin (a, N * sizeof (float));
+
+    d = (float *) acc_malloc (N * sizeof (float));
+    acc_map_data (b, d, N * sizeof (float));
+
+    d = (float *) acc_malloc (N * sizeof (float));
+    acc_map_data (c, d, N * sizeof (float));
+
+#pragma acc parallel present (a[0:N]) present (c[0:N]) present (b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            c[ii] = a[ii];
+            b[ii] = c[ii];
+        }
+    }
+
+    if (!acc_is_present (a, (N * sizeof (float))))
+      abort ();
+
+    if (!acc_is_present (b, (N * sizeof (float))))
+      abort ();
+
+    if (!acc_is_present (c, (N * sizeof (float))))
+      abort ();
+
+    acc_copyout (b, N * sizeof (float));
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 4.0)
+            abort ();
+
+        if (b[i] != 4.0)
+            abort ();
+    }
+
+    d = (float *) acc_deviceptr (a);
+
+    acc_unmap_data (a);
+
+    acc_free (d);
+
+    d = (float *) acc_deviceptr (c);
+
+    acc_unmap_data (c);
+
+    acc_free (d);
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 3.0;
+        b[i] = 6.0;
+    }
+
+    d = (float *) acc_malloc (N * sizeof (float));
+
+#pragma acc parallel copyin (a[0:N]) deviceptr (d) copyout (b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            d[ii] = a[ii];
+            b[ii] = d[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 3.0)
+            abort ();
+
+        if (b[i] != 3.0)
+            abort ();
+    }
+
+    if (acc_is_present (a, (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (b, (N * sizeof (float))))
+      abort ();
+
+    acc_free (d);
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 6.0;
+        b[i] = 0.0;
+    }
+
+    d = (float *) acc_copyin (&a[0], N * sizeof (float));
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 9.0;
+    }
+
+#pragma acc parallel pcopyin (a[0:N]) copyout (b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = a[ii];
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 6.0)
+            abort ();
+    }
+
+    if (!acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    acc_free (d);
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 6.0;
+        b[i] = 0.0;
+    }
+
+#pragma acc parallel copyin (a[0:N]) pcopyout (b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = a[ii];
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 6.0)
+            abort ();
+    }
+
+    if (acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 5.0;
+        b[i] = 7.0;
+    }
+
+#pragma acc parallel copyin (a[0:N]) pcreate (c[0:N]) copyout (b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            c[ii] = a[ii];
+            b[ii] = c[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 5.0)
+            abort ();
+
+        if (b[i] != 5.0)
+            abort ();
+    }
+
+    if (acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&c[0], (N * sizeof (float))))
+      abort ();
+
+    return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/clauses-2.c b/libgomp/testsuite/libgomp.oacc-c/clauses-2.c
new file mode 100644
index 0000000..8dc45cb
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/clauses-2.c
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include <openacc.h>
+#include <string.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+
+int
+main (int argc, char **argv)
+{
+    int N = 8;
+    float *a, *b, *c, *d;
+    int i;
+
+    a = (float *) malloc (N * sizeof (float));
+    b = (float *) malloc (N * sizeof (float));
+    c = (float *) malloc (N * sizeof (float));
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 2.0;
+        b[i] = 5.0;
+    }
+
+    d = (float *) acc_malloc (N * sizeof (float));
+    acc_map_data (c, d, N * sizeof (float));
+
+#pragma acc parallel copyin (a[0:N]) present_or_create (c[0:N+1]) copyout (b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            c[ii] = a[ii];
+            b[ii] = c[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 2.0)
+            abort ();
+
+        if (b[i] != 2.0)
+            abort ();
+    }
+
+    if (acc_is_present (a, (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (b, (N * sizeof (float))))
+      abort ();
+
+    if (!acc_is_present (c, (N * sizeof (float))))
+      abort ();
+
+    d = (float *) acc_deviceptr (c);
+
+    acc_unmap_data (c);
+
+    acc_free (d);
+
+    return 0;
+}
+/* { dg-shouldfail "libgomp: \[\h+,\d+\] is not mapped" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/context-1.c b/libgomp/testsuite/libgomp.oacc-c/context-1.c
new file mode 100644
index 0000000..dabc706
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/context-1.c
@@ -0,0 +1,213 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-lcuda -lcublas -lcudart" } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <cuda.h>
+#include <cuda_runtime_api.h>
+#include <cublas_v2.h>
+#include <openacc.h>
+
+void
+saxpy (int n, float a, float *x, float *y)
+{
+    int i;
+
+    for (i = 0; i < n; i++)
+    {
+        y[i] = a * x[i] + y[i];
+    }
+}
+
+void
+context_check (CUcontext ctx1)
+{
+    CUcontext ctx2, ctx3;
+    CUresult r;
+
+    r = cuCtxGetCurrent (&ctx2);
+    if (r != CUDA_SUCCESS)
+    {
+        fprintf (stderr, "cuCtxGetCurrent failed: %d\n", r);
+        exit (EXIT_FAILURE);
+    }
+
+    if (ctx1 != ctx2)
+    {
+        fprintf (stderr, "new context established\n");
+        exit (EXIT_FAILURE);
+    }
+
+    ctx3 = (CUcontext) acc_get_current_cuda_context ();
+
+    if (ctx1 != ctx3)
+    {
+        fprintf (stderr, "acc_get_current_cuda_context returned wrong value\n");
+        exit (EXIT_FAILURE);
+    }
+
+    return;
+}
+
+int
+main (int argc, char **argv)
+{
+    cublasStatus_t s;
+    cudaError_t e;
+    cublasHandle_t h;
+    CUcontext pctx, ctx;
+    CUresult r;
+    int dev;
+    int i;
+    const int N = 256;
+    float *h_X, *h_Y1, *h_Y2;
+    float *d_X,*d_Y;
+    float alpha = 2.0f;
+    float error_norm;
+    float ref_norm;
+
+    /* Test 1 - cuBLAS creates, OpenACC shares.  */
+
+    s = cublasCreate (&h);
+    if (s != CUBLAS_STATUS_SUCCESS)
+    {
+        fprintf (stderr, "cublasCreate failed: %d\n", s);
+        exit (EXIT_FAILURE);
+    }
+
+    r = cuCtxGetCurrent (&pctx);
+    if (r != CUDA_SUCCESS)
+    {
+        fprintf (stderr, "cuCtxGetCurrent failed: %d\n", r);
+        exit (EXIT_FAILURE);
+    }
+
+    e = cudaGetDevice (&dev);
+    if (e != cudaSuccess)
+    {
+        fprintf (stderr, "cudaGetDevice failed: %d\n", e);
+        exit (EXIT_FAILURE);
+    }
+
+    acc_set_device_num (dev, acc_device_nvidia);
+
+    h_X = (float *) malloc (N * sizeof (float));
+    if (!h_X)
+    {
+        fprintf (stderr, "malloc failed: for h_X\n");
+        exit (EXIT_FAILURE);
+    }
+
+    h_Y1 = (float *) malloc (N * sizeof (float));
+    if (!h_Y1)
+    {
+        fprintf (stderr, "malloc failed: for h_Y1\n");
+        exit (EXIT_FAILURE);
+    }
+
+    h_Y2 = (float *) malloc (N * sizeof (float));
+    if (!h_Y2)
+    {
+        fprintf (stderr, "malloc failed: for h_Y2\n");
+        exit (EXIT_FAILURE);
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        h_X[i] = rand () / (float) RAND_MAX;
+        h_Y2[i] = h_Y1[i] = rand () / (float) RAND_MAX;
+    }
+
+    d_X = (float *) acc_copyin (&h_X[0], N * sizeof (float));
+    if (d_X == NULL)
+    {
+        fprintf (stderr, "copyin error h_X\n");
+        exit (EXIT_FAILURE);
+    }
+
+    context_check (pctx);
+
+    d_Y = (float *) acc_copyin (&h_Y1[0], N * sizeof (float));
+    if (d_Y == NULL)
+    {
+        fprintf (stderr, "copyin error h_Y1\n");
+        exit (EXIT_FAILURE);
+    }
+
+    context_check (pctx);
+
+    s = cublasSaxpy (h, N, &alpha, d_X, 1, d_Y, 1);
+    if (s != CUBLAS_STATUS_SUCCESS)
+    {
+        fprintf (stderr, "cublasSaxpy failed: %d\n", s);
+        exit (EXIT_FAILURE);
+    }
+
+    context_check (pctx);
+
+    acc_memcpy_from_device (&h_Y1[0], d_Y, N * sizeof (float));
+
+    context_check (pctx);
+
+    saxpy (N, alpha, h_X, h_Y2);
+
+    error_norm = 0;
+    ref_norm = 0;
+
+    for (i = 0; i < N; ++i)
+    {
+        float diff;
+
+        diff = h_Y1[i] - h_Y2[i];
+        error_norm += diff * diff;
+        ref_norm += h_Y2[i] * h_Y2[i];
+    }
+
+    error_norm = (float) sqrt ((double) error_norm);
+    ref_norm = (float) sqrt ((double) ref_norm);
+
+    if ((fabs (ref_norm) < 1e-7) || ((error_norm / ref_norm) >= 1e-6f))
+    {
+        fprintf (stderr, "math error\n");
+        exit (EXIT_FAILURE);
+    }
+
+    free (h_X);
+    free (h_Y1);
+    free (h_Y2);
+
+    acc_free (d_X);
+    acc_free (d_Y);
+
+    context_check (pctx);
+
+    s = cublasDestroy (h);
+    if (s != CUBLAS_STATUS_SUCCESS)
+    {
+        fprintf (stderr, "cublasDestroy failed: %d\n", s);
+        exit (EXIT_FAILURE);
+    }
+
+    acc_shutdown (acc_device_nvidia);
+
+    r = cuCtxGetCurrent (&ctx);
+    if (r != CUDA_SUCCESS)
+    {
+        fprintf (stderr, "cuCtxGetCurrent failed: %d\n", r);
+        exit (EXIT_FAILURE);
+    }
+
+    if (!ctx)
+    {
+        fprintf (stderr, "Expected context\n");
+        exit (EXIT_FAILURE);
+    }
+
+    if (pctx != ctx)
+    {
+        fprintf (stderr, "Unexpected new context\n");
+        exit (EXIT_FAILURE);
+    }
+
+    return EXIT_SUCCESS;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/context-2.c b/libgomp/testsuite/libgomp.oacc-c/context-2.c
new file mode 100644
index 0000000..16464d5
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/context-2.c
@@ -0,0 +1,223 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-lcuda -lcublas -lcudart" } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <cuda.h>
+#include <cuda_runtime_api.h>
+#include <cublas_v2.h>
+#include <openacc.h>
+
+void
+saxpy (int n, float a, float *x, float *y)
+{
+    int i;
+
+    for (i = 0; i < n; i++)
+    {
+        y[i] = a * x[i] + y[i];
+    }
+}
+
+void
+context_check (CUcontext ctx1)
+{
+    CUcontext ctx2, ctx3;
+    CUresult r;
+
+    r = cuCtxGetCurrent (&ctx2);
+    if (r != CUDA_SUCCESS)
+    {
+        fprintf (stderr, "cuCtxGetCurrent failed: %d\n", r);
+        exit (EXIT_FAILURE);
+    }
+
+    if (ctx1 != ctx2)
+    {
+        fprintf (stderr, "new context established\n");
+        exit (EXIT_FAILURE);
+    }
+
+    ctx3 = (CUcontext) acc_get_current_cuda_context ();
+
+    if (ctx1 != ctx3)
+    {
+        fprintf (stderr, "acc_get_current_cuda_context returned wrong value\n");
+        exit (EXIT_FAILURE);
+    }
+
+    return;
+}
+
+int
+main (int argc, char **argv)
+{
+    cublasStatus_t s;
+    cudaError_t e;
+    cublasHandle_t h;
+    CUcontext pctx, ctx;
+    CUresult r;
+    int dev;
+    int i;
+    const int N = 256;
+    float *h_X, *h_Y1, *h_Y2;
+    float *d_X,*d_Y;
+    float alpha = 2.0f;
+    float error_norm;
+    float ref_norm;
+
+    /* Test 2 - cuBLAS creates, OpenACC shares.  */
+
+    s = cublasCreate (&h);
+    if (s != CUBLAS_STATUS_SUCCESS)
+    {
+        fprintf (stderr, "cublasCreate failed: %d\n", s);
+        exit (EXIT_FAILURE);
+    }
+
+    r = cuCtxGetCurrent (&pctx);
+    if (r != CUDA_SUCCESS)
+    {
+        fprintf (stderr, "cuCtxGetCurrent failed: %d\n", r);
+        exit (EXIT_FAILURE);
+    }
+
+    e = cudaGetDevice (&dev);
+    if (e != cudaSuccess)
+    {
+        fprintf (stderr, "cudaGetDevice failed: %d\n", e);
+        exit (EXIT_FAILURE);
+    }
+
+    acc_set_device_num (dev, acc_device_nvidia);
+
+    h_X = (float *) malloc (N * sizeof (float));
+    if (h_X == 0)
+    {
+        fprintf (stderr, "malloc failed: for h_X\n");
+        exit (EXIT_FAILURE);
+    }
+
+    h_Y1 = (float *) malloc (N * sizeof (float));
+    if (h_Y1 == 0)
+    {
+        fprintf (stderr, "malloc failed: for h_Y1\n");
+        exit (EXIT_FAILURE);
+    }
+
+    h_Y2 = (float *) malloc (N * sizeof (float));
+    if (h_Y2 == 0)
+    {
+        fprintf (stderr, "malloc failed: for h_Y2\n");
+        exit (EXIT_FAILURE);
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        h_X[i] = rand () / (float) RAND_MAX;
+        h_Y2[i] = h_Y1[i] = rand () / (float) RAND_MAX;
+    }
+
+    d_X = (float *) acc_copyin (&h_X[0], N * sizeof (float));
+    if (d_X == NULL)
+    {
+        fprintf (stderr, "copyin error h_X\n");
+        exit (EXIT_FAILURE);
+    }
+
+    context_check (pctx);
+
+    d_Y = (float *) acc_copyin (&h_Y1[0], N * sizeof (float));
+    if (d_Y == NULL)
+    {
+        fprintf (stderr, "copyin error h_Y1\n");
+        exit (EXIT_FAILURE);
+    }
+
+    context_check (pctx);
+
+    s = cublasSaxpy (h, N, &alpha, d_X, 1, d_Y, 1);
+    if (s != CUBLAS_STATUS_SUCCESS)
+    {
+        fprintf (stderr, "cublasSaxpy failed: %d\n", s);
+        exit (EXIT_FAILURE);
+    }
+
+    context_check (pctx);
+
+    acc_memcpy_from_device (&h_Y1[0], d_Y, N * sizeof (float));
+
+    context_check (pctx);
+
+#pragma acc parallel copyin (h_X[0:N]), copy (h_Y2[0:N]) copyin (alpha)
+    {
+        int i;
+
+        for (i = 0; i < N; i++)
+        {
+            h_Y2[i] = alpha * h_X[i] + h_Y2[i];
+        }
+    }
+
+    context_check (pctx);
+
+    error_norm = 0;
+    ref_norm = 0;
+
+    for (i = 0; i < N; ++i)
+    {
+        float diff;
+
+        diff = h_Y1[i] - h_Y2[i];
+        error_norm += diff * diff;
+        ref_norm += h_Y2[i] * h_Y2[i];
+    }
+
+    error_norm = (float) sqrt ((double) error_norm);
+    ref_norm = (float) sqrt ((double) ref_norm);
+
+    if ((fabs (ref_norm) < 1e-7) || ((error_norm / ref_norm) >= 1e-6f))
+    {
+        fprintf (stderr, "math error\n");
+        exit (EXIT_FAILURE);
+    }
+
+    free (h_X);
+    free (h_Y1);
+    free (h_Y2);
+
+    acc_free (d_X);
+    acc_free (d_Y);
+
+    context_check (pctx);
+
+    s = cublasDestroy (h);
+    if (s != CUBLAS_STATUS_SUCCESS)
+    {
+        fprintf (stderr, "cublasDestroy failed: %d\n", s);
+        exit (EXIT_FAILURE);
+    }
+
+    acc_shutdown (acc_device_nvidia);
+
+    r = cuCtxGetCurrent (&ctx);
+    if (r != CUDA_SUCCESS)
+    {
+        fprintf (stderr, "cuCtxGetCurrent failed: %d\n", r);
+        exit (EXIT_FAILURE);
+    }
+
+    if (!ctx)
+    {
+        fprintf (stderr, "Expected context\n");
+        exit (EXIT_FAILURE);
+    }
+
+    if (pctx != ctx)
+    {
+        fprintf (stderr, "Unexpected new context\n");
+        exit (EXIT_FAILURE);
+    }
+
+    return EXIT_SUCCESS;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/context-3.c b/libgomp/testsuite/libgomp.oacc-c/context-3.c
new file mode 100644
index 0000000..ccd276c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/context-3.c
@@ -0,0 +1,200 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-lcuda -lcublas -lcudart" } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <cuda.h>
+#include <cuda_runtime_api.h>
+#include <cublas_v2.h>
+#include <openacc.h>
+
+void
+saxpy (int n, float a, float *x, float *y)
+{
+    int i;
+
+    for (i = 0; i < n; i++)
+    {
+        y[i] = a * x[i] + y[i];
+    }
+}
+
+void
+context_check (CUcontext ctx1)
+{
+    CUcontext ctx2, ctx3;
+    CUresult r;
+
+    r = cuCtxGetCurrent (&ctx2);
+    if (r != CUDA_SUCCESS)
+    {
+        fprintf (stderr, "cuCtxGetCurrent failed: %d\n", r);
+        exit (EXIT_FAILURE);
+    }
+
+    if (ctx1 != ctx2)
+    {
+        fprintf (stderr, "new context established\n");
+        exit (EXIT_FAILURE);
+    }
+
+    ctx3 = (CUcontext) acc_get_current_cuda_context ();
+
+    if (ctx1 != ctx3)
+    {
+        fprintf (stderr, "acc_get_current_cuda_context returned wrong value\n");
+        exit (EXIT_FAILURE);
+    }
+
+    return;
+}
+
+int
+main (int argc, char **argv)
+{
+    cublasStatus_t s;
+    cublasHandle_t h;
+    CUcontext pctx;
+    CUresult r;
+    int i;
+    const int N = 256;
+    float *h_X, *h_Y1, *h_Y2;
+    float *d_X,*d_Y;
+    float alpha = 2.0f;
+    float error_norm;
+    float ref_norm;
+
+    /* Test 3 - OpenACC creates, cuBLAS shares.  */
+
+    acc_set_device_num (0, acc_device_nvidia);
+
+    r = cuCtxGetCurrent (&pctx);
+    if (r != CUDA_SUCCESS)
+    {
+        fprintf (stderr, "cuCtxGetCurrent failed: %d\n", r);
+        exit (EXIT_FAILURE);
+    }
+
+    h_X = (float *) malloc (N * sizeof (float));
+    if (h_X == 0)
+    {
+        fprintf (stderr, "malloc failed: for h_X\n");
+        exit (EXIT_FAILURE);
+    }
+
+    h_Y1 = (float *) malloc (N * sizeof (float));
+    if (h_Y1 == 0)
+    {
+        fprintf (stderr, "malloc failed: for h_Y1\n");
+        exit (EXIT_FAILURE);
+    }
+
+    h_Y2 = (float *) malloc (N * sizeof (float));
+    if (h_Y2 == 0)
+    {
+        fprintf (stderr, "malloc failed: for h_Y2\n");
+        exit (EXIT_FAILURE);
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        h_X[i] = rand () / (float) RAND_MAX;
+        h_Y2[i] = h_Y1[i] = rand () / (float) RAND_MAX;
+    }
+
+    d_X = (float *) acc_copyin (&h_X[0], N * sizeof (float));
+    if (d_X == NULL)
+    {
+        fprintf (stderr, "copyin error h_X\n");
+        exit (EXIT_FAILURE);
+    }
+
+    d_Y = (float *) acc_copyin (&h_Y1[0], N * sizeof (float));
+    if (d_Y == NULL)
+    {
+        fprintf (stderr, "copyin error h_Y1\n");
+        exit (EXIT_FAILURE);
+    }
+
+    context_check (pctx);
+
+    s = cublasCreate (&h);
+    if (s != CUBLAS_STATUS_SUCCESS)
+    {
+        fprintf (stderr, "cublasCreate failed: %d\n", s);
+        exit (EXIT_FAILURE);
+    }
+
+    context_check (pctx);
+
+    s = cublasSaxpy (h, N, &alpha, d_X, 1, d_Y, 1);
+    if (s != CUBLAS_STATUS_SUCCESS)
+    {
+        fprintf (stderr, "cublasSaxpy failed: %d\n", s);
+        exit (EXIT_FAILURE);
+    }
+
+    context_check (pctx);
+
+    acc_memcpy_from_device (&h_Y1[0], d_Y, N * sizeof (float));
+
+    context_check (pctx);
+
+    saxpy (N, alpha, h_X, h_Y2);
+
+    error_norm = 0;
+    ref_norm = 0;
+
+    for (i = 0; i < N; ++i)
+    {
+        float diff;
+
+        diff = h_Y1[i] - h_Y2[i];
+        error_norm += diff * diff;
+        ref_norm += h_Y2[i] * h_Y2[i];
+    }
+
+    error_norm = (float) sqrt ((double) error_norm);
+    ref_norm = (float) sqrt ((double) ref_norm);
+
+    if ((fabs (ref_norm) < 1e-7) || ((error_norm / ref_norm) >= 1e-6f))
+    {
+        fprintf (stderr, "math error\n");
+        exit (EXIT_FAILURE);
+    }
+
+    free (h_X);
+    free (h_Y1);
+    free (h_Y2);
+
+    acc_free (d_X);
+    acc_free (d_Y);
+
+    context_check (pctx);
+
+    s = cublasDestroy (h);
+    if (s != CUBLAS_STATUS_SUCCESS)
+    {
+        fprintf (stderr, "cublasDestroy failed: %d\n", s);
+        exit (EXIT_FAILURE);
+    }
+
+    context_check (pctx);
+
+    acc_shutdown (acc_device_nvidia);
+
+    r = cuCtxGetCurrent (&pctx);
+    if (r != CUDA_SUCCESS)
+    {
+        fprintf (stderr, "cuCtxGetCurrent failed: %d\n", r);
+        exit (EXIT_FAILURE);
+    }
+
+    if (pctx)
+    {
+        fprintf (stderr, "Unexpected context\n");
+        exit (EXIT_FAILURE);
+    }
+
+    return EXIT_SUCCESS;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/context-4.c b/libgomp/testsuite/libgomp.oacc-c/context-4.c
new file mode 100644
index 0000000..71365e8
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/context-4.c
@@ -0,0 +1,213 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-lcuda -lcublas -lcudart" } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <cuda.h>
+#include <cuda_runtime_api.h>
+#include <cublas_v2.h>
+#include <openacc.h>
+
+void
+saxpy (int n, float a, float *x, float *y)
+{
+    int i;
+
+    for (i = 0; i < n; i++)
+    {
+        y[i] = a * x[i] + y[i];
+    }
+}
+
+void
+context_check (CUcontext ctx1)
+{
+    CUcontext ctx2, ctx3;
+    CUresult r;
+
+    r = cuCtxGetCurrent (&ctx2);
+    if (r != CUDA_SUCCESS)
+    {
+        fprintf (stderr, "cuCtxGetCurrent failed: %d\n", r);
+        exit (EXIT_FAILURE);
+    }
+
+    if (ctx1 != ctx2)
+    {
+        fprintf (stderr, "new context established\n");
+        exit (EXIT_FAILURE);
+    }
+
+    ctx3 = (CUcontext) acc_get_current_cuda_context ();
+
+    if (ctx1 != ctx3)
+    {
+        fprintf (stderr, "acc_get_current_cuda_context returned wrong value\n");
+        exit (EXIT_FAILURE);
+    }
+
+    return;
+}
+
+int
+main (int argc, char **argv)
+{
+    cublasStatus_t s;
+    cublasHandle_t h;
+    CUcontext pctx;
+    CUresult r;
+    int i;
+    const int N = 256;
+    float *h_X, *h_Y1, *h_Y2;
+    float *d_X,*d_Y;
+    float alpha = 2.0f;
+    float error_norm;
+    float ref_norm;
+
+    /* Test 4 - OpenACC creates, cuBLAS shares.  */
+
+    acc_set_device_num (0, acc_device_nvidia);
+
+    r = cuCtxGetCurrent (&pctx);
+    if (r != CUDA_SUCCESS)
+    {
+        fprintf (stderr, "cuCtxGetCurrent failed: %d\n", r);
+        exit (EXIT_FAILURE);
+    }
+
+    h_X = (float *) malloc (N * sizeof (float));
+    if (h_X == 0)
+    {
+        fprintf (stderr, "malloc failed: for h_X\n");
+        exit (EXIT_FAILURE);
+    }
+
+    h_Y1 = (float *) malloc (N * sizeof (float));
+    if (h_Y1 == 0)
+    {
+        fprintf (stderr, "malloc failed: for h_Y1\n");
+        exit (EXIT_FAILURE);
+    }
+
+    h_Y2 = (float *) malloc (N * sizeof (float));
+    if (h_Y2 == 0)
+    {
+        fprintf (stderr, "malloc failed: for h_Y2\n");
+        exit (EXIT_FAILURE);
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        h_X[i] = rand () / (float) RAND_MAX;
+        h_Y2[i] = h_Y1[i] = rand () / (float) RAND_MAX;
+    }
+
+#pragma acc parallel copyin (h_X[0:N]), copy (h_Y2[0:N]) copy (alpha)
+    {
+        int i;
+
+        for (i = 0; i < N; i++)
+        {
+            h_Y2[i] = alpha * h_X[i] + h_Y2[i];
+        }
+    }
+
+    r = cuCtxGetCurrent (&pctx);
+    if (r != CUDA_SUCCESS)
+    {
+        fprintf (stderr, "cuCtxGetCurrent failed: %d\n", r);
+        exit (EXIT_FAILURE);
+    }
+
+    d_X = (float *) acc_copyin (&h_X[0], N * sizeof (float));
+    if (d_X == NULL)
+    {
+        fprintf (stderr, "copyin error h_Y1\n");
+        exit (EXIT_FAILURE);
+    }
+
+    d_Y = (float *) acc_copyin (&h_Y1[0], N * sizeof (float));
+    if (d_Y == NULL)
+    {
+        fprintf (stderr, "copyin error h_Y1\n");
+        exit (EXIT_FAILURE);
+    }
+
+    s = cublasCreate (&h);
+    if (s != CUBLAS_STATUS_SUCCESS)
+    {
+        fprintf (stderr, "cublasCreate failed: %d\n", s);
+        exit (EXIT_FAILURE);
+    }
+
+    context_check (pctx);
+
+    s = cublasSaxpy (h, N, &alpha, d_X, 1, d_Y, 1);
+    if (s != CUBLAS_STATUS_SUCCESS)
+    {
+        fprintf (stderr, "cublasSaxpy failed: %d\n", s);
+        exit (EXIT_FAILURE);
+    }
+
+    context_check (pctx);
+
+    acc_memcpy_from_device (&h_Y1[0], d_Y, N * sizeof (float));
+
+    context_check (pctx);
+
+    error_norm = 0;
+    ref_norm = 0;
+
+    for (i = 0; i < N; ++i)
+    {
+        float diff;
+
+        diff = h_Y1[i] - h_Y2[i];
+        error_norm += diff * diff;
+        ref_norm += h_Y2[i] * h_Y2[i];
+    }
+
+    error_norm = (float) sqrt ((double) error_norm);
+    ref_norm = (float) sqrt ((double) ref_norm);
+
+    if ((fabs (ref_norm) < 1e-7) || ((error_norm / ref_norm) >= 1e-6f))
+    {
+        fprintf (stderr, "math error\n");
+        exit (EXIT_FAILURE);
+    }
+
+    free (h_X);
+    free (h_Y1);
+    free (h_Y2);
+
+    acc_free (d_X);
+    acc_free (d_Y);
+
+    context_check (pctx);
+
+    s = cublasDestroy (h);
+    if (s != CUBLAS_STATUS_SUCCESS)
+    {
+        fprintf (stderr, "cublasDestroy failed: %d\n", s);
+        exit (EXIT_FAILURE);
+    }
+
+    context_check (pctx);
+
+    acc_shutdown (acc_device_nvidia);
+
+    r = cuCtxGetCurrent (&pctx);
+    if (r != CUDA_SUCCESS)
+    {
+        fprintf (stderr, "cuCtxGetCurrent failed: %d\n", r);
+        exit (EXIT_FAILURE);
+    }
+
+    if (pctx)
+    {
+        fprintf (stderr, "Unexpected context\n");
+        exit (EXIT_FAILURE);
+    }
+
+    return EXIT_SUCCESS;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/data-1.c b/libgomp/testsuite/libgomp.oacc-c/data-1.c
index 8f9a17a..e7564cc 100644
--- a/libgomp/testsuite/libgomp.oacc-c/data-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c/data-1.c
@@ -1,19 +1,30 @@
 /* { dg-do run } */
 
-extern void abort ();
+#include <stdlib.h>
+#include <openacc.h>
 
 int i;
 
+int
+is_mapped (void *p, size_t n)
+{
+#if ACC_MEM_SHARED
+  return 1;
+#else
+  return acc_is_present (p, n);
+#endif
+}
+
 int main(void)
 {
   int j;
 
-#if 0
   i = -1;
   j = -2;
 #pragma acc data copyin (i, j)
   {
-    // TODO: check that variables have been mapped.
+    if (!is_mapped (&i, sizeof (i)) || !is_mapped (&j, sizeof (j)))
+      abort ();
     if (i != -1 || j != -2)
       abort ();
     i = 2;
@@ -28,37 +39,30 @@ int main(void)
   j = -2;
 #pragma acc data copyout (i, j)
   {
-    // TODO: check that variables have been mapped.
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
+    if (!is_mapped (&i, sizeof (i)) || !is_mapped (&j, sizeof (j)))
       abort ();
-  }
-  if (i != -1 || j != -2)
-    abort ();
-
-  i = -1;
-  j = -2;
-#pragma acc data copy (i, j)
-  {
-    // TODO: check that variables have been mapped.
     if (i != -1 || j != -2)
       abort ();
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
       abort ();
+
+#pragma acc parallel present (i, j)
+    {
+      i = 4;
+      j = 2;
+    }
   }
-  if (i != -1 || j != -2)
+  if (i != 4 || j != 2)
     abort ();
 
   i = -1;
   j = -2;
 #pragma acc data create (i, j)
   {
-    // TODO: check that variables have been mapped.
+    if (!is_mapped (&i, sizeof (i)) || !is_mapped (&j, sizeof (j)))
+      abort ();
     if (i != -1 || j != -2)
       abort ();
     i = 2;
@@ -66,15 +70,15 @@ int main(void)
     if (i != 2 || j != 1)
       abort ();
   }
-  if (i != -1 || j != -2)
+  if (i != 2 || j != 1)
     abort ();
-#endif
 
   i = -1;
   j = -2;
 #pragma acc data present_or_copyin (i, j)
   {
-    // TODO: check that variables have been mapped.
+    if (!is_mapped (&i, sizeof (i)) || !is_mapped (&j, sizeof (j)))
+      abort ();
     if (i != -1 || j != -2)
       abort ();
     i = 2;
@@ -85,28 +89,34 @@ int main(void)
   if (i != 2 || j != 1)
     abort ();
 
-#if 0
   i = -1;
   j = -2;
 #pragma acc data present_or_copyout (i, j)
   {
-    // TODO: check that variables have been mapped.
+    if (!is_mapped (&i, sizeof (i)) || !is_mapped (&j, sizeof (j)))
+      abort ();
     if (i != -1 || j != -2)
       abort ();
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
       abort ();
+
+#pragma acc parallel present (i, j)
+    {
+      i = 4;
+      j = 2;
+    }
   }
-  if (i != -1 || j != -2)
+  if (i != 4 || j != 2)
     abort ();
-#endif
 
   i = -1;
   j = -2;
 #pragma acc data present_or_copy (i, j)
   {
-    // TODO: check that variables have been mapped.
+    if (!is_mapped (&i, sizeof (i)) || !is_mapped (&j, sizeof (j)))
+      abort ();
     if (i != -1 || j != -2)
       abort ();
     i = 2;
@@ -114,47 +124,56 @@ int main(void)
     if (i != 2 || j != 1)
       abort ();
   }
+#if ACC_MEM_SHARED
+  if (i != 2 || j != 1)
+    abort ();
+#else
   if (i != -1 || j != -2)
     abort ();
+#endif
 
-#if 0
   i = -1;
   j = -2;
 #pragma acc data present_or_create (i, j)
   {
-    // TODO: check that variables have been mapped.
+    if (!is_mapped (&i, sizeof (i)) || !is_mapped (&j, sizeof (j)))
+      abort ();
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
       abort ();
   }
-  if (i != -1 || j != -2)
+
+  if (i != 2 || j != 1)
     abort ();
-#endif
 
-#if 0
   i = -1;
   j = -2;
-#pragma acc data present (i, j)
+#pragma acc data copyin (i, j)
   {
-    // TODO: check that variables have been mapped.
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
+#pragma acc data present (i, j)
+    {
+      if (!is_mapped (&i, sizeof (i)) || !is_mapped (&j, sizeof (j)))
+        abort ();
+      if (i != -1 || j != -2)
+        abort ();
+      i = 2;
+      j = 1;
+      if (i != 2 || j != 1)
+        abort ();
+    }
   }
-  if (i != -1 || j != -2)
+  if (i != 2 || j != 1)
     abort ();
-#endif
 
-#if 0
   i = -1;
   j = -2;
 #pragma acc data
   {
-    // TODO: check that variables have been mapped.
+#if !ACC_MEM_SHARED
+    if (is_mapped (&i, sizeof (i)) || is_mapped (&j, sizeof (j)))
+      abort ();
+#endif
     if (i != -1 || j != -2)
       abort ();
     i = 2;
@@ -162,9 +181,8 @@ int main(void)
     if (i != 2 || j != 1)
       abort ();
   }
-  if (i != -1 || j != -2)
+  if (i != 2 || j != 1)
     abort ();
-#endif
 
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.oacc-c/deviceptr-1.c b/libgomp/testsuite/libgomp.oacc-c/deviceptr-1.c
new file mode 100644
index 0000000..e271a37
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/deviceptr-1.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+
+int main (void)
+{
+  void *a, *a_1, *a_2;
+
+#define A (void *) 0x123
+  a = A;
+
+#pragma acc data copyout (a_1, a_2)
+#pragma acc kernels deviceptr (a)
+  {
+    a_1 = a;
+    a_2 = &a;
+  }
+
+  if (a != A)
+    abort ();
+  if (a_1 != a)
+    abort ();
+#if ACC_MEM_SHARED
+  if (a_2 != &a)
+    abort ();
+#else
+  if (a_2 == &a)
+    abort ();
+#endif
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/goacc_kernels.c b/libgomp/testsuite/libgomp.oacc-c/goacc_kernels.c
index b41e558..683fefa 100644
--- a/libgomp/testsuite/libgomp.oacc-c/goacc_kernels.c
+++ b/libgomp/testsuite/libgomp.oacc-c/goacc_kernels.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_DEVICE_TYPE_host=1" } } */
 
 #include "libgomp_g.h"
 
@@ -19,7 +20,7 @@ int main(void)
   i = -1;
   GOACC_kernels (0, f, (const void *) 0,
 		 0, (void *) 0, (void *) 0, (void *) 0,
-		 1, 1, 1);
+		 1, 1, 1, -2, -1);
   if (i != 42)
     abort ();
 
diff --git a/libgomp/testsuite/libgomp.oacc-c/goacc_parallel.c b/libgomp/testsuite/libgomp.oacc-c/goacc_parallel.c
index 4ab1e9b..232ce8a 100644
--- a/libgomp/testsuite/libgomp.oacc-c/goacc_parallel.c
+++ b/libgomp/testsuite/libgomp.oacc-c/goacc_parallel.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_DEVICE_TYPE_host=1" } } */
 
 #include "libgomp_g.h"
 
@@ -19,7 +20,7 @@ int main(void)
   i = -1;
   GOACC_parallel (0, f, (const void *) 0,
 		  0, (void *) 0, (void *) 0, (void *) 0,
-		  1, 1, 1);
+		  1, 1, 1, -2, -1);
   if (i != 42)
     abort ();
 
diff --git a/libgomp/testsuite/libgomp.oacc-c/if-1.c b/libgomp/testsuite/libgomp.oacc-c/if-1.c
new file mode 100644
index 0000000..e289f40
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/if-1.c
@@ -0,0 +1,547 @@
+/* { dg-do run } */
+/* { dg-additional-options "-fno-builtin-acc_on_device" } */
+
+#include <openacc.h>
+#include <stdlib.h>
+#include <stdbool.h>
+
+#define N   32
+
+int
+main(int argc, char **argv)
+{
+    float *a, *b, *d_a, *d_b, exp, exp2;
+    int i;
+    const int one = 1;
+    const int zero = 0;
+    int n;
+
+    a = (float *) malloc (N * sizeof (float));
+    b = (float *) malloc (N * sizeof (float));
+    d_a = (float *) acc_malloc (N * sizeof (float));
+    d_b = (float *) acc_malloc (N * sizeof (float));
+
+    for (i = 0; i < N; i++)
+        a[i] = 4.0;
+
+#pragma acc parallel copyin(a[0:N]) copyout(b[0:N]) if(1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+#if ACC_MEM_SHARED
+    exp = 5.0;
+#else
+    exp = 4.0;
+#endif
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != exp)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 16.0;
+
+#pragma acc parallel if(0)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 17.0)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 8.0;
+
+#pragma acc parallel copyin(a[0:N]) copyout(b[0:N]) if(one)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+#if ACC_MEM_SHARED
+    exp = 9.0;
+#else
+    exp = 8.0;
+#endif
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != exp)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 22.0;
+
+#pragma acc parallel if(zero)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 23.0)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 16.0;
+
+#pragma acc parallel copyin(a[0:N]) copyout(b[0:N]) if(true)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+#if ACC_MEM_SHARED
+    exp = 17.0;
+#else
+    exp = 16.0;
+#endif
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != exp)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 76.0;
+
+#pragma acc parallel if(false)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 77.0)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 22.0;
+
+    n = 1;
+
+#pragma acc parallel copyin(a[0:N]) copyout(b[0:N]) if(n)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+#if ACC_MEM_SHARED
+    exp = 23.0;
+#else
+    exp = 22.0;
+#endif
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != exp)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 18.0;
+
+    n = 0;
+
+#pragma acc parallel if(n)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 19.0)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 49.0;
+
+    n = 1;
+
+#pragma acc parallel copyin(a[0:N]) copyout(b[0:N]) if(n + n)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+#if ACC_MEM_SHARED
+    exp = 50.0;
+#else
+    exp = 49.0;
+#endif
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != exp)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 38.0;
+
+    n = 0;
+
+#pragma acc parallel if(n + n)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 39.0)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 91.0;
+
+#pragma acc parallel copyin(a[0:N]) copyout(b[0:N]) if(-2)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+#if ACC_MEM_SHARED
+    exp = 92.0;
+#else
+    exp = 91.0;
+#endif
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != exp)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 43.0;
+
+#pragma acc parallel copyin(a[0:N]) copyout(b[0:N]) if(one == 1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+#if ACC_MEM_SHARED
+    exp = 44.0;
+#else
+    exp = 43.0;
+#endif
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != exp)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 87.0;
+
+#pragma acc parallel if(one == 0)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 88.0)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 3.0;
+        b[i] = 9.0;
+    }
+
+#if ACC_MEM_SHARED
+    exp = 0.0;
+    exp2 = 0.0;
+#else
+    acc_map_data (a, d_a, N * sizeof (float));
+    acc_map_data (b, d_b, N * sizeof (float));
+    exp = 3.0;
+    exp2 = 9.0;
+#endif
+
+#pragma acc update device(a[0:N], b[0:N]) if(1)
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 0.0;
+        b[i] = 0.0;
+    }
+
+#pragma acc update host(a[0:N], b[0:N]) if(1)
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != exp)
+            abort();
+
+        if (b[i] != exp2)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 6.0;
+        b[i] = 12.0;
+    }
+
+#pragma acc update device(a[0:N], b[0:N]) if(0)
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 0.0;
+        b[i] = 0.0;
+    }
+
+#pragma acc update host(a[0:N], b[0:N]) if(1)
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != exp)
+            abort();
+
+        if (b[i] != exp2)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 26.0;
+        b[i] = 21.0;
+    }
+
+#pragma acc update device(a[0:N], b[0:N]) if(1)
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 0.0;
+        b[i] = 0.0;
+    }
+
+#pragma acc update host(a[0:N], b[0:N]) if(0)
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 0.0)
+            abort();
+
+        if (b[i] != 0.0)
+            abort();
+    }
+
+#if !ACC_MEM_SHARED
+    acc_unmap_data (a);
+    acc_unmap_data (b);
+#endif
+
+    acc_free (d_a);
+    acc_free (d_b);
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 4.0;
+        b[i] = 0.0;
+    }
+
+#pragma acc data copyin(a[0:N]) copyout(b[0:N]) if(1)
+{
+#pragma acc parallel present(a[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            b[ii] = a[ii];
+        }
+    }
+}
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 4.0)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 8.0;
+        b[i] = 1.0;
+    }
+
+#pragma acc data copyin(a[0:N]) copyout(b[0:N]) if(0)
+{
+#if !ACC_MEM_SHARED
+    if (acc_is_present (a, N * sizeof (float)))
+        abort ();
+#endif
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (b, N * sizeof (float)))
+        abort ();
+#endif
+}
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 18.0;
+        b[i] = 21.0;
+    }
+
+#pragma acc data copyin(a[0:N]) if(1)
+{
+#if !ACC_MEM_SHARED
+    if (!acc_is_present (a, N * sizeof (float)))
+        abort ();
+#endif
+
+#pragma acc data copyout(b[0:N]) if(0)
+    {
+#if !ACC_MEM_SHARED
+        if (acc_is_present (b, N * sizeof (float)))
+            abort ();
+#endif
+
+#pragma acc data copyout(b[0:N]) if(1)
+        {
+#pragma acc parallel present(a[0:N]) present(b[0:N])
+            {
+                int ii;
+
+                for (ii = 0; ii < N; ii++)
+                {
+                    b[ii] = a[ii];
+                }
+            }
+        }
+
+#if !ACC_MEM_SHARED
+        if (acc_is_present (b, N * sizeof (float)))
+            abort ();
+#endif
+    }
+}
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 18.0)
+            abort ();
+	}
+
+#ifdef XXX_TODO_ENTER_END_DATA
+#endif
+
+    return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/kernels-1.c b/libgomp/testsuite/libgomp.oacc-c/kernels-1.c
index 8550662..3acfdf5 100644
--- a/libgomp/testsuite/libgomp.oacc-c/kernels-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c/kernels-1.c
@@ -1,10 +1,10 @@
 /* { dg-do run } */
 
-extern void abort ();
+#include <stdlib.h>
 
 int i;
 
-int main(void)
+int main (void)
 {
   int j, v;
 
@@ -83,8 +83,15 @@ int main(void)
       abort ();
     v = 1;
   }
-  if (v != 1 || i != -1 || j != -2)
+  if (v != 1)
+    abort ();
+#if ACC_MEM_SHARED
+  if (i != 2 || j != 1)
     abort ();
+#else
+  if (i != -1 || j != -2)
+    abort ();
+#endif
 
   i = -1;
   j = -2;
@@ -127,8 +134,15 @@ int main(void)
       abort ();
     v = 1;
   }
-  if (v != 1 || i != -1 || j != -2)
+  if (v != 1)
+    abort ();
+#if ACC_MEM_SHARED
+  if (i != 2 || j != 1)
     abort ();
+#else
+  if (i != -1 || j != -2)
+    abort ();
+#endif
 
 #if 0
   i = -1;
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-1.c b/libgomp/testsuite/libgomp.oacc-c/lib-1.c
index 8ad1b19..17129d8 100644
--- a/libgomp/testsuite/libgomp.oacc-c/lib-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-1.c
@@ -1,7 +1,24 @@
+/* { dg-do run } */
+
 #include <openacc.h>
 
 int
-main (void)
+main (int argc, char **argv)
 {
+  acc_device_t devtype = acc_device_host;
+
+#if ACC_DEVICE_TYPE_nvidia
+  devtype = acc_device_nvidia;
+
+  if (acc_get_num_devices (devtype) == 0)
+    return 0;
+#endif
+
+  acc_init (devtype);
+
+  acc_init (devtype);
+
   return 0;
 }
+
+/* { dg-shouldfail "libgomp: device already active" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-10.c b/libgomp/testsuite/libgomp.oacc-c/lib-10.c
new file mode 100644
index 0000000..cf1af8c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-10.c
@@ -0,0 +1,58 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  void *d;
+  acc_device_t devtype = acc_device_host;
+
+#if ACC_DEVICE_TYPE_nvidia
+  devtype = acc_device_nvidia;
+
+  if (acc_get_num_devices (acc_device_nvidia) == 0)
+    return 0;
+#endif
+
+  acc_init (devtype);
+
+  d = acc_malloc (0);
+  if (d != NULL)
+    abort ();
+
+  acc_free (0);
+
+  acc_shutdown (devtype);
+
+  acc_set_device_type (devtype);
+
+  d = acc_malloc (0);
+  if (d != NULL)
+    abort ();
+
+  acc_shutdown (devtype);
+
+  acc_init (devtype);
+
+  d = acc_malloc (1024);
+  if (d == NULL)
+    abort ();
+
+  acc_free (d);
+
+  acc_shutdown (devtype);
+
+  acc_set_device_type (devtype);
+
+  d = acc_malloc (1024);
+  if (d == NULL)
+    abort ();
+
+  acc_free (d);
+
+  acc_shutdown (devtype);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-11.c b/libgomp/testsuite/libgomp.oacc-c/lib-11.c
new file mode 100644
index 0000000..b4583ae
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-11.c
@@ -0,0 +1,22 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+#include <stdint.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 512;
+  void *d;
+
+  d = acc_malloc (N);
+  if (d == NULL)
+    abort ();
+
+  acc_free ((void *)((uintptr_t) d + (uintptr_t) (N >> 1)));
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: mem free failed 1" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-12.c b/libgomp/testsuite/libgomp.oacc-c/lib-12.c
new file mode 100644
index 0000000..b46f590
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-12.c
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include <string.h>
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  (void) acc_copyin (h, N);
+
+  memset (h, 0, N);
+
+  acc_copyout (h, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (h[i] != i)
+	abort ();
+    }
+
+  free (h);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-13.c b/libgomp/testsuite/libgomp.oacc-c/lib-13.c
new file mode 100644
index 0000000..7098ef3
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-13.c
@@ -0,0 +1,60 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+#include <stdio.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  d = acc_copyin (h, N);
+
+  if (acc_is_present (h, 1) != 1)
+    abort ();
+
+  if (acc_is_present (h, N + 1) != 0)
+    abort ();
+
+  if (acc_is_present (h + 1, N) != 0)
+    abort ();
+
+  if (acc_is_present (h - 1, N) != 0)
+    abort ();
+
+  if (acc_is_present (h - 1, N - 1) != 0)
+    abort ();
+
+  if (acc_is_present (h + N, 0) != 0)
+    abort ();
+
+  if (acc_is_present (h + N, N) != 0)
+    abort ();
+
+  if (acc_is_present (0, N) != 0)
+    abort ();
+   
+  if (acc_is_present (h, 0) != 0)
+    abort ();
+
+  acc_free (d);
+
+  if (acc_is_present (h, 1) != 0)
+    abort ();
+
+  free (h);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-14.c b/libgomp/testsuite/libgomp.oacc-c/lib-14.c
new file mode 100644
index 0000000..a9632f7
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-14.c
@@ -0,0 +1,61 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+#include <stdio.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  d = acc_copyin (h, N);
+
+  if (acc_is_present (h, 1) != 1)
+    abort ();
+
+  if (acc_is_present (h + N - 1, 1) != 1)
+    abort ();
+
+  if (acc_is_present (h - 1, 1) != 0)
+    abort ();
+
+  if (acc_is_present (h + N, 1) != 0)
+    abort ();
+
+  for (i = 0; i < N; i++)
+    {
+      if (acc_is_present (h + i, 1) != 1)
+	abort ();
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      if (acc_is_present (h + i, N - i) != 1)
+	abort ();
+    }
+
+  acc_free (d);
+
+  for (i = 0; i < N; i++)
+    {
+      if (acc_is_present (h + i, N - i) != 0)
+	abort ();
+    }
+
+
+  free (h);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-15.c b/libgomp/testsuite/libgomp.oacc-c/lib-15.c
new file mode 100644
index 0000000..4f6a731
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-15.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  (void) acc_copyin (h, N);
+
+  acc_copyout (h, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (acc_is_present (h + i, 1) != 0)
+	abort ();
+    }
+
+  free (h);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-16.c b/libgomp/testsuite/libgomp.oacc-c/lib-16.c
new file mode 100644
index 0000000..9d277ac
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-16.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  (void) acc_copyin (h, N);
+
+  (void) acc_copyin (h, N);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[\h+,\+256\] already mapped to \[\h+,\+256\]" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-17.c b/libgomp/testsuite/libgomp.oacc-c/lib-17.c
new file mode 100644
index 0000000..5ff894c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-17.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  (void) acc_copyin (h, N);
+
+  acc_copyout (h, N);
+
+  acc_copyout (h, N);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[\h+,256\] is not mapped" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-18.c b/libgomp/testsuite/libgomp.oacc-c/lib-18.c
new file mode 100644
index 0000000..2bc3263
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-18.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+#include <stdio.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  d = acc_copyin (h, N);
+
+  acc_free (d);
+
+  acc_copyout (h, N);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[\h+,256\] is not mapped" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-19.c b/libgomp/testsuite/libgomp.oacc-c/lib-19.c
new file mode 100644
index 0000000..3581616
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-19.c
@@ -0,0 +1,60 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include <string.h>
+#include <stdlib.h>
+#include <openacc.h>
+
+#include <stdio.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h[N];
+
+  for (i = 0; i < N; i++)
+    {
+      int j;
+      unsigned char *p;
+
+      h[i] = (unsigned char *) malloc (N);
+      p = h[i];
+
+      for (j = 0; j < N; j++)
+	{
+	  p[j] = i;
+	}
+
+      (void) acc_copyin (p, N);
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      memset (h[i], 0, i);
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      int j;
+      unsigned char *p;
+
+      acc_copyout (h[i], N);
+
+      p = h[i];
+
+      for (j = 0; j < N; j++)
+	{
+	  if (p[j] != i)
+	    abort ();
+	}
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      free (h[i]);
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-2.c b/libgomp/testsuite/libgomp.oacc-c/lib-2.c
new file mode 100644
index 0000000..9a4501f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-2.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  acc_device_t devtype = acc_device_host;
+
+#if ACC_DEVICE_TYPE_nvidia
+  devtype = acc_device_nvidia;
+
+  if (acc_get_num_devices (acc_device_nvidia) == 0)
+    return 0;
+#endif
+
+  acc_init (devtype);
+
+  acc_shutdown (devtype);
+
+  acc_shutdown (devtype);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: no device initialized" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-20.c b/libgomp/testsuite/libgomp.oacc-c/lib-20.c
new file mode 100644
index 0000000..b379a8f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-20.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  (void) acc_copyin (h, N);
+
+  acc_copyout (h, N + 1);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[\h+,256\] surounds2 \[\h+,\+257\]" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-21.c b/libgomp/testsuite/libgomp.oacc-c/lib-21.c
new file mode 100644
index 0000000..3a67400
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-21.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  (void) acc_copyin (h, N);
+
+  acc_copyout (h, 0);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[\h+,0\] is not mapped" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-22.c b/libgomp/testsuite/libgomp.oacc-c/lib-22.c
new file mode 100644
index 0000000..2b86da8
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-22.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  (void) acc_copyin (h, N);
+
+  acc_copyout (h + 1, N - 1);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[\h+,256\] surrounds2 \[\h+,\+255\]" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-23.c b/libgomp/testsuite/libgomp.oacc-c/lib-23.c
new file mode 100644
index 0000000..38f236d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-23.c
@@ -0,0 +1,39 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h1, *h2;
+
+  h1 = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h1[i] = 0xab;
+    }
+
+  (void) acc_copyin (h1, N);
+
+  h2 = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h2[i] = 0xde;
+    }
+
+  (void) acc_copyin (h2, N);
+
+  acc_copyout (h1, N + N);
+
+  free (h1);
+  free (h2);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[\h+,256\] surrounds2 \[\h+,\+512\]" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-24.c b/libgomp/testsuite/libgomp.oacc-c/lib-24.c
new file mode 100644
index 0000000..d7de8e3
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-24.c
@@ -0,0 +1,55 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  d = acc_create (h, N);
+  if (!d)
+    abort ();
+
+  for (i = 0; i < N; i++)
+    {
+      if (acc_is_present (h + i, 1) != 1)
+	abort ();
+    }
+
+  acc_delete (h, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (acc_is_present (h + i, 1) != 0)
+	abort ();
+    }
+
+  d = acc_create (h, N);
+  if (!d)
+    abort ();
+
+  for (i = 0; i < N; i++)
+    {
+      if (acc_is_present (h + i, 1) != 1)
+	abort ();
+    }
+
+  acc_delete (h, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (acc_is_present (h + i, 1) != 0)
+	abort ();
+    }
+
+  free (h);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-25.c b/libgomp/testsuite/libgomp.oacc-c/lib-25.c
new file mode 100644
index 0000000..1145828
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-25.c
@@ -0,0 +1,30 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  d = acc_create (h, N);
+  if (!d)
+    abort ();
+
+  d = acc_create (h, N);
+  if (!d)
+    abort ();
+
+  acc_delete (h, N);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[\h+,256\] already mapped to \[\h+,256\]" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-26.c b/libgomp/testsuite/libgomp.oacc-c/lib-26.c
new file mode 100644
index 0000000..a23f56e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-26.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  d = acc_create (h, 0);
+  if (!d)
+    abort ();
+
+  acc_delete (h, N);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[\h+,\+0\] is a bad range" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-27.c b/libgomp/testsuite/libgomp.oacc-c/lib-27.c
new file mode 100644
index 0000000..074fddb
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-27.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  d = acc_create (0, N);
+  if (!d)
+    abort ();
+
+  acc_delete (h, N);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[\(nil\)\] is a bad range" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-28.c b/libgomp/testsuite/libgomp.oacc-c/lib-28.c
new file mode 100644
index 0000000..027f7cc
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-28.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  d = acc_create (h, N);
+  if (!d)
+    abort ();
+
+  acc_delete (0, N);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[\(nil\),256\] is not mapped" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-29.c b/libgomp/testsuite/libgomp.oacc-c/lib-29.c
new file mode 100644
index 0000000..a66de0f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-29.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  d = acc_create (h, N);
+  if (!d)
+    abort ();
+
+  acc_delete (h, 0);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[\h+,0\] is not mapped" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-3.c b/libgomp/testsuite/libgomp.oacc-c/lib-3.c
new file mode 100644
index 0000000..e823a41
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-3.c
@@ -0,0 +1,15 @@
+/* { dg-do run } */
+
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  acc_init (acc_device_host);
+
+  acc_shutdown (acc_device_not_host);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: device 4(4) is initialized" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-30.c b/libgomp/testsuite/libgomp.oacc-c/lib-30.c
new file mode 100644
index 0000000..ce2bdb4
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-30.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  d = acc_create (h, N);
+  if (!d)
+    abort ();
+
+  acc_delete (h, N - 2);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[\h+,256\] surrounds2 \[\h+,\+254\]" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-31.c b/libgomp/testsuite/libgomp.oacc-c/lib-31.c
new file mode 100644
index 0000000..25ce5a9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-31.c
@@ -0,0 +1,27 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  d = acc_present_or_create (h, N);
+  if (!d)
+    abort ();
+
+  if (acc_is_present (h, 1) != 1)
+    abort ();
+
+  acc_delete (h, N);
+
+  free (h);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-32.c b/libgomp/testsuite/libgomp.oacc-c/lib-32.c
new file mode 100644
index 0000000..e3f87a8
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-32.c
@@ -0,0 +1,38 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  unsigned char *h;
+  void *d1, *d2;
+
+  h = (unsigned char *) malloc (N);
+
+  d1 = acc_present_or_create (h, N);
+  if (!d1)
+    abort ();
+
+  d2 = acc_present_or_create (h, N);
+  if (!d2)
+    abort ();
+
+  if (d1 != d2)
+    abort ();
+
+  d2 = acc_pcreate (h, N);
+  if (!d2)
+    abort ();
+
+  if (d1 != d2)
+    abort ();
+
+  acc_delete (h, N);
+
+  free (h);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-33.c b/libgomp/testsuite/libgomp.oacc-c/lib-33.c
new file mode 100644
index 0000000..4abaa02
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-33.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  unsigned char *h;
+  void *d1, *d2;
+
+  h = (unsigned char *) malloc (N);
+
+  d1 = acc_present_or_create (h, N);
+  if (!d1)
+    abort ();
+
+  d2 = acc_present_or_create (h, N - 2);
+  if (!d2)
+    abort ();
+
+  if (d1 != d2)
+    abort ();
+
+  acc_delete (h, N);
+
+  free (h);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-34.c b/libgomp/testsuite/libgomp.oacc-c/lib-34.c
new file mode 100644
index 0000000..32d5d51
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-34.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  unsigned char *h;
+  void *d1, *d2;
+
+  h = (unsigned char *) malloc (N);
+
+  d1 = acc_present_or_create (h, N);
+  if (!d1)
+    abort ();
+
+  d2 = acc_present_or_create (h + 2, N);
+  if (!d2)
+    abort ();
+
+  if (d1 != d2)
+    abort ();
+
+  acc_delete (h, N);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[\h+,\+256\] not mapped" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-35.c b/libgomp/testsuite/libgomp.oacc-c/lib-35.c
new file mode 100644
index 0000000..ca8edab
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-35.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  d = acc_present_or_create (0, N);
+  if (!d)
+    abort ();
+
+  acc_delete (h, N);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[\(nil\),+256\] is a bad range" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-36.c b/libgomp/testsuite/libgomp.oacc-c/lib-36.c
new file mode 100644
index 0000000..cb29397
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-36.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  d = acc_present_or_create (h, 0);
+  if (!d)
+    abort ();
+
+  acc_delete (h, N);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[\h+,\+0\] is a bad range" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-37.c b/libgomp/testsuite/libgomp.oacc-c/lib-37.c
new file mode 100644
index 0000000..5a7d533
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-37.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include <string.h>
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  d = acc_present_or_copyin (h, N);
+  if (!d)
+    abort ();
+
+  memset (&h[0], 0, N);
+
+  acc_copyout (h, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (h[i] != i)
+	abort ();
+    }
+
+  free (h);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-38.c b/libgomp/testsuite/libgomp.oacc-c/lib-38.c
new file mode 100644
index 0000000..1e16a1d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-38.c
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include <string.h>
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+  void *d1, *d2;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  d1 = acc_present_or_copyin (h, N);
+  if (!d1)
+    abort ();
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = 0xab;
+    }
+
+  d2 = acc_present_or_copyin (h, N);
+  if (!d2)
+    abort ();
+
+  if (d1 != d2)
+    abort ();
+
+  memset (&h[0], 0, N);
+
+  acc_copyout (h, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (h[i] != i)
+	abort ();
+    }
+
+  d2 = acc_pcopyin (h, N);
+  if (!d2)
+    abort ();
+
+  if (d1 != d2)
+    abort ();
+
+  acc_copyout (h, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (h[i] != i)
+	abort ();
+    }
+
+  free (h);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-39.c b/libgomp/testsuite/libgomp.oacc-c/lib-39.c
new file mode 100644
index 0000000..db1e0b3
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-39.c
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+
+#include <string.h>
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  d = acc_present_or_copyin (0, N);
+  if (!d)
+    abort ();
+
+  memset (&h[0], 0, N);
+
+  acc_copyout (h, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (h[i] != i)
+	abort ();
+    }
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[\(nil\),+256\] is a bad range" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-4.c b/libgomp/testsuite/libgomp.oacc-c/lib-4.c
new file mode 100644
index 0000000..060275b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-4.c
@@ -0,0 +1,13 @@
+/* { dg-do run } */
+
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  acc_init ((acc_device_t) 99);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: device 99 is out of range" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-40.c b/libgomp/testsuite/libgomp.oacc-c/lib-40.c
new file mode 100644
index 0000000..cb6c422
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-40.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  d = acc_present_or_copyin (h, 0);
+  if (!d)
+    abort ();
+
+  memset (&h[0], 0, N);
+
+  acc_copyout (h, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (h[i] != i)
+	abort ();
+    }
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[\h+,\+0\] is a bad range" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-41.c b/libgomp/testsuite/libgomp.oacc-c/lib-41.c
new file mode 100644
index 0000000..01c5f3c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-41.c
@@ -0,0 +1,43 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  d = acc_copyin (h, N);
+  if (!d)
+    abort ();
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = 0xab;
+    }
+
+  acc_update_device (h, N);
+
+  acc_copyout (h, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (h[i] != 0xab)
+	abort ();
+    }
+
+  free (h);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-42.c b/libgomp/testsuite/libgomp.oacc-c/lib-42.c
new file mode 100644
index 0000000..d577fe3
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-42.c
@@ -0,0 +1,35 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  acc_update_device (h, N);
+
+  acc_copyout (h, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (h[i] != 0xab)
+	abort ();
+    }
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[\h+,256\] is not mapped" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-43.c b/libgomp/testsuite/libgomp.oacc-c/lib-43.c
new file mode 100644
index 0000000..ceeb155
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-43.c
@@ -0,0 +1,45 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  d = acc_copyin (h, N);
+  if (!d)
+    abort ();
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = 0xab;
+    }
+
+  acc_update_device (0, N);
+
+  acc_copyout (h, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (h[i] != 0xab)
+	abort ();
+    }
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[\(nil\),256\] is not mapped" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-44.c b/libgomp/testsuite/libgomp.oacc-c/lib-44.c
new file mode 100644
index 0000000..0cabb0d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-44.c
@@ -0,0 +1,45 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  d = acc_copyin (h, N);
+  if (!d)
+    abort ();
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = 0xab;
+    }
+
+  acc_update_device (h, 0);
+
+  acc_copyout (h, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (h[i] != 0xab)
+	abort ();
+    }
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[\h+,0\] is not mapped" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-45.c b/libgomp/testsuite/libgomp.oacc-c/lib-45.c
new file mode 100644
index 0000000..f9a6294
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-45.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  d = acc_copyin (h, N);
+  if (!d)
+    abort ();
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = 0xab;
+    }
+
+  acc_update_device (h, N - 2);
+
+  acc_copyout (h, N);
+
+  for (i = 0; i < N - 2; i++)
+    {
+      if (h[i] != 0xab)
+	abort ();
+    }
+
+  for (i = N - 2; i < N; i++)
+    {
+      if (h[i] != i)
+	abort ();
+    }
+
+  free (h);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-46.c b/libgomp/testsuite/libgomp.oacc-c/lib-46.c
new file mode 100644
index 0000000..b195725
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-46.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include <string.h>
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  d = acc_copyin (h, N);
+  if (!d)
+    abort ();
+
+  memset (&h[0], 0, N);
+
+  acc_update_self (h, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (h[i] != i)
+	abort ();
+    }
+
+  acc_delete (h, N);
+
+  free (h);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-47.c b/libgomp/testsuite/libgomp.oacc-c/lib-47.c
new file mode 100644
index 0000000..a7ff904
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-47.c
@@ -0,0 +1,43 @@
+/* { dg-do run } */
+
+#include <string.h>
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  d = acc_copyin (h, N);
+  if (!d)
+    abort ();
+
+  memset (&h[0], 0, N);
+
+  acc_update_self (0, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (h[i] != i)
+	abort ();
+    }
+
+  acc_delete (h, N);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[\(nil\),256\] is not mapped" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-48.c b/libgomp/testsuite/libgomp.oacc-c/lib-48.c
new file mode 100644
index 0000000..01d3c6c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-48.c
@@ -0,0 +1,43 @@
+/* { dg-do run } */
+
+#include <string.h>
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  d = acc_copyin (h, N);
+  if (!d)
+    abort ();
+
+  memset (&h[0], 0, N);
+
+  acc_update_self (h, 0);
+
+  for (i = 0; i < N; i++)
+    {
+      if (h[i] != i)
+	abort ();
+    }
+
+  acc_delete (h, N);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[\h+,0\] is not mapped" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-49.c b/libgomp/testsuite/libgomp.oacc-c/lib-49.c
new file mode 100644
index 0000000..a33324c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-49.c
@@ -0,0 +1,48 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include <string.h>
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  d = acc_copyin (h, N);
+  if (!d)
+    abort ();
+
+  memset (&h[0], 0, N);
+
+  acc_update_self (h, N - 2);
+
+  for (i = 0; i < N - 2; i++)
+    {
+      if (h[i] != i)
+	abort ();
+    }
+
+  for (i = N - 2; i < N; i++)
+    {
+      if (h[i] != 0)
+	abort ();
+    }
+
+  acc_delete (h, N);
+
+  free (h);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-5.c b/libgomp/testsuite/libgomp.oacc-c/lib-5.c
new file mode 100644
index 0000000..961a62c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-5.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  if (acc_get_device_type () == acc_device_default)
+    abort ();
+
+  acc_init (acc_device_default);
+
+  if (acc_get_device_type () == acc_device_default)
+    abort ();
+
+  acc_shutdown (acc_device_default);
+
+  if (acc_get_num_devices (acc_device_nvidia) != 0)
+    {
+      acc_init (acc_device_nvidia);
+
+      if (acc_get_device_type () != acc_device_nvidia)
+        abort ();
+
+      acc_shutdown (acc_device_nvidia);
+
+      acc_init (acc_device_default);
+
+      acc_set_device_type (acc_device_nvidia);
+
+      if (acc_get_device_type () != acc_device_nvidia)
+        abort ();
+
+      acc_shutdown (acc_device_nvidia);
+    }
+
+  return 0;
+
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-50.c b/libgomp/testsuite/libgomp.oacc-c/lib-50.c
new file mode 100644
index 0000000..e8294e1
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-50.c
@@ -0,0 +1,30 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  d = acc_malloc (N);
+
+  acc_map_data (h, d, N);
+
+  if (acc_is_present (h, N) != 1)
+    abort ();
+
+  acc_unmap_data (h);
+
+  acc_free (d);
+
+  free (h);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-51.c b/libgomp/testsuite/libgomp.oacc-c/lib-51.c
new file mode 100644
index 0000000..29d28f2
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-51.c
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h[N];
+  void *d[N];
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = (unsigned char *) malloc (N);
+      d[i] = acc_malloc (N);
+
+      acc_map_data (h[i], d[i], N);
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      if (acc_is_present (h[i], N) != 1)
+	abort ();
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      acc_unmap_data (h[i]);
+
+      if (acc_is_present (h[i], N) != 0)
+	abort ();
+
+      acc_free (d[i]);
+      free (h[i]);
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-52.c b/libgomp/testsuite/libgomp.oacc-c/lib-52.c
new file mode 100644
index 0000000..780db31
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-52.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  d = acc_malloc (N);
+
+  acc_map_data (0, d, N);
+
+  acc_unmap_data (h);
+
+  acc_free (d);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[(nil),+256\]->\[\h+,\+256\] is a bad map" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-53.c b/libgomp/testsuite/libgomp.oacc-c/lib-53.c
new file mode 100644
index 0000000..657adde
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-53.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  d = acc_malloc (N);
+
+  acc_map_data (h, 0, N);
+
+  acc_unmap_data (h);
+
+  acc_free (d);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[\h+,\+256\]->\[(nil),\+256\] is a bad map" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-54.c b/libgomp/testsuite/libgomp.oacc-c/lib-54.c
new file mode 100644
index 0000000..1f3df80
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-54.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  d = acc_malloc (N);
+
+  acc_map_data (h, d, 0);
+
+  acc_unmap_data (h);
+
+  acc_free (d);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \[\h+,\+0\]->\[\h+,\+0\] is a bad map" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-55.c b/libgomp/testsuite/libgomp.oacc-c/lib-55.c
new file mode 100644
index 0000000..286653f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-55.c
@@ -0,0 +1,48 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include <stdlib.h>
+#include <openacc.h>
+#include <stdint.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  unsigned char *h;
+  int i;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  d = acc_malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      acc_map_data ((void *)((uintptr_t) h + (uintptr_t) i),
+                    				(void *)((uintptr_t) d + (uintptr_t) i), 1);
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      if (acc_is_present (h + 1, 1) != 1)
+	abort ();
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      acc_unmap_data (h + i);
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      if (acc_is_present (h + 1, 1) != 0)
+	abort ();
+    }
+
+  acc_free (d);
+
+  free (h);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-56.c b/libgomp/testsuite/libgomp.oacc-c/lib-56.c
new file mode 100644
index 0000000..e3f5a80
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-56.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  d = acc_malloc (N);
+
+  acc_map_data (h, d, N >> 1);
+
+  if (acc_is_present (h, 1) != 1)
+    abort ();
+
+  if (acc_is_present (h + (N >> 1), 1) != 0)
+    abort ();
+
+  acc_unmap_data (h);
+
+  acc_free (d);
+
+  free (h);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-57.c b/libgomp/testsuite/libgomp.oacc-c/lib-57.c
new file mode 100644
index 0000000..f9043a4
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-57.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  d = acc_malloc (N);
+
+  acc_map_data (h, d, N);
+
+  acc_unmap_data (d);
+
+  acc_free (d);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \h+ is not a mapped block" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-58.c b/libgomp/testsuite/libgomp.oacc-c/lib-58.c
new file mode 100644
index 0000000..9d6e27d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-58.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  d = acc_malloc (N);
+
+  acc_map_data (h, d, N);
+
+  acc_unmap_data (0);
+
+  acc_free (d);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: \(nil\) is not a mapped block" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-59.c b/libgomp/testsuite/libgomp.oacc-c/lib-59.c
new file mode 100644
index 0000000..2f087ae
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-59.c
@@ -0,0 +1,55 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include <stdlib.h>
+#include <openacc.h>
+#include <stdint.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  d = acc_malloc (N);
+
+  acc_map_data (h, d, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (acc_hostptr ((void *)((uintptr_t) d + (uintptr_t) i)) !=
+                            (void *)((uintptr_t) h + (uintptr_t) i))
+	abort ();
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      if (acc_deviceptr ((void *)((uintptr_t) h + (uintptr_t) i)) !=
+                            (void *)((uintptr_t) d + (uintptr_t) i))
+	abort ();
+    }
+
+  acc_unmap_data (h);
+
+  for (i = 0; i < N; i++)
+    {
+      if (acc_hostptr ((void *)((uintptr_t) d + (uintptr_t) i)) != 0)
+	abort ();
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      if (acc_deviceptr (h + i) != 0)
+	abort ();
+    }
+
+  acc_free (d);
+
+  free (h);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-6.c b/libgomp/testsuite/libgomp.oacc-c/lib-6.c
new file mode 100644
index 0000000..afdd480
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-6.c
@@ -0,0 +1,39 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  int devnum;
+
+  if (acc_get_device_type () == acc_device_default)
+    abort ();
+
+  if (acc_get_num_devices (acc_device_nvidia) == 0)
+    return 0;
+
+  acc_set_device_type (acc_device_nvidia);
+
+  if (acc_get_device_type () != acc_device_nvidia)
+    abort ();
+
+  acc_shutdown (acc_device_nvidia);
+
+  acc_set_device_type (acc_device_nvidia);
+
+  if (acc_get_device_type () != acc_device_nvidia)
+    abort ();
+
+  devnum = acc_get_num_devices (acc_device_host);
+  if (devnum != 1)
+    abort ();
+
+  acc_shutdown (acc_device_nvidia);
+
+  if (acc_get_device_type () == acc_device_default)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-60.c b/libgomp/testsuite/libgomp.oacc-c/lib-60.c
new file mode 100644
index 0000000..ccae728
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-60.c
@@ -0,0 +1,54 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include <string.h>
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  d = acc_malloc (N);
+
+  acc_memcpy_to_device (d, h, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (acc_is_present (h + i, 1) != 0)
+	abort ();
+    }
+
+  memset (&h[0], 0, N);
+
+  acc_memcpy_from_device (h, d, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (h[i] != i)
+	abort ();
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      if (acc_is_present (h + i, 1) != 0)
+	abort ();
+    }
+
+  acc_free (d);
+
+  free (h);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-61.c b/libgomp/testsuite/libgomp.oacc-c/lib-61.c
new file mode 100644
index 0000000..ce66ced
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-61.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include <string.h>
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h[N];
+  void *d[N];
+
+  for (i = 0; i < N; i++)
+    {
+      int j;
+      unsigned char *p;
+
+      h[i] = (unsigned char *) malloc (N);
+
+      p = h[i];
+
+      for (j = 0; j < N; j++)
+	{
+	  p[j] = i;
+	}
+
+      d[i] = acc_malloc (N);
+
+      acc_memcpy_to_device (d[i], h[i], N);
+
+      for (j = 0; j < N; j++)
+	{
+	  if (acc_is_present (h[i] + j, 1) != 0)
+	    abort ();
+	}
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      int j;
+      unsigned char *p;
+
+      memset (h[i], 0, N);
+
+      acc_memcpy_from_device (h[i], d[i], N);
+
+      p = h[i];
+
+      for (j = 0; j < N; j++)
+	{
+	  if (p[j] != i)
+	    abort ();
+	}
+
+      for (j = 0; j < N; j++)
+	{
+	  if (acc_is_present (h[i] + j, 1) != 0)
+	    abort ();
+	}
+
+      acc_free (d[i]);
+
+      free (h[i]);
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-62.c b/libgomp/testsuite/libgomp.oacc-c/lib-62.c
new file mode 100644
index 0000000..e6178e2
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-62.c
@@ -0,0 +1,49 @@
+/* { dg-do run } */
+
+#include <string.h>
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+  void *d;
+
+  acc_init (acc_device_nvidia);
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  d = acc_malloc (N);
+
+  acc_memcpy_to_device (d, h, N);
+
+  memset (&h[0], 0, N);
+
+  acc_memcpy_to_device (d, h, N << 1);
+
+  acc_memcpy_from_device (h, d, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (h[i] != i)
+	abort ();
+    }
+
+  acc_free (d);
+
+  free (h);
+
+  acc_shutdown (acc_device_nvidia);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: invalid size" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-63.c b/libgomp/testsuite/libgomp.oacc-c/lib-63.c
new file mode 100644
index 0000000..ca237ec
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-63.c
@@ -0,0 +1,43 @@
+/* { dg-do run } */
+
+#include <string.h>
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  d = acc_malloc (N);
+
+  acc_memcpy_to_device (0, h, N);
+
+  memset (&h[0], 0, N);
+
+  acc_memcpy_from_device (h, d, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (h[i] != i)
+	abort ();
+    }
+
+  acc_free (d);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: invalid device address" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-64.c b/libgomp/testsuite/libgomp.oacc-c/lib-64.c
new file mode 100644
index 0000000..850fd2e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-64.c
@@ -0,0 +1,43 @@
+/* { dg-do run } */
+
+#include <string.h>
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  d = acc_malloc (N);
+
+  acc_memcpy_to_device (d, 0, N);
+
+  memset (&h[0], 0, N);
+
+  acc_memcpy_from_device (h, d, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (h[i] != i)
+	abort ();
+    }
+
+  acc_free (d);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: invalid host address" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-65.c b/libgomp/testsuite/libgomp.oacc-c/lib-65.c
new file mode 100644
index 0000000..26c8cef
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-65.c
@@ -0,0 +1,43 @@
+/* { dg-do run } */
+
+#include <string.h>
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  d = acc_malloc (N);
+
+  acc_memcpy_to_device (d, d, N);
+
+  memset (&h[0], 0, N);
+
+  acc_memcpy_from_device (h, d, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (h[i] != i)
+	abort ();
+    }
+
+  acc_free (d);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: invalid host or device address" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-66.c b/libgomp/testsuite/libgomp.oacc-c/lib-66.c
new file mode 100644
index 0000000..360c05b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-66.c
@@ -0,0 +1,47 @@
+/* { dg-do run } */
+
+#include <string.h>
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+  void *d;
+
+  acc_init (acc_device_nvidia);
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  d = acc_malloc (N);
+
+  acc_memcpy_to_device (d, h, N);
+
+  memset (&h[0], 0, N);
+
+  acc_memcpy_to_device (d, h, 0);
+
+  acc_memcpy_from_device (h, d, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (h[i] != i)
+	abort ();
+    }
+
+  acc_free (d);
+
+  free (h);
+
+  acc_shutdown (acc_device_nvidia);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-67.c b/libgomp/testsuite/libgomp.oacc-c/lib-67.c
new file mode 100644
index 0000000..01b8b2d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-67.c
@@ -0,0 +1,43 @@
+/* { dg-do run } */
+
+#include <string.h>
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  d = acc_malloc (N);
+
+  acc_memcpy_to_device (d, h, N);
+
+  memset (&h[0], 0, N);
+
+  acc_memcpy_from_device (0, d, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (h[i] != i)
+	abort ();
+    }
+
+  acc_free (d);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: invalid host address" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-68.c b/libgomp/testsuite/libgomp.oacc-c/lib-68.c
new file mode 100644
index 0000000..3ff5bd7
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-68.c
@@ -0,0 +1,43 @@
+/* { dg-do run } */
+
+#include <string.h>
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 256;
+  int i;
+  unsigned char *h;
+  void *d;
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  d = acc_malloc (N);
+
+  acc_memcpy_to_device (d, h, N);
+
+  memset (&h[0], 0, N);
+
+  acc_memcpy_from_device (h, 0, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (h[i] != i)
+	abort ();
+    }
+
+  acc_free (d);
+
+  free (h);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: invalid device address" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-69.c b/libgomp/testsuite/libgomp.oacc-c/lib-69.c
new file mode 100644
index 0000000..5462f12
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-69.c
@@ -0,0 +1,124 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-lcuda" } */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <openacc.h>
+#include <cuda.h>
+
+int
+main (int argc, char **argv)
+{
+  CUdevice dev;
+  CUfunction delay;
+  CUmodule module;
+  CUresult r;
+  CUstream stream;
+  unsigned long *a, *d_a, dticks;
+  int nbytes;
+  float dtime;
+  void *kargs[2];
+  int clkrate;
+  int devnum, nprocs;
+
+  acc_init (acc_device_nvidia);
+
+  devnum = acc_get_device_num (acc_device_nvidia);
+
+  r = cuDeviceGet (&dev, devnum);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
+      abort ();
+    }
+
+  r =
+    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
+			  dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleLoad (&module, "subr.ptx");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleLoad failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleGetFunction (&delay, module, "delay");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleGetFunction failed: %d\n", r);
+      abort ();
+    }
+
+  nbytes = nprocs * sizeof (unsigned long);
+
+  dtime = 200.0;
+
+  dticks = (unsigned long) (dtime * clkrate);
+
+  a = (unsigned long *) malloc (nbytes);
+  d_a = (unsigned long *) acc_malloc (nbytes);
+
+  acc_map_data (a, d_a, nbytes);
+
+  kargs[0] = (void *) &d_a;
+  kargs[1] = (void *) &dticks;
+
+  stream = (CUstream) acc_get_cuda_stream (0);
+  if (stream != NULL)
+    abort ();
+
+  r = cuStreamCreate (&stream, CU_STREAM_DEFAULT);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+      abort ();
+    }
+
+  if (!acc_set_cuda_stream (0, stream))
+    abort ();
+
+  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
+      abort ();
+    }
+
+  if (acc_async_test (0) != 0)
+    {
+      fprintf (stderr, "asynchronous operation not running\n");
+      abort ();
+    }
+
+  sleep (1);
+
+  if (acc_async_test (0) != 1)
+    {
+      fprintf (stderr, "found asynchronous operation still running\n");
+      abort ();
+    }
+
+  acc_unmap_data (a);
+
+  free (a);
+  acc_free (d_a);
+
+  acc_shutdown (acc_device_nvidia);
+
+  exit (0);
+}
+
+/* { dg-output "" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-7.c b/libgomp/testsuite/libgomp.oacc-c/lib-7.c
new file mode 100644
index 0000000..e78734b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-7.c
@@ -0,0 +1,18 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  if (acc_get_num_devices (acc_device_none) != 0)
+    abort ();
+
+  if (acc_get_num_devices (acc_device_host) == 0)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-output "" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-70.c b/libgomp/testsuite/libgomp.oacc-c/lib-70.c
new file mode 100644
index 0000000..912b266
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-70.c
@@ -0,0 +1,136 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-lcuda" } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <openacc.h>
+#include <cuda.h>
+
+int
+main (int argc, char **argv)
+{
+  CUdevice dev;
+  CUfunction delay;
+  CUmodule module;
+  CUresult r;
+  const int N = 10;
+  int i;
+  CUstream streams[N];
+  unsigned long *a, *d_a, dticks;
+  int nbytes;
+  float dtime;
+  void *kargs[2];
+  int clkrate;
+  int devnum, nprocs;
+
+  acc_init (acc_device_nvidia);
+
+  devnum = acc_get_device_num (acc_device_nvidia);
+
+  r = cuDeviceGet (&dev, devnum);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
+      abort ();
+    }
+
+  r =
+    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
+			  dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleLoad (&module, "subr.ptx");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleLoad failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleGetFunction (&delay, module, "delay");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleGetFunction failed: %d\n", r);
+      abort ();
+    }
+
+  nbytes = nprocs * sizeof (unsigned long);
+
+  dtime = 200.0;
+
+  dticks = (unsigned long) (dtime * clkrate);
+
+  a = (unsigned long *) malloc (nbytes);
+  d_a = (unsigned long *) acc_malloc (nbytes);
+
+  acc_map_data (a, d_a, nbytes);
+
+  kargs[0] = (void *) &d_a;
+  kargs[1] = (void *) &dticks;
+
+  for (i = 0; i < N; i++)
+    {
+      streams[i] = (CUstream) acc_get_cuda_stream (i);
+      if (streams[i] != NULL)
+	abort ();
+
+      r = cuStreamCreate (&streams[i], CU_STREAM_DEFAULT);
+      if (r != CUDA_SUCCESS)
+	{
+	  fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+	  abort ();
+	}
+
+        if (!acc_set_cuda_stream (i, streams[i]))
+	  abort ();
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, streams[i], kargs, 0);
+      if (r != CUDA_SUCCESS)
+	{
+	  fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
+	  abort ();
+	}
+
+      if (acc_async_test (i) != 0)
+	{
+	  fprintf (stderr, "asynchronous operation not running\n");
+	  abort ();
+	}
+    }
+
+  sleep ((int) (dtime / 1000.0f) + 1);
+
+  for (i = 0; i < N; i++)
+    {
+      if (acc_async_test (i) != 1)
+	{
+	  fprintf (stderr, "found asynchronous operation still running\n");
+	  abort ();
+	}
+    }
+
+  acc_unmap_data (a);
+
+  free (a);
+  acc_free (d_a);
+
+  acc_shutdown (acc_device_nvidia);
+
+  exit (0);
+}
+
+/* { dg-output "" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-71.c b/libgomp/testsuite/libgomp.oacc-c/lib-71.c
new file mode 100644
index 0000000..a045379
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-71.c
@@ -0,0 +1,119 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-lcuda" } */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <openacc.h>
+#include <cuda.h>
+
+int
+main (int argc, char **argv)
+{
+  CUdevice dev;
+  CUfunction delay;
+  CUmodule module;
+  CUresult r;
+  CUstream stream;
+  unsigned long *a, *d_a, dticks;
+  int nbytes;
+  float dtime;
+  void *kargs[2];
+  int clkrate;
+  int devnum, nprocs;
+
+  acc_init (acc_device_nvidia);
+
+  devnum = acc_get_device_num (acc_device_nvidia);
+
+  r = cuDeviceGet (&dev, devnum);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
+      abort ();
+    }
+
+  r =
+    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
+			  dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleLoad (&module, "subr.ptx");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleLoad failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleGetFunction (&delay, module, "delay");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleGetFunction failed: %d\n", r);
+      abort ();
+    }
+
+  nbytes = nprocs * sizeof (unsigned long);
+
+  dtime = 200.0;
+
+  dticks = (unsigned long) (dtime * clkrate);
+
+  a = (unsigned long *) malloc (nbytes);
+  d_a = (unsigned long *) acc_malloc (nbytes);
+
+  acc_map_data (a, d_a, nbytes);
+
+  kargs[0] = (void *) &d_a;
+  kargs[1] = (void *) &dticks;
+
+  r = cuStreamCreate (&stream, CU_STREAM_DEFAULT);
+  if (r != CUDA_SUCCESS)
+	{
+	  fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+	  abort ();
+	}
+
+  acc_set_cuda_stream (0, stream);
+
+  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
+      abort ();
+    }
+
+  if (acc_async_test (1) != 0)
+    {
+      fprintf (stderr, "asynchronous operation not running\n");
+      abort ();
+    }
+
+  sleep ((int) (dtime / 1000.0f) + 1);
+
+  if (acc_async_test (1) != 1)
+    {
+      fprintf (stderr, "found asynchronous operation still running\n");
+      abort ();
+    }
+
+  acc_unmap_data (a);
+
+  free (a);
+  acc_free (d_a);
+
+  acc_shutdown (acc_device_nvidia);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: unknown async \d" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-72.c b/libgomp/testsuite/libgomp.oacc-c/lib-72.c
new file mode 100644
index 0000000..e383ba0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-72.c
@@ -0,0 +1,121 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-lcuda" } */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <openacc.h>
+#include <cuda.h>
+
+int
+main (int argc, char **argv)
+{
+  CUdevice dev;
+  CUfunction delay;
+  CUmodule module;
+  CUresult r;
+  CUstream stream;
+  unsigned long *a, *d_a, dticks;
+  int nbytes;
+  float dtime;
+  void *kargs[2];
+  int clkrate;
+  int devnum, nprocs;
+
+  acc_init (acc_device_nvidia);
+
+  devnum = acc_get_device_num (acc_device_nvidia);
+
+  r = cuDeviceGet (&dev, devnum);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
+      abort ();
+    }
+
+  r =
+    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
+			  dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleLoad (&module, "subr.ptx");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleLoad failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleGetFunction (&delay, module, "delay");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleGetFunction failed: %d\n", r);
+      abort ();
+    }
+
+  nbytes = nprocs * sizeof (unsigned long);
+
+  dtime = 200.0;
+
+  dticks = (unsigned long) (dtime * clkrate);
+
+  a = (unsigned long *) malloc (nbytes);
+  d_a = (unsigned long *) acc_malloc (nbytes);
+
+  acc_map_data (a, d_a, nbytes);
+
+  kargs[0] = (void *) &d_a;
+  kargs[1] = (void *) &dticks;
+
+  r = cuStreamCreate (&stream, CU_STREAM_DEFAULT);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+      abort ();
+    }
+
+  if (!acc_set_cuda_stream (0, stream))
+    abort ();
+    
+  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
+      abort ();
+    }
+
+  if (acc_async_test_all () != 0)
+    {
+      fprintf (stderr, "asynchronous operation not running\n");
+      abort ();
+    }
+
+  sleep ((int) (dtime / 1000.f) + 1);
+
+  if (acc_async_test_all () != 1)
+    {
+      fprintf (stderr, "found asynchronous operation still running\n");
+      abort ();
+    }
+
+  acc_unmap_data (a);
+
+  free (a);
+  acc_free (d_a);
+
+  acc_shutdown (acc_device_nvidia);
+
+  exit (0);
+}
+
+/* { dg-output "" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-73.c b/libgomp/testsuite/libgomp.oacc-c/lib-73.c
new file mode 100644
index 0000000..43a8b7e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-73.c
@@ -0,0 +1,134 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-lcuda" } */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <openacc.h>
+#include <cuda.h>
+
+int
+main (int argc, char **argv)
+{
+  CUdevice dev;
+  CUfunction delay;
+  CUmodule module;
+  CUresult r;
+  const int N = 10;
+  int i;
+  CUstream streams[N];
+  unsigned long *a, *d_a, dticks;
+  int nbytes;
+  float dtime;
+  void *kargs[2];
+  int clkrate;
+  int devnum, nprocs;
+
+  acc_init (acc_device_nvidia);
+
+  devnum = acc_get_device_num (acc_device_nvidia);
+
+  r = cuDeviceGet (&dev, devnum);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
+      abort ();
+    }
+
+  r =
+    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
+			  dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleLoad (&module, "subr.ptx");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleLoad failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleGetFunction (&delay, module, "delay");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleGetFunction failed: %d\n", r);
+      abort ();
+    }
+
+  nbytes = nprocs * sizeof (unsigned long);
+
+  dtime = 200.0;
+
+  dticks = (unsigned long) (dtime * clkrate);
+
+  a = (unsigned long *) malloc (nbytes);
+  d_a = (unsigned long *) acc_malloc (nbytes);
+
+  acc_map_data (a, d_a, nbytes);
+
+  kargs[0] = (void *) &d_a;
+  kargs[1] = (void *) &dticks;
+
+  for (i = 0; i < N; i++)
+    {
+      streams[i] = (CUstream) acc_get_cuda_stream (i);
+      if (streams[i] != NULL)
+	abort ();
+
+      r = cuStreamCreate (&streams[i], CU_STREAM_DEFAULT);
+      if (r != CUDA_SUCCESS)
+	{
+	  fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+	  abort ();
+	}
+
+        if (!acc_set_cuda_stream (i, streams[i]))
+	  abort ();
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, streams[i], kargs, 0);
+      if (r != CUDA_SUCCESS)
+	{
+	  fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
+	  abort ();
+	}
+
+    }
+
+  if (acc_async_test_all () != 0)
+    {
+      fprintf (stderr, "asynchronous operation not running\n");
+      abort ();
+    }
+
+  sleep ((int) (dtime / 1000.0f) + 1);
+
+  if (acc_async_test_all () != 1)
+    {
+      fprintf (stderr, "asynchronous operation not running\n");
+      abort ();
+    }
+
+  acc_unmap_data (a);
+
+  free (a);
+  acc_free (d_a);
+
+  acc_shutdown (acc_device_nvidia);
+
+  exit (0);
+}
+
+/* { dg-output "" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-74.c b/libgomp/testsuite/libgomp.oacc-c/lib-74.c
new file mode 100644
index 0000000..0726ee4
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-74.c
@@ -0,0 +1,139 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-lcuda" } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <openacc.h>
+#include <cuda.h>
+#include "timer.h"
+
+int
+main (int argc, char **argv)
+{
+  CUdevice dev;
+  CUfunction delay;
+  CUmodule module;
+  CUresult r;
+  CUstream stream;
+  unsigned long *a, *d_a, dticks;
+  int nbytes;
+  float atime, dtime;
+  void *kargs[2];
+  int clkrate;
+  int devnum, nprocs;
+
+  acc_init (acc_device_nvidia);
+
+  devnum = acc_get_device_num (acc_device_nvidia);
+
+  r = cuDeviceGet (&dev, devnum);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
+      abort ();
+    }
+
+  r =
+    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
+			  dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleLoad (&module, "subr.ptx");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleLoad failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleGetFunction (&delay, module, "delay");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleGetFunction failed: %d\n", r);
+      abort ();
+    }
+
+  nbytes = nprocs * sizeof (unsigned long);
+
+  dtime = 200.0;
+
+  dticks = (unsigned long) (dtime * clkrate);
+
+  a = (unsigned long *) malloc (nbytes);
+  d_a = (unsigned long *) acc_malloc (nbytes);
+
+  acc_map_data (a, d_a, nbytes);
+
+  kargs[0] = (void *) &d_a;
+  kargs[1] = (void *) &dticks;
+
+  stream = (CUstream) acc_get_cuda_stream (0);
+  if (stream != NULL)
+    abort ();
+
+  r = cuStreamCreate (&stream, CU_STREAM_DEFAULT);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+      abort ();
+    }
+
+  if (!acc_set_cuda_stream (0, stream))
+    abort ();
+
+  init_timers (1);
+
+  start_timer (0);
+
+  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
+      abort ();
+    }
+
+  acc_wait (0);
+
+  atime = stop_timer (0);
+
+  if (atime < dtime)
+    {
+      fprintf (stderr, "actual time < delay time\n");
+      abort ();
+    }
+
+  start_timer (0);
+
+  acc_wait (0);
+
+  atime = stop_timer (0);
+
+  if (0.010 < atime)
+    {
+      fprintf (stderr, "actual time too long\n");
+      abort ();
+    }
+
+  acc_unmap_data (a);
+
+  fini_timers ();
+
+  free (a);
+  acc_free (d_a);
+
+  acc_shutdown (acc_device_nvidia);
+
+  exit (0);
+}
+
+/* { dg-output "" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-75.c b/libgomp/testsuite/libgomp.oacc-c/lib-75.c
new file mode 100644
index 0000000..1942211
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-75.c
@@ -0,0 +1,141 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-lcuda" } */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <openacc.h>
+#include <cuda.h>
+#include "timer.h"
+
+int
+main (int argc, char **argv)
+{
+  CUdevice dev;
+  CUfunction delay;
+  CUmodule module;
+  CUresult r;
+  int N;
+  int i;
+  CUstream stream;
+  unsigned long *a, *d_a, dticks;
+  int nbytes;
+  float atime, dtime, hitime, lotime;
+  void *kargs[2];
+  int clkrate;
+  int devnum, nprocs;
+
+  acc_init (acc_device_nvidia);
+
+  devnum = acc_get_device_num (acc_device_nvidia);
+
+  r = cuDeviceGet (&dev, devnum);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
+      abort ();
+    }
+
+  r =
+    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
+			  dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleLoad (&module, "subr.ptx");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleLoad failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleGetFunction (&delay, module, "delay");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleGetFunction failed: %d\n", r);
+      abort ();
+    }
+
+  nbytes = nprocs * sizeof (unsigned long);
+
+  dtime = 200.0;
+
+  dticks = (unsigned long) (dtime * clkrate);
+
+  N = nprocs;
+
+  a = (unsigned long *) malloc (nbytes);
+  d_a = (unsigned long *) acc_malloc (nbytes);
+
+  acc_map_data (a, d_a, nbytes);
+
+  stream = (CUstream) acc_get_cuda_stream (0);
+  if (stream != NULL)
+    abort ();
+
+  r = cuStreamCreate (&stream, CU_STREAM_DEFAULT);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+      abort ();
+    }
+
+  if (!acc_set_cuda_stream (0, stream))
+    abort ();
+
+  init_timers (1);
+
+  kargs[0] = (void *) &d_a;
+  kargs[1] = (void *) &dticks;
+
+  start_timer (0);
+
+  for (i = 0; i < N; i++)
+    {
+      r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0);
+      if (r != CUDA_SUCCESS)
+	{
+	  fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
+	  abort ();
+	}
+
+      acc_wait (0);
+    }
+
+  atime = stop_timer (0);
+
+  hitime = dtime * N;
+  hitime += hitime * 0.02;
+
+  lotime = dtime * N;
+  lotime -= lotime * 0.02;
+
+  if (atime > hitime || atime < lotime)
+    {
+      fprintf (stderr, "actual time < delay time\n");
+      abort ();
+    }
+
+  acc_unmap_data (a);
+
+  fini_timers ();
+
+  free (a);
+  acc_free (d_a);
+
+  acc_shutdown (acc_device_nvidia);
+
+  exit (0);
+}
+
+/* { dg-output "" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-76.c b/libgomp/testsuite/libgomp.oacc-c/lib-76.c
new file mode 100644
index 0000000..11d9d62
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-76.c
@@ -0,0 +1,147 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-lcuda" } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <openacc.h>
+#include <cuda.h>
+#include "timer.h"
+
+int
+main (int argc, char **argv)
+{
+  CUdevice dev;
+  CUfunction delay;
+  CUmodule module;
+  CUresult r;
+  int N;
+  int i;
+  CUstream *streams;
+  unsigned long *a, *d_a, dticks;
+  int nbytes;
+  float atime, dtime, hitime, lotime;
+  void *kargs[2];
+  int clkrate;
+  int devnum, nprocs;
+
+  acc_init (acc_device_nvidia);
+
+  devnum = acc_get_device_num (acc_device_nvidia);
+
+  r = cuDeviceGet (&dev, devnum);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
+      abort ();
+    }
+
+  r =
+    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
+			  dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleLoad (&module, "subr.ptx");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleLoad failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleGetFunction (&delay, module, "delay");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleGetFunction failed: %d\n", r);
+      abort ();
+    }
+
+  nbytes = nprocs * sizeof (unsigned long);
+
+  dtime = 200.0;
+
+  dticks = (unsigned long) (dtime * clkrate);
+
+  N = nprocs;
+
+  a = (unsigned long *) malloc (nbytes);
+  d_a = (unsigned long *) acc_malloc (nbytes);
+
+  acc_map_data (a, d_a, nbytes);
+
+  streams = (CUstream *) malloc (N * sizeof (void *));
+
+  for (i = 0; i < N; i++)
+    {
+      streams[i] = (CUstream) acc_get_cuda_stream (i);
+      if (streams[i] != NULL)
+	abort ();
+
+      r = cuStreamCreate (&streams[i], CU_STREAM_DEFAULT);
+      if (r != CUDA_SUCCESS)
+	{
+	  fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+	  abort ();
+	}
+
+        if (!acc_set_cuda_stream (i, streams[i]))
+	  abort ();
+    }
+
+  init_timers (1);
+
+  kargs[0] = (void *) &d_a;
+  kargs[1] = (void *) &dticks;
+
+  start_timer (0);
+
+  for (i = 0; i < N; i++)
+    {
+      r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, streams[i], kargs, 0);
+      if (r != CUDA_SUCCESS)
+	{
+	  fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
+	  abort ();
+	}
+
+      acc_wait (i);
+    }
+
+  atime = stop_timer (0);
+
+  hitime = dtime * N;
+  hitime += hitime * 0.02;
+
+  lotime = dtime * N;
+  lotime -= lotime * 0.02;
+
+  if (atime > hitime || atime < lotime)
+    {
+      fprintf (stderr, "actual time < delay time\n");
+      abort ();
+    }
+
+  acc_unmap_data (a);
+
+  fini_timers ();
+
+  free (streams);
+  free (a);
+  acc_free (d_a);
+
+  acc_shutdown (acc_device_nvidia);
+
+  exit (0);
+}
+
+/* { dg-output "" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-77.c b/libgomp/testsuite/libgomp.oacc-c/lib-77.c
new file mode 100644
index 0000000..e47212b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-77.c
@@ -0,0 +1,135 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-lcuda" } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <openacc.h>
+#include <cuda.h>
+#include "timer.h"
+
+int
+main (int argc, char **argv)
+{
+  CUdevice dev;
+  CUfunction delay;
+  CUmodule module;
+  CUresult r;
+  CUstream stream;
+  unsigned long *a, *d_a, dticks;
+  int nbytes;
+  float atime, dtime;
+  void *kargs[2];
+  int clkrate;
+  int devnum, nprocs;
+
+  acc_init (acc_device_nvidia);
+
+  devnum = acc_get_device_num (acc_device_nvidia);
+
+  r = cuDeviceGet (&dev, devnum);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
+      abort ();
+    }
+
+  r =
+    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
+			  dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleLoad (&module, "subr.ptx");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleLoad failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleGetFunction (&delay, module, "delay");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleGetFunction failed: %d\n", r);
+      abort ();
+    }
+
+  nbytes = nprocs * sizeof (unsigned long);
+
+  dtime = 200.0;
+
+  dticks = (unsigned long) (dtime * clkrate);
+
+  a = (unsigned long *) malloc (nbytes);
+  d_a = (unsigned long *) acc_malloc (nbytes);
+
+  acc_map_data (a, d_a, nbytes);
+
+  kargs[0] = (void *) &d_a;
+  kargs[1] = (void *) &dticks;
+
+  r = cuStreamCreate (&stream, CU_STREAM_DEFAULT);
+  if (r != CUDA_SUCCESS)
+	{
+	  fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+	  abort ();
+	}
+
+  acc_set_cuda_stream (0, stream);
+
+  init_timers (1);
+
+  start_timer (0);
+
+  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
+      abort ();
+    }
+
+  acc_wait (1);
+
+  atime = stop_timer (0);
+
+  if (atime < dtime)
+    {
+      fprintf (stderr, "actual time < delay time\n");
+      abort ();
+    }
+
+  start_timer (0);
+
+  acc_wait (1);
+
+  atime = stop_timer (0);
+
+  if (0.010 < atime)
+    {
+      fprintf (stderr, "actual time < delay time\n");
+      abort ();
+    }
+
+  acc_unmap_data (a);
+
+  fini_timers ();
+
+  free (a);
+  acc_free (d_a);
+
+  acc_shutdown (acc_device_nvidia);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: unknown async \d" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-78.c b/libgomp/testsuite/libgomp.oacc-c/lib-78.c
new file mode 100644
index 0000000..4f58fb2
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-78.c
@@ -0,0 +1,140 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-lcuda" } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <openacc.h>
+#include <cuda.h>
+#include "timer.h"
+
+int
+main (int argc, char **argv)
+{
+  CUdevice dev;
+  CUfunction delay;
+  CUmodule module;
+  CUresult r;
+  CUstream stream;
+  unsigned long *a, *d_a, dticks;
+  int nbytes;
+  float atime, dtime;
+  void *kargs[2];
+  int clkrate;
+  int devnum, nprocs;
+
+  acc_init (acc_device_nvidia);
+
+  devnum = acc_get_device_num (acc_device_nvidia);
+
+  r = cuDeviceGet (&dev, devnum);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
+      abort ();
+    }
+
+  r =
+    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
+			  dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleLoad (&module, "subr.ptx");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleLoad failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleGetFunction (&delay, module, "delay");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleGetFunction failed: %d\n", r);
+      abort ();
+    }
+
+  nbytes = nprocs * sizeof (unsigned long);
+
+  dtime = 200.0;
+
+  dticks = (unsigned long) (dtime * clkrate);
+
+  a = (unsigned long *) malloc (nbytes);
+  d_a = (unsigned long *) acc_malloc (nbytes);
+
+  acc_map_data (a, d_a, nbytes);
+
+  kargs[0] = (void *) &d_a;
+  kargs[1] = (void *) &dticks;
+
+  stream = (CUstream) acc_get_cuda_stream (0);
+  if (stream != NULL)
+    abort ();
+
+  r = cuStreamCreate (&stream, CU_STREAM_DEFAULT);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+      abort ();
+    }
+
+  if (!acc_set_cuda_stream (0, stream))
+    abort ();
+
+  init_timers (1);
+
+  start_timer (0);
+
+  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
+      abort ();
+    }
+
+  acc_wait_all ();
+
+  atime = stop_timer (0);
+
+  if (atime < dtime)
+    {
+      fprintf (stderr, "actual time < delay time\n");
+      abort ();
+    }
+
+  start_timer (0);
+
+  acc_wait_all ();
+
+  atime = stop_timer (0);
+
+  if (0.010 < atime)
+    {
+      fprintf (stderr, "actual time too long\n");
+      abort ();
+    }
+
+  acc_unmap_data (a);
+
+  fini_timers ();
+
+  free (a);
+  acc_free (d_a);
+
+  acc_shutdown (acc_device_nvidia);
+
+  exit (0);
+}
+
+/* { dg-output "" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-79.c b/libgomp/testsuite/libgomp.oacc-c/lib-79.c
new file mode 100644
index 0000000..ef3df13
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-79.c
@@ -0,0 +1,167 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-lcuda" } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <openacc.h>
+#include <cuda.h>
+#include "timer.h"
+
+int
+main (int argc, char **argv)
+{
+  CUdevice dev;
+  CUfunction delay;
+  CUmodule module;
+  CUresult r;
+  int N;
+  int i;
+  CUstream stream;
+  unsigned long *a, *d_a, dticks;
+  int nbytes;
+  float atime, dtime, hitime, lotime;
+  void *kargs[2];
+  int clkrate;
+  int devnum, nprocs;
+
+  devnum = 2;
+
+  acc_init (acc_device_nvidia);
+
+  devnum = acc_get_device_num (acc_device_nvidia);
+
+  r = cuDeviceGet (&dev, devnum);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
+      abort ();
+    }
+
+  r =
+    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
+			  dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleLoad (&module, "subr.ptx");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleLoad failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleGetFunction (&delay, module, "delay");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleGetFunction failed: %d\n", r);
+      abort ();
+    }
+
+  nbytes = nprocs * sizeof (unsigned long);
+
+  dtime = 200.0;
+
+  dticks = (unsigned long) (dtime * clkrate);
+
+  N = nprocs;
+
+  a = (unsigned long *) malloc (nbytes);
+  d_a = (unsigned long *) acc_malloc (nbytes);
+
+  acc_map_data (a, d_a, nbytes);
+
+  r = cuStreamCreate (&stream, CU_STREAM_DEFAULT);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+      abort ();
+    }
+
+  if (!acc_set_cuda_stream (1, stream))
+    abort ();
+
+  stream = (CUstream) acc_get_cuda_stream (0);
+  if (stream != NULL)
+    abort ();
+
+  r = cuStreamCreate (&stream, CU_STREAM_DEFAULT);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+      abort ();
+    }
+
+  if (!acc_set_cuda_stream (0, stream))
+    abort ();
+
+  init_timers (1);
+
+  kargs[0] = (void *) &d_a;
+  kargs[1] = (void *) &dticks;
+
+  start_timer (0);
+
+  for (i = 0; i < N; i++)
+    {
+      r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0);
+      if (r != CUDA_SUCCESS)
+	{
+	  fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
+	  abort ();
+	}
+    }
+
+  acc_wait_async (0, 1);
+
+  if (acc_async_test (0) != 0)
+    abort ();
+
+  if (acc_async_test (1) != 0)
+    abort ();
+
+  acc_wait (1);
+
+  atime = stop_timer (0);
+
+  if (acc_async_test (0) != 1)
+    abort ();
+
+  if (acc_async_test (1) != 1)
+    abort ();
+
+  hitime = dtime * N;
+  hitime += hitime * 0.02;
+
+  lotime = dtime * N;
+  lotime -= lotime * 0.02;
+
+  if (atime > hitime || atime < lotime)
+    {
+      fprintf (stderr, "actual time < delay time\n");
+      abort ();
+    }
+
+  acc_unmap_data (a);
+
+  fini_timers ();
+
+  free (a);
+  acc_free (d_a);
+
+  acc_shutdown (acc_device_nvidia);
+
+  exit (0);
+}
+
+/* { dg-output "" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-80.c b/libgomp/testsuite/libgomp.oacc-c/lib-80.c
new file mode 100644
index 0000000..0b5ec24
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-80.c
@@ -0,0 +1,132 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-lcuda" } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <openacc.h>
+#include <cuda.h>
+#include "timer.h"
+
+int
+main (int argc, char **argv)
+{
+  CUdevice dev;
+  CUfunction delay;
+  CUmodule module;
+  CUresult r;
+  CUstream stream;
+  int N;
+  int i;
+  unsigned long *a, *d_a, dticks;
+  int nbytes;
+  float atime, dtime;
+  void *kargs[2];
+  int clkrate;
+  int devnum, nprocs;
+
+  acc_init (acc_device_nvidia);
+
+  devnum = acc_get_device_num (acc_device_nvidia);
+
+  r = cuDeviceGet (&dev, devnum);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
+      abort ();
+    }
+
+  r =
+    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
+			  dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleLoad (&module, "subr.ptx");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleLoad failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleGetFunction (&delay, module, "delay");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleGetFunction failed: %d\n", r);
+      abort ();
+    }
+
+  nbytes = nprocs * sizeof (unsigned long);
+
+  dtime = 200.0;
+
+  dticks = (unsigned long) (dtime * clkrate);
+
+  N = nprocs;
+
+  a = (unsigned long *) malloc (nbytes);
+  d_a = (unsigned long *) acc_malloc (nbytes);
+
+  acc_map_data (a, d_a, nbytes);
+
+  r = cuStreamCreate (&stream, CU_STREAM_DEFAULT);
+  if (r != CUDA_SUCCESS)
+	{
+	  fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+	  abort ();
+	}
+
+  acc_set_cuda_stream (1, stream);
+
+  init_timers (1);
+
+  kargs[0] = (void *) &d_a;
+  kargs[1] = (void *) &dticks;
+
+  start_timer (0);
+
+  for (i = 0; i < N; i++)
+    {
+      r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0);
+      if (r != CUDA_SUCCESS)
+	{
+	  fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
+	  abort ();
+	}
+    }
+
+  acc_wait_async (1, 1);
+
+  acc_wait (1);
+
+  atime = stop_timer (0);
+
+  if (atime < dtime)
+    {
+      fprintf (stderr, "actual time < delay time\n");
+      abort ();
+    }
+
+  acc_unmap_data (a);
+
+  fini_timers ();
+
+  free (a);
+  acc_free (d_a);
+
+  acc_shutdown (acc_device_nvidia);
+
+  return 0;
+}
+
+/* { dg-shouldfail "libgomp: identical parameters" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-81.c b/libgomp/testsuite/libgomp.oacc-c/lib-81.c
new file mode 100644
index 0000000..d5f18f0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-81.c
@@ -0,0 +1,211 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-lcuda" } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <openacc.h>
+#include <cuda.h>
+#include "timer.h"
+
+int
+main (int argc, char **argv)
+{
+  CUdevice dev;
+  CUfunction delay;
+  CUmodule module;
+  CUresult r;
+  int N;
+  int i;
+  CUstream *streams, stream;
+  unsigned long *a, *d_a, dticks;
+  int nbytes;
+  float atime, dtime;
+  void *kargs[2];
+  int clkrate;
+  int devnum, nprocs;
+
+  acc_init (acc_device_nvidia);
+
+  devnum = acc_get_device_num (acc_device_nvidia);
+
+  r = cuDeviceGet (&dev, devnum);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
+      abort ();
+    }
+
+  r =
+    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
+			  dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleLoad (&module, "subr.ptx");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleLoad failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleGetFunction (&delay, module, "delay");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleGetFunction failed: %d\n", r);
+      abort ();
+    }
+
+  nbytes = nprocs * sizeof (unsigned long);
+
+  dtime = 500.0;
+
+  dticks = (unsigned long) (dtime * clkrate);
+
+  N = nprocs;
+
+  a = (unsigned long *) malloc (nbytes);
+  d_a = (unsigned long *) acc_malloc (nbytes);
+
+  acc_map_data (a, d_a, nbytes);
+
+  streams = (CUstream *) malloc (N * sizeof (void *));
+
+  for (i = 0; i < N; i++)
+    {
+      streams[i] = (CUstream) acc_get_cuda_stream (i);
+      if (streams[i] != NULL)
+	abort ();
+
+      r = cuStreamCreate (&streams[i], CU_STREAM_DEFAULT);
+      if (r != CUDA_SUCCESS)
+	{
+	  fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+	  abort ();
+	}
+
+        if (!acc_set_cuda_stream (i, streams[i]))
+	  abort ();
+    }
+
+  init_timers (1);
+
+  kargs[0] = (void *) &d_a;
+  kargs[1] = (void *) &dticks;
+
+  stream = (CUstream) acc_get_cuda_stream (N);
+  if (stream != NULL)
+    abort ();
+
+  r = cuStreamCreate (&stream, CU_STREAM_DEFAULT);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+      abort ();
+    }
+
+  if (!acc_set_cuda_stream (N, stream))
+    abort ();
+
+  start_timer (0);
+
+  for (i = 0; i < N; i++)
+    {
+      r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, streams[i], kargs, 0);
+      if (r != CUDA_SUCCESS)
+	{
+	  fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
+	  abort ();
+	}
+    }
+
+  acc_wait_all_async (N);
+
+  for (i = 0; i <= N; i++)
+    {
+      if (acc_async_test (i) != 0)
+	abort ();
+    }
+
+  acc_wait (N);
+
+  for (i = 0; i <= N; i++)
+    {
+      if (acc_async_test (i) != 1)
+	abort ();
+    }
+
+  atime = stop_timer (0);
+
+  if (atime < dtime)
+    {
+      fprintf (stderr, "actual time < delay time\n");
+      abort ();
+    }
+
+  start_timer (0);
+
+  stream = (CUstream) acc_get_cuda_stream (N + 1);
+  if (stream != NULL)
+    abort ();
+
+  r = cuStreamCreate (&stream, CU_STREAM_DEFAULT);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+      abort ();
+    }
+
+  if (!acc_set_cuda_stream (N + 1, stream))
+    abort ();
+
+  acc_wait_all_async (N + 1);
+
+  acc_wait (N + 1);
+
+  atime = stop_timer (0);
+
+  if (0.10 < atime)
+    {
+      fprintf (stderr, "actual time too long\n");
+      abort ();
+    }
+
+  start_timer (0);
+
+  acc_wait_all_async (N);
+
+  acc_wait (N);
+
+  atime = stop_timer (0);
+
+  if (0.10 < atime)
+    {
+      fprintf (stderr, "actual time too long\n");
+      abort ();
+    }
+
+  acc_unmap_data (a);
+
+  fini_timers ();
+
+  free (streams);
+  free (a);
+  acc_free (d_a);
+
+  acc_shutdown (acc_device_nvidia);
+
+  exit (0);
+}
+
+/* { dg-output "" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-82.c b/libgomp/testsuite/libgomp.oacc-c/lib-82.c
new file mode 100644
index 0000000..be30a7f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-82.c
@@ -0,0 +1,144 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-lcuda" } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <openacc.h>
+#include <cuda.h>
+
+int
+main (int argc, char **argv)
+{
+  CUdevice dev;
+  CUfunction delay2;
+  CUmodule module;
+  CUresult r;
+  int N;
+  int i;
+  CUstream *streams;
+  unsigned long **a, **d_a, *tid, ticks;
+  int nbytes;
+  void *kargs[3];
+  int clkrate;
+  int devnum, nprocs;
+
+  acc_init (acc_device_nvidia);
+
+  devnum = acc_get_device_num (acc_device_nvidia);
+
+  r = cuDeviceGet (&dev, devnum);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
+      abort ();
+    }
+
+  r =
+    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
+			  dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleLoad (&module, "subr.ptx");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleLoad failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuModuleGetFunction (&delay2, module, "delay2");
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuModuleGetFunction failed: %d\n", r);
+      abort ();
+    }
+
+  nbytes = sizeof (int);
+
+  ticks = (unsigned long) (200.0 * clkrate);
+
+  N = nprocs;
+
+  streams = (CUstream *) malloc (N * sizeof (void *));
+
+  a = (unsigned long **) malloc (N * sizeof (unsigned long *));
+  d_a = (unsigned long **) malloc (N * sizeof (unsigned long *));
+  tid = (unsigned long *) malloc (N * sizeof (unsigned long));
+
+  for (i = 0; i < N; i++)
+    {
+      a[i] = (unsigned long *) malloc (sizeof (unsigned long));
+      *a[i] = N;
+      d_a[i] = (unsigned long *) acc_malloc (nbytes);
+      tid[i] = i;
+
+      acc_map_data (a[i], d_a[i], nbytes);
+
+      streams[i] = (CUstream) acc_get_cuda_stream (i);
+      if (streams[i] != NULL)
+        abort ();
+
+      r = cuStreamCreate (&streams[i], CU_STREAM_DEFAULT);
+      if (r != CUDA_SUCCESS)
+        {
+          fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+          abort ();
+        }
+
+       if (!acc_set_cuda_stream (i, streams[i]))
+        abort ();
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      kargs[0] = (void *) &d_a[i];
+      kargs[1] = (void *) &ticks;
+      kargs[2] = (void *) &tid[i];
+
+      r = cuLaunchKernel (delay2, 1, 1, 1, 1, 1, 1, 0, streams[i], kargs, 0);
+      if (r != CUDA_SUCCESS)
+	{
+	  fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
+	  abort ();
+	}
+
+      ticks = (unsigned long) (50.0 * clkrate);
+    }
+
+  acc_wait_all_async (0);
+
+  for (i = 0; i < N; i++)
+    {
+      acc_copyout (a[i], nbytes);
+      if (*a[i] != i)
+	abort ();
+    }
+
+  free (streams);
+
+  for (i = 0; i < N; i++)
+    {
+      free (a[i]);
+    }
+
+  free (a);
+  free (d_a);
+  free (tid);
+
+  acc_shutdown (acc_device_nvidia);
+
+  exit (0);
+}
+
+/* { dg-output "" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-83.c b/libgomp/testsuite/libgomp.oacc-c/lib-83.c
new file mode 100644
index 0000000..1c2e52b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-83.c
@@ -0,0 +1,58 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-lcuda" } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <openacc.h>
+#include "timer.h"
+
+int
+main (int argc, char **argv)
+{
+  float atime;
+  CUstream stream;
+  CUresult r;
+
+  acc_init (acc_device_nvidia);
+
+  (void) acc_get_device_num (acc_device_nvidia);
+
+  init_timers (1);
+
+  stream = (CUstream) acc_get_cuda_stream (0);
+  if (stream != NULL)
+    abort ();
+
+  r = cuStreamCreate (&stream, CU_STREAM_DEFAULT);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+      abort ();
+    }
+
+  if (!acc_set_cuda_stream (0, stream))
+    abort ();
+
+  start_timer (0);
+
+  acc_wait_all_async (0);
+
+  acc_wait (0);
+
+  atime = stop_timer (0);
+
+  if (0.010 < atime)
+    {
+      fprintf (stderr, "actual time too long\n");
+      abort ();
+    }
+
+  fini_timers ();
+
+  acc_shutdown (acc_device_nvidia);
+
+  exit (0);
+}
+
+/* { dg-output "" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-84.c b/libgomp/testsuite/libgomp.oacc-c/lib-84.c
new file mode 100644
index 0000000..786b908
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-84.c
@@ -0,0 +1,66 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-lcuda" } */
+
+#include <stdlib.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <openacc.h>
+#include <cuda.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 100;
+  int i;
+  CUstream *streams;
+  CUstream s;
+  CUresult r;
+
+  acc_init (acc_device_nvidia);
+
+  (void) acc_get_device_num (acc_device_nvidia);
+
+  streams = (CUstream *) malloc (N * sizeof (void *));
+
+  for (i = 0; i < N; i++)
+    {
+      streams[i] = (CUstream) acc_get_cuda_stream (i);
+      if (streams[i] != NULL)
+	abort ();
+
+      r = cuStreamCreate (&streams[i], CU_STREAM_DEFAULT);
+      if (r != CUDA_SUCCESS)
+	{
+	  fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+	  abort ();
+	}
+
+        if (!acc_set_cuda_stream (i, streams[i]))
+	  abort ();
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      int j;
+      int cnt;
+
+      cnt = 0;
+
+      s = streams[i];
+
+      for (j = 0; j < N; j++)
+	{
+	  if (s == streams[j])
+	    cnt++;
+	}
+
+      if (cnt != 1)
+	abort ();
+    }
+
+  acc_shutdown (acc_device_nvidia);
+
+  exit (0);
+}
+
+/* { dg-output "" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-85.c b/libgomp/testsuite/libgomp.oacc-c/lib-85.c
new file mode 100644
index 0000000..cf925a7
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-85.c
@@ -0,0 +1,52 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-lcuda" } */
+
+#include <stdlib.h>
+#include <unistd.h>
+#include <openacc.h>
+#include <stdio.h>
+#include <cuda.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 100;
+  int i;
+  CUstream *streams;
+  CUstream s;
+  CUresult r;
+
+  acc_init (acc_device_nvidia);
+
+  (void) acc_get_device_num (acc_device_nvidia);
+
+  streams = (CUstream *) malloc (N * sizeof (void *));
+
+  for (i = 0; i < N; i++)
+    {
+      streams[i] = (CUstream) acc_get_cuda_stream (i);
+      if (streams[i] != NULL)
+	abort ();
+
+      r = cuStreamCreate (&streams[i], CU_STREAM_DEFAULT);
+      if (r != CUDA_SUCCESS)
+	{
+	  fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+	  abort ();
+	}
+
+        if (!acc_set_cuda_stream (i, streams[i]))
+	  abort ();
+    }
+
+  s = NULL;
+
+  if (acc_set_cuda_stream (N + 1, s) != 0)
+    abort ();
+
+  acc_shutdown (acc_device_nvidia);
+
+  exit (0);
+}
+
+/* { dg-output "" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-86.c b/libgomp/testsuite/libgomp.oacc-c/lib-86.c
new file mode 100644
index 0000000..b8a8ee9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-86.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <unistd.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  if (acc_get_num_devices (acc_device_nvidia) == 0)
+    return 0;
+
+  if (acc_get_current_cuda_device () != 0)
+    abort ();
+
+  acc_init (acc_device_host);
+
+  if (acc_get_current_cuda_device () != 0)
+    abort ();
+
+  acc_shutdown (acc_device_host);
+
+  if (acc_get_num_devices (acc_device_nvidia) == 0)
+    return 0;
+
+  if (acc_get_current_cuda_device () != 0)
+    abort ();
+
+  acc_init (acc_device_nvidia);
+
+  if (acc_get_current_cuda_device () == 0)
+    abort ();
+
+  acc_shutdown (acc_device_nvidia);
+
+  if (acc_get_current_cuda_device () != 0)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-output "" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-87.c b/libgomp/testsuite/libgomp.oacc-c/lib-87.c
new file mode 100644
index 0000000..147d443
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-87.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <unistd.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  if (acc_get_num_devices (acc_device_nvidia) == 0)
+    return 0;
+
+  if (acc_get_current_cuda_context () != 0)
+    abort ();
+
+  acc_init (acc_device_host);
+
+  if (acc_get_current_cuda_context () != 0)
+    abort ();
+
+  acc_shutdown (acc_device_host);
+
+  if (acc_get_num_devices (acc_device_nvidia) == 0)
+    return 0;
+
+  if (acc_get_current_cuda_context () != 0)
+    abort ();
+
+  acc_init (acc_device_nvidia);
+
+  if (acc_get_current_cuda_context () == 0)
+    abort ();
+
+  acc_shutdown (acc_device_nvidia);
+
+  if (acc_get_current_cuda_context () != 0)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-output "" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-88.c b/libgomp/testsuite/libgomp.oacc-c/lib-88.c
new file mode 100644
index 0000000..10f4ad8
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-88.c
@@ -0,0 +1,111 @@
+/* { dg-do run } */
+
+#include <stdio.h>
+#include <pthread.h>
+#include <string.h>
+#include <stdlib.h>
+#include <ctype.h>
+#include <openacc.h>
+
+unsigned char *x;
+void *d_x;
+const int N = 256;
+
+static void *
+test (void *arg)
+{
+  int i;
+
+  if (acc_get_current_cuda_context () != NULL)
+    abort ();
+
+  if (acc_is_present (x, N) != 1)
+    abort ();
+
+  memset (x, 0, N);
+
+  acc_copyout (x, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (x[i] != i)
+	abort ();
+
+      x[i] = N - i - 1;
+    }
+
+  d_x = acc_copyin (x, N);
+
+  return 0;
+}
+
+int
+main (int argc, char **argv)
+{
+  const int nthreads = 1;
+  int i;
+  pthread_attr_t attr;
+  pthread_t *tid;
+
+  if (acc_get_num_devices (acc_device_nvidia) == 0)
+    return 0;
+
+  acc_init (acc_device_nvidia);
+
+  x = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      x[i] = i;
+    }
+
+  d_x = acc_copyin (x, N);
+
+  if (acc_is_present (x, N) != 1)
+    abort ();
+
+  if (pthread_attr_init (&attr) != 0)
+    perror ("pthread_attr_init failed");
+
+  tid = (pthread_t *) malloc (nthreads * sizeof (pthread_t));
+
+  for (i = 0; i < nthreads; i++)
+    {
+      if (pthread_create (&tid[i], &attr, &test, (void *) (unsigned long) (i))
+	  != 0)
+	perror ("pthread_create failed");
+    }
+
+  if (pthread_attr_destroy (&attr) != 0)
+    perror ("pthread_attr_destroy failed");
+
+  for (i = 0; i < nthreads; i++)
+    {
+      void *res;
+
+      if (pthread_join (tid[i], &res) != 0)
+	perror ("pthread join failed");
+    }
+
+  if (acc_is_present (x, N) != 1)
+    abort ();
+
+  memset (x, 0, N);
+
+  acc_copyout (x, N);
+
+  for (i = 0; i < N; i++)
+    {
+      if (x[i] != N - i - 1)
+	abort ();
+    }
+
+  if (acc_is_present (x, N) != 0)
+    abort ();
+
+  acc_shutdown (acc_device_nvidia);
+
+  return 0;
+}
+
+/* { dg-output "" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-89.c b/libgomp/testsuite/libgomp.oacc-c/lib-89.c
new file mode 100644
index 0000000..061c409
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-89.c
@@ -0,0 +1,118 @@
+/* { dg-do run } */
+
+#include <stdio.h>
+#include <pthread.h>
+#include <string.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <ctype.h>
+#include <openacc.h>
+
+unsigned char **x;
+void **d_x;
+const int N = 16;
+const int NTHREADS = 32;
+
+static void *
+test (void *arg)
+{
+  int i;
+  int tid;
+  unsigned char *p;
+  int devnum;
+
+  tid = (int) (long) arg;
+
+  devnum = acc_get_device_num (acc_device_nvidia);
+  acc_set_device_num (devnum, acc_device_nvidia);
+
+  if (acc_get_current_cuda_context () == NULL)
+    abort ();
+
+  p = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      p[i] = tid;
+    }
+
+  x[tid] = p;
+
+  d_x[tid] = acc_copyin (p, N);
+
+  return 0;
+}
+
+int
+main (int argc, char **argv)
+{
+  int i;
+  pthread_attr_t attr;
+  pthread_t *tid;
+
+  if (acc_get_num_devices (acc_device_nvidia) == 0)
+    return 0;
+
+  acc_init (acc_device_nvidia);
+
+  x = (unsigned char **) malloc (NTHREADS * N);
+  d_x = (void **) malloc (NTHREADS * N);
+
+  if (pthread_attr_init (&attr) != 0)
+    perror ("pthread_attr_init failed");
+
+  tid = (pthread_t *) malloc (NTHREADS * sizeof (pthread_t));
+
+  for (i = 0; i < NTHREADS; i++)
+    {
+      if (pthread_create (&tid[i], &attr, &test, (void *) (unsigned long) (i))
+	  != 0)
+	perror ("pthread_create failed");
+    }
+
+  if (pthread_attr_destroy (&attr) != 0)
+    perror ("pthread_attr_destroy failed");
+
+  for (i = 0; i < NTHREADS; i++)
+    {
+      void *res;
+
+      if (pthread_join (tid[i], &res) != 0)
+	perror ("pthread join failed");
+    }
+
+  for (i = 0; i < NTHREADS; i++)
+    {
+      if (acc_is_present (x[i], N) != 1)
+	abort ();
+    }
+
+  for (i = 0; i < NTHREADS; i++)
+    {
+      memset (x[i], 0, N);
+      acc_copyout (x[i], N);
+    }
+
+  for (i = 0; i < NTHREADS; i++)
+    {
+      unsigned char *p;
+      int j;
+
+      p = x[i];
+
+      for (j = 0; j < N; j++)
+	{
+	  if (p[j] != i)
+	    abort ();
+	}
+
+      if (acc_is_present (x[i], N) != 0)
+	abort ();
+    }
+
+  acc_shutdown (acc_device_nvidia);
+
+  return 0;
+}
+
+/* { dg-output "" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-9.c b/libgomp/testsuite/libgomp.oacc-c/lib-9.c
new file mode 100644
index 0000000..a4cf7f2
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-9.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  int i;
+  int num_devices;
+  int devnum;
+  acc_device_t devtype = acc_device_host;
+
+#if ACC_DEVICE_TYPE_nvidia
+  devtype = acc_device_nvidia;
+#endif
+
+  num_devices = acc_get_num_devices (devtype);
+  if (num_devices == 0)
+    return 0;
+
+  acc_init (devtype);
+
+  for (i = 0; i < num_devices; i++)
+    {
+      acc_set_device_num (i, devtype);
+      devnum = acc_get_device_num (devtype);
+      if (devnum != i)
+	abort ();
+    }
+
+  acc_shutdown (devtype);
+
+  num_devices = acc_get_num_devices (devtype);
+  if (num_devices == 0)
+    abort ();
+
+  for (i = 0; i < num_devices; i++)
+    {
+      acc_set_device_num (i, devtype);
+      devnum = acc_get_device_num (devtype);
+      if (devnum != i)
+	abort ();
+    }
+
+  acc_shutdown (devtype);
+
+  acc_init (devtype);
+
+  acc_set_device_num (0, devtype);
+
+  devnum = acc_get_device_num (devtype);
+  if (devnum != 0)
+    abort ();
+
+  if (num_devices > 1)
+    {
+      acc_set_device_num (1, (acc_device_t) 0);
+
+      devnum = acc_get_device_num (devtype);
+      if (devnum != 1)
+	abort ();
+  }
+
+  acc_shutdown (devtype);
+
+  return 0;
+}
+
+/* { dg-output "" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-90.c b/libgomp/testsuite/libgomp.oacc-c/lib-90.c
new file mode 100644
index 0000000..d17755b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-90.c
@@ -0,0 +1,137 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-lcuda" } */
+
+#include <pthread.h>
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <errno.h>
+#include <ctype.h>
+#include <openacc.h>
+#include <cuda.h>
+
+unsigned char **x;
+void **d_x;
+const int N = 16;
+const int NTHREADS = 32;
+
+static void *
+test (void *arg)
+{
+  int i;
+  int tid;
+  unsigned char *p;
+  int devnum;
+
+  tid = (int) (long) arg;
+
+  devnum = acc_get_device_num (acc_device_nvidia);
+  acc_set_device_num (devnum, acc_device_nvidia);
+
+  if (acc_get_current_cuda_context () == NULL)
+    abort ();
+
+  p = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      p[i] = tid;
+    }
+
+  x[tid] = p;
+
+  d_x[tid] = acc_copyin (p, N);
+
+  acc_wait_all ();
+
+  return 0;
+}
+
+int
+main (int argc, char **argv)
+{
+  int i;
+  pthread_attr_t attr;
+  pthread_t *tid;
+  CUresult r;
+  CUstream s;
+
+  acc_init (acc_device_nvidia);
+
+  x = (unsigned char **) malloc (NTHREADS * N);
+  d_x = (void **) malloc (NTHREADS * N);
+
+  if (pthread_attr_init (&attr) != 0)
+    perror ("pthread_attr_init failed");
+
+  tid = (pthread_t *) malloc (NTHREADS * sizeof (pthread_t));
+
+  r = cuStreamCreate (&s, CU_STREAM_DEFAULT);
+  if (r != CUDA_SUCCESS)
+	{
+	  fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+	  abort ();
+	}
+
+  if (!acc_set_cuda_stream (0, s))
+	  abort ();
+
+  for (i = 0; i < NTHREADS; i++)
+    {
+      if (pthread_create (&tid[i], &attr, &test, (void *) (unsigned long) (i))
+	  != 0)
+	perror ("pthread_create failed");
+    }
+
+  if (pthread_attr_destroy (&attr) != 0)
+    perror ("pthread_attr_destroy failed");
+
+  for (i = 0; i < NTHREADS; i++)
+    {
+      void *res;
+
+      if (pthread_join (tid[i], &res) != 0)
+	perror ("pthread join failed");
+    }
+
+
+  for (i = 0; i < NTHREADS; i++)
+    {
+      if (acc_is_present (x[i], N) != 1)
+	abort ();
+    }
+
+  acc_get_cuda_stream (1);
+
+  for (i = 0; i < NTHREADS; i++)
+    {
+      memset (x[i], 0, N);
+      acc_copyout (x[i], N);
+    }
+
+  acc_wait_all ();
+
+  for (i = 0; i < NTHREADS; i++)
+    {
+      unsigned char *p;
+      int j;
+
+      p = x[i];
+
+      for (j = 0; j < N; j++)
+	{
+	  if (p[j] != i)
+	    abort ();
+	}
+
+      if (acc_is_present (x[i], N) != 0)
+	abort ();
+    }
+
+  acc_shutdown (acc_device_nvidia);
+
+  return 0;
+}
+
+/* { dg-output "" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-91.c b/libgomp/testsuite/libgomp.oacc-c/lib-91.c
new file mode 100644
index 0000000..e00ef4f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-91.c
@@ -0,0 +1,84 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-lcuda" } */
+
+#include <stdlib.h>
+#include <unistd.h>
+#include <openacc.h>
+#include <sys/time.h>
+#include <stdio.h>
+#include <cuda.h>
+
+int
+main (int argc, char **argv)
+{
+  const int N = 1024 * 1024;
+  int i;
+  unsigned char *h;
+  void *d;
+  float async, sync;
+  struct timeval start, stop;
+  CUresult r;
+  CUstream s;
+
+  acc_init (acc_device_nvidia);
+
+  h = (unsigned char *) malloc (N);
+
+  for (i = 0; i < N; i++)
+    {
+      h[i] = i;
+    }
+
+  d = acc_malloc (N);
+
+  acc_map_data (h, d, N);
+
+  gettimeofday (&start, NULL);
+
+  for (i = 0; i < 100; i++)
+    {
+#pragma acc update device(h[0:N])
+    }
+
+  gettimeofday (&stop, NULL);
+
+  sync = (float) (stop.tv_sec - start.tv_sec);
+  sync += (float) ((stop.tv_usec - start.tv_usec) / 1000000.0);
+
+  gettimeofday (&start, NULL);
+
+  r = cuStreamCreate (&s, CU_STREAM_DEFAULT);
+  if (r != CUDA_SUCCESS)
+	{
+	  fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+	  abort ();
+	}
+
+  if (!acc_set_cuda_stream (0, s))
+	  abort ();
+
+  for (i = 0; i < 100; i++)
+    {
+#pragma acc update device(h[0:N]) async(0)
+    }
+
+  acc_wait_all ();
+
+  gettimeofday (&stop, NULL);
+
+  async = (float) (stop.tv_sec - start.tv_sec);
+  async += (float) ((stop.tv_usec - start.tv_usec) / 1000000.0);
+
+  if (async > (sync * 1.5))
+    abort ();
+
+  acc_free (d);
+
+  free (h);
+
+  acc_shutdown (acc_device_nvidia);
+
+  return 0;
+}
+
+/* { dg-output "" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/lib-92.c b/libgomp/testsuite/libgomp.oacc-c/lib-92.c
new file mode 100644
index 0000000..18193e0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/lib-92.c
@@ -0,0 +1,112 @@
+/* { dg-do run } */
+
+#include <pthread.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <ctype.h>
+#include <openacc.h>
+
+unsigned char **x;
+void **d_x;
+const int N = 32;
+const int NTHREADS = 32;
+
+static void *
+test (void *arg)
+{
+  int i;
+  int tid;
+  unsigned char *p;
+  int devnum;
+
+  tid = (int) (long) arg;
+
+  devnum = acc_get_device_num (acc_device_nvidia);
+  acc_set_device_num (devnum, acc_device_nvidia);
+
+  if (acc_get_current_cuda_context () == NULL)
+    abort ();
+
+  acc_copyout (x[tid], N);
+
+  p = x[tid];
+
+  for (i = 0; i < N; i++)
+    {
+      if (p[i] != i)
+	abort ();
+    }
+
+  return 0;
+}
+
+int
+main (int argc, char **argv)
+{
+  int i;
+  pthread_attr_t attr;
+  pthread_t *tid;
+  unsigned char *p;
+
+  if (acc_get_num_devices (acc_device_nvidia) == 0)
+    return 0;
+
+  acc_init (acc_device_nvidia);
+
+  x = (unsigned char **) malloc (NTHREADS * N);
+  d_x = (void **) malloc (NTHREADS * N);
+
+  for (i = 0; i < N; i++)
+    {
+      int j;
+
+      p = (unsigned char *) malloc (N);
+
+      x[i] = p;
+
+      for (j = 0; j < N; j++)
+	{
+	  p[j] = j;
+	}
+
+      d_x[i] = acc_copyin (p, N);
+    }
+
+  if (pthread_attr_init (&attr) != 0)
+    perror ("pthread_attr_init failed");
+
+  tid = (pthread_t *) malloc (NTHREADS * sizeof (pthread_t));
+
+  acc_get_cuda_stream (1);
+
+  for (i = 0; i < NTHREADS; i++)
+    {
+      if (pthread_create (&tid[i], &attr, &test, (void *) (unsigned long) (i))
+	  != 0)
+	perror ("pthread_create failed");
+    }
+
+  if (pthread_attr_destroy (&attr) != 0)
+    perror ("pthread_attr_destroy failed");
+
+  for (i = 0; i < NTHREADS; i++)
+    {
+      void *res;
+
+      if (pthread_join (tid[i], &res) != 0)
+	perror ("pthread join failed");
+    }
+
+  for (i = 0; i < NTHREADS; i++)
+    {
+      if (acc_is_present (x[i], N) != 0)
+	abort ();
+    }
+
+  acc_shutdown (acc_device_nvidia);
+
+  return 0;
+}
+
+/* { dg-output "" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/nested-1.c b/libgomp/testsuite/libgomp.oacc-c/nested-1.c
new file mode 100644
index 0000000..ededf2b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/nested-1.c
@@ -0,0 +1,680 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include <openacc.h>
+#include <string.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+
+int
+main (int argc, char **argv)
+{
+    int N = 8;
+    float *a, *b, *c, *d;
+    int i;
+
+    a = (float *) malloc (N * sizeof (float));
+    b = (float *) malloc (N * sizeof (float));
+    c = (float *) malloc (N * sizeof (float));
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 3.0;
+        b[i] = 0.0;
+    }
+
+#pragma acc data copyin (a[0:N]) copyout (b[0:N])
+    {
+#pragma acc parallel
+        {
+            int ii;
+
+            for (ii = 0; ii < N; ii++)
+                b[ii] = a[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 3.0)
+            abort ();
+    }
+
+    if (acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 5.0;
+        b[i] = 1.0;
+    }
+
+#pragma acc data copyin (a[0:N]) copyout (b[0:N])
+    {
+#pragma acc parallel
+        {
+            int ii;
+
+            for (ii = 0; ii < N; ii++)
+                b[ii] = a[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 5.0)
+            abort ();
+    }
+
+    if (acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 6.0;
+        b[i] = 0.0;
+    }
+
+    d = (float *) acc_copyin (&a[0], N * sizeof (float));
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 9.0;
+    }
+
+#pragma acc data present_or_copyin (a[0:N]) copyout (b[0:N])
+    {
+#pragma acc parallel
+        {
+            int ii;
+
+            for (ii = 0; ii < N; ii++)
+                b[ii] = a[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 6.0)
+            abort ();
+    }
+
+    if (!acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    acc_free (d);
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 6.0;
+        b[i] = 0.0;
+    }
+
+#pragma acc data copyin (a[0:N]) present_or_copyout (b[0:N])
+    {
+#pragma acc parallel
+        {
+            int ii;
+
+            for (ii = 0; ii < N; ii++)
+                b[ii] = a[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 6.0)
+            abort ();
+    }
+
+    if (acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 5.0;
+        b[i] = 2.0;
+    }
+
+    d = (float *) acc_copyin (&b[0], N * sizeof (float));
+
+#pragma acc data copyin (a[0:N]) present_or_copyout (b[0:N])
+    {
+#pragma acc parallel
+        {
+            int ii;
+
+            for (ii = 0; ii < N; ii++)
+                b[ii] = a[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 5.0)
+            abort ();
+
+        if (b[i] != 2.0)
+            abort ();
+    }
+
+    if (acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (!acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    acc_free (d);
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 3.0;
+        b[i] = 4.0;
+    }
+
+#pragma acc data copy (a[0:N]) copyout (b[0:N])
+    {
+#pragma acc parallel
+        {
+            int ii;
+
+            for (ii = 0; ii < N; ii++)
+            {
+                a[ii] = a[ii] + 1;
+                b[ii] = a[ii] + 2;
+            }
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 4.0)
+            abort ();
+
+        if (b[i] != 6.0)
+            abort ();
+    }
+
+    if (acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 4.0;
+        b[i] = 7.0;
+    }
+
+#pragma acc data present_or_copy (a[0:N]) present_or_copy (b[0:N])
+    {
+#pragma acc parallel
+        {
+            int ii;
+
+            for (ii = 0; ii < N; ii++)
+            {
+                a[ii] = a[ii] + 1;
+                b[ii] = b[ii] + 2;
+            }
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 5.0)
+            abort ();
+
+        if (b[i] != 9.0)
+            abort ();
+    }
+
+    if (acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 3.0;
+        b[i] = 7.0;
+    }
+
+    d = (float *) acc_copyin (&a[0], N * sizeof (float));
+    d = (float *) acc_copyin (&b[0], N * sizeof (float));
+
+#pragma acc data present_or_copy (a[0:N]) present_or_copy (b[0:N])
+    {
+#pragma acc parallel
+        {
+            int ii;
+
+            for (ii = 0; ii < N; ii++)
+            {
+                a[ii] = a[ii] + 1;
+                b[ii] = b[ii] + 2;
+            }
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 3.0)
+            abort ();
+
+        if (b[i] != 7.0)
+            abort ();
+    }
+
+    if (!acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (!acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    d = (float *) acc_deviceptr (&a[0]);
+    acc_unmap_data (&a[0]);
+    acc_free (d);
+
+    d = (float *) acc_deviceptr (&b[0]);
+    acc_unmap_data (&b[0]);
+    acc_free (d);
+
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 3.0;
+        b[i] = 7.0;
+    }
+
+#pragma acc data copyin (a[0:N]) create (c[0:N]) copyout (b[0:N])
+    {
+#pragma acc parallel
+        {
+            int ii;
+
+            for (ii = 0; ii < N; ii++)
+            {
+                c[ii] = a[ii];
+                b[ii] = c[ii];
+            }
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 3.0)
+            abort ();
+
+        if (b[i] != 3.0)
+            abort ();
+    }
+
+    if (acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&c[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 4.0;
+        b[i] = 8.0;
+    }
+
+#pragma acc data copyin (a[0:N]) present_or_create (c[0:N]) copyout (b[0:N])
+    {
+#pragma acc parallel
+        {
+            int ii;
+
+            for (ii = 0; ii < N; ii++)
+            {
+                c[ii] = a[ii];
+                b[ii] = c[ii];
+            }
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 4.0)
+            abort ();
+
+        if (b[i] != 4.0)
+            abort ();
+    }
+
+    if (acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&c[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 2.0;
+        b[i] = 5.0;
+    }
+
+    d = (float *) acc_malloc (N * sizeof (float));
+    acc_map_data (c, d, N * sizeof (float));
+
+#pragma acc data copyin (a[0:N]) present_or_create (c[0:N]) copyout (b[0:N])
+    {
+#pragma acc parallel
+        {
+            int ii;
+
+            for (ii = 0; ii < N; ii++)
+            {
+                c[ii] = a[ii];
+                b[ii] = c[ii];
+            }
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 2.0)
+            abort ();
+
+        if (b[i] != 2.0)
+            abort ();
+    }
+
+    if (acc_is_present (a, (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (b, (N * sizeof (float))))
+      abort ();
+
+    if (!acc_is_present (c, (N * sizeof (float))))
+      abort ();
+
+    d = (float *) acc_deviceptr (c);
+
+    acc_unmap_data (c);
+
+    acc_free (d);
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 4.0;
+        b[i] = 8.0;
+    }
+
+    d = (float *) acc_malloc (N * sizeof (float));
+    acc_map_data (c, d, N * sizeof (float));
+
+#pragma acc data copyin (a[0:N]) present (c[0:N]) copyout (b[0:N])
+    {
+#pragma acc parallel
+        {
+            int ii;
+
+            for (ii = 0; ii < N; ii++)
+            {
+                c[ii] = a[ii];
+                b[ii] = c[ii];
+            }
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 4.0)
+            abort ();
+
+        if (b[i] != 4.0)
+            abort ();
+    }
+
+    if (acc_is_present (a, (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (b, (N * sizeof (float))))
+      abort ();
+
+    if (!acc_is_present (c, (N * sizeof (float))))
+      abort ();
+
+    acc_unmap_data (c);
+
+    if (acc_is_present (c, (N * sizeof (float))))
+      abort ();
+
+    acc_free (d);
+
+    d = (float *) acc_malloc (N * sizeof (float));
+    acc_map_data (c, d, N * sizeof (float));
+
+    if (!acc_is_present (c, (N * sizeof (float))))
+      abort ();
+
+    d = (float *) acc_malloc (N * sizeof (float));
+    acc_map_data (b, d, N * sizeof (float));
+
+    if (!acc_is_present (b, (N * sizeof (float))))
+      abort ();
+
+    d = (float *) acc_malloc (N * sizeof (float));
+    acc_map_data (a, d, N * sizeof (float));
+
+    if (!acc_is_present (a, (N * sizeof (float))))
+      abort ();
+
+#pragma acc data present (a[0:N]) present (c[0:N]) present (b[0:N])
+    {
+#pragma acc parallel
+        {
+            int ii;
+
+            for (ii = 0; ii < N; ii++)
+            {
+                a[ii] = 1.0;
+                c[ii] = 2.0;
+                b[ii] = 4.0;
+            }
+        }
+    }
+
+    if (!acc_is_present (a, (N * sizeof (float))))
+      abort ();
+
+    if (!acc_is_present (b, (N * sizeof (float))))
+      abort ();
+
+    if (!acc_is_present (c, (N * sizeof (float))))
+      abort ();
+
+    acc_copyout (b, N * sizeof (float));
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 4.0)
+            abort ();
+
+        if (b[i] != 4.0)
+            abort ();
+    }
+
+    d = (float *) acc_deviceptr (a);
+
+    acc_unmap_data (a);
+
+    acc_free (d);
+
+    d = (float *) acc_deviceptr (c);
+
+    acc_unmap_data (c);
+
+    acc_free (d);
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 3.0;
+        b[i] = 6.0;
+    }
+
+    d = (float *) acc_malloc (N * sizeof (float));
+
+#pragma acc parallel copyin (a[0:N]) deviceptr (d) copyout (b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            d[ii] = a[ii];
+            b[ii] = d[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 3.0)
+            abort ();
+
+        if (b[i] != 3.0)
+            abort ();
+    }
+
+    if (acc_is_present (a, (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (b, (N * sizeof (float))))
+      abort ();
+
+    acc_free (d);
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 6.0;
+        b[i] = 0.0;
+    }
+
+    d = (float *) acc_copyin (&a[0], N * sizeof (float));
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 9.0;
+    }
+
+#pragma acc data pcopyin (a[0:N]) copyout (b[0:N])
+    {
+#pragma acc parallel
+        {
+            int ii;
+
+            for (ii = 0; ii < N; ii++)
+                b[ii] = a[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 6.0)
+            abort ();
+    }
+
+    if (!acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    acc_free (d);
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 6.0;
+        b[i] = 0.0;
+    }
+
+#pragma acc data copyin (a[0:N]) pcopyout (b[0:N])
+    {
+#pragma acc parallel
+        {
+            int ii;
+
+            for (ii = 0; ii < N; ii++)
+                b[ii] = a[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 6.0)
+            abort ();
+    }
+
+    if (acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 5.0;
+        b[i] = 7.0;
+    }
+
+#pragma acc data copyin (a[0:N]) pcreate (c[0:N]) copyout (b[0:N])
+    {
+#pragma acc parallel
+        {
+            int ii;
+
+            for (ii = 0; ii < N; ii++)
+            {
+                c[ii] = a[ii];
+                b[ii] = c[ii];
+            }
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 5.0)
+            abort ();
+
+        if (b[i] != 5.0)
+            abort ();
+    }
+
+    if (acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&c[0], (N * sizeof (float))))
+      abort ();
+
+    return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/nested-2.c b/libgomp/testsuite/libgomp.oacc-c/nested-2.c
new file mode 100644
index 0000000..0579185
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/nested-2.c
@@ -0,0 +1,35 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+
+int
+main (int argc, char *argv[])
+{
+#define N 10
+  char a[N];
+
+  {
+    int i;
+    for (i = 0; i < N; ++i)
+      a[i] = 0;
+  }
+
+#pragma acc data copyout (a)
+  {
+#pragma acc parallel /* will result in a "dummy frame" */ present (a)
+    {
+      int i;
+      for (i = 0; i < N; ++i)
+	a[i] = i;
+    }
+  }
+
+  {
+    int i;
+    for (i = 0; i < N; ++i)
+      if (a[i] != i)
+	abort ();
+  }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/offset-1.c b/libgomp/testsuite/libgomp.oacc-c/offset-1.c
new file mode 100644
index 0000000..0bae23a
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/offset-1.c
@@ -0,0 +1,97 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include <openacc.h>
+#include <string.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+
+int
+main(int argc, char **argv)
+{
+    int N = 8;
+    float *a, *b;
+    int i;
+
+    a = (float *) malloc(N * sizeof (float));
+    b = (float *) malloc(N * sizeof (float));
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 2.0;
+        b[i] = 5.0;
+    }
+
+#pragma acc parallel copyin(a[2:4]) copyout(b[2:4])
+    {
+        b[2] = a[2];
+        b[3] = a[3];
+    }
+
+    for (i = 2; i < 4; i++)
+    {
+        if (a[i] != 2.0)
+            abort();
+
+        if (b[i] != 2.0)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 3.0;
+        b[i] = 1.0;
+    }
+
+#pragma acc parallel copyin(a[0:4]) copyout(b[0:4])
+    {
+        b[0] = a[0];
+        b[1] = a[1];
+        b[2] = a[2];
+        b[3] = a[3];
+    }
+
+    for (i = 0; i < 4; i++)
+    {
+        if (a[i] != 3.0)
+            abort();
+
+        if (b[i] != 3.0)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 9.0;
+        b[i] = 6.0;
+    }
+
+#pragma acc parallel copyin(a[0:4]) copyout(b[4:4])
+    {
+        b[4] = a[0];
+        b[5] = a[1];
+        b[6] = a[2];
+        b[7] = a[3];
+    }
+
+    for (i = 0; i < 4; i++)
+    {
+        if (a[i] != 9.0)
+            abort();
+    }
+
+    for (i = 4; i < 8; i++)
+    {
+        if (b[i] != 9.0)
+            abort();
+    }
+
+    if (acc_is_present (a, (N * sizeof (float))))
+      abort();
+
+    if (acc_is_present (b, (N * sizeof (float))))
+      abort();
+
+    return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/parallel-1.c b/libgomp/testsuite/libgomp.oacc-c/parallel-1.c
index 68f7de5..fd9df33 100644
--- a/libgomp/testsuite/libgomp.oacc-c/parallel-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c/parallel-1.c
@@ -1,6 +1,6 @@
 /* { dg-do run } */
 
-extern void abort ();
+#include <stdlib.h>
 
 int i;
 
@@ -8,7 +8,6 @@ int main(void)
 {
   int j, v;
 
-#if 0
   i = -1;
   j = -2;
   v = 0;
@@ -22,8 +21,13 @@ int main(void)
       abort ();
     v = 1;
   }
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
   if (v != 1 || i != -1 || j != -2)
     abort ();
+#endif
 
   i = -1;
   j = -2;
@@ -66,6 +70,10 @@ int main(void)
       abort ();
     v = 1;
   }
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
   if (v != 1 || i != -1 || j != -2)
     abort ();
 #endif
@@ -83,8 +91,15 @@ int main(void)
       abort ();
     v = 1;
   }
+  if (v != 1)
+    abort ();
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
   if (v != 1 || i != -1 || j != -2)
     abort ();
+#endif
 
   i = -1;
   j = -2;
@@ -127,43 +142,64 @@ int main(void)
       abort ();
     v = 1;
   }
+  if (v != 1)
+    abort ();
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
   if (v != 1 || i != -1 || j != -2)
     abort ();
+#endif
 
-#if 0
   i = -1;
   j = -2;
   v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) present (i, j)
+
+#pragma acc data copyin (i, j)
   {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
+#pragma acc parallel /* copyout */ present_or_copyout (v) present (i, j)
+    {
+      if (i != -1 || j != -2)
+        abort ();
+      i = 2;
+      j = 1;
+      if (i != 2 || j != 1)
+        abort ();
+      v = 1;
+    }
   }
+#if ACC_MEM_SHARED
   if (v != 1 || i != 2 || j != 1)
     abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
 #endif
 
-#if 0
   i = -1;
   j = -2;
   v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v)
+
+#pragma acc data copyin(i, j)
   {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
+#pragma acc parallel /* copyout */ present_or_copyout (v)
+    {
+      if (i != -1 || j != -2)
+        abort ();
+      i = 2;
+      j = 1;
+      if (i != 2 || j != 1)
+        abort ();
+      v = 1;
+    }
   }
+#if ACC_MEM_SHARED
   if (v != 1 || i != 2 || j != 1)
     abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
 #endif
 
   return 0;
diff --git a/libgomp/testsuite/libgomp.oacc-c/pointer-align-1.c b/libgomp/testsuite/libgomp.oacc-c/pointer-align-1.c
new file mode 100644
index 0000000..f7d5b9b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/pointer-align-1.c
@@ -0,0 +1,35 @@
+/* { dg-do run } */
+
+/* PR middle-end/63247 */
+
+#include <stdlib.h>
+
+int
+main(int argc, char **argv)
+{
+#define N 4
+    short a[N];
+
+    a[0] = 10;
+    a[1] = 10;
+    a[2] = 10;
+    a[3] = 10;
+
+#pragma acc parallel copy(a[1:N-1])
+    {
+      a[1] = 51;
+      a[2] = 52;
+      a[3] = 53;
+    }
+
+    if (a[0] != 10)
+      abort ();
+    if (a[1] != 51)
+      abort ();
+    if (a[2] != 52)
+      abort ();
+    if (a[3] != 53)
+      abort ();
+
+    return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/present-1.c b/libgomp/testsuite/libgomp.oacc-c/present-1.c
new file mode 100644
index 0000000..f331f1f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/present-1.c
@@ -0,0 +1,48 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include <openacc.h>
+#include <string.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+
+int
+main (int argc, char **argv)
+{
+    int N = 8;
+    float *a, *b, *c, *d;
+    int i;
+
+    a = (float *) malloc (N * sizeof (float));
+    b = (float *) malloc (N * sizeof (float));
+    c = (float *) malloc (N * sizeof (float));
+
+    d = (float *) acc_malloc (N * sizeof (float));
+    acc_map_data (c, d, N * sizeof (float));
+
+#pragma acc data present (a[0:N]) present (c[0:N]) present (b[0:N])
+    {
+#pragma acc parallel
+        {
+            int ii;
+
+            for (ii = 0; ii < N; ii++)
+            {
+                c[ii] = a[ii];
+                b[ii] = c[ii];
+            }
+        }
+    }
+
+    d = (float *) acc_deviceptr (c);
+    acc_unmap_data (c);
+    acc_free (d);
+
+    free (a);
+    free (b);
+    free (c);
+
+    return 0;
+}
+/* { dg-shouldfail "libgomp: present clause: !acc_is_present" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c/present-2.c b/libgomp/testsuite/libgomp.oacc-c/present-2.c
new file mode 100644
index 0000000..41efa70
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/present-2.c
@@ -0,0 +1,48 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include <openacc.h>
+#include <stdlib.h>
+
+int
+main (int argc, char **argv)
+{
+  int N = 8;
+  float *a, *b;
+  int i;
+
+  a = (float *) malloc (N * sizeof (float));
+  b = (float *) malloc (N * sizeof (float));
+
+  for (i = 0; i < N; i++)
+    {
+      a[i] = 4.0;
+      b[i] = 0.0;
+    }
+
+#pragma acc data copyin(a[0:N]) copyout(b[0:N])
+  {
+
+#pragma acc parallel present(a[0:N])
+    {
+      int ii;
+
+      for (ii = 0; ii < N; ii++)
+	{
+	  b[ii] = a[ii];
+	}
+    }
+
+  }
+
+  for (i = 0; i < N; i++)
+    {
+      if (a[i] != 4.0)
+	abort ();
+
+      if (b[i] != 4.0)
+	abort ();
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/subr.cu b/libgomp/testsuite/libgomp.oacc-c/subr.cu
new file mode 100644
index 0000000..e86e0fc
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/subr.cu
@@ -0,0 +1,64 @@
+
+extern "C" __global__ void
+delay (clock_t * d_o, clock_t delay)
+{
+  clock_t start, ticks;
+
+  start = clock ();
+
+  ticks = 0;
+
+  while (ticks < delay)
+    ticks = clock () - start;
+}
+
+extern "C" __global__ void
+delay2 (unsigned long *d_o, clock_t delay, unsigned long tid)
+{
+  clock_t start, ticks;
+
+  start = clock ();
+
+  ticks = 0;
+
+  while (ticks < delay)
+    ticks = clock () - start;
+
+  d_o[0] = tid;
+}
+
+extern "C" __global__ void
+sum (clock_t * d_o, int N)
+{
+  int i;
+  clock_t sum;
+  __shared__ clock_t ticks[32];
+
+  sum = 0;
+
+  for (i = threadIdx.x; i < N; i += blockDim.x)
+    sum += d_o[i];
+
+  ticks[threadIdx.x] = sum;
+
+  syncthreads ();
+
+  for (i = 16; i >= 1; i >>= 1)
+    {
+      if (threadIdx.x < i)
+	ticks[threadIdx.x] += ticks[threadIdx.x + i];
+
+      syncthreads ();
+    }
+
+  d_o[0] = ticks[0];
+}
+
+extern "C" __global__ void
+mult (int n, float *x, float *y)
+{
+  int i = blockIdx.x * blockDim.x + threadIdx.x;
+
+  for (i = 0; i < n; i++)
+    y[i] = x[i] * x[i];
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/subr.ptx b/libgomp/testsuite/libgomp.oacc-c/subr.ptx
new file mode 100644
index 0000000..6f748fc
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/subr.ptx
@@ -0,0 +1,148 @@
+// BEGIN PREAMBLE
+	.version	3.1
+	.target	sm_30
+	.address_size 64
+// END PREAMBLE
+
+// BEGIN FUNCTION DEF: clock
+.func (.param.u32 %out_retval)clock
+{
+.reg.u32 %retval;
+	.reg.u64 %hr10;
+	.reg.u32 %r22;
+	.reg.u32 %r23;
+	.reg.u32 %r24;
+	.local.align 8 .b8 %frame[8];
+	// #APP 
+// 7 "subr.c" 1
+	mov.u32 %r24, %clock;
+// 0 "" 2
+	// #NO_APP 
+		st.local.u32	[%frame], %r24;
+		ld.local.u32	%r22, [%frame];
+		mov.u32	%r23, %r22;
+		mov.u32	%retval, %r23;
+	st.param.u32	[%out_retval], %retval;
+	ret;
+	}
+// END FUNCTION DEF
+// BEGIN GLOBAL FUNCTION DEF: delay
+.visible .entry delay(.param.u64 %in_ar1, .param.u64 %in_ar2)
+{
+	.reg.u64 %ar1;
+	.reg.u64 %ar2;
+	.reg.u64 %hr10;
+	.reg.u64 %r22;
+	.reg.u32 %r23;
+	.reg.u64 %r24;
+	.reg.u64 %r25;
+	.reg.u32 %r26;
+	.reg.u32 %r27;
+	.reg.u32 %r28;
+	.reg.u32 %r29;
+	.reg.u32 %r30;
+	.reg.u64 %r31;
+	.reg.pred %r32;
+	.local.align 8 .b8 %frame[24];
+	ld.param.u64 %ar1, [%in_ar1];
+	ld.param.u64 %ar2, [%in_ar2];
+		mov.u64	%r24, %ar1;
+		st.u64	[%frame+8], %r24;
+		mov.u64	%r25, %ar2;
+		st.local.u64	[%frame+16], %r25;
+	{
+		.param.u32 %retval_in;
+	{
+		call (%retval_in), clock;
+	}
+		ld.param.u32	%r26, [%retval_in];
+}
+		st.local.u32	[%frame+4], %r26;
+		mov.u32	%r27, 0;
+		st.local.u32	[%frame], %r27;
+		bra	$L4;
+$L5:
+	{
+		.param.u32 %retval_in;
+	{
+		call (%retval_in), clock;
+	}
+		ld.param.u32	%r28, [%retval_in];
+}
+		mov.u32	%r23, %r28;
+		ld.local.u32	%r30, [%frame+4];
+		sub.u32	%r29, %r23, %r30;
+		st.local.u32	[%frame], %r29;
+$L4:
+		ld.local.s32	%r22, [%frame];
+		ld.local.u64	%r31, [%frame+16];
+		setp.lo.u64 %r32,%r22,%r31;
+	@%r32	bra	$L5;
+	ret;
+	}
+// END FUNCTION DEF
+// BEGIN GLOBAL FUNCTION DEF: delay2
+.visible .entry delay2(.param.u64 %in_ar1, .param.u64 %in_ar2, .param.u64 %in_ar3)
+{
+	.reg.u64 %ar1;
+	.reg.u64 %ar2;
+	.reg.u64 %ar3;
+	.reg.u64 %hr10;
+	.reg.u64 %r22;
+	.reg.u32 %r23;
+	.reg.u64 %r24;
+	.reg.u64 %r25;
+	.reg.u64 %r26;
+	.reg.u32 %r27;
+	.reg.u32 %r28;
+	.reg.u32 %r29;
+	.reg.u32 %r30;
+	.reg.u32 %r31;
+	.reg.u64 %r32;
+	.reg.pred %r33;
+	.reg.u64 %r34;
+	.reg.u64 %r35;
+	.local.align 8 .b8 %frame[32];
+	ld.param.u64 %ar1, [%in_ar1];
+	ld.param.u64 %ar2, [%in_ar2];
+	ld.param.u64 %ar3, [%in_ar3];
+		mov.u64	%r24, %ar1;
+		st.local.u64	[%frame+8], %r24;
+		mov.u64	%r25, %ar2;
+		st.local.u64	[%frame+16], %r25;
+		mov.u64	%r26, %ar3;
+		st.local.u64	[%frame+24], %r26;
+	{
+		.param.u32 %retval_in;
+	{
+		call (%retval_in), clock;
+	}
+		ld.param.u32	%r27, [%retval_in];
+}
+		st.local.u32	[%frame+4], %r27;
+		mov.u32	%r28, 0;
+		st.local.u32	[%frame], %r28;
+		bra	$L8;
+$L9:
+	{
+		.param.u32 %retval_in;
+	{
+		call (%retval_in), clock;
+	}
+		ld.param.u32	%r29, [%retval_in];
+}
+		mov.u32	%r23, %r29;
+		ld.local.u32	%r31, [%frame+4];
+		sub.u32	%r30, %r23, %r31;
+		st.local.u32	[%frame], %r30;
+$L8:
+		ld.local.s32	%r22, [%frame];
+		ld.local.u64	%r32, [%frame+16];
+		setp.lo.u64 %r33,%r22,%r32;
+	@%r33	bra	$L9;
+		ld.local.u64	%r34, [%frame+8];
+		ld.local.u64	%r35, [%frame+24];
+		st.u64	[%r34], %r35;
+	ret;
+	}
+// END FUNCTION DEF
diff --git a/libgomp/testsuite/libgomp.oacc-c/timer.h b/libgomp/testsuite/libgomp.oacc-c/timer.h
new file mode 100644
index 0000000..53749da
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/timer.h
@@ -0,0 +1,103 @@
+
+#include <stdio.h>
+#include <cuda.h>
+
+static int _Tnum_timers;
+static CUevent *_Tstart_events, *_Tstop_events;
+static CUstream _Tstream;
+
+void
+init_timers (int ntimers)
+{
+  int i;
+  CUresult r;
+
+  _Tnum_timers = ntimers;
+
+  _Tstart_events = (CUevent *) malloc (_Tnum_timers * sizeof (CUevent));
+  _Tstop_events = (CUevent *) malloc (_Tnum_timers * sizeof (CUevent));
+
+  r = cuStreamCreate (&_Tstream, CU_STREAM_DEFAULT);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+      abort ();
+    }
+
+  for (i = 0; i < _Tnum_timers; i++)
+    {
+      r = cuEventCreate (&_Tstart_events[i], CU_EVENT_DEFAULT);
+      if (r != CUDA_SUCCESS)
+	{
+	  fprintf (stderr, "cuEventCreate failed: %d\n", r);
+	  abort ();
+	}
+
+      r = cuEventCreate (&_Tstop_events[i], CU_EVENT_DEFAULT);
+      if (r != CUDA_SUCCESS)
+	{
+	  fprintf (stderr, "cuEventCreate failed: %d\n", r);
+	  abort ();
+	}
+    }
+}
+
+void
+fini_timers (void)
+{
+  int i;
+
+  for (i = 0; i < _Tnum_timers; i++)
+    {
+      cuEventDestroy (_Tstart_events[i]);
+      cuEventDestroy (_Tstop_events[i]);
+    }
+
+  cuStreamDestroy (_Tstream);
+
+  free (_Tstart_events);
+  free (_Tstop_events);
+}
+
+void
+start_timer (int timer)
+{
+  CUresult r;
+
+  r = cuEventRecord (_Tstart_events[timer], _Tstream);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuEventRecord failed: %d\n", r);
+      abort ();
+    }
+}
+
+float
+stop_timer (int timer)
+{
+  CUresult r;
+  float etime;
+
+  r = cuEventRecord (_Tstop_events[timer], _Tstream);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuEventRecord failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuEventSynchronize (_Tstop_events[timer]);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuEventSynchronize failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuEventElapsedTime (&etime, _Tstart_events[timer], _Tstop_events[timer]);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuEventElapsedTime failed: %d\n", r);
+      abort ();
+    }
+
+  return etime;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/update-1.c b/libgomp/testsuite/libgomp.oacc-c/update-1.c
new file mode 100644
index 0000000..dff139f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/update-1.c
@@ -0,0 +1,280 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include <openacc.h>
+#include <string.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+
+int
+main (int argc, char **argv)
+{
+    int N = 8;
+    float *a, *b, *c;
+    float *d_a, *d_b, *d_c;
+    int i;
+
+    a = (float *) malloc (N * sizeof (float));
+    b = (float *) malloc (N * sizeof (float));
+    c = (float *) malloc (N * sizeof (float));
+
+    d_a = (float *) acc_malloc (N * sizeof (float));
+    d_b = (float *) acc_malloc (N * sizeof (float));
+    d_c = (float *) acc_malloc (N * sizeof (float));
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 3.0;
+        b[i] = 0.0;
+    }
+
+    acc_map_data (a, d_a, N * sizeof (float));
+    acc_map_data (b, d_b, N * sizeof (float));
+    acc_map_data (c, d_c, N * sizeof (float));
+
+#pragma acc update device (a[0:N], b[0:N])
+
+#pragma acc parallel present (a[0:N], b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = a[ii];
+    }
+
+#pragma acc update host (a[0:N], b[0:N])
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 3.0)
+            abort ();
+
+        if (b[i] != 3.0)
+            abort ();
+    }
+
+    if (!acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (!acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 5.0;
+        b[i] = 1.0;
+    }
+
+#pragma acc update device (a[0:N], b[0:N])
+
+#pragma acc parallel present (a[0:N], b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = a[ii];
+    }
+
+#pragma acc update host (a[0:N], b[0:N])
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 5.0)
+            abort ();
+
+        if (b[i] != 5.0)
+            abort ();
+    }
+
+    if (!acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (!acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 5.0;
+        b[i] = 1.0;
+    }
+
+#pragma acc update device (a[0:N], b[0:N])
+
+#pragma acc parallel present (a[0:N], b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = a[ii];
+    }
+
+#pragma acc update self (a[0:N], b[0:N])
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 5.0)
+            abort ();
+
+        if (b[i] != 5.0)
+            abort ();
+    }
+
+    if (!acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (!acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 6.0;
+        b[i] = 0.0;
+    }
+
+#pragma acc update device (a[0:N], b[0:N])
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 9.0;
+    }
+
+#pragma acc parallel present (a[0:N], b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = a[ii];
+    }
+
+#pragma acc update host (a[0:N], b[0:N])
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 6.0)
+            abort ();
+
+        if (b[i] != 6.0)
+            abort ();
+    }
+
+    if (!acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (!acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 7.0;
+        b[i] = 2.0;
+    }
+
+#pragma acc update device (a[0:N], b[0:N])
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 9.0;
+    }
+
+#pragma acc parallel present (a[0:N], b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = a[ii];
+    }
+
+#pragma acc update host (a[0:N], b[0:N])
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 7.0)
+            abort ();
+
+        if (b[i] != 7.0)
+            abort ();
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 9.0;
+    }
+
+#pragma acc update device (a[0:N])
+
+#pragma acc parallel present (a[0:N], b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = a[ii];
+    }
+
+#pragma acc update host (a[0:N], b[0:N])
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 9.0)
+            abort ();
+
+        if (b[i] != 9.0)
+            abort ();
+    }
+
+    if (!acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (!acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 5.0;
+    }
+
+#pragma acc update device (a[0:N])
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 6.0;
+    }
+
+#pragma acc update device (a[0:N >> 1])
+
+#pragma acc parallel present (a[0:N], b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = a[ii];
+    }
+
+#pragma acc update host (a[0:N], b[0:N])
+
+    for (i = 0; i < (N >> 1); i++)
+    {
+        if (a[i] != 6.0)
+            abort ();
+
+        if (b[i] != 6.0)
+            abort ();
+    }
+
+    for (i = (N >> 1); i < N; i++)
+    {
+        if (a[i] != 5.0)
+            abort ();
+
+        if (b[i] != 5.0)
+            abort ();
+    }
+
+    if (!acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (!acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/abort-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/abort-1.f90
new file mode 100644
index 0000000..52b030b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/abort-1.f90
@@ -0,0 +1,10 @@
+! { dg-shouldfail "" { *-*-* } { "*" } { "" } }
+
+program main
+  implicit none
+
+  !$acc parallel
+  call abort
+  !$acc end parallel
+
+end program main
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/abort-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/abort-2.f90
new file mode 100644
index 0000000..2ba2bcb
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/abort-2.f90
@@ -0,0 +1,13 @@
+program main
+  implicit none
+
+  integer :: argc
+  argc = command_argument_count ()
+
+  !$acc parallel copyin(argc)
+  if (argc .ne. 0) then
+     call abort
+  end if
+  !$acc end parallel
+
+end program main
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90
index c4597a6..4488818 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90
@@ -1,5 +1,4 @@
-! TODO: Remove -DACC_DEVICE_TYPE_host once that is set by the test harness.
-! { dg-additional-options "-cpp -DACC_DEVICE_TYPE_host" }
+! { dg-additional-options "-cpp" }
 ! TODO: Have to disable the acc_on_device builtin for we want to test the
 ! libgomp library function?  The command line option
 ! '-fno-builtin-acc_on_device' is valid for C/C++/ObjC/ObjC++ but not for
@@ -12,7 +11,9 @@ implicit none
 
 if (.not. acc_on_device (acc_device_none)) call abort
 if (.not. acc_on_device (acc_device_host)) call abort
+if (acc_on_device (acc_device_host_nonshm)) call abort
 if (acc_on_device (acc_device_not_host)) call abort
+if (acc_on_device (acc_device_nvidia)) call abort
 
 
 ! Host via offloading fallback mode.
@@ -20,7 +21,9 @@ if (acc_on_device (acc_device_not_host)) call abort
 !$acc parallel if(.false.)
 if (.not. acc_on_device (acc_device_none)) call abort
 if (.not. acc_on_device (acc_device_host)) call abort
+if (acc_on_device (acc_device_host_nonshm)) call abort
 if (acc_on_device (acc_device_not_host)) call abort
+if (acc_on_device (acc_device_nvidia)) call abort
 !$acc end parallel
 
 
@@ -31,7 +34,17 @@ if (acc_on_device (acc_device_not_host)) call abort
 !$acc parallel
 if (acc_on_device (acc_device_none)) call abort
 if (acc_on_device (acc_device_host)) call abort
+#if ACC_DEVICE_TYPE_host_nonshm
+if (.not. acc_on_device (acc_device_host_nonshm)) call abort
+#else
+if (acc_on_device (acc_device_host_nonshm)) call abort
+#endif
 if (.not. acc_on_device (acc_device_not_host)) call abort
+#if ACC_DEVICE_TYPE_nvidia
+if (.not. acc_on_device (acc_device_nvidia)) call abort
+#else
+if (acc_on_device (acc_device_nvidia)) call abort
+#endif
 !$acc end parallel
 
 #endif
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f
index 3787e1e..0047a19 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f
+++ b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f
@@ -1,5 +1,4 @@
-! TODO: Remove -DACC_DEVICE_TYPE_host once that is set by the test harness.
-! { dg-additional-options "-cpp -DACC_DEVICE_TYPE_host" }
+! { dg-additional-options "-cpp" }
 ! TODO: Have to disable the acc_on_device builtin for we want to test
 ! the libgomp library function?  The command line option
 ! '-fno-builtin-acc_on_device' is valid for C/C++/ObjC/ObjC++ but not
@@ -12,7 +11,9 @@
 
       IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_NONE)) CALL ABORT
       IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_HOST)) CALL ABORT
+      IF (ACC_ON_DEVICE (ACC_DEVICE_HOST_NONSHM)) CALL ABORT
       IF (ACC_ON_DEVICE (ACC_DEVICE_NOT_HOST)) CALL ABORT
+      IF (ACC_ON_DEVICE (ACC_DEVICE_NVIDIA)) CALL ABORT
 
 
 !Host via offloading fallback mode.
@@ -20,7 +21,9 @@
 !$ACC PARALLEL IF(.FALSE.)
       IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_NONE)) CALL ABORT
       IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_HOST)) CALL ABORT
+      IF (ACC_ON_DEVICE (ACC_DEVICE_HOST_NONSHM)) CALL ABORT
       IF (ACC_ON_DEVICE (ACC_DEVICE_NOT_HOST)) CALL ABORT
+      IF (ACC_ON_DEVICE (ACC_DEVICE_NVIDIA)) CALL ABORT
 !$ACC END PARALLEL
 
 
@@ -31,7 +34,17 @@
 !$ACC PARALLEL
       IF (ACC_ON_DEVICE (ACC_DEVICE_NONE)) CALL ABORT
       IF (ACC_ON_DEVICE (ACC_DEVICE_HOST)) CALL ABORT
+#if ACC_DEVICE_TYPE_host_nonshm
+      IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_HOST_NONSHM)) CALL ABORT
+#else
+      IF (ACC_ON_DEVICE (ACC_DEVICE_HOST_NONSHM)) CALL ABORT
+#endif
       IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_NOT_HOST)) CALL ABORT
+#if ACC_DEVICE_TYPE_nvidia
+      IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_NVIDIA)) CALL ABORT
+#else
+      IF (ACC_ON_DEVICE (ACC_DEVICE_NVIDIA)) CALL ABORT
+#endif
 !$ACC END PARALLEL
 
 #endif
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f
index 1ee5926..49d7a72 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f
+++ b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f
@@ -1,5 +1,4 @@
-! TODO: Remove -DACC_DEVICE_TYPE_host once that is set by the test harness.
-! { dg-additional-options "-cpp -DACC_DEVICE_TYPE_host" }
+! { dg-additional-options "-cpp" }
 ! TODO: Have to disable the acc_on_device builtin for we want to test
 ! the libgomp library function?  The command line option
 ! '-fno-builtin-acc_on_device' is valid for C/C++/ObjC/ObjC++ but not
@@ -12,7 +11,9 @@
 
       IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_NONE)) CALL ABORT
       IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_HOST)) CALL ABORT
+      IF (ACC_ON_DEVICE (ACC_DEVICE_HOST_NONSHM)) CALL ABORT
       IF (ACC_ON_DEVICE (ACC_DEVICE_NOT_HOST)) CALL ABORT
+      IF (ACC_ON_DEVICE (ACC_DEVICE_NVIDIA)) CALL ABORT
 
 
 !Host via offloading fallback mode.
@@ -20,7 +21,9 @@
 !$ACC PARALLEL IF(.FALSE.)
       IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_NONE)) CALL ABORT
       IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_HOST)) CALL ABORT
+      IF (ACC_ON_DEVICE (ACC_DEVICE_HOST_NONSHM)) CALL ABORT
       IF (ACC_ON_DEVICE (ACC_DEVICE_NOT_HOST)) CALL ABORT
+      IF (ACC_ON_DEVICE (ACC_DEVICE_NVIDIA)) CALL ABORT
 !$ACC END PARALLEL
 
 
@@ -31,7 +34,17 @@
 !$ACC PARALLEL
       IF (ACC_ON_DEVICE (ACC_DEVICE_NONE)) CALL ABORT
       IF (ACC_ON_DEVICE (ACC_DEVICE_HOST)) CALL ABORT
+#if ACC_DEVICE_TYPE_host_nonshm
+      IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_HOST_NONSHM)) CALL ABORT
+#else
+      IF (ACC_ON_DEVICE (ACC_DEVICE_HOST_NONSHM)) CALL ABORT
+#endif
       IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_NOT_HOST)) CALL ABORT
+#if ACC_DEVICE_TYPE_nvidia
+      IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_NVIDIA)) CALL ABORT
+#else
+      IF (ACC_ON_DEVICE (ACC_DEVICE_NVIDIA)) CALL ABORT
+#endif
 !$ACC END PARALLEL
 
 #endif
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp b/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
index cd0ab26..312f947 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
+++ b/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
@@ -21,7 +21,8 @@ set quadmath_library_path "../libquadmath/.libs"
 dg-init
 
 # Turn on OpenACC.
-lappend ALWAYS_CFLAGS "additional_flags=-fopenacc"
+# XXX (TEMPORARY): Remove the -flto once that's properly integrated.
+lappend ALWAYS_CFLAGS "additional_flags=-fopenacc -flto"
 
 if { $blddir != "" } {
     set lang_source_re {^.*\.[fF](|90|95|03|08)$}
@@ -65,10 +66,41 @@ if { $lang_test_file_found } {
     append ld_library_path [gcc-set-multilib-library-path $GCC_UNDER_TEST]
     set_ld_library_path_env_vars
 
-    # For Fortran we're doing torture testing, as Fortran has far more tests
-    # with arrays etc. that testing just -O0 or -O2 is insufficient, that is
-    # typically not the case for C/C++.
-    gfortran-dg-runtest $tests "" ""
+    # Todo: get list of accelerators from configure options --enable-accelerator.
+    set accels { "nvidia" "host_nonshm" }
+
+    # Run on host (or fallback) accelerator.
+    lappend accels "host"
+
+    # Test OpenACC with available accelerators.
+    foreach accel $accels {
+	set tagopt "-DACC_DEVICE_TYPE_$accel=1"
+
+	# Todo: Determine shared memory or not using run-time test.
+	switch $accel {
+	    host {
+		set acc_mem_shared 1
+	    }
+	    host_nonshm {
+		set acc_mem_shared 0
+	    }
+	    nvidia {
+		set acc_mem_shared 0
+	    }
+	    default {
+		set acc_mem_shared 0
+	    }
+	}
+	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared"
+
+	# Todo: Verify that this works for both local and remote testing.
+	setenv ACC_DEVICE_TYPE $accel
+
+	# For Fortran we're doing torture testing, as Fortran has far more tests
+	# with arrays etc. that testing just -O0 or -O2 is insufficient, that is
+	# typically not the case for C/C++.
+	gfortran-dg-runtest $tests "$tagopt" ""
+    }
 }
 
 # All done.
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/lib-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/lib-1.f90
index 124aa87..51dc452 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/lib-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/lib-1.f90
@@ -1,3 +1,13 @@
 use openacc
 
+if (acc_get_num_devices (acc_device_host) .ne. 1) call abort
+call acc_set_device_type (acc_device_host)
+if (acc_get_device_type () .ne. acc_device_host) call abort
+call acc_set_device_num (0, acc_device_host)
+if (acc_get_device_num (acc_device_host) .ne. 0) call abort
+call acc_shutdown (acc_device_host)
+
+call acc_init (acc_device_host)
+call acc_shutdown (acc_device_host)
+
 end
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/lib-10.f90 b/libgomp/testsuite/libgomp.oacc-fortran/lib-10.f90
new file mode 100644
index 0000000..a54d6a7
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/lib-10.f90
@@ -0,0 +1,82 @@
+! { dg-do run }
+
+program main
+  implicit none
+  include "openacc_lib.h"
+
+  integer, target :: a_3d_i(10, 10, 10)
+  complex a_3d_c(10, 10, 10)
+  real a_3d_r(10, 10, 10)
+
+  integer i, j, k
+  complex c
+  real r
+  integer, parameter :: i_size = sizeof (i)
+  integer, parameter :: c_size = sizeof (c)
+  integer, parameter :: r_size = sizeof (r)
+
+  if (acc_get_num_devices (acc_device_nvidia) .eq. 0) call exit
+
+  call acc_init (acc_device_nvidia)
+
+  call set3d (.FALSE., a_3d_i, a_3d_c, a_3d_r)
+
+  call acc_copyin (a_3d_i)
+  call acc_copyin (a_3d_c)
+  call acc_copyin (a_3d_r)
+
+  if (acc_is_present (a_3d_i) .neqv. .TRUE.) call abort
+  if (acc_is_present (a_3d_c) .neqv. .TRUE.) call abort
+  if (acc_is_present (a_3d_r) .neqv. .TRUE.) call abort
+
+  do i = 1, 10
+    do j = 1, 10
+      do k = 1, 10
+        if (acc_is_present (a_3d_i(i, j, k), i_size) .neqv. .TRUE.) call abort
+        if (acc_is_present (a_3d_c(i, j, k), i_size) .neqv. .TRUE.) call abort
+        if (acc_is_present (a_3d_r(i, j, k), i_size) .neqv. .TRUE.) call abort
+      end do
+    end do
+  end do
+
+  call acc_shutdown (acc_device_nvidia)
+
+contains
+
+  subroutine set3d (clear, a_i, a_c, a_r)
+  logical clear
+  integer, dimension (:,:,:), intent (inout) :: a_i
+  complex, dimension (:,:,:), intent (inout) :: a_c
+  real, dimension (:,:,:), intent (inout) :: a_r
+
+  integer i, j, k
+  integer lb1, ub1, lb2, ub2, lb3, ub3
+
+  lb1 = lbound (a_i, 1)
+  ub1 = ubound (a_i, 1)
+
+  lb2 = lbound (a_i, 2)
+  ub2 = ubound (a_i, 2)
+
+  lb3 = lbound (a_i, 3)
+  ub3 = ubound (a_i, 3)
+
+  do i = lb1, ub1
+    do j = lb2, ub2
+      do k = lb3, ub3
+        if (clear) then
+          a_i(i, j, k) = 0
+          a_c(i, j, k) = cmplx (0.0, 0.0)
+          a_r(i, j, k) = 0.0
+        else
+          a_i(i, j, k) = i
+          a_c(i, j, k) = cmplx (i, j)
+          a_r(i, j, k) = i
+        end if
+      end do
+    end do
+  end do
+
+  end subroutine
+
+end program
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/lib-11.f90 b/libgomp/testsuite/libgomp.oacc-fortran/lib-11.f90
new file mode 100644
index 0000000..a54d6a7
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/lib-11.f90
@@ -0,0 +1,82 @@
+! { dg-do run }
+
+program main
+  implicit none
+  include "openacc_lib.h"
+
+  integer, target :: a_3d_i(10, 10, 10)
+  complex a_3d_c(10, 10, 10)
+  real a_3d_r(10, 10, 10)
+
+  integer i, j, k
+  complex c
+  real r
+  integer, parameter :: i_size = sizeof (i)
+  integer, parameter :: c_size = sizeof (c)
+  integer, parameter :: r_size = sizeof (r)
+
+  if (acc_get_num_devices (acc_device_nvidia) .eq. 0) call exit
+
+  call acc_init (acc_device_nvidia)
+
+  call set3d (.FALSE., a_3d_i, a_3d_c, a_3d_r)
+
+  call acc_copyin (a_3d_i)
+  call acc_copyin (a_3d_c)
+  call acc_copyin (a_3d_r)
+
+  if (acc_is_present (a_3d_i) .neqv. .TRUE.) call abort
+  if (acc_is_present (a_3d_c) .neqv. .TRUE.) call abort
+  if (acc_is_present (a_3d_r) .neqv. .TRUE.) call abort
+
+  do i = 1, 10
+    do j = 1, 10
+      do k = 1, 10
+        if (acc_is_present (a_3d_i(i, j, k), i_size) .neqv. .TRUE.) call abort
+        if (acc_is_present (a_3d_c(i, j, k), i_size) .neqv. .TRUE.) call abort
+        if (acc_is_present (a_3d_r(i, j, k), i_size) .neqv. .TRUE.) call abort
+      end do
+    end do
+  end do
+
+  call acc_shutdown (acc_device_nvidia)
+
+contains
+
+  subroutine set3d (clear, a_i, a_c, a_r)
+  logical clear
+  integer, dimension (:,:,:), intent (inout) :: a_i
+  complex, dimension (:,:,:), intent (inout) :: a_c
+  real, dimension (:,:,:), intent (inout) :: a_r
+
+  integer i, j, k
+  integer lb1, ub1, lb2, ub2, lb3, ub3
+
+  lb1 = lbound (a_i, 1)
+  ub1 = ubound (a_i, 1)
+
+  lb2 = lbound (a_i, 2)
+  ub2 = ubound (a_i, 2)
+
+  lb3 = lbound (a_i, 3)
+  ub3 = ubound (a_i, 3)
+
+  do i = lb1, ub1
+    do j = lb2, ub2
+      do k = lb3, ub3
+        if (clear) then
+          a_i(i, j, k) = 0
+          a_c(i, j, k) = cmplx (0.0, 0.0)
+          a_r(i, j, k) = 0.0
+        else
+          a_i(i, j, k) = i
+          a_c(i, j, k) = cmplx (i, j)
+          a_r(i, j, k) = i
+        end if
+      end do
+    end do
+  end do
+
+  end subroutine
+
+end program
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/lib-2.f b/libgomp/testsuite/libgomp.oacc-fortran/lib-2.f
index 64beb9e..a9d70b2 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/lib-2.f
+++ b/libgomp/testsuite/libgomp.oacc-fortran/lib-2.f
@@ -1,3 +1,13 @@
       USE OPENACC
 
+      IF (ACC_GET_NUM_DEVICES (ACC_DEVICE_HOST) .NE. 1) CALL ABORT
+      CALL ACC_SET_DEVICE_TYPE (ACC_DEVICE_HOST)
+      IF (ACC_GET_DEVICE_TYPE () .NE. ACC_DEVICE_HOST) CALL ABORT
+      CALL ACC_SET_DEVICE_NUM (0, ACC_DEVICE_HOST)
+      IF (ACC_GET_DEVICE_NUM (ACC_DEVICE_HOST) .NE. 0) CALL ABORT
+      CALL ACC_SHUTDOWN (ACC_DEVICE_HOST)
+
+      CALL ACC_INIT (ACC_DEVICE_HOST)
+      CALL ACC_SHUTDOWN (ACC_DEVICE_HOST)
+
       END
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/lib-3.f b/libgomp/testsuite/libgomp.oacc-fortran/lib-3.f
index 3f9940b..56d2cd2 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/lib-3.f
+++ b/libgomp/testsuite/libgomp.oacc-fortran/lib-3.f
@@ -1,3 +1,13 @@
       INCLUDE "openacc_lib.h"
 
+      IF (ACC_GET_NUM_DEVICES (ACC_DEVICE_HOST) .NE. 1) CALL ABORT
+      CALL ACC_SET_DEVICE_TYPE (ACC_DEVICE_HOST)
+      IF (ACC_GET_DEVICE_TYPE () .NE. ACC_DEVICE_HOST) CALL ABORT
+      CALL ACC_SET_DEVICE_NUM (0, ACC_DEVICE_HOST)
+      IF (ACC_GET_DEVICE_NUM (ACC_DEVICE_HOST) .NE. 0) CALL ABORT
+      CALL ACC_SHUTDOWN (ACC_DEVICE_HOST)
+
+      CALL ACC_INIT (ACC_DEVICE_HOST)
+      CALL ACC_SHUTDOWN (ACC_DEVICE_HOST)
+
       END
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/lib-4.f90 b/libgomp/testsuite/libgomp.oacc-fortran/lib-4.f90
new file mode 100644
index 0000000..3a2b661
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/lib-4.f90
@@ -0,0 +1,35 @@
+! { dg-do run }
+
+program main
+  use openacc
+  implicit none
+
+  integer n
+
+  if (acc_get_num_devices (acc_device_host) .ne. 1) call abort
+
+  if (acc_get_num_devices (acc_device_none) .ne. 0) call abort
+
+  call acc_init (acc_device_host)
+
+  if (acc_get_device_type () .ne. acc_device_host) call abort
+
+  call acc_set_device_type (acc_device_host)
+
+  if (acc_get_device_type () .ne. acc_device_host) call abort
+
+  n = 0
+
+  call acc_set_device_num (n, acc_device_host)
+
+  if (acc_get_device_num (acc_device_host) .ne. 0) call abort
+
+  if (.NOT. acc_async_test (n) ) call abort
+
+  call acc_wait (n)
+
+  call acc_wait_all ()
+
+  call acc_shutdown (acc_device_host)
+
+end program
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/lib-5.f90 b/libgomp/testsuite/libgomp.oacc-fortran/lib-5.f90
new file mode 100644
index 0000000..e68eb89
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/lib-5.f90
@@ -0,0 +1,31 @@
+! { dg-do run }
+
+program main
+  use openacc
+  implicit none
+
+  integer n
+
+  if (acc_get_num_devices (acc_device_nvidia) .eq. 0) call exit
+
+  call acc_init (acc_device_nvidia)
+
+  n = 0
+
+  call acc_set_device_num (n, acc_device_nvidia)
+
+  if (acc_get_device_num (acc_device_nvidia) .ne. 0) call abort
+
+  if (acc_get_num_devices (acc_device_nvidia) .gt. 1) then
+
+    n = 1
+
+    call acc_set_device_num (n, acc_device_nvidia)
+
+    if (acc_get_device_num (acc_device_nvidia) .ne. 1) call abort
+
+  end if
+
+  call acc_shutdown (acc_device_nvidia)
+
+end program
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/lib-6.f90 b/libgomp/testsuite/libgomp.oacc-fortran/lib-6.f90
new file mode 100644
index 0000000..401ad66
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/lib-6.f90
@@ -0,0 +1,35 @@
+! { dg-do run }
+
+program main
+  implicit none
+  include "openacc_lib.h"
+
+  integer n
+
+  if (acc_get_num_devices (acc_device_host) .ne. 1) call abort
+
+  if (acc_get_num_devices (acc_device_none) .ne. 0) call abort
+
+  call acc_init (acc_device_host)
+
+  if (acc_get_device_type () .ne. acc_device_host) call abort
+
+  call acc_set_device_type (acc_device_host)
+
+  if (acc_get_device_type () .ne. acc_device_host) call abort
+
+  n = 0
+
+  call acc_set_device_num (n, acc_device_host)
+
+  if (acc_get_device_num (acc_device_host) .ne. 0) call abort
+
+  if (.NOT. acc_async_test (n) ) call abort
+
+  call acc_wait (n)
+
+  call acc_wait_all ()
+
+  call acc_shutdown (acc_device_host)
+
+end program
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/lib-7.f90 b/libgomp/testsuite/libgomp.oacc-fortran/lib-7.f90
new file mode 100644
index 0000000..422df53
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/lib-7.f90
@@ -0,0 +1,31 @@
+! { dg-do run }
+
+program main
+  implicit none
+  include "openacc_lib.h"
+
+  integer n
+
+  if (acc_get_num_devices (acc_device_nvidia) .eq. 0) call exit
+
+  call acc_init (acc_device_nvidia)
+
+  n = 0
+
+  call acc_set_device_num (n, acc_device_nvidia)
+
+  if (acc_get_device_num (acc_device_nvidia) .ne. 0) call abort
+
+  if (acc_get_num_devices (acc_device_nvidia) .gt. 1) then
+
+    n = 1
+
+    call acc_set_device_num (n, acc_device_nvidia)
+
+    if (acc_get_device_num (acc_device_nvidia) .ne. 1) call abort
+
+  end if
+
+  call acc_shutdown (acc_device_nvidia)
+
+end program
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/lib-8.f90 b/libgomp/testsuite/libgomp.oacc-fortran/lib-8.f90
new file mode 100644
index 0000000..ad758b2
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/lib-8.f90
@@ -0,0 +1,83 @@
+! { dg-do run }
+
+program main
+  use openacc
+  use iso_c_binding
+  implicit none
+
+  integer, target :: a_3d_i(10, 10, 10)
+  complex a_3d_c(10, 10, 10)
+  real a_3d_r(10, 10, 10)
+
+  integer i, j, k
+  complex c
+  real r
+  integer, parameter :: i_size = sizeof (i)
+  integer, parameter :: c_size = sizeof (c)
+  integer, parameter :: r_size = sizeof (r)
+
+  if (acc_get_num_devices (acc_device_nvidia) .eq. 0) call exit
+
+  call acc_init (acc_device_nvidia)
+
+  call set3d (.FALSE., a_3d_i, a_3d_c, a_3d_r)
+
+  call acc_copyin (a_3d_i)
+  call acc_copyin (a_3d_c)
+  call acc_copyin (a_3d_r)
+
+  if (acc_is_present (a_3d_i) .neqv. .TRUE.) call abort
+  if (acc_is_present (a_3d_c) .neqv. .TRUE.) call abort
+  if (acc_is_present (a_3d_r) .neqv. .TRUE.) call abort
+
+  do i = 1, 10
+    do j = 1, 10
+      do k = 1, 10
+        if (acc_is_present (a_3d_i(i, j, k), i_size) .neqv. .TRUE.) call abort
+        if (acc_is_present (a_3d_c(i, j, k), i_size) .neqv. .TRUE.) call abort
+        if (acc_is_present (a_3d_r(i, j, k), i_size) .neqv. .TRUE.) call abort
+      end do
+    end do
+  end do
+
+  call acc_shutdown (acc_device_nvidia)
+
+contains
+
+  subroutine set3d (clear, a_i, a_c, a_r)
+  logical clear
+  integer, dimension (:,:,:), intent (inout) :: a_i
+  complex, dimension (:,:,:), intent (inout) :: a_c
+  real, dimension (:,:,:), intent (inout) :: a_r
+
+  integer i, j, k
+  integer lb1, ub1, lb2, ub2, lb3, ub3
+
+  lb1 = lbound (a_i, 1)
+  ub1 = ubound (a_i, 1)
+
+  lb2 = lbound (a_i, 2)
+  ub2 = ubound (a_i, 2)
+
+  lb3 = lbound (a_i, 3)
+  ub3 = ubound (a_i, 3)
+
+  do i = lb1, ub1
+    do j = lb2, ub2
+      do k = lb3, ub3
+        if (clear) then
+          a_i(i, j, k) = 0
+          a_c(i, j, k) = cmplx (0.0, 0.0)
+          a_r(i, j, k) = 0.0
+        else
+          a_i(i, j, k) = i
+          a_c(i, j, k) = cmplx (i, j)
+          a_r(i, j, k) = i
+        end if
+      end do
+    end do
+  end do
+
+  end subroutine
+
+end program
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/lib-9.f90 b/libgomp/testsuite/libgomp.oacc-fortran/lib-9.f90
new file mode 100644
index 0000000..ad758b2
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/lib-9.f90
@@ -0,0 +1,83 @@
+! { dg-do run }
+
+program main
+  use openacc
+  use iso_c_binding
+  implicit none
+
+  integer, target :: a_3d_i(10, 10, 10)
+  complex a_3d_c(10, 10, 10)
+  real a_3d_r(10, 10, 10)
+
+  integer i, j, k
+  complex c
+  real r
+  integer, parameter :: i_size = sizeof (i)
+  integer, parameter :: c_size = sizeof (c)
+  integer, parameter :: r_size = sizeof (r)
+
+  if (acc_get_num_devices (acc_device_nvidia) .eq. 0) call exit
+
+  call acc_init (acc_device_nvidia)
+
+  call set3d (.FALSE., a_3d_i, a_3d_c, a_3d_r)
+
+  call acc_copyin (a_3d_i)
+  call acc_copyin (a_3d_c)
+  call acc_copyin (a_3d_r)
+
+  if (acc_is_present (a_3d_i) .neqv. .TRUE.) call abort
+  if (acc_is_present (a_3d_c) .neqv. .TRUE.) call abort
+  if (acc_is_present (a_3d_r) .neqv. .TRUE.) call abort
+
+  do i = 1, 10
+    do j = 1, 10
+      do k = 1, 10
+        if (acc_is_present (a_3d_i(i, j, k), i_size) .neqv. .TRUE.) call abort
+        if (acc_is_present (a_3d_c(i, j, k), i_size) .neqv. .TRUE.) call abort
+        if (acc_is_present (a_3d_r(i, j, k), i_size) .neqv. .TRUE.) call abort
+      end do
+    end do
+  end do
+
+  call acc_shutdown (acc_device_nvidia)
+
+contains
+
+  subroutine set3d (clear, a_i, a_c, a_r)
+  logical clear
+  integer, dimension (:,:,:), intent (inout) :: a_i
+  complex, dimension (:,:,:), intent (inout) :: a_c
+  real, dimension (:,:,:), intent (inout) :: a_r
+
+  integer i, j, k
+  integer lb1, ub1, lb2, ub2, lb3, ub3
+
+  lb1 = lbound (a_i, 1)
+  ub1 = ubound (a_i, 1)
+
+  lb2 = lbound (a_i, 2)
+  ub2 = ubound (a_i, 2)
+
+  lb3 = lbound (a_i, 3)
+  ub3 = ubound (a_i, 3)
+
+  do i = lb1, ub1
+    do j = lb2, ub2
+      do k = lb3, ub3
+        if (clear) then
+          a_i(i, j, k) = 0
+          a_c(i, j, k) = cmplx (0.0, 0.0)
+          a_r(i, j, k) = 0.0
+        else
+          a_i(i, j, k) = i
+          a_c(i, j, k) = cmplx (i, j)
+          a_r(i, j, k) = i
+        end if
+      end do
+    end do
+  end do
+
+  end subroutine
+
+end program
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/map-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/map-1.f90
new file mode 100644
index 0000000..082dd8a
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/map-1.f90
@@ -0,0 +1,97 @@
+program map
+  integer, parameter     :: n = 20, c = 10
+  integer                :: i, a(n), b(n)
+
+  a(:) = 0
+  b(:) = 0
+
+  ! COPY
+
+  !$acc parallel copy (a)
+  !$acc loop
+  do i = 1, n
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  do i = 1, n
+     b(i) = i
+  end do
+
+  call check (a, b, n)
+
+  ! COPYOUT
+
+  a(:) = 0
+
+  !$acc parallel copyout (a)
+  !$acc loop
+  do i = 1, n
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  do i = 1, n
+     if (a(i) .ne. b(i)) call abort
+  end do
+  call check (a, b, n)
+
+  ! COPYIN
+
+  a(:) = 0
+
+  !$acc parallel copyout (a) copyin (b)
+  !$acc loop
+  do i = 1, n
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  call check (a, b, n)
+
+  ! PRESENT_OR_COPY
+
+  !$acc parallel pcopy (a)
+  !$acc loop
+  do i = 1, n
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  call check (a, b, n)
+
+  ! PRESENT_OR_COPYOUT
+
+  a(:) = 0
+
+  !$acc parallel pcopyout (a)
+  !$acc loop
+  do i = 1, n
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  call check (a, b, n)
+
+  ! PRESENT_OR_COPYIN
+
+  a(:) = 0
+
+  !$acc parallel pcopyout (a) pcopyin (b)
+  !$acc loop
+  do i = 1, n
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  call check (a, b, n)
+end program map
+
+subroutine check (a, b, n)
+  integer :: n, a(n), b(n)
+  integer :: i
+
+  do i = 1, n
+     if (a(i) .ne. b(i)) call abort
+  end do
+end subroutine check
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/pointer-align-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/pointer-align-1.f90
new file mode 100644
index 0000000..a5e1fcb
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/pointer-align-1.f90
@@ -0,0 +1,21 @@
+! PR middle-end/63247
+
+program test
+  implicit none
+
+  integer(kind=2) a(4)
+
+  a = 10;
+
+  !$acc parallel copy(a(2:4))
+  a(2) = 52
+  a(3) = 53
+  a(4) = 54
+  !$acc end parallel
+
+  if (a(1) .ne. 10) call abort
+  if (a(2) .ne. 52) call abort
+  if (a(3) .ne. 53) call abort
+  if (a(4) .ne. 54) call abort
+
+end program test
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/pset-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/pset-1.f90
new file mode 100644
index 0000000..1a1d4c7
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/pset-1.f90
@@ -0,0 +1,229 @@
+! { dg-do run }
+
+program test
+  implicit none
+  integer, allocatable :: a1(:)
+  integer, allocatable :: b1(:)
+  integer, allocatable :: c1(:)
+  integer, allocatable :: b2(:,:)
+  integer, allocatable :: c3(:,:,:)
+
+  allocate (a1(5))
+  if (.not.allocated (a1)) call abort()
+
+  a1 = 10
+
+  !$acc parallel copy(a1(1:5))
+  a1(1) = 1
+  a1(2) = 2
+  a1(3) = 3
+  a1(4) = 4
+  a1(5) = 5
+  !$acc end parallel
+
+  if (a1(1) .ne. 1) call abort
+  if (a1(2) .ne. 2) call abort
+  if (a1(3) .ne. 3) call abort
+  if (a1(4) .ne. 4) call abort
+  if (a1(5) .ne. 5) call abort
+
+  deallocate(a1)
+
+  allocate (a1(0:4))
+  if (.not.allocated (a1)) call abort()
+
+  a1 = 10
+
+  !$acc parallel copy(a1(0:4))
+  a1(0) = 1
+  a1(1) = 2
+  a1(2) = 3
+  a1(3) = 4
+  a1(4) = 5
+  !$acc end parallel
+
+  if (a1(0) .ne. 1) call abort
+  if (a1(1) .ne. 2) call abort
+  if (a1(2) .ne. 3) call abort
+  if (a1(3) .ne. 4) call abort
+  if (a1(4) .ne. 5) call abort
+
+  deallocate(a1)
+
+  allocate (b2(5,5))
+  if (.not.allocated (b2)) call abort()
+
+  b2 = 11
+
+  !$acc parallel copy(b2(1:5,1:5))
+  b2(1,1) = 1
+  b2(2,2) = 2
+  b2(3,3) = 3
+  b2(4,4) = 4
+  b2(5,5) = 5
+  !$acc end parallel
+
+  if (b2(1,1) .ne. 1) call abort
+  if (b2(2,2) .ne. 2) call abort
+  if (b2(3,3) .ne. 3) call abort
+  if (b2(4,4) .ne. 4) call abort
+  if (b2(5,5) .ne. 5) call abort
+
+  deallocate(b2)
+
+  allocate (b2(0:4,0:4))
+  if (.not.allocated (b2)) call abort()
+
+  b2 = 11
+
+  !$acc parallel copy(b2(0:4,0:4))
+  b2(0,0) = 1
+  b2(1,1) = 2
+  b2(2,2) = 3
+  b2(3,3) = 4
+  b2(4,4) = 5
+  !$acc end parallel
+
+  if (b2(0,0) .ne. 1) call abort
+  if (b2(1,1) .ne. 2) call abort
+  if (b2(2,2) .ne. 3) call abort
+  if (b2(3,3) .ne. 4) call abort
+  if (b2(4,4) .ne. 5) call abort
+
+  deallocate(b2)
+
+  allocate (c3(5,5,5))
+  if (.not.allocated (c3)) call abort()
+
+  c3 = 12
+
+  !$acc parallel copy(c3(1:5,1:5,1:5))
+  c3(1,1,1) = 1
+  c3(2,2,2) = 2
+  c3(3,3,3) = 3
+  c3(4,4,4) = 4
+  c3(5,5,5) = 5
+  !$acc end parallel
+
+  if (c3(1,1,1) .ne. 1) call abort
+  if (c3(2,2,2) .ne. 2) call abort
+  if (c3(3,3,3) .ne. 3) call abort
+  if (c3(4,4,4) .ne. 4) call abort
+  if (c3(5,5,5) .ne. 5) call abort
+
+  deallocate(c3)
+
+  allocate (c3(0:4,0:4,0:4))
+  if (.not.allocated (c3)) call abort()
+
+  c3 = 12
+
+  !$acc parallel copy(c3(0:4,0:4,0:4))
+  c3(0,0,0) = 1
+  c3(1,1,1) = 2
+  c3(2,2,2) = 3
+  c3(3,3,3) = 4
+  c3(4,4,4) = 5
+  !$acc end parallel
+
+  if (c3(0,0,0) .ne. 1) call abort
+  if (c3(1,1,1) .ne. 2) call abort
+  if (c3(2,2,2) .ne. 3) call abort
+  if (c3(3,3,3) .ne. 4) call abort
+  if (c3(4,4,4) .ne. 5) call abort
+
+  deallocate(c3)
+
+  allocate (a1(5))
+  if (.not.allocated (a1)) call abort()
+
+  allocate (b1(5))
+  if (.not.allocated (b1)) call abort()
+
+  allocate (c1(5))
+  if (.not.allocated (c1)) call abort()
+
+  a1 = 10
+  b1 = 3
+  c1 = 7
+   
+  !$acc parallel copyin(a1(1:5)) create(c1(1:5)) copyout(b1(1:5))
+  c1(1) = a1(1)
+  c1(2) = a1(2)
+  c1(3) = a1(3)
+  c1(4) = a1(4)
+  c1(5) = a1(5)
+
+  b1(1) = c1(1)
+  b1(2) = c1(2)
+  b1(3) = c1(3)
+  b1(4) = c1(4)
+  b1(5) = c1(5)
+  !$acc end parallel
+
+  if (b1(1) .ne. 10) call abort
+  if (b1(2) .ne. 10) call abort
+  if (b1(3) .ne. 10) call abort
+  if (b1(4) .ne. 10) call abort
+  if (b1(5) .ne. 10) call abort
+
+  deallocate(a1)
+  deallocate(b1)
+  deallocate(c1)
+
+  allocate (a1(0:4))
+  if (.not.allocated (a1)) call abort()
+
+  allocate (b1(0:4))
+  if (.not.allocated (b1)) call abort()
+
+  allocate (c1(0:4))
+  if (.not.allocated (c1)) call abort()
+
+  a1 = 10
+  b1 = 3
+  c1 = 7
+   
+  !$acc parallel copyin(a1(0:4)) create(c1(0:4)) copyout(b1(0:4))
+  c1(0) = a1(0)
+  c1(1) = a1(1)
+  c1(2) = a1(2)
+  c1(3) = a1(3)
+  c1(4) = a1(4)
+
+  b1(0) = c1(0)
+  b1(1) = c1(1)
+  b1(2) = c1(2)
+  b1(3) = c1(3)
+  b1(4) = c1(4)
+  !$acc end parallel
+
+  if (b1(0) .ne. 10) call abort
+  if (b1(1) .ne. 10) call abort
+  if (b1(2) .ne. 10) call abort
+  if (b1(3) .ne. 10) call abort
+  if (b1(4) .ne. 10) call abort
+
+  deallocate(a1)
+  deallocate(b1)
+  deallocate(c1)
+
+  allocate (a1(5))
+  if (.not.allocated (a1)) call abort()
+
+  a1 = 10
+
+  !$acc parallel copy(a1(2:3))
+  a1(2) = 2
+  a1(3) = 3
+  !$acc end parallel
+
+  if (a1(1) .ne. 10) call abort
+  if (a1(2) .ne. 2) call abort
+  if (a1(3) .ne. 3) call abort
+  if (a1(4) .ne. 10) call abort
+  if (a1(5) .ne. 10) call abort
+
+  deallocate(a1)
+
+end program test
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/subarrays-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/subarrays-1.f90
new file mode 100644
index 0000000..b39414f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/subarrays-1.f90
@@ -0,0 +1,97 @@
+program subarrays
+  integer, parameter     :: n = 20, c = 10
+  integer                :: i, a(n), b(n)
+
+  a(:) = 0
+  b(:) = 0
+
+  ! COPY
+
+  !$acc parallel copy (a(1:n))
+  !$acc loop
+  do i = 1, n
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  do i = 1, n
+     b(i) = i
+  end do
+
+  call check (a, b, n)
+
+  ! COPYOUT
+
+  a(:) = 0
+
+  !$acc parallel copyout (a(1:n))
+  !$acc loop
+  do i = 1, n
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  do i = 1, n
+     if (a(i) .ne. b(i)) call abort
+  end do
+  call check (a, b, n)
+
+  ! COPYIN
+
+  a(:) = 0
+
+  !$acc parallel copyout (a(1:n)) copyin (b(1:n))
+  !$acc loop
+  do i = 1, n
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  call check (a, b, n)
+
+  ! PRESENT_OR_COPY
+
+  !$acc parallel pcopy (a(1:n))
+  !$acc loop
+  do i = 1, n
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  call check (a, b, n)
+
+  ! PRESENT_OR_COPYOUT
+
+  a(:) = 0
+
+  !$acc parallel pcopyout (a(1:n))
+  !$acc loop
+  do i = 1, n
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  call check (a, b, n)
+
+  ! PRESENT_OR_COPYIN
+
+  a(:) = 0
+
+  !$acc parallel pcopyout (a(1:n)) pcopyin (b(1:n))
+  !$acc loop
+  do i = 1, n
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  call check (a, b, n)
+end program subarrays
+
+subroutine check (a, b, n)
+  integer :: n, a(n), b(n)
+  integer :: i
+
+  do i = 1, n
+     if (a(i) .ne. b(i)) call abort
+  end do
+end subroutine check
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/subarrays-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/subarrays-2.f90
new file mode 100644
index 0000000..81799f6
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/subarrays-2.f90
@@ -0,0 +1,100 @@
+program subarrays
+  integer, parameter     :: n = 20, c = 10, low = 5, high = 10
+  integer                :: i, a(n), b(n)
+
+  a(:) = 0
+  b(:) = 0
+
+  ! COPY
+
+  !$acc parallel copy (a(low:high))
+  !$acc loop
+  do i = low, high
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  do i = low, high
+     b(i) = i
+  end do
+
+  call check (a, b, n)
+
+  ! COPYOUT
+
+  a(:) = 0
+
+  !$acc parallel copyout (a(low:high))
+  !$acc loop
+  do i = low, high
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  do i = low, high
+     if (a(i) .ne. b(i)) call abort
+  end do
+  call check (a, b, n)
+
+  ! COPYIN
+
+  a(:) = 0
+
+  !$acc parallel copyout (a(low:high)) copyin (b(low:high))
+  !$acc loop
+  do i = low, high
+     a(i) = b(i)
+  end do
+  !$acc end parallel
+
+  call check (a, b, n)
+
+  ! PRESENT_OR_COPY
+
+  a(:) = 0
+  
+  !$acc parallel pcopy (a(low:high))
+  !$acc loop
+  do i = low, high
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  call check (a, b, n)
+
+  ! PRESENT_OR_COPYOUT
+
+  a(:) = 0
+
+  !$acc parallel pcopyout (a(low:high))
+  !$acc loop
+  do i = low, high
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  call check (a, b, n)
+
+  ! PRESENT_OR_COPYIN
+
+  a(:) = 0
+
+  !$acc parallel pcopyout (a(low:high)) &
+  !$acc & pcopyin (b(low:high))
+  !$acc loop
+  do i = low, high
+     a(i) = b(i)
+  end do
+  !$acc end parallel
+
+  call check (a, b, n)
+end program subarrays
+
+subroutine check (a, b, n)
+  integer :: n, a(n), b(n)
+  integer :: i
+
+  do i = 1, n
+     if (a(i) .ne. b(i)) call abort
+  end do
+end subroutine check
-- 
1.7.10.4


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [gomp4] [1/3] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin
  2014-10-14 16:12   ` [gomp] [3/3] OpenACC 2.0 support for libgomp - documentation Julian Brown
@ 2014-10-16 17:06     ` Thomas Schwinge
  2014-11-05 16:13     ` [gomp4] OpenACC documentation updates Thomas Schwinge
  1 sibling, 0 replies; 12+ messages in thread
From: Thomas Schwinge @ 2014-10-16 17:06 UTC (permalink / raw)
  To: Julian Brown; +Cc: gcc-patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 2738 bytes --]

Hi Julian!

On Tue, 14 Oct 2014 17:11:18 +0100, Julian Brown <julian@codesourcery.com> wrote:
> This is a slightly-updated version of the following patch, but this
> time tested (with the aid of a series of patches implementing PTX
> support from Bernd Schmidt), and against the gomp4 branch:
> 
> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02022.html
> 
> Results (at least for the parts where the middle-end support is on the
> branch already) are comparable with our local development branch.
> 
> Many of Jakub's initial review comments from the mainline version of
> the patch have not yet been addressed, but I have a couple of bits ready
> as follow-up patches and will be posting those shortly also. I plan to
> address the remainder of the issues directly on the gomp4 branch, if
> possible.
> 
> OK to apply (to the gomp4 branch)?

Yes, thanks!  Also the tests and initial documentation patches.

As you're saying, further incremental patch will be required on top of
that; just one small request at this time:

>     libgomp/

>     * oacc-host.c: New file.

As this one completely obsoletes the existing non-shared memory host
plugin, you might as well just remove that file, libgomp/plugin-host.c,
as part of your commit.


Also to everyone working on the gomp-4_0-branch: a patch like the
following one will temporarily be required to avoid a lot of ICEs, until
propagation of options (-fopenacc, -fopenmp) is available in
LTO/offloading mode; for the time being, always enable all OpenACC and
OpenMP builtins:

--- gcc/builtins.def
+++ gcc/builtins.def
@@ -151,7 +151,8 @@ along with GCC; see the file COPYING3.  If not see
 #undef DEF_GOACC_BUILTIN
 #define DEF_GOACC_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,    \
-               false, true, true, ATTRS, false, flag_openacc)
+               false, true, true, ATTRS, false, \
+	       (/* TODO */ true || flag_openacc))
 #undef DEF_GOACC_BUILTIN_COMPILER
 #define DEF_GOACC_BUILTIN_COMPILER(ENUM, NAME, TYPE, ATTRS) \
   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,    \
@@ -163,7 +164,7 @@ along with GCC; see the file COPYING3.  If not see
 #define DEF_GOMP_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,    \
                false, true, true, ATTRS, false, \
-	       (flag_openmp || flag_tree_parallelize_loops))
+	       (/* TODO */ true || flag_openmp || flag_tree_parallelize_loops))
 
 /* Builtin used by implementation of Cilk Plus.  Most of these are decomposed
    by the compiler but a few are implemented in libcilkrts.  */ 


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [gomp4] [2/3] OpenACC 2.0 support for libgomp - new tests
  2014-10-14 16:33 ` [gomp4] [2/3] OpenACC 2.0 support for libgomp - new tests Julian Brown
  2014-10-14 16:12   ` [gomp] [3/3] OpenACC 2.0 support for libgomp - documentation Julian Brown
@ 2014-10-28 16:07   ` Thomas Schwinge
  2014-10-29 19:54     ` [gomp4] libgomp: Also consider --with-cuda-driver flags for build-tree testing (was: [2/3] OpenACC 2.0 support for libgomp - new tests) Thomas Schwinge
  2014-11-05 16:17   ` [gomp4] libgomp testsuite: OpenACC C++ " Thomas Schwinge
  2014-11-13 13:32   ` [gomp4] [2/3] OpenACC 2.0 support for libgomp - new tests Thomas Schwinge
  3 siblings, 1 reply; 12+ messages in thread
From: Thomas Schwinge @ 2014-10-28 16:07 UTC (permalink / raw)
  To: gcc-patches; +Cc: Julian Brown

[-- Attachment #1: Type: text/plain, Size: 2247 bytes --]

Hi!

Committed in r216804:

commit 4f9566b3e2954218c0d9ce3c585e14e539f0c1af
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Oct 28 15:57:48 2014 +0000

    libgomp: Don't refer to CUDA installation in /opt/nvidia/cuda-5.5/.
    
    	libgomp/
    	* testsuite/libgomp.oacc-c/c.exp (ld_library_path, ALWAYS_CFLAGS):
    	Don't refer to CUDA installation in /opt/nvidia/cuda-5.5/.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@216804 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp                 | 3 +++
 libgomp/testsuite/libgomp.oacc-c/c.exp | 6 ------
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index fda1cbc..5879e20 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,5 +1,8 @@
 2014-10-28  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* testsuite/libgomp.oacc-c/c.exp (ld_library_path, ALWAYS_CFLAGS):
+	Don't refer to CUDA installation in /opt/nvidia/cuda-5.5/.
+
 	* oacc-init.c: Don't use <sys/queue.h>'s SLIST_*.
 	* plugin-nvptx.c: Likewise.
 
diff --git libgomp/testsuite/libgomp.oacc-c/c.exp libgomp/testsuite/libgomp.oacc-c/c.exp
index 553c225..318f78e 100644
--- libgomp/testsuite/libgomp.oacc-c/c.exp
+++ libgomp/testsuite/libgomp.oacc-c/c.exp
@@ -31,7 +31,6 @@ set tests [lsort [find $srcdir/$subdir *.c]]
 
 set ld_library_path $always_ld_library_path
 append ld_library_path [gcc-set-multilib-library-path $GCC_UNDER_TEST]
-append ld_library_path ":/opt/nvidia/cuda-5.5/lib64"
 set_ld_library_path_env_vars
 
 # Todo: get list of accelerators from configure options --enable-accelerator.
@@ -58,11 +57,6 @@ foreach accel $accels {
 	    # Copy ptx file (TEMPORARY)
 	    remote_download host $srcdir/libgomp.oacc-c/subr.ptx
 
-	    # Where cuda.h lives
-	    # Todo: get that from configure option --with-cuda-driver.
-	    lappend ALWAYS_CFLAGS "additional_flags=-I/opt/nvidia/cuda-5.5/include"
-	    lappend ALWAYS_CFLAGS "additional_flags=-L/opt/nvidia/cuda-5.5/lib64"
-
 	    # Where timer.h lives
 	    lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}"
 	    set acc_mem_shared 0


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [gomp4] [1/3] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin
  2014-10-14 16:12 [gomp4] [1/3] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin Julian Brown
  2014-10-14 16:33 ` [gomp4] [2/3] OpenACC 2.0 support for libgomp - new tests Julian Brown
@ 2014-10-28 16:15 ` Thomas Schwinge
  2014-10-28 19:42 ` [gomp4] Synchronous mode? (was: [1/3] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin) Thomas Schwinge
       [not found] ` <541877C3.6080507@mentor.com>
  3 siblings, 0 replies; 12+ messages in thread
From: Thomas Schwinge @ 2014-10-28 16:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: Julian Brown

[-- Attachment #1: Type: text/plain, Size: 12611 bytes --]

Hi!

Following the noble goal of code re-use, we had been using <sys/queue.h>
for a standard C linked list implementation.  However, we found that
elderly (but still sufficient to build GCC) glibc releases contain a
variant of <sys/queue.h> that pre-dates a 2006 upstream glibc update to a
more recent upstream BSD version of that file, and so is missing certain
interfaces that we were using.  Instead of conditionally re-implementing
those, in r216803 I committed a patch to remove the LIST_* usage, and
instead do things manually:

commit ba8916f6bc1dd93d8b6dc92f3d84aec49b68dea9
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Oct 28 15:57:37 2014 +0000

    libgomp: Don't use <sys/queue.h>'s SLIST_*.
    
    Some of the interfaces are "too new".
    
    	libgomp/
    	* oacc-init.c: Don't use <sys/queue.h>'s SLIST_*.
    	* plugin-nvptx.c: Likewise.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@216803 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp |   5 ++
 libgomp/oacc-init.c    |  23 ++++-----
 libgomp/plugin-nvptx.c | 138 +++++++++++++++++++++++++++++--------------------
 3 files changed, 96 insertions(+), 70 deletions(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 5363068..fda1cbc 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2014-10-28  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* oacc-init.c: Don't use <sys/queue.h>'s SLIST_*.
+	* plugin-nvptx.c: Likewise.
+
 2014-10-23  Thomas Schwinge  <thomas@codesourcery.com>
 
 	* testsuite/libgomp.oacc-c/reduction-initial-1.c: New file.
diff --git libgomp/oacc-init.c libgomp/oacc-init.c
index f797f89..ffa9ad8 100644
--- libgomp/oacc-init.c
+++ libgomp/oacc-init.c
@@ -31,7 +31,6 @@
 #include <stdlib.h>
 #include <strings.h>
 #include <stdbool.h>
-#include <sys/queue.h>
 #include <stdio.h>
 
 gomp_mutex_t acc_device_lock;
@@ -55,11 +54,11 @@ static __thread int handle_num = -1;
 struct ACC_context {
   struct memmap_t *ACC_memmap;
   void *ACC_handle;
-  SLIST_ENTRY(ACC_context) next;
+
+  struct ACC_context *next;
 };
 
-static SLIST_HEAD(_ACC_contexts, ACC_context) _ACC_contexts;
-static struct _ACC_contexts *ACC_contexts;
+static struct ACC_context *ACC_contexts;
 
 static struct gomp_device_descr const *dispatchers[_ACC_device_hwm] = { 0 };
 
@@ -198,7 +197,7 @@ lazy_open (int ord)
   ACC_handle = ACC_dev->openacc.open_device_func (ord);
   handle_num = ord;
 
-  SLIST_FOREACH(acc_ctx, ACC_contexts, next)
+  for (acc_ctx = ACC_contexts; acc_ctx != NULL; acc_ctx = acc_ctx->next)
     {
       if (acc_ctx->ACC_handle == ACC_handle)
         {
@@ -220,7 +219,8 @@ lazy_open (int ord)
   if (!ACC_memmap->mem_map.is_initialized)
     gomp_init_tables (ACC_dev, &ACC_memmap->mem_map);
 
-  SLIST_INSERT_HEAD(ACC_contexts, acc_ctx, next);
+  acc_ctx->next = ACC_contexts;
+  ACC_contexts = acc_ctx;
 }
 
 /* OpenACC 2.0a (3.2.12, 3.2.13) doesn't specify whether the serialization of
@@ -259,12 +259,10 @@ _acc_shutdown (acc_device_t d)
 
   close_handle ();
 
-  while (SLIST_FIRST(ACC_contexts) != NULL)
+  while (ACC_contexts != NULL)
     {
-      struct ACC_context *c;
-
-      c = SLIST_FIRST(ACC_contexts);
-      SLIST_REMOVE_HEAD(ACC_contexts, next);
+      struct ACC_context *c = ACC_contexts;
+      ACC_contexts = ACC_contexts->next;
       free (c);
     }
 
@@ -467,8 +465,7 @@ ACC_runtime_initialize (void)
 {
   gomp_mutex_init (&acc_device_lock);
 
-  ACC_contexts = &_ACC_contexts;
-  SLIST_INIT (ACC_contexts);
+  ACC_contexts = NULL;
 }
 
 /* Compiler helper functions */
diff --git libgomp/plugin-nvptx.c libgomp/plugin-nvptx.c
index f193229..33f868a 100644
--- libgomp/plugin-nvptx.c
+++ libgomp/plugin-nvptx.c
@@ -40,7 +40,6 @@
 #include "libgomp-plugin.h"
 
 #include <cuda.h>
-#include <sys/queue.h>
 #include <stdint.h>
 #include <string.h>
 #include <stdio.h>
@@ -149,11 +148,9 @@ struct PTX_stream
   void *h_prev;
   void *h_tail;
 
-  SLIST_ENTRY(PTX_stream) next;
+  struct PTX_stream *next;
 };
 
-SLIST_HEAD(PTX_streams, PTX_stream);
-
 /* Each thread may select a stream (also specific to a device/context).  */
 static __thread struct PTX_stream *current_stream;
 
@@ -293,7 +290,7 @@ struct PTX_device
   /* All non-null streams associated with this device (actually context),
      either created implicitly or passed in from the user (via
      acc_set_cuda_stream).  */
-  struct PTX_streams active_streams;
+  struct PTX_stream *active_streams;
   struct {
     struct PTX_stream **arr;
     int size;
@@ -306,12 +303,12 @@ struct PTX_device
   bool concur;
   int  mode;
   bool mkern;
-  SLIST_ENTRY(PTX_device) next;
+
+  struct PTX_device *next;
 };
 
 static __thread struct PTX_device *PTX_dev;
-static SLIST_HEAD(_PTX_devices, PTX_device) _PTX_devices;
-static struct _PTX_devices *PTX_devices;
+static struct PTX_device *PTX_devices;
 
 enum PTX_event_type
 {
@@ -327,12 +324,12 @@ struct PTX_event
   int type;
   void *addr;
   int ord;
-  SLIST_ENTRY(PTX_event) next;
+
+  struct PTX_event *next;
 };
 
 static gomp_mutex_t PTX_event_lock;
-static SLIST_HEAD(_PTX_events, PTX_event) _PTX_events;
-static struct _PTX_events *PTX_events;
+static struct PTX_event *PTX_events;
 
 #define _XSTR(s) _STR(s)
 #define _STR(s) #s
@@ -417,7 +414,7 @@ init_streams_for_device (struct PTX_device *ptx_dev, int concurrency)
   map_init (null_stream);
   ptx_dev->null_stream = null_stream;
   
-  SLIST_INIT (&ptx_dev->active_streams);
+  ptx_dev->active_streams = NULL;
   GOMP_PLUGIN_mutex_init (&ptx_dev->stream_lock);
   
   if (concurrency < 1)
@@ -437,13 +434,13 @@ init_streams_for_device (struct PTX_device *ptx_dev, int concurrency)
 static void
 fini_streams_for_device (struct PTX_device *ptx_dev)
 {
-  struct PTX_stream *s;
   free (ptx_dev->async_streams.arr);
   
-  while (!SLIST_EMPTY (&ptx_dev->active_streams))
+  while (ptx_dev->active_streams != NULL)
     {
-      s = SLIST_FIRST (&ptx_dev->active_streams);
-      SLIST_REMOVE_HEAD (&ptx_dev->active_streams, next);
+      struct PTX_stream *s = ptx_dev->active_streams;
+      ptx_dev->active_streams = ptx_dev->active_streams->next;
+
       cuStreamDestroy (s->stream);
       map_fini (s);
       free (s);
@@ -535,7 +532,8 @@ select_stream_for_async (int async, pthread_t thread, bool create,
 	  s->h = NULL;
 	  map_init (s);
 	  
-	  SLIST_INSERT_HEAD (&ptx_dev->active_streams, s, next);
+	  s->next = ptx_dev->active_streams;
+	  ptx_dev->active_streams = s;
 	  ptx_dev->async_streams.arr[async] = s;
 	}
 
@@ -593,11 +591,8 @@ PTX_init (void)
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuInit error: %s", cuErrorMsg (r));
 
-  PTX_devices = &_PTX_devices;
-  PTX_events = &_PTX_events;
-
-  SLIST_INIT(PTX_devices);
-  SLIST_INIT(PTX_events);
+  PTX_devices = NULL;
+  PTX_events = NULL;
 
   GOMP_PLUGIN_mutex_init (&PTX_event_lock);
 
@@ -625,7 +620,9 @@ PTX_open_device (int n)
     {
       struct PTX_device *ptx_device;
 
-      SLIST_FOREACH(ptx_device, PTX_devices, next)
+      for (ptx_device = PTX_devices;
+	   ptx_device != NULL;
+	   ptx_device = ptx_device->next)
         {
           if (ptx_device->ord == n)
             {
@@ -653,7 +650,8 @@ PTX_open_device (int n)
   PTX_dev->dev = dev;
   PTX_dev->ctx_shared = false;
 
-  SLIST_INSERT_HEAD(PTX_devices, PTX_dev, next);
+  PTX_dev->next = PTX_devices;
+  PTX_devices = PTX_dev;
 
   r = cuCtxGetCurrent (&PTX_dev->ctx);
   if (r != CUDA_SUCCESS)
@@ -729,7 +727,15 @@ PTX_close_device (void *h __attribute__((unused)))
 	GOMP_PLUGIN_fatal ("cuCtxDestroy error: %s", cuErrorMsg (r));
     }
 
-  SLIST_REMOVE(PTX_devices, PTX_dev, PTX_device, next);
+  if (PTX_devices == PTX_dev)
+    PTX_devices = PTX_devices->next;
+  else
+    {
+      struct PTX_device* d = PTX_devices;
+      while (d->next != PTX_dev)
+	d = d->next;
+      d->next = d->next->next;
+    }
   free (PTX_dev);
 
   PTX_dev = NULL;
@@ -920,60 +926,67 @@ link_ptx (CUmodule *module, char *ptx_code)
 static void
 event_gc (bool memmap_lockable)
 {
-  struct PTX_event *ptx_event;
+  struct PTX_event *ptx_event = PTX_events;
 
   GOMP_PLUGIN_mutex_lock (&PTX_event_lock);
 
-  for (ptx_event = SLIST_FIRST (PTX_events); ptx_event;)
+  while (ptx_event != NULL)
     {
       CUresult r;
-      struct PTX_event *next = SLIST_NEXT (ptx_event, next);
+      struct PTX_event *e = ptx_event;
 
-      if (ptx_event->ord != PTX_dev->ord)
-        goto next_event;
+      ptx_event = ptx_event->next;
 
-      r = cuEventQuery (*ptx_event->evt);
+      if (e->ord != PTX_dev->ord)
+	continue;
+
+      r = cuEventQuery (*e->evt);
       if (r == CUDA_SUCCESS)
-        {
-          CUevent *te;
+	{
+	  CUevent *te;
 
-          te = ptx_event->evt;
+	  te = e->evt;
 
-	  switch (ptx_event->type)
+	  switch (e->type)
 	    {
 	    case PTX_EVT_MEM:
 	    case PTX_EVT_SYNC:
 	      break;
 	    
 	    case PTX_EVT_KNL:
-              map_pop (ptx_event->addr);
+	      map_pop (e->addr);
 	      break;
 
 	    case PTX_EVT_ASYNC_CLEANUP:
-              {
-	        /* The function GOMP_PLUGIN_async_unmap_vars needs to claim the
+	      {
+		/* The function GOMP_PLUGIN_async_unmap_vars needs to claim the
 		   memory-map splay tree lock for the current device, so we
 		   can't call it when one of our callers has already claimed
 		   the lock.  In that case, just delay the GC for this event
-		   until later.  */
-	        if (!memmap_lockable)
-		  goto next_event;
+		   until later.	 */
+		if (!memmap_lockable)
+		  continue;
 
-		GOMP_PLUGIN_async_unmap_vars (ptx_event->addr);
-              }
+		GOMP_PLUGIN_async_unmap_vars (e->addr);
+	      }
 	      break;
 	    }
 
-          cuEventDestroy (*te);
-          free ((void *)te);
+	  cuEventDestroy (*te);
+	  free ((void *)te);
 
-          SLIST_REMOVE (PTX_events, ptx_event, PTX_event, next);
+	  if (PTX_events == e)
+	    PTX_events = PTX_events->next;
+	  else
+	    {
+	      struct PTX_event *e_ = PTX_events;
+	      while (e_->next != e)
+		e_ = e_->next;
+	      e_->next = e_->next->next;
+	    }
 
-          free (ptx_event);
-        }
-
-    next_event:
-      ptx_event = next;
+	  free (e);
+	}
     }
 
   GOMP_PLUGIN_mutex_unlock (&PTX_event_lock);
@@ -995,7 +1008,8 @@ event_add (enum PTX_event_type type, CUevent *e, void *h)
 
   GOMP_PLUGIN_mutex_lock (&PTX_event_lock);
 
-  SLIST_INSERT_HEAD(PTX_events, ptx_event, next);
+  ptx_event->next = PTX_events;
+  PTX_events = ptx_event;
 
   GOMP_PLUGIN_mutex_unlock (&PTX_event_lock);
 }
@@ -1316,7 +1330,7 @@ PTX_async_test_all (void)
 
   GOMP_PLUGIN_mutex_lock (&PTX_dev->stream_lock);
 
-  SLIST_FOREACH (s, &PTX_dev->active_streams, next)
+  for (s = PTX_dev->active_streams; s != NULL; s = s->next)
     {
       if ((s->multithreaded || pthread_equal (s->host_thread, self))
 	  && cuStreamQuery (s->stream) == CUDA_ERROR_NOT_READY)
@@ -1400,7 +1414,7 @@ PTX_wait_all (void)
 
   /* Wait for active streams initiated by this thread (or by multiple threads)
      to complete.  */
-  SLIST_FOREACH (s, &PTX_dev->active_streams, next)
+  for (s = PTX_dev->active_streams; s != NULL; s = s->next)
     {
       if (s->multithreaded || pthread_equal (s->host_thread, self))
         {
@@ -1443,7 +1457,9 @@ PTX_wait_all_async (int async)
 
   GOMP_PLUGIN_mutex_lock (&PTX_dev->stream_lock);
 
-  SLIST_FOREACH (other_stream, &PTX_dev->active_streams, next)
+  for (other_stream = PTX_dev->active_streams;
+       other_stream != NULL;
+       other_stream = other_stream->next)
     {
       if (!other_stream->multithreaded
 	  && !pthread_equal (other_stream->host_thread, self))
@@ -1524,8 +1540,16 @@ PTX_set_cuda_stream (int async, void *stream)
   
   if (oldstream)
     {
-      SLIST_REMOVE (&PTX_dev->active_streams, oldstream, PTX_stream, next);
-      
+      if (PTX_dev->active_streams == oldstream)
+	PTX_dev->active_streams = PTX_dev->active_streams->next;
+      else
+	{
+	  struct PTX_stream *s = PTX_dev->active_streams;
+	  while (s->next != oldstream)
+	    s = s->next;
+	  s->next = s->next->next;
+	}
+
       cuStreamDestroy (oldstream->stream);
       map_fini (oldstream);
       free (oldstream);


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [gomp4] Synchronous mode?  (was: [1/3] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin)
  2014-10-14 16:12 [gomp4] [1/3] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin Julian Brown
  2014-10-14 16:33 ` [gomp4] [2/3] OpenACC 2.0 support for libgomp - new tests Julian Brown
  2014-10-28 16:15 ` [gomp4] [1/3] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin Thomas Schwinge
@ 2014-10-28 19:42 ` Thomas Schwinge
       [not found] ` <541877C3.6080507@mentor.com>
  3 siblings, 0 replies; 12+ messages in thread
From: Thomas Schwinge @ 2014-10-28 19:42 UTC (permalink / raw)
  To: gcc-patches; +Cc: Julian Brown, Jakub Jelinek, James Norris

[-- Attachment #1: Type: text/plain, Size: 716 bytes --]

Hi!

One remark here, not related to the patch itself:

On Tue, 14 Oct 2014 17:11:18 +0100, Julian Brown <julian@codesourcery.com> wrote:
> --- /dev/null
> +++ b/libgomp/plugin-nvptx.c

> +//#define DISABLE_ASYNC

> +#ifndef DISABLE_ASYNC
> +  [...]
> +#else
> +  r = cuCtxSynchronize ();
> +  if (r != CUDA_SUCCESS)
> +    gomp_plugin_fatal ("cuCtxSynchronize error: %s", cuErrorMsg (r));
> +#endif

Earlier on, in discussion with Jim, we wondered whether it'd make sense
to expose the synchronous mode to the user (via an environment
variable?), which may help them when debugging?  This is not a priority
right now, but perhaps something to keep in mind for later on.


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [gomp4] libgomp: Also consider --with-cuda-driver flags for build-tree testing (was: [2/3] OpenACC 2.0 support for libgomp - new tests)
  2014-10-28 16:07   ` [gomp4] [2/3] OpenACC 2.0 support for libgomp - new tests Thomas Schwinge
@ 2014-10-29 19:54     ` Thomas Schwinge
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Schwinge @ 2014-10-29 19:54 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek, James Norris; +Cc: Julian Brown

[-- Attachment #1: Type: text/plain, Size: 10448 bytes --]

Hi!

On Tue, 28 Oct 2014 17:00:38 +0100, I wrote:
> Committed in r216804:
> 
> commit 4f9566b3e2954218c0d9ce3c585e14e539f0c1af
> Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Tue Oct 28 15:57:48 2014 +0000
> 
>     libgomp: Don't refer to CUDA installation in /opt/nvidia/cuda-5.5/.
>     
>     	libgomp/
>     	* testsuite/libgomp.oacc-c/c.exp (ld_library_path, ALWAYS_CFLAGS):
>     	Don't refer to CUDA installation in /opt/nvidia/cuda-5.5/.
>     
>     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@216804 138bc75d-0d04-0410-961f-82ee72b054a4

> --- libgomp/testsuite/libgomp.oacc-c/c.exp
> +++ libgomp/testsuite/libgomp.oacc-c/c.exp
> @@ -31,7 +31,6 @@ set tests [lsort [find $srcdir/$subdir *.c]]
>  
>  set ld_library_path $always_ld_library_path
>  append ld_library_path [gcc-set-multilib-library-path $GCC_UNDER_TEST]
> -append ld_library_path ":/opt/nvidia/cuda-5.5/lib64"
>  set_ld_library_path_env_vars
>  
>  # Todo: get list of accelerators from configure options --enable-accelerator.
> @@ -58,11 +57,6 @@ foreach accel $accels {
>  	    # Copy ptx file (TEMPORARY)
>  	    remote_download host $srcdir/libgomp.oacc-c/subr.ptx
>  
> -	    # Where cuda.h lives
> -	    # Todo: get that from configure option --with-cuda-driver.
> -	    lappend ALWAYS_CFLAGS "additional_flags=-I/opt/nvidia/cuda-5.5/include"
> -	    lappend ALWAYS_CFLAGS "additional_flags=-L/opt/nvidia/cuda-5.5/lib64"
> -

Jim "complained" that this broke his testing setup.  ;-P

Here is a patch, not tested very much.  Jakub, is that conceptually OK,
and Jim, does it resolve the problem?  I reckon, additionally to
--with-cuda-driver, we might also need --with-cuda-runtime, just for the
purpose of the (few) test cases that test interoperability with the CUDA
Runtime library?

commit 6692df50139e3986d9eb18841b9032e47179db13
Author: Thomas Schwinge <thomas@codesourcery.com>
Date:   Wed Oct 29 20:14:52 2014 +0100

    libgomp: Also consider --with-cuda-driver flags for build-tree testing.
    
    For installed testing, we assume all that to be provided in the sysroot.

diff --git libgomp/Makefile.in libgomp/Makefile.in
index 373c417..d12376e 100644
--- libgomp/Makefile.in
+++ libgomp/Makefile.in
@@ -191,6 +191,8 @@ CCDEPMODE = @CCDEPMODE@
 CFLAGS = @CFLAGS@
 CPP = @CPP@
 CPPFLAGS = @CPPFLAGS@
+CUDA_DRIVER_INCLUDE = @CUDA_DRIVER_INCLUDE@
+CUDA_DRIVER_LIB = @CUDA_DRIVER_LIB@
 CYGPATH_W = @CYGPATH_W@
 DEFS = @DEFS@
 DEPDIR = @DEPDIR@
diff --git libgomp/configure libgomp/configure
index e23c1e2..7daccd9 100755
--- libgomp/configure
+++ libgomp/configure
@@ -719,6 +719,8 @@ build_os
 build_vendor
 build_cpu
 build
+CUDA_DRIVER_LIB
+CUDA_DRIVER_INCLUDE
 GENINSRC_FALSE
 GENINSRC_TRUE
 target_alias
@@ -2616,6 +2618,10 @@ fi
 
 
 # Look for the CUDA driver package.
+CUDA_DRIVER_INCLUDE=
+CUDA_DRIVER_LIB=
+
+
 CUDA_DRIVER_CPPFLAGS=
 CUDA_DRIVER_LDFLAGS=
 
@@ -2637,14 +2643,20 @@ if test "${with_cuda_driver_lib+set}" = set; then :
 fi
 
 if test "x$with_cuda_driver" != x; then
-  CUDA_DRIVER_CPPFLAGS=-I$with_cuda_driver/include
-  CUDA_DRIVER_LDFLAGS=-L$with_cuda_driver/lib
+  CUDA_DRIVER_INCLUDE=$with_cuda_driver/include
+  CUDA_DRIVER_LIB=$with_cuda_driver/lib
 fi
 if test "x$with_cuda_driver_include" != x; then
-  CUDA_DRIVER_CPPFLAGS=-I$with_cuda_driver_include
+  CUDA_DRIVER_INCLUDE=$with_cuda_driver_include
 fi
 if test "x$with_cuda_driver_lib" != x; then
-  CUDA_DRIVER_LDFLAGS=-L$with_cuda_driver_lib
+  CUDA_DRIVER_LIB=$with_cuda_driver_lib
+fi
+if test "x$CUDA_DRIVER_INCLUDE" != x; then
+  CUDA_DRIVER_CPPFLAGS=-I$CUDA_DRIVER_INCLUDE
+fi
+if test "x$CUDA_DRIVER_LIB" != x; then
+  CUDA_DRIVER_LDFLAGS=-L$CUDA_DRIVER_LIB
 fi
 
 
@@ -11145,7 +11157,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11148 "configure"
+#line 11160 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11251,7 +11263,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11254 "configure"
+#line 11266 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -16453,7 +16465,11 @@ CFLAGS="$save_CFLAGS"
 
 ac_config_files="$ac_config_files omp.h omp_lib.h omp_lib.f90 libgomp_f.h"
 
-ac_config_files="$ac_config_files Makefile testsuite/Makefile libgomp.spec"
+ac_config_files="$ac_config_files Makefile testsuite/Makefile"
+
+ac_config_files="$ac_config_files testsuite/libgomp-test-support.exp"
+
+ac_config_files="$ac_config_files libgomp.spec"
 
 cat >confcache <<\_ACEOF
 # This file is a shell script that caches the results of configure
@@ -17598,6 +17614,7 @@ do
     "libgomp_f.h") CONFIG_FILES="$CONFIG_FILES libgomp_f.h" ;;
     "Makefile") CONFIG_FILES="$CONFIG_FILES Makefile" ;;
     "testsuite/Makefile") CONFIG_FILES="$CONFIG_FILES testsuite/Makefile" ;;
+    "testsuite/libgomp-test-support.exp") CONFIG_FILES="$CONFIG_FILES testsuite/libgomp-test-support.exp" ;;
     "libgomp.spec") CONFIG_FILES="$CONFIG_FILES libgomp.spec" ;;
 
   *) as_fn_error "invalid argument: \`$ac_config_target'" "$LINENO" 5;;
diff --git libgomp/configure.ac libgomp/configure.ac
index 2633dac..89c6b31 100644
--- libgomp/configure.ac
+++ libgomp/configure.ac
@@ -31,6 +31,10 @@ AC_MSG_RESULT($enable_generated_files_in_srcdir)
 AM_CONDITIONAL(GENINSRC, test "$enable_generated_files_in_srcdir" = yes)
 
 # Look for the CUDA driver package.
+CUDA_DRIVER_INCLUDE=
+CUDA_DRIVER_LIB=
+AC_SUBST(CUDA_DRIVER_INCLUDE)
+AC_SUBST(CUDA_DRIVER_LIB)
 CUDA_DRIVER_CPPFLAGS=
 CUDA_DRIVER_LDFLAGS=
 AC_ARG_WITH(cuda-driver,
@@ -45,14 +49,20 @@ AC_ARG_WITH(cuda-driver-lib,
 	[AS_HELP_STRING([--with-cuda-driver-lib=PATH],
 		[specify directory for the installed CUDA driver library])])
 if test "x$with_cuda_driver" != x; then
-  CUDA_DRIVER_CPPFLAGS=-I$with_cuda_driver/include
-  CUDA_DRIVER_LDFLAGS=-L$with_cuda_driver/lib
+  CUDA_DRIVER_INCLUDE=$with_cuda_driver/include
+  CUDA_DRIVER_LIB=$with_cuda_driver/lib
 fi
 if test "x$with_cuda_driver_include" != x; then
-  CUDA_DRIVER_CPPFLAGS=-I$with_cuda_driver_include
+  CUDA_DRIVER_INCLUDE=$with_cuda_driver_include
 fi
 if test "x$with_cuda_driver_lib" != x; then
-  CUDA_DRIVER_LDFLAGS=-L$with_cuda_driver_lib
+  CUDA_DRIVER_LIB=$with_cuda_driver_lib
+fi
+if test "x$CUDA_DRIVER_INCLUDE" != x; then
+  CUDA_DRIVER_CPPFLAGS=-I$CUDA_DRIVER_INCLUDE
+fi
+if test "x$CUDA_DRIVER_LIB" != x; then
+  CUDA_DRIVER_LDFLAGS=-L$CUDA_DRIVER_LIB
 fi
 
 
@@ -431,5 +441,7 @@ AC_SUBST(OMP_NEST_LOCK_25_KIND)
 CFLAGS="$save_CFLAGS"
 
 AC_CONFIG_FILES(omp.h omp_lib.h omp_lib.f90 libgomp_f.h)
-AC_CONFIG_FILES(Makefile testsuite/Makefile libgomp.spec)
+AC_CONFIG_FILES(Makefile testsuite/Makefile)
+AC_CONFIG_FILES(testsuite/libgomp-test-support.exp)
+AC_CONFIG_FILES(libgomp.spec)
 AC_OUTPUT
diff --git libgomp/testsuite/Makefile.in libgomp/testsuite/Makefile.in
index 77b365e..17ee96b 100644
--- libgomp/testsuite/Makefile.in
+++ libgomp/testsuite/Makefile.in
@@ -35,7 +35,8 @@ build_triplet = @build@
 host_triplet = @host@
 target_triplet = @target@
 subdir = testsuite
-DIST_COMMON = $(srcdir)/Makefile.in $(srcdir)/Makefile.am
+DIST_COMMON = $(srcdir)/Makefile.in $(srcdir)/Makefile.am \
+	$(srcdir)/libgomp-test-support.exp.in
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
 am__aclocal_m4_deps = $(top_srcdir)/../config/acx.m4 \
 	$(top_srcdir)/../config/depstand.m4 \
@@ -54,7 +55,7 @@ am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \
 	$(ACLOCAL_M4)
 mkinstalldirs = $(SHELL) $(top_srcdir)/../mkinstalldirs
 CONFIG_HEADER = $(top_builddir)/config.h
-CONFIG_CLEAN_FILES =
+CONFIG_CLEAN_FILES = libgomp-test-support.exp
 CONFIG_CLEAN_VPATH_FILES =
 SOURCES =
 DEJATOOL = $(PACKAGE)
@@ -71,6 +72,8 @@ CCDEPMODE = @CCDEPMODE@
 CFLAGS = @CFLAGS@
 CPP = @CPP@
 CPPFLAGS = @CPPFLAGS@
+CUDA_DRIVER_INCLUDE = @CUDA_DRIVER_INCLUDE@
+CUDA_DRIVER_LIB = @CUDA_DRIVER_LIB@
 CYGPATH_W = @CYGPATH_W@
 DEFS = @DEFS@
 DEPDIR = @DEPDIR@
@@ -251,6 +254,8 @@ $(top_srcdir)/configure: @MAINTAINER_MODE_TRUE@ $(am__configure_deps)
 $(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps)
 	cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh
 $(am__aclocal_m4_deps):
+libgomp-test-support.exp: $(top_builddir)/config.status $(srcdir)/libgomp-test-support.exp.in
+	cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@
 
 mostlyclean-libtool:
 	-rm -f *.lo
diff --git libgomp/testsuite/lib/libgomp.exp libgomp/testsuite/lib/libgomp.exp
index 78a14cb..eab97b6 100644
--- libgomp/testsuite/lib/libgomp.exp
+++ libgomp/testsuite/lib/libgomp.exp
@@ -31,6 +31,9 @@ load_gcc_lib timeout-dg.exp
 load_gcc_lib torture-options.exp
 load_gcc_lib fortran-modules.exp
 
+# Try to load a test support file, built during libgomp configuration.
+load_file libgomp-test-support.exp
+
 set dg-do-what-default run
 
 #
@@ -144,6 +147,24 @@ proc libgomp_init { args } {
     }
     lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}/.."
 
+    # For build-tree testing, also consider the CUDA paths used for builing.
+    # For installed testing, we assume all that to be provided in the sysroot.
+    if { $blddir != "" } {
+	global cuda_driver_include
+	global cuda_driver_lib
+	if { $cuda_driver_include != "" } {
+	    # Stop gfortran from freaking out:
+	    # Warning: Nonexistent include directory "[...]"
+	    if {[file exists $cuda_driver_include]} {
+		lappend ALWAYS_CFLAGS "additional_flags=-I$cuda_driver_include"
+	    }
+	}
+	if { $cuda_driver_lib != "" } {
+	    lappend ALWAYS_CFLAGS "additional_flags=-L$cuda_driver_lib"
+	    append always_ld_library_path ":$cuda_driver_lib"
+	}
+    }
+
     # We use atomic operations in the testcases to validate results.
     if { ([istarget i?86-*-*] || [istarget x86_64-*-*])
 	 && [check_effective_target_ia32] } {
diff --git libgomp/testsuite/libgomp-test-support.exp.in libgomp/testsuite/libgomp-test-support.exp.in
new file mode 100644
index 0000000..e7afd85
--- /dev/null
+++ libgomp/testsuite/libgomp-test-support.exp.in
@@ -0,0 +1,2 @@
+set cuda_driver_include @CUDA_DRIVER_INCLUDE@
+set cuda_driver_lib @CUDA_DRIVER_LIB@


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [gomp4] OpenACC documentation updates.
  2014-10-14 16:12   ` [gomp] [3/3] OpenACC 2.0 support for libgomp - documentation Julian Brown
  2014-10-16 17:06     ` [gomp4] [1/3] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin Thomas Schwinge
@ 2014-11-05 16:13     ` Thomas Schwinge
  1 sibling, 0 replies; 12+ messages in thread
From: Thomas Schwinge @ 2014-11-05 16:13 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 11671 bytes --]

Hi!

Applied to gomp-4_0-branch in r217142:

commit 0c5178ff5207bf1ede83070629c7d76fbbdf1afb
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Wed Nov 5 16:12:51 2014 +0000

    OpenACC documentation updates.
    
    	gcc/
    	* invoke.texi: Update for OpenACC.
    	* sourcebuild.texi: Likewise.
    	gcc/fortran/
    	* gfortran.texi: Update for OpenACC.
    	* intrinsic.texi: Likewise.
    	* invoke.texi: Likewise.
    	libgomp/
    	* libgomp.texi: Update for OpenACC.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@217142 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp         |  3 +++
 gcc/doc/invoke.texi        |  4 ++--
 gcc/doc/sourcebuild.texi   |  2 +-
 gcc/fortran/ChangeLog.gomp |  6 ++++++
 gcc/fortran/gfortran.texi  | 38 ++++++++++++++++++++++++++++++++++----
 gcc/fortran/intrinsic.texi | 31 ++++++++++++++++++++++++++++++-
 gcc/fortran/invoke.texi    |  7 ++++++-
 libgomp/ChangeLog.gomp     |  2 ++
 libgomp/libgomp.texi       | 10 ++++++----
 9 files changed, 90 insertions(+), 13 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 5b2bade..fc624c8 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,8 @@
 2014-11-05  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* invoke.texi: Update for OpenACC.
+	* sourcebuild.texi: Likewise.
+
 	* tree.def (OACC_WAIT): Remove.  Update all users.
 
 	* omp-builtins.def (BUILT_IN_OMP_SET_NUM_THREADS): Remove.
diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
index 4cd4f4a..0fe875b 100644
--- gcc/doc/invoke.texi
+++ gcc/doc/invoke.texi
@@ -1872,8 +1872,8 @@ freestanding and hosted environments.
 @item -fopenacc
 @opindex fopenacc
 @cindex OpenACC accelerator programming
-Enable handling of OpenACC directives @code{#pragma acc} in C.
-When @option{-fopenacc} is specified, the
+Enable handling of OpenACC directives @code{#pragma acc} in C/C++ and
+@code{!$acc} in Fortran.  When @option{-fopenacc} is specified, the
 compiler generates accelerated code according to the OpenACC Application
 Programming Interface v2.0 @w{@uref{http://www.openacc.org/}}.  This option
 implies @option{-pthread}, and thus is only supported on targets that
diff --git gcc/doc/sourcebuild.texi gcc/doc/sourcebuild.texi
index 5d1625d..d27fac0 100644
--- gcc/doc/sourcebuild.texi
+++ gcc/doc/sourcebuild.texi
@@ -89,7 +89,7 @@ The Go runtime library.  The bulk of this library is mirrored from the
 @uref{http://code.google.com/@/p/@/go/, master Go repository}.
 
 @item libgomp
-The GNU OpenMP runtime library.
+The GNU OpenACC and OpenMP runtime library.
 
 @item libiberty
 The @code{libiberty} library, used for portability and for some
diff --git gcc/fortran/ChangeLog.gomp gcc/fortran/ChangeLog.gomp
index 5f2e9ba..98e3971 100644
--- gcc/fortran/ChangeLog.gomp
+++ gcc/fortran/ChangeLog.gomp
@@ -1,3 +1,9 @@
+2014-11-05  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* gfortran.texi: Update for OpenACC.
+	* intrinsic.texi: Likewise.
+	* invoke.texi: Likewise.
+
 2014-11-04  Cesar Philippidis  <cesar@codesourcery.com>
 
 	* gfortran.h (ST_OACC_ROUTINE): New statement enum.
diff --git gcc/fortran/gfortran.texi gcc/fortran/gfortran.texi
index 41d6559..c3e7518 100644
--- gcc/fortran/gfortran.texi
+++ gcc/fortran/gfortran.texi
@@ -474,7 +474,8 @@ The GNU Fortran compiler is able to compile nearly all
 standard-compliant Fortran 95, Fortran 90, and Fortran 77 programs,
 including a number of standard and non-standard extensions, and can be
 used on real-world programs.  In particular, the supported extensions
-include OpenMP, Cray-style pointers, and several Fortran 2003 and Fortran
+include OpenACC, OpenMP, Cray-style pointers, and several Fortran 2003
+and Fortran
 2008 features, including TR 15581.  However, it is still under
 development and has a few remaining rough edges.
 
@@ -531,7 +532,8 @@ The current status of the support is can be found in the
 @ref{Fortran 2003 status}, @ref{Fortran 2008 status} and
 @ref{TS 29113 status} sections of the documentation.
 
-Additionally, the GNU Fortran compilers supports the OpenMP specification
+Additionally, the GNU Fortran compilers supports the OpenACC specification
+(version 2.0, @url{http://www.openacc.org/}), and OpenMP specification
 (version 4.0, @url{http://openmp.org/@/wp/@/openmp-specifications/}).
 
 @node Varying Length Character Strings
@@ -963,7 +965,8 @@ module.
 @cindex statement, @code{ISO_FORTRAN_ENV}
 @code{USE} statement with @code{INTRINSIC} and @code{NON_INTRINSIC}
 attribute; supported intrinsic modules: @code{ISO_FORTRAN_ENV},
-@code{ISO_C_BINDING}, @code{OMP_LIB} and @code{OMP_LIB_KINDS}.
+@code{ISO_C_BINDING}, @code{OMP_LIB} and @code{OMP_LIB_KINDS},
+and @code{OPENACC}.
 
 @item
 Renaming of operators in the @code{USE} statement.
@@ -1358,6 +1361,7 @@ without warning.
 * Hollerith constants support::
 * Cray pointers::
 * CONVERT specifier::
+* OpenACC::
 * OpenMP::
 * Argument list functions::
 @end menu
@@ -1873,6 +1877,32 @@ carries a significant speed overhead.  If speed in this area matters
 to you, it is best if you use this only for data that needs to be
 portable.
 
+@node OpenACC
+@subsection OpenACC
+@cindex OpenACC
+
+OpenACC is an application programming interface (API) that supports
+offloading of code to accelerator devices.  It consists of a set of
+compiler directives, library routines, and environment variables that
+influence run-time behavior.
+
+GNU Fortran strives to be compatible to the
+@uref{http://www.openacc.org/, OpenACC Application Programming
+Interface v2.0}.
+
+To enable the processing of the OpenACC directive @code{!$acc} in
+free-form source code; the @code{c$acc}, @code{*$acc} and @code{!$acc}
+directives in fixed form; the @code{!$} conditional compilation
+sentinels in free form; and the @code{c$}, @code{*$} and @code{!$}
+sentinels in fixed form, @command{gfortran} needs to be invoked with
+the @option{-fopenacc}.  This also arranges for automatic linking of
+the GNU OpenACC runtime library @ref{Top,,libgomp,libgomp,GNU OpenACC
+and OpenMP runtime library}.
+
+The OpenACC Fortran runtime library routines are provided both in a
+form of a Fortran 90 module named @code{openacc} and in a form of a
+Fortran @code{include} file named @file{openacc_lib.h}.
+
 @node OpenMP
 @subsection OpenMP
 @cindex OpenMP
@@ -1894,7 +1924,7 @@ directives in fixed form; the @code{!$} conditional compilation sentinels
 in free form; and the @code{c$}, @code{*$} and @code{!$} sentinels
 in fixed form, @command{gfortran} needs to be invoked with the
 @option{-fopenmp}.  This also arranges for automatic linking of the
-GNU OpenMP runtime library @ref{Top,,libgomp,libgomp,GNU OpenMP
+GNU OpenMP runtime library @ref{Top,,libgomp,libgomp,GNU OpenACC and OpenMP
 runtime library}.
 
 The OpenMP Fortran runtime library routines are provided both in a
diff --git gcc/fortran/intrinsic.texi gcc/fortran/intrinsic.texi
index 90c9a3a..fdaf044 100644
--- gcc/fortran/intrinsic.texi
+++ gcc/fortran/intrinsic.texi
@@ -13773,6 +13773,7 @@ Fortran 95 elemental function: @ref{IEOR}
 * ISO_FORTRAN_ENV::
 * ISO_C_BINDING::
 * IEEE modules::
+* OpenACC Module OPENACC::
 * OpenMP Modules OMP_LIB and OMP_LIB_KINDS::
 @end menu
 
@@ -14018,6 +14019,33 @@ with the following options: @code{-fno-unsafe-math-optimizations
 -frounding-math -fsignaling-nans}.
 
 
+
+@node OpenACC Module OPENACC
+@section OpenACC Module @code{OPENACC}
+@table @asis
+@item @emph{Standard}:
+OpenACC Application Programming Interface v2.0
+@end table
+
+
+The OpenACC Fortran runtime library routines are provided both in a
+form of a Fortran 90 module, named @code{OPENACC}, and in form of a
+Fortran @code{include} file named @file{openacc_lib.h}.  The
+procedures provided by @code{OPENACC} can be found in the
+@ref{Top,,Introduction,libgomp,GNU OpenACC and OpenMP runtime library}
+manual, the named constants defined in the modules are listed below.
+
+For details refer to the actual
+@uref{http://www.openacc.org/,
+OpenACC Application Programming Interface v2.0}.
+
+@code{OPENACC} provides the scalar default-integer
+named constant @code{openacc_version} with a value of the form
+@var{yyyymm}, where @code{yyyy} is the year and @var{mm} the month
+of the OpenACC version; for OpenACC v2.0 the value is @code{201306}.
+
+
+
 @node OpenMP Modules OMP_LIB and OMP_LIB_KINDS
 @section OpenMP Modules @code{OMP_LIB} and @code{OMP_LIB_KINDS}
 @table @asis
@@ -14030,7 +14058,8 @@ The OpenMP Fortran runtime library routines are provided both in
 a form of two Fortran 90 modules, named @code{OMP_LIB} and 
 @code{OMP_LIB_KINDS}, and in a form of a Fortran @code{include} file named
 @file{omp_lib.h}. The procedures provided by @code{OMP_LIB} can be found
-in the @ref{Top,,Introduction,libgomp,GNU OpenMP runtime library} manual,
+in the @ref{Top,,Introduction,libgomp,GNU OpenACC and OpenMP runtime
+library} manual,
 the named constants defined in the modules are listed
 below.
 
diff --git gcc/fortran/invoke.texi gcc/fortran/invoke.texi
index 67d9d57..2a1d1ad 100644
--- gcc/fortran/invoke.texi
+++ gcc/fortran/invoke.texi
@@ -305,7 +305,12 @@ functionality.
 @item -fopenacc
 @opindex @code{fopenacc}
 @cindex OpenACC
-Enable the OpenACC extensions.
+Enable the OpenACC extensions.  This includes OpenACC @code{!$acc}
+directives in free form and @code{c$acc}, @code{*$acc} and
+@code{!$acc} directives in fixed form, @code{!$} conditional
+compilation sentinels in free form and @code{c$}, @code{*$} and
+@code{!$} sentinels in fixed form, and when linking arranges for the
+OpenACC runtime library to be linked in.
 
 @item -fopenmp
 @opindex @code{fopenmp}
diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index d65ce71..d4cde2f 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,5 +1,7 @@
 2014-11-05  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* libgomp.texi: Update for OpenACC.
+
 	* target.h (struct ACC_dispatch_t): Remove avail_func.  Update all
 	users.
 	* oacc-host.c (openacc_avail): Likewise.
diff --git libgomp/libgomp.texi libgomp/libgomp.texi
index 66c678f..26c65a6 100644
--- libgomp/libgomp.texi
+++ libgomp/libgomp.texi
@@ -122,9 +122,13 @@ for multi-platform shared-memory parallel programming in C/C++ and Fortran.
 @chapter Enabling OpenACC
 
 To activate the OpenACC extensions for C/C++ and Fortran, the compile-time 
-flag @command{-fopenacc} must be specified.  This enables OpenACC, and
+flag @command{-fopenacc} must be specified.  This enables the OpenACC directive
+@code{#pragma acc} in C/C++ and @code{!$accp} directives in free form,
+@code{c$acc}, @code{*$acc} and @code{!$acc} directives in fixed form,
+@code{!$} conditional compilation sentinels in free form and @code{c$},
+@code{*$} and @code{!$} sentinels in fixed form, for Fortran.  The flag also
 arranges for automatic linking of the OpenACC runtime library 
-(@ref{Runtime Library Routines}).
+(@ref{OpenACC Runtime Library Routines}).
 
 A complete description of all OpenACC directives accepted may be found in 
 the @uref{http://www.openacc.org/, OpenMP Application Programming
@@ -171,11 +175,9 @@ acceleration device.
 * acc_is_present::
 * acc_memcpy_to_device::
 * acc_memcpy_from_device::
-@end menu
 
 API routines for target platforms.
 
-@menu
 * acc_get_current_cuda_device::
 * acc_get_current_cuda_context::
 * acc_get_cuda_stream::


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [gomp4] libgomp testsuite: OpenACC C++ testing (was: [2/3] OpenACC 2.0 support for libgomp - new tests)
  2014-10-14 16:33 ` [gomp4] [2/3] OpenACC 2.0 support for libgomp - new tests Julian Brown
  2014-10-14 16:12   ` [gomp] [3/3] OpenACC 2.0 support for libgomp - documentation Julian Brown
  2014-10-28 16:07   ` [gomp4] [2/3] OpenACC 2.0 support for libgomp - new tests Thomas Schwinge
@ 2014-11-05 16:17   ` Thomas Schwinge
  2014-11-13 13:32   ` [gomp4] [2/3] OpenACC 2.0 support for libgomp - new tests Thomas Schwinge
  3 siblings, 0 replies; 12+ messages in thread
From: Thomas Schwinge @ 2014-11-05 16:17 UTC (permalink / raw)
  To: gcc-patches; +Cc: Julian Brown, Jakub Jelinek, James Norris

[-- Attachment #1: Type: text/plain, Size: 71352 bytes --]

Hi!

Applied to gomp-4_0-branch in r217143:

commit a78a06124f4047ec46a85e539e83640cc973aec1
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Wed Nov 5 16:16:14 2014 +0000

    libgomp testsuite: OpenACC C++ testing.
    
    	libgomp/
    	* testsuite/libgomp.oacc-c++/c++.exp: Enable
    	libgomp.oacc-c-c++-common testing.
    	* testsuite/libgomp.oacc-c/c.exp: Likewise.
    	* testsuite/libgomp.oacc-c/abort-2.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/abort-2.c: ... this.
    	* testsuite/libgomp.oacc-c/abort.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/abort.c: ... this.
    	* testsuite/libgomp.oacc-c/acc_on_device-1.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c: ... this.
    	* testsuite/libgomp.oacc-c/clauses-1.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/clauses-1.c: ... this.
    	* testsuite/libgomp.oacc-c/clauses-2.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/clauses-2.c: ... this.
    	* testsuite/libgomp.oacc-c/context-1.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/context-1.c: ... this.
    	* testsuite/libgomp.oacc-c/context-2.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/context-2.c: ... this.
    	* testsuite/libgomp.oacc-c/context-3.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/context-3.c: ... this.
    	* testsuite/libgomp.oacc-c/context-4.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/context-4.c: ... this.
    	* testsuite/libgomp.oacc-c/data-1.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/data-1.c: ... this.
    	* testsuite/libgomp.oacc-c/data-2.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/data-2.c: ... this.
    	* testsuite/libgomp.oacc-c/data-3.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/data-3.c: ... this.
    	* testsuite/libgomp.oacc-c/deviceptr-1.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c: ... this.
    	* testsuite/libgomp.oacc-c/if-1.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/if-1.c: ... this.
    	* testsuite/libgomp.oacc-c/kernels-1.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/kernels-1.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-1.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-1.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-10.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-10.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-11.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-11.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-12.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-12.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-13.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-13.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-14.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-14.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-15.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-15.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-16.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-16.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-17.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-17.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-18.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-18.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-19.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-19.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-2.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-2.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-20.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-20.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-21.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-21.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-22.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-22.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-23.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-23.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-24.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-24.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-25.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-25.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-26.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-26.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-27.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-27.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-28.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-28.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-29.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-29.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-3.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-3.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-30.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-30.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-31.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-31.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-32.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-32.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-33.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-33.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-34.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-34.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-35.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-35.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-36.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-36.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-37.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-37.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-38.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-38.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-39.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-39.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-4.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-4.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-40.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-40.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-41.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-41.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-42.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-42.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-43.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-43.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-44.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-44.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-45.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-45.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-46.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-46.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-47.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-47.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-48.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-48.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-49.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-49.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-5.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-5.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-50.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-50.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-51.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-51.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-52.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-52.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-53.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-53.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-54.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-54.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-55.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-55.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-56.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-56.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-57.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-57.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-58.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-58.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-59.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-59.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-6.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-6.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-60.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-60.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-61.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-61.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-62.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-62.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-63.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-63.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-64.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-64.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-65.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-65.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-66.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-66.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-67.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-67.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-68.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-68.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-69.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-69.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-7.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-7.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-70.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-70.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-71.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-71.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-72.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-72.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-73.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-73.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-74.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-74.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-75.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-75.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-76.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-76.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-77.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-77.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-78.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-78.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-79.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-79.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-80.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-80.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-81.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-81.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-82.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-82.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-83.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-83.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-84.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-84.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-85.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-85.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-86.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-86.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-87.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-87.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-88.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-88.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-89.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-89.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-9.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-9.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-90.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-90.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-91.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-91.c: ... this.
    	* testsuite/libgomp.oacc-c/lib-92.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/lib-92.c: ... this.
    	* testsuite/libgomp.oacc-c/nested-1.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/nested-1.c: ... this.
    	* testsuite/libgomp.oacc-c/nested-2.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/nested-2.c: ... this.
    	* testsuite/libgomp.oacc-c/offset-1.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/offset-1.c: ... this.
    	* testsuite/libgomp.oacc-c/parallel-1.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/parallel-1.c: ... this.
    	* testsuite/libgomp.oacc-c/pointer-align-1.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/pointer-align-1.c: ... this.
    	* testsuite/libgomp.oacc-c/present-1.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/present-1.c: ... this.
    	* testsuite/libgomp.oacc-c/present-2.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/present-2.c: ... this.
    	* testsuite/libgomp.oacc-c/reduction-1.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/reduction-1.c: ... this.
    	* testsuite/libgomp.oacc-c/reduction-2.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/reduction-2.c: ... this.
    	* testsuite/libgomp.oacc-c/reduction-3.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/reduction-3.c: ... this.
    	* testsuite/libgomp.oacc-c/reduction-4.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/reduction-4.c: ... this.
    	* testsuite/libgomp.oacc-c/reduction-5.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: ... this.
    	* testsuite/libgomp.oacc-c/reduction-initial-1.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/reduction-initial-1.c: ... this.
    	* testsuite/libgomp.oacc-c/subr.cu: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/subr.cu: ... this.
    	* testsuite/libgomp.oacc-c/subr.ptx: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/subr.ptx: ... this.
    	* testsuite/libgomp.oacc-c/timer.h: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/timer.h: ... this.
    	* testsuite/libgomp.oacc-c/update-1.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/update-1.c: ... this.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@217143 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp                             | 251 +++++++++++++++++++++
 libgomp/testsuite/libgomp.oacc-c++/c++.exp         |  18 +-
 .../abort-2.c                                      |   0
 .../abort.c                                        |   0
 .../acc_on_device-1.c                              |   0
 .../clauses-1.c                                    |   0
 .../clauses-2.c                                    |   0
 .../context-1.c                                    |   0
 .../context-2.c                                    |   0
 .../context-3.c                                    |   0
 .../context-4.c                                    |   0
 .../data-1.c                                       |   0
 .../data-2.c                                       |   0
 .../data-3.c                                       |   0
 .../deviceptr-1.c                                  |   0
 .../if-1.c                                         |   0
 .../kernels-1.c                                    |   0
 .../lib-1.c                                        |   0
 .../lib-10.c                                       |   0
 .../lib-11.c                                       |   0
 .../lib-12.c                                       |   0
 .../lib-13.c                                       |   0
 .../lib-14.c                                       |   0
 .../lib-15.c                                       |   0
 .../lib-16.c                                       |   0
 .../lib-17.c                                       |   0
 .../lib-18.c                                       |   0
 .../lib-19.c                                       |   0
 .../lib-2.c                                        |   0
 .../lib-20.c                                       |   0
 .../lib-21.c                                       |   0
 .../lib-22.c                                       |   0
 .../lib-23.c                                       |   0
 .../lib-24.c                                       |   0
 .../lib-25.c                                       |   0
 .../lib-26.c                                       |   0
 .../lib-27.c                                       |   0
 .../lib-28.c                                       |   0
 .../lib-29.c                                       |   0
 .../lib-3.c                                        |   0
 .../lib-30.c                                       |   0
 .../lib-31.c                                       |   0
 .../lib-32.c                                       |   0
 .../lib-33.c                                       |   0
 .../lib-34.c                                       |   0
 .../lib-35.c                                       |   0
 .../lib-36.c                                       |   0
 .../lib-37.c                                       |   0
 .../lib-38.c                                       |   0
 .../lib-39.c                                       |   0
 .../lib-4.c                                        |   0
 .../lib-40.c                                       |   0
 .../lib-41.c                                       |   0
 .../lib-42.c                                       |   0
 .../lib-43.c                                       |   0
 .../lib-44.c                                       |   0
 .../lib-45.c                                       |   0
 .../lib-46.c                                       |   0
 .../lib-47.c                                       |   0
 .../lib-48.c                                       |   0
 .../lib-49.c                                       |   0
 .../lib-5.c                                        |   0
 .../lib-50.c                                       |   0
 .../lib-51.c                                       |   0
 .../lib-52.c                                       |   0
 .../lib-53.c                                       |   0
 .../lib-54.c                                       |   0
 .../lib-55.c                                       |   0
 .../lib-56.c                                       |   0
 .../lib-57.c                                       |   0
 .../lib-58.c                                       |   0
 .../lib-59.c                                       |   0
 .../lib-6.c                                        |   0
 .../lib-60.c                                       |   0
 .../lib-61.c                                       |   0
 .../lib-62.c                                       |   0
 .../lib-63.c                                       |   0
 .../lib-64.c                                       |   0
 .../lib-65.c                                       |   0
 .../lib-66.c                                       |   0
 .../lib-67.c                                       |   0
 .../lib-68.c                                       |   0
 .../lib-69.c                                       |   0
 .../lib-7.c                                        |   0
 .../lib-70.c                                       |   0
 .../lib-71.c                                       |   0
 .../lib-72.c                                       |   0
 .../lib-73.c                                       |   0
 .../lib-74.c                                       |   0
 .../lib-75.c                                       |   0
 .../lib-76.c                                       |   0
 .../lib-77.c                                       |   0
 .../lib-78.c                                       |   0
 .../lib-79.c                                       |   0
 .../lib-80.c                                       |   0
 .../lib-81.c                                       |   0
 .../lib-82.c                                       |   0
 .../lib-83.c                                       |   0
 .../lib-84.c                                       |   0
 .../lib-85.c                                       |   0
 .../lib-86.c                                       |   0
 .../lib-87.c                                       |   0
 .../lib-88.c                                       |   0
 .../lib-89.c                                       |   0
 .../lib-9.c                                        |   0
 .../lib-90.c                                       |   0
 .../lib-91.c                                       |   0
 .../lib-92.c                                       |   0
 .../nested-1.c                                     |   0
 .../nested-2.c                                     |   0
 .../offset-1.c                                     |   0
 .../parallel-1.c                                   |   0
 .../pointer-align-1.c                              |   0
 .../present-1.c                                    |   0
 .../present-2.c                                    |   0
 .../reduction-1.c                                  |   0
 .../reduction-2.c                                  |   0
 .../reduction-3.c                                  |   0
 .../reduction-4.c                                  |   0
 .../reduction-5.c                                  |   0
 .../reduction-initial-1.c                          |   0
 .../subr.cu                                        |   0
 .../subr.ptx                                       |   0
 .../timer.h                                        |   0
 .../update-1.c                                     |   0
 libgomp/testsuite/libgomp.oacc-c/c.exp             |   9 +-
 126 files changed, 274 insertions(+), 4 deletions(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index d4cde2f..8dc947d 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,4 +1,255 @@
 2014-11-05  Thomas Schwinge  <thomas@codesourcery.com>
+	    James Norris  <jnorris@codesourcery.com>
+
+	* testsuite/libgomp.oacc-c++/c++.exp: Enable
+	libgomp.oacc-c-c++-common testing.
+	* testsuite/libgomp.oacc-c/c.exp: Likewise.
+	* testsuite/libgomp.oacc-c/abort-2.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/abort-2.c: ... this.
+	* testsuite/libgomp.oacc-c/abort.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/abort.c: ... this.
+	* testsuite/libgomp.oacc-c/acc_on_device-1.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c: ... this.
+	* testsuite/libgomp.oacc-c/clauses-1.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/clauses-1.c: ... this.
+	* testsuite/libgomp.oacc-c/clauses-2.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/clauses-2.c: ... this.
+	* testsuite/libgomp.oacc-c/context-1.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/context-1.c: ... this.
+	* testsuite/libgomp.oacc-c/context-2.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/context-2.c: ... this.
+	* testsuite/libgomp.oacc-c/context-3.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/context-3.c: ... this.
+	* testsuite/libgomp.oacc-c/context-4.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/context-4.c: ... this.
+	* testsuite/libgomp.oacc-c/data-1.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/data-1.c: ... this.
+	* testsuite/libgomp.oacc-c/data-2.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/data-2.c: ... this.
+	* testsuite/libgomp.oacc-c/data-3.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/data-3.c: ... this.
+	* testsuite/libgomp.oacc-c/deviceptr-1.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c: ... this.
+	* testsuite/libgomp.oacc-c/if-1.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/if-1.c: ... this.
+	* testsuite/libgomp.oacc-c/kernels-1.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/kernels-1.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-1.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-1.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-10.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-10.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-11.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-11.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-12.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-12.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-13.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-13.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-14.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-14.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-15.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-15.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-16.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-16.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-17.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-17.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-18.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-18.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-19.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-19.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-2.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-2.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-20.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-20.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-21.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-21.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-22.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-22.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-23.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-23.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-24.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-24.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-25.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-25.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-26.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-26.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-27.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-27.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-28.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-28.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-29.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-29.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-3.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-3.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-30.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-30.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-31.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-31.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-32.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-32.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-33.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-33.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-34.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-34.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-35.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-35.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-36.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-36.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-37.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-37.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-38.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-38.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-39.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-39.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-4.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-4.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-40.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-40.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-41.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-41.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-42.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-42.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-43.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-43.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-44.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-44.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-45.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-45.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-46.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-46.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-47.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-47.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-48.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-48.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-49.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-49.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-5.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-5.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-50.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-50.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-51.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-51.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-52.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-52.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-53.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-53.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-54.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-54.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-55.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-55.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-56.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-56.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-57.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-57.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-58.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-58.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-59.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-59.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-6.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-6.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-60.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-60.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-61.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-61.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-62.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-62.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-63.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-63.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-64.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-64.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-65.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-65.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-66.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-66.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-67.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-67.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-68.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-68.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-69.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-69.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-7.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-7.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-70.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-70.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-71.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-71.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-72.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-72.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-73.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-73.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-74.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-74.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-75.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-75.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-76.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-76.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-77.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-77.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-78.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-78.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-79.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-79.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-80.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-80.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-81.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-81.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-82.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-82.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-83.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-83.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-84.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-84.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-85.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-85.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-86.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-86.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-87.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-87.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-88.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-88.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-89.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-89.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-9.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-9.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-90.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-90.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-91.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-91.c: ... this.
+	* testsuite/libgomp.oacc-c/lib-92.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/lib-92.c: ... this.
+	* testsuite/libgomp.oacc-c/nested-1.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/nested-1.c: ... this.
+	* testsuite/libgomp.oacc-c/nested-2.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/nested-2.c: ... this.
+	* testsuite/libgomp.oacc-c/offset-1.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/offset-1.c: ... this.
+	* testsuite/libgomp.oacc-c/parallel-1.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/parallel-1.c: ... this.
+	* testsuite/libgomp.oacc-c/pointer-align-1.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/pointer-align-1.c: ... this.
+	* testsuite/libgomp.oacc-c/present-1.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/present-1.c: ... this.
+	* testsuite/libgomp.oacc-c/present-2.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/present-2.c: ... this.
+	* testsuite/libgomp.oacc-c/reduction-1.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/reduction-1.c: ... this.
+	* testsuite/libgomp.oacc-c/reduction-2.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/reduction-2.c: ... this.
+	* testsuite/libgomp.oacc-c/reduction-3.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/reduction-3.c: ... this.
+	* testsuite/libgomp.oacc-c/reduction-4.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/reduction-4.c: ... this.
+	* testsuite/libgomp.oacc-c/reduction-5.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: ... this.
+	* testsuite/libgomp.oacc-c/reduction-initial-1.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/reduction-initial-1.c: ... this.
+	* testsuite/libgomp.oacc-c/subr.cu: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/subr.cu: ... this.
+	* testsuite/libgomp.oacc-c/subr.ptx: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/subr.ptx: ... this.
+	* testsuite/libgomp.oacc-c/timer.h: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/timer.h: ... this.
+	* testsuite/libgomp.oacc-c/update-1.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/update-1.c: ... this.
 
 	* libgomp.texi: Update for OpenACC.
 
diff --git libgomp/testsuite/libgomp.oacc-c++/c++.exp libgomp/testsuite/libgomp.oacc-c++/c++.exp
index 3b64da7..9d5bf0b 100644
--- libgomp/testsuite/libgomp.oacc-c++/c++.exp
+++ libgomp/testsuite/libgomp.oacc-c++/c++.exp
@@ -24,6 +24,11 @@ dg-init
 # XXX (TEMPORARY): Remove the -flto once that's properly integrated.
 lappend ALWAYS_CFLAGS "additional_flags=-fopenacc -flto"
 
+# TODO.  Switch into C++ mode.  Otherwise, the libgomp.oacc-c-c++-common/*.c
+# files would be compiled as C files.
+set SAVE_GCC_UNDER_TEST "$GCC_UNDER_TEST"
+set GCC_UNDER_TEST "$GCC_UNDER_TEST -x c++"
+
 set blddir [lookfor_file [get_multilibs] libgomp]
 
 
@@ -49,7 +54,9 @@ if { $blddir != "" } {
 
 if { $lang_test_file_found } {
     # Gather a list of all tests.
-    set tests [lsort [glob -nocomplain $srcdir/$subdir/*.C]]
+    set tests [lsort [concat \
+			  [find $srcdir/$subdir *.C] \
+			  [find $srcdir/$subdir/../libgomp.oacc-c-c++-common *.c]]]
 
     if { $blddir != "" } {
         set ld_library_path "$always_ld_library_path:${blddir}/${lang_library_path}"
@@ -85,6 +92,12 @@ if { $lang_test_file_found } {
 		set acc_mem_shared 0
 	    }
 	    nvidia {
+		# Copy ptx file (TEMPORARY)
+		remote_download host $srcdir/libgomp.oacc-c-c++-common/subr.ptx
+
+		# Where timer.h lives
+		lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}/libgomp.oacc-c-c++-common"
+
 		set acc_mem_shared 0
 	    }
 	    default {
@@ -100,5 +113,8 @@ if { $lang_test_file_found } {
     }
 }
 
+# TODO.  See above.
+set GCC_UNDER_TEST "$SAVE_GCC_UNDER_TEST"
+
 # All done.
 dg-finish
diff --git libgomp/testsuite/libgomp.oacc-c/abort-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/abort-2.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/abort-2.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/abort-2.c
diff --git libgomp/testsuite/libgomp.oacc-c/abort.c libgomp/testsuite/libgomp.oacc-c-c++-common/abort.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/abort.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/abort.c
diff --git libgomp/testsuite/libgomp.oacc-c/acc_on_device-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/acc_on_device-1.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c
diff --git libgomp/testsuite/libgomp.oacc-c/clauses-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-1.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/clauses-1.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-1.c
diff --git libgomp/testsuite/libgomp.oacc-c/clauses-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-2.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/clauses-2.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-2.c
diff --git libgomp/testsuite/libgomp.oacc-c/context-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/context-1.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/context-1.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/context-1.c
diff --git libgomp/testsuite/libgomp.oacc-c/context-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/context-2.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/context-2.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/context-2.c
diff --git libgomp/testsuite/libgomp.oacc-c/context-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/context-3.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/context-3.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/context-3.c
diff --git libgomp/testsuite/libgomp.oacc-c/context-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/context-4.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/context-4.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/context-4.c
diff --git libgomp/testsuite/libgomp.oacc-c/data-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-1.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/data-1.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/data-1.c
diff --git libgomp/testsuite/libgomp.oacc-c/data-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-2.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/data-2.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/data-2.c
diff --git libgomp/testsuite/libgomp.oacc-c/data-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/data-3.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c
diff --git libgomp/testsuite/libgomp.oacc-c/deviceptr-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/deviceptr-1.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c
diff --git libgomp/testsuite/libgomp.oacc-c/if-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/if-1.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c
diff --git libgomp/testsuite/libgomp.oacc-c/kernels-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/kernels-1.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-1.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-1.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-1.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-10.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-10.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-10.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-10.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-11.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-11.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-11.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-11.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-12.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-12.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-12.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-12.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-13.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-13.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-13.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-13.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-14.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-14.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-14.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-14.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-15.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-15.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-15.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-15.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-16.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-16.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-16.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-16.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-17.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-17.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-17.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-17.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-18.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-18.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-18.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-18.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-19.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-19.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-19.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-19.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-2.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-2.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-2.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-20.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-20.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-20.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-20.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-21.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-21.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-21.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-21.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-22.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-22.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-22.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-22.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-23.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-23.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-23.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-23.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-24.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-24.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-24.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-24.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-25.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-25.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-25.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-25.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-26.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-26.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-26.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-26.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-27.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-27.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-27.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-27.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-28.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-28.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-28.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-28.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-29.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-29.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-29.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-29.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-3.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-3.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-3.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-30.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-30.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-30.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-30.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-31.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-31.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-31.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-31.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-32.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-32.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-32.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-32.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-33.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-33.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-33.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-33.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-34.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-34.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-34.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-34.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-35.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-35.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-35.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-35.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-36.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-36.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-36.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-36.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-37.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-37.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-37.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-37.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-38.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-38.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-38.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-38.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-39.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-39.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-39.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-39.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-4.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-4.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-4.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-40.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-40.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-40.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-40.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-41.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-41.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-41.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-41.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-42.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-42.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-42.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-42.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-43.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-43.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-43.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-43.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-44.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-44.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-44.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-44.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-45.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-45.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-45.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-45.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-46.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-46.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-46.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-46.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-47.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-47.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-47.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-47.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-48.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-48.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-48.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-48.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-49.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-49.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-49.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-49.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-5.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-5.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-5.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-5.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-50.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-50.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-50.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-50.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-51.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-51.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-51.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-51.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-52.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-52.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-52.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-52.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-53.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-53.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-53.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-53.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-54.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-54.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-54.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-54.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-55.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-55.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-55.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-55.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-56.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-56.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-56.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-56.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-57.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-57.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-57.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-57.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-58.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-58.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-58.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-58.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-59.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-59.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-59.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-59.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-6.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-6.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-6.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-6.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-60.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-60.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-60.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-60.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-61.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-61.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-61.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-61.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-62.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-62.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-62.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-62.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-63.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-63.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-63.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-63.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-64.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-64.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-64.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-64.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-65.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-65.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-65.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-65.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-66.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-66.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-66.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-66.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-67.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-67.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-67.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-67.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-68.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-68.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-68.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-68.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-69.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-69.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-69.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-69.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-7.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-7.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-7.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-7.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-70.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-70.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-70.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-70.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-71.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-71.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-71.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-71.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-72.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-72.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-72.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-72.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-73.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-73.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-73.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-73.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-74.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-74.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-74.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-74.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-75.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-75.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-75.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-75.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-76.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-76.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-76.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-76.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-77.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-77.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-77.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-77.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-78.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-78.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-78.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-78.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-79.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-79.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-79.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-79.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-80.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-80.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-80.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-80.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-81.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-81.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-81.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-81.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-82.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-82.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-82.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-82.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-83.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-83.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-83.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-83.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-84.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-84.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-84.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-84.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-85.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-85.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-85.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-85.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-86.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-86.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-86.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-86.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-87.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-87.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-87.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-87.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-88.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-88.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-88.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-88.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-89.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-89.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-89.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-89.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-9.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-9.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-9.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-9.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-90.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-90.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-90.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-90.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-91.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-91.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-91.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-91.c
diff --git libgomp/testsuite/libgomp.oacc-c/lib-92.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-92.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/lib-92.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/lib-92.c
diff --git libgomp/testsuite/libgomp.oacc-c/nested-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/nested-1.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/nested-1.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/nested-1.c
diff --git libgomp/testsuite/libgomp.oacc-c/nested-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/nested-2.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
diff --git libgomp/testsuite/libgomp.oacc-c/offset-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/offset-1.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/offset-1.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/offset-1.c
diff --git libgomp/testsuite/libgomp.oacc-c/parallel-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/parallel-1.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c
diff --git libgomp/testsuite/libgomp.oacc-c/pointer-align-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/pointer-align-1.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/pointer-align-1.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/pointer-align-1.c
diff --git libgomp/testsuite/libgomp.oacc-c/present-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/present-1.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/present-1.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/present-1.c
diff --git libgomp/testsuite/libgomp.oacc-c/present-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/present-2.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/present-2.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/present-2.c
diff --git libgomp/testsuite/libgomp.oacc-c/reduction-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/reduction-1.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c
diff --git libgomp/testsuite/libgomp.oacc-c/reduction-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-2.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/reduction-2.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-2.c
diff --git libgomp/testsuite/libgomp.oacc-c/reduction-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-3.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/reduction-3.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-3.c
diff --git libgomp/testsuite/libgomp.oacc-c/reduction-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/reduction-4.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c
diff --git libgomp/testsuite/libgomp.oacc-c/reduction-5.c libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/reduction-5.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c
diff --git libgomp/testsuite/libgomp.oacc-c/reduction-initial-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-initial-1.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/reduction-initial-1.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-initial-1.c
diff --git libgomp/testsuite/libgomp.oacc-c/subr.cu libgomp/testsuite/libgomp.oacc-c-c++-common/subr.cu
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/subr.cu
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/subr.cu
diff --git libgomp/testsuite/libgomp.oacc-c/subr.ptx libgomp/testsuite/libgomp.oacc-c-c++-common/subr.ptx
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/subr.ptx
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/subr.ptx
diff --git libgomp/testsuite/libgomp.oacc-c/timer.h libgomp/testsuite/libgomp.oacc-c-c++-common/timer.h
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/timer.h
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/timer.h
diff --git libgomp/testsuite/libgomp.oacc-c/update-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/update-1.c
similarity index 100%
rename from libgomp/testsuite/libgomp.oacc-c/update-1.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/update-1.c
diff --git libgomp/testsuite/libgomp.oacc-c/c.exp libgomp/testsuite/libgomp.oacc-c/c.exp
index 7559afa..0c31447 100644
--- libgomp/testsuite/libgomp.oacc-c/c.exp
+++ libgomp/testsuite/libgomp.oacc-c/c.exp
@@ -31,7 +31,9 @@ dg-init
 lappend ALWAYS_CFLAGS "additional_flags=-fopenacc -flto"
 
 # Gather a list of all tests.
-set tests [lsort [find $srcdir/$subdir *.c]]
+set tests [lsort [concat \
+		      [find $srcdir/$subdir *.c] \
+		      [find $srcdir/$subdir/../libgomp.oacc-c-c++-common *.c]]]
 
 set ld_library_path $always_ld_library_path
 append ld_library_path [gcc-set-multilib-library-path $GCC_UNDER_TEST]
@@ -59,10 +61,11 @@ foreach accel $accels {
 	}
 	nvidia {
 	    # Copy ptx file (TEMPORARY)
-	    remote_download host $srcdir/libgomp.oacc-c/subr.ptx
+	    remote_download host $srcdir/libgomp.oacc-c-c++-common/subr.ptx
 
 	    # Where timer.h lives
-	    lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}"
+	    lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}/libgomp.oacc-c-c++-common"
+
 	    set acc_mem_shared 0
 	}
 	default {


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [gomp4] [2/3] OpenACC 2.0 support for libgomp - new tests
  2014-10-14 16:33 ` [gomp4] [2/3] OpenACC 2.0 support for libgomp - new tests Julian Brown
                     ` (2 preceding siblings ...)
  2014-11-05 16:17   ` [gomp4] libgomp testsuite: OpenACC C++ " Thomas Schwinge
@ 2014-11-13 13:32   ` Thomas Schwinge
  3 siblings, 0 replies; 12+ messages in thread
From: Thomas Schwinge @ 2014-11-13 13:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: Julian Brown, James Norris

[-- Attachment #1: Type: text/plain, Size: 2385 bytes --]

Hi!

On Tue, 14 Oct 2014 17:11:42 +0100, Julian Brown <julian@codesourcery.com> wrote:
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c/context-2.c

> +    float *h_X, [...]

> +    h_X = (float *) malloc (N * sizeof (float));

> +    d_X = (float *) acc_copyin (&h_X[0], N * sizeof (float));

> +#pragma acc parallel copyin (h_X[0:N]), copy (h_Y2[0:N]) copyin (alpha)

As made apparent by a testsuite regression, after gomp-4_0-branch commit
r217482,
<http://news.gmane.org/find-root.php?message_id=%3C87fvdncmxg.fsf%40kepler.schwinge.homeip.net%3E>,
this needs to be changed as follows; committed to gomp-4_0-branch in
r217484:

commit 0a74cef871aefd56e577d19864f80e78b6af09e8
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Thu Nov 13 13:22:54 2014 +0000

    libgomp testsuite: Fix data clause.
    
    ... after having extended libgomp to actually distinguish between
    "non-force"/"force" semantics.
    
    	libgomp/
    	* testsuite/libgomp.oacc-c-c++-common/context-2.c: Fix data
    	clause.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@217484 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp                                  | 3 +++
 libgomp/testsuite/libgomp.oacc-c-c++-common/context-2.c | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 254846f..a5a58a0 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,5 +1,8 @@
 2014-11-13  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* testsuite/libgomp.oacc-c-c++-common/context-2.c: Fix data
+	clause.
+
 	* target.c (gomp_map_vars_existing): Error out if "force"
 	semantics.
 	(gomp_map_vars): Actually pass kinds to gomp_map_vars_existing.
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/context-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/context-2.c
index 16464d5..6a52f74 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/context-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/context-2.c
@@ -149,7 +149,7 @@ main (int argc, char **argv)
 
     context_check (pctx);
 
-#pragma acc parallel copyin (h_X[0:N]), copy (h_Y2[0:N]) copyin (alpha)
+#pragma acc parallel present (h_X[0:N]), copy (h_Y2[0:N]) copyin (alpha)
     {
         int i;
 


Grüße,
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [gomp4] [1/3] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin
       [not found] ` <541877C3.6080507@mentor.com>
@ 2014-12-22 17:33   ` Thomas Schwinge
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Schwinge @ 2014-12-22 17:33 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek; +Cc: Julian Brown, James Norris

[-- Attachment #1: Type: text/plain, Size: 11473 bytes --]

Hi!

We had committed to gomp-4_0-branch code to handle TO_PSET memory mapping
(used with Fortran allocatable arrays).  It turns out that is not
actually useful; committed to gomp-4_0-branch in r219022:

commit b2c3a33803b074052c5178fb1b6cabbd834cfa72
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Mon Dec 22 17:12:40 2014 +0000

    libgomp: Remove the GOMP_MAP_TO_PSET handling code that we once added.
    
    	libgomp/
    	* target.c (gomp_map_vars) <GOMP_MAP_TO_PSET>: Revert earlier
    	changes.
    
    With Intel MIC offloading (emulation), this fixes:
    
        FAIL: libgomp.fortran/examples-4/e.55.2.f90   -O0  execution test
        FAIL: libgomp.fortran/examples-4/e.55.2.f90   -O1  execution test
        FAIL: libgomp.fortran/examples-4/e.55.2.f90   -O2  execution test
        FAIL: libgomp.fortran/examples-4/e.55.2.f90   -O3 -fomit-frame-pointer  execution test
        FAIL: libgomp.fortran/examples-4/e.55.2.f90   -O3 -fomit-frame-pointer -funroll-loops  execution test
        FAIL: libgomp.fortran/examples-4/e.55.2.f90   -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  execution test
        FAIL: libgomp.fortran/examples-4/e.55.2.f90   -O3 -g  execution test
        FAIL: libgomp.fortran/examples-4/e.55.2.f90   -Os  execution test
        FAIL: libgomp.fortran/target3.f90   -O0  execution test
        FAIL: libgomp.fortran/target3.f90   -O1  execution test
        FAIL: libgomp.fortran/target3.f90   -O2  execution test
        FAIL: libgomp.fortran/target3.f90   -O3 -fomit-frame-pointer  execution test
        FAIL: libgomp.fortran/target3.f90   -O3 -fomit-frame-pointer -funroll-loops  execution test
        FAIL: libgomp.fortran/target3.f90   -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  execution test
        FAIL: libgomp.fortran/target3.f90   -O3 -g  execution test
        FAIL: libgomp.fortran/target3.f90   -Os  execution test
    
    ... for which Valgrind had reported:
    
        ==21161== Conditional jump or move depends on uninitialised value(s)
        ==21161==    at 0x547233D: gomp_map_vars (target.c:267)
        ==21161==    by 0x54743C3: GOMP_target_data (target.c:934)
        ==21161==    by 0x400E6F: vec_mult_ (e.55.2.f90:38)
        ==21161==    by 0x4011C9: MAIN__ (e.55.2.f90:55)
        ==21161==    by 0x401200: main (e.55.2.f90:56)
        *** Error in `/tmp/offload_aCxI50/offload_target_main': corrupted double-linked list: 0x0000000000c8b9e0 ***
    
    The OpenACC PSET test cases still work.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@219022 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp |   3 +
 libgomp/target.c       | 214 ++++++++++++++++---------------------------------
 2 files changed, 72 insertions(+), 145 deletions(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 898040d..26fdfe6 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,5 +1,8 @@
 2014-12-22  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* target.c (gomp_map_vars) <GOMP_MAP_TO_PSET>: Revert earlier
+	changes.
+
 	* libgomp.h (TARGET_CAP_SHARED_MEM, TARGET_CAP_NATIVE_EXEC)
 	(TARGET_CAP_OPENMP_400, TARGET_CAP_OPENACC_200): Remove, and
 	instead...
diff --git libgomp/target.c libgomp/target.c
index dadcc03..423bbee 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -154,27 +154,6 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
   tgt->device_descr = devicep;
   tgt->mem_map = mm;
 
-  /* From gcc/fortran/trans-types.c  */
-  struct descriptor_dimension
-    {
-      long stride;
-      long lbound;
-      long ubound;
-    };
-
-   struct gfc_array_descriptor
-     {
-       void *data;
-       long offset;
-       long dtype;
-       struct descriptor_dimension dimension[];
-     };
-
-#define GFC_DTYPE_RANK_MASK     0x07
-#define GFC_DTYPE_TYPE_MASK     0x38
-#define GFC_DTYPE_TYPE_SHIFT    3
-#define GFC_DTYPE_SIZE_SHIFT    6
-
   if (mapnum == 0)
     return tgt;
 
@@ -210,45 +189,6 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	{
 	  tgt->list[i] = NULL;
 
-	  if ((kind & typemask) == GOMP_MAP_TO_PSET)
-	    {
-	      struct gfc_array_descriptor *gad;
-	      size_t rank;
-	      int j;
-              bool alloc_arrays = true;
-
-	      for (j = i - 1; j >= 0; j--)
-		{
-		  if (hostaddrs[j] == *(void**)hostaddrs[i])
-		    {
-		      alloc_arrays = false;
-		      break;
-		    }
-		}
-
-	      gad = (struct gfc_array_descriptor *) cur_node.host_start;
-	      rank = gad->dtype & GFC_DTYPE_RANK_MASK;
-
-	      cur_node.host_start = (uintptr_t)gad->data;
-	      cur_node.host_end = cur_node.host_start +
-				sizeof (struct gfc_array_descriptor) +
-				(sizeof (struct descriptor_dimension) * rank);
-
-	      if (alloc_arrays)
-                {
-                  size_t tsize;
-
-                  tsize = gad->dtype >> GFC_DTYPE_SIZE_SHIFT;
-
-                  for (j = 0; j < rank; j++)
-                    {
-                      cur_node.host_end += tsize *
-                        (gad->dimension[j].ubound -
-                         gad->dimension[j].lbound + 1);
-                    }
-                }
-	    }
-
 	  size_t align = (size_t) 1 << (kind >> rshift);
 	  not_found_cnt++;
 	  if (tgt_align < align)
@@ -419,92 +359,81 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 					    sizeof (void *));
 		    break;
 		  case GOMP_MAP_TO_PSET:
-		    {
-		      /* Copy from host to device memory.  */
-		      /* FIXME: see above FIXME comment.  */
-		      devicep->host2dev_func (devicep->target_id,
-					      (void *) (tgt->tgt_start
-							+ k->tgt_offset),
-					      (void *) k->host_start,
-					      (k->host_end - k->host_start));
-		      devicep->host2dev_func (devicep->target_id,
-					      (void *) (tgt->tgt_start
-							+ k->tgt_offset),
-					      (void *) &tgt->tgt_start,
-					      sizeof (void *));
+		    /* Copy from host to device memory.  */
+		    /* FIXME: see above FIXME comment.  */
+		    devicep->host2dev_func (devicep->target_id,
+					    (void *) (tgt->tgt_start
+						      + k->tgt_offset),
+					    (void *) k->host_start,
+					    k->host_end - k->host_start);
 
-		      for (j = i + 1; j < mapnum; j++)
-			if (!GOMP_MAP_POINTER_P (get_kind (is_openacc, kinds, j)
-						 & typemask))
-			  break;
-			else if ((uintptr_t) hostaddrs[j] < k->host_start
-				 || ((uintptr_t) hostaddrs[j] + sizeof (void *)
-				     > k->host_end))
-			  break;
-			else
-			  {
-			    tgt->list[j] = k;
-			    k->refcount++;
-			    cur_node.host_start
-			      = (uintptr_t) *(void **) hostaddrs[j];
-			    if (cur_node.host_start == (uintptr_t) NULL)
-			      {
-			        cur_node.tgt_offset = (uintptr_t) NULL;
-			        /* Copy from host to device memory.  */
-			        /* FIXME: see above FIXME comment.  */
-			        devicep->host2dev_func (devicep->target_id,
-							(void *) (tgt->tgt_start
-								  + k->tgt_offset
-								  + ((uintptr_t) hostaddrs[j]
-								     - k->host_start)),
-							(void *) &cur_node.tgt_offset,
-							sizeof (void *));
-			        i++;
-			        continue;
-			      }
-			    /* Add bias to the pointer value.  */
-			    cur_node.host_start += sizes[j];
-			    cur_node.host_end = cur_node.host_start + 1;
-			    n = splay_tree_lookup (&mm->splay_tree, &cur_node);
-			    if (n == NULL)
-			      {
-			        /* Could be possibly zero size array
-				   section.  */
-			        cur_node.host_end--;
-			        n = splay_tree_lookup (&mm->splay_tree,
+		    for (j = i + 1; j < mapnum; j++)
+		      if (!GOMP_MAP_POINTER_P (get_kind (is_openacc, kinds, j)
+					       & typemask))
+			break;
+		      else if ((uintptr_t) hostaddrs[j] < k->host_start
+			       || ((uintptr_t) hostaddrs[j] + sizeof (void *)
+				   > k->host_end))
+			break;
+		      else
+			{
+			  tgt->list[j] = k;
+			  k->refcount++;
+			  cur_node.host_start
+			    = (uintptr_t) *(void **) hostaddrs[j];
+			  if (cur_node.host_start == (uintptr_t) NULL)
+			    {
+			      cur_node.tgt_offset = (uintptr_t) NULL;
+			      /* Copy from host to device memory.  */
+			      /* FIXME: see above FIXME comment.  */
+			      devicep->host2dev_func (devicep->target_id,
+				 (void *) (tgt->tgt_start + k->tgt_offset
+					   + ((uintptr_t) hostaddrs[j]
+					      - k->host_start)),
+				 (void *) &cur_node.tgt_offset,
+				 sizeof (void *));
+			      i++;
+			      continue;
+			    }
+			  /* Add bias to the pointer value.  */
+			  cur_node.host_start += sizes[j];
+			  cur_node.host_end = cur_node.host_start + 1;
+			  n = splay_tree_lookup (&mm->splay_tree, &cur_node);
+			  if (n == NULL)
+			    {
+			      /* Could be possibly zero size array section.  */
+			      cur_node.host_end--;
+			      n = splay_tree_lookup (&mm->splay_tree,
 						     &cur_node);
-			        if (n == NULL)
-				  {
-				    cur_node.host_start--;
-				    n = splay_tree_lookup (&mm->splay_tree,
+			      if (n == NULL)
+				{
+				  cur_node.host_start--;
+				  n = splay_tree_lookup (&mm->splay_tree,
 							 &cur_node);
-				    cur_node.host_start++;
-				  }
-			      }
-			    if (n == NULL)
-				gomp_fatal ("Pointer target of array section "
+				  cur_node.host_start++;
+				}
+			    }
+			  if (n == NULL)
+			    gomp_fatal ("Pointer target of array section "
 					"wasn't mapped");
-			    cur_node.host_start -= n->host_start;
-			    cur_node.tgt_offset = n->tgt->tgt_start
+			  cur_node.host_start -= n->host_start;
+			  cur_node.tgt_offset = n->tgt->tgt_start
 						+ n->tgt_offset
 						+ cur_node.host_start;
-			    /* At this point tgt_offset is target address of the
-			       array section.  Now subtract bias to get what we
-			       want to initialize the pointer with.  */
-			    cur_node.tgt_offset -= sizes[j];
-			    /* Copy from host to device memory.  */
-			    /* FIXME: see above FIXME comment.  */
-
-			    devicep->host2dev_func (devicep->target_id,
-						    (void *) (tgt->tgt_start
-							      + k->tgt_offset
-							      + ((uintptr_t) hostaddrs[j]
-								 - k->host_start)),
-						    (void *) &cur_node.tgt_offset,
-						    sizeof (void *));
-			    i++;
-			  }
-		    }
+			  /* At this point tgt_offset is target address of the
+			     array section.  Now subtract bias to get what we
+			     want to initialize the pointer with.  */
+			  cur_node.tgt_offset -= sizes[j];
+			  /* Copy from host to device memory.  */
+			  /* FIXME: see above FIXME comment.  */
+			  devicep->host2dev_func (devicep->target_id,
+			     (void *) (tgt->tgt_start + k->tgt_offset
+				       + ((uintptr_t) hostaddrs[j]
+					  - k->host_start)),
+			     (void *) &cur_node.tgt_offset,
+			     sizeof (void *));
+			  i++;
+			}
 		    break;
 		  case GOMP_MAP_FORCE_PRESENT:
 		    {
@@ -534,11 +463,6 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	  }
     }
 
-#undef GFC_DTYPE_RANK_MASK
-#undef GFC_DTYPE_TYPE_MASK
-#undef GFC_DTYPE_TYPE_SHIFT
-#undef GFC_DTYPE_SIZE_SHIFT
-
   if (is_target)
     {
       for (i = 0; i < mapnum; i++)


Grüße,
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-12-22 17:14 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-14 16:12 [gomp4] [1/3] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin Julian Brown
2014-10-14 16:33 ` [gomp4] [2/3] OpenACC 2.0 support for libgomp - new tests Julian Brown
2014-10-14 16:12   ` [gomp] [3/3] OpenACC 2.0 support for libgomp - documentation Julian Brown
2014-10-16 17:06     ` [gomp4] [1/3] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin Thomas Schwinge
2014-11-05 16:13     ` [gomp4] OpenACC documentation updates Thomas Schwinge
2014-10-28 16:07   ` [gomp4] [2/3] OpenACC 2.0 support for libgomp - new tests Thomas Schwinge
2014-10-29 19:54     ` [gomp4] libgomp: Also consider --with-cuda-driver flags for build-tree testing (was: [2/3] OpenACC 2.0 support for libgomp - new tests) Thomas Schwinge
2014-11-05 16:17   ` [gomp4] libgomp testsuite: OpenACC C++ " Thomas Schwinge
2014-11-13 13:32   ` [gomp4] [2/3] OpenACC 2.0 support for libgomp - new tests Thomas Schwinge
2014-10-28 16:15 ` [gomp4] [1/3] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin Thomas Schwinge
2014-10-28 19:42 ` [gomp4] Synchronous mode? (was: [1/3] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin) Thomas Schwinge
     [not found] ` <541877C3.6080507@mentor.com>
2014-12-22 17:33   ` [gomp4] [1/3] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin Thomas Schwinge

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).