Index: libgomp.texi =================================================================== --- libgomp.texi (revision 231662) +++ libgomp.texi (working copy) @@ -94,10 +94,25 @@ @comment better formatting. @comment @menu +* Enabling OpenACC:: How to enable OpenACC for your + applications. +* OpenACC Runtime Library Routines:: + The OpenACC runtime application + programming interface. +* OpenACC Environment Variables:: + Influencing OpenACC runtime behavior with + environment variables. +* CUDA Streams Usage:: Notes on the implementation of + asynchronous operations. +* OpenACC Library Interoperability:: + OpenACC library interoperability with the + NVIDIA CUBLAS library. * Enabling OpenMP:: How to enable OpenMP for your applications. -* Runtime Library Routines:: The OpenMP runtime application programming +* OpenMP Runtime Library Routines:: + The OpenMP runtime application programming interface. -* Environment Variables:: Influencing runtime behavior with environment +* OpenMP Environment Variables:: + Influencing runtime behavior with environment variables. * The libgomp ABI:: Notes on the external ABI presented by libgomp. * Reporting Bugs:: How to report bugs in the GNU Offloading and @@ -113,6 +128,643 @@ @c --------------------------------------------------------------------- +@c Enabling OpenACC +@c --------------------------------------------------------------------- + +@node Enabling OpenACC +@chapter Enabling OpenACC + +To activate the OpenACC extensions for C/C++ and Fortran, the compile-time +flag @command{-fopenacc} must be specified. This enables the OpenACC directive +@code{#pragma acc} in C/C++ and @code{!$accp} directives in free form, +@code{c$acc}, @code{*$acc} and @code{!$acc} directives in fixed form, +@code{!$} conditional compilation sentinels in free form and @code{c$}, +@code{*$} and @code{!$} sentinels in fixed form, for Fortran. The flag also +arranges for automatic linking of the OpenACC runtime library +(@ref{OpenACC Runtime Library Routines}). + +A complete description of all OpenACC directives accepted may be found in +the @uref{http://www.openacc.org/, OpenMP Application Programming +Interface} manual, version 2.0. + +Note that this is an experimental feature, incomplete, and subject to +change in future versions of GCC. See +@uref{https://gcc.gnu.org/wiki/OpenACC} for more information. + + + +@c --------------------------------------------------------------------- +@c OpenACC Runtime Library Routines +@c --------------------------------------------------------------------- + +@node OpenACC Runtime Library Routines +@chapter OpenACC Runtime Library Routines + +The runtime routines described here are defined by section 3 of the OpenACC +specifications in version 2.0. +They have C linkage, and do not throw exceptions. +Generally, they are available only for the host, with the exception of +@code{acc_on_device}, which is available for both the host and the +acceleration device. + +@menu +* acc_get_num_devices:: Get number of devices for the given device type +* acc_set_device_type:: +* acc_get_device_type:: +* acc_set_device_num:: +* acc_get_device_num:: +* acc_init:: +* acc_shutdown:: +* acc_on_device:: Whether executing on a particular device +* acc_malloc:: +* acc_free:: +* acc_copyin:: +* acc_present_or_copyin:: +* acc_create:: +* acc_present_or_create:: +* acc_copyout:: +* acc_delete:: +* acc_update_device:: +* acc_update_self:: +* acc_map_data:: +* acc_unmap_data:: +* acc_deviceptr:: +* acc_hostptr:: +* acc_is_present:: +* acc_memcpy_to_device:: +* acc_memcpy_from_device:: + +API routines for target platforms. + +* acc_get_current_cuda_device:: +* acc_get_current_cuda_context:: +* acc_get_cuda_stream:: +* acc_set_cuda_stream:: +@end menu + + + +@node acc_get_num_devices +@section @code{acc_get_num_devices} -- Get number of devices for given device type +@table @asis +@item @emph{Description} +This routine returns a value indicating the +number of devices available for the given device type. It determines +the number of devices in a @emph{passive} manner. In other words, it +does not alter the state within the runtime environment aside from +possibly initializing an uninitialized device. This aspect allows +the routine to be called without concern for altering the interaction +with an attached accelerator device. + +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +3.2.1. +@end table + + + +@node acc_set_device_type +@section @code{acc_set_device_type} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +3.2.2. +@end table + + + +@node acc_get_device_type +@section @code{acc_get_device_type} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +3.2.3. +@end table + + + +@node acc_set_device_num +@section @code{acc_set_device_num} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +3.2.4. +@end table + + + +@node acc_get_device_num +@section @code{acc_get_device_num} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +3.2.5. +@end table + + + +@node acc_init +@section @code{acc_init} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +3.2.12. +@end table + + + +@node acc_shutdown +@section @code{acc_shutdown} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +3.2.13. +@end table + + + +@node acc_on_device +@section @code{acc_on_device} -- Whether executing on a particular device +@table @asis +@item @emph{Description}: +This routine tells the program whether it is executing on a particular +device. Based on the argument passed, GCC tries to evaluate this to a +constant at compile time, but library functions are also provided, for +both the host and the acceleration device. + +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +3.2.14. +@end table + + + +@node acc_malloc +@section @code{acc_malloc} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +3.2.15. +@end table + + + +@node acc_free +@section @code{acc_free} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +3.2.16. +@end table + + + +@node acc_copyin +@section @code{acc_copyin} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +3.2.17. +@end table + + + +@node acc_present_or_copyin +@section @code{acc_present_or_copyin} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +3.2.18. +@end table + + + +@node acc_create +@section @code{acc_create} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +3.2.19. +@end table + + + +@node acc_present_or_create +@section @code{acc_present_or_create} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +3.2.20. +@end table + + + +@node acc_copyout +@section @code{acc_copyout} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +3.2.21. +@end table + + + +@node acc_delete +@section @code{acc_delete} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +3.2.22. +@end table + + + +@node acc_update_device +@section @code{acc_update_device} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +3.2.23. +@end table + + + +@node acc_update_self +@section @code{acc_update_self} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +3.2.24. +@end table + + + +@node acc_map_data +@section @code{acc_map_data} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +3.2.25. +@end table + + + +@node acc_unmap_data +@section @code{acc_unmap_data} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +3.2.26. +@end table + + + +@node acc_deviceptr +@section @code{acc_deviceptr} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +3.2.27. +@end table + + + +@node acc_hostptr +@section @code{acc_hostptr} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +3.2.28. +@end table + + + +@node acc_is_present +@section @code{acc_is_present} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +3.2.29. +@end table + + + +@node acc_memcpy_to_device +@section @code{acc_memcpy_to_device} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +3.2.30. +@end table + + + +@node acc_memcpy_from_device +@section @code{acc_memcpy_from_device} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +3.2.31. +@end table + + + +@node acc_get_current_cuda_device +@section @code{acc_get_current_cuda_device} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +A.2.1.1. +@end table + + + +@node acc_get_current_cuda_context +@section @code{acc_get_current_cuda_context} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +A.2.1.2. +@end table + + + +@node acc_get_cuda_stream +@section @code{acc_get_cuda_stream} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +A.2.1.3. +@end table + + + +@node acc_set_cuda_stream +@section @code{acc_set_cuda_stream} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +A.2.1.4. +@end table + + + +@c --------------------------------------------------------------------- +@c OpenACC Environment Variables +@c --------------------------------------------------------------------- + +@node OpenACC Environment Variables +@chapter OpenACC Environment Variables + +The variables @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} +are defined by section 4 of the OpenACC specification in version 2.0. +The variable @env{GCC_ACC_NOTIFY} is used for diagnostic purposes. + +@menu +* ACC_DEVICE_TYPE:: +* ACC_DEVICE_NUM:: +* GCC_ACC_NOTIFY:: +@end menu + + + +@node ACC_DEVICE_TYPE +@section @code{ACC_DEVICE_TYPE} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +4.1. +@end table + + + +@node ACC_DEVICE_NUM +@section @code{ACC_DEVICE_NUM} +@table @asis +@item @emph{Reference}: +@uref{http://www.openacc.org/, OpenACC specification v2.0}, section +4.2. +@end table + + + +@node GCC_ACC_NOTIFY +@section @code{GCC_ACC_NOTIFY} +@table @asis +@item @emph{Description}: +Print debug information pertaining to the accelerator. +@end table + + + +@c --------------------------------------------------------------------- +@c CUDA Streams Usage +@c --------------------------------------------------------------------- + +@node CUDA Streams Usage +@chapter CUDA Streams Usage + +This applies to the @code{nvptx} plugin only. + +The library provides elements that perform asynchronous movement of +data and asynchronous operation of computing constructs. This +asynchronous functionality is implemented by making use of CUDA +streams@footnote{See "Stream Management" in "CUDA Driver API", +TRM-06703-001, Version 5.5, July 2013, for additional information}. + +The primary means by which the asychronous functionality is accessed +is through the use of those OpenACC directives which make use of the +@code{async} and @code{wait} clauses. When the @code{async} clause is +first used with a directive, it will create a CUDA stream. If an +@code{async-argument} is used with the @code{async} clause, then the +stream will be associated with the specified @code{async-argument}. + +Following the creation of an association between a CUDA stream and the +@code{async-argument} of an @code{async} clause, both the @code{wait} +clause and the @code{wait} directive can be used. When either the +clause or directive is used after stream creation, it creates a +rendezvous point whereby execution will wait until all operations +associated with the @code{async-argument}, that is, stream, have +completed. + +Normally, the management of the streams that are created as a result of +using the @code{async} clause, is done without any intervention by the +caller. This implies the association between the @code{async-argument} +and the CUDA stream will be maintained for the lifetime of the program. +However, this association can be changed through the use of the library +function @code{acc_set_cuda_stream}. When the function +@code{acc_set_cuda_stream} is used, the CUDA stream that was +originally associated with the @code{async} clause will be destroyed. +Caution should be taken when changing the association as subsequent +references to the @code{async-argument} will be referring to a different +CUDA stream. + + + +@c --------------------------------------------------------------------- +@c OpenACC Library Interoperability +@c --------------------------------------------------------------------- + +@node OpenACC Library Interoperability +@chapter OpenACC Library Interoperability + +@section Introduction + +As the OpenACC library is built using the CUDA Driver API, the question has +arisen on what impact does using the OpenACC library have on a program that +uses the Runtime library, or a library based on the Runtime library, e.g., +CUBLAS@footnote{See section 2.26, "Interactions with the CUDA Driver API" in +"CUDA Runtime API", Version 5.5, July 2013 and section 2.27, "VDPAU +Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5, +July 2013, for additional information on library interoperability.}. +This chapter will describe the use cases and what changes are +required in order to use both the OpenACC library and the CUBLAS and Runtime +libraries within a program. + +@section First invocation: NVIDIA CUBLAS library API + +In this first use case (see below), a function in the CUBLAS library is called +prior to any of the functions in the OpenACC library. More specifically, the +function @code{cublasCreate()}. + +When invoked, the function will initialize the library and allocate the +hardware resources on the host and the device on behalf of the caller. Once +the initialization and allocation has completed, a handle is returned to the +caller. The OpenACC library also requires initialization and allocation of +hardware resources. Since the CUBLAS library has already allocated the +hardware resources for the device, all that is left to do is to initialize +the OpenACC library and acquire the hardware resources on the host. + +Prior to calling the OpenACC function that will initialize the library and +allocate the host hardware resources, one needs to acquire the device number +that was allocated during the call to @code{cublasCreate()}. The invoking of the +runtime library function @code{cudaGetDevice()} will accomplish this. Once +acquired, the device number is passed along with the device type as +parameters to the OpenACC library function @code{acc_set_device_num()}. + +Once the call to @code{acc_set_device_num()} has completed, the OpenACC +library will be using the context that was created during the call to +@code{cublasCreate()}. In other words, both libraries will be sharing the +same context. + +@verbatim + /* Create the handle */ + s = cublasCreate(&h); + if (s != CUBLAS_STATUS_SUCCESS) + { + fprintf(stderr, "cublasCreate failed %d\n", s); + exit(EXIT_FAILURE); + } + + /* Get the device number */ + e = cudaGetDevice(&dev); + if (e != cudaSuccess) + { + fprintf(stderr, "cudaGetDevice failed %d\n", e); + exit(EXIT_FAILURE); + } + + /* Initialize OpenACC library and use device 'dev' */ + acc_set_device_num(dev, acc_device_nvidia); + +@end verbatim +@center Use Case 1 + +@section First invocation: OpenACC library API + +In this second use case (see below), a function in the OpenACC library is +called prior to any of the functions in the CUBLAS library. More specificially, +the function acc_set_device_num(). + +In the use case presented here, the function @code{acc_set_device_num()} +is used to both initialize the OpenACC library and allocate the hardware +resources on the host and the device. In the call to the function, the +call parameters specify which device to use, i.e., 'dev', and what device +type to use, i.e., @code{acc_device_nvidia}. It should be noted that this +is but one method to initialize the OpenACC library and allocate the +appropriate hardware resources. Other methods are available through the +use of environment variables and these will be discussed in the next section. + +Once the call to @code{acc_set_device_num()} has completed, other OpenACC +functions can be called as seen with multiple calls being made to +@code{acc_copyin()}. In addition, calls can be made to functions in the +CUBLAS library. In the use case a call to @code{cublasCreate()} is made +subsequent to the calls to @code{acc_copyin()}. +As seen in the previous use case, a call to @code{cublasCreate()} will +initialize the CUBLAS library and allocate the hardware resources on the +host and the device. However, since the device has already been allocated, +@code{cublasCreate()} will only initialize the CUBLAS library and allocate +the appropriate hardware resources on the host. The context that was created +as part of the OpenACC initialization will be shared with the CUBLAS library, +similarly to the first use case. + +@verbatim + dev = 0; + + acc_set_device_num(dev, acc_device_nvidia); + + /* Copy the first set to the device */ + d_X = acc_copyin(&h_X[0], N * sizeof (float)); + if (d_X == NULL) + { + fprintf(stderr, "copyin error h_X\n"); + exit(EXIT_FAILURE); + } + + /* Copy the second set to the device */ + d_Y = acc_copyin(&h_Y1[0], N * sizeof (float)); + if (d_Y == NULL) + { + fprintf(stderr, "copyin error h_Y1\n"); + exit(EXIT_FAILURE); + } + + /* Create the handle */ + s = cublasCreate(&h); + if (s != CUBLAS_STATUS_SUCCESS) + { + fprintf(stderr, "cublasCreate failed %d\n", s); + exit(EXIT_FAILURE); + } + + /* Perform saxpy using CUBLAS library function */ + s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1); + if (s != CUBLAS_STATUS_SUCCESS) + { + fprintf(stderr, "cublasSaxpy failed %d\n", s); + exit(EXIT_FAILURE); + } + + /* Copy the results from the device */ + acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float)); + +} +@end verbatim +@center Use Case 2 + +@section OpenACC library and environment variables + +There are two environment variables associated with the OpenACC library that +may be used to control the device type and device number. +Namely, @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}. In the second +use case, the device type and device number were specified using +@code{acc_set_device_num()}. However, @env{ACC_DEVICE_TYPE} and +@env{ACC_DEVICE_NUM} could have been defined and the call to +@code{acc_set_device_num()} would be not be required. At the time of the +call to @code{acc_copyin()}, these two environment variables would be +sampled and their values used. + +The use of the environment variables is only relevant when an OpenACC function +is called prior to a call to @code{cudaCreate()}. If @code{cudaCreate()} +is called prior to a call to an OpenACC function, then a call to +@code{acc_set_device_num()}, must be done@footnote{More complete information +about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in +sections 4.1 and 4.2 of the “The OpenACC +Application Programming Interface”, Version 2.0, June, 2013.}. + + + +@c --------------------------------------------------------------------- @c Enabling OpenMP @c --------------------------------------------------------------------- @@ -134,11 +786,11 @@ @c --------------------------------------------------------------------- -@c Runtime Library Routines +@c OpenMP Runtime Library Routines @c --------------------------------------------------------------------- @node Runtime Library Routines -@chapter Runtime Library Routines +@chapter OpenMP Runtime Library Routines The runtime routines described here are defined by Section 3 of the OpenMP specification in version 4.5. The routines are structured in following @@ -1317,11 +1969,11 @@ @c --------------------------------------------------------------------- -@c Environment Variables +@c OpenMP Environment Variables @c --------------------------------------------------------------------- @node Environment Variables -@chapter Environment Variables +@chapter OpenMP Environment Variables The environment variables which beginning with @env{OMP_} are defined by section 4 of the OpenMP specification in version 4.5, while those