From: Tobias Burnus
To: gcc-cvs@gcc.gnu.org
Subject: [gcc/devel/omp/gcc-12] libgomp.texi: Reverse-offload updates
Date: Wed, 1 Feb 2023 14:29:24 +0000 (GMT)
Message-Id: <20230201142924.553473858D33@sourceware.org>
X-Git-Refname: refs/heads/devel/omp/gcc-12
X-Git-Oldrev: 23f2f065bf051ec7dd0e32bb60d4cdd707c501e9
X-Git-Newrev: 6b611f1b3ca5edae8e3500209cea82128b2dc594

https://gcc.gnu.org/g:6b611f1b3ca5edae8e3500209cea82128b2dc594

commit 6b611f1b3ca5edae8e3500209cea82128b2dc594
Author: Tobias Burnus
Date:   Wed Feb 1 15:29:11 2023 +0100

    libgomp.texi: Reverse-offload updates

    libgomp/
            * libgomp.texi (5.0 Impl. Status): Update 'requires' and 'ancestor'.
            (GCN): Add item about 'omp requires'.
            (nvptx): Likewise; add item about reverse offload.

    (cherry picked from commit eda38850a7980d78d966a39b58961349bea7c984)

Diff:
---
 libgomp/ChangeLog.omp |  9 +++++++++
 libgomp/libgomp.texi  | 26 +++++++++++++++++++-------
 2 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index 3d1cb16348a..75a47a77ed7 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,3 +1,12 @@
+2023-02-01  Tobias Burnus
+
+	Backported from master:
+	2023-02-01  Tobias Burnus
+
+	* libgomp.texi (5.0 Impl. Status): Update 'requires' and 'ancestor'.
+	(GCN): Add item about 'omp requires'.
+	(nvptx): Likewise; add item about reverse offload.
+
 2023-02-01  Tobias Burnus

 	Backported from master:
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 966af076f31..dc3da5a84e8 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -192,8 +192,8 @@ The OpenMP 4.5 specification is fully supported.
       env variable @tab Y @tab
 @item Nested-parallel changes to @emph{max-active-levels-var} ICV @tab Y @tab
 @item @code{requires} directive @tab P
-      @tab complete but no non-host devices provides @code{unified_address},
-      @code{unified_shared_memory} or @code{reverse_offload}
+      @tab complete but no non-host devices provides @code{unified_address} or
+      @code{unified_shared_memory}
 @item @code{teams} construct outside an enclosing target region @tab Y @tab
 @item Non-rectangular loop nests @tab P @tab Full support for C/C++, partial for Fortran
 @item @code{!=} as relational-op in canonical loop form for C/C++ @tab Y @tab
@@ -228,7 +228,7 @@ The OpenMP 4.5 specification is fully supported.
 @item @code{allocate} clause @tab P @tab Initial support
 @item @code{use_device_addr} clause on @code{target data} @tab Y @tab
 @item @code{ancestor} modifier on @code{device} clause
-      @tab Y @tab See comment for @code{requires}
+      @tab Y @tab Host fallback with GCN devices
 @item Implicit declare target directive @tab Y @tab
 @item Discontiguous array section with @code{target update} construct
       @tab N @tab
@@ -288,7 +288,7 @@ The OpenMP 4.5 specification is fully supported.
       @code{append_args} @tab N @tab
 @item @code{dispatch} construct @tab N @tab
 @item device-specific ICV settings with environment variables @tab Y @tab
-@item @code{assume} directive @tab Y @tab
+@item @code{assume} and @code{assumes} directives @tab Y @tab
 @item @code{nothing} directive @tab Y @tab
 @item @code{error} directive @tab Y @tab
 @item @code{masked} construct @tab Y @tab
@@ -351,7 +351,7 @@ The OpenMP 4.5 specification is fully supported.
       to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
 @item For Fortran, diagnose placing declarative before/between @code{USE},
       @code{IMPORT}, and @code{IMPLICIT} as invalid @tab N @tab
-@item Optional comma beween directive and clause in the @code{#pragma} form @tab Y @tab
+@item Optional comma between directive and clause in the @code{#pragma} form @tab Y @tab
 @item @code{indirect} clause in @code{declare target} @tab N @tab
 @item @code{device_type(nohost)}/@code{device_type(host)} for variables @tab N @tab
 @end multitable
@@ -3959,7 +3959,7 @@ same context.
 @section First invocation: OpenACC library API

 In this second use case (see below), a function in the OpenACC library is
-called prior to any of the functions in the CUBLAS library. More specificially,
+called prior to any of the functions in the CUBLAS library. More specifically,
 the function @code{acc_set_device_num()}.

 In the use case presented here, the function @code{acc_set_device_num()}
@@ -4451,6 +4451,9 @@ The implementation remark:
 @item I/O within OpenMP target regions and OpenACC compute regions is supported
       using the C library @code{printf} functions and the Fortran
       @code{print}/@code{write} statements.
+@item OpenMP code that has a requires directive with @code{unified_address},
+      @code{unified_shared_memory} or @code{reverse_offload} will remove
+      any GCN device from the list of available devices (``host fallback'').
 @end itemize


@@ -4491,7 +4494,7 @@ which caches the JIT in the user's directory (see CUDA documentation; can be
 tuned by the environment variables @code{CUDA_CACHE_@{DISABLE,MAXSIZE,PATH@}}.

 Note: While PTX ISA is generic, the @code{-mptx=} and @code{-march=} commandline
-options still affect the used PTX ISA code and, thus, the requirments on
+options still affect the used PTX ISA code and, thus, the requirements on
 CUDA version and hardware.

 The implementation remark:
@@ -4504,6 +4507,15 @@ The implementation remark:
 @item Compilation OpenMP code that contains @code{requires reverse_offload}
       requires at least @code{-march=sm_35}, compiling for @code{-march=sm_30}
       is not supported.
+@item For code containing reverse offload (i.e. @code{target} regions with
+      @code{device(ancestor:1)}), there is a slight performance penalty
+      for @emph{all} target regions, consisting mostly of shutdown delay
+      Per device, reverse offload regions are processed serially such that
+      the next reverse offload region is only executed after the previous
+      one returned.
+@item OpenMP code that has a requires directive with @code{unified_address}
+      or @code{unified_shared_memory} will remove any nvptx device from the
+      list of available devices (``host fallback'').
 @end itemize
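The device-filtering rules that this patch documents (GCN devices are dropped for any of `unified_address`, `unified_shared_memory` or `reverse_offload`; nvptx devices only for the first two) can be summarized as a small check. The C sketch below is a hypothetical model of those documented rules only: the names `requires_clause`, `offload_arch`, `device_remains_available` and `check_documented_rules` are illustrative assumptions and are not part of libgomp's API or its actual implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit flags for the 'requires' clauses mentioned in the
   documentation; NOT libgomp's internal names.  */
enum requires_clause
{
  REQ_UNIFIED_ADDRESS = 1 << 0,
  REQ_UNIFIED_SHARED_MEMORY = 1 << 1,
  REQ_REVERSE_OFFLOAD = 1 << 2
};

/* Hypothetical tags for the two offload architectures covered above.  */
enum offload_arch
{
  ARCH_GCN,
  ARCH_NVPTX
};

/* Return true if a device of architecture ARCH stays in the list of
   available devices when the program's 'requires' directives set the
   clauses in REQ; false models the device being removed, so that
   target regions use host fallback instead.  */
bool
device_remains_available (enum offload_arch arch, unsigned req)
{
  /* Neither architecture provides unified_address or
     unified_shared_memory, per the updated documentation.  */
  unsigned unsupported = REQ_UNIFIED_ADDRESS | REQ_UNIFIED_SHARED_MEMORY;

  /* Only nvptx supports reverse offload; GCN does not.  */
  if (arch == ARCH_GCN)
    unsupported |= REQ_REVERSE_OFFLOAD;

  return (req & unsupported) == 0;
}

/* Usage: the documented cases, encoded as assertions.  */
void
check_documented_rules (void)
{
  /* reverse_offload alone: nvptx stays available, GCN is removed.  */
  assert (device_remains_available (ARCH_NVPTX, REQ_REVERSE_OFFLOAD));
  assert (!device_remains_available (ARCH_GCN, REQ_REVERSE_OFFLOAD));

  /* unified_address or unified_shared_memory removes both.  */
  assert (!device_remains_available (ARCH_NVPTX, REQ_UNIFIED_ADDRESS));
  assert (!device_remains_available (ARCH_GCN, REQ_UNIFIED_SHARED_MEMORY));

  /* No such 'requires' clause: both architectures stay available.  */
  assert (device_remains_available (ARCH_GCN, 0));
  assert (device_remains_available (ARCH_NVPTX, 0));
}
```

Encoding the unsupported clauses as a per-architecture mask keeps the GCN/nvptx difference to a single line, which mirrors how the two "implementation remark" items differ only in whether `reverse_offload` is listed.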