public inbox for gcc-cvs@sourceware.org
From: Tobias Burnus <burnus@gcc.gnu.org>
To: gcc-cvs@gcc.gnu.org
Subject: [gcc/devel/omp/gcc-12] libgomp.texi: Document libmemkind + nvptx/gcn specifics
Date: Fri, 9 Sep 2022 09:37:50 +0000 (GMT)
Message-ID: <20220909093750.87B50385742E@sourceware.org>

https://gcc.gnu.org/g:8b479946d9a32bcd47494c9cb6b121b83e185c40

commit 8b479946d9a32bcd47494c9cb6b121b83e185c40
Author: Tobias Burnus <tobias@codesourcery.com>
Date:   Fri Sep 9 11:28:51 2022 +0200

    libgomp.texi: Document libmemkind + nvptx/gcn specifics
    
    libgomp/ChangeLog:
    
        * libgomp.texi (OpenMP-Implementation Specifics): New; add
        libmemkind section; move OpenMP Context Selectors from ...
        (Offload-Target Specifics): ... here; add 'AMD Radeon (GCN)'
        and 'nvptx' sections.
    
    (cherry picked from commit 4f05ff34d63b582557918189528531f35041ef0e)

Diff:
---
 libgomp/ChangeLog.omp |  10 ++++
 libgomp/libgomp.texi  | 131 +++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 135 insertions(+), 6 deletions(-)

diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index 79574927b1f..62fbddbf677 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,3 +1,13 @@
+2022-09-09  Tobias Burnus  <tobias@codesourcery.com>
+
+	Backport from mainline:
+	2022-09-08  Tobias Burnus  <tobias@codesourcery.com>
+
+	* libgomp.texi (OpenMP-Implementation Specifics): New; add libmemkind
+	section; move OpenMP Context Selectors from ...
+	(Offload-Target Specifics): ... here; add 'AMD Radeon (GCN)' and
+	'nvptx' sections.
+
 2022-09-09  Tobias Burnus  <tobias@codesourcery.com>
 
 	Backport from mainline:
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 8b800985a48..10bfb90abb0 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -113,6 +113,8 @@ changed to GNU Offloading and Multi Processing Runtime Library.
 * OpenACC Library Interoperability:: OpenACC library interoperability with the
                                NVIDIA CUBLAS library.
 * OpenACC Profiling Interface::
+* OpenMP-Implementation Specifics:: Notes on specifics of this OpenMP
+                               implementation
 * Offload-Target Specifics:: Notes on offload-target specific internals
 * The libgomp ABI:: Notes on the external ABI presented by libgomp.
 * Reporting Bugs:: How to report bugs in the GNU Offloading and
@@ -4269,16 +4271,15 @@ offloading devices (it's not clear if they should be):
 @end itemize
 
 @c ---------------------------------------------------------------------
-@c Offload-Target Specifics
+@c OpenMP-Implementation Specifics
 @c ---------------------------------------------------------------------
 
-@node Offload-Target Specifics
-@chapter Offload-Target Specifics
-
-The following sections present notes on the offload-target specifics.
+@node OpenMP-Implementation Specifics
+@chapter OpenMP-Implementation Specifics
 
 @menu
 * OpenMP Context Selectors::
+* Memory allocation with libmemkind::
 @end menu
 
 @node OpenMP Context Selectors
@@ -4297,9 +4298,127 @@ The following sections present notes on the offload-target specifics.
       @tab See @code{-march=} in ``AMD GCN Options''
 @item @code{nvptx}
       @tab @code{gpu}
-      @tab See @code{-misa=} in ``Nvidia PTX Options''
+      @tab See @code{-march=} in ``Nvidia PTX Options''
 @end multitable
 
+@node Memory allocation with libmemkind
+@section Memory allocation with libmemkind
+
+On Linux systems, where the @uref{https://github.com/memkind/memkind, memkind
+library} (@code{libmemkind.so.0}) is available at runtime, it is used when
+creating memory allocators requesting
+
+@itemize
+@item the memory space @code{omp_high_bw_mem_space}
+@item the memory space @code{omp_large_cap_mem_space}
+@item the partition trait @code{omp_atv_interleaved}
+@end itemize
+
+
+@c ---------------------------------------------------------------------
+@c Offload-Target Specifics
+@c ---------------------------------------------------------------------
+
+@node Offload-Target Specifics
+@chapter Offload-Target Specifics
+
+The following sections present notes on the offload-target specifics.
+
+@menu
+* AMD Radeon::
+* nvptx::
+@end menu
+
+@node AMD Radeon
+@section AMD Radeon (GCN)
+
+On the hardware side, there is the following hierarchy (fine to coarse):
+@itemize
+@item work item (thread)
+@item wavefront
+@item work group
+@item compute unit (CU)
+@end itemize
+
+All OpenMP and OpenACC levels are used, i.e.
+@itemize
+@item OpenMP's simd and OpenACC's vector map to work items (thread)
+@item OpenMP's threads (``parallel'') and OpenACC's workers map
+      to wavefronts
+@item OpenMP's teams and OpenACC's gang use a threadpool with the
+      size of the number of teams or gangs, respectively.
+@end itemize
+
+The used sizes are:
+@itemize
+@item Number of teams is the specified @code{num_teams} (OpenMP) or
+      @code{num_gangs} (OpenACC) or otherwise the number of CUs
+@item Number of wavefronts is 4 for gfx900 and 16 otherwise;
+      @code{num_threads} (OpenMP) and @code{num_workers} (OpenACC)
+      override this if smaller.
+@item The wavefront has 102 scalars and 64 vectors
+@item Number of work items is always 64
+@item The hardware permits maximally 40 workgroups/CU and
+      16 wavefronts/workgroup up to a limit of 40 wavefronts in total per CU.
+@item 80 scalar registers and 24 vector registers in non-kernel functions
+      (the chosen procedure-calling API).
+@item For the kernel itself: as many as register pressure demands (number of
+      teams and number of threads, scaled down if registers are exhausted)
+@end itemize
+
+Implementation remarks:
+@itemize
+@item I/O within OpenMP target regions and OpenACC parallel/kernels is
+      supported using the C library @code{printf} functions and the Fortran
+      @code{print}/@code{write} statements.
+@end itemize
+
+
+
+@node nvptx
+@section nvptx
+
+On the hardware side, there is the following hierarchy (fine to coarse):
+@itemize
+@item thread
+@item warp
+@item thread block
+@item streaming multiprocessor
+@end itemize
+
+All OpenMP and OpenACC levels are used, i.e.
+@itemize
+@item OpenMP's simd and OpenACC's vector map to threads
+@item OpenMP's threads (``parallel'') and OpenACC's workers map to warps
+@item OpenMP's teams and OpenACC's gang use a threadpool with the
+      size of the number of teams or gangs, respectively.
+@end itemize
+
+The used sizes are:
+@itemize
+@item The @code{warp_size} is always 32
+@item CUDA kernel launched: @code{dim=@{#teams,1,1@}, blocks=@{#threads,warp_size,1@}}.
+@end itemize
+
+Additional information can be obtained by setting the environment variable
+@code{GOMP_DEBUG=1} (very verbose; grep for @code{kernel.*launch} for launch
+parameters).
+
+GCC generates generic PTX ISA code, which is just-in-time compiled by the
+CUDA driver; the driver caches the JIT result in the user's directory (see
+the CUDA documentation; the cache can be tuned via the environment variables
+@code{CUDA_CACHE_@{DISABLE,MAXSIZE,PATH@}}).
+
+Note: While the PTX ISA is generic, the @code{-mptx=} and @code{-march=}
+command-line options still affect the generated PTX ISA code and, thus, the
+requirements on CUDA version and hardware.
+
+Implementation remarks:
+@itemize
+@item I/O within OpenMP target regions and OpenACC parallel/kernels is
+      supported using the C library @code{printf} functions. Note that the
+      Fortran @code{print}/@code{write} statements are not supported yet.
+@end itemize
+
 
 @c ---------------------------------------------------------------------
 @c The libgomp ABI
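A few illustrations of the documented behavior, in plain OpenMP C. First, the
context-selector table touched by the first libgomp.texi hunk is what the
'declare variant' construct (and metadirectives) match against. A hypothetical
sketch, not part of the commit, with invented function names, showing the
arch selector choosing per-offload-target code:

  /* variant.c -- compile with an offloading-enabled GCC and -fopenmp.  */
  #include <stdio.h>

  /* Variant used where the call site is compiled for AMD GCN.  */
  void compute_gcn (int *res) { *res = 1; }
  /* Variant used where the call site is compiled for nvptx.  */
  void compute_nvptx (int *res) { *res = 2; }

  #pragma omp declare variant (compute_gcn) match (device = {arch (gcn)})
  #pragma omp declare variant (compute_nvptx) match (device = {arch (nvptx)})
  void compute (int *res) { *res = 0; }  /* Host/fallback version.  */

  int main (void)
  {
    int res;
  #pragma omp target map(from: res)
    compute (&res);
    printf ("%d\n", res);  /* 1 on GCN, 2 on nvptx, 0 on host fallback.  */
    return 0;
  }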
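The libmemkind section lists which allocator requests libgomp routes to
memkind; such allocators are created through the standard OpenMP 5.x allocator
API. A minimal sketch, assuming a Linux system where libmemkind.so.0 (and
suitable memory, e.g. HBM nodes) is actually present; note that with the
default fallback trait the allocation may silently come from ordinary memory
otherwise:

  #include <stdio.h>
  #include <stdlib.h>
  #include <omp.h>

  int main (void)
  {
    /* Request high-bandwidth memory, interleaved across its partitions;
       with libmemkind available, libgomp forwards this to memkind.  */
    omp_alloctrait_t traits[] = { { omp_atk_partition, omp_atv_interleaved } };
    omp_allocator_handle_t al
      = omp_init_allocator (omp_high_bw_mem_space, 1, traits);

    double *p = omp_alloc (1024 * sizeof (double), al);
    if (p == NULL)
      return EXIT_FAILURE;
    p[0] = 42.0;
    printf ("%g\n", p[0]);

    omp_free (p, al);
    omp_destroy_allocator (al);
    return EXIT_SUCCESS;
  }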
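Both per-target sections describe the same three-level mapping, which in code
is the usual teams/parallel/simd nest. A hypothetical sketch, not from the
commit, with the mapping from the text above noted per level:

  void saxpy (int n, float a, const float *x, float *y)
  {
  /* teams    -> workgroups/CUs (GCN), thread blocks/SMs (nvptx)
     parallel -> wavefronts (GCN), warps (nvptx)
     simd     -> the 64 work items of a wavefront (GCN),
                 the 32 threads of a warp (nvptx)  */
  #pragma omp target teams distribute parallel for simd \
      map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++)
      y[i] = a * x[i] + y[i];
  }

Running such a program with GOMP_DEBUG=1 set in the environment shows the
resulting launch configuration, as noted in the nvptx section above.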
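Finally, the I/O remarks: C-library printf is documented to work inside
offloaded regions on both targets, while Fortran print/write additionally
works on GCN but not yet on nvptx. A minimal sketch:

  #include <stdio.h>
  #include <omp.h>

  int main (void)
  {
    /* Runs on the offload device if one is configured; the printf
       inside the target region is supported on both GCN and nvptx.  */
  #pragma omp target
    printf ("hello from the %s\n",
            omp_is_initial_device () ? "host" : "device");
    return 0;
  }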