public inbox for gcc-bugs@sourceware.org
* [Bug libgomp/110813] New: [OpenMP] omp_target_memcpy_rect (+ strided 'target update'): Improve GCN performance and contiguous subranges
@ 2023-07-26 10:35 burnus at gcc dot gnu.org
From: burnus at gcc dot gnu.org @ 2023-07-26 10:35 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110813
Bug ID: 110813
Summary: [OpenMP] omp_target_memcpy_rect (+ strided 'target update'): Improve GCN performance and contiguous subranges
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization, openmp
Severity: normal
Priority: P3
Component: libgomp
Assignee: unassigned at gcc dot gnu.org
Reporter: burnus at gcc dot gnu.org
CC: jakub at gcc dot gnu.org, jules at gcc dot gnu.org
Target Milestone: ---
omp_target_memcpy_rect_worker is used by omp_target_memcpy_rect and
omp_target_memcpy_rect_async.
It is also used when passing strided memory to 'target update', either on OG13
or with the patch
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623502.html applied; see the
OG13 version:
https://github.com/gcc-mirror/gcc/blob/devel/omp/gcc-13/libgomp/target.c#L5689-L5843
(This links to omp_target_memcpy_rect_worker; the line numbers may be off if the
file has changed since the link was created.)
ISSUES:
* The current algorithm always iterates down to dim == 1,
even if the referenced memory is contiguous.
For _rect, that is the case if the inner extents of volume, dst_dimensions and
src_dimensions are identical, for example:
volume=[V1,N2,N3], ..., dst_dimensions=[D1,N2,N3], ..., src_dimensions=[S1,N2,N3]
Here the inner two dimensions are contiguous; only the outermost one is not
fully covered.
Likewise for '!$omp target update to(cont_array(:,:,::2))'.
* For nvptx, a patch exists (see below) that handles _rect copies with
dim=2 and dim=3 more efficiently using CUDA functions; for GCN, such a feature
is currently missing.
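To illustrate the first issue, here is a minimal host-only sketch in C; it is not libgomp code. rect_copy_naive recurses down to one dimension in the way the issue describes, while count_full_inner_dims and rect_copy_collapsed are hypothetical helpers that treat fully covered trailing dimensions as one larger element, so the volume=[V1,N2,N3] example above turns into a single memcpy. The parameters are assumed to follow the meaning of omp_target_memcpy_rect's arguments (element counts, outermost dimension first), and the requested ranges are assumed to be valid.

```c
#include <stddef.h>
#include <string.h>

/* Host-only sketch (not libgomp code): copy a num_dims-dimensional
   subarray.  volume, *_off and *_dims are in elements, outermost
   dimension first.  Like the worker described above, it recurses
   until num_dims == 1.  */
static void
rect_copy_naive (char *dst, const char *src, size_t elsize, int num_dims,
                 const size_t *volume, const size_t *dst_off,
                 const size_t *src_off, const size_t *dst_dims,
                 const size_t *src_dims)
{
  if (num_dims == 1)
    {
      memcpy (dst + dst_off[0] * elsize, src + src_off[0] * elsize,
              volume[0] * elsize);
      return;
    }
  /* Bytes spanned by one index step along the outermost dimension.  */
  size_t dst_slice = elsize, src_slice = elsize;
  for (int i = 1; i < num_dims; i++)
    {
      dst_slice *= dst_dims[i];
      src_slice *= src_dims[i];
    }
  for (size_t j = 0; j < volume[0]; j++)
    rect_copy_naive (dst + (dst_off[0] + j) * dst_slice,
                     src + (src_off[0] + j) * src_slice, elsize,
                     num_dims - 1, volume + 1, dst_off + 1, src_off + 1,
                     dst_dims + 1, src_dims + 1);
}

/* Hypothetical helper: number of trailing (innermost) dimensions that
   are fully covered in both the source and the destination.  */
static int
count_full_inner_dims (int num_dims, const size_t *volume,
                       const size_t *dst_dims, const size_t *src_dims)
{
  int m = 0;
  for (int i = num_dims - 1; i >= 0; i--)
    {
      if (volume[i] != dst_dims[i] || volume[i] != src_dims[i])
        break;
      m++;
    }
  return m;
}

/* Hypothetical optimization: fully covered trailing dimensions are
   contiguous, so treat each such block as one larger element and
   recurse over the remaining dimensions only.  */
static void
rect_copy_collapsed (char *dst, const char *src, size_t elsize,
                     int num_dims, const size_t *volume,
                     const size_t *dst_off, const size_t *src_off,
                     const size_t *dst_dims, const size_t *src_dims)
{
  int m = count_full_inner_dims (num_dims, volume, dst_dims, src_dims);
  size_t block = elsize;
  for (int i = num_dims - m; i < num_dims; i++)
    block *= volume[i];
  if (m == num_dims)
    {
      /* Whole arrays: for a valid request all offsets are zero.  */
      memcpy (dst, src, block);
      return;
    }
  rect_copy_naive (dst, src, block, num_dims - m, volume, dst_off,
                   src_off, dst_dims, src_dims);
}
```

With volume=[2,3,4], dst_dimensions=[5,3,4] and src_dimensions=[4,3,4], the naive version issues six memcpy calls of 4 elements each, while the collapsed version issues a single copy of 24 elements. A device implementation could presumably apply the same reduction before handing the copy to a plugin's 2D/3D routine.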
EXPECTED:
* Improve performance if partially contiguous
* Improve performance on GCN
Cross ref:
- "[patch] OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for
omp_target_memcpy_rect"
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625465.html
(as mentioned in that patch, cross ref to:
- PR101581 - [OpenMP] omp_target_memcpy – support inter-device memcpy )
From: burnus at gcc dot gnu.org @ 2023-07-28 8:06 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110813
--- Comment #1 from Tobias Burnus <burnus at gcc dot gnu.org> ---
Also consider using a library function for *inter*-device copies if the
device type or the function pointer is the same.
(If unsupported, the function can still return -1 to fall back to the
generic code.)
For CUDA, that would be cuMemcpyPeer and cuMemcpy3DPeer; the latter must then
also be used for 2D copies, as there is no 2D peer variant.
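A minimal sketch of the dispatch suggested in this comment, assuming a hypothetical per-device hook peer_copy (standing in for something like a cuMemcpyPeer wrapper) that returns -1 for unsupported device pairs. None of these names are libgomp's actual plugin interface; the "devices" are simulated with host memory so that the control flow can be run.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical plugin hook for a direct device-to-device copy (for
   CUDA it might wrap cuMemcpyPeer).  Returns 0 on success and -1 if
   this device pair is unsupported, so the caller falls back.  */
typedef int (*peer_copy_fn) (void *dst, int dst_dev, const void *src,
                             int src_dev, size_t n);

/* Hypothetical per-device descriptor; not libgomp's data structure.  */
struct device
{
  int id;
  peer_copy_fn peer_copy;  /* NULL if no direct copy is available */
  void (*to_host) (void *host, const void *dev_ptr, size_t n);
  void (*from_host) (void *dev_ptr, const void *host, size_t n);
};

/* Copy n bytes from src (on sd) to dst (on dd); bounce is a host
   buffer of at least n bytes for the generic fallback.  */
static int
inter_device_copy (struct device *dd, void *dst, struct device *sd,
                   const void *src, size_t n, void *bounce)
{
  /* Identical hook pointers: both devices are served by the same
     plugin, so try the library routine first.  */
  if (dd->peer_copy != NULL && dd->peer_copy == sd->peer_copy)
    {
      int ret = dd->peer_copy (dst, dd->id, src, sd->id, n);
      if (ret != -1)
        return ret;
    }
  /* Generic fallback: stage the data through host memory.  */
  sd->to_host (bounce, src, n);
  dd->from_host (dst, bounce, n);
  return 0;
}

/* Simulation for illustration only: "device memory" is host memory.  */
static int peer_calls, fallback_calls;

static void
sim_to_host (void *host, const void *dev_ptr, size_t n)
{
  memcpy (host, dev_ptr, n);
  fallback_calls++;
}

static void
sim_from_host (void *dev_ptr, const void *host, size_t n)
{
  memcpy (dev_ptr, host, n);
}

/* Pretend the direct copy only works between devices 0 and 1.  */
static int
sim_peer_copy (void *dst, int dst_dev, const void *src, int src_dev,
               size_t n)
{
  if (dst_dev > 1 || src_dev > 1)
    return -1;
  memcpy (dst, src, n);
  peer_calls++;
  return 0;
}
```

Comparing hook pointers is just one way to decide that two devices share a plugin; checking the device type, as the comment also suggests, would work equally well in this sketch.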
From: burnus at gcc dot gnu.org @ 2023-07-28 11:52 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110813
--- Comment #2 from Tobias Burnus <burnus at gcc dot gnu.org> ---
Regarding the CUDA implementation of memcpy2d/memcpy3d (quote from plugin-nvptx.c):
/* TODO: Consider using CU_MEMORYTYPE_UNIFIED if supported. */
Or is this unreachable due to omp_target_memcpy_check's NULL
setting of the device pointer for
Further optimization:
* PR 110840: check whether some locking could be removed
The patch mentioned in comment 0:
* has been committed as r14-2792-g25072a477a56a7
("[patch] OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for
omp_target_memcpy_rect")
* Additionally, a follow-up cleanup patch is about to be committed,
cf. https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625725.html
"[patch] libgomp: cuda.h and omp_target_memcpy_rect cleanup"
From: burnus at gcc dot gnu.org @ 2023-11-16 16:43 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110813
--- Comment #3 from Tobias Burnus <burnus at gcc dot gnu.org> ---
Julian's patch for this:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630996.html
From: burnus at gcc dot gnu.org @ 2024-01-28 21:35 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110813
--- Comment #4 from Tobias Burnus <burnus at gcc dot gnu.org> ---
The GCN-specific part has been committed to GCC 14 mainline:
https://gcc.gnu.org/g:a17299c17afeb92a56ef716d2d6380c8538493c4
Unhandled:
* Strided copies and optimized strided copies (including the generic part of
the patch linked in comment 3, which still needs to be committed); for the
former, see "[PATCH 0/5] OpenMP: Array-shaping operator and
strided/rectangular 'target update' support",
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629422.html
* Also consider using a library function for *inter*-device copies if the
device type or the function pointer is the same (cf. comment 1).
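As an illustration of the strided case, the sketch below uses a hypothetical helper that is not taken from the linked patches. It lists the contiguous pieces of a Fortran-style section such as the '!$omp target update to(cont_array(:,:,::2))' example from the original report. With column-major layout, each selected index along the third dimension is one contiguous chunk of n1*n2 elements, and adjacent chunks (step 1) are merged into a single copy. Indices are assumed to be 0-based with an inclusive upper bound, step must be at least 1, and out must have room for (hi - lo) / step + 1 entries.

```c
#include <stddef.h>

/* One contiguous piece of a strided section (hypothetical type);
   offset and length are in elements.  */
struct chunk
{
  size_t offset;
  size_t length;
};

/* For a column-major array with extents n1 x n2 x n3, list the
   contiguous chunks of the section a(:,:,lo:hi:step).  Returns the
   number of chunks written to out.  */
static size_t
strided_section_chunks (size_t n1, size_t n2, size_t lo, size_t hi,
                        size_t step, struct chunk *out)
{
  size_t slab = n1 * n2;  /* elements in one (:,:,k) slice */
  size_t count = 0;
  for (size_t k = lo; k <= hi; k += step)
    {
      size_t off = k * slab;
      /* Merge with the previous chunk if adjacent (step == 1).  */
      if (count > 0 && out[count - 1].offset + out[count - 1].length == off)
        out[count - 1].length += slab;
      else
        {
          out[count].offset = off;
          out[count].length = slab;
          count++;
        }
    }
  return count;
}
```

For a 3 x 4 x 6 array, step 2 yields three chunks of 12 elements at offsets 0, 24 and 48, while step 1 collapses into one chunk of 72 elements, i.e. a plain contiguous copy.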