public inbox for gcc-patches@gcc.gnu.org
* [Patch] libgomp: Update OpenMP memory allocation doc, fix omp_high_bw_mem_space
@ 2023-07-10 22:07 Tobias Burnus
  2023-07-11 14:17 ` Tobias Burnus
  0 siblings, 1 reply; 2+ messages in thread
From: Tobias Burnus @ 2023-07-10 22:07 UTC
  To: Jakub Jelinek, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1517 bytes --]

I noticed that all memory spaces are supported, some by falling
back to the default ("malloc"), except for omp_high_bw_mem_space,
which is only supported when the memkind lib is available.

I think it makes more sense to fall back to 'malloc' for
omp_high_bw_mem_space as well.

Additionally, I updated the documentation to more explicitly state
what the current implementation is.
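
To make the behavior change easier to see in isolation, here is a
minimal sketch of the decision described above. It is a simplified
model only, not the libgomp source: the enum names, the function
model_init_allocator and its memkind_available/patched flags are all
hypothetical, and "memkind available" stands in for the real check of
whether the requested memkind kind can be used.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the OpenMP memory spaces; not libgomp types.  */
enum model_memspace
{
  MODEL_DEFAULT, MODEL_LARGE_CAP, MODEL_CONST, MODEL_HIGH_BW, MODEL_LOW_LAT
};

/* What the modeled allocator setup ends up using.  */
enum model_backend { MODEL_NULL_ALLOCATOR, MODEL_MALLOC, MODEL_MEMKIND };

/* Simplified model of the memory-space handling: 'patched' selects the
   behavior after the one-line change ('break' instead of returning the
   null allocator) for the high-bandwidth space.  */
enum model_backend
model_init_allocator (enum model_memspace space, bool memkind_available,
                      bool patched)
{
  assert (space <= MODEL_LOW_LAT);  /* Reject values outside the model.  */
  switch (space)
    {
    case MODEL_HIGH_BW:
      if (memkind_available)
        return MODEL_MEMKIND;
      if (!patched)
        return MODEL_NULL_ALLOCATOR;  /* Old behavior: creation fails.  */
      break;                          /* New behavior: fall back.  */
    case MODEL_LARGE_CAP:
      if (memkind_available)
        return MODEL_MEMKIND;
      break;
    default:
      break;                          /* const, low-latency, default.  */
    }
  return MODEL_MALLOC;
}
```

In this model, only the combination of MODEL_HIGH_BW, no memkind and
patched == false yields MODEL_NULL_ALLOCATOR; every other request ends
up with either memkind or malloc.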

Thoughts? Wording improvement suggestions?

Tobias

PS: I wonder whether it makes sense to use libnuma besides
libmemkind (which depends on libnuma); however, the question is
when. libnuma provides numa_alloc_interleaved(_subset),
numa_alloc_local and numa_alloc_onnode.

In any case, something is odd here. I have two nodes, 0 and 1
(→ 'lscpu') and 'numactl --show' shows "preferred node: current".
I allocate memory and then use the following to find the node:

"get_mempolicy (&node, NULL, 0, ptr, MPOL_F_ADDR|MPOL_F_NODE)"

Result: With malloc'ed data, it shows the same node as the node
running the code (i.e. the same as 'getcpu (NULL, &node1);' ==
'numa_node_of_cpu (sched_getcpu());'). But I get a constant
result of 1 for numa_alloc_local and numa_alloc_onnode, independent
of the passed node number (0 or 1) and of the CPU the thread runs on.
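
For anyone who wants to repeat the node lookup, the following is a
minimal sketch of the get_mempolicy query quoted above. It is an
illustration under assumptions, not the test program used for the
observation: the helper names node_of_address and node_of_fresh_malloc
are hypothetical, and the raw Linux system call is used instead of the
libnuma wrapper so that nothing needs to be linked against libnuma.
Both helpers return -1 when the query fails, which can happen on
kernels without NUMA support or in restricted containers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <linux/mempolicy.h>  /* MPOL_F_NODE, MPOL_F_ADDR */
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>      /* SYS_get_mempolicy */
#include <unistd.h>           /* syscall */

/* Hypothetical helper: NUMA node of the page containing PTR, or -1 if
   the kernel rejects the query.  */
int
node_of_address (void *ptr)
{
  assert (ptr != NULL);
  int node = -1;
  /* With MPOL_F_NODE | MPOL_F_ADDR the kernel stores the node ID of the
     page backing PTR in the first argument.  */
  long rc = syscall (SYS_get_mempolicy, &node, NULL, 0UL, ptr,
                     (unsigned long) (MPOL_F_NODE | MPOL_F_ADDR));
  return rc == 0 ? node : -1;
}

/* Hypothetical helper: malloc SIZE bytes, touch them so the pages are
   actually placed, and report the node of the first page.  */
int
node_of_fresh_malloc (size_t size)
{
  char *buf = malloc (size);
  if (buf == NULL)
    return -1;
  memset (buf, 1, size);
  int node = node_of_address (buf);
  free (buf);
  return node;
}
```

Comparing the value returned by node_of_fresh_malloc with the node of
the CPU the thread currently runs on is one way to reproduce the
comparison described above.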
-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; limited liability company; Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München; Commercial register: Registergericht München, HRB 106955

[-- Attachment #2: allocator.diff --]
[-- Type: text/x-patch, Size: 3019 bytes --]

libgomp: Update OpenMP memory allocation doc, fix omp_high_bw_mem_space

libgomp/ChangeLog:

	* allocator.c (omp_init_allocator): Use malloc for
	omp_high_bw_mem_space when the memkind lib is unavailable instead
	of returning omp_null_allocator.
	* libgomp.texi (Memory allocation with libmemkind): Document
	implementation in more details.

 libgomp/allocator.c  |  2 +-
 libgomp/libgomp.texi | 26 +++++++++++++++++++++++++-
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index c49931cbad4..25c0f150302 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -301,7 +301,7 @@ omp_init_allocator (omp_memspace_handle_t memspace, int ntraits,
 	  break;
 	}
 #endif
-      return omp_null_allocator;
+      break;
     case omp_large_cap_mem_space:
 #ifdef LIBGOMP_USE_MEMKIND
       memkind_data = gomp_get_memkind ();
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 7d27cc50df5..b1f58e74903 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -4634,6 +4634,17 @@ smaller number.  On non-host devices, the value of the
 @node Memory allocation with libmemkind
 @section Memory allocation with libmemkind
 
+For the memory spaces, the following applies:
+@itemize
+@item @code{omp_default_mem_space} is supported
+@item @code{omp_const_mem_space} maps to @code{omp_default_mem_space}
+@item @code{omp_low_lat_mem_space} maps to @code{omp_default_mem_space}
+@item @code{omp_large_cap_mem_space} maps to @code{omp_default_mem_space},
+      unless the memkind library is available
+@item @code{omp_high_bw_mem_space} maps to @code{omp_default_mem_space},
+      unless the memkind library is available
+@end itemize
+
 On Linux systems, where the @uref{https://github.com/memkind/memkind, memkind
 library} (@code{libmemkind.so.0}) is available at runtime, it is used when
 creating memory allocators requesting
@@ -4641,9 +4652,22 @@ creating memory allocators requesting
 @itemize
 @item the memory space @code{omp_high_bw_mem_space}
 @item the memory space @code{omp_large_cap_mem_space}
-@item the partition trait @code{omp_atv_interleaved}
+@item the partition trait @code{omp_atv_interleaved}; note that for
+      @code{omp_large_cap_mem_space} the allocation will not be interleaved
 @end itemize
 
+Additional notes:
+@itemize
+@item The @code{pinned} trait is unsupported.
+@item For the @code{partition} trait, the partition part size will be the same
+      as the requested size (i.e. @code{interleaved} or @code{blocked} has no
+      effect), except for @code{interleaved} when the memkind library is
+      available.  Furthermore, @code{nearest} might not always return memory
+      on the node of the CPU that triggered an allocation.
+@item The @code{access} trait has no effect such that memory is always
+      accessible by all threads.
+@item The @code{sync_hint} trait has no effect.
+@end itemize
 
 @c ---------------------------------------------------------------------
 @c Offload-Target Specifics


* Re: [Patch] libgomp: Update OpenMP memory allocation doc, fix omp_high_bw_mem_space
  2023-07-10 22:07 [Patch] libgomp: Update OpenMP memory allocation doc, fix omp_high_bw_mem_space Tobias Burnus
@ 2023-07-11 14:17 ` Tobias Burnus
  0 siblings, 0 replies; 2+ messages in thread
From: Tobias Burnus @ 2023-07-11 14:17 UTC
  To: Jakub Jelinek, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 955 bytes --]

I have now committed this (mostly .texi) patch as Rev.
r14-2434-g8c2fc744a25ec4

Changes to my previously posted version: Fixed a typo in .texi and in
the changelog, tweaked the wording for {nearest} to sound better and to
provide more details.

Tobias

On 11.07.23 00:07, Tobias Burnus wrote:
> I noted that all memory spaces are supported, some by falling
> back to the default ("malloc") - except for omp_high_bw_mem_space
> (unless the memkind lib is available).
>
> I think it makes more sense to fallback to 'malloc' also for
> omp_high_bw_mem_space.
>
> Additionally, I updated the documentation to more explicitly state
> what the current implementation is.

[-- Attachment #2: committed.diff --]
[-- Type: text/x-patch, Size: 4000 bytes --]

commit 8c2fc744a25ec423ab1a317bf4e7d24315c40024
Author: Tobias Burnus <tobias@codesourcery.com>
Date:   Tue Jul 11 16:11:35 2023 +0200

    libgomp: Update OpenMP memory allocation doc, fix omp_high_bw_mem_space
    
    libgomp/
    
            * allocator.c (omp_init_allocator): Use malloc for
            omp_high_bw_mem_space when the memkind lib is unavailable
            instead of returning omp_null_allocator.
            * libgomp.texi (OpenMP 5.0): Fix typo.
            (Memory allocation with libmemkind): Document implementation
            in more detail.
---
 libgomp/allocator.c  |  2 +-
 libgomp/libgomp.texi | 30 ++++++++++++++++++++++++++++--
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index c49931cbad4..25c0f150302 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -301,7 +301,7 @@ omp_init_allocator (omp_memspace_handle_t memspace, int ntraits,
 	  break;
 	}
 #endif
-      return omp_null_allocator;
+      break;
     case omp_large_cap_mem_space:
 #ifdef LIBGOMP_USE_MEMKIND
       memkind_data = gomp_get_memkind ();
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 7d27cc50df5..d1a5e67329a 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -192,7 +192,7 @@ The OpenMP 4.5 specification is fully supported.
       env variable @tab Y @tab
 @item Nested-parallel changes to @var{max-active-levels-var} ICV @tab Y @tab
 @item @code{requires} directive @tab P
-      @tab complete but no non-host devices provides @code{unified_shared_memory}
+      @tab complete but no non-host device provides @code{unified_shared_memory}
 @item @code{teams} construct outside an enclosing target region @tab Y @tab
 @item Non-rectangular loop nests @tab P @tab Full support for C/C++, partial for Fortran
 @item @code{!=} as relational-op in canonical loop form for C/C++ @tab Y @tab
@@ -4634,6 +4634,17 @@ smaller number.  On non-host devices, the value of the
 @node Memory allocation with libmemkind
 @section Memory allocation with libmemkind
 
+For the memory spaces, the following applies:
+@itemize
+@item @code{omp_default_mem_space} is supported
+@item @code{omp_const_mem_space} maps to @code{omp_default_mem_space}
+@item @code{omp_low_lat_mem_space} maps to @code{omp_default_mem_space}
+@item @code{omp_large_cap_mem_space} maps to @code{omp_default_mem_space},
+      unless the memkind library is available
+@item @code{omp_high_bw_mem_space} maps to @code{omp_default_mem_space},
+      unless the memkind library is available
+@end itemize
+
 On Linux systems, where the @uref{https://github.com/memkind/memkind, memkind
 library} (@code{libmemkind.so.0}) is available at runtime, it is used when
 creating memory allocators requesting
@@ -4641,9 +4652,24 @@ creating memory allocators requesting
 @itemize
 @item the memory space @code{omp_high_bw_mem_space}
 @item the memory space @code{omp_large_cap_mem_space}
-@item the partition trait @code{omp_atv_interleaved}
+@item the partition trait @code{omp_atv_interleaved}; note that for
+      @code{omp_large_cap_mem_space} the allocation will not be interleaved
 @end itemize
 
+Additional notes:
+@itemize
+@item The @code{pinned} trait is unsupported.
+@item For the @code{partition} trait, the partition part size will be the same
+      as the requested size (i.e. @code{interleaved} or @code{blocked} has no
+      effect), except for @code{interleaved} when the memkind library is
+      available.  Furthermore, for @code{nearest} the memory might not be
+      on the same NUMA node as thread that allocated the memory; on Linux,
+      this is in particular the case when the memory placement policy is
+      set to preferred.
+@item The @code{access} trait has no effect such that memory is always
+      accessible by all threads.
+@item The @code{sync_hint} trait has no effect.
+@end itemize
 
 @c ---------------------------------------------------------------------
 @c Offload-Target Specifics
