public inbox for fortran@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 0/5] openmp: Handle pinned and unified shared memory.
@ 2022-03-08 11:30 Hafiz Abid Qadeer
  2022-03-08 11:30 ` [PATCH 1/5] openmp: Add -foffload-memory Hafiz Abid Qadeer
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: Hafiz Abid Qadeer @ 2022-03-08 11:30 UTC (permalink / raw)
  To: gcc-patches, fortran; +Cc: jakub, ams, joseph

This patch series add support for unified shared memory (USM) and pinned
memory. The support in libgomp is for nvptx offloading only.  A new
command line option -foffload-memory allows user to choose either USM
or pinned memory. The USM can also be enabled using requires construct.

When USM us in use, calls to memory allocation function like malloc are
changed to omp_alloc with appropriate allocator.  No transformations are
required for the pinned memory which is implemented using mlockall so is
only available on Linux.

Andrew Stubbs (4):
  openmp: Add -foffload-memory
  openmp: allow requires unified_shared_memory
  openmp, nvptx: ompx_unified_shared_mem_alloc
  openmp: -foffload-memory=pinned

Hafiz Abid Qadeer (1):
  openmp: Use libgomp memory allocation functions with unified shared
    memory.

 gcc/c/c-parser.cc                             |  13 +-
 gcc/common.opt                                |  16 ++
 gcc/coretypes.h                               |   7 +
 gcc/cp/parser.cc                              |  13 +-
 gcc/doc/invoke.texi                           |  16 +-
 gcc/fortran/openmp.cc                         |  10 +-
 gcc/omp-low.cc                                | 220 ++++++++++++++++++
 gcc/passes.def                                |   1 +
 .../c-c++-common/gomp/alloc-pinned-1.c        |  28 +++
 gcc/testsuite/c-c++-common/gomp/usm-1.c       |   4 +
 gcc/testsuite/c-c++-common/gomp/usm-2.c       |  34 +++
 gcc/testsuite/c-c++-common/gomp/usm-3.c       |  32 +++
 gcc/testsuite/g++.dg/gomp/usm-1.C             |  32 +++
 gcc/testsuite/g++.dg/gomp/usm-2.C             |  30 +++
 gcc/testsuite/g++.dg/gomp/usm-3.C             |  38 +++
 gcc/testsuite/gfortran.dg/gomp/usm-1.f90      |   6 +
 gcc/testsuite/gfortran.dg/gomp/usm-2.f90      |  16 ++
 gcc/testsuite/gfortran.dg/gomp/usm-3.f90      |  13 ++
 gcc/tree-pass.h                               |   1 +
 libgomp/allocator.c                           |  13 +-
 libgomp/config/linux/allocator.c              |  70 ++++--
 libgomp/config/nvptx/allocator.c              |   6 +
 libgomp/libgomp-plugin.h                      |   3 +
 libgomp/libgomp.h                             |   6 +
 libgomp/libgomp.map                           |   5 +
 libgomp/omp.h.in                              |   4 +
 libgomp/omp_lib.f90.in                        |   8 +
 libgomp/plugin/plugin-nvptx.c                 |  45 +++-
 libgomp/target.c                              |  70 ++++++
 libgomp/testsuite/libgomp.c++/usm-1.C         |  54 +++++
 libgomp/testsuite/libgomp.c/alloc-pinned-7.c  |  66 ++++++
 libgomp/testsuite/libgomp.c/usm-1.c           |  24 ++
 libgomp/testsuite/libgomp.c/usm-2.c           |  32 +++
 libgomp/testsuite/libgomp.c/usm-3.c           |  35 +++
 libgomp/testsuite/libgomp.c/usm-4.c           |  36 +++
 libgomp/testsuite/libgomp.c/usm-5.c           |  28 +++
 libgomp/testsuite/libgomp.c/usm-6.c           |  70 ++++++
 37 files changed, 1075 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-3.c
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-2.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-3.C
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-3.f90
 create mode 100644 libgomp/testsuite/libgomp.c++/usm-1.C
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-7.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-4.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-6.c

-- 
2.25.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/5] openmp: Add -foffload-memory
  2022-03-08 11:30 [PATCH 0/5] openmp: Handle pinned and unified shared memory Hafiz Abid Qadeer
@ 2022-03-08 11:30 ` Hafiz Abid Qadeer
  2022-03-08 11:30 ` [PATCH 2/5] openmp: allow requires unified_shared_memory Hafiz Abid Qadeer
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Hafiz Abid Qadeer @ 2022-03-08 11:30 UTC (permalink / raw)
  To: gcc-patches, fortran; +Cc: jakub, ams, joseph

From: Andrew Stubbs <ams@codesourcery.com>

Add a new option.  It will be used in follow-up patches.

gcc/ChangeLog:

	* common.opt: Add -foffload-memory and its enum values.
	* coretypes.h (enum offload_memory): New.
	* doc/invoke.texi: Document -foffload-memory.
---
 gcc/common.opt      | 16 ++++++++++++++++
 gcc/coretypes.h     |  7 +++++++
 gcc/doc/invoke.texi | 16 +++++++++++++++-
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 8b6513de47c..17426523e23 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2182,6 +2182,22 @@ Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32)
 EnumValue
 Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64)
 
+foffload-memory=
+Common Joined RejectNegative Enum(offload_memory) Var(flag_offload_memory) Init(OFFLOAD_MEMORY_NONE)
+-foffload-memory=[none|unified|pinned]	Use an offload memory optimization.
+
+Enum
+Name(offload_memory) Type(enum offload_memory) UnknownError(Unknown offload memory option %qs)
+
+EnumValue
+Enum(offload_memory) String(none) Value(OFFLOAD_MEMORY_NONE)
+
+EnumValue
+Enum(offload_memory) String(unified) Value(OFFLOAD_MEMORY_UNIFIED)
+
+EnumValue
+Enum(offload_memory) String(pinned) Value(OFFLOAD_MEMORY_PINNED)
+
 fomit-frame-pointer
 Common Var(flag_omit_frame_pointer) Optimization
 When possible do not generate stack frames.
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 08b9ac9094c..dd52d5bb113 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -206,6 +206,13 @@ enum offload_abi {
   OFFLOAD_ABI_ILP32
 };
 
+/* Types of memory optimization for an offload device.  */
+enum offload_memory {
+  OFFLOAD_MEMORY_NONE,
+  OFFLOAD_MEMORY_UNIFIED,
+  OFFLOAD_MEMORY_PINNED
+};
+
 /* Types of profile update methods.  */
 enum profile_update {
   PROFILE_UPDATE_SINGLE,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 248ed534aee..d16019fc8c3 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -202,7 +202,7 @@ in the following sections.
 -fno-builtin  -fno-builtin-@var{function}  -fcond-mismatch @gol
 -ffreestanding  -fgimple  -fgnu-tm  -fgnu89-inline  -fhosted @gol
 -flax-vector-conversions  -fms-extensions @gol
--foffload=@var{arg}  -foffload-options=@var{arg} @gol
+-foffload=@var{arg}  -foffload-options=@var{arg} -foffload-memory=@var{arg} @gol
 -fopenacc  -fopenacc-dim=@var{geom} @gol
 -fopenmp  -fopenmp-simd @gol
 -fpermitted-flt-eval-methods=@var{standard} @gol
@@ -2694,6 +2694,20 @@ Typical command lines are
 -foffload-options=amdgcn-amdhsa=-march=gfx906 -foffload-options=-lm
 @end smallexample
 
+@item -foffload-memory=none
+@itemx -foffload-memory=unified
+@itemx -foffload-memory=pinned
+@opindex foffload-memory
+@cindex OpenMP offloading memory modes
+Enable a memory optimization mode to use with OpenMP.  The default behavior,
+@option{-foffload-memory=none}, is to do nothing special (unless enabled via
+a requires directive in the code).  @option{-foffload-memory=unified} is
+equivalent to @code{#pragma omp requires unified_shared_memory}.
+@option{-foffload-memory=pinned} forces all host memory to be pinned (this
+mode may require the user to increase the ulimit setting for locked memory).
+All translation units must select the same setting to avoid undefined
+behavior.
+
 @item -fopenacc
 @opindex fopenacc
 @cindex OpenACC accelerator programming
-- 
2.25.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 2/5] openmp: allow requires unified_shared_memory
  2022-03-08 11:30 [PATCH 0/5] openmp: Handle pinned and unified shared memory Hafiz Abid Qadeer
  2022-03-08 11:30 ` [PATCH 1/5] openmp: Add -foffload-memory Hafiz Abid Qadeer
@ 2022-03-08 11:30 ` Hafiz Abid Qadeer
  2022-03-08 11:30 ` [PATCH 3/5] openmp, nvptx: ompx_unified_shared_mem_alloc Hafiz Abid Qadeer
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Hafiz Abid Qadeer @ 2022-03-08 11:30 UTC (permalink / raw)
  To: gcc-patches, fortran; +Cc: jakub, ams, joseph

From: Andrew Stubbs <ams@codesourcery.com>

This is the front-end portion of the Unified Shared Memory implementation.
It removes the "sorry, unimplemented message" in C, C++, and Fortran, and sets
flag_offload_memory, but is otherwise inactive, for now.

It also checks that -foffload-memory isn't set to an incompatible mode.

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_requires): Allow "requires
	  unified_share_memory".

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_requires): Allow "requires
	unified_share_memory".

gcc/fortran/ChangeLog:

	* openmp.cc (gfc_match_omp_requires): Allow "requires
	unified_share_memory".

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/usm-1.c: New test.
	* gfortran.dg/gomp/usm-1.f90: New test.
---
 gcc/c/c-parser.cc                        | 13 ++++++++++++-
 gcc/cp/parser.cc                         | 13 ++++++++++++-
 gcc/fortran/openmp.cc                    | 10 +++++++++-
 gcc/testsuite/c-c++-common/gomp/usm-1.c  |  4 ++++
 gcc/testsuite/gfortran.dg/gomp/usm-1.f90 |  6 ++++++
 5 files changed, 43 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-1.c
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-1.f90

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 84deac04c44..dc834158d1c 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -22542,7 +22542,16 @@ c_parser_omp_requires (c_parser *parser)
 	  if (!strcmp (p, "unified_address"))
 	    this_req = OMP_REQUIRES_UNIFIED_ADDRESS;
 	  else if (!strcmp (p, "unified_shared_memory"))
+	  {
 	    this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY;
+
+	    if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+		&& flag_offload_memory != OFFLOAD_MEMORY_NONE)
+	      error_at (cloc,
+			"unified_shared_memory is incompatible with the "
+			"selected -foffload-memory option");
+	    flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
+	  }
 	  else if (!strcmp (p, "dynamic_allocators"))
 	    this_req = OMP_REQUIRES_DYNAMIC_ALLOCATORS;
 	  else if (!strcmp (p, "reverse_offload"))
@@ -22609,7 +22618,9 @@ c_parser_omp_requires (c_parser *parser)
 	      c_parser_skip_to_pragma_eol (parser, false);
 	      return;
 	    }
-	  if (p && this_req != OMP_REQUIRES_DYNAMIC_ALLOCATORS)
+	  if (p
+	      && this_req != OMP_REQUIRES_DYNAMIC_ALLOCATORS
+	      && this_req != OMP_REQUIRES_UNIFIED_SHARED_MEMORY)
 	    sorry_at (cloc, "%qs clause on %<requires%> directive not "
 			    "supported yet", p);
 	  if (p)
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 03d99aba13e..ba263152aaf 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -46464,7 +46464,16 @@ cp_parser_omp_requires (cp_parser *parser, cp_token *pragma_tok)
 	  if (!strcmp (p, "unified_address"))
 	    this_req = OMP_REQUIRES_UNIFIED_ADDRESS;
 	  else if (!strcmp (p, "unified_shared_memory"))
+	  {
 	    this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY;
+
+	    if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+		&& flag_offload_memory != OFFLOAD_MEMORY_NONE)
+	      error_at (cloc,
+			"unified_shared_memory is incompatible with the "
+			"selected -foffload-memory option");
+	    flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
+	  }
 	  else if (!strcmp (p, "dynamic_allocators"))
 	    this_req = OMP_REQUIRES_DYNAMIC_ALLOCATORS;
 	  else if (!strcmp (p, "reverse_offload"))
@@ -46537,7 +46546,9 @@ cp_parser_omp_requires (cp_parser *parser, cp_token *pragma_tok)
 	      cp_parser_skip_to_pragma_eol (parser, pragma_tok);
 	      return false;
 	    }
-	  if (p && this_req != OMP_REQUIRES_DYNAMIC_ALLOCATORS)
+	  if (p
+	      && this_req != OMP_REQUIRES_DYNAMIC_ALLOCATORS
+	      && this_req != OMP_REQUIRES_UNIFIED_SHARED_MEMORY)
 	    sorry_at (cloc, "%qs clause on %<requires%> directive not "
 			    "supported yet", p);
 	  if (p)
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 16cd03a3d67..1f434857719 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic.h"
 #include "gomp-constants.h"
 #include "target-memory.h"  /* For gfc_encode_character.  */
+#include "options.h"
 
 /* Match an end of OpenMP directive.  End of OpenMP directive is optional
    whitespace, followed by '\n' or comment '!'.  */
@@ -5373,6 +5374,12 @@ gfc_match_omp_requires (void)
 	  requires_clause = OMP_REQ_UNIFIED_SHARED_MEMORY;
 	  if (requires_clauses & OMP_REQ_UNIFIED_SHARED_MEMORY)
 	    goto duplicate_clause;
+
+	  if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+	      && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+	    gfc_error_now ("unified_shared_memory at %C is incompatible with "
+			   "the selected -foffload-memory option");
+	  flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
 	}
       else if (gfc_match (clauses[3]) == MATCH_YES)
 	{
@@ -5412,7 +5419,8 @@ gfc_match_omp_requires (void)
 	goto error;
 
       if (requires_clause & ~(OMP_REQ_ATOMIC_MEM_ORDER_MASK
-			      | OMP_REQ_DYNAMIC_ALLOCATORS))
+			      | OMP_REQ_DYNAMIC_ALLOCATORS
+			      | OMP_REQ_UNIFIED_SHARED_MEMORY))
 	gfc_error_now ("Sorry, %qs clause at %L on REQUIRES directive is not "
 		       "yet supported", clause, &old_loc);
       if (!gfc_omp_requires_add_clause (requires_clause, clause, &old_loc, NULL))
diff --git a/gcc/testsuite/c-c++-common/gomp/usm-1.c b/gcc/testsuite/c-c++-common/gomp/usm-1.c
new file mode 100644
index 00000000000..619c21a83f4
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/usm-1.c
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-foffload-memory=pinned" } */
+
+#pragma omp requires unified_shared_memory  /* { dg-error "unified_shared_memory is incompatible with the selected -foffload-memory option" } */
diff --git a/gcc/testsuite/gfortran.dg/gomp/usm-1.f90 b/gcc/testsuite/gfortran.dg/gomp/usm-1.f90
new file mode 100644
index 00000000000..340f6bb50a5
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/usm-1.f90
@@ -0,0 +1,6 @@
+! { dg-do compile }
+! { dg-additional-options "-foffload-memory=pinned" }
+
+!$omp requires unified_shared_memory  ! { dg-error "unified_shared_memory at .* is incompatible with the selected -foffload-memory option" }
+
+end
-- 
2.25.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 3/5] openmp, nvptx: ompx_unified_shared_mem_alloc
  2022-03-08 11:30 [PATCH 0/5] openmp: Handle pinned and unified shared memory Hafiz Abid Qadeer
  2022-03-08 11:30 ` [PATCH 1/5] openmp: Add -foffload-memory Hafiz Abid Qadeer
  2022-03-08 11:30 ` [PATCH 2/5] openmp: allow requires unified_shared_memory Hafiz Abid Qadeer
@ 2022-03-08 11:30 ` Hafiz Abid Qadeer
  2022-03-08 11:30 ` [PATCH 4/5] openmp: Use libgomp memory allocation functions with unified shared memory Hafiz Abid Qadeer
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Hafiz Abid Qadeer @ 2022-03-08 11:30 UTC (permalink / raw)
  To: gcc-patches, fortran; +Cc: jakub, ams, joseph

From: Andrew Stubbs <ams@codesourcery.com>

This adds support for using Cuda Managed Memory with omp_alloc.  It will be
used as the underpinnings for "requires unified_shared_memory" in a later
patch.

There are two new predefined allocators, ompx_unified_shared_mem_alloc and
ompx_host_mem_alloc, plus corresponding memory spaces, which can be used to
allocate memory in the "managed" space and explicitly on the host (it is
intended that "malloc" will be intercepted by the compiler).

The nvptx plugin is modified to make the necessary Cuda calls, and libgomp
is modified to switch to shared-memory mode for USM allocated mappings.

libgomp/ChangeLog:

	* allocator.c (omp_max_predefined_alloc): Update.
	(omp_aligned_alloc): Don't fallback ompx_host_mem_alloc.
	(omp_aligned_calloc): Likewise.
	(omp_realloc): Likewise.
	* config/linux/allocator.c (linux_memspace_alloc): Handle USM.
	(linux_memspace_calloc): Handle USM.
	(linux_memspace_free): Handle USM.
	(linux_memspace_realloc): Handle USM.
	* config/nvptx/allocator.c (nvptx_memspace_alloc): Reject
	ompx_host_mem_alloc.
	(nvptx_memspace_calloc): Likewise.
	(nvptx_memspace_realloc): Likewise.
	* libgomp-plugin.h (GOMP_OFFLOAD_usm_alloc): New prototype.
	(GOMP_OFFLOAD_usm_free): New prototype.
	(GOMP_OFFLOAD_is_usm_ptr): New prototype.
	* libgomp.h (gomp_usm_alloc): New prototype.
	(gomp_usm_free): New prototype.
	(gomp_is_usm_ptr): New prototype.
	(struct gomp_device_descr): Add USM functions.
	* omp.h.in (omp_memspace_handle_t): Add ompx_unified_shared_mem_space
	and ompx_host_mem_space.
	(omp_allocator_handle_t): Add ompx_unified_shared_mem_alloc and
	ompx_host_mem_alloc.
	* omp_lib.f90.in: Likewise.
	* plugin/plugin-nvptx.c (nvptx_alloc): Add "usm" parameter.
	Call cuMemAllocManaged as appropriate.
	(GOMP_OFFLOAD_alloc): Move internals to ...
	(GOMP_OFFLOAD_alloc_1): ... this, and add usm parameter.
	(GOMP_OFFLOAD_usm_alloc): New function.
	(GOMP_OFFLOAD_usm_free): New function.
	(GOMP_OFFLOAD_is_usm_ptr): New function.
	* target.c (gomp_map_vars_internal): Add USM support.
	(gomp_usm_alloc): New function.
	(gomp_usm_free): New function.
	(gomp_load_plugin_for_device): New function.
	* testsuite/libgomp.c/usm-1.c: New test.
	* testsuite/libgomp.c/usm-2.c: New test.
	* testsuite/libgomp.c/usm-3.c: New test.
	* testsuite/libgomp.c/usm-4.c: New test.
	* testsuite/libgomp.c/usm-5.c: New test.
---
 libgomp/allocator.c                 | 13 ++++--
 libgomp/config/linux/allocator.c    | 48 ++++++++++++--------
 libgomp/config/nvptx/allocator.c    |  6 +++
 libgomp/libgomp-plugin.h            |  3 ++
 libgomp/libgomp.h                   |  6 +++
 libgomp/omp.h.in                    |  4 ++
 libgomp/omp_lib.f90.in              |  8 ++++
 libgomp/plugin/plugin-nvptx.c       | 45 ++++++++++++++++---
 libgomp/target.c                    | 70 +++++++++++++++++++++++++++++
 libgomp/testsuite/libgomp.c/usm-1.c | 24 ++++++++++
 libgomp/testsuite/libgomp.c/usm-2.c | 32 +++++++++++++
 libgomp/testsuite/libgomp.c/usm-3.c | 35 +++++++++++++++
 libgomp/testsuite/libgomp.c/usm-4.c | 36 +++++++++++++++
 libgomp/testsuite/libgomp.c/usm-5.c | 28 ++++++++++++
 14 files changed, 330 insertions(+), 28 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/usm-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-4.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-5.c

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 000ccc2dd9c..18045dbe0c4 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -32,7 +32,7 @@
 #include <stdlib.h>
 #include <string.h>
 
-#define omp_max_predefined_alloc ompx_pinned_mem_alloc
+#define omp_max_predefined_alloc ompx_host_mem_alloc
 
 /* These macros may be overridden in config/<target>/allocator.c.  */
 #ifndef MEMSPACE_ALLOC
@@ -68,6 +68,8 @@ static const omp_memspace_handle_t predefined_alloc_mapping[] = {
   omp_low_lat_mem_space,   /* omp_pteam_mem_alloc. */
   omp_low_lat_mem_space,   /* omp_thread_mem_alloc. */
   omp_default_mem_space,   /* ompx_pinned_mem_alloc. */
+  ompx_unified_shared_mem_space,  /* ompx_unified_shared_mem_alloc. */
+  ompx_host_mem_space,     /* ompx_host_mem_alloc.  */
 };
 
 struct omp_allocator_data
@@ -367,7 +369,8 @@ fail:
   int fallback = (allocator_data
 		  ? allocator_data->fallback
 		  : (allocator == omp_default_mem_alloc
-		     || allocator == ompx_pinned_mem_alloc)
+		     || allocator == ompx_pinned_mem_alloc
+		     || allocator == ompx_host_mem_alloc)
 		  ? omp_atv_null_fb
 		  : omp_atv_default_mem_fb);
   switch (fallback)
@@ -597,7 +600,8 @@ fail:
   int fallback = (allocator_data
 		  ? allocator_data->fallback
 		  : (allocator == omp_default_mem_alloc
-		     || allocator == ompx_pinned_mem_alloc)
+		     || allocator == ompx_pinned_mem_alloc
+		     || allocator == ompx_host_mem_alloc)
 		  ? omp_atv_null_fb
 		  : omp_atv_default_mem_fb);
   switch (fallback)
@@ -855,7 +859,8 @@ fail:
   int fallback = (allocator_data
 		  ? allocator_data->fallback
 		  : (allocator == omp_default_mem_alloc
-		     || allocator == ompx_pinned_mem_alloc)
+		     || allocator == ompx_pinned_mem_alloc
+		     || allocator == ompx_host_mem_alloc)
 		  ? omp_atv_null_fb
 		  : omp_atv_default_mem_fb);
   switch (fallback)
diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c
index 5f3ae491f07..face524259c 100644
--- a/libgomp/config/linux/allocator.c
+++ b/libgomp/config/linux/allocator.c
@@ -42,9 +42,11 @@
 static void *
 linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin)
 {
-  (void)memspace;
-
-  if (pin)
+  if (memspace == ompx_unified_shared_mem_space)
+    {
+      return gomp_usm_alloc (size, GOMP_DEVICE_ICV);
+    }
+  else if (pin)
     {
       void *addr = mmap (NULL, size, PROT_READ | PROT_WRITE,
 			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
@@ -67,7 +69,14 @@ linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin)
 static void *
 linux_memspace_calloc (omp_memspace_handle_t memspace, size_t size, int pin)
 {
-  if (pin)
+  if (memspace == ompx_unified_shared_mem_space)
+    {
+      void *ret = gomp_usm_alloc (size, GOMP_DEVICE_ICV);
+      memset (ret, 0, size);
+      return ret;
+    }
+  else if (memspace == ompx_unified_shared_mem_space
+      || pin)
     return linux_memspace_alloc (memspace, size, pin);
   else
     return calloc (1, size);
@@ -77,9 +86,9 @@ static void
 linux_memspace_free (omp_memspace_handle_t memspace, void *addr, size_t size,
 		     int pin)
 {
-  (void)memspace;
-
-  if (pin)
+  if (memspace == ompx_unified_shared_mem_space)
+    gomp_usm_free (addr, GOMP_DEVICE_ICV);
+  else if (pin)
     munmap (addr, size);
   else
     free (addr);
@@ -89,7 +98,9 @@ static void *
 linux_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
 			size_t oldsize, size_t size, int oldpin, int pin)
 {
-  if (oldpin && pin)
+  if (memspace == ompx_unified_shared_mem_space)
+    goto manual_realloc;
+  else if (oldpin && pin)
     {
       void *newaddr = mremap (addr, oldsize, size, MREMAP_MAYMOVE);
       if (newaddr == MAP_FAILED)
@@ -98,18 +109,19 @@ linux_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
       return newaddr;
     }
   else if (oldpin || pin)
-    {
-      void *newaddr = linux_memspace_alloc (memspace, size, pin);
-      if (newaddr)
-	{
-	  memcpy (newaddr, addr, oldsize < size ? oldsize : size);
-	  linux_memspace_free (memspace, addr, oldsize, oldpin);
-	}
-
-      return newaddr;
-    }
+    goto manual_realloc;
   else
     return realloc (addr, size);
+
+manual_realloc:
+  void *newaddr = linux_memspace_alloc (memspace, size, pin);
+  if (newaddr)
+    {
+      memcpy (newaddr, addr, oldsize < size ? oldsize : size);
+      linux_memspace_free (memspace, addr, oldsize, oldpin);
+    }
+
+  return newaddr;
 }
 
 #define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \
diff --git a/libgomp/config/nvptx/allocator.c b/libgomp/config/nvptx/allocator.c
index 0102680b717..c1a73511623 100644
--- a/libgomp/config/nvptx/allocator.c
+++ b/libgomp/config/nvptx/allocator.c
@@ -125,6 +125,8 @@ nvptx_memspace_alloc (omp_memspace_handle_t memspace, size_t size)
       __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE);
       return result;
     }
+  else if (memspace == ompx_host_mem_space)
+    return NULL;
   else
     return malloc (size);
 }
@@ -145,6 +147,8 @@ nvptx_memspace_calloc (omp_memspace_handle_t memspace, size_t size)
 
       return result;
     }
+  else if (memspace == ompx_host_mem_space)
+    return NULL;
   else
     return calloc (1, size);
 }
@@ -354,6 +358,8 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
 	}
       return result;
     }
+  else if (memspace == ompx_host_mem_space)
+    return NULL;
   else
     return realloc (addr, size);
 }
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index 07ab700b80c..104f375bc1b 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -134,6 +134,9 @@ extern int GOMP_OFFLOAD_load_image (int, unsigned, const void *,
 extern bool GOMP_OFFLOAD_unload_image (int, unsigned, const void *);
 extern void *GOMP_OFFLOAD_alloc (int, size_t);
 extern bool GOMP_OFFLOAD_free (int, void *);
+extern void *GOMP_OFFLOAD_usm_alloc (int, size_t);
+extern bool GOMP_OFFLOAD_usm_free (int, void *);
+extern bool GOMP_OFFLOAD_is_usm_ptr (void *);
 extern bool GOMP_OFFLOAD_dev2host (int, void *, const void *, size_t);
 extern bool GOMP_OFFLOAD_host2dev (int, void *, const void *, size_t);
 extern bool GOMP_OFFLOAD_dev2dev (int, void *, const void *, size_t);
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index b9e03919993..1cbde607794 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1013,6 +1013,9 @@ extern int gomp_pause_host (void);
 extern void gomp_init_targets_once (void);
 extern int gomp_get_num_devices (void);
 extern bool gomp_target_task_fn (void *);
+extern void * gomp_usm_alloc (size_t size, int device_num);
+extern void gomp_usm_free (void *device_ptr, int device_num);
+extern bool gomp_is_usm_ptr (void *ptr);
 
 /* Splay tree definitions.  */
 typedef struct splay_tree_node_s *splay_tree_node;
@@ -1238,6 +1241,9 @@ struct gomp_device_descr
   __typeof (GOMP_OFFLOAD_unload_image) *unload_image_func;
   __typeof (GOMP_OFFLOAD_alloc) *alloc_func;
   __typeof (GOMP_OFFLOAD_free) *free_func;
+  __typeof (GOMP_OFFLOAD_usm_alloc) *usm_alloc_func;
+  __typeof (GOMP_OFFLOAD_usm_free) *usm_free_func;
+  __typeof (GOMP_OFFLOAD_is_usm_ptr) *is_usm_ptr_func;
   __typeof (GOMP_OFFLOAD_dev2host) *dev2host_func;
   __typeof (GOMP_OFFLOAD_host2dev) *host2dev_func;
   __typeof (GOMP_OFFLOAD_dev2dev) *dev2dev_func;
diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in
index 1d002d36aae..4ec4475306b 100644
--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -120,6 +120,8 @@ typedef enum omp_memspace_handle_t __GOMP_UINTPTR_T_ENUM
   omp_const_mem_space = 2,
   omp_high_bw_mem_space = 3,
   omp_low_lat_mem_space = 4,
+  ompx_unified_shared_mem_space = 5,
+  ompx_host_mem_space = 6,
   __omp_memspace_handle_t_max__ = __UINTPTR_MAX__
 } omp_memspace_handle_t;
 
@@ -135,6 +137,8 @@ typedef enum omp_allocator_handle_t __GOMP_UINTPTR_T_ENUM
   omp_pteam_mem_alloc = 7,
   omp_thread_mem_alloc = 8,
   ompx_pinned_mem_alloc = 9,
+  ompx_unified_shared_mem_alloc = 10,
+  ompx_host_mem_alloc = 11,
   __omp_allocator_handle_t_max__ = __UINTPTR_MAX__
 } omp_allocator_handle_t;
 
diff --git a/libgomp/omp_lib.f90.in b/libgomp/omp_lib.f90.in
index a095dad8962..e1c32aa78d2 100644
--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
@@ -160,6 +160,10 @@
                  parameter :: omp_thread_mem_alloc = 8
         integer (kind=omp_allocator_handle_kind), &
                  parameter :: ompx_pinned_mem_alloc = 9
+        integer (kind=omp_allocator_handle_kind), &
+                 parameter :: ompx_unified_shared_mem_alloc = 10
+        integer (kind=omp_allocator_handle_kind), &
+                 parameter :: ompx_host_mem_alloc = 11
         integer (omp_memspace_handle_kind), &
                  parameter :: omp_default_mem_space = 0
         integer (omp_memspace_handle_kind), &
@@ -170,6 +174,10 @@
                  parameter :: omp_high_bw_mem_space = 3
         integer (omp_memspace_handle_kind), &
                  parameter :: omp_low_lat_mem_space = 4
+        integer (omp_memspace_handle_kind), &
+                 parameter :: omp_unified_shared_mem_space = 5
+        integer (omp_memspace_handle_kind), &
+                 parameter :: omp_host_mem_space = 6
 
         type omp_alloctrait
           integer (kind=omp_alloctrait_key_kind) key
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 1b9a5e95c07..b664d652a45 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1042,11 +1042,13 @@ nvptx_stacks_free (struct ptx_device *ptx_dev, bool force)
 }
 
 static void *
-nvptx_alloc (size_t s, bool suppress_errors)
+nvptx_alloc (size_t s, bool suppress_errors, bool usm)
 {
   CUdeviceptr d;
 
-  CUresult r = CUDA_CALL_NOCHECK (cuMemAlloc, &d, s);
+  CUresult r = (usm ? CUDA_CALL_NOCHECK (cuMemAllocManaged, &d, s,
+					 CU_MEM_ATTACH_GLOBAL)
+		: CUDA_CALL_NOCHECK (cuMemAlloc, &d, s));
   if (suppress_errors && r == CUDA_ERROR_OUT_OF_MEMORY)
     return NULL;
   else if (r != CUDA_SUCCESS)
@@ -1423,8 +1425,8 @@ GOMP_OFFLOAD_unload_image (int ord, unsigned version, const void *target_data)
   return ret;
 }
 
-void *
-GOMP_OFFLOAD_alloc (int ord, size_t size)
+static void *
+GOMP_OFFLOAD_alloc_1 (int ord, size_t size, bool usm)
 {
   if (!nvptx_attach_host_thread_to_device (ord))
     return NULL;
@@ -1447,7 +1449,7 @@ GOMP_OFFLOAD_alloc (int ord, size_t size)
       blocks = tmp;
     }
 
-  void *d = nvptx_alloc (size, true);
+  void *d = nvptx_alloc (size, true, usm);
   if (d)
     return d;
   else
@@ -1455,10 +1457,22 @@ GOMP_OFFLOAD_alloc (int ord, size_t size)
       /* Memory allocation failed.  Try freeing the stacks block, and
 	 retrying.  */
       nvptx_stacks_free (ptx_dev, true);
-      return nvptx_alloc (size, false);
+      return nvptx_alloc (size, false, usm);
     }
 }
 
+void *
+GOMP_OFFLOAD_alloc (int ord, size_t size)
+{
+  return GOMP_OFFLOAD_alloc_1 (ord, size, false);
+}
+
+void *
+GOMP_OFFLOAD_usm_alloc (int ord, size_t size)
+{
+  return GOMP_OFFLOAD_alloc_1 (ord, size, true);
+}
+
 bool
 GOMP_OFFLOAD_free (int ord, void *ptr)
 {
@@ -1466,6 +1480,25 @@ GOMP_OFFLOAD_free (int ord, void *ptr)
 	  && nvptx_free (ptr, ptx_devices[ord]));
 }
 
+bool
+GOMP_OFFLOAD_usm_free (int ord, void *ptr)
+{
+  return GOMP_OFFLOAD_free (ord, ptr);
+}
+
+bool
+GOMP_OFFLOAD_is_usm_ptr (void *ptr)
+{
+  bool managed = false;
+  /* This returns 3 outcomes ...
+     CUDA_ERROR_INVALID_VALUE    - Not a Cuda allocated pointer.
+     CUDA_SUCCESS, managed:false - Cuda allocated, but not USM.
+     CUDA_SUCCESS, managed:true  - USM.  */
+  CUDA_CALL_NOCHECK (cuPointerGetAttribute, &managed,
+		     CU_POINTER_ATTRIBUTE_IS_MANAGED, (CUdeviceptr)ptr);
+  return managed;
+}
+
 void
 GOMP_OFFLOAD_openacc_exec (void (*fn) (void *), size_t mapnum,
 			   void **hostaddrs, void **devaddrs,
diff --git a/libgomp/target.c b/libgomp/target.c
index 9017458885e..f98e8da2526 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1030,6 +1030,15 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
 	    tgt->list[i].offset = 0;
 	  continue;
 	}
+      else if (devicep->is_usm_ptr_func
+	       && devicep->is_usm_ptr_func (hostaddrs[i]))
+	{
+	  /* The memory is visible from both host and target
+	     so nothing needs to be moved.  */
+	  tgt->list[i].key = NULL;
+	  tgt->list[i].offset = OFFSET_INLINED;
+	  continue;
+	}
       else if ((kind & typemask) == GOMP_MAP_STRUCT)
 	{
 	  size_t first = i + 1;
@@ -1488,6 +1497,9 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		  continue;
 		}
 	      default:
+		if (tgt->list[i].offset == OFFSET_INLINED
+		    && !array)
+		  continue;
 		break;
 	      }
 	    splay_tree_key k = &array->key;
@@ -3323,6 +3335,61 @@ omp_target_free (void *device_ptr, int device_num)
   gomp_mutex_unlock (&devicep->lock);
 }
 
+void *
+gomp_usm_alloc (size_t size, int device_num)
+{
+  if (device_num == gomp_get_num_devices ())
+    return malloc (size);
+
+  struct gomp_device_descr *devicep = resolve_device (device_num);
+  if (devicep == NULL)
+    return NULL;
+
+  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+      || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
+    return malloc (size);
+
+  void *ret = NULL;
+  gomp_mutex_lock (&devicep->lock);
+  if (devicep->usm_alloc_func)
+    ret = devicep->usm_alloc_func (devicep->target_id, size);
+  gomp_mutex_unlock (&devicep->lock);
+  return ret;
+}
+
+void
+gomp_usm_free (void *device_ptr, int device_num)
+{
+  if (device_ptr == NULL)
+    return;
+
+  if (device_num == gomp_get_num_devices ())
+    {
+      free (device_ptr);
+      return;
+    }
+
+  struct gomp_device_descr *devicep = resolve_device (device_num);
+  if (devicep == NULL)
+    return;
+
+  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+      || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
+    {
+      free (device_ptr);
+      return;
+    }
+
+  gomp_mutex_lock (&devicep->lock);
+  if (devicep->usm_free_func
+      && !devicep->usm_free_func (devicep->target_id, device_ptr))
+    {
+      gomp_mutex_unlock (&devicep->lock);
+      gomp_fatal ("error in freeing device memory block at %p", device_ptr);
+    }
+  gomp_mutex_unlock (&devicep->lock);
+}
+
 int
 omp_target_is_present (const void *ptr, int device_num)
 {
@@ -3740,6 +3807,9 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   DLSYM (unload_image);
   DLSYM (alloc);
   DLSYM (free);
+  DLSYM_OPT (usm_alloc, usm_alloc);
+  DLSYM_OPT (usm_free, usm_free);
+  DLSYM_OPT (is_usm_ptr, is_usm_ptr);
   DLSYM (dev2host);
   DLSYM (host2dev);
   device->capabilities = device->get_caps_func ();
diff --git a/libgomp/testsuite/libgomp.c/usm-1.c b/libgomp/testsuite/libgomp.c/usm-1.c
new file mode 100644
index 00000000000..1b35f19c45b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/usm-1.c
@@ -0,0 +1,24 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <stdint.h>
+
+int
+main ()
+{
+  int *a = (int *) omp_alloc(sizeof(int), ompx_unified_shared_mem_alloc);
+  if (!a)
+    __builtin_abort ();
+
+  *a = 42;
+  uintptr_t a_p = (uintptr_t)a;
+
+  #pragma omp target is_device_ptr(a)
+    {
+      if (*a != 42 || a_p != (uintptr_t)a)
+	__builtin_abort ();
+    }
+
+  omp_free(a, ompx_unified_shared_mem_alloc);
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/usm-2.c b/libgomp/testsuite/libgomp.c/usm-2.c
new file mode 100644
index 00000000000..689cee7e456
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/usm-2.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <stdint.h>
+
+int
+main ()
+{
+  int *a = (int *) omp_alloc(sizeof(int)*2, ompx_unified_shared_mem_alloc);
+  if (!a)
+    __builtin_abort ();
+
+  a[0] = 42;
+  a[1] = 43;
+
+  uintptr_t a_p = (uintptr_t)a;
+
+  #pragma omp target map(a[0])
+    {
+      if (a[0] != 42 || a_p != (uintptr_t)a)
+	__builtin_abort ();
+    }
+
+  #pragma omp target map(a[1])
+    {
+      if (a[1] != 43 || a_p != (uintptr_t)a)
+	__builtin_abort ();
+    }
+
+  omp_free(a, ompx_unified_shared_mem_alloc);
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/usm-3.c b/libgomp/testsuite/libgomp.c/usm-3.c
new file mode 100644
index 00000000000..2ca66afe93f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/usm-3.c
@@ -0,0 +1,35 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <stdint.h>
+
+int
+main ()
+{
+  int *a = (int *) omp_alloc(sizeof(int)*2, ompx_unified_shared_mem_alloc);
+  if (!a)
+    __builtin_abort ();
+
+  a[0] = 42;
+  a[1] = 43;
+
+  uintptr_t a_p = (uintptr_t)a;
+
+#pragma omp target data map(a[0:2])
+    {
+#pragma omp target
+	{
+	  if (a[0] != 42 || a_p != (uintptr_t)a)
+	    __builtin_abort ();
+	}
+
+#pragma omp target
+	{
+	  if (a[1] != 43 || a_p != (uintptr_t)a)
+	    __builtin_abort ();
+	}
+    }
+
+  omp_free(a, ompx_unified_shared_mem_alloc);
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/usm-4.c b/libgomp/testsuite/libgomp.c/usm-4.c
new file mode 100644
index 00000000000..753908c8440
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/usm-4.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <stdint.h>
+
+int
+main ()
+{
+  int *a = (int *) omp_alloc(sizeof(int)*2, ompx_unified_shared_mem_alloc);
+  if (!a)
+    __builtin_abort ();
+
+  a[0] = 42;
+  a[1] = 43;
+
+  uintptr_t a_p = (uintptr_t)a;
+
+#pragma omp target enter data map(to:a[0:2])
+
+#pragma omp target
+    {
+      if (a[0] != 42 || a_p != (uintptr_t)a)
+	__builtin_abort ();
+    }
+
+#pragma omp target
+    {
+      if (a[1] != 43 || a_p != (uintptr_t)a)
+	__builtin_abort ();
+    }
+
+#pragma omp target exit data map(delete:a[0:2])
+
+  omp_free(a, ompx_unified_shared_mem_alloc);
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/usm-5.c b/libgomp/testsuite/libgomp.c/usm-5.c
new file mode 100644
index 00000000000..4d8b3cf71b1
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/usm-5.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-require-effective-target offload_device } */
+
+#include <omp.h>
+#include <stdint.h>
+
+#pragma omp requires unified_shared_memory
+
+int
+main ()
+{
+  int *a = (int *) omp_alloc(sizeof(int), ompx_host_mem_alloc);
+  if (!a)
+    __builtin_abort ();
+
+  a[0] = 42;
+
+  uintptr_t a_p = (uintptr_t)a;
+
+#pragma omp target map(a[0:1])
+    {
+      if (a[0] != 42 || a_p == (uintptr_t)a)
+	__builtin_abort ();
+    }
+
+  omp_free(a, ompx_host_mem_alloc);
+  return 0;
+}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 4/5] openmp: Use libgomp memory allocation functions with unified shared memory.
  2022-03-08 11:30 [PATCH 0/5] openmp: Handle pinned and unified shared memory Hafiz Abid Qadeer
                   ` (2 preceding siblings ...)
  2022-03-08 11:30 ` [PATCH 3/5] openmp, nvptx: ompx_unified_shared_mem_alloc Hafiz Abid Qadeer
@ 2022-03-08 11:30 ` Hafiz Abid Qadeer
  2022-04-02 12:04   ` Andrew Stubbs
  2022-03-08 11:30 ` [PATCH 5/5] openmp: -foffload-memory=pinned Hafiz Abid Qadeer
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Hafiz Abid Qadeer @ 2022-03-08 11:30 UTC (permalink / raw)
  To: gcc-patches, fortran; +Cc: jakub, ams, joseph

This patches changes calls to malloc/free/calloc/realloc and operator new to
memory allocation functions in libgomp with
allocator=ompx_unified_shared_mem_alloc.  This helps existing code to benefit
from the unified shared memory.  The libgomp does the correct thing with all
the mapping constructs and there is no memory copies if the pointer is pointing
to unified shared memory.

We only replace replacable new operator and not the class member or placement new.

gcc/ChangeLog:

	* omp-low.cc (usm_transform): New function.
	(make_pass_usm_transform): Likewise.
	(class pass_usm_transform): New.
	* passes.def: Add pass_usm_transform.
	* tree-pass.h (make_pass_usm_transform): New declaration.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/usm-2.c: New test.
	* c-c++-common/gomp/usm-3.c: New test.
	* g++.dg/gomp/usm-1.C: New test.
	* g++.dg/gomp/usm-2.C: New test.
	* g++.dg/gomp/usm-3.C: New test.
	* gfortran.dg/gomp/usm-2.f90: New test.
	* gfortran.dg/gomp/usm-3.f90: New test.

libgomp/ChangeLog:

	* testsuite/libgomp.c/usm-6.c: New test.
	* testsuite/libgomp.c++/usm-1.C: Likewise.
---
 gcc/omp-low.cc                           | 152 +++++++++++++++++++++++
 gcc/passes.def                           |   1 +
 gcc/testsuite/c-c++-common/gomp/usm-2.c  |  34 +++++
 gcc/testsuite/c-c++-common/gomp/usm-3.c  |  32 +++++
 gcc/testsuite/g++.dg/gomp/usm-1.C        |  32 +++++
 gcc/testsuite/g++.dg/gomp/usm-2.C        |  30 +++++
 gcc/testsuite/g++.dg/gomp/usm-3.C        |  38 ++++++
 gcc/testsuite/gfortran.dg/gomp/usm-2.f90 |  16 +++
 gcc/testsuite/gfortran.dg/gomp/usm-3.f90 |  13 ++
 gcc/tree-pass.h                          |   1 +
 libgomp/testsuite/libgomp.c++/usm-1.C    |  54 ++++++++
 libgomp/testsuite/libgomp.c/usm-6.c      |  70 +++++++++++
 12 files changed, 473 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-3.c
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-2.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-3.C
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-3.f90
 create mode 100644 libgomp/testsuite/libgomp.c++/usm-1.C
 create mode 100644 libgomp/testsuite/libgomp.c/usm-6.c

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 5ce3a50709a..ec08d59f676 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -14849,6 +14849,158 @@ make_pass_diagnose_omp_blocks (gcc::context *ctxt)
 {
   return new pass_diagnose_omp_blocks (ctxt);
 }
+
+/* Provide transformation required for using unified shared memory
+   by replacing calls to standard memory allocation functions with
+   function provided by the libgomp.  */
+
+static tree
+usm_transform (gimple_stmt_iterator *gsi_p, bool *,
+	       struct walk_stmt_info *wi)
+{
+  gimple *stmt = gsi_stmt (*gsi_p);
+  /* ompx_unified_shared_mem_alloc is 10.  */
+  const unsigned int unified_shared_mem_alloc = 10;
+
+  switch (gimple_code (stmt))
+    {
+    case GIMPLE_CALL:
+      {
+	gcall *gs = as_a <gcall *> (stmt);
+	tree fndecl = gimple_call_fndecl (gs);
+	if (fndecl)
+	  {
+	    tree allocator = build_int_cst (pointer_sized_int_node,
+					    unified_shared_mem_alloc);
+	    const char *name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
+	    if ((strcmp (name, "malloc") == 0)
+		 || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL)
+		     && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_MALLOC)
+		 || DECL_IS_REPLACEABLE_OPERATOR_NEW_P (fndecl))
+	      {
+		  tree omp_alloc_type
+		    = build_function_type_list (ptr_type_node, size_type_node,
+						pointer_sized_int_node,
+						NULL_TREE);
+		tree repl = build_fn_decl ("omp_alloc", omp_alloc_type);
+		tree size = gimple_call_arg (gs, 0);
+		gimple *g = gimple_build_call (repl, 2, size, allocator);
+		gimple_call_set_lhs (g, gimple_call_lhs (gs));
+		gimple_set_location (g, gimple_location (stmt));
+		gsi_replace (gsi_p, g, true);
+	      }
+	    else if ((strcmp (name, "calloc") == 0)
+		      || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL)
+			  && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_CALLOC))
+	      {
+		tree omp_calloc_type
+		  = build_function_type_list (ptr_type_node, size_type_node,
+					      size_type_node,
+					      pointer_sized_int_node,
+					      NULL_TREE);
+		tree repl = build_fn_decl ("omp_calloc", omp_calloc_type);
+		tree num = gimple_call_arg (gs, 0);
+		tree size = gimple_call_arg (gs, 1);
+		gimple *g = gimple_build_call (repl, 3, num, size, allocator);
+		gimple_call_set_lhs (g, gimple_call_lhs (gs));
+		gimple_set_location (g, gimple_location (stmt));
+		gsi_replace (gsi_p, g, true);
+	      }
+	    else if ((strcmp (name, "realloc") == 0)
+		      || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL)
+			  && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_REALLOC))
+	      {
+		tree omp_realloc_type
+		  = build_function_type_list (ptr_type_node, ptr_type_node,
+					      size_type_node,
+					      pointer_sized_int_node,
+					      pointer_sized_int_node,
+					      NULL_TREE);
+		tree repl = build_fn_decl ("omp_realloc", omp_realloc_type);
+		tree ptr = gimple_call_arg (gs, 0);
+		tree size = gimple_call_arg (gs, 1);
+		gimple *g = gimple_build_call (repl, 4, ptr, size, allocator,
+					       allocator);
+		gimple_call_set_lhs (g, gimple_call_lhs (gs));
+		gimple_set_location (g, gimple_location (stmt));
+		gsi_replace (gsi_p, g, true);
+	      }
+	    else  if ((strcmp (name, "free") == 0)
+		       || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL)
+			   && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_FREE)
+		       || (DECL_IS_OPERATOR_DELETE_P (fndecl)
+			   && DECL_IS_REPLACEABLE_OPERATOR (fndecl)))
+	      {
+		tree omp_free_type
+		  = build_function_type_list (void_type_node, ptr_type_node,
+					      pointer_sized_int_node,
+					      NULL_TREE);
+		tree repl = build_fn_decl ("omp_free", omp_free_type);
+		tree ptr = gimple_call_arg (gs, 0);
+		gimple *g = gimple_build_call (repl, 2, ptr, allocator);
+		gimple_set_location (g, gimple_location (stmt));
+		gsi_replace (gsi_p, g, true);
+	      }
+	  }
+      }
+      break;
+
+    default:
+      break;
+    }
+
+  return NULL_TREE;
+}
+
+namespace {
+
+const pass_data pass_data_usm_transform =
+{
+  GIMPLE_PASS, /* type */
+  "usm_transform", /* name */
+  OPTGROUP_OMP, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  PROP_gimple_any, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_usm_transform : public gimple_opt_pass
+{
+public:
+  pass_usm_transform (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_usm_transform, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+  {
+    return (flag_openmp || flag_openmp_simd)
+	    && (flag_offload_memory == OFFLOAD_MEMORY_UNIFIED
+		|| omp_requires_mask & OMP_REQUIRES_UNIFIED_SHARED_MEMORY);
+  }
+  virtual unsigned int execute (function *)
+  {
+    struct walk_stmt_info wi;
+    gimple_seq body = gimple_body (current_function_decl);
+
+    memset (&wi, 0, sizeof (wi));
+    walk_gimple_seq (body, usm_transform, NULL, &wi);
+
+    return 0;
+  }
+
+}; // class pass_usm_transform
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_usm_transform (gcc::context *ctxt)
+{
+  return new pass_usm_transform (ctxt);
+}
 \f
 
 #include "gt-omp-low.h"
diff --git a/gcc/passes.def b/gcc/passes.def
index f7718181038..98c7736bb8b 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -36,6 +36,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_diagnose_tm_blocks);
   NEXT_PASS (pass_omp_oacc_kernels_decompose);
   NEXT_PASS (pass_lower_omp);
+  NEXT_PASS (pass_usm_transform);
   NEXT_PASS (pass_lower_cf);
   NEXT_PASS (pass_lower_tm);
   NEXT_PASS (pass_refactor_eh);
diff --git a/gcc/testsuite/c-c++-common/gomp/usm-2.c b/gcc/testsuite/c-c++-common/gomp/usm-2.c
new file mode 100644
index 00000000000..2f3f986012c
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/usm-2.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fdump-tree-usm_transform" } */
+
+#pragma omp requires unified_shared_memory
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+void *malloc (__SIZE_TYPE__);
+void *calloc(__SIZE_TYPE__, __SIZE_TYPE__);
+void *realloc(void *, __SIZE_TYPE__);
+void free (void *);
+
+#ifdef __cplusplus
+}
+#endif
+
+void
+foo ()
+{
+  void *p1 = malloc(20);
+  void *p2 = realloc(p1, 30);
+  void *p3 = calloc(4, 15);
+  free (p2);
+  free (p3);
+}
+
+/* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_realloc \\(.*, 30, 10, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_calloc \\(4, 15, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_free" 2 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " free"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " malloc"  "usm_transform"  } } */
diff --git a/gcc/testsuite/c-c++-common/gomp/usm-3.c b/gcc/testsuite/c-c++-common/gomp/usm-3.c
new file mode 100644
index 00000000000..c8230e7ff7c
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/usm-3.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-foffload-memory=unified -fdump-tree-usm_transform" } */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+void *malloc (__SIZE_TYPE__);
+void *calloc(__SIZE_TYPE__, __SIZE_TYPE__);
+void *realloc(void *, __SIZE_TYPE__);
+void free (void *);
+
+#ifdef __cplusplus
+}
+#endif
+
+void
+foo ()
+{
+  void *p1 = malloc(20);
+  void *p2 = realloc(p1, 30);
+  void *p3 = calloc(4, 15);
+  free (p2);
+  free (p3);
+}
+
+/* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_realloc \\(.*, 30, 10, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_calloc \\(4, 15, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_free" 2 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " free"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " malloc"  "usm_transform"  } } */
diff --git a/gcc/testsuite/g++.dg/gomp/usm-1.C b/gcc/testsuite/g++.dg/gomp/usm-1.C
new file mode 100644
index 00000000000..bd70a81b5bb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/usm-1.C
@@ -0,0 +1,32 @@
+// { dg-do compile }
+// { dg-options "-fopenmp -fdump-tree-usm_transform" }
+
+#pragma omp requires unified_shared_memory
+
+struct t1
+{
+  int a;
+  int b;
+};
+
+typedef unsigned char uint8_t;
+
+void
+foo (__SIZE_TYPE__ x, __SIZE_TYPE__ y)
+{
+  uint8_t *p1 = new uint8_t;
+  uint8_t *p2 = new uint8_t[20];
+  t1 *p3 = new t1;
+  t1 *p4 = new t1[y];
+  delete p1;
+  delete p3;
+  delete [] p2;
+  delete [] p4;
+}
+
+/* { dg-final { scan-tree-dump-times "omp_alloc \\(1, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_alloc" 4 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_free" 4 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not "operator new"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not "operator delete"  "usm_transform"  } } */
diff --git a/gcc/testsuite/g++.dg/gomp/usm-2.C b/gcc/testsuite/g++.dg/gomp/usm-2.C
new file mode 100644
index 00000000000..f6ab155c6de
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/usm-2.C
@@ -0,0 +1,30 @@
+// { dg-do compile }
+// { dg-options "-fopenmp -foffload-memory=unified -fdump-tree-usm_transform" }
+
+struct t1
+{
+  int a;
+  int b;
+};
+
+typedef unsigned char uint8_t;
+
+void
+foo (__SIZE_TYPE__ x, __SIZE_TYPE__ y)
+{
+  uint8_t *p1 = new uint8_t;
+  uint8_t *p2 = new uint8_t[20];
+  t1 *p3 = new t1;
+  t1 *p4 = new t1[y];
+  delete p1;
+  delete p3;
+  delete [] p2;
+  delete [] p4;
+}
+
+/* { dg-final { scan-tree-dump-times "omp_alloc \\(1, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_alloc" 4 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_free" 4 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not "operator new"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not "operator delete"  "usm_transform"  } } */
diff --git a/gcc/testsuite/g++.dg/gomp/usm-3.C b/gcc/testsuite/g++.dg/gomp/usm-3.C
new file mode 100644
index 00000000000..50ac9302c8b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/usm-3.C
@@ -0,0 +1,38 @@
+// { dg-do compile }
+// { dg-options "-fopenmp -fdump-tree-usm_transform" }
+
+#pragma omp requires unified_shared_memory
+
+#include <new>
+
+
+struct X {
+    static void* operator new(std::size_t count)
+    {
+      static char buf[10];
+      return &buf[0];
+    }
+    static void* operator new[](std::size_t count)
+    {
+      static char buf[10];
+      return &buf[0];
+    }
+    static void operator delete(void*)
+    {
+    }
+    static void operator delete[](void*)
+    {
+    }
+};
+void foo() {
+  X* p1 = new X;
+  delete p1;
+  X* p2 = new X[10];
+  delete[] p2;
+  unsigned char buf[24] ;
+  int *p3 = new (buf) int(3);
+  p3[0] = 1;
+}
+
+/* { dg-final { scan-tree-dump-not "omp_alloc"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not "omp_free"  "usm_transform"  } } */
diff --git a/gcc/testsuite/gfortran.dg/gomp/usm-2.f90 b/gcc/testsuite/gfortran.dg/gomp/usm-2.f90
new file mode 100644
index 00000000000..dc775260cb7
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/usm-2.f90
@@ -0,0 +1,16 @@
+! { dg-do compile }
+! { dg-additional-options "-fdump-tree-usm_transform" }
+
+!$omp requires unified_shared_memory
+end
+
+subroutine foo()
+  implicit none
+  integer, allocatable :: var1
+
+  allocate(var1)
+
+end subroutine
+
+! { dg-final { scan-tree-dump-times "omp_alloc" 1 "usm_transform"  } } 
+! { dg-final { scan-tree-dump-times "omp_free" 1 "usm_transform"  } } 
\ No newline at end of file
diff --git a/gcc/testsuite/gfortran.dg/gomp/usm-3.f90 b/gcc/testsuite/gfortran.dg/gomp/usm-3.f90
new file mode 100644
index 00000000000..7983444ebff
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/usm-3.f90
@@ -0,0 +1,13 @@
+! { dg-do compile }
+! { dg-additional-options "-foffload-memory=unified -fdump-tree-usm_transform" }
+
+subroutine foo()
+  implicit none
+  integer, allocatable :: var1
+
+  allocate(var1)
+
+end subroutine
+
+! { dg-final { scan-tree-dump-times "omp_alloc" 1 "usm_transform"  } } 
+! { dg-final { scan-tree-dump-times "omp_free" 1 "usm_transform"  } } 
\ No newline at end of file
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 606d1d60b85..494a9662afa 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -424,6 +424,7 @@ extern gimple_opt_pass *make_pass_lower_vector_ssa (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_omp_oacc_kernels_decompose (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_usm_transform (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_omp_target_link (gcc::context *ctxt);
diff --git a/libgomp/testsuite/libgomp.c++/usm-1.C b/libgomp/testsuite/libgomp.c++/usm-1.C
new file mode 100644
index 00000000000..fea25e5f10b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/usm-1.C
@@ -0,0 +1,54 @@
+/* { dg-do run } */
+/* { dg-skip-if "Only valid for nvptx" { ! offload_target_nvptx } } */
+#include <stdint.h>
+
+#pragma omp requires unified_shared_memory
+
+int g1 = 0;
+
+struct s1
+{
+  s1() { a = g1++;}
+  ~s1() { g1--;}
+  int a;
+};
+
+int
+main ()
+{
+  s1 *p1 = new s1;
+  s1 *p2 = new s1[10];
+
+  if (!p1 || !p2 || p1->a != 0)
+    __builtin_abort ();
+
+  for (int i = 0; i < 10; i++)
+    if (p2[i].a != i+1)
+      __builtin_abort ();
+
+  uintptr_t pp1 = (uintptr_t)p1;
+  uintptr_t pp2 = (uintptr_t)p2;
+
+#pragma omp target firstprivate(pp1, pp2)
+    {
+      s1 *t1 = (s1*)pp1;
+      s1 *t2 = (s1*)pp2;
+      if (t1->a != 0)
+	__builtin_abort ();
+
+      for (int i = 0; i < 10; i++)
+	if (t2[i].a != i+1)
+	  __builtin_abort ();
+
+      t1->a = 42;
+    }
+
+  if (p1->a != 42)
+    __builtin_abort ();
+
+  delete [] p2;
+  delete p1;
+  if (g1 != 0)
+    __builtin_abort ();
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/usm-6.c b/libgomp/testsuite/libgomp.c/usm-6.c
new file mode 100644
index 00000000000..d98da68a1ed
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/usm-6.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-skip-if "Only valid for nvptx" { ! offload_target_nvptx } } */
+
+#include <stdint.h>
+#include <stdlib.h>
+
+#pragma omp requires unified_shared_memory
+
+int
+main ()
+{
+  int *a = (int *) malloc(sizeof(int)*2);
+  int *b = (int *) calloc(sizeof(int), 3);
+  int *c = (int *) realloc(NULL, sizeof(int) * 4);
+  if (!a || !b || !c)
+    __builtin_abort ();
+
+  a[0] = 42;
+  a[1] = 43;
+  b[0] = 52;
+  b[1] = 53;
+  b[2] = 54;
+  c[0] = 62;
+  c[1] = 63;
+  c[2] = 64;
+  c[3] = 65;
+
+  uintptr_t a_p = (uintptr_t)a;
+  uintptr_t b_p = (uintptr_t)b;
+  uintptr_t c_p = (uintptr_t)c;
+
+#pragma omp target enter data map(to:a[0:2])
+
+#pragma omp target is_device_ptr(c)
+    {
+      if (a[0] != 42 || a_p != (uintptr_t)a)
+	__builtin_abort ();
+      if (b[0] != 52 || b[2] != 54 || b_p != (uintptr_t)b)
+	__builtin_abort ();
+      if (c[0] != 62 || c[3] != 65 || c_p != (uintptr_t)c)
+	__builtin_abort ();
+      a[0] = 72;
+      b[0] = 82;
+      c[0] = 92;
+    }
+
+#pragma omp target
+    {
+      if (a[1] != 43 || a_p != (uintptr_t)a)
+	__builtin_abort ();
+      if (b[1] != 53 || b_p != (uintptr_t)b)
+	__builtin_abort ();
+      if (c[1] != 63 || c[2] != 64 || c_p != (uintptr_t)c)
+	__builtin_abort ();
+      a[1] = 73;
+      b[1] = 83;
+      c[1] = 93;
+    }
+
+#pragma omp target exit data map(delete:a[0:2])
+
+  if (a[0] != 72 || a[1] != 73
+      || b[0] != 82 || b[1] != 83
+      || c[0] != 92 || c[1] != 93)
+	__builtin_abort ();
+  free(a);
+  free(b);
+  free(c);
+  return 0;
+}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 5/5] openmp: -foffload-memory=pinned
  2022-03-08 11:30 [PATCH 0/5] openmp: Handle pinned and unified shared memory Hafiz Abid Qadeer
                   ` (3 preceding siblings ...)
  2022-03-08 11:30 ` [PATCH 4/5] openmp: Use libgomp memory allocation functions with unified shared memory Hafiz Abid Qadeer
@ 2022-03-08 11:30 ` Hafiz Abid Qadeer
  2022-03-30 22:40   ` Andrew Stubbs
  2023-02-09 11:16   ` [og12] 'c-c++-common/gomp/alloc-pinned-1.c' -> 'libgomp.c-c++-common/alloc-pinned-1.c' (was: [PATCH 5/5] openmp: -foffload-memory=pinned) Thomas Schwinge
  2022-04-13 13:14 ` [PATCH 0/5] openmp: Handle pinned and unified shared memory Andrew Stubbs
  2022-04-20 13:25 ` [PATCH] openmp: Handle unified address memory Andrew Stubbs
  6 siblings, 2 replies; 12+ messages in thread
From: Hafiz Abid Qadeer @ 2022-03-08 11:30 UTC (permalink / raw)
  To: gcc-patches, fortran; +Cc: jakub, ams, joseph

From: Andrew Stubbs <ams@codesourcery.com>

Implement the -foffload-memory=pinned option such that libgomp is
instructed to enable fully-pinned memory at start-up.  The option is
intended to provide a performance boost to certain offload programs without
modifying the code.

This feature only works on Linux, at present, and simply calls mlockall to
enable always-on memory pinning.  It requires that the ulimit feature is
set high enough to accommodate all the program's memory usage.

In this mode the ompx_pinned_memory_alloc feature is disabled as it is not
needed and may conflict.

gcc/ChangeLog:

	* omp-low.cc (omp_enable_pinned_mode): New function.
	(execute_lower_omp): Call omp_enable_pinned_mode.

libgomp/ChangeLog:

	* config/linux/allocator.c (always_pinned_mode): New variable.
	(GOMP_enable_pinned_mode): New function.
	(linux_memspace_alloc): Disable pinning when always_pinned_mode set.
	(linux_memspace_calloc): Likewise.
	(linux_memspace_free): Likewise.
	(linux_memspace_realloc): Likewise.
	* libgomp.map (GOMP_5.1.1): New version space with
	GOMP_enable_pinned_mode.
	* testsuite/libgomp.c/alloc-pinned-7.c: New test.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/alloc-pinned-1.c: New test.
---
 gcc/omp-low.cc                                | 68 +++++++++++++++++++
 .../c-c++-common/gomp/alloc-pinned-1.c        | 28 ++++++++
 libgomp/config/linux/allocator.c              | 26 +++++++
 libgomp/libgomp.map                           |  5 ++
 libgomp/testsuite/libgomp.c/alloc-pinned-7.c  | 66 ++++++++++++++++++
 5 files changed, 193 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-7.c

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index ec08d59f676..ce21b3bd6f8 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -14441,6 +14441,70 @@ lower_omp (gimple_seq *body, omp_context *ctx)
   input_location = saved_location;
 }
 
+/* Emit a constructor function to enable -foffload-memory=pinned
+   at runtime.  Libgomp handles the OS mode setting, but we need to trigger
+   it by calling GOMP_enable_pinned mode before the program proper runs.  */
+
+static void
+omp_enable_pinned_mode ()
+{
+  static bool visited = false;
+  if (visited)
+    return;
+  visited = true;
+
+  /* Create a new function like this:
+
+       static void __attribute__((constructor))
+       __set_pinned_mode ()
+       {
+	 GOMP_enable_pinned_mode ();
+       }
+  */
+
+  tree name = get_identifier ("__set_pinned_mode");
+  tree voidfntype = build_function_type_list (void_type_node, NULL_TREE);
+  tree decl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL, name, voidfntype);
+
+  TREE_STATIC (decl) = 1;
+  TREE_USED (decl) = 1;
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  TREE_PUBLIC (decl) = 0;
+  DECL_UNINLINABLE (decl) = 1;
+  DECL_EXTERNAL (decl) = 0;
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  BLOCK_SUPERCONTEXT (DECL_INITIAL (decl)) = decl;
+  DECL_STATIC_CONSTRUCTOR (decl) = 1;
+  DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("constructor"),
+				      NULL_TREE, NULL_TREE);
+
+  tree t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE,
+		       void_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_CONTEXT (t) = decl;
+  DECL_RESULT (decl) = t;
+
+  push_struct_function (decl);
+  init_tree_ssa (cfun);
+
+  tree callname = get_identifier ("GOMP_enable_pinned_mode");
+  tree calldecl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL, callname,
+			      voidfntype);
+  gcall *call = gimple_build_call (calldecl, 0);
+
+  gimple_seq seq = NULL;
+  gimple_seq_add_stmt (&seq, call);
+  gimple_set_body (decl, gimple_build_bind (NULL_TREE, seq, NULL));
+
+  cfun->function_end_locus = UNKNOWN_LOCATION;
+  cfun->curr_properties |= PROP_gimple_any;
+  pop_cfun ();
+  cgraph_node::add_new_function (decl, true);
+}
+
 /* Main entry point.  */
 
 static unsigned int
@@ -14497,6 +14561,10 @@ execute_lower_omp (void)
   for (auto task_stmt : task_cpyfns)
     finalize_task_copyfn (task_stmt);
   task_cpyfns.release ();
+
+  if (flag_offload_memory == OFFLOAD_MEMORY_PINNED)
+    omp_enable_pinned_mode ();
+
   return 0;
 }
 
diff --git a/gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c b/gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c
new file mode 100644
index 00000000000..e0e08019bff
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-additional-options "-foffload-memory=pinned" } */
+/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */
+
+#if __cplusplus
+#define EXTERNC extern "C"
+#else
+#define EXTERNC
+#endif
+
+/* Intercept the libgomp initialization call to check it happens.  */
+
+int good = 0;
+
+EXTERNC void
+GOMP_enable_pinned_mode ()
+{
+  good = 1;
+}
+
+int
+main ()
+{
+  if (!good)
+    __builtin_exit (1);
+
+  return 0;
+}
diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c
index face524259c..4bd5bd6b930 100644
--- a/libgomp/config/linux/allocator.c
+++ b/libgomp/config/linux/allocator.c
@@ -39,9 +39,26 @@
 #include <string.h>
 #include "libgomp.h"
 
+static bool always_pinned_mode = false;
+
+/* This function is called by the compiler when -foffload-memory=pinned
+   is used.  */
+
+void
+GOMP_enable_pinned_mode ()
+{
+  if (mlockall (MCL_CURRENT | MCL_FUTURE) != 0)
+    gomp_error ("failed to pin all memory (ulimit too low?)");
+  else
+    always_pinned_mode = true;
+}
+
 static void *
 linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin)
 {
+  /* Explicit pinning may not be required.  */
+  pin = pin && !always_pinned_mode;
+
   if (memspace == ompx_unified_shared_mem_space)
     {
       return gomp_usm_alloc (size, GOMP_DEVICE_ICV);
@@ -69,6 +86,9 @@ linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin)
 static void *
 linux_memspace_calloc (omp_memspace_handle_t memspace, size_t size, int pin)
 {
+  /* Explicit pinning may not be required.  */
+  pin = pin && !always_pinned_mode;
+
   if (memspace == ompx_unified_shared_mem_space)
     {
       void *ret = gomp_usm_alloc (size, GOMP_DEVICE_ICV);
@@ -86,6 +106,9 @@ static void
 linux_memspace_free (omp_memspace_handle_t memspace, void *addr, size_t size,
 		     int pin)
 {
+  /* Explicit pinning may not be required.  */
+  pin = pin && !always_pinned_mode;
+
   if (memspace == ompx_unified_shared_mem_space)
     gomp_usm_free (addr, GOMP_DEVICE_ICV);
   else if (pin)
@@ -98,6 +121,9 @@ static void *
 linux_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
 			size_t oldsize, size_t size, int oldpin, int pin)
 {
+  /* Explicit pinning may not be required.  */
+  pin = pin && !always_pinned_mode;
+
   if (memspace == ompx_unified_shared_mem_space)
     goto manual_realloc;
   else if (oldpin && pin)
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 2ac58094169..40402dc9893 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -402,6 +402,11 @@ GOMP_5.1 {
 	GOMP_teams4;
 } GOMP_5.0.1;
 
+GOMP_5.1.1 {
+  global:
+	GOMP_enable_pinned_mode;
+} GOMP_5.1;
+
 OACC_2.0 {
   global:
 	acc_get_num_devices;
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-7.c b/libgomp/testsuite/libgomp.c/alloc-pinned-7.c
new file mode 100644
index 00000000000..6fd19b46a5c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-7.c
@@ -0,0 +1,66 @@
+/* { dg-do run } */
+/* { dg-additional-options "-foffload-memory=pinned" } */
+
+/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */
+
+/* Test that pinned memory works.  */
+
+#ifdef __linux__
+#include <sys/types.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#include <sys/mman.h>
+
+int
+get_pinned_mem ()
+{
+  int pid = getpid ();
+  char buf[100];
+  sprintf (buf, "/proc/%d/status", pid);
+
+  FILE *proc = fopen (buf, "r");
+  if (!proc)
+    abort ();
+  while (fgets (buf, 100, proc))
+    {
+      int val;
+      if (sscanf (buf, "VmLck: %d", &val))
+	{
+	  printf ("lock %d\n", val);
+	  fclose (proc);
+	  return val;
+	}
+    }
+  abort ();
+}
+#else
+int
+get_pinned_mem ()
+{
+  return 0;
+}
+
+#define mlockall(...) 0
+#endif
+
+#include <omp.h>
+
+/* Allocate more than a page each time, but stay within the ulimit.  */
+#define SIZE 10*1024
+
+int
+main ()
+{
+  // Sanity check
+  if (get_pinned_mem () == 0)
+    {
+      /* -foffload-memory=pinned has failed, but maybe that's because
+	 isufficient pinned memory was available.  */
+      if (mlockall (MCL_CURRENT | MCL_FUTURE) == 0)
+	abort ();
+    }
+
+  return 0;
+}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 5/5] openmp: -foffload-memory=pinned
  2022-03-08 11:30 ` [PATCH 5/5] openmp: -foffload-memory=pinned Hafiz Abid Qadeer
@ 2022-03-30 22:40   ` Andrew Stubbs
  2023-02-09 11:16   ` [og12] 'c-c++-common/gomp/alloc-pinned-1.c' -> 'libgomp.c-c++-common/alloc-pinned-1.c' (was: [PATCH 5/5] openmp: -foffload-memory=pinned) Thomas Schwinge
  1 sibling, 0 replies; 12+ messages in thread
From: Andrew Stubbs @ 2022-03-30 22:40 UTC (permalink / raw)
  To: Hafiz Abid Qadeer, gcc-patches, fortran; +Cc: jakub, joseph

[-- Attachment #1: Type: text/plain, Size: 324 bytes --]

On 08/03/2022 11:30, Hafiz Abid Qadeer wrote:
> gcc/ChangeLog:
> 
> 	* omp-low.cc (omp_enable_pinned_mode): New function.
> 	(execute_lower_omp): Call omp_enable_pinned_mode.

This worked for x86_64, but I needed to make the attached adjustment to 
work on powerpc without a linker error.

I've committed it to OG11.

Andrew

[-- Attachment #2: 220330-gomp_enable_pinned_mode.patch --]
[-- Type: text/plain, Size: 1360 bytes --]

openmp: BUILT_IN_GOMP_ENABLE_PINNED_MODE

Rework the GOMP_enable_pinned_mode call so that it works on powerpc where
the old way gave a local call.

gcc/ChangeLog:

	* omp-builtins.def (BUILT_IN_GOMP_ENABLE_PINNED_MODE): New.
	* omp-low.c (omp_enable_pinned_mode): Use
	BUILT_IN_GOMP_ENABLE_PINNED_MODE.

diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index c591d79fa07..e442b0b5c94 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -468,3 +468,6 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ERROR, "GOMP_error",
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_EVALUATE_TARGET_DEVICE, "GOMP_evaluate_target_device",
 		  BT_FN_BOOL_INT_CONST_PTR_CONST_PTR_CONST_PTR,
 		  ATTR_NOTHROW_LEAF_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ENABLE_PINNED_MODE,
+		  "GOMP_enable_pinned_mode",
+		  BT_FN_VOID, ATTR_NOTHROW_LIST)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index f7ecfb52c73..4e8ab9e4ca0 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -15432,9 +15432,7 @@ omp_enable_pinned_mode ()
   push_struct_function (decl);
   init_tree_ssa (cfun);
 
-  tree callname = get_identifier ("GOMP_enable_pinned_mode");
-  tree calldecl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL, callname,
-			      voidfntype);
+  tree calldecl = builtin_decl_explicit (BUILT_IN_GOMP_ENABLE_PINNED_MODE);
   gcall *call = gimple_build_call (calldecl, 0);
 
   gimple_seq seq = NULL;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/5] openmp: Use libgomp memory allocation functions with unified shared memory.
  2022-03-08 11:30 ` [PATCH 4/5] openmp: Use libgomp memory allocation functions with unified shared memory Hafiz Abid Qadeer
@ 2022-04-02 12:04   ` Andrew Stubbs
  2022-04-02 12:42     ` Andrew Stubbs
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Stubbs @ 2022-04-02 12:04 UTC (permalink / raw)
  To: Hafiz Abid Qadeer, gcc-patches, fortran; +Cc: jakub, joseph

[-- Attachment #1: Type: text/plain, Size: 491 bytes --]

On 08/03/2022 11:30, Hafiz Abid Qadeer wrote:
> This patches changes calls to malloc/free/calloc/realloc and operator new to
> memory allocation functions in libgomp with
> allocator=ompx_unified_shared_mem_alloc.

This additional patch adds transformation for omp_target_alloc. The 
OpenMP 5.0 document says that addresses allocated this way needs to work 
without is_device_ptr. The easiest way to make that work is to make them 
USM addresses.

I will commit this to OG11 shortly.

Andrew

[-- Attachment #2: 220401-usm-omp_target_alloc.patch --]
[-- Type: text/plain, Size: 6595 bytes --]

openmp: Do USM transform for omp_target_alloc

OpenMP 5.0 says that omp_target_alloc should return USM addresses.

gcc/ChangeLog:

	* omp-low.c (usm_transform): Transform omp_target_alloc and
	omp_target_free.

libgomp/ChangeLog:

	* testsuite/libgomp.c/usm-6.c: Add omp_target_alloc.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/usm-2.c: Add omp_target_alloc.
	* c-c++-common/gomp/usm-3.c: Add omp_target_alloc.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 4e8ab9e4ca0..9235eafd1d7 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -15880,7 +15880,8 @@ usm_transform (gimple_stmt_iterator *gsi_p, bool *,
 	    if ((strcmp (name, "malloc") == 0)
 		 || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL)
 		     && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_MALLOC)
-		 || DECL_IS_REPLACEABLE_OPERATOR_NEW_P (fndecl))
+		 || DECL_IS_REPLACEABLE_OPERATOR_NEW_P (fndecl)
+		 || strcmp (name, "omp_target_alloc") == 0)
 	      {
 		  tree omp_alloc_type
 		    = build_function_type_list (ptr_type_node, size_type_node,
@@ -15952,7 +15953,8 @@ usm_transform (gimple_stmt_iterator *gsi_p, bool *,
 		       || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL)
 			   && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_FREE)
 		       || (DECL_IS_OPERATOR_DELETE_P (fndecl)
-			   && DECL_IS_REPLACEABLE_OPERATOR (fndecl)))
+			   && DECL_IS_REPLACEABLE_OPERATOR (fndecl))
+		       || strcmp (name, "omp_target_free") == 0)
 	      {
 		tree omp_free_type
 		  = build_function_type_list (void_type_node, ptr_type_node,
diff --git a/gcc/testsuite/c-c++-common/gomp/usm-2.c b/gcc/testsuite/c-c++-common/gomp/usm-2.c
index 64dbb6be131..8c20ef94e69 100644
--- a/gcc/testsuite/c-c++-common/gomp/usm-2.c
+++ b/gcc/testsuite/c-c++-common/gomp/usm-2.c
@@ -12,6 +12,8 @@ void *aligned_alloc (__SIZE_TYPE__, __SIZE_TYPE__);
 void *calloc(__SIZE_TYPE__, __SIZE_TYPE__);
 void *realloc(void *, __SIZE_TYPE__);
 void free (void *);
+void *omp_target_alloc (__SIZE_TYPE__, int);
+void omp_target_free (void *, int);
 
 #ifdef __cplusplus
 }
@@ -24,16 +26,21 @@ foo ()
   void *p2 = realloc(p1, 30);
   void *p3 = calloc(4, 15);
   void *p4 = aligned_alloc(16, 40);
+  void *p5 = omp_target_alloc(50, 1);
   free (p2);
   free (p3);
   free (p4);
+  omp_target_free (p5, 1);
 }
 
 /* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 "usm_transform"  } } */
 /* { dg-final { scan-tree-dump-times "omp_realloc \\(.*, 30, 10, 10\\)" 1 "usm_transform"  } } */
 /* { dg-final { scan-tree-dump-times "omp_calloc \\(4, 15, 10\\)" 1 "usm_transform"  } } */
 /* { dg-final { scan-tree-dump-times "omp_aligned_alloc \\(16, 40, 10\\)" 1 "usm_transform"  } } */
-/* { dg-final { scan-tree-dump-times "omp_free" 3 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_alloc \\(50, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_free" 4 "usm_transform"  } } */
 /* { dg-final { scan-tree-dump-not " free"  "usm_transform"  } } */
 /* { dg-final { scan-tree-dump-not " aligned_alloc"  "usm_transform"  } } */
 /* { dg-final { scan-tree-dump-not " malloc"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " omp_target_alloc"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " omp_target_free"  "usm_transform"  } } */
diff --git a/gcc/testsuite/c-c++-common/gomp/usm-3.c b/gcc/testsuite/c-c++-common/gomp/usm-3.c
index 934582ea5fd..2b0cbb45e27 100644
--- a/gcc/testsuite/c-c++-common/gomp/usm-3.c
+++ b/gcc/testsuite/c-c++-common/gomp/usm-3.c
@@ -10,6 +10,8 @@ void *aligned_alloc (__SIZE_TYPE__, __SIZE_TYPE__);
 void *calloc(__SIZE_TYPE__, __SIZE_TYPE__);
 void *realloc(void *, __SIZE_TYPE__);
 void free (void *);
+void *omp_target_alloc (__SIZE_TYPE__, int);
+void omp_target_free (void *, int);
 
 #ifdef __cplusplus
 }
@@ -22,16 +24,21 @@ foo ()
   void *p2 = realloc(p1, 30);
   void *p3 = calloc(4, 15);
   void *p4 = aligned_alloc(16, 40);
+  void *p5 = omp_target_alloc(50, 1);
   free (p2);
   free (p3);
   free (p4);
+  omp_target_free (p5, 1);
 }
 
 /* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 "usm_transform"  } } */
 /* { dg-final { scan-tree-dump-times "omp_realloc \\(.*, 30, 10, 10\\)" 1 "usm_transform"  } } */
 /* { dg-final { scan-tree-dump-times "omp_calloc \\(4, 15, 10\\)" 1 "usm_transform"  } } */
 /* { dg-final { scan-tree-dump-times "omp_aligned_alloc \\(16, 40, 10\\)" 1 "usm_transform"  } } */
-/* { dg-final { scan-tree-dump-times "omp_free" 3 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_alloc \\(50, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_free" 4 "usm_transform"  } } */
 /* { dg-final { scan-tree-dump-not " free"  "usm_transform"  } } */
 /* { dg-final { scan-tree-dump-not " aligned_alloc"  "usm_transform"  } } */
 /* { dg-final { scan-tree-dump-not " malloc"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " omp_target_alloc"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " omp_target_free"  "usm_transform"  } } */
diff --git a/libgomp/testsuite/libgomp.c/usm-6.c b/libgomp/testsuite/libgomp.c/usm-6.c
index d2c828fdc9d..c207140092a 100644
--- a/libgomp/testsuite/libgomp.c/usm-6.c
+++ b/libgomp/testsuite/libgomp.c/usm-6.c
@@ -4,6 +4,8 @@
 #include <stdint.h>
 #include <stdlib.h>
 
+#include <omp.h>
+
 /* On old systems, the declaraition may not be present in stdlib.h which
    will generate a warning.  This function is going to be replaced with
    omp_aligned_alloc so the purpose of this declaration is to avoid that
@@ -19,7 +21,8 @@ main ()
   int *b = (int *) calloc(sizeof(int), 3);
   int *c = (int *) realloc(NULL, sizeof(int) * 4);
   int *d = (int *) aligned_alloc(32, sizeof(int));
-  if (!a || !b || !c || !d)
+  int *e = (int *) omp_target_alloc(sizeof(int), 1);
+  if (!a || !b || !c || !d || !e)
     __builtin_abort ();
 
   a[0] = 42;
@@ -36,6 +39,7 @@ main ()
   uintptr_t b_p = (uintptr_t)b;
   uintptr_t c_p = (uintptr_t)c;
   uintptr_t d_p = (uintptr_t)d;
+  uintptr_t e_p = (uintptr_t)e;
 
   if (d_p & 31 != 0)
     __builtin_abort ();
@@ -52,9 +56,12 @@ main ()
 	__builtin_abort ();
       if (d_p != (uintptr_t)d)
 	__builtin_abort ();
+      if (e_p != (uintptr_t)e)
+	__builtin_abort ();
       a[0] = 72;
       b[0] = 82;
       c[0] = 92;
+      e[0] = 102;
     }
 
 #pragma omp target
@@ -74,10 +81,12 @@ main ()
 
   if (a[0] != 72 || a[1] != 73
       || b[0] != 82 || b[1] != 83
-      || c[0] != 92 || c[1] != 93)
+      || c[0] != 92 || c[1] != 93
+      || e[0] != 102)
 	__builtin_abort ();
   free(a);
   free(b);
   free(c);
+  omp_target_free(e, 1);
   return 0;
 }

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/5] openmp: Use libgomp memory allocation functions with unified shared memory.
  2022-04-02 12:04   ` Andrew Stubbs
@ 2022-04-02 12:42     ` Andrew Stubbs
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Stubbs @ 2022-04-02 12:42 UTC (permalink / raw)
  To: Hafiz Abid Qadeer, gcc-patches, fortran; +Cc: jakub, joseph

On 02/04/2022 13:04, Andrew Stubbs wrote:
> This additional patch adds transformation for omp_target_alloc. The 
> OpenMP 5.0 document says that addresses allocated this way needs to work 
> without is_device_ptr. The easiest way to make that work is to make them 
> USM addresses.

Actually, reading on, it says "Every device address allocated through 
OpenMP device memory routines is a valid host pointer", so USM is the 
correct answer.

Andrew


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/5] openmp: Handle pinned and unified shared memory.
  2022-03-08 11:30 [PATCH 0/5] openmp: Handle pinned and unified shared memory Hafiz Abid Qadeer
                   ` (4 preceding siblings ...)
  2022-03-08 11:30 ` [PATCH 5/5] openmp: -foffload-memory=pinned Hafiz Abid Qadeer
@ 2022-04-13 13:14 ` Andrew Stubbs
  2022-04-20 13:25 ` [PATCH] openmp: Handle unified address memory Andrew Stubbs
  6 siblings, 0 replies; 12+ messages in thread
From: Andrew Stubbs @ 2022-04-13 13:14 UTC (permalink / raw)
  To: Hafiz Abid Qadeer, gcc-patches, fortran; +Cc: jakub, joseph

[-- Attachment #1: Type: text/plain, Size: 338 bytes --]

This patch adjusts the testcases, previously proposed, to allow for 
testing on machines with varying page sizes and default amounts of 
lockable memory.  There turns out to be more variation than I had thought.

This should go on mainline at the same time as the previous patches in 
this thread.

I'll commit it to OG11 shortly.

Andrew

[-- Attachment #2: 220413-pinned-tests.patch --]
[-- Type: text/plain, Size: 8831 bytes --]

libgomp: autodetect page sizes in pinned memory tests

There's not one number that works everywhere.
This also fixes the failure mode on non-Linux hosts.

libgomp/ChangeLog:

	* testsuite/libgomp.c/alloc-pinned-1.c: Autodetect page size.
	* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.
	* testsuite/libgomp.c/alloc-pinned-3.c: Likewise.
	* testsuite/libgomp.c/alloc-pinned-4.c: Likewise.
	* testsuite/libgomp.c/alloc-pinned-5.c: Likewise.
	* testsuite/libgomp.c/alloc-pinned-6.c: Likewise.
	* testsuite/libgomp.c/alloc-pinned-7.c: Clean up.

diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-1.c b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c
index 0a6360cda29..79792b16d83 100644
--- a/libgomp/testsuite/libgomp.c/alloc-pinned-1.c
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c
@@ -4,13 +4,23 @@
 
 /* Test that pinned memory works.  */
 
+#include <stdio.h>
+#include <stdlib.h>
+
 #ifdef __linux__
 #include <sys/types.h>
 #include <unistd.h>
-#include <stdio.h>
-#include <stdlib.h>
 
 #include <sys/mman.h>
+#include <sys/resource.h>
+
+#define PAGE_SIZE sysconf(_SC_PAGESIZE)
+#define CHECK_SIZE(SIZE) { \
+  struct rlimit limit; \
+  if (getrlimit (RLIMIT_MEMLOCK, &limit) \
+      || limit.rlim_cur <= SIZE) \
+    fprintf (stderr, "unsufficient lockable memory; please increase ulimit\n"); \
+  }
 
 int
 get_pinned_mem ()
@@ -34,6 +44,9 @@ get_pinned_mem ()
   abort ();
 }
 #else
+#define PAGE_SIZE 1 /* unknown */
+#define CHECK_SIZE(SIZE) fprintf (stderr, "OS unsupported\n");
+
 int
 get_pinned_mem ()
 {
@@ -43,12 +56,13 @@ get_pinned_mem ()
 
 #include <omp.h>
 
-/* Allocate more than a page each time, but stay within the ulimit.  */
-#define SIZE 10*1024
-
 int
 main ()
 {
+  /* Allocate at least a page each time, but stay within the ulimit.  */
+  const int SIZE = PAGE_SIZE;
+  CHECK_SIZE (SIZE*3);
+
   const omp_alloctrait_t traits[] = {
       { omp_atk_pinned, 1 }
   };
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-2.c b/libgomp/testsuite/libgomp.c/alloc-pinned-2.c
index 8fdb4ff5cfd..228c656b715 100644
--- a/libgomp/testsuite/libgomp.c/alloc-pinned-2.c
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-2.c
@@ -4,13 +4,23 @@
 
 /* Test that pinned memory works (pool_size code path).  */
 
+#include <stdio.h>
+#include <stdlib.h>
+
 #ifdef __linux__
 #include <sys/types.h>
 #include <unistd.h>
-#include <stdio.h>
-#include <stdlib.h>
 
 #include <sys/mman.h>
+#include <sys/resource.h>
+
+#define PAGE_SIZE sysconf(_SC_PAGESIZE)
+#define CHECK_SIZE(SIZE) { \
+  struct rlimit limit; \
+  if (getrlimit (RLIMIT_MEMLOCK, &limit) \
+      || limit.rlim_cur <= SIZE) \
+    fprintf (stderr, "unsufficient lockable memory; please increase ulimit\n"); \
+  }
 
 int
 get_pinned_mem ()
@@ -34,6 +44,9 @@ get_pinned_mem ()
   abort ();
 }
 #else
+#define PAGE_SIZE 1 /* unknown */
+#define CHECK_SIZE(SIZE) fprintf (stderr, "OS unsupported\n");
+
 int
 get_pinned_mem ()
 {
@@ -43,12 +56,13 @@ get_pinned_mem ()
 
 #include <omp.h>
 
-/* Allocate more than a page each time, but stay within the ulimit.  */
-#define SIZE 10*1024
-
 int
 main ()
 {
+  /* Allocate at least a page each time, but stay within the ulimit.  */
+  const int SIZE = PAGE_SIZE;
+  CHECK_SIZE (SIZE*3);
+
   const omp_alloctrait_t traits[] = {
       { omp_atk_pinned, 1 },
       { omp_atk_pool_size, SIZE*8 }
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-3.c b/libgomp/testsuite/libgomp.c/alloc-pinned-3.c
index 943dfea5c9b..90539ffe3e0 100644
--- a/libgomp/testsuite/libgomp.c/alloc-pinned-3.c
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-3.c
@@ -2,15 +2,18 @@
 
 /* Test that pinned memory fails correctly.  */
 
+#include <stdio.h>
+#include <stdlib.h>
+
 #ifdef __linux__
 #include <sys/types.h>
 #include <unistd.h>
-#include <stdio.h>
-#include <stdlib.h>
 
 #include <sys/mman.h>
 #include <sys/resource.h>
 
+#define PAGE_SIZE sysconf(_SC_PAGESIZE)
+
 int
 get_pinned_mem ()
 {
@@ -45,6 +48,8 @@ set_pin_limit (int size)
 }
 #else
 int
+#define PAGE_SIZE 10000*1024 /* unknown */
+
 get_pinned_mem ()
 {
   return 0;
@@ -58,12 +63,12 @@ set_pin_limit ()
 
 #include <omp.h>
 
-/* This should be large enough to cover multiple pages.  */
-#define SIZE 10000*1024
-
 int
 main ()
 {
+  /* This needs to be large enough to cover multiple pages.  */
+  const int SIZE = PAGE_SIZE*4;
+
   /* Pinned memory, no fallback.  */
   const omp_alloctrait_t traits1[] = {
       { omp_atk_pinned, 1 },
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-4.c b/libgomp/testsuite/libgomp.c/alloc-pinned-4.c
index d9cb8dfe1fd..534e49eefc4 100644
--- a/libgomp/testsuite/libgomp.c/alloc-pinned-4.c
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-4.c
@@ -2,15 +2,18 @@
 
 /* Test that pinned memory fails correctly, pool_size code path.  */
 
+#include <stdio.h>
+#include <stdlib.h>
+
 #ifdef __linux__
 #include <sys/types.h>
 #include <unistd.h>
-#include <stdio.h>
-#include <stdlib.h>
 
 #include <sys/mman.h>
 #include <sys/resource.h>
 
+#define PAGE_SIZE sysconf(_SC_PAGESIZE)
+
 int
 get_pinned_mem ()
 {
@@ -45,6 +48,8 @@ set_pin_limit (int size)
 }
 #else
 int
+#define PAGE_SIZE 10000*1024 /* unknown */
+
 get_pinned_mem ()
 {
   return 0;
@@ -58,12 +63,12 @@ set_pin_limit ()
 
 #include <omp.h>
 
-/* This should be large enough to cover multiple pages.  */
-#define SIZE 10000*1024
-
 int
 main ()
 {
+  /* This needs to be large enough to cover multiple pages.  */
+  const int SIZE = PAGE_SIZE*4;
+
   /* Pinned memory, no fallback.  */
   const omp_alloctrait_t traits1[] = {
       { omp_atk_pinned, 1 },
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-5.c b/libgomp/testsuite/libgomp.c/alloc-pinned-5.c
index 8355ca83790..315c7161a39 100644
--- a/libgomp/testsuite/libgomp.c/alloc-pinned-5.c
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-5.c
@@ -4,13 +4,23 @@
 
 /* Test that ompx_pinned_mem_alloc works.  */
 
+#include <stdio.h>
+#include <stdlib.h>
+
 #ifdef __linux__
 #include <sys/types.h>
 #include <unistd.h>
-#include <stdio.h>
-#include <stdlib.h>
 
 #include <sys/mman.h>
+#include <sys/resource.h>
+
+#define PAGE_SIZE sysconf(_SC_PAGESIZE)
+#define CHECK_SIZE(SIZE) { \
+  struct rlimit limit; \
+  if (getrlimit (RLIMIT_MEMLOCK, &limit) \
+      || limit.rlim_cur <= SIZE) \
+    fprintf (stderr, "unsufficient lockable memory; please increase ulimit\n"); \
+  }
 
 int
 get_pinned_mem ()
@@ -34,6 +44,9 @@ get_pinned_mem ()
   abort ();
 }
 #else
+#define PAGE_SIZE 1 /* unknown */
+#define CHECK_SIZE(SIZE) fprintf (stderr, "OS unsupported\n");
+
 int
 get_pinned_mem ()
 {
@@ -43,12 +56,13 @@ get_pinned_mem ()
 
 #include <omp.h>
 
-/* Allocate more than a page each time, but stay within the ulimit.  */
-#define SIZE 10*1024
-
 int
 main ()
 {
+  /* Allocate at least a page each time, but stay within the ulimit.  */
+  const int SIZE = PAGE_SIZE;
+  CHECK_SIZE (SIZE*3);
+
   // Sanity check
   if (get_pinned_mem () != 0)
     abort ();
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-6.c b/libgomp/testsuite/libgomp.c/alloc-pinned-6.c
index 80fd37ab875..bbe20c04875 100644
--- a/libgomp/testsuite/libgomp.c/alloc-pinned-6.c
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-6.c
@@ -2,15 +2,18 @@
 
 /* Test that ompx_pinned_mem_alloc fails correctly.  */
 
+#include <stdio.h>
+#include <stdlib.h>
+
 #ifdef __linux__
 #include <sys/types.h>
 #include <unistd.h>
-#include <stdio.h>
-#include <stdlib.h>
 
 #include <sys/mman.h>
 #include <sys/resource.h>
 
+#define PAGE_SIZE sysconf(_SC_PAGESIZE)
+
 int
 get_pinned_mem ()
 {
@@ -44,6 +47,8 @@ set_pin_limit (int size)
     abort ();
 }
 #else
+#define PAGE_SIZE 10000*1024 /* unknown */
+
 int
 get_pinned_mem ()
 {
@@ -58,12 +63,12 @@ set_pin_limit ()
 
 #include <omp.h>
 
-/* This should be large enough to cover multiple pages.  */
-#define SIZE 10000*1024
-
 int
 main ()
 {
+  /* Allocate at least a page each time, but stay within the ulimit.  */
+  const int SIZE = PAGE_SIZE*4;
+
   /* Ensure that the limit is smaller than the allocation.  */
   set_pin_limit (SIZE/2);
 
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-7.c b/libgomp/testsuite/libgomp.c/alloc-pinned-7.c
index 6fd19b46a5c..8dc19055038 100644
--- a/libgomp/testsuite/libgomp.c/alloc-pinned-7.c
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-7.c
@@ -5,11 +5,12 @@
 
 /* Test that pinned memory works.  */
 
+#include <stdio.h>
+#include <stdlib.h>
+
 #ifdef __linux__
 #include <sys/types.h>
 #include <unistd.h>
-#include <stdio.h>
-#include <stdlib.h>
 
 #include <sys/mman.h>
 
@@ -28,7 +29,6 @@ get_pinned_mem ()
       int val;
       if (sscanf (buf, "VmLck: %d", &val))
 	{
-	  printf ("lock %d\n", val);
 	  fclose (proc);
 	  return val;
 	}
@@ -47,9 +47,6 @@ get_pinned_mem ()
 
 #include <omp.h>
 
-/* Allocate more than a page each time, but stay within the ulimit.  */
-#define SIZE 10*1024
-
 int
 main ()
 {

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] openmp: Handle unified address memory.
  2022-03-08 11:30 [PATCH 0/5] openmp: Handle pinned and unified shared memory Hafiz Abid Qadeer
                   ` (5 preceding siblings ...)
  2022-04-13 13:14 ` [PATCH 0/5] openmp: Handle pinned and unified shared memory Andrew Stubbs
@ 2022-04-20 13:25 ` Andrew Stubbs
  6 siblings, 0 replies; 12+ messages in thread
From: Andrew Stubbs @ 2022-04-20 13:25 UTC (permalink / raw)
  To: Hafiz Abid Qadeer, gcc-patches, fortran; +Cc: jakub, joseph

[-- Attachment #1: Type: text/plain, Size: 393 bytes --]

This patch adds enough support for "requires unified_address" to make 
the sollve_vv testcases pass. It implements unified_address as a synonym 
of unified_shared_memory, which is both valid and the only way I know of 
to unify addresses with Cuda (could be wrong).

This patch should be applied on to of the previous patch set for USM.

OK for stage 1?

I'll apply it to OG11 shortly.

Andrew

[-- Attachment #2: 220420-unified_address.patch --]
[-- Type: text/plain, Size: 6037 bytes --]

openmp: unified_address support

This makes "requires unified_address" work by making it eqivalent to
"requires unified_shared_memory".  This is more than is strictly necessary,
but should be standard compliant.

gcc/c/ChangeLog:

	* c-parser.c (c_parser_omp_requires): Check requires unified_address
	for conflict with -foffload-memory=shared.

gcc/cp/ChangeLog:

	* parser.c (cp_parser_omp_requires): Check requires unified_address
	for conflict with -foffload-memory=shared.

gcc/fortran/ChangeLog:

	* openmp.c (gfc_match_omp_requires): Check requires unified_address
	for conflict with -foffload-memory=shared.

gcc/ChangeLog:

	* omp-low.c: Do USM transformations for "unified_address".

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/usm-4.c: New test.
	* gfortran.dg/gomp/usm-4.f90: New test.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 12408770193..9a3d0cb8cea 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -22531,18 +22531,27 @@ c_parser_omp_requires (c_parser *parser)
 	  enum omp_requires this_req = (enum omp_requires) 0;
 
 	  if (!strcmp (p, "unified_address"))
-	    this_req = OMP_REQUIRES_UNIFIED_ADDRESS;
+	    {
+	      this_req = OMP_REQUIRES_UNIFIED_ADDRESS;
+
+	      if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+		  && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+		error_at (cloc,
+			  "unified_address is incompatible with the "
+			  "selected -foffload-memory option");
+	      flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
+	    }
 	  else if (!strcmp (p, "unified_shared_memory"))
-	  {
-	    this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY;
-
-	    if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
-		&& flag_offload_memory != OFFLOAD_MEMORY_NONE)
-	      error_at (cloc,
-			"unified_shared_memory is incompatible with the "
-			"selected -foffload-memory option");
-	    flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
-	  }
+	    {
+	      this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY;
+
+	      if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+		  && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+		error_at (cloc,
+			  "unified_shared_memory is incompatible with the "
+			  "selected -foffload-memory option");
+	      flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
+	    }
 	  else if (!strcmp (p, "dynamic_allocators"))
 	    this_req = OMP_REQUIRES_DYNAMIC_ALLOCATORS;
 	  else if (!strcmp (p, "reverse_offload"))
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index fd9f62f4543..3a9ea272f10 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -46406,18 +46406,27 @@ cp_parser_omp_requires (cp_parser *parser, cp_token *pragma_tok)
 	  enum omp_requires this_req = (enum omp_requires) 0;
 
 	  if (!strcmp (p, "unified_address"))
-	    this_req = OMP_REQUIRES_UNIFIED_ADDRESS;
+	    {
+	      this_req = OMP_REQUIRES_UNIFIED_ADDRESS;
+
+	      if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+		  && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+		error_at (cloc,
+			  "unified_address is incompatible with the "
+			  "selected -foffload-memory option");
+	      flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
+	    }
 	  else if (!strcmp (p, "unified_shared_memory"))
-	  {
-	    this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY;
-
-	    if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
-		&& flag_offload_memory != OFFLOAD_MEMORY_NONE)
-	      error_at (cloc,
-			"unified_shared_memory is incompatible with the "
-			"selected -foffload-memory option");
-	    flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
-	  }
+	    {
+	      this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY;
+
+	      if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+		  && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+		error_at (cloc,
+			  "unified_shared_memory is incompatible with the "
+			  "selected -foffload-memory option");
+	      flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
+	    }
 	  else if (!strcmp (p, "dynamic_allocators"))
 	    this_req = OMP_REQUIRES_DYNAMIC_ALLOCATORS;
 	  else if (!strcmp (p, "reverse_offload"))
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index ac4126bd7ea..ece04c03a68 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -5546,6 +5546,12 @@ gfc_match_omp_requires (void)
 	  requires_clause = OMP_REQ_UNIFIED_ADDRESS;
 	  if (requires_clauses & OMP_REQ_UNIFIED_ADDRESS)
 	    goto duplicate_clause;
+
+	  if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+	      && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+	    gfc_error_now ("unified_address at %C is incompatible with "
+			   "the selected -foffload-memory option");
+	  flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
 	}
       else if (gfc_match (clauses[2]) == MATCH_YES)
 	{
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 4653370aa41..ce30f53dbb5 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -16008,7 +16008,8 @@ public:
   {
     return (flag_openmp || flag_openmp_simd)
 	    && (flag_offload_memory == OFFLOAD_MEMORY_UNIFIED
-		|| omp_requires_mask & OMP_REQUIRES_UNIFIED_SHARED_MEMORY);
+		|| omp_requires_mask & OMP_REQUIRES_UNIFIED_SHARED_MEMORY
+		|| omp_requires_mask & OMP_REQUIRES_UNIFIED_ADDRESS);
   }
   virtual unsigned int execute (function *)
   {
diff --git a/gcc/testsuite/c-c++-common/gomp/usm-4.c b/gcc/testsuite/c-c++-common/gomp/usm-4.c
new file mode 100644
index 00000000000..b19664e9b66
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/usm-4.c
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-foffload-memory=pinned" } */
+
+#pragma omp requires unified_address        /* { dg-error "unified_address is incompatible with the selected -foffload-memory option" } */
diff --git a/gcc/testsuite/gfortran.dg/gomp/usm-4.f90 b/gcc/testsuite/gfortran.dg/gomp/usm-4.f90
new file mode 100644
index 00000000000..725b07f2f88
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/usm-4.f90
@@ -0,0 +1,6 @@
+! { dg-do compile }
+! { dg-additional-options "-foffload-memory=pinned" }
+
+!$omp requires unified_address  ! { dg-error "unified_address at .* is incompatible with the selected -foffload-memory option" }
+
+end

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [og12] 'c-c++-common/gomp/alloc-pinned-1.c' -> 'libgomp.c-c++-common/alloc-pinned-1.c' (was: [PATCH 5/5] openmp: -foffload-memory=pinned)
  2022-03-08 11:30 ` [PATCH 5/5] openmp: -foffload-memory=pinned Hafiz Abid Qadeer
  2022-03-30 22:40   ` Andrew Stubbs
@ 2023-02-09 11:16   ` Thomas Schwinge
  1 sibling, 0 replies; 12+ messages in thread
From: Thomas Schwinge @ 2023-02-09 11:16 UTC (permalink / raw)
  To: gcc-patches, ams; +Cc: Hafiz Abid Qadeer, fortran, jakub, joseph

[-- Attachment #1: Type: text/plain, Size: 967 bytes --]

Hi!

On 2022-03-08T11:30:59+0000, Hafiz Abid Qadeer <abidh@codesourcery.com> wrote:
> From: Andrew Stubbs <ams@codesourcery.com>
>
> [...]

> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c
> @@ -0,0 +1,28 @@
> +/* { dg-do run } */

Pushed to devel/omp/gcc-12 branch
commit 9c0ffa3776a135a69697253a0bd75ebf9b9d0150
"'c-c++-common/gomp/alloc-pinned-1.c' -> 'libgomp.c-c++-common/alloc-pinned-1.c'",
see attached.

Note that this likewise applies to the current upstream submission:
<inbox.sourceware.org/gcc-patches/8011a994bb38db60f37127880b0fc682564f6e8d.1657188329.git.ams@codesourcery.com>
"openmp: -foffload-memory=pinned".


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-c-c-common-gomp-alloc-pinned-1.c-libgomp.c-c-common-.patch --]
[-- Type: text/x-diff, Size: 2046 bytes --]

From 9c0ffa3776a135a69697253a0bd75ebf9b9d0150 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Mon, 30 Jan 2023 17:46:29 +0100
Subject: [PATCH] 'c-c++-common/gomp/alloc-pinned-1.c' ->
 'libgomp.c-c++-common/alloc-pinned-1.c'

Otherwise, for build-tree testing:

    xgcc: fatal error: cannot read spec file 'libgomp.spec': No such file or directory

..., and thus corresponding FAILs, UNRESOLVEDs.

Fix-up for og12 commit 842df187487f5b16ae29bbe7e9acd79661a9df48
"openmp: -foffload-memory=pinned".

	gcc/testsuite/
	* c-c++-common/gomp/alloc-pinned-1.c: Cut.
	libgomp/
	* testsuite/libgomp.c-c++-common/alloc-pinned-1.c: Paste.
---
 gcc/testsuite/ChangeLog.omp                                   | 2 ++
 libgomp/ChangeLog.omp                                         | 4 ++++
 .../testsuite/libgomp.c-c++-common}/alloc-pinned-1.c          | 0
 3 files changed, 6 insertions(+)
 rename {gcc/testsuite/c-c++-common/gomp => libgomp/testsuite/libgomp.c-c++-common}/alloc-pinned-1.c (100%)

diff --git a/gcc/testsuite/ChangeLog.omp b/gcc/testsuite/ChangeLog.omp
index 42769c7dae5..9f9d5a10ac3 100644
--- a/gcc/testsuite/ChangeLog.omp
+++ b/gcc/testsuite/ChangeLog.omp
@@ -1,5 +1,7 @@
 2023-02-09  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* c-c++-common/gomp/alloc-pinned-1.c: Cut.
+
 	* gfortran.dg/gomp/allocate-4.f90: Fix 'omp_allocator_handle_kind'
 	example.
 
diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index d319d43ceb0..39165173884 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,3 +1,7 @@
+2023-02-09  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* testsuite/libgomp.c-c++-common/alloc-pinned-1.c: Paste.
+
 2023-02-08  Tobias Burnus  <tobias@codesourcery.com>
 
 	Backported from master:
diff --git a/gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c b/libgomp/testsuite/libgomp.c-c++-common/alloc-pinned-1.c
similarity index 100%
rename from gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c
rename to libgomp/testsuite/libgomp.c-c++-common/alloc-pinned-1.c
-- 
2.25.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-02-09 11:16 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-08 11:30 [PATCH 0/5] openmp: Handle pinned and unified shared memory Hafiz Abid Qadeer
2022-03-08 11:30 ` [PATCH 1/5] openmp: Add -foffload-memory Hafiz Abid Qadeer
2022-03-08 11:30 ` [PATCH 2/5] openmp: allow requires unified_shared_memory Hafiz Abid Qadeer
2022-03-08 11:30 ` [PATCH 3/5] openmp, nvptx: ompx_unified_shared_mem_alloc Hafiz Abid Qadeer
2022-03-08 11:30 ` [PATCH 4/5] openmp: Use libgomp memory allocation functions with unified shared memory Hafiz Abid Qadeer
2022-04-02 12:04   ` Andrew Stubbs
2022-04-02 12:42     ` Andrew Stubbs
2022-03-08 11:30 ` [PATCH 5/5] openmp: -foffload-memory=pinned Hafiz Abid Qadeer
2022-03-30 22:40   ` Andrew Stubbs
2023-02-09 11:16   ` [og12] 'c-c++-common/gomp/alloc-pinned-1.c' -> 'libgomp.c-c++-common/alloc-pinned-1.c' (was: [PATCH 5/5] openmp: -foffload-memory=pinned) Thomas Schwinge
2022-04-13 13:14 ` [PATCH 0/5] openmp: Handle pinned and unified shared memory Andrew Stubbs
2022-04-20 13:25 ` [PATCH] openmp: Handle unified address memory Andrew Stubbs

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).