[PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators

public inbox for gcc-patches@gcc.gnu.org

* [PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators
@ 2022-07-07 10:34 Andrew Stubbs
From: Andrew Stubbs @ 2022-07-07 10:34 UTC
  To: gcc-patches

This patch series implements OpenMP allocators for low-latency memory on
nvptx, unified shared memory on both nvptx and amdgcn, and generic
pinned memory support for all Linux hosts (an nvptx-specific
implementation using CUDA pinned memory is planned for the future, as is
low-latency memory on amdgcn).

Patches 01 to 14 are reposts of previously submitted patches, now
forward-ported to the current master branch with the various
follow-up patches folded in. Where they conflict with the new memkind
implementation, memkind takes precedence (but there is currently no way to
implement memory that is both high-bandwidth and pinned anyway).

Patches 15 to 17 are new work. I can probably approve these myself, but
they can't be committed until the rest of the series is approved.

Andrew

Andrew Stubbs (11):
  libgomp, nvptx: low-latency memory allocator
  libgomp: pinned memory
  libgomp, openmp: Add ompx_pinned_mem_alloc
  openmp, nvptx: low-lat memory access traits
  openmp, nvptx: ompx_unified_shared_mem_alloc
  openmp: Add -foffload-memory
  openmp: allow requires unified_shared_memory
  openmp: -foffload-memory=pinned
  amdgcn: Support XNACK mode
  amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK
  amdgcn: libgomp plugin USM implementation

Hafiz Abid Qadeer (6):
  openmp: Use libgomp memory allocation functions with unified shared
    memory.
  Add parsing support for allocate directive (OpenMP 5.0)
  Translate allocate directive (OpenMP 5.0).
  Handle cleanup of omp allocated variables (OpenMP 5.0).
  Gimplify allocate directive (OpenMP 5.0).
  Lower allocate directive (OpenMP 5.0).

 gcc/c/c-parser.cc                             |  22 +-
 gcc/common.opt                                |  16 +
 gcc/config/gcn/gcn-hsa.h                      |   3 +-
 gcc/config/gcn/gcn-opts.h                     |  10 +-
 gcc/config/gcn/gcn-valu.md                    |  29 +-
 gcc/config/gcn/gcn.cc                         |  62 ++-
 gcc/config/gcn/gcn.md                         | 113 +++--
 gcc/config/gcn/gcn.opt                        |  18 +-
 gcc/config/gcn/mkoffload.cc                   |  56 ++-
 gcc/coretypes.h                               |   7 +
 gcc/cp/parser.cc                              |  22 +-
 gcc/doc/gimple.texi                           |  38 +-
 gcc/doc/invoke.texi                           |  16 +-
 gcc/fortran/dump-parse-tree.cc                |   3 +
 gcc/fortran/gfortran.h                        |   5 +-
 gcc/fortran/match.h                           |   1 +
 gcc/fortran/openmp.cc                         | 242 ++++++++++-
 gcc/fortran/parse.cc                          |  10 +-
 gcc/fortran/resolve.cc                        |   1 +
 gcc/fortran/st.cc                             |   1 +
 gcc/fortran/trans-decl.cc                     |  20 +
 gcc/fortran/trans-openmp.cc                   |  50 +++
 gcc/fortran/trans.cc                          |   1 +
 gcc/gimple-pretty-print.cc                    |  37 ++
 gcc/gimple.cc                                 |  12 +
 gcc/gimple.def                                |   6 +
 gcc/gimple.h                                  |  60 ++-
 gcc/gimplify.cc                               |  19 +
 gcc/gsstruct.def                              |   1 +
 gcc/omp-builtins.def                          |   3 +
 gcc/omp-low.cc                                | 383 +++++++++++++++++
 gcc/passes.def                                |   1 +
 .../c-c++-common/gomp/alloc-pinned-1.c        |  28 ++
 gcc/testsuite/c-c++-common/gomp/usm-1.c       |   4 +
 gcc/testsuite/c-c++-common/gomp/usm-2.c       |  46 +++
 gcc/testsuite/c-c++-common/gomp/usm-3.c       |  44 ++
 gcc/testsuite/c-c++-common/gomp/usm-4.c       |   4 +
 gcc/testsuite/g++.dg/gomp/usm-1.C             |  32 ++
 gcc/testsuite/g++.dg/gomp/usm-2.C             |  30 ++
 gcc/testsuite/g++.dg/gomp/usm-3.C             |  38 ++
 gcc/testsuite/gfortran.dg/gomp/allocate-4.f90 | 112 +++++
 gcc/testsuite/gfortran.dg/gomp/allocate-5.f90 |  73 ++++
 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 |  84 ++++
 gcc/testsuite/gfortran.dg/gomp/allocate-7.f90 |  13 +
 gcc/testsuite/gfortran.dg/gomp/allocate-8.f90 |  15 +
 gcc/testsuite/gfortran.dg/gomp/usm-1.f90      |   6 +
 gcc/testsuite/gfortran.dg/gomp/usm-2.f90      |  16 +
 gcc/testsuite/gfortran.dg/gomp/usm-3.f90      |  13 +
 gcc/testsuite/gfortran.dg/gomp/usm-4.f90      |   6 +
 gcc/tree-core.h                               |   9 +
 gcc/tree-pass.h                               |   1 +
 gcc/tree-pretty-print.cc                      |  23 ++
 gcc/tree.cc                                   |   1 +
 gcc/tree.def                                  |   4 +
 gcc/tree.h                                    |  15 +
 include/cuda/cuda.h                           |  12 +
 libgomp/allocator.c                           | 304 ++++++++++----
 libgomp/config/linux/allocator.c              | 137 +++++++
 libgomp/config/nvptx/allocator.c              | 387 ++++++++++++++++++
 libgomp/config/nvptx/team.c                   |  28 ++
 libgomp/libgomp-plugin.h                      |   3 +
 libgomp/libgomp.h                             |   6 +
 libgomp/libgomp.map                           |   1 +
 libgomp/omp.h.in                              |   5 +
 libgomp/omp_lib.f90.in                        |  10 +
 libgomp/plugin/cuda-lib.def                   |   2 +
 libgomp/plugin/plugin-gcn.c                   | 104 ++++-
 libgomp/plugin/plugin-nvptx.c                 |  70 +++-
 libgomp/target.c                              |  66 +++
 libgomp/testsuite/lib/libgomp.exp             |  22 +
 libgomp/testsuite/libgomp.c++/usm-1.C         |  54 +++
 .../libgomp.c-c++-common/requires-1.c         |   1 +
 libgomp/testsuite/libgomp.c/alloc-pinned-1.c  |  95 +++++
 libgomp/testsuite/libgomp.c/alloc-pinned-2.c  | 101 +++++
 libgomp/testsuite/libgomp.c/alloc-pinned-3.c  | 130 ++++++
 libgomp/testsuite/libgomp.c/alloc-pinned-4.c  | 132 ++++++
 libgomp/testsuite/libgomp.c/alloc-pinned-5.c  |  90 ++++
 libgomp/testsuite/libgomp.c/alloc-pinned-6.c  | 101 +++++
 libgomp/testsuite/libgomp.c/alloc-pinned-7.c  |  63 +++
 libgomp/testsuite/libgomp.c/allocators-1.c    |  56 +++
 libgomp/testsuite/libgomp.c/allocators-2.c    |  64 +++
 libgomp/testsuite/libgomp.c/allocators-3.c    |  42 ++
 libgomp/testsuite/libgomp.c/allocators-4.c    | 197 +++++++++
 libgomp/testsuite/libgomp.c/allocators-5.c    |  63 +++
 libgomp/testsuite/libgomp.c/allocators-6.c    | 118 ++++++
 libgomp/testsuite/libgomp.c/allocators-7.c    |  68 +++
 libgomp/testsuite/libgomp.c/usm-1.c           |  25 ++
 libgomp/testsuite/libgomp.c/usm-2.c           |  33 ++
 libgomp/testsuite/libgomp.c/usm-3.c           |  36 ++
 libgomp/testsuite/libgomp.c/usm-4.c           |  37 ++
 libgomp/testsuite/libgomp.c/usm-5.c           |  28 ++
 libgomp/testsuite/libgomp.c/usm-6.c           |  92 +++++
 .../libgomp.fortran/alloc-pinned-1.f90        |  16 +
 .../testsuite/libgomp.fortran/allocate-2.f90  |  48 +++
 94 files changed, 4535 insertions(+), 197 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-3.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-4.c
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-2.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-3.C
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-4.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-5.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-7.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-8.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-4.f90
 create mode 100644 libgomp/config/nvptx/allocator.c
 create mode 100644 libgomp/testsuite/libgomp.c++/usm-1.C
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-4.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-6.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-7.c
 create mode 100644 libgomp/testsuite/libgomp.c/allocators-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/allocators-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/allocators-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/allocators-4.c
 create mode 100644 libgomp/testsuite/libgomp.c/allocators-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/allocators-6.c
 create mode 100644 libgomp/testsuite/libgomp.c/allocators-7.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-4.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-6.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/allocate-2.f90

-- 
2.33.0



* [PATCH 01/17] libgomp, nvptx: low-latency memory allocator
  2022-07-07 10:34 [PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators Andrew Stubbs
@ 2022-07-07 10:34 ` Andrew Stubbs
  2022-12-08 11:40   ` Jakub Jelinek
From: Andrew Stubbs @ 2022-07-07 10:34 UTC
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2765 bytes --]


This patch adds support for allocating low-latency ".shared" memory on
NVPTX GPU devices, via omp_low_lat_mem_space and omp_alloc.  The memory
can be allocated, reallocated, and freed using a basic but fast algorithm.
The allocator is thread-safe, and the size of the low-latency heap can be
configured using the GOMP_NVPTX_LOWLAT_POOL environment variable.

The use of the PTX dynamic_smem_size feature means that the low-latency
allocator will not work with the PTX 3.1 multilib.

libgomp/ChangeLog:

	* allocator.c (MEMSPACE_ALLOC): New macro.
	(MEMSPACE_CALLOC): New macro.
	(MEMSPACE_REALLOC): New macro.
	(MEMSPACE_FREE): New macro.
	(predefined_alloc_mapping): New array.
	(omp_alloc): Use MEMSPACE_ALLOC.
	Implement fall-backs for predefined allocators.
	(omp_free): Use MEMSPACE_FREE.
	(omp_calloc): Use MEMSPACE_CALLOC.
	Implement fall-backs for predefined allocators.
	(omp_realloc): Use MEMSPACE_REALLOC and MEMSPACE_ALLOC.
	Implement fall-backs for predefined allocators.
	* config/nvptx/team.c (__nvptx_lowlat_heap_root): New variable.
	(__nvptx_lowlat_pool): New asm variable.
	(gomp_nvptx_main): Initialize the low-latency heap.
	* plugin/plugin-nvptx.c (lowlat_pool_size): New variable.
	(GOMP_OFFLOAD_init_device): Read the GOMP_NVPTX_LOWLAT_POOL envvar.
	(GOMP_OFFLOAD_run): Apply lowlat_pool_size.
	* config/nvptx/allocator.c: New file.
	* testsuite/libgomp.c/allocators-1.c: New test.
	* testsuite/libgomp.c/allocators-2.c: New test.
	* testsuite/libgomp.c/allocators-3.c: New test.
	* testsuite/libgomp.c/allocators-4.c: New test.
	* testsuite/libgomp.c/allocators-5.c: New test.
	* testsuite/libgomp.c/allocators-6.c: New test.

co-authored-by: Kwok Cheung Yeung  <kcy@codesourcery.com>
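The "fall-backs for predefined allocators" in the ChangeLog, together with the new predefined_alloc_mapping table, can be summarized with a small model. This is a sketch of the memspace lookup and of the fallback selection at the top of the new fail: paths, not libgomp code: the enum names drop the omp_ prefix and are hypothetical stand-ins, their numeric order is assumed to match the table (whose comment says it is indexed by the omp_allocator_handle_t value), and serve_predefined with its `exhausted` parameter is invented purely to simulate one memory space running out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the omp_* names; the order is assumed to match
   predefined_alloc_mapping, which is indexed by the allocator value.  */
typedef enum {
  null_allocator, default_mem_alloc, large_cap_mem_alloc, const_mem_alloc,
  high_bw_mem_alloc, low_lat_mem_alloc, cgroup_mem_alloc, pteam_mem_alloc,
  thread_mem_alloc
} allocator_t;

typedef enum {
  default_mem_space, large_cap_mem_space, high_bw_mem_space, low_lat_mem_space
} memspace_t;

typedef enum { fb_default_mem, fb_null, fb_abort, fb_allocator } fallback_t;

/* Only user-defined allocators have traits; predefined ones pass NULL.  */
struct allocator_data { memspace_t memspace; fallback_t fallback; };

static const memspace_t predefined_mapping[] = {
  default_mem_space,   /* null_allocator */
  default_mem_space,   /* default_mem_alloc */
  large_cap_mem_space, /* large_cap_mem_alloc */
  default_mem_space,   /* const_mem_alloc */
  high_bw_mem_space,   /* high_bw_mem_alloc */
  low_lat_mem_space,   /* low_lat_mem_alloc */
  low_lat_mem_space,   /* cgroup_mem_alloc */
  low_lat_mem_space,   /* pteam_mem_alloc */
  low_lat_mem_space,   /* thread_mem_alloc */
};
_Static_assert (sizeof predefined_mapping / sizeof predefined_mapping[0]
                == thread_mem_alloc + 1, "one entry per predefined allocator");

static memspace_t choose_memspace (const struct allocator_data *data,
                                   allocator_t allocator)
{
  assert (data || allocator <= thread_mem_alloc);
  return data ? data->memspace : predefined_mapping[allocator];
}

/* Mirrors the fallback selection at the top of the new fail: paths.  */
static fallback_t choose_fallback (const struct allocator_data *data,
                                   allocator_t allocator)
{
  if (data)
    return data->fallback;
  return allocator == default_mem_alloc ? fb_null : fb_default_mem;
}

/* Hypothetical driver: every request to EXHAUSTED fails.  Returns the
   allocator that finally serves the request, or -1 for a NULL result.  */
static int serve_predefined (allocator_t allocator, memspace_t exhausted)
{
  for (;;)
    {
      if (choose_memspace (NULL, allocator) != exhausted)
        return allocator;
      if (choose_fallback (NULL, allocator) == fb_null)
        return -1;
      allocator = default_mem_alloc;  /* default_mem_fb: retry.  */
    }
}
```

Because a predefined allocator has no allocator_data, the default_mem_fb branch in the patch always retries with omp_default_mem_alloc, while omp_default_mem_alloc itself falls back to returning NULL, so the retry cannot loop forever.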
---
 libgomp/allocator.c                        | 235 ++++++++-----
 libgomp/config/nvptx/allocator.c           | 370 +++++++++++++++++++++
 libgomp/config/nvptx/team.c                |  28 ++
 libgomp/plugin/plugin-nvptx.c              |  23 +-
 libgomp/testsuite/libgomp.c/allocators-1.c |  56 ++++
 libgomp/testsuite/libgomp.c/allocators-2.c |  64 ++++
 libgomp/testsuite/libgomp.c/allocators-3.c |  42 +++
 libgomp/testsuite/libgomp.c/allocators-4.c | 196 +++++++++++
 libgomp/testsuite/libgomp.c/allocators-5.c |  63 ++++
 libgomp/testsuite/libgomp.c/allocators-6.c | 117 +++++++
 10 files changed, 1110 insertions(+), 84 deletions(-)
 create mode 100644 libgomp/config/nvptx/allocator.c
 create mode 100644 libgomp/testsuite/libgomp.c/allocators-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/allocators-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/allocators-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/allocators-4.c
 create mode 100644 libgomp/testsuite/libgomp.c/allocators-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/allocators-6.c


[-- Attachment #2: 0001-libgomp-nvptx-low-latency-memory-allocator.patch --]
[-- Type: text/x-patch; name="0001-libgomp-nvptx-low-latency-memory-allocator.patch", Size: 42973 bytes --]

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index b04820b8cf9..9b33bcf529b 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -37,6 +37,34 @@
 
 #define omp_max_predefined_alloc omp_thread_mem_alloc
 
+/* These macros may be overridden in config/<target>/allocator.c.  */
+#ifndef MEMSPACE_ALLOC
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE) malloc (SIZE)
+#endif
+#ifndef MEMSPACE_CALLOC
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE) calloc (1, SIZE)
+#endif
+#ifndef MEMSPACE_REALLOC
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) realloc (ADDR, SIZE)
+#endif
+#ifndef MEMSPACE_FREE
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) free (ADDR)
+#endif
+
+/* Map the predefined allocators to the correct memory space.
+   The index to this table is the omp_allocator_handle_t enum value.  */
+static const omp_memspace_handle_t predefined_alloc_mapping[] = {
+  omp_default_mem_space,   /* omp_null_allocator. */
+  omp_default_mem_space,   /* omp_default_mem_alloc. */
+  omp_large_cap_mem_space, /* omp_large_cap_mem_alloc. */
+  omp_default_mem_space,   /* omp_const_mem_alloc. */
+  omp_high_bw_mem_space,   /* omp_high_bw_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_low_lat_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_cgroup_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_pteam_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_thread_mem_alloc. */
+};
+
 enum gomp_memkind_kind
 {
   GOMP_MEMKIND_NONE = 0,
@@ -453,7 +481,7 @@ retry:
 	}
       else
 #endif
-	ptr = malloc (new_size);
+	ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size);
       if (ptr == NULL)
 	{
 #ifdef HAVE_SYNC_BUILTINS
@@ -478,7 +506,13 @@ retry:
 	}
       else
 #endif
-	ptr = malloc (new_size);
+	{
+	  omp_memspace_handle_t memspace __attribute__((unused))
+	    = (allocator_data
+	       ? allocator_data->memspace
+	       : predefined_alloc_mapping[allocator]);
+	  ptr = MEMSPACE_ALLOC (memspace, new_size);
+	}
       if (ptr == NULL)
 	goto fail;
     }
@@ -496,35 +530,38 @@ retry:
   return ret;
 
 fail:
-  if (allocator_data)
+  int fallback = (allocator_data
+		  ? allocator_data->fallback
+		  : allocator == omp_default_mem_alloc
+		  ? omp_atv_null_fb
+		  : omp_atv_default_mem_fb);
+  switch (fallback)
     {
-      switch (allocator_data->fallback)
-	{
-	case omp_atv_default_mem_fb:
-	  if ((new_alignment > sizeof (void *) && new_alignment > alignment)
+    case omp_atv_default_mem_fb:
+      if ((new_alignment > sizeof (void *) && new_alignment > alignment)
 #ifdef LIBGOMP_USE_MEMKIND
-	      || memkind
+	  || memkind
 #endif
-	      || (allocator_data
-		  && allocator_data->pool_size < ~(uintptr_t) 0))
-	    {
-	      allocator = omp_default_mem_alloc;
-	      goto retry;
-	    }
-	  /* Otherwise, we've already performed default mem allocation
-	     and if that failed, it won't succeed again (unless it was
-	     intermittent.  Return NULL then, as that is the fallback.  */
-	  break;
-	case omp_atv_null_fb:
-	  break;
-	default:
-	case omp_atv_abort_fb:
-	  gomp_fatal ("Out of memory allocating %lu bytes",
-		      (unsigned long) size);
-	case omp_atv_allocator_fb:
-	  allocator = allocator_data->fb_data;
+	  || (allocator_data
+	      && allocator_data->pool_size < ~(uintptr_t) 0)
+	  || !allocator_data)
+	{
+	  allocator = omp_default_mem_alloc;
 	  goto retry;
 	}
+      /* Otherwise, we've already performed default mem allocation
+	 and if that failed, it won't succeed again (unless it was
+	 intermittent.  Return NULL then, as that is the fallback.  */
+      break;
+    case omp_atv_null_fb:
+      break;
+    default:
+    case omp_atv_abort_fb:
+      gomp_fatal ("Out of memory allocating %lu bytes",
+		  (unsigned long) size);
+    case omp_atv_allocator_fb:
+      allocator = allocator_data->fb_data;
+      goto retry;
     }
   return NULL;
 }
@@ -557,6 +594,8 @@ void
 omp_free (void *ptr, omp_allocator_handle_t allocator)
 {
   struct omp_mem_header *data;
+  omp_memspace_handle_t memspace __attribute__((unused))
+    = omp_default_mem_space;
 
   if (ptr == NULL)
     return;
@@ -586,10 +625,12 @@ omp_free (void *ptr, omp_allocator_handle_t allocator)
 	  return;
 	}
 #endif
+
+      memspace = allocator_data->memspace;
     }
-#ifdef LIBGOMP_USE_MEMKIND
   else
     {
+#ifdef LIBGOMP_USE_MEMKIND
       enum gomp_memkind_kind memkind = GOMP_MEMKIND_NONE;
       if (data->allocator == omp_high_bw_mem_alloc)
 	memkind = GOMP_MEMKIND_HBW_PREFERRED;
@@ -605,9 +646,12 @@ omp_free (void *ptr, omp_allocator_handle_t allocator)
 	      return;
 	    }
 	}
-    }
 #endif
-  free (data->ptr);
+
+      memspace = predefined_alloc_mapping[data->allocator];
+    }
+
+  MEMSPACE_FREE (memspace, data->ptr, data->size);
 }
 
 ialias (omp_free)
@@ -723,7 +767,7 @@ retry:
 	}
       else
 #endif
-	ptr = calloc (1, new_size);
+	ptr = MEMSPACE_CALLOC (allocator_data->memspace, new_size);
       if (ptr == NULL)
 	{
 #ifdef HAVE_SYNC_BUILTINS
@@ -748,7 +792,13 @@ retry:
 	}
       else
 #endif
-	ptr = calloc (1, new_size);
+	{
+	  omp_memspace_handle_t memspace __attribute__((unused))
+	    = (allocator_data
+	       ? allocator_data->memspace
+	       : predefined_alloc_mapping[allocator]);
+	  ptr = MEMSPACE_CALLOC (memspace, new_size);
+	}
       if (ptr == NULL)
 	goto fail;
     }
@@ -766,35 +816,38 @@ retry:
   return ret;
 
 fail:
-  if (allocator_data)
+  int fallback = (allocator_data
+		  ? allocator_data->fallback
+		  : allocator == omp_default_mem_alloc
+		  ? omp_atv_null_fb
+		  : omp_atv_default_mem_fb);
+  switch (fallback)
     {
-      switch (allocator_data->fallback)
-	{
-	case omp_atv_default_mem_fb:
-	  if ((new_alignment > sizeof (void *) && new_alignment > alignment)
+    case omp_atv_default_mem_fb:
+      if ((new_alignment > sizeof (void *) && new_alignment > alignment)
 #ifdef LIBGOMP_USE_MEMKIND
-	      || memkind
+	  || memkind
 #endif
-	      || (allocator_data
-		  && allocator_data->pool_size < ~(uintptr_t) 0))
-	    {
-	      allocator = omp_default_mem_alloc;
-	      goto retry;
-	    }
-	  /* Otherwise, we've already performed default mem allocation
-	     and if that failed, it won't succeed again (unless it was
-	     intermittent.  Return NULL then, as that is the fallback.  */
-	  break;
-	case omp_atv_null_fb:
-	  break;
-	default:
-	case omp_atv_abort_fb:
-	  gomp_fatal ("Out of memory allocating %lu bytes",
-		      (unsigned long) (size * nmemb));
-	case omp_atv_allocator_fb:
-	  allocator = allocator_data->fb_data;
+	  || (allocator_data
+	      && allocator_data->pool_size < ~(uintptr_t) 0)
+	  || !allocator_data)
+	{
+	  allocator = omp_default_mem_alloc;
 	  goto retry;
 	}
+      /* Otherwise, we've already performed default mem allocation
+	 and if that failed, it won't succeed again (unless it was
+	 intermittent.  Return NULL then, as that is the fallback.  */
+      break;
+    case omp_atv_null_fb:
+      break;
+    default:
+    case omp_atv_abort_fb:
+      gomp_fatal ("Out of memory allocating %lu bytes",
+		  (unsigned long) (size * nmemb));
+    case omp_atv_allocator_fb:
+      allocator = allocator_data->fb_data;
+      goto retry;
     }
   return NULL;
 }
@@ -967,9 +1020,10 @@ retry:
       else
 #endif
       if (prev_size)
-	new_ptr = realloc (data->ptr, new_size);
+	new_ptr = MEMSPACE_REALLOC (allocator_data->memspace, data->ptr,
+				    data->size, new_size);
       else
-	new_ptr = malloc (new_size);
+	new_ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size);
       if (new_ptr == NULL)
 	{
 #ifdef HAVE_SYNC_BUILTINS
@@ -1010,7 +1064,13 @@ retry:
 	}
       else
 #endif
-	new_ptr = realloc (data->ptr, new_size);
+	{
+	  omp_memspace_handle_t memspace __attribute__((unused))
+	    = (allocator_data
+	       ? allocator_data->memspace
+	       : predefined_alloc_mapping[allocator]);
+	  new_ptr = MEMSPACE_REALLOC (memspace, data->ptr, data->size, new_size);
+	}
       if (new_ptr == NULL)
 	goto fail;
       ret = (char *) new_ptr + sizeof (struct omp_mem_header);
@@ -1030,7 +1090,13 @@ retry:
 	}
       else
 #endif
-	new_ptr = malloc (new_size);
+	{
+	  omp_memspace_handle_t memspace __attribute__((unused))
+	    = (allocator_data
+	       ? allocator_data->memspace
+	       : predefined_alloc_mapping[allocator]);
+	  new_ptr = MEMSPACE_ALLOC (memspace, new_size);
+	}
       if (new_ptr == NULL)
 	goto fail;
     }
@@ -1073,35 +1139,38 @@ retry:
   return ret;
 
 fail:
-  if (allocator_data)
+  int fallback = (allocator_data
+		  ? allocator_data->fallback
+		  : allocator == omp_default_mem_alloc
+		  ? omp_atv_null_fb
+		  : omp_atv_default_mem_fb);
+  switch (fallback)
     {
-      switch (allocator_data->fallback)
-	{
-	case omp_atv_default_mem_fb:
-	  if (new_alignment > sizeof (void *)
+    case omp_atv_default_mem_fb:
+      if (new_alignment > sizeof (void *)
 #ifdef LIBGOMP_USE_MEMKIND
-	      || memkind
+	  || memkind
 #endif
-	      || (allocator_data
-		  && allocator_data->pool_size < ~(uintptr_t) 0))
-	    {
-	      allocator = omp_default_mem_alloc;
-	      goto retry;
-	    }
-	  /* Otherwise, we've already performed default mem allocation
-	     and if that failed, it won't succeed again (unless it was
-	     intermittent.  Return NULL then, as that is the fallback.  */
-	  break;
-	case omp_atv_null_fb:
-	  break;
-	default:
-	case omp_atv_abort_fb:
-	  gomp_fatal ("Out of memory allocating %lu bytes",
-		      (unsigned long) size);
-	case omp_atv_allocator_fb:
-	  allocator = allocator_data->fb_data;
+	  || (allocator_data
+	      && allocator_data->pool_size < ~(uintptr_t) 0)
+	  || !allocator_data)
+	{
+	  allocator = omp_default_mem_alloc;
 	  goto retry;
 	}
+      /* Otherwise, we've already performed default mem allocation
+	 and if that failed, it won't succeed again (unless it was
+	 intermittent.  Return NULL then, as that is the fallback.  */
+      break;
+    case omp_atv_null_fb:
+      break;
+    default:
+    case omp_atv_abort_fb:
+      gomp_fatal ("Out of memory allocating %lu bytes",
+		  (unsigned long) size);
+    case omp_atv_allocator_fb:
+      allocator = allocator_data->fb_data;
+      goto retry;
     }
   return NULL;
 }
diff --git a/libgomp/config/nvptx/allocator.c b/libgomp/config/nvptx/allocator.c
new file mode 100644
index 00000000000..6bc2ea48043
--- /dev/null
+++ b/libgomp/config/nvptx/allocator.c
@@ -0,0 +1,370 @@
+/* Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* The low-latency allocators use space reserved in .shared memory when the
+   kernel is launched.  The heap is initialized in gomp_nvptx_main and all
+   allocations are forgotten when the kernel exits.  Allocations to other
+   memory spaces all use the system malloc syscall.
+
+   The root heap descriptor is stored elsewhere in shared memory, and each
+   free chunk contains a similar descriptor for the next free chunk in the
+   chain.
+
+   The descriptor is two 16-bit values: offset and size, which describe the
+   location of a chunk of memory available for allocation. The offset is
+   relative to the base of the heap.  The special value 0xffff, 0xffff
+   indicates that the heap is locked.  The descriptor is encoded into a
+   single 32-bit integer so that it may be easily accessed atomically.
+
+   Memory is allocated to the first free chunk that fits.  The free chain
+   is always stored in order of the offset to assist coalescing adjacent
+   chunks.  */
+
+#include "libgomp.h"
+#include <stdlib.h>
+
+/* There should be some .shared space reserved for us.  There's no way to
+   express this magic extern sizeless array in C so use asm.  */
+asm (".extern .shared .u8 __nvptx_lowlat_pool[];\n");
+
+extern uint32_t __nvptx_lowlat_heap_root __attribute__((shared,nocommon));
+
+typedef union {
+  uint32_t raw;
+  struct {
+    uint16_t offset;
+    uint16_t size;
+  } desc;
+} heapdesc;
+
+static void *
+nvptx_memspace_alloc (omp_memspace_handle_t memspace, size_t size)
+{
+  if (memspace == omp_low_lat_mem_space)
+    {
+      char *shared_pool;
+      asm ("cvta.shared.u64\t%0, __nvptx_lowlat_pool;" : "=r"(shared_pool));
+
+      /* Memory is allocated in 8-byte granularity.  */
+      size = (size + 7) & ~7;
+
+      /* Acquire a lock on the low-latency heap.  */
+      heapdesc root;
+      do
+	{
+	  root.raw = __atomic_exchange_n (&__nvptx_lowlat_heap_root,
+					  0xffffffff, MEMMODEL_ACQUIRE);
+	  if (root.raw != 0xffffffff)
+	    break;
+	  /* Spin.  */
+	}
+      while (1);
+
+      /* Walk the free chain.  */
+      heapdesc chunk = {root.raw};
+      uint32_t *prev_chunkptr = NULL;
+      uint32_t *chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset);
+      heapdesc onward_chain = {chunkptr[0]};
+      while (chunk.desc.size != 0 && (uint32_t)size > chunk.desc.size)
+	{
+	  chunk.raw = onward_chain.raw;
+	  prev_chunkptr = chunkptr;
+	  chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset);
+	  onward_chain.raw = chunkptr[0];
+	}
+
+      void *result = NULL;
+      if (chunk.desc.size != 0)
+	{
+	  /* Allocation successful.  */
+	  result = chunkptr;
+
+	  /* Update the free chain.  */
+	  heapdesc stillfree = {chunk.raw};
+	  stillfree.desc.offset += size;
+	  stillfree.desc.size -= size;
+	  uint32_t *stillfreeptr = (uint32_t*)(shared_pool
+					       + stillfree.desc.offset);
+
+	  if (stillfree.desc.size == 0)
+	    /* The whole chunk was used.  */
+	    stillfree.raw = onward_chain.raw;
+	  else
+	    /* The chunk was split, so restore the onward chain.  */
+	    stillfreeptr[0] = onward_chain.raw;
+
+	  /* The previous free slot or root now points to stillfree.  */
+	  if (prev_chunkptr)
+	    prev_chunkptr[0] = stillfree.raw;
+	  else
+	    root.raw = stillfree.raw;
+	}
+
+      /* Update the free chain root and release the lock.  */
+      __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE);
+      return result;
+    }
+  else
+    return malloc (size);
+}
+
+static void *
+nvptx_memspace_calloc (omp_memspace_handle_t memspace, size_t size)
+{
+  if (memspace == omp_low_lat_mem_space)
+    {
+      /* Memory is allocated in 8-byte granularity.  */
+      size = (size + 7) & ~7;
+
+      uint64_t *result = nvptx_memspace_alloc (memspace, size);
+      if (result)
+	/* Inline memset in which we know size is a multiple of 8.  */
+	for (unsigned i = 0; i < (unsigned)size/8; i++)
+	  result[i] = 0;
+
+      return result;
+    }
+  else
+    return calloc (1, size);
+}
+
+static void
+nvptx_memspace_free (omp_memspace_handle_t memspace, void *addr, size_t size)
+{
+  if (memspace == omp_low_lat_mem_space)
+    {
+      char *shared_pool;
+      asm ("cvta.shared.u64\t%0, __nvptx_lowlat_pool;" : "=r"(shared_pool));
+
+      /* Memory is allocated in 8-byte granularity.  */
+      size = (size + 7) & ~7;
+
+      /* Acquire a lock on the low-latency heap.  */
+      heapdesc root;
+      do
+	{
+	  root.raw = __atomic_exchange_n (&__nvptx_lowlat_heap_root,
+					  0xffffffff, MEMMODEL_ACQUIRE);
+	  if (root.raw != 0xffffffff)
+	    break;
+	  /* Spin.  */
+	}
+      while (1);
+
+      /* Walk the free chain to find where to insert a new entry.  */
+      heapdesc chunk = {root.raw}, prev_chunk;
+      uint32_t *prev_chunkptr = NULL, *prevprev_chunkptr = NULL;
+      uint32_t *chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset);
+      heapdesc onward_chain = {chunkptr[0]};
+      while (chunk.desc.size != 0 && addr > (void*)chunkptr)
+	{
+	  prev_chunk.raw = chunk.raw;
+	  chunk.raw = onward_chain.raw;
+	  prevprev_chunkptr = prev_chunkptr;
+	  prev_chunkptr = chunkptr;
+	  chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset);
+	  onward_chain.raw = chunkptr[0];
+	}
+
+      /* Create the new chunk descriptor.  */
+      heapdesc newfreechunk;
+      newfreechunk.desc.offset = (uint16_t)((uintptr_t)addr
+					    - (uintptr_t)shared_pool);
+      newfreechunk.desc.size = (uint16_t)size;
+
+      /* Coalesce adjacent free chunks.  */
+      if (newfreechunk.desc.offset + size == chunk.desc.offset)
+	{
+	  /* Free chunk follows.  */
+	  newfreechunk.desc.size += chunk.desc.size;
+	  chunk.raw = onward_chain.raw;
+	}
+      if (prev_chunkptr)
+	{
+	  if (prev_chunk.desc.offset + prev_chunk.desc.size
+	      == newfreechunk.desc.offset)
+	    {
+	      /* Free chunk precedes.  */
+	      newfreechunk.desc.offset = prev_chunk.desc.offset;
+	      newfreechunk.desc.size += prev_chunk.desc.size;
+	      addr = shared_pool + prev_chunk.desc.offset;
+	      prev_chunkptr = prevprev_chunkptr;
+	    }
+	}
+
+      /* Update the free chain in the new and previous chunks.  */
+      ((uint32_t*)addr)[0] = chunk.raw;
+      if (prev_chunkptr)
+	prev_chunkptr[0] = newfreechunk.raw;
+      else
+	root.raw = newfreechunk.raw;
+
+      /* Update the free chain root and release the lock.  */
+      __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE);
+    }
+  else
+    free (addr);
+}
+
+static void *
+nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
+			size_t oldsize, size_t size)
+{
+  if (memspace == omp_low_lat_mem_space)
+    {
+      char *shared_pool;
+      asm ("cvta.shared.u64\t%0, __nvptx_lowlat_pool;" : "=r"(shared_pool));
+
+      /* Memory is allocated in 8-byte granularity.  */
+      oldsize = (oldsize + 7) & ~7;
+      size = (size + 7) & ~7;
+
+      if (oldsize == size)
+	return addr;
+
+      /* Acquire a lock on the low-latency heap.  */
+      heapdesc root;
+      do
+	{
+	  root.raw = __atomic_exchange_n (&__nvptx_lowlat_heap_root,
+					  0xffffffff, MEMMODEL_ACQUIRE);
+	  if (root.raw != 0xffffffff)
+	    break;
+	  /* Spin.  */
+	}
+      while (1);
+
+      /* Walk the free chain.  */
+      heapdesc chunk = {root.raw};
+      uint32_t *prev_chunkptr = NULL;
+      uint32_t *chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset);
+      heapdesc onward_chain = {chunkptr[0]};
+      while (chunk.desc.size != 0 && (void*)chunkptr < addr)
+	{
+	  chunk.raw = onward_chain.raw;
+	  prev_chunkptr = chunkptr;
+	  chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset);
+	  onward_chain.raw = chunkptr[0];
+	}
+
+      void *result = NULL;
+      if (size < oldsize)
+	{
+	  /* The new allocation is smaller than the old; we can always
+	     shrink an allocation in place.  */
+	  result = addr;
+
+	  uint32_t *nowfreeptr = (uint32_t*)(addr + size);
+
+	  /* Update the free chain.  */
+	  heapdesc nowfree;
+	  nowfree.desc.offset = (char*)nowfreeptr - shared_pool;
+	  nowfree.desc.size = oldsize - size;
+
+	  if (nowfree.desc.offset + nowfree.desc.size == chunk.desc.offset)
+	    {
+	      /* Coalesce following free chunk.  */
+	      nowfree.desc.size += chunk.desc.size;
+	      nowfreeptr[0] = onward_chain.raw;
+	    }
+	  else
+	    nowfreeptr[0] = chunk.raw;
+
+	  /* The previous free slot or root now points to nowfree.  */
+	  if (prev_chunkptr)
+	    prev_chunkptr[0] = nowfree.raw;
+	  else
+	    root.raw = nowfree.raw;
+	}
+      else if (chunk.desc.size != 0
+	       && (char *)addr + oldsize == (char *)chunkptr
+	       && chunk.desc.size >= size-oldsize)
+	{
+	  /* The new allocation is larger than the old, and we found a
+	     large enough free block right after the existing block,
+	     so we extend into that space.  */
+	  result = addr;
+
+	  uint16_t delta = size-oldsize;
+
+	  /* Update the free chain.  */
+	  heapdesc stillfree = {chunk.raw};
+	  stillfree.desc.offset += delta;
+	  stillfree.desc.size -= delta;
+	  uint32_t *stillfreeptr = (uint32_t*)(shared_pool
+					       + stillfree.desc.offset);
+
+	  if (stillfree.desc.size == 0)
+	    /* The whole chunk was used.  */
+	    stillfree.raw = onward_chain.raw;
+	  else
+	    /* The chunk was split, so restore the onward chain.  */
+	    stillfreeptr[0] = onward_chain.raw;
+
+	  /* The previous free slot or root now points to stillfree.  */
+	  if (prev_chunkptr)
+	    prev_chunkptr[0] = stillfree.raw;
+	  else
+	    root.raw = stillfree.raw;
+	}
+      /* Else realloc in-place has failed and result remains NULL.  */
+
+      /* Update the free chain root and release the lock.  */
+      __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE);
+
+      if (result == NULL)
+	{
+	  /* The allocation could not be extended in place, so we simply
+	     allocate fresh memory and move the data.  If we can't allocate
+	     from low-latency memory then we leave the original allocation
+	     intact and return NULL.
+	     We could do a fall-back to main memory, but we don't know what
+	     the fall-back trait said to do.  */
+	  result = nvptx_memspace_alloc (memspace, size);
+	  if (result != NULL)
+	    {
+	      /* Inline memcpy in which we know oldsize is a multiple of 8.  */
+	      uint64_t *from = addr, *to = result;
+	      for (unsigned i = 0; i < (unsigned)oldsize/8; i++)
+		to[i] = from[i];
+
+	      nvptx_memspace_free (memspace, addr, oldsize);
+	    }
+	}
+      return result;
+    }
+  else
+    return realloc (addr, size);
+}
+
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE) \
+  nvptx_memspace_alloc (MEMSPACE, SIZE)
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE) \
+  nvptx_memspace_calloc (MEMSPACE, SIZE)
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) \
+  nvptx_memspace_realloc (MEMSPACE, ADDR, OLDSIZE, SIZE)
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
+  nvptx_memspace_free (MEMSPACE, ADDR, SIZE)
+
+#include "../../allocator.c"
diff --git a/libgomp/config/nvptx/team.c b/libgomp/config/nvptx/team.c
index 6923416fb4e..65a7af3417b 100644
--- a/libgomp/config/nvptx/team.c
+++ b/libgomp/config/nvptx/team.c
@@ -33,9 +33,13 @@
 
 struct gomp_thread *nvptx_thrs __attribute__((shared,nocommon));
 int __gomp_team_num __attribute__((shared,nocommon));
+uint32_t __nvptx_lowlat_heap_root __attribute__((shared,nocommon));
 
 static void gomp_thread_start (struct gomp_thread_pool *);
 
+/* There should be some .shared space reserved for us.  There's no way to
+   express this magic extern sizeless array in C so use asm.  */
+asm (".extern .shared .u8 __nvptx_lowlat_pool[];\n");
 
 /* This externally visible function handles target region entry.  It
    sets up a per-team thread pool and transfers control by calling FN (FN_DATA)
@@ -63,6 +67,30 @@ gomp_nvptx_main (void (*fn) (void *), void *fn_data)
       nvptx_thrs = alloca (ntids * sizeof (*nvptx_thrs));
       memset (nvptx_thrs, 0, ntids * sizeof (*nvptx_thrs));
 
+      /* Find the low-latency heap details ....  */
+      uint32_t *shared_pool;
+      uint32_t shared_pool_size = 0;
+      asm ("cvta.shared.u64\t%0, __nvptx_lowlat_pool;" : "=r"(shared_pool));
+#if __PTX_ISA_VERSION_MAJOR__ > 4 \
+    || (__PTX_ISA_VERSION_MAJOR__ == 4 && __PTX_ISA_VERSION_MINOR__ >= 1)
+      asm ("mov.u32\t%0, %%dynamic_smem_size;\n"
+	   : "=r"(shared_pool_size));
+#endif
+
+      /* ... and initialize it with an empty free-chain.  */
+      union {
+	uint32_t raw;
+	struct {
+	  uint16_t offset;
+	  uint16_t size;
+	} desc;
+      } root;
+      root.desc.offset = 0;		 /* The first byte is free.  */
+      root.desc.size = shared_pool_size; /* The whole space is free.  */
+      shared_pool[0] = 0;		 /* Terminate free chain.  */
+      __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE);
+
+      /* Initialize the thread pool.  */
       struct gomp_thread_pool *pool = alloca (sizeof (*pool));
       pool->threads = alloca (ntids * sizeof (*pool->threads));
       for (tid = 0; tid < ntids; tid++)
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index bc63e274cdf..40739ba592d 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -334,6 +334,11 @@ struct ptx_device
 
 static struct ptx_device **ptx_devices;
 
+/* OpenMP kernels reserve a small amount of ".shared" space for use by
+   omp_alloc.  The size is configured using GOMP_NVPTX_LOWLAT_POOL, but the
+   default is set here.  */
+static unsigned lowlat_pool_size = 8*1024;
+
 static inline struct nvptx_thread *
 nvptx_thread (void)
 {
@@ -1205,6 +1210,22 @@ GOMP_OFFLOAD_init_device (int n)
       instantiated_devices++;
     }
 
+  const char *var_name = "GOMP_NVPTX_LOWLAT_POOL";
+  const char *env_var = secure_getenv (var_name);
+  notify_var (var_name, env_var);
+
+  if (env_var != NULL)
+    {
+      char *endptr;
+      errno = 0;
+      unsigned long val = strtoul (env_var, &endptr, 10);
+      if (endptr == env_var || *endptr != '\0'
+	  || errno == ERANGE || val > UINT_MAX)
+	GOMP_PLUGIN_error ("Error parsing %s", var_name);
+      else
+	lowlat_pool_size = val;
+    }
+
   pthread_mutex_unlock (&ptx_dev_lock);
 
   return dev != NULL;
@@ -2030,7 +2051,7 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
 		     " [(teams: %u), 1, 1] [(lanes: 32), (threads: %u), 1]\n",
 		     __FUNCTION__, fn_name, teams, threads);
   r = CUDA_CALL_NOCHECK (cuLaunchKernel, function, teams, 1, 1,
-			 32, threads, 1, 0, NULL, NULL, config);
+			 32, threads, 1, lowlat_pool_size, NULL, NULL, config);
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuLaunchKernel error: %s", cuda_error (r));
 
diff --git a/libgomp/testsuite/libgomp.c/allocators-1.c b/libgomp/testsuite/libgomp.c/allocators-1.c
new file mode 100644
index 00000000000..04968e4c83d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/allocators-1.c
@@ -0,0 +1,56 @@
+/* { dg-do run } */
+
+/* Test that omp_alloc returns usable memory.  */
+
+#include <omp.h>
+
+#pragma omp requires dynamic_allocators
+
+void
+test (int n, omp_allocator_handle_t allocator)
+{
+  #pragma omp target map(to:n) map(to:allocator)
+  {
+    int *a;
+    a = (int *) omp_alloc(n*sizeof(int), allocator);
+
+    #pragma omp parallel
+    for (int i = 0; i < n; i++)
+      a[i] = i;
+
+    for (int i = 0; i < n; i++)
+      if (a[i] != i)
+	{
+	  __builtin_printf ("data mismatch at %i\n", i);
+	  __builtin_abort ();
+	}
+
+    omp_free(a, allocator);
+  }
+}
+
+int
+main ()
+{
+  // Smaller than low-latency memory limit
+  test (10, omp_default_mem_alloc);
+  test (10, omp_large_cap_mem_alloc);
+  test (10, omp_const_mem_alloc);
+  test (10, omp_high_bw_mem_alloc);
+  test (10, omp_low_lat_mem_alloc);
+  test (10, omp_cgroup_mem_alloc);
+  test (10, omp_pteam_mem_alloc);
+  test (10, omp_thread_mem_alloc);
+
+  // Larger than low-latency memory limit
+  test (100000, omp_default_mem_alloc);
+  test (100000, omp_large_cap_mem_alloc);
+  test (100000, omp_const_mem_alloc);
+  test (100000, omp_high_bw_mem_alloc);
+  test (100000, omp_low_lat_mem_alloc);
+  test (100000, omp_cgroup_mem_alloc);
+  test (100000, omp_pteam_mem_alloc);
+  test (100000, omp_thread_mem_alloc);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/allocators-2.c b/libgomp/testsuite/libgomp.c/allocators-2.c
new file mode 100644
index 00000000000..a98f1b4c05e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/allocators-2.c
@@ -0,0 +1,64 @@
+/* { dg-do run } */
+
+/* Test concurrent and repeated allocations.  */
+
+#include <omp.h>
+
+#pragma omp requires dynamic_allocators
+
+void
+test (int n, omp_allocator_handle_t allocator)
+{
+  #pragma omp target map(to:n) map(to:allocator)
+  {
+    int **a;
+    a = (int **) omp_alloc(n*sizeof(int*), allocator);
+
+    #pragma omp parallel for
+    for (int i = 0; i < n; i++)
+      {
+	/* Use 10 ints each to ensure the low-latency fall-back is exercised.  */
+	a[i] = omp_alloc(sizeof(int)*10, allocator);
+	a[i][0] = i;
+      }
+
+    for (int i = 0; i < n; i++)
+      if (a[i][0] != i)
+	{
+	  __builtin_printf ("data mismatch at %i\n", i);
+	  __builtin_abort ();
+	}
+
+    #pragma omp parallel for
+    for (int i = 0; i < n; i++)
+      omp_free(a[i], allocator);
+
+    omp_free (a, allocator);
+  }
+}
+
+int
+main ()
+{
+  // Smaller than low-latency memory limit
+  test (10, omp_default_mem_alloc);
+  test (10, omp_large_cap_mem_alloc);
+  test (10, omp_const_mem_alloc);
+  test (10, omp_high_bw_mem_alloc);
+  test (10, omp_low_lat_mem_alloc);
+  test (10, omp_cgroup_mem_alloc);
+  test (10, omp_pteam_mem_alloc);
+  test (10, omp_thread_mem_alloc);
+
+  // Larger than low-latency memory limit (on aggregate)
+  test (1000, omp_default_mem_alloc);
+  test (1000, omp_large_cap_mem_alloc);
+  test (1000, omp_const_mem_alloc);
+  test (1000, omp_high_bw_mem_alloc);
+  test (1000, omp_low_lat_mem_alloc);
+  test (1000, omp_cgroup_mem_alloc);
+  test (1000, omp_pteam_mem_alloc);
+  test (1000, omp_thread_mem_alloc);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/allocators-3.c b/libgomp/testsuite/libgomp.c/allocators-3.c
new file mode 100644
index 00000000000..45514c2a088
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/allocators-3.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+
+/* Stress-test omp_alloc/omp_free under concurrency.  */
+
+#include <omp.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#pragma omp requires dynamic_allocators
+
+#define N 1000
+
+void
+test (omp_allocator_handle_t allocator)
+{
+  #pragma omp target map(to:allocator)
+  {
+    #pragma omp parallel for
+    for (int i = 0; i < N; i++)
+      for (int j = 0; j < N; j++)
+	{
+	  int *p = omp_alloc(sizeof(int), allocator);
+	  omp_free(p, allocator);
+	}
+  }
+}
+
+int
+main ()
+{
+  // Smaller than low-latency memory limit
+  test (omp_default_mem_alloc);
+  test (omp_large_cap_mem_alloc);
+  test (omp_const_mem_alloc);
+  test (omp_high_bw_mem_alloc);
+  test (omp_low_lat_mem_alloc);
+  test (omp_cgroup_mem_alloc);
+  test (omp_pteam_mem_alloc);
+  test (omp_thread_mem_alloc);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/allocators-4.c b/libgomp/testsuite/libgomp.c/allocators-4.c
new file mode 100644
index 00000000000..9fa6aa1624f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/allocators-4.c
@@ -0,0 +1,196 @@
+/* { dg-do run } */
+
+/* Test that low-latency free chains are sound.  */
+
+#include <stddef.h>
+#include <omp.h>
+
+#pragma omp requires dynamic_allocators
+
+void
+check (int cond, const char *msg)
+{
+  if (!cond)
+    {
+      __builtin_printf ("%s\n", msg);
+      __builtin_abort ();
+    }
+}
+
+int
+main ()
+{
+  #pragma omp target
+  {
+    /* Ensure that the memory we get *is* low-latency with a null-fallback.  */
+    omp_alloctrait_t traits[1]
+      = { { omp_atk_fallback, omp_atv_null_fb } };
+    omp_allocator_handle_t lowlat = omp_init_allocator (omp_low_lat_mem_space,
+							1, traits);
+
+    int size = 4;
+
+    char *a = omp_alloc(size, lowlat);
+    char *b = omp_alloc(size, lowlat);
+    char *c = omp_alloc(size, lowlat);
+    char *d = omp_alloc(size, lowlat);
+
+    /* There are headers and padding to account for.  */
+    int size2 = size + (b-a);
+    int size3 = size + (c-a);
+    int size4 = size + (d-a) + 100; /* Random larger amount.  */
+
+    check (a != NULL && b != NULL && c != NULL && d != NULL,
+	   "omp_alloc returned NULL");
+
+    omp_free(a, lowlat);
+    char *p = omp_alloc (size, lowlat);
+    check (p == a, "allocate did not reuse first chunk");
+
+    omp_free(b, lowlat);
+    p = omp_alloc (size, lowlat);
+    check (p == b, "allocate did not reuse second chunk");
+
+    omp_free(c, lowlat);
+    p = omp_alloc (size, lowlat);
+    check (p == c, "allocate did not reuse third chunk");
+
+    omp_free(a, lowlat);
+    omp_free(b, lowlat);
+    p = omp_alloc (size2, lowlat);
+    check (p == a, "allocate did not coalesce first two chunks");
+
+    omp_free(p, lowlat);
+    p = omp_alloc (size, lowlat);
+    check (p == a, "allocate did not split first chunk (1)");
+    p = omp_alloc (size, lowlat);
+    check (p == b, "allocate did not split first chunk (2)");
+
+    omp_free(b, lowlat);
+    omp_free(c, lowlat);
+    p = omp_alloc (size2, lowlat);
+    check (p == b, "allocate did not coalesce middle two chunks");
+
+    omp_free(p, lowlat);
+    p = omp_alloc (size, lowlat);
+    check (p == b, "allocate did not split second chunk (1)");
+    p = omp_alloc (size, lowlat);
+    check (p == c, "allocate did not split second chunk (2)");
+
+    omp_free(b, lowlat);
+    omp_free(a, lowlat);
+    p = omp_alloc (size2, lowlat);
+    check (p == a, "allocate did not coalesce first two chunks, reverse free");
+
+    omp_free(p, lowlat);
+    p = omp_alloc (size, lowlat);
+    check (p == a, "allocate did not split first chunk (1), reverse free");
+    p = omp_alloc (size, lowlat);
+    check (p == b, "allocate did not split first chunk (2), reverse free");
+
+    omp_free(c, lowlat);
+    omp_free(b, lowlat);
+    p = omp_alloc (size2, lowlat);
+    check (p == b, "allocate did not coalesce second two chunks, reverse free");
+
+    omp_free(p, lowlat);
+    p = omp_alloc (size, lowlat);
+    check (p == b, "allocate did not split second chunk (1), reverse free");
+    p = omp_alloc (size, lowlat);
+    check (p == c, "allocate did not split second chunk (2), reverse free");
+
+    omp_free(a, lowlat);
+    omp_free(b, lowlat);
+    omp_free(c, lowlat);
+    p = omp_alloc (size3, lowlat);
+    check (p == a, "allocate did not coalesce first three chunks");
+
+    omp_free(p, lowlat);
+    p = omp_alloc (size, lowlat);
+    check (p == a, "allocate did not split first chunk (1)");
+    p = omp_alloc (size, lowlat);
+    check (p == b, "allocate did not split first chunk (2)");
+    p = omp_alloc (size, lowlat);
+    check (p == c, "allocate did not split first chunk (3)");
+
+    omp_free(b, lowlat);
+    omp_free(c, lowlat);
+    omp_free(d, lowlat);
+    p = omp_alloc (size3, lowlat);
+    check (p == b, "allocate did not coalesce last three chunks");
+
+    omp_free(p, lowlat);
+    p = omp_alloc (size, lowlat);
+    check (p == b, "allocate did not split second chunk (1)");
+    p = omp_alloc (size, lowlat);
+    check (p == c, "allocate did not split second chunk (2)");
+    p = omp_alloc (size, lowlat);
+    check (p == d, "allocate did not split second chunk (3)");
+
+    omp_free(c, lowlat);
+    omp_free(b, lowlat);
+    omp_free(a, lowlat);
+    p = omp_alloc (size3, lowlat);
+    check (p == a, "allocate did not coalesce first three chunks, reverse free");
+
+    omp_free(p, lowlat);
+    p = omp_alloc (size, lowlat);
+    check (p == a, "allocate did not split first chunk (1), reverse free");
+    p = omp_alloc (size, lowlat);
+    check (p == b, "allocate did not split first chunk (2), reverse free");
+    p = omp_alloc (size, lowlat);
+    check (p == c, "allocate did not split first chunk (3), reverse free");
+
+    omp_free(d, lowlat);
+    omp_free(c, lowlat);
+    omp_free(b, lowlat);
+    p = omp_alloc (size3, lowlat);
+    check (p == b, "allocate did not coalesce second three chunks, reverse free");
+
+    omp_free(p, lowlat);
+    p = omp_alloc (size, lowlat);
+    check (p == b, "allocate did not split second chunk (1), reverse free");
+    p = omp_alloc (size, lowlat);
+    check (p == c, "allocate did not split second chunk (2), reverse free");
+    p = omp_alloc (size, lowlat);
+    check (p == d, "allocate did not split second chunk (3), reverse free");
+
+    omp_free(c, lowlat);
+    omp_free(a, lowlat);
+    omp_free(b, lowlat);
+    p = omp_alloc (size3, lowlat);
+    check (p == a, "allocate did not coalesce first three chunks, mixed free");
+
+    omp_free(p, lowlat);
+    p = omp_alloc (size, lowlat);
+    check (p == a, "allocate did not split first chunk (1), mixed free");
+    p = omp_alloc (size, lowlat);
+    check (p == b, "allocate did not split first chunk (2), mixed free");
+    p = omp_alloc (size, lowlat);
+    check (p == c, "allocate did not split first chunk (3), mixed free");
+
+    omp_free(d, lowlat);
+    omp_free(b, lowlat);
+    omp_free(c, lowlat);
+    p = omp_alloc (size3, lowlat);
+    check (p == b, "allocate did not coalesce second three chunks, mixed free");
+
+    omp_free(p, lowlat);
+    p = omp_alloc (size, lowlat);
+    check (p == b, "allocate did not split second chunk (1), mixed free");
+    p = omp_alloc (size, lowlat);
+    check (p == c, "allocate did not split second chunk (2), mixed free");
+    p = omp_alloc (size, lowlat);
+    check (p == d, "allocate did not split second chunk (3), mixed free");
+
+    omp_free(a, lowlat);
+    omp_free(b, lowlat);
+    omp_free(c, lowlat);
+    omp_free(d, lowlat);
+    p = omp_alloc(size4, lowlat);
+    check (p == a, "allocate did not coalesce all memory");
+  }
+
+  return 0;
+}
+
diff --git a/libgomp/testsuite/libgomp.c/allocators-5.c b/libgomp/testsuite/libgomp.c/allocators-5.c
new file mode 100644
index 00000000000..9694010cf1f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/allocators-5.c
@@ -0,0 +1,63 @@
+/* { dg-do run } */
+
+/* Test calloc with omp_alloc.  */
+
+#include <omp.h>
+
+#pragma omp requires dynamic_allocators
+
+void
+test (int n, omp_allocator_handle_t allocator)
+{
+  #pragma omp target map(to:n) map(to:allocator)
+  {
+    int *a;
+    a = (int *) omp_calloc(n, sizeof(int), allocator);
+
+    for (int i = 0; i < n; i++)
+      if (a[i] != 0)
+	{
+	  __builtin_printf ("memory not zeroed at %i\n", i);
+	  __builtin_abort ();
+	}
+
+    #pragma omp parallel
+    for (int i = 0; i < n; i++)
+      a[i] = i;
+
+    for (int i = 0; i < n; i++)
+      if (a[i] != i)
+	{
+	  __builtin_printf ("data mismatch at %i\n", i);
+	  __builtin_abort ();
+	}
+
+    omp_free(a, allocator);
+  }
+}
+
+int
+main ()
+{
+  // Smaller than low-latency memory limit
+  test (10, omp_default_mem_alloc);
+  test (10, omp_large_cap_mem_alloc);
+  test (10, omp_const_mem_alloc);
+  test (10, omp_high_bw_mem_alloc);
+  test (10, omp_low_lat_mem_alloc);
+  test (10, omp_cgroup_mem_alloc);
+  test (10, omp_pteam_mem_alloc);
+  test (10, omp_thread_mem_alloc);
+
+  // Larger than low-latency memory limit
+  test (100000, omp_default_mem_alloc);
+  test (100000, omp_large_cap_mem_alloc);
+  test (100000, omp_const_mem_alloc);
+  test (100000, omp_high_bw_mem_alloc);
+  test (100000, omp_low_lat_mem_alloc);
+  test (100000, omp_cgroup_mem_alloc);
+  test (100000, omp_pteam_mem_alloc);
+  test (100000, omp_thread_mem_alloc);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/allocators-6.c b/libgomp/testsuite/libgomp.c/allocators-6.c
new file mode 100644
index 00000000000..90bf73095ef
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/allocators-6.c
@@ -0,0 +1,117 @@
+/* { dg-do run } */
+
+/* Test that low-latency realloc and free chains are sound.  */
+
+#include <stddef.h>
+#include <omp.h>
+
+#pragma omp requires dynamic_allocators
+
+void
+check (int cond, const char *msg)
+{
+  if (!cond)
+    {
+      __builtin_printf ("%s\n", msg);
+      __builtin_abort ();
+    }
+}
+
+int
+main ()
+{
+  #pragma omp target
+  {
+    /* Ensure that the memory we get *is* low-latency with a null-fallback.  */
+    omp_alloctrait_t traits[1]
+      = { { omp_atk_fallback, omp_atv_null_fb } };
+    omp_allocator_handle_t lowlat = omp_init_allocator (omp_low_lat_mem_space,
+							1, traits);
+
+    int size = 16;
+
+    char *a = (char *)omp_alloc(size, lowlat);
+    char *b = (char *)omp_alloc(size, lowlat);
+    char *c = (char *)omp_alloc(size, lowlat);
+    char *d = (char *)omp_alloc(size, lowlat);
+
+    /* There are headers and padding to account for.  */
+    int size2 = size + (b-a);
+    int size3 = size + (c-a);
+    int size4 = size + (d-a) + 100; /* Random larger amount.  */
+
+    check (a != NULL && b != NULL && c != NULL && d != NULL,
+	   "omp_alloc returned NULL");
+
+    char *p = omp_realloc (b, size, lowlat, lowlat);
+    check (p == b, "realloc did not reuse same size chunk, no space after");
+
+    p = omp_realloc (b, size-8, lowlat, lowlat);
+    check (p == b, "realloc did not reuse smaller chunk, no space after");
+
+    p = omp_realloc (b, size, lowlat, lowlat);
+    check (p == b, "realloc did not reuse original size chunk, no space after");
+
+    /* Make space after b.  */
+    omp_free(c, lowlat);
+
+    p = omp_realloc (b, size, lowlat, lowlat);
+    check (p == b, "realloc did not reuse same size chunk");
+
+    p = omp_realloc (b, size-8, lowlat, lowlat);
+    check (p == b, "realloc did not reuse smaller chunk");
+
+    p = omp_realloc (b, size, lowlat, lowlat);
+    check (p == b, "realloc did not reuse original size chunk");
+
+    p = omp_realloc (b, size+8, lowlat, lowlat);
+    check (p == b, "realloc did not extend in place by a little");
+
+    p = omp_realloc (b, size2, lowlat, lowlat);
+    check (p == b, "realloc did not extend into whole next chunk");
+
+    p = omp_realloc (b, size3, lowlat, lowlat);
+    check (p != b, "realloc did not move b elsewhere");
+    omp_free (p, lowlat);
+
+
+    p = omp_realloc (a, size, lowlat, lowlat);
+    check (p == a, "realloc did not reuse same size chunk, first position");
+
+    p = omp_realloc (a, size-8, lowlat, lowlat);
+    check (p == a, "realloc did not reuse smaller chunk, first position");
+
+    p = omp_realloc (a, size, lowlat, lowlat);
+    check (p == a, "realloc did not reuse original size chunk, first position");
+
+    p = omp_realloc (a, size+8, lowlat, lowlat);
+    check (p == a, "realloc did not extend in place by a little, first position");
+
+    p = omp_realloc (a, size3, lowlat, lowlat);
+    check (p == a, "realloc did not extend into whole next chunk, first position");
+
+    p = omp_realloc (a, size4, lowlat, lowlat);
+    check (p != a, "realloc did not move a elsewhere, first position");
+    omp_free (p, lowlat);
+
+
+    p = omp_realloc (d, size, lowlat, lowlat);
+    check (p == d, "realloc did not reuse same size chunk, last position");
+
+    p = omp_realloc (d, size-8, lowlat, lowlat);
+    check (p == d, "realloc did not reuse smaller chunk, last position");
+
+    p = omp_realloc (d, size, lowlat, lowlat);
+    check (p == d, "realloc did not reuse original size chunk, last position");
+
+    p = omp_realloc (d, size+8, lowlat, lowlat);
+    check (p == d, "realloc did not extend in place by a little, last position");
+
+    /* Larger than low latency memory.  */
+    p = omp_realloc(d, 100000000, lowlat, lowlat);
+    check (p == NULL, "realloc did not fail on OOM");
+  }
+
+  return 0;
+}
+

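The low-latency allocator above keeps an address-ordered free chain inside the pool itself: the first four bytes of each free chunk hold a `heapdesc` (16-bit offset, 16-bit size) describing the next free chunk, and a zero size terminates the chain. The following is a minimal host-side sketch of that scheme, not the patch's code. It assumes a single thread (no `__nvptx_lowlat_heap_root` lock), a first-fit policy, a hypothetical 64-byte `pool` array in place of `.shared` memory, and hypothetical helper names (`heap_init`, `heap_alloc`, `heap_free`). It shows splitting on allocation and coalescing with both neighbours on free.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define POOL_SIZE 64	/* Hypothetical; the patch's default pool is 8 KiB.  */

typedef union { uint32_t raw; struct { uint16_t offset, size; } desc; } heapdesc;

static _Alignas (8) char pool[POOL_SIZE];	/* Stands in for .shared.  */
static heapdesc root;	/* First free chunk; a zero size ends the chain.  */

/* A free chunk's first four bytes hold the descriptor of the next one;
   memcpy avoids strict-aliasing problems.  */
static heapdesc
get_next (uint16_t off)
{
  heapdesc d;
  memcpy (&d.raw, pool + off, sizeof d.raw);
  return d;
}

static void
set_next (uint16_t off, heapdesc d)
{
  memcpy (pool + off, &d.raw, sizeof d.raw);
}

/* Make D the successor of the free chunk at PREV, or the root if PREV < 0.  */
static void
link_after (int prev, heapdesc d)
{
  if (prev < 0)
    root = d;
  else
    set_next ((uint16_t) prev, d);
}

static void
heap_init (void)
{
  heapdesc end = { 0 };
  root.desc.offset = 0;
  root.desc.size = POOL_SIZE;	/* The whole pool is one free chunk.  */
  set_next (0, end);
}

/* First fit: take the first chunk that is big enough, splitting it.  */
static void *
heap_alloc (size_t size)
{
  size = (size + 7) & ~(size_t) 7;	/* 8-byte granularity.  */
  int prev = -1;
  for (heapdesc chunk = root; chunk.desc.size != 0; )
    {
      heapdesc onward = get_next (chunk.desc.offset);
      if (chunk.desc.size >= size)
	{
	  if (chunk.desc.size == size)
	    link_after (prev, onward);	/* Exact fit: unlink the chunk.  */
	  else
	    {
	      heapdesc rest;	/* The remainder stays on the chain.  */
	      rest.desc.offset = chunk.desc.offset + size;
	      rest.desc.size = chunk.desc.size - size;
	      set_next (rest.desc.offset, onward);
	      link_after (prev, rest);
	    }
	  return pool + chunk.desc.offset;
	}
      prev = chunk.desc.offset;
      chunk = onward;
    }
  return NULL;
}

/* Return a chunk, merging it with free neighbours on either side.  */
static void
heap_free (void *addr, size_t size)
{
  assert ((char *) addr >= pool && (char *) addr < pool + POOL_SIZE);
  heapdesc freed;
  freed.desc.offset = (uint16_t) ((char *) addr - pool);
  freed.desc.size = (uint16_t) ((size + 7) & ~(size_t) 7);

  /* Find the free chunks either side of ADDR.  */
  int prev = -1, prevprev = -1;
  heapdesc before = { 0 }, chunk = root;
  while (chunk.desc.size != 0 && chunk.desc.offset < freed.desc.offset)
    {
      prevprev = prev;
      prev = chunk.desc.offset;
      before = chunk;
      chunk = get_next (chunk.desc.offset);
    }

  heapdesc onward = chunk;
  if (chunk.desc.size != 0
      && freed.desc.offset + freed.desc.size == chunk.desc.offset)
    {
      freed.desc.size += chunk.desc.size;	/* Merge the following chunk.  */
      onward = get_next (chunk.desc.offset);
    }
  if (prev >= 0 && before.desc.offset + before.desc.size == freed.desc.offset)
    {
      freed.desc.offset = before.desc.offset;	/* Merge the preceding chunk.  */
      freed.desc.size += before.desc.size;
      prev = prevprev;
    }
  set_next (freed.desc.offset, onward);
  link_after (prev, freed);
}
```

In this model, freeing two adjacent 8-byte chunks and then requesting 16 bytes returns the first chunk again. That is the kind of reuse and coalescing that `allocators-4.c` checks on the device.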

* [PATCH 02/17] libgomp: pinned memory
  2022-07-07 10:34 [PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators Andrew Stubbs
  2022-07-07 10:34 ` [PATCH 01/17] libgomp, nvptx: low-latency memory allocator Andrew Stubbs
@ 2022-07-07 10:34 ` Andrew Stubbs
  2022-12-08 12:11   ` Jakub Jelinek
  2022-07-07 10:34 ` [PATCH 03/17] libgomp, openmp: Add ompx_pinned_mem_alloc Andrew Stubbs
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 30+ messages in thread
From: Andrew Stubbs @ 2022-07-07 10:34 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1729 bytes --]


Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall.  Pinned allocations are performed using mmap, not malloc, to ensure
that they can be unpinned safely when freed.
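The mmap-then-mlock approach can be sketched as a pair of helpers. They are hypothetical (`pinned_alloc` and `pinned_free` are not the names used in `config/linux/allocator.c`) and assume Linux/glibc. They also assume that the caller passes the original size back on free, as `MEMSPACE_FREE` does. Because every allocation gets its own mapping, `munlock` on free cannot unpin a page that another allocation still relies on.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: SIZE bytes of page-aligned memory locked into RAM.
   Returns NULL if mmap fails or mlock is refused, e.g. when the request
   exceeds RLIMIT_MEMLOCK.  */
static void *
pinned_alloc (size_t size)
{
  void *addr = mmap (NULL, size, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (addr == MAP_FAILED)
    return NULL;
  if (mlock (addr, size) != 0)
    {
      munmap (addr, size);	/* Don't leak the unpinned mapping.  */
      return NULL;
    }
  return addr;	/* Anonymous mappings are zero-filled, so calloc comes free.  */
}

/* Hypothetical helper: SIZE must be the size given to pinned_alloc.  */
static void
pinned_free (void *addr, size_t size)
{
  assert (addr != NULL && size != 0);
  munlock (addr, size);
  munmap (addr, size);
}
```

The cost of this design is that every pinned allocation occupies at least one whole page. As the comment in `config/linux/allocator.c` notes, that trade-off favours large pinned allocations.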

libgomp/ChangeLog:

	* allocator.c (MEMSPACE_ALLOC): Add PIN.
	(MEMSPACE_CALLOC): Add PIN.
	(MEMSPACE_REALLOC): Add PIN.
	(MEMSPACE_FREE): Add PIN.
	(xmlock): New function.
	(omp_init_allocator): Don't disallow the pinned trait.
	(omp_aligned_alloc): Add pinning to all MEMSPACE_* calls.
	(omp_aligned_calloc): Likewise.
	(omp_realloc): Likewise.
	(omp_free): Likewise.
	* config/linux/allocator.c: New file.
	* config/nvptx/allocator.c (MEMSPACE_ALLOC): Add PIN.
	(MEMSPACE_CALLOC): Add PIN.
	(MEMSPACE_REALLOC): Add PIN.
	(MEMSPACE_FREE): Add PIN.
	* testsuite/libgomp.c/alloc-pinned-1.c: New test.
	* testsuite/libgomp.c/alloc-pinned-2.c: New test.
	* testsuite/libgomp.c/alloc-pinned-3.c: New test.
	* testsuite/libgomp.c/alloc-pinned-4.c: New test.
---
 libgomp/allocator.c                          |  67 ++++++----
 libgomp/config/linux/allocator.c             |  99 ++++++++++++++
 libgomp/config/nvptx/allocator.c             |   8 +-
 libgomp/testsuite/libgomp.c/alloc-pinned-1.c |  95 +++++++++++++
 libgomp/testsuite/libgomp.c/alloc-pinned-2.c | 101 ++++++++++++++
 libgomp/testsuite/libgomp.c/alloc-pinned-3.c | 130 ++++++++++++++++++
 libgomp/testsuite/libgomp.c/alloc-pinned-4.c | 132 +++++++++++++++++++
 7 files changed, 602 insertions(+), 30 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-4.c


[-- Attachment #2: 0002-libgomp-pinned-memory.patch --]
[-- Type: text/x-patch; name="0002-libgomp-pinned-memory.patch", Size: 20088 bytes --]

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 9b33bcf529b..54310ab93ca 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -39,16 +39,20 @@
 
 /* These macros may be overridden in config/<target>/allocator.c.  */
 #ifndef MEMSPACE_ALLOC
-#define MEMSPACE_ALLOC(MEMSPACE, SIZE) malloc (SIZE)
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \
+  (PIN ? NULL : malloc (SIZE))
 #endif
 #ifndef MEMSPACE_CALLOC
-#define MEMSPACE_CALLOC(MEMSPACE, SIZE) calloc (1, SIZE)
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \
+  (PIN ? NULL : calloc (1, SIZE))
 #endif
 #ifndef MEMSPACE_REALLOC
-#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) realloc (ADDR, SIZE)
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE, OLDPIN, PIN) \
+  ((PIN) || (OLDPIN) ? NULL : realloc (ADDR, SIZE))
 #endif
 #ifndef MEMSPACE_FREE
-#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) free (ADDR)
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \
+  ((PIN) ? (void) 0 : free (ADDR))
 #endif
 
 /* Map the predefined allocators to the correct memory space.
@@ -351,10 +355,6 @@ omp_init_allocator (omp_memspace_handle_t memspace, int ntraits,
       break;
     }
 
-  /* No support for this so far.  */
-  if (data.pinned)
-    return omp_null_allocator;
-
   ret = gomp_malloc (sizeof (struct omp_allocator_data));
   *ret = data;
 #ifndef HAVE_SYNC_BUILTINS
@@ -481,7 +481,8 @@ retry:
 	}
       else
 #endif
-	ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size);
+	ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size,
+			      allocator_data->pinned);
       if (ptr == NULL)
 	{
 #ifdef HAVE_SYNC_BUILTINS
@@ -511,7 +512,8 @@ retry:
 	    = (allocator_data
 	       ? allocator_data->memspace
 	       : predefined_alloc_mapping[allocator]);
-	  ptr = MEMSPACE_ALLOC (memspace, new_size);
+	  ptr = MEMSPACE_ALLOC (memspace, new_size,
+				allocator_data && allocator_data->pinned);
 	}
       if (ptr == NULL)
 	goto fail;
@@ -542,9 +544,9 @@ fail:
 #ifdef LIBGOMP_USE_MEMKIND
 	  || memkind
 #endif
-	  || (allocator_data
-	      && allocator_data->pool_size < ~(uintptr_t) 0)
-	  || !allocator_data)
+	  || !allocator_data
+	  || allocator_data->pool_size < ~(uintptr_t) 0
+	  || allocator_data->pinned)
 	{
 	  allocator = omp_default_mem_alloc;
 	  goto retry;
@@ -596,6 +598,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator)
   struct omp_mem_header *data;
   omp_memspace_handle_t memspace __attribute__((unused))
     = omp_default_mem_space;
+  int pinned __attribute__((unused)) = false;
 
   if (ptr == NULL)
     return;
@@ -627,6 +630,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator)
 #endif
 
       memspace = allocator_data->memspace;
+      pinned = allocator_data->pinned;
     }
   else
     {
@@ -651,7 +655,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator)
       memspace = predefined_alloc_mapping[data->allocator];
     }
 
-  MEMSPACE_FREE (memspace, data->ptr, data->size);
+  MEMSPACE_FREE (memspace, data->ptr, data->size, pinned);
 }
 
 ialias (omp_free)
@@ -767,7 +771,8 @@ retry:
 	}
       else
 #endif
-	ptr = MEMSPACE_CALLOC (allocator_data->memspace, new_size);
+	ptr = MEMSPACE_CALLOC (allocator_data->memspace, new_size,
+			       allocator_data->pinned);
       if (ptr == NULL)
 	{
 #ifdef HAVE_SYNC_BUILTINS
@@ -797,7 +802,8 @@ retry:
 	    = (allocator_data
 	       ? allocator_data->memspace
 	       : predefined_alloc_mapping[allocator]);
-	  ptr = MEMSPACE_CALLOC (memspace, new_size);
+	  ptr = MEMSPACE_CALLOC (memspace, new_size,
+				 allocator_data && allocator_data->pinned);
 	}
       if (ptr == NULL)
 	goto fail;
@@ -828,9 +834,9 @@ fail:
 #ifdef LIBGOMP_USE_MEMKIND
 	  || memkind
 #endif
-	  || (allocator_data
-	      && allocator_data->pool_size < ~(uintptr_t) 0)
-	  || !allocator_data)
+	  || !allocator_data
+	  || allocator_data->pool_size < ~(uintptr_t) 0
+	  || allocator_data->pinned)
 	{
 	  allocator = omp_default_mem_alloc;
 	  goto retry;
@@ -1021,9 +1027,13 @@ retry:
 #endif
       if (prev_size)
 	new_ptr = MEMSPACE_REALLOC (allocator_data->memspace, data->ptr,
-				    data->size, new_size);
+				    data->size, new_size,
+				    (free_allocator_data
+				     && free_allocator_data->pinned),
+				    allocator_data->pinned);
       else
-	new_ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size);
+	new_ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size,
+				  allocator_data->pinned);
       if (new_ptr == NULL)
 	{
 #ifdef HAVE_SYNC_BUILTINS
@@ -1069,10 +1079,14 @@ retry:
 	    = (allocator_data
 	       ? allocator_data->memspace
 	       : predefined_alloc_mapping[allocator]);
-	  new_ptr = MEMSPACE_REALLOC (memspace, data->ptr, data->size, new_size);
+	  new_ptr = MEMSPACE_REALLOC (memspace, data->ptr, data->size, new_size,
+				      (free_allocator_data
+				       && free_allocator_data->pinned),
+				      allocator_data && allocator_data->pinned);
 	}
       if (new_ptr == NULL)
 	goto fail;
+
       ret = (char *) new_ptr + sizeof (struct omp_mem_header);
       ((struct omp_mem_header *) ret)[-1].ptr = new_ptr;
       ((struct omp_mem_header *) ret)[-1].size = new_size;
@@ -1095,7 +1109,8 @@ retry:
 	    = (allocator_data
 	       ? allocator_data->memspace
 	       : predefined_alloc_mapping[allocator]);
-	  new_ptr = MEMSPACE_ALLOC (memspace, new_size);
+	  new_ptr = MEMSPACE_ALLOC (memspace, new_size,
+				    allocator_data && allocator_data->pinned);
 	}
       if (new_ptr == NULL)
 	goto fail;
@@ -1151,9 +1166,9 @@ fail:
 #ifdef LIBGOMP_USE_MEMKIND
 	  || memkind
 #endif
-	  || (allocator_data
-	      && allocator_data->pool_size < ~(uintptr_t) 0)
-	  || !allocator_data)
+	  || !allocator_data
+	  || allocator_data->pool_size < ~(uintptr_t) 0
+	  || allocator_data->pinned)
 	{
 	  allocator = omp_default_mem_alloc;
 	  goto retry;
diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c
index b73acce9121..1496e41875c 100644
--- a/libgomp/config/linux/allocator.c
+++ b/libgomp/config/linux/allocator.c
@@ -33,4 +33,103 @@
 #define LIBGOMP_USE_MEMKIND
 #endif
 
+/* Implement malloc routines that can handle pinned memory on Linux.
+
+   It's possible to use mlock on any heap memory, but using munlock is
+   problematic if there are multiple pinned allocations on the same page.
+   Tracking all that manually would be possible, but adds overhead.  This may
+   be worth it if there are a lot of small allocations getting pinned, but
+   that seems less likely in an HPC application.
+
+   Instead we optimize for large pinned allocations, and use mmap to ensure
+   that two pinned allocations never share a page.  This also means
+   that large allocations don't pin extra pages by being poorly aligned.  */
+
+#define _GNU_SOURCE
+#include <sys/mman.h>
+#include <string.h>
+#include "libgomp.h"
+
+static void *
+linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin)
+{
+  (void)memspace;
+
+  if (pin)
+    {
+      void *addr = mmap (NULL, size, PROT_READ | PROT_WRITE,
+			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+      if (addr == MAP_FAILED)
+	return NULL;
+
+      if (mlock (addr, size))
+	{
+	  gomp_debug (0, "libgomp: failed to pin memory (ulimit too low?)\n");
+	  munmap (addr, size);
+	  return NULL;
+	}
+
+      return addr;
+    }
+  else
+    return malloc (size);
+}
+
+static void *
+linux_memspace_calloc (omp_memspace_handle_t memspace, size_t size, int pin)
+{
+  if (pin)
+    return linux_memspace_alloc (memspace, size, pin);
+  else
+    return calloc (1, size);
+}
+
+static void
+linux_memspace_free (omp_memspace_handle_t memspace, void *addr, size_t size,
+		     int pin)
+{
+  (void)memspace;
+
+  if (pin)
+    munmap (addr, size);
+  else
+    free (addr);
+}
+
+static void *
+linux_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
+			size_t oldsize, size_t size, int oldpin, int pin)
+{
+  if (oldpin && pin)
+    {
+      void *newaddr = mremap (addr, oldsize, size, MREMAP_MAYMOVE);
+      if (newaddr == MAP_FAILED)
+	return NULL;
+
+      return newaddr;
+    }
+  else if (oldpin || pin)
+    {
+      void *newaddr = linux_memspace_alloc (memspace, size, pin);
+      if (newaddr)
+	{
+	  memcpy (newaddr, addr, oldsize < size ? oldsize : size);
+	  linux_memspace_free (memspace, addr, oldsize, oldpin);
+	}
+
+      return newaddr;
+    }
+  else
+    return realloc (addr, size);
+}
+
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \
+  linux_memspace_alloc (MEMSPACE, SIZE, PIN)
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \
+  linux_memspace_calloc (MEMSPACE, SIZE, PIN)
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE, OLDPIN, PIN) \
+  linux_memspace_realloc (MEMSPACE, ADDR, OLDSIZE, SIZE, OLDPIN, PIN)
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \
+  linux_memspace_free (MEMSPACE, ADDR, SIZE, PIN)
+
 #include "../../allocator.c"
diff --git a/libgomp/config/nvptx/allocator.c b/libgomp/config/nvptx/allocator.c
index 6bc2ea48043..f740b97f6ac 100644
--- a/libgomp/config/nvptx/allocator.c
+++ b/libgomp/config/nvptx/allocator.c
@@ -358,13 +358,13 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
     return realloc (addr, size);
 }
 
-#define MEMSPACE_ALLOC(MEMSPACE, SIZE) \
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \
   nvptx_memspace_alloc (MEMSPACE, SIZE)
-#define MEMSPACE_CALLOC(MEMSPACE, SIZE) \
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \
   nvptx_memspace_calloc (MEMSPACE, SIZE)
-#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) \
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE, OLDPIN, PIN) \
   nvptx_memspace_realloc (MEMSPACE, ADDR, OLDSIZE, SIZE)
-#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \
   nvptx_memspace_free (MEMSPACE, ADDR, SIZE)
 
 #include "../../allocator.c"
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-1.c b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c
new file mode 100644
index 00000000000..79792b16d83
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c
@@ -0,0 +1,95 @@
+/* { dg-do run } */
+
+/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */
+
+/* Test that pinned memory works.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#ifdef __linux__
+#include <sys/types.h>
+#include <unistd.h>
+
+#include <sys/mman.h>
+#include <sys/resource.h>
+
+#define PAGE_SIZE sysconf(_SC_PAGESIZE)
+#define CHECK_SIZE(SIZE) { \
+  struct rlimit limit; \
+  if (getrlimit (RLIMIT_MEMLOCK, &limit) \
+      || limit.rlim_cur <= SIZE) \
+    fprintf (stderr, "insufficient lockable memory; please increase ulimit\n"); \
+  }
+
+int
+get_pinned_mem ()
+{
+  int pid = getpid ();
+  char buf[100];
+  sprintf (buf, "/proc/%d/status", pid);
+
+  FILE *proc = fopen (buf, "r");
+  if (!proc)
+    abort ();
+  while (fgets (buf, 100, proc))
+    {
+      int val;
+      if (sscanf (buf, "VmLck: %d", &val))
+	{
+	  fclose (proc);
+	  return val;
+	}
+    }
+  abort ();
+}
+#else
+#define PAGE_SIZE 1 /* unknown */
+#define CHECK_SIZE(SIZE) fprintf (stderr, "OS unsupported\n");
+
+int
+get_pinned_mem ()
+{
+  return 0;
+}
+#endif
+
+#include <omp.h>
+
+int
+main ()
+{
+  /* Allocate at least a page each time, but stay within the ulimit.  */
+  const int SIZE = PAGE_SIZE;
+  CHECK_SIZE (SIZE*3);
+
+  const omp_alloctrait_t traits[] = {
+      { omp_atk_pinned, 1 }
+  };
+  omp_allocator_handle_t allocator = omp_init_allocator (omp_default_mem_space, 1, traits);
+
+  // Sanity check
+  if (get_pinned_mem () != 0)
+    abort ();
+
+  void *p = omp_alloc (SIZE, allocator);
+  if (!p)
+    abort ();
+
+  int amount = get_pinned_mem ();
+  if (amount == 0)
+    abort ();
+
+  p = omp_realloc (p, SIZE*2, allocator, allocator);
+
+  int amount2 = get_pinned_mem ();
+  if (amount2 <= amount)
+    abort ();
+
+  p = omp_calloc (1, SIZE, allocator);
+
+  if (get_pinned_mem () <= amount2)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-2.c b/libgomp/testsuite/libgomp.c/alloc-pinned-2.c
new file mode 100644
index 00000000000..228c656b715
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-2.c
@@ -0,0 +1,101 @@
+/* { dg-do run } */
+
+/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */
+
+/* Test that pinned memory works (pool_size code path).  */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#ifdef __linux__
+#include <sys/types.h>
+#include <unistd.h>
+
+#include <sys/mman.h>
+#include <sys/resource.h>
+
+#define PAGE_SIZE sysconf(_SC_PAGESIZE)
+#define CHECK_SIZE(SIZE) { \
+  struct rlimit limit; \
+  if (getrlimit (RLIMIT_MEMLOCK, &limit) \
+      || limit.rlim_cur <= SIZE) \
+    fprintf (stderr, "insufficient lockable memory; please increase ulimit\n"); \
+  }
+
+int
+get_pinned_mem ()
+{
+  int pid = getpid ();
+  char buf[100];
+  sprintf (buf, "/proc/%d/status", pid);
+
+  FILE *proc = fopen (buf, "r");
+  if (!proc)
+    abort ();
+  while (fgets (buf, 100, proc))
+    {
+      int val;
+      if (sscanf (buf, "VmLck: %d", &val))
+	{
+	  fclose (proc);
+	  return val;
+	}
+    }
+  abort ();
+}
+#else
+#define PAGE_SIZE 1 /* unknown */
+#define CHECK_SIZE(SIZE) fprintf (stderr, "OS unsupported\n");
+
+int
+get_pinned_mem ()
+{
+  return 0;
+}
+#endif
+
+#include <omp.h>
+
+int
+main ()
+{
+  /* Allocate at least a page each time, but stay within the ulimit.  */
+  const int SIZE = PAGE_SIZE;
+  CHECK_SIZE (SIZE*3);
+
+  const omp_alloctrait_t traits[] = {
+      { omp_atk_pinned, 1 },
+      { omp_atk_pool_size, SIZE*8 }
+  };
+  omp_allocator_handle_t allocator = omp_init_allocator (omp_default_mem_space,
+							 2, traits);
+
+  // Sanity check
+  if (get_pinned_mem () != 0)
+    abort ();
+
+  void *p = omp_alloc (SIZE, allocator);
+  if (!p)
+    abort ();
+
+  int amount = get_pinned_mem ();
+  if (amount == 0)
+    abort ();
+
+  p = omp_realloc (p, SIZE*2, allocator, allocator);
+  if (!p)
+    abort ();
+
+  int amount2 = get_pinned_mem ();
+  if (amount2 <= amount)
+    abort ();
+
+  p = omp_calloc (1, SIZE, allocator);
+  if (!p)
+    abort ();
+
+  if (get_pinned_mem () <= amount2)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-3.c b/libgomp/testsuite/libgomp.c/alloc-pinned-3.c
new file mode 100644
index 00000000000..90539ffe3e0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-3.c
@@ -0,0 +1,130 @@
+/* { dg-do run } */
+
+/* Test that pinned memory fails correctly.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#ifdef __linux__
+#include <sys/types.h>
+#include <unistd.h>
+
+#include <sys/mman.h>
+#include <sys/resource.h>
+
+#define PAGE_SIZE sysconf(_SC_PAGESIZE)
+
+int
+get_pinned_mem ()
+{
+  int pid = getpid ();
+  char buf[100];
+  sprintf (buf, "/proc/%d/status", pid);
+
+  FILE *proc = fopen (buf, "r");
+  if (!proc)
+    abort ();
+  while (fgets (buf, 100, proc))
+    {
+      int val;
+      if (sscanf (buf, "VmLck: %d", &val))
+	{
+	  fclose (proc);
+	  return val;
+	}
+    }
+  abort ();
+}
+
+void
+set_pin_limit (int size)
+{
+  struct rlimit limit;
+  if (getrlimit (RLIMIT_MEMLOCK, &limit))
+    abort ();
+  limit.rlim_cur = (limit.rlim_max < size ? limit.rlim_max : size);
+  if (setrlimit (RLIMIT_MEMLOCK, &limit))
+    abort ();
+}
+#else
+#define PAGE_SIZE 10000*1024 /* unknown */
+
+int
+get_pinned_mem ()
+{
+  return 0;
+}
+
+void
+set_pin_limit (int size)
+{
+}
+#endif
+
+#include <omp.h>
+
+int
+main ()
+{
+  /* This needs to be large enough to cover multiple pages.  */
+  const int SIZE = PAGE_SIZE*4;
+
+  /* Pinned memory, no fallback.  */
+  const omp_alloctrait_t traits1[] = {
+      { omp_atk_pinned, 1 },
+      { omp_atk_fallback, omp_atv_null_fb }
+  };
+  omp_allocator_handle_t allocator1 = omp_init_allocator (omp_default_mem_space, 2, traits1);
+
+  /* Pinned memory, plain memory fallback.  */
+  const omp_alloctrait_t traits2[] = {
+      { omp_atk_pinned, 1 },
+      { omp_atk_fallback, omp_atv_default_mem_fb }
+  };
+  omp_allocator_handle_t allocator2 = omp_init_allocator (omp_default_mem_space, 2, traits2);
+
+  /* Ensure that the limit is smaller than the allocation.  */
+  set_pin_limit (SIZE/2);
+
+  // Sanity check
+  if (get_pinned_mem () != 0)
+    abort ();
+
+  // Should fail
+  void *p = omp_alloc (SIZE, allocator1);
+  if (p)
+    abort ();
+
+  // Should fail
+  p = omp_calloc (1, SIZE, allocator1);
+  if (p)
+    abort ();
+
+  // Should fall back
+  p = omp_alloc (SIZE, allocator2);
+  if (!p)
+    abort ();
+
+  // Should fall back
+  p = omp_calloc (1, SIZE, allocator2);
+  if (!p)
+    abort ();
+
+  // Should fail to realloc
+  void *notpinned = omp_alloc (SIZE, omp_default_mem_alloc);
+  p = omp_realloc (notpinned, SIZE, allocator1, omp_default_mem_alloc);
+  if (!notpinned || p)
+    abort ();
+
+  // Should fall back to the default allocator, so no realloc is needed
+  p = omp_realloc (notpinned, SIZE, allocator2, omp_default_mem_alloc);
+  if (p != notpinned)
+    abort ();
+
+  // No memory should have been pinned
+  int amount = get_pinned_mem ();
+  if (amount != 0)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-4.c b/libgomp/testsuite/libgomp.c/alloc-pinned-4.c
new file mode 100644
index 00000000000..534e49eefc4
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-4.c
@@ -0,0 +1,132 @@
+/* { dg-do run } */
+
+/* Test that pinned memory fails correctly, pool_size code path.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#ifdef __linux__
+#include <sys/types.h>
+#include <unistd.h>
+
+#include <sys/mman.h>
+#include <sys/resource.h>
+
+#define PAGE_SIZE sysconf(_SC_PAGESIZE)
+
+int
+get_pinned_mem ()
+{
+  int pid = getpid ();
+  char buf[100];
+  sprintf (buf, "/proc/%d/status", pid);
+
+  FILE *proc = fopen (buf, "r");
+  if (!proc)
+    abort ();
+  while (fgets (buf, 100, proc))
+    {
+      int val;
+      if (sscanf (buf, "VmLck: %d", &val))
+	{
+	  fclose (proc);
+	  return val;
+	}
+    }
+  abort ();
+}
+
+void
+set_pin_limit (int size)
+{
+  struct rlimit limit;
+  if (getrlimit (RLIMIT_MEMLOCK, &limit))
+    abort ();
+  limit.rlim_cur = (limit.rlim_max < size ? limit.rlim_max : size);
+  if (setrlimit (RLIMIT_MEMLOCK, &limit))
+    abort ();
+}
+#else
+#define PAGE_SIZE 10000*1024 /* unknown */
+
+int
+get_pinned_mem ()
+{
+  return 0;
+}
+
+void
+set_pin_limit (int size)
+{
+}
+#endif
+
+#include <omp.h>
+
+int
+main ()
+{
+  /* This needs to be large enough to cover multiple pages.  */
+  const int SIZE = PAGE_SIZE*4;
+
+  /* Pinned memory, no fallback.  */
+  const omp_alloctrait_t traits1[] = {
+      { omp_atk_pinned, 1 },
+      { omp_atk_fallback, omp_atv_null_fb },
+      { omp_atk_pool_size, SIZE*8 }
+  };
+  omp_allocator_handle_t allocator1 = omp_init_allocator (omp_default_mem_space, 3, traits1);
+
+  /* Pinned memory, plain memory fallback.  */
+  const omp_alloctrait_t traits2[] = {
+      { omp_atk_pinned, 1 },
+      { omp_atk_fallback, omp_atv_default_mem_fb },
+      { omp_atk_pool_size, SIZE*8 }
+  };
+  omp_allocator_handle_t allocator2 = omp_init_allocator (omp_default_mem_space, 3, traits2);
+
+  /* Ensure that the limit is smaller than the allocation.  */
+  set_pin_limit (SIZE/2);
+
+  // Sanity check
+  if (get_pinned_mem () != 0)
+    abort ();
+
+  // Should fail
+  void *p = omp_alloc (SIZE, allocator1);
+  if (p)
+    abort ();
+
+  // Should fail
+  p = omp_calloc (1, SIZE, allocator1);
+  if (p)
+    abort ();
+
+  // Should fall back
+  p = omp_alloc (SIZE, allocator2);
+  if (!p)
+    abort ();
+
+  // Should fall back
+  p = omp_calloc (1, SIZE, allocator2);
+  if (!p)
+    abort ();
+
+  // Should fail to realloc
+  void *notpinned = omp_alloc (SIZE, omp_default_mem_alloc);
+  p = omp_realloc (notpinned, SIZE, allocator1, omp_default_mem_alloc);
+  if (!notpinned || p)
+    abort ();
+
+  // Should fall back to the default allocator, so no realloc is needed
+  p = omp_realloc (notpinned, SIZE, allocator2, omp_default_mem_alloc);
+  if (p != notpinned)
+    abort ();
+
+  // No memory should have been pinned
+  int amount = get_pinned_mem ();
+  if (amount != 0)
+    abort ();
+
+  return 0;
+}

* [PATCH 03/17] libgomp, openmp: Add ompx_pinned_mem_alloc
  2022-07-07 10:34 [PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators Andrew Stubbs
  2022-07-07 10:34 ` [PATCH 01/17] libgomp, nvptx: low-latency memory allocator Andrew Stubbs
  2022-07-07 10:34 ` [PATCH 02/17] libgomp: pinned memory Andrew Stubbs
@ 2022-07-07 10:34 ` Andrew Stubbs
  2022-07-07 10:34 ` [PATCH 04/17] openmp, nvptx: low-lat memory access traits Andrew Stubbs
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Andrew Stubbs @ 2022-07-07 10:34 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1449 bytes --]


This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP.  The name uses the OpenMP extension namespace (the
"ompx_" prefix) and is intended to be consistent with other OpenMP
implementations currently in development.

The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait.

libgomp/ChangeLog:

	* allocator.c (omp_max_predefined_alloc): Update.
	(omp_aligned_alloc): Support ompx_pinned_mem_alloc.
	(omp_free): Likewise.
	(omp_aligned_calloc): Likewise.
	(omp_realloc): Likewise.
	* omp.h.in (omp_allocator_handle_t): Add ompx_pinned_mem_alloc.
	* omp_lib.f90.in: Add ompx_pinned_mem_alloc.
	* testsuite/libgomp.c/alloc-pinned-5.c: New test.
	* testsuite/libgomp.c/alloc-pinned-6.c: New test.
	* testsuite/libgomp.fortran/alloc-pinned-1.f90: New test.
---
 libgomp/allocator.c                           |  60 +++++++----
 libgomp/omp.h.in                              |   1 +
 libgomp/omp_lib.f90.in                        |   2 +
 libgomp/testsuite/libgomp.c/alloc-pinned-5.c  |  90 ++++++++++++++++
 libgomp/testsuite/libgomp.c/alloc-pinned-6.c  | 101 ++++++++++++++++++
 .../libgomp.fortran/alloc-pinned-1.f90        |  16 +++
 6 files changed, 252 insertions(+), 18 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-6.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90


[-- Attachment #2: 0003-libgomp-openmp-Add-ompx_pinned_mem_alloc.patch --]
[-- Type: text/x-patch; name="0003-libgomp-openmp-Add-ompx_pinned_mem_alloc.patch", Size: 10699 bytes --]

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 54310ab93ca..029d0d40a36 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -35,7 +35,7 @@
 #include <dlfcn.h>
 #endif
 
-#define omp_max_predefined_alloc omp_thread_mem_alloc
+#define omp_max_predefined_alloc ompx_pinned_mem_alloc
 
 /* These macros may be overridden in config/<target>/allocator.c.  */
 #ifndef MEMSPACE_ALLOC
@@ -67,6 +67,7 @@ static const omp_memspace_handle_t predefined_alloc_mapping[] = {
   omp_low_lat_mem_space,   /* omp_cgroup_mem_alloc. */
   omp_low_lat_mem_space,   /* omp_pteam_mem_alloc. */
   omp_low_lat_mem_space,   /* omp_thread_mem_alloc. */
+  omp_default_mem_space,   /* ompx_pinned_mem_alloc. */
 };
 
 enum gomp_memkind_kind
@@ -512,8 +513,11 @@ retry:
 	    = (allocator_data
 	       ? allocator_data->memspace
 	       : predefined_alloc_mapping[allocator]);
-	  ptr = MEMSPACE_ALLOC (memspace, new_size,
-				allocator_data && allocator_data->pinned);
+	  int pinned __attribute__((unused))
+	    = (allocator_data
+	       ? allocator_data->pinned
+	       : allocator == ompx_pinned_mem_alloc);
+	  ptr = MEMSPACE_ALLOC (memspace, new_size, pinned);
 	}
       if (ptr == NULL)
 	goto fail;
@@ -534,7 +538,8 @@ retry:
 fail:
   int fallback = (allocator_data
 		  ? allocator_data->fallback
-		  : allocator == omp_default_mem_alloc
+		  : (allocator == omp_default_mem_alloc
+		     || allocator == ompx_pinned_mem_alloc)
 		  ? omp_atv_null_fb
 		  : omp_atv_default_mem_fb);
   switch (fallback)
@@ -653,6 +658,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator)
 #endif
 
       memspace = predefined_alloc_mapping[data->allocator];
+      pinned = (data->allocator == ompx_pinned_mem_alloc);
     }
 
   MEMSPACE_FREE (memspace, data->ptr, data->size, pinned);
@@ -802,8 +808,11 @@ retry:
 	    = (allocator_data
 	       ? allocator_data->memspace
 	       : predefined_alloc_mapping[allocator]);
-	  ptr = MEMSPACE_CALLOC (memspace, new_size,
-				 allocator_data && allocator_data->pinned);
+	  int pinned __attribute__((unused))
+	    = (allocator_data
+	       ? allocator_data->pinned
+	       : allocator == ompx_pinned_mem_alloc);
+	  ptr = MEMSPACE_CALLOC (memspace, new_size, pinned);
 	}
       if (ptr == NULL)
 	goto fail;
@@ -824,7 +833,8 @@ retry:
 fail:
   int fallback = (allocator_data
 		  ? allocator_data->fallback
-		  : allocator == omp_default_mem_alloc
+		  : (allocator == omp_default_mem_alloc
+		     || allocator == ompx_pinned_mem_alloc)
 		  ? omp_atv_null_fb
 		  : omp_atv_default_mem_fb);
   switch (fallback)
@@ -1026,11 +1036,15 @@ retry:
       else
 #endif
       if (prev_size)
-	new_ptr = MEMSPACE_REALLOC (allocator_data->memspace, data->ptr,
-				    data->size, new_size,
-				    (free_allocator_data
-				     && free_allocator_data->pinned),
-				    allocator_data->pinned);
+	{
+	  int was_pinned __attribute__((unused))
+	    = (free_allocator_data
+	       ? free_allocator_data->pinned
+	       : free_allocator == ompx_pinned_mem_alloc);
+	  new_ptr = MEMSPACE_REALLOC (allocator_data->memspace, data->ptr,
+				      data->size, new_size, was_pinned,
+				      allocator_data->pinned);
+	}
       else
 	new_ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size,
 				  allocator_data->pinned);
@@ -1079,10 +1093,16 @@ retry:
 	    = (allocator_data
 	       ? allocator_data->memspace
 	       : predefined_alloc_mapping[allocator]);
+	  int was_pinned __attribute__((unused))
+	    = (free_allocator_data
+	       ? free_allocator_data->pinned
+	       : free_allocator == ompx_pinned_mem_alloc);
+	  int pinned __attribute__((unused))
+	    = (allocator_data
+	       ? allocator_data->pinned
+	       : allocator == ompx_pinned_mem_alloc);
 	  new_ptr = MEMSPACE_REALLOC (memspace, data->ptr, data->size, new_size,
-				      (free_allocator_data
-				       && free_allocator_data->pinned),
-				      allocator_data && allocator_data->pinned);
+				      was_pinned, pinned);
 	}
       if (new_ptr == NULL)
 	goto fail;
@@ -1109,8 +1129,11 @@ retry:
 	    = (allocator_data
 	       ? allocator_data->memspace
 	       : predefined_alloc_mapping[allocator]);
-	  new_ptr = MEMSPACE_ALLOC (memspace, new_size,
-				    allocator_data && allocator_data->pinned);
+	  int pinned __attribute__((unused))
+	    = (allocator_data
+	       ? allocator_data->pinned
+	       : allocator == ompx_pinned_mem_alloc);
+	  new_ptr = MEMSPACE_ALLOC (memspace, new_size, pinned);
 	}
       if (new_ptr == NULL)
 	goto fail;
@@ -1156,7 +1179,8 @@ retry:
 fail:
   int fallback = (allocator_data
 		  ? allocator_data->fallback
-		  : allocator == omp_default_mem_alloc
+		  : (allocator == omp_default_mem_alloc
+		     || allocator == ompx_pinned_mem_alloc)
 		  ? omp_atv_null_fb
 		  : omp_atv_default_mem_fb);
   switch (fallback)
diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in
index 925a650135e..eb071aa2e00 100644
--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -134,6 +134,7 @@ typedef enum omp_allocator_handle_t __GOMP_UINTPTR_T_ENUM
   omp_cgroup_mem_alloc = 6,
   omp_pteam_mem_alloc = 7,
   omp_thread_mem_alloc = 8,
+  ompx_pinned_mem_alloc = 9,
   __omp_allocator_handle_t_max__ = __UINTPTR_MAX__
 } omp_allocator_handle_t;
 
diff --git a/libgomp/omp_lib.f90.in b/libgomp/omp_lib.f90.in
index 7ba115f3a1a..10610d64cfe 100644
--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
@@ -158,6 +158,8 @@
                  parameter :: omp_pteam_mem_alloc = 7
         integer (kind=omp_allocator_handle_kind), &
                  parameter :: omp_thread_mem_alloc = 8
+        integer (kind=omp_allocator_handle_kind), &
+                 parameter :: ompx_pinned_mem_alloc = 9
         integer (omp_memspace_handle_kind), &
                  parameter :: omp_default_mem_space = 0
         integer (omp_memspace_handle_kind), &
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-5.c b/libgomp/testsuite/libgomp.c/alloc-pinned-5.c
new file mode 100644
index 00000000000..315c7161a39
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-5.c
@@ -0,0 +1,90 @@
+/* { dg-do run } */
+
+/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */
+
+/* Test that ompx_pinned_mem_alloc works.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#ifdef __linux__
+#include <sys/types.h>
+#include <unistd.h>
+
+#include <sys/mman.h>
+#include <sys/resource.h>
+
+#define PAGE_SIZE sysconf(_SC_PAGESIZE)
+#define CHECK_SIZE(SIZE) { \
+  struct rlimit limit; \
+  if (getrlimit (RLIMIT_MEMLOCK, &limit) \
+      || limit.rlim_cur <= SIZE) \
+    fprintf (stderr, "insufficient lockable memory; please increase ulimit\n"); \
+  }
+
+int
+get_pinned_mem ()
+{
+  int pid = getpid ();
+  char buf[100];
+  sprintf (buf, "/proc/%d/status", pid);
+
+  FILE *proc = fopen (buf, "r");
+  if (!proc)
+    abort ();
+  while (fgets (buf, 100, proc))
+    {
+      int val;
+      if (sscanf (buf, "VmLck: %d", &val))
+	{
+	  fclose (proc);
+	  return val;
+	}
+    }
+  abort ();
+}
+#else
+#define PAGE_SIZE 1 /* unknown */
+#define CHECK_SIZE(SIZE) fprintf (stderr, "OS unsupported\n");
+
+int
+get_pinned_mem ()
+{
+  return 0;
+}
+#endif
+
+#include <omp.h>
+
+int
+main ()
+{
+  /* Allocate at least a page each time, but stay within the ulimit.  */
+  const int SIZE = PAGE_SIZE;
+  CHECK_SIZE (SIZE*3);
+
+  // Sanity check
+  if (get_pinned_mem () != 0)
+    abort ();
+
+  void *p = omp_alloc (SIZE, ompx_pinned_mem_alloc);
+  if (!p)
+    abort ();
+
+  int amount = get_pinned_mem ();
+  if (amount == 0)
+    abort ();
+
+  p = omp_realloc (p, SIZE*2, ompx_pinned_mem_alloc, ompx_pinned_mem_alloc);
+
+  int amount2 = get_pinned_mem ();
+  if (amount2 <= amount)
+    abort ();
+
+  p = omp_calloc (1, SIZE, ompx_pinned_mem_alloc);
+
+  if (get_pinned_mem () <= amount2)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-6.c b/libgomp/testsuite/libgomp.c/alloc-pinned-6.c
new file mode 100644
index 00000000000..bbe20c04875
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-6.c
@@ -0,0 +1,101 @@
+/* { dg-do run } */
+
+/* Test that ompx_pinned_mem_alloc fails correctly.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#ifdef __linux__
+#include <sys/types.h>
+#include <unistd.h>
+
+#include <sys/mman.h>
+#include <sys/resource.h>
+
+#define PAGE_SIZE sysconf(_SC_PAGESIZE)
+
+int
+get_pinned_mem ()
+{
+  int pid = getpid ();
+  char buf[100];
+  sprintf (buf, "/proc/%d/status", pid);
+
+  FILE *proc = fopen (buf, "r");
+  if (!proc)
+    abort ();
+  while (fgets (buf, 100, proc))
+    {
+      int val;
+      if (sscanf (buf, "VmLck: %d", &val))
+	{
+	  fclose (proc);
+	  return val;
+	}
+    }
+  abort ();
+}
+
+void
+set_pin_limit (int size)
+{
+  struct rlimit limit;
+  if (getrlimit (RLIMIT_MEMLOCK, &limit))
+    abort ();
+  limit.rlim_cur = (limit.rlim_max < size ? limit.rlim_max : size);
+  if (setrlimit (RLIMIT_MEMLOCK, &limit))
+    abort ();
+}
+#else
+#define PAGE_SIZE 10000*1024 /* unknown */
+
+int
+get_pinned_mem ()
+{
+  return 0;
+}
+
+void
+set_pin_limit (int size)
+{
+}
+#endif
+
+#include <omp.h>
+
+int
+main ()
+{
+  /* This needs to be large enough to cover multiple pages.  */
+  const int SIZE = PAGE_SIZE*4;
+
+  /* Ensure that the limit is smaller than the allocation.  */
+  set_pin_limit (SIZE/2);
+
+  // Sanity check
+  if (get_pinned_mem () != 0)
+    abort ();
+
+  // Should fail
+  void *p = omp_alloc (SIZE, ompx_pinned_mem_alloc);
+  if (p)
+    abort ();
+
+  // Should fail
+  p = omp_calloc (1, SIZE, ompx_pinned_mem_alloc);
+  if (p)
+    abort ();
+
+  // Should fail to realloc
+  void *notpinned = omp_alloc (SIZE, omp_default_mem_alloc);
+  p = omp_realloc (notpinned, SIZE, ompx_pinned_mem_alloc, omp_default_mem_alloc);
+  if (!notpinned || p)
+    abort ();
+
+  // No memory should have been pinned
+  int amount = get_pinned_mem ();
+  if (amount != 0)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90 b/libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90
new file mode 100644
index 00000000000..798dc3d5a12
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90
@@ -0,0 +1,16 @@
+! Ensure that the ompx_pinned_mem_alloc predefined allocator is present and
+! accepted.  The majority of the functionality testing lives in the C tests.
+!
+! { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } }
+
+program main
+  use omp_lib
+  use ISO_C_Binding
+  implicit none (external, type)
+
+  type(c_ptr) :: p
+
+  p = omp_alloc (10_c_size_t, ompx_pinned_mem_alloc)
+  if (.not. c_associated (p)) stop 1
+  call omp_free (p, ompx_pinned_mem_alloc)
+end program main

* [PATCH 04/17] openmp, nvptx: low-lat memory access traits
  2022-07-07 10:34 [PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators Andrew Stubbs
                   ` (2 preceding siblings ...)
  2022-07-07 10:34 ` [PATCH 03/17] libgomp, openmp: Add ompx_pinned_mem_alloc Andrew Stubbs
@ 2022-07-07 10:34 ` Andrew Stubbs
  2022-07-07 10:34 ` [PATCH 05/17] openmp, nvptx: ompx_unified_shared_mem_alloc Andrew Stubbs
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Andrew Stubbs @ 2022-07-07 10:34 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1135 bytes --]


The NVPTX low-latency memory is not accessible outside the team that
allocates it, and therefore should be unavailable for allocators with the
access trait "all".  This change means that the omp_low_lat_mem_alloc
predefined allocator now implies the "pteam" trait.

libgomp/ChangeLog:

	* allocator.c (MEMSPACE_VALIDATE): New macro.
	(omp_aligned_alloc): Use MEMSPACE_VALIDATE.
	(omp_aligned_calloc): Likewise.
	(omp_realloc): Likewise.
	* config/nvptx/allocator.c (nvptx_memspace_validate): New function.
	(MEMSPACE_VALIDATE): New macro.
	* testsuite/libgomp.c/allocators-4.c (main): Add access trait.
	* testsuite/libgomp.c/allocators-6.c (main): Add access trait.
	* testsuite/libgomp.c/allocators-7.c: New test.
---
 libgomp/allocator.c                        | 15 +++++
 libgomp/config/nvptx/allocator.c           | 11 ++++
 libgomp/testsuite/libgomp.c/allocators-4.c |  7 ++-
 libgomp/testsuite/libgomp.c/allocators-6.c |  7 ++-
 libgomp/testsuite/libgomp.c/allocators-7.c | 68 ++++++++++++++++++++++
 5 files changed, 102 insertions(+), 6 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/allocators-7.c


[-- Attachment #2: 0004-openmp-nvptx-low-lat-memory-access-traits.patch --]
[-- Type: text/x-patch; name="0004-openmp-nvptx-low-lat-memory-access-traits.patch", Size: 6096 bytes --]

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 029d0d40a36..48ab0782e6b 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -54,6 +54,9 @@
 #define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \
   (PIN ? NULL : free (ADDR))
 #endif
+#ifndef MEMSPACE_VALIDATE
+#define MEMSPACE_VALIDATE(MEMSPACE, ACCESS) 1
+#endif
 
 /* Map the predefined allocators to the correct memory space.
    The index to this table is the omp_allocator_handle_t enum value.  */
@@ -438,6 +441,10 @@ retry:
   if (__builtin_add_overflow (size, new_size, &new_size))
     goto fail;
 
+  if (allocator_data
+      && !MEMSPACE_VALIDATE (allocator_data->memspace, allocator_data->access))
+    goto fail;
+
   if (__builtin_expect (allocator_data
 			&& allocator_data->pool_size < ~(uintptr_t) 0, 0))
     {
@@ -733,6 +740,10 @@ retry:
   if (__builtin_add_overflow (size_temp, new_size, &new_size))
     goto fail;
 
+  if (allocator_data
+      && !MEMSPACE_VALIDATE (allocator_data->memspace, allocator_data->access))
+    goto fail;
+
   if (__builtin_expect (allocator_data
 			&& allocator_data->pool_size < ~(uintptr_t) 0, 0))
     {
@@ -964,6 +975,10 @@ retry:
     goto fail;
   old_size = data->size;
 
+  if (allocator_data
+      && !MEMSPACE_VALIDATE (allocator_data->memspace, allocator_data->access))
+    goto fail;
+
   if (__builtin_expect (allocator_data
 			&& allocator_data->pool_size < ~(uintptr_t) 0, 0))
     {
diff --git a/libgomp/config/nvptx/allocator.c b/libgomp/config/nvptx/allocator.c
index f740b97f6ac..0102680b717 100644
--- a/libgomp/config/nvptx/allocator.c
+++ b/libgomp/config/nvptx/allocator.c
@@ -358,6 +358,15 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
     return realloc (addr, size);
 }
 
+static inline int
+nvptx_memspace_validate (omp_memspace_handle_t memspace, unsigned access)
+{
+  /* Disallow use of low-latency memory when it must be accessible by
+     all threads.  */
+  return (memspace != omp_low_lat_mem_space
+	  || access != omp_atv_all);
+}
+
 #define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \
   nvptx_memspace_alloc (MEMSPACE, SIZE)
 #define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \
@@ -366,5 +375,7 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
   nvptx_memspace_realloc (MEMSPACE, ADDR, OLDSIZE, SIZE)
 #define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \
   nvptx_memspace_free (MEMSPACE, ADDR, SIZE)
+#define MEMSPACE_VALIDATE(MEMSPACE, ACCESS) \
+  nvptx_memspace_validate (MEMSPACE, ACCESS)
 
 #include "../../allocator.c"
diff --git a/libgomp/testsuite/libgomp.c/allocators-4.c b/libgomp/testsuite/libgomp.c/allocators-4.c
index 9fa6aa1624f..cae27ea33c1 100644
--- a/libgomp/testsuite/libgomp.c/allocators-4.c
+++ b/libgomp/testsuite/libgomp.c/allocators-4.c
@@ -23,10 +23,11 @@ main ()
   #pragma omp target
   {
     /* Ensure that the memory we get *is* low-latency with a null-fallback.  */
-    omp_alloctrait_t traits[1]
-      = { { omp_atk_fallback, omp_atv_null_fb } };
+    omp_alloctrait_t traits[2]
+      = { { omp_atk_fallback, omp_atv_null_fb },
+          { omp_atk_access, omp_atv_pteam } };
     omp_allocator_handle_t lowlat = omp_init_allocator (omp_low_lat_mem_space,
-							1, traits);
+							2, traits);
 
     int size = 4;
 
diff --git a/libgomp/testsuite/libgomp.c/allocators-6.c b/libgomp/testsuite/libgomp.c/allocators-6.c
index 90bf73095ef..c03233df582 100644
--- a/libgomp/testsuite/libgomp.c/allocators-6.c
+++ b/libgomp/testsuite/libgomp.c/allocators-6.c
@@ -23,10 +23,11 @@ main ()
   #pragma omp target
   {
     /* Ensure that the memory we get *is* low-latency with a null-fallback.  */
-    omp_alloctrait_t traits[1]
-      = { { omp_atk_fallback, omp_atv_null_fb } };
+    omp_alloctrait_t traits[2]
+      = { { omp_atk_fallback, omp_atv_null_fb },
+          { omp_atk_access, omp_atv_pteam } };
     omp_allocator_handle_t lowlat = omp_init_allocator (omp_low_lat_mem_space,
-							1, traits);
+							2, traits);
 
     int size = 16;
 
diff --git a/libgomp/testsuite/libgomp.c/allocators-7.c b/libgomp/testsuite/libgomp.c/allocators-7.c
new file mode 100644
index 00000000000..a0a738b1d1d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/allocators-7.c
@@ -0,0 +1,68 @@
+/* { dg-do run } */
+
+/* { dg-require-effective-target offload_device } */
+/* { dg-xfail-if "not implemented" { ! offload_target_nvptx } } */
+
+/* Test that GPU low-latency allocation is limited to team access.  */
+
+#include <stddef.h>
+#include <omp.h>
+
+#pragma omp requires dynamic_allocators
+
+int
+main ()
+{
+  #pragma omp target
+  {
+    /* Ensure that the memory we get *is* low-latency with a null-fallback.  */
+    omp_alloctrait_t traits[2]
+      = { { omp_atk_fallback, omp_atv_null_fb },
+	  { omp_atk_access, omp_atv_pteam } };
+    omp_allocator_handle_t lowlat = omp_init_allocator (omp_low_lat_mem_space,
+							2, traits);
+
+    omp_alloctrait_t traits_all[2]
+      = { { omp_atk_fallback, omp_atv_null_fb },
+	  { omp_atk_access, omp_atv_all } };
+    omp_allocator_handle_t lowlat_all
+      = omp_init_allocator (omp_low_lat_mem_space, 2, traits_all);
+
+    omp_alloctrait_t traits_default[1]
+      = { { omp_atk_fallback, omp_atv_null_fb } };
+    omp_allocator_handle_t lowlat_default
+      = omp_init_allocator (omp_low_lat_mem_space, 1, traits_default);
+
+    void *a = omp_alloc(1, lowlat);	    // good
+    void *b = omp_alloc(1, lowlat_all);     // bad
+    void *c = omp_alloc(1, lowlat_default); // bad
+
+    if (!a || b || c)
+      __builtin_abort ();
+
+    omp_free (a, lowlat);
+
+
+    a = omp_calloc(1, 1, lowlat);	  // good
+    b = omp_calloc(1, 1, lowlat_all);     // bad
+    c = omp_calloc(1, 1, lowlat_default); // bad
+
+    if (!a || b || c)
+      __builtin_abort ();
+
+    omp_free (a, lowlat);
+
+
+    a = omp_realloc(NULL, 1, lowlat, lowlat);		      // good
+    b = omp_realloc(NULL, 1, lowlat_all, lowlat_all);	      // bad
+    c = omp_realloc(NULL, 1, lowlat_default, lowlat_default); // bad
+
+    if (!a || b || c)
+      __builtin_abort ();
+
+    omp_free (a, lowlat);
+  }
+
+  return 0;
+}
+


* [PATCH 05/17] openmp, nvptx: ompx_unified_shared_mem_alloc
  2022-07-07 10:34 [PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators Andrew Stubbs
                   ` (3 preceding siblings ...)
  2022-07-07 10:34 ` [PATCH 04/17] openmp, nvptx: low-lat memory access traits Andrew Stubbs
@ 2022-07-07 10:34 ` Andrew Stubbs
  2022-07-07 10:34 ` [PATCH 06/17] openmp: Add -foffload-memory Andrew Stubbs
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Andrew Stubbs @ 2022-07-07 10:34 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4159 bytes --]


This adds support for using CUDA Managed Memory with omp_alloc.  It will be
used as the underpinning for "requires unified_shared_memory" in a later
patch.

There are two new predefined allocators, ompx_unified_shared_mem_alloc and
ompx_host_mem_alloc, plus corresponding memory spaces.  They can be used,
respectively, to allocate memory in the CUDA "managed" space and explicitly
on the host (the intention is that the compiler will intercept calls to
"malloc").

The nvptx plugin is modified to make the necessary CUDA calls
(cuMemAllocManaged and cuPointerGetAttribute), and libgomp is modified so
that mappings of USM-allocated memory use shared-memory mode, i.e. no data
is copied.

include/ChangeLog:

	* cuda/cuda.h (CUdevice_attribute): Add definitions for
	CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR and
	CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR.
	(CUmemAttach_flags): New.
	(CUpointer_attribute): New.
	(cuMemAllocManaged): New prototype.
	(cuPointerGetAttribute): New prototype.

libgomp/ChangeLog:

	* allocator.c (omp_max_predefined_alloc): Update.
	(omp_aligned_alloc): Don't fall back for ompx_host_mem_alloc.
	(omp_aligned_calloc): Likewise.
	(omp_realloc): Likewise.
	* config/linux/allocator.c (linux_memspace_alloc): Handle USM.
	(linux_memspace_calloc): Handle USM.
	(linux_memspace_free): Handle USM.
	(linux_memspace_realloc): Handle USM.
	* config/nvptx/allocator.c (nvptx_memspace_alloc): Reject
	ompx_host_mem_alloc.
	(nvptx_memspace_calloc): Likewise.
	(nvptx_memspace_realloc): Likewise.
	* libgomp-plugin.h (GOMP_OFFLOAD_usm_alloc): New prototype.
	(GOMP_OFFLOAD_usm_free): New prototype.
	(GOMP_OFFLOAD_is_usm_ptr): New prototype.
	* libgomp.h (gomp_usm_alloc): New prototype.
	(gomp_usm_free): New prototype.
	(gomp_is_usm_ptr): New prototype.
	(struct gomp_device_descr): Add USM functions.
	* omp.h.in (omp_memspace_handle_t): Add ompx_unified_shared_mem_space
	and ompx_host_mem_space.
	(omp_allocator_handle_t): Add ompx_unified_shared_mem_alloc and
	ompx_host_mem_alloc.
	* omp_lib.f90.in: Likewise.
	* plugin/cuda-lib.def (cuMemAllocManaged): Add new call.
	(cuPointerGetAttribute): Likewise.
	* plugin/plugin-nvptx.c (nvptx_alloc): Add "usm" parameter.
	Call cuMemAllocManaged as appropriate.
	(GOMP_OFFLOAD_get_num_devices): Allow GOMP_REQUIRES_UNIFIED_ADDRESS
	and GOMP_REQUIRES_UNIFIED_SHARED_MEMORY.
	(GOMP_OFFLOAD_alloc): Move internals to ...
	(GOMP_OFFLOAD_alloc_1): ... this, and add usm parameter.
	(GOMP_OFFLOAD_usm_alloc): New function.
	(GOMP_OFFLOAD_usm_free): New function.
	(GOMP_OFFLOAD_is_usm_ptr): New function.
	* target.c (gomp_map_vars_internal): Add USM support.
	(gomp_usm_alloc): New function.
	(gomp_usm_free): New function.
	(gomp_load_plugin_for_device): Load usm_alloc, usm_free and is_usm_ptr.
	* testsuite/libgomp.c/usm-1.c: New test.
	* testsuite/libgomp.c/usm-2.c: New test.
	* testsuite/libgomp.c/usm-3.c: New test.
	* testsuite/libgomp.c/usm-4.c: New test.
	* testsuite/libgomp.c/usm-5.c: New test.

co-authored-by: Kwok Cheung Yeung  <kcy@codesourcery.com>

---
 include/cuda/cuda.h                 | 12 ++++++
 libgomp/allocator.c                 | 13 ++++--
 libgomp/config/linux/allocator.c    | 48 ++++++++++++++--------
 libgomp/config/nvptx/allocator.c    |  6 +++
 libgomp/libgomp-plugin.h            |  3 ++
 libgomp/libgomp.h                   |  6 +++
 libgomp/omp.h.in                    |  4 ++
 libgomp/omp_lib.f90.in              |  8 ++++
 libgomp/plugin/cuda-lib.def         |  2 +
 libgomp/plugin/plugin-nvptx.c       | 47 ++++++++++++++++++---
 libgomp/target.c                    | 64 +++++++++++++++++++++++++++++
 libgomp/testsuite/libgomp.c/usm-1.c | 24 +++++++++++
 libgomp/testsuite/libgomp.c/usm-2.c | 32 +++++++++++++++
 libgomp/testsuite/libgomp.c/usm-3.c | 35 ++++++++++++++++
 libgomp/testsuite/libgomp.c/usm-4.c | 36 ++++++++++++++++
 libgomp/testsuite/libgomp.c/usm-5.c | 28 +++++++++++++
 16 files changed, 340 insertions(+), 28 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/usm-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-4.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-5.c


[-- Attachment #2: 0005-openmp-nvptx-ompx_unified_shared_mem_alloc.patch --]
[-- Type: text/x-patch; name="0005-openmp-nvptx-ompx_unified_shared_mem_alloc.patch", Size: 21130 bytes --]

diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h
index 3938d05d150..8135e7c9247 100644
--- a/include/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -77,9 +77,19 @@ typedef enum {
   CU_DEVICE_ATTRIBUTE_CONCURRENT_KERNELS = 31,
   CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR = 39,
   CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT = 40,
+  CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR = 75,
+  CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR = 76,
   CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82
 } CUdevice_attribute;
 
+typedef enum {
+  CU_MEM_ATTACH_GLOBAL = 0x1
+} CUmemAttach_flags;
+
+typedef enum {
+  CU_POINTER_ATTRIBUTE_IS_MANAGED = 8
+} CUpointer_attribute;
+
 enum {
   CU_EVENT_DEFAULT = 0,
   CU_EVENT_DISABLE_TIMING = 2
@@ -169,6 +179,7 @@ CUresult cuMemGetInfo (size_t *, size_t *);
 CUresult cuMemAlloc (CUdeviceptr *, size_t);
 #define cuMemAllocHost cuMemAllocHost_v2
 CUresult cuMemAllocHost (void **, size_t);
+CUresult cuMemAllocManaged(CUdeviceptr *, size_t, unsigned int);
 CUresult cuMemcpy (CUdeviceptr, CUdeviceptr, size_t);
 #define cuMemcpyDtoDAsync cuMemcpyDtoDAsync_v2
 CUresult cuMemcpyDtoDAsync (CUdeviceptr, CUdeviceptr, size_t, CUstream);
@@ -195,6 +206,7 @@ CUresult cuModuleLoadData (CUmodule *, const void *);
 CUresult cuModuleUnload (CUmodule);
 CUresult cuOccupancyMaxPotentialBlockSize(int *, int *, CUfunction,
 					  CUoccupancyB2DSize, size_t, int);
+CUresult cuPointerGetAttribute(void *, CUpointer_attribute, CUdeviceptr);
 typedef void (*CUstreamCallback)(CUstream, CUresult, void *);
 CUresult cuStreamAddCallback(CUstream, CUstreamCallback, void *, unsigned int);
 CUresult cuStreamCreate (CUstream *, unsigned);
diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 48ab0782e6b..ec31f8841a3 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -35,7 +35,7 @@
 #include <dlfcn.h>
 #endif
 
-#define omp_max_predefined_alloc ompx_pinned_mem_alloc
+#define omp_max_predefined_alloc ompx_host_mem_alloc
 
 /* These macros may be overridden in config/<target>/allocator.c.  */
 #ifndef MEMSPACE_ALLOC
@@ -71,6 +71,8 @@ static const omp_memspace_handle_t predefined_alloc_mapping[] = {
   omp_low_lat_mem_space,   /* omp_pteam_mem_alloc. */
   omp_low_lat_mem_space,   /* omp_thread_mem_alloc. */
   omp_default_mem_space,   /* ompx_pinned_mem_alloc. */
+  ompx_unified_shared_mem_space,  /* ompx_unified_shared_mem_alloc. */
+  ompx_host_mem_space,     /* ompx_host_mem_alloc.  */
 };
 
 enum gomp_memkind_kind
@@ -546,7 +548,8 @@ fail:
   int fallback = (allocator_data
 		  ? allocator_data->fallback
 		  : (allocator == omp_default_mem_alloc
-		     || allocator == ompx_pinned_mem_alloc)
+		     || allocator == ompx_pinned_mem_alloc
+		     || allocator == ompx_host_mem_alloc)
 		  ? omp_atv_null_fb
 		  : omp_atv_default_mem_fb);
   switch (fallback)
@@ -845,7 +848,8 @@ fail:
   int fallback = (allocator_data
 		  ? allocator_data->fallback
 		  : (allocator == omp_default_mem_alloc
-		     || allocator == ompx_pinned_mem_alloc)
+		     || allocator == ompx_pinned_mem_alloc
+		     || allocator == ompx_host_mem_alloc)
 		  ? omp_atv_null_fb
 		  : omp_atv_default_mem_fb);
   switch (fallback)
@@ -1195,7 +1199,8 @@ fail:
   int fallback = (allocator_data
 		  ? allocator_data->fallback
 		  : (allocator == omp_default_mem_alloc
-		     || allocator == ompx_pinned_mem_alloc)
+		     || allocator == ompx_pinned_mem_alloc
+		     || allocator == ompx_host_mem_alloc)
 		  ? omp_atv_null_fb
 		  : omp_atv_default_mem_fb);
   switch (fallback)
diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c
index 1496e41875c..18235f59775 100644
--- a/libgomp/config/linux/allocator.c
+++ b/libgomp/config/linux/allocator.c
@@ -53,9 +53,11 @@
 static void *
 linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin)
 {
-  (void)memspace;
-
-  if (pin)
+  if (memspace == ompx_unified_shared_mem_space)
+    {
+      return gomp_usm_alloc (size);
+    }
+  else if (pin)
     {
       void *addr = mmap (NULL, size, PROT_READ | PROT_WRITE,
 			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
@@ -78,7 +80,14 @@ linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin)
 static void *
 linux_memspace_calloc (omp_memspace_handle_t memspace, size_t size, int pin)
 {
-  if (pin)
+  if (memspace == ompx_unified_shared_mem_space)
+    {
+      void *ret = gomp_usm_alloc (size);
+      if (ret)
+	memset (ret, 0, size);
+      return ret;
+    }
+  else if (pin)
     return linux_memspace_alloc (memspace, size, pin);
   else
     return calloc (1, size);
@@ -88,9 +97,9 @@ static void
 linux_memspace_free (omp_memspace_handle_t memspace, void *addr, size_t size,
 		     int pin)
 {
-  (void)memspace;
-
-  if (pin)
+  if (memspace == ompx_unified_shared_mem_space)
+    gomp_usm_free (addr);
+  else if (pin)
     munmap (addr, size);
   else
     free (addr);
@@ -100,7 +109,9 @@ static void *
 linux_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
 			size_t oldsize, size_t size, int oldpin, int pin)
 {
-  if (oldpin && pin)
+  if (memspace == ompx_unified_shared_mem_space)
+    goto manual_realloc;
+  else if (oldpin && pin)
     {
       void *newaddr = mremap (addr, oldsize, size, MREMAP_MAYMOVE);
       if (newaddr == MAP_FAILED)
@@ -109,18 +120,19 @@ linux_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
       return newaddr;
     }
   else if (oldpin || pin)
-    {
-      void *newaddr = linux_memspace_alloc (memspace, size, pin);
-      if (newaddr)
-	{
-	  memcpy (newaddr, addr, oldsize < size ? oldsize : size);
-	  linux_memspace_free (memspace, addr, oldsize, oldpin);
-	}
-
-      return newaddr;
-    }
+    goto manual_realloc;
   else
     return realloc (addr, size);
+
+manual_realloc:
+  void *newaddr = linux_memspace_alloc (memspace, size, pin);
+  if (newaddr)
+    {
+      memcpy (newaddr, addr, oldsize < size ? oldsize : size);
+      linux_memspace_free (memspace, addr, oldsize, oldpin);
+    }
+
+  return newaddr;
 }
 
 #define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \
diff --git a/libgomp/config/nvptx/allocator.c b/libgomp/config/nvptx/allocator.c
index 0102680b717..c1a73511623 100644
--- a/libgomp/config/nvptx/allocator.c
+++ b/libgomp/config/nvptx/allocator.c
@@ -125,6 +125,8 @@ nvptx_memspace_alloc (omp_memspace_handle_t memspace, size_t size)
       __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE);
       return result;
     }
+  else if (memspace == ompx_host_mem_space)
+    return NULL;
   else
     return malloc (size);
 }
@@ -145,6 +147,8 @@ nvptx_memspace_calloc (omp_memspace_handle_t memspace, size_t size)
 
       return result;
     }
+  else if (memspace == ompx_host_mem_space)
+    return NULL;
   else
     return calloc (1, size);
 }
@@ -354,6 +358,8 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
 	}
       return result;
     }
+  else if (memspace == ompx_host_mem_space)
+    return NULL;
   else
     return realloc (addr, size);
 }
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index ab3ed638475..3e609bd3894 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -134,6 +134,9 @@ extern int GOMP_OFFLOAD_load_image (int, unsigned, const void *,
 extern bool GOMP_OFFLOAD_unload_image (int, unsigned, const void *);
 extern void *GOMP_OFFLOAD_alloc (int, size_t);
 extern bool GOMP_OFFLOAD_free (int, void *);
+extern void *GOMP_OFFLOAD_usm_alloc (int, size_t);
+extern bool GOMP_OFFLOAD_usm_free (int, void *);
+extern bool GOMP_OFFLOAD_is_usm_ptr (void *);
 extern bool GOMP_OFFLOAD_dev2host (int, void *, const void *, size_t);
 extern bool GOMP_OFFLOAD_host2dev (int, void *, const void *, size_t);
 extern bool GOMP_OFFLOAD_dev2dev (int, void *, const void *, size_t);
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index c243c4d6cf4..3fdce301372 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1014,6 +1014,9 @@ extern int gomp_pause_host (void);
 extern void gomp_init_targets_once (void);
 extern int gomp_get_num_devices (void);
 extern bool gomp_target_task_fn (void *);
+extern void * gomp_usm_alloc (size_t size);
+extern void gomp_usm_free (void *device_ptr);
+extern bool gomp_is_usm_ptr (void *ptr);
 
 /* Splay tree definitions.  */
 typedef struct splay_tree_node_s *splay_tree_node;
@@ -1239,6 +1242,9 @@ struct gomp_device_descr
   __typeof (GOMP_OFFLOAD_unload_image) *unload_image_func;
   __typeof (GOMP_OFFLOAD_alloc) *alloc_func;
   __typeof (GOMP_OFFLOAD_free) *free_func;
+  __typeof (GOMP_OFFLOAD_usm_alloc) *usm_alloc_func;
+  __typeof (GOMP_OFFLOAD_usm_free) *usm_free_func;
+  __typeof (GOMP_OFFLOAD_is_usm_ptr) *is_usm_ptr_func;
   __typeof (GOMP_OFFLOAD_dev2host) *dev2host_func;
   __typeof (GOMP_OFFLOAD_host2dev) *host2dev_func;
   __typeof (GOMP_OFFLOAD_dev2dev) *dev2dev_func;
diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in
index eb071aa2e00..eea019ad88d 100644
--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -120,6 +120,8 @@ typedef enum omp_memspace_handle_t __GOMP_UINTPTR_T_ENUM
   omp_const_mem_space = 2,
   omp_high_bw_mem_space = 3,
   omp_low_lat_mem_space = 4,
+  ompx_unified_shared_mem_space = 5,
+  ompx_host_mem_space = 6,
   __omp_memspace_handle_t_max__ = __UINTPTR_MAX__
 } omp_memspace_handle_t;
 
@@ -135,6 +137,8 @@ typedef enum omp_allocator_handle_t __GOMP_UINTPTR_T_ENUM
   omp_pteam_mem_alloc = 7,
   omp_thread_mem_alloc = 8,
   ompx_pinned_mem_alloc = 9,
+  ompx_unified_shared_mem_alloc = 10,
+  ompx_host_mem_alloc = 11,
   __omp_allocator_handle_t_max__ = __UINTPTR_MAX__
 } omp_allocator_handle_t;
 
diff --git a/libgomp/omp_lib.f90.in b/libgomp/omp_lib.f90.in
index 10610d64cfe..39a58b4bc4d 100644
--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
@@ -160,6 +160,10 @@
                  parameter :: omp_thread_mem_alloc = 8
         integer (kind=omp_allocator_handle_kind), &
                  parameter :: ompx_pinned_mem_alloc = 9
+        integer (kind=omp_allocator_handle_kind), &
+                 parameter :: ompx_unified_shared_mem_alloc = 10
+        integer (kind=omp_allocator_handle_kind), &
+                 parameter :: ompx_host_mem_alloc = 11
         integer (omp_memspace_handle_kind), &
                  parameter :: omp_default_mem_space = 0
         integer (omp_memspace_handle_kind), &
@@ -170,6 +174,10 @@
                  parameter :: omp_high_bw_mem_space = 3
         integer (omp_memspace_handle_kind), &
                  parameter :: omp_low_lat_mem_space = 4
+        integer (omp_memspace_handle_kind), &
+                 parameter :: ompx_unified_shared_mem_space = 5
+        integer (omp_memspace_handle_kind), &
+                 parameter :: ompx_host_mem_space = 6
         integer, parameter :: omp_initial_device = -1
         integer, parameter :: omp_invalid_device = -4
 
diff --git a/libgomp/plugin/cuda-lib.def b/libgomp/plugin/cuda-lib.def
index cd91b39b1d2..b6d03290f35 100644
--- a/libgomp/plugin/cuda-lib.def
+++ b/libgomp/plugin/cuda-lib.def
@@ -29,6 +29,7 @@ CUDA_ONE_CALL_MAYBE_NULL (cuLinkCreate_v2)
 CUDA_ONE_CALL (cuLinkDestroy)
 CUDA_ONE_CALL (cuMemAlloc)
 CUDA_ONE_CALL (cuMemAllocHost)
+CUDA_ONE_CALL (cuMemAllocManaged)
 CUDA_ONE_CALL (cuMemcpy)
 CUDA_ONE_CALL (cuMemcpyDtoDAsync)
 CUDA_ONE_CALL (cuMemcpyDtoH)
@@ -46,6 +47,7 @@ CUDA_ONE_CALL (cuModuleLoad)
 CUDA_ONE_CALL (cuModuleLoadData)
 CUDA_ONE_CALL (cuModuleUnload)
 CUDA_ONE_CALL_MAYBE_NULL (cuOccupancyMaxPotentialBlockSize)
+CUDA_ONE_CALL (cuPointerGetAttribute)
 CUDA_ONE_CALL (cuStreamAddCallback)
 CUDA_ONE_CALL (cuStreamCreate)
 CUDA_ONE_CALL (cuStreamDestroy)
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 40739ba592d..2800c0dce6d 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1046,11 +1046,13 @@ nvptx_stacks_free (struct ptx_device *ptx_dev, bool force)
 }
 
 static void *
-nvptx_alloc (size_t s, bool suppress_errors)
+nvptx_alloc (size_t s, bool suppress_errors, bool usm)
 {
   CUdeviceptr d;
 
-  CUresult r = CUDA_CALL_NOCHECK (cuMemAlloc, &d, s);
+  CUresult r = (usm ? CUDA_CALL_NOCHECK (cuMemAllocManaged, &d, s,
+					 CU_MEM_ATTACH_GLOBAL)
+		: CUDA_CALL_NOCHECK (cuMemAlloc, &d, s));
   if (suppress_errors && r == CUDA_ERROR_OUT_OF_MEMORY)
     return NULL;
   else if (r != CUDA_SUCCESS)
@@ -1185,6 +1187,8 @@ GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
   int num_devices = nvptx_get_num_devices ();
   /* Return -1 if no omp_requires_mask cannot be fulfilled but
      devices were present.  */
+  omp_requires_mask &= ~(GOMP_REQUIRES_UNIFIED_ADDRESS
+			 | GOMP_REQUIRES_UNIFIED_SHARED_MEMORY);
   if (num_devices > 0 && omp_requires_mask != 0)
     return -1;
   return num_devices;
@@ -1432,8 +1436,8 @@ GOMP_OFFLOAD_unload_image (int ord, unsigned version, const void *target_data)
   return ret;
 }
 
-void *
-GOMP_OFFLOAD_alloc (int ord, size_t size)
+static void *
+GOMP_OFFLOAD_alloc_1 (int ord, size_t size, bool usm)
 {
   if (!nvptx_attach_host_thread_to_device (ord))
     return NULL;
@@ -1456,7 +1460,7 @@ GOMP_OFFLOAD_alloc (int ord, size_t size)
       blocks = tmp;
     }
 
-  void *d = nvptx_alloc (size, true);
+  void *d = nvptx_alloc (size, true, usm);
   if (d)
     return d;
   else
@@ -1464,10 +1468,22 @@ GOMP_OFFLOAD_alloc (int ord, size_t size)
       /* Memory allocation failed.  Try freeing the stacks block, and
 	 retrying.  */
       nvptx_stacks_free (ptx_dev, true);
-      return nvptx_alloc (size, false);
+      return nvptx_alloc (size, false, usm);
     }
 }
 
+void *
+GOMP_OFFLOAD_alloc (int ord, size_t size)
+{
+  return GOMP_OFFLOAD_alloc_1 (ord, size, false);
+}
+
+void *
+GOMP_OFFLOAD_usm_alloc (int ord, size_t size)
+{
+  return GOMP_OFFLOAD_alloc_1 (ord, size, true);
+}
+
 bool
 GOMP_OFFLOAD_free (int ord, void *ptr)
 {
@@ -1475,6 +1491,25 @@ GOMP_OFFLOAD_free (int ord, void *ptr)
 	  && nvptx_free (ptr, ptx_devices[ord]));
 }
 
+bool
+GOMP_OFFLOAD_usm_free (int ord, void *ptr)
+{
+  return GOMP_OFFLOAD_free (ord, ptr);
+}
+
+bool
+GOMP_OFFLOAD_is_usm_ptr (void *ptr)
+{
+  unsigned int managed = 0;
+  /* This returns 3 outcomes ...
+     CUDA_ERROR_INVALID_VALUE    - Not a Cuda allocated pointer.
+     CUDA_SUCCESS, managed:false - Cuda allocated, but not USM.
+     CUDA_SUCCESS, managed:true  - USM.  */
+  CUDA_CALL_NOCHECK (cuPointerGetAttribute, &managed,
+		     CU_POINTER_ATTRIBUTE_IS_MANAGED, (CUdeviceptr)ptr);
+  return managed;
+}
+
 void
 GOMP_OFFLOAD_openacc_exec (void (*fn) (void *), size_t mapnum,
 			   void **hostaddrs, void **devaddrs,
diff --git a/libgomp/target.c b/libgomp/target.c
index 4dac81862d7..4e203ae3c06 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1049,6 +1049,15 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
 	    tgt->list[i].offset = 0;
 	  continue;
 	}
+      else if (devicep->is_usm_ptr_func
+	       && devicep->is_usm_ptr_func (hostaddrs[i]))
+	{
+	  /* The memory is visible from both host and target
+	     so nothing needs to be moved.  */
+	  tgt->list[i].key = NULL;
+	  tgt->list[i].offset = OFFSET_INLINED;
+	  continue;
+	}
       else if ((kind & typemask) == GOMP_MAP_STRUCT)
 	{
 	  size_t first = i + 1;
@@ -1524,6 +1533,8 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		  continue;
 		}
 	      default:
+		if (tgt->list[i].offset == OFFSET_INLINED)
+		  continue;
 		break;
 	      }
 	    splay_tree_key k = &array->key;
@@ -3401,6 +3412,56 @@ omp_target_free (void *device_ptr, int device_num)
   gomp_mutex_unlock (&devicep->lock);
 }
 
+void *
+gomp_usm_alloc (size_t size)
+{
+  struct gomp_task_icv *icv = gomp_icv (false);
+  struct gomp_device_descr *devicep = resolve_device (icv->default_device_var,
+						      false);
+  if (devicep == NULL)
+    return NULL;
+
+  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+      || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
+    return malloc (size);
+
+  void *ret = NULL;
+  gomp_mutex_lock (&devicep->lock);
+  if (devicep->usm_alloc_func)
+    ret = devicep->usm_alloc_func (devicep->target_id, size);
+  gomp_mutex_unlock (&devicep->lock);
+  return ret;
+}
+
+void
+gomp_usm_free (void *device_ptr)
+{
+  if (device_ptr == NULL)
+    return;
+
+  struct gomp_task_icv *icv = gomp_icv (false);
+  struct gomp_device_descr *devicep = resolve_device (icv->default_device_var,
+						      false);
+  if (devicep == NULL)
+    return;
+
+  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+      || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
+    {
+      free (device_ptr);
+      return;
+    }
+
+  gomp_mutex_lock (&devicep->lock);
+  if (devicep->usm_free_func
+      && !devicep->usm_free_func (devicep->target_id, device_ptr))
+    {
+      gomp_mutex_unlock (&devicep->lock);
+      gomp_fatal ("error in freeing device memory block at %p", device_ptr);
+    }
+  gomp_mutex_unlock (&devicep->lock);
+}
+
 int
 omp_target_is_present (const void *ptr, int device_num)
 {
@@ -4041,6 +4102,9 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   DLSYM (unload_image);
   DLSYM (alloc);
   DLSYM (free);
+  DLSYM_OPT (usm_alloc, usm_alloc);
+  DLSYM_OPT (usm_free, usm_free);
+  DLSYM_OPT (is_usm_ptr, is_usm_ptr);
   DLSYM (dev2host);
   DLSYM (host2dev);
   device->capabilities = device->get_caps_func ();
diff --git a/libgomp/testsuite/libgomp.c/usm-1.c b/libgomp/testsuite/libgomp.c/usm-1.c
new file mode 100644
index 00000000000..1b35f19c45b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/usm-1.c
@@ -0,0 +1,24 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <stdint.h>
+
+int
+main ()
+{
+  int *a = (int *) omp_alloc(sizeof(int), ompx_unified_shared_mem_alloc);
+  if (!a)
+    __builtin_abort ();
+
+  *a = 42;
+  uintptr_t a_p = (uintptr_t)a;
+
+  #pragma omp target is_device_ptr(a)
+    {
+      if (*a != 42 || a_p != (uintptr_t)a)
+	__builtin_abort ();
+    }
+
+  omp_free(a, ompx_unified_shared_mem_alloc);
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/usm-2.c b/libgomp/testsuite/libgomp.c/usm-2.c
new file mode 100644
index 00000000000..689cee7e456
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/usm-2.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <stdint.h>
+
+int
+main ()
+{
+  int *a = (int *) omp_alloc(sizeof(int)*2, ompx_unified_shared_mem_alloc);
+  if (!a)
+    __builtin_abort ();
+
+  a[0] = 42;
+  a[1] = 43;
+
+  uintptr_t a_p = (uintptr_t)a;
+
+  #pragma omp target map(a[0])
+    {
+      if (a[0] != 42 || a_p != (uintptr_t)a)
+	__builtin_abort ();
+    }
+
+  #pragma omp target map(a[1])
+    {
+      if (a[1] != 43 || a_p != (uintptr_t)a)
+	__builtin_abort ();
+    }
+
+  omp_free(a, ompx_unified_shared_mem_alloc);
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/usm-3.c b/libgomp/testsuite/libgomp.c/usm-3.c
new file mode 100644
index 00000000000..2ca66afe93f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/usm-3.c
@@ -0,0 +1,35 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <stdint.h>
+
+int
+main ()
+{
+  int *a = (int *) omp_alloc(sizeof(int)*2, ompx_unified_shared_mem_alloc);
+  if (!a)
+    __builtin_abort ();
+
+  a[0] = 42;
+  a[1] = 43;
+
+  uintptr_t a_p = (uintptr_t)a;
+
+#pragma omp target data map(a[0:2])
+    {
+#pragma omp target
+	{
+	  if (a[0] != 42 || a_p != (uintptr_t)a)
+	    __builtin_abort ();
+	}
+
+#pragma omp target
+	{
+	  if (a[1] != 43 || a_p != (uintptr_t)a)
+	    __builtin_abort ();
+	}
+    }
+
+  omp_free(a, ompx_unified_shared_mem_alloc);
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/usm-4.c b/libgomp/testsuite/libgomp.c/usm-4.c
new file mode 100644
index 00000000000..753908c8440
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/usm-4.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <stdint.h>
+
+int
+main ()
+{
+  int *a = (int *) omp_alloc(sizeof(int)*2, ompx_unified_shared_mem_alloc);
+  if (!a)
+    __builtin_abort ();
+
+  a[0] = 42;
+  a[1] = 43;
+
+  uintptr_t a_p = (uintptr_t)a;
+
+#pragma omp target enter data map(to:a[0:2])
+
+#pragma omp target
+    {
+      if (a[0] != 42 || a_p != (uintptr_t)a)
+	__builtin_abort ();
+    }
+
+#pragma omp target
+    {
+      if (a[1] != 43 || a_p != (uintptr_t)a)
+	__builtin_abort ();
+    }
+
+#pragma omp target exit data map(delete:a[0:2])
+
+  omp_free(a, ompx_unified_shared_mem_alloc);
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/usm-5.c b/libgomp/testsuite/libgomp.c/usm-5.c
new file mode 100644
index 00000000000..4d8b3cf71b1
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/usm-5.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-require-effective-target offload_device } */
+
+#include <omp.h>
+#include <stdint.h>
+
+#pragma omp requires unified_shared_memory
+
+int
+main ()
+{
+  int *a = (int *) omp_alloc(sizeof(int), ompx_host_mem_alloc);
+  if (!a)
+    __builtin_abort ();
+
+  a[0] = 42;
+
+  uintptr_t a_p = (uintptr_t)a;
+
+#pragma omp target map(a[0:1])
+    {
+      if (a[0] != 42 || a_p == (uintptr_t)a)
+	__builtin_abort ();
+    }
+
+  omp_free(a, ompx_host_mem_alloc);
+  return 0;
+}


* [PATCH 06/17] openmp: Add -foffload-memory
  2022-07-07 10:34 [PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators Andrew Stubbs
                   ` (4 preceding siblings ...)
  2022-07-07 10:34 ` [PATCH 05/17] openmp, nvptx: ompx_unified_shared_mem_alloc Andrew Stubbs
@ 2022-07-07 10:34 ` Andrew Stubbs
  2022-07-07 10:34 ` [PATCH 07/17] openmp: allow requires unified_shared_memory Andrew Stubbs
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Andrew Stubbs @ 2022-07-07 10:34 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 409 bytes --]


Add a new option, -foffload-memory=[none|unified|pinned].  It has no
effect until the follow-up patches in this series are applied.

gcc/ChangeLog:

	* common.opt: Add -foffload-memory and its enum values.
	* coretypes.h (enum offload_memory): New.
	* doc/invoke.texi: Document -foffload-memory.
---
 gcc/common.opt      | 16 ++++++++++++++++
 gcc/coretypes.h     |  7 +++++++
 gcc/doc/invoke.texi | 16 +++++++++++++++-
 3 files changed, 38 insertions(+), 1 deletion(-)


[-- Attachment #2: 0006-openmp-Add-foffload-memory.patch --]
[-- Type: text/x-patch; name="0006-openmp-Add-foffload-memory.patch", Size: 2919 bytes --]

diff --git a/gcc/common.opt b/gcc/common.opt
index e7a51e882ba..8d76980fbbb 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2213,6 +2213,22 @@ Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32)
 EnumValue
 Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64)
 
+foffload-memory=
+Common Joined RejectNegative Enum(offload_memory) Var(flag_offload_memory) Init(OFFLOAD_MEMORY_NONE)
+-foffload-memory=[none|unified|pinned]	Use an offload memory optimization.
+
+Enum
+Name(offload_memory) Type(enum offload_memory) UnknownError(Unknown offload memory option %qs)
+
+EnumValue
+Enum(offload_memory) String(none) Value(OFFLOAD_MEMORY_NONE)
+
+EnumValue
+Enum(offload_memory) String(unified) Value(OFFLOAD_MEMORY_UNIFIED)
+
+EnumValue
+Enum(offload_memory) String(pinned) Value(OFFLOAD_MEMORY_PINNED)
+
 fomit-frame-pointer
 Common Var(flag_omit_frame_pointer) Optimization
 When possible do not generate stack frames.
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 08b9ac9094c..dd52d5bb113 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -206,6 +206,13 @@ enum offload_abi {
   OFFLOAD_ABI_ILP32
 };
 
+/* Types of memory optimization for an offload device.  */
+enum offload_memory {
+  OFFLOAD_MEMORY_NONE,
+  OFFLOAD_MEMORY_UNIFIED,
+  OFFLOAD_MEMORY_PINNED
+};
+
 /* Types of profile update methods.  */
 enum profile_update {
   PROFILE_UPDATE_SINGLE,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d5ff1018372..3df39bb06e3 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -202,7 +202,7 @@ in the following sections.
 -fno-builtin  -fno-builtin-@var{function}  -fcond-mismatch @gol
 -ffreestanding  -fgimple  -fgnu-tm  -fgnu89-inline  -fhosted @gol
 -flax-vector-conversions  -fms-extensions @gol
--foffload=@var{arg}  -foffload-options=@var{arg} @gol
+-foffload=@var{arg}  -foffload-options=@var{arg} -foffload-memory=@var{arg} @gol
 -fopenacc  -fopenacc-dim=@var{geom} @gol
 -fopenmp  -fopenmp-simd @gol
 -fpermitted-flt-eval-methods=@var{standard} @gol
@@ -2708,6 +2708,20 @@ Typical command lines are
 -foffload-options=amdgcn-amdhsa=-march=gfx906 -foffload-options=-lm
 @end smallexample
 
+@item -foffload-memory=none
+@itemx -foffload-memory=unified
+@itemx -foffload-memory=pinned
+@opindex foffload-memory
+@cindex OpenMP offloading memory modes
+Enable a memory optimization mode to use with OpenMP.  The default behavior,
+@option{-foffload-memory=none}, is to do nothing special (unless enabled via
+a requires directive in the code).  @option{-foffload-memory=unified} is
+equivalent to @code{#pragma omp requires unified_shared_memory}.
+@option{-foffload-memory=pinned} forces all host memory to be pinned (this
+mode may require the user to increase the ulimit setting for locked memory).
+All translation units must select the same setting to avoid undefined
+behavior.
+
 @item -fopenacc
 @opindex fopenacc
 @cindex OpenACC accelerator programming


* [PATCH 07/17] openmp: allow requires unified_shared_memory
  2022-07-07 10:34 [PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators Andrew Stubbs
                   ` (5 preceding siblings ...)
  2022-07-07 10:34 ` [PATCH 06/17] openmp: Add -foffload-memory Andrew Stubbs
@ 2022-07-07 10:34 ` Andrew Stubbs
  2022-07-07 10:34 ` [PATCH 08/17] openmp: -foffload-memory=pinned Andrew Stubbs
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Andrew Stubbs @ 2022-07-07 10:34 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1555 bytes --]


This is the front-end portion of the Unified Shared Memory implementation.
It removes the "sorry, unimplemented" message in C, C++, and Fortran, and sets
flag_offload_memory, but is otherwise inactive for now.

It also checks that -foffload-memory isn't set to an incompatible mode.
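
The rule that all three front ends now apply can be summarised in a small model. This is a hedged sketch. `note_requires_unified` is a hypothetical name, and the enum mirrors the one from the -foffload-memory patch. In GCC the logic is inlined in c_parser_omp_requires, cp_parser_omp_requires and gfc_match_omp_requires, and a conflict produces an error rather than a return value.

```c
#include <stdbool.h>

/* Mirrors enum offload_memory from coretypes.h.  */
enum offload_memory {
  OFFLOAD_MEMORY_NONE,
  OFFLOAD_MEMORY_UNIFIED,
  OFFLOAD_MEMORY_PINNED
};

/* Model of handling "requires unified_address" or
   "requires unified_shared_memory".  Any mode other than none or unified
   conflicts, and the flag is switched to unified either way, as in the
   patch.  Returns true if the combination is compatible.  */
bool
note_requires_unified (enum offload_memory *flag)
{
  bool ok = (*flag == OFFLOAD_MEMORY_NONE
             || *flag == OFFLOAD_MEMORY_UNIFIED);
  *flag = OFFLOAD_MEMORY_UNIFIED;
  return ok;
}
```

The effect is that -foffload-memory=pinned combined with either clause is diagnosed, which is what the new usm-1 and usm-4 tests check. Without the option, the setting is silently upgraded to unified.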

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_requires): Allow "requires
	unified_shared_memory" and "unified_address".

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_requires): Allow "requires
	unified_shared_memory" and "unified_address".

gcc/fortran/ChangeLog:

	* openmp.cc (gfc_match_omp_requires): Allow "requires
	unified_shared_memory" and "unified_address".

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/usm-1.c: New test.
	* c-c++-common/gomp/usm-4.c: New test.
	* gfortran.dg/gomp/usm-1.f90: New test.
	* gfortran.dg/gomp/usm-4.f90: New test.
---
 gcc/c/c-parser.cc                        | 22 ++++++++++++++++++++--
 gcc/cp/parser.cc                         | 22 ++++++++++++++++++++--
 gcc/fortran/openmp.cc                    | 13 +++++++++++++
 gcc/testsuite/c-c++-common/gomp/usm-1.c  |  4 ++++
 gcc/testsuite/c-c++-common/gomp/usm-4.c  |  4 ++++
 gcc/testsuite/gfortran.dg/gomp/usm-1.f90 |  6 ++++++
 gcc/testsuite/gfortran.dg/gomp/usm-4.f90 |  6 ++++++
 7 files changed, 73 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-4.c
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-4.f90


[-- Attachment #2: 0007-openmp-allow-requires-unified_shared_memory.patch --]
[-- Type: text/x-patch; name="0007-openmp-allow-requires-unified_shared_memory.patch", Size: 5988 bytes --]

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 9c02141e2c6..c30f67cd2da 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -22726,9 +22726,27 @@ c_parser_omp_requires (c_parser *parser)
 	  enum omp_requires this_req = (enum omp_requires) 0;
 
 	  if (!strcmp (p, "unified_address"))
-	    this_req = OMP_REQUIRES_UNIFIED_ADDRESS;
+	    {
+	      this_req = OMP_REQUIRES_UNIFIED_ADDRESS;
+
+	      if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+		  && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+		error_at (cloc,
+			  "%<unified_address%> is incompatible with the "
+			  "selected %<-foffload-memory%> option");
+	      flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
+	    }
 	  else if (!strcmp (p, "unified_shared_memory"))
-	    this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY;
+	    {
+	      this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY;
+
+	      if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+		  && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+		error_at (cloc,
+			  "%<unified_shared_memory%> is incompatible with the "
+			  "selected %<-foffload-memory%> option");
+	      flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
+	    }
 	  else if (!strcmp (p, "dynamic_allocators"))
 	    this_req = OMP_REQUIRES_DYNAMIC_ALLOCATORS;
 	  else if (!strcmp (p, "reverse_offload"))
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index df657a3fb2b..3deafc7c928 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -46860,9 +46860,27 @@ cp_parser_omp_requires (cp_parser *parser, cp_token *pragma_tok)
 	  enum omp_requires this_req = (enum omp_requires) 0;
 
 	  if (!strcmp (p, "unified_address"))
-	    this_req = OMP_REQUIRES_UNIFIED_ADDRESS;
+	    {
+	      this_req = OMP_REQUIRES_UNIFIED_ADDRESS;
+
+	      if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+		  && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+		error_at (cloc,
+			  "%<unified_address%> is incompatible with the "
+			  "selected %<-foffload-memory%> option");
+	      flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
+	    }
 	  else if (!strcmp (p, "unified_shared_memory"))
-	    this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY;
+	    {
+	      this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY;
+
+	      if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+		  && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+		error_at (cloc,
+			  "%<unified_shared_memory%> is incompatible with the "
+			  "selected %<-foffload-memory%> option");
+	      flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
+	    }
 	  else if (!strcmp (p, "dynamic_allocators"))
 	    this_req = OMP_REQUIRES_DYNAMIC_ALLOCATORS;
 	  else if (!strcmp (p, "reverse_offload"))
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index bd4ff259fe0..91bf8a3c50d 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic.h"
 #include "gomp-constants.h"
 #include "target-memory.h"  /* For gfc_encode_character.  */
+#include "options.h"
 
 /* Match an end of OpenMP directive.  End of OpenMP directive is optional
    whitespace, followed by '\n' or comment '!'.  */
@@ -5556,6 +5557,12 @@ gfc_match_omp_requires (void)
 	  requires_clause = OMP_REQ_UNIFIED_ADDRESS;
 	  if (requires_clauses & OMP_REQ_UNIFIED_ADDRESS)
 	    goto duplicate_clause;
+
+	  if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+	      && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+	    gfc_error_now ("unified_address at %C is incompatible with "
+			   "the selected -foffload-memory option");
+	  flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
 	}
       else if (gfc_match (clauses[2]) == MATCH_YES)
 	{
@@ -5563,6 +5570,12 @@ gfc_match_omp_requires (void)
 	  requires_clause = OMP_REQ_UNIFIED_SHARED_MEMORY;
 	  if (requires_clauses & OMP_REQ_UNIFIED_SHARED_MEMORY)
 	    goto duplicate_clause;
+
+	  if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+	      && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+	    gfc_error_now ("unified_shared_memory at %C is incompatible with "
+			   "the selected -foffload-memory option");
+	  flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
 	}
       else if (gfc_match (clauses[3]) == MATCH_YES)
 	{
diff --git a/gcc/testsuite/c-c++-common/gomp/usm-1.c b/gcc/testsuite/c-c++-common/gomp/usm-1.c
new file mode 100644
index 00000000000..8d2ba62aba3
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/usm-1.c
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-foffload-memory=pinned" } */
+
+#pragma omp requires unified_shared_memory  /* { dg-error ".unified_shared_memory. is incompatible with the selected .-foffload-memory. option" } */
diff --git a/gcc/testsuite/c-c++-common/gomp/usm-4.c b/gcc/testsuite/c-c++-common/gomp/usm-4.c
new file mode 100644
index 00000000000..84f6f785079
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/usm-4.c
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-foffload-memory=pinned" } */
+
+#pragma omp requires unified_address        /* { dg-error ".unified_address. is incompatible with the selected .-foffload-memory. option" } */
diff --git a/gcc/testsuite/gfortran.dg/gomp/usm-1.f90 b/gcc/testsuite/gfortran.dg/gomp/usm-1.f90
new file mode 100644
index 00000000000..340f6bb50a5
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/usm-1.f90
@@ -0,0 +1,6 @@
+! { dg-do compile }
+! { dg-additional-options "-foffload-memory=pinned" }
+
+!$omp requires unified_shared_memory  ! { dg-error "unified_shared_memory at .* is incompatible with the selected -foffload-memory option" }
+
+end
diff --git a/gcc/testsuite/gfortran.dg/gomp/usm-4.f90 b/gcc/testsuite/gfortran.dg/gomp/usm-4.f90
new file mode 100644
index 00000000000..725b07f2f88
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/usm-4.f90
@@ -0,0 +1,6 @@
+! { dg-do compile }
+! { dg-additional-options "-foffload-memory=pinned" }
+
+!$omp requires unified_address  ! { dg-error "unified_address at .* is incompatible with the selected -foffload-memory option" }
+
+end


* [PATCH 08/17] openmp: -foffload-memory=pinned
  2022-07-07 10:34 [PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators Andrew Stubbs
                   ` (6 preceding siblings ...)
  2022-07-07 10:34 ` [PATCH 07/17] openmp: allow requires unified_shared_memory Andrew Stubbs
@ 2022-07-07 10:34 ` Andrew Stubbs
  2022-07-07 11:54   ` Tobias Burnus
  2022-07-07 10:34 ` [PATCH 09/17] openmp: Use libgomp memory allocation functions with unified shared memory Andrew Stubbs
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 30+ messages in thread
From: Andrew Stubbs @ 2022-07-07 10:34 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1822 bytes --]


Implement the -foffload-memory=pinned option such that libgomp is
instructed to enable fully-pinned memory at start-up.  The option is
intended to provide a performance boost to certain offload programs without
modifying the code.

At present this feature only works on Linux: it simply calls mlockall to
enable always-on memory pinning.  It requires the locked-memory ulimit
(ulimit -l) to be set high enough to accommodate all of the program's memory
usage.
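
For an unprivileged process, mlockall (MCL_CURRENT | MCL_FUTURE) fails with ENOMEM once the locked amount would exceed the soft RLIMIT_MEMLOCK limit. A program can therefore check that limit up front. The following is a minimal sketch, assuming a POSIX host. `memlock_limit_allows` is a hypothetical helper and is not part of libgomp. On Linux, processes with CAP_IPC_LOCK are not bound by this limit, so a false result is only a hint.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical helper: would locking BYTES of memory fit within the
   current soft RLIMIT_MEMLOCK limit?  */
bool
memlock_limit_allows (size_t bytes)
{
  struct rlimit rl;

  if (getrlimit (RLIMIT_MEMLOCK, &rl) != 0)
    return false;               /* Cannot tell; assume the worst.  */
  if (rl.rlim_cur == RLIM_INFINITY)
    return true;                /* E.g. after "ulimit -l unlimited".  */
  return (rlim_t) bytes <= rl.rlim_cur;
}
```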

In this mode the explicit pinning done for ompx_pinned_mem_alloc is disabled,
as it is not needed and may conflict.
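
In each linux_memspace_* function, the interaction between the two pinning mechanisms reduces to a single line. The sketch below isolates that line. `always_pinned_mode` mirrors the new libgomp variable. `effective_pin` is a hypothetical helper, since the patch writes `pin = pin && !always_pinned_mode;` inline.

```c
#include <stdbool.h>

/* Mirrors libgomp's always_pinned_mode: set only if mlockall succeeded.  */
static bool always_pinned_mode = false;

/* Hypothetical helper: once mlockall has locked every page, a
   per-allocation pin request is redundant.  Report "no explicit pin" so
   the allocation takes the ordinary path.  */
int
effective_pin (int pin)
{
  return pin && !always_pinned_mode;
}
```

If mlockall fails, always_pinned_mode stays false, and ompx_pinned_mem_alloc requests still receive explicit pinning.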

gcc/ChangeLog:

	* omp-builtins.def (BUILT_IN_GOMP_ENABLE_PINNED_MODE): New.
	* omp-low.cc (omp_enable_pinned_mode): New function.
	(execute_lower_omp): Call omp_enable_pinned_mode.

libgomp/ChangeLog:

	* config/linux/allocator.c (always_pinned_mode): New variable.
	(GOMP_enable_pinned_mode): New function.
	(linux_memspace_alloc): Disable pinning when always_pinned_mode set.
	(linux_memspace_calloc): Likewise.
	(linux_memspace_free): Likewise.
	(linux_memspace_realloc): Likewise.
	* libgomp.map: Add GOMP_enable_pinned_mode.
	* testsuite/libgomp.c/alloc-pinned-7.c: New test.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/alloc-pinned-1.c: New test.
---
 gcc/omp-builtins.def                          |  3 +
 gcc/omp-low.cc                                | 66 +++++++++++++++++++
 .../c-c++-common/gomp/alloc-pinned-1.c        | 28 ++++++++
 libgomp/config/linux/allocator.c              | 26 ++++++++
 libgomp/libgomp.map                           |  1 +
 libgomp/target.c                              |  4 +-
 libgomp/testsuite/libgomp.c/alloc-pinned-7.c  | 63 ++++++++++++++++++
 7 files changed, 190 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-7.c


[-- Attachment #2: 0008-openmp-foffload-memory-pinned.patch --]
[-- Type: text/x-patch; name="0008-openmp-foffload-memory-pinned.patch", Size: 7749 bytes --]

diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index ee5213eedcf..276dd7812f2 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -470,3 +470,6 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_WARNING, "GOMP_warning",
 		  BT_FN_VOID_CONST_PTR_SIZE, ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ERROR, "GOMP_error",
 		  BT_FN_VOID_CONST_PTR_SIZE, ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ENABLE_PINNED_MODE,
+		  "GOMP_enable_pinned_mode",
+		  BT_FN_VOID, ATTR_NOTHROW_LIST)
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index d73c165f029..ba612e5c67d 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -14620,6 +14620,68 @@ lower_omp (gimple_seq *body, omp_context *ctx)
   input_location = saved_location;
 }
 
+/* Emit a constructor function to enable -foffload-memory=pinned
+   at runtime.  Libgomp handles the OS mode setting, but we need to trigger
+   it by calling GOMP_enable_pinned_mode before the program proper runs.  */
+
+static void
+omp_enable_pinned_mode ()
+{
+  static bool visited = false;
+  if (visited)
+    return;
+  visited = true;
+
+  /* Create a new function like this:
+     
+       static void __attribute__((constructor))
+       __set_pinned_mode ()
+       {
+         GOMP_enable_pinned_mode ();
+       }
+  */
+
+  tree name = get_identifier ("__set_pinned_mode");
+  tree voidfntype = build_function_type_list (void_type_node, NULL_TREE);
+  tree decl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL, name, voidfntype);
+
+  TREE_STATIC (decl) = 1;
+  TREE_USED (decl) = 1;
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  TREE_PUBLIC (decl) = 0;
+  DECL_UNINLINABLE (decl) = 1;
+  DECL_EXTERNAL (decl) = 0;
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  BLOCK_SUPERCONTEXT (DECL_INITIAL (decl)) = decl;
+  DECL_STATIC_CONSTRUCTOR (decl) = 1;
+  DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("constructor"),
+				      NULL_TREE, NULL_TREE);
+
+  tree t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE,
+		       void_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_CONTEXT (t) = decl;
+  DECL_RESULT (decl) = t;
+
+  push_struct_function (decl);
+  init_tree_ssa (cfun);
+
+  tree calldecl = builtin_decl_explicit (BUILT_IN_GOMP_ENABLE_PINNED_MODE);
+  gcall *call = gimple_build_call (calldecl, 0);
+
+  gimple_seq seq = NULL;
+  gimple_seq_add_stmt (&seq, call);
+  gimple_set_body (decl, gimple_build_bind (NULL_TREE, seq, NULL));
+
+  cfun->function_end_locus = UNKNOWN_LOCATION;
+  cfun->curr_properties |= PROP_gimple_any;
+  pop_cfun ();
+  cgraph_node::add_new_function (decl, true);
+}
+
 /* Main entry point.  */
 
 static unsigned int
@@ -14676,6 +14738,10 @@ execute_lower_omp (void)
   for (auto task_stmt : task_cpyfns)
     finalize_task_copyfn (task_stmt);
   task_cpyfns.release ();
+
+  if (flag_offload_memory == OFFLOAD_MEMORY_PINNED)
+    omp_enable_pinned_mode ();
+
   return 0;
 }
 
diff --git a/gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c b/gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c
new file mode 100644
index 00000000000..e0e08019bff
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-additional-options "-foffload-memory=pinned" } */
+/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */
+
+#if __cplusplus
+#define EXTERNC extern "C"
+#else
+#define EXTERNC
+#endif
+
+/* Intercept the libgomp initialization call to check it happens.  */
+
+int good = 0;
+
+EXTERNC void
+GOMP_enable_pinned_mode ()
+{
+  good = 1;
+}
+
+int
+main ()
+{
+  if (!good)
+    __builtin_exit (1);
+
+  return 0;
+}
diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c
index 18235f59775..e7fe6c3c49a 100644
--- a/libgomp/config/linux/allocator.c
+++ b/libgomp/config/linux/allocator.c
@@ -50,9 +50,26 @@
 #include <string.h>
 #include "libgomp.h"
 
+static bool always_pinned_mode = false;
+
+/* This function is called by the compiler when -foffload-memory=pinned
+   is used.  */
+
+void
+GOMP_enable_pinned_mode ()
+{
+  if (mlockall (MCL_CURRENT | MCL_FUTURE) != 0)
+    gomp_error ("failed to pin all memory (ulimit too low?)");
+  else
+    always_pinned_mode = true;
+}
+
 static void *
 linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin)
 {
+  /* Explicit pinning may not be required.  */
+  pin = pin && !always_pinned_mode;
+
   if (memspace == ompx_unified_shared_mem_space)
     {
       return gomp_usm_alloc (size);
@@ -80,6 +97,9 @@ linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin)
 static void *
 linux_memspace_calloc (omp_memspace_handle_t memspace, size_t size, int pin)
 {
+  /* Explicit pinning may not be required.  */
+  pin = pin && !always_pinned_mode;
+
   if (memspace == ompx_unified_shared_mem_space)
     {
       void *ret = gomp_usm_alloc (size);
@@ -97,6 +117,9 @@ static void
 linux_memspace_free (omp_memspace_handle_t memspace, void *addr, size_t size,
 		     int pin)
 {
+  /* Explicit pinning may not be required.  */
+  pin = pin && !always_pinned_mode;
+
   if (memspace == ompx_unified_shared_mem_space)
     gomp_usm_free (addr);
   else if (pin)
@@ -109,6 +132,9 @@ static void *
 linux_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
 			size_t oldsize, size_t size, int oldpin, int pin)
 {
+  /* Explicit pinning may not be required.  */
+  pin = pin && !always_pinned_mode;
+
   if (memspace == ompx_unified_shared_mem_space)
     goto manual_realloc;
   else if (oldpin && pin)
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 46d5f10f3e1..c86734f15e2 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -400,6 +400,7 @@ GOMP_5.0.1 {
   global:
 	GOMP_alloc;
 	GOMP_free;
+	GOMP_enable_pinned_mode;
 } GOMP_5.0;
 
 GOMP_5.1 {
diff --git a/libgomp/target.c b/libgomp/target.c
index 4e203ae3c06..3dd09b7afbd 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1,3 +1,4 @@
+#include <stdio.h>
 /* Copyright (C) 2013-2022 Free Software Foundation, Inc.
    Contributed by Jakub Jelinek <jakub@redhat.com>.
 
@@ -1533,7 +1534,8 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		  continue;
 		}
 	      default:
-		if (tgt->list[i].offset == OFFSET_INLINED)
+		if (tgt->list[i].offset == OFFSET_INLINED
+		    && !array)
 		  continue;
 		break;
 	      }
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-7.c b/libgomp/testsuite/libgomp.c/alloc-pinned-7.c
new file mode 100644
index 00000000000..8dc19055038
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-7.c
@@ -0,0 +1,63 @@
+/* { dg-do run } */
+/* { dg-additional-options "-foffload-memory=pinned" } */
+
+/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */
+
+/* Test that pinned memory works.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#ifdef __linux__
+#include <sys/types.h>
+#include <unistd.h>
+
+#include <sys/mman.h>
+
+int
+get_pinned_mem ()
+{
+  int pid = getpid ();
+  char buf[100];
+  sprintf (buf, "/proc/%d/status", pid);
+
+  FILE *proc = fopen (buf, "r");
+  if (!proc)
+    abort ();
+  while (fgets (buf, 100, proc))
+    {
+      int val;
+      if (sscanf (buf, "VmLck: %d", &val))
+	{
+	  fclose (proc);
+	  return val;
+	}
+    }
+  abort ();
+}
+#else
+int
+get_pinned_mem ()
+{
+  return 0;
+}
+
+#define mlockall(...) 0
+#endif
+
+#include <omp.h>
+
+int
+main ()
+{
+  // Sanity check
+  if (get_pinned_mem () == 0)
+    {
+      /* -foffload-memory=pinned has failed, but maybe that's because
+	 insufficient pinned memory was available.  */
+      if (mlockall (MCL_CURRENT | MCL_FUTURE) == 0)
+	abort ();
+    }
+
+  return 0;
+}


* [PATCH 09/17] openmp: Use libgomp memory allocation functions with unified shared memory.
  2022-07-07 10:34 [PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators Andrew Stubbs
                   ` (7 preceding siblings ...)
  2022-07-07 10:34 ` [PATCH 08/17] openmp: -foffload-memory=pinned Andrew Stubbs
@ 2022-07-07 10:34 ` Andrew Stubbs
  2022-07-07 10:34 ` [PATCH 10/17] Add parsing support for allocate directive (OpenMP 5.0) Andrew Stubbs
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Andrew Stubbs @ 2022-07-07 10:34 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2375 bytes --]


This patch changes calls to malloc, calloc, realloc, aligned_alloc, free,
omp_target_alloc, omp_target_free and the replaceable operator new/delete into
calls to the corresponding libgomp memory allocation functions with
allocator=ompx_unified_shared_mem_alloc.  This lets existing code benefit
from unified shared memory: libgomp does the correct thing with all the
mapping constructs, and no memory copies are made if the pointer points to
unified shared memory.

We only replace the replaceable operator new, not class-member or placement
new.
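
The following lookup table gives a rough source-level picture of the new pass. It is a hedged model for reading the patch, not GCC code. `struct usm_rewrite` and `find_usm_rewrite` are hypothetical. The real pass operates on GIMPLE calls, and it also matches the BUILT_IN_* codes and the replaceable operator new/delete by declaration flags. The constant 10 is the value that the pass hard-codes for ompx_unified_shared_mem_alloc.

```c
#include <stddef.h>
#include <string.h>

struct usm_rewrite
{
  const char *callee;       /* Call found in the original code.  */
  const char *replacement;  /* libgomp function it becomes.  */
  const char *args;         /* Arguments of the replacement call.  */
};

/* Model of the rewrites done by usm_transform.  Operator new maps like
   malloc and replaceable operator delete like free.  The device argument
   of omp_target_alloc/omp_target_free is dropped.  */
static const struct usm_rewrite usm_rewrites[] = {
  { "malloc",           "omp_alloc",         "size, 10" },
  { "omp_target_alloc", "omp_alloc",         "size, 10" },
  { "aligned_alloc",    "omp_aligned_alloc", "align, size, 10" },
  { "calloc",           "omp_calloc",        "num, size, 10" },
  { "realloc",          "omp_realloc",       "ptr, size, 10, 10" },
  { "free",             "omp_free",          "ptr, 10" },
  { "omp_target_free",  "omp_free",          "ptr, 10" },
};

/* Hypothetical lookup: return the rewrite for CALLEE, or NULL if the pass
   leaves the call alone.  */
const struct usm_rewrite *
find_usm_rewrite (const char *callee)
{
  for (size_t i = 0; i < sizeof usm_rewrites / sizeof usm_rewrites[0]; i++)
    if (strcmp (callee, usm_rewrites[i].callee) == 0)
      return &usm_rewrites[i];
  return NULL;
}
```

omp_realloc receives the allocator twice because its last two parameters are the allocator for the new block and the allocator that owned the old one. Both are ompx_unified_shared_mem_alloc here.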

gcc/ChangeLog:

	* omp-low.cc (usm_transform): New function.
	(make_pass_usm_transform): Likewise.
	(class pass_usm_transform): New.
	* passes.def: Add pass_usm_transform.
	* tree-pass.h (make_pass_usm_transform): New declaration.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/usm-2.c: New test.
	* c-c++-common/gomp/usm-3.c: New test.
	* g++.dg/gomp/usm-1.C: New test.
	* g++.dg/gomp/usm-2.C: New test.
	* g++.dg/gomp/usm-3.C: New test.
	* gfortran.dg/gomp/usm-2.f90: New test.
	* gfortran.dg/gomp/usm-3.f90: New test.

libgomp/ChangeLog:

	* testsuite/libgomp.c/usm-6.c: New test.
	* testsuite/libgomp.c++/usm-1.C: Likewise.

co-authored-by: Andrew Stubbs  <ams@codesourcery.com>
---
 gcc/omp-low.cc                           | 174 +++++++++++++++++++++++
 gcc/passes.def                           |   1 +
 gcc/testsuite/c-c++-common/gomp/usm-2.c  |  46 ++++++
 gcc/testsuite/c-c++-common/gomp/usm-3.c  |  44 ++++++
 gcc/testsuite/g++.dg/gomp/usm-1.C        |  32 +++++
 gcc/testsuite/g++.dg/gomp/usm-2.C        |  30 ++++
 gcc/testsuite/g++.dg/gomp/usm-3.C        |  38 +++++
 gcc/testsuite/gfortran.dg/gomp/usm-2.f90 |  16 +++
 gcc/testsuite/gfortran.dg/gomp/usm-3.f90 |  13 ++
 gcc/tree-pass.h                          |   1 +
 libgomp/testsuite/libgomp.c++/usm-1.C    |  54 +++++++
 libgomp/testsuite/libgomp.c/usm-6.c      |  92 ++++++++++++
 12 files changed, 541 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-3.c
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-2.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-3.C
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-3.f90
 create mode 100644 libgomp/testsuite/libgomp.c++/usm-1.C
 create mode 100644 libgomp/testsuite/libgomp.c/usm-6.c


[-- Attachment #2: 0009-openmp-Use-libgomp-memory-allocation-functions-with-.patch --]
[-- Type: text/x-patch; name="0009-openmp-Use-libgomp-memory-allocation-functions-with-.patch", Size: 18991 bytes --]

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index ba612e5c67d..cdadd6f0c96 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -15097,6 +15097,180 @@ make_pass_diagnose_omp_blocks (gcc::context *ctxt)
 {
   return new pass_diagnose_omp_blocks (ctxt);
 }
+
+/* Provide transformation required for using unified shared memory
+   by replacing calls to standard memory allocation functions with
+   functions provided by libgomp.  */
+
+static tree
+usm_transform (gimple_stmt_iterator *gsi_p, bool *,
+	       struct walk_stmt_info *wi)
+{
+  gimple *stmt = gsi_stmt (*gsi_p);
+  /* ompx_unified_shared_mem_alloc is 10.  */
+  const unsigned int unified_shared_mem_alloc = 10;
+
+  switch (gimple_code (stmt))
+    {
+    case GIMPLE_CALL:
+      {
+	gcall *gs = as_a <gcall *> (stmt);
+	tree fndecl = gimple_call_fndecl (gs);
+	if (fndecl)
+	  {
+	    tree allocator = build_int_cst (pointer_sized_int_node,
+					    unified_shared_mem_alloc);
+	    const char *name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
+	    if ((strcmp (name, "malloc") == 0)
+		 || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL)
+		     && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_MALLOC)
+		 || DECL_IS_REPLACEABLE_OPERATOR_NEW_P (fndecl)
+		 || strcmp (name, "omp_target_alloc") == 0)
+	      {
+		tree omp_alloc_type
+		  = build_function_type_list (ptr_type_node, size_type_node,
+					      pointer_sized_int_node,
+					      NULL_TREE);
+		tree repl = build_fn_decl ("omp_alloc", omp_alloc_type);
+		tree size = gimple_call_arg (gs, 0);
+		gimple *g = gimple_build_call (repl, 2, size, allocator);
+		gimple_call_set_lhs (g, gimple_call_lhs (gs));
+		gimple_set_location (g, gimple_location (stmt));
+		gsi_replace (gsi_p, g, true);
+	      }
+	    else if (strcmp (name, "aligned_alloc") == 0)
+	      {
+		/* Maybe we can also use this for new operator with
+		    std::align_val_t parameter.  */
+		tree omp_alloc_type
+		  = build_function_type_list (ptr_type_node, size_type_node,
+					      size_type_node,
+					      pointer_sized_int_node,
+					      NULL_TREE);
+		tree repl = build_fn_decl ("omp_aligned_alloc",
+					   omp_alloc_type);
+		tree align = gimple_call_arg (gs, 0);
+		tree size = gimple_call_arg (gs, 1);
+		gimple *g = gimple_build_call (repl, 3, align, size,
+					       allocator);
+		gimple_call_set_lhs (g, gimple_call_lhs (gs));
+		gimple_set_location (g, gimple_location (stmt));
+		gsi_replace (gsi_p, g, true);
+	      }
+	    else if ((strcmp (name, "calloc") == 0)
+		      || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL)
+			  && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_CALLOC))
+	      {
+		tree omp_calloc_type
+		  = build_function_type_list (ptr_type_node, size_type_node,
+					      size_type_node,
+					      pointer_sized_int_node,
+					      NULL_TREE);
+		tree repl = build_fn_decl ("omp_calloc", omp_calloc_type);
+		tree num = gimple_call_arg (gs, 0);
+		tree size = gimple_call_arg (gs, 1);
+		gimple *g = gimple_build_call (repl, 3, num, size, allocator);
+		gimple_call_set_lhs (g, gimple_call_lhs (gs));
+		gimple_set_location (g, gimple_location (stmt));
+		gsi_replace (gsi_p, g, true);
+	      }
+	    else if ((strcmp (name, "realloc") == 0)
+		      || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL)
+			  && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_REALLOC))
+	      {
+		tree omp_realloc_type
+		  = build_function_type_list (ptr_type_node, ptr_type_node,
+					      size_type_node,
+					      pointer_sized_int_node,
+					      pointer_sized_int_node,
+					      NULL_TREE);
+		tree repl = build_fn_decl ("omp_realloc", omp_realloc_type);
+		tree ptr = gimple_call_arg (gs, 0);
+		tree size = gimple_call_arg (gs, 1);
+		gimple *g = gimple_build_call (repl, 4, ptr, size, allocator,
+					       allocator);
+		gimple_call_set_lhs (g, gimple_call_lhs (gs));
+		gimple_set_location (g, gimple_location (stmt));
+		gsi_replace (gsi_p, g, true);
+	      }
+	    else if ((strcmp (name, "free") == 0)
+		      || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL)
+			  && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_FREE)
+		      || (DECL_IS_OPERATOR_DELETE_P (fndecl)
+			  && DECL_IS_REPLACEABLE_OPERATOR (fndecl))
+		      || strcmp (name, "omp_target_free") == 0)
+	      {
+		tree omp_free_type
+		  = build_function_type_list (void_type_node, ptr_type_node,
+					      pointer_sized_int_node,
+					      NULL_TREE);
+		tree repl = build_fn_decl ("omp_free", omp_free_type);
+		tree ptr = gimple_call_arg (gs, 0);
+		gimple *g = gimple_build_call (repl, 2, ptr, allocator);
+		gimple_set_location (g, gimple_location (stmt));
+		gsi_replace (gsi_p, g, true);
+	      }
+	  }
+      }
+      break;
+
+    default:
+      break;
+    }
+
+  return NULL_TREE;
+}
+
+namespace {
+
+const pass_data pass_data_usm_transform =
+{
+  GIMPLE_PASS, /* type */
+  "usm_transform", /* name */
+  OPTGROUP_OMP, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  PROP_gimple_any, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_usm_transform : public gimple_opt_pass
+{
+public:
+  pass_usm_transform (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_usm_transform, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+  {
+    return (flag_openmp || flag_openmp_simd)
+	    && (flag_offload_memory == OFFLOAD_MEMORY_UNIFIED
+		|| omp_requires_mask & OMP_REQUIRES_UNIFIED_SHARED_MEMORY
+		|| omp_requires_mask & OMP_REQUIRES_UNIFIED_ADDRESS);
+  }
+  virtual unsigned int execute (function *)
+  {
+    struct walk_stmt_info wi;
+    gimple_seq body = gimple_body (current_function_decl);
+
+    memset (&wi, 0, sizeof (wi));
+    walk_gimple_seq (body, usm_transform, NULL, &wi);
+
+    return 0;
+  }
+
+}; // class pass_usm_transform
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_usm_transform (gcc::context *ctxt)
+{
+  return new pass_usm_transform (ctxt);
+}
 \f
 
 #include "gt-omp-low.h"
diff --git a/gcc/passes.def b/gcc/passes.def
index 375d3d62d51..7f838bfc96a 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -36,6 +36,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_diagnose_tm_blocks);
   NEXT_PASS (pass_omp_oacc_kernels_decompose);
   NEXT_PASS (pass_lower_omp);
+  NEXT_PASS (pass_usm_transform);
   NEXT_PASS (pass_lower_cf);
   NEXT_PASS (pass_lower_tm);
   NEXT_PASS (pass_refactor_eh);
diff --git a/gcc/testsuite/c-c++-common/gomp/usm-2.c b/gcc/testsuite/c-c++-common/gomp/usm-2.c
new file mode 100644
index 00000000000..8c20ef94e69
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/usm-2.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fdump-tree-usm_transform" } */
+
+#pragma omp requires unified_shared_memory
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+void *malloc (__SIZE_TYPE__);
+void *aligned_alloc (__SIZE_TYPE__, __SIZE_TYPE__);
+void *calloc(__SIZE_TYPE__, __SIZE_TYPE__);
+void *realloc(void *, __SIZE_TYPE__);
+void free (void *);
+void *omp_target_alloc (__SIZE_TYPE__, int);
+void omp_target_free (void *, int);
+
+#ifdef __cplusplus
+}
+#endif
+
+void
+foo ()
+{
+  void *p1 = malloc(20);
+  void *p2 = realloc(p1, 30);
+  void *p3 = calloc(4, 15);
+  void *p4 = aligned_alloc(16, 40);
+  void *p5 = omp_target_alloc(50, 1);
+  free (p2);
+  free (p3);
+  free (p4);
+  omp_target_free (p5, 1);
+}
+
+/* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_realloc \\(.*, 30, 10, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_calloc \\(4, 15, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_aligned_alloc \\(16, 40, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_alloc \\(50, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_free" 4 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " free"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " aligned_alloc"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " malloc"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " omp_target_alloc"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " omp_target_free"  "usm_transform"  } } */
diff --git a/gcc/testsuite/c-c++-common/gomp/usm-3.c b/gcc/testsuite/c-c++-common/gomp/usm-3.c
new file mode 100644
index 00000000000..2b0cbb45e27
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/usm-3.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-foffload-memory=unified -fdump-tree-usm_transform" } */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+void *malloc (__SIZE_TYPE__);
+void *aligned_alloc (__SIZE_TYPE__, __SIZE_TYPE__);
+void *calloc(__SIZE_TYPE__, __SIZE_TYPE__);
+void *realloc(void *, __SIZE_TYPE__);
+void free (void *);
+void *omp_target_alloc (__SIZE_TYPE__, int);
+void omp_target_free (void *, int);
+
+#ifdef __cplusplus
+}
+#endif
+
+void
+foo ()
+{
+  void *p1 = malloc(20);
+  void *p2 = realloc(p1, 30);
+  void *p3 = calloc(4, 15);
+  void *p4 = aligned_alloc(16, 40);
+  void *p5 = omp_target_alloc(50, 1);
+  free (p2);
+  free (p3);
+  free (p4);
+  omp_target_free (p5, 1);
+}
+
+/* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_realloc \\(.*, 30, 10, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_calloc \\(4, 15, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_aligned_alloc \\(16, 40, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_alloc \\(50, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_free" 4 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " free"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " aligned_alloc"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " malloc"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " omp_target_alloc"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " omp_target_free"  "usm_transform"  } } */
diff --git a/gcc/testsuite/g++.dg/gomp/usm-1.C b/gcc/testsuite/g++.dg/gomp/usm-1.C
new file mode 100644
index 00000000000..bd70a81b5bb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/usm-1.C
@@ -0,0 +1,32 @@
+// { dg-do compile }
+// { dg-options "-fopenmp -fdump-tree-usm_transform" }
+
+#pragma omp requires unified_shared_memory
+
+struct t1
+{
+  int a;
+  int b;
+};
+
+typedef unsigned char uint8_t;
+
+void
+foo (__SIZE_TYPE__ x, __SIZE_TYPE__ y)
+{
+  uint8_t *p1 = new uint8_t;
+  uint8_t *p2 = new uint8_t[20];
+  t1 *p3 = new t1;
+  t1 *p4 = new t1[y];
+  delete p1;
+  delete p3;
+  delete [] p2;
+  delete [] p4;
+}
+
+/* { dg-final { scan-tree-dump-times "omp_alloc \\(1, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_alloc" 4 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_free" 4 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not "operator new"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not "operator delete"  "usm_transform"  } } */
diff --git a/gcc/testsuite/g++.dg/gomp/usm-2.C b/gcc/testsuite/g++.dg/gomp/usm-2.C
new file mode 100644
index 00000000000..f6ab155c6de
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/usm-2.C
@@ -0,0 +1,30 @@
+// { dg-do compile }
+// { dg-options "-fopenmp -foffload-memory=unified -fdump-tree-usm_transform" }
+
+struct t1
+{
+  int a;
+  int b;
+};
+
+typedef unsigned char uint8_t;
+
+void
+foo (__SIZE_TYPE__ x, __SIZE_TYPE__ y)
+{
+  uint8_t *p1 = new uint8_t;
+  uint8_t *p2 = new uint8_t[20];
+  t1 *p3 = new t1;
+  t1 *p4 = new t1[y];
+  delete p1;
+  delete p3;
+  delete [] p2;
+  delete [] p4;
+}
+
+/* { dg-final { scan-tree-dump-times "omp_alloc \\(1, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_alloc" 4 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_free" 4 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not "operator new"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not "operator delete"  "usm_transform"  } } */
diff --git a/gcc/testsuite/g++.dg/gomp/usm-3.C b/gcc/testsuite/g++.dg/gomp/usm-3.C
new file mode 100644
index 00000000000..50ac9302c8b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/usm-3.C
@@ -0,0 +1,38 @@
+// { dg-do compile }
+// { dg-options "-fopenmp -fdump-tree-usm_transform" }
+
+#pragma omp requires unified_shared_memory
+
+#include <new>
+
+
+struct X {
+    static void* operator new(std::size_t count)
+    {
+      static char buf[10];
+      return &buf[0];
+    }
+    static void* operator new[](std::size_t count)
+    {
+      static char buf[10];
+      return &buf[0];
+    }
+    static void operator delete(void*)
+    {
+    }
+    static void operator delete[](void*)
+    {
+    }
+};
+void foo() {
+  X* p1 = new X;
+  delete p1;
+  X* p2 = new X[10];
+  delete[] p2;
+  unsigned char buf[24];
+  int *p3 = new (buf) int(3);
+  p3[0] = 1;
+}
+
+/* { dg-final { scan-tree-dump-not "omp_alloc"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not "omp_free"  "usm_transform"  } } */
diff --git a/gcc/testsuite/gfortran.dg/gomp/usm-2.f90 b/gcc/testsuite/gfortran.dg/gomp/usm-2.f90
new file mode 100644
index 00000000000..dc775260cb7
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/usm-2.f90
@@ -0,0 +1,16 @@
+! { dg-do compile }
+! { dg-additional-options "-fdump-tree-usm_transform" }
+
+!$omp requires unified_shared_memory
+end
+
+subroutine foo()
+  implicit none
+  integer, allocatable :: var1
+
+  allocate(var1)
+
+end subroutine
+
+! { dg-final { scan-tree-dump-times "omp_alloc" 1 "usm_transform"  } } 
+! { dg-final { scan-tree-dump-times "omp_free" 1 "usm_transform"  } } 
\ No newline at end of file
diff --git a/gcc/testsuite/gfortran.dg/gomp/usm-3.f90 b/gcc/testsuite/gfortran.dg/gomp/usm-3.f90
new file mode 100644
index 00000000000..7983444ebff
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/usm-3.f90
@@ -0,0 +1,13 @@
+! { dg-do compile }
+! { dg-additional-options "-foffload-memory=unified -fdump-tree-usm_transform" }
+
+subroutine foo()
+  implicit none
+  integer, allocatable :: var1
+
+  allocate(var1)
+
+end subroutine
+
+! { dg-final { scan-tree-dump-times "omp_alloc" 1 "usm_transform"  } } 
+! { dg-final { scan-tree-dump-times "omp_free" 1 "usm_transform"  } } 
\ No newline at end of file
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 606d1d60b85..494a9662afa 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -424,6 +424,7 @@ extern gimple_opt_pass *make_pass_lower_vector_ssa (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_omp_oacc_kernels_decompose (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_usm_transform (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_omp_target_link (gcc::context *ctxt);
diff --git a/libgomp/testsuite/libgomp.c++/usm-1.C b/libgomp/testsuite/libgomp.c++/usm-1.C
new file mode 100644
index 00000000000..fea25e5f10b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/usm-1.C
@@ -0,0 +1,54 @@
+/* { dg-do run } */
+/* { dg-skip-if "Only valid for nvptx" { ! offload_target_nvptx } } */
+#include <stdint.h>
+
+#pragma omp requires unified_shared_memory
+
+int g1 = 0;
+
+struct s1
+{
+  s1() { a = g1++;}
+  ~s1() { g1--;}
+  int a;
+};
+
+int
+main ()
+{
+  s1 *p1 = new s1;
+  s1 *p2 = new s1[10];
+
+  if (!p1 || !p2 || p1->a != 0)
+    __builtin_abort ();
+
+  for (int i = 0; i < 10; i++)
+    if (p2[i].a != i+1)
+      __builtin_abort ();
+
+  uintptr_t pp1 = (uintptr_t)p1;
+  uintptr_t pp2 = (uintptr_t)p2;
+
+#pragma omp target firstprivate(pp1, pp2)
+    {
+      s1 *t1 = (s1*)pp1;
+      s1 *t2 = (s1*)pp2;
+      if (t1->a != 0)
+	__builtin_abort ();
+
+      for (int i = 0; i < 10; i++)
+	if (t2[i].a != i+1)
+	  __builtin_abort ();
+
+      t1->a = 42;
+    }
+
+  if (p1->a != 42)
+    __builtin_abort ();
+
+  delete [] p2;
+  delete p1;
+  if (g1 != 0)
+    __builtin_abort ();
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/usm-6.c b/libgomp/testsuite/libgomp.c/usm-6.c
new file mode 100644
index 00000000000..c207140092a
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/usm-6.c
@@ -0,0 +1,92 @@
+/* { dg-do run } */
+/* { dg-skip-if "Only valid for nvptx" { ! offload_target_nvptx } } */
+
+#include <stdint.h>
+#include <stdlib.h>
+
+#include <omp.h>
+
+/* On old systems, the declaration may not be present in stdlib.h, which
+   would generate a warning.  This function is going to be replaced with
+   omp_aligned_alloc, so the purpose of this declaration is to avoid that
+   warning.  */
+void *aligned_alloc(size_t alignment, size_t size);
+
+#pragma omp requires unified_shared_memory
+
+int
+main ()
+{
+  int *a = (int *) malloc(sizeof(int)*2);
+  int *b = (int *) calloc(sizeof(int), 3);
+  int *c = (int *) realloc(NULL, sizeof(int) * 4);
+  int *d = (int *) aligned_alloc(32, sizeof(int));
+  int *e = (int *) omp_target_alloc(sizeof(int), 1);
+  if (!a || !b || !c || !d || !e)
+    __builtin_abort ();
+
+  a[0] = 42;
+  a[1] = 43;
+  b[0] = 52;
+  b[1] = 53;
+  b[2] = 54;
+  c[0] = 62;
+  c[1] = 63;
+  c[2] = 64;
+  c[3] = 65;
+
+  uintptr_t a_p = (uintptr_t)a;
+  uintptr_t b_p = (uintptr_t)b;
+  uintptr_t c_p = (uintptr_t)c;
+  uintptr_t d_p = (uintptr_t)d;
+  uintptr_t e_p = (uintptr_t)e;
+
+  if ((d_p & 31) != 0)
+    __builtin_abort ();
+
+#pragma omp target enter data map(to:a[0:2])
+
+#pragma omp target is_device_ptr(c)
+    {
+      if (a[0] != 42 || a_p != (uintptr_t)a)
+	__builtin_abort ();
+      if (b[0] != 52 || b[2] != 54 || b_p != (uintptr_t)b)
+	__builtin_abort ();
+      if (c[0] != 62 || c[3] != 65 || c_p != (uintptr_t)c)
+	__builtin_abort ();
+      if (d_p != (uintptr_t)d)
+	__builtin_abort ();
+      if (e_p != (uintptr_t)e)
+	__builtin_abort ();
+      a[0] = 72;
+      b[0] = 82;
+      c[0] = 92;
+      e[0] = 102;
+    }
+
+#pragma omp target
+    {
+      if (a[1] != 43 || a_p != (uintptr_t)a)
+	__builtin_abort ();
+      if (b[1] != 53 || b_p != (uintptr_t)b)
+	__builtin_abort ();
+      if (c[1] != 63 || c[2] != 64 || c_p != (uintptr_t)c)
+	__builtin_abort ();
+      a[1] = 73;
+      b[1] = 83;
+      c[1] = 93;
+    }
+
+#pragma omp target exit data map(delete:a[0:2])
+
+  if (a[0] != 72 || a[1] != 73
+      || b[0] != 82 || b[1] != 83
+      || c[0] != 92 || c[1] != 93
+      || e[0] != 102)
+    __builtin_abort ();
+  free(a);
+  free(b);
+  free(c);
+  omp_target_free(e, 1);
+  return 0;
+}
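For reviewers, the call mapping that the usm-2.c and usm-3.c dump scans pin down can be summarized in a small table-driven C sketch. It is only a model: the function name usm_replacement and the string formatting are hypothetical, the real pass rewrites GIMPLE calls in omp-low.cc, and the allocator handle 10 is simply the value that appears in the expected dumps. The tests only count omp_free calls, so passing the same handle to omp_free here is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Allocator handle copied from the expected dump patterns
   (e.g. "omp_alloc \\(20, 10\\)"); hypothetical name.  */
#define USM_ALLOCATOR 10

/* Write into OUT the call that CALLEE (with textual arguments A and B)
   is expected to become under unified shared memory.  Returns 0 if the
   call is rewritten, -1 if it is left alone.  Illustrative model only,
   not GCC's implementation.  */
static int
usm_replacement (const char *callee, const char *a, const char *b,
                 char *out, size_t len)
{
  assert (out != NULL && len > 0);

  if (strcmp (callee, "malloc") == 0)
    snprintf (out, len, "omp_alloc (%s, %d)", a, USM_ALLOCATOR);
  else if (strcmp (callee, "calloc") == 0)
    snprintf (out, len, "omp_calloc (%s, %s, %d)", a, b, USM_ALLOCATOR);
  else if (strcmp (callee, "realloc") == 0)
    /* omp_realloc takes both a new allocator and a free allocator.  */
    snprintf (out, len, "omp_realloc (%s, %s, %d, %d)", a, b,
              USM_ALLOCATOR, USM_ALLOCATOR);
  else if (strcmp (callee, "aligned_alloc") == 0)
    snprintf (out, len, "omp_aligned_alloc (%s, %s, %d)", a, b,
              USM_ALLOCATOR);
  else if (strcmp (callee, "omp_target_alloc") == 0)
    /* The device number B is not used in the replacement.  */
    snprintf (out, len, "omp_alloc (%s, %d)", a, USM_ALLOCATOR);
  else if (strcmp (callee, "free") == 0
           || strcmp (callee, "omp_target_free") == 0)
    /* Assumption: the same handle is passed to omp_free.  */
    snprintf (out, len, "omp_free (%s, %d)", a, USM_ALLOCATOR);
  else
    return -1;
  return 0;
}
```

For example, usm_replacement ("realloc", "p1", "30", buf, sizeof buf) leaves "omp_realloc (p1, 30, 10, 10)" in buf, which is the shape the usm-2.c scan for omp_realloc expects.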


* [PATCH 10/17] Add parsing support for allocate directive (OpenMP 5.0)
  2022-07-07 10:34 [PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators Andrew Stubbs
                   ` (8 preceding siblings ...)
  2022-07-07 10:34 ` [PATCH 09/17] openmp: Use libgomp memory allocation functions with unified shared memory Andrew Stubbs
@ 2022-07-07 10:34 ` Andrew Stubbs
  2022-07-07 10:34 ` [PATCH 11/17] Translate " Andrew Stubbs
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Andrew Stubbs @ 2022-07-07 10:34 UTC
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1863 bytes --]


Currently we only make use of this directive when it is associated
with an allocate statement.
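As a rough illustration of the two association cases the resolver handles, the following C sketch models them with plain string arrays: an empty directive list applies to every variable of the following allocate statement, while a non-empty list must name only variables that the statement allocates. It is hypothetical: resolve_directive and in_list are not gfortran functions, and the real gfc_resolve_omp_allocate also diagnoses duplicated variables, scope mismatches and non-predefined allocators for SAVE, COMMON and module variables, which the sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if NAME appears among the N entries of LIST.  */
static int
in_list (const char *name, const char *const *list, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (name, list[i]) == 0)
      return 1;
  return 0;
}

/* Resolve one directive (NDIR variables in DIR) against the allocate
   statement that follows it (NSTMT variables in STMT).  Stores the list
   of variables the directive applies to in *OUT / *NOUT and returns how
   many directive variables are missing from the statement; each missing
   variable would be an error.  Hypothetical model only.  */
static size_t
resolve_directive (const char *const *dir, size_t ndir,
                   const char *const *stmt, size_t nstmt,
                   const char *const **out, size_t *nout)
{
  size_t missing = 0;

  if (ndir == 0)
    {
      /* Empty list: the directive covers the whole statement.  */
      *out = stmt;
      *nout = nstmt;
      return 0;
    }

  for (size_t i = 0; i < ndir; i++)
    if (!in_list (dir[i], stmt, nstmt))
      missing++;

  *out = dir;
  *nout = ndir;
  return missing;
}
```

With the var2/var3 case from allocate-4.f90, resolve_directive reports one missing variable; with an empty list before allocate (var5(x)), the resolved list is just var5.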

gcc/fortran/ChangeLog:

	* dump-parse-tree.cc (show_omp_node): Handle EXEC_OMP_ALLOCATE.
	(show_code_node): Likewise.
	* gfortran.h (enum gfc_statement): Add ST_OMP_ALLOCATE.
	(OMP_LIST_ALLOCATOR): New enum value.
	(enum gfc_exec_op): Add EXEC_OMP_ALLOCATE.
	* match.h (gfc_match_omp_allocate): New function.
	* openmp.cc (enum omp_mask2): Add OMP_CLAUSE_ALLOCATOR.
	(OMP_ALLOCATE_CLAUSES): New define.
	(gfc_match_omp_allocate): New function.
	(resolve_omp_clauses): Add ALLOCATOR in clause_names.
	(omp_code_to_statement): Handle EXEC_OMP_ALLOCATE.
	(EMPTY_VAR_LIST): New define.
	(check_allocate_directive_restrictions): New function.
	(gfc_resolve_omp_allocate): Likewise.
	(gfc_resolve_omp_directive): Handle EXEC_OMP_ALLOCATE.
	* parse.cc (decode_omp_directive): Handle ST_OMP_ALLOCATE.
	(next_statement): Likewise.
	(gfc_ascii_statement): Likewise.
	* resolve.cc (gfc_resolve_code): Handle EXEC_OMP_ALLOCATE.
	* st.cc (gfc_free_statement): Likewise.
	* trans.cc (trans_code): Likewise.
---
 gcc/fortran/dump-parse-tree.cc                |   3 +
 gcc/fortran/gfortran.h                        |   4 +-
 gcc/fortran/match.h                           |   1 +
 gcc/fortran/openmp.cc                         | 199 +++++++++++++++++-
 gcc/fortran/parse.cc                          |  10 +-
 gcc/fortran/resolve.cc                        |   1 +
 gcc/fortran/st.cc                             |   1 +
 gcc/fortran/trans.cc                          |   1 +
 gcc/testsuite/gfortran.dg/gomp/allocate-4.f90 | 112 ++++++++++
 gcc/testsuite/gfortran.dg/gomp/allocate-5.f90 |  73 +++++++
 10 files changed, 400 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-4.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-5.f90


[-- Attachment #2: 0010-Add-parsing-support-for-allocate-directive-OpenMP-5..patch --]
[-- Type: text/x-patch; name="0010-Add-parsing-support-for-allocate-directive-OpenMP-5..patch", Size: 20254 bytes --]

diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index 5352008a63d..e0c6c0d9d96 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -2003,6 +2003,7 @@ show_omp_node (int level, gfc_code *c)
     case EXEC_OACC_CACHE: name = "CACHE"; is_oacc = true; break;
     case EXEC_OACC_ENTER_DATA: name = "ENTER DATA"; is_oacc = true; break;
     case EXEC_OACC_EXIT_DATA: name = "EXIT DATA"; is_oacc = true; break;
+    case EXEC_OMP_ALLOCATE: name = "ALLOCATE"; break;
     case EXEC_OMP_ATOMIC: name = "ATOMIC"; break;
     case EXEC_OMP_BARRIER: name = "BARRIER"; break;
     case EXEC_OMP_CANCEL: name = "CANCEL"; break;
@@ -2204,6 +2205,7 @@ show_omp_node (int level, gfc_code *c)
       || c->op == EXEC_OMP_TARGET_UPDATE || c->op == EXEC_OMP_TARGET_ENTER_DATA
       || c->op == EXEC_OMP_TARGET_EXIT_DATA || c->op == EXEC_OMP_SCAN
       || c->op == EXEC_OMP_DEPOBJ || c->op == EXEC_OMP_ERROR
+      || c->op == EXEC_OMP_ALLOCATE
       || (c->op == EXEC_OMP_ORDERED && c->block == NULL))
     return;
   if (c->op == EXEC_OMP_SECTIONS || c->op == EXEC_OMP_PARALLEL_SECTIONS)
@@ -3329,6 +3331,7 @@ show_code_node (int level, gfc_code *c)
     case EXEC_OACC_CACHE:
     case EXEC_OACC_ENTER_DATA:
     case EXEC_OACC_EXIT_DATA:
+    case EXEC_OMP_ALLOCATE:
     case EXEC_OMP_ATOMIC:
     case EXEC_OMP_CANCEL:
     case EXEC_OMP_CANCELLATION_POINT:
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 696aadd7db6..755469185a6 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -259,7 +259,7 @@ enum gfc_statement
   ST_OACC_CACHE, ST_OACC_KERNELS_LOOP, ST_OACC_END_KERNELS_LOOP,
   ST_OACC_SERIAL_LOOP, ST_OACC_END_SERIAL_LOOP, ST_OACC_SERIAL,
   ST_OACC_END_SERIAL, ST_OACC_ENTER_DATA, ST_OACC_EXIT_DATA, ST_OACC_ROUTINE,
-  ST_OACC_ATOMIC, ST_OACC_END_ATOMIC,
+  ST_OACC_ATOMIC, ST_OACC_END_ATOMIC, ST_OMP_ALLOCATE,
   ST_OMP_ATOMIC, ST_OMP_BARRIER, ST_OMP_CRITICAL, ST_OMP_END_ATOMIC,
   ST_OMP_END_CRITICAL, ST_OMP_END_DO, ST_OMP_END_MASTER, ST_OMP_END_ORDERED,
   ST_OMP_END_PARALLEL, ST_OMP_END_PARALLEL_DO, ST_OMP_END_PARALLEL_SECTIONS,
@@ -1398,6 +1398,7 @@ enum
   OMP_LIST_USE_DEVICE_ADDR,
   OMP_LIST_NONTEMPORAL,
   OMP_LIST_ALLOCATE,
+  OMP_LIST_ALLOCATOR,
   OMP_LIST_HAS_DEVICE_ADDR,
   OMP_LIST_ENTER,
   OMP_LIST_NUM /* Must be the last.  */
@@ -2908,6 +2909,7 @@ enum gfc_exec_op
   EXEC_OACC_DATA, EXEC_OACC_HOST_DATA, EXEC_OACC_LOOP, EXEC_OACC_UPDATE,
   EXEC_OACC_WAIT, EXEC_OACC_CACHE, EXEC_OACC_ENTER_DATA, EXEC_OACC_EXIT_DATA,
   EXEC_OACC_ATOMIC, EXEC_OACC_DECLARE,
+  EXEC_OMP_ALLOCATE,
   EXEC_OMP_CRITICAL, EXEC_OMP_DO, EXEC_OMP_FLUSH, EXEC_OMP_MASTER,
   EXEC_OMP_ORDERED, EXEC_OMP_PARALLEL, EXEC_OMP_PARALLEL_DO,
   EXEC_OMP_PARALLEL_SECTIONS, EXEC_OMP_PARALLEL_WORKSHARE,
diff --git a/gcc/fortran/match.h b/gcc/fortran/match.h
index 495c93e0b5c..fe43d4b3fd3 100644
--- a/gcc/fortran/match.h
+++ b/gcc/fortran/match.h
@@ -149,6 +149,7 @@ match gfc_match_oacc_routine (void);
 
 /* OpenMP directive matchers.  */
 match gfc_match_omp_eos_error (void);
+match gfc_match_omp_allocate (void);
 match gfc_match_omp_atomic (void);
 match gfc_match_omp_barrier (void);
 match gfc_match_omp_cancel (void);
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 91bf8a3c50d..38003890bb0 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -986,6 +986,7 @@ enum omp_mask2
   OMP_CLAUSE_FINALIZE,
   OMP_CLAUSE_ATTACH,
   OMP_CLAUSE_NOHOST,
+  OMP_CLAUSE_ALLOCATOR,
   OMP_CLAUSE_HAS_DEVICE_ADDR,  /* OpenMP 5.1  */
   OMP_CLAUSE_ENTER, /* OpenMP 5.2 */
   /* This must come last.  */
@@ -3784,6 +3785,7 @@ cleanup:
 }
 
 
+#define OMP_ALLOCATE_CLAUSES (omp_mask (OMP_CLAUSE_ALLOCATOR))
 #define OMP_PARALLEL_CLAUSES \
   (omp_mask (OMP_CLAUSE_PRIVATE) | OMP_CLAUSE_FIRSTPRIVATE		\
    | OMP_CLAUSE_SHARED | OMP_CLAUSE_COPYIN | OMP_CLAUSE_REDUCTION	\
@@ -6001,6 +6003,64 @@ gfc_match_omp_ordered_depend (void)
   return match_omp (EXEC_OMP_ORDERED, omp_mask (OMP_CLAUSE_DEPEND));
 }
 
+/* omp allocate (list) [clause-list]
+   - clause-list:  allocator
+*/
+
+match
+gfc_match_omp_allocate (void)
+{
+  gfc_omp_clauses *c = gfc_get_omp_clauses ();
+  gfc_expr *allocator = NULL;
+  match m;
+
+  m = gfc_match (" (");
+  if (m == MATCH_YES)
+    {
+      m = gfc_match_omp_variable_list ("", &c->lists[OMP_LIST_ALLOCATOR],
+				       true, NULL);
+
+      if (m != MATCH_YES)
+	{
+	  /* If the list was empty, we must find closing ')'.  */
+	  m = gfc_match (")");
+	  if (m != MATCH_YES)
+	    return m;
+	}
+    }
+
+  if (gfc_match (" allocator ( ") == MATCH_YES)
+    {
+      m = gfc_match_expr (&allocator);
+      if (m != MATCH_YES)
+	{
+	  gfc_error ("Expected allocator at %C");
+	  return MATCH_ERROR;
+	}
+      if (gfc_match (" ) ") != MATCH_YES)
+	{
+	  gfc_error ("Expected ')' at %C");
+	  gfc_free_expr (allocator);
+	  return MATCH_ERROR;
+	}
+    }
+
+  if (gfc_match_omp_eos () != MATCH_YES)
+    {
+      gfc_free_expr (allocator);
+      gfc_error ("Unexpected junk after $OMP allocate at %C");
+      return MATCH_ERROR;
+    }
+  gfc_omp_namelist *n;
+  for (n = c->lists[OMP_LIST_ALLOCATOR]; n; n = n->next)
+    n->expr = gfc_copy_expr (allocator);
+
+  new_st.op = EXEC_OMP_ALLOCATE;
+  new_st.ext.omp_clauses = c;
+  gfc_free_expr (allocator);
+  return MATCH_YES;
+}
+
 
 /* omp atomic [clause-list]
    - atomic-clause:  read | write | update
@@ -6482,7 +6542,7 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 	"IN_REDUCTION", "TASK_REDUCTION",
 	"DEVICE_RESIDENT", "LINK", "USE_DEVICE",
 	"CACHE", "IS_DEVICE_PTR", "USE_DEVICE_PTR", "USE_DEVICE_ADDR",
-	"NONTEMPORAL", "ALLOCATE", "HAS_DEVICE_ADDR", "ENTER" };
+	"NONTEMPORAL", "ALLOCATE", "ALLOCATOR", "HAS_DEVICE_ADDR", "ENTER" };
   STATIC_ASSERT (ARRAY_SIZE (clause_names) == OMP_LIST_NUM);
 
   if (omp_clauses == NULL)
@@ -9006,6 +9066,8 @@ omp_code_to_statement (gfc_code *code)
 {
   switch (code->op)
     {
+    case EXEC_OMP_ALLOCATE:
+      return ST_OMP_ALLOCATE;
     case EXEC_OMP_PARALLEL:
       return ST_OMP_PARALLEL;
     case EXEC_OMP_PARALLEL_MASKED:
@@ -9486,6 +9548,138 @@ gfc_resolve_oacc_routines (gfc_namespace *ns)
     }
 }
 
+static void
+check_allocate_directive_restrictions (gfc_symbol *sym, gfc_expr *omp_al,
+				       gfc_namespace *ns, locus loc)
+{
+  if (sym->attr.save != SAVE_NONE || sym->attr.in_common == 1
+      || sym->module != NULL)
+    {
+      int tmp;
+      /* Assume that an allocator expression which is an integer constant
+	 refers to one of the predefined allocators.  */
+      if (!omp_al || gfc_extract_int (omp_al, &tmp))
+	gfc_error ("%qs should use predefined allocator at %L", sym->name,
+		   &loc);
+    }
+  if (ns != sym->ns)
+    gfc_error ("%qs is not in the same scope as %<allocate%>"
+	       " directive at %L", sym->name, &loc);
+}
+
+#define EMPTY_VAR_LIST(node) \
+  (node->ext.omp_clauses->lists[OMP_LIST_ALLOCATOR] == NULL)
+
+static void
+gfc_resolve_omp_allocate (gfc_code *code, gfc_namespace *ns)
+{
+  gfc_alloc *al;
+  gfc_omp_namelist *n = NULL;
+  gfc_omp_namelist *cn = NULL;
+  gfc_omp_namelist *p, *tail;
+  gfc_code *cur;
+  hash_set<gfc_symbol*> vars;
+
+  gfc_omp_clauses *clauses = code->ext.omp_clauses;
+  gcc_assert (clauses);
+  cn = clauses->lists[OMP_LIST_ALLOCATOR];
+  gfc_expr *omp_al = cn ? cn->expr : NULL;
+
+  if (omp_al && (omp_al->ts.type != BT_INTEGER
+      || omp_al->ts.kind != gfc_c_intptr_kind))
+    gfc_error ("Expected integer expression of the "
+	       "%<omp_allocator_handle_kind%> kind at %L", &omp_al->where);
+
+  /* Check that variables in this allocate directive are not duplicated
+     in this directive or others coming directly after it.  */
+  for (cur = code; cur != NULL && cur->op == EXEC_OMP_ALLOCATE;
+      cur = cur->next)
+    {
+      gfc_omp_clauses *c = cur->ext.omp_clauses;
+      gcc_assert (c);
+      for (n = c->lists[OMP_LIST_ALLOCATOR]; n; n = n->next)
+	{
+	  if (vars.contains (n->sym))
+	    gfc_error ("%qs is used in multiple %<allocate%> "
+		       "directives at %L", n->sym->name, &cur->loc);
+	  /* This helps us avoid duplicate error messages.  */
+	  if (cur == code)
+	    vars.add (n->sym);
+	}
+    }
+
+  if (cur == NULL || cur->op != EXEC_ALLOCATE)
+    {
+      /* No allocate statement directly follows the allocate directive;
+	 this case is not supported yet.  */
+      for (n = cn; n != NULL; n = n->next)
+	{
+	  gfc_symbol *sym = n->sym;
+	  if (sym->attr.allocatable == 1)
+	    gfc_error ("%qs with ALLOCATABLE attribute is not allowed in "
+		       "%<allocate%> directive at %L as this directive is not"
+		       " associated with an %<allocate%> statement.",
+		       sym->name, &code->loc);
+	}
+      sorry_at (code->loc.lb->location, "%<allocate%> directive that is "
+		"not associated with an %<allocate%> statement is not "
+		"supported.");
+      return;
+    }
+
+  /* If another allocate directive immediately follows this one, neither
+     of them may have an empty variable list.  Checking each adjacent pair
+     this way also covers longer runs of directives and reports the error
+     at the location of the offending directive.  */
+  if (code->next && code->next->op == EXEC_OMP_ALLOCATE
+      && (EMPTY_VAR_LIST (code) || EMPTY_VAR_LIST (code->next)))
+    gfc_error ("Empty variable list is not allowed at %L when multiple "
+	       "%<allocate%> directives are associated with an "
+	       "%<allocate%> statement.",
+	       EMPTY_VAR_LIST (code) ? &code->loc : &code->next->loc);
+
+  if (EMPTY_VAR_LIST (code))
+    {
+      /* Empty namelist means allocate directive applies to all
+	 variables in allocate statement.  'cur' points to associated
+	 allocate statement.  */
+      for (al = cur->ext.alloc.list; al != NULL; al = al->next)
+	if (al->expr && al->expr->symtree && al->expr->symtree->n.sym)
+	  {
+	    check_allocate_directive_restrictions (al->expr->symtree->n.sym,
+						   omp_al, ns, code->loc);
+	    p = gfc_get_omp_namelist ();
+	    p->sym = al->expr->symtree->n.sym;
+	    p->expr = omp_al;
+	    p->where = code->loc;
+	    if (cn == NULL)
+	      cn = tail = p;
+	    else
+	      {
+		tail->next = p;
+		tail = tail->next;
+	      }
+	  }
+      clauses->lists[OMP_LIST_ALLOCATOR] = cn;
+    }
+  else
+    {
+      for (n = cn; n != NULL; n = n->next)
+	{
+	  for (al = cur->ext.alloc.list; al != NULL; al = al->next)
+	    if (al->expr && al->expr->symtree && al->expr->symtree->n.sym
+		&& al->expr->symtree->n.sym == n->sym)
+	      break;
+	  if (al == NULL)
+	    gfc_error ("%qs in %<allocate%> directive at %L is not present "
+		       "in associated %<allocate%> statement.",
+		       n->sym->name, &code->loc);
+	  check_allocate_directive_restrictions (n->sym, omp_al, ns,
+						 code->loc);
+	}
+    }
+}
+
 
 void
 gfc_resolve_oacc_directive (gfc_code *code, gfc_namespace *ns ATTRIBUTE_UNUSED)
@@ -9627,6 +9821,9 @@ gfc_resolve_omp_directive (gfc_code *code, gfc_namespace *ns)
       code->ext.omp_clauses->if_present = false;
       resolve_omp_clauses (code, code->ext.omp_clauses, ns);
       break;
+    case EXEC_OMP_ALLOCATE:
+      gfc_resolve_omp_allocate (code, ns);
+      break;
     default:
       break;
     }
diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc
index 0b4c596996c..97d182d46ad 100644
--- a/gcc/fortran/parse.cc
+++ b/gcc/fortran/parse.cc
@@ -886,6 +886,7 @@ decode_omp_directive (void)
     {
     case 'a':
       matcho ("atomic", gfc_match_omp_atomic, ST_OMP_ATOMIC);
+      matcho ("allocate", gfc_match_omp_allocate, ST_OMP_ALLOCATE);
       break;
     case 'b':
       matcho ("barrier", gfc_match_omp_barrier, ST_OMP_BARRIER);
@@ -1673,9 +1674,9 @@ next_statement (void)
   case ST_OMP_CANCEL: case ST_OMP_CANCELLATION_POINT: case ST_OMP_DEPOBJ: \
   case ST_OMP_TARGET_UPDATE: case ST_OMP_TARGET_ENTER_DATA: \
   case ST_OMP_TARGET_EXIT_DATA: case ST_OMP_ORDERED_DEPEND: case ST_OMP_ERROR: \
-  case ST_ERROR_STOP: case ST_OMP_SCAN: case ST_SYNC_ALL: \
-  case ST_SYNC_IMAGES: case ST_SYNC_MEMORY: case ST_LOCK: case ST_UNLOCK: \
-  case ST_FORM_TEAM: case ST_CHANGE_TEAM: \
+  case ST_OMP_ALLOCATE: case ST_ERROR_STOP: case ST_OMP_SCAN: \
+  case ST_SYNC_ALL: case ST_SYNC_IMAGES: case ST_SYNC_MEMORY: case ST_LOCK: \
+  case ST_UNLOCK: case ST_FORM_TEAM: case ST_CHANGE_TEAM: \
   case ST_END_TEAM: case ST_SYNC_TEAM: \
   case ST_EVENT_POST: case ST_EVENT_WAIT: case ST_FAIL_IMAGE: \
   case ST_OACC_UPDATE: case ST_OACC_WAIT: case ST_OACC_CACHE: \
@@ -2352,6 +2353,9 @@ gfc_ascii_statement (gfc_statement st)
     case ST_OACC_END_ATOMIC:
       p = "!$ACC END ATOMIC";
       break;
+    case ST_OMP_ALLOCATE:
+      p = "!$OMP ALLOCATE";
+      break;
     case ST_OMP_ATOMIC:
       p = "!$OMP ATOMIC";
       break;
diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 2ebf076f730..65f24b88067 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -12368,6 +12368,7 @@ start:
 	  gfc_resolve_oacc_directive (code, ns);
 	  break;
 
+	case EXEC_OMP_ALLOCATE:
 	case EXEC_OMP_ATOMIC:
 	case EXEC_OMP_BARRIER:
 	case EXEC_OMP_CANCEL:
diff --git a/gcc/fortran/st.cc b/gcc/fortran/st.cc
index 73f30c2137f..7b282e96c3d 100644
--- a/gcc/fortran/st.cc
+++ b/gcc/fortran/st.cc
@@ -214,6 +214,7 @@ gfc_free_statement (gfc_code *p)
     case EXEC_OACC_ENTER_DATA:
     case EXEC_OACC_EXIT_DATA:
     case EXEC_OACC_ROUTINE:
+    case EXEC_OMP_ALLOCATE:
     case EXEC_OMP_ATOMIC:
     case EXEC_OMP_CANCEL:
     case EXEC_OMP_CANCELLATION_POINT:
diff --git a/gcc/fortran/trans.cc b/gcc/fortran/trans.cc
index 912a206f2ed..a9d5714be22 100644
--- a/gcc/fortran/trans.cc
+++ b/gcc/fortran/trans.cc
@@ -2174,6 +2174,7 @@ trans_code (gfc_code * code, tree cond)
 	  res = gfc_trans_dt_end (code);
 	  break;
 
+	case EXEC_OMP_ALLOCATE:
 	case EXEC_OMP_ATOMIC:
 	case EXEC_OMP_BARRIER:
 	case EXEC_OMP_CANCEL:
diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-4.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-4.f90
new file mode 100644
index 00000000000..3f512d66495
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/allocate-4.f90
@@ -0,0 +1,112 @@
+! { dg-do compile }
+
+module test
+  integer, allocatable :: mvar1
+  integer, allocatable :: mvar2
+  integer, allocatable :: mvar3
+end module
+
+subroutine foo(x, y)
+  use omp_lib
+  implicit none
+  integer  :: x
+  integer  :: y
+  
+  integer, allocatable :: var1(:)
+  integer, allocatable :: var2(:)
+  integer, allocatable :: var3(:)
+  integer, allocatable :: var4(:)
+  integer, allocatable :: var5(:)
+  integer, allocatable :: var6(:)
+  integer, allocatable :: var7(:)
+  integer, allocatable :: var8(:)
+  integer, allocatable :: var9(:)
+
+  !$omp allocate (var1) allocator(10) ! { dg-error "Expected integer expression of the 'omp_allocator_handle_kind' kind at .1." }
+  allocate (var1(x))
+
+  !$omp allocate (var2)  ! { dg-error "'var2' in 'allocate' directive at .1. is not present in associated 'allocate' statement." }
+  allocate (var3(x))
+
+  !$omp allocate (x) ! { dg-message "sorry, unimplemented: 'allocate' directive that is not associated with an 'allocate' statement is not supported." }
+  x = 2
+
+  !$omp allocate (var4) ! { dg-error "'var4' with ALLOCATABLE attribute is not allowed in 'allocate' directive at .1. as this directive is not associated with an 'allocate' statement." } 
+  ! { dg-message "sorry, unimplemented: 'allocate' directive that is not associated with an 'allocate' statement is not supported." "" { target *-*-* } .-1 }
+  y = 2
+
+  !$omp allocate (var5)
+  !$omp allocate  ! { dg-error "Empty variable list is not allowed at .1. when multiple 'allocate' directives are associated with an 'allocate' statement." }
+  allocate (var5(x))
+
+  !$omp allocate (var6)
+  !$omp allocate (var7)  ! { dg-error "'var7' in 'allocate' directive at .1. is not present in associated 'allocate' statement." }
+  !$omp allocate (var8)  ! { dg-error "'var8' in 'allocate' directive at .1. is not present in associated 'allocate' statement." }
+  allocate (var6(x))
+
+  !$omp allocate (var9)
+  !$omp allocate (var9)  ! { dg-error "'var9' is used in multiple 'allocate' directives at .1." }
+  allocate (var9(x))
+
+end subroutine
+
+function outer(a)
+  IMPLICIT NONE
+
+  integer :: outer, a
+  integer, allocatable :: var1
+
+  outer = inner(a) + 5
+  return
+
+  contains
+
+    integer function inner(x)
+    integer :: x
+    integer, allocatable :: var2
+
+    !$omp allocate (var1, var2) ! { dg-error "'var1' is not in the same scope as 'allocate' directive at .1." }
+    allocate (var1, var2)
+
+    inner = x + 10
+    return
+    end function inner
+
+end function outer
+
+subroutine bar(s)
+  use omp_lib
+  use test
+  integer  :: s
+  integer, save, allocatable :: svar1
+  integer, save, allocatable :: svar2
+  integer, save, allocatable :: svar3
+
+  type (omp_alloctrait) :: traits(3)
+  integer (omp_allocator_handle_kind) :: a
+
+  traits = [omp_alloctrait (omp_atk_alignment, 64), &
+            omp_alloctrait (omp_atk_fallback, omp_atv_null_fb), &
+            omp_alloctrait (omp_atk_pool_size, 8192)]
+  a = omp_init_allocator (omp_default_mem_space, 3, traits)
+  if (a == omp_null_allocator) stop 1
+
+  !$omp allocate (mvar1) allocator(a) ! { dg-error "'mvar1' should use predefined allocator at .1." }
+  allocate (mvar1)
+
+  !$omp allocate (mvar2) ! { dg-error "'mvar2' should use predefined allocator at .1." }
+  allocate (mvar2)
+
+  !$omp allocate (mvar3) allocator(omp_low_lat_mem_alloc)
+  allocate (mvar3)
+
+  !$omp allocate (svar1)  allocator(a) ! { dg-error "'svar1' should use predefined allocator at .1." }
+  allocate (svar1)
+
+  !$omp allocate (svar2) ! { dg-error "'svar2' should use predefined allocator at .1." }
+  allocate (svar2)
+
+  !$omp allocate (svar3) allocator(omp_low_lat_mem_alloc)
+  allocate (svar3)
+end subroutine
+
diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-5.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-5.f90
new file mode 100644
index 00000000000..761b6dede28
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/allocate-5.f90
@@ -0,0 +1,73 @@
+! { dg-do compile }
+
+module omp_lib_kinds
+  use iso_c_binding, only: c_int, c_intptr_t
+  implicit none
+  private :: c_int, c_intptr_t
+  integer, parameter :: omp_allocator_handle_kind = c_intptr_t
+
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_null_allocator = 0
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_default_mem_alloc = 1
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_large_cap_mem_alloc = 2
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_const_mem_alloc = 3
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_high_bw_mem_alloc = 4
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_low_lat_mem_alloc = 5
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_cgroup_mem_alloc = 6
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_pteam_mem_alloc = 7
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_thread_mem_alloc = 8
+end module
+
+subroutine foo(x, y)
+  use omp_lib_kinds
+  implicit none
+  integer  :: x
+  integer  :: y
+
+  integer, allocatable :: var1(:)
+  integer, allocatable :: var2(:)
+  integer, allocatable :: var3(:)
+  integer, allocatable :: var4(:,:)
+  integer, allocatable :: var5(:)
+  integer, allocatable :: var6(:)
+  integer, allocatable :: var7(:)
+  integer, allocatable :: var8(:)
+  integer, allocatable :: var9(:)
+  integer, allocatable :: var10(:)
+  integer, allocatable :: var11(:)
+  integer, allocatable :: var12(:)
+
+  !$omp allocate (var1) allocator(omp_default_mem_alloc)
+  allocate (var1(x))
+  
+  !$omp allocate (var2)
+  allocate (var2(x))
+
+  !$omp allocate (var3, var4) allocator(omp_large_cap_mem_alloc)
+  allocate (var3(x),var4(x,y))
+
+  !$omp allocate()
+  allocate (var5(x))
+
+  !$omp allocate
+  allocate (var6(x))
+
+  !$omp allocate () allocator(omp_default_mem_alloc)
+  allocate (var7(x))
+
+  !$omp allocate allocator(omp_default_mem_alloc)
+  allocate (var8(x))
+
+  !$omp allocate (var9) allocator(omp_default_mem_alloc)
+  !$omp allocate (var10) allocator(omp_large_cap_mem_alloc)
+  allocate (var9(x), var10(x))
+
+end subroutine


* [PATCH 11/17] Translate allocate directive (OpenMP 5.0).
  2022-07-07 10:34 [PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators Andrew Stubbs
                   ` (9 preceding siblings ...)
  2022-07-07 10:34 ` [PATCH 10/17] Add parsing support for allocate directive (OpenMP 5.0) Andrew Stubbs
@ 2022-07-07 10:34 ` Andrew Stubbs
  2022-07-07 10:34 ` [PATCH 12/17] Handle cleanup of omp allocated variables " Andrew Stubbs
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Andrew Stubbs @ 2022-07-07 10:34 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1132 bytes --]


gcc/fortran/ChangeLog:

	* trans-openmp.cc (gfc_trans_omp_clauses): Handle OMP_LIST_ALLOCATOR.
	(gfc_trans_omp_allocate): New function.
	(gfc_trans_omp_directive): Handle EXEC_OMP_ALLOCATE.

gcc/ChangeLog:

	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_ALLOCATOR.
	* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_ALLOCATOR.
	(dump_generic_node): Handle OMP_ALLOCATE.
	* tree.def (OMP_ALLOCATE): New.
	* tree.h (OMP_ALLOCATE_CLAUSES): Likewise.
	(OMP_ALLOCATE_DECL): Likewise.
	(OMP_ALLOCATE_ALLOCATOR): Likewise.
	* tree.cc (omp_clause_num_ops): Add entry for OMP_CLAUSE_ALLOCATOR.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/allocate-6.f90: New test.
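
A rough model of the new OMP_CLAUSE_ALLOCATOR node may help when reading
the trans-openmp.cc and tree-pretty-print.cc hunks below.  This is a
hypothetical sketch that uses strings in place of trees: operand 0
(OMP_ALLOCATE_DECL) is the variable and operand 1 (OMP_ALLOCATE_ALLOCATOR)
is the optional allocator expression, matching the 2 operands registered
in omp_clause_num_ops.  As in gfc_trans_omp_clauses, only referenced
symbols produce a clause.  Clause order and the gfc_evaluate_now call on
the allocator are not modeled.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an OMP_CLAUSE_ALLOCATOR tree node.
struct AllocatorClause {
  std::string decl;                      // operand 0: OMP_ALLOCATE_DECL
  std::optional<std::string> allocator;  // operand 1: OMP_ALLOCATE_ALLOCATOR
};

// Hypothetical stand-in for one gfc_omp_namelist entry of the directive.
struct NamelistEntry {
  std::string name;
  bool referenced;                       // n->sym->attr.referenced
  std::optional<std::string> allocator;  // n->expr, if an allocator was given
};

// Like the OMP_LIST_ALLOCATOR case: skip unreferenced symbols and build
// one clause per remaining variable.
std::vector<AllocatorClause>
build_clauses(const std::vector<NamelistEntry> &list) {
  std::vector<AllocatorClause> clauses;
  for (const auto &n : list)
    if (n.referenced)
      clauses.push_back({n.name, n.allocator});
  return clauses;
}

// Same text as the OMP_CLAUSE_ALLOCATOR case in dump_omp_clause:
// "(decl)" or "(decl:allocator(expr))".
std::string dump_allocator_clause(const AllocatorClause &c) {
  std::string out = "(" + c.decl;
  if (c.allocator)
    out += ":allocator(" + *c.allocator + ")";
  out += ")";
  return out;
}
```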
---
 gcc/fortran/trans-openmp.cc                   | 44 ++++++++++++
 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 | 72 +++++++++++++++++++
 gcc/tree-core.h                               |  3 +
 gcc/tree-pretty-print.cc                      | 19 +++++
 gcc/tree.cc                                   |  1 +
 gcc/tree.def                                  |  4 ++
 gcc/tree.h                                    | 11 +++
 7 files changed, 154 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90


[-- Attachment #2: 0011-Translate-allocate-directive-OpenMP-5.0.patch --]
[-- Type: text/x-patch; name="0011-Translate-allocate-directive-OpenMP-5.0.patch", Size: 8152 bytes --]

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index de27ed52c02..3ee63e416ed 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -2728,6 +2728,28 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 		  }
 	      }
 	  break;
+	case OMP_LIST_ALLOCATOR:
+	  for (; n != NULL; n = n->next)
+	    if (n->sym->attr.referenced)
+	      {
+		tree t = gfc_trans_omp_variable (n->sym, false);
+		if (t != error_mark_node)
+		  {
+		    tree node = build_omp_clause (input_location,
+						  OMP_CLAUSE_ALLOCATOR);
+		    OMP_ALLOCATE_DECL (node) = t;
+		    if (n->expr)
+		      {
+			tree allocator_;
+			gfc_init_se (&se, NULL);
+			gfc_conv_expr (&se, n->expr);
+			allocator_ = gfc_evaluate_now (se.expr, block);
+			OMP_ALLOCATE_ALLOCATOR (node) = allocator_;
+		      }
+		    omp_clauses = gfc_trans_add_clause (node, omp_clauses);
+		  }
+	      }
+	  break;
 	case OMP_LIST_LINEAR:
 	  {
 	    gfc_expr *last_step_expr = NULL;
@@ -4982,6 +5004,26 @@ gfc_trans_omp_atomic (gfc_code *code)
   return gfc_finish_block (&block);
 }
 
+static tree
+gfc_trans_omp_allocate (gfc_code *code)
+{
+  stmtblock_t block;
+  tree stmt;
+
+  gfc_omp_clauses *clauses = code->ext.omp_clauses;
+  gcc_assert (clauses);
+
+  gfc_start_block (&block);
+  stmt = make_node (OMP_ALLOCATE);
+  TREE_TYPE (stmt) = void_type_node;
+  OMP_ALLOCATE_CLAUSES (stmt) = gfc_trans_omp_clauses (&block, clauses,
+						       code->loc, false,
+						       true);
+  gfc_add_expr_to_block (&block, stmt);
+  gfc_merge_block_scope (&block);
+  return gfc_finish_block (&block);
+}
+
 static tree
 gfc_trans_omp_barrier (void)
 {
@@ -7488,6 +7530,8 @@ gfc_trans_omp_directive (gfc_code *code)
 {
   switch (code->op)
     {
+    case EXEC_OMP_ALLOCATE:
+      return gfc_trans_omp_allocate (code);
     case EXEC_OMP_ATOMIC:
       return gfc_trans_omp_atomic (code);
     case EXEC_OMP_BARRIER:
diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
new file mode 100644
index 00000000000..2de2b52ee44
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
@@ -0,0 +1,72 @@
+! { dg-do compile }
+! { dg-additional-options "-fdump-tree-original" }
+
+module omp_lib_kinds
+  use iso_c_binding, only: c_int, c_intptr_t
+  implicit none
+  private :: c_int, c_intptr_t
+  integer, parameter :: omp_allocator_handle_kind = c_intptr_t
+
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_null_allocator = 0
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_default_mem_alloc = 1
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_large_cap_mem_alloc = 2
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_const_mem_alloc = 3
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_high_bw_mem_alloc = 4
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_low_lat_mem_alloc = 5
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_cgroup_mem_alloc = 6
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_pteam_mem_alloc = 7
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_thread_mem_alloc = 8
+end module
+
+
+subroutine foo(x, y, al)
+  use omp_lib_kinds
+  implicit none
+  
+type :: my_type
+  integer :: i
+  integer :: j
+  real :: x
+end type
+
+  integer  :: x
+  integer  :: y
+  integer (kind=omp_allocator_handle_kind) :: al
+
+  integer, allocatable :: var1
+  integer, allocatable :: var2
+  real, allocatable :: var3(:,:)
+  type (my_type), allocatable :: var4
+  integer, pointer :: pii, parr(:)
+
+  character, allocatable :: str1a, str1aarr(:) 
+  character(len=5), allocatable :: str5a, str5aarr(:)
+  
+  !$omp allocate
+  allocate(str1a, str1aarr(10), str5a, str5aarr(10))
+
+  !$omp allocate (var1) allocator(omp_default_mem_alloc)
+  !$omp allocate (var2) allocator(omp_large_cap_mem_alloc)
+  allocate (var1, var2)
+
+  !$omp allocate (var4)  allocator(omp_low_lat_mem_alloc)
+  allocate (var4)
+  var4%i = 5
+
+  !$omp allocate (var3)  allocator(omp_low_lat_mem_alloc)
+  allocate (var3(x,y))
+
+  !$omp allocate
+  allocate(pii, parr(5))
+end subroutine
+
+! { dg-final { scan-tree-dump-times "#pragma omp allocate" 6 "original" } }
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index ab5fa01e5cb..774bf0d7658 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -522,6 +522,9 @@ enum omp_clause_code {
 
   /* OpenACC clause: nohost.  */
   OMP_CLAUSE_NOHOST,
+
+  /* OpenMP clause: allocator.  */
+  OMP_CLAUSE_ALLOCATOR
 };
 
 #undef DEFTREESTRUCT
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 47371d8bcbe..4d21babbd34 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -767,6 +767,20 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
       pp_right_paren (pp);
       break;
 
+    case OMP_CLAUSE_ALLOCATOR:
+      pp_string (pp, "(");
+      dump_generic_node (pp, OMP_ALLOCATE_DECL (clause),
+			 spc, flags, false);
+      if (OMP_ALLOCATE_ALLOCATOR (clause))
+	{
+	  pp_string (pp, ":allocator(");
+	  dump_generic_node (pp, OMP_ALLOCATE_ALLOCATOR (clause),
+			     spc, flags, false);
+	  pp_right_paren (pp);
+	}
+      pp_right_paren (pp);
+      break;
+
     case OMP_CLAUSE_ALLOCATE:
       pp_string (pp, "allocate(");
       if (OMP_CLAUSE_ALLOCATE_ALLOCATOR (clause))
@@ -3525,6 +3539,11 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
       dump_omp_clauses (pp, OACC_CACHE_CLAUSES (node), spc, flags);
       break;
 
+    case OMP_ALLOCATE:
+      pp_string (pp, "#pragma omp allocate ");
+      dump_omp_clauses (pp, OMP_ALLOCATE_CLAUSES (node), spc, flags);
+      break;
+
     case OMP_PARALLEL:
       pp_string (pp, "#pragma omp parallel");
       dump_omp_clauses (pp, OMP_PARALLEL_CLAUSES (node), spc, flags);
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 84000dd8b69..6dc1cf4d9b3 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -351,6 +351,7 @@ unsigned const char omp_clause_num_ops[] =
   0, /* OMP_CLAUSE_IF_PRESENT */
   0, /* OMP_CLAUSE_FINALIZE */
   0, /* OMP_CLAUSE_NOHOST */
+  2, /* OMP_CLAUSE_ALLOCATOR */
 };
 
 const char * const omp_clause_code_name[] =
diff --git a/gcc/tree.def b/gcc/tree.def
index 62650b6934b..b4d2f7a575d 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1307,6 +1307,10 @@ DEFTREECODE (OMP_ATOMIC_READ, "omp_atomic_read", tcc_statement, 1)
 DEFTREECODE (OMP_ATOMIC_CAPTURE_OLD, "omp_atomic_capture_old", tcc_statement, 2)
 DEFTREECODE (OMP_ATOMIC_CAPTURE_NEW, "omp_atomic_capture_new", tcc_statement, 2)
 
+/* OpenMP - #pragma omp allocate
+   Operand 0: Clauses.  */
+DEFTREECODE (OMP_ALLOCATE, "omp allocate", tcc_statement, 1)
+
 /* OpenMP clauses.  */
 DEFTREECODE (OMP_CLAUSE, "omp_clause", tcc_exceptional, 0)
 
diff --git a/gcc/tree.h b/gcc/tree.h
index 6f6ad5a3a5f..b2575c18693 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1466,6 +1466,8 @@ class auto_suppress_location_wrappers
 #define OACC_UPDATE_CLAUSES(NODE) \
   TREE_OPERAND (OACC_UPDATE_CHECK (NODE), 0)
 
+#define OMP_ALLOCATE_CLAUSES(NODE) TREE_OPERAND (OMP_ALLOCATE_CHECK (NODE), 0)
+
 #define OMP_PARALLEL_BODY(NODE)    TREE_OPERAND (OMP_PARALLEL_CHECK (NODE), 0)
 #define OMP_PARALLEL_CLAUSES(NODE) TREE_OPERAND (OMP_PARALLEL_CHECK (NODE), 1)
 
@@ -1872,6 +1874,15 @@ class auto_suppress_location_wrappers
 #define OMP_CLAUSE_ALLOCATE_ALIGN(NODE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_ALLOCATE), 2)
 
+/* Maybe we could use OMP_CLAUSE_DECL, but I am not sure where to place
+   OMP_CLAUSE_ALLOCATOR in omp_clause_code.  */
+
+#define OMP_ALLOCATE_DECL(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_ALLOCATOR), 0)
+
+#define OMP_ALLOCATE_ALLOCATOR(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_ALLOCATOR), 1)
+
 /* True if an ALLOCATE clause was present on a combined or composite
    construct and the code for splitting the clauses has already performed
    checking if the listed variable has explicit privatization on the


* [PATCH 12/17] Handle cleanup of omp allocated variables (OpenMP 5.0).
  2022-07-07 10:34 [PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators Andrew Stubbs
                   ` (10 preceding siblings ...)
  2022-07-07 10:34 ` [PATCH 11/17] Translate " Andrew Stubbs
@ 2022-07-07 10:34 ` Andrew Stubbs
  2022-07-07 10:34 ` [PATCH 13/17] Gimplify allocate directive " Andrew Stubbs
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Andrew Stubbs @ 2022-07-07 10:34 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1777 bytes --]


Currently we only handle an omp allocate directive that is associated
with an allocate statement.  That statement results in malloc and free
calls.  The malloc calls are easy to find, as they are in the same block
as the allocate directive, but the free calls are in a separate cleanup
block.  To help later passes find them, an allocate directive with
kind=free is generated in the cleanup block.  The normal allocate
directive is given kind=allocate.
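
The mechanism can be sketched roughly as follows.  This is a simplified,
hypothetical model, not gfortran code: Symbol and Directive stand in for
gfc_symbol and gfc_code.  needs_free_directive mirrors the filter in
prepare_omp_allocated_var_list_for_cleanup, and classify mirrors the test
in gfc_trans_omp_allocate.  That test only works because
gfc_match_omp_allocate leaves resolved_sym NULL for user-written
directives, so only the synthetic cleanup directive can match.

```cpp
#include <string>

// Hypothetical stand-in for the gfc_symbol attributes that are checked.
struct Symbol {
  std::string name;
  bool allocatable = false;
  bool save = false;    // SAVE attribute
  bool result = false;  // function result variable
};

// Only ALLOCATABLE variables that are not SAVE, not a function result,
// and not in the main program are queued for a kind=free directive.
bool needs_free_directive(const Symbol &s, bool in_main_program) {
  return s.allocatable && !s.save && !s.result && !in_main_program;
}

enum class AllocateKind { Allocate, Free };

// Hypothetical stand-in for the gfc_code fields gfc_trans_omp_allocate tests.
struct Directive {
  bool has_next = false;          // code->next != NULL
  bool has_block = false;         // code->block != NULL
  bool has_resolved_sym = false;  // code->resolved_sym != NULL
};

// The cleanup directive built in gfc_trans_deferred_vars is standalone and
// carries resolved_sym; every other directive gets kind=allocate.
AllocateKind classify(const Directive &d) {
  if (!d.has_next && !d.has_block && d.has_resolved_sym)
    return AllocateKind::Free;
  return AllocateKind::Allocate;
}

// Text that the tree pretty-printer emits after "#pragma omp allocate ".
std::string kind_tag(AllocateKind k) {
  return k == AllocateKind::Free ? "(kind=free) " : "(kind=allocate) ";
}
```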

gcc/fortran/ChangeLog:

	* gfortran.h (struct gfc_symbol): Declare new members
	omp_allocated and omp_allocated_end.
	* openmp.cc (gfc_match_omp_allocate): Set new_st.resolved_sym to
	NULL.
	(prepare_omp_allocated_var_list_for_cleanup): New function.
	(gfc_resolve_omp_allocate): Call it.
	* trans-decl.cc (gfc_trans_deferred_vars): Process omp_allocated.
	* trans-openmp.cc (gfc_trans_omp_allocate): Set kind for the stmt
	generated for allocate directive.

gcc/ChangeLog:

	* tree-core.h (struct tree_base): Add comments.
	* tree-pretty-print.cc (dump_generic_node): Handle allocate directive
	kind.
	* tree.h (OMP_ALLOCATE_KIND_ALLOCATE): New define.
	(OMP_ALLOCATE_KIND_FREE): Likewise.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/allocate-6.f90: Test kind of allocate directive.
---
 gcc/fortran/gfortran.h                        |  1 +
 gcc/fortran/openmp.cc                         | 30 +++++++++++++++++++
 gcc/fortran/trans-decl.cc                     | 20 +++++++++++++
 gcc/fortran/trans-openmp.cc                   |  6 ++++
 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 |  3 +-
 gcc/tree-core.h                               |  6 ++++
 gcc/tree-pretty-print.cc                      |  4 +++
 gcc/tree.h                                    |  4 +++
 8 files changed, 73 insertions(+), 1 deletion(-)


[-- Attachment #2: 0012-Handle-cleanup-of-omp-allocated-variables-OpenMP-5.0.patch --]
[-- Type: text/x-patch; name="0012-Handle-cleanup-of-omp-allocated-variables-OpenMP-5.0.patch", Size: 6228 bytes --]

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 755469185a6..c6f58341cf3 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1829,6 +1829,7 @@ typedef struct gfc_symbol
   gfc_array_spec *as;
   struct gfc_symbol *result;	/* function result symbol */
   gfc_component *components;	/* Derived type components */
+  gfc_omp_namelist *omp_allocated, *omp_allocated_end;
 
   /* Defined only for Cray pointees; points to their pointer.  */
   struct gfc_symbol *cp_pointer;
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 38003890bb0..4c94bc763b5 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -6057,6 +6057,7 @@ gfc_match_omp_allocate (void)
 
   new_st.op = EXEC_OMP_ALLOCATE;
   new_st.ext.omp_clauses = c;
+  new_st.resolved_sym = NULL;
   gfc_free_expr (allocator);
   return MATCH_YES;
 }
@@ -9548,6 +9549,34 @@ gfc_resolve_oacc_routines (gfc_namespace *ns)
     }
 }
 
+static void
+prepare_omp_allocated_var_list_for_cleanup (gfc_omp_namelist *cn, locus loc)
+{
+  gfc_symbol *proc = cn->sym->ns->proc_name;
+  gfc_omp_namelist *p, *n;
+
+  for (n = cn; n; n = n->next)
+    {
+      if (n->sym->attr.allocatable && !n->sym->attr.save
+	  && !n->sym->attr.result && !proc->attr.is_main_program)
+	{
+	  p = gfc_get_omp_namelist ();
+	  p->sym = n->sym;
+	  p->expr = gfc_copy_expr (n->expr);
+	  p->where = loc;
+	  p->next = NULL;
+	  if (proc->omp_allocated == NULL)
+	    proc->omp_allocated_end = proc->omp_allocated = p;
+	  else
+	    {
+	      proc->omp_allocated_end->next = p;
+	      proc->omp_allocated_end = p;
+	    }
+
+	}
+    }
+}
+
 static void
 check_allocate_directive_restrictions (gfc_symbol *sym, gfc_expr *omp_al,
 				       gfc_namespace *ns, locus loc)
@@ -9678,6 +9707,7 @@ gfc_resolve_omp_allocate (gfc_code *code, gfc_namespace *ns)
 						 code->loc);
 	}
     }
+  prepare_omp_allocated_var_list_for_cleanup (cn, code->loc);
 }
 
 
diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index 6493cc2f6b1..326365f22fc 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -4588,6 +4588,26 @@ gfc_trans_deferred_vars (gfc_symbol * proc_sym, gfc_wrapped_block * block)
 	  }
     }
 
+  /* Generate a dummy allocate pragma of kind free so that later passes
+     can find the cleanup of variables allocated by an allocate statement
+     that is associated with an allocate directive.  */
+
+  if (proc_sym->omp_allocated)
+    {
+      gfc_clear_new_st ();
+      new_st.op = EXEC_OMP_ALLOCATE;
+      gfc_omp_clauses *c = gfc_get_omp_clauses ();
+      c->lists[OMP_LIST_ALLOCATOR] = proc_sym->omp_allocated;
+      new_st.ext.omp_clauses = c;
+      /* This is just a hacky way to tell the handler that we are
+	 dealing with cleanup here.  It saves us from using another field
+	 for it.  */
+      new_st.resolved_sym = proc_sym->omp_allocated->sym;
+      gfc_add_init_cleanup (block, NULL,
+			    gfc_trans_omp_directive (&new_st));
+      gfc_free_omp_clauses (c);
+      proc_sym->omp_allocated = NULL;
+    }
 
   /* Initialize the INTENT(OUT) derived type dummy arguments.  This
      should be done here so that the offsets and lbounds of arrays
diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 3ee63e416ed..ab3c0c620b7 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -5019,6 +5019,12 @@ gfc_trans_omp_allocate (gfc_code *code)
   OMP_ALLOCATE_CLAUSES (stmt) = gfc_trans_omp_clauses (&block, clauses,
 						       code->loc, false,
 						       true);
+  if (code->next == NULL && code->block == NULL
+      && code->resolved_sym != NULL)
+    OMP_ALLOCATE_KIND_FREE (stmt) = 1;
+  else
+    OMP_ALLOCATE_KIND_ALLOCATE (stmt) = 1;
+
   gfc_add_expr_to_block (&block, stmt);
   gfc_merge_block_scope (&block);
   return gfc_finish_block (&block);
diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
index 2de2b52ee44..0eb35178e03 100644
--- a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
@@ -69,4 +69,5 @@ end type
   allocate(pii, parr(5))
 end subroutine
 
-! { dg-final { scan-tree-dump-times "#pragma omp allocate" 6 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp allocate \\(kind=allocate\\)" 6 "original" } }
+! { dg-final { scan-tree-dump "#pragma omp allocate \\(kind=free\\)" "original" } }
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 774bf0d7658..b0d5c074552 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1257,6 +1257,9 @@ struct GTY(()) tree_base {
        EXPR_LOCATION_WRAPPER_P in
 	   NON_LVALUE_EXPR, VIEW_CONVERT_EXPR
 
+       OMP_ALLOCATE_KIND_ALLOCATE in
+	   OMP_ALLOCATE
+
    private_flag:
 
        TREE_PRIVATE in
@@ -1283,6 +1286,9 @@ struct GTY(()) tree_base {
        ENUM_IS_OPAQUE in
 	   ENUMERAL_TYPE
 
+       OMP_ALLOCATE_KIND_FREE in
+	   OMP_ALLOCATE
+
    protected_flag:
 
        TREE_PROTECTED in
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 4d21babbd34..23dd45de556 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -3541,6 +3541,10 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
 
     case OMP_ALLOCATE:
       pp_string (pp, "#pragma omp allocate ");
+      if (OMP_ALLOCATE_KIND_ALLOCATE (node))
+	pp_string (pp, "(kind=allocate) ");
+      else if (OMP_ALLOCATE_KIND_FREE (node))
+	pp_string (pp, "(kind=free) ");
       dump_omp_clauses (pp, OMP_ALLOCATE_CLAUSES (node), spc, flags);
       break;
 
diff --git a/gcc/tree.h b/gcc/tree.h
index b2575c18693..1b67505f974 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1467,6 +1467,10 @@ class auto_suppress_location_wrappers
   TREE_OPERAND (OACC_UPDATE_CHECK (NODE), 0)
 
 #define OMP_ALLOCATE_CLAUSES(NODE) TREE_OPERAND (OMP_ALLOCATE_CHECK (NODE), 0)
+#define OMP_ALLOCATE_KIND_ALLOCATE(NODE) \
+  (OMP_ALLOCATE_CHECK (NODE)->base.public_flag)
+#define OMP_ALLOCATE_KIND_FREE(NODE) \
+  (OMP_ALLOCATE_CHECK (NODE)->base.private_flag)
 
 #define OMP_PARALLEL_BODY(NODE)    TREE_OPERAND (OMP_PARALLEL_CHECK (NODE), 0)
 #define OMP_PARALLEL_CLAUSES(NODE) TREE_OPERAND (OMP_PARALLEL_CHECK (NODE), 1)


* [PATCH 13/17] Gimplify allocate directive (OpenMP 5.0).
  2022-07-07 10:34 [PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators Andrew Stubbs
                   ` (11 preceding siblings ...)
  2022-07-07 10:34 ` [PATCH 12/17] Handle cleanup of omp allocated variables " Andrew Stubbs
@ 2022-07-07 10:34 ` Andrew Stubbs
  2022-07-07 10:34 ` [PATCH 14/17] Lower " Andrew Stubbs
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Andrew Stubbs @ 2022-07-07 10:34 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1568 bytes --]


gcc/ChangeLog:

	* doc/gimple.texi: Describe GIMPLE_OMP_ALLOCATE.
	* gimple-pretty-print.cc (dump_gimple_omp_allocate): New function.
	(pp_gimple_stmt_1): Call it.
	* gimple.cc (gimple_build_omp_allocate): New function.
	* gimple.def (GIMPLE_OMP_ALLOCATE): New node.
	* gimple.h (enum gf_mask): Add GF_OMP_ALLOCATE_KIND_MASK,
	GF_OMP_ALLOCATE_KIND_ALLOCATE and GF_OMP_ALLOCATE_KIND_FREE.
	(struct gomp_allocate): New.
	(is_a_helper <gomp_allocate *>::test): New.
	(is_a_helper <const gomp_allocate *>::test): New.
	(gimple_build_omp_allocate): Declare.
	(gimple_omp_subcode): Replace GIMPLE_OMP_TEAMS with
	GIMPLE_OMP_ALLOCATE.
	(gimple_omp_allocate_set_clauses): New.
	(gimple_omp_allocate_set_kind): Likewise.
	(gimple_omp_allocate_clauses): Likewise.
	(gimple_omp_allocate_kind): Likewise.
	(CASE_GIMPLE_OMP): Add GIMPLE_OMP_ALLOCATE.
	* gimplify.cc (gimplify_omp_allocate): New.
	(gimplify_expr): Call it.
	* gsstruct.def (GSS_OMP_ALLOCATE): Define.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/allocate-6.f90: Add tests.
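
GIMPLE_OMP_ALLOCATE keeps its kind in the low two bits of the statement
subcode.  Because it is defined after GIMPLE_OMP_TEAMS in gimple.def, the
range check in gimple_omp_subcode now ends at GIMPLE_OMP_ALLOCATE.  The
masking can be sketched as below.  FakeGimple is a hypothetical stand-in
with a plain unsigned in place of the real subcode field, and get_kind is
an assumed equivalent of gimple_omp_allocate_kind, whose body is not
shown here.

```cpp
// Values copied from the gf_mask additions in this patch.
enum : unsigned {
  GF_OMP_ALLOCATE_KIND_MASK = (1u << 2) - 1,  // low two bits
  GF_OMP_ALLOCATE_KIND_ALLOCATE = 1,
  GF_OMP_ALLOCATE_KIND_FREE = 2,
};

// Hypothetical stand-in for the gimple base; only subcode matters here.
struct FakeGimple {
  unsigned subcode = 0;
};

// Same read-modify-write as gimple_omp_allocate_set_kind: clear the kind
// bits, then OR in the new kind, leaving other subcode bits untouched.
void set_kind(FakeGimple &g, unsigned kind) {
  g.subcode = (g.subcode & ~GF_OMP_ALLOCATE_KIND_MASK)
              | (kind & GF_OMP_ALLOCATE_KIND_MASK);
}

// Assumed counterpart of gimple_omp_allocate_kind: extract the kind bits.
unsigned get_kind(const FakeGimple &g) {
  return g.subcode & GF_OMP_ALLOCATE_KIND_MASK;
}
```

Sharing the low bits with values such as GF_OMP_PARALLEL_COMBINED is
harmless because the meaning of subcode depends on the statement code.
As in the setter above, an out-of-range kind is silently truncated by the
mask rather than rejected.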
---
 gcc/doc/gimple.texi                           | 38 +++++++++++-
 gcc/gimple-pretty-print.cc                    | 37 ++++++++++++
 gcc/gimple.cc                                 | 12 ++++
 gcc/gimple.def                                |  6 ++
 gcc/gimple.h                                  | 60 ++++++++++++++++++-
 gcc/gimplify.cc                               | 19 ++++++
 gcc/gsstruct.def                              |  1 +
 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 |  4 +-
 8 files changed, 173 insertions(+), 4 deletions(-)


[-- Attachment #2: 0013-Gimplify-allocate-directive-OpenMP-5.0.patch --]
[-- Type: text/x-patch; name="0013-Gimplify-allocate-directive-OpenMP-5.0.patch", Size: 11863 bytes --]

diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
index dd9149377f3..67b9061f3a7 100644
--- a/gcc/doc/gimple.texi
+++ b/gcc/doc/gimple.texi
@@ -420,6 +420,9 @@ kinds, along with their relationships to @code{GSS_} values (layouts) and
      + gomp_continue
      |        layout: GSS_OMP_CONTINUE, code: GIMPLE_OMP_CONTINUE
      |
+     + gomp_allocate
+     |        layout: GSS_OMP_ALLOCATE, code: GIMPLE_OMP_ALLOCATE
+     |
      + gomp_atomic_load
      |        layout: GSS_OMP_ATOMIC_LOAD, code: GIMPLE_OMP_ATOMIC_LOAD
      |
@@ -454,6 +457,7 @@ The following table briefly describes the GIMPLE instruction set.
 @item @code{GIMPLE_GOTO}		@tab x			@tab x
 @item @code{GIMPLE_LABEL}		@tab x			@tab x
 @item @code{GIMPLE_NOP}			@tab x			@tab x
+@item @code{GIMPLE_OMP_ALLOCATE}	@tab x			@tab x
 @item @code{GIMPLE_OMP_ATOMIC_LOAD}	@tab x			@tab x
 @item @code{GIMPLE_OMP_ATOMIC_STORE}	@tab x			@tab x
 @item @code{GIMPLE_OMP_CONTINUE}	@tab x			@tab x
@@ -1029,6 +1033,7 @@ Return a deep copy of statement @code{STMT}.
 * @code{GIMPLE_LABEL}::
 * @code{GIMPLE_GOTO}::
 * @code{GIMPLE_NOP}::
+* @code{GIMPLE_OMP_ALLOCATE}::
 * @code{GIMPLE_OMP_ATOMIC_LOAD}::
 * @code{GIMPLE_OMP_ATOMIC_STORE}::
 * @code{GIMPLE_OMP_CONTINUE}::
@@ -1729,6 +1734,38 @@ Build a @code{GIMPLE_NOP} statement.
 Returns @code{TRUE} if statement @code{G} is a @code{GIMPLE_NOP}.
 @end deftypefn
 
+@node @code{GIMPLE_OMP_ALLOCATE}
+@subsection @code{GIMPLE_OMP_ALLOCATE}
+@cindex @code{GIMPLE_OMP_ALLOCATE}
+
+@deftypefn {GIMPLE function} gomp_allocate *gimple_build_omp_allocate ( @
+tree clauses, int kind)
+Build a @code{GIMPLE_OMP_ALLOCATE} statement.  @code{CLAUSES} is the chain
+of clauses associated with this node.  @code{KIND} is
+@code{GF_OMP_ALLOCATE_KIND_ALLOCATE} if this directive allocates memory,
+or @code{GF_OMP_ALLOCATE_KIND_FREE} if it deallocates memory.
+@end deftypefn
+
+@deftypefn {GIMPLE function} void gimple_omp_allocate_set_clauses ( @
+gomp_allocate *g, tree clauses)
+Set the @code{CLAUSES} for a @code{GIMPLE_OMP_ALLOCATE}.
+@end deftypefn
+
+@deftypefn {GIMPLE function} tree gimple_omp_allocate_clauses ( @
+const gomp_allocate *g)
+Get the @code{CLAUSES} of a @code{GIMPLE_OMP_ALLOCATE}.
+@end deftypefn
+
+@deftypefn {GIMPLE function} void gimple_omp_allocate_set_kind ( @
+gomp_allocate *g, int kind)
+Set the @code{KIND} for a @code{GIMPLE_OMP_ALLOCATE}.
+@end deftypefn
+
+@deftypefn {GIMPLE function} int gimple_omp_allocate_kind ( @
+const gomp_allocate *g)
+Get the @code{KIND} of a @code{GIMPLE_OMP_ALLOCATE}.
+@end deftypefn
+
 @node @code{GIMPLE_OMP_ATOMIC_LOAD}
 @subsection @code{GIMPLE_OMP_ATOMIC_LOAD}
 @cindex @code{GIMPLE_OMP_ATOMIC_LOAD}
@@ -1760,7 +1797,6 @@ const gomp_atomic_load *g)
 Get the @code{RHS} of an atomic set.
 @end deftypefn
 
-
 @node @code{GIMPLE_OMP_ATOMIC_STORE}
 @subsection @code{GIMPLE_OMP_ATOMIC_STORE}
 @cindex @code{GIMPLE_OMP_ATOMIC_STORE}
diff --git a/gcc/gimple-pretty-print.cc b/gcc/gimple-pretty-print.cc
index ebd87b20a0a..bb961a900df 100644
--- a/gcc/gimple-pretty-print.cc
+++ b/gcc/gimple-pretty-print.cc
@@ -1967,6 +1967,38 @@ dump_gimple_omp_critical (pretty_printer *buffer, const gomp_critical *gs,
     }
 }
 
+static void
+dump_gimple_omp_allocate (pretty_printer *buffer, const gomp_allocate *gs,
+			  int spc, dump_flags_t flags)
+{
+  if (flags & TDF_RAW)
+    {
+      const char *kind = "";
+      switch (gimple_omp_allocate_kind (gs))
+	{
+	case GF_OMP_ALLOCATE_KIND_ALLOCATE:
+	  kind = "allocate";
+	  break;
+	case GF_OMP_ALLOCATE_KIND_FREE:
+	  kind = "free";
+	  break;
+	}
+      dump_gimple_fmt (buffer, spc, flags, "%G <kind:%s CLAUSES <", gs, kind);
+      dump_omp_clauses (buffer, gimple_omp_allocate_clauses (gs), spc, flags);
+      dump_gimple_fmt (buffer, spc, flags, " > >");
+    }
+  else
+    {
+      pp_string (buffer, "#pragma omp allocate ");
+      if (gimple_omp_allocate_kind (gs) == GF_OMP_ALLOCATE_KIND_ALLOCATE)
+	pp_string (buffer, "(kind=allocate) ");
+      else if (gimple_omp_allocate_kind (gs) == GF_OMP_ALLOCATE_KIND_FREE)
+	pp_string (buffer, "(kind=free) ");
+
+      dump_omp_clauses (buffer, gimple_omp_allocate_clauses (gs), spc, flags);
+    }
+}
+
 /* Dump a GIMPLE_OMP_ORDERED tuple on the pretty_printer BUFFER.  */
 
 static void
@@ -2823,6 +2855,11 @@ pp_gimple_stmt_1 (pretty_printer *buffer, const gimple *gs, int spc,
 				flags);
       break;
 
+    case GIMPLE_OMP_ALLOCATE:
+      dump_gimple_omp_allocate (buffer, as_a <const gomp_allocate *> (gs), spc,
+				flags);
+      break;
+
     case GIMPLE_CATCH:
       dump_gimple_catch (buffer, as_a <const gcatch *> (gs), spc, flags);
       break;
diff --git a/gcc/gimple.cc b/gcc/gimple.cc
index 9b156399ba1..a8b29f85d3d 100644
--- a/gcc/gimple.cc
+++ b/gcc/gimple.cc
@@ -1280,6 +1280,18 @@ gimple_build_omp_atomic_store (tree val, enum omp_memory_order mo)
   return p;
 }
 
+/* Build a GIMPLE_OMP_ALLOCATE statement.  */
+
+gomp_allocate *
+gimple_build_omp_allocate (tree clauses, int kind)
+{
+  gomp_allocate *p
+    = as_a <gomp_allocate *> (gimple_alloc (GIMPLE_OMP_ALLOCATE, 0));
+  gimple_omp_allocate_set_clauses (p, clauses);
+  gimple_omp_allocate_set_kind (p, kind);
+  return p;
+}
+
 /* Build a GIMPLE_TRANSACTION statement.  */
 
 gtransaction *
diff --git a/gcc/gimple.def b/gcc/gimple.def
index 296c73c2d52..079565c3920 100644
--- a/gcc/gimple.def
+++ b/gcc/gimple.def
@@ -388,6 +388,12 @@ DEFGSCODE(GIMPLE_OMP_TARGET, "gimple_omp_target", GSS_OMP_PARALLEL_LAYOUT)
    CHILD_FN and DATA_ARG like for GIMPLE_OMP_PARALLEL.  */
 DEFGSCODE(GIMPLE_OMP_TEAMS, "gimple_omp_teams", GSS_OMP_PARALLEL_LAYOUT)
 
+/* GIMPLE_OMP_ALLOCATE <CLAUSES> represents
+   #pragma omp allocate.
+   CLAUSES is an OMP_CLAUSE chain holding the associated clauses, which list
+   the variables to be allocated or freed.  */
+DEFGSCODE(GIMPLE_OMP_ALLOCATE, "gimple_omp_allocate", GSS_OMP_ALLOCATE)
+
 /* GIMPLE_OMP_ORDERED <BODY, CLAUSES> represents #pragma omp ordered.
    BODY is the sequence of statements to execute in the ordered section.
    CLAUSES is an OMP_CLAUSE chain holding the associated clauses.  */
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 1d15ff98ac2..aa0ae4078ad 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -150,6 +150,9 @@ enum gf_mask {
     GF_CALL_BY_DESCRIPTOR	= 1 << 10,
     GF_CALL_NOCF_CHECK		= 1 << 11,
     GF_CALL_FROM_NEW_OR_DELETE	= 1 << 12,
+    GF_OMP_ALLOCATE_KIND_MASK	= (1 << 2) - 1,
+    GF_OMP_ALLOCATE_KIND_ALLOCATE = 1,
+    GF_OMP_ALLOCATE_KIND_FREE = 2,
     GF_OMP_PARALLEL_COMBINED	= 1 << 0,
     GF_OMP_TASK_TASKLOOP	= 1 << 0,
     GF_OMP_TASK_TASKWAIT	= 1 << 1,
@@ -796,6 +799,17 @@ struct GTY((tag("GSS_OMP_ATOMIC_LOAD")))
   tree rhs, lhs;
 };
 
+/* GSS_OMP_ALLOCATE.  */
+
+struct GTY((tag("GSS_OMP_ALLOCATE")))
+  gomp_allocate : public gimple
+{
+  /* [ WORD 1-6 ] : base class */
+
+  /* [ WORD 7 ]  */
+  tree clauses;
+};
+
 /* GIMPLE_OMP_ATOMIC_STORE.
    See note on GIMPLE_OMP_ATOMIC_LOAD.  */
 
@@ -1129,6 +1143,14 @@ is_a_helper <gomp_atomic_store *>::test (gimple *gs)
   return gs->code == GIMPLE_OMP_ATOMIC_STORE;
 }
 
+template <>
+template <>
+inline bool
+is_a_helper <gomp_allocate *>::test (gimple *gs)
+{
+  return gs->code == GIMPLE_OMP_ALLOCATE;
+}
+
 template <>
 template <>
 inline bool
@@ -1371,6 +1393,14 @@ is_a_helper <const gomp_atomic_store *>::test (const gimple *gs)
   return gs->code == GIMPLE_OMP_ATOMIC_STORE;
 }
 
+template <>
+template <>
+inline bool
+is_a_helper <const gomp_allocate *>::test (const gimple *gs)
+{
+  return gs->code == GIMPLE_OMP_ALLOCATE;
+}
+
 template <>
 template <>
 inline bool
@@ -1572,6 +1602,7 @@ gomp_sections *gimple_build_omp_sections (gimple_seq, tree);
 gimple *gimple_build_omp_sections_switch (void);
 gomp_single *gimple_build_omp_single (gimple_seq, tree);
 gomp_target *gimple_build_omp_target (gimple_seq, int, tree);
+gomp_allocate *gimple_build_omp_allocate (tree, int);
 gomp_teams *gimple_build_omp_teams (gimple_seq, tree);
 gomp_atomic_load *gimple_build_omp_atomic_load (tree, tree,
 						enum omp_memory_order);
@@ -2312,7 +2343,7 @@ static inline unsigned
 gimple_omp_subcode (const gimple *s)
 {
   gcc_gimple_checking_assert (gimple_code (s) >= GIMPLE_OMP_ATOMIC_LOAD
-			      && gimple_code (s) <= GIMPLE_OMP_TEAMS);
+			      && gimple_code (s) <= GIMPLE_OMP_ALLOCATE);
   return s->subcode;
 }
 
@@ -6365,6 +6396,30 @@ gimple_omp_sections_set_control (gimple *gs, tree control)
   omp_sections_stmt->control = control;
 }
 
+static inline void
+gimple_omp_allocate_set_clauses (gomp_allocate *gs, tree c)
+{
+  gs->clauses = c;
+}
+
+static inline void
+gimple_omp_allocate_set_kind (gomp_allocate *gs, int kind)
+{
+  gs->subcode = (gs->subcode & ~GF_OMP_ALLOCATE_KIND_MASK)
+		      | (kind & GF_OMP_ALLOCATE_KIND_MASK);
+}
+
+static inline tree
+gimple_omp_allocate_clauses (const gomp_allocate *gs)
+{
+  return gs->clauses;
+}
+
+static inline int
+gimple_omp_allocate_kind (const gomp_allocate *gs)
+{
+  return (gimple_omp_subcode (gs) & GF_OMP_ALLOCATE_KIND_MASK);
+}
 
 /* Set the value being stored in an atomic store.  */
 
@@ -6648,7 +6703,8 @@ gimple_return_set_retval (greturn *gs, tree retval)
     case GIMPLE_OMP_RETURN:			\
     case GIMPLE_OMP_ATOMIC_LOAD:		\
     case GIMPLE_OMP_ATOMIC_STORE:		\
-    case GIMPLE_OMP_CONTINUE
+    case GIMPLE_OMP_CONTINUE:			\
+    case GIMPLE_OMP_ALLOCATE
 
 static inline bool
 is_gimple_omp (const gimple *stmt)
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 04990ad91a6..1119ee3bc42 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -14356,6 +14356,21 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
   *expr_p = NULL_TREE;
 }
 
+static void
+gimplify_omp_allocate (tree *expr_p, gimple_seq *pre_p)
+{
+  tree expr = *expr_p;
+  int kind;
+  if (OMP_ALLOCATE_KIND_ALLOCATE (expr))
+    kind = GF_OMP_ALLOCATE_KIND_ALLOCATE;
+  else
+    kind = GF_OMP_ALLOCATE_KIND_FREE;
+  gimple *stmt = gimple_build_omp_allocate (OMP_ALLOCATE_CLAUSES (expr),
+					    kind);
+  gimplify_seq_add_stmt (pre_p, stmt);
+  *expr_p = NULL_TREE;
+}
+
 /* Gimplify the gross structure of OpenACC enter/exit data, update, and OpenMP
    target update constructs.  */
 
@@ -15755,6 +15770,10 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 	  gimplify_omp_target_update (expr_p, pre_p);
 	  ret = GS_ALL_DONE;
 	  break;
+	case OMP_ALLOCATE:
+	  gimplify_omp_allocate (expr_p, pre_p);
+	  ret = GS_ALL_DONE;
+	  break;
 
 	case OMP_SECTION:
 	case OMP_MASTER:
diff --git a/gcc/gsstruct.def b/gcc/gsstruct.def
index 19e1088b718..9c7526596e8 100644
--- a/gcc/gsstruct.def
+++ b/gcc/gsstruct.def
@@ -50,4 +50,5 @@ DEFGSSTRUCT(GSS_OMP_SINGLE_LAYOUT, gimple_statement_omp_single_layout, false)
 DEFGSSTRUCT(GSS_OMP_CONTINUE, gomp_continue, false)
 DEFGSSTRUCT(GSS_OMP_ATOMIC_LOAD, gomp_atomic_load, false)
 DEFGSSTRUCT(GSS_OMP_ATOMIC_STORE_LAYOUT, gomp_atomic_store, false)
+DEFGSSTRUCT(GSS_OMP_ALLOCATE, gomp_allocate, false)
 DEFGSSTRUCT(GSS_TRANSACTION, gtransaction, false)
diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
index 0eb35178e03..6957bc55da0 100644
--- a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-additional-options "-fdump-tree-original" }
+! { dg-additional-options "-fdump-tree-original -fdump-tree-gimple" }
 
 module omp_lib_kinds
   use iso_c_binding, only: c_int, c_intptr_t
@@ -71,3 +71,5 @@ end subroutine
 
 ! { dg-final { scan-tree-dump-times "#pragma omp allocate \\(kind=allocate\\)" 6 "original" } }
 ! { dg-final { scan-tree-dump "#pragma omp allocate \\(kind=free\\)" "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp allocate \\(kind=allocate\\)" 6 "gimple" } }
+! { dg-final { scan-tree-dump "#pragma omp allocate \\(kind=free\\)" "gimple" } }

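The GF_OMP_ALLOCATE_KIND_* values above store the directive kind in the low two bits of the statement's subcode. The standalone C sketch below shows the same read-modify-write masking as gimple_omp_allocate_set_kind / gimple_omp_allocate_kind. The KIND_* names and the set_kind/get_kind helpers are hypothetical stand-ins, not GCC API.

```c
#include <assert.h>

/* Hypothetical mirror of the GF_OMP_ALLOCATE_* values in gimple.h.  */
enum
{
  KIND_MASK = (1 << 2) - 1,   /* the kind lives in the low two bits */
  KIND_ALLOCATE = 1,
  KIND_FREE = 2
};

/* Like gimple_omp_allocate_set_kind: replace only the kind bits and keep
   any other subcode flags unchanged.  */
unsigned
set_kind (unsigned subcode, int kind)
{
  assert (kind == KIND_ALLOCATE || kind == KIND_FREE);
  return (subcode & ~(unsigned) KIND_MASK) | ((unsigned) kind & KIND_MASK);
}

/* Like gimple_omp_allocate_kind: extract the kind bits.  */
int
get_kind (unsigned subcode)
{
  return (int) (subcode & KIND_MASK);
}
```

For example, set_kind (0x10, KIND_ALLOCATE) yields 0x11, and switching that to KIND_FREE yields 0x12. The unrelated 0x10 bit survives both updates.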

* [PATCH 14/17] Lower allocate directive (OpenMP 5.0).
  2022-07-07 10:34 [PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators Andrew Stubbs
                   ` (12 preceding siblings ...)
  2022-07-07 10:34 ` [PATCH 13/17] Gimplify allocate directive " Andrew Stubbs
@ 2022-07-07 10:34 ` Andrew Stubbs
  2022-07-07 10:34 ` [PATCH 15/17] amdgcn: Support XNACK mode Andrew Stubbs
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Andrew Stubbs @ 2022-07-07 10:34 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1160 bytes --]


This patch finds the malloc/free calls generated by an allocate statement
that is associated with an allocate directive, and replaces them with
GOMP_alloc and GOMP_free.
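
The scan in lower_omp_allocate can be modelled on a toy statement list. The sketch below is illustrative only: the enum, struct and lower_allocate are hypothetical, and it matches variables by name rather than by decl. It also ignores the COMPONENT_REF look-through that the real pass performs for derived types. Starting after the directive, it rewrites the first malloc of the variable, then the first following free, and finally turns the directive into a no-op.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy IR; the real pass works on GIMPLE call statements.  */
enum fn { FN_OTHER, FN_NOP, FN_DIRECTIVE, FN_MALLOC, FN_FREE,
	  FN_GOMP_ALLOC, FN_GOMP_FREE };

struct stmt
{
  enum fn fn;
  const char *var;          /* variable allocated or freed ("" if none) */
  unsigned long allocator;  /* set when the call is rewritten */
};

/* Scan forward from the directive at BODY[DIR].  Rewrite the first malloc
   of VAR into a GOMP_alloc call and then the first following free of VAR
   into a GOMP_free call, both using ALLOCATOR (0 stands in for the default
   allocator).  Finally turn the directive into a no-op.  Returns the
   number of calls rewritten.  */
int
lower_allocate (struct stmt *body, size_t n, size_t dir, const char *var,
		unsigned long allocator)
{
  assert (dir < n && body[dir].fn == FN_DIRECTIVE);
  int rewritten = 0;
  int allocate = 1;   /* look for the malloc first, then the free */

  for (size_t i = dir + 1; i < n; i++)
    {
      struct stmt *s = &body[i];
      if (s->fn != (allocate ? FN_MALLOC : FN_FREE)
	  || strcmp (s->var, var) != 0)
	continue;

      s->fn = allocate ? FN_GOMP_ALLOC : FN_GOMP_FREE;
      s->allocator = allocator;
      rewritten++;
      if (!allocate)
	break;        /* the matching free is done; stop scanning */
      allocate = 0;   /* malloc replaced; now look for its free */
    }

  body[dir].fn = FN_NOP;
  return rewritten;
}
```

Calls on other variables, and any second free of the same variable, are left untouched. This mirrors the "break" after the GOMP_free replacement in the patch.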

gcc/ChangeLog:

	* omp-low.cc (scan_sharing_clauses): Handle OMP_CLAUSE_ALLOCATOR.
	(scan_omp_allocate): New.
	(scan_omp_1_stmt): Call it.
	(lower_omp_allocate): New function.
	(lower_omp_1): Call it.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/allocate-6.f90: Add tests.
	* gfortran.dg/gomp/allocate-7.f90: New test.
	* gfortran.dg/gomp/allocate-8.f90: New test.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/allocate-2.f90: New test.
---
 gcc/omp-low.cc                                | 139 ++++++++++++++++++
 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 |   9 ++
 gcc/testsuite/gfortran.dg/gomp/allocate-7.f90 |  13 ++
 gcc/testsuite/gfortran.dg/gomp/allocate-8.f90 |  15 ++
 .../testsuite/libgomp.fortran/allocate-2.f90  |  48 ++++++
 5 files changed, 224 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-7.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-8.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/allocate-2.f90


[-- Attachment #2: 0014-Lower-allocate-directive-OpenMP-5.0.patch --]
[-- Type: text/x-patch; name="0014-Lower-allocate-directive-OpenMP-5.0.patch", Size: 9831 bytes --]

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index cdadd6f0c96..7d1a2a0d795 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -1746,6 +1746,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	case OMP_CLAUSE_FINALIZE:
 	case OMP_CLAUSE_TASK_REDUCTION:
 	case OMP_CLAUSE_ALLOCATE:
+	case OMP_CLAUSE_ALLOCATOR:
 	  break;
 
 	case OMP_CLAUSE_ALIGNED:
@@ -1963,6 +1964,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	case OMP_CLAUSE_FINALIZE:
 	case OMP_CLAUSE_FILTER:
 	case OMP_CLAUSE__CONDTEMP_:
+	case OMP_CLAUSE_ALLOCATOR:
 	  break;
 
 	case OMP_CLAUSE__CACHE_:
@@ -3033,6 +3035,16 @@ scan_omp_simd_scan (gimple_stmt_iterator *gsi, gomp_for *stmt,
   maybe_lookup_ctx (new_stmt)->for_simd_scan_phase = true;
 }
 
+/* Scan an OpenMP allocate directive.  */
+
+static void
+scan_omp_allocate (gomp_allocate *stmt, omp_context *outer_ctx)
+{
+  omp_context *ctx;
+  ctx = new_omp_context (stmt, outer_ctx);
+  scan_sharing_clauses (gimple_omp_allocate_clauses (stmt), ctx);
+}
+
 /* Scan an OpenMP sections directive.  */
 
 static void
@@ -4332,6 +4344,9 @@ scan_omp_1_stmt (gimple_stmt_iterator *gsi, bool *handled_ops_p,
 	    insert_decl_map (&ctx->cb, var, var);
       }
       break;
+    case GIMPLE_OMP_ALLOCATE:
+      scan_omp_allocate (as_a <gomp_allocate *> (stmt), ctx);
+      break;
     default:
       *handled_ops_p = false;
       break;
@@ -8768,6 +8783,125 @@ lower_omp_single_simple (gomp_single *single_stmt, gimple_seq *pre_p)
   gimple_seq_add_stmt (pre_p, gimple_build_label (flabel));
 }
 
+static void
+lower_omp_allocate (gimple_stmt_iterator *gsi_p, omp_context *ctx)
+{
+  gomp_allocate *st = as_a <gomp_allocate *> (gsi_stmt (*gsi_p));
+  tree clauses = gimple_omp_allocate_clauses (st);
+  int kind = gimple_omp_allocate_kind (st);
+  gcc_assert (kind == GF_OMP_ALLOCATE_KIND_ALLOCATE
+	      || kind == GF_OMP_ALLOCATE_KIND_FREE);
+
+  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_ALLOCATOR)
+	continue;
+
+      bool allocate = (kind == GF_OMP_ALLOCATE_KIND_ALLOCATE);
+      /* The allocate directives that appear in a target region must specify
+	 an allocator clause unless a requires directive with the
+	 dynamic_allocators clause is present in the same compilation unit.  */
+      if (OMP_ALLOCATE_ALLOCATOR (c) == NULL_TREE
+	  && ((omp_requires_mask & OMP_REQUIRES_DYNAMIC_ALLOCATORS) == 0)
+	  && omp_maybe_offloaded_ctx (ctx))
+	error_at (OMP_CLAUSE_LOCATION (c), "%<allocate%> directive must"
+		  " specify an allocator here");
+
+      tree var = OMP_ALLOCATE_DECL (c);
+
+      gimple_stmt_iterator gsi = *gsi_p;
+      for (gsi_next (&gsi); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+
+	  if (gimple_code (stmt) != GIMPLE_CALL
+	      || (allocate && gimple_call_fndecl (stmt)
+		  != builtin_decl_explicit (BUILT_IN_MALLOC))
+	      || (!allocate && gimple_call_fndecl (stmt)
+		  != builtin_decl_explicit (BUILT_IN_FREE)))
+	    continue;
+	  const gcall *gs = as_a <const gcall *> (stmt);
+	  tree allocator = OMP_ALLOCATE_ALLOCATOR (c)
+			   ? OMP_ALLOCATE_ALLOCATOR (c)
+			   : integer_zero_node;
+	  if (allocate)
+	    {
+	      tree lhs = gimple_call_lhs (gs);
+	      if (lhs && TREE_CODE (lhs) == SSA_NAME)
+		{
+		  gimple_stmt_iterator gsi2 = gsi;
+		  gsi_next (&gsi2);
+		  gimple *assign = gsi_stmt (gsi2);
+		  if (gimple_code (assign) == GIMPLE_ASSIGN)
+		    {
+		      lhs = gimple_assign_lhs (as_a <const gassign *> (assign));
+		      if (lhs == NULL_TREE
+			  || TREE_CODE (lhs) != COMPONENT_REF)
+			continue;
+		      lhs = TREE_OPERAND (lhs, 0);
+		    }
+		}
+
+	      if (lhs == var)
+		{
+		  unsigned HOST_WIDE_INT ialign = 0;
+		  tree align;
+		  if (TYPE_P (var))
+		    ialign = TYPE_ALIGN_UNIT (var);
+		  else
+		    ialign = DECL_ALIGN_UNIT (var);
+		  align = build_int_cst (size_type_node, ialign);
+		  tree repl = builtin_decl_explicit (BUILT_IN_GOMP_ALLOC);
+		  tree size = gimple_call_arg (gs, 0);
+		  gimple *g = gimple_build_call (repl, 3, align, size,
+						 allocator);
+		  gimple_call_set_lhs (g, gimple_call_lhs (gs));
+		  gimple_set_location (g, gimple_location (stmt));
+		  gsi_replace (&gsi, g, true);
+		  /* The malloc call has been replaced.  Now see if there is
+		     a free call due to a deallocate statement and replace
+		     that too.  */
+		  allocate = false;
+		}
+	    }
+	  else
+	    {
+	      tree arg = gimple_call_arg (gs, 0);
+	      if (arg && TREE_CODE (arg) == SSA_NAME)
+		{
+		  gimple_stmt_iterator gsi2 = gsi;
+		  gsi_prev (&gsi2);
+		  if (!gsi_end_p (gsi2))
+		    {
+		      gimple *gs = gsi_stmt (gsi2);
+		      if (gimple_code (gs) == GIMPLE_ASSIGN)
+			{
+			  const gassign *assign = as_a <const gassign *> (gs);
+			  tree rhs = gimple_assign_rhs1 (assign);
+			  tree lhs = gimple_assign_lhs (assign);
+			  if (lhs == arg && rhs
+			      && TREE_CODE (rhs) == COMPONENT_REF)
+			    arg = TREE_OPERAND (rhs, 0);
+			}
+		    }
+		}
+
+	      if (arg == var)
+		{
+		  tree repl = builtin_decl_explicit (BUILT_IN_GOMP_FREE);
+		  gimple *g = gimple_build_call (repl, 2,
+						 gimple_call_arg (gs, 0),
+						 allocator);
+		  gimple_set_location (g, gimple_location (stmt));
+		  gsi_replace (&gsi, g, true);
+		  break;
+		}
+	    }
+	}
+    }
+  gsi_replace (gsi_p, gimple_build_nop (), true);
+}
+
 
 /* A subroutine of lower_omp_single.  Expand the simple form of
    a GIMPLE_OMP_SINGLE, with a copyprivate clause:
@@ -14431,6 +14565,11 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
       gcc_assert (ctx);
       lower_omp_scope (gsi_p, ctx);
       break;
+    case GIMPLE_OMP_ALLOCATE:
+      ctx = maybe_lookup_ctx (stmt);
+      gcc_assert (ctx);
+      lower_omp_allocate (gsi_p, ctx);
+      break;
     case GIMPLE_OMP_SINGLE:
       ctx = maybe_lookup_ctx (stmt);
       gcc_assert (ctx);
diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
index 6957bc55da0..738d9936f6a 100644
--- a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
@@ -1,5 +1,6 @@
 ! { dg-do compile }
 ! { dg-additional-options "-fdump-tree-original -fdump-tree-gimple" }
+! { dg-additional-options "-fdump-tree-omplower" }
 
 module omp_lib_kinds
   use iso_c_binding, only: c_int, c_intptr_t
@@ -47,6 +48,7 @@ end type
   real, allocatable :: var3(:,:)
   type (my_type), allocatable :: var4
   integer, pointer :: pii, parr(:)
+  integer, allocatable :: var
 
   character, allocatable :: str1a, str1aarr(:) 
   character(len=5), allocatable :: str5a, str5aarr(:)
@@ -67,9 +69,16 @@ end type
 
   !$omp allocate
   allocate(pii, parr(5))
+
+  ! allocate statement not associated with an allocate directive
+  allocate(var)
 end subroutine
 
 ! { dg-final { scan-tree-dump-times "#pragma omp allocate \\(kind=allocate\\)" 6 "original" } }
 ! { dg-final { scan-tree-dump "#pragma omp allocate \\(kind=free\\)" "original" } }
 ! { dg-final { scan-tree-dump-times "#pragma omp allocate \\(kind=allocate\\)" 6 "gimple" } }
 ! { dg-final { scan-tree-dump "#pragma omp allocate \\(kind=free\\)" "gimple" } }
+! { dg-final { scan-tree-dump-times "builtin_malloc" 11 "original" } }
+! { dg-final { scan-tree-dump-times "builtin_free" 9 "original" } }
+! { dg-final { scan-tree-dump-times "GOMP_alloc" 10 "omplower" } }
+! { dg-final { scan-tree-dump-times "GOMP_free" 8 "omplower" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-7.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-7.f90
new file mode 100644
index 00000000000..db76e901c08
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/allocate-7.f90
@@ -0,0 +1,13 @@
+! { dg-do compile }
+
+subroutine bar(a)
+  implicit none
+  integer  :: a
+  integer, allocatable :: var
+!$omp target
+  !$omp allocate (var) ! { dg-error "'allocate' directive must specify an allocator here" }
+  allocate (var)
+!$omp end target
+
+end subroutine
+
diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-8.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-8.f90
new file mode 100644
index 00000000000..699a3b80878
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/allocate-8.f90
@@ -0,0 +1,15 @@
+! { dg-do compile }
+
+
+subroutine bar(a)
+  implicit none
+  integer  :: a
+  integer, allocatable :: var
+!$omp requires dynamic_allocators
+!$omp target
+  !$omp allocate (var)
+  allocate (var)
+!$omp end target
+
+end subroutine
+
diff --git a/libgomp/testsuite/libgomp.fortran/allocate-2.f90 b/libgomp/testsuite/libgomp.fortran/allocate-2.f90
new file mode 100644
index 00000000000..2219f107fe7
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/allocate-2.f90
@@ -0,0 +1,48 @@
+! { dg-do run }
+! { dg-additional-sources allocate-1.c }
+! { dg-prune-output "command-line option '-fintrinsic-modules-path=.*' is valid for Fortran but not for C" }
+
+module m
+  use omp_lib
+  use iso_c_binding
+  implicit none
+  interface
+    integer(c_int) function is_64bit_aligned (a) bind(C)
+      import :: c_int
+      integer  :: a
+    end
+  end interface
+
+contains
+
+subroutine foo (x, y, h)
+  use omp_lib
+  integer  :: x
+  integer  :: y
+  integer (kind=omp_allocator_handle_kind) :: h
+  integer, allocatable :: var1
+
+  !$omp allocate (var1)  allocator(h)
+  allocate (var1)
+
+  if (is_64bit_aligned(var1) == 0) then
+    stop 19
+  end if
+
+  deallocate(var1)
+end subroutine
+end module m
+
+program main
+  use omp_lib
+  use m
+  type (omp_alloctrait) :: traits(2)
+  integer (omp_allocator_handle_kind) :: a
+
+  traits = [omp_alloctrait (omp_atk_alignment, 64), &
+            omp_alloctrait (omp_atk_fallback, omp_atv_null_fb)]
+  a = omp_init_allocator (omp_default_mem_space, 2, traits)
+  if (a == omp_null_allocator) stop 1
+  call foo (42, 12, a);
+  call omp_destroy_allocator (a);
+end

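The runtime test links a C helper, is_64bit_aligned, from allocate-1.c, which is not shown here. The sketch below is one hypothetical body for it. The real helper may differ. Because the Fortran interface passes the integer by reference, the helper receives the variable's address. This version checks 64-byte alignment, matching the omp_atk_alignment value of 64 given to omp_init_allocator in the test.

```c
#include <stdint.h>

/* Hypothetical helper body; the real allocate-1.c may differ.
   Returns nonzero if the object at A is 64-byte aligned.  */
int
is_64bit_aligned (int *a)
{
  return ((uintptr_t) a & 63) == 0;
}
```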

* [PATCH 15/17] amdgcn: Support XNACK mode
  2022-07-07 10:34 [PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators Andrew Stubbs
                   ` (13 preceding siblings ...)
  2022-07-07 10:34 ` [PATCH 14/17] Lower " Andrew Stubbs
@ 2022-07-07 10:34 ` Andrew Stubbs
  2022-07-07 10:34 ` [PATCH 16/17] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK Andrew Stubbs
  2022-07-07 10:34 ` [PATCH 17/17] amdgcn: libgomp plugin USM implementation Andrew Stubbs
  16 siblings, 0 replies; 30+ messages in thread
From: Andrew Stubbs @ 2022-07-07 10:34 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2002 bytes --]


The XNACK feature allows memory load instructions to restart safely following
a page-miss interrupt.  This is useful for shared-memory devices, such as APUs,
and for implementing OpenMP Unified Shared Memory.

To support the feature we must be able to set the appropriate metadata and
mark the load instructions as early-clobber.  Further requirements will arise
once the port supports scheduling of s_waitcnt instructions.
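
The reworked "enabled" attribute in gcn.md picks between the plain and early-clobber load alternatives based on TARGET_XNACK. As we understand it, a replayed load must not have a destination that overlaps its address registers, and the "&" constraint rules out that overlap. The C model below is hypothetical. Its names do not exist in GCC, and it only restates the three conditions in the attribute.

```c
#include <stdbool.h>

/* Hypothetical C model of the reworked "enabled" attribute in gcn.md.  */
enum gcn_version { GCN_VERSION_GCN3, GCN_VERSION_GCN5 };
enum xnack_attr { XNACK_NA, XNACK_OFF, XNACK_ON };

bool
alternative_enabled (enum gcn_version version, enum xnack_attr xnack,
		     bool target_gcn5_plus, bool target_xnack)
{
  /* GCN5-only alternatives need a GCN5-or-later target.  */
  if (version == GCN_VERSION_GCN5 && !target_gcn5_plus)
    return false;
  /* Plain load alternatives (e.g. "=v") are disabled once loads can replay.  */
  if (xnack == XNACK_OFF && target_xnack)
    return false;
  /* Early-clobber alternatives (e.g. "&v") are only used with XNACK.  */
  if (xnack == XNACK_ON && !target_xnack)
    return false;
  /* "na" alternatives (anything that is not an affected load) stay on.  */
  return true;
}
```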

gcc/ChangeLog:

	* config/gcn/gcn-hsa.h (XNACKOPT): New macro.
	(ASM_SPEC): Use XNACKOPT.
	* config/gcn/gcn-opts.h (enum sram_ecc_type): Rename to ...
	(enum hsaco_attr_type): ... this, and generalize the names.
	(TARGET_XNACK): New macro.
	* config/gcn/gcn-valu.md (gather<mode>_insn_1offset<exec>):
	Add xnack compatible alternatives.
	(gather<mode>_insn_2offsets<exec>): Likewise.
	* config/gcn/gcn.cc (gcn_option_override): Permit -mxnack for devices
	other than Fiji.
	(gcn_expand_epilogue): Remove early-clobber problems.
	(output_file_start): Emit xnack attributes.
	(gcn_hsa_declare_function_name): Obey -mxnack setting.
	* config/gcn/gcn.md (xnack): New attribute.
	(enabled): Rework to include "xnack" attribute.
	(*movbi): Add xnack compatible alternatives.
	(*mov<mode>_insn): Likewise.
	(*mov<mode>_insn): Likewise.
	(*mov<mode>_insn): Likewise.
	(*movti_insn): Likewise.
	* config/gcn/gcn.opt (-mxnack): Add the "on/off/any" syntax.
	(sram_ecc_type): Rename to ...
	(hsaco_attr_type): ... this.
	* config/gcn/mkoffload.cc (SET_XNACK_ANY): New macro.
	(TEST_XNACK): Delete.
	(TEST_XNACK_ANY): New macro.
	(TEST_XNACK_ON): New macro.
	(main): Support the new -mxnack=on/off/any syntax.
---
 gcc/config/gcn/gcn-hsa.h    |   3 +-
 gcc/config/gcn/gcn-opts.h   |  10 ++--
 gcc/config/gcn/gcn-valu.md  |  29 ++++-----
 gcc/config/gcn/gcn.cc       |  34 ++++++-----
 gcc/config/gcn/gcn.md       | 113 +++++++++++++++++++++++-------------
 gcc/config/gcn/gcn.opt      |  18 +++---
 gcc/config/gcn/mkoffload.cc |  19 ++++--
 7 files changed, 140 insertions(+), 86 deletions(-)
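
With the new tri-state -mxnack=on/off/any, output_file_start maps each setting to a target-ID suffix: "on" gives "+", "off" gives "-", and "any" gives no suffix at all. The C sketch below reuses the patch's enum, but build_target_id is a hypothetical helper. The sramecc-before-xnack ordering is an assumption based on the usual target-ID spelling.

```c
#include <string.h>

/* Same tri-state enum as gcn-opts.h in this patch.  */
enum hsaco_attr_type { HSACO_ATTR_OFF, HSACO_ATTR_ON, HSACO_ATTR_ANY };

/* "any" emits nothing: in HSACOv4 an absent setting means the binary
   supports either hardware configuration.  */
static const char *
attr_suffix (enum hsaco_attr_type v, const char *on, const char *off)
{
  return v == HSACO_ATTR_ON ? on : v == HSACO_ATTR_OFF ? off : "";
}

/* Hypothetical helper: append the sramecc and xnack suffixes to CPU.
   BUF must be large enough for the result.  */
void
build_target_id (char *buf, const char *cpu, enum hsaco_attr_type sram_ecc,
		 enum hsaco_attr_type xnack)
{
  strcpy (buf, cpu);
  strcat (buf, attr_suffix (sram_ecc, ":sramecc+", ":sramecc-"));
  strcat (buf, attr_suffix (xnack, ":xnack+", ":xnack-"));
}
```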


[-- Attachment #2: 0015-amdgcn-Support-XNACK-mode.patch --]
[-- Type: text/x-patch; name="0015-amdgcn-Support-XNACK-mode.patch", Size: 21025 bytes --]

diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index b3079cebb43..fd08947574f 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -81,12 +81,13 @@ extern unsigned int gcn_local_sym_hash (const char *name);
 /* In HSACOv4 no attribute setting means the binary supports "any" hardware
    configuration.  The name of the attribute also changed.  */
 #define SRAMOPT "msram-ecc=on:-mattr=+sramecc;msram-ecc=off:-mattr=-sramecc"
+#define XNACKOPT "mxnack=on:-mattr=+xnack;mxnack=off:-mattr=-xnack"
 
 /* Use LLVM assembler and linker options.  */
 #define ASM_SPEC  "-triple=amdgcn--amdhsa "  \
 		  "%:last_arg(%{march=*:-mcpu=%*}) " \
 		  "%{!march=*|march=fiji:--amdhsa-code-object-version=3} " \
-		  "%{" NO_XNACK "mxnack:-mattr=+xnack;:-mattr=-xnack} " \
+		  "%{" NO_XNACK XNACKOPT "}" \
 		  "%{" NO_SRAM_ECC SRAMOPT "} " \
 		  "-filetype=obj"
 #define LINK_SPEC "--pie --export-dynamic"
diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h
index b62dfb45f59..07ddc79cda3 100644
--- a/gcc/config/gcn/gcn-opts.h
+++ b/gcc/config/gcn/gcn-opts.h
@@ -48,11 +48,13 @@ extern enum gcn_isa {
 #define TARGET_M0_LDS_LIMIT (TARGET_GCN3)
 #define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS)
 
-enum sram_ecc_type
+#define TARGET_XNACK (flag_xnack != HSACO_ATTR_OFF)
+
+enum hsaco_attr_type
 {
-  SRAM_ECC_OFF,
-  SRAM_ECC_ON,
-  SRAM_ECC_ANY
+  HSACO_ATTR_OFF,
+  HSACO_ATTR_ON,
+  HSACO_ATTR_ANY
 };
 
 #endif
diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index abe46201344..ec114db9dd1 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -741,13 +741,13 @@ (define_expand "gather<mode>_expr<exec>"
     {})
 
 (define_insn "gather<mode>_insn_1offset<exec>"
-  [(set (match_operand:V_ALL 0 "register_operand"		   "=v")
+  [(set (match_operand:V_ALL 0 "register_operand"		   "=v,&v")
 	(unspec:V_ALL
-	  [(plus:<VnDI> (match_operand:<VnDI> 1 "register_operand" " v")
+	  [(plus:<VnDI> (match_operand:<VnDI> 1 "register_operand" " v, v")
 			(vec_duplicate:<VnDI>
-			  (match_operand 2 "immediate_operand"	   " n")))
-	   (match_operand 3 "immediate_operand"			   " n")
-	   (match_operand 4 "immediate_operand"			   " n")
+			  (match_operand 2 "immediate_operand"	   " n, n")))
+	   (match_operand 3 "immediate_operand"			   " n, n")
+	   (match_operand 4 "immediate_operand"			   " n, n")
 	   (mem:BLK (scratch))]
 	  UNSPEC_GATHER))]
   "(AS_FLAT_P (INTVAL (operands[3]))
@@ -777,7 +777,8 @@ (define_insn "gather<mode>_insn_1offset<exec>"
     return buf;
   }
   [(set_attr "type" "flat")
-   (set_attr "length" "12")])
+   (set_attr "length" "12")
+   (set_attr "xnack" "off,on")])
 
 (define_insn "gather<mode>_insn_1offset_ds<exec>"
   [(set (match_operand:V_ALL 0 "register_operand"		   "=v")
@@ -802,17 +803,18 @@ (define_insn "gather<mode>_insn_1offset_ds<exec>"
    (set_attr "length" "12")])
 
 (define_insn "gather<mode>_insn_2offsets<exec>"
-  [(set (match_operand:V_ALL 0 "register_operand"			"=v")
+  [(set (match_operand:V_ALL 0 "register_operand"		     "=v,&v")
 	(unspec:V_ALL
 	  [(plus:<VnDI>
 	     (plus:<VnDI>
 	       (vec_duplicate:<VnDI>
-		 (match_operand:DI 1 "register_operand"			"Sv"))
+		 (match_operand:DI 1 "register_operand"		     "Sv,Sv"))
 	       (sign_extend:<VnDI>
-		 (match_operand:<VnSI> 2 "register_operand"		" v")))
-	     (vec_duplicate:<VnDI> (match_operand 3 "immediate_operand" " n")))
-	   (match_operand 4 "immediate_operand"				" n")
-	   (match_operand 5 "immediate_operand"				" n")
+		 (match_operand:<VnSI> 2 "register_operand"	     " v, v")))
+	     (vec_duplicate:<VnDI> (match_operand 3 "immediate_operand"
+								     " n, n")))
+	   (match_operand 4 "immediate_operand"			     " n, n")
+	   (match_operand 5 "immediate_operand"			     " n, n")
 	   (mem:BLK (scratch))]
 	  UNSPEC_GATHER))]
   "(AS_GLOBAL_P (INTVAL (operands[4]))
@@ -831,7 +833,8 @@ (define_insn "gather<mode>_insn_2offsets<exec>"
     return buf;
   }
   [(set_attr "type" "flat")
-   (set_attr "length" "12")])
+   (set_attr "length" "12")
+   (set_attr "xnack" "off,on")])
 
 (define_expand "scatter_store<mode><vnsi>"
   [(match_operand:DI 0 "register_operand")
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 6fc20d3f659..4df05453604 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -170,9 +170,14 @@ gcn_option_override (void)
 	acc_lds_size = 32768;
     }
 
-  /* The xnack option is a placeholder, for now.  */
-  if (flag_xnack)
-    sorry ("XNACK support");
+  /* gfx803 "Fiji" does not support XNACK.  */
+  if (gcn_arch == PROCESSOR_FIJI)
+    {
+      if (flag_xnack == HSACO_ATTR_ON)
+	error ("%<-mxnack=on%> is incompatible with %<-march=fiji%>");
+      /* Allow HSACO_ATTR_ANY silently because that's the default.  */
+      flag_xnack = HSACO_ATTR_OFF;
+    }
 }
 
 /* }}}  */
@@ -3188,17 +3193,19 @@ gcn_expand_epilogue (void)
       /* Assume that an exit value compatible with gcn-run is expected.
          That is, the third input parameter is an int*.
 
-         We can't allocate any new registers, but the kernarg_reg is
-         dead after this, so we'll use that.  */
+         We can't allocate any new registers, but the dispatch_ptr and
+	 kernarg_reg are dead after this, so we'll use those.  */
+      rtx dispatch_ptr_reg = gen_rtx_REG (DImode, cfun->machine->args.reg
+					  [DISPATCH_PTR_ARG]);
       rtx kernarg_reg = gen_rtx_REG (DImode, cfun->machine->args.reg
 				     [KERNARG_SEGMENT_PTR_ARG]);
       rtx retptr_mem = gen_rtx_MEM (DImode,
 				    gen_rtx_PLUS (DImode, kernarg_reg,
 						  GEN_INT (16)));
       set_mem_addr_space (retptr_mem, ADDR_SPACE_SCALAR_FLAT);
-      emit_move_insn (kernarg_reg, retptr_mem);
+      emit_move_insn (dispatch_ptr_reg, retptr_mem);
 
-      rtx retval_mem = gen_rtx_MEM (SImode, kernarg_reg);
+      rtx retval_mem = gen_rtx_MEM (SImode, dispatch_ptr_reg);
       set_mem_addr_space (retval_mem, ADDR_SPACE_SCALAR_FLAT);
       emit_move_insn (retval_mem,
 		      gen_rtx_REG (SImode, SGPR_REGNO (RETURN_VALUE_REG)));
@@ -5250,11 +5257,12 @@ static void
 output_file_start (void)
 {
   /* In HSACOv4 no attribute setting means the binary supports "any" hardware
-     configuration.  In GCC binaries, this is true for SRAM ECC, but not
-     XNACK.  */
-  const char *xnack = (flag_xnack ? ":xnack+" : ":xnack-");
-  const char *sram_ecc = (flag_sram_ecc == SRAM_ECC_ON ? ":sramecc+"
-			  : flag_sram_ecc == SRAM_ECC_OFF ? ":sramecc-"
+     configuration.  */
+  const char *xnack = (flag_xnack == HSACO_ATTR_ON ? ":xnack+"
+		       : flag_xnack == HSACO_ATTR_OFF ? ":xnack-"
+		       : "");
+  const char *sram_ecc = (flag_sram_ecc == HSACO_ATTR_ON ? ":sramecc+"
+			  : flag_sram_ecc == HSACO_ATTR_OFF ? ":sramecc-"
 			  : "");
 
   const char *cpu;
@@ -5298,7 +5306,7 @@ void
 gcn_hsa_declare_function_name (FILE *file, const char *name, tree)
 {
   int sgpr, vgpr;
-  bool xnack_enabled = false;
+  bool xnack_enabled = TARGET_XNACK;
 
   fputs ("\n\n", file);
 
diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index 033c1708e88..0f9381c9194 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -277,12 +277,19 @@ (define_attr "length" ""
 
 (define_attr "gcn_version" "gcn3,gcn5" (const_string "gcn3"))
 
+(define_attr "xnack" "na,off,on" (const_string "na"))
+
 (define_attr "enabled" ""
-  (cond [(eq_attr "gcn_version" "gcn3") (const_int 1)
-	 (and (eq_attr "gcn_version" "gcn5")
-	      (ne (symbol_ref "TARGET_GCN5_PLUS") (const_int 0)))
-	   (const_int 1)]
-	(const_int 0)))
+  (cond [(and (eq_attr "gcn_version" "gcn5")
+	      (eq (symbol_ref "TARGET_GCN5_PLUS") (const_int 0)))
+	   (const_int 0)
+	 (and (eq_attr "xnack" "off")
+	      (ne (symbol_ref "TARGET_XNACK") (const_int 0)))
+	   (const_int 0)
+	 (and (eq_attr "xnack" "on")
+	      (eq (symbol_ref "TARGET_XNACK") (const_int 0)))
+	   (const_int 0)]
+	(const_int 1)))
 
 ; We need to be able to identify v_readlane and v_writelane with
 ; SGPR lane selection in order to handle "Manually Inserted Wait States".
@@ -472,9 +479,9 @@ (define_split
 
 (define_insn "*movbi"
   [(set (match_operand:BI 0 "nonimmediate_operand"
-				    "=Sg,   v,Sg,cs,cV,cV,Sm,RS, v,RF, v,RM")
+			  "=Sg,   v,Sg,cs,cV,cV,Sm,&Sm,RS, v,&v,RF, v,&v,RM")
 	(match_operand:BI 1 "gcn_load_operand"
-				    "SSA,vSvA, v,SS, v,SS,RS,Sm,RF, v,RM, v"))]
+			  "SSA,vSvA, v,SS, v,SS,RS, RS,Sm,RF,RF, v,RM,RM, v"))]
   ""
   {
     /* SCC as an operand is currently not accepted by the LLVM assembler, so
@@ -501,66 +508,77 @@ (define_insn "*movbi"
       return "s_mov_b32\tvcc_lo, %1\;"
 	     "s_mov_b32\tvcc_hi, 0";
     case 6:
-      return "s_load_dword\t%0, %A1\;s_waitcnt\tlgkmcnt(0)";
     case 7:
-      return "s_store_dword\t%1, %A0";
+      return "s_load_dword\t%0, %A1\;s_waitcnt\tlgkmcnt(0)";
     case 8:
-      return "flat_load_dword\t%0, %A1%O1%g1\;s_waitcnt\t0";
+      return "s_store_dword\t%1, %A0";
     case 9:
-      return "flat_store_dword\t%A0, %1%O0%g0";
     case 10:
-      return "global_load_dword\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)";
+      return "flat_load_dword\t%0, %A1%O1%g1\;s_waitcnt\t0";
     case 11:
+      return "flat_store_dword\t%A0, %1%O0%g0";
+    case 12:
+    case 13:
+      return "global_load_dword\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)";
+    case 14:
       return "global_store_dword\t%A0, %1%O0%g0";
     default:
       gcc_unreachable ();
     }
   }
-  [(set_attr "type" "sop1,vop1,vop3a,sopk,vopc,mult,smem,smem,flat,flat,
-		     flat,flat")
-   (set_attr "exec" "*,*,none,*,*,*,*,*,*,*,*,*")
-   (set_attr "length" "4,4,4,4,4,8,12,12,12,12,12,12")])
+  [(set_attr "type" "sop1,vop1,vop3a,sopk,vopc,mult,smem,smem,smem,flat,flat,
+		     flat,flat,flat,flat")
+   (set_attr "exec" "*,*,none,*,*,*,*,*,*,*,*,*,*,*,*")
+   (set_attr "length" "4,4,4,4,4,8,12,12,12,12,12,12,12,12,12")
+   (set_attr "xnack" "*,*,*,*,*,*,off,on,*,off,on,*,off,on,*")])
 
 ; 32bit move pattern
 
 (define_insn "*mov<mode>_insn"
   [(set (match_operand:SISF 0 "nonimmediate_operand"
-		  "=SD,SD,SD,SD,RB,Sm,RS,v,Sg, v, v,RF,v,RLRG,   v,SD, v,RM")
+     "=SD,SD,SD,SD,&SD,RB,Sm,&Sm,RS,v,Sg, v, v,&v,RF,v,RLRG,   v,SD, v,&v,RM")
 	(match_operand:SISF 1 "gcn_load_operand"
-		  "SSA, J, B,RB,Sm,RS,Sm,v, v,Sv,RF, v,B,   v,RLRG, Y,RM, v"))]
+    "SSA, J, B,RB, RB,Sm,RS, RS,Sm,v, v,Sv,RF,RF, v,B,   v,RLRG, Y,RM,RM, v"))]
   ""
   "@
   s_mov_b32\t%0, %1
   s_movk_i32\t%0, %1
   s_mov_b32\t%0, %1
   s_buffer_load%s0\t%0, s[0:3], %1\;s_waitcnt\tlgkmcnt(0)
+  s_buffer_load%s0\t%0, s[0:3], %1\;s_waitcnt\tlgkmcnt(0)
   s_buffer_store%s1\t%1, s[0:3], %0
   s_load_dword\t%0, %A1\;s_waitcnt\tlgkmcnt(0)
+  s_load_dword\t%0, %A1\;s_waitcnt\tlgkmcnt(0)
   s_store_dword\t%1, %A0
   v_mov_b32\t%0, %1
   v_readlane_b32\t%0, %1, 0
   v_writelane_b32\t%0, %1, 0
   flat_load_dword\t%0, %A1%O1%g1\;s_waitcnt\t0
+  flat_load_dword\t%0, %A1%O1%g1\;s_waitcnt\t0
   flat_store_dword\t%A0, %1%O0%g0
   v_mov_b32\t%0, %1
   ds_write_b32\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0)
   ds_read_b32\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0)
   s_mov_b32\t%0, %1
   global_load_dword\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
+  global_load_dword\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
   global_store_dword\t%A0, %1%O0%g0"
-  [(set_attr "type" "sop1,sopk,sop1,smem,smem,smem,smem,vop1,vop3a,vop3a,flat,
-		     flat,vop1,ds,ds,sop1,flat,flat")
-   (set_attr "exec" "*,*,*,*,*,*,*,*,none,none,*,*,*,*,*,*,*,*")
-   (set_attr "length" "4,4,8,12,12,12,12,4,8,8,12,12,8,12,12,8,12,12")])
+  [(set_attr "type" "sop1,sopk,sop1,smem,smem,smem,smem,smem,smem,vop1,vop3a,
+	      vop3a,flat,flat,flat,vop1,ds,ds,sop1,flat,flat,flat")
+   (set_attr "exec" "*,*,*,*,*,*,*,*,*,*,none,none,*,*,*,*,*,*,*,*,*,*")
+   (set_attr "length"
+	     "4,4,8,12,12,12,12,12,12,4,8,8,12,12,12,8,12,12,8,12,12,12")
+   (set_attr "xnack"
+	     "*,*,*,off,on,*,off,on,*,*,*,*,off,on,*,*,*,*,*,off,on,*")])
 
 ; 8/16bit move pattern
 ; TODO: implement combined load and zero_extend, but *only* for -msram-ecc=on
 
 (define_insn "*mov<mode>_insn"
   [(set (match_operand:QIHI 0 "nonimmediate_operand"
-				 "=SD,SD,SD,v,Sg, v, v,RF,v,RLRG,   v, v,RM")
+			   "=SD,SD,SD,v,Sg, v, v,&v,RF,v,RLRG,   v, v,&v,RM")
 	(match_operand:QIHI 1 "gcn_load_operand"
-				 "SSA, J, B,v, v,Sv,RF, v,B,   v,RLRG,RM, v"))]
+			   "SSA, J, B,v, v,Sv,RF,RF, v,B,   v,RLRG,RM,RM, v"))]
   "gcn_valid_move_p (<MODE>mode, operands[0], operands[1])"
   "@
   s_mov_b32\t%0, %1
@@ -570,24 +588,27 @@ (define_insn "*mov<mode>_insn"
   v_readlane_b32\t%0, %1, 0
   v_writelane_b32\t%0, %1, 0
   flat_load%o1\t%0, %A1%O1%g1\;s_waitcnt\t0
+  flat_load%o1\t%0, %A1%O1%g1\;s_waitcnt\t0
   flat_store%s0\t%A0, %1%O0%g0
   v_mov_b32\t%0, %1
   ds_write%b0\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0)
   ds_read%u1\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0)
   global_load%o1\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
+  global_load%o1\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
   global_store%s0\t%A0, %1%O0%g0"
-  [(set_attr "type"
-	     "sop1,sopk,sop1,vop1,vop3a,vop3a,flat,flat,vop1,ds,ds,flat,flat")
-   (set_attr "exec" "*,*,*,*,none,none,*,*,*,*,*,*,*")
-   (set_attr "length" "4,4,8,4,4,4,12,12,8,12,12,12,12")])
+  [(set_attr "type" "sop1,sopk,sop1,vop1,vop3a,vop3a,flat,flat,flat,vop1,ds,ds,
+	             flat,flat,flat")
+   (set_attr "exec" "*,*,*,*,none,none,*,*,*,*,*,*,*,*,*")
+   (set_attr "length" "4,4,8,4,4,4,12,12,12,8,12,12,12,12,12")
+   (set_attr "xnack" "*,*,*,*,*,*,off,on,*,*,*,*,off,on,*")])
 
 ; 64bit move pattern
 
 (define_insn_and_split "*mov<mode>_insn"
   [(set (match_operand:DIDF 0 "nonimmediate_operand"
-			  "=SD,SD,SD,RS,Sm,v, v,Sg, v, v,RF,RLRG,   v, v,RM")
+		"=SD,SD,SD,RS,Sm,&Sm,v, v,Sg, v, v,&v,RF,RLRG,   v, v,&v,RM")
 	(match_operand:DIDF 1 "general_operand"
-			  "SSA, C,DB,Sm,RS,v,DB, v,Sv,RF, v,   v,RLRG,RM, v"))]
+		"SSA, C,DB,Sm,RS, RS,v,DB, v,Sv,RF,RF, v,   v,RLRG,RM,RM, v"))]
   "GET_CODE(operands[1]) != SYMBOL_REF"
   "@
   s_mov_b64\t%0, %1
@@ -595,15 +616,18 @@ (define_insn_and_split "*mov<mode>_insn"
   #
   s_store_dwordx2\t%1, %A0
   s_load_dwordx2\t%0, %A1\;s_waitcnt\tlgkmcnt(0)
+  s_load_dwordx2\t%0, %A1\;s_waitcnt\tlgkmcnt(0)
   #
   #
   #
   #
   flat_load_dwordx2\t%0, %A1%O1%g1\;s_waitcnt\t0
+  flat_load_dwordx2\t%0, %A1%O1%g1\;s_waitcnt\t0
   flat_store_dwordx2\t%A0, %1%O0%g0
   ds_write_b64\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0)
   ds_read_b64\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0)
   global_load_dwordx2\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
+  global_load_dwordx2\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
   global_store_dwordx2\t%A0, %1%O0%g0"
   "reload_completed
    && ((!MEM_P (operands[0]) && !MEM_P (operands[1])
@@ -634,29 +658,33 @@ (define_insn_and_split "*mov<mode>_insn"
 	operands[3] = inhi;
       }
   }
-  [(set_attr "type" "sop1,sop1,mult,smem,smem,vmult,vmult,vmult,vmult,flat,
-		     flat,ds,ds,flat,flat")
-   (set_attr "length" "4,8,*,12,12,*,*,*,*,12,12,12,12,12,12")])
+  [(set_attr "type" "sop1,sop1,mult,smem,smem,smem,vmult,vmult,vmult,vmult,
+	      flat,flat,flat,ds,ds,flat,flat,flat")
+   (set_attr "length" "4,8,*,12,12,12,*,*,*,*,12,12,12,12,12,12,12,12")
+   (set_attr "xnack" "*,*,*,*,off,on,*,*,*,*,off,on,*,*,*,off,on,*")])
 
 ; 128-bit move.
 
 (define_insn_and_split "*movti_insn"
   [(set (match_operand:TI 0 "nonimmediate_operand"
-				      "=SD,RS,Sm,RF, v,v, v,SD,RM, v,RL, v")
-	(match_operand:TI 1 "general_operand"  
-				      "SSB,Sm,RS, v,RF,v,Sv, v, v,RM, v,RL"))]
+			     "=SD,RS,Sm,&Sm,RF, v,&v,v, v,SD,RM, v,&v,RL, v")
+	(match_operand:TI 1 "general_operand"
+			     "SSB,Sm,RS, RS, v,RF,RF,v,Sv, v, v,RM,RM, v,RL"))]
   ""
   "@
   #
   s_store_dwordx4\t%1, %A0
   s_load_dwordx4\t%0, %A1\;s_waitcnt\tlgkmcnt(0)
+  s_load_dwordx4\t%0, %A1\;s_waitcnt\tlgkmcnt(0)
   flat_store_dwordx4\t%A0, %1%O0%g0
   flat_load_dwordx4\t%0, %A1%O1%g1\;s_waitcnt\t0
+  flat_load_dwordx4\t%0, %A1%O1%g1\;s_waitcnt\t0
   #
   #
   #
   global_store_dwordx4\t%A0, %1%O0%g0
   global_load_dwordx4\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
+  global_load_dwordx4\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
   ds_write_b128\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0)
   ds_read_b128\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0)"
   "reload_completed
@@ -678,10 +706,11 @@ (define_insn_and_split "*movti_insn"
     operands[0] = gcn_operand_part (TImode, operands[0], 0);
     operands[1] = gcn_operand_part (TImode, operands[1], 0);
   }
-  [(set_attr "type" "mult,smem,smem,flat,flat,vmult,vmult,vmult,flat,flat,\
-		     ds,ds")
-   (set_attr "delayeduse" "*,*,yes,*,*,*,*,*,yes,*,*,*")
-   (set_attr "length" "*,12,12,12,12,*,*,*,12,12,12,12")])
+  [(set_attr "type" "mult,smem,smem,smem,flat,flat,flat,vmult,vmult,vmult,flat,
+	             flat,flat,ds,ds")
+   (set_attr "delayeduse" "*,*,yes,yes,*,*,*,*,*,*,*,yes,*,*,*")
+   (set_attr "length" "*,12,12,12,12,12,12,*,*,*,12,12,12,12,12")
+   (set_attr "xnack" "*,*,off,on,*,off,on,*,*,*,*,off,on,*,*")])
 
 ;; }}}
 ;; {{{ Prologue/Epilogue
@@ -844,6 +873,8 @@ (define_insn "movdi_symbol"
   (clobber (reg:BI SCC_REG))]
  "GET_CODE (operands[1]) == SYMBOL_REF || GET_CODE (operands[1]) == LABEL_REF"
   {
+    /* This s_load may not be XNACK-safe on devices where the GOT may fault.
+       Discrete GPUs (dGPUs) are most likely fine.  */
     if (SYMBOL_REF_P (operands[1])
 	&& SYMBOL_REF_WEAK (operands[1]))
 	return "s_getpc_b64\t%0\;"
@@ -868,6 +899,8 @@ (define_insn "movdi_symbol_save_scc"
   {
     /* !!! These sequences clobber CC_SAVE_REG.  */
 
+    /* This s_load may not be XNACK-safe on devices where the GOT may fault.
+       Discrete GPUs (dGPUs) are most likely fine.  */
     if (SYMBOL_REF_P (operands[1])
 	&& SYMBOL_REF_WEAK (operands[1]))
 	return "s_mov_b32\ts22, scc\;"
diff --git a/gcc/config/gcn/gcn.opt b/gcc/config/gcn/gcn.opt
index 9606aaf0b1a..759f7a064c9 100644
--- a/gcc/config/gcn/gcn.opt
+++ b/gcc/config/gcn/gcn.opt
@@ -81,23 +81,23 @@ Wopenacc-dims
 Target Var(warn_openacc_dims) Warning
 Warn about invalid OpenACC dimensions.
 
-mxnack
-Target Var(flag_xnack) Init(0)
-Compile for devices requiring XNACK enabled. Default off.
-
 Enum
-Name(sram_ecc_type) Type(enum sram_ecc_type)
+Name(hsaco_attr_type) Type(enum hsaco_attr_type)
 SRAM-ECC modes:
 
 EnumValue
-Enum(sram_ecc_type) String(off) Value(SRAM_ECC_OFF)
+Enum(hsaco_attr_type) String(off) Value(HSACO_ATTR_OFF)
 
 EnumValue
-Enum(sram_ecc_type) String(on) Value(SRAM_ECC_ON)
+Enum(hsaco_attr_type) String(on) Value(HSACO_ATTR_ON)
 
 EnumValue
-Enum(sram_ecc_type) String(any) Value(SRAM_ECC_ANY)
+Enum(hsaco_attr_type) String(any) Value(HSACO_ATTR_ANY)
+
+mxnack=
+Target RejectNegative Joined ToLower Enum(hsaco_attr_type) Var(flag_xnack) Init(HSACO_ATTR_ANY)
+Compile for devices with the XNACK feature enabled, or not. Default \"any\".
 
 msram-ecc=
-Target RejectNegative Joined ToLower Enum(sram_ecc_type) Var(flag_sram_ecc) Init(SRAM_ECC_ANY)
+Target RejectNegative Joined ToLower Enum(hsaco_attr_type) Var(flag_sram_ecc) Init(HSACO_ATTR_ANY)
 Compile for devices with the SRAM ECC feature enabled, or not. Default \"any\".
diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index b8b3fecfcb4..cb8903c27cb 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -72,10 +72,14 @@
 
 #define SET_XNACK_ON(VAR) VAR = ((VAR & ~EF_AMDGPU_FEATURE_XNACK_V4) \
 				 | EF_AMDGPU_FEATURE_XNACK_ON_V4)
+#define SET_XNACK_ANY(VAR) VAR = ((VAR & ~EF_AMDGPU_FEATURE_XNACK_V4) \
+				  | EF_AMDGPU_FEATURE_XNACK_ANY_V4)
 #define SET_XNACK_OFF(VAR) VAR = ((VAR & ~EF_AMDGPU_FEATURE_XNACK_V4) \
 				  | EF_AMDGPU_FEATURE_XNACK_OFF_V4)
-#define TEST_XNACK(VAR) ((VAR & EF_AMDGPU_FEATURE_XNACK_V4) \
-			 == EF_AMDGPU_FEATURE_XNACK_ON_V4)
+#define TEST_XNACK_ANY(VAR) ((VAR & EF_AMDGPU_FEATURE_XNACK_V4) \
+			     == EF_AMDGPU_FEATURE_XNACK_ANY_V4)
+#define TEST_XNACK_ON(VAR) ((VAR & EF_AMDGPU_FEATURE_XNACK_V4) \
+			    == EF_AMDGPU_FEATURE_XNACK_ON_V4)
 
 #define SET_SRAM_ECC_ON(VAR) VAR = ((VAR & ~EF_AMDGPU_FEATURE_SRAMECC_V4) \
 				    | EF_AMDGPU_FEATURE_SRAMECC_ON_V4)
@@ -884,9 +888,11 @@ main (int argc, char **argv)
 	fPIC = true;
       else if (strcmp (argv[i], "-fpic") == 0)
 	fpic = true;
-      else if (strcmp (argv[i], "-mxnack") == 0)
+      else if (strcmp (argv[i], "-mxnack=on") == 0)
 	SET_XNACK_ON (elf_flags);
-      else if (strcmp (argv[i], "-mno-xnack") == 0)
+      else if (strcmp (argv[i], "-mxnack=any") == 0)
+	SET_XNACK_ANY (elf_flags);
+      else if (strcmp (argv[i], "-mxnack=off") == 0)
 	SET_XNACK_OFF (elf_flags);
       else if (strcmp (argv[i], "-msram-ecc=on") == 0)
 	SET_SRAM_ECC_ON (elf_flags);
@@ -1045,8 +1051,9 @@ main (int argc, char **argv)
       obstack_ptr_grow (&ld_argv_obstack, gcn_s2_name);
       obstack_ptr_grow (&ld_argv_obstack, "-lgomp");
       obstack_ptr_grow (&ld_argv_obstack,
-			(TEST_XNACK (elf_flags)
-			 ? "-mxnack" : "-mno-xnack"));
+			(TEST_XNACK_ON (elf_flags) ? "-mxnack=on"
+			 : TEST_XNACK_ANY (elf_flags) ? "-mxnack=any"
+			 : "-mxnack=off"));
       obstack_ptr_grow (&ld_argv_obstack,
 			(TEST_SRAM_ECC_ON (elf_flags) ? "-msram-ecc=on"
 			 : TEST_SRAM_ECC_ANY (elf_flags) ? "-msram-ecc=any"

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 16/17] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK
  2022-07-07 10:34 [PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators Andrew Stubbs
                   ` (14 preceding siblings ...)
  2022-07-07 10:34 ` [PATCH 15/17] amdgcn: Support XNACK mode Andrew Stubbs
@ 2022-07-07 10:34 ` Andrew Stubbs
  2022-07-07 10:34 ` [PATCH 17/17] amdgcn: libgomp plugin USM implementation Andrew Stubbs
  16 siblings, 0 replies; 30+ messages in thread
From: Andrew Stubbs @ 2022-07-07 10:34 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1446 bytes --]


The AMD GCN runtime must be set to the correct mode for Unified Shared Memory
to work, but this is not always clear at compile and link time due to the split
nature of the offload compilation pipeline.

This patch sets a new attribute on OpenMP offload functions to ensure that the
information is passed all the way to the backend.  The backend then places a
marker in the assembler code for mkoffload to find. Finally mkoffload places a
constructor function into the final program to ensure that the HSA_XNACK
environment variable passes the correct mode to the GPU.

The HSA_XNACK variable must be set before the HSA runtime is even loaded, so
it makes more sense to have this set within the constructor than at some point
later within libgomp or the GCN plugin.

gcc/ChangeLog:

	* config/gcn/gcn.cc (unified_shared_memory_enabled): New variable.
	(gcn_init_cumulative_args): Handle attribute "omp unified memory".
	(gcn_hsa_declare_function_name): Emit "MKOFFLOAD OPTIONS: USM+".
	* config/gcn/mkoffload.cc (TEST_XNACK_OFF): New macro.
	(process_asm): Detect "MKOFFLOAD OPTIONS: USM+".
	Emit configure_xnack constructor, as required.
	* omp-low.cc (create_omp_child_function): Add attribute "omp unified
	memory".
---
 gcc/config/gcn/gcn.cc       | 28 +++++++++++++++++++++++++++-
 gcc/config/gcn/mkoffload.cc | 37 ++++++++++++++++++++++++++++++++++++-
 gcc/omp-low.cc              |  4 ++++
 3 files changed, 67 insertions(+), 2 deletions(-)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0016-amdgcn-openmp-Auto-detect-USM-mode-and-set-HSA_XNACK.patch --]
[-- Type: text/x-patch; name="0016-amdgcn-openmp-Auto-detect-USM-mode-and-set-HSA_XNACK.patch", Size: 5927 bytes --]

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 4df05453604..88cc505597e 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -68,6 +68,11 @@ static bool ext_gcn_constants_init = 0;
 
 enum gcn_isa gcn_isa = ISA_GCN3;	/* Default to GCN3.  */
 
+/* Record whether the host compiler added "omp unified memory" attributes to
+   any functions.  We can then pass this on to mkoffload to ensure xnack is
+   compatible there too.  */
+static bool unified_shared_memory_enabled = false;
+
 /* Reserve this much space for LDS (for propagating variables from
    worker-single mode to worker-partitioned mode), per workgroup.  Global
    analysis could calculate an exact bound, but we don't do that yet.
@@ -2542,6 +2547,25 @@ gcn_init_cumulative_args (CUMULATIVE_ARGS *cum /* Argument info to init */ ,
   if (!caller && cfun->machine->normal_function)
     gcn_detect_incoming_pointer_arg (fndecl);
 
+  if (fndecl && lookup_attribute ("omp unified memory",
+				  DECL_ATTRIBUTES (fndecl)))
+    {
+      unified_shared_memory_enabled = true;
+
+      switch (gcn_arch)
+	{
+	case PROCESSOR_FIJI:
+	case PROCESSOR_VEGA10:
+	case PROCESSOR_VEGA20:
+	  error ("GPU architecture does not support Unified Shared Memory");
+	default:
+	  ;
+	}
+
+      if (flag_xnack == HSACO_ATTR_OFF)
+	error ("Unified Shared Memory is enabled, but XNACK is disabled");
+    }
+
   reinit_regs ();
 }
 
@@ -5458,12 +5482,14 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, tree)
   assemble_name (file, name);
   fputs (":\n", file);
 
-  /* This comment is read by mkoffload.  */
+  /* These comments are read by mkoffload.  */
   if (flag_openacc)
     fprintf (file, "\t;; OPENACC-DIMS: %d, %d, %d : %s\n",
 	     oacc_get_fn_dim_size (cfun->decl, GOMP_DIM_GANG),
 	     oacc_get_fn_dim_size (cfun->decl, GOMP_DIM_WORKER),
 	     oacc_get_fn_dim_size (cfun->decl, GOMP_DIM_VECTOR), name);
+  if (unified_shared_memory_enabled)
+    fprintf (file, "\t;; MKOFFLOAD OPTIONS: USM+\n");
 }
 
 /* Implement TARGET_ASM_SELECT_SECTION.
diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index cb8903c27cb..5741d0a917b 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -80,6 +80,8 @@
 			     == EF_AMDGPU_FEATURE_XNACK_ANY_V4)
 #define TEST_XNACK_ON(VAR) ((VAR & EF_AMDGPU_FEATURE_XNACK_V4) \
 			    == EF_AMDGPU_FEATURE_XNACK_ON_V4)
+#define TEST_XNACK_OFF(VAR) ((VAR & EF_AMDGPU_FEATURE_XNACK_V4) \
+			     == EF_AMDGPU_FEATURE_XNACK_OFF_V4)
 
 #define SET_SRAM_ECC_ON(VAR) VAR = ((VAR & ~EF_AMDGPU_FEATURE_SRAMECC_V4) \
 				    | EF_AMDGPU_FEATURE_SRAMECC_ON_V4)
@@ -474,6 +476,7 @@ static void
 process_asm (FILE *in, FILE *out, FILE *cfile)
 {
   int fn_count = 0, var_count = 0, dims_count = 0, regcount_count = 0;
+  bool unified_shared_memory_enabled = false;
   struct obstack fns_os, dims_os, regcounts_os;
   obstack_init (&fns_os);
   obstack_init (&dims_os);
@@ -498,6 +501,7 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
   fn_count += 2;
 
   char buf[1000];
+  char dummy;
   enum
     { IN_CODE,
       IN_METADATA,
@@ -517,6 +521,9 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 		dims_count++;
 	      }
 
+	    if (sscanf (buf, " ;; MKOFFLOAD OPTIONS: USM+%c", &dummy) > 0)
+	      unified_shared_memory_enabled = true;
+
 	    break;
 	  }
 	case IN_METADATA:
@@ -565,7 +572,6 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 	  }
 	}
 
-      char dummy;
       if (sscanf (buf, " .section .gnu.offload_vars%c", &dummy) > 0)
 	{
 	  state = IN_VARS;
@@ -617,6 +623,7 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
   fprintf (cfile, "#include <stdlib.h>\n");
   fprintf (cfile, "#include <stdint.h>\n");
   fprintf (cfile, "#include <stdbool.h>\n\n");
+  fprintf (cfile, "#include <stdio.h>\n\n");
 
   fprintf (cfile, "static const int gcn_num_vars = %d;\n\n", var_count);
 
@@ -657,6 +664,34 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
     }
   fprintf (cfile, "\n};\n\n");
 
+  /* Emit a constructor function to set the HSA_XNACK environment variable.
+     This must be done before the ROCr runtime library is loaded.
+     We never override a user value (except an empty string), but we do emit
+     a helpful diagnostic for the wrong mode (the ROCr message is poor).  */
+  if (TEST_XNACK_OFF (elf_flags) && unified_shared_memory_enabled)
+    fatal_error (input_location,
+		 "conflicting settings; XNACK is forced off but Unified "
+		 "Shared Memory is on");
+  if (!TEST_XNACK_ANY (elf_flags) || unified_shared_memory_enabled)
+    fprintf (cfile,
+	     "static __attribute__((constructor))\n"
+	     "void configure_xnack (void)\n"
+	     "{\n"
+	     "  const char *val = getenv (\"HSA_XNACK\");\n"
+	     "  if (!val || val[0] == '\\0')\n"
+	     "    setenv (\"HSA_XNACK\", \"%d\", true);\n"
+	     "  else if (%s)\n"
+	     "    {\n"
+	     "      fprintf (stderr, \"error: HSA_XNACK=%%s is incompatible; "
+			    "please unset\\n\", val);\n"
+	     "      exit (1);\n"
+	     "    }\n"
+	     "}\n\n",
+	     unified_shared_memory_enabled || TEST_XNACK_ON (elf_flags),
+	     (unified_shared_memory_enabled || TEST_XNACK_ON (elf_flags)
+	      ? "val[0] != '1' || val[1] != '\\0'"
+	      : "val[0] == '1' && val[1] == '\\0'"));
+
   obstack_free (&fns_os, NULL);
   for (i = 0; i < dims_count; i++)
     free (dims[i].name);
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 7d1a2a0d795..239446beb52 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -2107,6 +2107,10 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
 	DECL_ATTRIBUTES (decl)
 	  = tree_cons (get_identifier (target_attr),
 		       NULL_TREE, DECL_ATTRIBUTES (decl));
+      if (flag_offload_memory == OFFLOAD_MEMORY_UNIFIED)
+	DECL_ATTRIBUTES (decl)
+	  = tree_cons (get_identifier ("omp unified memory"),
+		       NULL_TREE, DECL_ATTRIBUTES (decl));
     }
 
   t = build_decl (DECL_SOURCE_LOCATION (decl),

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 17/17] amdgcn: libgomp plugin USM implementation
  2022-07-07 10:34 [PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators Andrew Stubbs
                   ` (15 preceding siblings ...)
  2022-07-07 10:34 ` [PATCH 16/17] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK Andrew Stubbs
@ 2022-07-07 10:34 ` Andrew Stubbs
  16 siblings, 0 replies; 30+ messages in thread
From: Andrew Stubbs @ 2022-07-07 10:34 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1975 bytes --]


Implement the Unified Shared Memory API calls in the GCN plugin.

The allocate and free functions are straightforward because all "target" memory
allocations are compatible with USM, on the right hardware.  However, there's
no known way to check what memory region was used, after the fact, so we use a
splay tree to record allocations so we can answer "is_usm_ptr" later.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Allow
	GOMP_REQUIRES_UNIFIED_ADDRESS and GOMP_REQUIRES_UNIFIED_SHARED_MEMORY.
	(struct usm_splay_tree_key_s): New.
	(usm_splay_compare): New.
	(splay_tree_prefix): New.
	(GOMP_OFFLOAD_usm_alloc): New.
	(GOMP_OFFLOAD_usm_free): New.
	(GOMP_OFFLOAD_is_usm_ptr): New.
	(GOMP_OFFLOAD_supported_features): Move into the OpenMP API fold.
	Add GOMP_REQUIRES_UNIFIED_ADDRESS and
	GOMP_REQUIRES_UNIFIED_SHARED_MEMORY.
	(gomp_fatal): New.
	(splay_tree_c): New.
	* testsuite/lib/libgomp.exp (check_effective_target_omp_usm): New.
	* testsuite/libgomp.c++/usm-1.C: Use dg-require-effective-target.
	* testsuite/libgomp.c-c++-common/requires-1.c: Likewise.
	* testsuite/libgomp.c/usm-1.c: Likewise.
	* testsuite/libgomp.c/usm-2.c: Likewise.
	* testsuite/libgomp.c/usm-3.c: Likewise.
	* testsuite/libgomp.c/usm-4.c: Likewise.
	* testsuite/libgomp.c/usm-5.c: Likewise.
	* testsuite/libgomp.c/usm-6.c: Likewise.
---
 libgomp/plugin/plugin-gcn.c                   | 104 +++++++++++++++++-
 libgomp/testsuite/lib/libgomp.exp             |  22 ++++
 libgomp/testsuite/libgomp.c++/usm-1.C         |   2 +-
 .../libgomp.c-c++-common/requires-1.c         |   1 +
 libgomp/testsuite/libgomp.c/usm-1.c           |   1 +
 libgomp/testsuite/libgomp.c/usm-2.c           |   1 +
 libgomp/testsuite/libgomp.c/usm-3.c           |   1 +
 libgomp/testsuite/libgomp.c/usm-4.c           |   1 +
 libgomp/testsuite/libgomp.c/usm-5.c           |   2 +-
 libgomp/testsuite/libgomp.c/usm-6.c           |   2 +-
 10 files changed, 133 insertions(+), 4 deletions(-)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0017-amdgcn-libgomp-plugin-USM-implementation.patch --]
[-- Type: text/x-patch; name="0017-amdgcn-libgomp-plugin-USM-implementation.patch", Size: 7637 bytes --]

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index ea327bf2ca0..6a9ff5cd93e 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3226,7 +3226,11 @@ GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
   if (!init_hsa_context ())
     return 0;
   /* Return -1 if no omp_requires_mask cannot be fulfilled but
-     devices were present.  */
+     devices were present.
+     Note: not all devices support USM, but the compiler refuses to create
+     binaries for those that don't anyway.  */
+  omp_requires_mask &= ~(GOMP_REQUIRES_UNIFIED_ADDRESS
+			 | GOMP_REQUIRES_UNIFIED_SHARED_MEMORY);
   if (hsa_context.agent_count > 0 && omp_requires_mask != 0)
     return -1;
   return hsa_context.agent_count;
@@ -3810,6 +3814,89 @@ GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars,
 		       GOMP_PLUGIN_target_task_completion, async_data);
 }
 
+/* Use a splay tree to track USM allocations.  */
+
+typedef struct usm_splay_tree_node_s *usm_splay_tree_node;
+typedef struct usm_splay_tree_s *usm_splay_tree;
+typedef struct usm_splay_tree_key_s *usm_splay_tree_key;
+
+struct usm_splay_tree_key_s {
+  void *addr;
+  size_t size;
+};
+
+static inline int
+usm_splay_compare (usm_splay_tree_key x, usm_splay_tree_key y)
+{
+  if ((x->addr <= y->addr && x->addr + x->size > y->addr)
+      || (y->addr <= x->addr && y->addr + y->size > x->addr))
+    return 0;
+
+  return (x->addr > y->addr ? 1 : -1);
+}
+
+#define splay_tree_prefix usm
+#include "../splay-tree.h"
+
+static struct usm_splay_tree_s usm_map = { NULL };
+
+/* Allocate memory suitable for Unified Shared Memory.
+
+   In fact, AMD memory need only be "coarse grained", which target
+   allocations already are.  We do need to track allocations so that
+   GOMP_OFFLOAD_is_usm_ptr can look them up.  */
+
+void *
+GOMP_OFFLOAD_usm_alloc (int device, size_t size)
+{
+  void *ptr = GOMP_OFFLOAD_alloc (device, size);
+
+  usm_splay_tree_node node = malloc (sizeof (struct usm_splay_tree_node_s));
+  node->key.addr = ptr;
+  node->key.size = size;
+  node->left = NULL;
+  node->right = NULL;
+  usm_splay_tree_insert (&usm_map, node);
+
+  return ptr;
+}
+
+/* Free memory allocated via GOMP_OFFLOAD_usm_alloc.  */
+
+bool
+GOMP_OFFLOAD_usm_free (int device, void *ptr)
+{
+  struct usm_splay_tree_key_s key = { ptr, 1 };
+  usm_splay_tree_key node = usm_splay_tree_lookup (&usm_map, &key);
+  if (node)
+    {
+      usm_splay_tree_remove (&usm_map, &key);
+      free (node);
+    }
+
+  return GOMP_OFFLOAD_free (device, ptr);
+}
+
+/* True if the memory was allocated via GOMP_OFFLOAD_usm_alloc.  */
+
+bool
+GOMP_OFFLOAD_is_usm_ptr (void *ptr)
+{
+  struct usm_splay_tree_key_s key = { ptr, 1 };
+  return usm_splay_tree_lookup (&usm_map, &key);
+}
+
+/* Indicate which GOMP_REQUIRES_* features are supported.  */
+
+bool
+GOMP_OFFLOAD_supported_features (unsigned int *mask)
+{
+  *mask &= ~(GOMP_REQUIRES_UNIFIED_ADDRESS
+             | GOMP_REQUIRES_UNIFIED_SHARED_MEMORY);
+
+  return (*mask == 0);
+}
+
 /* }}} */
 /* {{{ OpenACC Plugin API  */
 
@@ -4084,3 +4171,18 @@ GOMP_OFFLOAD_openacc_destroy_thread_data (void *data)
 }
 
 /* }}} */
+/* {{{ USM splay tree */
+
+/* Include this now so that splay-tree.c doesn't include it later.  This
+   avoids a conflict with splay_tree_prefix.  */
+#include "libgomp.h"
+
+/* This allows splay-tree.c to call gomp_fatal in this context.  The splay
+   tree code doesn't use the variadic arguments right now.  */
+#define gomp_fatal(MSG, ...) GOMP_PLUGIN_fatal (MSG)
+
+/* Include the splay tree code inline, with the prefixes added.  */
+#define splay_tree_prefix usm
+#define splay_tree_c
+#include "../splay-tree.h"
+/* }}}  */
diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 891f90929d2..dce1af279e1 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -536,3 +536,25 @@ int main() {
     return 0;
 } } "-lcuda -lcudart" ]
 }
+
+# Return 1 if OpenMP Unified Shared Memory is supported.
+
+proc check_effective_target_omp_usm { } {
+    if { [libgomp_check_effective_target_offload_target "nvptx"] } {
+	return 1
+    }
+
+    if { [libgomp_check_effective_target_offload_target "amdgcn"] } {
+	return [check_no_compiler_messages omp_usm executable {
+           #pragma omp requires unified_shared_memory
+	   int main () {
+	     #pragma omp target
+	       ;
+	     return 0;
+	   }
+	}]
+    }
+
+    return 0
+}
+
diff --git a/libgomp/testsuite/libgomp.c++/usm-1.C b/libgomp/testsuite/libgomp.c++/usm-1.C
index fea25e5f10b..6e88f90d61f 100644
--- a/libgomp/testsuite/libgomp.c++/usm-1.C
+++ b/libgomp/testsuite/libgomp.c++/usm-1.C
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-skip-if "Only valid for nvptx" { ! offload_target_nvptx } } */
+/* { dg-require-effective-target omp_usm } */
 #include <stdint.h>
 
 #pragma omp requires unified_shared_memory
diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-1.c b/libgomp/testsuite/libgomp.c-c++-common/requires-1.c
index fedf9779769..b760d5ebaf7 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/requires-1.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/requires-1.c
@@ -1,5 +1,6 @@
 /* { dg-do link { target { offload_target_nvptx || offload_target_amdgcn } } } */
 /* { dg-additional-sources requires-1-aux.c } */
+/* { dg-require-effective-target omp_usm } */
 
 /* Check diagnostic by device-compiler's lto1.
    Other file uses: 'requires unified_address'.  */
diff --git a/libgomp/testsuite/libgomp.c/usm-1.c b/libgomp/testsuite/libgomp.c/usm-1.c
index 1b35f19c45b..e73f1816f9a 100644
--- a/libgomp/testsuite/libgomp.c/usm-1.c
+++ b/libgomp/testsuite/libgomp.c/usm-1.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target omp_usm } */
 
 #include <omp.h>
 #include <stdint.h>
diff --git a/libgomp/testsuite/libgomp.c/usm-2.c b/libgomp/testsuite/libgomp.c/usm-2.c
index 689cee7e456..31f2bae7145 100644
--- a/libgomp/testsuite/libgomp.c/usm-2.c
+++ b/libgomp/testsuite/libgomp.c/usm-2.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target omp_usm } */
 
 #include <omp.h>
 #include <stdint.h>
diff --git a/libgomp/testsuite/libgomp.c/usm-3.c b/libgomp/testsuite/libgomp.c/usm-3.c
index 2ca66afe93f..2c78a0d8ced 100644
--- a/libgomp/testsuite/libgomp.c/usm-3.c
+++ b/libgomp/testsuite/libgomp.c/usm-3.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target omp_usm } */
 
 #include <omp.h>
 #include <stdint.h>
diff --git a/libgomp/testsuite/libgomp.c/usm-4.c b/libgomp/testsuite/libgomp.c/usm-4.c
index 753908c8440..1ac5498f73f 100644
--- a/libgomp/testsuite/libgomp.c/usm-4.c
+++ b/libgomp/testsuite/libgomp.c/usm-4.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target omp_usm } */
 
 #include <omp.h>
 #include <stdint.h>
diff --git a/libgomp/testsuite/libgomp.c/usm-5.c b/libgomp/testsuite/libgomp.c/usm-5.c
index 4d8b3cf71b1..563397f941a 100644
--- a/libgomp/testsuite/libgomp.c/usm-5.c
+++ b/libgomp/testsuite/libgomp.c/usm-5.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target offload_device } */
+/* { dg-require-effective-target omp_usm } */
 
 #include <omp.h>
 #include <stdint.h>
diff --git a/libgomp/testsuite/libgomp.c/usm-6.c b/libgomp/testsuite/libgomp.c/usm-6.c
index c207140092a..bd14f8197b3 100644
--- a/libgomp/testsuite/libgomp.c/usm-6.c
+++ b/libgomp/testsuite/libgomp.c/usm-6.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-skip-if "Only valid for nvptx" { ! offload_target_nvptx } } */
+/* { dg-require-effective-target omp_usm } */
 
 #include <stdint.h>
 #include <stdlib.h>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 08/17] openmp: -foffload-memory=pinned
  2022-07-07 10:34 ` [PATCH 08/17] openmp: -foffload-memory=pinned Andrew Stubbs
@ 2022-07-07 11:54   ` Tobias Burnus
  2022-07-07 22:18     ` Andrew Stubbs
  0 siblings, 1 reply; 30+ messages in thread
From: Tobias Burnus @ 2022-07-07 11:54 UTC (permalink / raw)
  To: Andrew Stubbs, gcc-patches

Hi Andrew,

On 07.07.22 12:34, Andrew Stubbs wrote:
> Implement the -foffload-memory=pinned option such that libgomp is
> instructed to enable fully-pinned memory at start-up.  The option is
> intended to provide a performance boost to certain offload programs without
> modifying the code.
...
> gcc/ChangeLog:
>
>       * omp-builtins.def (BUILT_IN_GOMP_ENABLE_PINNED_MODE): New.
>       * omp-low.cc (omp_enable_pinned_mode): New function.
>       (execute_lower_omp): Call omp_enable_pinned_mode.
>
> libgomp/ChangeLog:
>
>       * config/linux/allocator.c (always_pinned_mode): New variable.
>       (GOMP_enable_pinned_mode): New function.
>       (linux_memspace_alloc): Disable pinning when always_pinned_mode set.
>       (linux_memspace_calloc): Likewise.
>       (linux_memspace_free): Likewise.
>       (linux_memspace_realloc): Likewise.
>       * libgomp.map: Add GOMP_enable_pinned_mode.
>       * testsuite/libgomp.c/alloc-pinned-7.c: New test.
> ...
...
> --- a/gcc/omp-low.cc
> +++ b/gcc/omp-low.cc
> @@ -14620,6 +14620,68 @@ lower_omp (gimple_seq *body, omp_context *ctx)
>     input_location = saved_location;
>   }
>
> +/* Emit a constructor function to enable -foffload-memory=pinned
> +   at runtime.  Libgomp handles the OS mode setting, but we need to trigger
> +   it by calling GOMP_enable_pinned mode before the program proper runs.  */
> +
> +static void
> +omp_enable_pinned_mode ()

Is there a reason not to use the mechanism of OpenMP's 'requires'
directive for this?

(Okay, I have to admit that the final patch was only committed on
Monday. But still ...)

It looks very similar in spirit. I don't know whether there are issues
with having -foffload-memory=pinned in some TUs but not in others, but that
could be handled in a similar way to GOMP_REQUIRES_TARGET_USED.

For requires, omp_requires_mask is streamed out if
OMP_REQUIRES_TARGET_USED is set and g->have_offload is true. (For
completeness, it also requires ENABLE_OFFLOADING.)

This data is read in by every lto1 invocation (in lto-cgraph.cc) and checked
for consistency. It is then also passed on to *mkoffload.cc.

And in libgomp, it is processed by GOMP_register_ver.

Likewise, the 'requires' mechanism could then also be used in '[PATCH
16/17] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK'.

Tobias

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 08/17] openmp: -foffload-memory=pinned
  2022-07-07 11:54   ` Tobias Burnus
@ 2022-07-07 22:18     ` Andrew Stubbs
  2022-07-08  9:00       ` Tobias Burnus
  2023-02-20 14:59       ` Prototype 'GOMP_enable_pinned_mode' (was: [PATCH 08/17] openmp: -foffload-memory=pinned) Thomas Schwinge
  0 siblings, 2 replies; 30+ messages in thread
From: Andrew Stubbs @ 2022-07-07 22:18 UTC (permalink / raw)
  To: Tobias Burnus, gcc-patches

On 07/07/2022 12:54, Tobias Burnus wrote:
> Hi Andrew,
> 
> On 07.07.22 12:34, Andrew Stubbs wrote:
>> Implement the -foffload-memory=pinned option such that libgomp is
>> instructed to enable fully-pinned memory at start-up.  The option is
>> intended to provide a performance boost to certain offload programs 
>> without
>> modifying the code.
> ...
>> gcc/ChangeLog:
>>
>>     * omp-builtins.def (BUILT_IN_GOMP_ENABLE_PINNED_MODE): New.
>>     * omp-low.cc (omp_enable_pinned_mode): New function.
>>     (execute_lower_omp): Call omp_enable_pinned_mode.
>>
>> libgomp/ChangeLog:
>>
>>     * config/linux/allocator.c (always_pinned_mode): New variable.
>>     (GOMP_enable_pinned_mode): New function.
>>     (linux_memspace_alloc): Disable pinning when always_pinned_mode set.
>>     (linux_memspace_calloc): Likewise.
>>     (linux_memspace_free): Likewise.
>>     (linux_memspace_realloc): Likewise.
>>     * libgomp.map: Add GOMP_enable_pinned_mode.
>>     * testsuite/libgomp.c/alloc-pinned-7.c: New test.
>> ...
> ...
>> --- a/gcc/omp-low.cc
>> +++ b/gcc/omp-low.cc
>> @@ -14620,6 +14620,68 @@ lower_omp (gimple_seq *body, omp_context *ctx)
>>     input_location = saved_location;
>>   }
>> +/* Emit a constructor function to enable -foffload-memory=pinned
>> +   at runtime.  Libgomp handles the OS mode setting, but we need to trigger
>> +   it by calling GOMP_enable_pinned_mode before the program proper runs.  */
>> +
>> +static void
>> +omp_enable_pinned_mode ()
> 
> Is there a reason not to use the mechanism of OpenMP's 'requires' 
> directive for this?
> 
> (Okay, I have to admit that the final patch was only committed on 
> Monday. But still ...)

Possibly, I had most of this done before then. I'll have a look next 
time I visit this patch.

The Cuda-specific solution can't work this way anyway, because there's
no mlockall equivalent, so I will need to make conditional adjustments
regardless.
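For readers unfamiliar with the mechanism, here is a minimal sketch of an "everything pinned from start-up" mode on Linux. It assumes the mode is built on mlockall, as the reference to mlockall above suggests. The names always_pinned_sketch, enable_pinned_mode_sketch and pin_region_sketch are hypothetical; this is not libgomp's GOMP_enable_pinned_mode.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical flag: once set, every current and future page is locked,
   so per-allocation mlock calls become redundant.  */
static int always_pinned_sketch = 0;
static int pinned_mode_rc = 1;  /* 1 = not yet attempted.  */

/* A constructor runs before main, which is why a compiler-emitted
   constructor can enable the mode without modifying user code.  */
__attribute__ ((constructor)) static void
enable_pinned_mode_sketch (void)
{
  pinned_mode_rc = mlockall (MCL_CURRENT | MCL_FUTURE);
  if (pinned_mode_rc == 0)
    always_pinned_sketch = 1;
  else
    /* Typically EPERM or ENOMEM when RLIMIT_MEMLOCK is too small.  */
    fprintf (stderr, "pinned mode unavailable: %s\n", strerror (errno));
}

/* Per-allocation pinning, skipped when everything is already locked.  */
static int
pin_region_sketch (void *addr, size_t len)
{
  if (always_pinned_sketch)
    return 0;
  return mlock (addr, len);
}
```

Note that an unprivileged process may see mlockall fail because of RLIMIT_MEMLOCK. That is one reason such a mode needs a graceful fallback.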

> Likewise, the 'requires' mechanism could then also be used in '[PATCH 
> 16/17] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK'.

No, I don't think so; that environment variable needs to be set before 
the libraries are loaded or it's too late.  There are other ways to 
achieve the same thing, by leaving messages for the libgomp plugin to 
pick up, perhaps, but it's all extra complexity for no real gain.

Andrew


* Re: [PATCH 08/17] openmp: -foffload-memory=pinned
  2022-07-07 22:18     ` Andrew Stubbs
@ 2022-07-08  9:00       ` Tobias Burnus
  2022-07-08  9:55         ` Andrew Stubbs
  2023-02-20 14:59       ` Prototype 'GOMP_enable_pinned_mode' (was: [PATCH 08/17] openmp: -foffload-memory=pinned) Thomas Schwinge
  1 sibling, 1 reply; 30+ messages in thread
From: Tobias Burnus @ 2022-07-08  9:00 UTC (permalink / raw)
  To: Andrew Stubbs, gcc-patches

On 08.07.22 00:18, Andrew Stubbs wrote:
>> Likewise, the 'requires' mechanism could then also be used in '[PATCH
>> 16/17] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK'.
>
> No, I don't think so; that environment variable needs to be set before
> the libraries are loaded or it's too late.  There are other ways to
> achieve the same thing, by leaving messages for the libgomp plugin to
> pick up, perhaps, but it's all extra complexity for no real gain.

I think we talk about two different things:

(a) where (and when) to check/set the environment variable. I think this
part is fine. You could consider moving the generated 'configure_xnack'
code into the existing 'init' constructor function,
but it does not really matter. (Nor does the order in which the
constructor function runs.)

(I also do not see any benefit in moving it to libgomp. The message
could then be suppressed if no device is available, or similar tricks, but
I do not see any real advantage in moving it.)

Longer side note: I think the message "error: HSA_XNACK=%%s is
incompatible; please unset" could be clearer, both in terms of who issues
it and in that it talks about an environment variable. Maybe:

"libgomp: fatal error: Environment variable HSA_XNACK=%s is
incompatible with GCN offloading; please unset"

or something like that. (I did misuse 'libgomp:' for this; I am not
sure that makes sense or whether it is even more misleading.) I am also not
sure GCN fits that well, given that CDNA is not GCN, but that is a general
problem. In any case, adding "fatal", "environment variable" and
"offloading" surely makes sense, IMHO.

(b) How the value is made available inside both gcc/config/gcn/gcn.cc
and in mkoffload.cc.

I was talking about (b). Namely:

omp_requires_mask is already available in gcc/config/gcn/gcn.cc and
mkoffload.cc. Thus, there is no reason to reinvent the wheel and come
up with another means to pass the same kind of data to the very same files.

(You still might want to add another flag to it, assuming 'omp requires
unified_shared_memory' alias OMP_REQUIRES_UNIFIED_SHARED_MEMORY is
insufficient.)

Tobias

* Re: [PATCH 08/17] openmp: -foffload-memory=pinned
  2022-07-08  9:00       ` Tobias Burnus
@ 2022-07-08  9:55         ` Andrew Stubbs
  2022-07-08  9:57           ` Tobias Burnus
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Stubbs @ 2022-07-08  9:55 UTC (permalink / raw)
  To: Tobias Burnus, gcc-patches

On 08/07/2022 10:00, Tobias Burnus wrote:
> On 08.07.22 00:18, Andrew Stubbs wrote:
>>> Likewise, the 'requires' mechanism could then also be used in '[PATCH 
>>> 16/17] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK'.
>>
>> No, I don't think so; that environment variable needs to be set before 
>> the libraries are loaded or it's too late.  There are other ways to 
>> achieve the same thing, by leaving messages for the libgomp plugin to 
>> pick up, perhaps, but it's all extra complexity for no real gain. 
> 
> I think we talk about two different things:
> 
> 
> (a) where (and when) to check/set the environment variable. I think this 
> part is fine. You could consider moving the generated 'configure_xnack'
> code into the existing 'init' constructor function,
> but it does not really matter. (Nor does the order in which the 
> constructor function runs.)
> 
> (I also do not see any benefit in moving it to libgomp. The message
> could then be suppressed if no device is available, or similar tricks, but
> I do not see any real advantage in moving it.)
> 
> Longer side note: I think the message "error: HSA_XNACK=%%s is
> incompatible; please unset" could be clearer, both in terms of who issues
> it and in that it talks about an environment variable. Maybe:
>
> "libgomp: fatal error: Environment variable HSA_XNACK=%s is
> incompatible with GCN offloading; please unset"
>
> or something like that. (I did misuse 'libgomp:' for this; I am not
> sure that makes sense or whether it is even more misleading.) I am also not
> sure GCN fits that well, given that CDNA is not GCN, but that is a general
> problem. In any case, adding "fatal", "environment variable" and
> "offloading" surely makes sense, IMHO.

It's not incompatible with GCN offloading, only with the XNACK mode in 
which the binary was compiled (i.e. USM on or off).

The message could be less terse, indeed. I went through a variety of 
messages for this and couldn't find one that I liked. How about...

   fatal error: HSA_XNACK=%s is set but this program was compiled for 
HSA_XNACK=%s; please unset your environment variable.
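Here is a minimal sketch of the start-up check under discussion. It assumes two things: the XNACK mode the binary was compiled for is available as a string, and the check runs (e.g. from a constructor) before the HSA runtime reads HSA_XNACK. The function name and parameter are hypothetical. The message text follows the wording proposed above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 0 if the environment is (now) consistent with COMPILED_MODE, or
   -1 after printing a fatal error.  A real constructor would exit on
   failure; returning keeps the sketch testable.  */
static int
configure_xnack_sketch (const char *compiled_mode)
{
  const char *val = getenv ("HSA_XNACK");
  if (val == NULL)
    /* Unset: select the mode this binary needs before the runtime
       library is loaded and reads the variable.  */
    return setenv ("HSA_XNACK", compiled_mode, 1) == 0 ? 0 : -1;

  if (strcmp (val, compiled_mode) != 0)
    {
      fprintf (stderr, "fatal error: HSA_XNACK=%s is set but this program "
               "was compiled for HSA_XNACK=%s; please unset your "
               "environment variable.\n", val, compiled_mode);
      return -1;
    }
  return 0;
}
```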

> (b) How the value is made available inside both gcc/config/gcn/gcn.cc 
> and in mkoffload.cc.
> 
> I was talking about (b). Namely:
> 
> omp_requires_mask is already available in gcc/config/gcn/gcn.cc and
> mkoffload.cc. Thus, there is no reason to reinvent the wheel and come
> up with another means to pass the same kind of data to the very same files.
>
> (You still might want to add another flag to it, assuming 'omp requires
> unified_shared_memory' alias OMP_REQUIRES_UNIFIED_SHARED_MEMORY is
> insufficient.)

OK, this is a new feature that I probably should investigate.

Thanks

Andrew



* Re: [PATCH 08/17] openmp: -foffload-memory=pinned
  2022-07-08  9:55         ` Andrew Stubbs
@ 2022-07-08  9:57           ` Tobias Burnus
  0 siblings, 0 replies; 30+ messages in thread
From: Tobias Burnus @ 2022-07-08  9:57 UTC (permalink / raw)
  To: Andrew Stubbs, gcc-patches

On 08.07.22 11:55, Andrew Stubbs wrote:
> It's not incompatible with GCN offloading, only with the XNACK mode in
> which the binary was compiled (i.e. USM on or off).
>
> The message could be less terse, indeed. I went through a variety of
> messages for this and couldn't find one that I liked. How about...
>
>   fatal error: HSA_XNACK=%s is set but this program was compiled for
> HSA_XNACK=%s; please unset your environment variable.

For what it is worth: I like this message.

Thanks,

Tobias


* Re: [PATCH 01/17] libgomp, nvptx: low-latency memory allocator
  2022-07-07 10:34 ` [PATCH 01/17] libgomp, nvptx: low-latency memory allocator Andrew Stubbs
@ 2022-12-08 11:40   ` Jakub Jelinek
  0 siblings, 0 replies; 30+ messages in thread
From: Jakub Jelinek @ 2022-12-08 11:40 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches

On Thu, Jul 07, 2022 at 11:34:32AM +0100, Andrew Stubbs wrote:
> libgomp/ChangeLog:
> 
> 	* allocator.c (MEMSPACE_ALLOC): New macro.
> 	(MEMSPACE_CALLOC): New macro.
> 	(MEMSPACE_REALLOC): New macro.
> 	(MEMSPACE_FREE): New macro.
> 	(dynamic_smem_size): New constants.
> 	(omp_alloc): Use MEMSPACE_ALLOC.
> 	Implement fall-backs for predefined allocators.
> 	(omp_free): Use MEMSPACE_FREE.
> 	(omp_calloc): Use MEMSPACE_CALLOC.
> 	Implement fall-backs for predefined allocators.
> 	(omp_realloc): Use MEMSPACE_REALLOC and MEMSPACE_ALLOC.
> 	Implement fall-backs for predefined allocators.
> 	* config/nvptx/team.c (__nvptx_lowlat_heap_root): New variable.
> 	(__nvptx_lowlat_pool): New asm variable.
> 	(gomp_nvptx_main): Initialize the low-latency heap.
> 	* plugin/plugin-nvptx.c (lowlat_pool_size): New variable.
> 	(GOMP_OFFLOAD_init_device): Read the GOMP_NVPTX_LOWLAT_POOL envvar.
> 	(GOMP_OFFLOAD_run): Apply lowlat_pool_size.
> 	* config/nvptx/allocator.c: New file.
> 	* testsuite/libgomp.c/allocators-1.c: New test.
> 	* testsuite/libgomp.c/allocators-2.c: New test.
> 	* testsuite/libgomp.c/allocators-3.c: New test.
> 	* testsuite/libgomp.c/allocators-4.c: New test.
> 	* testsuite/libgomp.c/allocators-5.c: New test.
> 	* testsuite/libgomp.c/allocators-6.c: New test.
> 
> co-authored-by: Kwok Cheung Yeung  <kcy@codesourcery.com>

> +/* These macros may be overridden in config/<target>/allocator.c.  */
> +#ifndef MEMSPACE_ALLOC
> +#define MEMSPACE_ALLOC(MEMSPACE, SIZE) malloc (SIZE)
> +#endif

Rather than uglifying the sources with __attribute__((unused)) on the
memspace variables, wouldn't it be better to always use MEMSPACE?
So,
#define MEMSPACE_ALLOC(MEMSPACE, SIZE) malloc (((MEMSPACE), (SIZE)))
or so (similarly other macros)?
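To illustrate the suggestion: evaluating MEMSPACE inside the macro means a caller's memspace variable counts as used, so no __attribute__((unused)) is needed. This sketch is a variant of the form above, not a quote of it. GCC can warn (-Wunused-value) when the left operand of a comma expression has no effect, so the sketch casts the discarded operands to void. The caller alloc_ints is hypothetical.

```c
#include <stdlib.h>

/* Evaluate and discard MEMSPACE (and SIZE where unused); the (void) casts
   keep -Wunused-value quiet while the variables still count as used.  */
#define MEMSPACE_ALLOC(MEMSPACE, SIZE) ((void) (MEMSPACE), malloc (SIZE))
#define MEMSPACE_CALLOC(MEMSPACE, SIZE) ((void) (MEMSPACE), calloc (1, SIZE))
#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
  ((void) (MEMSPACE), (void) (SIZE), free (ADDR))

/* Hypothetical caller: memspace needs no unused attribute.  */
static int *
alloc_ints (int memspace, size_t n)
{
  return MEMSPACE_CALLOC (memspace, n * sizeof (int));
}
```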

> +#ifndef MEMSPACE_CALLOC
> +#define MEMSPACE_CALLOC(MEMSPACE, SIZE) calloc (1, SIZE)
> +#endif
> +#ifndef MEMSPACE_REALLOC
> +#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) realloc (ADDR, SIZE)
> +#endif
> +#ifndef MEMSPACE_FREE
> +#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) free (ADDR)
> +#endif

> +/* Map the predefined allocators to the correct memory space.
> +   The index to this table is the omp_allocator_handle_t enum value.  */
> +static const omp_memspace_handle_t predefined_alloc_mapping[] = {
> +  omp_default_mem_space,   /* omp_null_allocator. */
> +  omp_default_mem_space,   /* omp_default_mem_alloc. */
> +  omp_large_cap_mem_space, /* omp_large_cap_mem_alloc. */
> +  omp_default_mem_space,   /* omp_const_mem_alloc. */

Shouldn't this be omp_const_mem_space?
That is what the standard says, and you need to handle it in MEMSPACE_ALLOC
etc. anyway, because omp_init_allocator could be called with that memspace.

> +  omp_high_bw_mem_space,   /* omp_high_bw_mem_alloc. */
> +  omp_low_lat_mem_space,   /* omp_low_lat_mem_alloc. */
> +  omp_low_lat_mem_space,   /* omp_cgroup_mem_alloc. */
> +  omp_low_lat_mem_space,   /* omp_pteam_mem_alloc. */
> +  omp_low_lat_mem_space,   /* omp_thread_mem_alloc. */

The above 3 are implementation defined, so we can choose whatever we want.
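One way to make an index-by-enum table like this harder to get wrong is to use C99 designated initializers, so that each entry names the allocator it belongs to. The enums below are local stand-ins that mirror the omp.h names; their values are an assumption. The mapping follows the review: omp_const_mem_alloc maps to omp_const_mem_space, and the last three entries are implementation-defined choices.

```c
/* Local stand-ins for the omp.h enums (values assumed).  */
typedef enum
{
  omp_default_mem_space,
  omp_large_cap_mem_space,
  omp_const_mem_space,
  omp_high_bw_mem_space,
  omp_low_lat_mem_space
} omp_memspace_handle_t;

typedef enum
{
  omp_null_allocator,
  omp_default_mem_alloc,
  omp_large_cap_mem_alloc,
  omp_const_mem_alloc,
  omp_high_bw_mem_alloc,
  omp_low_lat_mem_alloc,
  omp_cgroup_mem_alloc,
  omp_pteam_mem_alloc,
  omp_thread_mem_alloc
} omp_allocator_handle_t;

/* Each entry is tied to its index by name, so reordering is harmless.  */
static const omp_memspace_handle_t predefined_alloc_mapping[] = {
  [omp_null_allocator]      = omp_default_mem_space,
  [omp_default_mem_alloc]   = omp_default_mem_space,
  [omp_large_cap_mem_alloc] = omp_large_cap_mem_space,
  [omp_const_mem_alloc]     = omp_const_mem_space,
  [omp_high_bw_mem_alloc]   = omp_high_bw_mem_space,
  [omp_low_lat_mem_alloc]   = omp_low_lat_mem_space,
  /* Implementation-defined: low-latency memory chosen here.  */
  [omp_cgroup_mem_alloc]    = omp_low_lat_mem_space,
  [omp_pteam_mem_alloc]     = omp_low_lat_mem_space,
  [omp_thread_mem_alloc]    = omp_low_lat_mem_space,
};
```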

> @@ -496,35 +530,38 @@ retry:
>    return ret;
>  
>  fail:
> -  if (allocator_data)
> +  int fallback = (allocator_data
> +		  ? allocator_data->fallback
> +		  : allocator == omp_default_mem_alloc
> +		  ? omp_atv_null_fb
> +		  : omp_atv_default_mem_fb);

Only C2X (and C++) allow a label to be directly followed by a declaration;
I think we should keep libgomp in C99 for the time being.
So, it should be
fail:;
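In C99 through C17, a label must label a statement, and a declaration is not a statement. So "fail: int fallback = ...;" is rejected, while adding an empty statement after the colon is valid. A toy function (hypothetical, not from the patch) showing the idiom:

```c
/* Hypothetical example of the "fail:;" idiom used before a declaration.  */
static int
parse_digit_or_fallback (char c, int default_value)
{
  if (c < '0' || c > '9')
    goto fail;
  return c - '0';

fail:;
  /* The empty statement ';' above lets this declaration follow the
     label in C99/C11/C17.  */
  int fallback = default_value;
  return fallback;
}
```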

> +	  || (allocator_data
> +	      && allocator_data->pool_size < ~(uintptr_t) 0)
> +	  || !allocator_data)

This would be better written as:
	  || allocator_data == NULL
	  || allocator_data->pool_size < ~(uintptr_t) 0)
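The two forms are equivalent. With p standing for "allocator_data != NULL" and x for the pool_size comparison, (p && x) || !p simplifies to !p || x. Short-circuit evaluation still guarantees that pool_size is read only when allocator_data is non-NULL. Here is a quick exhaustive check of that identity, using stand-in booleans:

```c
#include <stdbool.h>

/* Form in the patch.  */
static bool
original_form (bool p, bool x)
{
  return (p && x) || !p;
}

/* Suggested form: NULL test first, then the pool size.  */
static bool
suggested_form (bool p, bool x)
{
  return !p || x;
}

/* True if both forms agree on all four input combinations.  */
static bool
forms_agree (void)
{
  for (int p = 0; p < 2; p++)
    for (int x = 0; x < 2; x++)
      if (original_form (p, x) != suggested_form (p, x))
        return false;
  return true;
}
```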

> @@ -766,35 +816,38 @@ retry:
>    return ret;
>  
>  fail:
> -  if (allocator_data)
> +  int fallback = (allocator_data
> +		  ? allocator_data->fallback
> +		  : allocator == omp_default_mem_alloc
> +		  ? omp_atv_null_fb
> +		  : omp_atv_default_mem_fb);

See above.

> +	  || (allocator_data
> +	      && allocator_data->pool_size < ~(uintptr_t) 0)
> +	  || !allocator_data)

And again.

> @@ -1073,35 +1139,38 @@ retry:
>    return ret;
>  
>  fail:
> -  if (allocator_data)
> +  int fallback = (allocator_data

And again.

> +	  || (allocator_data
> +	      && allocator_data->pool_size < ~(uintptr_t) 0)
> +	  || !allocator_data)

And again.

> --- /dev/null
> +++ b/libgomp/config/nvptx/allocator.c
> @@ -0,0 +1,370 @@
> +/* Copyright (C) 2021 Free Software Foundation, Inc.

-2022

> +static void *
> +nvptx_memspace_alloc (omp_memspace_handle_t memspace, size_t size)
> +{
> +  if (memspace == omp_low_lat_mem_space)
> +    {
> +      char *shared_pool;
> +      asm ("cvta.shared.u64\t%0, __nvptx_lowlat_pool;" : "=r"(shared_pool));

Space between " and (

> +      uint32_t *chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset);

Space between ) and ( and before *

> +	  chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset);

Ditto.

> +	  uint32_t *stillfreeptr = (uint32_t*)(shared_pool
> +					       + stillfree.desc.offset);

And again.

> +	for (unsigned i = 0; i < (unsigned)size/8; i++)

Space between ) and size, and a space on each side of /

> +	  result[i] = 0;
> +
> +      return result;
> +    }
> +  else
> +    return calloc (1, size);
> +}
> +
> +static void
> +nvptx_memspace_free (omp_memspace_handle_t memspace, void *addr, size_t size)
> +{
> +  if (memspace == omp_low_lat_mem_space)
> +    {
> +      char *shared_pool;
> +      asm ("cvta.shared.u64\t%0, __nvptx_lowlat_pool;" : "=r"(shared_pool));

Formatting.

> +      uint32_t *chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset);

Again.
> +      heapdesc onward_chain = {chunkptr[0]};
> +      while (chunk.desc.size != 0 && addr > (void*)chunkptr)

Again (won't enumerate anymore).

> --- a/libgomp/plugin/plugin-nvptx.c
> +++ b/libgomp/plugin/plugin-nvptx.c
> @@ -334,6 +334,11 @@ struct ptx_device
>  
>  static struct ptx_device **ptx_devices;
>  
> +/* OpenMP kernels reserve a small amount of ".shared" space for use by
> +   omp_alloc.  The size is configured using GOMP_NVPTX_LOWLAT_POOL, but the
> +   default is set here.  */
> +static unsigned lowlat_pool_size = 8*1024;

Spaces around *
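For context, here is a minimal sketch of reading a default-plus-override size such as GOMP_NVPTX_LOWLAT_POOL. It assumes the variable holds a plain decimal byte count. The helper name and its policy of falling back to the default on unparsable input are hypothetical; the real parsing in plugin-nvptx.c may differ.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Default size of the ".shared" low-latency pool, in bytes.  */
#define LOWLAT_POOL_DEFAULT (8 * 1024)

/* Return the size given in ENVVAR, or the default if it is unset, empty,
   negative, malformed or out of range.  */
static unsigned
read_pool_size (const char *envvar)
{
  const char *s = getenv (envvar);
  if (s == NULL || *s == '\0' || *s == '-')
    return LOWLAT_POOL_DEFAULT;

  char *end;
  errno = 0;
  unsigned long v = strtoul (s, &end, 10);
  if (errno != 0 || *end != '\0' || v > UINT_MAX)
    return LOWLAT_POOL_DEFAULT;
  return (unsigned) v;
}
```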
> +void
> +test (int n, omp_allocator_handle_t allocator)
> +{
> +  #pragma omp target map(to:n) map(to:allocator)
> +  {
> +    int *a;
> +    a = (int *) omp_alloc(n*sizeof(int), allocator);

Space before ( (twice) and around *.
> +
> +    omp_free(a, allocator);

Space before (
> +    a = (int **) omp_alloc(n*sizeof(int*), allocator);

Again plus space before *)

> +	a[i] = omp_alloc(sizeof(int)*10, allocator);

Again.
> +      omp_free(a[i], allocator);

Again.
> +
> +return 0;

2 spaces before return 0;
> +}
> +

	Jakub



* Re: [PATCH 02/17] libgomp: pinned memory
  2022-07-07 10:34 ` [PATCH 02/17] libgomp: pinned memory Andrew Stubbs
@ 2022-12-08 12:11   ` Jakub Jelinek
  2022-12-08 12:51     ` Andrew Stubbs
  0 siblings, 1 reply; 30+ messages in thread
From: Jakub Jelinek @ 2022-12-08 12:11 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches

On Thu, Jul 07, 2022 at 11:34:33AM +0100, Andrew Stubbs wrote:
> 
> Implement the OpenMP pinned memory trait on Linux hosts using the mlock
> syscall.  Pinned allocations are performed using mmap, not malloc, to ensure
> that they can be unpinned safely when freed.

As I said before, I think the pinned memory is too precious to waste this
way, we should handle the -> pinned case through memkind_create_fixed on
mmap + mlock area, that way we can create even quite small pinned
allocations.

	Jakub



* Re: [PATCH 02/17] libgomp: pinned memory
  2022-12-08 12:11   ` Jakub Jelinek
@ 2022-12-08 12:51     ` Andrew Stubbs
  2022-12-08 14:02       ` Tobias Burnus
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Stubbs @ 2022-12-08 12:51 UTC (permalink / raw)
  To: Jakub Jelinek, Thomas Schwinge; +Cc: gcc-patches

On 08/12/2022 12:11, Jakub Jelinek wrote:
> On Thu, Jul 07, 2022 at 11:34:33AM +0100, Andrew Stubbs wrote:
>>
>> Implement the OpenMP pinned memory trait on Linux hosts using the mlock
>> syscall.  Pinned allocations are performed using mmap, not malloc, to ensure
>> that they can be unpinned safely when freed.
> 
> As I said before, I think the pinned memory is too precious to waste it this
> way, we should handle the -> pinned case through memkind_create_fixed on
> mmap + mlock area, that way we can create even quite small pinned
> allocations.

This has been delayed due to other priorities. Our current plan is
to switch to using cudaHostAlloc when available, but we can certainly
use memkind_create_fixed for the fallback case (including amdgcn).

Using Cuda might be trickier to implement because there's a layering
violation inherent in routing target-independent allocations through the
nvptx plugin, but benchmarking shows that that's the only way to get the
faster path through the Cuda black box; being pinned is good because it
avoids page faults, but apparently if Cuda *knows* it is pinned then you
get a speed boost even when there would be *no* faults (i.e. on a quiet
machine). Additionally, Cuda somehow ignores the OS-defined limits.

Thomas Schwinge has been assigned this task and will be getting to it 
soonish.

Andrew


* Re: [PATCH 02/17] libgomp: pinned memory
  2022-12-08 12:51     ` Andrew Stubbs
@ 2022-12-08 14:02       ` Tobias Burnus
  2022-12-08 14:35         ` Andrew Stubbs
  0 siblings, 1 reply; 30+ messages in thread
From: Tobias Burnus @ 2022-12-08 14:02 UTC (permalink / raw)
  To: Andrew Stubbs, Jakub Jelinek, Thomas Schwinge; +Cc: gcc-patches

On 08.12.22 13:51, Andrew Stubbs wrote:
> On 08/12/2022 12:11, Jakub Jelinek wrote:
>> On Thu, Jul 07, 2022 at 11:34:33AM +0100, Andrew Stubbs wrote:
>>> Implement the OpenMP pinned memory trait on Linux hosts using the mlock
>>> syscall.  Pinned allocations are performed using mmap, not malloc,
>>> to ensure
>>> that they can be unpinned safely when freed.
>> As I said before, I think the pinned memory is too precious to waste
>> it this
>> way, we should handle the -> pinned case through memkind_create_fixed on
>> mmap + mlock area, that way we can create even quite small pinned
>> allocations.
>
> This has been delayed due to other priorities, but our current plan is
> to switch to using cudaHostAlloc, when available, but we can certainly
> use memkind_create_fixed for the fallback case (including amdgcn).

By 'available', I assume you mean that nvptx is an 'available device' (per
OpenMP definition, finally added in TR11), i.e. there is an image for nvptx
and (after omp_requires filtering) there remains at least one nvptx device.

* * *

For completeness, I want to note that OpenMP TR11 adds support for
creating memory spaces that are accessible from multiple devices, e.g.
host + one/all devices, and adds some convenience functions for the
latter (all devices, host and a specific device etc.) →
https://openmp.org/specifications/ TR11 (see Appendix B.2 for the
release notes, esp. for Section 6.2).

I think it makes sense to keep those additions in mind when doing the
actual implementation to avoid incompatibilities.

Side note regarding ompx_ additions proposed in
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597979.html (adds
ompx_pinned_mem_alloc),
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597983.html
(ompx_unified_shared_mem_alloc and ompx_host_mem_alloc;
ompx_unified_shared_mem_space and ompx_host_mem_space).

While TR11 does not add any predefined allocators or new memory spaces,
using e.g. omp_get_devices_all_allocator(memspace) returns a
unified-shared-memory allocator.

I note that LLVM does not seem to have any ompx_ in this regard (yet?).
(It has some ompx_ – but related to assumptions.)

> Using Cuda might be trickier to implement because there's a layering
> violation inherent in routing target independent allocations through
> the nvptx plugin, but benchmarking shows that that's the only way to
> get the faster path through the Cuda black box; being pinned is good
> because it avoids page faults, but apparently if Cuda *knows* it is
> pinned then you get a speed boost even when there would be *no* faults
> (i.e. on a quiet machine). Additionally, Cuda somehow ignores the
> OS-defining limits.

I wonder whether, for a NUMA machine (and non-offloading access), using
memkind_create_fixed will have an advantage over cuMemHostAlloc or not.
(BTW, I find cuMemHostAlloc vs. cuMemAllocHost confusing.) And if so, whether
we should provide a means (GOMP_... env var?) to toggle the preference.

My feeling is that, on most systems, it does not matter, except (a)
possibly for large NUMA systems, where the memkind tuning will probably
make a difference and (b) we know that CUDA's cuMem(HostAlloc/AllocHost) is
faster with nvptx offloading. (Memory from cuMem(HostAlloc/AllocHost) can
also be accessed by the device via DMA, provided unified addressing is
supported, which is the case [cf. comment + assert in plugin-nvptx.c].)

Tobias


* Re: [PATCH 02/17] libgomp: pinned memory
  2022-12-08 14:02       ` Tobias Burnus
@ 2022-12-08 14:35         ` Andrew Stubbs
  2022-12-08 15:02           ` Tobias Burnus
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Stubbs @ 2022-12-08 14:35 UTC (permalink / raw)
  To: Tobias Burnus, Jakub Jelinek, Thomas Schwinge; +Cc: gcc-patches

On 08/12/2022 14:02, Tobias Burnus wrote:
> On 08.12.22 13:51, Andrew Stubbs wrote:
>> On 08/12/2022 12:11, Jakub Jelinek wrote:
>>> On Thu, Jul 07, 2022 at 11:34:33AM +0100, Andrew Stubbs wrote:
>>>> Implement the OpenMP pinned memory trait on Linux hosts using the mlock
>>>> syscall.  Pinned allocations are performed using mmap, not malloc,
>>>> to ensure
>>>> that they can be unpinned safely when freed.
>>> As I said before, I think the pinned memory is too precious to waste
>>> it this
>>> way, we should handle the -> pinned case through memkind_create_fixed on
>>> mmap + mlock area, that way we can create even quite small pinned
>>> allocations.
>>
>> This has been delayed due to other priorities, but our current plan is
>> to switch to using cudaHostAlloc, when available, but we can certainly
>> use memkind_create_fixed for the fallback case (including amdgcn).
> 
> With available, I assume that nvptx is an 'available device' (per OpenMP
> definition, finally added in TR11), i.e. there is an image for nvptx and
> - after omp_requires filtering - there remains at least one nvptx device.

If plugin-nvptx has been loaded then the function will be available. Do 
we need to get fancier than that?

Andrew


* Re: [PATCH 02/17] libgomp: pinned memory
  2022-12-08 14:35         ` Andrew Stubbs
@ 2022-12-08 15:02           ` Tobias Burnus
  0 siblings, 0 replies; 30+ messages in thread
From: Tobias Burnus @ 2022-12-08 15:02 UTC (permalink / raw)
  To: Andrew Stubbs, Jakub Jelinek, Thomas Schwinge; +Cc: gcc-patches

On 08.12.22 15:35, Andrew Stubbs wrote:
> On 08/12/2022 14:02, Tobias Burnus wrote:
>> With available, I assume that nvptx is an 'available device' (per OpenMP
>> definition, finally added in TR11), i.e. there is an image for nvptx and
>> - after omp_requires filtering - there remains at least one nvptx
>> device.
>
> If plugin-nvptx has been loaded then the function will be available.
> Do we need to get fancier than that?

I think it does not really make sense to use CUDA if there is not a single device.
In terms of loading, the code does:

gomp_target_init(void)
{
...
   cur = OFFLOAD_PLUGINS;  /* This is a comma-separated string with the supported plugins. */
...
         if (gomp_load_plugin_for_device (&current_device, plugin_name))
           {
             int omp_req = omp_requires_mask & ~GOMP_REQUIRES_TARGET_USED;
             new_num_devs = current_device.get_num_devices_func (omp_req);

Thus, CUDA is loaded at the 'gomp_load_plugin_for_device' line and at the
'new_num_devs =' line, it has been filtered for OpenMP's 'requires' demands.*

Thus, 'new_num_devs' contains the number of 'accessible devices' (OpenMP definition),
filtered for the 'requires'* (which is part of the 'supported devices' requirements).

(* With some caveats related to late loading of offloading code from (shared) libraries.)
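The following is a minimal sketch of the filtering step described above. The TARGET_USED bookkeeping bit is masked out first, and then only devices whose capabilities cover every requested feature are counted. The REQ_* bit values, the capability array and both function names are hypothetical stand-ins for the real gomp-constants.h values and the plugin's get_num_devices hook.

```c
/* Hypothetical bit values (not the real GOMP_REQUIRES_* constants).  */
#define REQ_UNIFIED_ADDRESS       0x10u
#define REQ_UNIFIED_SHARED_MEMORY 0x20u
#define REQ_REVERSE_OFFLOAD       0x80u
#define REQ_TARGET_USED           0x200u

/* Count devices whose capability bits include every requested bit.  */
static int
count_devices_sketch (const unsigned *caps, int ndevs, unsigned omp_req)
{
  int n = 0;
  for (int i = 0; i < ndevs; i++)
    if ((caps[i] & omp_req) == omp_req)
      n++;
  return n;
}

static int
usable_devices (const unsigned *caps, int ndevs, unsigned requires_mask)
{
  /* TARGET_USED only records that target constructs exist; it is not a
     device capability, so strip it before filtering.  */
  unsigned omp_req = requires_mask & ~REQ_TARGET_USED;
  return count_devices_sketch (caps, ndevs, omp_req);
}
```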

  * * *

Admittedly, this does not yet cover the last suggested feature:

GOMP_offload_register_ver (...)
{
         gomp_load_image_to_device (devicep, version,

which is relevant for the first part of:

'supported devices' - '... supported by the implementation for execution of target code ...
requires directive are fulfilled'.

(available = (intersection of 'accessible devices' and 'supported devices') possibly
filtered + reordered via the OMP_AVAILABLE_DEVICES env var.)

I am not sure how strictly it is required, and how we know when all offload_register calls
are over; I do note that OpenMP TR 11 has an over-engineered OMP_AVAILABLE_DEVICES environment
variable which permits filtering the list of available devices, which also requires early
access to the initial 'available devices' list. But it might be sufficient to rely on the
device-is-accessible + requires filtering and ignore whether an actual image is available.

Tobias


* Prototype 'GOMP_enable_pinned_mode' (was: [PATCH 08/17] openmp: -foffload-memory=pinned)
  2022-07-07 22:18     ` Andrew Stubbs
  2022-07-08  9:00       ` Tobias Burnus
@ 2023-02-20 14:59       ` Thomas Schwinge
  1 sibling, 0 replies; 30+ messages in thread
From: Thomas Schwinge @ 2023-02-20 14:59 UTC (permalink / raw)
  To: Andrew Stubbs, Tobias Burnus, gcc-patches


Hi!

On 2022-07-07T23:18:03+0100, Andrew Stubbs <ams@codesourcery.com> wrote:
> On 07/07/2022 12:54, Tobias Burnus wrote:
>> On 07.07.22 12:34, Andrew Stubbs wrote:
>>> Implement the -foffload-memory=pinned option such that libgomp is
>>> instructed to enable fully-pinned memory at start-up.  The option is
>>> intended to provide a performance boost to certain offload programs
>>> without modifying the code.
>> ...
>>> gcc/ChangeLog:
>>>
>>>     * omp-builtins.def (BUILT_IN_GOMP_ENABLE_PINNED_MODE): New.
>>>     * omp-low.cc (omp_enable_pinned_mode): New function.
>>>     (execute_lower_omp): Call omp_enable_pinned_mode.
>>>
>>> libgomp/ChangeLog:
>>>
>>>     * config/linux/allocator.c (always_pinned_mode): New variable.
>>>     (GOMP_enable_pinned_mode): New function.
>>>     (linux_memspace_alloc): Disable pinning when always_pinned_mode set.
>>>     (linux_memspace_calloc): Likewise.
>>>     (linux_memspace_free): Likewise.
>>>     (linux_memspace_realloc): Likewise.
>>>     * libgomp.map: Add GOMP_enable_pinned_mode.
>>>     * testsuite/libgomp.c/alloc-pinned-7.c: New test.
>>> ...
>> ...
>>> --- a/gcc/omp-low.cc
>>> +++ b/gcc/omp-low.cc
>>> @@ -14620,6 +14620,68 @@ lower_omp (gimple_seq *body, omp_context *ctx)
>>>     input_location = saved_location;
>>>   }
>>> +/* Emit a constructor function to enable -foffload-memory=pinned
>>> +   at runtime.  Libgomp handles the OS mode setting, but we need to trigger
>>> +   it by calling GOMP_enable_pinned_mode before the program proper runs.  */
>>> +
>>> +static void
>>> +omp_enable_pinned_mode ()
>>
>> Is there a reason not to use the mechanism of OpenMP's 'requires'
>> directive for this?

I agree.  (But I'm not working on that, for avoidance of doubt.)

>> (Okay, I have to admit that the final patch was only committed on
>> Monday. But still ...)
>
> Possibly, I had most of this done before then. I'll have a look next
> time I visit this patch.

Until then, let's at least document/verify 'GOMP_enable_pinned_mode';
I've pushed to devel/omp/gcc-12
commit 9657d906869e098340c23118c2eb8592d9e77ac5
"Prototype 'GOMP_enable_pinned_mode'", see attached.


Grüße
 Thomas


[-- Attachment #2: 0001-Prototype-GOMP_enable_pinned_mode.patch --]
[-- Type: text/x-diff, Size: 1353 bytes --]

From 9657d906869e098340c23118c2eb8592d9e77ac5 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Mon, 20 Feb 2023 15:29:44 +0100
Subject: [PATCH] Prototype 'GOMP_enable_pinned_mode'

Fix-up for og12 commit 842df187487f5b16ae29bbe7e9acd79661a9df48
"openmp: -foffload-memory=pinned".  No functional change.

	libgomp/
	* libgomp_g.h (GOMP_enable_pinned_mode): New.
---
 libgomp/ChangeLog.omp | 2 ++
 libgomp/libgomp_g.h   | 1 +
 2 files changed, 3 insertions(+)

diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index c5a7860478e..e4475093055 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,5 +1,7 @@
 2023-02-20  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* libgomp_g.h (GOMP_enable_pinned_mode): New.
+
 	* config/linux/allocator.c (linux_memspace_alloc): Add 'init0'
 	formal parameter.  Adjust all users.
 	(linux_memspace_alloc, linux_memspace_free): Attempt to allocate
diff --git a/libgomp/libgomp_g.h b/libgomp/libgomp_g.h
index ece1f97a61f..fe66a53d94a 100644
--- a/libgomp/libgomp_g.h
+++ b/libgomp/libgomp_g.h
@@ -375,6 +375,7 @@ extern void GOMP_teams_reg (void (*) (void *), void *, unsigned, unsigned,
 
 extern void *GOMP_alloc (size_t, size_t, uintptr_t);
 extern void GOMP_free (void *, uintptr_t);
+extern void GOMP_enable_pinned_mode (void);
 
 /* error.c */
 
-- 
2.25.1



Thread overview: 30+ messages
2022-07-07 10:34 [PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators Andrew Stubbs
2022-07-07 10:34 ` [PATCH 01/17] libgomp, nvptx: low-latency memory allocator Andrew Stubbs
2022-12-08 11:40   ` Jakub Jelinek
2022-07-07 10:34 ` [PATCH 02/17] libgomp: pinned memory Andrew Stubbs
2022-12-08 12:11   ` Jakub Jelinek
2022-12-08 12:51     ` Andrew Stubbs
2022-12-08 14:02       ` Tobias Burnus
2022-12-08 14:35         ` Andrew Stubbs
2022-12-08 15:02           ` Tobias Burnus
2022-07-07 10:34 ` [PATCH 03/17] libgomp, openmp: Add ompx_pinned_mem_alloc Andrew Stubbs
2022-07-07 10:34 ` [PATCH 04/17] openmp, nvptx: low-lat memory access traits Andrew Stubbs
2022-07-07 10:34 ` [PATCH 05/17] openmp, nvptx: ompx_unified_shared_mem_alloc Andrew Stubbs
2022-07-07 10:34 ` [PATCH 06/17] openmp: Add -foffload-memory Andrew Stubbs
2022-07-07 10:34 ` [PATCH 07/17] openmp: allow requires unified_shared_memory Andrew Stubbs
2022-07-07 10:34 ` [PATCH 08/17] openmp: -foffload-memory=pinned Andrew Stubbs
2022-07-07 11:54   ` Tobias Burnus
2022-07-07 22:18     ` Andrew Stubbs
2022-07-08  9:00       ` Tobias Burnus
2022-07-08  9:55         ` Andrew Stubbs
2022-07-08  9:57           ` Tobias Burnus
2023-02-20 14:59       ` Prototype 'GOMP_enable_pinned_mode' (was: [PATCH 08/17] openmp: -foffload-memory=pinned) Thomas Schwinge
2022-07-07 10:34 ` [PATCH 09/17] openmp: Use libgomp memory allocation functions with unified shared memory Andrew Stubbs
2022-07-07 10:34 ` [PATCH 10/17] Add parsing support for allocate directive (OpenMP 5.0) Andrew Stubbs
2022-07-07 10:34 ` [PATCH 11/17] Translate " Andrew Stubbs
2022-07-07 10:34 ` [PATCH 12/17] Handle cleanup of omp allocated variables " Andrew Stubbs
2022-07-07 10:34 ` [PATCH 13/17] Gimplify allocate directive " Andrew Stubbs
2022-07-07 10:34 ` [PATCH 14/17] Lower " Andrew Stubbs
2022-07-07 10:34 ` [PATCH 15/17] amdgcn: Support XNACK mode Andrew Stubbs
2022-07-07 10:34 ` [PATCH 16/17] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK Andrew Stubbs
2022-07-07 10:34 ` [PATCH 17/17] amdgcn: libgomp plugin USM implementation Andrew Stubbs
