This adds support for using CUDA Managed Memory with omp_alloc.  It will be
used as the underpinnings for "requires unified_shared_memory" in a later
patch.

There are two new predefined allocators, ompx_unified_shared_mem_alloc and
ompx_host_mem_alloc, plus corresponding memory spaces.  The former allocates
memory in the "managed" space; the latter allocates explicitly on the host
(the intent is for the compiler to intercept "malloc").

The nvptx plugin is modified to make the necessary CUDA calls, and libgomp
is modified to switch to shared-memory mode for USM-allocated mappings.

include/ChangeLog:

	* cuda/cuda.h (CUdevice_attribute): Add definitions for
	CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR and
	CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR.
	(CUmemAttach_flags): New.
	(CUpointer_attribute): New.
	(cuMemAllocManaged): New prototype.
	(cuPointerGetAttribute): New prototype.

libgomp/ChangeLog:

	* allocator.c (omp_max_predefined_alloc): Update.
	(omp_aligned_alloc): Don't fall back for ompx_host_mem_alloc.
	(omp_aligned_calloc): Likewise.
	(omp_realloc): Likewise.
	* config/linux/allocator.c (linux_memspace_alloc): Handle USM.
	(linux_memspace_calloc): Handle USM.
	(linux_memspace_free): Handle USM.
	(linux_memspace_realloc): Handle USM.
	* config/nvptx/allocator.c (nvptx_memspace_alloc): Reject
	ompx_host_mem_alloc.
	(nvptx_memspace_calloc): Likewise.
	(nvptx_memspace_realloc): Likewise.
	* libgomp-plugin.h (GOMP_OFFLOAD_usm_alloc): New prototype.
	(GOMP_OFFLOAD_usm_free): New prototype.
	(GOMP_OFFLOAD_is_usm_ptr): New prototype.
	* libgomp.h (gomp_usm_alloc): New prototype.
	(gomp_usm_free): New prototype.
	(gomp_is_usm_ptr): New prototype.
	(struct gomp_device_descr): Add USM functions.
	* omp.h.in (omp_memspace_handle_t): Add ompx_unified_shared_mem_space
	and ompx_host_mem_space.
	(omp_allocator_handle_t): Add ompx_unified_shared_mem_alloc and
	ompx_host_mem_alloc.
	* omp_lib.f90.in: Likewise.
	* plugin/cuda-lib.def (cuMemAllocManaged): Add new call.
	(cuPointerGetAttribute): Likewise.
	* plugin/plugin-nvptx.c (nvptx_alloc): Add "usm" parameter.
	Call cuMemAllocManaged as appropriate.
	(GOMP_OFFLOAD_get_num_devices): Allow GOMP_REQUIRES_UNIFIED_ADDRESS
	and GOMP_REQUIRES_UNIFIED_SHARED_MEMORY.
	(GOMP_OFFLOAD_alloc): Move internals to ...
	(GOMP_OFFLOAD_alloc_1): ... this, and add usm parameter.
	(GOMP_OFFLOAD_usm_alloc): New function.
	(GOMP_OFFLOAD_usm_free): New function.
	(GOMP_OFFLOAD_is_usm_ptr): New function.
	* target.c (gomp_map_vars_internal): Add USM support.
	(gomp_usm_alloc): New function.
	(gomp_usm_free): New function.
	(gomp_load_plugin_for_device): New function.
	* testsuite/libgomp.c/usm-1.c: New test.
	* testsuite/libgomp.c/usm-2.c: New test.
	* testsuite/libgomp.c/usm-3.c: New test.
	* testsuite/libgomp.c/usm-4.c: New test.
	* testsuite/libgomp.c/usm-5.c: New test.

Co-authored-by: Kwok Cheung Yeung <kcy@codesourcery.com>
---
 include/cuda/cuda.h                 | 12 ++++++
 libgomp/allocator.c                 | 13 ++++--
 libgomp/config/linux/allocator.c    | 48 ++++++++++++++--------
 libgomp/config/nvptx/allocator.c    |  6 +++
 libgomp/libgomp-plugin.h            |  3 ++
 libgomp/libgomp.h                   |  6 +++
 libgomp/omp.h.in                    |  4 ++
 libgomp/omp_lib.f90.in              |  8 ++++
 libgomp/plugin/cuda-lib.def         |  2 +
 libgomp/plugin/plugin-nvptx.c       | 47 ++++++++++++++++++---
 libgomp/target.c                    | 64 +++++++++++++++++++++++++++++
 libgomp/testsuite/libgomp.c/usm-1.c | 24 +++++++++++
 libgomp/testsuite/libgomp.c/usm-2.c | 32 +++++++++++++++
 libgomp/testsuite/libgomp.c/usm-3.c | 35 ++++++++++++++++
 libgomp/testsuite/libgomp.c/usm-4.c | 36 ++++++++++++++++
 libgomp/testsuite/libgomp.c/usm-5.c | 28 +++++++++++++
 16 files changed, 340 insertions(+), 28 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/usm-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-4.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-5.c