This adds support for using CUDA Managed Memory with omp_alloc.  It will be
used as the underpinnings for "requires unified_shared_memory" in a later
patch.

There are two new predefined allocators, ompx_unified_shared_mem_alloc and
ompx_host_mem_alloc, plus corresponding memory spaces, which can be used to
allocate memory in the "managed" space and explicitly on the host (it is
intended that "malloc" will be intercepted by the compiler).

The nvptx plugin is modified to make the necessary CUDA calls, and libgomp
is modified to switch to shared-memory mode for USM-allocated mappings.

include/ChangeLog:

	* cuda/cuda.h (CUdevice_attribute): Add definitions for
	CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR and
	CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR.
	(CUmemAttach_flags): New.
	(CUpointer_attribute): New.
	(cuMemAllocManaged): New prototype.
	(cuPointerGetAttribute): New prototype.

libgomp/ChangeLog:

	* allocator.c (omp_max_predefined_alloc): Update.
	(omp_aligned_alloc): Don't fall back for ompx_host_mem_alloc.
	(omp_aligned_calloc): Likewise.
	(omp_realloc): Likewise.
	* config/linux/allocator.c (linux_memspace_alloc): Handle USM.
	(linux_memspace_calloc): Handle USM.
	(linux_memspace_free): Handle USM.
	(linux_memspace_realloc): Handle USM.
	* config/nvptx/allocator.c (nvptx_memspace_alloc): Reject
	ompx_host_mem_alloc.
	(nvptx_memspace_calloc): Likewise.
	(nvptx_memspace_realloc): Likewise.
	* libgomp-plugin.h (GOMP_OFFLOAD_usm_alloc): New prototype.
	(GOMP_OFFLOAD_usm_free): New prototype.
	(GOMP_OFFLOAD_is_usm_ptr): New prototype.
	* libgomp.h (gomp_usm_alloc): New prototype.
	(gomp_usm_free): New prototype.
	(gomp_is_usm_ptr): New prototype.
	(struct gomp_device_descr): Add USM functions.
	* omp.h.in (omp_memspace_handle_t): Add ompx_unified_shared_mem_space
	and ompx_host_mem_space.
	(omp_allocator_handle_t): Add ompx_unified_shared_mem_alloc and
	ompx_host_mem_alloc.
	* omp_lib.f90.in: Likewise.
	* plugin/cuda-lib.def (cuMemAllocManaged): Add new call.
	(cuPointerGetAttribute): Likewise.
	* plugin/plugin-nvptx.c (nvptx_alloc): Add "usm" parameter.
	Call cuMemAllocManaged as appropriate.
	(GOMP_OFFLOAD_get_num_devices): Allow GOMP_REQUIRES_UNIFIED_ADDRESS
	and GOMP_REQUIRES_UNIFIED_SHARED_MEMORY.
	(GOMP_OFFLOAD_alloc): Move internals to ...
	(GOMP_OFFLOAD_alloc_1): ... this, and add usm parameter.
	(GOMP_OFFLOAD_usm_alloc): New function.
	(GOMP_OFFLOAD_usm_free): New function.
	(GOMP_OFFLOAD_is_usm_ptr): New function.
	* target.c (gomp_map_vars_internal): Add USM support.
	(gomp_usm_alloc): New function.
	(gomp_usm_free): New function.
	(gomp_load_plugin_for_device): New function.
	* testsuite/libgomp.c/usm-1.c: New test.
	* testsuite/libgomp.c/usm-2.c: New test.
	* testsuite/libgomp.c/usm-3.c: New test.
	* testsuite/libgomp.c/usm-4.c: New test.
	* testsuite/libgomp.c/usm-5.c: New test.

co-authored-by: Kwok Cheung Yeung

squash! openmp, nvptx: ompx_unified_shared_mem_alloc
---
 include/cuda/cuda.h                 | 12 ++++++
 libgomp/allocator.c                 | 13 ++++--
 libgomp/config/linux/allocator.c    | 48 ++++++++++++++--------
 libgomp/config/nvptx/allocator.c    |  6 +++
 libgomp/libgomp-plugin.h            |  3 ++
 libgomp/libgomp.h                   |  6 +++
 libgomp/omp.h.in                    |  4 ++
 libgomp/omp_lib.f90.in              |  8 ++++
 libgomp/plugin/cuda-lib.def         |  2 +
 libgomp/plugin/plugin-nvptx.c       | 47 ++++++++++++++++++---
 libgomp/target.c                    | 64 +++++++++++++++++++++++++++++
 libgomp/testsuite/libgomp.c/usm-1.c | 24 +++++++++++
 libgomp/testsuite/libgomp.c/usm-2.c | 32 +++++++++++++++
 libgomp/testsuite/libgomp.c/usm-3.c | 35 ++++++++++++++++
 libgomp/testsuite/libgomp.c/usm-4.c | 36 ++++++++++++++++
 libgomp/testsuite/libgomp.c/usm-5.c | 28 +++++++++++++
 16 files changed, 340 insertions(+), 28 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/usm-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-4.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-5.c
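
For illustration only, here is a rough sketch of how user code might request
memory from the new ompx_unified_shared_mem_alloc allocator described in the
commit message.  It is not taken from the usm-*.c tests.  HAVE_OMPX_USM_ALLOC,
DEMO_ALLOC, DEMO_FREE and sum_of_squares are hypothetical names invented for
this example; the macro has to be defined by hand when building against a
libgomp that carries this patch.  Without it the sketch falls back to the
standard omp_default_mem_alloc, or to plain malloc when OpenMP is disabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* HAVE_OMPX_USM_ALLOC is a hypothetical macro, not something libgomp
   defines.  Set it only when omp.h declares ompx_unified_shared_mem_alloc,
   i.e. when building against a libgomp that includes this patch.  */
#if defined(_OPENMP) && defined(HAVE_OMPX_USM_ALLOC)
# define DEMO_ALLOC(size) omp_alloc ((size), ompx_unified_shared_mem_alloc)
# define DEMO_FREE(ptr)   omp_free ((ptr), ompx_unified_shared_mem_alloc)
#elif defined(_OPENMP)
/* Standard OpenMP 5.0 predefined allocator: ordinary host memory.  */
# define DEMO_ALLOC(size) omp_alloc ((size), omp_default_mem_alloc)
# define DEMO_FREE(ptr)   omp_free ((ptr), omp_default_mem_alloc)
#else
/* No OpenMP: plain malloc, and the pragma below is ignored.  */
# define DEMO_ALLOC(size) malloc (size)
# define DEMO_FREE(ptr)   free (ptr)
#endif

/* Fill an array with 0..n-1 on the host, then sum the squares inside a
   target region.  Returns -1 if the allocation fails.  */
long
sum_of_squares (int n)
{
  assert (n > 0);

  int *a = DEMO_ALLOC ((size_t) n * sizeof *a);
  if (a == NULL)
    return -1;

  for (int i = 0; i < n; i++)
    a[i] = i;

  long sum = 0;

  /* The explicit map clauses keep the region valid for ordinary host
     memory.  For USM allocations, the commit message says libgomp
     switches such mappings to shared-memory mode; the assumption here
     is that the same clauses then involve no copy.  */
#pragma omp target map(to: a[0:n]) map(tofrom: sum)
  for (int i = 0; i < n; i++)
    sum += (long) a[i] * a[i];

  DEMO_FREE (a);
  return sum;
}
```

Calling sum_of_squares (10) adds up the squares of 0..9.  Building with
-fopenmp and -DHAVE_OMPX_USM_ALLOC selects the managed-memory path; any other
configuration exercises the fallbacks, so the same source compiles against an
unpatched libgomp as well.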