Hi Frederik! On 2019-11-14T16:35:31+0100, Frederik Harwath wrote: > this patch implements OpenACC 2.6 "acc_get_property" and related functions. I'm attaching a version rebased on top of current GCC trunk, as well as an incremental patch; please review these changes, and merge into yours. > There is no AMD GCN support yet. This will be added later on. ACK, just to note that there now is a 'libgomp/plugin/plugin-gcn.c' that at least needs to get a stub implementation (can mostly copy from 'libgomp/plugin/plugin-hsa.c'?) as otherwise the build will fail. Tobias has generally reviewed the Fortran bits, correct? As I mentioned before ("thinking aloud"): | Before Frederik starts working on integrating this into GCC trunk, do you | (Jakub) agree with the libgomp plugin interface changes as implemented by | Maciej? For example, top-level 'GOMP_OFFLOAD_get_property' function in | 'struct gomp_device_descr' instead of stuffing this into its | 'acc_dispatch_t openacc'. (I never understood why the OpenACC functions | need to be segregated like they are.) Jakub didn't answer, but I now myself decided that we should group this with the other OpenACC libgomp-plugin functions, as this interface is defined in terms of OpenACC-specific stuff such as 'acc_device_t'. Frederik, please work on that, also try to move function definitions etc. into appropriate places in case they aren't; ask if you need help. > Add generic support for the OpenACC 2.6 `acc_get_property' and > `acc_get_property_string' routines, as well as full handlers for the > host and the NVPTX offload targets and minimal handlers for the HSA > and Intel MIC offload targets. > > Included are C/C++ and Fortran tests that, in particular, print > the property values for acc_property_vendor, acc_property_memory, > acc_property_free_memory, acc_property_name, and acc_property_driver. > The output looks as follows: > > Vendor: GNU > Name: GOMP > Total memory: 0 > Free memory: 0 > Driver: 1.0 > > with the host driver (where the memory related properties are not > supported for the host device and yield 0, conforming to the standard) > and output like: > > OpenACC vendor: Nvidia > OpenACC total memory: 12651462656 > OpenACC free memory: 12202737664 > OpenACC name: TITAN V > OpenACC driver: CUDA Driver 9.1 > > with the NVPTX driver. That needs to be updated. > .../acc-get-property-2.c | 68 +++++++++ > .../acc-get-property-3.c | 19 +++ > .../acc-get-property-aux.c | 60 ++++++++ > .../acc-get-property.c | 75 ++++++++++ > .../libgomp.oacc-fortran/acc-get-property.f90 | 80 ++++++++++ Please name all these 'acc_get_property*', which is the name of the interface tested. > --- a/include/gomp-constants.h > +++ b/include/gomp-constants.h > @@ -178,6 +178,20 @@ enum gomp_map_kind > > #define GOMP_DEVICE_ICV -1 > #define GOMP_DEVICE_HOST_FALLBACK -2 > +#define GOMP_DEVICE_CURRENT -3 As I mentioned before ("thinking aloud"): | This is used for 'acc_device_current', relevant only for | 'acc_get_property', to return "the value of the property for the current | device". This [now has a] special (negative?) value | [...], so that when additional real device types are added | later on, we can just add them with increasing numbers, and keep the | scanning code simple. Not should if this should be grouped with 'GOMP_DEVICE_ICV', 'GOMP_DEVICE_HOST_FALLBACK', for it is not related to there. Should this actually get value '-1' instead of '-3'? Or, is the OpenACC 'acc_device_t' code already paying special attention to negative values '-1', '-2'? (I don't think so.) | (Use of 'acc_device_current' as an argument to other functions taking an | 'acc_device_t' is undefined, and should be rejected with 'gomp_fatal'?) That should now be implemented; r278937 "Validate acc_device_t uses". | Also, 'acc_device_current' is a libgomp-internal thing (doesn't interface | with the compiler proper), so strictly speaking 'GOMP_DEVICE_CURRENT' | isn't needed in 'include/gomp-constants.h'. But probably still a good | idea to list it there, in this canonical place, to keep the several lists | of device types coherent. I still wonder about that... ;-) | (I have not checked how 'acc_device_current' is actually implemented in | the following.) > +/* Device property codes. Keep in sync with > + libgomp/{openacc.h,openacc.f90,openacc_lib.h}:acc_device_property_t | Same thing, libgomp-internal, not sure whether to list these here? > + as well as libgomp/libgomp-plugin.h. */ (Not sure why 'libgomp/libgomp-plugin.h' is relevant here?) > +/* Start from 1 to catch uninitialized use. */ | Hmm, not sure about that either. Don't think we're generally doing that? | | (But I see PGI have 'acc_property_none = 0', oh well.) > +#define GOMP_DEVICE_PROPERTY_MEMORY 1 > +#define GOMP_DEVICE_PROPERTY_FREE_MEMORY 2 > +#define GOMP_DEVICE_PROPERTY_NAME 0x10001 > +#define GOMP_DEVICE_PROPERTY_VENDOR 0x10002 > +#define GOMP_DEVICE_PROPERTY_DRIVER 0x10003 > + > +/* Internal property mask to tell numeric and string values apart. */ > +#define GOMP_DEVICE_PROPERTY_STRING_MASK 0x10000 (Maybe should use an 'enum'?) Maybe this stuff should move from 'include/gomp-constants.h' to 'libgomp/oacc-int.h'. I'll think about that again, when I'm awake again tomorrow. ;-) > --- a/libgomp/libgomp-plugin.h > +++ b/libgomp/libgomp-plugin.h > @@ -54,6 +54,13 @@ enum offload_target_type > OFFLOAD_TARGET_TYPE_GCN = 8 > }; > > +/* Container type for passing device properties. */ > +union gomp_device_property_value > +{ > + void *ptr; > + uintmax_t val; > +}; Why wouldn't that be 'size_t', 'const char *', as the actual data types used? (Maybe I'm missing something.) > --- a/libgomp/libgomp.map > +++ b/libgomp/libgomp.map > @@ -502,6 +502,14 @@ GOACC_2.0.1 { > GOACC_parallel_keyed; > } GOACC_2.0; > > +OACC_2.6 { > + global: > + acc_get_property; > + acc_get_property_h_; > + acc_get_property_string; > + acc_get_property_string_h_; > +} OACC_2.5; > + > GOMP_PLUGIN_1.0 { > global: > GOMP_PLUGIN_malloc; That's not correct: 'OACC_2.6' should come after 'OACC_2.5.1', and also inherit from that one. > --- a/libgomp/oacc-init.c > +++ b/libgomp/oacc-init.c > +static union gomp_device_property_value > +get_property_any (int ord, acc_device_t d, acc_device_property_t prop) > +{ > + if (!acc_known_device_type (d)) > + unknown_device_type_error(d); Checking isn't needed here for this is an internal interface? > + > + union gomp_device_property_value propval; > + struct gomp_device_descr *dev; > + struct goacc_thread *thr; Generally, in new code, we try to place these next to their first use. > + > + if (d == acc_device_none) > + return (union gomp_device_property_value) { .val = 0 }; > + > + goacc_lazy_initialize (); > + thr = goacc_thread (); > + > + if (d == acc_device_current && (!thr || !thr->dev)) > + return (union gomp_device_property_value) { .val = 0 }; Should we use a 'nullval' here, as used elsewhere? Also, this checking seems a bit convoluted; shouldn't this be integrated into the following? It's certainly not necessary to special-case 'acc_device_none' before 'goacc_lazy_initialize' etc.? > + > + if (d == acc_device_current) > + { > + dev = thr->dev; > + } > + else > + { > + int num_devices; > + > + gomp_mutex_lock (&acc_device_lock); > + > + dev = resolve_device (d, false); Why call this without 'fail_is_error' flag here? > + > + num_devices = dev->get_num_devices_func (); > + > + if (num_devices <= 0 || ord >= num_devices) > + acc_dev_num_out_of_range (d, ord, num_devices); > + > + dev += ord; > + > + gomp_mutex_lock (&dev->lock); > + if (dev->state == GOMP_DEVICE_UNINITIALIZED) > + gomp_init_device (dev); > + gomp_mutex_unlock (&dev->lock); > + > + gomp_mutex_unlock (&acc_device_lock); > + } > + > + assert (dev); > + > + propval = dev->get_property_func (dev->target_id, prop); > + > + return propval; > +} I'll have to look at this again, tomorrow. > --- a/libgomp/openacc.f90 > +++ b/libgomp/openacc.f90 > @@ -28,7 +28,7 @@ > ! . > > module openacc_kinds > - use iso_fortran_env, only: int32 > + use iso_fortran_env, only: int32, int64 > implicit none > > private :: int32 > @@ -47,6 +47,21 @@ module openacc_kinds > integer (acc_device_kind), parameter :: acc_device_not_host = 4 > integer (acc_device_kind), parameter :: acc_device_nvidia = 5 > integer (acc_device_kind), parameter :: acc_device_gcn = 8 > + integer (acc_device_kind), parameter :: acc_device_current = -3 > + > + public :: acc_device_property > + > + integer, parameter :: acc_device_property = int64 Why 'int64'? I changed this to 'int32', but please tell if there's a reason for 'int64'. > + > + public :: acc_property_memory, acc_property_free_memory > + public :: acc_property_name, acc_property_vendor, acc_property_driver > + > + ! Keep in sync with include/gomp-constants.h. > + integer (acc_device_property), parameter :: acc_property_memory = 1 > + integer (acc_device_property), parameter :: acc_property_free_memory = 2 > + integer (acc_device_property), parameter :: acc_property_name = int(Z'10001') > + integer (acc_device_property), parameter :: acc_property_vendor = int(Z'10002') > + integer (acc_device_property), parameter :: acc_property_driver = int(Z'10003') > > public :: acc_handle_kind > Per Tobias' recent "libgomp/openacc.f90 – clean-up public/private attributes" changes, some of this needs to move/change (done, I think). > @@ -87,6 +102,24 @@ module openacc_internal > integer (acc_device_kind) d > end subroutine > > + function acc_get_property_h (n, d, p) > + import > + implicit none (type, external) > + integer (acc_device_property) :: acc_get_property_h > + integer, value :: n > + integer (acc_device_kind), value :: d > + integer (acc_device_property), value :: p > + end function > + > + subroutine acc_get_property_string_h (n, d, p, s) > + import > + implicit none (type, external) > + integer, value :: n > + integer (acc_device_kind), value :: d > + integer (acc_device_property), value :: p > + character (*) :: s > + end subroutine > + > function acc_get_device_num_h (d) > import > integer acc_get_device_num_h No reason to place that between 'acc_set_device_num_h' and 'acc_get_device_num_h'? I moved it after the latter, like done elsewhere. Is it a conscious decision that we're not supporting the new 'acc_get_property' interface via 'openacc_lib.h', which is (or, used to be) an alternative to the Fortran 'openacc' module? As of OpenACC 2.5, 'openacc_lib.h' has been deprecated ("no longer supported"), but so far, we continued to support it, and it's (maybe?) strange when that one now works for everything but the 'acc_get_property' interface? Or, is that a statement that users really should move to the Fortran 'openacc' module? > --- a/libgomp/openacc.h > +++ b/libgomp/openacc.h > @@ -59,9 +59,20 @@ typedef enum acc_device_t { > _ACC_device_hwm, > /* Ensure enumeration is layout compatible with int. */ > _ACC_highest = __INT_MAX__, > - _ACC_neg = -1 > + _ACC_neg = -1, > + acc_device_current = -3 > } acc_device_t; (Update per final 'GOMP_DEVICE_CURRENT' value. If settling on '-1', 'acc_device_current' may *replace* the placeholder '_ACC_neg'.) > +typedef enum acc_device_property_t { > + /* Keep in sync with include/gomp-constants.h. */ > + /* Start from 1 to catch uninitialized use. */ > + acc_property_memory = 1, > + acc_property_free_memory = 2, > + acc_property_name = 0x10001, > + acc_property_vendor = 0x10002, > + acc_property_driver = 0x10003 > +} acc_device_property_t; Do we also need the magic here so that "Ensure enumeration is layout compatible with int"? But I see that is not done for the 'typedef enum acc_async_t' either. I don't remember the history behind that. > --- a/libgomp/plugin/plugin-hsa.c > +++ b/libgomp/plugin/plugin-hsa.c > @@ -699,6 +699,32 @@ GOMP_OFFLOAD_get_num_devices (void) > return hsa_context.agent_count; > } > > +/* Part of the libgomp plugin interface. Return the value of property > + PROP of agent number N. */ > + > +union gomp_device_property_value > +GOMP_OFFLOAD_get_property (int n, int prop) > +{ > + union gomp_device_property_value nullval = { .val = 0 }; > + > + if (!init_hsa_context ()) > + return nullval; I'm not familiar with that code, but similar to other plugins, 'init_hsa_context' already is called via 'GOMP_OFFLOAD_get_num_devices' (and 'GOMP_OFFLOAD_init_device', hmm...), so probably don't need to call it here? > + if (n >= hsa_context.agent_count) > + { > + GOMP_PLUGIN_error > + ("Request for a property of a non-existing HSA device %i", n); > + return nullval; > + } (Probably this would eventually call 'get_agent_info' to resolve 'n', which then already does the checking done here. No need to look into that right now.) > + > + switch (prop) > + { > + case GOMP_DEVICE_PROPERTY_VENDOR: > + return (union gomp_device_property_value) { .ptr = "AMD" }; > + default: > + return nullval; > + } > +} Not sure if "AMD" is actually correct here -- isn't HSA a vendor-independent standard? > --- a/libgomp/plugin/plugin-nvptx.c > +++ b/libgomp/plugin/plugin-nvptx.c > @@ -304,6 +305,7 @@ struct ptx_device > }; > > static struct ptx_device **ptx_devices; > +static char cuda_driver_version[30]; > > static inline struct nvptx_thread * > nvptx_thread (void) > @@ -330,6 +332,12 @@ nvptx_init (void) > CUDA_CALL (cuDeviceGetCount, &ndevs); > ptx_devices = GOMP_PLUGIN_malloc_cleared (sizeof (struct ptx_device *) > * ndevs); > + > + int v; > + CUDA_CALL_ERET (NULL, cuDriverGetVersion, &v); > + snprintf (cuda_driver_version, sizeof (cuda_driver_version) - 1, > + "CUDA Driver %u.%u", v / 1000, v % 1000 / 10); > + > return true; > } I moved these further up; this is relevant at the top-level, global in this plugin. Why the '- 1' in 'sizeof (cuda_driver_version) - 1'? (I removed that.) > @@ -293,6 +293,7 @@ struct ptx_device > int max_threads_per_block; > int max_threads_per_multiprocessor; > int default_dims[GOMP_DIM_MAX]; > + char* name; > > struct ptx_image_data *images; /* Images loaded on device. */ > pthread_mutex_t image_lock; /* Lock for above list. */ The number of 'struct ptx_device' instances will always be low, so let's just embed 'name' into this (done), so that we can avoid the dynamic allocation: > @@ -491,6 +499,10 @@ nvptx_open_device (int n) > for (int i = 0; i != GOMP_DIM_MAX; i++) > ptx_dev->default_dims[i] = 0; > > + const int max_name_len = 256; > + ptx_dev->name = GOMP_PLUGIN_malloc(max_name_len); > + CUDA_CALL_ERET (NULL, cuDeviceGetName, ptx_dev->name, max_name_len, dev); > + > ptx_dev->images = NULL; > pthread_mutex_init (&ptx_dev->image_lock, NULL); > > @@ -520,6 +532,7 @@ nvptx_close_device (struct ptx_device *ptx_dev) > if (!ptx_dev->ctx_shared) > CUDA_CALL (cuCtxDestroy, ptx_dev->ctx); > > + free (ptx_dev->name); > free (ptx_dev); > return true; > } As 'acc_get_property' is not an interface commonly used, as an alternative to "embed", we could also again make this a pointer, but "allocated, initialized upon first use", but I'm not convinced it's worth the effort. > +union gomp_device_property_value > +GOMP_OFFLOAD_get_property (int n, int prop) > +{ > + union gomp_device_property_value propval = { .val = 0 }; > + > + pthread_mutex_lock (&ptx_dev_lock); Everything (?) else seems to be accessing 'ptx_devices' without locking? (I don't quickly understand the locking protocol used there... Will look again tomorrow.) > + > + if (n >= nvptx_get_num_devices () || n < 0 || ptx_devices[n] == NULL) > + { > + pthread_mutex_unlock (&ptx_dev_lock); > + return propval; > + } > + > + struct ptx_device *ptx_dev = ptx_devices[n]; > + switch (prop) > + { > + case GOMP_DEVICE_PROPERTY_MEMORY: > + { > + size_t total_mem; > + > + CUDA_CALL_ERET (propval, cuDeviceTotalMem, &total_mem, ptx_dev->dev); > + propval.val = total_mem; > + } > + break; > + case GOMP_DEVICE_PROPERTY_FREE_MEMORY: > + { > + size_t total_mem; > + size_t free_mem; > + CUdevice ctxdev; > + > + CUDA_CALL_ERET (propval, cuCtxGetDevice, &ctxdev); > + if (ptx_dev->dev == ctxdev) > + CUDA_CALL_ERET (propval, cuMemGetInfo, &free_mem, &total_mem); > + else if (ptx_dev->ctx) > + { > + CUcontext old_ctx; > + > + CUDA_CALL_ERET (propval, cuCtxPushCurrent, ptx_dev->ctx); > + CUDA_CALL_ERET (propval, cuMemGetInfo, &free_mem, &total_mem); > + CUDA_CALL_ASSERT (cuCtxPopCurrent, &old_ctx); > + } > + else > + { > + CUcontext new_ctx; > + > + CUDA_CALL_ERET (propval, cuCtxCreate, &new_ctx, CU_CTX_SCHED_AUTO, > + ptx_dev->dev); > + CUDA_CALL_ERET (propval, cuMemGetInfo, &free_mem, &total_mem); > + CUDA_CALL_ASSERT (cuCtxDestroy, new_ctx); > + } > + propval.val = free_mem; > + } > + break; (Have not yet reviewed that CUDA magic. Do you understand that?) > + case GOMP_DEVICE_PROPERTY_NAME: > + propval.ptr = ptx_dev->name; > + break; > + case GOMP_DEVICE_PROPERTY_VENDOR: > + propval.ptr = "Nvidia"; > + break; > + case GOMP_DEVICE_PROPERTY_DRIVER: > + propval.ptr = cuda_driver_version; > + break; > + default: > + GOMP_PLUGIN_error("Unknown OpenACC device-property"); > + } I remember I asked: "Should this 'GOMP_PLUGIN_error' or even 'GOMP_PLUGIN_fatal'? (Similar then elsewhere.)" You only changed that here, and I now see that OpenACC 2.6, 3.2.6. "acc_get_property" actually states that "If the value of 'property' is not one of the known values for that query routine, or that property has no value for the specified device, 'acc_get_property' will return 0 and 'acc_get_property_string' will return NULL (in C or C++) or an blank string (in Fortran)". So this means this should actually not call 'GOMP_PLUGIN_error' but instead return zero etc.? Please also make sure that we have testsuite coverage for all the different cases possible. (Especially different Fortran interfaces.) I see 'libgomp/oacc-host.c:host_get_property', 'libgomp/plugin/plugin-hsa.c:GOMP_OFFLOAD_get_property', 'liboffloadmic/plugin/libgomp-plugin-intelmic.cpp:GOMP_OFFLOAD_get_property' do have a 'default: return nullval'; that's probably what we need to do here, too? > + > + pthread_mutex_unlock (&ptx_dev_lock); > + return propval; > +} > --- /dev/null > +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc-get-property-2.c FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/acc-get-property-2.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 (test for excess errors) UNRESOLVED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/acc-get-property-2.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 compilation failed to produce executable [...]/libgomp.oacc-c-c++-common/acc-get-property-2.c:61:28: error: invalid conversion from 'void*' to 'char*' [-fpermissive] 61 | char *driver = malloc(sizeof(char) * 40); | ~~~~~~^~~~~~~~~~~~~~~~~~~ | | | void* > + char *driver = malloc(sizeof(char) * 40); > + snprintf (driver, 40, "CUDA Driver %u.%u", driver_version / 1000, > + driver_version % 1000 / 10); Let's just use 'char driver[30]', and 'sizeof driver'? > --- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp > +++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp > @@ -174,6 +174,28 @@ GOMP_OFFLOAD_get_num_devices (void) > return num_devices; > } > > +extern "C" union gomp_device_property_value > +GOMP_OFFLOAD_get_property (int n, int prop) > +{ > + union gomp_device_property_value nullval = { .val = 0 }; > + > + if (n >= num_devices) > + { > + GOMP_PLUGIN_error > + ("Request for a property of a non-existing Intel MIC device %i", n); > + return nullval; > + } > + > + switch (prop) > + { > + case GOMP_DEVICE_PROPERTY_VENDOR: > + /* TODO: "error: invalid conversion from 'const void*' to 'void*' [-fpermissive]" */ > + return (union gomp_device_property_value) { .ptr = (char *) "Intel" }; Type cast maybe unnecessary per my 'libgomp/libgomp-plugin.h' comment above? > + default: > + return nullval; > + } > +} Grüße Thomas