This patch finally handles reverse offload. Due to the prep work, it
essentially only adds content to libgomp/target.c's gomp_target_rev(),
except that it additionally saves the reverse-offload-function table in
gomp_load_image_to_device.

In the comment to "[Patch] libgomp: Add reverse-offload splay tree",
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601368.html ,
it was suggested not to keep track of all the variable mappings and to
reconstruct the mapping from the normal splay tree instead, which this
patch does. (Albeit in the very slow walk-everything way. Given that
reverse-offload target regions likely have only a few map items, and
that programs should use only a few reverse-offload regions and should
not expect them to be fast, that might be okay.)

Specification references:

- For pointer attachment, I assume that the pointer is already fine on
  the host (if it existed on the host before) and that it does not need
  to be updated. I think the spec lacks wording for this; cf. OpenMP
  Spec Issue #3424.

- There are plans to permit 'nowait'. I think it wouldn't change
  anything except for not spin waiting for the result and, for shared
  memory only, that the argument lists (addr, kinds, sizes) need to be
  copied to have a sufficient lifetime. (To be implemented in the
  future; cf. OpenMP Spec Pull Req. 3423 for Issue 2038.)

* * *

32bit vs. 64bit: libgomp itself is compiled with both -m32 and -m64;
however, nvptx and gcn require -m64 on the device side and assume that
the device pointers are representable on the host (i.e. all are 64bit).
The new code tries to be compatible in principle with uint32_t pointers
and uses uint64_t to represent them consistently.

The code should be mostly fine, except that one called function requires
an array of void* and size_t. Instead of handling that case, I added
some code to permit optimizing away the function content without
offloading, and a run-time assert in case this function ever gets
called on a 32bit host from the target side.
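To illustrate the kind of guard meant above, here is a minimal sketch.
It is not the code from the patch: rev_entry, consume_host_args and
fatal_32bit_host are hypothetical names made up for this example, and
the real code in gomp_target_rev() may look quite different. The sketch
assumes that the device hands over addresses and sizes as uint64_t and
that the callee wants void* and size_t arrays. The pointer-size test is
a compile-time constant, so the compiler can drop either the fatal call
or the rest of the function, while a mismatch still fails at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical callee that insists on host-sized arrays.  It just adds
   up the sizes of all entries with a non-NULL address.  */
static size_t
consume_host_args (void **addrs, size_t *sizes, size_t n)
{
  size_t total = 0;
  for (size_t i = 0; i < n; i++)
    if (addrs[i] != NULL)
      total += sizes[i];
  return total;
}

/* Hypothetical fatal-error helper; never returns.  */
static void
fatal_32bit_host (void)
{
  fprintf (stderr, "reverse offload requires 64-bit host pointers\n");
  abort ();
}

/* Hypothetical entry point called on behalf of the device.  */
static uint64_t
rev_entry (const uint64_t *dev_addrs, const uint64_t *dev_sizes, size_t n)
{
  /* Constant condition: dead code on a 64-bit host; on a 32-bit host
     everything after the call is unreachable.  */
  if (sizeof (void *) != sizeof (uint64_t))
    fatal_32bit_host ();

  if (n == 0)
    return 0;

  void **addrs = malloc (n * sizeof (void *));
  size_t *sizes = malloc (n * sizeof (size_t));
  if (addrs == NULL || sizes == NULL)
    abort ();

  /* Narrowing is lossless here because the guard above has passed.  */
  for (size_t i = 0; i < n; i++)
    {
      addrs[i] = (void *) (uintptr_t) dev_addrs[i];
      sizes[i] = (size_t) dev_sizes[i];
    }

  uint64_t result = consume_host_args (addrs, sizes, n);
  free (addrs);
  free (sizes);
  return result;
}
```

On a 64-bit host, calling rev_entry with three entries of which the
middle one has address 0 returns the sum of the first and last size.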
The check fails at run time rather than at compile time because
'#if TARGET_OFFLOAD == ""' does not work (the C preprocessor
unfortunately does not support string comparison).

Comments, suggestions, OK for mainline, ... ?

Tobias

PS:
* As follow-up, libgomp.texi must be updated.
* For GCN, it currently does not work until stack variables are
  accessible from the host. (Prep work for this is in newlib + GCC 13.)
  Once done, a similar one-line change to plugin-gcn.c's
  GOMP_OFFLOAD_get_num_devices is required.

PPS: (Off-topic remark on 32bit hosts) While a 32bit host with a 32bit
device will mostly work, a 32bit host with a 64bit device becomes
interesting, as the 'void *' returned by omp_target_alloc(...) can't
represent a device pointer. The solution is a 32bit pointer pointing to
a 64bit variable, e.g.

  uint64_t *devptr = malloc (sizeof (uint64_t));
  *devptr = internal_device_alloc ();
  return devptr;

with all the fun of translating this correctly with
{use,has}_device_ptr etc.

Actually supporting this will require some larger changes to libgomp,
which I do not see happening unless a device system with
sizeof(void*) > 64 bit shows up, or there is some compelling reason to
use 32bit on the host; but not for x86-64 or arm64 (or PowerPC). (There
exist 128bit pointer systems, which use the upper bits for extra
purposes, but for unified-shared address purposes it seems unlikely
that accelerator devices will head in this direction.)

-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; limited liability company (Gesellschaft mit beschränkter Haftung); Managing directors: Thomas Heurung, Frank Thürauf; Registered office: München; Commercial register: Registergericht München, HRB 106955