Hi, On 15.12.22 20:42, Tobias Burnus wrote: >> If the libgomp plugin doesn't request special >> 'host_to_dev_cpy'/'dev_to_host_cpy' for 'gomp_target_rev', then standard >> 'gomp_copy_host2dev'/'gomp_copy_dev2host' are used, which use >> 'gomp_device_copy', which expects the device to be locked. (As can be >> told by the unconditional 'gomp_mutex_unlock (&devicep->lock);' before >> 'gomp_fatal'.) However, in a number of the >> 'gomp_copy_host2dev'/'gomp_copy_dev2host' calls from 'gomp_target_rev', >> the device definitely is not locked; see Actually, reading it + the source code again, I think it makes sense to return a boolean – similar to devicep->host2dev_func and devicep->dev2host_func — and possibly wrap it into some convenience function, similar to gomp_device_copy – at least a bare exit() without further diagnostic does not seem to userfriendly. BTW: In line with the other code, you could use CUDA_CALL instead of CUDA_CALL_ERET; the fomer already calls the latter with 'false' as first argument + is used elsewhere. Regarding the lock: It seems the problem is the copying of devaddrs/sizes/kinds; this does not need any lock as the stack variables are on the device and only used for this reverse offload. Thus, there is no need for a lock as there are no races. However, as the existing gomp_copy_dev2host removes the lock, we could simply keep this lock – and probably should move it down to just before the user-function call – removing all (non-error) locks and unlocks on the way. — I mean something like the attached patch. Finally, I think we need to find a solution for the issue Andrew tried to address. — The current code invokes CUDA_CALL_ASSERT – which calls GOMP_PLUGIN_fatal. Tobias ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955