Slightly changed patch: nvptx_attach_host_thread_to_device now fails
again with an error for CUDA_ERROR_DEINITIALIZED, except for
GOMP_OFFLOAD_fini_device.

I think it makes more sense that way.

Tobias Burnus wrote:
> Testing showed that the libgomp.c/target-52.c failed with:
>
> libgomp: cuCtxGetDevice error: unknown cuda error
>
> libgomp: device finalization failed
>
> This testcase uses OMP_DISPLAY_ENV=true and
> OMP_TARGET_OFFLOAD=mandatory, and those env vars matter, i.e. it only
> fails if dg-set-target-env-var is honored.
>
> If both env vars are set, the device initialization occurs earlier as
> OMP_DEFAULT_DEVICE is shown due to the display-env env var and its
> value (when target-offload-var is 'mandatory') might be either
> 'omp_invalid_device' or '0'.
>
> It turned out that this had an effect on device finalization, which
> caused CUDA to stop earlier than expected. This patch now handles this
> case gracefully. For details, see the commit log message in the
> attached patch and/or the PR.
>
> Comments, remarks, suggestions?
>
> Does this look sensible? (I would like to see some acknowledgement by
> someone who feels more comfortable with CUDA than me.)

Tobias
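
For illustration only (not taken from the attached patch): a minimal sketch of the behavior described above, assuming the attach step is told whether it is being called from device finalization. Every `sketch_`/`SKETCH_` name is hypothetical; the enum merely stands in for the CUDA driver result codes (`CUDA_SUCCESS`, `CUDA_ERROR_DEINITIALIZED`) so that the snippet compiles without `<cuda.h>`.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for CUDA driver result codes; real plugin code
   would use CUresult from <cuda.h>.  */
enum sketch_result
{
  SKETCH_SUCCESS,
  SKETCH_ERROR_DEINITIALIZED,
  SKETCH_ERROR_OTHER
};

/* Outcome of the (hypothetical) attach step.  */
enum sketch_attach_status
{
  SKETCH_ATTACH_OK,
  SKETCH_ATTACH_SKIP, /* Driver already shut down; nothing left to do.  */
  SKETCH_ATTACH_FAIL
};

/* R is the result of a cuCtxGetDevice-like call.  A deinitialized driver
   is tolerated only when FINALIZING is true; otherwise it is an error.  */
static enum sketch_attach_status
sketch_attach (enum sketch_result r, bool finalizing)
{
  if (r == SKETCH_SUCCESS)
    return SKETCH_ATTACH_OK;
  if (r == SKETCH_ERROR_DEINITIALIZED && finalizing)
    return SKETCH_ATTACH_SKIP;
  fprintf (stderr, "sketch: cuCtxGetDevice-like call failed\n");
  return SKETCH_ATTACH_FAIL;
}

/* Finalization path: succeeds if attached or if the driver is gone.  */
static bool
sketch_fini_device (enum sketch_result r)
{
  return sketch_attach (r, true) != SKETCH_ATTACH_FAIL;
}

/* Any other entry point: only a successful attach counts.  */
static bool
sketch_regular_entry (enum sketch_result r)
{
  return sketch_attach (r, false) == SKETCH_ATTACH_OK;
}
```

With this shape, `sketch_fini_device (SKETCH_ERROR_DEINITIALIZED)` reports success while `sketch_regular_entry (SKETCH_ERROR_DEINITIALIZED)` still fails, which mirrors the "fails again, except for GOMP_OFFLOAD_fini_device" wording; how the actual patch passes that distinction around may well differ.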