On 4/1/22 14:28, Thomas Schwinge wrote:
> Hi Tom!
>
> On 2022-04-01T13:24:40+0200, Tom de Vries wrote:
>> When running testcases libgomp.fortran/examples-4/declare_target-{1,2}.f90 on
>> an RTX A2000 (sm_86) with driver 510.60.02 and with GOMP_NVPTX_JIT=-O0 I run
>> into:
>> ...
>> FAIL: libgomp.fortran/examples-4/declare_target-1.f90 -O0 \
>> -DGOMP_NVPTX_JIT=-O0 execution test
>> FAIL: libgomp.fortran/examples-4/declare_target-2.f90 -O0 \
>> -DGOMP_NVPTX_JIT=-O0 execution test
>> ...
>>
>> Fix this by further limiting recursion depth in the test-cases for nvptx.
>>
>> Furthermore, make the recursion depth limiting nvptx-specific.
>
> Careful:
>
>> --- a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
>> +++ b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
>> @@ -1,4 +1,16 @@
>> ! { dg-do run }
>> +! { dg-additional-options "-cpp" }
>> +! Reduced from 25 to 23, otherwise execution runs out of thread stack on
>> +! Nvidia Titan V.
>> +! Reduced from 23 to 22, otherwise execution runs out of thread stack on
>> +! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
>> +! Reduced from 22 to 20, otherwise execution runs out of thread stack on
>> +! Nvidia RTX A2000 (6GB variant), when run with GOMP_NVPTX_JIT=-O0.
>> +! { dg-additional-options "-DREC_DEPTH=20" { target { offload_target_nvptx } } } */
>
> 'offload_target_nvptx' doesn't mean that offloading execution is done on
> nvptx, but rather that we're "*compiling* for offload target nvptx"
> (emphasis mine). That means, with such a change we're now getting
> different behavior in a system with an AMD GPU, when using a toolchain
> that only has GCN offloading configured vs. a toolchain that has GCN and
> nvptx offloading configured. This isn't going to cause any real
> problems, of course, but it's confusing, and a bad example of
> 'offload_target_nvptx'.
>
> 'offload_device_nvptx' ought to work: "using nvptx offload device".
>

Thanks for pointing that out.

I tried to understand this multiple-offloading configuration a bit, and
came up with the following mental model: it's possible to have a host
with, say, an nvptx and an AMD offloading device, and then configure and
build a toolchain that can generate a single executable that can offload
to either device, depending on the value of the appropriate
OpenACC/OpenMP environment variables.

So, in principle, the libgomp testsuite could have a mode in which it
does that: run the same executable twice, once for each offloading
device.

In that case, even using offload_device_nvptx would not be accurate
enough, and we'd need to test for the offload device type at runtime, as
used to be done in libgomp/testsuite/libgomp.fortran/task-detach-6.f90.

I've tried to copy that setup to
libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90, but
that doesn't seem to work anymore.

I've also tried copying that test-case to
libgomp/testsuite/libgomp.fortran/copy-of-declare_target-1.f90 to rule
out any subdir-related problems, but no luck there either.

Attached is that copy approach; could you try it out and see if it works
for you? Do you perhaps have an idea why it's failing?

I can make a patch using offload_device_nvptx, but I'd prefer to
understand first why the approach above isn't working.

Thanks,
- Tom