On 3/30/22 11:02, Tobias Burnus wrote:
> On 30.03.22 10:03, Tom de Vries wrote:
>
>> On 3/29/22 16:47, Tobias Burnus wrote:
>>> I think it would be useful to have additionally some wording for the
>>> (new in GCC 12/new since today) macros,
> [...]
>> The macro is defined also if the option is not specified, so I think
>> this formulation is not 100% clear in that aspect.  I've reformulated
>> to fix that.
> Fine. (It was a copy, paste + modify from elsewhere.)
>
>> Also, I took out the detail of how the value is determined, since
>> we're just following __CUDA_ARCH__ rather than defining our own policy.
>
> OK. While I am not sure that it is obvious, also the example makes clear
> what value to expect. Combining the two, I concur that the details
> aren't required.
>
>> Any comments?
>
> LGTM.
>
> Tobias
>
>
> PS: Regarding the sm_30 -> sm_35 change (before in this email thread).
> That was not meant to be in the .texi file, but just as item to
> remember when updating the wwwdocs / gcc-12/changes.html document.
>

I see, I misunderstood then.  FWIW, it's already added to the version in
my sandbox.

> It was/is also not completely clear to me whether there is still this
> CUDA 11.x issue of not supporting sm_30 (only sm_35 and higher) or not.

Thanks for reminding me of this issue.

> I assume it still exists but is mitigated at
> compiler-usage/libgomp-runtime-usage time as PTX ISA now defaults to 6.0
> such that CUDA – but shouldn't it still see sm_30 instead of sm_35 in
> this case?
>
> If so, I think it will still show up when using either explicitly PTX
> ISA 3.1 or when building GCC itself and all of the following holds:
> nvptx-tools is installed, CUDA (in a too new version) is installed
> (ptxas in $PATH), and the pending pull request nvptx-tools has not
> been applied that ignores the non-explicit '--verify' when .target sm_xx
> or PTX ISA .version is not supported by ptxas.

I don't think the 6.0 default has any influence (and I'll be using
-mptx=3.1 below to make sure we run into the worst-case behaviour).

Anyway, in the absence of an nvptx-tools fix I committed a work-around
in the compiler:
...
#define ASM_SPEC "%{misa=*:-m %*; :-m sm_35}%{misa=sm_30:--no-verify}"
...

Note that this was before reverting the default back to sm_30, and I
probably forgot to update this spot when changing the default.

So now, there are effectively two workarounds in place.

This (implicitly using sm_30) passes:
...
$ ( PATH=$PATH:~/cuda/11.6.0/bin; ./gcc.sh ~/hello.c -c -save-temps -Wa,--verify -mptx=3.1 )
...
because as we can see with -v, sm_35 is used to verify:
...
 ./build-gcc/gcc/as -m sm_35 --verify -o hello.o hello.s
...

This (explicitly using sm_30) passes:
...
$ ( PATH=$PATH:~/cuda/11.6.0/bin; ./gcc.sh ~/hello.c -c -save-temps -march=sm_30 -mptx=3.1 )
...
because as we can see with -v, the --no-verify workaround is triggered:
...
 ./build-gcc/gcc/as -m sm_30 --no-verify -o hello.o hello.s
...

But that one stops working once we use an explicit -Wa,--verify:
...
$ ( PATH=$PATH:~/cuda/11.6.0/bin; ./gcc.sh ~/hello.c -c -save-temps -Wa,--verify -march=sm_30 -mptx=3.1 )
ptxas fatal : Value 'sm_30' is not defined for option 'gpu-name'
nvptx-as: ptxas returned 255 exit status
...

So, it seems using sm_35 to verify sm_30 is the most robust workaround.

I'm currently testing the attached patch.

Thanks,
- Tom
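
For readers unfamiliar with GCC spec strings, the two workarounds in the ASM_SPEC quoted above can be sketched as a small model. This is only an illustration, not the driver's implementation: the function name asm_spec_args is hypothetical, and it assumes (as the -v output above suggests) that -march=sm_30 reaches the spec as misa=sm_30. It does not model what -Wa,--verify adds to the command line.

```python
def asm_spec_args(misa=None):
    """Rough model of the ASM_SPEC string from this thread.

    `misa` is the value of an explicit -misa= option, or None when the
    option was not given.  Hypothetical helper, for illustration only.
    """
    args = []

    # %{misa=*:-m %*; :-m sm_35}
    # An explicit -misa=X is passed on as "-m X"; with no -misa at all,
    # the spec falls back to "-m sm_35" (first workaround).
    if misa is not None:
        args += ["-m", misa]
    else:
        args += ["-m", "sm_35"]

    # %{misa=sm_30:--no-verify}
    # Only an explicit sm_30 disables verification (second workaround).
    if misa == "sm_30":
        args.append("--no-verify")

    return args


if __name__ == "__main__":
    print(asm_spec_args())          # no -misa given
    print(asm_spec_args("sm_30"))   # explicit sm_30
```

In this model the implicit case yields "-m sm_35" and the explicit sm_30 case yields "-m sm_30 --no-verify", matching the two assembler invocations shown with -v above.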