Updated patch enclosed. Changes:

* Fixes the sm >= 700 issue I noted before (cf. below).
* The < sm_70 code is still in, but it is disabled at user-compile time, with a warning, if libgomp.a wasn't compiled with sm_70 or higher. (mkoffload strips the nvptx offload code.)
* Some minor cleanup.

OK for mainline?

Tobias

On 02.10.22 20:13, Tobias Burnus wrote:

On 27.09.22 11:23, Tobias Burnus wrote:

We do support
  #if __PTX_SM__ >= 600  (CUDA >= 8.0, ptx isa >= 5.0)
and we also can configure GCC with
  --with-arch=sm_70 (or sm_80 or ...)
Thus, adding atomics with .sys scope is possible.

See attached patch. This seems to work fine and I hope I got the assembly right in terms of atomic use. (And I do believe that the .release/.acquire do not need an additional __sync_synchronize()/"membar.sys".)

Regarding this: While 'atom.op' (op = and/or/xor/cas/exch/add/inc/dec/min/max) with scope is a sm_60 feature, the used 'st/ld' with scope qualifier and .relaxed, .release / .relaxed, .acquire require sm_70. (Does not really matter as only ..., sm_53 and sm_70, ... are currently supported but not sm_60, but the #if should obviously be fixed.)

* * *

Looking at the generated code without inline assembler, we have instead of

  st.global.release.sys.u64 [%r27],%r39;
and
  ld.acquire.sys.global.u64 %r62,[%r27];

for older systems (__PTX_SM__ < 700) the code:

  @ %r69 membar.sys;
  @ %r69 atom.exch.b64 _,[%r27],%r41;
and
  ld.global.u64 %r64,[__gomp_rev_offload_var];
  ld.u64 %r36,[%r64];
  membar.sys;

In my understanding, the membar.sys ensures, similar to st.release / ld.acquire, that the memory handling is done in the correct order in scope .sys. The 'fn' variable is initially 0 and is then only set via the device, i.e. there is eventually a DMA write device->host, which is atomic as the whole int64_t is written at once (and not, e.g., first the lower and then the upper half). Hence, the 'st'/'atom.exch' should work fine, despite having no .sys scope.
Likewise, the membar.sys also applies in the other direction. Or did I miss something? If so, would an explicit __sync_synchronize() (= membar.sys) help between the 'st' and the 'ld'?

Tobias
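
For readers following the thread, the release/acquire hand-off on the 'fn' slot discussed above can be sketched on the host side roughly as follows. This is only a minimal illustration using GCC's __atomic builtins. The names fn_slot, post_request, poll_request and complete_request are hypothetical and not taken from the patch, and the builtins merely stand in for the PTX st.release / ld.acquire (or membar.sys) sequences quoted above; they are not the actual libgomp code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared 'fn' slot.
   The value 0 means "no request pending".  */
static uint64_t fn_slot;

/* Producer side (the device in the email): publish a request.
   The release store orders all earlier writes before the slot update.  */
static void
post_request (uint64_t fn)
{
  assert (fn != 0);  /* 0 is reserved for "no request pending".  */
  __atomic_store_n (&fn_slot, fn, __ATOMIC_RELEASE);
}

/* Consumer side (the host in the email): return the pending request,
   or 0 if there is none.  The acquire load pairs with the release store.  */
static uint64_t
poll_request (void)
{
  return __atomic_load_n (&fn_slot, __ATOMIC_ACQUIRE);
}

/* Consumer side: mark the request as handled by resetting the slot.  */
static void
complete_request (void)
{
  __atomic_store_n (&fn_slot, 0, __ATOMIC_RELEASE);
}
```

The whole 64-bit value is written by a single atomic store, which mirrors the assumption in the email that the slot is never observed half-updated.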