From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <5915414b-cd34-f1b0-cb2d-aea0560b1f3c@codesourcery.com>
Date: Fri, 28 Apr 2023 10:28:22 +0200
Subject: Re: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling
From: Tobias Burnus
To: Thomas Schwinge
CC: Alexander Monakov, Jakub Jelinek, Tom de Vries
References: <57b3ae5e-8f15-8bea-fa09-39bccbaa2414@codesourcery.com> <871qkzpv6t.fsf@euler.schwinge.homeip.net>
In-Reply-To: <871qkzpv6t.fsf@euler.schwinge.homeip.net>
Content-Type: text/plain; charset="UTF-8"; format=flowed

Hi Thomas,

maybe I misunderstood your suggestion, but "Wait on a memory location"
assumes that there will be a change. If a target region happens to have
no reverse offload, the memory location will never change, yet the
target region should still return to the host.

What we would need: wait on the memory location and return if either
the kernel stopped *or* the memory location changed.

My impression is that "return if the kernel stopped" is not really
guaranteed. Or did I miss some fine print?

Tobias

On 04.04.23 16:40, Thomas Schwinge wrote:
> Hi!
>
> During GCC/OpenMP/nvptx reverse offload investigations, about how to
> replace the problematic global 'GOMP_REV_OFFLOAD_VAR', I may have found
> something re:
>
> On 2022-08-26T11:07:28+0200, Tobias Burnus wrote:
>> Better suggestions are welcome for the busy loop in
>> libgomp/plugin/plugin-nvptx.c regarding the variable placement and checking
>> its value.
>>
>> On the host side, the last address is checked - if fn_addr != NULL,
>> it passes all arguments on to the generic (target.c) gomp_target_rev
>> to do the actual offloading.
>>
>> CUDA does lockup when trying to copy data from the currently running
>> stream; hence, a new stream is generated to do the memory copying.
>>
>> Future work for nvptx:
>> * Adjust 'sleep', possibly [...]
>> to do shorter sleeps than usleep(1)?
>
> ... this busy loop.
>
> Current 'libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_run':
>
>     [...]
>     if (reverse_offload)
>       CUDA_CALL_ASSERT (cuStreamCreate, &copy_stream, CU_STREAM_NON_BLOCKING);
>     r = CUDA_CALL_NOCHECK (cuLaunchKernel, function, teams, 1, 1,
>                            32, threads, 1, 0, NULL, NULL, config);
>     if (r != CUDA_SUCCESS)
>       GOMP_PLUGIN_fatal ("cuLaunchKernel error: %s", cuda_error (r));
>     if (reverse_offload)
>       while (true)
>         {
>           r = CUDA_CALL_NOCHECK (cuStreamQuery, NULL);
>           if (r == CUDA_SUCCESS)
>             break;
>           if (r == CUDA_ERROR_LAUNCH_FAILED)
>             GOMP_PLUGIN_fatal ("cuStreamQuery error: %s %s\n", cuda_error (r),
>                                maybe_abort_msg);
>           else if (r != CUDA_ERROR_NOT_READY)
>             GOMP_PLUGIN_fatal ("cuStreamQuery error: %s", cuda_error (r));
>
>           if (__atomic_load_n (&ptx_dev->rev_data->fn, __ATOMIC_ACQUIRE) != 0)
>             {
>               struct rev_offload *rev_data = ptx_dev->rev_data;
>               GOMP_PLUGIN_target_rev (rev_data->fn, rev_data->mapnum,
>                                       rev_data->addrs, rev_data->sizes,
>                                       rev_data->kinds, rev_data->dev_num,
>                                       rev_off_dev_to_host_cpy,
>                                       rev_off_host_to_dev_cpy, copy_stream);
>               CUDA_CALL_ASSERT (cuStreamSynchronize, copy_stream);
>               __atomic_store_n (&rev_data->fn, 0, __ATOMIC_RELEASE);
>             }
>           usleep (1);
>         }
>     else
>       r = CUDA_CALL_NOCHECK (cuCtxSynchronize, );
>     if (reverse_offload)
>       CUDA_CALL_ASSERT (cuStreamDestroy, copy_stream);
>     [...]
>
> Instead of this 'while (true)', 'usleep (1)' loop, shouldn't we be able
> to use "Stream Memory Operations",
> that allow to "Wait on a memory location", "until the given condition on
> the memory is satisfied"?
>
> For reference, current 'libgomp/config/nvptx/target.c:GOMP_target_ext':
>
>     [...]
>     GOMP_REV_OFFLOAD_VAR->mapnum = mapnum;
>     GOMP_REV_OFFLOAD_VAR->addrs = (uint64_t) hostaddrs;
>     GOMP_REV_OFFLOAD_VAR->sizes = (uint64_t) sizes;
>     GOMP_REV_OFFLOAD_VAR->kinds = (uint64_t) kinds;
>     GOMP_REV_OFFLOAD_VAR->dev_num = GOMP_ADDITIONAL_ICVS.device_num;
>
>     /* Set 'fn' to trigger processing on the host; wait for completion,
>        which is flagged by setting 'fn' back to 0 on the host.  */
>     uint64_t addr_struct_fn = (uint64_t) &GOMP_REV_OFFLOAD_VAR->fn;
>   #if __PTX_SM__ >= 700
>     asm volatile ("st.global.release.sys.u64 [%0], %1;"
>                   : : "r"(addr_struct_fn), "r" (fn) : "memory");
>   #else
>     __sync_synchronize ();  /* membar.sys */
>     asm volatile ("st.volatile.global.u64 [%0], %1;"
>                   : : "r"(addr_struct_fn), "r" (fn) : "memory");
>   #endif
>
>   #if __PTX_SM__ >= 700
>     uint64_t fn2;
>     do
>       {
>         asm volatile ("ld.acquire.sys.global.u64 %0, [%1];"
>                       : "=r" (fn2) : "r" (addr_struct_fn) : "memory");
>       }
>     while (fn2 != 0);
>   #else
>     /* ld.global.u64 %r64,[__gomp_rev_offload_var];
>        ld.u64 %r36,[%r64];
>        membar.sys;  */
>     while (__atomic_load_n (&GOMP_REV_OFFLOAD_VAR->fn, __ATOMIC_ACQUIRE) != 0)
>       ;  /* spin */
>   #endif
>     [...]
>
>
> Regards
>  Thomas
> -----------------
> Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201,
> 80634 München; limited liability company (GmbH); Managing Directors:
> Thomas Heurung, Frank Thürauf; Registered office: München; Commercial
> register: München, HRB 106955
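
To make the concern from the reply at the top concrete, here is a small
host-only sketch of the "return if either the kernel stopped *or* the
memory location changed" requirement. It does not use CUDA at all: a
pthread stands in for the kernel, and the names 'struct mailbox',
'fake_kernel' and 'run_target_region' are hypothetical and not part of
libgomp. The 'kernel_done' flag is only a stand-in for what
cuStreamQuery reports in the real plugin.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: 'fn' plays the role of rev_data->fn, and
   'kernel_done' models "the stream has finished".  */
struct mailbox
{
  _Atomic uint64_t fn;
  atomic_bool kernel_done;
  int requests;  /* Reverse-offload requests the fake kernel issues.  */
};

/* Device side (simulated): post each request by setting 'fn', then spin
   until the host resets it to 0; finally flag completion.  */
static void *
fake_kernel (void *arg)
{
  struct mailbox *mb = arg;
  for (int i = 0; i < mb->requests; i++)
    {
      atomic_store_explicit (&mb->fn, (uint64_t) (i + 1),
                             memory_order_release);
      while (atomic_load_explicit (&mb->fn, memory_order_acquire) != 0)
        sched_yield ();
    }
  atomic_store_explicit (&mb->kernel_done, true, memory_order_release);
  return NULL;
}

/* Host side: leave the loop when the kernel has finished, and service
   'fn' whenever it becomes nonzero.  Returns the number of requests
   that were handled.  */
static int
run_target_region (int requests)
{
  struct mailbox mb;
  atomic_init (&mb.fn, 0);
  atomic_init (&mb.kernel_done, false);
  mb.requests = requests;

  pthread_t dev;
  int err = pthread_create (&dev, NULL, fake_kernel, &mb);
  assert (err == 0);

  int handled = 0;
  while (!atomic_load_explicit (&mb.kernel_done, memory_order_acquire))
    {
      if (atomic_load_explicit (&mb.fn, memory_order_acquire) != 0)
        {
          handled++;  /* The real code calls GOMP_PLUGIN_target_rev here.  */
          atomic_store_explicit (&mb.fn, 0, memory_order_release);
        }
      sched_yield ();
    }
  pthread_join (dev, NULL);
  return handled;
}
```

With 'requests' set to 0 the mailbox never changes, so a host that only
waited on 'fn' would block forever; the loop above returns only because
it also polls the completion flag. Whether a stream memory wait can be
combined with such a completion check is exactly the open question.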