From: Tobias Burnus <tobias@codesourcery.com>
To: Thomas Schwinge <thomas@codesourcery.com>
Cc: Alexander Monakov <amonakov@ispras.ru>,
	Jakub Jelinek <jakub@redhat.com>, Tom de Vries <tdevries@suse.de>,
	<gcc-patches@gcc.gnu.org>
Subject: Re: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling
Date: Fri, 28 Apr 2023 10:28:22 +0200
Message-ID: <5915414b-cd34-f1b0-cb2d-aea0560b1f3c@codesourcery.com>
In-Reply-To: <871qkzpv6t.fsf@euler.schwinge.homeip.net>

Hi Thomas,

maybe I misunderstood your suggestion, but "Wait on a memory location"
assumes that the location will eventually change. If a target region
happens to contain no reverse offload, the memory location never
changes, yet the target region must still return to the host.

What we would need: Wait on memory location – and return if either the
kernel stopped *or* the memory location changed.

My impression is that "return if the kernel stopped" is not really
guaranteed. Or did I miss some fine print?
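
To make the requirement concrete, here is a minimal host-only sketch in
C11. It makes no CUDA calls: 'struct fake_device', 'enum wait_result'
and all function names are hypothetical, and the 'kernel_done' flag
merely stands in for whatever "the kernel stopped" signal is available
(in the current loop, cuStreamQuery returning CUDA_SUCCESS). The polls
are bounded only so that the "never returns" case is observable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of one wait; not part of libgomp or CUDA.  */
enum wait_result { WAIT_KERNEL_DONE, WAIT_REQUEST_PENDING, WAIT_TIMED_OUT };

/* Hypothetical model of the state the host can observe.  */
struct fake_device
{
  _Atomic uint64_t fn;      /* nonzero: reverse-offload request pending */
  atomic_bool kernel_done;  /* models "the kernel stopped" */
};

static void
fake_device_init (struct fake_device *dev)
{
  assert (dev != NULL);
  atomic_init (&dev->fn, 0);
  atomic_init (&dev->kernel_done, false);
}

/* Waits on the memory location only.  If the kernel ends without ever
   requesting a reverse offload, 'fn' stays 0 and this never succeeds.  */
static enum wait_result
wait_fn_only (struct fake_device *dev, unsigned max_polls)
{
  for (unsigned i = 0; i < max_polls; i++)
    if (atomic_load_explicit (&dev->fn, memory_order_acquire) != 0)
      return WAIT_REQUEST_PENDING;
  return WAIT_TIMED_OUT;
}

/* Waits until either the kernel stopped or 'fn' changed; the checks are
   in the same order as in the quoted GOMP_OFFLOAD_run loop.  */
static enum wait_result
wait_fn_or_kernel_end (struct fake_device *dev, unsigned max_polls)
{
  for (unsigned i = 0; i < max_polls; i++)
    {
      if (atomic_load_explicit (&dev->kernel_done, memory_order_acquire))
        return WAIT_KERNEL_DONE;
      if (atomic_load_explicit (&dev->fn, memory_order_acquire) != 0)
        return WAIT_REQUEST_PENDING;
    }
  return WAIT_TIMED_OUT;
}
```

With 'kernel_done' set and 'fn' left at 0, wait_fn_only exhausts its
polls, whereas wait_fn_or_kernel_end returns WAIT_KERNEL_DONE at once.
Whether a stream memory wait can offer the second behaviour is exactly
the open question.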

Tobias

On 04.04.23 16:40, Thomas Schwinge wrote:
> Hi!
>
> During GCC/OpenMP/nvptx reverse offload investigations, about how to
> replace the problematic global 'GOMP_REV_OFFLOAD_VAR', I may have found
> something re:
>
> On 2022-08-26T11:07:28+0200, Tobias Burnus <tobias@codesourcery.com> wrote:
>> Better suggestions are welcome for the busy loop in
>> libgomp/plugin/plugin-nvptx.c regarding the variable placement and checking
>> its value.
>> On the host side, the last address is checked - if fn_addr != NULL,
>> it passes all arguments on to the generic (target.c) gomp_target_rev
>> to do the actual offloading.
>>
>> CUDA does lockup when trying to copy data from the currently running
>> stream; hence, a new stream is generated to do the memory copying.
>> Future work for nvptx:
>> * Adjust 'sleep', possibly [...]
>>    to do shorter sleeps than usleep(1)?
> ... this busy loop.
>
> Current 'libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_run':
>
>      [...]
>        if (reverse_offload)
>          CUDA_CALL_ASSERT (cuStreamCreate, &copy_stream, CU_STREAM_NON_BLOCKING);
>        r = CUDA_CALL_NOCHECK (cuLaunchKernel, function, teams, 1, 1,
>                               32, threads, 1, 0, NULL, NULL, config);
>        if (r != CUDA_SUCCESS)
>          GOMP_PLUGIN_fatal ("cuLaunchKernel error: %s", cuda_error (r));
>        if (reverse_offload)
>          while (true)
>            {
>              r = CUDA_CALL_NOCHECK (cuStreamQuery, NULL);
>              if (r == CUDA_SUCCESS)
>                break;
>              if (r == CUDA_ERROR_LAUNCH_FAILED)
>                GOMP_PLUGIN_fatal ("cuStreamQuery error: %s %s\n", cuda_error (r),
>                                   maybe_abort_msg);
>              else if (r != CUDA_ERROR_NOT_READY)
>                GOMP_PLUGIN_fatal ("cuStreamQuery error: %s", cuda_error (r));
>
>              if (__atomic_load_n (&ptx_dev->rev_data->fn, __ATOMIC_ACQUIRE) != 0)
>                {
>                  struct rev_offload *rev_data = ptx_dev->rev_data;
>                  GOMP_PLUGIN_target_rev (rev_data->fn, rev_data->mapnum,
>                                          rev_data->addrs, rev_data->sizes,
>                                          rev_data->kinds, rev_data->dev_num,
>                                          rev_off_dev_to_host_cpy,
>                                          rev_off_host_to_dev_cpy, copy_stream);
>                  CUDA_CALL_ASSERT (cuStreamSynchronize, copy_stream);
>                  __atomic_store_n (&rev_data->fn, 0, __ATOMIC_RELEASE);
>                }
>              usleep (1);
>            }
>        else
>          r = CUDA_CALL_NOCHECK (cuCtxSynchronize, );
>        if (reverse_offload)
>          CUDA_CALL_ASSERT (cuStreamDestroy, copy_stream);
>      [...]
>
> Instead of this 'while (true)', 'usleep (1)' loop, shouldn't we be able
> to use "Stream Memory Operations",
> <https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEMOP.html>
> that allow to "Wait on a memory location", "until the given condition on
> the memory is satisfied"?
>
> For reference, current 'libgomp/config/nvptx/target.c:GOMP_target_ext':
>
>      [...]
>        GOMP_REV_OFFLOAD_VAR->mapnum = mapnum;
>        GOMP_REV_OFFLOAD_VAR->addrs = (uint64_t) hostaddrs;
>        GOMP_REV_OFFLOAD_VAR->sizes = (uint64_t) sizes;
>        GOMP_REV_OFFLOAD_VAR->kinds = (uint64_t) kinds;
>        GOMP_REV_OFFLOAD_VAR->dev_num = GOMP_ADDITIONAL_ICVS.device_num;
>
>        /* Set 'fn' to trigger processing on the host; wait for completion,
>           which is flagged by setting 'fn' back to 0 on the host.  */
>        uint64_t addr_struct_fn = (uint64_t) &GOMP_REV_OFFLOAD_VAR->fn;
>      #if __PTX_SM__ >= 700
>        asm volatile ("st.global.release.sys.u64 [%0], %1;"
>                      : : "r"(addr_struct_fn), "r" (fn) : "memory");
>      #else
>        __sync_synchronize ();  /* membar.sys */
>        asm volatile ("st.volatile.global.u64 [%0], %1;"
>                      : : "r"(addr_struct_fn), "r" (fn) : "memory");
>      #endif
>
>      #if __PTX_SM__ >= 700
>        uint64_t fn2;
>        do
>          {
>            asm volatile ("ld.acquire.sys.global.u64 %0, [%1];"
>                          : "=r" (fn2) : "r" (addr_struct_fn) : "memory");
>          }
>        while (fn2 != 0);
>      #else
>        /* ld.global.u64 %r64,[__gomp_rev_offload_var];
>           ld.u64 %r36,[%r64];
>           membar.sys;  */
>        while (__atomic_load_n (&GOMP_REV_OFFLOAD_VAR->fn, __ATOMIC_ACQUIRE) != 0)
>          ;  /* spin  */
>      #endif
>      [...]
>
>
> Grüße
>   Thomas
> -----------------
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955