public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Chung-Lin Tang <chunglin_tang@mentor.com>
To: Thomas Schwinge <thomas@codesourcery.com>,
	Chung-Lin Tang	<cltang@codesourcery.com>
Cc: <gcc-patches@gcc.gnu.org>
Subject: Re: Too strict synchronization with the local (host) thread?
Date: Tue, 11 Dec 2018 13:30:00 -0000	[thread overview]
Message-ID: <dc03f1a2-405d-9cc8-8aae-702d8ff04124@mentor.com> (raw)
In-Reply-To: <yxfp7egl8h4w.fsf@hertz.schwinge.homeip.net>

On 2018/12/7 11:56 PM, Thomas Schwinge wrote:
>> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-79.c
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-79.c
>> @@ -114,6 +114,7 @@ main (int argc, char **argv)
>>   
>>     for (i = 0; i < N; i++)
>>       {
>> +      stream = (CUstream) acc_get_cuda_stream (i & 1);
>>         r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0);
> What's the motivation for this change?

To place work on both streams 0 and 1.

> ..., and this change are needed because we're now more strictly
> synchronizing with the local (host) thread.
> 
> Regarding the case of "libgomp.oacc-c-c++-common/lib-81.c", as currently
> present:
> 
>      [...]
>        for (i = 0; i < N; i++)
>          {
>            r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, streams[i], kargs, 0);
>            if (r != CUDA_SUCCESS)
>              {
>                fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
>                abort ();
>              }
>          }
> 
> This launches N kernels on N separate async queues/CUDA streams, [0..N).
> 
>        acc_wait_all_async (N);
> 
> Then, the "acc_wait_all_async (N)" -- in my understanding! -- should
> *not*  synchronize with the local (host) thread, but instead just set up
> the additional async queue/CUDA stream N to "depend" on [0..N).
> 
>        for (i = 0; i <= N; i++)
>          {
>            if (acc_async_test (i) != 0)
>              abort ();
>          }
> 
> Thus, all [0..N) should then still be "acc_async_test (i) != 0" (still
> running).
> 
>        acc_wait (N);
> 
> Here, the "acc_wait (N)" would synchronize the local (host) thread with
> async queue/CUDA stream N and thus recursively with [0..N).
> 
>        for (i = 0; i <= N; i++)
>          {
>            if (acc_async_test (i) != 1)
>              abort ();
>          }
>      [...]
> 
> So, then all these async queues/CUDA streams here indeed are
> "acc_async_test (i) != 1", thas is, idle.
> 
> 
> Now, the more strict synchronization with the local (host) thread is not
> wrong in term of correctness, but I suppose it will impact performance of
> otherwise asynchronous operations, which now get synchronized too much?
> 
> Or, of course, I'm misunderstanding something...

IIRC, we encountered many issues where people misunderstood the meaning of "wait+async",
using it as if the local host sync happened, where in our original implementation it does not.

Also some areas of the OpenACC spec were vague on whether the local host synchronization should
or should not happen; basically, the wording treated as if it was only an implementation detail
and didn't matter, and didn't acknowledge that this would be something visible to the user.

At the end, IIRC, I decided that adding a local host synchronization is easier for all of us,
and took the opportunity of the re-org to make this change.

That said, I didn't notice those tests you listed above were meant to test such delicate behavior.

> (For avoidance of doubt, I would accept the "async re-work" as is, but we
> should eventually clarify this, and restore the behavior we -- apparently
> -- had before, where we didn't synchronize so much?  (So, technically,
> the "async re-work" would constitute a regression for this kind of
> usage?)

It's not hard to restore the old behavior, just a few lines to delete. Although as described
above, this change was deliberate.

This might be another issue to raise with the committee. I think I tried on this exact issue
a long time ago, but never got answers.

Thanks,
Chung-Lin

  reply	other threads:[~2018-12-11 13:30 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-25 13:12 [PATCH 5/6, OpenACC, libgomp] Async re-work, C/C++ testsuite changes Chung-Lin Tang
2018-12-07 15:31 ` Thomas Schwinge
2018-12-14 21:09   ` Thomas Schwinge
2018-12-07 15:39 ` [PR88407] [OpenACC] Correctly handle unseen async-arguments (was: [PATCH 5/6, OpenACC, libgomp] Async re-work, C/C++ testsuite changes) Thomas Schwinge
2018-12-14 21:13   ` [PR88407] [OpenACC] Correctly handle unseen async-arguments Thomas Schwinge
2018-12-07 15:57 ` Too strict synchronization with the local (host) thread? (was: [PATCH 5/6, OpenACC, libgomp] Async re-work, C/C++ testsuite changes) Thomas Schwinge
2018-12-11 13:30   ` Chung-Lin Tang [this message]
2018-12-14 14:50     ` Too strict synchronization with the local (host) thread? Thomas Schwinge

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dc03f1a2-405d-9cc8-8aae-702d8ff04124@mentor.com \
    --to=chunglin_tang@mentor.com \
    --cc=cltang@codesourcery.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=thomas@codesourcery.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).