From: Jakub Jelinek <jakub@redhat.com>
To: Chung-Lin Tang <chunglin.tang@gmail.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>,
	Tom de Vries <tdevries@suse.de>,
	Catherine Moore <clm@codesourcery.com>
Subject: Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx
Date: Wed, 21 Sep 2022 11:01:51 +0200
Message-ID: <YyrS/1hYWD518wM1@tucnak>
In-Reply-To: <8b974d21-e288-4596-7500-277a43c92771@gmail.com>

On Wed, Sep 21, 2022 at 03:45:36PM +0800, Chung-Lin Tang via Gcc-patches wrote:
> Hi Tom,
> I submitted a patch earlier, where I reported that the current way of implementing
> barriers in libgomp on nvptx caused a quite significant performance drop on some SPEChpc2021
> benchmarks:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600818.html
> 
> That previous patch wasn't well received (admittedly, it was kind of a hack).
> So in this patch, I have tried to (mostly) re-implement team barriers for NVPTX.
> 
> Basically, instead of trying to have the GPU do CPU-with-OS-like things that it isn't suited for,
> barriers are implemented simplistically with bar.* synchronization instructions.
> Tasks are processed after threads have joined, and only if team->task_count != 0.
> 
> (Arguably, a little performance is forfeited where earlier-arriving threads could have been
> used to process tasks ahead of the other threads.  But that again would require implementing
> complex futex-wait/wake-like behavior.  Really, that kind of tasking is not what target
> offloading is usually used for.)
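
If I read the description right, the scheme boils down to something like
the following sketch (hand-simplified, with made-up type and helper names
rather than the actual code from the patch):

/* Sketch only: all threads of the team first join at a hardware barrier,
   and pending tasks are only looked at afterwards, and only if any exist.  */

struct nvptx_team_sketch
{
  unsigned task_count;          /* Queued, not yet executed tasks.  */
  /* Task queues etc. elided.  */
};

extern void handle_pending_tasks (struct nvptx_team_sketch *);

static inline void
team_barrier_wait_sketch (struct nvptx_team_sketch *team)
{
  /* bar.sync is the PTX barrier instruction referred to above; all
     threads of the team block here until everyone has arrived.  */
  asm volatile ("bar.sync 0;" : : : "memory");

  /* Only after the join is the task queue inspected, and only if
     there is anything in it.  */
  if (__atomic_load_n (&team->task_count, __ATOMIC_ACQUIRE) != 0)
    handle_pending_tasks (team);
}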

I admit I don't have a good picture of whether people actually use tasking
in offloading regions in the real world, and how much and in what way, but
the above would definitely be a show-stopper for typical tasking workloads,
where one thread (usually in a master/masked/single construct's body)
creates lots of tasks and can spend a considerable amount of time in that
preparation, while the other threads are expected to handle those tasks.

Do we have an idea of how other implementations handle this?
I think it should be easy to observe with atomics: have a
master/masked/single region that creates lots of tasks and then spends a
long time doing something, give the tasks very small bodies that just
increment some atomic counter, and at the end of the master/masked/single
region see how many tasks have already been handled.
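
Something along the following lines (untested, sizes arbitrary) is what I
have in mind; the counter read at the end of the single region shows how
many tasks were handled before the producer reached the barrier:

#include <stdio.h>

#define NTASKS 10000

int
main (void)
{
  int counter = 0, done_early = 0;

#pragma omp target map(tofrom: counter, done_early)
#pragma omp parallel
#pragma omp single
  {
    for (int i = 0; i < NTASKS; i++)
      {
#pragma omp task shared(counter)
        {
#pragma omp atomic update
          counter++;
        }
      }

    /* Stand-in for the producer spending a long time doing something
       before it reaches the implicit barrier.  */
    volatile double x = 0.0;
    for (long i = 0; i < 50000000; i++)
      x += 1.0;

#pragma omp atomic read
    done_early = counter;
  }

  printf ("tasks handled before the producer was done: %d of %d\n",
          done_early, NTASKS);
  return 0;
}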

Note, I don't have any smart ideas for how to handle this instead, and
what you posted might be OK for what people usually do on offloading
targets in OpenMP, if they use tasking at all; I just wanted to mention
that there could be workloads where the above is a serious problem, e.g.
if there are hundreds of threads doing nothing until a single thread
reaches a barrier while there are hundreds of pending tasks...
E.g. note that we have the 64 pending task limit after which we start to
create undeferred tasks, so if we never start handling tasks until the
one thread is done creating them, the single thread would create 64
deferred tasks and then handle all the others itself, making it take even
longer before the other threads can deal with them.
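
Very roughly (ignoring the other conditions the real code checks), the
effect is as in this made-up sketch: once too many tasks are pending, a
new task is run immediately by the creating thread instead of being
queued, so the producer makes no progress on creating further tasks in
the meantime.

#define PENDING_TASK_LIMIT 64   /* The limit referred to above.  */

struct team_sketch
{
  unsigned task_count;          /* Currently pending deferred tasks.  */
};

extern void enqueue_task (struct team_sketch *, void (*) (void *), void *);

static void
create_task_sketch (struct team_sketch *team, void (*fn) (void *),
                    void *data)
{
  if (team->task_count > PENDING_TASK_LIMIT)
    /* Undeferred: the producer executes the task right away itself.  */
    fn (data);
  else
    /* Deferred: queued, to be picked up by some thread later.  */
    enqueue_task (team, fn, data);
}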

	Jakub


Thread overview: 11+ messages
2022-09-21  7:45 Chung-Lin Tang
2022-09-21  9:01 ` Jakub Jelinek [this message]
2022-09-21 10:02   ` Chung-Lin Tang
2022-10-17 14:29 ` Chung-Lin Tang
2022-10-31 14:18   ` [Ping x2] " Chung-Lin Tang
2022-11-07 16:34     ` [Ping x3] " Chung-Lin Tang
2022-11-21 16:24       ` [Ping x4] " Chung-Lin Tang
2022-12-05 16:21         ` [Ping x5] " Chung-Lin Tang
2022-12-12 11:13           ` [Ping x6] " Chung-Lin Tang
2022-12-16 14:51 ` Tom de Vries
2022-12-19 12:13   ` Thomas Schwinge
