From: Julian Brown <julian@codesourcery.com>
To: Jakub Jelinek <jakub@redhat.com>
Cc: Alexander Monakov <amonakov@ispras.ru>, <gcc-patches@gcc.gnu.org>,
	"Thomas Schwinge" <thomas@codesourcery.com>
Subject: Re: [PATCH] nvptx: Cache stacks block for OpenMP kernel launch
Date: Tue, 15 Dec 2020 16:49:38 +0000
Message-ID: <20201215164938.22e5477c@squid.athome>
In-Reply-To: <20201215134940.GT3788@tucnak>

On Tue, 15 Dec 2020 14:49:40 +0100
Jakub Jelinek <jakub@redhat.com> wrote:

> On Tue, Dec 15, 2020 at 01:39:13PM +0000, Julian Brown wrote:
> > @@ -1922,7 +1997,9 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
> >    nvptx_adjust_launch_bounds (tgt_fn, ptx_dev, &teams, &threads);
> > 
> >    size_t stack_size = nvptx_stacks_size ();
> > -  void *stacks = nvptx_stacks_alloc (stack_size, teams * threads);
> > +
> > +  pthread_mutex_lock (&ptx_dev->omp_stacks.lock);
> > +  void *stacks = nvptx_stacks_acquire (ptx_dev, stack_size, teams * threads);
> >    void *fn_args[] = {tgt_vars, stacks, (void *) stack_size};
> >    size_t fn_args_size = sizeof fn_args;
> >    void *config[] = {
> > @@ -1944,7 +2021,8 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
> > 		       maybe_abort_msg);
> >    else if (r != CUDA_SUCCESS)
> >      GOMP_PLUGIN_fatal ("cuCtxSynchronize error: %s", cuda_error (r));
> > -  nvptx_stacks_free (stacks, teams * threads);
> > +
> > +  pthread_mutex_unlock (&ptx_dev->omp_stacks.lock);
> >  }
> 
> Do you need to hold the omp_stacks.lock across the entire offloading?
> Doesn't that serialize all offloading kernels to the same device?
> I mean, can't the lock be taken just shortly at the start to either
> acquire the cached stacks or allocate a fresh stack, and then at the
> end to put the stack back into the cache?

I think you're suggesting something like what Alexander mentioned -- a
pool of cached stacks blocks, in case the single, locked block is
contended. At present kernel launches are serialised on the target
anyway, so it's really a question of whether it's quicker for the
device to wait for the host to unlock the stacks block (i.e. a context
switch, for some value of "context switch"), or to allocate a new
stacks block. I think the numbers posted in the parent email show that
memory allocation is so slow that just waiting for the lock wins. I'm
wary of adding unnecessary complication, especially if it'll only be
exercised in already hard-to-debug cases (i.e. lots of threads)!
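
For concreteness, I understand the short-critical-section scheme to be
roughly the following (completely untested sketch: the helper names
nvptx_stacks_grab/nvptx_stacks_put_back and the omp_stacks.ptr/.size
fields are just placeholders of mine; only nvptx_stacks_alloc,
nvptx_stacks_free and omp_stacks.lock exist in the current patch):

/* Untested sketch only.  .ptr/.size and the helper names below are
   hypothetical, not part of the posted patch.  */

static void *
nvptx_stacks_grab (struct ptx_device *ptx_dev, size_t size, int num)
{
  void *stacks = NULL;

  /* Briefly take the lock: either steal the cached block...  */
  pthread_mutex_lock (&ptx_dev->omp_stacks.lock);
  if (ptx_dev->omp_stacks.ptr && ptx_dev->omp_stacks.size >= size * num)
    {
      stacks = ptx_dev->omp_stacks.ptr;
      ptx_dev->omp_stacks.ptr = NULL;
    }
  pthread_mutex_unlock (&ptx_dev->omp_stacks.lock);

  /* ...or, if it was absent or too small, allocate a fresh block
     outside the lock.  */
  if (!stacks)
    stacks = nvptx_stacks_alloc (size, num);

  return stacks;
}

static void
nvptx_stacks_put_back (struct ptx_device *ptx_dev, void *stacks,
		       size_t size, int num)
{
  /* Briefly take the lock again: cache the block if the slot is
     empty...  */
  pthread_mutex_lock (&ptx_dev->omp_stacks.lock);
  if (!ptx_dev->omp_stacks.ptr)
    {
      ptx_dev->omp_stacks.ptr = stacks;
      ptx_dev->omp_stacks.size = size * num;
      stacks = NULL;
    }
  pthread_mutex_unlock (&ptx_dev->omp_stacks.lock);

  /* ...otherwise another launch got there first, so free ours.  */
  if (stacks)
    nvptx_stacks_free (stacks, num);
}

That keeps the lock held only for a few pointer updates, but it still
pays for a device memory allocation and free whenever launches actually
overlap, which is exactly the slow path the cache is meant to avoid.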

Just ignoring the cache if it's "in use" (and doing an allocation and
free of a separate stacks block, as at present) is something I hadn't
quite considered. That might indeed work, but I'm not sure it'd be any
faster in practice.
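
Inside GOMP_OFFLOAD_run that might look something like this (again only
a sketch; the use of pthread_mutex_trylock and the own_cache flag are
assumptions of mine, not something in the patch):

  size_t stack_size = nvptx_stacks_size ();
  void *stacks;
  /* Try to grab the cached block; if another launch currently holds
     it, fall back to a throwaway allocation as the code does today.  */
  bool own_cache = pthread_mutex_trylock (&ptx_dev->omp_stacks.lock) == 0;
  if (own_cache)
    stacks = nvptx_stacks_acquire (ptx_dev, stack_size, teams * threads);
  else
    stacks = nvptx_stacks_alloc (stack_size, teams * threads);

  /* ...launch the kernel and wait for it, as before...  */

  if (own_cache)
    pthread_mutex_unlock (&ptx_dev->omp_stacks.lock);
  else
    nvptx_stacks_free (stacks, teams * threads);

But that only wins if contention is common enough that allocating is on
average cheaper than waiting for the lock, which the numbers so far
suggest it isn't.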

> Also, how will this caching interact with malloc etc. performed in
> target regions?  Shall we do the caching only if there is no other
> concurrent offloading to the device because the newlib malloc will
> not be able to figure out it could free this and let the host know it
> has freed it.

Does target-side memory allocation call back into the plugin's
GOMP_OFFLOAD_alloc? I'm not sure how that works. If it doesn't,
target-side memory allocation shouldn't be affected, I think?

Thanks,

Julian

Thread overview: 16+ messages
2020-10-26 14:14 Julian Brown
2020-10-26 14:26 ` Jakub Jelinek
2020-11-09 21:32   ` Alexander Monakov
2020-11-13 20:54     ` Julian Brown
2020-12-08  1:13       ` Julian Brown
2020-12-08 17:11         ` Alexander Monakov
2020-12-15 13:39           ` Julian Brown
2020-12-15 13:49             ` Jakub Jelinek
2020-12-15 16:49               ` Julian Brown [this message]
2020-12-15 17:00                 ` Jakub Jelinek
2020-12-15 23:16                   ` Julian Brown
2021-01-05 12:13                     ` Julian Brown
2021-01-05 15:32                       ` Jakub Jelinek
2020-10-27 13:17 ` Julian Brown
2020-10-28  7:25   ` Chung-Lin Tang
2020-10-28 11:32     ` Julian Brown
