Re: [PATCH] nvptx: Cache stacks block for OpenMP kernel launch

public inbox for gcc-patches@gcc.gnu.org

From: Jakub Jelinek <jakub@redhat.com>
To: Julian Brown <julian@codesourcery.com>
Cc: gcc-patches@gcc.gnu.org,
	Thomas Schwinge <thomas@codesourcery.com>,
	Tom de Vries <tdevries@suse.de>,
	Alexander Monakov <amonakov@ispras.ru>
Subject: Re: [PATCH] nvptx: Cache stacks block for OpenMP kernel launch
Date: Mon, 26 Oct 2020 15:26:34 +0100
Message-ID: <20201026142634.GI7080@tucnak>
In-Reply-To: <20201026141448.109041-1-julian@codesourcery.com>

On Mon, Oct 26, 2020 at 07:14:48AM -0700, Julian Brown wrote:
> This patch adds caching for the stack block allocated for offloaded
> OpenMP kernel launches on NVPTX. This is a performance optimisation --
> we observed an average 11% or so performance improvement with this patch
> across a set of accelerated GPU benchmarks on one machine (results vary
> according to individual benchmark and with hardware used).
> 
> A given kernel launch will reuse the stack block from the previous launch
> if it is large enough, else it is freed and reallocated. A slight caveat
> is that memory will not be freed until the device is closed, so e.g. if
> code is using highly variable launch geometries and large amounts of
> GPU RAM, you might run out of resources slightly quicker with this patch.
> 
> Another way this patch gains performance is by omitting the
> synchronisation at the end of an OpenMP offload kernel launch -- it's
> safe for the GPU and CPU to continue executing in parallel at that point,
> because e.g. copies-back from the device will be synchronised properly
> with kernel completion anyway.
> 
> In turn, the last part necessitates a change to the way "(perhaps abort
> was called)" errors are detected and reported.
> 
> Tested with offloading to NVPTX. OK for mainline?

I'm afraid I know neither the plugin nor CUDA well enough to review this
properly (therefore I'd like to hear from Thomas, Tom and/or Alexander).
Anyway, just two questions.  Wouldn't it make sense to add some upper bound
above which the stacks wouldn't be cached?  That way it would cache most of
the time for normal programs, but one really excessive kernel followed by
many normal ones wouldn't result in memory allocation failures.
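
To illustrate the upper bound idea, here is a rough sketch, not code from the
patch.  The names stacks_cache, stacks_get, stacks_put and STACKS_CACHE_LIMIT
are made up for illustration, the 1 MiB limit is arbitrary, and malloc/free
stand in for whatever device allocator the plugin actually uses:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cap: a block larger than this is not kept after use.  */
#define STACKS_CACHE_LIMIT ((size_t) 1 << 20)

struct stacks_cache
{
  void *ptr;    /* Cached block, or NULL if nothing is cached.  */
  size_t size;  /* Size of the cached block in bytes.  */
};

/* Return a block of at least SIZE bytes.  Reuse the cached block if it is
   large enough, otherwise free it and allocate a new one.  */
static void *
stacks_get (struct stacks_cache *c, size_t size)
{
  if (c->ptr && c->size >= size)
    return c->ptr;

  free (c->ptr);
  c->ptr = malloc (size);
  c->size = c->ptr ? size : 0;
  return c->ptr;
}

/* Called once the kernel using the block has finished.  Blocks above the
   limit are dropped instead of staying cached until device close.  */
static void
stacks_put (struct stacks_cache *c)
{
  if (c->size > STACKS_CACHE_LIMIT)
    {
      free (c->ptr);
      c->ptr = NULL;
      c->size = 0;
    }
}
```

With something like this, a single oversized launch costs one extra
allocation on the next launch, while ordinary launches keep hitting the
cache.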

And, in which context are callbacks registered with cuStreamAddCallback run?
E.g. if it is inside an asynchronous interrupt, using locking in there might
not be the best thing to do.
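
Whatever the answer turns out to be, one way to sidestep the concern would be
to keep the callback itself lock-free and leave the real work to the launching
thread.  The following is only a sketch of that pattern under my own
assumptions, not what the patch does: dev_state, stacks_release_cb and
stacks_try_claim are hypothetical names, the callback signature is simplified
to a single user-data pointer, and no CUDA calls are involved:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-device state; only the flag matters here.  */
struct dev_state
{
  atomic_bool stacks_idle;  /* True once the stacks block may be reused.  */
};

/* Stand-in for the stream callback: it takes no lock and only publishes
   that the stacks block is no longer in use.  */
static void
stacks_release_cb (void *data)
{
  struct dev_state *d = data;
  atomic_store_explicit (&d->stacks_idle, true, memory_order_release);
}

/* Called on the launching thread: claim the block if it is idle.
   Returns true when the caller now owns it.  */
static bool
stacks_try_claim (struct dev_state *d)
{
  bool expected = true;
  return atomic_compare_exchange_strong (&d->stacks_idle, &expected, false);
}
```

The launching thread would call stacks_try_claim before reusing the block
and fall back to waiting or allocating when it returns false.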

> -  r = CUDA_CALL_NOCHECK (cuCtxSynchronize, );
> -  if (r == CUDA_ERROR_LAUNCH_FAILED)
> -    GOMP_PLUGIN_fatal ("cuCtxSynchronize error: %s %s\n", cuda_error (r),
> -		       maybe_abort_msg);
> -  else if (r != CUDA_SUCCESS)
> -    GOMP_PLUGIN_fatal ("cuCtxSynchronize error: %s", cuda_error (r));
> -  nvptx_stacks_free (stacks, teams * threads);
> +  CUDA_CALL_ASSERT (cuStreamAddCallback, NULL, nvptx_stacks_release,
> +		    (void *) ptx_dev, 0);
>  }
>  
>  /* TODO: Implement GOMP_OFFLOAD_async_run. */
> -- 
> 2.28.0

	Jakub
