Re: [RFC PATCH] Coalesce host to device transfers in libgomp

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Jakub Jelinek <jakub@redhat.com>
To: Cesar Philippidis <cesar@codesourcery.com>
Cc: Alexander Monakov <amonakov@ispras.ru>,
	       Thomas Schwinge <thomas@codesourcery.com>,
	       Martin Jambor <mjambor@suse.cz>,
	gcc-patches@gcc.gnu.org
Subject: Re: [RFC PATCH] Coalesce host to device transfers in libgomp
Date: Tue, 24 Oct 2017 16:02:00 -0000	[thread overview]
Message-ID: <20171024155926.GN14653@tucnak> (raw)
In-Reply-To: <51b85cc2-a282-bb76-a6ea-365a92262244@codesourcery.com>

On Tue, Oct 24, 2017 at 08:47:39AM -0700, Cesar Philippidis wrote:
> On 10/24/2017 02:55 AM, Jakub Jelinek wrote:
> 
> > Poeple from NVidia reported privately unexpected amount of host2dev
> > transfers for #pragma omp target*.
> 
> Did they mention which program they were testing?

No.  Just the nvprof counted counts from GCC and LLVM.

> > The following patch implements coalescing of transfers (only those that are
> > copied into the freshly allocated device buffer) into one or multiple larger
> > transfers.  The patch doesn't coalesce > 32KB transfers or transfers where
> > the gap is 4KB or more.  I guess it would be not too hard to do similar
> > coalescing for the dev2host transfers that are from a single device mapping,
> > though probably far less important than the more common host2dev transfers.
> 
> Why did you chose the 32KB and 4KB limits? I wonder if that would have
> any impact on firstprivate_int values. If this proves to be effective,
> it seems like we should be able to eliminate GOMP_MAP_FIRSTPRIVATE_INT
> altogether.

The thing is that this is a generic code, so it is hard to come up with
reasonable limits.  We could even have some limits e.g. in *devicep
if we get different needs for different offloading targets.

The 32KB and 4KB just come from some discussions with Alexander on IRC
that larger copies saturate the PCI and the overhead isn't significant, so
in that case copying e.g. megabyte into another memory and then to the
device would likely not be beneficial.

I'd prefer to keep GOMP_MAP_FIRSTPRIVATE_INT, I think it is a useful
optimization for the most common case, even if it is not 2 separate host2dev
copies for it compared to 1 for GOMP_MAP_FIRSTPRIVATE_INT, it is still extra
memory dereferences both on the host and on the target.

> > +struct gomp_map_cache
> > +{
> > +  void *buf;
> > +  struct target_mem_desc *tgt;
> > +  size_t *chunks;
> > +  long chunk_cnt;
> > +  long use_cnt;
> > +};
> > +
> 
> Maybe include a comment here stating that you want to restrict caching
> to 32KB with variables with no gaps larger than 4KB?

Sure.  Maybe even better to turn those for now into defines and add comments
to those.

> One other minor optimization, would be to change arguments to offloaded
> functions from a single struct to individual arguments. At least for
> nvptx, cuLaunchKernel accepts variable arguments for PTX kernels. There
> are two advantages of this. 1) At least with nvptx, nvptx_exec wouldn't
> need to reserve a block of device memory for struct argument. 2) This
> would eliminate one level of indirection for each offloaded argument
> (although SRA probably takes care of the majority of this already).

At least for OpenMP, we are now using a wrapper around the generated code
which sets stuff up, so not sure if that would be possible.  The wrapper
among other things sets up the soft-stack.  Not sure if it wouldn't be
possible to replace it with a magic call at the begining of OpenMP kernel
starts.

	Jakub

next prev parent reply	other threads:[~2017-10-24 15:59 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-24  9:57 Jakub Jelinek
2017-10-24 15:59 ` Cesar Philippidis
2017-10-24 16:02   ` Jakub Jelinek [this message]
2017-10-24 17:56     ` Alexander Monakov
2017-10-24 17:40 ` Alexander Monakov
2017-10-24 19:36   ` Jakub Jelinek
2017-10-25 12:03   ` Jakub Jelinek
2017-10-27 14:13     ` [PATCH] Implement omp async support for nvptx Tom de Vries
2017-10-30  7:25       ` Jakub Jelinek
2017-10-30 12:02         ` Tom de Vries
2017-10-30 13:52           ` Tom de Vries
2018-12-06 17:02     ` [RFC PATCH] Coalesce host to device transfers in libgomp Thomas Schwinge
2018-12-06 17:19       ` Jakub Jelinek
2018-12-06 17:54         ` Thomas Schwinge
2018-12-06 17:57           ` Jakub Jelinek
2018-12-09 12:53             ` Thomas Schwinge
2019-12-18 17:15     ` Thomas Schwinge
2019-05-23 14:40 ` Thomas Schwinge
2019-05-23 14:57   ` Jakub Jelinek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171024155926.GN14653@tucnak \
    --to=jakub@redhat.com \
    --cc=amonakov@ispras.ru \
    --cc=cesar@codesourcery.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=mjambor@suse.cz \
    --cc=thomas@codesourcery.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).