public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Julian Brown <julian@codesourcery.com>
To: Jakub Jelinek <jakub@redhat.com>
Cc: Bernd Schmidt <bernds@codesourcery.com>,
	Thomas Schwinge	<thomas@codesourcery.com>,
	<gcc-patches@gcc.gnu.org>,
	Nathan Sidwell	<nathan@codesourcery.com>
Subject: Re: [gomp4] Preserve NVPTX "reconvergence" points
Date: Mon, 22 Jun 2015 17:54:00 -0000	[thread overview]
Message-ID: <20150622184810.76fba1c2@octopus> (raw)
In-Reply-To: <20150622142456.GZ10247@tucnak.redhat.com>

On Mon, 22 Jun 2015 16:24:56 +0200
Jakub Jelinek <jakub@redhat.com> wrote:

> On Mon, Jun 22, 2015 at 02:55:49PM +0100, Julian Brown wrote:
> > One problem is that (at least on the GPU hardware we've considered
> > so far) we're somewhat constrained in how much control we have over
> > how the underlying hardware executes code: it's possible to draw up
> > a scheme where OpenACC source-level control-flow semantics are
> > reflected directly in the PTX assembly output (e.g. to say "all
> > threads in a CTA/warp will be coherent after such-and-such a
> > loop"), and lowering OpenACC directives quite early seems to make
> > that relatively tractable. (Even if the resulting code is
> > relatively un-optimisable due to the abnormal edges inserted to
> > make sure that the CFG doesn't become "ill-formed".)
> > 
> > If arbitrary optimisations are done between OMP-lowering time and
> > somewhere around vectorisation (say), it's less clear if that
> > correspondence can be maintained. Say if the code executed by half
> > the threads in a warp becomes physically separated from the code
> > executed by the other half of the threads in a warp due to some loop
> > optimisation, we can no longer easily determine where that warp will
> > reconverge, and certain other operations (relying on coherent warps
> > -- e.g. CTA synchronisation) become impossible. A similar issue
> > exists for warps within a CTA.
> > 
> > So, essentially -- I don't know how "late" loop lowering would
> > interact with:
> > 
> > (a) Maintaining a CFG that will work with PTX.
> > 
> > (b) Predication for worker-single and/or vector-single modes
> > (actually all currently-proposed schemes have problems with proper
> > representation of data-dependencies for variables and
> > compiler-generated temporaries between predicated regions.)
> 
> I don't understand why lowering the way you suggest helps here at all.
> In the proposed scheme, you essentially have whole function
> in e.g. worker-single or vector-single mode, which you need to be
> able to handle properly in any case, because users can write such
> routines themselves.

In vector-single or worker-single mode, divergence of threads within a
warp or a CTA is controlled by broadcasting the controlling expression
of conditional branches to the set of "inactive" threads, so each of
those follows along with the active thread. So you only get
potentially-problematic thread divergence when workers or vectors are
operating in partitioned mode.

So, for instance, a made-up example:

#pragma acc parallel
{
  #pragma acc loop gang
  for (i = 0; i < N; i++))
  {
    #pragma acc loop worker
    for (j = 0; j < M; j++)
    {
      if (j < M / 2)
        /* stmt 1 */
      else
        /* stmt 2 */
    }

    /* reconvergence point: thread barrier */

    [...]
  }
}

Here "stmt 1" and "stmt 2" execute in worker-partitioned, vector-single
mode. With "early lowering", the reconvergence point can be
inserted at the end of the loop, and abnormal edges (etc.) can be used
to ensure that the CFG does not get changed in such a way that there is
no longer a unique point at which the loop threads reconverge.

With "late lowering", it's no longer obvious to me if that can still be
done.

Julian

  parent reply	other threads:[~2015-06-22 17:48 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-28 14:20 Julian Brown
2015-05-28 14:59 ` Jakub Jelinek
2015-05-28 15:14   ` Thomas Schwinge
2015-05-28 15:28     ` Jakub Jelinek
2015-06-19 10:44       ` Bernd Schmidt
2015-06-19 12:32         ` Jakub Jelinek
2015-06-19 13:07           ` Bernd Schmidt
2015-06-19 14:10             ` Jakub Jelinek
2015-06-22 14:04               ` Bernd Schmidt
2015-06-22 14:25                 ` Jakub Jelinek
2015-06-24 13:37               ` Bernd Schmidt
2015-06-24 14:08                 ` Jakub Jelinek
2015-06-22 14:00           ` Julian Brown
2015-06-22 14:36             ` Jakub Jelinek
2015-06-22 15:18               ` Julian Brown
2015-06-22 15:33               ` Bernd Schmidt
2015-06-22 16:13                 ` Nathan Sidwell
2015-06-22 16:27                   ` Jakub Jelinek
2015-06-22 16:35                     ` Nathan Sidwell
2015-06-22 17:54               ` Julian Brown [this message]
2015-06-22 18:48                 ` Jakub Jelinek
2015-05-28 15:02 ` Richard Biener
2015-06-03 11:47   ` Julian Brown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150622184810.76fba1c2@octopus \
    --to=julian@codesourcery.com \
    --cc=bernds@codesourcery.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=nathan@codesourcery.com \
    --cc=thomas@codesourcery.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).