[ptx] partitioning optimization

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Nathan Sidwell <nathan@acm.org>
To: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: [ptx] partitioning optimization
Date: Tue, 10 Nov 2015 22:34:00 -0000	[thread overview]
Message-ID: <564270D6.6090303@acm.org> (raw)

[-- Attachment #1: Type: text/plain, Size: 633 bytes --]

I've committed this patch to trunk.  It implements a partitioning optimization 
for a loop partitioned over both vector and worker axes.  We can elide the inner 
vector partitioning state propagation, if there are no intervening instructions 
in the worker-partitioned outer loop other than the forking and joining.  We 
simply execute the worker propagation on all vectors.

I've been unable to introduce a testcase for this. The difficulty is we want to 
check an rtl dump from the acceleration compiler, and there doesn't  appear to 
be existing machinery for that in the testsuite.  Perhaps something to be added 
later?

nathan

[-- Attachment #2: trunk-ptx-opt-1110.patch --]
[-- Type: text/x-patch, Size: 4574 bytes --]

2015-11-10  Nathan Sidwell  <nathan@codesourcery.com>

	* config/nvptx/nvptx.opt (moptimize): New flag.
	* config/nvptx/nvptx.c (nvptx_option_override): Set nvptx_optimize
	default.
	(nvptx_optimize_inner): New.
	(nvptx_process_pars): Call it when optimizing.
	* doc/invoke.texi (Nvidia PTX Options): Document -moptimize.

Index: config/nvptx/nvptx.c
===================================================================
--- config/nvptx/nvptx.c	(revision 230112)
+++ config/nvptx/nvptx.c	(working copy)
@@ -137,6 +137,9 @@ nvptx_option_override (void)
   write_symbols = NO_DEBUG;
   debug_info_level = DINFO_LEVEL_NONE;
 
+  if (nvptx_optimize < 0)
+    nvptx_optimize = optimize > 0;
+
   declared_fndecls_htab = hash_table<tree_hasher>::create_ggc (17);
   needed_fndecls_htab = hash_table<tree_hasher>::create_ggc (17);
   declared_libfuncs_htab
@@ -2942,6 +2945,69 @@ nvptx_skip_par (unsigned mask, parallel
   nvptx_single (mask, par->forked_block, pre_tail);
 }
 
+/* If PAR has a single inner parallel and PAR itself only contains
+   empty entry and exit blocks, swallow the inner PAR.  */
+
+static void
+nvptx_optimize_inner (parallel *par)
+{
+  parallel *inner = par->inner;
+
+  /* We mustn't be the outer dummy par.  */
+  if (!par->mask)
+    return;
+
+  /* We must have a single inner par.  */
+  if (!inner || inner->next)
+    return;
+
+  /* We must only contain 2 blocks ourselves -- the head and tail of
+     the inner par.  */
+  if (par->blocks.length () != 2)
+    return;
+
+  /* We must be disjoint partitioning.  As we only have vector and
+     worker partitioning, this is sufficient to guarantee the pars
+     have adjacent partitioning.  */
+  if ((par->mask & inner->mask) & (GOMP_DIM_MASK (GOMP_DIM_MAX) - 1))
+    /* This indicates malformed code generation.  */
+    return;
+
+  /* The outer forked insn should be immediately followed by the inner
+     fork insn.  */
+  rtx_insn *forked = par->forked_insn;
+  rtx_insn *fork = BB_END (par->forked_block);
+
+  if (NEXT_INSN (forked) != fork)
+    return;
+  gcc_checking_assert (recog_memoized (fork) == CODE_FOR_nvptx_fork);
+
+  /* The outer joining insn must immediately follow the inner join
+     insn.  */
+  rtx_insn *joining = par->joining_insn;
+  rtx_insn *join = inner->join_insn;
+  if (NEXT_INSN (join) != joining)
+    return;
+
+  /* Preconditions met.  Swallow the inner par.  */
+  if (dump_file)
+    fprintf (dump_file, "Merging loop %x [%d,%d] into %x [%d,%d]\n",
+	     inner->mask, inner->forked_block->index,
+	     inner->join_block->index,
+	     par->mask, par->forked_block->index, par->join_block->index);
+
+  par->mask |= inner->mask & (GOMP_DIM_MASK (GOMP_DIM_MAX) - 1);
+
+  par->blocks.reserve (inner->blocks.length ());
+  while (inner->blocks.length ())
+    par->blocks.quick_push (inner->blocks.pop ());
+
+  par->inner = inner->inner;
+  inner->inner = NULL;
+
+  delete inner;
+}
+
 /* Process the parallel PAR and all its contained
    parallels.  We do everything but the neutering.  Return mask of
    partitioned modes used within this parallel.  */
@@ -2949,6 +3015,9 @@ nvptx_skip_par (unsigned mask, parallel
 static unsigned
 nvptx_process_pars (parallel *par)
 {
+  if (nvptx_optimize)
+    nvptx_optimize_inner (par);
+  
   unsigned inner_mask = par->mask;
 
   /* Do the inner parallels first.  */
Index: config/nvptx/nvptx.opt
===================================================================
--- config/nvptx/nvptx.opt	(revision 230112)
+++ config/nvptx/nvptx.opt	(working copy)
@@ -28,3 +28,7 @@ Generate code for a 64-bit ABI.
 mmainkernel
 Target Report RejectNegative
 Link in code for a __main kernel.
+
+moptimize
+Target Report Var(nvptx_optimize) Init(-1)
+Optimize partition neutering
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 230112)
+++ doc/invoke.texi	(working copy)
@@ -873,7 +873,7 @@ Objective-C and Objective-C++ Dialects}.
 -march=@var{arch} -mbmx -mno-bmx -mcdx -mno-cdx}
 
 @emph{Nvidia PTX Options}
-@gccoptlist{-m32 -m64 -mmainkernel}
+@gccoptlist{-m32 -m64 -mmainkernel -moptimize}
 
 @emph{PDP-11 Options}
 @gccoptlist{-mfpu  -msoft-float  -mac0  -mno-ac0  -m40  -m45  -m10 @gol
@@ -18960,6 +18960,11 @@ Generate code for 32-bit or 64-bit ABI.
 Link in code for a __main kernel.  This is for stand-alone instead of
 offloading execution.
 
+@item -moptimize
+@opindex moptimize
+Apply partitioned execution optimizations.  This is the default when any
+level of optimization is selected.
+
 @end table
 
 @node PDP-11 Options

next prev        reply	other threads:[~2015-11-10 22:34 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-26 12:33 [gomp4] loop partition optimization Nathan Sidwell
2015-11-10 22:34 ` Nathan Sidwell [this message]
2015-11-10 22:45   ` [ptx] partitioning optimization Ilya Verbin
2015-11-11 13:37     ` Nathan Sidwell
2015-11-11 12:06   ` Bernd Schmidt
2015-11-11 13:59     ` Nathan Sidwell
2015-11-11 14:19       ` Bernd Schmidt
2015-11-13 20:07         ` Nathan Sidwell
2015-11-13 20:22           ` Bernd Schmidt
2015-11-23  7:47             ` Thomas Schwinge
2015-11-23  8:46               ` Jakub Jelinek
2015-11-11 17:16       ` Thomas Schwinge

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=564270D6.6090303@acm.org \
    --to=nathan@acm.org \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).