public inbox for gcc-patches@gcc.gnu.org
From: Thomas Schwinge <thomas@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Cc: Tom de Vries <tdevries@suse.de>
Subject: [og12] nvptx: Make 'nvptx_uniform_warp_check' fit for non-full-warp execution
Date: Fri, 20 Jan 2023 21:23:06 +0100
Message-ID: <87a62d2bx1.fsf@dem-tschwing-1.ger.mentorg.com>
In-Reply-To: <87a63ofrpf.fsf@euler.schwinge.homeip.net>

Hi!

On 2022-12-15T19:27:08+0100, I wrote:
> [...] I'd like to make 'nvptx_uniform_warp_check'
> fit for non-full-warp execution.  For example, to be able to execute such
> code in single-threaded 'cuLaunchKernel' for execution of global
> constructors/destructors, where those may, for example, call into nvptx
> target libraries compiled with '-mgomp' (thus, '-muniform-simt').
>
> OK to push (after proper testing, and with TODO markers adjusted/removed)
> the attached
> "nvptx: Make 'nvptx_uniform_warp_check' fit for non-full-warp execution"?

For now, I've pushed this, still with TODO markers, to the devel/omp/gcc-12
branch in commit d26a2a299392af330b3576b62d4eb6c81820be29
"nvptx: Make 'nvptx_uniform_warp_check' fit for non-full-warp execution",
see attached.


Regards
 Thomas



[-- Attachment #2: 0001-nvptx-Make-nvptx_uniform_warp_check-fit-for-non-full.patch --]
[-- Type: text/x-diff, Size: 7278 bytes --]

From d26a2a299392af330b3576b62d4eb6c81820be29 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Mon, 12 Dec 2022 22:05:37 +0100
Subject: [PATCH] nvptx: Make 'nvptx_uniform_warp_check' fit for non-full-warp
 execution

For example, this allows for '-muniform-simt' code to be executed
single-threaded, which currently fails (device-side 'trap'), as the 0xffffffff
mask isn't correct if not all 32 threads of a warp are active.  The same
issue/fix, I suppose but have not verified, would apply if we were to allow for
OpenACC 'vector_length' smaller than 32, for example for OpenACC 'serial'.

We use 'nvptx_uniform_warp_check' only for PTX ISA version less than 6.0.
Otherwise we're using 'nvptx_warpsync', which emits 'bar.warp.sync 0xffffffff',
which evidently appears to do the right thing.  (I've tested '-muniform-simt'
code executing single-threaded.)

	gcc/
	* config/nvptx/nvptx.md (nvptx_uniform_warp_check): Make fit for
	non-full-warp execution.
	gcc/testsuite/
	* gcc.target/nvptx/nvptx.exp
	(check_effective_target_default_ptx_isa_version_at_least_6_0):
	New.
	* gcc.target/nvptx/uniform-simt-5.c: New.
	libgomp/
	* plugin/plugin-nvptx.c (nvptx_exec): Assert what we know about
	'blockDimX'.
---
 gcc/ChangeLog.omp                             |  5 ++++
 gcc/config/nvptx/nvptx.md                     | 16 ++++++++++-
 gcc/testsuite/ChangeLog.omp                   |  7 +++++
 gcc/testsuite/gcc.target/nvptx/nvptx.exp      |  5 ++++
 .../gcc.target/nvptx/uniform-simt-5.c         | 28 +++++++++++++++++++
 libgomp/ChangeLog.omp                         |  3 ++
 libgomp/plugin/plugin-nvptx.c                 |  3 ++
 7 files changed, 66 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/nvptx/uniform-simt-5.c

diff --git a/gcc/ChangeLog.omp b/gcc/ChangeLog.omp
index 2d4b7513413..382cd5c80c2 100644
--- a/gcc/ChangeLog.omp
+++ b/gcc/ChangeLog.omp
@@ -1,3 +1,8 @@
+2023-01-20  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* config/nvptx/nvptx.md (nvptx_uniform_warp_check): Make fit for
+	non-full-warp execution.
+
 2023-01-19  Tobias Burnus  <tobias@codesourcery.com>
 
 	Backported from master:
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 04c150b8982..d27126556ce 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -2321,10 +2321,24 @@
       "{",
       "\\t"		  ".reg.b32"	    "\\t" "%%r_act;",
       "%.\\t"		  "vote.ballot.b32" "\\t" "%%r_act,1;",
+      /* For '%r_exp', we essentially need 'activemask.b32', but that is "Introduced in PTX ISA version 6.2", and this code here is used only 'if (!TARGET_PTX_6_0)'.  Thus, emulate it.
+         TODO Is that actually correct?  Wouldn't 'activemask.b32' rather replace our 'vote.ballot.b32' given that it registers the *currently active threads*?  */
+      /* Compute the "membermask" of all threads of the warp that are expected to be converged here.
+      	 For OpenACC, '%ntid.x' is 'vector_length', which per 'nvptx_goacc_validate_dims' always is a multiple of 32.
+	 For OpenMP, '%ntid.x' always is 32.
+      	 Thus, this is typically 0xffffffff, but it is also correct for the case that not all 32 threads of the warp have been launched.
+	 This assumes that lane IDs are assigned in ascending order.  */
+      //TODO Can we rely on '1 << 32 == 0', and '0 - 1 = 0xffffffff'?
+      //TODO https://developer.nvidia.com/blog/using-cuda-warp-level-primitives/
+      //TODO https://stackoverflow.com/questions/54055195/activemask-vs-ballot-sync
+      "\\t"		  ".reg.b32"	    "\\t" "%%r_exp;",
+      "%.\\t"		  "mov.b32"	    "\\t" "%%r_exp, %%ntid.x;",
+      "%.\\t"		  "shl.b32"	    "\\t" "%%r_exp, 1, %%r_exp;",
+      "%.\\t"		  "sub.u32"	    "\\t" "%%r_exp, %%r_exp, 1;",
       "\\t"		  ".reg.pred"	    "\\t" "%%r_do_abort;",
       "\\t"		  "mov.pred"	    "\\t" "%%r_do_abort,0;",
       "%.\\t"		  "setp.ne.b32"	    "\\t" "%%r_do_abort,%%r_act,"
-						  "0xffffffff;",
+						  "%%r_exp;",
       "@ %%r_do_abort\\t" "trap;",
       "@ %%r_do_abort\\t" "exit;",
       "}",
diff --git a/gcc/testsuite/ChangeLog.omp b/gcc/testsuite/ChangeLog.omp
index d4b483b124b..7339bf41482 100644
--- a/gcc/testsuite/ChangeLog.omp
+++ b/gcc/testsuite/ChangeLog.omp
@@ -1,3 +1,10 @@
+2023-01-20  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* gcc.target/nvptx/nvptx.exp
+	(check_effective_target_default_ptx_isa_version_at_least_6_0):
+	New.
+	* gcc.target/nvptx/uniform-simt-5.c: New.
+
 2023-01-16  Tobias Burnus  <tobias@codesourcery.com>
 
 	Backported from master:
diff --git a/gcc/testsuite/gcc.target/nvptx/nvptx.exp b/gcc/testsuite/gcc.target/nvptx/nvptx.exp
index e9622ae7aaa..17e03daeb7e 100644
--- a/gcc/testsuite/gcc.target/nvptx/nvptx.exp
+++ b/gcc/testsuite/gcc.target/nvptx/nvptx.exp
@@ -49,6 +49,11 @@ proc check_effective_target_default_ptx_isa_version_at_least { major minor } {
     return $res
 }
 
+# Return 1 if code by default compiles for at least PTX ISA version 6.0.
+proc check_effective_target_default_ptx_isa_version_at_least_6_0 { } {
+    return [check_effective_target_default_ptx_isa_version_at_least 6 0]
+}
+
 # Return 1 if code with PTX ISA version major.minor or higher can be run.
 proc check_effective_target_runtime_ptx_isa_version_at_least { major minor } {
     set name runtime_ptx_isa_version_${major}_${minor}
diff --git a/gcc/testsuite/gcc.target/nvptx/uniform-simt-5.c b/gcc/testsuite/gcc.target/nvptx/uniform-simt-5.c
new file mode 100644
index 00000000000..b2f78198db2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/uniform-simt-5.c
@@ -0,0 +1,28 @@
+/* Verify that '-muniform-simt' code may be executed single-threaded.
+
+   { dg-do run }
+   { dg-options {-save-temps -O2 -muniform-simt} } */
+
+enum memmodel
+{
+  MEMMODEL_RELAXED = 0
+};
+
+unsigned long long int v64;
+unsigned long long int *p64 = &v64;
+
+int
+main()
+{
+  /* Trigger uniform-SIMT processing.  */
+  __atomic_fetch_add (p64, v64, MEMMODEL_RELAXED);
+
+  return 0;
+}
+
+/* Per 'omp_simt_exit':
+     - 'nvptx_warpsync'
+       { dg-final { scan-assembler-times {bar\.warp\.sync\t0xffffffff;} 1 { target default_ptx_isa_version_at_least_6_0 } } }
+     - 'nvptx_uniform_warp_check'
+       { dg-final { scan-assembler-times {vote\.ballot\.b32\t%r_act,1;} 1 { target { ! default_ptx_isa_version_at_least_6_0 } } } }
+*/
diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index 33aa4b01350..4447b74a2ab 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,5 +1,8 @@
 2023-01-20  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* plugin/plugin-nvptx.c (nvptx_exec): Assert what we know about
+	'blockDimX'.
+
 	PR target/85463
 	* config/nvptx/error.c (exit): Don't override.
 	* testsuite/libgomp.oacc-fortran/error_stop-1.f: Update.
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 4a1b9f579e4..b2fabc61cc8 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -998,6 +998,9 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
 					    api_info);
     }
 
+  /* Per 'nvptx_goacc_validate_dims'.  */
+  assert (dims[GOMP_DIM_VECTOR] % warp_size == 0);
+
   kargs[0] = &dp;
   CUDA_CALL_ASSERT (cuLaunchKernel, function,
 		    dims[GOMP_DIM_GANG], 1, 1,
-- 
2.25.1


Thread overview (7+ messages):
2022-02-01 18:31 [committed][nvptx] Add uniform_warp_check insn Tom de Vries
2022-09-14  9:41 ` Thomas Schwinge
2022-09-14  9:56   ` Tom de Vries
2022-12-15 18:27 ` nvptx: Make 'nvptx_uniform_warp_check' fit for non-full-warp execution (was: [committed][nvptx] Add uniform_warp_check insn) Thomas Schwinge
2023-01-11 11:37   ` [PING] " Thomas Schwinge
2023-01-20 20:23   ` Thomas Schwinge [this message]
2024-06-04 19:53   ` nvptx: Make 'nvptx_uniform_warp_check' fit for non-full-warp execution, via 'vote.all.pred' (was: nvptx: Make 'nvptx_uniform_warp_check' fit for non-full-warp execution (was: [committed][nvptx] Add uniform_warp_check insn)) Thomas Schwinge
