[PATCH] openmp: fix max_vf setting for amdgcn offloading

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] openmp: fix max_vf setting for amdgcn offloading
@ 2022-07-12 14:16 Andrew Stubbs
  2022-08-17 14:41 ` Jakub Jelinek
  2022-10-28  7:58 ` Thomas Schwinge
  0 siblings, 2 replies; 3+ messages in thread
From: Andrew Stubbs @ 2022-07-12 14:16 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1880 bytes --]

This patch ensures that the maximum vectorization factor used to set the 
"safelen" attribute on "omp simd" constructs is suitable for all the 
configured offload devices.

Right now it makes the proper adjustment for NVPTX, but otherwise just 
uses a value suitable for the host system (always x86_64 in the case of 
amdgcn).  This typically ends up being 16 where 64 is the minimum for 
vectorization to work properly on GCN.

There is a potential problem that one "safelen" must be set for *all* 
offload devices, which means it can't be perfect for all devices. 
However I believe that too big is always OK (at least for powers of 
two?) whereas too small is not OK, so this code always selects the 
largest value of max_vf, regardless of where it comes from.

The existing target VF function, omp_max_simt_vf, is tangled up with the 
notion of whether SIMT is available or not, so I couldn't add amdgcn in 
there. It's tempting to have omp_max_vf do some kind of autodetect what 
VF to choose, but the current implementation in omp-general.cc doesn't 
have access to the context in a convenient way, and nor do all the 
callers, so I couldn't easily do that. Instead, I have opted to add a 
new function, omp_max_simd_vf, which can check for the presence of amdgcn.

While reviewing the callers of omp_max_vf I found one other case that 
looks like it ought to be tuned for the device, not just the host. In 
that case it's not clear how to achieve that and in fact, at least on 
x86_64, the way it is coded the actual value from omp_max_vf is always 
ignored in favour of a much larger "minimum", so I have added a comment 
for the next person to touch that spot and left it alone.

This change gives a 10x performance improvement on the BabelStream "dot" 
benchmark on amdgcn and is not harmful on nvptx.

OK for mainline?

I will commit a backport to OG12 shortly.

Andrew

[-- Attachment #2: 220712-max_vf.patch --]
[-- Type: text/plain, Size: 5483 bytes --]

openmp: fix max_vf setting for amdgcn offloading

Ensure that the "max_vf" figure used for the "safelen" attribute is large
enough for the largest configured offload device.

This change gives ~10x speed improvement on the Bablestream "dot" benchmark for
AMD GCN.

gcc/ChangeLog:

	* gimple-loop-versioning.cc (loop_versioning::loop_versioning): Add
	comment.
	* omp-general.cc (omp_max_simd_vf): New function.
	* omp-general.h (omp_max_simd_vf): New prototype.
	* omp-low.cc (lower_rec_simd_input_clauses): Select largest from
	  omp_max_vf, omp_max_simt_vf, and omp_max_simd_vf.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp
	(check_effective_target_amdgcn_offloading_enabled): New.
	(check_effective_target_nvptx_offloading_enabled): New.
	* gcc.dg/gomp/target-vf.c: New test.

diff --git a/gcc/gimple-loop-versioning.cc b/gcc/gimple-loop-versioning.cc
index 6bcf6eba691..e908c27fc44 100644
--- a/gcc/gimple-loop-versioning.cc
+++ b/gcc/gimple-loop-versioning.cc
@@ -555,7 +555,10 @@ loop_versioning::loop_versioning (function *fn)
      unvectorizable code, since it is the largest size that can be
      handled efficiently by scalar code.  omp_max_vf calculates the
      maximum number of bytes in a vector, when such a value is relevant
-     to loop optimization.  */
+     to loop optimization.
+     FIXME: this probably needs to use omp_max_simd_vf when in a target
+     region, but how to tell? (And MAX_FIXED_MODE_SIZE is large enough that
+     it doesn't actually matter.)  */
   m_maximum_scale = estimated_poly_value (omp_max_vf ());
   m_maximum_scale = MAX (m_maximum_scale, MAX_FIXED_MODE_SIZE);
 }
diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index a406c578f33..8c6fcebc4b3 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -994,6 +994,24 @@ omp_max_simt_vf (void)
   return 0;
 }
 
+/* Return maximum SIMD width if offloading may target SIMD hardware.  */
+
+int
+omp_max_simd_vf (void)
+{
+  if (!optimize)
+    return 0;
+  if (ENABLE_OFFLOADING)
+    for (const char *c = getenv ("OFFLOAD_TARGET_NAMES"); c;)
+      {
+	if (startswith (c, "amdgcn"))
+	  return 64;
+	else if ((c = strchr (c, ':')))
+	  c++;
+      }
+  return 0;
+}
+
 /* Store the construct selectors as tree codes from last to first,
    return their number.  */
 
diff --git a/gcc/omp-general.h b/gcc/omp-general.h
index 74e90e1a71a..410343e45fa 100644
--- a/gcc/omp-general.h
+++ b/gcc/omp-general.h
@@ -104,6 +104,7 @@ extern gimple *omp_build_barrier (tree lhs);
 extern tree find_combined_omp_for (tree *, int *, void *);
 extern poly_uint64 omp_max_vf (void);
 extern int omp_max_simt_vf (void);
+extern int omp_max_simd_vf (void);
 extern int omp_constructor_traits_to_codes (tree, enum tree_code *);
 extern tree omp_check_context_selector (location_t loc, tree ctx);
 extern void omp_mark_declare_variant (location_t loc, tree variant,
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index d73c165f029..1a9a509adb9 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -4646,7 +4646,14 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
 {
   if (known_eq (sctx->max_vf, 0U))
     {
-      sctx->max_vf = sctx->is_simt ? omp_max_simt_vf () : omp_max_vf ();
+      /* If we are compiling for multiple devices choose the largest VF.  */
+      sctx->max_vf = omp_max_vf ();
+      if (omp_maybe_offloaded_ctx (ctx))
+	{
+	  if (sctx->is_simt)
+	    sctx->max_vf = ordered_max (sctx->max_vf, omp_max_simt_vf ());
+	  sctx->max_vf = ordered_max (sctx->max_vf, omp_max_simd_vf ());
+	}
       if (maybe_gt (sctx->max_vf, 1U))
 	{
 	  tree c = omp_find_clause (gimple_omp_for_clauses (ctx->stmt),
diff --git a/gcc/testsuite/gcc.dg/gomp/target-vf.c b/gcc/testsuite/gcc.dg/gomp/target-vf.c
new file mode 100644
index 00000000000..14cea45e53c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/target-vf.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-fopenmp -O2 -fdump-tree-omplower" } */ 
+
+/* Ensure that the omp_max_vf, omp_max_simt_vf, and omp_max_simd_vf are working
+   properly to set the OpenMP vectorization factor for the offload target, and
+   not just for the host.  */
+
+float
+foo (float * __restrict x, float * __restrict y)
+{
+  float sum = 0.0;
+
+#pragma omp target teams distribute parallel for simd map(tofrom: sum) reduction(+:sum)
+  for (int i=0; i<1024; i++)
+    sum += x[i] * y[i];
+
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump  "safelen\\(64\\)" "omplower" { target amdgcn_offloading_enabled } } } */
+/* { dg-final { scan-tree-dump  "safelen\\(32\\)" "omplower" { target { { nvptx_offloading_enabled } && { ! amdgcn_offloading_enabled } } } } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 4ed7b25b9a4..363354be461 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1025,6 +1025,16 @@ proc check_effective_target_offloading_enabled {} {
     return [check_configured_with "--enable-offload-targets"]
 }
 
+# Return 1 if compiled with --enable-offload-targets=amdgcn
+proc check_effective_target_amdgcn_offloading_enabled {} {
+    return [check_configured_with {--enable-offload-targets=[^ ]*amdgcn}]
+}
+
+# Return 1 if compiled with --enable-offload-targets=amdgcn
+proc check_effective_target_nvptx_offloading_enabled {} {
+    return [check_configured_with {--enable-offload-targets=[^ ]*nvptx}]
+}
+
 # Return 1 if compilation with -fopenacc is error-free for trivial
 # code, 0 otherwise.
 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] openmp: fix max_vf setting for amdgcn offloading
  2022-07-12 14:16 [PATCH] openmp: fix max_vf setting for amdgcn offloading Andrew Stubbs
@ 2022-08-17 14:41 ` Jakub Jelinek
  2022-10-28  7:58 ` Thomas Schwinge
  1 sibling, 0 replies; 3+ messages in thread
From: Jakub Jelinek @ 2022-08-17 14:41 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches

On Tue, Jul 12, 2022 at 03:16:35PM +0100, Andrew Stubbs wrote:
> --- a/gcc/gimple-loop-versioning.cc
> +++ b/gcc/gimple-loop-versioning.cc
> @@ -555,7 +555,10 @@ loop_versioning::loop_versioning (function *fn)
>       unvectorizable code, since it is the largest size that can be
>       handled efficiently by scalar code.  omp_max_vf calculates the
>       maximum number of bytes in a vector, when such a value is relevant
> -     to loop optimization.  */
> +     to loop optimization.
> +     FIXME: this probably needs to use omp_max_simd_vf when in a target
> +     region, but how to tell? (And MAX_FIXED_MODE_SIZE is large enough that
> +     it doesn't actually matter.)  */
>    m_maximum_scale = estimated_poly_value (omp_max_vf ());
>    m_maximum_scale = MAX (m_maximum_scale, MAX_FIXED_MODE_SIZE);

I think this shouldn't have the comment added, the use here actually isn't
much OpenMP related, it just uses the function because it implements
what it wants.

> --- a/gcc/omp-general.cc
> +++ b/gcc/omp-general.cc
> @@ -994,6 +994,24 @@ omp_max_simt_vf (void)
>    return 0;
>  }
>  
> +/* Return maximum SIMD width if offloading may target SIMD hardware.  */
> +
> +int
> +omp_max_simd_vf (void)

The name is just confusing.
omp_max_vf is about the SIMD maximum VF, so if you really want,
rename omp_max_vf to omp_max_simd_vf.

For the offloading related stuff, IMHO either we put it into
that single omp-general.cc function and add a bool argument to it
whether it is or might be in offloading region (ordered_maximum
from the returned value and the offloading one, but only after the
initialy return 1; conditions and adjust callers), or have this
separate function, but then IMHO the if (!optimize) return 0;
initial test should be
  if (!optimize
      || optimize_debug
      || !flag_tree_loop_optimize
      || (!flag_tree_loop_vectorize
          && OPTION_SET_P (flag_tree_loop_vectorize)))
    return 1;
because without that nothing is vectorized, on host nor on offloading
targets, and the function should be called omp_max_target_vf or
omp_max_target_simd_vf.

> +{
> +  if (!optimize)
> +    return 0;

> --- a/gcc/omp-low.cc
> +++ b/gcc/omp-low.cc
> @@ -4646,7 +4646,14 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
>  {
>    if (known_eq (sctx->max_vf, 0U))
>      {
> -      sctx->max_vf = sctx->is_simt ? omp_max_simt_vf () : omp_max_vf ();
> +      /* If we are compiling for multiple devices choose the largest VF.  */
> +      sctx->max_vf = omp_max_vf ();
> +      if (omp_maybe_offloaded_ctx (ctx))
> +	{
> +	  if (sctx->is_simt)
> +	    sctx->max_vf = ordered_max (sctx->max_vf, omp_max_simt_vf ());
> +	  sctx->max_vf = ordered_max (sctx->max_vf, omp_max_simd_vf ());
> +	}

This is wrong.  If sctx->is_simt, we know it is the SIMT version.
So we want to use omp_max_simt_vf (), not maximum of that and something
unrelated.
Only if !sctx->is_simt, we want to use maximum of omp_max_vf and if
omp_maybe_offloaded_ctx also omp_max_target_vf or how it is called (or
pass that as argument to omp_max_vf).

We have another omp_max_vf () call though, in
omp-expand.cc (omp_adjust_chunk_size).
That is for schedule (simd: dynamic, 32)
and similar, though unlike the omp-low.cc case (where using a larger VF
in that case doesn't hurt, it is used for sizing of the maxic arrays that
are afterwards resized to the actual size), using too large values in
that case is harmful.
So dunno if it should take into account offloading vf or not.
Maybe if maybe offloading maybe not it should fold to some internal fn call
dependent expression that folds to omp_max_vf of the actual target after
IPA.

	Jakub


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] openmp: fix max_vf setting for amdgcn offloading
  2022-07-12 14:16 [PATCH] openmp: fix max_vf setting for amdgcn offloading Andrew Stubbs
  2022-08-17 14:41 ` Jakub Jelinek
@ 2022-10-28  7:58 ` Thomas Schwinge
  1 sibling, 0 replies; 3+ messages in thread
From: Thomas Schwinge @ 2022-10-28  7:58 UTC (permalink / raw)
  To: Andrew Stubbs, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1768 bytes --]

Hi!

In addition to the technical issues pointed out by Jakub for this og12
commit:

On 2022-07-12T15:16:35+0100, Andrew Stubbs <ams@codesourcery.com> wrote:
> This patch [...]

> I will commit a backport to OG12 shortly.

> openmp: fix max_vf setting for amdgcn offloading

> --- a/gcc/omp-general.h
> +++ b/gcc/omp-general.h

>  extern poly_uint64 omp_max_vf (void);
>  extern int omp_max_simt_vf (void);
> +extern int omp_max_simd_vf (void);

> --- a/gcc/omp-low.cc
> +++ b/gcc/omp-low.cc
> @@ -4646,7 +4646,14 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
>  {
>    if (known_eq (sctx->max_vf, 0U))
>      {
> -      sctx->max_vf = sctx->is_simt ? omp_max_simt_vf () : omp_max_vf ();
> +      /* If we are compiling for multiple devices choose the largest VF.  */
> +      sctx->max_vf = omp_max_vf ();
> +      if (omp_maybe_offloaded_ctx (ctx))
> +     {
> +       if (sctx->is_simt)
> +         sctx->max_vf = ordered_max (sctx->max_vf, omp_max_simt_vf ());
> +       sctx->max_vf = ordered_max (sctx->max_vf, omp_max_simd_vf ());
> +     }
>        if (maybe_gt (sctx->max_vf, 1U))
>       {
>         tree c = omp_find_clause (gimple_omp_for_clauses (ctx->stmt),

... I've additionally run into a bootstrap error, and have now pushed
"Resolve '-Wsign-compare' issue in 'gcc/omp-low.cc:lower_rec_simd_input_clauses'"
to devel/omp/gcc-12 in commit 4e32d1582a137d5f34248fdd3e93d35a798f5221,
see attached.


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Attachment #2: 0001-Resolve-Wsign-compare-issue-in-gcc-omp-low.cc-l.og12.patch --]
[-- Type: text/x-diff, Size: 3393 bytes --]

From 4e32d1582a137d5f34248fdd3e93d35a798f5221 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Tue, 25 Oct 2022 09:45:31 +0200
Subject: [PATCH 1/2] Resolve '-Wsign-compare' issue in
 'gcc/omp-low.cc:lower_rec_simd_input_clauses'
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

..., introduced in og12 commit 55722a87dd223149dcd41ca9c8eba16ad5b3eddc
"openmp: fix max_vf setting for amdgcn offloading":

    In file included from [...]/source-gcc/gcc/coretypes.h:482,
                     from [...]/source-gcc/gcc/omp-low.cc:27:
    [...]/source-gcc/gcc/poly-int.h: In instantiation of ‘typename if_nonpoly<Ca, bool>::type maybe_lt(const Ca&, const poly_int_pod<N, Cb>&) [with unsigned int N = 1; Ca = int; Cb = long unsigned int; typename if_nonpoly<Ca, bool>::type = bool]’:
    [...]/source-gcc/gcc/poly-int.h:1510:7:   required from ‘poly_int<N, typename poly_result<Ca, typename if_nonpoly<Cb>::type>::type> ordered_max(const poly_int_pod<N, C>&, const Cb&) [with unsigned int N = 1; Ca = long unsigned int; Cb = int; typename poly_result<Ca, typename if_nonpoly<Cb>::type>::type = long unsigned int; typename if_nonpoly<Cb>::type = int]’
    [...]/source-gcc/gcc/omp-low.cc:5180:33:   required from here
    [...]/source-gcc/gcc/poly-int.h:1384:12: error: comparison of integer expressions of different signedness: ‘const int’ and ‘const long unsigned int’ [-Werror=sign-compare]
     1384 |   return a < b.coeffs[0];
          |          ~~^~~~~~~~~~~
    [...]/source-gcc/gcc/poly-int.h: In instantiation of ‘typename if_nonpoly<Cb, bool>::type maybe_lt(const poly_int_pod<N, C>&, const Cb&) [with unsigned int N = 1; Ca = long unsigned int; Cb = int; typename if_nonpoly<Cb, bool>::type = bool]’:
    [...]/source-gcc/gcc/poly-int.h:1515:2:   required from ‘poly_int<N, typename poly_result<Ca, typename if_nonpoly<Cb>::type>::type> ordered_max(const poly_int_pod<N, C>&, const Cb&) [with unsigned int N = 1; Ca = long unsigned int; Cb = int; typename poly_result<Ca, typename if_nonpoly<Cb>::type>::type = long unsigned int; typename if_nonpoly<Cb>::type = int]’
    [...]/source-gcc/gcc/omp-low.cc:5180:33:   required from here
    [...]/source-gcc/gcc/poly-int.h:1373:22: error: comparison of integer expressions of different signedness: ‘const long unsigned int’ and ‘const int’ [-Werror=sign-compare]
     1373 |   return a.coeffs[0] < b;
          |          ~~~~~~~~~~~~^~~

	gcc/
	* omp-low.cc (lower_rec_simd_input_clauses): For 'ordered_max',
	cast 'omp_max_simt_vf ()', 'omp_max_simd_vf ()' to 'unsigned'.
---
 gcc/omp-low.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index b5b2681b654..002f91d930a 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -5177,8 +5177,8 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
       if (omp_maybe_offloaded_ctx (ctx))
 	{
 	  if (sctx->is_simt)
-	    sctx->max_vf = ordered_max (sctx->max_vf, omp_max_simt_vf ());
-	  sctx->max_vf = ordered_max (sctx->max_vf, omp_max_simd_vf ());
+	    sctx->max_vf = ordered_max (sctx->max_vf, (unsigned) omp_max_simt_vf ());
+	  sctx->max_vf = ordered_max (sctx->max_vf, (unsigned) omp_max_simd_vf ());
 	}
       if (maybe_gt (sctx->max_vf, 1U))
 	{
-- 
2.35.1


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: 0002-Resolve-Wsign-compare-issue-in-gcc-omp-low.cc-l.og12.patch --]
[-- Type: text/x-diff, Size: 968 bytes --]

From 1c5087dfff64c40505bcb81b5069781a44bb0b4d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Fri, 28 Oct 2022 09:55:22 +0200
Subject: [PATCH 2/2] Resolve '-Wsign-compare' issue in
 'gcc/omp-low.cc:lower_rec_simd_input_clauses': ChangeLog

... forgotten in og12 commit 4e32d1582a137d5f34248fdd3e93d35a798f5221
"Resolve '-Wsign-compare' issue in 'gcc/omp-low.cc:lower_rec_simd_input_clauses'".
---
 gcc/ChangeLog.omp | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/gcc/ChangeLog.omp b/gcc/ChangeLog.omp
index 53cbfbab603..4bf521f2162 100644
--- a/gcc/ChangeLog.omp
+++ b/gcc/ChangeLog.omp
@@ -1,3 +1,8 @@
+2022-10-28  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* omp-low.cc (lower_rec_simd_input_clauses): For 'ordered_max',
+	cast 'omp_max_simt_vf ()', 'omp_max_simd_vf ()' to 'unsigned'.
+
 2022-10-25  Abid Qadeer  <abidh@codesourcery.com>
 
 	* omp-low.cc (usm_transform): Handle operator new with alignment.
-- 
2.35.1


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-10-28  7:58 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-12 14:16 [PATCH] openmp: fix max_vf setting for amdgcn offloading Andrew Stubbs
2022-08-17 14:41 ` Jakub Jelinek
2022-10-28  7:58 ` Thomas Schwinge

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).