[PATCH] amdgcn: additional gfx1100 support

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] amdgcn: additional gfx1100 support
@ 2024-01-24 12:43 Andrew Stubbs
  2024-01-24 16:01 ` [patch] amdgcn: config.gcc - enable gfx1100 multilib; add gfx1100 to docs (was: [PATCH] amdgcn: additional gfx1100 support) Tobias Burnus
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Andrew Stubbs @ 2024-01-24 12:43 UTC (permalink / raw)
  To: gcc-patches; +Cc: pa, rguenther

This is enough to get gfx1100 working for most purposes, on top of the
patch that Tobias committed a week or so ago; there are still some test
failures to investigate, and probably some tuning to do.

It might also get gfx1030 working too. @Richi, could you test it,
please?

I can't test the other multilibs right now. @PA, can you test it please?

I can self-approve the patch, but I'll hold off the commit until the
test results come back.

Andrew

gcc/ChangeLog:

	* config/gcn/gcn-opts.h (TARGET_PACKED_WORK_ITEMS): Add TARGET_RDNA3.
	* config/gcn/gcn-valu.md (all_convert): New iterator.
	(<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): New
	define_expand, and rename the old one to ...
	(*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): ... this.
	(extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): Likewise, to ...
	(extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): .. this.
	(*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>): New.
	* config/gcn/gcn.cc (gcn_global_address_p): Use "offsetbits" correctly.
	(gcn_hsa_declare_function_name): Update the vgpr counting for gfx1100.
	* config/gcn/gcn.md (<u>mulhisi3): Disable on RDNA3.
	(<u>mulqihi3_scalar): Likewise.

libgcc/ChangeLog:

	* config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Handle RDNA3.

libgomp/ChangeLog:

	* config/gcn/time.c (RTC_TICKS): Configure RDNA3.
	(omp_get_wtime): Add RDNA3-compatible variant.
	* plugin/plugin-gcn.c (max_isa_vgprs): Tune for gfx1030 and gfx1100.

Signed-off-by:  Andrew Stubbs <ams@baylibre.com>
---
 gcc/config/gcn/gcn-opts.h         |  2 +-
 gcc/config/gcn/gcn-valu.md        | 41 ++++++++++++++++++++++++++++---
 gcc/config/gcn/gcn.cc             | 31 ++++++++++++++++-------
 gcc/config/gcn/gcn.md             |  4 +--
 libgcc/config/gcn/amdgcn_veclib.h |  2 +-
 libgomp/config/gcn/time.c         | 10 ++++++++
 libgomp/plugin/plugin-gcn.c       |  6 +++--
 7 files changed, 77 insertions(+), 19 deletions(-)

diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h
index 79fbda3ab25..6be2c9204fa 100644
--- a/gcc/config/gcn/gcn-opts.h
+++ b/gcc/config/gcn/gcn-opts.h
@@ -62,7 +62,7 @@ extern enum gcn_isa {
 
 
 #define TARGET_M0_LDS_LIMIT (TARGET_GCN3)
-#define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS)
+#define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS || TARGET_RDNA3)
 
 #define TARGET_XNACK (flag_xnack != HSACO_ATTR_OFF)
 
diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 3d5b6271ee6..cd027f8b369 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -3555,30 +3555,63 @@
 ;; }}}
 ;; {{{ Int/int conversions
 
+(define_code_iterator all_convert [truncate zero_extend sign_extend])
 (define_code_iterator zero_convert [truncate zero_extend])
 (define_code_attr convop [
 	(sign_extend "extend")
 	(zero_extend "zero_extend")
 	(truncate "trunc")])
 
-(define_insn "<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>"
+(define_expand "<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>"
+  [(set (match_operand:V_INT_1REG 0 "register_operand"      "=v")
+        (all_convert:V_INT_1REG
+	  (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
+  "")
+
+(define_insn "*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>"
   [(set (match_operand:V_INT_1REG 0 "register_operand"      "=v")
         (zero_convert:V_INT_1REG
 	  (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
-  ""
+  "!TARGET_RDNA3"
   "v_mov_b32_sdwa\t%0, %1 dst_sel:<V_INT_1REG:sdwa> dst_unused:UNUSED_PAD src0_sel:<V_INT_1REG_ALT:sdwa>"
   [(set_attr "type" "vop_sdwa")
    (set_attr "length" "8")])
 
-(define_insn "extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>"
+(define_insn "extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>"
   [(set (match_operand:V_INT_1REG 0 "register_operand"	    "=v")
         (sign_extend:V_INT_1REG
 	  (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
-  ""
+  "!TARGET_RDNA3"
   "v_mov_b32_sdwa\t%0, sext(%1) src0_sel:<V_INT_1REG_ALT:sdwa>"
   [(set_attr "type" "vop_sdwa")
    (set_attr "length" "8")])
 
+(define_insn "*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>"
+  [(set (match_operand:V_INT_1REG 0 "register_operand"      "=v")
+        (all_convert:V_INT_1REG
+	  (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
+  "TARGET_RDNA3"
+  {
+    enum {extend, zero_extend, trunc};
+    rtx shiftwidth = (<V_INT_1REG_ALT:SCALAR_MODE>mode == QImode
+		      || <V_INT_1REG:SCALAR_MODE>mode == QImode
+		      ? GEN_INT (24)
+		      : <V_INT_1REG_ALT:SCALAR_MODE>mode == HImode
+		        || <V_INT_1REG:SCALAR_MODE>mode == HImode
+		      ? GEN_INT (16)
+		      : NULL);
+    operands[2] = shiftwidth;
+
+    if (!shiftwidth)
+      return "v_mov_b32 %0, %1";
+    else if (<convop> == extend || <convop> == trunc)
+      return "v_lshlrev_b32\t%0, %2, %1\;v_ashrrev_i32\t%0, %2, %0";
+    else
+      return "v_lshlrev_b32\t%0, %2, %1\;v_lshrrev_b32\t%0, %2, %0";
+  }
+  [(set_attr "type" "mult")
+   (set_attr "length" "8")])
+
 ;; GCC can already do these for scalar types, but not for vector types.
 ;; Unfortunately you can't just do SUBREG on a vector to select the low part,
 ;; so there must be a few tricks here.
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index e668ce7c69e..e80de2ce056 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -1597,8 +1597,8 @@ gcn_global_address_p (rtx addr)
       rtx offset = XEXP (addr, 1);
       int offsetbits = (TARGET_RDNA2_PLUS ? 11 : 12);
       bool immediate_p = (CONST_INT_P (offset)
-			  && INTVAL (offset) >= -(1 << 12)
-			  && INTVAL (offset) < (1 << 12));
+			  && INTVAL (offset) >= -(1 << offsetbits)
+			  && INTVAL (offset) < (1 << offsetbits));
 
       if ((gcn_address_register_p (base, DImode, false)
 	   || gcn_vec_address_register_p (base, DImode, false))
@@ -6597,8 +6597,10 @@ gcn_hsa_declare_function_name (FILE *file, const char *name,
     if (df_regs_ever_live_p (FIRST_AVGPR_REG + avgpr))
       break;
   avgpr++;
-  vgpr = (vgpr + 3) & ~3;
-  avgpr = (avgpr + 3) & ~3;
+
+  /* The main function epilogue uses v8, but df doesn't see that.  */
+  if (vgpr < 9)
+    vgpr = 9;
 
   if (!leaf_function_p ())
     {
@@ -6611,9 +6613,18 @@ gcn_hsa_declare_function_name (FILE *file, const char *name,
 	avgpr = MAX_NORMAL_AVGPR_COUNT;
     }
 
-  /* The gfx90a accum_offset field can't represent 0 registers.  */
-  if (gcn_arch == PROCESSOR_GFX90a && vgpr < 4)
-    vgpr = 4;
+  /* SIMD32 devices count double in wavefront64 mode.  */
+  if (TARGET_RDNA2_PLUS)
+    vgpr *= 2;
+
+  /* Round up to the allocation block size.  */
+  int vgpr_block_size = (TARGET_RDNA3 ? 12
+			 : TARGET_RDNA2_PLUS || TARGET_CDNA2_PLUS ? 8
+			 : 4);
+  if (vgpr % vgpr_block_size)
+    vgpr += vgpr_block_size - (vgpr % vgpr_block_size);
+  if (avgpr % vgpr_block_size)
+    avgpr += vgpr_block_size - (avgpr % vgpr_block_size);
 
   fputs ("\t.rodata\n"
 	 "\t.p2align\t6\n"
@@ -6714,12 +6725,14 @@ gcn_hsa_declare_function_name (FILE *file, const char *name,
 	   "            .private_segment_fixed_size: 0\n"
 	   "            .wavefront_size: 64\n"
 	   "            .sgpr_count: %i\n"
-	   "            .vgpr_count: %i\n"
+	   "            .vgpr_count: %i%s\n"
 	   "            .max_flat_workgroup_size: 1024\n",
 	   cfun->machine->kernarg_segment_byte_size,
 	   cfun->machine->kernarg_segment_alignment,
 	   LDS_SIZE,
-	   sgpr, next_free_vgpr);
+	   sgpr, next_free_vgpr,
+	   (TARGET_RDNA2_PLUS ? " ; wavefrontsize64 counts double on SIMD32"
+	    : ""));
   if (gcn_arch == PROCESSOR_GFX90a || gcn_arch == PROCESSOR_GFX908)
     fprintf (file, "            .agpr_count: %i\n", avgpr);
   fputs ("        .end_amdgpu_metadata\n", file);
diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index 492b833e255..1f3c692b7a6 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -1618,7 +1618,7 @@
 	(mult:SI
 	  (any_extend:SI (match_operand:HI 1 "register_operand" "%v"))
 	  (any_extend:SI (match_operand:HI 2 "register_operand" " v"))))]
-  ""
+  "!TARGET_RDNA3"
   "v_mul_<iu>32_<iu>24_sdwa\t%0, %<e>1, %<e>2 src0_sel:WORD_0 src1_sel:WORD_0"
   [(set_attr "type" "vop_sdwa")
    (set_attr "length" "8")])
@@ -1628,7 +1628,7 @@
 	(mult:HI
 	  (any_extend:HI (match_operand:QI 1 "register_operand" "%v"))
 	  (any_extend:HI (match_operand:QI 2 "register_operand" " v"))))]
-  ""
+  "!TARGET_RDNA3"
   "v_mul_<iu>32_<iu>24_sdwa\t%0, %<e>1, %<e>2 src0_sel:BYTE_0 src1_sel:BYTE_0"
   [(set_attr "type" "vop_sdwa")
    (set_attr "length" "8")])
diff --git a/libgcc/config/gcn/amdgcn_veclib.h b/libgcc/config/gcn/amdgcn_veclib.h
index 821f6386dd6..d268c6cac16 100644
--- a/libgcc/config/gcn/amdgcn_veclib.h
+++ b/libgcc/config/gcn/amdgcn_veclib.h
@@ -230,7 +230,7 @@ do { \
 
 #if defined (__GCN3__) || defined (__GCN5__) \
     || defined (__CDNA1__) || defined (__CDNA2__) \
-    || defined (__RDNA2__)
+    || defined (__RDNA2__) || defined (__RDNA3__)
 #define CDNA3_PLUS 0
 #else
 #define CDNA3_PLUS 1
diff --git a/libgomp/config/gcn/time.c b/libgomp/config/gcn/time.c
index 30a0d0188e4..efcd04f5f43 100644
--- a/libgomp/config/gcn/time.c
+++ b/libgomp/config/gcn/time.c
@@ -30,15 +30,25 @@
 /* According to AMD:
     dGPU RTC is 27MHz
     AGPU RTC is 100MHz
+    RDNA3 ISA manual states "typically 100MHz"
    FIXME: DTRT on an APU.  */
+#ifdef __RDNA3__
+#define RTC_TICKS (1.0 / 100000000.0) /* 100MHz */
+#else
 #define RTC_TICKS (1.0 / 27000000.0) /* 27MHz */
+#endif
 
 double
 omp_get_wtime (void)
 {
   uint64_t clock;
+#ifdef __RDNA3__
+  asm ("s_sendmsg_rtn_b64 %0 0x83 ;Get REALTIME\n\t"
+       "s_waitcnt 0" : "=r" (clock));
+#else
   asm ("s_memrealtime %0\n\t"
        "s_waitcnt 0" : "=r" (clock));
+#endif
   return clock * RTC_TICKS;
 }
 
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 0339848451e..db28781dedb 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1741,11 +1741,13 @@ max_isa_vgprs (int isa)
     case EF_AMDGPU_MACH_AMDGCN_GFX900:
     case EF_AMDGPU_MACH_AMDGCN_GFX906:
     case EF_AMDGPU_MACH_AMDGCN_GFX908:
-    case EF_AMDGPU_MACH_AMDGCN_GFX1030:
-    case EF_AMDGPU_MACH_AMDGCN_GFX1100:
       return 256;
     case EF_AMDGPU_MACH_AMDGCN_GFX90a:
       return 512;
+    case EF_AMDGPU_MACH_AMDGCN_GFX1030:
+      return 512;  /* 512 SIMD32 = 256 wavefrontsize64.  */
+    case EF_AMDGPU_MACH_AMDGCN_GFX1100:
+      return 1536; /* 1536 SIMD32 = 768 wavefrontsize64.  */
     }
   GOMP_PLUGIN_fatal ("unhandled ISA in max_isa_vgprs");
 }
-- 
2.41.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [patch] amdgcn: config.gcc - enable gfx1100 multilib; add gfx1100 to docs (was: [PATCH] amdgcn: additional gfx1100 support)
  2024-01-24 12:43 [PATCH] amdgcn: additional gfx1100 support Andrew Stubbs
@ 2024-01-24 16:01 ` Tobias Burnus
  2024-01-26 12:26   ` [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs (was: [patch] amdgcn: config.gcc - enable gfx1100 multilib; add gfx1100 to docs) Tobias Burnus
  2024-01-26  8:56 ` [PATCH] amdgcn: additional gfx1100 support Richard Biener
  2024-03-06 13:49 ` amdgcn: additional gfx1030/gfx1100 support: adjust test cases (was: [PATCH] amdgcn: additional gfx1100 support) Thomas Schwinge
  2 siblings, 1 reply; 25+ messages in thread
From: Tobias Burnus @ 2024-01-24 16:01 UTC (permalink / raw)
  To: Andrew Stubbs, gcc-patches; +Cc: pa, rguenther

[-- Attachment #1: Type: text/plain, Size: 986 bytes --]

This patch obviously depends on Andrew's; he wrote in the previous email 
of this thread regarding his patch:

Andrew Stubbs wrote:
> This is enough to get gfx1100 working for most purposes, on top of the
> patch that Tobias committed a week or so ago; there are still some test
> failures to investigate, and probably some tuning to do.
> 
> It might also get gfx1030 working too. @Richi, could you test it,
> please?
> 
> I can't test the other multilibs right now. @PA, can you test it please?
> 
> I can self-approve the patch, but I'll hold off the commit until the
> test results come back.

Okay to enable gfx1100 multilib building and to document gfx1100 in the 
manual?

(I mean, obviously, only after Andrew committed his patch. For gfx1030, 
we might eventually also enable gfx1030 multilib support; if Richard 
confirms that collaterally fixes gfx1030, we probably should - and 
depending on the number/kinds of testsuite, we could then document it
or not, I guess.)

Tobias

[-- Attachment #2: gfx1100-doc.diff --]
[-- Type: text/x-patch, Size: 4657 bytes --]

amdgcn: config.gcc - enable gfx1100 multilib; add gfx1100 to docs

gcc/ChangeLog:

	* config.gcc (amdgcn-*-*): Add gfx1100 to TM_MULTILIB_CONFIG.
	* doc/install.texi (Configuration amdgcn-*-*): Mention gfx1100.
	* doc/invoke.texi (AMD GCN Options): Add gfx1100 to -march/-mtune.

libgomp/ChangeLog:

	* testsuite/libgomp.c/declare-variant-4.h: Add variant functions
	for gfx1030 and gfx1100.
	* testsuite/libgomp.c/declare-variant-4-gfx1100.c: New test.

Signed-off-by: Tobias Burnus <tburnus@baylibre.com>

 gcc/config.gcc                                          |  2 +-
 gcc/doc/install.texi                                    | 12 ++++++------
 gcc/doc/invoke.texi                                     |  3 +++
 libgomp/testsuite/libgomp.c/declare-variant-4-gfx1100.c |  8 ++++++++
 libgomp/testsuite/libgomp.c/declare-variant-4.h         | 16 ++++++++++++++++
 5 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index b2d7d7dd475..2343e98ebe6 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4564,7 +4564,7 @@ case "${target}" in
 			TM_MULTILIB_CONFIG=
 			;;
 		xdefault | xyes)
-			TM_MULTILIB_CONFIG=`echo "gfx900,gfx906,gfx908,gfx90a" | sed "s/${with_arch},\?//;s/,$//"`
+			TM_MULTILIB_CONFIG=`echo "gfx900,gfx906,gfx908,gfx90a,gfx1100" | sed "s/${with_arch},\?//;s/,$//"`
 			;;
 		*)
 			TM_MULTILIB_CONFIG="${with_multilib_list}"
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 71593919389..5304ebd36a9 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1258,12 +1258,12 @@ default set of libraries is selected based on the value of
 
 @item amdgcn*-*-*
 @var{list} is a comma separated list of ISA names (allowed values: @code{fiji},
-@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}). It ought not
-include the name of the default ISA, specified via @option{--with-arch}.  If
-@var{list} is empty, then there will be no multilibs and only the default
-run-time library will be built.  If @var{list} is @code{default} or
-@option{--with-multilib-list=} is not specified, then the default set of
-libraries is selected.
+@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}, @code{gfx1100}).
+It ought not include the name of the default ISA, specified
+via @option{--with-arch}.  If @var{list} is empty, then there will be no
+multilibs and only the default run-time library will be built.  If @var{list}
+is @code{default} or @option{--with-multilib-list=} is not specified, then
+the default set of libraries is selected.
 
 @item arm*-*-*
 @var{list} is a comma separated list of @code{aprofile} and
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5f904cf1ef2..d1b2c284e2b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -21723,6 +21723,9 @@ Compile for CDNA1 Instinct MI100 series devices (gfx908).
 @item gfx90a
 Compile for CDNA2 Instinct MI200 series devices (gfx90a).
 
+@item gfx1100
+Compile for RDNA3 gfx1100 devices (GFX11 series).
+
 @end table
 
 @opindex msram-ecc
diff --git a/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1100.c b/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1100.c
new file mode 100644
index 00000000000..6ade35224cc
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1100.c
@@ -0,0 +1,8 @@
+/* { dg-do link { target { offload_target_amdgcn } } } */
+/* { dg-additional-options -foffload=amdgcn-amdhsa } */
+/* { dg-additional-options -foffload=-march=gfx1100 } */
+/* { dg-additional-options "-foffload=-fdump-tree-optimized" } */
+
+#include "declare-variant-4.h"
+
+/* { dg-final { only_for_offload_target amdgcn-amdhsa scan-offload-tree-dump "= gfx1100 \\(\\);" "optimized" } } */
diff --git a/libgomp/testsuite/libgomp.c/declare-variant-4.h b/libgomp/testsuite/libgomp.c/declare-variant-4.h
index a70352430c2..393a5e295cc 100644
--- a/libgomp/testsuite/libgomp.c/declare-variant-4.h
+++ b/libgomp/testsuite/libgomp.c/declare-variant-4.h
@@ -35,6 +35,20 @@ gfx90a (void)
   return 0x90a;
 }
 
+__attribute__ ((noipa))
+int
+gfx1030 (void)
+{
+  return 0x1030;
+}
+
+__attribute__ ((noipa))
+int
+gfx1100 (void)
+{
+  return 0x1100;
+}
+
 #ifdef USE_FIJI_FOR_GFX803
 #pragma omp declare variant(gfx803) match(device = {isa("fiji")})
 #else
@@ -44,6 +58,8 @@ gfx90a (void)
 #pragma omp declare variant(gfx906) match(device = {isa("gfx906")})
 #pragma omp declare variant(gfx908) match(device = {isa("gfx908")})
 #pragma omp declare variant(gfx90a) match(device = {isa("gfx90a")})
+#pragma omp declare variant(gfx90a) match(device = {isa("gfx1030")})
+#pragma omp declare variant(gfx90a) match(device = {isa("gfx1100")})
 __attribute__ ((noipa))
 int
 f (void)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs (was: [patch] amdgcn: config.gcc - enable gfx1100 multilib; add gfx1100 to docs)
  2024-01-24 16:01 ` [patch] amdgcn: config.gcc - enable gfx1100 multilib; add gfx1100 to docs (was: [PATCH] amdgcn: additional gfx1100 support) Tobias Burnus
@ 2024-01-26 12:26   ` Tobias Burnus
  2024-01-26 12:32     ` [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs Tobias Burnus
  0 siblings, 1 reply; 25+ messages in thread
From: Tobias Burnus @ 2024-01-26 12:26 UTC (permalink / raw)
  To: Andrew Stubbs, gcc-patches, rguenther

Hi all, hi Richard & Andrew,

Am 24.01.24 um 17:01 schrieb Tobias Burnus:
> This patch obviously depends on Andrew's; he wrote in the previous 
> email of this thread regarding his patch:
>
> Andrew Stubbs wrote:
>> This is enough to get gfx1100 working for most purposes, on top of the
>> patch that Tobias committed a week or so ago; there are still some test
>> failures to investigate, and probably some tuning to do.
>>
>> It might also get gfx1030 working too. @Richi, could you test it,
>> please?

If gfx1030 doesn't work, I would propose my patch previously in the 
thread, https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643835.html

This patch assumes that both gfx1100 and gfx1030 are now working:

> Okay to enable gfx1100 multilib building and to document gfx1100 in 
> the manual?
and, with this patch, additionally gfx1030?

OK for mainline, once Andrew's patch is in?

Tobias


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs
  2024-01-26 12:26   ` [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs (was: [patch] amdgcn: config.gcc - enable gfx1100 multilib; add gfx1100 to docs) Tobias Burnus
@ 2024-01-26 12:32     ` Tobias Burnus
  2024-01-26 12:40       ` Richard Biener
  2024-01-26 16:21       ` Thomas Schwinge
  0 siblings, 2 replies; 25+ messages in thread
From: Tobias Burnus @ 2024-01-26 12:32 UTC (permalink / raw)
  To: Andrew Stubbs, gcc-patches, rguenther

[-- Attachment #1: Type: text/plain, Size: 974 bytes --]

Now with patch ...

Tobias Burnus wrote:
> Hi all, hi Richard & Andrew,
>
> Am 24.01.24 um 17:01 schrieb Tobias Burnus:
>> This patch obviously depends on Andrew's; he wrote in the previous 
>> email of this thread regarding his patch:
>>
>> Andrew Stubbs wrote:
>>> This is enough to get gfx1100 working for most purposes, on top of the
>>> patch that Tobias committed a week or so ago; there are still some test
>>> failures to investigate, and probably some tuning to do.
>>>
>>> It might also get gfx1030 working too. @Richi, could you test it,
>>> please?
>
> If gfx1030 doesn't work, I would propose my patch previously in the 
> thread, 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643835.html
>
> This patch assumes that both gfx1100 and gfx1030 are now working:
>
>> Okay to enable gfx1100 multilib building and to document gfx1100 in 
>> the manual?
> and, with this patch, additionally gfx1030?
>
> OK for mainline, once Andrew's patch is in? 
Tobias

[-- Attachment #2: gfx1100-gfx1030-doc.diff --]
[-- Type: text/x-patch, Size: 5599 bytes --]

amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs

gcc/ChangeLog:

	* config.gcc (amdgcn-*-*): Add gfx1030 and gfx1100 to
	TM_MULTILIB_CONFIG.
	* doc/install.texi (Configuration amdgcn-*-*): Mention gfx1030/gfx1100.
	* doc/invoke.texi (AMD GCN Options): Add gfx1030 and gfx1100 to
	-march/-mtune.

libgomp/ChangeLog:

	* testsuite/libgomp.c/declare-variant-4.h: Add variant functions
	for gfx1030 and gfx1100.
	* testsuite/libgomp.c/declare-variant-4-gfx1030.c: New test.
	* testsuite/libgomp.c/declare-variant-4-gfx1100.c: New test.

Signed-off-by: Tobias Burnus <tburnus@baylibre.com>

 gcc/config.gcc                                          |  2 +-
 gcc/doc/install.texi                                    | 12 ++++++------
 gcc/doc/invoke.texi                                     |  6 ++++++
 libgomp/testsuite/libgomp.c/declare-variant-4-gfx1030.c |  8 ++++++++
 libgomp/testsuite/libgomp.c/declare-variant-4-gfx1100.c |  8 ++++++++
 libgomp/testsuite/libgomp.c/declare-variant-4.h         | 16 ++++++++++++++++
 6 files changed, 45 insertions(+), 7 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index b2d7d7dd475..a0f9c672308 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4564,7 +4564,7 @@ case "${target}" in
 			TM_MULTILIB_CONFIG=
 			;;
 		xdefault | xyes)
-			TM_MULTILIB_CONFIG=`echo "gfx900,gfx906,gfx908,gfx90a" | sed "s/${with_arch},\?//;s/,$//"`
+			TM_MULTILIB_CONFIG=`echo "gfx900,gfx906,gfx908,gfx90a,gfx1030,gfx1100" | sed "s/${with_arch},\?//;s/,$//"`
 			;;
 		*)
 			TM_MULTILIB_CONFIG="${with_multilib_list}"
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 71593919389..5747b5a12fe 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1258,12 +1258,12 @@ default set of libraries is selected based on the value of
 
 @item amdgcn*-*-*
 @var{list} is a comma separated list of ISA names (allowed values: @code{fiji},
-@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}). It ought not
-include the name of the default ISA, specified via @option{--with-arch}.  If
-@var{list} is empty, then there will be no multilibs and only the default
-run-time library will be built.  If @var{list} is @code{default} or
-@option{--with-multilib-list=} is not specified, then the default set of
-libraries is selected.
+@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}, @code{gfx1030},
+@code{gfx1100}).  It ought not include the name of the default ISA, specified
+via @option{--with-arch}.  If @var{list} is empty, then there will be no
+multilibs and only the default run-time library will be built.  If @var{list}
+is @code{default} or @option{--with-multilib-list=} is not specified, then
+the default set of libraries is selected.
 
 @item arm*-*-*
 @var{list} is a comma separated list of @code{aprofile} and
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 6ec56493e59..64c5ed2ffde 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -21739,6 +21739,12 @@ Compile for CDNA1 Instinct MI100 series devices (gfx908).
 @item gfx90a
 Compile for CDNA2 Instinct MI200 series devices (gfx90a).
 
+@item gfx1030
+Compile for RDNA2 gfx1030 devices (GFX10 series).
+
+@item gfx1100
+Compile for RDNA3 gfx1100 devices (GFX11 series).
+
 @end table
 
 @opindex msram-ecc
diff --git a/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1030.c b/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1030.c
new file mode 100644
index 00000000000..d98d5ef54ec
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1030.c
@@ -0,0 +1,8 @@
+/* { dg-do link { target { offload_target_amdgcn } } } */
+/* { dg-additional-options -foffload=amdgcn-amdhsa } */
+/* { dg-additional-options -foffload=-march=gfx1030 } */
+/* { dg-additional-options "-foffload=-fdump-tree-optimized" } */
+
+#include "declare-variant-4.h"
+
+/* { dg-final { only_for_offload_target amdgcn-amdhsa scan-offload-tree-dump "= gfx1030 \\(\\);" "optimized" } } */
diff --git a/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1100.c b/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1100.c
new file mode 100644
index 00000000000..6ade35224cc
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1100.c
@@ -0,0 +1,8 @@
+/* { dg-do link { target { offload_target_amdgcn } } } */
+/* { dg-additional-options -foffload=amdgcn-amdhsa } */
+/* { dg-additional-options -foffload=-march=gfx1100 } */
+/* { dg-additional-options "-foffload=-fdump-tree-optimized" } */
+
+#include "declare-variant-4.h"
+
+/* { dg-final { only_for_offload_target amdgcn-amdhsa scan-offload-tree-dump "= gfx1100 \\(\\);" "optimized" } } */
diff --git a/libgomp/testsuite/libgomp.c/declare-variant-4.h b/libgomp/testsuite/libgomp.c/declare-variant-4.h
index a70352430c2..393a5e295cc 100644
--- a/libgomp/testsuite/libgomp.c/declare-variant-4.h
+++ b/libgomp/testsuite/libgomp.c/declare-variant-4.h
@@ -35,6 +35,20 @@ gfx90a (void)
   return 0x90a;
 }
 
+__attribute__ ((noipa))
+int
+gfx1030 (void)
+{
+  return 0x1030;
+}
+
+__attribute__ ((noipa))
+int
+gfx1100 (void)
+{
+  return 0x1100;
+}
+
 #ifdef USE_FIJI_FOR_GFX803
 #pragma omp declare variant(gfx803) match(device = {isa("fiji")})
 #else
@@ -44,6 +58,8 @@ gfx90a (void)
 #pragma omp declare variant(gfx906) match(device = {isa("gfx906")})
 #pragma omp declare variant(gfx908) match(device = {isa("gfx908")})
 #pragma omp declare variant(gfx90a) match(device = {isa("gfx90a")})
+#pragma omp declare variant(gfx90a) match(device = {isa("gfx1030")})
+#pragma omp declare variant(gfx90a) match(device = {isa("gfx1100")})
 __attribute__ ((noipa))
 int
 f (void)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs
  2024-01-26 12:32     ` [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs Tobias Burnus
@ 2024-01-26 12:40       ` Richard Biener
  2024-01-26 12:59         ` Tobias Burnus
  2024-01-26 16:21       ` Thomas Schwinge
  1 sibling, 1 reply; 25+ messages in thread
From: Richard Biener @ 2024-01-26 12:40 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: Andrew Stubbs, gcc-patches

On Fri, 26 Jan 2024, Tobias Burnus wrote:

> Now with patch ...
> 
> Tobias Burnus wrote:
> > Hi all, hi Richard & Andrew,
> >
> > Am 24.01.24 um 17:01 schrieb Tobias Burnus:
> >> This patch obviously depends on Andrew's; he wrote in the previous email of
> >> this thread regarding his patch:
> >>
> >> Andrew Stubbs wrote:
> >>> This is enough to get gfx1100 working for most purposes, on top of the
> >>> patch that Tobias committed a week or so ago; there are still some test
> >>> failures to investigate, and probably some tuning to do.
> >>>
> >>> It might also get gfx1030 working too. @Richi, could you test it,
> >>> please?
> >
> > If gfx1030 doesn't work, I would propose my patch previously in the thread,
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643835.html
> >
> > This patch assumes that both gfx1100 and gfx1030 are now working:
> >
> >> Okay to enable gfx1100 multilib building and to document gfx1100 in the
> >> manual?
> > and, with this patch, additionally gfx1030?
> >
> > OK for mainline, once Andrew's patch is in? 
> Tobias

Looks good to me.

+@item gfx1030
+Compile for RDNA2 gfx1030 devices (GFX10 series).
+
+@item gfx1100
+Compile for RDNA3 gfx1100 devices (GFX11 series).

Btw, "GFX10" series isn't precise as it's only the high-end parts
that are covered by gfx1030, there's gfx103[0-6] where hopefully
at least gfx1031, gfx1032 and gfx1034 (the dGPU variants) are
trivial to support as well(?).  Using gfx103x might be better
that way, OTOH if APU vs dGPU will make a compilation target
difference then gfx103d vs gfx103a maybe?  "GFX10" series might
be also not know to users, but I'm unsure we can list AMD
product names here?

Richard.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs
  2024-01-26 12:40       ` Richard Biener
@ 2024-01-26 12:59         ` Tobias Burnus
  0 siblings, 0 replies; 25+ messages in thread
From: Tobias Burnus @ 2024-01-26 12:59 UTC (permalink / raw)
  To: Richard Biener, Andrew Stubbs; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1231 bytes --]

Hi Richard,

Richard Biener wrote:
> Looks good to me.
Thanks - I will commit it after lunch to see whether someone else has 
additional comments.
> +@item gfx1030
> +Compile for RDNA2 gfx1030 devices (GFX10 series).
> +
> +@item gfx1100
> +Compile for RDNA3 gfx1100 devices (GFX11 series).
>
> Btw, "GFX10" series isn't precise as it's only the high-end parts
> that are covered by gfx1030, there's gfx103[0-6] where hopefully
> at least gfx1031, gfx1032 and gfx1034 (the dGPU variants) are
> trivial to support as well(?).
>
> Using gfx103x might be better
> that way, OTOH if APU vs dGPU will make a compilation target
> difference then gfx103d vs gfx103a maybe?  "GFX10" series might

On the LLVM side (and also for the llvm-mc assembler), they distinguish 
all of the gfx* for the -mcpu= argument. See 
https://llvm.org/docs/AMDGPUUsage.html#id26 for that list.

Thus, I think it makes sense to do the same here.  The last column on 
that page lists the supported hardware but is it neither really up to 
date nor complete.

Thus, I found it easier to just mention gfx1100 as that's unique. On the 
ROCm side, AMD has:

https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html

Tobias

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs
  2024-01-26 12:32     ` [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs Tobias Burnus
  2024-01-26 12:40       ` Richard Biener
@ 2024-01-26 16:21       ` Thomas Schwinge
  2024-01-26 16:36         ` Richard Biener
  2024-01-26 16:45         ` [patch] install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled (was: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs) Tobias Burnus
  1 sibling, 2 replies; 25+ messages in thread
From: Thomas Schwinge @ 2024-01-26 16:21 UTC (permalink / raw)
  To: Tobias Burnus, Andrew Stubbs; +Cc: gcc-patches, rguenther

Hi!

Great progress that you've made!  :-)

On 2024-01-26T13:32:02+0100, Tobias Burnus <tburnus@baylibre.com> wrote:
> Tobias Burnus wrote:
>> Am 24.01.24 um 17:01 schrieb Tobias Burnus:
>>> Okay to enable gfx1100 multilib building and to document gfx1100 in 
>>> the manual?
>>
>> and, with this patch, additionally gfx1030?

> amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs

> --- a/gcc/doc/install.texi
> +++ b/gcc/doc/install.texi
> @@ -1258,12 +1258,12 @@ default set of libraries is selected based on the value of
>  
>  @item amdgcn*-*-*
>  @var{list} is a comma separated list of ISA names (allowed values: @code{fiji},
> -@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}). It ought not
> -include the name of the default ISA, specified via @option{--with-arch}.  If
> -@var{list} is empty, then there will be no multilibs and only the default
> -run-time library will be built.  If @var{list} is @code{default} or
> -@option{--with-multilib-list=} is not specified, then the default set of
> -libraries is selected.
> +@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}, @code{gfx1030},
> +@code{gfx1100}).  It ought not include the name of the default ISA, specified
> +via @option{--with-arch}.  If @var{list} is empty, then there will be no
> +multilibs and only the default run-time library will be built.  If @var{list}
> +is @code{default} or @option{--with-multilib-list=} is not specified, then
> +the default set of libraries is selected.

Further down in that file, we state:

    @anchor{amdgcn-x-amdhsa}
    @heading amdgcn-*-amdhsa
    AMD GCN GPU target.
    
    Instead of GNU Binutils, you will need to install LLVM 13.0.1, or later, [...]

LLVM 13.0.1 may still be fine for gfx1030
('[...]/amdgcn-amdhsa/gfx1030/libgcc' does get built; I've not further
tested), but it's not sufficient for gfx1100 anymore:

    [...]
    checking for suffix of object files... configure: error: in `[...]/amdgcn-amdhsa/gfx1100/libgcc':
    configure: error: cannot compute suffix of object files: cannot compile
    See `config.log' for more details
    make[1]: *** [Makefile:14105: configure-target-libgcc] Error 1
    [...]

'[...]/amdgcn-amdhsa/gfx1100/libgcc/config.log':

    [...]
    'gfx1100' is not a recognized processor for this target (ignoring processor)
    'gfx1100' is not a recognized processor for this target (ignoring processor)
    /tmp/ccZdohcj.s:1:17: error: .amdgcn_target directive's target id amdgcn-unknown-amdhsa--gfx1100 does not match the specified target id amdgcn-unknown-amdhsa--gfx000
            .amdgcn_target "amdgcn-unknown-amdhsa--gfx1100"
                           ^
    [...]

Which version of LLVM should we be recommending?


Grüße
 Thomas

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs
  2024-01-26 16:21       ` Thomas Schwinge
@ 2024-01-26 16:36         ` Richard Biener
  2024-01-26 16:45         ` [patch] install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled (was: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs) Tobias Burnus
  1 sibling, 0 replies; 25+ messages in thread
From: Richard Biener @ 2024-01-26 16:36 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: Tobias Burnus, Andrew Stubbs, gcc-patches



> Am 26.01.2024 um 17:22 schrieb Thomas Schwinge <tschwinge@baylibre.com>:
> 
> Hi!
> 
> Great progress that you've made!  :-)
> 
>> On 2024-01-26T13:32:02+0100, Tobias Burnus <tburnus@baylibre.com> wrote:
>> Tobias Burnus wrote:
>>> Am 24.01.24 um 17:01 schrieb Tobias Burnus:
>>>> Okay to enable gfx1100 multilib building and to document gfx1100 in
>>>> the manual?
>>> 
>>> and, with this patch, additionally gfx1030?
> 
>> amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs
> 
>> --- a/gcc/doc/install.texi
>> +++ b/gcc/doc/install.texi
>> @@ -1258,12 +1258,12 @@ default set of libraries is selected based on the value of
>> 
>> @item amdgcn*-*-*
>> @var{list} is a comma separated list of ISA names (allowed values: @code{fiji},
>> -@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}). It ought not
>> -include the name of the default ISA, specified via @option{--with-arch}.  If
>> -@var{list} is empty, then there will be no multilibs and only the default
>> -run-time library will be built.  If @var{list} is @code{default} or
>> -@option{--with-multilib-list=} is not specified, then the default set of
>> -libraries is selected.
>> +@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}, @code{gfx1030},
>> +@code{gfx1100}).  It ought not include the name of the default ISA, specified
>> +via @option{--with-arch}.  If @var{list} is empty, then there will be no
>> +multilibs and only the default run-time library will be built.  If @var{list}
>> +is @code{default} or @option{--with-multilib-list=} is not specified, then
>> +the default set of libraries is selected.
> 
> Further down in that file, we state:
> 
>    @anchor{amdgcn-x-amdhsa}
>    @heading amdgcn-*-amdhsa
>    AMD GCN GPU target.
> 
>    Instead of GNU Binutils, you will need to install LLVM 13.0.1, or later, [...]
> 
> LLVM 13.0.1 may still be fine for gfx1030
> ('[...]/amdgcn-amdhsa/gfx1030/libgcc' does get built; I've not further
> tested), but it's not sufficient for gfx1100 anymore:
> 
>    [...]
>    checking for suffix of object files... configure: error: in `[...]/amdgcn-amdhsa/gfx1100/libgcc':
>    configure: error: cannot compute suffix of object files: cannot compile
>    See `config.log' for more details
>    make[1]: *** [Makefile:14105: configure-target-libgcc] Error 1
>    [...]
> 
> '[...]/amdgcn-amdhsa/gfx1100/libgcc/config.log':
> 
>    [...]
>    'gfx1100' is not a recognized processor for this target (ignoring processor)
>    'gfx1100' is not a recognized processor for this target (ignoring processor)
>    /tmp/ccZdohcj.s:1:17: error: .amdgcn_target directive's target id amdgcn-unknown-amdhsa--gfx1100 does not match the specified target id amdgcn-unknown-amdhsa--gfx000
>            .amdgcn_target "amdgcn-unknown-amdhsa--gfx1100"
>                           ^
>    [...]
> 
> Which version of LLVM should we be recommending?

A more recent one ;)  unless we know of any fixed bugs in the assembler we’d rely on I‘d day the oldest that works „or later“

Richard 

> 
> Grüße
> Thomas

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [patch] install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled (was: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs)
  2024-01-26 16:21       ` Thomas Schwinge
  2024-01-26 16:36         ` Richard Biener
@ 2024-01-26 16:45         ` Tobias Burnus
  2024-01-29 10:01           ` [patch] install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled Andrew Stubbs
  1 sibling, 1 reply; 25+ messages in thread
From: Tobias Burnus @ 2024-01-26 16:45 UTC (permalink / raw)
  To: Thomas Schwinge, Andrew Stubbs; +Cc: gcc-patches, rguenther


[-- Attachment #1.1: Type: text/plain, Size: 845 bytes --]

Hi,

Thomas Schwinge wrote:
> amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to 
> the docs
> ...
> Further down in that file, we state:
>      @anchor{amdgcn-x-amdhsa}
>      @heading amdgcn-*-amdhsa
>      AMD GCN GPU target.
>      
>      Instead of GNU Binutils, you will need to install LLVM 13.0.1, or later, [...]
>
> LLVM 13.0.1 may still be fine for gfx1030
> ('[...]/amdgcn-amdhsa/gfx1030/libgcc' does get built; I've not further
> tested), but it's not sufficient for gfx1100 anymore:

Testing with the system compilers here, llvm-mc-14.0.6 also fails while 
llvm-mc-15.0.7 accepts it.

> Which version of LLVM should we be recommending?

 >= LLVM 15, I think. How about the following wording? It still mentions 
LLVM 13.0.1 for those that really need it but with for the default 
setup, it requires 15+.

Tobias

[-- Attachment #2: install-fix.diff --]
[-- Type: text/x-patch, Size: 1340 bytes --]

install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled

gcc/ChangeLog:

	* doc/install.texi (amdgcn): Recommend LLVM 15+ and newlib 4.4+,
	but keep requiring only newlib 4.3+ and, if gfx1100 is disabled,
	LLVM 13.0.1+.

Signed-off-by: Tobias Burnus <tburnus@baylibre.com>

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 5747b5a12fe..c7794439107 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3927,14 +3927,15 @@ This is a synonym for @samp{x86_64-*-solaris2*}.
 @heading amdgcn-*-amdhsa
 AMD GCN GPU target.
 
-Instead of GNU Binutils, you will need to install LLVM 13.0.1, or later, and copy
+Instead of GNU Binutils, you will need to install LLVM 15, or later, and copy
 @file{bin/llvm-mc} to @file{amdgcn-amdhsa/bin/as},
 @file{bin/lld} to @file{amdgcn-amdhsa/bin/ld},
 @file{bin/llvm-nm} to @file{amdgcn-amdhsa/bin/nm}, and
 @file{bin/llvm-ar} to both @file{bin/amdgcn-amdhsa-ar} and
-@file{bin/amdgcn-amdhsa-ranlib}.
+@file{bin/amdgcn-amdhsa-ranlib}.  Note that LLVM 13.0.1 or LLVM 14 can be used
+by specifying a @code{--with-multilib-list=} that does not list @code{gfx1100}.
 
-Use Newlib (4.3.0 or newer).
+Use Newlib (4.3.0 or newer; 4.4.0 or later is recommended).
 
 To run the binaries, install the HSA Runtime from the
 @uref{https://rocm.docs.amd.com/,,ROCm Platform}, and use

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch] install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled
  2024-01-26 16:45         ` [patch] install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled (was: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs) Tobias Burnus
@ 2024-01-29 10:01           ` Andrew Stubbs
  0 siblings, 0 replies; 25+ messages in thread
From: Andrew Stubbs @ 2024-01-29 10:01 UTC (permalink / raw)
  To: Tobias Burnus, Thomas Schwinge; +Cc: gcc-patches, rguenther

On 26/01/2024 16:45, Tobias Burnus wrote:
> Hi,
> 
> Thomas Schwinge wrote:
>> amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to 
>> the docs
>> ...
>> Further down in that file, we state:
>>      @anchor{amdgcn-x-amdhsa}
>>      @heading amdgcn-*-amdhsa
>>      AMD GCN GPU target.
>>      
>>      Instead of GNU Binutils, you will need to install LLVM 13.0.1, or later, [...]
>>
>> LLVM 13.0.1 may still be fine for gfx1030
>> ('[...]/amdgcn-amdhsa/gfx1030/libgcc' does get built; I've not further
>> tested), but it's not sufficient for gfx1100 anymore:
> 
> Testing with the system compilers here, llvm-mc-14.0.6 also fails while 
> llvm-mc-15.0.7 accepts it.
> 
>> Which version of LLVM should we be recommending?
> 
>  >= LLVM 15, I think. How about the following wording? It still mentions 
> LLVM 13.0.1 for those that really need it but with for the default 
> setup, it requires 15+.

OK.

Andrew



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] amdgcn: additional gfx1100 support
  2024-01-24 12:43 [PATCH] amdgcn: additional gfx1100 support Andrew Stubbs
  2024-01-24 16:01 ` [patch] amdgcn: config.gcc - enable gfx1100 multilib; add gfx1100 to docs (was: [PATCH] amdgcn: additional gfx1100 support) Tobias Burnus
@ 2024-01-26  8:56 ` Richard Biener
  2024-01-26  9:45   ` Richard Biener
  2024-03-06 13:49 ` amdgcn: additional gfx1030/gfx1100 support: adjust test cases (was: [PATCH] amdgcn: additional gfx1100 support) Thomas Schwinge
  2 siblings, 1 reply; 25+ messages in thread
From: Richard Biener @ 2024-01-26  8:56 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches, pa

On Wed, 24 Jan 2024, Andrew Stubbs wrote:

> This is enough to get gfx1100 working for most purposes, on top of the
> patch that Tobias committed a week or so ago; there are still some test
> failures to investigate, and probably some tuning to do.
> 
> It might also get gfx1030 working too. @Richi, could you test it,
> please?

I can report partial success here.  I do see quite some FAILs because of

/space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90: 
In function 'accum_._omp_fn.1':^M
/space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90:20:38: 
error: unrecognizable insn:^M
(insn 108 107 109 6 (set (reg:V8SF 849)^M
        (unspec:V8SF [^M
                (reg:V8SF 844 [ vect__43.12_106 ]) repeated x2^M
                (const_int 1 [0x1])^M
            ] UNSPEC_PLUS_DPP_SHR)) 
"/space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90":22:29 
discrim 1 -1^M
     (nil))^M
during RTL pass: vregs^M
/space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90:20:38: 
internal compiler error: in extract_insn, at recog.cc:2812^M

there are also quite a number of execution FAILs like

icv-5.exe: 
/space/rguenther/src/gcc-autopar_devel/libgomp/plugin/plugin-gcn.c:2462: 
isa_matches_agent: Assertion `agent_isa_s' failed.
FAIL: libgomp.c/../libgomp.c-c++-common/icv-5.c execution test

(the assert in question looks bad - yeah, somehow we got past
device initialization - not sure how - but end up here)

Maybe HSA behaves odd here - I didn't constrain the device it should
choose but it works (most of the time).  GCN_DEBUG prints me all the
HSA agents available but I don't see any debug on which agent
is actually initialized during libgomp device init (at least nothing
I can easily match up).  Maybe sth to improve.

I'll followup with a test summary once the (serial) run of libgomp
testing finished.  At least there are quite some number of
actual kernel executions and PASSing testcases.

Richard.

> I can't test the other multilibs right now. @PA, can you test it please?
> 
> I can self-approve the patch, but I'll hold off the commit until the
> test results come back.
> 
> Andrew
> 
> gcc/ChangeLog:
> 
> 	* config/gcn/gcn-opts.h (TARGET_PACKED_WORK_ITEMS): Add TARGET_RDNA3.
> 	* config/gcn/gcn-valu.md (all_convert): New iterator.
> 	(<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): New
> 	define_expand, and rename the old one to ...
> 	(*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): ... this.
> 	(extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): Likewise, to ...
> 	(extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): .. this.
> 	(*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>): New.
> 	* config/gcn/gcn.cc (gcn_global_address_p): Use "offsetbits" correctly.
> 	(gcn_hsa_declare_function_name): Update the vgpr counting for gfx1100.
> 	* config/gcn/gcn.md (<u>mulhisi3): Disable on RDNA3.
> 	(<u>mulqihi3_scalar): Likewise.
> 
> libgcc/ChangeLog:
> 
> 	* config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Handle RDNA3.
> 
> libgomp/ChangeLog:
> 
> 	* config/gcn/time.c (RTC_TICKS): Configure RDNA3.
> 	(omp_get_wtime): Add RDNA3-compatible variant.
> 	* plugin/plugin-gcn.c (max_isa_vgprs): Tune for gfx1030 and gfx1100.
> 
> Signed-off-by:  Andrew Stubbs <ams@baylibre.com>
> ---
>  gcc/config/gcn/gcn-opts.h         |  2 +-
>  gcc/config/gcn/gcn-valu.md        | 41 ++++++++++++++++++++++++++++---
>  gcc/config/gcn/gcn.cc             | 31 ++++++++++++++++-------
>  gcc/config/gcn/gcn.md             |  4 +--
>  libgcc/config/gcn/amdgcn_veclib.h |  2 +-
>  libgomp/config/gcn/time.c         | 10 ++++++++
>  libgomp/plugin/plugin-gcn.c       |  6 +++--
>  7 files changed, 77 insertions(+), 19 deletions(-)
> 
> diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h
> index 79fbda3ab25..6be2c9204fa 100644
> --- a/gcc/config/gcn/gcn-opts.h
> +++ b/gcc/config/gcn/gcn-opts.h
> @@ -62,7 +62,7 @@ extern enum gcn_isa {
>  
>  
>  #define TARGET_M0_LDS_LIMIT (TARGET_GCN3)
> -#define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS)
> +#define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS || TARGET_RDNA3)
>  
>  #define TARGET_XNACK (flag_xnack != HSACO_ATTR_OFF)
>  
> diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
> index 3d5b6271ee6..cd027f8b369 100644
> --- a/gcc/config/gcn/gcn-valu.md
> +++ b/gcc/config/gcn/gcn-valu.md
> @@ -3555,30 +3555,63 @@
>  ;; }}}
>  ;; {{{ Int/int conversions
>  
> +(define_code_iterator all_convert [truncate zero_extend sign_extend])
>  (define_code_iterator zero_convert [truncate zero_extend])
>  (define_code_attr convop [
>  	(sign_extend "extend")
>  	(zero_extend "zero_extend")
>  	(truncate "trunc")])
>  
> -(define_insn "<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>"
> +(define_expand "<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>"
> +  [(set (match_operand:V_INT_1REG 0 "register_operand"      "=v")
> +        (all_convert:V_INT_1REG
> +	  (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
> +  "")
> +
> +(define_insn "*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>"
>    [(set (match_operand:V_INT_1REG 0 "register_operand"      "=v")
>          (zero_convert:V_INT_1REG
>  	  (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
> -  ""
> +  "!TARGET_RDNA3"
>    "v_mov_b32_sdwa\t%0, %1 dst_sel:<V_INT_1REG:sdwa> dst_unused:UNUSED_PAD src0_sel:<V_INT_1REG_ALT:sdwa>"
>    [(set_attr "type" "vop_sdwa")
>     (set_attr "length" "8")])
>  
> -(define_insn "extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>"
> +(define_insn "extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>"
>    [(set (match_operand:V_INT_1REG 0 "register_operand"	    "=v")
>          (sign_extend:V_INT_1REG
>  	  (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
> -  ""
> +  "!TARGET_RDNA3"
>    "v_mov_b32_sdwa\t%0, sext(%1) src0_sel:<V_INT_1REG_ALT:sdwa>"
>    [(set_attr "type" "vop_sdwa")
>     (set_attr "length" "8")])
>  
> +(define_insn "*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>"
> +  [(set (match_operand:V_INT_1REG 0 "register_operand"      "=v")
> +        (all_convert:V_INT_1REG
> +	  (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
> +  "TARGET_RDNA3"
> +  {
> +    enum {extend, zero_extend, trunc};
> +    rtx shiftwidth = (<V_INT_1REG_ALT:SCALAR_MODE>mode == QImode
> +		      || <V_INT_1REG:SCALAR_MODE>mode == QImode
> +		      ? GEN_INT (24)
> +		      : <V_INT_1REG_ALT:SCALAR_MODE>mode == HImode
> +		        || <V_INT_1REG:SCALAR_MODE>mode == HImode
> +		      ? GEN_INT (16)
> +		      : NULL);
> +    operands[2] = shiftwidth;
> +
> +    if (!shiftwidth)
> +      return "v_mov_b32 %0, %1";
> +    else if (<convop> == extend || <convop> == trunc)
> +      return "v_lshlrev_b32\t%0, %2, %1\;v_ashrrev_i32\t%0, %2, %0";
> +    else
> +      return "v_lshlrev_b32\t%0, %2, %1\;v_lshrrev_b32\t%0, %2, %0";
> +  }
> +  [(set_attr "type" "mult")
> +   (set_attr "length" "8")])
> +
>  ;; GCC can already do these for scalar types, but not for vector types.
>  ;; Unfortunately you can't just do SUBREG on a vector to select the low part,
>  ;; so there must be a few tricks here.
> diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
> index e668ce7c69e..e80de2ce056 100644
> --- a/gcc/config/gcn/gcn.cc
> +++ b/gcc/config/gcn/gcn.cc
> @@ -1597,8 +1597,8 @@ gcn_global_address_p (rtx addr)
>        rtx offset = XEXP (addr, 1);
>        int offsetbits = (TARGET_RDNA2_PLUS ? 11 : 12);
>        bool immediate_p = (CONST_INT_P (offset)
> -			  && INTVAL (offset) >= -(1 << 12)
> -			  && INTVAL (offset) < (1 << 12));
> +			  && INTVAL (offset) >= -(1 << offsetbits)
> +			  && INTVAL (offset) < (1 << offsetbits));
>  
>        if ((gcn_address_register_p (base, DImode, false)
>  	   || gcn_vec_address_register_p (base, DImode, false))
> @@ -6597,8 +6597,10 @@ gcn_hsa_declare_function_name (FILE *file, const char *name,
>      if (df_regs_ever_live_p (FIRST_AVGPR_REG + avgpr))
>        break;
>    avgpr++;
> -  vgpr = (vgpr + 3) & ~3;
> -  avgpr = (avgpr + 3) & ~3;
> +
> +  /* The main function epilogue uses v8, but df doesn't see that.  */
> +  if (vgpr < 9)
> +    vgpr = 9;
>  
>    if (!leaf_function_p ())
>      {
> @@ -6611,9 +6613,18 @@ gcn_hsa_declare_function_name (FILE *file, const char *name,
>  	avgpr = MAX_NORMAL_AVGPR_COUNT;
>      }
>  
> -  /* The gfx90a accum_offset field can't represent 0 registers.  */
> -  if (gcn_arch == PROCESSOR_GFX90a && vgpr < 4)
> -    vgpr = 4;
> +  /* SIMD32 devices count double in wavefront64 mode.  */
> +  if (TARGET_RDNA2_PLUS)
> +    vgpr *= 2;
> +
> +  /* Round up to the allocation block size.  */
> +  int vgpr_block_size = (TARGET_RDNA3 ? 12
> +			 : TARGET_RDNA2_PLUS || TARGET_CDNA2_PLUS ? 8
> +			 : 4);
> +  if (vgpr % vgpr_block_size)
> +    vgpr += vgpr_block_size - (vgpr % vgpr_block_size);
> +  if (avgpr % vgpr_block_size)
> +    avgpr += vgpr_block_size - (avgpr % vgpr_block_size);
>  
>    fputs ("\t.rodata\n"
>  	 "\t.p2align\t6\n"
> @@ -6714,12 +6725,14 @@ gcn_hsa_declare_function_name (FILE *file, const char *name,
>  	   "            .private_segment_fixed_size: 0\n"
>  	   "            .wavefront_size: 64\n"
>  	   "            .sgpr_count: %i\n"
> -	   "            .vgpr_count: %i\n"
> +	   "            .vgpr_count: %i%s\n"
>  	   "            .max_flat_workgroup_size: 1024\n",
>  	   cfun->machine->kernarg_segment_byte_size,
>  	   cfun->machine->kernarg_segment_alignment,
>  	   LDS_SIZE,
> -	   sgpr, next_free_vgpr);
> +	   sgpr, next_free_vgpr,
> +	   (TARGET_RDNA2_PLUS ? " ; wavefrontsize64 counts double on SIMD32"
> +	    : ""));
>    if (gcn_arch == PROCESSOR_GFX90a || gcn_arch == PROCESSOR_GFX908)
>      fprintf (file, "            .agpr_count: %i\n", avgpr);
>    fputs ("        .end_amdgpu_metadata\n", file);
> diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
> index 492b833e255..1f3c692b7a6 100644
> --- a/gcc/config/gcn/gcn.md
> +++ b/gcc/config/gcn/gcn.md
> @@ -1618,7 +1618,7 @@
>  	(mult:SI
>  	  (any_extend:SI (match_operand:HI 1 "register_operand" "%v"))
>  	  (any_extend:SI (match_operand:HI 2 "register_operand" " v"))))]
> -  ""
> +  "!TARGET_RDNA3"
>    "v_mul_<iu>32_<iu>24_sdwa\t%0, %<e>1, %<e>2 src0_sel:WORD_0 src1_sel:WORD_0"
>    [(set_attr "type" "vop_sdwa")
>     (set_attr "length" "8")])
> @@ -1628,7 +1628,7 @@
>  	(mult:HI
>  	  (any_extend:HI (match_operand:QI 1 "register_operand" "%v"))
>  	  (any_extend:HI (match_operand:QI 2 "register_operand" " v"))))]
> -  ""
> +  "!TARGET_RDNA3"
>    "v_mul_<iu>32_<iu>24_sdwa\t%0, %<e>1, %<e>2 src0_sel:BYTE_0 src1_sel:BYTE_0"
>    [(set_attr "type" "vop_sdwa")
>     (set_attr "length" "8")])
> diff --git a/libgcc/config/gcn/amdgcn_veclib.h b/libgcc/config/gcn/amdgcn_veclib.h
> index 821f6386dd6..d268c6cac16 100644
> --- a/libgcc/config/gcn/amdgcn_veclib.h
> +++ b/libgcc/config/gcn/amdgcn_veclib.h
> @@ -230,7 +230,7 @@ do { \
>  
>  #if defined (__GCN3__) || defined (__GCN5__) \
>      || defined (__CDNA1__) || defined (__CDNA2__) \
> -    || defined (__RDNA2__)
> +    || defined (__RDNA2__) || defined (__RDNA3__)
>  #define CDNA3_PLUS 0
>  #else
>  #define CDNA3_PLUS 1
> diff --git a/libgomp/config/gcn/time.c b/libgomp/config/gcn/time.c
> index 30a0d0188e4..efcd04f5f43 100644
> --- a/libgomp/config/gcn/time.c
> +++ b/libgomp/config/gcn/time.c
> @@ -30,15 +30,25 @@
>  /* According to AMD:
>      dGPU RTC is 27MHz
>      AGPU RTC is 100MHz
> +    RDNA3 ISA manual states "typically 100MHz"
>     FIXME: DTRT on an APU.  */
> +#ifdef __RDNA3__
> +#define RTC_TICKS (1.0 / 100000000.0) /* 100MHz */
> +#else
>  #define RTC_TICKS (1.0 / 27000000.0) /* 27MHz */
> +#endif
>  
>  double
>  omp_get_wtime (void)
>  {
>    uint64_t clock;
> +#ifdef __RDNA3__
> +  asm ("s_sendmsg_rtn_b64 %0 0x83 ;Get REALTIME\n\t"
> +       "s_waitcnt 0" : "=r" (clock));
> +#else
>    asm ("s_memrealtime %0\n\t"
>         "s_waitcnt 0" : "=r" (clock));
> +#endif
>    return clock * RTC_TICKS;
>  }
>  
> diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
> index 0339848451e..db28781dedb 100644
> --- a/libgomp/plugin/plugin-gcn.c
> +++ b/libgomp/plugin/plugin-gcn.c
> @@ -1741,11 +1741,13 @@ max_isa_vgprs (int isa)
>      case EF_AMDGPU_MACH_AMDGCN_GFX900:
>      case EF_AMDGPU_MACH_AMDGCN_GFX906:
>      case EF_AMDGPU_MACH_AMDGCN_GFX908:
> -    case EF_AMDGPU_MACH_AMDGCN_GFX1030:
> -    case EF_AMDGPU_MACH_AMDGCN_GFX1100:
>        return 256;
>      case EF_AMDGPU_MACH_AMDGCN_GFX90a:
>        return 512;
> +    case EF_AMDGPU_MACH_AMDGCN_GFX1030:
> +      return 512;  /* 512 SIMD32 = 256 wavefrontsize64.  */
> +    case EF_AMDGPU_MACH_AMDGCN_GFX1100:
> +      return 1536; /* 1536 SIMD32 = 768 wavefrontsize64.  */
>      }
>    GOMP_PLUGIN_fatal ("unhandled ISA in max_isa_vgprs");
>  }
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] amdgcn: additional gfx1100 support
  2024-01-26  8:56 ` [PATCH] amdgcn: additional gfx1100 support Richard Biener
@ 2024-01-26  9:45   ` Richard Biener
  2024-01-26 10:19     ` Andrew Stubbs
  2024-02-01 14:41     ` libgomp GCN gfx1030/gfx1100 offloading status (was: [PATCH] amdgcn: additional gfx1100 support) Thomas Schwinge
  0 siblings, 2 replies; 25+ messages in thread
From: Richard Biener @ 2024-01-26  9:45 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches, pa

[-- Attachment #1: Type: text/plain, Size: 14213 bytes --]

On Fri, 26 Jan 2024, Richard Biener wrote:

> On Wed, 24 Jan 2024, Andrew Stubbs wrote:
> 
> > This is enough to get gfx1100 working for most purposes, on top of the
> > patch that Tobias committed a week or so ago; there are still some test
> > failures to investigate, and probably some tuning to do.
> > 
> > It might also get gfx1030 working too. @Richi, could you test it,
> > please?
> 
> I can report partial success here.  I do see quite some FAILs because of
> 
> /space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90: 
> In function 'accum_._omp_fn.1':^M
> /space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90:20:38: 
> error: unrecognizable insn:^M
> (insn 108 107 109 6 (set (reg:V8SF 849)^M
>         (unspec:V8SF [^M
>                 (reg:V8SF 844 [ vect__43.12_106 ]) repeated x2^M
>                 (const_int 1 [0x1])^M
>             ] UNSPEC_PLUS_DPP_SHR)) 
> "/space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90":22:29 
> discrim 1 -1^M
>      (nil))^M
> during RTL pass: vregs^M
> /space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90:20:38: 
> internal compiler error: in extract_insn, at recog.cc:2812^M
> 
> there are also quite a number of execution FAILs like
> 
> icv-5.exe: 
> /space/rguenther/src/gcc-autopar_devel/libgomp/plugin/plugin-gcn.c:2462: 
> isa_matches_agent: Assertion `agent_isa_s' failed.
> FAIL: libgomp.c/../libgomp.c-c++-common/icv-5.c execution test
> 
> (the assert in question looks bad - yeah, somehow we got past
> device initialization - not sure how - but end up here)
> 
> Maybe HSA behaves odd here - I didn't constrain the device it should
> choose but it works (most of the time).  GCN_DEBUG prints me all the
> HSA agents available but I don't see any debug on which agent
> is actually initialized during libgomp device init (at least nothing
> I can easily match up).  Maybe sth to improve.
> 
> I'll followup with a test summary once the (serial) run of libgomp
> testing finished.  At least there are quite some number of
> actual kernel executions and PASSing testcases.

                === libgomp Summary ===

# of expected passes            29126
# of unexpected failures        697
# of unexpected successes       1
# of expected failures          703
# of unresolved testcases       318
# of unsupported tests          766

full summary attached (compressed).  Even compressed libgomp.log is
too big to send.

Richard.

> 
> Richard.
> 
> > I can't test the other multilibs right now. @PA, can you test it please?
> > 
> > I can self-approve the patch, but I'll hold off the commit until the
> > test results come back.
> > 
> > Andrew
> > 
> > gcc/ChangeLog:
> > 
> > 	* config/gcn/gcn-opts.h (TARGET_PACKED_WORK_ITEMS): Add TARGET_RDNA3.
> > 	* config/gcn/gcn-valu.md (all_convert): New iterator.
> > 	(<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): New
> > 	define_expand, and rename the old one to ...
> > 	(*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): ... this.
> > 	(extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): Likewise, to ...
> > 	(extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): .. this.
> > 	(*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>): New.
> > 	* config/gcn/gcn.cc (gcn_global_address_p): Use "offsetbits" correctly.
> > 	(gcn_hsa_declare_function_name): Update the vgpr counting for gfx1100.
> > 	* config/gcn/gcn.md (<u>mulhisi3): Disable on RDNA3.
> > 	(<u>mulqihi3_scalar): Likewise.
> > 
> > libgcc/ChangeLog:
> > 
> > 	* config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Handle RDNA3.
> > 
> > libgomp/ChangeLog:
> > 
> > 	* config/gcn/time.c (RTC_TICKS): Configure RDNA3.
> > 	(omp_get_wtime): Add RDNA3-compatible variant.
> > 	* plugin/plugin-gcn.c (max_isa_vgprs): Tune for gfx1030 and gfx1100.
> > 
> > Signed-off-by:  Andrew Stubbs <ams@baylibre.com>
> > ---
> >  gcc/config/gcn/gcn-opts.h         |  2 +-
> >  gcc/config/gcn/gcn-valu.md        | 41 ++++++++++++++++++++++++++++---
> >  gcc/config/gcn/gcn.cc             | 31 ++++++++++++++++-------
> >  gcc/config/gcn/gcn.md             |  4 +--
> >  libgcc/config/gcn/amdgcn_veclib.h |  2 +-
> >  libgomp/config/gcn/time.c         | 10 ++++++++
> >  libgomp/plugin/plugin-gcn.c       |  6 +++--
> >  7 files changed, 77 insertions(+), 19 deletions(-)
> > 
> > diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h
> > index 79fbda3ab25..6be2c9204fa 100644
> > --- a/gcc/config/gcn/gcn-opts.h
> > +++ b/gcc/config/gcn/gcn-opts.h
> > @@ -62,7 +62,7 @@ extern enum gcn_isa {
> >  
> >  
> >  #define TARGET_M0_LDS_LIMIT (TARGET_GCN3)
> > -#define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS)
> > +#define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS || TARGET_RDNA3)
> >  
> >  #define TARGET_XNACK (flag_xnack != HSACO_ATTR_OFF)
> >  
> > diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
> > index 3d5b6271ee6..cd027f8b369 100644
> > --- a/gcc/config/gcn/gcn-valu.md
> > +++ b/gcc/config/gcn/gcn-valu.md
> > @@ -3555,30 +3555,63 @@
> >  ;; }}}
> >  ;; {{{ Int/int conversions
> >  
> > +(define_code_iterator all_convert [truncate zero_extend sign_extend])
> >  (define_code_iterator zero_convert [truncate zero_extend])
> >  (define_code_attr convop [
> >  	(sign_extend "extend")
> >  	(zero_extend "zero_extend")
> >  	(truncate "trunc")])
> >  
> > -(define_insn "<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>"
> > +(define_expand "<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>"
> > +  [(set (match_operand:V_INT_1REG 0 "register_operand"      "=v")
> > +        (all_convert:V_INT_1REG
> > +	  (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
> > +  "")
> > +
> > +(define_insn "*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>"
> >    [(set (match_operand:V_INT_1REG 0 "register_operand"      "=v")
> >          (zero_convert:V_INT_1REG
> >  	  (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
> > -  ""
> > +  "!TARGET_RDNA3"
> >    "v_mov_b32_sdwa\t%0, %1 dst_sel:<V_INT_1REG:sdwa> dst_unused:UNUSED_PAD src0_sel:<V_INT_1REG_ALT:sdwa>"
> >    [(set_attr "type" "vop_sdwa")
> >     (set_attr "length" "8")])
> >  
> > -(define_insn "extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>"
> > +(define_insn "extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>"
> >    [(set (match_operand:V_INT_1REG 0 "register_operand"	    "=v")
> >          (sign_extend:V_INT_1REG
> >  	  (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
> > -  ""
> > +  "!TARGET_RDNA3"
> >    "v_mov_b32_sdwa\t%0, sext(%1) src0_sel:<V_INT_1REG_ALT:sdwa>"
> >    [(set_attr "type" "vop_sdwa")
> >     (set_attr "length" "8")])
> >  
> > +(define_insn "*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>"
> > +  [(set (match_operand:V_INT_1REG 0 "register_operand"      "=v")
> > +        (all_convert:V_INT_1REG
> > +	  (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
> > +  "TARGET_RDNA3"
> > +  {
> > +    enum {extend, zero_extend, trunc};
> > +    rtx shiftwidth = (<V_INT_1REG_ALT:SCALAR_MODE>mode == QImode
> > +		      || <V_INT_1REG:SCALAR_MODE>mode == QImode
> > +		      ? GEN_INT (24)
> > +		      : <V_INT_1REG_ALT:SCALAR_MODE>mode == HImode
> > +		        || <V_INT_1REG:SCALAR_MODE>mode == HImode
> > +		      ? GEN_INT (16)
> > +		      : NULL);
> > +    operands[2] = shiftwidth;
> > +
> > +    if (!shiftwidth)
> > +      return "v_mov_b32 %0, %1";
> > +    else if (<convop> == extend || <convop> == trunc)
> > +      return "v_lshlrev_b32\t%0, %2, %1\;v_ashrrev_i32\t%0, %2, %0";
> > +    else
> > +      return "v_lshlrev_b32\t%0, %2, %1\;v_lshrrev_b32\t%0, %2, %0";
> > +  }
> > +  [(set_attr "type" "mult")
> > +   (set_attr "length" "8")])
> > +
> >  ;; GCC can already do these for scalar types, but not for vector types.
> >  ;; Unfortunately you can't just do SUBREG on a vector to select the low part,
> >  ;; so there must be a few tricks here.
> > diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
> > index e668ce7c69e..e80de2ce056 100644
> > --- a/gcc/config/gcn/gcn.cc
> > +++ b/gcc/config/gcn/gcn.cc
> > @@ -1597,8 +1597,8 @@ gcn_global_address_p (rtx addr)
> >        rtx offset = XEXP (addr, 1);
> >        int offsetbits = (TARGET_RDNA2_PLUS ? 11 : 12);
> >        bool immediate_p = (CONST_INT_P (offset)
> > -			  && INTVAL (offset) >= -(1 << 12)
> > -			  && INTVAL (offset) < (1 << 12));
> > +			  && INTVAL (offset) >= -(1 << offsetbits)
> > +			  && INTVAL (offset) < (1 << offsetbits));
> >  
> >        if ((gcn_address_register_p (base, DImode, false)
> >  	   || gcn_vec_address_register_p (base, DImode, false))
> > @@ -6597,8 +6597,10 @@ gcn_hsa_declare_function_name (FILE *file, const char *name,
> >      if (df_regs_ever_live_p (FIRST_AVGPR_REG + avgpr))
> >        break;
> >    avgpr++;
> > -  vgpr = (vgpr + 3) & ~3;
> > -  avgpr = (avgpr + 3) & ~3;
> > +
> > +  /* The main function epilogue uses v8, but df doesn't see that.  */
> > +  if (vgpr < 9)
> > +    vgpr = 9;
> >  
> >    if (!leaf_function_p ())
> >      {
> > @@ -6611,9 +6613,18 @@ gcn_hsa_declare_function_name (FILE *file, const char *name,
> >  	avgpr = MAX_NORMAL_AVGPR_COUNT;
> >      }
> >  
> > -  /* The gfx90a accum_offset field can't represent 0 registers.  */
> > -  if (gcn_arch == PROCESSOR_GFX90a && vgpr < 4)
> > -    vgpr = 4;
> > +  /* SIMD32 devices count double in wavefront64 mode.  */
> > +  if (TARGET_RDNA2_PLUS)
> > +    vgpr *= 2;
> > +
> > +  /* Round up to the allocation block size.  */
> > +  int vgpr_block_size = (TARGET_RDNA3 ? 12
> > +			 : TARGET_RDNA2_PLUS || TARGET_CDNA2_PLUS ? 8
> > +			 : 4);
> > +  if (vgpr % vgpr_block_size)
> > +    vgpr += vgpr_block_size - (vgpr % vgpr_block_size);
> > +  if (avgpr % vgpr_block_size)
> > +    avgpr += vgpr_block_size - (avgpr % vgpr_block_size);
> >  
> >    fputs ("\t.rodata\n"
> >  	 "\t.p2align\t6\n"
> > @@ -6714,12 +6725,14 @@ gcn_hsa_declare_function_name (FILE *file, const char *name,
> >  	   "            .private_segment_fixed_size: 0\n"
> >  	   "            .wavefront_size: 64\n"
> >  	   "            .sgpr_count: %i\n"
> > -	   "            .vgpr_count: %i\n"
> > +	   "            .vgpr_count: %i%s\n"
> >  	   "            .max_flat_workgroup_size: 1024\n",
> >  	   cfun->machine->kernarg_segment_byte_size,
> >  	   cfun->machine->kernarg_segment_alignment,
> >  	   LDS_SIZE,
> > -	   sgpr, next_free_vgpr);
> > +	   sgpr, next_free_vgpr,
> > +	   (TARGET_RDNA2_PLUS ? " ; wavefrontsize64 counts double on SIMD32"
> > +	    : ""));
> >    if (gcn_arch == PROCESSOR_GFX90a || gcn_arch == PROCESSOR_GFX908)
> >      fprintf (file, "            .agpr_count: %i\n", avgpr);
> >    fputs ("        .end_amdgpu_metadata\n", file);
> > diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
> > index 492b833e255..1f3c692b7a6 100644
> > --- a/gcc/config/gcn/gcn.md
> > +++ b/gcc/config/gcn/gcn.md
> > @@ -1618,7 +1618,7 @@
> >  	(mult:SI
> >  	  (any_extend:SI (match_operand:HI 1 "register_operand" "%v"))
> >  	  (any_extend:SI (match_operand:HI 2 "register_operand" " v"))))]
> > -  ""
> > +  "!TARGET_RDNA3"
> >    "v_mul_<iu>32_<iu>24_sdwa\t%0, %<e>1, %<e>2 src0_sel:WORD_0 src1_sel:WORD_0"
> >    [(set_attr "type" "vop_sdwa")
> >     (set_attr "length" "8")])
> > @@ -1628,7 +1628,7 @@
> >  	(mult:HI
> >  	  (any_extend:HI (match_operand:QI 1 "register_operand" "%v"))
> >  	  (any_extend:HI (match_operand:QI 2 "register_operand" " v"))))]
> > -  ""
> > +  "!TARGET_RDNA3"
> >    "v_mul_<iu>32_<iu>24_sdwa\t%0, %<e>1, %<e>2 src0_sel:BYTE_0 src1_sel:BYTE_0"
> >    [(set_attr "type" "vop_sdwa")
> >     (set_attr "length" "8")])
> > diff --git a/libgcc/config/gcn/amdgcn_veclib.h b/libgcc/config/gcn/amdgcn_veclib.h
> > index 821f6386dd6..d268c6cac16 100644
> > --- a/libgcc/config/gcn/amdgcn_veclib.h
> > +++ b/libgcc/config/gcn/amdgcn_veclib.h
> > @@ -230,7 +230,7 @@ do { \
> >  
> >  #if defined (__GCN3__) || defined (__GCN5__) \
> >      || defined (__CDNA1__) || defined (__CDNA2__) \
> > -    || defined (__RDNA2__)
> > +    || defined (__RDNA2__) || defined (__RDNA3__)
> >  #define CDNA3_PLUS 0
> >  #else
> >  #define CDNA3_PLUS 1
> > diff --git a/libgomp/config/gcn/time.c b/libgomp/config/gcn/time.c
> > index 30a0d0188e4..efcd04f5f43 100644
> > --- a/libgomp/config/gcn/time.c
> > +++ b/libgomp/config/gcn/time.c
> > @@ -30,15 +30,25 @@
> >  /* According to AMD:
> >      dGPU RTC is 27MHz
> >      AGPU RTC is 100MHz
> > +    RDNA3 ISA manual states "typically 100MHz"
> >     FIXME: DTRT on an APU.  */
> > +#ifdef __RDNA3__
> > +#define RTC_TICKS (1.0 / 100000000.0) /* 100MHz */
> > +#else
> >  #define RTC_TICKS (1.0 / 27000000.0) /* 27MHz */
> > +#endif
> >  
> >  double
> >  omp_get_wtime (void)
> >  {
> >    uint64_t clock;
> > +#ifdef __RDNA3__
> > +  asm ("s_sendmsg_rtn_b64 %0 0x83 ;Get REALTIME\n\t"
> > +       "s_waitcnt 0" : "=r" (clock));
> > +#else
> >    asm ("s_memrealtime %0\n\t"
> >         "s_waitcnt 0" : "=r" (clock));
> > +#endif
> >    return clock * RTC_TICKS;
> >  }
> >  
> > diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
> > index 0339848451e..db28781dedb 100644
> > --- a/libgomp/plugin/plugin-gcn.c
> > +++ b/libgomp/plugin/plugin-gcn.c
> > @@ -1741,11 +1741,13 @@ max_isa_vgprs (int isa)
> >      case EF_AMDGPU_MACH_AMDGCN_GFX900:
> >      case EF_AMDGPU_MACH_AMDGCN_GFX906:
> >      case EF_AMDGPU_MACH_AMDGCN_GFX908:
> > -    case EF_AMDGPU_MACH_AMDGCN_GFX1030:
> > -    case EF_AMDGPU_MACH_AMDGCN_GFX1100:
> >        return 256;
> >      case EF_AMDGPU_MACH_AMDGCN_GFX90a:
> >        return 512;
> > +    case EF_AMDGPU_MACH_AMDGCN_GFX1030:
> > +      return 512;  /* 512 SIMD32 = 256 wavefrontsize64.  */
> > +    case EF_AMDGPU_MACH_AMDGCN_GFX1100:
> > +      return 1536; /* 1536 SIMD32 = 768 wavefrontsize64.  */
> >      }
> >    GOMP_PLUGIN_fatal ("unhandled ISA in max_isa_vgprs");
> >  }
> > 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

[-- Attachment #2: test summary --]
[-- Type: application/x-gzip, Size: 6007 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] amdgcn: additional gfx1100 support
  2024-01-26  9:45   ` Richard Biener
@ 2024-01-26 10:19     ` Andrew Stubbs
  2024-01-26 10:22       ` Richard Biener
  2024-02-01 14:41     ` libgomp GCN gfx1030/gfx1100 offloading status (was: [PATCH] amdgcn: additional gfx1100 support) Thomas Schwinge
  1 sibling, 1 reply; 25+ messages in thread
From: Andrew Stubbs @ 2024-01-26 10:19 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, pa

On 26/01/2024 09:45, Richard Biener wrote:
> On Fri, 26 Jan 2024, Richard Biener wrote:
> 
>                  === libgomp Summary ===
> 
> # of expected passes            29126
> # of unexpected failures        697
> # of unexpected successes       1
> # of expected failures          703
> # of unresolved testcases       318
> # of unsupported tests          766
> 
> full summary attached (compressed).  Even compressed libgomp.log is
> too big to send.
> 
> Richard.

I think this is good enough to start with. PA reported clean results for 
everything except gfx900 (looks like an unrelated issue).

I'll go ahead and commit the patch.

Hopefully Tobias's patch has already trimmed all the "-g" failures from 
that list.

Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] amdgcn: additional gfx1100 support
  2024-01-26 10:19     ` Andrew Stubbs
@ 2024-01-26 10:22       ` Richard Biener
  2024-01-26 10:31         ` Andrew Stubbs
  0 siblings, 1 reply; 25+ messages in thread
From: Richard Biener @ 2024-01-26 10:22 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches, pa

On Fri, 26 Jan 2024, Andrew Stubbs wrote:

> On 26/01/2024 09:45, Richard Biener wrote:
> > On Fri, 26 Jan 2024, Richard Biener wrote:
> > 
> >                  === libgomp Summary ===
> > 
> > # of expected passes            29126
> > # of unexpected failures        697
> > # of unexpected successes       1
> > # of expected failures          703
> > # of unresolved testcases       318
> > # of unsupported tests          766
> > 
> > full summary attached (compressed).  Even compressed libgomp.log is
> > too big to send.
> > 
> > Richard.
> 
> I think this is good enough to start with. PA reported clean results for
> everything except gfx900 (looks like an unrelated issue).
> 
> I'll go ahead and commit the patch.
> 
> Hopefully Tobias's patch has already trimmed all the "-g" failures from that
> list.

Should I open a bug for the ICE?  That's responsible for quite a number
of failures as well.

Richard.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] amdgcn: additional gfx1100 support
  2024-01-26 10:22       ` Richard Biener
@ 2024-01-26 10:31         ` Andrew Stubbs
  0 siblings, 0 replies; 25+ messages in thread
From: Andrew Stubbs @ 2024-01-26 10:31 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, pa

On 26/01/2024 10:22, Richard Biener wrote:
> On Fri, 26 Jan 2024, Andrew Stubbs wrote:
> 
>> On 26/01/2024 09:45, Richard Biener wrote:
>>> On Fri, 26 Jan 2024, Richard Biener wrote:
>>>
>>>                   === libgomp Summary ===
>>>
>>> # of expected passes            29126
>>> # of unexpected failures        697
>>> # of unexpected successes       1
>>> # of expected failures          703
>>> # of unresolved testcases       318
>>> # of unsupported tests          766
>>>
>>> full summary attached (compressed).  Even compressed libgomp.log is
>>> too big to send.
>>>
>>> Richard.
>>
>> I think this is good enough to start with. PA reported clean results for
>> everything except gfx900 (looks like an unrelated issue).
>>
>> I'll go ahead and commit the patch.
>>
>> Hopefully Tobias's patch has already trimmed all the "-g" failures from that
>> list.
> 
> Should I open a bug for the ICE?  That's responsible for quite a number
> of failures as well.

The broken vector reduction instruction? It's a known issue (RDNA 
doesn't support those instructions anymore, and somehow disabling the 
insn isn't enough to stop them being generated), but it doesn't have a 
tracking number, so why not?

Thanks

Andrew


^ permalink raw reply	[flat|nested] 25+ messages in thread

* libgomp GCN gfx1030/gfx1100 offloading status (was: [PATCH] amdgcn: additional gfx1100 support)
  2024-01-26  9:45   ` Richard Biener
  2024-01-26 10:19     ` Andrew Stubbs
@ 2024-02-01 14:41     ` Thomas Schwinge
  2024-02-01 14:49       ` Richard Biener
  1 sibling, 1 reply; 25+ messages in thread
From: Thomas Schwinge @ 2024-02-01 14:41 UTC (permalink / raw)
  To: Richard Biener, Andrew Stubbs, Tobias Burnus; +Cc: gcc-patches

Hi!

On 2024-01-26T10:45:10+0100, Richard Biener <rguenther@suse.de> wrote:
> On Fri, 26 Jan 2024, Richard Biener wrote:
>> On Wed, 24 Jan 2024, Andrew Stubbs wrote:
>> > [...] is enough to get gfx1100 working for most purposes, on top of the
>> > patch that Tobias committed a week or so ago; there are still some test
>> > failures to investigate, and probably some tuning to do.
>> > 
>> > It might also get gfx1030 working too. @Richi, could you test it,
>> > please?
>> 
>> I can report partial success here.  [...]

>> I'll followup with a test summary once the (serial) run of libgomp
>> testing finished.

(Why serial, by the way?)

>> At least there are quite some number of
>> actual kernel executions and PASSing testcases.
>
>                 === libgomp Summary ===
>
> # of expected passes            29126
> # of unexpected failures        697
> # of unexpected successes       1
> # of expected failures          703
> # of unresolved testcases       318
> # of unsupported tests          766
>
> full summary attached (compressed).

Compating your old results ('|     ' prefix in the following) with what I
got with '-march=gfx1100' for AMD Radeon RX 7900 XTX.  My GCC sources are
a few weeks old, but have all the recent fix-up commits cherry-picked,
and a work-around applied for:

    /tmp/ccfrKwEK.mkoffload.2.s:29:27: error: value out of range
              .amdhsa_next_free_vgpr        516
                                            ^~~

(..., to be discussed later.)

There are, I think, no compilation FAILs anymore; I'm only commenting on
execution test FAILs.  Not all FAILs appear all the time (so it follows
that I may be missing a few), and 'libgomp.c++/../libgomp.c-c++-common'
generally behaves similar to 'libgomp.c/../libgomp.c-c++-common', so
omitting the former here.

|     FAIL: libgomp.c/../libgomp.c-c++-common/error-1.c output pattern test

Not seeing that FAIL.

I also see 'libgomp.c-c++-common/for-5.c' FAIL.

|     FAIL: libgomp.c/../libgomp.c-c++-common/icv-5.c execution test
|     FAIL: libgomp.c/../libgomp.c-c++-common/icv-6.c execution test
|     FAIL: libgomp.c/../libgomp.c-c++-common/icv-7.c execution test
|     FAIL: libgomp.c/../libgomp.c-c++-common/icv-9.c execution test

I confirm 'libgomp.c-c++-common/icv-5.c', 'libgomp.c-c++-common/icv-9.c'
FAIL, but 'libgomp.c-c++-common/icv-6.c', 'libgomp.c-c++-common/icv-7.c'
PASS.

|     FAIL: libgomp.c/../libgomp.c-c++-common/non-rect-loop-1.c execution test

Not seeing that FAIL.

|     FAIL: libgomp.c/../libgomp.c-c++-common/reduction-6.c execution test

I confirm that FAIL, and also 'libgomp.c-c++-common/reduction-5.c'
occasionally.

|     FAIL: libgomp.c/../libgomp.c-c++-common/requires-unified-addr-1.c execution test
|     FAIL: libgomp.c/../libgomp.c-c++-common/requires-unified-addr-2.c execution test
|     FAIL: libgomp.c/../libgomp.c-c++-common/target-45.c execution test
|     FAIL: libgomp.c/../libgomp.c-c++-common/target-implicit-map-3.c execution test
|     FAIL: libgomp.c/../libgomp.c-c++-common/target-is-accessible-1.c execution test

Not seeing these FAILs.

I also see 'libgomp.c-c++-common/reverse-offload-1.c' FAIL.

|     FAIL: libgomp.c/../libgomp.c-c++-common/task-detach-6.c execution test
|     WARNING: program timed out.
|     FAIL: libgomp.c/../libgomp.c-c++-common/task-in-explicit-1.c execution test

I confirm these FAILs.

|     FAIL: libgomp.c/../libgomp.c-c++-common/teams-2.c execution test

Known FAIL.

|     FAIL: libgomp.c/../libgomp.c-c++-common/teams-nteams-icv-1.c execution test
|     FAIL: libgomp.c/../libgomp.c-c++-common/teams-nteams-icv-2.c execution test
|     FAIL: libgomp.c/../libgomp.c-c++-common/teams-nteams-icv-3.c execution test
|     FAIL: libgomp.c/../libgomp.c-c++-common/teams-nteams-icv-4.c execution test
|     FAIL: libgomp.c/declare-variant-4-gfx900.c (test for excess errors)
|     FAIL: libgomp.c/declare-variant-4-gfx906.c (test for excess errors)
|     FAIL: libgomp.c/declare-variant-4-gfx908.c (test for excess errors)
|     FAIL: libgomp.c/declare-variant-4-gfx90a.c (test for excess errors)
|     FAIL: libgomp.c/declare-variant-4.c execution test
|     FAIL: libgomp.c/declare-variant-4.c scan-amdgcn-amdhsa-offload-tree-dump optimized "= gfx[^ ]+ \\\\(\\\\);"
|     FAIL: libgomp.c/examples-4/device-2.c execution test
|     WARNING: program timed out.

Not seeing these FAILs.

I also see 'libgomp.c/examples-4/teams-4.c', 'libgomp.c/target-31.c' FAIL.

|     FAIL: libgomp.c/target-teams-1.c execution test

I confirm this FAIL.

|     FAIL: libgomp.fortran/[...] execution test

You had a lot of FAILs there.  I only see the following:

|     FAIL: libgomp.fortran/examples-4/teams-2.f90   -O0  execution test
|     [...]

|     FAIL: libgomp.fortran/examples-4/teams-4.f90   -O0  execution test
|     [...]

|     FAIL: libgomp.fortran/icv-6.f90   -O  execution test

|     FAIL: libgomp.fortran/reverse-offload-1.f90   -O2  execution test
|     FAIL: libgomp.fortran/reverse-offload-1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
|     FAIL: libgomp.fortran/reverse-offload-1.f90   -O3 -g  (test for excess errors)
|     UNRESOLVED: libgomp.fortran/reverse-offload-1.f90   -O3 -g  compilation failed to produce executable

|     FAIL: libgomp.fortran/reverse-offload-3.f90   -O  execution test
|     FAIL: libgomp.fortran/reverse-offload-4.f90   -O  execution test

|     FAIL: libgomp.fortran/task-detach-6.f90   -O0  execution test
|     [...]

|     FAIL: libgomp.fortran/task-in-explicit-1.f90   -O0  execution test
|     [...]

You had a lot of FAILs for 'libgomp.oacc-c', 'libgomp.oacc-c++',
'libgomp.oacc-fortran'.  For me:

|     FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-10.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O2  execution test

I confirm this FAIL (also 'libgomp.oacc-c++').

|     FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vprop.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O2  (test for excess errors)

Known FAIL (also 'libgomp.oacc-c++').

|     FAIL: libgomp.oacc-fortran/reduction-5.f90 -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O0  execution test

I confirm this FAIL.

So, "not bad!", but also still some work to be done.  :-)

Grüße
 Thomas

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: libgomp GCN gfx1030/gfx1100 offloading status (was: [PATCH] amdgcn: additional gfx1100 support)
  2024-02-01 14:41     ` libgomp GCN gfx1030/gfx1100 offloading status (was: [PATCH] amdgcn: additional gfx1100 support) Thomas Schwinge
@ 2024-02-01 14:49       ` Richard Biener
  2024-02-21 12:34         ` Stabilizing flaky libgomp GCN target/offloading testing (was: libgomp GCN gfx1030/gfx1100 offloading status) Thomas Schwinge
  0 siblings, 1 reply; 25+ messages in thread
From: Richard Biener @ 2024-02-01 14:49 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: Andrew Stubbs, Tobias Burnus, gcc-patches

On Thu, 1 Feb 2024, Thomas Schwinge wrote:

> Hi!
> 
> On 2024-01-26T10:45:10+0100, Richard Biener <rguenther@suse.de> wrote:
> > On Fri, 26 Jan 2024, Richard Biener wrote:
> >> On Wed, 24 Jan 2024, Andrew Stubbs wrote:
> >> > [...] is enough to get gfx1100 working for most purposes, on top of the
> >> > patch that Tobias committed a week or so ago; there are still some test
> >> > failures to investigate, and probably some tuning to do.
> >> > 
> >> > It might also get gfx1030 working too. @Richi, could you test it,
> >> > please?
> >> 
> >> I can report partial success here.  [...]
> 
> >> I'll followup with a test summary once the (serial) run of libgomp
> >> testing finished.
> 
> (Why serial, by the way?)

Just out of caution ... (I'm using the GPU for the desktop at the
same time and dmesg gets spammed with some not-so reassuring
"errors" during the offloading)

> >> At least there are quite some number of
> >> actual kernel executions and PASSing testcases.
> >
> >                 === libgomp Summary ===
> >
> > # of expected passes            29126
> > # of unexpected failures        697
> > # of unexpected successes       1
> > # of expected failures          703
> > # of unresolved testcases       318
> > # of unsupported tests          766
> >
> > full summary attached (compressed).
> 
> Compating your old results ('|     ' prefix in the following) with what I
> got with '-march=gfx1100' for AMD Radeon RX 7900 XTX.  My GCC sources are
> a few weeks old, but have all the recent fix-up commits cherry-picked,
> and a work-around applied for:
> 
>     /tmp/ccfrKwEK.mkoffload.2.s:29:27: error: value out of range
>               .amdhsa_next_free_vgpr        516
>                                             ^~~
> 
> (..., to be discussed later.)
> 
> There are, I think, no compilation FAILs anymore; I'm only commenting on
> execution test FAILs.  Not all FAILs appear all the time (so it follows
> that I may be missing a few), and 'libgomp.c++/../libgomp.c-c++-common'
> generally behaves similar to 'libgomp.c/../libgomp.c-c++-common', so
> omitting the former here.
> 
> |     FAIL: libgomp.c/../libgomp.c-c++-common/error-1.c output pattern test
> 
> Not seeing that FAIL.
> 
> I also see 'libgomp.c-c++-common/for-5.c' FAIL.
> 
> |     FAIL: libgomp.c/../libgomp.c-c++-common/icv-5.c execution test
> |     FAIL: libgomp.c/../libgomp.c-c++-common/icv-6.c execution test
> |     FAIL: libgomp.c/../libgomp.c-c++-common/icv-7.c execution test
> |     FAIL: libgomp.c/../libgomp.c-c++-common/icv-9.c execution test
> 
> I confirm 'libgomp.c-c++-common/icv-5.c', 'libgomp.c-c++-common/icv-9.c'
> FAIL, but 'libgomp.c-c++-common/icv-6.c', 'libgomp.c-c++-common/icv-7.c'
> PASS.
> 
> |     FAIL: libgomp.c/../libgomp.c-c++-common/non-rect-loop-1.c execution test
> 
> Not seeing that FAIL.
> 
> |     FAIL: libgomp.c/../libgomp.c-c++-common/reduction-6.c execution test
> 
> I confirm that FAIL, and also 'libgomp.c-c++-common/reduction-5.c'
> occasionally.
> 
> |     FAIL: libgomp.c/../libgomp.c-c++-common/requires-unified-addr-1.c execution test
> |     FAIL: libgomp.c/../libgomp.c-c++-common/requires-unified-addr-2.c execution test
> |     FAIL: libgomp.c/../libgomp.c-c++-common/target-45.c execution test
> |     FAIL: libgomp.c/../libgomp.c-c++-common/target-implicit-map-3.c execution test
> |     FAIL: libgomp.c/../libgomp.c-c++-common/target-is-accessible-1.c execution test
> 
> Not seeing these FAILs.
> 
> I also see 'libgomp.c-c++-common/reverse-offload-1.c' FAIL.
> 
> |     FAIL: libgomp.c/../libgomp.c-c++-common/task-detach-6.c execution test
> |     WARNING: program timed out.
> |     FAIL: libgomp.c/../libgomp.c-c++-common/task-in-explicit-1.c execution test
> 
> I confirm these FAILs.
> 
> |     FAIL: libgomp.c/../libgomp.c-c++-common/teams-2.c execution test
> 
> Known FAIL.
> 
> |     FAIL: libgomp.c/../libgomp.c-c++-common/teams-nteams-icv-1.c execution test
> |     FAIL: libgomp.c/../libgomp.c-c++-common/teams-nteams-icv-2.c execution test
> |     FAIL: libgomp.c/../libgomp.c-c++-common/teams-nteams-icv-3.c execution test
> |     FAIL: libgomp.c/../libgomp.c-c++-common/teams-nteams-icv-4.c execution test
> |     FAIL: libgomp.c/declare-variant-4-gfx900.c (test for excess errors)
> |     FAIL: libgomp.c/declare-variant-4-gfx906.c (test for excess errors)
> |     FAIL: libgomp.c/declare-variant-4-gfx908.c (test for excess errors)
> |     FAIL: libgomp.c/declare-variant-4-gfx90a.c (test for excess errors)
> |     FAIL: libgomp.c/declare-variant-4.c execution test
> |     FAIL: libgomp.c/declare-variant-4.c scan-amdgcn-amdhsa-offload-tree-dump optimized "= gfx[^ ]+ \\\\(\\\\);"
> |     FAIL: libgomp.c/examples-4/device-2.c execution test
> |     WARNING: program timed out.
> 
> Not seeing these FAILs.
> 
> I also see 'libgomp.c/examples-4/teams-4.c', 'libgomp.c/target-31.c' FAIL.
> 
> |     FAIL: libgomp.c/target-teams-1.c execution test
> 
> I confirm this FAIL.
> 
> |     FAIL: libgomp.fortran/[...] execution test
> 
> You had a lot of FAILs there.  I only see the following:
> 
> |     FAIL: libgomp.fortran/examples-4/teams-2.f90   -O0  execution test
> |     [...]
> 
> |     FAIL: libgomp.fortran/examples-4/teams-4.f90   -O0  execution test
> |     [...]
> 
> |     FAIL: libgomp.fortran/icv-6.f90   -O  execution test
> 
> |     FAIL: libgomp.fortran/reverse-offload-1.f90   -O2  execution test
> |     FAIL: libgomp.fortran/reverse-offload-1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> |     FAIL: libgomp.fortran/reverse-offload-1.f90   -O3 -g  (test for excess errors)
> |     UNRESOLVED: libgomp.fortran/reverse-offload-1.f90   -O3 -g  compilation failed to produce executable
> 
> |     FAIL: libgomp.fortran/reverse-offload-3.f90   -O  execution test
> |     FAIL: libgomp.fortran/reverse-offload-4.f90   -O  execution test
> 
> |     FAIL: libgomp.fortran/task-detach-6.f90   -O0  execution test
> |     [...]
> 
> |     FAIL: libgomp.fortran/task-in-explicit-1.f90   -O0  execution test
> |     [...]
> 
> You had a lot of FAILs for 'libgomp.oacc-c', 'libgomp.oacc-c++',
> 'libgomp.oacc-fortran'.  For me:
> 
> |     FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-10.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O2  execution test
> 
> I confirm this FAIL (also 'libgomp.oacc-c++').
> 
> |     FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vprop.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O2  (test for excess errors)
> 
> Known FAIL (also 'libgomp.oacc-c++').
> 
> |     FAIL: libgomp.oacc-fortran/reduction-5.f90 -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O0  execution test
> 
> I confirm this FAIL.
> 
> 
> So, "not bad!", but also still some work to be done.  :-)

I'm going to re-do the testing with all the fixes in on Monday and
will report back.

Richard.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Stabilizing flaky libgomp GCN target/offloading testing (was: libgomp GCN gfx1030/gfx1100 offloading status)
  2024-02-01 14:49       ` Richard Biener
@ 2024-02-21 12:34         ` Thomas Schwinge
  2024-02-21 16:32           ` Richard Biener
  2024-03-08 10:34           ` GCN, nvptx: Errors during device probing are fatal (was: Stabilizing flaky libgomp GCN target/offloading testing) Thomas Schwinge
  0 siblings, 2 replies; 25+ messages in thread
From: Thomas Schwinge @ 2024-02-21 12:34 UTC (permalink / raw)
  To: Richard Biener, Andrew Stubbs; +Cc: Tobias Burnus, gcc-patches, Jakub Jelinek

Hi!

On 2024-02-01T15:49:02+0100, Richard Biener <rguenther@suse.de> wrote:
> On Thu, 1 Feb 2024, Thomas Schwinge wrote:
>> On 2024-01-26T10:45:10+0100, Richard Biener <rguenther@suse.de> wrote:
>> > On Fri, 26 Jan 2024, Richard Biener wrote:
>> >> On Wed, 24 Jan 2024, Andrew Stubbs wrote:
>> >> > [...] is enough to get gfx1100 working for most purposes, on top of the
>> >> > patch that Tobias committed a week or so ago; there are still some test
>> >> > failures to investigate, and probably some tuning to do.
>> >> > 
>> >> > It might also get gfx1030 working too. @Richi, could you test it,
>> >> > please?
>> >> 
>> >> I can report partial success here.  [...]
>> 
>> >> I'll followup with a test summary once the (serial) run of libgomp
>> >> testing finished.
>> 
>> (Why serial, by the way?)
>
> Just out of caution ... (I'm using the GPU for the desktop at the
> same time and dmesg gets spammed with some not-so reassuring
> "errors" during the offloading)

Yeah, indeed 'dmesg' is full of "notes"...

However, note that per my work on <https://gcc.gnu.org/PR66005>
"libgomp make check time is excessive", all execution testing in libgomp
is serialized in 'libgomp/testsuite/lib/libgomp.exp:libgomp_load'.  So,
no problem/difference in that regard, to run parallel
'check-target-libgomp'.  (... with the caveat that execution tests for
effective-targets are *not* governed by that, as I've found yesterday.
I have a WIP hack for that, too.)

>> [...] what I
>> got with '-march=gfx1100' for AMD Radeon RX 7900 XTX.  [...]

>> [...] execution test FAILs.  Not all FAILs appear all the time [...]

What disturbs the testing a lot is, that the GPU may get into a bad
state, upon which any use either fails with a
'HSA_STATUS_ERROR_OUT_OF_RESOURCES' error -- or by just hanging, deep in
'libhsa-runtime64.so.1'...

I've now tried to debug the latter case (hang).  When the GPU gets into
this bad state (whatever exactly that is),
'hsa_executable_load_code_object' still returns 'HSA_STATUS_SUCCESS', but
then GCN target execution ('gcn-run') hangs in 'hsa_executable_freeze'
vs. GCN offloading execution ('libgomp-plugin-gcn.so.1') hangs right
before 'hsa_executable_freeze', in the GCN heap setup 'hsa_memory_copy'.
There it hangs until killed (for example, until DejaGnu's timeout
mechanism kills the process -- just that the next GPU-using execution
test then runs into the same thing again...).

In this state (and also the 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' state),
we're able to recover via:

    $ flock /tmp/gpu.lock sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover
    0

This is, obviously, a hack, probably needs a serial lock to not disturb
other things, has hard-coded 'dri/0', and as I said in
<https://inbox.sourceware.org/87plww8qin.fsf@euler.schwinge.ddns.net>
"GCN RDNA2+ vs. GCC SLP vectorizer":

| I've no idea what
| 'amdgpu_gpu_recover' would do if the GPU is also used for display.

However, it's very useful in my testing.  :-|

The questions is, how to detect the "hang" state without first running
into a timeout (and disambiguating such a timeout from a user code
timeout)?  Add a watchdog: call 'alarm([a few seconds])' before device
initialization, and before the actual GPU kernel launch cancel it with
'alarm(0)'?  (..., and add a handler for 'SIGALRM' to print a distinct
error message that we can then react on, like for
'HSA_STATUS_ERROR_OUT_OF_RESOURCES'.)  Probably 'alarm'/'SIGALRM' is a
no-go in libgomp -- instead, use a helper thread to similarly implement a
watchdog?  ('libgomp/plugin/plugin-gcn.c' already is using pthreads for
other purposes.)  Any other clever ideas?  What's a suitable value for
"a few seconds"?

Grüße
 Thomas

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Stabilizing flaky libgomp GCN target/offloading testing (was: libgomp GCN gfx1030/gfx1100 offloading status)
  2024-02-21 12:34         ` Stabilizing flaky libgomp GCN target/offloading testing (was: libgomp GCN gfx1030/gfx1100 offloading status) Thomas Schwinge
@ 2024-02-21 16:32           ` Richard Biener
  2024-03-06 12:09             ` Stabilize flaky GCN target/offloading testing Thomas Schwinge
  2024-03-08 10:34           ` GCN, nvptx: Errors during device probing are fatal (was: Stabilizing flaky libgomp GCN target/offloading testing) Thomas Schwinge
  1 sibling, 1 reply; 25+ messages in thread
From: Richard Biener @ 2024-02-21 16:32 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: Andrew Stubbs, Tobias Burnus, gcc-patches, Jakub Jelinek



> Am 21.02.2024 um 13:34 schrieb Thomas Schwinge <tschwinge@baylibre.com>:
> 
> Hi!
> 
>> On 2024-02-01T15:49:02+0100, Richard Biener <rguenther@suse.de> wrote:
>>> On Thu, 1 Feb 2024, Thomas Schwinge wrote:
>>> On 2024-01-26T10:45:10+0100, Richard Biener <rguenther@suse.de> wrote:
>>>> On Fri, 26 Jan 2024, Richard Biener wrote:
>>>>> On Wed, 24 Jan 2024, Andrew Stubbs wrote:
>>>>>> [...] is enough to get gfx1100 working for most purposes, on top of the
>>>>>> patch that Tobias committed a week or so ago; there are still some test
>>>>>> failures to investigate, and probably some tuning to do.
>>>>>> 
>>>>>> It might also get gfx1030 working too. @Richi, could you test it,
>>>>>> please?
>>>>> 
>>>>> I can report partial success here.  [...]
>>> 
>>>>> I'll followup with a test summary once the (serial) run of libgomp
>>>>> testing finished.
>>> 
>>> (Why serial, by the way?)
>> 
>> Just out of caution ... (I'm using the GPU for the desktop at the
>> same time and dmesg gets spammed with some not-so reassuring
>> "errors" during the offloading)
> 
> Yeah, indeed 'dmesg' is full of "notes"...
> 
> However, note that per my work on <https://gcc.gnu.org/PR66005>
> "libgomp make check time is excessive", all execution testing in libgomp
> is serialized in 'libgomp/testsuite/lib/libgomp.exp:libgomp_load'.  So,
> no problem/difference in that regard, to run parallel
> 'check-target-libgomp'.  (... with the caveat that execution tests for
> effective-targets are *not* governed by that, as I've found yesterday.
> I have a WIP hack for that, too.)
> 
> 
>>> [...] what I
>>> got with '-march=gfx1100' for AMD Radeon RX 7900 XTX.  [...]
> 
>>> [...] execution test FAILs.  Not all FAILs appear all the time [...]
> 
> What disturbs the testing a lot is, that the GPU may get into a bad
> state, upon which any use either fails with a
> 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' error -- or by just hanging, deep in
> 'libhsa-runtime64.so.1'...
> 
> I've now tried to debug the latter case (hang).  When the GPU gets into
> this bad state (whatever exactly that is),
> 'hsa_executable_load_code_object' still returns 'HSA_STATUS_SUCCESS', but
> then GCN target execution ('gcn-run') hangs in 'hsa_executable_freeze'
> vs. GCN offloading execution ('libgomp-plugin-gcn.so.1') hangs right
> before 'hsa_executable_freeze', in the GCN heap setup 'hsa_memory_copy'.
> There it hangs until killed (for example, until DejaGnu's timeout
> mechanism kills the process -- just that the next GPU-using execution
> test then runs into the same thing again...).
> 
> In this state (and also the 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' state),
> we're able to recover via:
> 
>    $ flock /tmp/gpu.lock sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover
>    0
> 
> This is, obviously, a hack, probably needs a serial lock to not disturb
> other things, has hard-coded 'dri/0', and as I said in
> <https://inbox.sourceware.org/87plww8qin.fsf@euler.schwinge.ddns.net>
> "GCN RDNA2+ vs. GCC SLP vectorizer":
> 
> | I've no idea what
> | 'amdgpu_gpu_recover' would do if the GPU is also used for display.

It ends up terminating your X session… (there’s some automatic driver recovery that’s also sometimes triggered which sounds like the same thing).  I need to try using the integrated graphics for X11 to see if that avoids the issue.

Guess AMD needs to improve the driver/runtime (or we - it’s open source at least up to the firmware).

Richard 

> However, it's very useful in my testing.  :-|
> 
> The questions is, how to detect the "hang" state without first running
> into a timeout (and disambiguating such a timeout from a user code
> timeout)?  Add a watchdog: call 'alarm([a few seconds])' before device
> initialization, and before the actual GPU kernel launch cancel it with
> 'alarm(0)'?  (..., and add a handler for 'SIGALRM' to print a distinct
> error message that we can then react on, like for
> 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'.)  Probably 'alarm'/'SIGALRM' is a
> no-go in libgomp -- instead, use a helper thread to similarly implement a
> watchdog?  ('libgomp/plugin/plugin-gcn.c' already is using pthreads for
> other purposes.)  Any other clever ideas?  What's a suitable value for
> "a few seconds"?
> 
> 
> Grüße
> Thomas

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Stabilize flaky GCN target/offloading testing
  2024-02-21 16:32           ` Richard Biener
@ 2024-03-06 12:09             ` Thomas Schwinge
  2024-03-06 12:39               ` Andrew Stubbs
  0 siblings, 1 reply; 25+ messages in thread
From: Thomas Schwinge @ 2024-03-06 12:09 UTC (permalink / raw)
  To: Richard Biener, Andrew Stubbs; +Cc: Tobias Burnus, gcc-patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 4691 bytes --]

Hi!

On 2024-02-21T17:32:13+0100, Richard Biener <rguenther@suse.de> wrote:
> Am 21.02.2024 um 13:34 schrieb Thomas Schwinge <tschwinge@baylibre.com>:
>> [...] per my work on <https://gcc.gnu.org/PR66005>
>> "libgomp make check time is excessive", all execution testing in libgomp
>> is serialized in 'libgomp/testsuite/lib/libgomp.exp:libgomp_load'.  [...]
>> (... with the caveat that execution tests for
>> effective-targets are *not* governed by that, as I've found yesterday.
>> I have a WIP hack for that, too.)

>> What disturbs the testing a lot is, that the GPU may get into a bad
>> state, upon which any use either fails with a
>> 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' error -- or by just hanging, deep in
>> 'libhsa-runtime64.so.1'...
>> 
>> I've now tried to debug the latter case (hang).  When the GPU gets into
>> this bad state (whatever exactly that is),
>> 'hsa_executable_load_code_object' still returns 'HSA_STATUS_SUCCESS', but
>> then GCN target execution ('gcn-run') hangs in 'hsa_executable_freeze'
>> vs. GCN offloading execution ('libgomp-plugin-gcn.so.1') hangs right
>> before 'hsa_executable_freeze', in the GCN heap setup 'hsa_memory_copy'.
>> There it hangs until killed (for example, until DejaGnu's timeout
>> mechanism kills the process -- just that the next GPU-using execution
>> test then runs into the same thing again...).
>> 
>> In this state (and also the 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' state),
>> we're able to recover via:
>> 
>>    $ flock /tmp/gpu.lock sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover
>>    0

At least most of the times.  I've found that -- sometimes... ;-( -- if
you run into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES', then do
'amdgpu_gpu_recover', and then immediately re-execute, you'll again run
into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'.  That appears to be avoidable
by injecting some artificial "cool-down period"...  (The latter I've not
yet tested extensively.)

>> This is, obviously, a hack, probably needs a serial lock to not disturb
>> other things, has hard-coded 'dri/0', and as I said in
>> <https://inbox.sourceware.org/87plww8qin.fsf@euler.schwinge.ddns.net>
>> "GCN RDNA2+ vs. GCC SLP vectorizer":
>> 
>> | I've no idea what
>> | 'amdgpu_gpu_recover' would do if the GPU is also used for display.
>
> It ends up terminating your X session…

Eh....  ;'-|

> (there’s some automatic driver recovery that’s also sometimes triggered which sounds like the same thing).

> I need to try using the integrated graphics for X11 to see if that avoids the issue.

A few years ago, I tried that for a Nvidia GPU laptop, and -- if I now
remember correctly -- basically got it to work, via hand-editing
'/etc/X11/xorg.conf' and all that...  But: I couldn't get external HDMI
to work in that setup, and therefore reverted to "standard".

> Guess AMD needs to improve the driver/runtime (or we - it’s open source at least up to the firmware).

>> However, it's very useful in my testing.  :-|
>> 
>> The questions is, how to detect the "hang" state without first running
>> into a timeout (and disambiguating such a timeout from a user code
>> timeout)?  Add a watchdog: call 'alarm([a few seconds])' before device
>> initialization, and before the actual GPU kernel launch cancel it with
>> 'alarm(0)'?  (..., and add a handler for 'SIGALRM' to print a distinct
>> error message that we can then react on, like for
>> 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'.)  Probably 'alarm'/'SIGALRM' is a
>> no-go in libgomp -- instead, use a helper thread to similarly implement a
>> watchdog?  ('libgomp/plugin/plugin-gcn.c' already is using pthreads for
>> other purposes.)  Any other clever ideas?  What's a suitable value for
>> "a few seconds"?

I'm attaching my current "GCN: Watchdog for device image load", covering
both 'gcc/config/gcn/gcn-run.cc' and 'libgomp/plugin/plugin-gcn.c'.
(That's using 'timer_create' etc. instead of 'alarm'/'SIGALRM'. )

That, plus routing *all* potential GPU usage (in particular: including
execution tests for effective-targets, see above) through a serial lock
('flock', implemented in DejaGnu board file, outside of the the
"DejaGnu timeout domain", similar to
'libgomp/testsuite/lib/libgomp.exp:libgomp_load', see above), plus
catching 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' (both the "real" ones and
the "fake" ones via "GCN: Watchdog for device image load") and in that
case 'amdgpu_gpu_recover' and re-execution of the respective executable,
does greatly stabilize flaky GCN target/offloading testing.

Do we have consensus to move forward with this approach, generally?


Grüße
 Thomas



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-GCN-Watchdog-for-device-image-load.patch --]
[-- Type: text/x-diff, Size: 6408 bytes --]

From 21795353483c263c91a5efa80da41a75a6b2b629 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <tschwinge@baylibre.com>
Date: Thu, 22 Feb 2024 21:50:45 +0100
Subject: [PATCH] GCN: Watchdog for device image load

---
 gcc/config/gcn/gcn-run.cc   | 76 ++++++++++++++++++++++++++++++++++
 libgomp/plugin/plugin-gcn.c | 81 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 156 insertions(+), 1 deletion(-)

diff --git a/gcc/config/gcn/gcn-run.cc b/gcc/config/gcn/gcn-run.cc
index d45ff3e6c2ba..ab15185af471 100644
--- a/gcc/config/gcn/gcn-run.cc
+++ b/gcc/config/gcn/gcn-run.cc
@@ -33,6 +33,8 @@
 #include <unistd.h>
 #include <elf.h>
 #include <signal.h>
+#include <time.h>
+#include <errno.h>
 
 #include "hsa.h"
 #include "../../../libgomp/config/gcn/libgomp-gcn.h"
@@ -616,6 +618,70 @@ run (uint64_t kernel, void *kernargs)
 	"Clean up signal");
 }
 
+/* Watchdog.  */
+
+static void
+watchdog_bark (union sigval sigev_value)
+{
+  const char *msg = sigev_value.sival_ptr;
+  fprintf (stderr, "Watchdog barking %s\n", msg);
+  exit (EXIT_FAILURE);
+}
+
+static void
+watchdog_start (timer_t *restrict timeridp, const int s, const char *msg)
+{
+  if (debug)
+    fprintf (stderr, "Starting watchdog\n");
+
+  struct sigevent sev;
+  sev.sigev_notify = SIGEV_THREAD;
+  sev.sigev_value.sival_ptr = (void *) (uintptr_t) msg;
+  sev.sigev_notify_function = watchdog_bark;
+  sev.sigev_notify_attributes = NULL;
+  int res;
+  /* Backoff in case of 'EAGAIN': waiting 255..534773760 ns in 22 attempts.  */
+  int32_t wait_ns = 255;
+  while ((res = timer_create (CLOCK_MONOTONIC, &sev, timeridp)) == EAGAIN
+	 && wait_ns <= 999999999)
+    {
+      if (debug)
+	fprintf (stderr, "'timer_create': 'EAGAIN'; waiting %d ns\n",
+		 (int) wait_ns);
+      struct timespec wait_ts = { 0, wait_ns };
+      (void) nanosleep (&wait_ts, NULL);
+      wait_ns <<= 1;
+    }
+  if (res != 0)
+    {
+      perror ("'timer_create' FAILED");
+      exit (EXIT_FAILURE);
+    }
+
+  struct itimerspec its = { { 0, 0 }, { s, 0 } };
+  res = timer_settime (*timeridp, 0, &its, NULL);
+  if (res != 0)
+    {
+      perror ("'timer_settime' FAILED");
+      exit (EXIT_FAILURE);
+    }
+}
+
+static void
+watchdog_stop (timer_t timerid)
+{
+  int res;
+  res = timer_delete (timerid);
+  if (res != 0)
+    {
+      perror ("'timer_delete' FAILED");
+      exit (EXIT_FAILURE);
+    }
+
+  if (debug)
+    fprintf (stderr, "Stopped watchdog\n");
+}
+
 int
 main (int argc, char *argv[])
 {
@@ -658,7 +724,17 @@ main (int argc, char *argv[])
   char **kernel_argv = &argv[kernel_arg];
 
   init_device ();
+
+  /* Something's wrong if the device image load doesn't complete quickly;
+     <https://inbox.sourceware.org/87il2ij8sm.fsf@euler.schwinge.ddns.net>
+     "Stabilizing flaky libgomp GCN target/offloading testing".  */
+  timer_t watchdog;
+  static const int watchdog_s = 10;
+  watchdog_start (&watchdog, watchdog_s,
+		  "during device image load; maybe handle similar to"
+		  " 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'?");
   load_image (kernel_argv[0]);
+  watchdog_stop (watchdog);
 
   /* Calculate size of function parameters + argv data.  */
   size_t args_size = 0;
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 2771123252a8..5680d9f5a34a 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -48,6 +48,8 @@
 #include "oacc-plugin.h"
 #include "oacc-int.h"
 #include <assert.h>
+#include <time.h>
+#include <errno.h>
 
 /* These probably won't be in elf.h for a while.  */
 #ifndef R_AMDGPU_NONE
@@ -1371,6 +1373,71 @@ hsa_queue_callback (hsa_status_t status,
   hsa_fatal ("Asynchronous queue error", status);
 }
 
+/* }}}  */
+/* {{{ Watchdog  */
+
+static void
+watchdog_bark (union sigval sigev_value)
+{
+  const char *msg = sigev_value.sival_ptr;
+  GOMP_PLUGIN_error ("GCN fatal error: watchdog barking %s\n", msg);
+  _Exit (EXIT_FAILURE);
+}
+
+static void
+watchdog_start (timer_t *restrict timeridp, const int s, const char *msg)
+{
+  GCN_DEBUG ("Starting watchdog\n");
+
+  struct sigevent sev;
+  sev.sigev_notify = SIGEV_THREAD;
+  sev.sigev_value.sival_ptr = (void *) (uintptr_t) msg;
+  sev.sigev_notify_function = watchdog_bark;
+  sev.sigev_notify_attributes = NULL;
+  int res;
+  /* Backoff in case of 'EAGAIN': waiting 255..534773760 ns in 22 attempts.  */
+  int32_t wait_ns = 255;
+  while ((res = timer_create (CLOCK_MONOTONIC, &sev, timeridp)) == EAGAIN
+	 && wait_ns <= 999999999)
+    {
+      GCN_DEBUG ("'timer_create': 'EAGAIN'; waiting %d ns\n",
+		 (int) wait_ns);
+      struct timespec wait_ts = { 0, wait_ns };
+      (void) nanosleep (&wait_ts, NULL);
+      wait_ns <<= 1;
+    }
+  if (res != 0)
+    {
+      GOMP_PLUGIN_error ("GCN fatal error: 'timer_create' FAILED: %s",
+			 strerror (errno));
+      _Exit (EXIT_FAILURE);
+    }
+
+  struct itimerspec its = { { 0, 0 }, { s, 0 } };
+  res = timer_settime (*timeridp, 0, &its, NULL);
+  if (res != 0)
+    {
+      GOMP_PLUGIN_error ("GCN fatal error: 'timer_settime' FAILED: %s",
+			 strerror (errno));
+      _Exit (EXIT_FAILURE);
+    }
+}
+
+static void
+watchdog_stop (timer_t timerid)
+{
+  int res;
+  res = timer_delete (timerid);
+  if (res != 0)
+    {
+      GOMP_PLUGIN_error ("GCN fatal error: 'timer_delete' FAILED: %s",
+			 strerror (errno));
+      _Exit (EXIT_FAILURE);
+    }
+
+  GCN_DEBUG ("Stopped watchdog\n");
+}
+
 /* }}}  */
 /* {{{ HSA initialization  */
 
@@ -2502,7 +2569,16 @@ create_and_finalize_hsa_program (struct agent_info *agent)
       return false;
     }
   if (agent->prog_finalized)
-    goto final;
+    goto unlock;
+
+  /* Something's wrong if the device image load doesn't complete quickly;
+     <https://inbox.sourceware.org/87il2ij8sm.fsf@euler.schwinge.ddns.net>
+     "Stabilizing flaky libgomp GCN target/offloading testing".  */
+  timer_t watchdog;
+  static const int watchdog_s = 10;
+  watchdog_start (&watchdog, watchdog_s,
+		  "during device image load; maybe handle similar to"
+		  " 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'?");
 
   status
     = hsa_fns.hsa_executable_create_fn (HSA_PROFILE_FULL,
@@ -2581,6 +2657,9 @@ create_and_finalize_hsa_program (struct agent_info *agent)
 final:
   agent->prog_finalized = true;
 
+  watchdog_stop (watchdog);
+
+unlock:
   if (pthread_mutex_unlock (&agent->prog_mutex))
     {
       GOMP_PLUGIN_error ("Could not unlock a GCN agent program mutex");
-- 
2.43.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Stabilize flaky GCN target/offloading testing
  2024-03-06 12:09             ` Stabilize flaky GCN target/offloading testing Thomas Schwinge
@ 2024-03-06 12:39               ` Andrew Stubbs
  2024-03-06 13:29                 ` Richard Biener
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Stubbs @ 2024-03-06 12:39 UTC (permalink / raw)
  To: Thomas Schwinge, Richard Biener; +Cc: Tobias Burnus, gcc-patches, Jakub Jelinek

On 06/03/2024 12:09, Thomas Schwinge wrote:
> Hi!
> 
> On 2024-02-21T17:32:13+0100, Richard Biener <rguenther@suse.de> wrote:
>> Am 21.02.2024 um 13:34 schrieb Thomas Schwinge <tschwinge@baylibre.com>:
>>> [...] per my work on <https://gcc.gnu.org/PR66005>
>>> "libgomp make check time is excessive", all execution testing in libgomp
>>> is serialized in 'libgomp/testsuite/lib/libgomp.exp:libgomp_load'.  [...]
>>> (... with the caveat that execution tests for
>>> effective-targets are *not* governed by that, as I've found yesterday.
>>> I have a WIP hack for that, too.)
> 
>>> What disturbs the testing a lot is, that the GPU may get into a bad
>>> state, upon which any use either fails with a
>>> 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' error -- or by just hanging, deep in
>>> 'libhsa-runtime64.so.1'...
>>>
>>> I've now tried to debug the latter case (hang).  When the GPU gets into
>>> this bad state (whatever exactly that is),
>>> 'hsa_executable_load_code_object' still returns 'HSA_STATUS_SUCCESS', but
>>> then GCN target execution ('gcn-run') hangs in 'hsa_executable_freeze'
>>> vs. GCN offloading execution ('libgomp-plugin-gcn.so.1') hangs right
>>> before 'hsa_executable_freeze', in the GCN heap setup 'hsa_memory_copy'.
>>> There it hangs until killed (for example, until DejaGnu's timeout
>>> mechanism kills the process -- just that the next GPU-using execution
>>> test then runs into the same thing again...).
>>>
>>> In this state (and also the 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' state),
>>> we're able to recover via:
>>>
>>>     $ flock /tmp/gpu.lock sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover
>>>     0
> 
> At least most of the times.  I've found that -- sometimes... ;-( -- if
> you run into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES', then do
> 'amdgpu_gpu_recover', and then immediately re-execute, you'll again run
> into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'.  That appears to be avoidable
> by injecting some artificial "cool-down period"...  (The latter I've not
> yet tested extensively.)
> 
>>> This is, obviously, a hack, probably needs a serial lock to not disturb
>>> other things, has hard-coded 'dri/0', and as I said in
>>> <https://inbox.sourceware.org/87plww8qin.fsf@euler.schwinge.ddns.net>
>>> "GCN RDNA2+ vs. GCC SLP vectorizer":
>>>
>>> | I've no idea what
>>> | 'amdgpu_gpu_recover' would do if the GPU is also used for display.
>>
>> It ends up terminating your X session…
> 
> Eh....  ;'-|
> 
>> (there’s some automatic driver recovery that’s also sometimes triggered which sounds like the same thing).
> 
>> I need to try using the integrated graphics for X11 to see if that avoids the issue.
> 
> A few years ago, I tried that for a Nvidia GPU laptop, and -- if I now
> remember correctly -- basically got it to work, via hand-editing
> '/etc/X11/xorg.conf' and all that...  But: I couldn't get external HDMI
> to work in that setup, and therefore reverted to "standard".
> 
>> Guess AMD needs to improve the driver/runtime (or we - it’s open source at least up to the firmware).
> 
>>> However, it's very useful in my testing.  :-|
>>>
>>> The questions is, how to detect the "hang" state without first running
>>> into a timeout (and disambiguating such a timeout from a user code
>>> timeout)?  Add a watchdog: call 'alarm([a few seconds])' before device
>>> initialization, and before the actual GPU kernel launch cancel it with
>>> 'alarm(0)'?  (..., and add a handler for 'SIGALRM' to print a distinct
>>> error message that we can then react on, like for
>>> 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'.)  Probably 'alarm'/'SIGALRM' is a
>>> no-go in libgomp -- instead, use a helper thread to similarly implement a
>>> watchdog?  ('libgomp/plugin/plugin-gcn.c' already is using pthreads for
>>> other purposes.)  Any other clever ideas?  What's a suitable value for
>>> "a few seconds"?
> 
> I'm attaching my current "GCN: Watchdog for device image load", covering
> both 'gcc/config/gcn/gcn-run.cc' and 'libgomp/plugin/plugin-gcn.c'.
> (That's using 'timer_create' etc. instead of 'alarm'/'SIGALRM'. )
> 
> That, plus routing *all* potential GPU usage (in particular: including
> execution tests for effective-targets, see above) through a serial lock
> ('flock', implemented in DejaGnu board file, outside of the the
> "DejaGnu timeout domain", similar to
> 'libgomp/testsuite/lib/libgomp.exp:libgomp_load', see above), plus
> catching 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' (both the "real" ones and
> the "fake" ones via "GCN: Watchdog for device image load") and in that
> case 'amdgpu_gpu_recover' and re-execution of the respective executable,
> does greatly stabilize flaky GCN target/offloading testing.
> 
> Do we have consensus to move forward with this approach, generally?

I've also observed a number of random hangs in host-side code outside 
our control, but after the kernel has exited. In general this watchdog 
approach might help with these. I do feel like it's "papering over the 
cracks", but if we can't fix it.... at the end of the day it's just a 
little extra code.

My only concern is that it might actually cause failures, perhaps on 
heavily loaded systems, or with network filesystems, or during debugging.

Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Stabilize flaky GCN target/offloading testing
  2024-03-06 12:39               ` Andrew Stubbs
@ 2024-03-06 13:29                 ` Richard Biener
  0 siblings, 0 replies; 25+ messages in thread
From: Richard Biener @ 2024-03-06 13:29 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: Thomas Schwinge, Tobias Burnus, gcc-patches, Jakub Jelinek

On Wed, 6 Mar 2024, Andrew Stubbs wrote:

> On 06/03/2024 12:09, Thomas Schwinge wrote:
> > Hi!
> > 
> > On 2024-02-21T17:32:13+0100, Richard Biener <rguenther@suse.de> wrote:
> >> Am 21.02.2024 um 13:34 schrieb Thomas Schwinge <tschwinge@baylibre.com>:
> >>> [...] per my work on <https://gcc.gnu.org/PR66005>
> >>> "libgomp make check time is excessive", all execution testing in libgomp
> >>> is serialized in 'libgomp/testsuite/lib/libgomp.exp:libgomp_load'.  [...]
> >>> (... with the caveat that execution tests for
> >>> effective-targets are *not* governed by that, as I've found yesterday.
> >>> I have a WIP hack for that, too.)
> > 
> >>> What disturbs the testing a lot is, that the GPU may get into a bad
> >>> state, upon which any use either fails with a
> >>> 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' error -- or by just hanging, deep in
> >>> 'libhsa-runtime64.so.1'...
> >>>
> >>> I've now tried to debug the latter case (hang).  When the GPU gets into
> >>> this bad state (whatever exactly that is),
> >>> 'hsa_executable_load_code_object' still returns 'HSA_STATUS_SUCCESS', but
> >>> then GCN target execution ('gcn-run') hangs in 'hsa_executable_freeze'
> >>> vs. GCN offloading execution ('libgomp-plugin-gcn.so.1') hangs right
> >>> before 'hsa_executable_freeze', in the GCN heap setup 'hsa_memory_copy'.
> >>> There it hangs until killed (for example, until DejaGnu's timeout
> >>> mechanism kills the process -- just that the next GPU-using execution
> >>> test then runs into the same thing again...).
> >>>
> >>> In this state (and also the 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' state),
> >>> we're able to recover via:
> >>>
> >>>     $ flock /tmp/gpu.lock sudo cat
> >>>     /sys/kernel/debug/dri/0/amdgpu_gpu_recover
> >>>     0
> > 
> > At least most of the times.  I've found that -- sometimes... ;-( -- if
> > you run into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES', then do
> > 'amdgpu_gpu_recover', and then immediately re-execute, you'll again run
> > into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'.  That appears to be avoidable
> > by injecting some artificial "cool-down period"...  (The latter I've not
> > yet tested extensively.)
> > 
> >>> This is, obviously, a hack, probably needs a serial lock to not disturb
> >>> other things, has hard-coded 'dri/0', and as I said in
> >>> <https://inbox.sourceware.org/87plww8qin.fsf@euler.schwinge.ddns.net>
> >>> "GCN RDNA2+ vs. GCC SLP vectorizer":
> >>>
> >>> | I've no idea what
> >>> | 'amdgpu_gpu_recover' would do if the GPU is also used for display.
> >>
> >> It ends up terminating your X session?
> > 
> > Eh....  ;'-|
> > 
> >> (there?s some automatic driver recovery that?s also sometimes triggered
> >> which sounds like the same thing).
> > 
> >> I need to try using the integrated graphics for X11 to see if that avoids
> >> the issue.
> > 
> > A few years ago, I tried that for a Nvidia GPU laptop, and -- if I now
> > remember correctly -- basically got it to work, via hand-editing
> > '/etc/X11/xorg.conf' and all that...  But: I couldn't get external HDMI
> > to work in that setup, and therefore reverted to "standard".
> > 
> >> Guess AMD needs to improve the driver/runtime (or we - it?s open source at
> >> least up to the firmware).
> > 
> >>> However, it's very useful in my testing.  :-|
> >>>
> >>> The questions is, how to detect the "hang" state without first running
> >>> into a timeout (and disambiguating such a timeout from a user code
> >>> timeout)?  Add a watchdog: call 'alarm([a few seconds])' before device
> >>> initialization, and before the actual GPU kernel launch cancel it with
> >>> 'alarm(0)'?  (..., and add a handler for 'SIGALRM' to print a distinct
> >>> error message that we can then react on, like for
> >>> 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'.)  Probably 'alarm'/'SIGALRM' is a
> >>> no-go in libgomp -- instead, use a helper thread to similarly implement a
> >>> watchdog?  ('libgomp/plugin/plugin-gcn.c' already is using pthreads for
> >>> other purposes.)  Any other clever ideas?  What's a suitable value for
> >>> "a few seconds"?
> > 
> > I'm attaching my current "GCN: Watchdog for device image load", covering
> > both 'gcc/config/gcn/gcn-run.cc' and 'libgomp/plugin/plugin-gcn.c'.
> > (That's using 'timer_create' etc. instead of 'alarm'/'SIGALRM'. )
> > 
> > That, plus routing *all* potential GPU usage (in particular: including
> > execution tests for effective-targets, see above) through a serial lock
> > ('flock', implemented in DejaGnu board file, outside of the the
> > "DejaGnu timeout domain", similar to
> > 'libgomp/testsuite/lib/libgomp.exp:libgomp_load', see above), plus
> > catching 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' (both the "real" ones and
> > the "fake" ones via "GCN: Watchdog for device image load") and in that
> > case 'amdgpu_gpu_recover' and re-execution of the respective executable,
> > does greatly stabilize flaky GCN target/offloading testing.
> > 
> > Do we have consensus to move forward with this approach, generally?
> 
> I've also observed a number of random hangs in host-side code outside our
> control, but after the kernel has exited. In general this watchdog approach
> might help with these. I do feel like it's "papering over the cracks", but if
> we can't fix it.... at the end of the day it's just a little extra code.

I wonder if you maybe have contact to people at AMD that are willing
to debug this and improve the driver side of this?  I'm seeing quite
a number of similar reports for the issue I hit in the github tracker,
multiple years old and also current, so that doesn't seem to be a good
way to get things fixed ...

Richard.

> My only concern is that it might actually cause failures, perhaps on heavily
> loaded systems, or with network filesystems, or during debugging.
>
> Andrew
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* GCN, nvptx: Errors during device probing are fatal (was: Stabilizing flaky libgomp GCN target/offloading testing)
  2024-02-21 12:34         ` Stabilizing flaky libgomp GCN target/offloading testing (was: libgomp GCN gfx1030/gfx1100 offloading status) Thomas Schwinge
  2024-02-21 16:32           ` Richard Biener
@ 2024-03-08 10:34           ` Thomas Schwinge
  1 sibling, 0 replies; 25+ messages in thread
From: Thomas Schwinge @ 2024-03-08 10:34 UTC (permalink / raw)
  To: Andrew Stubbs, gcc-patches, Jakub Jelinek; +Cc: Richard Biener, Tobias Burnus

[-- Attachment #1: Type: text/plain, Size: 1145 bytes --]

Hi!

On 2024-02-21T13:34:01+0100, I wrote:
> On 2024-02-01T15:49:02+0100, Richard Biener <rguenther@suse.de> wrote:
>> On Thu, 1 Feb 2024, Thomas Schwinge wrote:
>>> [...] what I
>>> got with '-march=gfx1100' for AMD Radeon RX 7900 XTX.  [...]
>
>>> [...] execution test FAILs.  Not all FAILs appear all the time [...]
>
> What disturbs the testing a lot is, that the GPU may get into a bad
> state, upon which any use either fails with a
> 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' error -- or by just hanging, deep in
> 'libhsa-runtime64.so.1'...

So, there's a "fun" aspect: if we run into
'HSA_STATUS_ERROR_OUT_OF_RESOURCES' (or other errors; and similar in the
libgomp nvptx plugin) during libgomp GCN plugin device probing, then it's
not fatal, but instead silently disables the libgomp plugin/device, thus
typically silently resorting to host-fallback execution.  That's not
helpful behavior in my opinion, so I propose the attached
"GCN, nvptx: Errors during device probing are fatal".  OK to push?

(That's also the behavior that's implemented in both the GCN and nvptx
target 'run' tools.)


Grüße
 Thomas



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-GCN-nvptx-Errors-during-device-probing-are-fatal.patch --]
[-- Type: text/x-diff, Size: 5022 bytes --]

From 0dc72089dccc10d3b55096ade5fc4d72de6cb96f Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <tschwinge@baylibre.com>
Date: Thu, 7 Mar 2024 14:42:07 +0100
Subject: [PATCH] GCN, nvptx: Errors during device probing are fatal

Currently, we silently disable libgomp GCN and nvptx plugins/devices in
presence of certain error conditions during device probing, thus typically
silently resorting to host-fallback execution.  Make such errors fatal, similar
as for any other device access later on, so that we early and reliably notice
when things go wrong.  (Keep just two cases non-fatal: (a) libgomp GCN or nvptx
plugins are available but 'libhsa-runtime64.so.1' or 'libcuda.so.1' are not,
and (b) those are available, but the corresponding devices are not.)

This resolves the issue that we've got execution test cases unexpectedly
PASSing, despite:

    libgomp: GCN fatal error: Run-time could not be initialized
    Runtime message: HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events.

..., and therefore they were not offloaded to the GCN device, but ran in
host-fallback execution mode.  What happend in that scenario is that in
'init_hsa_context' during the initial 'GOMP_OFFLOAD_get_num_devices' we ran
into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES', but it wasn't fatal, but just
silently disabled the libgomp plugin/device.

Especially "entertaining" were cases where such unintended host-fallback
execution happened during effective-target checks like
'offload_device_available' (host-fallback execution there meaning: no offload
device available), but actual test cases then were running with an offload
device available, and therefore mis-configured.

	include/
	* cuda/cuda.h (CUresult): Add 'CUDA_ERROR_NO_DEVICE'.
	libgomp/
	* plugin/plugin-gcn.c (init_hsa_context): Add and handle
	'bool probe' parameter.  Adjust all users; errors during device
	probing are fatal.
	* plugin/plugin-nvptx.c (nvptx_get_num_devices): Aside from
	'CUDA_ERROR_NO_DEVICE', errors during device probing are fatal.
---
 include/cuda/cuda.h           |  1 +
 libgomp/plugin/plugin-gcn.c   | 14 ++++++++------
 libgomp/plugin/plugin-nvptx.c |  4 +++-
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h
index 114aba4e074..0dca4b3a5c0 100644
--- a/include/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -57,6 +57,7 @@ typedef enum {
   CUDA_ERROR_OUT_OF_MEMORY = 2,
   CUDA_ERROR_NOT_INITIALIZED = 3,
   CUDA_ERROR_DEINITIALIZED = 4,
+  CUDA_ERROR_NO_DEVICE = 100,
   CUDA_ERROR_INVALID_CONTEXT = 201,
   CUDA_ERROR_INVALID_HANDLE = 400,
   CUDA_ERROR_NOT_FOUND = 500,
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 7e141a85f31..2bea9157e9d 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1511,10 +1511,12 @@ assign_agent_ids (hsa_agent_t agent, void *data)
 }
 
 /* Initialize hsa_context if it has not already been done.
-   Return TRUE on success.  */
+   If !PROBE: returns TRUE on success.
+   If PROBE: returns TRUE on success or if the plugin/device shall be silently
+   ignored, and otherwise emits an error and returns FALSE.  */
 
 static bool
-init_hsa_context (void)
+init_hsa_context (bool probe)
 {
   hsa_status_t status;
   int agent_index = 0;
@@ -1529,7 +1531,7 @@ init_hsa_context (void)
 	GOMP_PLUGIN_fatal ("%s\n", msg);
       else
 	GCN_WARNING ("%s\n", msg);
-      return false;
+      return probe ? true : false;
     }
   status = hsa_fns.hsa_init_fn ();
   if (status != HSA_STATUS_SUCCESS)
@@ -3321,8 +3323,8 @@ GOMP_OFFLOAD_version (void)
 int
 GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
 {
-  if (!init_hsa_context ())
-    return 0;
+  if (!init_hsa_context (true))
+    exit (EXIT_FAILURE);
   /* Return -1 if no omp_requires_mask cannot be fulfilled but
      devices were present.  */
   if (hsa_context.agent_count > 0
@@ -3339,7 +3341,7 @@ GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
 bool
 GOMP_OFFLOAD_init_device (int n)
 {
-  if (!init_hsa_context ())
+  if (!init_hsa_context (false))
     return false;
   if (n >= hsa_context.agent_count)
     {
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 81b4a7f499a..ba92a3a48cb 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -622,12 +622,14 @@ nvptx_get_num_devices (void)
       CUresult r = CUDA_CALL_NOCHECK (cuInit, 0);
       /* This is not an error: e.g. we may have CUDA libraries installed but
          no devices available.  */
-      if (r != CUDA_SUCCESS)
+      if (r == CUDA_ERROR_NO_DEVICE)
 	{
 	  GOMP_PLUGIN_debug (0, "Disabling nvptx offloading; cuInit: %s\n",
 			     cuda_error (r));
 	  return 0;
 	}
+      else if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuInit error: %s", cuda_error (r));
     }
 
   CUDA_CALL_ASSERT (cuDeviceGetCount, &n);
-- 
2.34.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* amdgcn: additional gfx1030/gfx1100 support: adjust test cases (was: [PATCH] amdgcn: additional gfx1100 support)
  2024-01-24 12:43 [PATCH] amdgcn: additional gfx1100 support Andrew Stubbs
  2024-01-24 16:01 ` [patch] amdgcn: config.gcc - enable gfx1100 multilib; add gfx1100 to docs (was: [PATCH] amdgcn: additional gfx1100 support) Tobias Burnus
  2024-01-26  8:56 ` [PATCH] amdgcn: additional gfx1100 support Richard Biener
@ 2024-03-06 13:49 ` Thomas Schwinge
  2024-03-06 14:03   ` amdgcn: additional gfx1030/gfx1100 support: adjust test cases Andrew Stubbs
  2 siblings, 1 reply; 25+ messages in thread
From: Thomas Schwinge @ 2024-03-06 13:49 UTC (permalink / raw)
  To: Andrew Stubbs, gcc-patches; +Cc: Richard Biener

[-- Attachment #1: Type: text/plain, Size: 3109 bytes --]

Hi!

On 2024-01-24T12:43:04+0000, Andrew Stubbs <ams@baylibre.com> wrote:
> This [...]

... became commit 99890e15527f1f04caef95ecdd135c9f1a077f08
"amdgcn: additional gfx1030/gfx1100 support", and included the following:

> --- a/gcc/config/gcn/gcn-valu.md
> +++ b/gcc/config/gcn/gcn-valu.md
> @@ -3555,30 +3555,63 @@
>  ;; }}}
>  ;; {{{ Int/int conversions
>  
> +(define_code_iterator all_convert [truncate zero_extend sign_extend])
>  (define_code_iterator zero_convert [truncate zero_extend])
>  (define_code_attr convop [
>  	(sign_extend "extend")
>  	(zero_extend "zero_extend")
>  	(truncate "trunc")])
>  
> -(define_insn "<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>"
> +(define_expand "<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>"
> +  [(set (match_operand:V_INT_1REG 0 "register_operand"      "=v")
> +        (all_convert:V_INT_1REG
> +	  (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
> +  "")
> +
> +(define_insn "*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>"
>    [(set (match_operand:V_INT_1REG 0 "register_operand"      "=v")
>          (zero_convert:V_INT_1REG
>  	  (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
> -  ""
> +  "!TARGET_RDNA3"
>    "v_mov_b32_sdwa\t%0, %1 dst_sel:<V_INT_1REG:sdwa> dst_unused:UNUSED_PAD src0_sel:<V_INT_1REG_ALT:sdwa>"
>    [(set_attr "type" "vop_sdwa")
>     (set_attr "length" "8")])
>  
> -(define_insn "extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>"
> +(define_insn "extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>"
>    [(set (match_operand:V_INT_1REG 0 "register_operand"	    "=v")
>          (sign_extend:V_INT_1REG
>  	  (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
> -  ""
> +  "!TARGET_RDNA3"
>    "v_mov_b32_sdwa\t%0, sext(%1) src0_sel:<V_INT_1REG_ALT:sdwa>"
>    [(set_attr "type" "vop_sdwa")
>     (set_attr "length" "8")])
>  
> +(define_insn "*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>"
> +  [(set (match_operand:V_INT_1REG 0 "register_operand"      "=v")
> +        (all_convert:V_INT_1REG
> +	  (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
> +  "TARGET_RDNA3"
> +  {
> +    enum {extend, zero_extend, trunc};
> +    rtx shiftwidth = (<V_INT_1REG_ALT:SCALAR_MODE>mode == QImode
> +		      || <V_INT_1REG:SCALAR_MODE>mode == QImode
> +		      ? GEN_INT (24)
> +		      : <V_INT_1REG_ALT:SCALAR_MODE>mode == HImode
> +		        || <V_INT_1REG:SCALAR_MODE>mode == HImode
> +		      ? GEN_INT (16)
> +		      : NULL);
> +    operands[2] = shiftwidth;
> +
> +    if (!shiftwidth)
> +      return "v_mov_b32 %0, %1";
> +    else if (<convop> == extend || <convop> == trunc)
> +      return "v_lshlrev_b32\t%0, %2, %1\;v_ashrrev_i32\t%0, %2, %0";
> +    else
> +      return "v_lshlrev_b32\t%0, %2, %1\;v_lshrrev_b32\t%0, %2, %0";
> +  }
> +  [(set_attr "type" "mult")
> +   (set_attr "length" "8")])

OK to push the attached
"amdgcn: additional gfx1030/gfx1100 support: adjust test cases"?
Tested 'gcn.exp' for all '-march'es.


Grüße
 Thomas



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-amdgcn-additional-gfx1030-gfx1100-support-adjust-tes.patch --]
[-- Type: text/x-diff, Size: 3283 bytes --]

From 04b83e9aa19b02b9805e03f31db14325bb00e737 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <tschwinge@baylibre.com>
Date: Mon, 4 Mar 2024 10:40:39 +0100
Subject: [PATCH] amdgcn: additional gfx1030/gfx1100 support: adjust test cases

The "SDWA" changes in commit 99890e15527f1f04caef95ecdd135c9f1a077f08
"amdgcn: additional gfx1030/gfx1100 support" caused a few regressions:

    PASS: gcc.target/gcn/sram-ecc-3.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-3.c scan-assembler zero_extendv64qiv64si2

    PASS: gcc.target/gcn/sram-ecc-4.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-4.c scan-assembler zero_extendv64hiv64si2

    PASS: gcc.target/gcn/sram-ecc-7.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-7.c scan-assembler zero_extendv64qiv64si2

    PASS: gcc.target/gcn/sram-ecc-8.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-8.c scan-assembler zero_extendv64hiv64si2

Those test cases need corresponding adjustment.

	gcc/testsuite/
	* gcc.target/gcn/sram-ecc-3.c: Adjust.
	* gcc.target/gcn/sram-ecc-4.c: Likewise.
	* gcc.target/gcn/sram-ecc-7.c: Likewise.
	* gcc.target/gcn/sram-ecc-8.c: Likewise.
---
 gcc/testsuite/gcc.target/gcn/sram-ecc-3.c | 2 +-
 gcc/testsuite/gcc.target/gcn/sram-ecc-4.c | 2 +-
 gcc/testsuite/gcc.target/gcn/sram-ecc-7.c | 2 +-
 gcc/testsuite/gcc.target/gcn/sram-ecc-8.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/gcn/sram-ecc-3.c b/gcc/testsuite/gcc.target/gcn/sram-ecc-3.c
index 692d4578b66..bc89e3542d2 100644
--- a/gcc/testsuite/gcc.target/gcn/sram-ecc-3.c
+++ b/gcc/testsuite/gcc.target/gcn/sram-ecc-3.c
@@ -18,4 +18,4 @@ f ()
     a[n] = b[n];
 }
 
-/* { dg-final { scan-assembler "zero_extendv64qiv64si2" } } */
+/* { dg-final { scan-assembler "(\\\*zero_extendv64qiv64si_sdwa|\\\*zero_extendv64qiv64si_shift)" } } */
diff --git a/gcc/testsuite/gcc.target/gcn/sram-ecc-4.c b/gcc/testsuite/gcc.target/gcn/sram-ecc-4.c
index 61b8d552759..ff7e2d0bda5 100644
--- a/gcc/testsuite/gcc.target/gcn/sram-ecc-4.c
+++ b/gcc/testsuite/gcc.target/gcn/sram-ecc-4.c
@@ -18,4 +18,4 @@ f ()
     a[n] = b[n];
 }
 
-/* { dg-final { scan-assembler "zero_extendv64hiv64si2" } } */
+/* { dg-final { scan-assembler "(\\\*zero_extendv64hiv64si_sdwa|\\\*zero_extendv64hiv64si_shift)" } } */
diff --git a/gcc/testsuite/gcc.target/gcn/sram-ecc-7.c b/gcc/testsuite/gcc.target/gcn/sram-ecc-7.c
index 9d0ce6f6b5a..8d363970ffb 100644
--- a/gcc/testsuite/gcc.target/gcn/sram-ecc-7.c
+++ b/gcc/testsuite/gcc.target/gcn/sram-ecc-7.c
@@ -18,4 +18,4 @@ f ()
     a[n] = b[n];
 }
 
-/* { dg-final { scan-assembler "zero_extendv64qiv64si2" } } */
+/* { dg-final { scan-assembler "(\\\*zero_extendv64qiv64si_sdwa|\\\*zero_extendv64qiv64si_shift)" } } */
diff --git a/gcc/testsuite/gcc.target/gcn/sram-ecc-8.c b/gcc/testsuite/gcc.target/gcn/sram-ecc-8.c
index 76e02882798..a2b25076ed1 100644
--- a/gcc/testsuite/gcc.target/gcn/sram-ecc-8.c
+++ b/gcc/testsuite/gcc.target/gcn/sram-ecc-8.c
@@ -18,4 +18,4 @@ f ()
     a[n] = b[n];
 }
 
-/* { dg-final { scan-assembler "zero_extendv64hiv64si2" } } */
+/* { dg-final { scan-assembler "(\\\*zero_extendv64hiv64si_sdwa|\\\*zero_extendv64hiv64si_shift)" } } */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: amdgcn: additional gfx1030/gfx1100 support: adjust test cases
  2024-03-06 13:49 ` amdgcn: additional gfx1030/gfx1100 support: adjust test cases (was: [PATCH] amdgcn: additional gfx1100 support) Thomas Schwinge
@ 2024-03-06 14:03   ` Andrew Stubbs
  0 siblings, 0 replies; 25+ messages in thread
From: Andrew Stubbs @ 2024-03-06 14:03 UTC (permalink / raw)
  To: Thomas Schwinge, gcc-patches; +Cc: Richard Biener

On 06/03/2024 13:49, Thomas Schwinge wrote:
> Hi!
> 
> On 2024-01-24T12:43:04+0000, Andrew Stubbs <ams@baylibre.com> wrote:
>> This [...]
> 
> ... became commit 99890e15527f1f04caef95ecdd135c9f1a077f08
> "amdgcn: additional gfx1030/gfx1100 support", and included the following:
> 
>> --- a/gcc/config/gcn/gcn-valu.md
>> +++ b/gcc/config/gcn/gcn-valu.md
>> @@ -3555,30 +3555,63 @@
>>   ;; }}}
>>   ;; {{{ Int/int conversions
>>   
>> +(define_code_iterator all_convert [truncate zero_extend sign_extend])
>>   (define_code_iterator zero_convert [truncate zero_extend])
>>   (define_code_attr convop [
>>   	(sign_extend "extend")
>>   	(zero_extend "zero_extend")
>>   	(truncate "trunc")])
>>   
>> -(define_insn "<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>"
>> +(define_expand "<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>"
>> +  [(set (match_operand:V_INT_1REG 0 "register_operand"      "=v")
>> +        (all_convert:V_INT_1REG
>> +	  (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
>> +  "")
>> +
>> +(define_insn "*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>"
>>     [(set (match_operand:V_INT_1REG 0 "register_operand"      "=v")
>>           (zero_convert:V_INT_1REG
>>   	  (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
>> -  ""
>> +  "!TARGET_RDNA3"
>>     "v_mov_b32_sdwa\t%0, %1 dst_sel:<V_INT_1REG:sdwa> dst_unused:UNUSED_PAD src0_sel:<V_INT_1REG_ALT:sdwa>"
>>     [(set_attr "type" "vop_sdwa")
>>      (set_attr "length" "8")])
>>   
>> -(define_insn "extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>"
>> +(define_insn "extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>"
>>     [(set (match_operand:V_INT_1REG 0 "register_operand"	    "=v")
>>           (sign_extend:V_INT_1REG
>>   	  (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
>> -  ""
>> +  "!TARGET_RDNA3"
>>     "v_mov_b32_sdwa\t%0, sext(%1) src0_sel:<V_INT_1REG_ALT:sdwa>"
>>     [(set_attr "type" "vop_sdwa")
>>      (set_attr "length" "8")])
>>   
>> +(define_insn "*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>"
>> +  [(set (match_operand:V_INT_1REG 0 "register_operand"      "=v")
>> +        (all_convert:V_INT_1REG
>> +	  (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
>> +  "TARGET_RDNA3"
>> +  {
>> +    enum {extend, zero_extend, trunc};
>> +    rtx shiftwidth = (<V_INT_1REG_ALT:SCALAR_MODE>mode == QImode
>> +		      || <V_INT_1REG:SCALAR_MODE>mode == QImode
>> +		      ? GEN_INT (24)
>> +		      : <V_INT_1REG_ALT:SCALAR_MODE>mode == HImode
>> +		        || <V_INT_1REG:SCALAR_MODE>mode == HImode
>> +		      ? GEN_INT (16)
>> +		      : NULL);
>> +    operands[2] = shiftwidth;
>> +
>> +    if (!shiftwidth)
>> +      return "v_mov_b32 %0, %1";
>> +    else if (<convop> == extend || <convop> == trunc)
>> +      return "v_lshlrev_b32\t%0, %2, %1\;v_ashrrev_i32\t%0, %2, %0";
>> +    else
>> +      return "v_lshlrev_b32\t%0, %2, %1\;v_lshrrev_b32\t%0, %2, %0";
>> +  }
>> +  [(set_attr "type" "mult")
>> +   (set_attr "length" "8")])
> 
> OK to push the attached
> "amdgcn: additional gfx1030/gfx1100 support: adjust test cases"?
> Tested 'gcn.exp' for all '-march'es.

LGTM.

Andrew


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2024-03-08 10:34 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-24 12:43 [PATCH] amdgcn: additional gfx1100 support Andrew Stubbs
2024-01-24 16:01 ` [patch] amdgcn: config.gcc - enable gfx1100 multilib; add gfx1100 to docs (was: [PATCH] amdgcn: additional gfx1100 support) Tobias Burnus
2024-01-26 12:26   ` [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs (was: [patch] amdgcn: config.gcc - enable gfx1100 multilib; add gfx1100 to docs) Tobias Burnus
2024-01-26 12:32     ` [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs Tobias Burnus
2024-01-26 12:40       ` Richard Biener
2024-01-26 12:59         ` Tobias Burnus
2024-01-26 16:21       ` Thomas Schwinge
2024-01-26 16:36         ` Richard Biener
2024-01-26 16:45         ` [patch] install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled (was: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs) Tobias Burnus
2024-01-29 10:01           ` [patch] install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled Andrew Stubbs
2024-01-26  8:56 ` [PATCH] amdgcn: additional gfx1100 support Richard Biener
2024-01-26  9:45   ` Richard Biener
2024-01-26 10:19     ` Andrew Stubbs
2024-01-26 10:22       ` Richard Biener
2024-01-26 10:31         ` Andrew Stubbs
2024-02-01 14:41     ` libgomp GCN gfx1030/gfx1100 offloading status (was: [PATCH] amdgcn: additional gfx1100 support) Thomas Schwinge
2024-02-01 14:49       ` Richard Biener
2024-02-21 12:34         ` Stabilizing flaky libgomp GCN target/offloading testing (was: libgomp GCN gfx1030/gfx1100 offloading status) Thomas Schwinge
2024-02-21 16:32           ` Richard Biener
2024-03-06 12:09             ` Stabilize flaky GCN target/offloading testing Thomas Schwinge
2024-03-06 12:39               ` Andrew Stubbs
2024-03-06 13:29                 ` Richard Biener
2024-03-08 10:34           ` GCN, nvptx: Errors during device probing are fatal (was: Stabilizing flaky libgomp GCN target/offloading testing) Thomas Schwinge
2024-03-06 13:49 ` amdgcn: additional gfx1030/gfx1100 support: adjust test cases (was: [PATCH] amdgcn: additional gfx1100 support) Thomas Schwinge
2024-03-06 14:03   ` amdgcn: additional gfx1030/gfx1100 support: adjust test cases Andrew Stubbs

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).