* [PATCH] amdgcn: additional gfx1100 support @ 2024-01-24 12:43 Andrew Stubbs 2024-01-24 16:01 ` [patch] amdgcn: config.gcc - enable gfx1100 multilib; add gfx1100 to docs (was: [PATCH] amdgcn: additional gfx1100 support) Tobias Burnus ` (2 more replies) 0 siblings, 3 replies; 25+ messages in thread From: Andrew Stubbs @ 2024-01-24 12:43 UTC (permalink / raw) To: gcc-patches; +Cc: pa, rguenther This is enough to get gfx1100 working for most purposes, on top of the patch that Tobias committed a week or so ago; there are still some test failures to investigate, and probably some tuning to do. It might also get gfx1030 working too. @Richi, could you test it, please? I can't test the other multilibs right now. @PA, can you test it please? I can self-approve the patch, but I'll hold off the commit until the test results come back. Andrew gcc/ChangeLog: * config/gcn/gcn-opts.h (TARGET_PACKED_WORK_ITEMS): Add TARGET_RDNA3. * config/gcn/gcn-valu.md (all_convert): New iterator. (<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): New define_expand, and rename the old one to ... (*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): ... this. (extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): Likewise, to ... (extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): .. this. (*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>): New. * config/gcn/gcn.cc (gcn_global_address_p): Use "offsetbits" correctly. (gcn_hsa_declare_function_name): Update the vgpr counting for gfx1100. * config/gcn/gcn.md (<u>mulhisi3): Disable on RDNA3. (<u>mulqihi3_scalar): Likewise. libgcc/ChangeLog: * config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Handle RDNA3. libgomp/ChangeLog: * config/gcn/time.c (RTC_TICKS): Configure RDNA3. (omp_get_wtime): Add RDNA3-compatible variant. * plugin/plugin-gcn.c (max_isa_vgprs): Tune for gfx1030 and gfx1100. Signed-off-by: Andrew Stubbs <ams@baylibre.com> --- gcc/config/gcn/gcn-opts.h | 2 +- gcc/config/gcn/gcn-valu.md | 41 ++++++++++++++++++++++++++++--- gcc/config/gcn/gcn.cc | 31 ++++++++++++++++------- gcc/config/gcn/gcn.md | 4 +-- libgcc/config/gcn/amdgcn_veclib.h | 2 +- libgomp/config/gcn/time.c | 10 ++++++++ libgomp/plugin/plugin-gcn.c | 6 +++-- 7 files changed, 77 insertions(+), 19 deletions(-) diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h index 79fbda3ab25..6be2c9204fa 100644 --- a/gcc/config/gcn/gcn-opts.h +++ b/gcc/config/gcn/gcn-opts.h @@ -62,7 +62,7 @@ extern enum gcn_isa { #define TARGET_M0_LDS_LIMIT (TARGET_GCN3) -#define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS) +#define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS || TARGET_RDNA3) #define TARGET_XNACK (flag_xnack != HSACO_ATTR_OFF) diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md index 3d5b6271ee6..cd027f8b369 100644 --- a/gcc/config/gcn/gcn-valu.md +++ b/gcc/config/gcn/gcn-valu.md @@ -3555,30 +3555,63 @@ ;; }}} ;; {{{ Int/int conversions +(define_code_iterator all_convert [truncate zero_extend sign_extend]) (define_code_iterator zero_convert [truncate zero_extend]) (define_code_attr convop [ (sign_extend "extend") (zero_extend "zero_extend") (truncate "trunc")]) -(define_insn "<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>" +(define_expand "<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>" + [(set (match_operand:V_INT_1REG 0 "register_operand" "=v") + (all_convert:V_INT_1REG + (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))] + "") + +(define_insn "*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>" [(set (match_operand:V_INT_1REG 0 "register_operand" "=v") (zero_convert:V_INT_1REG (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))] - "" + "!TARGET_RDNA3" "v_mov_b32_sdwa\t%0, %1 dst_sel:<V_INT_1REG:sdwa> dst_unused:UNUSED_PAD src0_sel:<V_INT_1REG_ALT:sdwa>" [(set_attr "type" "vop_sdwa") (set_attr "length" "8")]) -(define_insn "extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>" +(define_insn "extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>" [(set (match_operand:V_INT_1REG 0 "register_operand" "=v") (sign_extend:V_INT_1REG (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))] - "" + "!TARGET_RDNA3" "v_mov_b32_sdwa\t%0, sext(%1) src0_sel:<V_INT_1REG_ALT:sdwa>" [(set_attr "type" "vop_sdwa") (set_attr "length" "8")]) +(define_insn "*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>" + [(set (match_operand:V_INT_1REG 0 "register_operand" "=v") + (all_convert:V_INT_1REG + (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))] + "TARGET_RDNA3" + { + enum {extend, zero_extend, trunc}; + rtx shiftwidth = (<V_INT_1REG_ALT:SCALAR_MODE>mode == QImode + || <V_INT_1REG:SCALAR_MODE>mode == QImode + ? GEN_INT (24) + : <V_INT_1REG_ALT:SCALAR_MODE>mode == HImode + || <V_INT_1REG:SCALAR_MODE>mode == HImode + ? GEN_INT (16) + : NULL); + operands[2] = shiftwidth; + + if (!shiftwidth) + return "v_mov_b32 %0, %1"; + else if (<convop> == extend || <convop> == trunc) + return "v_lshlrev_b32\t%0, %2, %1\;v_ashrrev_i32\t%0, %2, %0"; + else + return "v_lshlrev_b32\t%0, %2, %1\;v_lshrrev_b32\t%0, %2, %0"; + } + [(set_attr "type" "mult") + (set_attr "length" "8")]) + ;; GCC can already do these for scalar types, but not for vector types. ;; Unfortunately you can't just do SUBREG on a vector to select the low part, ;; so there must be a few tricks here. diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index e668ce7c69e..e80de2ce056 100644 --- a/gcc/config/gcn/gcn.cc +++ b/gcc/config/gcn/gcn.cc @@ -1597,8 +1597,8 @@ gcn_global_address_p (rtx addr) rtx offset = XEXP (addr, 1); int offsetbits = (TARGET_RDNA2_PLUS ? 11 : 12); bool immediate_p = (CONST_INT_P (offset) - && INTVAL (offset) >= -(1 << 12) - && INTVAL (offset) < (1 << 12)); + && INTVAL (offset) >= -(1 << offsetbits) + && INTVAL (offset) < (1 << offsetbits)); if ((gcn_address_register_p (base, DImode, false) || gcn_vec_address_register_p (base, DImode, false)) @@ -6597,8 +6597,10 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, if (df_regs_ever_live_p (FIRST_AVGPR_REG + avgpr)) break; avgpr++; - vgpr = (vgpr + 3) & ~3; - avgpr = (avgpr + 3) & ~3; + + /* The main function epilogue uses v8, but df doesn't see that. */ + if (vgpr < 9) + vgpr = 9; if (!leaf_function_p ()) { @@ -6611,9 +6613,18 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, avgpr = MAX_NORMAL_AVGPR_COUNT; } - /* The gfx90a accum_offset field can't represent 0 registers. */ - if (gcn_arch == PROCESSOR_GFX90a && vgpr < 4) - vgpr = 4; + /* SIMD32 devices count double in wavefront64 mode. */ + if (TARGET_RDNA2_PLUS) + vgpr *= 2; + + /* Round up to the allocation block size. */ + int vgpr_block_size = (TARGET_RDNA3 ? 12 + : TARGET_RDNA2_PLUS || TARGET_CDNA2_PLUS ? 8 + : 4); + if (vgpr % vgpr_block_size) + vgpr += vgpr_block_size - (vgpr % vgpr_block_size); + if (avgpr % vgpr_block_size) + avgpr += vgpr_block_size - (avgpr % vgpr_block_size); fputs ("\t.rodata\n" "\t.p2align\t6\n" @@ -6714,12 +6725,14 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, " .private_segment_fixed_size: 0\n" " .wavefront_size: 64\n" " .sgpr_count: %i\n" - " .vgpr_count: %i\n" + " .vgpr_count: %i%s\n" " .max_flat_workgroup_size: 1024\n", cfun->machine->kernarg_segment_byte_size, cfun->machine->kernarg_segment_alignment, LDS_SIZE, - sgpr, next_free_vgpr); + sgpr, next_free_vgpr, + (TARGET_RDNA2_PLUS ? " ; wavefrontsize64 counts double on SIMD32" + : "")); if (gcn_arch == PROCESSOR_GFX90a || gcn_arch == PROCESSOR_GFX908) fprintf (file, " .agpr_count: %i\n", avgpr); fputs (" .end_amdgpu_metadata\n", file); diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md index 492b833e255..1f3c692b7a6 100644 --- a/gcc/config/gcn/gcn.md +++ b/gcc/config/gcn/gcn.md @@ -1618,7 +1618,7 @@ (mult:SI (any_extend:SI (match_operand:HI 1 "register_operand" "%v")) (any_extend:SI (match_operand:HI 2 "register_operand" " v"))))] - "" + "!TARGET_RDNA3" "v_mul_<iu>32_<iu>24_sdwa\t%0, %<e>1, %<e>2 src0_sel:WORD_0 src1_sel:WORD_0" [(set_attr "type" "vop_sdwa") (set_attr "length" "8")]) @@ -1628,7 +1628,7 @@ (mult:HI (any_extend:HI (match_operand:QI 1 "register_operand" "%v")) (any_extend:HI (match_operand:QI 2 "register_operand" " v"))))] - "" + "!TARGET_RDNA3" "v_mul_<iu>32_<iu>24_sdwa\t%0, %<e>1, %<e>2 src0_sel:BYTE_0 src1_sel:BYTE_0" [(set_attr "type" "vop_sdwa") (set_attr "length" "8")]) diff --git a/libgcc/config/gcn/amdgcn_veclib.h b/libgcc/config/gcn/amdgcn_veclib.h index 821f6386dd6..d268c6cac16 100644 --- a/libgcc/config/gcn/amdgcn_veclib.h +++ b/libgcc/config/gcn/amdgcn_veclib.h @@ -230,7 +230,7 @@ do { \ #if defined (__GCN3__) || defined (__GCN5__) \ || defined (__CDNA1__) || defined (__CDNA2__) \ - || defined (__RDNA2__) + || defined (__RDNA2__) || defined (__RDNA3__) #define CDNA3_PLUS 0 #else #define CDNA3_PLUS 1 diff --git a/libgomp/config/gcn/time.c b/libgomp/config/gcn/time.c index 30a0d0188e4..efcd04f5f43 100644 --- a/libgomp/config/gcn/time.c +++ b/libgomp/config/gcn/time.c @@ -30,15 +30,25 @@ /* According to AMD: dGPU RTC is 27MHz AGPU RTC is 100MHz + RDNA3 ISA manual states "typically 100MHz" FIXME: DTRT on an APU. */ +#ifdef __RDNA3__ +#define RTC_TICKS (1.0 / 100000000.0) /* 100MHz */ +#else #define RTC_TICKS (1.0 / 27000000.0) /* 27MHz */ +#endif double omp_get_wtime (void) { uint64_t clock; +#ifdef __RDNA3__ + asm ("s_sendmsg_rtn_b64 %0 0x83 ;Get REALTIME\n\t" + "s_waitcnt 0" : "=r" (clock)); +#else asm ("s_memrealtime %0\n\t" "s_waitcnt 0" : "=r" (clock)); +#endif return clock * RTC_TICKS; } diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index 0339848451e..db28781dedb 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -1741,11 +1741,13 @@ max_isa_vgprs (int isa) case EF_AMDGPU_MACH_AMDGCN_GFX900: case EF_AMDGPU_MACH_AMDGCN_GFX906: case EF_AMDGPU_MACH_AMDGCN_GFX908: - case EF_AMDGPU_MACH_AMDGCN_GFX1030: - case EF_AMDGPU_MACH_AMDGCN_GFX1100: return 256; case EF_AMDGPU_MACH_AMDGCN_GFX90a: return 512; + case EF_AMDGPU_MACH_AMDGCN_GFX1030: + return 512; /* 512 SIMD32 = 256 wavefrontsize64. */ + case EF_AMDGPU_MACH_AMDGCN_GFX1100: + return 1536; /* 1536 SIMD32 = 768 wavefrontsize64. */ } GOMP_PLUGIN_fatal ("unhandled ISA in max_isa_vgprs"); } -- 2.41.0 ^ permalink raw reply [flat|nested] 25+ messages in thread
* [patch] amdgcn: config.gcc - enable gfx1100 multilib; add gfx1100 to docs (was: [PATCH] amdgcn: additional gfx1100 support) 2024-01-24 12:43 [PATCH] amdgcn: additional gfx1100 support Andrew Stubbs @ 2024-01-24 16:01 ` Tobias Burnus 2024-01-26 12:26 ` [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs (was: [patch] amdgcn: config.gcc - enable gfx1100 multilib; add gfx1100 to docs) Tobias Burnus 2024-01-26 8:56 ` [PATCH] amdgcn: additional gfx1100 support Richard Biener 2024-03-06 13:49 ` amdgcn: additional gfx1030/gfx1100 support: adjust test cases (was: [PATCH] amdgcn: additional gfx1100 support) Thomas Schwinge 2 siblings, 1 reply; 25+ messages in thread From: Tobias Burnus @ 2024-01-24 16:01 UTC (permalink / raw) To: Andrew Stubbs, gcc-patches; +Cc: pa, rguenther [-- Attachment #1: Type: text/plain, Size: 986 bytes --] This patch obviously depends on Andrew's; he wrote in the previous email of this thread regarding his patch: Andrew Stubbs wrote: > This is enough to get gfx1100 working for most purposes, on top of the > patch that Tobias committed a week or so ago; there are still some test > failures to investigate, and probably some tuning to do. > > It might also get gfx1030 working too. @Richi, could you test it, > please? > > I can't test the other multilibs right now. @PA, can you test it please? > > I can self-approve the patch, but I'll hold off the commit until the > test results come back. Okay to enable gfx1100 multilib building and to document gfx1100 in the manual? (I mean, obviously, only after Andrew committed his patch. For gfx1030, we might eventually also enable gfx1030 multilib support; if Richard confirms that collaterally fixes gfx1030, we probably should - and depending on the number/kinds of testsuite, we could then document it or not, I guess.) Tobias [-- Attachment #2: gfx1100-doc.diff --] [-- Type: text/x-patch, Size: 4657 bytes --] amdgcn: config.gcc - enable gfx1100 multilib; add gfx1100 to docs gcc/ChangeLog: * config.gcc (amdgcn-*-*): Add gfx1100 to TM_MULTILIB_CONFIG. * doc/install.texi (Configuration amdgcn-*-*): Mention gfx1100. * doc/invoke.texi (AMD GCN Options): Add gfx1100 to -march/-mtune. libgomp/ChangeLog: * testsuite/libgomp.c/declare-variant-4.h: Add variant functions for gfx1030 and gfx1100. * testsuite/libgomp.c/declare-variant-4-gfx1100.c: New test. Signed-off-by: Tobias Burnus <tburnus@baylibre.com> gcc/config.gcc | 2 +- gcc/doc/install.texi | 12 ++++++------ gcc/doc/invoke.texi | 3 +++ libgomp/testsuite/libgomp.c/declare-variant-4-gfx1100.c | 8 ++++++++ libgomp/testsuite/libgomp.c/declare-variant-4.h | 16 ++++++++++++++++ 5 files changed, 34 insertions(+), 7 deletions(-) diff --git a/gcc/config.gcc b/gcc/config.gcc index b2d7d7dd475..2343e98ebe6 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -4564,7 +4564,7 @@ case "${target}" in TM_MULTILIB_CONFIG= ;; xdefault | xyes) - TM_MULTILIB_CONFIG=`echo "gfx900,gfx906,gfx908,gfx90a" | sed "s/${with_arch},\?//;s/,$//"` + TM_MULTILIB_CONFIG=`echo "gfx900,gfx906,gfx908,gfx90a,gfx1100" | sed "s/${with_arch},\?//;s/,$//"` ;; *) TM_MULTILIB_CONFIG="${with_multilib_list}" diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi index 71593919389..5304ebd36a9 100644 --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -1258,12 +1258,12 @@ default set of libraries is selected based on the value of @item amdgcn*-*-* @var{list} is a comma separated list of ISA names (allowed values: @code{fiji}, -@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}). It ought not -include the name of the default ISA, specified via @option{--with-arch}. If -@var{list} is empty, then there will be no multilibs and only the default -run-time library will be built. If @var{list} is @code{default} or -@option{--with-multilib-list=} is not specified, then the default set of -libraries is selected. +@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}, @code{gfx1100}). +It ought not include the name of the default ISA, specified +via @option{--with-arch}. If @var{list} is empty, then there will be no +multilibs and only the default run-time library will be built. If @var{list} +is @code{default} or @option{--with-multilib-list=} is not specified, then +the default set of libraries is selected. @item arm*-*-* @var{list} is a comma separated list of @code{aprofile} and diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 5f904cf1ef2..d1b2c284e2b 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -21723,6 +21723,9 @@ Compile for CDNA1 Instinct MI100 series devices (gfx908). @item gfx90a Compile for CDNA2 Instinct MI200 series devices (gfx90a). +@item gfx1100 +Compile for RDNA3 gfx1100 devices (GFX11 series). + @end table @opindex msram-ecc diff --git a/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1100.c b/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1100.c new file mode 100644 index 00000000000..6ade35224cc --- /dev/null +++ b/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1100.c @@ -0,0 +1,8 @@ +/* { dg-do link { target { offload_target_amdgcn } } } */ +/* { dg-additional-options -foffload=amdgcn-amdhsa } */ +/* { dg-additional-options -foffload=-march=gfx1100 } */ +/* { dg-additional-options "-foffload=-fdump-tree-optimized" } */ + +#include "declare-variant-4.h" + +/* { dg-final { only_for_offload_target amdgcn-amdhsa scan-offload-tree-dump "= gfx1100 \\(\\);" "optimized" } } */ diff --git a/libgomp/testsuite/libgomp.c/declare-variant-4.h b/libgomp/testsuite/libgomp.c/declare-variant-4.h index a70352430c2..393a5e295cc 100644 --- a/libgomp/testsuite/libgomp.c/declare-variant-4.h +++ b/libgomp/testsuite/libgomp.c/declare-variant-4.h @@ -35,6 +35,20 @@ gfx90a (void) return 0x90a; } +__attribute__ ((noipa)) +int +gfx1030 (void) +{ + return 0x1030; +} + +__attribute__ ((noipa)) +int +gfx1100 (void) +{ + return 0x1100; +} + #ifdef USE_FIJI_FOR_GFX803 #pragma omp declare variant(gfx803) match(device = {isa("fiji")}) #else @@ -44,6 +58,8 @@ gfx90a (void) #pragma omp declare variant(gfx906) match(device = {isa("gfx906")}) #pragma omp declare variant(gfx908) match(device = {isa("gfx908")}) #pragma omp declare variant(gfx90a) match(device = {isa("gfx90a")}) +#pragma omp declare variant(gfx90a) match(device = {isa("gfx1030")}) +#pragma omp declare variant(gfx90a) match(device = {isa("gfx1100")}) __attribute__ ((noipa)) int f (void) ^ permalink raw reply [flat|nested] 25+ messages in thread
* [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs (was: [patch] amdgcn: config.gcc - enable gfx1100 multilib; add gfx1100 to docs) 2024-01-24 16:01 ` [patch] amdgcn: config.gcc - enable gfx1100 multilib; add gfx1100 to docs (was: [PATCH] amdgcn: additional gfx1100 support) Tobias Burnus @ 2024-01-26 12:26 ` Tobias Burnus 2024-01-26 12:32 ` [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs Tobias Burnus 0 siblings, 1 reply; 25+ messages in thread From: Tobias Burnus @ 2024-01-26 12:26 UTC (permalink / raw) To: Andrew Stubbs, gcc-patches, rguenther Hi all, hi Richard & Andrew, Am 24.01.24 um 17:01 schrieb Tobias Burnus: > This patch obviously depends on Andrew's; he wrote in the previous > email of this thread regarding his patch: > > Andrew Stubbs wrote: >> This is enough to get gfx1100 working for most purposes, on top of the >> patch that Tobias committed a week or so ago; there are still some test >> failures to investigate, and probably some tuning to do. >> >> It might also get gfx1030 working too. @Richi, could you test it, >> please? If gfx1030 doesn't work, I would propose my patch previously in the thread, https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643835.html This patch assumes that both gfx1100 and gfx1030 are now working: > Okay to enable gfx1100 multilib building and to document gfx1100 in > the manual? and, with this patch, additionally gfx1030? OK for mainline, once Andrew's patch is in? Tobias ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs 2024-01-26 12:26 ` [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs (was: [patch] amdgcn: config.gcc - enable gfx1100 multilib; add gfx1100 to docs) Tobias Burnus @ 2024-01-26 12:32 ` Tobias Burnus 2024-01-26 12:40 ` Richard Biener 2024-01-26 16:21 ` Thomas Schwinge 0 siblings, 2 replies; 25+ messages in thread From: Tobias Burnus @ 2024-01-26 12:32 UTC (permalink / raw) To: Andrew Stubbs, gcc-patches, rguenther [-- Attachment #1: Type: text/plain, Size: 974 bytes --] Now with patch ... Tobias Burnus wrote: > Hi all, hi Richard & Andrew, > > Am 24.01.24 um 17:01 schrieb Tobias Burnus: >> This patch obviously depends on Andrew's; he wrote in the previous >> email of this thread regarding his patch: >> >> Andrew Stubbs wrote: >>> This is enough to get gfx1100 working for most purposes, on top of the >>> patch that Tobias committed a week or so ago; there are still some test >>> failures to investigate, and probably some tuning to do. >>> >>> It might also get gfx1030 working too. @Richi, could you test it, >>> please? > > If gfx1030 doesn't work, I would propose my patch previously in the > thread, > https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643835.html > > This patch assumes that both gfx1100 and gfx1030 are now working: > >> Okay to enable gfx1100 multilib building and to document gfx1100 in >> the manual? > and, with this patch, additionally gfx1030? > > OK for mainline, once Andrew's patch is in? Tobias [-- Attachment #2: gfx1100-gfx1030-doc.diff --] [-- Type: text/x-patch, Size: 5599 bytes --] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs gcc/ChangeLog: * config.gcc (amdgcn-*-*): Add gfx1030 and gfx1100 to TM_MULTILIB_CONFIG. * doc/install.texi (Configuration amdgcn-*-*): Mention gfx1030/gfx1100. * doc/invoke.texi (AMD GCN Options): Add gfx1030 and gfx1100 to -march/-mtune. libgomp/ChangeLog: * testsuite/libgomp.c/declare-variant-4.h: Add variant functions for gfx1030 and gfx1100. * testsuite/libgomp.c/declare-variant-4-gfx1030.c: New test. * testsuite/libgomp.c/declare-variant-4-gfx1100.c: New test. Signed-off-by: Tobias Burnus <tburnus@baylibre.com> gcc/config.gcc | 2 +- gcc/doc/install.texi | 12 ++++++------ gcc/doc/invoke.texi | 6 ++++++ libgomp/testsuite/libgomp.c/declare-variant-4-gfx1030.c | 8 ++++++++ libgomp/testsuite/libgomp.c/declare-variant-4-gfx1100.c | 8 ++++++++ libgomp/testsuite/libgomp.c/declare-variant-4.h | 16 ++++++++++++++++ 6 files changed, 45 insertions(+), 7 deletions(-) diff --git a/gcc/config.gcc b/gcc/config.gcc index b2d7d7dd475..a0f9c672308 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -4564,7 +4564,7 @@ case "${target}" in TM_MULTILIB_CONFIG= ;; xdefault | xyes) - TM_MULTILIB_CONFIG=`echo "gfx900,gfx906,gfx908,gfx90a" | sed "s/${with_arch},\?//;s/,$//"` + TM_MULTILIB_CONFIG=`echo "gfx900,gfx906,gfx908,gfx90a,gfx1030,gfx1100" | sed "s/${with_arch},\?//;s/,$//"` ;; *) TM_MULTILIB_CONFIG="${with_multilib_list}" diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi index 71593919389..5747b5a12fe 100644 --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -1258,12 +1258,12 @@ default set of libraries is selected based on the value of @item amdgcn*-*-* @var{list} is a comma separated list of ISA names (allowed values: @code{fiji}, -@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}). It ought not -include the name of the default ISA, specified via @option{--with-arch}. If -@var{list} is empty, then there will be no multilibs and only the default -run-time library will be built. If @var{list} is @code{default} or -@option{--with-multilib-list=} is not specified, then the default set of -libraries is selected. +@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}, @code{gfx1030}, +@code{gfx1100}). It ought not include the name of the default ISA, specified +via @option{--with-arch}. If @var{list} is empty, then there will be no +multilibs and only the default run-time library will be built. If @var{list} +is @code{default} or @option{--with-multilib-list=} is not specified, then +the default set of libraries is selected. @item arm*-*-* @var{list} is a comma separated list of @code{aprofile} and diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 6ec56493e59..64c5ed2ffde 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -21739,6 +21739,12 @@ Compile for CDNA1 Instinct MI100 series devices (gfx908). @item gfx90a Compile for CDNA2 Instinct MI200 series devices (gfx90a). +@item gfx1030 +Compile for RDNA2 gfx1030 devices (GFX10 series). + +@item gfx1100 +Compile for RDNA3 gfx1100 devices (GFX11 series). + @end table @opindex msram-ecc diff --git a/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1030.c b/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1030.c new file mode 100644 index 00000000000..d98d5ef54ec --- /dev/null +++ b/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1030.c @@ -0,0 +1,8 @@ +/* { dg-do link { target { offload_target_amdgcn } } } */ +/* { dg-additional-options -foffload=amdgcn-amdhsa } */ +/* { dg-additional-options -foffload=-march=gfx1030 } */ +/* { dg-additional-options "-foffload=-fdump-tree-optimized" } */ + +#include "declare-variant-4.h" + +/* { dg-final { only_for_offload_target amdgcn-amdhsa scan-offload-tree-dump "= gfx1030 \\(\\);" "optimized" } } */ diff --git a/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1100.c b/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1100.c new file mode 100644 index 00000000000..6ade35224cc --- /dev/null +++ b/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1100.c @@ -0,0 +1,8 @@ +/* { dg-do link { target { offload_target_amdgcn } } } */ +/* { dg-additional-options -foffload=amdgcn-amdhsa } */ +/* { dg-additional-options -foffload=-march=gfx1100 } */ +/* { dg-additional-options "-foffload=-fdump-tree-optimized" } */ + +#include "declare-variant-4.h" + +/* { dg-final { only_for_offload_target amdgcn-amdhsa scan-offload-tree-dump "= gfx1100 \\(\\);" "optimized" } } */ diff --git a/libgomp/testsuite/libgomp.c/declare-variant-4.h b/libgomp/testsuite/libgomp.c/declare-variant-4.h index a70352430c2..393a5e295cc 100644 --- a/libgomp/testsuite/libgomp.c/declare-variant-4.h +++ b/libgomp/testsuite/libgomp.c/declare-variant-4.h @@ -35,6 +35,20 @@ gfx90a (void) return 0x90a; } +__attribute__ ((noipa)) +int +gfx1030 (void) +{ + return 0x1030; +} + +__attribute__ ((noipa)) +int +gfx1100 (void) +{ + return 0x1100; +} + #ifdef USE_FIJI_FOR_GFX803 #pragma omp declare variant(gfx803) match(device = {isa("fiji")}) #else @@ -44,6 +58,8 @@ gfx90a (void) #pragma omp declare variant(gfx906) match(device = {isa("gfx906")}) #pragma omp declare variant(gfx908) match(device = {isa("gfx908")}) #pragma omp declare variant(gfx90a) match(device = {isa("gfx90a")}) +#pragma omp declare variant(gfx90a) match(device = {isa("gfx1030")}) +#pragma omp declare variant(gfx90a) match(device = {isa("gfx1100")}) __attribute__ ((noipa)) int f (void) ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs 2024-01-26 12:32 ` [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs Tobias Burnus @ 2024-01-26 12:40 ` Richard Biener 2024-01-26 12:59 ` Tobias Burnus 2024-01-26 16:21 ` Thomas Schwinge 1 sibling, 1 reply; 25+ messages in thread From: Richard Biener @ 2024-01-26 12:40 UTC (permalink / raw) To: Tobias Burnus; +Cc: Andrew Stubbs, gcc-patches On Fri, 26 Jan 2024, Tobias Burnus wrote: > Now with patch ... > > Tobias Burnus wrote: > > Hi all, hi Richard & Andrew, > > > > Am 24.01.24 um 17:01 schrieb Tobias Burnus: > >> This patch obviously depends on Andrew's; he wrote in the previous email of > >> this thread regarding his patch: > >> > >> Andrew Stubbs wrote: > >>> This is enough to get gfx1100 working for most purposes, on top of the > >>> patch that Tobias committed a week or so ago; there are still some test > >>> failures to investigate, and probably some tuning to do. > >>> > >>> It might also get gfx1030 working too. @Richi, could you test it, > >>> please? > > > > If gfx1030 doesn't work, I would propose my patch previously in the thread, > > https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643835.html > > > > This patch assumes that both gfx1100 and gfx1030 are now working: > > > >> Okay to enable gfx1100 multilib building and to document gfx1100 in the > >> manual? > > and, with this patch, additionally gfx1030? > > > > OK for mainline, once Andrew's patch is in? > Tobias Looks good to me. +@item gfx1030 +Compile for RDNA2 gfx1030 devices (GFX10 series). + +@item gfx1100 +Compile for RDNA3 gfx1100 devices (GFX11 series). Btw, "GFX10" series isn't precise as it's only the high-end parts that are covered by gfx1030, there's gfx103[0-6] where hopefully at least gfx1031, gfx1032 and gfx1034 (the dGPU variants) are trivial to support as well(?). Using gfx103x might be better that way, OTOH if APU vs dGPU will make a compilation target difference then gfx103d vs gfx103a maybe? "GFX10" series might be also not know to users, but I'm unsure we can list AMD product names here? Richard. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs 2024-01-26 12:40 ` Richard Biener @ 2024-01-26 12:59 ` Tobias Burnus 0 siblings, 0 replies; 25+ messages in thread From: Tobias Burnus @ 2024-01-26 12:59 UTC (permalink / raw) To: Richard Biener, Andrew Stubbs; +Cc: gcc-patches [-- Attachment #1: Type: text/plain, Size: 1231 bytes --] Hi Richard, Richard Biener wrote: > Looks good to me. Thanks - I will commit it after lunch to see whether someone else has additional comments. > +@item gfx1030 > +Compile for RDNA2 gfx1030 devices (GFX10 series). > + > +@item gfx1100 > +Compile for RDNA3 gfx1100 devices (GFX11 series). > > Btw, "GFX10" series isn't precise as it's only the high-end parts > that are covered by gfx1030, there's gfx103[0-6] where hopefully > at least gfx1031, gfx1032 and gfx1034 (the dGPU variants) are > trivial to support as well(?). > > Using gfx103x might be better > that way, OTOH if APU vs dGPU will make a compilation target > difference then gfx103d vs gfx103a maybe? "GFX10" series might On the LLVM side (and also for the llvm-mc assembler), they distinguish all of the gfx* for the -mcpu= argument. See https://llvm.org/docs/AMDGPUUsage.html#id26 for that list. Thus, I think it makes sense to do the same here. The last column on that page lists the supported hardware but is it neither really up to date nor complete. Thus, I found it easier to just mention gfx1100 as that's unique. On the ROCm side, AMD has: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html Tobias ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs 2024-01-26 12:32 ` [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs Tobias Burnus 2024-01-26 12:40 ` Richard Biener @ 2024-01-26 16:21 ` Thomas Schwinge 2024-01-26 16:36 ` Richard Biener 2024-01-26 16:45 ` [patch] install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled (was: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs) Tobias Burnus 1 sibling, 2 replies; 25+ messages in thread From: Thomas Schwinge @ 2024-01-26 16:21 UTC (permalink / raw) To: Tobias Burnus, Andrew Stubbs; +Cc: gcc-patches, rguenther Hi! Great progress that you've made! :-) On 2024-01-26T13:32:02+0100, Tobias Burnus <tburnus@baylibre.com> wrote: > Tobias Burnus wrote: >> Am 24.01.24 um 17:01 schrieb Tobias Burnus: >>> Okay to enable gfx1100 multilib building and to document gfx1100 in >>> the manual? >> >> and, with this patch, additionally gfx1030? > amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs > --- a/gcc/doc/install.texi > +++ b/gcc/doc/install.texi > @@ -1258,12 +1258,12 @@ default set of libraries is selected based on the value of > > @item amdgcn*-*-* > @var{list} is a comma separated list of ISA names (allowed values: @code{fiji}, > -@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}). It ought not > -include the name of the default ISA, specified via @option{--with-arch}. If > -@var{list} is empty, then there will be no multilibs and only the default > -run-time library will be built. If @var{list} is @code{default} or > -@option{--with-multilib-list=} is not specified, then the default set of > -libraries is selected. > +@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}, @code{gfx1030}, > +@code{gfx1100}). It ought not include the name of the default ISA, specified > +via @option{--with-arch}. If @var{list} is empty, then there will be no > +multilibs and only the default run-time library will be built. If @var{list} > +is @code{default} or @option{--with-multilib-list=} is not specified, then > +the default set of libraries is selected. Further down in that file, we state: @anchor{amdgcn-x-amdhsa} @heading amdgcn-*-amdhsa AMD GCN GPU target. Instead of GNU Binutils, you will need to install LLVM 13.0.1, or later, [...] LLVM 13.0.1 may still be fine for gfx1030 ('[...]/amdgcn-amdhsa/gfx1030/libgcc' does get built; I've not further tested), but it's not sufficient for gfx1100 anymore: [...] checking for suffix of object files... configure: error: in `[...]/amdgcn-amdhsa/gfx1100/libgcc': configure: error: cannot compute suffix of object files: cannot compile See `config.log' for more details make[1]: *** [Makefile:14105: configure-target-libgcc] Error 1 [...] '[...]/amdgcn-amdhsa/gfx1100/libgcc/config.log': [...] 'gfx1100' is not a recognized processor for this target (ignoring processor) 'gfx1100' is not a recognized processor for this target (ignoring processor) /tmp/ccZdohcj.s:1:17: error: .amdgcn_target directive's target id amdgcn-unknown-amdhsa--gfx1100 does not match the specified target id amdgcn-unknown-amdhsa--gfx000 .amdgcn_target "amdgcn-unknown-amdhsa--gfx1100" ^ [...] Which version of LLVM should we be recommending? Grüße Thomas ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs 2024-01-26 16:21 ` Thomas Schwinge @ 2024-01-26 16:36 ` Richard Biener 2024-01-26 16:45 ` [patch] install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled (was: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs) Tobias Burnus 1 sibling, 0 replies; 25+ messages in thread From: Richard Biener @ 2024-01-26 16:36 UTC (permalink / raw) To: Thomas Schwinge; +Cc: Tobias Burnus, Andrew Stubbs, gcc-patches > Am 26.01.2024 um 17:22 schrieb Thomas Schwinge <tschwinge@baylibre.com>: > > Hi! > > Great progress that you've made! :-) > >> On 2024-01-26T13:32:02+0100, Tobias Burnus <tburnus@baylibre.com> wrote: >> Tobias Burnus wrote: >>> Am 24.01.24 um 17:01 schrieb Tobias Burnus: >>>> Okay to enable gfx1100 multilib building and to document gfx1100 in >>>> the manual? >>> >>> and, with this patch, additionally gfx1030? > >> amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs > >> --- a/gcc/doc/install.texi >> +++ b/gcc/doc/install.texi >> @@ -1258,12 +1258,12 @@ default set of libraries is selected based on the value of >> >> @item amdgcn*-*-* >> @var{list} is a comma separated list of ISA names (allowed values: @code{fiji}, >> -@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}). It ought not >> -include the name of the default ISA, specified via @option{--with-arch}. If >> -@var{list} is empty, then there will be no multilibs and only the default >> -run-time library will be built. If @var{list} is @code{default} or >> -@option{--with-multilib-list=} is not specified, then the default set of >> -libraries is selected. >> +@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}, @code{gfx1030}, >> +@code{gfx1100}). It ought not include the name of the default ISA, specified >> +via @option{--with-arch}. If @var{list} is empty, then there will be no >> +multilibs and only the default run-time library will be built. If @var{list} >> +is @code{default} or @option{--with-multilib-list=} is not specified, then >> +the default set of libraries is selected. > > Further down in that file, we state: > > @anchor{amdgcn-x-amdhsa} > @heading amdgcn-*-amdhsa > AMD GCN GPU target. > > Instead of GNU Binutils, you will need to install LLVM 13.0.1, or later, [...] > > LLVM 13.0.1 may still be fine for gfx1030 > ('[...]/amdgcn-amdhsa/gfx1030/libgcc' does get built; I've not further > tested), but it's not sufficient for gfx1100 anymore: > > [...] > checking for suffix of object files... configure: error: in `[...]/amdgcn-amdhsa/gfx1100/libgcc': > configure: error: cannot compute suffix of object files: cannot compile > See `config.log' for more details > make[1]: *** [Makefile:14105: configure-target-libgcc] Error 1 > [...] > > '[...]/amdgcn-amdhsa/gfx1100/libgcc/config.log': > > [...] > 'gfx1100' is not a recognized processor for this target (ignoring processor) > 'gfx1100' is not a recognized processor for this target (ignoring processor) > /tmp/ccZdohcj.s:1:17: error: .amdgcn_target directive's target id amdgcn-unknown-amdhsa--gfx1100 does not match the specified target id amdgcn-unknown-amdhsa--gfx000 > .amdgcn_target "amdgcn-unknown-amdhsa--gfx1100" > ^ > [...] > > Which version of LLVM should we be recommending? A more recent one ;) unless we know of any fixed bugs in the assembler we’d rely on I‘d day the oldest that works „or later“ Richard > > Grüße > Thomas ^ permalink raw reply [flat|nested] 25+ messages in thread
* [patch] install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled (was: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs) 2024-01-26 16:21 ` Thomas Schwinge 2024-01-26 16:36 ` Richard Biener @ 2024-01-26 16:45 ` Tobias Burnus 2024-01-29 10:01 ` [patch] install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled Andrew Stubbs 1 sibling, 1 reply; 25+ messages in thread From: Tobias Burnus @ 2024-01-26 16:45 UTC (permalink / raw) To: Thomas Schwinge, Andrew Stubbs; +Cc: gcc-patches, rguenther [-- Attachment #1.1: Type: text/plain, Size: 845 bytes --] Hi, Thomas Schwinge wrote: > amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to > the docs > ... > Further down in that file, we state: > @anchor{amdgcn-x-amdhsa} > @heading amdgcn-*-amdhsa > AMD GCN GPU target. > > Instead of GNU Binutils, you will need to install LLVM 13.0.1, or later, [...] > > LLVM 13.0.1 may still be fine for gfx1030 > ('[...]/amdgcn-amdhsa/gfx1030/libgcc' does get built; I've not further > tested), but it's not sufficient for gfx1100 anymore: Testing with the system compilers here, llvm-mc-14.0.6 also fails while llvm-mc-15.0.7 accepts it. > Which version of LLVM should we be recommending? >= LLVM 15, I think. How about the following wording? It still mentions LLVM 13.0.1 for those that really need it but with for the default setup, it requires 15+. Tobias [-- Attachment #2: install-fix.diff --] [-- Type: text/x-patch, Size: 1340 bytes --] install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled gcc/ChangeLog: * doc/install.texi (amdgcn): Recommend LLVM 15+ and newlib 4.4+, but keep requiring only newlib 4.3+ and, if gfx1100 is disabled, LLVM 13.0.1+. Signed-off-by: Tobias Burnus <tburnus@baylibre.com> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi index 5747b5a12fe..c7794439107 100644 --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -3927,14 +3927,15 @@ This is a synonym for @samp{x86_64-*-solaris2*}. @heading amdgcn-*-amdhsa AMD GCN GPU target. -Instead of GNU Binutils, you will need to install LLVM 13.0.1, or later, and copy +Instead of GNU Binutils, you will need to install LLVM 15, or later, and copy @file{bin/llvm-mc} to @file{amdgcn-amdhsa/bin/as}, @file{bin/lld} to @file{amdgcn-amdhsa/bin/ld}, @file{bin/llvm-nm} to @file{amdgcn-amdhsa/bin/nm}, and @file{bin/llvm-ar} to both @file{bin/amdgcn-amdhsa-ar} and -@file{bin/amdgcn-amdhsa-ranlib}. +@file{bin/amdgcn-amdhsa-ranlib}. Note that LLVM 13.0.1 or LLVM 14 can be used +by specifying a @code{--with-multilib-list=} that does not list @code{gfx1100}. -Use Newlib (4.3.0 or newer). +Use Newlib (4.3.0 or newer; 4.4.0 or later is recommended). To run the binaries, install the HSA Runtime from the @uref{https://rocm.docs.amd.com/,,ROCm Platform}, and use ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled 2024-01-26 16:45 ` [patch] install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled (was: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs) Tobias Burnus @ 2024-01-29 10:01 ` Andrew Stubbs 0 siblings, 0 replies; 25+ messages in thread From: Andrew Stubbs @ 2024-01-29 10:01 UTC (permalink / raw) To: Tobias Burnus, Thomas Schwinge; +Cc: gcc-patches, rguenther On 26/01/2024 16:45, Tobias Burnus wrote: > Hi, > > Thomas Schwinge wrote: >> amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to >> the docs >> ... >> Further down in that file, we state: >> @anchor{amdgcn-x-amdhsa} >> @heading amdgcn-*-amdhsa >> AMD GCN GPU target. >> >> Instead of GNU Binutils, you will need to install LLVM 13.0.1, or later, [...] >> >> LLVM 13.0.1 may still be fine for gfx1030 >> ('[...]/amdgcn-amdhsa/gfx1030/libgcc' does get built; I've not further >> tested), but it's not sufficient for gfx1100 anymore: > > Testing with the system compilers here, llvm-mc-14.0.6 also fails while > llvm-mc-15.0.7 accepts it. > >> Which version of LLVM should we be recommending? > > >= LLVM 15, I think. How about the following wording? It still mentions > LLVM 13.0.1 for those that really need it but with for the default > setup, it requires 15+. OK. Andrew ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] amdgcn: additional gfx1100 support 2024-01-24 12:43 [PATCH] amdgcn: additional gfx1100 support Andrew Stubbs 2024-01-24 16:01 ` [patch] amdgcn: config.gcc - enable gfx1100 multilib; add gfx1100 to docs (was: [PATCH] amdgcn: additional gfx1100 support) Tobias Burnus @ 2024-01-26 8:56 ` Richard Biener 2024-01-26 9:45 ` Richard Biener 2024-03-06 13:49 ` amdgcn: additional gfx1030/gfx1100 support: adjust test cases (was: [PATCH] amdgcn: additional gfx1100 support) Thomas Schwinge 2 siblings, 1 reply; 25+ messages in thread From: Richard Biener @ 2024-01-26 8:56 UTC (permalink / raw) To: Andrew Stubbs; +Cc: gcc-patches, pa On Wed, 24 Jan 2024, Andrew Stubbs wrote: > This is enough to get gfx1100 working for most purposes, on top of the > patch that Tobias committed a week or so ago; there are still some test > failures to investigate, and probably some tuning to do. > > It might also get gfx1030 working too. @Richi, could you test it, > please? I can report partial success here. I do see quite some FAILs because of /space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90: In function 'accum_._omp_fn.1':^M /space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90:20:38: error: unrecognizable insn:^M (insn 108 107 109 6 (set (reg:V8SF 849)^M (unspec:V8SF [^M (reg:V8SF 844 [ vect__43.12_106 ]) repeated x2^M (const_int 1 [0x1])^M ] UNSPEC_PLUS_DPP_SHR)) "/space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90":22:29 discrim 1 -1^M (nil))^M during RTL pass: vregs^M /space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90:20:38: internal compiler error: in extract_insn, at recog.cc:2812^M there are also quite a number of execution FAILs like icv-5.exe: /space/rguenther/src/gcc-autopar_devel/libgomp/plugin/plugin-gcn.c:2462: isa_matches_agent: Assertion `agent_isa_s' failed. FAIL: libgomp.c/../libgomp.c-c++-common/icv-5.c execution test (the assert in question looks bad - yeah, somehow we got past device initialization - not sure how - but end up here) Maybe HSA behaves odd here - I didn't constrain the device it should choose but it works (most of the time). GCN_DEBUG prints me all the HSA agents available but I don't see any debug on which agent is actually initialized during libgomp device init (at least nothing I can easily match up). Maybe sth to improve. I'll followup with a test summary once the (serial) run of libgomp testing finished. At least there are quite some number of actual kernel executions and PASSing testcases. Richard. > I can't test the other multilibs right now. @PA, can you test it please? > > I can self-approve the patch, but I'll hold off the commit until the > test results come back. > > Andrew > > gcc/ChangeLog: > > * config/gcn/gcn-opts.h (TARGET_PACKED_WORK_ITEMS): Add TARGET_RDNA3. > * config/gcn/gcn-valu.md (all_convert): New iterator. > (<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): New > define_expand, and rename the old one to ... > (*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): ... this. > (extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): Likewise, to ... > (extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): .. this. > (*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>): New. > * config/gcn/gcn.cc (gcn_global_address_p): Use "offsetbits" correctly. > (gcn_hsa_declare_function_name): Update the vgpr counting for gfx1100. > * config/gcn/gcn.md (<u>mulhisi3): Disable on RDNA3. > (<u>mulqihi3_scalar): Likewise. > > libgcc/ChangeLog: > > * config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Handle RDNA3. > > libgomp/ChangeLog: > > * config/gcn/time.c (RTC_TICKS): Configure RDNA3. > (omp_get_wtime): Add RDNA3-compatible variant. > * plugin/plugin-gcn.c (max_isa_vgprs): Tune for gfx1030 and gfx1100. > > Signed-off-by: Andrew Stubbs <ams@baylibre.com> > --- > gcc/config/gcn/gcn-opts.h | 2 +- > gcc/config/gcn/gcn-valu.md | 41 ++++++++++++++++++++++++++++--- > gcc/config/gcn/gcn.cc | 31 ++++++++++++++++------- > gcc/config/gcn/gcn.md | 4 +-- > libgcc/config/gcn/amdgcn_veclib.h | 2 +- > libgomp/config/gcn/time.c | 10 ++++++++ > libgomp/plugin/plugin-gcn.c | 6 +++-- > 7 files changed, 77 insertions(+), 19 deletions(-) > > diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h > index 79fbda3ab25..6be2c9204fa 100644 > --- a/gcc/config/gcn/gcn-opts.h > +++ b/gcc/config/gcn/gcn-opts.h > @@ -62,7 +62,7 @@ extern enum gcn_isa { > > > #define TARGET_M0_LDS_LIMIT (TARGET_GCN3) > -#define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS) > +#define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS || TARGET_RDNA3) > > #define TARGET_XNACK (flag_xnack != HSACO_ATTR_OFF) > > diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md > index 3d5b6271ee6..cd027f8b369 100644 > --- a/gcc/config/gcn/gcn-valu.md > +++ b/gcc/config/gcn/gcn-valu.md > @@ -3555,30 +3555,63 @@ > ;; }}} > ;; {{{ Int/int conversions > > +(define_code_iterator all_convert [truncate zero_extend sign_extend]) > (define_code_iterator zero_convert [truncate zero_extend]) > (define_code_attr convop [ > (sign_extend "extend") > (zero_extend "zero_extend") > (truncate "trunc")]) > > -(define_insn "<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>" > +(define_expand "<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>" > + [(set (match_operand:V_INT_1REG 0 "register_operand" "=v") > + (all_convert:V_INT_1REG > + (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))] > + "") > + > +(define_insn "*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>" > [(set (match_operand:V_INT_1REG 0 "register_operand" "=v") > (zero_convert:V_INT_1REG > (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))] > - "" > + "!TARGET_RDNA3" > "v_mov_b32_sdwa\t%0, %1 dst_sel:<V_INT_1REG:sdwa> dst_unused:UNUSED_PAD src0_sel:<V_INT_1REG_ALT:sdwa>" > [(set_attr "type" "vop_sdwa") > (set_attr "length" "8")]) > > -(define_insn "extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>" > +(define_insn "extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>" > [(set (match_operand:V_INT_1REG 0 "register_operand" "=v") > (sign_extend:V_INT_1REG > (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))] > - "" > + "!TARGET_RDNA3" > "v_mov_b32_sdwa\t%0, sext(%1) src0_sel:<V_INT_1REG_ALT:sdwa>" > [(set_attr "type" "vop_sdwa") > (set_attr "length" "8")]) > > +(define_insn "*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>" > + [(set (match_operand:V_INT_1REG 0 "register_operand" "=v") > + (all_convert:V_INT_1REG > + (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))] > + "TARGET_RDNA3" > + { > + enum {extend, zero_extend, trunc}; > + rtx shiftwidth = (<V_INT_1REG_ALT:SCALAR_MODE>mode == QImode > + || <V_INT_1REG:SCALAR_MODE>mode == QImode > + ? GEN_INT (24) > + : <V_INT_1REG_ALT:SCALAR_MODE>mode == HImode > + || <V_INT_1REG:SCALAR_MODE>mode == HImode > + ? GEN_INT (16) > + : NULL); > + operands[2] = shiftwidth; > + > + if (!shiftwidth) > + return "v_mov_b32 %0, %1"; > + else if (<convop> == extend || <convop> == trunc) > + return "v_lshlrev_b32\t%0, %2, %1\;v_ashrrev_i32\t%0, %2, %0"; > + else > + return "v_lshlrev_b32\t%0, %2, %1\;v_lshrrev_b32\t%0, %2, %0"; > + } > + [(set_attr "type" "mult") > + (set_attr "length" "8")]) > + > ;; GCC can already do these for scalar types, but not for vector types. > ;; Unfortunately you can't just do SUBREG on a vector to select the low part, > ;; so there must be a few tricks here. > diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc > index e668ce7c69e..e80de2ce056 100644 > --- a/gcc/config/gcn/gcn.cc > +++ b/gcc/config/gcn/gcn.cc > @@ -1597,8 +1597,8 @@ gcn_global_address_p (rtx addr) > rtx offset = XEXP (addr, 1); > int offsetbits = (TARGET_RDNA2_PLUS ? 11 : 12); > bool immediate_p = (CONST_INT_P (offset) > - && INTVAL (offset) >= -(1 << 12) > - && INTVAL (offset) < (1 << 12)); > + && INTVAL (offset) >= -(1 << offsetbits) > + && INTVAL (offset) < (1 << offsetbits)); > > if ((gcn_address_register_p (base, DImode, false) > || gcn_vec_address_register_p (base, DImode, false)) > @@ -6597,8 +6597,10 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, > if (df_regs_ever_live_p (FIRST_AVGPR_REG + avgpr)) > break; > avgpr++; > - vgpr = (vgpr + 3) & ~3; > - avgpr = (avgpr + 3) & ~3; > + > + /* The main function epilogue uses v8, but df doesn't see that. */ > + if (vgpr < 9) > + vgpr = 9; > > if (!leaf_function_p ()) > { > @@ -6611,9 +6613,18 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, > avgpr = MAX_NORMAL_AVGPR_COUNT; > } > > - /* The gfx90a accum_offset field can't represent 0 registers. */ > - if (gcn_arch == PROCESSOR_GFX90a && vgpr < 4) > - vgpr = 4; > + /* SIMD32 devices count double in wavefront64 mode. */ > + if (TARGET_RDNA2_PLUS) > + vgpr *= 2; > + > + /* Round up to the allocation block size. */ > + int vgpr_block_size = (TARGET_RDNA3 ? 12 > + : TARGET_RDNA2_PLUS || TARGET_CDNA2_PLUS ? 8 > + : 4); > + if (vgpr % vgpr_block_size) > + vgpr += vgpr_block_size - (vgpr % vgpr_block_size); > + if (avgpr % vgpr_block_size) > + avgpr += vgpr_block_size - (avgpr % vgpr_block_size); > > fputs ("\t.rodata\n" > "\t.p2align\t6\n" > @@ -6714,12 +6725,14 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, > " .private_segment_fixed_size: 0\n" > " .wavefront_size: 64\n" > " .sgpr_count: %i\n" > - " .vgpr_count: %i\n" > + " .vgpr_count: %i%s\n" > " .max_flat_workgroup_size: 1024\n", > cfun->machine->kernarg_segment_byte_size, > cfun->machine->kernarg_segment_alignment, > LDS_SIZE, > - sgpr, next_free_vgpr); > + sgpr, next_free_vgpr, > + (TARGET_RDNA2_PLUS ? " ; wavefrontsize64 counts double on SIMD32" > + : "")); > if (gcn_arch == PROCESSOR_GFX90a || gcn_arch == PROCESSOR_GFX908) > fprintf (file, " .agpr_count: %i\n", avgpr); > fputs (" .end_amdgpu_metadata\n", file); > diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md > index 492b833e255..1f3c692b7a6 100644 > --- a/gcc/config/gcn/gcn.md > +++ b/gcc/config/gcn/gcn.md > @@ -1618,7 +1618,7 @@ > (mult:SI > (any_extend:SI (match_operand:HI 1 "register_operand" "%v")) > (any_extend:SI (match_operand:HI 2 "register_operand" " v"))))] > - "" > + "!TARGET_RDNA3" > "v_mul_<iu>32_<iu>24_sdwa\t%0, %<e>1, %<e>2 src0_sel:WORD_0 src1_sel:WORD_0" > [(set_attr "type" "vop_sdwa") > (set_attr "length" "8")]) > @@ -1628,7 +1628,7 @@ > (mult:HI > (any_extend:HI (match_operand:QI 1 "register_operand" "%v")) > (any_extend:HI (match_operand:QI 2 "register_operand" " v"))))] > - "" > + "!TARGET_RDNA3" > "v_mul_<iu>32_<iu>24_sdwa\t%0, %<e>1, %<e>2 src0_sel:BYTE_0 src1_sel:BYTE_0" > [(set_attr "type" "vop_sdwa") > (set_attr "length" "8")]) > diff --git a/libgcc/config/gcn/amdgcn_veclib.h b/libgcc/config/gcn/amdgcn_veclib.h > index 821f6386dd6..d268c6cac16 100644 > --- a/libgcc/config/gcn/amdgcn_veclib.h > +++ b/libgcc/config/gcn/amdgcn_veclib.h > @@ -230,7 +230,7 @@ do { \ > > #if defined (__GCN3__) || defined (__GCN5__) \ > || defined (__CDNA1__) || defined (__CDNA2__) \ > - || defined (__RDNA2__) > + || defined (__RDNA2__) || defined (__RDNA3__) > #define CDNA3_PLUS 0 > #else > #define CDNA3_PLUS 1 > diff --git a/libgomp/config/gcn/time.c b/libgomp/config/gcn/time.c > index 30a0d0188e4..efcd04f5f43 100644 > --- a/libgomp/config/gcn/time.c > +++ b/libgomp/config/gcn/time.c > @@ -30,15 +30,25 @@ > /* According to AMD: > dGPU RTC is 27MHz > AGPU RTC is 100MHz > + RDNA3 ISA manual states "typically 100MHz" > FIXME: DTRT on an APU. */ > +#ifdef __RDNA3__ > +#define RTC_TICKS (1.0 / 100000000.0) /* 100MHz */ > +#else > #define RTC_TICKS (1.0 / 27000000.0) /* 27MHz */ > +#endif > > double > omp_get_wtime (void) > { > uint64_t clock; > +#ifdef __RDNA3__ > + asm ("s_sendmsg_rtn_b64 %0 0x83 ;Get REALTIME\n\t" > + "s_waitcnt 0" : "=r" (clock)); > +#else > asm ("s_memrealtime %0\n\t" > "s_waitcnt 0" : "=r" (clock)); > +#endif > return clock * RTC_TICKS; > } > > diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c > index 0339848451e..db28781dedb 100644 > --- a/libgomp/plugin/plugin-gcn.c > +++ b/libgomp/plugin/plugin-gcn.c > @@ -1741,11 +1741,13 @@ max_isa_vgprs (int isa) > case EF_AMDGPU_MACH_AMDGCN_GFX900: > case EF_AMDGPU_MACH_AMDGCN_GFX906: > case EF_AMDGPU_MACH_AMDGCN_GFX908: > - case EF_AMDGPU_MACH_AMDGCN_GFX1030: > - case EF_AMDGPU_MACH_AMDGCN_GFX1100: > return 256; > case EF_AMDGPU_MACH_AMDGCN_GFX90a: > return 512; > + case EF_AMDGPU_MACH_AMDGCN_GFX1030: > + return 512; /* 512 SIMD32 = 256 wavefrontsize64. */ > + case EF_AMDGPU_MACH_AMDGCN_GFX1100: > + return 1536; /* 1536 SIMD32 = 768 wavefrontsize64. */ > } > GOMP_PLUGIN_fatal ("unhandled ISA in max_isa_vgprs"); > } > -- Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] amdgcn: additional gfx1100 support 2024-01-26 8:56 ` [PATCH] amdgcn: additional gfx1100 support Richard Biener @ 2024-01-26 9:45 ` Richard Biener 2024-01-26 10:19 ` Andrew Stubbs 2024-02-01 14:41 ` libgomp GCN gfx1030/gfx1100 offloading status (was: [PATCH] amdgcn: additional gfx1100 support) Thomas Schwinge 0 siblings, 2 replies; 25+ messages in thread From: Richard Biener @ 2024-01-26 9:45 UTC (permalink / raw) To: Andrew Stubbs; +Cc: gcc-patches, pa [-- Attachment #1: Type: text/plain, Size: 14213 bytes --] On Fri, 26 Jan 2024, Richard Biener wrote: > On Wed, 24 Jan 2024, Andrew Stubbs wrote: > > > This is enough to get gfx1100 working for most purposes, on top of the > > patch that Tobias committed a week or so ago; there are still some test > > failures to investigate, and probably some tuning to do. > > > > It might also get gfx1030 working too. @Richi, could you test it, > > please? > > I can report partial success here. I do see quite some FAILs because of > > /space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90: > In function 'accum_._omp_fn.1':^M > /space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90:20:38: > error: unrecognizable insn:^M > (insn 108 107 109 6 (set (reg:V8SF 849)^M > (unspec:V8SF [^M > (reg:V8SF 844 [ vect__43.12_106 ]) repeated x2^M > (const_int 1 [0x1])^M > ] UNSPEC_PLUS_DPP_SHR)) > "/space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90":22:29 > discrim 1 -1^M > (nil))^M > during RTL pass: vregs^M > /space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90:20:38: > internal compiler error: in extract_insn, at recog.cc:2812^M > > there are also quite a number of execution FAILs like > > icv-5.exe: > /space/rguenther/src/gcc-autopar_devel/libgomp/plugin/plugin-gcn.c:2462: > isa_matches_agent: Assertion `agent_isa_s' failed. > FAIL: libgomp.c/../libgomp.c-c++-common/icv-5.c execution test > > (the assert in question looks bad - yeah, somehow we got past > device initialization - not sure how - but end up here) > > Maybe HSA behaves odd here - I didn't constrain the device it should > choose but it works (most of the time). GCN_DEBUG prints me all the > HSA agents available but I don't see any debug on which agent > is actually initialized during libgomp device init (at least nothing > I can easily match up). Maybe sth to improve. > > I'll followup with a test summary once the (serial) run of libgomp > testing finished. At least there are quite some number of > actual kernel executions and PASSing testcases. === libgomp Summary === # of expected passes 29126 # of unexpected failures 697 # of unexpected successes 1 # of expected failures 703 # of unresolved testcases 318 # of unsupported tests 766 full summary attached (compressed). Even compressed libgomp.log is too big to send. Richard. > > Richard. > > > I can't test the other multilibs right now. @PA, can you test it please? > > > > I can self-approve the patch, but I'll hold off the commit until the > > test results come back. > > > > Andrew > > > > gcc/ChangeLog: > > > > * config/gcn/gcn-opts.h (TARGET_PACKED_WORK_ITEMS): Add TARGET_RDNA3. > > * config/gcn/gcn-valu.md (all_convert): New iterator. > > (<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): New > > define_expand, and rename the old one to ... > > (*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): ... this. > > (extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): Likewise, to ... > > (extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): .. this. > > (*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>): New. > > * config/gcn/gcn.cc (gcn_global_address_p): Use "offsetbits" correctly. > > (gcn_hsa_declare_function_name): Update the vgpr counting for gfx1100. > > * config/gcn/gcn.md (<u>mulhisi3): Disable on RDNA3. > > (<u>mulqihi3_scalar): Likewise. > > > > libgcc/ChangeLog: > > > > * config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Handle RDNA3. > > > > libgomp/ChangeLog: > > > > * config/gcn/time.c (RTC_TICKS): Configure RDNA3. > > (omp_get_wtime): Add RDNA3-compatible variant. > > * plugin/plugin-gcn.c (max_isa_vgprs): Tune for gfx1030 and gfx1100. > > > > Signed-off-by: Andrew Stubbs <ams@baylibre.com> > > --- > > gcc/config/gcn/gcn-opts.h | 2 +- > > gcc/config/gcn/gcn-valu.md | 41 ++++++++++++++++++++++++++++--- > > gcc/config/gcn/gcn.cc | 31 ++++++++++++++++------- > > gcc/config/gcn/gcn.md | 4 +-- > > libgcc/config/gcn/amdgcn_veclib.h | 2 +- > > libgomp/config/gcn/time.c | 10 ++++++++ > > libgomp/plugin/plugin-gcn.c | 6 +++-- > > 7 files changed, 77 insertions(+), 19 deletions(-) > > > > diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h > > index 79fbda3ab25..6be2c9204fa 100644 > > --- a/gcc/config/gcn/gcn-opts.h > > +++ b/gcc/config/gcn/gcn-opts.h > > @@ -62,7 +62,7 @@ extern enum gcn_isa { > > > > > > #define TARGET_M0_LDS_LIMIT (TARGET_GCN3) > > -#define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS) > > +#define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS || TARGET_RDNA3) > > > > #define TARGET_XNACK (flag_xnack != HSACO_ATTR_OFF) > > > > diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md > > index 3d5b6271ee6..cd027f8b369 100644 > > --- a/gcc/config/gcn/gcn-valu.md > > +++ b/gcc/config/gcn/gcn-valu.md > > @@ -3555,30 +3555,63 @@ > > ;; }}} > > ;; {{{ Int/int conversions > > > > +(define_code_iterator all_convert [truncate zero_extend sign_extend]) > > (define_code_iterator zero_convert [truncate zero_extend]) > > (define_code_attr convop [ > > (sign_extend "extend") > > (zero_extend "zero_extend") > > (truncate "trunc")]) > > > > -(define_insn "<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>" > > +(define_expand "<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>" > > + [(set (match_operand:V_INT_1REG 0 "register_operand" "=v") > > + (all_convert:V_INT_1REG > > + (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))] > > + "") > > + > > +(define_insn "*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>" > > [(set (match_operand:V_INT_1REG 0 "register_operand" "=v") > > (zero_convert:V_INT_1REG > > (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))] > > - "" > > + "!TARGET_RDNA3" > > "v_mov_b32_sdwa\t%0, %1 dst_sel:<V_INT_1REG:sdwa> dst_unused:UNUSED_PAD src0_sel:<V_INT_1REG_ALT:sdwa>" > > [(set_attr "type" "vop_sdwa") > > (set_attr "length" "8")]) > > > > -(define_insn "extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>" > > +(define_insn "extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>" > > [(set (match_operand:V_INT_1REG 0 "register_operand" "=v") > > (sign_extend:V_INT_1REG > > (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))] > > - "" > > + "!TARGET_RDNA3" > > "v_mov_b32_sdwa\t%0, sext(%1) src0_sel:<V_INT_1REG_ALT:sdwa>" > > [(set_attr "type" "vop_sdwa") > > (set_attr "length" "8")]) > > > > +(define_insn "*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>" > > + [(set (match_operand:V_INT_1REG 0 "register_operand" "=v") > > + (all_convert:V_INT_1REG > > + (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))] > > + "TARGET_RDNA3" > > + { > > + enum {extend, zero_extend, trunc}; > > + rtx shiftwidth = (<V_INT_1REG_ALT:SCALAR_MODE>mode == QImode > > + || <V_INT_1REG:SCALAR_MODE>mode == QImode > > + ? GEN_INT (24) > > + : <V_INT_1REG_ALT:SCALAR_MODE>mode == HImode > > + || <V_INT_1REG:SCALAR_MODE>mode == HImode > > + ? GEN_INT (16) > > + : NULL); > > + operands[2] = shiftwidth; > > + > > + if (!shiftwidth) > > + return "v_mov_b32 %0, %1"; > > + else if (<convop> == extend || <convop> == trunc) > > + return "v_lshlrev_b32\t%0, %2, %1\;v_ashrrev_i32\t%0, %2, %0"; > > + else > > + return "v_lshlrev_b32\t%0, %2, %1\;v_lshrrev_b32\t%0, %2, %0"; > > + } > > + [(set_attr "type" "mult") > > + (set_attr "length" "8")]) > > + > > ;; GCC can already do these for scalar types, but not for vector types. > > ;; Unfortunately you can't just do SUBREG on a vector to select the low part, > > ;; so there must be a few tricks here. > > diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc > > index e668ce7c69e..e80de2ce056 100644 > > --- a/gcc/config/gcn/gcn.cc > > +++ b/gcc/config/gcn/gcn.cc > > @@ -1597,8 +1597,8 @@ gcn_global_address_p (rtx addr) > > rtx offset = XEXP (addr, 1); > > int offsetbits = (TARGET_RDNA2_PLUS ? 11 : 12); > > bool immediate_p = (CONST_INT_P (offset) > > - && INTVAL (offset) >= -(1 << 12) > > - && INTVAL (offset) < (1 << 12)); > > + && INTVAL (offset) >= -(1 << offsetbits) > > + && INTVAL (offset) < (1 << offsetbits)); > > > > if ((gcn_address_register_p (base, DImode, false) > > || gcn_vec_address_register_p (base, DImode, false)) > > @@ -6597,8 +6597,10 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, > > if (df_regs_ever_live_p (FIRST_AVGPR_REG + avgpr)) > > break; > > avgpr++; > > - vgpr = (vgpr + 3) & ~3; > > - avgpr = (avgpr + 3) & ~3; > > + > > + /* The main function epilogue uses v8, but df doesn't see that. */ > > + if (vgpr < 9) > > + vgpr = 9; > > > > if (!leaf_function_p ()) > > { > > @@ -6611,9 +6613,18 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, > > avgpr = MAX_NORMAL_AVGPR_COUNT; > > } > > > > - /* The gfx90a accum_offset field can't represent 0 registers. */ > > - if (gcn_arch == PROCESSOR_GFX90a && vgpr < 4) > > - vgpr = 4; > > + /* SIMD32 devices count double in wavefront64 mode. */ > > + if (TARGET_RDNA2_PLUS) > > + vgpr *= 2; > > + > > + /* Round up to the allocation block size. */ > > + int vgpr_block_size = (TARGET_RDNA3 ? 12 > > + : TARGET_RDNA2_PLUS || TARGET_CDNA2_PLUS ? 8 > > + : 4); > > + if (vgpr % vgpr_block_size) > > + vgpr += vgpr_block_size - (vgpr % vgpr_block_size); > > + if (avgpr % vgpr_block_size) > > + avgpr += vgpr_block_size - (avgpr % vgpr_block_size); > > > > fputs ("\t.rodata\n" > > "\t.p2align\t6\n" > > @@ -6714,12 +6725,14 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, > > " .private_segment_fixed_size: 0\n" > > " .wavefront_size: 64\n" > > " .sgpr_count: %i\n" > > - " .vgpr_count: %i\n" > > + " .vgpr_count: %i%s\n" > > " .max_flat_workgroup_size: 1024\n", > > cfun->machine->kernarg_segment_byte_size, > > cfun->machine->kernarg_segment_alignment, > > LDS_SIZE, > > - sgpr, next_free_vgpr); > > + sgpr, next_free_vgpr, > > + (TARGET_RDNA2_PLUS ? " ; wavefrontsize64 counts double on SIMD32" > > + : "")); > > if (gcn_arch == PROCESSOR_GFX90a || gcn_arch == PROCESSOR_GFX908) > > fprintf (file, " .agpr_count: %i\n", avgpr); > > fputs (" .end_amdgpu_metadata\n", file); > > diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md > > index 492b833e255..1f3c692b7a6 100644 > > --- a/gcc/config/gcn/gcn.md > > +++ b/gcc/config/gcn/gcn.md > > @@ -1618,7 +1618,7 @@ > > (mult:SI > > (any_extend:SI (match_operand:HI 1 "register_operand" "%v")) > > (any_extend:SI (match_operand:HI 2 "register_operand" " v"))))] > > - "" > > + "!TARGET_RDNA3" > > "v_mul_<iu>32_<iu>24_sdwa\t%0, %<e>1, %<e>2 src0_sel:WORD_0 src1_sel:WORD_0" > > [(set_attr "type" "vop_sdwa") > > (set_attr "length" "8")]) > > @@ -1628,7 +1628,7 @@ > > (mult:HI > > (any_extend:HI (match_operand:QI 1 "register_operand" "%v")) > > (any_extend:HI (match_operand:QI 2 "register_operand" " v"))))] > > - "" > > + "!TARGET_RDNA3" > > "v_mul_<iu>32_<iu>24_sdwa\t%0, %<e>1, %<e>2 src0_sel:BYTE_0 src1_sel:BYTE_0" > > [(set_attr "type" "vop_sdwa") > > (set_attr "length" "8")]) > > diff --git a/libgcc/config/gcn/amdgcn_veclib.h b/libgcc/config/gcn/amdgcn_veclib.h > > index 821f6386dd6..d268c6cac16 100644 > > --- a/libgcc/config/gcn/amdgcn_veclib.h > > +++ b/libgcc/config/gcn/amdgcn_veclib.h > > @@ -230,7 +230,7 @@ do { \ > > > > #if defined (__GCN3__) || defined (__GCN5__) \ > > || defined (__CDNA1__) || defined (__CDNA2__) \ > > - || defined (__RDNA2__) > > + || defined (__RDNA2__) || defined (__RDNA3__) > > #define CDNA3_PLUS 0 > > #else > > #define CDNA3_PLUS 1 > > diff --git a/libgomp/config/gcn/time.c b/libgomp/config/gcn/time.c > > index 30a0d0188e4..efcd04f5f43 100644 > > --- a/libgomp/config/gcn/time.c > > +++ b/libgomp/config/gcn/time.c > > @@ -30,15 +30,25 @@ > > /* According to AMD: > > dGPU RTC is 27MHz > > AGPU RTC is 100MHz > > + RDNA3 ISA manual states "typically 100MHz" > > FIXME: DTRT on an APU. */ > > +#ifdef __RDNA3__ > > +#define RTC_TICKS (1.0 / 100000000.0) /* 100MHz */ > > +#else > > #define RTC_TICKS (1.0 / 27000000.0) /* 27MHz */ > > +#endif > > > > double > > omp_get_wtime (void) > > { > > uint64_t clock; > > +#ifdef __RDNA3__ > > + asm ("s_sendmsg_rtn_b64 %0 0x83 ;Get REALTIME\n\t" > > + "s_waitcnt 0" : "=r" (clock)); > > +#else > > asm ("s_memrealtime %0\n\t" > > "s_waitcnt 0" : "=r" (clock)); > > +#endif > > return clock * RTC_TICKS; > > } > > > > diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c > > index 0339848451e..db28781dedb 100644 > > --- a/libgomp/plugin/plugin-gcn.c > > +++ b/libgomp/plugin/plugin-gcn.c > > @@ -1741,11 +1741,13 @@ max_isa_vgprs (int isa) > > case EF_AMDGPU_MACH_AMDGCN_GFX900: > > case EF_AMDGPU_MACH_AMDGCN_GFX906: > > case EF_AMDGPU_MACH_AMDGCN_GFX908: > > - case EF_AMDGPU_MACH_AMDGCN_GFX1030: > > - case EF_AMDGPU_MACH_AMDGCN_GFX1100: > > return 256; > > case EF_AMDGPU_MACH_AMDGCN_GFX90a: > > return 512; > > + case EF_AMDGPU_MACH_AMDGCN_GFX1030: > > + return 512; /* 512 SIMD32 = 256 wavefrontsize64. */ > > + case EF_AMDGPU_MACH_AMDGCN_GFX1100: > > + return 1536; /* 1536 SIMD32 = 768 wavefrontsize64. */ > > } > > GOMP_PLUGIN_fatal ("unhandled ISA in max_isa_vgprs"); > > } > > > > -- Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) [-- Attachment #2: test summary --] [-- Type: application/x-gzip, Size: 6007 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] amdgcn: additional gfx1100 support 2024-01-26 9:45 ` Richard Biener @ 2024-01-26 10:19 ` Andrew Stubbs 2024-01-26 10:22 ` Richard Biener 2024-02-01 14:41 ` libgomp GCN gfx1030/gfx1100 offloading status (was: [PATCH] amdgcn: additional gfx1100 support) Thomas Schwinge 1 sibling, 1 reply; 25+ messages in thread From: Andrew Stubbs @ 2024-01-26 10:19 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches, pa On 26/01/2024 09:45, Richard Biener wrote: > On Fri, 26 Jan 2024, Richard Biener wrote: > > === libgomp Summary === > > # of expected passes 29126 > # of unexpected failures 697 > # of unexpected successes 1 > # of expected failures 703 > # of unresolved testcases 318 > # of unsupported tests 766 > > full summary attached (compressed). Even compressed libgomp.log is > too big to send. > > Richard. I think this is good enough to start with. PA reported clean results for everything except gfx900 (looks like an unrelated issue). I'll go ahead and commit the patch. Hopefully Tobias's patch has already trimmed all the "-g" failures from that list. Andrew ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] amdgcn: additional gfx1100 support 2024-01-26 10:19 ` Andrew Stubbs @ 2024-01-26 10:22 ` Richard Biener 2024-01-26 10:31 ` Andrew Stubbs 0 siblings, 1 reply; 25+ messages in thread From: Richard Biener @ 2024-01-26 10:22 UTC (permalink / raw) To: Andrew Stubbs; +Cc: gcc-patches, pa On Fri, 26 Jan 2024, Andrew Stubbs wrote: > On 26/01/2024 09:45, Richard Biener wrote: > > On Fri, 26 Jan 2024, Richard Biener wrote: > > > > === libgomp Summary === > > > > # of expected passes 29126 > > # of unexpected failures 697 > > # of unexpected successes 1 > > # of expected failures 703 > > # of unresolved testcases 318 > > # of unsupported tests 766 > > > > full summary attached (compressed). Even compressed libgomp.log is > > too big to send. > > > > Richard. > > I think this is good enough to start with. PA reported clean results for > everything except gfx900 (looks like an unrelated issue). > > I'll go ahead and commit the patch. > > Hopefully Tobias's patch has already trimmed all the "-g" failures from that > list. Should I open a bug for the ICE? That's responsible for quite a number of failures as well. Richard. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] amdgcn: additional gfx1100 support 2024-01-26 10:22 ` Richard Biener @ 2024-01-26 10:31 ` Andrew Stubbs 0 siblings, 0 replies; 25+ messages in thread From: Andrew Stubbs @ 2024-01-26 10:31 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches, pa On 26/01/2024 10:22, Richard Biener wrote: > On Fri, 26 Jan 2024, Andrew Stubbs wrote: > >> On 26/01/2024 09:45, Richard Biener wrote: >>> On Fri, 26 Jan 2024, Richard Biener wrote: >>> >>> === libgomp Summary === >>> >>> # of expected passes 29126 >>> # of unexpected failures 697 >>> # of unexpected successes 1 >>> # of expected failures 703 >>> # of unresolved testcases 318 >>> # of unsupported tests 766 >>> >>> full summary attached (compressed). Even compressed libgomp.log is >>> too big to send. >>> >>> Richard. >> >> I think this is good enough to start with. PA reported clean results for >> everything except gfx900 (looks like an unrelated issue). >> >> I'll go ahead and commit the patch. >> >> Hopefully Tobias's patch has already trimmed all the "-g" failures from that >> list. > > Should I open a bug for the ICE? That's responsible for quite a number > of failures as well. The broken vector reduction instruction? It's a known issue (RDNA doesn't support those instructions anymore, and somehow disabling the insn isn't enough to stop them being generated), but it doesn't have a tracking number, so why not? Thanks Andrew ^ permalink raw reply [flat|nested] 25+ messages in thread
* libgomp GCN gfx1030/gfx1100 offloading status (was: [PATCH] amdgcn: additional gfx1100 support) 2024-01-26 9:45 ` Richard Biener 2024-01-26 10:19 ` Andrew Stubbs @ 2024-02-01 14:41 ` Thomas Schwinge 2024-02-01 14:49 ` Richard Biener 1 sibling, 1 reply; 25+ messages in thread From: Thomas Schwinge @ 2024-02-01 14:41 UTC (permalink / raw) To: Richard Biener, Andrew Stubbs, Tobias Burnus; +Cc: gcc-patches Hi! On 2024-01-26T10:45:10+0100, Richard Biener <rguenther@suse.de> wrote: > On Fri, 26 Jan 2024, Richard Biener wrote: >> On Wed, 24 Jan 2024, Andrew Stubbs wrote: >> > [...] is enough to get gfx1100 working for most purposes, on top of the >> > patch that Tobias committed a week or so ago; there are still some test >> > failures to investigate, and probably some tuning to do. >> > >> > It might also get gfx1030 working too. @Richi, could you test it, >> > please? >> >> I can report partial success here. [...] >> I'll followup with a test summary once the (serial) run of libgomp >> testing finished. (Why serial, by the way?) >> At least there are quite some number of >> actual kernel executions and PASSing testcases. > > === libgomp Summary === > > # of expected passes 29126 > # of unexpected failures 697 > # of unexpected successes 1 > # of expected failures 703 > # of unresolved testcases 318 > # of unsupported tests 766 > > full summary attached (compressed). Compating your old results ('| ' prefix in the following) with what I got with '-march=gfx1100' for AMD Radeon RX 7900 XTX. My GCC sources are a few weeks old, but have all the recent fix-up commits cherry-picked, and a work-around applied for: /tmp/ccfrKwEK.mkoffload.2.s:29:27: error: value out of range .amdhsa_next_free_vgpr 516 ^~~ (..., to be discussed later.) There are, I think, no compilation FAILs anymore; I'm only commenting on execution test FAILs. Not all FAILs appear all the time (so it follows that I may be missing a few), and 'libgomp.c++/../libgomp.c-c++-common' generally behaves similar to 'libgomp.c/../libgomp.c-c++-common', so omitting the former here. | FAIL: libgomp.c/../libgomp.c-c++-common/error-1.c output pattern test Not seeing that FAIL. I also see 'libgomp.c-c++-common/for-5.c' FAIL. | FAIL: libgomp.c/../libgomp.c-c++-common/icv-5.c execution test | FAIL: libgomp.c/../libgomp.c-c++-common/icv-6.c execution test | FAIL: libgomp.c/../libgomp.c-c++-common/icv-7.c execution test | FAIL: libgomp.c/../libgomp.c-c++-common/icv-9.c execution test I confirm 'libgomp.c-c++-common/icv-5.c', 'libgomp.c-c++-common/icv-9.c' FAIL, but 'libgomp.c-c++-common/icv-6.c', 'libgomp.c-c++-common/icv-7.c' PASS. | FAIL: libgomp.c/../libgomp.c-c++-common/non-rect-loop-1.c execution test Not seeing that FAIL. | FAIL: libgomp.c/../libgomp.c-c++-common/reduction-6.c execution test I confirm that FAIL, and also 'libgomp.c-c++-common/reduction-5.c' occasionally. | FAIL: libgomp.c/../libgomp.c-c++-common/requires-unified-addr-1.c execution test | FAIL: libgomp.c/../libgomp.c-c++-common/requires-unified-addr-2.c execution test | FAIL: libgomp.c/../libgomp.c-c++-common/target-45.c execution test | FAIL: libgomp.c/../libgomp.c-c++-common/target-implicit-map-3.c execution test | FAIL: libgomp.c/../libgomp.c-c++-common/target-is-accessible-1.c execution test Not seeing these FAILs. I also see 'libgomp.c-c++-common/reverse-offload-1.c' FAIL. | FAIL: libgomp.c/../libgomp.c-c++-common/task-detach-6.c execution test | WARNING: program timed out. | FAIL: libgomp.c/../libgomp.c-c++-common/task-in-explicit-1.c execution test I confirm these FAILs. | FAIL: libgomp.c/../libgomp.c-c++-common/teams-2.c execution test Known FAIL. | FAIL: libgomp.c/../libgomp.c-c++-common/teams-nteams-icv-1.c execution test | FAIL: libgomp.c/../libgomp.c-c++-common/teams-nteams-icv-2.c execution test | FAIL: libgomp.c/../libgomp.c-c++-common/teams-nteams-icv-3.c execution test | FAIL: libgomp.c/../libgomp.c-c++-common/teams-nteams-icv-4.c execution test | FAIL: libgomp.c/declare-variant-4-gfx900.c (test for excess errors) | FAIL: libgomp.c/declare-variant-4-gfx906.c (test for excess errors) | FAIL: libgomp.c/declare-variant-4-gfx908.c (test for excess errors) | FAIL: libgomp.c/declare-variant-4-gfx90a.c (test for excess errors) | FAIL: libgomp.c/declare-variant-4.c execution test | FAIL: libgomp.c/declare-variant-4.c scan-amdgcn-amdhsa-offload-tree-dump optimized "= gfx[^ ]+ \\\\(\\\\);" | FAIL: libgomp.c/examples-4/device-2.c execution test | WARNING: program timed out. Not seeing these FAILs. I also see 'libgomp.c/examples-4/teams-4.c', 'libgomp.c/target-31.c' FAIL. | FAIL: libgomp.c/target-teams-1.c execution test I confirm this FAIL. | FAIL: libgomp.fortran/[...] execution test You had a lot of FAILs there. I only see the following: | FAIL: libgomp.fortran/examples-4/teams-2.f90 -O0 execution test | [...] | FAIL: libgomp.fortran/examples-4/teams-4.f90 -O0 execution test | [...] | FAIL: libgomp.fortran/icv-6.f90 -O execution test | FAIL: libgomp.fortran/reverse-offload-1.f90 -O2 execution test | FAIL: libgomp.fortran/reverse-offload-1.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test | FAIL: libgomp.fortran/reverse-offload-1.f90 -O3 -g (test for excess errors) | UNRESOLVED: libgomp.fortran/reverse-offload-1.f90 -O3 -g compilation failed to produce executable | FAIL: libgomp.fortran/reverse-offload-3.f90 -O execution test | FAIL: libgomp.fortran/reverse-offload-4.f90 -O execution test | FAIL: libgomp.fortran/task-detach-6.f90 -O0 execution test | [...] | FAIL: libgomp.fortran/task-in-explicit-1.f90 -O0 execution test | [...] You had a lot of FAILs for 'libgomp.oacc-c', 'libgomp.oacc-c++', 'libgomp.oacc-fortran'. For me: | FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-10.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 execution test I confirm this FAIL (also 'libgomp.oacc-c++'). | FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vprop.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 (test for excess errors) Known FAIL (also 'libgomp.oacc-c++'). | FAIL: libgomp.oacc-fortran/reduction-5.f90 -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O0 execution test I confirm this FAIL. So, "not bad!", but also still some work to be done. :-) Grüße Thomas ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: libgomp GCN gfx1030/gfx1100 offloading status (was: [PATCH] amdgcn: additional gfx1100 support) 2024-02-01 14:41 ` libgomp GCN gfx1030/gfx1100 offloading status (was: [PATCH] amdgcn: additional gfx1100 support) Thomas Schwinge @ 2024-02-01 14:49 ` Richard Biener 2024-02-21 12:34 ` Stabilizing flaky libgomp GCN target/offloading testing (was: libgomp GCN gfx1030/gfx1100 offloading status) Thomas Schwinge 0 siblings, 1 reply; 25+ messages in thread From: Richard Biener @ 2024-02-01 14:49 UTC (permalink / raw) To: Thomas Schwinge; +Cc: Andrew Stubbs, Tobias Burnus, gcc-patches On Thu, 1 Feb 2024, Thomas Schwinge wrote: > Hi! > > On 2024-01-26T10:45:10+0100, Richard Biener <rguenther@suse.de> wrote: > > On Fri, 26 Jan 2024, Richard Biener wrote: > >> On Wed, 24 Jan 2024, Andrew Stubbs wrote: > >> > [...] is enough to get gfx1100 working for most purposes, on top of the > >> > patch that Tobias committed a week or so ago; there are still some test > >> > failures to investigate, and probably some tuning to do. > >> > > >> > It might also get gfx1030 working too. @Richi, could you test it, > >> > please? > >> > >> I can report partial success here. [...] > > >> I'll followup with a test summary once the (serial) run of libgomp > >> testing finished. > > (Why serial, by the way?) Just out of caution ... (I'm using the GPU for the desktop at the same time and dmesg gets spammed with some not-so reassuring "errors" during the offloading) > >> At least there are quite some number of > >> actual kernel executions and PASSing testcases. > > > > === libgomp Summary === > > > > # of expected passes 29126 > > # of unexpected failures 697 > > # of unexpected successes 1 > > # of expected failures 703 > > # of unresolved testcases 318 > > # of unsupported tests 766 > > > > full summary attached (compressed). > > Compating your old results ('| ' prefix in the following) with what I > got with '-march=gfx1100' for AMD Radeon RX 7900 XTX. My GCC sources are > a few weeks old, but have all the recent fix-up commits cherry-picked, > and a work-around applied for: > > /tmp/ccfrKwEK.mkoffload.2.s:29:27: error: value out of range > .amdhsa_next_free_vgpr 516 > ^~~ > > (..., to be discussed later.) > > There are, I think, no compilation FAILs anymore; I'm only commenting on > execution test FAILs. Not all FAILs appear all the time (so it follows > that I may be missing a few), and 'libgomp.c++/../libgomp.c-c++-common' > generally behaves similar to 'libgomp.c/../libgomp.c-c++-common', so > omitting the former here. > > | FAIL: libgomp.c/../libgomp.c-c++-common/error-1.c output pattern test > > Not seeing that FAIL. > > I also see 'libgomp.c-c++-common/for-5.c' FAIL. > > | FAIL: libgomp.c/../libgomp.c-c++-common/icv-5.c execution test > | FAIL: libgomp.c/../libgomp.c-c++-common/icv-6.c execution test > | FAIL: libgomp.c/../libgomp.c-c++-common/icv-7.c execution test > | FAIL: libgomp.c/../libgomp.c-c++-common/icv-9.c execution test > > I confirm 'libgomp.c-c++-common/icv-5.c', 'libgomp.c-c++-common/icv-9.c' > FAIL, but 'libgomp.c-c++-common/icv-6.c', 'libgomp.c-c++-common/icv-7.c' > PASS. > > | FAIL: libgomp.c/../libgomp.c-c++-common/non-rect-loop-1.c execution test > > Not seeing that FAIL. > > | FAIL: libgomp.c/../libgomp.c-c++-common/reduction-6.c execution test > > I confirm that FAIL, and also 'libgomp.c-c++-common/reduction-5.c' > occasionally. > > | FAIL: libgomp.c/../libgomp.c-c++-common/requires-unified-addr-1.c execution test > | FAIL: libgomp.c/../libgomp.c-c++-common/requires-unified-addr-2.c execution test > | FAIL: libgomp.c/../libgomp.c-c++-common/target-45.c execution test > | FAIL: libgomp.c/../libgomp.c-c++-common/target-implicit-map-3.c execution test > | FAIL: libgomp.c/../libgomp.c-c++-common/target-is-accessible-1.c execution test > > Not seeing these FAILs. > > I also see 'libgomp.c-c++-common/reverse-offload-1.c' FAIL. > > | FAIL: libgomp.c/../libgomp.c-c++-common/task-detach-6.c execution test > | WARNING: program timed out. > | FAIL: libgomp.c/../libgomp.c-c++-common/task-in-explicit-1.c execution test > > I confirm these FAILs. > > | FAIL: libgomp.c/../libgomp.c-c++-common/teams-2.c execution test > > Known FAIL. > > | FAIL: libgomp.c/../libgomp.c-c++-common/teams-nteams-icv-1.c execution test > | FAIL: libgomp.c/../libgomp.c-c++-common/teams-nteams-icv-2.c execution test > | FAIL: libgomp.c/../libgomp.c-c++-common/teams-nteams-icv-3.c execution test > | FAIL: libgomp.c/../libgomp.c-c++-common/teams-nteams-icv-4.c execution test > | FAIL: libgomp.c/declare-variant-4-gfx900.c (test for excess errors) > | FAIL: libgomp.c/declare-variant-4-gfx906.c (test for excess errors) > | FAIL: libgomp.c/declare-variant-4-gfx908.c (test for excess errors) > | FAIL: libgomp.c/declare-variant-4-gfx90a.c (test for excess errors) > | FAIL: libgomp.c/declare-variant-4.c execution test > | FAIL: libgomp.c/declare-variant-4.c scan-amdgcn-amdhsa-offload-tree-dump optimized "= gfx[^ ]+ \\\\(\\\\);" > | FAIL: libgomp.c/examples-4/device-2.c execution test > | WARNING: program timed out. > > Not seeing these FAILs. > > I also see 'libgomp.c/examples-4/teams-4.c', 'libgomp.c/target-31.c' FAIL. > > | FAIL: libgomp.c/target-teams-1.c execution test > > I confirm this FAIL. > > | FAIL: libgomp.fortran/[...] execution test > > You had a lot of FAILs there. I only see the following: > > | FAIL: libgomp.fortran/examples-4/teams-2.f90 -O0 execution test > | [...] > > | FAIL: libgomp.fortran/examples-4/teams-4.f90 -O0 execution test > | [...] > > | FAIL: libgomp.fortran/icv-6.f90 -O execution test > > | FAIL: libgomp.fortran/reverse-offload-1.f90 -O2 execution test > | FAIL: libgomp.fortran/reverse-offload-1.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test > | FAIL: libgomp.fortran/reverse-offload-1.f90 -O3 -g (test for excess errors) > | UNRESOLVED: libgomp.fortran/reverse-offload-1.f90 -O3 -g compilation failed to produce executable > > | FAIL: libgomp.fortran/reverse-offload-3.f90 -O execution test > | FAIL: libgomp.fortran/reverse-offload-4.f90 -O execution test > > | FAIL: libgomp.fortran/task-detach-6.f90 -O0 execution test > | [...] > > | FAIL: libgomp.fortran/task-in-explicit-1.f90 -O0 execution test > | [...] > > You had a lot of FAILs for 'libgomp.oacc-c', 'libgomp.oacc-c++', > 'libgomp.oacc-fortran'. For me: > > | FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-10.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 execution test > > I confirm this FAIL (also 'libgomp.oacc-c++'). > > | FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vprop.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 (test for excess errors) > > Known FAIL (also 'libgomp.oacc-c++'). > > | FAIL: libgomp.oacc-fortran/reduction-5.f90 -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O0 execution test > > I confirm this FAIL. > > > So, "not bad!", but also still some work to be done. :-) I'm going to re-do the testing with all the fixes in on Monday and will report back. Richard. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Stabilizing flaky libgomp GCN target/offloading testing (was: libgomp GCN gfx1030/gfx1100 offloading status) 2024-02-01 14:49 ` Richard Biener @ 2024-02-21 12:34 ` Thomas Schwinge 2024-02-21 16:32 ` Richard Biener 2024-03-08 10:34 ` GCN, nvptx: Errors during device probing are fatal (was: Stabilizing flaky libgomp GCN target/offloading testing) Thomas Schwinge 0 siblings, 2 replies; 25+ messages in thread From: Thomas Schwinge @ 2024-02-21 12:34 UTC (permalink / raw) To: Richard Biener, Andrew Stubbs; +Cc: Tobias Burnus, gcc-patches, Jakub Jelinek Hi! On 2024-02-01T15:49:02+0100, Richard Biener <rguenther@suse.de> wrote: > On Thu, 1 Feb 2024, Thomas Schwinge wrote: >> On 2024-01-26T10:45:10+0100, Richard Biener <rguenther@suse.de> wrote: >> > On Fri, 26 Jan 2024, Richard Biener wrote: >> >> On Wed, 24 Jan 2024, Andrew Stubbs wrote: >> >> > [...] is enough to get gfx1100 working for most purposes, on top of the >> >> > patch that Tobias committed a week or so ago; there are still some test >> >> > failures to investigate, and probably some tuning to do. >> >> > >> >> > It might also get gfx1030 working too. @Richi, could you test it, >> >> > please? >> >> >> >> I can report partial success here. [...] >> >> >> I'll followup with a test summary once the (serial) run of libgomp >> >> testing finished. >> >> (Why serial, by the way?) > > Just out of caution ... (I'm using the GPU for the desktop at the > same time and dmesg gets spammed with some not-so reassuring > "errors" during the offloading) Yeah, indeed 'dmesg' is full of "notes"... However, note that per my work on <https://gcc.gnu.org/PR66005> "libgomp make check time is excessive", all execution testing in libgomp is serialized in 'libgomp/testsuite/lib/libgomp.exp:libgomp_load'. So, no problem/difference in that regard, to run parallel 'check-target-libgomp'. (... with the caveat that execution tests for effective-targets are *not* governed by that, as I've found yesterday. I have a WIP hack for that, too.) >> [...] what I >> got with '-march=gfx1100' for AMD Radeon RX 7900 XTX. [...] >> [...] execution test FAILs. Not all FAILs appear all the time [...] What disturbs the testing a lot is, that the GPU may get into a bad state, upon which any use either fails with a 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' error -- or by just hanging, deep in 'libhsa-runtime64.so.1'... I've now tried to debug the latter case (hang). When the GPU gets into this bad state (whatever exactly that is), 'hsa_executable_load_code_object' still returns 'HSA_STATUS_SUCCESS', but then GCN target execution ('gcn-run') hangs in 'hsa_executable_freeze' vs. GCN offloading execution ('libgomp-plugin-gcn.so.1') hangs right before 'hsa_executable_freeze', in the GCN heap setup 'hsa_memory_copy'. There it hangs until killed (for example, until DejaGnu's timeout mechanism kills the process -- just that the next GPU-using execution test then runs into the same thing again...). In this state (and also the 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' state), we're able to recover via: $ flock /tmp/gpu.lock sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover 0 This is, obviously, a hack, probably needs a serial lock to not disturb other things, has hard-coded 'dri/0', and as I said in <https://inbox.sourceware.org/87plww8qin.fsf@euler.schwinge.ddns.net> "GCN RDNA2+ vs. GCC SLP vectorizer": | I've no idea what | 'amdgpu_gpu_recover' would do if the GPU is also used for display. However, it's very useful in my testing. :-| The questions is, how to detect the "hang" state without first running into a timeout (and disambiguating such a timeout from a user code timeout)? Add a watchdog: call 'alarm([a few seconds])' before device initialization, and before the actual GPU kernel launch cancel it with 'alarm(0)'? (..., and add a handler for 'SIGALRM' to print a distinct error message that we can then react on, like for 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'.) Probably 'alarm'/'SIGALRM' is a no-go in libgomp -- instead, use a helper thread to similarly implement a watchdog? ('libgomp/plugin/plugin-gcn.c' already is using pthreads for other purposes.) Any other clever ideas? What's a suitable value for "a few seconds"? Grüße Thomas ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Stabilizing flaky libgomp GCN target/offloading testing (was: libgomp GCN gfx1030/gfx1100 offloading status) 2024-02-21 12:34 ` Stabilizing flaky libgomp GCN target/offloading testing (was: libgomp GCN gfx1030/gfx1100 offloading status) Thomas Schwinge @ 2024-02-21 16:32 ` Richard Biener 2024-03-06 12:09 ` Stabilize flaky GCN target/offloading testing Thomas Schwinge 2024-03-08 10:34 ` GCN, nvptx: Errors during device probing are fatal (was: Stabilizing flaky libgomp GCN target/offloading testing) Thomas Schwinge 1 sibling, 1 reply; 25+ messages in thread From: Richard Biener @ 2024-02-21 16:32 UTC (permalink / raw) To: Thomas Schwinge; +Cc: Andrew Stubbs, Tobias Burnus, gcc-patches, Jakub Jelinek > Am 21.02.2024 um 13:34 schrieb Thomas Schwinge <tschwinge@baylibre.com>: > > Hi! > >> On 2024-02-01T15:49:02+0100, Richard Biener <rguenther@suse.de> wrote: >>> On Thu, 1 Feb 2024, Thomas Schwinge wrote: >>> On 2024-01-26T10:45:10+0100, Richard Biener <rguenther@suse.de> wrote: >>>> On Fri, 26 Jan 2024, Richard Biener wrote: >>>>> On Wed, 24 Jan 2024, Andrew Stubbs wrote: >>>>>> [...] is enough to get gfx1100 working for most purposes, on top of the >>>>>> patch that Tobias committed a week or so ago; there are still some test >>>>>> failures to investigate, and probably some tuning to do. >>>>>> >>>>>> It might also get gfx1030 working too. @Richi, could you test it, >>>>>> please? >>>>> >>>>> I can report partial success here. [...] >>> >>>>> I'll followup with a test summary once the (serial) run of libgomp >>>>> testing finished. >>> >>> (Why serial, by the way?) >> >> Just out of caution ... (I'm using the GPU for the desktop at the >> same time and dmesg gets spammed with some not-so reassuring >> "errors" during the offloading) > > Yeah, indeed 'dmesg' is full of "notes"... > > However, note that per my work on <https://gcc.gnu.org/PR66005> > "libgomp make check time is excessive", all execution testing in libgomp > is serialized in 'libgomp/testsuite/lib/libgomp.exp:libgomp_load'. So, > no problem/difference in that regard, to run parallel > 'check-target-libgomp'. (... with the caveat that execution tests for > effective-targets are *not* governed by that, as I've found yesterday. > I have a WIP hack for that, too.) > > >>> [...] what I >>> got with '-march=gfx1100' for AMD Radeon RX 7900 XTX. [...] > >>> [...] execution test FAILs. Not all FAILs appear all the time [...] > > What disturbs the testing a lot is, that the GPU may get into a bad > state, upon which any use either fails with a > 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' error -- or by just hanging, deep in > 'libhsa-runtime64.so.1'... > > I've now tried to debug the latter case (hang). When the GPU gets into > this bad state (whatever exactly that is), > 'hsa_executable_load_code_object' still returns 'HSA_STATUS_SUCCESS', but > then GCN target execution ('gcn-run') hangs in 'hsa_executable_freeze' > vs. GCN offloading execution ('libgomp-plugin-gcn.so.1') hangs right > before 'hsa_executable_freeze', in the GCN heap setup 'hsa_memory_copy'. > There it hangs until killed (for example, until DejaGnu's timeout > mechanism kills the process -- just that the next GPU-using execution > test then runs into the same thing again...). > > In this state (and also the 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' state), > we're able to recover via: > > $ flock /tmp/gpu.lock sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover > 0 > > This is, obviously, a hack, probably needs a serial lock to not disturb > other things, has hard-coded 'dri/0', and as I said in > <https://inbox.sourceware.org/87plww8qin.fsf@euler.schwinge.ddns.net> > "GCN RDNA2+ vs. GCC SLP vectorizer": > > | I've no idea what > | 'amdgpu_gpu_recover' would do if the GPU is also used for display. It ends up terminating your X session… (there’s some automatic driver recovery that’s also sometimes triggered which sounds like the same thing). I need to try using the integrated graphics for X11 to see if that avoids the issue. Guess AMD needs to improve the driver/runtime (or we - it’s open source at least up to the firmware). Richard > However, it's very useful in my testing. :-| > > The questions is, how to detect the "hang" state without first running > into a timeout (and disambiguating such a timeout from a user code > timeout)? Add a watchdog: call 'alarm([a few seconds])' before device > initialization, and before the actual GPU kernel launch cancel it with > 'alarm(0)'? (..., and add a handler for 'SIGALRM' to print a distinct > error message that we can then react on, like for > 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'.) Probably 'alarm'/'SIGALRM' is a > no-go in libgomp -- instead, use a helper thread to similarly implement a > watchdog? ('libgomp/plugin/plugin-gcn.c' already is using pthreads for > other purposes.) Any other clever ideas? What's a suitable value for > "a few seconds"? > > > Grüße > Thomas ^ permalink raw reply [flat|nested] 25+ messages in thread
* Stabilize flaky GCN target/offloading testing 2024-02-21 16:32 ` Richard Biener @ 2024-03-06 12:09 ` Thomas Schwinge 2024-03-06 12:39 ` Andrew Stubbs 0 siblings, 1 reply; 25+ messages in thread From: Thomas Schwinge @ 2024-03-06 12:09 UTC (permalink / raw) To: Richard Biener, Andrew Stubbs; +Cc: Tobias Burnus, gcc-patches, Jakub Jelinek [-- Attachment #1: Type: text/plain, Size: 4691 bytes --] Hi! On 2024-02-21T17:32:13+0100, Richard Biener <rguenther@suse.de> wrote: > Am 21.02.2024 um 13:34 schrieb Thomas Schwinge <tschwinge@baylibre.com>: >> [...] per my work on <https://gcc.gnu.org/PR66005> >> "libgomp make check time is excessive", all execution testing in libgomp >> is serialized in 'libgomp/testsuite/lib/libgomp.exp:libgomp_load'. [...] >> (... with the caveat that execution tests for >> effective-targets are *not* governed by that, as I've found yesterday. >> I have a WIP hack for that, too.) >> What disturbs the testing a lot is, that the GPU may get into a bad >> state, upon which any use either fails with a >> 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' error -- or by just hanging, deep in >> 'libhsa-runtime64.so.1'... >> >> I've now tried to debug the latter case (hang). When the GPU gets into >> this bad state (whatever exactly that is), >> 'hsa_executable_load_code_object' still returns 'HSA_STATUS_SUCCESS', but >> then GCN target execution ('gcn-run') hangs in 'hsa_executable_freeze' >> vs. GCN offloading execution ('libgomp-plugin-gcn.so.1') hangs right >> before 'hsa_executable_freeze', in the GCN heap setup 'hsa_memory_copy'. >> There it hangs until killed (for example, until DejaGnu's timeout >> mechanism kills the process -- just that the next GPU-using execution >> test then runs into the same thing again...). >> >> In this state (and also the 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' state), >> we're able to recover via: >> >> $ flock /tmp/gpu.lock sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover >> 0 At least most of the times. I've found that -- sometimes... ;-( -- if you run into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES', then do 'amdgpu_gpu_recover', and then immediately re-execute, you'll again run into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'. That appears to be avoidable by injecting some artificial "cool-down period"... (The latter I've not yet tested extensively.) >> This is, obviously, a hack, probably needs a serial lock to not disturb >> other things, has hard-coded 'dri/0', and as I said in >> <https://inbox.sourceware.org/87plww8qin.fsf@euler.schwinge.ddns.net> >> "GCN RDNA2+ vs. GCC SLP vectorizer": >> >> | I've no idea what >> | 'amdgpu_gpu_recover' would do if the GPU is also used for display. > > It ends up terminating your X session… Eh.... ;'-| > (there’s some automatic driver recovery that’s also sometimes triggered which sounds like the same thing). > I need to try using the integrated graphics for X11 to see if that avoids the issue. A few years ago, I tried that for a Nvidia GPU laptop, and -- if I now remember correctly -- basically got it to work, via hand-editing '/etc/X11/xorg.conf' and all that... But: I couldn't get external HDMI to work in that setup, and therefore reverted to "standard". > Guess AMD needs to improve the driver/runtime (or we - it’s open source at least up to the firmware). >> However, it's very useful in my testing. :-| >> >> The questions is, how to detect the "hang" state without first running >> into a timeout (and disambiguating such a timeout from a user code >> timeout)? Add a watchdog: call 'alarm([a few seconds])' before device >> initialization, and before the actual GPU kernel launch cancel it with >> 'alarm(0)'? (..., and add a handler for 'SIGALRM' to print a distinct >> error message that we can then react on, like for >> 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'.) Probably 'alarm'/'SIGALRM' is a >> no-go in libgomp -- instead, use a helper thread to similarly implement a >> watchdog? ('libgomp/plugin/plugin-gcn.c' already is using pthreads for >> other purposes.) Any other clever ideas? What's a suitable value for >> "a few seconds"? I'm attaching my current "GCN: Watchdog for device image load", covering both 'gcc/config/gcn/gcn-run.cc' and 'libgomp/plugin/plugin-gcn.c'. (That's using 'timer_create' etc. instead of 'alarm'/'SIGALRM'. ) That, plus routing *all* potential GPU usage (in particular: including execution tests for effective-targets, see above) through a serial lock ('flock', implemented in DejaGnu board file, outside of the the "DejaGnu timeout domain", similar to 'libgomp/testsuite/lib/libgomp.exp:libgomp_load', see above), plus catching 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' (both the "real" ones and the "fake" ones via "GCN: Watchdog for device image load") and in that case 'amdgpu_gpu_recover' and re-execution of the respective executable, does greatly stabilize flaky GCN target/offloading testing. Do we have consensus to move forward with this approach, generally? Grüße Thomas [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-GCN-Watchdog-for-device-image-load.patch --] [-- Type: text/x-diff, Size: 6408 bytes --] From 21795353483c263c91a5efa80da41a75a6b2b629 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge <tschwinge@baylibre.com> Date: Thu, 22 Feb 2024 21:50:45 +0100 Subject: [PATCH] GCN: Watchdog for device image load --- gcc/config/gcn/gcn-run.cc | 76 ++++++++++++++++++++++++++++++++++ libgomp/plugin/plugin-gcn.c | 81 ++++++++++++++++++++++++++++++++++++- 2 files changed, 156 insertions(+), 1 deletion(-) diff --git a/gcc/config/gcn/gcn-run.cc b/gcc/config/gcn/gcn-run.cc index d45ff3e6c2ba..ab15185af471 100644 --- a/gcc/config/gcn/gcn-run.cc +++ b/gcc/config/gcn/gcn-run.cc @@ -33,6 +33,8 @@ #include <unistd.h> #include <elf.h> #include <signal.h> +#include <time.h> +#include <errno.h> #include "hsa.h" #include "../../../libgomp/config/gcn/libgomp-gcn.h" @@ -616,6 +618,70 @@ run (uint64_t kernel, void *kernargs) "Clean up signal"); } +/* Watchdog. */ + +static void +watchdog_bark (union sigval sigev_value) +{ + const char *msg = sigev_value.sival_ptr; + fprintf (stderr, "Watchdog barking %s\n", msg); + exit (EXIT_FAILURE); +} + +static void +watchdog_start (timer_t *restrict timeridp, const int s, const char *msg) +{ + if (debug) + fprintf (stderr, "Starting watchdog\n"); + + struct sigevent sev; + sev.sigev_notify = SIGEV_THREAD; + sev.sigev_value.sival_ptr = (void *) (uintptr_t) msg; + sev.sigev_notify_function = watchdog_bark; + sev.sigev_notify_attributes = NULL; + int res; + /* Backoff in case of 'EAGAIN': waiting 255..534773760 ns in 22 attempts. */ + int32_t wait_ns = 255; + while ((res = timer_create (CLOCK_MONOTONIC, &sev, timeridp)) == EAGAIN + && wait_ns <= 999999999) + { + if (debug) + fprintf (stderr, "'timer_create': 'EAGAIN'; waiting %d ns\n", + (int) wait_ns); + struct timespec wait_ts = { 0, wait_ns }; + (void) nanosleep (&wait_ts, NULL); + wait_ns <<= 1; + } + if (res != 0) + { + perror ("'timer_create' FAILED"); + exit (EXIT_FAILURE); + } + + struct itimerspec its = { { 0, 0 }, { s, 0 } }; + res = timer_settime (*timeridp, 0, &its, NULL); + if (res != 0) + { + perror ("'timer_settime' FAILED"); + exit (EXIT_FAILURE); + } +} + +static void +watchdog_stop (timer_t timerid) +{ + int res; + res = timer_delete (timerid); + if (res != 0) + { + perror ("'timer_delete' FAILED"); + exit (EXIT_FAILURE); + } + + if (debug) + fprintf (stderr, "Stopped watchdog\n"); +} + int main (int argc, char *argv[]) { @@ -658,7 +724,17 @@ main (int argc, char *argv[]) char **kernel_argv = &argv[kernel_arg]; init_device (); + + /* Something's wrong if the device image load doesn't complete quickly; + <https://inbox.sourceware.org/87il2ij8sm.fsf@euler.schwinge.ddns.net> + "Stabilizing flaky libgomp GCN target/offloading testing". */ + timer_t watchdog; + static const int watchdog_s = 10; + watchdog_start (&watchdog, watchdog_s, + "during device image load; maybe handle similar to" + " 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'?"); load_image (kernel_argv[0]); + watchdog_stop (watchdog); /* Calculate size of function parameters + argv data. */ size_t args_size = 0; diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index 2771123252a8..5680d9f5a34a 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -48,6 +48,8 @@ #include "oacc-plugin.h" #include "oacc-int.h" #include <assert.h> +#include <time.h> +#include <errno.h> /* These probably won't be in elf.h for a while. */ #ifndef R_AMDGPU_NONE @@ -1371,6 +1373,71 @@ hsa_queue_callback (hsa_status_t status, hsa_fatal ("Asynchronous queue error", status); } +/* }}} */ +/* {{{ Watchdog */ + +static void +watchdog_bark (union sigval sigev_value) +{ + const char *msg = sigev_value.sival_ptr; + GOMP_PLUGIN_error ("GCN fatal error: watchdog barking %s\n", msg); + _Exit (EXIT_FAILURE); +} + +static void +watchdog_start (timer_t *restrict timeridp, const int s, const char *msg) +{ + GCN_DEBUG ("Starting watchdog\n"); + + struct sigevent sev; + sev.sigev_notify = SIGEV_THREAD; + sev.sigev_value.sival_ptr = (void *) (uintptr_t) msg; + sev.sigev_notify_function = watchdog_bark; + sev.sigev_notify_attributes = NULL; + int res; + /* Backoff in case of 'EAGAIN': waiting 255..534773760 ns in 22 attempts. */ + int32_t wait_ns = 255; + while ((res = timer_create (CLOCK_MONOTONIC, &sev, timeridp)) == EAGAIN + && wait_ns <= 999999999) + { + GCN_DEBUG ("'timer_create': 'EAGAIN'; waiting %d ns\n", + (int) wait_ns); + struct timespec wait_ts = { 0, wait_ns }; + (void) nanosleep (&wait_ts, NULL); + wait_ns <<= 1; + } + if (res != 0) + { + GOMP_PLUGIN_error ("GCN fatal error: 'timer_create' FAILED: %s", + strerror (errno)); + _Exit (EXIT_FAILURE); + } + + struct itimerspec its = { { 0, 0 }, { s, 0 } }; + res = timer_settime (*timeridp, 0, &its, NULL); + if (res != 0) + { + GOMP_PLUGIN_error ("GCN fatal error: 'timer_settime' FAILED: %s", + strerror (errno)); + _Exit (EXIT_FAILURE); + } +} + +static void +watchdog_stop (timer_t timerid) +{ + int res; + res = timer_delete (timerid); + if (res != 0) + { + GOMP_PLUGIN_error ("GCN fatal error: 'timer_delete' FAILED: %s", + strerror (errno)); + _Exit (EXIT_FAILURE); + } + + GCN_DEBUG ("Stopped watchdog\n"); +} + /* }}} */ /* {{{ HSA initialization */ @@ -2502,7 +2569,16 @@ create_and_finalize_hsa_program (struct agent_info *agent) return false; } if (agent->prog_finalized) - goto final; + goto unlock; + + /* Something's wrong if the device image load doesn't complete quickly; + <https://inbox.sourceware.org/87il2ij8sm.fsf@euler.schwinge.ddns.net> + "Stabilizing flaky libgomp GCN target/offloading testing". */ + timer_t watchdog; + static const int watchdog_s = 10; + watchdog_start (&watchdog, watchdog_s, + "during device image load; maybe handle similar to" + " 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'?"); status = hsa_fns.hsa_executable_create_fn (HSA_PROFILE_FULL, @@ -2581,6 +2657,9 @@ create_and_finalize_hsa_program (struct agent_info *agent) final: agent->prog_finalized = true; + watchdog_stop (watchdog); + +unlock: if (pthread_mutex_unlock (&agent->prog_mutex)) { GOMP_PLUGIN_error ("Could not unlock a GCN agent program mutex"); -- 2.43.0 ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Stabilize flaky GCN target/offloading testing 2024-03-06 12:09 ` Stabilize flaky GCN target/offloading testing Thomas Schwinge @ 2024-03-06 12:39 ` Andrew Stubbs 2024-03-06 13:29 ` Richard Biener 0 siblings, 1 reply; 25+ messages in thread From: Andrew Stubbs @ 2024-03-06 12:39 UTC (permalink / raw) To: Thomas Schwinge, Richard Biener; +Cc: Tobias Burnus, gcc-patches, Jakub Jelinek On 06/03/2024 12:09, Thomas Schwinge wrote: > Hi! > > On 2024-02-21T17:32:13+0100, Richard Biener <rguenther@suse.de> wrote: >> Am 21.02.2024 um 13:34 schrieb Thomas Schwinge <tschwinge@baylibre.com>: >>> [...] per my work on <https://gcc.gnu.org/PR66005> >>> "libgomp make check time is excessive", all execution testing in libgomp >>> is serialized in 'libgomp/testsuite/lib/libgomp.exp:libgomp_load'. [...] >>> (... with the caveat that execution tests for >>> effective-targets are *not* governed by that, as I've found yesterday. >>> I have a WIP hack for that, too.) > >>> What disturbs the testing a lot is, that the GPU may get into a bad >>> state, upon which any use either fails with a >>> 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' error -- or by just hanging, deep in >>> 'libhsa-runtime64.so.1'... >>> >>> I've now tried to debug the latter case (hang). When the GPU gets into >>> this bad state (whatever exactly that is), >>> 'hsa_executable_load_code_object' still returns 'HSA_STATUS_SUCCESS', but >>> then GCN target execution ('gcn-run') hangs in 'hsa_executable_freeze' >>> vs. GCN offloading execution ('libgomp-plugin-gcn.so.1') hangs right >>> before 'hsa_executable_freeze', in the GCN heap setup 'hsa_memory_copy'. >>> There it hangs until killed (for example, until DejaGnu's timeout >>> mechanism kills the process -- just that the next GPU-using execution >>> test then runs into the same thing again...). >>> >>> In this state (and also the 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' state), >>> we're able to recover via: >>> >>> $ flock /tmp/gpu.lock sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover >>> 0 > > At least most of the times. I've found that -- sometimes... ;-( -- if > you run into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES', then do > 'amdgpu_gpu_recover', and then immediately re-execute, you'll again run > into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'. That appears to be avoidable > by injecting some artificial "cool-down period"... (The latter I've not > yet tested extensively.) > >>> This is, obviously, a hack, probably needs a serial lock to not disturb >>> other things, has hard-coded 'dri/0', and as I said in >>> <https://inbox.sourceware.org/87plww8qin.fsf@euler.schwinge.ddns.net> >>> "GCN RDNA2+ vs. GCC SLP vectorizer": >>> >>> | I've no idea what >>> | 'amdgpu_gpu_recover' would do if the GPU is also used for display. >> >> It ends up terminating your X session… > > Eh.... ;'-| > >> (there’s some automatic driver recovery that’s also sometimes triggered which sounds like the same thing). > >> I need to try using the integrated graphics for X11 to see if that avoids the issue. > > A few years ago, I tried that for a Nvidia GPU laptop, and -- if I now > remember correctly -- basically got it to work, via hand-editing > '/etc/X11/xorg.conf' and all that... But: I couldn't get external HDMI > to work in that setup, and therefore reverted to "standard". > >> Guess AMD needs to improve the driver/runtime (or we - it’s open source at least up to the firmware). > >>> However, it's very useful in my testing. :-| >>> >>> The questions is, how to detect the "hang" state without first running >>> into a timeout (and disambiguating such a timeout from a user code >>> timeout)? Add a watchdog: call 'alarm([a few seconds])' before device >>> initialization, and before the actual GPU kernel launch cancel it with >>> 'alarm(0)'? (..., and add a handler for 'SIGALRM' to print a distinct >>> error message that we can then react on, like for >>> 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'.) Probably 'alarm'/'SIGALRM' is a >>> no-go in libgomp -- instead, use a helper thread to similarly implement a >>> watchdog? ('libgomp/plugin/plugin-gcn.c' already is using pthreads for >>> other purposes.) Any other clever ideas? What's a suitable value for >>> "a few seconds"? > > I'm attaching my current "GCN: Watchdog for device image load", covering > both 'gcc/config/gcn/gcn-run.cc' and 'libgomp/plugin/plugin-gcn.c'. > (That's using 'timer_create' etc. instead of 'alarm'/'SIGALRM'. ) > > That, plus routing *all* potential GPU usage (in particular: including > execution tests for effective-targets, see above) through a serial lock > ('flock', implemented in DejaGnu board file, outside of the the > "DejaGnu timeout domain", similar to > 'libgomp/testsuite/lib/libgomp.exp:libgomp_load', see above), plus > catching 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' (both the "real" ones and > the "fake" ones via "GCN: Watchdog for device image load") and in that > case 'amdgpu_gpu_recover' and re-execution of the respective executable, > does greatly stabilize flaky GCN target/offloading testing. > > Do we have consensus to move forward with this approach, generally? I've also observed a number of random hangs in host-side code outside our control, but after the kernel has exited. In general this watchdog approach might help with these. I do feel like it's "papering over the cracks", but if we can't fix it.... at the end of the day it's just a little extra code. My only concern is that it might actually cause failures, perhaps on heavily loaded systems, or with network filesystems, or during debugging. Andrew ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Stabilize flaky GCN target/offloading testing 2024-03-06 12:39 ` Andrew Stubbs @ 2024-03-06 13:29 ` Richard Biener 0 siblings, 0 replies; 25+ messages in thread From: Richard Biener @ 2024-03-06 13:29 UTC (permalink / raw) To: Andrew Stubbs; +Cc: Thomas Schwinge, Tobias Burnus, gcc-patches, Jakub Jelinek On Wed, 6 Mar 2024, Andrew Stubbs wrote: > On 06/03/2024 12:09, Thomas Schwinge wrote: > > Hi! > > > > On 2024-02-21T17:32:13+0100, Richard Biener <rguenther@suse.de> wrote: > >> Am 21.02.2024 um 13:34 schrieb Thomas Schwinge <tschwinge@baylibre.com>: > >>> [...] per my work on <https://gcc.gnu.org/PR66005> > >>> "libgomp make check time is excessive", all execution testing in libgomp > >>> is serialized in 'libgomp/testsuite/lib/libgomp.exp:libgomp_load'. [...] > >>> (... with the caveat that execution tests for > >>> effective-targets are *not* governed by that, as I've found yesterday. > >>> I have a WIP hack for that, too.) > > > >>> What disturbs the testing a lot is, that the GPU may get into a bad > >>> state, upon which any use either fails with a > >>> 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' error -- or by just hanging, deep in > >>> 'libhsa-runtime64.so.1'... > >>> > >>> I've now tried to debug the latter case (hang). When the GPU gets into > >>> this bad state (whatever exactly that is), > >>> 'hsa_executable_load_code_object' still returns 'HSA_STATUS_SUCCESS', but > >>> then GCN target execution ('gcn-run') hangs in 'hsa_executable_freeze' > >>> vs. GCN offloading execution ('libgomp-plugin-gcn.so.1') hangs right > >>> before 'hsa_executable_freeze', in the GCN heap setup 'hsa_memory_copy'. > >>> There it hangs until killed (for example, until DejaGnu's timeout > >>> mechanism kills the process -- just that the next GPU-using execution > >>> test then runs into the same thing again...). > >>> > >>> In this state (and also the 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' state), > >>> we're able to recover via: > >>> > >>> $ flock /tmp/gpu.lock sudo cat > >>> /sys/kernel/debug/dri/0/amdgpu_gpu_recover > >>> 0 > > > > At least most of the times. I've found that -- sometimes... ;-( -- if > > you run into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES', then do > > 'amdgpu_gpu_recover', and then immediately re-execute, you'll again run > > into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'. That appears to be avoidable > > by injecting some artificial "cool-down period"... (The latter I've not > > yet tested extensively.) > > > >>> This is, obviously, a hack, probably needs a serial lock to not disturb > >>> other things, has hard-coded 'dri/0', and as I said in > >>> <https://inbox.sourceware.org/87plww8qin.fsf@euler.schwinge.ddns.net> > >>> "GCN RDNA2+ vs. GCC SLP vectorizer": > >>> > >>> | I've no idea what > >>> | 'amdgpu_gpu_recover' would do if the GPU is also used for display. > >> > >> It ends up terminating your X session? > > > > Eh.... ;'-| > > > >> (there?s some automatic driver recovery that?s also sometimes triggered > >> which sounds like the same thing). > > > >> I need to try using the integrated graphics for X11 to see if that avoids > >> the issue. > > > > A few years ago, I tried that for a Nvidia GPU laptop, and -- if I now > > remember correctly -- basically got it to work, via hand-editing > > '/etc/X11/xorg.conf' and all that... But: I couldn't get external HDMI > > to work in that setup, and therefore reverted to "standard". > > > >> Guess AMD needs to improve the driver/runtime (or we - it?s open source at > >> least up to the firmware). > > > >>> However, it's very useful in my testing. :-| > >>> > >>> The questions is, how to detect the "hang" state without first running > >>> into a timeout (and disambiguating such a timeout from a user code > >>> timeout)? Add a watchdog: call 'alarm([a few seconds])' before device > >>> initialization, and before the actual GPU kernel launch cancel it with > >>> 'alarm(0)'? (..., and add a handler for 'SIGALRM' to print a distinct > >>> error message that we can then react on, like for > >>> 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'.) Probably 'alarm'/'SIGALRM' is a > >>> no-go in libgomp -- instead, use a helper thread to similarly implement a > >>> watchdog? ('libgomp/plugin/plugin-gcn.c' already is using pthreads for > >>> other purposes.) Any other clever ideas? What's a suitable value for > >>> "a few seconds"? > > > > I'm attaching my current "GCN: Watchdog for device image load", covering > > both 'gcc/config/gcn/gcn-run.cc' and 'libgomp/plugin/plugin-gcn.c'. > > (That's using 'timer_create' etc. instead of 'alarm'/'SIGALRM'. ) > > > > That, plus routing *all* potential GPU usage (in particular: including > > execution tests for effective-targets, see above) through a serial lock > > ('flock', implemented in DejaGnu board file, outside of the the > > "DejaGnu timeout domain", similar to > > 'libgomp/testsuite/lib/libgomp.exp:libgomp_load', see above), plus > > catching 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' (both the "real" ones and > > the "fake" ones via "GCN: Watchdog for device image load") and in that > > case 'amdgpu_gpu_recover' and re-execution of the respective executable, > > does greatly stabilize flaky GCN target/offloading testing. > > > > Do we have consensus to move forward with this approach, generally? > > I've also observed a number of random hangs in host-side code outside our > control, but after the kernel has exited. In general this watchdog approach > might help with these. I do feel like it's "papering over the cracks", but if > we can't fix it.... at the end of the day it's just a little extra code. I wonder if you maybe have contact to people at AMD that are willing to debug this and improve the driver side of this? I'm seeing quite a number of similar reports for the issue I hit in the github tracker, multiple years old and also current, so that doesn't seem to be a good way to get things fixed ... Richard. > My only concern is that it might actually cause failures, perhaps on heavily > loaded systems, or with network filesystems, or during debugging. > > Andrew > ^ permalink raw reply [flat|nested] 25+ messages in thread
* GCN, nvptx: Errors during device probing are fatal (was: Stabilizing flaky libgomp GCN target/offloading testing) 2024-02-21 12:34 ` Stabilizing flaky libgomp GCN target/offloading testing (was: libgomp GCN gfx1030/gfx1100 offloading status) Thomas Schwinge 2024-02-21 16:32 ` Richard Biener @ 2024-03-08 10:34 ` Thomas Schwinge 1 sibling, 0 replies; 25+ messages in thread From: Thomas Schwinge @ 2024-03-08 10:34 UTC (permalink / raw) To: Andrew Stubbs, gcc-patches, Jakub Jelinek; +Cc: Richard Biener, Tobias Burnus [-- Attachment #1: Type: text/plain, Size: 1145 bytes --] Hi! On 2024-02-21T13:34:01+0100, I wrote: > On 2024-02-01T15:49:02+0100, Richard Biener <rguenther@suse.de> wrote: >> On Thu, 1 Feb 2024, Thomas Schwinge wrote: >>> [...] what I >>> got with '-march=gfx1100' for AMD Radeon RX 7900 XTX. [...] > >>> [...] execution test FAILs. Not all FAILs appear all the time [...] > > What disturbs the testing a lot is, that the GPU may get into a bad > state, upon which any use either fails with a > 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' error -- or by just hanging, deep in > 'libhsa-runtime64.so.1'... So, there's a "fun" aspect: if we run into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' (or other errors; and similar in the libgomp nvptx plugin) during libgomp GCN plugin device probing, then it's not fatal, but instead silently disables the libgomp plugin/device, thus typically silently resorting to host-fallback execution. That's not helpful behavior in my opinion, so I propose the attached "GCN, nvptx: Errors during device probing are fatal". OK to push? (That's also the behavior that's implemented in both the GCN and nvptx target 'run' tools.) Grüße Thomas [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-GCN-nvptx-Errors-during-device-probing-are-fatal.patch --] [-- Type: text/x-diff, Size: 5022 bytes --] From 0dc72089dccc10d3b55096ade5fc4d72de6cb96f Mon Sep 17 00:00:00 2001 From: Thomas Schwinge <tschwinge@baylibre.com> Date: Thu, 7 Mar 2024 14:42:07 +0100 Subject: [PATCH] GCN, nvptx: Errors during device probing are fatal Currently, we silently disable libgomp GCN and nvptx plugins/devices in presence of certain error conditions during device probing, thus typically silently resorting to host-fallback execution. Make such errors fatal, similar as for any other device access later on, so that we early and reliably notice when things go wrong. (Keep just two cases non-fatal: (a) libgomp GCN or nvptx plugins are available but 'libhsa-runtime64.so.1' or 'libcuda.so.1' are not, and (b) those are available, but the corresponding devices are not.) This resolves the issue that we've got execution test cases unexpectedly PASSing, despite: libgomp: GCN fatal error: Run-time could not be initialized Runtime message: HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events. ..., and therefore they were not offloaded to the GCN device, but ran in host-fallback execution mode. What happend in that scenario is that in 'init_hsa_context' during the initial 'GOMP_OFFLOAD_get_num_devices' we ran into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES', but it wasn't fatal, but just silently disabled the libgomp plugin/device. Especially "entertaining" were cases where such unintended host-fallback execution happened during effective-target checks like 'offload_device_available' (host-fallback execution there meaning: no offload device available), but actual test cases then were running with an offload device available, and therefore mis-configured. include/ * cuda/cuda.h (CUresult): Add 'CUDA_ERROR_NO_DEVICE'. libgomp/ * plugin/plugin-gcn.c (init_hsa_context): Add and handle 'bool probe' parameter. Adjust all users; errors during device probing are fatal. * plugin/plugin-nvptx.c (nvptx_get_num_devices): Aside from 'CUDA_ERROR_NO_DEVICE', errors during device probing are fatal. --- include/cuda/cuda.h | 1 + libgomp/plugin/plugin-gcn.c | 14 ++++++++------ libgomp/plugin/plugin-nvptx.c | 4 +++- 3 files changed, 12 insertions(+), 7 deletions(-) diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h index 114aba4e074..0dca4b3a5c0 100644 --- a/include/cuda/cuda.h +++ b/include/cuda/cuda.h @@ -57,6 +57,7 @@ typedef enum { CUDA_ERROR_OUT_OF_MEMORY = 2, CUDA_ERROR_NOT_INITIALIZED = 3, CUDA_ERROR_DEINITIALIZED = 4, + CUDA_ERROR_NO_DEVICE = 100, CUDA_ERROR_INVALID_CONTEXT = 201, CUDA_ERROR_INVALID_HANDLE = 400, CUDA_ERROR_NOT_FOUND = 500, diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index 7e141a85f31..2bea9157e9d 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -1511,10 +1511,12 @@ assign_agent_ids (hsa_agent_t agent, void *data) } /* Initialize hsa_context if it has not already been done. - Return TRUE on success. */ + If !PROBE: returns TRUE on success. + If PROBE: returns TRUE on success or if the plugin/device shall be silently + ignored, and otherwise emits an error and returns FALSE. */ static bool -init_hsa_context (void) +init_hsa_context (bool probe) { hsa_status_t status; int agent_index = 0; @@ -1529,7 +1531,7 @@ init_hsa_context (void) GOMP_PLUGIN_fatal ("%s\n", msg); else GCN_WARNING ("%s\n", msg); - return false; + return probe ? true : false; } status = hsa_fns.hsa_init_fn (); if (status != HSA_STATUS_SUCCESS) @@ -3321,8 +3323,8 @@ GOMP_OFFLOAD_version (void) int GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask) { - if (!init_hsa_context ()) - return 0; + if (!init_hsa_context (true)) + exit (EXIT_FAILURE); /* Return -1 if no omp_requires_mask cannot be fulfilled but devices were present. */ if (hsa_context.agent_count > 0 @@ -3339,7 +3341,7 @@ GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask) bool GOMP_OFFLOAD_init_device (int n) { - if (!init_hsa_context ()) + if (!init_hsa_context (false)) return false; if (n >= hsa_context.agent_count) { diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c index 81b4a7f499a..ba92a3a48cb 100644 --- a/libgomp/plugin/plugin-nvptx.c +++ b/libgomp/plugin/plugin-nvptx.c @@ -622,12 +622,14 @@ nvptx_get_num_devices (void) CUresult r = CUDA_CALL_NOCHECK (cuInit, 0); /* This is not an error: e.g. we may have CUDA libraries installed but no devices available. */ - if (r != CUDA_SUCCESS) + if (r == CUDA_ERROR_NO_DEVICE) { GOMP_PLUGIN_debug (0, "Disabling nvptx offloading; cuInit: %s\n", cuda_error (r)); return 0; } + else if (r != CUDA_SUCCESS) + GOMP_PLUGIN_fatal ("cuInit error: %s", cuda_error (r)); } CUDA_CALL_ASSERT (cuDeviceGetCount, &n); -- 2.34.1 ^ permalink raw reply [flat|nested] 25+ messages in thread
* amdgcn: additional gfx1030/gfx1100 support: adjust test cases (was: [PATCH] amdgcn: additional gfx1100 support) 2024-01-24 12:43 [PATCH] amdgcn: additional gfx1100 support Andrew Stubbs 2024-01-24 16:01 ` [patch] amdgcn: config.gcc - enable gfx1100 multilib; add gfx1100 to docs (was: [PATCH] amdgcn: additional gfx1100 support) Tobias Burnus 2024-01-26 8:56 ` [PATCH] amdgcn: additional gfx1100 support Richard Biener @ 2024-03-06 13:49 ` Thomas Schwinge 2024-03-06 14:03 ` amdgcn: additional gfx1030/gfx1100 support: adjust test cases Andrew Stubbs 2 siblings, 1 reply; 25+ messages in thread From: Thomas Schwinge @ 2024-03-06 13:49 UTC (permalink / raw) To: Andrew Stubbs, gcc-patches; +Cc: Richard Biener [-- Attachment #1: Type: text/plain, Size: 3109 bytes --] Hi! On 2024-01-24T12:43:04+0000, Andrew Stubbs <ams@baylibre.com> wrote: > This [...] ... became commit 99890e15527f1f04caef95ecdd135c9f1a077f08 "amdgcn: additional gfx1030/gfx1100 support", and included the following: > --- a/gcc/config/gcn/gcn-valu.md > +++ b/gcc/config/gcn/gcn-valu.md > @@ -3555,30 +3555,63 @@ > ;; }}} > ;; {{{ Int/int conversions > > +(define_code_iterator all_convert [truncate zero_extend sign_extend]) > (define_code_iterator zero_convert [truncate zero_extend]) > (define_code_attr convop [ > (sign_extend "extend") > (zero_extend "zero_extend") > (truncate "trunc")]) > > -(define_insn "<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>" > +(define_expand "<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>" > + [(set (match_operand:V_INT_1REG 0 "register_operand" "=v") > + (all_convert:V_INT_1REG > + (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))] > + "") > + > +(define_insn "*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>" > [(set (match_operand:V_INT_1REG 0 "register_operand" "=v") > (zero_convert:V_INT_1REG > (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))] > - "" > + "!TARGET_RDNA3" > "v_mov_b32_sdwa\t%0, %1 dst_sel:<V_INT_1REG:sdwa> dst_unused:UNUSED_PAD src0_sel:<V_INT_1REG_ALT:sdwa>" > [(set_attr "type" "vop_sdwa") > (set_attr "length" "8")]) > > -(define_insn "extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>" > +(define_insn "extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>" > [(set (match_operand:V_INT_1REG 0 "register_operand" "=v") > (sign_extend:V_INT_1REG > (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))] > - "" > + "!TARGET_RDNA3" > "v_mov_b32_sdwa\t%0, sext(%1) src0_sel:<V_INT_1REG_ALT:sdwa>" > [(set_attr "type" "vop_sdwa") > (set_attr "length" "8")]) > > +(define_insn "*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>" > + [(set (match_operand:V_INT_1REG 0 "register_operand" "=v") > + (all_convert:V_INT_1REG > + (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))] > + "TARGET_RDNA3" > + { > + enum {extend, zero_extend, trunc}; > + rtx shiftwidth = (<V_INT_1REG_ALT:SCALAR_MODE>mode == QImode > + || <V_INT_1REG:SCALAR_MODE>mode == QImode > + ? GEN_INT (24) > + : <V_INT_1REG_ALT:SCALAR_MODE>mode == HImode > + || <V_INT_1REG:SCALAR_MODE>mode == HImode > + ? GEN_INT (16) > + : NULL); > + operands[2] = shiftwidth; > + > + if (!shiftwidth) > + return "v_mov_b32 %0, %1"; > + else if (<convop> == extend || <convop> == trunc) > + return "v_lshlrev_b32\t%0, %2, %1\;v_ashrrev_i32\t%0, %2, %0"; > + else > + return "v_lshlrev_b32\t%0, %2, %1\;v_lshrrev_b32\t%0, %2, %0"; > + } > + [(set_attr "type" "mult") > + (set_attr "length" "8")]) OK to push the attached "amdgcn: additional gfx1030/gfx1100 support: adjust test cases"? Tested 'gcn.exp' for all '-march'es. Grüße Thomas [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-amdgcn-additional-gfx1030-gfx1100-support-adjust-tes.patch --] [-- Type: text/x-diff, Size: 3283 bytes --] From 04b83e9aa19b02b9805e03f31db14325bb00e737 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge <tschwinge@baylibre.com> Date: Mon, 4 Mar 2024 10:40:39 +0100 Subject: [PATCH] amdgcn: additional gfx1030/gfx1100 support: adjust test cases The "SDWA" changes in commit 99890e15527f1f04caef95ecdd135c9f1a077f08 "amdgcn: additional gfx1030/gfx1100 support" caused a few regressions: PASS: gcc.target/gcn/sram-ecc-3.c (test for excess errors) [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-3.c scan-assembler zero_extendv64qiv64si2 PASS: gcc.target/gcn/sram-ecc-4.c (test for excess errors) [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-4.c scan-assembler zero_extendv64hiv64si2 PASS: gcc.target/gcn/sram-ecc-7.c (test for excess errors) [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-7.c scan-assembler zero_extendv64qiv64si2 PASS: gcc.target/gcn/sram-ecc-8.c (test for excess errors) [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-8.c scan-assembler zero_extendv64hiv64si2 Those test cases need corresponding adjustment. gcc/testsuite/ * gcc.target/gcn/sram-ecc-3.c: Adjust. * gcc.target/gcn/sram-ecc-4.c: Likewise. * gcc.target/gcn/sram-ecc-7.c: Likewise. * gcc.target/gcn/sram-ecc-8.c: Likewise. --- gcc/testsuite/gcc.target/gcn/sram-ecc-3.c | 2 +- gcc/testsuite/gcc.target/gcn/sram-ecc-4.c | 2 +- gcc/testsuite/gcc.target/gcn/sram-ecc-7.c | 2 +- gcc/testsuite/gcc.target/gcn/sram-ecc-8.c | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/gcc/testsuite/gcc.target/gcn/sram-ecc-3.c b/gcc/testsuite/gcc.target/gcn/sram-ecc-3.c index 692d4578b66..bc89e3542d2 100644 --- a/gcc/testsuite/gcc.target/gcn/sram-ecc-3.c +++ b/gcc/testsuite/gcc.target/gcn/sram-ecc-3.c @@ -18,4 +18,4 @@ f () a[n] = b[n]; } -/* { dg-final { scan-assembler "zero_extendv64qiv64si2" } } */ +/* { dg-final { scan-assembler "(\\\*zero_extendv64qiv64si_sdwa|\\\*zero_extendv64qiv64si_shift)" } } */ diff --git a/gcc/testsuite/gcc.target/gcn/sram-ecc-4.c b/gcc/testsuite/gcc.target/gcn/sram-ecc-4.c index 61b8d552759..ff7e2d0bda5 100644 --- a/gcc/testsuite/gcc.target/gcn/sram-ecc-4.c +++ b/gcc/testsuite/gcc.target/gcn/sram-ecc-4.c @@ -18,4 +18,4 @@ f () a[n] = b[n]; } -/* { dg-final { scan-assembler "zero_extendv64hiv64si2" } } */ +/* { dg-final { scan-assembler "(\\\*zero_extendv64hiv64si_sdwa|\\\*zero_extendv64hiv64si_shift)" } } */ diff --git a/gcc/testsuite/gcc.target/gcn/sram-ecc-7.c b/gcc/testsuite/gcc.target/gcn/sram-ecc-7.c index 9d0ce6f6b5a..8d363970ffb 100644 --- a/gcc/testsuite/gcc.target/gcn/sram-ecc-7.c +++ b/gcc/testsuite/gcc.target/gcn/sram-ecc-7.c @@ -18,4 +18,4 @@ f () a[n] = b[n]; } -/* { dg-final { scan-assembler "zero_extendv64qiv64si2" } } */ +/* { dg-final { scan-assembler "(\\\*zero_extendv64qiv64si_sdwa|\\\*zero_extendv64qiv64si_shift)" } } */ diff --git a/gcc/testsuite/gcc.target/gcn/sram-ecc-8.c b/gcc/testsuite/gcc.target/gcn/sram-ecc-8.c index 76e02882798..a2b25076ed1 100644 --- a/gcc/testsuite/gcc.target/gcn/sram-ecc-8.c +++ b/gcc/testsuite/gcc.target/gcn/sram-ecc-8.c @@ -18,4 +18,4 @@ f () a[n] = b[n]; } -/* { dg-final { scan-assembler "zero_extendv64hiv64si2" } } */ +/* { dg-final { scan-assembler "(\\\*zero_extendv64hiv64si_sdwa|\\\*zero_extendv64hiv64si_shift)" } } */ -- 2.34.1 ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: amdgcn: additional gfx1030/gfx1100 support: adjust test cases 2024-03-06 13:49 ` amdgcn: additional gfx1030/gfx1100 support: adjust test cases (was: [PATCH] amdgcn: additional gfx1100 support) Thomas Schwinge @ 2024-03-06 14:03 ` Andrew Stubbs 0 siblings, 0 replies; 25+ messages in thread From: Andrew Stubbs @ 2024-03-06 14:03 UTC (permalink / raw) To: Thomas Schwinge, gcc-patches; +Cc: Richard Biener On 06/03/2024 13:49, Thomas Schwinge wrote: > Hi! > > On 2024-01-24T12:43:04+0000, Andrew Stubbs <ams@baylibre.com> wrote: >> This [...] > > ... became commit 99890e15527f1f04caef95ecdd135c9f1a077f08 > "amdgcn: additional gfx1030/gfx1100 support", and included the following: > >> --- a/gcc/config/gcn/gcn-valu.md >> +++ b/gcc/config/gcn/gcn-valu.md >> @@ -3555,30 +3555,63 @@ >> ;; }}} >> ;; {{{ Int/int conversions >> >> +(define_code_iterator all_convert [truncate zero_extend sign_extend]) >> (define_code_iterator zero_convert [truncate zero_extend]) >> (define_code_attr convop [ >> (sign_extend "extend") >> (zero_extend "zero_extend") >> (truncate "trunc")]) >> >> -(define_insn "<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>" >> +(define_expand "<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>" >> + [(set (match_operand:V_INT_1REG 0 "register_operand" "=v") >> + (all_convert:V_INT_1REG >> + (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))] >> + "") >> + >> +(define_insn "*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>" >> [(set (match_operand:V_INT_1REG 0 "register_operand" "=v") >> (zero_convert:V_INT_1REG >> (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))] >> - "" >> + "!TARGET_RDNA3" >> "v_mov_b32_sdwa\t%0, %1 dst_sel:<V_INT_1REG:sdwa> dst_unused:UNUSED_PAD src0_sel:<V_INT_1REG_ALT:sdwa>" >> [(set_attr "type" "vop_sdwa") >> (set_attr "length" "8")]) >> >> -(define_insn "extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>" >> +(define_insn "extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>" >> [(set (match_operand:V_INT_1REG 0 "register_operand" "=v") >> (sign_extend:V_INT_1REG >> (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))] >> - "" >> + "!TARGET_RDNA3" >> "v_mov_b32_sdwa\t%0, sext(%1) src0_sel:<V_INT_1REG_ALT:sdwa>" >> [(set_attr "type" "vop_sdwa") >> (set_attr "length" "8")]) >> >> +(define_insn "*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>" >> + [(set (match_operand:V_INT_1REG 0 "register_operand" "=v") >> + (all_convert:V_INT_1REG >> + (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))] >> + "TARGET_RDNA3" >> + { >> + enum {extend, zero_extend, trunc}; >> + rtx shiftwidth = (<V_INT_1REG_ALT:SCALAR_MODE>mode == QImode >> + || <V_INT_1REG:SCALAR_MODE>mode == QImode >> + ? GEN_INT (24) >> + : <V_INT_1REG_ALT:SCALAR_MODE>mode == HImode >> + || <V_INT_1REG:SCALAR_MODE>mode == HImode >> + ? GEN_INT (16) >> + : NULL); >> + operands[2] = shiftwidth; >> + >> + if (!shiftwidth) >> + return "v_mov_b32 %0, %1"; >> + else if (<convop> == extend || <convop> == trunc) >> + return "v_lshlrev_b32\t%0, %2, %1\;v_ashrrev_i32\t%0, %2, %0"; >> + else >> + return "v_lshlrev_b32\t%0, %2, %1\;v_lshrrev_b32\t%0, %2, %0"; >> + } >> + [(set_attr "type" "mult") >> + (set_attr "length" "8")]) > > OK to push the attached > "amdgcn: additional gfx1030/gfx1100 support: adjust test cases"? > Tested 'gcn.exp' for all '-march'es. LGTM. Andrew ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2024-03-08 10:34 UTC | newest] Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2024-01-24 12:43 [PATCH] amdgcn: additional gfx1100 support Andrew Stubbs 2024-01-24 16:01 ` [patch] amdgcn: config.gcc - enable gfx1100 multilib; add gfx1100 to docs (was: [PATCH] amdgcn: additional gfx1100 support) Tobias Burnus 2024-01-26 12:26 ` [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs (was: [patch] amdgcn: config.gcc - enable gfx1100 multilib; add gfx1100 to docs) Tobias Burnus 2024-01-26 12:32 ` [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs Tobias Burnus 2024-01-26 12:40 ` Richard Biener 2024-01-26 12:59 ` Tobias Burnus 2024-01-26 16:21 ` Thomas Schwinge 2024-01-26 16:36 ` Richard Biener 2024-01-26 16:45 ` [patch] install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled (was: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs) Tobias Burnus 2024-01-29 10:01 ` [patch] install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled Andrew Stubbs 2024-01-26 8:56 ` [PATCH] amdgcn: additional gfx1100 support Richard Biener 2024-01-26 9:45 ` Richard Biener 2024-01-26 10:19 ` Andrew Stubbs 2024-01-26 10:22 ` Richard Biener 2024-01-26 10:31 ` Andrew Stubbs 2024-02-01 14:41 ` libgomp GCN gfx1030/gfx1100 offloading status (was: [PATCH] amdgcn: additional gfx1100 support) Thomas Schwinge 2024-02-01 14:49 ` Richard Biener 2024-02-21 12:34 ` Stabilizing flaky libgomp GCN target/offloading testing (was: libgomp GCN gfx1030/gfx1100 offloading status) Thomas Schwinge 2024-02-21 16:32 ` Richard Biener 2024-03-06 12:09 ` Stabilize flaky GCN target/offloading testing Thomas Schwinge 2024-03-06 12:39 ` Andrew Stubbs 2024-03-06 13:29 ` Richard Biener 2024-03-08 10:34 ` GCN, nvptx: Errors during device probing are fatal (was: Stabilizing flaky libgomp GCN target/offloading testing) Thomas Schwinge 2024-03-06 13:49 ` amdgcn: additional gfx1030/gfx1100 support: adjust test cases (was: [PATCH] amdgcn: additional gfx1100 support) Thomas Schwinge 2024-03-06 14:03 ` amdgcn: additional gfx1030/gfx1100 support: adjust test cases Andrew Stubbs
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).