From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out2.suse.de (smtp-out2.suse.de [IPv6:2a07:de40:b251:101:10:150:64:2]) by sourceware.org (Postfix) with ESMTPS id B379E3858C2C for ; Fri, 26 Jan 2024 08:58:00 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org B379E3858C2C Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de ARC-Filter: OpenARC Filter v1.0.0 sourceware.org B379E3858C2C Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a07:de40:b251:101:10:150:64:2 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1706259484; cv=none; b=NBTNx1N8NyKpvF2oa/11NM5hOIWIlaC3QXEvKUT60HkjSlFhyBrKj25qBf+pVZw7SdrqK0aNes3XFI/xwqsvNPBswz1G16wt05xtXpZDb1uN2pGqp23nKjP/xcyinc55X4UitI3EZ/IlSSIhdyVo3mm7GKK7YK7PHBf3JOfWeAY= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1706259484; c=relaxed/simple; bh=wSim7FiPURecvtHDOy3dpzgBVHErHtuLkFSCw+r/+sI=; h=DKIM-Signature:DKIM-Signature:DKIM-Signature:DKIM-Signature:Date: From:To:Subject:Message-ID:MIME-Version; b=wF69JPBpFZqvL8+hi5k7VCcov3CuqLQBLe8dE+18SX04gp655Uc6TCr8Bvk9FCNyyRFVX9kxKUWfoNiQWr7l/rzVMAe+kx9HcpuzlOPTQHuQeWbSr6C2O6G7SLW450ftTjFeaXcCwasVBbrsUJTKlTVEBgvJyiK53UOY36t1t+k= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from [10.168.4.150] (unknown [10.168.4.150]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id E72E11FB6E; Fri, 26 Jan 2024 08:57:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1706259479; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=fz3e+Y1Kh1JMTw55aKku6KotfX1RNpRV6j+JBZ0siPw=; b=Z/wqqpAdOTLSqsaXSfqCtYjpWfSCAgWCyhxJp1T/KLiAVQ0GqzLoFgeWdkiWT6AACMUQuv 8aoTL7xxhVkWVfQAvrCvX8s1dQrW9yy64i54UlWLWJl6wax4NsGm7YtrC8H48QRYw+Ps3o oYvERY5Gf+ZH0uKKCzOhxkslDzsXFL0= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1706259479; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=fz3e+Y1Kh1JMTw55aKku6KotfX1RNpRV6j+JBZ0siPw=; b=xH3Y6JLaYZJP4QPtEdsYvcumZOUTDnRDAoyWRERI2vTic5W1lPUEqgB45slQqMmf+0w0+0 1XmfZTsy+SkhUHAg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1706259478; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=fz3e+Y1Kh1JMTw55aKku6KotfX1RNpRV6j+JBZ0siPw=; b=YE/z7a8EpIsnpHa4Wb/lIaBOqFL79L4Q+PSqpvXY99Y0F9RzJKGOZbGdcAa/QII3aMV1Yu Zd349m5EY/CDbIh9/oNJqzem84nPUCwe7Ig6fzzYjsrDBcixEsSXr02980Vvmf+8GhkkM0 oOhES0HV2o9A1ZZ2sUizGecOD50DTRY= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1706259478; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=fz3e+Y1Kh1JMTw55aKku6KotfX1RNpRV6j+JBZ0siPw=; b=VZSivmRvqBjEfK/KuWJ9CmHZxhCyu2NgmTa4gOBI5M/FIx/zg1mLNyDMd5Suh8J8QPYoAf IqzS/kf5CoVJN2AA== Date: Fri, 26 Jan 2024 09:56:50 +0100 (CET) From: Richard Biener To: Andrew Stubbs cc: gcc-patches@gcc.gnu.org, pa@codesourcery.com Subject: Re: [PATCH] amdgcn: additional gfx1100 support In-Reply-To: <20240124124304.1780645-1-ams@baylibre.com> Message-ID: <78875q15-qq2n-45o2-nooo-59r0s0ss9031@fhfr.qr> References: <20240124124304.1780645-1-ams@baylibre.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Authentication-Results: smtp-out2.suse.de; none X-Spam-Level: X-Spam-Score: -4.30 X-Spamd-Result: default: False [-4.30 / 50.00]; TO_DN_SOME(0.00)[]; NEURAL_HAM_SHORT(-0.20)[-0.991]; RCVD_COUNT_ZERO(0.00)[0]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; BAYES_HAM(-3.00)[100.00%]; ARC_NA(0.00)[]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[3]; TO_MATCH_ENVRCPT_ALL(0.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; MIME_GOOD(-0.10)[text/plain]; DKIM_SIGNED(0.00)[suse.de:s=susede2_rsa,suse.de:s=susede2_ed25519]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.de:email]; FUZZY_BLOCKED(0.00)[rspamd.com] X-Spam-Status: No, score=-11.2 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Wed, 24 Jan 2024, Andrew Stubbs wrote: > This is enough to get gfx1100 working for most purposes, on top of the > patch that Tobias committed a week or so ago; there are still some test > failures to investigate, and probably some tuning to do. > > It might also get gfx1030 working too. @Richi, could you test it, > please? I can report partial success here. I do see quite some FAILs because of /space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90: In function 'accum_._omp_fn.1':^M /space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90:20:38: error: unrecognizable insn:^M (insn 108 107 109 6 (set (reg:V8SF 849)^M (unspec:V8SF [^M (reg:V8SF 844 [ vect__43.12_106 ]) repeated x2^M (const_int 1 [0x1])^M ] UNSPEC_PLUS_DPP_SHR)) "/space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90":22:29 discrim 1 -1^M (nil))^M during RTL pass: vregs^M /space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90:20:38: internal compiler error: in extract_insn, at recog.cc:2812^M there are also quite a number of execution FAILs like icv-5.exe: /space/rguenther/src/gcc-autopar_devel/libgomp/plugin/plugin-gcn.c:2462: isa_matches_agent: Assertion `agent_isa_s' failed. FAIL: libgomp.c/../libgomp.c-c++-common/icv-5.c execution test (the assert in question looks bad - yeah, somehow we got past device initialization - not sure how - but end up here) Maybe HSA behaves odd here - I didn't constrain the device it should choose but it works (most of the time). GCN_DEBUG prints me all the HSA agents available but I don't see any debug on which agent is actually initialized during libgomp device init (at least nothing I can easily match up). Maybe sth to improve. I'll followup with a test summary once the (serial) run of libgomp testing finished. At least there are quite some number of actual kernel executions and PASSing testcases. Richard. > I can't test the other multilibs right now. @PA, can you test it please? > > I can self-approve the patch, but I'll hold off the commit until the > test results come back. > > Andrew > > gcc/ChangeLog: > > * config/gcn/gcn-opts.h (TARGET_PACKED_WORK_ITEMS): Add TARGET_RDNA3. > * config/gcn/gcn-valu.md (all_convert): New iterator. > (2): New > define_expand, and rename the old one to ... > (*_sdwa): ... this. > (extend2): Likewise, to ... > (extend_sdwa): .. this. > (*_shift): New. > * config/gcn/gcn.cc (gcn_global_address_p): Use "offsetbits" correctly. > (gcn_hsa_declare_function_name): Update the vgpr counting for gfx1100. > * config/gcn/gcn.md (mulhisi3): Disable on RDNA3. > (mulqihi3_scalar): Likewise. > > libgcc/ChangeLog: > > * config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Handle RDNA3. > > libgomp/ChangeLog: > > * config/gcn/time.c (RTC_TICKS): Configure RDNA3. > (omp_get_wtime): Add RDNA3-compatible variant. > * plugin/plugin-gcn.c (max_isa_vgprs): Tune for gfx1030 and gfx1100. > > Signed-off-by: Andrew Stubbs > --- > gcc/config/gcn/gcn-opts.h | 2 +- > gcc/config/gcn/gcn-valu.md | 41 ++++++++++++++++++++++++++++--- > gcc/config/gcn/gcn.cc | 31 ++++++++++++++++------- > gcc/config/gcn/gcn.md | 4 +-- > libgcc/config/gcn/amdgcn_veclib.h | 2 +- > libgomp/config/gcn/time.c | 10 ++++++++ > libgomp/plugin/plugin-gcn.c | 6 +++-- > 7 files changed, 77 insertions(+), 19 deletions(-) > > diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h > index 79fbda3ab25..6be2c9204fa 100644 > --- a/gcc/config/gcn/gcn-opts.h > +++ b/gcc/config/gcn/gcn-opts.h > @@ -62,7 +62,7 @@ extern enum gcn_isa { > > > #define TARGET_M0_LDS_LIMIT (TARGET_GCN3) > -#define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS) > +#define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS || TARGET_RDNA3) > > #define TARGET_XNACK (flag_xnack != HSACO_ATTR_OFF) > > diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md > index 3d5b6271ee6..cd027f8b369 100644 > --- a/gcc/config/gcn/gcn-valu.md > +++ b/gcc/config/gcn/gcn-valu.md > @@ -3555,30 +3555,63 @@ > ;; }}} > ;; {{{ Int/int conversions > > +(define_code_iterator all_convert [truncate zero_extend sign_extend]) > (define_code_iterator zero_convert [truncate zero_extend]) > (define_code_attr convop [ > (sign_extend "extend") > (zero_extend "zero_extend") > (truncate "trunc")]) > > -(define_insn "2" > +(define_expand "2" > + [(set (match_operand:V_INT_1REG 0 "register_operand" "=v") > + (all_convert:V_INT_1REG > + (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))] > + "") > + > +(define_insn "*_sdwa" > [(set (match_operand:V_INT_1REG 0 "register_operand" "=v") > (zero_convert:V_INT_1REG > (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))] > - "" > + "!TARGET_RDNA3" > "v_mov_b32_sdwa\t%0, %1 dst_sel: dst_unused:UNUSED_PAD src0_sel:" > [(set_attr "type" "vop_sdwa") > (set_attr "length" "8")]) > > -(define_insn "extend2" > +(define_insn "extend_sdwa" > [(set (match_operand:V_INT_1REG 0 "register_operand" "=v") > (sign_extend:V_INT_1REG > (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))] > - "" > + "!TARGET_RDNA3" > "v_mov_b32_sdwa\t%0, sext(%1) src0_sel:" > [(set_attr "type" "vop_sdwa") > (set_attr "length" "8")]) > > +(define_insn "*_shift" > + [(set (match_operand:V_INT_1REG 0 "register_operand" "=v") > + (all_convert:V_INT_1REG > + (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))] > + "TARGET_RDNA3" > + { > + enum {extend, zero_extend, trunc}; > + rtx shiftwidth = (mode == QImode > + || mode == QImode > + ? GEN_INT (24) > + : mode == HImode > + || mode == HImode > + ? GEN_INT (16) > + : NULL); > + operands[2] = shiftwidth; > + > + if (!shiftwidth) > + return "v_mov_b32 %0, %1"; > + else if ( == extend || == trunc) > + return "v_lshlrev_b32\t%0, %2, %1\;v_ashrrev_i32\t%0, %2, %0"; > + else > + return "v_lshlrev_b32\t%0, %2, %1\;v_lshrrev_b32\t%0, %2, %0"; > + } > + [(set_attr "type" "mult") > + (set_attr "length" "8")]) > + > ;; GCC can already do these for scalar types, but not for vector types. > ;; Unfortunately you can't just do SUBREG on a vector to select the low part, > ;; so there must be a few tricks here. > diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc > index e668ce7c69e..e80de2ce056 100644 > --- a/gcc/config/gcn/gcn.cc > +++ b/gcc/config/gcn/gcn.cc > @@ -1597,8 +1597,8 @@ gcn_global_address_p (rtx addr) > rtx offset = XEXP (addr, 1); > int offsetbits = (TARGET_RDNA2_PLUS ? 11 : 12); > bool immediate_p = (CONST_INT_P (offset) > - && INTVAL (offset) >= -(1 << 12) > - && INTVAL (offset) < (1 << 12)); > + && INTVAL (offset) >= -(1 << offsetbits) > + && INTVAL (offset) < (1 << offsetbits)); > > if ((gcn_address_register_p (base, DImode, false) > || gcn_vec_address_register_p (base, DImode, false)) > @@ -6597,8 +6597,10 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, > if (df_regs_ever_live_p (FIRST_AVGPR_REG + avgpr)) > break; > avgpr++; > - vgpr = (vgpr + 3) & ~3; > - avgpr = (avgpr + 3) & ~3; > + > + /* The main function epilogue uses v8, but df doesn't see that. */ > + if (vgpr < 9) > + vgpr = 9; > > if (!leaf_function_p ()) > { > @@ -6611,9 +6613,18 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, > avgpr = MAX_NORMAL_AVGPR_COUNT; > } > > - /* The gfx90a accum_offset field can't represent 0 registers. */ > - if (gcn_arch == PROCESSOR_GFX90a && vgpr < 4) > - vgpr = 4; > + /* SIMD32 devices count double in wavefront64 mode. */ > + if (TARGET_RDNA2_PLUS) > + vgpr *= 2; > + > + /* Round up to the allocation block size. */ > + int vgpr_block_size = (TARGET_RDNA3 ? 12 > + : TARGET_RDNA2_PLUS || TARGET_CDNA2_PLUS ? 8 > + : 4); > + if (vgpr % vgpr_block_size) > + vgpr += vgpr_block_size - (vgpr % vgpr_block_size); > + if (avgpr % vgpr_block_size) > + avgpr += vgpr_block_size - (avgpr % vgpr_block_size); > > fputs ("\t.rodata\n" > "\t.p2align\t6\n" > @@ -6714,12 +6725,14 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, > " .private_segment_fixed_size: 0\n" > " .wavefront_size: 64\n" > " .sgpr_count: %i\n" > - " .vgpr_count: %i\n" > + " .vgpr_count: %i%s\n" > " .max_flat_workgroup_size: 1024\n", > cfun->machine->kernarg_segment_byte_size, > cfun->machine->kernarg_segment_alignment, > LDS_SIZE, > - sgpr, next_free_vgpr); > + sgpr, next_free_vgpr, > + (TARGET_RDNA2_PLUS ? " ; wavefrontsize64 counts double on SIMD32" > + : "")); > if (gcn_arch == PROCESSOR_GFX90a || gcn_arch == PROCESSOR_GFX908) > fprintf (file, " .agpr_count: %i\n", avgpr); > fputs (" .end_amdgpu_metadata\n", file); > diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md > index 492b833e255..1f3c692b7a6 100644 > --- a/gcc/config/gcn/gcn.md > +++ b/gcc/config/gcn/gcn.md > @@ -1618,7 +1618,7 @@ > (mult:SI > (any_extend:SI (match_operand:HI 1 "register_operand" "%v")) > (any_extend:SI (match_operand:HI 2 "register_operand" " v"))))] > - "" > + "!TARGET_RDNA3" > "v_mul_32_24_sdwa\t%0, %1, %2 src0_sel:WORD_0 src1_sel:WORD_0" > [(set_attr "type" "vop_sdwa") > (set_attr "length" "8")]) > @@ -1628,7 +1628,7 @@ > (mult:HI > (any_extend:HI (match_operand:QI 1 "register_operand" "%v")) > (any_extend:HI (match_operand:QI 2 "register_operand" " v"))))] > - "" > + "!TARGET_RDNA3" > "v_mul_32_24_sdwa\t%0, %1, %2 src0_sel:BYTE_0 src1_sel:BYTE_0" > [(set_attr "type" "vop_sdwa") > (set_attr "length" "8")]) > diff --git a/libgcc/config/gcn/amdgcn_veclib.h b/libgcc/config/gcn/amdgcn_veclib.h > index 821f6386dd6..d268c6cac16 100644 > --- a/libgcc/config/gcn/amdgcn_veclib.h > +++ b/libgcc/config/gcn/amdgcn_veclib.h > @@ -230,7 +230,7 @@ do { \ > > #if defined (__GCN3__) || defined (__GCN5__) \ > || defined (__CDNA1__) || defined (__CDNA2__) \ > - || defined (__RDNA2__) > + || defined (__RDNA2__) || defined (__RDNA3__) > #define CDNA3_PLUS 0 > #else > #define CDNA3_PLUS 1 > diff --git a/libgomp/config/gcn/time.c b/libgomp/config/gcn/time.c > index 30a0d0188e4..efcd04f5f43 100644 > --- a/libgomp/config/gcn/time.c > +++ b/libgomp/config/gcn/time.c > @@ -30,15 +30,25 @@ > /* According to AMD: > dGPU RTC is 27MHz > AGPU RTC is 100MHz > + RDNA3 ISA manual states "typically 100MHz" > FIXME: DTRT on an APU. */ > +#ifdef __RDNA3__ > +#define RTC_TICKS (1.0 / 100000000.0) /* 100MHz */ > +#else > #define RTC_TICKS (1.0 / 27000000.0) /* 27MHz */ > +#endif > > double > omp_get_wtime (void) > { > uint64_t clock; > +#ifdef __RDNA3__ > + asm ("s_sendmsg_rtn_b64 %0 0x83 ;Get REALTIME\n\t" > + "s_waitcnt 0" : "=r" (clock)); > +#else > asm ("s_memrealtime %0\n\t" > "s_waitcnt 0" : "=r" (clock)); > +#endif > return clock * RTC_TICKS; > } > > diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c > index 0339848451e..db28781dedb 100644 > --- a/libgomp/plugin/plugin-gcn.c > +++ b/libgomp/plugin/plugin-gcn.c > @@ -1741,11 +1741,13 @@ max_isa_vgprs (int isa) > case EF_AMDGPU_MACH_AMDGCN_GFX900: > case EF_AMDGPU_MACH_AMDGCN_GFX906: > case EF_AMDGPU_MACH_AMDGCN_GFX908: > - case EF_AMDGPU_MACH_AMDGCN_GFX1030: > - case EF_AMDGPU_MACH_AMDGCN_GFX1100: > return 256; > case EF_AMDGPU_MACH_AMDGCN_GFX90a: > return 512; > + case EF_AMDGPU_MACH_AMDGCN_GFX1030: > + return 512; /* 512 SIMD32 = 256 wavefrontsize64. */ > + case EF_AMDGPU_MACH_AMDGCN_GFX1100: > + return 1536; /* 1536 SIMD32 = 768 wavefrontsize64. */ > } > GOMP_PLUGIN_fatal ("unhandled ISA in max_isa_vgprs"); > } > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)