Date: Mon, 19 Feb 2024 11:52:55 +0100 (CET)
From: Richard Biener
To: Thomas Schwinge
Cc: Andrew Stubbs, gcc-patches@gcc.gnu.org
Subject: Re: GCN RDNA2+ vs. GCC SLP vectorizer
In-Reply-To: <87jzn091s1.fsf@euler.schwinge.ddns.net>
References: <87ttm8ka6h.fsf@euler.schwinge.ddns.net>
 <55q4729r-1014-5541-7p75-6rq6r97845r7@fhfr.qr>
 <4eb1a40e-0f54-4e27-90f8-00f4bba90907@baylibre.com>
 <53s543rq-36qn-ns26-o0qo-97o168o707pn@fhfr.qr>
 <9714f90d-a581-4ebe-a031-d5d8c6db9cf6@baylibre.com>
 <87plww8qin.fsf@euler.schwinge.ddns.net>
 <87jzn091s1.fsf@euler.schwinge.ddns.net>

On Mon, 19 Feb 2024, Thomas Schwinge wrote:

> Hi!
>
> On 2024-02-16T14:53:04+0100, I wrote:
> > On 2024-02-16T12:41:06+0000, Andrew Stubbs wrote:
> >> On 16/02/2024 12:26, Richard Biener wrote:
> >>> On Fri, 16 Feb 2024, Andrew Stubbs wrote:
> >>>> On 16/02/2024 10:17, Richard Biener wrote:
> >>>>> On Fri, 16 Feb 2024, Thomas Schwinge wrote:
> >>>>>> On 2023-10-20T12:51:03+0100, Andrew Stubbs wrote:
> >>>>>>> I've committed this patch
> >>>>>>
> >>>>>> ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
> >>>>>> "amdgcn: add -march=gfx1030 EXPERIMENTAL", which the later RDNA3/gfx1100
> >>>>>> support builds on top of, and that's what I'm currently working on
> >>>>>> getting proper GCC/GCN target (not offloading) results for.
> >>>>>>
> >>>>>> Now looking at 'gcc.dg/vect/bb-slp-cond-1.c', which is reasonably simple,
> >>>>>> and hopefully representative for other SLP execution test FAILs
> >>>>>> (regressions compared to my earlier non-gfx1100 testing).
> >>>>>>
> >>>>>>     $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/
> >>>>>>     source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
> >>>>>>     --sysroot=install/amdgcn-amdhsa -ftree-vectorize
> >>>>>>     -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common
> >>>>>>     -O2 -fdump-tree-slp-details -fdump-tree-vect-details -isystem
> >>>>>>     build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem
> >>>>>>     source-gcc/newlib/libc/include
> >>>>>>     -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/
> >>>>>>     -Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -wrapper
> >>>>>>     setarch,--addr-no-randomize -fdump-tree-all-all -fdump-ipa-all-all
> >>>>>>     -fdump-rtl-all-all -save-temps -march=gfx1100
> >>>>>>
> >>>>>> The '-march=gfx1030' 'a-bb-slp-cond-1.s' is identical (apart from
> >>>>>> 'TARGET_PACKED_WORK_ITEMS' in 'gcn_target_asm_function_prologue'), so I
> >>>>>> suppose will also exhibit the same failure mode, once again?
> >>>>>>
> >>>>>> Compared to '-march=gfx90a', the differences begin in
> >>>>>> 'a-bb-slp-cond-1.c.266r.expand' (only!), down to 'a-bb-slp-cond-1.s'.
> >>>>>>
> >>>>>> Changed like:
> >>>>>>
> >>>>>>     @@ -38,10 +38,10 @@ int main ()
> >>>>>>      #pragma GCC novector
> >>>>>>        for (i = 1; i < N; i++)
> >>>>>>          if (a[i] != i%4 + 1)
> >>>>>>     -      abort ();
> >>>>>>     +      __builtin_printf("%d %d != %d\n", i, a[i], i%4 + 1);
> >>>>>>
> >>>>>>        if (a[0] != 5)
> >>>>>>     -    abort ();
> >>>>>>     +    __builtin_printf("%d %d != %d\n", 0, a[0], 5);
> >>>>>>
> >>>>>> ..., we see:
> >>>>>>
> >>>>>>     $ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
> >>>>>>     40 5 != 1
> >>>>>>     41 6 != 2
> >>>>>>     42 7 != 3
> >>>>>>     43 8 != 4
> >>>>>>     44 5 != 1
> >>>>>>     45 6 != 2
> >>>>>>     46 7 != 3
> >>>>>>     47 8 != 4
> >>>>>>
> >>>>>> '40..47' are the 'i = 10..11' in 'foo', and the expectation is
> >>>>>> 'a[i * stride + 0..3] != 0'. So, either some earlier iteration has
> >>>>>> scribbled zero values over these (vector lane masking issue, perhaps?),
> >>>>>> or some other code generation issue?
> >
> >>>> [...], I must be doing something different because vect/bb-slp-cond-1.c
> >>>> passes for me, on gfx1100.
> >
> > That's strange. I've looked at your log file (looks good), and used your
> > toolchain to compile, and your 'gcn-run' to invoke, and still do get:
> >
> >     $ flock /tmp/gcn.lock ~/gcn-run ~/bb-slp-cond-1.exe
> >     GCN Kernel Aborted
> >     Kernel aborted
> >
> > Andrew, later on, please try what happens when you put an unconditional
> > 'abort' call into a test case?
>
> Andrew, any luck with that yet?
>
> Richard, are you able to reproduce the 'gcc.dg/vect/bb-slp-cond-1.c'
> execution test failure mentioned above (manual compilation and
> 'gcn-run')?

No, when manually compiling/running the testcase it works fine for me.
Didn't yet get to try the .exp files.

Richard.

>
> Grüße
>  Thomas
>
>
> >>> I didn't try to run it - when doing make check-gcc fails to using
> >>> gcn-run for test invocation
> >
> > Note, that for such individual test cases, invoking the compiler and then
> > 'gcn-run' manually would seem easiest?
> >
> >>> what's the trick to make it do that?
> >
> > I tell you've probably not done much "embedded" or simulator testing of
> > GCC targets? ;-P
> >
> >> There's a config file for nvptx here:
> >> https://github.com/SourceryTools/nvptx-tools/blob/master/nvptx-none-run.exp
> >
> > Yes, and I have pending some updates to that one, to be finished once
> > I've generally got my testing set up again, to a sufficient degree...
> >
> >> You can probably make the obvious adjustments. I think Thomas has a GCN
> >> version with a few more features.
> >
> > Right. I'm attaching my current 'amdgcn-amdhsa-run.exp'.
> >
> > I'm aware that the 'set_board_info gcc,[...] [...]' may be obsolete/wrong
> > (as Andrew also noted privately) -- likewise, at least in part, for
> > GCC/nvptx, which is where I copied all that from. (Will revise later;
> > not relevant for this discussion, here.)
> >
> > Similar to what I've recently added to libgomp, there is 'flock'ing here,
> > so that you may use 'make -j[...] check' for (partial) parallelism, but
> > still all execution testing runs serialized. I found this to greatly
> > help denoise the test results. (Not ideal, of course, but improving that
> > is for later, too.)
> >
> > You may want to disable the 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' thing if
> > that doesn't work like that in your case. (I've no idea what
> > 'amdgpu_gpu_recover' would do if the GPU is also used for display.) But
> > this, again, greatly helps denoise test results, at least for the one
> > system I'm currently testing on.
> >
> > I intend to publish proper documentation of all this, later on -- happy
> > to answer any questions in the meantime.
> >
> > If you don't already have a common directory for DejaGnu board files, put
> > 'amdgcn-amdhsa-run.exp' into '~/tmp/amdgcn-amdhsa/', for example, and add
> > a 'dejagnu.exp' file next to it:
> >
> >     lappend boards_dir ~/tmp/amdgcn-amdhsa
> >
> > Prepare:
> >
> >     $ DEJAGNU=$HOME/tmp/amdgcn-amdhsa/dejagnu.exp
> >     $ export DEJAGNU
> >     $ AMDGCN_AMDHSA_RUN=[...]/build-gcc/gcc/gcn-run
> >     $ export AMDGCN_AMDHSA_RUN
> >     $ # If necessary:
> >     $ AMDGCN_AMDHSA_LD_LIBRARY_PATH=/opt/rocm/lib
> >     $ LD_LIBRARY_PATH=$AMDGCN_AMDHSA_LD_LIBRARY_PATH${LD_LIBRARY_PATH+:$LD_LIBRARY_PATH}
> >     $ export LD_LIBRARY_PATH
> >
> > ..., and then run:
> >
> >     $ make -j8 check-gcc-c RUNTESTFLAGS='--target_board=amdgcn-amdhsa-run/-march=gfx1030 vect.exp'
> >
> > Oh, and I saw that on , Tobias has
> > recently put into a new "Using the GPU as stand-alone system" section
> > some similar information. (..., but this should, in my opinion, be on a
> > different page, as it's explicitly *not* about what we understand as
> > offloading.)
> >
> >> I usually use the CodeSourcery magic stack of scripts for testing
> >> installed toolchains on remote devices, so I'm not too familiar with
> >> using Dejagnu directly.
> >
> > Tsk... ;'-|
> >
> >
> > Grüße
> >  Thomas
>

-- 
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
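
Editorial note on the quoted "Prepare" steps: the LD_LIBRARY_PATH line relies on the POSIX parameter expansion ${VAR+word}, which yields word only when VAR is set. The sketch below isolates that idiom so the effect is easy to see. The function name prepend_path and the directory /usr/local/lib are hypothetical and used only for illustration; they do not come from the thread or from any GCC or DejaGnu script.

```shell
# Hypothetical helper (not from the thread): build a PATH-like value with
# DIR placed in front of an optional previous value.
# ${2+:$2} expands to ":$2" only if a second argument was passed, which
# mirrors ${LD_LIBRARY_PATH+:$LD_LIBRARY_PATH} in the quoted setup steps.
prepend_path() {
  printf '%s%s\n' "$1" "${2+:$2}"
}

# No previous value: the result has no trailing colon.
prepend_path /opt/rocm/lib

# Previous value present: it is kept after the new directory.
prepend_path /opt/rocm/lib /usr/local/lib
```

The point of the idiom, as far as I can tell, is to avoid a stray empty entry in the search path when the variable was previously unset, since the dynamic loader may treat an empty entry as the current directory.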