From: Thomas Schwinge
To: Andrew Stubbs, Richard Biener
Cc: gcc-patches@gcc.gnu.org
Subject: Re: GCN RDNA2+ vs.
 GCC SLP vectorizer
In-Reply-To: <87plww8qin.fsf@euler.schwinge.ddns.net>
References: <87ttm8ka6h.fsf@euler.schwinge.ddns.net>
 <55q4729r-1014-5541-7p75-6rq6r97845r7@fhfr.qr>
 <4eb1a40e-0f54-4e27-90f8-00f4bba90907@baylibre.com>
 <53s543rq-36qn-ns26-o0qo-97o168o707pn@fhfr.qr>
 <9714f90d-a581-4ebe-a031-d5d8c6db9cf6@baylibre.com>
 <87plww8qin.fsf@euler.schwinge.ddns.net>
Date: Mon, 19 Feb 2024 11:38:54 +0100
Message-ID: <87jzn091s1.fsf@euler.schwinge.ddns.net>

Hi!

On 2024-02-16T14:53:04+0100, I wrote:
> On 2024-02-16T12:41:06+0000, Andrew Stubbs wrote:
>> On 16/02/2024 12:26, Richard Biener wrote:
>>> On Fri, 16 Feb 2024, Andrew Stubbs wrote:
>>>> On 16/02/2024 10:17, Richard Biener wrote:
>>>>> On Fri, 16 Feb 2024, Thomas Schwinge wrote:
>>>>>> On 2023-10-20T12:51:03+0100, Andrew Stubbs wrote:
>>>>>>> I've committed this patch
>>>>>>
>>>>>> ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
>>>>>> "amdgcn: add -march=gfx1030 EXPERIMENTAL", which the later RDNA3/gfx1100
>>>>>> support builds on top of, and that's what I'm currently working on
>>>>>> getting proper GCC/GCN target (not offloading) results for.
>>>>>>
>>>>>> Now looking at 'gcc.dg/vect/bb-slp-cond-1.c', which is reasonably simple,
>>>>>> and hopefully representative for other SLP execution test FAILs
>>>>>> (regressions compared to my earlier non-gfx1100 testing).
>>>>>>
>>>>>>     $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/
>>>>>>     source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
>>>>>>     --sysroot=install/amdgcn-amdhsa -ftree-vectorize
>>>>>>     -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common
>>>>>>     -O2 -fdump-tree-slp-details -fdump-tree-vect-details -isystem
>>>>>>     build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem
>>>>>>     source-gcc/newlib/libc/include
>>>>>>     -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/
>>>>>>     -Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -wrapper
>>>>>>     setarch,--addr-no-randomize -fdump-tree-all-all -fdump-ipa-all-all
>>>>>>     -fdump-rtl-all-all -save-temps -march=gfx1100
>>>>>>
>>>>>> The '-march=gfx1030' 'a-bb-slp-cond-1.s' is identical (apart from
>>>>>> 'TARGET_PACKED_WORK_ITEMS' in 'gcn_target_asm_function_prologue'), so I
>>>>>> suppose will also exhibit the same failure mode, once again?
>>>>>>
>>>>>> Compared to '-march=gfx90a', the differences begin in
>>>>>> 'a-bb-slp-cond-1.c.266r.expand' (only!), down to 'a-bb-slp-cond-1.s'.
>>>>>>
>>>>>> Changed like:
>>>>>>
>>>>>>     @@ -38,10 +38,10 @@ int main ()
>>>>>>      #pragma GCC novector
>>>>>>        for (i = 1; i < N; i++)
>>>>>>          if (a[i] != i%4 + 1)
>>>>>>     -      abort ();
>>>>>>     +      __builtin_printf("%d %d != %d\n", i, a[i], i%4 + 1);
>>>>>>
>>>>>>        if (a[0] != 5)
>>>>>>     -    abort ();
>>>>>>     +    __builtin_printf("%d %d != %d\n", 0, a[0], 5);
>>>>>>
>>>>>> ..., we see:
>>>>>>
>>>>>>     $ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
>>>>>>     40 5 != 1
>>>>>>     41 6 != 2
>>>>>>     42 7 != 3
>>>>>>     43 8 != 4
>>>>>>     44 5 != 1
>>>>>>     45 6 != 2
>>>>>>     46 7 != 3
>>>>>>     47 8 != 4
>>>>>>
>>>>>> '40..47' are the 'i = 10..11' in 'foo', and the expectation is
>>>>>> 'a[i * stride + 0..3] != 0'.  So, either some earlier iteration has
>>>>>> scribbled zero values over these (vector lane masking issue, perhaps?),
>>>>>> or some other code generation issue?
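
For reference, the mapping from the failing indices '40..47' back to the loop iterations 'i = 10..11' can be checked with a few lines of shell. This is only a sketch: 'locate' is a hypothetical helper made up for illustration, and the stride of 4 is an assumption inferred from the numbers quoted above ('a[i * stride + 0..3]'), not taken from the test case source.

```shell
#!/bin/sh
# Sketch only: map a flat array index to the loop iteration 'i' and the
# offset within the group of elements handled by that iteration.
# 'locate' is a hypothetical helper; the stride value is an assumption.
locate () {
  idx=$1
  stride=$2
  # Integer division gives the iteration, the remainder gives the offset.
  printf 'a[%d]: i = %d, offset %d\n' "$idx" $(( idx / stride )) $(( idx % stride ))
}

# The eight indices reported by the modified test case.
for idx in 40 41 42 43 44 45 46 47; do
  locate "$idx" 4
done
```

With a stride of 4, the first four indices fall into iteration 10 and the last four into iteration 11, that is, two complete consecutive groups.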
>
>>>> [...], I must be doing something different because vect/bb-slp-cond-1.c
>>>> passes for me, on gfx1100.
>
> That's strange.  I've looked at your log file (looks good), and used your
> toolchain to compile, and your 'gcn-run' to invoke, and still do get:
>
>     $ flock /tmp/gcn.lock ~/gcn-run ~/bb-slp-cond-1.exe
>     GCN Kernel Aborted
>     Kernel aborted
>
> Andrew, later on, please try what happens when you put an unconditional
> 'abort' call into a test case?

Andrew, any luck with that yet?

Richard, are you able to reproduce the 'gcc.dg/vect/bb-slp-cond-1.c'
execution test failure mentioned above (manual compilation and
'gcn-run')?


Grüße
 Thomas


>>> I didn't try to run it - when doing make check-gcc fails to using
>>> gcn-run for test invocation
>
> Note, that for such individual test cases, invoking the compiler and then
> 'gcn-run' manually would seem easiest?
>
>>> what's the trick to make it do that?
>
> I tell you've probably not done much "embedded" or simulator testing of
> GCC targets?  ;-P
>
>> There's a config file for nvptx here:
>> https://github.com/SourceryTools/nvptx-tools/blob/master/nvptx-none-run.exp
>
> Yes, and I have pending some updates to that one, to be finished once
> I've generally got my testing set up again, to a sufficient degree...
>
>> You can probably make the obvious adjustments. I think Thomas has a GCN
>> version with a few more features.
>
> Right.  I'm attaching my current 'amdgcn-amdhsa-run.exp'.
>
> I'm aware that the 'set_board_info gcc,[...] [...]' may be obsolete/wrong
> (as Andrew also noted privately) -- likewise, at least in part, for
> GCC/nvptx, which is where I copied all that from.  (Will revise later;
> not relevant for this discussion, here.)
>
> Similar to what I've recently added to libgomp, there is 'flock'ing here,
> so that you may use 'make -j[...] check' for (partial) parallelism, but
> still all execution testing runs serialized.  I found this to greatly
> help denoise the test results.
> (Not ideal, of course, but improving that
> is for later, too.)
>
> You may want to disable the 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' thing if
> that doesn't work like that in your case.  (I've no idea what
> 'amdgpu_gpu_recover' would do if the GPU is also used for display.)  But
> this, again, greatly helps denoise test results, at least for the one
> system I'm currently testing on.
>
> I intend to publish proper documentation of all this, later on -- happy
> to answer any questions in the mean time.
>
> If you don't already have a common directory for DejaGnu board files, put
> 'amdgcn-amdhsa-run.exp' into '~/tmp/amdgcn-amdhsa/', for example, and add
> a 'dejagnu.exp' file next to it:
>
>     lappend boards_dir ~/tmp/amdgcn-amdhsa
>
> Prepare:
>
>     $ DEJAGNU=$HOME/tmp/amdgcn-amdhsa/dejagnu.exp
>     $ export DEJAGNU
>     $ AMDGCN_AMDHSA_RUN=[...]/build-gcc/gcc/gcn-run
>     $ export AMDGCN_AMDHSA_RUN
>     $ # If necessary:
>     $ AMDGCN_AMDHSA_LD_LIBRARY_PATH=/opt/rocm/lib
>     $ LD_LIBRARY_PATH=$AMDGCN_AMDHSA_LD_LIBRARY_PATH${LD_LIBRARY_PATH+:$LD_LIBRARY_PATH}
>     $ export LD_LIBRARY_PATH
>
> ..., and then run:
>
>     $ make -j8 check-gcc-c RUNTESTFLAGS='--target_board=amdgcn-amdhsa-run/-march=gfx1030 vect.exp'
>
> Oh, and I saw that on , Tobias has
> recently put into a new "Using the GPU as stand-alone system" section
> some similar information.  (..., but this should, in my opinion, be on a
> different page, as it's explicitly *not* about what we understand as
> offloading.)
>
>> I usually use the CodeSourcery magic stack of scripts for testing
>> installed toolchains on remote devices, so I'm not too familiar with
>> using Dejagnu directly.
>
> Tsk...  ;'-|
>
>
> Grüße
>  Thomas
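
In case the '${LD_LIBRARY_PATH+:$LD_LIBRARY_PATH}' in the "Prepare" step above looks odd: it appends ':' plus the old value only if the variable is already set, so that an unset 'LD_LIBRARY_PATH' does not end up with a trailing ':' (an empty entry). A minimal sketch of that idiom in isolation follows; 'prepend_dir' is a hypothetical helper for illustration only and is not part of GCC, DejaGnu, or the attached board file.

```shell
#!/bin/sh
# Sketch only: demonstrate the "${VAR+:$VAR}" parameter expansion.
# 'prepend_dir' is a hypothetical helper, made up for this illustration.
#   prepend_dir DIR          -> behaves as if the old value were unset
#   prepend_dir DIR CURRENT  -> behaves as if the old value were CURRENT
prepend_dir () {
  if [ "$#" -ge 2 ]; then
    current=$2
  else
    unset current
  fi
  # "${current+:$current}" expands to ":$current" only if 'current' is set;
  # if it is unset, it expands to nothing, so no stray ':' is produced.
  printf '%s\n' "$1${current+:$current}"
}

prepend_dir /opt/rocm/lib
prepend_dir /opt/rocm/lib /usr/lib
```

Note that the '+' form (without a colon) only tests whether the variable is set; a variable that is set but empty would still get the ':' appended.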