Re: GCN RDNA2+ vs. GCC SLP vectorizer

public inbox for gcc-patches@gcc.gnu.org

From: Richard Biener <rguenther@suse.de>
To: Thomas Schwinge <tschwinge@baylibre.com>
Cc: Andrew Stubbs <ams@baylibre.com>, gcc-patches@gcc.gnu.org
Subject: Re: GCN RDNA2+ vs. GCC SLP vectorizer
Date: Tue, 20 Feb 2024 08:44:35 +0100 (CET)
Message-ID: <qroq5ps5-2soo-on17-qq8o-qnsssq5sn330@fhfr.qr>
In-Reply-To: <87bk8c8lac.fsf@euler.schwinge.ddns.net>

On Mon, 19 Feb 2024, Thomas Schwinge wrote:

> Hi!
> 
> On 2024-02-19T17:31:20+0100, I wrote:
> > On 2024-02-19T11:52:55+0100, Richard Biener <rguenther@suse.de> wrote:
> >> On Mon, 19 Feb 2024, Thomas Schwinge wrote:
> >>> On 2024-02-16T14:53:04+0100, I wrote:
> >>> > On 2024-02-16T12:41:06+0000, Andrew Stubbs <ams@baylibre.com> wrote:
> >>> >> On 16/02/2024 12:26, Richard Biener wrote:
> >>> >>> On Fri, 16 Feb 2024, Andrew Stubbs wrote:
> >>> >>>> On 16/02/2024 10:17, Richard Biener wrote:
> >>> >>>>> On Fri, 16 Feb 2024, Thomas Schwinge wrote:
> >>> >>>>>> On 2023-10-20T12:51:03+0100, Andrew Stubbs <ams@codesourcery.com> wrote:
> >>> >>>>>>> I've committed this patch
> >>> >>>>>>
> >>> >>>>>> ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
> >>> >>>>>> "amdgcn: add -march=gfx1030 EXPERIMENTAL", which the later RDNA3/gfx1100
> >>> >>>>>> support builds on top of, and that's what I'm currently working on
> >>> >>>>>> getting proper GCC/GCN target (not offloading) results for.
> >>> >>>>>>
> >>> >>>>>> Now looking at 'gcc.dg/vect/bb-slp-cond-1.c', which is reasonably simple,
> >>> >>>>>> and hopefully representative of other SLP execution test FAILs
> >>> >>>>>> (regressions compared to my earlier non-gfx1100 testing).
> >>> >>>>>>
> >>> >>>>>>       $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/
> >>> >>>>>>       source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
> >>> >>>>>>       --sysroot=install/amdgcn-amdhsa -ftree-vectorize
> >>> >>>>>>       -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common
> >>> >>>>>>       -O2 -fdump-tree-slp-details -fdump-tree-vect-details -isystem
> >>> >>>>>>       build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem
> >>> >>>>>>       source-gcc/newlib/libc/include
> >>> >>>>>>       -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/
> >>> >>>>>>       -Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -wrapper
> >>> >>>>>>       setarch,--addr-no-randomize -fdump-tree-all-all -fdump-ipa-all-all
> >>> >>>>>>       -fdump-rtl-all-all -save-temps -march=gfx1100
> >>> >>>>>>
> >>> >>>>>> The '-march=gfx1030' 'a-bb-slp-cond-1.s' is identical (apart from
> >>> >>>>>> 'TARGET_PACKED_WORK_ITEMS' in 'gcn_target_asm_function_prologue'), so I
> >>> >>>>>> suppose it will also exhibit the same failure mode, once again?
> >>> >>>>>>
> >>> >>>>>> Compared to '-march=gfx90a', the differences begin in
> >>> >>>>>> 'a-bb-slp-cond-1.c.266r.expand' (only!), down to 'a-bb-slp-cond-1.s'.
> >>> >>>>>>
> >>> >>>>>> Changed like:
> >>> >>>>>>
> >>> >>>>>>       @@ -38,10 +38,10 @@ int main ()
> >>> >>>>>>        #pragma GCC novector
> >>> >>>>>>          for (i = 1; i < N; i++)
> >>> >>>>>>            if (a[i] != i%4 + 1)
> >>> >>>>>>       -      abort ();
> >>> >>>>>>       +      __builtin_printf("%d %d != %d\n", i, a[i], i%4 + 1);
> >>> >>>>>>        
> >>> >>>>>>          if (a[0] != 5)
> >>> >>>>>>       -    abort ();
> >>> >>>>>>       +    __builtin_printf("%d %d != %d\n", 0, a[0], 5);
> >>> >>>>>>
> >>> >>>>>> ..., we see:
> >>> >>>>>>
> >>> >>>>>>       $ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
> >>> >>>>>>       40 5 != 1
> >>> >>>>>>       41 6 != 2
> >>> >>>>>>       42 7 != 3
> >>> >>>>>>       43 8 != 4
> >>> >>>>>>       44 5 != 1
> >>> >>>>>>       45 6 != 2
> >>> >>>>>>       46 7 != 3
> >>> >>>>>>       47 8 != 4
> >>> >>>>>>
> >>> >>>>>> '40..47' are the 'i = 10..11' in 'foo', and the expectation is
> >>> >>>>>> 'a[i * stride + 0..3] != 0'.  So, either some earlier iteration has
> >>> >>>>>> scribbled zero values over these (vector lane masking issue, perhaps?),
> >>> >>>>>> or some other code generation issue?
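To illustrate why zeroed inputs would produce exactly this pattern, here is a hypothetical scalar model of the loop in 'foo'. It is inferred only from the expected values quoted above ('i%4 + 1' for i >= 1, and 5 for a[0]); it assumes a stride of 4, an array length N = 128 and the initialization a[i] = i, and it is not the actual test source. Under that reading, lane k maps a nonzero input to k + 1 and a zero input to k + 5. The names foo_scalar, check and simulate_clobber are made up for this sketch.

```c
#include <assert.h>
#include <stdio.h>

#define N 128  /* assumption: array length used by the test */

/* Hypothetical scalar model of the loop in 'foo' (stride 4): lane k maps a
   nonzero input to k + 1 and a zero input to k + 5.  Inferred from the
   expected values in the check, not copied from the test source.  */
static void foo_scalar (int *a, int stride)
{
  assert (stride == 4 && N % stride == 0);
  for (int i = 0; i < N / stride; i++, a += stride)
    for (int k = 0; k < 4; k++)
      a[k] = a[k] ? k + 1 : k + 5;
}

/* The modified check: print each mismatch instead of calling abort,
   and return how many were found.  */
static int check (const int *a)
{
  int bad = 0;
  for (int i = 1; i < N; i++)
    if (a[i] != i % 4 + 1)
      {
        printf ("%d %d != %d\n", i, a[i], i % 4 + 1);
        bad++;
      }
  if (a[0] != 5)
    {
      printf ("%d %d != %d\n", 0, a[0], 5);
      bad++;
    }
  return bad;
}

/* Initialize a[i] = i, zero COUNT elements starting at START to mimic the
   suspected clobbering, run the model, and return the mismatch count.  */
static int simulate_clobber (int start, int count)
{
  int a[N];
  for (int i = 0; i < N; i++)
    a[i] = i;
  for (int j = start; j < start + count; j++)
    a[j] = 0;
  foo_scalar (a, 4);
  return check (a);
}
```

Under these assumptions, simulate_clobber (40, 8) prints the same eight lines as the gcn-run output above ("40 5 != 1" through "47 8 != 4") and returns 8, while simulate_clobber (0, 0) finds no mismatches. That fits the "scribbled zero values" explanation, but does not rule out other code generation issues.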
> >>> >
> >>> >>>> [...], I must be doing something different because vect/bb-slp-cond-1.c
> >>> >>>> passes for me, on gfx1100.
> >>> >
> >>> > That's strange.  I've looked at your log file (looks good), and used your
> >>> > toolchain to compile, and your 'gcn-run' to invoke, and still do get:
> >>> >
> >>> >     $ flock /tmp/gcn.lock ~/gcn-run ~/bb-slp-cond-1.exe
> >>> >     GCN Kernel Aborted
> >>> >     Kernel aborted
> >>> >
> >>> > Andrew, later on, could you please try what happens when you put an
> >>> > unconditional 'abort' call into a test case?
> >>> 
> >>> Andrew, any luck with that yet?
> >>> 
> >>> Richard, are you able to reproduce the 'gcc.dg/vect/bb-slp-cond-1.c'
> >>> execution test failure mentioned above (manual compilation and
> >>> 'gcn-run')?
> >>
> >> No, when manually compiling/running the testcase it works fine for me.
> >
> > I've updated my GCC master branch sources, but it still fails for me:
> >
> >     $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/ source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c --sysroot=install/amdgcn-amdhsa -isystem build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem source-gcc/newlib/libc/include -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/ -Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -march=gfx1100 -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -save-temps
> >     $ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
> >     GCN Kernel Aborted
> >     Kernel aborted
> >
> > Strange.
> >
> > In 'bb-slp-cond-1.tar.xz' I'm attaching the files I've built.  Could you
> > please compare those to yours and try 'gcn-run gfx1030/a.out'?
> 
> Actually: 'gcn-run gfx1030/a.out' a few times -- our dear friend
> Nondeterminism seems to be at play here...  :-|

What's your set of compile options?  I can't get close
to your gfx1030 assembly when using your preprocessed source ...

I've tried -march=gfx1030 -O[23] [-fno-vect-cost-model]

It looks like you use -fno-omit-frame-pointer, but even with that I still see
(-mine +yours):

-       v_readlane_b32  s18, v4, 0
-       v_readlane_b32  s19, v5, 0
-       s_add_u32       s18, s18, s26
-       s_addc_u32      s19, s19, s27
-       v_writelane_b32 v4, s18, 0
-       v_writelane_b32 v5, s19, 0
-       s_mov_b32       s18, s14
-       s_mov_b32       s19, s15
-       s_mov_b32       s22, scc
-       s_add_u32       s18, s18, 4096
-       s_addc_u32      s19, s19, 0
-       s_cmpk_lg_u32   s22, 0
-       v_writelane_b32 v6, s18, 0
-       v_writelane_b32 v7, s19, 0
-       flat_store_dwordx2      v[6:7], v[4:5]
+       v_writelane_b32 v6, s26, 0
+       v_writelane_b32 v7, s27, 0
+       v_add_co_u32    v4, vcc, v6, v4
+       v_add_co_ci_u32 v5, vcc, v7, v5, vcc

and more changes.
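For readers less familiar with GCN assembly: both hunks appear to form 64-bit addresses as two 32-bit adds chained through a carry. The '-' side does it on the scalar unit (s_add_u32/s_addc_u32, carry in SCC, after copying lane 0 out with v_readlane_b32), the '+' side with vector instructions (v_add_co_u32/v_add_co_ci_u32, carry in VCC). A minimal C model of that add-with-carry, with a hypothetical helper name, for illustration only:

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit add done as two 32-bit adds with carry,
   the pattern used by s_add_u32/s_addc_u32 (carry via SCC) and by
   v_add_co_u32/v_add_co_ci_u32 (carry via VCC, per lane).  */
static uint64_t add64_split (uint32_t a_lo, uint32_t a_hi,
                             uint32_t b_lo, uint32_t b_hi)
{
  uint32_t lo = a_lo + b_lo;          /* low halves; wraps modulo 2^32 */
  uint32_t carry = lo < a_lo;         /* 1 iff the low add overflowed */
  uint32_t hi = a_hi + b_hi + carry;  /* high halves plus carry-in */
  return ((uint64_t) hi << 32) | lo;
}
```

For example, the '-' side's "s_mov_b32 s18, s14 / s_mov_b32 s19, s15 / s_add_u32 s18, s18, 4096 / s_addc_u32 s19, s19, 0" sequence corresponds to add64_split (s14, s15, 4096, 0).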

Richard.

> 
> Grüße
>  Thomas
> 
> 
> >> Didn't yet get to try the .exp files
> >>
> >> Richard.
> >>
> >>> 
> >>> Grüße
> >>>  Thomas
> >>> 
> >>> 
> >>> >>> I didn't try to run it; when doing 'make check-gcc', it fails to use
> >>> >>> gcn-run for test invocation.
> >>> >
> >>> > Note that for such individual test cases, invoking the compiler and then
> >>> > 'gcn-run' manually would seem easiest?
> >>> >
> >>> >>> what's the trick to make it do that?
> >>> >
> >>> > I take it you've probably not done much "embedded" or simulator testing of
> >>> > GCC targets?  ;-P
> >>> >
> >>> >> There's a config file for nvptx here: 
> >>> >> https://github.com/SourceryTools/nvptx-tools/blob/master/nvptx-none-run.exp
> >>> >
> >>> > Yes, and I have pending some updates to that one, to be finished once
> >>> > I've generally got my testing set up again, to a sufficient degree...
> >>> >
> >>> >> You can probably make the obvious adjustments. I think Thomas has a GCN 
> >>> >> version with a few more features.
> >>> >
> >>> > Right.  I'm attaching my current 'amdgcn-amdhsa-run.exp'.
> >>> >
> >>> > I'm aware that the 'set_board_info gcc,[...] [...]' may be obsolete/wrong
> >>> > (as Andrew also noted privately) -- likewise, at least in part, for
> >>> > GCC/nvptx, which is where I copied all that from.  (Will revise later;
> >>> > not relevant for this discussion, here.)
> >>> >
> >>> > Similar to what I've recently added to libgomp, there is 'flock'ing here,
> >>> > so that you may use 'make -j[...] check' for (partial) parallelism, but
> >>> > still all execution testing runs serialized.  I found this to greatly
> >>> > help denoise the test results.  (Not ideal, of course, but improving that
> >>> > is for later, too.)
> >>> >
> >>> > You may want to disable the 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' thing if
> >>> > that doesn't work like that in your case.  (I've no idea what
> >>> > 'amdgpu_gpu_recover' would do if the GPU is also used for display.)  But
> >>> > this, again, greatly helps denoise test results, at least for the one
> >>> > system I'm currently testing on.
> >>> >
> >>> > I intend to publish proper documentation of all this, later on -- happy
> >>> > to answer any questions in the meantime.
> >>> >
> >>> > If you don't already have a common directory for DejaGnu board files, put
> >>> > 'amdgcn-amdhsa-run.exp' into '~/tmp/amdgcn-amdhsa/', for example, and add
> >>> > a 'dejagnu.exp' file next to it:
> >>> >
> >>> >     lappend boards_dir ~/tmp/amdgcn-amdhsa
> >>> >
> >>> > Prepare:
> >>> >
> >>> >     $ DEJAGNU=$HOME/tmp/amdgcn-amdhsa/dejagnu.exp
> >>> >     $ export DEJAGNU
> >>> >     $ AMDGCN_AMDHSA_RUN=[...]/build-gcc/gcc/gcn-run
> >>> >     $ export AMDGCN_AMDHSA_RUN
> >>> >     $ # If necessary:
> >>> >     $ AMDGCN_AMDHSA_LD_LIBRARY_PATH=/opt/rocm/lib
> >>> >     $ LD_LIBRARY_PATH=$AMDGCN_AMDHSA_LD_LIBRARY_PATH${LD_LIBRARY_PATH+:$LD_LIBRARY_PATH}
> >>> >     $ export LD_LIBRARY_PATH
> >>> >
> >>> > ..., and then run:
> >>> >
> >>> >     $ make -j8 check-gcc-c RUNTESTFLAGS='--target_board=amdgcn-amdhsa-run/-march=gfx1030 vect.exp'
> >>> >
> >>> > Oh, and I saw that on <https://gcc.gnu.org/wiki/Offloading>, Tobias has
> >>> > recently put into a new "Using the GPU as stand-alone system" section
> >>> > some similar information.  (..., but this should, in my opinion, be on a
> >>> > different page, as it's explicitly *not* about what we understand as
> >>> > offloading.)
> >>> >
> >>> >> I usually use the CodeSourcery magic stack of scripts for testing 
> >>> >> installed toolchains on remote devices, so I'm not too familiar with 
> >>> >> using Dejagnu directly.
> >>> >
> >>> > Tsk...  ;'-|
> >>> >
> >>> >
> >>> > Grüße
> >>> >  Thomas
> >>> 
> >>
> >> -- 
> >> Richard Biener <rguenther@suse.de>
> >> SUSE Software Solutions Germany GmbH,
> >> Frankenstrasse 146, 90461 Nuernberg, Germany;
> >> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 



Thread overview: 35+ messages
2023-10-20 11:51 [committed] amdgcn: add -march=gfx1030 EXPERIMENTAL Andrew Stubbs
2023-10-27 17:06 ` Andrew Stubbs
2024-01-29 10:34 ` [patch] gcn/gcn-valu.md: Disable fold_left_plus for TARGET_RDNA2_PLUS [PR113615] Tobias Burnus
2024-01-29 12:34   ` Andrew Stubbs
2024-01-29 12:50     ` Tobias Burnus
2024-01-29 15:17       ` Andrew Stubbs
2024-02-16 14:34   ` GCN: Conditionalize 'define_expand "reduc_<fexpander>_scal_<mode>"' on '!TARGET_RDNA2_PLUS' [PR113615] (was: [patch] gcn/gcn-valu.md: Disable fold_left_plus for TARGET_RDNA2_PLUS [PR113615]) Thomas Schwinge
2024-02-16 14:39     ` GCN: Conditionalize 'define_expand "reduc_<fexpander>_scal_<mode>"' on '!TARGET_RDNA2_PLUS' [PR113615] Andrew Stubbs
2024-02-12 16:35 ` GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts" (was: [committed] amdgcn: add -march=gfx1030 EXPERIMENTAL) Thomas Schwinge
2024-02-13  8:26   ` Richard Biener
2024-02-14 12:56     ` GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts" Andrew Stubbs
2024-02-14 13:27       ` Richard Biener
2024-02-14 13:40         ` Andrew Stubbs
2024-02-14 13:43           ` Richard Biener
2024-02-14 15:23             ` Andrew Stubbs
2024-02-15  7:49               ` Richard Biener
2024-02-15 10:03                 ` Andrew Stubbs
2024-02-15 10:21                   ` Richard Biener
2024-02-15 10:59                     ` Andrew Stubbs
2024-02-15 12:31                       ` Richard Biener
2024-02-15 10:23                 ` Thomas Schwinge
2024-02-15 13:02                   ` Andrew Stubbs
2024-02-16  9:52 ` GCN RDNA2+ vs. GCC SLP vectorizer (was: [committed] amdgcn: add -march=gfx1030 EXPERIMENTAL) Thomas Schwinge
2024-02-16 10:17   ` Richard Biener
2024-02-16 11:22     ` GCN RDNA2+ vs. GCC SLP vectorizer Andrew Stubbs
2024-02-16 12:26       ` Richard Biener
2024-02-16 12:41         ` Andrew Stubbs
2024-02-16 13:53           ` Thomas Schwinge
2024-02-19 10:38             ` Thomas Schwinge
2024-02-19 10:52               ` Richard Biener
2024-02-19 16:31                 ` Thomas Schwinge
2024-02-19 16:35                   ` Thomas Schwinge
2024-02-20  7:44                     ` Richard Biener [this message]
2024-02-20  8:46                       ` Thomas Schwinge
2024-02-20  9:13                         ` Richard Biener
