public inbox for gcc@gcc.gnu.org
From: Richard Biener <rguenther@suse.de>
To: Richard Sandiford <richard.sandiford@arm.com>
Cc: Richard Biener via Gcc <gcc@gcc.gnu.org>, tamar.christina@arm.com
Subject: Re: [RFC] Merge strategy for all-SLP vectorizer
Date: Fri, 17 May 2024 14:53:36 +0200 (CEST)
Message-ID: <057q8r2q-8r94-5sr0-51np-4pq441oqq0s1@fhfr.qr>
In-Reply-To: <mptbk54fxos.fsf@arm.com>

On Fri, 17 May 2024, Richard Sandiford wrote:

> Richard Biener via Gcc <gcc@gcc.gnu.org> writes:
> > Hi,
> >
> > I'd like to discuss how to go forward with getting the vectorizer to
> > all-SLP for this stage1.  While there is a personal branch with my
> > ongoing work (users/rguenth/vect-force-slp), branches haven't proved
> > to work well for collaboration.
> 
> Speaking for myself, the problem hasn't been so much the branch as
> lack of time.  I've been pretty swamped the last eight months or so
> (except for the time that I took off, which admittedly was quite a
> bit!), and so I never even got around to properly reading and replying
> to your message after the Cauldron.  It's been on the "this is important,
> I should make time to read and understand it properly" list all this time.
> Sorry about that. :(
> 
> I'm hoping to have time to work/help out on SLP stuff soon.
> 
> > The branch isn't ready to be merged in full but I have been picking
> > improvements to trunk last stage1 and some remaining bits in the past
> > weeks.  I have refrained from merging code paths that cannot be
> > exercised on trunk.
> >
> > There are two important sets of changes on the branch, both critical
> > to get more testing on non-x86 targets.
> >
> >  1. enable single-lane SLP discovery
> >  2. avoid splitting store groups (9315bfc661432c3 and 4336060fe2db8ec
> >     if you fetch the branch)
> >
> > The first point is also most annoying on the testsuite since doing
> > SLP instead of interleaving changes what we dump and thus tests
> > start to fail in random ways when you switch between both modes.
> > On the branch single-lane SLP discovery is gated with
> > --param vect-single-lane-slp.
> >
> > The branch has numerous changes to enable single-lane SLP for some
> > code paths that have SLP not implemented and where I did not bother
> > to try supporting multi-lane SLP at this point.  It also adds more
> > SLP discovery entry points.
> >
> > I'm not sure how to try merging these pieces to allow others to
> > more easily help out.  One possibility is to merge
> > --param vect-single-lane-slp defaulted off and pick dependent
> > changes even when they cause testsuite regressions with
> > vect-single-lane-slp=1.  Alternatively adjust the testsuite by
> > adding --param vect-single-lane-slp=0 and default to 1
> > (or keep the default).
> 
> FWIW, this one sounds good to me (the default to 1 version).
> I.e. mechanically add --param vect-single-lane-slp=0 to any tests
> that fail with the new default.  That means that the tests that need
> fixing are easily greppable for anyone who wants to help.  Sometimes
> it'll just be a test update.  Sometimes it will be new vectoriser code.
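
To make the "mechanically add" step above concrete, here is a minimal
sketch of a helper that pins a C test to the old behaviour.  It assumes
the parameter keeps the name used on the branch
(--param vect-single-lane-slp) and that the test is a C file using the
usual dg- comment directives; pin_to_non_slp and its placement policy
are hypothetical, not an existing testsuite tool.

```python
import re

# Assumption: the parameter name matches the one used on the branch.
DIRECTIVE = '/* { dg-additional-options "--param vect-single-lane-slp=0" } */\n'

# Header-style directives after which the new line is placed.
HEADER_RE = re.compile(r"\{\s*dg-(do|options|additional-options|require-\S+)\b")


def pin_to_non_slp(source: str) -> str:
    """Return `source` with DIRECTIVE added, unless it is already pinned."""
    if "vect-single-lane-slp" in source:
        return source  # keep the edit idempotent
    lines = source.splitlines(keepends=True)
    last = -1
    for i, line in enumerate(lines):
        if HEADER_RE.search(line):
            last = i
    # Place the directive after the last header directive (so it follows
    # any dg-options line), or at the top if the test has none.
    if last >= 0 and not lines[last].endswith("\n"):
        lines[last] += "\n"
    lines.insert(last + 1, DIRECTIVE)
    return "".join(lines)
```

Afterwards, grepping the testsuite for "vect-single-lane-slp=0" lists
exactly the tests that still need attention.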

OK.  Meanwhile I figured the most important part is 2. from above
since that enables the single-lane in a grouped access (also covering
single element interleaving).  This will cover all problematical cases
with respect to vectorizing loads and stores.  It also has less
testsuite fallout, mainly because we have a lot less coverage for
grouped stores without SLP.

So I'll try to produce a mergeable patch for part 2 and post that
for review next week.

Thanks,
Richard.

> Thanks,
> Richard
> 
> > Or require a clean testsuite with
> > --param vect-single-lane-slp defaulted to 1 but keep the --param
> > for debugging (and allow FAILs with 0).
> >
> > For fun I merged just single-lane discovery of non-grouped stores
> > and have that enabled by default.  On x86_64 this results in the
> > set of FAILs below.
> >
> > Any suggestions?
> >
> > Thanks,
> > Richard.
> >
> > FAIL: gcc.dg/vect/O3-pr39675-2.c scan-tree-dump-times vect "vectorizing 
> > stmts using SLP" 1
> > XPASS: gcc.dg/vect/no-scevccp-outer-12.c scan-tree-dump-times vect "OUTER 
> > LOOP VECTORIZED." 1
> > FAIL: gcc.dg/vect/no-section-anchors-vect-31.c scan-tree-dump-times vect 
> > "Alignment of access forced using peeling" 2
> > FAIL: gcc.dg/vect/no-section-anchors-vect-31.c scan-tree-dump-times vect 
> > "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/no-section-anchors-vect-64.c scan-tree-dump-times vect 
> > "Alignment of access forced using peeling" 2
> > FAIL: gcc.dg/vect/no-section-anchors-vect-64.c scan-tree-dump-times vect 
> > "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/no-section-anchors-vect-66.c scan-tree-dump-times vect 
> > "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/no-section-anchors-vect-66.c scan-tree-dump-times vect 
> > "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/no-section-anchors-vect-68.c scan-tree-dump-times vect 
> > "Alignment of access forced using peeling" 2
> > FAIL: gcc.dg/vect/no-section-anchors-vect-68.c scan-tree-dump-times vect 
> > "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/slp-12a.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "vectorizing stmts using SLP" 1
> > FAIL: gcc.dg/vect/slp-12a.c scan-tree-dump-times vect "vectorizing stmts 
> > using SLP" 1
> > FAIL: gcc.dg/vect/slp-19a.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "vectorizing stmts using SLP" 1
> > FAIL: gcc.dg/vect/slp-19a.c scan-tree-dump-times vect "vectorizing stmts 
> > using SLP" 1
> > FAIL: gcc.dg/vect/slp-19b.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "vectorizing stmts using SLP" 1
> > FAIL: gcc.dg/vect/slp-19b.c scan-tree-dump-times vect "vectorizing stmts 
> > using SLP" 1
> > FAIL: gcc.dg/vect/slp-19c.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "vectorized 1 loops" 1
> > FAIL: gcc.dg/vect/slp-19c.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "vectorizing stmts using SLP" 1
> > FAIL: gcc.dg/vect/slp-19c.c scan-tree-dump-times vect "vectorized 1 loops" 
> > 1
> > FAIL: gcc.dg/vect/slp-19c.c scan-tree-dump-times vect "vectorizing stmts 
> > using SLP" 1
> > XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c -flto -ffat-lto-objects  
> > scan-tree-dump vect "vectorized 1 loops"
> > XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c scan-tree-dump vect "vectorized 
> > 1 loops"
> > XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c -flto -ffat-lto-objects  
> > scan-tree-dump vect "vectorized 1 loops"
> > XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c scan-tree-dump vect "vectorized 1 
> > loops"
> > XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c -flto -ffat-lto-objects  
> > scan-tree-dump vect "vectorized 1 loops"
> > XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c scan-tree-dump vect "vectorized 
> > 1 loops"
> > XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c -flto -ffat-lto-objects  
> > scan-tree-dump vect "vectorized 1 loops"
> > XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c scan-tree-dump vect "vectorized 1 
> > loops"
> > FAIL: gcc.dg/vect/vect-26.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-26.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/vect-26.c scan-tree-dump-times vect "Alignment of access 
> > forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-26.c scan-tree-dump-times vect "Vectorizing an 
> > unaligned access" 0
> > FAIL: gcc.dg/vect/vect-54.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-54.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/vect-54.c scan-tree-dump-times vect "Alignment of access 
> > forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-54.c scan-tree-dump-times vect "Vectorizing an 
> > unaligned access" 0
> > FAIL: gcc.dg/vect/vect-56.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-56.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Vectorizing an unaligned access" 1
> > FAIL: gcc.dg/vect/vect-56.c scan-tree-dump-times vect "Alignment of access 
> > forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-56.c scan-tree-dump-times vect "Vectorizing an 
> > unaligned access" 1
> > FAIL: gcc.dg/vect/vect-58.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-58.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/vect-58.c scan-tree-dump-times vect "Alignment of access 
> > forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-58.c scan-tree-dump-times vect "Vectorizing an 
> > unaligned access" 0
> > FAIL: gcc.dg/vect/vect-60.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-60.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Vectorizing an unaligned access" 1
> > FAIL: gcc.dg/vect/vect-60.c scan-tree-dump-times vect "Alignment of access 
> > forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-60.c scan-tree-dump-times vect "Vectorizing an 
> > unaligned access" 1
> > FAIL: gcc.dg/vect/vect-89-big-array.c -flto -ffat-lto-objects  
> > scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-89-big-array.c -flto -ffat-lto-objects  
> > scan-tree-dump-times vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/vect-89-big-array.c scan-tree-dump-times vect "Alignment 
> > of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-89-big-array.c scan-tree-dump-times vect 
> > "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/vect-89.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-89.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/vect-89.c scan-tree-dump-times vect "Alignment of access 
> > forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-89.c scan-tree-dump-times vect "Vectorizing an 
> > unaligned access" 0
> > FAIL: gcc.dg/vect/vect-92.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Alignment of access forced using peeling" 3
> > FAIL: gcc.dg/vect/vect-92.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Vectorizing an unaligned access" 0 
> > FAIL: gcc.dg/vect/vect-92.c scan-tree-dump-times vect "Alignment of access 
> > forced using peeling" 3
> > FAIL: gcc.dg/vect/vect-92.c scan-tree-dump-times vect "Vectorizing an 
> > unaligned access" 0 
> > FAIL: gcc.dg/vect/vect-early-break_25.c -flto -ffat-lto-objects  
> > scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-early-break_25.c scan-tree-dump-times vect 
> > "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-multitypes-1.c -flto -ffat-lto-objects  
> > scan-tree-dump-times vect "Alignment of access forced using peeling" 2
> > FAIL: gcc.dg/vect/vect-multitypes-1.c scan-tree-dump-times vect "Alignment 
> > of access forced using peeling" 2
> > XPASS: gcc.dg/vect/vect-outer-4e.c -flto -ffat-lto-objects  
> > scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
> > XPASS: gcc.dg/vect/vect-outer-4e.c scan-tree-dump-times vect "OUTER LOOP 
> > VECTORIZED" 1 
> > FAIL: gcc.dg/vect/vect-peel-1.c -flto -ffat-lto-objects  
> > scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-peel-1.c -flto -ffat-lto-objects  
> > scan-tree-dump-times vect "Vectorizing an unaligned access" 1
> > FAIL: gcc.dg/vect/vect-peel-1.c scan-tree-dump-times vect "Alignment of 
> > access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-peel-1.c scan-tree-dump-times vect "Vectorizing an 
> > unaligned access" 1
> > FAIL: gcc.dg/vect/vect-peel-2.c -flto -ffat-lto-objects  
> > scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-peel-2.c -flto -ffat-lto-objects  
> > scan-tree-dump-times vect "Vectorizing an unaligned access" 1
> > FAIL: gcc.dg/vect/vect-peel-2.c scan-tree-dump-times vect "Alignment of 
> > access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-peel-2.c scan-tree-dump-times vect "Vectorizing an 
> > unaligned access" 1
> > FAIL: gcc.target/i386/cond_op_fma__Float16-1.c scan-assembler-times 
> > vfmadd132ph[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma__Float16-1.c scan-assembler-times 
> > vfmsub132ph[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma__Float16-1.c scan-assembler-times 
> > vfnmadd132ph[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma__Float16-1.c scan-assembler-times 
> > vfnmsub132ph[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma_double-1.c scan-assembler-times 
> > vfmadd132pd[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma_double-1.c scan-assembler-times 
> > vfmsub132pd[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma_double-1.c scan-assembler-times 
> > vfnmadd132pd[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma_double-1.c scan-assembler-times 
> > vfnmsub132pd[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times 
> > vfmadd132ps[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times 
> > vfmsub132ps[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times 
> > vfnmadd132ps[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times 
> > vfnmsub132ps[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/pr101950-2.c scan-assembler-times \\txor[ql]\\t 2
> > FAIL: gcc.target/i386/pr88531-2b.c scan-assembler-times vmulps 1
> > FAIL: gcc.target/i386/pr88531-2c.c scan-assembler-times vmulps 1
> > FAIL: gcc.target/i386/vectorize1.c scan-tree-dump vect "vect_cst"
> > FAIL: gfortran.dg/temporary_3.f90   -O2  execution test
> > FAIL: gfortran.dg/vect/fast-math-mgrid-resid.f   -O   scan-tree-dump pcom 
> > "Executing predictive commoning without unrolling"
> > FAIL: gfortran.dg/vect/vect-8.f90   -O   scan-tree-dump-times vect 
> > "vectorized 2[234] loops" 1
> 
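
A list like the one quoted above is easier to act on once it is reduced
to the set of affected test files.  The following is a rough sketch,
assuming the report is pasted as plain text (with or without "> "
quoting) and that every entry starts with "FAIL:" or "XPASS:" followed
by the test path; summarize is a hypothetical name.

```python
import re
from collections import Counter

# An entry starts with its status, then the test path; wrapped
# continuation lines do not match and are simply skipped.
ENTRY_RE = re.compile(r"^(FAIL|XPASS): (\S+)")


def summarize(report: str) -> dict:
    """Map each test path to a Counter of the statuses reported for it."""
    counts = {}
    for raw in report.splitlines():
        line = re.sub(r"^(>\s?)+", "", raw)  # drop mail quoting, if any
        m = ENTRY_RE.match(line)
        if m:
            status, test = m.groups()
            counts.setdefault(test, Counter())[status] += 1
    return counts
```

Sorting the keys of the result gives the files to visit; the per-status
counts separate real regressions (FAIL) from tests whose xfail can
likely be dropped (XPASS).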

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Thread overview: 4+ messages
2024-05-17 10:36 Richard Biener
2024-05-17 12:08 ` Richard Sandiford
2024-05-17 12:53   ` Richard Biener [this message]
2024-05-21  6:55     ` Tamar Christina