From: Andrew Pinski <pinskia@gmail.com>
To: Tamar Christina <Tamar.Christina@arm.com>
Cc: Victor Do Nascimento <Victor.DoNascimento@arm.com>,
GCC Patches <gcc-patches@gcc.gnu.org>,
Richard Sandiford <Richard.Sandiford@arm.com>,
Richard Earnshaw <Richard.Earnshaw@arm.com>,
Victor Do Nascimento <vicdon01@e133397.arm.com>
Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
Date: Thu, 16 May 2024 19:53:36 +0200 [thread overview]
Message-ID: <CA+=Sn1nXfTYUFeBUKaF1dcyup9YiPROeSEMOMUpMJMdesb19cg@mail.gmail.com> (raw)
In-Reply-To: <VI1PR08MB532544208751C134004E5EF2FFED2@VI1PR08MB5325.eurprd08.prod.outlook.com>
[-- Attachment #1: Type: text/plain, Size: 13715 bytes --]
On Thu, May 16, 2024, 7:46 PM Tamar Christina <Tamar.Christina@arm.com>
wrote:
> Hi Victor,
>
> > -----Original Message-----
> > From: Victor Do Nascimento <victor.donascimento@arm.com>
> > Sent: Thursday, May 16, 2024 3:39 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Richard Earnshaw
> > <Richard.Earnshaw@arm.com>; Victor Do Nascimento
> > <vicdon01@e133397.arm.com>
> > Subject: [PATCH] middle-end: Expand {u|s}dot product support in
> autovectorizer
> >
> > From: Victor Do Nascimento <vicdon01@e133397.arm.com>
> >
> > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> > optabs for dealing with vectorizable dot product code sequences. The
> > consequence of using a direct optab for this is that backend-pattern
> > selection is only ever able to match against one datatype - Either
> > that of the operands or of the accumulated value, never both.
> >
> > With the introduction of the 2-way (un)signed dot-product insn [1][2]
> > in AArch64 SVE2, the existing direct opcode approach is no longer
> > sufficient for full specification of all the possible dot product
> > machine instructions to be matched to the code sequence; a dot product
> > resulting in VNx4SI may result from either dot products on VNx16QI or
> > VNx8HI values for the 4- and 2-way dot product operations, respectively.
> >
> > This means that the following example fails autovectorization:
> >
> > uint32_t foo(int n, uint16_t* data) {
> > uint32_t sum = 0;
> > for (int i=0; i<n; i+=1) {
> > sum += data[i] * data[i];
> > }
> > return sum;
> > }
> >
> > To remedy the issue a new optab is added, tentatively named
> > `udot_prod_twoway_optab', whose selection is dependent upon checking
> > of both input and output types involved in the operation.
> >
> > In order to minimize changes to the existing codebase,
> > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> > argument is added to its signature - `const_tree otype', allowing type
> > information to be specified for both input and output types. The
> > existing nterface is retained by defining a new `optab_for_tree_code',
> > which serves as a shim to `optab_for_tree_code_1', passing old
> > parameters as-is and setting the new `optype' argument to `NULL_TREE'.
> >
> > For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> > directly, passing it both types, adding the internal logic to the
> > function to distinguish between competing optabs.
> >
> > Finally, necessary changes are made to `expand_widen_pattern_expr' to
> > ensure the new icode can be correctly selected, given the new optab.
> >
> > [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> > Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> > [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> > Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-sve2.md
> > (@aarch64_sve_<sur>dotvnx4sivnx8hi):
> > renamed to `<sur>dot_prod_twoway_vnx8hi'.
> > * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
> > update icodes used in line with above rename.
>
> Please split the target specific bits from the target agnostic parts.
> I.e. this patch series should be split in two.
>
> > * optabs-tree.cc (optab_for_tree_code_1): Renamed
> > `optab_for_tree_code' and added new argument.
> > (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
> > * optabs-tree.h (optab_for_tree_code_1): New.
> > * optabs.cc (expand_widen_pattern_expr): Expand support for
> > DOT_PROD_EXPR patterns.
> > * optabs.def (udot_prod_twoway_optab): New.
> > (sdot_prod_twoway_optab): Likewise.
> > * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
> > support for misc optabs that use two modes.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/vect/vect-dotprod-twoway.c: New.
> > ---
> > .../aarch64/aarch64-sve-builtins-base.cc | 4 ++--
> > gcc/config/aarch64/aarch64-sve2.md | 2 +-
> > gcc/optabs-tree.cc | 23 ++++++++++++++++--
> > gcc/optabs-tree.h | 2 ++
> > gcc/optabs.cc | 2 +-
> > gcc/optabs.def | 2 ++
> > .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++++++++++++++++++
> > gcc/tree-vect-patterns.cc | 2 +-
> > 8 files changed, 54 insertions(+), 7 deletions(-)
> > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > index 0d2edf3f19e..e457db09f66 100644
> > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > @@ -764,8 +764,8 @@ public:
> > icode = (e.type_suffix (0).float_p
> > ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
> > : e.type_suffix (0).unsigned_p
> > - ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
> > - : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
> > + ? CODE_FOR_udot_prod_twoway_vnx8hi
> > + : CODE_FOR_sdot_prod_twoway_vnx8hi);
> > return e.use_unpred_insn (icode);
> > }
> > };
> > diff --git a/gcc/config/aarch64/aarch64-sve2.md
> b/gcc/config/aarch64/aarch64-
> > sve2.md
> > index 934e57055d3..5677de7108d 100644
> > --- a/gcc/config/aarch64/aarch64-sve2.md
> > +++ b/gcc/config/aarch64/aarch64-sve2.md
> > @@ -2021,7 +2021,7 @@ (define_insn
> > "@aarch64_sve_qsub_<sve_int_op>_lane_<mode>"
> > )
> >
> > ;; Two-way dot-product.
> > -(define_insn "@aarch64_sve_<sur>dotvnx4sivnx8hi"
> > +(define_insn "<sur>dot_prod_twoway_vnx8hi"
>
> This drops the @ modifier, did it have no callers?
>
> > [(set (match_operand:VNx4SI 0 "register_operand")
> > (plus:VNx4SI
> > (unspec:VNx4SI
> > diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
> > index b69a5bc3676..e3c5a618ea2 100644
> > --- a/gcc/optabs-tree.cc
> > +++ b/gcc/optabs-tree.cc
> > @@ -35,8 +35,8 @@ along with GCC; see the file COPYING3. If not see
> > cannot give complete results for multiplication or division) but
> probably
> > ought to be relied on more widely throughout the expander. */
> > optab
> > -optab_for_tree_code (enum tree_code code, const_tree type,
> > - enum optab_subtype subtype)
> > +optab_for_tree_code_1 (enum tree_code code, const_tree type,
> > + const_tree otype, enum optab_subtype subtype)
> > {
>
> Since we're C++ now, why not just overload optab_for_tree_code.
> Both functions are useful in their own right so it seems better to
> overload them.
> By overloading it also means you don't have to change the existing call
> sites.
>
> I'd also make the optab_for_tree_code with one type just call
> optab_for_tree_code
> with the same type twice.
>
> > bool trapv;
> > switch (code)
> > @@ -149,6 +149,14 @@ optab_for_tree_code (enum tree_code code, const_tree
> > type,
> >
> > case DOT_PROD_EXPR:
> > {
> > + if (otype && (TYPE_PRECISION (TREE_TYPE (type)) * 2
> > + == TYPE_PRECISION (TREE_TYPE (otype))))
> > + {
>
> Because then you can remove the check for otype here,
> And store TREE_TYPE (type) and TREE_TYPE (otype) safely into a variable so
> the if becomes
> if (otype != type
> && TYPE_PRECISION ...
> && TYPE_UNSIGNED (type) == TYPE_UNSIGNED (otype)
>
> > + if (TYPE_UNSIGNED (type) && TYPE_UNSIGNED (otype))
> > + return udot_prod_twoway_optab;
> > + if (!TYPE_UNSIGNED (type) && !TYPE_UNSIGNED (otype))
> > + return sdot_prod_twoway_optab;
> > + }
>
> And you can then simplify this into
>
> return (TYPE_UNSIGNED (otype) ? udot_prod_twoway_optab :
> sdot_prod_twoway_optab;
>
> > if (subtype == optab_vector_mixed_sign)
> > return usdot_prod_optab;
> >
> > @@ -285,6 +293,17 @@ optab_for_tree_code (enum tree_code code, const_tree
> > type,
> > }
> > }
> >
> > +/* Return the optab used for computing the operation given by the tree
> code,
> > + CODE and the tree EXP. This function is not always usable (for
> example, it
> > + cannot give complete results for multiplication or division) but
> probably
> > + ought to be relied on more widely throughout the expander. */
> > +optab
> > +optab_for_tree_code (enum tree_code code, const_tree type,
> > + enum optab_subtype subtype)
> > +{
> > + return optab_for_tree_code_1 (code, type, NULL_TREE, subtype);
> > +}
> > +
> > /* Check whether an operation represented by CODE is a 'half' widening
> operation
> > in which the input vector type has half the number of bits of the
> output
> > vector type e.g. V8QI->V8HI.
> > diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> > index f2b49991462..13ff7ca2e4b 100644
> > --- a/gcc/optabs-tree.h
> > +++ b/gcc/optabs-tree.h
> > @@ -36,6 +36,8 @@ enum optab_subtype
> > /* Return the optab used for computing the given operation on the type
> given by
> > the second argument. The third argument distinguishes between the
> types of
> > vector shifts and rotates. */
> > +optab optab_for_tree_code_1 (enum tree_code, const_tree, const_tree
> > __attribute__((unused)),
> > + enum optab_subtype );
> > optab optab_for_tree_code (enum tree_code, const_tree, enum
> optab_subtype);
> > bool
> > supportable_half_widening_operation (enum tree_code, tree, tree,
> > diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> > index ce91f94ed43..3a1c6c7b90e 100644
> > --- a/gcc/optabs.cc
> > +++ b/gcc/optabs.cc
> > @@ -311,7 +311,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx
> > op1, rtx wide_op,
> > gcc_unreachable ();
> >
> > widen_pattern_optab
> > - = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> > + = optab_for_tree_code_1 (ops->code, TREE_TYPE (oprnd0), TREE_TYPE
> > (oprnd2), subtype);
> > }
> > else
> > widen_pattern_optab
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index ad14f9328b9..cf1a6e7a7dc 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -408,6 +408,8 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
> > OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
> > OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> > OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> > +OPTAB_D (sdot_prod_twoway_optab, "sdot_prod_twoway_$I$a")
> > +OPTAB_D (udot_prod_twoway_optab, "udot_prod_twoway_$I$a")
> > OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
> > OPTAB_D (usad_optab, "usad$I$a")
> > OPTAB_D (ssad_optab, "ssad$I$a")
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > new file mode 100644
> > index 00000000000..cba2aadbec8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > @@ -0,0 +1,24 @@
> > +/* { dg-do compile { target { aarch64*-*-* } } } */
> > +/* { dg-additional-options "-march=-O3 -march=armv9.2-a+sme2
> -fdump-tree-
> > vect-details" { target { aarch64*-*-* } } } */
>
> Does this work? -march=-O3? I am guessing you're getting lucky that the
> later -march overrides it?
>
Yes the later overrides the first one. I did report that as a bug a few
years back, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103422 .
Thanks,
Andrew
> For tests in the vectorizer testsuite you shouldn't give optimization
> options as the tests
> are ran with -fno-vect-cost-model. I.e. we'll vectorize even if
> unprofitable.
>
> You also shouldn't add the -fdump-tree-vect-details, the vectorizer
> testsuite adds it
> automatically.
>
> > +
> > +#include <stdint.h>
> > +
> > +uint32_t udot2(int n, uint16_t* data) {
> > + uint32_t sum = 0;
> > + for (int i=0; i<n; i+=1) {
> > + sum += data[i] * data[i];
> > + }
> > + return sum;
> > +}
> > +
> > +int32_t sdot2(int n, int16_t* data) {
> > + int32_t sum = 0;
> > + for (int i=0; i<n; i+=1) {
> > + sum += data[i] * data[i];
> > + }
> > + return sum;
> > +}
>
> So you have an AArch64 only test and place it in generic folder.
> You should either make it aarch64 only or make the test a generic test.
>
> Normally for these I add a generic test that scans for the vectorizer
> scans as you've done below
> And I then add an aach64 specific test with check-functions-bodies that
> checks that we actually
> produced the right instructions.
>
> Your test at the moment do not actually guarantee that the instruction
> gets generated.
>
> Look at how the tests for 4 way dot product are written for inspiration
> here.
>
> > +
> > +/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern:
> detected:" 2
> > "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern:
> detected"
> > 4 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } }
> */
> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > index dfb7d800526..0760f25d94d 100644
> > --- a/gcc/tree-vect-patterns.cc
> > +++ b/gcc/tree-vect-patterns.cc
> > @@ -233,7 +233,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo,
> tree
> > otype, tree_code code,
> > if (!vecotype)
> > return false;
> >
> > - optab optab = optab_for_tree_code (code, vecitype, subtype);
> > + optab optab = optab_for_tree_code_1 (code, vecitype, vecotype,
> subtype);
>
> If you overload the function then these changes go away.
>
> Thanks for working on this,
> Tamar
>
> > if (!optab)
> > return false;
> >
> > --
> > 2.34.1
>
>
next prev parent reply other threads:[~2024-05-16 17:53 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-16 14:39 Victor Do Nascimento
2024-05-16 14:45 ` Andrew Pinski
2024-05-16 17:45 ` Tamar Christina
2024-05-16 17:53 ` Andrew Pinski [this message]
2024-05-17 2:08 ` Hongtao Liu
2024-05-17 2:13 ` Hongtao Liu
2024-05-17 9:09 ` Tamar Christina
2024-05-17 5:51 ` Richard Biener
2024-05-17 9:04 ` Tamar Christina
2024-05-17 9:46 ` Richard Biener
2024-05-17 9:55 ` Tamar Christina
2024-05-17 10:13 ` Richard Biener
2024-05-17 14:53 ` Victor Do Nascimento
2024-05-20 16:28 ` Richard Sandiford
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CA+=Sn1nXfTYUFeBUKaF1dcyup9YiPROeSEMOMUpMJMdesb19cg@mail.gmail.com' \
--to=pinskia@gmail.com \
--cc=Richard.Earnshaw@arm.com \
--cc=Richard.Sandiford@arm.com \
--cc=Tamar.Christina@arm.com \
--cc=Victor.DoNascimento@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=vicdon01@e133397.arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).