From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lj1-x236.google.com (mail-lj1-x236.google.com [IPv6:2a00:1450:4864:20::236]) by sourceware.org (Postfix) with ESMTPS id 51A833858D20 for ; Fri, 17 May 2024 09:46:17 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 51A833858D20 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 51A833858D20 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::236 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1715939180; cv=none; b=naoL5iPCFEgRErQ+XfGNa2GTXqIq3iES047XfrhQcyD67yrt/658B9y+j42MSD9MJ3UfW9wJF4Vv+DNc27Gz5xPPXfF2qvZX9Z4X+r7/RcbiZwZJWnh938AJJMBffSblheb5hLrP/6H6bVYWDjJPLwEItsG8EFJun5k3NLBDkaE= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1715939180; c=relaxed/simple; bh=bJae+r6vo7RJKq8EDXQD+kSHyfvom5atVU7s53UOJUY=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=EVyywdJOaecFyZFySNOZaHmYw9LyyhU3/FeOcACXWBnBiHS6p9/prKglRyf4JRHloKZHqNve3aR1Cit+ijvvwBC0356qiwBMLbNwSKNaZEb9V1A29IRwdLoD6xrtCx0EhpIqkrP1KaWO7u/if6nZ41CDoOfFq92VkAdg+1TbeZs= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-lj1-x236.google.com with SMTP id 38308e7fff4ca-2e3e18c240fso4885951fa.0 for ; Fri, 17 May 2024 02:46:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1715939176; x=1716543976; darn=gcc.gnu.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=b8pFoiUV6exAn9yxnLAuu2uaVynu9NRnPA4g0XrFwEc=; b=AQvZWXX5BEjh4TrIMwzRc0+4bL+ZLWuh/beZKH4b7JRE6mPt5kPpRPjand4K7hhJiy ASTs87hJKNSs0UTqlmkpXMMWBXPBGjKwk/ULCMBpqLOS7SROHCtfbTakr0sLjNLJy/NI o8CuHzJeGHw93a++WJhpL1GiDzxh+WB89vdhHEnu0gChdbiQpyUpqMMp6JsKrTS2qf98 SayWZa8Nx1L45Am+i4Y+2FRfWX8FjHwZ+DJ6mAY60GNNVeezfvKoAXuWd5nDgKoVQxJx X5XVYopDZOljD+1y5dIOjWbOWxfWmxbPFO0RvyJFKlxARxQilUOkpH7QOZ8lyXnn1GHI ptHA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1715939176; x=1716543976; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=b8pFoiUV6exAn9yxnLAuu2uaVynu9NRnPA4g0XrFwEc=; b=kBaWfSr9NfvDEWSLKbdB8WbtyZcRJtyHnJHGmFGUZ8OnkzQYLXNm0CWbaNhoRngZ8q 7Nm49PkXh2K+Trr9KqXRAgT+qZSFR3pKFdeL4gKjMgeRbQ+mnUEYYwOYtiwfV9ry2eA9 uVLgmQA/TIWwCqsv3DwsvTqgxFwHr5pr2HyWw1/Lcx1I1Qp0Bl/hqUirEIy0h6BIBasv N+i1M9dbJ8y8WkWVWosn34wx2owLCnYBVKli70zUFWcVfQmAnyXdoJjJUAokZIEwNmzb l2bazwnJIjcJi7mqAGeWfsQcoYD540F3jBFAGd8G6B8qReD8Tuns7Dh6B50GY5aVznFc I3Tg== X-Forwarded-Encrypted: i=1; AJvYcCVKVeSPjeSPIGBrsrKpK7vaNhkOurqz8im9RCdy1c3tHGH0XEi9QBiwr/4ITdvrowQvTan968nSQOkXyFg8kUS8raJ5PXzrIg== X-Gm-Message-State: AOJu0Yxj/dj6l1slJ1KXvk0Ohtk400GRgj5g2pLDaXoDcI0uYBGRuJ+i wlca55J0phYljTX9g8HOddwOM8CNE/v5CqE7Ac/yPrRgt5hM93S+frPi0ThdW8eC69/WOtfiwJF JrAKjVBPQ3kv79wwit9jN9Yj04Lpn/A== X-Google-Smtp-Source: AGHT+IHm5QJc60qcw/mnMPu/CO/2cnielZScLoEqYtDWNuXIPY7vAnJyDfOta72Bf1W5QIsWEaox80R7B2fESxSut6I= X-Received: by 2002:a2e:8e93:0:b0:2e2:ab69:d101 with SMTP id 38308e7fff4ca-2e51fd3fdeamr185328171fa.7.1715939175414; Fri, 17 May 2024 02:46:15 -0700 (PDT) MIME-Version: 1.0 References: <20240516143920.2513640-1-victor.donascimento@arm.com> In-Reply-To: From: Richard Biener Date: Fri, 17 May 2024 11:46:04 +0200 Message-ID: Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer To: Tamar Christina Cc: Victor Do Nascimento , "gcc-patches@gcc.gnu.org" , Richard Sandiford , Richard Earnshaw , Victor Do Nascimento Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-8.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Fri, May 17, 2024 at 11:05=E2=80=AFAM Tamar Christina wrote: > > > -----Original Message----- > > From: Richard Biener > > Sent: Friday, May 17, 2024 6:51 AM > > To: Victor Do Nascimento > > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford ; > > Richard Earnshaw ; Victor Do Nascimento > > > > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in > > autovectorizer > > > > On Thu, May 16, 2024 at 4:40=E2=80=AFPM Victor Do Nascimento > > wrote: > > > > > > From: Victor Do Nascimento > > > > > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct > > > optabs for dealing with vectorizable dot product code sequences. The > > > consequence of using a direct optab for this is that backend-pattern > > > selection is only ever able to match against one datatype - Either > > > that of the operands or of the accumulated value, never both. > > > > > > With the introduction of the 2-way (un)signed dot-product insn [1][2] > > > in AArch64 SVE2, the existing direct opcode approach is no longer > > > sufficient for full specification of all the possible dot product > > > machine instructions to be matched to the code sequence; a dot produc= t > > > resulting in VNx4SI may result from either dot products on VNx16QI or > > > VNx8HI values for the 4- and 2-way dot product operations, respective= ly. > > > > > > This means that the following example fails autovectorization: > > > > > > uint32_t foo(int n, uint16_t* data) { > > > uint32_t sum =3D 0; > > > for (int i=3D0; i > > sum +=3D data[i] * data[i]; > > > } > > > return sum; > > > } > > > > > > To remedy the issue a new optab is added, tentatively named > > > `udot_prod_twoway_optab', whose selection is dependent upon checking > > > of both input and output types involved in the operation. > > > > I don't like this too much. I'll note we document dot_prod as > > > > @cindex @code{sdot_prod@var{m}} instruction pattern > > @item @samp{sdot_prod@var{m}} > > > > Compute the sum of the products of two signed elements. > > Operand 1 and operand 2 are of the same mode. Their > > product, which is of a wider mode, is computed and added to operand 3. > > Operand 3 is of a mode equal or wider than the mode of the product. The > > result is placed in operand 0, which is of the same mode as operand 3. > > @var{m} is the mode of operand 1 and operand 2. > > > > with no restriction on the wider mode but we don't specify it which is > > bad design. This should have been a convert optab with two modes > > from the start - adding a _twoway variant is just a hack. > > We did discuss this at the time we started implementing it. There was tw= o > options, one was indeed to change it to a convert dot_prod optab, but doi= ng > this means we have to update every target that uses it. > > Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and= altivec. > > Which sure could be possible, but there's also every use in the backends = that need > to be updated, and tested, which for some targets we don't even know how = to begin. > > So it seems very hard to correct dotprod to a convert optab now. It's still the correct way to go. At _least_ your new pattern should have been this, otherwise what do you do when you have two-way, four-way and eight-way vari= ants? Add yet another optab? Another thing is that when you do it your way you should fix the existing o= ptab to be two-way by documenting how the second mode derives from the first. And sure, it's not the only optab suffering from this issue. Richard. > Tamar > > > > > Richard. > > > > > In order to minimize changes to the existing codebase, > > > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new > > > argument is added to its signature - `const_tree otype', allowing typ= e > > > information to be specified for both input and output types. The > > > existing nterface is retained by defining a new `optab_for_tree_code'= , > > > which serves as a shim to `optab_for_tree_code_1', passing old > > > parameters as-is and setting the new `optype' argument to `NULL_TREE'= . > > > > > > For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1' > > > directly, passing it both types, adding the internal logic to the > > > function to distinguish between competing optabs. > > > > > > Finally, necessary changes are made to `expand_widen_pattern_expr' to > > > ensure the new icode can be correctly selected, given the new optab. > > > > > > [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE- > > Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product- > > > [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE- > > Instructions/SDOT--2-way--vectors---Signed-integer-dot-product- > > > > > > gcc/ChangeLog: > > > > > > * config/aarch64/aarch64-sve2.md (@aarch64_sve_dotvnx4si= vnx8hi): > > > renamed to `dot_prod_twoway_vnx8hi'. > > > * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.exp= and): > > > update icodes used in line with above rename. > > > * optabs-tree.cc (optab_for_tree_code_1): Renamed > > > `optab_for_tree_code' and added new argument. > > > (optab_for_tree_code): Now a call to `optab_for_tree_code_1'. > > > * optabs-tree.h (optab_for_tree_code_1): New. > > > * optabs.cc (expand_widen_pattern_expr): Expand support for > > > DOT_PROD_EXPR patterns. > > > * optabs.def (udot_prod_twoway_optab): New. > > > (sdot_prod_twoway_optab): Likewise. > > > * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Ad= d > > > support for misc optabs that use two modes. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.dg/vect/vect-dotprod-twoway.c: New. > > > --- > > > .../aarch64/aarch64-sve-builtins-base.cc | 4 ++-- > > > gcc/config/aarch64/aarch64-sve2.md | 2 +- > > > gcc/optabs-tree.cc | 23 ++++++++++++++++-= - > > > gcc/optabs-tree.h | 2 ++ > > > gcc/optabs.cc | 2 +- > > > gcc/optabs.def | 2 ++ > > > .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++++++++++++++++= ++ > > > gcc/tree-vect-patterns.cc | 2 +- > > > 8 files changed, 54 insertions(+), 7 deletions(-) > > > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c > > > > > > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc > > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc > > > index 0d2edf3f19e..e457db09f66 100644 > > > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc > > > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc > > > @@ -764,8 +764,8 @@ public: > > > icode =3D (e.type_suffix (0).float_p > > > ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf > > > : e.type_suffix (0).unsigned_p > > > - ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi > > > - : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi); > > > + ? CODE_FOR_udot_prod_twoway_vnx8hi > > > + : CODE_FOR_sdot_prod_twoway_vnx8hi); > > > return e.use_unpred_insn (icode); > > > } > > > }; > > > diff --git a/gcc/config/aarch64/aarch64-sve2.md > > b/gcc/config/aarch64/aarch64-sve2.md > > > index 934e57055d3..5677de7108d 100644 > > > --- a/gcc/config/aarch64/aarch64-sve2.md > > > +++ b/gcc/config/aarch64/aarch64-sve2.md > > > @@ -2021,7 +2021,7 @@ (define_insn > > "@aarch64_sve_qsub__lane_" > > > ) > > > > > > ;; Two-way dot-product. > > > -(define_insn "@aarch64_sve_dotvnx4sivnx8hi" > > > +(define_insn "dot_prod_twoway_vnx8hi" > > > [(set (match_operand:VNx4SI 0 "register_operand") > > > (plus:VNx4SI > > > (unspec:VNx4SI > > > diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc > > > index b69a5bc3676..e3c5a618ea2 100644 > > > --- a/gcc/optabs-tree.cc > > > +++ b/gcc/optabs-tree.cc > > > @@ -35,8 +35,8 @@ along with GCC; see the file COPYING3. If not see > > > cannot give complete results for multiplication or division) but = probably > > > ought to be relied on more widely throughout the expander. */ > > > optab > > > -optab_for_tree_code (enum tree_code code, const_tree type, > > > - enum optab_subtype subtype) > > > +optab_for_tree_code_1 (enum tree_code code, const_tree type, > > > + const_tree otype, enum optab_subtype subtype) > > > { > > > bool trapv; > > > switch (code) > > > @@ -149,6 +149,14 @@ optab_for_tree_code (enum tree_code code, > > const_tree type, > > > > > > case DOT_PROD_EXPR: > > > { > > > + if (otype && (TYPE_PRECISION (TREE_TYPE (type)) * 2 > > > + =3D=3D TYPE_PRECISION (TREE_TYPE (otype)))) > > > + { > > > + if (TYPE_UNSIGNED (type) && TYPE_UNSIGNED (otype)) > > > + return udot_prod_twoway_optab; > > > + if (!TYPE_UNSIGNED (type) && !TYPE_UNSIGNED (otype)) > > > + return sdot_prod_twoway_optab; > > > + } > > > if (subtype =3D=3D optab_vector_mixed_sign) > > > return usdot_prod_optab; > > > > > > @@ -285,6 +293,17 @@ optab_for_tree_code (enum tree_code code, > > const_tree type, > > > } > > > } > > > > > > +/* Return the optab used for computing the operation given by the tr= ee code, > > > + CODE and the tree EXP. This function is not always usable (for e= xample, it > > > + cannot give complete results for multiplication or division) but = probably > > > + ought to be relied on more widely throughout the expander. */ > > > +optab > > > +optab_for_tree_code (enum tree_code code, const_tree type, > > > + enum optab_subtype subtype) > > > +{ > > > + return optab_for_tree_code_1 (code, type, NULL_TREE, subtype); > > > +} > > > + > > > /* Check whether an operation represented by CODE is a 'half' wideni= ng > > operation > > > in which the input vector type has half the number of bits of the= output > > > vector type e.g. V8QI->V8HI. > > > diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h > > > index f2b49991462..13ff7ca2e4b 100644 > > > --- a/gcc/optabs-tree.h > > > +++ b/gcc/optabs-tree.h > > > @@ -36,6 +36,8 @@ enum optab_subtype > > > /* Return the optab used for computing the given operation on the ty= pe given > > by > > > the second argument. The third argument distinguishes between th= e types of > > > vector shifts and rotates. */ > > > +optab optab_for_tree_code_1 (enum tree_code, const_tree, const_tree > > __attribute__((unused)), > > > + enum optab_subtype ); > > > optab optab_for_tree_code (enum tree_code, const_tree, enum > > optab_subtype); > > > bool > > > supportable_half_widening_operation (enum tree_code, tree, tree, > > > diff --git a/gcc/optabs.cc b/gcc/optabs.cc > > > index ce91f94ed43..3a1c6c7b90e 100644 > > > --- a/gcc/optabs.cc > > > +++ b/gcc/optabs.cc > > > @@ -311,7 +311,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, r= tx > > op1, rtx wide_op, > > > gcc_unreachable (); > > > > > > widen_pattern_optab > > > - =3D optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subty= pe); > > > + =3D optab_for_tree_code_1 (ops->code, TREE_TYPE (oprnd0), TRE= E_TYPE > > (oprnd2), subtype); > > > } > > > else > > > widen_pattern_optab > > > diff --git a/gcc/optabs.def b/gcc/optabs.def > > > index ad14f9328b9..cf1a6e7a7dc 100644 > > > --- a/gcc/optabs.def > > > +++ b/gcc/optabs.def > > > @@ -408,6 +408,8 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a") > > > OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3") > > > OPTAB_D (udot_prod_optab, "udot_prod$I$a") > > > OPTAB_D (usdot_prod_optab, "usdot_prod$I$a") > > > +OPTAB_D (sdot_prod_twoway_optab, "sdot_prod_twoway_$I$a") > > > +OPTAB_D (udot_prod_twoway_optab, "udot_prod_twoway_$I$a") > > > OPTAB_D (usum_widen_optab, "widen_usum$I$a3") > > > OPTAB_D (usad_optab, "usad$I$a") > > > OPTAB_D (ssad_optab, "ssad$I$a") > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c > > b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c > > > new file mode 100644 > > > index 00000000000..cba2aadbec8 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c > > > @@ -0,0 +1,24 @@ > > > +/* { dg-do compile { target { aarch64*-*-* } } } */ > > > +/* { dg-additional-options "-march=3D-O3 -march=3Darmv9.2-a+sme2 -fd= ump-tree- > > vect-details" { target { aarch64*-*-* } } } */ > > > + > > > +#include > > > + > > > +uint32_t udot2(int n, uint16_t* data) { > > > + uint32_t sum =3D 0; > > > + for (int i=3D0; i > > + sum +=3D data[i] * data[i]; > > > + } > > > + return sum; > > > +} > > > + > > > +int32_t sdot2(int n, int16_t* data) { > > > + int32_t sum =3D 0; > > > + for (int i=3D0; i > > + sum +=3D data[i] * data[i]; > > > + } > > > + return sum; > > > +} > > > + > > > +/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: d= etected:" > > 2 "vect" } } */ > > > +/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: > > detected" 4 "vect" } } */ > > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" }= } */ > > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc > > > index dfb7d800526..0760f25d94d 100644 > > > --- a/gcc/tree-vect-patterns.cc > > > +++ b/gcc/tree-vect-patterns.cc > > > @@ -233,7 +233,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo,= tree > > otype, tree_code code, > > > if (!vecotype) > > > return false; > > > > > > - optab optab =3D optab_for_tree_code (code, vecitype, subtype); > > > + optab optab =3D optab_for_tree_code_1 (code, vecitype, vecotype, s= ubtype); > > > if (!optab) > > > return false; > > > > > > -- > > > 2.34.1 > > >