From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 83018388802D for ; Mon, 12 Jul 2021 14:55:57 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 83018388802D Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 18C061FB; Mon, 12 Jul 2021 07:55:57 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 5FB7D3F774; Mon, 12 Jul 2021 07:55:56 -0700 (PDT) From: Richard Sandiford To: Tamar Christina Mail-Followup-To: Tamar Christina , Richard Biener , nd , "gcc-patches\@gcc.gnu.org" , richard.sandiford@arm.com Cc: Richard Biener , nd , "gcc-patches\@gcc.gnu.org" Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes. References: <7q3oonr2-92r0-8o9q-s27q-9r735s4n3s3@fhfr.qr> Date: Mon, 12 Jul 2021 15:55:55 +0100 In-Reply-To: (Tamar Christina's message of "Mon, 12 Jul 2021 12:29:23 +0000") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-6.4 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Jul 2021 14:55:59 -0000 Tamar Christina writes: >> -----Original Message----- >> From: Richard Sandiford >> Sent: Monday, July 12, 2021 11:26 AM >> To: Tamar Christina >> Cc: Richard Biener ; nd ; gcc- >> patches@gcc.gnu.org >> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product >> where the sign for the multiplicant changes. >>=20 >> Tamar Christina writes: >> >> -----Original Message----- >> >> From: Richard Sandiford >> >> Sent: Monday, July 12, 2021 10:39 AM >> >> To: Tamar Christina >> >> Cc: Richard Biener ; nd ; gcc- >> >> patches@gcc.gnu.org >> >> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product >> >> where the sign for the multiplicant changes. >> >> >> >> Tamar Christina writes: >> >> > Hi, >> >> > >> >> >> Richard Sandiford writes: >> >> >> >> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info >> >> >> *vinfo, >> >> >> >> /* FORNOW. Can continue analyzing the def-use chain when >> >> >> >> this stmt in >> >> >> a phi >> >> >> >> inside the loop (in case we are analyzing an outer-loop).= */ >> >> >> >> vect_unpromoted_value unprom0[2]; >> >> >> >> + enum optab_subtype subtype =3D optab_vector; >> >> >> >> if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, >> >> >> WIDEN_MULT_EXPR, >> >> >> >> - false, 2, unprom0, &half_type)) >> >> >> >> + false, 2, unprom0, &half_type, &subtype)) >> >> >> >> + return NULL; >> >> >> >> + >> >> >> >> + if (subtype =3D=3D optab_vector_mixed_sign >> >> >> >> + && TYPE_UNSIGNED (unprom_mult.type) >> >> >> >> + && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION >> >> >> >> + (unprom_mult.type)) >> >> >> >> return NULL; >> >> >> > >> >> >> > Isn't the final condition here instead that TYPE1 is narrower th= an >> TYPE2? >> >> >> > I.e. we need to reject the case in which we multiply a signed >> >> >> > and an unsigned value to get a (logically) signed result, but >> >> >> > then zero-extend it (rather than sign-extend it) to the >> >> >> > precision of the >> >> addition. >> >> >> > >> >> >> > That would make the test: >> >> >> > >> >> >> > if (subtype =3D=3D optab_vector_mixed_sign >> >> >> > && TYPE_UNSIGNED (unprom_mult.type) >> >> >> > && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION >> (type)) >> >> >> > return NULL; >> >> >> > >> >> >> > instead. >> >> >> >> >> >> And folding that into the existing test gives: >> >> >> >> >> >> /* If there are two widening operations, make sure they agree on >> >> >> the >> >> sign >> >> >> of the extension. The result of an optab_vector_mixed_sign >> operation >> >> >> is signed; otherwise, the result has the same sign as the ope= rands. >> */ >> >> >> if (TYPE_PRECISION (unprom_mult.type) !=3D TYPE_PRECISION (type) >> >> >> && (subtype =3D=3D optab_vector_mixed_sign >> >> >> ? TYPE_UNSIGNED (unprom_mult.type) >> >> >> : TYPE_SIGN (unprom_mult.type) !=3D TYPE_SIGN (half_type))) >> >> >> return NULL; >> >> >> >> >> > >> >> > I went with the first one which doesn't add the extra constraints >> >> > for the normal dotproduct as that makes it too restrictive. It's >> >> > the type of the multiplication that determines the operation so >> >> > dotproduct can be used a bit more than where we currently do. >> >> > >> >> > This was relaxed in an earlier patch. >> >> >> >> I didn't mean that we should add extra constraints to the normal case >> though. >> >> The existing test I was referring to above was: >> >> >> >> /* If there are two widening operations, make sure they agree on >> >> the sign of the extension. */ >> >> if (TYPE_PRECISION (unprom_mult.type) !=3D TYPE_PRECISION (type) >> >> && TYPE_SIGN (unprom_mult.type) !=3D TYPE_SIGN (half_type)) >> >> return NULL; >> > >> > But as I mentioned, this restriction is unneeded and has been removed >> hence why it's not in my patchset's diff. >> > It's removed by >> > https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569851.html which >> Richi conditioned on the rest of these patches being approved. >> > >> > This change needlessly blocks test vect-reduc-dot-[2,3,6,7].c from >> > being dotproducts for instance >> > >> > It's also part of the deficiency between GCC codegen and Clang >> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D88492#c6 >>=20 >> Hmm, OK. Just removing the check regresses: >>=20 >> unsigned long __attribute__ ((noipa)) >> f (signed short *x, signed short *y) >> { >> unsigned long res =3D 0; >> for (int i =3D 0; i < 100; ++i) >> res +=3D (unsigned int) x[i] * (unsigned int) y[i]; >> return res; >> } >>=20 >> int >> main (void) >> { >> signed short x[100], y[100]; >> for (int i =3D 0; i < 100; ++i) >> { >> x[i] =3D -1; >> y[i] =3D 1; >> } >> if (f (x, y) !=3D 0x6400000000ULL - 100) >> __builtin_abort (); >> return 0; >> } >>=20 >> on SVE. We then use SDOT even though the result of the multiplication is >> zero- rather than sign-extended to 64 bits. Does something else in the = series >> stop that from that happening? > > No, and I hadn't noticed it before because it looks like the mid-end test= s that are execution test don't turn on dot-product for arm targets :/=20 Yeah, I was surprised I needed SVE to get an SDOT above, but didn't look into why=E2=80=A6 > I'll look at it separately, for now I've then added the check back in. > > Ok for trunk now? Reviewing the full patch this time: I have a couple of nits about the documentation, but otherwise it LGTM. > --- a/gcc/doc/md.texi > +++ b/gcc/doc/md.texi > @@ -5446,13 +5446,55 @@ Like @samp{fold_left_plus_@var{m}}, but takes an = additional mask operand >=20=20 > @cindex @code{sdot_prod@var{m}} instruction pattern > @item @samp{sdot_prod@var{m}} > + > +Compute the sum of the products of two signed elements. > +Operand 1 and operand 2 are of the same mode. Their > +product, which is of a wider mode, is computed and added to operand 3. > +Operand 3 is of a mode equal or wider than the mode of the product. The > +result is placed in operand 0, which is of the same mode as operand 3. > + > +Semantically the expressions perform the multiplication in the following= signs > + > +@smallexample > +sdot =3D=3D > + res =3D sign-ext (a) * sign-ext (b) + c > +@dots{} > +@end smallexample I think putting signed c first in the argument list might be confusing, since like you say, it corresponds to operand 3 rather than operand 1. How about calling them op0, op1, op2 and op3 instead of res, a, b and c, and listing them in that order? Same for udot_prod. (Someone who doesn't know the AArch64 instructions might wonder how the elements of op1 and op2 correspond to elements of op0 and op3. That's a pre-existing problem though, so no need to fix it here.) > @cindex @code{udot_prod@var{m}} instruction pattern > -@itemx @samp{udot_prod@var{m}} > -Compute the sum of the products of two signed/unsigned elements. > -Operand 1 and operand 2 are of the same mode. Their product, which is of= a > -wider mode, is computed and added to operand 3. Operand 3 is of a mode e= qual or > -wider than the mode of the product. The result is placed in operand 0, w= hich > -is of the same mode as operand 3. > +@item @samp{udot_prod@var{m}} > + > +Compute the sum of the products of two unsigned elements. > +Operand 1 and operand 2 are of the same mode. Their > +product, which is of a wider mode, is computed and added to operand 3. > +Operand 3 is of a mode equal or wider than the mode of the product. The > +result is placed in operand 0, which is of the same mode as operand 3. > + > +Semantically the expressions perform the multiplication in the following= signs > + > +@smallexample > +udot =3D=3D > + res =3D zero-ext (a) * zero-ext (b) + c > +@dots{} > +@end smallexample > + > + > + Should just be one blank line here. > +@cindex @code{usdot_prod@var{m}} instruction pattern > +@item @samp{usdot_prod@var{m}} > +Compute the sum of the products of elements of different signs. > +Operand 1 must be unsigned and operand 2 signed. Their > +product, which is of a wider mode, is computed and added to operand 3. > +Operand 3 is of a mode equal or wider than the mode of the product. The > +result is placed in operand 0, which is of the same mode as operand 3. > + > +Semantically the expressions perform the multiplication in the following= signs > + > +@smallexample > +usdot =3D=3D > + res =3D ((unsigned-conv) sign-ext (a)) * zero-ext (b) + c It looks like the extensions are the wrong way around. I think it should b= e: usdot =3D=3D res =3D ((signed-conv) zero-ext (a)) * sign-ext (b) + c (before the changes to put c last and use the opN names). I.e. the unsigned operand is zero-extended and the signed operand is sign extended. I think it's easier to understand if we treat the multiplication and c as signed, since in that case we don't reinterpret any negative signed value (of b) as an unsigned value. (Both choices make sense for =E2=80=9Ca=E2=80=9D, since the zero-ext(a) fits into both a = signed wider int and an unsigned wider int.) OK with those changes, and thanks for your patience through the slow review= s. Richard