From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.sandiford@arm.com>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id 83018388802D
 for <gcc-patches@gcc.gnu.org>; Mon, 12 Jul 2021 14:55:57 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 83018388802D
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 18C061FB;
 Mon, 12 Jul 2021 07:55:57 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 5FB7D3F774;
 Mon, 12 Jul 2021 07:55:56 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: Tamar Christina <Tamar.Christina@arm.com>
Mail-Followup-To: Tamar Christina <Tamar.Christina@arm.com>,
 Richard Biener <rguenther@suse.de>, nd <nd@arm.com>,
 "gcc-patches\@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
 richard.sandiford@arm.com
Cc: Richard Biener <rguenther@suse.de>, nd <nd@arm.com>,
 "gcc-patches\@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product where the
 sign for the multiplicant changes.
References: <patch-14433-tamar@arm.com>
 <nycvar.YFH.7.76.2105071337560.9200@zhemvz.fhfr.qr>
 <VI1PR08MB5325998C3057A611E268B740FF579@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <nycvar.YFH.7.76.2105101325270.9200@zhemvz.fhfr.qr>
 <VI1PR08MB5325748CA53CAFBD3B39AFB1FF549@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <nycvar.YFH.7.76.2105101522390.9200@zhemvz.fhfr.qr>
 <VI1PR08MB532566E22D4A25DCD37DBC99FF259@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <7q3oonr2-92r0-8o9q-s27q-9r735s4n3s3@fhfr.qr>
 <VI1PR08MB53256B0E3990410089DB3333FF3D9@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <VI1PR08MB532593667144D0CA998C2723FF3B9@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <mptzgw2ouif.fsf@arm.com>
 <VI1PR08MB5325C6EB7D937DEDBADF362FFF319@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <mptim26cgmj.fsf@arm.com> <mpt7dimcfog.fsf@arm.com>
 <VI1PR08MB53255B22281D39CB0244E44FFF159@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <mptbl77j2h1.fsf@arm.com>
 <VI1PR08MB53259C129C42CFADD2FDBA8DFF159@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <mpt5yxfj0bp.fsf@arm.com>
 <VI1PR08MB532570940234239E9F607285FF159@VI1PR08MB5325.eurprd08.prod.outlook.com>
Date: Mon, 12 Jul 2021 15:55:55 +0100
In-Reply-To: <VI1PR08MB532570940234239E9F607285FF159@VI1PR08MB5325.eurprd08.prod.outlook.com>
 (Tamar Christina's message of "Mon, 12 Jul 2021 12:29:23 +0000")
Message-ID: <mpta6mrh98k.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-6.4 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS,
 KAM_SHORT, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Mon, 12 Jul 2021 14:55:59 -0000

Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Monday, July 12, 2021 11:26 AM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: Richard Biener <rguenther@suse.de>; nd <nd@arm.com>; gcc-
>> patches@gcc.gnu.org
>> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
>> where the sign for the multiplicant changes.
>>=20
>> Tamar Christina <Tamar.Christina@arm.com> writes:
>> >> -----Original Message-----
>> >> From: Richard Sandiford <richard.sandiford@arm.com>
>> >> Sent: Monday, July 12, 2021 10:39 AM
>> >> To: Tamar Christina <Tamar.Christina@arm.com>
>> >> Cc: Richard Biener <rguenther@suse.de>; nd <nd@arm.com>; gcc-
>> >> patches@gcc.gnu.org
>> >> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
>> >> where the sign for the multiplicant changes.
>> >>
>> >> Tamar Christina <Tamar.Christina@arm.com> writes:
>> >> > Hi,
>> >> >
>> >> >> Richard Sandiford <richard.sandiford@arm.com> writes:
>> >> >> >> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info
>> >> >> *vinfo,
>> >> >> >>    /* FORNOW.  Can continue analyzing the def-use chain when
>> >> >> >> this stmt in
>> >> >> a phi
>> >> >> >>       inside the loop (in case we are analyzing an outer-loop).=
  */
>> >> >> >>    vect_unpromoted_value unprom0[2];
>> >> >> >> +  enum optab_subtype subtype =3D optab_vector;
>> >> >> >>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
>> >> >> WIDEN_MULT_EXPR,
>> >> >> >> -			     false, 2, unprom0, &half_type))
>> >> >> >> +			     false, 2, unprom0, &half_type, &subtype))
>> >> >> >> +    return NULL;
>> >> >> >> +
>> >> >> >> +  if (subtype =3D=3D optab_vector_mixed_sign
>> >> >> >> +      && TYPE_UNSIGNED (unprom_mult.type)
>> >> >> >> +      && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION
>> >> >> >> + (unprom_mult.type))
>> >> >> >>      return NULL;
>> >> >> >
>> >> >> > Isn't the final condition here instead that TYPE1 is narrower th=
an
>> TYPE2?
>> >> >> > I.e. we need to reject the case in which we multiply a signed
>> >> >> > and an unsigned value to get a (logically) signed result, but
>> >> >> > then zero-extend it (rather than sign-extend it) to the
>> >> >> > precision of the
>> >> addition.
>> >> >> >
>> >> >> > That would make the test:
>> >> >> >
>> >> >> >   if (subtype =3D=3D optab_vector_mixed_sign
>> >> >> >       && TYPE_UNSIGNED (unprom_mult.type)
>> >> >> >       && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION
>> (type))
>> >> >> >     return NULL;
>> >> >> >
>> >> >> > instead.
>> >> >>
>> >> >> And folding that into the existing test gives:
>> >> >>
>> >> >>   /* If there are two widening operations, make sure they agree on
>> >> >> the
>> >> sign
>> >> >>      of the extension.  The result of an optab_vector_mixed_sign
>> operation
>> >> >>      is signed; otherwise, the result has the same sign as the ope=
rands.
>> */
>> >> >>   if (TYPE_PRECISION (unprom_mult.type) !=3D TYPE_PRECISION (type)
>> >> >>       && (subtype =3D=3D optab_vector_mixed_sign
>> >> >> 	  ? TYPE_UNSIGNED (unprom_mult.type)
>> >> >> 	  : TYPE_SIGN (unprom_mult.type) !=3D TYPE_SIGN (half_type)))
>> >> >>     return NULL;
>> >> >>
>> >> >
>> >> > I went with the first one which doesn't add the extra constraints
>> >> > for the normal dotproduct as that makes it too restrictive. It's
>> >> > the type of the multiplication that determines the operation so
>> >> > dotproduct can be used a bit more than where we currently do.
>> >> >
>> >> > This was relaxed in an earlier patch.
>> >>
>> >> I didn't mean that we should add extra constraints to the normal case
>> though.
>> >> The existing test I was referring to above was:
>> >>
>> >>   /* If there are two widening operations, make sure they agree on
>> >>      the sign of the extension.  */
>> >>   if (TYPE_PRECISION (unprom_mult.type) !=3D TYPE_PRECISION (type)
>> >>       && TYPE_SIGN (unprom_mult.type) !=3D TYPE_SIGN (half_type))
>> >>     return NULL;
>> >
>> > But as I mentioned, this restriction is unneeded and has been removed
>> hence why it's not in my patchset's diff.
>> > It's removed by
>> > https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569851.html which
>> Richi conditioned on the rest of these patches being approved.
>> >
>> > This change needlessly blocks test vect-reduc-dot-[2,3,6,7].c from
>> > being dotproducts for instance
>> >
>> > It's also part of the deficiency between GCC codegen and Clang
>> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D88492#c6
>>=20
>> Hmm, OK.  Just removing the check regresses:
>>=20
>> unsigned long __attribute__ ((noipa))
>> f (signed short *x, signed short *y)
>> {
>>   unsigned long res =3D 0;
>>   for (int i =3D 0; i < 100; ++i)
>>     res +=3D (unsigned int) x[i] * (unsigned int) y[i];
>>   return res;
>> }
>>=20
>> int
>> main (void)
>> {
>>   signed short x[100], y[100];
>>   for (int i =3D 0; i < 100; ++i)
>>     {
>>       x[i] =3D -1;
>>       y[i] =3D 1;
>>     }
>>   if (f (x, y) !=3D 0x6400000000ULL - 100)
>>     __builtin_abort ();
>>   return 0;
>> }
>>=20
>> on SVE.  We then use SDOT even though the result of the multiplication is
>> zero- rather than sign-extended to 64 bits.  Does something else in the =
series
>> stop that from that happening?
>
> No, and I hadn't noticed it before because it looks like the mid-end test=
s that are execution test don't turn on dot-product for arm targets :/=20

Yeah, I was surprised I needed SVE to get an SDOT above, but didn't look
into why=E2=80=A6

> I'll look at it separately, for now I've then added the check back in.
>
> Ok for trunk now?

Reviewing the full patch this time: I have a couple of nits about
the documentation, but otherwise it LGTM.

> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5446,13 +5446,55 @@ Like @samp{fold_left_plus_@var{m}}, but takes an =
additional mask operand
>=20=20
>  @cindex @code{sdot_prod@var{m}} instruction pattern
>  @item @samp{sdot_prod@var{m}}
> +
> +Compute the sum of the products of two signed elements.
> +Operand 1 and operand 2 are of the same mode. Their
> +product, which is of a wider mode, is computed and added to operand 3.
> +Operand 3 is of a mode equal or wider than the mode of the product. The
> +result is placed in operand 0, which is of the same mode as operand 3.
> +
> +Semantically the expressions perform the multiplication in the following=
 signs
> +
> +@smallexample
> +sdot<signed c, signed a, signed b> =3D=3D
> +   res =3D sign-ext (a) * sign-ext (b) + c
> +@dots{}
> +@end smallexample

I think putting signed c first in the argument list might be confusing,
since like you say, it corresponds to operand 3 rather than operand 1.
How about calling them op0, op1, op2 and op3 instead of res, a, b and c,
and listing them in that order?

Same for udot_prod.

(Someone who doesn't know the AArch64 instructions might wonder how
the elements of op1 and op2 correspond to elements of op0 and op3.
That's a pre-existing problem though, so no need to fix it here.)

>  @cindex @code{udot_prod@var{m}} instruction pattern
> -@itemx @samp{udot_prod@var{m}}
> -Compute the sum of the products of two signed/unsigned elements.
> -Operand 1 and operand 2 are of the same mode. Their product, which is of=
 a
> -wider mode, is computed and added to operand 3. Operand 3 is of a mode e=
qual or
> -wider than the mode of the product. The result is placed in operand 0, w=
hich
> -is of the same mode as operand 3.
> +@item @samp{udot_prod@var{m}}
> +
> +Compute the sum of the products of two unsigned elements.
> +Operand 1 and operand 2 are of the same mode. Their
> +product, which is of a wider mode, is computed and added to operand 3.
> +Operand 3 is of a mode equal or wider than the mode of the product. The
> +result is placed in operand 0, which is of the same mode as operand 3.
> +
> +Semantically the expressions perform the multiplication in the following=
 signs
> +
> +@smallexample
> +udot<unsigned c, unsigned a, unsigned b> =3D=3D
> +   res =3D zero-ext (a) * zero-ext (b) + c
> +@dots{}
> +@end smallexample
> +
> +
> +

Should just be one blank line here.

> +@cindex @code{usdot_prod@var{m}} instruction pattern
> +@item @samp{usdot_prod@var{m}}
> +Compute the sum of the products of elements of different signs.
> +Operand 1 must be unsigned and operand 2 signed. Their
> +product, which is of a wider mode, is computed and added to operand 3.
> +Operand 3 is of a mode equal or wider than the mode of the product. The
> +result is placed in operand 0, which is of the same mode as operand 3.
> +
> +Semantically the expressions perform the multiplication in the following=
 signs
> +
> +@smallexample
> +usdot<unsigned c, unsigned a, signed b> =3D=3D
> +   res =3D ((unsigned-conv) sign-ext (a)) * zero-ext (b) + c

It looks like the extensions are the wrong way around.  I think it should b=
e:

usdot<signed c, unsigned a, signed b> =3D=3D
   res =3D ((signed-conv) zero-ext (a)) * sign-ext (b) + c

(before the changes to put c last and use the opN names).

I.e. the unsigned operand is zero-extended and the signed operand is
sign extended.  I think it's easier to understand if we treat the
multiplication and c as signed, since in that case we don't reinterpret
any negative signed value (of b) as an unsigned value.  (Both choices
make sense for =E2=80=9Ca=E2=80=9D, since the zero-ext(a) fits into both a =
signed wider
int and an unsigned wider int.)

OK with those changes, and thanks for your patience through the slow review=
s.

Richard