From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.sandiford@arm.com>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id 6BC5C3857816
 for <gcc-patches@gcc.gnu.org>; Tue, 22 Jun 2021 10:56:22 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6BC5C3857816
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id F056611D4;
 Tue, 22 Jun 2021 03:56:21 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 3F4F63F694;
 Tue, 22 Jun 2021 03:56:21 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: Tamar Christina <Tamar.Christina@arm.com>
Mail-Followup-To: Tamar Christina <Tamar.Christina@arm.com>,
 Richard Biener <rguenther@suse.de>, nd <nd@arm.com>,
 "gcc-patches\@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
 richard.sandiford@arm.com
Cc: Richard Biener <rguenther@suse.de>, nd <nd@arm.com>,
 "gcc-patches\@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product where the
 sign for the multiplicant changes.
References: <patch-14433-tamar@arm.com>
 <nycvar.YFH.7.76.2105071337560.9200@zhemvz.fhfr.qr>
 <VI1PR08MB5325998C3057A611E268B740FF579@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <nycvar.YFH.7.76.2105101325270.9200@zhemvz.fhfr.qr>
 <VI1PR08MB5325748CA53CAFBD3B39AFB1FF549@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <nycvar.YFH.7.76.2105101522390.9200@zhemvz.fhfr.qr>
 <VI1PR08MB532566E22D4A25DCD37DBC99FF259@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <7q3oonr2-92r0-8o9q-s27q-9r735s4n3s3@fhfr.qr>
 <VI1PR08MB53256B0E3990410089DB3333FF3D9@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <VI1PR08MB532593667144D0CA998C2723FF3B9@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <mptzgw2ouif.fsf@arm.com>
 <VI1PR08MB5325C6EB7D937DEDBADF362FFF319@VI1PR08MB5325.eurprd08.prod.outlook.com>
Date: Tue, 22 Jun 2021 11:56:20 +0100
In-Reply-To: <VI1PR08MB5325C6EB7D937DEDBADF362FFF319@VI1PR08MB5325.eurprd08.prod.outlook.com>
 (Tamar Christina's message of "Mon, 14 Jun 2021 12:06:23 +0000")
Message-ID: <mptim26cgmj.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-6.5 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS,
 SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 22 Jun 2021 10:56:24 -0000

Sorry for the slow review.

Just concentrating on tree-vect-patterns.c, as before:

Tamar Christina <Tamar.Christina@arm.com> writes:
> @@ -521,6 +522,9 @@ vect_joust_widened_type (tree type, tree new_type, tr=
ee *common_type)
>    unsigned int precision =3D MAX (TYPE_PRECISION (*common_type),
>  				TYPE_PRECISION (new_type));
>    precision *=3D 2;
> +
> +  /* The resulting application is unsigned, check if we have enough
> +     precision to perform the operation.  */
>    if (precision * 2 > TYPE_PRECISION (type))
>      return false;
>=20=20

Not sure what the comment means by =E2=80=9Capplication=E2=80=9D here, but =
the common
type we pick is signed rather than unsigned.

> @@ -546,7 +554,8 @@ static unsigned int
>  vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_cod=
e code,
>  		      tree_code widened_code, bool shift_p,
>  		      unsigned int max_nops,
> -		      vect_unpromoted_value *unprom, tree *common_type)
> +		      vect_unpromoted_value *unprom, tree *common_type,
> +		      enum optab_subtype *subtype =3D NULL)
>  {
>    /* Check for an integer operation with the right code.  */
>    gassign *assign =3D dyn_cast <gassign *> (stmt_info->stmt);
> @@ -607,7 +616,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info =
stmt_info, tree_code code,
>  		=3D vinfo->lookup_def (this_unprom->op);
>  	      nops =3D vect_widened_op_tree (vinfo, def_stmt_info, code,
>  					   widened_code, shift_p, max_nops,
> -					   this_unprom, common_type);
> +					   this_unprom, common_type,
> +					   subtype);
>  	      if (nops =3D=3D 0)
>  		return 0;
>=20=20
> @@ -625,7 +635,24 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info=
 stmt_info, tree_code code,
>  		*common_type =3D this_unprom->type;
>  	      else if (!vect_joust_widened_type (type, this_unprom->type,
>  						 common_type))
> -		return 0;
> +		{
> +		  if (subtype)
> +		    {

AIUI, if we get here then:

- there must be one unsigned operand (A) of precision P
- there must be one signed operand (B) with precision <=3D P
- we can't extend to precision 2*P=20

A conversion is needed if B's precision is < P.
That conversion should be to a signed type with precision P.

So=E2=80=A6

> +		      tree new_type =3D *common_type;
> +		      /* See if we can sign extend the smaller type.  */
> +		      if (TYPE_PRECISION (this_unprom->type) > TYPE_PRECISION (new_typ=
e)
> +			  && (TYPE_UNSIGNED (this_unprom->type) && !TYPE_UNSIGNED (new_type)))

=E2=80=A6I think this second line could be an assert and

> +			new_type =3D build_nonstandard_integer_type (TYPE_PRECISION (this_unp=
rom->type), true);

=E2=80=A6picking an unsigned type here looks wrong.  The net effect would
be to convert B (the previous signed operand) to an unsigned type.

> +
> +		      if (tree_nop_conversion_p (this_unprom->type, new_type))
> +			{
> +			  *subtype =3D optab_vector_mixed_sign;
> +			  *common_type =3D new_type;
> +			}

IMO the sign of the common type shouldn't matter for optab_vector_mixed_sig=
n:
if we need to convert operands later, it should be to the precision of
the common type but retaining the sign of the original type.
So I think it would be simpler to do:

		      if (TYPE_PRECISION (this_unprom->type)
			  > TYPE_PRECISION (*common_type)
			*common_type =3D this_unprom->type;
		      *subtype =3D optab_vector_mixed_sign;

here and adjust the conversion code as described below.

This also has the advantage of coping with > 2 operands, in case that
ever becomes important in future.

> @@ -806,12 +833,15 @@ vect_convert_input (vec_info *vinfo, stmt_vec_info =
stmt_info, tree type,
>  }
>=20=20
>  /* Invoke vect_convert_input for N elements of UNPROM and store the
> -   result in the corresponding elements of RESULT.  */
> +   result in the corresponding elements of RESULT.
> +
> +   If SUBTYPE then don't convert the types if they only
> +   differ by sign.  */
>=20=20
>  static void
>  vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned =
int n,
>  		     tree *result, tree type, vect_unpromoted_value *unprom,
> -		     tree vectype)
> +		     tree vectype, enum optab_subtype subtype =3D optab_default)
>  {
>    for (unsigned int i =3D 0; i < n; ++i)
>      {
> @@ -819,8 +849,12 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info =
stmt_info, unsigned int n,
>        for (j =3D 0; j < i; ++j)
>  	if (unprom[j].op =3D=3D unprom[i].op)
>  	  break;
> +
>        if (j < i)
>  	result[i] =3D result[j];
> +      else if (subtype =3D=3D optab_vector_mixed_sign
> +	       && tree_nop_conversion_p (type, unprom[i].type))
> +	result[i] =3D unprom[i].op;
>        else
>  	result[i] =3D vect_convert_input (vinfo, stmt_info,
>  					type, &unprom[i], vectype);

As noted above, I think we want to preserve the sign of the original
type for optab_vector_mixed_sign, even if a conversion is needed.
I think we should avoid the special case above and instead push
subtype down into vect_convert_input.  We can then adjust the
type at the head of that function:

  if (subtype =3D=3D optab_vector_mixed_sign
      && TYPE_SIGN (type) !=3D TYPE_SIGN (TREE_TYPE (unprom->op)))
    type =3D build_nonstandard_integer_type (TYPE_PRECISION (type),
					   TYPE_SIGN (this_unprom->type));

> @@ -895,7 +929,8 @@ vect_reassociating_reduction_p (vec_info *vinfo,
>=20=20
>     Try to find the following pattern:
>=20=20
> -     type x_t, y_t;
> +     type1a x_t
> +     type1b y_t;
>       TYPE1 prod;
>       TYPE2 sum =3D init;
>     loop:
> @@ -908,8 +943,10 @@ vect_reassociating_reduction_p (vec_info *vinfo,
>       [S6  prod =3D (TYPE2) prod;  #optional]
>       S7  sum_1 =3D prod + sum_0;
>=20=20
> -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' =
is the
> -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> +   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b=
',
> +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign =
of
> +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1'=
 or
> +   bigger and must be the same sign. This is a special case of a reducti=
on

This last bit isn't true: TYPE2 is the type of the addition and can be
any sign.

>     computation.
>=20=20
>     Input:
> @@ -946,15 +983,15 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>=20=20
>    /* Look for the following pattern
>            DX =3D (TYPE1) X;
> -          DY =3D (TYPE1) Y;
> +	  DY =3D (TYPE1) Y;
>            DPROD =3D DX * DY;
> -          DDPROD =3D (TYPE2) DPROD;
> +	  DDPROD =3D (TYPE2) DPROD;
>            sum_1 =3D DDPROD + sum_0;

Spurious whitespace changes: would be better to tabify the whole thing
or leave it as-is.

>       In which
>       - DX is double the size of X
>       - DY is double the size of Y
>       - DX, DY, DPROD all have the same type but the sign
> -       between DX, DY and DPROD can differ.
> +       between X, Y and DPROD can differ.
>       - sum is the same size of DPROD or bigger
>       - sum has been recognized as a reduction variable.
>=20=20
> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>    /* FORNOW.  Can continue analyzing the def-use chain when this stmt in=
 a phi
>       inside the loop (in case we are analyzing an outer-loop).  */
>    vect_unpromoted_value unprom0[2];
> +  enum optab_subtype subtype =3D optab_vector;
>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EX=
PR,
> -			     false, 2, unprom0, &half_type))
> +			     false, 2, unprom0, &half_type, &subtype))
> +    return NULL;
> +
> +  if (subtype =3D=3D optab_vector_mixed_sign
> +      && TYPE_UNSIGNED (unprom_mult.type)
> +      && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION (unprom_mult.ty=
pe))
>      return NULL;

Isn't the final condition here instead that TYPE1 is narrower than TYPE2?
I.e. we need to reject the case in which we multiply a signed and an
unsigned value to get a (logically) signed result, but then zero-extend
it (rather than sign-extend it) to the precision of the addition.

That would make the test:

  if (subtype =3D=3D optab_vector_mixed_sign
      && TYPE_UNSIGNED (unprom_mult.type)
      && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION (type))
    return NULL;=20=20=20=20
=20=20
instead.

Thanks,
Richard