RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

public inbox for gcc-patches@gcc.gnu.org

From: Tamar Christina <Tamar.Christina@arm.com>
To: Richard Biener <rguenther@suse.de>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>, nd <nd@arm.com>
Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
Date: Fri, 7 May 2021 12:42:37 +0000
Message-ID: <VI1PR08MB5325998C3057A611E268B740FF579@VI1PR08MB5325.eurprd08.prod.outlook.com>
In-Reply-To: <nycvar.YFH.7.76.2105071337560.9200@zhemvz.fhfr.qr>

Hi Richi,

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Friday, May 7, 2021 12:46 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant changes.
> 
> On Wed, 5 May 2021, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This patch adds support for a dot product where the sign of the
> > multiplication arguments differ. i.e. one is signed and one is
> > unsigned but the precisions are the same.
> >
> > #define N 480
> > #define SIGNEDNESS_1 unsigned
> > #define SIGNEDNESS_2 signed
> > #define SIGNEDNESS_3 signed
> > #define SIGNEDNESS_4 unsigned
> >
> > SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res,
> > SIGNEDNESS_3 char *restrict a,
> >    SIGNEDNESS_4 char *restrict b)
> > {
> >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> >     {
> >       int av = a[i];
> >       int bv = b[i];
> >       SIGNEDNESS_2 short mult = av * bv;
> >       res += mult;
> >     }
> >   return res;
> > }
> >
> > The operations are performed as if the operands were extended to a 32-bit value.
> > As such this operation isn't valid if there is an intermediate
> > conversion to an unsigned value. i.e.  if SIGNEDNESS_2 is unsigned.
> >
> > Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are flipped,
> > the same optab is used but the operands are swapped in the optab expansion.
> >
> > To support this the patch extends the dot-product detection to
> > optionally ignore operands with different signs and stores this
> > information in the optab subtype which is now made a bitfield.
> >
> > The subtype can now additionally control which optab an EXPR can expand to.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* optabs.def (usdot_prod_optab): New.
> > 	* doc/md.texi: Document it.
> > 	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> > 	* optabs-tree.h (enum optab_subtype): Likewise.
> > 	* optabs.c (expand_widen_pattern_expr): Likewise.
> > 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> > 	(vectorizable_reduction): Query dot-product kind.
> > 	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
> > 	optab subtype.
> > 	(vect_joust_widened_type, vect_widened_op_tree): Optionally ignore
> > 	mismatch types.
> > 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> >
> d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fd
> f2
> > e66bc80d7d23 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -5440,11 +5440,13 @@ Like @samp{fold_left_plus_@var{m}}, but
> takes
> > an additional mask operand  @item @samp{sdot_prod@var{m}}  @cindex
> > @code{udot_prod@var{m}} instruction pattern  @itemx
> > @samp{udot_prod@var{m}}
> > +@cindex @code{usdot_prod@var{m}} instruction pattern @itemx
> > +@samp{usdot_prod@var{m}}
> >  Compute the sum of the products of two signed/unsigned elements.
> > -Operand 1 and operand 2 are of the same mode. Their product, which is
> > of a -wider mode, is computed and added to operand 3. Operand 3 is of
> > a mode equal or -wider than the mode of the product. The result is
> > placed in operand 0, which -is of the same mode as operand 3.
> > +Operand 1 and operand 2 are of the same mode but may differ in signs.
> > +Their product, which is of a wider mode, is computed and added to
> operand 3.
> > +Operand 3 is of a mode equal or wider than the mode of the product.
> > +The result is placed in operand 0, which is of the same mode as operand 3.
> 
> This doesn't really say what the 's', 'u' and 'us' specify.  Since we're doing a
> widen multiplication and then a non-widening addition we only need to
> know the effective sign of the multiplication so I think the existing 's' and 'u'
> are enough to cover all cases?

The existing 's' and 'u' variants enforce that both operands of the multiplication have the
same sign.  So for e.g. 'u' both operands must be unsigned.

In the `us` case one operand can be signed and the other unsigned. Operationally, the signed value
is sign-extended to the wider type and the unsigned value is zero-extended.  The sign-extended
value is then converted to unsigned so that the multiplication is performed as an
unsigned multiplication, conforming to the C promotion rules.

TL;DR: Without a new optab I can't tell during expansion which semantics the operation
had at the gimple/C level, as modes don't carry signs.

Long version:

The problem with using the existing patterns is that they require `av` and `bv` to have the same sign,
so we can't remove the explicit sign extensions, yet the multiplication must be done
on the sign/zero-extended char inputs in the same sign.

Which means (unless I am mistaken) that to get the correct result you can use neither `udot` nor `sdot`, as
semantically these would zero- or sign-extend both operands from char to int to perform the multiplication
in the same sign.  Whereas in this case, one operand is zero-extended and the other is sign-extended, and the result
is always an unsigned number.

So basically

udot<unsigned c, unsigned a, unsigned b> ==
   c = zero-ext (a) * zero-ext (b)
sdot<signed c, signed a, signed b> ==
   c = sign-ext (a) * sign-ext (b)
usdot<unsigned c, unsigned a, signed b> ==
   c = zero-ext (a) * ((unsigned-conv) sign-ext (b))

So semantically the existing optabs won't fit here. udot would internally promote to unsigned types before
the multiplication so the result of the multiplication would be wrong.  sdot would promote both to signed
and do signed multiplication, so the result is also wrong.
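
To make the difference concrete, here is a minimal scalar sketch in plain C.  It is not GCC code: the
helper names (zext8, sext8, udot_model, sdot_model, usdot_model) are made up for illustration, and each
function models a single multiply-accumulate step rather than a whole vector instruction.  It takes `a`
as the unsigned operand and `b` as the signed one, following the usdot<unsigned c, unsigned a, signed b>
signature above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: widen one byte to 32 bits.  */
static uint32_t zext8 (uint8_t v) { return v; }
static int32_t sext8 (uint8_t v) { return (int32_t) ((v ^ 0x80) - 0x80); }

/* udot: both bytes are zero-extended, unsigned multiply.  */
static uint32_t udot_model (uint32_t acc, uint8_t a, uint8_t b)
{
  return acc + zext8 (a) * zext8 (b);
}

/* sdot: both bytes are sign-extended, signed multiply.  */
static int32_t sdot_model (int32_t acc, uint8_t a, uint8_t b)
{
  return acc + sext8 (a) * sext8 (b);
}

/* usdot: a is zero-extended, b is sign-extended and then converted to
   unsigned, so the multiply itself is unsigned.  */
static uint32_t usdot_model (uint32_t acc, uint8_t a, uint8_t b)
{
  return acc + zext8 (a) * (uint32_t) sext8 (b);
}

/* The same two bytes give three different results.  */
void dot_models_selfcheck (void)
{
  uint8_t a = 0xC8;  /* 200 as unsigned, -56 as signed */
  uint8_t b = 0xFD;  /* 253 as unsigned, -3 as signed */
  assert (udot_model (0, a, b) == 50600u);          /* 200 * 253 */
  assert (sdot_model (0, a, b) == 168);             /* -56 * -3 */
  assert (usdot_model (0, a, b) == (uint32_t) -600); /* 200 * -3 */
}
```

With a = 0xC8 and b = 0xFD, only the mixed-sign model produces the bit pattern of 200 * -3, which is why
neither of the existing two optabs can stand in for it.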

Now if I relax the constraint on the signs of udot and sdot, I run into the fact that RTL modes
don't carry signs, so a target can't tell me how the operands will be promoted.
That leads to two problems:

1) I can't really check which semantics the target will adhere to on expansion.
2) At expand time I have no way to differentiate between the two instruction variants: given just modes,
     I can't tell whether to expand to the normal dot-product or the new instruction.
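
As a rough sketch of the selection that has to happen while the signs are still known, the following
plain C models the decision.  The names (dot_kind, dot_choice, choose_dot) are hypothetical stand-ins and
not the real GCC enums or functions; the only behaviour taken from the patch is that a (signed, unsigned)
operand pair uses the same mixed-sign variant with the operands swapped.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the three dot-product variants.  */
enum dot_kind { DOT_UNSIGNED, DOT_SIGNED, DOT_MIXED };

struct dot_choice
{
  enum dot_kind kind;
  bool swap_operands;  /* true when the signed operand came first */
};

/* Pick the variant from the operand signs.  This information exists at
   the tree/gimple level; a machine mode alone does not provide it.  */
struct dot_choice choose_dot (bool op0_unsigned, bool op1_unsigned)
{
  struct dot_choice c = { DOT_SIGNED, false };
  if (op0_unsigned && op1_unsigned)
    c.kind = DOT_UNSIGNED;
  else if (op0_unsigned != op1_unsigned)
    {
      c.kind = DOT_MIXED;
      /* Canonicalise to (unsigned, signed) operand order.  */
      c.swap_operands = !op0_unsigned;
    }
  return c;
}
```

Both inputs to choose_dot are sign flags, which is exactly the information that is gone once only
modes are left.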

Regards,
Tamar

> 
> The tree.def docs say the sum is also possibly widening but I don't see this
> covered by the optab so we should eventually remove this feature from the
> tree side.  In fact the tree-cfg.c verifier requires the addition to be not
> widening - thus only tree.def needs adjustment.
> 
> >  @cindex @code{ssad@var{m}} instruction pattern  @item
> > @samp{ssad@var{m}} diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> > index
> >
> c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f
> 19
> > 90e0548ba08d 100644
> > --- a/gcc/optabs-tree.h
> > +++ b/gcc/optabs-tree.h
> > @@ -27,11 +27,29 @@ along with GCC; see the file COPYING3.  If not see
> >     shift amount vs. machines that take a vector for the shift amount.
> > */  enum optab_subtype  {
> > -  optab_default,
> > -  optab_scalar,
> > -  optab_vector
> > +  optab_default = 1 << 0,
> > +  optab_scalar = 1 << 1,
> > +  optab_vector = 1 << 2,
> > +  optab_signed_to_unsigned = 1 << 3,
> > +  optab_unsigned_to_signed = 1 << 4
> >  };
> >
> > +/* Override the OrEqual-operator so we can use optab_subtype as a bit
> > +flag.  */ inline enum optab_subtype& operator |= (enum
> optab_subtype&
> > +a, enum optab_subtype b) {
> > +    return a = static_cast<optab_subtype>(static_cast<int>(a)
> > +					  | static_cast<int>(b));
> > +}
> > +
> > +/* Override the Or-operator so we can use optab_subtype as a bit
> > +flag.  */ inline enum optab_subtype operator | (enum optab_subtype a,
> > +enum optab_subtype b) {
> > +    return static_cast<optab_subtype>(static_cast<int>(a)
> > +				      | static_cast<int>(b));
> > +}
> > +
> >  /* Return the optab used for computing the given operation on the type
> given by
> >     the second argument.  The third argument distinguishes between the
> types of
> >     vector shifts and rotates.  */
> > diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c index
> >
> 95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea
> 1e
> > 5c22b7453072 100644
> > --- a/gcc/optabs-tree.c
> > +++ b/gcc/optabs-tree.c
> > @@ -127,7 +127,17 @@ optab_for_tree_code (enum tree_code code,
> const_tree type,
> >        return TYPE_UNSIGNED (type) ? usum_widen_optab :
> > ssum_widen_optab;
> >
> >      case DOT_PROD_EXPR:
> > -      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
> > +      {
> > +	gcc_assert (subtype & optab_default
> > +		    || subtype & optab_vector
> > +		    || subtype & optab_signed_to_unsigned
> > +		    || subtype & optab_unsigned_to_signed);
> > +
> > +	if (subtype & (optab_unsigned_to_signed |
> optab_signed_to_unsigned))
> > +	  return usdot_prod_optab;
> > +
> > +	return (TYPE_UNSIGNED (type) ? udot_prod_optab :
> sdot_prod_optab);
> > +      }
> >
> >      case SAD_EXPR:
> >        return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab; diff
> > --git a/gcc/optabs.c b/gcc/optabs.c index
> >
> f4614a394587787293dc8b680a38901f7906f61c..2e18b76de1412eab71971753ac
> 67
> > 8597c0d00098 100644
> > --- a/gcc/optabs.c
> > +++ b/gcc/optabs.c
> > @@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
> rtx op1, rtx wide_op,
> >    bool sbool = false;
> >
> >    oprnd0 = ops->op0;
> > +  if (nops >= 2)
> > +    oprnd1 = ops->op1;
> > +  if (nops >= 3)
> > +    oprnd2 = ops->op2;
> > +
> >    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
> >    if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
> >        || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR) @@ -285,6
> +290,27
> > @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx
> wide_op,
> >  	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
> >        sbool = true;
> >      }
> > +  else if (ops->code == DOT_PROD_EXPR)
> > +    {
> > +      enum optab_subtype subtype = optab_default;
> > +      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
> > +      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
> > +      if (sign1 == sign2)
> > +	;
> > +      else if (sign1 == SIGNED && sign2 == UNSIGNED)
> > +	{
> > +	  subtype |= optab_signed_to_unsigned;
> > +	  /* Same as optab_unsigned_to_signed but flip the operands.  */
> > +	  std::swap (op0, op1);
> > +	}
> > +      else if (sign1 == UNSIGNED && sign2 == SIGNED)
> > +	subtype |= optab_unsigned_to_signed;
> > +      else
> > +	gcc_unreachable ();
> > +
> > +      widen_pattern_optab
> > +	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> > +    }
> >    else
> >      widen_pattern_optab
> >        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0),
> > optab_default); @@ -298,10 +324,7 @@ expand_widen_pattern_expr
> (sepops ops, rtx op0, rtx op1, rtx wide_op,
> >    gcc_assert (icode != CODE_FOR_nothing);
> >
> >    if (nops >= 2)
> > -    {
> > -      oprnd1 = ops->op1;
> > -      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > -    }
> > +    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> >    else if (sbool)
> >      {
> >        nops = 2;
> > @@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
> rtx op1, rtx wide_op,
> >      {
> >        gcc_assert (tmode1 == tmode0);
> >        gcc_assert (op1);
> > -      oprnd2 = ops->op2;
> >        wmode = TYPE_MODE (TREE_TYPE (oprnd2));
> >      }
> >
> > diff --git a/gcc/optabs.def b/gcc/optabs.def index
> >
> b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbd
> b7c
> > 18615baae928 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
> OPTAB_D
> > (sdot_prod_optab, "sdot_prod$I$a")  OPTAB_D (ssum_widen_optab,
> > "widen_ssum$I$a3")  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> > +OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> >  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")  OPTAB_D
> (usad_optab,
> > "usad$I$a")  OPTAB_D (ssad_optab, "ssad$I$a") diff --git
> > a/gcc/tree-cfg.c b/gcc/tree-cfg.c index
> >
> 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..58b55bb648ad97d514f1fa18bb
> 00
> > 808fd2678b42 100644
> > --- a/gcc/tree-cfg.c
> > +++ b/gcc/tree-cfg.c
> > @@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign *stmt)
> >  		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
> >  		 || (!INTEGRAL_TYPE_P (lhs_type)
> >  		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> > -	    || !types_compatible_p (rhs1_type, rhs2_type)
> > +	    || (!types_compatible_p (rhs1_type, rhs2_type)
> > +		&& TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type))
> 
> That's not restrictive enough.  I suggest you use
> 
>             && element_precision (rhs1_type) != element_precision
> (rhs2_type)
> 
> instead.
> 
> As said, I'm not sure all the changes in this patch are required.
> 
> Please elaborate.
> 
> Thanks,
> Richard.
> 
> >  	    || !useless_type_conversion_p (lhs_type, rhs3_type)
> >  	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
> >  			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
> diff --git
> > a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index
> >
> 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..cb8f5fbb6abca181c4171194d1
> 9f
> > ec29ec6e4176 100644
> > --- a/gcc/tree-vect-loop.c
> > +++ b/gcc/tree-vect-loop.c
> > @@ -6401,6 +6401,33 @@ build_vect_cond_expr (enum tree_code code,
> tree vop[3], tree mask,
> >      }
> >  }
> >
> > +/* Determine the optab_subtype to use for the given CODE and STMT.
> For
> > +   most CODE this will be optab_vector, however for certain operations
> such as
> > +   DOT_PROD_EXPR where the operation can different signs for the
> operands we
> > +   need to be able to pick the right optabs.  */
> > +
> > +static enum optab_subtype
> > +vect_determine_dot_kind (tree_code code, stmt_vec_info stmt_vinfo) {
> > +  enum optab_subtype subtype = optab_vector;
> > +  switch (code)
> > +    {
> > +      case DOT_PROD_EXPR:
> > +	{
> > +	  gassign *stmt = as_a <gassign *> (STMT_VINFO_STMT (stmt_vinfo));
> > +	  signop rhs1_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1
> (stmt)));
> > +	  signop rhs2_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2
> (stmt)));
> > +	  if (rhs1_sign != rhs2_sign)
> > +	    subtype |= optab_unsigned_to_signed;
> > +	  break;
> > +	}
> > +      default:
> > +	break;
> > +    }
> > +
> > +  return subtype;
> > +}
> > +
> >  /* Function vectorizable_reduction.
> >
> >     Check if STMT_INFO performs a reduction operation that can be
> vectorized.
> > @@ -7189,7 +7216,8 @@ vectorizable_reduction (loop_vec_info
> loop_vinfo,
> >        bool ok = true;
> >
> >        /* 4.1. check support for the operation in the loop  */
> > -      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
> > +      enum optab_subtype subtype = vect_determine_dot_kind (code,
> stmt_info);
> > +      optab optab = optab_for_tree_code (code, vectype_in, subtype);
> >        if (!optab)
> >  	{
> >  	  if (dump_enabled_p ())
> > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c index
> >
> 441d6cd28c4eaded7abd756164890dbcffd2f3b8..943c001fb13777b4d1513841f
> a84
> > 942316846d5e 100644
> > --- a/gcc/tree-vect-patterns.c
> > +++ b/gcc/tree-vect-patterns.c
> > @@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree
> > var)  static bool  vect_supportable_direct_optab_p (vec_info *vinfo,
> > tree otype, tree_code code,
> >  				 tree itype, tree *vecotype_out,
> > -				 tree *vecitype_out = NULL)
> > +				 tree *vecitype_out = NULL,
> > +				 enum optab_subtype subtype =
> optab_default)
> >  {
> >    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
> >    if (!vecitype)
> > @@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo,
> tree otype, tree_code code,
> >    if (!vecotype)
> >      return false;
> >
> > -  optab optab = optab_for_tree_code (code, vecitype, optab_default);
> > +  optab optab = optab_for_tree_code (code, vecitype, subtype);
> >    if (!optab)
> >      return false;
> >
> > @@ -487,14 +488,31 @@ vect_joust_widened_integer (tree type, bool
> > shift_p, tree op,  }
> >
> >  /* Return true if the common supertype of NEW_TYPE and
> *COMMON_TYPE
> > -   is narrower than type, storing the supertype in *COMMON_TYPE if so.
> */
> > +   is narrower than type, storing the supertype in *COMMON_TYPE if so.
> > +   If ALLOW_SHORT_SIGN_MISMATCH then accept that *COMMON_TYPE
> and NEW_TYPE
> > +   may be of different signs but equal precision.   */
> >
> >  static bool
> > -vect_joust_widened_type (tree type, tree new_type, tree
> *common_type)
> > +vect_joust_widened_type (tree type, tree new_type, tree
> *common_type,
> > +			 bool allow_short_sign_mismatch = false)
> >  {
> >    if (types_compatible_p (*common_type, new_type))
> >      return true;
> >
> > +  /* Check if the mismatch is only in the sign and if we have
> > +     allow_short_sign_mismatch then allow it.  */
> > +  if (allow_short_sign_mismatch
> > +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> > +    {
> > +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> > +      tree eq_type
> > +	= build_nonstandard_integer_type (TYPE_PRECISION (new_type),
> > +					  sign);
> > +
> > +      if (types_compatible_p (*common_type, eq_type))
> > +	return true;
> > +    }
> > +
> >    /* See if *COMMON_TYPE can hold all values of NEW_TYPE.  */
> >    if ((TYPE_PRECISION (new_type) < TYPE_PRECISION (*common_type))
> >        && (TYPE_UNSIGNED (new_type) || !TYPE_UNSIGNED
> (*common_type)))
> > @@ -532,6 +550,9 @@ vect_joust_widened_type (tree type, tree
> new_type, tree *common_type)
> >     to a type that (a) is narrower than the result of STMT_INFO and
> >     (b) can hold all leaf operand values.
> >
> > +   If ALLOW_SHORT_SIGN_MISMATCH then allow that the signs of the
> operands
> > +   may differ in signs but not in precision.
> > +
> >     Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
> >     exists.  */
> >
> > @@ -539,7 +560,8 @@ static unsigned int  vect_widened_op_tree
> > (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> >  		      tree_code widened_code, bool shift_p,
> >  		      unsigned int max_nops,
> > -		      vect_unpromoted_value *unprom, tree *common_type)
> > +		      vect_unpromoted_value *unprom, tree *common_type,
> > +		      bool allow_short_sign_mismatch = false)
> >  {
> >    /* Check for an integer operation with the right code.  */
> >    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt); @@ -600,7
> > +622,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info
> stmt_info, tree_code code,
> >  		= vinfo->lookup_def (this_unprom->op);
> >  	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
> >  					   widened_code, shift_p, max_nops,
> > -					   this_unprom, common_type);
> > +					   this_unprom, common_type,
> > +					   allow_short_sign_mismatch);
> >  	      if (nops == 0)
> >  		return 0;
> >
> > @@ -617,7 +640,8 @@ vect_widened_op_tree (vec_info *vinfo,
> stmt_vec_info stmt_info, tree_code code,
> >  	      if (i == 0)
> >  		*common_type = this_unprom->type;
> >  	      else if (!vect_joust_widened_type (type, this_unprom->type,
> > -						 common_type))
> > +						 common_type,
> > +						 allow_short_sign_mismatch))
> >  		return 0;
> >  	    }
> >  	}
> > @@ -888,21 +912,24 @@ vect_reassociating_reduction_p (vec_info *vinfo,
> >
> >     Try to find the following pattern:
> >
> > -     type x_t, y_t;
> > +     type1a x_t
> > +     type1b y_t;
> >       TYPE1 prod;
> >       TYPE2 sum = init;
> >     loop:
> >       sum_0 = phi <init, sum_1>
> >       S1  x_t = ...
> >       S2  y_t = ...
> > -     S3  x_T = (TYPE1) x_t;
> > -     S4  y_T = (TYPE1) y_t;
> > +     S3  x_T = (TYPE3) x_t;
> > +     S4  y_T = (TYPE4) y_t;
> >       S5  prod = x_T * y_T;
> >       [S6  prod = (TYPE2) prod;  #optional]
> >       S7  sum_1 = prod + sum_0;
> >
> > -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
> > -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> > +   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
> > +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
> > +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
> > +   bigger and must be the same sign. This is a special case of a
> > + reduction
> >     computation.
> >
> >     Input:
> > @@ -939,15 +966,16 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> >
> >    /* Look for the following pattern
> >            DX = (TYPE1) X;
> > -          DY = (TYPE1) Y;
> > +	  DY = (TYPE2) Y;
> >            DPROD = DX * DY;
> > -          DDPROD = (TYPE2) DPROD;
> > +	  DDPROD = (TYPE3) DPROD;
> >            sum_1 = DDPROD + sum_0;
> >       In which
> >       - DX is double the size of X
> >       - DY is double the size of Y
> >       - DX, DY, DPROD all have the same type but the sign
> > -       between DX, DY and DPROD can differ.
> > +       between DX, DY and DPROD can differ. The sign of DPROD
> > +       is one of the signs of DX or DY.
> >       - sum is the same size of DPROD or bigger
> >       - sum has been recognized as a reduction variable.
> >
> > @@ -986,14 +1014,41 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> >       inside the loop (in case we are analyzing an outer-loop).  */
> >    vect_unpromoted_value unprom0[2];
> >    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
> WIDEN_MULT_EXPR,
> > -			     false, 2, unprom0, &half_type))
> > +			     false, 2, unprom0, &half_type, true))
> >      return NULL;
> >
> > +  /* Check to see if there is a sign change happening in the operands of
> the
> > +     multiplication and pick the appropriate optab subtype.  */
> > +  enum optab_subtype subtype;
> > +  tree rhs_type1 = unprom0[0].type;
> > +  tree rhs_type2 = unprom0[1].type;
> > +  if (TYPE_SIGN (rhs_type1) == TYPE_SIGN (rhs_type2))
> > +     subtype = optab_default;
> > +  else if (TYPE_SIGN (rhs_type1) == SIGNED
> > +	   && TYPE_SIGN (rhs_type2) == UNSIGNED)
> > +     subtype = optab_signed_to_unsigned;
> > +  else if (TYPE_SIGN (rhs_type1) == UNSIGNED
> > +	   && TYPE_SIGN (rhs_type2) == SIGNED)
> > +     subtype = optab_unsigned_to_signed;
> > +  else
> > +    gcc_unreachable ();
> > +
> > +  /* If we have a sign changing dot product we need to check that the
> > +     promoted type if unsigned has at least the same precision as the final
> > +     type of the dot-product.  */
> > +  if (subtype != optab_default)
> > +    {
> > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > +	  && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > +	return NULL;
> > +    }
> > +
> >    vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
> >
> >    tree half_vectype;
> >    if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR,
> half_type,
> > -					type_out, &half_vectype))
> > +					type_out, &half_vectype, subtype))
> >      return NULL;
> >
> >    /* Get the inputs in the appropriate types.  */ @@ -1002,8 +1057,22
> > @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> >  		       unprom0, half_vectype);
> >
> >    var = vect_recog_temp_ssa_var (type, NULL);
> > +
> > +  /* If we have a sign changing dot-product the dot-product itself does any
> > +     sign conversions, so consume the type and use the unpromoted
> > + types.  */  tree mult_arg1, mult_arg2;  if (subtype ==
> > + optab_default)
> > +    {
> > +      mult_arg1 = mult_oprnd[0];
> > +      mult_arg2 = mult_oprnd[1];
> > +    }
> > +  else
> > +    {
> > +      mult_arg1 = unprom0[0].op;
> > +      mult_arg2 = unprom0[1].op;
> > +    }
> >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > -				      mult_oprnd[0], mult_oprnd[1], oprnd1);
> > +				      mult_arg1, mult_arg2, oprnd1);
> >
> >    return pattern_stmt;
> >  }
> >
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
