RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

public inbox for gcc-patches@gcc.gnu.org

From: Tamar Christina <Tamar.Christina@arm.com>
To: Tamar Christina <Tamar.Christina@arm.com>,
	Richard Biener <rguenther@suse.de>
Cc: Richard Sandiford <Richard.Sandiford@arm.com>, nd <nd@arm.com>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
Date: Fri, 4 Jun 2021 10:12:51 +0000
Message-ID: <VI1PR08MB532593667144D0CA998C2723FF3B9@VI1PR08MB5325.eurprd08.prod.outlook.com>
In-Reply-To: <VI1PR08MB53256B0E3990410089DB3333FF3D9@VI1PR08MB5325.eurprd08.prod.outlook.com>

[-- Attachment #1: Type: text/plain, Size: 69585 bytes --]

Hi Richi,

Attached is the re-spun patch.  tree_nop_conversion_p was very handy in cleaning up the patch, thanks!

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master if Richard S has no comments?

Thanks,
Tamar

gcc/ChangeLog:

	* optabs.def (usdot_prod_optab): New.
	* doc/md.texi: Document it and clarify other dot prod optabs.
	* optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
	* optabs.c (expand_widen_pattern_expr): Likewise.
	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
	* tree-vect-loop.c (vectorizable_reduction): Query dot-product kind.
	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
	optab subtype.
	(vect_joust_widened_type, vect_widened_op_tree): Optionally ignore
	mismatch types.
	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
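
For readers following the optabs-tree.c and optabs.c entries above, here is a minimal C sketch of how the patch picks a dot-product optab from the signedness of the two multiplication inputs. The names `sign_kind`, `dot_optab`, `dot_choice` and `choose_dot_optab` are hypothetical stand-ins, not GCC APIs; only the decision logic is meant to mirror the patch: equal signs keep the existing sdot/udot choice, mixed signs select usdot, and a signed first operand causes the operands to be swapped so that usdot always receives the unsigned operand first.

```c
#include <assert.h>

/* Hypothetical stand-ins for GCC's signop and the dot-product optabs;
   nothing here links against GCC.  */
typedef enum { SIGNED_OP, UNSIGNED_OP } sign_kind;
typedef enum { DOT_SDOT, DOT_UDOT, DOT_USDOT } dot_optab;

typedef struct
{
  dot_optab optab;
  int swap_operands; /* nonzero if op0 and op1 must be exchanged */
} dot_choice;

/* Pick the optab from the signs of the two multiplication inputs,
   following the shape of the patched expand_widen_pattern_expr.  */
static dot_choice
choose_dot_optab (sign_kind sign0, sign_kind sign1)
{
  dot_choice c = { DOT_SDOT, 0 };
  if (sign0 == sign1)
    /* Same sign: the existing sdot/udot selection is unchanged.  */
    c.optab = (sign0 == UNSIGNED_OP) ? DOT_UDOT : DOT_SDOT;
  else
    {
      /* Mixed signs: usdot expects the unsigned operand first, so a
         signed first operand means the two inputs are swapped.  */
      c.optab = DOT_USDOT;
      c.swap_operands = (sign0 == SIGNED_OP);
    }
  return c;
}
```

The swap keeps a single optab for both mixed-sign orders, so a target only has to implement one operand layout.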


--- inline copy of patch ---

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..9fad3322b3f1eb2a836833bb390df78f0cd9734b 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5438,13 +5438,55 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
 
 @cindex @code{sdot_prod@var{m}} instruction pattern
 @item @samp{sdot_prod@var{m}}
+
+Compute the sum of the products of two signed elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically, the expression performs the multiplication with the following signedness:
+
+@smallexample
+sdot<signed c, signed a, signed b> ==
+   res = sign-ext (a) * sign-ext (b) + c
+@dots{}
+@end smallexample
+
 @cindex @code{udot_prod@var{m}} instruction pattern
-@itemx @samp{udot_prod@var{m}}
-Compute the sum of the products of two signed/unsigned elements.
-Operand 1 and operand 2 are of the same mode. Their product, which is of a
-wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
-wider than the mode of the product. The result is placed in operand 0, which
-is of the same mode as operand 3.
+@item @samp{udot_prod@var{m}}
+
+Compute the sum of the products of two unsigned elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically, the expression performs the multiplication with the following signedness:
+
+@smallexample
+udot<unsigned c, unsigned a, unsigned b> ==
+   res = zero-ext (a) * zero-ext (b) + c
+@dots{}
+@end smallexample
+
+
+
+@cindex @code{usdot_prod@var{m}} instruction pattern
+@item @samp{usdot_prod@var{m}}
+Compute the sum of the products of elements of different signs.
+Operand 1 must be unsigned and operand 2 signed. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically, the expression performs the multiplication with the following signedness:
+
+@smallexample
+usdot<unsigned c, unsigned a, signed b> ==
+   res = zero-ext (a) * ((unsigned-conv) sign-ext (b)) + c
+@dots{}
+@end smallexample
 
 @cindex @code{ssad@var{m}} instruction pattern
 @item @samp{ssad@var{m}}
diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
index c3aaa1a416991e856d3e24da45968a92ebada82c..fbd2b06b8dbfd560dfb66b314830e6b564b37abb 100644
--- a/gcc/optabs-tree.h
+++ b/gcc/optabs-tree.h
@@ -29,7 +29,8 @@ enum optab_subtype
 {
   optab_default,
   optab_scalar,
-  optab_vector
+  optab_vector,
+  optab_vector_mixed_sign
 };
 
 /* Return the optab used for computing the given operation on the type given by
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 95ffe397c23e80c105afea52e9d47216bf52f55a..eeb5aeed3202cc6971b6447994bc5311e9c010bb 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -127,7 +127,12 @@ optab_for_tree_code (enum tree_code code, const_tree type,
       return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
 
     case DOT_PROD_EXPR:
-      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
+      {
+	if (subtype == optab_vector_mixed_sign)
+	  return usdot_prod_optab;
+
+	return (TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab);
+      }
 
     case SAD_EXPR:
       return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
diff --git a/gcc/optabs.c b/gcc/optabs.c
index f4614a394587787293dc8b680a38901f7906f61c..d9b64441d0e0726afee89dc9c937350451e7670d 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   bool sbool = false;
 
   oprnd0 = ops->op0;
+  if (nops >= 2)
+    oprnd1 = ops->op1;
+  if (nops >= 3)
+    oprnd2 = ops->op2;
+
   tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
   if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
       || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
@@ -285,6 +290,27 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
 	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
       sbool = true;
     }
+  else if (ops->code == DOT_PROD_EXPR)
+    {
+      enum optab_subtype subtype = optab_default;
+      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
+      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
+      if (sign1 == sign2)
+	;
+      else if (sign1 == SIGNED && sign2 == UNSIGNED)
+	{
+	  subtype = optab_vector_mixed_sign;
+	  /* Same as optab_vector_mixed_sign but flip the operands.  */
+	  std::swap (op0, op1);
+	}
+      else if (sign1 == UNSIGNED && sign2 == SIGNED)
+	subtype = optab_vector_mixed_sign;
+      else
+	gcc_unreachable ();
+
+      widen_pattern_optab
+	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
+    }
   else
     widen_pattern_optab
       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
@@ -298,10 +324,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   gcc_assert (icode != CODE_FOR_nothing);
 
   if (nops >= 2)
-    {
-      oprnd1 = ops->op1;
-      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
-    }
+    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
   else if (sbool)
     {
       nops = 2;
@@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
     {
       gcc_assert (tmode1 == tmode0);
       gcc_assert (op1);
-      oprnd2 = ops->op2;
       wmode = TYPE_MODE (TREE_TYPE (oprnd2));
     }
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbdb7c18615baae928 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
 OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
 OPTAB_D (udot_prod_optab, "udot_prod$I$a")
+OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
 OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..0128891852fcd74fe31cd338614e90a26256b4bd 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign *stmt)
 		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
 		 || (!INTEGRAL_TYPE_P (lhs_type)
 		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
-	    || !types_compatible_p (rhs1_type, rhs2_type)
+	    /* rhs1_type and rhs2_type may differ in sign.  */
+	    || !tree_nop_conversion_p (rhs1_type, rhs2_type)
 	    || !useless_type_conversion_p (lhs_type, rhs3_type)
 	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
 			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..756d2867b678d0d8394202c6adb03d9cd26029e7 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6662,6 +6662,12 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   bool lane_reduc_code_p
     = (code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code == SAD_EXPR);
   int op_type = TREE_CODE_LENGTH (code);
+  enum optab_subtype optab_query_kind = optab_vector;
+  if (code == DOT_PROD_EXPR
+      && TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)))
+	   != TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt))))
+    optab_query_kind = optab_vector_mixed_sign;
+
 
   scalar_dest = gimple_assign_lhs (stmt);
   scalar_type = TREE_TYPE (scalar_dest);
@@ -7189,7 +7195,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       bool ok = true;
 
       /* 4.1. check support for the operation in the loop  */
-      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
+      optab optab = optab_for_tree_code (code, vectype_in, optab_query_kind);
       if (!optab)
 	{
 	  if (dump_enabled_p ())
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 441d6cd28c4eaded7abd756164890dbcffd2f3b8..82123b96313e6783ea214b9259805d65c07d8858 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
 static bool
 vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
 				 tree itype, tree *vecotype_out,
-				 tree *vecitype_out = NULL)
+				 tree *vecitype_out = NULL,
+				 enum optab_subtype subtype = optab_default)
 {
   tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
   if (!vecitype)
@@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
   if (!vecotype)
     return false;
 
-  optab optab = optab_for_tree_code (code, vecitype, optab_default);
+  optab optab = optab_for_tree_code (code, vecitype, subtype);
   if (!optab)
     return false;
 
@@ -487,10 +488,14 @@ vect_joust_widened_integer (tree type, bool shift_p, tree op,
 }
 
 /* Return true if the common supertype of NEW_TYPE and *COMMON_TYPE
-   is narrower than type, storing the supertype in *COMMON_TYPE if so.  */
+   is narrower than type, storing the supertype in *COMMON_TYPE if so.
+   If UNPROM_TYPE then accept that *COMMON_TYPE and NEW_TYPE may be of
+   different signs but equal precision and that the resulting
+   multiplication of them be compatible with UNPROM_TYPE.   */
 
 static bool
-vect_joust_widened_type (tree type, tree new_type, tree *common_type)
+vect_joust_widened_type (tree type, tree new_type, tree *common_type,
+			 tree unprom_type = NULL)
 {
   if (types_compatible_p (*common_type, new_type))
     return true;
@@ -514,7 +519,18 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
   unsigned int precision = MAX (TYPE_PRECISION (*common_type),
 				TYPE_PRECISION (new_type));
   precision *= 2;
-  if (precision * 2 > TYPE_PRECISION (type))
+
+  /* Check if the mismatch is only in the sign and if we have
+     UNPROM_TYPE then allow it if there is enough precision to
+     not lose any information during the conversion.  */
+  if (unprom_type
+      && TYPE_SIGN (unprom_type) == SIGNED
+      && tree_nop_conversion_p (*common_type, new_type))
+	return true;
+
+  /* The resulting multiplication is unsigned; check that we have enough
+     precision to perform it.  */
+  if (precision * 2 > TYPE_PRECISION (unprom_type ? unprom_type : type))
     return false;
 
   *common_type = build_nonstandard_integer_type (precision, false);
@@ -532,6 +548,10 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
    to a type that (a) is narrower than the result of STMT_INFO and
    (b) can hold all leaf operand values.
 
+   If UNPROM_TYPE then allow the operands to differ in sign but
+   not in precision, provided that the resulting type of the
+   operation on the operands is compatible with UNPROM_TYPE.
+
    Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
    exists.  */
 
@@ -539,7 +559,8 @@ static unsigned int
 vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		      tree_code widened_code, bool shift_p,
 		      unsigned int max_nops,
-		      vect_unpromoted_value *unprom, tree *common_type)
+		      vect_unpromoted_value *unprom, tree *common_type,
+		      tree unprom_type = NULL)
 {
   /* Check for an integer operation with the right code.  */
   gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
@@ -600,7 +621,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		= vinfo->lookup_def (this_unprom->op);
 	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
 					   widened_code, shift_p, max_nops,
-					   this_unprom, common_type);
+					   this_unprom, common_type,
+					   unprom_type);
 	      if (nops == 0)
 		return 0;
 
@@ -617,7 +639,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 	      if (i == 0)
 		*common_type = this_unprom->type;
 	      else if (!vect_joust_widened_type (type, this_unprom->type,
-						 common_type))
+						 common_type, unprom_type))
 		return 0;
 	    }
 	}
@@ -799,12 +821,15 @@ vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
 }
 
 /* Invoke vect_convert_input for N elements of UNPROM and store the
-   result in the corresponding elements of RESULT.  */
+   result in the corresponding elements of RESULT.
+
+   If ALLOW_SHORT_SIGN_MISMATCH then don't convert the types if they only
+   differ by sign.  */
 
 static void
 vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
 		     tree *result, tree type, vect_unpromoted_value *unprom,
-		     tree vectype)
+		     tree vectype, bool allow_short_sign_mismatch = false)
 {
   for (unsigned int i = 0; i < n; ++i)
     {
@@ -812,8 +837,12 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
       for (j = 0; j < i; ++j)
 	if (unprom[j].op == unprom[i].op)
 	  break;
+
       if (j < i)
 	result[i] = result[j];
+      else if (allow_short_sign_mismatch
+	       && tree_nop_conversion_p (type, unprom[i].type))
+	result[i] = unprom[i].op;
       else
 	result[i] = vect_convert_input (vinfo, stmt_info,
 					type, &unprom[i], vectype);
@@ -888,21 +917,24 @@ vect_reassociating_reduction_p (vec_info *vinfo,
 
    Try to find the following pattern:
 
-     type x_t, y_t;
+     type1a x_t;
+     type1b y_t;
      TYPE1 prod;
      TYPE2 sum = init;
    loop:
      sum_0 = phi <init, sum_1>
      S1  x_t = ...
      S2  y_t = ...
-     S3  x_T = (TYPE1) x_t;
-     S4  y_T = (TYPE1) y_t;
+     S3  x_T = (TYPE3) x_t;
+     S4  y_T = (TYPE4) y_t;
      S5  prod = x_T * y_T;
      [S6  prod = (TYPE2) prod;  #optional]
      S7  sum_1 = prod + sum_0;
 
-   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
-   same size of 'TYPE1' or bigger. This is a special case of a reduction
+   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
+   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
+   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
+   bigger and must be the same sign. This is a special case of a reduction
    computation.
 
    Input:
@@ -939,15 +971,16 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
 
   /* Look for the following pattern
           DX = (TYPE1) X;
-          DY = (TYPE1) Y;
+	  DY = (TYPE2) Y;
           DPROD = DX * DY;
-          DDPROD = (TYPE2) DPROD;
+	  DDPROD = (TYPE3) DPROD;
           sum_1 = DDPROD + sum_0;
      In which
      - DX is double the size of X
      - DY is double the size of Y
      - DX, DY, DPROD all have the same type but the sign
-       between DX, DY and DPROD can differ.
+       between DX, DY and DPROD can differ. The sign of DPROD
+       is one of the signs of DX or DY.
      - sum is the same size of DPROD or bigger
      - sum has been recognized as a reduction variable.
 
@@ -986,20 +1019,29 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
      inside the loop (in case we are analyzing an outer-loop).  */
   vect_unpromoted_value unprom0[2];
   if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
-			     false, 2, unprom0, &half_type))
+			     false, 2, unprom0, &half_type,
+			     TREE_TYPE (unprom_mult.op)))
     return NULL;
 
+  /* Check to see if there is a sign change happening in the operands of the
+     multiplication and pick the appropriate optab subtype.  */
+  enum optab_subtype subtype;
+  if (TYPE_SIGN (unprom0[0].type) == TYPE_SIGN (unprom0[1].type))
+    subtype = optab_default;
+  else
+    subtype = optab_vector_mixed_sign;
+
   vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
 
   tree half_vectype;
   if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
-					type_out, &half_vectype))
+					type_out, &half_vectype, subtype))
     return NULL;
 
   /* Get the inputs in the appropriate types.  */
   tree mult_oprnd[2];
   vect_convert_inputs (vinfo, stmt_vinfo, 2, mult_oprnd, half_type,
-		       unprom0, half_vectype);
+		       unprom0, half_vectype, true);
 
   var = vect_recog_temp_ssa_var (type, NULL);
   pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,

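To make the three md.texi descriptions concrete, here is a scalar C model assuming 8-bit inputs and a 32-bit accumulator (the optabs themselves are mode-generic). It follows the operand roles stated in the usdot_prod text: operand 1 is unsigned and zero-extended, operand 2 is signed and sign-extended. The names `ref_sdot`, `ref_udot`, `ref_usdot` and `c_loop_mixed` are illustrative only. The last function is the plain C loop, whose usual integer promotions should agree with the usdot model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* sdot: both inputs sign-extended, signed accumulation.  */
static int32_t
ref_sdot (int32_t acc, const int8_t *a, const int8_t *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    acc += (int32_t) a[i] * (int32_t) b[i];
  return acc;
}

/* udot: both inputs zero-extended, unsigned accumulation.  */
static uint32_t
ref_udot (uint32_t acc, const uint8_t *a, const uint8_t *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    acc += (uint32_t) a[i] * (uint32_t) b[i];
  return acc;
}

/* usdot: operand 1 (unsigned) zero-extended, operand 2 (signed)
   sign-extended and converted to unsigned; the product and the sum
   use unsigned wrap-around arithmetic.  */
static uint32_t
ref_usdot (uint32_t acc, const uint8_t *a, const int8_t *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    acc += (uint32_t) a[i] * (uint32_t) (int32_t) b[i];
  return acc;
}

/* The source-level loop: both chars promoted to int, the product is
   added to an unsigned accumulator.  */
static uint32_t
c_loop_mixed (uint32_t res, const uint8_t *a, const int8_t *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      int av = a[i];
      int bv = b[i];
      res += av * bv;
    }
  return res;
}
```

With `a = {200, 1}` (unsigned) and `b = {-3, 5}` (signed), the mixed-sign sum is 200 * -3 + 1 * 5 = -595. Reusing sdot on the same bytes reads 200 as -56 and yields 173; reusing udot reads -3 as 253 and yields 50605. Neither matches, which is the case for a separate usdot optab.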
> -----Original Message-----
> From: Gcc-patches <gcc-patches-bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf Of
> Tamar Christina via Gcc-patches
> Sent: Wednesday, June 2, 2021 10:28 AM
> To: Richard Biener <rguenther@suse.de>
> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; nd <nd@arm.com>;
> gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant changes.
> 
> Ping,
> 
> Did you have any comments, Richard S?
> 
> Otherwise I'll proceed with respinning according to Richi's comments.
> 
> Regards,
> Tamar
> 
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Wednesday, May 26, 2021 9:57 AM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Sandiford
> > <Richard.Sandiford@arm.com>
> > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > where the sign for the multiplicant changes.
> >
> > On Tue, 25 May 2021, Tamar Christina wrote:
> >
> > > Hi Richi,
> > >
> > > Here's a respun version of the patch.
> > >
> > > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> > >
> > > Ok for master?
> >
> > index 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..13e405edd765dde704c64348d2d0b3cd88f0af7c 100644
> > --- a/gcc/tree-cfg.c
> > +++ b/gcc/tree-cfg.c
> > @@ -4421,7 +4421,9 @@ verify_gimple_assign_ternary (gassign *stmt)
> >                   && !SCALAR_FLOAT_TYPE_P (rhs1_type))
> >                  || (!INTEGRAL_TYPE_P (lhs_type)
> >                      && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> > -           || !types_compatible_p (rhs1_type, rhs2_type)
> > +           || (!types_compatible_p (rhs1_type, rhs2_type)
> > +               && TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type)
> > +               && TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (rhs2_type))
> >
> > I think this doesn't capture the constraints - instead please do
> >
> > -           || !types_compatible_p (rhs1_type, rhs2_type)
> > +           /* rhs1_type and rhs2_type may differ in sign.  */
> > +           || !tree_nop_conversion_p (rhs1_type, rhs2_type)
> >
> >
> > +/* Determine the optab_subtype to use for the given CODE and STMT.  For
> > +   most CODE this will be optab_vector, however for certain operations such as
> > +   DOT_PROD_EXPR where the operation can have different signs for the operands we
> > +   need to be able to pick the right optabs.  */
> > +
> > +static enum optab_subtype
> > +vect_determine_dot_kind (tree_code code, stmt_vec_info stmt_vinfo)
> >
> > vect_determine_optab_subkind would be a better name.  'code' is
> > redundant (or should better match stmt_vinfo->stmts code).  I wonder
> > if it might be clearer to compute the subtype where we compute 'code'
> > and the relation to stmt_info is obvious, I mean here:
> >
> >   /* 3. Check the operands of the operation.  The first operands are defined
> >         inside the loop body. The last operand is the reduction variable,
> >         which is defined by the loop-header-phi.  */
> >
> >   tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
> >   STMT_VINFO_REDUC_VECTYPE (reduc_info) = vectype_out;
> >   gassign *stmt = as_a <gassign *> (stmt_info->stmt);
> >   enum tree_code code = gimple_assign_rhs_code (stmt);
> >   bool lane_reduc_code_p
> >     = (code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code == SAD_EXPR);
> >
> > so just add
> >
> >   enum optab_subtype optab_query_kind = optab_vector;
> >   if (code == DOT_PROD_EXPR
> >       && <sign test>)
> >     optab_query_kind = optab_vector_mixed_sign;
> >
> > in this place and avoid adding the new function?
> >
> > I'm not too familiar with the pattern recog code, a 2nd eye would be
> > preferred (Richard?), but
> >
> > +  /* Check if the mismatch is only in the sign and if we have
> > +     allow_short_sign_mismatch then allow it.  */
> > +  if (unprom_type
> > +      && TYPE_SIGN (unprom_type) == SIGNED
> > +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> > +    {
> > +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> > +      tree eq_type
> > +       = build_nonstandard_integer_type (TYPE_PRECISION (new_type),
> > +                                         sign);
> > +
> > +      if (types_compatible_p (*common_type, eq_type))
> > +       return true;
> > +    }
> >
> > looks somewhat complicated - is that equal to
> >
> >   if (unprom_type
> >       && tree_nop_conversion_p (*common_type, new_type))
> >     return true;
> >
> > ?  That is, *common_type and new_type only differ in sign?
> >
> > @@ -812,8 +844,13 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
> >        for (j = 0; j < i; ++j)
> >         if (unprom[j].op == unprom[i].op)
> >           break;
> > +      bool only_sign = allow_short_sign_mismatch
> > +                      && TYPE_SIGN (type) != TYPE_SIGN (unprom[i].type)
> > +                      && TYPE_PRECISION (type) == TYPE_PRECISION (unprom[i].type);
> >
> > this could use the same tree_nop_conversion_p predicate.
> >
> > Otherwise the patch looks good.
> >
> > Thanks,
> > Richard.
> >
> >
> >
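Richard's suggestion can be sanity-checked with a small C model. It assumes, as a simplification, that for integer types tree_nop_conversion_p comes down to comparing precisions; `int_type`, `same_type`, `nop_conversion_p` and `open_coded_sign_only_mismatch` are hypothetical names, and the model ignores modes, pointers and non-integer types. Since vect_joust_widened_type has already returned true for fully compatible types before reaching this check, the two predicates only need to agree on pairs that are not identical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified integer type: precision and signedness only.
   This is not GCC's tree type.  */
typedef struct
{
  unsigned precision;
  bool is_unsigned;
} int_type;

static bool
same_type (int_type x, int_type y)
{
  return x.precision == y.precision && x.is_unsigned == y.is_unsigned;
}

/* Simplified model of tree_nop_conversion_p on integer types:
   precision must match, signedness may differ.  */
static bool
nop_conversion_p (int_type outer, int_type inner)
{
  return outer.precision == inner.precision;
}

/* The open-coded check from the earlier revision: when the signs
   differ, build a type with NEW_TYPE's precision and COMMON's
   signedness and test it against COMMON.  */
static bool
open_coded_sign_only_mismatch (int_type common, int_type new_type)
{
  if (common.is_unsigned == new_type.is_unsigned)
    return false;
  int_type eq_type = { new_type.precision, common.is_unsigned };
  return same_type (eq_type, common);
}
```

For a pair such as (unsigned 8-bit, unsigned 16-bit), the open-coded version bails out on matching signs while the simpler predicate rejects it on precision; the answer is the same either way.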
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* optabs.def (usdot_prod_optab): New.
> > > 	* doc/md.texi: Document it and clarify other dot prod optabs.
> > > 	* optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
> > > 	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> > > 	* optabs.c (expand_widen_pattern_expr): Likewise.
> > > 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > > 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> > > 	(vectorizable_reduction): Query dot-product kind.
> > > 	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
> > > 	optab subtype.
> > > 	(vect_joust_widened_type, vect_widened_op_tree): Optionally ignore
> > > 	mismatch types.
> > > 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> > >
> > >
> > > > -----Original Message-----
> > > > From: Richard Biener <rguenther@suse.de>
> > > > Sent: Monday, May 10, 2021 2:29 PM
> > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for
> > > > dot-product where the sign for the multiplicant changes.
> > > >
> > > > On Mon, 10 May 2021, Tamar Christina wrote:
> > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > Sent: Monday, May 10, 2021 12:40 PM
> > > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > > > > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-
> > product
> > > > > > where the sign for the multiplicant changes.
> > > > > >
> > > > > > On Fri, 7 May 2021, Tamar Christina wrote:
> > > > > >
> > > > > > > Hi Richi,
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > > > Sent: Friday, May 7, 2021 12:46 PM
> > > > > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > > > > > > Subject: Re: [PATCH 1/4]middle-end Vect: Add support for
> > > > > > > > dot-product where the sign for the multiplicant changes.
> > > > > > > >
> > > > > > > > On Wed, 5 May 2021, Tamar Christina wrote:
> > > > > > > >
> > > > > > > > > Hi All,
> > > > > > > > >
> > > > > > > > > This patch adds support for a dot product where the signs
> > > > > > > > > of the multiplication arguments differ, i.e. one is
> > > > > > > > > signed and one is unsigned but the precisions are the same.
> > > > > > > > >
> > > > > > > > > #define N 480
> > > > > > > > > #define SIGNEDNESS_1 unsigned
> > > > > > > > > #define SIGNEDNESS_2 signed
> > > > > > > > > #define SIGNEDNESS_3 signed
> > > > > > > > > #define SIGNEDNESS_4 unsigned
> > > > > > > > >
> > > > > > > > > SIGNEDNESS_1 int __attribute__ ((noipa))
> > > > > > > > > f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> > > > > > > > >    SIGNEDNESS_4 char *restrict b)
> > > > > > > > > {
> > > > > > > > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > > > > > > >     {
> > > > > > > > >       int av = a[i];
> > > > > > > > >       int bv = b[i];
> > > > > > > > >       SIGNEDNESS_2 short mult = av * bv;
> > > > > > > > >       res += mult;
> > > > > > > > >     }
> > > > > > > > >   return res;
> > > > > > > > > }
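The restriction on SIGNEDNESS_2 that Tamar describes for this test can be checked with a scalar C model of the loop, fixing SIGNEDNESS_1 = unsigned, SIGNEDNESS_3 = signed and SIGNEDNESS_4 = unsigned, and assuming 16-bit short and 32-bit int. The function names are illustrative. Every signed char times unsigned char product lies in [-32640, 32385], so a signed short intermediate never wraps and the loop matches a widened dot product; an unsigned short intermediate turns each negative product into a large positive value before the addition, so the results diverge.

```c
#include <assert.h>

#define N 480

/* SIGNEDNESS_2 = signed: the 16-bit intermediate holds every product.  */
static unsigned int
loop_signed_short (unsigned int res, const signed char *a,
                   const unsigned char *b)
{
  for (int i = 0; i < N; ++i)
    {
      int av = a[i];
      int bv = b[i];
      signed short mult = av * bv; /* in range, no wrap-around */
      res += mult;
    }
  return res;
}

/* SIGNEDNESS_2 = unsigned: negative products wrap modulo 2^16 here.  */
static unsigned int
loop_unsigned_short (unsigned int res, const signed char *a,
                     const unsigned char *b)
{
  for (int i = 0; i < N; ++i)
    {
      int av = a[i];
      int bv = b[i];
      unsigned short mult = av * bv;
      res += mult;
    }
  return res;
}

/* A mixed-sign dot product: sign-extend a, zero-extend b, multiply
   exactly in 32 bits and accumulate with unsigned wrap-around.  */
static unsigned int
widened_dot (unsigned int res, const signed char *a, const unsigned char *b)
{
  for (int i = 0; i < N; ++i)
    res += (unsigned int) ((int) a[i] * (int) b[i]);
  return res;
}
```

Each negative product adds an extra 65536 in the unsigned-short variant, which is why the pattern may only be recognized when the intermediate is signed.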
> > > > > > > > >
> > > > > > > > > The operations are performed as if the operands were extended
> > > > > > > > > to a 32-bit value.  As such, this operation isn't valid if there
> > > > > > > > > is an intermediate conversion to an unsigned value, i.e. if
> > > > > > > > > SIGNEDNESS_2 is unsigned.
> > > > > > > > >
> > > > > > > > > Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are
> > > > > > > > > flipped, the same optab is used but the operands are swapped
> > > > > > > > > during the optab expansion.
> > > > > > > > >
> > > > > > > > > To support this, the patch extends the dot-product detection
> > > > > > > > > to optionally ignore operands with different signs and stores
> > > > > > > > > this information in the optab subtype, which is now made a
> > > > > > > > > bitfield.
> > > > > > > > >
> > > > > > > > > The subtype can now additionally control which optab an
> > > > > > > > > EXPR can expand to.
> > > > > > > > >
> > > > > > > > > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> > > > > > > > >
> > > > > > > > > Ok for master?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Tamar
> > > > > > > > >
> > > > > > > > > gcc/ChangeLog:
> > > > > > > > >
> > > > > > > > > 	* optabs.def (usdot_prod_optab): New.
> > > > > > > > > 	* doc/md.texi: Document it.
> > > > > > > > > 	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> > > > > > > > > 	* optabs-tree.h (enum optab_subtype): Likewise.
> > > > > > > > > 	* optabs.c (expand_widen_pattern_expr): Likewise.
> > > > > > > > > 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > > > > > > > > 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> > > > > > > > > 	(vectorizable_reduction): Query dot-product kind.
> > > > > > > > > 	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
> > > > > > > > > 	optab subtype.
> > > > > > > > > 	(vect_joust_widened_type, vect_widened_op_tree): Optionally ignore
> > > > > > > > > 	mismatch types.
> > > > > > > > > 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> > > > > > > > >
> > > > > > > > > --- inline copy of patch ---
> > > > > > > > >
> > > > > > > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > > > > > > > > index d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fdf2e66bc80d7d23 100644
> > > > > > > > > --- a/gcc/doc/md.texi
> > > > > > > > > +++ b/gcc/doc/md.texi
> > > > > > > > > @@ -5440,11 +5440,13 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
> > > > > > > > >  @item @samp{sdot_prod@var{m}}
> > > > > > > > >  @cindex @code{udot_prod@var{m}} instruction pattern
> > > > > > > > >  @itemx @samp{udot_prod@var{m}}
> > > > > > > > > +@cindex @code{usdot_prod@var{m}} instruction pattern
> > > > > > > > > +@itemx @samp{usdot_prod@var{m}}
> > > > > > > > >  Compute the sum of the products of two signed/unsigned elements.
> > > > > > > > > -Operand 1 and operand 2 are of the same mode. Their product, which is of a
> > > > > > > > > -wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
> > > > > > > > > -wider than the mode of the product. The result is placed in operand 0, which
> > > > > > > > > -is of the same mode as operand 3.
> > > > > > > > > +Operand 1 and operand 2 are of the same mode but may differ in signs.
> > > > > > > > > +Their product, which is of a wider mode, is computed and added to operand 3.
> > > > > > > > > +Operand 3 is of a mode equal or wider than the mode of the product.
> > > > > > > > > +The result is placed in operand 0, which is of the same mode as operand 3.
> > > > > > > >
> > > > > > > > This doesn't really say what the 's', 'u' and 'us' specify.
> > > > > > > > Since we're doing a widen multiplication and then a
> > > > > > > > non-widening addition we only need to know the effective
> > > > > > > > sign of the multiplication so I think the existing 's' and 'u'
> > > > > > > > are enough to cover all cases?
> > > > > > >
> > > > > > > The existing 's' and 'u' enforce that both operands of the
> > > > > > > multiplication are of the same sign.  So, e.g., for 'u' both
> > > > > > > operands must be unsigned.
> > > > > > >
> > > > > > > In the `us` case one can be signed and one unsigned.
> > > > > > > Operationally, the signed value is sign-extended to the wider
> > > > > > > type and the unsigned value is zero-extended; the sign-extended
> > > > > > > value is then converted to unsigned to perform the unsigned
> > > > > > > multiplication, conforming to the C promotion rules.
> > > > > > >
> > > > > > > TL;DR: Without a new optab I can't tell during expansion
> > > > > > > which semantics the operation had at the gimple/C level, as
> > > > > > > modes don't carry signs.
> > > > > > >
> > > > > > > Long version:
> > > > > > >
> > > > > > > The problem with using the existing patterns is that, because they
> > > > > > > require `av` and `bv` to have the same sign, we can't remove the
> > > > > > > explicit sign extensions, but the multiplication must be done on
> > > > > > > the sign/zero extended char inputs in the same sign.
> > > > > > >
> > > > > > > Which means (unless I am mistaken) that to get the correct
> > > > > > > result you can use neither `udot` nor `sdot`, as semantically
> > > > > > > these would zero- or sign-extend both operands from char to int
> > > > > > > to perform the multiplication in the same sign.  Whereas in this
> > > > > > > case one operand is zero-extended and the other is sign-extended,
> > > > > > > and the result is always an unsigned number.
> > > > > > >
> > > > > > > So basically
> > > > > > >
> > > > > > > udot<unsigned c, unsigned a, unsigned b> ==
> > > > > > >    c = zero-ext (a) * zero-ext (b)
> > > > > > > sdot<signed c, signed a, signed b> ==
> > > > > > >    c = sign-ext (a) * sign-ext (b)
> > > > > > > usdot<unsigned c, unsigned a, signed b> ==
> > > > > > >    c = zero-ext (a) * ((unsigned-conv) sign-ext (b))
> > > > > > >
> > > > > > > So semantically the existing optabs won't fit here.  udot would
> > > > > > > internally promote both operands as unsigned before the
> > > > > > > multiplication, so the result of the multiplication would be
> > > > > > > wrong.  sdot would promote both to signed and do a signed
> > > > > > > multiplication, so the result is also wrong.
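To make the difference concrete, here is a minimal scalar sketch of one lane of each flavour, assuming 8-bit inputs and a 32-bit accumulator, and modelling usdot as described above (unsigned operand zero-extended, signed operand sign-extended, arithmetic modulo 2^32). The function names are hypothetical and do not correspond to GCC internals or to any target intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of sdot: both operands sign-extended,
   signed multiply-accumulate.  */
static int32_t
sdot_lane (int32_t acc, int8_t a, int8_t b)
{
  return acc + (int32_t) a * (int32_t) b;
}

/* Hypothetical model of udot: both operands zero-extended,
   unsigned multiply-accumulate.  */
static uint32_t
udot_lane (uint32_t acc, uint8_t a, uint8_t b)
{
  return acc + (uint32_t) a * (uint32_t) b;
}

/* Hypothetical model of usdot: A (unsigned) is zero-extended, B (signed)
   is sign-extended and then converted to unsigned, so the
   multiply-accumulate wraps modulo 2^32.  */
static uint32_t
usdot_lane (uint32_t acc, uint8_t a, int8_t b)
{
  return acc + (uint32_t) a * (uint32_t) (int32_t) b;
}
```

For the byte pair 0xc8/0xff, read as unsigned 200 and signed -1, `usdot_lane (0, 200, -1)` gives -200 modulo 2^32, while `udot_lane (0, 200, 255)` gives 51000 and `sdot_lane (0, -56, -1)` gives 56, so neither same-sign variant reproduces the mixed-sign product.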
> > > > > > >
> > > > > > > Now if I relax the constraint on the signs of udot and sdot there
> > > > > > > are two problems.  RTL modes don't carry signs, so a target can't
> > > > > > > tell me how the operands will be promoted.  So:
> > > > > > >
> > > > > > > 1) I can't really check which semantics the target will adhere
> > > > > > >    to on expansion.
> > > > > > > 2) At expand time I have no way to differentiate between the two
> > > > > > >    instruction variants: given just modes, I can't tell whether I
> > > > > > >    expand to the normal dot-product or the new instruction.
> > > > > >
> > > > > > Ah, OK.  Indeed with such a weird instruction the new variant makes
> > > > > > sense.  Still, can you please amend the optab documentation to say
> > > > > > which operand is unsigned and which is signed?  Just 'may differ in
> > > > > > signs' is bad.
> > > > >
> > > > > Sure, will expand on it.
> > > > >
> > > > > >
> > > > > > Since the multiplication is commutative I wonder why you need to
> > > > > > handle both signed_to_unsigned and unsigned_to_signed - we should
> > > > > > just enforce a canonical order (like the optab does).
> > > > >
> > > > > Sure, I thought it would have been better to change the order at
> > > > > expand time, but can do so at detection time.
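A minimal sketch of what enforcing one canonical order at detection time could look like, assuming (as the quoted expand code does when it swaps operands) that the unsigned operand comes first. `struct mult_operand` and `canonicalize_mixed_sign` are hypothetical stand-ins, not GCC types or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one multiplication operand; only its name
   and signedness matter for this sketch.  */
struct mult_operand
{
  const char *name;
  bool is_unsigned;
};

/* Put a mixed-sign pair into the canonical order (unsigned first, signed
   second) so that a single mixed-sign subtype is enough.  Same-sign pairs
   and pairs already in canonical order are left alone.  Returns true if
   the operands were swapped.  */
static bool
canonicalize_mixed_sign (struct mult_operand *op0, struct mult_operand *op1)
{
  assert (op0 != NULL && op1 != NULL);

  /* Nothing to do unless op0 is signed and op1 is unsigned.  */
  if (op0->is_unsigned || !op1->is_unsigned)
    return false;

  /* The multiplication is commutative, so swapping preserves the value.  */
  struct mult_operand tmp = *op0;
  *op0 = *op1;
  *op1 = tmp;
  return true;
}
```

Doing the swap once at detection means the optab query and the expansion only ever see one operand ordering.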
> > > > >
> > > > > > I also think it's a particularly bad fit for the bad
> > > > > > optab_for_tree_code API - would any of that improve when using
> > > > > > a direct internal function here?
> > > > >
> > > > > Somewhat, but this has considerable knock-on effects, e.g.
> > > > > currently DOT_PROD is treated as a widening operation and so is
> > > > > handled by supportable_widening_operation, which does not support
> > > > > calls.  There is a significant number of places that work on the
> > > > > tree EXPR (including constant folding) which would all need to be
> > > > > changed.
> > > > >
> > > > > > In particular all the changes around optab_subtype look like
> > > > > > they make a bad API worse ... at least a single
> > > > > > optab_vector_mixed_sign should suffice here, no need to make it a
> > > > > > flags kind.
> > > > >
> > > > > The reason I did so is that, depending on where the query is
> > > > > done, it currently uses different subtypes.  During detection
> > > > > it uses optab_default, and during vectorization optab_vector.
> > > > > For this instruction this difference doesn't seem to be used,
> > > > > but I did not want to lose this information in case something
> > > > > depended on it.
> > > > >
> > > > > But can make it just one.
> > > > >
> > > > > >
> > > > > > +  /* If we have a sign changing dot product we need to check that the
> > > > > > +     promoted type if unsigned has at least the same precision as the final
> > > > > > +     type of the dot-product.  */
> > > > > > +  if (subtype != optab_default)
> > > > > > +    {
> > > > > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > > > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > > > > +         && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > > > > > +       return NULL;
> > > > > > +    }
> > > > > >
> > > > > > I don't understand this - how do we ever arrive at a result with
> > > > > > less precision?
> > > > >
> > > > > The user could have manually truncated the result.  E.g. in the
> > > > > detection code, notice `mult`:
> > > > >
> > > > >       int av = a[i];
> > > > >       int bv = b[i];
> > > > >       SIGNEDNESS_2 short mult = av * bv;
> > > > >       res += mult;
> > > > >
> > > > > which is a short, so it's manually truncating the multiplication,
> > > > > which the instruction does in int.  If `mult` is unsigned then it
> > > > > will truncate the result if the signed input to usdot was
> > > > > negative, unless the intermediate calculation is of the same
> > > > > precision as the instruction.  I.e. if mult is unsigned int then
> > > > > there's no truncation going on, it's casting from int to unsigned
> > > > > int, so it's safe to use usdot then, as the instruction does the
> > > > > same thing internally.
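The truncation argument can be checked with a small scalar sketch, assuming 16-bit `short` and 32-bit `int` (as on aarch64). The helper names are hypothetical; each mirrors the quoted loop body with a signed `a`, an unsigned `b` and an unsigned `mult`, differing only in the width of `mult`.

```c
#include <assert.h>
#include <stdint.h>

/* MULT is an unsigned short: the int product is truncated modulo 2^16
   before it reaches the accumulator.  */
static uint32_t
accumulate_via_ushort (uint32_t res, int8_t a, uint8_t b)
{
  int av = a;                     /* sign-extended */
  int bv = b;                     /* zero-extended */
  unsigned short mult = av * bv;  /* truncated to 16 bits */
  return res + mult;
}

/* MULT is an unsigned int: same precision as the accumulator, so the
   conversion from int only reinterprets the bits (modulo 2^32).  */
static uint32_t
accumulate_via_uint (uint32_t res, int8_t a, uint8_t b)
{
  int av = a;
  int bv = b;
  unsigned int mult = av * bv;    /* no truncation */
  return res + mult;
}
```

With `a = -1` and `b = 200`, the `unsigned int` version accumulates -200 modulo 2^32, the same bits a mixed-sign dot product would add, whereas the `unsigned short` version adds 65336. For non-negative products the two agree.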
> > > >
> > > > It looks to me like we should only ever allow sign changes from
> > > > the multiplication result to the sum.  At least your example
> > > > above is not special to mixed sign multiplications, no?
> > > >
> > > > > > And why's this not an issue for signed multiplication?
> > > > >
> > > > > It is, but in that case it's handled by the type jousting, which
> > > > > doesn't allow the type mismatch. i.e.
> > > > >
> > > > > #define SIGNEDNESS_1 unsigned
> > > > > #define SIGNEDNESS_2 unsigned
> > > > > #define SIGNEDNESS_3 signed
> > > > > #define SIGNEDNESS_4 signed
> > > > >
> > > > > SIGNEDNESS_1 int __attribute__ ((noipa))
> > > > > f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> > > > >    SIGNEDNESS_4 char *restrict b)
> > > > > {
> > > > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > > >     {
> > > > >       int av = a[i];
> > > > >       int bv = b[i];
> > > > >       SIGNEDNESS_2 short mult = av * bv;
> > > > >       res += mult;
> > > > >     }
> > > > >   return res;
> > > > > }
> > > > >
> > > > > is also not detected as a dot product.  Adding the carve-out to
> > > > > the widening multiplication detection now lets this case through,
> > > > > so I handle it in the detection code.  Thinking about it now, it
> > > > > seems more logical to handle this case inside the type jousting
> > > > > code, as I don't think it's ever something you'd want.
> > > >
> > > > Yeah, I think we only need to look through sign changes on the
> > > > multiplication result.
> > > >
> > > > > > Also...
> > > > > >
> > > > > > +  /* If we have a sign changing dot-product the dot-product itself does any
> > > > > > +     sign conversions, so consume the type and use the unpromoted types.  */
> > > > > > +  tree mult_arg1, mult_arg2;
> > > > > > +  if (subtype == optab_default)
> > > > > > +    {
> > > > > > +      mult_arg1 = mult_oprnd[0];
> > > > > > +      mult_arg2 = mult_oprnd[1];
> > > > > > +    }
> > > > > > +  else
> > > > > > +    {
> > > > > > +      mult_arg1 = unprom0[0].op;
> > > > > > +      mult_arg2 = unprom0[1].op;
> > > > > > +    }
> > > > > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > > > > -                                     mult_oprnd[0], mult_oprnd[1], oprnd1);
> > > > > > +                                     mult_arg1, mult_arg2, oprnd1);
> > > > > >
> > > > > > I thought DOT_PROD always performs the promotion.  Maybe mult_oprnd
> > > > > > and unprom0 are just misnamed here?
> > > > >
> > > > > Somewhat.  In a normal dot-product the signs of the multiplication
> > > > > operands are the same as those of the "unpromoted" types, so after
> > > > > vect_convert_input these two types are the same.
> > > > >
> > > > > However, because here the sign changes, and to maintain the
> > > > > semantics of the C code, there's an extra conversion to get the
> > > > > arguments in the same sign.  That needs to be stripped before they
> > > > > are given to the instruction, which does the conversion internally.
> > > >
> > > > Yes, but then why's that not done by the detection code?  That is,
> > > > does it (mis-)handle the (int)short_a * (int)(unsigned
> > > > short)short_b where we'd want the mixed-sign handling and not
> > > > strip the unsigned short conversion from short_b?
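A small sketch of why the `(unsigned short)` conversion in that example must not be stripped, assuming 16-bit `short` and 32-bit `int`: the conversion turns a sign extension into a zero extension, so the two products differ whenever `short_b` is negative. The function names are hypothetical.

```c
#include <assert.h>

/* The expression from the question: short_b goes through unsigned short
   before widening, so it is zero-extended rather than sign-extended.  */
static int
mixed_sign_product (short short_a, short short_b)
{
  return (int) short_a * (int) (unsigned short) short_b;
}

/* What the product would become if the (unsigned short) conversion were
   stripped and both operands were treated as signed.  */
static int
same_sign_product (short short_a, short short_b)
{
  return (int) short_a * (int) short_b;
}
```

Since |short_a| <= 32768 and (unsigned short) short_b <= 65535, the mixed-sign product always fits in a 32-bit int, so neither helper can overflow; with `short_a = 2, short_b = -1` they return 131070 and -2 respectively.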
> > > >
> > > > Richard.
> > > >
> > > > >
> > > > > Regards,
> > > > > Tamar
> > > > >
> > > > > >
> > > > > > Richard.
> > > > > >
> > > > > > > Regards,
> > > > > > > Tamar
> > > > > > >
> > > > > > > >
> > > > > > > > The tree.def docs say the sum is also possibly widening
> > > > > > > > but I don't see this covered by the optab so we should
> > > > > > > > eventually remove this feature from the tree side.  In
> > > > > > > > fact the tree-cfg.c verifier requires the addition to be
> > > > > > > > not widening - thus only tree.def needs adjustment.
> > > > > > > >
> > > > > > > > >  @cindex @code{ssad@var{m}} instruction pattern
> > > > > > > > >  @item @samp{ssad@var{m}}
> > > > > > > > > diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> > > > > > > > > index c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f1990e0548ba08d 100644
> > > > > > > > > --- a/gcc/optabs-tree.h
> > > > > > > > > +++ b/gcc/optabs-tree.h
> > > > > > > > > @@ -27,11 +27,29 @@ along with GCC; see the file COPYING3.  If not see
> > > > > > > > >     shift amount vs. machines that take a vector for the shift amount.  */
> > > > > > > > >  enum optab_subtype
> > > > > > > > >  {
> > > > > > > > > -  optab_default,
> > > > > > > > > -  optab_scalar,
> > > > > > > > > -  optab_vector
> > > > > > > > > +  optab_default = 1 << 0,
> > > > > > > > > +  optab_scalar = 1 << 1,
> > > > > > > > > +  optab_vector = 1 << 2,
> > > > > > > > > +  optab_signed_to_unsigned = 1 << 3,
> > > > > > > > > +  optab_unsigned_to_signed = 1 << 4
> > > > > > > > >  };
> > > > > > > > >
> > > > > > > > > +/* Override the OrEqual-operator so we can use optab_subtype as a bit flag.  */
> > > > > > > > > +inline enum optab_subtype&
> > > > > > > > > +operator |= (enum optab_subtype& a, enum optab_subtype b)
> > > > > > > > > +{
> > > > > > > > > +    return a = static_cast<optab_subtype>(static_cast<int>(a)
> > > > > > > > > +					  | static_cast<int>(b));
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +/* Override the Or-operator so we can use optab_subtype as a bit flag.  */
> > > > > > > > > +inline enum optab_subtype
> > > > > > > > > +operator | (enum optab_subtype a, enum optab_subtype b)
> > > > > > > > > +{
> > > > > > > > > +    return static_cast<optab_subtype>(static_cast<int>(a)
> > > > > > > > > +				      | static_cast<int>(b));
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > >  /* Return the optab used for computing the given operation on the type given by
> > > > > > > > >     the second argument.  The third argument distinguishes between the types of
> > > > > > > > >     vector shifts and rotates.  */
> > > > > > > > > diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
> > > > > > > > > index 95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea1e5c22b7453072 100644
> > > > > > > > > --- a/gcc/optabs-tree.c
> > > > > > > > > +++ b/gcc/optabs-tree.c
> > > > > > > > > @@ -127,7 +127,17 @@ optab_for_tree_code (enum tree_code code, const_tree type,
> > > > > > > > >        return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
> > > > > > > > >
> > > > > > > > >      case DOT_PROD_EXPR:
> > > > > > > > > -      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
> > > > > > > > > +      {
> > > > > > > > > +	gcc_assert (subtype & optab_default
> > > > > > > > > +		    || subtype & optab_vector
> > > > > > > > > +		    || subtype & optab_signed_to_unsigned
> > > > > > > > > +		    || subtype & optab_unsigned_to_signed);
> > > > > > > > > +
> > > > > > > > > +	if (subtype & (optab_unsigned_to_signed | optab_signed_to_unsigned))
> > > > > > > > > +	  return usdot_prod_optab;
> > > > > > > > > +
> > > > > > > > > +	return (TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab);
> > > > > > > > > +      }
> > > > > > > > >
> > > > > > > > >      case SAD_EXPR:
> > > > > > > > >        return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
> > > > > > > > > diff --git a/gcc/optabs.c b/gcc/optabs.c
> > > > > > > > > index f4614a394587787293dc8b680a38901f7906f61c..2e18b76de1412eab71971753ac678597c0d00098 100644
> > > > > > > > > --- a/gcc/optabs.c
> > > > > > > > > +++ b/gcc/optabs.c
> > > > > > > > > @@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
> > > > > > > > >    bool sbool = false;
> > > > > > > > >
> > > > > > > > >    oprnd0 = ops->op0;
> > > > > > > > > +  if (nops >= 2)
> > > > > > > > > +    oprnd1 = ops->op1;
> > > > > > > > > +  if (nops >= 3)
> > > > > > > > > +    oprnd2 = ops->op2;
> > > > > > > > > +
> > > > > > > > >    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
> > > > > > > > >    if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
> > > > > > > > >        || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
> > > > > > > > > @@ -285,6 +290,27 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
> > > > > > > > >  	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
> > > > > > > > >        sbool = true;
> > > > > > > > >      }
> > > > > > > > > +  else if (ops->code == DOT_PROD_EXPR)
> > > > > > > > > +    {
> > > > > > > > > +      enum optab_subtype subtype = optab_default;
> > > > > > > > > +      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
> > > > > > > > > +      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
> > > > > > > > > +      if (sign1 == sign2)
> > > > > > > > > +	;
> > > > > > > > > +      else if (sign1 == SIGNED && sign2 == UNSIGNED)
> > > > > > > > > +	{
> > > > > > > > > +	  subtype |= optab_signed_to_unsigned;
> > > > > > > > > +	  /* Same as optab_unsigned_to_signed but flip the operands.  */
> > > > > > > > > +	  std::swap (op0, op1);
> > > > > > > > > +	}
> > > > > > > > > +      else if (sign1 == UNSIGNED && sign2 == SIGNED)
> > > > > > > > > +	subtype |= optab_unsigned_to_signed;
> > > > > > > > > +      else
> > > > > > > > > +	gcc_unreachable ();
> > > > > > > > > +
> > > > > > > > > +      widen_pattern_optab
> > > > > > > > > +	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> > > > > > > > > +    }
> > > > > > > > >    else
> > > > > > > > >      widen_pattern_optab
> > > > > > > > >        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
> > > > > > > > > @@ -298,10 +324,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
> > > > > > > > >    gcc_assert (icode != CODE_FOR_nothing);
> > > > > > > > >
> > > > > > > > >    if (nops >= 2)
> > > > > > > > > -    {
> > > > > > > > > -      oprnd1 = ops->op1;
> > > > > > > > > -      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > > > > > > -    }
> > > > > > > > > +    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > > > > > >    else if (sbool)
> > > > > > > > >      {
> > > > > > > > >        nops = 2;
> > > > > > > > > @@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
> > > > > > > > >      {
> > > > > > > > >        gcc_assert (tmode1 == tmode0);
> > > > > > > > >        gcc_assert (op1);
> > > > > > > > > -      oprnd2 = ops->op2;
> > > > > > > > >        wmode = TYPE_MODE (TREE_TYPE (oprnd2));
> > > > > > > > >      }
> > > > > > > > >
> > > > > > > > > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > > > > > > > > index b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbdb7c18615baae928 100644
> > > > > > > > > --- a/gcc/optabs.def
> > > > > > > > > +++ b/gcc/optabs.def
> > > > > > > > > @@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
> > > > > > > > >  OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
> > > > > > > > >  OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
> > > > > > > > >  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> > > > > > > > > +OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> > > > > > > > >  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
> > > > > > > > >  OPTAB_D (usad_optab, "usad$I$a")
> > > > > > > > >  OPTAB_D (ssad_optab, "ssad$I$a")
> > > > > > > > > diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> > > > > > > > > index 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..58b55bb648ad97d514f1fa18bb00808fd2678b42 100644
> > > > > > > > > --- a/gcc/tree-cfg.c
> > > > > > > > > +++ b/gcc/tree-cfg.c
> > > > > > > > > @@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign *stmt)
> > > > > > > > >  		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
> > > > > > > > >  		 || (!INTEGRAL_TYPE_P (lhs_type)
> > > > > > > > >  		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> > > > > > > > > -	    || !types_compatible_p (rhs1_type, rhs2_type)
> > > > > > > > > +	    || (!types_compatible_p (rhs1_type, rhs2_type)
> > > > > > > > > +		&& TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type))
> > > > > > > >
> > > > > > > > That's not restrictive enough.  I suggest you use
> > > > > > > >
> > > > > > > >             && element_precision (rhs1_type) != element_precision (rhs2_type)
> > > > > > > >
> > > > > > > > instead.
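To see why the sign test is too permissive, here is a deliberately simplified model of the two verifier conditions on scalar integer types. `struct int_type` and the helper names are hypothetical stand-ins for GCC trees and `types_compatible_p`; for vectors, `element_precision` would look at the element type. Each helper returns true when the statement would be rejected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified scalar integer type.  */
struct int_type
{
  unsigned precision;
  bool is_unsigned;
};

/* Stand-in for types_compatible_p on this toy model.  */
static bool
compatible_p (struct int_type a, struct int_type b)
{
  return a.precision == b.precision && a.is_unsigned == b.is_unsigned;
}

/* Condition as posted: reject only incompatible operands of equal sign,
   so operands differing in sign pass whatever their precision.  */
static bool
rejected_as_posted (struct int_type rhs1, struct int_type rhs2)
{
  return !compatible_p (rhs1, rhs2) && rhs1.is_unsigned == rhs2.is_unsigned;
}

/* Condition as suggested: reject incompatible operands whose precision
   differs, so only a pure sign difference is tolerated.  */
static bool
rejected_as_suggested (struct int_type rhs1, struct int_type rhs2)
{
  return !compatible_p (rhs1, rhs2) && rhs1.precision != rhs2.precision;
}
```

An 8-bit unsigned/8-bit signed pair is accepted by both checks, but an 8-bit unsigned operand paired with a 16-bit signed one is only caught by the precision-based check.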
> > > > > > > >
> > > > > > > > As said, I'm not sure all the changes in this patch are required.
> > > > > > > >
> > > > > > > > Please elaborate.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Richard.
> > > > > > > >
> > > > > > > > >  	    || !useless_type_conversion_p (lhs_type, rhs3_type)
> > > > > > > > >  	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
> > > > > > > > >  			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
> > > > > > > > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> > > > > > > > > index 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..cb8f5fbb6abca181c4171194d19fec29ec6e4176 100644
> > > > > > > > > --- a/gcc/tree-vect-loop.c
> > > > > > > > > +++ b/gcc/tree-vect-loop.c
> > > > > > > > > @@ -6401,6 +6401,33 @@ build_vect_cond_expr (enum tree_code code, tree vop[3], tree mask,
> > > > > > > > >      }
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > +/* Determine the optab_subtype to use for the given CODE and STMT.  For
> > > > > > > > > +   most CODE this will be optab_vector, however for certain operations such as
> > > > > > > > > +   DOT_PROD_EXPR where the operation can different signs for the operands we
> > > > > > > > > +   need to be able to pick the right optabs.  */
> > > > > > > > > +
> > > > > > > > > +static enum optab_subtype
> > > > > > > > > +vect_determine_dot_kind (tree_code code, stmt_vec_info stmt_vinfo)
> > > > > > > > > +{
> > > > > > > > > +  enum optab_subtype subtype = optab_vector;
> > > > > > > > > +  switch (code)
> > > > > > > > > +    {
> > > > > > > > > +      case DOT_PROD_EXPR:
> > > > > > > > > +	{
> > > > > > > > > +	  gassign *stmt = as_a <gassign *> (STMT_VINFO_STMT (stmt_vinfo));
> > > > > > > > > +	  signop rhs1_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)));
> > > > > > > > > +	  signop rhs2_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt)));
> > > > > > > > > +	  if (rhs1_sign != rhs2_sign)
> > > > > > > > > +	    subtype |= optab_unsigned_to_signed;
> > > > > > > > > +	  break;
> > > > > > > > > +	}
> > > > > > > > > +      default:
> > > > > > > > > +	break;
> > > > > > > > > +    }
> > > > > > > > > +
> > > > > > > > > +  return subtype;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > >  /* Function vectorizable_reduction.
> > > > > > > > >
> > > > > > > > >     Check if STMT_INFO performs a reduction operation that can be vectorized.
> > > > > > > > > @@ -7189,7 +7216,8 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
> > > > > > > > >        bool ok = true;
> > > > > > > > >
> > > > > > > > >        /* 4.1. check support for the operation in the loop  */
> > > > > > > > > -      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
> > > > > > > > > +      enum optab_subtype subtype = vect_determine_dot_kind (code, stmt_info);
> > > > > > > > > +      optab optab = optab_for_tree_code (code, vectype_in, subtype);
> > > > > > > > >        if (!optab)
> > > > > > > > >  	{
> > > > > > > > >  	  if (dump_enabled_p ())
> > > > > > > > > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> > > > > > > > > index 441d6cd28c4eaded7abd756164890dbcffd2f3b8..943c001fb13777b4d1513841fa84942316846d5e 100644
> > > > > > > > > --- a/gcc/tree-vect-patterns.c
> > > > > > > > > +++ b/gcc/tree-vect-patterns.c
> > > > > > > > > @@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
> > > > > > > > >  static bool
> > > > > > > > >  vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
> > > > > > > > >  				 tree itype, tree *vecotype_out,
> > > > > > > > > -				 tree *vecitype_out = NULL)
> > > > > > > > > +				 tree *vecitype_out = NULL,
> > > > > > > > > +				 enum optab_subtype subtype = optab_default)
> > > > > > > > >  {
> > > > > > > > >    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
> > > > > > > > >    if (!vecitype)
> > > > > > > > > @@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
> > > > > > > > >    if (!vecotype)
> > > > > > > > >      return false;
> > > > > > > > >
> > > > > > > > > -  optab optab = optab_for_tree_code (code, vecitype, optab_default);
> > > > > > > > > +  optab optab = optab_for_tree_code (code, vecitype, subtype);
> > > > > > > > >    if (!optab)
> > > > > > > > >      return false;
> > > > > > > > >
> > > > > > > > > @@ -487,14 +488,31 @@ vect_joust_widened_integer (tree type, bool shift_p, tree op,
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > >  /* Return true if the common supertype of NEW_TYPE and *COMMON_TYPE
> > > > > > > > > -   is narrower than type, storing the supertype in *COMMON_TYPE if so.  */
> > > > > > > > > +   is narrower than type, storing the supertype in *COMMON_TYPE if so.
> > > > > > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then accept that *COMMON_TYPE and NEW_TYPE
> > > > > > > > > +   may be of different signs but equal precision.   */
> > > > > > > > >
> > > > > > > > >  static bool
> > > > > > > > > -vect_joust_widened_type (tree type, tree new_type, tree *common_type)
> > > > > > > > > +vect_joust_widened_type (tree type, tree new_type, tree *common_type,
> > > > > > > > > +			 bool allow_short_sign_mismatch = false)
> > > > > > > > >  {
> > > > > > > > >    if (types_compatible_p (*common_type, new_type))
> > > > > > > > >      return true;
> > > > > > > > >
> > > > > > > > > +  /* Check if the mismatch is only in the sign and if we have
> > > > > > > > > +     allow_short_sign_mismatch then allow it.  */
> > > > > > > > > +  if (allow_short_sign_mismatch
> > > > > > > > > +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> > > > > > > > > +    {
> > > > > > > > > +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> > > > > > > > > +      tree eq_type
> > > > > > > > > +	= build_nonstandard_integer_type (TYPE_PRECISION (new_type),
> > > > > > > > > +					  sign);
> > > > > > > > > +
> > > > > > > > > +      if (types_compatible_p (*common_type, eq_type))
> > > > > > > > > +	return true;
> > > > > > > > > +    }
> > > > > > > > > +
> > > > > > > > >    /* See if *COMMON_TYPE can hold all values of NEW_TYPE.  */
> > > > > > > > >    if ((TYPE_PRECISION (new_type) < TYPE_PRECISION (*common_type))
> > > > > > > > >        && (TYPE_UNSIGNED (new_type) || !TYPE_UNSIGNED (*common_type)))
> > > > > > > > > @@ -532,6 +550,9 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
> > > > > > > > >     to a type that (a) is narrower than the result of STMT_INFO and
> > > > > > > > >     (b) can hold all leaf operand values.
> > > > > > > > >
> > > > > > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then allow that the signs of the operands
> > > > > > > > > +   may differ in signs but not in precision.
> > > > > > > > > +
> > > > > > > > >     Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
> > > > > > > > >     exists.  */
> > > > > > > > >
> > > > > > > > > @@ -539,7 +560,8 @@ static unsigned int
> > > > > > > > >  vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> > > > > > > > >  		      tree_code widened_code, bool shift_p,
> > > > > > > > >  		      unsigned int max_nops,
> > > > > > > > > -		      vect_unpromoted_value *unprom, tree *common_type)
> > > > > > > > > +		      vect_unpromoted_value *unprom, tree *common_type,
> > > > > > > > > +		      bool allow_short_sign_mismatch = false)
> > > > > > > > >  {
> > > > > > > > >    /* Check for an integer operation with the right code.  */
> > > > > > > > >    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
> > > > > > > > > @@ -600,7 +622,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> > > > > > > > >  		= vinfo->lookup_def (this_unprom->op);
> > > > > > > > >  	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
> > > > > > > > >  					   widened_code, shift_p, max_nops,
> > > > > > > > > -					   this_unprom, common_type);
> > > > > > > > > +					   this_unprom, common_type,
> > > > > > > > > +					   allow_short_sign_mismatch);
> > > > > > > > >  	      if (nops == 0)
> > > > > > > > >  		return 0;
> > > > > > > > >
> > > > > > > > > @@ -617,7 +640,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> > > > > > > > >  	      if (i == 0)
> > > > > > > > >  		*common_type = this_unprom->type;
> > > > > > > > >  	      else if (!vect_joust_widened_type (type, this_unprom->type,
> > > > > > > > > -						 common_type))
> > > > > > > > > +						 common_type,
> > > > > > > > > +						 allow_short_sign_mismatch))
> > > > > > > > >  		return 0;
> > > > > > > > >  	    }
> > > > > > > > >  	}
> > > > > > > > > @@ -888,21 +912,24 @@ vect_reassociating_reduction_p (vec_info *vinfo,
> > > > > > > > >     Try to find the following pattern:
> > > > > > > > >
> > > > > > > > > -     type x_t, y_t;
> > > > > > > > > +     type1a x_t
> > > > > > > > > +     type1b y_t;
> > > > > > > > >       TYPE1 prod;
> > > > > > > > >       TYPE2 sum = init;
> > > > > > > > >     loop:
> > > > > > > > >       sum_0 = phi <init, sum_1>
> > > > > > > > >       S1  x_t = ...
> > > > > > > > >       S2  y_t = ...
> > > > > > > > > -     S3  x_T = (TYPE1) x_t;
> > > > > > > > > -     S4  y_T = (TYPE1) y_t;
> > > > > > > > > +     S3  x_T = (TYPE3) x_t;
> > > > > > > > > +     S4  y_T = (TYPE4) y_t;
> > > > > > > > >       S5  prod = x_T * y_T;
> > > > > > > > >       [S6  prod = (TYPE2) prod;  #optional]
> > > > > > > > >       S7  sum_1 = prod + sum_0;
> > > > > > > > >
> > > > > > > > > -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
> > > > > > > > > -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> > > > > > > > > +   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
> > > > > > > > > +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
> > > > > > > > > +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
> > > > > > > > > +   bigger and must be the same sign. This is a special case of a reduction
> > > > > > > > >     computation.
> > > > > > > > >
> > > > > > > > >     Input:
> > > > > > > > > @@ -939,15 +966,16 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> > > > > > > > >
> > > > > > > > >    /* Look for the following pattern
> > > > > > > > >            DX = (TYPE1) X;
> > > > > > > > > -          DY = (TYPE1) Y;
> > > > > > > > > +	  DY = (TYPE2) Y;
> > > > > > > > >            DPROD = DX * DY;
> > > > > > > > > -          DDPROD = (TYPE2) DPROD;
> > > > > > > > > +	  DDPROD = (TYPE3) DPROD;
> > > > > > > > >            sum_1 = DDPROD + sum_0;
> > > > > > > > >       In which
> > > > > > > > >       - DX is double the size of X
> > > > > > > > >       - DY is double the size of Y
> > > > > > > > >       - DX, DY, DPROD all have the same type but the sign
> > > > > > > > > -       between DX, DY and DPROD can differ.
> > > > > > > > > +       between DX, DY and DPROD can differ. The sign of DPROD
> > > > > > > > > +       is one of the signs of DX or DY.
> > > > > > > > >       - sum is the same size of DPROD or bigger
> > > > > > > > >       - sum has been recognized as a reduction variable.
> > > > > > > > >
> > > > > > > > > @@ -986,14 +1014,41 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> > > > > > > > >       inside the loop (in case we are analyzing an outer-loop).  */
> > > > > > > > >    vect_unpromoted_value unprom0[2];
> > > > > > > > >    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
> > > > > > > > > -			     false, 2, unprom0, &half_type))
> > > > > > > > > +			     false, 2, unprom0, &half_type, true))
> > > > > > > > >      return NULL;
> > > > > > > > >
> > > > > > > > > +  /* Check to see if there is a sign change happening in the operands of the
> > > > > > > > > +     multiplication and pick the appropriate optab subtype.  */
> > > > > > > > > +  enum optab_subtype subtype;
> > > > > > > > > +  tree rhs_type1 = unprom0[0].type;
> > > > > > > > > +  tree rhs_type2 = unprom0[1].type;
> > > > > > > > > +  if (TYPE_SIGN (rhs_type1) == TYPE_SIGN (rhs_type2))
> > > > > > > > > +     subtype = optab_default;
> > > > > > > > > +  else if (TYPE_SIGN (rhs_type1) == SIGNED
> > > > > > > > > +	   && TYPE_SIGN (rhs_type2) == UNSIGNED)
> > > > > > > > > +     subtype = optab_signed_to_unsigned;
> > > > > > > > > +  else if (TYPE_SIGN (rhs_type1) == UNSIGNED
> > > > > > > > > +	   && TYPE_SIGN (rhs_type2) == SIGNED)
> > > > > > > > > +     subtype = optab_unsigned_to_signed;
> > > > > > > > > +  else
> > > > > > > > > +    gcc_unreachable ();
> > > > > > > > > +
> > > > > > > > > +  /* If we have a sign changing dot product we need to check that the
> > > > > > > > > +     promoted type if unsigned has at least the same precision as the final
> > > > > > > > > +     type of the dot-product.  */
> > > > > > > > > +  if (subtype != optab_default)
> > > > > > > > > +    {
> > > > > > > > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > > > > > > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > > > > > > > +	  && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > > > > > > > > +	return NULL;
> > > > > > > > > +    }
> > > > > > > > > +
> > > > > > > > >    vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
> > > > > > > > >
> > > > > > > > >    tree half_vectype;
> > > > > > > > >    if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
> > > > > > > > > -					type_out, &half_vectype))
> > > > > > > > > +					type_out, &half_vectype, subtype))
> > > > > > > > >      return NULL;
> > > > > > > > >
> > > > > > > > >    /* Get the inputs in the appropriate types.  */
> > > > > > > > > @@ -1002,8 +1057,22 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> > > > > > > > >  		       unprom0, half_vectype);
> > > > > > > > >
> > > > > > > > >    var = vect_recog_temp_ssa_var (type, NULL);
> > > > > > > > > +
> > > > > > > > > +  /* If we have a sign changing dot-product the dot-product itself does any
> > > > > > > > > +     sign conversions, so consume the type and use the unpromoted types.  */
> > > > > > > > > +  tree mult_arg1, mult_arg2;
> > > > > > > > > +  if (subtype == optab_default)
> > > > > > > > > +    {
> > > > > > > > > +      mult_arg1 = mult_oprnd[0];
> > > > > > > > > +      mult_arg2 = mult_oprnd[1];
> > > > > > > > > +    }
> > > > > > > > > +  else
> > > > > > > > > +    {
> > > > > > > > > +      mult_arg1 = unprom0[0].op;
> > > > > > > > > +      mult_arg2 = unprom0[1].op;
> > > > > > > > > +    }
> > > > > > > > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > > > > > > > -				      mult_oprnd[0], mult_oprnd[1], oprnd1);
> > > > > > > > > +				      mult_arg1, mult_arg2, oprnd1);
> > > > > > > > >
> > > > > > > > >    return pattern_stmt;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> > > > > > Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF:
> > > > > > Felix Imend?rffer; HRB 36809 (AG
> > Nuernberg)
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> Germany
> > > > GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix
> > > > Imend?rffer; HRB 36809 (AG Nuernberg)
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > Nuernberg, Germany; GF: Felix Imend

[-- Attachment #2: rb14433.patch --]
[-- Type: application/octet-stream, Size: 16555 bytes --]

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..9fad3322b3f1eb2a836833bb390df78f0cd9734b 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5438,13 +5438,55 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
 
 @cindex @code{sdot_prod@var{m}} instruction pattern
 @item @samp{sdot_prod@var{m}}
+
+Compute the sum of the products of two signed elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal to or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically, the pattern performs the multiplication with the following signs:
+
+@smallexample
+sdot<signed c, signed a, signed b> ==
+   res = sign-ext (a) * sign-ext (b) + c
+@dots{}
+@end smallexample
+
 @cindex @code{udot_prod@var{m}} instruction pattern
-@itemx @samp{udot_prod@var{m}}
-Compute the sum of the products of two signed/unsigned elements.
-Operand 1 and operand 2 are of the same mode. Their product, which is of a
-wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
-wider than the mode of the product. The result is placed in operand 0, which
-is of the same mode as operand 3.
+@item @samp{udot_prod@var{m}}
+
+Compute the sum of the products of two unsigned elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal to or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically, the pattern performs the multiplication with the following signs:
+
+@smallexample
+udot<unsigned c, unsigned a, unsigned b> ==
+   res = zero-ext (a) * zero-ext (b) + c
+@dots{}
+@end smallexample
+
+
+
+@cindex @code{usdot_prod@var{m}} instruction pattern
+@item @samp{usdot_prod@var{m}}
+Compute the sum of the products of elements of different signs.
+Operand 1 must be unsigned and operand 2 signed. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal to or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically, the pattern performs the multiplication with the following signs:
+
+@smallexample
+usdot<signed c, unsigned a, signed b> ==
+   res = ((signed-conv) zero-ext (a)) * sign-ext (b) + c
+@dots{}
+@end smallexample
 
 @cindex @code{ssad@var{m}} instruction pattern
 @item @samp{ssad@var{m}}
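The three patterns documented above differ only in how each multiplicand is extended before the multiply. Below is a minimal scalar sketch of one accumulator lane, assuming 8-bit elements summed four at a time into a 32-bit accumulator (a ratio common to dot-product instructions, not something md.texi mandates). The function names are hypothetical; for the mixed case it follows the prose, with operand 1 unsigned and operand 2 signed.

```c
#include <stdint.h>

/* Hypothetical scalar model of one lane: four 8-bit products are
   added to a 32-bit accumulator.  The 4:1 ratio is an assumption.  */

/* sdot: sign-ext (a) * sign-ext (b) + c  */
int32_t
sdot_lane (int32_t c, const int8_t a[4], const int8_t b[4])
{
  for (int i = 0; i < 4; i++)
    c += (int32_t) a[i] * (int32_t) b[i];
  return c;
}

/* udot: zero-ext (a) * zero-ext (b) + c  */
uint32_t
udot_lane (uint32_t c, const uint8_t a[4], const uint8_t b[4])
{
  for (int i = 0; i < 4; i++)
    c += (uint32_t) a[i] * (uint32_t) b[i];
  return c;
}

/* usdot: operand 1 (a) is unsigned and operand 2 (b) is signed, so
   each product is zero-ext (a) * sign-ext (b).  */
int32_t
usdot_lane (int32_t c, const uint8_t a[4], const int8_t b[4])
{
  for (int i = 0; i < 4; i++)
    c += (int32_t) a[i] * (int32_t) b[i];
  return c;
}
```

With a = {255, 1, 2, 3} and b = {-1, -1, 2, 3}, usdot_lane gives -243. Reinterpreting the same bytes of a as signed (255 becomes -1) and using sdot_lane gives 13. That difference is why neither existing optab can stand in for the mixed-sign case.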
diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
index c3aaa1a416991e856d3e24da45968a92ebada82c..fbd2b06b8dbfd560dfb66b314830e6b564b37abb 100644
--- a/gcc/optabs-tree.h
+++ b/gcc/optabs-tree.h
@@ -29,7 +29,8 @@ enum optab_subtype
 {
   optab_default,
   optab_scalar,
-  optab_vector
+  optab_vector,
+  optab_vector_mixed_sign
 };
 
 /* Return the optab used for computing the given operation on the type given by
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 95ffe397c23e80c105afea52e9d47216bf52f55a..eeb5aeed3202cc6971b6447994bc5311e9c010bb 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -127,7 +127,12 @@ optab_for_tree_code (enum tree_code code, const_tree type,
       return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
 
     case DOT_PROD_EXPR:
-      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
+      {
+	if (subtype == optab_vector_mixed_sign)
+	  return usdot_prod_optab;
+
+	return (TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab);
+      }
 
     case SAD_EXPR:
       return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
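The DOT_PROD_EXPR case of optab_for_tree_code now consults the subtype before the type's signedness. Here is a rough model of that selection, using hypothetical enums in place of GCC's optab objects (only the optab_subtype enumerator names come from the patch):

```c
#include <stdbool.h>

/* Hypothetical mirror of enum optab_subtype in optabs-tree.h.  */
enum subtype_model
{
  OPTAB_DEFAULT,
  OPTAB_SCALAR,
  OPTAB_VECTOR,
  OPTAB_VECTOR_MIXED_SIGN
};

/* Hypothetical stand-ins for the three dot-product optabs.  */
enum dot_optab_model { SDOT_PROD, UDOT_PROD, USDOT_PROD };

/* Model of the DOT_PROD_EXPR case: the mixed-sign subtype takes
   priority; otherwise the signedness of TYPE selects between the
   signed and unsigned optabs.  */
enum dot_optab_model
dot_prod_optab_for (bool type_unsigned, enum subtype_model subtype)
{
  if (subtype == OPTAB_VECTOR_MIXED_SIGN)
    return USDOT_PROD;
  return type_unsigned ? UDOT_PROD : SDOT_PROD;
}
```

Because the mixed-sign test comes first, the signedness of TYPE plays no role for usdot_prod. Every caller that sees operands of differing sign therefore has to pass optab_vector_mixed_sign explicitly, which is what the expander, vectorizable_reduction and vect_recog_dot_prod_pattern hunks do.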
diff --git a/gcc/optabs.c b/gcc/optabs.c
index f4614a394587787293dc8b680a38901f7906f61c..d9b64441d0e0726afee89dc9c937350451e7670d 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   bool sbool = false;
 
   oprnd0 = ops->op0;
+  if (nops >= 2)
+    oprnd1 = ops->op1;
+  if (nops >= 3)
+    oprnd2 = ops->op2;
+
   tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
   if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
       || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
@@ -285,6 +290,27 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
 	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
       sbool = true;
     }
+  else if (ops->code == DOT_PROD_EXPR)
+    {
+      enum optab_subtype subtype = optab_default;
+      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
+      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
+      if (sign1 == sign2)
+	;
+      else if (sign1 == SIGNED && sign2 == UNSIGNED)
+	{
+	  subtype = optab_vector_mixed_sign;
+	  /* Same as optab_vector_mixed_sign but flip the operands.  */
+	  std::swap (op0, op1);
+	}
+      else if (sign1 == UNSIGNED && sign2 == SIGNED)
+	subtype = optab_vector_mixed_sign;
+      else
+	gcc_unreachable ();
+
+      widen_pattern_optab
+	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
+    }
   else
     widen_pattern_optab
       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
@@ -298,10 +324,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   gcc_assert (icode != CODE_FOR_nothing);
 
   if (nops >= 2)
-    {
-      oprnd1 = ops->op1;
-      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
-    }
+    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
   else if (sbool)
     {
       nops = 2;
@@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
     {
       gcc_assert (tmode1 == tmode0);
       gcc_assert (op1);
-      oprnd2 = ops->op2;
       wmode = TYPE_MODE (TREE_TYPE (oprnd2));
     }
 
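The new DOT_PROD_EXPR branch in expand_widen_pattern_expr puts the operands in canonical order before expansion, because usdot_prod expects the unsigned operand first. Swapping is safe because multiplication is commutative. Below is a minimal model of that decision, with a hypothetical operand struct standing in for the tree and RTL operands:

```c
#include <stdbool.h>

/* Hypothetical stand-in for an expander operand: a label plus the
   signedness of its type.  */
struct operand_model
{
  const char *name;
  bool is_unsigned;
};

/* Model of the DOT_PROD_EXPR branch.  Returns true when the
   mixed-sign optab is required.  In that case the operands are
   reordered so that *OP0 is the unsigned one.  */
bool
canonicalize_dot_prod_operands (struct operand_model *op0,
                                struct operand_model *op1)
{
  if (op0->is_unsigned == op1->is_unsigned)
    return false;               /* optab_default: sdot or udot.  */

  if (!op0->is_unsigned)
    {
      /* (signed, unsigned): swap, as the patch does with std::swap.  */
      struct operand_model tmp = *op0;
      *op0 = *op1;
      *op1 = tmp;
    }
  return true;                  /* optab_vector_mixed_sign: usdot.  */
}
```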
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbdb7c18615baae928 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
 OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
 OPTAB_D (udot_prod_optab, "udot_prod$I$a")
+OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
 OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..0128891852fcd74fe31cd338614e90a26256b4bd 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign *stmt)
 		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
 		 || (!INTEGRAL_TYPE_P (lhs_type)
 		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
-	    || !types_compatible_p (rhs1_type, rhs2_type)
+	    /* rhs1_type and rhs2_type may differ in sign.  */
+	    || !tree_nop_conversion_p (rhs1_type, rhs2_type)
 	    || !useless_type_conversion_p (lhs_type, rhs3_type)
 	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
 			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
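The verifier change relaxes the multiplicand check from types_compatible_p to tree_nop_conversion_p. The two operands may now differ in signedness, provided that converting one to the other is a no-op. For integer types this amounts to equal precision (the real predicate also compares modes). The sketch below is a rough model: the struct is hypothetical, and it uses precision where the verifier uses GET_MODE_SIZE for the accumulator-width check.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an integer tree type.  */
struct int_type_model
{
  unsigned precision;
  bool is_unsigned;
};

/* Old multiplicand check: the types must match exactly.  */
bool
old_operands_ok (struct int_type_model a, struct int_type_model b)
{
  return a.precision == b.precision && a.is_unsigned == b.is_unsigned;
}

/* New multiplicand check: only a no-op conversion is required, so
   the signs may differ.  */
bool
new_operands_ok (struct int_type_model a, struct int_type_model b)
{
  return a.precision == b.precision;
}

/* The accumulator (rhs3) must still be at least twice as wide as
   the multiplicands.  */
bool
dot_prod_types_ok (struct int_type_model rhs1, struct int_type_model rhs2,
                   struct int_type_model rhs3)
{
  return new_operands_ok (rhs1, rhs2)
         && rhs3.precision >= 2 * rhs1.precision;
}
```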
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..756d2867b678d0d8394202c6adb03d9cd26029e7 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6662,6 +6662,12 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   bool lane_reduc_code_p
     = (code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code == SAD_EXPR);
   int op_type = TREE_CODE_LENGTH (code);
+  enum optab_subtype optab_query_kind = optab_vector;
+  if (code == DOT_PROD_EXPR
+      && TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)))
+	   != TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt))))
+    optab_query_kind = optab_vector_mixed_sign;
+
 
   scalar_dest = gimple_assign_lhs (stmt);
   scalar_type = TREE_TYPE (scalar_dest);
@@ -7189,7 +7195,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       bool ok = true;
 
       /* 4.1. check support for the operation in the loop  */
-      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
+      optab optab = optab_for_tree_code (code, vectype_in, optab_query_kind);
       if (!optab)
 	{
 	  if (dump_enabled_p ())
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 441d6cd28c4eaded7abd756164890dbcffd2f3b8..82123b96313e6783ea214b9259805d65c07d8858 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
 static bool
 vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
 				 tree itype, tree *vecotype_out,
-				 tree *vecitype_out = NULL)
+				 tree *vecitype_out = NULL,
+				 enum optab_subtype subtype = optab_default)
 {
   tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
   if (!vecitype)
@@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
   if (!vecotype)
     return false;
 
-  optab optab = optab_for_tree_code (code, vecitype, optab_default);
+  optab optab = optab_for_tree_code (code, vecitype, subtype);
   if (!optab)
     return false;
 
@@ -487,10 +488,14 @@ vect_joust_widened_integer (tree type, bool shift_p, tree op,
 }
 
 /* Return true if the common supertype of NEW_TYPE and *COMMON_TYPE
-   is narrower than type, storing the supertype in *COMMON_TYPE if so.  */
+   is narrower than type, storing the supertype in *COMMON_TYPE if so.
+   If UNPROM_TYPE is nonnull, accept *COMMON_TYPE and NEW_TYPE differing
+   in sign but not in precision, provided that their product is
+   compatible with UNPROM_TYPE.  */
 
 static bool
-vect_joust_widened_type (tree type, tree new_type, tree *common_type)
+vect_joust_widened_type (tree type, tree new_type, tree *common_type,
+			 tree unprom_type = NULL)
 {
   if (types_compatible_p (*common_type, new_type))
     return true;
@@ -514,7 +519,18 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
   unsigned int precision = MAX (TYPE_PRECISION (*common_type),
 				TYPE_PRECISION (new_type));
   precision *= 2;
-  if (precision * 2 > TYPE_PRECISION (type))
+
+  /* Check if the mismatch is only in the sign and if we have
+     UNPROM_TYPE then allow it if there is enough precision to
+     not lose any information during the conversion.  */
+  if (unprom_type
+      && TYPE_SIGN (unprom_type) == SIGNED
+      && tree_nop_conversion_p (*common_type, new_type))
+	return true;
+
+  /* The resulting operation is unsigned; check that we have enough
+     precision to perform it.  */
+  if (precision * 2 > TYPE_PRECISION (unprom_type ? unprom_type : type))
     return false;
 
   *common_type = build_nonstandard_integer_type (precision, false);
@@ -532,6 +548,10 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
    to a type that (a) is narrower than the result of STMT_INFO and
    (b) can hold all leaf operand values.
 
+   If UNPROM_TYPE is nonnull, allow the operands to differ in sign but
+   not in precision, provided that the resulting type of the operation
+   on the operands is compatible with UNPROM_TYPE.
+
    Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
    exists.  */
 
@@ -539,7 +559,8 @@ static unsigned int
 vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		      tree_code widened_code, bool shift_p,
 		      unsigned int max_nops,
-		      vect_unpromoted_value *unprom, tree *common_type)
+		      vect_unpromoted_value *unprom, tree *common_type,
+		      tree unprom_type = NULL)
 {
   /* Check for an integer operation with the right code.  */
   gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
@@ -600,7 +621,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		= vinfo->lookup_def (this_unprom->op);
 	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
 					   widened_code, shift_p, max_nops,
-					   this_unprom, common_type);
+					   this_unprom, common_type,
+					   unprom_type);
 	      if (nops == 0)
 		return 0;
 
@@ -617,7 +639,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 	      if (i == 0)
 		*common_type = this_unprom->type;
 	      else if (!vect_joust_widened_type (type, this_unprom->type,
-						 common_type))
+						 common_type, unprom_type))
 		return 0;
 	    }
 	}
@@ -799,12 +821,15 @@ vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
 }
 
 /* Invoke vect_convert_input for N elements of UNPROM and store the
-   result in the corresponding elements of RESULT.  */
+   result in the corresponding elements of RESULT.
+
+   If ALLOW_SHORT_SIGN_MISMATCH then don't convert the types if they only
+   differ by sign.  */
 
 static void
 vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
 		     tree *result, tree type, vect_unpromoted_value *unprom,
-		     tree vectype)
+		     tree vectype, bool allow_short_sign_mismatch = false)
 {
   for (unsigned int i = 0; i < n; ++i)
     {
@@ -812,8 +837,12 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
       for (j = 0; j < i; ++j)
 	if (unprom[j].op == unprom[i].op)
 	  break;
+
       if (j < i)
 	result[i] = result[j];
+      else if (allow_short_sign_mismatch
+	       && tree_nop_conversion_p (type, unprom[i].type))
+	result[i] = unprom[i].op;
       else
 	result[i] = vect_convert_input (vinfo, stmt_info,
 					type, &unprom[i], vectype);
@@ -888,21 +917,24 @@ vect_reassociating_reduction_p (vec_info *vinfo,
 
    Try to find the following pattern:
 
-     type x_t, y_t;
+     type1a x_t;
+     type1b y_t;
      TYPE1 prod;
      TYPE2 sum = init;
    loop:
      sum_0 = phi <init, sum_1>
      S1  x_t = ...
      S2  y_t = ...
-     S3  x_T = (TYPE1) x_t;
-     S4  y_T = (TYPE1) y_t;
+     S3  x_T = (TYPE3) x_t;
+     S4  y_T = (TYPE4) y_t;
      S5  prod = x_T * y_T;
      [S6  prod = (TYPE2) prod;  #optional]
      S7  sum_1 = prod + sum_0;
 
-   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
-   same size of 'TYPE1' or bigger. This is a special case of a reduction
+   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
+   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
+   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
+   bigger and must be the same sign. This is a special case of a reduction
    computation.
 
    Input:
@@ -939,15 +971,16 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
 
   /* Look for the following pattern
           DX = (TYPE1) X;
-          DY = (TYPE1) Y;
+	  DY = (TYPE2) Y;
           DPROD = DX * DY;
-          DDPROD = (TYPE2) DPROD;
+	  DDPROD = (TYPE3) DPROD;
           sum_1 = DDPROD + sum_0;
      In which
      - DX is double the size of X
      - DY is double the size of Y
      - DX, DY, DPROD all have the same type but the sign
-       between DX, DY and DPROD can differ.
+       between DX, DY and DPROD can differ. The sign of DPROD
+       is one of the signs of DX or DY.
      - sum is the same size of DPROD or bigger
      - sum has been recognized as a reduction variable.
 
@@ -986,20 +1019,29 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
      inside the loop (in case we are analyzing an outer-loop).  */
   vect_unpromoted_value unprom0[2];
   if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
-			     false, 2, unprom0, &half_type))
+			     false, 2, unprom0, &half_type,
+			     TREE_TYPE (unprom_mult.op)))
     return NULL;
 
+  /* Check to see if there is a sign change happening in the operands of the
+     multiplication and pick the appropriate optab subtype.  */
+  enum optab_subtype subtype;
+  if (TYPE_SIGN (unprom0[0].type) == TYPE_SIGN (unprom0[1].type))
+    subtype = optab_default;
+  else
+    subtype = optab_vector_mixed_sign;
+
   vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
 
   tree half_vectype;
   if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
-					type_out, &half_vectype))
+					type_out, &half_vectype, subtype))
     return NULL;
 
   /* Get the inputs in the appropriate types.  */
   tree mult_oprnd[2];
   vect_convert_inputs (vinfo, stmt_vinfo, 2, mult_oprnd, half_type,
-		       unprom0, half_vectype);
+		       unprom0, half_vectype, true);
 
   var = vect_recog_temp_ssa_var (type, NULL);
   pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,

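For context, here is an illustrative source loop of the shape vect_recog_dot_prod_pattern targets after this patch: unsigned 8-bit data multiplied by signed 8-bit weights and accumulated in 32 bits, with X unsigned and Y signed. The function name is hypothetical. Whether a particular compiler version and target actually emit usdot_prod for it is an assumption. The companion AArch64 patches in this series add usdot support for NEON and SVE, and on AArch64 that instruction belongs to the I8MM extension.

```c
#include <stddef.h>
#include <stdint.h>

/* Mixed-sign dot product: each iteration computes
   (int) a[i] * (int) b[i] and adds it to a 32-bit sum, i.e. the
   DX * DY + sum_0 shape described in the pattern comment.  */
int32_t
mixed_sign_dot (const uint8_t *a, const int8_t *b, size_t n)
{
  int32_t sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += a[i] * b[i];   /* both operands are promoted to int */
  return sum;
}
```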

Thread overview: 35+ messages
2021-05-05 17:38 Tamar Christina
2021-05-05 17:38 ` [PATCH 2/4]AArch64: Add support for sign differing dot-product usdot for NEON and SVE Tamar Christina
2021-05-10 16:49   ` Richard Sandiford
2021-05-25 14:57     ` Tamar Christina
2021-05-26  8:50       ` Richard Sandiford
2021-05-05 17:39 ` [PATCH 3/4][AArch32]: Add support for sign differing dot-product usdot for NEON Tamar Christina
2021-05-05 17:42   ` FW: " Tamar Christina
     [not found]     ` <VI1PR08MB5325B832EE3BB6139886C0E9FF259@VI1PR08MB5325.eurprd08.prod.outlook.com>
2021-05-25 15:02       ` Tamar Christina
2021-05-26 10:45         ` Kyrylo Tkachov
2021-05-06  9:23   ` Christophe Lyon
2021-05-06  9:27     ` Tamar Christina
2021-05-05 17:39 ` [PATCH 4/4]middle-end: Add tests middle end generic tests for sign differing dotproduct Tamar Christina
     [not found]   ` <VI1PR08MB532511701573C18A33AC6291FF259@VI1PR08MB5325.eurprd08.prod.outlook.com>
2021-05-25 15:01     ` FW: " Tamar Christina
     [not found]     ` <11s2181-8856-30rq-26or-84q8o7qrr2o@fhfr.qr>
2021-05-26  8:48       ` Tamar Christina
2021-06-14 12:08       ` Tamar Christina
2021-05-07 11:45 ` [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes Richard Biener
2021-05-07 12:42   ` Tamar Christina
2021-05-10 11:39     ` Richard Biener
2021-05-10 12:58       ` Tamar Christina
2021-05-10 13:29         ` Richard Biener
2021-05-25 14:57           ` Tamar Christina
2021-05-26  8:56             ` Richard Biener
2021-06-02  9:28               ` Tamar Christina
2021-06-04 10:12                 ` Tamar Christina [this message]
2021-06-07 10:10                   ` Richard Sandiford
2021-06-14 12:06                     ` Tamar Christina
2021-06-21  8:11                       ` Tamar Christina
2021-06-22 10:56                       ` Richard Sandiford
2021-06-22 11:16                         ` Richard Sandiford
2021-07-12  9:18                           ` Tamar Christina
2021-07-12  9:39                             ` Richard Sandiford
2021-07-12  9:56                               ` Tamar Christina
2021-07-12 10:25                                 ` Richard Sandiford
2021-07-12 12:29                                   ` Tamar Christina
2021-07-12 14:55                                     ` Richard Sandiford
