[PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

public inbox for gcc-patches@gcc.gnu.org

* [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
@ 2021-05-05 17:38 Tamar Christina
  2021-05-05 17:38 ` [PATCH 2/4]AArch64: Add support for sign differing dot-product usdot for NEON and SVE Tamar Christina
                   ` (3 more replies)
  0 siblings, 4 replies; 35+ messages in thread
From: Tamar Christina @ 2021-05-05 17:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther

[-- Attachment #1: Type: text/plain, Size: 18498 bytes --]

Hi All,

This patch adds support for a dot product where the signs of the multiplication
arguments differ, i.e. one is signed and one is unsigned, but the precisions are
the same.

#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned

SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
   SIGNEDNESS_4 char *restrict b)
{
  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
    {
      int av = a[i];
      int bv = b[i];
      SIGNEDNESS_2 short mult = av * bv;
      res += mult;
    }
  return res;
}

The operations are performed as if the operands were extended to a 32-bit value.
As such, this operation isn't valid if there is an intermediate conversion to an
unsigned value, i.e. if SIGNEDNESS_2 is unsigned.
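
For illustration only (this is not part of the patch), the C++ sketch below
compares the loop above against a plain 32-bit widening reference. The names
usdot_ref and dot_via are hypothetical, and the accumulator is simplified to
int32_t rather than the unsigned int used in the test case.

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics: widen both operands to 32 bits, multiply, accumulate.
inline int32_t usdot_ref(int32_t acc, const int8_t* a, const uint8_t* b, int n) {
  for (int i = 0; i < n; ++i)
    acc += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
  return acc;
}

// The loop from the description, with the intermediate type
// ("SIGNEDNESS_2 short") as a template parameter.
template <typename Mid>
int32_t dot_via(int32_t acc, const int8_t* a, const uint8_t* b, int n) {
  for (int i = 0; i < n; ++i) {
    int av = a[i];
    int bv = b[i];
    Mid mult = static_cast<Mid>(av * bv);  // intermediate 16-bit conversion
    acc += mult;
  }
  return acc;
}
```

A signed 8-bit by unsigned 8-bit product always fits in a signed 16-bit value,
so dot_via<int16_t> agrees with usdot_ref. With a = {-1} and b = {1},
dot_via<uint16_t> accumulates 65535 instead of -1, which is why the unsigned
intermediate has to be rejected.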

Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are flipped, the same
optab is used, but the operands are swapped in the optab expansion.
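
As a rough sketch of why a single optab is enough (the function names here are
hypothetical and model one scalar lane, not the real vector instruction):
multiplication is commutative, so the signed-by-unsigned case can reuse the
unsigned-by-signed operation with its operands swapped.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one lane of the target's usdot operation:
// the first multiplicand is unsigned, the second is signed.
inline int32_t usdot_lane(int32_t acc, uint8_t u, int8_t s) {
  return acc + static_cast<int32_t>(u) * static_cast<int32_t>(s);
}

// unsigned * signed: operands are already in the order usdot expects.
inline int32_t mixed_dot_lane(int32_t acc, uint8_t x, int8_t y) {
  return usdot_lane(acc, x, y);
}

// signed * unsigned: same operation, operands swapped.
inline int32_t mixed_dot_lane(int32_t acc, int8_t x, uint8_t y) {
  return usdot_lane(acc, y, x);
}
```

Both overloads produce the same result for the same pair of values, whichever
operand carries the sign.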

To support this, the patch extends the dot-product detection to optionally
ignore operands with different signs, and stores this information in the optab
subtype, which is now made a bitfield.

The subtype can now additionally control which optab an EXPR can expand to.
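
A minimal standalone sketch of that selection logic, assuming the flag values
from the patch. Subtype, DotOptab and select_dot_optab are hypothetical
stand-ins for optab_subtype, the three dot-product optabs and the
DOT_PROD_EXPR case of optab_for_tree_code.

```cpp
#include <cassert>

// Hypothetical mirror of the bit-flag subtype; values follow the patch.
enum Subtype {
  kDefault          = 1 << 0,
  kScalar           = 1 << 1,
  kVector           = 1 << 2,
  kSignedToUnsigned = 1 << 3,
  kUnsignedToSigned = 1 << 4
};

// Combine flags while keeping the enum type.
inline Subtype operator|(Subtype a, Subtype b) {
  return static_cast<Subtype>(static_cast<int>(a) | static_cast<int>(b));
}

enum class DotOptab { sdot_prod, udot_prod, usdot_prod };

// Any mixed-sign flag selects usdot_prod; otherwise the type's sign decides.
inline DotOptab select_dot_optab(bool type_is_unsigned, Subtype subtype) {
  if (subtype & (kSignedToUnsigned | kUnsignedToSigned))
    return DotOptab::usdot_prod;
  return type_is_unsigned ? DotOptab::udot_prod : DotOptab::sdot_prod;
}
```

Because the flags are distinct bits, a caller can pass kVector together with a
mixed-sign flag and still reach usdot_prod.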

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* optabs.def (usdot_prod_optab): New.
	* doc/md.texi: Document it.
	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
	* optabs-tree.h (enum optab_subtype): Likewise.
	* optabs.c (expand_widen_pattern_expr): Likewise.
	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
	* tree-vect-loop.c (vect_determine_dot_kind): New.
	(vectorizable_reduction): Query dot-product kind.
	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
	optab subtype.
	(vect_joust_widened_type, vect_widened_op_tree): Optionally ignore
	mismatch types.
	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.

--- inline copy of patch -- 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fdf2e66bc80d7d23 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5440,11 +5440,13 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
 @item @samp{sdot_prod@var{m}}
 @cindex @code{udot_prod@var{m}} instruction pattern
 @itemx @samp{udot_prod@var{m}}
+@cindex @code{usdot_prod@var{m}} instruction pattern
+@itemx @samp{usdot_prod@var{m}}
 Compute the sum of the products of two signed/unsigned elements.
-Operand 1 and operand 2 are of the same mode. Their product, which is of a
-wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
-wider than the mode of the product. The result is placed in operand 0, which
-is of the same mode as operand 3.
+Operand 1 and operand 2 are of the same mode but may differ in signs. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
 
 @cindex @code{ssad@var{m}} instruction pattern
 @item @samp{ssad@var{m}}
diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
index c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f1990e0548ba08d 100644
--- a/gcc/optabs-tree.h
+++ b/gcc/optabs-tree.h
@@ -27,11 +27,29 @@ along with GCC; see the file COPYING3.  If not see
    shift amount vs. machines that take a vector for the shift amount.  */
 enum optab_subtype
 {
-  optab_default,
-  optab_scalar,
-  optab_vector
+  optab_default = 1 << 0,
+  optab_scalar = 1 << 1,
+  optab_vector = 1 << 2,
+  optab_signed_to_unsigned = 1 << 3,
+  optab_unsigned_to_signed = 1 << 4
 };
 
+/* Override the OrEqual-operator so we can use optab_subtype as a bit flag.  */
+inline enum optab_subtype&
+operator |= (enum optab_subtype& a, enum optab_subtype b)
+{
+    return a = static_cast<optab_subtype>(static_cast<int>(a)
+					  | static_cast<int>(b));
+}
+
+/* Override the Or-operator so we can use optab_subtype as a bit flag.  */
+inline enum optab_subtype
+operator | (enum optab_subtype a, enum optab_subtype b)
+{
+    return static_cast<optab_subtype>(static_cast<int>(a)
+				      | static_cast<int>(b));
+}
+
 /* Return the optab used for computing the given operation on the type given by
    the second argument.  The third argument distinguishes between the types of
    vector shifts and rotates.  */
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea1e5c22b7453072 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -127,7 +127,17 @@ optab_for_tree_code (enum tree_code code, const_tree type,
       return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
 
     case DOT_PROD_EXPR:
-      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
+      {
+	gcc_assert (subtype & optab_default
+		    || subtype & optab_vector
+		    || subtype & optab_signed_to_unsigned
+		    || subtype & optab_unsigned_to_signed);
+
+	if (subtype & (optab_unsigned_to_signed | optab_signed_to_unsigned))
+	  return usdot_prod_optab;
+
+	return (TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab);
+      }
 
     case SAD_EXPR:
       return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
diff --git a/gcc/optabs.c b/gcc/optabs.c
index f4614a394587787293dc8b680a38901f7906f61c..2e18b76de1412eab71971753ac678597c0d00098 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   bool sbool = false;
 
   oprnd0 = ops->op0;
+  if (nops >= 2)
+    oprnd1 = ops->op1;
+  if (nops >= 3)
+    oprnd2 = ops->op2;
+
   tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
   if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
       || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
@@ -285,6 +290,27 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
 	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
       sbool = true;
     }
+  else if (ops->code == DOT_PROD_EXPR)
+    {
+      enum optab_subtype subtype = optab_default;
+      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
+      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
+      if (sign1 == sign2)
+	;
+      else if (sign1 == SIGNED && sign2 == UNSIGNED)
+	{
+	  subtype |= optab_signed_to_unsigned;
+	  /* Same as optab_unsigned_to_signed but flip the operands.  */
+	  std::swap (op0, op1);
+	}
+      else if (sign1 == UNSIGNED && sign2 == SIGNED)
+	subtype |= optab_unsigned_to_signed;
+      else
+	gcc_unreachable ();
+
+      widen_pattern_optab
+	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
+    }
   else
     widen_pattern_optab
       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
@@ -298,10 +324,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   gcc_assert (icode != CODE_FOR_nothing);
 
   if (nops >= 2)
-    {
-      oprnd1 = ops->op1;
-      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
-    }
+    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
   else if (sbool)
     {
       nops = 2;
@@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
     {
       gcc_assert (tmode1 == tmode0);
       gcc_assert (op1);
-      oprnd2 = ops->op2;
       wmode = TYPE_MODE (TREE_TYPE (oprnd2));
     }
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbdb7c18615baae928 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
 OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
 OPTAB_D (udot_prod_optab, "udot_prod$I$a")
+OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
 OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..58b55bb648ad97d514f1fa18bb00808fd2678b42 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign *stmt)
 		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
 		 || (!INTEGRAL_TYPE_P (lhs_type)
 		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
-	    || !types_compatible_p (rhs1_type, rhs2_type)
+	    || (!types_compatible_p (rhs1_type, rhs2_type)
+		&& TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type))
 	    || !useless_type_conversion_p (lhs_type, rhs3_type)
 	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
 			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..cb8f5fbb6abca181c4171194d19fec29ec6e4176 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6401,6 +6401,33 @@ build_vect_cond_expr (enum tree_code code, tree vop[3], tree mask,
     }
 }
 
+/* Determine the optab_subtype to use for the given CODE and STMT.  For
+   most CODE this will be optab_vector, however for certain operations such as
+   DOT_PROD_EXPR, where the operands can have different signs, we need to be
+   able to pick the right optab.  */
+
+static enum optab_subtype
+vect_determine_dot_kind (tree_code code, stmt_vec_info stmt_vinfo)
+{
+  enum optab_subtype subtype = optab_vector;
+  switch (code)
+    {
+      case DOT_PROD_EXPR:
+	{
+	  gassign *stmt = as_a <gassign *> (STMT_VINFO_STMT (stmt_vinfo));
+	  signop rhs1_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)));
+	  signop rhs2_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt)));
+	  if (rhs1_sign != rhs2_sign)
+	    subtype |= optab_unsigned_to_signed;
+	  break;
+	}
+      default:
+	break;
+    }
+
+  return subtype;
+}
+
 /* Function vectorizable_reduction.
 
    Check if STMT_INFO performs a reduction operation that can be vectorized.
@@ -7189,7 +7216,8 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       bool ok = true;
 
       /* 4.1. check support for the operation in the loop  */
-      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
+      enum optab_subtype subtype = vect_determine_dot_kind (code, stmt_info);
+      optab optab = optab_for_tree_code (code, vectype_in, subtype);
       if (!optab)
 	{
 	  if (dump_enabled_p ())
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 441d6cd28c4eaded7abd756164890dbcffd2f3b8..943c001fb13777b4d1513841fa84942316846d5e 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
 static bool
 vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
 				 tree itype, tree *vecotype_out,
-				 tree *vecitype_out = NULL)
+				 tree *vecitype_out = NULL,
+				 enum optab_subtype subtype = optab_default)
 {
   tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
   if (!vecitype)
@@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
   if (!vecotype)
     return false;
 
-  optab optab = optab_for_tree_code (code, vecitype, optab_default);
+  optab optab = optab_for_tree_code (code, vecitype, subtype);
   if (!optab)
     return false;
 
@@ -487,14 +488,31 @@ vect_joust_widened_integer (tree type, bool shift_p, tree op,
 }
 
 /* Return true if the common supertype of NEW_TYPE and *COMMON_TYPE
-   is narrower than type, storing the supertype in *COMMON_TYPE if so.  */
+   is narrower than type, storing the supertype in *COMMON_TYPE if so.
+   If ALLOW_SHORT_SIGN_MISMATCH then accept that *COMMON_TYPE and NEW_TYPE
+   may be of different signs but equal precision.  */
 
 static bool
-vect_joust_widened_type (tree type, tree new_type, tree *common_type)
+vect_joust_widened_type (tree type, tree new_type, tree *common_type,
+			 bool allow_short_sign_mismatch = false)
 {
   if (types_compatible_p (*common_type, new_type))
     return true;
 
+  /* Check if the mismatch is only in the sign and if we have
+     allow_short_sign_mismatch then allow it.  */
+  if (allow_short_sign_mismatch
+      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
+    {
+      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
+      tree eq_type
+	= build_nonstandard_integer_type (TYPE_PRECISION (new_type),
+					  sign);
+
+      if (types_compatible_p (*common_type, eq_type))
+	return true;
+    }
+
   /* See if *COMMON_TYPE can hold all values of NEW_TYPE.  */
   if ((TYPE_PRECISION (new_type) < TYPE_PRECISION (*common_type))
       && (TYPE_UNSIGNED (new_type) || !TYPE_UNSIGNED (*common_type)))
@@ -532,6 +550,9 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
    to a type that (a) is narrower than the result of STMT_INFO and
    (b) can hold all leaf operand values.
 
+   If ALLOW_SHORT_SIGN_MISMATCH then allow the operands to differ in sign
+   but not in precision.
+
    Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
    exists.  */
 
@@ -539,7 +560,8 @@ static unsigned int
 vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		      tree_code widened_code, bool shift_p,
 		      unsigned int max_nops,
-		      vect_unpromoted_value *unprom, tree *common_type)
+		      vect_unpromoted_value *unprom, tree *common_type,
+		      bool allow_short_sign_mismatch = false)
 {
   /* Check for an integer operation with the right code.  */
   gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
@@ -600,7 +622,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		= vinfo->lookup_def (this_unprom->op);
 	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
 					   widened_code, shift_p, max_nops,
-					   this_unprom, common_type);
+					   this_unprom, common_type,
+					   allow_short_sign_mismatch);
 	      if (nops == 0)
 		return 0;
 
@@ -617,7 +640,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 	      if (i == 0)
 		*common_type = this_unprom->type;
 	      else if (!vect_joust_widened_type (type, this_unprom->type,
-						 common_type))
+						 common_type,
+						 allow_short_sign_mismatch))
 		return 0;
 	    }
 	}
@@ -888,21 +912,24 @@ vect_reassociating_reduction_p (vec_info *vinfo,
 
    Try to find the following pattern:
 
-     type x_t, y_t;
+     type1a x_t;
+     type1b y_t;
      TYPE1 prod;
      TYPE2 sum = init;
    loop:
      sum_0 = phi <init, sum_1>
      S1  x_t = ...
      S2  y_t = ...
-     S3  x_T = (TYPE1) x_t;
-     S4  y_T = (TYPE1) y_t;
+     S3  x_T = (TYPE3) x_t;
+     S4  y_T = (TYPE4) y_t;
      S5  prod = x_T * y_T;
      [S6  prod = (TYPE2) prod;  #optional]
      S7  sum_1 = prod + sum_0;
 
-   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
-   same size of 'TYPE1' or bigger. This is a special case of a reduction
+   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
+   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
+   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
+   bigger and must be the same sign. This is a special case of a reduction
    computation.
 
    Input:
@@ -939,15 +966,16 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
 
   /* Look for the following pattern
           DX = (TYPE1) X;
-          DY = (TYPE1) Y;
+	  DY = (TYPE2) Y;
           DPROD = DX * DY;
-          DDPROD = (TYPE2) DPROD;
+	  DDPROD = (TYPE3) DPROD;
           sum_1 = DDPROD + sum_0;
      In which
      - DX is double the size of X
      - DY is double the size of Y
      - DX, DY, DPROD all have the same type but the sign
-       between DX, DY and DPROD can differ.
+       between DX, DY and DPROD can differ. The sign of DPROD
+       is one of the signs of DX or DY.
      - sum is the same size of DPROD or bigger
      - sum has been recognized as a reduction variable.
 
@@ -986,14 +1014,41 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
      inside the loop (in case we are analyzing an outer-loop).  */
   vect_unpromoted_value unprom0[2];
   if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
-			     false, 2, unprom0, &half_type))
+			     false, 2, unprom0, &half_type, true))
     return NULL;
 
+  /* Check to see if there is a sign change happening in the operands of the
+     multiplication and pick the appropriate optab subtype.  */
+  enum optab_subtype subtype;
+  tree rhs_type1 = unprom0[0].type;
+  tree rhs_type2 = unprom0[1].type;
+  if (TYPE_SIGN (rhs_type1) == TYPE_SIGN (rhs_type2))
+    subtype = optab_default;
+  else if (TYPE_SIGN (rhs_type1) == SIGNED
+	   && TYPE_SIGN (rhs_type2) == UNSIGNED)
+    subtype = optab_signed_to_unsigned;
+  else if (TYPE_SIGN (rhs_type1) == UNSIGNED
+	   && TYPE_SIGN (rhs_type2) == SIGNED)
+    subtype = optab_unsigned_to_signed;
+  else
+    gcc_unreachable ();
+
+  /* If we have a sign-changing dot-product we need to check that the
+     promoted type, if unsigned, has at least the same precision as the final
+     type of the dot-product.  */
+  if (subtype != optab_default)
+    {
+      tree mult_type = TREE_TYPE (unprom_mult.op);
+      if (TYPE_SIGN (mult_type) == UNSIGNED
+	  && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
+	return NULL;
+    }
+
   vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
 
   tree half_vectype;
   if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
-					type_out, &half_vectype))
+					type_out, &half_vectype, subtype))
     return NULL;
 
   /* Get the inputs in the appropriate types.  */
@@ -1002,8 +1057,22 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
 		       unprom0, half_vectype);
 
   var = vect_recog_temp_ssa_var (type, NULL);
+
+  /* If we have a sign changing dot-product the dot-product itself does any
+     sign conversions, so consume the type and use the unpromoted types.  */
+  tree mult_arg1, mult_arg2;
+  if (subtype == optab_default)
+    {
+      mult_arg1 = mult_oprnd[0];
+      mult_arg2 = mult_oprnd[1];
+    }
+  else
+    {
+      mult_arg1 = unprom0[0].op;
+      mult_arg2 = unprom0[1].op;
+    }
   pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
-				      mult_oprnd[0], mult_oprnd[1], oprnd1);
+				      mult_arg1, mult_arg2, oprnd1);
 
   return pattern_stmt;
 }
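
The recognition steps in vect_recog_dot_prod_pattern above can be summarised
with a simplified, standalone sketch. IntType, DotKind, classify and
mixed_sign_ok are hypothetical names used only for illustration; they model
the sign classification and the precision check, not the vectorizer's real
data structures.

```cpp
#include <cassert>

// Hypothetical, simplified integer type descriptor.
struct IntType {
  unsigned precision;
  bool is_unsigned;
};

enum class DotKind { same_sign, signed_unsigned, unsigned_signed };

// Classify by the signs of the two unpromoted multiplication operands.
inline DotKind classify(IntType op0, IntType op1) {
  if (op0.is_unsigned == op1.is_unsigned)
    return DotKind::same_sign;
  return op0.is_unsigned ? DotKind::unsigned_signed : DotKind::signed_unsigned;
}

// A mixed-sign dot product is rejected when the multiplication result is
// unsigned and narrower than the accumulator type.
inline bool mixed_sign_ok(DotKind kind, IntType mult_type, IntType sum_type) {
  if (kind == DotKind::same_sign)
    return true;
  return !(mult_type.is_unsigned && mult_type.precision < sum_type.precision);
}
```

For the test case in the description, the operands are a signed and an
unsigned 8-bit type, the product is a signed 16-bit value and the sum is 32
bits wide, so the pattern is accepted. Making the 16-bit product unsigned
causes it to be rejected.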


-- 

[-- Attachment #2: rb14433.patch --]
[-- Type: text/x-diff, Size: 16483 bytes --]

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fdf2e66bc80d7d23 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5440,11 +5440,13 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
 @item @samp{sdot_prod@var{m}}
 @cindex @code{udot_prod@var{m}} instruction pattern
 @itemx @samp{udot_prod@var{m}}
+@cindex @code{usdot_prod@var{m}} instruction pattern
+@itemx @samp{usdot_prod@var{m}}
 Compute the sum of the products of two signed/unsigned elements.
-Operand 1 and operand 2 are of the same mode. Their product, which is of a
-wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
-wider than the mode of the product. The result is placed in operand 0, which
-is of the same mode as operand 3.
+Operand 1 and operand 2 are of the same mode but may differ in signs. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
 
 @cindex @code{ssad@var{m}} instruction pattern
 @item @samp{ssad@var{m}}
diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
index c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f1990e0548ba08d 100644
--- a/gcc/optabs-tree.h
+++ b/gcc/optabs-tree.h
@@ -27,11 +27,29 @@ along with GCC; see the file COPYING3.  If not see
    shift amount vs. machines that take a vector for the shift amount.  */
 enum optab_subtype
 {
-  optab_default,
-  optab_scalar,
-  optab_vector
+  optab_default = 1 << 0,
+  optab_scalar = 1 << 1,
+  optab_vector = 1 << 2,
+  optab_signed_to_unsigned = 1 << 3,
+  optab_unsigned_to_signed = 1 << 4
 };
 
+/* Override the OrEqual-operator so we can use optab_subtype as a bit flag.  */
+inline enum optab_subtype&
+operator |= (enum optab_subtype& a, enum optab_subtype b)
+{
+    return a = static_cast<optab_subtype>(static_cast<int>(a)
+					  | static_cast<int>(b));
+}
+
+/* Override the Or-operator so we can use optab_subtype as a bit flag.  */
+inline enum optab_subtype
+operator | (enum optab_subtype a, enum optab_subtype b)
+{
+    return static_cast<optab_subtype>(static_cast<int>(a)
+				      | static_cast<int>(b));
+}
+
 /* Return the optab used for computing the given operation on the type given by
    the second argument.  The third argument distinguishes between the types of
    vector shifts and rotates.  */
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea1e5c22b7453072 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -127,7 +127,17 @@ optab_for_tree_code (enum tree_code code, const_tree type,
       return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
 
     case DOT_PROD_EXPR:
-      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
+      {
+	gcc_assert (subtype & optab_default
+		    || subtype & optab_vector
+		    || subtype & optab_signed_to_unsigned
+		    || subtype & optab_unsigned_to_signed);
+
+	if (subtype & (optab_unsigned_to_signed | optab_signed_to_unsigned))
+	  return usdot_prod_optab;
+
+	return (TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab);
+      }
 
     case SAD_EXPR:
       return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
diff --git a/gcc/optabs.c b/gcc/optabs.c
index f4614a394587787293dc8b680a38901f7906f61c..2e18b76de1412eab71971753ac678597c0d00098 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   bool sbool = false;
 
   oprnd0 = ops->op0;
+  if (nops >= 2)
+    oprnd1 = ops->op1;
+  if (nops >= 3)
+    oprnd2 = ops->op2;
+
   tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
   if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
       || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
@@ -285,6 +290,27 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
 	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
       sbool = true;
     }
+  else if (ops->code == DOT_PROD_EXPR)
+    {
+      enum optab_subtype subtype = optab_default;
+      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
+      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
+      if (sign1 == sign2)
+	;
+      else if (sign1 == SIGNED && sign2 == UNSIGNED)
+	{
+	  subtype |= optab_signed_to_unsigned;
+	  /* Same as optab_unsigned_to_signed but flip the operands.  */
+	  std::swap (op0, op1);
+	}
+      else if (sign1 == UNSIGNED && sign2 == SIGNED)
+	subtype |= optab_unsigned_to_signed;
+      else
+	gcc_unreachable ();
+
+      widen_pattern_optab
+	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
+    }
   else
     widen_pattern_optab
       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
@@ -298,10 +324,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   gcc_assert (icode != CODE_FOR_nothing);
 
   if (nops >= 2)
-    {
-      oprnd1 = ops->op1;
-      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
-    }
+    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
   else if (sbool)
     {
       nops = 2;
@@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
     {
       gcc_assert (tmode1 == tmode0);
       gcc_assert (op1);
-      oprnd2 = ops->op2;
       wmode = TYPE_MODE (TREE_TYPE (oprnd2));
     }
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbdb7c18615baae928 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
 OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
 OPTAB_D (udot_prod_optab, "udot_prod$I$a")
+OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
 OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..58b55bb648ad97d514f1fa18bb00808fd2678b42 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign *stmt)
 		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
 		 || (!INTEGRAL_TYPE_P (lhs_type)
 		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
-	    || !types_compatible_p (rhs1_type, rhs2_type)
+	    || (!types_compatible_p (rhs1_type, rhs2_type)
+		&& TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type))
 	    || !useless_type_conversion_p (lhs_type, rhs3_type)
 	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
 			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..cb8f5fbb6abca181c4171194d19fec29ec6e4176 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6401,6 +6401,33 @@ build_vect_cond_expr (enum tree_code code, tree vop[3], tree mask,
     }
 }
 
+/* Determine the optab_subtype to use for the given CODE and STMT.  For
+   most CODE this will be optab_vector, however for certain operations such as
+   DOT_PROD_EXPR where the operation can different signs for the operands we
+   need to be able to pick the right optabs.  */
+
+static enum optab_subtype
+vect_determine_dot_kind (tree_code code, stmt_vec_info stmt_vinfo)
+{
+  enum optab_subtype subtype = optab_vector;
+  switch (code)
+    {
+      case DOT_PROD_EXPR:
+	{
+	  gassign *stmt = as_a <gassign *> (STMT_VINFO_STMT (stmt_vinfo));
+	  signop rhs1_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)));
+	  signop rhs2_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt)));
+	  if (rhs1_sign != rhs2_sign)
+	    subtype |= optab_unsigned_to_signed;
+	  break;
+	}
+      default:
+	break;
+    }
+
+  return subtype;
+}
+
 /* Function vectorizable_reduction.
 
    Check if STMT_INFO performs a reduction operation that can be vectorized.
@@ -7189,7 +7216,8 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       bool ok = true;
 
       /* 4.1. check support for the operation in the loop  */
-      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
+      enum optab_subtype subtype = vect_determine_dot_kind (code, stmt_info);
+      optab optab = optab_for_tree_code (code, vectype_in, subtype);
       if (!optab)
 	{
 	  if (dump_enabled_p ())
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 441d6cd28c4eaded7abd756164890dbcffd2f3b8..943c001fb13777b4d1513841fa84942316846d5e 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
 static bool
 vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
 				 tree itype, tree *vecotype_out,
-				 tree *vecitype_out = NULL)
+				 tree *vecitype_out = NULL,
+				 enum optab_subtype subtype = optab_default)
 {
   tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
   if (!vecitype)
@@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
   if (!vecotype)
     return false;
 
-  optab optab = optab_for_tree_code (code, vecitype, optab_default);
+  optab optab = optab_for_tree_code (code, vecitype, subtype);
   if (!optab)
     return false;
 
@@ -487,14 +488,31 @@ vect_joust_widened_integer (tree type, bool shift_p, tree op,
 }
 
 /* Return true if the common supertype of NEW_TYPE and *COMMON_TYPE
-   is narrower than type, storing the supertype in *COMMON_TYPE if so.  */
+   is narrower than type, storing the supertype in *COMMON_TYPE if so.
+   If ALLOW_SHORT_SIGN_MISMATCH then accept that *COMMON_TYPE and NEW_TYPE
+   may be of different signs but equal precision.   */
 
 static bool
-vect_joust_widened_type (tree type, tree new_type, tree *common_type)
+vect_joust_widened_type (tree type, tree new_type, tree *common_type,
+			 bool allow_short_sign_mismatch = false)
 {
   if (types_compatible_p (*common_type, new_type))
     return true;
 
+  /* Check if the mismatch is only in the sign and if we have
+     allow_short_sign_mismatch then allow it.  */
+  if (allow_short_sign_mismatch
+      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
+    {
+      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
+      tree eq_type
+	= build_nonstandard_integer_type (TYPE_PRECISION (new_type),
+					  sign);
+
+      if (types_compatible_p (*common_type, eq_type))
+	return true;
+    }
+
   /* See if *COMMON_TYPE can hold all values of NEW_TYPE.  */
   if ((TYPE_PRECISION (new_type) < TYPE_PRECISION (*common_type))
       && (TYPE_UNSIGNED (new_type) || !TYPE_UNSIGNED (*common_type)))
@@ -532,6 +550,9 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
    to a type that (a) is narrower than the result of STMT_INFO and
    (b) can hold all leaf operand values.
 
+   If ALLOW_SHORT_SIGN_MISMATCH then allow that the signs of the operands
+   may differ in signs but not in precision.
+
    Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
    exists.  */
 
@@ -539,7 +560,8 @@ static unsigned int
 vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		      tree_code widened_code, bool shift_p,
 		      unsigned int max_nops,
-		      vect_unpromoted_value *unprom, tree *common_type)
+		      vect_unpromoted_value *unprom, tree *common_type,
+		      bool allow_short_sign_mismatch = false)
 {
   /* Check for an integer operation with the right code.  */
   gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
@@ -600,7 +622,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		= vinfo->lookup_def (this_unprom->op);
 	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
 					   widened_code, shift_p, max_nops,
-					   this_unprom, common_type);
+					   this_unprom, common_type,
+					   allow_short_sign_mismatch);
 	      if (nops == 0)
 		return 0;
 
@@ -617,7 +640,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 	      if (i == 0)
 		*common_type = this_unprom->type;
 	      else if (!vect_joust_widened_type (type, this_unprom->type,
-						 common_type))
+						 common_type,
+						 allow_short_sign_mismatch))
 		return 0;
 	    }
 	}
@@ -888,21 +912,24 @@ vect_reassociating_reduction_p (vec_info *vinfo,
 
    Try to find the following pattern:
 
-     type x_t, y_t;
+     type1a x_t
+     type1b y_t;
      TYPE1 prod;
      TYPE2 sum = init;
    loop:
      sum_0 = phi <init, sum_1>
      S1  x_t = ...
      S2  y_t = ...
-     S3  x_T = (TYPE1) x_t;
-     S4  y_T = (TYPE1) y_t;
+     S3  x_T = (TYPE3) x_t;
+     S4  y_T = (TYPE4) y_t;
      S5  prod = x_T * y_T;
      [S6  prod = (TYPE2) prod;  #optional]
      S7  sum_1 = prod + sum_0;
 
-   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
-   same size of 'TYPE1' or bigger. This is a special case of a reduction
+   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
+   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
+   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
+   bigger and must be the same sign. This is a special case of a reduction
    computation.
 
    Input:
@@ -939,15 +966,16 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
 
   /* Look for the following pattern
           DX = (TYPE1) X;
-          DY = (TYPE1) Y;
+	  DY = (TYPE2) Y;
           DPROD = DX * DY;
-          DDPROD = (TYPE2) DPROD;
+	  DDPROD = (TYPE3) DPROD;
           sum_1 = DDPROD + sum_0;
      In which
      - DX is double the size of X
      - DY is double the size of Y
      - DX, DY, DPROD all have the same type but the sign
-       between DX, DY and DPROD can differ.
+       between DX, DY and DPROD can differ. The sign of DPROD
+       is one of the signs of DX or DY.
      - sum is the same size of DPROD or bigger
      - sum has been recognized as a reduction variable.
 
@@ -986,14 +1014,41 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
      inside the loop (in case we are analyzing an outer-loop).  */
   vect_unpromoted_value unprom0[2];
   if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
-			     false, 2, unprom0, &half_type))
+			     false, 2, unprom0, &half_type, true))
     return NULL;
 
+  /* Check to see if there is a sign change happening in the operands of the
+     multiplication and pick the appropriate optab subtype.  */
+  enum optab_subtype subtype;
+  tree rhs_type1 = unprom0[0].type;
+  tree rhs_type2 = unprom0[1].type;
+  if (TYPE_SIGN (rhs_type1) == TYPE_SIGN (rhs_type2))
+    subtype = optab_default;
+  else if (TYPE_SIGN (rhs_type1) == SIGNED
+	   && TYPE_SIGN (rhs_type2) == UNSIGNED)
+    subtype = optab_signed_to_unsigned;
+  else if (TYPE_SIGN (rhs_type1) == UNSIGNED
+	   && TYPE_SIGN (rhs_type2) == SIGNED)
+    subtype = optab_unsigned_to_signed;
+  else
+    gcc_unreachable ();
+
+  /* For a sign-changing dot product, reject the pattern if the type of the
+     multiplication is unsigned and narrower than the final type of the
+     dot-product.  */
+  if (subtype != optab_default)
+    {
+      tree mult_type = TREE_TYPE (unprom_mult.op);
+      if (TYPE_SIGN (mult_type) == UNSIGNED
+	  && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
+	return NULL;
+    }
+
   vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
 
   tree half_vectype;
   if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
-					type_out, &half_vectype))
+					type_out, &half_vectype, subtype))
     return NULL;
 
   /* Get the inputs in the appropriate types.  */
@@ -1002,8 +1057,22 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
 		       unprom0, half_vectype);
 
   var = vect_recog_temp_ssa_var (type, NULL);
+
+  /* If we have a sign changing dot-product the dot-product itself does any
+     sign conversions, so consume the type and use the unpromoted types.  */
+  tree mult_arg1, mult_arg2;
+  if (subtype == optab_default)
+    {
+      mult_arg1 = mult_oprnd[0];
+      mult_arg2 = mult_oprnd[1];
+    }
+  else
+    {
+      mult_arg1 = unprom0[0].op;
+      mult_arg2 = unprom0[1].op;
+    }
   pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
-				      mult_oprnd[0], mult_oprnd[1], oprnd1);
+				      mult_arg1, mult_arg2, oprnd1);
 
   return pattern_stmt;
 }
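
As a reading aid for the hunks above, here is a minimal standalone sketch of
the subtype selection and of the unsigned-intermediate check. It is not GCC
code: `model_sign`, `model_subtype`, `pick_subtype` and `mismatch_allowed` are
hypothetical stand-ins for TYPE_SIGN, optab_subtype and the precision test,
and precisions are passed as plain integers rather than read from trees.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the sign and optab subtype values.  */
enum model_sign { MODEL_SIGNED, MODEL_UNSIGNED };
enum model_subtype
{
  MODEL_DEFAULT,            /* both operands have the same sign */
  MODEL_SIGNED_TO_UNSIGNED, /* first operand signed, second unsigned */
  MODEL_UNSIGNED_TO_SIGNED  /* first operand unsigned, second signed */
};

/* Mirror the if/else chain in the patch: equal signs select the default
   subtype, otherwise the order of the signs selects the subtype.  */
static enum model_subtype
pick_subtype (enum model_sign s1, enum model_sign s2)
{
  if (s1 == s2)
    return MODEL_DEFAULT;
  if (s1 == MODEL_SIGNED && s2 == MODEL_UNSIGNED)
    return MODEL_SIGNED_TO_UNSIGNED;
  if (s1 == MODEL_UNSIGNED && s2 == MODEL_SIGNED)
    return MODEL_UNSIGNED_TO_SIGNED;
  assert (!"unreachable"); /* plays the role of gcc_unreachable () */
  return MODEL_DEFAULT;
}

/* Mirror the rejection test: a mixed-sign dot product is refused when the
   multiplication result is unsigned and narrower than the accumulator.  */
static bool
mismatch_allowed (enum model_subtype subtype, enum model_sign mult_sign,
                  unsigned mult_prec, unsigned acc_prec)
{
  if (subtype == MODEL_DEFAULT)
    return true;
  return !(mult_sign == MODEL_UNSIGNED && mult_prec < acc_prec);
}
```

With a signed char and an unsigned char input, a signed short product and a
32-bit accumulator, this model accepts the pattern; switching the product to
unsigned short makes it reject, which matches the SIGNEDNESS_2 restriction
described in the cover text.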



* [PATCH 2/4]AArch64: Add support for sign differing dot-product usdot for NEON and SVE.
  2021-05-05 17:38 [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes Tamar Christina
@ 2021-05-05 17:38 ` Tamar Christina
  2021-05-10 16:49   ` Richard Sandiford
  2021-05-05 17:39 ` [PATCH 3/4][AArch32]: Add support for sign differing dot-product usdot for NEON Tamar Christina
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 35+ messages in thread
From: Tamar Christina @ 2021-05-05 17:38 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov,
	richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 7936 bytes --]

Hi All,

This adds optabs implementing usdot_prod.

The following testcase:

#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned

SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
   SIGNEDNESS_4 char *restrict b)
{
  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
    {
      int av = a[i];
      int bv = b[i];
      SIGNEDNESS_2 short mult = av * bv;
      res += mult;
    }
  return res;
}

Generates for NEON

f:
        movi    v0.4s, 0
        mov     x3, 0
        .p2align 3,,7
.L2:
        ldr     q1, [x2, x3]
        ldr     q2, [x1, x3]
        usdot   v0.4s, v1.16b, v2.16b
        add     x3, x3, 16
        cmp     x3, 480
        bne     .L2
        addv    s0, v0.4s
        fmov    w1, s0
        add     w0, w0, w1
        ret

and for SVE

f:
        mov     x3, 0
        cntb    x5
        mov     w4, 480
        mov     z1.b, #0
        whilelo p0.b, wzr, w4
        mov     z3.b, #0
        ptrue   p1.b, all
        .p2align 3,,7
.L2:
        ld1b    z2.b, p0/z, [x1, x3]
        ld1b    z0.b, p0/z, [x2, x3]
        add     x3, x3, x5
        sel     z0.b, p0, z0.b, z3.b
        whilelo p0.b, w3, w4
        usdot   z1.s, z0.b, z2.b
        b.any   .L2
        uaddv   d0, p1, z1.s
        fmov    x1, d0
        add     w0, w0, w1
        ret

instead of

f:
        movi    v0.4s, 0
        mov     x3, 0
        .p2align 3,,7
.L2:
        ldr     q2, [x1, x3]
        ldr     q1, [x2, x3]
        add     x3, x3, 16
        sxtl    v4.8h, v2.8b
        sxtl2   v3.8h, v2.16b
        uxtl    v2.8h, v1.8b
        uxtl2   v1.8h, v1.16b
        mul     v2.8h, v2.8h, v4.8h
        mul     v1.8h, v1.8h, v3.8h
        saddw   v0.4s, v0.4s, v2.4h
        saddw2  v0.4s, v0.4s, v2.8h
        saddw   v0.4s, v0.4s, v1.4h
        saddw2  v0.4s, v0.4s, v1.8h
        cmp     x3, 480
        bne     .L2
        addv    s0, v0.4s
        fmov    w1, s0
        add     w0, w0, w1
        ret

and

f:
        mov     x3, 0
        cnth    x5
        mov     w4, 480
        mov     z1.b, #0
        whilelo p0.h, wzr, w4
        ptrue   p2.b, all
        .p2align 3,,7
.L2:
        ld1sb   z2.h, p0/z, [x1, x3]
        punpklo p1.h, p0.b
        ld1b    z0.h, p0/z, [x2, x3]
        add     x3, x3, x5
        mul     z0.h, p2/m, z0.h, z2.h
        sunpklo z2.s, z0.h
        sunpkhi z0.s, z0.h
        add     z1.s, p1/m, z1.s, z2.s
        punpkhi p1.h, p0.b
        whilelo p0.h, w3, w4
        add     z1.s, p1/m, z1.s, z0.s
        b.any   .L2
        uaddv   d0, p2, z1.s
        fmov    x1, d0
        add     w0, w0, w1
        ret
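
For readers who want to check the listings against the source loop, the
following is a scalar sketch of what the usdot-based code computes. It assumes
the usual mixed-sign dot-product semantics (each 32-bit lane accumulates four
unsigned-byte by signed-byte products); `usdot_model`, `f_ref` and `f_dot` are
hypothetical names for illustration, and `f_dot` assumes the element count is
a multiple of 16, as N = 480 is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one mixed-sign dot-product step: each 32-bit accumulator
   lane gains the sum of four unsigned-byte * signed-byte products.  */
static void
usdot_model (int32_t *acc, const uint8_t *u, const int8_t *s, size_t lanes)
{
  assert (acc && u && s);
  for (size_t lane = 0; lane < lanes; ++lane)
    for (size_t k = 0; k < 4; ++k)
      acc[lane] += (int32_t) u[4 * lane + k] * (int32_t) s[4 * lane + k];
}

/* The loop from the testcase, written with fixed-width types.  The product
   of a signed and an unsigned byte lies in [-32640, 32385], so the signed
   short intermediate loses no information.  */
static unsigned
f_ref (unsigned res, const int8_t *a, const uint8_t *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      int av = a[i];
      int bv = b[i];
      short mult = (short) (av * bv);
      res += mult;
    }
  return res;
}

/* The same reduction in the shape of the 128-bit loop: four lanes per
   iteration, 16 bytes consumed, then a horizontal add into RES.  */
static unsigned
f_dot (unsigned res, const int8_t *a, const uint8_t *b, size_t n)
{
  int32_t acc[4] = { 0, 0, 0, 0 };
  for (size_t i = 0; i < n; i += 16)
    usdot_model (acc, b + i, a + i, 4);
  return res + (unsigned) (acc[0] + acc[1] + acc[2] + acc[3]);
}
```

Note that the unsigned array is passed as the first multiplicand and the
signed array as the second, which corresponds to the operand order in the
usdot instructions shown above.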

Bootstrapped and regression-tested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (usdot_prod<vsi2qi>): New.
	* config/aarch64/aarch64-sve.md (@aarch64_<sur>dot_prod<vsi2qi>):
	Rename to...
	(@<sur>dot_prod<vsi2qi>): ...This.
	* config/aarch64/aarch64-sve-builtins-base.cc
	(svusdot_impl::expand): Use it.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/simd/vusdot-autovec.c: New test.
	* gcc.target/aarch64/sve/vusdot-autovec.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 4edee99051c4e2112b546becca47da32aae21df2..c9fb8e702732dd311fb10de17126432e2a63a32b 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -648,6 +648,22 @@ (define_expand "<sur>dot_prod<vsi2qi>"
   DONE;
 })
 
+;; Auto-vectorizer pattern for usdot
+(define_expand "usdot_prod<vsi2qi>"
+  [(set (match_operand:VS 0 "register_operand")
+	(plus:VS (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand")
+			    (match_operand:<VSI2QI> 2 "register_operand")]
+		 UNSPEC_USDOT)
+		(match_operand:VS 3 "register_operand")))]
+  "TARGET_I8MM"
+{
+  emit_insn (
+    gen_aarch64_usdot<vsi2qi> (operands[3], operands[3], operands[1],
+			       operands[2]));
+  emit_move_insn (operands[0], operands[3]);
+  DONE;
+})
+
 ;; These instructions map to the __builtins for the Dot Product
 ;; indexed operations.
 (define_insn "aarch64_<sur>dot_lane<vsi2qi>"
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index dfdf0e2fd186389cbddcff51ef52f8778d7fdb24..50adcd5404e97e610485140fdbfe4c8ebbf2f602 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -2366,7 +2366,7 @@ public:
        Hence we do the same rotation on arguments as svdot_impl does.  */
     e.rotate_inputs_left (0, 3);
     machine_mode mode = e.vector_mode (0);
-    insn_code icode = code_for_aarch64_dot_prod (UNSPEC_USDOT, mode);
+    insn_code icode = code_for_dot_prod (UNSPEC_USDOT, mode);
     return e.use_exact_insn (icode);
   }
 
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 7db2938bb84e04d066a7b07574e5cf344a3a8fb6..1278f6f12fadf8eec693cd47fd545ff3277f08f1 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -6870,7 +6870,7 @@ (define_insn "@aarch64_<sur>dot_prod_lane<vsi2qi>"
   [(set_attr "movprfx" "*,yes")]
 )
 
-(define_insn "@aarch64_<sur>dot_prod<vsi2qi>"
+(define_insn "@<sur>dot_prod<vsi2qi>"
   [(set (match_operand:VNx4SI_ONLY 0 "register_operand" "=w, ?&w")
         (plus:VNx4SI_ONLY
 	  (unspec:VNx4SI_ONLY
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vusdot-autovec.c b/gcc/testsuite/gcc.target/aarch64/simd/vusdot-autovec.c
new file mode 100644
index 0000000000000000000000000000000000000000..b99a945903c043c7410becaf6f09496dd038410d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vusdot-autovec.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8.2-a+i8mm" } */
+
+#define N 480
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
+   SIGNEDNESS_4 char *restrict b)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 short mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+g (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict b,
+   SIGNEDNESS_4 char *restrict a)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 short mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {\tusdot\t} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vusdot-autovec.c b/gcc/testsuite/gcc.target/aarch64/sve/vusdot-autovec.c
new file mode 100644
index 0000000000000000000000000000000000000000..094dd51cea62e0ba05ec3505657bf05320e5fdbb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/vusdot-autovec.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8.2-a+i8mm+sve" } */
+
+#define N 480
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
+   SIGNEDNESS_4 char *restrict b)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 short mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+g (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict b,
+   SIGNEDNESS_4 char *restrict a)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 short mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {\tusdot\t} 2 } } */



* Re: [PATCH 2/4]AArch64: Add support for sign differing dot-product usdot for NEON and SVE.
  2021-05-05 17:38 ` [PATCH 2/4]AArch64: Add support for sign differing dot-product usdot for NEON and SVE Tamar Christina
@ 2021-05-10 16:49   ` Richard Sandiford
  2021-05-25 14:57     ` Tamar Christina
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Sandiford @ 2021-05-10 16:49 UTC (permalink / raw)
  To: Tamar Christina
  Cc: gcc-patches, nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov

Tamar Christina <tamar.christina@arm.com> writes:
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index 4edee99051c4e2112b546becca47da32aae21df2..c9fb8e702732dd311fb10de17126432e2a63a32b 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -648,6 +648,22 @@ (define_expand "<sur>dot_prod<vsi2qi>"
>    DONE;
>  })
>  
> +;; Auto-vectorizer pattern for usdot
> +(define_expand "usdot_prod<vsi2qi>"
> +  [(set (match_operand:VS 0 "register_operand")
> +	(plus:VS (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand")
> +			    (match_operand:<VSI2QI> 2 "register_operand")]
> +		 UNSPEC_USDOT)
> +		(match_operand:VS 3 "register_operand")))]
> +  "TARGET_I8MM"
> +{
> +  emit_insn (
> +    gen_aarch64_usdot<vsi2qi> (operands[3], operands[3], operands[1],
> +			       operands[2]));
> +  emit_move_insn (operands[0], operands[3]);
> +  DONE;
> +})

We can't modify operands[3] here; it's an input rather than an output.

It looks like this would work with just the {…} removed though.
The pattern will match aarch64_usdot<vsi2qi> of its own accord.

Even better would be to rename __builtin_aarch64_usdot… to
__builtin_usdot_prod…, change its arguments so that they line up
with the optabs, and change arm_neon.h to match.
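
To make the two points above concrete, here is a small C analogy rather than
machine-description code. It assumes a one-lane model; `dot_prod_optab_order`
and `usdot_intrinsic_order` are hypothetical names. The first function uses
the optab-style operand order from the expander (result, two multiplicands,
accumulator last) and only reads the accumulator; the second uses the
accumulator-first order of vusdot_s32 (r, a, b).

```c
#include <assert.h>
#include <stdint.h>

/* Optab-style order: result first, the two multiplicands next, the
   accumulator last.  The accumulator is const because it is an input and
   must not be written.  */
static void
dot_prod_optab_order (int32_t *out, const uint8_t u[4], const int8_t s[4],
                      const int32_t *acc)
{
  assert (out && acc);
  int32_t sum = *acc;
  for (int k = 0; k < 4; ++k)
    sum += (int32_t) u[k] * (int32_t) s[k];
  *out = sum;
}

/* Intrinsic-style order: accumulator first.  While the two conventions
   differ, a wrapper like this has to permute the arguments.  */
static int32_t
usdot_intrinsic_order (int32_t r, const uint8_t a[4], const int8_t b[4])
{
  int32_t out;
  dot_prod_optab_order (&out, a, b, &r);
  return out;
}
```

If both entry points shared one operand order, the permuting wrapper would
not be needed, which is the kind of simplification the renaming suggestion
aims for.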

> diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vusdot-autovec.c b/gcc/testsuite/gcc.target/aarch64/simd/vusdot-autovec.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b99a945903c043c7410becaf6f09496dd038410d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/simd/vusdot-autovec.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=armv8.2-a+i8mm" } */
> +
> +#define N 480
> +#define SIGNEDNESS_1 unsigned
> +#define SIGNEDNESS_2 signed
> +#define SIGNEDNESS_3 signed
> +#define SIGNEDNESS_4 unsigned
> +
> +SIGNEDNESS_1 int __attribute__ ((noipa))
> +f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> +   SIGNEDNESS_4 char *restrict b)
> +{
> +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> +    {
> +      int av = a[i];
> +      int bv = b[i];
> +      SIGNEDNESS_2 short mult = av * bv;
> +      res += mult;
> +    }
> +  return res;
> +}
> +
> +SIGNEDNESS_1 int __attribute__ ((noipa))
> +g (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict b,
> +   SIGNEDNESS_4 char *restrict a)
> +{
> +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> +    {
> +      int av = a[i];
> +      int bv = b[i];
> +      SIGNEDNESS_2 short mult = av * bv;
> +      res += mult;
> +    }
> +  return res;
> +}
> +
> +/* { dg-final { scan-assembler-times {\tusdot\t} 2 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vusdot-autovec.c b/gcc/testsuite/gcc.target/aarch64/sve/vusdot-autovec.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..094dd51cea62e0ba05ec3505657bf05320e5fdbb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/vusdot-autovec.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=armv8.2-a+i8mm+sve" } */
> +
> +#define N 480
> +#define SIGNEDNESS_1 unsigned
> +#define SIGNEDNESS_2 signed
> +#define SIGNEDNESS_3 signed
> +#define SIGNEDNESS_4 unsigned
> +
> +SIGNEDNESS_1 int __attribute__ ((noipa))
> +f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> +   SIGNEDNESS_4 char *restrict b)
> +{
> +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> +    {
> +      int av = a[i];
> +      int bv = b[i];
> +      SIGNEDNESS_2 short mult = av * bv;
> +      res += mult;
> +    }
> +  return res;
> +}
> +
> +SIGNEDNESS_1 int __attribute__ ((noipa))
> +g (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict b,
> +   SIGNEDNESS_4 char *restrict a)
> +{
> +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> +    {
> +      int av = a[i];
> +      int bv = b[i];
> +      SIGNEDNESS_2 short mult = av * bv;
> +      res += mult;
> +    }
> +  return res;
> +}
> +
> +/* { dg-final { scan-assembler-times {\tusdot\t} 2 } } */

Guess this is personal preference, but I don't think the SIGNEDNESS_*
macros add anything when used like this.  I remember doing something
similar in the past when including .c files from other .c files(!)
in order to avoid cut-&-paste, but there doesn't seem much benefit
for standalone files like these.

Thanks,
Richard


* RE: [PATCH 2/4]AArch64: Add support for sign differing dot-product usdot for NEON and SVE.
  2021-05-10 16:49   ` Richard Sandiford
@ 2021-05-25 14:57     ` Tamar Christina
  2021-05-26  8:50       ` Richard Sandiford
  0 siblings, 1 reply; 35+ messages in thread
From: Tamar Christina @ 2021-05-25 14:57 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 6337 bytes --]

Hi Richard,

> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Monday, May 10, 2021 5:49 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Subject: Re: [PATCH 2/4]AArch64: Add support for sign differing dot-product
> usdot for NEON and SVE.
> 
> Tamar Christina <tamar.christina@arm.com> writes:
> > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> > index 4edee99051c4e2112b546becca47da32aae21df2..c9fb8e702732dd311fb10de17126432e2a63a32b 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -648,6 +648,22 @@ (define_expand "<sur>dot_prod<vsi2qi>"
> >    DONE;
> >  })
> >
> > +;; Auto-vectorizer pattern for usdot
> > +(define_expand "usdot_prod<vsi2qi>"
> > +  [(set (match_operand:VS 0 "register_operand")
> > +	(plus:VS (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand")
> > +			    (match_operand:<VSI2QI> 2 "register_operand")]
> > +		 UNSPEC_USDOT)
> > +		(match_operand:VS 3 "register_operand")))]
> > +  "TARGET_I8MM"
> > +{
> > +  emit_insn (
> > +    gen_aarch64_usdot<vsi2qi> (operands[3], operands[3], operands[1],
> > +			       operands[2]));
> > +  emit_move_insn (operands[0], operands[3]);
> > +  DONE;
> > +})
> 
> We can't modify operands[3] here; it's an input rather than an output.

Sorry, I should have noticed this. I had blindly copied the existing pattern for dot-product, and that looks like it's wrong too.
I'll send a separate patch to fix that one.

> 
> It looks like this would work with just the {…} removed though.
> The pattern will match aarch64_usdot<vsi2qi> of its own accord.
> 
> Even better would be to rename __builtin_aarch64_usdot… to
> __builtin_usdot_prod…, change its arguments so that they line up with the
> optabs, and change arm_neon.h to match.
> 
> > diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vusdot-autovec.c b/gcc/testsuite/gcc.target/aarch64/simd/vusdot-autovec.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..b99a945903c043c7410becaf6f09496dd038410d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/simd/vusdot-autovec.c
> > @@ -0,0 +1,38 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -march=armv8.2-a+i8mm" } */
> > +
> > +#define N 480
> > +#define SIGNEDNESS_1 unsigned
> > +#define SIGNEDNESS_2 signed
> > +#define SIGNEDNESS_3 signed
> > +#define SIGNEDNESS_4 unsigned
> > +
> > +SIGNEDNESS_1 int __attribute__ ((noipa))
> > +f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> > +   SIGNEDNESS_4 char *restrict b)
> > +{
> > +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > +    {
> > +      int av = a[i];
> > +      int bv = b[i];
> > +      SIGNEDNESS_2 short mult = av * bv;
> > +      res += mult;
> > +    }
> > +  return res;
> > +}
> > +
> > +SIGNEDNESS_1 int __attribute__ ((noipa))
> > +g (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict b,
> > +   SIGNEDNESS_4 char *restrict a)
> > +{
> > +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > +    {
> > +      int av = a[i];
> > +      int bv = b[i];
> > +      SIGNEDNESS_2 short mult = av * bv;
> > +      res += mult;
> > +    }
> > +  return res;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {\tusdot\t} 2 } } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vusdot-autovec.c b/gcc/testsuite/gcc.target/aarch64/sve/vusdot-autovec.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..094dd51cea62e0ba05ec3505657bf05320e5fdbb
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/vusdot-autovec.c
> > @@ -0,0 +1,38 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -march=armv8.2-a+i8mm+sve" } */
> > +
> > +#define N 480
> > +#define SIGNEDNESS_1 unsigned
> > +#define SIGNEDNESS_2 signed
> > +#define SIGNEDNESS_3 signed
> > +#define SIGNEDNESS_4 unsigned
> > +
> > +SIGNEDNESS_1 int __attribute__ ((noipa))
> > +f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> > +   SIGNEDNESS_4 char *restrict b)
> > +{
> > +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > +    {
> > +      int av = a[i];
> > +      int bv = b[i];
> > +      SIGNEDNESS_2 short mult = av * bv;
> > +      res += mult;
> > +    }
> > +  return res;
> > +}
> > +
> > +SIGNEDNESS_1 int __attribute__ ((noipa))
> > +g (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict b,
> > +   SIGNEDNESS_4 char *restrict a)
> > +{
> > +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > +    {
> > +      int av = a[i];
> > +      int bv = b[i];
> > +      SIGNEDNESS_2 short mult = av * bv;
> > +      res += mult;
> > +    }
> > +  return res;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {\tusdot\t} 2 } } */
> 
> Guess this is personal preference, but I don't think the SIGNEDNESS_*
> macros add anything when used like this.  I remember doing something
> similar in the past when including .c files from other .c files(!) in order to
> avoid cut-&-paste, but there doesn't seem much benefit for standalone files
> like these.

If it's all the same to you, I'd prefer to keep this version, since it's identical to the mid-end tests.
Anyone familiar with those tests can then quickly see what this one is testing.

Bootstrapped and regression-tested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (aarch64_usdot<vsi2qi>): Rename to...
	(usdot_prod<vsi2qi>): ... This.
	* config/aarch64/aarch64-simd-builtins.def (usdot): Rename to...
	(usdot_prod): ...This.
	* config/aarch64/arm_neon.h (vusdot_s32, vusdotq_s32): Likewise.
	* config/aarch64/aarch64-sve.md (@aarch64_<sur>dot_prod<vsi2qi>):
	Rename to...
	(@<sur>dot_prod<vsi2qi>): ...This.
	* config/aarch64/aarch64-sve-builtins-base.cc
	(svusdot_impl::expand): Use it.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/simd/vusdot-autovec.c: New test.
	* gcc.target/aarch64/sve/vusdot-autovec.c: New test.

> 
> Thanks,
> Richard

[-- Attachment #2: rb14434.patch --]
[-- Type: text/x-diff; name="rb14434.patch", Size: 6384 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index b885bd5b38bf7ad83eb9d801284bf9b34db17210..c869ed9a6ab7d63f0e3d5fe393a93c1cc9142e78 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -361,10 +361,11 @@
   BUILTIN_VSDQ_I_DI (BINOP, srshl, 0, NONE)
   BUILTIN_VSDQ_I_DI (BINOP_UUS, urshl, 0, NONE)
 
-  /* Implemented by aarch64_<sur><dotprod>{_lane}{q}<dot_mode>.  */
+  /* Implemented by <sur><dotprod>_prod<dot_mode>.  */
   BUILTIN_VB (TERNOP, sdot, 0, NONE)
   BUILTIN_VB (TERNOPU, udot, 0, NONE)
-  BUILTIN_VB (TERNOP_SSUS, usdot, 0, NONE)
+  BUILTIN_VB (TERNOP_SSUS, usdot_prod, 10, NONE)
+  /* Implemented by aarch64_<sur><dotprod>_lane{q}<dot_mode>.  */
   BUILTIN_VB (QUADOP_LANE, sdot_lane, 0, NONE)
   BUILTIN_VB (QUADOPU_LANE, udot_lane, 0, NONE)
   BUILTIN_VB (QUADOP_LANE, sdot_laneq, 0, NONE)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 4edee99051c4e2112b546becca47da32aae21df2..253ddbe25d3a86af4b40b056132e6a86a0392ea6 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -601,7 +601,7 @@ (define_insn "aarch64_<sur>dot<vsi2qi>"
 
 ;; These instructions map to the __builtins for the armv8.6a I8MM usdot
 ;; (vector) Dot Product operation.
-(define_insn "aarch64_usdot<vsi2qi>"
+(define_insn "usdot_prod<vsi2qi>"
   [(set (match_operand:VS 0 "register_operand" "=w")
 	(plus:VS
 	  (unspec:VS [(match_operand:<VSI2QI> 2 "register_operand" "w")
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index dfdf0e2fd186389cbddcff51ef52f8778d7fdb24..50adcd5404e97e610485140fdbfe4c8ebbf2f602 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -2366,7 +2366,7 @@ public:
        Hence we do the same rotation on arguments as svdot_impl does.  */
     e.rotate_inputs_left (0, 3);
     machine_mode mode = e.vector_mode (0);
-    insn_code icode = code_for_aarch64_dot_prod (UNSPEC_USDOT, mode);
+    insn_code icode = code_for_dot_prod (UNSPEC_USDOT, mode);
     return e.use_exact_insn (icode);
   }
 
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 7db2938bb84e04d066a7b07574e5cf344a3a8fb6..1278f6f12fadf8eec693cd47fd545ff3277f08f1 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -6870,7 +6870,7 @@ (define_insn "@aarch64_<sur>dot_prod_lane<vsi2qi>"
   [(set_attr "movprfx" "*,yes")]
 )
 
-(define_insn "@aarch64_<sur>dot_prod<vsi2qi>"
+(define_insn "@<sur>dot_prod<vsi2qi>"
   [(set (match_operand:VNx4SI_ONLY 0 "register_operand" "=w, ?&w")
         (plus:VNx4SI_ONLY
 	  (unspec:VNx4SI_ONLY
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index baa30bd5a9d96c1bf04a37fb105091ea56a6444a..373f06a24ea6ce686d7e0cdf53dd364041c61092 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -34384,14 +34384,14 @@ __extension__ extern __inline int32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b)
 {
-  return __builtin_aarch64_usdotv8qi_ssus (__r, __a, __b);
+  return __builtin_aarch64_usdot_prodv8qi_ssus (__r, __a, __b);
 }
 
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vusdotq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
 {
-  return __builtin_aarch64_usdotv16qi_ssus (__r, __a, __b);
+  return __builtin_aarch64_usdot_prodv16qi_ssus (__r, __a, __b);
 }
 
 __extension__ extern __inline int32x2_t
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vusdot-autovec.c b/gcc/testsuite/gcc.target/aarch64/simd/vusdot-autovec.c
new file mode 100644
index 0000000000000000000000000000000000000000..b99a945903c043c7410becaf6f09496dd038410d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vusdot-autovec.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8.2-a+i8mm" } */
+
+#define N 480
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
+   SIGNEDNESS_4 char *restrict b)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 short mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+g (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict b,
+   SIGNEDNESS_4 char *restrict a)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 short mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {\tusdot\t} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vusdot-autovec.c b/gcc/testsuite/gcc.target/aarch64/sve/vusdot-autovec.c
new file mode 100644
index 0000000000000000000000000000000000000000..094dd51cea62e0ba05ec3505657bf05320e5fdbb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/vusdot-autovec.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8.2-a+i8mm+sve" } */
+
+#define N 480
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
+   SIGNEDNESS_4 char *restrict b)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 short mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+g (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict b,
+   SIGNEDNESS_4 char *restrict a)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 short mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {\tusdot\t} 2 } } */



* Re: [PATCH 2/4]AArch64: Add support for sign differing dot-product usdot for NEON and SVE.
  2021-05-25 14:57     ` Tamar Christina
@ 2021-05-26  8:50       ` Richard Sandiford
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Sandiford @ 2021-05-26  8:50 UTC (permalink / raw)
  To: Tamar Christina
  Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov

Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Monday, May 10, 2021 5:49 PM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
>> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
>> <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
>> Subject: Re: [PATCH 2/4]AArch64: Add support for sign differing dot-product
>> usdot for NEON and SVE.
>>
>> Tamar Christina <tamar.christina@arm.com> writes:
>> > diff --git a/gcc/config/aarch64/aarch64-simd.md
>> > b/gcc/config/aarch64/aarch64-simd.md
>> > index
>> >
>> 4edee99051c4e2112b546becca47da32aae21df2..c9fb8e702732dd311fb10de1
>> 7126
>> > 432e2a63a32b 100644
>> > --- a/gcc/config/aarch64/aarch64-simd.md
>> > +++ b/gcc/config/aarch64/aarch64-simd.md
>> > @@ -648,6 +648,22 @@ (define_expand "<sur>dot_prod<vsi2qi>"
>> >    DONE;
>> >  })
>> >
>> > +;; Auto-vectorizer pattern for usdot
>> > +(define_expand "usdot_prod<vsi2qi>"
>> > +  [(set (match_operand:VS 0 "register_operand")
>> > +   (plus:VS (unspec:VS [(match_operand:<VSI2QI> 1
>> "register_operand")
>> > +                       (match_operand:<VSI2QI> 2 "register_operand")]
>> > +            UNSPEC_USDOT)
>> > +           (match_operand:VS 3 "register_operand")))]
>> > +  "TARGET_I8MM"
>> > +{
>> > +  emit_insn (
>> > +    gen_aarch64_usdot<vsi2qi> (operands[3], operands[3], operands[1],
>> > +                          operands[2]));
>> > +  emit_move_insn (operands[0], operands[3]);
>> > +  DONE;
>> > +})
>>
>> We can't modify operands[3] here; it's an input rather than an output.
>
> Sorry, I should have noticed this.  I had blindly copied the existing pattern for dot-product, and that pattern looks wrong as well.
> I'll send a different patch to fix that one.
>
>>
>> It looks like this would work with just the {…} removed though.
>> The pattern will match aarch64_usdot<vsi2qi> on its own accord.
>>
>> Even better would be to rename __builtin_aarch64_usdot… to
>> __builtin_usdot_prod…, change its arguments so that they line up with the
>> optabs, and change arm_neon.h to match.
>>
>> > diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vusdot-autovec.c
>> > b/gcc/testsuite/gcc.target/aarch64/simd/vusdot-autovec.c
>> > new file mode 100644
>> > index
>> >
>> 0000000000000000000000000000000000000000..b99a945903c043c7410becaf6f
>> 09
>> > 496dd038410d
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/aarch64/simd/vusdot-autovec.c
>> > @@ -0,0 +1,38 @@
>> > +/* { dg-do compile } */
>> > +/* { dg-options "-O3 -march=armv8.2-a+i8mm" } */
>> > +
>> > +#define N 480
>> > +#define SIGNEDNESS_1 unsigned
>> > +#define SIGNEDNESS_2 signed
>> > +#define SIGNEDNESS_3 signed
>> > +#define SIGNEDNESS_4 unsigned
>> > +
>> > +SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res,
>> > +SIGNEDNESS_3 char *restrict a,
>> > +   SIGNEDNESS_4 char *restrict b)
>> > +{
>> > +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
>> > +    {
>> > +      int av = a[i];
>> > +      int bv = b[i];
>> > +      SIGNEDNESS_2 short mult = av * bv;
>> > +      res += mult;
>> > +    }
>> > +  return res;
>> > +}
>> > +
>> > +SIGNEDNESS_1 int __attribute__ ((noipa)) g (SIGNEDNESS_1 int res,
>> > +SIGNEDNESS_3 char *restrict b,
>> > +   SIGNEDNESS_4 char *restrict a)
>> > +{
>> > +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
>> > +    {
>> > +      int av = a[i];
>> > +      int bv = b[i];
>> > +      SIGNEDNESS_2 short mult = av * bv;
>> > +      res += mult;
>> > +    }
>> > +  return res;
>> > +}
>> > +
>> > +/* { dg-final { scan-assembler-times {\tusdot\t} 2 } } */
>> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vusdot-autovec.c
>> > b/gcc/testsuite/gcc.target/aarch64/sve/vusdot-autovec.c
>> > new file mode 100644
>> > index
>> >
>> 0000000000000000000000000000000000000000..094dd51cea62e0ba05ec35056
>> 57b
>> > f05320e5fdbb
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/vusdot-autovec.c
>> > @@ -0,0 +1,38 @@
>> > +/* { dg-do compile } */
>> > +/* { dg-options "-O3 -march=armv8.2-a+i8mm+sve" } */
>> > +
>> > +#define N 480
>> > +#define SIGNEDNESS_1 unsigned
>> > +#define SIGNEDNESS_2 signed
>> > +#define SIGNEDNESS_3 signed
>> > +#define SIGNEDNESS_4 unsigned
>> > +
>> > +SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res,
>> > +SIGNEDNESS_3 char *restrict a,
>> > +   SIGNEDNESS_4 char *restrict b)
>> > +{
>> > +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
>> > +    {
>> > +      int av = a[i];
>> > +      int bv = b[i];
>> > +      SIGNEDNESS_2 short mult = av * bv;
>> > +      res += mult;
>> > +    }
>> > +  return res;
>> > +}
>> > +
>> > +SIGNEDNESS_1 int __attribute__ ((noipa)) g (SIGNEDNESS_1 int res,
>> > +SIGNEDNESS_3 char *restrict b,
>> > +   SIGNEDNESS_4 char *restrict a)
>> > +{
>> > +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
>> > +    {
>> > +      int av = a[i];
>> > +      int bv = b[i];
>> > +      SIGNEDNESS_2 short mult = av * bv;
>> > +      res += mult;
>> > +    }
>> > +  return res;
>> > +}
>> > +
>> > +/* { dg-final { scan-assembler-times {\tusdot\t} 2 } } */
>>
>> Guess this is personal preference, but I don't think the SIGNEDNESS_*
>> macros add anything when used like this.  I remember doing something
>> similar in the past when including .c files from other .c files(!) in order to
>> avoid cut-&-paste, but there doesn't seem much benefit for standalone files
>> like these.
>
> If it's the same to you, I do prefer this version, since it's identical to the mid-end tests.
> It lets anyone familiar with those tests quickly see what is being tested.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>         * config/aarch64/aarch64-simd.md (aarch64_usdot<vsi2qi>): Rename to...
>         (usdot_prod<vsi2qi>): ... This.
>         * config/aarch64/aarch64-simd-builtins.def (usdot): Rename to...
>         (usdot_prod): ...This.
>         * config/aarch64/arm_neon.h (vusdot_s32, vusdotq_s32): Likewise.
>         * config/aarch64/aarch64-sve.md (@aarch64_<sur>dot_prod<vsi2qi>):
>         Rename to...
>         (@<sur>dot_prod<vsi2qi>): ...This.
>         * config/aarch64/aarch64-sve-builtins-base.cc
>         (svusdot_impl::expand): Use it.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/simd/vusdot-autovec.c: New test.
>         * gcc.target/aarch64/sve/vusdot-autovec.c: New test.

OK, thanks.

Richard


* [PATCH 3/4][AArch32]: Add support for sign differing dot-product usdot for NEON.
  2021-05-05 17:38 [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes Tamar Christina
  2021-05-05 17:38 ` [PATCH 2/4]AArch64: Add support for sign differing dot-product usdot for NEON and SVE Tamar Christina
@ 2021-05-05 17:39 ` Tamar Christina
  2021-05-05 17:42   ` FW: " Tamar Christina
  2021-05-06  9:23   ` Christophe Lyon
  2021-05-05 17:39 ` [PATCH 4/4]middle-end: Add tests middle end generic tests for sign differing dotproduct Tamar Christina
  2021-05-07 11:45 ` [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes Richard Biener
  3 siblings, 2 replies; 35+ messages in thread
From: Tamar Christina @ 2021-05-05 17:39 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd

[-- Attachment #1: Type: text/plain, Size: 4164 bytes --]

Hi All,

This adds optabs implementing usdot_prod.

The following testcase:

#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned

SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
   SIGNEDNESS_4 char *restrict b)
{
  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
    {
      int av = a[i];
      int bv = b[i];
      SIGNEDNESS_2 short mult = av * bv;
      res += mult;
    }
  return res;
}

Generates

f:
        vmov.i32        q8, #0  @ v4si
        add     r3, r2, #480
.L2:
        vld1.8  {q10}, [r2]!
        vld1.8  {q9}, [r1]!
        vusdot.s8       q8, q9, q10
        cmp     r3, r2
        bne     .L2
        vadd.i32        d16, d16, d17
        vpadd.i32       d16, d16, d16
        vmov.32 r3, d16[0]
        add     r0, r0, r3
        bx      lr

instead of

f:
        vmov.i32        q8, #0  @ v4si
        add     r3, r2, #480
.L2:
        vld1.8  {q9}, [r2]!
        vld1.8  {q11}, [r1]!
        cmp     r3, r2
        vmull.s8 q10, d18, d22
        vmull.s8 q9, d19, d23
        vaddw.s16       q8, q8, d20
        vaddw.s16       q8, q8, d21
        vaddw.s16       q8, q8, d18
        vaddw.s16       q8, q8, d19
        bne     .L2
        vadd.i32        d16, d16, d17
        vpadd.i32       d16, d16, d16
        vmov.32 r3, d16[0]
        add     r0, r0, r3
        bx      lr

For NEON.  I couldn't figure out whether the MVE instruction vmlaldav.s16 could
be used to emulate this.  Because it would require additional widening to work,
I left MVE out of this patch set, but perhaps someone should take a look.
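
As a rough scalar sketch of what the vectorized loop is expected to compute
(an illustration only, not part of the patch): the helper names usdot_model,
dot_reference and dot_via_model are made up for this note, and the layout of
four 32-bit accumulator lanes over 16 bytes is an assumption modelled on the
q-register form shown above.  Each unsigned * signed byte product lies in
[-32640, 32385], so it fits in a signed short and the intermediate conversion
in the testcase does not change the value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of one mixed-sign dot-product step: every
   32-bit accumulator lane gains the sum of four (unsigned byte * signed
   byte) products, with both bytes widened to 32 bits before multiplying.
   Accumulation is done in uint32_t so that wraparound is well defined.  */
static void
usdot_model (uint32_t *acc, const uint8_t *u, const int8_t *s, size_t nbytes)
{
  assert (nbytes % 4 == 0);
  for (size_t i = 0; i < nbytes; ++i)
    acc[i / 4] += (uint32_t) ((int32_t) u[i] * (int32_t) s[i]);
}

/* The loop from the testcase, written with fixed-width types.  */
static uint32_t
dot_reference (uint32_t res, const int8_t *a, const uint8_t *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      int av = a[i];
      int bv = b[i];
      short mult = (short) (av * bv);  /* Value-preserving, see above.  */
      res += (uint32_t) mult;
    }
  return res;
}

/* The same sum computed 16 bytes at a time through the model, followed by
   a horizontal reduction of the four lanes into the scalar result.  */
static uint32_t
dot_via_model (uint32_t res, const int8_t *a, const uint8_t *b, size_t n)
{
  uint32_t acc[4] = { 0, 0, 0, 0 };
  assert (n % 16 == 0);
  for (size_t i = 0; i < n; i += 16)
    usdot_model (acc, b + i, a + i, 16);
  for (int lane = 0; lane < 4; ++lane)
    res += acc[lane];
  return res;
}
```

Both functions should agree for any n that is a multiple of 16, such as the
N = 480 used in the testcase.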

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/arm/neon.md (usdot_prod<vsi2qi>): New.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/simd/vusdot-autovec.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index fec2cc91d24b6eff7b6fc8fdd54f39b3d646c468..23ad411178db77c5d19bee7452bc1070331c1aa0 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -3075,6 +3075,24 @@ (define_expand "<sup>dot_prod<vsi2qi>"
   DONE;
 })
 
+;; Auto-vectorizer pattern for usdot
+(define_expand "usdot_prod<vsi2qi>"
+  [(set (match_operand:VCVTI 0 "register_operand")
+	(plus:VCVTI (unspec:VCVTI [(match_operand:<VSI2QI> 1
+							"register_operand")
+				   (match_operand:<VSI2QI> 2
+							"register_operand")]
+		     UNSPEC_DOT_US)
+		    (match_operand:VCVTI 3 "register_operand")))]
+  "TARGET_I8MM"
+{
+  emit_insn (
+    gen_neon_usdot<vsi2qi> (operands[3], operands[3], operands[1],
+			    operands[2]));
+  emit_insn (gen_rtx_SET (operands[0], operands[3]));
+  DONE;
+})
+
 (define_expand "neon_copysignf<mode>"
   [(match_operand:VCVTF 0 "register_operand")
    (match_operand:VCVTF 1 "register_operand")
diff --git a/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
new file mode 100644
index 0000000000000000000000000000000000000000..7cc56f68817d77d6950df0ab372d6fbaad6b3813
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8.2-a+i8mm" } */
+
+#define N 480
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
+   SIGNEDNESS_4 char *restrict b)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 short mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+g (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict b,
+   SIGNEDNESS_4 char *restrict a)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 short mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {vusdot.s8} 2 { target { arm-*-*-gnueabihf } } } } */


-- 




* FW: [PATCH 3/4][AArch32]: Add support for sign differing dot-product usdot for NEON.
  2021-05-05 17:39 ` [PATCH 3/4][AArch32]: Add support for sign differing dot-product usdot for NEON Tamar Christina
@ 2021-05-05 17:42   ` Tamar Christina
       [not found]     ` <VI1PR08MB5325B832EE3BB6139886C0E9FF259@VI1PR08MB5325.eurprd08.prod.outlook.com>
  2021-05-06  9:23   ` Christophe Lyon
  1 sibling, 1 reply; 35+ messages in thread
From: Tamar Christina @ 2021-05-05 17:42 UTC (permalink / raw)
  To: gcc Patches
  Cc: nickc, nd, Richard Earnshaw, Ramana Radhakrishnan, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 4614 bytes --]

Forgot to CC maintainers.

-----Original Message-----
From: Tamar Christina <tamar.christina@arm.com> 
Sent: Wednesday, May 5, 2021 6:39 PM
To: gcc-patches@gcc.gnu.org
Cc: nd <nd@arm.com>
Subject: [PATCH 3/4][AArch32]: Add support for sign differing dot-product usdot for NEON. 

Hi All,

This adds optabs implementing usdot_prod.

The following testcase:

#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned

SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
   SIGNEDNESS_4 char *restrict b)
{
  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
    {
      int av = a[i];
      int bv = b[i];
      SIGNEDNESS_2 short mult = av * bv;
      res += mult;
    }
  return res;
}

Generates

f:
        vmov.i32        q8, #0  @ v4si
        add     r3, r2, #480
.L2:
        vld1.8  {q10}, [r2]!
        vld1.8  {q9}, [r1]!
        vusdot.s8       q8, q9, q10
        cmp     r3, r2
        bne     .L2
        vadd.i32        d16, d16, d17
        vpadd.i32       d16, d16, d16
        vmov.32 r3, d16[0]
        add     r0, r0, r3
        bx      lr

instead of

f:
        vmov.i32        q8, #0  @ v4si
        add     r3, r2, #480
.L2:
        vld1.8  {q9}, [r2]!
        vld1.8  {q11}, [r1]!
        cmp     r3, r2
        vmull.s8 q10, d18, d22
        vmull.s8 q9, d19, d23
        vaddw.s16       q8, q8, d20
        vaddw.s16       q8, q8, d21
        vaddw.s16       q8, q8, d18
        vaddw.s16       q8, q8, d19
        bne     .L2
        vadd.i32        d16, d16, d17
        vpadd.i32       d16, d16, d16
        vmov.32 r3, d16[0]
        add     r0, r0, r3
        bx      lr

For NEON.  I couldn't figure out whether the MVE instruction vmlaldav.s16 could be used to emulate this.  Because it would require additional widening to work, I left MVE out of this patch set, but perhaps someone should take a look.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/arm/neon.md (usdot_prod<vsi2qi>): New.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/simd/vusdot-autovec.c: New test.

--- inline copy of patch --
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index fec2cc91d24b6eff7b6fc8fdd54f39b3d646c468..23ad411178db77c5d19bee7452bc1070331c1aa0 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -3075,6 +3075,24 @@ (define_expand "<sup>dot_prod<vsi2qi>"
   DONE;
 })
 
+;; Auto-vectorizer pattern for usdot
+(define_expand "usdot_prod<vsi2qi>"
+  [(set (match_operand:VCVTI 0 "register_operand")
+	(plus:VCVTI (unspec:VCVTI [(match_operand:<VSI2QI> 1
+							"register_operand")
+				   (match_operand:<VSI2QI> 2
+							"register_operand")]
+		     UNSPEC_DOT_US)
+		    (match_operand:VCVTI 3 "register_operand")))]
+  "TARGET_I8MM"
+{
+  emit_insn (
+    gen_neon_usdot<vsi2qi> (operands[3], operands[3], operands[1],
+			    operands[2]));
+  emit_insn (gen_rtx_SET (operands[0], operands[3]));
+  DONE;
+})
+
 (define_expand "neon_copysignf<mode>"
   [(match_operand:VCVTF 0 "register_operand")
    (match_operand:VCVTF 1 "register_operand")
diff --git a/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
new file mode 100644
index 0000000000000000000000000000000000000000..7cc56f68817d77d6950df0ab372d6fbaad6b3813
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8.2-a+i8mm" } */
+
+#define N 480
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
+   SIGNEDNESS_4 char *restrict b)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 short mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+g (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict b,
+   SIGNEDNESS_4 char *restrict a)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 short mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {vusdot.s8} 2 { target { arm-*-*-gnueabihf } } } } */


-- 




[parent not found: <VI1PR08MB5325B832EE3BB6139886C0E9FF259@VI1PR08MB5325.eurprd08.prod.outlook.com>]

* RE: [PATCH 3/4][AArch32]: Add support for sign differing dot-product usdot for NEON.
       [not found]     ` <VI1PR08MB5325B832EE3BB6139886C0E9FF259@VI1PR08MB5325.eurprd08.prod.outlook.com>
@ 2021-05-25 15:02       ` Tamar Christina
  2021-05-26 10:45         ` Kyrylo Tkachov
  0 siblings, 1 reply; 35+ messages in thread
From: Tamar Christina @ 2021-05-25 15:02 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, nd, Ramana Radhakrishnan, Kyrylo Tkachov

Forgot to include the list

> -----Original Message-----
> From: Tamar Christina
> Sent: Tuesday, May 25, 2021 3:57 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Richard Earnshaw <Richard.Earnshaw@arm.com>; nd <nd@arm.com>;
> Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>
> Subject: RE: [PATCH 3/4][AArch32]: Add support for sign differing dot-
> product usdot for NEON.
> 
> Hi All,
> 
> This is a respin based on the feedback from the AArch64 review.
> 
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* config/arm/neon.md (usdot_prod<vsi2qi>): New.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/simd/vusdot-autovec.c: New test.
> 
> > -----Original Message-----
> > From: Gcc-patches <gcc-patches-bounces@gcc.gnu.org> On Behalf Of
> Tamar
> > Christina via Gcc-patches
> > Sent: Wednesday, May 5, 2021 6:42 PM
> > To: gcc Patches <gcc-patches@gcc.gnu.org>
> > Cc: Richard Earnshaw <Richard.Earnshaw@arm.com>; nd <nd@arm.com>;
> > Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>
> > Subject: FW: [PATCH 3/4][AArch32]: Add support for sign differing dot-
> > product usdot for NEON.
> >
> > Forgot to CC maintainers..
> >
> > -----Original Message-----
> > From: Tamar Christina <tamar.christina@arm.com>
> > Sent: Wednesday, May 5, 2021 6:39 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd <nd@arm.com>
> > Subject: [PATCH 3/4][AArch32]: Add support for sign differing
> > dot-product usdot for NEON.
> >
> > Hi All,
> >
> > This adds optabs implementing usdot_prod.
> >
> > The following testcase:
> >
> > #define N 480
> > #define SIGNEDNESS_1 unsigned
> > #define SIGNEDNESS_2 signed
> > #define SIGNEDNESS_3 signed
> > #define SIGNEDNESS_4 unsigned
> >
> > SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res,
> > SIGNEDNESS_3 char *restrict a,
> >    SIGNEDNESS_4 char *restrict b)
> > {
> >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> >     {
> >       int av = a[i];
> >       int bv = b[i];
> >       SIGNEDNESS_2 short mult = av * bv;
> >       res += mult;
> >     }
> >   return res;
> > }
> >
> > Generates
> >
> > f:
> >         vmov.i32        q8, #0  @ v4si
> >         add     r3, r2, #480
> > .L2:
> >         vld1.8  {q10}, [r2]!
> >         vld1.8  {q9}, [r1]!
> >         vusdot.s8       q8, q9, q10
> >         cmp     r3, r2
> >         bne     .L2
> >         vadd.i32        d16, d16, d17
> >         vpadd.i32       d16, d16, d16
> >         vmov.32 r3, d16[0]
> >         add     r0, r0, r3
> >         bx      lr
> >
> > instead of
> >
> > f:
> >         vmov.i32        q8, #0  @ v4si
> >         add     r3, r2, #480
> > .L2:
> >         vld1.8  {q9}, [r2]!
> >         vld1.8  {q11}, [r1]!
> >         cmp     r3, r2
> >         vmull.s8 q10, d18, d22
> >         vmull.s8 q9, d19, d23
> >         vaddw.s16       q8, q8, d20
> >         vaddw.s16       q8, q8, d21
> >         vaddw.s16       q8, q8, d18
> >         vaddw.s16       q8, q8, d19
> >         bne     .L2
> >         vadd.i32        d16, d16, d17
> >         vpadd.i32       d16, d16, d16
> >         vmov.32 r3, d16[0]
> >         add     r0, r0, r3
> >         bx      lr
> >
> > For NEON.  I couldn't figure out if the MVE instruction vmlaldav.s16
> > could be used to emulate this.  Because it would require additional
> > widening to work I left MVE out of this patch set but perhaps someone
> should take a look.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* config/arm/neon.md (usdot_prod<vsi2qi>): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	* gcc.target/arm/simd/vusdot-autovec.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index
> >
> fec2cc91d24b6eff7b6fc8fdd54f39b3d646c468..23ad411178db77c5d19bee7452
> > bc1070331c1aa0 100644
> > --- a/gcc/config/arm/neon.md
> > +++ b/gcc/config/arm/neon.md
> > @@ -3075,6 +3075,24 @@ (define_expand "<sup>dot_prod<vsi2qi>"
> >    DONE;
> >  })
> >
> > +;; Auto-vectorizer pattern for usdot
> > +(define_expand "usdot_prod<vsi2qi>"
> > +  [(set (match_operand:VCVTI 0 "register_operand")
> > +	(plus:VCVTI (unspec:VCVTI [(match_operand:<VSI2QI> 1
> > +							"register_operand")
> > +				   (match_operand:<VSI2QI> 2
> > +							"register_operand")]
> > +		     UNSPEC_DOT_US)
> > +		    (match_operand:VCVTI 3 "register_operand")))]
> > +  "TARGET_I8MM"
> > +{
> > +  emit_insn (
> > +    gen_neon_usdot<vsi2qi> (operands[3], operands[3], operands[1],
> > +			    operands[2]));
> > +  emit_insn (gen_rtx_SET (operands[0], operands[3]));
> > +  DONE;
> > +})
> > +
> >  (define_expand "neon_copysignf<mode>"
> >    [(match_operand:VCVTF 0 "register_operand")
> >     (match_operand:VCVTF 1 "register_operand") diff --git
> > a/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
> > b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
> > new file mode 100644
> > index
> >
> 0000000000000000000000000000000000000000..7cc56f68817d77d6950df0ab37
> > 2d6fbaad6b3813
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
> > @@ -0,0 +1,38 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -march=armv8.2-a+i8mm" } */
> > +
> > +#define N 480
> > +#define SIGNEDNESS_1 unsigned
> > +#define SIGNEDNESS_2 signed
> > +#define SIGNEDNESS_3 signed
> > +#define SIGNEDNESS_4 unsigned
> > +
> > +SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res,
> > +SIGNEDNESS_3 char *restrict a,
> > +   SIGNEDNESS_4 char *restrict b)
> > +{
> > +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > +    {
> > +      int av = a[i];
> > +      int bv = b[i];
> > +      SIGNEDNESS_2 short mult = av * bv;
> > +      res += mult;
> > +    }
> > +  return res;
> > +}
> > +
> > +SIGNEDNESS_1 int __attribute__ ((noipa)) g (SIGNEDNESS_1 int res,
> > +SIGNEDNESS_3 char *restrict b,
> > +   SIGNEDNESS_4 char *restrict a)
> > +{
> > +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > +    {
> > +      int av = a[i];
> > +      int bv = b[i];
> > +      SIGNEDNESS_2 short mult = av * bv;
> > +      res += mult;
> > +    }
> > +  return res;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vusdot.s8} 2 { target { arm-*-*-gnueabihf } } } } */
> >
> >
> > --

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH 3/4][AArch32]: Add support for sign differing dot-product usdot for NEON.
  2021-05-25 15:02       ` Tamar Christina
@ 2021-05-26 10:45         ` Kyrylo Tkachov
  0 siblings, 0 replies; 35+ messages in thread
From: Kyrylo Tkachov @ 2021-05-26 10:45 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches; +Cc: Richard Earnshaw, nd, Ramana Radhakrishnan



> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: 25 May 2021 16:02
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw <Richard.Earnshaw@arm.com>; nd <nd@arm.com>;
> Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>; Kyrylo
> Tkachov <Kyrylo.Tkachov@arm.com>
> Subject: RE: [PATCH 3/4][AArch32]: Add support for sign differing dot-
> product usdot for NEON.
> 
> Forgot to include the list
> 
> > -----Original Message-----
> > From: Tamar Christina
> > Sent: Tuesday, May 25, 2021 3:57 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: Richard Earnshaw <Richard.Earnshaw@arm.com>; nd <nd@arm.com>;
> > Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>; Kyrylo Tkachov
> > <Kyrylo.Tkachov@arm.com>
> > Subject: RE: [PATCH 3/4][AArch32]: Add support for sign differing dot-
> > product usdot for NEON.
> >
> > Hi All,
> >
> > This is a respin based on the feedback gotten from the AArch64 review.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >

Ok.
Thanks,
Kyrill

> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* config/arm/neon.md (usdot_prod<vsi2qi>): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	* gcc.target/arm/simd/vusdot-autovec.c: New test.
> >
> > > -----Original Message-----
> > > From: Gcc-patches <gcc-patches-bounces@gcc.gnu.org> On Behalf Of Tamar
> > > Christina via Gcc-patches
> > > Sent: Wednesday, May 5, 2021 6:42 PM
> > > To: gcc Patches <gcc-patches@gcc.gnu.org>
> > > Cc: Richard Earnshaw <Richard.Earnshaw@arm.com>; nd <nd@arm.com>;
> > > Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>
> > > Subject: FW: [PATCH 3/4][AArch32]: Add support for sign differing dot-
> > > product usdot for NEON.
> > >
> > > Forgot to CC maintainers..
> > >
> > > -----Original Message-----
> > > From: Tamar Christina <tamar.christina@arm.com>
> > > Sent: Wednesday, May 5, 2021 6:39 PM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: nd <nd@arm.com>
> > > Subject: [PATCH 3/4][AArch32]: Add support for sign differing
> > > dot-product usdot for NEON.
> > >
> > > Hi All,
> > >
> > > This adds optabs implementing usdot_prod.
> > >
> > > The following testcase:
> > >
> > > #define N 480
> > > #define SIGNEDNESS_1 unsigned
> > > #define SIGNEDNESS_2 signed
> > > #define SIGNEDNESS_3 signed
> > > #define SIGNEDNESS_4 unsigned
> > >
> > > SIGNEDNESS_1 int __attribute__ ((noipa))
> > > f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> > >    SIGNEDNESS_4 char *restrict b)
> > > {
> > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > >     {
> > >       int av = a[i];
> > >       int bv = b[i];
> > >       SIGNEDNESS_2 short mult = av * bv;
> > >       res += mult;
> > >     }
> > >   return res;
> > > }
> > >
> > > Generates
> > >
> > > f:
> > >         vmov.i32        q8, #0  @ v4si
> > >         add     r3, r2, #480
> > > .L2:
> > >         vld1.8  {q10}, [r2]!
> > >         vld1.8  {q9}, [r1]!
> > >         vusdot.s8       q8, q9, q10
> > >         cmp     r3, r2
> > >         bne     .L2
> > >         vadd.i32        d16, d16, d17
> > >         vpadd.i32       d16, d16, d16
> > >         vmov.32 r3, d16[0]
> > >         add     r0, r0, r3
> > >         bx      lr
> > >
> > > instead of
> > >
> > > f:
> > >         vmov.i32        q8, #0  @ v4si
> > >         add     r3, r2, #480
> > > .L2:
> > >         vld1.8  {q9}, [r2]!
> > >         vld1.8  {q11}, [r1]!
> > >         cmp     r3, r2
> > >         vmull.s8 q10, d18, d22
> > >         vmull.s8 q9, d19, d23
> > >         vaddw.s16       q8, q8, d20
> > >         vaddw.s16       q8, q8, d21
> > >         vaddw.s16       q8, q8, d18
> > >         vaddw.s16       q8, q8, d19
> > >         bne     .L2
> > >         vadd.i32        d16, d16, d17
> > >         vpadd.i32       d16, d16, d16
> > >         vmov.32 r3, d16[0]
> > >         add     r0, r0, r3
> > >         bx      lr
> > >
> > > For NEON.  I couldn't figure out if the MVE instruction vmlaldav.s16
> > > could be used to emulate this.  Because it would require additional
> > > widening to work I left MVE out of this patch set but perhaps someone
> > > should take a look.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* config/arm/neon.md (usdot_prod<vsi2qi>): New.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > 	* gcc.target/arm/simd/vusdot-autovec.c: New test.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> > > index fec2cc91d24b6eff7b6fc8fdd54f39b3d646c468..23ad411178db77c5d19bee7452bc1070331c1aa0 100644
> > > --- a/gcc/config/arm/neon.md
> > > +++ b/gcc/config/arm/neon.md
> > > @@ -3075,6 +3075,24 @@ (define_expand "<sup>dot_prod<vsi2qi>"
> > >    DONE;
> > >  })
> > >
> > > +;; Auto-vectorizer pattern for usdot
> > > +(define_expand "usdot_prod<vsi2qi>"
> > > +  [(set (match_operand:VCVTI 0 "register_operand")
> > > +	(plus:VCVTI (unspec:VCVTI [(match_operand:<VSI2QI> 1
> > > +							"register_operand")
> > > +				   (match_operand:<VSI2QI> 2
> > > +							"register_operand")]
> > > +		     UNSPEC_DOT_US)
> > > +		    (match_operand:VCVTI 3 "register_operand")))]
> > > +  "TARGET_I8MM"
> > > +{
> > > +  emit_insn (
> > > +    gen_neon_usdot<vsi2qi> (operands[3], operands[3], operands[1],
> > > +			    operands[2]));
> > > +  emit_insn (gen_rtx_SET (operands[0], operands[3]));
> > > +  DONE;
> > > +})
> > > +
> > >  (define_expand "neon_copysignf<mode>"
> > >    [(match_operand:VCVTF 0 "register_operand")
> > >     (match_operand:VCVTF 1 "register_operand")
> > > diff --git a/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..7cc56f68817d77d6950df0ab372d6fbaad6b3813
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
> > > @@ -0,0 +1,38 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O3 -march=armv8.2-a+i8mm" } */
> > > +
> > > +#define N 480
> > > +#define SIGNEDNESS_1 unsigned
> > > +#define SIGNEDNESS_2 signed
> > > +#define SIGNEDNESS_3 signed
> > > +#define SIGNEDNESS_4 unsigned
> > > +
> > > +SIGNEDNESS_1 int __attribute__ ((noipa))
> > > +f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> > > +   SIGNEDNESS_4 char *restrict b)
> > > +{
> > > +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > +    {
> > > +      int av = a[i];
> > > +      int bv = b[i];
> > > +      SIGNEDNESS_2 short mult = av * bv;
> > > +      res += mult;
> > > +    }
> > > +  return res;
> > > +}
> > > +
> > > +SIGNEDNESS_1 int __attribute__ ((noipa))
> > > +g (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict b,
> > > +   SIGNEDNESS_4 char *restrict a)
> > > +{
> > > +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > +    {
> > > +      int av = a[i];
> > > +      int bv = b[i];
> > > +      SIGNEDNESS_2 short mult = av * bv;
> > > +      res += mult;
> > > +    }
> > > +  return res;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-times {vusdot.s8} 2 { target { arm-*-*-gnueabihf } } } } */
> > >
> > >
> > > --


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/4][AArch32]: Add support for sign differing dot-product usdot for NEON.
  2021-05-05 17:39 ` [PATCH 3/4][AArch32]: Add support for sign differing dot-product usdot for NEON Tamar Christina
  2021-05-05 17:42   ` FW: " Tamar Christina
@ 2021-05-06  9:23   ` Christophe Lyon
  2021-05-06  9:27     ` Tamar Christina
  1 sibling, 1 reply; 35+ messages in thread
From: Christophe Lyon @ 2021-05-06  9:23 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc Patches, nd

On Wed, 5 May 2021 at 19:39, Tamar Christina via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi All,
>
> This adds optabs implementing usdot_prod.
>
> The following testcase:
>
> #define N 480
> #define SIGNEDNESS_1 unsigned
> #define SIGNEDNESS_2 signed
> #define SIGNEDNESS_3 signed
> #define SIGNEDNESS_4 unsigned
>
> SIGNEDNESS_1 int __attribute__ ((noipa))
> f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
>    SIGNEDNESS_4 char *restrict b)
> {
>   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
>     {
>       int av = a[i];
>       int bv = b[i];
>       SIGNEDNESS_2 short mult = av * bv;
>       res += mult;
>     }
>   return res;
> }
>
> Generates
>
> f:
>         vmov.i32        q8, #0  @ v4si
>         add     r3, r2, #480
> .L2:
>         vld1.8  {q10}, [r2]!
>         vld1.8  {q9}, [r1]!
>         vusdot.s8       q8, q9, q10
>         cmp     r3, r2
>         bne     .L2
>         vadd.i32        d16, d16, d17
>         vpadd.i32       d16, d16, d16
>         vmov.32 r3, d16[0]
>         add     r0, r0, r3
>         bx      lr
>
> instead of
>
> f:
>         vmov.i32        q8, #0  @ v4si
>         add     r3, r2, #480
> .L2:
>         vld1.8  {q9}, [r2]!
>         vld1.8  {q11}, [r1]!
>         cmp     r3, r2
>         vmull.s8 q10, d18, d22
>         vmull.s8 q9, d19, d23
>         vaddw.s16       q8, q8, d20
>         vaddw.s16       q8, q8, d21
>         vaddw.s16       q8, q8, d18
>         vaddw.s16       q8, q8, d19
>         bne     .L2
>         vadd.i32        d16, d16, d17
>         vpadd.i32       d16, d16, d16
>         vmov.32 r3, d16[0]
>         add     r0, r0, r3
>         bx      lr
>
> For NEON.  I couldn't figure out if the MVE instruction vmlaldav.s16 could be
> used to emulate this.  Because it would require additional widening to work I
> left MVE out of this patch set but perhaps someone should take a look.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

I guess you mean arm-linux-gnueabihf ?

>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>         * config/arm/neon.md (usdot_prod<vsi2qi>): New.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/arm/simd/vusdot-autovec.c: New test.
>
> --- inline copy of patch --
> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> index fec2cc91d24b6eff7b6fc8fdd54f39b3d646c468..23ad411178db77c5d19bee7452bc1070331c1aa0 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -3075,6 +3075,24 @@ (define_expand "<sup>dot_prod<vsi2qi>"
>    DONE;
>  })
>
> +;; Auto-vectorizer pattern for usdot
> +(define_expand "usdot_prod<vsi2qi>"
> +  [(set (match_operand:VCVTI 0 "register_operand")
> +       (plus:VCVTI (unspec:VCVTI [(match_operand:<VSI2QI> 1
> +                                                       "register_operand")
> +                                  (match_operand:<VSI2QI> 2
> +                                                       "register_operand")]
> +                    UNSPEC_DOT_US)
> +                   (match_operand:VCVTI 3 "register_operand")))]
> +  "TARGET_I8MM"
> +{
> +  emit_insn (
> +    gen_neon_usdot<vsi2qi> (operands[3], operands[3], operands[1],
> +                           operands[2]));
> +  emit_insn (gen_rtx_SET (operands[0], operands[3]));
> +  DONE;
> +})
> +
>  (define_expand "neon_copysignf<mode>"
>    [(match_operand:VCVTF 0 "register_operand")
>     (match_operand:VCVTF 1 "register_operand")
> diff --git a/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..7cc56f68817d77d6950df0ab372d6fbaad6b3813
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=armv8.2-a+i8mm" } */
> +
> +#define N 480
> +#define SIGNEDNESS_1 unsigned
> +#define SIGNEDNESS_2 signed
> +#define SIGNEDNESS_3 signed
> +#define SIGNEDNESS_4 unsigned
> +
> +SIGNEDNESS_1 int __attribute__ ((noipa))
> +f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> +   SIGNEDNESS_4 char *restrict b)
> +{
> +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> +    {
> +      int av = a[i];
> +      int bv = b[i];
> +      SIGNEDNESS_2 short mult = av * bv;
> +      res += mult;
> +    }
> +  return res;
> +}
> +
> +SIGNEDNESS_1 int __attribute__ ((noipa))
> +g (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict b,
> +   SIGNEDNESS_4 char *restrict a)
> +{
> +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> +    {
> +      int av = a[i];
> +      int bv = b[i];
> +      SIGNEDNESS_2 short mult = av * bv;
> +      res += mult;
> +    }
> +  return res;
> +}
> +
> +/* { dg-final { scan-assembler-times {vusdot.s8} 2 { target { arm-*-*-gnueabihf } } } } */
>
>
> --

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH 3/4][AArch32]: Add support for sign differing dot-product usdot for NEON.
  2021-05-06  9:23   ` Christophe Lyon
@ 2021-05-06  9:27     ` Tamar Christina
  0 siblings, 0 replies; 35+ messages in thread
From: Tamar Christina @ 2021-05-06  9:27 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc Patches, nd



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@linaro.org>
> Sent: Thursday, May 6, 2021 10:23 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc Patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>
> Subject: Re: [PATCH 3/4][AArch32]: Add support for sign differing dot-
> product usdot for NEON.
> 
> On Wed, 5 May 2021 at 19:39, Tamar Christina via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi All,
> >
> > This adds optabs implementing usdot_prod.
> >
> > The following testcase:
> >
> > #define N 480
> > #define SIGNEDNESS_1 unsigned
> > #define SIGNEDNESS_2 signed
> > #define SIGNEDNESS_3 signed
> > #define SIGNEDNESS_4 unsigned
> >
> > SIGNEDNESS_1 int __attribute__ ((noipa))
> > f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> >    SIGNEDNESS_4 char *restrict b)
> > {
> >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> >     {
> >       int av = a[i];
> >       int bv = b[i];
> >       SIGNEDNESS_2 short mult = av * bv;
> >       res += mult;
> >     }
> >   return res;
> > }
> >
> > Generates
> >
> > f:
> >         vmov.i32        q8, #0  @ v4si
> >         add     r3, r2, #480
> > .L2:
> >         vld1.8  {q10}, [r2]!
> >         vld1.8  {q9}, [r1]!
> >         vusdot.s8       q8, q9, q10
> >         cmp     r3, r2
> >         bne     .L2
> >         vadd.i32        d16, d16, d17
> >         vpadd.i32       d16, d16, d16
> >         vmov.32 r3, d16[0]
> >         add     r0, r0, r3
> >         bx      lr
> >
> > instead of
> >
> > f:
> >         vmov.i32        q8, #0  @ v4si
> >         add     r3, r2, #480
> > .L2:
> >         vld1.8  {q9}, [r2]!
> >         vld1.8  {q11}, [r1]!
> >         cmp     r3, r2
> >         vmull.s8 q10, d18, d22
> >         vmull.s8 q9, d19, d23
> >         vaddw.s16       q8, q8, d20
> >         vaddw.s16       q8, q8, d21
> >         vaddw.s16       q8, q8, d18
> >         vaddw.s16       q8, q8, d19
> >         bne     .L2
> >         vadd.i32        d16, d16, d17
> >         vpadd.i32       d16, d16, d16
> >         vmov.32 r3, d16[0]
> >         add     r0, r0, r3
> >         bx      lr
> >
> > For NEON.  I couldn't figure out if the MVE instruction vmlaldav.s16
> > could be used to emulate this.  Because it would require additional
> > widening to work I left MVE out of this patch set but perhaps someone
> > should take a look.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> I guess you mean arm-linux-gnueabihf ?
> 

Oops, yeah, automatic pilot..

> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> >         * config/arm/neon.md (usdot_prod<vsi2qi>): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/arm/simd/vusdot-autovec.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> > index fec2cc91d24b6eff7b6fc8fdd54f39b3d646c468..23ad411178db77c5d19bee7452bc1070331c1aa0 100644
> > --- a/gcc/config/arm/neon.md
> > +++ b/gcc/config/arm/neon.md
> > @@ -3075,6 +3075,24 @@ (define_expand "<sup>dot_prod<vsi2qi>"
> >    DONE;
> >  })
> >
> > +;; Auto-vectorizer pattern for usdot
> > +(define_expand "usdot_prod<vsi2qi>"
> > +  [(set (match_operand:VCVTI 0 "register_operand")
> > +       (plus:VCVTI (unspec:VCVTI [(match_operand:<VSI2QI> 1
> > +                                                       "register_operand")
> > +                                  (match_operand:<VSI2QI> 2
> > +                                                       "register_operand")]
> > +                    UNSPEC_DOT_US)
> > +                   (match_operand:VCVTI 3 "register_operand")))]
> > +  "TARGET_I8MM"
> > +{
> > +  emit_insn (
> > +    gen_neon_usdot<vsi2qi> (operands[3], operands[3], operands[1],
> > +                           operands[2]));
> > +  emit_insn (gen_rtx_SET (operands[0], operands[3]));
> > +  DONE;
> > +})
> > +
> >  (define_expand "neon_copysignf<mode>"
> >    [(match_operand:VCVTF 0 "register_operand")
> >     (match_operand:VCVTF 1 "register_operand")
> > diff --git a/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..7cc56f68817d77d6950df0ab372d6fbaad6b3813
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
> > @@ -0,0 +1,38 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -march=armv8.2-a+i8mm" } */
> > +
> > +#define N 480
> > +#define SIGNEDNESS_1 unsigned
> > +#define SIGNEDNESS_2 signed
> > +#define SIGNEDNESS_3 signed
> > +#define SIGNEDNESS_4 unsigned
> > +
> > +SIGNEDNESS_1 int __attribute__ ((noipa))
> > +f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> > +   SIGNEDNESS_4 char *restrict b)
> > +{
> > +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > +    {
> > +      int av = a[i];
> > +      int bv = b[i];
> > +      SIGNEDNESS_2 short mult = av * bv;
> > +      res += mult;
> > +    }
> > +  return res;
> > +}
> > +
> > +SIGNEDNESS_1 int __attribute__ ((noipa))
> > +g (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict b,
> > +   SIGNEDNESS_4 char *restrict a)
> > +{
> > +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > +    {
> > +      int av = a[i];
> > +      int bv = b[i];
> > +      SIGNEDNESS_2 short mult = av * bv;
> > +      res += mult;
> > +    }
> > +  return res;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vusdot.s8} 2 { target { arm-*-*-gnueabihf } } } } */
> >
> >
> > --

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 4/4]middle-end: Add tests middle end generic tests for sign differing dotproduct.
  2021-05-05 17:38 [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes Tamar Christina
  2021-05-05 17:38 ` [PATCH 2/4]AArch64: Add support for sign differing dot-product usdot for NEON and SVE Tamar Christina
  2021-05-05 17:39 ` [PATCH 3/4][AArch32]: Add support for sign differing dot-product usdot for NEON Tamar Christina
@ 2021-05-05 17:39 ` Tamar Christina
       [not found]   ` <VI1PR08MB532511701573C18A33AC6291FF259@VI1PR08MB5325.eurprd08.prod.outlook.com>
  2021-05-07 11:45 ` [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes Richard Biener
  3 siblings, 1 reply; 35+ messages in thread
From: Tamar Christina @ 2021-05-05 17:39 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther

[-- Attachment #1: Type: text/plain, Size: 13192 bytes --]

Hi All,

This adds testcases that check auto-vectorizer detection of the new
sign-differing dot product.
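
As background for which signedness combinations the tests expect to be
detected: a signed char times unsigned char product always lies in
[-32640, 32385], so narrowing it to a signed short preserves the value, whereas
narrowing it to an unsigned short does not.  A minimal C sketch of one loop
iteration under both choices follows.  The helper names are made up for
illustration and are not part of the patch, and the sketch assumes a 16-bit
short and a 32-bit int.

```c
#include <assert.h>

/* Hypothetical helper: one iteration of the test loop with a signed
   intermediate (SIGNEDNESS_2 == signed).  The product of a signed char and
   an unsigned char always fits in a 16-bit short, so the narrowing below
   does not change the value.  */
static int
step_signed_mult (int res, signed char a, unsigned char b)
{
  int av = a;
  int bv = b;
  short mult = av * bv;
  return res + mult;
}

/* Hypothetical helper: the same iteration with an unsigned intermediate
   (SIGNEDNESS_2 == unsigned).  A negative product wraps modulo 65536, so
   the accumulated value differs from the plain widened product.  */
static int
step_unsigned_mult (int res, signed char a, unsigned char b)
{
  int av = a;
  int bv = b;
  unsigned short mult = av * bv;
  return res + mult;
}
```

With a = -1 and b = 2 the signed variant adds -2 while the unsigned variant
adds 65534, which is why the unsigned-intermediate tests expect no dot-product
detection.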

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* doc/sourcebuild.texi (arm_v8_2a_i8mm_neon_hw): Document.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp
	(check_effective_target_arm_v8_2a_imm8_neon_ok_nocache,
	check_effective_target_arm_v8_2a_i8mm_neon_hw,
	check_effective_target_vect_usdot_qi): New.
	* gcc.dg/vect/vect-reduc-dot-10.c: New test.
	* gcc.dg/vect/vect-reduc-dot-11.c: New test.
	* gcc.dg/vect/vect-reduc-dot-12.c: New test.
	* gcc.dg/vect/vect-reduc-dot-13.c: New test.
	* gcc.dg/vect/vect-reduc-dot-14.c: New test.
	* gcc.dg/vect/vect-reduc-dot-15.c: New test.
	* gcc.dg/vect/vect-reduc-dot-16.c: New test.
	* gcc.dg/vect/vect-reduc-dot-9.c: New test.
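
For reference, the constants checked by the runtime probe in
check_effective_target_arm_v8_2a_i8mm_neon_hw (8 and 24) can be reproduced with
a scalar model of one 32-bit lane.  This is only a sketch under the assumption
that each lane accumulates four unsigned-by-signed byte products.
usdot_lane_ref is a hypothetical name and does not exist in the patch or in
GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit lane of a mixed-sign dot product:
   four products of an unsigned byte and a signed byte, each widened to
   32 bits, added to the accumulator.  */
static int32_t
usdot_lane_ref (int32_t acc, const uint8_t u[4], const int8_t s[4])
{
  for (int i = 0; i < 4; ++i)
    acc += (int32_t) u[i] * (int32_t) s[i];
  return acc;
}
```

Feeding it the probe's inputs lane by lane, {1,1,1,1} with {2,2,2,2} and
{2,2,2,2} with {3,3,3,3}, starting from a zero accumulator gives 8 and 24.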

--- inline copy of patch -- 
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index b0001247795947c9dcab1a14884ecd585976dfdd..0034ac9d86b26e6674d71090b9d04b6148f99e17 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1672,6 +1672,10 @@ Target supports a vector dot-product of @code{signed char}.
 @item vect_udot_qi
 Target supports a vector dot-product of @code{unsigned char}.
 
+@item vect_usdot_qi
+Target supports a vector dot-product where one operand of the multiply is
+@code{signed char} and the other is @code{unsigned char}.
+
 @item vect_sdot_hi
 Target supports a vector dot-product of @code{signed short}.
 
@@ -1947,6 +1951,11 @@ ARM target supports executing instructions from ARMv8.2-A with the Dot
 Product extension. Some multilibs may be incompatible with these options.
 Implies arm_v8_2a_dotprod_neon_ok.
 
+@item arm_v8_2a_i8mm_neon_hw
+ARM target supports executing instructions from ARMv8.2-A with the 8-bit
+Matrix Multiply extension.  Some multilibs may be incompatible with these
+options.  Implies arm_v8_2a_i8mm_ok.
+
 @item arm_fp16fml_neon_ok
 @anchor{arm_fp16fml_neon_ok}
 ARM target supports extensions to generate the @code{VFMAL} and @code{VFMLS}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-10.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-10.c
new file mode 100644
index 0000000000000000000000000000000000000000..7ce86965ea97d37c43d96b4d2271df667dcb2aae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-10.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 unsigned
+#define SIGNEDNESS_3 unsigned
+#define SIGNEDNESS_4 signed
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-11.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-11.c
new file mode 100644
index 0000000000000000000000000000000000000000..0f7cbbb87ef028f166366aea55bc4ef49d2f8e9b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-11.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 unsigned
+#define SIGNEDNESS_4 signed
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-12.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-12.c
new file mode 100644
index 0000000000000000000000000000000000000000..08412614fc67045d3067b5b55ba032d297595237
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-12.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-13.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-13.c
new file mode 100644
index 0000000000000000000000000000000000000000..7ee0f45f64296442204ee13d5f880f4b7716fb85
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-13.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 signed
+#define SIGNEDNESS_2 unsigned
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-14.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-14.c
new file mode 100644
index 0000000000000000000000000000000000000000..2de1434528b87f0c32c54150b16791f3f2a469b5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-14.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 signed
+#define SIGNEDNESS_2 unsigned
+#define SIGNEDNESS_3 unsigned
+#define SIGNEDNESS_4 signed
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-15.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-15.c
new file mode 100644
index 0000000000000000000000000000000000000000..dc48f95a32bf76c54a906ee81ddee99b16aea84a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-15.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 signed
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 unsigned
+#define SIGNEDNESS_4 signed
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-16.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-16.c
new file mode 100644
index 0000000000000000000000000000000000000000..aec628789366673321aea88c60316a68fe16cbc5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-16.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 signed
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
new file mode 100644
index 0000000000000000000000000000000000000000..cbbeedec3bfd0810a8ce8036e6670585d9334924
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
@@ -0,0 +1,52 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#include "tree-vect.h"
+
+#define N 50
+
+#ifndef SIGNEDNESS_1
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 unsigned
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+#endif
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
+   SIGNEDNESS_4 char *restrict b)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 short mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+#define BASE ((SIGNEDNESS_3 int) -1 < 0 ? -126 : 4)
+#define OFFSET 20
+
+int
+main (void)
+{
+  check_vect ();
+
+  SIGNEDNESS_3 char a[N], b[N];
+  int expected = 0x12345;
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE + i * 5;
+      b[i] = BASE + OFFSET + i * 4;
+      asm volatile ("" ::: "memory");
+      expected += (SIGNEDNESS_2 short) (a[i] * b[i]);
+    }
+  if (f (0x12345, a, b) != expected)
+    __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index ad323107f2ec5d55a77214beca5e4135643528b4..db9bd605ab4c838f65667fa616da334a171d9dfb 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5240,6 +5240,36 @@ proc check_effective_target_arm_v8_2a_dotprod_neon_ok_nocache { } {
     return 0;
 }
 
+# Return 1 if the target supports ARMv8.2 Adv.SIMD i8mm
+# instructions, 0 otherwise.  The test is valid for ARM and for AArch64.
+# Record the command line options needed.
+
+proc check_effective_target_arm_v8_2a_imm8_neon_ok_nocache { } {
+    global et_arm_v8_2a_imm8_neon_flags
+    set et_arm_v8_2a_imm8_neon_flags ""
+
+    if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
+        return 0;
+    }
+
+    # Iterate through sets of options to find the compiler flags that
+    # need to be added to the -march option.
+    foreach flags {"" "-mfloat-abi=softfp -mfpu=neon-fp-armv8" "-mfloat-abi=hard -mfpu=neon-fp-armv8"} {
+        if { [check_no_compiler_messages_nocache \
+                  arm_v8_2a_imm8_neon_ok object {
+	    #include <stdint.h>
+            #if !defined (__ARM_FEATURE_MATMUL_INT8)
+            #error "__ARM_FEATURE_MATMUL_INT8 not defined"
+            #endif
+        } "$flags -march=armv8.2-a+i8mm"] } {
+            set et_arm_v8_2a_imm8_neon_flags "$flags -march=armv8.2-a+i8mm"
+            return 1
+        }
+    }
+
+    return 0;
+}
+
 # Return 1 if the target supports ARMv8.1-M MVE
 # instructions, 0 otherwise.  The test is valid for ARM.
 # Record the command line options needed.
@@ -5667,6 +5697,43 @@ proc check_effective_target_arm_v8_2a_dotprod_neon_hw { } {
     } [add_options_for_arm_v8_2a_dotprod_neon ""]]
 }
 
+# Return 1 if the target supports executing AdvSIMD instructions from ARMv8.2
+# with the i8mm extension, 0 otherwise.  The test is valid for ARM and for
+# AArch64.
+
+proc check_effective_target_arm_v8_2a_i8mm_neon_hw { } {
+    if { ![check_effective_target_arm_v8_2a_i8mm_ok] } {
+        return 0;
+    }
+    return [check_runtime arm_v8_2a_i8mm_neon_hw_available {
+        #include "arm_neon.h"
+        int
+        main (void)
+        {
+
+	  uint32x2_t results = {0,0};
+	  uint8x8_t a = {1,1,1,1,2,2,2,2};
+	  int8x8_t b = {2,2,2,2,3,3,3,3};
+
+          #ifdef __ARM_ARCH_ISA_A64
+          asm ("usdot %0.2s, %1.8b, %2.8b"
+               : "=w"(results)
+               : "w"(a), "w"(b)
+               : /* No clobbers.  */);
+
+	  #else
+          asm ("vusdot.u8 %P0, %P1, %P2"
+               : "=w"(results)
+               : "w"(a), "w"(b)
+               : /* No clobbers.  */);
+          #endif
+
+          return (vget_lane_u32 (results, 0) == 8
+		  && vget_lane_u32 (results, 1) == 24) ? 1 : 0;
+        }
+    } [add_options_for_arm_v8_2a_i8mm ""]]
+}
+
 # Return 1 if this is a ARM target with NEON enabled.
 
 proc check_effective_target_arm_neon { } {
@@ -7022,6 +7089,19 @@ proc check_effective_target_vect_udot_qi { } {
 		 && [et-is-effective-target mips_msa]) }}]
 }
 
+# Return 1 if the target plus current options supports a vector
+# dot-product where one operand of the multiply is signed char
+# and the other unsigned char, 0 otherwise.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_vect_usdot_qi { } {
+    return [check_cached_effective_target_indexed vect_usdot_qi {
+      expr { [istarget aarch64*-*-*]
+	     || [istarget arm*-*-*] }}]
+}
+
+
 # Return 1 if the target plus current options supports a vector
 # dot-product of signed shorts, 0 otherwise.
 #


-- 

[-- Attachment #2: rb14436.patch --]
[-- Type: text/x-diff, Size: 12313 bytes --]

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index b0001247795947c9dcab1a14884ecd585976dfdd..0034ac9d86b26e6674d71090b9d04b6148f99e17 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1672,6 +1672,10 @@ Target supports a vector dot-product of @code{signed char}.
 @item vect_udot_qi
 Target supports a vector dot-product of @code{unsigned char}.
 
+@item vect_usdot_qi
+Target supports a vector dot-product where one operand of the multiply is
+@code{signed char} and the other is @code{unsigned char}.
+
 @item vect_sdot_hi
 Target supports a vector dot-product of @code{signed short}.
 
@@ -1947,6 +1951,11 @@ ARM target supports executing instructions from ARMv8.2-A with the Dot
 Product extension. Some multilibs may be incompatible with these options.
 Implies arm_v8_2a_dotprod_neon_ok.
 
+@item arm_v8_2a_i8mm_neon_hw
+ARM target supports executing instructions from ARMv8.2-A with the 8-bit
+Matrix Multiply extension.  Some multilibs may be incompatible with these
+options.  Implies arm_v8_2a_i8mm_ok.
+
 @item arm_fp16fml_neon_ok
 @anchor{arm_fp16fml_neon_ok}
 ARM target supports extensions to generate the @code{VFMAL} and @code{VFMLS}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-10.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-10.c
new file mode 100644
index 0000000000000000000000000000000000000000..7ce86965ea97d37c43d96b4d2271df667dcb2aae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-10.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 unsigned
+#define SIGNEDNESS_3 unsigned
+#define SIGNEDNESS_4 signed
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-11.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-11.c
new file mode 100644
index 0000000000000000000000000000000000000000..0f7cbbb87ef028f166366aea55bc4ef49d2f8e9b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-11.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 unsigned
+#define SIGNEDNESS_4 signed
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-12.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-12.c
new file mode 100644
index 0000000000000000000000000000000000000000..08412614fc67045d3067b5b55ba032d297595237
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-12.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-13.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-13.c
new file mode 100644
index 0000000000000000000000000000000000000000..7ee0f45f64296442204ee13d5f880f4b7716fb85
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-13.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 signed
+#define SIGNEDNESS_2 unsigned
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-14.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-14.c
new file mode 100644
index 0000000000000000000000000000000000000000..2de1434528b87f0c32c54150b16791f3f2a469b5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-14.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 signed
+#define SIGNEDNESS_2 unsigned
+#define SIGNEDNESS_3 unsigned
+#define SIGNEDNESS_4 signed
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-15.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-15.c
new file mode 100644
index 0000000000000000000000000000000000000000..dc48f95a32bf76c54a906ee81ddee99b16aea84a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-15.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 signed
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 unsigned
+#define SIGNEDNESS_4 signed
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-16.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-16.c
new file mode 100644
index 0000000000000000000000000000000000000000..aec628789366673321aea88c60316a68fe16cbc5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-16.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 signed
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
new file mode 100644
index 0000000000000000000000000000000000000000..cbbeedec3bfd0810a8ce8036e6670585d9334924
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
@@ -0,0 +1,52 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#include "tree-vect.h"
+
+#define N 50
+
+#ifndef SIGNEDNESS_1
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 unsigned
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+#endif
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
+   SIGNEDNESS_4 char *restrict b)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 short mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+#define BASE ((SIGNEDNESS_3 int) -1 < 0 ? -126 : 4)
+#define OFFSET 20
+
+int
+main (void)
+{
+  check_vect ();
+
+  SIGNEDNESS_3 char a[N]; SIGNEDNESS_4 char b[N];
+  int expected = 0x12345;
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE + i * 5;
+      b[i] = BASE + OFFSET + i * 4;
+      asm volatile ("" ::: "memory");
+      expected += (SIGNEDNESS_2 short) (a[i] * b[i]);
+    }
+  if (f (0x12345, a, b) != expected)
+    __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index ad323107f2ec5d55a77214beca5e4135643528b4..db9bd605ab4c838f65667fa616da334a171d9dfb 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5240,6 +5240,36 @@ proc check_effective_target_arm_v8_2a_dotprod_neon_ok_nocache { } {
     return 0;
 }
 
+# Return 1 if the target supports ARMv8.2 Adv.SIMD i8mm
+# instructions, 0 otherwise.  The test is valid for ARM and for AArch64.
+# Record the command line options needed.
+
+proc check_effective_target_arm_v8_2a_imm8_neon_ok_nocache { } {
+    global et_arm_v8_2a_imm8_neon_flags
+    set et_arm_v8_2a_imm8_neon_flags ""
+
+    if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
+        return 0;
+    }
+
+    # Iterate through sets of options to find the compiler flags that
+    # need to be added to the -march option.
+    foreach flags {"" "-mfloat-abi=softfp -mfpu=neon-fp-armv8" "-mfloat-abi=hard -mfpu=neon-fp-armv8"} {
+        if { [check_no_compiler_messages_nocache \
+                  arm_v8_2a_imm8_neon_ok object {
+	    #include <stdint.h>
+            #if !defined (__ARM_FEATURE_MATMUL_INT8)
+            #error "__ARM_FEATURE_MATMUL_INT8 not defined"
+            #endif
+        } "$flags -march=armv8.2-a+i8mm"] } {
+            set et_arm_v8_2a_imm8_neon_flags "$flags -march=armv8.2-a+i8mm"
+            return 1
+        }
+    }
+
+    return 0;
+}
+
 # Return 1 if the target supports ARMv8.1-M MVE
 # instructions, 0 otherwise.  The test is valid for ARM.
 # Record the command line options needed.
@@ -5667,6 +5697,43 @@ proc check_effective_target_arm_v8_2a_dotprod_neon_hw { } {
     } [add_options_for_arm_v8_2a_dotprod_neon ""]]
 }
 
+# Return 1 if the target supports executing AdvSIMD instructions from ARMv8.2
+# with the i8mm extension, 0 otherwise.  The test is valid for ARM and for
+# AArch64.
+
+proc check_effective_target_arm_v8_2a_i8mm_neon_hw { } {
+    if { ![check_effective_target_arm_v8_2a_i8mm_ok] } {
+        return 0;
+    }
+    return [check_runtime arm_v8_2a_i8mm_neon_hw_available {
+        #include "arm_neon.h"
+        int
+        main (void)
+        {
+
+	  uint32x2_t results = {0,0};
+	  uint8x8_t a = {1,1,1,1,2,2,2,2};
+	  int8x8_t b = {2,2,2,2,3,3,3,3};
+
+          #ifdef __ARM_ARCH_ISA_A64
+          asm ("usdot %0.2s, %1.8b, %2.8b"
+               : "=w"(results)
+               : "w"(a), "w"(b)
+               : /* No clobbers.  */);
+
+	  #else
+          asm ("vusdot.u8 %P0, %P1, %P2"
+               : "=w"(results)
+               : "w"(a), "w"(b)
+               : /* No clobbers.  */);
+          #endif
+
+          return (vget_lane_u32 (results, 0) == 8
+		  && vget_lane_u32 (results, 1) == 24) ? 1 : 0;
+        }
+    } [add_options_for_arm_v8_2a_i8mm ""]]
+}
+
 # Return 1 if this is a ARM target with NEON enabled.
 
 proc check_effective_target_arm_neon { } {
@@ -7022,6 +7089,19 @@ proc check_effective_target_vect_udot_qi { } {
 		 && [et-is-effective-target mips_msa]) }}]
 }
 
+# Return 1 if the target plus current options supports a vector
+# dot-product where one operand of the multiply is signed char
+# and the other unsigned char, 0 otherwise.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_vect_usdot_qi { } {
+    return [check_cached_effective_target_indexed vect_usdot_qi {
+      expr { [istarget aarch64*-*-*]
+	     || [istarget arm*-*-*] }}]
+}
+
+
 # Return 1 if the target plus current options supports a vector
 # dot-product of signed shorts, 0 otherwise.
 #


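A note on why the tests in this patch expect "vect_recog_dot_prod_pattern: detected" only when SIGNEDNESS_2 is signed. The sketch below is not part of the patch, and the helper names are made up for illustration. It models the loop body's intermediate "short mult" in scalar C: a signed char times unsigned char product always lies in [-32640, 32385], so a signed 16-bit intermediate keeps the same value as a 32-bit widening multiply, while an unsigned 16-bit intermediate wraps every negative product.

```c
#include <stdint.h>

/* Hypothetical helper: the product as a widening 32-bit multiply
   would compute it, with no intermediate truncation.  */
static int32_t
mult_widened (int8_t s, uint8_t u)
{
  return (int32_t) s * (int32_t) u;
}

/* Hypothetical helper: the product as the test loop computes it when
   SIGNEDNESS_2 is signed.  The value always fits in int16_t, so the
   conversion does not change it.  */
static int32_t
mult_via_signed_short (int8_t s, uint8_t u)
{
  int16_t mult = (int16_t) ((int) s * (int) u);
  return mult;
}

/* Hypothetical helper: the product as the test loop computes it when
   SIGNEDNESS_2 is unsigned.  Negative products wrap modulo 65536.  */
static int32_t
mult_via_unsigned_short (int8_t s, uint8_t u)
{
  uint16_t mult = (uint16_t) ((int) s * (int) u);
  return mult;
}
```

For s = -126 and u = 200, the widened product and the signed short path both give -25200, whereas the unsigned short path gives 40336, which is why the unsigned-intermediate variants cannot be mapped onto a single widening dot-product.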
[parent not found: <VI1PR08MB532511701573C18A33AC6291FF259@VI1PR08MB5325.eurprd08.prod.outlook.com>]

* FW: [PATCH 4/4]middle-end: Add tests middle end generic tests for sign differing dotproduct.
       [not found]   ` <VI1PR08MB532511701573C18A33AC6291FF259@VI1PR08MB5325.eurprd08.prod.outlook.com>
@ 2021-05-25 15:01     ` Tamar Christina
       [not found]     ` <11s2181-8856-30rq-26or-84q8o7qrr2o@fhfr.qr>
  1 sibling, 0 replies; 35+ messages in thread
From: Tamar Christina @ 2021-05-25 15:01 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 16053 bytes --]


Forgot the list...

-----Original Message-----
From: Tamar Christina 
Sent: Tuesday, May 25, 2021 3:58 PM
To: Tamar Christina <Tamar.Christina@arm.com>
Cc: nd <nd@arm.com>; rguenther@suse.de
Subject: RE: [PATCH 4/4]middle-end: Add tests middle end generic tests for sign differing dotproduct.

Hi All,

This revision adds a few more tests (vect-reduc-dot-17.c and vect-reduc-dot-18.c).

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* doc/sourcebuild.texi (vect_usdot_qi, arm_v8_2a_i8mm_neon_hw): Document.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp
	(check_effective_target_arm_v8_2a_imm8_neon_ok_nocache,
	check_effective_target_arm_v8_2a_i8mm_neon_hw,
	check_effective_target_vect_usdot_qi): New.
	* gcc.dg/vect/vect-reduc-dot-9.c: New test.
	* gcc.dg/vect/vect-reduc-dot-10.c: New test.
	* gcc.dg/vect/vect-reduc-dot-11.c: New test.
	* gcc.dg/vect/vect-reduc-dot-12.c: New test.
	* gcc.dg/vect/vect-reduc-dot-13.c: New test.
	* gcc.dg/vect/vect-reduc-dot-14.c: New test.
	* gcc.dg/vect/vect-reduc-dot-15.c: New test.
	* gcc.dg/vect/vect-reduc-dot-16.c: New test.
	* gcc.dg/vect/vect-reduc-dot-17.c: New test.
	* gcc.dg/vect/vect-reduc-dot-18.c: New test.
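
For reference, the values that the runtime probe in check_effective_target_arm_v8_2a_i8mm_neon_hw compares against (8 and 24) can be reproduced with a scalar model. The sketch below assumes the lane-wise behaviour the probe relies on (four unsigned bytes times four signed bytes, accumulated into each 32-bit lane); the function names are hypothetical and do not appear in the patch.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit lane: four unsigned 8-bit
   values times four signed 8-bit values, each product widened to
   32 bits before being added to the accumulator.  */
static int32_t
usdot_lane_ref (int32_t acc, const uint8_t u[4], const int8_t s[4])
{
  for (int i = 0; i < 4; ++i)
    acc += (int32_t) u[i] * (int32_t) s[i];
  return acc;
}

/* Hypothetical model of the 8-byte form used by the probe: two
   independent lanes, each consuming four consecutive byte pairs.  */
static void
usdot_8b_ref (int32_t res[2], const uint8_t a[8], const int8_t b[8])
{
  res[0] = usdot_lane_ref (res[0], a, b);
  res[1] = usdot_lane_ref (res[1], a + 4, b + 4);
}
```

With a = {1,1,1,1,2,2,2,2}, b = {2,2,2,2,3,3,3,3} and a zero accumulator, lane 0 is 4 * (1 * 2) = 8 and lane 1 is 4 * (2 * 3) = 24, matching the constants in the probe.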

> -----Original Message-----
> From: Gcc-patches <gcc-patches-bounces@gcc.gnu.org> On Behalf Of Tamar 
> Christina via Gcc-patches
> Sent: Wednesday, May 5, 2021 6:40 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; rguenther@suse.de
> Subject: [PATCH 4/4]middle-end: Add tests middle end generic tests for 
> sign differing dotproduct.
> 
> Hi All,
> 
> This adds testcases to test for auto-vect detection of the new sign 
> differing dot product.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* doc/sourcebuild.texi (arm_v8_2a_i8mm_neon_hw): Document.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* lib/target-supports.exp
> 	(check_effective_target_arm_v8_2a_imm8_neon_ok_nocache,
> 	check_effective_target_arm_v8_2a_i8mm_neon_hw,
> 	check_effective_target_vect_usdot_qi): New.
> 	* gcc.dg/vect/vect-reduc-dot-10.c: New test.
> 	* gcc.dg/vect/vect-reduc-dot-11.c: New test.
> 	* gcc.dg/vect/vect-reduc-dot-12.c: New test.
> 	* gcc.dg/vect/vect-reduc-dot-13.c: New test.
> 	* gcc.dg/vect/vect-reduc-dot-14.c: New test.
> 	* gcc.dg/vect/vect-reduc-dot-15.c: New test.
> 	* gcc.dg/vect/vect-reduc-dot-16.c: New test.
> 	* gcc.dg/vect/vect-reduc-dot-9.c: New test.
> 

[-- Attachment #2: rb14436.patch --]
[-- Type: text/x-diff; name="rb14436.patch", Size: 16027 bytes --]

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index b0001247795947c9dcab1a14884ecd585976dfdd..0034ac9d86b26e6674d71090b9d04b6148f99e17 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1672,6 +1672,10 @@ Target supports a vector dot-product of @code{signed char}.
 @item vect_udot_qi
 Target supports a vector dot-product of @code{unsigned char}.
 
+@item vect_usdot_qi
+Target supports a vector dot-product where one operand of the multiply is
+@code{signed char} and the other is @code{unsigned char}.
+
 @item vect_sdot_hi
 Target supports a vector dot-product of @code{signed short}.
 
@@ -1947,6 +1951,11 @@ ARM target supports executing instructions from ARMv8.2-A with the Dot
 Product extension. Some multilibs may be incompatible with these options.
 Implies arm_v8_2a_dotprod_neon_ok.
 
+@item arm_v8_2a_i8mm_neon_hw
+ARM target supports executing instructions from ARMv8.2-A with the 8-bit
+Matrix Multiply extension.  Some multilibs may be incompatible with these
+options.  Implies arm_v8_2a_i8mm_ok.
+
 @item arm_fp16fml_neon_ok
 @anchor{arm_fp16fml_neon_ok}
 ARM target supports extensions to generate the @code{VFMAL} and @code{VFMLS}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-10.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-10.c
new file mode 100644
index 0000000000000000000000000000000000000000..7ce86965ea97d37c43d96b4d2271df667dcb2aae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-10.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 unsigned
+#define SIGNEDNESS_3 unsigned
+#define SIGNEDNESS_4 signed
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-11.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-11.c
new file mode 100644
index 0000000000000000000000000000000000000000..0f7cbbb87ef028f166366aea55bc4ef49d2f8e9b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-11.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 unsigned
+#define SIGNEDNESS_4 signed
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-12.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-12.c
new file mode 100644
index 0000000000000000000000000000000000000000..08412614fc67045d3067b5b55ba032d297595237
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-12.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-13.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-13.c
new file mode 100644
index 0000000000000000000000000000000000000000..7ee0f45f64296442204ee13d5f880f4b7716fb85
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-13.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 signed
+#define SIGNEDNESS_2 unsigned
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-14.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-14.c
new file mode 100644
index 0000000000000000000000000000000000000000..2de1434528b87f0c32c54150b16791f3f2a469b5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-14.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 signed
+#define SIGNEDNESS_2 unsigned
+#define SIGNEDNESS_3 unsigned
+#define SIGNEDNESS_4 signed
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-15.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-15.c
new file mode 100644
index 0000000000000000000000000000000000000000..dc48f95a32bf76c54a906ee81ddee99b16aea84a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-15.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 signed
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 unsigned
+#define SIGNEDNESS_4 signed
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-16.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-16.c
new file mode 100644
index 0000000000000000000000000000000000000000..aec628789366673321aea88c60316a68fe16cbc5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-16.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 signed
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
new file mode 100644
index 0000000000000000000000000000000000000000..aa269c4d657f65e07e36df7f3fd0098cf3aaf4d0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
@@ -0,0 +1,52 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#include "tree-vect.h"
+
+#define N 50
+
+#ifndef SIGNEDNESS_1
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 unsigned
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+#endif
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
+   SIGNEDNESS_4 char *restrict b)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 int mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+#define BASE ((SIGNEDNESS_3 int) -1 < 0 ? -126 : 4)
+#define OFFSET 20
+
+int
+main (void)
+{
+  check_vect ();
+
+  SIGNEDNESS_3 char a[N], b[N];
+  int expected = 0x12345;
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE + i * 5;
+      b[i] = BASE + OFFSET + i * 4;
+      asm volatile ("" ::: "memory");
+      expected += (SIGNEDNESS_2 int) (a[i] * b[i]);
+    }
+  if (f (0x12345, a, b) != expected)
+    __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
new file mode 100644
index 0000000000000000000000000000000000000000..2b1cc0411c3256ccd876d8b4da18ce4881dc0af9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
@@ -0,0 +1,52 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#include "tree-vect.h"
+
+#define N 50
+
+#ifndef SIGNEDNESS_1
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+#endif
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
+   SIGNEDNESS_4 char *restrict b)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 int mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+#define BASE ((SIGNEDNESS_3 int) -1 < 0 ? -126 : 4)
+#define OFFSET 20
+
+int
+main (void)
+{
+  check_vect ();
+
+  SIGNEDNESS_3 char a[N], b[N];
+  int expected = 0x12345;
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE + i * 5;
+      b[i] = BASE + OFFSET + i * 4;
+      asm volatile ("" ::: "memory");
+      expected += (SIGNEDNESS_2 int) (a[i] * b[i]);
+    }
+  if (f (0x12345, a, b) != expected)
+    __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
new file mode 100644
index 0000000000000000000000000000000000000000..cbbeedec3bfd0810a8ce8036e6670585d9334924
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
@@ -0,0 +1,52 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#include "tree-vect.h"
+
+#define N 50
+
+#ifndef SIGNEDNESS_1
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 unsigned
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+#endif
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
+   SIGNEDNESS_4 char *restrict b)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 short mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+#define BASE ((SIGNEDNESS_3 int) -1 < 0 ? -126 : 4)
+#define OFFSET 20
+
+int
+main (void)
+{
+  check_vect ();
+
+  SIGNEDNESS_3 char a[N], b[N];
+  int expected = 0x12345;
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE + i * 5;
+      b[i] = BASE + OFFSET + i * 4;
+      asm volatile ("" ::: "memory");
+      expected += (SIGNEDNESS_2 short) (a[i] * b[i]);
+    }
+  if (f (0x12345, a, b) != expected)
+    __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index ad323107f2ec5d55a77214beca5e4135643528b4..db9bd605ab4c838f65667fa616da334a171d9dfb 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5240,6 +5240,36 @@ proc check_effective_target_arm_v8_2a_dotprod_neon_ok_nocache { } {
     return 0;
 }
 
+# Return 1 if the target supports ARMv8.2 Adv.SIMD imm8
+# instructions, 0 otherwise.  The test is valid for ARM and for AArch64.
+# Record the command line options needed.
+
+proc check_effective_target_arm_v8_2a_imm8_neon_ok_nocache { } {
+    global et_arm_v8_2a_imm8_neon_flags
+    set et_arm_v8_2a_imm8_neon_flags ""
+
+    if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
+        return 0;
+    }
+
+    # Iterate through sets of options to find the compiler flags that
+    # need to be added to the -march option.
+    foreach flags {"" "-mfloat-abi=softfp -mfpu=neon-fp-armv8" "-mfloat-abi=hard -mfpu=neon-fp-armv8"} {
+        if { [check_no_compiler_messages_nocache \
+                  arm_v8_2a_imm8_neon_ok object {
+	    #include <stdint.h>
+            #if !defined (__ARM_FEATURE_MATMUL_INT8)
+            #error "__ARM_FEATURE_MATMUL_INT8 not defined"
+            #endif
+        } "$flags -march=armv8.2-a+i8mm"] } {
+            set et_arm_v8_2a_imm8_neon_flags "$flags -march=armv8.2-a+i8mm"
+            return 1
+        }
+    }
+
+    return 0;
+}
+
 # Return 1 if the target supports ARMv8.1-M MVE
 # instructions, 0 otherwise.  The test is valid for ARM.
 # Record the command line options needed.
@@ -5667,6 +5697,43 @@ proc check_effective_target_arm_v8_2a_dotprod_neon_hw { } {
     } [add_options_for_arm_v8_2a_dotprod_neon ""]]
 }
 
+# Return 1 if the target supports executing AdvSIMD instructions from ARMv8.2
+# with the i8mm extension, 0 otherwise.  The test is valid for ARM and for
+# AArch64.
+
+proc check_effective_target_arm_v8_2a_i8mm_neon_hw { } {
+    if { ![check_effective_target_arm_v8_2a_i8mm_ok] } {
+        return 0;
+    }
+    return [check_runtime arm_v8_2a_i8mm_neon_hw_available {
+        #include "arm_neon.h"
+        int
+        main (void)
+        {
+
+	  uint32x2_t results = {0,0};
+	  uint8x8_t a = {1,1,1,1,2,2,2,2};
+	  int8x8_t b = {2,2,2,2,3,3,3,3};
+
+          #ifdef __ARM_ARCH_ISA_A64
+          asm ("usdot %0.2s, %1.8b, %2.8b"
+               : "=w"(results)
+               : "w"(a), "w"(b)
+               : /* No clobbers.  */);
+
+	  #else
+          asm ("vusdot.u8 %P0, %P1, %P2"
+               : "=w"(results)
+               : "w"(a), "w"(b)
+               : /* No clobbers.  */);
+          #endif
+
+          return (vget_lane_u32 (results, 0) == 8
+		  && vget_lane_u32 (results, 1) == 24) ? 1 : 0;
+        }
+    } [add_options_for_arm_v8_2a_i8mm ""]]
+}
+
 # Return 1 if this is a ARM target with NEON enabled.
 
 proc check_effective_target_arm_neon { } {
@@ -7022,6 +7089,19 @@ proc check_effective_target_vect_udot_qi { } {
 		 && [et-is-effective-target mips_msa]) }}]
 }
 
+# Return 1 if the target plus current options supports a vector
+# dot-product where one operand of the multiply is signed char
+# and the other is unsigned char, 0 otherwise.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_vect_usdot_qi { } {
+    return [check_cached_effective_target_indexed vect_usdot_qi {
+      expr { [istarget aarch64*-*-*]
+	     || [istarget arm*-*-*] }}]
+}
+
+
 # Return 1 if the target plus current options supports a vector
 # dot-product of signed shorts, 0 otherwise.
 #


^ permalink raw reply	[flat|nested] 35+ messages in thread

[parent not found: <11s2181-8856-30rq-26or-84q8o7qrr2o@fhfr.qr>]

* Re: [PATCH 4/4]middle-end: Add tests middle end generic tests for sign differing dotproduct.
       [not found]     ` <11s2181-8856-30rq-26or-84q8o7qrr2o@fhfr.qr>
@ 2021-05-26  8:48       ` Tamar Christina
  2021-06-14 12:08       ` Tamar Christina
  1 sibling, 0 replies; 35+ messages in thread
From: Tamar Christina @ 2021-05-26  8:48 UTC (permalink / raw)
  To: Richard Biener; +Cc: nd, GCC Patches

I think the list got dropped on my last reply.

Forwarding to archive the OK.
________________________________
From: Richard Biener <rguenther@suse.de>
Sent: Wednesday, May 26, 2021 9:40 AM
To: Tamar Christina <Tamar.Christina@arm.com>
Cc: nd <nd@arm.com>
Subject: RE: [PATCH 4/4]middle-end: Add tests middle end generic tests for sign differing dotproduct.

On Tue, 25 May 2021, Tamar Christina wrote:

> Hi All,
>
> Adding a few more tests
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?

OK.

> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>        * doc/sourcebuild.texi (arm_v8_2a_i8mm_neon_hw): Document.
>
> gcc/testsuite/ChangeLog:
>
>        * lib/target-supports.exp
>        (check_effective_target_arm_v8_2a_imm8_neon_ok_nocache,
>        check_effective_target_arm_v8_2a_i8mm_neon_hw,
>        check_effective_target_vect_usdot_qi): New.
>        * gcc.dg/vect/vect-reduc-dot-9.c: New test.
>        * gcc.dg/vect/vect-reduc-dot-10.c: New test.
>        * gcc.dg/vect/vect-reduc-dot-11.c: New test.
>        * gcc.dg/vect/vect-reduc-dot-12.c: New test.
>        * gcc.dg/vect/vect-reduc-dot-13.c: New test.
>        * gcc.dg/vect/vect-reduc-dot-14.c: New test.
>        * gcc.dg/vect/vect-reduc-dot-15.c: New test.
>        * gcc.dg/vect/vect-reduc-dot-16.c: New test.
>        * gcc.dg/vect/vect-reduc-dot-17.c: New test.
>        * gcc.dg/vect/vect-reduc-dot-18.c: New test.
>
> > -----Original Message-----
> > From: Gcc-patches <gcc-patches-bounces@gcc.gnu.org> On Behalf Of Tamar
> > Christina via Gcc-patches
> > Sent: Wednesday, May 5, 2021 6:40 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd <nd@arm.com>; rguenther@suse.de
> > Subject: [PATCH 4/4]middle-end: Add tests middle end generic tests for sign
> > differing dotproduct.
> >
> > Hi All,
> >
> > This adds testcases to test for auto-vect detection of the new sign differing
> > dot product.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> >      * doc/sourcebuild.texi (arm_v8_2a_i8mm_neon_hw): Document.
> >
> > gcc/testsuite/ChangeLog:
> >
> >      * lib/target-supports.exp
> >      (check_effective_target_arm_v8_2a_imm8_neon_ok_nocache,
> >      check_effective_target_arm_v8_2a_i8mm_neon_hw,
> >      check_effective_target_vect_usdot_qi): New.
> >      * gcc.dg/vect/vect-reduc-dot-10.c: New test.
> >      * gcc.dg/vect/vect-reduc-dot-11.c: New test.
> >      * gcc.dg/vect/vect-reduc-dot-12.c: New test.
> >      * gcc.dg/vect/vect-reduc-dot-13.c: New test.
> >      * gcc.dg/vect/vect-reduc-dot-14.c: New test.
> >      * gcc.dg/vect/vect-reduc-dot-15.c: New test.
> >      * gcc.dg/vect/vect-reduc-dot-16.c: New test.
> >      * gcc.dg/vect/vect-reduc-dot-9.c: New test.
> >
>

--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH 4/4]middle-end: Add tests middle end generic tests for sign differing dotproduct.
       [not found]     ` <11s2181-8856-30rq-26or-84q8o7qrr2o@fhfr.qr>
  2021-05-26  8:48       ` Tamar Christina
@ 2021-06-14 12:08       ` Tamar Christina
  1 sibling, 0 replies; 35+ messages in thread
From: Tamar Christina @ 2021-06-14 12:08 UTC (permalink / raw)
  To: Richard Biener; +Cc: nd, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 18215 bytes --]

Hi,

Just adding 7 more tests.  I will assume the OK still stands as it's more of the same.

Thanks,
Tamar

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Wednesday, May 26, 2021 9:41 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: nd <nd@arm.com>
> Subject: RE: [PATCH 4/4]middle-end: Add tests middle end generic tests for
> sign differing dotproduct.
> 
> On Tue, 25 May 2021, Tamar Christina wrote:
> 
> > Hi All,
> >
> > Adding a few more tests
> >
> > > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> >
> > Ok for master?
> 
> OK.
> 
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* doc/sourcebuild.texi (arm_v8_2a_i8mm_neon_hw): Document.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	* lib/target-supports.exp
> > 	(check_effective_target_arm_v8_2a_imm8_neon_ok_nocache,
> > 	check_effective_target_arm_v8_2a_i8mm_neon_hw,
> > 	check_effective_target_vect_usdot_qi): New.
> > 	* gcc.dg/vect/vect-reduc-dot-9.c: New test.
> > 	* gcc.dg/vect/vect-reduc-dot-10.c: New test.
> > 	* gcc.dg/vect/vect-reduc-dot-11.c: New test.
> > 	* gcc.dg/vect/vect-reduc-dot-12.c: New test.
> > 	* gcc.dg/vect/vect-reduc-dot-13.c: New test.
> > 	* gcc.dg/vect/vect-reduc-dot-14.c: New test.
> > 	* gcc.dg/vect/vect-reduc-dot-15.c: New test.
> > 	* gcc.dg/vect/vect-reduc-dot-16.c: New test.
> > 	* gcc.dg/vect/vect-reduc-dot-17.c: New test.
> > 	* gcc.dg/vect/vect-reduc-dot-18.c: New test.
> >
> > > -----Original Message-----
> > > From: Gcc-patches <gcc-patches-bounces@gcc.gnu.org> On Behalf Of
> > > Tamar Christina via Gcc-patches
> > > Sent: Wednesday, May 5, 2021 6:40 PM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: nd <nd@arm.com>; rguenther@suse.de
> > > Subject: [PATCH 4/4]middle-end: Add tests middle end generic tests
> > > for sign differing dotproduct.
> > >
> > > Hi All,
> > >
> > > This adds testcases to test for auto-vect detection of the new sign
> > > differing dot product.
> > >
> > > > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* doc/sourcebuild.texi (arm_v8_2a_i8mm_neon_hw): Document.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > 	* lib/target-supports.exp
> > > 	(check_effective_target_arm_v8_2a_imm8_neon_ok_nocache,
> > > 	check_effective_target_arm_v8_2a_i8mm_neon_hw,
> > > 	check_effective_target_vect_usdot_qi): New.
> > > 	* gcc.dg/vect/vect-reduc-dot-10.c: New test.
> > > 	* gcc.dg/vect/vect-reduc-dot-11.c: New test.
> > > 	* gcc.dg/vect/vect-reduc-dot-12.c: New test.
> > > 	* gcc.dg/vect/vect-reduc-dot-13.c: New test.
> > > 	* gcc.dg/vect/vect-reduc-dot-14.c: New test.
> > > 	* gcc.dg/vect/vect-reduc-dot-15.c: New test.
> > > 	* gcc.dg/vect/vect-reduc-dot-16.c: New test.
> > > 	* gcc.dg/vect/vect-reduc-dot-9.c: New test.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> > > > index b0001247795947c9dcab1a14884ecd585976dfdd..0034ac9d86b26e6674d71090b9d04b6148f99e17 100644
> > > > --- a/gcc/doc/sourcebuild.texi
> > > > +++ b/gcc/doc/sourcebuild.texi
> > > > @@ -1672,6 +1672,10 @@ Target supports a vector dot-product of @code{signed char}.
> > >  @item vect_udot_qi
> > >  Target supports a vector dot-product of @code{unsigned char}.
> > >
> > > +@item vect_usdot_qi
> > > > +Target supports a vector dot-product where one operand of the
> > > > +multiply is @code{signed char} and the other is @code{unsigned char}.
> > > +
> > >  @item vect_sdot_hi
> > >  Target supports a vector dot-product of @code{signed short}.
> > >
> > > @@ -1947,6 +1951,11 @@ ARM target supports executing instructions
> > > from ARMv8.2-A with the Dot  Product extension. Some multilibs may
> > > be incompatible with these options.
> > >  Implies arm_v8_2a_dotprod_neon_ok.
> > >
> > > +@item arm_v8_2a_i8mm_neon_hw
> > > +ARM target supports executing instructions from ARMv8.2-A with the
> > > +8-bit Matrix Multiply extension.  Some multilibs may be
> > > +incompatible with these options.  Implies arm_v8_2a_i8mm_ok.
> > > +
> > >  @item arm_fp16fml_neon_ok
> > >  @anchor{arm_fp16fml_neon_ok}
> > > >  ARM target supports extensions to generate the @code{VFMAL} and @code{VFMLS}
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-10.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-10.c
> > > > new file mode 100644
> > > > index 0000000000000000000000000000000000000000..7ce86965ea97d37c43d96b4d2271df667dcb2aae
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-10.c
> > > > @@ -0,0 +1,13 @@
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
> > > +/* { dg-add-options arm_v8_2a_i8mm }  */
> > > +
> > > +#define SIGNEDNESS_1 unsigned
> > > +#define SIGNEDNESS_2 unsigned
> > > +#define SIGNEDNESS_3 unsigned
> > > +#define SIGNEDNESS_4 signed
> > > +
> > > +#include "vect-reduc-dot-9.c"
> > > +
> > > > +/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" "vect" } } */
> > > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-11.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-11.c
> > > > new file mode 100644
> > > > index 0000000000000000000000000000000000000000..0f7cbbb87ef028f166366aea55bc4ef49d2f8e9b
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-11.c
> > > > @@ -0,0 +1,13 @@
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
> > > +/* { dg-add-options arm_v8_2a_i8mm }  */
> > > +
> > > +#define SIGNEDNESS_1 unsigned
> > > +#define SIGNEDNESS_2 signed
> > > +#define SIGNEDNESS_3 unsigned
> > > +#define SIGNEDNESS_4 signed
> > > +
> > > +#include "vect-reduc-dot-9.c"
> > > +
> > > > +/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
> > > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-12.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-12.c
> > > > new file mode 100644
> > > > index 0000000000000000000000000000000000000000..08412614fc67045d3067b5b55ba032d297595237
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-12.c
> > > > @@ -0,0 +1,13 @@
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
> > > +/* { dg-add-options arm_v8_2a_i8mm }  */
> > > +
> > > +#define SIGNEDNESS_1 unsigned
> > > +#define SIGNEDNESS_2 signed
> > > +#define SIGNEDNESS_3 signed
> > > +#define SIGNEDNESS_4 unsigned
> > > +
> > > +#include "vect-reduc-dot-9.c"
> > > +
> > > > +/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
> > > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-13.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-13.c
> > > > new file mode 100644
> > > > index 0000000000000000000000000000000000000000..7ee0f45f64296442204ee13d5f880f4b7716fb85
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-13.c
> > > > @@ -0,0 +1,13 @@
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
> > > +/* { dg-add-options arm_v8_2a_i8mm }  */
> > > +
> > > +#define SIGNEDNESS_1 signed
> > > +#define SIGNEDNESS_2 unsigned
> > > +#define SIGNEDNESS_3 signed
> > > +#define SIGNEDNESS_4 unsigned
> > > +
> > > +#include "vect-reduc-dot-9.c"
> > > +
> > > > +/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" "vect" } } */
> > > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-14.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-14.c
> > > > new file mode 100644
> > > > index 0000000000000000000000000000000000000000..2de1434528b87f0c32c54150b16791f3f2a469b5
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-14.c
> > > > @@ -0,0 +1,13 @@
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
> > > +/* { dg-add-options arm_v8_2a_i8mm }  */
> > > +
> > > +#define SIGNEDNESS_1 signed
> > > +#define SIGNEDNESS_2 unsigned
> > > +#define SIGNEDNESS_3 unsigned
> > > +#define SIGNEDNESS_4 signed
> > > +
> > > +#include "vect-reduc-dot-9.c"
> > > +
> > > > +/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" "vect" } } */
> > > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-15.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-15.c
> > > > new file mode 100644
> > > > index 0000000000000000000000000000000000000000..dc48f95a32bf76c54a906ee81ddee99b16aea84a
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-15.c
> > > > @@ -0,0 +1,13 @@
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
> > > +/* { dg-add-options arm_v8_2a_i8mm }  */
> > > +
> > > +#define SIGNEDNESS_1 signed
> > > +#define SIGNEDNESS_2 signed
> > > +#define SIGNEDNESS_3 unsigned
> > > +#define SIGNEDNESS_4 signed
> > > +
> > > +#include "vect-reduc-dot-9.c"
> > > +
> > > > +/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
> > > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-16.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-16.c
> > > > new file mode 100644
> > > > index 0000000000000000000000000000000000000000..aec628789366673321aea88c60316a68fe16cbc5
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-16.c
> > > > @@ -0,0 +1,13 @@
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
> > > +/* { dg-add-options arm_v8_2a_i8mm }  */
> > > +
> > > +#define SIGNEDNESS_1 signed
> > > +#define SIGNEDNESS_2 signed
> > > +#define SIGNEDNESS_3 signed
> > > +#define SIGNEDNESS_4 unsigned
> > > +
> > > +#include "vect-reduc-dot-9.c"
> > > +
> > > > +/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
> > > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
> > > > new file mode 100644
> > > > index 0000000000000000000000000000000000000000..cbbeedec3bfd0810a8ce8036e6670585d9334924
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
> > > > @@ -0,0 +1,52 @@
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
> > > +/* { dg-add-options arm_v8_2a_i8mm }  */
> > > +
> > > +#include "tree-vect.h"
> > > +
> > > +#define N 50
> > > +
> > > +#ifndef SIGNEDNESS_1
> > > +#define SIGNEDNESS_1 unsigned
> > > +#define SIGNEDNESS_2 unsigned
> > > +#define SIGNEDNESS_3 signed
> > > +#define SIGNEDNESS_4 unsigned
> > > +#endif
> > > +
> > > > +SIGNEDNESS_1 int __attribute__ ((noipa))
> > > > +f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> > > > +   SIGNEDNESS_4 char *restrict b)
> > > +{
> > > +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > +    {
> > > +      int av = a[i];
> > > +      int bv = b[i];
> > > +      SIGNEDNESS_2 short mult = av * bv;
> > > +      res += mult;
> > > +    }
> > > +  return res;
> > > +}
> > > +
> > > > +#define BASE ((SIGNEDNESS_3 int) -1 < 0 ? -126 : 4)
> > > > +#define OFFSET 20
> > > +
> > > +int
> > > +main (void)
> > > +{
> > > +  check_vect ();
> > > +
> > > +  SIGNEDNESS_3 char a[N], b[N];
> > > +  int expected = 0x12345;
> > > +  for (int i = 0; i < N; ++i)
> > > +    {
> > > +      a[i] = BASE + i * 5;
> > > +      b[i] = BASE + OFFSET + i * 4;
> > > +      asm volatile ("" ::: "memory");
> > > +      expected += (SIGNEDNESS_2 short) (a[i] * b[i]);
> > > +    }
> > > +  if (f (0x12345, a, b) != expected)
> > > +    __builtin_abort ();
> > > +}
> > > +
> > > > +/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" "vect" } } */
> > > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
> > > > diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> > > > index ad323107f2ec5d55a77214beca5e4135643528b4..db9bd605ab4c838f65667fa616da334a171d9dfb 100644
> > > > --- a/gcc/testsuite/lib/target-supports.exp
> > > > +++ b/gcc/testsuite/lib/target-supports.exp
> > > > @@ -5240,6 +5240,36 @@ proc check_effective_target_arm_v8_2a_dotprod_neon_ok_nocache { } {
> > >      return 0;
> > >  }
> > >
> > > > +# Return 1 if the target supports ARMv8.2 Adv.SIMD imm8
> > > > +# instructions, 0 otherwise.  The test is valid for ARM and for AArch64.
> > > +# Record the command line options needed.
> > > +
> > > +proc check_effective_target_arm_v8_2a_imm8_neon_ok_nocache { } {
> > > +    global et_arm_v8_2a_imm8_neon_flags
> > > +    set et_arm_v8_2a_imm8_neon_flags ""
> > > +
> > > +    if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
> > > +        return 0;
> > > +    }
> > > +
> > > +    # Iterate through sets of options to find the compiler flags that
> > > +    # need to be added to the -march option.
> > > > +    foreach flags {"" "-mfloat-abi=softfp -mfpu=neon-fp-armv8" "-mfloat-abi=hard -mfpu=neon-fp-armv8"} {
> > > +        if { [check_no_compiler_messages_nocache \
> > > +                  arm_v8_2a_imm8_neon_ok object {
> > > +	    #include <stdint.h>
> > > +            #if !defined (__ARM_FEATURE_MATMUL_INT8)
> > > +            #error "__ARM_FEATURE_MATMUL_INT8 not defined"
> > > +            #endif
> > > +        } "$flags -march=armv8.2-a+imm8"] } {
> > > > +            set et_arm_v8_2a_imm8_neon_flags "$flags -march=armv8.2-a+imm8"
> > > +            return 1
> > > +        }
> > > +    }
> > > +
> > > +    return 0;
> > > +}
> > > +
> > > >  # Return 1 if the target supports ARMv8.1-M MVE
> > > >  # instructions, 0 otherwise.  The test is valid for ARM.
> > > >  # Record the command line options needed.
> > > > @@ -5667,6 +5697,43 @@ proc check_effective_target_arm_v8_2a_dotprod_neon_hw { } {
> > > >      } [add_options_for_arm_v8_2a_dotprod_neon ""]]
> > > >  }
> > > >
> > > > +# Return 1 if the target supports executing AdvSIMD instructions from ARMv8.2
> > > > +# with the i8mm extension, 0 otherwise.  The test is valid for ARM and for
> > > > +# AArch64.
> > > +
> > > +proc check_effective_target_arm_v8_2a_i8mm_neon_hw { } {
> > > +    if { ![check_effective_target_arm_v8_2a_i8mm_ok] } {
> > > +        return 0;
> > > +    }
> > > +    return [check_runtime arm_v8_2a_i8mm_neon_hw_available {
> > > +        #include "arm_neon.h"
> > > +        int
> > > +        main (void)
> > > +        {
> > > +
> > > +	  uint32x2_t results = {0,0};
> > > +	  uint8x8_t a = {1,1,1,1,2,2,2,2};
> > > +	  int8x8_t b = {2,2,2,2,3,3,3,3};
> > > +
> > > +          #ifdef __ARM_ARCH_ISA_A64
> > > +          asm ("usdot %0.2s, %1.8b, %2.8b"
> > > +               : "=w"(results)
> > > +               : "w"(a), "w"(b)
> > > +               : /* No clobbers.  */);
> > > +
> > > +	  #else
> > > +          asm ("vusdot.u8 %P0, %P1, %P2"
> > > +               : "=w"(results)
> > > +               : "w"(a), "w"(b)
> > > +               : /* No clobbers.  */);
> > > +          #endif
> > > +
> > > +          return (vget_lane_u32 (results, 0) == 8
> > > +		  && vget_lane_u32 (results, 1) == 24) ? 1 : 0;
> > > +        }
> > > > +    } [add_options_for_arm_v8_2a_i8mm ""]]
> > > > +}
> > > +
> > >  # Return 1 if this is a ARM target with NEON enabled.
> > >
> > > >  proc check_effective_target_arm_neon { } {
> > > > @@ -7022,6 +7089,19 @@ proc check_effective_target_vect_udot_qi { } {
> > > >  		 && [et-is-effective-target mips_msa]) }}]
> > > >  }
> > > >
> > > > +# Return 1 if the target plus current options supports a vector
> > > > +# dot-product where one operand of the multiply is signed char
> > > > +# and the other unsigned chars, 0 otherwise.
> > > +#
> > > +# This won't change for different subtargets so cache the result.
> > > +
> > > +proc check_effective_target_vect_usdot_qi { } {
> > > +    return [check_cached_effective_target_indexed vect_usdot_qi {
> > > +      expr { [istarget aarch64*-*-*]
> > > +	     || [istarget arm*-*-*] }}]
> > > +}
> > > +
> > > +
> > > >  # Return 1 if the target plus current options supports a vector
> > > >  # dot-product of signed shorts, 0 otherwise.
> > >  #
> > >
> > >
> > > --
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> Nuernberg, Germany; GF: Felix Imend

[-- Attachment #2: rb14436.patch --]
[-- Type: application/octet-stream, Size: 21842 bytes --]

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 16c6a3b8e9956e7ed5ea0766c2bd738d2a112cd4..b1fffd5e90f8b938a4c50c85f5c9ef0efe440468 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1780,6 +1780,10 @@ Target supports a vector dot-product of @code{signed char}.
 @item vect_udot_qi
 Target supports a vector dot-product of @code{unsigned char}.
 
+@item vect_usdot_qi
+Target supports a vector dot-product where one operand of the multiply is
+@code{signed char} and the other is @code{unsigned char}.
+
 @item vect_sdot_hi
 Target supports a vector dot-product of @code{signed short}.
 
@@ -2055,6 +2059,11 @@ ARM target supports executing instructions from ARMv8.2-A with the Dot
 Product extension. Some multilibs may be incompatible with these options.
 Implies arm_v8_2a_dotprod_neon_ok.
 
+@item arm_v8_2a_i8mm_neon_hw
+ARM target supports executing instructions from ARMv8.2-A with the 8-bit
+Matrix Multiply extension.  Some multilibs may be incompatible with these
+options.  Implies arm_v8_2a_i8mm_ok.
+
 @item arm_fp16fml_neon_ok
 @anchor{arm_fp16fml_neon_ok}
 ARM target supports extensions to generate the @code{VFMAL} and @code{VFMLS}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-10.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-10.c
new file mode 100644
index 0000000000000000000000000000000000000000..7ce86965ea97d37c43d96b4d2271df667dcb2aae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-10.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 unsigned
+#define SIGNEDNESS_3 unsigned
+#define SIGNEDNESS_4 signed
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-11.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-11.c
new file mode 100644
index 0000000000000000000000000000000000000000..0f7cbbb87ef028f166366aea55bc4ef49d2f8e9b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-11.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 unsigned
+#define SIGNEDNESS_4 signed
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-12.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-12.c
new file mode 100644
index 0000000000000000000000000000000000000000..08412614fc67045d3067b5b55ba032d297595237
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-12.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-13.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-13.c
new file mode 100644
index 0000000000000000000000000000000000000000..7ee0f45f64296442204ee13d5f880f4b7716fb85
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-13.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 signed
+#define SIGNEDNESS_2 unsigned
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-14.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-14.c
new file mode 100644
index 0000000000000000000000000000000000000000..2de1434528b87f0c32c54150b16791f3f2a469b5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-14.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 signed
+#define SIGNEDNESS_2 unsigned
+#define SIGNEDNESS_3 unsigned
+#define SIGNEDNESS_4 signed
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-15.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-15.c
new file mode 100644
index 0000000000000000000000000000000000000000..dc48f95a32bf76c54a906ee81ddee99b16aea84a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-15.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 signed
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 unsigned
+#define SIGNEDNESS_4 signed
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-16.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-16.c
new file mode 100644
index 0000000000000000000000000000000000000000..aec628789366673321aea88c60316a68fe16cbc5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-16.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 signed
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
new file mode 100644
index 0000000000000000000000000000000000000000..aa269c4d657f65e07e36df7f3fd0098cf3aaf4d0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
@@ -0,0 +1,52 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#include "tree-vect.h"
+
+#define N 50
+
+#ifndef SIGNEDNESS_1
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 unsigned
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+#endif
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
+   SIGNEDNESS_4 char *restrict b)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 int mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+#define BASE ((SIGNEDNESS_3 int) -1 < 0 ? -126 : 4)
+#define OFFSET 20
+
+int
+main (void)
+{
+  check_vect ();
+
+  SIGNEDNESS_3 char a[N], b[N];
+  int expected = 0x12345;
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE + i * 5;
+      b[i] = BASE + OFFSET + i * 4;
+      asm volatile ("" ::: "memory");
+      expected += (SIGNEDNESS_2 int) (a[i] * b[i]);
+    }
+  if (f (0x12345, a, b) != expected)
+    __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
new file mode 100644
index 0000000000000000000000000000000000000000..2b1cc0411c3256ccd876d8b4da18ce4881dc0af9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
@@ -0,0 +1,52 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#include "tree-vect.h"
+
+#define N 50
+
+#ifndef SIGNEDNESS_1
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+#endif
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
+   SIGNEDNESS_4 char *restrict b)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 int mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+#define BASE ((SIGNEDNESS_3 int) -1 < 0 ? -126 : 4)
+#define OFFSET 20
+
+int
+main (void)
+{
+  check_vect ();
+
+  SIGNEDNESS_3 char a[N], b[N];
+  int expected = 0x12345;
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE + i * 5;
+      b[i] = BASE + OFFSET + i * 4;
+      asm volatile ("" ::: "memory");
+      expected += (SIGNEDNESS_2 int) (a[i] * b[i]);
+    }
+  if (f (0x12345, a, b) != expected)
+    __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-19.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-19.c
new file mode 100644
index 0000000000000000000000000000000000000000..dbeaaec24a1095b7730d9e1262f5a951fd2312fc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-19.c
@@ -0,0 +1,52 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#include "tree-vect.h"
+
+#define N 50
+
+#ifndef SIGNEDNESS_1
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+#endif
+
+SIGNEDNESS_1 long __attribute__ ((noipa))
+f (SIGNEDNESS_1 long res, SIGNEDNESS_3 char *restrict a,
+   SIGNEDNESS_4 short *restrict b)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 long mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+#define BASE ((SIGNEDNESS_3 int) -1 < 0 ? -126 : 4)
+#define OFFSET 20
+
+int
+main (void)
+{
+  check_vect ();
+
+  SIGNEDNESS_3 char a[N];
+  SIGNEDNESS_4 short b[N];
+  int expected = 0x12345;
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE + i * 5;
+      b[i] = BASE + OFFSET + i * 4;
+      asm volatile ("" ::: "memory");
+      expected += (SIGNEDNESS_2 int) (a[i] * b[i]);
+    }
+  if (f (0x12345, a, b) != expected)
+    __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-20.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-20.c
new file mode 100644
index 0000000000000000000000000000000000000000..d757fb15615ba79dedcbfc44407d3f363274ad26
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-20.c
@@ -0,0 +1,52 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#include "tree-vect.h"
+
+#define N 50
+
+#ifndef SIGNEDNESS_1
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+#endif
+
+SIGNEDNESS_1 long __attribute__ ((noipa))
+f (SIGNEDNESS_1 long res, SIGNEDNESS_3 short *restrict a,
+   SIGNEDNESS_4 char *restrict b)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 long mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+#define BASE ((SIGNEDNESS_3 int) -1 < 0 ? -126 : 4)
+#define OFFSET 20
+
+int
+main (void)
+{
+  check_vect ();
+
+  SIGNEDNESS_3 short a[N];
+  SIGNEDNESS_4 char b[N];
+  int expected = 0x12345;
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE + i * 5;
+      b[i] = BASE + OFFSET + i * 4;
+      asm volatile ("" ::: "memory");
+      expected += (SIGNEDNESS_2 int) (a[i] * b[i]);
+    }
+  if (f (0x12345, a, b) != expected)
+    __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c
new file mode 100644
index 0000000000000000000000000000000000000000..6d08bf4478be83de86b0975524687a75d025123e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c
@@ -0,0 +1,52 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#include "tree-vect.h"
+
+#define N 50
+
+#ifndef SIGNEDNESS_1
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+#endif
+
+SIGNEDNESS_1 long __attribute__ ((noipa))
+f (SIGNEDNESS_1 long res, SIGNEDNESS_3 char *restrict a,
+   SIGNEDNESS_4 short *restrict b)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 int mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+#define BASE ((SIGNEDNESS_3 int) -1 < 0 ? -126 : 4)
+#define OFFSET 20
+
+int
+main (void)
+{
+  check_vect ();
+
+  SIGNEDNESS_3 char a[N];
+  SIGNEDNESS_4 short b[N];
+  int expected = 0x12345;
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE + i * 5;
+      b[i] = BASE + OFFSET + i * 4;
+      asm volatile ("" ::: "memory");
+      expected += (SIGNEDNESS_2 int) (a[i] * b[i]);
+    }
+  if (f (0x12345, a, b) != expected)
+    __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
new file mode 100644
index 0000000000000000000000000000000000000000..febeb19784c6aaca72dc0871af0d32cc91fa6ea2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
@@ -0,0 +1,52 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#include "tree-vect.h"
+
+#define N 50
+
+#ifndef SIGNEDNESS_1
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 unsigned
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+#endif
+
+SIGNEDNESS_1 long __attribute__ ((noipa))
+f (SIGNEDNESS_1 long res, SIGNEDNESS_3 char *restrict a,
+   SIGNEDNESS_4 short *restrict b)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 int mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+#define BASE ((SIGNEDNESS_3 int) -1 < 0 ? -126 : 4)
+#define OFFSET 20
+
+int
+main (void)
+{
+  check_vect ();
+
+  SIGNEDNESS_3 char a[N];
+  SIGNEDNESS_4 short b[N];
+  int expected = 0x12345;
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE + i * 5;
+      b[i] = BASE + OFFSET + i * 4;
+      asm volatile ("" ::: "memory");
+      expected += (SIGNEDNESS_2 int) (a[i] * b[i]);
+    }
+  if (f (0x12345, a, b) != expected)
+    __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
new file mode 100644
index 0000000000000000000000000000000000000000..cbbeedec3bfd0810a8ce8036e6670585d9334924
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
@@ -0,0 +1,52 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#include "tree-vect.h"
+
+#define N 50
+
+#ifndef SIGNEDNESS_1
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 unsigned
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+#endif
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
+   SIGNEDNESS_4 char *restrict b)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 short mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+#define BASE ((SIGNEDNESS_3 int) -1 < 0 ? -126 : 4)
+#define OFFSET 20
+
+int
+main (void)
+{
+  check_vect ();
+
+  SIGNEDNESS_3 char a[N], b[N];
+  int expected = 0x12345;
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE + i * 5;
+      b[i] = BASE + OFFSET + i * 4;
+      asm volatile ("" ::: "memory");
+      expected += (SIGNEDNESS_2 short) (a[i] * b[i]);
+    }
+  if (f (0x12345, a, b) != expected)
+    __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 7f78c5593ac4394fa5ca058e41517d7e7c98bd06..a53fe82929e148eae1912f5681b61c50ffa983a1 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5267,6 +5267,36 @@ proc check_effective_target_arm_v8_2a_dotprod_neon_ok_nocache { } {
     return 0;
 }
 
+# Return 1 if the target supports ARMv8.2 Adv.SIMD imm8
+# instructions, 0 otherwise.  The test is valid for ARM and for AArch64.
+# Record the command line options needed.
+
+proc check_effective_target_arm_v8_2a_imm8_neon_ok_nocache { } {
+    global et_arm_v8_2a_imm8_neon_flags
+    set et_arm_v8_2a_imm8_neon_flags ""
+
+    if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
+        return 0;
+    }
+
+    # Iterate through sets of options to find the compiler flags that
+    # need to be added to the -march option.
+    foreach flags {"" "-mfloat-abi=softfp -mfpu=neon-fp-armv8" "-mfloat-abi=hard -mfpu=neon-fp-armv8"} {
+        if { [check_no_compiler_messages_nocache \
+                  arm_v8_2a_imm8_neon_ok object {
+	    #include <stdint.h>
+            #if !defined (__ARM_FEATURE_MATMUL_INT8)
+            #error "__ARM_FEATURE_MATMUL_INT8 not defined"
+            #endif
+        } "$flags -march=armv8.2-a+imm8"] } {
+            set et_arm_v8_2a_imm8_neon_flags "$flags -march=armv8.2-a+imm8"
+            return 1
+        }
+    }
+
+    return 0;
+}
+
 # Return 1 if the target supports ARMv8.1-M MVE
 # instructions, 0 otherwise.  The test is valid for ARM.
 # Record the command line options needed.
@@ -5694,6 +5724,43 @@ proc check_effective_target_arm_v8_2a_dotprod_neon_hw { } {
     } [add_options_for_arm_v8_2a_dotprod_neon ""]]
 }
 
+# Return 1 if the target supports executing AdvSIMD instructions from ARMv8.2
+# with the i8mm extension, 0 otherwise.  The test is valid for ARM and for
+# AArch64.
+
+proc check_effective_target_arm_v8_2a_i8mm_neon_hw { } {
+    if { ![check_effective_target_arm_v8_2a_i8mm_ok] } {
+        return 0;
+    }
+    return [check_runtime arm_v8_2a_i8mm_neon_hw_available {
+        #include "arm_neon.h"
+        int
+        main (void)
+        {
+
+	  uint32x2_t results = {0,0};
+	  uint8x8_t a = {1,1,1,1,2,2,2,2};
+	  int8x8_t b = {2,2,2,2,3,3,3,3};
+
+          #ifdef __ARM_ARCH_ISA_A64
+          asm ("usdot %0.2s, %1.8b, %2.8b"
+               : "=w"(results)
+               : "w"(a), "w"(b)
+               : /* No clobbers.  */);
+
+	  #else
+          asm ("vusdot.u8 %P0, %P1, %P2"
+               : "=w"(results)
+               : "w"(a), "w"(b)
+               : /* No clobbers.  */);
+          #endif
+
+          return (vget_lane_u32 (results, 0) == 8
+		  && vget_lane_u32 (results, 1) == 24) ? 1 : 0;
+        }
+    } [add_options_for_arm_v8_2a_i8mm ""]]
+}
+
 # Return 1 if this is a ARM target with NEON enabled.
 
 proc check_effective_target_arm_neon { } {
@@ -7049,6 +7116,19 @@ proc check_effective_target_vect_udot_qi { } {
 		 && [et-is-effective-target mips_msa]) }}]
 }
 
+# Return 1 if the target plus current options supports a vector
+# dot-product where one operand of the multiply is signed char
+# and the other unsigned chars, 0 otherwise.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_vect_usdot_qi { } {
+    return [check_cached_effective_target_indexed vect_usdot_qi {
+      expr { [istarget aarch64*-*-*]
+	     || [istarget arm*-*-*] }}]
+}
+
+
 # Return 1 if the target plus current options supports a vector
 # dot-product of signed shorts, 0 otherwise.
 #

* Re: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
  2021-05-05 17:38 [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes Tamar Christina
                   ` (2 preceding siblings ...)
  2021-05-05 17:39 ` [PATCH 4/4]middle-end: Add tests middle end generic tests for sign differing dotproduct Tamar Christina
@ 2021-05-07 11:45 ` Richard Biener
  2021-05-07 12:42   ` Tamar Christina
  3 siblings, 1 reply; 35+ messages in thread
From: Richard Biener @ 2021-05-07 11:45 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd

On Wed, 5 May 2021, Tamar Christina wrote:

> Hi All,
> 
> This patch adds support for a dot product where the sign of the multiplication
> arguments differ. i.e. one is signed and one is unsigned but the precisions are
> the same.
> 
> #define N 480
> #define SIGNEDNESS_1 unsigned
> #define SIGNEDNESS_2 signed
> #define SIGNEDNESS_3 signed
> #define SIGNEDNESS_4 unsigned
> 
> SIGNEDNESS_1 int __attribute__ ((noipa))
> f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
>    SIGNEDNESS_4 char *restrict b)
> {
>   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
>     {
>       int av = a[i];
>       int bv = b[i];
>       SIGNEDNESS_2 short mult = av * bv;
>       res += mult;
>     }
>   return res;
> }
> 
> The operations are performed as if the operands were extended to a 32-bit value.
> As such this operation isn't valid if there is an intermediate conversion to an
> unsigned value. i.e.  if SIGNEDNESS_2 is unsigned.
> 
> more over if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are flipped the same
> optab is used but the operands are flipped in the optab expansion.
> 
> To support this the patch extends the dot-product detection to optionally
> ignore operands with different signs and stores this information in the optab
> subtype which is now made a bitfield.
> 
> The subtype can now additionally controls which optab an EXPR can expand to.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* optabs.def (usdot_prod_optab): New.
> 	* doc/md.texi: Document it.
> 	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> 	* optabs-tree.h (enum optab_subtype): Likewise.
> 	* optabs.c (expand_widen_pattern_expr): Likewise.
> 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> 	(vectorizable_reduction): Query dot-product kind.
> 	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
> 	optab subtype.
> 	(vect_joust_widened_type, vect_widened_op_tree): Optionally ignore
> 	mismatch types.
> 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fdf2e66bc80d7d23 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5440,11 +5440,13 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
>  @item @samp{sdot_prod@var{m}}
>  @cindex @code{udot_prod@var{m}} instruction pattern
>  @itemx @samp{udot_prod@var{m}}
> +@cindex @code{usdot_prod@var{m}} instruction pattern
> +@itemx @samp{usdot_prod@var{m}}
>  Compute the sum of the products of two signed/unsigned elements.
> -Operand 1 and operand 2 are of the same mode. Their product, which is of a
> -wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
> -wider than the mode of the product. The result is placed in operand 0, which
> -is of the same mode as operand 3.
> +Operand 1 and operand 2 are of the same mode but may differ in signs. Their
> +product, which is of a wider mode, is computed and added to operand 3.
> +Operand 3 is of a mode equal or wider than the mode of the product. The
> +result is placed in operand 0, which is of the same mode as operand 3.

This doesn't really say what the 's', 'u' and 'us' specify.  Since
we're doing a widening multiplication and then a non-widening addition,
we only need to know the effective sign of the multiplication, so
I think the existing 's' and 'u' are enough to cover all cases?

The tree.def docs say the sum is also possibly widening, but I don't see
this covered by the optab, so we should eventually remove this
feature from the tree side.  In fact, the tree-cfg.c verifier requires
the addition to be non-widening, thus only tree.def needs adjustment.

>  @cindex @code{ssad@var{m}} instruction pattern
>  @item @samp{ssad@var{m}}
> diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> index c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f1990e0548ba08d 100644
> --- a/gcc/optabs-tree.h
> +++ b/gcc/optabs-tree.h
> @@ -27,11 +27,29 @@ along with GCC; see the file COPYING3.  If not see
>     shift amount vs. machines that take a vector for the shift amount.  */
>  enum optab_subtype
>  {
> -  optab_default,
> -  optab_scalar,
> -  optab_vector
> +  optab_default = 1 << 0,
> +  optab_scalar = 1 << 1,
> +  optab_vector = 1 << 2,
> +  optab_signed_to_unsigned = 1 << 3,
> +  optab_unsigned_to_signed = 1 << 4
>  };
>  
> +/* Override the OrEqual-operator so we can use optab_subtype as a bit flag.  */
> +inline enum optab_subtype&
> +operator |= (enum optab_subtype& a, enum optab_subtype b)
> +{
> +    return a = static_cast<optab_subtype>(static_cast<int>(a)
> +					  | static_cast<int>(b));
> +}
> +
> +/* Override the Or-operator so we can use optab_subtype as a bit flag.  */
> +inline enum optab_subtype
> +operator | (enum optab_subtype a, enum optab_subtype b)
> +{
> +    return static_cast<optab_subtype>(static_cast<int>(a)
> +				      | static_cast<int>(b));
> +}
> +
>  /* Return the optab used for computing the given operation on the type given by
>     the second argument.  The third argument distinguishes between the types of
>     vector shifts and rotates.  */
> diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
> index 95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea1e5c22b7453072 100644
> --- a/gcc/optabs-tree.c
> +++ b/gcc/optabs-tree.c
> @@ -127,7 +127,17 @@ optab_for_tree_code (enum tree_code code, const_tree type,
>        return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
>  
>      case DOT_PROD_EXPR:
> -      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
> +      {
> +	gcc_assert (subtype & optab_default
> +		    || subtype & optab_vector
> +		    || subtype & optab_signed_to_unsigned
> +		    || subtype & optab_unsigned_to_signed);
> +
> +	if (subtype & (optab_unsigned_to_signed | optab_signed_to_unsigned))
> +	  return usdot_prod_optab;
> +
> +	return (TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab);
> +      }
>  
>      case SAD_EXPR:
>        return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index f4614a394587787293dc8b680a38901f7906f61c..2e18b76de1412eab71971753ac678597c0d00098 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
>    bool sbool = false;
>  
>    oprnd0 = ops->op0;
> +  if (nops >= 2)
> +    oprnd1 = ops->op1;
> +  if (nops >= 3)
> +    oprnd2 = ops->op2;
> +
>    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
>    if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
>        || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
> @@ -285,6 +290,27 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
>  	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
>        sbool = true;
>      }
> +  else if (ops->code == DOT_PROD_EXPR)
> +    {
> +      enum optab_subtype subtype = optab_default;
> +      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
> +      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
> +      if (sign1 == sign2)
> +	;
> +      else if (sign1 == SIGNED && sign2 == UNSIGNED)
> +	{
> +	  subtype |= optab_signed_to_unsigned;
> +	  /* Same as optab_unsigned_to_signed but flip the operands.  */
> +	  std::swap (op0, op1);
> +	}
> +      else if (sign1 == UNSIGNED && sign2 == SIGNED)
> +	subtype |= optab_unsigned_to_signed;
> +      else
> +	gcc_unreachable ();
> +
> +      widen_pattern_optab
> +	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> +    }
>    else
>      widen_pattern_optab
>        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
> @@ -298,10 +324,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
>    gcc_assert (icode != CODE_FOR_nothing);
>  
>    if (nops >= 2)
> -    {
> -      oprnd1 = ops->op1;
> -      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> -    }
> +    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
>    else if (sbool)
>      {
>        nops = 2;
> @@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
>      {
>        gcc_assert (tmode1 == tmode0);
>        gcc_assert (op1);
> -      oprnd2 = ops->op2;
>        wmode = TYPE_MODE (TREE_TYPE (oprnd2));
>      }
>  
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbdb7c18615baae928 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
>  OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
>  OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
>  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> +OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
>  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
>  OPTAB_D (usad_optab, "usad$I$a")
>  OPTAB_D (ssad_optab, "ssad$I$a")
> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> index 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..58b55bb648ad97d514f1fa18bb00808fd2678b42 100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign *stmt)
>  		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
>  		 || (!INTEGRAL_TYPE_P (lhs_type)
>  		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> -	    || !types_compatible_p (rhs1_type, rhs2_type)
> +	    || (!types_compatible_p (rhs1_type, rhs2_type)
> +		&& TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type))

That's not restrictive enough.  I suggest you use

            && element_precision (rhs1_type) != element_precision (rhs2_type)

instead.

As said, I'm not sure all the changes in this patch are required.

Please elaborate.

Thanks,
Richard.

>  	    || !useless_type_conversion_p (lhs_type, rhs3_type)
>  	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
>  			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..cb8f5fbb6abca181c4171194d19fec29ec6e4176 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -6401,6 +6401,33 @@ build_vect_cond_expr (enum tree_code code, tree vop[3], tree mask,
>      }
>  }
>  
> +/* Determine the optab_subtype to use for the given CODE and STMT.  For
> +   most CODE this will be optab_vector, however for certain operations such as
> +   DOT_PROD_EXPR where the operation can different signs for the operands we
> +   need to be able to pick the right optabs.  */
> +
> +static enum optab_subtype
> +vect_determine_dot_kind (tree_code code, stmt_vec_info stmt_vinfo)
> +{
> +  enum optab_subtype subtype = optab_vector;
> +  switch (code)
> +    {
> +      case DOT_PROD_EXPR:
> +	{
> +	  gassign *stmt = as_a <gassign *> (STMT_VINFO_STMT (stmt_vinfo));
> +	  signop rhs1_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)));
> +	  signop rhs2_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt)));
> +	  if (rhs1_sign != rhs2_sign)
> +	    subtype |= optab_unsigned_to_signed;
> +	  break;
> +	}
> +      default:
> +	break;
> +    }
> +
> +  return subtype;
> +}
> +
>  /* Function vectorizable_reduction.
>  
>     Check if STMT_INFO performs a reduction operation that can be vectorized.
> @@ -7189,7 +7216,8 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>        bool ok = true;
>  
>        /* 4.1. check support for the operation in the loop  */
> -      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
> +      enum optab_subtype subtype = vect_determine_dot_kind (code, stmt_info);
> +      optab optab = optab_for_tree_code (code, vectype_in, subtype);
>        if (!optab)
>  	{
>  	  if (dump_enabled_p ())
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index 441d6cd28c4eaded7abd756164890dbcffd2f3b8..943c001fb13777b4d1513841fa84942316846d5e 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
>  static bool
>  vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
>  				 tree itype, tree *vecotype_out,
> -				 tree *vecitype_out = NULL)
> +				 tree *vecitype_out = NULL,
> +				 enum optab_subtype subtype = optab_default)
>  {
>    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
>    if (!vecitype)
> @@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
>    if (!vecotype)
>      return false;
>  
> -  optab optab = optab_for_tree_code (code, vecitype, optab_default);
> +  optab optab = optab_for_tree_code (code, vecitype, subtype);
>    if (!optab)
>      return false;
>  
> @@ -487,14 +488,31 @@ vect_joust_widened_integer (tree type, bool shift_p, tree op,
>  }
>  
>  /* Return true if the common supertype of NEW_TYPE and *COMMON_TYPE
> -   is narrower than type, storing the supertype in *COMMON_TYPE if so.  */
> +   is narrower than type, storing the supertype in *COMMON_TYPE if so.
> +   If ALLOW_SHORT_SIGN_MISMATCH then accept that *COMMON_TYPE and NEW_TYPE
> +   may be of different signs but equal precision.   */
>  
>  static bool
> -vect_joust_widened_type (tree type, tree new_type, tree *common_type)
> +vect_joust_widened_type (tree type, tree new_type, tree *common_type,
> +			 bool allow_short_sign_mismatch = false)
>  {
>    if (types_compatible_p (*common_type, new_type))
>      return true;
>  
> +  /* Check if the mismatch is only in the sign and if we have
> +     allow_short_sign_mismatch then allow it.  */
> +  if (allow_short_sign_mismatch
> +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> +    {
> +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> +      tree eq_type
> +	= build_nonstandard_integer_type (TYPE_PRECISION (new_type),
> +					  sign);
> +
> +      if (types_compatible_p (*common_type, eq_type))
> +	return true;
> +    }
> +
>    /* See if *COMMON_TYPE can hold all values of NEW_TYPE.  */
>    if ((TYPE_PRECISION (new_type) < TYPE_PRECISION (*common_type))
>        && (TYPE_UNSIGNED (new_type) || !TYPE_UNSIGNED (*common_type)))
> @@ -532,6 +550,9 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
>     to a type that (a) is narrower than the result of STMT_INFO and
>     (b) can hold all leaf operand values.
>  
> +   If ALLOW_SHORT_SIGN_MISMATCH then allow that the signs of the operands
> +   may differ in signs but not in precision.
> +
>     Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
>     exists.  */
>  
> @@ -539,7 +560,8 @@ static unsigned int
>  vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
>  		      tree_code widened_code, bool shift_p,
>  		      unsigned int max_nops,
> -		      vect_unpromoted_value *unprom, tree *common_type)
> +		      vect_unpromoted_value *unprom, tree *common_type,
> +		      bool allow_short_sign_mismatch = false)
>  {
>    /* Check for an integer operation with the right code.  */
>    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
> @@ -600,7 +622,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
>  		= vinfo->lookup_def (this_unprom->op);
>  	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
>  					   widened_code, shift_p, max_nops,
> -					   this_unprom, common_type);
> +					   this_unprom, common_type,
> +					   allow_short_sign_mismatch);
>  	      if (nops == 0)
>  		return 0;
>  
> @@ -617,7 +640,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
>  	      if (i == 0)
>  		*common_type = this_unprom->type;
>  	      else if (!vect_joust_widened_type (type, this_unprom->type,
> -						 common_type))
> +						 common_type,
> +						 allow_short_sign_mismatch))
>  		return 0;
>  	    }
>  	}
> @@ -888,21 +912,24 @@ vect_reassociating_reduction_p (vec_info *vinfo,
>  
>     Try to find the following pattern:
>  
> -     type x_t, y_t;
> +     type1a x_t
> +     type1b y_t;
>       TYPE1 prod;
>       TYPE2 sum = init;
>     loop:
>       sum_0 = phi <init, sum_1>
>       S1  x_t = ...
>       S2  y_t = ...
> -     S3  x_T = (TYPE1) x_t;
> -     S4  y_T = (TYPE1) y_t;
> +     S3  x_T = (TYPE3) x_t;
> +     S4  y_T = (TYPE4) y_t;
>       S5  prod = x_T * y_T;
>       [S6  prod = (TYPE2) prod;  #optional]
>       S7  sum_1 = prod + sum_0;
>  
> -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
> -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> +   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
> +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
> +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
> +   bigger and must be the same sign. This is a special case of a reduction
>     computation.
>  
>     Input:
> @@ -939,15 +966,16 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>  
>    /* Look for the following pattern
>            DX = (TYPE1) X;
> -          DY = (TYPE1) Y;
> +	  DY = (TYPE2) Y;
>            DPROD = DX * DY;
> -          DDPROD = (TYPE2) DPROD;
> +	  DDPROD = (TYPE3) DPROD;
>            sum_1 = DDPROD + sum_0;
>       In which
>       - DX is double the size of X
>       - DY is double the size of Y
>       - DX, DY, DPROD all have the same type but the sign
> -       between DX, DY and DPROD can differ.
> +       between DX, DY and DPROD can differ. The sign of DPROD
> +       is one of the signs of DX or DY.
>       - sum is the same size of DPROD or bigger
>       - sum has been recognized as a reduction variable.
>  
> @@ -986,14 +1014,41 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>       inside the loop (in case we are analyzing an outer-loop).  */
>    vect_unpromoted_value unprom0[2];
>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
> -			     false, 2, unprom0, &half_type))
> +			     false, 2, unprom0, &half_type, true))
>      return NULL;
>  
> +  /* Check to see if there is a sign change happening in the operands of the
> +     multiplication and pick the appropriate optab subtype.  */
> +  enum optab_subtype subtype;
> +  tree rhs_type1 = unprom0[0].type;
> +  tree rhs_type2 = unprom0[1].type;
> +  if (TYPE_SIGN (rhs_type1) == TYPE_SIGN (rhs_type2))
> +     subtype = optab_default;
> +  else if (TYPE_SIGN (rhs_type1) == SIGNED
> +	   && TYPE_SIGN (rhs_type2) == UNSIGNED)
> +     subtype = optab_signed_to_unsigned;
> +  else if (TYPE_SIGN (rhs_type1) == UNSIGNED
> +	   && TYPE_SIGN (rhs_type2) == SIGNED)
> +     subtype = optab_unsigned_to_signed;
> +  else
> +    gcc_unreachable ();
> +
> +  /* If we have a sign changing dot product we need to check that the
> +     promoted type if unsigned has at least the same precision as the final
> +     type of the dot-product.  */
> +  if (subtype != optab_default)
> +    {
> +      tree mult_type = TREE_TYPE (unprom_mult.op);
> +      if (TYPE_SIGN (mult_type) == UNSIGNED
> +	  && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> +	return NULL;
> +    }
> +
>    vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
>  
>    tree half_vectype;
>    if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
> -					type_out, &half_vectype))
> +					type_out, &half_vectype, subtype))
>      return NULL;
>  
>    /* Get the inputs in the appropriate types.  */
> @@ -1002,8 +1057,22 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>  		       unprom0, half_vectype);
>  
>    var = vect_recog_temp_ssa_var (type, NULL);
> +
> +  /* If we have a sign changing dot-product the dot-product itself does any
> +     sign conversions, so consume the type and use the unpromoted types.  */
> +  tree mult_arg1, mult_arg2;
> +  if (subtype == optab_default)
> +    {
> +      mult_arg1 = mult_oprnd[0];
> +      mult_arg2 = mult_oprnd[1];
> +    }
> +  else
> +    {
> +      mult_arg1 = unprom0[0].op;
> +      mult_arg2 = unprom0[1].op;
> +    }
>    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> -				      mult_oprnd[0], mult_oprnd[1], oprnd1);
> +				      mult_arg1, mult_arg2, oprnd1);
>  
>    return pattern_stmt;
>  }
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

* RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
  2021-05-07 11:45 ` [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes Richard Biener
@ 2021-05-07 12:42   ` Tamar Christina
  2021-05-10 11:39     ` Richard Biener
  0 siblings, 1 reply; 35+ messages in thread
From: Tamar Christina @ 2021-05-07 12:42 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd

Hi Richi,

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Friday, May 7, 2021 12:46 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant changes.
> 
> On Wed, 5 May 2021, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This patch adds support for a dot product where the sign of the
> > multiplication arguments differ. i.e. one is signed and one is
> > unsigned but the precisions are the same.
> >
> > #define N 480
> > #define SIGNEDNESS_1 unsigned
> > #define SIGNEDNESS_2 signed
> > #define SIGNEDNESS_3 signed
> > #define SIGNEDNESS_4 unsigned
> >
> > SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res,
> > SIGNEDNESS_3 char *restrict a,
> >    SIGNEDNESS_4 char *restrict b)
> > {
> >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> >     {
> >       int av = a[i];
> >       int bv = b[i];
> >       SIGNEDNESS_2 short mult = av * bv;
> >       res += mult;
> >     }
> >   return res;
> > }
> >
> > The operations are performed as if the operands were extended to a 32-bit
> value.
> > As such this operation isn't valid if there is an intermediate
> > conversion to an unsigned value. i.e.  if SIGNEDNESS_2 is unsigned.
> >
> > more over if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are flipped
> > the same optab is used but the operands are flipped in the optab
> expansion.
> >
> > To support this the patch extends the dot-product detection to
> > optionally ignore operands with different signs and stores this
> > information in the optab subtype which is now made a bitfield.
> >
> > The subtype can now additionally controls which optab an EXPR can expand
> to.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* optabs.def (usdot_prod_optab): New.
> > 	* doc/md.texi: Document it.
> > 	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> > 	* optabs-tree.h (enum optab_subtype): Likewise.
> > 	* optabs.c (expand_widen_pattern_expr): Likewise.
> > 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> > 	(vectorizable_reduction): Query dot-product kind.
> > 	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take
> optional
> > 	optab subtype.
> > 	(vect_joust_widened_type, vect_widened_op_tree): Optionally
> ignore
> > 	mismatch types.
> > 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> >
> d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fd
> f2
> > e66bc80d7d23 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -5440,11 +5440,13 @@ Like @samp{fold_left_plus_@var{m}}, but
> takes
> > an additional mask operand  @item @samp{sdot_prod@var{m}}  @cindex
> > @code{udot_prod@var{m}} instruction pattern  @itemx
> > @samp{udot_prod@var{m}}
> > +@cindex @code{usdot_prod@var{m}} instruction pattern @itemx
> > +@samp{usdot_prod@var{m}}
> >  Compute the sum of the products of two signed/unsigned elements.
> > -Operand 1 and operand 2 are of the same mode. Their product, which is
> > of a -wider mode, is computed and added to operand 3. Operand 3 is of
> > a mode equal or -wider than the mode of the product. The result is
> > placed in operand 0, which -is of the same mode as operand 3.
> > +Operand 1 and operand 2 are of the same mode but may differ in signs.
> > +Their product, which is of a wider mode, is computed and added to
> operand 3.
> > +Operand 3 is of a mode equal or wider than the mode of the product.
> > +The result is placed in operand 0, which is of the same mode as operand 3.
> 
> This doesn't really say what the 's', 'u' and 'us' specify.  Since we're doing a
> widen multiplication and then a non-widening addition we only need to
> know the effective sign of the multiplication so I think the existing 's' and 'u'
> are enough to cover all cases?

The existing 's' and 'u' enforce that both operands of the multiplication are of the
same sign.  So for 'u', for example, both operands must be unsigned.

In the `us` case one operand can be signed and the other unsigned. Operationally the signed
value is sign extended to the wider type and the unsigned value is zero extended; the
sign-extended value is then converted to unsigned so that an unsigned multiplication is
performed, conforming to the C promotion rules.

TL;DR: Without a new optab I can't tell during expansion which semantics the operation
had at the gimple/C level, as modes don't carry signs.

Long version:

The problem with using the existing patterns is that they enforce that `av` and `bv` have
the same sign, so we can't remove the explicit sign extensions, yet the multiplication must be done
on the sign/zero extended char inputs in the same sign.

Which means (unless I am mistaken) that to get the correct result you can use neither `udot` nor `sdot`, as
semantically these would zero or sign extend both operands from char to int to perform the multiplication
in the same sign.  Whereas in this case, one parameter is zero extended and the other is sign extended, and the result
is always an unsigned number.

So basically

udot<unsigned c, unsigned a, unsigned b> ==
   c = zero-ext (a) * zero-ext (b)
sdot<signed c, signed a, signed b> ==
   c = sign-ext (a) * sign-ext (b)
usdot<unsigned c, unsigned a, signed b> ==
   c = ((unsigned-conv) sign-ext (b)) * zero-ext (a)

So semantically the existing optabs won't fit here. udot would internally promote to unsigned types before
the multiplication so the result of the multiplication would be wrong.  sdot would promote both to signed
and do signed multiplication, so the result is also wrong.
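
The three semantics above can be sketched as scalar reference loops. This is only an illustration under stated assumptions: the function names `ref_udot`, `ref_sdot` and `ref_usdot` are hypothetical and are not part of GCC or of the patch, the element type is fixed at 8 bits, and the accumulator at 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference for udot: both inputs are zero extended.  */
uint32_t
ref_udot (uint32_t acc, const uint8_t *a, const uint8_t *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    acc += (uint32_t) a[i] * (uint32_t) b[i];
  return acc;
}

/* Hypothetical reference for sdot: both inputs are sign extended.  */
int32_t
ref_sdot (int32_t acc, const int8_t *a, const int8_t *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    acc += (int32_t) a[i] * (int32_t) b[i];
  return acc;
}

/* Hypothetical reference for usdot: A is zero extended, B is sign
   extended, and the product is accumulated modulo 2^32.  */
uint32_t
ref_usdot (uint32_t acc, const uint8_t *a, const int8_t *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      /* The product of an 8-bit unsigned and an 8-bit signed value
	 always fits in a signed 32-bit integer.  */
      int32_t prod = (int32_t) a[i] * (int32_t) b[i];
      acc += (uint32_t) prod;
    }
  return acc;
}
```

With the single byte pattern 0xff in both inputs, the three loops give three different results (65025, 1 and 0xffffff01 respectively), which is why neither of the same-sign forms can stand in for the mixed-sign one.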

Now if I relax the constraint on the signs of udot and sdot there are two problems:
RTL Modes don't contain signs.  So a target can't tell me how the operands will be promoted.
So:

1) I can't really check which semantics the target will adhere to on expansion.
2) at expand time I have no way to differentiate between the two instruction variants; given just modes
     I can't tell whether I expand to the normal dot-product or the new instruction.
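
As a small illustration of why the mode alone is not enough, the sketch below widens the same 8-bit pattern in both ways. The helper name `widen8` is hypothetical and exists only for this example; it models nothing beyond "8 bits in, 32 bits out", which is roughly all a mode describes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: widen an 8-bit pattern to 32 bits.  The bits
   carry no sign, so the caller has to say which extension it wants.  */
int32_t
widen8 (uint8_t bits, bool as_signed)
{
  if (as_signed && bits >= 0x80)
    /* Sign extension: patterns with the top bit set are negative.  */
    return (int32_t) bits - 256;
  /* Zero extension, or a sign extension of a non-negative value.  */
  return (int32_t) bits;
}
```

The pattern 0x80 widens to 128 or to -128 depending only on the flag, so something other than the mode has to record which extension each operand needs.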

Regards,
Tamar

> 
> The tree.def docs say the sum is also possibly widening but I don't see this
> covered by the optab so we should eventually remove this feature from the
> tree side.  In fact the tree-cfg.c verifier requires the addition to be not
> widening - thus only tree.def needs adjustment.
> 
> >  @cindex @code{ssad@var{m}} instruction pattern  @item
> > @samp{ssad@var{m}} diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> > index
> >
> c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f
> 19
> > 90e0548ba08d 100644
> > --- a/gcc/optabs-tree.h
> > +++ b/gcc/optabs-tree.h
> > @@ -27,11 +27,29 @@ along with GCC; see the file COPYING3.  If not see
> >     shift amount vs. machines that take a vector for the shift amount.
> > */  enum optab_subtype  {
> > -  optab_default,
> > -  optab_scalar,
> > -  optab_vector
> > +  optab_default = 1 << 0,
> > +  optab_scalar = 1 << 1,
> > +  optab_vector = 1 << 2,
> > +  optab_signed_to_unsigned = 1 << 3,
> > +  optab_unsigned_to_signed = 1 << 4
> >  };
> >
> > +/* Override the OrEqual-operator so we can use optab_subtype as a bit
> > +flag.  */ inline enum optab_subtype& operator |= (enum
> optab_subtype&
> > +a, enum optab_subtype b) {
> > +    return a = static_cast<optab_subtype>(static_cast<int>(a)
> > +					  | static_cast<int>(b));
> > +}
> > +
> > +/* Override the Or-operator so we can use optab_subtype as a bit
> > +flag.  */ inline enum optab_subtype operator | (enum optab_subtype a,
> > +enum optab_subtype b) {
> > +    return static_cast<optab_subtype>(static_cast<int>(a)
> > +				      | static_cast<int>(b));
> > +}
> > +
> >  /* Return the optab used for computing the given operation on the type
> given by
> >     the second argument.  The third argument distinguishes between the
> types of
> >     vector shifts and rotates.  */
> > diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c index
> >
> 95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea
> 1e
> > 5c22b7453072 100644
> > --- a/gcc/optabs-tree.c
> > +++ b/gcc/optabs-tree.c
> > @@ -127,7 +127,17 @@ optab_for_tree_code (enum tree_code code,
> const_tree type,
> >        return TYPE_UNSIGNED (type) ? usum_widen_optab :
> > ssum_widen_optab;
> >
> >      case DOT_PROD_EXPR:
> > -      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
> > +      {
> > +	gcc_assert (subtype & optab_default
> > +		    || subtype & optab_vector
> > +		    || subtype & optab_signed_to_unsigned
> > +		    || subtype & optab_unsigned_to_signed);
> > +
> > +	if (subtype & (optab_unsigned_to_signed |
> optab_signed_to_unsigned))
> > +	  return usdot_prod_optab;
> > +
> > +	return (TYPE_UNSIGNED (type) ? udot_prod_optab :
> sdot_prod_optab);
> > +      }
> >
> >      case SAD_EXPR:
> >        return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab; diff
> > --git a/gcc/optabs.c b/gcc/optabs.c index
> >
> f4614a394587787293dc8b680a38901f7906f61c..2e18b76de1412eab71971753ac
> 67
> > 8597c0d00098 100644
> > --- a/gcc/optabs.c
> > +++ b/gcc/optabs.c
> > @@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
> rtx op1, rtx wide_op,
> >    bool sbool = false;
> >
> >    oprnd0 = ops->op0;
> > +  if (nops >= 2)
> > +    oprnd1 = ops->op1;
> > +  if (nops >= 3)
> > +    oprnd2 = ops->op2;
> > +
> >    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
> >    if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
> >        || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR) @@ -285,6
> +290,27
> > @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx
> wide_op,
> >  	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
> >        sbool = true;
> >      }
> > +  else if (ops->code == DOT_PROD_EXPR)
> > +    {
> > +      enum optab_subtype subtype = optab_default;
> > +      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
> > +      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
> > +      if (sign1 == sign2)
> > +	;
> > +      else if (sign1 == SIGNED && sign2 == UNSIGNED)
> > +	{
> > +	  subtype |= optab_signed_to_unsigned;
> > +	  /* Same as optab_unsigned_to_signed but flip the operands.  */
> > +	  std::swap (op0, op1);
> > +	}
> > +      else if (sign1 == UNSIGNED && sign2 == SIGNED)
> > +	subtype |= optab_unsigned_to_signed;
> > +      else
> > +	gcc_unreachable ();
> > +
> > +      widen_pattern_optab
> > +	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> > +    }
> >    else
> >      widen_pattern_optab
> >        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0),
> > optab_default); @@ -298,10 +324,7 @@ expand_widen_pattern_expr
> (sepops ops, rtx op0, rtx op1, rtx wide_op,
> >    gcc_assert (icode != CODE_FOR_nothing);
> >
> >    if (nops >= 2)
> > -    {
> > -      oprnd1 = ops->op1;
> > -      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > -    }
> > +    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> >    else if (sbool)
> >      {
> >        nops = 2;
> > @@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
> rtx op1, rtx wide_op,
> >      {
> >        gcc_assert (tmode1 == tmode0);
> >        gcc_assert (op1);
> > -      oprnd2 = ops->op2;
> >        wmode = TYPE_MODE (TREE_TYPE (oprnd2));
> >      }
> >
> > diff --git a/gcc/optabs.def b/gcc/optabs.def index
> >
> b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbd
> b7c
> > 18615baae928 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
> OPTAB_D
> > (sdot_prod_optab, "sdot_prod$I$a")  OPTAB_D (ssum_widen_optab,
> > "widen_ssum$I$a3")  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> > +OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> >  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")  OPTAB_D
> (usad_optab,
> > "usad$I$a")  OPTAB_D (ssad_optab, "ssad$I$a") diff --git
> > a/gcc/tree-cfg.c b/gcc/tree-cfg.c index
> >
> 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..58b55bb648ad97d514f1fa18bb
> 00
> > 808fd2678b42 100644
> > --- a/gcc/tree-cfg.c
> > +++ b/gcc/tree-cfg.c
> > @@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign *stmt)
> >  		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
> >  		 || (!INTEGRAL_TYPE_P (lhs_type)
> >  		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> > -	    || !types_compatible_p (rhs1_type, rhs2_type)
> > +	    || (!types_compatible_p (rhs1_type, rhs2_type)
> > +		&& TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type))
> 
> That's not restrictive enough.  I suggest you use
> 
>             && element_precision (rhs1_type) != element_precision
> (rhs2_type)
> 
> instead.
> 
> As said, I'm not sure all the changes in this patch are required.
> 
> Please elaborate.
> 
> Thanks,
> Richard.
> 
> >  	    || !useless_type_conversion_p (lhs_type, rhs3_type)
> >  	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
> >  			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
> diff --git
> > a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index
> >
> 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..cb8f5fbb6abca181c4171194d1
> 9f
> > ec29ec6e4176 100644
> > --- a/gcc/tree-vect-loop.c
> > +++ b/gcc/tree-vect-loop.c
> > @@ -6401,6 +6401,33 @@ build_vect_cond_expr (enum tree_code code,
> tree vop[3], tree mask,
> >      }
> >  }
> >
> > +/* Determine the optab_subtype to use for the given CODE and STMT.
> For
> > +   most CODE this will be optab_vector, however for certain operations
> such as
> > +   DOT_PROD_EXPR where the operation can different signs for the
> operands we
> > +   need to be able to pick the right optabs.  */
> > +
> > +static enum optab_subtype
> > +vect_determine_dot_kind (tree_code code, stmt_vec_info stmt_vinfo) {
> > +  enum optab_subtype subtype = optab_vector;
> > +  switch (code)
> > +    {
> > +      case DOT_PROD_EXPR:
> > +	{
> > +	  gassign *stmt = as_a <gassign *> (STMT_VINFO_STMT (stmt_vinfo));
> > +	  signop rhs1_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1
> (stmt)));
> > +	  signop rhs2_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2
> (stmt)));
> > +	  if (rhs1_sign != rhs2_sign)
> > +	    subtype |= optab_unsigned_to_signed;
> > +	  break;
> > +	}
> > +      default:
> > +	break;
> > +    }
> > +
> > +  return subtype;
> > +}
> > +
> >  /* Function vectorizable_reduction.
> >
> >     Check if STMT_INFO performs a reduction operation that can be
> vectorized.
> > @@ -7189,7 +7216,8 @@ vectorizable_reduction (loop_vec_info
> loop_vinfo,
> >        bool ok = true;
> >
> >        /* 4.1. check support for the operation in the loop  */
> > -      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
> > +      enum optab_subtype subtype = vect_determine_dot_kind (code,
> stmt_info);
> > +      optab optab = optab_for_tree_code (code, vectype_in, subtype);
> >        if (!optab)
> >  	{
> >  	  if (dump_enabled_p ())
> > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c index
> >
> 441d6cd28c4eaded7abd756164890dbcffd2f3b8..943c001fb13777b4d1513841f
> a84
> > 942316846d5e 100644
> > --- a/gcc/tree-vect-patterns.c
> > +++ b/gcc/tree-vect-patterns.c
> > @@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree
> > var)  static bool  vect_supportable_direct_optab_p (vec_info *vinfo,
> > tree otype, tree_code code,
> >  				 tree itype, tree *vecotype_out,
> > -				 tree *vecitype_out = NULL)
> > +				 tree *vecitype_out = NULL,
> > +				 enum optab_subtype subtype =
> optab_default)
> >  {
> >    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
> >    if (!vecitype)
> > @@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo,
> tree otype, tree_code code,
> >    if (!vecotype)
> >      return false;
> >
> > -  optab optab = optab_for_tree_code (code, vecitype, optab_default);
> > +  optab optab = optab_for_tree_code (code, vecitype, subtype);
> >    if (!optab)
> >      return false;
> >
> > @@ -487,14 +488,31 @@ vect_joust_widened_integer (tree type, bool
> > shift_p, tree op,  }
> >
> >  /* Return true if the common supertype of NEW_TYPE and
> *COMMON_TYPE
> > -   is narrower than type, storing the supertype in *COMMON_TYPE if so.
> */
> > +   is narrower than type, storing the supertype in *COMMON_TYPE if so.
> > +   If ALLOW_SHORT_SIGN_MISMATCH then accept that *COMMON_TYPE
> and NEW_TYPE
> > +   may be of different signs but equal precision.   */
> >
> >  static bool
> > -vect_joust_widened_type (tree type, tree new_type, tree
> *common_type)
> > +vect_joust_widened_type (tree type, tree new_type, tree
> *common_type,
> > +			 bool allow_short_sign_mismatch = false)
> >  {
> >    if (types_compatible_p (*common_type, new_type))
> >      return true;
> >
> > +  /* Check if the mismatch is only in the sign and if we have
> > +     allow_short_sign_mismatch then allow it.  */
> > +  if (allow_short_sign_mismatch
> > +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> > +    {
> > +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> > +      tree eq_type
> > +	= build_nonstandard_integer_type (TYPE_PRECISION (new_type),
> > +					  sign);
> > +
> > +      if (types_compatible_p (*common_type, eq_type))
> > +	return true;
> > +    }
> > +
> >    /* See if *COMMON_TYPE can hold all values of NEW_TYPE.  */
> >    if ((TYPE_PRECISION (new_type) < TYPE_PRECISION (*common_type))
> >        && (TYPE_UNSIGNED (new_type) || !TYPE_UNSIGNED
> (*common_type)))
> > @@ -532,6 +550,9 @@ vect_joust_widened_type (tree type, tree
> new_type, tree *common_type)
> >     to a type that (a) is narrower than the result of STMT_INFO and
> >     (b) can hold all leaf operand values.
> >
> > +   If ALLOW_SHORT_SIGN_MISMATCH then allow that the signs of the
> operands
> > +   may differ in signs but not in precision.
> > +
> >     Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
> >     exists.  */
> >
> > @@ -539,7 +560,8 @@ static unsigned int  vect_widened_op_tree
> > (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> >  		      tree_code widened_code, bool shift_p,
> >  		      unsigned int max_nops,
> > -		      vect_unpromoted_value *unprom, tree *common_type)
> > +		      vect_unpromoted_value *unprom, tree *common_type,
> > +		      bool allow_short_sign_mismatch = false)
> >  {
> >    /* Check for an integer operation with the right code.  */
> >    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt); @@ -600,7
> > +622,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info
> stmt_info, tree_code code,
> >  		= vinfo->lookup_def (this_unprom->op);
> >  	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
> >  					   widened_code, shift_p, max_nops,
> > -					   this_unprom, common_type);
> > +					   this_unprom, common_type,
> > +					   allow_short_sign_mismatch);
> >  	      if (nops == 0)
> >  		return 0;
> >
> > @@ -617,7 +640,8 @@ vect_widened_op_tree (vec_info *vinfo,
> stmt_vec_info stmt_info, tree_code code,
> >  	      if (i == 0)
> >  		*common_type = this_unprom->type;
> >  	      else if (!vect_joust_widened_type (type, this_unprom->type,
> > -						 common_type))
> > +						 common_type,
> > +						 allow_short_sign_mismatch))
> >  		return 0;
> >  	    }
> >  	}
> > @@ -888,21 +912,24 @@ vect_reassociating_reduction_p (vec_info *vinfo,
> >
> >     Try to find the following pattern:
> >
> > -     type x_t, y_t;
> > +     type1a x_t
> > +     type1b y_t;
> >       TYPE1 prod;
> >       TYPE2 sum = init;
> >     loop:
> >       sum_0 = phi <init, sum_1>
> >       S1  x_t = ...
> >       S2  y_t = ...
> > -     S3  x_T = (TYPE1) x_t;
> > -     S4  y_T = (TYPE1) y_t;
> > +     S3  x_T = (TYPE3) x_t;
> > +     S4  y_T = (TYPE4) y_t;
> >       S5  prod = x_T * y_T;
> >       [S6  prod = (TYPE2) prod;  #optional]
> >       S7  sum_1 = prod + sum_0;
> >
> > -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
> > -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> > +   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
> > +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
> > +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
> > +   bigger and must be the same sign. This is a special case of a
> > + reduction
> >     computation.
> >
> >     Input:
> > @@ -939,15 +966,16 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> >
> >    /* Look for the following pattern
> >            DX = (TYPE1) X;
> > -          DY = (TYPE1) Y;
> > +	  DY = (TYPE2) Y;
> >            DPROD = DX * DY;
> > -          DDPROD = (TYPE2) DPROD;
> > +	  DDPROD = (TYPE3) DPROD;
> >            sum_1 = DDPROD + sum_0;
> >       In which
> >       - DX is double the size of X
> >       - DY is double the size of Y
> >       - DX, DY, DPROD all have the same type but the sign
> > -       between DX, DY and DPROD can differ.
> > +       between DX, DY and DPROD can differ. The sign of DPROD
> > +       is one of the signs of DX or DY.
> >       - sum is the same size of DPROD or bigger
> >       - sum has been recognized as a reduction variable.
> >
> > @@ -986,14 +1014,41 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> >       inside the loop (in case we are analyzing an outer-loop).  */
> >    vect_unpromoted_value unprom0[2];
> >    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
> WIDEN_MULT_EXPR,
> > -			     false, 2, unprom0, &half_type))
> > +			     false, 2, unprom0, &half_type, true))
> >      return NULL;
> >
> > +  /* Check to see if there is a sign change happening in the operands of
> the
> > +     multiplication and pick the appropriate optab subtype.  */
> > +  enum optab_subtype subtype;
> > +  tree rhs_type1 = unprom0[0].type;
> > +  tree rhs_type2 = unprom0[1].type;
> > +  if (TYPE_SIGN (rhs_type1) == TYPE_SIGN (rhs_type2))
> > +     subtype = optab_default;
> > +  else if (TYPE_SIGN (rhs_type1) == SIGNED
> > +	   && TYPE_SIGN (rhs_type2) == UNSIGNED)
> > +     subtype = optab_signed_to_unsigned;
> > +  else if (TYPE_SIGN (rhs_type1) == UNSIGNED
> > +	   && TYPE_SIGN (rhs_type2) == SIGNED)
> > +     subtype = optab_unsigned_to_signed;
> > +  else
> > +    gcc_unreachable ();
> > +
> > +  /* If we have a sign changing dot product we need to check that the
> > +     promoted type if unsigned has at least the same precision as the final
> > +     type of the dot-product.  */
> > +  if (subtype != optab_default)
> > +    {
> > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > +	  && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > +	return NULL;
> > +    }
> > +
> >    vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
> >
> >    tree half_vectype;
> >    if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR,
> half_type,
> > -					type_out, &half_vectype))
> > +					type_out, &half_vectype, subtype))
> >      return NULL;
> >
> >    /* Get the inputs in the appropriate types.  */ @@ -1002,8 +1057,22
> > @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> >  		       unprom0, half_vectype);
> >
> >    var = vect_recog_temp_ssa_var (type, NULL);
> > +
> > +  /* If we have a sign changing dot-product the dot-product itself does any
> > +     sign conversions, so consume the type and use the unpromoted
> > + types.  */  tree mult_arg1, mult_arg2;  if (subtype ==
> > + optab_default)
> > +    {
> > +      mult_arg1 = mult_oprnd[0];
> > +      mult_arg2 = mult_oprnd[1];
> > +    }
> > +  else
> > +    {
> > +      mult_arg1 = unprom0[0].op;
> > +      mult_arg2 = unprom0[1].op;
> > +    }
> >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > -				      mult_oprnd[0], mult_oprnd[1], oprnd1);
> > +				      mult_arg1, mult_arg2, oprnd1);
> >
> >    return pattern_stmt;
> >  }
> >
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


* RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
  2021-05-07 12:42   ` Tamar Christina
@ 2021-05-10 11:39     ` Richard Biener
  2021-05-10 12:58       ` Tamar Christina
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Biener @ 2021-05-10 11:39 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd

On Fri, 7 May 2021, Tamar Christina wrote:

> Hi Richi,
> 
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Friday, May 7, 2021 12:46 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > where the sign for the multiplicant changes.
> > 
> > On Wed, 5 May 2021, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This patch adds support for a dot product where the sign of the
> > > multiplication arguments differ. i.e. one is signed and one is
> > > unsigned but the precisions are the same.
> > >
> > > #define N 480
> > > #define SIGNEDNESS_1 unsigned
> > > #define SIGNEDNESS_2 signed
> > > #define SIGNEDNESS_3 signed
> > > #define SIGNEDNESS_4 unsigned
> > >
> > > SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res,
> > > SIGNEDNESS_3 char *restrict a,
> > >    SIGNEDNESS_4 char *restrict b)
> > > {
> > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > >     {
> > >       int av = a[i];
> > >       int bv = b[i];
> > >       SIGNEDNESS_2 short mult = av * bv;
> > >       res += mult;
> > >     }
> > >   return res;
> > > }
> > >
> > > The operations are performed as if the operands were extended to a 32-bit
> > > value.
> > > As such this operation isn't valid if there is an intermediate
> > > conversion to an unsigned value. i.e.  if SIGNEDNESS_2 is unsigned.
> > >
> > > Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are flipped,
> > > the same optab is used but the operands are flipped in the optab
> > > expansion.
> > >
> > > To support this the patch extends the dot-product detection to
> > > optionally ignore operands with different signs and stores this
> > > information in the optab subtype which is now made a bitfield.
> > >
> > > The subtype can now additionally control which optab an EXPR can expand
> > > to.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* optabs.def (usdot_prod_optab): New.
> > > 	* doc/md.texi: Document it.
> > > 	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> > > 	* optabs-tree.h (enum optab_subtype): Likewise.
> > > 	* optabs.c (expand_widen_pattern_expr): Likewise.
> > > 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > > 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> > > 	(vectorizable_reduction): Query dot-product kind.
> > > 	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take
> > optional
> > > 	optab subtype.
> > > 	(vect_joust_widened_type, vect_widened_op_tree): Optionally
> > ignore
> > > 	mismatch types.
> > > 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> > >
> > d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fd
> > f2
> > > e66bc80d7d23 100644
> > > --- a/gcc/doc/md.texi
> > > +++ b/gcc/doc/md.texi
> > > @@ -5440,11 +5440,13 @@ Like @samp{fold_left_plus_@var{m}}, but
> > takes
> > > an additional mask operand  @item @samp{sdot_prod@var{m}}  @cindex
> > > @code{udot_prod@var{m}} instruction pattern  @itemx
> > > @samp{udot_prod@var{m}}
> > > +@cindex @code{usdot_prod@var{m}} instruction pattern @itemx
> > > +@samp{usdot_prod@var{m}}
> > >  Compute the sum of the products of two signed/unsigned elements.
> > > -Operand 1 and operand 2 are of the same mode. Their product, which is
> > > of a -wider mode, is computed and added to operand 3. Operand 3 is of
> > > a mode equal or -wider than the mode of the product. The result is
> > > placed in operand 0, which -is of the same mode as operand 3.
> > > +Operand 1 and operand 2 are of the same mode but may differ in signs.
> > > +Their product, which is of a wider mode, is computed and added to
> > operand 3.
> > > +Operand 3 is of a mode equal or wider than the mode of the product.
> > > +The result is placed in operand 0, which is of the same mode as operand 3.
> > 
> > This doesn't really say what the 's', 'u' and 'us' specify.  Since we're doing a
> > widen multiplication and then a non-widening addition we only need to
> > know the effective sign of the multiplication so I think the existing 's' and 'u'
> > are enough to cover all cases?
> 
> The existing 's' and 'u' enforce that both operands of the multiplication are of the
> same sign.  So for 'u', for example, both operands must be unsigned.
> 
> In the `us` case one operand can be signed and the other unsigned. Operationally the signed
> value is sign extended to the wider type and the unsigned value is zero extended; the
> sign-extended value is then converted to unsigned so that an unsigned multiplication is
> performed, conforming to the C promotion rules.
> 
> TL;DR: Without a new optab I can't tell during expansion which semantics the operation
> had at the gimple/C level, as modes don't carry signs.
> 
> Long version:
> 
> The problem with using the existing patterns is that they enforce that `av` and `bv` have
> the same sign, so we can't remove the explicit sign extensions, yet the multiplication must be done
> on the sign/zero extended char inputs in the same sign.
> 
> Which means (unless I am mistaken) that to get the correct result you can use neither `udot` nor `sdot`, as
> semantically these would zero or sign extend both operands from char to int to perform the multiplication
> in the same sign.  Whereas in this case, one parameter is zero extended and the other is sign extended, and the result
> is always an unsigned number.
> 
> So basically
> 
> udot<unsigned c, unsigned a, unsigned b> ==
>    c = zero-ext (a) * zero-ext (b)
> sdot<signed c, signed a, signed b> ==
>    c = sign-ext (a) * sign-ext (b)
> usdot<unsigned c, unsigned a, signed b> ==
>    c = ((unsigned-conv) sign-ext (b)) * zero-ext (a)
> 
> So semantically the existing optabs won't fit here. udot would internally promote to unsigned types before
> the multiplication so the result of the multiplication would be wrong.  sdot would promote both to signed
> and do signed multiplication, so the result is also wrong.
> 
> Now if I relax the constraint on the signs of udot and sdot there are two problems:
> RTL Modes don't contain signs.  So a target can't tell me how the operands will be promoted.
> So:
> 
> 1) I can't really check which semantics the target will adhere to on expansion.
> 2) at expand time I have no way to differentiate between the two instruction variants; given just modes
>      I can't tell whether I expand to the normal dot-product or the new instruction.

Ah, OK.  Indeed with such a weird instruction the new variant makes
sense.  Still can you please amend the optab documentation to say
which operand is unsigned and which is signed?  Just 'may differ in signs'
is bad.

Since the multiplication is commutative I wonder why you need to handle
both signed_to_unsigned and unsigned_to_signed - we should just enforce
a canonical order (like the optab does).  I also think it's a
particularly bad fit for the bad optab_for_tree_code API - would any of
that improve when using a direct internal function here?  In
particular all the changes around optab_subtype look like they make
a bad API worse ... at least a single optab_vector_mixed_sign should
suffice here, no need to make it a flags kind.
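
To make the suggestion concrete, here is a rough sketch of enforcing a canonical operand order with one extra enumerator rather than a set of flags. It is not GCC code: the `sketch_` names are invented for illustration, the enumerator merely mirrors the `optab_vector_mixed_sign` idea mentioned above, and "unsigned operand first" is an assumed convention that the optab documentation would have to state.

```c
#include <stdbool.h>

/* Hypothetical stand-in for enum optab_subtype with one extra value.  */
enum sketch_subtype
{
  sketch_default,
  sketch_scalar,
  sketch_vector,
  sketch_vector_mixed_sign
};

/* Hypothetical stand-in for an operand and the sign of its type.  */
struct sketch_operand
{
  int id;
  bool is_unsigned;
};

/* Put the unsigned operand first when the signs differ (multiplication
   is commutative) and return the subtype to query the optab with.  */
enum sketch_subtype
sketch_canonicalize (struct sketch_operand *op0, struct sketch_operand *op1)
{
  if (op0->is_unsigned == op1->is_unsigned)
    return sketch_vector;

  if (!op0->is_unsigned)
    {
      struct sketch_operand tmp = *op0;
      *op0 = *op1;
      *op1 = tmp;
    }
  return sketch_vector_mixed_sign;
}
```

With the order fixed at one place, the expander would not need separate signed_to_unsigned and unsigned_to_signed cases, since the mixed-sign form always sees (unsigned, signed).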

+  /* If we have a sign changing dot product we need to check that the
+     promoted type if unsigned has at least the same precision as the final
+     type of the dot-product.  */
+  if (subtype != optab_default)
+    {
+      tree mult_type = TREE_TYPE (unprom_mult.op);
+      if (TYPE_SIGN (mult_type) == UNSIGNED
+         && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
+       return NULL;
+    }

I don't understand this - how do we ever arrive at a result with
less precision?  And why's this not an issue for signed multiplication?
Also...

+  /* If we have a sign changing dot-product the dot-product itself does any
+     sign conversions, so consume the type and use the unpromoted types.  */
+  tree mult_arg1, mult_arg2;
+  if (subtype == optab_default)
+    {
+      mult_arg1 = mult_oprnd[0];
+      mult_arg2 = mult_oprnd[1];
+    }
+  else
+    {
+      mult_arg1 = unprom0[0].op;
+      mult_arg2 = unprom0[1].op;
+    }
   pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
-                                     mult_oprnd[0], mult_oprnd[1], oprnd1);
+                                     mult_arg1, mult_arg2, oprnd1);

I thought DOT_PROD always performs the promotion.  Maybe
mult_oprnd and unprom0 are just misnamed here?

Richard.

> Regards,
> Tamar
> 
> > 
> > The tree.def docs say the sum is also possibly widening but I don't see this
> > covered by the optab so we should eventually remove this feature from the
> > tree side.  In fact the tree-cfg.c verifier requires the addition to be not
> > widening - thus only tree.def needs adjustment.
> > 
> > >  @cindex @code{ssad@var{m}} instruction pattern  @item
> > > @samp{ssad@var{m}} diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> > > index
> > >
> > c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f
> > 19
> > > 90e0548ba08d 100644
> > > --- a/gcc/optabs-tree.h
> > > +++ b/gcc/optabs-tree.h
> > > @@ -27,11 +27,29 @@ along with GCC; see the file COPYING3.  If not see
> > >     shift amount vs. machines that take a vector for the shift amount.
> > > */  enum optab_subtype  {
> > > -  optab_default,
> > > -  optab_scalar,
> > > -  optab_vector
> > > +  optab_default = 1 << 0,
> > > +  optab_scalar = 1 << 1,
> > > +  optab_vector = 1 << 2,
> > > +  optab_signed_to_unsigned = 1 << 3,
> > > +  optab_unsigned_to_signed = 1 << 4
> > >  };
> > >
> > > +/* Override the OrEqual-operator so we can use optab_subtype as a bit
> > > +flag.  */ inline enum optab_subtype& operator |= (enum
> > optab_subtype&
> > > +a, enum optab_subtype b) {
> > > +    return a = static_cast<optab_subtype>(static_cast<int>(a)
> > > +					  | static_cast<int>(b));
> > > +}
> > > +
> > > +/* Override the Or-operator so we can use optab_subtype as a bit
> > > +flag.  */ inline enum optab_subtype operator | (enum optab_subtype a,
> > > +enum optab_subtype b) {
> > > +    return static_cast<optab_subtype>(static_cast<int>(a)
> > > +				      | static_cast<int>(b));
> > > +}
> > > +
> > >  /* Return the optab used for computing the given operation on the type
> > given by
> > >     the second argument.  The third argument distinguishes between the
> > types of
> > >     vector shifts and rotates.  */
> > > diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c index
> > >
> > 95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea
> > 1e
> > > 5c22b7453072 100644
> > > --- a/gcc/optabs-tree.c
> > > +++ b/gcc/optabs-tree.c
> > > @@ -127,7 +127,17 @@ optab_for_tree_code (enum tree_code code,
> > const_tree type,
> > >        return TYPE_UNSIGNED (type) ? usum_widen_optab :
> > > ssum_widen_optab;
> > >
> > >      case DOT_PROD_EXPR:
> > > -      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
> > > +      {
> > > +	gcc_assert (subtype & optab_default
> > > +		    || subtype & optab_vector
> > > +		    || subtype & optab_signed_to_unsigned
> > > +		    || subtype & optab_unsigned_to_signed);
> > > +
> > > +	if (subtype & (optab_unsigned_to_signed |
> > optab_signed_to_unsigned))
> > > +	  return usdot_prod_optab;
> > > +
> > > +	return (TYPE_UNSIGNED (type) ? udot_prod_optab :
> > sdot_prod_optab);
> > > +      }
> > >
> > >      case SAD_EXPR:
> > >        return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab; diff
> > > --git a/gcc/optabs.c b/gcc/optabs.c index
> > >
> > f4614a394587787293dc8b680a38901f7906f61c..2e18b76de1412eab71971753ac
> > 67
> > > 8597c0d00098 100644
> > > --- a/gcc/optabs.c
> > > +++ b/gcc/optabs.c
> > > @@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
> > rtx op1, rtx wide_op,
> > >    bool sbool = false;
> > >
> > >    oprnd0 = ops->op0;
> > > +  if (nops >= 2)
> > > +    oprnd1 = ops->op1;
> > > +  if (nops >= 3)
> > > +    oprnd2 = ops->op2;
> > > +
> > >    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
> > >    if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
> > >        || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR) @@ -285,6
> > +290,27
> > > @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx
> > wide_op,
> > >  	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
> > >        sbool = true;
> > >      }
> > > +  else if (ops->code == DOT_PROD_EXPR)
> > > +    {
> > > +      enum optab_subtype subtype = optab_default;
> > > +      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
> > > +      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
> > > +      if (sign1 == sign2)
> > > +	;
> > > +      else if (sign1 == SIGNED && sign2 == UNSIGNED)
> > > +	{
> > > +	  subtype |= optab_signed_to_unsigned;
> > > +	  /* Same as optab_unsigned_to_signed but flip the operands.  */
> > > +	  std::swap (op0, op1);
> > > +	}
> > > +      else if (sign1 == UNSIGNED && sign2 == SIGNED)
> > > +	subtype |= optab_unsigned_to_signed;
> > > +      else
> > > +	gcc_unreachable ();
> > > +
> > > +      widen_pattern_optab
> > > +	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> > > +    }
> > >    else
> > >      widen_pattern_optab
> > >        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0),
> > > optab_default); @@ -298,10 +324,7 @@ expand_widen_pattern_expr
> > (sepops ops, rtx op0, rtx op1, rtx wide_op,
> > >    gcc_assert (icode != CODE_FOR_nothing);
> > >
> > >    if (nops >= 2)
> > > -    {
> > > -      oprnd1 = ops->op1;
> > > -      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > -    }
> > > +    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > >    else if (sbool)
> > >      {
> > >        nops = 2;
> > > @@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
> > rtx op1, rtx wide_op,
> > >      {
> > >        gcc_assert (tmode1 == tmode0);
> > >        gcc_assert (op1);
> > > -      oprnd2 = ops->op2;
> > >        wmode = TYPE_MODE (TREE_TYPE (oprnd2));
> > >      }
> > >
> > > diff --git a/gcc/optabs.def b/gcc/optabs.def index
> > >
> > b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbd
> > b7c
> > > 18615baae928 100644
> > > --- a/gcc/optabs.def
> > > +++ b/gcc/optabs.def
> > > @@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
> > OPTAB_D
> > > (sdot_prod_optab, "sdot_prod$I$a")  OPTAB_D (ssum_widen_optab,
> > > "widen_ssum$I$a3")  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> > > +OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> > >  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")  OPTAB_D
> > (usad_optab,
> > > "usad$I$a")  OPTAB_D (ssad_optab, "ssad$I$a") diff --git
> > > a/gcc/tree-cfg.c b/gcc/tree-cfg.c index
> > >
> > 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..58b55bb648ad97d514f1fa18bb
> > 00
> > > 808fd2678b42 100644
> > > --- a/gcc/tree-cfg.c
> > > +++ b/gcc/tree-cfg.c
> > > @@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign *stmt)
> > >  		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
> > >  		 || (!INTEGRAL_TYPE_P (lhs_type)
> > >  		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> > > -	    || !types_compatible_p (rhs1_type, rhs2_type)
> > > +	    || (!types_compatible_p (rhs1_type, rhs2_type)
> > > +		&& TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type))
> > 
> > That's not restrictive enough.  I suggest you use
> > 
> >             && element_precision (rhs1_type) != element_precision
> > (rhs2_type)
> > 
> > instead.
> > 
> > As said, I'm not sure all the changes in this patch are required.
> > 
> > Please elaborate.
> > 
> > Thanks,
> > Richard.
> > 
> > >  	    || !useless_type_conversion_p (lhs_type, rhs3_type)
> > >  	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
> > >  			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
> > diff --git
> > > a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index
> > >
> > 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..cb8f5fbb6abca181c4171194d1
> > 9f
> > > ec29ec6e4176 100644
> > > --- a/gcc/tree-vect-loop.c
> > > +++ b/gcc/tree-vect-loop.c
> > > @@ -6401,6 +6401,33 @@ build_vect_cond_expr (enum tree_code code,
> > tree vop[3], tree mask,
> > >      }
> > >  }
> > >
> > > +/* Determine the optab_subtype to use for the given CODE and STMT.
> > For
> > > +   most CODE this will be optab_vector, however for certain operations
> > such as
> > > +   DOT_PROD_EXPR where the operation can different signs for the
> > operands we
> > > +   need to be able to pick the right optabs.  */
> > > +
> > > +static enum optab_subtype
> > > +vect_determine_dot_kind (tree_code code, stmt_vec_info stmt_vinfo) {
> > > +  enum optab_subtype subtype = optab_vector;
> > > +  switch (code)
> > > +    {
> > > +      case DOT_PROD_EXPR:
> > > +	{
> > > +	  gassign *stmt = as_a <gassign *> (STMT_VINFO_STMT (stmt_vinfo));
> > > +	  signop rhs1_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1
> > (stmt)));
> > > +	  signop rhs2_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2
> > (stmt)));
> > > +	  if (rhs1_sign != rhs2_sign)
> > > +	    subtype |= optab_unsigned_to_signed;
> > > +	  break;
> > > +	}
> > > +      default:
> > > +	break;
> > > +    }
> > > +
> > > +  return subtype;
> > > +}
> > > +
> > >  /* Function vectorizable_reduction.
> > >
> > >     Check if STMT_INFO performs a reduction operation that can be
> > vectorized.
> > > @@ -7189,7 +7216,8 @@ vectorizable_reduction (loop_vec_info
> > loop_vinfo,
> > >        bool ok = true;
> > >
> > >        /* 4.1. check support for the operation in the loop  */
> > > -      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
> > > +      enum optab_subtype subtype = vect_determine_dot_kind (code,
> > stmt_info);
> > > +      optab optab = optab_for_tree_code (code, vectype_in, subtype);
> > >        if (!optab)
> > >  	{
> > >  	  if (dump_enabled_p ())
> > > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c index
> > >
> > 441d6cd28c4eaded7abd756164890dbcffd2f3b8..943c001fb13777b4d1513841f
> > a84
> > > 942316846d5e 100644
> > > --- a/gcc/tree-vect-patterns.c
> > > +++ b/gcc/tree-vect-patterns.c
> > > @@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree
> > > var)  static bool  vect_supportable_direct_optab_p (vec_info *vinfo,
> > > tree otype, tree_code code,
> > >  				 tree itype, tree *vecotype_out,
> > > -				 tree *vecitype_out = NULL)
> > > +				 tree *vecitype_out = NULL,
> > > +				 enum optab_subtype subtype =
> > optab_default)
> > >  {
> > >    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
> > >    if (!vecitype)
> > > @@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo,
> > tree otype, tree_code code,
> > >    if (!vecotype)
> > >      return false;
> > >
> > > -  optab optab = optab_for_tree_code (code, vecitype, optab_default);
> > > +  optab optab = optab_for_tree_code (code, vecitype, subtype);
> > >    if (!optab)
> > >      return false;
> > >
> > > @@ -487,14 +488,31 @@ vect_joust_widened_integer (tree type, bool
> > > shift_p, tree op,  }
> > >
> > >  /* Return true if the common supertype of NEW_TYPE and
> > *COMMON_TYPE
> > > -   is narrower than type, storing the supertype in *COMMON_TYPE if so.
> > */
> > > +   is narrower than type, storing the supertype in *COMMON_TYPE if so.
> > > +   If ALLOW_SHORT_SIGN_MISMATCH then accept that *COMMON_TYPE
> > and NEW_TYPE
> > > +   may be of different signs but equal precision.   */
> > >
> > >  static bool
> > > -vect_joust_widened_type (tree type, tree new_type, tree
> > *common_type)
> > > +vect_joust_widened_type (tree type, tree new_type, tree
> > *common_type,
> > > +			 bool allow_short_sign_mismatch = false)
> > >  {
> > >    if (types_compatible_p (*common_type, new_type))
> > >      return true;
> > >
> > > +  /* Check if the mismatch is only in the sign and if we have
> > > +     allow_short_sign_mismatch then allow it.  */
> > > +  if (allow_short_sign_mismatch
> > > +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> > > +    {
> > > +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> > > +      tree eq_type
> > > +	= build_nonstandard_integer_type (TYPE_PRECISION (new_type),
> > > +					  sign);
> > > +
> > > +      if (types_compatible_p (*common_type, eq_type))
> > > +	return true;
> > > +    }
> > > +
> > >    /* See if *COMMON_TYPE can hold all values of NEW_TYPE.  */
> > >    if ((TYPE_PRECISION (new_type) < TYPE_PRECISION (*common_type))
> > >        && (TYPE_UNSIGNED (new_type) || !TYPE_UNSIGNED
> > (*common_type)))
> > > @@ -532,6 +550,9 @@ vect_joust_widened_type (tree type, tree
> > new_type, tree *common_type)
> > >     to a type that (a) is narrower than the result of STMT_INFO and
> > >     (b) can hold all leaf operand values.
> > >
> > > +   If ALLOW_SHORT_SIGN_MISMATCH then allow that the signs of the
> > operands
> > > +   may differ in signs but not in precision.
> > > +
> > >     Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
> > >     exists.  */
> > >
> > > @@ -539,7 +560,8 @@ static unsigned int  vect_widened_op_tree
> > > (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> > >  		      tree_code widened_code, bool shift_p,
> > >  		      unsigned int max_nops,
> > > -		      vect_unpromoted_value *unprom, tree *common_type)
> > > +		      vect_unpromoted_value *unprom, tree *common_type,
> > > +		      bool allow_short_sign_mismatch = false)
> > >  {
> > >    /* Check for an integer operation with the right code.  */
> > >    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt); @@ -600,7
> > > +622,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info
> > stmt_info, tree_code code,
> > >  		= vinfo->lookup_def (this_unprom->op);
> > >  	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
> > >  					   widened_code, shift_p, max_nops,
> > > -					   this_unprom, common_type);
> > > +					   this_unprom, common_type,
> > > +					   allow_short_sign_mismatch);
> > >  	      if (nops == 0)
> > >  		return 0;
> > >
> > > @@ -617,7 +640,8 @@ vect_widened_op_tree (vec_info *vinfo,
> > stmt_vec_info stmt_info, tree_code code,
> > >  	      if (i == 0)
> > >  		*common_type = this_unprom->type;
> > >  	      else if (!vect_joust_widened_type (type, this_unprom->type,
> > > -						 common_type))
> > > +						 common_type,
> > > +						 allow_short_sign_mismatch))
> > >  		return 0;
> > >  	    }
> > >  	}
> > > @@ -888,21 +912,24 @@ vect_reassociating_reduction_p (vec_info *vinfo,
> > >
> > >     Try to find the following pattern:
> > >
> > > -     type x_t, y_t;
> > > +     type1a x_t
> > > +     type1b y_t;
> > >       TYPE1 prod;
> > >       TYPE2 sum = init;
> > >     loop:
> > >       sum_0 = phi <init, sum_1>
> > >       S1  x_t = ...
> > >       S2  y_t = ...
> > > -     S3  x_T = (TYPE1) x_t;
> > > -     S4  y_T = (TYPE1) y_t;
> > > +     S3  x_T = (TYPE3) x_t;
> > > +     S4  y_T = (TYPE4) y_t;
> > >       S5  prod = x_T * y_T;
> > >       [S6  prod = (TYPE2) prod;  #optional]
> > >       S7  sum_1 = prod + sum_0;
> > >
> > > -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
> > > -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> > > +   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
> > > +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
> > > +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
> > > +   bigger and must be the same sign. This is a special case of a
> > > + reduction
> > >     computation.
> > >
> > >     Input:
> > > @@ -939,15 +966,16 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> > >
> > >    /* Look for the following pattern
> > >            DX = (TYPE1) X;
> > > -          DY = (TYPE1) Y;
> > > +	  DY = (TYPE2) Y;
> > >            DPROD = DX * DY;
> > > -          DDPROD = (TYPE2) DPROD;
> > > +	  DDPROD = (TYPE3) DPROD;
> > >            sum_1 = DDPROD + sum_0;
> > >       In which
> > >       - DX is double the size of X
> > >       - DY is double the size of Y
> > >       - DX, DY, DPROD all have the same type but the sign
> > > -       between DX, DY and DPROD can differ.
> > > +       between DX, DY and DPROD can differ. The sign of DPROD
> > > +       is one of the signs of DX or DY.
> > >       - sum is the same size of DPROD or bigger
> > >       - sum has been recognized as a reduction variable.
> > >
> > > @@ -986,14 +1014,41 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> > >       inside the loop (in case we are analyzing an outer-loop).  */
> > >    vect_unpromoted_value unprom0[2];
> > >    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
> > WIDEN_MULT_EXPR,
> > > -			     false, 2, unprom0, &half_type))
> > > +			     false, 2, unprom0, &half_type, true))
> > >      return NULL;
> > >
> > > +  /* Check to see if there is a sign change happening in the operands of
> > the
> > > +     multiplication and pick the appropriate optab subtype.  */
> > > +  enum optab_subtype subtype;
> > > +  tree rhs_type1 = unprom0[0].type;
> > > +  tree rhs_type2 = unprom0[1].type;
> > > +  if (TYPE_SIGN (rhs_type1) == TYPE_SIGN (rhs_type2))
> > > +     subtype = optab_default;
> > > +  else if (TYPE_SIGN (rhs_type1) == SIGNED
> > > +	   && TYPE_SIGN (rhs_type2) == UNSIGNED)
> > > +     subtype = optab_signed_to_unsigned;
> > > +  else if (TYPE_SIGN (rhs_type1) == UNSIGNED
> > > +	   && TYPE_SIGN (rhs_type2) == SIGNED)
> > > +     subtype = optab_unsigned_to_signed;
> > > +  else
> > > +    gcc_unreachable ();
> > > +
> > > +  /* If we have a sign changing dot product we need to check that the
> > > +     promoted type if unsigned has at least the same precision as the final
> > > +     type of the dot-product.  */
> > > +  if (subtype != optab_default)
> > > +    {
> > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > +	  && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > > +	return NULL;
> > > +    }
> > > +
> > >    vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
> > >
> > >    tree half_vectype;
> > >    if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR,
> > half_type,
> > > -					type_out, &half_vectype))
> > > +					type_out, &half_vectype, subtype))
> > >      return NULL;
> > >
> > >    /* Get the inputs in the appropriate types.  */ @@ -1002,8 +1057,22
> > > @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> > >  		       unprom0, half_vectype);
> > >
> > >    var = vect_recog_temp_ssa_var (type, NULL);
> > > +
> > > +  /* If we have a sign changing dot-product the dot-product itself does any
> > > +     sign conversions, so consume the type and use the unpromoted
> > > + types.  */  tree mult_arg1, mult_arg2;  if (subtype ==
> > > + optab_default)
> > > +    {
> > > +      mult_arg1 = mult_oprnd[0];
> > > +      mult_arg2 = mult_oprnd[1];
> > > +    }
> > > +  else
> > > +    {
> > > +      mult_arg1 = unprom0[0].op;
> > > +      mult_arg2 = unprom0[1].op;
> > > +    }
> > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > -				      mult_oprnd[0], mult_oprnd[1], oprnd1);
> > > +				      mult_arg1, mult_arg2, oprnd1);
> > >
> > >    return pattern_stmt;
> > >  }
> > >
> > >
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


* RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
  2021-05-10 11:39     ` Richard Biener
@ 2021-05-10 12:58       ` Tamar Christina
  2021-05-10 13:29         ` Richard Biener
  0 siblings, 1 reply; 35+ messages in thread
From: Tamar Christina @ 2021-05-10 12:58 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd



> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Monday, May 10, 2021 12:40 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant changes.
> 
> On Fri, 7 May 2021, Tamar Christina wrote:
> 
> > Hi Richi,
> >
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Friday, May 7, 2021 12:46 PM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > > where the sign for the multiplicant changes.
> > >
> > > On Wed, 5 May 2021, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This patch adds support for a dot product where the sign of the
> > > > multiplication arguments differ. i.e. one is signed and one is
> > > > unsigned but the precisions are the same.
> > > >
> > > > #define N 480
> > > > #define SIGNEDNESS_1 unsigned
> > > > #define SIGNEDNESS_2 signed
> > > > #define SIGNEDNESS_3 signed
> > > > #define SIGNEDNESS_4 unsigned
> > > >
> > > > SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res,
> > > > SIGNEDNESS_3 char *restrict a,
> > > >    SIGNEDNESS_4 char *restrict b)
> > > > {
> > > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > >     {
> > > >       int av = a[i];
> > > >       int bv = b[i];
> > > >       SIGNEDNESS_2 short mult = av * bv;
> > > >       res += mult;
> > > >     }
> > > >   return res;
> > > > }
> > > >
> > > > The operations are performed as if the operands were extended to a
> > > > 32-bit value.
> > > > As such this operation isn't valid if there is an intermediate
> > > > conversion to an unsigned value. i.e.  if SIGNEDNESS_2 is unsigned.
> > > >
> > > > Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are
> > > > flipped the same optab is used but the operands are flipped in the
> > > > optab expansion.
> > > >
> > > > To support this the patch extends the dot-product detection to
> > > > optionally ignore operands with different signs and stores this
> > > > information in the optab subtype which is now made a bitfield.
> > > >
> > > > The subtype can now additionally control which optab an EXPR can
> > > > expand to.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 	* optabs.def (usdot_prod_optab): New.
> > > > 	* doc/md.texi: Document it.
> > > > 	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> > > > 	* optabs-tree.h (enum optab_subtype): Likewise.
> > > > 	* optabs.c (expand_widen_pattern_expr): Likewise.
> > > > 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > > > 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> > > > 	(vectorizable_reduction): Query dot-product kind.
> > > > 	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take
> > > optional
> > > > 	optab subtype.
> > > > 	(vect_joust_widened_type, vect_widened_op_tree): Optionally
> > > ignore
> > > > 	mismatch types.
> > > > 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> > > >
> > > > --- inline copy of patch --
> > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> > > >
> > >
> d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fd
> > > f2
> > > > e66bc80d7d23 100644
> > > > --- a/gcc/doc/md.texi
> > > > +++ b/gcc/doc/md.texi
> > > > @@ -5440,11 +5440,13 @@ Like @samp{fold_left_plus_@var{m}}, but
> > > takes
> > > > an additional mask operand  @item @samp{sdot_prod@var{m}}
> @cindex
> > > > @code{udot_prod@var{m}} instruction pattern  @itemx
> > > > @samp{udot_prod@var{m}}
> > > > +@cindex @code{usdot_prod@var{m}} instruction pattern @itemx
> > > > +@samp{usdot_prod@var{m}}
> > > >  Compute the sum of the products of two signed/unsigned elements.
> > > > -Operand 1 and operand 2 are of the same mode. Their product,
> > > > which is of a -wider mode, is computed and added to operand 3.
> > > > Operand 3 is of a mode equal or -wider than the mode of the
> > > > product. The result is placed in operand 0, which -is of the same mode
> as operand 3.
> > > > +Operand 1 and operand 2 are of the same mode but may differ in
> signs.
> > > > +Their product, which is of a wider mode, is computed and added to
> > > operand 3.
> > > > +Operand 3 is of a mode equal or wider than the mode of the product.
> > > > +The result is placed in operand 0, which is of the same mode as
> operand 3.
> > >
> > > This doesn't really say what the 's', 'u' and 'us' specify.  Since
> > > we're doing a widen multiplication and then a non-widening addition
> > > we only need to know the effective sign of the multiplication so I think
> > > the existing 's' and 'u' are enough to cover all cases?
> >
> > The existing 's' and 'u' enforce that both operands of the
> > multiplication are of the same sign.  So for e.g. 'u' both operands must be
> > unsigned.
> >
> > In the `us` case one can be signed and one unsigned. Operationally
> > this does a sign extension to the wider type for the signed value, and
> > the unsigned value gets zero extended first, and then converts it to
> > unsigned to perform the unsigned multiplication, conforming to the C
> > promotion rules.
> >
> > TL;DR; Without a new optab I can't tell during expansion which
> > semantic the operation had at the gimple/C level as modes don't carry signs.
> >
> > Long version:
> >
> > The problem with using the existing patterns, because of their
> > enforcement of `av` and `bv` being the same sign, is that we can't
> > remove the explicit sign extensions, but the multiplication must be done on
> > the sign/zero extended char input in the same sign.
> >
> > Which means (unless I am mistaken) that to get the correct result you
> > can use neither `udot` nor `sdot`, as semantically these would zero
> > or sign extend both operands from char to int to perform the
> > multiplication in the same sign.  Whereas in this case, one parameter is zero
> > extended and one parameter is sign extended and the result is always an
> > unsigned number.
> >
> > So basically
> >
> > udot<unsigned c, unsigned a, unsigned b> ==
> >    c = zero-ext (a) * zero-ext (b)
> > sdot<signed c, signed a, signed b> ==
> >    c = sign-ext (a) * sign-ext (b)
> > usdot<unsigned c, unsigned a, signed b> ==
> >    c = ((unsigned-conv) sign-ext (a)) * zero-ext (b)
> >
> > So semantically the existing optabs won't fit here. udot would
> > internally promote to unsigned types before the multiplication so the
> > result of the multiplication would be wrong.  sdot would promote both to
> signed and do signed multiplication, so the result is also wrong.
> >
> > Now if I relax the constraint on the signs of udot and sdot there are two
> > problems:
> > RTL Modes don't contain signs.  So a target can't tell me how the operands
> > will be promoted.
> > So:
> >
> > 1) I can't really check which semantics the target will adhere to on
> >    expansion.
> > 2) at expand time I have no way to differentiate between the two
> >    instruction variants; given just modes I can't tell whether I expand
> >    to the normal dot-product or the new instruction.
> 
> Ah, OK.  Indeed with such a weird instruction the new variant makes sense.
> Still can you please amend the optab documentation to say which operand is
> unsigned and which is signed?  Just 'may differ in signs'
> is bad.

Sure, will expand on it.

> 
> Since the multiplication is commutative I wonder why you need to handle
> both signed_to_unsigned and unsigned_to_signed - we should just enforce
> a canonical order (like the optab does). 

Sure.  I thought it would be better to change the order at expand time,
but I can do it at detection time instead.
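
As a rough scalar sketch (not GCC code; the function names below are made up
for illustration), canonicalising the order at detection time is safe because
the multiplication is commutative, so a single "unsigned first, signed second"
form can serve both source orders:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the canonical mixed-sign form: operand U is
   unsigned (zero-extended), operand S is signed (sign-extended), and the
   products are accumulated in a 32-bit value.  */
static int32_t
usdot_model (int32_t acc, const uint8_t *u, const int8_t *s, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    acc += (int32_t) u[i] * (int32_t) s[i];
  return acc;
}

/* The same loop as written in source order with the signed operand first.  */
static int32_t
sudot_source_order (int32_t acc, const int8_t *s, const uint8_t *u, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    acc += (int32_t) s[i] * (int32_t) u[i];
  return acc;
}

/* Canonicalised version: swap the operands once and reuse the one model.  */
static int32_t
sudot_canonical (int32_t acc, const int8_t *s, const uint8_t *u, size_t n)
{
  return usdot_model (acc, u, s, n);
}

/* Small self-check that includes negative signed inputs.  */
static void
check_swap (void)
{
  const uint8_t u[] = { 200, 3, 255, 0 };
  const int8_t s[] = { -5, 7, -128, 1 };
  assert (sudot_source_order (10, s, u, 4) == sudot_canonical (10, s, u, 4));
}
```

Calling check_swap () exercises both orders on the same data; the point is only
that the swap can happen once, wherever the pattern is recognised.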

> I also think it's a particular bad fit for
> the bad optab_for_tree_code API - would any of that improve when using a
> direct internal function here? 

Somewhat, but this has considerable knock-on effects.  For example, DOT_PROD is currently
treated as a widening operation and so is handled by supportable_widening_operation,
which does not support calls.  There are also a significant number of places that work on the
tree EXPR (including constant folding), all of which would need to be changed.

> In particular all the changes around
> optab_subtype look like they make a bad API worse ... at least a single
> optab_vector_mixed_sign should suffice here, no need to make it a flags
> kind.

The reason I did so is that the query currently uses different subtypes depending on
where it is done: optab_default during detection and optab_vector during vectorization.
For this instruction the difference doesn't seem to be used, but I did not want to lose
the information in case something depended on it.

But I can make it just one.
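
A minimal sketch of the single-enumerator variant, assuming one extra value
(called sketch_vector_mixed_sign here, after the suggested
optab_vector_mixed_sign) is enough to select the new optab.  The enum and
helper below are illustrative stand-ins, not the real GCC declarations:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for optab_subtype with one extra enumerator instead
   of bit flags.  The names mirror the thread but are not the GCC enum.  */
enum subtype_sketch
{
  sketch_default,
  sketch_scalar,
  sketch_vector,
  sketch_vector_mixed_sign
};

/* Return the name of the optab a DOT_PROD_EXPR query would pick in this
   sketch.  TYPE_IS_UNSIGNED stands in for TYPE_UNSIGNED (type).  */
static const char *
dot_prod_optab_name (enum subtype_sketch subtype, int type_is_unsigned)
{
  if (subtype == sketch_vector_mixed_sign)
    return "usdot_prod";
  return type_is_unsigned ? "udot_prod" : "sdot_prod";
}

/* Small self-check: only the mixed-sign subtype reaches usdot_prod.  */
static void
check_dot_prod_names (void)
{
  assert (strcmp (dot_prod_optab_name (sketch_vector_mixed_sign, 1),
		  "usdot_prod") == 0);
  assert (strcmp (dot_prod_optab_name (sketch_vector, 1), "udot_prod") == 0);
  assert (strcmp (dot_prod_optab_name (sketch_default, 0), "sdot_prod") == 0);
}
```

With the operand order made canonical, no second mixed-sign value is needed in
this sketch; the existing subtypes keep selecting udot_prod or sdot_prod as
before.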

> 
> +  /* If we have a sign changing dot product we need to check that the
> +     promoted type if unsigned has at least the same precision as the final
> +     type of the dot-product.  */
> +  if (subtype != optab_default)
> +    {
> +      tree mult_type = TREE_TYPE (unprom_mult.op);
> +      if (TYPE_SIGN (mult_type) == UNSIGNED
> +         && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> +       return NULL;
> +    }
> 
> I don't understand this - how do we ever arrive at a result with less precision?

The user could have manually truncated the result.  For example, notice `mult` in the loop being detected:

      int av = a[i];
      int bv = b[i];
      SIGNEDNESS_2 short mult = av * bv;
      res += mult;

which is a short, so it manually truncates the multiplication that the instruction performs as int.
If `mult` is unsigned, the result is truncated whenever the signed input to usdot is negative, unless the
intermediate calculation has the same precision as the instruction.  I.e. if `mult` is unsigned int then no
truncation takes place: the cast is from int to unsigned int, so the instruction is safe to use because it
does the same thing internally.
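
To make the difference concrete, here is a small scalar sketch of one loop
iteration for each choice of intermediate type.  It is only a model under the
assumption that usdot behaves like a 32-bit widening multiply-accumulate; the
function names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one usdot step: widen both inputs to 32 bits,
   multiply, and add to the unsigned accumulator modulo 2^32.  */
static uint32_t
step_usdot_model (uint32_t res, int8_t a, uint8_t b)
{
  return res + (uint32_t) ((int32_t) a * (int32_t) b);
}

/* Source loop body with a signed short intermediate.  The product of a signed
   char and an unsigned char always fits in 16 signed bits.  */
static uint32_t
step_signed_short (uint32_t res, int8_t a, uint8_t b)
{
  int av = a, bv = b;
  int16_t mult = (int16_t) (av * bv);
  return res + (uint32_t) mult;
}

/* Source loop body with an unsigned short intermediate: a negative product
   is reduced modulo 2^16 and then zero-extended, losing its sign.  */
static uint32_t
step_unsigned_short (uint32_t res, int8_t a, uint8_t b)
{
  int av = a, bv = b;
  uint16_t mult = (uint16_t) (av * bv);
  return res + mult;
}

/* Source loop body with an unsigned int intermediate: same precision as the
   model, so only the interpretation of the bits changes.  */
static uint32_t
step_unsigned_int (uint32_t res, int8_t a, uint8_t b)
{
  int av = a, bv = b;
  uint32_t mult = (uint32_t) (av * bv);
  return res + mult;
}

/* Small self-check with a negative signed input (-5 * 200 = -1000).  */
static void
check_steps (void)
{
  assert (step_signed_short (5000, -5, 200) == step_usdot_model (5000, -5, 200));
  assert (step_unsigned_int (5000, -5, 200) == step_usdot_model (5000, -5, 200));
  assert (step_unsigned_short (5000, -5, 200) != step_usdot_model (5000, -5, 200));
}
```

Only the unsigned short variant disagrees with the model, which is the case the
extra precision check is meant to reject.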

> And why's this not an issue for signed multiplication?

It is, but in that case it's handled by the type jousting, which doesn't allow the type mismatch. i.e.

#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 unsigned
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 signed

SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
   SIGNEDNESS_4 char *restrict b)
{
  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
    {
      int av = a[i];
      int bv = b[i];
      SIGNEDNESS_2 short mult = av * bv;
      res += mult;
    }
  return res;
}

is also not detected as a dot product.  The carve-out I added to the widening multiplication detection now
lets this case through, so I handle it in the detection code.  Thinking about it now, it seems more logical
to handle this case inside the type jousting code, as I don't think it's ever something you'd want.

> Also...
> 
> +  /* If we have a sign changing dot-product the dot-product itself does any
> +     sign conversions, so consume the type and use the unpromoted types.  */
> +  tree mult_arg1, mult_arg2;
> +  if (subtype == optab_default)
> +    {
> +      mult_arg1 = mult_oprnd[0];
> +      mult_arg2 = mult_oprnd[1];
> +    }
> +  else
> +    {
> +      mult_arg1 = unprom0[0].op;
> +      mult_arg2 = unprom0[1].op;
> +    }
>    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> -                                     mult_oprnd[0], mult_oprnd[1], oprnd1);
> +                                     mult_arg1, mult_arg2, oprnd1);
> 
> I thought DOT_PROD always performs the promotion.  Maybe mult_oprnd
> and unprom0 are just misnamed here?

Somewhat.  In a normal dot-product the operands of the multiplication have the same sign
as the "unpromoted" types, so after vect_convert_input the two types are the same.

However, because the sign changes here, the C code contains an extra conversion that brings
the arguments to the same sign and so preserves the C semantics.  That conversion needs to be
stripped before the operands are given to the instruction, which does the conversion internally.

Regards,
Tamar

> 
> Richard.
> 
> > Regards,
> > Tamar
> >
> > >
> > > The tree.def docs say the sum is also possibly widening but I don't
> > > see this covered by the optab so we should eventually remove this
> > > feature from the tree side.  In fact the tree-cfg.c verifier
> > > requires the addition to be not widening - thus only tree.def needs
> adjustment.
> > >
> > > >  @cindex @code{ssad@var{m}} instruction pattern  @item
> > > > @samp{ssad@var{m}} diff --git a/gcc/optabs-tree.h
> > > > b/gcc/optabs-tree.h index
> > > >
> > >
> c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f
> > > 19
> > > > 90e0548ba08d 100644
> > > > --- a/gcc/optabs-tree.h
> > > > +++ b/gcc/optabs-tree.h
> > > > @@ -27,11 +27,29 @@ along with GCC; see the file COPYING3.  If not
> see
> > > >     shift amount vs. machines that take a vector for the shift amount.
> > > > */  enum optab_subtype  {
> > > > -  optab_default,
> > > > -  optab_scalar,
> > > > -  optab_vector
> > > > +  optab_default = 1 << 0,
> > > > +  optab_scalar = 1 << 1,
> > > > +  optab_vector = 1 << 2,
> > > > +  optab_signed_to_unsigned = 1 << 3,  optab_unsigned_to_signed =
> > > > + 1 << 4
> > > >  };
> > > >
> > > > +/* Override the OrEqual-operator so we can use optab_subtype as a
> > > > +bit flag.  */ inline enum optab_subtype& operator |= (enum
> > > optab_subtype&
> > > > +a, enum optab_subtype b) {
> > > > +    return a = static_cast<optab_subtype>(static_cast<int>(a)
> > > > +					  | static_cast<int>(b));
> > > > +}
> > > > +
> > > > +/* Override the Or-operator so we can use optab_subtype as a bit
> > > > +flag.  */ inline enum optab_subtype operator | (enum
> > > > +optab_subtype a, enum optab_subtype b) {
> > > > +    return static_cast<optab_subtype>(static_cast<int>(a)
> > > > +				      | static_cast<int>(b)); }
> > > > +
> > > >  /* Return the optab used for computing the given operation on the
> > > > type
> > > given by
> > > >     the second argument.  The third argument distinguishes between
> > > > the
> > > types of
> > > >     vector shifts and rotates.  */ diff --git a/gcc/optabs-tree.c
> > > > b/gcc/optabs-tree.c index
> > > >
> > >
> 95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea
> > > 1e
> > > > 5c22b7453072 100644
> > > > --- a/gcc/optabs-tree.c
> > > > +++ b/gcc/optabs-tree.c
> > > > @@ -127,7 +127,17 @@ optab_for_tree_code (enum tree_code code,
> > > const_tree type,
> > > >        return TYPE_UNSIGNED (type) ? usum_widen_optab :
> > > > ssum_widen_optab;
> > > >
> > > >      case DOT_PROD_EXPR:
> > > > -      return TYPE_UNSIGNED (type) ? udot_prod_optab :
> sdot_prod_optab;
> > > > +      {
> > > > +	gcc_assert (subtype & optab_default
> > > > +		    || subtype & optab_vector
> > > > +		    || subtype & optab_signed_to_unsigned
> > > > +		    || subtype & optab_unsigned_to_signed);
> > > > +
> > > > +	if (subtype & (optab_unsigned_to_signed |
> > > optab_signed_to_unsigned))
> > > > +	  return usdot_prod_optab;
> > > > +
> > > > +	return (TYPE_UNSIGNED (type) ? udot_prod_optab :
> > > sdot_prod_optab);
> > > > +      }
> > > >
> > > >      case SAD_EXPR:
> > > >        return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab; diff
> > > > --git a/gcc/optabs.c b/gcc/optabs.c index
> > > >
> > >
> f4614a394587787293dc8b680a38901f7906f61c..2e18b76de1412eab71971753ac
> > > 67
> > > > 8597c0d00098 100644
> > > > --- a/gcc/optabs.c
> > > > +++ b/gcc/optabs.c
> > > > @@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx
> > > > op0,
> > > rtx op1, rtx wide_op,
> > > >    bool sbool = false;
> > > >
> > > >    oprnd0 = ops->op0;
> > > > +  if (nops >= 2)
> > > > +    oprnd1 = ops->op1;
> > > > +  if (nops >= 3)
> > > > +    oprnd2 = ops->op2;
> > > > +
> > > >    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
> > > >    if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
> > > >        || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR) @@ -285,6
> > > +290,27
> > > > @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx
> > > wide_op,
> > > >  	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
> > > >        sbool = true;
> > > >      }
> > > > +  else if (ops->code == DOT_PROD_EXPR)
> > > > +    {
> > > > +      enum optab_subtype subtype = optab_default;
> > > > +      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
> > > > +      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
> > > > +      if (sign1 == sign2)
> > > > +	;
> > > > +      else if (sign1 == SIGNED && sign2 == UNSIGNED)
> > > > +	{
> > > > +	  subtype |= optab_signed_to_unsigned;
> > > > +	  /* Same as optab_unsigned_to_signed but flip the operands.  */
> > > > +	  std::swap (op0, op1);
> > > > +	}
> > > > +      else if (sign1 == UNSIGNED && sign2 == SIGNED)
> > > > +	subtype |= optab_unsigned_to_signed;
> > > > +      else
> > > > +	gcc_unreachable ();
> > > > +
> > > > +      widen_pattern_optab
> > > > +	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> > > > +    }
> > > >    else
> > > >      widen_pattern_optab
> > > >        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0),
> > > > optab_default); @@ -298,10 +324,7 @@ expand_widen_pattern_expr
> > > (sepops ops, rtx op0, rtx op1, rtx wide_op,
> > > >    gcc_assert (icode != CODE_FOR_nothing);
> > > >
> > > >    if (nops >= 2)
> > > > -    {
> > > > -      oprnd1 = ops->op1;
> > > > -      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > -    }
> > > > +    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > >    else if (sbool)
> > > >      {
> > > >        nops = 2;
> > > > @@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx
> > > > op0,
> > > rtx op1, rtx wide_op,
> > > >      {
> > > >        gcc_assert (tmode1 == tmode0);
> > > >        gcc_assert (op1);
> > > > -      oprnd2 = ops->op2;
> > > >        wmode = TYPE_MODE (TREE_TYPE (oprnd2));
> > > >      }
> > > >
> > > > diff --git a/gcc/optabs.def b/gcc/optabs.def index
> > > >
> > >
> b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbd
> > > b7c
> > > > 18615baae928 100644
> > > > --- a/gcc/optabs.def
> > > > +++ b/gcc/optabs.def
> > > > @@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
> > > OPTAB_D
> > > > (sdot_prod_optab, "sdot_prod$I$a")  OPTAB_D (ssum_widen_optab,
> > > > "widen_ssum$I$a3")  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> > > > +OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> > > >  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")  OPTAB_D
> > > (usad_optab,
> > > > "usad$I$a")  OPTAB_D (ssad_optab, "ssad$I$a") diff --git
> > > > a/gcc/tree-cfg.c b/gcc/tree-cfg.c index
> > > >
> > >
> 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..58b55bb648ad97d514f1fa18bb
> > > 00
> > > > 808fd2678b42 100644
> > > > --- a/gcc/tree-cfg.c
> > > > +++ b/gcc/tree-cfg.c
> > > > @@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign *stmt)
> > > >  		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
> > > >  		 || (!INTEGRAL_TYPE_P (lhs_type)
> > > >  		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> > > > -	    || !types_compatible_p (rhs1_type, rhs2_type)
> > > > +	    || (!types_compatible_p (rhs1_type, rhs2_type)
> > > > +		&& TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type))
> > >
> > > That's not restrictive enough.  I suggest you use
> > >
> > >             && element_precision (rhs1_type) != element_precision
> > > (rhs2_type)
> > >
> > > instead.
> > >
> > > As said, I'm not sure all the changes in this patch are required.
> > >
> > > Please elaborate.
> > >
> > > Thanks,
> > > Richard.
> > >
> > > >  	    || !useless_type_conversion_p (lhs_type, rhs3_type)
> > > >  	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
> > > >  			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
> > > diff --git
> > > > a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index
> > > >
> > >
> 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..cb8f5fbb6abca181c4171194d1
> > > 9f
> > > > ec29ec6e4176 100644
> > > > --- a/gcc/tree-vect-loop.c
> > > > +++ b/gcc/tree-vect-loop.c
> > > > @@ -6401,6 +6401,33 @@ build_vect_cond_expr (enum tree_code
> code,
> > > tree vop[3], tree mask,
> > > >      }
> > > >  }
> > > >
> > > > +/* Determine the optab_subtype to use for the given CODE and STMT.
> > > For
> > > > +   most CODE this will be optab_vector, however for certain
> > > > + operations
> > > such as
> > > > +   DOT_PROD_EXPR where the operation can different signs for the
> > > operands we
> > > > +   need to be able to pick the right optabs.  */
> > > > +
> > > > +static enum optab_subtype
> > > > +vect_determine_dot_kind (tree_code code, stmt_vec_info
> > > > +stmt_vinfo) {
> > > > +  enum optab_subtype subtype = optab_vector;
> > > > +  switch (code)
> > > > +    {
> > > > +      case DOT_PROD_EXPR:
> > > > +	{
> > > > +	  gassign *stmt = as_a <gassign *> (STMT_VINFO_STMT (stmt_vinfo));
> > > > +	  signop rhs1_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1
> > > (stmt)));
> > > > +	  signop rhs2_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2
> > > (stmt)));
> > > > +	  if (rhs1_sign != rhs2_sign)
> > > > +	    subtype |= optab_unsigned_to_signed;
> > > > +	  break;
> > > > +	}
> > > > +      default:
> > > > +	break;
> > > > +    }
> > > > +
> > > > +  return subtype;
> > > > +}
> > > > +
> > > >  /* Function vectorizable_reduction.
> > > >
> > > >     Check if STMT_INFO performs a reduction operation that can be
> > > vectorized.
> > > > @@ -7189,7 +7216,8 @@ vectorizable_reduction (loop_vec_info
> > > loop_vinfo,
> > > >        bool ok = true;
> > > >
> > > >        /* 4.1. check support for the operation in the loop  */
> > > > -      optab optab = optab_for_tree_code (code, vectype_in,
> optab_vector);
> > > > +      enum optab_subtype subtype = vect_determine_dot_kind (code,
> > > stmt_info);
> > > > +      optab optab = optab_for_tree_code (code, vectype_in,
> > > > + subtype);
> > > >        if (!optab)
> > > >  	{
> > > >  	  if (dump_enabled_p ())
> > > > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> > > > index
> > > >
> > >
> 441d6cd28c4eaded7abd756164890dbcffd2f3b8..943c001fb13777b4d1513841f
> > > a84
> > > > 942316846d5e 100644
> > > > --- a/gcc/tree-vect-patterns.c
> > > > +++ b/gcc/tree-vect-patterns.c
> > > > @@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info *vinfo,
> > > > tree
> > > > var)  static bool  vect_supportable_direct_optab_p (vec_info
> > > > *vinfo, tree otype, tree_code code,
> > > >  				 tree itype, tree *vecotype_out,
> > > > -				 tree *vecitype_out = NULL)
> > > > +				 tree *vecitype_out = NULL,
> > > > +				 enum optab_subtype subtype =
> > > optab_default)
> > > >  {
> > > >    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
> > > >    if (!vecitype)
> > > > @@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info
> > > > *vinfo,
> > > tree otype, tree_code code,
> > > >    if (!vecotype)
> > > >      return false;
> > > >
> > > > -  optab optab = optab_for_tree_code (code, vecitype,
> > > > optab_default);
> > > > +  optab optab = optab_for_tree_code (code, vecitype, subtype);
> > > >    if (!optab)
> > > >      return false;
> > > >
> > > > @@ -487,14 +488,31 @@ vect_joust_widened_integer (tree type, bool
> > > > shift_p, tree op,  }
> > > >
> > > >  /* Return true if the common supertype of NEW_TYPE and
> > > *COMMON_TYPE
> > > > -   is narrower than type, storing the supertype in *COMMON_TYPE if
> so.
> > > */
> > > > +   is narrower than type, storing the supertype in *COMMON_TYPE if
> so.
> > > > +   If ALLOW_SHORT_SIGN_MISMATCH then accept that
> *COMMON_TYPE
> > > and NEW_TYPE
> > > > +   may be of different signs but equal precision.   */
> > > >
> > > >  static bool
> > > > -vect_joust_widened_type (tree type, tree new_type, tree
> > > *common_type)
> > > > +vect_joust_widened_type (tree type, tree new_type, tree
> > > *common_type,
> > > > +			 bool allow_short_sign_mismatch = false)
> > > >  {
> > > >    if (types_compatible_p (*common_type, new_type))
> > > >      return true;
> > > >
> > > > +  /* Check if the mismatch is only in the sign and if we have
> > > > +     allow_short_sign_mismatch then allow it.  */
> > > > +  if (allow_short_sign_mismatch
> > > > +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> > > > +    {
> > > > +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> > > > +      tree eq_type
> > > > +	= build_nonstandard_integer_type (TYPE_PRECISION (new_type),
> > > > +					  sign);
> > > > +
> > > > +      if (types_compatible_p (*common_type, eq_type))
> > > > +	return true;
> > > > +    }
> > > > +
> > > >    /* See if *COMMON_TYPE can hold all values of NEW_TYPE.  */
> > > >    if ((TYPE_PRECISION (new_type) < TYPE_PRECISION (*common_type))
> > > >        && (TYPE_UNSIGNED (new_type) || !TYPE_UNSIGNED
> > > (*common_type)))
> > > > @@ -532,6 +550,9 @@ vect_joust_widened_type (tree type, tree
> > > new_type, tree *common_type)
> > > >     to a type that (a) is narrower than the result of STMT_INFO and
> > > >     (b) can hold all leaf operand values.
> > > >
> > > > +   If ALLOW_SHORT_SIGN_MISMATCH then allow that the signs of the
> > > operands
> > > > +   may differ in signs but not in precision.
> > > > +
> > > >     Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
> > > >     exists.  */
> > > >
> > > > @@ -539,7 +560,8 @@ static unsigned int  vect_widened_op_tree
> > > > (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> > > >  		      tree_code widened_code, bool shift_p,
> > > >  		      unsigned int max_nops,
> > > > -		      vect_unpromoted_value *unprom, tree *common_type)
> > > > +		      vect_unpromoted_value *unprom, tree *common_type,
> > > > +		      bool allow_short_sign_mismatch = false)
> > > >  {
> > > >    /* Check for an integer operation with the right code.  */
> > > >    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt); @@
> > > > -600,7
> > > > +622,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info
> > > stmt_info, tree_code code,
> > > >  		= vinfo->lookup_def (this_unprom->op);
> > > >  	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
> > > >  					   widened_code, shift_p, max_nops,
> > > > -					   this_unprom, common_type);
> > > > +					   this_unprom, common_type,
> > > > +					   allow_short_sign_mismatch);
> > > >  	      if (nops == 0)
> > > >  		return 0;
> > > >
> > > > @@ -617,7 +640,8 @@ vect_widened_op_tree (vec_info *vinfo,
> > > stmt_vec_info stmt_info, tree_code code,
> > > >  	      if (i == 0)
> > > >  		*common_type = this_unprom->type;
> > > >  	      else if (!vect_joust_widened_type (type, this_unprom->type,
> > > > -						 common_type))
> > > > +						 common_type,
> > > > +						 allow_short_sign_mismatch))
> > > >  		return 0;
> > > >  	    }
> > > >  	}
> > > > @@ -888,21 +912,24 @@ vect_reassociating_reduction_p (vec_info
> > > > *vinfo,
> > > >
> > > >     Try to find the following pattern:
> > > >
> > > > -     type x_t, y_t;
> > > > +     type1a x_t
> > > > +     type1b y_t;
> > > >       TYPE1 prod;
> > > >       TYPE2 sum = init;
> > > >     loop:
> > > >       sum_0 = phi <init, sum_1>
> > > >       S1  x_t = ...
> > > >       S2  y_t = ...
> > > > -     S3  x_T = (TYPE1) x_t;
> > > > -     S4  y_T = (TYPE1) y_t;
> > > > +     S3  x_T = (TYPE3) x_t;
> > > > +     S4  y_T = (TYPE4) y_t;
> > > >       S5  prod = x_T * y_T;
> > > >       [S6  prod = (TYPE2) prod;  #optional]
> > > >       S7  sum_1 = prod + sum_0;
> > > >
> > > > -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is
> the
> > > > -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> > > > +   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
> > > > +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
> > > > +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
> > > > +   bigger and must be the same sign. This is a special case of a
> > > > + reduction
> > > >     computation.
> > > >
> > > >     Input:
> > > > @@ -939,15 +966,16 @@ vect_recog_dot_prod_pattern (vec_info
> > > > *vinfo,
> > > >
> > > >    /* Look for the following pattern
> > > >            DX = (TYPE1) X;
> > > > -          DY = (TYPE1) Y;
> > > > +	  DY = (TYPE2) Y;
> > > >            DPROD = DX * DY;
> > > > -          DDPROD = (TYPE2) DPROD;
> > > > +	  DDPROD = (TYPE3) DPROD;
> > > >            sum_1 = DDPROD + sum_0;
> > > >       In which
> > > >       - DX is double the size of X
> > > >       - DY is double the size of Y
> > > >       - DX, DY, DPROD all have the same type but the sign
> > > > -       between DX, DY and DPROD can differ.
> > > > +       between DX, DY and DPROD can differ. The sign of DPROD
> > > > +       is one of the signs of DX or DY.
> > > >       - sum is the same size of DPROD or bigger
> > > >       - sum has been recognized as a reduction variable.
> > > >
> > > > @@ -986,14 +1014,41 @@ vect_recog_dot_prod_pattern (vec_info
> *vinfo,
> > > >       inside the loop (in case we are analyzing an outer-loop).  */
> > > >    vect_unpromoted_value unprom0[2];
> > > >    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
> > > WIDEN_MULT_EXPR,
> > > > -			     false, 2, unprom0, &half_type))
> > > > +			     false, 2, unprom0, &half_type, true))
> > > >      return NULL;
> > > >
> > > > +  /* Check to see if there is a sign change happening in the
> > > > + operands of
> > > the
> > > > +     multiplication and pick the appropriate optab subtype.  */
> > > > +  enum optab_subtype subtype;
> > > > +  tree rhs_type1 = unprom0[0].type;
> > > > +  tree rhs_type2 = unprom0[1].type;
> > > > +  if (TYPE_SIGN (rhs_type1) == TYPE_SIGN (rhs_type2))
> > > > +     subtype = optab_default;
> > > > +  else if (TYPE_SIGN (rhs_type1) == SIGNED
> > > > +	   && TYPE_SIGN (rhs_type2) == UNSIGNED)
> > > > +     subtype = optab_signed_to_unsigned;
> > > > +  else if (TYPE_SIGN (rhs_type1) == UNSIGNED
> > > > +	   && TYPE_SIGN (rhs_type2) == SIGNED)
> > > > +     subtype = optab_unsigned_to_signed;
> > > > +  else
> > > > +    gcc_unreachable ();
> > > > +
> > > > +  /* If we have a sign changing dot product we need to check that the
> > > > +     promoted type if unsigned has at least the same precision as the
> final
> > > > +     type of the dot-product.  */
> > > > +  if (subtype != optab_default)
> > > > +    {
> > > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > > +	  && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > > > +	return NULL;
> > > > +    }
> > > > +
> > > >    vect_pattern_detected ("vect_recog_dot_prod_pattern",
> > > > last_stmt);
> > > >
> > > >    tree half_vectype;
> > > >    if (!vect_supportable_direct_optab_p (vinfo, type,
> > > > DOT_PROD_EXPR,
> > > half_type,
> > > > -					type_out, &half_vectype))
> > > > +					type_out, &half_vectype, subtype))
> > > >      return NULL;
> > > >
> > > >    /* Get the inputs in the appropriate types.  */ @@ -1002,8
> > > > +1057,22 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> > > >  		       unprom0, half_vectype);
> > > >
> > > >    var = vect_recog_temp_ssa_var (type, NULL);
> > > > +
> > > > +  /* If we have a sign changing dot-product the dot-product itself does
> any
> > > > +     sign conversions, so consume the type and use the unpromoted
> > > > + types.  */  tree mult_arg1, mult_arg2;  if (subtype ==
> > > > + optab_default)
> > > > +    {
> > > > +      mult_arg1 = mult_oprnd[0];
> > > > +      mult_arg2 = mult_oprnd[1];
> > > > +    }
> > > > +  else
> > > > +    {
> > > > +      mult_arg1 = unprom0[0].op;
> > > > +      mult_arg2 = unprom0[1].op;
> > > > +    }
> > > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > > -				      mult_oprnd[0], mult_oprnd[1], oprnd1);
> > > > +				      mult_arg1, mult_arg2, oprnd1);
> > > >
> > > >    return pattern_stmt;
> > > >  }
> > > >
> > > >
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > > Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
  2021-05-10 12:58       ` Tamar Christina
@ 2021-05-10 13:29         ` Richard Biener
  2021-05-25 14:57           ` Tamar Christina
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Biener @ 2021-05-10 13:29 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd

On Mon, 10 May 2021, Tamar Christina wrote:

> 
> 
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Monday, May 10, 2021 12:40 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > where the sign for the multiplicant changes.
> > 
> > On Fri, 7 May 2021, Tamar Christina wrote:
> > 
> > > Hi Richi,
> > >
> > > > -----Original Message-----
> > > > From: Richard Biener <rguenther@suse.de>
> > > > Sent: Friday, May 7, 2021 12:46 PM
> > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > > Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > > > where the sign for the multiplicant changes.
> > > >
> > > > On Wed, 5 May 2021, Tamar Christina wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > This patch adds support for a dot product where the sign of the
> > > > > multiplication arguments differ. i.e. one is signed and one is
> > > > > unsigned but the precisions are the same.
> > > > >
> > > > > #define N 480
> > > > > #define SIGNEDNESS_1 unsigned
> > > > > #define SIGNEDNESS_2 signed
> > > > > #define SIGNEDNESS_3 signed
> > > > > #define SIGNEDNESS_4 unsigned
> > > > >
> > > > > SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res,
> > > > > SIGNEDNESS_3 char *restrict a,
> > > > >    SIGNEDNESS_4 char *restrict b)
> > > > > {
> > > > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > > >     {
> > > > >       int av = a[i];
> > > > >       int bv = b[i];
> > > > >       SIGNEDNESS_2 short mult = av * bv;
> > > > >       res += mult;
> > > > >     }
> > > > >   return res;
> > > > > }
> > > > >
> > > > > The operations are performed as if the operands were extended to a
> > > > > 32-bit
> > > > value.
> > > > > As such this operation isn't valid if there is an intermediate
> > > > > conversion to an unsigned value. i.e.  if SIGNEDNESS_2 is unsigned.
> > > > >
> > > > > more over if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are
> > > > > flipped the same optab is used but the operands are flipped in the
> > > > > optab
> > > > expansion.
> > > > >
> > > > > To support this the patch extends the dot-product detection to
> > > > > optionally ignore operands with different signs and stores this
> > > > > information in the optab subtype which is now made a bitfield.
> > > > >
> > > > > The subtype can now additionally controls which optab an EXPR can
> > > > > expand
> > > > to.
> > > > >
> > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > > >
> > > > > Ok for master?
> > > > >
> > > > > Thanks,
> > > > > Tamar
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > 	* optabs.def (usdot_prod_optab): New.
> > > > > 	* doc/md.texi: Document it.
> > > > > 	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> > > > > 	* optabs-tree.h (enum optab_subtype): Likewise.
> > > > > 	* optabs.c (expand_widen_pattern_expr): Likewise.
> > > > > 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > > > > 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> > > > > 	(vectorizable_reduction): Query dot-product kind.
> > > > > 	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take
> > > > optional
> > > > > 	optab subtype.
> > > > > 	(vect_joust_widened_type, vect_widened_op_tree): Optionally
> > > > ignore
> > > > > 	mismatch types.
> > > > > 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> > > > >
> > > > > --- inline copy of patch --
> > > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> > > > >
> > > >
> > d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fd
> > > > f2
> > > > > e66bc80d7d23 100644
> > > > > --- a/gcc/doc/md.texi
> > > > > +++ b/gcc/doc/md.texi
> > > > > @@ -5440,11 +5440,13 @@ Like @samp{fold_left_plus_@var{m}}, but
> > > > takes
> > > > > an additional mask operand  @item @samp{sdot_prod@var{m}}
> > @cindex
> > > > > @code{udot_prod@var{m}} instruction pattern  @itemx
> > > > > @samp{udot_prod@var{m}}
> > > > > +@cindex @code{usdot_prod@var{m}} instruction pattern @itemx
> > > > > +@samp{usdot_prod@var{m}}
> > > > >  Compute the sum of the products of two signed/unsigned elements.
> > > > > -Operand 1 and operand 2 are of the same mode. Their product,
> > > > > which is of a -wider mode, is computed and added to operand 3.
> > > > > Operand 3 is of a mode equal or -wider than the mode of the
> > > > > product. The result is placed in operand 0, which -is of the same mode
> > as operand 3.
> > > > > +Operand 1 and operand 2 are of the same mode but may differ in
> > signs.
> > > > > +Their product, which is of a wider mode, is computed and added to
> > > > operand 3.
> > > > > +Operand 3 is of a mode equal or wider than the mode of the product.
> > > > > +The result is placed in operand 0, which is of the same mode as
> > operand 3.
> > > >
> > > > This doesn't really say what the 's', 'u' and 'us' specify.  Since
> > > > we're doing a widen multiplication and then a non-widening addition
> > > > we only need to know the effective sign of the multiplication so I think
> > the existing 's' and 'u'
> > > > are enough to cover all cases?
> > >
> > > The existing 's' and 'u' enforce that both operands of the
> > > multiplication are of the same sign.  So for e.g. 'u' both operand must be
> > unsigned.
> > >
> > > In the `us` case one can be signed and one unsigned. Operationally
> > > this does a sign extension to the wider type for the signed value, and
> > > the unsigned value gets zero extended first, and then converts it to
> > > unsigned to perform the unsigned multiplication, conforming to the C
> > promotion rules.
> > >
> > > TL;DR; Without a new optab I can't tell during expansion which
> > > semantic the operation had at the gimple/C level as modes don't carry signs.
> > >
> > > Long version:
> > >
> > > The problem with using the existing patterns, because of their
> > > enforcement of `av` and `bv` being the same sign is that we can't
> > > remove the explicit sign extensions, but the multiplication must be done on
> > the sign/zero extended char input in the same sign.
> > >
> > > > Which means (unless I am mistaken) to get the correct result, you
> > > > can use neither `udot` nor `sdot`, as semantically these would zero
> > > > or sign extend both operands from char to int to perform the
> > > > multiplication in the same sign.  Whereas in this case, one parameter
> > > > is zero extended and one parameter is sign extended, and the result is
> > > > always an unsigned number.
> > >
> > > So basically
> > >
> > > udot<unsigned c, unsigned a, unsigned b> ==
> > >    c = zero-ext (a) * zero-ext (b)
> > > sdot<signed c, signed a, signed b> ==
> > >    c = sign-ext (a) * sign-ext (b)
> > > usdot<unsigned c, unsigned a, signed b> ==
> > >    c = ((unsigned-conv) sign-ext (a)) * zero-ext (b)
> > >
> > > So semantically the existing optabs won't fit here. udot would
> > > internally promote to unsigned types before the multiplication so the
> > > result of the multiplication would be wrong.  sdot would promote both to
> > signed and do signed multiplication, so the result is also wrong.
> > >
> > > Now if I relax the constraint on the signs of udot and sdot there are two
> > problems:
> > > RTL Modes don't contain signs.  So a target can't tell me how the operands
> > will be promoted.
> > > So:
> > >
> > > 1) I can't really check which semantics the target will adhere to on
> > expansion.
> > > 2) at expand time I have no way to differentiate between the two
> > instructions variants, given just modes
> > >      I can't tell whether I expand to the normal dot-product or the new
> > instruction.
> > 
> > Ah, OK.  Indeed with such a weird instruction the new variant makes sense.
> > Still can you please amend the optab documentation to say which operand is
> > unsigned and which is signed?  Just 'may differ in signs'
> > is bad.
> 
> Sure, will expand on it.
> 
> > 
> > Since the multiplication is commutative I wonder why you need to handle
> > both signed_to_unsigned and unsigned_to_signed - we should just enforce
> > a canonical order (like the optab does). 
> 
> Sure, I thought it would have been better to change the order at expand time,
> but can do so at detection time.
> 
> > I also think it's a particular bad fit for
> > the bad optab_for_tree_code API - would any of that improve when using a
> > direct internal function here? 
> 
> Somewhat, but this has considerable knock on effects, e.g. currently DOT_PROD is
> treated as a widening operation and so is handled by supportable_widening_operation
> which does not support calls. There's a significant number of places which work on the
> tree EXPR (including constant folding) which all need to be changed.
> 
> > In particular all the changes around
> > optab_subtype look like they make a bad API worse ... at least a single
> > optab_vector_mixed_sign should suffice here, no need to make it a flags
> > kind.
> 
> The reason I did so is because depending on where the query is done it does use
> different subtypes currently.  During detection it uses optab_default, and during
> vectorization optab_vector.  For this instruction this difference doesn't seem to be
> used, but did not want to lose this information in case something depended on it.
> 
> But can make it just one.
> 
> > 
> > +  /* If we have a sign changing dot product we need to check that the
> > +     promoted type if unsigned has at least the same precision as the
> > final
> > +     type of the dot-product.  */
> > +  if (subtype != optab_default)
> > +    {
> > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > +         && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > +       return NULL;
> > +    }
> > 
> > I don't understand this - how do we ever arrive at a result with less precision?
> 
> The user could have manually truncated the results, i.e. in the detection code notice `mult`
> 
>       int av = a[i];
>       int bv = b[i];
>       SIGNEDNESS_2 short mult = av * bv;
>       res += mult;
> 
> which is a short, so it's manually truncating the multiplication which 
> is done as int by the instruction. If `mult` is unsigned then it will 
> truncate the result if the signed input to usdot was negative, unless 
> the intermediate calculation is of the same precision as the
> instruction. i.e. if mult is unsigned int then there's no truncation 
> going on, it's casting from int to unsigned int so it's safe to use then 
> as the instruction does the same thing internally.

It looks to me that we should simply only ever allow sign changes
from the multiplication result to the sum.  At least your example
above is not special to mixed sign multiplications, no?
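
To illustrate the point in plain C (a minimal sketch, not taken from the
patch; the function names are made up for this example, and a 16-bit short
and 32-bit int are assumed): with both inputs signed, narrowing the product
to a 16-bit intermediate and then choosing the signedness of that
intermediate already changes what is added to the sum.

```c
#include <stdint.h>

/* Hypothetical helper: one loop step where the product is narrowed to a
   signed 16-bit intermediate before it is accumulated.  The product of two
   signed chars always fits in int16_t, so the value is preserved.  */
int32_t
step_signed_mult (int32_t res, signed char a, signed char b)
{
  int av = a;
  int bv = b;
  int16_t mult = (int16_t) (av * bv);
  return res + mult;
}

/* Hypothetical helper: the same step with an unsigned 16-bit intermediate.
   A negative product wraps modulo 65536 and is then zero extended.  */
int32_t
step_unsigned_mult (int32_t res, signed char a, signed char b)
{
  int av = a;
  int bv = b;
  uint16_t mult = (uint16_t) (av * bv);
  return res + mult;
}
```

With a = -1 and b = 1 the first variant adds -1 and the second adds 65535,
even though no mixed-sign multiplication is involved.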

> > And why's this not an issue for signed multiplication?
> 
> It is, but in that case it's handled by the type jousting, which doesn't 
> allow the type mismatch. i.e.
> 
> #define SIGNEDNESS_1 unsigned
> #define SIGNEDNESS_2 unsigned
> #define SIGNEDNESS_3 signed
> #define SIGNEDNESS_4 signed
> 
> SIGNEDNESS_1 int __attribute__ ((noipa))
> f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
>    SIGNEDNESS_4 char *restrict b)
> {
>   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
>     {
>       int av = a[i];
>       int bv = b[i];
>       SIGNEDNESS_2 short mult = av * bv;
>       res += mult;
>     }
>   return res;
> }
> 
> Is also not detected as a dot product.  By adding the carve out to the 
> widen multiplication detection it now allows this case through so I 
> handle it in the detection code.  Thinking about it now, it seems more 
> logical to add this case handling inside the type jousting code as I 
> don't think it's ever something you'd want.

Yeah, I think we only need to look through sign changes on the
multiplication result.
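
As a sanity check on why a same-precision sign change of the product is
harmless, here is a small scalar sketch (hypothetical names, 32-bit int
assumed, not the vectorizer's actual model): accumulating the signed product
into an unsigned sum gives the same bits as accumulating in a wider signed
type and wrapping once at the end.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference mixed-sign dot product: A is sign extended, B is zero extended,
   and each product is computed as a 32-bit signed value.  */
uint32_t
dot_su (uint32_t res, const signed char *a, const unsigned char *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      int32_t prod = (int32_t) a[i] * (int32_t) b[i];
      res += (uint32_t) prod;	/* Same-precision sign change only.  */
    }
  return res;
}

/* The same computation, accumulated in a wider signed type and reduced
   modulo 2^32 once at the end.  */
uint32_t
dot_su_wide (uint32_t res, const signed char *a, const unsigned char *b,
	     size_t n)
{
  int64_t acc = res;
  for (size_t i = 0; i < n; ++i)
    acc += (int64_t) a[i] * (int64_t) b[i];
  return (uint32_t) acc;
}
```

Both functions agree for any input that does not overflow the 64-bit
accumulator, which is the sense in which the sign change between product and
sum carries no information.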

> > Also...
> > 
> > +  /* If we have a sign changing dot-product the dot-product itself does
> > any
> > +     sign conversions, so consume the type and use the unpromoted types.
> > */
> > +  tree mult_arg1, mult_arg2;
> > +  if (subtype == optab_default)
> > +    {
> > +      mult_arg1 = mult_oprnd[0];
> > +      mult_arg2 = mult_oprnd[1];
> > +    }
> > +  else
> > +    {
> > +      mult_arg1 = unprom0[0].op;
> > +      mult_arg2 = unprom0[1].op;
> > +    }
> >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > -                                     mult_oprnd[0], mult_oprnd[1],
> > oprnd1);
> > +                                     mult_arg1, mult_arg2, oprnd1);
> > 
> > I thought DOT_PROD always performs the promotion.  Maybe mult_oprnd
> > and unprom0 are just misnamed here?
> 
> Somewhat.  In a normal dot-product the signs of the multiplication
> operands are the same as those of the "unpromoted" types, so after
> vect_convert_input these two types are the same.
> 
> However, because the sign changes here, and to maintain the semantics of
> the C code, there is an extra conversion to get the arguments into the
> same sign.  That needs to be stripped before being given to the
> instruction, which does the conversion internally.

Yes, but then why's that not done by the detection code?  That is,
does it (mis-)handle the (int)short_a * (int)(unsigned short)short_b
where we'd want the mixed-sign handling and not strip the
unsigned short conversion from short_b?
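
For concreteness, a sketch of that case in plain C (hypothetical function
names, 16-bit short and 32-bit int assumed): the inner conversion to
unsigned short decides whether short_b is zero or sign extended, so it
cannot be stripped without changing the product.

```c
#include <stdint.h>

/* Mixed-sign form: short_b is converted to unsigned short first and is
   therefore zero extended when widened to int.  */
int32_t
mul_mixed (int16_t short_a, int16_t short_b)
{
  return (int32_t) short_a * (int32_t) (uint16_t) short_b;
}

/* Form obtained if the unsigned short conversion were stripped: both
   operands are sign extended.  */
int32_t
mul_stripped (int16_t short_a, int16_t short_b)
{
  return (int32_t) short_a * (int32_t) short_b;
}
```

For short_a = 2 and short_b = -1 the first form yields 131070 and the second
yields -2; the two only agree when short_b is non-negative.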

Richard.

> 
> Regards,
> Tamar
> 
> > 
> > Richard.
> > 
> > > Regards,
> > > Tamar
> > >
> > > >
> > > > The tree.def docs say the sum is also possibly widening but I don't
> > > > see this covered by the optab so we should eventually remove this
> > > > feature from the tree side.  In fact the tree-cfg.c verifier
> > > > requires the addition to be not widening - thus only tree.def needs
> > adjustment.
> > > >
> > > > >  @cindex @code{ssad@var{m}} instruction pattern  @item
> > > > > @samp{ssad@var{m}} diff --git a/gcc/optabs-tree.h
> > > > > b/gcc/optabs-tree.h index
> > > > >
> > > >
> > c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f
> > > > 19
> > > > > 90e0548ba08d 100644
> > > > > --- a/gcc/optabs-tree.h
> > > > > +++ b/gcc/optabs-tree.h
> > > > > @@ -27,11 +27,29 @@ along with GCC; see the file COPYING3.  If not see
> > > > >     shift amount vs. machines that take a vector for the shift amount.  */
> > > > >  enum optab_subtype
> > > > >  {
> > > > > -  optab_default,
> > > > > -  optab_scalar,
> > > > > -  optab_vector
> > > > > +  optab_default = 1 << 0,
> > > > > +  optab_scalar = 1 << 1,
> > > > > +  optab_vector = 1 << 2,
> > > > > +  optab_signed_to_unsigned = 1 << 3,
> > > > > +  optab_unsigned_to_signed = 1 << 4
> > > > >  };
> > > > >
> > > > > +/* Override the OrEqual-operator so we can use optab_subtype as a bit flag.  */
> > > > > +inline enum optab_subtype&
> > > > > +operator |= (enum optab_subtype& a, enum optab_subtype b)
> > > > > +{
> > > > > +    return a = static_cast<optab_subtype>(static_cast<int>(a)
> > > > > +					  | static_cast<int>(b));
> > > > > +}
> > > > > +
> > > > > +/* Override the Or-operator so we can use optab_subtype as a bit flag.  */
> > > > > +inline enum optab_subtype
> > > > > +operator | (enum optab_subtype a, enum optab_subtype b)
> > > > > +{
> > > > > +    return static_cast<optab_subtype>(static_cast<int>(a)
> > > > > +				      | static_cast<int>(b));
> > > > > +}
> > > > > +
> > > > >  /* Return the optab used for computing the given operation on the
> > > > > type
> > > > given by
> > > > >     the second argument.  The third argument distinguishes between
> > > > > the
> > > > types of
> > > > >     vector shifts and rotates.  */ diff --git a/gcc/optabs-tree.c
> > > > > b/gcc/optabs-tree.c index
> > > > >
> > > >
> > 95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea
> > > > 1e
> > > > > 5c22b7453072 100644
> > > > > --- a/gcc/optabs-tree.c
> > > > > +++ b/gcc/optabs-tree.c
> > > > > @@ -127,7 +127,17 @@ optab_for_tree_code (enum tree_code code,
> > > > const_tree type,
> > > > >        return TYPE_UNSIGNED (type) ? usum_widen_optab :
> > > > > ssum_widen_optab;
> > > > >
> > > > >      case DOT_PROD_EXPR:
> > > > > -      return TYPE_UNSIGNED (type) ? udot_prod_optab :
> > sdot_prod_optab;
> > > > > +      {
> > > > > +	gcc_assert (subtype & optab_default
> > > > > +		    || subtype & optab_vector
> > > > > +		    || subtype & optab_signed_to_unsigned
> > > > > +		    || subtype & optab_unsigned_to_signed);
> > > > > +
> > > > > +	if (subtype & (optab_unsigned_to_signed |
> > > > optab_signed_to_unsigned))
> > > > > +	  return usdot_prod_optab;
> > > > > +
> > > > > +	return (TYPE_UNSIGNED (type) ? udot_prod_optab :
> > > > sdot_prod_optab);
> > > > > +      }
> > > > >
> > > > >      case SAD_EXPR:
> > > > >        return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab; diff
> > > > > --git a/gcc/optabs.c b/gcc/optabs.c index
> > > > >
> > > >
> > f4614a394587787293dc8b680a38901f7906f61c..2e18b76de1412eab71971753ac
> > > > 67
> > > > > 8597c0d00098 100644
> > > > > --- a/gcc/optabs.c
> > > > > +++ b/gcc/optabs.c
> > > > > @@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx
> > > > > op0,
> > > > rtx op1, rtx wide_op,
> > > > >    bool sbool = false;
> > > > >
> > > > >    oprnd0 = ops->op0;
> > > > > +  if (nops >= 2)
> > > > > +    oprnd1 = ops->op1;
> > > > > +  if (nops >= 3)
> > > > > +    oprnd2 = ops->op2;
> > > > > +
> > > > >    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
> > > > >    if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
> > > > >        || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR) @@ -285,6
> > > > +290,27
> > > > > @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx
> > > > wide_op,
> > > > >  	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
> > > > >        sbool = true;
> > > > >      }
> > > > > +  else if (ops->code == DOT_PROD_EXPR)
> > > > > +    {
> > > > > +      enum optab_subtype subtype = optab_default;
> > > > > +      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
> > > > > +      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
> > > > > +      if (sign1 == sign2)
> > > > > +	;
> > > > > +      else if (sign1 == SIGNED && sign2 == UNSIGNED)
> > > > > +	{
> > > > > +	  subtype |= optab_signed_to_unsigned;
> > > > > +	  /* Same as optab_unsigned_to_signed but flip the operands.  */
> > > > > +	  std::swap (op0, op1);
> > > > > +	}
> > > > > +      else if (sign1 == UNSIGNED && sign2 == SIGNED)
> > > > > +	subtype |= optab_unsigned_to_signed;
> > > > > +      else
> > > > > +	gcc_unreachable ();
> > > > > +
> > > > > +      widen_pattern_optab
> > > > > +	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> > > > > +    }
> > > > >    else
> > > > >      widen_pattern_optab
> > > > >        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0),
> > > > > optab_default); @@ -298,10 +324,7 @@ expand_widen_pattern_expr
> > > > (sepops ops, rtx op0, rtx op1, rtx wide_op,
> > > > >    gcc_assert (icode != CODE_FOR_nothing);
> > > > >
> > > > >    if (nops >= 2)
> > > > > -    {
> > > > > -      oprnd1 = ops->op1;
> > > > > -      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > > -    }
> > > > > +    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > >    else if (sbool)
> > > > >      {
> > > > >        nops = 2;
> > > > > @@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx
> > > > > op0,
> > > > rtx op1, rtx wide_op,
> > > > >      {
> > > > >        gcc_assert (tmode1 == tmode0);
> > > > >        gcc_assert (op1);
> > > > > -      oprnd2 = ops->op2;
> > > > >        wmode = TYPE_MODE (TREE_TYPE (oprnd2));
> > > > >      }
> > > > >
> > > > > diff --git a/gcc/optabs.def b/gcc/optabs.def index
> > > > >
> > > >
> > b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbd
> > > > b7c
> > > > > 18615baae928 100644
> > > > > --- a/gcc/optabs.def
> > > > > +++ b/gcc/optabs.def
> > > > > @@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
> > > > OPTAB_D
> > > > > (sdot_prod_optab, "sdot_prod$I$a")  OPTAB_D (ssum_widen_optab,
> > > > > "widen_ssum$I$a3")  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> > > > > +OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> > > > >  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")  OPTAB_D
> > > > (usad_optab,
> > > > > "usad$I$a")  OPTAB_D (ssad_optab, "ssad$I$a") diff --git
> > > > > a/gcc/tree-cfg.c b/gcc/tree-cfg.c index
> > > > >
> > > >
> > 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..58b55bb648ad97d514f1fa18bb
> > > > 00
> > > > > 808fd2678b42 100644
> > > > > --- a/gcc/tree-cfg.c
> > > > > +++ b/gcc/tree-cfg.c
> > > > > @@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign *stmt)
> > > > >  		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
> > > > >  		 || (!INTEGRAL_TYPE_P (lhs_type)
> > > > >  		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> > > > > -	    || !types_compatible_p (rhs1_type, rhs2_type)
> > > > > +	    || (!types_compatible_p (rhs1_type, rhs2_type)
> > > > > +		&& TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type))
> > > >
> > > > That's not restrictive enough.  I suggest you use
> > > >
> > > >             && element_precision (rhs1_type) != element_precision
> > > > (rhs2_type)
> > > >
> > > > instead.
> > > >
> > > > As said, I'm not sure all the changes in this patch are required.
> > > >
> > > > Please elaborate.
> > > >
> > > > Thanks,
> > > > Richard.
> > > >
> > > > >  	    || !useless_type_conversion_p (lhs_type, rhs3_type)
> > > > >  	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
> > > > >  			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
> > > > diff --git
> > > > > a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index
> > > > >
> > > >
> > 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..cb8f5fbb6abca181c4171194d1
> > > > 9f
> > > > > ec29ec6e4176 100644
> > > > > --- a/gcc/tree-vect-loop.c
> > > > > +++ b/gcc/tree-vect-loop.c
> > > > > @@ -6401,6 +6401,33 @@ build_vect_cond_expr (enum tree_code
> > code,
> > > > tree vop[3], tree mask,
> > > > >      }
> > > > >  }
> > > > >
> > > > > +/* Determine the optab_subtype to use for the given CODE and STMT.
> > > > For
> > > > > +   most CODE this will be optab_vector, however for certain
> > > > > + operations
> > > > such as
> > > > > +   DOT_PROD_EXPR where the operation can different signs for the
> > > > operands we
> > > > > +   need to be able to pick the right optabs.  */
> > > > > +
> > > > > +static enum optab_subtype
> > > > > +vect_determine_dot_kind (tree_code code, stmt_vec_info
> > > > > +stmt_vinfo) {
> > > > > +  enum optab_subtype subtype = optab_vector;
> > > > > +  switch (code)
> > > > > +    {
> > > > > +      case DOT_PROD_EXPR:
> > > > > +	{
> > > > > +	  gassign *stmt = as_a <gassign *> (STMT_VINFO_STMT (stmt_vinfo));
> > > > > +	  signop rhs1_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1
> > > > (stmt)));
> > > > > +	  signop rhs2_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2
> > > > (stmt)));
> > > > > +	  if (rhs1_sign != rhs2_sign)
> > > > > +	    subtype |= optab_unsigned_to_signed;
> > > > > +	  break;
> > > > > +	}
> > > > > +      default:
> > > > > +	break;
> > > > > +    }
> > > > > +
> > > > > +  return subtype;
> > > > > +}
> > > > > +
> > > > >  /* Function vectorizable_reduction.
> > > > >
> > > > >     Check if STMT_INFO performs a reduction operation that can be
> > > > vectorized.
> > > > > @@ -7189,7 +7216,8 @@ vectorizable_reduction (loop_vec_info
> > > > loop_vinfo,
> > > > >        bool ok = true;
> > > > >
> > > > >        /* 4.1. check support for the operation in the loop  */
> > > > > -      optab optab = optab_for_tree_code (code, vectype_in,
> > optab_vector);
> > > > > +      enum optab_subtype subtype = vect_determine_dot_kind (code,
> > > > stmt_info);
> > > > > +      optab optab = optab_for_tree_code (code, vectype_in,
> > > > > + subtype);
> > > > >        if (!optab)
> > > > >  	{
> > > > >  	  if (dump_enabled_p ())
> > > > > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> > > > > index
> > > > >
> > > >
> > 441d6cd28c4eaded7abd756164890dbcffd2f3b8..943c001fb13777b4d1513841f
> > > > a84
> > > > > 942316846d5e 100644
> > > > > --- a/gcc/tree-vect-patterns.c
> > > > > +++ b/gcc/tree-vect-patterns.c
> > > > > @@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info *vinfo,
> > > > > tree
> > > > > var)  static bool  vect_supportable_direct_optab_p (vec_info
> > > > > *vinfo, tree otype, tree_code code,
> > > > >  				 tree itype, tree *vecotype_out,
> > > > > -				 tree *vecitype_out = NULL)
> > > > > +				 tree *vecitype_out = NULL,
> > > > > +				 enum optab_subtype subtype =
> > > > optab_default)
> > > > >  {
> > > > >    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
> > > > >    if (!vecitype)
> > > > > @@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info
> > > > > *vinfo,
> > > > tree otype, tree_code code,
> > > > >    if (!vecotype)
> > > > >      return false;
> > > > >
> > > > > -  optab optab = optab_for_tree_code (code, vecitype,
> > > > > optab_default);
> > > > > +  optab optab = optab_for_tree_code (code, vecitype, subtype);
> > > > >    if (!optab)
> > > > >      return false;
> > > > >
> > > > > @@ -487,14 +488,31 @@ vect_joust_widened_integer (tree type, bool
> > > > > shift_p, tree op,  }
> > > > >
> > > > >  /* Return true if the common supertype of NEW_TYPE and
> > > > *COMMON_TYPE
> > > > > -   is narrower than type, storing the supertype in *COMMON_TYPE if
> > so.
> > > > */
> > > > > +   is narrower than type, storing the supertype in *COMMON_TYPE if
> > so.
> > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then accept that
> > *COMMON_TYPE
> > > > and NEW_TYPE
> > > > > +   may be of different signs but equal precision.   */
> > > > >
> > > > >  static bool
> > > > > -vect_joust_widened_type (tree type, tree new_type, tree
> > > > *common_type)
> > > > > +vect_joust_widened_type (tree type, tree new_type, tree
> > > > *common_type,
> > > > > +			 bool allow_short_sign_mismatch = false)
> > > > >  {
> > > > >    if (types_compatible_p (*common_type, new_type))
> > > > >      return true;
> > > > >
> > > > > +  /* Check if the mismatch is only in the sign and if we have
> > > > > +     allow_short_sign_mismatch then allow it.  */
> > > > > +  if (allow_short_sign_mismatch
> > > > > +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> > > > > +    {
> > > > > +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> > > > > +      tree eq_type
> > > > > +	= build_nonstandard_integer_type (TYPE_PRECISION (new_type),
> > > > > +					  sign);
> > > > > +
> > > > > +      if (types_compatible_p (*common_type, eq_type))
> > > > > +	return true;
> > > > > +    }
> > > > > +
> > > > >    /* See if *COMMON_TYPE can hold all values of NEW_TYPE.  */
> > > > >    if ((TYPE_PRECISION (new_type) < TYPE_PRECISION (*common_type))
> > > > >        && (TYPE_UNSIGNED (new_type) || !TYPE_UNSIGNED
> > > > (*common_type)))
> > > > > @@ -532,6 +550,9 @@ vect_joust_widened_type (tree type, tree
> > > > new_type, tree *common_type)
> > > > >     to a type that (a) is narrower than the result of STMT_INFO and
> > > > >     (b) can hold all leaf operand values.
> > > > >
> > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then allow that the signs of the
> > > > operands
> > > > > +   may differ in signs but not in precision.
> > > > > +
> > > > >     Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
> > > > >     exists.  */
> > > > >
> > > > > @@ -539,7 +560,8 @@ static unsigned int  vect_widened_op_tree
> > > > > (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> > > > >  		      tree_code widened_code, bool shift_p,
> > > > >  		      unsigned int max_nops,
> > > > > -		      vect_unpromoted_value *unprom, tree *common_type)
> > > > > +		      vect_unpromoted_value *unprom, tree *common_type,
> > > > > +		      bool allow_short_sign_mismatch = false)
> > > > >  {
> > > > >    /* Check for an integer operation with the right code.  */
> > > > >    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt); @@
> > > > > -600,7
> > > > > +622,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info
> > > > stmt_info, tree_code code,
> > > > >  		= vinfo->lookup_def (this_unprom->op);
> > > > >  	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
> > > > >  					   widened_code, shift_p, max_nops,
> > > > > -					   this_unprom, common_type);
> > > > > +					   this_unprom, common_type,
> > > > > +					   allow_short_sign_mismatch);
> > > > >  	      if (nops == 0)
> > > > >  		return 0;
> > > > >
> > > > > @@ -617,7 +640,8 @@ vect_widened_op_tree (vec_info *vinfo,
> > > > stmt_vec_info stmt_info, tree_code code,
> > > > >  	      if (i == 0)
> > > > >  		*common_type = this_unprom->type;
> > > > >  	      else if (!vect_joust_widened_type (type, this_unprom->type,
> > > > > -						 common_type))
> > > > > +						 common_type,
> > > > > +						 allow_short_sign_mismatch))
> > > > >  		return 0;
> > > > >  	    }
> > > > >  	}
> > > > > @@ -888,21 +912,24 @@ vect_reassociating_reduction_p (vec_info
> > > > > *vinfo,
> > > > >
> > > > >     Try to find the following pattern:
> > > > >
> > > > > -     type x_t, y_t;
> > > > > +     type1a x_t
> > > > > +     type1b y_t;
> > > > >       TYPE1 prod;
> > > > >       TYPE2 sum = init;
> > > > >     loop:
> > > > >       sum_0 = phi <init, sum_1>
> > > > >       S1  x_t = ...
> > > > >       S2  y_t = ...
> > > > > -     S3  x_T = (TYPE1) x_t;
> > > > > -     S4  y_T = (TYPE1) y_t;
> > > > > +     S3  x_T = (TYPE3) x_t;
> > > > > +     S4  y_T = (TYPE4) y_t;
> > > > >       S5  prod = x_T * y_T;
> > > > >       [S6  prod = (TYPE2) prod;  #optional]
> > > > >       S7  sum_1 = prod + sum_0;
> > > > >
> > > > > -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is
> > the
> > > > > -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> > > > > +   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
> > > > > +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
> > > > > +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
> > > > > +   bigger and must be the same sign. This is a special case of a
> > > > > + reduction
> > > > >     computation.
> > > > >
> > > > >     Input:
> > > > > @@ -939,15 +966,16 @@ vect_recog_dot_prod_pattern (vec_info
> > > > > *vinfo,
> > > > >
> > > > >    /* Look for the following pattern
> > > > >            DX = (TYPE1) X;
> > > > > -          DY = (TYPE1) Y;
> > > > > +	  DY = (TYPE2) Y;
> > > > >            DPROD = DX * DY;
> > > > > -          DDPROD = (TYPE2) DPROD;
> > > > > +	  DDPROD = (TYPE3) DPROD;
> > > > >            sum_1 = DDPROD + sum_0;
> > > > >       In which
> > > > >       - DX is double the size of X
> > > > >       - DY is double the size of Y
> > > > >       - DX, DY, DPROD all have the same type but the sign
> > > > > -       between DX, DY and DPROD can differ.
> > > > > +       between DX, DY and DPROD can differ. The sign of DPROD
> > > > > +       is one of the signs of DX or DY.
> > > > >       - sum is the same size of DPROD or bigger
> > > > >       - sum has been recognized as a reduction variable.
> > > > >
> > > > > @@ -986,14 +1014,41 @@ vect_recog_dot_prod_pattern (vec_info
> > *vinfo,
> > > > >       inside the loop (in case we are analyzing an outer-loop).  */
> > > > >    vect_unpromoted_value unprom0[2];
> > > > >    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
> > > > WIDEN_MULT_EXPR,
> > > > > -			     false, 2, unprom0, &half_type))
> > > > > +			     false, 2, unprom0, &half_type, true))
> > > > >      return NULL;
> > > > >
> > > > > +  /* Check to see if there is a sign change happening in the
> > > > > + operands of
> > > > the
> > > > > +     multiplication and pick the appropriate optab subtype.  */
> > > > > +  enum optab_subtype subtype;
> > > > > +  tree rhs_type1 = unprom0[0].type;
> > > > > +  tree rhs_type2 = unprom0[1].type;
> > > > > +  if (TYPE_SIGN (rhs_type1) == TYPE_SIGN (rhs_type2))
> > > > > +     subtype = optab_default;
> > > > > +  else if (TYPE_SIGN (rhs_type1) == SIGNED
> > > > > +	   && TYPE_SIGN (rhs_type2) == UNSIGNED)
> > > > > +     subtype = optab_signed_to_unsigned;
> > > > > +  else if (TYPE_SIGN (rhs_type1) == UNSIGNED
> > > > > +	   && TYPE_SIGN (rhs_type2) == SIGNED)
> > > > > +     subtype = optab_unsigned_to_signed;
> > > > > +  else
> > > > > +    gcc_unreachable ();
> > > > > +
> > > > > +  /* If we have a sign changing dot product we need to check that the
> > > > > +     promoted type if unsigned has at least the same precision as the
> > final
> > > > > +     type of the dot-product.  */
> > > > > +  if (subtype != optab_default)
> > > > > +    {
> > > > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > > > +	  && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > > > > +	return NULL;
> > > > > +    }
> > > > > +
> > > > >    vect_pattern_detected ("vect_recog_dot_prod_pattern",
> > > > > last_stmt);
> > > > >
> > > > >    tree half_vectype;
> > > > >    if (!vect_supportable_direct_optab_p (vinfo, type,
> > > > > DOT_PROD_EXPR,
> > > > half_type,
> > > > > -					type_out, &half_vectype))
> > > > > +					type_out, &half_vectype, subtype))
> > > > >      return NULL;
> > > > >
> > > > >    /* Get the inputs in the appropriate types.  */ @@ -1002,8
> > > > > +1057,22 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> > > > >  		       unprom0, half_vectype);
> > > > >
> > > > >    var = vect_recog_temp_ssa_var (type, NULL);
> > > > > +
> > > > > +  /* If we have a sign changing dot-product the dot-product itself does
> > any
> > > > > +     sign conversions, so consume the type and use the unpromoted
> > > > > + types.  */  tree mult_arg1, mult_arg2;  if (subtype ==
> > > > > + optab_default)
> > > > > +    {
> > > > > +      mult_arg1 = mult_oprnd[0];
> > > > > +      mult_arg2 = mult_oprnd[1];
> > > > > +    }
> > > > > +  else
> > > > > +    {
> > > > > +      mult_arg1 = unprom0[0].op;
> > > > > +      mult_arg2 = unprom0[1].op;
> > > > > +    }
> > > > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > > > -				      mult_oprnd[0], mult_oprnd[1], oprnd1);
> > > > > +				      mult_arg1, mult_arg2, oprnd1);
> > > > >
> > > > >    return pattern_stmt;
> > > > >  }
> > > > >
> > > > >
> > > > >
> > > >

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


* RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
  2021-05-10 13:29         ` Richard Biener
@ 2021-05-25 14:57           ` Tamar Christina
  2021-05-26  8:56             ` Richard Biener
  0 siblings, 1 reply; 35+ messages in thread
From: Tamar Christina @ 2021-05-25 14:57 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd


Hi Richi,

Here's a respun version of the patch.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* optabs.def (usdot_prod_optab): New.
	* doc/md.texi: Document it and clarify other dot prod optabs.
	* optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
	* optabs.c (expand_widen_pattern_expr): Likewise.
	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
	* tree-vect-loop.c (vect_determine_dot_kind): New.
	(vectorizable_reduction): Query dot-product kind.
	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
	optab subtype.
	(vect_joust_widened_type, vect_widened_op_tree): Optionally ignore
	mismatch types.
	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
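
For reference, the scalar semantics the three dot-product optabs are
meant to capture can be sketched as below.  This is a hedged model
based on the discussion in this thread, not code from the patch: the
function names are hypothetical, and both the choice of the first
operand as the unsigned one and the signed accumulator in the mixed
case are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar models; names are illustrative only.  */

/* udot: both inputs are zero-extended before the multiply.  */
static uint32_t
ref_udot (uint32_t acc, const uint8_t *a, const uint8_t *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    acc += (uint32_t) a[i] * (uint32_t) b[i];
  return acc;
}

/* sdot: both inputs are sign-extended before the multiply.  */
static int32_t
ref_sdot (int32_t acc, const int8_t *a, const int8_t *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    acc += (int32_t) a[i] * (int32_t) b[i];
  return acc;
}

/* usdot (assumed operand order): A is zero-extended, B is
   sign-extended, and the products are accumulated as signed values.  */
static int32_t
ref_usdot (int32_t acc, const uint8_t *a, const int8_t *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    acc += (int32_t) a[i] * (int32_t) b[i];
  return acc;
}
```

Feeding the same bit patterns {0xff, 0x02} and {0xff, 0x03} through the
three models gives 65031 for ref_udot, 7 for ref_sdot and -249 for
ref_usdot, which is why the mixed-sign case cannot be expressed with
either existing optab.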


> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Monday, May 10, 2021 2:29 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant changes.
> 
> On Mon, 10 May 2021, Tamar Christina wrote:
> 
> >
> >
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Monday, May 10, 2021 12:40 PM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > > where the sign for the multiplicant changes.
> > >
> > > On Fri, 7 May 2021, Tamar Christina wrote:
> > >
> > > > Hi Richi,
> > > >
> > > > > -----Original Message-----
> > > > > From: Richard Biener <rguenther@suse.de>
> > > > > Sent: Friday, May 7, 2021 12:46 PM
> > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > > > Subject: Re: [PATCH 1/4]middle-end Vect: Add support for
> > > > > dot-product where the sign for the multiplicant changes.
> > > > >
> > > > > On Wed, 5 May 2021, Tamar Christina wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > This patch adds support for a dot product where the sign of
> > > > > > the multiplication arguments differ. i.e. one is signed and
> > > > > > one is unsigned but the precisions are the same.
> > > > > >
> > > > > > #define N 480
> > > > > > #define SIGNEDNESS_1 unsigned
> > > > > > #define SIGNEDNESS_2 signed
> > > > > > #define SIGNEDNESS_3 signed
> > > > > > #define SIGNEDNESS_4 unsigned
> > > > > >
> > > > > > SIGNEDNESS_1 int __attribute__ ((noipa))
> > > > > > f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> > > > > >    SIGNEDNESS_4 char *restrict b)
> > > > > > {
> > > > > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > > > >     {
> > > > > >       int av = a[i];
> > > > > >       int bv = b[i];
> > > > > >       SIGNEDNESS_2 short mult = av * bv;
> > > > > >       res += mult;
> > > > > >     }
> > > > > >   return res;
> > > > > > }
> > > > > >
> > > > > > The operations are performed as if the operands were extended
> > > > > > to a 32-bit value.
> > > > > > As such this operation isn't valid if there is an intermediate
> > > > > > conversion to an unsigned value, i.e. if SIGNEDNESS_2 is unsigned.
> > > > > >
> > > > > > Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are
> > > > > > flipped the same optab is used but the operands are flipped in
> > > > > > the optab expansion.
> > > > > >
> > > > > > To support this the patch extends the dot-product detection to
> > > > > > optionally ignore operands with different signs and stores
> > > > > > this information in the optab subtype which is now made a bitfield.
> > > > > >
> > > > > > The subtype can now additionally control which optab an EXPR
> > > > > > can expand to.
> > > > > >
> > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > > > >
> > > > > > Ok for master?
> > > > > >
> > > > > > Thanks,
> > > > > > Tamar
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > > 	* optabs.def (usdot_prod_optab): New.
> > > > > > 	* doc/md.texi: Document it.
> > > > > > 	* optabs-tree.c (optab_for_tree_code): Support
> usdot_prod_optab.
> > > > > > 	* optabs-tree.h (enum optab_subtype): Likewise.
> > > > > > 	* optabs.c (expand_widen_pattern_expr): Likewise.
> > > > > > 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > > > > > 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> > > > > > 	(vectorizable_reduction): Query dot-product kind.
> > > > > > 	* tree-vect-patterns.c (vect_supportable_direct_optab_p):
> > > > > > Take
> > > > > optional
> > > > > > 	optab subtype.
> > > > > > 	(vect_joust_widened_type, vect_widened_op_tree):
> Optionally
> > > > > ignore
> > > > > > 	mismatch types.
> > > > > > 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> > > > > >
> > > > > > --- inline copy of patch --
> > > > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> > > > > >
> > > > >
> > >
> d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fd
> > > > > f2
> > > > > > e66bc80d7d23 100644
> > > > > > --- a/gcc/doc/md.texi
> > > > > > +++ b/gcc/doc/md.texi
> > > > > > @@ -5440,11 +5440,13 @@ Like @samp{fold_left_plus_@var{m}},
> > > > > > but
> > > > > takes
> > > > > > an additional mask operand  @item @samp{sdot_prod@var{m}}
> > > @cindex
> > > > > > @code{udot_prod@var{m}} instruction pattern  @itemx
> > > > > > @samp{udot_prod@var{m}}
> > > > > > +@cindex @code{usdot_prod@var{m}} instruction pattern @itemx
> > > > > > +@samp{usdot_prod@var{m}}
> > > > > >  Compute the sum of the products of two signed/unsigned
> elements.
> > > > > > -Operand 1 and operand 2 are of the same mode. Their product,
> > > > > > which is of a -wider mode, is computed and added to operand 3.
> > > > > > Operand 3 is of a mode equal or -wider than the mode of the
> > > > > > product. The result is placed in operand 0, which -is of the
> > > > > > same mode
> > > as operand 3.
> > > > > > +Operand 1 and operand 2 are of the same mode but may differ
> > > > > > +in
> > > signs.
> > > > > > +Their product, which is of a wider mode, is computed and
> > > > > > +added to
> > > > > operand 3.
> > > > > > +Operand 3 is of a mode equal or wider than the mode of the
> product.
> > > > > > +The result is placed in operand 0, which is of the same mode
> > > > > > +as
> > > operand 3.
> > > > >
> > > > > This doesn't really say what the 's', 'u' and 'us' specify.
> > > > > Since we're doing a widen multiplication and then a non-widening
> > > > > addition we only need to know the effective sign of the
> > > > > multiplication so I think
> > > the existing 's' and 'u'
> > > > > are enough to cover all cases?
> > > >
> > > > The existing 's' and 'u' enforce that both operands of the
> > > > multiplication are of the same sign.  So for e.g. 'u' both operand
> > > > must be
> > > unsigned.
> > > >
> > > > In the `us` case one can be signed and one unsigned. Operationally
> > > > this does a sign extension to the wider type for the signed value,
> > > > and the unsigned value gets zero extended first, and then converts
> > > > it to unsigned to perform the unsigned multiplication, conforming
> > > > to the C
> > > promotion rules.
> > > >
> > > > TL;DR; Without a new optab I can't tell during expansion which
> > > > semantic the operation had at the gimple/C level as modes don't carry
> signs.
> > > >
> > > > Long version:
> > > >
> > > > The problem with using the existing patterns, because of their
> > > > enforcement of `av` and `bv` being the same sign is that we can't
> > > > remove the explicit sign extensions, but the multiplication must
> > > > be done on
> > > the sign/zero extended char input in the same sign.
> > > >
> > > > Which means (unless I am mistaken) that to get the correct result you
> > > > can use neither `udot` nor `sdot`, as semantically these would zero or
> > > > sign extend both operands from char to int to perform the
> > > > multiplication in the same sign.  Whereas in this case, one parameter
> > > > is zero extended and one parameter is sign extended, and the result is
> > > > always an unsigned number.
> > > >
> > > > So basically
> > > >
> > > > udot<unsigned c, unsigned a, unsigned b> ==
> > > >    c = zero-ext (a) * zero-ext (b)
> > > > sdot<signed c, signed a, signed b> ==
> > > >    c = sign-ext (a) * sign-ext (b)
> > > > usdot<unsigned c, unsigned a, signed b> ==
> > > >    c = ((unsigned-conv) sign-ext (a)) * zero-ext (b)
> > > >
> > > > So semantically the existing optabs won't fit here. udot would
> > > > internally promote to unsigned types before the multiplication so
> > > > the result of the multiplication would be wrong.  sdot would
> > > > promote both to
> > > signed and do signed multiplication, so the result is also wrong.
> > > >
> > > > Now if I relax the constraint on the signs of udot and sdot there are
> > > > two problems:
> > > > RTL modes don't contain signs.  So a target can't tell me how the
> > > > operands will be promoted.
> > > > So:
> > > >
> > > > 1) I can't really check which semantics the target will adhere to on
> > > >    expansion.
> > > > 2) At expand time I have no way to differentiate between the two
> > > >    instruction variants; given just modes I can't tell whether I
> > > >    expand to the normal dot-product or the new instruction.
> > >
> > > Ah, OK.  Indeed with such a weird instruction the new variant makes
> sense.
> > > Still can you please amend the optab documentation to say which
> > > operand is unsigned and which is signed?  Just 'may differ in signs'
> > > is bad.
> >
> > Sure, will expand on it.
> >
> > >
> > > Since the multiplication is commutative I wonder why you need to
> > > handle both signed_to_unsigned and unsigned_to_signed - we should
> > > just enforce a canonical order (like the optab does).
> >
> > Sure, I thought it would have been better to change the order at
> > expand time, but can do so at detection time.
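
A minimal sketch of what enforcing a canonical operand order at detection
time could look like, under the assumption that the canonical form puts
the unsigned operand first (as the usdot optab documentation requires).
struct mul_operand, enum dot_kind and canonicalize_dot_operands are
hypothetical names for this note only; they are not GCC types or
functions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a multiplication operand; not a GCC type.  */
struct mul_operand
{
  int id;            /* Which source operand this is.  */
  bool is_unsigned;  /* Signedness of the unpromoted type.  */
};

enum dot_kind { DOT_SAME_SIGN, DOT_MIXED_SIGN };

/* Classify the operand pair and, for the mixed-sign case, swap the
   operands so that the unsigned one always comes first.  Because the
   multiplication is commutative the swap does not change the result.  */
static enum dot_kind
canonicalize_dot_operands (struct mul_operand *op0, struct mul_operand *op1)
{
  if (op0->is_unsigned == op1->is_unsigned)
    return DOT_SAME_SIGN;

  if (!op0->is_unsigned)
    {
      struct mul_operand tmp = *op0;
      *op0 = *op1;
      *op1 = tmp;
    }
  return DOT_MIXED_SIGN;
}
```

With the order fixed once here, later stages only ever see one mixed-sign
form and do not need separate signed_to_unsigned and unsigned_to_signed
cases.
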
> >
> > > I also think it's a particularly bad fit for the bad
> > > optab_for_tree_code API - would any of that improve when using a
> > > direct internal function here?
> >
> > Somewhat, but this has considerable knock-on effects, e.g. currently
> > DOT_PROD is treated as a widening operation and so is handled by
> > supportable_widening_operation which does not support calls. There's a
> > significant number of places which work on the tree EXPR (including
> > constant folding) which all need to be changed.
> >
> > > In particular all the changes around optab_subtype look like they
> > > make a bad API worse ... at least a single optab_vector_mixed_sign
> > > should suffice here, no need to make it a flags kind.
> >
> > The reason I did so is because depending on where the query is done it
> > does use different subtypes currently.  During detection it uses
> > optab_default, and during vectorization optab_vector.  For this
> > instruction this difference doesn't seem to be used, but I did not want
> > to lose this information in case something depended on it.
> >
> > But can make it just one.
> >
> > >
> > > +  /* If we have a sign changing dot product we need to check that the
> > > +     promoted type if unsigned has at least the same precision as the final
> > > +     type of the dot-product.  */
> > > +  if (subtype != optab_default)
> > > +    {
> > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > +         && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > > +       return NULL;
> > > +    }
> > >
> > > I don't understand this - how do we ever arrive at a result with less
> > > precision?
> >
> > The user could have manually truncated the results, i.e. in the
> > detection code notice `mult`
> >
> >       int av = a[i];
> >       int bv = b[i];
> >       SIGNEDNESS_2 short mult = av * bv;
> >       res += mult;
> >
> > which is a short, so it's manually truncating the multiplication which
> > is done as int by the instruction. If `mult` is unsigned then it will
> > truncate the result if the signed input to usdot was negative, unless
> > the intermediate calculation is of the same precision as the
> > instruction. i.e. if mult is unsigned int then there's no truncation
> > going on, it's casting from int to unsigned int so it's safe to use
> > then as the instruction does the same thing internally.
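
The truncation argument above can be checked on a single loop iteration.
The sketch below is illustrative only: the add_via_* helper names are
made up for this note, and the fixed-width types stand in for the
char/short/int types of the test case on a target where they are 8, 16
and 32 bits wide.

```c
#include <stdint.h>

/* What a 32-bit mixed-sign dot-product step computes.  */
static uint32_t
add_full_width (uint32_t res, int8_t a, uint8_t b)
{
  return res + (uint32_t) ((int32_t) a * (int32_t) b);
}

/* Source semantics when `mult` is a signed short.  The product of an
   8-bit signed and an 8-bit unsigned value always fits in 16 bits
   signed, so no information is lost.  */
static uint32_t
add_via_signed_short (uint32_t res, int8_t a, uint8_t b)
{
  int32_t av = a, bv = b;
  int16_t mult = (int16_t) (av * bv);
  return res + (uint32_t) mult;
}

/* Source semantics when `mult` is an unsigned short.  A negative
   product is wrapped to 16 bits and then zero-extended.  */
static uint32_t
add_via_unsigned_short (uint32_t res, int8_t a, uint8_t b)
{
  int32_t av = a, bv = b;
  uint16_t mult = (uint16_t) (av * bv);
  return res + (uint32_t) mult;
}

/* Source semantics when `mult` is an unsigned int: only a sign change
   at full precision, no truncation.  */
static uint32_t
add_via_unsigned_int (uint32_t res, int8_t a, uint8_t b)
{
  int32_t av = a, bv = b;
  uint32_t mult = (uint32_t) (av * bv);
  return res + mult;
}
```

For a = -1 and b = 2 starting from res = 0, the full-width, signed short
and unsigned int variants all produce 4294967294, while the unsigned short
variant produces 65534.  That is the case the precision check is meant to
reject.
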
> 
> It looks to me that we simply should only ever allow sign changes from
> multiplication result to the sum.  At least your example above is not special to
> mixed sign multiplications, no?
> 
> > > And why's this not an issue for signed multiplication?
> >
> > It is, but in that case it's handled by the type jousting, which
> > doesn't allow the type mismatch. i.e.
> >
> > #define SIGNEDNESS_1 unsigned
> > #define SIGNEDNESS_2 unsigned
> > #define SIGNEDNESS_3 signed
> > #define SIGNEDNESS_4 signed
> >
> > SIGNEDNESS_1 int __attribute__ ((noipa))
> > f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> >    SIGNEDNESS_4 char *restrict b)
> > {
> >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> >     {
> >       int av = a[i];
> >       int bv = b[i];
> >       SIGNEDNESS_2 short mult = av * bv;
> >       res += mult;
> >     }
> >   return res;
> > }
> >
> > Is also not detected as a dot product.  By adding the carve out to the
> > widen multiplication detection it now allows this case through so I
> > handle it in the detection code.  Thinking about it now, it seems more
> > logical to add this case handling inside the type jousting code as I
> > don't think it's ever something you'd want.
> 
> Yeah, I think we only need to look through sign changes on the multiplication
> result.
> 
> > > Also...
> > >
> > > +  /* If we have a sign changing dot-product the dot-product itself does any
> > > +     sign conversions, so consume the type and use the unpromoted types.  */
> > > +  tree mult_arg1, mult_arg2;
> > > +  if (subtype == optab_default)
> > > +    {
> > > +      mult_arg1 = mult_oprnd[0];
> > > +      mult_arg2 = mult_oprnd[1];
> > > +    }
> > > +  else
> > > +    {
> > > +      mult_arg1 = unprom0[0].op;
> > > +      mult_arg2 = unprom0[1].op;
> > > +    }
> > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > -                                     mult_oprnd[0], mult_oprnd[1], oprnd1);
> > > +                                     mult_arg1, mult_arg2, oprnd1);
> > >
> > > I thought DOT_PROD always performs the promotion.  Maybe mult_oprnd
> > > and unprom0 are just misnamed here?
> >
> > Somewhat, in a normal dot-product the signs of the multiplication are
> > the same here as the "unpromoted" types. So after vect_convert_input
> > these two types are the same.
> >
> > However because here the sign changes and to maintain the semantics of
> > the C code there's an extra conversion here to get the arguments in
> > the same sign.  That needs to be stripped before being given to the
> > instruction which does the conversion internally.
> 
> Yes, but then why's that not done by the detection code?  That is, does it
> (mis-)handle the (int)short_a * (int)(unsigned short)short_b where we'd
> want the mixed-sign handling and not strip the unsigned short conversion
> from short_b?
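
To make the question concrete, the sketch below compares the two
multiplications on scalars.  The helper names are illustrative only, and
the fixed-width types stand in for short and int on a target where they
are 16 and 32 bits wide.

```c
#include <stdint.h>

/* (int) short_a * (int) short_b: both operands sign-extended.  */
static int32_t
mul_plain (int16_t a, int16_t b)
{
  return (int32_t) a * (int32_t) b;
}

/* (int) short_a * (int) (unsigned short) short_b: the inner conversion
   reinterprets B as unsigned, so B is effectively zero-extended.  */
static int32_t
mul_inner_unsigned (int16_t a, int16_t b)
{
  return (int32_t) a * (int32_t) (uint16_t) b;
}
```

For a = 2 and b = -1 the plain form gives -2 while the form with the inner
conversion gives 131070 (2 * 65535), so the unsigned short conversion
changes the value and cannot be stripped as if it were a no-op.  The two
forms only agree when b is non-negative.
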
> 
> Richard.
> 
> >
> > Regards,
> > Tamar
> >
> > >
> > > Richard.
> > >
> > > > Regards,
> > > > Tamar
> > > >
> > > > >
> > > > > The tree.def docs say the sum is also possibly widening but I
> > > > > don't see this covered by the optab so we should eventually
> > > > > remove this feature from the tree side.  In fact the tree-cfg.c
> > > > > verifier requires the addition to be not widening - thus only
> > > > > tree.def needs adjustment.
> > > > >
> > > > > >  @cindex @code{ssad@var{m}} instruction pattern  @item
> > > > > > @samp{ssad@var{m}} diff --git a/gcc/optabs-tree.h
> > > > > > b/gcc/optabs-tree.h index
> > > > > >
> > > > >
> > >
> c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f
> > > > > 19
> > > > > > 90e0548ba08d 100644
> > > > > > --- a/gcc/optabs-tree.h
> > > > > > +++ b/gcc/optabs-tree.h
> > > > > > @@ -27,11 +27,29 @@ along with GCC; see the file COPYING3.  If
> > > > > > not
> > > see
> > > > > >     shift amount vs. machines that take a vector for the shift amount.  */
> > > > > >  enum optab_subtype
> > > > > >  {
> > > > > > -  optab_default,
> > > > > > -  optab_scalar,
> > > > > > -  optab_vector
> > > > > > +  optab_default = 1 << 0,
> > > > > > +  optab_scalar = 1 << 1,
> > > > > > +  optab_vector = 1 << 2,
> > > > > > +  optab_signed_to_unsigned = 1 << 3,
> > > > > > +  optab_unsigned_to_signed = 1 << 4
> > > > > >  };
> > > > > >
> > > > > > +/* Override the OrEqual-operator so we can use optab_subtype as a bit flag.  */
> > > > > > +inline enum optab_subtype&
> > > > > > +operator |= (enum optab_subtype& a, enum optab_subtype b)
> > > > > > +{
> > > > > > +    return a = static_cast<optab_subtype>(static_cast<int>(a)
> > > > > > +					  | static_cast<int>(b));
> > > > > > +}
> > > > > > +
> > > > > > +/* Override the Or-operator so we can use optab_subtype as a bit flag.  */
> > > > > > +inline enum optab_subtype
> > > > > > +operator | (enum optab_subtype a, enum optab_subtype b)
> > > > > > +{
> > > > > > +    return static_cast<optab_subtype>(static_cast<int>(a)
> > > > > > +				      | static_cast<int>(b));
> > > > > > +}
> > > > > > +
> > > > > >  /* Return the optab used for computing the given operation on
> > > > > > the type
> > > > > given by
> > > > > >     the second argument.  The third argument distinguishes
> > > > > > between the
> > > > > types of
> > > > > >     vector shifts and rotates.  */ diff --git
> > > > > > a/gcc/optabs-tree.c b/gcc/optabs-tree.c index
> > > > > >
> > > > >
> > >
> 95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea
> > > > > 1e
> > > > > > 5c22b7453072 100644
> > > > > > --- a/gcc/optabs-tree.c
> > > > > > +++ b/gcc/optabs-tree.c
> > > > > > @@ -127,7 +127,17 @@ optab_for_tree_code (enum tree_code
> code,
> > > > > const_tree type,
> > > > > >        return TYPE_UNSIGNED (type) ? usum_widen_optab :
> > > > > > ssum_widen_optab;
> > > > > >
> > > > > >      case DOT_PROD_EXPR:
> > > > > > -      return TYPE_UNSIGNED (type) ? udot_prod_optab :
> > > sdot_prod_optab;
> > > > > > +      {
> > > > > > +	gcc_assert (subtype & optab_default
> > > > > > +		    || subtype & optab_vector
> > > > > > +		    || subtype & optab_signed_to_unsigned
> > > > > > +		    || subtype & optab_unsigned_to_signed);
> > > > > > +
> > > > > > +	if (subtype & (optab_unsigned_to_signed |
> > > > > optab_signed_to_unsigned))
> > > > > > +	  return usdot_prod_optab;
> > > > > > +
> > > > > > +	return (TYPE_UNSIGNED (type) ? udot_prod_optab :
> > > > > sdot_prod_optab);
> > > > > > +      }
> > > > > >
> > > > > >      case SAD_EXPR:
> > > > > >        return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
> > > > > > diff --git a/gcc/optabs.c b/gcc/optabs.c index
> > > > > >
> > > > >
> > >
> f4614a394587787293dc8b680a38901f7906f61c..2e18b76de1412eab71971753ac
> > > > > 67
> > > > > > 8597c0d00098 100644
> > > > > > --- a/gcc/optabs.c
> > > > > > +++ b/gcc/optabs.c
> > > > > > @@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops,
> > > > > > rtx op0,
> > > > > rtx op1, rtx wide_op,
> > > > > >    bool sbool = false;
> > > > > >
> > > > > >    oprnd0 = ops->op0;
> > > > > > +  if (nops >= 2)
> > > > > > +    oprnd1 = ops->op1;
> > > > > > +  if (nops >= 3)
> > > > > > +    oprnd2 = ops->op2;
> > > > > > +
> > > > > >    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
> > > > > >    if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
> > > > > >        || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR) @@ -
> 285,6
> > > > > +290,27
> > > > > > @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1,
> > > > > > rtx
> > > > > wide_op,
> > > > > >  	   ? vec_unpacks_sbool_hi_optab :
> vec_unpacks_sbool_lo_optab);
> > > > > >        sbool = true;
> > > > > >      }
> > > > > > +  else if (ops->code == DOT_PROD_EXPR)
> > > > > > +    {
> > > > > > +      enum optab_subtype subtype = optab_default;
> > > > > > +      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
> > > > > > +      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
> > > > > > +      if (sign1 == sign2)
> > > > > > +	;
> > > > > > +      else if (sign1 == SIGNED && sign2 == UNSIGNED)
> > > > > > +	{
> > > > > > +	  subtype |= optab_signed_to_unsigned;
> > > > > > +	  /* Same as optab_unsigned_to_signed but flip the
> operands.  */
> > > > > > +	  std::swap (op0, op1);
> > > > > > +	}
> > > > > > +      else if (sign1 == UNSIGNED && sign2 == SIGNED)
> > > > > > +	subtype |= optab_unsigned_to_signed;
> > > > > > +      else
> > > > > > +	gcc_unreachable ();
> > > > > > +
> > > > > > +      widen_pattern_optab
> > > > > > +	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0),
> subtype);
> > > > > > +    }
> > > > > >    else
> > > > > >      widen_pattern_optab
> > > > > >        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0),
> > > > > > optab_default); @@ -298,10 +324,7 @@
> expand_widen_pattern_expr
> > > > > (sepops ops, rtx op0, rtx op1, rtx wide_op,
> > > > > >    gcc_assert (icode != CODE_FOR_nothing);
> > > > > >
> > > > > >    if (nops >= 2)
> > > > > > -    {
> > > > > > -      oprnd1 = ops->op1;
> > > > > > -      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > > > -    }
> > > > > > +    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > > >    else if (sbool)
> > > > > >      {
> > > > > >        nops = 2;
> > > > > > @@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops,
> rtx
> > > > > > op0,
> > > > > rtx op1, rtx wide_op,
> > > > > >      {
> > > > > >        gcc_assert (tmode1 == tmode0);
> > > > > >        gcc_assert (op1);
> > > > > > -      oprnd2 = ops->op2;
> > > > > >        wmode = TYPE_MODE (TREE_TYPE (oprnd2));
> > > > > >      }
> > > > > >
> > > > > > diff --git a/gcc/optabs.def b/gcc/optabs.def index
> > > > > >
> > > > >
> > >
> b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbd
> > > > > b7c
> > > > > > 18615baae928 100644
> > > > > > --- a/gcc/optabs.def
> > > > > > +++ b/gcc/optabs.def
> > > > > > @@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
> > > > > OPTAB_D
> > > > > > (sdot_prod_optab, "sdot_prod$I$a")  OPTAB_D
> (ssum_widen_optab,
> > > > > > "widen_ssum$I$a3")  OPTAB_D (udot_prod_optab,
> "udot_prod$I$a")
> > > > > > +OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> > > > > >  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")  OPTAB_D
> > > > > (usad_optab,
> > > > > > "usad$I$a")  OPTAB_D (ssad_optab, "ssad$I$a") diff --git
> > > > > > a/gcc/tree-cfg.c b/gcc/tree-cfg.c index
> > > > > >
> > > > >
> > >
> 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..58b55bb648ad97d514f1fa18bb
> > > > > 00
> > > > > > 808fd2678b42 100644
> > > > > > --- a/gcc/tree-cfg.c
> > > > > > +++ b/gcc/tree-cfg.c
> > > > > > @@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign
> *stmt)
> > > > > >  		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
> > > > > >  		 || (!INTEGRAL_TYPE_P (lhs_type)
> > > > > >  		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> > > > > > -	    || !types_compatible_p (rhs1_type, rhs2_type)
> > > > > > +	    || (!types_compatible_p (rhs1_type, rhs2_type)
> > > > > > +		&& TYPE_SIGN (rhs1_type) == TYPE_SIGN
> (rhs2_type))
> > > > >
> > > > > That's not restrictive enough.  I suggest you use
> > > > >
> > > > >             && element_precision (rhs1_type) !=
> > > > > element_precision
> > > > > (rhs2_type)
> > > > >
> > > > > instead.
> > > > >
> > > > > As said, I'm not sure all the changes in this patch are required.
> > > > >
> > > > > Please elaborate.
> > > > >
> > > > > Thanks,
> > > > > Richard.
> > > > >
> > > > > >  	    || !useless_type_conversion_p (lhs_type, rhs3_type)
> > > > > >  	    || maybe_lt (GET_MODE_SIZE (element_mode
> (rhs3_type)),
> > > > > >  			 2 * GET_MODE_SIZE (element_mode
> (rhs1_type))))
> > > > > diff --git
> > > > > > a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index
> > > > > >
> > > > >
> > >
> 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..cb8f5fbb6abca181c4171194d1
> > > > > 9f
> > > > > > ec29ec6e4176 100644
> > > > > > --- a/gcc/tree-vect-loop.c
> > > > > > +++ b/gcc/tree-vect-loop.c
> > > > > > @@ -6401,6 +6401,33 @@ build_vect_cond_expr (enum tree_code
> > > code,
> > > > > tree vop[3], tree mask,
> > > > > >      }
> > > > > >  }
> > > > > >
> > > > > > +/* Determine the optab_subtype to use for the given CODE and
> STMT.
> > > > > For
> > > > > > +   most CODE this will be optab_vector, however for certain
> > > > > > + operations
> > > > > such as
> > > > > > +   DOT_PROD_EXPR where the operation can different signs for
> > > > > > + the
> > > > > operands we
> > > > > > +   need to be able to pick the right optabs.  */
> > > > > > +
> > > > > > +static enum optab_subtype
> > > > > > +vect_determine_dot_kind (tree_code code, stmt_vec_info
> > > > > > +stmt_vinfo) {
> > > > > > +  enum optab_subtype subtype = optab_vector;
> > > > > > +  switch (code)
> > > > > > +    {
> > > > > > +      case DOT_PROD_EXPR:
> > > > > > +	{
> > > > > > +	  gassign *stmt = as_a <gassign *> (STMT_VINFO_STMT
> (stmt_vinfo));
> > > > > > +	  signop rhs1_sign = TYPE_SIGN (TREE_TYPE
> > > > > > +(gimple_assign_rhs1
> > > > > (stmt)));
> > > > > > +	  signop rhs2_sign = TYPE_SIGN (TREE_TYPE
> > > > > > +(gimple_assign_rhs2
> > > > > (stmt)));
> > > > > > +	  if (rhs1_sign != rhs2_sign)
> > > > > > +	    subtype |= optab_unsigned_to_signed;
> > > > > > +	  break;
> > > > > > +	}
> > > > > > +      default:
> > > > > > +	break;
> > > > > > +    }
> > > > > > +
> > > > > > +  return subtype;
> > > > > > +}
> > > > > > +
> > > > > >  /* Function vectorizable_reduction.
> > > > > >
> > > > > >     Check if STMT_INFO performs a reduction operation that can
> > > > > > be
> > > > > vectorized.
> > > > > > @@ -7189,7 +7216,8 @@ vectorizable_reduction (loop_vec_info
> > > > > loop_vinfo,
> > > > > >        bool ok = true;
> > > > > >
> > > > > >        /* 4.1. check support for the operation in the loop  */
> > > > > > -      optab optab = optab_for_tree_code (code, vectype_in,
> > > optab_vector);
> > > > > > +      enum optab_subtype subtype = vect_determine_dot_kind
> > > > > > + (code,
> > > > > stmt_info);
> > > > > > +      optab optab = optab_for_tree_code (code, vectype_in,
> > > > > > + subtype);
> > > > > >        if (!optab)
> > > > > >  	{
> > > > > >  	  if (dump_enabled_p ())
> > > > > > diff --git a/gcc/tree-vect-patterns.c
> > > > > > b/gcc/tree-vect-patterns.c index
> > > > > >
> > > > >
> > >
> 441d6cd28c4eaded7abd756164890dbcffd2f3b8..943c001fb13777b4d1513841f
> > > > > a84
> > > > > > 942316846d5e 100644
> > > > > > --- a/gcc/tree-vect-patterns.c
> > > > > > +++ b/gcc/tree-vect-patterns.c
> > > > > > @@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info
> > > > > > *vinfo, tree
> > > > > > var)  static bool  vect_supportable_direct_optab_p (vec_info
> > > > > > *vinfo, tree otype, tree_code code,
> > > > > >  				 tree itype, tree *vecotype_out,
> > > > > > -				 tree *vecitype_out = NULL)
> > > > > > +				 tree *vecitype_out = NULL,
> > > > > > +				 enum optab_subtype subtype =
> > > > > optab_default)
> > > > > >  {
> > > > > >    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
> > > > > >    if (!vecitype)
> > > > > > @@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info
> > > > > > *vinfo,
> > > > > tree otype, tree_code code,
> > > > > >    if (!vecotype)
> > > > > >      return false;
> > > > > >
> > > > > > -  optab optab = optab_for_tree_code (code, vecitype,
> > > > > > optab_default);
> > > > > > +  optab optab = optab_for_tree_code (code, vecitype,
> > > > > > + subtype);
> > > > > >    if (!optab)
> > > > > >      return false;
> > > > > >
> > > > > > @@ -487,14 +488,31 @@ vect_joust_widened_integer (tree type,
> > > > > > bool shift_p, tree op,  }
> > > > > >
> > > > > >  /* Return true if the common supertype of NEW_TYPE and
> > > > > *COMMON_TYPE
> > > > > > -   is narrower than type, storing the supertype in *COMMON_TYPE
> if
> > > so.
> > > > > */
> > > > > > +   is narrower than type, storing the supertype in
> > > > > > + *COMMON_TYPE if
> > > so.
> > > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then accept that
> > > *COMMON_TYPE
> > > > > and NEW_TYPE
> > > > > > +   may be of different signs but equal precision.   */
> > > > > >
> > > > > >  static bool
> > > > > > -vect_joust_widened_type (tree type, tree new_type, tree
> > > > > *common_type)
> > > > > > +vect_joust_widened_type (tree type, tree new_type, tree
> > > > > *common_type,
> > > > > > +			 bool allow_short_sign_mismatch = false)
> > > > > >  {
> > > > > >    if (types_compatible_p (*common_type, new_type))
> > > > > >      return true;
> > > > > >
> > > > > > +  /* Check if the mismatch is only in the sign and if we have
> > > > > > +     allow_short_sign_mismatch then allow it.  */
> > > > > > +  if (allow_short_sign_mismatch
> > > > > > +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> > > > > > +    {
> > > > > > +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> > > > > > +      tree eq_type
> > > > > > +	= build_nonstandard_integer_type (TYPE_PRECISION
> (new_type),
> > > > > > +					  sign);
> > > > > > +
> > > > > > +      if (types_compatible_p (*common_type, eq_type))
> > > > > > +	return true;
> > > > > > +    }
> > > > > > +
> > > > > >    /* See if *COMMON_TYPE can hold all values of NEW_TYPE.  */
> > > > > >    if ((TYPE_PRECISION (new_type) < TYPE_PRECISION
> (*common_type))
> > > > > >        && (TYPE_UNSIGNED (new_type) || !TYPE_UNSIGNED
> > > > > (*common_type)))
> > > > > > @@ -532,6 +550,9 @@ vect_joust_widened_type (tree type, tree
> > > > > new_type, tree *common_type)
> > > > > >     to a type that (a) is narrower than the result of STMT_INFO and
> > > > > >     (b) can hold all leaf operand values.
> > > > > >
> > > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then allow that the signs of
> > > > > > + the
> > > > > operands
> > > > > > +   may differ in signs but not in precision.
> > > > > > +
> > > > > >     Return 0 if STMT_INFO isn't such a tree, or if no such
> COMMON_TYPE
> > > > > >     exists.  */
> > > > > >
> > > > > > @@ -539,7 +560,8 @@ static unsigned int  vect_widened_op_tree
> > > > > > (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> > > > > >  		      tree_code widened_code, bool shift_p,
> > > > > >  		      unsigned int max_nops,
> > > > > > -		      vect_unpromoted_value *unprom, tree
> *common_type)
> > > > > > +		      vect_unpromoted_value *unprom, tree
> *common_type,
> > > > > > +		      bool allow_short_sign_mismatch = false)
> > > > > >  {
> > > > > >    /* Check for an integer operation with the right code.  */
> > > > > >    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
> > > > > > @@
> > > > > > -600,7
> > > > > > +622,8 @@ vect_widened_op_tree (vec_info *vinfo,
> stmt_vec_info
> > > > > stmt_info, tree_code code,
> > > > > >  		= vinfo->lookup_def (this_unprom->op);
> > > > > >  	      nops = vect_widened_op_tree (vinfo, def_stmt_info,
> code,
> > > > > >  					   widened_code, shift_p,
> max_nops,
> > > > > > -					   this_unprom,
> common_type);
> > > > > > +					   this_unprom,
> common_type,
> > > > > > +
> allow_short_sign_mismatch);
> > > > > >  	      if (nops == 0)
> > > > > >  		return 0;
> > > > > >
> > > > > > @@ -617,7 +640,8 @@ vect_widened_op_tree (vec_info *vinfo,
> > > > > stmt_vec_info stmt_info, tree_code code,
> > > > > >  	      if (i == 0)
> > > > > >  		*common_type = this_unprom->type;
> > > > > >  	      else if (!vect_joust_widened_type (type, this_unprom-
> >type,
> > > > > > -						 common_type))
> > > > > > +						 common_type,
> > > > > > +
> allow_short_sign_mismatch))
> > > > > >  		return 0;
> > > > > >  	    }
> > > > > >  	}
> > > > > > @@ -888,21 +912,24 @@ vect_reassociating_reduction_p (vec_info
> > > > > > *vinfo,
> > > > > >
> > > > > >     Try to find the following pattern:
> > > > > >
> > > > > > -     type x_t, y_t;
> > > > > > +     type1a x_t
> > > > > > +     type1b y_t;
> > > > > >       TYPE1 prod;
> > > > > >       TYPE2 sum = init;
> > > > > >     loop:
> > > > > >       sum_0 = phi <init, sum_1>
> > > > > >       S1  x_t = ...
> > > > > >       S2  y_t = ...
> > > > > > -     S3  x_T = (TYPE1) x_t;
> > > > > > -     S4  y_T = (TYPE1) y_t;
> > > > > > +     S3  x_T = (TYPE3) x_t;
> > > > > > +     S4  y_T = (TYPE4) y_t;
> > > > > >       S5  prod = x_T * y_T;
> > > > > >       [S6  prod = (TYPE2) prod;  #optional]
> > > > > >       S7  sum_1 = prod + sum_0;
> > > > > >
> > > > > > -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2'
> is
> > > the
> > > > > > -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> > > > > > +   where 'TYPE1' is exactly double the size of type 'type1a' and
> 'type1b',
> > > > > > +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the
> sign of
> > > > > > +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1'
> or
> > > > > > +   bigger and must be the same sign. This is a special case
> > > > > > + of a reduction
> > > > > >     computation.
> > > > > >
> > > > > >     Input:
> > > > > > @@ -939,15 +966,16 @@ vect_recog_dot_prod_pattern (vec_info
> > > > > > *vinfo,
> > > > > >
> > > > > >    /* Look for the following pattern
> > > > > >            DX = (TYPE1) X;
> > > > > > -          DY = (TYPE1) Y;
> > > > > > +	  DY = (TYPE2) Y;
> > > > > >            DPROD = DX * DY;
> > > > > > -          DDPROD = (TYPE2) DPROD;
> > > > > > +	  DDPROD = (TYPE3) DPROD;
> > > > > >            sum_1 = DDPROD + sum_0;
> > > > > >       In which
> > > > > >       - DX is double the size of X
> > > > > >       - DY is double the size of Y
> > > > > >       - DX, DY, DPROD all have the same type but the sign
> > > > > > -       between DX, DY and DPROD can differ.
> > > > > > +       between DX, DY and DPROD can differ. The sign of DPROD
> > > > > > +       is one of the signs of DX or DY.
> > > > > >       - sum is the same size of DPROD or bigger
> > > > > >       - sum has been recognized as a reduction variable.
> > > > > >
> > > > > > @@ -986,14 +1014,41 @@ vect_recog_dot_prod_pattern (vec_info
> > > *vinfo,
> > > > > >       inside the loop (in case we are analyzing an outer-loop).  */
> > > > > >    vect_unpromoted_value unprom0[2];
> > > > > >    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
> > > > > WIDEN_MULT_EXPR,
> > > > > > -			     false, 2, unprom0, &half_type))
> > > > > > +			     false, 2, unprom0, &half_type, true))
> > > > > >      return NULL;
> > > > > >
> > > > > > +  /* Check to see if there is a sign change happening in the
> > > > > > + operands of
> > > > > the
> > > > > > +     multiplication and pick the appropriate optab subtype.
> > > > > > +*/
> > > > > > +  enum optab_subtype subtype;
> > > > > > +  tree rhs_type1 = unprom0[0].type;
> > > > > > +  tree rhs_type2 = unprom0[1].type;
> > > > > > +  if (TYPE_SIGN (rhs_type1) == TYPE_SIGN (rhs_type2))
> > > > > > +     subtype = optab_default;
> > > > > > +  else if (TYPE_SIGN (rhs_type1) == SIGNED
> > > > > > +	   && TYPE_SIGN (rhs_type2) == UNSIGNED)
> > > > > > +     subtype = optab_signed_to_unsigned;
> > > > > > +  else if (TYPE_SIGN (rhs_type1) == UNSIGNED
> > > > > > +	   && TYPE_SIGN (rhs_type2) == SIGNED)
> > > > > > +     subtype = optab_unsigned_to_signed;
> > > > > > +  else
> > > > > > +    gcc_unreachable ();
> > > > > > +
> > > > > > +  /* If we have a sign changing dot product we need to check that
> the
> > > > > > +     promoted type if unsigned has at least the same
> > > > > > + precision as the
> > > final
> > > > > > +     type of the dot-product.  */
> > > > > > +  if (subtype != optab_default)
> > > > > > +    {
> > > > > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > > > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > > > > +	  && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > > > > > +	return NULL;
> > > > > > +    }
> > > > > > +
> > > > > >    vect_pattern_detected ("vect_recog_dot_prod_pattern",
> > > > > > last_stmt);
> > > > > >
> > > > > >    tree half_vectype;
> > > > > >    if (!vect_supportable_direct_optab_p (vinfo, type,
> > > > > > DOT_PROD_EXPR,
> > > > > half_type,
> > > > > > -					type_out, &half_vectype))
> > > > > > +					type_out, &half_vectype,
> subtype))
> > > > > >      return NULL;
> > > > > >
> > > > > >    /* Get the inputs in the appropriate types.  */ @@ -1002,8
> > > > > > +1057,22 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> > > > > >  		       unprom0, half_vectype);
> > > > > >
> > > > > >    var = vect_recog_temp_ssa_var (type, NULL);
> > > > > > +
> > > > > > +  /* If we have a sign changing dot-product the dot-product
> > > > > > + itself does
> > > any
> > > > > > +     sign conversions, so consume the type and use the
> > > > > > + unpromoted types.  */  tree mult_arg1, mult_arg2;  if
> > > > > > + (subtype ==
> > > > > > + optab_default)
> > > > > > +    {
> > > > > > +      mult_arg1 = mult_oprnd[0];
> > > > > > +      mult_arg2 = mult_oprnd[1];
> > > > > > +    }
> > > > > > +  else
> > > > > > +    {
> > > > > > +      mult_arg1 = unprom0[0].op;
> > > > > > +      mult_arg2 = unprom0[1].op;
> > > > > > +    }
> > > > > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > > > > -				      mult_oprnd[0], mult_oprnd[1],
> oprnd1);
> > > > > > +				      mult_arg1, mult_arg2, oprnd1);
> > > > > >
> > > > > >    return pattern_stmt;
> > > > > >  }
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> > > > > Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF:
> > > > > Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > > Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: rb14433.patch --]
[-- Type: text/x-diff; name="rb14433.patch", Size: 17866 bytes --]

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..9fad3322b3f1eb2a836833bb390df78f0cd9734b 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5438,13 +5438,55 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
 
 @cindex @code{sdot_prod@var{m}} instruction pattern
 @item @samp{sdot_prod@var{m}}
+
+Compute the sum of the products of two signed elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+sdot<signed c, signed a, signed b> ==
+   res = sign-ext (a) * sign-ext (b) + c
+@dots{}
+@end smallexample
+
 @cindex @code{udot_prod@var{m}} instruction pattern
-@itemx @samp{udot_prod@var{m}}
-Compute the sum of the products of two signed/unsigned elements.
-Operand 1 and operand 2 are of the same mode. Their product, which is of a
-wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
-wider than the mode of the product. The result is placed in operand 0, which
-is of the same mode as operand 3.
+@item @samp{udot_prod@var{m}}
+
+Compute the sum of the products of two unsigned elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+udot<unsigned c, unsigned a, unsigned b> ==
+   res = zero-ext (a) * zero-ext (b) + c
+@dots{}
+@end smallexample
+
+
+
+@cindex @code{usdot_prod@var{m}} instruction pattern
+@item @samp{usdot_prod@var{m}}
+Compute the sum of the products of elements of different signs.
+Operand 1 must be unsigned and operand 2 signed. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+usdot<unsigned c, unsigned a, signed b> ==
+   res = ((unsigned-conv) sign-ext (a)) * zero-ext (b) + c
+@dots{}
+@end smallexample
 
 @cindex @code{ssad@var{m}} instruction pattern
 @item @samp{ssad@var{m}}
diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
index c3aaa1a416991e856d3e24da45968a92ebada82c..fbd2b06b8dbfd560dfb66b314830e6b564b37abb 100644
--- a/gcc/optabs-tree.h
+++ b/gcc/optabs-tree.h
@@ -29,7 +29,8 @@ enum optab_subtype
 {
   optab_default,
   optab_scalar,
-  optab_vector
+  optab_vector,
+  optab_vector_mixed_sign
 };
 
 /* Return the optab used for computing the given operation on the type given by
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 95ffe397c23e80c105afea52e9d47216bf52f55a..eeb5aeed3202cc6971b6447994bc5311e9c010bb 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -127,7 +127,12 @@ optab_for_tree_code (enum tree_code code, const_tree type,
       return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
 
     case DOT_PROD_EXPR:
-      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
+      {
+	if (subtype == optab_vector_mixed_sign)
+	  return usdot_prod_optab;
+
+	return (TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab);
+      }
 
     case SAD_EXPR:
       return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
diff --git a/gcc/optabs.c b/gcc/optabs.c
index f4614a394587787293dc8b680a38901f7906f61c..d9b64441d0e0726afee89dc9c937350451e7670d 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   bool sbool = false;
 
   oprnd0 = ops->op0;
+  if (nops >= 2)
+    oprnd1 = ops->op1;
+  if (nops >= 3)
+    oprnd2 = ops->op2;
+
   tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
   if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
       || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
@@ -285,6 +290,27 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
 	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
       sbool = true;
     }
+  else if (ops->code == DOT_PROD_EXPR)
+    {
+      enum optab_subtype subtype = optab_default;
+      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
+      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
+      if (sign1 == sign2)
+	;
+      else if (sign1 == SIGNED && sign2 == UNSIGNED)
+	{
+	  subtype = optab_vector_mixed_sign;
+	  /* Same as optab_vector_mixed_sign but flip the operands.  */
+	  std::swap (op0, op1);
+	}
+      else if (sign1 == UNSIGNED && sign2 == SIGNED)
+	subtype = optab_vector_mixed_sign;
+      else
+	gcc_unreachable ();
+
+      widen_pattern_optab
+	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
+    }
   else
     widen_pattern_optab
       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
@@ -298,10 +324,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   gcc_assert (icode != CODE_FOR_nothing);
 
   if (nops >= 2)
-    {
-      oprnd1 = ops->op1;
-      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
-    }
+    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
   else if (sbool)
     {
       nops = 2;
@@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
     {
       gcc_assert (tmode1 == tmode0);
       gcc_assert (op1);
-      oprnd2 = ops->op2;
       wmode = TYPE_MODE (TREE_TYPE (oprnd2));
     }
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbdb7c18615baae928 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
 OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
 OPTAB_D (udot_prod_optab, "udot_prod$I$a")
+OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
 OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..13e405edd765dde704c64348d2d0b3cd88f0af7c 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -4421,7 +4421,9 @@ verify_gimple_assign_ternary (gassign *stmt)
 		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
 		 || (!INTEGRAL_TYPE_P (lhs_type)
 		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
-	    || !types_compatible_p (rhs1_type, rhs2_type)
+	    || (!types_compatible_p (rhs1_type, rhs2_type)
+		&& TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type)
+		&& TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (rhs2_type))
 	    || !useless_type_conversion_p (lhs_type, rhs3_type)
 	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
 			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..65ee04d2e481c8b34d3fb223e802e3923a766502 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6401,6 +6401,32 @@ build_vect_cond_expr (enum tree_code code, tree vop[3], tree mask,
     }
 }
 
+/* Determine the optab_subtype to use for the given CODE and STMT.  For
+   most CODE this will be optab_vector, however for certain operations such as
+   DOT_PROD_EXPR where the operation can have different signs for the operands
+   we need to be able to pick the right optabs.  */
+
+static enum optab_subtype
+vect_determine_dot_kind (tree_code code, stmt_vec_info stmt_vinfo)
+{
+  switch (code)
+    {
+      case DOT_PROD_EXPR:
+	{
+	  gassign *stmt = as_a <gassign *> (STMT_VINFO_STMT (stmt_vinfo));
+	  signop rhs1_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)));
+	  signop rhs2_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt)));
+	  if (rhs1_sign != rhs2_sign)
+	    return optab_vector_mixed_sign;
+	  break;
+	}
+      default:
+	break;
+    }
+
+  return optab_vector;
+}
+
 /* Function vectorizable_reduction.
 
    Check if STMT_INFO performs a reduction operation that can be vectorized.
@@ -7189,7 +7215,8 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       bool ok = true;
 
       /* 4.1. check support for the operation in the loop  */
-      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
+      enum optab_subtype subtype = vect_determine_dot_kind (code, stmt_info);
+      optab optab = optab_for_tree_code (code, vectype_in, subtype);
       if (!optab)
 	{
 	  if (dump_enabled_p ())
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 441d6cd28c4eaded7abd756164890dbcffd2f3b8..9b193cd261dc654f0a3192a9b43f4d2b00ea97e0 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
 static bool
 vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
 				 tree itype, tree *vecotype_out,
-				 tree *vecitype_out = NULL)
+				 tree *vecitype_out = NULL,
+				 enum optab_subtype subtype = optab_default)
 {
   tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
   if (!vecitype)
@@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
   if (!vecotype)
     return false;
 
-  optab optab = optab_for_tree_code (code, vecitype, optab_default);
+  optab optab = optab_for_tree_code (code, vecitype, subtype);
   if (!optab)
     return false;
 
@@ -487,10 +488,14 @@ vect_joust_widened_integer (tree type, bool shift_p, tree op,
 }
 
 /* Return true if the common supertype of NEW_TYPE and *COMMON_TYPE
-   is narrower than type, storing the supertype in *COMMON_TYPE if so.  */
+   is narrower than type, storing the supertype in *COMMON_TYPE if so.
+   If UNPROM_TYPE is given, accept that *COMMON_TYPE and NEW_TYPE may be of
+   different signs but equal precision, provided that the result of
+   multiplying them is compatible with UNPROM_TYPE.  */
 
 static bool
-vect_joust_widened_type (tree type, tree new_type, tree *common_type)
+vect_joust_widened_type (tree type, tree new_type, tree *common_type,
+			 tree unprom_type = NULL)
 {
   if (types_compatible_p (*common_type, new_type))
     return true;
@@ -514,7 +519,25 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
   unsigned int precision = MAX (TYPE_PRECISION (*common_type),
 				TYPE_PRECISION (new_type));
   precision *= 2;
-  if (precision * 2 > TYPE_PRECISION (type))
+
+  /* Check if the mismatch is only in the sign and if we have
+     allow_short_sign_mismatch then allow it.  */
+  if (unprom_type
+      && TYPE_SIGN (unprom_type) == SIGNED
+      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
+    {
+      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
+      tree eq_type
+	= build_nonstandard_integer_type (TYPE_PRECISION (new_type),
+					  sign);
+
+      if (types_compatible_p (*common_type, eq_type))
+	return true;
+    }
+
+  /* The resulting application is unsigned, check if we have enough
+     precision to perform the operation.  */
+  if (precision * 2 > TYPE_PRECISION (unprom_type ? unprom_type : type))
     return false;
 
   *common_type = build_nonstandard_integer_type (precision, false);
@@ -532,6 +555,10 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
    to a type that (a) is narrower than the result of STMT_INFO and
    (b) can hold all leaf operand values.
 
+   If UNPROM_TYPE is given, allow the operands to differ in sign but not
+   in precision, provided that the resulting type of the operation on the
+   operands is compatible with UNPROM_TYPE.
+
    Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
    exists.  */
 
@@ -539,7 +566,8 @@ static unsigned int
 vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		      tree_code widened_code, bool shift_p,
 		      unsigned int max_nops,
-		      vect_unpromoted_value *unprom, tree *common_type)
+		      vect_unpromoted_value *unprom, tree *common_type,
+		      tree unprom_type = NULL)
 {
   /* Check for an integer operation with the right code.  */
   gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
@@ -600,7 +628,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		= vinfo->lookup_def (this_unprom->op);
 	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
 					   widened_code, shift_p, max_nops,
-					   this_unprom, common_type);
+					   this_unprom, common_type,
+					   unprom_type);
 	      if (nops == 0)
 		return 0;
 
@@ -617,7 +646,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 	      if (i == 0)
 		*common_type = this_unprom->type;
 	      else if (!vect_joust_widened_type (type, this_unprom->type,
-						 common_type))
+						 common_type, unprom_type))
 		return 0;
 	    }
 	}
@@ -799,12 +828,15 @@ vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
 }
 
 /* Invoke vect_convert_input for N elements of UNPROM and store the
-   result in the corresponding elements of RESULT.  */
+   result in the corresponding elements of RESULT.
+
+   If ALLOW_SHORT_SIGN_MISMATCH then don't convert the types if they only
+   differ by sign.  */
 
 static void
 vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
 		     tree *result, tree type, vect_unpromoted_value *unprom,
-		     tree vectype)
+		     tree vectype, bool allow_short_sign_mismatch = false)
 {
   for (unsigned int i = 0; i < n; ++i)
     {
@@ -812,8 +844,13 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
       for (j = 0; j < i; ++j)
 	if (unprom[j].op == unprom[i].op)
 	  break;
+      bool only_sign = allow_short_sign_mismatch
+		       && TYPE_SIGN (type) != TYPE_SIGN (unprom[i].type)
+		       && TYPE_PRECISION (type) == TYPE_PRECISION (unprom[i].type);
       if (j < i)
 	result[i] = result[j];
+      else if (only_sign)
+	result[i] = unprom[i].op;
       else
 	result[i] = vect_convert_input (vinfo, stmt_info,
 					type, &unprom[i], vectype);
@@ -888,21 +925,24 @@ vect_reassociating_reduction_p (vec_info *vinfo,
 
    Try to find the following pattern:
 
-     type x_t, y_t;
+     type1a x_t
+     type1b y_t;
      TYPE1 prod;
      TYPE2 sum = init;
    loop:
      sum_0 = phi <init, sum_1>
      S1  x_t = ...
      S2  y_t = ...
-     S3  x_T = (TYPE1) x_t;
-     S4  y_T = (TYPE1) y_t;
+     S3  x_T = (TYPE3) x_t;
+     S4  y_T = (TYPE4) y_t;
      S5  prod = x_T * y_T;
      [S6  prod = (TYPE2) prod;  #optional]
      S7  sum_1 = prod + sum_0;
 
-   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
-   same size of 'TYPE1' or bigger. This is a special case of a reduction
+   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
+   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
+   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
+   bigger and must be the same sign. This is a special case of a reduction
    computation.
 
    Input:
@@ -939,15 +979,16 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
 
   /* Look for the following pattern
           DX = (TYPE1) X;
-          DY = (TYPE1) Y;
+	  DY = (TYPE2) Y;
           DPROD = DX * DY;
-          DDPROD = (TYPE2) DPROD;
+	  DDPROD = (TYPE3) DPROD;
           sum_1 = DDPROD + sum_0;
      In which
      - DX is double the size of X
      - DY is double the size of Y
      - DX, DY, DPROD all have the same type but the sign
-       between DX, DY and DPROD can differ.
+       between DX, DY and DPROD can differ. The sign of DPROD
+       is one of the signs of DX or DY.
      - sum is the same size of DPROD or bigger
      - sum has been recognized as a reduction variable.
 
@@ -986,20 +1027,29 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
      inside the loop (in case we are analyzing an outer-loop).  */
   vect_unpromoted_value unprom0[2];
   if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
-			     false, 2, unprom0, &half_type))
+			     false, 2, unprom0, &half_type,
+			     TREE_TYPE (unprom_mult.op)))
     return NULL;
 
+  /* Check to see if there is a sign change happening in the operands of the
+     multiplication and pick the appropriate optab subtype.  */
+  enum optab_subtype subtype;
+  if (TYPE_SIGN (unprom0[0].type) == TYPE_SIGN (unprom0[1].type))
+    subtype = optab_default;
+  else
+    subtype = optab_vector_mixed_sign;
+
   vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
 
   tree half_vectype;
   if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
-					type_out, &half_vectype))
+					type_out, &half_vectype, subtype))
     return NULL;
 
   /* Get the inputs in the appropriate types.  */
   tree mult_oprnd[2];
   vect_convert_inputs (vinfo, stmt_vinfo, 2, mult_oprnd, half_type,
-		       unprom0, half_vectype);
+		       unprom0, half_vectype, true);
 
   var = vect_recog_temp_ssa_var (type, NULL);
   pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,



* RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
  2021-05-25 14:57           ` Tamar Christina
@ 2021-05-26  8:56             ` Richard Biener
  2021-06-02  9:28               ` Tamar Christina
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Biener @ 2021-05-26  8:56 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, richard.sandiford

On Tue, 25 May 2021, Tamar Christina wrote:

> Hi Richi,
> 
> Here's a respun version of the patch.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

index 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..13e405edd765dde704c64348d2d0b3cd88f0af7c 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -4421,7 +4421,9 @@ verify_gimple_assign_ternary (gassign *stmt)
                  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
                 || (!INTEGRAL_TYPE_P (lhs_type)
                     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
-           || !types_compatible_p (rhs1_type, rhs2_type)
+           || (!types_compatible_p (rhs1_type, rhs2_type)
+               && TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type)
+               && TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (rhs2_type))

I think this doesn't capture the constraints - instead please do

-           || !types_compatible_p (rhs1_type, rhs2_type)
+           /* rhs1_type and rhs2_type may differ in sign.  */
+           || !tree_nop_conversion_p (rhs1_type, rhs2_type)


+/* Determine the optab_subtype to use for the given CODE and STMT.  For
+   most CODE this will be optab_vector, however for certain operations such as
+   DOT_PROD_EXPR where the operation can have different signs for the operands
+   we need to be able to pick the right optabs.  */
+
+static enum optab_subtype
+vect_determine_dot_kind (tree_code code, stmt_vec_info stmt_vinfo)

vect_determine_optab_subkind would be a better name.  'code' is
redundant (or should at least match the code of stmt_vinfo->stmt).  I wonder
if it might be clearer to compute the subtype where we compute 'code'
and the relation to stmt_info is obvious, I mean here:

  /* 3. Check the operands of the operation.  The first operands are defined
        inside the loop body. The last operand is the reduction variable,
        which is defined by the loop-header-phi.  */

  tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
  STMT_VINFO_REDUC_VECTYPE (reduc_info) = vectype_out;
  gassign *stmt = as_a <gassign *> (stmt_info->stmt);
  enum tree_code code = gimple_assign_rhs_code (stmt);
  bool lane_reduc_code_p
    = (code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code == SAD_EXPR);

so just add

  enum optab_subtype optab_query_kind = optab_vector;
  if (code == DOT_PROD_EXPR
      && <sign test>)
    optab_query_kind = optab_vector_mixed_sign;

in this place and avoid adding the new function?

I'm not too familiar with the pattern recog code, so a second pair of eyes
would be preferred (Richard?), but

+  /* Check if the mismatch is only in the sign and if we have
+     allow_short_sign_mismatch then allow it.  */
+  if (unprom_type
+      && TYPE_SIGN (unprom_type) == SIGNED
+      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
+    {
+      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
+      tree eq_type
+       = build_nonstandard_integer_type (TYPE_PRECISION (new_type),
+                                         sign);
+
+      if (types_compatible_p (*common_type, eq_type))
+       return true;
+    }

looks somewhat complicated - is that equal to

  if (unprom_type
      && tree_nop_conversion_p (*common_type, new_type))
    return true;

?  That is, *common_type and new_type only differ in sign?
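
As a rough way to compare the two forms, the sketch below models both
conditions in plain C.  It is an approximation under stated assumptions, not
GCC code: `struct int_type` is a hypothetical type descriptor, a NULL pointer
stands for "no UNPROM_TYPE", and a return value of false only means "fall
through to the remaining checks", not "reject".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GCC integer type: precision and sign only.  */
struct int_type
{
  unsigned precision;
  bool is_unsigned;
};

/* Model of the posted early exit: UNPROM_TYPE must be signed, the two
   inputs must differ in sign, and *COMMON_TYPE must match a type built
   from NEW_TYPE's precision and *COMMON_TYPE's sign.  */
bool
posted_joust_early_accept (const struct int_type *unprom_type,
			   struct int_type common_type,
			   struct int_type new_type)
{
  if (unprom_type == NULL || unprom_type->is_unsigned)
    return false;
  if (common_type.is_unsigned == new_type.is_unsigned)
    return false;

  struct int_type eq_type = { new_type.precision, common_type.is_unsigned };
  return (common_type.precision == eq_type.precision
	  && common_type.is_unsigned == eq_type.is_unsigned);
}

/* Model of the suggested early exit: UNPROM_TYPE present and the two
   inputs have the same precision (a no-op conversion in this model).  */
bool
suggested_joust_early_accept (const struct int_type *unprom_type,
			      struct int_type common_type,
			      struct int_type new_type)
{
  return unprom_type != NULL && common_type.precision == new_type.precision;
}
```

In this model the two agree whenever UNPROM_TYPE is signed and the inputs are
not already compatible; they differ when UNPROM_TYPE is unsigned, where only
the shorter form accepts.  Whether that extra signedness requirement is still
needed is a question for the real code, not something the sketch settles.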

@@ -812,8 +844,13 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
       for (j = 0; j < i; ++j)
        if (unprom[j].op == unprom[i].op)
          break;
+      bool only_sign = allow_short_sign_mismatch
+                      && TYPE_SIGN (type) != TYPE_SIGN (unprom[i].type)
+                      && TYPE_PRECISION (type) == TYPE_PRECISION (unprom[i].type);

this could use the same tree_nop_conversion_p predicate.

Otherwise the patch looks good.

Thanks,
Richard.



> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* optabs.def (usdot_prod_optab): New.
> 	* doc/md.texi: Document it and clarify other dot prod optabs.
> 	* optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
> 	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> 	* optabs.c (expand_widen_pattern_expr): Likewise.
> 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> 	(vectorizable_reduction): Query dot-product kind.
> 	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
> 	optab subtype.
> 	(vect_joust_widened_type, vect_widened_op_tree): Optionally ignore
> 	mismatch types.
> 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> 
> 
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Monday, May 10, 2021 2:29 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > where the sign for the multiplicant changes.
> > 
> > On Mon, 10 May 2021, Tamar Christina wrote:
> > 
> > >
> > >
> > > > -----Original Message-----
> > > > From: Richard Biener <rguenther@suse.de>
> > > > Sent: Monday, May 10, 2021 12:40 PM
> > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > > > where the sign for the multiplicant changes.
> > > >
> > > > On Fri, 7 May 2021, Tamar Christina wrote:
> > > >
> > > > > Hi Richi,
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > Sent: Friday, May 7, 2021 12:46 PM
> > > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > > > > Subject: Re: [PATCH 1/4]middle-end Vect: Add support for
> > > > > > dot-product where the sign for the multiplicant changes.
> > > > > >
> > > > > > On Wed, 5 May 2021, Tamar Christina wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > This patch adds support for a dot product where the sign of
> > > > > > > the multiplication arguments differ. i.e. one is signed and
> > > > > > > one is unsigned but the precisions are the same.
> > > > > > >
> > > > > > > #define N 480
> > > > > > > #define SIGNEDNESS_1 unsigned
> > > > > > > #define SIGNEDNESS_2 signed
> > > > > > > #define SIGNEDNESS_3 signed
> > > > > > > #define SIGNEDNESS_4 unsigned
> > > > > > >
> > > > > > > SIGNEDNESS_1 int __attribute__ ((noipa))
> > > > > > > f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> > > > > > >    SIGNEDNESS_4 char *restrict b)
> > > > > > > {
> > > > > > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > > > > >     {
> > > > > > >       int av = a[i];
> > > > > > >       int bv = b[i];
> > > > > > >       SIGNEDNESS_2 short mult = av * bv;
> > > > > > >       res += mult;
> > > > > > >     }
> > > > > > >   return res;
> > > > > > > }
> > > > > > >
> > > > > > > The operations are performed as if the operands were extended
> > > > > > > to a 32-bit value.
> > > > > > > As such this operation isn't valid if there is an intermediate
> > > > > > > conversion to an unsigned value. i.e.  if SIGNEDNESS_2 is unsigned.
> > > > > > >
> > > > > > > Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are
> > > > > > > flipped the same optab is used but the operands are flipped in
> > > > > > > the optab expansion.
> > > > > > >
> > > > > > > To support this the patch extends the dot-product detection to
> > > > > > > optionally ignore operands with different signs and stores
> > > > > > > this information in the optab subtype which is now made a bitfield.
> > > > > > >
> > > > > > > The subtype can now additionally control which optab an EXPR
> > > > > > > can expand to.
> > > > > > >
> > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > > > > >
> > > > > > > Ok for master?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Tamar
> > > > > > >
> > > > > > > gcc/ChangeLog:
> > > > > > >
> > > > > > > 	* optabs.def (usdot_prod_optab): New.
> > > > > > > 	* doc/md.texi: Document it.
> > > > > > > 	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> > > > > > > 	* optabs-tree.h (enum optab_subtype): Likewise.
> > > > > > > 	* optabs.c (expand_widen_pattern_expr): Likewise.
> > > > > > > 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > > > > > > 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> > > > > > > 	(vectorizable_reduction): Query dot-product kind.
> > > > > > > 	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
> > > > > > > 	optab subtype.
> > > > > > > 	(vect_joust_widened_type, vect_widened_op_tree): Optionally ignore
> > > > > > > 	mismatch types.
> > > > > > > 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> > > > > > >
> > > > > > > --- inline copy of patch --
> > > > > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > > > > > > index d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fdf2e66bc80d7d23 100644
> > > > > > > --- a/gcc/doc/md.texi
> > > > > > > +++ b/gcc/doc/md.texi
> > > > > > > @@ -5440,11 +5440,13 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
> > > > > > >  @item @samp{sdot_prod@var{m}}
> > > > > > >  @cindex @code{udot_prod@var{m}} instruction pattern
> > > > > > >  @itemx @samp{udot_prod@var{m}}
> > > > > > > +@cindex @code{usdot_prod@var{m}} instruction pattern
> > > > > > > +@itemx @samp{usdot_prod@var{m}}
> > > > > > >  Compute the sum of the products of two signed/unsigned elements.
> > > > > > > -Operand 1 and operand 2 are of the same mode. Their product, which is of a
> > > > > > > -wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
> > > > > > > -wider than the mode of the product. The result is placed in operand 0, which
> > > > > > > -is of the same mode as operand 3.
> > > > > > > +Operand 1 and operand 2 are of the same mode but may differ in signs.
> > > > > > > +Their product, which is of a wider mode, is computed and added to operand 3.
> > > > > > > +Operand 3 is of a mode equal or wider than the mode of the product.
> > > > > > > +The result is placed in operand 0, which is of the same mode as operand 3.
> > > > > >
> > > > > > This doesn't really say what the 's', 'u' and 'us' specify.
> > > > > > Since we're doing a widen multiplication and then a non-widening
> > > > > > addition we only need to know the effective sign of the
> > > > > > multiplication so I think the existing 's' and 'u'
> > > > > > are enough to cover all cases?
> > > > >
> > > > > The existing 's' and 'u' enforce that both operands of the
> > > > > multiplication are of the same sign.  So for e.g. 'u' both operands
> > > > > must be unsigned.
> > > > >
> > > > > In the `us` case one can be signed and one unsigned. Operationally
> > > > > this does a sign extension to the wider type for the signed value,
> > > > > and the unsigned value gets zero extended first, and then converts
> > > > > it to unsigned to perform the unsigned multiplication, conforming
> > > > > to the C promotion rules.
> > > > >
> > > > > TL;DR; Without a new optab I can't tell during expansion which
> > > > > semantic the operation had at the gimple/C level as modes don't carry
> > > > > signs.
> > > > >
> > > > > Long version:
> > > > >
> > > > > The problem with using the existing patterns, because of their
> > > > > enforcement of `av` and `bv` being the same sign is that we can't
> > > > > remove the explicit sign extensions, but the multiplication must
> > > > > be done on the sign/zero extended char input in the same sign.
> > > > >
> > > > > Which means (unless I am mistaken) to get the correct result, you
> > > > > can use neither `udot` nor `sdot` as semantically these would
> > > > > zero or sign extend both operands from char to int to perform the
> > > > > multiplication in the same sign.  Whereas in this case, one
> > > > > parameter is zero extended and one parameter is sign extended and
> > > > > the result is always an unsigned number.
> > > > >
> > > > > So basically
> > > > >
> > > > > udot<unsigned c, unsigned a, unsigned b> ==
> > > > >    c = zero-ext (a) * zero-ext (b)
> > > > > sdot<signed c, signed a, signed b> ==
> > > > >    c = sign-ext (a) * sign-ext (b)
> > > > > usdot<unsigned c, unsigned a, signed b> ==
> > > > >    c = ((unsigned-conv) sign-ext (a)) * zero-ext (b)
> > > > >
> > > > > So semantically the existing optabs won't fit here. udot would
> > > > > internally promote to unsigned types before the multiplication so
> > > > > the result of the multiplication would be wrong.  sdot would
> > > > > promote both to signed and do signed multiplication, so the result
> > > > > is also wrong.
> > > > >
> > > > > Now if I relax the constraint on the signs of udot and sdot there
> > > > > are two problems:
> > > > > RTL Modes don't contain signs.  So a target can't tell me how the
> > > > > operands will be promoted.
> > > > > So:
> > > > >
> > > > > 1) I can't really check which semantics the target will adhere to
> > > > >    on expansion.
> > > > > 2) at expand time I have no way to differentiate between the two
> > > > >    instructions variants, given just modes
> > > > >      I can't tell whether I expand to the normal dot-product or
> > > > >      the new instruction.
> > > >
> > > > Ah, OK.  Indeed with such a weird instruction the new variant makes
> > > > sense.
> > > > Still can you please amend the optab documentation to say which
> > > > operand is unsigned and which is signed?  Just 'may differ in signs'
> > > > is bad.
> > >
> > > Sure, will expand on it.
> > >
> > > >
> > > > Since the multiplication is commutative I wonder why you need to
> > > > handle both signed_to_unsigned and unsigned_to_signed - we should
> > > > just enforce a canonical order (like the optab does).
> > >
> > > Sure, I thought it would have been better to change the order at
> > > expand time, but can do so at detection time.
> > >
> > > > I also think it's a particular bad fit for the bad
> > > > optab_for_tree_code API - would any of that improve when using a
> > > > direct internal function here?
> > >
> > > Somewhat, but this has considerable knock on effects, e.g. currently
> > > DOT_PROD is treated as a widening operation and so is handled by
> > > supportable_widening_operation which does not support calls. There's a
> > > significant number of places which work on the tree EXPR (including
> > > constant folding) which all need to be changed.
> > >
> > > > In particular all the changes around optab_subtype look like they
> > > > make a bad API worse ... at least a single optab_vector_mixed_sign
> > > > should suffice here, no need to make it a flags kind.
> > >
> > > The reason I did so is because depending on where the query is done it
> > > does use different subtypes currently.  During detection it uses
> > > optab_default, and during vectorization optab_vector.  For this
> > > instruction this difference doesn't seem to be used, but did not want to
> > > lose this information in case something depended on it.
> > >
> > > But can make it just one.
> > >
> > > >
> > > > +  /* If we have a sign changing dot product we need to check that the
> > > > +     promoted type if unsigned has at least the same precision as the final
> > > > +     type of the dot-product.  */
> > > > +  if (subtype != optab_default)
> > > > +    {
> > > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > > +         && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > > > +       return NULL;
> > > > +    }
> > > >
> > > > I don't understand this - how do we ever arrive at a result with less
> > > > precision?
> > >
> > > The user could have manually truncated the results, i.e. in the
> > > detection code notice `mult`
> > >
> > >       int av = a[i];
> > >       int bv = b[i];
> > >       SIGNEDNESS_2 short mult = av * bv;
> > >       res += mult;
> > >
> > > which is a short, so it's manually truncating the multiplication which
> > > is done as int by the instruction. If `mult` is unsigned then it will
> > > truncate the result if the signed input to usdot was negative, unless
> > > the intermediate calculation is of the same precision as the
> > > instruction. i.e. if mult is unsigned int then there's no truncation
> > > going on, it's casting from int to unsigned int so it's safe to use
> > > then as the instruction does the same thing internally.
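
The truncation argument above can be checked with a small scalar model.  This
is only a sketch under stated assumptions: `usdot_step` is a hypothetical
one-element model of the instruction (zero-extend one input, sign-extend the
other, multiply and accumulate in 32 bits), and the two `loop_step_*` helpers
mirror the loop body of the test case with a signed or unsigned 16-bit
`mult`.  None of the names are GCC or target APIs.

```c
#include <assert.h>
#include <stdint.h>

/* One element of the mixed-sign dot product: U is zero-extended, S is
   sign-extended, the product is formed in 32 bits and accumulated.  */
uint32_t
usdot_step (uint32_t acc, uint8_t u, int8_t s)
{
  return acc + (uint32_t) ((int32_t) u * (int32_t) s);
}

/* Loop body with a signed 16-bit intermediate (SIGNEDNESS_2 signed).
   The product of two 8-bit values always fits, so nothing is lost.  */
uint32_t
loop_step_signed_short (uint32_t acc, uint8_t u, int8_t s)
{
  int av = u;
  int bv = s;
  int16_t mult = (int16_t) (av * bv);
  return acc + (uint32_t) mult;
}

/* Loop body with an unsigned 16-bit intermediate (SIGNEDNESS_2 unsigned).
   A negative product wraps modulo 65536 before it is widened again.  */
uint32_t
loop_step_unsigned_short (uint32_t acc, uint8_t u, int8_t s)
{
  int av = u;
  int bv = s;
  uint16_t mult = (uint16_t) (av * bv);
  return acc + mult;
}
```

With acc = 10, u = 2 and s = -1, the instruction model and the signed-short
loop both give 8, while the unsigned-short loop gives 65544, which is why the
pattern must not match when the intermediate is a narrower unsigned type.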
> > 
> > It looks to me that we simply should only ever allow sign-changes from
> > multiplication result to the sum.  At least your example above is not special to
> > mixed sign multiplications, no?
> > 
> > > > And why's this not an issue for signed multiplication?
> > >
> > > It is, but in that case it's handled by the type jousting, which
> > > doesn't allow the type mismatch. i.e.
> > >
> > > #define SIGNEDNESS_1 unsigned
> > > #define SIGNEDNESS_2 unsigned
> > > #define SIGNEDNESS_3 signed
> > > #define SIGNEDNESS_4 signed
> > >
> > > SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res,
> > > SIGNEDNESS_3 char *restrict a,
> > >    SIGNEDNESS_4 char *restrict b)
> > > {
> > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > >     {
> > >       int av = a[i];
> > >       int bv = b[i];
> > >       SIGNEDNESS_2 short mult = av * bv;
> > >       res += mult;
> > >     }
> > >   return res;
> > > }
> > >
> > > Is also not detected as a dot product.  By adding the carve out to the
> > > widen multiplication detection it now allows this case through so I
> > > handle it in the detection code.  Thinking about it now, it seems more
> > > logical to add this case handling inside the type jousting code as I
> > > don't think it's ever something you'd want.
> > 
> > Yeah, I think we only need to look through sign changes on the multiplication
> > result.
> > 
> > > > Also...
> > > >
> > > > > +  /* If we have a sign changing dot-product the dot-product itself does any
> > > > > +     sign conversions, so consume the type and use the unpromoted types.  */
> > > > +  tree mult_arg1, mult_arg2;
> > > > +  if (subtype == optab_default)
> > > > +    {
> > > > +      mult_arg1 = mult_oprnd[0];
> > > > +      mult_arg2 = mult_oprnd[1];
> > > > +    }
> > > > +  else
> > > > +    {
> > > > +      mult_arg1 = unprom0[0].op;
> > > > +      mult_arg2 = unprom0[1].op;
> > > > +    }
> > > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > > -                                     mult_oprnd[0], mult_oprnd[1],
> > > > oprnd1);
> > > > +                                     mult_arg1, mult_arg2, oprnd1);
> > > >
> > > > > I thought DOT_PROD always performs the promotion.  Maybe mult_oprnd
> > > > > and unprom0 are just misnamed here?
> > >
> > > > Somewhat. In a normal dot-product the signs of the multiplication
> > > > operands are the same as those of the "unpromoted" types, so after
> > > > vect_convert_input these two types are the same.
> > > >
> > > > However, because the sign changes here, and to maintain the semantics
> > > > of the C code, there's an extra conversion to get the arguments in
> > > > the same sign.  That needs to be stripped before being given to the
> > > > instruction, which does the conversion internally.
> > 
> > Yes, but then why's that not done by the detection code?  That is, does it
> > (mis-)handle the (int)short_a * (int)(unsigned short)short_b where we'd
> > want the mixed-sign handling and not strip the unsigned short conversion
> > from short_b?
> > 
> > Richard.
> > 
> > >
> > > Regards,
> > > Tamar
> > >
> > > >
> > > > Richard.
> > > >
> > > > > Regards,
> > > > > Tamar
> > > > >
> > > > > >
> > > > > > The tree.def docs say the sum is also possibly widening but I
> > > > > > don't see this covered by the optab so we should eventually
> > > > > > remove this feature from the tree side.  In fact the tree-cfg.c
> > > > > > verifier requires the addition to be not widening - thus only
> > > > > > tree.def needs
> > > > adjustment.
> > > > > >
> > > > > > >  @cindex @code{ssad@var{m}} instruction pattern  @item
> > > > > > > @samp{ssad@var{m}} diff --git a/gcc/optabs-tree.h
> > > > > > > b/gcc/optabs-tree.h index
> > > > > > >
> > > > > >
> > > >
> > c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f
> > > > > > 19
> > > > > > > 90e0548ba08d 100644
> > > > > > > --- a/gcc/optabs-tree.h
> > > > > > > +++ b/gcc/optabs-tree.h
> > > > > > > @@ -27,11 +27,29 @@ along with GCC; see the file COPYING3.  If
> > > > > > > not
> > > > see
> > > > > > >     shift amount vs. machines that take a vector for the shift amount.
> > > > > > > */  enum optab_subtype  {
> > > > > > > -  optab_default,
> > > > > > > -  optab_scalar,
> > > > > > > -  optab_vector
> > > > > > > +  optab_default = 1 << 0,
> > > > > > > +  optab_scalar = 1 << 1,
> > > > > > > +  optab_vector = 1 << 2,
> > > > > > > +  optab_signed_to_unsigned = 1 << 3,
> > > > > > > + optab_unsigned_to_signed =
> > > > > > > + 1 << 4
> > > > > > >  };
> > > > > > >
> > > > > > > +/* Override the OrEqual-operator so we can use optab_subtype
> > > > > > > +as a bit flag.  */ inline enum optab_subtype& operator |=
> > > > > > > +(enum
> > > > > > optab_subtype&
> > > > > > > +a, enum optab_subtype b) {
> > > > > > > +    return a = static_cast<optab_subtype>(static_cast<int>(a)
> > > > > > > +					  | static_cast<int>(b)); }
> > > > > > > +
> > > > > > > +/* Override the Or-operator so we can use optab_subtype as a
> > > > > > > +bit flag.  */ inline enum optab_subtype operator | (enum
> > > > > > > +optab_subtype a, enum optab_subtype b) {
> > > > > > > +    return static_cast<optab_subtype>(static_cast<int>(a)
> > > > > > > +				      | static_cast<int>(b)); }
> > > > > > > +
> > > > > > >  /* Return the optab used for computing the given operation on
> > > > > > > the type
> > > > > > given by
> > > > > > >     the second argument.  The third argument distinguishes
> > > > > > > between the
> > > > > > types of
> > > > > > >     vector shifts and rotates.  */ diff --git
> > > > > > > a/gcc/optabs-tree.c b/gcc/optabs-tree.c index
> > > > > > >
> > > > > >
> > > >
> > 95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea
> > > > > > 1e
> > > > > > > 5c22b7453072 100644
> > > > > > > --- a/gcc/optabs-tree.c
> > > > > > > +++ b/gcc/optabs-tree.c
> > > > > > > @@ -127,7 +127,17 @@ optab_for_tree_code (enum tree_code
> > code,
> > > > > > const_tree type,
> > > > > > >        return TYPE_UNSIGNED (type) ? usum_widen_optab :
> > > > > > > ssum_widen_optab;
> > > > > > >
> > > > > > >      case DOT_PROD_EXPR:
> > > > > > > -      return TYPE_UNSIGNED (type) ? udot_prod_optab :
> > > > sdot_prod_optab;
> > > > > > > +      {
> > > > > > > +	gcc_assert (subtype & optab_default
> > > > > > > +		    || subtype & optab_vector
> > > > > > > +		    || subtype & optab_signed_to_unsigned
> > > > > > > +		    || subtype & optab_unsigned_to_signed);
> > > > > > > +
> > > > > > > +	if (subtype & (optab_unsigned_to_signed |
> > > > > > optab_signed_to_unsigned))
> > > > > > > +	  return usdot_prod_optab;
> > > > > > > +
> > > > > > > +	return (TYPE_UNSIGNED (type) ? udot_prod_optab :
> > > > > > sdot_prod_optab);
> > > > > > > +      }
> > > > > > >
> > > > > > >      case SAD_EXPR:
> > > > > > >        return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
> > > > > > > diff --git a/gcc/optabs.c b/gcc/optabs.c index
> > > > > > >
> > > > > >
> > > >
> > f4614a394587787293dc8b680a38901f7906f61c..2e18b76de1412eab71971753ac
> > > > > > 67
> > > > > > > 8597c0d00098 100644
> > > > > > > --- a/gcc/optabs.c
> > > > > > > +++ b/gcc/optabs.c
> > > > > > > @@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops,
> > > > > > > rtx op0,
> > > > > > rtx op1, rtx wide_op,
> > > > > > >    bool sbool = false;
> > > > > > >
> > > > > > >    oprnd0 = ops->op0;
> > > > > > > +  if (nops >= 2)
> > > > > > > +    oprnd1 = ops->op1;
> > > > > > > +  if (nops >= 3)
> > > > > > > +    oprnd2 = ops->op2;
> > > > > > > +
> > > > > > >    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
> > > > > > >    if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
> > > > > > >        || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR) @@ -
> > 285,6
> > > > > > +290,27
> > > > > > > @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1,
> > > > > > > rtx
> > > > > > wide_op,
> > > > > > >  	   ? vec_unpacks_sbool_hi_optab :
> > vec_unpacks_sbool_lo_optab);
> > > > > > >        sbool = true;
> > > > > > >      }
> > > > > > > +  else if (ops->code == DOT_PROD_EXPR)
> > > > > > > +    {
> > > > > > > +      enum optab_subtype subtype = optab_default;
> > > > > > > +      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
> > > > > > > +      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
> > > > > > > +      if (sign1 == sign2)
> > > > > > > +	;
> > > > > > > +      else if (sign1 == SIGNED && sign2 == UNSIGNED)
> > > > > > > +	{
> > > > > > > +	  subtype |= optab_signed_to_unsigned;
> > > > > > > +	  /* Same as optab_unsigned_to_signed but flip the
> > operands.  */
> > > > > > > +	  std::swap (op0, op1);
> > > > > > > +	}
> > > > > > > +      else if (sign1 == UNSIGNED && sign2 == SIGNED)
> > > > > > > +	subtype |= optab_unsigned_to_signed;
> > > > > > > +      else
> > > > > > > +	gcc_unreachable ();
> > > > > > > +
> > > > > > > +      widen_pattern_optab
> > > > > > > +	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0),
> > subtype);
> > > > > > > +    }
> > > > > > >    else
> > > > > > >      widen_pattern_optab
> > > > > > >        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0),
> > > > > > > optab_default); @@ -298,10 +324,7 @@
> > expand_widen_pattern_expr
> > > > > > (sepops ops, rtx op0, rtx op1, rtx wide_op,
> > > > > > >    gcc_assert (icode != CODE_FOR_nothing);
> > > > > > >
> > > > > > >    if (nops >= 2)
> > > > > > > -    {
> > > > > > > -      oprnd1 = ops->op1;
> > > > > > > -      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > > > > -    }
> > > > > > > +    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > > > >    else if (sbool)
> > > > > > >      {
> > > > > > >        nops = 2;
> > > > > > > @@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops,
> > rtx
> > > > > > > op0,
> > > > > > rtx op1, rtx wide_op,
> > > > > > >      {
> > > > > > >        gcc_assert (tmode1 == tmode0);
> > > > > > >        gcc_assert (op1);
> > > > > > > -      oprnd2 = ops->op2;
> > > > > > >        wmode = TYPE_MODE (TREE_TYPE (oprnd2));
> > > > > > >      }
> > > > > > >
> > > > > > > diff --git a/gcc/optabs.def b/gcc/optabs.def index
> > > > > > >
> > > > > >
> > > >
> > b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbd
> > > > > > b7c
> > > > > > > 18615baae928 100644
> > > > > > > --- a/gcc/optabs.def
> > > > > > > +++ b/gcc/optabs.def
> > > > > > > @@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
> > > > > > OPTAB_D
> > > > > > > (sdot_prod_optab, "sdot_prod$I$a")  OPTAB_D
> > (ssum_widen_optab,
> > > > > > > "widen_ssum$I$a3")  OPTAB_D (udot_prod_optab,
> > "udot_prod$I$a")
> > > > > > > +OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> > > > > > >  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")  OPTAB_D
> > > > > > (usad_optab,
> > > > > > > "usad$I$a")  OPTAB_D (ssad_optab, "ssad$I$a") diff --git
> > > > > > > a/gcc/tree-cfg.c b/gcc/tree-cfg.c index
> > > > > > >
> > > > > >
> > > >
> > 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..58b55bb648ad97d514f1fa18bb
> > > > > > 00
> > > > > > > 808fd2678b42 100644
> > > > > > > --- a/gcc/tree-cfg.c
> > > > > > > +++ b/gcc/tree-cfg.c
> > > > > > > @@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign
> > *stmt)
> > > > > > >  		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
> > > > > > >  		 || (!INTEGRAL_TYPE_P (lhs_type)
> > > > > > >  		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> > > > > > > -	    || !types_compatible_p (rhs1_type, rhs2_type)
> > > > > > > +	    || (!types_compatible_p (rhs1_type, rhs2_type)
> > > > > > > +		&& TYPE_SIGN (rhs1_type) == TYPE_SIGN
> > (rhs2_type))
> > > > > >
> > > > > > That's not restrictive enough.  I suggest you use
> > > > > >
> > > > > >             && element_precision (rhs1_type) !=
> > > > > > element_precision
> > > > > > (rhs2_type)
> > > > > >
> > > > > > instead.
> > > > > >
> > > > > > As said, I'm not sure all the changes in this patch are required.
> > > > > >
> > > > > > Please elaborate.
> > > > > >
> > > > > > Thanks,
> > > > > > Richard.
> > > > > >
> > > > > > >  	    || !useless_type_conversion_p (lhs_type, rhs3_type)
> > > > > > >  	    || maybe_lt (GET_MODE_SIZE (element_mode
> > (rhs3_type)),
> > > > > > >  			 2 * GET_MODE_SIZE (element_mode
> > (rhs1_type))))
> > > > > > diff --git
> > > > > > > a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index
> > > > > > >
> > > > > >
> > > >
> > 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..cb8f5fbb6abca181c4171194d1
> > > > > > 9f
> > > > > > > ec29ec6e4176 100644
> > > > > > > --- a/gcc/tree-vect-loop.c
> > > > > > > +++ b/gcc/tree-vect-loop.c
> > > > > > > @@ -6401,6 +6401,33 @@ build_vect_cond_expr (enum tree_code
> > > > code,
> > > > > > tree vop[3], tree mask,
> > > > > > >      }
> > > > > > >  }
> > > > > > >
> > > > > > > +/* Determine the optab_subtype to use for the given CODE and
> > STMT.
> > > > > > For
> > > > > > > +   most CODE this will be optab_vector, however for certain
> > > > > > > + operations
> > > > > > such as
> > > > > > > +   DOT_PROD_EXPR where the operation can different signs for
> > > > > > > + the
> > > > > > operands we
> > > > > > > +   need to be able to pick the right optabs.  */
> > > > > > > +
> > > > > > > +static enum optab_subtype
> > > > > > > +vect_determine_dot_kind (tree_code code, stmt_vec_info
> > > > > > > +stmt_vinfo) {
> > > > > > > +  enum optab_subtype subtype = optab_vector;
> > > > > > > +  switch (code)
> > > > > > > +    {
> > > > > > > +      case DOT_PROD_EXPR:
> > > > > > > +	{
> > > > > > > +	  gassign *stmt = as_a <gassign *> (STMT_VINFO_STMT
> > (stmt_vinfo));
> > > > > > > +	  signop rhs1_sign = TYPE_SIGN (TREE_TYPE
> > > > > > > +(gimple_assign_rhs1
> > > > > > (stmt)));
> > > > > > > +	  signop rhs2_sign = TYPE_SIGN (TREE_TYPE
> > > > > > > +(gimple_assign_rhs2
> > > > > > (stmt)));
> > > > > > > +	  if (rhs1_sign != rhs2_sign)
> > > > > > > +	    subtype |= optab_unsigned_to_signed;
> > > > > > > +	  break;
> > > > > > > +	}
> > > > > > > +      default:
> > > > > > > +	break;
> > > > > > > +    }
> > > > > > > +
> > > > > > > +  return subtype;
> > > > > > > +}
> > > > > > > +
> > > > > > >  /* Function vectorizable_reduction.
> > > > > > >
> > > > > > >     Check if STMT_INFO performs a reduction operation that can
> > > > > > > be
> > > > > > vectorized.
> > > > > > > @@ -7189,7 +7216,8 @@ vectorizable_reduction (loop_vec_info
> > > > > > loop_vinfo,
> > > > > > >        bool ok = true;
> > > > > > >
> > > > > > >        /* 4.1. check support for the operation in the loop  */
> > > > > > > -      optab optab = optab_for_tree_code (code, vectype_in,
> > > > optab_vector);
> > > > > > > +      enum optab_subtype subtype = vect_determine_dot_kind
> > > > > > > + (code,
> > > > > > stmt_info);
> > > > > > > +      optab optab = optab_for_tree_code (code, vectype_in,
> > > > > > > + subtype);
> > > > > > >        if (!optab)
> > > > > > >  	{
> > > > > > >  	  if (dump_enabled_p ())
> > > > > > > diff --git a/gcc/tree-vect-patterns.c
> > > > > > > b/gcc/tree-vect-patterns.c index
> > > > > > >
> > > > > >
> > > >
> > 441d6cd28c4eaded7abd756164890dbcffd2f3b8..943c001fb13777b4d1513841f
> > > > > > a84
> > > > > > > 942316846d5e 100644
> > > > > > > --- a/gcc/tree-vect-patterns.c
> > > > > > > +++ b/gcc/tree-vect-patterns.c
> > > > > > > @@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info
> > > > > > > *vinfo, tree
> > > > > > > var)  static bool  vect_supportable_direct_optab_p (vec_info
> > > > > > > *vinfo, tree otype, tree_code code,
> > > > > > >  				 tree itype, tree *vecotype_out,
> > > > > > > -				 tree *vecitype_out = NULL)
> > > > > > > +				 tree *vecitype_out = NULL,
> > > > > > > +				 enum optab_subtype subtype =
> > > > > > optab_default)
> > > > > > >  {
> > > > > > >    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
> > > > > > >    if (!vecitype)
> > > > > > > @@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info
> > > > > > > *vinfo,
> > > > > > tree otype, tree_code code,
> > > > > > >    if (!vecotype)
> > > > > > >      return false;
> > > > > > >
> > > > > > > -  optab optab = optab_for_tree_code (code, vecitype,
> > > > > > > optab_default);
> > > > > > > +  optab optab = optab_for_tree_code (code, vecitype,
> > > > > > > + subtype);
> > > > > > >    if (!optab)
> > > > > > >      return false;
> > > > > > >
> > > > > > > @@ -487,14 +488,31 @@ vect_joust_widened_integer (tree type,
> > > > > > > bool shift_p, tree op,  }
> > > > > > >
> > > > > > >  /* Return true if the common supertype of NEW_TYPE and
> > > > > > *COMMON_TYPE
> > > > > > > -   is narrower than type, storing the supertype in *COMMON_TYPE
> > if
> > > > so.
> > > > > > */
> > > > > > > +   is narrower than type, storing the supertype in
> > > > > > > + *COMMON_TYPE if
> > > > so.
> > > > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then accept that
> > > > *COMMON_TYPE
> > > > > > and NEW_TYPE
> > > > > > > +   may be of different signs but equal precision.   */
> > > > > > >
> > > > > > >  static bool
> > > > > > > -vect_joust_widened_type (tree type, tree new_type, tree
> > > > > > *common_type)
> > > > > > > +vect_joust_widened_type (tree type, tree new_type, tree
> > > > > > *common_type,
> > > > > > > +			 bool allow_short_sign_mismatch = false)
> > > > > > >  {
> > > > > > >    if (types_compatible_p (*common_type, new_type))
> > > > > > >      return true;
> > > > > > >
> > > > > > > +  /* Check if the mismatch is only in the sign and if we have
> > > > > > > +     allow_short_sign_mismatch then allow it.  */
> > > > > > > +  if (allow_short_sign_mismatch
> > > > > > > +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> > > > > > > +    {
> > > > > > > +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> > > > > > > +      tree eq_type
> > > > > > > +	= build_nonstandard_integer_type (TYPE_PRECISION
> > (new_type),
> > > > > > > +					  sign);
> > > > > > > +
> > > > > > > +      if (types_compatible_p (*common_type, eq_type))
> > > > > > > +	return true;
> > > > > > > +    }
> > > > > > > +
> > > > > > >    /* See if *COMMON_TYPE can hold all values of NEW_TYPE.  */
> > > > > > >    if ((TYPE_PRECISION (new_type) < TYPE_PRECISION
> > (*common_type))
> > > > > > >        && (TYPE_UNSIGNED (new_type) || !TYPE_UNSIGNED
> > > > > > (*common_type)))
> > > > > > > @@ -532,6 +550,9 @@ vect_joust_widened_type (tree type, tree
> > > > > > new_type, tree *common_type)
> > > > > > >     to a type that (a) is narrower than the result of STMT_INFO and
> > > > > > >     (b) can hold all leaf operand values.
> > > > > > >
> > > > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then allow that the signs of
> > > > > > > + the
> > > > > > operands
> > > > > > > +   may differ in signs but not in precision.
> > > > > > > +
> > > > > > >     Return 0 if STMT_INFO isn't such a tree, or if no such
> > COMMON_TYPE
> > > > > > >     exists.  */
> > > > > > >
> > > > > > > @@ -539,7 +560,8 @@ static unsigned int  vect_widened_op_tree
> > > > > > > (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> > > > > > >  		      tree_code widened_code, bool shift_p,
> > > > > > >  		      unsigned int max_nops,
> > > > > > > -		      vect_unpromoted_value *unprom, tree
> > *common_type)
> > > > > > > +		      vect_unpromoted_value *unprom, tree
> > *common_type,
> > > > > > > +		      bool allow_short_sign_mismatch = false)
> > > > > > >  {
> > > > > > >    /* Check for an integer operation with the right code.  */
> > > > > > >    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
> > > > > > > @@
> > > > > > > -600,7
> > > > > > > +622,8 @@ vect_widened_op_tree (vec_info *vinfo,
> > stmt_vec_info
> > > > > > stmt_info, tree_code code,
> > > > > > >  		= vinfo->lookup_def (this_unprom->op);
> > > > > > >  	      nops = vect_widened_op_tree (vinfo, def_stmt_info,
> > code,
> > > > > > >  					   widened_code, shift_p,
> > max_nops,
> > > > > > > -					   this_unprom,
> > common_type);
> > > > > > > +					   this_unprom,
> > common_type,
> > > > > > > +
> > allow_short_sign_mismatch);
> > > > > > >  	      if (nops == 0)
> > > > > > >  		return 0;
> > > > > > >
> > > > > > > @@ -617,7 +640,8 @@ vect_widened_op_tree (vec_info *vinfo,
> > > > > > stmt_vec_info stmt_info, tree_code code,
> > > > > > >  	      if (i == 0)
> > > > > > >  		*common_type = this_unprom->type;
> > > > > > >  	      else if (!vect_joust_widened_type (type, this_unprom-
> > >type,
> > > > > > > -						 common_type))
> > > > > > > +						 common_type,
> > > > > > > +
> > allow_short_sign_mismatch))
> > > > > > >  		return 0;
> > > > > > >  	    }
> > > > > > >  	}
> > > > > > > @@ -888,21 +912,24 @@ vect_reassociating_reduction_p (vec_info
> > > > > > > *vinfo,
> > > > > > >
> > > > > > >     Try to find the following pattern:
> > > > > > >
> > > > > > > -     type x_t, y_t;
> > > > > > > +     type1a x_t
> > > > > > > +     type1b y_t;
> > > > > > >       TYPE1 prod;
> > > > > > >       TYPE2 sum = init;
> > > > > > >     loop:
> > > > > > >       sum_0 = phi <init, sum_1>
> > > > > > >       S1  x_t = ...
> > > > > > >       S2  y_t = ...
> > > > > > > -     S3  x_T = (TYPE1) x_t;
> > > > > > > -     S4  y_T = (TYPE1) y_t;
> > > > > > > +     S3  x_T = (TYPE3) x_t;
> > > > > > > +     S4  y_T = (TYPE4) y_t;
> > > > > > >       S5  prod = x_T * y_T;
> > > > > > >       [S6  prod = (TYPE2) prod;  #optional]
> > > > > > >       S7  sum_1 = prod + sum_0;
> > > > > > >
> > > > > > > -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2'
> > is
> > > > the
> > > > > > > -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> > > > > > > +   where 'TYPE1' is exactly double the size of type 'type1a' and
> > 'type1b',
> > > > > > > +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the
> > sign of
> > > > > > > +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1'
> > or
> > > > > > > +   bigger and must be the same sign. This is a special case
> > > > > > > + of a reduction
> > > > > > >     computation.
> > > > > > >
> > > > > > >     Input:
> > > > > > > @@ -939,15 +966,16 @@ vect_recog_dot_prod_pattern (vec_info
> > > > > > > *vinfo,
> > > > > > >
> > > > > > >    /* Look for the following pattern
> > > > > > >            DX = (TYPE1) X;
> > > > > > > -          DY = (TYPE1) Y;
> > > > > > > +	  DY = (TYPE2) Y;
> > > > > > >            DPROD = DX * DY;
> > > > > > > -          DDPROD = (TYPE2) DPROD;
> > > > > > > +	  DDPROD = (TYPE3) DPROD;
> > > > > > >            sum_1 = DDPROD + sum_0;
> > > > > > >       In which
> > > > > > >       - DX is double the size of X
> > > > > > >       - DY is double the size of Y
> > > > > > >       - DX, DY, DPROD all have the same type but the sign
> > > > > > > -       between DX, DY and DPROD can differ.
> > > > > > > +       between DX, DY and DPROD can differ. The sign of DPROD
> > > > > > > +       is one of the signs of DX or DY.
> > > > > > >       - sum is the same size of DPROD or bigger
> > > > > > >       - sum has been recognized as a reduction variable.
> > > > > > >
> > > > > > > @@ -986,14 +1014,41 @@ vect_recog_dot_prod_pattern (vec_info
> > > > *vinfo,
> > > > > > >       inside the loop (in case we are analyzing an outer-loop).  */
> > > > > > >    vect_unpromoted_value unprom0[2];
> > > > > > >    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
> > > > > > WIDEN_MULT_EXPR,
> > > > > > > -			     false, 2, unprom0, &half_type))
> > > > > > > +			     false, 2, unprom0, &half_type, true))
> > > > > > >      return NULL;
> > > > > > >
> > > > > > > +  /* Check to see if there is a sign change happening in the
> > > > > > > + operands of
> > > > > > the
> > > > > > > +     multiplication and pick the appropriate optab subtype.
> > > > > > > +*/
> > > > > > > +  enum optab_subtype subtype;
> > > > > > > +  tree rhs_type1 = unprom0[0].type;
> > > > > > > +  tree rhs_type2 = unprom0[1].type;
> > > > > > > +  if (TYPE_SIGN (rhs_type1) == TYPE_SIGN (rhs_type2))
> > > > > > > +     subtype = optab_default;
> > > > > > > +  else if (TYPE_SIGN (rhs_type1) == SIGNED
> > > > > > > +	   && TYPE_SIGN (rhs_type2) == UNSIGNED)
> > > > > > > +     subtype = optab_signed_to_unsigned;
> > > > > > > +  else if (TYPE_SIGN (rhs_type1) == UNSIGNED
> > > > > > > +	   && TYPE_SIGN (rhs_type2) == SIGNED)
> > > > > > > +     subtype = optab_unsigned_to_signed;
> > > > > > > +  else
> > > > > > > +    gcc_unreachable ();
> > > > > > > +
> > > > > > > +  /* If we have a sign changing dot product we need to check that
> > the
> > > > > > > +     promoted type if unsigned has at least the same
> > > > > > > + precision as the
> > > > final
> > > > > > > +     type of the dot-product.  */
> > > > > > > +  if (subtype != optab_default)
> > > > > > > +    {
> > > > > > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > > > > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > > > > > +	  && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > > > > > > +	return NULL;
> > > > > > > +    }
> > > > > > > +
> > > > > > >    vect_pattern_detected ("vect_recog_dot_prod_pattern",
> > > > > > > last_stmt);
> > > > > > >
> > > > > > >    tree half_vectype;
> > > > > > >    if (!vect_supportable_direct_optab_p (vinfo, type,
> > > > > > > DOT_PROD_EXPR,
> > > > > > half_type,
> > > > > > > -					type_out, &half_vectype))
> > > > > > > +					type_out, &half_vectype,
> > subtype))
> > > > > > >      return NULL;
> > > > > > >
> > > > > > >    /* Get the inputs in the appropriate types.  */ @@ -1002,8
> > > > > > > +1057,22 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> > > > > > >  		       unprom0, half_vectype);
> > > > > > >
> > > > > > >    var = vect_recog_temp_ssa_var (type, NULL);
> > > > > > > +
> > > > > > > +  /* If we have a sign changing dot-product the dot-product
> > > > > > > + itself does
> > > > any
> > > > > > > +     sign conversions, so consume the type and use the
> > > > > > > + unpromoted types.  */  tree mult_arg1, mult_arg2;  if
> > > > > > > + (subtype ==
> > > > > > > + optab_default)
> > > > > > > +    {
> > > > > > > +      mult_arg1 = mult_oprnd[0];
> > > > > > > +      mult_arg2 = mult_oprnd[1];
> > > > > > > +    }
> > > > > > > +  else
> > > > > > > +    {
> > > > > > > +      mult_arg1 = unprom0[0].op;
> > > > > > > +      mult_arg2 = unprom0[1].op;
> > > > > > > +    }
> > > > > > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > > > > > -				      mult_oprnd[0], mult_oprnd[1],
> > oprnd1);
> > > > > > > +				      mult_arg1, mult_arg2, oprnd1);
> > > > > > >
> > > > > > >    return pattern_stmt;
> > > > > > >  }
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> > > > > > Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF:
> > > > > > > Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > > > > Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > > Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
  2021-05-26  8:56             ` Richard Biener
@ 2021-06-02  9:28               ` Tamar Christina
  2021-06-04 10:12                 ` Tamar Christina
  0 siblings, 1 reply; 35+ messages in thread
From: Tamar Christina @ 2021-06-02  9:28 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, Richard Sandiford

Ping,

Did you have any comments, Richard S?

Otherwise I'll proceed with respinning according to Richi's comments.

Regards,
Tamar

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Wednesday, May 26, 2021 9:57 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant changes.
> 
> On Tue, 25 May 2021, Tamar Christina wrote:
> 
> > Hi Richi,
> >
> > Here's a respun version of the patch.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> 
> index 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..13e405edd765dde704c64348d2d0b3cd88f0af7c 100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -4421,7 +4421,9 @@ verify_gimple_assign_ternary (gassign *stmt)
>                   && !SCALAR_FLOAT_TYPE_P (rhs1_type))
>                  || (!INTEGRAL_TYPE_P (lhs_type)
>                      && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> -           || !types_compatible_p (rhs1_type, rhs2_type)
> +           || (!types_compatible_p (rhs1_type, rhs2_type)
> +               && TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type)
> +               && TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (rhs2_type))
> 
> I think this doesn't capture the constraints - instead please do
> 
> -           || !types_compatible_p (rhs1_type, rhs2_type)
> +           /* rhs1_type and rhs2_type may differ in sign.  */
> +           || !tree_nop_conversion_p (rhs1_type, rhs2_type)
> 
> 
> +/* Determine the optab_subtype to use for the given CODE and STMT.  For
> +   most CODE this will be optab_vector, however for certain operations such as
> +   DOT_PROD_EXPR where the operation can have different signs for the operands we
> +   need to be able to pick the right optabs.  */
> +
> +static enum optab_subtype
> +vect_determine_dot_kind (tree_code code, stmt_vec_info stmt_vinfo)
> 
> vect_determine_optab_subkind would be a better name.  'code' is
> redundant (or should better match stmt_vinfo->stmts code).  I wonder
> if it might be clearer to compute the subtype where we compute 'code'
> and the relation to stmt_info is obvious, I mean here:
> 
>   /* 3. Check the operands of the operation.  The first operands are defined
>         inside the loop body. The last operand is the reduction variable,
>         which is defined by the loop-header-phi.  */
> 
>   tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
>   STMT_VINFO_REDUC_VECTYPE (reduc_info) = vectype_out;
>   gassign *stmt = as_a <gassign *> (stmt_info->stmt);
>   enum tree_code code = gimple_assign_rhs_code (stmt);
>   bool lane_reduc_code_p
>     = (code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code == SAD_EXPR);
> 
> so just add
> 
>   enum optab_subtype optab_query_kind = optab_vector;
>   if (code == DOT_PROD_EXPR
>       && <sign test>)
>     optab_query_kind = optab_vector_mixed_sign;
> 
> in this place and avoid adding the new function?
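
The inline selection suggested above can be modelled outside of GCC as a small C sketch. Everything here is hypothetical: the enums and the function are stand-ins invented for illustration, the `<sign test>` placeholder is assumed to mean "the two multiplication operands differ in sign", and none of this is the vectorizer's real data structures.

```c
#include <stdbool.h>

/* Stand-ins for the tree codes mentioned in the review (illustrative).  */
enum model_code { MODEL_DOT_PROD, MODEL_WIDEN_SUM, MODEL_SAD };

/* Stand-ins for optab_vector and the proposed optab_vector_mixed_sign.  */
enum model_query_kind { MODEL_VECTOR, MODEL_VECTOR_MIXED_SIGN };

/* Default to the plain vector query and switch to the mixed-sign query
   only for a dot product whose two multiplication operands differ in
   signedness.  */
enum model_query_kind
choose_query_kind (enum model_code code, bool rhs1_unsigned,
                   bool rhs2_unsigned)
{
  enum model_query_kind kind = MODEL_VECTOR;
  if (code == MODEL_DOT_PROD && rhs1_unsigned != rhs2_unsigned)
    kind = MODEL_VECTOR_MIXED_SIGN;
  return kind;
}
```

Under these assumptions only the dot-product case with one signed and one unsigned operand selects the mixed-sign kind; every other reduction keeps the default query.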
> 
> I'm not too familiar with the pattern recog code, a second pair of eyes
> would be preferred (Richard?), but
> 
> +  /* Check if the mismatch is only in the sign and if we have
> +     allow_short_sign_mismatch then allow it.  */
> +  if (unprom_type
> +      && TYPE_SIGN (unprom_type) == SIGNED
> +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> +    {
> +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> +      tree eq_type
> +       = build_nonstandard_integer_type (TYPE_PRECISION (new_type),
> +                                         sign);
> +
> +      if (types_compatible_p (*common_type, eq_type))
> +       return true;
> +    }
> 
> looks somewhat complicated - is that equal to
> 
>   if (unprom_type
>       && tree_nop_conversion_p (*common_type, new_type))
>     return true;
> 
> ?  That is, *common_type and new_type only differ in sign?
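
The question above, whether two integer types "only differ in sign", can be illustrated with a simplified model in C. The struct and function names below are invented for this sketch; it assumes that for plain integer types a no-op conversion amounts to equal precision, and it is not GCC's implementation of tree_nop_conversion_p or types_compatible_p.

```c
#include <stdbool.h>

/* Toy description of an integer type: its precision in bits and its
   signedness (hypothetical, for illustration only).  */
struct model_int_type
{
  unsigned precision;
  bool is_unsigned;
};

/* Strict compatibility: same precision and same sign.  */
bool
model_types_compatible_p (struct model_int_type a, struct model_int_type b)
{
  return a.precision == b.precision && a.is_unsigned == b.is_unsigned;
}

/* Simplified no-op conversion test: the precision must match, the sign
   is allowed to differ.  */
bool
model_nop_conversion_p (struct model_int_type a, struct model_int_type b)
{
  return a.precision == b.precision;
}
```

In this model a signed 8-bit and an unsigned 8-bit type fail the strict check but pass the no-op check, while an 8-bit and a 16-bit type fail both, which is the distinction the simpler predicate is meant to capture.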
> 
> @@ -812,8 +844,13 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
>        for (j = 0; j < i; ++j)
>         if (unprom[j].op == unprom[i].op)
>           break;
> +      bool only_sign = allow_short_sign_mismatch
> +                      && TYPE_SIGN (type) != TYPE_SIGN (unprom[i].type)
> +                      && TYPE_PRECISION (type) == TYPE_PRECISION (unprom[i].type);
> 
> this could use the same tree_nop_conversion_p predicate.
> 
> Otherwise the patch looks good.
> 
> Thanks,
> Richard.
> 
> 
> 
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* optabs.def (usdot_prod_optab): New.
> > 	* doc/md.texi: Document it and clarify other dot prod optabs.
> > 	* optabs-tree.h (enum optab_subtype): Add
> optab_vector_mixed_sign.
> > 	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> > 	* optabs.c (expand_widen_pattern_expr): Likewise.
> > 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> > 	(vectorizable_reduction): Query dot-product kind.
> > 	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take
> optional
> > 	optab subtype.
> > 	(vect_joust_widened_type, vect_widened_op_tree): Optionally
> ignore
> > 	mismatch types.
> > 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> >
> >
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Monday, May 10, 2021 2:29 PM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > > where the sign for the multiplicant changes.
> > >
> > > On Mon, 10 May 2021, Tamar Christina wrote:
> > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Richard Biener <rguenther@suse.de>
> > > > > Sent: Monday, May 10, 2021 12:40 PM
> > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > > > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-
> product
> > > > > where the sign for the multiplicant changes.
> > > > >
> > > > > On Fri, 7 May 2021, Tamar Christina wrote:
> > > > >
> > > > > > Hi Richi,
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > > Sent: Friday, May 7, 2021 12:46 PM
> > > > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > > > > > Subject: Re: [PATCH 1/4]middle-end Vect: Add support for
> > > > > > > dot-product where the sign for the multiplicant changes.
> > > > > > >
> > > > > > > On Wed, 5 May 2021, Tamar Christina wrote:
> > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > This patch adds support for a dot product where the sign of
> > > > > > > > the multiplication arguments differ. i.e. one is signed and
> > > > > > > > one is unsigned but the precisions are the same.
> > > > > > > >
> > > > > > > > #define N 480
> > > > > > > > #define SIGNEDNESS_1 unsigned
> > > > > > > > #define SIGNEDNESS_2 signed
> > > > > > > > #define SIGNEDNESS_3 signed
> > > > > > > > #define SIGNEDNESS_4 unsigned
> > > > > > > >
> > > > > > > > SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int
> > > > > > > > res,
> > > > > > > > SIGNEDNESS_3 char *restrict a,
> > > > > > > >    SIGNEDNESS_4 char *restrict b) {
> > > > > > > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > > > > > >     {
> > > > > > > >       int av = a[i];
> > > > > > > >       int bv = b[i];
> > > > > > > >       SIGNEDNESS_2 short mult = av * bv;
> > > > > > > >       res += mult;
> > > > > > > >     }
> > > > > > > >   return res;
> > > > > > > > }
> > > > > > > >
> > > > > > > > > The operations are performed as if the operands were extended
> > > > > > > > > to a 32-bit value.  As such this operation isn't valid if there
> > > > > > > > > is an intermediate conversion to an unsigned value, i.e. if
> > > > > > > > > SIGNEDNESS_2 is unsigned.
> > > > > > > >
> > > > > > > > > Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are
> > > > > > > > > flipped, the same optab is used but the operands are flipped in
> > > > > > > > > the optab expansion.
> > > > > > > >
> > > > > > > > To support this the patch extends the dot-product detection to
> > > > > > > > optionally ignore operands with different signs and stores
> > > > > > > > this information in the optab subtype which is now made a
> bitfield.
> > > > > > > >
> > > > > > > > > The subtype can now additionally control which optab an EXPR
> > > > > > > > > can expand to.
> > > > > > > >
> > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no
> issues.
> > > > > > > >
> > > > > > > > Ok for master?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Tamar
> > > > > > > >
> > > > > > > > gcc/ChangeLog:
> > > > > > > >
> > > > > > > > 	* optabs.def (usdot_prod_optab): New.
> > > > > > > > 	* doc/md.texi: Document it.
> > > > > > > > 	* optabs-tree.c (optab_for_tree_code): Support
> > > usdot_prod_optab.
> > > > > > > > 	* optabs-tree.h (enum optab_subtype): Likewise.
> > > > > > > > 	* optabs.c (expand_widen_pattern_expr): Likewise.
> > > > > > > > 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > > > > > > > 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> > > > > > > > 	(vectorizable_reduction): Query dot-product kind.
> > > > > > > > 	* tree-vect-patterns.c (vect_supportable_direct_optab_p):
> > > > > > > > Take
> > > > > > > optional
> > > > > > > > 	optab subtype.
> > > > > > > > 	(vect_joust_widened_type, vect_widened_op_tree):
> > > Optionally
> > > > > > > ignore
> > > > > > > > 	mismatch types.
> > > > > > > > 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> > > > > > > >
> > > > > > > > --- inline copy of patch --
> > > > > > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> > > > > > > >
> > > > > > >
> > > > >
> > >
> d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fd
> > > > > > > f2
> > > > > > > > e66bc80d7d23 100644
> > > > > > > > --- a/gcc/doc/md.texi
> > > > > > > > +++ b/gcc/doc/md.texi
> > > > > > > > @@ -5440,11 +5440,13 @@ Like
> @samp{fold_left_plus_@var{m}},
> > > > > > > > but
> > > > > > > takes
> > > > > > > > an additional mask operand  @item @samp{sdot_prod@var{m}}
> > > > > @cindex
> > > > > > > > @code{udot_prod@var{m}} instruction pattern  @itemx
> > > > > > > > @samp{udot_prod@var{m}}
> > > > > > > > +@cindex @code{usdot_prod@var{m}} instruction pattern
> @itemx
> > > > > > > > +@samp{usdot_prod@var{m}}
> > > > > > > >  Compute the sum of the products of two signed/unsigned
> > > elements.
> > > > > > > > -Operand 1 and operand 2 are of the same mode. Their product,
> > > > > > > > which is of a -wider mode, is computed and added to operand 3.
> > > > > > > > Operand 3 is of a mode equal or -wider than the mode of the
> > > > > > > > product. The result is placed in operand 0, which -is of the
> > > > > > > > same mode
> > > > > as operand 3.
> > > > > > > > +Operand 1 and operand 2 are of the same mode but may differ
> > > > > > > > +in
> > > > > signs.
> > > > > > > > +Their product, which is of a wider mode, is computed and
> > > > > > > > +added to
> > > > > > > operand 3.
> > > > > > > > +Operand 3 is of a mode equal or wider than the mode of the
> > > product.
> > > > > > > > +The result is placed in operand 0, which is of the same mode
> > > > > > > > +as
> > > > > operand 3.
> > > > > > >
> > > > > > > This doesn't really say what the 's', 'u' and 'us' specify.
> > > > > > > Since we're doing a widen multiplication and then a non-widening
> > > > > > > addition we only need to know the effective sign of the
> > > > > > > multiplication so I think
> > > > > the existing 's' and 'u'
> > > > > > > are enough to cover all cases?
> > > > > >
> > > > > > The existing 's' and 'u' enforce that both operands of the
> > > > > > multiplication are of the same sign.  So for e.g. 'u' both operand
> > > > > > must be
> > > > > unsigned.
> > > > > >
> > > > > > In the `us` case one can be signed and one unsigned. Operationally
> > > > > > this does a sign extension to the wider type for the signed value,
> > > > > > and the unsigned value gets zero extended first, and then converts
> > > > > > it to unsigned to perform the unsigned multiplication, conforming
> > > > > > to the C
> > > > > promotion rules.
> > > > > >
> > > > > > TL;DR; Without a new optab I can't tell during expansion which
> > > > > > semantic the operation had at the gimple/C level as modes don't
> carry
> > > signs.
> > > > > >
> > > > > > Long version:
> > > > > >
> > > > > > > The problem with using the existing patterns is that, because they
> > > > > > > enforce `av` and `bv` being the same sign, we can't remove the
> > > > > > > explicit sign extensions, but the multiplication must be done on
> > > > > > > the sign/zero extended char input in the same sign.
> > > > > >
> > > > > > > Which means (unless I am mistaken) that to get the correct result
> > > > > > > you can use neither `udot` nor `sdot`, as semantically these would
> > > > > > > zero or sign extend both operands from char to int to perform the
> > > > > > > multiplication in the same sign.  Whereas in this case, one
> > > > > > > parameter is zero extended and one parameter is sign extended, and
> > > > > > > the result is always an unsigned number.
> > > > > >
> > > > > > So basically
> > > > > >
> > > > > > > udot<unsigned c, unsigned a, unsigned b> ==
> > > > > > >    c = zero-ext (a) * zero-ext (b)
> > > > > > > sdot<signed c, signed a, signed b> ==
> > > > > > >    c = sign-ext (a) * sign-ext (b)
> > > > > > > usdot<unsigned c, unsigned a, signed b> ==
> > > > > > >    c = ((unsigned-conv) sign-ext (a)) * zero-ext (b)
> > > > > >
> > > > > > So semantically the existing optabs won't fit here. udot would
> > > > > > internally promote to unsigned types before the multiplication so
> > > > > > the result of the multiplication would be wrong.  sdot would
> > > > > > promote both to
> > > > > signed and do signed multiplication, so the result is also wrong.
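
To illustrate why neither existing optab fits, here is a small C++ sketch
that models the three variants on raw 8-bit lanes.  This is not GCC code: the
names zext8, sext8, DotKind and dot4 are invented for the example, and the
assumption is that in the mixed case the first operand is the signed one and
the second the unsigned one (as with `a` and `b` in the test function).

```cpp
#include <cstddef>
#include <cstdint>

// Zero-extend / sign-extend an 8-bit pattern to 32 bits.
constexpr std::int32_t zext8(std::uint8_t bits) { return bits; }
constexpr std::int32_t sext8(std::uint8_t bits) {
  return bits >= 128 ? std::int32_t(bits) - 256 : std::int32_t(bits);
}

enum class DotKind { Unsigned, Signed, Mixed };

// Four-lane dot product accumulated into ACC.  Mixed means: A is sign
// extended, B is zero extended (an assumption of this sketch).
constexpr std::int32_t dot4(DotKind kind, const std::uint8_t (&a)[4],
                            const std::uint8_t (&b)[4], std::int32_t acc) {
  for (std::size_t i = 0; i < 4; ++i) {
    std::int32_t av = kind == DotKind::Unsigned ? zext8(a[i]) : sext8(a[i]);
    std::int32_t bv = kind == DotKind::Signed ? sext8(b[i]) : zext8(b[i]);
    acc += av * bv;
  }
  return acc;
}

// Lane 0 holds the bit pattern 0xFF in both inputs: -1 if signed, 255 if not.
constexpr std::uint8_t kA[4] = {0xFF, 1, 2, 3};
constexpr std::uint8_t kB[4] = {0xFF, 1, 1, 1};

static_assert(dot4(DotKind::Mixed, kA, kB, 0) == -249, "(-1 * 255) + 6");
static_assert(dot4(DotKind::Unsigned, kA, kB, 0) == 65031, "(255 * 255) + 6");
static_assert(dot4(DotKind::Signed, kA, kB, 0) == 7, "(-1 * -1) + 6");
```

The same input bits give three different sums, so the choice of extension per
operand has to be known when the instruction is selected.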
> > > > > >
> > > > > > > Now if I relax the constraint on the signs of udot and sdot there
> > > > > > > are two problems.  RTL modes don't contain signs, so a target can't
> > > > > > > tell me how the operands will be promoted.  So:
> > > > > > >
> > > > > > > 1) I can't really check which semantics the target will adhere to
> > > > > > >    on expansion.
> > > > > > > 2) At expand time I have no way to differentiate between the two
> > > > > > >    instruction variants; given just modes I can't tell whether I
> > > > > > >    expand to the normal dot-product or the new instruction.
> > > > >
> > > > > Ah, OK.  Indeed with such a weird instruction the new variant makes
> > > sense.
> > > > > Still can you please amend the optab documentation to say which
> > > > > operand is unsigned and which is signed?  Just 'may differ in signs'
> > > > > is bad.
> > > >
> > > > Sure, will expand on it.
> > > >
> > > > >
> > > > > Since the multiplication is commutative I wonder why you need to
> > > > > handle both signed_to_unsigned and unsigned_to_signed - we
> should
> > > > > just enforce a canonical order (like the optab does).
> > > >
> > > > Sure, I thought it would have been better to change the order at
> > > > expand time, but can do so at detection time.
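
A minimal sketch of what canonicalizing at detection time could look like, in
plain C++ rather than GCC internals.  Operand and canonicalize_mixed_sign are
hypothetical names, and putting the unsigned operand first is an assumption
taken from the swap in the posted expand code, not a confirmed convention.

```cpp
#include <utility>

// Hypothetical operand record: a value plus the signedness of its type.
struct Operand {
  int value;
  bool is_unsigned;
};

// Returns true if the pair is mixed-sign.  In that case the operands are
// reordered so that the unsigned one comes first; multiplication is
// commutative, so the product is unchanged.
inline bool canonicalize_mixed_sign(Operand &op0, Operand &op1) {
  if (op0.is_unsigned == op1.is_unsigned)
    return false;
  if (!op0.is_unsigned)
    std::swap(op0, op1);
  return true;
}
```

With the order fixed once at detection, later stages only need to know that
the operation is mixed-sign and not which direction the mismatch goes.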
> > > >
> > > > > > I also think it's a particularly bad fit for the bad
> > > > > > optab_for_tree_code API - would any of that improve when using a
> > > > > > direct internal function here?
> > > >
> > > > Somewhat, but this has considerable knock on effects, e.g. currently
> > > > DOT_PROD is treated as a widening operation and so is handled by
> > > > supportable_widening_operation which does not support calls. There's
> a
> > > > significant number of places which work on the tree EXPR (including
> > > constant folding) which all need to be changed.
> > > >
> > > > > In particular all the changes around optab_subtype look like they
> > > > > make a bad API worse ... at least a single optab_vector_mixed_sign
> > > > > should suffice here, no need to make it a flags kind.
> > > >
> > > > The reason I did so is because depending on where the query is done it
> > > > does use different subtypes currently.  During detection it uses
> > > > optab_default, and during vectorization optab_vector.  For this
> > > > instruction this difference doesn't seem to be used, but did not want to
> > > lose this information in case something depended on it.
> > > >
> > > > But can make it just one.
> > > >
> > > > >
> > > > > > +  /* If we have a sign changing dot product we need to check that the
> > > > > > +     promoted type if unsigned has at least the same precision as the final
> > > > > > +     type of the dot-product.  */
> > > > > +  if (subtype != optab_default)
> > > > > +    {
> > > > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > > > +         && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > > > > +       return NULL;
> > > > > +    }
> > > > >
> > > > > I don't understand this - how do we ever arrive at a result with less
> > > precision?
> > > >
> > > > The user could have manually truncated the results, i.e. in the
> > > > detection code notice `mult`
> > > >
> > > >       int av = a[i];
> > > >       int bv = b[i];
> > > >       SIGNEDNESS_2 short mult = av * bv;
> > > >       res += mult;
> > > >
> > > > which is a short, so it's manually truncating the multiplication which
> > > > is done as int by the instruction. If `mult` is unsigned then it will
> > > > truncate the result if the signed input to usdot was negative, unless
> > > > > the intermediate calculation is of the same precision as the
> > > > instruction. i.e. if mult is unsigned int then there's no truncation
> > > > going on, it's casting from int to unsigned int so it's safe to use
> > > > then as the instruction does the same thing internally.
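
The truncation case can be checked with one loop iteration.  The sketch below
is an illustration only (the function names are made up); it adds one product
to an unsigned 32-bit accumulator through a signed short, an unsigned short,
and an unsigned int intermediate, using av = -1 and bv = 255.

```cpp
#include <cstdint>

// res += (signed short) (av * bv)
constexpr std::uint32_t add_via_s16(std::uint32_t res, int av, int bv) {
  std::int16_t mult = static_cast<std::int16_t>(av * bv);
  return res + static_cast<std::uint32_t>(static_cast<std::int32_t>(mult));
}

// res += (unsigned short) (av * bv): the high bits of a negative product
// are dropped before the addition.
constexpr std::uint32_t add_via_u16(std::uint32_t res, int av, int bv) {
  std::uint16_t mult = static_cast<std::uint16_t>(av * bv);
  return res + static_cast<std::uint32_t>(mult);
}

// res += (unsigned int) (av * bv): same width as the accumulator, so only
// the interpretation changes, not the bits.
constexpr std::uint32_t add_via_u32(std::uint32_t res, int av, int bv) {
  std::uint32_t mult = static_cast<std::uint32_t>(av * bv);
  return res + mult;
}

// -1 * 255 = -255, which is 0xFFFFFF01 as a 32-bit pattern.
static_assert(add_via_s16(0, -1, 255) == 4294967041u, "full-width result");
static_assert(add_via_u32(0, -1, 255) == 4294967041u, "full-width result");
static_assert(add_via_u16(0, -1, 255) == 65281u, "truncated to 0xFF01");
```

Only the unsigned short intermediate changes the sum, which matches the
restriction described above: an unsigned intermediate is safe only when it is
as wide as the accumulation.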
> > >
> > > > It looks to me that we simply should only ever allow sign-changes from
> > > > multiplication result to the sum.  At least your example above is not
> > > > special to mixed sign multiplications, no?
> > >
> > > > > And why's this not an issue for signed multiplication?
> > > >
> > > > It is, but in that case it's handled by the type jousting, which
> > > > doesn't allow the type mismatch. i.e.
> > > >
> > > > #define SIGNEDNESS_1 unsigned
> > > > #define SIGNEDNESS_2 unsigned
> > > > #define SIGNEDNESS_3 signed
> > > > #define SIGNEDNESS_4 signed
> > > >
> > > > SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res,
> > > > SIGNEDNESS_3 char *restrict a,
> > > >    SIGNEDNESS_4 char *restrict b)
> > > > {
> > > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > >     {
> > > >       int av = a[i];
> > > >       int bv = b[i];
> > > >       SIGNEDNESS_2 short mult = av * bv;
> > > >       res += mult;
> > > >     }
> > > >   return res;
> > > > }
> > > >
> > > > Is also not detected as a dot product.  By adding the carve out to the
> > > > widen multiplication detection it now allows this case through so I
> > > > handle it in the detection code.  Thinking about it now, it seems more
> > > > logical to add this case handling inside the type jousting code as I
> > > > don't think it's ever something you'd want.
> > >
> > > Yeah, I think we only need to look through sign changes on the
> multiplication
> > > result.
> > >
> > > > > Also...
> > > > >
> > > > > > +  /* If we have a sign changing dot-product the dot-product itself does any
> > > > > > +     sign conversions, so consume the type and use the unpromoted types.  */
> > > > > +  tree mult_arg1, mult_arg2;
> > > > > +  if (subtype == optab_default)
> > > > > +    {
> > > > > +      mult_arg1 = mult_oprnd[0];
> > > > > +      mult_arg2 = mult_oprnd[1];
> > > > > +    }
> > > > > +  else
> > > > > +    {
> > > > > +      mult_arg1 = unprom0[0].op;
> > > > > +      mult_arg2 = unprom0[1].op;
> > > > > +    }
> > > > > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > > > > -                                     mult_oprnd[0], mult_oprnd[1], oprnd1);
> > > > > > +                                     mult_arg1, mult_arg2, oprnd1);
> > > > >
> > > > > I thought DOT_PROD always performs the promotion.  Maybe
> > > mult_oprnd
> > > > > and unprom0 are just misnamed here?
> > > >
> > > > Somewhat, in a normal dot-product the sign of the multiplication are
> > > > the same here as the "unpromoted" types. So after
> vect_convert_input
> > > > these two types are the same.
> > > >
> > > > However because here the sign changes and to maintain the semantics
> of
> > > > the C code there's an extra conversion here to get the arguments in
> > > > the same sign.  That needs to be stripped before given to the
> > > > instruction which does the conversion internally.
> > >
> > > Yes, but then why's that not done by the detection code?  That is, does it
> > > (mis-)handle the (int)short_a * (int)(unsigned short)short_b where we'd
> > > want the mixed-sign handling and not strip the unsigned short conversion
> > > from short_b?
> > >
> > > Richard.
> > >
> > > >
> > > > Regards,
> > > > Tamar
> > > >
> > > > >
> > > > > Richard.
> > > > >
> > > > > > Regards,
> > > > > > Tamar
> > > > > >
> > > > > > >
> > > > > > > The tree.def docs say the sum is also possibly widening but I
> > > > > > > don't see this covered by the optab so we should eventually
> > > > > > > remove this feature from the tree side.  In fact the tree-cfg.c
> > > > > > > verifier requires the addition to be not widening - thus only
> > > > > > > tree.def needs
> > > > > adjustment.
> > > > > > >
> > > > > > > >  @cindex @code{ssad@var{m}} instruction pattern  @item
> > > > > > > > @samp{ssad@var{m}} diff --git a/gcc/optabs-tree.h
> > > > > > > > b/gcc/optabs-tree.h index
> > > > > > > >
> > > > > > >
> > > > >
> > >
> c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f
> > > > > > > 19
> > > > > > > > 90e0548ba08d 100644
> > > > > > > > --- a/gcc/optabs-tree.h
> > > > > > > > +++ b/gcc/optabs-tree.h
> > > > > > > > @@ -27,11 +27,29 @@ along with GCC; see the file COPYING3.
> If
> > > > > > > > not
> > > > > see
> > > > > > > >     shift amount vs. machines that take a vector for the shift
> amount.
> > > > > > > > */  enum optab_subtype  {
> > > > > > > > -  optab_default,
> > > > > > > > -  optab_scalar,
> > > > > > > > -  optab_vector
> > > > > > > > +  optab_default = 1 << 0,
> > > > > > > > +  optab_scalar = 1 << 1,
> > > > > > > > +  optab_vector = 1 << 2,
> > > > > > > > +  optab_signed_to_unsigned = 1 << 3,
> > > > > > > > + optab_unsigned_to_signed =
> > > > > > > > + 1 << 4
> > > > > > > >  };
> > > > > > > >
> > > > > > > > +/* Override the OrEqual-operator so we can use
> optab_subtype
> > > > > > > > +as a bit flag.  */ inline enum optab_subtype& operator |=
> > > > > > > > +(enum
> > > > > > > optab_subtype&
> > > > > > > > +a, enum optab_subtype b) {
> > > > > > > > +    return a = static_cast<optab_subtype>(static_cast<int>(a)
> > > > > > > > +					  | static_cast<int>(b)); }
> > > > > > > > +
> > > > > > > > +/* Override the Or-operator so we can use optab_subtype as a
> > > > > > > > +bit flag.  */ inline enum optab_subtype operator | (enum
> > > > > > > > +optab_subtype a, enum optab_subtype b) {
> > > > > > > > +    return static_cast<optab_subtype>(static_cast<int>(a)
> > > > > > > > +				      | static_cast<int>(b)); }
> > > > > > > > +
> > > > > > > >  /* Return the optab used for computing the given operation on
> > > > > > > > the type
> > > > > > > given by
> > > > > > > >     the second argument.  The third argument distinguishes
> > > > > > > > between the
> > > > > > > types of
> > > > > > > >     vector shifts and rotates.  */ diff --git
> > > > > > > > a/gcc/optabs-tree.c b/gcc/optabs-tree.c index
> > > > > > > >
> > > > > > >
> > > > >
> > >
> 95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea
> > > > > > > 1e
> > > > > > > > 5c22b7453072 100644
> > > > > > > > --- a/gcc/optabs-tree.c
> > > > > > > > +++ b/gcc/optabs-tree.c
> > > > > > > > @@ -127,7 +127,17 @@ optab_for_tree_code (enum tree_code
> > > code,
> > > > > > > const_tree type,
> > > > > > > >        return TYPE_UNSIGNED (type) ? usum_widen_optab :
> > > > > > > > ssum_widen_optab;
> > > > > > > >
> > > > > > > >      case DOT_PROD_EXPR:
> > > > > > > > -      return TYPE_UNSIGNED (type) ? udot_prod_optab :
> > > > > sdot_prod_optab;
> > > > > > > > +      {
> > > > > > > > +	gcc_assert (subtype & optab_default
> > > > > > > > +		    || subtype & optab_vector
> > > > > > > > +		    || subtype & optab_signed_to_unsigned
> > > > > > > > +		    || subtype & optab_unsigned_to_signed);
> > > > > > > > +
> > > > > > > > +	if (subtype & (optab_unsigned_to_signed |
> > > > > > > optab_signed_to_unsigned))
> > > > > > > > +	  return usdot_prod_optab;
> > > > > > > > +
> > > > > > > > +	return (TYPE_UNSIGNED (type) ? udot_prod_optab :
> > > > > > > sdot_prod_optab);
> > > > > > > > +      }
> > > > > > > >
> > > > > > > >      case SAD_EXPR:
> > > > > > > >        return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
> > > > > > > > diff --git a/gcc/optabs.c b/gcc/optabs.c index
> > > > > > > >
> > > > > > >
> > > > >
> > >
> f4614a394587787293dc8b680a38901f7906f61c..2e18b76de1412eab71971753ac
> > > > > > > 67
> > > > > > > > 8597c0d00098 100644
> > > > > > > > --- a/gcc/optabs.c
> > > > > > > > +++ b/gcc/optabs.c
> > > > > > > > @@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops
> ops,
> > > > > > > > rtx op0,
> > > > > > > rtx op1, rtx wide_op,
> > > > > > > >    bool sbool = false;
> > > > > > > >
> > > > > > > >    oprnd0 = ops->op0;
> > > > > > > > +  if (nops >= 2)
> > > > > > > > +    oprnd1 = ops->op1;
> > > > > > > > +  if (nops >= 3)
> > > > > > > > +    oprnd2 = ops->op2;
> > > > > > > > +
> > > > > > > >    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
> > > > > > > >    if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
> > > > > > > >        || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR) @@ -
> > > 285,6
> > > > > > > +290,27
> > > > > > > > @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1,
> > > > > > > > rtx
> > > > > > > wide_op,
> > > > > > > >  	   ? vec_unpacks_sbool_hi_optab :
> > > vec_unpacks_sbool_lo_optab);
> > > > > > > >        sbool = true;
> > > > > > > >      }
> > > > > > > > +  else if (ops->code == DOT_PROD_EXPR)
> > > > > > > > +    {
> > > > > > > > +      enum optab_subtype subtype = optab_default;
> > > > > > > > +      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
> > > > > > > > +      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
> > > > > > > > +      if (sign1 == sign2)
> > > > > > > > +	;
> > > > > > > > +      else if (sign1 == SIGNED && sign2 == UNSIGNED)
> > > > > > > > +	{
> > > > > > > > +	  subtype |= optab_signed_to_unsigned;
> > > > > > > > +	  /* Same as optab_unsigned_to_signed but flip the
> > > operands.  */
> > > > > > > > +	  std::swap (op0, op1);
> > > > > > > > +	}
> > > > > > > > +      else if (sign1 == UNSIGNED && sign2 == SIGNED)
> > > > > > > > +	subtype |= optab_unsigned_to_signed;
> > > > > > > > +      else
> > > > > > > > +	gcc_unreachable ();
> > > > > > > > +
> > > > > > > > +      widen_pattern_optab
> > > > > > > > +	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0),
> > > subtype);
> > > > > > > > +    }
> > > > > > > >    else
> > > > > > > >      widen_pattern_optab
> > > > > > > >        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0),
> > > > > > > > optab_default); @@ -298,10 +324,7 @@
> > > expand_widen_pattern_expr
> > > > > > > (sepops ops, rtx op0, rtx op1, rtx wide_op,
> > > > > > > >    gcc_assert (icode != CODE_FOR_nothing);
> > > > > > > >
> > > > > > > >    if (nops >= 2)
> > > > > > > > -    {
> > > > > > > > -      oprnd1 = ops->op1;
> > > > > > > > -      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > > > > > -    }
> > > > > > > > +    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > > > > >    else if (sbool)
> > > > > > > >      {
> > > > > > > >        nops = 2;
> > > > > > > > @@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops
> ops,
> > > rtx
> > > > > > > > op0,
> > > > > > > rtx op1, rtx wide_op,
> > > > > > > >      {
> > > > > > > >        gcc_assert (tmode1 == tmode0);
> > > > > > > >        gcc_assert (op1);
> > > > > > > > -      oprnd2 = ops->op2;
> > > > > > > >        wmode = TYPE_MODE (TREE_TYPE (oprnd2));
> > > > > > > >      }
> > > > > > > >
> > > > > > > > diff --git a/gcc/optabs.def b/gcc/optabs.def index
> > > > > > > >
> > > > > > >
> > > > >
> > >
> b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbd
> > > > > > > b7c
> > > > > > > > 18615baae928 100644
> > > > > > > > --- a/gcc/optabs.def
> > > > > > > > +++ b/gcc/optabs.def
> > > > > > > > @@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab,
> "uavg$a3_ceil")
> > > > > > > OPTAB_D
> > > > > > > > (sdot_prod_optab, "sdot_prod$I$a")  OPTAB_D
> > > (ssum_widen_optab,
> > > > > > > > "widen_ssum$I$a3")  OPTAB_D (udot_prod_optab,
> > > "udot_prod$I$a")
> > > > > > > > +OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> > > > > > > >  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
> OPTAB_D
> > > > > > > (usad_optab,
> > > > > > > > "usad$I$a")  OPTAB_D (ssad_optab, "ssad$I$a") diff --git
> > > > > > > > a/gcc/tree-cfg.c b/gcc/tree-cfg.c index
> > > > > > > >
> > > > > > >
> > > > >
> > >
> 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..58b55bb648ad97d514f1fa18bb
> > > > > > > 00
> > > > > > > > 808fd2678b42 100644
> > > > > > > > --- a/gcc/tree-cfg.c
> > > > > > > > +++ b/gcc/tree-cfg.c
> > > > > > > > @@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign
> > > *stmt)
> > > > > > > >  		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
> > > > > > > >  		 || (!INTEGRAL_TYPE_P (lhs_type)
> > > > > > > >  		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> > > > > > > > -	    || !types_compatible_p (rhs1_type, rhs2_type)
> > > > > > > > +	    || (!types_compatible_p (rhs1_type, rhs2_type)
> > > > > > > > +		&& TYPE_SIGN (rhs1_type) == TYPE_SIGN
> > > (rhs2_type))
> > > > > > >
> > > > > > > That's not restrictive enough.  I suggest you use
> > > > > > >
> > > > > > >             && element_precision (rhs1_type) !=
> > > > > > > element_precision
> > > > > > > (rhs2_type)
> > > > > > >
> > > > > > > instead.
> > > > > > >
> > > > > > > As said, I'm not sure all the changes in this patch are required.
> > > > > > >
> > > > > > > Please elaborate.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Richard.
> > > > > > >
> > > > > > > >  	    || !useless_type_conversion_p (lhs_type, rhs3_type)
> > > > > > > >  	    || maybe_lt (GET_MODE_SIZE (element_mode
> > > (rhs3_type)),
> > > > > > > >  			 2 * GET_MODE_SIZE (element_mode
> > > (rhs1_type))))
> > > > > > > diff --git
> > > > > > > > a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index
> > > > > > > >
> > > > > > >
> > > > >
> > >
> 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..cb8f5fbb6abca181c4171194d1
> > > > > > > 9f
> > > > > > > > ec29ec6e4176 100644
> > > > > > > > --- a/gcc/tree-vect-loop.c
> > > > > > > > +++ b/gcc/tree-vect-loop.c
> > > > > > > > @@ -6401,6 +6401,33 @@ build_vect_cond_expr (enum
> tree_code
> > > > > code,
> > > > > > > tree vop[3], tree mask,
> > > > > > > >      }
> > > > > > > >  }
> > > > > > > >
> > > > > > > > > +/* Determine the optab_subtype to use for the given CODE and STMT.  For
> > > > > > > > > +   most CODE this will be optab_vector, however for certain operations such as
> > > > > > > > > +   DOT_PROD_EXPR where the operation can have different signs for the operands we
> > > > > > > > > +   need to be able to pick the right optabs.  */
> > > > > > > > +
> > > > > > > > +static enum optab_subtype
> > > > > > > > +vect_determine_dot_kind (tree_code code, stmt_vec_info
> > > > > > > > +stmt_vinfo) {
> > > > > > > > +  enum optab_subtype subtype = optab_vector;
> > > > > > > > +  switch (code)
> > > > > > > > +    {
> > > > > > > > +      case DOT_PROD_EXPR:
> > > > > > > > +	{
> > > > > > > > +	  gassign *stmt = as_a <gassign *> (STMT_VINFO_STMT
> > > (stmt_vinfo));
> > > > > > > > +	  signop rhs1_sign = TYPE_SIGN (TREE_TYPE
> > > > > > > > +(gimple_assign_rhs1
> > > > > > > (stmt)));
> > > > > > > > +	  signop rhs2_sign = TYPE_SIGN (TREE_TYPE
> > > > > > > > +(gimple_assign_rhs2
> > > > > > > (stmt)));
> > > > > > > > +	  if (rhs1_sign != rhs2_sign)
> > > > > > > > +	    subtype |= optab_unsigned_to_signed;
> > > > > > > > +	  break;
> > > > > > > > +	}
> > > > > > > > +      default:
> > > > > > > > +	break;
> > > > > > > > +    }
> > > > > > > > +
> > > > > > > > +  return subtype;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  /* Function vectorizable_reduction.
> > > > > > > >
> > > > > > > >     Check if STMT_INFO performs a reduction operation that can
> > > > > > > > be
> > > > > > > vectorized.
> > > > > > > > @@ -7189,7 +7216,8 @@ vectorizable_reduction (loop_vec_info
> > > > > > > loop_vinfo,
> > > > > > > >        bool ok = true;
> > > > > > > >
> > > > > > > >        /* 4.1. check support for the operation in the loop  */
> > > > > > > > -      optab optab = optab_for_tree_code (code, vectype_in,
> > > > > optab_vector);
> > > > > > > > +      enum optab_subtype subtype = vect_determine_dot_kind
> > > > > > > > + (code,
> > > > > > > stmt_info);
> > > > > > > > +      optab optab = optab_for_tree_code (code, vectype_in,
> > > > > > > > + subtype);
> > > > > > > >        if (!optab)
> > > > > > > >  	{
> > > > > > > >  	  if (dump_enabled_p ())
> > > > > > > > diff --git a/gcc/tree-vect-patterns.c
> > > > > > > > b/gcc/tree-vect-patterns.c index
> > > > > > > >
> > > > > > >
> > > > >
> > >
> 441d6cd28c4eaded7abd756164890dbcffd2f3b8..943c001fb13777b4d1513841f
> > > > > > > a84
> > > > > > > > 942316846d5e 100644
> > > > > > > > --- a/gcc/tree-vect-patterns.c
> > > > > > > > +++ b/gcc/tree-vect-patterns.c
> > > > > > > > @@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info
> > > > > > > > *vinfo, tree
> > > > > > > > var)  static bool  vect_supportable_direct_optab_p (vec_info
> > > > > > > > *vinfo, tree otype, tree_code code,
> > > > > > > >  				 tree itype, tree *vecotype_out,
> > > > > > > > -				 tree *vecitype_out = NULL)
> > > > > > > > +				 tree *vecitype_out = NULL,
> > > > > > > > +				 enum optab_subtype subtype =
> > > > > > > optab_default)
> > > > > > > >  {
> > > > > > > >    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
> > > > > > > >    if (!vecitype)
> > > > > > > > @@ -211,7 +212,7 @@ vect_supportable_direct_optab_p
> (vec_info
> > > > > > > > *vinfo,
> > > > > > > tree otype, tree_code code,
> > > > > > > >    if (!vecotype)
> > > > > > > >      return false;
> > > > > > > >
> > > > > > > > -  optab optab = optab_for_tree_code (code, vecitype,
> > > > > > > > optab_default);
> > > > > > > > +  optab optab = optab_for_tree_code (code, vecitype,
> > > > > > > > + subtype);
> > > > > > > >    if (!optab)
> > > > > > > >      return false;
> > > > > > > >
> > > > > > > > @@ -487,14 +488,31 @@ vect_joust_widened_integer (tree
> type,
> > > > > > > > bool shift_p, tree op,  }
> > > > > > > >
> > > > > > > >  /* Return true if the common supertype of NEW_TYPE and
> > > > > > > *COMMON_TYPE
> > > > > > > > -   is narrower than type, storing the supertype in
> *COMMON_TYPE
> > > if
> > > > > so.
> > > > > > > */
> > > > > > > > +   is narrower than type, storing the supertype in
> > > > > > > > + *COMMON_TYPE if
> > > > > so.
> > > > > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then accept that
> > > > > *COMMON_TYPE
> > > > > > > and NEW_TYPE
> > > > > > > > +   may be of different signs but equal precision.   */
> > > > > > > >
> > > > > > > >  static bool
> > > > > > > > -vect_joust_widened_type (tree type, tree new_type, tree
> > > > > > > *common_type)
> > > > > > > > +vect_joust_widened_type (tree type, tree new_type, tree
> > > > > > > *common_type,
> > > > > > > > +			 bool allow_short_sign_mismatch = false)
> > > > > > > >  {
> > > > > > > >    if (types_compatible_p (*common_type, new_type))
> > > > > > > >      return true;
> > > > > > > >
> > > > > > > > +  /* Check if the mismatch is only in the sign and if we have
> > > > > > > > +     allow_short_sign_mismatch then allow it.  */
> > > > > > > > +  if (allow_short_sign_mismatch
> > > > > > > > +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> > > > > > > > +    {
> > > > > > > > +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> > > > > > > > +      tree eq_type
> > > > > > > > +	= build_nonstandard_integer_type (TYPE_PRECISION
> > > (new_type),
> > > > > > > > +					  sign);
> > > > > > > > +
> > > > > > > > +      if (types_compatible_p (*common_type, eq_type))
> > > > > > > > +	return true;
> > > > > > > > +    }
> > > > > > > > +
> > > > > > > >    /* See if *COMMON_TYPE can hold all values of NEW_TYPE.
> */
> > > > > > > >    if ((TYPE_PRECISION (new_type) < TYPE_PRECISION
> > > (*common_type))
> > > > > > > >        && (TYPE_UNSIGNED (new_type) || !TYPE_UNSIGNED
> > > > > > > (*common_type)))
> > > > > > > > @@ -532,6 +550,9 @@ vect_joust_widened_type (tree type,
> tree
> > > > > > > new_type, tree *common_type)
> > > > > > > >     to a type that (a) is narrower than the result of STMT_INFO
> and
> > > > > > > >     (b) can hold all leaf operand values.
> > > > > > > >
> > > > > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then allow that the signs
> of
> > > > > > > > + the
> > > > > > > operands
> > > > > > > > +   may differ in signs but not in precision.
> > > > > > > > +
> > > > > > > >     Return 0 if STMT_INFO isn't such a tree, or if no such
> > > COMMON_TYPE
> > > > > > > >     exists.  */
> > > > > > > >
> > > > > > > > @@ -539,7 +560,8 @@ static unsigned int
> vect_widened_op_tree
> > > > > > > > (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> > > > > > > >  		      tree_code widened_code, bool shift_p,
> > > > > > > >  		      unsigned int max_nops,
> > > > > > > > -		      vect_unpromoted_value *unprom, tree
> > > *common_type)
> > > > > > > > +		      vect_unpromoted_value *unprom, tree
> > > *common_type,
> > > > > > > > +		      bool allow_short_sign_mismatch = false)
> > > > > > > >  {
> > > > > > > >    /* Check for an integer operation with the right code.  */
> > > > > > > >    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
> > > > > > > > @@
> > > > > > > > -600,7
> > > > > > > > +622,8 @@ vect_widened_op_tree (vec_info *vinfo,
> > > stmt_vec_info
> > > > > > > stmt_info, tree_code code,
> > > > > > > >  		= vinfo->lookup_def (this_unprom->op);
> > > > > > > >  	      nops = vect_widened_op_tree (vinfo, def_stmt_info,
> > > code,
> > > > > > > >  					   widened_code, shift_p,
> > > max_nops,
> > > > > > > > -					   this_unprom,
> > > common_type);
> > > > > > > > +					   this_unprom,
> > > common_type,
> > > > > > > > +
> > > allow_short_sign_mismatch);
> > > > > > > >  	      if (nops == 0)
> > > > > > > >  		return 0;
> > > > > > > >
> > > > > > > > @@ -617,7 +640,8 @@ vect_widened_op_tree (vec_info *vinfo,
> > > > > > > stmt_vec_info stmt_info, tree_code code,
> > > > > > > >  	      if (i == 0)
> > > > > > > >  		*common_type = this_unprom->type;
> > > > > > > >  	      else if (!vect_joust_widened_type (type, this_unprom-
> > > >type,
> > > > > > > > -						 common_type))
> > > > > > > > +						 common_type,
> > > > > > > > +
> > > allow_short_sign_mismatch))
> > > > > > > >  		return 0;
> > > > > > > >  	    }
> > > > > > > >  	}
> > > > > > > > @@ -888,21 +912,24 @@ vect_reassociating_reduction_p
> (vec_info
> > > > > > > > *vinfo,
> > > > > > > >
> > > > > > > >     Try to find the following pattern:
> > > > > > > >
> > > > > > > > -     type x_t, y_t;
> > > > > > > > +     type1a x_t
> > > > > > > > +     type1b y_t;
> > > > > > > >       TYPE1 prod;
> > > > > > > >       TYPE2 sum = init;
> > > > > > > >     loop:
> > > > > > > >       sum_0 = phi <init, sum_1>
> > > > > > > >       S1  x_t = ...
> > > > > > > >       S2  y_t = ...
> > > > > > > > -     S3  x_T = (TYPE1) x_t;
> > > > > > > > -     S4  y_T = (TYPE1) y_t;
> > > > > > > > +     S3  x_T = (TYPE3) x_t;
> > > > > > > > +     S4  y_T = (TYPE4) y_t;
> > > > > > > >       S5  prod = x_T * y_T;
> > > > > > > >       [S6  prod = (TYPE2) prod;  #optional]
> > > > > > > >       S7  sum_1 = prod + sum_0;
> > > > > > > >
> > > > > > > > -   where 'TYPE1' is exactly double the size of type 'type', and
> 'TYPE2'
> > > is
> > > > > the
> > > > > > > > -   same size of 'TYPE1' or bigger. This is a special case of a
> reduction
> > > > > > > > +   where 'TYPE1' is exactly double the size of type 'type1a' and
> > > 'type1b',
> > > > > > > > +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the
> > > sign of
> > > > > > > > +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of
> 'TYPE1'
> > > or
> > > > > > > > +   bigger and must be the same sign. This is a special case
> > > > > > > > + of a reduction
> > > > > > > >     computation.
> > > > > > > >
> > > > > > > >     Input:
> > > > > > > > @@ -939,15 +966,16 @@ vect_recog_dot_prod_pattern
> (vec_info
> > > > > > > > *vinfo,
> > > > > > > >
> > > > > > > >    /* Look for the following pattern
> > > > > > > >            DX = (TYPE1) X;
> > > > > > > > -          DY = (TYPE1) Y;
> > > > > > > > +	  DY = (TYPE2) Y;
> > > > > > > >            DPROD = DX * DY;
> > > > > > > > -          DDPROD = (TYPE2) DPROD;
> > > > > > > > +	  DDPROD = (TYPE3) DPROD;
> > > > > > > >            sum_1 = DDPROD + sum_0;
> > > > > > > >       In which
> > > > > > > >       - DX is double the size of X
> > > > > > > >       - DY is double the size of Y
> > > > > > > >       - DX, DY, DPROD all have the same type but the sign
> > > > > > > > -       between DX, DY and DPROD can differ.
> > > > > > > > +       between DX, DY and DPROD can differ. The sign of DPROD
> > > > > > > > +       is one of the signs of DX or DY.
> > > > > > > >       - sum is the same size of DPROD or bigger
> > > > > > > >       - sum has been recognized as a reduction variable.
> > > > > > > >
> > > > > > > > @@ -986,14 +1014,41 @@ vect_recog_dot_prod_pattern
> (vec_info
> > > > > *vinfo,
> > > > > > > >       inside the loop (in case we are analyzing an outer-loop).  */
> > > > > > > >    vect_unpromoted_value unprom0[2];
> > > > > > > >    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
> > > > > > > WIDEN_MULT_EXPR,
> > > > > > > > -			     false, 2, unprom0, &half_type))
> > > > > > > > +			     false, 2, unprom0, &half_type, true))
> > > > > > > >      return NULL;
> > > > > > > >
> > > > > > > > > +  /* Check to see if there is a sign change happening in the operands of the
> > > > > > > > > +     multiplication and pick the appropriate optab subtype.  */
> > > > > > > > +  enum optab_subtype subtype;
> > > > > > > > +  tree rhs_type1 = unprom0[0].type;
> > > > > > > > +  tree rhs_type2 = unprom0[1].type;
> > > > > > > > +  if (TYPE_SIGN (rhs_type1) == TYPE_SIGN (rhs_type2))
> > > > > > > > +     subtype = optab_default;
> > > > > > > > +  else if (TYPE_SIGN (rhs_type1) == SIGNED
> > > > > > > > +	   && TYPE_SIGN (rhs_type2) == UNSIGNED)
> > > > > > > > +     subtype = optab_signed_to_unsigned;
> > > > > > > > +  else if (TYPE_SIGN (rhs_type1) == UNSIGNED
> > > > > > > > +	   && TYPE_SIGN (rhs_type2) == SIGNED)
> > > > > > > > +     subtype = optab_unsigned_to_signed;
> > > > > > > > +  else
> > > > > > > > +    gcc_unreachable ();
> > > > > > > > +
> > > > > > > > > +  /* If we have a sign changing dot product we need to check that the
> > > > > > > > > +     promoted type if unsigned has at least the same precision as the final
> > > > > > > > > +     type of the dot-product.  */
> > > > > > > > +  if (subtype != optab_default)
> > > > > > > > +    {
> > > > > > > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > > > > > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > > > > > > +	  && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > > > > > > > +	return NULL;
> > > > > > > > +    }
> > > > > > > > +
> > > > > > > >    vect_pattern_detected ("vect_recog_dot_prod_pattern",
> > > > > > > > last_stmt);
> > > > > > > >
> > > > > > > >    tree half_vectype;
> > > > > > > >    if (!vect_supportable_direct_optab_p (vinfo, type,
> > > > > > > > DOT_PROD_EXPR,
> > > > > > > half_type,
> > > > > > > > -					type_out, &half_vectype))
> > > > > > > > +					type_out, &half_vectype,
> > > subtype))
> > > > > > > >      return NULL;
> > > > > > > >
> > > > > > > > >    /* Get the inputs in the appropriate types.  */
> > > > > > > > > @@ -1002,8 +1057,22 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> > > > > > > > >  		       unprom0, half_vectype);
> > > > > > > > >
> > > > > > > > >    var = vect_recog_temp_ssa_var (type, NULL);
> > > > > > > > > +
> > > > > > > > > +  /* If we have a sign changing dot-product the dot-product itself does any
> > > > > > > > > +     sign conversions, so consume the type and use the unpromoted types.  */
> > > > > > > > > +  tree mult_arg1, mult_arg2;
> > > > > > > > > +  if (subtype == optab_default)
> > > > > > > > +    {
> > > > > > > > +      mult_arg1 = mult_oprnd[0];
> > > > > > > > +      mult_arg2 = mult_oprnd[1];
> > > > > > > > +    }
> > > > > > > > +  else
> > > > > > > > +    {
> > > > > > > > +      mult_arg1 = unprom0[0].op;
> > > > > > > > +      mult_arg2 = unprom0[1].op;
> > > > > > > > +    }
> > > > > > > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > > > > > > -				      mult_oprnd[0], mult_oprnd[1],
> > > oprnd1);
> > > > > > > > +				      mult_arg1, mult_arg2, oprnd1);
> > > > > > > >
> > > > > > > >    return pattern_stmt;
> > > > > > > >  }
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> > > > > > > > Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF:
> > > > > > > > Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <rguenther@suse.de>
> > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > > > > Nuernberg, Germany; GF: Felix Imend?rffer; HRB 36809 (AG
> Nuernberg)
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > > Nuernberg, Germany; GF: Felix Imend?rffer; HRB 36809 (AG Nuernberg)
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> Nuernberg,
> Germany; GF: Felix Imend

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
  2021-06-02  9:28               ` Tamar Christina
@ 2021-06-04 10:12                 ` Tamar Christina
  2021-06-07 10:10                   ` Richard Sandiford
  0 siblings, 1 reply; 35+ messages in thread
From: Tamar Christina @ 2021-06-04 10:12 UTC (permalink / raw)
  To: Tamar Christina, Richard Biener; +Cc: Richard Sandiford, nd, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 69585 bytes --]

Hi Richi,

Attached is the re-spun patch.  tree_nop_conversion_p was very handy in cleaning up the patch, thanks!

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master if Richard S has no comments?

Thanks,
Tamar

gcc/ChangeLog:

	* optabs.def (usdot_prod_optab): New.
	* doc/md.texi: Document it and clarify other dot prod optabs.
	* optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
	* optabs.c (expand_widen_pattern_expr): Likewise.
	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
	* tree-vect-loop.c (vectorizable_reduction): Query dot-product kind.
	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
	optab subtype.
	(vect_joust_widened_type, vect_widened_op_tree): Optionally ignore
	mismatch types.
	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
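
For readers following the md.texi hunk below, the three dot-product optabs can be modelled in scalar C.  This is only a sketch under stated assumptions: the ref_* function names are hypothetical, 8-bit inputs and a 32-bit accumulator are picked for illustration, and the usdot model follows the prose in the patch (operand 1 unsigned, operand 2 signed) with the unsigned accumulator of the documented signature.  It is not code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference models, one per documented optab.  All names are
   hypothetical and exist only for this sketch.  */

/* sdot_prod: both inputs are signed, so both are sign-extended.  */
static int32_t
ref_sdot (int32_t acc, const int8_t *a, const int8_t *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    acc += (int32_t) a[i] * (int32_t) b[i];
  return acc;
}

/* udot_prod: both inputs are unsigned, so both are zero-extended.  */
static uint32_t
ref_udot (uint32_t acc, const uint8_t *a, const uint8_t *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    acc += (uint32_t) a[i] * (uint32_t) b[i];
  return acc;
}

/* usdot_prod: operand 1 is unsigned (zero-extended) and operand 2 is
   signed (sign-extended).  The product is formed as a signed 32-bit
   value and then added with unsigned wrap-around.  */
static uint32_t
ref_usdot (uint32_t acc, const uint8_t *a, const int8_t *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      int32_t prod = (int32_t) a[i] * (int32_t) b[i];
      acc += (uint32_t) prod;
    }
  return acc;
}
```

The point of the mixed-sign case is that each input is widened according to its own signedness before the multiply; widening both the same way would give a different product whenever the unsigned input is above 127.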


--- inline copy of patch ---

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..9fad3322b3f1eb2a836833bb390df78f0cd9734b 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5438,13 +5438,55 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
 
 @cindex @code{sdot_prod@var{m}} instruction pattern
 @item @samp{sdot_prod@var{m}}
+
+Compute the sum of the products of two signed elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+sdot<signed c, signed a, signed b> ==
+   res = sign-ext (a) * sign-ext (b) + c
+@dots{}
+@end smallexample
+
 @cindex @code{udot_prod@var{m}} instruction pattern
-@itemx @samp{udot_prod@var{m}}
-Compute the sum of the products of two signed/unsigned elements.
-Operand 1 and operand 2 are of the same mode. Their product, which is of a
-wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
-wider than the mode of the product. The result is placed in operand 0, which
-is of the same mode as operand 3.
+@item @samp{udot_prod@var{m}}
+
+Compute the sum of the products of two unsigned elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+udot<unsigned c, unsigned a, unsigned b> ==
+   res = zero-ext (a) * zero-ext (b) + c
+@dots{}
+@end smallexample
+
+
+
+@cindex @code{usdot_prod@var{m}} instruction pattern
+@item @samp{usdot_prod@var{m}}
+Compute the sum of the products of elements of different signs.
+Operand 1 must be unsigned and operand 2 signed. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+usdot<unsigned c, unsigned a, signed b> ==
+   res = ((unsigned-conv) zero-ext (a)) * sign-ext (b) + c
+@dots{}
+@end smallexample
 
 @cindex @code{ssad@var{m}} instruction pattern
 @item @samp{ssad@var{m}}
diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
index c3aaa1a416991e856d3e24da45968a92ebada82c..fbd2b06b8dbfd560dfb66b314830e6b564b37abb 100644
--- a/gcc/optabs-tree.h
+++ b/gcc/optabs-tree.h
@@ -29,7 +29,8 @@ enum optab_subtype
 {
   optab_default,
   optab_scalar,
-  optab_vector
+  optab_vector,
+  optab_vector_mixed_sign
 };
 
 /* Return the optab used for computing the given operation on the type given by
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 95ffe397c23e80c105afea52e9d47216bf52f55a..eeb5aeed3202cc6971b6447994bc5311e9c010bb 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -127,7 +127,12 @@ optab_for_tree_code (enum tree_code code, const_tree type,
       return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
 
     case DOT_PROD_EXPR:
-      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
+      {
+	if (subtype == optab_vector_mixed_sign)
+	  return usdot_prod_optab;
+
+	return (TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab);
+      }
 
     case SAD_EXPR:
       return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
diff --git a/gcc/optabs.c b/gcc/optabs.c
index f4614a394587787293dc8b680a38901f7906f61c..d9b64441d0e0726afee89dc9c937350451e7670d 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   bool sbool = false;
 
   oprnd0 = ops->op0;
+  if (nops >= 2)
+    oprnd1 = ops->op1;
+  if (nops >= 3)
+    oprnd2 = ops->op2;
+
   tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
   if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
       || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
@@ -285,6 +290,27 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
 	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
       sbool = true;
     }
+  else if (ops->code == DOT_PROD_EXPR)
+    {
+      enum optab_subtype subtype = optab_default;
+      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
+      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
+      if (sign1 == sign2)
+	;
+      else if (sign1 == SIGNED && sign2 == UNSIGNED)
+	{
+	  subtype = optab_vector_mixed_sign;
+	  /* Same as optab_vector_mixed_sign but flip the operands.  */
+	  std::swap (op0, op1);
+	}
+      else if (sign1 == UNSIGNED && sign2 == SIGNED)
+	subtype = optab_vector_mixed_sign;
+      else
+	gcc_unreachable ();
+
+      widen_pattern_optab
+	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
+    }
   else
     widen_pattern_optab
       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
@@ -298,10 +324,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   gcc_assert (icode != CODE_FOR_nothing);
 
   if (nops >= 2)
-    {
-      oprnd1 = ops->op1;
-      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
-    }
+    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
   else if (sbool)
     {
       nops = 2;
@@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
     {
       gcc_assert (tmode1 == tmode0);
       gcc_assert (op1);
-      oprnd2 = ops->op2;
       wmode = TYPE_MODE (TREE_TYPE (oprnd2));
     }
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbdb7c18615baae928 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
 OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
 OPTAB_D (udot_prod_optab, "udot_prod$I$a")
+OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
 OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..0128891852fcd74fe31cd338614e90a26256b4bd 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign *stmt)
 		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
 		 || (!INTEGRAL_TYPE_P (lhs_type)
 		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
-	    || !types_compatible_p (rhs1_type, rhs2_type)
+	    /* rhs1_type and rhs2_type may differ in sign.  */
+	    || !tree_nop_conversion_p (rhs1_type, rhs2_type)
 	    || !useless_type_conversion_p (lhs_type, rhs3_type)
 	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
 			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..756d2867b678d0d8394202c6adb03d9cd26029e7 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6662,6 +6662,12 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   bool lane_reduc_code_p
     = (code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code == SAD_EXPR);
   int op_type = TREE_CODE_LENGTH (code);
+  enum optab_subtype optab_query_kind = optab_vector;
+  if (code == DOT_PROD_EXPR
+      && TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)))
+	   != TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt))))
+    optab_query_kind = optab_vector_mixed_sign;
+
 
   scalar_dest = gimple_assign_lhs (stmt);
   scalar_type = TREE_TYPE (scalar_dest);
@@ -7189,7 +7195,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       bool ok = true;
 
       /* 4.1. check support for the operation in the loop  */
-      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
+      optab optab = optab_for_tree_code (code, vectype_in, optab_query_kind);
       if (!optab)
 	{
 	  if (dump_enabled_p ())
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 441d6cd28c4eaded7abd756164890dbcffd2f3b8..82123b96313e6783ea214b9259805d65c07d8858 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
 static bool
 vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
 				 tree itype, tree *vecotype_out,
-				 tree *vecitype_out = NULL)
+				 tree *vecitype_out = NULL,
+				 enum optab_subtype subtype = optab_default)
 {
   tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
   if (!vecitype)
@@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
   if (!vecotype)
     return false;
 
-  optab optab = optab_for_tree_code (code, vecitype, optab_default);
+  optab optab = optab_for_tree_code (code, vecitype, subtype);
   if (!optab)
     return false;
 
@@ -487,10 +488,14 @@ vect_joust_widened_integer (tree type, bool shift_p, tree op,
 }
 
 /* Return true if the common supertype of NEW_TYPE and *COMMON_TYPE
-   is narrower than type, storing the supertype in *COMMON_TYPE if so.  */
+   is narrower than type, storing the supertype in *COMMON_TYPE if so.
+   If UNPROM_TYPE then accept that *COMMON_TYPE and NEW_TYPE may be of
+   different signs but equal precision and that the resulting
+   multiplication of them be compatible with UNPROM_TYPE.   */
 
 static bool
-vect_joust_widened_type (tree type, tree new_type, tree *common_type)
+vect_joust_widened_type (tree type, tree new_type, tree *common_type,
+			 tree unprom_type = NULL)
 {
   if (types_compatible_p (*common_type, new_type))
     return true;
@@ -514,7 +519,18 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
   unsigned int precision = MAX (TYPE_PRECISION (*common_type),
 				TYPE_PRECISION (new_type));
   precision *= 2;
-  if (precision * 2 > TYPE_PRECISION (type))
+
+  /* Check if the mismatch is only in the sign and if we have
+     UNPROM_TYPE then allow it if there is enough precision to
+     not lose any information during the conversion.  */
+  if (unprom_type
+      && TYPE_SIGN (unprom_type) == SIGNED
+      && tree_nop_conversion_p (*common_type, new_type))
+	return true;
+
+  /* The resulting application is unsigned, check if we have enough
+     precision to perform the operation.  */
+  if (precision * 2 > TYPE_PRECISION (unprom_type ? unprom_type : type))
     return false;
 
   *common_type = build_nonstandard_integer_type (precision, false);
@@ -532,6 +548,10 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
    to a type that (a) is narrower than the result of STMT_INFO and
    (b) can hold all leaf operand values.
 
+   If UNPROM_TYPE then allow that the signs of the operands
+   may differ in signs but not in precision and that the resulting type
+   of the operation on the operands is compatible with UNPROM_TYPE.
+
    Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
    exists.  */
 
@@ -539,7 +559,8 @@ static unsigned int
 vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		      tree_code widened_code, bool shift_p,
 		      unsigned int max_nops,
-		      vect_unpromoted_value *unprom, tree *common_type)
+		      vect_unpromoted_value *unprom, tree *common_type,
+		      tree unprom_type = NULL)
 {
   /* Check for an integer operation with the right code.  */
   gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
@@ -600,7 +621,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		= vinfo->lookup_def (this_unprom->op);
 	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
 					   widened_code, shift_p, max_nops,
-					   this_unprom, common_type);
+					   this_unprom, common_type,
+					   unprom_type);
 	      if (nops == 0)
 		return 0;
 
@@ -617,7 +639,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 	      if (i == 0)
 		*common_type = this_unprom->type;
 	      else if (!vect_joust_widened_type (type, this_unprom->type,
-						 common_type))
+						 common_type, unprom_type))
 		return 0;
 	    }
 	}
@@ -799,12 +821,15 @@ vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
 }
 
 /* Invoke vect_convert_input for N elements of UNPROM and store the
-   result in the corresponding elements of RESULT.  */
+   result in the corresponding elements of RESULT.
+
+   If ALLOW_SHORT_SIGN_MISMATCH then don't convert the types if they only
+   differ by sign.  */
 
 static void
 vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
 		     tree *result, tree type, vect_unpromoted_value *unprom,
-		     tree vectype)
+		     tree vectype, bool allow_short_sign_mismatch = false)
 {
   for (unsigned int i = 0; i < n; ++i)
     {
@@ -812,8 +837,12 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
       for (j = 0; j < i; ++j)
 	if (unprom[j].op == unprom[i].op)
 	  break;
+
       if (j < i)
 	result[i] = result[j];
+      else if (allow_short_sign_mismatch
+	       && tree_nop_conversion_p (type, unprom[i].type))
+	result[i] = unprom[i].op;
       else
 	result[i] = vect_convert_input (vinfo, stmt_info,
 					type, &unprom[i], vectype);
@@ -888,21 +917,24 @@ vect_reassociating_reduction_p (vec_info *vinfo,
 
    Try to find the following pattern:
 
-     type x_t, y_t;
+     type1a x_t
+     type1b y_t;
      TYPE1 prod;
      TYPE2 sum = init;
    loop:
      sum_0 = phi <init, sum_1>
      S1  x_t = ...
      S2  y_t = ...
-     S3  x_T = (TYPE1) x_t;
-     S4  y_T = (TYPE1) y_t;
+     S3  x_T = (TYPE3) x_t;
+     S4  y_T = (TYPE4) y_t;
      S5  prod = x_T * y_T;
      [S6  prod = (TYPE2) prod;  #optional]
      S7  sum_1 = prod + sum_0;
 
-   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
-   same size of 'TYPE1' or bigger. This is a special case of a reduction
+   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
+   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
+   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
+   bigger and must be the same sign. This is a special case of a reduction
    computation.
 
    Input:
@@ -939,15 +971,16 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
 
   /* Look for the following pattern
           DX = (TYPE1) X;
-          DY = (TYPE1) Y;
+	  DY = (TYPE2) Y;
           DPROD = DX * DY;
-          DDPROD = (TYPE2) DPROD;
+	  DDPROD = (TYPE3) DPROD;
           sum_1 = DDPROD + sum_0;
      In which
      - DX is double the size of X
      - DY is double the size of Y
      - DX, DY, DPROD all have the same type but the sign
-       between DX, DY and DPROD can differ.
+       between DX, DY and DPROD can differ. The sign of DPROD
+       is one of the signs of DX or DY.
      - sum is the same size of DPROD or bigger
      - sum has been recognized as a reduction variable.
 
@@ -986,20 +1019,29 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
      inside the loop (in case we are analyzing an outer-loop).  */
   vect_unpromoted_value unprom0[2];
   if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
-			     false, 2, unprom0, &half_type))
+			     false, 2, unprom0, &half_type,
+			     TREE_TYPE (unprom_mult.op)))
     return NULL;
 
+  /* Check to see if there is a sign change happening in the operands of the
+     multiplication and pick the appropriate optab subtype.  */
+  enum optab_subtype subtype;
+  if (TYPE_SIGN (unprom0[0].type) == TYPE_SIGN (unprom0[1].type))
+    subtype = optab_default;
+  else
+    subtype = optab_vector_mixed_sign;
+
   vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
 
   tree half_vectype;
   if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
-					type_out, &half_vectype))
+					type_out, &half_vectype, subtype))
     return NULL;
 
   /* Get the inputs in the appropriate types.  */
   tree mult_oprnd[2];
   vect_convert_inputs (vinfo, stmt_vinfo, 2, mult_oprnd, half_type,
-		       unprom0, half_vectype);
+		       unprom0, half_vectype, true);
 
   var = vect_recog_temp_ssa_var (type, NULL);
   pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
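
The subtype selection and operand swap done in expand_widen_pattern_expr above can be summarised with a small stand-alone C sketch.  The enum and function names here are hypothetical stand-ins (they are not GCC's signop or optab_subtype), and the sketch only mirrors the decision logic of the hunk, not the actual expansion.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the signedness of an operand and for the
   three dot-product optabs.  */
enum sketch_sign { SK_SIGNED, SK_UNSIGNED };
enum sketch_kind { SK_SDOT, SK_UDOT, SK_USDOT };

struct sketch_choice
{
  enum sketch_kind kind;  /* Which dot-product pattern would be used.  */
  bool swap_operands;     /* True if operands 1 and 2 must be exchanged.  */
};

/* Pick the pattern from the signs of the two multiplication operands.
   The mixed-sign pattern expects operand 1 unsigned and operand 2
   signed, so a (signed, unsigned) pair is handled by swapping.  */
static struct sketch_choice
sketch_pick_dot_prod (enum sketch_sign sign1, enum sketch_sign sign2)
{
  struct sketch_choice c = { SK_SDOT, false };
  if (sign1 == sign2)
    c.kind = (sign1 == SK_UNSIGNED) ? SK_UDOT : SK_SDOT;
  else
    {
      c.kind = SK_USDOT;
      c.swap_operands = (sign1 == SK_SIGNED);
    }
  return c;
}
```

Since multiplication is commutative, swapping the operands lets a single mixed-sign optab cover both orderings, which is why only one new optab_subtype value is needed.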

> -----Original Message-----
> From: Gcc-patches <gcc-patches-
> bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf Of Tamar
> Christina via Gcc-patches
> Sent: Wednesday, June 2, 2021 10:28 AM
> To: Richard Biener <rguenther@suse.de>
> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; nd <nd@arm.com>;
> gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant changes.
> 
> Ping,
> 
> Did you have any comments Richard S?
> 
> Otherwise I'll proceed with respinning according to Richi's comments.
> 
> Regards,
> Tamar
> 
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Wednesday, May 26, 2021 9:57 AM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Sandiford
> > <Richard.Sandiford@arm.com>
> > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > where the sign for the multiplicant changes.
> >
> > On Tue, 25 May 2021, Tamar Christina wrote:
> >
> > > Hi Richi,
> > >
> > > Here's a respun version of the patch.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> >
> > index 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..13e405edd765dde704c64348d2d0b3cd88f0af7c 100644
> > --- a/gcc/tree-cfg.c
> > +++ b/gcc/tree-cfg.c
> > @@ -4421,7 +4421,9 @@ verify_gimple_assign_ternary (gassign *stmt)
> >                   && !SCALAR_FLOAT_TYPE_P (rhs1_type))
> >                  || (!INTEGRAL_TYPE_P (lhs_type)
> >                      && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> > -           || !types_compatible_p (rhs1_type, rhs2_type)
> > +           || (!types_compatible_p (rhs1_type, rhs2_type)
> > +               && TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type)
> > +               && TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (rhs2_type))
> >
> > I think this doesn't capture the constraints - instead please do
> >
> > -           || !types_compatible_p (rhs1_type, rhs2_type)
> > +           /* rhs1_type and rhs2_type may differ in sign.  */
> > +           || !tree_nop_conversion_p (rhs1_type, rhs2_type)
> >
> >
> > +/* Determine the optab_subtype to use for the given CODE and STMT.  For
> > +   most CODE this will be optab_vector, however for certain operations such as
> > +   DOT_PROD_EXPR where the operation can have different signs for the operands we
> > +   need to be able to pick the right optabs.  */
> > +
> > +static enum optab_subtype
> > +vect_determine_dot_kind (tree_code code, stmt_vec_info stmt_vinfo)
> >
> > vect_determine_optab_subkind would be a better name.  'code' is
> > redundant (or should better match stmt_vinfo->stmts code).  I wonder
> > if it might be clearer to compute the subtype where we compute 'code'
> > and the relation to stmt_info is obvious, I mean here:
> >
> >   /* 3. Check the operands of the operation.  The first operands are defined
> >         inside the loop body. The last operand is the reduction variable,
> >         which is defined by the loop-header-phi.  */
> >
> >   tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
> >   STMT_VINFO_REDUC_VECTYPE (reduc_info) = vectype_out;
> >   gassign *stmt = as_a <gassign *> (stmt_info->stmt);
> >   enum tree_code code = gimple_assign_rhs_code (stmt);
> >   bool lane_reduc_code_p
> >     = (code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code == SAD_EXPR);
> >
> > so just add
> >
> >   enum optab_subtype optab_query_kind = optab_vector;
> >   if (code == DOT_PROD_EXPR
> >       && <sign test>)
> >     optab_query_kind = optab_vector_mixed_sign;
> >
> > in this place and avoid adding the new function?
> >
> > I'm not too familiar with the pattern recog code, a 2nd eye would be
> > preferred (Richard?), but
> >
> > +  /* Check if the mismatch is only in the sign and if we have
> > +     allow_short_sign_mismatch then allow it.  */
> > +  if (unprom_type
> > +      && TYPE_SIGN (unprom_type) == SIGNED
> > +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> > +    {
> > +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> > +      tree eq_type
> > +       = build_nonstandard_integer_type (TYPE_PRECISION (new_type),
> > +                                         sign);
> > +
> > +      if (types_compatible_p (*common_type, eq_type))
> > +       return true;
> > +    }
> >
> > looks somewhat complicated - is that equal to
> >
> >   if (unprom_type
> >       && tree_nop_conversion_p (*common_type, new_type))
> >     return true;
> >
> > ?  That is, *common_type and new_type only differ in sign?
> >
> > @@ -812,8 +844,13 @@ vect_convert_inputs (vec_info *vinfo,
> > stmt_vec_info stmt_info, unsigned int n,
> >        for (j = 0; j < i; ++j)
> >         if (unprom[j].op == unprom[i].op)
> >           break;
> > +      bool only_sign = allow_short_sign_mismatch
> > +                      && TYPE_SIGN (type) != TYPE_SIGN (unprom[i].type)
> > +                      && TYPE_PRECISION (type) == TYPE_PRECISION (unprom[i].type);
> >
> > this could use the same tree_nop_conversion_p predicate.
> >
> > Otherwise the patch looks good.
> >
> > Thanks,
> > Richard.
> >
> >
> >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* optabs.def (usdot_prod_optab): New.
> > > 	* doc/md.texi: Document it and clarify other dot prod optabs.
> > > 	* optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
> > > 	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> > > 	* optabs.c (expand_widen_pattern_expr): Likewise.
> > > 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > > 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> > > 	(vectorizable_reduction): Query dot-product kind.
> > > 	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
> > > 	optab subtype.
> > > 	(vect_joust_widened_type, vect_widened_op_tree): Optionally ignore
> > > 	mismatch types.
> > > 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> > >
> > >
> > > > -----Original Message-----
> > > > From: Richard Biener <rguenther@suse.de>
> > > > Sent: Monday, May 10, 2021 2:29 PM
> > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for
> > > > dot-product where the sign for the multiplicant changes.
> > > >
> > > > On Mon, 10 May 2021, Tamar Christina wrote:
> > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > Sent: Monday, May 10, 2021 12:40 PM
> > > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > > > > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for
> > > > > > dot-product where the sign for the multiplicant changes.
> > > > > >
> > > > > > On Fri, 7 May 2021, Tamar Christina wrote:
> > > > > >
> > > > > > > Hi Richi,
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > > > Sent: Friday, May 7, 2021 12:46 PM
> > > > > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > > > > > > Subject: Re: [PATCH 1/4]middle-end Vect: Add support for
> > > > > > > > dot-product where the sign for the multiplicant changes.
> > > > > > > >
> > > > > > > > On Wed, 5 May 2021, Tamar Christina wrote:
> > > > > > > >
> > > > > > > > > Hi All,
> > > > > > > > >
> > > > > > > > > This patch adds support for a dot product where the signs
> > > > > > > > > of the multiplication arguments differ, i.e. one is
> > > > > > > > > signed and one is unsigned but the precisions are the same.
> > > > > > > > >
> > > > > > > > > #define N 480
> > > > > > > > > #define SIGNEDNESS_1 unsigned
> > > > > > > > > #define SIGNEDNESS_2 signed
> > > > > > > > > #define SIGNEDNESS_3 signed
> > > > > > > > > #define SIGNEDNESS_4 unsigned
> > > > > > > > >
> > > > > > > > > SIGNEDNESS_1 int __attribute__ ((noipa))
> > > > > > > > > f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> > > > > > > > >    SIGNEDNESS_4 char *restrict b)
> > > > > > > > > {
> > > > > > > > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > > > > > > >     {
> > > > > > > > >       int av = a[i];
> > > > > > > > >       int bv = b[i];
> > > > > > > > >       SIGNEDNESS_2 short mult = av * bv;
> > > > > > > > >       res += mult;
> > > > > > > > >     }
> > > > > > > > >   return res;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > The operations are performed as if the operands were extended
> > > > > > > > > to a 32-bit value.
> > > > > > > > > As such this operation isn't valid if there is an
> > > > > > > > > intermediate conversion to an unsigned value, i.e. if
> > > > > > > > > SIGNEDNESS_2 is unsigned.
> > > > > > > > >
> > > > > > > > > Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4
> > > > > > > > > are flipped the same optab is used but the operands are
> > > > > > > > > flipped in the optab expansion.
> > > > > > > > >
> > > > > > > > > To support this the patch extends the dot-product
> > > > > > > > > detection to optionally ignore operands with different
> > > > > > > > > signs and stores this information in the optab subtype
> > > > > > > > > which is now made a bitfield.
> > > > > > > > >
> > > > > > > > > The subtype can now additionally control which optab an
> > > > > > > > > EXPR can expand to.
> > > > > > > > >
> > > > > > > > > Bootstrapped and regtested on aarch64-none-linux-gnu
> > > > > > > > > with no issues.
> > > > > > > > >
> > > > > > > > > Ok for master?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Tamar
> > > > > > > > >
> > > > > > > > > gcc/ChangeLog:
> > > > > > > > >
> > > > > > > > > 	* optabs.def (usdot_prod_optab): New.
> > > > > > > > > 	* doc/md.texi: Document it.
> > > > > > > > > 	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> > > > > > > > > 	* optabs-tree.h (enum optab_subtype): Likewise.
> > > > > > > > > 	* optabs.c (expand_widen_pattern_expr): Likewise.
> > > > > > > > > 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > > > > > > > > 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> > > > > > > > > 	(vectorizable_reduction): Query dot-product kind.
> > > > > > > > > 	* tree-vect-patterns.c (vect_supportable_direct_optab_p):
> > > > > > > > > 	Take optional optab subtype.
> > > > > > > > > 	(vect_joust_widened_type, vect_widened_op_tree): Optionally
> > > > > > > > > 	ignore mismatch types.
> > > > > > > > > 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> > > > > > > > >
> > > > > > > > > --- inline copy of patch -- diff --git a/gcc/doc/md.texi
> > > > > > > > > b/gcc/doc/md.texi index
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> >
> d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fd
> > > > > > > > f2
> > > > > > > > > e66bc80d7d23 100644
> > > > > > > > > --- a/gcc/doc/md.texi
> > > > > > > > > +++ b/gcc/doc/md.texi
> > > > > > > > > @@ -5440,11 +5440,13 @@ Like
> > @samp{fold_left_plus_@var{m}},
> > > > > > > > > but
> > > > > > > > takes
> > > > > > > > > an additional mask operand  @item
> > > > > > > > > @samp{sdot_prod@var{m}}
> > > > > > @cindex
> > > > > > > > > @code{udot_prod@var{m}} instruction pattern  @itemx
> > > > > > > > > @samp{udot_prod@var{m}}
> > > > > > > > > +@cindex @code{usdot_prod@var{m}} instruction pattern
> > @itemx
> > > > > > > > > +@samp{usdot_prod@var{m}}
> > > > > > > > >  Compute the sum of the products of two signed/unsigned
> > > > elements.
> > > > > > > > > -Operand 1 and operand 2 are of the same mode. Their
> > > > > > > > > product, which is of a -wider mode, is computed and added to
> operand 3.
> > > > > > > > > Operand 3 is of a mode equal or -wider than the mode of
> > > > > > > > > the product. The result is placed in operand 0, which
> > > > > > > > > -is of the same mode
> > > > > > as operand 3.
> > > > > > > > > +Operand 1 and operand 2 are of the same mode but may
> > > > > > > > > +differ in
> > > > > > signs.
> > > > > > > > > +Their product, which is of a wider mode, is computed
> > > > > > > > > +and added to
> > > > > > > > operand 3.
> > > > > > > > > +Operand 3 is of a mode equal or wider than the mode of
> > > > > > > > > +the
> > > > product.
> > > > > > > > > +The result is placed in operand 0, which is of the same
> > > > > > > > > +mode as
> > > > > > operand 3.
> > > > > > > >
> > > > > > > > This doesn't really say what the 's', 'u' and 'us' specify.
> > > > > > > > Since we're doing a widen multiplication and then a
> > > > > > > > non-widening addition we only need to know the effective
> > > > > > > > sign of the multiplication so I think the existing 's' and 'u'
> > > > > > > > are enough to cover all cases?
> > > > > > >
> > > > > > > The existing 's' and 'u' enforce that both operands of the
> > > > > > > multiplication are of the same sign.  So for e.g. 'u' both
> > > > > > > operands must be unsigned.
> > > > > > >
> > > > > > > In the `us` case one can be signed and one unsigned.
> > > > > > > Operationally the signed value is sign extended to the wider
> > > > > > > type and the unsigned value is zero extended.  The sign
> > > > > > > extended value is then converted to unsigned so that the
> > > > > > > multiplication is performed as unsigned, conforming to the C
> > > > > > > promotion rules.
> > > > > > >
> > > > > > > TL;DR; Without a new optab I can't tell during expansion
> > > > > > > which semantic the operation had at the gimple/C level as
> > > > > > > modes don't carry signs.
> > > > > > >
> > > > > > > Long version:
> > > > > > >
> > > > > > > The problem with using the existing patterns, because they
> > > > > > > enforce that `av` and `bv` have the same sign, is that we
> > > > > > > can't remove the explicit sign extensions, but the
> > > > > > > multiplication must be done on the sign/zero extended char
> > > > > > > input in the same sign.
> > > > > > >
> > > > > > > Which means (unless I am mistaken) to get the correct
> > > > > > > result, you can use neither `udot` nor `sdot` as
> > > > > > > semantically these would zero or sign extend both operands
> > > > > > > from char to int to perform the multiplication in the same
> > > > > > > sign.  Whereas in this case, one parameter is zero extended
> > > > > > > and one parameter is sign extended and the result is always
> > > > > > > an unsigned number.
> > > > > > >
> > > > > > > So basically
> > > > > > >
> > > > > > > udot<unsigned c, unsigned a, unsigned b> ==
> > > > > > >    c = zero-ext (a) * zero-ext (b)
> > > > > > > sdot<signed c, signed a, signed b> ==
> > > > > > >    c = sign-ext (a) * sign-ext (b)
> > > > > > > usdot<unsigned c, unsigned a, signed b> ==
> > > > > > >    c = ((unsigned-conv) sign-ext (a)) * zero-ext (b)
> > > > > > >
> > > > > > > So semantically the existing optabs won't fit here. udot
> > > > > > > would internally promote to unsigned types before the
> > > > > > > multiplication so the result of the multiplication would be
> > > > > > > wrong.  sdot would promote both to signed and do signed
> > > > > > > multiplication, so the result is also wrong.
> > > > > > >
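To make the three extension semantics above concrete, here is a minimal
scalar sketch in C.  The names ref_udot, ref_sdot and ref_usdot are
hypothetical reference helpers, not GCC functions or optab patterns.  The
mixed-sign helper follows the operand signs given in the usdot template
header above (unsigned a, signed b) and assumes 8-bit inputs with a 32-bit
accumulator.

```c
#include <stddef.h>
#include <stdint.h>

/* Both operands zero-extended, unsigned accumulate (udot-like).  */
uint32_t ref_udot (uint32_t acc, const uint8_t *a, const uint8_t *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    acc += (uint32_t) a[i] * (uint32_t) b[i];
  return acc;
}

/* Both operands sign-extended, signed accumulate (sdot-like).  */
int32_t ref_sdot (int32_t acc, const int8_t *a, const int8_t *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    acc += (int32_t) a[i] * (int32_t) b[i];
  return acc;
}

/* Mixed sign: a is zero-extended, b is sign-extended.  The product is
   formed in 32 bits and accumulated modulo 2^32 (usdot-like).  */
uint32_t ref_usdot (uint32_t acc, const uint8_t *a, const int8_t *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    acc += (uint32_t) ((int32_t) a[i] * (int32_t) b[i]);
  return acc;
}
```

With a = 200 and b = -3 the mixed-sign helper accumulates -600 (modulo
2^32).  Reinterpreting the same bytes for ref_udot gives 200 * 253 = 50600
and for ref_sdot gives -56 * -3 = 168, so neither existing form reproduces
the mixed-sign result.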
> > > > > > > Now if I relax the constraint on the signs of udot and sdot
> > > > > > > there are two problems:
> > > > > > > RTL modes don't contain signs.  So a target can't tell me
> > > > > > > how the operands will be promoted.
> > > > > > > So:
> > > > > > >
> > > > > > > 1) I can't really check which semantics the target will
> > > > > > >    adhere to on expansion.
> > > > > > > 2) at expand time I have no way to differentiate between the
> > > > > > >    two instruction variants, given just modes I can't tell
> > > > > > >    whether I expand to the normal dot-product or the new
> > > > > > >    instruction.
> > > > > >
> > > > > > Ah, OK.  Indeed with such a weird instruction the new variant
> > > > > > makes sense.
> > > > > > Still can you please amend the optab documentation to say
> > > > > > which operand is unsigned and which is signed?  Just 'may
> > > > > > differ in signs' is bad.
> > > > >
> > > > > Sure, will expand on it.
> > > > >
> > > > > >
> > > > > > Since the multiplication is commutative I wonder why you need
> > > > > > to handle both signed_to_unsigned and unsigned_to_signed - we
> > > > > > should just enforce a canonical order (like the optab does).
> > > > >
> > > > > Sure, I thought it would have been better to change the order at
> > > > > expand time, but can do so at detection time.
> > > > >
> > > > > > I also think it's a particularly bad fit for the bad
> > > > > > optab_for_tree_code API - would any of that improve when using
> > > > > > a direct internal function here?
> > > > >
> > > > > Somewhat, but this has considerable knock-on effects, e.g.
> > > > > currently DOT_PROD is treated as a widening operation and so is
> > > > > handled by supportable_widening_operation which does not support
> > > > > calls. There's a significant number of places which work on the
> > > > > tree EXPR (including constant folding) which all need to be changed.
> > > > >
> > > > > > In particular all the changes around optab_subtype look like
> > > > > > they make a bad API worse ... at least a single
> > > > > > optab_vector_mixed_sign should suffice here, no need to make it a
> > > > > > flags kind.
> > > > >
> > > > > The reason I did so is because depending on where the query is
> > > > > done it does use different subtypes currently.  During detection
> > > > > it uses optab_default, and during vectorization optab_vector.
> > > > > For this instruction this difference doesn't seem to be used,
> > > > > but I did not want to lose this information in case something
> > > > > depended on it.
> > > > >
> > > > > But can make it just one.
> > > > >
> > > > > >
> > > > > > +  /* If we have a sign changing dot product we need to check that the
> > > > > > +     promoted type if unsigned has at least the same precision as the final
> > > > > > +     type of the dot-product.  */
> > > > > > +  if (subtype != optab_default)
> > > > > > +    {
> > > > > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > > > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > > > > +         && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > > > > > +       return NULL;
> > > > > > +    }
> > > > > >
> > > > > > I don't understand this - how do we ever arrive at a result
> > > > > > with less precision?
> > > > >
> > > > > The user could have manually truncated the results, i.e. in the
> > > > > detection code notice `mult`
> > > > >
> > > > >       int av = a[i];
> > > > >       int bv = b[i];
> > > > >       SIGNEDNESS_2 short mult = av * bv;
> > > > >       res += mult;
> > > > >
> > > > > which is a short, so it's manually truncating the multiplication
> > > > > which is done as int by the instruction. If `mult` is unsigned
> > > > > then it will truncate the result if the signed input to usdot
> > > > > was negative, unless the intermediate calculation is of the same
> > > > > precision as the instruction, i.e. if mult is unsigned int then
> > > > > there's no truncation going on, it's casting from int to
> > > > > unsigned int so it's safe to use then as the instruction does the
> > > > > same thing internally.
> > > >
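The truncation argument above can be checked with a small scalar sketch in
C.  The helper names are hypothetical, and the sketch assumes a 16-bit
short and a 32-bit int, spelled here as int16_t, uint16_t and uint32_t.
Each helper performs one iteration of the loop body from the test case.

```c
#include <stdint.h>

/* One step with a signed 16-bit intermediate.  An 8-bit signed times
   8-bit unsigned product lies in [-32640, 32385], so nothing is lost.  */
uint32_t step_signed_short (uint32_t res, int8_t a, uint8_t b)
{
  int av = a;
  int bv = b;
  int16_t mult = (int16_t) (av * bv);
  res += (uint32_t) mult;
  return res;
}

/* One step with an unsigned 16-bit intermediate.  A negative product is
   reduced modulo 65536 and its upper bits are lost.  */
uint32_t step_unsigned_short (uint32_t res, int8_t a, uint8_t b)
{
  int av = a;
  int bv = b;
  uint16_t mult = (uint16_t) (av * bv);
  res += mult;
  return res;
}

/* One step with the full 32-bit product, as a mixed-sign dot-product
   instruction would accumulate it.  */
uint32_t step_full (uint32_t res, int8_t a, uint8_t b)
{
  res += (uint32_t) ((int) a * (int) b);
  return res;
}
```

For a = -3 and b = 200 the signed-short and full variants both add -600
modulo 2^32, while the unsigned-short variant adds 64936.  For non-negative
products all three agree.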
> > > > It looks to me that we simply should only ever allow sign-changes
> > > > from multiplication result to the sum.  At least your example
> > > > above is not special to mixed sign multiplications, no?
> > > >
> > > > > > And why's this not an issue for signed multiplication?
> > > > >
> > > > > It is, but in that case it's handled by the type jousting, which
> > > > > doesn't allow the type mismatch. i.e.
> > > > >
> > > > > #define SIGNEDNESS_1 unsigned
> > > > > #define SIGNEDNESS_2 unsigned
> > > > > #define SIGNEDNESS_3 signed
> > > > > #define SIGNEDNESS_4 signed
> > > > >
> > > > > SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int
> > > > > res,
> > > > > SIGNEDNESS_3 char *restrict a,
> > > > >    SIGNEDNESS_4 char *restrict b) {
> > > > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > > >     {
> > > > >       int av = a[i];
> > > > >       int bv = b[i];
> > > > >       SIGNEDNESS_2 short mult = av * bv;
> > > > >       res += mult;
> > > > >     }
> > > > >   return res;
> > > > > }
> > > > >
> > > > > Is also not detected as a dot product.  By adding the carve out
> > > > > to the widen multiplication detection it now allows this case
> > > > > through so I handle it in the detection code.  Thinking about it
> > > > > now, it seems more logical to add this case handling inside the
> > > > > type jousting code as I don't think it's ever something you'd want.
> > > >
> > > > Yeah, I think we only need to look through sign changes on the
> > > > multiplication result.
> > > >
> > > > > > Also...
> > > > > >
> > > > > > +  /* If we have a sign changing dot-product the dot-product itself does any
> > > > > > +     sign conversions, so consume the type and use the unpromoted types.  */
> > > > > > +  tree mult_arg1, mult_arg2;
> > > > > > +  if (subtype == optab_default)
> > > > > > +    {
> > > > > > +      mult_arg1 = mult_oprnd[0];
> > > > > > +      mult_arg2 = mult_oprnd[1];
> > > > > > +    }
> > > > > > +  else
> > > > > > +    {
> > > > > > +      mult_arg1 = unprom0[0].op;
> > > > > > +      mult_arg2 = unprom0[1].op;
> > > > > > +    }
> > > > > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > > > > -                                     mult_oprnd[0], mult_oprnd[1], oprnd1);
> > > > > > +                                     mult_arg1, mult_arg2, oprnd1);
> > > > > >
> > > > > > I thought DOT_PROD always performs the promotion.  Maybe mult_oprnd
> > > > > > and unprom0 are just misnamed here?
> > > > >
> > > > > Somewhat, in a normal dot-product the signs of the multiplication
> > > > > operands are the same as those of the "unpromoted" types. So after
> > > > > vect_convert_input these two types are the same.
> > > > >
> > > > > However because here the sign changes and to maintain the
> > > > > semantics of the C code there's an extra conversion here to get
> > > > > the arguments in the same sign.  That needs to be stripped before
> > > > > given to the instruction which does the conversion internally.
> > > >
> > > > Yes, but then why's that not done by the detection code?  That is,
> > > > does it (mis-)handle the (int)short_a * (int)(unsigned
> > > > short)short_b where we'd want the mixed-sign handling and not
> > > > strip the unsigned short conversion from short_b?
> > > >
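The case raised here can be spelled out in scalar C.  This is only an
illustration of why the (unsigned short) conversion matters; the helper
names are hypothetical and a 16-bit short is assumed.

```c
#include <stdint.h>

/* The multiplication as written: short_b is first converted to
   unsigned short, so it is zero-extended to int.  */
int32_t mul_as_written (int16_t short_a, int16_t short_b)
{
  return (int32_t) short_a * (int32_t) (uint16_t) short_b;
}

/* What would be computed if the (unsigned short) conversion were
   stripped and short_b were sign-extended instead.  */
int32_t mul_conversion_stripped (int16_t short_a, int16_t short_b)
{
  return (int32_t) short_a * (int32_t) short_b;
}
```

For short_a = 2 and short_b = -1 the first form yields 2 * 65535 = 131070
and the second yields -2, so the conversion cannot be dropped without
switching to mixed-sign handling.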
> > > > Richard.
> > > >
> > > > >
> > > > > Regards,
> > > > > Tamar
> > > > >
> > > > > >
> > > > > > Richard.
> > > > > >
> > > > > > > Regards,
> > > > > > > Tamar
> > > > > > >
> > > > > > > >
> > > > > > > > The tree.def docs say the sum is also possibly widening
> > > > > > > > but I don't see this covered by the optab so we should
> > > > > > > > eventually remove this feature from the tree side.  In
> > > > > > > > fact the tree-cfg.c verifier requires the addition to be
> > > > > > > > not widening - thus only tree.def needs adjustment.
> > > > > > > >
> > > > > > > > >  @cindex @code{ssad@var{m}} instruction pattern  @item
> > > > > > > > > @samp{ssad@var{m}} diff --git a/gcc/optabs-tree.h
> > > > > > > > > b/gcc/optabs-tree.h index
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> >
> c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f
> > > > > > > > 19
> > > > > > > > > 90e0548ba08d 100644
> > > > > > > > > --- a/gcc/optabs-tree.h
> > > > > > > > > +++ b/gcc/optabs-tree.h
> > > > > > > > > @@ -27,11 +27,29 @@ along with GCC; see the file COPYING3.
> > If
> > > > > > > > > not
> > > > > > see
> > > > > > > > >     shift amount vs. machines that take a vector for the
> > > > > > > > > shift
> > amount.
> > > > > > > > > */  enum optab_subtype  {
> > > > > > > > > -  optab_default,
> > > > > > > > > -  optab_scalar,
> > > > > > > > > -  optab_vector
> > > > > > > > > +  optab_default = 1 << 0,
> > > > > > > > > +  optab_scalar = 1 << 1,
> > > > > > > > > +  optab_vector = 1 << 2,
> > > > > > > > > +  optab_signed_to_unsigned = 1 << 3,
> > > > > > > > > +  optab_unsigned_to_signed = 1 << 4
> > > > > > > > >  };
> > > > > > > > >
> > > > > > > > > +/* Override the OrEqual-operator so we can use optab_subtype
> > > > > > > > > +   as a bit flag.  */
> > > > > > > > > +inline enum optab_subtype&
> > > > > > > > > +operator |= (enum optab_subtype& a, enum optab_subtype b)
> > > > > > > > > +{
> > > > > > > > > +    return a = static_cast<optab_subtype>(static_cast<int>(a)
> > > > > > > > > +					  | static_cast<int>(b));
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +/* Override the Or-operator so we can use optab_subtype
> > > > > > > > > +   as a bit flag.  */
> > > > > > > > > +inline enum optab_subtype
> > > > > > > > > +operator | (enum optab_subtype a, enum optab_subtype b)
> > > > > > > > > +{
> > > > > > > > > +    return static_cast<optab_subtype>(static_cast<int>(a)
> > > > > > > > > +				      | static_cast<int>(b));
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > >  /* Return the optab used for computing the given
> > > > > > > > > operation on the type
> > > > > > > > given by
> > > > > > > > >     the second argument.  The third argument
> > > > > > > > > distinguishes between the
> > > > > > > > types of
> > > > > > > > >     vector shifts and rotates.  */ diff --git
> > > > > > > > > a/gcc/optabs-tree.c b/gcc/optabs-tree.c index
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> >
> 95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea
> > > > > > > > 1e
> > > > > > > > > 5c22b7453072 100644
> > > > > > > > > --- a/gcc/optabs-tree.c
> > > > > > > > > +++ b/gcc/optabs-tree.c
> > > > > > > > > @@ -127,7 +127,17 @@ optab_for_tree_code (enum
> tree_code
> > > > code,
> > > > > > > > const_tree type,
> > > > > > > > >        return TYPE_UNSIGNED (type) ? usum_widen_optab :
> > > > > > > > > ssum_widen_optab;
> > > > > > > > >
> > > > > > > > >      case DOT_PROD_EXPR:
> > > > > > > > > -      return TYPE_UNSIGNED (type) ? udot_prod_optab :
> > > > > > sdot_prod_optab;
> > > > > > > > > +      {
> > > > > > > > > +	gcc_assert (subtype & optab_default
> > > > > > > > > +		    || subtype & optab_vector
> > > > > > > > > +		    || subtype & optab_signed_to_unsigned
> > > > > > > > > +		    || subtype & optab_unsigned_to_signed);
> > > > > > > > > +
> > > > > > > > > +	if (subtype & (optab_unsigned_to_signed |
> > > > > > > > optab_signed_to_unsigned))
> > > > > > > > > +	  return usdot_prod_optab;
> > > > > > > > > +
> > > > > > > > > +	return (TYPE_UNSIGNED (type) ? udot_prod_optab :
> > > > > > > > sdot_prod_optab);
> > > > > > > > > +      }
> > > > > > > > >
> > > > > > > > >      case SAD_EXPR:
> > > > > > > > >        return TYPE_UNSIGNED (type) ? usad_optab :
> > > > > > > > > ssad_optab; diff --git a/gcc/optabs.c b/gcc/optabs.c
> > > > > > > > > index
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> >
> f4614a394587787293dc8b680a38901f7906f61c..2e18b76de1412eab71971753ac
> > > > > > > > 67
> > > > > > > > > 8597c0d00098 100644
> > > > > > > > > --- a/gcc/optabs.c
> > > > > > > > > +++ b/gcc/optabs.c
> > > > > > > > > @@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops
> > ops,
> > > > > > > > > rtx op0,
> > > > > > > > rtx op1, rtx wide_op,
> > > > > > > > >    bool sbool = false;
> > > > > > > > >
> > > > > > > > >    oprnd0 = ops->op0;
> > > > > > > > > +  if (nops >= 2)
> > > > > > > > > +    oprnd1 = ops->op1;
> > > > > > > > > +  if (nops >= 3)
> > > > > > > > > +    oprnd2 = ops->op2;
> > > > > > > > > +
> > > > > > > > >    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
> > > > > > > > >    if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
> > > > > > > > >        || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
> @@
> > > > > > > > > -
> > > > 285,6
> > > > > > > > +290,27
> > > > > > > > > @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx
> > > > > > > > > op1, rtx
> > > > > > > > wide_op,
> > > > > > > > >  	   ? vec_unpacks_sbool_hi_optab :
> > > > vec_unpacks_sbool_lo_optab);
> > > > > > > > >        sbool = true;
> > > > > > > > >      }
> > > > > > > > > +  else if (ops->code == DOT_PROD_EXPR)
> > > > > > > > > +    {
> > > > > > > > > +      enum optab_subtype subtype = optab_default;
> > > > > > > > > +      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
> > > > > > > > > +      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
> > > > > > > > > +      if (sign1 == sign2)
> > > > > > > > > +	;
> > > > > > > > > +      else if (sign1 == SIGNED && sign2 == UNSIGNED)
> > > > > > > > > +	{
> > > > > > > > > +	  subtype |= optab_signed_to_unsigned;
> > > > > > > > > +	  /* Same as optab_unsigned_to_signed but flip the
> > > > operands.  */
> > > > > > > > > +	  std::swap (op0, op1);
> > > > > > > > > +	}
> > > > > > > > > +      else if (sign1 == UNSIGNED && sign2 == SIGNED)
> > > > > > > > > +	subtype |= optab_unsigned_to_signed;
> > > > > > > > > +      else
> > > > > > > > > +	gcc_unreachable ();
> > > > > > > > > +
> > > > > > > > > +      widen_pattern_optab
> > > > > > > > > +	= optab_for_tree_code (ops->code, TREE_TYPE
> (oprnd0),
> > > > subtype);
> > > > > > > > > +    }
> > > > > > > > >    else
> > > > > > > > >      widen_pattern_optab
> > > > > > > > >        = optab_for_tree_code (ops->code, TREE_TYPE
> > > > > > > > > (oprnd0), optab_default); @@ -298,10 +324,7 @@
> > > > expand_widen_pattern_expr
> > > > > > > > (sepops ops, rtx op0, rtx op1, rtx wide_op,
> > > > > > > > >    gcc_assert (icode != CODE_FOR_nothing);
> > > > > > > > >
> > > > > > > > >    if (nops >= 2)
> > > > > > > > > -    {
> > > > > > > > > -      oprnd1 = ops->op1;
> > > > > > > > > -      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > > > > > > -    }
> > > > > > > > > +    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > > > > > >    else if (sbool)
> > > > > > > > >      {
> > > > > > > > >        nops = 2;
> > > > > > > > > @@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops
> > ops,
> > > > rtx
> > > > > > > > > op0,
> > > > > > > > rtx op1, rtx wide_op,
> > > > > > > > >      {
> > > > > > > > >        gcc_assert (tmode1 == tmode0);
> > > > > > > > >        gcc_assert (op1);
> > > > > > > > > -      oprnd2 = ops->op2;
> > > > > > > > >        wmode = TYPE_MODE (TREE_TYPE (oprnd2));
> > > > > > > > >      }
> > > > > > > > >
> > > > > > > > > diff --git a/gcc/optabs.def b/gcc/optabs.def index
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> >
> b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbd
> > > > > > > > b7c
> > > > > > > > > 18615baae928 100644
> > > > > > > > > --- a/gcc/optabs.def
> > > > > > > > > +++ b/gcc/optabs.def
> > > > > > > > > @@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab,
> > "uavg$a3_ceil")
> > > > > > > > OPTAB_D
> > > > > > > > > (sdot_prod_optab, "sdot_prod$I$a")  OPTAB_D
> > > > (ssum_widen_optab,
> > > > > > > > > "widen_ssum$I$a3")  OPTAB_D (udot_prod_optab,
> > > > "udot_prod$I$a")
> > > > > > > > > +OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> > > > > > > > >  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
> > OPTAB_D
> > > > > > > > (usad_optab,
> > > > > > > > > "usad$I$a")  OPTAB_D (ssad_optab, "ssad$I$a") diff --git
> > > > > > > > > a/gcc/tree-cfg.c b/gcc/tree-cfg.c index
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> >
> 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..58b55bb648ad97d514f1fa18bb
> > > > > > > > 00
> > > > > > > > > 808fd2678b42 100644
> > > > > > > > > --- a/gcc/tree-cfg.c
> > > > > > > > > +++ b/gcc/tree-cfg.c
> > > > > > > > > @@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary
> > > > > > > > > (gassign
> > > > *stmt)
> > > > > > > > >  		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
> > > > > > > > >  		 || (!INTEGRAL_TYPE_P (lhs_type)
> > > > > > > > >  		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> > > > > > > > > -	    || !types_compatible_p (rhs1_type, rhs2_type)
> > > > > > > > > +	    || (!types_compatible_p (rhs1_type, rhs2_type)
> > > > > > > > > +		&& TYPE_SIGN (rhs1_type) == TYPE_SIGN
> > > > (rhs2_type))
> > > > > > > >
> > > > > > > > That's not restrictive enough.  I suggest you use
> > > > > > > >
> > > > > > > >             && element_precision (rhs1_type) != element_precision (rhs2_type)
> > > > > > > >
> > > > > > > > instead.
> > > > > > > >
> > > > > > > > As said, I'm not sure all the changes in this patch are required.
> > > > > > > >
> > > > > > > > Please elaborate.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Richard.
> > > > > > > >
> > > > > > > > >  	    || !useless_type_conversion_p (lhs_type, rhs3_type)
> > > > > > > > >  	    || maybe_lt (GET_MODE_SIZE (element_mode
> > > > (rhs3_type)),
> > > > > > > > >  			 2 * GET_MODE_SIZE (element_mode
> > > > (rhs1_type))))
> > > > > > > > diff --git
> > > > > > > > > a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> >
> 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..cb8f5fbb6abca181c4171194d1
> > > > > > > > 9f
> > > > > > > > > ec29ec6e4176 100644
> > > > > > > > > --- a/gcc/tree-vect-loop.c
> > > > > > > > > +++ b/gcc/tree-vect-loop.c
> > > > > > > > > @@ -6401,6 +6401,33 @@ build_vect_cond_expr (enum
> > tree_code
> > > > > > code,
> > > > > > > > tree vop[3], tree mask,
> > > > > > > > >      }
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > +/* Determine the optab_subtype to use for the given CODE and STMT.  For
> > > > > > > > > +   most CODE this will be optab_vector, however for certain operations such as
> > > > > > > > > +   DOT_PROD_EXPR where the operation can have different signs for the operands
> > > > > > > > > +   we need to be able to pick the right optabs.  */
> > > > > > > > > +
> > > > > > > > > +static enum optab_subtype
> > > > > > > > > +vect_determine_dot_kind (tree_code code, stmt_vec_info stmt_vinfo)
> > > > > > > > > +{
> > > > > > > > > +  enum optab_subtype subtype = optab_vector;
> > > > > > > > > +  switch (code)
> > > > > > > > > +    {
> > > > > > > > > +      case DOT_PROD_EXPR:
> > > > > > > > > +	{
> > > > > > > > > +	  gassign *stmt = as_a <gassign *> (STMT_VINFO_STMT (stmt_vinfo));
> > > > > > > > > +	  signop rhs1_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)));
> > > > > > > > > +	  signop rhs2_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt)));
> > > > > > > > > +	  if (rhs1_sign != rhs2_sign)
> > > > > > > > > +	    subtype |= optab_unsigned_to_signed;
> > > > > > > > > +	  break;
> > > > > > > > > +	}
> > > > > > > > > +      default:
> > > > > > > > > +	break;
> > > > > > > > > +    }
> > > > > > > > > +
> > > > > > > > > +  return subtype;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > >  /* Function vectorizable_reduction.
> > > > > > > > >
> > > > > > > > >     Check if STMT_INFO performs a reduction operation
> > > > > > > > > that can be
> > > > > > > > vectorized.
> > > > > > > > > @@ -7189,7 +7216,8 @@ vectorizable_reduction
> > > > > > > > > (loop_vec_info
> > > > > > > > loop_vinfo,
> > > > > > > > >        bool ok = true;
> > > > > > > > >
> > > > > > > > >        /* 4.1. check support for the operation in the loop  */
> > > > > > > > > -      optab optab = optab_for_tree_code (code, vectype_in,
> > > > > > optab_vector);
> > > > > > > > > +      enum optab_subtype subtype =
> > > > > > > > > + vect_determine_dot_kind (code,
> > > > > > > > stmt_info);
> > > > > > > > > +      optab optab = optab_for_tree_code (code,
> > > > > > > > > + vectype_in, subtype);
> > > > > > > > >        if (!optab)
> > > > > > > > >  	{
> > > > > > > > >  	  if (dump_enabled_p ()) diff --git
> > > > > > > > > a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> > > > > > > > > index
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> >
> 441d6cd28c4eaded7abd756164890dbcffd2f3b8..943c001fb13777b4d1513841f
> > > > > > > > a84
> > > > > > > > > 942316846d5e 100644
> > > > > > > > > --- a/gcc/tree-vect-patterns.c
> > > > > > > > > +++ b/gcc/tree-vect-patterns.c
> > > > > > > > > @@ -201,7 +201,8 @@ vect_get_external_def_edge
> (vec_info
> > > > > > > > > *vinfo, tree
> > > > > > > > > var)  static bool  vect_supportable_direct_optab_p
> > > > > > > > > (vec_info *vinfo, tree otype, tree_code code,
> > > > > > > > >  				 tree itype, tree *vecotype_out,
> > > > > > > > > -				 tree *vecitype_out = NULL)
> > > > > > > > > +				 tree *vecitype_out = NULL,
> > > > > > > > > +				 enum optab_subtype
> subtype =
> > > > > > > > optab_default)
> > > > > > > > >  {
> > > > > > > > >    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
> > > > > > > > >    if (!vecitype)
> > > > > > > > > @@ -211,7 +212,7 @@ vect_supportable_direct_optab_p
> > (vec_info
> > > > > > > > > *vinfo,
> > > > > > > > tree otype, tree_code code,
> > > > > > > > >    if (!vecotype)
> > > > > > > > >      return false;
> > > > > > > > >
> > > > > > > > > -  optab optab = optab_for_tree_code (code, vecitype,
> > > > > > > > > optab_default);
> > > > > > > > > +  optab optab = optab_for_tree_code (code, vecitype,
> > > > > > > > > + subtype);
> > > > > > > > >    if (!optab)
> > > > > > > > >      return false;
> > > > > > > > >
> > > > > > > > > @@ -487,14 +488,31 @@ vect_joust_widened_integer (tree
> > type,
> > > > > > > > > bool shift_p, tree op,  }
> > > > > > > > >
> > > > > > > > >  /* Return true if the common supertype of NEW_TYPE and
> > > > > > > > *COMMON_TYPE
> > > > > > > > > -   is narrower than type, storing the supertype in
> > *COMMON_TYPE
> > > > if
> > > > > > so.
> > > > > > > > */
> > > > > > > > > +   is narrower than type, storing the supertype in
> > > > > > > > > + *COMMON_TYPE if
> > > > > > so.
> > > > > > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then accept that
> > > > > > *COMMON_TYPE
> > > > > > > > and NEW_TYPE
> > > > > > > > > +   may be of different signs but equal precision.   */
> > > > > > > > >
> > > > > > > > >  static bool
> > > > > > > > > -vect_joust_widened_type (tree type, tree new_type, tree
> > > > > > > > *common_type)
> > > > > > > > > +vect_joust_widened_type (tree type, tree new_type, tree
> > > > > > > > *common_type,
> > > > > > > > > +			 bool allow_short_sign_mismatch =
> false)
> > > > > > > > >  {
> > > > > > > > >    if (types_compatible_p (*common_type, new_type))
> > > > > > > > >      return true;
> > > > > > > > >
> > > > > > > > > +  /* Check if the mismatch is only in the sign and if we have
> > > > > > > > > +     allow_short_sign_mismatch then allow it.  */
> > > > > > > > > +  if (allow_short_sign_mismatch
> > > > > > > > > +      && TYPE_SIGN (*common_type) != TYPE_SIGN
> (new_type))
> > > > > > > > > +    {
> > > > > > > > > +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> > > > > > > > > +      tree eq_type
> > > > > > > > > +	= build_nonstandard_integer_type (TYPE_PRECISION
> > > > (new_type),
> > > > > > > > > +					  sign);
> > > > > > > > > +
> > > > > > > > > +      if (types_compatible_p (*common_type, eq_type))
> > > > > > > > > +	return true;
> > > > > > > > > +    }
> > > > > > > > > +
> > > > > > > > >    /* See if *COMMON_TYPE can hold all values of NEW_TYPE.
> > */
> > > > > > > > >    if ((TYPE_PRECISION (new_type) < TYPE_PRECISION
> > > > (*common_type))
> > > > > > > > >        && (TYPE_UNSIGNED (new_type) || !TYPE_UNSIGNED
> > > > > > > > (*common_type)))
> > > > > > > > > @@ -532,6 +550,9 @@ vect_joust_widened_type (tree type,
> > tree
> > > > > > > > new_type, tree *common_type)
> > > > > > > > >     to a type that (a) is narrower than the result of
> > > > > > > > > STMT_INFO
> > and
> > > > > > > > >     (b) can hold all leaf operand values.
> > > > > > > > >
> > > > > > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then allow that the
> > > > > > > > > + signs
> > of
> > > > > > > > > + the
> > > > > > > > operands
> > > > > > > > > +   may differ in signs but not in precision.
> > > > > > > > > +
> > > > > > > > >     Return 0 if STMT_INFO isn't such a tree, or if no
> > > > > > > > > such
> > > > COMMON_TYPE
> > > > > > > > >     exists.  */
> > > > > > > > >
> > > > > > > > > @@ -539,7 +560,8 @@ static unsigned int
> > vect_widened_op_tree
> > > > > > > > > (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> > > > > > > > >  		      tree_code widened_code, bool shift_p,
> > > > > > > > >  		      unsigned int max_nops,
> > > > > > > > > -		      vect_unpromoted_value *unprom, tree
> > > > *common_type)
> > > > > > > > > +		      vect_unpromoted_value *unprom, tree
> > > > *common_type,
> > > > > > > > > +		      bool allow_short_sign_mismatch = false)
> > > > > > > > >  {
> > > > > > > > >    /* Check for an integer operation with the right code.  */
> > > > > > > > >    gassign *assign = dyn_cast <gassign *>
> > > > > > > > > (stmt_info->stmt); @@
> > > > > > > > > -600,7
> > > > > > > > > +622,8 @@ vect_widened_op_tree (vec_info *vinfo,
> > > > stmt_vec_info
> > > > > > > > stmt_info, tree_code code,
> > > > > > > > >  		= vinfo->lookup_def (this_unprom->op);
> > > > > > > > >  	      nops = vect_widened_op_tree (vinfo,
> > > > > > > > > def_stmt_info,
> > > > code,
> > > > > > > > >  					   widened_code, shift_p,
> > > > max_nops,
> > > > > > > > > -					   this_unprom,
> > > > common_type);
> > > > > > > > > +					   this_unprom,
> > > > common_type,
> > > > > > > > > +
> > > > allow_short_sign_mismatch);
> > > > > > > > >  	      if (nops == 0)
> > > > > > > > >  		return 0;
> > > > > > > > >
> > > > > > > > > @@ -617,7 +640,8 @@ vect_widened_op_tree (vec_info
> > > > > > > > > *vinfo,
> > > > > > > > stmt_vec_info stmt_info, tree_code code,
> > > > > > > > >  	      if (i == 0)
> > > > > > > > >  		*common_type = this_unprom->type;
> > > > > > > > >  	      else if (!vect_joust_widened_type (type,
> > > > > > > > > this_unprom-
> > > > >type,
> > > > > > > > > -						 common_type))
> > > > > > > > > +
> common_type,
> > > > > > > > > +
> > > > allow_short_sign_mismatch))
> > > > > > > > >  		return 0;
> > > > > > > > >  	    }
> > > > > > > > >  	}
> > > > > > > > > @@ -888,21 +912,24 @@ vect_reassociating_reduction_p
> > (vec_info
> > > > > > > > > *vinfo,
> > > > > > > > >
> > > > > > > > >     Try to find the following pattern:
> > > > > > > > >
> > > > > > > > > -     type x_t, y_t;
> > > > > > > > > +     type1a x_t
> > > > > > > > > +     type1b y_t;
> > > > > > > > >       TYPE1 prod;
> > > > > > > > >       TYPE2 sum = init;
> > > > > > > > >     loop:
> > > > > > > > >       sum_0 = phi <init, sum_1>
> > > > > > > > >       S1  x_t = ...
> > > > > > > > >       S2  y_t = ...
> > > > > > > > > -     S3  x_T = (TYPE1) x_t;
> > > > > > > > > -     S4  y_T = (TYPE1) y_t;
> > > > > > > > > +     S3  x_T = (TYPE3) x_t;
> > > > > > > > > +     S4  y_T = (TYPE4) y_t;
> > > > > > > > >       S5  prod = x_T * y_T;
> > > > > > > > >       [S6  prod = (TYPE2) prod;  #optional]
> > > > > > > > >       S7  sum_1 = prod + sum_0;
> > > > > > > > >
> > > > > > > > > -   where 'TYPE1' is exactly double the size of type 'type', and
> > 'TYPE2'
> > > > is
> > > > > > the
> > > > > > > > > -   same size of 'TYPE1' or bigger. This is a special case of a
> > reduction
> > > > > > > > > +   where 'TYPE1' is exactly double the size of type
> > > > > > > > > + 'type1a' and
> > > > 'type1b',
> > > > > > > > > +   the sign of 'TYPE1' must be one of 'type1a' or
> > > > > > > > > + 'type1b' but the
> > > > sign of
> > > > > > > > > +   'type1a' and 'type1b' can differ. 'TYPE2' is the
> > > > > > > > > + same size of
> > 'TYPE1'
> > > > or
> > > > > > > > > +   bigger and must be the same sign. This is a special
> > > > > > > > > + case of a reduction
> > > > > > > > >     computation.
> > > > > > > > >
> > > > > > > > >     Input:
> > > > > > > > > @@ -939,15 +966,16 @@ vect_recog_dot_prod_pattern
> > (vec_info
> > > > > > > > > *vinfo,
> > > > > > > > >
> > > > > > > > >    /* Look for the following pattern
> > > > > > > > >            DX = (TYPE1) X;
> > > > > > > > > -          DY = (TYPE1) Y;
> > > > > > > > > +	  DY = (TYPE2) Y;
> > > > > > > > >            DPROD = DX * DY;
> > > > > > > > > -          DDPROD = (TYPE2) DPROD;
> > > > > > > > > +	  DDPROD = (TYPE3) DPROD;
> > > > > > > > >            sum_1 = DDPROD + sum_0;
> > > > > > > > >       In which
> > > > > > > > >       - DX is double the size of X
> > > > > > > > >       - DY is double the size of Y
> > > > > > > > >       - DX, DY, DPROD all have the same type but the sign
> > > > > > > > > -       between DX, DY and DPROD can differ.
> > > > > > > > > +       between DX, DY and DPROD can differ. The sign of DPROD
> > > > > > > > > +       is one of the signs of DX or DY.
> > > > > > > > >       - sum is the same size of DPROD or bigger
> > > > > > > > >       - sum has been recognized as a reduction variable.
> > > > > > > > >
> > > > > > > > > @@ -986,14 +1014,41 @@ vect_recog_dot_prod_pattern
> > (vec_info
> > > > > > *vinfo,
> > > > > > > > >       inside the loop (in case we are analyzing an outer-loop).  */
> > > > > > > > >    vect_unpromoted_value unprom0[2];
> > > > > > > > >    if (!vect_widened_op_tree (vinfo, mult_vinfo,
> > > > > > > > > MULT_EXPR,
> > > > > > > > WIDEN_MULT_EXPR,
> > > > > > > > > -			     false, 2, unprom0, &half_type))
> > > > > > > > > +			     false, 2, unprom0, &half_type,
> true))
> > > > > > > > >      return NULL;
> > > > > > > > >
> > > > > > > > > +  /* Check to see if there is a sign change happening
> > > > > > > > > + in the operands of
> > > > > > > > the
> > > > > > > > > +     multiplication and pick the appropriate optab subtype.
> > > > > > > > > +*/
> > > > > > > > > +  enum optab_subtype subtype;
> > > > > > > > > +  tree rhs_type1 = unprom0[0].type;
> > > > > > > > > +  tree rhs_type2 = unprom0[1].type;
> > > > > > > > > +  if (TYPE_SIGN (rhs_type1) == TYPE_SIGN (rhs_type2))
> > > > > > > > > +     subtype = optab_default;
> > > > > > > > > +  else if (TYPE_SIGN (rhs_type1) == SIGNED
> > > > > > > > > +	   && TYPE_SIGN (rhs_type2) == UNSIGNED)
> > > > > > > > > +     subtype = optab_signed_to_unsigned;
> > > > > > > > > +  else if (TYPE_SIGN (rhs_type1) == UNSIGNED
> > > > > > > > > +	   && TYPE_SIGN (rhs_type2) == SIGNED)
> > > > > > > > > +     subtype = optab_unsigned_to_signed;
> > > > > > > > > +  else
> > > > > > > > > +    gcc_unreachable ();
> > > > > > > > > +
> > > > > > > > > +  /* If we have a sign changing dot product we need to
> > > > > > > > > + check
> > that
> > > > the
> > > > > > > > > +     promoted type if unsigned has at least the same
> > > > > > > > > + precision as the
> > > > > > final
> > > > > > > > > +     type of the dot-product.  */
> > > > > > > > > +  if (subtype != optab_default)
> > > > > > > > > +    {
> > > > > > > > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > > > > > > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > > > > > > > +	  && TYPE_PRECISION (mult_type) < TYPE_PRECISION
> (type))
> > > > > > > > > +	return NULL;
> > > > > > > > > +    }
> > > > > > > > > +
> > > > > > > > >    vect_pattern_detected ("vect_recog_dot_prod_pattern",
> > > > > > > > > last_stmt);
> > > > > > > > >
> > > > > > > > >    tree half_vectype;
> > > > > > > > >    if (!vect_supportable_direct_optab_p (vinfo, type,
> > > > > > > > > DOT_PROD_EXPR,
> > > > > > > > half_type,
> > > > > > > > > -					type_out, &half_vectype))
> > > > > > > > > +					type_out,
> &half_vectype,
> > > > subtype))
> > > > > > > > >      return NULL;
> > > > > > > > >
> > > > > > > > >    /* Get the inputs in the appropriate types.  */ @@
> > > > > > > > > -1002,8
> > > > > > > > > +1057,22 @@ vect_recog_dot_prod_pattern (vec_info
> > > > > > > > > +*vinfo,
> > > > > > > > >  		       unprom0, half_vectype);
> > > > > > > > >
> > > > > > > > >    var = vect_recog_temp_ssa_var (type, NULL);
> > > > > > > > > +
> > > > > > > > > +  /* If we have a sign changing dot-product the
> > > > > > > > > + dot-product itself does
> > > > > > any
> > > > > > > > > +     sign conversions, so consume the type and use the
> > > > > > > > > + unpromoted types.  */  tree mult_arg1, mult_arg2;  if
> > > > > > > > > + (subtype ==
> > > > > > > > > + optab_default)
> > > > > > > > > +    {
> > > > > > > > > +      mult_arg1 = mult_oprnd[0];
> > > > > > > > > +      mult_arg2 = mult_oprnd[1];
> > > > > > > > > +    }
> > > > > > > > > +  else
> > > > > > > > > +    {
> > > > > > > > > +      mult_arg1 = unprom0[0].op;
> > > > > > > > > +      mult_arg2 = unprom0[1].op;
> > > > > > > > > +    }
> > > > > > > > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > > > > > > > -				      mult_oprnd[0], mult_oprnd[1],
> > > > oprnd1);
> > > > > > > > > +				      mult_arg1, mult_arg2,
> oprnd1);
> > > > > > > > >
> > > > > > > > >    return pattern_stmt;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> > > > > > > > Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
> > GF:
> > > > > > > > > Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> > > > > > Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF:
> > > > > > > Felix Imendörffer; HRB 36809 (AG
> > Nuernberg)
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> Germany
> > > > GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix
> > > > > Imendörffer; HRB 36809 (AG Nuernberg)
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > Nuernberg, Germany; GF: Felix Imend
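
For reference, the mixed-sign operation the series targets can be written
as a scalar loop. The sketch below is only an illustration, not code from
the patch: the names usdot_ref, sdot_ref and check_usdot_example are
hypothetical, and it assumes that the unsigned operand is zero-extended and
the signed operand is sign-extended to 32 bits before the multiply, which
is how I read "Operand 1 must be unsigned and operand 2 signed". Inputs are
kept small so the 32-bit accumulator cannot overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference for a mixed-sign dot product (assumed semantics):
   A holds unsigned bytes, B holds signed bytes.  */
int32_t
usdot_ref (int32_t acc, const uint8_t *a, const int8_t *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      int32_t av = a[i];	/* zero-extended: 255 stays 255.  */
      int32_t bv = b[i];	/* sign-extended: -1 stays -1.  */
      acc += av * bv;
    }
  return acc;
}

/* Same-sign (signed x signed) reference, for contrast.  */
int32_t
sdot_ref (int32_t acc, const int8_t *a, const int8_t *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    acc += (int32_t) a[i] * (int32_t) b[i];
  return acc;
}

/* The same bit patterns give different results depending on whether
   the first operand is treated as unsigned or as signed.  */
void
check_usdot_example (void)
{
  const uint8_t a_u[4] = { 255, 1, 2, 3 };
  const int8_t a_s[4] = { -1, 1, 2, 3 };	/* 0xff read as signed.  */
  const int8_t b[4] = { -1, -128, 127, 0 };

  /* 255*-1 + 1*-128 + 2*127 + 3*0 = -129, plus the initial 10.  */
  assert (usdot_ref (10, a_u, b, 4) == -119);
  /* -1*-1 + 1*-128 + 2*127 + 3*0 = 127, plus the initial 10.  */
  assert (sdot_ref (10, a_s, b, 4) == 137);
}
```

The two results differ only because of how the byte 0xff is extended. A
same-sign optab therefore cannot stand in for the mixed-sign case, and the
order of the operands matters.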

[-- Attachment #2: rb14433.patch --]
[-- Type: application/octet-stream, Size: 16555 bytes --]

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..9fad3322b3f1eb2a836833bb390df78f0cd9734b 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5438,13 +5438,55 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
 
 @cindex @code{sdot_prod@var{m}} instruction pattern
 @item @samp{sdot_prod@var{m}}
+
+Compute the sum of the products of two signed elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+sdot<signed c, signed a, signed b> ==
+   res = sign-ext (a) * sign-ext (b) + c
+@dots{}
+@end smallexample
+
 @cindex @code{udot_prod@var{m}} instruction pattern
-@itemx @samp{udot_prod@var{m}}
-Compute the sum of the products of two signed/unsigned elements.
-Operand 1 and operand 2 are of the same mode. Their product, which is of a
-wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
-wider than the mode of the product. The result is placed in operand 0, which
-is of the same mode as operand 3.
+@item @samp{udot_prod@var{m}}
+
+Compute the sum of the products of two unsigned elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+udot<unsigned c, unsigned a, unsigned b> ==
+   res = zero-ext (a) * zero-ext (b) + c
+@dots{}
+@end smallexample
+
+
+
+@cindex @code{usdot_prod@var{m}} instruction pattern
+@item @samp{usdot_prod@var{m}}
+Compute the sum of the products of elements of different signs.
+Operand 1 must be unsigned and operand 2 signed. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+usdot<unsigned c, unsigned a, signed b> ==
+   res = ((unsigned-conv) sign-ext (a)) * zero-ext (b) + c
+@dots{}
+@end smallexample
 
 @cindex @code{ssad@var{m}} instruction pattern
 @item @samp{ssad@var{m}}
diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
index c3aaa1a416991e856d3e24da45968a92ebada82c..fbd2b06b8dbfd560dfb66b314830e6b564b37abb 100644
--- a/gcc/optabs-tree.h
+++ b/gcc/optabs-tree.h
@@ -29,7 +29,8 @@ enum optab_subtype
 {
   optab_default,
   optab_scalar,
-  optab_vector
+  optab_vector,
+  optab_vector_mixed_sign
 };
 
 /* Return the optab used for computing the given operation on the type given by
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 95ffe397c23e80c105afea52e9d47216bf52f55a..eeb5aeed3202cc6971b6447994bc5311e9c010bb 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -127,7 +127,12 @@ optab_for_tree_code (enum tree_code code, const_tree type,
       return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
 
     case DOT_PROD_EXPR:
-      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
+      {
+	if (subtype == optab_vector_mixed_sign)
+	  return usdot_prod_optab;
+
+	return (TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab);
+      }
 
     case SAD_EXPR:
       return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
diff --git a/gcc/optabs.c b/gcc/optabs.c
index f4614a394587787293dc8b680a38901f7906f61c..d9b64441d0e0726afee89dc9c937350451e7670d 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   bool sbool = false;
 
   oprnd0 = ops->op0;
+  if (nops >= 2)
+    oprnd1 = ops->op1;
+  if (nops >= 3)
+    oprnd2 = ops->op2;
+
   tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
   if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
       || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
@@ -285,6 +290,27 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
 	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
       sbool = true;
     }
+  else if (ops->code == DOT_PROD_EXPR)
+    {
+      enum optab_subtype subtype = optab_default;
+      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
+      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
+      if (sign1 == sign2)
+	;
+      else if (sign1 == SIGNED && sign2 == UNSIGNED)
+	{
+	  subtype = optab_vector_mixed_sign;
+	  /* Same as optab_vector_mixed_sign but flip the operands.  */
+	  std::swap (op0, op1);
+	}
+      else if (sign1 == UNSIGNED && sign2 == SIGNED)
+	subtype = optab_vector_mixed_sign;
+      else
+	gcc_unreachable ();
+
+      widen_pattern_optab
+	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
+    }
   else
     widen_pattern_optab
       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
@@ -298,10 +324,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   gcc_assert (icode != CODE_FOR_nothing);
 
   if (nops >= 2)
-    {
-      oprnd1 = ops->op1;
-      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
-    }
+    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
   else if (sbool)
     {
       nops = 2;
@@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
     {
       gcc_assert (tmode1 == tmode0);
       gcc_assert (op1);
-      oprnd2 = ops->op2;
       wmode = TYPE_MODE (TREE_TYPE (oprnd2));
     }
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbdb7c18615baae928 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
 OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
 OPTAB_D (udot_prod_optab, "udot_prod$I$a")
+OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
 OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..0128891852fcd74fe31cd338614e90a26256b4bd 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign *stmt)
 		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
 		 || (!INTEGRAL_TYPE_P (lhs_type)
 		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
-	    || !types_compatible_p (rhs1_type, rhs2_type)
+	    /* rhs1_type and rhs2_type may differ in sign.  */
+	    || !tree_nop_conversion_p (rhs1_type, rhs2_type)
 	    || !useless_type_conversion_p (lhs_type, rhs3_type)
 	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
 			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..756d2867b678d0d8394202c6adb03d9cd26029e7 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6662,6 +6662,12 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   bool lane_reduc_code_p
     = (code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code == SAD_EXPR);
   int op_type = TREE_CODE_LENGTH (code);
+  enum optab_subtype optab_query_kind = optab_vector;
+  if (code == DOT_PROD_EXPR
+      && TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)))
+	   != TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt))))
+    optab_query_kind = optab_vector_mixed_sign;
+
 
   scalar_dest = gimple_assign_lhs (stmt);
   scalar_type = TREE_TYPE (scalar_dest);
@@ -7189,7 +7195,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       bool ok = true;
 
       /* 4.1. check support for the operation in the loop  */
-      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
+      optab optab = optab_for_tree_code (code, vectype_in, optab_query_kind);
       if (!optab)
 	{
 	  if (dump_enabled_p ())
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 441d6cd28c4eaded7abd756164890dbcffd2f3b8..82123b96313e6783ea214b9259805d65c07d8858 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
 static bool
 vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
 				 tree itype, tree *vecotype_out,
-				 tree *vecitype_out = NULL)
+				 tree *vecitype_out = NULL,
+				 enum optab_subtype subtype = optab_default)
 {
   tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
   if (!vecitype)
@@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
   if (!vecotype)
     return false;
 
-  optab optab = optab_for_tree_code (code, vecitype, optab_default);
+  optab optab = optab_for_tree_code (code, vecitype, subtype);
   if (!optab)
     return false;
 
@@ -487,10 +488,14 @@ vect_joust_widened_integer (tree type, bool shift_p, tree op,
 }
 
 /* Return true if the common supertype of NEW_TYPE and *COMMON_TYPE
-   is narrower than type, storing the supertype in *COMMON_TYPE if so.  */
+   is narrower than type, storing the supertype in *COMMON_TYPE if so.
+   If UNPROM_TYPE then accept that *COMMON_TYPE and NEW_TYPE may be of
+   different signs but equal precision and that the resulting
+   multiplication of them be compatible with UNPROM_TYPE.   */
 
 static bool
-vect_joust_widened_type (tree type, tree new_type, tree *common_type)
+vect_joust_widened_type (tree type, tree new_type, tree *common_type,
+			 tree unprom_type = NULL)
 {
   if (types_compatible_p (*common_type, new_type))
     return true;
@@ -514,7 +519,18 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
   unsigned int precision = MAX (TYPE_PRECISION (*common_type),
 				TYPE_PRECISION (new_type));
   precision *= 2;
-  if (precision * 2 > TYPE_PRECISION (type))
+
+  /* Check if the mismatch is only in the sign and if we have
+     UNPROM_TYPE then allow it if there is enough precision to
+     not lose any information during the conversion.  */
+  if (unprom_type
+      && TYPE_SIGN (unprom_type) == SIGNED
+      && tree_nop_conversion_p (*common_type, new_type))
+	return true;
+
+  /* The resulting application is unsigned, check if we have enough
+     precision to perform the operation.  */
+  if (precision * 2 > TYPE_PRECISION (unprom_type ? unprom_type : type))
     return false;
 
   *common_type = build_nonstandard_integer_type (precision, false);
@@ -532,6 +548,10 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
    to a type that (a) is narrower than the result of STMT_INFO and
    (b) can hold all leaf operand values.
 
+   If UNPROM_TYPE then allow that the signs of the operands
+   may differ in signs but not in precision and that the resulting type
+   of the operation on the operands is compatible with UNPROM_TYPE.
+
    Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
    exists.  */
 
@@ -539,7 +559,8 @@ static unsigned int
 vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		      tree_code widened_code, bool shift_p,
 		      unsigned int max_nops,
-		      vect_unpromoted_value *unprom, tree *common_type)
+		      vect_unpromoted_value *unprom, tree *common_type,
+		      tree unprom_type = NULL)
 {
   /* Check for an integer operation with the right code.  */
   gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
@@ -600,7 +621,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		= vinfo->lookup_def (this_unprom->op);
 	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
 					   widened_code, shift_p, max_nops,
-					   this_unprom, common_type);
+					   this_unprom, common_type,
+					   unprom_type);
 	      if (nops == 0)
 		return 0;
 
@@ -617,7 +639,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 	      if (i == 0)
 		*common_type = this_unprom->type;
 	      else if (!vect_joust_widened_type (type, this_unprom->type,
-						 common_type))
+						 common_type, unprom_type))
 		return 0;
 	    }
 	}
@@ -799,12 +821,15 @@ vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
 }
 
 /* Invoke vect_convert_input for N elements of UNPROM and store the
-   result in the corresponding elements of RESULT.  */
+   result in the corresponding elements of RESULT.
+
+   If ALLOW_SHORT_SIGN_MISMATCH then don't convert the types if they only
+   differ by sign.  */
 
 static void
 vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
 		     tree *result, tree type, vect_unpromoted_value *unprom,
-		     tree vectype)
+		     tree vectype, bool allow_short_sign_mismatch = false)
 {
   for (unsigned int i = 0; i < n; ++i)
     {
@@ -812,8 +837,12 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
       for (j = 0; j < i; ++j)
 	if (unprom[j].op == unprom[i].op)
 	  break;
+
       if (j < i)
 	result[i] = result[j];
+      else if (allow_short_sign_mismatch
+	       && tree_nop_conversion_p (type, unprom[i].type))
+	result[i] = unprom[i].op;
       else
 	result[i] = vect_convert_input (vinfo, stmt_info,
 					type, &unprom[i], vectype);
@@ -888,21 +917,24 @@ vect_reassociating_reduction_p (vec_info *vinfo,
 
    Try to find the following pattern:
 
-     type x_t, y_t;
+     type1a x_t
+     type1b y_t;
      TYPE1 prod;
      TYPE2 sum = init;
    loop:
      sum_0 = phi <init, sum_1>
      S1  x_t = ...
      S2  y_t = ...
-     S3  x_T = (TYPE1) x_t;
-     S4  y_T = (TYPE1) y_t;
+     S3  x_T = (TYPE3) x_t;
+     S4  y_T = (TYPE4) y_t;
      S5  prod = x_T * y_T;
      [S6  prod = (TYPE2) prod;  #optional]
      S7  sum_1 = prod + sum_0;
 
-   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
-   same size of 'TYPE1' or bigger. This is a special case of a reduction
+   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
+   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
+   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
+   bigger and must be the same sign. This is a special case of a reduction
    computation.
 
    Input:
@@ -939,15 +971,16 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
 
   /* Look for the following pattern
           DX = (TYPE1) X;
-          DY = (TYPE1) Y;
+	  DY = (TYPE2) Y;
           DPROD = DX * DY;
-          DDPROD = (TYPE2) DPROD;
+	  DDPROD = (TYPE3) DPROD;
           sum_1 = DDPROD + sum_0;
      In which
      - DX is double the size of X
      - DY is double the size of Y
      - DX, DY, DPROD all have the same type but the sign
-       between DX, DY and DPROD can differ.
+       between DX, DY and DPROD can differ. The sign of DPROD
+       is one of the signs of DX or DY.
      - sum is the same size of DPROD or bigger
      - sum has been recognized as a reduction variable.
 
@@ -986,20 +1019,29 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
      inside the loop (in case we are analyzing an outer-loop).  */
   vect_unpromoted_value unprom0[2];
   if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
-			     false, 2, unprom0, &half_type))
+			     false, 2, unprom0, &half_type,
+			     TREE_TYPE (unprom_mult.op)))
     return NULL;
 
+  /* Check to see if there is a sign change happening in the operands of the
+     multiplication and pick the appropriate optab subtype.  */
+  enum optab_subtype subtype;
+  if (TYPE_SIGN (unprom0[0].type) == TYPE_SIGN (unprom0[1].type))
+    subtype = optab_default;
+  else
+    subtype = optab_vector_mixed_sign;
+
   vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
 
   tree half_vectype;
   if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
-					type_out, &half_vectype))
+					type_out, &half_vectype, subtype))
     return NULL;
 
   /* Get the inputs in the appropriate types.  */
   tree mult_oprnd[2];
   vect_convert_inputs (vinfo, stmt_vinfo, 2, mult_oprnd, half_type,
-		       unprom0, half_vectype);
+		       unprom0, half_vectype, true);
 
   var = vect_recog_temp_ssa_var (type, NULL);
   pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
  2021-06-04 10:12                 ` Tamar Christina
@ 2021-06-07 10:10                   ` Richard Sandiford
  2021-06-14 12:06                     ` Tamar Christina
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Sandiford @ 2021-06-07 10:10 UTC (permalink / raw)
  To: Tamar Christina; +Cc: Richard Biener, nd, gcc-patches

Sorry for the slow response.

Tamar Christina <Tamar.Christina@arm.com> writes:
> […]
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index 441d6cd28c4eaded7abd756164890dbcffd2f3b8..82123b96313e6783ea214b9259805d65c07d8858 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
>  static bool
>  vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
>                                  tree itype, tree *vecotype_out,
> -                                tree *vecitype_out = NULL)
> +                                tree *vecitype_out = NULL,
> +                                enum optab_subtype subtype = optab_default)
>  {
>    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
>    if (!vecitype)
> @@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
>    if (!vecotype)
>      return false;
>
> -  optab optab = optab_for_tree_code (code, vecitype, optab_default);
> +  optab optab = optab_for_tree_code (code, vecitype, subtype);
>    if (!optab)
>      return false;
>
> @@ -487,10 +488,14 @@ vect_joust_widened_integer (tree type, bool shift_p, tree op,
>  }
>
>  /* Return true if the common supertype of NEW_TYPE and *COMMON_TYPE
> -   is narrower than type, storing the supertype in *COMMON_TYPE if so.  */
> +   is narrower than type, storing the supertype in *COMMON_TYPE if so.
> +   If UNPROM_TYPE then accept that *COMMON_TYPE and NEW_TYPE may be of
> +   different signs but equal precision and that the resulting
> +   multiplication of them be compatible with UNPROM_TYPE.   */
>
>  static bool
> -vect_joust_widened_type (tree type, tree new_type, tree *common_type)
> +vect_joust_widened_type (tree type, tree new_type, tree *common_type,
> +                        tree unprom_type = NULL)
>  {
>    if (types_compatible_p (*common_type, new_type))
>      return true;
> @@ -514,7 +519,18 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
>    unsigned int precision = MAX (TYPE_PRECISION (*common_type),
>                                 TYPE_PRECISION (new_type));
>    precision *= 2;
> -  if (precision * 2 > TYPE_PRECISION (type))
> +
> +  /* Check if the mismatch is only in the sign and if we have
> +     UNPROM_TYPE then allow it if there is enough precision to
> +     not lose any information during the conversion.  */
> +  if (unprom_type
> +      && TYPE_SIGN (unprom_type) == SIGNED
> +      && tree_nop_conversion_p (*common_type, new_type))
> +       return true;
> +
> +  /* The resulting application is unsigned, check if we have enough
> +     precision to perform the operation.  */
> +  if (precision * 2 > TYPE_PRECISION (unprom_type ? unprom_type : type))
>      return false;
>
>    *common_type = build_nonstandard_integer_type (precision, false);
> @@ -532,6 +548,10 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
>     to a type that (a) is narrower than the result of STMT_INFO and
>     (b) can hold all leaf operand values.
>
> +   If UNPROM_TYPE then allow that the signs of the operands
> +   may differ in signs but not in precision and that the resulting type
> +   of the operation on the operands is compatible with UNPROM_TYPE.
> +
>     Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
>     exists.  */
>
> @@ -539,7 +559,8 @@ static unsigned int
>  vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
>                       tree_code widened_code, bool shift_p,
>                       unsigned int max_nops,
> -                     vect_unpromoted_value *unprom, tree *common_type)
> +                     vect_unpromoted_value *unprom, tree *common_type,
> +                     tree unprom_type = NULL)
>  {
>    /* Check for an integer operation with the right code.  */
>    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
> @@ -600,7 +621,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
>                 = vinfo->lookup_def (this_unprom->op);
>               nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
>                                            widened_code, shift_p, max_nops,
> -                                          this_unprom, common_type);
> +                                          this_unprom, common_type,
> +                                          unprom_type);
>               if (nops == 0)
>                 return 0;
>
> @@ -617,7 +639,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
>               if (i == 0)
>                 *common_type = this_unprom->type;
>               else if (!vect_joust_widened_type (type, this_unprom->type,
> -                                                common_type))
> +                                                common_type, unprom_type))
>                 return 0;
>             }
>         }
> @@ -799,12 +821,15 @@ vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
>  }
>
>  /* Invoke vect_convert_input for N elements of UNPROM and store the
> -   result in the corresponding elements of RESULT.  */
> +   result in the corresponding elements of RESULT.
> +
> +   If ALLOW_SHORT_SIGN_MISMATCH then don't convert the types if they only
> +   differ by sign.  */
>
>  static void
>  vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
>                      tree *result, tree type, vect_unpromoted_value *unprom,
> -                    tree vectype)
> +                    tree vectype, bool allow_short_sign_mismatch = false)
>  {
>    for (unsigned int i = 0; i < n; ++i)
>      {
> @@ -812,8 +837,12 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
>        for (j = 0; j < i; ++j)
>         if (unprom[j].op == unprom[i].op)
>           break;
> +
>        if (j < i)
>         result[i] = result[j];
> +      else if (allow_short_sign_mismatch
> +              && tree_nop_conversion_p (type, unprom[i].type))
> +       result[i] = unprom[i].op;
>        else
>         result[i] = vect_convert_input (vinfo, stmt_info,
>                                         type, &unprom[i], vectype);
> @@ -888,21 +917,24 @@ vect_reassociating_reduction_p (vec_info *vinfo,
>
>     Try to find the following pattern:
>
> -     type x_t, y_t;
> +     type1a x_t
> +     type1b y_t;
>       TYPE1 prod;
>       TYPE2 sum = init;
>     loop:
>       sum_0 = phi <init, sum_1>
>       S1  x_t = ...
>       S2  y_t = ...
> -     S3  x_T = (TYPE1) x_t;
> -     S4  y_T = (TYPE1) y_t;
> +     S3  x_T = (TYPE3) x_t;
> +     S4  y_T = (TYPE4) y_t;
>       S5  prod = x_T * y_T;
>       [S6  prod = (TYPE2) prod;  #optional]
>       S7  sum_1 = prod + sum_0;
>
> -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
> -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> +   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
> +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
> +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
> +   bigger and must be the same sign. This is a special case of a reduction
>     computation.

What are TYPE3 and TYPE4 in the above?  AFAICT the x_T and y_T casts
should still be to TYPE1, since the types of x_T and y_T need to agree.

The sign of TYPE2 shouldn't matter, since TYPE2 is only used for
the addition.
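
As a small illustration of why the sign of the accumulator type is
irrelevant here (a sketch only; the function names are made up for this
note and are not part of the patch): two's-complement addition produces
the same bit pattern whether the operands are viewed as signed or as
unsigned, so only the width of TYPE2 matters.

```c
#include <stdint.h>

/* Add with both values viewed as unsigned 32-bit; wraps modulo 2^32.  */
static uint32_t
add_as_unsigned (uint32_t sum, uint32_t prod)
{
  return sum + prod;
}

/* Add with both values viewed as signed 32-bit.  The sum is formed in
   64 bits so that no signed overflow can occur, then reduced modulo
   2^32 to give the resulting bit pattern.  */
static uint32_t
add_as_signed (int32_t sum, int32_t prod)
{
  return (uint32_t) ((int64_t) sum + (int64_t) prod);
}
```

For any pair of inputs the two helpers return the same bits, which is
the sense in which the addition does not care about signedness.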

> @@ -939,15 +971,16 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>
>    /* Look for the following pattern
>            DX = (TYPE1) X;
> -          DY = (TYPE1) Y;
> +         DY = (TYPE2) Y;
>            DPROD = DX * DY;
> -          DDPROD = (TYPE2) DPROD;
> +         DDPROD = (TYPE3) DPROD;
>            sum_1 = DDPROD + sum_0;
>       In which
>       - DX is double the size of X
>       - DY is double the size of Y
>       - DX, DY, DPROD all have the same type but the sign
> -       between DX, DY and DPROD can differ.
> +       between DX, DY and DPROD can differ. The sign of DPROD
> +       is one of the signs of DX or DY.
>       - sum is the same size of DPROD or bigger
>       - sum has been recognized as a reduction variable.

These changes don't look right: DY has to be the same type as DX.
(What's different with usdot is that X and Y can be different signs.)

> @@ -986,20 +1019,29 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>       inside the loop (in case we are analyzing an outer-loop).  */
>    vect_unpromoted_value unprom0[2];
>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
> -                            false, 2, unprom0, &half_type))
> +                            false, 2, unprom0, &half_type,
> +                            TREE_TYPE (unprom_mult.op)))
>      return NULL;
>
> +  /* Check to see if there is a sign change happening in the operands of the
> +     multiplication and pick the appropriate optab subtype.  */
> +  enum optab_subtype subtype;
> +  if (TYPE_SIGN (unprom0[0].type) == TYPE_SIGN (unprom0[1].type))
> +    subtype = optab_default;
> +  else
> +    subtype = optab_vector_mixed_sign;
> +

Doesn't this check the signs of the uncast operands?  What really matters
is how things stand after the result of the (possible) casts to half_type.

E.g.:

   signed short x;
   unsigned char y;
   int z;

   z = (int) x * (int) y + z;

is an sdot operation with half_type signed short, rather than a usdot
operation.
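
To make the point concrete, here is a scalar sketch (helper names are
hypothetical, not taken from the patch).  Every unsigned char value fits
in a signed short, so converting y to signed short first does not change
the result, and the multiplication is then a plain signed * signed
operation on 16-bit inputs.

```c
#include <stddef.h>
#include <stdint.h>

/* The loop as written in the example: signed short * unsigned char.  */
static int32_t
dot_mixed_widths (const int16_t *x, const uint8_t *y, size_t n, int32_t z)
{
  for (size_t i = 0; i < n; ++i)
    z += (int32_t) x[i] * (int32_t) y[i];
  return z;
}

/* The same loop with y first converted to the 16-bit signed half type.
   The range 0..255 is representable in int16_t, so the value is kept.  */
static int32_t
dot_as_sdot16 (const int16_t *x, const uint8_t *y, size_t n, int32_t z)
{
  for (size_t i = 0; i < n; ++i)
    {
      int16_t y16 = (int16_t) y[i];
      z += (int32_t) x[i] * (int32_t) y16;
    }
  return z;
}
```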

How about instead passing an optab_subtype* to vect_widened_op_tree, in
place of the unprom_mult.op type?  When this optab_subtype* is nonnull,
the joust operation is allowed to fail as long as:

  tree_nop_conversion_p (this_unprom->type, common_type)

is true.  vect_widened_op_tree would set the optab_subtype to
optab_vector_mixed_sign to indicate this case.
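
A toy model of that rule, outside GCC, might look like the following.
Everything here is an assumption made for illustration: the struct and
enum stand in for tree types and optab_subtype, the joust is reduced to
"one type can represent every value of the other" (the real function
also tries a wider type), and tree_nop_conversion_p is approximated as
"same precision".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's integer types and optab_subtype.  */
struct int_type_model { unsigned precision; bool is_unsigned; };
enum subtype_model { SUBTYPE_DEFAULT, SUBTYPE_MIXED_SIGN };

/* Simplified joust: succeed only when one of the two types can hold all
   values of the other, and store that type in *COMMON.  */
static bool
joust_model (struct int_type_model op, struct int_type_model *common)
{
  if (op.is_unsigned == common->is_unsigned)
    {
      if (op.precision > common->precision)
        *common = op;
      return true;
    }
  struct int_type_model s = op.is_unsigned ? *common : op;
  struct int_type_model u = op.is_unsigned ? op : *common;
  if (s.precision > u.precision)
    {
      *common = s;
      return true;
    }
  return false;
}

/* Accept OP if the joust succeeds, or, when SUBTYPE is nonnull, if OP
   and *COMMON differ only in sign.  The latter case is reported through
   *SUBTYPE.  */
static bool
accept_operand_model (struct int_type_model op,
                      struct int_type_model *common,
                      enum subtype_model *subtype)
{
  if (joust_model (op, common))
    return true;
  if (subtype != NULL && op.precision == common->precision)
    {
      *subtype = SUBTYPE_MIXED_SIGN;
      return true;
    }
  return false;
}
```

With this model, unsigned char against signed char is accepted only when
a subtype pointer is supplied, while unsigned char against signed short
is settled by the joust itself and keeps the default subtype.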

We should make sure that we handle:

   unsigned short x;
   signed char y;
   int z;

   z = (int) x * (int) y + z;

correctly though: this should be a usdot operation in which y
is cast to signed short.  I'm not sure whether the patch would
insert the needed cast.
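
For reference, a scalar sketch of the behaviour wanted in this case
(function names are invented for the illustration): x needs the full
unsigned 16-bit range, so it cannot become a signed short, but every
signed char value fits in a signed short, so y can be widened and the
product treated as unsigned 16-bit * signed 16-bit.

```c
#include <stddef.h>
#include <stdint.h>

/* The loop as written in the example: unsigned short * signed char.  */
static int32_t
dot_scalar (const uint16_t *x, const int8_t *y, size_t n, int32_t z)
{
  for (size_t i = 0; i < n; ++i)
    z += (int32_t) x[i] * (int32_t) y[i];
  return z;
}

/* The same loop in usdot form with 16-bit inputs: operand 1 stays
   unsigned and is zero-extended, operand 2 is first cast to signed
   short and is sign-extended.  */
static int32_t
dot_via_usdot16 (const uint16_t *x, const int8_t *y, size_t n, int32_t z)
{
  for (size_t i = 0; i < n; ++i)
    {
      int16_t y16 = (int16_t) y[i];   /* -128..127 fits, value is kept.  */
      z += (int32_t) x[i] * (int32_t) y16;
    }
  return z;
}
```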

Thanks,
Richard

>    vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
>
>    tree half_vectype;
>    if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
> -                                       type_out, &half_vectype))
> +                                       type_out, &half_vectype, subtype))
>      return NULL;
>
>    /* Get the inputs in the appropriate types.  */
>    tree mult_oprnd[2];
>    vect_convert_inputs (vinfo, stmt_vinfo, 2, mult_oprnd, half_type,
> -                      unprom0, half_vectype);
> +                      unprom0, half_vectype, true);
>
>    var = vect_recog_temp_ssa_var (type, NULL);
>    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,


* RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
  2021-06-07 10:10                   ` Richard Sandiford
@ 2021-06-14 12:06                     ` Tamar Christina
  2021-06-21  8:11                       ` Tamar Christina
  2021-06-22 10:56                       ` Richard Sandiford
  0 siblings, 2 replies; 35+ messages in thread
From: Tamar Christina @ 2021-06-14 12:06 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Richard Biener, nd, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 17552 bytes --]

Hi Richard,

I've attached a new version of the patch with the changes.
I have also added 7 new tests in the testsuite to check the cases you mentioned.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* optabs.def (usdot_prod_optab): New.
	* doc/md.texi: Document it and clarify other dot prod optabs.
	* optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
	* optabs.c (expand_widen_pattern_expr): Likewise.
	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
	* tree-vect-loop.c (vectorizable_reduction): Query dot-product kind.
	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
	optab subtype.
	(vect_widened_op_tree): Optionally ignore
	mismatched types.
	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.

--- inline copy of patch ---

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 00caf3844ccf8ea289d581839766502d51b9e8d7..1356afb7f903f17c198103562b5cd145ecb9966f 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5446,13 +5446,55 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
 
 @cindex @code{sdot_prod@var{m}} instruction pattern
 @item @samp{sdot_prod@var{m}}
+
+Compute the sum of the products of two signed elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+sdot<signed c, signed a, signed b> ==
+   res = sign-ext (a) * sign-ext (b) + c
+@dots{}
+@end smallexample
+
 @cindex @code{udot_prod@var{m}} instruction pattern
-@itemx @samp{udot_prod@var{m}}
-Compute the sum of the products of two signed/unsigned elements.
-Operand 1 and operand 2 are of the same mode. Their product, which is of a
-wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
-wider than the mode of the product. The result is placed in operand 0, which
-is of the same mode as operand 3.
+@item @samp{udot_prod@var{m}}
+
+Compute the sum of the products of two unsigned elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+udot<unsigned c, unsigned a, unsigned b> ==
+   res = zero-ext (a) * zero-ext (b) + c
+@dots{}
+@end smallexample
+
+
+
+@cindex @code{usdot_prod@var{m}} instruction pattern
+@item @samp{usdot_prod@var{m}}
+Compute the sum of the products of elements of different signs.
+Operand 1 must be unsigned and operand 2 signed. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+usdot<unsigned c, unsigned a, signed b> ==
+   res = ((unsigned-conv) sign-ext (a)) * zero-ext (b) + c
+@dots{}
+@end smallexample
 
 @cindex @code{ssad@var{m}} instruction pattern
 @item @samp{ssad@var{m}}
diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
index c3aaa1a416991e856d3e24da45968a92ebada82c..fbd2b06b8dbfd560dfb66b314830e6b564b37abb 100644
--- a/gcc/optabs-tree.h
+++ b/gcc/optabs-tree.h
@@ -29,7 +29,8 @@ enum optab_subtype
 {
   optab_default,
   optab_scalar,
-  optab_vector
+  optab_vector,
+  optab_vector_mixed_sign
 };
 
 /* Return the optab used for computing the given operation on the type given by
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 95ffe397c23e80c105afea52e9d47216bf52f55a..eeb5aeed3202cc6971b6447994bc5311e9c010bb 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -127,7 +127,12 @@ optab_for_tree_code (enum tree_code code, const_tree type,
       return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
 
     case DOT_PROD_EXPR:
-      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
+      {
+	if (subtype == optab_vector_mixed_sign)
+	  return usdot_prod_optab;
+
+	return (TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab);
+      }
 
     case SAD_EXPR:
       return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 62a6bdb4c59bf8263c499245795576199606d372..14d8ad2f33fd75388435fe912380e177f8f3c54b 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   bool sbool = false;
 
   oprnd0 = ops->op0;
+  if (nops >= 2)
+    oprnd1 = ops->op1;
+  if (nops >= 3)
+    oprnd2 = ops->op2;
+
   tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
   if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
       || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
@@ -285,6 +290,27 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
 	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
       sbool = true;
     }
+  else if (ops->code == DOT_PROD_EXPR)
+    {
+      enum optab_subtype subtype = optab_default;
+      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
+      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
+      if (sign1 == sign2)
+	;
+      else if (sign1 == SIGNED && sign2 == UNSIGNED)
+	{
+	  subtype = optab_vector_mixed_sign;
+	  /* Same as optab_vector_mixed_sign but flip the operands.  */
+	  std::swap (op0, op1);
+	}
+      else if (sign1 == UNSIGNED && sign2 == SIGNED)
+	subtype = optab_vector_mixed_sign;
+      else
+	gcc_unreachable ();
+
+      widen_pattern_optab
+	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
+    }
   else
     widen_pattern_optab
       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
@@ -298,10 +324,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   gcc_assert (icode != CODE_FOR_nothing);
 
   if (nops >= 2)
-    {
-      oprnd1 = ops->op1;
-      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
-    }
+    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
   else if (sbool)
     {
       nops = 2;
@@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
     {
       gcc_assert (tmode1 == tmode0);
       gcc_assert (op1);
-      oprnd2 = ops->op2;
       wmode = TYPE_MODE (TREE_TYPE (oprnd2));
     }
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbdb7c18615baae928 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
 OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
 OPTAB_D (udot_prod_optab, "udot_prod$I$a")
+OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
 OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 02256580c986be426564adc1105ed2e1c69b0efc..f250f0fe99bec5278a0963e92bc1d2a61d9eee70 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -4412,7 +4412,8 @@ verify_gimple_assign_ternary (gassign *stmt)
 		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
 		 || (!INTEGRAL_TYPE_P (lhs_type)
 		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
-	    || !types_compatible_p (rhs1_type, rhs2_type)
+	    /* rhs1_type and rhs2_type may differ in sign.  */
+	    || !tree_nop_conversion_p (rhs1_type, rhs2_type)
 	    || !useless_type_conversion_p (lhs_type, rhs3_type)
 	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
 			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index ee79808472cea88786e5c04756980b456c3f5a02..d2accf3c35ade25e8d2ff4ee88136651e3e87c74 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6663,6 +6663,12 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   bool lane_reduc_code_p
     = (code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code == SAD_EXPR);
   int op_type = TREE_CODE_LENGTH (code);
+  enum optab_subtype optab_query_kind = optab_vector;
+  if (code == DOT_PROD_EXPR
+      && TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)))
+	   != TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt))))
+    optab_query_kind = optab_vector_mixed_sign;
+
 
   scalar_dest = gimple_assign_lhs (stmt);
   scalar_type = TREE_TYPE (scalar_dest);
@@ -7190,7 +7196,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       bool ok = true;
 
       /* 4.1. check support for the operation in the loop  */
-      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
+      optab optab = optab_for_tree_code (code, vectype_in, optab_query_kind);
       if (!optab)
 	{
 	  if (dump_enabled_p ())
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index c6b6feadb8d8d5cc57ded192cd68dd54b9185aef..77605e55dec7b4f6b0a1e1fdafa6313b987fa12c 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -191,9 +191,9 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
 }
 
 /* Return true if the target supports a vector version of CODE,
-   where CODE is known to map to a direct optab.  ITYPE specifies
-   the type of (some of) the scalar inputs and OTYPE specifies the
-   type of the scalar result.
+   where CODE is known to map to a direct optab with the given SUBTYPE.
+   ITYPE specifies the type of (some of) the scalar inputs and OTYPE
+   specifies the type of the scalar result.
 
    If CODE allows the inputs and outputs to have different type
    (such as for WIDEN_SUM_EXPR), it is the input mode rather
@@ -208,7 +208,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
 static bool
 vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
 				 tree itype, tree *vecotype_out,
-				 tree *vecitype_out = NULL)
+				 tree *vecitype_out = NULL,
+				 enum optab_subtype subtype = optab_default)
 {
   tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
   if (!vecitype)
@@ -218,7 +219,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
   if (!vecotype)
     return false;
 
-  optab optab = optab_for_tree_code (code, vecitype, optab_default);
+  optab optab = optab_for_tree_code (code, vecitype, subtype);
   if (!optab)
     return false;
 
@@ -521,6 +522,9 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
   unsigned int precision = MAX (TYPE_PRECISION (*common_type),
 				TYPE_PRECISION (new_type));
   precision *= 2;
+
+  /* The resulting application is unsigned, check if we have enough
+     precision to perform the operation.  */
   if (precision * 2 > TYPE_PRECISION (type))
     return false;
 
@@ -539,6 +543,10 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
    to a type that (a) is narrower than the result of STMT_INFO and
    (b) can hold all leaf operand values.
 
+   If SUBTYPE then allow that the signs of the operands
+   may differ in signs but not in precision.  SUBTYPE is updated to reflect
+   this.
+
    Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
    exists.  */
 
@@ -546,7 +554,8 @@ static unsigned int
 vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		      tree_code widened_code, bool shift_p,
 		      unsigned int max_nops,
-		      vect_unpromoted_value *unprom, tree *common_type)
+		      vect_unpromoted_value *unprom, tree *common_type,
+		      enum optab_subtype *subtype = NULL)
 {
   /* Check for an integer operation with the right code.  */
   gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
@@ -607,7 +616,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		= vinfo->lookup_def (this_unprom->op);
 	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
 					   widened_code, shift_p, max_nops,
-					   this_unprom, common_type);
+					   this_unprom, common_type,
+					   subtype);
 	      if (nops == 0)
 		return 0;
 
@@ -625,7 +635,24 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		*common_type = this_unprom->type;
 	      else if (!vect_joust_widened_type (type, this_unprom->type,
 						 common_type))
-		return 0;
+		{
+		  if (subtype)
+		    {
+		      tree new_type = *common_type;
+		      /* See if we can sign extend the smaller type.  */
+		      if (TYPE_PRECISION (this_unprom->type) > TYPE_PRECISION (new_type)
+			  && (TYPE_UNSIGNED (this_unprom->type) && !TYPE_UNSIGNED (new_type)))
+			new_type = build_nonstandard_integer_type (TYPE_PRECISION (this_unprom->type), true);
+
+		      if (tree_nop_conversion_p (this_unprom->type, new_type))
+			{
+			  *subtype = optab_vector_mixed_sign;
+			  *common_type = new_type;
+			}
+		    }
+		  else
+		    return 0;
+		}
 	    }
 	}
       next_op += nops;
@@ -806,12 +833,15 @@ vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
 }
 
 /* Invoke vect_convert_input for N elements of UNPROM and store the
-   result in the corresponding elements of RESULT.  */
+   result in the corresponding elements of RESULT.
+
+   If SUBTYPE then don't convert the types if they only
+   differ by sign.  */
 
 static void
 vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
 		     tree *result, tree type, vect_unpromoted_value *unprom,
-		     tree vectype)
+		     tree vectype, enum optab_subtype subtype = optab_default)
 {
   for (unsigned int i = 0; i < n; ++i)
     {
@@ -819,8 +849,12 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
       for (j = 0; j < i; ++j)
 	if (unprom[j].op == unprom[i].op)
 	  break;
+
       if (j < i)
 	result[i] = result[j];
+      else if (subtype == optab_vector_mixed_sign
+	       && tree_nop_conversion_p (type, unprom[i].type))
+	result[i] = unprom[i].op;
       else
 	result[i] = vect_convert_input (vinfo, stmt_info,
 					type, &unprom[i], vectype);
@@ -895,7 +929,8 @@ vect_reassociating_reduction_p (vec_info *vinfo,
 
    Try to find the following pattern:
 
-     type x_t, y_t;
+     type1a x_t
+     type1b y_t;
      TYPE1 prod;
      TYPE2 sum = init;
    loop:
@@ -908,8 +943,10 @@ vect_reassociating_reduction_p (vec_info *vinfo,
      [S6  prod = (TYPE2) prod;  #optional]
      S7  sum_1 = prod + sum_0;
 
-   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
-   same size of 'TYPE1' or bigger. This is a special case of a reduction
+   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
+   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
+   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
+   bigger and must be the same sign. This is a special case of a reduction
    computation.
 
    Input:
@@ -946,15 +983,15 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
 
   /* Look for the following pattern
           DX = (TYPE1) X;
-          DY = (TYPE1) Y;
+	  DY = (TYPE1) Y;
           DPROD = DX * DY;
-          DDPROD = (TYPE2) DPROD;
+	  DDPROD = (TYPE2) DPROD;
           sum_1 = DDPROD + sum_0;
      In which
      - DX is double the size of X
      - DY is double the size of Y
      - DX, DY, DPROD all have the same type but the sign
-       between DX, DY and DPROD can differ.
+       between X, Y and DPROD can differ.
      - sum is the same size of DPROD or bigger
      - sum has been recognized as a reduction variable.
 
@@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
   /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
      inside the loop (in case we are analyzing an outer-loop).  */
   vect_unpromoted_value unprom0[2];
+  enum optab_subtype subtype = optab_vector;
   if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
-			     false, 2, unprom0, &half_type))
+			     false, 2, unprom0, &half_type, &subtype))
+    return NULL;
+
+  if (subtype == optab_vector_mixed_sign
+      && TYPE_UNSIGNED (unprom_mult.type)
+      && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION (unprom_mult.type))
     return NULL;
 
   vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
 
   tree half_vectype;
   if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
-					type_out, &half_vectype))
+					type_out, &half_vectype, subtype))
     return NULL;
 
   /* Get the inputs in the appropriate types.  */
   tree mult_oprnd[2];
   vect_convert_inputs (vinfo, stmt_vinfo, 2, mult_oprnd, half_type,
-		       unprom0, half_vectype);
+		       unprom0, half_vectype, subtype);
 
   var = vect_recog_temp_ssa_var (type, NULL);
   pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,

[-- Attachment #2: rb14433.patch --]
[-- Type: application/octet-stream, Size: 16185 bytes --]

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 00caf3844ccf8ea289d581839766502d51b9e8d7..1356afb7f903f17c198103562b5cd145ecb9966f 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5446,13 +5446,55 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
 
 @cindex @code{sdot_prod@var{m}} instruction pattern
 @item @samp{sdot_prod@var{m}}
+
+Compute the sum of the products of two signed elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+sdot<signed c, signed a, signed b> ==
+   res = sign-ext (a) * sign-ext (b) + c
+@dots{}
+@end smallexample
+
 @cindex @code{udot_prod@var{m}} instruction pattern
-@itemx @samp{udot_prod@var{m}}
-Compute the sum of the products of two signed/unsigned elements.
-Operand 1 and operand 2 are of the same mode. Their product, which is of a
-wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
-wider than the mode of the product. The result is placed in operand 0, which
-is of the same mode as operand 3.
+@item @samp{udot_prod@var{m}}
+
+Compute the sum of the products of two unsigned elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+udot<unsigned c, unsigned a, unsigned b> ==
+   res = zero-ext (a) * zero-ext (b) + c
+@dots{}
+@end smallexample
+
+
+
+@cindex @code{usdot_prod@var{m}} instruction pattern
+@item @samp{usdot_prod@var{m}}
+Compute the sum of the products of elements of different signs.
+Operand 1 must be unsigned and operand 2 signed. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+usdot<unsigned c, unsigned a, signed b> ==
+   res = ((unsigned-conv) sign-ext (a)) * zero-ext (b) + c
+@dots{}
+@end smallexample
 
 @cindex @code{ssad@var{m}} instruction pattern
 @item @samp{ssad@var{m}}
diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
index c3aaa1a416991e856d3e24da45968a92ebada82c..fbd2b06b8dbfd560dfb66b314830e6b564b37abb 100644
--- a/gcc/optabs-tree.h
+++ b/gcc/optabs-tree.h
@@ -29,7 +29,8 @@ enum optab_subtype
 {
   optab_default,
   optab_scalar,
-  optab_vector
+  optab_vector,
+  optab_vector_mixed_sign
 };
 
 /* Return the optab used for computing the given operation on the type given by
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 95ffe397c23e80c105afea52e9d47216bf52f55a..eeb5aeed3202cc6971b6447994bc5311e9c010bb 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -127,7 +127,12 @@ optab_for_tree_code (enum tree_code code, const_tree type,
       return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
 
     case DOT_PROD_EXPR:
-      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
+      {
+	if (subtype == optab_vector_mixed_sign)
+	  return usdot_prod_optab;
+
+	return (TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab);
+      }
 
     case SAD_EXPR:
       return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 62a6bdb4c59bf8263c499245795576199606d372..14d8ad2f33fd75388435fe912380e177f8f3c54b 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   bool sbool = false;
 
   oprnd0 = ops->op0;
+  if (nops >= 2)
+    oprnd1 = ops->op1;
+  if (nops >= 3)
+    oprnd2 = ops->op2;
+
   tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
   if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
       || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
@@ -285,6 +290,27 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
 	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
       sbool = true;
     }
+  else if (ops->code == DOT_PROD_EXPR)
+    {
+      enum optab_subtype subtype = optab_default;
+      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
+      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
+      if (sign1 == sign2)
+	;
+      else if (sign1 == SIGNED && sign2 == UNSIGNED)
+	{
+	  subtype = optab_vector_mixed_sign;
+	  /* Same as optab_vector_mixed_sign but flip the operands.  */
+	  std::swap (op0, op1);
+	}
+      else if (sign1 == UNSIGNED && sign2 == SIGNED)
+	subtype = optab_vector_mixed_sign;
+      else
+	gcc_unreachable ();
+
+      widen_pattern_optab
+	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
+    }
   else
     widen_pattern_optab
       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
@@ -298,10 +324,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   gcc_assert (icode != CODE_FOR_nothing);
 
   if (nops >= 2)
-    {
-      oprnd1 = ops->op1;
-      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
-    }
+    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
   else if (sbool)
     {
       nops = 2;
@@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
     {
       gcc_assert (tmode1 == tmode0);
       gcc_assert (op1);
-      oprnd2 = ops->op2;
       wmode = TYPE_MODE (TREE_TYPE (oprnd2));
     }
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbdb7c18615baae928 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
 OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
 OPTAB_D (udot_prod_optab, "udot_prod$I$a")
+OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
 OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 02256580c986be426564adc1105ed2e1c69b0efc..f250f0fe99bec5278a0963e92bc1d2a61d9eee70 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -4412,7 +4412,8 @@ verify_gimple_assign_ternary (gassign *stmt)
 		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
 		 || (!INTEGRAL_TYPE_P (lhs_type)
 		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
-	    || !types_compatible_p (rhs1_type, rhs2_type)
+	    /* rhs1_type and rhs2_type may differ in sign.  */
+	    || !tree_nop_conversion_p (rhs1_type, rhs2_type)
 	    || !useless_type_conversion_p (lhs_type, rhs3_type)
 	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
 			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index ee79808472cea88786e5c04756980b456c3f5a02..d2accf3c35ade25e8d2ff4ee88136651e3e87c74 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6663,6 +6663,12 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   bool lane_reduc_code_p
     = (code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code == SAD_EXPR);
   int op_type = TREE_CODE_LENGTH (code);
+  enum optab_subtype optab_query_kind = optab_vector;
+  if (code == DOT_PROD_EXPR
+      && TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)))
+	   != TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt))))
+    optab_query_kind = optab_vector_mixed_sign;
+
 
   scalar_dest = gimple_assign_lhs (stmt);
   scalar_type = TREE_TYPE (scalar_dest);
@@ -7190,7 +7196,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       bool ok = true;
 
       /* 4.1. check support for the operation in the loop  */
-      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
+      optab optab = optab_for_tree_code (code, vectype_in, optab_query_kind);
       if (!optab)
 	{
 	  if (dump_enabled_p ())
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index c6b6feadb8d8d5cc57ded192cd68dd54b9185aef..77605e55dec7b4f6b0a1e1fdafa6313b987fa12c 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -191,9 +191,9 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
 }
 
 /* Return true if the target supports a vector version of CODE,
-   where CODE is known to map to a direct optab.  ITYPE specifies
-   the type of (some of) the scalar inputs and OTYPE specifies the
-   type of the scalar result.
+   where CODE is known to map to a direct optab with the given SUBTYPE.
+   ITYPE specifies the type of (some of) the scalar inputs and OTYPE
+   specifies the type of the scalar result.
 
    If CODE allows the inputs and outputs to have different type
    (such as for WIDEN_SUM_EXPR), it is the input mode rather
@@ -208,7 +208,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
 static bool
 vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
 				 tree itype, tree *vecotype_out,
-				 tree *vecitype_out = NULL)
+				 tree *vecitype_out = NULL,
+				 enum optab_subtype subtype = optab_default)
 {
   tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
   if (!vecitype)
@@ -218,7 +219,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
   if (!vecotype)
     return false;
 
-  optab optab = optab_for_tree_code (code, vecitype, optab_default);
+  optab optab = optab_for_tree_code (code, vecitype, subtype);
   if (!optab)
     return false;
 
@@ -521,6 +522,9 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
   unsigned int precision = MAX (TYPE_PRECISION (*common_type),
 				TYPE_PRECISION (new_type));
   precision *= 2;
+
+  /* The resulting application is unsigned, check if we have enough
+     precision to perform the operation.  */
   if (precision * 2 > TYPE_PRECISION (type))
     return false;
 
@@ -539,6 +543,10 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
    to a type that (a) is narrower than the result of STMT_INFO and
    (b) can hold all leaf operand values.
 
+   If SUBTYPE then allow that the signs of the operands
+   may differ in signs but not in precision.  SUBTYPE is updated to reflect
+   this.
+
    Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
    exists.  */
 
@@ -546,7 +554,8 @@ static unsigned int
 vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		      tree_code widened_code, bool shift_p,
 		      unsigned int max_nops,
-		      vect_unpromoted_value *unprom, tree *common_type)
+		      vect_unpromoted_value *unprom, tree *common_type,
+		      enum optab_subtype *subtype = NULL)
 {
   /* Check for an integer operation with the right code.  */
   gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
@@ -607,7 +616,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		= vinfo->lookup_def (this_unprom->op);
 	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
 					   widened_code, shift_p, max_nops,
-					   this_unprom, common_type);
+					   this_unprom, common_type,
+					   subtype);
 	      if (nops == 0)
 		return 0;
 
@@ -625,7 +635,24 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		*common_type = this_unprom->type;
 	      else if (!vect_joust_widened_type (type, this_unprom->type,
 						 common_type))
-		return 0;
+		{
+		  if (subtype)
+		    {
+		      tree new_type = *common_type;
+		      /* See if we can sign extend the smaller type.  */
+		      if (TYPE_PRECISION (this_unprom->type) > TYPE_PRECISION (new_type)
+			  && (TYPE_UNSIGNED (this_unprom->type) && !TYPE_UNSIGNED (new_type)))
+			new_type = build_nonstandard_integer_type (TYPE_PRECISION (this_unprom->type), true);
+
+		      if (tree_nop_conversion_p (this_unprom->type, new_type))
+			{
+			  *subtype = optab_vector_mixed_sign;
+			  *common_type = new_type;
+			}
+		    }
+		  else
+		    return 0;
+		}
 	    }
 	}
       next_op += nops;
@@ -806,12 +833,15 @@ vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
 }
 
 /* Invoke vect_convert_input for N elements of UNPROM and store the
-   result in the corresponding elements of RESULT.  */
+   result in the corresponding elements of RESULT.
+
+   If SUBTYPE then don't convert the types if they only
+   differ by sign.  */
 
 static void
 vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
 		     tree *result, tree type, vect_unpromoted_value *unprom,
-		     tree vectype)
+		     tree vectype, enum optab_subtype subtype = optab_default)
 {
   for (unsigned int i = 0; i < n; ++i)
     {
@@ -819,8 +849,12 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
       for (j = 0; j < i; ++j)
 	if (unprom[j].op == unprom[i].op)
 	  break;
+
       if (j < i)
 	result[i] = result[j];
+      else if (subtype == optab_vector_mixed_sign
+	       && tree_nop_conversion_p (type, unprom[i].type))
+	result[i] = unprom[i].op;
       else
 	result[i] = vect_convert_input (vinfo, stmt_info,
 					type, &unprom[i], vectype);
@@ -895,7 +929,8 @@ vect_reassociating_reduction_p (vec_info *vinfo,
 
    Try to find the following pattern:
 
-     type x_t, y_t;
+     type1a x_t
+     type1b y_t;
      TYPE1 prod;
      TYPE2 sum = init;
    loop:
@@ -908,8 +943,10 @@ vect_reassociating_reduction_p (vec_info *vinfo,
      [S6  prod = (TYPE2) prod;  #optional]
      S7  sum_1 = prod + sum_0;
 
-   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
-   same size of 'TYPE1' or bigger. This is a special case of a reduction
+   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
+   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
+   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
+   bigger and must be the same sign. This is a special case of a reduction
    computation.
 
    Input:
@@ -946,15 +983,15 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
 
   /* Look for the following pattern
           DX = (TYPE1) X;
-          DY = (TYPE1) Y;
+	  DY = (TYPE1) Y;
           DPROD = DX * DY;
-          DDPROD = (TYPE2) DPROD;
+	  DDPROD = (TYPE2) DPROD;
           sum_1 = DDPROD + sum_0;
      In which
      - DX is double the size of X
      - DY is double the size of Y
      - DX, DY, DPROD all have the same type but the sign
-       between DX, DY and DPROD can differ.
+       between X, Y and DPROD can differ.
      - sum is the same size of DPROD or bigger
      - sum has been recognized as a reduction variable.
 
@@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
   /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
      inside the loop (in case we are analyzing an outer-loop).  */
   vect_unpromoted_value unprom0[2];
+  enum optab_subtype subtype = optab_vector;
   if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
-			     false, 2, unprom0, &half_type))
+			     false, 2, unprom0, &half_type, &subtype))
+    return NULL;
+
+  if (subtype == optab_vector_mixed_sign
+      && TYPE_UNSIGNED (unprom_mult.type)
+      && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION (unprom_mult.type))
     return NULL;
 
   vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
 
   tree half_vectype;
   if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
-					type_out, &half_vectype))
+					type_out, &half_vectype, subtype))
     return NULL;
 
   /* Get the inputs in the appropriate types.  */
   tree mult_oprnd[2];
   vect_convert_inputs (vinfo, stmt_vinfo, 2, mult_oprnd, half_type,
-		       unprom0, half_vectype);
+		       unprom0, half_vectype, subtype);
 
   var = vect_recog_temp_ssa_var (type, NULL);
   pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,

* RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
  2021-06-14 12:06                     ` Tamar Christina
@ 2021-06-21  8:11                       ` Tamar Christina
  2021-06-22 10:56                       ` Richard Sandiford
  1 sibling, 0 replies; 35+ messages in thread
From: Tamar Christina @ 2021-06-21  8:11 UTC (permalink / raw)
  To: Tamar Christina, Richard Sandiford; +Cc: nd, Richard Biener, gcc-patches

Ping

> -----Original Message-----
> From: Gcc-patches <gcc-patches-bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf Of Tamar Christina via Gcc-patches
> Sent: Monday, June 14, 2021 1:06 PM
> To: Richard Sandiford <Richard.Sandiford@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Biener <rguenther@suse.de>
> Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
> 
> Hi Richard,
> 
> I've attached a new version of the patch with the changes.
> I have also added 7 new tests in the testsuite to check the cases you
> mentioned.
> 
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* optabs.def (usdot_prod_optab): New.
> 	* doc/md.texi: Document it and clarify other dot prod optabs.
> 	* optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
> 	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> 	* optabs.c (expand_widen_pattern_expr): Likewise.
> 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> 	* tree-vect-loop.c (vectorizable_reduction): Query dot-product kind.
> 	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
> 	optab subtype.
> 	(vect_widened_op_tree): Optionally ignore mismatch types.
> 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 00caf3844ccf8ea289d581839766502d51b9e8d7..1356afb7f903f17c198103562b5cd145ecb9966f 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5446,13 +5446,55 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
> 
>  @cindex @code{sdot_prod@var{m}} instruction pattern
>  @item @samp{sdot_prod@var{m}}
> +
> +Compute the sum of the products of two signed elements.
> +Operand 1 and operand 2 are of the same mode. Their product, which is
> +of a wider mode, is computed and added to operand 3.
> +Operand 3 is of a mode equal or wider than the mode of the product. The
> +result is placed in operand 0, which is of the same mode as operand 3.
> +
> +Semantically the expressions perform the multiplication in the
> +following signs
> +
> +@smallexample
> +sdot<signed c, signed a, signed b> ==
> +   res = sign-ext (a) * sign-ext (b) + c
> +@dots{}
> +@end smallexample
> +
>  @cindex @code{udot_prod@var{m}} instruction pattern
> -@itemx @samp{udot_prod@var{m}}
> -Compute the sum of the products of two signed/unsigned elements.
> -Operand 1 and operand 2 are of the same mode. Their product, which is of a
> -wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
> -wider than the mode of the product. The result is placed in operand 0, which
> -is of the same mode as operand 3.
> +@item @samp{udot_prod@var{m}}
> +
> +Compute the sum of the products of two unsigned elements.
> +Operand 1 and operand 2 are of the same mode. Their product, which is
> +of a wider mode, is computed and added to operand 3.
> +Operand 3 is of a mode equal or wider than the mode of the product. The
> +result is placed in operand 0, which is of the same mode as operand 3.
> +
> +Semantically the expressions perform the multiplication in the
> +following signs
> +
> +@smallexample
> +udot<unsigned c, unsigned a, unsigned b> ==
> +   res = zero-ext (a) * zero-ext (b) + c
> +@dots{}
> +@end smallexample
> +
> +
> +
> +@cindex @code{usdot_prod@var{m}} instruction pattern
> +@item @samp{usdot_prod@var{m}}
> +Compute the sum of the products of elements of different signs.
> +Operand 1 must be unsigned and operand 2 signed. Their
> +product, which is of a wider mode, is computed and added to operand 3.
> +Operand 3 is of a mode equal or wider than the mode of the product. The
> +result is placed in operand 0, which is of the same mode as operand 3.
> +
> +Semantically the expressions perform the multiplication in the following signs
> +
> +@smallexample
> +usdot<unsigned c, unsigned a, signed b> ==
> +   res = ((unsigned-conv) sign-ext (a)) * zero-ext (b) + c
> +@dots{}
> +@end smallexample
> 
>  @cindex @code{ssad@var{m}} instruction pattern
>  @item @samp{ssad@var{m}}
> diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> index
> c3aaa1a416991e856d3e24da45968a92ebada82c..fbd2b06b8dbfd560dfb66b31
> 4830e6b564b37abb 100644
> --- a/gcc/optabs-tree.h
> +++ b/gcc/optabs-tree.h
> @@ -29,7 +29,8 @@ enum optab_subtype
>  {
>    optab_default,
>    optab_scalar,
> -  optab_vector
> +  optab_vector,
> +  optab_vector_mixed_sign
>  };
> 
>  /* Return the optab used for computing the given operation on the type
> given by
> diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
> index
> 95ffe397c23e80c105afea52e9d47216bf52f55a..eeb5aeed3202cc6971b6447994
> bc5311e9c010bb 100644
> --- a/gcc/optabs-tree.c
> +++ b/gcc/optabs-tree.c
> @@ -127,7 +127,12 @@ optab_for_tree_code (enum tree_code code,
> const_tree type,
>        return TYPE_UNSIGNED (type) ? usum_widen_optab :
> ssum_widen_optab;
> 
>      case DOT_PROD_EXPR:
> -      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
> +      {
> +	if (subtype == optab_vector_mixed_sign)
> +	  return usdot_prod_optab;
> +
> +	return (TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab);
> +      }
> 
>      case SAD_EXPR:
>        return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index
> 62a6bdb4c59bf8263c499245795576199606d372..14d8ad2f33fd75388435fe9123
> 80e177f8f3c54b 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
> rtx op1, rtx wide_op,
>    bool sbool = false;
> 
>    oprnd0 = ops->op0;
> +  if (nops >= 2)
> +    oprnd1 = ops->op1;
> +  if (nops >= 3)
> +    oprnd2 = ops->op2;
> +
>    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
>    if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
>        || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
> @@ -285,6 +290,27 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
> rtx op1, rtx wide_op,
>  	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
>        sbool = true;
>      }
> +  else if (ops->code == DOT_PROD_EXPR)
> +    {
> +      enum optab_subtype subtype = optab_default;
> +      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
> +      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
> +      if (sign1 == sign2)
> +	;
> +      else if (sign1 == SIGNED && sign2 == UNSIGNED)
> +	{
> +	  subtype = optab_vector_mixed_sign;
> +	  /* Same as optab_vector_mixed_sign but flip the operands.  */
> +	  std::swap (op0, op1);
> +	}
> +      else if (sign1 == UNSIGNED && sign2 == SIGNED)
> +	subtype = optab_vector_mixed_sign;
> +      else
> +	gcc_unreachable ();
> +
> +      widen_pattern_optab
> +	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> +    }
>    else
>      widen_pattern_optab
>        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
> @@ -298,10 +324,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
> rtx op1, rtx wide_op,
>    gcc_assert (icode != CODE_FOR_nothing);
> 
>    if (nops >= 2)
> -    {
> -      oprnd1 = ops->op1;
> -      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> -    }
> +    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
>    else if (sbool)
>      {
>        nops = 2;
> @@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
> rtx op1, rtx wide_op,
>      {
>        gcc_assert (tmode1 == tmode0);
>        gcc_assert (op1);
> -      oprnd2 = ops->op2;
>        wmode = TYPE_MODE (TREE_TYPE (oprnd2));
>      }
> 
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index
> b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbd
> b7c18615baae928 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
>  OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
>  OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
>  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> +OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
>  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
>  OPTAB_D (usad_optab, "usad$I$a")
>  OPTAB_D (ssad_optab, "ssad$I$a")
> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> index
> 02256580c986be426564adc1105ed2e1c69b0efc..f250f0fe99bec5278a0963e92b
> c1d2a61d9eee70 100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -4412,7 +4412,8 @@ verify_gimple_assign_ternary (gassign *stmt)
>  		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
>  		 || (!INTEGRAL_TYPE_P (lhs_type)
>  		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> -	    || !types_compatible_p (rhs1_type, rhs2_type)
> +	    /* rhs1_type and rhs2_type may differ in sign.  */
> +	    || !tree_nop_conversion_p (rhs1_type, rhs2_type)
>  	    || !useless_type_conversion_p (lhs_type, rhs3_type)
>  	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
>  			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index
> ee79808472cea88786e5c04756980b456c3f5a02..d2accf3c35ade25e8d2ff4ee88
> 136651e3e87c74 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -6663,6 +6663,12 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>    bool lane_reduc_code_p
>      = (code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code ==
> SAD_EXPR);
>    int op_type = TREE_CODE_LENGTH (code);
> +  enum optab_subtype optab_query_kind = optab_vector;
> +  if (code == DOT_PROD_EXPR
> +      && TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)))
> +	   != TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt))))
> +    optab_query_kind = optab_vector_mixed_sign;
> +
> 
>    scalar_dest = gimple_assign_lhs (stmt);
>    scalar_type = TREE_TYPE (scalar_dest);
> @@ -7190,7 +7196,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>        bool ok = true;
> 
>        /* 4.1. check support for the operation in the loop  */
> -      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
> +      optab optab = optab_for_tree_code (code, vectype_in,
> optab_query_kind);
>        if (!optab)
>  	{
>  	  if (dump_enabled_p ())
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index
> c6b6feadb8d8d5cc57ded192cd68dd54b9185aef..77605e55dec7b4f6b0a1e1fd
> afa6313b987fa12c 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -191,9 +191,9 @@ vect_get_external_def_edge (vec_info *vinfo, tree
> var)
>  }
> 
>  /* Return true if the target supports a vector version of CODE,
> -   where CODE is known to map to a direct optab.  ITYPE specifies
> -   the type of (some of) the scalar inputs and OTYPE specifies the
> -   type of the scalar result.
> +   where CODE is known to map to a direct optab with the given SUBTYPE.
> +   ITYPE specifies the type of (some of) the scalar inputs and OTYPE
> +   specifies the type of the scalar result.
> 
>     If CODE allows the inputs and outputs to have different type
>     (such as for WIDEN_SUM_EXPR), it is the input mode rather
> @@ -208,7 +208,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree
> var)
>  static bool
>  vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code
> code,
>  				 tree itype, tree *vecotype_out,
> -				 tree *vecitype_out = NULL)
> +				 tree *vecitype_out = NULL,
> +				 enum optab_subtype subtype =
> optab_default)
>  {
>    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
>    if (!vecitype)
> @@ -218,7 +219,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo,
> tree otype, tree_code code,
>    if (!vecotype)
>      return false;
> 
> -  optab optab = optab_for_tree_code (code, vecitype, optab_default);
> +  optab optab = optab_for_tree_code (code, vecitype, subtype);
>    if (!optab)
>      return false;
> 
> @@ -521,6 +522,9 @@ vect_joust_widened_type (tree type, tree new_type,
> tree *common_type)
>    unsigned int precision = MAX (TYPE_PRECISION (*common_type),
>  				TYPE_PRECISION (new_type));
>    precision *= 2;
> +
> +  /* The resulting application is unsigned, check if we have enough
> +     precision to perform the operation.  */
>    if (precision * 2 > TYPE_PRECISION (type))
>      return false;
> 
> @@ -539,6 +543,10 @@ vect_joust_widened_type (tree type, tree
> new_type, tree *common_type)
>     to a type that (a) is narrower than the result of STMT_INFO and
>     (b) can hold all leaf operand values.
> 
> +   If SUBTYPE then allow that the signs of the operands
> +   may differ in signs but not in precision.  SUBTYPE is updated to reflect
> +   this.
> +
>     Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
>     exists.  */
> 
> @@ -546,7 +554,8 @@ static unsigned int
>  vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info,
> tree_code code,
>  		      tree_code widened_code, bool shift_p,
>  		      unsigned int max_nops,
> -		      vect_unpromoted_value *unprom, tree *common_type)
> +		      vect_unpromoted_value *unprom, tree *common_type,
> +		      enum optab_subtype *subtype = NULL)
>  {
>    /* Check for an integer operation with the right code.  */
>    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
> @@ -607,7 +616,8 @@ vect_widened_op_tree (vec_info *vinfo,
> stmt_vec_info stmt_info, tree_code code,
>  		= vinfo->lookup_def (this_unprom->op);
>  	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
>  					   widened_code, shift_p, max_nops,
> -					   this_unprom, common_type);
> +					   this_unprom, common_type,
> +					   subtype);
>  	      if (nops == 0)
>  		return 0;
> 
> @@ -625,7 +635,24 @@ vect_widened_op_tree (vec_info *vinfo,
> stmt_vec_info stmt_info, tree_code code,
>  		*common_type = this_unprom->type;
>  	      else if (!vect_joust_widened_type (type, this_unprom->type,
>  						 common_type))
> -		return 0;
> +		{
> +		  if (subtype)
> +		    {
> +		      tree new_type = *common_type;
> +		      /* See if we can sign extend the smaller type.  */
> +		      if (TYPE_PRECISION (this_unprom->type) > TYPE_PRECISION (new_type)
> +			  && (TYPE_UNSIGNED (this_unprom->type) && !TYPE_UNSIGNED (new_type)))
> +			new_type = build_nonstandard_integer_type (TYPE_PRECISION (this_unprom->type), true);
> +
> +		      if (tree_nop_conversion_p (this_unprom->type, new_type))
> +			{
> +			  *subtype = optab_vector_mixed_sign;
> +			  *common_type = new_type;
> +			}
> +		    }
> +		  else
> +		    return 0;
> +		}
>  	    }
>  	}
>        next_op += nops;
> @@ -806,12 +833,15 @@ vect_convert_input (vec_info *vinfo,
> stmt_vec_info stmt_info, tree type,
>  }
> 
>  /* Invoke vect_convert_input for N elements of UNPROM and store the
> -   result in the corresponding elements of RESULT.  */
> +   result in the corresponding elements of RESULT.
> +
> +   If SUBTYPE then don't convert the types if they only
> +   differ by sign.  */
> 
>  static void
>  vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned
> int n,
>  		     tree *result, tree type, vect_unpromoted_value *unprom,
> -		     tree vectype)
> +		     tree vectype, enum optab_subtype subtype =
> optab_default)
>  {
>    for (unsigned int i = 0; i < n; ++i)
>      {
> @@ -819,8 +849,12 @@ vect_convert_inputs (vec_info *vinfo,
> stmt_vec_info stmt_info, unsigned int n,
>        for (j = 0; j < i; ++j)
>  	if (unprom[j].op == unprom[i].op)
>  	  break;
> +
>        if (j < i)
>  	result[i] = result[j];
> +      else if (subtype == optab_vector_mixed_sign
> +	       && tree_nop_conversion_p (type, unprom[i].type))
> +	result[i] = unprom[i].op;
>        else
>  	result[i] = vect_convert_input (vinfo, stmt_info,
>  					type, &unprom[i], vectype);
> @@ -895,7 +929,8 @@ vect_reassociating_reduction_p (vec_info *vinfo,
> 
>     Try to find the following pattern:
> 
> -     type x_t, y_t;
> +     type1a x_t
> +     type1b y_t;
>       TYPE1 prod;
>       TYPE2 sum = init;
>     loop:
> @@ -908,8 +943,10 @@ vect_reassociating_reduction_p (vec_info *vinfo,
>       [S6  prod = (TYPE2) prod;  #optional]
>       S7  sum_1 = prod + sum_0;
> 
> -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
> -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> +   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
> +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
> +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
> +   bigger and must be the same sign. This is a special case of a reduction
>     computation.
> 
>     Input:
> @@ -946,15 +983,15 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> 
>    /* Look for the following pattern
>            DX = (TYPE1) X;
> -          DY = (TYPE1) Y;
> +	  DY = (TYPE1) Y;
>            DPROD = DX * DY;
> -          DDPROD = (TYPE2) DPROD;
> +	  DDPROD = (TYPE2) DPROD;
>            sum_1 = DDPROD + sum_0;
>       In which
>       - DX is double the size of X
>       - DY is double the size of Y
>       - DX, DY, DPROD all have the same type but the sign
> -       between DX, DY and DPROD can differ.
> +       between X, Y and DPROD can differ.
>       - sum is the same size of DPROD or bigger
>       - sum has been recognized as a reduction variable.
> 
> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>    /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a
> phi
>       inside the loop (in case we are analyzing an outer-loop).  */
>    vect_unpromoted_value unprom0[2];
> +  enum optab_subtype subtype = optab_vector;
>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
> WIDEN_MULT_EXPR,
> -			     false, 2, unprom0, &half_type))
> +			     false, 2, unprom0, &half_type, &subtype))
> +    return NULL;
> +
> +  if (subtype == optab_vector_mixed_sign
> +      && TYPE_UNSIGNED (unprom_mult.type)
> +      && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION
> (unprom_mult.type))
>      return NULL;
> 
>    vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
> 
>    tree half_vectype;
>    if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR,
> half_type,
> -					type_out, &half_vectype))
> +					type_out, &half_vectype, subtype))
>      return NULL;
> 
>    /* Get the inputs in the appropriate types.  */
>    tree mult_oprnd[2];
>    vect_convert_inputs (vinfo, stmt_vinfo, 2, mult_oprnd, half_type,
> -		       unprom0, half_vectype);
> +		       unprom0, half_vectype, subtype);
> 
>    var = vect_recog_temp_ssa_var (type, NULL);
>    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,

* Re: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
  2021-06-14 12:06                     ` Tamar Christina
  2021-06-21  8:11                       ` Tamar Christina
@ 2021-06-22 10:56                       ` Richard Sandiford
  2021-06-22 11:16                         ` Richard Sandiford
  1 sibling, 1 reply; 35+ messages in thread
From: Richard Sandiford @ 2021-06-22 10:56 UTC (permalink / raw)
  To: Tamar Christina; +Cc: Richard Biener, nd, gcc-patches

Sorry for the slow review.

Just concentrating on tree-vect-patterns.c, as before:

Tamar Christina <Tamar.Christina@arm.com> writes:
> @@ -521,6 +522,9 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
>    unsigned int precision = MAX (TYPE_PRECISION (*common_type),
>  				TYPE_PRECISION (new_type));
>    precision *= 2;
> +
> +  /* The resulting application is unsigned, check if we have enough
> +     precision to perform the operation.  */
>    if (precision * 2 > TYPE_PRECISION (type))
>      return false;
>  

Not sure what the comment means by “application” here, but the common
type we pick is signed rather than unsigned.

> @@ -546,7 +554,8 @@ static unsigned int
>  vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
>  		      tree_code widened_code, bool shift_p,
>  		      unsigned int max_nops,
> -		      vect_unpromoted_value *unprom, tree *common_type)
> +		      vect_unpromoted_value *unprom, tree *common_type,
> +		      enum optab_subtype *subtype = NULL)
>  {
>    /* Check for an integer operation with the right code.  */
>    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
> @@ -607,7 +616,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
>  		= vinfo->lookup_def (this_unprom->op);
>  	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
>  					   widened_code, shift_p, max_nops,
> -					   this_unprom, common_type);
> +					   this_unprom, common_type,
> +					   subtype);
>  	      if (nops == 0)
>  		return 0;
>  
> @@ -625,7 +635,24 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
>  		*common_type = this_unprom->type;
>  	      else if (!vect_joust_widened_type (type, this_unprom->type,
>  						 common_type))
> -		return 0;
> +		{
> +		  if (subtype)
> +		    {

AIUI, if we get here then:

- there must be one unsigned operand (A) of precision P
- there must be one signed operand (B) with precision <= P
- we can't extend to precision 2*P 

A conversion is needed if B's precision is < P.
That conversion should be to a signed type with precision P.

So…

> +		      tree new_type = *common_type;
> +		      /* See if we can sign extend the smaller type.  */
> +		      if (TYPE_PRECISION (this_unprom->type) > TYPE_PRECISION (new_type)
> +			  && (TYPE_UNSIGNED (this_unprom->type) && !TYPE_UNSIGNED (new_type)))

…I think this second line could be an assert and

> +			new_type = build_nonstandard_integer_type (TYPE_PRECISION (this_unprom->type), true);

…picking an unsigned type here looks wrong.  The net effect would
be to convert B (the previous signed operand) to an unsigned type.

> +
> +		      if (tree_nop_conversion_p (this_unprom->type, new_type))
> +			{
> +			  *subtype = optab_vector_mixed_sign;
> +			  *common_type = new_type;
> +			}

IMO the sign of the common type shouldn't matter for optab_vector_mixed_sign:
if we need to convert operands later, it should be to the precision of
the common type but retaining the sign of the original type.
So I think it would be simpler to do:

		      if (TYPE_PRECISION (this_unprom->type)
			  > TYPE_PRECISION (*common_type))
			*common_type = this_unprom->type;
		      *subtype = optab_vector_mixed_sign;

here and adjust the conversion code as described below.

This also has the advantage of coping with > 2 operands, in case that
ever becomes important in future.
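
As a rough illustration of that rule, here is a minimal sketch in plain C rather than GCC internals. `struct int_type`, `enum subtype_model` and `fold_mixed_sign_operand` are hypothetical stand-ins invented for this example; they only model a type's precision and sign, not the real tree types.

```c
#include <stdbool.h>

/* Hypothetical model of an integer type: only precision and sign.  */
struct int_type
{
  unsigned precision;
  bool is_unsigned;
};

/* Hypothetical model of the two optab subtypes that matter here.  */
enum subtype_model
{
  MODEL_VECTOR,
  MODEL_VECTOR_MIXED_SIGN
};

/* Fold one more operand type into the running common type once the
   normal jousting has failed because of a sign mismatch.  Only the
   precision of the common type is meaningful afterwards; its sign is
   deliberately left as whatever the widest operand had.  */
static void
fold_mixed_sign_operand (struct int_type *common, struct int_type operand,
			 enum subtype_model *subtype)
{
  if (operand.precision > common->precision)
    *common = operand;
  *subtype = MODEL_VECTOR_MIXED_SIGN;
}
```

Because the function only ever raises the precision, it can be applied to any number of operands in any order, which is the "> 2 operands" property mentioned above.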

> @@ -806,12 +833,15 @@ vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
>  }
>  
>  /* Invoke vect_convert_input for N elements of UNPROM and store the
> -   result in the corresponding elements of RESULT.  */
> +   result in the corresponding elements of RESULT.
> +
> +   If SUBTYPE then don't convert the types if they only
> +   differ by sign.  */
>  
>  static void
>  vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
>  		     tree *result, tree type, vect_unpromoted_value *unprom,
> -		     tree vectype)
> +		     tree vectype, enum optab_subtype subtype = optab_default)
>  {
>    for (unsigned int i = 0; i < n; ++i)
>      {
> @@ -819,8 +849,12 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
>        for (j = 0; j < i; ++j)
>  	if (unprom[j].op == unprom[i].op)
>  	  break;
> +
>        if (j < i)
>  	result[i] = result[j];
> +      else if (subtype == optab_vector_mixed_sign
> +	       && tree_nop_conversion_p (type, unprom[i].type))
> +	result[i] = unprom[i].op;
>        else
>  	result[i] = vect_convert_input (vinfo, stmt_info,
>  					type, &unprom[i], vectype);

As noted above, I think we want to preserve the sign of the original
type for optab_vector_mixed_sign, even if a conversion is needed.
I think we should avoid the special case above and instead push
subtype down into vect_convert_input.  We can then adjust the
type at the head of that function:

  if (subtype == optab_vector_mixed_sign
      && TYPE_SIGN (type) != TYPE_SIGN (TREE_TYPE (unprom->op)))
    type = build_nonstandard_integer_type (TYPE_PRECISION (type),
					   TYPE_SIGN (unprom->type));
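
To show why the sign has to be retained, here is a small sketch in plain C, assuming an unsigned 16-bit operand A and a signed 8-bit operand B that has to be widened to 16 bits. The function names are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Widen B to 16 bits while keeping it signed, then multiply.
   The value of B is preserved, so the product is the expected one.  */
static int32_t
product_keep_sign (uint16_t a, int8_t b)
{
  int16_t b_wide = b;			/* Sign-extension.  */
  return (int32_t) a * (int32_t) b_wide;
}

/* Widen B to 16 bits but force the unsigned type of A onto it.
   A negative B wraps modulo 2^16 and the product is no longer the
   mixed-sign product of the original values.  */
static int32_t
product_forced_unsigned (uint16_t a, int8_t b)
{
  uint16_t b_wide = (uint16_t) b;	/* Wraps for negative B.  */
  return (int32_t) a * (int32_t) b_wide;
}
```

With A = 2 and B = -3 the first function gives -6, while the second gives 2 * 65533 = 131066.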

> @@ -895,7 +929,8 @@ vect_reassociating_reduction_p (vec_info *vinfo,
>  
>     Try to find the following pattern:
>  
> -     type x_t, y_t;
> +     type1a x_t
> +     type1b y_t;
>       TYPE1 prod;
>       TYPE2 sum = init;
>     loop:
> @@ -908,8 +943,10 @@ vect_reassociating_reduction_p (vec_info *vinfo,
>       [S6  prod = (TYPE2) prod;  #optional]
>       S7  sum_1 = prod + sum_0;
>  
> -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
> -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> +   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
> +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
> +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
> +   bigger and must be the same sign. This is a special case of a reduction

This last bit isn't true: TYPE2 is the type of the addition and can be
any sign.
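
A quick sketch of why the sign of TYPE2 doesn't matter, in plain C with hypothetical function names: once the product has been extended to the width of the accumulator, a wrapping unsigned add and a signed add produce the same bit pattern.

```c
#include <stdint.h>

/* Accumulate a signed 16-bit product into an unsigned 32-bit sum.
   The product is sign-extended first, then added modulo 2^32.  */
static uint32_t
accumulate_unsigned (uint32_t sum, int16_t prod)
{
  return sum + (uint32_t) (int32_t) prod;
}

/* Accumulate the same product into a signed 32-bit sum.  Callers are
   assumed to keep the result in range, so there is no signed overflow.  */
static int32_t
accumulate_signed (int32_t sum, int16_t prod)
{
  return sum + prod;
}
```

This is the situation in the testcase from the original posting, where `res` is unsigned but `mult` is a signed short.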

>     computation.
>  
>     Input:
> @@ -946,15 +983,15 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>  
>    /* Look for the following pattern
>            DX = (TYPE1) X;
> -          DY = (TYPE1) Y;
> +	  DY = (TYPE1) Y;
>            DPROD = DX * DY;
> -          DDPROD = (TYPE2) DPROD;
> +	  DDPROD = (TYPE2) DPROD;
>            sum_1 = DDPROD + sum_0;

Spurious whitespace changes: would be better to tabify the whole thing
or leave it as-is.

>       In which
>       - DX is double the size of X
>       - DY is double the size of Y
>       - DX, DY, DPROD all have the same type but the sign
> -       between DX, DY and DPROD can differ.
> +       between X, Y and DPROD can differ.
>       - sum is the same size of DPROD or bigger
>       - sum has been recognized as a reduction variable.
>  
> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>    /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
>       inside the loop (in case we are analyzing an outer-loop).  */
>    vect_unpromoted_value unprom0[2];
> +  enum optab_subtype subtype = optab_vector;
>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
> -			     false, 2, unprom0, &half_type))
> +			     false, 2, unprom0, &half_type, &subtype))
> +    return NULL;
> +
> +  if (subtype == optab_vector_mixed_sign
> +      && TYPE_UNSIGNED (unprom_mult.type)
> +      && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION (unprom_mult.type))
>      return NULL;

Isn't the final condition here instead that TYPE1 is narrower than TYPE2?
I.e. we need to reject the case in which we multiply a signed and an
unsigned value to get a (logically) signed result, but then zero-extend
it (rather than sign-extend it) to the precision of the addition.

That would make the test:

  if (subtype == optab_vector_mixed_sign
      && TYPE_UNSIGNED (unprom_mult.type)
      && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION (type))
    return NULL;    
  
instead.
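
For a concrete case, here is a minimal sketch in plain C modelled on the testcase from the original posting, with a signed char multiplied by an unsigned char. The two function names are hypothetical; they correspond to SIGNEDNESS_2 being signed and unsigned respectively.

```c
#include <stdint.h>

/* The product is held in a signed 16-bit TYPE1 and then sign-extended
   to the 32-bit TYPE2.  This is what a mixed-sign dot product computes.  */
static int32_t
widen_signed_product (int8_t a, uint8_t b)
{
  int16_t mult = (int16_t) (a * b);	/* Always fits in 16 bits.  */
  return mult;
}

/* The product is held in an unsigned 16-bit TYPE1 and then zero-extended
   to the 32-bit TYPE2.  A negative product turns into a large positive
   value, so this must not be matched as a mixed-sign dot product.  */
static int32_t
widen_unsigned_product (int8_t a, uint8_t b)
{
  uint16_t mult = (uint16_t) (a * b);	/* Wraps modulo 2^16.  */
  return mult;
}
```

For a = -1 and b = 2 the first function returns -2 and the second returns 65534. The two only agree when the product is non-negative.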

Thanks,
Richard


* Re: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
  2021-06-22 10:56                       ` Richard Sandiford
@ 2021-06-22 11:16                         ` Richard Sandiford
  2021-07-12  9:18                           ` Tamar Christina
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Sandiford @ 2021-06-22 11:16 UTC (permalink / raw)
  To: Tamar Christina; +Cc: Richard Biener, nd, gcc-patches

Richard Sandiford <richard.sandiford@arm.com> writes:
>> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>>    /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
>>       inside the loop (in case we are analyzing an outer-loop).  */
>>    vect_unpromoted_value unprom0[2];
>> +  enum optab_subtype subtype = optab_vector;
>>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
>> -			     false, 2, unprom0, &half_type))
>> +			     false, 2, unprom0, &half_type, &subtype))
>> +    return NULL;
>> +
>> +  if (subtype == optab_vector_mixed_sign
>> +      && TYPE_UNSIGNED (unprom_mult.type)
>> +      && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION (unprom_mult.type))
>>      return NULL;
>
> Isn't the final condition here instead that TYPE1 is narrower than TYPE2?
> I.e. we need to reject the case in which we multiply a signed and an
> unsigned value to get a (logically) signed result, but then zero-extend
> it (rather than sign-extend it) to the precision of the addition.
>
> That would make the test:
>
>   if (subtype == optab_vector_mixed_sign
>       && TYPE_UNSIGNED (unprom_mult.type)
>       && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION (type))
>     return NULL;    
>   
> instead.

And folding that into the existing test gives:

  /* If there are two widening operations, make sure they agree on the sign
     of the extension.  The result of an optab_vector_mixed_sign operation
     is signed; otherwise, the result has the same sign as the operands.  */
  if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
      && (subtype == optab_vector_mixed_sign
	  ? TYPE_UNSIGNED (unprom_mult.type)
	  : TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
    return NULL;
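
The same predicate as a standalone sketch in plain C, in case it helps to check the cases by hand. `struct int_type` and `reject_dot_prod` are hypothetical names for this example only; the real code works on tree types via TYPE_PRECISION and TYPE_SIGN.

```c
#include <stdbool.h>

/* Hypothetical model of an integer type: only precision and sign.  */
struct int_type
{
  unsigned precision;
  bool is_unsigned;
};

/* Return true if the pattern has to be rejected.  MULT_TYPE models the
   type of the multiplication (TYPE1), SUM_TYPE the type of the addition
   (TYPE2) and HALF_TYPE the common type of the unpromoted operands.  */
static bool
reject_dot_prod (struct int_type mult_type, struct int_type sum_type,
		 struct int_type half_type, bool mixed_sign)
{
  /* No second widening step, so there is no extension to disagree on.  */
  if (mult_type.precision == sum_type.precision)
    return false;

  /* A mixed-sign product is logically signed, so an unsigned TYPE1
     would zero-extend it.  */
  if (mixed_sign)
    return mult_type.is_unsigned;

  /* Otherwise both extensions must use the sign of the operands.  */
  return mult_type.is_unsigned != half_type.is_unsigned;
}
```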

Thanks,
Richard


* RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
  2021-06-22 11:16                         ` Richard Sandiford
@ 2021-07-12  9:18                           ` Tamar Christina
  2021-07-12  9:39                             ` Richard Sandiford
  0 siblings, 1 reply; 35+ messages in thread
From: Tamar Christina @ 2021-07-12  9:18 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Richard Biener, nd, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 19966 bytes --]

Hi,

> Richard Sandiford <richard.sandiford@arm.com> writes:
> >> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> >>    /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
> >>       inside the loop (in case we are analyzing an outer-loop).  */
> >>    vect_unpromoted_value unprom0[2];
> >> +  enum optab_subtype subtype = optab_vector;
> >>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
> >> -			     false, 2, unprom0, &half_type))
> >> +			     false, 2, unprom0, &half_type, &subtype))
> >> +    return NULL;
> >> +
> >> +  if (subtype == optab_vector_mixed_sign
> >> +      && TYPE_UNSIGNED (unprom_mult.type)
> >> +      && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION (unprom_mult.type))
> >>      return NULL;
> >
> > Isn't the final condition here instead that TYPE1 is narrower than TYPE2?
> > I.e. we need to reject the case in which we multiply a signed and an
> > unsigned value to get a (logically) signed result, but then
> > zero-extend it (rather than sign-extend it) to the precision of the addition.
> >
> > That would make the test:
> >
> >   if (subtype == optab_vector_mixed_sign
> >       && TYPE_UNSIGNED (unprom_mult.type)
> >       && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION (type))
> >     return NULL;
> >
> > instead.
> 
> And folding that into the existing test gives:
> 
>   /* If there are two widening operations, make sure they agree on the sign
>      of the extension.  The result of an optab_vector_mixed_sign operation
>      is signed; otherwise, the result has the same sign as the operands.  */
>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
>       && (subtype == optab_vector_mixed_sign
> 	  ? TYPE_UNSIGNED (unprom_mult.type)
> 	  : TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
>     return NULL;
> 

I went with the first version, which doesn't add the extra constraints to
the normal dot product, as the folded test makes that case too restrictive.
It's the type of the multiplication that determines the operation, so the
dot product can be used in more places than it is today.

That restriction was relaxed in an earlier patch.

Updated patch attached.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* optabs.def (usdot_prod_optab): New.
	* doc/md.texi: Document it and clarify other dot prod optabs.
	* optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
	* optabs.c (expand_widen_pattern_expr): Likewise.
	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
	* tree-vect-loop.c (vectorizable_reduction): Query dot-product kind.
	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
	optab subtype.
	(vect_widened_op_tree): Optionally ignore
	mismatch types.
	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.

---- Inline copy of patch ----

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 1b91814433057b1b377283fd1f40cb970dc3d243..323ba8eab78e2b2e582fa0633752930182e83ee5 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5446,13 +5446,55 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
 
 @cindex @code{sdot_prod@var{m}} instruction pattern
 @item @samp{sdot_prod@var{m}}
+
+Compute the sum of the products of two signed elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+sdot<signed c, signed a, signed b> ==
+   res = sign-ext (a) * sign-ext (b) + c
+@dots{}
+@end smallexample
+
 @cindex @code{udot_prod@var{m}} instruction pattern
-@itemx @samp{udot_prod@var{m}}
-Compute the sum of the products of two signed/unsigned elements.
-Operand 1 and operand 2 are of the same mode. Their product, which is of a
-wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
-wider than the mode of the product. The result is placed in operand 0, which
-is of the same mode as operand 3.
+@item @samp{udot_prod@var{m}}
+
+Compute the sum of the products of two unsigned elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+udot<unsigned c, unsigned a, unsigned b> ==
+   res = zero-ext (a) * zero-ext (b) + c
+@dots{}
+@end smallexample
+
+
+
+@cindex @code{usdot_prod@var{m}} instruction pattern
+@item @samp{usdot_prod@var{m}}
+Compute the sum of the products of elements of different signs.
+Operand 1 must be unsigned and operand 2 signed. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+usdot<unsigned c, unsigned a, signed b> ==
+   res = ((unsigned-conv) sign-ext (a)) * zero-ext (b) + c
+@dots{}
+@end smallexample
 
 @cindex @code{ssad@var{m}} instruction pattern
 @item @samp{ssad@var{m}}
diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
index c3aaa1a416991e856d3e24da45968a92ebada82c..fbd2b06b8dbfd560dfb66b314830e6b564b37abb 100644
--- a/gcc/optabs-tree.h
+++ b/gcc/optabs-tree.h
@@ -29,7 +29,8 @@ enum optab_subtype
 {
   optab_default,
   optab_scalar,
-  optab_vector
+  optab_vector,
+  optab_vector_mixed_sign
 };
 
 /* Return the optab used for computing the given operation on the type given by
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 95ffe397c23e80c105afea52e9d47216bf52f55a..eeb5aeed3202cc6971b6447994bc5311e9c010bb 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -127,7 +127,12 @@ optab_for_tree_code (enum tree_code code, const_tree type,
       return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
 
     case DOT_PROD_EXPR:
-      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
+      {
+	if (subtype == optab_vector_mixed_sign)
+	  return usdot_prod_optab;
+
+	return (TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab);
+      }
 
     case SAD_EXPR:
       return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 62a6bdb4c59bf8263c499245795576199606d372..14d8ad2f33fd75388435fe912380e177f8f3c54b 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   bool sbool = false;
 
   oprnd0 = ops->op0;
+  if (nops >= 2)
+    oprnd1 = ops->op1;
+  if (nops >= 3)
+    oprnd2 = ops->op2;
+
   tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
   if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
       || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
@@ -285,6 +290,27 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
 	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
       sbool = true;
     }
+  else if (ops->code == DOT_PROD_EXPR)
+    {
+      enum optab_subtype subtype = optab_default;
+      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
+      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
+      if (sign1 == sign2)
+	;
+      else if (sign1 == SIGNED && sign2 == UNSIGNED)
+	{
+	  subtype = optab_vector_mixed_sign;
+	  /* Same as optab_vector_mixed_sign but flip the operands.  */
+	  std::swap (op0, op1);
+	}
+      else if (sign1 == UNSIGNED && sign2 == SIGNED)
+	subtype = optab_vector_mixed_sign;
+      else
+	gcc_unreachable ();
+
+      widen_pattern_optab
+	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
+    }
   else
     widen_pattern_optab
       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
@@ -298,10 +324,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   gcc_assert (icode != CODE_FOR_nothing);
 
   if (nops >= 2)
-    {
-      oprnd1 = ops->op1;
-      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
-    }
+    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
   else if (sbool)
     {
       nops = 2;
@@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
     {
       gcc_assert (tmode1 == tmode0);
       gcc_assert (op1);
-      oprnd2 = ops->op2;
       wmode = TYPE_MODE (TREE_TYPE (oprnd2));
     }
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 41ab2598eb6c32c003cbed490796abf25d2ee315..574d355b6b3092cf893f5ab0e8ae0f6d9ffcefbd 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
 OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
 OPTAB_D (udot_prod_optab, "udot_prod$I$a")
+OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
 OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index c73e1cbdda6b9380190b03de66caee48c4e173e3..3750d2881cbb7fd1e71c0eb8c0d4929925fd4152 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -4434,7 +4434,8 @@ verify_gimple_assign_ternary (gassign *stmt)
 		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
 		 || (!INTEGRAL_TYPE_P (lhs_type)
 		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
-	    || !types_compatible_p (rhs1_type, rhs2_type)
+	    /* rhs1_type and rhs2_type may differ in sign.  */
+	    || !tree_nop_conversion_p (rhs1_type, rhs2_type)
 	    || !useless_type_conversion_p (lhs_type, rhs3_type)
 	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
 			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 51a46a6d852fb342278bb9513d013702cff4b868..4e63e84cc70ca60c706c19367ccf256ea3f851b5 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6663,6 +6663,12 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   bool lane_reduc_code_p
     = (code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code == SAD_EXPR);
   int op_type = TREE_CODE_LENGTH (code);
+  enum optab_subtype optab_query_kind = optab_vector;
+  if (code == DOT_PROD_EXPR
+      && TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)))
+	   != TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt))))
+    optab_query_kind = optab_vector_mixed_sign;
+
 
   scalar_dest = gimple_assign_lhs (stmt);
   scalar_type = TREE_TYPE (scalar_dest);
@@ -7190,7 +7196,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       bool ok = true;
 
       /* 4.1. check support for the operation in the loop  */
-      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
+      optab optab = optab_for_tree_code (code, vectype_in, optab_query_kind);
       if (!optab)
 	{
 	  if (dump_enabled_p ())
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index d71c8c6180c8711687471060e6c937561dfe5caf..13b435c96ffdd0e7a8adf0c8e63523afb69bd2dc 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -191,9 +191,9 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
 }
 
 /* Return true if the target supports a vector version of CODE,
-   where CODE is known to map to a direct optab.  ITYPE specifies
-   the type of (some of) the scalar inputs and OTYPE specifies the
-   type of the scalar result.
+   where CODE is known to map to a direct optab with the given SUBTYPE.
+   ITYPE specifies the type of (some of) the scalar inputs and OTYPE
+   specifies the type of the scalar result.
 
    If CODE allows the inputs and outputs to have different type
    (such as for WIDEN_SUM_EXPR), it is the input mode rather
@@ -208,7 +208,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
 static bool
 vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
 				 tree itype, tree *vecotype_out,
-				 tree *vecitype_out = NULL)
+				 tree *vecitype_out = NULL,
+				 enum optab_subtype subtype = optab_default)
 {
   tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
   if (!vecitype)
@@ -218,7 +219,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
   if (!vecotype)
     return false;
 
-  optab optab = optab_for_tree_code (code, vecitype, optab_default);
+  optab optab = optab_for_tree_code (code, vecitype, subtype);
   if (!optab)
     return false;
 
@@ -521,6 +522,7 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
   unsigned int precision = MAX (TYPE_PRECISION (*common_type),
 				TYPE_PRECISION (new_type));
   precision *= 2;
+
   if (precision * 2 > TYPE_PRECISION (type))
     return false;
 
@@ -539,6 +541,10 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
    to a type that (a) is narrower than the result of STMT_INFO and
    (b) can hold all leaf operand values.
 
+   If SUBTYPE then allow that the signs of the operands
+   may differ in signs but not in precision.  SUBTYPE is updated to reflect
+   this.
+
    Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
    exists.  */
 
@@ -546,7 +552,8 @@ static unsigned int
 vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		      tree_code widened_code, bool shift_p,
 		      unsigned int max_nops,
-		      vect_unpromoted_value *unprom, tree *common_type)
+		      vect_unpromoted_value *unprom, tree *common_type,
+		      enum optab_subtype *subtype = NULL)
 {
   /* Check for an integer operation with the right code.  */
   gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
@@ -607,7 +614,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		= vinfo->lookup_def (this_unprom->op);
 	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
 					   widened_code, shift_p, max_nops,
-					   this_unprom, common_type);
+					   this_unprom, common_type,
+					   subtype);
 	      if (nops == 0)
 		return 0;
 
@@ -625,7 +633,18 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		*common_type = this_unprom->type;
 	      else if (!vect_joust_widened_type (type, this_unprom->type,
 						 common_type))
-		return 0;
+		{
+		  if (subtype)
+		    {
+		      /* See if we can sign extend the smaller type.  */
+		      if (TYPE_PRECISION (this_unprom->type)
+			  > TYPE_PRECISION (*common_type))
+			*common_type = this_unprom->type;
+		      *subtype = optab_vector_mixed_sign;
+		    }
+		  else
+		    return 0;
+		}
 	    }
 	}
       next_op += nops;
@@ -725,12 +744,22 @@ vect_split_statement (vec_info *vinfo, stmt_vec_info stmt2_info, tree new_rhs,
 
 /* Convert UNPROM to TYPE and return the result, adding new statements
    to STMT_INFO's pattern definition statements if no better way is
-   available.  VECTYPE is the vector form of TYPE.  */
+   available.  VECTYPE is the vector form of TYPE.
+
+   If SUBTYPE then convert the type based on the subtype.  */
 
 static tree
 vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
-		    vect_unpromoted_value *unprom, tree vectype)
+		    vect_unpromoted_value *unprom, tree vectype,
+		    enum optab_subtype subtype = optab_default)
 {
+
+  /* Update the type if the signs differ.  */
+  if (subtype == optab_vector_mixed_sign
+      && TYPE_SIGN (type) != TYPE_SIGN (TREE_TYPE (unprom->op)))
+    type = build_nonstandard_integer_type (TYPE_PRECISION (type),
+					   TYPE_SIGN (unprom->type));
+
   /* Check for a no-op conversion.  */
   if (types_compatible_p (type, TREE_TYPE (unprom->op)))
     return unprom->op;
@@ -806,12 +835,14 @@ vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
 }
 
 /* Invoke vect_convert_input for N elements of UNPROM and store the
-   result in the corresponding elements of RESULT.  */
+   result in the corresponding elements of RESULT.
+
+   If SUBTYPE then convert the type based on the subtype.  */
 
 static void
 vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
 		     tree *result, tree type, vect_unpromoted_value *unprom,
-		     tree vectype)
+		     tree vectype, enum optab_subtype subtype = optab_default)
 {
   for (unsigned int i = 0; i < n; ++i)
     {
@@ -819,11 +850,12 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
       for (j = 0; j < i; ++j)
 	if (unprom[j].op == unprom[i].op)
 	  break;
+
       if (j < i)
 	result[i] = result[j];
       else
 	result[i] = vect_convert_input (vinfo, stmt_info,
-					type, &unprom[i], vectype);
+					type, &unprom[i], vectype, subtype);
     }
 }
 
@@ -895,7 +927,8 @@ vect_reassociating_reduction_p (vec_info *vinfo,
 
    Try to find the following pattern:
 
-     type x_t, y_t;
+     type1a x_t
+     type1b y_t;
      TYPE1 prod;
      TYPE2 sum = init;
    loop:
@@ -908,9 +941,9 @@ vect_reassociating_reduction_p (vec_info *vinfo,
      [S6  prod = (TYPE2) prod;  #optional]
      S7  sum_1 = prod + sum_0;
 
-   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
-   same size of 'TYPE1' or bigger. This is a special case of a reduction
-   computation.
+   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
+   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
+   'type1a' and 'type1b' can differ.
 
    Input:
 
@@ -954,7 +987,7 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
      - DX is double the size of X
      - DY is double the size of Y
      - DX, DY, DPROD all have the same type but the sign
-       between DX, DY and DPROD can differ.
+       between X, Y and DPROD can differ.
      - sum is the same size of DPROD or bigger
      - sum has been recognized as a reduction variable.
 
@@ -992,21 +1025,30 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
   /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
      inside the loop (in case we are analyzing an outer-loop).  */
   vect_unpromoted_value unprom0[2];
+  enum optab_subtype subtype = optab_vector;
   if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
-			     false, 2, unprom0, &half_type))
+			     false, 2, unprom0, &half_type, &subtype))
+    return NULL;
+
+  /* If there are two widening operations, make sure they agree on the sign
+     of the extension.  The result of an optab_vector_mixed_sign operation
+     is signed.  */
+  if (subtype == optab_vector_mixed_sign
+      && TYPE_UNSIGNED (unprom_mult.type)
+      && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION (type))
     return NULL;
 
   vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
 
   tree half_vectype;
   if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
-					type_out, &half_vectype))
+					type_out, &half_vectype, subtype))
     return NULL;
 
   /* Get the inputs in the appropriate types.  */
   tree mult_oprnd[2];
   vect_convert_inputs (vinfo, stmt_vinfo, 2, mult_oprnd, half_type,
-		       unprom0, half_vectype);
+		       unprom0, half_vectype, subtype);
 
   var = vect_recog_temp_ssa_var (type, NULL);
   pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,

[-- Attachment #2: rb14433.patch --]
[-- Type: application/octet-stream, Size: 16551 bytes --]

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 1b91814433057b1b377283fd1f40cb970dc3d243..323ba8eab78e2b2e582fa0633752930182e83ee5 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5446,13 +5446,55 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
 
 @cindex @code{sdot_prod@var{m}} instruction pattern
 @item @samp{sdot_prod@var{m}}
+
+Compute the sum of the products of two signed elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+sdot<signed c, signed a, signed b> ==
+   res = sign-ext (a) * sign-ext (b) + c
+@dots{}
+@end smallexample
+
 @cindex @code{udot_prod@var{m}} instruction pattern
-@itemx @samp{udot_prod@var{m}}
-Compute the sum of the products of two signed/unsigned elements.
-Operand 1 and operand 2 are of the same mode. Their product, which is of a
-wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
-wider than the mode of the product. The result is placed in operand 0, which
-is of the same mode as operand 3.
+@item @samp{udot_prod@var{m}}
+
+Compute the sum of the products of two unsigned elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+udot<unsigned c, unsigned a, unsigned b> ==
+   res = zero-ext (a) * zero-ext (b) + c
+@dots{}
+@end smallexample
+
+
+
+@cindex @code{usdot_prod@var{m}} instruction pattern
+@item @samp{usdot_prod@var{m}}
+Compute the sum of the products of elements of different signs.
+Operand 1 must be unsigned and operand 2 signed. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+usdot<unsigned c, unsigned a, signed b> ==
+   res = ((unsigned-conv) sign-ext (a)) * zero-ext (b) + c
+@dots{}
+@end smallexample
 
 @cindex @code{ssad@var{m}} instruction pattern
 @item @samp{ssad@var{m}}
diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
index c3aaa1a416991e856d3e24da45968a92ebada82c..fbd2b06b8dbfd560dfb66b314830e6b564b37abb 100644
--- a/gcc/optabs-tree.h
+++ b/gcc/optabs-tree.h
@@ -29,7 +29,8 @@ enum optab_subtype
 {
   optab_default,
   optab_scalar,
-  optab_vector
+  optab_vector,
+  optab_vector_mixed_sign
 };
 
 /* Return the optab used for computing the given operation on the type given by
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 95ffe397c23e80c105afea52e9d47216bf52f55a..eeb5aeed3202cc6971b6447994bc5311e9c010bb 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -127,7 +127,12 @@ optab_for_tree_code (enum tree_code code, const_tree type,
       return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
 
     case DOT_PROD_EXPR:
-      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
+      {
+	if (subtype == optab_vector_mixed_sign)
+	  return usdot_prod_optab;
+
+	return (TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab);
+      }
 
     case SAD_EXPR:
       return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 62a6bdb4c59bf8263c499245795576199606d372..14d8ad2f33fd75388435fe912380e177f8f3c54b 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   bool sbool = false;
 
   oprnd0 = ops->op0;
+  if (nops >= 2)
+    oprnd1 = ops->op1;
+  if (nops >= 3)
+    oprnd2 = ops->op2;
+
   tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
   if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
       || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
@@ -285,6 +290,27 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
 	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
       sbool = true;
     }
+  else if (ops->code == DOT_PROD_EXPR)
+    {
+      enum optab_subtype subtype = optab_default;
+      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
+      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
+      if (sign1 == sign2)
+	;
+      else if (sign1 == SIGNED && sign2 == UNSIGNED)
+	{
+	  subtype = optab_vector_mixed_sign;
+	  /* Same as optab_vector_mixed_sign but flip the operands.  */
+	  std::swap (op0, op1);
+	}
+      else if (sign1 == UNSIGNED && sign2 == SIGNED)
+	subtype = optab_vector_mixed_sign;
+      else
+	gcc_unreachable ();
+
+      widen_pattern_optab
+	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
+    }
   else
     widen_pattern_optab
       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
@@ -298,10 +324,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   gcc_assert (icode != CODE_FOR_nothing);
 
   if (nops >= 2)
-    {
-      oprnd1 = ops->op1;
-      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
-    }
+    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
   else if (sbool)
     {
       nops = 2;
@@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
     {
       gcc_assert (tmode1 == tmode0);
       gcc_assert (op1);
-      oprnd2 = ops->op2;
       wmode = TYPE_MODE (TREE_TYPE (oprnd2));
     }
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 41ab2598eb6c32c003cbed490796abf25d2ee315..574d355b6b3092cf893f5ab0e8ae0f6d9ffcefbd 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
 OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
 OPTAB_D (udot_prod_optab, "udot_prod$I$a")
+OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
 OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index c73e1cbdda6b9380190b03de66caee48c4e173e3..3750d2881cbb7fd1e71c0eb8c0d4929925fd4152 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -4434,7 +4434,8 @@ verify_gimple_assign_ternary (gassign *stmt)
 		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
 		 || (!INTEGRAL_TYPE_P (lhs_type)
 		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
-	    || !types_compatible_p (rhs1_type, rhs2_type)
+	    /* rhs1_type and rhs2_type may differ in sign.  */
+	    || !tree_nop_conversion_p (rhs1_type, rhs2_type)
 	    || !useless_type_conversion_p (lhs_type, rhs3_type)
 	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
 			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 51a46a6d852fb342278bb9513d013702cff4b868..4e63e84cc70ca60c706c19367ccf256ea3f851b5 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6663,6 +6663,12 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   bool lane_reduc_code_p
     = (code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code == SAD_EXPR);
   int op_type = TREE_CODE_LENGTH (code);
+  enum optab_subtype optab_query_kind = optab_vector;
+  if (code == DOT_PROD_EXPR
+      && TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)))
+	   != TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt))))
+    optab_query_kind = optab_vector_mixed_sign;
+
 
   scalar_dest = gimple_assign_lhs (stmt);
   scalar_type = TREE_TYPE (scalar_dest);
@@ -7190,7 +7196,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       bool ok = true;
 
       /* 4.1. check support for the operation in the loop  */
-      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
+      optab optab = optab_for_tree_code (code, vectype_in, optab_query_kind);
       if (!optab)
 	{
 	  if (dump_enabled_p ())
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index d71c8c6180c8711687471060e6c937561dfe5caf..13b435c96ffdd0e7a8adf0c8e63523afb69bd2dc 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -191,9 +191,9 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
 }
 
 /* Return true if the target supports a vector version of CODE,
-   where CODE is known to map to a direct optab.  ITYPE specifies
-   the type of (some of) the scalar inputs and OTYPE specifies the
-   type of the scalar result.
+   where CODE is known to map to a direct optab with the given SUBTYPE.
+   ITYPE specifies the type of (some of) the scalar inputs and OTYPE
+   specifies the type of the scalar result.
 
    If CODE allows the inputs and outputs to have different type
    (such as for WIDEN_SUM_EXPR), it is the input mode rather
@@ -208,7 +208,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
 static bool
 vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
 				 tree itype, tree *vecotype_out,
-				 tree *vecitype_out = NULL)
+				 tree *vecitype_out = NULL,
+				 enum optab_subtype subtype = optab_default)
 {
   tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
   if (!vecitype)
@@ -218,7 +219,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
   if (!vecotype)
     return false;
 
-  optab optab = optab_for_tree_code (code, vecitype, optab_default);
+  optab optab = optab_for_tree_code (code, vecitype, subtype);
   if (!optab)
     return false;
 
@@ -521,6 +522,7 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
   unsigned int precision = MAX (TYPE_PRECISION (*common_type),
 				TYPE_PRECISION (new_type));
   precision *= 2;
+
   if (precision * 2 > TYPE_PRECISION (type))
     return false;
 
@@ -539,6 +541,10 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
    to a type that (a) is narrower than the result of STMT_INFO and
    (b) can hold all leaf operand values.
 
+   If SUBTYPE is nonnull, allow the operands to differ in sign
+   but not in precision.  SUBTYPE is updated to reflect
+   this.
+
    Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
    exists.  */
 
@@ -546,7 +552,8 @@ static unsigned int
 vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		      tree_code widened_code, bool shift_p,
 		      unsigned int max_nops,
-		      vect_unpromoted_value *unprom, tree *common_type)
+		      vect_unpromoted_value *unprom, tree *common_type,
+		      enum optab_subtype *subtype = NULL)
 {
   /* Check for an integer operation with the right code.  */
   gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
@@ -607,7 +614,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		= vinfo->lookup_def (this_unprom->op);
 	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
 					   widened_code, shift_p, max_nops,
-					   this_unprom, common_type);
+					   this_unprom, common_type,
+					   subtype);
 	      if (nops == 0)
 		return 0;
 
@@ -625,7 +633,18 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		*common_type = this_unprom->type;
 	      else if (!vect_joust_widened_type (type, this_unprom->type,
 						 common_type))
-		return 0;
+		{
+		  if (subtype)
+		    {
+		      /* See if we can sign extend the smaller type.  */
+		      if (TYPE_PRECISION (this_unprom->type)
+			  > TYPE_PRECISION (*common_type))
+			*common_type = this_unprom->type;
+		      *subtype = optab_vector_mixed_sign;
+		    }
+		  else
+		    return 0;
+		}
 	    }
 	}
       next_op += nops;
@@ -725,12 +744,22 @@ vect_split_statement (vec_info *vinfo, stmt_vec_info stmt2_info, tree new_rhs,
 
 /* Convert UNPROM to TYPE and return the result, adding new statements
    to STMT_INFO's pattern definition statements if no better way is
-   available.  VECTYPE is the vector form of TYPE.  */
+   available.  VECTYPE is the vector form of TYPE.
+
+   If SUBTYPE is optab_vector_mixed_sign, keep the sign of UNPROM's type.  */
 
 static tree
 vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
-		    vect_unpromoted_value *unprom, tree vectype)
+		    vect_unpromoted_value *unprom, tree vectype,
+		    enum optab_subtype subtype = optab_default)
 {
+
+  /* Update the type if the signs differ.  */
+  if (subtype == optab_vector_mixed_sign
+      && TYPE_SIGN (type) != TYPE_SIGN (TREE_TYPE (unprom->op)))
+    type = build_nonstandard_integer_type (TYPE_PRECISION (type),
+					   TYPE_SIGN (unprom->type));
+
   /* Check for a no-op conversion.  */
   if (types_compatible_p (type, TREE_TYPE (unprom->op)))
     return unprom->op;
@@ -806,12 +835,14 @@ vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
 }
 
 /* Invoke vect_convert_input for N elements of UNPROM and store the
-   result in the corresponding elements of RESULT.  */
+   result in the corresponding elements of RESULT.
+
+   SUBTYPE is passed on to vect_convert_input for each element.  */
 
 static void
 vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
 		     tree *result, tree type, vect_unpromoted_value *unprom,
-		     tree vectype)
+		     tree vectype, enum optab_subtype subtype = optab_default)
 {
   for (unsigned int i = 0; i < n; ++i)
     {
@@ -819,11 +850,12 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
       for (j = 0; j < i; ++j)
 	if (unprom[j].op == unprom[i].op)
 	  break;
+
       if (j < i)
 	result[i] = result[j];
       else
 	result[i] = vect_convert_input (vinfo, stmt_info,
-					type, &unprom[i], vectype);
+					type, &unprom[i], vectype, subtype);
     }
 }
 
@@ -895,7 +927,8 @@ vect_reassociating_reduction_p (vec_info *vinfo,
 
    Try to find the following pattern:
 
-     type x_t, y_t;
+     type1a x_t;
+     type1b y_t;
      TYPE1 prod;
      TYPE2 sum = init;
    loop:
@@ -908,9 +941,9 @@ vect_reassociating_reduction_p (vec_info *vinfo,
      [S6  prod = (TYPE2) prod;  #optional]
      S7  sum_1 = prod + sum_0;
 
-   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
-   same size of 'TYPE1' or bigger. This is a special case of a reduction
-   computation.
+   where 'TYPE1' is exactly double the size of 'type1a' and 'type1b'.  The
+   sign of 'TYPE1' must match that of 'type1a' or 'type1b', but the signs
+   of 'type1a' and 'type1b' may differ.
 
    Input:
 
@@ -954,7 +987,7 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
      - DX is double the size of X
      - DY is double the size of Y
      - DX, DY, DPROD all have the same type but the sign
-       between DX, DY and DPROD can differ.
+       between X, Y and DPROD can differ.
      - sum is the same size of DPROD or bigger
      - sum has been recognized as a reduction variable.
 
@@ -992,21 +1025,30 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
   /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
      inside the loop (in case we are analyzing an outer-loop).  */
   vect_unpromoted_value unprom0[2];
+  enum optab_subtype subtype = optab_vector;
   if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
-			     false, 2, unprom0, &half_type))
+			     false, 2, unprom0, &half_type, &subtype))
+    return NULL;
+
+  /* If there are two widening operations, make sure they agree on the sign
+     of the extension.  The result of an optab_vector_mixed_sign operation
+     is signed.  */
+  if (subtype == optab_vector_mixed_sign
+      && TYPE_UNSIGNED (unprom_mult.type)
+      && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION (type))
     return NULL;
 
   vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
 
   tree half_vectype;
   if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
-					type_out, &half_vectype))
+					type_out, &half_vectype, subtype))
     return NULL;
 
   /* Get the inputs in the appropriate types.  */
   tree mult_oprnd[2];
   vect_convert_inputs (vinfo, stmt_vinfo, 2, mult_oprnd, half_type,
-		       unprom0, half_vectype);
+		       unprom0, half_vectype, subtype);
 
   var = vect_recog_temp_ssa_var (type, NULL);
   pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
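
For readers following the md.texi text in the patch above, the mixed-sign case can be modelled in scalar C. This is only a sketch: usdot_ref is a hypothetical helper, not a GCC function, and it assumes the operand description given for usdot_prod (operand 1 unsigned, operand 2 signed, accumulator at least as wide as the product), with 8-bit inputs and a 32-bit accumulator chosen purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a mixed-sign dot product; not a GCC API.
   Assumes operand 1 (A) is unsigned and operand 2 (B) is signed.  */
static int32_t
usdot_ref (int32_t acc, const uint8_t *a, const int8_t *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      int32_t av = a[i];	/* Zero-extend the unsigned operand.  */
      int32_t bv = b[i];	/* Sign-extend the signed operand.  */
      acc += av * bv;		/* The product is logically signed.  */
    }
  return acc;
}
```

With a = {255, 2} and b = {-1, 3} the products are -255 and 6, so the model accumulates -249 onto the initial value.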


* Re: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
  2021-07-12  9:18                           ` Tamar Christina
@ 2021-07-12  9:39                             ` Richard Sandiford
  2021-07-12  9:56                               ` Tamar Christina
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Sandiford @ 2021-07-12  9:39 UTC (permalink / raw)
  To: Tamar Christina; +Cc: Richard Biener, nd, gcc-patches

Tamar Christina <Tamar.Christina@arm.com> writes:
> Hi,
>
>> Richard Sandiford <richard.sandiford@arm.com> writes:
>> >> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info
>> *vinfo,
>> >>    /* FORNOW.  Can continue analyzing the def-use chain when this stmt in
>> a phi
>> >>       inside the loop (in case we are analyzing an outer-loop).  */
>> >>    vect_unpromoted_value unprom0[2];
>> >> +  enum optab_subtype subtype = optab_vector;
>> >>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
>> WIDEN_MULT_EXPR,
>> >> -			     false, 2, unprom0, &half_type))
>> >> +			     false, 2, unprom0, &half_type, &subtype))
>> >> +    return NULL;
>> >> +
>> >> +  if (subtype == optab_vector_mixed_sign
>> >> +      && TYPE_UNSIGNED (unprom_mult.type)
>> >> +      && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION
>> >> + (unprom_mult.type))
>> >>      return NULL;
>> >
>> > Isn't the final condition here instead that TYPE1 is narrower than TYPE2?
>> > I.e. we need to reject the case in which we multiply a signed and an
>> > unsigned value to get a (logically) signed result, but then
>> > zero-extend it (rather than sign-extend it) to the precision of the addition.
>> >
>> > That would make the test:
>> >
>> >   if (subtype == optab_vector_mixed_sign
>> >       && TYPE_UNSIGNED (unprom_mult.type)
>> >       && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION (type))
>> >     return NULL;
>> >
>> > instead.
>> 
>> And folding that into the existing test gives:
>> 
>>   /* If there are two widening operations, make sure they agree on the sign
>>      of the extension.  The result of an optab_vector_mixed_sign operation
>>      is signed; otherwise, the result has the same sign as the operands.  */
>>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
>>       && (subtype == optab_vector_mixed_sign
>> 	  ? TYPE_UNSIGNED (unprom_mult.type)
>> 	  : TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
>>     return NULL;
>> 
>
> I went with the first one which doesn't add the extra constraints for the
> normal dotproduct as that makes it too restrictive. It's the type of the
> multiplication that determines the operation so dotproduct can be used
> a bit more than where we currently do.
>
> This was relaxed in an earlier patch.

I didn't mean that we should add extra constraints to the normal case
though.  The existing test I was referring to above was:

  /* If there are two widening operations, make sure they agree on
     the sign of the extension.  */
  if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
      && TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type))
    return NULL;

Although this existing test makes sense for the normal case, IMO testing
TYPE_SIGN (half_type) doesn't make sense for the mixed-sign case.  I think
we should therefore replace the existing test with:

  /* If there are two widening operations, make sure they agree on the sign
     of the extension.  The result of an optab_vector_mixed_sign operation
     is signed; otherwise, the result has the same sign as the operands.  */
  if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
      && (subtype == optab_vector_mixed_sign
         ? TYPE_UNSIGNED (unprom_mult.type)
         : TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
    return NULL;

rather than add a separate condition for the mixed-sign case.
The behaviour of the normal case is the same both ways.
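
As a rough illustration of the combined condition, here it is restated outside GCC. Everything below is a hypothetical stand-in: struct int_type only models the precision and signedness that TYPE_PRECISION and TYPE_SIGN would read from a tree type, and reject_dot_prod is not a function in the vectorizer.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the type properties the check reads;
   this is not GCC's tree type.  */
struct int_type
{
  unsigned precision;
  bool is_unsigned;
};

/* Return true if the pattern must be rejected.  MULT models the type of
   the multiplication result, SUM the type of the accumulation and HALF
   the common narrow input type.  */
static bool
reject_dot_prod (struct int_type mult, struct int_type sum,
		 struct int_type half, bool mixed_sign)
{
  /* No second widening step, so there is no extension to disagree on.  */
  if (mult.precision == sum.precision)
    return false;

  /* A mixed-sign product is logically signed, so it must not be
     zero-extended to the precision of the addition.  */
  if (mixed_sign)
    return mult.is_unsigned;

  /* Otherwise the extension must have the same sign as the operands.  */
  return mult.is_unsigned != half.is_unsigned;
}
```

In this model the mixed_sign flag only changes which sign the product is compared against; the non-mixed branch is the existing test unchanged.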

Thanks,
Richard




* RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
  2021-07-12  9:39                             ` Richard Sandiford
@ 2021-07-12  9:56                               ` Tamar Christina
  2021-07-12 10:25                                 ` Richard Sandiford
  0 siblings, 1 reply; 35+ messages in thread
From: Tamar Christina @ 2021-07-12  9:56 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Richard Biener, nd, gcc-patches



> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Monday, July 12, 2021 10:39 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Richard Biener <rguenther@suse.de>; nd <nd@arm.com>; gcc-
> patches@gcc.gnu.org
> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant changes.
> 
> Tamar Christina <Tamar.Christina@arm.com> writes:
> > Hi,
> >
> >> Richard Sandiford <richard.sandiford@arm.com> writes:
> >> >> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info
> >> *vinfo,
> >> >>    /* FORNOW.  Can continue analyzing the def-use chain when this
> >> >> stmt in
> >> a phi
> >> >>       inside the loop (in case we are analyzing an outer-loop).  */
> >> >>    vect_unpromoted_value unprom0[2];
> >> >> +  enum optab_subtype subtype = optab_vector;
> >> >>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
> >> WIDEN_MULT_EXPR,
> >> >> -			     false, 2, unprom0, &half_type))
> >> >> +			     false, 2, unprom0, &half_type, &subtype))
> >> >> +    return NULL;
> >> >> +
> >> >> +  if (subtype == optab_vector_mixed_sign
> >> >> +      && TYPE_UNSIGNED (unprom_mult.type)
> >> >> +      && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION
> >> >> + (unprom_mult.type))
> >> >>      return NULL;
> >> >
> >> > Isn't the final condition here instead that TYPE1 is narrower than TYPE2?
> >> > I.e. we need to reject the case in which we multiply a signed and
> >> > an unsigned value to get a (logically) signed result, but then
> >> > zero-extend it (rather than sign-extend it) to the precision of the
> addition.
> >> >
> >> > That would make the test:
> >> >
> >> >   if (subtype == optab_vector_mixed_sign
> >> >       && TYPE_UNSIGNED (unprom_mult.type)
> >> >       && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION (type))
> >> >     return NULL;
> >> >
> >> > instead.
> >>
> >> And folding that into the existing test gives:
> >>
> >>   /* If there are two widening operations, make sure they agree on the
> sign
> >>      of the extension.  The result of an optab_vector_mixed_sign operation
> >>      is signed; otherwise, the result has the same sign as the operands.  */
> >>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
> >>       && (subtype == optab_vector_mixed_sign
> >> 	  ? TYPE_UNSIGNED (unprom_mult.type)
> >> 	  : TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
> >>     return NULL;
> >>
> >
> > I went with the first one which doesn't add the extra constraints for
> > the normal dotproduct as that makes it too restrictive. It's the type
> > of the multiplication that determines the operation so dotproduct can
> > be used a bit more than where we currently do.
> >
> > This was relaxed in an earlier patch.
> 
> I didn't mean that we should add extra constraints to the normal case though.
> The existing test I was referring to above was:
> 
>   /* If there are two widening operations, make sure they agree on
>      the sign of the extension.  */
>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
>       && TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type))
>     return NULL;

But as I mentioned, this restriction is unneeded and has been removed, which is why it's not in my patchset's diff.
It's removed by https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569851.html, which Richi conditioned on
the rest of these patches being approved.

This change needlessly blocks tests vect-reduc-dot-[2,3,6,7].c from being recognized as dot products, for instance.

It's also part of the codegen gap between GCC and Clang: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88492#c6

Regards,
Tamar

> 
> Although this existing test makes sense for the normal case, IMO testing
> TYPE_SIGN (half_type) doesn't make sense for the mixed-sign case.  I think
> we should therefore replace the existing test with:
> 
>   /* If there are two widening operations, make sure they agree on the sign
>      of the extension.  The result of an optab_vector_mixed_sign operation
>      is signed; otherwise, the result has the same sign as the operands.  */
>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
>       && (subtype == optab_vector_mixed_sign
>          ? TYPE_UNSIGNED (unprom_mult.type)
>          : TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
>     return NULL;
> 
> rather than add a separate condition for the mixed-sign case.
> The behaviour of the normal case is the same both ways.
> 
> Thanks,
> Richard
> 



* Re: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
  2021-07-12  9:56                               ` Tamar Christina
@ 2021-07-12 10:25                                 ` Richard Sandiford
  2021-07-12 12:29                                   ` Tamar Christina
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Sandiford @ 2021-07-12 10:25 UTC (permalink / raw)
  To: Tamar Christina; +Cc: Richard Biener, nd, gcc-patches

Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Monday, July 12, 2021 10:39 AM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: Richard Biener <rguenther@suse.de>; nd <nd@arm.com>; gcc-
>> patches@gcc.gnu.org
>> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
>> where the sign for the multiplicant changes.
>> 
>> Tamar Christina <Tamar.Christina@arm.com> writes:
>> > Hi,
>> >
>> >> Richard Sandiford <richard.sandiford@arm.com> writes:
>> >> >> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info
>> >> *vinfo,
>> >> >>    /* FORNOW.  Can continue analyzing the def-use chain when this
>> >> >> stmt in
>> >> a phi
>> >> >>       inside the loop (in case we are analyzing an outer-loop).  */
>> >> >>    vect_unpromoted_value unprom0[2];
>> >> >> +  enum optab_subtype subtype = optab_vector;
>> >> >>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
>> >> WIDEN_MULT_EXPR,
>> >> >> -			     false, 2, unprom0, &half_type))
>> >> >> +			     false, 2, unprom0, &half_type, &subtype))
>> >> >> +    return NULL;
>> >> >> +
>> >> >> +  if (subtype == optab_vector_mixed_sign
>> >> >> +      && TYPE_UNSIGNED (unprom_mult.type)
>> >> >> +      && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION
>> >> >> + (unprom_mult.type))
>> >> >>      return NULL;
>> >> >
>> >> > Isn't the final condition here instead that TYPE1 is narrower than TYPE2?
>> >> > I.e. we need to reject the case in which we multiply a signed and
>> >> > an unsigned value to get a (logically) signed result, but then
>> >> > zero-extend it (rather than sign-extend it) to the precision of the
>> addition.
>> >> >
>> >> > That would make the test:
>> >> >
>> >> >   if (subtype == optab_vector_mixed_sign
>> >> >       && TYPE_UNSIGNED (unprom_mult.type)
>> >> >       && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION (type))
>> >> >     return NULL;
>> >> >
>> >> > instead.
>> >>
>> >> And folding that into the existing test gives:
>> >>
>> >>   /* If there are two widening operations, make sure they agree on the
>> sign
>> >>      of the extension.  The result of an optab_vector_mixed_sign operation
>> >>      is signed; otherwise, the result has the same sign as the operands.  */
>> >>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
>> >>       && (subtype == optab_vector_mixed_sign
>> >> 	  ? TYPE_UNSIGNED (unprom_mult.type)
>> >> 	  : TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
>> >>     return NULL;
>> >>
>> >
>> > I went with the first one which doesn't add the extra constraints for
>> > the normal dotproduct as that makes it too restrictive. It's the type
>> > of the multiplication that determines the operation so dotproduct can
>> > be used a bit more than where we currently do.
>> >
>> > This was relaxed in an earlier patch.
>> 
>> I didn't mean that we should add extra constraints to the normal case though.
>> The existing test I was referring to above was:
>> 
>>   /* If there are two widening operations, make sure they agree on
>>      the sign of the extension.  */
>>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
>>       && TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type))
>>     return NULL;
>
> But as I mentioned, this restriction is unneeded and has been removed hence why it's not in my patchset's diff.
> It's removed by https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569851.html which Richi conditioned on
> the rest of these patches being approved.
>
> This change needlessly blocks test vect-reduc-dot-[2,3,6,7].c from being dotproducts for instance
>
> It's also part of the deficiency between GCC codegen and Clang https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88492#c6

Hmm, OK.  Just removing the check regresses:

unsigned long __attribute__ ((noipa))
f (signed short *x, signed short *y)
{
  unsigned long res = 0;
  for (int i = 0; i < 100; ++i)
    res += (unsigned int) x[i] * (unsigned int) y[i];
  return res;
}

int
main (void)
{
  signed short x[100], y[100];
  for (int i = 0; i < 100; ++i)
    {
      x[i] = -1;
      y[i] = 1;
    }
  if (f (x, y) != 0x6400000000ULL - 100)
    __builtin_abort ();
  return 0;
}

on SVE.  We then use SDOT even though the result of the multiplication
is zero- rather than sign-extended to 64 bits.  Does something else
in the series stop that from happening?

Richard
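
The expected value in the test above follows from how each product is extended. The sketch below spells out both readings in scalar C. The two function names are hypothetical, and uint64_t stands in for the 64-bit unsigned long of the original test so that the arithmetic does not depend on an LP64 target.

```c
#include <stdint.h>

/* What the source asks for: each product is computed in 32-bit unsigned
   arithmetic and then zero-extended to 64 bits before the addition.  */
static uint64_t
sum_zero_extended (const int16_t *x, const int16_t *y, int n)
{
  uint64_t res = 0;
  for (int i = 0; i < n; ++i)
    res += (uint32_t) x[i] * (uint32_t) y[i];
  return res;
}

/* What a signed dot product would compute instead: each product is
   sign-extended to 64 bits before the addition.  */
static uint64_t
sum_sign_extended (const int16_t *x, const int16_t *y, int n)
{
  int64_t res = 0;
  for (int i = 0; i < n; ++i)
    res += (int32_t) x[i] * (int32_t) y[i];
  return (uint64_t) res;
}
```

With x[i] = -1 and y[i] = 1, each zero-extended product is 0xffffffff, so 100 of them sum to 100 * (2^32 - 1) = 0x6400000000 - 100. The sign-extended reading instead accumulates -100, which wraps to 2^64 - 100 as an unsigned 64-bit value, so the two results differ.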


* RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
  2021-07-12 10:25                                 ` Richard Sandiford
@ 2021-07-12 12:29                                   ` Tamar Christina
  2021-07-12 14:55                                     ` Richard Sandiford
  0 siblings, 1 reply; 35+ messages in thread
From: Tamar Christina @ 2021-07-12 12:29 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Richard Biener, nd, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 5496 bytes --]



> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Monday, July 12, 2021 11:26 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Richard Biener <rguenther@suse.de>; nd <nd@arm.com>; gcc-
> patches@gcc.gnu.org
> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant changes.
> 
> Tamar Christina <Tamar.Christina@arm.com> writes:
> >> -----Original Message-----
> >> From: Richard Sandiford <richard.sandiford@arm.com>
> >> Sent: Monday, July 12, 2021 10:39 AM
> >> To: Tamar Christina <Tamar.Christina@arm.com>
> >> Cc: Richard Biener <rguenther@suse.de>; nd <nd@arm.com>; gcc-
> >> patches@gcc.gnu.org
> >> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
> >> where the sign for the multiplicant changes.
> >>
> >> Tamar Christina <Tamar.Christina@arm.com> writes:
> >> > Hi,
> >> >
> >> >> Richard Sandiford <richard.sandiford@arm.com> writes:
> >> >> >> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info
> >> >> *vinfo,
> >> >> >>    /* FORNOW.  Can continue analyzing the def-use chain when
> >> >> >> this stmt in
> >> >> a phi
> >> >> >>       inside the loop (in case we are analyzing an outer-loop).  */
> >> >> >>    vect_unpromoted_value unprom0[2];
> >> >> >> +  enum optab_subtype subtype = optab_vector;
> >> >> >>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
> >> >> WIDEN_MULT_EXPR,
> >> >> >> -			     false, 2, unprom0, &half_type))
> >> >> >> +			     false, 2, unprom0, &half_type, &subtype))
> >> >> >> +    return NULL;
> >> >> >> +
> >> >> >> +  if (subtype == optab_vector_mixed_sign
> >> >> >> +      && TYPE_UNSIGNED (unprom_mult.type)
> >> >> >> +      && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION
> >> >> >> + (unprom_mult.type))
> >> >> >>      return NULL;
> >> >> >
> >> >> > Isn't the final condition here instead that TYPE1 is narrower than
> TYPE2?
> >> >> > I.e. we need to reject the case in which we multiply a signed
> >> >> > and an unsigned value to get a (logically) signed result, but
> >> >> > then zero-extend it (rather than sign-extend it) to the
> >> >> > precision of the
> >> addition.
> >> >> >
> >> >> > That would make the test:
> >> >> >
> >> >> >   if (subtype == optab_vector_mixed_sign
> >> >> >       && TYPE_UNSIGNED (unprom_mult.type)
> >> >> >       && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION (type))
> >> >> >     return NULL;
> >> >> >
> >> >> > instead.
> >> >>
> >> >> And folding that into the existing test gives:
> >> >>
> >> >>   /* If there are two widening operations, make sure they agree on the sign
> >> >>      of the extension.  The result of an optab_vector_mixed_sign operation
> >> >>      is signed; otherwise, the result has the same sign as the operands.  */
> >> >>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
> >> >>       && (subtype == optab_vector_mixed_sign
> >> >> 	  ? TYPE_UNSIGNED (unprom_mult.type)
> >> >> 	  : TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
> >> >>     return NULL;
> >> >>
> >> >
> >> > I went with the first one which doesn't add the extra constraints
> >> > for the normal dotproduct as that makes it too restrictive. It's
> >> > the type of the multiplication that determines the operation so
> >> > dotproduct can be used a bit more than where we currently do.
> >> >
> >> > This was relaxed in an earlier patch.
> >>
> >> I didn't mean that we should add extra constraints to the normal case though.
> >> The existing test I was referring to above was:
> >>
> >>   /* If there are two widening operations, make sure they agree on
> >>      the sign of the extension.  */
> >>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
> >>       && TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type))
> >>     return NULL;
> >
> > > But as I mentioned, this restriction is unneeded and has been removed
> > > hence why it's not in my patchset's diff.
> > > It's removed by
> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569851.html which
> > > Richi conditioned on the rest of these patches being approved.
> >
> > This change needlessly blocks test vect-reduc-dot-[2,3,6,7].c from
> > being dotproducts for instance
> >
> > It's also part of the deficiency between GCC codegen and Clang
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88492#c6
> 
> Hmm, OK.  Just removing the check regresses:
> 
> unsigned long __attribute__ ((noipa))
> f (signed short *x, signed short *y)
> {
>   unsigned long res = 0;
>   for (int i = 0; i < 100; ++i)
>     res += (unsigned int) x[i] * (unsigned int) y[i];
>   return res;
> }
> 
> int
> main (void)
> {
>   signed short x[100], y[100];
>   for (int i = 0; i < 100; ++i)
>     {
>       x[i] = -1;
>       y[i] = 1;
>     }
>   if (f (x, y) != 0x6400000000ULL - 100)
>     __builtin_abort ();
>   return 0;
> }
> 
> on SVE.  We then use SDOT even though the result of the multiplication is
> zero- rather than sign-extended to 64 bits.  Does something else in the series
> stop that from happening?

No, and I hadn't noticed it before because it looks like the mid-end execution tests don't turn on dot-product for arm targets :/

I'll look at it separately; for now I've added the check back in.

Ok for trunk now?

Thanks,
Tamar

> 
> Richard

[-- Attachment #2: rb14433.patch --]
[-- Type: application/octet-stream, Size: 16746 bytes --]

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 1b91814433057b1b377283fd1f40cb970dc3d243..323ba8eab78e2b2e582fa0633752930182e83ee5 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5446,13 +5446,55 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
 
 @cindex @code{sdot_prod@var{m}} instruction pattern
 @item @samp{sdot_prod@var{m}}
+
+Compute the sum of the products of two signed elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+sdot<signed c, signed a, signed b> ==
+   res = sign-ext (a) * sign-ext (b) + c
+@dots{}
+@end smallexample
+
 @cindex @code{udot_prod@var{m}} instruction pattern
-@itemx @samp{udot_prod@var{m}}
-Compute the sum of the products of two signed/unsigned elements.
-Operand 1 and operand 2 are of the same mode. Their product, which is of a
-wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
-wider than the mode of the product. The result is placed in operand 0, which
-is of the same mode as operand 3.
+@item @samp{udot_prod@var{m}}
+
+Compute the sum of the products of two unsigned elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+udot<unsigned c, unsigned a, unsigned b> ==
+   res = zero-ext (a) * zero-ext (b) + c
+@dots{}
+@end smallexample
+
+
+
+@cindex @code{usdot_prod@var{m}} instruction pattern
+@item @samp{usdot_prod@var{m}}
+Compute the sum of the products of elements of different signs.
+Operand 1 must be unsigned and operand 2 signed. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+usdot<unsigned c, unsigned a, signed b> ==
+   res = ((unsigned-conv) sign-ext (a)) * zero-ext (b) + c
+@dots{}
+@end smallexample
 
 @cindex @code{ssad@var{m}} instruction pattern
 @item @samp{ssad@var{m}}
diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
index c3aaa1a416991e856d3e24da45968a92ebada82c..fbd2b06b8dbfd560dfb66b314830e6b564b37abb 100644
--- a/gcc/optabs-tree.h
+++ b/gcc/optabs-tree.h
@@ -29,7 +29,8 @@ enum optab_subtype
 {
   optab_default,
   optab_scalar,
-  optab_vector
+  optab_vector,
+  optab_vector_mixed_sign
 };
 
 /* Return the optab used for computing the given operation on the type given by
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 95ffe397c23e80c105afea52e9d47216bf52f55a..eeb5aeed3202cc6971b6447994bc5311e9c010bb 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -127,7 +127,12 @@ optab_for_tree_code (enum tree_code code, const_tree type,
       return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
 
     case DOT_PROD_EXPR:
-      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
+      {
+	if (subtype == optab_vector_mixed_sign)
+	  return usdot_prod_optab;
+
+	return (TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab);
+      }
 
     case SAD_EXPR:
       return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 62a6bdb4c59bf8263c499245795576199606d372..14d8ad2f33fd75388435fe912380e177f8f3c54b 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   bool sbool = false;
 
   oprnd0 = ops->op0;
+  if (nops >= 2)
+    oprnd1 = ops->op1;
+  if (nops >= 3)
+    oprnd2 = ops->op2;
+
   tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
   if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
       || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
@@ -285,6 +290,27 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
 	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
       sbool = true;
     }
+  else if (ops->code == DOT_PROD_EXPR)
+    {
+      enum optab_subtype subtype = optab_default;
+      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
+      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
+      if (sign1 == sign2)
+	;
+      else if (sign1 == SIGNED && sign2 == UNSIGNED)
+	{
+	  subtype = optab_vector_mixed_sign;
+	  /* Same as optab_vector_mixed_sign but flip the operands.  */
+	  std::swap (op0, op1);
+	}
+      else if (sign1 == UNSIGNED && sign2 == SIGNED)
+	subtype = optab_vector_mixed_sign;
+      else
+	gcc_unreachable ();
+
+      widen_pattern_optab
+	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
+    }
   else
     widen_pattern_optab
       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
@@ -298,10 +324,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   gcc_assert (icode != CODE_FOR_nothing);
 
   if (nops >= 2)
-    {
-      oprnd1 = ops->op1;
-      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
-    }
+    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
   else if (sbool)
     {
       nops = 2;
@@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
     {
       gcc_assert (tmode1 == tmode0);
       gcc_assert (op1);
-      oprnd2 = ops->op2;
       wmode = TYPE_MODE (TREE_TYPE (oprnd2));
     }
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 41ab2598eb6c32c003cbed490796abf25d2ee315..574d355b6b3092cf893f5ab0e8ae0f6d9ffcefbd 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
 OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
 OPTAB_D (udot_prod_optab, "udot_prod$I$a")
+OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
 OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index c73e1cbdda6b9380190b03de66caee48c4e173e3..3750d2881cbb7fd1e71c0eb8c0d4929925fd4152 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -4434,7 +4434,8 @@ verify_gimple_assign_ternary (gassign *stmt)
 		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
 		 || (!INTEGRAL_TYPE_P (lhs_type)
 		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
-	    || !types_compatible_p (rhs1_type, rhs2_type)
+	    /* rhs1_type and rhs2_type may differ in sign.  */
+	    || !tree_nop_conversion_p (rhs1_type, rhs2_type)
 	    || !useless_type_conversion_p (lhs_type, rhs3_type)
 	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
 			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 51a46a6d852fb342278bb9513d013702cff4b868..4e63e84cc70ca60c706c19367ccf256ea3f851b5 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6663,6 +6663,12 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   bool lane_reduc_code_p
     = (code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code == SAD_EXPR);
   int op_type = TREE_CODE_LENGTH (code);
+  enum optab_subtype optab_query_kind = optab_vector;
+  if (code == DOT_PROD_EXPR
+      && TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)))
+	   != TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt))))
+    optab_query_kind = optab_vector_mixed_sign;
+
 
   scalar_dest = gimple_assign_lhs (stmt);
   scalar_type = TREE_TYPE (scalar_dest);
@@ -7190,7 +7196,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       bool ok = true;
 
       /* 4.1. check support for the operation in the loop  */
-      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
+      optab optab = optab_for_tree_code (code, vectype_in, optab_query_kind);
       if (!optab)
 	{
 	  if (dump_enabled_p ())
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index b2e7fc2cc7adad72697b8d76deb0448d0b03e0a8..71533e61c934c63dd05a33c8f7159185e9b11a1b 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -191,9 +191,9 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
 }
 
 /* Return true if the target supports a vector version of CODE,
-   where CODE is known to map to a direct optab.  ITYPE specifies
-   the type of (some of) the scalar inputs and OTYPE specifies the
-   type of the scalar result.
+   where CODE is known to map to a direct optab with the given SUBTYPE.
+   ITYPE specifies the type of (some of) the scalar inputs and OTYPE
+   specifies the type of the scalar result.
 
    If CODE allows the inputs and outputs to have different type
    (such as for WIDEN_SUM_EXPR), it is the input mode rather
@@ -208,7 +208,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
 static bool
 vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
 				 tree itype, tree *vecotype_out,
-				 tree *vecitype_out = NULL)
+				 tree *vecitype_out = NULL,
+				 enum optab_subtype subtype = optab_default)
 {
   tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
   if (!vecitype)
@@ -218,7 +219,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
   if (!vecotype)
     return false;
 
-  optab optab = optab_for_tree_code (code, vecitype, optab_default);
+  optab optab = optab_for_tree_code (code, vecitype, subtype);
   if (!optab)
     return false;
 
@@ -521,6 +522,7 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
   unsigned int precision = MAX (TYPE_PRECISION (*common_type),
 				TYPE_PRECISION (new_type));
   precision *= 2;
+
   if (precision * 2 > TYPE_PRECISION (type))
     return false;
 
@@ -539,6 +541,10 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
    to a type that (a) is narrower than the result of STMT_INFO and
    (b) can hold all leaf operand values.
 
+   If SUBTYPE then allow that the signs of the operands
+   may differ in signs but not in precision.  SUBTYPE is updated to reflect
+   this.
+
    Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
    exists.  */
 
@@ -546,7 +552,8 @@ static unsigned int
 vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		      tree_code widened_code, bool shift_p,
 		      unsigned int max_nops,
-		      vect_unpromoted_value *unprom, tree *common_type)
+		      vect_unpromoted_value *unprom, tree *common_type,
+		      enum optab_subtype *subtype = NULL)
 {
   /* Check for an integer operation with the right code.  */
   gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
@@ -607,7 +614,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		= vinfo->lookup_def (this_unprom->op);
 	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
 					   widened_code, shift_p, max_nops,
-					   this_unprom, common_type);
+					   this_unprom, common_type,
+					   subtype);
 	      if (nops == 0)
 		return 0;
 
@@ -625,7 +633,18 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		*common_type = this_unprom->type;
 	      else if (!vect_joust_widened_type (type, this_unprom->type,
 						 common_type))
-		return 0;
+		{
+		  if (subtype)
+		    {
+		      /* See if we can sign extend the smaller type.  */
+		      if (TYPE_PRECISION (this_unprom->type)
+			  > TYPE_PRECISION (*common_type))
+			*common_type = this_unprom->type;
+		      *subtype = optab_vector_mixed_sign;
+		    }
+		  else
+		    return 0;
+		}
 	    }
 	}
       next_op += nops;
@@ -725,12 +744,22 @@ vect_split_statement (vec_info *vinfo, stmt_vec_info stmt2_info, tree new_rhs,
 
 /* Convert UNPROM to TYPE and return the result, adding new statements
    to STMT_INFO's pattern definition statements if no better way is
-   available.  VECTYPE is the vector form of TYPE.  */
+   available.  VECTYPE is the vector form of TYPE.
+
+   If SUBTYPE then convert the type based on the subtype.  */
 
 static tree
 vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
-		    vect_unpromoted_value *unprom, tree vectype)
+		    vect_unpromoted_value *unprom, tree vectype,
+		    enum optab_subtype subtype = optab_default)
 {
+
+  /* Update the type if the signs differ.  */
+  if (subtype == optab_vector_mixed_sign
+      && TYPE_SIGN (type) != TYPE_SIGN (TREE_TYPE (unprom->op)))
+    type = build_nonstandard_integer_type (TYPE_PRECISION (type),
+					   TYPE_SIGN (unprom->type));
+
   /* Check for a no-op conversion.  */
   if (types_compatible_p (type, TREE_TYPE (unprom->op)))
     return unprom->op;
@@ -806,12 +835,14 @@ vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
 }
 
 /* Invoke vect_convert_input for N elements of UNPROM and store the
-   result in the corresponding elements of RESULT.  */
+   result in the corresponding elements of RESULT.
+
+   If SUBTYPE then convert the type based on the subtype.  */
 
 static void
 vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
 		     tree *result, tree type, vect_unpromoted_value *unprom,
-		     tree vectype)
+		     tree vectype, enum optab_subtype subtype = optab_default)
 {
   for (unsigned int i = 0; i < n; ++i)
     {
@@ -819,11 +850,12 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
       for (j = 0; j < i; ++j)
 	if (unprom[j].op == unprom[i].op)
 	  break;
+
       if (j < i)
 	result[i] = result[j];
       else
 	result[i] = vect_convert_input (vinfo, stmt_info,
-					type, &unprom[i], vectype);
+					type, &unprom[i], vectype, subtype);
     }
 }
 
@@ -895,7 +927,8 @@ vect_reassociating_reduction_p (vec_info *vinfo,
 
    Try to find the following pattern:
 
-     type x_t, y_t;
+     type1a x_t
+     type1b y_t;
      TYPE1 prod;
      TYPE2 sum = init;
    loop:
@@ -908,9 +941,9 @@ vect_reassociating_reduction_p (vec_info *vinfo,
      [S6  prod = (TYPE2) prod;  #optional]
      S7  sum_1 = prod + sum_0;
 
-   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
-   same size of 'TYPE1' or bigger. This is a special case of a reduction
-   computation.
+   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
+   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
+   'type1a' and 'type1b' can differ.
 
    Input:
 
@@ -953,7 +986,8 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
      In which
      - DX is double the size of X
      - DY is double the size of Y
-     - DX, DY, DPROD all have the same type
+     - DX, DY, DPROD all have the same type but the sign
+       between X, Y and DPROD can differ.
      - sum is the same size of DPROD or bigger
      - sum has been recognized as a reduction variable.
 
@@ -991,8 +1025,18 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
   /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
      inside the loop (in case we are analyzing an outer-loop).  */
   vect_unpromoted_value unprom0[2];
+  enum optab_subtype subtype = optab_vector;
   if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
-			     false, 2, unprom0, &half_type))
+			     false, 2, unprom0, &half_type, &subtype))
+    return NULL;
+
+  /* If there are two widening operations, make sure they agree on the sign
+     of the extension.  The result of an optab_vector_mixed_sign operation
+     is signed; otherwise, the result has the same sign as the operands.  */
+  if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
+      && (subtype == optab_vector_mixed_sign
+	? TYPE_UNSIGNED (unprom_mult.type)
+	: TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
     return NULL;
 
   /* If there are two widening operations, make sure they agree on
@@ -1005,13 +1049,13 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
 
   tree half_vectype;
   if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
-					type_out, &half_vectype))
+					type_out, &half_vectype, subtype))
     return NULL;
 
   /* Get the inputs in the appropriate types.  */
   tree mult_oprnd[2];
   vect_convert_inputs (vinfo, stmt_vinfo, 2, mult_oprnd, half_type,
-		       unprom0, half_vectype);
+		       unprom0, half_vectype, subtype);
 
   var = vect_recog_temp_ssa_var (type, NULL);
   pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,


* Re: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
  2021-07-12 12:29                                   ` Tamar Christina
@ 2021-07-12 14:55                                     ` Richard Sandiford
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Sandiford @ 2021-07-12 14:55 UTC (permalink / raw)
  To: Tamar Christina; +Cc: Richard Biener, nd, gcc-patches

Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Monday, July 12, 2021 11:26 AM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: Richard Biener <rguenther@suse.de>; nd <nd@arm.com>; gcc-
>> patches@gcc.gnu.org
>> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
>> where the sign for the multiplicant changes.
>> 
>> Tamar Christina <Tamar.Christina@arm.com> writes:
>> >> -----Original Message-----
>> >> From: Richard Sandiford <richard.sandiford@arm.com>
>> >> Sent: Monday, July 12, 2021 10:39 AM
>> >> To: Tamar Christina <Tamar.Christina@arm.com>
>> >> Cc: Richard Biener <rguenther@suse.de>; nd <nd@arm.com>; gcc-
>> >> patches@gcc.gnu.org
>> >> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
>> >> where the sign for the multiplicant changes.
>> >>
>> >> Tamar Christina <Tamar.Christina@arm.com> writes:
>> >> > Hi,
>> >> >
>> >> >> Richard Sandiford <richard.sandiford@arm.com> writes:
>> >> >> >> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>> >> >> >>    /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
>> >> >> >>       inside the loop (in case we are analyzing an outer-loop).  */
>> >> >> >>    vect_unpromoted_value unprom0[2];
>> >> >> >> +  enum optab_subtype subtype = optab_vector;
>> >> >> >>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
>> >> >> >> -			     false, 2, unprom0, &half_type))
>> >> >> >> +			     false, 2, unprom0, &half_type, &subtype))
>> >> >> >> +    return NULL;
>> >> >> >> +
>> >> >> >> +  if (subtype == optab_vector_mixed_sign
>> >> >> >> +      && TYPE_UNSIGNED (unprom_mult.type)
>> >> >> >> +      && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION (unprom_mult.type))
>> >> >> >>      return NULL;
>> >> >> >
>> >> >> > Isn't the final condition here instead that TYPE1 is narrower than TYPE2?
>> >> >> > I.e. we need to reject the case in which we multiply a signed
>> >> >> > and an unsigned value to get a (logically) signed result, but
>> >> >> > then zero-extend it (rather than sign-extend it) to the
>> >> >> > precision of the addition.
>> >> >> >
>> >> >> > That would make the test:
>> >> >> >
>> >> >> >   if (subtype == optab_vector_mixed_sign
>> >> >> >       && TYPE_UNSIGNED (unprom_mult.type)
>> >> >> >       && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION (type))
>> >> >> >     return NULL;
>> >> >> >
>> >> >> > instead.
>> >> >>
>> >> >> And folding that into the existing test gives:
>> >> >>
>> >> >>   /* If there are two widening operations, make sure they agree on the sign
>> >> >>      of the extension.  The result of an optab_vector_mixed_sign operation
>> >> >>      is signed; otherwise, the result has the same sign as the operands.  */
>> >> >>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
>> >> >>       && (subtype == optab_vector_mixed_sign
>> >> >> 	  ? TYPE_UNSIGNED (unprom_mult.type)
>> >> >> 	  : TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
>> >> >>     return NULL;
>> >> >>
>> >> >
>> >> > I went with the first one which doesn't add the extra constraints
>> >> > for the normal dotproduct as that makes it too restrictive. It's
>> >> > the type of the multiplication that determines the operation so
>> >> > dotproduct can be used a bit more than where we currently do.
>> >> >
>> >> > This was relaxed in an earlier patch.
>> >>
>> >> I didn't mean that we should add extra constraints to the normal case though.
>> >> The existing test I was referring to above was:
>> >>
>> >>   /* If there are two widening operations, make sure they agree on
>> >>      the sign of the extension.  */
>> >>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
>> >>       && TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type))
>> >>     return NULL;
>> >
>> > But as I mentioned, this restriction is unneeded and has been removed
>> > hence why it's not in my patchset's diff.
>> > It's removed by
>> > https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569851.html which
>> > Richi conditioned on the rest of these patches being approved.
>> >
>> > This change needlessly blocks test vect-reduc-dot-[2,3,6,7].c from
>> > being dotproducts for instance
>> >
>> > It's also part of the deficiency between GCC codegen and Clang
>> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88492#c6
>> 
>> Hmm, OK.  Just removing the check regresses:
>> 
>> unsigned long __attribute__ ((noipa))
>> f (signed short *x, signed short *y)
>> {
>>   unsigned long res = 0;
>>   for (int i = 0; i < 100; ++i)
>>     res += (unsigned int) x[i] * (unsigned int) y[i];
>>   return res;
>> }
>> 
>> int
>> main (void)
>> {
>>   signed short x[100], y[100];
>>   for (int i = 0; i < 100; ++i)
>>     {
>>       x[i] = -1;
>>       y[i] = 1;
>>     }
>>   if (f (x, y) != 0x6400000000ULL - 100)
>>     __builtin_abort ();
>>   return 0;
>> }
>> 
>> on SVE.  We then use SDOT even though the result of the multiplication is
>> zero- rather than sign-extended to 64 bits.  Does something else in the series
>> stop that from happening?
>
> No, and I hadn't noticed it before because it looks like the mid-end execution tests don't turn on dot-product for arm targets :/

Yeah, I was surprised I needed SVE to get an SDOT above, but didn't look
into why…
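
For reference, the expected value in the test above can be checked with a
scalar model.  The sketch below is only an illustration: it assumes a 32-bit
int, uses fixed-width types in place of unsigned long, and the helper names
are hypothetical.  The first function follows the source loop (32-bit
unsigned product, zero-extended to 64 bits); the second models what a signed
dot product on 16-bit inputs would accumulate instead.

```c
#include <stdint.h>

/* What the source loop asks for: the product is computed as a 32-bit
   unsigned value and is then zero-extended to the 64-bit accumulator.  */
static uint64_t
sum_zero_extended (const int16_t *x, const int16_t *y, int n)
{
  uint64_t res = 0;
  for (int i = 0; i < n; ++i)
    res += (uint32_t) x[i] * (uint32_t) y[i];
  return res;
}

/* Assumed model of a signed dot product: each input is sign-extended,
   so the product is effectively sign-extended to 64 bits.  */
static uint64_t
sum_sign_extended (const int16_t *x, const int16_t *y, int n)
{
  uint64_t res = 0;
  for (int i = 0; i < n; ++i)
    res += (uint64_t) ((int64_t) x[i] * (int64_t) y[i]);
  return res;
}
```

With x[i] = -1, y[i] = 1 and n = 100, the first function returns
0x6400000000ULL - 100 (that is, 100 * 0xffffffff), while the second returns
(uint64_t) -100, which is why the pattern check has to reject this case.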

> I'll look at it separately; for now I've added the check back in.
>
> Ok for trunk now?

Reviewing the full patch this time: I have a couple of nits about
the documentation, but otherwise it LGTM.

> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5446,13 +5446,55 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
>  
>  @cindex @code{sdot_prod@var{m}} instruction pattern
>  @item @samp{sdot_prod@var{m}}
> +
> +Compute the sum of the products of two signed elements.
> +Operand 1 and operand 2 are of the same mode. Their
> +product, which is of a wider mode, is computed and added to operand 3.
> +Operand 3 is of a mode equal or wider than the mode of the product. The
> +result is placed in operand 0, which is of the same mode as operand 3.
> +
> +Semantically the expressions perform the multiplication in the following signs
> +
> +@smallexample
> +sdot<signed c, signed a, signed b> ==
> +   res = sign-ext (a) * sign-ext (b) + c
> +@dots{}
> +@end smallexample

I think putting signed c first in the argument list might be confusing,
since like you say, it corresponds to operand 3 rather than operand 1.
How about calling them op0, op1, op2 and op3 instead of res, a, b and c,
and listing them in that order?

Same for udot_prod.

(Someone who doesn't know the AArch64 instructions might wonder how
the elements of op1 and op2 correspond to elements of op0 and op3.
That's a pre-existing problem though, so no need to fix it here.)

>  @cindex @code{udot_prod@var{m}} instruction pattern
> -@itemx @samp{udot_prod@var{m}}
> -Compute the sum of the products of two signed/unsigned elements.
> -Operand 1 and operand 2 are of the same mode. Their product, which is of a
> -wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
> -wider than the mode of the product. The result is placed in operand 0, which
> -is of the same mode as operand 3.
> +@item @samp{udot_prod@var{m}}
> +
> +Compute the sum of the products of two unsigned elements.
> +Operand 1 and operand 2 are of the same mode. Their
> +product, which is of a wider mode, is computed and added to operand 3.
> +Operand 3 is of a mode equal or wider than the mode of the product. The
> +result is placed in operand 0, which is of the same mode as operand 3.
> +
> +Semantically the expressions perform the multiplication in the following signs
> +
> +@smallexample
> +udot<unsigned c, unsigned a, unsigned b> ==
> +   res = zero-ext (a) * zero-ext (b) + c
> +@dots{}
> +@end smallexample
> +
> +
> +

Should just be one blank line here.

> +@cindex @code{usdot_prod@var{m}} instruction pattern
> +@item @samp{usdot_prod@var{m}}
> +Compute the sum of the products of elements of different signs.
> +Operand 1 must be unsigned and operand 2 signed. Their
> +product, which is of a wider mode, is computed and added to operand 3.
> +Operand 3 is of a mode equal or wider than the mode of the product. The
> +result is placed in operand 0, which is of the same mode as operand 3.
> +
> +Semantically the expressions perform the multiplication in the following signs
> +
> +@smallexample
> +usdot<unsigned c, unsigned a, signed b> ==
> +   res = ((unsigned-conv) sign-ext (a)) * zero-ext (b) + c

It looks like the extensions are the wrong way around.  I think it should be:

usdot<signed c, unsigned a, signed b> ==
   res = ((signed-conv) zero-ext (a)) * sign-ext (b) + c

(before the changes to put c last and use the opN names).

I.e. the unsigned operand is zero-extended and the signed operand is
sign extended.  I think it's easier to understand if we treat the
multiplication and c as signed, since in that case we don't reinterpret
any negative signed value (of b) as an unsigned value.  (Both choices
make sense for “a”, since the zero-ext(a) fits into both a signed wider
int and an unsigned wider int.)
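
As a rough illustration of the semantics described above, a scalar model of
one accumulator lane might look like the sketch below.  It assumes four 8-bit
elements per 32-bit lane (as in the AArch64 USDOT instruction); the optab
documentation itself does not spell out the grouping, and the function name
is made up for this example.

```c
#include <stdint.h>

/* Hypothetical scalar model of one usdot_prod lane.  op1 is unsigned and
   is zero-extended; op2 is signed and is sign-extended.  The
   multiplication and the accumulator op3 are treated as signed, so a
   negative element of op2 is never reinterpreted as unsigned.  */
static int32_t
usdot_lane_ref (int32_t op3, const uint8_t op1[4], const int8_t op2[4])
{
  int32_t res = op3;
  for (int i = 0; i < 4; ++i)
    res += (int32_t) op1[i] * (int32_t) op2[i];
  return res;
}
```

For example, with op3 = 10, every op1 element equal to 255 and every op2
element equal to -1, the model returns 10 - 4 * 255 = -1010.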

OK with those changes, and thanks for your patience through the slow reviews.

Richard


end of thread, other threads:[~2021-07-12 14:55 UTC | newest]

Thread overview: 35+ messages
2021-05-05 17:38 [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes Tamar Christina
2021-05-05 17:38 ` [PATCH 2/4]AArch64: Add support for sign differing dot-product usdot for NEON and SVE Tamar Christina
2021-05-10 16:49   ` Richard Sandiford
2021-05-25 14:57     ` Tamar Christina
2021-05-26  8:50       ` Richard Sandiford
2021-05-05 17:39 ` [PATCH 3/4][AArch32]: Add support for sign differing dot-product usdot for NEON Tamar Christina
2021-05-05 17:42   ` FW: " Tamar Christina
     [not found]     ` <VI1PR08MB5325B832EE3BB6139886C0E9FF259@VI1PR08MB5325.eurprd08.prod.outlook.com>
2021-05-25 15:02       ` Tamar Christina
2021-05-26 10:45         ` Kyrylo Tkachov
2021-05-06  9:23   ` Christophe Lyon
2021-05-06  9:27     ` Tamar Christina
2021-05-05 17:39 ` [PATCH 4/4]middle-end: Add tests middle end generic tests for sign differing dotproduct Tamar Christina
     [not found]   ` <VI1PR08MB532511701573C18A33AC6291FF259@VI1PR08MB5325.eurprd08.prod.outlook.com>
2021-05-25 15:01     ` FW: " Tamar Christina
     [not found]     ` <11s2181-8856-30rq-26or-84q8o7qrr2o@fhfr.qr>
2021-05-26  8:48       ` Tamar Christina
2021-06-14 12:08       ` Tamar Christina
2021-05-07 11:45 ` [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes Richard Biener
2021-05-07 12:42   ` Tamar Christina
2021-05-10 11:39     ` Richard Biener
2021-05-10 12:58       ` Tamar Christina
2021-05-10 13:29         ` Richard Biener
2021-05-25 14:57           ` Tamar Christina
2021-05-26  8:56             ` Richard Biener
2021-06-02  9:28               ` Tamar Christina
2021-06-04 10:12                 ` Tamar Christina
2021-06-07 10:10                   ` Richard Sandiford
2021-06-14 12:06                     ` Tamar Christina
2021-06-21  8:11                       ` Tamar Christina
2021-06-22 10:56                       ` Richard Sandiford
2021-06-22 11:16                         ` Richard Sandiford
2021-07-12  9:18                           ` Tamar Christina
2021-07-12  9:39                             ` Richard Sandiford
2021-07-12  9:56                               ` Tamar Christina
2021-07-12 10:25                                 ` Richard Sandiford
2021-07-12 12:29                                   ` Tamar Christina
2021-07-12 14:55                                     ` Richard Sandiford
