[PATCH V2 00/10] optabs: Make all `*dot_prod_optab's modeled as conversions

public inbox for gcc-patches@gcc.gnu.org

* [PATCH V2 00/10] optabs: Make all `*dot_prod_optab's modeled as conversions
@ 2024-08-13 12:41 Victor Do Nascimento
  2024-08-13 12:41 ` [PATCH V2 01/10] " Victor Do Nascimento
                   ` (9 more replies)
  0 siblings, 10 replies; 18+ messages in thread
From: Victor Do Nascimento @ 2024-08-13 12:41 UTC
  To: gcc-patches
  Cc: Tamar.Christina, claziss, hongtao.liu, syq, bernds_cb1, aldyh,
	Victor Do Nascimento

Changes in this revision:

* Remove features that were classified as feature creep (Gimple folding and
rewriting the aarch64/arm dotprod builtin initialization routines).
These will be submitted separately later.
* Add the second mode to an arm-backend pattern that the original missed.
* Add implementation for internal_fn in `directly_supported_p' for
convert optabs.
* Reuse existing iterators in the i386 backend.
* Improve ChangeLog entries involving renaming of back-end patterns.
* Improve tests, including new test with run-time checks to verify
correctness.

-----

Given the specification in the GCC internals manual defines the
{u|s}dot_prod<m> standard name as taking "two signed elements of the
same mode, adding them to a third operand of wider mode", there is
currently ambiguity in the relationship between the mode of the first
two arguments and that of the third.

This vagueness means that, in theory, different modes may be
supportable in the third argument.  This flexibility would allow for a
given backend to add to the accumulator a different number of
vectorized products, e.g. a backend may provide instructions for both:

  accum += a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

and

  accum += a[0] * b[0] + a[1] * b[1],

as is now seen in the SVE2.1 extension to AArch64.  In spite of the
aforementioned flexibility, modeling the dot-product operation as a
direct optab means that we have no way to encode both input and the
accumulator data modes into the backend pattern name, which prevents
us from harnessing this flexibility.
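
To make the two shapes concrete, here is a minimal scalar sketch in C of
what one accumulator lane does in each case.  The helper names
`dot4_lane' and `dot2_lane' are made up for illustration and are not GCC
or ACLE identifiers.  The element widths are an assumption: 16-bit
inputs in both cases, with a 64-bit accumulator for the four-way form
and a 32-bit accumulator for the two-way form, i.e. the same input mode
paired with two different output modes.

```c
#include <stdint.h>

/* Four-way form: four 16-bit products folded into one 64-bit lane.
   Hypothetical helper, for illustration only.  */
int64_t
dot4_lane (int64_t accum, const int16_t a[4], const int16_t b[4])
{
  for (int i = 0; i < 4; i++)
    accum += (int64_t) a[i] * (int64_t) b[i];
  return accum;
}

/* Two-way form: two 16-bit products folded into one 32-bit lane.
   Hypothetical helper, for illustration only.  */
int32_t
dot2_lane (int32_t accum, const int16_t a[2], const int16_t b[2])
{
  for (int i = 0; i < 2; i++)
    accum += (int32_t) a[i] * (int32_t) b[i];
  return accum;
}
```

Both helpers read the same input element type, so a pattern name that
only records the input mode cannot say which of the two is meant.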

This patch series therefore remedies that shortcoming by moving
`dot_prod' from a direct optab to a conversion optab, which lets us
differentiate between dot products that take the same input mode but
produce a different output mode.

Regression-tested on x86_64, aarch64 and armhf.  I'd appreciate help
running relevant tests on the remaining architectures, i.e. arc, mips,
altivec and c6x to ensure I've not inadvertently broken anything for
those back-ends.

Victor Do Nascimento (10):
  optabs: Make all `*dot_prod_optab's modeled as conversions
  autovectorizer: Add basic support for convert optabs
  aarch64: Fix aarch64 backend-use of (u|s|us)dot_prod patterns
  arm: Fix arm backend-use of (u|s|us)dot_prod patterns
  i386: Fix dot_prod backend patterns for mmx and sse targets
  arc: Adjust dot-product backend patterns
  mips:  Adjust dot-product backend patterns
  rs6000: Adjust altivec dot-product backend patterns
  c6x:  Adjust dot-product backend patterns
  autovectorizer: Test autovectorization of different dot-prod modes.

 gcc/config/aarch64/aarch64-builtins.cc        |  7 ++
 gcc/config/aarch64/aarch64-simd-builtins.def  |  6 +-
 gcc/config/aarch64/aarch64-simd.md            |  9 +-
 .../aarch64/aarch64-sve-builtins-base.cc      | 13 +--
 gcc/config/aarch64/aarch64-sve-builtins.cc    | 17 ++++
 gcc/config/aarch64/aarch64-sve-builtins.h     |  3 +
 gcc/config/aarch64/aarch64-sve.md             |  6 +-
 gcc/config/aarch64/aarch64-sve2.md            |  2 +-
 gcc/config/arc/simdext.md                     |  8 +-
 gcc/config/arm/arm-builtins.cc                | 95 +++++++++++++++++++
 gcc/config/arm/arm-protos.h                   |  3 +
 gcc/config/arm/arm.cc                         |  1 +
 gcc/config/arm/arm_neon_builtins.def          |  3 -
 gcc/config/arm/neon.md                        |  6 +-
 gcc/config/c6x/c6x.md                         |  2 +-
 gcc/config/i386/mmx.md                        | 30 +++---
 gcc/config/i386/sse.md                        | 38 ++++----
 gcc/config/mips/loongson-mmi.md               |  2 +-
 gcc/config/rs6000/altivec.md                  |  4 +-
 gcc/doc/md.texi                               | 46 ++++-----
 gcc/gimple-match-exports.cc                   | 23 +++++
 gcc/gimple-match.h                            |  2 +
 gcc/optabs.cc                                 |  3 +-
 gcc/optabs.def                                |  6 +-
 .../gcc.dg/vect/vect-dotprod-twoway.c         | 39 ++++++++
 .../aarch64/sme/vect-dotprod-twoway.c         | 25 +++++
 .../gcc.target/aarch64/vect-dotprod-twoway.c  | 65 +++++++++++++
 gcc/testsuite/lib/target-supports.exp         |  8 ++
 gcc/tree-vect-loop.cc                         |  1 +
 gcc/tree-vect-patterns.cc                     | 43 ++++++++-
 30 files changed, 420 insertions(+), 96 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-dotprod-twoway.c

-- 
2.34.1


* [PATCH V2 01/10] optabs: Make all `*dot_prod_optab's modeled as conversions
  2024-08-13 12:41 [PATCH V2 00/10] optabs: Make all `*dot_prod_optab's modeled as conversions Victor Do Nascimento
@ 2024-08-13 12:41 ` Victor Do Nascimento
  2024-08-15  8:11   ` Richard Sandiford
  2024-08-13 12:41 ` [PATCH V2 02/10] autovectorizer: Add basic support for convert optabs Victor Do Nascimento
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 18+ messages in thread
From: Victor Do Nascimento @ 2024-08-13 12:41 UTC
  To: gcc-patches
  Cc: Tamar.Christina, claziss, hongtao.liu, syq, bernds_cb1, aldyh,
	Victor Do Nascimento

Given the specification in the GCC internals manual defines the
{u|s}dot_prod<m> standard name as taking "two signed elements of the
same mode, adding them to a third operand of wider mode", there is
currently ambiguity in the relationship between the mode of the first
two arguments and that of the third.

This vagueness means that, in theory, different modes may be
supportable in the third argument.  This flexibility would allow for a
given backend to add to the accumulator a different number of
vectorized products, e.g. a backend may provide instructions for both:

  accum += a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

and

  accum += a[0] * b[0] + a[1] * b[1],

as is now seen in the SVE2.1 extension to AArch64.  In spite of the
aforementioned flexibility, modeling the dot-product operation as a
direct optab means that we have no way to encode both input and the
accumulator data modes into the backend pattern name, which prevents
us from harnessing this flexibility.

We therefore make all dot_prod optabs conversions, allowing, for
example, for the encoding of both 2-way and 4-way dot product backend
patterns.
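
As a rough illustration of the resulting naming scheme, the sketch
below builds a backend pattern name from a base name, an output mode
and an input mode, in that order.  The ordering matches names used
later in this series, such as `sdot_prodv4siv16qi'.  The helper
`dot_prod_pattern_name' is hypothetical and does not exist in GCC, which
generates these names from optabs.def at build time.

```c
#include <stddef.h>
#include <stdio.h>

/* Write "<base><out_mode><in_mode>" into BUF, which holds LEN bytes.
   Return the name length, or -1 if the name did not fit.
   Hypothetical helper, for illustration only.  */
int
dot_prod_pattern_name (char *buf, size_t len, const char *base,
                       const char *out_mode, const char *in_mode)
{
  int n = snprintf (buf, len, "%s%s%s", base, out_mode, in_mode);
  return (n < 0 || (size_t) n >= len) ? -1 : n;
}
```

Calling it with "sdot_prod", "v4si" and "v16qi" yields the four-way
name, and a two-way variant only needs a different mode pair, which is
the distinction a single-mode name could not express.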

gcc/ChangeLog:

	* optabs.def (sdot_prod_optab): Convert from OPTAB_D to
	OPTAB_CD.
	(udot_prod_optab): Likewise.
	(usdot_prod_optab): Likewise.
	* doc/md.texi (Standard Names): Update entries for u, s and us
	dot_prod names.
---
 gcc/doc/md.texi | 46 +++++++++++++++++++++-------------------------
 gcc/optabs.def  |  6 +++---
 2 files changed, 24 insertions(+), 28 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 5dc0d55edd6..aa1181a3320 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5760,15 +5760,14 @@ for (i = 0; i < LEN + BIAS; i++)
     operand0 += operand2[i];
 @end smallexample
 
-@cindex @code{sdot_prod@var{m}} instruction pattern
-@item @samp{sdot_prod@var{m}}
-
-Compute the sum of the products of two signed elements.
-Operand 1 and operand 2 are of the same mode. Their
-product, which is of a wider mode, is computed and added to operand 3.
-Operand 3 is of a mode equal or wider than the mode of the product. The
-result is placed in operand 0, which is of the same mode as operand 3.
-@var{m} is the mode of operand 1 and operand 2.
+@cindex @code{sdot_prod@var{m}@var{n}} instruction pattern
+@item @samp{sdot_prod@var{m}@var{n}}
+
+Multiply operand 1 by operand 2 without loss of precision, given that
+both operands contain signed elements.  Add each product to the overlapping
+element of operand 3 and store the result in operand 0.  Operands 0 and 3
+have mode @var{m} and operands 1 and 2 have mode @var{n}, with @var{n}
+having narrower elements than @var{m}.
 
 Semantically the expressions perform the multiplication in the following signs
 
@@ -5778,15 +5777,14 @@ sdot<signed op0, signed op1, signed op2, signed op3> ==
 @dots{}
 @end smallexample
 
-@cindex @code{udot_prod@var{m}} instruction pattern
-@item @samp{udot_prod@var{m}}
+@cindex @code{udot_prod@var{m}@var{n}} instruction pattern
+@item @samp{udot_prod@var{m}@var{n}}
 
-Compute the sum of the products of two unsigned elements.
-Operand 1 and operand 2 are of the same mode. Their
-product, which is of a wider mode, is computed and added to operand 3.
-Operand 3 is of a mode equal or wider than the mode of the product. The
-result is placed in operand 0, which is of the same mode as operand 3.
-@var{m} is the mode of operand 1 and operand 2.
+Multiply operand 1 by operand 2 without loss of precision, given that
+both operands contain unsigned elements.  Add each product to the overlapping
+element of operand 3 and store the result in operand 0.  Operands 0 and 3
+have mode @var{m} and operands 1 and 2 have mode @var{n}, with @var{n}
+having narrower elements than @var{m}.
 
 Semantically the expressions perform the multiplication in the following signs
 
@@ -5796,14 +5794,12 @@ udot<unsigned op0, unsigned op1, unsigned op2, unsigned op3> ==
 @dots{}
 @end smallexample
 
-@cindex @code{usdot_prod@var{m}} instruction pattern
-@item @samp{usdot_prod@var{m}}
-Compute the sum of the products of elements of different signs.
-Operand 1 must be unsigned and operand 2 signed. Their
-product, which is of a wider mode, is computed and added to operand 3.
-Operand 3 is of a mode equal or wider than the mode of the product. The
-result is placed in operand 0, which is of the same mode as operand 3.
-@var{m} is the mode of operand 1 and operand 2.
+@cindex @code{usdot_prod@var{m}@var{n}} instruction pattern
+@item @samp{usdot_prod@var{m}@var{n}}
+Multiply operand 1 by operand 2.  Add each product to the overlapping
+element of operand 3 and store the result in operand 0.  Operands 0 and 3
+have mode @var{m} and operands 1 and 2 have mode @var{n}, with @var{n}
+having narrower elements than @var{m}.
 
 Semantically the expressions perform the multiplication in the following signs
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 58a939442bd..ba860144d8b 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -110,6 +110,9 @@ OPTAB_CD(mask_scatter_store_optab, "mask_scatter_store$a$b")
 OPTAB_CD(mask_len_scatter_store_optab, "mask_len_scatter_store$a$b")
 OPTAB_CD(vec_extract_optab, "vec_extract$a$b")
 OPTAB_CD(vec_init_optab, "vec_init$a$b")
+OPTAB_CD (sdot_prod_optab, "sdot_prod$I$a$b")
+OPTAB_CD (udot_prod_optab, "udot_prod$I$a$b")
+OPTAB_CD (usdot_prod_optab, "usdot_prod$I$a$b")
 
 OPTAB_CD (while_ult_optab, "while_ult$a$b")
 
@@ -413,10 +416,7 @@ OPTAB_D (savg_floor_optab, "avg$a3_floor")
 OPTAB_D (uavg_floor_optab, "uavg$a3_floor")
 OPTAB_D (savg_ceil_optab, "avg$a3_ceil")
 OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
-OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
-OPTAB_D (udot_prod_optab, "udot_prod$I$a")
-OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
 OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
-- 
2.34.1



* [PATCH V2 02/10] autovectorizer: Add basic support for convert optabs
  2024-08-13 12:41 [PATCH V2 00/10] optabs: Make all `*dot_prod_optab's modeled as conversions Victor Do Nascimento
  2024-08-13 12:41 ` [PATCH V2 01/10] " Victor Do Nascimento
@ 2024-08-13 12:41 ` Victor Do Nascimento
  2024-08-14 12:24   ` Tamar Christina
  2024-08-13 12:41 ` [PATCH V2 03/10] aarch64: Fix aarch64 backend-use of (u|s|us)dot_prod patterns Victor Do Nascimento
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 18+ messages in thread
From: Victor Do Nascimento @ 2024-08-13 12:41 UTC
  To: gcc-patches
  Cc: Tamar.Christina, claziss, hongtao.liu, syq, bernds_cb1, aldyh,
	Victor Do Nascimento

Given the shift from modeling dot products as direct optabs to
treating them as conversion optabs, we make necessary changes to the
autovectorizer code to ensure that given the relevant tree code,
together with the input and output data modes, we can retrieve the
relevant optab and subsequently the insn_code for it.
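
The lookup described above can be pictured with the toy model below.
It is only a sketch: the `toy_' enums, the table and its three entries
are invented for illustration and stand in for GCC's machine_mode,
insn_code and the generated handler tables.  The point is that the
query is keyed on both the output and the input mode, and that a
missing pair reports "nothing" rather than falling back to another
entry.

```c
#include <stddef.h>

/* Toy stand-ins for machine_mode and insn_code (hypothetical).  */
enum toy_mode { TOY_V16QI, TOY_V8HI, TOY_V4SI, TOY_V2DI };
enum toy_icode
{
  TOY_CODE_FOR_nothing,
  TOY_CODE_FOR_sdot_prodv4siv16qi,
  TOY_CODE_FOR_sdot_prodv4siv8hi,
  TOY_CODE_FOR_sdot_prodv2div8hi
};

struct toy_entry
{
  enum toy_mode to_mode;
  enum toy_mode from_mode;
  enum toy_icode icode;
};

/* Invented table: which (output, input) pairs the toy target supports.  */
static const struct toy_entry toy_sdot_prod_table[] = {
  { TOY_V4SI, TOY_V16QI, TOY_CODE_FOR_sdot_prodv4siv16qi },
  { TOY_V4SI, TOY_V8HI, TOY_CODE_FOR_sdot_prodv4siv8hi },
  { TOY_V2DI, TOY_V8HI, TOY_CODE_FOR_sdot_prodv2div8hi },
};

/* Return the instruction code for the (TO_MODE, FROM_MODE) pair, or
   TOY_CODE_FOR_nothing if the toy target has no such pattern.  */
enum toy_icode
toy_convert_handler (enum toy_mode to_mode, enum toy_mode from_mode)
{
  size_t n = sizeof toy_sdot_prod_table / sizeof toy_sdot_prod_table[0];
  for (size_t i = 0; i < n; i++)
    if (toy_sdot_prod_table[i].to_mode == to_mode
        && toy_sdot_prod_table[i].from_mode == from_mode)
      return toy_sdot_prod_table[i].icode;
  return TOY_CODE_FOR_nothing;
}
```

In this model the V8HI input maps to two different instructions
depending on the requested output mode, which is the case a lookup
keyed on the input mode alone cannot represent.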

gcc/ChangeLog:

	* gimple-match-exports.cc (directly_supported_p): Add overload
	for conversion-type optabs.
	* gimple-match.h (directly_supported_p): Add new function
	prototype.
	* optabs.cc (expand_widen_pattern_expr): Make the
	DOT_PROD_EXPR tree code use `find_widening_optab_handler' to
	retrieve icode.
	* tree-vect-loop.cc (vect_is_emulated_mixed_dot_prod): Make it
	call conversion-type overloaded `directly_supported_p'.
	* tree-vect-patterns.cc (vect_supportable_conv_optab_p): New.
	(vect_recog_dot_prod_pattern): s/direct/conv/ in call to
	`vect_supportable_direct_optab_p'.
---
 gcc/gimple-match-exports.cc | 23 ++++++++++++++++++++
 gcc/gimple-match.h          |  2 ++
 gcc/optabs.cc               |  3 ++-
 gcc/tree-vect-loop.cc       |  1 +
 gcc/tree-vect-patterns.cc   | 43 +++++++++++++++++++++++++++++++++++--
 5 files changed, 69 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
index aacf3ff0414..d18497e7c83 100644
--- a/gcc/gimple-match-exports.cc
+++ b/gcc/gimple-match-exports.cc
@@ -1381,6 +1381,29 @@ directly_supported_p (code_helper code, tree type, optab_subtype query_type)
 	  && direct_internal_fn_supported_p (ifn, type, OPTIMIZE_FOR_SPEED));
 }
 
+/* As above, overloading the function for conversion-type optabs.  */
+bool
+directly_supported_p (code_helper code, tree type_out, tree type_in,
+		      optab_subtype query_type)
+{
+  if (code.is_tree_code ())
+    {
+      convert_optab optab = optab_for_tree_code (tree_code (code), type_in,
+						query_type);
+      return (optab != unknown_optab
+	      && convert_optab_handler (optab, TYPE_MODE (type_out),
+					TYPE_MODE (type_in)) != CODE_FOR_nothing);
+    }
+  gcc_assert (query_type == optab_default
+	      || (query_type == optab_vector && VECTOR_TYPE_P (type_in))
+	      || (query_type == optab_scalar && !VECTOR_TYPE_P (type_in)));
+  internal_fn ifn = associated_internal_fn (combined_fn (code), type_in);
+  return (direct_internal_fn_p (ifn)
+	  && direct_internal_fn_supported_p (ifn, tree_pair (type_out, type_in),
+					     OPTIMIZE_FOR_SPEED));
+}
+
+
 /* A wrapper around the internal-fn.cc versions of get_conditional_internal_fn
    for a code_helper CODE operating on type TYPE.  */
 
diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
index d710fcbace2..0333a5db00a 100644
--- a/gcc/gimple-match.h
+++ b/gcc/gimple-match.h
@@ -419,6 +419,8 @@ code_helper canonicalize_code (code_helper, tree);
 
 #ifdef GCC_OPTABS_TREE_H
 bool directly_supported_p (code_helper, tree, optab_subtype = optab_default);
+bool directly_supported_p (code_helper, tree, tree,
+			   optab_subtype = optab_default);
 #endif
 
 internal_fn get_conditional_internal_fn (code_helper, tree);
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index 185c5b1a705..32737fb80e8 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -317,7 +317,8 @@ expand_widen_pattern_expr (const_sepops ops, rtx op0, rtx op1, rtx wide_op,
     widen_pattern_optab
       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
   if (ops->code == WIDEN_MULT_PLUS_EXPR
-      || ops->code == WIDEN_MULT_MINUS_EXPR)
+      || ops->code == WIDEN_MULT_MINUS_EXPR
+      || ops->code == DOT_PROD_EXPR)
     icode = find_widening_optab_handler (widen_pattern_optab,
 					 TYPE_MODE (TREE_TYPE (ops->op2)),
 					 tmode0);
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 6456220cdc9..5f3de7b72a8 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5289,6 +5289,7 @@ vect_is_emulated_mixed_dot_prod (stmt_vec_info stmt_info)
 
   gcc_assert (STMT_VINFO_REDUC_VECTYPE_IN (stmt_info));
   return !directly_supported_p (DOT_PROD_EXPR,
+				STMT_VINFO_VECTYPE (stmt_info),
 				STMT_VINFO_REDUC_VECTYPE_IN (stmt_info),
 				optab_vector_mixed_sign);
 }
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index f52de2b6972..3afedc9199b 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -250,6 +250,45 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
   return true;
 }
 
+/* Return true if the target supports a vector version of CODE,
+   where CODE is known to map to a conversion optab with the given SUBTYPE.
+   ITYPE specifies the type of (some of) the scalar inputs and OTYPE
+   specifies the type of the scalar result.
+
+   When returning true, set *VECOTYPE_OUT to the vector version of OTYPE.
+   Also set *VECITYPE_OUT to the vector version of ITYPE if VECITYPE_OUT
+   is nonnull.  */
+
+static bool
+vect_supportable_conv_optab_p (vec_info *vinfo, tree otype, tree_code code,
+				 tree itype, tree *vecotype_out,
+				 tree *vecitype_out = NULL,
+				 enum optab_subtype subtype = optab_default)
+{
+  tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
+  tree vecotype = get_vectype_for_scalar_type (vinfo, otype);
+  if (!vecitype || !vecotype)
+    return false;
+
+  optab optab = optab_for_tree_code (code, vecitype, subtype);
+  if (!optab)
+    return false;
+
+  insn_code icode = convert_optab_handler (optab, TYPE_MODE (vecotype),
+					   TYPE_MODE (vecitype));
+
+  if (icode == CODE_FOR_nothing
+      || insn_data[icode].operand[0].mode != TYPE_MODE (vecotype)
+      || insn_data[icode].operand[1].mode != TYPE_MODE (vecitype))
+    return false;
+
+  *vecotype_out = vecotype;
+  if (vecitype_out)
+    *vecitype_out = vecitype;
+  return true;
+}
+
+
 /* Round bit precision PRECISION up to a full element.  */
 
 static unsigned int
@@ -1270,13 +1309,13 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
     half_type = signed_type_for (half_type);
 
   tree half_vectype;
-  if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
+  if (!vect_supportable_conv_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
 					type_out, &half_vectype, subtype))
     {
       /* We can emulate a mixed-sign dot-product using a sequence of
 	 signed dot-products; see vect_emulate_mixed_dot_prod for details.  */
       if (subtype != optab_vector_mixed_sign
-	  || !vect_supportable_direct_optab_p (vinfo, signed_type_for (type),
+	  || !vect_supportable_conv_optab_p (vinfo, signed_type_for (type),
 					       DOT_PROD_EXPR, half_type,
 					       type_out, &half_vectype,
 					       optab_vector))
-- 
2.34.1



* [PATCH V2 03/10] aarch64: Fix aarch64 backend-use of (u|s|us)dot_prod patterns
  2024-08-13 12:41 [PATCH V2 00/10] optabs: Make all `*dot_prod_optab's modeled as conversions Victor Do Nascimento
  2024-08-13 12:41 ` [PATCH V2 01/10] " Victor Do Nascimento
  2024-08-13 12:41 ` [PATCH V2 02/10] autovectorizer: Add basic support for convert optabs Victor Do Nascimento
@ 2024-08-13 12:41 ` Victor Do Nascimento
  2024-08-15  8:26   ` Richard Sandiford
  2024-08-13 12:41 ` [PATCH V2 04/10] arm: Fix arm " Victor Do Nascimento
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 18+ messages in thread
From: Victor Do Nascimento @ 2024-08-13 12:41 UTC
  To: gcc-patches
  Cc: Tamar.Christina, claziss, hongtao.liu, syq, bernds_cb1, aldyh,
	Victor Do Nascimento

Given recent changes to the dot_prod standard pattern name, this patch
fixes the aarch64 back-end by implementing the following changes:

1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files.
2. Rewrite initialization and function expansion mechanism for simd
builtins.
3. Fix all direct calls to back-end `dot_prod' patterns in SVE
builtins.

Finally, given that it is now possible for the compiler to
differentiate between the two- and four-way dot product, we add a test
to ensure that autovectorization picks up on dot-product patterns
where the result is twice the width of the operands.
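
For reference, the loop shape such a test exercises can be written in
scalar C as below.  This is a sketch rather than the test itself:
`udot2_ref' and `udot2_pairs' are illustrative names, and the explicit
casts to uint32_t are an addition made here so that the arithmetic stays
unsigned and wraps in a well-defined way.  The second function groups
the products in pairs, mimicking how a two-way dot product would feed a
32-bit accumulator from 16-bit inputs.

```c
#include <stdint.h>

/* One product per iteration, as in a plain source loop.  */
uint32_t
udot2_ref (int n, const uint16_t *data)
{
  uint32_t sum = 0;
  for (int i = 0; i < n; i++)
    sum += (uint32_t) data[i] * (uint32_t) data[i];
  return sum;
}

/* Two products per accumulation step; a scalar tail handles an odd
   element count.  */
uint32_t
udot2_pairs (int n, const uint16_t *data)
{
  uint32_t sum = 0;
  int i = 0;
  for (; i + 1 < n; i += 2)
    sum += (uint32_t) data[i] * (uint32_t) data[i]
           + (uint32_t) data[i + 1] * (uint32_t) data[i + 1];
  if (i < n)
    sum += (uint32_t) data[i] * (uint32_t) data[i];
  return sum;
}
```

Because unsigned addition wraps modulo 2^32, both functions return the
same value for any input, so comparing them is one simple way to write a
run-time check.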

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md
	(<sur>dot_prod<vsi2qi><vczle><vczbe>): Renamed to...
	(<sur>dot_prod<mode><vsi2qi><vczle><vczbe>): ...this.
	(usdot_prod<vsi2qi><vczle><vczbe>): Renamed to...
	(usdot_prod<mode><vsi2qi><vczle><vczbe>): ...this.
	(<su>sadv16qi): Adjust call to gen_udot_prod to take second mode.
	(popcount<mode>2): Fix use of `udot_prod_optab'.
	* config/aarch64/aarch64-sve.md
	(<sur>dot_prod<vsi2qi>): Renamed to...
	(<sur>dot_prod<mode><vsi2qi>): ...this.
	(@<sur>dot_prod<vsi2qi>): Renamed to...
	(@<sur>dot_prod<mode><vsi2qi>): ...this.
	(<su>sad<vsi2qi>): Adjust call to gen_udot_prod to take second mode.
	* config/aarch64/aarch64-sve2.md
	(@aarch64_sve_<sur>dotvnx4sivnx8hi): Renamed to...
	(<sur>dot_prodvnx4sivnx8hi): ...this.
	* config/aarch64/aarch64-simd-builtins.def: Modify macro
	expansion-based initialization and expansion
	of (u|s|us)dot_prod builtins.
	* config/aarch64/aarch64-sve-builtins-base.cc
	(svdot_impl::expand): s/direct/convert/ in
	`convert_optab_handler_for_sign' function call.
	(svusdot_impl::expand): Add second mode argument in call to
	`code_for_dot_prod'.
	* config/aarch64/aarch64-sve-builtins.cc
	(function_expander::convert_optab_handler_for_sign): New class
	method.
	* config/aarch64/aarch64-sve-builtins.h
	(class function_expander): Add prototype for new
	`convert_optab_handler_for_sign' method.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sme/vect-dotprod-twoway.c (udot2, sdot2): New.
---
 gcc/config/aarch64/aarch64-builtins.cc        |  7 ++++++
 gcc/config/aarch64/aarch64-simd-builtins.def  |  6 ++---
 gcc/config/aarch64/aarch64-simd.md            |  9 ++++---
 .../aarch64/aarch64-sve-builtins-base.cc      | 13 +++++-----
 gcc/config/aarch64/aarch64-sve-builtins.cc    | 17 +++++++++++++
 gcc/config/aarch64/aarch64-sve-builtins.h     |  3 +++
 gcc/config/aarch64/aarch64-sve.md             |  6 ++---
 gcc/config/aarch64/aarch64-sve2.md            |  2 +-
 .../aarch64/sme/vect-dotprod-twoway.c         | 25 +++++++++++++++++++
 9 files changed, 71 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index 30669f8aa18..8af646ab066 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -458,6 +458,13 @@ aarch64_types_storestruct_lane_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
       qualifier_poly, qualifier_struct_load_store_lane_index };
 #define TYPES_STORESTRUCT_LANE_P (aarch64_types_storestruct_lane_p_qualifiers)
 
+constexpr insn_code CODE_FOR_aarch64_sdot_prodv8qi = CODE_FOR_sdot_prodv2siv8qi;
+constexpr insn_code CODE_FOR_aarch64_udot_prodv8qi = CODE_FOR_udot_prodv2siv8qi;
+constexpr insn_code CODE_FOR_aarch64_usdot_prodv8qi = CODE_FOR_usdot_prodv2siv8qi;
+constexpr insn_code CODE_FOR_aarch64_sdot_prodv16qi = CODE_FOR_sdot_prodv4siv16qi;
+constexpr insn_code CODE_FOR_aarch64_udot_prodv16qi = CODE_FOR_udot_prodv4siv16qi;
+constexpr insn_code CODE_FOR_aarch64_usdot_prodv16qi = CODE_FOR_usdot_prodv4siv16qi;
+
 #define CF0(N, X) CODE_FOR_aarch64_##N##X
 #define CF1(N, X) CODE_FOR_##N##X##1
 #define CF2(N, X) CODE_FOR_##N##X##2
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index e65f73d7ba2..0814f8ba14f 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -418,9 +418,9 @@
   BUILTIN_VSDQ_I_DI (BINOP_UUS, urshl, 0, NONE)
 
   /* Implemented by <sur><dotprod>_prod<dot_mode>.  */
-  BUILTIN_VB (TERNOP, sdot_prod, 10, NONE)
-  BUILTIN_VB (TERNOPU, udot_prod, 10, NONE)
-  BUILTIN_VB (TERNOP_SUSS, usdot_prod, 10, NONE)
+  BUILTIN_VB (TERNOP, sdot_prod, 0, NONE)
+  BUILTIN_VB (TERNOPU, udot_prod, 0, NONE)
+  BUILTIN_VB (TERNOP_SUSS, usdot_prod, 0, NONE)
   /* Implemented by aarch64_<sur><dotprod>_lane{q}<dot_mode>.  */
   BUILTIN_VB (QUADOP_LANE, sdot_lane, 0, NONE)
   BUILTIN_VB (QUADOPU_LANE, udot_lane, 0, NONE)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index cc612ec2ca0..e15e547b000 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -568,7 +568,7 @@ (define_expand "cmul<conj_op><mode>3"
 ;; ...
 ;;
 ;; and so the vectorizer provides r, in which the result has to be accumulated.
-(define_insn "<sur>dot_prod<vsi2qi><vczle><vczbe>"
+(define_insn "<sur>dot_prod<mode><vsi2qi><vczle><vczbe>"
   [(set (match_operand:VS 0 "register_operand" "=w")
 	(plus:VS
 	  (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand" "w")
@@ -582,7 +582,7 @@ (define_insn "<sur>dot_prod<vsi2qi><vczle><vczbe>"
 
 ;; These instructions map to the __builtins for the Armv8.6-a I8MM usdot
 ;; (vector) Dot Product operation and the vectorized optab.
-(define_insn "usdot_prod<vsi2qi><vczle><vczbe>"
+(define_insn "usdot_prod<mode><vsi2qi><vczle><vczbe>"
   [(set (match_operand:VS 0 "register_operand" "=w")
 	(plus:VS
 	  (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand" "w")
@@ -1075,7 +1075,7 @@ (define_expand "<su>sadv16qi"
 	rtx ones = force_reg (V16QImode, CONST1_RTX (V16QImode));
 	rtx abd = gen_reg_rtx (V16QImode);
 	emit_insn (gen_aarch64_<su>abdv16qi (abd, operands[1], operands[2]));
-	emit_insn (gen_udot_prodv16qi (operands[0], abd, ones, operands[3]));
+	emit_insn (gen_udot_prodv4siv16qi (operands[0], abd, ones, operands[3]));
 	DONE;
       }
     rtx reduc = gen_reg_rtx (V8HImode);
@@ -3528,6 +3528,7 @@ (define_expand "popcount<mode>2"
 
     /* Generate a byte popcount.  */
     machine_mode mode = <bitsize> == 64 ? V8QImode : V16QImode;
+    machine_mode mode2 = <bitsize> == 64 ? V2SImode : V4SImode;
     rtx tmp = gen_reg_rtx (mode);
     auto icode = optab_handler (popcount_optab, mode);
     emit_insn (GEN_FCN (icode) (tmp, gen_lowpart (mode, operands[1])));
@@ -3538,7 +3539,7 @@ (define_expand "popcount<mode>2"
 	/* For V4SI and V2SI, we can generate a UDOT with a 0 accumulator and a
 	   1 multiplicand.  For V2DI, another UAADDLP is needed.  */
 	rtx ones = force_reg (mode, CONST1_RTX (mode));
-	auto icode = optab_handler (udot_prod_optab, mode);
+	auto icode = convert_optab_handler (udot_prod_optab, mode2, mode);
 	mode = <bitsize> == 64 ? V2SImode : V4SImode;
 	rtx dest = mode == <MODE>mode ? operands[0] : gen_reg_rtx (mode);
 	rtx zeros = force_reg (mode, CONST0_RTX (mode));
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index d55bee0b72f..42e9cec57ad 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -804,15 +804,16 @@ public:
     e.rotate_inputs_left (0, 3);
     insn_code icode;
     if (e.type_suffix_ids[1] == NUM_TYPE_SUFFIXES)
-      icode = e.direct_optab_handler_for_sign (sdot_prod_optab,
-					       udot_prod_optab,
-					       0, GET_MODE (e.args[0]));
+      icode = e.convert_optab_handler_for_sign (sdot_prod_optab,
+						udot_prod_optab,
+						0, e.result_mode (),
+						GET_MODE (e.args[0]));
     else
       icode = (e.type_suffix (0).float_p
 	       ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
 	       : e.type_suffix (0).unsigned_p
-	       ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
-	       : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
+	       ? CODE_FOR_udot_prodvnx4sivnx8hi
+	       : CODE_FOR_sdot_prodvnx4sivnx8hi);
     return e.use_unpred_insn (icode);
   }
 };
@@ -2861,7 +2862,7 @@ public:
        Hence we do the same rotation on arguments as svdot_impl does.  */
     e.rotate_inputs_left (0, 3);
     machine_mode mode = e.vector_mode (0);
-    insn_code icode = code_for_dot_prod (UNSPEC_USDOT, mode);
+    insn_code icode = code_for_dot_prod (UNSPEC_USDOT, e.result_mode (), mode);
     return e.use_exact_insn (icode);
   }
 
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 0a560eaedca..975eca0bbd6 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -3745,6 +3745,23 @@ function_expander::direct_optab_handler_for_sign (optab signed_op,
   return ::direct_optab_handler (op, mode);
 }
 
+/* Choose between signed and unsigned convert optabs SIGNED_OP and
+   UNSIGNED_OP based on the signedness of type suffix SUFFIX_I, then
+   pick the appropriate optab handler for TO_MODE and FROM_MODE.  Use
+   the mode of type suffix SUFFIX_I if FROM_MODE is VOIDmode.  */
+insn_code
+function_expander::convert_optab_handler_for_sign (optab signed_op,
+						   optab unsigned_op,
+						   unsigned int suffix_i,
+						   machine_mode to_mode,
+						   machine_mode from_mode)
+{
+  if (from_mode == VOIDmode)
+    from_mode = vector_mode (suffix_i);
+  optab op = type_suffix (suffix_i).unsigned_p ? unsigned_op : signed_op;
+  return ::convert_optab_handler (op, to_mode, from_mode);
+}
+
 /* Return true if X overlaps any input.  */
 bool
 function_expander::overlaps_input_p (rtx x)
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
index 9ab6f202c30..7534a58c3d7 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins.h
@@ -659,6 +659,9 @@ public:
   insn_code direct_optab_handler (optab, unsigned int = 0);
   insn_code direct_optab_handler_for_sign (optab, optab, unsigned int = 0,
 					   machine_mode = E_VOIDmode);
+  insn_code convert_optab_handler_for_sign (optab, optab, unsigned int = 0,
+					    machine_mode = E_VOIDmode,
+					    machine_mode = E_VOIDmode);
 
   machine_mode result_mode () const;
 
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index a5cd42be9d5..2fe18bdacfe 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -7197,7 +7197,7 @@ (define_insn_and_rewrite "*cond_fnma<mode>_any"
 ;; -------------------------------------------------------------------------
 
 ;; Four-element integer dot-product with accumulation.
-(define_insn "<sur>dot_prod<vsi2qi>"
+(define_insn "<sur>dot_prod<mode><vsi2qi>"
   [(set (match_operand:SVE_FULL_SDI 0 "register_operand")
 	(plus:SVE_FULL_SDI
 	  (unspec:SVE_FULL_SDI
@@ -7235,7 +7235,7 @@ (define_insn "@aarch64_<sur>dot_prod_lane<SVE_FULL_SDI:mode><SVE_FULL_BHI:mode>"
   }
 )
 
-(define_insn "@<sur>dot_prod<vsi2qi>"
+(define_insn "@<sur>dot_prod<mode><vsi2qi>"
   [(set (match_operand:VNx4SI_ONLY 0 "register_operand")
         (plus:VNx4SI_ONLY
 	  (unspec:VNx4SI_ONLY
@@ -7293,7 +7293,7 @@ (define_expand "<su>sad<vsi2qi>"
     rtx ones = force_reg (<VSI2QI>mode, CONST1_RTX (<VSI2QI>mode));
     rtx diff = gen_reg_rtx (<VSI2QI>mode);
     emit_insn (gen_<su>abd<vsi2qi>3 (diff, operands[1], operands[2]));
-    emit_insn (gen_udot_prod<vsi2qi> (operands[0], diff, ones, operands[3]));
+    emit_insn (gen_udot_prod<mode><vsi2qi> (operands[0], diff, ones, operands[3]));
     DONE;
   }
 )
diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index 972b03a4fef..725092cc95f 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -2021,7 +2021,7 @@ (define_insn "@aarch64_sve_qsub_<sve_int_op>_lane_<mode>"
 )
 
 ;; Two-way dot-product.
-(define_insn "@aarch64_sve_<sur>dotvnx4sivnx8hi"
+(define_insn "<sur>dot_prodvnx4sivnx8hi"
   [(set (match_operand:VNx4SI 0 "register_operand")
 	(plus:VNx4SI
 	  (unspec:VNx4SI
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c b/gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
new file mode 100644
index 00000000000..453f3a75e6f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
@@ -0,0 +1,25 @@
+/* { dg-additional-options "-march=armv9.2-a+sme2 -O2 -ftree-vectorize" } */
+
+#include <stdint.h>
+
+uint32_t udot2(int n, uint16_t* data) __arm_streaming
+{
+  uint32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+int32_t sdot2(int n, int16_t* data) __arm_streaming
+{
+  int32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+/* { dg-final { scan-assembler-times {\tudot\tz[0-9]+\.s, z[0-9]+\.h, z[0-9]+\.h\n} 5 } } */
+/* { dg-final { scan-assembler-times {\tsdot\tz[0-9]+\.s, z[0-9]+\.h, z[0-9]+\.h\n} 5 } } */
+/* { dg-final { scan-assembler-times {\twhilelo\t} 4 } } */
-- 
2.34.1



* [PATCH V2 04/10] arm: Fix arm backend-use of (u|s|us)dot_prod patterns
  2024-08-13 12:41 [PATCH V2 00/10] optabs: Make all `*dot_prod_optab's modeled as conversions Victor Do Nascimento
                   ` (2 preceding siblings ...)
  2024-08-13 12:41 ` [PATCH V2 03/10] aarch64: Fix aarch64 backend-use of (u|s|us)dot_prod patterns Victor Do Nascimento
@ 2024-08-13 12:41 ` Victor Do Nascimento
  2024-08-13 12:41 ` [PATCH V2 05/10] i386: Fix dot_prod backend patterns for mmx and sse targets Victor Do Nascimento
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Victor Do Nascimento @ 2024-08-13 12:41 UTC (permalink / raw)
  To: gcc-patches
  Cc: Tamar.Christina, claziss, hongtao.liu, syq, bernds_cb1, aldyh,
	Victor Do Nascimento

gcc/ChangeLog:

	* config/arm/arm-builtins.cc (enum arm_builtins): Add new
	ARM_BUILTIN_* enum values: SDOTV8QI, SDOTV16QI, UDOTV8QI,
	UDOTV16QI, USDOTV8QI, USDOTV16QI.
	(arm_init_dotprod_builtins): New.
	(arm_init_builtins): Add call to `arm_init_dotprod_builtins'.
	(arm_general_gimple_fold_builtin): New.
	* config/arm/arm-protos.h (arm_general_gimple_fold_builtin):
	New prototype.
	* config/arm/arm.cc (arm_gimple_fold_builtin): Add call to
	`arm_general_gimple_fold_builtin'.
	* config/arm/neon.md (<sup>dot_prod<vsi2qi>): Renamed to...
	(<sup>dot_prod<mode><vsi2qi>): ...this.
	(neon_usdot<vsi2qi>): Renamed to...
	(neon_usdot<mode><vsi2qi>): ...this.
	(usdot_prod<vsi2qi>): Renamed to...
	(usdot_prod<mode><vsi2qi>): ...this.
---
 gcc/config/arm/arm-builtins.cc       | 95 ++++++++++++++++++++++++++++
 gcc/config/arm/arm-protos.h          |  3 +
 gcc/config/arm/arm.cc                |  1 +
 gcc/config/arm/arm_neon_builtins.def |  3 -
 gcc/config/arm/neon.md               |  6 +-
 5 files changed, 102 insertions(+), 6 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index c9d50bf8fbb..b23b6caa063 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -45,6 +45,8 @@
 #include "arm-builtins.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "basic-block.h"
+#include "gimple.h"
 
 #define SIMD_MAX_BUILTIN_ARGS 7
 
@@ -1298,6 +1300,13 @@ enum arm_builtins
 #define VAR1(T, N, X) \
   ARM_BUILTIN_##N,
 
+  ARM_BUILTIN_NEON_SDOTV8QI,
+  ARM_BUILTIN_NEON_SDOTV16QI,
+  ARM_BUILTIN_NEON_UDOTV8QI,
+  ARM_BUILTIN_NEON_UDOTV16QI,
+  ARM_BUILTIN_NEON_USDOTV8QI,
+  ARM_BUILTIN_NEON_USDOTV16QI,
+
   ARM_BUILTIN_ACLE_BASE,
   ARM_BUILTIN_SAT_IMM_CHECK = ARM_BUILTIN_ACLE_BASE,
 
@@ -2648,6 +2657,60 @@ arm_init_fp16_builtins (void)
 					       "__fp16");
 }
 
+static void
+arm_init_dotprod_builtins (void)
+{
+  tree fndecl = NULL;
+  tree ftype = NULL;
+
+  tree uv8qi = arm_simd_builtin_type (V8QImode, qualifier_unsigned);
+  tree sv8qi = arm_simd_builtin_type (V8QImode, qualifier_none);
+  tree uv16qi = arm_simd_builtin_type (V16QImode, qualifier_unsigned);
+  tree sv16qi = arm_simd_builtin_type (V16QImode, qualifier_none);
+  tree uv2si = arm_simd_builtin_type (V2SImode, qualifier_unsigned);
+  tree sv2si = arm_simd_builtin_type (V2SImode, qualifier_none);
+  tree uv4si = arm_simd_builtin_type (V4SImode, qualifier_unsigned);
+  tree sv4si = arm_simd_builtin_type (V4SImode, qualifier_none);
+
+  struct builtin_decls_data
+  {
+    tree out_type_node;
+    tree in_type1_node;
+    tree in_type2_node;
+    const char *builtin_name;
+    int function_code;
+  };
+
+#define NAME(A) "__builtin_neon_" #A
+#define ENUM(B) ARM_BUILTIN_NEON_##B
+
+  builtin_decls_data bdda[] =
+  {
+    { sv2si, sv8qi,  sv8qi,  NAME (sdotv8qi),	    ENUM (SDOTV8QI)   },
+    { uv2si, uv8qi,  uv8qi,  NAME (udotv8qi_uuuu),  ENUM (UDOTV8QI)   },
+    { sv2si, uv8qi,  sv8qi,  NAME (usdotv8qi_ssus), ENUM (USDOTV8QI)  },
+    { sv4si, sv16qi, sv16qi, NAME (sdotv16qi),	    ENUM (SDOTV16QI)  },
+    { uv4si, uv16qi, uv16qi, NAME (udotv16qi_uuuu),  ENUM (UDOTV16QI)  },
+    { sv4si, uv16qi, sv16qi, NAME (usdotv16qi_ssus), ENUM (USDOTV16QI) },
+  };
+
+#undef NAME
+#undef ENUM
+
+  builtin_decls_data *bdd = bdda;
+  builtin_decls_data *bdd_end = bdd + (ARRAY_SIZE (bdda));
+
+  for (; bdd < bdd_end; bdd++)
+  {
+    ftype = build_function_type_list (bdd->out_type_node, bdd->out_type_node,
+				      bdd->in_type1_node, bdd->in_type2_node,
+				      NULL_TREE);
+    fndecl = arm_general_add_builtin_function (bdd->builtin_name,
+					       ftype, bdd->function_code);
+    arm_builtin_decls[bdd->function_code] = fndecl;
+  }
+}
+
 void
 arm_init_builtins (void)
 {
@@ -2676,6 +2739,7 @@ arm_init_builtins (void)
 	arm_init_neon_builtins ();
       arm_init_vfp_builtins ();
       arm_init_crypto_builtins ();
+      arm_init_dotprod_builtins ();
     }
 
   if (TARGET_CDE)
@@ -2738,6 +2802,37 @@ arm_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
     }
 }
 
+/* Try to fold STMT, given that it's a call to the built-in function with
+   subcode FCODE.  Return the new statement on success and null on
+   failure.  */
+gimple *
+arm_general_gimple_fold_builtin (unsigned int fcode, gcall *stmt,
+				 gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED)
+{
+  gimple *new_stmt = NULL;
+  unsigned nargs = gimple_call_num_args (stmt);
+  tree *args = (nargs > 0
+		? gimple_call_arg_ptr (stmt, 0)
+		: &error_mark_node);
+
+  switch (fcode)
+    {
+    case ARM_BUILTIN_NEON_SDOTV8QI:
+    case ARM_BUILTIN_NEON_SDOTV16QI:
+    case ARM_BUILTIN_NEON_UDOTV8QI:
+    case ARM_BUILTIN_NEON_UDOTV16QI:
+    case ARM_BUILTIN_NEON_USDOTV8QI:
+    case ARM_BUILTIN_NEON_USDOTV16QI:
+      new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
+				      DOT_PROD_EXPR, args[1],
+				      args[2], args[0]);
+      break;
+    default:
+      break;
+    }
+  return new_stmt;
+}
+
 /* Errors in the source file can cause expand_expr to return const0_rtx
    where we expect a vector.  To avoid crashing, use one of the vector
    clear instructions.  */
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 50cae2b513a..4e31d1d0225 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -57,6 +57,9 @@ extern rtx arm_expand_builtin (tree exp, rtx target, rtx subtarget
 extern tree arm_builtin_decl (unsigned code, bool initialize_p
 			      ATTRIBUTE_UNUSED);
 extern void arm_init_builtins (void);
+extern gimple *arm_general_gimple_fold_builtin (unsigned int fcode, gcall *stmt,
+						gimple_stmt_iterator *gsi
+						ATTRIBUTE_UNUSED);
 extern void arm_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update);
 extern rtx arm_simd_vect_par_cnst_half (machine_mode mode, bool high);
 extern bool arm_simd_check_vect_par_cnst_half_p (rtx op, machine_mode mode,
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 92cd168e659..109e9c131f5 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -2852,6 +2852,7 @@ arm_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   switch (code & ARM_BUILTIN_CLASS)
     {
     case ARM_BUILTIN_GENERAL:
+      new_stmt = arm_general_gimple_fold_builtin (subcode, stmt, gsi);
       break;
     case ARM_BUILTIN_MVE:
       new_stmt = arm_mve::gimple_fold_builtin (subcode, stmt);
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index 0c5d40b96e5..cf5537ca95d 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -349,14 +349,11 @@ VAR13 (STORE1, vst4,
 	v8qi, v4hi, v4hf, v4bf, v2si, v2sf, di, v16qi, v8hi, v8hf, v8bf, v4si, v4sf)
 VAR11 (STORE1LANE, vst4_lane,
 	v8qi, v4hi, v4hf, v2si, v2sf, v8hi, v8hf, v4si, v4sf, v4bf, v8bf)
-VAR2 (TERNOP, sdot, v8qi, v16qi)
-VAR2 (UTERNOP, udot, v8qi, v16qi)
 VAR2 (MAC_LANE, sdot_lane, v8qi, v16qi)
 VAR2 (UMAC_LANE, udot_lane, v8qi, v16qi)
 VAR2 (MAC_LANE, sdot_laneq, v8qi, v16qi)
 VAR2 (UMAC_LANE, udot_laneq, v8qi, v16qi)
 
-VAR2 (USTERNOP, usdot, v8qi, v16qi)
 VAR2 (USMAC_LANE_QUADTUP, usdot_lane, v8qi, v16qi)
 VAR2 (SUMAC_LANE_QUADTUP, sudot_lane, v8qi, v16qi)
 VAR2 (USMAC_LANE_QUADTUP, usdot_laneq, v8qi, v16qi)
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index fa4a7aeda35..b3a3564ca2b 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2989,7 +2989,7 @@ (define_expand "cmul<conj_op><mode>3"
 ;; ...
 ;;
 ;; and so the vectorizer provides r, in which the result has to be accumulated.
-(define_insn "<sup>dot_prod<vsi2qi>"
+(define_insn "<sup>dot_prod<mode><vsi2qi>"
   [(set (match_operand:VCVTI 0 "register_operand" "=w")
 	(plus:VCVTI
 	  (unspec:VCVTI [(match_operand:<VSI2QI> 1 "register_operand" "w")
@@ -3013,7 +3013,7 @@ (define_expand "neon_<sup>dot<vsi2qi>"
 )
 
 ;; These instructions map to the __builtins for the Dot Product operations.
-(define_insn "neon_usdot<vsi2qi>"
+(define_insn "neon_usdot<mode><vsi2qi>"
   [(set (match_operand:VCVTI 0 "register_operand" "=w")
 	(plus:VCVTI
 	  (unspec:VCVTI
@@ -3112,7 +3112,7 @@ (define_insn "neon_<sup>dot_laneq<vsi2qi>"
 )
 
 ;; Auto-vectorizer pattern for usdot
-(define_expand "usdot_prod<vsi2qi>"
+(define_expand "usdot_prod<mode><vsi2qi>"
   [(set (match_operand:VCVTI 0 "register_operand")
 	(plus:VCVTI (unspec:VCVTI [(match_operand:<VSI2QI> 1
 							"register_operand")
-- 
2.34.1



* [PATCH V2 05/10] i386: Fix dot_prod backend patterns for mmx and sse targets
  2024-08-13 12:41 [PATCH V2 00/10] optabs: Make all `*dot_prod_optab's modeled as conversions Victor Do Nascimento
                   ` (3 preceding siblings ...)
  2024-08-13 12:41 ` [PATCH V2 04/10] arm: Fix arm " Victor Do Nascimento
@ 2024-08-13 12:41 ` Victor Do Nascimento
  2024-08-14  1:24   ` Liu, Hongtao
  2024-08-13 12:41 ` [PATCH V2 06/10] arc: Adjust dot-product backend patterns Victor Do Nascimento
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 18+ messages in thread
From: Victor Do Nascimento @ 2024-08-13 12:41 UTC (permalink / raw)
  To: gcc-patches
  Cc: Tamar.Christina, claziss, hongtao.liu, syq, bernds_cb1, aldyh,
	Victor Do Nascimento

Following the migration of the dot_prod optab from a direct to a
conversion-type optab, ensure all back-end patterns incorporate the
second machine mode into pattern names.
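
As an illustration of the naming scheme only (this is not GCC code, and
the helper name `dot_prod_pattern_name' is hypothetical), the renamed
patterns follow the shape <optab><output mode><input mode>, as in
sdot_prodv2siv8qi.  A minimal sketch, assuming lower-case mode names
are simply concatenated:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build a conversion-style pattern name of the
   form <optab><out_mode><in_mode> into BUF, which holds LEN bytes.
   Return 0 on success and -1 if the name does not fit.  */
int
dot_prod_pattern_name (char *buf, size_t len, const char *optab,
		       const char *out_mode, const char *in_mode)
{
  assert (buf != NULL && len > 0);
  int n = snprintf (buf, len, "%s%s%s", optab, out_mode, in_mode);
  return (n >= 0 && (size_t) n < len) ? 0 : -1;
}
```

Called with ("sdot_prod", "v2si", "v8qi") this produces the string
"sdot_prodv2siv8qi", matching the mmx.md rename listed below.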

gcc/ChangeLog:

	* config/i386/mmx.md (usdot_prodv8qi): Renamed to...
	(usdot_prodv2siv8qi): ...this.
	(sdot_prodv8qi): Renamed to...
	(sdot_prodv2siv8qi): ...this.
	(udot_prodv8qi): Renamed to...
	(udot_prodv2siv8qi): ...this.
	(usdot_prodv4hi): Renamed to...
	(usdot_prodv2siv4hi): ...this.
	(udot_prodv4hi): Renamed to...
	(udot_prodv2siv4hi): ...this.
	(sdot_prodv4hi): Renamed to...
	(sdot_prodv2siv4hi): ...this.
	* config/i386/sse.md (sdot_prod<mode>): Renamed to...
	(sdot_prod<sseunpackmodelower><mode>): ...this.
	(sdot_prodv4si): Renamed to...
	(sdot_prodv2div4si): ...this.
	(usdot_prod<mode>): Renamed to...
	(usdot_prod<ssedvecmodelower><mode>): ...this.
	(sdot_prod<mode>): Renamed to...
	(sdot_prod<ssedvecmodelower><mode>): ...this.
	(sdot_prodv64qi): Renamed to...
	(sdot_prodv16siv64qi): ...this.
	(udot_prod<mode>): Renamed to...
	(udot_prod<ssedvecmodelower><mode>): ...this.
	(udot_prodv64qi): Renamed to...
	(udot_prodv16siv64qi): ...this.
	(usdot_prod<mode>): Renamed to...
	(usdot_prod<sseunpackmodelower><mode>): ...this.
	(udot_prod<mode>): Renamed to...
	(udot_prod<sseunpackmodelower><mode>): ...this.
---
 gcc/config/i386/mmx.md | 30 +++++++++++++++---------------
 gcc/config/i386/sse.md | 38 +++++++++++++++++++-------------------
 2 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 94d3a6e5692..d78739b033d 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -6344,7 +6344,7 @@ (define_expand "usadv8qi"
   DONE;
 })
 
-(define_expand "usdot_prodv8qi"
+(define_expand "usdot_prodv2siv8qi"
   [(match_operand:V2SI 0 "register_operand")
    (match_operand:V8QI 1 "register_operand")
    (match_operand:V8QI 2 "register_operand")
@@ -6363,7 +6363,7 @@ (define_expand "usdot_prodv8qi"
       rtx op3 = lowpart_subreg (V4SImode, operands[3], V2SImode);
       rtx op0 = gen_reg_rtx (V4SImode);
 
-      emit_insn (gen_usdot_prodv16qi (op0, op1, op2, op3));
+      emit_insn (gen_usdot_prodv4siv16qi (op0, op1, op2, op3));
       emit_move_insn (operands[0], lowpart_subreg (V2SImode, op0, V4SImode));
      }
    else
@@ -6377,7 +6377,7 @@ (define_expand "usdot_prodv8qi"
       emit_move_insn (op3, CONST0_RTX (V4SImode));
       emit_insn (gen_zero_extendv8qiv8hi2 (op1, operands[1]));
       emit_insn (gen_extendv8qiv8hi2 (op2, operands[2]));
-      emit_insn (gen_sdot_prodv8hi (op0, op1, op2, op3));
+      emit_insn (gen_sdot_prodv4siv8hi (op0, op1, op2, op3));
 
       /* vec_perm (op0, 2, 3, 0, 1);  */
       emit_insn (gen_sse2_pshufd (op0_1, op0, GEN_INT (78)));
@@ -6388,7 +6388,7 @@ (define_expand "usdot_prodv8qi"
     DONE;
 })
 
-(define_expand "sdot_prodv8qi"
+(define_expand "sdot_prodv2siv8qi"
   [(match_operand:V2SI 0 "register_operand")
    (match_operand:V8QI 1 "register_operand")
    (match_operand:V8QI 2 "register_operand")
@@ -6406,7 +6406,7 @@ (define_expand "sdot_prodv8qi"
       rtx op3 = lowpart_subreg (V4SImode, operands[3], V2SImode);
       rtx op0 = gen_reg_rtx (V4SImode);
 
-      emit_insn (gen_sdot_prodv16qi (op0, op1, op2, op3));
+      emit_insn (gen_sdot_prodv4siv16qi (op0, op1, op2, op3));
       emit_move_insn (operands[0], lowpart_subreg (V2SImode, op0, V4SImode));
     }
   else
@@ -6420,7 +6420,7 @@ (define_expand "sdot_prodv8qi"
       emit_move_insn (op3, CONST0_RTX (V4SImode));
       emit_insn (gen_extendv8qiv8hi2 (op1, operands[1]));
       emit_insn (gen_extendv8qiv8hi2 (op2, operands[2]));
-      emit_insn (gen_sdot_prodv8hi (op0, op1, op2, op3));
+      emit_insn (gen_sdot_prodv4siv8hi (op0, op1, op2, op3));
 
       /* vec_perm (op0, 2, 3, 0, 1);  */
       emit_insn (gen_sse2_pshufd (op0_1, op0, GEN_INT (78)));
@@ -6432,7 +6432,7 @@ (define_expand "sdot_prodv8qi"
 
 })
 
-(define_expand "udot_prodv8qi"
+(define_expand "udot_prodv2siv8qi"
   [(match_operand:V2SI 0 "register_operand")
    (match_operand:V8QI 1 "register_operand")
    (match_operand:V8QI 2 "register_operand")
@@ -6450,7 +6450,7 @@ (define_expand "udot_prodv8qi"
       rtx op3 = lowpart_subreg (V4SImode, operands[3], V2SImode);
       rtx op0 = gen_reg_rtx (V4SImode);
 
-      emit_insn (gen_udot_prodv16qi (op0, op1, op2, op3));
+      emit_insn (gen_udot_prodv4siv16qi (op0, op1, op2, op3));
       emit_move_insn (operands[0], lowpart_subreg (V2SImode, op0, V4SImode));
     }
   else
@@ -6464,7 +6464,7 @@ (define_expand "udot_prodv8qi"
       emit_move_insn (op3, CONST0_RTX (V4SImode));
       emit_insn (gen_zero_extendv8qiv8hi2 (op1, operands[1]));
       emit_insn (gen_zero_extendv8qiv8hi2 (op2, operands[2]));
-      emit_insn (gen_sdot_prodv8hi (op0, op1, op2, op3));
+      emit_insn (gen_sdot_prodv4siv8hi (op0, op1, op2, op3));
 
       /* vec_perm (op0, 2, 3, 0, 1);  */
       emit_insn (gen_sse2_pshufd (op0_1, op0, GEN_INT (78)));
@@ -6476,7 +6476,7 @@ (define_expand "udot_prodv8qi"
 
 })
 
-(define_expand "usdot_prodv4hi"
+(define_expand "usdot_prodv2siv4hi"
   [(match_operand:V2SI 0 "register_operand")
    (match_operand:V4HI 1 "register_operand")
    (match_operand:V4HI 2 "register_operand")
@@ -6492,12 +6492,12 @@ (define_expand "usdot_prodv4hi"
   rtx op3 = lowpart_subreg (V4SImode, operands[3], V2SImode);
   rtx op0 = gen_reg_rtx (V4SImode);
 
-  emit_insn (gen_usdot_prodv8hi (op0, op1, op2, op3));
+  emit_insn (gen_usdot_prodv4siv8hi (op0, op1, op2, op3));
   emit_move_insn (operands[0], lowpart_subreg (V2SImode, op0, V4SImode));
   DONE;
 })
 
-(define_expand "udot_prodv4hi"
+(define_expand "udot_prodv2siv4hi"
   [(match_operand:V2SI 0 "register_operand")
    (match_operand:V4HI 1 "register_operand")
    (match_operand:V4HI 2 "register_operand")
@@ -6513,12 +6513,12 @@ (define_expand "udot_prodv4hi"
   rtx op3 = lowpart_subreg (V4SImode, operands[3], V2SImode);
   rtx op0 = gen_reg_rtx (V4SImode);
 
-  emit_insn (gen_udot_prodv8hi (op0, op1, op2, op3));
+  emit_insn (gen_udot_prodv4siv8hi (op0, op1, op2, op3));
   emit_move_insn (operands[0], lowpart_subreg (V2SImode, op0, V4SImode));
   DONE;
 })
 
-(define_expand "sdot_prodv4hi"
+(define_expand "sdot_prodv2siv4hi"
   [(match_operand:V2SI 0 "register_operand")
    (match_operand:V4HI 1 "register_operand")
    (match_operand:V4HI 2 "register_operand")
@@ -6534,7 +6534,7 @@ (define_expand "sdot_prodv4hi"
   rtx op3 = lowpart_subreg (V4SImode, operands[3], V2SImode);
   rtx op0 = gen_reg_rtx (V4SImode);
 
-  emit_insn (gen_sdot_prodv8hi (op0, op1, op2, op3));
+  emit_insn (gen_sdot_prodv4siv8hi (op0, op1, op2, op3));
   emit_move_insn (operands[0], lowpart_subreg (V2SImode, op0, V4SImode));
   DONE;
 })
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d1010bc5682..0bf250c86d9 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -16727,7 +16727,7 @@ (define_mode_attr SDOT_PMADD_SUF
 (define_mode_attr SDOT_VPDP_SUF
   [(V32HI "v16si") (V16HI "v8si") (V8HI "v4si")])
 
-(define_expand "sdot_prod<mode>"
+(define_expand "sdot_prod<sseunpackmodelower><mode>"
   [(match_operand:<sseunpackmode> 0 "register_operand")
    (match_operand:VI2_AVX512VNNIBW 1 "register_operand")
    (match_operand:VI2_AVX512VNNIBW 2 "register_operand")
@@ -16762,7 +16762,7 @@ (define_expand "sdot_prod<mode>"
 
 ;; Normally we use widen_mul_even/odd, but combine can't quite get it all
 ;; back together when madd is available.
-(define_expand "sdot_prodv4si"
+(define_expand "sdot_prodv2div4si"
   [(match_operand:V2DI 0 "register_operand")
    (match_operand:V4SI 1 "register_operand")
    (match_operand:V4SI 2 "register_operand")
@@ -30190,7 +30190,7 @@ (define_insn "vpshldv_<mode>_maskz_1"
    [(set_attr ("prefix") ("evex"))
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_expand "usdot_prod<mode>"
+(define_expand "usdot_prod<ssedvecmodelower><mode>"
   [(match_operand:<ssedvecmode> 0 "register_operand")
    (match_operand:VI1_AVX512 1 "register_operand")
    (match_operand:VI1_AVX512 2 "register_operand")
@@ -30228,9 +30228,9 @@ (define_expand "usdot_prod<mode>"
       rtx sum = gen_reg_rtx (<ssedvecmode>mode);
 
       emit_move_insn (sum, CONST0_RTX (<ssedvecmode>mode));
-      emit_insn (gen_sdot_prod<sseunpackmodelower> (res1, op1_lo,
+      emit_insn (gen_sdot_prod<ssedvecmodelower><sseunpackmodelower> (res1, op1_lo,
 						    op2_lo, sum));
-      emit_insn (gen_sdot_prod<sseunpackmodelower> (res2, op1_hi,
+      emit_insn (gen_sdot_prod<ssedvecmodelower><sseunpackmodelower> (res2, op1_hi,
 						    op2_hi, operands[3]));
       emit_insn (gen_add<ssedvecmodelower>3 (operands[0], res1, res2));
     }
@@ -31049,7 +31049,7 @@ (define_int_attr vpdotprodtype
    (UNSPEC_VPDPBSUD "bsud") (UNSPEC_VPDPBSUDS "bsuds")
    (UNSPEC_VPDPBUUD "buud") (UNSPEC_VPDPBUUDS "buuds")])
 
-(define_expand "sdot_prod<mode>"
+(define_expand "sdot_prod<ssedvecmodelower><mode>"
   [(match_operand:<ssedvecmode> 0 "register_operand")
    (match_operand:VI1_AVX2 1 "register_operand")
    (match_operand:VI1_AVX2 2 "register_operand")
@@ -31085,9 +31085,9 @@ (define_expand "sdot_prod<mode>"
       rtx sum = gen_reg_rtx (<ssedvecmode>mode);
 
       emit_move_insn (sum, CONST0_RTX (<ssedvecmode>mode));
-      emit_insn (gen_sdot_prod<sseunpackmodelower> (res1, op1_lo,
+      emit_insn (gen_sdot_prod<ssedvecmodelower><sseunpackmodelower> (res1, op1_lo,
 						    op2_lo, sum));
-      emit_insn (gen_sdot_prod<sseunpackmodelower> (res2, op1_hi,
+      emit_insn (gen_sdot_prod<ssedvecmodelower><sseunpackmodelower> (res2, op1_hi,
 						    op2_hi, operands[3]));
       emit_insn (gen_add<ssedvecmodelower>3 (operands[0], res1, res2));
     }
@@ -31095,7 +31095,7 @@ (define_expand "sdot_prod<mode>"
   DONE;
 })
 
-(define_expand "sdot_prodv64qi"
+(define_expand "sdot_prodv16siv64qi"
   [(match_operand:V16SI 0 "register_operand")
    (match_operand:V64QI 1 "register_operand")
    (match_operand:V64QI 2 "register_operand")
@@ -31118,14 +31118,14 @@ (define_expand "sdot_prodv64qi"
   rtx sum = gen_reg_rtx (V16SImode);
 
   emit_move_insn (sum, CONST0_RTX (V16SImode));
-  emit_insn (gen_sdot_prodv32hi (res1, op1_lo, op2_lo, sum));
-  emit_insn (gen_sdot_prodv32hi (res2, op1_hi, op2_hi, operands[3]));
+  emit_insn (gen_sdot_prodv16siv32hi (res1, op1_lo, op2_lo, sum));
+  emit_insn (gen_sdot_prodv16siv32hi (res2, op1_hi, op2_hi, operands[3]));
 
   emit_insn (gen_addv16si3 (operands[0], res1, res2));
   DONE;
 })
 
-(define_expand "udot_prod<mode>"
+(define_expand "udot_prod<ssedvecmodelower><mode>"
   [(match_operand:<ssedvecmode> 0 "register_operand")
    (match_operand:VI1_AVX2 1 "register_operand")
    (match_operand:VI1_AVX2 2 "register_operand")
@@ -31161,9 +31161,9 @@ (define_expand "udot_prod<mode>"
      rtx sum = gen_reg_rtx (<ssedvecmode>mode);
 
      emit_move_insn (sum, CONST0_RTX (<ssedvecmode>mode));
-     emit_insn (gen_sdot_prod<sseunpackmodelower> (res1, op1_lo,
+     emit_insn (gen_sdot_prod<ssedvecmodelower><sseunpackmodelower> (res1, op1_lo,
 						    op2_lo, sum));
-     emit_insn (gen_sdot_prod<sseunpackmodelower> (res2, op1_hi,
+     emit_insn (gen_sdot_prod<ssedvecmodelower><sseunpackmodelower> (res2, op1_hi,
 						    op2_hi, operands[3]));
      emit_insn (gen_add<ssedvecmodelower>3 (operands[0], res1, res2));
    }
@@ -31171,7 +31171,7 @@ (define_expand "udot_prod<mode>"
   DONE;
 })
 
-(define_expand "udot_prodv64qi"
+(define_expand "udot_prodv16siv64qi"
   [(match_operand:V16SI 0 "register_operand")
    (match_operand:V64QI 1 "register_operand")
    (match_operand:V64QI 2 "register_operand")
@@ -31194,8 +31194,8 @@ (define_expand "udot_prodv64qi"
   rtx sum = gen_reg_rtx (V16SImode);
 
   emit_move_insn (sum, CONST0_RTX (V16SImode));
-  emit_insn (gen_sdot_prodv32hi (res1, op1_lo, op2_lo, sum));
-  emit_insn (gen_sdot_prodv32hi (res2, op1_hi, op2_hi, operands[3]));
+  emit_insn (gen_sdot_prodv16siv32hi (res1, op1_lo, op2_lo, sum));
+  emit_insn (gen_sdot_prodv16siv32hi (res2, op1_hi, op2_hi, operands[3]));
 
   emit_insn (gen_addv16si3 (operands[0], res1, res2));
   DONE;
@@ -31301,7 +31301,7 @@ (define_int_attr vpdpwprodtype
    (UNSPEC_VPDPWSUD "wsud") (UNSPEC_VPDPWSUDS "wsuds")
    (UNSPEC_VPDPWUUD "wuud") (UNSPEC_VPDPWUUDS "wuuds")])
 
-(define_expand "usdot_prod<mode>"
+(define_expand "usdot_prod<sseunpackmodelower><mode>"
   [(match_operand:<sseunpackmode> 0 "register_operand")
    (match_operand:VI2_AVX2 1 "register_operand")
    (match_operand:VI2_AVX2 2 "register_operand")
@@ -31319,7 +31319,7 @@ (define_expand "usdot_prod<mode>"
   DONE;
 })
 
-(define_expand "udot_prod<mode>"
+(define_expand "udot_prod<sseunpackmodelower><mode>"
   [(match_operand:<sseunpackmode> 0 "register_operand")
    (match_operand:VI2_AVX2 1 "register_operand")
    (match_operand:VI2_AVX2 2 "register_operand")
-- 
2.34.1



* [PATCH V2 06/10] arc: Adjust dot-product backend patterns
  2024-08-13 12:41 [PATCH V2 00/10] optabs: Make all `*dot_prod_optab's modeled as conversions Victor Do Nascimento
                   ` (4 preceding siblings ...)
  2024-08-13 12:41 ` [PATCH V2 05/10] i386: Fix dot_prod backend patterns for mmx and sse targets Victor Do Nascimento
@ 2024-08-13 12:41 ` Victor Do Nascimento
  2024-08-13 12:41 ` [PATCH V2 07/10] mips: " Victor Do Nascimento
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Victor Do Nascimento @ 2024-08-13 12:41 UTC (permalink / raw)
  To: gcc-patches
  Cc: Tamar.Christina, claziss, hongtao.liu, syq, bernds_cb1, aldyh,
	Victor Do Nascimento

Following the migration of the dot_prod optab from a direct to a
conversion-type optab, ensure all back-end patterns incorporate the
second machine mode into pattern names.

gcc/ChangeLog:

	* config/arc/simdext.md (sdot_prodv2hi): Renamed to...
	(sdot_prodsiv2hi): ...this.
	(udot_prodv2hi): Renamed to...
	(udot_prodsiv2hi): ...this.
	(sdot_prodv4hi): Renamed to...
	(sdot_prodv2siv4hi): ...this.
	(udot_prodv4hi): Renamed to...
	(udot_prodv2siv4hi): ...this.
---
 gcc/config/arc/simdext.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/arc/simdext.md b/gcc/config/arc/simdext.md
index 4e51a237c3a..0696f0abb70 100644
--- a/gcc/config/arc/simdext.md
+++ b/gcc/config/arc/simdext.md
@@ -1643,7 +1643,7 @@ (define_insn "dmpyh<V_US_suffix>"
 
 ;; We can use dmac as well here.  To be investigated which version
 ;; brings more.
-(define_expand "sdot_prodv2hi"
+(define_expand "sdot_prodsiv2hi"
   [(match_operand:SI 0 "register_operand" "")
    (match_operand:V2HI 1 "register_operand" "")
    (match_operand:V2HI 2 "register_operand" "")
@@ -1656,7 +1656,7 @@ (define_expand "sdot_prodv2hi"
  DONE;
 })
 
-(define_expand "udot_prodv2hi"
+(define_expand "udot_prodsiv2hi"
   [(match_operand:SI 0 "register_operand" "")
    (match_operand:V2HI 1 "register_operand" "")
    (match_operand:V2HI 2 "register_operand" "")
@@ -1669,7 +1669,7 @@ (define_expand "udot_prodv2hi"
  DONE;
 })
 
-(define_expand "sdot_prodv4hi"
+(define_expand "sdot_prodv2siv4hi"
   [(match_operand:V2SI 0 "register_operand" "")
    (match_operand:V4HI 1 "register_operand" "")
    (match_operand:V4HI 2 "register_operand" "")
@@ -1688,7 +1688,7 @@ (define_expand "sdot_prodv4hi"
  DONE;
 })
 
-(define_expand "udot_prodv4hi"
+(define_expand "udot_prodv2siv4hi"
   [(match_operand:V2SI 0 "register_operand" "")
    (match_operand:V4HI 1 "register_operand" "")
    (match_operand:V4HI 2 "register_operand" "")
-- 
2.34.1



* [PATCH V2 07/10] mips:  Adjust dot-product backend patterns
  2024-08-13 12:41 [PATCH V2 00/10] optabs: Make all `*dot_prod_optab's modeled as conversions Victor Do Nascimento
                   ` (5 preceding siblings ...)
  2024-08-13 12:41 ` [PATCH V2 06/10] arc: Adjust dot-product backend patterns Victor Do Nascimento
@ 2024-08-13 12:41 ` Victor Do Nascimento
  2024-08-13 12:41 ` [PATCH V2 08/10] rs6000: Adjust altivec " Victor Do Nascimento
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Victor Do Nascimento @ 2024-08-13 12:41 UTC (permalink / raw)
  To: gcc-patches
  Cc: Tamar.Christina, claziss, hongtao.liu, syq, bernds_cb1, aldyh,
	Victor Do Nascimento

Following the migration of the dot_prod optab from a direct to a
conversion-type optab, ensure all back-end patterns incorporate the
second machine mode into pattern names.

gcc/ChangeLog:

	* config/mips/loongson-mmi.md (sdot_prodv4hi): Renamed to...
	(sdot_prodv2siv4hi): ...this.
---
 gcc/config/mips/loongson-mmi.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/mips/loongson-mmi.md b/gcc/config/mips/loongson-mmi.md
index dd166bfa4c9..4d958730139 100644
--- a/gcc/config/mips/loongson-mmi.md
+++ b/gcc/config/mips/loongson-mmi.md
@@ -394,7 +394,7 @@ (define_insn "loongson_pmaddhw"
   "pmaddhw\t%0,%1,%2"
   [(set_attr "type" "fmul")])
 
-(define_expand "sdot_prodv4hi"
+(define_expand "sdot_prodv2siv4hi"
   [(match_operand:V2SI 0 "register_operand" "")
    (match_operand:V4HI 1 "register_operand" "")
    (match_operand:V4HI 2 "register_operand" "")
-- 
2.34.1



* [PATCH V2 08/10] rs6000: Adjust altivec dot-product backend patterns
  2024-08-13 12:41 [PATCH V2 00/10] optabs: Make all `*dot_prod_optab's modeled as conversions Victor Do Nascimento
                   ` (6 preceding siblings ...)
  2024-08-13 12:41 ` [PATCH V2 07/10] mips: " Victor Do Nascimento
@ 2024-08-13 12:41 ` Victor Do Nascimento
  2024-08-13 12:41 ` [PATCH V2 09/10] c6x: Adjust " Victor Do Nascimento
  2024-08-13 12:41 ` [PATCH V2 10/10] autovectorizer: Test autovectorization of different dot-prod modes Victor Do Nascimento
  9 siblings, 0 replies; 18+ messages in thread
From: Victor Do Nascimento @ 2024-08-13 12:41 UTC (permalink / raw)
  To: gcc-patches
  Cc: Tamar.Christina, claziss, hongtao.liu, syq, bernds_cb1, aldyh,
	Victor Do Nascimento

Following the migration of the dot_prod optab from a direct to a
conversion-type optab, ensure all back-end patterns incorporate the
second machine mode into pattern names.

gcc/ChangeLog:

	* config/rs6000/altivec.md (udot_prod<mode>): Renamed to...
	(udot_prodv4si<mode>): ...this.
	(sdot_prodv8hi): Renamed to...
	(sdot_prodv4siv8hi): ...this.
---
 gcc/config/rs6000/altivec.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 1f5489b974f..0911c1792a8 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -3698,7 +3698,7 @@ (define_expand "neg<mode>2"
     }
 })
 
-(define_expand "udot_prod<mode>"
+(define_expand "udot_prodv4si<mode>"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
         (plus:V4SI (match_operand:V4SI 3 "register_operand" "v")
                    (unspec:V4SI [(match_operand:VIshort 1 "register_operand" "v")  
@@ -3710,7 +3710,7 @@ (define_expand "udot_prod<mode>"
   DONE;
 })
 
-(define_expand "sdot_prodv8hi"
+(define_expand "sdot_prodv4siv8hi"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
         (plus:V4SI (match_operand:V4SI 3 "register_operand" "v")
                    (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
-- 
2.34.1



* [PATCH V2 09/10] c6x:  Adjust dot-product backend patterns
  2024-08-13 12:41 [PATCH V2 00/10] optabs: Make all `*dot_prod_optab's modeled as conversions Victor Do Nascimento
                   ` (7 preceding siblings ...)
  2024-08-13 12:41 ` [PATCH V2 08/10] rs6000: Adjust altivec " Victor Do Nascimento
@ 2024-08-13 12:41 ` Victor Do Nascimento
  2024-08-13 12:41 ` [PATCH V2 10/10] autovectorizer: Test autovectorization of different dot-prod modes Victor Do Nascimento
  9 siblings, 0 replies; 18+ messages in thread
From: Victor Do Nascimento @ 2024-08-13 12:41 UTC (permalink / raw)
  To: gcc-patches
  Cc: Tamar.Christina, claziss, hongtao.liu, syq, bernds_cb1, aldyh,
	Victor Do Nascimento

Following the migration of the dot_prod optab from a direct to a
conversion-type optab, ensure all back-end patterns incorporate the
second machine mode into pattern names.

gcc/ChangeLog:

	* config/c6x/c6x.md (sdot_prodv2hi): Renamed to...
	(sdot_prodsiv2hi): ...this.
---
 gcc/config/c6x/c6x.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/c6x/c6x.md b/gcc/config/c6x/c6x.md
index 5964dd69d0d..ea9ffe8b4e1 100644
--- a/gcc/config/c6x/c6x.md
+++ b/gcc/config/c6x/c6x.md
@@ -3082,7 +3082,7 @@ (define_insn "<shift_code>v2hi3"
 ;; Widening vector multiply and dot product.
 ;; See c6x-mult.md.in for the define_insn patterns
 
-(define_expand "sdot_prodv2hi"
+(define_expand "sdot_prodsiv2hi"
   [(match_operand:SI 0 "register_operand" "")
    (match_operand:V2HI 1 "register_operand" "")
    (match_operand:V2HI 2 "register_operand" "")
-- 
2.34.1



* [PATCH V2 10/10] autovectorizer: Test autovectorization of different dot-prod modes.
  2024-08-13 12:41 [PATCH V2 00/10] optabs: Make all `*dot_prod_optab's modeled as conversions Victor Do Nascimento
                   ` (8 preceding siblings ...)
  2024-08-13 12:41 ` [PATCH V2 09/10] c6x: Adjust " Victor Do Nascimento
@ 2024-08-13 12:41 ` Victor Do Nascimento
  9 siblings, 0 replies; 18+ messages in thread
From: Victor Do Nascimento @ 2024-08-13 12:41 UTC (permalink / raw)
  To: gcc-patches
  Cc: Tamar.Christina, claziss, hongtao.liu, syq, bernds_cb1, aldyh,
	Victor Do Nascimento

From: Victor Do Nascimento <vicdon01@e125768.arm.com>

Given the novel treatment of the dot product optab as a conversion, we
are now able to target different relationships between output modes and
input modes.

This is made clearer by way of example. Previously, on AArch64, the
following loop was vectorizable:

uint32_t udot4(int n, uint8_t* data) {
  uint32_t sum = 0;
  for (int i=0; i<n; i+=1)
    sum += data[i] * data[i];
  return sum;
}

while the following was not:

uint32_t udot2(int n, uint16_t* data) {
  uint32_t sum = 0;
  for (int i=0; i<n; i+=1)
    sum += data[i] * data[i];
  return sum;
}

Under the new treatment of the dot product optab, they are both now
vectorizable.
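
For reference, the difference between the two forms can be sketched in scalar C, under the assumption that each 32-bit accumulator lane absorbs either four adjacent 8-bit products or two adjacent 16-bit products. The helper names `ref_udot4' and `ref_udot2' are hypothetical and are not part of the patch or the testsuite.

```c
#include <stddef.h>
#include <stdint.h>

/* Four-way form: every 32-bit lane accumulates four adjacent products of
   8-bit elements.  */
void
ref_udot4 (uint32_t *acc, const uint8_t *a, const uint8_t *b, size_t lanes)
{
  for (size_t i = 0; i < lanes; i++)
    for (size_t k = 0; k < 4; k++)
      acc[i] += (uint32_t) a[4 * i + k] * b[4 * i + k];
}

/* Two-way form: every 32-bit lane accumulates two adjacent products of
   16-bit elements.  */
void
ref_udot2 (uint32_t *acc, const uint16_t *a, const uint16_t *b, size_t lanes)
{
  for (size_t i = 0; i < lanes; i++)
    for (size_t k = 0; k < 2; k++)
      acc[i] += (uint32_t) a[2 * i + k] * b[2 * i + k];
}
```

Both helpers produce the same accumulator mode from different input modes, which is the relationship a single-mode pattern name could not express.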

This adds the relevant target-agnostic check to ensure this behaviour
in the autovectorizer, gated behind the new check_effective_target
`vect_dotprod_twoway', as well as a runtime check targeting aarch64.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp (check_effective_target_vect_dotprod_twoway):
	New.
	* gcc.dg/vect/vect-dotprod-twoway.c: Likewise.
	* gcc.target/aarch64/vect-dotprod-twoway.c: Likewise.
---
 .../gcc.dg/vect/vect-dotprod-twoway.c         | 39 +++++++++++
 .../gcc.target/aarch64/vect-dotprod-twoway.c  | 65 +++++++++++++++++++
 gcc/testsuite/lib/target-supports.exp         |  8 +++
 3 files changed, 112 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-dotprod-twoway.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
new file mode 100644
index 00000000000..ff6a2559dee
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_dotprod_twoway } */
+/* Ensure both the two-way and four-way dot products are autovectorized.  */
+#include <stdint.h>
+
+uint32_t udot4(int n, uint8_t* data) {
+  uint32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+int32_t sdot4(int n, int8_t* data) {
+  int32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+uint32_t udot2(int n, uint16_t* data) {
+  uint32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+int32_t sdot2(int n, int16_t* data) {
+  int32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 4 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-dotprod-twoway.c b/gcc/testsuite/gcc.target/aarch64/vect-dotprod-twoway.c
new file mode 100644
index 00000000000..bac1e1846da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-dotprod-twoway.c
@@ -0,0 +1,65 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_dotprod_twoway } */
+/* { dg-options "-march=armv8-a+sme2 -static -O3 -ftree-vectorize -fdump-tree-vect-details -save-temps" } */
+/* Ensure runtime correctness in the autovectorized two-way dot product operations.  */
+
+#include <stdint.h>
+#include <stdlib.h>
+
+uint32_t
+udot2 (int n, uint16_t* data)  __arm_streaming
+{
+  uint32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+int32_t
+sdot2 (int n, int16_t* data)  __arm_streaming
+{
+  int32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+int
+main ()
+{
+
+  uint16_t u_input_nil[] = { [0 ... 3] = 0 };
+  uint16_t u_input_min[] = { [0 ... 3] = 1 };
+  uint16_t u_input_max[] = { [0 ... 3] = 32767};
+
+  uint32_t u_nil_dotprod = udot2 (4, u_input_nil);
+  uint32_t u_min_dotprod = udot2 (4, u_input_min);
+  uint32_t u_max_dotprod = udot2 (4, u_input_max);
+
+  if (u_nil_dotprod != 0
+      || u_min_dotprod != 4
+      || u_max_dotprod != 4294705156)
+    abort ();
+
+  int16_t s_input_nil[] = { [0 ... 3] = 0 };
+  int16_t s_input_min[] = { [0 ... 3] = -23170 };
+  int16_t s_input_max[] = { [0 ... 3] =  23170 };
+
+  int32_t s_nil_dotprod = sdot2 (4, s_input_nil);
+  int32_t s_min_dotprod = sdot2 (4, s_input_min);
+  int32_t s_max_dotprod = sdot2 (4, s_input_max);
+
+  if (s_nil_dotprod != 0
+      || s_min_dotprod != 2147395600
+      || s_max_dotprod != 2147395600)
+      abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 46 "vect" } } */
+/* { dg-final { scan-assembler "\[ \t\]udot\tz\[0-9\]+.s, z\[0-9\]+.h, z\[0-9\]+.h" } } */
+/* { dg-final { scan-assembler "\[ \t\]sdot\tz\[0-9\]+.s, z\[0-9\]+.h, z\[0-9\]+.h" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 11ba77ca404..41618d399a3 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4258,6 +4258,14 @@ proc check_effective_target_vect_int { } {
 	}}]
 }
 
+# Return 1 if the target supports two-way dot products, or 0 otherwise.
+
+proc check_effective_target_vect_dotprod_twoway { } {
+    return [check_cached_effective_target_indexed aarch64_sme2 {
+	expr { [check_effective_target_aarch64_sme2]
+    }}]
+}
+
 # Return 1 if the target supports vectorization of early breaks,
 # 0 otherwise.
 #
-- 
2.34.1



* RE: [PATCH V2 05/10] i386: Fix dot_prod backend patterns for mmx and sse targets
  2024-08-13 12:41 ` [PATCH V2 05/10] i386: Fix dot_prod backend patterns for mmx and sse targets Victor Do Nascimento
@ 2024-08-14  1:24   ` Liu, Hongtao
  0 siblings, 0 replies; 18+ messages in thread
From: Liu, Hongtao @ 2024-08-14  1:24 UTC (permalink / raw)
  To: Victor Do Nascimento, gcc-patches
  Cc: Tamar.Christina, claziss, syq, bernds_cb1, aldyh



> -----Original Message-----
> From: Victor Do Nascimento <victor.donascimento@arm.com>
> Sent: Tuesday, August 13, 2024 8:42 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Tamar.Christina@arm.com; claziss@gmail.com; Liu, Hongtao
> <hongtao.liu@intel.com>; syq@gcc.gnu.org; bernds_cb1@t-online.de;
> aldyh@redhat.com; Victor Do Nascimento <victor.donascimento@arm.com>
> Subject: [PATCH V2 05/10] i386: Fix dot_prod backend patterns for mmx and
> sse targets
> 
> Following the migration of the dot_prod optab from a direct to a conversion-
> type optab, ensure all back-end patterns incorporate the second machine
> mode into pattern names.

Ok.


* RE: [PATCH V2 02/10] autovectorizer: Add basic support for convert optabs
  2024-08-13 12:41 ` [PATCH V2 02/10] autovectorizer: Add basic support for convert optabs Victor Do Nascimento
@ 2024-08-14 12:24   ` Tamar Christina
  2024-08-14 13:00     ` Victor Do Nascimento
  2024-08-14 14:10     ` Victor Do Nascimento
  0 siblings, 2 replies; 18+ messages in thread
From: Tamar Christina @ 2024-08-14 12:24 UTC (permalink / raw)
  To: Victor Do Nascimento, gcc-patches
  Cc: claziss, hongtao.liu, syq, bernds_cb1, aldyh, Victor Do Nascimento

Hi Victor,

> -----Original Message-----
> From: Victor Do Nascimento <victor.donascimento@arm.com>
> Sent: Tuesday, August 13, 2024 1:42 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Tamar Christina <Tamar.Christina@arm.com>; claziss@gmail.com;
> hongtao.liu@intel.com; syq@gcc.gnu.org; bernds_cb1@t-online.de;
> aldyh@redhat.com; Victor Do Nascimento <Victor.DoNascimento@arm.com>
> Subject: [PATCH V2 02/10] autovectorizer: Add basic support for convert optabs
> 
> Given the shift from modeling dot products as direct optabs to
> treating them as conversion optabs, we make necessary changes to the
> autovectorizer code to ensure that given the relevant tree code,
> together with the input and output data modes, we can retrieve the
> relevant optab and subsequently the insn_code for it.
> 
> gcc/ChangeLog:
> 
> 	* gimple-match-exports.cc (directly_supported_p): Add overload
> 	for conversion-type optabs.
> 	* gimple-match.h (directly_supported_p): Add new function
> 	prototype.
> 	* optabs.cc (expand_widen_pattern_expr): Make the
> 	DOT_PROD_EXPR tree code use `find_widening_optab_handler' to
> 	retrieve icode.
> 	* tree-vect-loop.cc (vect_is_emulated_mixed_dot_prod): make it
> 	call conversion-type overloaded `directly_supported_p'.
> 	* tree-vect-patterns.cc (vect_supportable_conv_optab_p): New.
> 	(vect_recog_dot_prod_pattern): s/direct/conv/ in call to
> 	`vect_supportable_direct_optab_p'.
> ---
>  gcc/gimple-match-exports.cc | 23 ++++++++++++++++++++
>  gcc/gimple-match.h          |  2 ++
>  gcc/optabs.cc               |  3 ++-
>  gcc/tree-vect-loop.cc       |  1 +
>  gcc/tree-vect-patterns.cc   | 43 +++++++++++++++++++++++++++++++++++--
>  5 files changed, 69 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
> index aacf3ff0414..d18497e7c83 100644
> --- a/gcc/gimple-match-exports.cc
> +++ b/gcc/gimple-match-exports.cc
> @@ -1381,6 +1381,29 @@ directly_supported_p (code_helper code, tree type,
> optab_subtype query_type)
>  	  && direct_internal_fn_supported_p (ifn, type, OPTIMIZE_FOR_SPEED));
>  }
> 
> +/* As above, overloading the function for conversion-type optabs.  */
> +bool
> +directly_supported_p (code_helper code, tree type_out, tree type_in,
> +		      optab_subtype query_type)
> +{
> +  if (code.is_tree_code ())
> +    {
> +      convert_optab optab = optab_for_tree_code (tree_code (code), type_in,
> +						query_type);
> +      return (optab != unknown_optab
> +	      && convert_optab_handler (optab, TYPE_MODE (type_out),
> +					TYPE_MODE (type_in)) !=
> CODE_FOR_nothing);
> +    }
> +  gcc_assert (query_type == optab_default
> +	      || (query_type == optab_vector && VECTOR_TYPE_P (type_in))
> +	      || (query_type == optab_scalar && !VECTOR_TYPE_P (type_in)));
> +  internal_fn ifn = associated_internal_fn (combined_fn (code), type_in);
> +  return (direct_internal_fn_p (ifn)
> +	  && direct_internal_fn_supported_p (ifn, tree_pair (type_out, type_in),
> +					     OPTIMIZE_FOR_SPEED));
> +}
> +
> +
>  /* A wrapper around the internal-fn.cc versions of get_conditional_internal_fn
>     for a code_helper CODE operating on type TYPE.  */
> 
> diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
> index d710fcbace2..0333a5db00a 100644
> --- a/gcc/gimple-match.h
> +++ b/gcc/gimple-match.h
> @@ -419,6 +419,8 @@ code_helper canonicalize_code (code_helper, tree);
> 
>  #ifdef GCC_OPTABS_TREE_H
>  bool directly_supported_p (code_helper, tree, optab_subtype = optab_default);
> +bool directly_supported_p (code_helper, tree, tree,
> +			   optab_subtype = optab_default);
>  #endif
> 
>  internal_fn get_conditional_internal_fn (code_helper, tree);
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index 185c5b1a705..32737fb80e8 100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -317,7 +317,8 @@ expand_widen_pattern_expr (const_sepops ops, rtx op0,
> rtx op1, rtx wide_op,
>      widen_pattern_optab
>        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
>    if (ops->code == WIDEN_MULT_PLUS_EXPR
> -      || ops->code == WIDEN_MULT_MINUS_EXPR)
> +      || ops->code == WIDEN_MULT_MINUS_EXPR
> +      || ops->code == DOT_PROD_EXPR)
>      icode = find_widening_optab_handler (widen_pattern_optab,
>  					 TYPE_MODE (TREE_TYPE (ops->op2)),
>  					 tmode0);
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 6456220cdc9..5f3de7b72a8 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -5289,6 +5289,7 @@ vect_is_emulated_mixed_dot_prod (stmt_vec_info
> stmt_info)
> 
>    gcc_assert (STMT_VINFO_REDUC_VECTYPE_IN (stmt_info));
>    return !directly_supported_p (DOT_PROD_EXPR,
> +				STMT_VINFO_VECTYPE (stmt_info),
>  				STMT_VINFO_REDUC_VECTYPE_IN (stmt_info),
>  				optab_vector_mixed_sign);
>  }
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index f52de2b6972..3afedc9199b 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -250,6 +250,45 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree
> otype, tree_code code,
>    return true;
>  }
> 
> +/* Return true if the target supports a vector version of CODE,
> +   where CODE is known to map to a conversion optab with the given SUBTYPE.
> +   ITYPE specifies the type of (some of) the scalar inputs and OTYPE
> +   specifies the type of the scalar result.
> +
> +   When returning true, set *VECOTYPE_OUT to the vector version of OTYPE.
> +   Also set *VECITYPE_OUT to the vector version of ITYPE if VECITYPE_OUT
> +   is nonnull.  */
> +
> +static bool
> +vect_supportable_conv_optab_p (vec_info *vinfo, tree otype, tree_code code,
> +				 tree itype, tree *vecotype_out,
> +				 tree *vecitype_out = NULL,
> +				 enum optab_subtype subtype = optab_default)
> +{
> +  tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
> +  tree vecotype = get_vectype_for_scalar_type (vinfo, otype);
> +  if (!vecitype || !vecotype)
> +    return false;
> +
> +  optab optab = optab_for_tree_code (code, vecitype, subtype);
> +  if (!optab)
> +    return false;
> +
> +  insn_code icode = convert_optab_handler (optab, TYPE_MODE (vecotype),
> +					   TYPE_MODE (vecitype));
> +
> +  if (icode == CODE_FOR_nothing
> +      || insn_data[icode].operand[0].mode != TYPE_MODE (vecotype)
> +      || insn_data[icode].operand[1].mode != TYPE_MODE (vecitype))
> +    return false;
> +
> +  *vecotype_out = vecotype;
> +  if (vecitype_out)
> +    *vecitype_out = vecitype;
> +  return true;
> +}

You never responded to the previous review for this change, so I have the same question.
You've now added directly_supported_p, which takes a convert optab and calls convert_optab_handler,
which checks both modes.

So why does this function not use it?  It seems to be doing duplicate work and limits itself unnecessarily to tree_code.
It seems to me that this should take a code_helper, create the vector modes and call directly_supported_p, or am I missing something?

Thanks,
Tamar

> +
> +
>  /* Round bit precision PRECISION up to a full element.  */
> 
>  static unsigned int
> @@ -1270,13 +1309,13 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>      half_type = signed_type_for (half_type);
> 
>    tree half_vectype;
> -  if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
> +  if (!vect_supportable_conv_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
>  					type_out, &half_vectype, subtype))
>      {
>        /* We can emulate a mixed-sign dot-product using a sequence of
>  	 signed dot-products; see vect_emulate_mixed_dot_prod for details.  */
>        if (subtype != optab_vector_mixed_sign
> -	  || !vect_supportable_direct_optab_p (vinfo, signed_type_for (type),
> +	  || !vect_supportable_conv_optab_p (vinfo, signed_type_for (type),
>  					       DOT_PROD_EXPR, half_type,
>  					       type_out, &half_vectype,
>  					       optab_vector))
> --
> 2.34.1



* Re: [PATCH V2 02/10] autovectorizer: Add basic support for convert optabs
  2024-08-14 12:24   ` Tamar Christina
@ 2024-08-14 13:00     ` Victor Do Nascimento
  2024-08-14 14:10     ` Victor Do Nascimento
  1 sibling, 0 replies; 18+ messages in thread
From: Victor Do Nascimento @ 2024-08-14 13:00 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches; +Cc: claziss, hongtao.liu, syq, bernds_cb1, aldyh

On 8/14/24 13:24, Tamar Christina wrote:
> Hi Victor,
> 
>> -----Original Message-----
>> From: Victor Do Nascimento <victor.donascimento@arm.com>
>> Sent: Tuesday, August 13, 2024 1:42 PM
>> To: gcc-patches@gcc.gnu.org
>> Cc: Tamar Christina <Tamar.Christina@arm.com>; claziss@gmail.com;
>> hongtao.liu@intel.com; syq@gcc.gnu.org; bernds_cb1@t-online.de;
>> aldyh@redhat.com; Victor Do Nascimento <Victor.DoNascimento@arm.com>
>> Subject: [PATCH V2 02/10] autovectorizer: Add basic support for convert optabs
>>
>> Given the shift from modeling dot products as direct optabs to
>> treating them as conversion optabs, we make necessary changes to the
>> autovectorizer code to ensure that given the relevant tree code,
>> together with the input and output data modes, we can retrieve the
>> relevant optab and subsequently the insn_code for it.
>>
>> gcc/ChangeLog:
>>
>> 	* gimple-match-exports.cc (directly_supported_p): Add overload
>> 	for conversion-type optabs.
>> 	* gimple-match.h (directly_supported_p): Add new function
>> 	prototype.
>> 	* optabs.cc (expand_widen_pattern_expr): Make the
>> 	DOT_PROD_EXPR tree code use `find_widening_optab_handler' to
>> 	retrieve icode.
>> 	* tree-vect-loop.cc (vect_is_emulated_mixed_dot_prod): make it
>> 	call conversion-type overloaded `directly_supported_p'.
>> 	* tree-vect-patterns.cc (vect_supportable_conv_optab_p): New.
>> 	(vect_recog_dot_prod_pattern): s/direct/conv/ in call to
>> 	`vect_supportable_direct_optab_p'.
>> ---
>>   gcc/gimple-match-exports.cc | 23 ++++++++++++++++++++
>>   gcc/gimple-match.h          |  2 ++
>>   gcc/optabs.cc               |  3 ++-
>>   gcc/tree-vect-loop.cc       |  1 +
>>   gcc/tree-vect-patterns.cc   | 43 +++++++++++++++++++++++++++++++++++--
>>   5 files changed, 69 insertions(+), 3 deletions(-)
>>
>> diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
>> index aacf3ff0414..d18497e7c83 100644
>> --- a/gcc/gimple-match-exports.cc
>> +++ b/gcc/gimple-match-exports.cc
>> @@ -1381,6 +1381,29 @@ directly_supported_p (code_helper code, tree type,
>> optab_subtype query_type)
>>   	  && direct_internal_fn_supported_p (ifn, type, OPTIMIZE_FOR_SPEED));
>>   }
>>
>> +/* As above, overloading the function for conversion-type optabs.  */
>> +bool
>> +directly_supported_p (code_helper code, tree type_out, tree type_in,
>> +		      optab_subtype query_type)
>> +{
>> +  if (code.is_tree_code ())
>> +    {
>> +      convert_optab optab = optab_for_tree_code (tree_code (code), type_in,
>> +						query_type);
>> +      return (optab != unknown_optab
>> +	      && convert_optab_handler (optab, TYPE_MODE (type_out),
>> +					TYPE_MODE (type_in)) !=
>> CODE_FOR_nothing);
>> +    }
>> +  gcc_assert (query_type == optab_default
>> +	      || (query_type == optab_vector && VECTOR_TYPE_P (type_in))
>> +	      || (query_type == optab_scalar && !VECTOR_TYPE_P (type_in)));
>> +  internal_fn ifn = associated_internal_fn (combined_fn (code), type_in);
>> +  return (direct_internal_fn_p (ifn)
>> +	  && direct_internal_fn_supported_p (ifn, tree_pair (type_out, type_in),
>> +					     OPTIMIZE_FOR_SPEED));
>> +}
>> +
>> +
>>   /* A wrapper around the internal-fn.cc versions of get_conditional_internal_fn
>>      for a code_helper CODE operating on type TYPE.  */
>>
>> diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
>> index d710fcbace2..0333a5db00a 100644
>> --- a/gcc/gimple-match.h
>> +++ b/gcc/gimple-match.h
>> @@ -419,6 +419,8 @@ code_helper canonicalize_code (code_helper, tree);
>>
>>   #ifdef GCC_OPTABS_TREE_H
>>   bool directly_supported_p (code_helper, tree, optab_subtype = optab_default);
>> +bool directly_supported_p (code_helper, tree, tree,
>> +			   optab_subtype = optab_default);
>>   #endif
>>
>>   internal_fn get_conditional_internal_fn (code_helper, tree);
>> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
>> index 185c5b1a705..32737fb80e8 100644
>> --- a/gcc/optabs.cc
>> +++ b/gcc/optabs.cc
>> @@ -317,7 +317,8 @@ expand_widen_pattern_expr (const_sepops ops, rtx op0,
>> rtx op1, rtx wide_op,
>>       widen_pattern_optab
>>         = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
>>     if (ops->code == WIDEN_MULT_PLUS_EXPR
>> -      || ops->code == WIDEN_MULT_MINUS_EXPR)
>> +      || ops->code == WIDEN_MULT_MINUS_EXPR
>> +      || ops->code == DOT_PROD_EXPR)
>>       icode = find_widening_optab_handler (widen_pattern_optab,
>>   					 TYPE_MODE (TREE_TYPE (ops->op2)),
>>   					 tmode0);
>> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
>> index 6456220cdc9..5f3de7b72a8 100644
>> --- a/gcc/tree-vect-loop.cc
>> +++ b/gcc/tree-vect-loop.cc
>> @@ -5289,6 +5289,7 @@ vect_is_emulated_mixed_dot_prod (stmt_vec_info
>> stmt_info)
>>
>>     gcc_assert (STMT_VINFO_REDUC_VECTYPE_IN (stmt_info));
>>     return !directly_supported_p (DOT_PROD_EXPR,
>> +				STMT_VINFO_VECTYPE (stmt_info),
>>   				STMT_VINFO_REDUC_VECTYPE_IN (stmt_info),
>>   				optab_vector_mixed_sign);
>>   }
>> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
>> index f52de2b6972..3afedc9199b 100644
>> --- a/gcc/tree-vect-patterns.cc
>> +++ b/gcc/tree-vect-patterns.cc
>> @@ -250,6 +250,45 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree
>> otype, tree_code code,
>>     return true;
>>   }
>>
>> +/* Return true if the target supports a vector version of CODE,
>> +   where CODE is known to map to a conversion optab with the given SUBTYPE.
>> +   ITYPE specifies the type of (some of) the scalar inputs and OTYPE
>> +   specifies the type of the scalar result.
>> +
>> +   When returning true, set *VECOTYPE_OUT to the vector version of OTYPE.
>> +   Also set *VECITYPE_OUT to the vector version of ITYPE if VECITYPE_OUT
>> +   is nonnull.  */
>> +
>> +static bool
>> +vect_supportable_conv_optab_p (vec_info *vinfo, tree otype, tree_code code,
>> +				 tree itype, tree *vecotype_out,
>> +				 tree *vecitype_out = NULL,
>> +				 enum optab_subtype subtype = optab_default)
>> +{
>> +  tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
>> +  tree vecotype = get_vectype_for_scalar_type (vinfo, otype);
>> +  if (!vecitype || !vecotype)
>> +    return false;
>> +
>> +  optab optab = optab_for_tree_code (code, vecitype, subtype);
>> +  if (!optab)
>> +    return false;
>> +
>> +  insn_code icode = convert_optab_handler (optab, TYPE_MODE (vecotype),
>> +					   TYPE_MODE (vecitype));
>> +
>> +  if (icode == CODE_FOR_nothing
>> +      || insn_data[icode].operand[0].mode != TYPE_MODE (vecotype)
>> +      || insn_data[icode].operand[1].mode != TYPE_MODE (vecitype))
>> +    return false;
>> +
>> +  *vecotype_out = vecotype;
>> +  if (vecitype_out)
>> +    *vecitype_out = vecitype;
>> +  return true;
>> +}
> 
> You never responded to the previous review for this change, so I have the same question.
> You've now added directly_supported_p, which takes a convert optab and calls convert_optab_handler,
> which checks both modes.

Darn, it seems I missed the point of the feedback in your previous
review; I overlooked it on my return from the summer break.

> So why does this function not use it?  It seems to be doing duplicate work and limits itself unnecessarily to tree_code.
> It seems to me that this should take a code_helper, create the vector modes and call directly_supported_p, or am I missing something?

Yes, there clearly is unnecessary code duplication we can do without and 
your approach does away with that quite nicely.  The use of the 
code_helper also makes for an easily-implementable improvement over the 
current way of handling things.

Thanks for taking the time to look over the code, greatly appreciated.

Cheers,
Victor

> Thanks,
> Tamar
> 
>> +
>> +
>>   /* Round bit precision PRECISION up to a full element.  */
>>
>>   static unsigned int
>> @@ -1270,13 +1309,13 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>>       half_type = signed_type_for (half_type);
>>
>>     tree half_vectype;
>> -  if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
>> +  if (!vect_supportable_conv_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
>>   					type_out, &half_vectype, subtype))
>>       {
>>         /* We can emulate a mixed-sign dot-product using a sequence of
>>   	 signed dot-products; see vect_emulate_mixed_dot_prod for details.  */
>>         if (subtype != optab_vector_mixed_sign
>> -	  || !vect_supportable_direct_optab_p (vinfo, signed_type_for (type),
>> +	  || !vect_supportable_conv_optab_p (vinfo, signed_type_for (type),
>>   					       DOT_PROD_EXPR, half_type,
>>   					       type_out, &half_vectype,
>>   					       optab_vector))
>> --
>> 2.34.1
> 


* Re: [PATCH V2 02/10] autovectorizer: Add basic support for convert optabs
  2024-08-14 12:24   ` Tamar Christina
  2024-08-14 13:00     ` Victor Do Nascimento
@ 2024-08-14 14:10     ` Victor Do Nascimento
  1 sibling, 0 replies; 18+ messages in thread
From: Victor Do Nascimento @ 2024-08-14 14:10 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches; +Cc: claziss, hongtao.liu, syq, bernds_cb1, aldyh

On 8/14/24 13:24, Tamar Christina wrote:

> It seems to me that this should take a code_helper, create the vector modes and call directly_supported_p, or am I missing something?

Ok. Having done some digging around in the git history, I see that 
`vect_supportable_direct_optab_p', upon which I based my implementation 
of `vect_supportable_conv_optab_p', was committed before you wrote the 
`directly_supported_p' function, which I guess explains why 
`vect_supportable_direct_optab_p' does not take advantage of the latter 
function to avoid code duplication.

I'd wrongly presumed that we'd have been aware of the existence of 
`directly_supported_p' when writing `vect_supportable_direct_optab_p', 
such that any apparent code duplication would have been a conscious 
choice by the author, thus making it a relevant design consideration in 
my own implementation.

Anyway, will submit updated patch shortly.

Cheers,
Victor


* Re: [PATCH V2 01/10] optabs: Make all `*dot_prod_optab's modeled as conversions
  2024-08-13 12:41 ` [PATCH V2 01/10] " Victor Do Nascimento
@ 2024-08-15  8:11   ` Richard Sandiford
  0 siblings, 0 replies; 18+ messages in thread
From: Richard Sandiford @ 2024-08-15  8:11 UTC (permalink / raw)
  To: Victor Do Nascimento
  Cc: gcc-patches, Tamar.Christina, claziss, hongtao.liu, syq,
	bernds_cb1, aldyh

Victor Do Nascimento <victor.donascimento@arm.com> writes:
> Given the specification in the GCC internals manual defines the
> {u|s}dot_prod<m> standard name as taking "two signed elements of the
> same mode, adding them to a third operand of wider mode", there is
> currently ambiguity in the relationship between the mode of the first
> two arguments and that of the third.
>
> This vagueness means that, in theory, different modes may be
> supportable in the third argument.  This flexibility would allow for a
> given backend to add to the accumulator a different number of
> vectorized products, e.g. A backend may provide instructions for both:
>
>   accum += a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]
>
> and
>
>   accum += a[0] * b[0] + a[1] * b[1],
>
> as is now seen in the SVE2.1 extension to AArch64.  In spite of the
> aforementioned flexibility, modeling the dot-product operation as a
> direct optab means that we have no way to encode both input and the
> accumulator data modes into the backend pattern name, which prevents
> us from harnessing this flexibility.
>
> We therefore make all dot_prod optabs conversions, allowing, for
> example, for the encoding of both 2-way and 4-way dot product backend
> patterns.
>
> gcc/ChangeLog:
>
> 	* optabs.def (sdot_prod_optab): Convert from OPTAB_D to
> 	OPTAB_CD.
> 	(udot_prod_optab): Likewise.
> 	(usdot_prod_optab): Likewise.
> 	* doc/md.texi (Standard Names): update entries for u,s and us
> 	dot_prod names.
> ---
>  gcc/doc/md.texi | 46 +++++++++++++++++++++-------------------------
>  gcc/optabs.def  |  6 +++---
>  2 files changed, 24 insertions(+), 28 deletions(-)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 5dc0d55edd6..aa1181a3320 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5760,15 +5760,14 @@ for (i = 0; i < LEN + BIAS; i++)
>      operand0 += operand2[i];
>  @end smallexample
>  
> -@cindex @code{sdot_prod@var{m}} instruction pattern
> -@item @samp{sdot_prod@var{m}}
> -
> -Compute the sum of the products of two signed elements.
> -Operand 1 and operand 2 are of the same mode. Their
> -product, which is of a wider mode, is computed and added to operand 3.
> -Operand 3 is of a mode equal or wider than the mode of the product. The
> -result is placed in operand 0, which is of the same mode as operand 3.
> -@var{m} is the mode of operand 1 and operand 2.
> +@cindex @code{sdot_prod@var{m}@var{n}} instruction pattern
> +@item @samp{sdot_prod@var{m}@var{n}}
> +
> +Multiply operand 1 by operand 2 without loss of precision, given that
> +both operands contain signed elements.  Add each product to the overlapping
> +element of operand 3 and store the result in operand 0.  Operands 0 and 3
> +have mode @var{m} and operands 1 and 2 have mode @var{n}, with @var{n}
> +having narrower elements than @var{m}.
>  
>  Semantically the expressions perform the multiplication in the following signs
>  
> @@ -5778,15 +5777,14 @@ sdot<signed op0, signed op1, signed op2, signed op3> ==
>  @dots{}
>  @end smallexample
>  
> -@cindex @code{udot_prod@var{m}} instruction pattern
> -@item @samp{udot_prod@var{m}}
> +@cindex @code{udot_prod@var{m}@var{n}} instruction pattern
> +@item @samp{udot_prod@var{m}@var{n}}
>  
> -Compute the sum of the products of two unsigned elements.
> -Operand 1 and operand 2 are of the same mode. Their
> -product, which is of a wider mode, is computed and added to operand 3.
> -Operand 3 is of a mode equal or wider than the mode of the product. The
> -result is placed in operand 0, which is of the same mode as operand 3.
> -@var{m} is the mode of operand 1 and operand 2.
> +Multiply operand 1 by operand 2 without loss of precision, given that
> +both operands contain unsigned elements.  Add each product to the overlapping
> +element of operand 3 and store the result in operand 0.  Operands 0 and 3
> +have mode @var{m} and operands 1 and 2 have mode @var{n}, with @var{n}
> +having narrower elements than @var{m}.
>  
>  Semantically the expressions perform the multiplication in the following signs
>  
> @@ -5796,14 +5794,12 @@ udot<unsigned op0, unsigned op1, unsigned op2, unsigned op3> ==
>  @dots{}
>  @end smallexample
>  
> -@cindex @code{usdot_prod@var{m}} instruction pattern
> -@item @samp{usdot_prod@var{m}}
> -Compute the sum of the products of elements of different signs.
> -Operand 1 must be unsigned and operand 2 signed. Their
> -product, which is of a wider mode, is computed and added to operand 3.
> -Operand 3 is of a mode equal or wider than the mode of the product. The
> -result is placed in operand 0, which is of the same mode as operand 3.
> -@var{m} is the mode of operand 1 and operand 2.
> +@cindex @code{usdot_prod@var{m}@var{n}} instruction pattern
> +@item @samp{usdot_prod@var{m}@var{n}}
> +Multiply operand 1 by operand 2.  Add each product to the overlapping

The new paragraph drops the information that operand 1 is unsigned and
operand 2 is signed.  Maybe change this sentence to:

  Multiply operand 1 by operand 2 without loss of precision, given that
  operand 1 is unsigned and operand 2 is signed.

OK with that change, thanks.

Richard

> +element of operand 3 and store the result in operand 0.  Operands 0 and 3
> +have mode @var{m} and operands 1 and 2 have mode @var{n}, with @var{n}
> +having narrower elements than @var{m}.
>  
>  Semantically the expressions perform the multiplication in the following signs
>  
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 58a939442bd..ba860144d8b 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -110,6 +110,9 @@ OPTAB_CD(mask_scatter_store_optab, "mask_scatter_store$a$b")
>  OPTAB_CD(mask_len_scatter_store_optab, "mask_len_scatter_store$a$b")
>  OPTAB_CD(vec_extract_optab, "vec_extract$a$b")
>  OPTAB_CD(vec_init_optab, "vec_init$a$b")
> +OPTAB_CD (sdot_prod_optab, "sdot_prod$I$a$b")
> +OPTAB_CD (udot_prod_optab, "udot_prod$I$a$b")
> +OPTAB_CD (usdot_prod_optab, "usdot_prod$I$a$b")
>  
>  OPTAB_CD (while_ult_optab, "while_ult$a$b")
>  
> @@ -413,10 +416,7 @@ OPTAB_D (savg_floor_optab, "avg$a3_floor")
>  OPTAB_D (uavg_floor_optab, "uavg$a3_floor")
>  OPTAB_D (savg_ceil_optab, "avg$a3_ceil")
>  OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
> -OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
>  OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
> -OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> -OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
>  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
>  OPTAB_D (usad_optab, "usad$I$a")
>  OPTAB_D (ssad_optab, "ssad$I$a")
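
For readers following the usdot_prod wording discussed above, here is a minimal scalar sketch of the semantics, assuming a four-way form with 8-bit inputs and 32-bit accumulators.  The function name `usdot_prod_ref' and the use of std::array are illustrative assumptions, not part of the patch; the only points taken from the thread are that operand 1 is unsigned, operand 2 is signed, and each product is added to the overlapping element of operand 3.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of a four-way usdot_prod: operand 1 holds
// unsigned 8-bit elements, operand 2 holds signed 8-bit elements, and
// operands 0 and 3 hold signed 32-bit elements.
template <std::size_t Lanes>
std::array<int32_t, Lanes>
usdot_prod_ref (const std::array<uint8_t, 4 * Lanes> &op1,
                const std::array<int8_t, 4 * Lanes> &op2,
                const std::array<int32_t, Lanes> &op3)
{
  std::array<int32_t, Lanes> op0 = op3;
  for (std::size_t i = 0; i < 4 * Lanes; ++i)
    {
      // Widen both inputs before multiplying so the product is exact.
      int32_t prod = static_cast<int32_t> (op1[i])
                     * static_cast<int32_t> (op2[i]);
      // Narrow element i overlaps accumulator element i / 4.
      op0[i / 4] += prod;
    }
  return op0;
}
```

Swapping the signedness of the two inputs changes the result, which is why the documentation needs to say which operand is unsigned.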


* Re: [PATCH V2 03/10] aarch64: Fix aarch64 backend-use of (u|s|us)dot_prod patterns
  2024-08-13 12:41 ` [PATCH V2 03/10] aarch64: Fix aarch64 backend-use of (u|s|us)dot_prod patterns Victor Do Nascimento
@ 2024-08-15  8:26   ` Richard Sandiford
  2024-08-15  9:45     ` Victor Do Nascimento
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Sandiford @ 2024-08-15  8:26 UTC (permalink / raw)
  To: Victor Do Nascimento
  Cc: gcc-patches, Tamar.Christina, claziss, hongtao.liu, syq,
	bernds_cb1, aldyh

Victor Do Nascimento <victor.donascimento@arm.com> writes:
> Given recent changes to the dot_prod standard pattern name, this patch
> fixes the aarch64 back-end by implementing the following changes:
>
> 1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files.
> 2. Rewrite initialization and function expansion mechanism for simd
> builtins.
> 3. Fix all direct calls to back-end `dot_prod' patterns in SVE
> builtins.
>
> Finally, given that it is now possible for the compiler to
> differentiate between the two- and four-way dot product, we add a test
> to ensure that autovectorization picks up on dot-product patterns
> where the result is twice the width of the operands.
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64-simd.md
> 	(<sur>dot_prod<vsi2qi><vczle><vczbe>): Renamed to...
> 	(<sur>dot_prod<mode><vsi2qi><vczle><vczbe>): ...this.
> 	(usdot_prod<vsi2qi><vczle><vczbe>): Renamed to...
> 	(usdot_prod<mode><vsi2qi><vczle><vczbe>): ...this.
> 	(<su>sadv16qi): Adjust call to gen_udot_prod to take second mode.
> 	(popcount<mode2>): Fix use of `udot_prod_optab'.
> 	* gcc/config/aarch64/aarch64-sve.md
> 	(<sur>dot_prod<vsi2qi>): Renamed to...
> 	(<sur>dot_prod<mode><vsi2qi>): ...this.
> 	(@<sur>dot_prod<vsi2qi>): Renamed to...
> 	(@<sur>dot_prod<mode><vsi2qi>): ...this.
> 	(<su>sad<vsi2qi>): Adjust call to gen_udot_prod to take second mode.
> 	* gcc/config/aarch64/aarch64-sve2.md
> 	(@aarch64_sve_<sur>dotvnx4sivnx8hi): Renamed to...
> 	(<sur>dot_prodvnx4sivnx8hi): ...this.
> 	* config/aarch64/aarch64-simd-builtins.def: Modify macro
> 	expansion-based initialization and expansion
> 	of (u|s|us)dot_prod builtins.
> 	* config/aarch64/aarch64-sve-builtins-base.cc
> 	(svdot_impl::expand): s/direct/convert/ in
> 	`convert_optab_handler_for_sign' function call.
> 	(svusdot_impl::expand): Add second mode argument in call to
> 	`code_for_dot_prod'.
> 	* config/aarch64/aarch64-sve-builtins.cc
> 	(function_expander::convert_optab_handler_for_sign): New class
> 	method.
> 	* config/aarch64/aarch64-sve-builtins.h
> 	(class function_expander): Add prototype for new
> 	`convert_optab_handler_for_sign' method.
>
> gcc/testsuite/ChangeLog:
> 	* gcc.target/aarch64/sme/vect-dotprod-twoway.c (udot2): New.

Could you run the patch through contrib/check_GNU_style.py to catch
the long lines?

> ---
>  gcc/config/aarch64/aarch64-builtins.cc        |  7 ++++++
>  gcc/config/aarch64/aarch64-simd-builtins.def  |  6 ++---
>  gcc/config/aarch64/aarch64-simd.md            |  9 ++++---
>  .../aarch64/aarch64-sve-builtins-base.cc      | 13 +++++-----
>  gcc/config/aarch64/aarch64-sve-builtins.cc    | 17 +++++++++++++
>  gcc/config/aarch64/aarch64-sve-builtins.h     |  3 +++
>  gcc/config/aarch64/aarch64-sve.md             |  6 ++---
>  gcc/config/aarch64/aarch64-sve2.md            |  2 +-
>  .../aarch64/sme/vect-dotprod-twoway.c         | 25 +++++++++++++++++++
>  9 files changed, 71 insertions(+), 17 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
> [...]
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
> index 0a560eaedca..975eca0bbd6 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -3745,6 +3745,23 @@ function_expander::direct_optab_handler_for_sign (optab signed_op,
>    return ::direct_optab_handler (op, mode);
>  }
>  
> +/* Choose between signed and unsigned convert optabs SIGNED_OP and
> +   UNSIGNED_OP based on the signedness of type suffix SUFFIX_I, then
> +   pick the appropriate optab handler for the mode.  Use MODE as the
> +   mode if given, otherwise use the mode of type suffix SUFFIX_I.  */

The last sentence needs to be adapted for this function.  Also, because
there is no longer a single mode, I don't think it makes sense to allow
a default.  So how about:

/* Choose between signed and unsigned convert optabs SIGNED_OP and
   UNSIGNED_OP based on the signedness of type suffix SUFFIX_I, then
   pick the appropriate optab handler for "converting" from FROM_MODE
   to TO_MODE.  */

> +insn_code
> +function_expander::convert_optab_handler_for_sign (optab signed_op,
> +						   optab unsigned_op,
> +						   unsigned int suffix_i,
> +						   machine_mode to_mode,
> +						   machine_mode from_mode)
> +{
> +  if (from_mode == VOIDmode)
> +    from_mode = vector_mode (suffix_i);

This code would then be removed.

> +  optab op = type_suffix (suffix_i).unsigned_p ? unsigned_op : signed_op;
> +  return ::convert_optab_handler (op, to_mode, from_mode);
> +}
> +
>  /* Return true if X overlaps any input.  */
>  bool
>  function_expander::overlaps_input_p (rtx x)
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
> index 9ab6f202c30..7534a58c3d7 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.h
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.h
> @@ -659,6 +659,9 @@ public:
>    insn_code direct_optab_handler (optab, unsigned int = 0);
>    insn_code direct_optab_handler_for_sign (optab, optab, unsigned int = 0,
>  					   machine_mode = E_VOIDmode);
> +  insn_code convert_optab_handler_for_sign (optab, optab, unsigned int = 0,
> +					    machine_mode = E_VOIDmode,
> +					    machine_mode = E_VOIDmode);

and the "= E_VOIDmode"s here too.

>    machine_mode result_mode () const;
>  
> [...]
> diff --git a/gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c b/gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
> new file mode 100644
> index 00000000000..453f3a75e6f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
> @@ -0,0 +1,25 @@
> +/* { dg-additional-options "-march=armv9.2-a+sme2 -O2 -ftree-vectorize" } */

Could you remove the -march option in favour of:

  #pragma GCC target "+sme2"

?  That way, we honour the user's test flags if they already include sme2.

LGTM otherwise, thanks.

Richard

> +
> +#include <stdint.h>
> +
> +uint32_t udot2(int n, uint16_t* data) __arm_streaming
> +{
> +  uint32_t sum = 0;
> +  for (int i=0; i<n; i+=1) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}
> +
> +int32_t sdot2(int n, int16_t* data) __arm_streaming
> +{
> +  int32_t sum = 0;
> +  for (int i=0; i<n; i+=1) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}
> +
> +/* { dg-final { scan-assembler-times {\tudot\tz[0-9]+\.s, z[0-9]+\.h, z[0-9]+\.h\n} 5 } } */
> +/* { dg-final { scan-assembler-times {\tsdot\tz[0-9]+\.s, z[0-9]+\.h, z[0-9]+\.h\n} 5 } } */
> +/* { dg-final { scan-assembler-times {\twhilelo\t} 4 } } */
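
To make the review suggestion for convert_optab_handler_for_sign concrete, the following is a self-contained sketch of the shape the helper might take once the VOIDmode default is dropped and both modes are mandatory.  Everything here is a stand-in: the enums, the handler table and the CODE_FOR_* names are invented for illustration, the helper is a free function rather than a function_expander method, and a plain bool replaces the type_suffix (suffix_i).unsigned_p lookup.

```cpp
// Stand-in types.  GCC's real optab, machine_mode and insn_code are
// generated enums; these tiny versions exist only for this sketch.
enum optab { sdot_prod_optab, udot_prod_optab };
enum machine_mode { VNx4SImode, VNx8HImode, VNx16QImode };
enum insn_code
{
  CODE_FOR_nothing,
  CODE_FOR_sdot_4way,	/* Hypothetical names.  */
  CODE_FOR_udot_4way,
  CODE_FOR_sdot_2way,
  CODE_FOR_udot_2way
};

struct handler_entry
{
  optab op;
  machine_mode to_mode;
  machine_mode from_mode;
  insn_code icode;
};

// Hypothetical handler table keyed on (optab, to_mode, from_mode).
static const handler_entry handler_table[] = {
  { sdot_prod_optab, VNx4SImode, VNx16QImode, CODE_FOR_sdot_4way },
  { udot_prod_optab, VNx4SImode, VNx16QImode, CODE_FOR_udot_4way },
  { sdot_prod_optab, VNx4SImode, VNx8HImode, CODE_FOR_sdot_2way },
  { udot_prod_optab, VNx4SImode, VNx8HImode, CODE_FOR_udot_2way },
};

// Simplified stand-in for the global ::convert_optab_handler.
static insn_code
convert_optab_handler (optab op, machine_mode to_mode,
                       machine_mode from_mode)
{
  for (const handler_entry &e : handler_table)
    if (e.op == op && e.to_mode == to_mode && e.from_mode == from_mode)
      return e.icode;
  return CODE_FOR_nothing;
}

// Choose between SIGNED_OP and UNSIGNED_OP based on UNSIGNED_P, then
// pick the handler for "converting" from FROM_MODE to TO_MODE.  Both
// modes are required; there is no VOIDmode fallback.
static insn_code
convert_optab_handler_for_sign (optab signed_op, optab unsigned_op,
                                bool unsigned_p, machine_mode to_mode,
                                machine_mode from_mode)
{
  optab op = unsigned_p ? unsigned_op : signed_op;
  return convert_optab_handler (op, to_mode, from_mode);
}
```

With two explicit modes the same pair of optabs can resolve to either the two-way or the four-way instruction, which a single-mode direct optab lookup cannot express.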


* Re: [PATCH V2 03/10] aarch64: Fix aarch64 backend-use of (u|s|us)dot_prod patterns
  2024-08-15  8:26   ` Richard Sandiford
@ 2024-08-15  9:45     ` Victor Do Nascimento
  0 siblings, 0 replies; 18+ messages in thread
From: Victor Do Nascimento @ 2024-08-15  9:45 UTC (permalink / raw)
  To: gcc-patches, Tamar.Christina, claziss, hongtao.liu, syq,
	bernds_cb1, aldyh, richard.sandiford

On 8/15/24 09:26, Richard Sandiford wrote:
> Victor Do Nascimento <victor.donascimento@arm.com> writes:
>> Given recent changes to the dot_prod standard pattern name, this patch
>> fixes the aarch64 back-end by implementing the following changes:
>>
>> 1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files.
>> 2. Rewrite initialization and function expansion mechanism for simd
>> builtins.
>> 3. Fix all direct calls to back-end `dot_prod' patterns in SVE
>> builtins.
>>
>> Finally, given that it is now possible for the compiler to
>> differentiate between the two- and four-way dot product, we add a test
>> to ensure that autovectorization picks up on dot-product patterns
>> where the result is twice the width of the operands.
>>
>> gcc/ChangeLog:
>>
>> 	* config/aarch64/aarch64-simd.md
>> 	(<sur>dot_prod<vsi2qi><vczle><vczbe>): Renamed to...
>> 	(<sur>dot_prod<mode><vsi2qi><vczle><vczbe>): ...this.
>> 	(usdot_prod<vsi2qi><vczle><vczbe>): Renamed to...
>> 	(usdot_prod<mode><vsi2qi><vczle><vczbe>): ...this.
>> 	(<su>sadv16qi): Adjust call to gen_udot_prod to take second mode.
>> 	(popcount<mode2>): Fix use of `udot_prod_optab'.
>> 	* gcc/config/aarch64/aarch64-sve.md
>> 	(<sur>dot_prod<vsi2qi>): Renamed to...
>> 	(<sur>dot_prod<mode><vsi2qi>): ...this.
>> 	(@<sur>dot_prod<vsi2qi>): Renamed to...
>> 	(@<sur>dot_prod<mode><vsi2qi>): ...this.
>> 	(<su>sad<vsi2qi>): Adjust call to gen_udot_prod to take second mode.
>> 	* gcc/config/aarch64/aarch64-sve2.md
>> 	(@aarch64_sve_<sur>dotvnx4sivnx8hi): Renamed to...
>> 	(<sur>dot_prodvnx4sivnx8hi): ...this.
>> 	* config/aarch64/aarch64-simd-builtins.def: Modify macro
>> 	expansion-based initialization and expansion
>> 	of (u|s|us)dot_prod builtins.
>> 	* config/aarch64/aarch64-sve-builtins-base.cc
>> 	(svdot_impl::expand): s/direct/convert/ in
>> 	`convert_optab_handler_for_sign' function call.
>> 	(svusdot_impl::expand): Add second mode argument in call to
>> 	`code_for_dot_prod'.
>> 	* config/aarch64/aarch64-sve-builtins.cc
>> 	(function_expander::convert_optab_handler_for_sign): New class
>> 	method.
>> 	* config/aarch64/aarch64-sve-builtins.h
>> 	(class function_expander): Add prototype for new
>> 	`convert_optab_handler_for_sign' method.
>>
>> gcc/testsuite/ChangeLog:
>> 	* gcc.target/aarch64/sme/vect-dotprod-twoway.c (udot2): New.
> 
> Could you run the patch through contrib/check_GNU_style.py to catch
> the long lines?
> 
>> ---
>>   gcc/config/aarch64/aarch64-builtins.cc        |  7 ++++++
>>   gcc/config/aarch64/aarch64-simd-builtins.def  |  6 ++---
>>   gcc/config/aarch64/aarch64-simd.md            |  9 ++++---
>>   .../aarch64/aarch64-sve-builtins-base.cc      | 13 +++++-----
>>   gcc/config/aarch64/aarch64-sve-builtins.cc    | 17 +++++++++++++
>>   gcc/config/aarch64/aarch64-sve-builtins.h     |  3 +++
>>   gcc/config/aarch64/aarch64-sve.md             |  6 ++---
>>   gcc/config/aarch64/aarch64-sve2.md            |  2 +-
>>   .../aarch64/sme/vect-dotprod-twoway.c         | 25 +++++++++++++++++++
>>   9 files changed, 71 insertions(+), 17 deletions(-)
>>   create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
>> [...]
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
>> index 0a560eaedca..975eca0bbd6 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
>> @@ -3745,6 +3745,23 @@ function_expander::direct_optab_handler_for_sign (optab signed_op,
>>     return ::direct_optab_handler (op, mode);
>>   }
>>   
>> +/* Choose between signed and unsigned convert optabs SIGNED_OP and
>> +   UNSIGNED_OP based on the signedness of type suffix SUFFIX_I, then
>> +   pick the appropriate optab handler for the mode.  Use MODE as the
>> +   mode if given, otherwise use the mode of type suffix SUFFIX_I.  */
> 
> The last sentence needs to be adapted for this function.  Also, because
> there is no longer a single mode, I don't think it makes sense to allow
> a default.  So how about:
> 
> /* Choose between signed and unsigned convert optabs SIGNED_OP and
>     UNSIGNED_OP based on the signedness of type suffix SUFFIX_I, then
>     pick the appropriate optab handler for "converting" from FROM_MODE
>     to TO_MODE.  */
> 
>> +insn_code
>> +function_expander::convert_optab_handler_for_sign (optab signed_op,
>> +						   optab unsigned_op,
>> +						   unsigned int suffix_i,
>> +						   machine_mode to_mode,
>> +						   machine_mode from_mode)
>> +{
>> +  if (from_mode == VOIDmode)
>> +    from_mode = vector_mode (suffix_i);
> 
> This code would then be removed.
> 
>> +  optab op = type_suffix (suffix_i).unsigned_p ? unsigned_op : signed_op;
>> +  return ::convert_optab_handler (op, to_mode, from_mode);
>> +}
>> +
>>   /* Return true if X overlaps any input.  */
>>   bool
>>   function_expander::overlaps_input_p (rtx x)
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
>> index 9ab6f202c30..7534a58c3d7 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins.h
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins.h
>> @@ -659,6 +659,9 @@ public:
>>     insn_code direct_optab_handler (optab, unsigned int = 0);
>>     insn_code direct_optab_handler_for_sign (optab, optab, unsigned int = 0,
>>   					   machine_mode = E_VOIDmode);
>> +  insn_code convert_optab_handler_for_sign (optab, optab, unsigned int = 0,
>> +					    machine_mode = E_VOIDmode,
>> +					    machine_mode = E_VOIDmode);
> 
> and the "= E_VOIDmode"s here too.
> 
>>     machine_mode result_mode () const;
>>   
>> [...]
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c b/gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
>> new file mode 100644
>> index 00000000000..453f3a75e6f
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
>> @@ -0,0 +1,25 @@
>> +/* { dg-additional-options "-march=armv9.2-a+sme2 -O2 -ftree-vectorize" } */
> 
> Could you remove the -march option in favour of:
> 
>    #pragma GCC target "+sme2"
> 
> ?  That way, we honour the user's test flags if they already include sme2.
> 
> LGTM otherwise, thanks.
> 
> Richard

Thanks Richard, sorry the requested changes didn't make it into the 
posted V3 patch series.

Your proposed improvements make good sense and have been implemented in 
accordance with the provided feedback.

Cheers,
Victor

>> +
>> +#include <stdint.h>
>> +
>> +uint32_t udot2(int n, uint16_t* data) __arm_streaming
>> +{
>> +  uint32_t sum = 0;
>> +  for (int i=0; i<n; i+=1) {
>> +    sum += data[i] * data[i];
>> +  }
>> +  return sum;
>> +}
>> +
>> +int32_t sdot2(int n, int16_t* data) __arm_streaming
>> +{
>> +  int32_t sum = 0;
>> +  for (int i=0; i<n; i+=1) {
>> +    sum += data[i] * data[i];
>> +  }
>> +  return sum;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {\tudot\tz[0-9]+\.s, z[0-9]+\.h, z[0-9]+\.h\n} 5 } } */
>> +/* { dg-final { scan-assembler-times {\tsdot\tz[0-9]+\.s, z[0-9]+\.h, z[0-9]+\.h\n} 5 } } */
>> +/* { dg-final { scan-assembler-times {\twhilelo\t} 4 } } */
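
As a rough illustration of what the two-way test above is checking, the sketch below compares a portable scalar loop with a lane-wise model of a two-way dot product, where each 32-bit accumulator lane takes the products of the two 16-bit elements it overlaps.  The function names, the use of std::vector and the `lanes' parameter are assumptions made for this example; the attribute __arm_streaming is omitted so that the code builds on any target, and the bounds check in the inner loop only loosely stands in for predicated tail handling.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar loop in the shape of the udot2 test.  The casts keep the
// multiplication in uint32_t; the test itself relies on the usual
// integer promotions instead.
uint32_t
udot2_scalar (const std::vector<uint16_t> &data)
{
  uint32_t sum = 0;
  for (uint16_t x : data)
    sum += static_cast<uint32_t> (x) * static_cast<uint32_t> (x);
  return sum;
}

// Hypothetical model of a two-way dot product with LANES 32-bit
// accumulators (LANES must be nonzero).  Each pass consumes 2 * LANES
// 16-bit elements.
uint32_t
udot2_two_way (const std::vector<uint16_t> &data, std::size_t lanes)
{
  std::vector<uint32_t> acc (lanes, 0);
  std::size_t step = 2 * lanes;
  for (std::size_t base = 0; base < data.size (); base += step)
    for (std::size_t j = 0; j < step && base + j < data.size (); ++j)
      {
        uint32_t x = data[base + j];
        // 16-bit element j overlaps accumulator lane j / 2.
        acc[j / 2] += x * x;
      }

  // Final horizontal reduction of the accumulator lanes.
  uint32_t sum = 0;
  for (uint32_t lane : acc)
    sum += lane;
  return sum;
}
```

Because unsigned addition wraps modulo 2^32 and is order-independent, both functions return the same value for any input.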


Thread overview: 18+ messages
2024-08-13 12:41 [PATCH V2 00/10] optabs: Make all `*dot_prod_optab's modeled as conversions Victor Do Nascimento
2024-08-13 12:41 ` [PATCH V2 01/10] " Victor Do Nascimento
2024-08-15  8:11   ` Richard Sandiford
2024-08-13 12:41 ` [PATCH V2 02/10] autovectorizer: Add basic support for convert optabs Victor Do Nascimento
2024-08-14 12:24   ` Tamar Christina
2024-08-14 13:00     ` Victor Do Nascimento
2024-08-14 14:10     ` Victor Do Nascimento
2024-08-13 12:41 ` [PATCH V2 03/10] aarch64: Fix aarch64 backend-use of (u|s|us)dot_prod patterns Victor Do Nascimento
2024-08-15  8:26   ` Richard Sandiford
2024-08-15  9:45     ` Victor Do Nascimento
2024-08-13 12:41 ` [PATCH V2 04/10] arm: Fix arm " Victor Do Nascimento
2024-08-13 12:41 ` [PATCH V2 05/10] i386: Fix dot_prod backend patterns for mmx and sse targets Victor Do Nascimento
2024-08-14  1:24   ` Liu, Hongtao
2024-08-13 12:41 ` [PATCH V2 06/10] arc: Adjust dot-product backend patterns Victor Do Nascimento
2024-08-13 12:41 ` [PATCH V2 07/10] mips: " Victor Do Nascimento
2024-08-13 12:41 ` [PATCH V2 08/10] rs6000: Adjust altivec " Victor Do Nascimento
2024-08-13 12:41 ` [PATCH V2 09/10] c6x: Adjust " Victor Do Nascimento
2024-08-13 12:41 ` [PATCH V2 10/10] autovectorizer: Test autovectorization of different dot-prod modes Victor Do Nascimento
