public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
@ 2024-05-16 14:39 Victor Do Nascimento
  2024-05-16 14:45 ` Andrew Pinski
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Victor Do Nascimento @ 2024-05-16 14:39 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, Richard.Earnshaw, Victor Do Nascimento

From: Victor Do Nascimento <vicdon01@e133397.arm.com>

At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
optabs for dealing with vectorizable dot product code sequences.  The
consequence of using a direct optab for this is that backend-pattern
selection is only ever able to match against one datatype - Either
that of the operands or of the accumulated value, never both.

With the introduction of the 2-way (un)signed dot-product insn [1][2]
in AArch64 SVE2, the existing direct opcode approach is no longer
sufficient for full specification of all the possible dot product
machine instructions to be matched to the code sequence; a dot product
resulting in VNx4SI may result from either dot products on VNx16QI or
VNx8HI values for the 4- and 2-way dot product operations, respectively.

This means that the following example fails autovectorization:

uint32_t foo(int n, uint16_t* data) {
  uint32_t sum = 0;
  for (int i=0; i<n; i+=1) {
    sum += data[i] * data[i];
  }
  return sum;
}

To remedy the issue a new optab is added, tentatively named
`udot_prod_twoway_optab', whose selection is dependent upon checking
of both input and output types involved in the operation.

In order to minimize changes to the existing codebase,
`optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
argument is added to its signature - `const_tree otype', allowing type
information to be specified for both input and output types.  The
existing nterface is retained by defining a new `optab_for_tree_code',
which serves as a shim to `optab_for_tree_code_1', passing old
parameters as-is and setting the new `optype' argument to `NULL_TREE'.

For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
directly, passing it both types, adding the internal logic to the
function to distinguish between competing optabs.

Finally, necessary changes are made to `expand_widen_pattern_expr' to
ensure the new icode can be correctly selected, given the new optab.

[1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
[2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-

gcc/ChangeLog:

	* config/aarch64/aarch64-sve2.md (@aarch64_sve_<sur>dotvnx4sivnx8hi):
	renamed to `<sur>dot_prod_twoway_vnx8hi'.
	* config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
	update icodes used in line with above rename.
	* optabs-tree.cc (optab_for_tree_code_1): Renamed
	`optab_for_tree_code' and added new argument.
	(optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
	* optabs-tree.h (optab_for_tree_code_1): New.
	* optabs.cc (expand_widen_pattern_expr): Expand support for
	DOT_PROD_EXPR patterns.
	* optabs.def (udot_prod_twoway_optab): New.
	(sdot_prod_twoway_optab): Likewise.
	* tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
	support for misc optabs that use two modes.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-dotprod-twoway.c: New.
---
 .../aarch64/aarch64-sve-builtins-base.cc      |  4 ++--
 gcc/config/aarch64/aarch64-sve2.md            |  2 +-
 gcc/optabs-tree.cc                            | 23 ++++++++++++++++--
 gcc/optabs-tree.h                             |  2 ++
 gcc/optabs.cc                                 |  2 +-
 gcc/optabs.def                                |  2 ++
 .../gcc.dg/vect/vect-dotprod-twoway.c         | 24 +++++++++++++++++++
 gcc/tree-vect-patterns.cc                     |  2 +-
 8 files changed, 54 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 0d2edf3f19e..e457db09f66 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -764,8 +764,8 @@ public:
       icode = (e.type_suffix (0).float_p
 	       ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
 	       : e.type_suffix (0).unsigned_p
-	       ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
-	       : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
+	       ? CODE_FOR_udot_prod_twoway_vnx8hi
+	       : CODE_FOR_sdot_prod_twoway_vnx8hi);
     return e.use_unpred_insn (icode);
   }
 };
diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index 934e57055d3..5677de7108d 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -2021,7 +2021,7 @@ (define_insn "@aarch64_sve_qsub_<sve_int_op>_lane_<mode>"
 )
 
 ;; Two-way dot-product.
-(define_insn "@aarch64_sve_<sur>dotvnx4sivnx8hi"
+(define_insn "<sur>dot_prod_twoway_vnx8hi"
   [(set (match_operand:VNx4SI 0 "register_operand")
 	(plus:VNx4SI
 	  (unspec:VNx4SI
diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
index b69a5bc3676..e3c5a618ea2 100644
--- a/gcc/optabs-tree.cc
+++ b/gcc/optabs-tree.cc
@@ -35,8 +35,8 @@ along with GCC; see the file COPYING3.  If not see
    cannot give complete results for multiplication or division) but probably
    ought to be relied on more widely throughout the expander.  */
 optab
-optab_for_tree_code (enum tree_code code, const_tree type,
-		     enum optab_subtype subtype)
+optab_for_tree_code_1 (enum tree_code code, const_tree type,
+		       const_tree otype, enum optab_subtype subtype)
 {
   bool trapv;
   switch (code)
@@ -149,6 +149,14 @@ optab_for_tree_code (enum tree_code code, const_tree type,
 
     case DOT_PROD_EXPR:
       {
+	if (otype && (TYPE_PRECISION (TREE_TYPE (type)) * 2
+		      == TYPE_PRECISION (TREE_TYPE (otype))))
+	  {
+	    if (TYPE_UNSIGNED (type) && TYPE_UNSIGNED (otype))
+	      return udot_prod_twoway_optab;
+	    if (!TYPE_UNSIGNED (type) && !TYPE_UNSIGNED (otype))
+	      return sdot_prod_twoway_optab;
+	  }
 	if (subtype == optab_vector_mixed_sign)
 	  return usdot_prod_optab;
 
@@ -285,6 +293,17 @@ optab_for_tree_code (enum tree_code code, const_tree type,
     }
 }
 
+/* Return the optab used for computing the operation given by the tree code,
+   CODE and the tree EXP.  This function is not always usable (for example, it
+   cannot give complete results for multiplication or division) but probably
+   ought to be relied on more widely throughout the expander.  */
+optab
+optab_for_tree_code (enum tree_code code, const_tree type,
+		     enum optab_subtype subtype)
+{
+  return optab_for_tree_code_1 (code, type, NULL_TREE, subtype);
+}
+
 /* Check whether an operation represented by CODE is a 'half' widening operation
    in which the input vector type has half the number of bits of the output
    vector type e.g. V8QI->V8HI.
diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
index f2b49991462..13ff7ca2e4b 100644
--- a/gcc/optabs-tree.h
+++ b/gcc/optabs-tree.h
@@ -36,6 +36,8 @@ enum optab_subtype
 /* Return the optab used for computing the given operation on the type given by
    the second argument.  The third argument distinguishes between the types of
    vector shifts and rotates.  */
+optab optab_for_tree_code_1 (enum tree_code, const_tree, const_tree __attribute__((unused)),
+			    enum optab_subtype );
 optab optab_for_tree_code (enum tree_code, const_tree, enum optab_subtype);
 bool
 supportable_half_widening_operation (enum tree_code, tree, tree,
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index ce91f94ed43..3a1c6c7b90e 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -311,7 +311,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
 	gcc_unreachable ();
 
       widen_pattern_optab
-	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
+	= optab_for_tree_code_1 (ops->code, TREE_TYPE (oprnd0), TREE_TYPE (oprnd2), subtype);
     }
   else
     widen_pattern_optab
diff --git a/gcc/optabs.def b/gcc/optabs.def
index ad14f9328b9..cf1a6e7a7dc 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -408,6 +408,8 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
 OPTAB_D (udot_prod_optab, "udot_prod$I$a")
 OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
+OPTAB_D (sdot_prod_twoway_optab, "sdot_prod_twoway_$I$a")
+OPTAB_D (udot_prod_twoway_optab, "udot_prod_twoway_$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
 OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
new file mode 100644
index 00000000000..cba2aadbec8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
@@ -0,0 +1,24 @@
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-additional-options "-march=-O3 -march=armv9.2-a+sme2 -fdump-tree-vect-details" { target { aarch64*-*-* } } } */
+
+#include <stdint.h>
+
+uint32_t udot2(int n, uint16_t* data) {
+  uint32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+int32_t sdot2(int n, int16_t* data) {
+  int32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected:" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 4 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index dfb7d800526..0760f25d94d 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -233,7 +233,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
   if (!vecotype)
     return false;
 
-  optab optab = optab_for_tree_code (code, vecitype, subtype);
+  optab optab = optab_for_tree_code_1 (code, vecitype, vecotype, subtype);
   if (!optab)
     return false;
 
-- 
2.34.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
  2024-05-16 14:39 [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer Victor Do Nascimento
@ 2024-05-16 14:45 ` Andrew Pinski
  2024-05-16 17:45 ` Tamar Christina
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Andrew Pinski @ 2024-05-16 14:45 UTC (permalink / raw)
  To: Victor Do Nascimento
  Cc: GCC Patches, Richard Sandiford, Richard Earnshaw, Victor Do Nascimento

[-- Attachment #1: Type: text/plain, Size: 10873 bytes --]

On Thu, May 16, 2024, 4:40 PM Victor Do Nascimento <
victor.donascimento@arm.com> wrote:

> From: Victor Do Nascimento <vicdon01@e133397.arm.com>
>
> At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> optabs for dealing with vectorizable dot product code sequences.  The
> consequence of using a direct optab for this is that backend-pattern
> selection is only ever able to match against one datatype - Either
> that of the operands or of the accumulated value, never both.
>
> With the introduction of the 2-way (un)signed dot-product insn [1][2]
> in AArch64 SVE2, the existing direct opcode approach is no longer
> sufficient for full specification of all the possible dot product
> machine instructions to be matched to the code sequence; a dot product
> resulting in VNx4SI may result from either dot products on VNx16QI or
> VNx8HI values for the 4- and 2-way dot product operations, respectively.
>
> This means that the following example fails autovectorization:
>
> uint32_t foo(int n, uint16_t* data) {
>   uint32_t sum = 0;
>   for (int i=0; i<n; i+=1) {
>     sum += data[i] * data[i];
>   }
>   return sum;
> }
>
> To remedy the issue a new optab is added, tentatively named
> `udot_prod_twoway_optab', whose selection is dependent upon checking
> of both input and output types involved in the operation.
>
> In order to minimize changes to the existing codebase,
> `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> argument is added to its signature - `const_tree otype', allowing type
> information to be specified for both input and output types.  The
> existing nterface is retained by defining a new `optab_for_tree_code',
> which serves as a shim to `optab_for_tree_code_1', passing old
> parameters as-is and setting the new `optype' argument to `NULL_TREE'.
>
> For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> directly, passing it both types, adding the internal logic to the
> function to distinguish between competing optabs.
>
> Finally, necessary changes are made to `expand_widen_pattern_expr' to
> ensure the new icode can be correctly selected, given the new optab.
>

Since you are adding an optab, please update the internals manual with the
documentation of the optab (the standard pattern names section).

Thanks,
Andrew


> [1]
> https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> [2]
> https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
>
> gcc/ChangeLog:
>
>         * config/aarch64/aarch64-sve2.md
> (@aarch64_sve_<sur>dotvnx4sivnx8hi):
>         renamed to `<sur>dot_prod_twoway_vnx8hi'.
>         * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
>         update icodes used in line with above rename.
>         * optabs-tree.cc (optab_for_tree_code_1): Renamed
>         `optab_for_tree_code' and added new argument.
>         (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
>         * optabs-tree.h (optab_for_tree_code_1): New.
>         * optabs.cc (expand_widen_pattern_expr): Expand support for
>         DOT_PROD_EXPR patterns.
>         * optabs.def (udot_prod_twoway_optab): New.
>         (sdot_prod_twoway_optab): Likewise.
>         * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
>         support for misc optabs that use two modes.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/vect/vect-dotprod-twoway.c: New.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc      |  4 ++--
>  gcc/config/aarch64/aarch64-sve2.md            |  2 +-
>  gcc/optabs-tree.cc                            | 23 ++++++++++++++++--
>  gcc/optabs-tree.h                             |  2 ++
>  gcc/optabs.cc                                 |  2 +-
>  gcc/optabs.def                                |  2 ++
>  .../gcc.dg/vect/vect-dotprod-twoway.c         | 24 +++++++++++++++++++
>  gcc/tree-vect-patterns.cc                     |  2 +-
>  8 files changed, 54 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 0d2edf3f19e..e457db09f66 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -764,8 +764,8 @@ public:
>        icode = (e.type_suffix (0).float_p
>                ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
>                : e.type_suffix (0).unsigned_p
> -              ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
> -              : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
> +              ? CODE_FOR_udot_prod_twoway_vnx8hi
> +              : CODE_FOR_sdot_prod_twoway_vnx8hi);
>      return e.use_unpred_insn (icode);
>    }
>  };
> diff --git a/gcc/config/aarch64/aarch64-sve2.md
> b/gcc/config/aarch64/aarch64-sve2.md
> index 934e57055d3..5677de7108d 100644
> --- a/gcc/config/aarch64/aarch64-sve2.md
> +++ b/gcc/config/aarch64/aarch64-sve2.md
> @@ -2021,7 +2021,7 @@ (define_insn
> "@aarch64_sve_qsub_<sve_int_op>_lane_<mode>"
>  )
>
>  ;; Two-way dot-product.
> -(define_insn "@aarch64_sve_<sur>dotvnx4sivnx8hi"
> +(define_insn "<sur>dot_prod_twoway_vnx8hi"
>    [(set (match_operand:VNx4SI 0 "register_operand")
>         (plus:VNx4SI
>           (unspec:VNx4SI
> diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
> index b69a5bc3676..e3c5a618ea2 100644
> --- a/gcc/optabs-tree.cc
> +++ b/gcc/optabs-tree.cc
> @@ -35,8 +35,8 @@ along with GCC; see the file COPYING3.  If not see
>     cannot give complete results for multiplication or division) but
> probably
>     ought to be relied on more widely throughout the expander.  */
>  optab
> -optab_for_tree_code (enum tree_code code, const_tree type,
> -                    enum optab_subtype subtype)
> +optab_for_tree_code_1 (enum tree_code code, const_tree type,
> +                      const_tree otype, enum optab_subtype subtype)
>  {
>    bool trapv;
>    switch (code)
> @@ -149,6 +149,14 @@ optab_for_tree_code (enum tree_code code, const_tree
> type,
>
>      case DOT_PROD_EXPR:
>        {
> +       if (otype && (TYPE_PRECISION (TREE_TYPE (type)) * 2
> +                     == TYPE_PRECISION (TREE_TYPE (otype))))
> +         {
> +           if (TYPE_UNSIGNED (type) && TYPE_UNSIGNED (otype))
> +             return udot_prod_twoway_optab;
> +           if (!TYPE_UNSIGNED (type) && !TYPE_UNSIGNED (otype))
> +             return sdot_prod_twoway_optab;
> +         }
>         if (subtype == optab_vector_mixed_sign)
>           return usdot_prod_optab;
>
> @@ -285,6 +293,17 @@ optab_for_tree_code (enum tree_code code, const_tree
> type,
>      }
>  }
>
> +/* Return the optab used for computing the operation given by the tree
> code,
> +   CODE and the tree EXP.  This function is not always usable (for
> example, it
> +   cannot give complete results for multiplication or division) but
> probably
> +   ought to be relied on more widely throughout the expander.  */
> +optab
> +optab_for_tree_code (enum tree_code code, const_tree type,
> +                    enum optab_subtype subtype)
> +{
> +  return optab_for_tree_code_1 (code, type, NULL_TREE, subtype);
> +}
> +
>  /* Check whether an operation represented by CODE is a 'half' widening
> operation
>     in which the input vector type has half the number of bits of the
> output
>     vector type e.g. V8QI->V8HI.
> diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> index f2b49991462..13ff7ca2e4b 100644
> --- a/gcc/optabs-tree.h
> +++ b/gcc/optabs-tree.h
> @@ -36,6 +36,8 @@ enum optab_subtype
>  /* Return the optab used for computing the given operation on the type
> given by
>     the second argument.  The third argument distinguishes between the
> types of
>     vector shifts and rotates.  */
> +optab optab_for_tree_code_1 (enum tree_code, const_tree, const_tree
> __attribute__((unused)),
> +                           enum optab_subtype );
>  optab optab_for_tree_code (enum tree_code, const_tree, enum
> optab_subtype);
>  bool
>  supportable_half_widening_operation (enum tree_code, tree, tree,
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index ce91f94ed43..3a1c6c7b90e 100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -311,7 +311,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx
> op1, rtx wide_op,
>         gcc_unreachable ();
>
>        widen_pattern_optab
> -       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> +       = optab_for_tree_code_1 (ops->code, TREE_TYPE (oprnd0), TREE_TYPE
> (oprnd2), subtype);
>      }
>    else
>      widen_pattern_optab
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..cf1a6e7a7dc 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -408,6 +408,8 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
>  OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
>  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
>  OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> +OPTAB_D (sdot_prod_twoway_optab, "sdot_prod_twoway_$I$a")
> +OPTAB_D (udot_prod_twoway_optab, "udot_prod_twoway_$I$a")
>  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
>  OPTAB_D (usad_optab, "usad$I$a")
>  OPTAB_D (ssad_optab, "ssad$I$a")
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> new file mode 100644
> index 00000000000..cba2aadbec8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile { target { aarch64*-*-* } } } */
> +/* { dg-additional-options "-march=-O3 -march=armv9.2-a+sme2
> -fdump-tree-vect-details" { target { aarch64*-*-* } } } */
> +
> +#include <stdint.h>
> +
> +uint32_t udot2(int n, uint16_t* data) {
> +  uint32_t sum = 0;
> +  for (int i=0; i<n; i+=1) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}
> +
> +int32_t sdot2(int n, int16_t* data) {
> +  int32_t sum = 0;
> +  for (int i=0; i<n; i+=1) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern:
> detected:" 2 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern:
> detected" 4 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index dfb7d800526..0760f25d94d 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -233,7 +233,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree
> otype, tree_code code,
>    if (!vecotype)
>      return false;
>
> -  optab optab = optab_for_tree_code (code, vecitype, subtype);
> +  optab optab = optab_for_tree_code_1 (code, vecitype, vecotype, subtype);
>    if (!optab)
>      return false;
>
> --
> 2.34.1
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
  2024-05-16 14:39 [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer Victor Do Nascimento
  2024-05-16 14:45 ` Andrew Pinski
@ 2024-05-16 17:45 ` Tamar Christina
  2024-05-16 17:53   ` Andrew Pinski
  2024-05-17  2:08 ` Hongtao Liu
  2024-05-17  5:51 ` Richard Biener
  3 siblings, 1 reply; 14+ messages in thread
From: Tamar Christina @ 2024-05-16 17:45 UTC (permalink / raw)
  To: Victor Do Nascimento, gcc-patches
  Cc: Richard Sandiford, Richard Earnshaw, Victor Do Nascimento

Hi Victor,

> -----Original Message-----
> From: Victor Do Nascimento <victor.donascimento@arm.com>
> Sent: Thursday, May 16, 2024 3:39 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Victor Do Nascimento
> <vicdon01@e133397.arm.com>
> Subject: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
> 
> From: Victor Do Nascimento <vicdon01@e133397.arm.com>
> 
> At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> optabs for dealing with vectorizable dot product code sequences.  The
> consequence of using a direct optab for this is that backend-pattern
> selection is only ever able to match against one datatype - Either
> that of the operands or of the accumulated value, never both.
> 
> With the introduction of the 2-way (un)signed dot-product insn [1][2]
> in AArch64 SVE2, the existing direct opcode approach is no longer
> sufficient for full specification of all the possible dot product
> machine instructions to be matched to the code sequence; a dot product
> resulting in VNx4SI may result from either dot products on VNx16QI or
> VNx8HI values for the 4- and 2-way dot product operations, respectively.
> 
> This means that the following example fails autovectorization:
> 
> uint32_t foo(int n, uint16_t* data) {
>   uint32_t sum = 0;
>   for (int i=0; i<n; i+=1) {
>     sum += data[i] * data[i];
>   }
>   return sum;
> }
> 
> To remedy the issue a new optab is added, tentatively named
> `udot_prod_twoway_optab', whose selection is dependent upon checking
> of both input and output types involved in the operation.
> 
> In order to minimize changes to the existing codebase,
> `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> argument is added to its signature - `const_tree otype', allowing type
> information to be specified for both input and output types.  The
> existing nterface is retained by defining a new `optab_for_tree_code',
> which serves as a shim to `optab_for_tree_code_1', passing old
> parameters as-is and setting the new `optype' argument to `NULL_TREE'.
> 
> For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> directly, passing it both types, adding the internal logic to the
> function to distinguish between competing optabs.
> 
> Finally, necessary changes are made to `expand_widen_pattern_expr' to
> ensure the new icode can be correctly selected, given the new optab.
> 
> [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
> 
> gcc/ChangeLog:
> 
> 	* config/aarch64/aarch64-sve2.md
> (@aarch64_sve_<sur>dotvnx4sivnx8hi):
> 	renamed to `<sur>dot_prod_twoway_vnx8hi'.
> 	* config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
> 	update icodes used in line with above rename.

Please split the target specific bits from the target agnostic parts.
I.e. this patch series should be split in two.

> 	* optabs-tree.cc (optab_for_tree_code_1): Renamed
> 	`optab_for_tree_code' and added new argument.
> 	(optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
> 	* optabs-tree.h (optab_for_tree_code_1): New.
> 	* optabs.cc (expand_widen_pattern_expr): Expand support for
> 	DOT_PROD_EXPR patterns.
> 	* optabs.def (udot_prod_twoway_optab): New.
> 	(sdot_prod_twoway_optab): Likewise.
> 	* tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
> 	support for misc optabs that use two modes.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/vect/vect-dotprod-twoway.c: New.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc      |  4 ++--
>  gcc/config/aarch64/aarch64-sve2.md            |  2 +-
>  gcc/optabs-tree.cc                            | 23 ++++++++++++++++--
>  gcc/optabs-tree.h                             |  2 ++
>  gcc/optabs.cc                                 |  2 +-
>  gcc/optabs.def                                |  2 ++
>  .../gcc.dg/vect/vect-dotprod-twoway.c         | 24 +++++++++++++++++++
>  gcc/tree-vect-patterns.cc                     |  2 +-
>  8 files changed, 54 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> 
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 0d2edf3f19e..e457db09f66 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -764,8 +764,8 @@ public:
>        icode = (e.type_suffix (0).float_p
>  	       ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
>  	       : e.type_suffix (0).unsigned_p
> -	       ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
> -	       : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
> +	       ? CODE_FOR_udot_prod_twoway_vnx8hi
> +	       : CODE_FOR_sdot_prod_twoway_vnx8hi);
>      return e.use_unpred_insn (icode);
>    }
>  };
> diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-
> sve2.md
> index 934e57055d3..5677de7108d 100644
> --- a/gcc/config/aarch64/aarch64-sve2.md
> +++ b/gcc/config/aarch64/aarch64-sve2.md
> @@ -2021,7 +2021,7 @@ (define_insn
> "@aarch64_sve_qsub_<sve_int_op>_lane_<mode>"
>  )
> 
>  ;; Two-way dot-product.
> -(define_insn "@aarch64_sve_<sur>dotvnx4sivnx8hi"
> +(define_insn "<sur>dot_prod_twoway_vnx8hi"

This drops the @ modifier, did it have no callers?

>    [(set (match_operand:VNx4SI 0 "register_operand")
>  	(plus:VNx4SI
>  	  (unspec:VNx4SI
> diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
> index b69a5bc3676..e3c5a618ea2 100644
> --- a/gcc/optabs-tree.cc
> +++ b/gcc/optabs-tree.cc
> @@ -35,8 +35,8 @@ along with GCC; see the file COPYING3.  If not see
>     cannot give complete results for multiplication or division) but probably
>     ought to be relied on more widely throughout the expander.  */
>  optab
> -optab_for_tree_code (enum tree_code code, const_tree type,
> -		     enum optab_subtype subtype)
> +optab_for_tree_code_1 (enum tree_code code, const_tree type,
> +		       const_tree otype, enum optab_subtype subtype)
>  {

Since we're C++ now, why not just overload optab_for_tree_code.
Both functions are useful in their own right so it seems better to overload them.
By overloading it also means you don't have to change the existing call sites.

I'd also make the optab_for_tree_code with one type just call optab_for_tree_code
with the same type twice.

>    bool trapv;
>    switch (code)
> @@ -149,6 +149,14 @@ optab_for_tree_code (enum tree_code code, const_tree
> type,
> 
>      case DOT_PROD_EXPR:
>        {
> +	if (otype && (TYPE_PRECISION (TREE_TYPE (type)) * 2
> +		      == TYPE_PRECISION (TREE_TYPE (otype))))
> +	  {

Because then you can remove the check for otype here,
And store TREE_TYPE (type) and TREE_TYPE (otype) safely into a variable so
the if becomes
if (otype != type
    && TYPE_PRECISION ...
   && TYPE_UNSIGNED (type) == TYPE_UNSIGNED (otype)

> +	    if (TYPE_UNSIGNED (type) && TYPE_UNSIGNED (otype))
> +	      return udot_prod_twoway_optab;
> +	    if (!TYPE_UNSIGNED (type) && !TYPE_UNSIGNED (otype))
> +	      return sdot_prod_twoway_optab;
> +	  }

And you can then simplify this into

return (TYPE_UNSIGNED (otype) ? udot_prod_twoway_optab : sdot_prod_twoway_optab;

>  	if (subtype == optab_vector_mixed_sign)
>  	  return usdot_prod_optab;
> 
> @@ -285,6 +293,17 @@ optab_for_tree_code (enum tree_code code, const_tree
> type,
>      }
>  }
> 
> +/* Return the optab used for computing the operation given by the tree code,
> +   CODE and the tree EXP.  This function is not always usable (for example, it
> +   cannot give complete results for multiplication or division) but probably
> +   ought to be relied on more widely throughout the expander.  */
> +optab
> +optab_for_tree_code (enum tree_code code, const_tree type,
> +		     enum optab_subtype subtype)
> +{
> +  return optab_for_tree_code_1 (code, type, NULL_TREE, subtype);
> +}
> +
>  /* Check whether an operation represented by CODE is a 'half' widening operation
>     in which the input vector type has half the number of bits of the output
>     vector type e.g. V8QI->V8HI.
> diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> index f2b49991462..13ff7ca2e4b 100644
> --- a/gcc/optabs-tree.h
> +++ b/gcc/optabs-tree.h
> @@ -36,6 +36,8 @@ enum optab_subtype
>  /* Return the optab used for computing the given operation on the type given by
>     the second argument.  The third argument distinguishes between the types of
>     vector shifts and rotates.  */
> +optab optab_for_tree_code_1 (enum tree_code, const_tree, const_tree
> __attribute__((unused)),
> +			    enum optab_subtype );
>  optab optab_for_tree_code (enum tree_code, const_tree, enum optab_subtype);
>  bool
>  supportable_half_widening_operation (enum tree_code, tree, tree,
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index ce91f94ed43..3a1c6c7b90e 100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -311,7 +311,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx
> op1, rtx wide_op,
>  	gcc_unreachable ();
> 
>        widen_pattern_optab
> -	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> +	= optab_for_tree_code_1 (ops->code, TREE_TYPE (oprnd0), TREE_TYPE
> (oprnd2), subtype);
>      }
>    else
>      widen_pattern_optab
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..cf1a6e7a7dc 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -408,6 +408,8 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
>  OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
>  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
>  OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> +OPTAB_D (sdot_prod_twoway_optab, "sdot_prod_twoway_$I$a")
> +OPTAB_D (udot_prod_twoway_optab, "udot_prod_twoway_$I$a")
>  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
>  OPTAB_D (usad_optab, "usad$I$a")
>  OPTAB_D (ssad_optab, "ssad$I$a")
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> new file mode 100644
> index 00000000000..cba2aadbec8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile { target { aarch64*-*-* } } } */
> +/* { dg-additional-options "-march=-O3 -march=armv9.2-a+sme2 -fdump-tree-
> vect-details" { target { aarch64*-*-* } } } */

Does this work? -march=-O3? I am guessing you're getting lucky that the
later -march overrides it?

For tests in the vectorizer testsuite you shouldn't give optimization options as the tests
are ran with -fno-vect-cost-model.  I.e. we'll vectorize even if unprofitable.

You also shouldn't add the -fdump-tree-vect-details, the vectorizer testsuite adds it
automatically.

> +
> +#include <stdint.h>
> +
> +uint32_t udot2(int n, uint16_t* data) {
> +  uint32_t sum = 0;
> +  for (int i=0; i<n; i+=1) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}
> +
> +int32_t sdot2(int n, int16_t* data) {
> +  int32_t sum = 0;
> +  for (int i=0; i<n; i+=1) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}

So you have an AArch64 only test and place it in generic folder.
You should either make it aarch64 only or make the test a generic test.

Normally for these I add a generic test that scans for the vectorizer scans as you've done below
And I then add an aach64 specific test with check-functions-bodies that checks that we actually
produced the right instructions.

Your test at the moment do not actually guarantee that the instruction gets generated.

Look at how the tests for 4 way dot product are written for inspiration here.

> +
> +/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected:" 2
> "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected"
> 4 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index dfb7d800526..0760f25d94d 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -233,7 +233,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree
> otype, tree_code code,
>    if (!vecotype)
>      return false;
> 
> -  optab optab = optab_for_tree_code (code, vecitype, subtype);
> +  optab optab = optab_for_tree_code_1 (code, vecitype, vecotype, subtype);

If you overload the function then these changes go away.

Thanks for working on this,
Tamar

>    if (!optab)
>      return false;
> 
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
  2024-05-16 17:45 ` Tamar Christina
@ 2024-05-16 17:53   ` Andrew Pinski
  0 siblings, 0 replies; 14+ messages in thread
From: Andrew Pinski @ 2024-05-16 17:53 UTC (permalink / raw)
  To: Tamar Christina
  Cc: Victor Do Nascimento, GCC Patches, Richard Sandiford,
	Richard Earnshaw, Victor Do Nascimento

[-- Attachment #1: Type: text/plain, Size: 13715 bytes --]

On Thu, May 16, 2024, 7:46 PM Tamar Christina <Tamar.Christina@arm.com>
wrote:

> Hi Victor,
>
> > -----Original Message-----
> > From: Victor Do Nascimento <victor.donascimento@arm.com>
> > Sent: Thursday, May 16, 2024 3:39 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Richard Earnshaw
> > <Richard.Earnshaw@arm.com>; Victor Do Nascimento
> > <vicdon01@e133397.arm.com>
> > Subject: [PATCH] middle-end: Expand {u|s}dot product support in
> autovectorizer
> >
> > From: Victor Do Nascimento <vicdon01@e133397.arm.com>
> >
> > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> > optabs for dealing with vectorizable dot product code sequences.  The
> > consequence of using a direct optab for this is that backend-pattern
> > selection is only ever able to match against one datatype - Either
> > that of the operands or of the accumulated value, never both.
> >
> > With the introduction of the 2-way (un)signed dot-product insn [1][2]
> > in AArch64 SVE2, the existing direct opcode approach is no longer
> > sufficient for full specification of all the possible dot product
> > machine instructions to be matched to the code sequence; a dot product
> > resulting in VNx4SI may result from either dot products on VNx16QI or
> > VNx8HI values for the 4- and 2-way dot product operations, respectively.
> >
> > This means that the following example fails autovectorization:
> >
> > uint32_t foo(int n, uint16_t* data) {
> >   uint32_t sum = 0;
> >   for (int i=0; i<n; i+=1) {
> >     sum += data[i] * data[i];
> >   }
> >   return sum;
> > }
> >
> > To remedy the issue a new optab is added, tentatively named
> > `udot_prod_twoway_optab', whose selection is dependent upon checking
> > of both input and output types involved in the operation.
> >
> > In order to minimize changes to the existing codebase,
> > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> > argument is added to its signature - `const_tree otype', allowing type
> > information to be specified for both input and output types.  The
> > existing nterface is retained by defining a new `optab_for_tree_code',
> > which serves as a shim to `optab_for_tree_code_1', passing old
> > parameters as-is and setting the new `optype' argument to `NULL_TREE'.
> >
> > For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> > directly, passing it both types, adding the internal logic to the
> > function to distinguish between competing optabs.
> >
> > Finally, necessary changes are made to `expand_widen_pattern_expr' to
> > ensure the new icode can be correctly selected, given the new optab.
> >
> > [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> > Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> > [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> > Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
> >
> > gcc/ChangeLog:
> >
> >       * config/aarch64/aarch64-sve2.md
> > (@aarch64_sve_<sur>dotvnx4sivnx8hi):
> >       renamed to `<sur>dot_prod_twoway_vnx8hi'.
> >       * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
> >       update icodes used in line with above rename.
>
> Please split the target specific bits from the target agnostic parts.
> I.e. this patch series should be split in two.
>
> >       * optabs-tree.cc (optab_for_tree_code_1): Renamed
> >       `optab_for_tree_code' and added new argument.
> >       (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
> >       * optabs-tree.h (optab_for_tree_code_1): New.
> >       * optabs.cc (expand_widen_pattern_expr): Expand support for
> >       DOT_PROD_EXPR patterns.
> >       * optabs.def (udot_prod_twoway_optab): New.
> >       (sdot_prod_twoway_optab): Likewise.
> >       * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
> >       support for misc optabs that use two modes.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.dg/vect/vect-dotprod-twoway.c: New.
> > ---
> >  .../aarch64/aarch64-sve-builtins-base.cc      |  4 ++--
> >  gcc/config/aarch64/aarch64-sve2.md            |  2 +-
> >  gcc/optabs-tree.cc                            | 23 ++++++++++++++++--
> >  gcc/optabs-tree.h                             |  2 ++
> >  gcc/optabs.cc                                 |  2 +-
> >  gcc/optabs.def                                |  2 ++
> >  .../gcc.dg/vect/vect-dotprod-twoway.c         | 24 +++++++++++++++++++
> >  gcc/tree-vect-patterns.cc                     |  2 +-
> >  8 files changed, 54 insertions(+), 7 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > index 0d2edf3f19e..e457db09f66 100644
> > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > @@ -764,8 +764,8 @@ public:
> >        icode = (e.type_suffix (0).float_p
> >              ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
> >              : e.type_suffix (0).unsigned_p
> > -            ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
> > -            : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
> > +            ? CODE_FOR_udot_prod_twoway_vnx8hi
> > +            : CODE_FOR_sdot_prod_twoway_vnx8hi);
> >      return e.use_unpred_insn (icode);
> >    }
> >  };
> > diff --git a/gcc/config/aarch64/aarch64-sve2.md
> b/gcc/config/aarch64/aarch64-
> > sve2.md
> > index 934e57055d3..5677de7108d 100644
> > --- a/gcc/config/aarch64/aarch64-sve2.md
> > +++ b/gcc/config/aarch64/aarch64-sve2.md
> > @@ -2021,7 +2021,7 @@ (define_insn
> > "@aarch64_sve_qsub_<sve_int_op>_lane_<mode>"
> >  )
> >
> >  ;; Two-way dot-product.
> > -(define_insn "@aarch64_sve_<sur>dotvnx4sivnx8hi"
> > +(define_insn "<sur>dot_prod_twoway_vnx8hi"
>
> This drops the @ modifier, did it have no callers?
>
> >    [(set (match_operand:VNx4SI 0 "register_operand")
> >       (plus:VNx4SI
> >         (unspec:VNx4SI
> > diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
> > index b69a5bc3676..e3c5a618ea2 100644
> > --- a/gcc/optabs-tree.cc
> > +++ b/gcc/optabs-tree.cc
> > @@ -35,8 +35,8 @@ along with GCC; see the file COPYING3.  If not see
> >     cannot give complete results for multiplication or division) but
> probably
> >     ought to be relied on more widely throughout the expander.  */
> >  optab
> > -optab_for_tree_code (enum tree_code code, const_tree type,
> > -                  enum optab_subtype subtype)
> > +optab_for_tree_code_1 (enum tree_code code, const_tree type,
> > +                    const_tree otype, enum optab_subtype subtype)
> >  {
>
> Since we're C++ now, why not just overload optab_for_tree_code.
> Both functions are useful in their own right so it seems better to
> overload them.
> By overloading it also means you don't have to change the existing call
> sites.
>
> I'd also make the optab_for_tree_code with one type just call
> optab_for_tree_code
> with the same type twice.
>
> >    bool trapv;
> >    switch (code)
> > @@ -149,6 +149,14 @@ optab_for_tree_code (enum tree_code code, const_tree
> > type,
> >
> >      case DOT_PROD_EXPR:
> >        {
> > +     if (otype && (TYPE_PRECISION (TREE_TYPE (type)) * 2
> > +                   == TYPE_PRECISION (TREE_TYPE (otype))))
> > +       {
>
> Because then you can remove the check for otype here,
> And store TREE_TYPE (type) and TREE_TYPE (otype) safely into a variable so
> the if becomes
> if (otype != type
>     && TYPE_PRECISION ...
>    && TYPE_UNSIGNED (type) == TYPE_UNSIGNED (otype)
>
> > +         if (TYPE_UNSIGNED (type) && TYPE_UNSIGNED (otype))
> > +           return udot_prod_twoway_optab;
> > +         if (!TYPE_UNSIGNED (type) && !TYPE_UNSIGNED (otype))
> > +           return sdot_prod_twoway_optab;
> > +       }
>
> And you can then simplify this into
>
> return (TYPE_UNSIGNED (otype) ? udot_prod_twoway_optab :
> sdot_prod_twoway_optab;
>
> >       if (subtype == optab_vector_mixed_sign)
> >         return usdot_prod_optab;
> >
> > @@ -285,6 +293,17 @@ optab_for_tree_code (enum tree_code code, const_tree
> > type,
> >      }
> >  }
> >
> > +/* Return the optab used for computing the operation given by the tree
> code,
> > +   CODE and the tree EXP.  This function is not always usable (for
> example, it
> > +   cannot give complete results for multiplication or division) but
> probably
> > +   ought to be relied on more widely throughout the expander.  */
> > +optab
> > +optab_for_tree_code (enum tree_code code, const_tree type,
> > +                  enum optab_subtype subtype)
> > +{
> > +  return optab_for_tree_code_1 (code, type, NULL_TREE, subtype);
> > +}
> > +
> >  /* Check whether an operation represented by CODE is a 'half' widening
> operation
> >     in which the input vector type has half the number of bits of the
> output
> >     vector type e.g. V8QI->V8HI.
> > diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> > index f2b49991462..13ff7ca2e4b 100644
> > --- a/gcc/optabs-tree.h
> > +++ b/gcc/optabs-tree.h
> > @@ -36,6 +36,8 @@ enum optab_subtype
> >  /* Return the optab used for computing the given operation on the type
> given by
> >     the second argument.  The third argument distinguishes between the
> types of
> >     vector shifts and rotates.  */
> > +optab optab_for_tree_code_1 (enum tree_code, const_tree, const_tree
> > __attribute__((unused)),
> > +                         enum optab_subtype );
> >  optab optab_for_tree_code (enum tree_code, const_tree, enum
> optab_subtype);
> >  bool
> >  supportable_half_widening_operation (enum tree_code, tree, tree,
> > diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> > index ce91f94ed43..3a1c6c7b90e 100644
> > --- a/gcc/optabs.cc
> > +++ b/gcc/optabs.cc
> > @@ -311,7 +311,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx
> > op1, rtx wide_op,
> >       gcc_unreachable ();
> >
> >        widen_pattern_optab
> > -     = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> > +     = optab_for_tree_code_1 (ops->code, TREE_TYPE (oprnd0), TREE_TYPE
> > (oprnd2), subtype);
> >      }
> >    else
> >      widen_pattern_optab
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index ad14f9328b9..cf1a6e7a7dc 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -408,6 +408,8 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
> >  OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
> >  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> >  OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> > +OPTAB_D (sdot_prod_twoway_optab, "sdot_prod_twoway_$I$a")
> > +OPTAB_D (udot_prod_twoway_optab, "udot_prod_twoway_$I$a")
> >  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
> >  OPTAB_D (usad_optab, "usad$I$a")
> >  OPTAB_D (ssad_optab, "ssad$I$a")
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > new file mode 100644
> > index 00000000000..cba2aadbec8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > @@ -0,0 +1,24 @@
> > +/* { dg-do compile { target { aarch64*-*-* } } } */
> > +/* { dg-additional-options "-march=-O3 -march=armv9.2-a+sme2
> -fdump-tree-
> > vect-details" { target { aarch64*-*-* } } } */
>
> Does this work? -march=-O3? I am guessing you're getting lucky that the
> later -march overrides it?
>

Yes the later overrides the first one. I did report that as a bug a few
years back, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103422 .

Thanks,
Andrew


> For tests in the vectorizer testsuite you shouldn't give optimization
> options as the tests
> are ran with -fno-vect-cost-model.  I.e. we'll vectorize even if
> unprofitable.
>
> You also shouldn't add the -fdump-tree-vect-details, the vectorizer
> testsuite adds it
> automatically.
>
> > +
> > +#include <stdint.h>
> > +
> > +uint32_t udot2(int n, uint16_t* data) {
> > +  uint32_t sum = 0;
> > +  for (int i=0; i<n; i+=1) {
> > +    sum += data[i] * data[i];
> > +  }
> > +  return sum;
> > +}
> > +
> > +int32_t sdot2(int n, int16_t* data) {
> > +  int32_t sum = 0;
> > +  for (int i=0; i<n; i+=1) {
> > +    sum += data[i] * data[i];
> > +  }
> > +  return sum;
> > +}
>
> So you have an AArch64 only test and place it in generic folder.
> You should either make it aarch64 only or make the test a generic test.
>
> Normally for these I add a generic test that scans for the vectorizer
> scans as you've done below
> And I then add an aach64 specific test with check-functions-bodies that
> checks that we actually
> produced the right instructions.
>
> Your test at the moment do not actually guarantee that the instruction
> gets generated.
>
> Look at how the tests for 4 way dot product are written for inspiration
> here.
>
> > +
> > +/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern:
> detected:" 2
> > "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern:
> detected"
> > 4 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } }
> */
> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > index dfb7d800526..0760f25d94d 100644
> > --- a/gcc/tree-vect-patterns.cc
> > +++ b/gcc/tree-vect-patterns.cc
> > @@ -233,7 +233,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo,
> tree
> > otype, tree_code code,
> >    if (!vecotype)
> >      return false;
> >
> > -  optab optab = optab_for_tree_code (code, vecitype, subtype);
> > +  optab optab = optab_for_tree_code_1 (code, vecitype, vecotype,
> subtype);
>
> If you overload the function then these changes go away.
>
> Thanks for working on this,
> Tamar
>
> >    if (!optab)
> >      return false;
> >
> > --
> > 2.34.1
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
  2024-05-16 14:39 [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer Victor Do Nascimento
  2024-05-16 14:45 ` Andrew Pinski
  2024-05-16 17:45 ` Tamar Christina
@ 2024-05-17  2:08 ` Hongtao Liu
  2024-05-17  2:13   ` Hongtao Liu
  2024-05-17  5:51 ` Richard Biener
  3 siblings, 1 reply; 14+ messages in thread
From: Hongtao Liu @ 2024-05-17  2:08 UTC (permalink / raw)
  To: Victor Do Nascimento
  Cc: gcc-patches, richard.sandiford, Richard.Earnshaw, Victor Do Nascimento

On Thu, May 16, 2024 at 10:40 PM Victor Do Nascimento
<victor.donascimento@arm.com> wrote:
>
> From: Victor Do Nascimento <vicdon01@e133397.arm.com>
>
> At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> optabs for dealing with vectorizable dot product code sequences.  The
> consequence of using a direct optab for this is that backend-pattern
> selection is only ever able to match against one datatype - Either
> that of the operands or of the accumulated value, never both.
>
> With the introduction of the 2-way (un)signed dot-product insn [1][2]
> in AArch64 SVE2, the existing direct opcode approach is no longer
> sufficient for full specification of all the possible dot product
> machine instructions to be matched to the code sequence; a dot product
> resulting in VNx4SI may result from either dot products on VNx16QI or
> VNx8HI values for the 4- and 2-way dot product operations, respectively.
>
> This means that the following example fails autovectorization:
>
> uint32_t foo(int n, uint16_t* data) {
>   uint32_t sum = 0;
>   for (int i=0; i<n; i+=1) {
>     sum += data[i] * data[i];
>   }
>   return sum;
> }
>
Sorry to chime in, for x86 backend, we defined usdot_prodv16hi, and
2-way dot_prod operations can be generated

> To remedy the issue a new optab is added, tentatively named
> `udot_prod_twoway_optab', whose selection is dependent upon checking
> of both input and output types involved in the operation.
>
> In order to minimize changes to the existing codebase,
> `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> argument is added to its signature - `const_tree otype', allowing type
> information to be specified for both input and output types.  The
> existing nterface is retained by defining a new `optab_for_tree_code',
> which serves as a shim to `optab_for_tree_code_1', passing old
> parameters as-is and setting the new `optype' argument to `NULL_TREE'.
>
> For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> directly, passing it both types, adding the internal logic to the
> function to distinguish between competing optabs.
>
> Finally, necessary changes are made to `expand_widen_pattern_expr' to
> ensure the new icode can be correctly selected, given the new optab.
>
> [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
>
> gcc/ChangeLog:
>
>         * config/aarch64/aarch64-sve2.md (@aarch64_sve_<sur>dotvnx4sivnx8hi):
>         renamed to `<sur>dot_prod_twoway_vnx8hi'.
>         * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
>         update icodes used in line with above rename.
>         * optabs-tree.cc (optab_for_tree_code_1): Renamed
>         `optab_for_tree_code' and added new argument.
>         (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
>         * optabs-tree.h (optab_for_tree_code_1): New.
>         * optabs.cc (expand_widen_pattern_expr): Expand support for
>         DOT_PROD_EXPR patterns.
>         * optabs.def (udot_prod_twoway_optab): New.
>         (sdot_prod_twoway_optab): Likewise.
>         * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
>         support for misc optabs that use two modes.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/vect/vect-dotprod-twoway.c: New.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc      |  4 ++--
>  gcc/config/aarch64/aarch64-sve2.md            |  2 +-
>  gcc/optabs-tree.cc                            | 23 ++++++++++++++++--
>  gcc/optabs-tree.h                             |  2 ++
>  gcc/optabs.cc                                 |  2 +-
>  gcc/optabs.def                                |  2 ++
>  .../gcc.dg/vect/vect-dotprod-twoway.c         | 24 +++++++++++++++++++
>  gcc/tree-vect-patterns.cc                     |  2 +-
>  8 files changed, 54 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 0d2edf3f19e..e457db09f66 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -764,8 +764,8 @@ public:
>        icode = (e.type_suffix (0).float_p
>                ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
>                : e.type_suffix (0).unsigned_p
> -              ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
> -              : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
> +              ? CODE_FOR_udot_prod_twoway_vnx8hi
> +              : CODE_FOR_sdot_prod_twoway_vnx8hi);
>      return e.use_unpred_insn (icode);
>    }
>  };
> diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
> index 934e57055d3..5677de7108d 100644
> --- a/gcc/config/aarch64/aarch64-sve2.md
> +++ b/gcc/config/aarch64/aarch64-sve2.md
> @@ -2021,7 +2021,7 @@ (define_insn "@aarch64_sve_qsub_<sve_int_op>_lane_<mode>"
>  )
>
>  ;; Two-way dot-product.
> -(define_insn "@aarch64_sve_<sur>dotvnx4sivnx8hi"
> +(define_insn "<sur>dot_prod_twoway_vnx8hi"
>    [(set (match_operand:VNx4SI 0 "register_operand")
>         (plus:VNx4SI
>           (unspec:VNx4SI
> diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
> index b69a5bc3676..e3c5a618ea2 100644
> --- a/gcc/optabs-tree.cc
> +++ b/gcc/optabs-tree.cc
> @@ -35,8 +35,8 @@ along with GCC; see the file COPYING3.  If not see
>     cannot give complete results for multiplication or division) but probably
>     ought to be relied on more widely throughout the expander.  */
>  optab
> -optab_for_tree_code (enum tree_code code, const_tree type,
> -                    enum optab_subtype subtype)
> +optab_for_tree_code_1 (enum tree_code code, const_tree type,
> +                      const_tree otype, enum optab_subtype subtype)
>  {
>    bool trapv;
>    switch (code)
> @@ -149,6 +149,14 @@ optab_for_tree_code (enum tree_code code, const_tree type,
>
>      case DOT_PROD_EXPR:
>        {
> +       if (otype && (TYPE_PRECISION (TREE_TYPE (type)) * 2
> +                     == TYPE_PRECISION (TREE_TYPE (otype))))
> +         {
> +           if (TYPE_UNSIGNED (type) && TYPE_UNSIGNED (otype))
> +             return udot_prod_twoway_optab;
> +           if (!TYPE_UNSIGNED (type) && !TYPE_UNSIGNED (otype))
> +             return sdot_prod_twoway_optab;
> +         }
>         if (subtype == optab_vector_mixed_sign)
>           return usdot_prod_optab;
>
> @@ -285,6 +293,17 @@ optab_for_tree_code (enum tree_code code, const_tree type,
>      }
>  }
>
> +/* Return the optab used for computing the operation given by the tree code,
> +   CODE and the tree EXP.  This function is not always usable (for example, it
> +   cannot give complete results for multiplication or division) but probably
> +   ought to be relied on more widely throughout the expander.  */
> +optab
> +optab_for_tree_code (enum tree_code code, const_tree type,
> +                    enum optab_subtype subtype)
> +{
> +  return optab_for_tree_code_1 (code, type, NULL_TREE, subtype);
> +}
> +
>  /* Check whether an operation represented by CODE is a 'half' widening operation
>     in which the input vector type has half the number of bits of the output
>     vector type e.g. V8QI->V8HI.
> diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> index f2b49991462..13ff7ca2e4b 100644
> --- a/gcc/optabs-tree.h
> +++ b/gcc/optabs-tree.h
> @@ -36,6 +36,8 @@ enum optab_subtype
>  /* Return the optab used for computing the given operation on the type given by
>     the second argument.  The third argument distinguishes between the types of
>     vector shifts and rotates.  */
> +optab optab_for_tree_code_1 (enum tree_code, const_tree, const_tree __attribute__((unused)),
> +                           enum optab_subtype );
>  optab optab_for_tree_code (enum tree_code, const_tree, enum optab_subtype);
>  bool
>  supportable_half_widening_operation (enum tree_code, tree, tree,
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index ce91f94ed43..3a1c6c7b90e 100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -311,7 +311,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
>         gcc_unreachable ();
>
>        widen_pattern_optab
> -       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> +       = optab_for_tree_code_1 (ops->code, TREE_TYPE (oprnd0), TREE_TYPE (oprnd2), subtype);
>      }
>    else
>      widen_pattern_optab
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..cf1a6e7a7dc 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -408,6 +408,8 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
>  OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
>  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
>  OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> +OPTAB_D (sdot_prod_twoway_optab, "sdot_prod_twoway_$I$a")
> +OPTAB_D (udot_prod_twoway_optab, "udot_prod_twoway_$I$a")
>  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
>  OPTAB_D (usad_optab, "usad$I$a")
>  OPTAB_D (ssad_optab, "ssad$I$a")
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> new file mode 100644
> index 00000000000..cba2aadbec8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile { target { aarch64*-*-* } } } */
> +/* { dg-additional-options "-march=-O3 -march=armv9.2-a+sme2 -fdump-tree-vect-details" { target { aarch64*-*-* } } } */
> +
> +#include <stdint.h>
> +
> +uint32_t udot2(int n, uint16_t* data) {
> +  uint32_t sum = 0;
> +  for (int i=0; i<n; i+=1) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}
> +
> +int32_t sdot2(int n, int16_t* data) {
> +  int32_t sum = 0;
> +  for (int i=0; i<n; i+=1) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected:" 2 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 4 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index dfb7d800526..0760f25d94d 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -233,7 +233,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
>    if (!vecotype)
>      return false;
>
> -  optab optab = optab_for_tree_code (code, vecitype, subtype);
> +  optab optab = optab_for_tree_code_1 (code, vecitype, vecotype, subtype);
>    if (!optab)
>      return false;
>
> --
> 2.34.1
>


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
  2024-05-17  2:08 ` Hongtao Liu
@ 2024-05-17  2:13   ` Hongtao Liu
  2024-05-17  9:09     ` Tamar Christina
  0 siblings, 1 reply; 14+ messages in thread
From: Hongtao Liu @ 2024-05-17  2:13 UTC (permalink / raw)
  To: Victor Do Nascimento
  Cc: gcc-patches, richard.sandiford, Richard.Earnshaw, Victor Do Nascimento

> >
> Sorry to chime in, for x86 backend, we defined usdot_prodv16hi, and
> 2-way dot_prod operations can be generated
>
This is the link https://godbolt.org/z/hcWr64vx3, x86 define
udot_prodv16qi/udot_prod8hi and both 2-way and 4-way dot_prod
instructions are generated


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
  2024-05-16 14:39 [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer Victor Do Nascimento
                   ` (2 preceding siblings ...)
  2024-05-17  2:08 ` Hongtao Liu
@ 2024-05-17  5:51 ` Richard Biener
  2024-05-17  9:04   ` Tamar Christina
  3 siblings, 1 reply; 14+ messages in thread
From: Richard Biener @ 2024-05-17  5:51 UTC (permalink / raw)
  To: Victor Do Nascimento
  Cc: gcc-patches, richard.sandiford, Richard.Earnshaw, Victor Do Nascimento

On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento
<victor.donascimento@arm.com> wrote:
>
> From: Victor Do Nascimento <vicdon01@e133397.arm.com>
>
> At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> optabs for dealing with vectorizable dot product code sequences.  The
> consequence of using a direct optab for this is that backend-pattern
> selection is only ever able to match against one datatype - Either
> that of the operands or of the accumulated value, never both.
>
> With the introduction of the 2-way (un)signed dot-product insn [1][2]
> in AArch64 SVE2, the existing direct opcode approach is no longer
> sufficient for full specification of all the possible dot product
> machine instructions to be matched to the code sequence; a dot product
> resulting in VNx4SI may result from either dot products on VNx16QI or
> VNx8HI values for the 4- and 2-way dot product operations, respectively.
>
> This means that the following example fails autovectorization:
>
> uint32_t foo(int n, uint16_t* data) {
>   uint32_t sum = 0;
>   for (int i=0; i<n; i+=1) {
>     sum += data[i] * data[i];
>   }
>   return sum;
> }
>
> To remedy the issue a new optab is added, tentatively named
> `udot_prod_twoway_optab', whose selection is dependent upon checking
> of both input and output types involved in the operation.

I don't like this too much.  I'll note we document dot_prod as

@cindex @code{sdot_prod@var{m}} instruction pattern
@item @samp{sdot_prod@var{m}}

Compute the sum of the products of two signed elements.
Operand 1 and operand 2 are of the same mode. Their
product, which is of a wider mode, is computed and added to operand 3.
Operand 3 is of a mode equal or wider than the mode of the product. The
result is placed in operand 0, which is of the same mode as operand 3.
@var{m} is the mode of operand 1 and operand 2.

with no restriction on the wider mode but we don't specify it which is
bad design.  This should have been a convert optab with two modes
from the start - adding a _twoway variant is just a hack.

Richard.

> In order to minimize changes to the existing codebase,
> `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> argument is added to its signature - `const_tree otype', allowing type
> information to be specified for both input and output types.  The
> existing nterface is retained by defining a new `optab_for_tree_code',
> which serves as a shim to `optab_for_tree_code_1', passing old
> parameters as-is and setting the new `optype' argument to `NULL_TREE'.
>
> For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> directly, passing it both types, adding the internal logic to the
> function to distinguish between competing optabs.
>
> Finally, necessary changes are made to `expand_widen_pattern_expr' to
> ensure the new icode can be correctly selected, given the new optab.
>
> [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
>
> gcc/ChangeLog:
>
>         * config/aarch64/aarch64-sve2.md (@aarch64_sve_<sur>dotvnx4sivnx8hi):
>         renamed to `<sur>dot_prod_twoway_vnx8hi'.
>         * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
>         update icodes used in line with above rename.
>         * optabs-tree.cc (optab_for_tree_code_1): Renamed
>         `optab_for_tree_code' and added new argument.
>         (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
>         * optabs-tree.h (optab_for_tree_code_1): New.
>         * optabs.cc (expand_widen_pattern_expr): Expand support for
>         DOT_PROD_EXPR patterns.
>         * optabs.def (udot_prod_twoway_optab): New.
>         (sdot_prod_twoway_optab): Likewise.
>         * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
>         support for misc optabs that use two modes.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/vect/vect-dotprod-twoway.c: New.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc      |  4 ++--
>  gcc/config/aarch64/aarch64-sve2.md            |  2 +-
>  gcc/optabs-tree.cc                            | 23 ++++++++++++++++--
>  gcc/optabs-tree.h                             |  2 ++
>  gcc/optabs.cc                                 |  2 +-
>  gcc/optabs.def                                |  2 ++
>  .../gcc.dg/vect/vect-dotprod-twoway.c         | 24 +++++++++++++++++++
>  gcc/tree-vect-patterns.cc                     |  2 +-
>  8 files changed, 54 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 0d2edf3f19e..e457db09f66 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -764,8 +764,8 @@ public:
>        icode = (e.type_suffix (0).float_p
>                ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
>                : e.type_suffix (0).unsigned_p
> -              ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
> -              : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
> +              ? CODE_FOR_udot_prod_twoway_vnx8hi
> +              : CODE_FOR_sdot_prod_twoway_vnx8hi);
>      return e.use_unpred_insn (icode);
>    }
>  };
> diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
> index 934e57055d3..5677de7108d 100644
> --- a/gcc/config/aarch64/aarch64-sve2.md
> +++ b/gcc/config/aarch64/aarch64-sve2.md
> @@ -2021,7 +2021,7 @@ (define_insn "@aarch64_sve_qsub_<sve_int_op>_lane_<mode>"
>  )
>
>  ;; Two-way dot-product.
> -(define_insn "@aarch64_sve_<sur>dotvnx4sivnx8hi"
> +(define_insn "<sur>dot_prod_twoway_vnx8hi"
>    [(set (match_operand:VNx4SI 0 "register_operand")
>         (plus:VNx4SI
>           (unspec:VNx4SI
> diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
> index b69a5bc3676..e3c5a618ea2 100644
> --- a/gcc/optabs-tree.cc
> +++ b/gcc/optabs-tree.cc
> @@ -35,8 +35,8 @@ along with GCC; see the file COPYING3.  If not see
>     cannot give complete results for multiplication or division) but probably
>     ought to be relied on more widely throughout the expander.  */
>  optab
> -optab_for_tree_code (enum tree_code code, const_tree type,
> -                    enum optab_subtype subtype)
> +optab_for_tree_code_1 (enum tree_code code, const_tree type,
> +                      const_tree otype, enum optab_subtype subtype)
>  {
>    bool trapv;
>    switch (code)
> @@ -149,6 +149,14 @@ optab_for_tree_code (enum tree_code code, const_tree type,
>
>      case DOT_PROD_EXPR:
>        {
> +       if (otype && (TYPE_PRECISION (TREE_TYPE (type)) * 2
> +                     == TYPE_PRECISION (TREE_TYPE (otype))))
> +         {
> +           if (TYPE_UNSIGNED (type) && TYPE_UNSIGNED (otype))
> +             return udot_prod_twoway_optab;
> +           if (!TYPE_UNSIGNED (type) && !TYPE_UNSIGNED (otype))
> +             return sdot_prod_twoway_optab;
> +         }
>         if (subtype == optab_vector_mixed_sign)
>           return usdot_prod_optab;
>
> @@ -285,6 +293,17 @@ optab_for_tree_code (enum tree_code code, const_tree type,
>      }
>  }
>
> +/* Return the optab used for computing the operation given by the tree code,
> +   CODE and the tree EXP.  This function is not always usable (for example, it
> +   cannot give complete results for multiplication or division) but probably
> +   ought to be relied on more widely throughout the expander.  */
> +optab
> +optab_for_tree_code (enum tree_code code, const_tree type,
> +                    enum optab_subtype subtype)
> +{
> +  return optab_for_tree_code_1 (code, type, NULL_TREE, subtype);
> +}
> +
>  /* Check whether an operation represented by CODE is a 'half' widening operation
>     in which the input vector type has half the number of bits of the output
>     vector type e.g. V8QI->V8HI.
> diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> index f2b49991462..13ff7ca2e4b 100644
> --- a/gcc/optabs-tree.h
> +++ b/gcc/optabs-tree.h
> @@ -36,6 +36,8 @@ enum optab_subtype
>  /* Return the optab used for computing the given operation on the type given by
>     the second argument.  The third argument distinguishes between the types of
>     vector shifts and rotates.  */
> +optab optab_for_tree_code_1 (enum tree_code, const_tree, const_tree __attribute__((unused)),
> +                           enum optab_subtype );
>  optab optab_for_tree_code (enum tree_code, const_tree, enum optab_subtype);
>  bool
>  supportable_half_widening_operation (enum tree_code, tree, tree,
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index ce91f94ed43..3a1c6c7b90e 100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -311,7 +311,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
>         gcc_unreachable ();
>
>        widen_pattern_optab
> -       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> +       = optab_for_tree_code_1 (ops->code, TREE_TYPE (oprnd0), TREE_TYPE (oprnd2), subtype);
>      }
>    else
>      widen_pattern_optab
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..cf1a6e7a7dc 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -408,6 +408,8 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
>  OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
>  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
>  OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> +OPTAB_D (sdot_prod_twoway_optab, "sdot_prod_twoway_$I$a")
> +OPTAB_D (udot_prod_twoway_optab, "udot_prod_twoway_$I$a")
>  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
>  OPTAB_D (usad_optab, "usad$I$a")
>  OPTAB_D (ssad_optab, "ssad$I$a")
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> new file mode 100644
> index 00000000000..cba2aadbec8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile { target { aarch64*-*-* } } } */
> +/* { dg-additional-options "-march=-O3 -march=armv9.2-a+sme2 -fdump-tree-vect-details" { target { aarch64*-*-* } } } */
> +
> +#include <stdint.h>
> +
> +uint32_t udot2(int n, uint16_t* data) {
> +  uint32_t sum = 0;
> +  for (int i=0; i<n; i+=1) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}
> +
> +int32_t sdot2(int n, int16_t* data) {
> +  int32_t sum = 0;
> +  for (int i=0; i<n; i+=1) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected:" 2 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 4 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index dfb7d800526..0760f25d94d 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -233,7 +233,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
>    if (!vecotype)
>      return false;
>
> -  optab optab = optab_for_tree_code (code, vecitype, subtype);
> +  optab optab = optab_for_tree_code_1 (code, vecitype, vecotype, subtype);
>    if (!optab)
>      return false;
>
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
  2024-05-17  5:51 ` Richard Biener
@ 2024-05-17  9:04   ` Tamar Christina
  2024-05-17  9:46     ` Richard Biener
  0 siblings, 1 reply; 14+ messages in thread
From: Tamar Christina @ 2024-05-17  9:04 UTC (permalink / raw)
  To: Richard Biener, Victor Do Nascimento
  Cc: gcc-patches, Richard Sandiford, Richard Earnshaw, Victor Do Nascimento

> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Friday, May 17, 2024 6:51 AM
> To: Victor Do Nascimento <Victor.DoNascimento@arm.com>
> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford <Richard.Sandiford@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Victor Do Nascimento
> <vicdon01@e133397.arm.com>
> Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
> autovectorizer
> 
> On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento
> <victor.donascimento@arm.com> wrote:
> >
> > From: Victor Do Nascimento <vicdon01@e133397.arm.com>
> >
> > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> > optabs for dealing with vectorizable dot product code sequences.  The
> > consequence of using a direct optab for this is that backend-pattern
> > selection is only ever able to match against one datatype - Either
> > that of the operands or of the accumulated value, never both.
> >
> > With the introduction of the 2-way (un)signed dot-product insn [1][2]
> > in AArch64 SVE2, the existing direct opcode approach is no longer
> > sufficient for full specification of all the possible dot product
> > machine instructions to be matched to the code sequence; a dot product
> > resulting in VNx4SI may result from either dot products on VNx16QI or
> > VNx8HI values for the 4- and 2-way dot product operations, respectively.
> >
> > This means that the following example fails autovectorization:
> >
> > uint32_t foo(int n, uint16_t* data) {
> >   uint32_t sum = 0;
> >   for (int i=0; i<n; i+=1) {
> >     sum += data[i] * data[i];
> >   }
> >   return sum;
> > }
> >
> > To remedy the issue a new optab is added, tentatively named
> > `udot_prod_twoway_optab', whose selection is dependent upon checking
> > of both input and output types involved in the operation.
> 
> I don't like this too much.  I'll note we document dot_prod as
> 
> @cindex @code{sdot_prod@var{m}} instruction pattern
> @item @samp{sdot_prod@var{m}}
> 
> Compute the sum of the products of two signed elements.
> Operand 1 and operand 2 are of the same mode. Their
> product, which is of a wider mode, is computed and added to operand 3.
> Operand 3 is of a mode equal or wider than the mode of the product. The
> result is placed in operand 0, which is of the same mode as operand 3.
> @var{m} is the mode of operand 1 and operand 2.
> 
> with no restriction on the wider mode but we don't specify it which is
> bad design.  This should have been a convert optab with two modes
> from the start - adding a _twoway variant is just a hack.

We did discuss this at the time we started implementing it.  There was two
options, one was indeed to change it to a convert dot_prod optab, but doing
this means we have to update every target that uses it.

Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and altivec.

Which sure could be possible, but there's also every use in the backends that need
to be updated, and tested, which for some targets we don't even know how to begin.

So it seems very hard to correct dotprod to a convert optab now.

Tamar

> 
> Richard.
> 
> > In order to minimize changes to the existing codebase,
> > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> > argument is added to its signature - `const_tree otype', allowing type
> > information to be specified for both input and output types.  The
> > existing nterface is retained by defining a new `optab_for_tree_code',
> > which serves as a shim to `optab_for_tree_code_1', passing old
> > parameters as-is and setting the new `optype' argument to `NULL_TREE'.
> >
> > For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> > directly, passing it both types, adding the internal logic to the
> > function to distinguish between competing optabs.
> >
> > Finally, necessary changes are made to `expand_widen_pattern_expr' to
> > ensure the new icode can be correctly selected, given the new optab.
> >
> > [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> > [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
> >
> > gcc/ChangeLog:
> >
> >         * config/aarch64/aarch64-sve2.md (@aarch64_sve_<sur>dotvnx4sivnx8hi):
> >         renamed to `<sur>dot_prod_twoway_vnx8hi'.
> >         * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
> >         update icodes used in line with above rename.
> >         * optabs-tree.cc (optab_for_tree_code_1): Renamed
> >         `optab_for_tree_code' and added new argument.
> >         (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
> >         * optabs-tree.h (optab_for_tree_code_1): New.
> >         * optabs.cc (expand_widen_pattern_expr): Expand support for
> >         DOT_PROD_EXPR patterns.
> >         * optabs.def (udot_prod_twoway_optab): New.
> >         (sdot_prod_twoway_optab): Likewise.
> >         * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
> >         support for misc optabs that use two modes.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.dg/vect/vect-dotprod-twoway.c: New.
> > ---
> >  .../aarch64/aarch64-sve-builtins-base.cc      |  4 ++--
> >  gcc/config/aarch64/aarch64-sve2.md            |  2 +-
> >  gcc/optabs-tree.cc                            | 23 ++++++++++++++++--
> >  gcc/optabs-tree.h                             |  2 ++
> >  gcc/optabs.cc                                 |  2 +-
> >  gcc/optabs.def                                |  2 ++
> >  .../gcc.dg/vect/vect-dotprod-twoway.c         | 24 +++++++++++++++++++
> >  gcc/tree-vect-patterns.cc                     |  2 +-
> >  8 files changed, 54 insertions(+), 7 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > index 0d2edf3f19e..e457db09f66 100644
> > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > @@ -764,8 +764,8 @@ public:
> >        icode = (e.type_suffix (0).float_p
> >                ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
> >                : e.type_suffix (0).unsigned_p
> > -              ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
> > -              : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
> > +              ? CODE_FOR_udot_prod_twoway_vnx8hi
> > +              : CODE_FOR_sdot_prod_twoway_vnx8hi);
> >      return e.use_unpred_insn (icode);
> >    }
> >  };
> > diff --git a/gcc/config/aarch64/aarch64-sve2.md
> b/gcc/config/aarch64/aarch64-sve2.md
> > index 934e57055d3..5677de7108d 100644
> > --- a/gcc/config/aarch64/aarch64-sve2.md
> > +++ b/gcc/config/aarch64/aarch64-sve2.md
> > @@ -2021,7 +2021,7 @@ (define_insn
> "@aarch64_sve_qsub_<sve_int_op>_lane_<mode>"
> >  )
> >
> >  ;; Two-way dot-product.
> > -(define_insn "@aarch64_sve_<sur>dotvnx4sivnx8hi"
> > +(define_insn "<sur>dot_prod_twoway_vnx8hi"
> >    [(set (match_operand:VNx4SI 0 "register_operand")
> >         (plus:VNx4SI
> >           (unspec:VNx4SI
> > diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
> > index b69a5bc3676..e3c5a618ea2 100644
> > --- a/gcc/optabs-tree.cc
> > +++ b/gcc/optabs-tree.cc
> > @@ -35,8 +35,8 @@ along with GCC; see the file COPYING3.  If not see
> >     cannot give complete results for multiplication or division) but probably
> >     ought to be relied on more widely throughout the expander.  */
> >  optab
> > -optab_for_tree_code (enum tree_code code, const_tree type,
> > -                    enum optab_subtype subtype)
> > +optab_for_tree_code_1 (enum tree_code code, const_tree type,
> > +                      const_tree otype, enum optab_subtype subtype)
> >  {
> >    bool trapv;
> >    switch (code)
> > @@ -149,6 +149,14 @@ optab_for_tree_code (enum tree_code code,
> const_tree type,
> >
> >      case DOT_PROD_EXPR:
> >        {
> > +       if (otype && (TYPE_PRECISION (TREE_TYPE (type)) * 2
> > +                     == TYPE_PRECISION (TREE_TYPE (otype))))
> > +         {
> > +           if (TYPE_UNSIGNED (type) && TYPE_UNSIGNED (otype))
> > +             return udot_prod_twoway_optab;
> > +           if (!TYPE_UNSIGNED (type) && !TYPE_UNSIGNED (otype))
> > +             return sdot_prod_twoway_optab;
> > +         }
> >         if (subtype == optab_vector_mixed_sign)
> >           return usdot_prod_optab;
> >
> > @@ -285,6 +293,17 @@ optab_for_tree_code (enum tree_code code,
> const_tree type,
> >      }
> >  }
> >
> > +/* Return the optab used for computing the operation given by the tree code,
> > +   CODE and the tree EXP.  This function is not always usable (for example, it
> > +   cannot give complete results for multiplication or division) but probably
> > +   ought to be relied on more widely throughout the expander.  */
> > +optab
> > +optab_for_tree_code (enum tree_code code, const_tree type,
> > +                    enum optab_subtype subtype)
> > +{
> > +  return optab_for_tree_code_1 (code, type, NULL_TREE, subtype);
> > +}
> > +
> >  /* Check whether an operation represented by CODE is a 'half' widening
> operation
> >     in which the input vector type has half the number of bits of the output
> >     vector type e.g. V8QI->V8HI.
> > diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> > index f2b49991462..13ff7ca2e4b 100644
> > --- a/gcc/optabs-tree.h
> > +++ b/gcc/optabs-tree.h
> > @@ -36,6 +36,8 @@ enum optab_subtype
> >  /* Return the optab used for computing the given operation on the type given
> by
> >     the second argument.  The third argument distinguishes between the types of
> >     vector shifts and rotates.  */
> > +optab optab_for_tree_code_1 (enum tree_code, const_tree, const_tree
> __attribute__((unused)),
> > +                           enum optab_subtype );
> >  optab optab_for_tree_code (enum tree_code, const_tree, enum
> optab_subtype);
> >  bool
> >  supportable_half_widening_operation (enum tree_code, tree, tree,
> > diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> > index ce91f94ed43..3a1c6c7b90e 100644
> > --- a/gcc/optabs.cc
> > +++ b/gcc/optabs.cc
> > @@ -311,7 +311,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx
> op1, rtx wide_op,
> >         gcc_unreachable ();
> >
> >        widen_pattern_optab
> > -       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> > +       = optab_for_tree_code_1 (ops->code, TREE_TYPE (oprnd0), TREE_TYPE
> (oprnd2), subtype);
> >      }
> >    else
> >      widen_pattern_optab
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index ad14f9328b9..cf1a6e7a7dc 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -408,6 +408,8 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
> >  OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
> >  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> >  OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> > +OPTAB_D (sdot_prod_twoway_optab, "sdot_prod_twoway_$I$a")
> > +OPTAB_D (udot_prod_twoway_optab, "udot_prod_twoway_$I$a")
> >  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
> >  OPTAB_D (usad_optab, "usad$I$a")
> >  OPTAB_D (ssad_optab, "ssad$I$a")
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > new file mode 100644
> > index 00000000000..cba2aadbec8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > @@ -0,0 +1,24 @@
> > +/* { dg-do compile { target { aarch64*-*-* } } } */
> > +/* { dg-additional-options "-march=-O3 -march=armv9.2-a+sme2 -fdump-tree-
> vect-details" { target { aarch64*-*-* } } } */
> > +
> > +#include <stdint.h>
> > +
> > +uint32_t udot2(int n, uint16_t* data) {
> > +  uint32_t sum = 0;
> > +  for (int i=0; i<n; i+=1) {
> > +    sum += data[i] * data[i];
> > +  }
> > +  return sum;
> > +}
> > +
> > +int32_t sdot2(int n, int16_t* data) {
> > +  int32_t sum = 0;
> > +  for (int i=0; i<n; i+=1) {
> > +    sum += data[i] * data[i];
> > +  }
> > +  return sum;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected:"
> 2 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern:
> detected" 4 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > index dfb7d800526..0760f25d94d 100644
> > --- a/gcc/tree-vect-patterns.cc
> > +++ b/gcc/tree-vect-patterns.cc
> > @@ -233,7 +233,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree
> otype, tree_code code,
> >    if (!vecotype)
> >      return false;
> >
> > -  optab optab = optab_for_tree_code (code, vecitype, subtype);
> > +  optab optab = optab_for_tree_code_1 (code, vecitype, vecotype, subtype);
> >    if (!optab)
> >      return false;
> >
> > --
> > 2.34.1
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
  2024-05-17  2:13   ` Hongtao Liu
@ 2024-05-17  9:09     ` Tamar Christina
  0 siblings, 0 replies; 14+ messages in thread
From: Tamar Christina @ 2024-05-17  9:09 UTC (permalink / raw)
  To: Hongtao Liu, Victor Do Nascimento
  Cc: gcc-patches, Richard Sandiford, Richard Earnshaw, Victor Do Nascimento

> -----Original Message-----
> From: Hongtao Liu <crazylht@gmail.com>
> Sent: Friday, May 17, 2024 3:14 AM
> To: Victor Do Nascimento <Victor.DoNascimento@arm.com>
> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford <Richard.Sandiford@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Victor Do Nascimento
> <vicdon01@e133397.arm.com>
> Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
> autovectorizer
> 
> > >
> > Sorry to chime in, for x86 backend, we defined usdot_prodv16hi, and
> > 2-way dot_prod operations can be generated
> >
> This is the link https://godbolt.org/z/hcWr64vx3, x86 define
> udot_prodv16qi/udot_prod8hi and both 2-way and 4-way dot_prod
> instructions are generated
> 

That's not the same, the 2-way vs 4-way dot_prod here is that
e.g. udot_prod8hi can reduce to either DImode or SImode.
udot_prod8hi does not have enough information to distinguish the two and in RTL
you can't overload the names.  So this is about the ISA having instructions that overlap
on the source mode of the instruction.

Tamar

> 
> --
> BR,
> Hongtao

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
  2024-05-17  9:04   ` Tamar Christina
@ 2024-05-17  9:46     ` Richard Biener
  2024-05-17  9:55       ` Tamar Christina
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Biener @ 2024-05-17  9:46 UTC (permalink / raw)
  To: Tamar Christina
  Cc: Victor Do Nascimento, gcc-patches, Richard Sandiford,
	Richard Earnshaw, Victor Do Nascimento

On Fri, May 17, 2024 at 11:05 AM Tamar Christina
<Tamar.Christina@arm.com> wrote:
>
> > -----Original Message-----
> > From: Richard Biener <richard.guenther@gmail.com>
> > Sent: Friday, May 17, 2024 6:51 AM
> > To: Victor Do Nascimento <Victor.DoNascimento@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford <Richard.Sandiford@arm.com>;
> > Richard Earnshaw <Richard.Earnshaw@arm.com>; Victor Do Nascimento
> > <vicdon01@e133397.arm.com>
> > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
> > autovectorizer
> >
> > On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento
> > <victor.donascimento@arm.com> wrote:
> > >
> > > From: Victor Do Nascimento <vicdon01@e133397.arm.com>
> > >
> > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> > > optabs for dealing with vectorizable dot product code sequences.  The
> > > consequence of using a direct optab for this is that backend-pattern
> > > selection is only ever able to match against one datatype - Either
> > > that of the operands or of the accumulated value, never both.
> > >
> > > With the introduction of the 2-way (un)signed dot-product insn [1][2]
> > > in AArch64 SVE2, the existing direct opcode approach is no longer
> > > sufficient for full specification of all the possible dot product
> > > machine instructions to be matched to the code sequence; a dot product
> > > resulting in VNx4SI may result from either dot products on VNx16QI or
> > > VNx8HI values for the 4- and 2-way dot product operations, respectively.
> > >
> > > This means that the following example fails autovectorization:
> > >
> > > uint32_t foo(int n, uint16_t* data) {
> > >   uint32_t sum = 0;
> > >   for (int i=0; i<n; i+=1) {
> > >     sum += data[i] * data[i];
> > >   }
> > >   return sum;
> > > }
> > >
> > > To remedy the issue a new optab is added, tentatively named
> > > `udot_prod_twoway_optab', whose selection is dependent upon checking
> > > of both input and output types involved in the operation.
> >
> > I don't like this too much.  I'll note we document dot_prod as
> >
> > @cindex @code{sdot_prod@var{m}} instruction pattern
> > @item @samp{sdot_prod@var{m}}
> >
> > Compute the sum of the products of two signed elements.
> > Operand 1 and operand 2 are of the same mode. Their
> > product, which is of a wider mode, is computed and added to operand 3.
> > Operand 3 is of a mode equal or wider than the mode of the product. The
> > result is placed in operand 0, which is of the same mode as operand 3.
> > @var{m} is the mode of operand 1 and operand 2.
> >
> > with no restriction on the wider mode but we don't specify it which is
> > bad design.  This should have been a convert optab with two modes
> > from the start - adding a _twoway variant is just a hack.
>
> We did discuss this at the time we started implementing it.  There was two
> options, one was indeed to change it to a convert dot_prod optab, but doing
> this means we have to update every target that uses it.
>
> Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and altivec.
>
> Which sure could be possible, but there's also every use in the backends that need
> to be updated, and tested, which for some targets we don't even know how to begin.
>
> So it seems very hard to correct dotprod to a convert optab now.

It's still the correct way to go.  At _least_ your new pattern should
have been this,
otherwise what do you do when you have two-way, four-way and eight-way variants?
Add yet another optab?

Another thing is that when you do it your way you should fix the existing optab
to be two-way by documenting how the second mode derives from the first.

And sure, it's not the only optab suffering from this issue.

Richard.

> Tamar
>
> >
> > Richard.
> >
> > > In order to minimize changes to the existing codebase,
> > > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> > > argument is added to its signature - `const_tree otype', allowing type
> > > information to be specified for both input and output types.  The
> > > existing nterface is retained by defining a new `optab_for_tree_code',
> > > which serves as a shim to `optab_for_tree_code_1', passing old
> > > parameters as-is and setting the new `optype' argument to `NULL_TREE'.
> > >
> > > For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> > > directly, passing it both types, adding the internal logic to the
> > > function to distinguish between competing optabs.
> > >
> > > Finally, necessary changes are made to `expand_widen_pattern_expr' to
> > > ensure the new icode can be correctly selected, given the new optab.
> > >
> > > [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> > Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> > > [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> > Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
> > >
> > > gcc/ChangeLog:
> > >
> > >         * config/aarch64/aarch64-sve2.md (@aarch64_sve_<sur>dotvnx4sivnx8hi):
> > >         renamed to `<sur>dot_prod_twoway_vnx8hi'.
> > >         * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
> > >         update icodes used in line with above rename.
> > >         * optabs-tree.cc (optab_for_tree_code_1): Renamed
> > >         `optab_for_tree_code' and added new argument.
> > >         (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
> > >         * optabs-tree.h (optab_for_tree_code_1): New.
> > >         * optabs.cc (expand_widen_pattern_expr): Expand support for
> > >         DOT_PROD_EXPR patterns.
> > >         * optabs.def (udot_prod_twoway_optab): New.
> > >         (sdot_prod_twoway_optab): Likewise.
> > >         * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
> > >         support for misc optabs that use two modes.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         * gcc.dg/vect/vect-dotprod-twoway.c: New.
> > > ---
> > >  .../aarch64/aarch64-sve-builtins-base.cc      |  4 ++--
> > >  gcc/config/aarch64/aarch64-sve2.md            |  2 +-
> > >  gcc/optabs-tree.cc                            | 23 ++++++++++++++++--
> > >  gcc/optabs-tree.h                             |  2 ++
> > >  gcc/optabs.cc                                 |  2 +-
> > >  gcc/optabs.def                                |  2 ++
> > >  .../gcc.dg/vect/vect-dotprod-twoway.c         | 24 +++++++++++++++++++
> > >  gcc/tree-vect-patterns.cc                     |  2 +-
> > >  8 files changed, 54 insertions(+), 7 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > >
> > > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > > index 0d2edf3f19e..e457db09f66 100644
> > > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > > @@ -764,8 +764,8 @@ public:
> > >        icode = (e.type_suffix (0).float_p
> > >                ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
> > >                : e.type_suffix (0).unsigned_p
> > > -              ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
> > > -              : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
> > > +              ? CODE_FOR_udot_prod_twoway_vnx8hi
> > > +              : CODE_FOR_sdot_prod_twoway_vnx8hi);
> > >      return e.use_unpred_insn (icode);
> > >    }
> > >  };
> > > diff --git a/gcc/config/aarch64/aarch64-sve2.md
> > b/gcc/config/aarch64/aarch64-sve2.md
> > > index 934e57055d3..5677de7108d 100644
> > > --- a/gcc/config/aarch64/aarch64-sve2.md
> > > +++ b/gcc/config/aarch64/aarch64-sve2.md
> > > @@ -2021,7 +2021,7 @@ (define_insn
> > "@aarch64_sve_qsub_<sve_int_op>_lane_<mode>"
> > >  )
> > >
> > >  ;; Two-way dot-product.
> > > -(define_insn "@aarch64_sve_<sur>dotvnx4sivnx8hi"
> > > +(define_insn "<sur>dot_prod_twoway_vnx8hi"
> > >    [(set (match_operand:VNx4SI 0 "register_operand")
> > >         (plus:VNx4SI
> > >           (unspec:VNx4SI
> > > diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
> > > index b69a5bc3676..e3c5a618ea2 100644
> > > --- a/gcc/optabs-tree.cc
> > > +++ b/gcc/optabs-tree.cc
> > > @@ -35,8 +35,8 @@ along with GCC; see the file COPYING3.  If not see
> > >     cannot give complete results for multiplication or division) but probably
> > >     ought to be relied on more widely throughout the expander.  */
> > >  optab
> > > -optab_for_tree_code (enum tree_code code, const_tree type,
> > > -                    enum optab_subtype subtype)
> > > +optab_for_tree_code_1 (enum tree_code code, const_tree type,
> > > +                      const_tree otype, enum optab_subtype subtype)
> > >  {
> > >    bool trapv;
> > >    switch (code)
> > > @@ -149,6 +149,14 @@ optab_for_tree_code (enum tree_code code,
> > const_tree type,
> > >
> > >      case DOT_PROD_EXPR:
> > >        {
> > > +       if (otype && (TYPE_PRECISION (TREE_TYPE (type)) * 2
> > > +                     == TYPE_PRECISION (TREE_TYPE (otype))))
> > > +         {
> > > +           if (TYPE_UNSIGNED (type) && TYPE_UNSIGNED (otype))
> > > +             return udot_prod_twoway_optab;
> > > +           if (!TYPE_UNSIGNED (type) && !TYPE_UNSIGNED (otype))
> > > +             return sdot_prod_twoway_optab;
> > > +         }
> > >         if (subtype == optab_vector_mixed_sign)
> > >           return usdot_prod_optab;
> > >
> > > @@ -285,6 +293,17 @@ optab_for_tree_code (enum tree_code code,
> > const_tree type,
> > >      }
> > >  }
> > >
> > > +/* Return the optab used for computing the operation given by the tree code,
> > > +   CODE and the tree EXP.  This function is not always usable (for example, it
> > > +   cannot give complete results for multiplication or division) but probably
> > > +   ought to be relied on more widely throughout the expander.  */
> > > +optab
> > > +optab_for_tree_code (enum tree_code code, const_tree type,
> > > +                    enum optab_subtype subtype)
> > > +{
> > > +  return optab_for_tree_code_1 (code, type, NULL_TREE, subtype);
> > > +}
> > > +
> > >  /* Check whether an operation represented by CODE is a 'half' widening
> > operation
> > >     in which the input vector type has half the number of bits of the output
> > >     vector type e.g. V8QI->V8HI.
> > > diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> > > index f2b49991462..13ff7ca2e4b 100644
> > > --- a/gcc/optabs-tree.h
> > > +++ b/gcc/optabs-tree.h
> > > @@ -36,6 +36,8 @@ enum optab_subtype
> > >  /* Return the optab used for computing the given operation on the type given
> > by
> > >     the second argument.  The third argument distinguishes between the types of
> > >     vector shifts and rotates.  */
> > > +optab optab_for_tree_code_1 (enum tree_code, const_tree, const_tree
> > __attribute__((unused)),
> > > +                           enum optab_subtype );
> > >  optab optab_for_tree_code (enum tree_code, const_tree, enum
> > optab_subtype);
> > >  bool
> > >  supportable_half_widening_operation (enum tree_code, tree, tree,
> > > diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> > > index ce91f94ed43..3a1c6c7b90e 100644
> > > --- a/gcc/optabs.cc
> > > +++ b/gcc/optabs.cc
> > > @@ -311,7 +311,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx
> > op1, rtx wide_op,
> > >         gcc_unreachable ();
> > >
> > >        widen_pattern_optab
> > > -       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> > > +       = optab_for_tree_code_1 (ops->code, TREE_TYPE (oprnd0), TREE_TYPE
> > (oprnd2), subtype);
> > >      }
> > >    else
> > >      widen_pattern_optab
> > > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > > index ad14f9328b9..cf1a6e7a7dc 100644
> > > --- a/gcc/optabs.def
> > > +++ b/gcc/optabs.def
> > > @@ -408,6 +408,8 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
> > >  OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
> > >  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> > >  OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> > > +OPTAB_D (sdot_prod_twoway_optab, "sdot_prod_twoway_$I$a")
> > > +OPTAB_D (udot_prod_twoway_optab, "udot_prod_twoway_$I$a")
> > >  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
> > >  OPTAB_D (usad_optab, "usad$I$a")
> > >  OPTAB_D (ssad_optab, "ssad$I$a")
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > > new file mode 100644
> > > index 00000000000..cba2aadbec8
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > > @@ -0,0 +1,24 @@
> > > +/* { dg-do compile { target { aarch64*-*-* } } } */
> > > +/* { dg-additional-options "-march=-O3 -march=armv9.2-a+sme2 -fdump-tree-
> > vect-details" { target { aarch64*-*-* } } } */
> > > +
> > > +#include <stdint.h>
> > > +
> > > +uint32_t udot2(int n, uint16_t* data) {
> > > +  uint32_t sum = 0;
> > > +  for (int i=0; i<n; i+=1) {
> > > +    sum += data[i] * data[i];
> > > +  }
> > > +  return sum;
> > > +}
> > > +
> > > +int32_t sdot2(int n, int16_t* data) {
> > > +  int32_t sum = 0;
> > > +  for (int i=0; i<n; i+=1) {
> > > +    sum += data[i] * data[i];
> > > +  }
> > > +  return sum;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected:"
> > 2 "vect" } } */
> > > +/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern:
> > detected" 4 "vect" } } */
> > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
> > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > > index dfb7d800526..0760f25d94d 100644
> > > --- a/gcc/tree-vect-patterns.cc
> > > +++ b/gcc/tree-vect-patterns.cc
> > > @@ -233,7 +233,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree
> > otype, tree_code code,
> > >    if (!vecotype)
> > >      return false;
> > >
> > > -  optab optab = optab_for_tree_code (code, vecitype, subtype);
> > > +  optab optab = optab_for_tree_code_1 (code, vecitype, vecotype, subtype);
> > >    if (!optab)
> > >      return false;
> > >
> > > --
> > > 2.34.1
> > >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
  2024-05-17  9:46     ` Richard Biener
@ 2024-05-17  9:55       ` Tamar Christina
  2024-05-17 10:13         ` Richard Biener
  0 siblings, 1 reply; 14+ messages in thread
From: Tamar Christina @ 2024-05-17  9:55 UTC (permalink / raw)
  To: Richard Biener
  Cc: Victor Do Nascimento, gcc-patches, Richard Sandiford,
	Richard Earnshaw, Victor Do Nascimento

> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Friday, May 17, 2024 10:46 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Victor Do Nascimento <Victor.DoNascimento@arm.com>; gcc-
> patches@gcc.gnu.org; Richard Sandiford <Richard.Sandiford@arm.com>; Richard
> Earnshaw <Richard.Earnshaw@arm.com>; Victor Do Nascimento
> <vicdon01@e133397.arm.com>
> Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
> autovectorizer
> 
> On Fri, May 17, 2024 at 11:05 AM Tamar Christina
> <Tamar.Christina@arm.com> wrote:
> >
> > > -----Original Message-----
> > > From: Richard Biener <richard.guenther@gmail.com>
> > > Sent: Friday, May 17, 2024 6:51 AM
> > > To: Victor Do Nascimento <Victor.DoNascimento@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
> <Richard.Sandiford@arm.com>;
> > > Richard Earnshaw <Richard.Earnshaw@arm.com>; Victor Do Nascimento
> > > <vicdon01@e133397.arm.com>
> > > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
> > > autovectorizer
> > >
> > > On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento
> > > <victor.donascimento@arm.com> wrote:
> > > >
> > > > From: Victor Do Nascimento <vicdon01@e133397.arm.com>
> > > >
> > > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> > > > optabs for dealing with vectorizable dot product code sequences.  The
> > > > consequence of using a direct optab for this is that backend-pattern
> > > > selection is only ever able to match against one datatype - Either
> > > > that of the operands or of the accumulated value, never both.
> > > >
> > > > With the introduction of the 2-way (un)signed dot-product insn [1][2]
> > > > in AArch64 SVE2, the existing direct opcode approach is no longer
> > > > sufficient for full specification of all the possible dot product
> > > > machine instructions to be matched to the code sequence; a dot product
> > > > resulting in VNx4SI may result from either dot products on VNx16QI or
> > > > VNx8HI values for the 4- and 2-way dot product operations, respectively.
> > > >
> > > > This means that the following example fails autovectorization:
> > > >
> > > > uint32_t foo(int n, uint16_t* data) {
> > > >   uint32_t sum = 0;
> > > >   for (int i=0; i<n; i+=1) {
> > > >     sum += data[i] * data[i];
> > > >   }
> > > >   return sum;
> > > > }
> > > >
> > > > To remedy the issue a new optab is added, tentatively named
> > > > `udot_prod_twoway_optab', whose selection is dependent upon checking
> > > > of both input and output types involved in the operation.
> > >
> > > I don't like this too much.  I'll note we document dot_prod as
> > >
> > > @cindex @code{sdot_prod@var{m}} instruction pattern
> > > @item @samp{sdot_prod@var{m}}
> > >
> > > Compute the sum of the products of two signed elements.
> > > Operand 1 and operand 2 are of the same mode. Their
> > > product, which is of a wider mode, is computed and added to operand 3.
> > > Operand 3 is of a mode equal or wider than the mode of the product. The
> > > result is placed in operand 0, which is of the same mode as operand 3.
> > > @var{m} is the mode of operand 1 and operand 2.
> > >
> > > with no restriction on the wider mode but we don't specify it which is
> > > bad design.  This should have been a convert optab with two modes
> > > from the start - adding a _twoway variant is just a hack.
> >
> > We did discuss this at the time we started implementing it.  There was two
> > options, one was indeed to change it to a convert dot_prod optab, but doing
> > this means we have to update every target that uses it.
> >
> > Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and
> altivec.
> >
> > Which sure could be possible, but there's also every use in the backends that
> need
> > to be updated, and tested, which for some targets we don't even know how to
> begin.
> >
> > So it seems very hard to correct dotprod to a convert optab now.
> 
> It's still the correct way to go.  At _least_ your new pattern should
> have been this,
> otherwise what do you do when you have two-way, four-way and eight-way
> variants?
> Add yet another optab?

I guess that's fair, but having the new optab only be convert resulted in messy
code as everywhere you must check for both variants.

Additionally that optab would then overlap with the existing optabs as, as you
Say, the documentation only says it's of a wider type and doesn't indicate
precision.

So to avoid issues down the line then If the new optab isn't acceptable then
we'll have to do a wholesale conversion then..

> 
> Another thing is that when you do it your way you should fix the existing optab
> to be two-way by documenting how the second mode derives from the first.
> 
> And sure, it's not the only optab suffering from this issue.

Sure, all the zero and sign extending optabs for instance 😊

Tamar

> 
> Richard.
> 
> > Tamar
> >
> > >
> > > Richard.
> > >
> > > > In order to minimize changes to the existing codebase,
> > > > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> > > > argument is added to its signature - `const_tree otype', allowing type
> > > > information to be specified for both input and output types.  The
> > > > existing nterface is retained by defining a new `optab_for_tree_code',
> > > > which serves as a shim to `optab_for_tree_code_1', passing old
> > > > parameters as-is and setting the new `optype' argument to `NULL_TREE'.
> > > >
> > > > For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> > > > directly, passing it both types, adding the internal logic to the
> > > > function to distinguish between competing optabs.
> > > >
> > > > Finally, necessary changes are made to `expand_widen_pattern_expr' to
> > > > ensure the new icode can be correctly selected, given the new optab.
> > > >
> > > > [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> > > Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> > > > [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> > > Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         * config/aarch64/aarch64-sve2.md
> (@aarch64_sve_<sur>dotvnx4sivnx8hi):
> > > >         renamed to `<sur>dot_prod_twoway_vnx8hi'.
> > > >         * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
> > > >         update icodes used in line with above rename.
> > > >         * optabs-tree.cc (optab_for_tree_code_1): Renamed
> > > >         `optab_for_tree_code' and added new argument.
> > > >         (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
> > > >         * optabs-tree.h (optab_for_tree_code_1): New.
> > > >         * optabs.cc (expand_widen_pattern_expr): Expand support for
> > > >         DOT_PROD_EXPR patterns.
> > > >         * optabs.def (udot_prod_twoway_optab): New.
> > > >         (sdot_prod_twoway_optab): Likewise.
> > > >         * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
> > > >         support for misc optabs that use two modes.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >         * gcc.dg/vect/vect-dotprod-twoway.c: New.
> > > > ---
> > > >  .../aarch64/aarch64-sve-builtins-base.cc      |  4 ++--
> > > >  gcc/config/aarch64/aarch64-sve2.md            |  2 +-
> > > >  gcc/optabs-tree.cc                            | 23 ++++++++++++++++--
> > > >  gcc/optabs-tree.h                             |  2 ++
> > > >  gcc/optabs.cc                                 |  2 +-
> > > >  gcc/optabs.def                                |  2 ++
> > > >  .../gcc.dg/vect/vect-dotprod-twoway.c         | 24 +++++++++++++++++++
> > > >  gcc/tree-vect-patterns.cc                     |  2 +-
> > > >  8 files changed, 54 insertions(+), 7 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > > >
> > > > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > > > index 0d2edf3f19e..e457db09f66 100644
> > > > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > > > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > > > @@ -764,8 +764,8 @@ public:
> > > >        icode = (e.type_suffix (0).float_p
> > > >                ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
> > > >                : e.type_suffix (0).unsigned_p
> > > > -              ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
> > > > -              : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
> > > > +              ? CODE_FOR_udot_prod_twoway_vnx8hi
> > > > +              : CODE_FOR_sdot_prod_twoway_vnx8hi);
> > > >      return e.use_unpred_insn (icode);
> > > >    }
> > > >  };
> > > > diff --git a/gcc/config/aarch64/aarch64-sve2.md
> > > b/gcc/config/aarch64/aarch64-sve2.md
> > > > index 934e57055d3..5677de7108d 100644
> > > > --- a/gcc/config/aarch64/aarch64-sve2.md
> > > > +++ b/gcc/config/aarch64/aarch64-sve2.md
> > > > @@ -2021,7 +2021,7 @@ (define_insn
> > > "@aarch64_sve_qsub_<sve_int_op>_lane_<mode>"
> > > >  )
> > > >
> > > >  ;; Two-way dot-product.
> > > > -(define_insn "@aarch64_sve_<sur>dotvnx4sivnx8hi"
> > > > +(define_insn "<sur>dot_prod_twoway_vnx8hi"
> > > >    [(set (match_operand:VNx4SI 0 "register_operand")
> > > >         (plus:VNx4SI
> > > >           (unspec:VNx4SI
> > > > diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
> > > > index b69a5bc3676..e3c5a618ea2 100644
> > > > --- a/gcc/optabs-tree.cc
> > > > +++ b/gcc/optabs-tree.cc
> > > > @@ -35,8 +35,8 @@ along with GCC; see the file COPYING3.  If not see
> > > >     cannot give complete results for multiplication or division) but probably
> > > >     ought to be relied on more widely throughout the expander.  */
> > > >  optab
> > > > -optab_for_tree_code (enum tree_code code, const_tree type,
> > > > -                    enum optab_subtype subtype)
> > > > +optab_for_tree_code_1 (enum tree_code code, const_tree type,
> > > > +                      const_tree otype, enum optab_subtype subtype)
> > > >  {
> > > >    bool trapv;
> > > >    switch (code)
> > > > @@ -149,6 +149,14 @@ optab_for_tree_code (enum tree_code code,
> > > const_tree type,
> > > >
> > > >      case DOT_PROD_EXPR:
> > > >        {
> > > > +       if (otype && (TYPE_PRECISION (TREE_TYPE (type)) * 2
> > > > +                     == TYPE_PRECISION (TREE_TYPE (otype))))
> > > > +         {
> > > > +           if (TYPE_UNSIGNED (type) && TYPE_UNSIGNED (otype))
> > > > +             return udot_prod_twoway_optab;
> > > > +           if (!TYPE_UNSIGNED (type) && !TYPE_UNSIGNED (otype))
> > > > +             return sdot_prod_twoway_optab;
> > > > +         }
> > > >         if (subtype == optab_vector_mixed_sign)
> > > >           return usdot_prod_optab;
> > > >
> > > > @@ -285,6 +293,17 @@ optab_for_tree_code (enum tree_code code,
> > > const_tree type,
> > > >      }
> > > >  }
> > > >
> > > > +/* Return the optab used for computing the operation given by the tree
> code,
> > > > +   CODE and the tree EXP.  This function is not always usable (for example, it
> > > > +   cannot give complete results for multiplication or division) but probably
> > > > +   ought to be relied on more widely throughout the expander.  */
> > > > +optab
> > > > +optab_for_tree_code (enum tree_code code, const_tree type,
> > > > +                    enum optab_subtype subtype)
> > > > +{
> > > > +  return optab_for_tree_code_1 (code, type, NULL_TREE, subtype);
> > > > +}
> > > > +
> > > >  /* Check whether an operation represented by CODE is a 'half' widening
> > > operation
> > > >     in which the input vector type has half the number of bits of the output
> > > >     vector type e.g. V8QI->V8HI.
> > > > diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> > > > index f2b49991462..13ff7ca2e4b 100644
> > > > --- a/gcc/optabs-tree.h
> > > > +++ b/gcc/optabs-tree.h
> > > > @@ -36,6 +36,8 @@ enum optab_subtype
> > > >  /* Return the optab used for computing the given operation on the type
> given
> > > by
> > > >     the second argument.  The third argument distinguishes between the types
> of
> > > >     vector shifts and rotates.  */
> > > > +optab optab_for_tree_code_1 (enum tree_code, const_tree, const_tree
> > > __attribute__((unused)),
> > > > +                           enum optab_subtype );
> > > >  optab optab_for_tree_code (enum tree_code, const_tree, enum
> > > optab_subtype);
> > > >  bool
> > > >  supportable_half_widening_operation (enum tree_code, tree, tree,
> > > > diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> > > > index ce91f94ed43..3a1c6c7b90e 100644
> > > > --- a/gcc/optabs.cc
> > > > +++ b/gcc/optabs.cc
> > > > @@ -311,7 +311,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
> rtx
> > > op1, rtx wide_op,
> > > >         gcc_unreachable ();
> > > >
> > > >        widen_pattern_optab
> > > > -       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> > > > +       = optab_for_tree_code_1 (ops->code, TREE_TYPE (oprnd0), TREE_TYPE
> > > (oprnd2), subtype);
> > > >      }
> > > >    else
> > > >      widen_pattern_optab
> > > > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > > > index ad14f9328b9..cf1a6e7a7dc 100644
> > > > --- a/gcc/optabs.def
> > > > +++ b/gcc/optabs.def
> > > > @@ -408,6 +408,8 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
> > > >  OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
> > > >  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> > > >  OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> > > > +OPTAB_D (sdot_prod_twoway_optab, "sdot_prod_twoway_$I$a")
> > > > +OPTAB_D (udot_prod_twoway_optab, "udot_prod_twoway_$I$a")
> > > >  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
> > > >  OPTAB_D (usad_optab, "usad$I$a")
> > > >  OPTAB_D (ssad_optab, "ssad$I$a")
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > > > new file mode 100644
> > > > index 00000000000..cba2aadbec8
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > > > @@ -0,0 +1,24 @@
> > > > +/* { dg-do compile { target { aarch64*-*-* } } } */
> > > > +/* { dg-additional-options "-march=-O3 -march=armv9.2-a+sme2 -fdump-
> tree-
> > > vect-details" { target { aarch64*-*-* } } } */
> > > > +
> > > > +#include <stdint.h>
> > > > +
> > > > +uint32_t udot2(int n, uint16_t* data) {
> > > > +  uint32_t sum = 0;
> > > > +  for (int i=0; i<n; i+=1) {
> > > > +    sum += data[i] * data[i];
> > > > +  }
> > > > +  return sum;
> > > > +}
> > > > +
> > > > +int32_t sdot2(int n, int16_t* data) {
> > > > +  int32_t sum = 0;
> > > > +  for (int i=0; i<n; i+=1) {
> > > > +    sum += data[i] * data[i];
> > > > +  }
> > > > +  return sum;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern:
> detected:"
> > > 2 "vect" } } */
> > > > +/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern:
> > > detected" 4 "vect" } } */
> > > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
> > > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > > > index dfb7d800526..0760f25d94d 100644
> > > > --- a/gcc/tree-vect-patterns.cc
> > > > +++ b/gcc/tree-vect-patterns.cc
> > > > @@ -233,7 +233,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo,
> tree
> > > otype, tree_code code,
> > > >    if (!vecotype)
> > > >      return false;
> > > >
> > > > -  optab optab = optab_for_tree_code (code, vecitype, subtype);
> > > > +  optab optab = optab_for_tree_code_1 (code, vecitype, vecotype, subtype);
> > > >    if (!optab)
> > > >      return false;
> > > >
> > > > --
> > > > 2.34.1
> > > >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
  2024-05-17  9:55       ` Tamar Christina
@ 2024-05-17 10:13         ` Richard Biener
  2024-05-17 14:53           ` Victor Do Nascimento
  2024-05-20 16:28           ` Richard Sandiford
  0 siblings, 2 replies; 14+ messages in thread
From: Richard Biener @ 2024-05-17 10:13 UTC (permalink / raw)
  To: Tamar Christina
  Cc: Victor Do Nascimento, gcc-patches, Richard Sandiford,
	Richard Earnshaw, Victor Do Nascimento

On Fri, May 17, 2024 at 11:56 AM Tamar Christina
<Tamar.Christina@arm.com> wrote:
>
> > -----Original Message-----
> > From: Richard Biener <richard.guenther@gmail.com>
> > Sent: Friday, May 17, 2024 10:46 AM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: Victor Do Nascimento <Victor.DoNascimento@arm.com>; gcc-
> > patches@gcc.gnu.org; Richard Sandiford <Richard.Sandiford@arm.com>; Richard
> > Earnshaw <Richard.Earnshaw@arm.com>; Victor Do Nascimento
> > <vicdon01@e133397.arm.com>
> > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
> > autovectorizer
> >
> > On Fri, May 17, 2024 at 11:05 AM Tamar Christina
> > <Tamar.Christina@arm.com> wrote:
> > >
> > > > -----Original Message-----
> > > > From: Richard Biener <richard.guenther@gmail.com>
> > > > Sent: Friday, May 17, 2024 6:51 AM
> > > > To: Victor Do Nascimento <Victor.DoNascimento@arm.com>
> > > > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
> > <Richard.Sandiford@arm.com>;
> > > > Richard Earnshaw <Richard.Earnshaw@arm.com>; Victor Do Nascimento
> > > > <vicdon01@e133397.arm.com>
> > > > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
> > > > autovectorizer
> > > >
> > > > On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento
> > > > <victor.donascimento@arm.com> wrote:
> > > > >
> > > > > From: Victor Do Nascimento <vicdon01@e133397.arm.com>
> > > > >
> > > > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> > > > > optabs for dealing with vectorizable dot product code sequences.  The
> > > > > consequence of using a direct optab for this is that backend-pattern
> > > > > selection is only ever able to match against one datatype - Either
> > > > > that of the operands or of the accumulated value, never both.
> > > > >
> > > > > With the introduction of the 2-way (un)signed dot-product insn [1][2]
> > > > > in AArch64 SVE2, the existing direct opcode approach is no longer
> > > > > sufficient for full specification of all the possible dot product
> > > > > machine instructions to be matched to the code sequence; a dot product
> > > > > resulting in VNx4SI may result from either dot products on VNx16QI or
> > > > > VNx8HI values for the 4- and 2-way dot product operations, respectively.
> > > > >
> > > > > This means that the following example fails autovectorization:
> > > > >
> > > > > uint32_t foo(int n, uint16_t* data) {
> > > > >   uint32_t sum = 0;
> > > > >   for (int i=0; i<n; i+=1) {
> > > > >     sum += data[i] * data[i];
> > > > >   }
> > > > >   return sum;
> > > > > }
> > > > >
> > > > > To remedy the issue a new optab is added, tentatively named
> > > > > `udot_prod_twoway_optab', whose selection is dependent upon checking
> > > > > of both input and output types involved in the operation.
> > > >
> > > > I don't like this too much.  I'll note we document dot_prod as
> > > >
> > > > @cindex @code{sdot_prod@var{m}} instruction pattern
> > > > @item @samp{sdot_prod@var{m}}
> > > >
> > > > Compute the sum of the products of two signed elements.
> > > > Operand 1 and operand 2 are of the same mode. Their
> > > > product, which is of a wider mode, is computed and added to operand 3.
> > > > Operand 3 is of a mode equal or wider than the mode of the product. The
> > > > result is placed in operand 0, which is of the same mode as operand 3.
> > > > @var{m} is the mode of operand 1 and operand 2.
> > > >
> > > > with no restriction on the wider mode but we don't specify it which is
> > > > bad design.  This should have been a convert optab with two modes
> > > > from the start - adding a _twoway variant is just a hack.
> > >
> > > We did discuss this at the time we started implementing it.  There was two
> > > options, one was indeed to change it to a convert dot_prod optab, but doing
> > > this means we have to update every target that uses it.
> > >
> > > Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and
> > altivec.
> > >
> > > Which sure could be possible, but there's also every use in the backends that
> > need
> > > to be updated, and tested, which for some targets we don't even know how to
> > begin.
> > >
> > > So it seems very hard to correct dotprod to a convert optab now.
> >
> > It's still the correct way to go.  At _least_ your new pattern should
> > have been this,
> > otherwise what do you do when you have two-way, four-way and eight-way
> > variants?
> > Add yet another optab?
>
> I guess that's fair, but having the new optab only be convert resulted in messy
> code as everywhere you must check for both variants.
>
> Additionally that optab would then overlap with the existing optabs as, as you
> Say, the documentation only says it's of a wider type and doesn't indicate
> precision.
>
> So to avoid issues down the line then If the new optab isn't acceptable then
> we'll have to do a wholesale conversion then..

Yep.  It shouldn't be difficult though.

> >
> > Another thing is that when you do it your way you should fix the existing optab
> > to be two-way by documenting how the second mode derives from the first.
> >
> > And sure, it's not the only optab suffering from this issue.
>
> Sure, all the zero and sign extending optabs for instance 😊

But for example the scalar ones are correct:

OPTAB_CL(sext_optab, "extend$b$a2", SIGN_EXTEND, "extend",
gen_extend_conv_libfunc)

Richard.

> Tamar
>
> >
> > Richard.
> >
> > > Tamar
> > >
> > > >
> > > > Richard.
> > > >
> > > > > In order to minimize changes to the existing codebase,
> > > > > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> > > > > argument is added to its signature - `const_tree otype', allowing type
> > > > > information to be specified for both input and output types.  The
> > > > > existing nterface is retained by defining a new `optab_for_tree_code',
> > > > > which serves as a shim to `optab_for_tree_code_1', passing old
> > > > > parameters as-is and setting the new `optype' argument to `NULL_TREE'.
> > > > >
> > > > > For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> > > > > directly, passing it both types, adding the internal logic to the
> > > > > function to distinguish between competing optabs.
> > > > >
> > > > > Finally, necessary changes are made to `expand_widen_pattern_expr' to
> > > > > ensure the new icode can be correctly selected, given the new optab.
> > > > >
> > > > > [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> > > > Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> > > > > [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> > > > Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >         * config/aarch64/aarch64-sve2.md
> > (@aarch64_sve_<sur>dotvnx4sivnx8hi):
> > > > >         renamed to `<sur>dot_prod_twoway_vnx8hi'.
> > > > >         * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
> > > > >         update icodes used in line with above rename.
> > > > >         * optabs-tree.cc (optab_for_tree_code_1): Renamed
> > > > >         `optab_for_tree_code' and added new argument.
> > > > >         (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
> > > > >         * optabs-tree.h (optab_for_tree_code_1): New.
> > > > >         * optabs.cc (expand_widen_pattern_expr): Expand support for
> > > > >         DOT_PROD_EXPR patterns.
> > > > >         * optabs.def (udot_prod_twoway_optab): New.
> > > > >         (sdot_prod_twoway_optab): Likewise.
> > > > >         * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
> > > > >         support for misc optabs that use two modes.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > >         * gcc.dg/vect/vect-dotprod-twoway.c: New.
> > > > > ---
> > > > >  .../aarch64/aarch64-sve-builtins-base.cc      |  4 ++--
> > > > >  gcc/config/aarch64/aarch64-sve2.md            |  2 +-
> > > > >  gcc/optabs-tree.cc                            | 23 ++++++++++++++++--
> > > > >  gcc/optabs-tree.h                             |  2 ++
> > > > >  gcc/optabs.cc                                 |  2 +-
> > > > >  gcc/optabs.def                                |  2 ++
> > > > >  .../gcc.dg/vect/vect-dotprod-twoway.c         | 24 +++++++++++++++++++
> > > > >  gcc/tree-vect-patterns.cc                     |  2 +-
> > > > >  8 files changed, 54 insertions(+), 7 deletions(-)
> > > > >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > > > >
> > > > > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > > > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > > > > index 0d2edf3f19e..e457db09f66 100644
> > > > > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > > > > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > > > > @@ -764,8 +764,8 @@ public:
> > > > >        icode = (e.type_suffix (0).float_p
> > > > >                ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
> > > > >                : e.type_suffix (0).unsigned_p
> > > > > -              ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
> > > > > -              : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
> > > > > +              ? CODE_FOR_udot_prod_twoway_vnx8hi
> > > > > +              : CODE_FOR_sdot_prod_twoway_vnx8hi);
> > > > >      return e.use_unpred_insn (icode);
> > > > >    }
> > > > >  };
> > > > > diff --git a/gcc/config/aarch64/aarch64-sve2.md
> > > > b/gcc/config/aarch64/aarch64-sve2.md
> > > > > index 934e57055d3..5677de7108d 100644
> > > > > --- a/gcc/config/aarch64/aarch64-sve2.md
> > > > > +++ b/gcc/config/aarch64/aarch64-sve2.md
> > > > > @@ -2021,7 +2021,7 @@ (define_insn
> > > > "@aarch64_sve_qsub_<sve_int_op>_lane_<mode>"
> > > > >  )
> > > > >
> > > > >  ;; Two-way dot-product.
> > > > > -(define_insn "@aarch64_sve_<sur>dotvnx4sivnx8hi"
> > > > > +(define_insn "<sur>dot_prod_twoway_vnx8hi"
> > > > >    [(set (match_operand:VNx4SI 0 "register_operand")
> > > > >         (plus:VNx4SI
> > > > >           (unspec:VNx4SI
> > > > > diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
> > > > > index b69a5bc3676..e3c5a618ea2 100644
> > > > > --- a/gcc/optabs-tree.cc
> > > > > +++ b/gcc/optabs-tree.cc
> > > > > @@ -35,8 +35,8 @@ along with GCC; see the file COPYING3.  If not see
> > > > >     cannot give complete results for multiplication or division) but probably
> > > > >     ought to be relied on more widely throughout the expander.  */
> > > > >  optab
> > > > > -optab_for_tree_code (enum tree_code code, const_tree type,
> > > > > -                    enum optab_subtype subtype)
> > > > > +optab_for_tree_code_1 (enum tree_code code, const_tree type,
> > > > > +                      const_tree otype, enum optab_subtype subtype)
> > > > >  {
> > > > >    bool trapv;
> > > > >    switch (code)
> > > > > @@ -149,6 +149,14 @@ optab_for_tree_code (enum tree_code code,
> > > > const_tree type,
> > > > >
> > > > >      case DOT_PROD_EXPR:
> > > > >        {
> > > > > +       if (otype && (TYPE_PRECISION (TREE_TYPE (type)) * 2
> > > > > +                     == TYPE_PRECISION (TREE_TYPE (otype))))
> > > > > +         {
> > > > > +           if (TYPE_UNSIGNED (type) && TYPE_UNSIGNED (otype))
> > > > > +             return udot_prod_twoway_optab;
> > > > > +           if (!TYPE_UNSIGNED (type) && !TYPE_UNSIGNED (otype))
> > > > > +             return sdot_prod_twoway_optab;
> > > > > +         }
> > > > >         if (subtype == optab_vector_mixed_sign)
> > > > >           return usdot_prod_optab;
> > > > >
> > > > > @@ -285,6 +293,17 @@ optab_for_tree_code (enum tree_code code,
> > > > const_tree type,
> > > > >      }
> > > > >  }
> > > > >
> > > > > +/* Return the optab used for computing the operation given by the tree
> > code,
> > > > > +   CODE and the tree EXP.  This function is not always usable (for example, it
> > > > > +   cannot give complete results for multiplication or division) but probably
> > > > > +   ought to be relied on more widely throughout the expander.  */
> > > > > +optab
> > > > > +optab_for_tree_code (enum tree_code code, const_tree type,
> > > > > +                    enum optab_subtype subtype)
> > > > > +{
> > > > > +  return optab_for_tree_code_1 (code, type, NULL_TREE, subtype);
> > > > > +}
> > > > > +
> > > > >  /* Check whether an operation represented by CODE is a 'half' widening
> > > > operation
> > > > >     in which the input vector type has half the number of bits of the output
> > > > >     vector type e.g. V8QI->V8HI.
> > > > > diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> > > > > index f2b49991462..13ff7ca2e4b 100644
> > > > > --- a/gcc/optabs-tree.h
> > > > > +++ b/gcc/optabs-tree.h
> > > > > @@ -36,6 +36,8 @@ enum optab_subtype
> > > > >  /* Return the optab used for computing the given operation on the type
> > given
> > > > by
> > > > >     the second argument.  The third argument distinguishes between the types
> > of
> > > > >     vector shifts and rotates.  */
> > > > > +optab optab_for_tree_code_1 (enum tree_code, const_tree, const_tree
> > > > __attribute__((unused)),
> > > > > +                           enum optab_subtype );
> > > > >  optab optab_for_tree_code (enum tree_code, const_tree, enum
> > > > optab_subtype);
> > > > >  bool
> > > > >  supportable_half_widening_operation (enum tree_code, tree, tree,
> > > > > diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> > > > > index ce91f94ed43..3a1c6c7b90e 100644
> > > > > --- a/gcc/optabs.cc
> > > > > +++ b/gcc/optabs.cc
> > > > > @@ -311,7 +311,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
> > rtx
> > > > op1, rtx wide_op,
> > > > >         gcc_unreachable ();
> > > > >
> > > > >        widen_pattern_optab
> > > > > -       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> > > > > +       = optab_for_tree_code_1 (ops->code, TREE_TYPE (oprnd0), TREE_TYPE
> > > > (oprnd2), subtype);
> > > > >      }
> > > > >    else
> > > > >      widen_pattern_optab
> > > > > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > > > > index ad14f9328b9..cf1a6e7a7dc 100644
> > > > > --- a/gcc/optabs.def
> > > > > +++ b/gcc/optabs.def
> > > > > @@ -408,6 +408,8 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
> > > > >  OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
> > > > >  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> > > > >  OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> > > > > +OPTAB_D (sdot_prod_twoway_optab, "sdot_prod_twoway_$I$a")
> > > > > +OPTAB_D (udot_prod_twoway_optab, "udot_prod_twoway_$I$a")
> > > > >  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
> > > > >  OPTAB_D (usad_optab, "usad$I$a")
> > > > >  OPTAB_D (ssad_optab, "ssad$I$a")
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > > > b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > > > > new file mode 100644
> > > > > index 00000000000..cba2aadbec8
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > > > > @@ -0,0 +1,24 @@
> > > > > +/* { dg-do compile { target { aarch64*-*-* } } } */
> > > > > +/* { dg-additional-options "-march=-O3 -march=armv9.2-a+sme2 -fdump-
> > tree-
> > > > vect-details" { target { aarch64*-*-* } } } */
> > > > > +
> > > > > +#include <stdint.h>
> > > > > +
> > > > > +uint32_t udot2(int n, uint16_t* data) {
> > > > > +  uint32_t sum = 0;
> > > > > +  for (int i=0; i<n; i+=1) {
> > > > > +    sum += data[i] * data[i];
> > > > > +  }
> > > > > +  return sum;
> > > > > +}
> > > > > +
> > > > > +int32_t sdot2(int n, int16_t* data) {
> > > > > +  int32_t sum = 0;
> > > > > +  for (int i=0; i<n; i+=1) {
> > > > > +    sum += data[i] * data[i];
> > > > > +  }
> > > > > +  return sum;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern:
> > detected:"
> > > > 2 "vect" } } */
> > > > > +/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern:
> > > > detected" 4 "vect" } } */
> > > > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
> > > > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > > > > index dfb7d800526..0760f25d94d 100644
> > > > > --- a/gcc/tree-vect-patterns.cc
> > > > > +++ b/gcc/tree-vect-patterns.cc
> > > > > @@ -233,7 +233,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo,
> > tree
> > > > otype, tree_code code,
> > > > >    if (!vecotype)
> > > > >      return false;
> > > > >
> > > > > -  optab optab = optab_for_tree_code (code, vecitype, subtype);
> > > > > +  optab optab = optab_for_tree_code_1 (code, vecitype, vecotype, subtype);
> > > > >    if (!optab)
> > > > >      return false;
> > > > >
> > > > > --
> > > > > 2.34.1
> > > > >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
  2024-05-17 10:13         ` Richard Biener
@ 2024-05-17 14:53           ` Victor Do Nascimento
  2024-05-20 16:28           ` Richard Sandiford
  1 sibling, 0 replies; 14+ messages in thread
From: Victor Do Nascimento @ 2024-05-17 14:53 UTC (permalink / raw)
  To: Richard Biener, Tamar Christina; +Cc: gcc-patches, Victor Do Nascimento

Dear Richard and Tamar,

Thanks to the both of you for the various bits of feedback.
I've implemented all the more straightforward bits of feedback given, 
leaving "only" the merging of the two- and four-way dot product optabs 
into one, together with the necessary changes to the various backends 
which, though a little time-consuming, should be rather mechanical.

I had originally implemented the new two-way dotprod optab as a covert 
optab anyway, so going back to the work on that local branch will give 
me a good starting point from which to do this.

And Tamar, thanks very much for the feedback regarding the unit-tests. 
I knew my testing as it currently is was rather anaemic and was eager to 
get the relevant feedback on it.  Rest assured it's all been taken on board.

Cheers,
Victor


On 5/17/24 11:13, Richard Biener wrote:
> On Fri, May 17, 2024 at 11:56 AM Tamar Christina
> <Tamar.Christina@arm.com> wrote:
>>
>>> -----Original Message-----
>>> From: Richard Biener <richard.guenther@gmail.com>
>>> Sent: Friday, May 17, 2024 10:46 AM
>>> To: Tamar Christina <Tamar.Christina@arm.com>
>>> Cc: Victor Do Nascimento <Victor.DoNascimento@arm.com>; gcc-
>>> patches@gcc.gnu.org; Richard Sandiford <Richard.Sandiford@arm.com>; Richard
>>> Earnshaw <Richard.Earnshaw@arm.com>; Victor Do Nascimento
>>> <vicdon01@e133397.arm.com>
>>> Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
>>> autovectorizer
>>>
>>> On Fri, May 17, 2024 at 11:05 AM Tamar Christina
>>> <Tamar.Christina@arm.com> wrote:
>>>>
>>>>> -----Original Message-----
>>>>> From: Richard Biener <richard.guenther@gmail.com>
>>>>> Sent: Friday, May 17, 2024 6:51 AM
>>>>> To: Victor Do Nascimento <Victor.DoNascimento@arm.com>
>>>>> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
>>> <Richard.Sandiford@arm.com>;
>>>>> Richard Earnshaw <Richard.Earnshaw@arm.com>; Victor Do Nascimento
>>>>> <vicdon01@e133397.arm.com>
>>>>> Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
>>>>> autovectorizer
>>>>>
>>>>> On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento
>>>>> <victor.donascimento@arm.com> wrote:
>>>>>>
>>>>>> From: Victor Do Nascimento <vicdon01@e133397.arm.com>
>>>>>>
>>>>>> At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
>>>>>> optabs for dealing with vectorizable dot product code sequences.  The
>>>>>> consequence of using a direct optab for this is that backend-pattern
>>>>>> selection is only ever able to match against one datatype - Either
>>>>>> that of the operands or of the accumulated value, never both.
>>>>>>
>>>>>> With the introduction of the 2-way (un)signed dot-product insn [1][2]
>>>>>> in AArch64 SVE2, the existing direct opcode approach is no longer
>>>>>> sufficient for full specification of all the possible dot product
>>>>>> machine instructions to be matched to the code sequence; a dot product
>>>>>> resulting in VNx4SI may result from either dot products on VNx16QI or
>>>>>> VNx8HI values for the 4- and 2-way dot product operations, respectively.
>>>>>>
>>>>>> This means that the following example fails autovectorization:
>>>>>>
>>>>>> uint32_t foo(int n, uint16_t* data) {
>>>>>>    uint32_t sum = 0;
>>>>>>    for (int i=0; i<n; i+=1) {
>>>>>>      sum += data[i] * data[i];
>>>>>>    }
>>>>>>    return sum;
>>>>>> }
>>>>>>
>>>>>> To remedy the issue a new optab is added, tentatively named
>>>>>> `udot_prod_twoway_optab', whose selection is dependent upon checking
>>>>>> of both input and output types involved in the operation.
>>>>>
>>>>> I don't like this too much.  I'll note we document dot_prod as
>>>>>
>>>>> @cindex @code{sdot_prod@var{m}} instruction pattern
>>>>> @item @samp{sdot_prod@var{m}}
>>>>>
>>>>> Compute the sum of the products of two signed elements.
>>>>> Operand 1 and operand 2 are of the same mode. Their
>>>>> product, which is of a wider mode, is computed and added to operand 3.
>>>>> Operand 3 is of a mode equal or wider than the mode of the product. The
>>>>> result is placed in operand 0, which is of the same mode as operand 3.
>>>>> @var{m} is the mode of operand 1 and operand 2.
>>>>>
>>>>> with no restriction on the wider mode but we don't specify it which is
>>>>> bad design.  This should have been a convert optab with two modes
>>>>> from the start - adding a _twoway variant is just a hack.
>>>>
>>>> We did discuss this at the time we started implementing it.  There was two
>>>> options, one was indeed to change it to a convert dot_prod optab, but doing
>>>> this means we have to update every target that uses it.
>>>>
>>>> Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and
>>> altivec.
>>>>
>>>> Which sure could be possible, but there's also every use in the backends that
>>> need
>>>> to be updated, and tested, which for some targets we don't even know how to
>>> begin.
>>>>
>>>> So it seems very hard to correct dotprod to a convert optab now.
>>>
>>> It's still the correct way to go.  At _least_ your new pattern should
>>> have been this,
>>> otherwise what do you do when you have two-way, four-way and eight-way
>>> variants?
>>> Add yet another optab?
>>
>> I guess that's fair, but having the new optab only be convert resulted in messy
>> code as everywhere you must check for both variants.
>>
>> Additionally that optab would then overlap with the existing optabs as, as you
>> Say, the documentation only says it's of a wider type and doesn't indicate
>> precision.
>>
>> So to avoid issues down the line then If the new optab isn't acceptable then
>> we'll have to do a wholesale conversion then..
> 
> Yep.  It shouldn't be difficult though.
> 
>>>
>>> Another thing is that when you do it your way you should fix the existing optab
>>> to be two-way by documenting how the second mode derives from the first.
>>>
>>> And sure, it's not the only optab suffering from this issue.
>>
>> Sure, all the zero and sign extending optabs for instance 😊
> 
> But for example the scalar ones are correct:
> 
> OPTAB_CL(sext_optab, "extend$b$a2", SIGN_EXTEND, "extend",
> gen_extend_conv_libfunc)
> 
> Richard.
> 
>> Tamar
>>
>>>
>>> Richard.
>>>
>>>> Tamar
>>>>
>>>>>
>>>>> Richard.
>>>>>
>>>>>> In order to minimize changes to the existing codebase,
>>>>>> `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
>>>>>> argument is added to its signature - `const_tree otype', allowing type
>>>>>> information to be specified for both input and output types.  The
>>>>>> existing nterface is retained by defining a new `optab_for_tree_code',
>>>>>> which serves as a shim to `optab_for_tree_code_1', passing old
>>>>>> parameters as-is and setting the new `optype' argument to `NULL_TREE'.
>>>>>>
>>>>>> For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
>>>>>> directly, passing it both types, adding the internal logic to the
>>>>>> function to distinguish between competing optabs.
>>>>>>
>>>>>> Finally, necessary changes are made to `expand_widen_pattern_expr' to
>>>>>> ensure the new icode can be correctly selected, given the new optab.
>>>>>>
>>>>>> [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
>>>>> Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
>>>>>> [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
>>>>> Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
>>>>>>
>>>>>> gcc/ChangeLog:
>>>>>>
>>>>>>          * config/aarch64/aarch64-sve2.md
>>> (@aarch64_sve_<sur>dotvnx4sivnx8hi):
>>>>>>          renamed to `<sur>dot_prod_twoway_vnx8hi'.
>>>>>>          * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
>>>>>>          update icodes used in line with above rename.
>>>>>>          * optabs-tree.cc (optab_for_tree_code_1): Renamed
>>>>>>          `optab_for_tree_code' and added new argument.
>>>>>>          (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
>>>>>>          * optabs-tree.h (optab_for_tree_code_1): New.
>>>>>>          * optabs.cc (expand_widen_pattern_expr): Expand support for
>>>>>>          DOT_PROD_EXPR patterns.
>>>>>>          * optabs.def (udot_prod_twoway_optab): New.
>>>>>>          (sdot_prod_twoway_optab): Likewise.
>>>>>>          * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
>>>>>>          support for misc optabs that use two modes.
>>>>>>
>>>>>> gcc/testsuite/ChangeLog:
>>>>>>
>>>>>>          * gcc.dg/vect/vect-dotprod-twoway.c: New.
>>>>>> ---
>>>>>>   .../aarch64/aarch64-sve-builtins-base.cc      |  4 ++--
>>>>>>   gcc/config/aarch64/aarch64-sve2.md            |  2 +-
>>>>>>   gcc/optabs-tree.cc                            | 23 ++++++++++++++++--
>>>>>>   gcc/optabs-tree.h                             |  2 ++
>>>>>>   gcc/optabs.cc                                 |  2 +-
>>>>>>   gcc/optabs.def                                |  2 ++
>>>>>>   .../gcc.dg/vect/vect-dotprod-twoway.c         | 24 +++++++++++++++++++
>>>>>>   gcc/tree-vect-patterns.cc                     |  2 +-
>>>>>>   8 files changed, 54 insertions(+), 7 deletions(-)
>>>>>>   create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
>>>>>>
>>>>>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>>>> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>>>>> index 0d2edf3f19e..e457db09f66 100644
>>>>>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>>>>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>>>>> @@ -764,8 +764,8 @@ public:
>>>>>>         icode = (e.type_suffix (0).float_p
>>>>>>                 ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
>>>>>>                 : e.type_suffix (0).unsigned_p
>>>>>> -              ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
>>>>>> -              : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
>>>>>> +              ? CODE_FOR_udot_prod_twoway_vnx8hi
>>>>>> +              : CODE_FOR_sdot_prod_twoway_vnx8hi);
>>>>>>       return e.use_unpred_insn (icode);
>>>>>>     }
>>>>>>   };
>>>>>> diff --git a/gcc/config/aarch64/aarch64-sve2.md
>>>>> b/gcc/config/aarch64/aarch64-sve2.md
>>>>>> index 934e57055d3..5677de7108d 100644
>>>>>> --- a/gcc/config/aarch64/aarch64-sve2.md
>>>>>> +++ b/gcc/config/aarch64/aarch64-sve2.md
>>>>>> @@ -2021,7 +2021,7 @@ (define_insn
>>>>> "@aarch64_sve_qsub_<sve_int_op>_lane_<mode>"
>>>>>>   )
>>>>>>
>>>>>>   ;; Two-way dot-product.
>>>>>> -(define_insn "@aarch64_sve_<sur>dotvnx4sivnx8hi"
>>>>>> +(define_insn "<sur>dot_prod_twoway_vnx8hi"
>>>>>>     [(set (match_operand:VNx4SI 0 "register_operand")
>>>>>>          (plus:VNx4SI
>>>>>>            (unspec:VNx4SI
>>>>>> diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
>>>>>> index b69a5bc3676..e3c5a618ea2 100644
>>>>>> --- a/gcc/optabs-tree.cc
>>>>>> +++ b/gcc/optabs-tree.cc
>>>>>> @@ -35,8 +35,8 @@ along with GCC; see the file COPYING3.  If not see
>>>>>>      cannot give complete results for multiplication or division) but probably
>>>>>>      ought to be relied on more widely throughout the expander.  */
>>>>>>   optab
>>>>>> -optab_for_tree_code (enum tree_code code, const_tree type,
>>>>>> -                    enum optab_subtype subtype)
>>>>>> +optab_for_tree_code_1 (enum tree_code code, const_tree type,
>>>>>> +                      const_tree otype, enum optab_subtype subtype)
>>>>>>   {
>>>>>>     bool trapv;
>>>>>>     switch (code)
>>>>>> @@ -149,6 +149,14 @@ optab_for_tree_code (enum tree_code code,
>>>>> const_tree type,
>>>>>>
>>>>>>       case DOT_PROD_EXPR:
>>>>>>         {
>>>>>> +       if (otype && (TYPE_PRECISION (TREE_TYPE (type)) * 2
>>>>>> +                     == TYPE_PRECISION (TREE_TYPE (otype))))
>>>>>> +         {
>>>>>> +           if (TYPE_UNSIGNED (type) && TYPE_UNSIGNED (otype))
>>>>>> +             return udot_prod_twoway_optab;
>>>>>> +           if (!TYPE_UNSIGNED (type) && !TYPE_UNSIGNED (otype))
>>>>>> +             return sdot_prod_twoway_optab;
>>>>>> +         }
>>>>>>          if (subtype == optab_vector_mixed_sign)
>>>>>>            return usdot_prod_optab;
>>>>>>
>>>>>> @@ -285,6 +293,17 @@ optab_for_tree_code (enum tree_code code,
>>>>> const_tree type,
>>>>>>       }
>>>>>>   }
>>>>>>
>>>>>> +/* Return the optab used for computing the operation given by the tree
>>> code,
>>>>>> +   CODE and the tree EXP.  This function is not always usable (for example, it
>>>>>> +   cannot give complete results for multiplication or division) but probably
>>>>>> +   ought to be relied on more widely throughout the expander.  */
>>>>>> +optab
>>>>>> +optab_for_tree_code (enum tree_code code, const_tree type,
>>>>>> +                    enum optab_subtype subtype)
>>>>>> +{
>>>>>> +  return optab_for_tree_code_1 (code, type, NULL_TREE, subtype);
>>>>>> +}
>>>>>> +
>>>>>>   /* Check whether an operation represented by CODE is a 'half' widening
>>>>> operation
>>>>>>      in which the input vector type has half the number of bits of the output
>>>>>>      vector type e.g. V8QI->V8HI.
>>>>>> diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
>>>>>> index f2b49991462..13ff7ca2e4b 100644
>>>>>> --- a/gcc/optabs-tree.h
>>>>>> +++ b/gcc/optabs-tree.h
>>>>>> @@ -36,6 +36,8 @@ enum optab_subtype
>>>>>>   /* Return the optab used for computing the given operation on the type
>>> given
>>>>> by
>>>>>>      the second argument.  The third argument distinguishes between the types
>>> of
>>>>>>      vector shifts and rotates.  */
>>>>>> +optab optab_for_tree_code_1 (enum tree_code, const_tree, const_tree
>>>>> __attribute__((unused)),
>>>>>> +                           enum optab_subtype );
>>>>>>   optab optab_for_tree_code (enum tree_code, const_tree, enum
>>>>> optab_subtype);
>>>>>>   bool
>>>>>>   supportable_half_widening_operation (enum tree_code, tree, tree,
>>>>>> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
>>>>>> index ce91f94ed43..3a1c6c7b90e 100644
>>>>>> --- a/gcc/optabs.cc
>>>>>> +++ b/gcc/optabs.cc
>>>>>> @@ -311,7 +311,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
>>> rtx
>>>>> op1, rtx wide_op,
>>>>>>          gcc_unreachable ();
>>>>>>
>>>>>>         widen_pattern_optab
>>>>>> -       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
>>>>>> +       = optab_for_tree_code_1 (ops->code, TREE_TYPE (oprnd0), TREE_TYPE
>>>>> (oprnd2), subtype);
>>>>>>       }
>>>>>>     else
>>>>>>       widen_pattern_optab
>>>>>> diff --git a/gcc/optabs.def b/gcc/optabs.def
>>>>>> index ad14f9328b9..cf1a6e7a7dc 100644
>>>>>> --- a/gcc/optabs.def
>>>>>> +++ b/gcc/optabs.def
>>>>>> @@ -408,6 +408,8 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
>>>>>>   OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
>>>>>>   OPTAB_D (udot_prod_optab, "udot_prod$I$a")
>>>>>>   OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
>>>>>> +OPTAB_D (sdot_prod_twoway_optab, "sdot_prod_twoway_$I$a")
>>>>>> +OPTAB_D (udot_prod_twoway_optab, "udot_prod_twoway_$I$a")
>>>>>>   OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
>>>>>>   OPTAB_D (usad_optab, "usad$I$a")
>>>>>>   OPTAB_D (ssad_optab, "ssad$I$a")
>>>>>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
>>>>> b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
>>>>>> new file mode 100644
>>>>>> index 00000000000..cba2aadbec8
>>>>>> --- /dev/null
>>>>>> +++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
>>>>>> @@ -0,0 +1,24 @@
>>>>>> +/* { dg-do compile { target { aarch64*-*-* } } } */
>>>>>> +/* { dg-additional-options "-march=-O3 -march=armv9.2-a+sme2 -fdump-
>>> tree-
>>>>> vect-details" { target { aarch64*-*-* } } } */
>>>>>> +
>>>>>> +#include <stdint.h>
>>>>>> +
>>>>>> +uint32_t udot2(int n, uint16_t* data) {
>>>>>> +  uint32_t sum = 0;
>>>>>> +  for (int i=0; i<n; i+=1) {
>>>>>> +    sum += data[i] * data[i];
>>>>>> +  }
>>>>>> +  return sum;
>>>>>> +}
>>>>>> +
>>>>>> +int32_t sdot2(int n, int16_t* data) {
>>>>>> +  int32_t sum = 0;
>>>>>> +  for (int i=0; i<n; i+=1) {
>>>>>> +    sum += data[i] * data[i];
>>>>>> +  }
>>>>>> +  return sum;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern:
>>> detected:"
>>>>> 2 "vect" } } */
>>>>>> +/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern:
>>>>> detected" 4 "vect" } } */
>>>>>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
>>>>>> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
>>>>>> index dfb7d800526..0760f25d94d 100644
>>>>>> --- a/gcc/tree-vect-patterns.cc
>>>>>> +++ b/gcc/tree-vect-patterns.cc
>>>>>> @@ -233,7 +233,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo,
>>> tree
>>>>> otype, tree_code code,
>>>>>>     if (!vecotype)
>>>>>>       return false;
>>>>>>
>>>>>> -  optab optab = optab_for_tree_code (code, vecitype, subtype);
>>>>>> +  optab optab = optab_for_tree_code_1 (code, vecitype, vecotype, subtype);
>>>>>>     if (!optab)
>>>>>>       return false;
>>>>>>
>>>>>> --
>>>>>> 2.34.1
>>>>>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
  2024-05-17 10:13         ` Richard Biener
  2024-05-17 14:53           ` Victor Do Nascimento
@ 2024-05-20 16:28           ` Richard Sandiford
  1 sibling, 0 replies; 14+ messages in thread
From: Richard Sandiford @ 2024-05-20 16:28 UTC (permalink / raw)
  To: Richard Biener
  Cc: Tamar Christina, Victor Do Nascimento, gcc-patches,
	Richard Earnshaw, Victor Do Nascimento

Richard Biener <richard.guenther@gmail.com> writes:
> On Fri, May 17, 2024 at 11:56 AM Tamar Christina
> <Tamar.Christina@arm.com> wrote:
>>
>> > -----Original Message-----
>> > From: Richard Biener <richard.guenther@gmail.com>
>> > Sent: Friday, May 17, 2024 10:46 AM
>> > To: Tamar Christina <Tamar.Christina@arm.com>
>> > Cc: Victor Do Nascimento <Victor.DoNascimento@arm.com>; gcc-
>> > patches@gcc.gnu.org; Richard Sandiford <Richard.Sandiford@arm.com>; Richard
>> > Earnshaw <Richard.Earnshaw@arm.com>; Victor Do Nascimento
>> > <vicdon01@e133397.arm.com>
>> > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
>> > autovectorizer
>> >
>> > On Fri, May 17, 2024 at 11:05 AM Tamar Christina
>> > <Tamar.Christina@arm.com> wrote:
>> > >
>> > > > -----Original Message-----
>> > > > From: Richard Biener <richard.guenther@gmail.com>
>> > > > Sent: Friday, May 17, 2024 6:51 AM
>> > > > To: Victor Do Nascimento <Victor.DoNascimento@arm.com>
>> > > > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
>> > <Richard.Sandiford@arm.com>;
>> > > > Richard Earnshaw <Richard.Earnshaw@arm.com>; Victor Do Nascimento
>> > > > <vicdon01@e133397.arm.com>
>> > > > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
>> > > > autovectorizer
>> > > >
>> > > > On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento
>> > > > <victor.donascimento@arm.com> wrote:
>> > > > >
>> > > > > From: Victor Do Nascimento <vicdon01@e133397.arm.com>
>> > > > >
>> > > > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
>> > > > > optabs for dealing with vectorizable dot product code sequences.  The
>> > > > > consequence of using a direct optab for this is that backend-pattern
>> > > > > selection is only ever able to match against one datatype - Either
>> > > > > that of the operands or of the accumulated value, never both.
>> > > > >
>> > > > > With the introduction of the 2-way (un)signed dot-product insn [1][2]
>> > > > > in AArch64 SVE2, the existing direct opcode approach is no longer
>> > > > > sufficient for full specification of all the possible dot product
>> > > > > machine instructions to be matched to the code sequence; a dot product
>> > > > > resulting in VNx4SI may result from either dot products on VNx16QI or
>> > > > > VNx8HI values for the 4- and 2-way dot product operations, respectively.
>> > > > >
>> > > > > This means that the following example fails autovectorization:
>> > > > >
>> > > > > uint32_t foo(int n, uint16_t* data) {
>> > > > >   uint32_t sum = 0;
>> > > > >   for (int i=0; i<n; i+=1) {
>> > > > >     sum += data[i] * data[i];
>> > > > >   }
>> > > > >   return sum;
>> > > > > }
>> > > > >
>> > > > > To remedy the issue a new optab is added, tentatively named
>> > > > > `udot_prod_twoway_optab', whose selection is dependent upon checking
>> > > > > of both input and output types involved in the operation.
>> > > >
>> > > > I don't like this too much.  I'll note we document dot_prod as
>> > > >
>> > > > @cindex @code{sdot_prod@var{m}} instruction pattern
>> > > > @item @samp{sdot_prod@var{m}}
>> > > >
>> > > > Compute the sum of the products of two signed elements.
>> > > > Operand 1 and operand 2 are of the same mode. Their
>> > > > product, which is of a wider mode, is computed and added to operand 3.
>> > > > Operand 3 is of a mode equal or wider than the mode of the product. The
>> > > > result is placed in operand 0, which is of the same mode as operand 3.
>> > > > @var{m} is the mode of operand 1 and operand 2.
>> > > >
>> > > > with no restriction on the wider mode but we don't specify it which is
>> > > > bad design.  This should have been a convert optab with two modes
>> > > > from the start - adding a _twoway variant is just a hack.
>> > >
>> > > We did discuss this at the time we started implementing it.  There was two
>> > > options, one was indeed to change it to a convert dot_prod optab, but doing
>> > > this means we have to update every target that uses it.
>> > >
>> > > Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and
>> > altivec.
>> > >
>> > > Which sure could be possible, but there's also every use in the backends that
>> > need
>> > > to be updated, and tested, which for some targets we don't even know how to
>> > begin.
>> > >
>> > > So it seems very hard to correct dotprod to a convert optab now.
>> >
>> > It's still the correct way to go.  At _least_ your new pattern should
>> > have been this,
>> > otherwise what do you do when you have two-way, four-way and eight-way
>> > variants?
>> > Add yet another optab?
>>
>> I guess that's fair, but having the new optab only be convert resulted in messy
>> code as everywhere you must check for both variants.
>>
>> Additionally that optab would then overlap with the existing optabs as, as you
>> Say, the documentation only says it's of a wider type and doesn't indicate
>> precision.
>>
>> So to avoid issues down the line then If the new optab isn't acceptable then
>> we'll have to do a wholesale conversion then..
>
> Yep.  It shouldn't be difficult though.

Still catching up, but FWIW, I agree this is the way to go.  (Convert all
existing dot_prods to convert optabs first, and then add the new AArch64
ones.)  Having two mechanisms feels like storing up trouble for later. :)

Richard

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2024-05-20 16:28 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-16 14:39 [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer Victor Do Nascimento
2024-05-16 14:45 ` Andrew Pinski
2024-05-16 17:45 ` Tamar Christina
2024-05-16 17:53   ` Andrew Pinski
2024-05-17  2:08 ` Hongtao Liu
2024-05-17  2:13   ` Hongtao Liu
2024-05-17  9:09     ` Tamar Christina
2024-05-17  5:51 ` Richard Biener
2024-05-17  9:04   ` Tamar Christina
2024-05-17  9:46     ` Richard Biener
2024-05-17  9:55       ` Tamar Christina
2024-05-17 10:13         ` Richard Biener
2024-05-17 14:53           ` Victor Do Nascimento
2024-05-20 16:28           ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).