public inbox for gcc-patches@gcc.gnu.org
* [2/3][vect] Add widening add, subtract vect patterns
@ 2020-11-12 19:34 Joel Hutton
  2020-11-13  7:58 ` Richard Biener
  2020-11-13 12:16 ` Richard Sandiford
  0 siblings, 2 replies; 6+ messages in thread
From: Joel Hutton @ 2020-11-12 19:34 UTC (permalink / raw)
  To: GCC Patches; +Cc: Kyrylo Tkachov, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 1683 bytes --]

Hi all,

This patch adds widening add and widening subtract patterns to tree-vect-patterns.
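
For reference, a minimal sketch (not part of the patch) of the kind of
scalar loop the new patterns are meant to catch:

#include <stdint.h>

/* Each 16-bit element is widened to 32 bits before the add, so the loop
   can be vectorized with a widening-add instruction (e.g. uaddl/uaddl2
   on aarch64).  */
void
widen_add_loop (uint32_t *restrict out, uint16_t *restrict a,
                uint16_t *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = a[i] + b[i];
}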

All 3 patches together bootstrapped and regression tested on aarch64.

gcc/ChangeLog:

2020-11-12  Joel Hutton  <joel.hutton@arm.com>

        * expr.c (expand_expr_real_2): add widen_add,widen_subtract cases
        * optabs-tree.c (optab_for_tree_code): optabs for widening adds,subtracts
        * optabs.def (OPTAB_D): define vectorized widen add, subtracts
        * tree-cfg.c (verify_gimple_assign_binary): Add case for widening adds, subtracts
        * tree-inline.c (estimate_operator_cost): Add case for widening adds, subtracts
        * tree-vect-generic.c (expand_vector_operations_1): Add case for widening adds, subtracts
        * tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog ptatern
        (vect_recog_widen_sub_pattern): New recog pattern
        (vect_recog_average_pattern): Update widened add code
        (vect_recog_average_pattern): Update widened add code
        * tree-vect-stmts.c (vectorizable_conversion): Add case for widened add, subtract
        (supportable_widening_operation): Add case for widened add, subtract
        * tree.def (WIDEN_ADD_EXPR): New tree code
        (WIDEN_SUB_EXPR): New tree code
        (VEC_WIDEN_ADD_HI_EXPR): New tree code
        (VEC_WIDEN_ADD_LO_EXPR): New tree code
        (VEC_WIDEN_SUB_HI_EXPR): New tree code
        (VEC_WIDEN_SUB_LO_EXPR): New tree code

gcc/testsuite/ChangeLog:

2020-11-12  Joel Hutton  <joel.hutton@arm.com>

        * gcc.target/aarch64/vect-widen-add.c: New test.
        * gcc.target/aarch64/vect-widen-sub.c: New test.


Ok for trunk?

[-- Attachment #2: 0002-vect-Add-widening-add-subtract-patterns.patch --]
[-- Type: text/x-patch; name="0002-vect-Add-widening-add-subtract-patterns.patch", Size: 17233 bytes --]

From e0c10ca554729b9e6d58dbd3f18ba72b2c3ee8bc Mon Sep 17 00:00:00 2001
From: Joel Hutton <joel.hutton@arm.com>
Date: Mon, 9 Nov 2020 15:44:18 +0000
Subject: [PATCH 2/3] [vect] Add widening add, subtract patterns

Add widening add, subtract patterns to tree-vect-patterns.
Add aarch64 tests for patterns.

fix sad
---
 gcc/expr.c                                    |  6 ++
 gcc/optabs-tree.c                             | 17 ++++
 gcc/optabs.def                                |  8 ++
 .../gcc.target/aarch64/vect-widen-add.c       | 90 +++++++++++++++++++
 .../gcc.target/aarch64/vect-widen-sub.c       | 90 +++++++++++++++++++
 gcc/tree-cfg.c                                |  8 ++
 gcc/tree-inline.c                             |  6 ++
 gcc/tree-vect-generic.c                       |  4 +
 gcc/tree-vect-patterns.c                      | 32 +++++--
 gcc/tree-vect-stmts.c                         | 15 +++-
 gcc/tree.def                                  |  6 ++
 11 files changed, 276 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c

diff --git a/gcc/expr.c b/gcc/expr.c
index ae16f07775870792729e3805436d7f2debafb6ca..ffc8aed5296174066849d9e0d73b1c352c20fd9e 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9034,6 +9034,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 					  target, unsignedp);
       return target;
 
+    case WIDEN_ADD_EXPR:
+    case WIDEN_SUB_EXPR:
     case WIDEN_MULT_EXPR:
       /* If first operand is constant, swap them.
 	 Thus the following special case checks need only
@@ -9754,6 +9756,10 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	return temp;
       }
 
+    case VEC_WIDEN_ADD_HI_EXPR:
+    case VEC_WIDEN_ADD_LO_EXPR:
+    case VEC_WIDEN_SUB_HI_EXPR:
+    case VEC_WIDEN_SUB_LO_EXPR:
     case VEC_WIDEN_MULT_HI_EXPR:
     case VEC_WIDEN_MULT_LO_EXPR:
     case VEC_WIDEN_MULT_EVEN_EXPR:
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 4dfda756932de1693667c39c6fabed043b20b63b..009dccfa3bd298bca7b3b45401a4cc2acc90ff21 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -170,6 +170,23 @@ optab_for_tree_code (enum tree_code code, const_tree type,
       return (TYPE_UNSIGNED (type)
 	      ? vec_widen_ushiftl_lo_optab : vec_widen_sshiftl_lo_optab);
 
+    case VEC_WIDEN_ADD_LO_EXPR:
+      return (TYPE_UNSIGNED (type)
+	      ? vec_widen_uaddl_lo_optab  : vec_widen_saddl_lo_optab);
+
+    case VEC_WIDEN_ADD_HI_EXPR:
+      return (TYPE_UNSIGNED (type)
+	      ? vec_widen_uaddl_hi_optab  : vec_widen_saddl_hi_optab);
+
+    case VEC_WIDEN_SUB_LO_EXPR:
+      return (TYPE_UNSIGNED (type)
+	      ? vec_widen_usubl_lo_optab  : vec_widen_ssubl_lo_optab);
+
+    case VEC_WIDEN_SUB_HI_EXPR:
+      return (TYPE_UNSIGNED (type)
+	      ? vec_widen_usubl_hi_optab  : vec_widen_ssubl_hi_optab);
+
+
     case VEC_UNPACK_HI_EXPR:
       return (TYPE_UNSIGNED (type)
 	      ? vec_unpacku_hi_optab : vec_unpacks_hi_optab);
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 78409aa14537d259bf90277751aac00d452a0d3f..a97cdb360781ca9c743e2991422c600626c75aa5 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -383,6 +383,14 @@ OPTAB_D (vec_widen_smult_even_optab, "vec_widen_smult_even_$a")
 OPTAB_D (vec_widen_smult_hi_optab, "vec_widen_smult_hi_$a")
 OPTAB_D (vec_widen_smult_lo_optab, "vec_widen_smult_lo_$a")
 OPTAB_D (vec_widen_smult_odd_optab, "vec_widen_smult_odd_$a")
+OPTAB_D (vec_widen_ssubl_hi_optab, "vec_widen_ssubl_hi_$a")
+OPTAB_D (vec_widen_ssubl_lo_optab, "vec_widen_ssubl_lo_$a")
+OPTAB_D (vec_widen_saddl_hi_optab, "vec_widen_saddl_hi_$a")
+OPTAB_D (vec_widen_saddl_lo_optab, "vec_widen_saddl_lo_$a")
+OPTAB_D (vec_widen_usubl_hi_optab, "vec_widen_usubl_hi_$a")
+OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
+OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
+OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
 OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
 OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
 OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
new file mode 100644
index 0000000000000000000000000000000000000000..fc6966fd9f257170501247411d50428aaaabbdae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
@@ -0,0 +1,90 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -save-temps" } */
+#include <stdint.h>
+#include <string.h>
+
+#define ARR_SIZE 1024
+
+/* Should produce an uaddl */
+void uadd_opt (uint32_t *foo, uint16_t *a, uint16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+    {
+        foo[i]   = a[i]   + b[i];
+        foo[i+1] = a[i+1] + b[i+1];
+        foo[i+2] = a[i+2] + b[i+2];
+        foo[i+3] = a[i+3] + b[i+3];
+    }
+}
+
+__attribute__((optimize (0)))
+void uadd_nonopt (uint32_t *foo, uint16_t *a, uint16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+    {
+        foo[i]   = a[i]   + b[i];
+        foo[i+1] = a[i+1] + b[i+1];
+        foo[i+2] = a[i+2] + b[i+2];
+        foo[i+3] = a[i+3] + b[i+3];
+    }
+}
+
+/* Should produce an saddl */
+void sadd_opt (int32_t *foo, int16_t *a, int16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+    {
+        foo[i]   = a[i]   + b[i];
+        foo[i+1] = a[i+1] + b[i+1];
+        foo[i+2] = a[i+2] + b[i+2];
+        foo[i+3] = a[i+3] + b[i+3];
+    }
+}
+
+__attribute__((optimize (0)))
+void sadd_nonopt (int32_t *foo, int16_t *a, int16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+    {
+        foo[i]   = a[i]   + b[i];
+        foo[i+1] = a[i+1] + b[i+1];
+        foo[i+2] = a[i+2] + b[i+2];
+        foo[i+3] = a[i+3] + b[i+3];
+    }
+}
+
+
+void __attribute__((optimize (0)))
+init(uint16_t *a, uint16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE;i++)
+    {
+      a[i] = i;
+      b[i] = 2*i;
+    }
+}
+
+int __attribute__((optimize (0)))
+main()
+{
+    uint32_t foo_arr[ARR_SIZE];
+    uint32_t bar_arr[ARR_SIZE];
+    uint16_t a[ARR_SIZE];
+    uint16_t b[ARR_SIZE];
+
+    init(a, b);
+    uadd_opt(foo_arr, a, b);
+    uadd_nonopt(bar_arr, a, b);
+    if (memcmp(foo_arr, bar_arr, ARR_SIZE) != 0)
+      return 1;
+    sadd_opt((int32_t*) foo_arr, (int16_t*) a, (int16_t*) b);
+    sadd_nonopt((int32_t*) bar_arr, (int16_t*) a, (int16_t*) b);
+    if (memcmp(foo_arr, bar_arr, ARR_SIZE) != 0)
+      return 1;
+    return 0;
+}
+
+/* { dg-final { scan-assembler-times "uaddl\t" 1} } */
+/* { dg-final { scan-assembler-times "uaddl2\t" 1} } */
+/* { dg-final { scan-assembler-times "saddl\t" 1} } */
+/* { dg-final { scan-assembler-times "saddl2\t" 1} } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
new file mode 100644
index 0000000000000000000000000000000000000000..eab252786cd3a24974011c0b4451029ac1194935
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
@@ -0,0 +1,90 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -save-temps" } */
+#include <stdint.h>
+#include <string.h>
+
+#define ARR_SIZE 1024
+
+/* Should produce an usubl */
+void usub_opt (uint32_t *foo, uint16_t *a, uint16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+    {
+        foo[i]   = a[i]   - b[i];
+        foo[i+1] = a[i+1] - b[i+1];
+        foo[i+2] = a[i+2] - b[i+2];
+        foo[i+3] = a[i+3] - b[i+3];
+    }
+}
+
+__attribute__((optimize (0)))
+void usub_nonopt (uint32_t *foo, uint16_t *a, uint16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+    {
+        foo[i]   = a[i]   - b[i];
+        foo[i+1] = a[i+1] - b[i+1];
+        foo[i+2] = a[i+2] - b[i+2];
+        foo[i+3] = a[i+3] - b[i+3];
+    }
+}
+
+/* Should produce an ssubl */
+void ssub_opt (int32_t *foo, int16_t *a, int16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+    {
+        foo[i]   = a[i]   - b[i];
+        foo[i+1] = a[i+1] - b[i+1];
+        foo[i+2] = a[i+2] - b[i+2];
+        foo[i+3] = a[i+3] - b[i+3];
+    }
+}
+
+__attribute__((optimize (0)))
+void ssub_nonopt (int32_t *foo, int16_t *a, int16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+    {
+        foo[i]   = a[i]   - b[i];
+        foo[i+1] = a[i+1] - b[i+1];
+        foo[i+2] = a[i+2] - b[i+2];
+        foo[i+3] = a[i+3] - b[i+3];
+    }
+}
+
+
+void __attribute__((optimize (0)))
+init(uint16_t *a, uint16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE;i++)
+    {
+      a[i] = i;
+      b[i] = 2*i;
+    }
+}
+
+int __attribute__((optimize (0)))
+main()
+{
+    uint32_t foo_arr[ARR_SIZE];
+    uint32_t bar_arr[ARR_SIZE];
+    uint16_t a[ARR_SIZE];
+    uint16_t b[ARR_SIZE];
+
+    init(a, b);
+    usub_opt(foo_arr, a, b);
+    usub_nonopt(bar_arr, a, b);
+    if (memcmp(foo_arr, bar_arr, ARR_SIZE) != 0)
+      return 1;
+    ssub_opt((int32_t*) foo_arr, (int16_t*) a, (int16_t*) b);
+    ssub_nonopt((int32_t*) bar_arr, (int16_t*) a, (int16_t*) b);
+    if (memcmp(foo_arr, bar_arr, ARR_SIZE) != 0)
+      return 1;
+    return 0;
+}
+
+/* { dg-final { scan-assembler-times "usubl\t" 1} } */
+/* { dg-final { scan-assembler-times "usubl2\t" 1} } */
+/* { dg-final { scan-assembler-times "ssubl\t" 1} } */
+/* { dg-final { scan-assembler-times "ssubl2\t" 1} } */
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 5139f111fecc7ec6e0902145b808308a5e47450b..532692a5da03d6d9653e2d47a1218982b27c4539 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3864,6 +3864,12 @@ verify_gimple_assign_binary (gassign *stmt)
         return false;
       }
 
+    case VEC_WIDEN_SUB_HI_EXPR:
+    case VEC_WIDEN_SUB_LO_EXPR:
+    case VEC_WIDEN_ADD_HI_EXPR:
+    case VEC_WIDEN_ADD_LO_EXPR:
+      return false;
+
     case VEC_WIDEN_LSHIFT_HI_EXPR:
     case VEC_WIDEN_LSHIFT_LO_EXPR:
       {
@@ -3885,6 +3891,8 @@ verify_gimple_assign_binary (gassign *stmt)
         return false;
       }
 
+    case WIDEN_ADD_EXPR:
+    case WIDEN_SUB_EXPR:
     case PLUS_EXPR:
     case MINUS_EXPR:
       {
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 32424b169c7310c03168baf43cc56a8b6ef2f15b..bff650be79e0c820f619ab333871770658ce93ee 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -4224,6 +4224,8 @@ estimate_operator_cost (enum tree_code code, eni_weights *weights,
 
     case REALIGN_LOAD_EXPR:
 
+    case WIDEN_ADD_EXPR:
+    case WIDEN_SUB_EXPR:
     case WIDEN_SUM_EXPR:
     case WIDEN_MULT_EXPR:
     case DOT_PROD_EXPR:
@@ -4232,6 +4234,10 @@ estimate_operator_cost (enum tree_code code, eni_weights *weights,
     case WIDEN_MULT_MINUS_EXPR:
     case WIDEN_LSHIFT_EXPR:
 
+    case VEC_WIDEN_ADD_HI_EXPR:
+    case VEC_WIDEN_ADD_LO_EXPR:
+    case VEC_WIDEN_SUB_HI_EXPR:
+    case VEC_WIDEN_SUB_LO_EXPR:
     case VEC_WIDEN_MULT_HI_EXPR:
     case VEC_WIDEN_MULT_LO_EXPR:
     case VEC_WIDEN_MULT_EVEN_EXPR:
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index d7bafa77134079faf743c1b482251311abb681c5..940658c6be9c41cdc35b4b72d78fc6c2ed3f2072 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -2118,6 +2118,10 @@ expand_vector_operations_1 (gimple_stmt_iterator *gsi,
      arguments, not the widened result.  VEC_UNPACK_FLOAT_*_EXPR is
      calculated in the same way above.  */
   if (code == WIDEN_SUM_EXPR
+      || code == VEC_WIDEN_ADD_HI_EXPR
+      || code == VEC_WIDEN_ADD_LO_EXPR
+      || code == VEC_WIDEN_SUB_HI_EXPR
+      || code == VEC_WIDEN_SUB_LO_EXPR
       || code == VEC_WIDEN_MULT_HI_EXPR
       || code == VEC_WIDEN_MULT_LO_EXPR
       || code == VEC_WIDEN_MULT_EVEN_EXPR
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index f68a87e05ed54145a25ccff598eeef9e57f9a759..331027444a6d95343eb110f7f9c7db19b40ee5ee 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -1086,8 +1086,10 @@ vect_recog_sad_pattern (vec_info *vinfo,
      of the above pattern.  */
 
   tree plus_oprnd0, plus_oprnd1;
-  if (!vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
-				       &plus_oprnd0, &plus_oprnd1))
+  if (!(vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
+				       &plus_oprnd0, &plus_oprnd1)
+	|| vect_reassociating_reduction_p (vinfo, stmt_vinfo, WIDEN_ADD_EXPR,
+				       &plus_oprnd0, &plus_oprnd1)))
     return NULL;
 
   tree sum_type = gimple_expr_type (last_stmt);
@@ -1148,7 +1150,7 @@ vect_recog_sad_pattern (vec_info *vinfo,
   /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
      inside the loop (in case we are analyzing an outer-loop).  */
   vect_unpromoted_value unprom[2];
-  if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, MINUS_EXPR,
+  if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, WIDEN_SUB_EXPR,
 			     false, 2, unprom, &half_type))
     return NULL;
 
@@ -1262,6 +1264,24 @@ vect_recog_widen_mult_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
 				      "vect_recog_widen_mult_pattern");
 }
 
+static gimple *
+vect_recog_widen_add_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
+			       tree *type_out)
+{
+  return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
+				      PLUS_EXPR, WIDEN_ADD_EXPR, false,
+				      "vect_recog_widen_add_pattern");
+}
+
+static gimple *
+vect_recog_widen_sub_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
+			       tree *type_out)
+{
+  return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
+				      MINUS_EXPR, WIDEN_SUB_EXPR, false,
+				      "vect_recog_widen_sub_pattern");
+}
+
 /* Function vect_recog_pow_pattern
 
    Try to find the following pattern:
@@ -1978,7 +1998,7 @@ vect_recog_average_pattern (vec_info *vinfo,
   vect_unpromoted_value unprom[3];
   tree new_type;
   unsigned int nops = vect_widened_op_tree (vinfo, plus_stmt_info, PLUS_EXPR,
-					    PLUS_EXPR, false, 3,
+					    WIDEN_ADD_EXPR, false, 3,
 					    unprom, &new_type);
   if (nops == 0)
     return NULL;
@@ -5249,7 +5269,9 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
      of mask conversion that are needed for gather and scatter
      internal functions.  */
   { vect_recog_gather_scatter_pattern, "gather_scatter" },
-  { vect_recog_mask_conversion_pattern, "mask_conversion" }
+  { vect_recog_mask_conversion_pattern, "mask_conversion" },
+  { vect_recog_widen_add_pattern, "widen_add" },
+  { vect_recog_widen_sub_pattern, "widen_sub" },
 };
 
 const unsigned int NUM_PATTERNS = ARRAY_SIZE (vect_vect_recog_func_ptrs);
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 2c7a8a70913bfc4b9903e9328f4489257ca59e02..f12fd158b13656ee24022ec7e445c53444be6554 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -4570,6 +4570,8 @@ vectorizable_conversion (vec_info *vinfo,
   if (!CONVERT_EXPR_CODE_P (code)
       && code != FIX_TRUNC_EXPR
       && code != FLOAT_EXPR
+      && code != WIDEN_ADD_EXPR
+      && code != WIDEN_SUB_EXPR
       && code != WIDEN_MULT_EXPR
       && code != WIDEN_LSHIFT_EXPR)
     return false;
@@ -4615,7 +4617,8 @@ vectorizable_conversion (vec_info *vinfo,
 
   if (op_type == binary_op)
     {
-      gcc_assert (code == WIDEN_MULT_EXPR || code == WIDEN_LSHIFT_EXPR);
+      gcc_assert (code == WIDEN_MULT_EXPR || code == WIDEN_LSHIFT_EXPR
+		  || code == WIDEN_ADD_EXPR || code == WIDEN_SUB_EXPR);
 
       op1 = gimple_assign_rhs2 (stmt);
       tree vectype1_in;
@@ -11534,6 +11537,16 @@ supportable_widening_operation (vec_info *vinfo,
       c2 = VEC_WIDEN_LSHIFT_HI_EXPR;
       break;
 
+    case WIDEN_ADD_EXPR:
+      c1 = VEC_WIDEN_ADD_LO_EXPR;
+      c2 = VEC_WIDEN_ADD_HI_EXPR;
+      break;
+
+    case WIDEN_SUB_EXPR:
+      c1 = VEC_WIDEN_SUB_LO_EXPR;
+      c2 = VEC_WIDEN_SUB_HI_EXPR;
+      break;
+
     CASE_CONVERT:
       c1 = VEC_UNPACK_LO_EXPR;
       c2 = VEC_UNPACK_HI_EXPR;
diff --git a/gcc/tree.def b/gcc/tree.def
index 6c53fe1bf67cd8eee7084de0b20b8d217d70710a..daaa2f22b384739c2d9fddcb1fd9185099e11788 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1359,6 +1359,8 @@ DEFTREECODE (WIDEN_MULT_MINUS_EXPR, "widen_mult_minus_expr", tcc_expression, 3)
    the first argument from type t1 to type t2, and then shifting it
    by the second argument.  */
 DEFTREECODE (WIDEN_LSHIFT_EXPR, "widen_lshift_expr", tcc_binary, 2)
+DEFTREECODE (WIDEN_ADD_EXPR, "widen_add_expr", tcc_binary, 2)
+DEFTREECODE (WIDEN_SUB_EXPR, "widen_sub_expr", tcc_binary, 2)
 
 /* Widening vector multiplication.
    The two operands are vectors with N elements of size S. Multiplying the
@@ -1423,6 +1425,10 @@ DEFTREECODE (VEC_PACK_FLOAT_EXPR, "vec_pack_float_expr", tcc_binary, 2)
  */
 DEFTREECODE (VEC_WIDEN_LSHIFT_HI_EXPR, "widen_lshift_hi_expr", tcc_binary, 2)
 DEFTREECODE (VEC_WIDEN_LSHIFT_LO_EXPR, "widen_lshift_lo_expr", tcc_binary, 2)
+DEFTREECODE (VEC_WIDEN_ADD_HI_EXPR, "widen_add_hi_expr", tcc_binary, 2)
+DEFTREECODE (VEC_WIDEN_ADD_LO_EXPR, "widen_add_lo_expr", tcc_binary, 2)
+DEFTREECODE (VEC_WIDEN_SUB_HI_EXPR, "widen_sub_hi_expr", tcc_binary, 2)
+DEFTREECODE (VEC_WIDEN_SUB_LO_EXPR, "widen_sub_lo_expr", tcc_binary, 2)
 
 /* PREDICT_EXPR.  Specify hint for branch prediction.  The
    PREDICT_EXPR_PREDICTOR specify predictor and PREDICT_EXPR_OUTCOME the
-- 
2.17.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [2/3][vect] Add widening add, subtract vect patterns
  2020-11-12 19:34 [2/3][vect] Add widening add, subtract vect patterns Joel Hutton
@ 2020-11-13  7:58 ` Richard Biener
  2020-11-13 12:16 ` Richard Sandiford
  1 sibling, 0 replies; 6+ messages in thread
From: Richard Biener @ 2020-11-13  7:58 UTC (permalink / raw)
  To: Joel Hutton; +Cc: GCC Patches, Kyrylo Tkachov, richard.sandiford

On Thu, 12 Nov 2020, Joel Hutton wrote:

> Hi all,
> 
> This patch adds widening add and widening subtract patterns to 
> tree-vect-patterns.

I am missing documentation in md.texi for the new patterns.  In
particular I wonder why you need signed and unsigned variants
for the add/subtract patterns.

We're walking away from adding tree codes for new vectorizer
pieces and instead want to use direct internal functions for them.
Can you rework the patch to use this approach?

Thanks,
Richard.

> All 3 patches together bootstrapped and regression tested on aarch64.
> 
> gcc/ChangeLog:
> 
> 2020-11-12  Joel Hutton  <joel.hutton@arm.com>
> 
>         * expr.c (expand_expr_real_2): add widen_add,widen_subtract cases
>         * optabs-tree.c (optab_for_tree_code): optabs for widening adds,subtracts
>         * optabs.def (OPTAB_D): define vectorized widen add, subtracts
>         * tree-cfg.c (verify_gimple_assign_binary): Add case for widening adds, subtracts
>         * tree-inline.c (estimate_operator_cost): Add case for widening adds, subtracts
>         * tree-vect-generic.c (expand_vector_operations_1): Add case for widening adds, subtracts
>         * tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog ptatern
>         (vect_recog_widen_sub_pattern): New recog pattern
>         (vect_recog_average_pattern): Update widened add code
>         (vect_recog_average_pattern): Update widened add code
>         * tree-vect-stmts.c (vectorizable_conversion): Add case for widened add, subtract
>         (supportable_widening_operation): Add case for widened add, subtract
>         * tree.def (WIDEN_ADD_EXPR): New tree code
>         (WIDEN_SUB_EXPR): New tree code
>         (VEC_WIDEN_ADD_HI_EXPR): New tree code
>         (VEC_WIDEN_ADD_LO_EXPR): New tree code
>         (VEC_WIDEN_SUB_HI_EXPR): New tree code
>         (VEC_WIDEN_SUB_LO_EXPR): New tree code
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-11-12  Joel Hutton  <joel.hutton@arm.com>
> 
>         * gcc.target/aarch64/vect-widen-add.c: New test.
>         * gcc.target/aarch64/vect-widen-sub.c: New test.
> 
> 
> Ok for trunk?
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [2/3][vect] Add widening add, subtract vect patterns
  2020-11-12 19:34 [2/3][vect] Add widening add, subtract vect patterns Joel Hutton
  2020-11-13  7:58 ` Richard Biener
@ 2020-11-13 12:16 ` Richard Sandiford
  2020-11-13 16:48   ` Joel Hutton
  1 sibling, 1 reply; 6+ messages in thread
From: Richard Sandiford @ 2020-11-13 12:16 UTC (permalink / raw)
  To: Joel Hutton via Gcc-patches; +Cc: Joel Hutton, Richard Biener

[ There was a discussion on irc about how easy it would be to support
  internal functions and tree codes at the same time, so the agreement
  was to go for tree codes for now with a promise to convert the
  widening-related code to use internal functions for GCC 12. ]

Like Richard said, the new patterns need to be documented in md.texi
and the new tree codes need to be documented in generic.texi.

While we're using tree codes, I think we need to make the naming
consistent with other tree codes: WIDEN_PLUS_EXPR instead of
WIDEN_ADD_EXPR and WIDEN_MINUS_EXPR instead of WIDEN_SUB_EXPR.
Same idea for the VEC_* codes.

(In contrast, the internal functions do try to follow the optab names,
since there's usually a 1:1 correspondence.)

Joel Hutton via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi all,
>
> This patch adds widening add and widening subtract patterns to tree-vect-patterns.
>
> All 3 patches together bootstrapped and regression tested on aarch64.
>
> gcc/ChangeLog:
>
> 2020-11-12  Joel Hutton  <joel.hutton@arm.com>
>
>         * expr.c (expand_expr_real_2): add widen_add,widen_subtract cases

Not that I personally care about this stuff (would love to see changelogs
go away :-)) but some nits:

Each description is supposed to start with a capital letter and end with
a full stop (even if it's not a complete sentence).  Same for the rest
of the log.

>         * optabs-tree.c (optab_for_tree_code): optabs for widening adds,subtracts

The line limit for changelogs is 80 characters.  The entry should say
what changed, so “Handle …” or “Add case for …” or something.

>         * optabs.def (OPTAB_D): define vectorized widen add, subtracts
>         * tree-cfg.c (verify_gimple_assign_binary): Add case for widening adds, subtracts
>         * tree-inline.c (estimate_operator_cost): Add case for widening adds, subtracts
>         * tree-vect-generic.c (expand_vector_operations_1): Add case for widening adds, subtracts
>         * tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog ptatern

typo: pattern

>         (vect_recog_widen_sub_pattern): New recog pattern
>         (vect_recog_average_pattern): Update widened add code
>         (vect_recog_average_pattern): Update widened add code
>         * tree-vect-stmts.c (vectorizable_conversion): Add case for widened add, subtract
>         (supportable_widening_operation): Add case for widened add, subtract
>         * tree.def (WIDEN_ADD_EXPR): New tree code
>         (WIDEN_SUB_EXPR): New tree code
>         (VEC_WIDEN_ADD_HI_EXPR): New tree code
>         (VEC_WIDEN_ADD_LO_EXPR): New tree code
>         (VEC_WIDEN_SUB_HI_EXPR): New tree code
>         (VEC_WIDEN_SUB_LO_EXPR): New tree code
>
> gcc/testsuite/ChangeLog:
>
> 2020-11-12  Joel Hutton  <joel.hutton@arm.com>
>
>         * gcc.target/aarch64/vect-widen-add.c: New test.
>         * gcc.target/aarch64/vect-widen-sub.c: New test.
>
>
> Ok for trunk?
>
> From e0c10ca554729b9e6d58dbd3f18ba72b2c3ee8bc Mon Sep 17 00:00:00 2001
> From: Joel Hutton <joel.hutton@arm.com>
> Date: Mon, 9 Nov 2020 15:44:18 +0000
> Subject: [PATCH 2/3] [vect] Add widening add, subtract patterns
>
> Add widening add, subtract patterns to tree-vect-patterns.
> Add aarch64 tests for patterns.
>
> fix sad

Would be good to expand on this for the final commit message.

> […]
> diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
> index 4dfda756932de1693667c39c6fabed043b20b63b..009dccfa3bd298bca7b3b45401a4cc2acc90ff21 100644
> --- a/gcc/optabs-tree.c
> +++ b/gcc/optabs-tree.c
> @@ -170,6 +170,23 @@ optab_for_tree_code (enum tree_code code, const_tree type,
>        return (TYPE_UNSIGNED (type)
>  	      ? vec_widen_ushiftl_lo_optab : vec_widen_sshiftl_lo_optab);
>  
> +    case VEC_WIDEN_ADD_LO_EXPR:
> +      return (TYPE_UNSIGNED (type)
> +	      ? vec_widen_uaddl_lo_optab  : vec_widen_saddl_lo_optab);
> +
> +    case VEC_WIDEN_ADD_HI_EXPR:
> +      return (TYPE_UNSIGNED (type)
> +	      ? vec_widen_uaddl_hi_optab  : vec_widen_saddl_hi_optab);
> +
> +    case VEC_WIDEN_SUB_LO_EXPR:
> +      return (TYPE_UNSIGNED (type)
> +	      ? vec_widen_usubl_lo_optab  : vec_widen_ssubl_lo_optab);
> +
> +    case VEC_WIDEN_SUB_HI_EXPR:
> +      return (TYPE_UNSIGNED (type)
> +	      ? vec_widen_usubl_hi_optab  : vec_widen_ssubl_hi_optab);
> +
> +

Nits: excess blank line at the end and excess space before the “:”s.

>      case VEC_UNPACK_HI_EXPR:
>        return (TYPE_UNSIGNED (type)
>  	      ? vec_unpacku_hi_optab : vec_unpacks_hi_optab);
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 78409aa14537d259bf90277751aac00d452a0d3f..a97cdb360781ca9c743e2991422c600626c75aa5 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -383,6 +383,14 @@ OPTAB_D (vec_widen_smult_even_optab, "vec_widen_smult_even_$a")
>  OPTAB_D (vec_widen_smult_hi_optab, "vec_widen_smult_hi_$a")
>  OPTAB_D (vec_widen_smult_lo_optab, "vec_widen_smult_lo_$a")
>  OPTAB_D (vec_widen_smult_odd_optab, "vec_widen_smult_odd_$a")
> +OPTAB_D (vec_widen_ssubl_hi_optab, "vec_widen_ssubl_hi_$a")
> +OPTAB_D (vec_widen_ssubl_lo_optab, "vec_widen_ssubl_lo_$a")
> +OPTAB_D (vec_widen_saddl_hi_optab, "vec_widen_saddl_hi_$a")
> +OPTAB_D (vec_widen_saddl_lo_optab, "vec_widen_saddl_lo_$a")
> +OPTAB_D (vec_widen_usubl_hi_optab, "vec_widen_usubl_hi_$a")
> +OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
> +OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
> +OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
>  OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
>  OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
>  OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")

Looks like the current code groups signed stuff together and
unsigned stuff together, so would be good to follow that.

> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..fc6966fd9f257170501247411d50428aaaabbdae
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
> @@ -0,0 +1,90 @@
> +/* { dg-do run } */
> +/* { dg-options "-O3 -save-temps" } */
> +#include <stdint.h>
> +#include <string.h>
> +
> +#define ARR_SIZE 1024
> +
> +/* Should produce an uaddl */
> +void uadd_opt (uint32_t *foo, uint16_t *a, uint16_t *b)
> +{
> +    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
> +    {
> +        foo[i]   = a[i]   + b[i];
> +        foo[i+1] = a[i+1] + b[i+1];
> +        foo[i+2] = a[i+2] + b[i+2];
> +        foo[i+3] = a[i+3] + b[i+3];
> +    }
> +}
> +
> +__attribute__((optimize (0)))
> +void uadd_nonopt (uint32_t *foo, uint16_t *a, uint16_t *b)
> +{
> +    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
> +    {
> +        foo[i]   = a[i]   + b[i];
> +        foo[i+1] = a[i+1] + b[i+1];
> +        foo[i+2] = a[i+2] + b[i+2];
> +        foo[i+3] = a[i+3] + b[i+3];
> +    }
> +}
> +
> +/* Should produce an saddl */
> +void sadd_opt (int32_t *foo, int16_t *a, int16_t *b)
> +{
> +    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
> +    {
> +        foo[i]   = a[i]   + b[i];
> +        foo[i+1] = a[i+1] + b[i+1];
> +        foo[i+2] = a[i+2] + b[i+2];
> +        foo[i+3] = a[i+3] + b[i+3];
> +    }
> +}
> +
> +__attribute__((optimize (0)))
> +void sadd_nonopt (int32_t *foo, int16_t *a, int16_t *b)
> +{
> +    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
> +    {
> +        foo[i]   = a[i]   + b[i];
> +        foo[i+1] = a[i+1] + b[i+1];
> +        foo[i+2] = a[i+2] + b[i+2];
> +        foo[i+3] = a[i+3] + b[i+3];
> +    }
> +}
> +
> +
> +void __attribute__((optimize (0)))
> +init(uint16_t *a, uint16_t *b)
> +{
> +    for( int i = 0; i < ARR_SIZE;i++)
> +    {
> +      a[i] = i;
> +      b[i] = 2*i;
> +    }
> +}
> +
> +int __attribute__((optimize (0)))
> +main()
> +{
> +    uint32_t foo_arr[ARR_SIZE];
> +    uint32_t bar_arr[ARR_SIZE];
> +    uint16_t a[ARR_SIZE];
> +    uint16_t b[ARR_SIZE];
> +
> +    init(a, b);
> +    uadd_opt(foo_arr, a, b);
> +    uadd_nonopt(bar_arr, a, b);
> +    if (memcmp(foo_arr, bar_arr, ARR_SIZE) != 0)
> +      return 1;
> +    sadd_opt((int32_t*) foo_arr, (int16_t*) a, (int16_t*) b);
> +    sadd_nonopt((int32_t*) bar_arr, (int16_t*) a, (int16_t*) b);
> +    if (memcmp(foo_arr, bar_arr, ARR_SIZE) != 0)
> +      return 1;
> +    return 0;
> +}
> +
> +/* { dg-final { scan-assembler-times "uaddl\t" 1} } */
> +/* { dg-final { scan-assembler-times "uaddl2\t" 1} } */
> +/* { dg-final { scan-assembler-times "saddl\t" 1} } */
> +/* { dg-final { scan-assembler-times "saddl2\t" 1} } */

Same comments as the previous patch about having a "+nosve" pragma
and about the scan-assembler-times lines.  Same for the sub test.

> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> index 5139f111fecc7ec6e0902145b808308a5e47450b..532692a5da03d6d9653e2d47a1218982b27c4539 100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -3864,6 +3864,12 @@ verify_gimple_assign_binary (gassign *stmt)
>          return false;
>        }
>  
> +    case VEC_WIDEN_SUB_HI_EXPR:
> +    case VEC_WIDEN_SUB_LO_EXPR:
> +    case VEC_WIDEN_ADD_HI_EXPR:
> +    case VEC_WIDEN_ADD_LO_EXPR:
> +      return false;
> +

I think these should get the same validity checking as
VEC_WIDEN_MULT_HI_EXPR etc.

>      case VEC_WIDEN_LSHIFT_HI_EXPR:
>      case VEC_WIDEN_LSHIFT_LO_EXPR:
>        {
> […]
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index f68a87e05ed54145a25ccff598eeef9e57f9a759..331027444a6d95343eb110f7f9c7db19b40ee5ee 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -1086,8 +1086,10 @@ vect_recog_sad_pattern (vec_info *vinfo,
>       of the above pattern.  */
>  
>    tree plus_oprnd0, plus_oprnd1;
> -  if (!vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
> -				       &plus_oprnd0, &plus_oprnd1))
> +  if (!(vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
> +				       &plus_oprnd0, &plus_oprnd1)
> +	|| vect_reassociating_reduction_p (vinfo, stmt_vinfo, WIDEN_ADD_EXPR,
> +				       &plus_oprnd0, &plus_oprnd1)))
>      return NULL;
>  
>    tree sum_type = gimple_expr_type (last_stmt);

I think we should make:

  /* Any non-truncating sequence of conversions is OK here, since
     with a successful match, the result of the ABS(U) is known to fit
     within the nonnegative range of the result type.  (It cannot be the
     negative of the minimum signed value due to the range of the widening
     MINUS_EXPR.)  */
  vect_unpromoted_value unprom_abs;
  plus_oprnd0 = vect_look_through_possible_promotion (vinfo, plus_oprnd0,
						      &unprom_abs);

specific to the PLUS_EXPR case.  If we look through promotions on
the operands of a WIDEN_ADD_EXPR, we could potentially have a mixture
of signednesses involved, one on the operands of the WIDEN_ADD_EXPR
and one on its inputs.

E.g. if we had:

    uint32_t x;
    int32_t y = (int32_t) x;
    int64_t z = WIDEN_ADD_EXPR <y, …>;

I think the code above would look through the signedness change and
give the SAD_EXPR an unsigned extension instead of a signed one.

(Alternatively, we could try to handle WIDEN_ADD_EXPR with promoted
operands, but I think in practice it would be wasted work.)
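
A C-level sketch of the same situation (purely illustrative, assuming
<stdint.h> types):

    int64_t
    sketch (uint32_t x, int32_t w)
    {
      int32_t y = (int32_t) x;  /* signedness-changing conversion */
      return (int64_t) y + w;   /* 32-bit -> 64-bit widening add  */
    }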

> @@ -1148,7 +1150,7 @@ vect_recog_sad_pattern (vec_info *vinfo,
>    /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
>       inside the loop (in case we are analyzing an outer-loop).  */
>    vect_unpromoted_value unprom[2];
> -  if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, MINUS_EXPR,
> +  if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, WIDEN_SUB_EXPR,
>  			     false, 2, unprom, &half_type))
>      return NULL;
>  
> @@ -1262,6 +1264,24 @@ vect_recog_widen_mult_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
>  				      "vect_recog_widen_mult_pattern");
>  }
>  
> +static gimple *
> +vect_recog_widen_add_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
> +			       tree *type_out)
> +{
> +  return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
> +				      PLUS_EXPR, WIDEN_ADD_EXPR, false,
> +				      "vect_recog_widen_add_pattern");
> +}
> +
> +static gimple *
> +vect_recog_widen_sub_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
> +			       tree *type_out)
> +{
> +  return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
> +				      MINUS_EXPR, WIDEN_SUB_EXPR, false,
> +				      "vect_recog_widen_sub_pattern");
> +}
> +

The new functions should have comments before them.  Can probably
just use the vect_recog_widen_mult_pattern comment as a template.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [2/3][vect] Add widening add, subtract vect patterns
  2020-11-13 12:16 ` Richard Sandiford
@ 2020-11-13 16:48   ` Joel Hutton
  2020-11-16 14:04     ` Richard Biener
  0 siblings, 1 reply; 6+ messages in thread
From: Joel Hutton @ 2020-11-13 16:48 UTC (permalink / raw)
  To: Richard Sandiford, Joel Hutton via Gcc-patches; +Cc: Richard Biener

[-- Attachment #1: Type: text/plain, Size: 6264 bytes --]

Tests are still running, but I believe I've addressed all the comments.

> Like Richard said, the new patterns need to be documented in md.texi
> and the new tree codes need to be documented in generic.texi.

Done.

> While we're using tree codes, I think we need to make the naming
> consistent with other tree codes: WIDEN_PLUS_EXPR instead of
> WIDEN_ADD_EXPR and WIDEN_MINUS_EXPR instead of WIDEN_SUB_EXPR.
> Same idea for the VEC_* codes.

Fixed.

> > gcc/ChangeLog:
> >
> > 2020-11-12  Joel Hutton  <joel.hutton@arm.com>
> >
> >         * expr.c (expand_expr_real_2): add widen_add,widen_subtract cases
> 
> Not that I personally care about this stuff (would love to see changelogs
> go away :-)) but some nits:
> 
> Each description is supposed to start with a capital letter and end with
> a full stop (even if it's not a complete sentence).  Same for the rest

Fixed.

> >         * optabs-tree.c (optab_for_tree_code): optabs for widening adds,subtracts
> 
> The line limit for changelogs is 80 characters.  The entry should say
> what changed, so “Handle …” or “Add case for …” or something.

Fixed.

> >         * tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog ptatern
> 
> typo: pattern

Fixed.

> > Add widening add, subtract patterns to tree-vect-patterns.
> > Add aarch64 tests for patterns.
> >
> > fix sad
> 
> Would be good to expand on this for the final commit message.

'fix sad' was accidentally included when I squashed two commits. I've made all the commit messages more descriptive.

> > +
> > +    case VEC_WIDEN_SUB_HI_EXPR:
> > +      return (TYPE_UNSIGNED (type)
> > +           ? vec_widen_usubl_hi_optab  : vec_widen_ssubl_hi_optab);
> > +
> > +
> 
> Nits: excess blank line at the end and excess space before the “:”s.

Fixed.

> > +OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
> > +OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
> > +OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
> >  OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
> >  OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
> >  OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")
> 
> Looks like the current code groups signed stuff together and
> unsigned stuff together, so would be good to follow that.

Fixed.

> Same comments as the previous patch about having a "+nosve" pragma
> and about the scan-assembler-times lines.  Same for the sub test.

Fixed.

> I am missing documentation in md.texi for the new patterns.  In
> particular I wonder why you need signed and unsigned variants
> for the add/subtract patterns.

Fixed. Signed and unsigned variants because they correspond to signed and
unsigned instructions (uaddl/uaddl2, saddl/saddl2).
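
A rough intrinsics-level illustration of the unsigned pair (for context
only, not part of the patch):

    #include <arm_neon.h>

    /* uaddl widens and adds the low 16-bit halves; uaddl2 the high halves.  */
    uint32x4_t
    add_lo (uint16x4_t a, uint16x4_t b)
    {
      return vaddl_u16 (a, b);
    }

    uint32x4_t
    add_hi (uint16x8_t a, uint16x8_t b)
    {
      return vaddl_high_u16 (a, b);
    }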

> The new functions should have comments before them.  Can probably
> just use the vect_recog_widen_mult_pattern comment as a template.

Fixed.

> > +    case VEC_WIDEN_SUB_HI_EXPR:
> > +    case VEC_WIDEN_SUB_LO_EXPR:
> > +    case VEC_WIDEN_ADD_HI_EXPR:
> > +    case VEC_WIDEN_ADD_LO_EXPR:
> > +      return false;
> > +
>
> I think these should get the same validity checking as
> VEC_WIDEN_MULT_HI_EXPR etc.

Fixed.

> > --- a/gcc/tree-vect-patterns.c
> > +++ b/gcc/tree-vect-patterns.c
> > @@ -1086,8 +1086,10 @@ vect_recog_sad_pattern (vec_info *vinfo,
> >       of the above pattern.  */
> >
> >    tree plus_oprnd0, plus_oprnd1;
> > -  if (!vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
> > -                                    &plus_oprnd0, &plus_oprnd1))
> > +  if (!(vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
> > +                                    &plus_oprnd0, &plus_oprnd1)
> > +     || vect_reassociating_reduction_p (vinfo, stmt_vinfo, WIDEN_ADD_EXPR,
> > +                                    &plus_oprnd0, &plus_oprnd1)))
> >      return NULL;
> >
> >     tree sum_type = gimple_expr_type (last_stmt);
>
> I think we should make:
>
>   /* Any non-truncating sequence of conversions is OK here, since
>      with a successful match, the result of the ABS(U) is known to fit
>      within the nonnegative range of the result type.  (It cannot be the
>      negative of the minimum signed value due to the range of the widening
>      MINUS_EXPR.)  */
>   vect_unpromoted_value unprom_abs;
>   plus_oprnd0 = vect_look_through_possible_promotion (vinfo, plus_oprnd0,
>                                                       &unprom_abs);
>
> specific to the PLUS_EXPR case.  If we look through promotions on
> the operands of a WIDEN_ADD_EXPR, we could potentially have a mixture
> of signednesses involved, one on the operands of the WIDEN_ADD_EXPR
> and one on its inputs.

Fixed.


gcc/ChangeLog:

2020-11-13  Joel Hutton  <joel.hutton@arm.com>

	* expr.c (expand_expr_real_2): Add widen_add, widen_subtract cases.
	* optabs-tree.c (optab_for_tree_code): Add case for widening adds,
	subtracts.
	* optabs.def (OPTAB_D): Define vectorized widen add, subtracts.
	* tree-cfg.c (verify_gimple_assign_binary): Add case for widening adds,
	subtracts.
	* tree-inline.c (estimate_operator_cost): Add case for widening adds,
	subtracts.
	* tree-vect-generic.c (expand_vector_operations_1): Add case for
	widening adds, subtracts.
	* tree-vect-patterns.c (vect_recog_widen_plus_pattern): New recog
	pattern.
	(vect_recog_widen_minus_pattern): New recog pattern.
	(vect_recog_sad_pattern): Update widened subtract code.
	(vect_recog_average_pattern): Update widened add code.
	* tree-vect-stmts.c (vectorizable_conversion): Add case for widened
	add, subtract.
	(supportable_widening_operation): Add case for widened add, subtract.
	* tree.def (WIDEN_PLUS_EXPR): New tree code.
	(WIDEN_MINUS_EXPR): New tree code.
	(VEC_WIDEN_PLUS_HI_EXPR): New tree code.
	(VEC_WIDEN_PLUS_LO_EXPR): New tree code.
	(VEC_WIDEN_MINUS_HI_EXPR): New tree code.
	(VEC_WIDEN_MINUS_LO_EXPR): New tree code.

gcc/testsuite/ChangeLog:

2020-11-13  Joel Hutton  <joel.hutton@arm.com>

        * gcc.target/aarch64/vect-widen-add.c: New test.
        * gcc.target/aarch64/vect-widen-sub.c: New test.

[-- Attachment #2: 0002-vect-Add-widening-add-subtract-patterns.patch --]
[-- Type: text/x-patch; name="0002-vect-Add-widening-add-subtract-patterns.patch", Size: 21630 bytes --]

From d5e20487bbccd69e9b5ac96fef6c9df8710d0cb0 Mon Sep 17 00:00:00 2001
From: Joel Hutton <joel.hutton@arm.com>
Date: Mon, 9 Nov 2020 15:44:18 +0000
Subject: [PATCH 2/3] [vect] Add widening add, subtract patterns

Add widening add, subtract patterns to tree-vect-patterns. Update the
widened code of patterns that detect PLUS_EXPR to also detect
WIDEN_PLUS_EXPR. These patterns take 2 vectors with N elements of size
S and perform an add/subtract on the elements, storing the results as N
elements of size 2*S (in 2 result vectors). This is implemented in the
aarch64 backend as addl,addl2 and subl,subl2 respectively. Add aarch64
tests for patterns.
---
 gcc/doc/generic.texi                          | 31 +++++++
 gcc/doc/md.texi                               | 22 +++++
 gcc/expr.c                                    |  6 ++
 gcc/optabs-tree.c                             | 16 ++++
 gcc/optabs.def                                |  8 ++
 .../gcc.target/aarch64/vect-widen-add.c       | 92 +++++++++++++++++++
 .../gcc.target/aarch64/vect-widen-sub.c       | 92 +++++++++++++++++++
 gcc/tree-cfg.c                                |  6 ++
 gcc/tree-inline.c                             |  6 ++
 gcc/tree-vect-generic.c                       |  4 +
 gcc/tree-vect-patterns.c                      | 31 ++++++-
 gcc/tree-vect-stmts.c                         | 15 ++-
 gcc/tree.def                                  |  6 ++
 13 files changed, 331 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c

diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
index 7373266c69f..3d7d4b0b947 100644
--- a/gcc/doc/generic.texi
+++ b/gcc/doc/generic.texi
@@ -1790,6 +1790,10 @@ a value from @code{enum annot_expr_kind}, the third is an @code{INTEGER_CST}.
 @tindex VEC_RSHIFT_EXPR
 @tindex VEC_WIDEN_MULT_HI_EXPR
 @tindex VEC_WIDEN_MULT_LO_EXPR
+@tindex VEC_WIDEN_PLUS_HI_EXPR
+@tindex VEC_WIDEN_PLUS_LO_EXPR
+@tindex VEC_WIDEN_MINUS_HI_EXPR
+@tindex VEC_WIDEN_MINUS_LO_EXPR
 @tindex VEC_UNPACK_HI_EXPR
 @tindex VEC_UNPACK_LO_EXPR
 @tindex VEC_UNPACK_FLOAT_HI_EXPR
@@ -1836,6 +1840,33 @@ vector of @code{N/2} products. In the case of @code{VEC_WIDEN_MULT_LO_EXPR} the
 low @code{N/2} elements of the two vector are multiplied to produce the
 vector of @code{N/2} products.
 
+@item VEC_WIDEN_PLUS_HI_EXPR
+@itemx VEC_WIDEN_PLUS_LO_EXPR
+These nodes represent widening vector addition of the high and low parts of
+the two input vectors, respectively.  Their operands are vectors that contain
+the same number of elements (@code{N}) of the same integral type. The result
+is a vector that contains half as many elements, of an integral type whose size
+is twice as wide.  In the case of @code{VEC_WIDEN_PLUS_HI_EXPR} the high
+@code{N/2} elements of the two vectors are added to produce the vector of
+@code{N/2} sums.  In the case of @code{VEC_WIDEN_PLUS_LO_EXPR} the low
+@code{N/2} elements of the two vectors are added to produce the vector of
+@code{N/2} sums.
+
+@item VEC_WIDEN_MINUS_HI_EXPR
+@itemx VEC_WIDEN_MINUS_LO_EXPR
+These nodes represent widening vector subtraction of the high and low parts of
+the two input vectors, respectively.  Their operands are vectors that contain
+the same number of elements (@code{N}) of the same integral type. The high/low
+elements of the second vector are subtracted from the high/low elements of the
+first. The result is a vector that contains half as many elements, of an
+integral type whose size is twice as wide.  In the case of
+@code{VEC_WIDEN_MINUS_HI_EXPR} the high @code{N/2} elements of the second
+vector are subtracted from the high @code{N/2} of the first to produce the
+vector of @code{N/2} differences.  In the case of
+@code{VEC_WIDEN_MINUS_LO_EXPR} the low @code{N/2} elements of the second
+vector are subtracted from the low @code{N/2} of the first to produce the
+vector of @code{N/2} differences.
+
 @item VEC_UNPACK_HI_EXPR
 @itemx VEC_UNPACK_LO_EXPR
 These nodes represent unpacking of the high and low parts of the input vector,
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 813875b973b..da8c9a283dd 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5626,6 +5626,28 @@ with N signed/unsigned elements of size S@.  Operand 2 is a constant.  Shift
 the high/low elements of operand 1, and put the N/2 results of size 2*S in the
 output vector (operand 0).
 
+@cindex @code{vec_widen_saddl_hi_@var{m}} instruction pattern
+@cindex @code{vec_widen_saddl_lo_@var{m}} instruction pattern
+@cindex @code{vec_widen_uaddl_hi_@var{m}} instruction pattern
+@cindex @code{vec_widen_uaddl_lo_@var{m}} instruction pattern
+@item @samp{vec_widen_uaddl_hi_@var{m}}, @samp{vec_widen_uaddl_lo_@var{m}}
+@itemx @samp{vec_widen_saddl_hi_@var{m}}, @samp{vec_widen_saddl_lo_@var{m}}
+Signed/Unsigned widening add long.  Operands 1 and 2 are vectors with N
+signed/unsigned elements of size S@.  Add the high/low elements of 1 and 2
+together, widen the resulting elements and put the N/2 results of size 2*S in
+the output vector (operand 0).
+
+@cindex @code{vec_widen_ssubl_hi_@var{m}} instruction pattern
+@cindex @code{vec_widen_ssubl_lo_@var{m}} instruction pattern
+@cindex @code{vec_widen_usubl_hi_@var{m}} instruction pattern
+@cindex @code{vec_widen_usubl_lo_@var{m}} instruction pattern
+@item @samp{vec_widen_usubl_hi_@var{m}}, @samp{vec_widen_usubl_lo_@var{m}}
+@itemx @samp{vec_widen_ssubl_hi_@var{m}}, @samp{vec_widen_ssubl_lo_@var{m}}
+Signed/Unsigned widening subtract long.  Operands 1 and 2 are vectors with N
+signed/unsigned elements of size S@.  Subtract the high/low elements of 2 from
+1 and widen the resulting elements. Put the N/2 results of size 2*S in the
+output vector (operand 0).
+
 @cindex @code{mulhisi3} instruction pattern
 @item @samp{mulhisi3}
 Multiply operands 1 and 2, which have mode @code{HImode}, and store
diff --git a/gcc/expr.c b/gcc/expr.c
index ae16f077758..83aa63c41b5 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9034,6 +9034,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 					  target, unsignedp);
       return target;
 
+    case WIDEN_PLUS_EXPR:
+    case WIDEN_MINUS_EXPR:
     case WIDEN_MULT_EXPR:
       /* If first operand is constant, swap them.
 	 Thus the following special case checks need only
@@ -9754,6 +9756,10 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	return temp;
       }
 
+    case VEC_WIDEN_PLUS_HI_EXPR:
+    case VEC_WIDEN_PLUS_LO_EXPR:
+    case VEC_WIDEN_MINUS_HI_EXPR:
+    case VEC_WIDEN_MINUS_LO_EXPR:
     case VEC_WIDEN_MULT_HI_EXPR:
     case VEC_WIDEN_MULT_LO_EXPR:
     case VEC_WIDEN_MULT_EVEN_EXPR:
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 4dfda756932..b797d018c84 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -170,6 +170,22 @@ optab_for_tree_code (enum tree_code code, const_tree type,
       return (TYPE_UNSIGNED (type)
 	      ? vec_widen_ushiftl_lo_optab : vec_widen_sshiftl_lo_optab);
 
+    case VEC_WIDEN_PLUS_LO_EXPR:
+      return (TYPE_UNSIGNED (type)
+	      ? vec_widen_uaddl_lo_optab : vec_widen_saddl_lo_optab);
+
+    case VEC_WIDEN_PLUS_HI_EXPR:
+      return (TYPE_UNSIGNED (type)
+	      ? vec_widen_uaddl_hi_optab : vec_widen_saddl_hi_optab);
+
+    case VEC_WIDEN_MINUS_LO_EXPR:
+      return (TYPE_UNSIGNED (type)
+	      ? vec_widen_usubl_lo_optab : vec_widen_ssubl_lo_optab);
+
+    case VEC_WIDEN_MINUS_HI_EXPR:
+      return (TYPE_UNSIGNED (type)
+	      ? vec_widen_usubl_hi_optab : vec_widen_ssubl_hi_optab);
+
     case VEC_UNPACK_HI_EXPR:
       return (TYPE_UNSIGNED (type)
 	      ? vec_unpacku_hi_optab : vec_unpacks_hi_optab);
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 78409aa1453..5607f51e6b4 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -383,6 +383,10 @@ OPTAB_D (vec_widen_smult_even_optab, "vec_widen_smult_even_$a")
 OPTAB_D (vec_widen_smult_hi_optab, "vec_widen_smult_hi_$a")
 OPTAB_D (vec_widen_smult_lo_optab, "vec_widen_smult_lo_$a")
 OPTAB_D (vec_widen_smult_odd_optab, "vec_widen_smult_odd_$a")
+OPTAB_D (vec_widen_ssubl_hi_optab, "vec_widen_ssubl_hi_$a")
+OPTAB_D (vec_widen_ssubl_lo_optab, "vec_widen_ssubl_lo_$a")
+OPTAB_D (vec_widen_saddl_hi_optab, "vec_widen_saddl_hi_$a")
+OPTAB_D (vec_widen_saddl_lo_optab, "vec_widen_saddl_lo_$a")
 OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
 OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
 OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")
@@ -391,6 +395,10 @@ OPTAB_D (vec_widen_umult_lo_optab, "vec_widen_umult_lo_$a")
 OPTAB_D (vec_widen_umult_odd_optab, "vec_widen_umult_odd_$a")
 OPTAB_D (vec_widen_ushiftl_hi_optab, "vec_widen_ushiftl_hi_$a")
 OPTAB_D (vec_widen_ushiftl_lo_optab, "vec_widen_ushiftl_lo_$a")
+OPTAB_D (vec_widen_usubl_hi_optab, "vec_widen_usubl_hi_$a")
+OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
+OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
+OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
 
 OPTAB_D (sync_add_optab, "sync_add$I$a")
 OPTAB_D (sync_and_optab, "sync_and$I$a")
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
new file mode 100644
index 00000000000..220bd9352a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
@@ -0,0 +1,92 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -save-temps" } */
+#include <stdint.h>
+#include <string.h>
+
+#pragma GCC target "+nosve"
+
+#define ARR_SIZE 1024
+
+/* Should produce an uaddl */
+void uadd_opt (uint32_t *foo, uint16_t *a, uint16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+    {
+        foo[i]   = a[i]   + b[i];
+        foo[i+1] = a[i+1] + b[i+1];
+        foo[i+2] = a[i+2] + b[i+2];
+        foo[i+3] = a[i+3] + b[i+3];
+    }
+}
+
+__attribute__((optimize (0)))
+void uadd_nonopt (uint32_t *foo, uint16_t *a, uint16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+    {
+        foo[i]   = a[i]   + b[i];
+        foo[i+1] = a[i+1] + b[i+1];
+        foo[i+2] = a[i+2] + b[i+2];
+        foo[i+3] = a[i+3] + b[i+3];
+    }
+}
+
+/* Should produce an saddl */
+void sadd_opt (int32_t *foo, int16_t *a, int16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+    {
+        foo[i]   = a[i]   + b[i];
+        foo[i+1] = a[i+1] + b[i+1];
+        foo[i+2] = a[i+2] + b[i+2];
+        foo[i+3] = a[i+3] + b[i+3];
+    }
+}
+
+__attribute__((optimize (0)))
+void sadd_nonopt (int32_t *foo, int16_t *a, int16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+    {
+        foo[i]   = a[i]   + b[i];
+        foo[i+1] = a[i+1] + b[i+1];
+        foo[i+2] = a[i+2] + b[i+2];
+        foo[i+3] = a[i+3] + b[i+3];
+    }
+}
+
+
+void __attribute__((optimize (0)))
+init(uint16_t *a, uint16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE;i++)
+    {
+      a[i] = i;
+      b[i] = 2*i;
+    }
+}
+
+int __attribute__((optimize (0)))
+main()
+{
+    uint32_t foo_arr[ARR_SIZE];
+    uint32_t bar_arr[ARR_SIZE];
+    uint16_t a[ARR_SIZE];
+    uint16_t b[ARR_SIZE];
+
+    init(a, b);
+    uadd_opt(foo_arr, a, b);
+    uadd_nonopt(bar_arr, a, b);
+    if (memcmp(foo_arr, bar_arr, ARR_SIZE) != 0)
+      return 1;
+    sadd_opt((int32_t*) foo_arr, (int16_t*) a, (int16_t*) b);
+    sadd_nonopt((int32_t*) bar_arr, (int16_t*) a, (int16_t*) b);
+    if (memcmp(foo_arr, bar_arr, ARR_SIZE) != 0)
+      return 1;
+    return 0;
+}
+
+/* { dg-final { scan-assembler-times {\tuaddl\t} 1} } */
+/* { dg-final { scan-assembler-times {\tuaddl2\t} 1} } */
+/* { dg-final { scan-assembler-times {\tsaddl\t} 1} } */
+/* { dg-final { scan-assembler-times {\tsaddl2\t} 1} } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
new file mode 100644
index 00000000000..a2bed63affb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
@@ -0,0 +1,92 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -save-temps" } */
+#include <stdint.h>
+#include <string.h>
+
+#pragma GCC target "+nosve"
+
+#define ARR_SIZE 1024
+
+/* Should produce an usubl */
+void usub_opt (uint32_t *foo, uint16_t *a, uint16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+    {
+        foo[i]   = a[i]   - b[i];
+        foo[i+1] = a[i+1] - b[i+1];
+        foo[i+2] = a[i+2] - b[i+2];
+        foo[i+3] = a[i+3] - b[i+3];
+    }
+}
+
+__attribute__((optimize (0)))
+void usub_nonopt (uint32_t *foo, uint16_t *a, uint16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+    {
+        foo[i]   = a[i]   - b[i];
+        foo[i+1] = a[i+1] - b[i+1];
+        foo[i+2] = a[i+2] - b[i+2];
+        foo[i+3] = a[i+3] - b[i+3];
+    }
+}
+
+/* Should produce an ssubl */
+void ssub_opt (int32_t *foo, int16_t *a, int16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+    {
+        foo[i]   = a[i]   - b[i];
+        foo[i+1] = a[i+1] - b[i+1];
+        foo[i+2] = a[i+2] - b[i+2];
+        foo[i+3] = a[i+3] - b[i+3];
+    }
+}
+
+__attribute__((optimize (0)))
+void ssub_nonopt (int32_t *foo, int16_t *a, int16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+    {
+        foo[i]   = a[i]   - b[i];
+        foo[i+1] = a[i+1] - b[i+1];
+        foo[i+2] = a[i+2] - b[i+2];
+        foo[i+3] = a[i+3] - b[i+3];
+    }
+}
+
+
+void __attribute__((optimize (0)))
+init(uint16_t *a, uint16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE;i++)
+    {
+      a[i] = i;
+      b[i] = 2*i;
+    }
+}
+
+int __attribute__((optimize (0)))
+main()
+{
+    uint32_t foo_arr[ARR_SIZE];
+    uint32_t bar_arr[ARR_SIZE];
+    uint16_t a[ARR_SIZE];
+    uint16_t b[ARR_SIZE];
+
+    init(a, b);
+    usub_opt(foo_arr, a, b);
+    usub_nonopt(bar_arr, a, b);
+    if (memcmp(foo_arr, bar_arr, ARR_SIZE) != 0)
+      return 1;
+    ssub_opt((int32_t*) foo_arr, (int16_t*) a, (int16_t*) b);
+    ssub_nonopt((int32_t*) bar_arr, (int16_t*) a, (int16_t*) b);
+    if (memcmp(foo_arr, bar_arr, ARR_SIZE) != 0)
+      return 1;
+    return 0;
+}
+
+/* { dg-final { scan-assembler-times {\tusubl\t} 1} } */
+/* { dg-final { scan-assembler-times {\tusubl2\t} 1} } */
+/* { dg-final { scan-assembler-times {\tssubl\t} 1} } */
+/* { dg-final { scan-assembler-times {\tssubl2\t} 1} } */
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 5139f111fec..aaf390bda42 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3885,6 +3885,8 @@ verify_gimple_assign_binary (gassign *stmt)
         return false;
       }
 
+    case WIDEN_PLUS_EXPR:
+    case WIDEN_MINUS_EXPR:
     case PLUS_EXPR:
     case MINUS_EXPR:
       {
@@ -4005,6 +4007,10 @@ verify_gimple_assign_binary (gassign *stmt)
         return false;
       }
 
+    case VEC_WIDEN_MINUS_HI_EXPR:
+    case VEC_WIDEN_MINUS_LO_EXPR:
+    case VEC_WIDEN_PLUS_HI_EXPR:
+    case VEC_WIDEN_PLUS_LO_EXPR:
     case VEC_WIDEN_MULT_HI_EXPR:
     case VEC_WIDEN_MULT_LO_EXPR:
     case VEC_WIDEN_MULT_EVEN_EXPR:
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 32424b169c7..d9814bd10d3 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -4224,6 +4224,8 @@ estimate_operator_cost (enum tree_code code, eni_weights *weights,
 
     case REALIGN_LOAD_EXPR:
 
+    case WIDEN_PLUS_EXPR:
+    case WIDEN_MINUS_EXPR:
     case WIDEN_SUM_EXPR:
     case WIDEN_MULT_EXPR:
     case DOT_PROD_EXPR:
@@ -4232,6 +4234,10 @@ estimate_operator_cost (enum tree_code code, eni_weights *weights,
     case WIDEN_MULT_MINUS_EXPR:
     case WIDEN_LSHIFT_EXPR:
 
+    case VEC_WIDEN_PLUS_HI_EXPR:
+    case VEC_WIDEN_PLUS_LO_EXPR:
+    case VEC_WIDEN_MINUS_HI_EXPR:
+    case VEC_WIDEN_MINUS_LO_EXPR:
     case VEC_WIDEN_MULT_HI_EXPR:
     case VEC_WIDEN_MULT_LO_EXPR:
     case VEC_WIDEN_MULT_EVEN_EXPR:
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index d7bafa77134..23bc1cb04b7 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -2118,6 +2118,10 @@ expand_vector_operations_1 (gimple_stmt_iterator *gsi,
      arguments, not the widened result.  VEC_UNPACK_FLOAT_*_EXPR is
      calculated in the same way above.  */
   if (code == WIDEN_SUM_EXPR
+      || code == VEC_WIDEN_PLUS_HI_EXPR
+      || code == VEC_WIDEN_PLUS_LO_EXPR
+      || code == VEC_WIDEN_MINUS_HI_EXPR
+      || code == VEC_WIDEN_MINUS_LO_EXPR
       || code == VEC_WIDEN_MULT_HI_EXPR
       || code == VEC_WIDEN_MULT_LO_EXPR
       || code == VEC_WIDEN_MULT_EVEN_EXPR
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index f68a87e05ed..79b521aa436 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -1148,7 +1148,7 @@ vect_recog_sad_pattern (vec_info *vinfo,
   /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
      inside the loop (in case we are analyzing an outer-loop).  */
   vect_unpromoted_value unprom[2];
-  if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, MINUS_EXPR,
+  if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, WIDEN_MINUS_EXPR,
 			     false, 2, unprom, &half_type))
     return NULL;
 
@@ -1262,6 +1262,29 @@ vect_recog_widen_mult_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
 				      "vect_recog_widen_mult_pattern");
 }
 
+/* Try to detect addition on widened inputs, converting PLUS_EXPR
+   to WIDEN_PLUS_EXPR.  See vect_recog_widen_op_pattern for details.  */
+
+static gimple *
+vect_recog_widen_plus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
+			       tree *type_out)
+{
+  return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
+				      PLUS_EXPR, WIDEN_PLUS_EXPR, false,
+				      "vect_recog_widen_plus_pattern");
+}
+
+/* Try to detect addition on widened inputs, converting SUB_EXPR
+   to WIDEN_MINUS_EXPR.  See vect_recog_widen_op_pattern for details.  */
+static gimple *
+vect_recog_widen_minus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
+			       tree *type_out)
+{
+  return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
+				      MINUS_EXPR, WIDEN_MINUS_EXPR, false,
+				      "vect_recog_widen_minus_pattern");
+}
+
 /* Function vect_recog_pow_pattern
 
    Try to find the following pattern:
@@ -1978,7 +2001,7 @@ vect_recog_average_pattern (vec_info *vinfo,
   vect_unpromoted_value unprom[3];
   tree new_type;
   unsigned int nops = vect_widened_op_tree (vinfo, plus_stmt_info, PLUS_EXPR,
-					    PLUS_EXPR, false, 3,
+					    WIDEN_PLUS_EXPR, false, 3,
 					    unprom, &new_type);
   if (nops == 0)
     return NULL;
@@ -5249,7 +5272,9 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
      of mask conversion that are needed for gather and scatter
      internal functions.  */
   { vect_recog_gather_scatter_pattern, "gather_scatter" },
-  { vect_recog_mask_conversion_pattern, "mask_conversion" }
+  { vect_recog_mask_conversion_pattern, "mask_conversion" },
+  { vect_recog_widen_plus_pattern, "widen_plus" },
+  { vect_recog_widen_minus_pattern, "widen_minus" },
 };
 
 const unsigned int NUM_PATTERNS = ARRAY_SIZE (vect_vect_recog_func_ptrs);
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 2c7a8a70913..25a8474c774 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -4570,6 +4570,8 @@ vectorizable_conversion (vec_info *vinfo,
   if (!CONVERT_EXPR_CODE_P (code)
       && code != FIX_TRUNC_EXPR
       && code != FLOAT_EXPR
+      && code != WIDEN_PLUS_EXPR
+      && code != WIDEN_MINUS_EXPR
       && code != WIDEN_MULT_EXPR
       && code != WIDEN_LSHIFT_EXPR)
     return false;
@@ -4615,7 +4617,8 @@ vectorizable_conversion (vec_info *vinfo,
 
   if (op_type == binary_op)
     {
-      gcc_assert (code == WIDEN_MULT_EXPR || code == WIDEN_LSHIFT_EXPR);
+      gcc_assert (code == WIDEN_MULT_EXPR || code == WIDEN_LSHIFT_EXPR
+		  || code == WIDEN_PLUS_EXPR || code == WIDEN_MINUS_EXPR);
 
       op1 = gimple_assign_rhs2 (stmt);
       tree vectype1_in;
@@ -11534,6 +11537,16 @@ supportable_widening_operation (vec_info *vinfo,
       c2 = VEC_WIDEN_LSHIFT_HI_EXPR;
       break;
 
+    case WIDEN_PLUS_EXPR:
+      c1 = VEC_WIDEN_PLUS_LO_EXPR;
+      c2 = VEC_WIDEN_PLUS_HI_EXPR;
+      break;
+
+    case WIDEN_MINUS_EXPR:
+      c1 = VEC_WIDEN_MINUS_LO_EXPR;
+      c2 = VEC_WIDEN_MINUS_HI_EXPR;
+      break;
+
     CASE_CONVERT:
       c1 = VEC_UNPACK_LO_EXPR;
       c2 = VEC_UNPACK_HI_EXPR;
diff --git a/gcc/tree.def b/gcc/tree.def
index 6c53fe1bf67..ffbe00cf79f 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1359,6 +1359,8 @@ DEFTREECODE (WIDEN_MULT_MINUS_EXPR, "widen_mult_minus_expr", tcc_expression, 3)
    the first argument from type t1 to type t2, and then shifting it
    by the second argument.  */
 DEFTREECODE (WIDEN_LSHIFT_EXPR, "widen_lshift_expr", tcc_binary, 2)
+DEFTREECODE (WIDEN_PLUS_EXPR, "widen_plus_expr", tcc_binary, 2)
+DEFTREECODE (WIDEN_MINUS_EXPR, "widen_minus_expr", tcc_binary, 2)
 
 /* Widening vector multiplication.
    The two operands are vectors with N elements of size S. Multiplying the
@@ -1423,6 +1425,10 @@ DEFTREECODE (VEC_PACK_FLOAT_EXPR, "vec_pack_float_expr", tcc_binary, 2)
  */
 DEFTREECODE (VEC_WIDEN_LSHIFT_HI_EXPR, "widen_lshift_hi_expr", tcc_binary, 2)
 DEFTREECODE (VEC_WIDEN_LSHIFT_LO_EXPR, "widen_lshift_lo_expr", tcc_binary, 2)
+DEFTREECODE (VEC_WIDEN_PLUS_HI_EXPR, "widen_plus_hi_expr", tcc_binary, 2)
+DEFTREECODE (VEC_WIDEN_PLUS_LO_EXPR, "widen_plus_lo_expr", tcc_binary, 2)
+DEFTREECODE (VEC_WIDEN_MINUS_HI_EXPR, "widen_minus_hi_expr", tcc_binary, 2)
+DEFTREECODE (VEC_WIDEN_MINUS_LO_EXPR, "widen_minus_lo_expr", tcc_binary, 2)
 
 /* PREDICT_EXPR.  Specify hint for branch prediction.  The
    PREDICT_EXPR_PREDICTOR specify predictor and PREDICT_EXPR_OUTCOME the
-- 
2.17.1
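
A minimal standalone sketch of the kind of scalar loop the new widen_plus
recognizer is aimed at, mirroring the vect-widen-add.c test above; the
function name and loop shape here are illustrative, not taken from the
patch:

#include <stdint.h>

/* Each uint16_t input is converted to uint32_t before the add, so the
   vectorizer sees a PLUS_EXPR whose operands are promotions of narrower
   values.  With this patch that shape can be matched as WIDEN_PLUS_EXPR
   and expanded through VEC_WIDEN_PLUS_LO_EXPR / VEC_WIDEN_PLUS_HI_EXPR.  */
void
widen_add (uint32_t *restrict out, const uint16_t *restrict a,
           const uint16_t *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = (uint32_t) a[i] + (uint32_t) b[i];
}

On AArch64 the included tests expect those vector codes to end up as
uaddl/uaddl2 (and usubl/usubl2, ssubl/ssubl2 for the minus variants).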


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [2/3][vect] Add widening add, subtract vect patterns
  2020-11-13 16:48   ` Joel Hutton
@ 2020-11-16 14:04     ` Richard Biener
  2020-11-17 13:40       ` Richard Sandiford
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Biener @ 2020-11-16 14:04 UTC (permalink / raw)
  To: Joel Hutton; +Cc: Richard Sandiford, Joel Hutton via Gcc-patches

On Fri, 13 Nov 2020, Joel Hutton wrote:

> Tests are still running, but I believe I've addressed all the comments.
> 
> > Like Richard said, the new patterns need to be documented in md.texi
> > and the new tree codes need to be documented in generic.texi.
> 
> Done.
> 
> > While we're using tree codes, I think we need to make the naming
> > consistent with other tree codes: WIDEN_PLUS_EXPR instead of
> > WIDEN_ADD_EXPR and WIDEN_MINUS_EXPR instead of WIDEN_SUB_EXPR.
> > Same idea for the VEC_* codes.
> 
> Fixed.
> 
> > > gcc/ChangeLog:
> > >
> > > 2020-11-12  Joel Hutton  <joel.hutton@arm.com>
> > >
> > >         * expr.c (expand_expr_real_2): add widen_add,widen_subtract cases
> > 
> > Not that I personally care about this stuff (would love to see changelogs
> > go away :-)) but some nits:
> > 
> > Each description is supposed to start with a capital letter and end with
> > a full stop (even if it's not a complete sentence).  Same for the rest
> 
> Fixed.
> 
> > >         * optabs-tree.c (optab_for_tree_code): optabs for widening adds,subtracts
> > 
> > The line limit for changelogs is 80 characters.  The entry should say
> > what changed, so “Handle …” or “Add case for …” or something.
> 
> Fixed.
> 
> > >         * tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog ptatern
> > 
> > typo: pattern
> 
> Fixed.
> 
> > > Add widening add, subtract patterns to tree-vect-patterns.
> > > Add aarch64 tests for patterns.
> > >
> > > fix sad
> > 
> > Would be good to expand on this for the final commit message.
> 
> 'fix sad' was accidentally included when I squashed two commits. I've made all the commit messages more descriptive.
> 
> > > +
> > > +    case VEC_WIDEN_SUB_HI_EXPR:
> > > +      return (TYPE_UNSIGNED (type)
> > > +           ? vec_widen_usubl_hi_optab  : vec_widen_ssubl_hi_optab);
> > > +
> > > +
> > 
> > Nits: excess blank line at the end and excess space before the “:”s.
> 
> Fixed.
> 
> > > +OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
> > > +OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
> > > +OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
> > >  OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
> > >  OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
> > >  OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")
> > 
> > Looks like the current code groups signed stuff together and
> > unsigned stuff together, so would be good to follow that.
> 
> Fixed.
> 
> > Same comments as the previous patch about having a "+nosve" pragma
> > and about the scan-assembler-times lines.  Same for the sub test.
> 
> Fixed.
> 
> > I am missing documentation in md.texi for the new patterns.  In
> > particular I wonder why you need signed and unsigned variants
> > for the add/subtract patterns.
> 
> Fixed. Signed and unsigned variants because they correspond to signed and
> unsigned instructions, (uaddl/uaddl2, saddl/saddl2).
> 
> > The new functions should have comments before them.  Can probably
> > just use the vect_recog_widen_mult_pattern comment as a template.
> 
> Fixed.
> 
> > > +    case VEC_WIDEN_SUB_HI_EXPR:
> > > +    case VEC_WIDEN_SUB_LO_EXPR:
> > > +    case VEC_WIDEN_ADD_HI_EXPR:
> > > +    case VEC_WIDEN_ADD_LO_EXPR:
> > > +      return false;
> > > +
> >
> > I think these should get the same validity checking as
> > VEC_WIDEN_MULT_HI_EXPR etc.
> 
> Fixed.
> 
> > > --- a/gcc/tree-vect-patterns.c
> > > +++ b/gcc/tree-vect-patterns.c
> > > @@ -1086,8 +1086,10 @@ vect_recog_sad_pattern (vec_info *vinfo,
> > >       of the above pattern.  */
> > >
> > >    tree plus_oprnd0, plus_oprnd1;
> > > -  if (!vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
> > > -                                    &plus_oprnd0, &plus_oprnd1))
> > > +  if (!(vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
> > > +                                    &plus_oprnd0, &plus_oprnd1)
> > > +     || vect_reassociating_reduction_p (vinfo, stmt_vinfo, WIDEN_ADD_EXPR,
> > > +                                    &plus_oprnd0, &plus_oprnd1)))
> > >      return NULL;
> > >
> > >     tree sum_type = gimple_expr_type (last_stmt);
> >
> > I think we should make:
> >
> >   /* Any non-truncating sequence of conversions is OK here, since
> >      with a successful match, the result of the ABS(U) is known to fit
> >      within the nonnegative range of the result type.  (It cannot be the
> >      negative of the minimum signed value due to the range of the widening
> >      MINUS_EXPR.)  */
> >   vect_unpromoted_value unprom_abs;
> >   plus_oprnd0 = vect_look_through_possible_promotion (vinfo, plus_oprnd0,
> >                                                       &unprom_abs);
> >
> > specific to the PLUS_EXPR case.  If we look through promotions on
> > the operands of a WIDEN_ADD_EXPR, we could potentially have a mixture
> > of signednesses involved, one on the operands of the WIDEN_ADD_EXPR
> > and one on its inputs.
> 
> Fixed.

LGTM.

Thanks,
Richard.

> 
> gcc/ChangeLog:
> 
> 2020-11-13  Joel Hutton  <joel.hutton@arm.com>
> 
> 	* expr.c (expand_expr_real_2): Add widen_add,widen_subtract cases.
> 	* optabs-tree.c (optab_for_tree_code): Add case for widening optabs.
> 	  adds, subtracts.
>         * optabs.def (OPTAB_D): Define vectorized widen add, subtracts.
> 	* tree-cfg.c (verify_gimple_assign_binary): Add case for widening adds,
> 	  subtracts.
> 	* tree-inline.c (estimate_operator_cost): Add case for widening adds,
> 	   subtracts.
> 	* tree-vect-generic.c (expand_vector_operations_1): Add case for
> 	  widening adds, subtracts tree-vect-patterns.c
> 	* (vect_recog_widen_add_pattern): New recog pattern.
> 	  (vect_recog_widen_sub_pattern): New recog pattern.
> 	  (vect_recog_average_pattern): Update widened add code.
> 	  (vect_recog_average_pattern): Update widened add code.
> 	* tree-vect-stmts.c (vectorizable_conversion): Add case for widened add,
> 	  subtract.
> 	(supportable_widening_operation): Add case for widened add, subtract.
> 	* tree.def
> 	(WIDEN_PLUS_EXPR): New tree code.
> 	(WIDEN_MINUS_EXPR): New tree code.
> 	(VEC_WIDEN_PLUS_HI_EXPR): New tree code.
> 	(VEC_WIDEN_PLUS_LO_EXPR): New tree code.
> 	(VEC_WIDEN_MINUS_HI_EXPR): New tree code.
> 	(VEC_WIDEN_MINUS_LO_EXPR): New tree code.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-11-13  Joel Hutton  <joel.hutton@arm.com>
> 
>         * gcc.target/aarch64/vect-widen-add.c: New test.
>         * gcc.target/aarch64/vect-widen-sub.c: New test.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer
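
On the signed/unsigned question discussed above, a small standalone
example (not part of the patch, names are illustrative) of why the
widening add cannot be signedness-agnostic: the narrow inputs are
extended to the wide type before the add, and whether that extension is
a sign- or a zero-extension depends on the input type, which is what
separates saddl/saddl2 from uaddl/uaddl2.

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  int16_t  sa = -1, sb = -1;
  uint16_t ua = 0xffff, ub = 0xffff;

  /* Sign-extend the int16_t inputs, then add: -1 + -1 = -2.  */
  int32_t swide = (int32_t) sa + (int32_t) sb;
  /* Zero-extend the uint16_t inputs, then add: 0xffff + 0xffff = 0x1fffe.  */
  uint32_t uwide = (uint32_t) ua + (uint32_t) ub;

  printf ("signed widening add:   %d\n", (int) swide);
  printf ("unsigned widening add: %#x\n", (unsigned) uwide);
  return 0;
}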

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [2/3][vect] Add widening add, subtract vect patterns
  2020-11-16 14:04     ` Richard Biener
@ 2020-11-17 13:40       ` Richard Sandiford
  0 siblings, 0 replies; 6+ messages in thread
From: Richard Sandiford @ 2020-11-17 13:40 UTC (permalink / raw)
  To: Richard Biener; +Cc: Joel Hutton, Joel Hutton via Gcc-patches

Richard Biener <rguenther@suse.de> writes:
> On Fri, 13 Nov 2020, Joel Hutton wrote:
>
>> Tests are still running, but I believe I've addressed all the comments.
>> 
>> > Like Richard said, the new patterns need to be documented in md.texi
>> > and the new tree codes need to be documented in generic.texi.
>> 
>> Done.
>> 
>> > While we're using tree codes, I think we need to make the naming
>> > consistent with other tree codes: WIDEN_PLUS_EXPR instead of
>> > WIDEN_ADD_EXPR and WIDEN_MINUS_EXPR instead of WIDEN_SUB_EXPR.
>> > Same idea for the VEC_* codes.
>> 
>> Fixed.
>> 
>> > > gcc/ChangeLog:
>> > >
>> > > 2020-11-12  Joel Hutton  <joel.hutton@arm.com>
>> > >
>> > >         * expr.c (expand_expr_real_2): add widen_add,widen_subtract cases
>> > 
>> > Not that I personally care about this stuff (would love to see changelogs
>> > go away :-)) but some nits:
>> > 
>> > Each description is supposed to start with a capital letter and end with
>> > a full stop (even if it's not a complete sentence).  Same for the rest
>> 
>> Fixed.
>> 
>> > >         * optabs-tree.c (optab_for_tree_code): optabs for widening adds,subtracts
>> > 
>> > The line limit for changelogs is 80 characters.  The entry should say
>> > what changed, so “Handle …” or “Add case for …” or something.
>> 
>> Fixed.
>> 
>> > >         * tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog ptatern
>> > 
>> > typo: pattern
>> 
>> Fixed.
>> 
>> > > Add widening add, subtract patterns to tree-vect-patterns.
>> > > Add aarch64 tests for patterns.
>> > >
>> > > fix sad
>> > 
>> > Would be good to expand on this for the final commit message.
>> 
>> 'fix sad' was accidentally included when I squashed two commits. I've made all the commit messages more descriptive.
>> 
>> > > +
>> > > +    case VEC_WIDEN_SUB_HI_EXPR:
>> > > +      return (TYPE_UNSIGNED (type)
>> > > +           ? vec_widen_usubl_hi_optab  : vec_widen_ssubl_hi_optab);
>> > > +
>> > > +
>> > 
>> > Nits: excess blank line at the end and excess space before the “:”s.
>> 
>> Fixed.
>> 
>> > > +OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
>> > > +OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
>> > > +OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
>> > >  OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
>> > >  OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
>> > >  OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")
>> > 
>> > Looks like the current code groups signed stuff together and
>> > unsigned stuff together, so would be good to follow that.
>> 
>> Fixed.
>> 
>> > Same comments as the previous patch about having a "+nosve" pragma
>> > and about the scan-assembler-times lines.  Same for the sub test.
>> 
>> Fixed.
>> 
>> > I am missing documentation in md.texi for the new patterns.  In
>> > particular I wonder why you need signed and unsigned variants
>> > for the add/subtract patterns.
>> 
>> Fixed. Signed and unsigned variants because they correspond to signed and
>> unsigned instructions, (uaddl/uaddl2, saddl/saddl2).
>> 
>> > The new functions should have comments before them.  Can probably
>> > just use the vect_recog_widen_mult_pattern comment as a template.
>> 
>> Fixed.
>> 
>> > > +    case VEC_WIDEN_SUB_HI_EXPR:
>> > > +    case VEC_WIDEN_SUB_LO_EXPR:
>> > > +    case VEC_WIDEN_ADD_HI_EXPR:
>> > > +    case VEC_WIDEN_ADD_LO_EXPR:
>> > > +      return false;
>> > > +
>> >
>> > I think these should get the same validity checking as
>> > VEC_WIDEN_MULT_HI_EXPR etc.
>> 
>> Fixed.
>> 
>> > > --- a/gcc/tree-vect-patterns.c
>> > > +++ b/gcc/tree-vect-patterns.c
>> > > @@ -1086,8 +1086,10 @@ vect_recog_sad_pattern (vec_info *vinfo,
>> > >       of the above pattern.  */
>> > >
>> > >    tree plus_oprnd0, plus_oprnd1;
>> > > -  if (!vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
>> > > -                                    &plus_oprnd0, &plus_oprnd1))
>> > > +  if (!(vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
>> > > +                                    &plus_oprnd0, &plus_oprnd1)
>> > > +     || vect_reassociating_reduction_p (vinfo, stmt_vinfo, WIDEN_ADD_EXPR,
>> > > +                                    &plus_oprnd0, &plus_oprnd1)))
>> > >      return NULL;
>> > >
>> > >     tree sum_type = gimple_expr_type (last_stmt);
>> >
>> > I think we should make:
>> >
>> >   /* Any non-truncating sequence of conversions is OK here, since
>> >      with a successful match, the result of the ABS(U) is known to fit
>> >      within the nonnegative range of the result type.  (It cannot be the
>> >      negative of the minimum signed value due to the range of the widening
>> >      MINUS_EXPR.)  */
>> >   vect_unpromoted_value unprom_abs;
>> >   plus_oprnd0 = vect_look_through_possible_promotion (vinfo, plus_oprnd0,
>> >                                                       &unprom_abs);
>> >
>> > specific to the PLUS_EXPR case.  If we look through promotions on
>> > the operands of a WIDEN_ADD_EXPR, we could potentially have a mixture
>> > of signednesses involved, one on the operands of the WIDEN_ADD_EXPR
>> > and one on its inputs.
>> 
>> Fixed.
>
> LGTM.

Same here FWIW, although:

> +/* Try to detect addition on widened inputs, converting SUB_EXPR
> +   to WIDEN_MINUS_EXPR.  See vect_recog_widen_op_pattern for details.  */
> +static gimple *
> +vect_recog_widen_minus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
> +			       tree *type_out)
> +{
> +  return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
> +				      MINUS_EXPR, WIDEN_MINUS_EXPR, false,
> +				      "vect_recog_widen_minus_pattern");
> +}

s/addition/subtraction/ and s/SUB_EXPR/MINUS_EXPR/ in the comment.

Just to be sure, on the changelog:

>> 	* expr.c (expand_expr_real_2): Add widen_add,widen_subtract cases.
>> 	* optabs-tree.c (optab_for_tree_code): Add case for widening optabs.
>> 	  adds, subtracts.

Continuation lines should be indented by a tab only, not a tab and two
spaces.  (Although I agree the above looks nicer than the “correct” way.)

>>         * optabs.def (OPTAB_D): Define vectorized widen add, subtracts.

Should be indented by a tab rather than 8 spaces.

>> 	* tree-cfg.c (verify_gimple_assign_binary): Add case for widening adds,
>> 	  subtracts.
>> 	* tree-inline.c (estimate_operator_cost): Add case for widening adds,
>> 	   subtracts.
>> 	* tree-vect-generic.c (expand_vector_operations_1): Add case for
>> 	  widening adds, subtracts tree-vect-patterns.c
>> 	* (vect_recog_widen_add_pattern): New recog pattern.

Mis-positioned tree-vect-patterns.c, should be

	* tree-vect-generic.c (expand_vector_operations_1): Add case for
	widening adds, subtracts.
	* tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog
	pattern.
	…

No need for another review around over that though, just go ahead and
apply the patch with those changes.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-11-17 13:40 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-12 19:34 [2/3][vect] Add widening add, subtract vect patterns Joel Hutton
2020-11-13  7:58 ` Richard Biener
2020-11-13 12:16 ` Richard Sandiford
2020-11-13 16:48   ` Joel Hutton
2020-11-16 14:04     ` Richard Biener
2020-11-17 13:40       ` Richard Sandiford
