public inbox for gcc-patches@gcc.gnu.org
* [PATCH][RFC] Add x86 subadd SLP pattern
@ 2021-06-17  9:44 Richard Biener
  2021-06-17  9:54 ` Uros Bizjak
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Richard Biener @ 2021-06-17  9:44 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, tamar.christina

This adds SLP pattern recognition for the SSE3/AVX [v]addsubp{ds} v0, v1
instructions which compute { v0[0] - v1[0], v0[1] + v1[1], ... },
thus alternating subtract and add on lanes, starting with subtract.

It adds a corresponding optab and direct internal function,
vec_subadd$a3, and, to keep the i386 backend changes "obvious"
for the moment, duplicates the existing avx_addsubv4df3 pattern
under the new canonical name (CODE_FOR_* and gen_* are used throughout
the intrinsic code, so actually renaming all existing patterns would
be quite a bit bigger a change).  I expect some bike-shedding on
subadd vs. addsub, so I'm delaying that change ;)

ARC seems to have both addsub{V2HI,V2SI,V4HI} and
subadd{V2HI,V2SI,V4HI}.  ARM iwmmxt has wsubaddx on V4HI, but
I didn't dare to decipher whether it's -, + or +, -.  bfin
has both variants as well, plus a saturating variant of both (on v2hi).
But none of the major vector ISAs besides x86 seem to have it.
Still, that some targets have both and opt for the naming I chose
makes it a better fit IMHO.

Uros/Hongtao - would a mass-renaming of the x86 addsub patterns
to subadd be OK?  (not touching the FMA ones, which have both)

The SLP pattern matches the exact alternating lane sequence rather
than trying to be clever and anticipating incoming permutes - we
could permute the two input vectors to the needed lane alternation,
do the subadd and then permute the result vector back but that's
only profitable in case the two input or the output permute will
vanish - something Tamar's refactoring of SLP pattern recog should
make possible (I hope ;)).

So, as said, the patch is incomplete on the x86 backend (and test
coverage) side, but the vectorizer part should be finalized by the
time Tamar posts his work.

Bootstrapped / tested on x86_64-unknown-linux-gnu.

2021-06-17  Richard Biener  <rguenther@suse.de>

	* config/i386/sse.md (vec_subaddv4df3): Duplicate from
	avx_addsubv4df3.
	* internal-fn.def (VEC_SUBADD): New internal optab fn.
	* optabs.def (vec_subadd_optab): New optab.
	* tree-vect-slp-patterns.c (class subadd_pattern): New.
	(slp_patterns): Add subadd_pattern.
	* tree-vectorizer.h (vect_pattern::vect_pattern): Make
	m_ops optional.

	* gcc.target/i386/vect-subadd-1.c: New testcase.
---
 gcc/config/i386/sse.md                        | 16 +++
 gcc/internal-fn.def                           |  1 +
 gcc/optabs.def                                |  1 +
 gcc/testsuite/gcc.target/i386/vect-subadd-1.c | 35 +++++++
 gcc/tree-vect-slp-patterns.c                  | 98 +++++++++++++++++++
 gcc/tree-vectorizer.h                         |  3 +-
 6 files changed, 153 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-subadd-1.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 94296bc773b..73238c9874a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -2396,6 +2396,22 @@
    (set_attr "prefix" "<round_saeonly_scalar_prefix>")
    (set_attr "mode" "<ssescalarmode>")])
 
+/* ???  Rename avx_addsubv4df3, but its name is used throughout the backend,
+   so avoid this churn for the moment.  */
+(define_insn "vec_subaddv4df3"
+  [(set (match_operand:V4DF 0 "register_operand" "=x")
+	(vec_merge:V4DF
+	  (minus:V4DF
+	    (match_operand:V4DF 1 "register_operand" "x")
+	    (match_operand:V4DF 2 "nonimmediate_operand" "xm"))
+	  (plus:V4DF (match_dup 1) (match_dup 2))
+	  (const_int 5)))]
+  "TARGET_AVX"
+  "vaddsubpd\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "sseadd")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "V4DF")])
+
 (define_insn "avx_addsubv4df3"
   [(set (match_operand:V4DF 0 "register_operand" "=x")
 	(vec_merge:V4DF
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index b2f414d2131..a4578a0293e 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -281,6 +281,7 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary)
+DEF_INTERNAL_OPTAB_FN (VEC_SUBADD, ECF_CONST, vec_subadd, binary)
 
 
 /* FP scales.  */
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b192a9d070b..cacbd02cfe1 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -407,6 +407,7 @@ OPTAB_D (vec_widen_usubl_hi_optab, "vec_widen_usubl_hi_$a")
 OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
 OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
 OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
+OPTAB_D (vec_subadd_optab, "vec_subadd$a3")
 
 OPTAB_D (sync_add_optab, "sync_add$I$a")
 OPTAB_D (sync_and_optab, "sync_and$I$a")
diff --git a/gcc/testsuite/gcc.target/i386/vect-subadd-1.c b/gcc/testsuite/gcc.target/i386/vect-subadd-1.c
new file mode 100644
index 00000000000..a79e5ba92e2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-subadd-1.c
@@ -0,0 +1,35 @@
+/* { dg-do run { target avx_runtime } } */
+/* { dg-options "-O3 -mavx2 -fdump-tree-slp2" } */
+
+double x[4], y[4], z[4];
+void __attribute__((noipa)) foo ()
+{
+  x[0] = y[0] - z[0];
+  x[1] = y[1] + z[1];
+  x[2] = y[2] - z[2];
+  x[3] = y[3] + z[3];
+}
+void __attribute__((noipa)) bar ()
+{
+  x[0] = y[0] + z[0];
+  x[1] = y[1] - z[1];
+  x[2] = y[2] + z[2];
+  x[3] = y[3] - z[3];
+}
+int main()
+{
+  for (int i = 0; i < 4; ++i)
+    {
+      y[i] = i + 1;
+      z[i] = 2 * i + 1;
+    }
+  foo ();
+  if (x[0] != 0 || x[1] != 5 || x[2] != -2 || x[3] != 11)
+    __builtin_abort ();
+  bar ();
+  if (x[0] != 2 || x[1] != -1 || x[2] != 8 || x[3] != -3)
+    __builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "SUBADD" 1 "slp2" } } */
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index 2ed49cd9edc..61732c53ad7 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -1490,6 +1490,103 @@ complex_operations_pattern::build (vec_info * /* vinfo */)
   gcc_unreachable ();
 }
 
+
+/* The subadd_pattern.  */
+
+class subadd_pattern : public vect_pattern
+{
+  public:
+    subadd_pattern (slp_tree *node)
+	: vect_pattern (node, NULL, IFN_VEC_SUBADD) {};
+
+    void build (vec_info *);
+
+    static vect_pattern*
+    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+};
+
+vect_pattern *
+subadd_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_)
+{
+  slp_tree node = *node_;
+  if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
+      || SLP_TREE_CHILDREN (node).length () != 2)
+    return NULL;
+
+  /* Match a blend of a plus and a minus op with the same number of plus and
+     minus lanes on the same operands.  */
+  slp_tree sub = SLP_TREE_CHILDREN (node)[0];
+  slp_tree add = SLP_TREE_CHILDREN (node)[1];
+  bool swapped_p = false;
+  if (vect_match_expression_p (sub, PLUS_EXPR))
+    {
+      std::swap (add, sub);
+      swapped_p = true;
+    }
+  if (!(vect_match_expression_p (add, PLUS_EXPR)
+	&& vect_match_expression_p (sub, MINUS_EXPR)))
+    return NULL;
+  if (!((SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[0]
+	 && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[1])
+	|| (SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[1]
+	    && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[0])))
+    return NULL;
+
+  for (unsigned i = 0; i < SLP_TREE_LANE_PERMUTATION (node).length (); ++i)
+    {
+      std::pair<unsigned, unsigned> perm = SLP_TREE_LANE_PERMUTATION (node)[i];
+      if (swapped_p)
+	perm.first = perm.first == 0 ? 1 : 0;
+      /* It has to be alternating -, +, -, ...
+	 While we could permute the .SUBADD inputs and the .SUBADD output,
+	 that's only profitable over the add + sub + blend if at least
+	 one of the permutes is optimized, which we can't determine here.  */
+      if (perm.first != (i & 1)
+	  || perm.second != i)
+	return NULL;
+    }
+
+  if (!vect_pattern_validate_optab (IFN_VEC_SUBADD, node))
+    return NULL;
+
+  return new subadd_pattern (node_);
+}
+
+void
+subadd_pattern::build (vec_info *vinfo)
+{
+  slp_tree node = *m_node;
+
+  slp_tree sub = SLP_TREE_CHILDREN (node)[0];
+  slp_tree add = SLP_TREE_CHILDREN (node)[1];
+  if (vect_match_expression_p (sub, PLUS_EXPR))
+    std::swap (add, sub);
+
+  /* Modify the blend node in-place.  */
+  SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0];
+  SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1];
+  SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
+  SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
+
+  /* Build IFN_VEC_SUBADD from the sub representative operands.  */
+  stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub);
+  gcall *call = gimple_build_call_internal (IFN_VEC_SUBADD, 2,
+					    gimple_assign_rhs1 (rep->stmt),
+					    gimple_assign_rhs2 (rep->stmt));
+  gimple_call_set_lhs (call, make_ssa_name
+			       (TREE_TYPE (gimple_assign_lhs (rep->stmt))));
+  gimple_call_set_nothrow (call, true);
+  gimple_set_bb (call, gimple_bb (rep->stmt));
+  SLP_TREE_REPRESENTATIVE (node) = vinfo->add_pattern_stmt (call, rep);
+  STMT_VINFO_RELEVANT (SLP_TREE_REPRESENTATIVE (node)) = vect_used_in_scope;
+  STMT_SLP_TYPE (SLP_TREE_REPRESENTATIVE (node)) = pure_slp;
+  SLP_TREE_CODE (node) = ERROR_MARK;
+  SLP_TREE_LANE_PERMUTATION (node).release ();
+
+  vect_free_slp_tree (sub);
+  vect_free_slp_tree (add);
+}
+
 /*******************************************************************************
  * Pattern matching definitions
  ******************************************************************************/
@@ -1502,6 +1599,7 @@ vect_pattern_decl_t slp_patterns[]
      overlap in what they can detect.  */
 
   SLP_PATTERN (complex_operations_pattern),
+  SLP_PATTERN (subadd_pattern)
 };
 #undef SLP_PATTERN
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 04c20f8bd0f..b9824623ad9 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2100,7 +2100,8 @@ class vect_pattern
       this->m_ifn = ifn;
       this->m_node = node;
       this->m_ops.create (0);
-      this->m_ops.safe_splice (*m_ops);
+      if (m_ops)
+	this->m_ops.safe_splice (*m_ops);
     }
 
   public:
-- 
2.26.2

