public inbox for gcc-patches@gcc.gnu.org
* Re: [PATCH] Support addsub/subadd as non-isomorphic operations for SLP vectorizer.
@ 2013-11-22 11:57 Uros Bizjak
  2013-11-22 21:33 ` Cong Hou
  0 siblings, 1 reply; 27+ messages in thread
From: Uros Bizjak @ 2013-11-22 11:57 UTC (permalink / raw)
  To: gcc-patches; +Cc: Cong Hou, Richard Biener

Hello!

> In consequence, the ix86_expand_multi_arg_builtin() function tries to
> check two args, but based on the define_expand of xop_vmfrcz<mode>2,
> the content of insn_data[CODE_FOR_xop_vmfrczv4sf2].operand[2] may be
> incorrect (because the pattern only takes one input).

 ;; scalar insns
-(define_expand "xop_vmfrcz<mode>2"
+(define_expand "xop_vmfrcz<mode>3"
   [(set (match_operand:VF_128 0 "register_operand")
        (vec_merge:VF_128
          (unspec:VF_128
           [(match_operand:VF_128 1 "nonimmediate_operand")]
           UNSPEC_FRCZ)
-         (match_dup 3)
+         (match_operand:VF_128 2 "register_operand")
          (const_int 1)))]
   "TARGET_XOP"
 {
-  operands[3] = CONST0_RTX (<MODE>mode);
+  operands[2] = CONST0_RTX (<MODE>mode);
 })

No, just use (match_dup 2) in the RTX pattern in addition to the
operands[2] change. Do not rename patterns.
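
I.e., something along these lines (an untested sketch of the above
suggestion, keeping the original pattern name and single-input arity):

(define_expand "xop_vmfrcz<mode>2"
  [(set (match_operand:VF_128 0 "register_operand")
	(vec_merge:VF_128
	  (unspec:VF_128
	   [(match_operand:VF_128 1 "nonimmediate_operand")]
	   UNSPEC_FRCZ)
	  (match_dup 2)
	  (const_int 1)))]
  "TARGET_XOP"
{
  /* Operand 2 is not matched in the pattern, so the expander still
     takes a single input; it is initialized here instead.  */
  operands[2] = CONST0_RTX (<MODE>mode);
})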

Uros.

* Re: [PATCH] Support addsub/subadd as non-isomorphic operations for SLP vectorizer.
@ 2014-07-10  7:51 Uros Bizjak
  0 siblings, 0 replies; 27+ messages in thread
From: Uros Bizjak @ 2014-07-10  7:51 UTC (permalink / raw)
  To: Cong Hou; +Cc: gcc-patches, Richard Biener, David Li

Hello!

> Ping?

>>>>>>>>> While I added the new define_insn_and_split for vec_merge, a bug is
>>>>>>>>> exposed: in config/i386/sse.md, [ define_expand "xop_vmfrcz<mode>2" ]
>>>>>>>>> only takes one input, but the corresponding builtin functions have two
>>>>>>>>> inputs, which are shown in i386.c:
>>>>>>>>
>>>>>>>>>  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_vmfrczv4sf2,
>>>>>>>>> "__builtin_ia32_vfrczss",     IX86_BUILTIN_VFRCZSS,     UNKNOWN,
>>>>>>>>> (int)MULTI_ARG_2_SF },
>>>>>>>>>  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_vmfrczv2df2,
>>>>>>>>> "__builtin_ia32_vfrczsd",     IX86_BUILTIN_VFRCZSD,     UNKNOWN,
>>>>>>>>> (int)MULTI_ARG_2_DF },

[...]

>>>>>>>> This is PR 56788. Your patch seems strange to me and I don't think it
>>>>>>>> fixes the real issue, but I'll let more knowledgeable people answer.

It is not clear to me what you are pinging. The PR 56788 mentioned
above was fixed some time ago. Please repost the pinged patch to avoid
further confusion.

Uros.

* Re: [PATCH] Support addsub/subadd as non-isomorphic operations for SLP vectorizer.
@ 2013-11-15 10:06 Uros Bizjak
  2013-11-18 21:07 ` Cong Hou
  0 siblings, 1 reply; 27+ messages in thread
From: Uros Bizjak @ 2013-11-15 10:06 UTC (permalink / raw)
  To: gcc-patches; +Cc: Cong Hou, Richard Biener

Hello!

> This patch adds support for two non-isomorphic operations, addsub
> and subadd, to the SLP vectorizer. More non-isomorphic operations can
> be added later, but the limitation is that the operations on even/odd
> elements must still be isomorphic. Once such an operation is detected,
> the opcode to be used in the vectorized code is stored and later used
> during statement transformation. Two new GIMPLE operations,
> VEC_ADDSUB_EXPR and VEC_SUBADD_EXPR, are defined, along with new
> optabs for them. Both are documented.
>
> Target support for SSE/SSE2/SSE3/AVX is added for these two new
> operations on floating-point vectors. SSE3/AVX provide the ADDSUBPD
> and ADDSUBPS instructions; for SSE/SSE2, the two operations are
> emulated with two instructions (selectively negate, then add).

   ;; SSE3
   UNSPEC_LDDQU
+  UNSPEC_SUBADD
+  UNSPEC_ADDSUB

No! Please avoid unspecs.

+(define_expand "vec_subadd_v4sf3"
+  [(set (match_operand:V4SF 0 "register_operand")
+ (unspec:V4SF
+  [(match_operand:V4SF 1 "register_operand")
+   (match_operand:V4SF 2 "nonimmediate_operand")] UNSPEC_SUBADD))]
+  "TARGET_SSE"
+{
+  if (TARGET_SSE3)
+    emit_insn (gen_sse3_addsubv4sf3 (operands[0], operands[1], operands[2]));
+  else
+    ix86_sse_expand_fp_addsub_operator (true, V4SFmode, operands);
+  DONE;
+})

Make the expander pattern look like the corresponding sse3 insn, and:
...
{
  if (!TARGET_SSE3)
    {
      ix86_sse_expand_fp_...();
      DONE;
    }
}
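
For the V4SF case, an untested sketch of that shape, modeled on the
vec_merge form of the sse3 addsub insn instead of an unspec:

(define_expand "vec_subadd_v4sf3"
  [(set (match_operand:V4SF 0 "register_operand")
	(vec_merge:V4SF
	  (minus:V4SF
	    (match_operand:V4SF 1 "register_operand")
	    (match_operand:V4SF 2 "nonimmediate_operand"))
	  (plus:V4SF (match_dup 1) (match_dup 2))
	  ;; Mask 5 (0b0101) selects the even elements from the minus
	  ;; arm: subtract on even elements, add on odd ones.
	  (const_int 5)))]
  "TARGET_SSE"
{
  if (!TARGET_SSE3)
    {
      ix86_sse_expand_fp_addsub_operator (true, V4SFmode, operands);
      DONE;
    }
})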

Uros.

* [PATCH] Support addsub/subadd as non-isomorphic operations for SLP vectorizer.
@ 2013-11-15  8:53 Cong Hou
  2013-11-15 10:02 ` Richard Biener
  2013-11-15 19:25 ` Richard Earnshaw
  0 siblings, 2 replies; 27+ messages in thread
From: Cong Hou @ 2013-11-15  8:53 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener

[-- Attachment #1: Type: text/plain, Size: 22369 bytes --]

Hi

This patch adds support for two non-isomorphic operations, addsub
and subadd, to the SLP vectorizer. More non-isomorphic operations can
be added later, but the limitation is that the operations on even/odd
elements must still be isomorphic. Once such an operation is detected,
the opcode to be used in the vectorized code is stored and later used
during statement transformation. Two new GIMPLE operations,
VEC_ADDSUB_EXPR and VEC_SUBADD_EXPR, are defined, along with new
optabs for them. Both are documented.

Target support for SSE/SSE2/SSE3/AVX is added for these two new
operations on floating-point vectors. SSE3/AVX provide the ADDSUBPD
and ADDSUBPS instructions; for SSE/SSE2, the two operations are
emulated with two instructions (selectively negate, then add).
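
As a scalar C sketch of that emulation strategy (illustrative only, not
the actual expander code; subadd_emulated is a made-up name):

#include <stdint.h>
#include <string.h>

/* Emulate subadd on 4 floats: XOR a sign-bit mask into the even lanes
   of B (negating them), then do a plain add.  */
static void
subadd_emulated (const float a[4], const float b[4], float c[4])
{
  int i;
  for (i = 0; i < 4; i++)
    {
      uint32_t bits;
      float nb;
      memcpy (&bits, &b[i], sizeof bits);   /* view the float as bits */
      if ((i & 1) == 0)
	bits ^= 0x80000000u;                /* negate the even lanes */
      memcpy (&nb, &bits, sizeof nb);
      c[i] = a[i] + nb;                     /* even: a - b;  odd: a + b */
    }
}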

With this patch, the following functions will be SLP vectorized:


float a[4], b[4], c[4];  // double also OK.

void subadd ()
{
  c[0] = a[0] - b[0];
  c[1] = a[1] + b[1];
  c[2] = a[2] - b[2];
  c[3] = a[3] + b[3];
}

void addsub ()
{
  c[0] = a[0] + b[0];
  c[1] = a[1] - b[1];
  c[2] = a[2] + b[2];
  c[3] = a[3] - b[3];
}


Bootstrapped and tested on an x86-64 machine.


thanks,
Cong





[-- Attachment #2: patch-addsub.txt --]
[-- Type: text/plain, Size: 21299 bytes --]

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 2c0554b..656d5fb 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,31 @@
+2013-11-14  Cong Hou  <congh@google.com>
+
+	* tree-vect-slp.c (vect_create_new_slp_node): Initialize
+	SLP_TREE_OP_CODE.
+	(slp_supported_non_isomorphic_op): New function.  Check if the
+	non-isomorphic operation is supported or not.
+	(vect_build_slp_tree_1): Consider non-isomorphic operations.
+	(vect_build_slp_tree): Change argument.
+	* tree-vect-stmts.c (vectorizable_operation): Consider the opcode
+	for non-isomorphic operations.
+	* optabs.def (vec_addsub_optab, vec_subadd_optab): New optabs.
+	* tree.def (VEC_ADDSUB_EXPR, VEC_SUBADD_EXPR): New operations.
+	* expr.c (expand_expr_real_2): Add support to VEC_ADDSUB_EXPR and
+	VEC_SUBADD_EXPR.
+	* gimple-pretty-print.c (dump_binary_rhs): Likewise.
+	* optabs.c (optab_for_tree_code): Likewise.
+	* tree-cfg.c (verify_gimple_assign_binary): Likewise.
+	* tree-vectorizer.h (struct _slp_tree): New data member.
+	* config/i386/i386-protos.h (ix86_sse_expand_fp_addsub_operator):
+	New function.  Expand addsub/subadd operations for SSE/SSE2.
+	* config/i386/i386.c (ix86_sse_expand_fp_addsub_operator): Likewise.
+	* config/i386/sse.md (UNSPEC_SUBADD, UNSPEC_ADDSUB): New unspecs.
+	(vec_subadd_v4sf3, vec_subadd_v2df3, vec_subadd_<mode>3,
+	 vec_addsub_v4sf3, vec_addsub_v2df3, vec_addsub_<mode>3):
+	Expand addsub/subadd operations for SSE/SSE2/SSE3/AVX.
+	* doc/generic.texi (VEC_ADDSUB_EXPR, VEC_SUBADD_EXPR): New doc.
+	* doc/md.texi (vec_addsub_@var{m}3, vec_subadd_@var{m}3): New doc.
+
 2013-11-12  Jeff Law  <law@redhat.com>
 
 	* tree-ssa-threadedge.c (thread_around_empty_blocks): New
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index fdf9d58..b02b757 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -117,6 +117,7 @@ extern rtx ix86_expand_adjust_ufix_to_sfix_si (rtx, rtx *);
 extern enum ix86_fpcmp_strategy ix86_fp_comparison_strategy (enum rtx_code);
 extern void ix86_expand_fp_absneg_operator (enum rtx_code, enum machine_mode,
 					    rtx[]);
+extern void ix86_sse_expand_fp_addsub_operator (bool, enum machine_mode, rtx[]);
 extern void ix86_expand_copysign (rtx []);
 extern void ix86_split_copysign_const (rtx []);
 extern void ix86_split_copysign_var (rtx []);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 5287b49..76f38f5 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -18702,6 +18702,51 @@ ix86_expand_fp_absneg_operator (enum rtx_code code, enum machine_mode mode,
     emit_insn (set);
 }
 
+/* Generate code for addsub or subadd on fp vectors for sse/sse2.  The flag
+   SUBADD indicates if we are generating code for subadd or addsub.  */
+
+void
+ix86_sse_expand_fp_addsub_operator (bool subadd, enum machine_mode mode,
+				    rtx operands[])
+{
+  rtx mask;
+  rtx neg_mask32 = gen_int_mode (0x80000000, SImode);
+  rtx neg_mask64 = GEN_INT ((HOST_WIDE_INT)1 << 63);
+
+  switch (mode)
+    {
+    case V4SFmode:
+      if (subadd)
+	mask = gen_rtx_CONST_VECTOR (V4SImode, gen_rtvec (4,
+		 neg_mask32, const0_rtx, neg_mask32, const0_rtx));
+      else
+	mask = gen_rtx_CONST_VECTOR (V4SImode, gen_rtvec (4,
+		 const0_rtx, neg_mask32, const0_rtx, neg_mask32));
+      break;
+
+    case V2DFmode:
+      if (subadd)
+	mask = gen_rtx_CONST_VECTOR (V2DImode, gen_rtvec (2,
+		 neg_mask64, const0_rtx));
+      else
+	mask = gen_rtx_CONST_VECTOR (V2DImode, gen_rtvec (2,
+		 const0_rtx, neg_mask64));
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  rtx tmp = gen_reg_rtx (mode);
+  convert_move (tmp, mask, false);
+
+  rtx tmp2 = gen_reg_rtx (mode);
+  tmp2 = expand_simple_binop (mode, XOR, tmp, operands[2],
+			      tmp2, 0, OPTAB_DIRECT);
+  expand_simple_binop (mode, PLUS, operands[1], tmp2,
+		       operands[0], 0, OPTAB_DIRECT);
+}
+
 /* Expand a copysign operation.  Special case operand 0 being a constant.  */
 
 void
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 7bb2d77..4369b2e 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -25,6 +25,8 @@
 
   ;; SSE3
   UNSPEC_LDDQU
+  UNSPEC_SUBADD
+  UNSPEC_ADDSUB
 
   ;; SSSE3
   UNSPEC_PSHUFB
@@ -1508,6 +1510,80 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "<MODE>")])
 
+(define_expand "vec_subadd_v4sf3"
+  [(set (match_operand:V4SF 0 "register_operand")
+	(unspec:V4SF
+	  [(match_operand:V4SF 1 "register_operand")
+	   (match_operand:V4SF 2 "nonimmediate_operand")] UNSPEC_SUBADD))]
+  "TARGET_SSE"
+{
+  if (TARGET_SSE3)
+    emit_insn (gen_sse3_addsubv4sf3 (operands[0], operands[1], operands[2]));
+  else
+    ix86_sse_expand_fp_addsub_operator (true, V4SFmode, operands);
+  DONE;
+})
+
+(define_expand "vec_subadd_v2df3"
+  [(set (match_operand:V2DF 0 "register_operand")
+	(unspec:V2DF
+	  [(match_operand:V2DF 1 "register_operand")
+	   (match_operand:V2DF 2 "nonimmediate_operand")] UNSPEC_SUBADD))]
+  "TARGET_SSE2"
+{
+  if (TARGET_SSE3)
+    emit_insn (gen_sse3_addsubv2df3 (operands[0], operands[1], operands[2]));
+  else
+    ix86_sse_expand_fp_addsub_operator (true, V2DFmode, operands);
+  DONE;
+})
+
+(define_expand "vec_subadd_<mode>3"
+  [(set (match_operand:VF_256 0 "register_operand")
+	(unspec:VF_256
+	  [(match_operand:VF_256 1 "register_operand")
+	   (match_operand:VF_256 2 "nonimmediate_operand")] UNSPEC_SUBADD))]
+  "TARGET_AVX"
+{
+  emit_insn (gen_avx_addsub<mode>3 (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "vec_addsub_v4sf3"
+  [(set (match_operand:V4SF 0 "register_operand")
+	(unspec:V4SF
+	  [(match_operand:V4SF 1 "register_operand")
+	   (match_operand:V4SF 2 "nonimmediate_operand")] UNSPEC_ADDSUB))]
+  "TARGET_SSE"
+{
+  ix86_sse_expand_fp_addsub_operator (false, V4SFmode, operands);
+  DONE;
+})
+
+(define_expand "vec_addsub_v2df3"
+  [(set (match_operand:V2DF 0 "register_operand")
+	(unspec:V2DF
+	  [(match_operand:V2DF 1 "register_operand")
+	   (match_operand:V2DF 2 "nonimmediate_operand")] UNSPEC_ADDSUB))]
+  "TARGET_SSE2"
+{
+  ix86_sse_expand_fp_addsub_operator (false, V2DFmode, operands);
+  DONE;
+})
+
+(define_expand "vec_addsub_<mode>3"
+  [(set (match_operand:VF_256 0 "register_operand")
+	(unspec:VF_256
+	  [(match_operand:VF_256 1 "register_operand")
+	   (match_operand:VF_256 2 "nonimmediate_operand")] UNSPEC_ADDSUB))]
+  "TARGET_AVX"
+{
+  rtx tmp = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_neg<mode>2 (tmp, operands[2]));
+  emit_insn (gen_avx_addsub<mode>3 (operands[0], operands[1], tmp));
+  DONE;
+})
+
 (define_insn "avx_addsubv4df3"
   [(set (match_operand:V4DF 0 "register_operand" "=x")
 	(vec_merge:V4DF
diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
index f2dd0ff..0870d6f 100644
--- a/gcc/doc/generic.texi
+++ b/gcc/doc/generic.texi
@@ -1715,6 +1715,8 @@ a value from @code{enum annot_expr_kind}.
 @tindex VEC_PACK_TRUNC_EXPR
 @tindex VEC_PACK_SAT_EXPR
 @tindex VEC_PACK_FIX_TRUNC_EXPR
+@tindex VEC_ADDSUB_EXPR
+@tindex VEC_SUBADD_EXPR
 
 @table @code
 @item VEC_LSHIFT_EXPR
@@ -1795,6 +1797,12 @@ value, it is taken from the second operand. It should never evaluate to
 any other value currently, but optimizations should not rely on that
 property. In contrast with a @code{COND_EXPR}, all operands are always
 evaluated.
+
+@item VEC_ADDSUB_EXPR
+@itemx VEC_SUBADD_EXPR
+These nodes represent add/sub and sub/add operations on the even/odd
+elements of two vectors, respectively.  The two operands and the result
+must be vectors of the same size and number of elements.
 @end table
 
 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 1a06e3d..d9726d2 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4885,6 +4885,12 @@ with N signed/unsigned elements of size S@.  Operand 2 is a constant.  Shift
 the high/low elements of operand 1, and put the N/2 results of size 2*S in the
 output vector (operand 0).
 
+@cindex @code{vec_addsub_@var{m}3} instruction pattern
+@cindex @code{vec_subadd_@var{m}3} instruction pattern
+@item @samp{vec_addsub_@var{m}3}, @samp{vec_subadd_@var{m}3}
+Perform add/sub or sub/add on even/odd elements of two vectors.  Each
+operand is a vector with N elements of size S@.
+
 @cindex @code{mulhisi3} instruction pattern
 @item @samp{mulhisi3}
 Multiply operands 1 and 2, which have mode @code{HImode}, and store
diff --git a/gcc/expr.c b/gcc/expr.c
index 28b4332..997cfe2 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -8743,6 +8743,8 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
     case BIT_AND_EXPR:
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
+    case VEC_ADDSUB_EXPR:
+    case VEC_SUBADD_EXPR:
       goto binop;
 
     case LROTATE_EXPR:
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 6842213..e5c7a93 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -355,6 +355,8 @@ dump_binary_rhs (pretty_printer *buffer, gimple gs, int spc, int flags)
     case VEC_PACK_FIX_TRUNC_EXPR:
     case VEC_WIDEN_LSHIFT_HI_EXPR:
     case VEC_WIDEN_LSHIFT_LO_EXPR:
+    case VEC_ADDSUB_EXPR:
+    case VEC_SUBADD_EXPR:
       for (p = get_tree_code_name (code); *p; p++)
 	pp_character (buffer, TOUPPER (*p));
       pp_string (buffer, " <");
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 164e4dd..a725117 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -547,6 +547,12 @@ optab_for_tree_code (enum tree_code code, const_tree type,
       return TYPE_UNSIGNED (type) ?
 	vec_pack_ufix_trunc_optab : vec_pack_sfix_trunc_optab;
 
+    case VEC_ADDSUB_EXPR:
+      return vec_addsub_optab;
+
+    case VEC_SUBADD_EXPR:
+      return vec_subadd_optab;
+
     default:
       break;
     }
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 6b924ac..3a09c52 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -281,6 +281,8 @@ OPTAB_D (vec_widen_umult_lo_optab, "vec_widen_umult_lo_$a")
 OPTAB_D (vec_widen_umult_odd_optab, "vec_widen_umult_odd_$a")
 OPTAB_D (vec_widen_ushiftl_hi_optab, "vec_widen_ushiftl_hi_$a")
 OPTAB_D (vec_widen_ushiftl_lo_optab, "vec_widen_ushiftl_lo_$a")
+OPTAB_D (vec_addsub_optab, "vec_addsub_$a3")
+OPTAB_D (vec_subadd_optab, "vec_subadd_$a3")
 
 OPTAB_D (sync_add_optab, "sync_add$I$a")
 OPTAB_D (sync_and_optab, "sync_and$I$a")
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 09c7f20..efd6c24 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,10 @@
+2013-11-14  Cong Hou  <congh@google.com>
+
+	* lib/target-supports.exp (check_effective_target_vect_addsub):
+	New target.
+	* gcc.dg/vect/vect-addsub-float.c: New test.
+	* gcc.dg/vect/vect-addsub-double.c: New test.
+
 2013-11-12  Balaji V. Iyer  <balaji.v.iyer@intel.com>
 
 	* gcc.dg/cilk-plus/cilk-plus.exp: Added a check for LTO before running
diff --git a/gcc/testsuite/gcc.dg/vect/vect-addsub-double.c b/gcc/testsuite/gcc.dg/vect/vect-addsub-double.c
new file mode 100644
index 0000000..5399dde
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-addsub-double.c
@@ -0,0 +1,51 @@
+/* { dg-require-effective-target vect_addsub } */
+/* { dg-additional-options "-fdump-tree-slp-details" } */
+
+#include "tree-vect.h"
+
+double a[4], b[4], c[4];
+
+void subadd ()
+{
+  c[0] = a[0] - b[0];
+  c[1] = a[1] + b[1];
+  c[2] = a[2] - b[2];
+  c[3] = a[3] + b[3];
+}
+
+void addsub ()
+{
+  c[0] = a[0] + b[0];
+  c[1] = a[1] - b[1];
+  c[2] = a[2] + b[2];
+  c[3] = a[3] - b[3];
+}
+
+int main()
+{
+  int i;
+  for (i = 0; i < 4; ++i)
+    {
+      a[i] = (i + 1.2) / 3.4;
+      b[i] = (i + 5.6) / 7.8;
+    }
+
+  subadd ();
+
+  if (c[0] != a[0] - b[0]
+      || c[1] != a[1] + b[1]
+      || c[2] != a[2] - b[2]
+      || c[3] != a[3] + b[3])
+    abort ();
+
+  addsub ();
+
+  if (c[0] != a[0] + b[0]
+      || c[1] != a[1] - b[1]
+      || c[2] != a[2] + b[2]
+      || c[3] != a[3] - b[3])
+    abort ();
+}
+
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp" } } */
+/* { dg-final { cleanup-tree-dump "slp" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-addsub-float.c b/gcc/testsuite/gcc.dg/vect/vect-addsub-float.c
new file mode 100644
index 0000000..5b780f3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-addsub-float.c
@@ -0,0 +1,51 @@
+/* { dg-require-effective-target vect_addsub } */
+/* { dg-additional-options "-fdump-tree-slp-details" } */
+
+#include "tree-vect.h"
+
+float a[4], b[4], c[4];
+
+void subadd ()
+{
+  c[0] = a[0] - b[0];
+  c[1] = a[1] + b[1];
+  c[2] = a[2] - b[2];
+  c[3] = a[3] + b[3];
+}
+
+void addsub ()
+{
+  c[0] = a[0] + b[0];
+  c[1] = a[1] - b[1];
+  c[2] = a[2] + b[2];
+  c[3] = a[3] - b[3];
+}
+
+int main()
+{
+  int i;
+  for (i = 0; i < 4; ++i)
+    {
+      a[i] = (i + 1.2) / 3.4;
+      b[i] = (i + 5.6) / 7.8;
+    }
+
+  subadd ();
+
+  if (c[0] != a[0] - b[0]
+      || c[1] != a[1] + b[1]
+      || c[2] != a[2] - b[2]
+      || c[3] != a[3] + b[3])
+    abort ();
+
+  addsub ();
+
+  if (c[0] != a[0] + b[0]
+      || c[1] != a[1] - b[1]
+      || c[2] != a[2] + b[2]
+      || c[3] != a[3] - b[3])
+    abort ();
+}
+
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp" } } */
+/* { dg-final { cleanup-tree-dump "slp" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index c3d9712..f336f77 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4099,6 +4099,15 @@ proc check_effective_target_vect_extract_even_odd { } {
     return $et_vect_extract_even_odd_saved
 }
 
+# Return 1 if the target supports vector addsub and subadd operations, 0 otherwise.
+
+proc check_effective_target_vect_addsub { } {
+    if { [check_effective_target_sse2] } {
+	return 1
+    }
+    return 0
+}
+
 # Return 1 if the target supports vector interleaving, 0 otherwise.
 
 proc check_effective_target_vect_interleave { } {
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 601efd6..2bf1b79 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3572,6 +3572,23 @@ verify_gimple_assign_binary (gimple stmt)
         return false;
       }
 
+    case VEC_SUBADD_EXPR:
+    case VEC_ADDSUB_EXPR:
+      {
+        if (TREE_CODE (rhs1_type) != VECTOR_TYPE
+            || TREE_CODE (rhs2_type) != VECTOR_TYPE
+            || TREE_CODE (lhs_type) != VECTOR_TYPE)
+          {
+            error ("type mismatch in addsub/subadd expression");
+            debug_generic_expr (lhs_type);
+            debug_generic_expr (rhs1_type);
+            debug_generic_expr (rhs2_type);
+            return true;
+          }
+
+        return false;
+      }
+
     case PLUS_EXPR:
     case MINUS_EXPR:
       {
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 825f73a..1169d33 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -125,6 +125,7 @@ vect_create_new_slp_node (vec<gimple> scalar_stmts)
   SLP_TREE_VEC_STMTS (node).create (0);
   SLP_TREE_CHILDREN (node).create (nops);
   SLP_TREE_LOAD_PERMUTATION (node) = vNULL;
+  SLP_TREE_OP_CODE (node) = ERROR_MARK;
 
   return node;
 }
@@ -383,8 +384,74 @@ vect_get_and_check_slp_defs (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo,
   return true;
 }
 
+/* Check if the target supports the vector operation that performs the
+   operation of FIRST_STMT_CODE on even elements and the operation as in STMT
+   on odd elements.  If yes, set the code of NODE to that of the new operation
+   and return true.  Otherwise return false.  This enables SLP vectorization
+   for the following code:
 
-/* Verify if the scalar stmts STMTS are isomorphic, require data
+           a[0] = b[0] + c[0];
+           a[1] = b[1] - c[1];
+           a[2] = b[2] + c[2];
+           a[3] = b[3] - c[3];
+   */
+
+static bool
+slp_supported_non_isomorphic_op (enum tree_code first_stmt_code,
+				 gimple stmt,
+				 slp_tree *node)
+{
+  if (!is_gimple_assign (stmt))
+    return false;
+
+  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
+  enum tree_code vec_opcode = ERROR_MARK;
+
+  switch (first_stmt_code)
+    {
+    case PLUS_EXPR:
+      if (rhs_code == MINUS_EXPR)
+	vec_opcode = VEC_ADDSUB_EXPR;
+      break;
+
+    case MINUS_EXPR:
+      if (rhs_code == PLUS_EXPR)
+	vec_opcode = VEC_SUBADD_EXPR;
+      break;
+
+    default:
+      return false;
+    }
+
+  if (vec_opcode == ERROR_MARK)
+    return false;
+
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  if (!vectype)
+    {
+      vectype = get_vectype_for_scalar_type
+                  (TREE_TYPE (gimple_assign_rhs1 (stmt)));
+      gcc_assert (vectype);
+    }
+
+  optab optab = optab_for_tree_code (vec_opcode, vectype, optab_default);
+  if (!optab)
+    return false;
+
+  int icode = (int) optab_handler (optab, TYPE_MODE (vectype));
+  if (icode == CODE_FOR_nothing)
+    return false;
+
+  if (SLP_TREE_OP_CODE (*node) != ERROR_MARK
+      && SLP_TREE_OP_CODE (*node) != vec_opcode)
+    return false;
+
+  SLP_TREE_OP_CODE (*node) = vec_opcode;
+  return true;
+}
+
+/* Verify if the scalar stmts of NODE are isomorphic, require data
    permutation or are of unsupported types of operation.  Return
    true if they are, otherwise return false and indicate in *MATCHES
    which stmts are not isomorphic to the first one.  If MATCHES[0]
@@ -393,11 +460,12 @@ vect_get_and_check_slp_defs (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo,
 
 static bool
 vect_build_slp_tree_1 (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo,
-		       vec<gimple> stmts, unsigned int group_size,
+                       slp_tree *node, unsigned int group_size,
 		       unsigned nops, unsigned int *max_nunits,
 		       unsigned int vectorization_factor, bool *matches)
 {
   unsigned int i;
+  vec<gimple> stmts = SLP_TREE_SCALAR_STMTS (*node);
   gimple stmt = stmts[0];
   enum tree_code first_stmt_code = ERROR_MARK, rhs_code = ERROR_MARK;
   enum tree_code first_cond_code = ERROR_MARK;
@@ -583,7 +651,10 @@ vect_build_slp_tree_1 (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo,
 	}
       else
 	{
-	  if (first_stmt_code != rhs_code
+	  if ((first_stmt_code != rhs_code
+		 && (i % 2 == 0
+		     || !slp_supported_non_isomorphic_op (first_stmt_code,
+							  stmt, node)))
 	      && (first_stmt_code != IMAGPART_EXPR
 		  || rhs_code != REALPART_EXPR)
 	      && (first_stmt_code != REALPART_EXPR
@@ -868,7 +939,7 @@ vect_build_slp_tree (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo,
     return false;
 
   if (!vect_build_slp_tree_1 (loop_vinfo, bb_vinfo,
-			      SLP_TREE_SCALAR_STMTS (*node), group_size, nops,
+			      node, group_size, nops,
 			      max_nunits, vectorization_factor, matches))
     return false;
 
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index b0e0fa9..98906f0 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -3512,7 +3512,13 @@ vectorizable_operation (gimple stmt, gimple_stmt_iterator *gsi,
   if (TREE_CODE (gimple_assign_lhs (stmt)) != SSA_NAME)
     return false;
 
-  code = gimple_assign_rhs_code (stmt);
+  /* Check if this slp_node will be vectorized by non-isomorphic operations,
+     in which case the operation on vectors is stored in
+     SLP_TREE_OP_CODE (slp_node).  */
+  if (slp_node && SLP_TREE_OP_CODE (slp_node) != ERROR_MARK)
+    code = SLP_TREE_OP_CODE (slp_node);
+  else
+    code = gimple_assign_rhs_code (stmt);
 
   /* For pointer addition, we should use the normal plus for
      the vector addition.  */
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index bbd50e1..19c09ae 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -117,6 +117,10 @@ struct _slp_tree {
      scalar elements in one scalar iteration (GROUP_SIZE) multiplied by VF
      divided by vector size.  */
   unsigned int vec_stmts_size;
+  /* The operation code used in the vectorized statement if it is not
+     ERROR_MARK.  Otherwise the operation is determined by the original
+     statement.  */
+  enum tree_code op_code;
 };
 
 
@@ -157,6 +161,7 @@ typedef struct _slp_instance {
 #define SLP_TREE_VEC_STMTS(S)                    (S)->vec_stmts
 #define SLP_TREE_NUMBER_OF_VEC_STMTS(S)          (S)->vec_stmts_size
 #define SLP_TREE_LOAD_PERMUTATION(S)             (S)->load_permutation
+#define SLP_TREE_OP_CODE(S)                         (S)->op_code
 
 /* This structure is used in creation of an SLP tree.  Each instance
    corresponds to the same operand in a group of scalar stmts in an SLP
diff --git a/gcc/tree.def b/gcc/tree.def
index 6763e78..c3eda42 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1256,6 +1256,13 @@ DEFTREECODE (VEC_PACK_FIX_TRUNC_EXPR, "vec_pack_fix_trunc_expr", tcc_binary, 2)
 DEFTREECODE (VEC_WIDEN_LSHIFT_HI_EXPR, "widen_lshift_hi_expr", tcc_binary, 2)
 DEFTREECODE (VEC_WIDEN_LSHIFT_LO_EXPR, "widen_lshift_lo_expr", tcc_binary, 2)
 
+/* Add the even/odd elements and subtract the odd/even elements of
+   two vectors, respectively.  Operands 0 and 1 are the two input
+   vectors.  The result is a vector of the same type as the
+   operands.  */
+DEFTREECODE (VEC_ADDSUB_EXPR, "addsub_expr", tcc_binary, 2)
+DEFTREECODE (VEC_SUBADD_EXPR, "subadd_expr", tcc_binary, 2)
+
 /* PREDICT_EXPR.  Specify hint for branch prediction.  The
    PREDICT_EXPR_PREDICTOR specify predictor and PREDICT_EXPR_OUTCOME the
    outcome (0 for not taken and 1 for taken).  Once the profile is guessed


end of thread, other threads:[~2014-07-10  7:51 UTC | newest]

Thread overview: 27+ messages
2013-11-22 11:57 [PATCH] Support addsub/subadd as non-isomorphic operations for SLP vectorizer Uros Bizjak
2013-11-22 21:33 ` Cong Hou
  -- strict thread matches above, loose matches on Subject: below --
2014-07-10  7:51 Uros Bizjak
2013-11-15 10:06 Uros Bizjak
2013-11-18 21:07 ` Cong Hou
2013-11-18 21:57   ` Uros Bizjak
2013-11-19  5:13     ` Cong Hou
2013-11-15  8:53 Cong Hou
2013-11-15 10:02 ` Richard Biener
2013-11-18 21:00   ` Cong Hou
2013-11-19 11:22     ` Richard Biener
2013-11-20  5:28       ` Cong Hou
2013-11-20 10:09         ` Richard Biener
2013-11-22  0:33           ` Cong Hou
2013-11-22  3:32             ` Cong Hou
2013-11-22  4:08               ` Marc Glisse
2013-11-22  5:49                 ` Cong Hou
2013-11-22 13:18                   ` Marc Glisse
2013-11-22 21:40                     ` Cong Hou
2013-11-23 18:46                       ` Marc Glisse
2013-12-03  1:02                       ` Cong Hou
2013-12-17 18:05                         ` Cong Hou
2014-07-09  3:23                           ` Xinliang David Li
2014-07-10  4:50                             ` Cong Hou
2013-11-15 19:25 ` Richard Earnshaw
2013-11-18 21:08   ` Cong Hou
2013-11-19 11:45     ` Richard Earnshaw
