From: Richard Biener <rguenther@suse.de>
To: Cong Hou <congh@google.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] Support addsub/subadd as non-isomorphic operations for SLP vectorizer.
Date: Tue, 19 Nov 2013 11:22:00 -0000
Message-ID: <alpine.LNX.2.00.1311191034550.4261@zhemvz.fhfr.qr>
In-Reply-To: <CAK=A3=3TAJNrivikE-NFAOYsyn49PxqkhYMfs0TzFtDqci2f1Q@mail.gmail.com>
On Mon, 18 Nov 2013, Cong Hou wrote:
> I tried your method and it works well for doubles. But for float,
> there is an issue. For the following gimple code:
>
> c1 = a - b;
> c2 = a + b;
> c = VEC_PERM <c1, c2, [0,5,2,7]>
>
> It needs two instructions to implement the VEC_PERM operation in
> SSE2-4, one of which should be shufps, which is represented by the
> following pattern in RTL:
>
>
> (define_insn "sse_shufps_<mode>"
> [(set (match_operand:VI4F_128 0 "register_operand" "=x,x")
> (vec_select:VI4F_128
> (vec_concat:<ssedoublevecmode>
> (match_operand:VI4F_128 1 "register_operand" "0,x")
> (match_operand:VI4F_128 2 "nonimmediate_operand" "xm,xm"))
> (parallel [(match_operand 3 "const_0_to_3_operand")
> (match_operand 4 "const_0_to_3_operand")
> (match_operand 5 "const_4_to_7_operand")
> (match_operand 6 "const_4_to_7_operand")])))]
> ...)
>
> Note that it contains two rtl instructions.
It's a single instruction as far as combine is concerned (RTL
instructions have arbitrary complexity).
> Together with minus, plus,
> and one more shuffling instruction, we have at least five instructions
> for addsub pattern. I think during the combine pass, only four
> instructions are considered to be combined, right? So unless we
> compress those five instructions into four or less, we could not use
> this method for float values.
At the moment addsubv4sf looks like
(define_insn "sse3_addsubv4sf3"
[(set (match_operand:V4SF 0 "register_operand" "=x,x")
(vec_merge:V4SF
(plus:V4SF
(match_operand:V4SF 1 "register_operand" "0,x")
(match_operand:V4SF 2 "nonimmediate_operand" "xm,xm"))
(minus:V4SF (match_dup 1) (match_dup 2))
(const_int 10)))]
To match this it's best to have the VEC_PERM retained as a
vec_merge, and thus support arbitrary(?) vec_merge to aid
combining until reload(?), after which we can split it.
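As an illustrative scalar sketch (not GCC code; the function name is made up), the vec_merge semantics of the sse3_addsubv4sf3 pattern above can be modeled as follows:

```c
#include <assert.h>

/* Scalar model of the pattern above: vec_merge selects element i from
   its first operand (the PLUS) when bit i of the mask is set, else from
   the second operand (the MINUS).  With mask (const_int 10) = 0b1010
   this subtracts in even lanes and adds in odd lanes, i.e. ADDSUBPS.  */
static void
model_addsubv4sf (const float a[4], const float b[4], float r[4])
{
  const unsigned mask = 10;  /* 0b1010 */
  for (int i = 0; i < 4; i++)
    r[i] = ((mask >> i) & 1) ? a[i] + b[i] : a[i] - b[i];
}
```

With mask 10, odd lanes take the PLUS result and even lanes the MINUS result, which matches the add/sub-on-alternating-lanes behavior being discussed.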
> What do you think?
Besides addsub, are there other instructions that can be expressed
similarly? That is, how far should the combiner pattern go?
Richard.
>
>
>
> thanks,
> Cong
>
>
> On Fri, Nov 15, 2013 at 12:53 AM, Richard Biener <rguenther@suse.de> wrote:
> > On Thu, 14 Nov 2013, Cong Hou wrote:
> >
> >> Hi
> >>
> >> This patch adds the support to two non-isomorphic operations addsub
> >> and subadd for SLP vectorizer. More non-isomorphic operations can be
> >> added later, but the limitation is that operations on even/odd
> >> elements should still be isomorphic. Once such an operation is
> >> detected, the code of the operation used in vectorized code is stored
> >> and later will be used during statement transformation. Two new GIMPLE
> >> operations, VEC_ADDSUB_EXPR and VEC_SUBADD_EXPR, are defined, along
> >> with new optabs for them. Both are documented.
> >>
> >> The target supports for SSE/SSE2/SSE3/AVX are added for those two new
> >> operations on floating points. SSE3/AVX provides ADDSUBPD and ADDSUBPS
> >> instructions. For SSE/SSE2, those two operations are emulated using
> >> two instructions (selectively negate then add).
> >>
> >> With this patch the following function will be SLP vectorized:
> >>
> >>
> >> float a[4], b[4], c[4]; // double also OK.
> >>
> >> void subadd ()
> >> {
> >> c[0] = a[0] - b[0];
> >> c[1] = a[1] + b[1];
> >> c[2] = a[2] - b[2];
> >> c[3] = a[3] + b[3];
> >> }
> >>
> >> void addsub ()
> >> {
> >> c[0] = a[0] + b[0];
> >> c[1] = a[1] - b[1];
> >> c[2] = a[2] + b[2];
> >> c[3] = a[3] - b[3];
> >> }
> >>
> >>
> >> Bootstrapped and tested on an x86-64 machine.
> >
> > I managed to do this without adding new tree codes or optabs by
> > vectorizing the above as
> >
> > c1 = a + b;
> > c2 = a - b;
> > c = VEC_PERM <c1, c2, the-proper-mask>
> >
> > which then matches sse3_addsubv4sf3 if you fix that pattern to
> > not use vec_merge (or fix PR56766). Doing it this way also
> > means that the code is vectorizable if you don't have a HW
> > instruction for that but can do the VEC_PERM efficiently.
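As a concrete scalar sketch of this lowering for V4SF (illustrative only; the function name is made up, and the selector [0,5,2,7] with c1 = a - b, c2 = a + b follows the gimple example at the top of the thread):

```c
#include <assert.h>

/* Scalar model of the VEC_PERM lowering: compute both the element-wise
   difference and sum, then select lanes from the 8-element
   concatenation <c1, c2> using the permutation [0,5,2,7].  Even lanes
   come from the MINUS vector, odd lanes from the PLUS vector, giving
   the subadd result.  */
static void
model_subadd_via_perm (const float a[4], const float b[4], float c[4])
{
  float c1[4], c2[4], concat[8];
  static const int sel[4] = { 0, 5, 2, 7 };
  for (int i = 0; i < 4; i++)
    {
      c1[i] = a[i] - b[i];
      c2[i] = a[i] + b[i];
      concat[i] = c1[i];
      concat[i + 4] = c2[i];
    }
  for (int i = 0; i < 4; i++)
    c[i] = concat[sel[i]];
}
```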
> >
> > So, I'd like to avoid new tree codes and optabs whenever possible
> > and here I've already proved (with a patch) that it is possible.
> > Didn't have time to clean it up, and it likely doesn't apply anymore
> > (and PR56766 blocks it but it even has a patch).
> >
> > Btw, this was PR56902 where I attached my patch.
> >
> > Richard.
> >
> >>
> >> thanks,
> >> Cong
> >>
> >>
> >>
> >>
> >>
> >> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> >> index 2c0554b..656d5fb 100644
> >> --- a/gcc/ChangeLog
> >> +++ b/gcc/ChangeLog
> >> @@ -1,3 +1,31 @@
> >> +2013-11-14 Cong Hou <congh@google.com>
> >> +
> >> + * tree-vect-slp.c (vect_create_new_slp_node): Initialize
> >> + SLP_TREE_OP_CODE.
> >> + (slp_supported_non_isomorphic_op): New function. Check if the
> >> + non-isomorphic operation is supported or not.
> >> + (vect_build_slp_tree_1): Consider non-isomorphic operations.
> >> + (vect_build_slp_tree): Change argument.
> >> + * tree-vect-stmts.c (vectorizable_operation): Consider the opcode
> >> + for non-isomorphic operations.
> >> + * optabs.def (vec_addsub_optab, vec_subadd_optab): New optabs.
> >> + * tree.def (VEC_ADDSUB_EXPR, VEC_SUBADD_EXPR): New operations.
> >> + * expr.c (expand_expr_real_2): Add support to VEC_ADDSUB_EXPR and
> >> + VEC_SUBADD_EXPR.
> >> + * gimple-pretty-print.c (dump_binary_rhs): Likewise.
> >> + * optabs.c (optab_for_tree_code): Likewise.
> >> + * tree-cfg.c (verify_gimple_assign_binary): Likewise.
> >> + * tree-vectorizer.h (struct _slp_tree): New data member.
> >> + * config/i386/i386-protos.h (ix86_sse_expand_fp_addsub_operator):
> >> + New function. Expand addsub/subadd operations for SSE2.
> >> + * config/i386/i386.c (ix86_sse_expand_fp_addsub_operator): Likewise.
> >> + * config/i386/sse.md (UNSPEC_SUBADD, UNSPEC_ADDSUB): New RTL operations.
> >> + (vec_subadd_v4sf3, vec_subadd_v2df3, vec_subadd_<mode>3,
> >> + vec_addsub_v4sf3, vec_addsub_v2df3, vec_addsub_<mode>3):
> >> + Expand addsub/subadd operations for SSE/SSE2/SSE3/AVX.
> >> + * doc/generic.texi (VEC_ADDSUB_EXPR, VEC_SUBADD_EXPR): New doc.
> >> + * doc/md.texi (vec_addsub_@var{m}3, vec_subadd_@var{m}3): New doc.
> >> +
> >> 2013-11-12 Jeff Law <law@redhat.com>
> >>
> >> * tree-ssa-threadedge.c (thread_around_empty_blocks): New
> >> diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
> >> index fdf9d58..b02b757 100644
> >> --- a/gcc/config/i386/i386-protos.h
> >> +++ b/gcc/config/i386/i386-protos.h
> >> @@ -117,6 +117,7 @@ extern rtx ix86_expand_adjust_ufix_to_sfix_si (rtx, rtx *);
> >> extern enum ix86_fpcmp_strategy ix86_fp_comparison_strategy (enum rtx_code);
> >> extern void ix86_expand_fp_absneg_operator (enum rtx_code, enum machine_mode,
> >> rtx[]);
> >> +extern void ix86_sse_expand_fp_addsub_operator (bool, enum machine_mode,
> >> +                                                rtx[]);
> >> extern void ix86_expand_copysign (rtx []);
> >> extern void ix86_split_copysign_const (rtx []);
> >> extern void ix86_split_copysign_var (rtx []);
> >> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> >> index 5287b49..76f38f5 100644
> >> --- a/gcc/config/i386/i386.c
> >> +++ b/gcc/config/i386/i386.c
> >> @@ -18702,6 +18702,51 @@ ix86_expand_fp_absneg_operator (enum rtx_code code, enum machine_mode mode,
> >> emit_insn (set);
> >> }
> >>
> >> +/* Generate code for addsub or subadd on fp vectors for sse/sse2. The flag
> >> + SUBADD indicates if we are generating code for subadd or addsub. */
> >> +
> >> +void
> >> +ix86_sse_expand_fp_addsub_operator (bool subadd, enum machine_mode mode,
> >> + rtx operands[])
> >> +{
> >> + rtx mask;
> >> + rtx neg_mask32 = GEN_INT (0x80000000);
> >> + rtx neg_mask64 = GEN_INT ((HOST_WIDE_INT)1 << 63);
> >> +
> >> + switch (mode)
> >> + {
> >> + case V4SFmode:
> >> + if (subadd)
> >> + mask = gen_rtx_CONST_VECTOR (V4SImode, gen_rtvec (4,
> >> + neg_mask32, const0_rtx, neg_mask32, const0_rtx));
> >> + else
> >> + mask = gen_rtx_CONST_VECTOR (V4SImode, gen_rtvec (4,
> >> + const0_rtx, neg_mask32, const0_rtx, neg_mask32));
> >> + break;
> >> +
> >> + case V2DFmode:
> >> + if (subadd)
> >> + mask = gen_rtx_CONST_VECTOR (V2DImode, gen_rtvec (2,
> >> + neg_mask64, const0_rtx));
> >> + else
> >> + mask = gen_rtx_CONST_VECTOR (V2DImode, gen_rtvec (2,
> >> + const0_rtx, neg_mask64));
> >> + break;
> >> +
> >> + default:
> >> + gcc_unreachable ();
> >> + }
> >> +
> >> + rtx tmp = gen_reg_rtx (mode);
> >> + convert_move (tmp, mask, false);
> >> +
> >> + rtx tmp2 = gen_reg_rtx (mode);
> >> + tmp2 = expand_simple_binop (mode, XOR, tmp, operands[2],
> >> + tmp2, 0, OPTAB_DIRECT);
> >> + expand_simple_binop (mode, PLUS, operands[1], tmp2,
> >> + operands[0], 0, OPTAB_DIRECT);
> >> +}
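A scalar model of the emulation above (illustrative only; the function name is made up): XORing the IEEE sign bit of selected lanes of operand 2 negates those lanes, so a single vector add afterwards yields the subadd result.

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Scalar model of the SSE2 subadd emulation: flip the sign bit of the
   even lanes of B (XOR with 0x80000000), then add element-wise.
   Flipping the sign bit of an IEEE float negates it, so even lanes
   compute a[i] + (-b[i]) = a[i] - b[i] while odd lanes add.  */
static void
model_sse2_subadd (const float a[4], const float b[4], float r[4])
{
  static const uint32_t mask[4] = { 0x80000000u, 0, 0x80000000u, 0 };
  for (int i = 0; i < 4; i++)
    {
      uint32_t bits;
      memcpy (&bits, &b[i], sizeof bits);
      bits ^= mask[i];               /* selectively negate */
      float nb;
      memcpy (&nb, &bits, sizeof nb);
      r[i] = a[i] + nb;
    }
}
```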
> >> +
> >> /* Expand a copysign operation. Special case operand 0 being a constant. */
> >>
> >> void
> >> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> >> index 7bb2d77..4369b2e 100644
> >> --- a/gcc/config/i386/sse.md
> >> +++ b/gcc/config/i386/sse.md
> >> @@ -25,6 +25,8 @@
> >>
> >> ;; SSE3
> >> UNSPEC_LDDQU
> >> + UNSPEC_SUBADD
> >> + UNSPEC_ADDSUB
> >>
> >> ;; SSSE3
> >> UNSPEC_PSHUFB
> >> @@ -1508,6 +1510,80 @@
> >> (set_attr "prefix" "orig,vex")
> >> (set_attr "mode" "<MODE>")])
> >>
> >> +(define_expand "vec_subadd_v4sf3"
> >> + [(set (match_operand:V4SF 0 "register_operand")
> >> + (unspec:V4SF
> >> + [(match_operand:V4SF 1 "register_operand")
> >> + (match_operand:V4SF 2 "nonimmediate_operand")] UNSPEC_SUBADD))]
> >> + "TARGET_SSE"
> >> +{
> >> + if (TARGET_SSE3)
> >> + emit_insn (gen_sse3_addsubv4sf3 (operands[0], operands[1], operands[2]));
> >> + else
> >> + ix86_sse_expand_fp_addsub_operator (true, V4SFmode, operands);
> >> + DONE;
> >> +})
> >> +
> >> +(define_expand "vec_subadd_v2df3"
> >> + [(set (match_operand:V2DF 0 "register_operand")
> >> + (unspec:V2DF
> >> + [(match_operand:V2DF 1 "register_operand")
> >> + (match_operand:V2DF 2 "nonimmediate_operand")] UNSPEC_SUBADD))]
> >> + "TARGET_SSE2"
> >> +{
> >> + if (TARGET_SSE3)
> >> + emit_insn (gen_sse3_addsubv2df3 (operands[0], operands[1], operands[2]));
> >> + else
> >> + ix86_sse_expand_fp_addsub_operator (true, V2DFmode, operands);
> >> + DONE;
> >> +})
> >> +
> >> +(define_expand "vec_subadd_<mode>3"
> >> + [(set (match_operand:VF_256 0 "register_operand")
> >> + (unspec:VF_256
> >> + [(match_operand:VF_256 1 "register_operand")
> >> + (match_operand:VF_256 2 "nonimmediate_operand")] UNSPEC_SUBADD))]
> >> + "TARGET_AVX"
> >> +{
> >> + emit_insn (gen_avx_addsub<mode>3 (operands[0], operands[1], operands[2]));
> >> + DONE;
> >> +})
> >> +
> >> +(define_expand "vec_addsub_v4sf3"
> >> + [(set (match_operand:V4SF 0 "register_operand")
> >> + (unspec:V4SF
> >> + [(match_operand:V4SF 1 "register_operand")
> >> + (match_operand:V4SF 2 "nonimmediate_operand")] UNSPEC_ADDSUB))]
> >> + "TARGET_SSE"
> >> +{
> >> + ix86_sse_expand_fp_addsub_operator (false, V4SFmode, operands);
> >> + DONE;
> >> +})
> >> +
> >> +(define_expand "vec_addsub_v2df3"
> >> + [(set (match_operand:V2DF 0 "register_operand")
> >> + (unspec:V2DF
> >> + [(match_operand:V2DF 1 "register_operand")
> >> + (match_operand:V2DF 2 "nonimmediate_operand")] UNSPEC_ADDSUB))]
> >> + "TARGET_SSE2"
> >> +{
> >> + ix86_sse_expand_fp_addsub_operator (false, V2DFmode, operands);
> >> + DONE;
> >> +})
> >> +
> >> +(define_expand "vec_addsub_<mode>3"
> >> + [(set (match_operand:VF_256 0 "register_operand")
> >> + (unspec:VF_256
> >> + [(match_operand:VF_256 1 "register_operand")
> >> + (match_operand:VF_256 2 "nonimmediate_operand")] UNSPEC_ADDSUB))]
> >> + "TARGET_AVX"
> >> +{
> >> + rtx tmp = gen_reg_rtx (<MODE>mode);
> >> + emit_insn (gen_neg<mode>2 (tmp, operands[2]));
> >> + emit_insn (gen_avx_addsub<mode>3 (operands[0], operands[1], tmp));
> >> + DONE;
> >> +})
> >> +
> >> (define_insn "avx_addsubv4df3"
> >> [(set (match_operand:V4DF 0 "register_operand" "=x")
> >> (vec_merge:V4DF
> >> diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
> >> index f2dd0ff..0870d6f 100644
> >> --- a/gcc/doc/generic.texi
> >> +++ b/gcc/doc/generic.texi
> >> @@ -1715,6 +1715,8 @@ a value from @code{enum annot_expr_kind}.
> >> @tindex VEC_PACK_TRUNC_EXPR
> >> @tindex VEC_PACK_SAT_EXPR
> >> @tindex VEC_PACK_FIX_TRUNC_EXPR
> >> +@tindex VEC_ADDSUB_EXPR
> >> +@tindex VEC_SUBADD_EXPR
> >>
> >> @table @code
> >> @item VEC_LSHIFT_EXPR
> >> @@ -1795,6 +1797,12 @@ value, it is taken from the second operand. It should never evaluate to
> >> any other value currently, but optimizations should not rely on that
> >> property. In contrast with a @code{COND_EXPR}, all operands are always
> >> evaluated.
> >> +
> >> +@item VEC_ADDSUB_EXPR
> >> +@itemx VEC_SUBADD_EXPR
> >> +These nodes represent add/sub and sub/add operations on even/odd elements
> >> +of two vectors respectively. The two operands must be vectors of the same
> >> +size and number of elements.
> >> @end table
> >>
> >>
> >> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> >> index 1a06e3d..d9726d2 100644
> >> --- a/gcc/doc/md.texi
> >> +++ b/gcc/doc/md.texi
> >> @@ -4885,6 +4885,12 @@ with N signed/unsigned elements of size S@. Operand 2 is a constant. Shift
> >> the high/low elements of operand 1, and put the N/2 results of size 2*S in the
> >> output vector (operand 0).
> >>
> >> +@cindex @code{vec_addsub_@var{m}3} instruction pattern
> >> +@cindex @code{vec_subadd_@var{m}3} instruction pattern
> >> +@item @samp{vec_addsub_@var{m}3}, @samp{vec_subadd_@var{m}3}
> >> +Perform add/sub or sub/add on even/odd elements of two vectors. Each
> >> +operand is a vector with N elements of size S@.
> >> +
> >> @cindex @code{mulhisi3} instruction pattern
> >> @item @samp{mulhisi3}
> >> Multiply operands 1 and 2, which have mode @code{HImode}, and store
> >> diff --git a/gcc/expr.c b/gcc/expr.c
> >> index 28b4332..997cfe2 100644
> >> --- a/gcc/expr.c
> >> +++ b/gcc/expr.c
> >> @@ -8743,6 +8743,8 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
> >> case BIT_AND_EXPR:
> >> case BIT_IOR_EXPR:
> >> case BIT_XOR_EXPR:
> >> + case VEC_ADDSUB_EXPR:
> >> + case VEC_SUBADD_EXPR:
> >> goto binop;
> >>
> >> case LROTATE_EXPR:
> >> diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
> >> index 6842213..e5c7a93 100644
> >> --- a/gcc/gimple-pretty-print.c
> >> +++ b/gcc/gimple-pretty-print.c
> >> @@ -355,6 +355,8 @@ dump_binary_rhs (pretty_printer *buffer, gimple gs, int spc, int flags)
> >> case VEC_PACK_FIX_TRUNC_EXPR:
> >> case VEC_WIDEN_LSHIFT_HI_EXPR:
> >> case VEC_WIDEN_LSHIFT_LO_EXPR:
> >> + case VEC_ADDSUB_EXPR:
> >> + case VEC_SUBADD_EXPR:
> >> for (p = get_tree_code_name (code); *p; p++)
> >> pp_character (buffer, TOUPPER (*p));
> >> pp_string (buffer, " <");
> >> diff --git a/gcc/optabs.c b/gcc/optabs.c
> >> index 164e4dd..a725117 100644
> >> --- a/gcc/optabs.c
> >> +++ b/gcc/optabs.c
> >> @@ -547,6 +547,12 @@ optab_for_tree_code (enum tree_code code, const_tree type,
> >> return TYPE_UNSIGNED (type) ?
> >> vec_pack_ufix_trunc_optab : vec_pack_sfix_trunc_optab;
> >>
> >> + case VEC_ADDSUB_EXPR:
> >> + return vec_addsub_optab;
> >> +
> >> + case VEC_SUBADD_EXPR:
> >> + return vec_subadd_optab;
> >> +
> >> default:
> >> break;
> >> }
> >> diff --git a/gcc/optabs.def b/gcc/optabs.def
> >> index 6b924ac..3a09c52 100644
> >> --- a/gcc/optabs.def
> >> +++ b/gcc/optabs.def
> >> @@ -281,6 +281,8 @@ OPTAB_D (vec_widen_umult_lo_optab, "vec_widen_umult_lo_$a")
> >> OPTAB_D (vec_widen_umult_odd_optab, "vec_widen_umult_odd_$a")
> >> OPTAB_D (vec_widen_ushiftl_hi_optab, "vec_widen_ushiftl_hi_$a")
> >> OPTAB_D (vec_widen_ushiftl_lo_optab, "vec_widen_ushiftl_lo_$a")
> >> +OPTAB_D (vec_addsub_optab, "vec_addsub_$a3")
> >> +OPTAB_D (vec_subadd_optab, "vec_subadd_$a3")
> >>
> >> OPTAB_D (sync_add_optab, "sync_add$I$a")
> >> OPTAB_D (sync_and_optab, "sync_and$I$a")
> >> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> >> index 09c7f20..efd6c24 100644
> >> --- a/gcc/testsuite/ChangeLog
> >> +++ b/gcc/testsuite/ChangeLog
> >> @@ -1,3 +1,10 @@
> >> +2013-11-14 Cong Hou <congh@google.com>
> >> +
> >> + * lib/target-supports.exp (check_effective_target_vect_addsub):
> >> + New target.
> >> + * gcc.dg/vect/vect-addsub-float.c: New test.
> >> + * gcc.dg/vect/vect-addsub-double.c: New test.
> >> +
> >> 2013-11-12 Balaji V. Iyer <balaji.v.iyer@intel.com>
> >>
> >> * gcc.dg/cilk-plus/cilk-plus.exp: Added a check for LTO before running
> >> diff --git a/gcc/testsuite/gcc.dg/vect/vect-addsub-double.c b/gcc/testsuite/gcc.dg/vect/vect-addsub-double.c
> >> new file mode 100644
> >> index 0000000..5399dde
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/vect/vect-addsub-double.c
> >> @@ -0,0 +1,51 @@
> >> +/* { dg-require-effective-target vect_addsub } */
> >> +/* { dg-additional-options "-fdump-tree-slp-details" } */
> >> +
> >> +#include "tree-vect.h"
> >> +
> >> +double a[4], b[4], c[4];
> >> +
> >> +void subadd ()
> >> +{
> >> + c[0] = a[0] - b[0];
> >> + c[1] = a[1] + b[1];
> >> + c[2] = a[2] - b[2];
> >> + c[3] = a[3] + b[3];
> >> +}
> >> +
> >> +void addsub ()
> >> +{
> >> + c[0] = a[0] + b[0];
> >> + c[1] = a[1] - b[1];
> >> + c[2] = a[2] + b[2];
> >> + c[3] = a[3] - b[3];
> >> +}
> >> +
> >> +int main()
> >> +{
> >> + int i;
> >> + for (i = 0; i < 4; ++i)
> >> + {
> >> + a[i] = (i + 1.2) / 3.4;
> >> + b[i] = (i + 5.6) / 7.8;
> >> + }
> >> +
> >> + subadd ();
> >> +
> >> + if (c[0] != a[0] - b[0]
> >> + || c[1] != a[1] + b[1]
> >> + || c[2] != a[2] - b[2]
> >> + || c[3] != a[3] + b[3])
> >> + abort ();
> >> +
> >> + addsub ();
> >> +
> >> + if (c[0] != a[0] + b[0]
> >> + || c[1] != a[1] - b[1]
> >> + || c[2] != a[2] + b[2]
> >> + || c[3] != a[3] - b[3])
> >> + abort ();
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp" } } */
> >> +/* { dg-final { cleanup-tree-dump "slp" } } */
> >> diff --git a/gcc/testsuite/gcc.dg/vect/vect-addsub-float.c b/gcc/testsuite/gcc.dg/vect/vect-addsub-float.c
> >> new file mode 100644
> >> index 0000000..5b780f3
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/vect/vect-addsub-float.c
> >> @@ -0,0 +1,51 @@
> >> +/* { dg-require-effective-target vect_addsub } */
> >> +/* { dg-additional-options "-fdump-tree-slp-details" } */
> >> +
> >> +#include "tree-vect.h"
> >> +
> >> +float a[4], b[4], c[4];
> >> +
> >> +void subadd ()
> >> +{
> >> + c[0] = a[0] - b[0];
> >> + c[1] = a[1] + b[1];
> >> + c[2] = a[2] - b[2];
> >> + c[3] = a[3] + b[3];
> >> +}
> >> +
> >> +void addsub ()
> >> +{
> >> + c[0] = a[0] + b[0];
> >> + c[1] = a[1] - b[1];
> >> + c[2] = a[2] + b[2];
> >> + c[3] = a[3] - b[3];
> >> +}
> >> +
> >> +int main()
> >> +{
> >> + int i;
> >> + for (i = 0; i < 4; ++i)
> >> + {
> >> + a[i] = (i + 1.2) / 3.4;
> >> + b[i] = (i + 5.6) / 7.8;
> >> + }
> >> +
> >> + subadd ();
> >> +
> >> + if (c[0] != a[0] - b[0]
> >> + || c[1] != a[1] + b[1]
> >> + || c[2] != a[2] - b[2]
> >> + || c[3] != a[3] + b[3])
> >> + abort ();
> >> +
> >> + addsub ();
> >> +
> >> + if (c[0] != a[0] + b[0]
> >> + || c[1] != a[1] - b[1]
> >> + || c[2] != a[2] + b[2]
> >> + || c[3] != a[3] - b[3])
> >> + abort ();
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp" } } */
> >> +/* { dg-final { cleanup-tree-dump "slp" } } */
> >> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> >> index c3d9712..f336f77 100644
> >> --- a/gcc/testsuite/lib/target-supports.exp
> >> +++ b/gcc/testsuite/lib/target-supports.exp
> >> @@ -4099,6 +4099,15 @@ proc check_effective_target_vect_extract_even_odd { } {
> >> return $et_vect_extract_even_odd_saved
> >> }
> >>
> >> +# Return 1 if the target supports vector addsub and subadd
> >> +# operations, 0 otherwise.
> >> +
> >> +proc check_effective_target_vect_addsub { } {
> >> + if { [check_effective_target_sse2] } {
> >> + return 1
> >> + }
> >> + return 0
> >> +}
> >> +
> >> # Return 1 if the target supports vector interleaving, 0 otherwise.
> >>
> >> proc check_effective_target_vect_interleave { } {
> >> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> >> index 601efd6..2bf1b79 100644
> >> --- a/gcc/tree-cfg.c
> >> +++ b/gcc/tree-cfg.c
> >> @@ -3572,6 +3572,23 @@ verify_gimple_assign_binary (gimple stmt)
> >> return false;
> >> }
> >>
> >> + case VEC_SUBADD_EXPR:
> >> + case VEC_ADDSUB_EXPR:
> >> + {
> >> + if (TREE_CODE (rhs1_type) != VECTOR_TYPE
> >> + || TREE_CODE (rhs2_type) != VECTOR_TYPE
> >> + || TREE_CODE (lhs_type) != VECTOR_TYPE)
> >> + {
> >> + error ("type mismatch in addsub/subadd expression");
> >> + debug_generic_expr (lhs_type);
> >> + debug_generic_expr (rhs1_type);
> >> + debug_generic_expr (rhs2_type);
> >> + return true;
> >> + }
> >> +
> >> + return false;
> >> + }
> >> +
> >> case PLUS_EXPR:
> >> case MINUS_EXPR:
> >> {
> >> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> >> index 825f73a..1169d33 100644
> >> --- a/gcc/tree-vect-slp.c
> >> +++ b/gcc/tree-vect-slp.c
> >> @@ -125,6 +125,7 @@ vect_create_new_slp_node (vec<gimple> scalar_stmts)
> >> SLP_TREE_VEC_STMTS (node).create (0);
> >> SLP_TREE_CHILDREN (node).create (nops);
> >> SLP_TREE_LOAD_PERMUTATION (node) = vNULL;
> >> + SLP_TREE_OP_CODE (node) = ERROR_MARK;
> >>
> >> return node;
> >> }
> >> @@ -383,8 +384,74 @@ vect_get_and_check_slp_defs (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo,
> >> return true;
> >> }
> >>
> >> +/* Check if the target supports the vector operation that performs the
> >> + operation of FIRST_STMT_CODE on even elements and the operation as in STMT
> >> + on odd elements. If yes, set the code of NODE to that of the new operation
> >> + and return true. Otherwise return false. This enables SLP vectorization
> >> + for the following code:
> >>
> >> -/* Verify if the scalar stmts STMTS are isomorphic, require data
> >> + a[0] = b[0] + c[0];
> >> + a[1] = b[1] - c[1];
> >> + a[2] = b[2] + c[2];
> >> + a[3] = b[3] - c[3];
> >> + */
> >> +
> >> +static bool
> >> +slp_supported_non_isomorphic_op (enum tree_code first_stmt_code,
> >> + gimple stmt,
> >> + slp_tree *node)
> >> +{
> >> + if (!is_gimple_assign (stmt))
> >> + return false;
> >> +
> >> + enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
> >> + enum tree_code vec_opcode = ERROR_MARK;
> >> +
> >> + switch (first_stmt_code)
> >> + {
> >> + case PLUS_EXPR:
> >> + if (rhs_code == MINUS_EXPR)
> >> + vec_opcode = VEC_ADDSUB_EXPR;
> >> + break;
> >> +
> >> + case MINUS_EXPR:
> >> + if (rhs_code == PLUS_EXPR)
> >> + vec_opcode = VEC_SUBADD_EXPR;
> >> + break;
> >> +
> >> + default:
> >> + return false;
> >> + }
> >> +
> >> + if (vec_opcode == ERROR_MARK)
> >> + return false;
> >> +
> >> + stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> >> + tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> >> + if (!vectype)
> >> + {
> >> + vectype = get_vectype_for_scalar_type
> >> + (TREE_TYPE (gimple_assign_rhs1 (stmt)));
> >> + gcc_assert (vectype);
> >> + }
> >> +
> >> + optab optab = optab_for_tree_code (vec_opcode, vectype, optab_default);
> >> + if (!optab)
> >> + return false;
> >> +
> >> + int icode = (int) optab_handler (optab, TYPE_MODE (vectype));
> >> + if (icode == CODE_FOR_nothing)
> >> + return false;
> >> +
> >> + if (SLP_TREE_OP_CODE (*node) != ERROR_MARK
> >> + && SLP_TREE_OP_CODE (*node) != vec_opcode)
> >> + return false;
> >> +
> >> + SLP_TREE_OP_CODE (*node) = vec_opcode;
> >> + return true;
> >> +}
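The decision this function makes can be sketched in a simplified form (hypothetical enums and names, not the GCC ones): given the opcode seen on even lanes and the one seen on odd lanes, pick the combined non-isomorphic vector opcode, if any.

```c
#include <assert.h>

/* Simplified sketch of slp_supported_non_isomorphic_op's opcode
   pairing (target-support checks omitted).  */
enum op { OP_PLUS, OP_MINUS, OP_OTHER };
enum vec_op { VEC_NONE, VEC_ADDSUB, VEC_SUBADD };

static enum vec_op
combined_opcode (enum op even, enum op odd)
{
  if (even == OP_PLUS && odd == OP_MINUS)
    return VEC_ADDSUB;   /* add even lanes, subtract odd lanes */
  if (even == OP_MINUS && odd == OP_PLUS)
    return VEC_SUBADD;   /* subtract even lanes, add odd lanes */
  return VEC_NONE;       /* not a supported non-isomorphic pair */
}
```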
> >> +
> >> +/* Verify if the scalar stmts of NODE are isomorphic, require data
> >> permutation or are of unsupported types of operation. Return
> >> true if they are, otherwise return false and indicate in *MATCHES
> >> which stmts are not isomorphic to the first one. If MATCHES[0]
> >> @@ -393,11 +460,12 @@ vect_get_and_check_slp_defs (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo,
> >>
> >> static bool
> >> vect_build_slp_tree_1 (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo,
> >> - vec<gimple> stmts, unsigned int group_size,
> >> + slp_tree *node, unsigned int group_size,
> >> unsigned nops, unsigned int *max_nunits,
> >> unsigned int vectorization_factor, bool *matches)
> >> {
> >> unsigned int i;
> >> + vec<gimple> stmts = SLP_TREE_SCALAR_STMTS (*node);
> >> gimple stmt = stmts[0];
> >> enum tree_code first_stmt_code = ERROR_MARK, rhs_code = ERROR_MARK;
> >> enum tree_code first_cond_code = ERROR_MARK;
> >> @@ -583,7 +651,10 @@ vect_build_slp_tree_1 (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo,
> >> }
> >> else
> >> {
> >> - if (first_stmt_code != rhs_code
> >> + if ((first_stmt_code != rhs_code
> >> + && (i % 2 == 0
> >> + || !slp_supported_non_isomorphic_op (first_stmt_code,
> >> + stmt, node)))
> >> && (first_stmt_code != IMAGPART_EXPR
> >> || rhs_code != REALPART_EXPR)
> >> && (first_stmt_code != REALPART_EXPR
> >> @@ -868,7 +939,7 @@ vect_build_slp_tree (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo,
> >> return false;
> >>
> >> if (!vect_build_slp_tree_1 (loop_vinfo, bb_vinfo,
> >> - SLP_TREE_SCALAR_STMTS (*node), group_size, nops,
> >> + node, group_size, nops,
> >> max_nunits, vectorization_factor, matches))
> >> return false;
> >>
> >> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> >> index b0e0fa9..98906f0 100644
> >> --- a/gcc/tree-vect-stmts.c
> >> +++ b/gcc/tree-vect-stmts.c
> >> @@ -3512,7 +3512,13 @@ vectorizable_operation (gimple stmt, gimple_stmt_iterator *gsi,
> >> if (TREE_CODE (gimple_assign_lhs (stmt)) != SSA_NAME)
> >> return false;
> >>
> >> - code = gimple_assign_rhs_code (stmt);
> >> + /* Check if this slp_node will be vectorized by non-isomorphic operations,
> >> + in which case the operation on vectors is stored in
> >> + SLP_TREE_OP_CODE (slp_node). */
> >> + if (slp_node && SLP_TREE_OP_CODE (slp_node) != ERROR_MARK)
> >> + code = SLP_TREE_OP_CODE (slp_node);
> >> + else
> >> + code = gimple_assign_rhs_code (stmt);
> >>
> >> /* For pointer addition, we should use the normal plus for
> >> the vector addition. */
> >> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> >> index bbd50e1..19c09ae 100644
> >> --- a/gcc/tree-vectorizer.h
> >> +++ b/gcc/tree-vectorizer.h
> >> @@ -117,6 +117,10 @@ struct _slp_tree {
> >> scalar elements in one scalar iteration (GROUP_SIZE) multiplied by VF
> >> divided by vector size. */
> >> unsigned int vec_stmts_size;
> >> + /* The operation code used in the vectorized statement if it is not
> >> + ERROR_MARK. Otherwise the operation is determined by the original
> >> + statement. */
> >> + enum tree_code op_code;
> >> };
> >>
> >>
> >> @@ -157,6 +161,7 @@ typedef struct _slp_instance {
> >> #define SLP_TREE_VEC_STMTS(S) (S)->vec_stmts
> >> #define SLP_TREE_NUMBER_OF_VEC_STMTS(S) (S)->vec_stmts_size
> >> #define SLP_TREE_LOAD_PERMUTATION(S) (S)->load_permutation
> >> +#define SLP_TREE_OP_CODE(S) (S)->op_code
> >>
> >> /* This structure is used in creation of an SLP tree. Each instance
> >> corresponds to the same operand in a group of scalar stmts in an SLP
> >> diff --git a/gcc/tree.def b/gcc/tree.def
> >> index 6763e78..c3eda42 100644
> >> --- a/gcc/tree.def
> >> +++ b/gcc/tree.def
> >> @@ -1256,6 +1256,13 @@ DEFTREECODE (VEC_PACK_FIX_TRUNC_EXPR, "vec_pack_fix_trunc_expr", tcc_binary, 2)
> >> DEFTREECODE (VEC_WIDEN_LSHIFT_HI_EXPR, "widen_lshift_hi_expr", tcc_binary, 2)
> >> DEFTREECODE (VEC_WIDEN_LSHIFT_LO_EXPR, "widen_lshift_lo_expr", tcc_binary, 2)
> >>
> >> +/* Add even/odd elements and subtract odd/even elements of two vectors.
> >> + Operands 0 and 1 are the two input vectors.
> >> + The result of this operation is a vector with the same type as the
> >> + operands. */
> >> +DEFTREECODE (VEC_ADDSUB_EXPR, "addsub_expr", tcc_binary, 2)
> >> +DEFTREECODE (VEC_SUBADD_EXPR, "subadd_expr", tcc_binary, 2)
> >> +
> >> /* PREDICT_EXPR. Specify hint for branch prediction. The
> >> PREDICT_EXPR_PREDICTOR specify predictor and PREDICT_EXPR_OUTCOME the
> >> outcome (0 for not taken and 1 for taken). Once the profile is guessed
> >>
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE / SUSE Labs
> > SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
> > GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
>
>
--
Richard Biener <rguenther@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
Thread overview: 27+ messages
2013-11-15 8:53 Cong Hou
2013-11-15 10:02 ` Richard Biener
2013-11-18 21:00 ` Cong Hou
2013-11-19 11:22 ` Richard Biener [this message]
2013-11-20 5:28 ` Cong Hou
2013-11-20 10:09 ` Richard Biener
2013-11-22 0:33 ` Cong Hou
2013-11-22 3:32 ` Cong Hou
2013-11-22 4:08 ` Marc Glisse
2013-11-22 5:49 ` Cong Hou
2013-11-22 13:18 ` Marc Glisse
2013-11-22 21:40 ` Cong Hou
2013-11-23 18:46 ` Marc Glisse
2013-12-03 1:02 ` Cong Hou
2013-12-17 18:05 ` Cong Hou
2014-07-09 3:23 ` Xinliang David Li
2014-07-10 4:50 ` Cong Hou
2013-11-15 19:25 ` Richard Earnshaw
2013-11-18 21:08 ` Cong Hou
2013-11-19 11:45 ` Richard Earnshaw
2013-11-15 10:06 Uros Bizjak
2013-11-18 21:07 ` Cong Hou
2013-11-18 21:57 ` Uros Bizjak
2013-11-19 5:13 ` Cong Hou
2013-11-22 11:57 Uros Bizjak
2013-11-22 21:33 ` Cong Hou
2014-07-10 7:51 Uros Bizjak