From: Richard Biener <rguenther@suse.de>
To: Jakub Jelinek <jakub@redhat.com>
Cc: Kilian Verhetsel <kilian.verhetsel@uclouvain.be>,
Alan Hayward <Alan.Hayward@arm.com>,
GCC Patches <gcc-patches@gcc.gnu.org>, nd <nd@arm.com>
Subject: Re: [PATCH] Fix result for conditional reductions matching at index 0 (PR tree-optimization/80631)
Date: Tue, 12 Dec 2017 07:57:00 -0000 [thread overview]
Message-ID: <alpine.LSU.2.20.1712120856190.12252@zhemvz.fhfr.qr> (raw)
In-Reply-To: <20171211212245.GS2353@tucnak>
[-- Attachment #1: Type: text/plain, Size: 18139 bytes --]
On Mon, 11 Dec 2017, Jakub Jelinek wrote:
> On Mon, Dec 11, 2017 at 06:00:11PM +0100, Kilian Verhetsel wrote:
> > Jakub Jelinek <jakub@redhat.com> writes:
> > > Of course it can be done efficiently, what we care most is that the body of
> > > the vectorized loop is efficient.
> >
> > That's fair, I was looking at the x86 assembly being generated when a single
> > vectorized iteration was enough (because that is the context in which I
> > first encountered this bug):
> >
> > int f(unsigned int *x, unsigned int k) {
> > unsigned int result = 8;
> > for (unsigned int i = 0; i < 8; i++) {
> > if (x[i] == k) result = i;
> > }
> > return result;
> > }
> >
> > where the vpand instruction this generates would have to be replaced
> > with a variable blend if the default value weren't 0 â although I had
> > not realized even SSE4.1 on x86 includes such an instruction, making
> > this point less relevant.
>
> So, here is my version of the patch, independent from your change.
> As I said, your patch is still highly valueable if it will be another
> STMT_VINFO_VEC_REDUCTION_TYPE kind to be used for the cases like the
> above testcase, where base is equal to TYPE_MIN_VALUE, or future improvement
> of base being variable, but TYPE_OVERFLOW_UNDEFINED iterator, where all we
> need is that the maximum number of iterations is smaller than the maximum
> of the type we use for the reduction phi.
>
> The patch handles also negative steps, though for now only on signed type
> (for unsigned it can't be really negative, but perhaps we could treat
> unsigned values with the msb set as if they were negative and consider
> overflows in that direction).
>
> Bootstrapped/regtested on x86_64-linux, i686-linux, powerpc64le-linux,
> bootstrapped on powerpc64-linux, regtest there ongoing. Ok for trunk?
>
> The patch prefers to emit what we were emitting if possible (i.e. zero
> value for the COND_EXPR never hit) - building a zero vector is usually
> cheaper than any other; if that is not possible, checks if initial_def
> can be used for that value - then we can avoid the
> res == induc_val ? initial_def : res;
> conditional move; if even that is not possible, attempts to use any other
> value. If no value can be found, it for now uses COND_REDUCTION, which is
> more expensive, but correct.
Ok.
Thanks,
Richard.
> 2017-12-11 Jakub Jelinek <jakub@redhat.com>
>
> PR tree-optimization/80631
> * tree-vect-loop.c (get_initial_def_for_reduction): Fix comment typo.
> (vect_create_epilog_for_reduction): Add INDUC_VAL and INDUC_CODE
> arguments, for INTEGER_INDUC_COND_REDUCTION use INDUC_VAL instead of
> hardcoding zero as the value if COND_EXPR is never true. For
> INTEGER_INDUC_COND_REDUCTION don't emit the final COND_EXPR if
> INDUC_VAL is equal to INITIAL_DEF, and use INDUC_CODE instead of
> hardcoding MAX_EXPR as the reduction operation.
> (is_nonwrapping_integer_induction): Allow negative step.
> (vectorizable_reduction): Compute INDUC_VAL and INDUC_CODE for
> vect_create_epilog_for_reduction, if no value is suitable, don't
> use INTEGER_INDUC_COND_REDUCTION for now. Formatting fixes.
>
> * gcc.dg/vect/pr80631-1.c: New test.
> * gcc.dg/vect/pr80631-2.c: New test.
> * gcc.dg/vect/pr65947-13.c: Expect integer induc cond reduction
> vectorization.
>
> --- gcc/tree-vect-loop.c.jj 2017-12-11 14:57:38.000000000 +0100
> +++ gcc/tree-vect-loop.c 2017-12-11 16:59:06.930720928 +0100
> @@ -4034,7 +4034,7 @@ get_initial_def_for_reduction (gimple *s
> case MULT_EXPR:
> case BIT_AND_EXPR:
> {
> - /* ADJUSMENT_DEF is NULL when called from
> + /* ADJUSTMENT_DEF is NULL when called from
> vect_create_epilog_for_reduction to vectorize double reduction. */
> if (adjustment_def)
> *adjustment_def = init_val;
> @@ -4283,6 +4283,11 @@ get_initial_defs_for_reduction (slp_tree
> DOUBLE_REDUC is TRUE if double reduction phi nodes should be handled.
> SLP_NODE is an SLP node containing a group of reduction statements. The
> first one in this group is STMT.
> + INDUC_VAL is for INTEGER_INDUC_COND_REDUCTION the value to use for the case
> + when the COND_EXPR is never true in the loop. For MAX_EXPR, it needs to
> + be smaller than any value of the IV in the loop, for MIN_EXPR larger than
> + any value of the IV in the loop.
> + INDUC_CODE is the code for epilog reduction if INTEGER_INDUC_COND_REDUCTION.
>
> This function:
> 1. Creates the reduction def-use cycles: sets the arguments for
> @@ -4330,7 +4335,8 @@ vect_create_epilog_for_reduction (vec<tr
> vec<gimple *> reduction_phis,
> bool double_reduc,
> slp_tree slp_node,
> - slp_instance slp_node_instance)
> + slp_instance slp_node_instance,
> + tree induc_val, enum tree_code induc_code)
> {
> stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> stmt_vec_info prev_phi_info;
> @@ -4419,6 +4425,18 @@ vect_create_epilog_for_reduction (vec<tr
> gimple *def_stmt;
> initial_def = PHI_ARG_DEF_FROM_EDGE (reduc_def_stmt,
> loop_preheader_edge (loop));
> + /* Optimize: if initial_def is for REDUC_MAX smaller than the base
> + and we can't use zero for induc_val, use initial_def. Similarly
> + for REDUC_MIN and initial_def larger than the base. */
> + if (TREE_CODE (initial_def) == INTEGER_CST
> + && (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> + == INTEGER_INDUC_COND_REDUCTION)
> + && !integer_zerop (induc_val)
> + && ((reduc_fn == IFN_REDUC_MAX
> + && tree_int_cst_lt (initial_def, induc_val))
> + || (reduc_fn == IFN_REDUC_MIN
> + && tree_int_cst_lt (induc_val, initial_def))))
> + induc_val = initial_def;
> vect_is_simple_use (initial_def, loop_vinfo, &def_stmt, &initial_def_dt);
> vec_initial_def = get_initial_def_for_reduction (stmt, initial_def,
> &adjustment_def);
> @@ -4453,9 +4471,10 @@ vect_create_epilog_for_reduction (vec<tr
> gcc_assert (i == 0);
>
> tree vec_init_def_type = TREE_TYPE (vec_init_def);
> - tree zero_vec = build_zero_cst (vec_init_def_type);
> + tree induc_val_vec
> + = build_vector_from_val (vec_init_def_type, induc_val);
>
> - add_phi_arg (as_a <gphi *> (phi), zero_vec,
> + add_phi_arg (as_a <gphi *> (phi), induc_val_vec,
> loop_preheader_edge (loop), UNKNOWN_LOCATION);
> }
> else
> @@ -4983,14 +5002,16 @@ vect_create_epilog_for_reduction (vec<tr
> gimple_set_lhs (epilog_stmt, new_temp);
> gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
>
> - if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> - == INTEGER_INDUC_COND_REDUCTION)
> - {
> - /* Earlier we set the initial value to be zero. Check the result
> - and if it is zero then replace with the original initial
> - value. */
> - tree zero = build_zero_cst (scalar_type);
> - tree zcompare = build2 (EQ_EXPR, boolean_type_node, new_temp, zero);
> + if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> + == INTEGER_INDUC_COND_REDUCTION)
> + && !operand_equal_p (initial_def, induc_val, 0))
> + {
> + /* Earlier we set the initial value to be a vector if induc_val
> + values. Check the result and if it is induc_val then replace
> + with the original initial value, unless induc_val is
> + the same as initial_def already. */
> + tree zcompare = build2 (EQ_EXPR, boolean_type_node, new_temp,
> + induc_val);
>
> tmp = make_ssa_name (new_scalar_dest);
> epilog_stmt = gimple_build_assign (tmp, COND_EXPR, zcompare,
> @@ -5008,9 +5029,16 @@ vect_create_epilog_for_reduction (vec<tr
> int vec_size_in_bits = tree_to_uhwi (TYPE_SIZE (vectype));
> tree vec_temp;
>
> - /* COND reductions all do the final reduction with MAX_EXPR. */
> + /* COND reductions all do the final reduction with MAX_EXPR
> + or MIN_EXPR. */
> if (code == COND_EXPR)
> - code = MAX_EXPR;
> + {
> + if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> + == INTEGER_INDUC_COND_REDUCTION)
> + code = induc_code;
> + else
> + code = MAX_EXPR;
> + }
>
> /* Regardless of whether we have a whole vector shift, if we're
> emulating the operation via tree-vect-generic, we don't want
> @@ -5110,7 +5138,7 @@ vect_create_epilog_for_reduction (vec<tr
> else
> vec_temp = gimple_assign_lhs (new_phi);
> tree rhs = build3 (BIT_FIELD_REF, scalar_type, vec_temp, bitsize,
> - bitsize_zero_node);
> + bitsize_zero_node);
> epilog_stmt = gimple_build_assign (new_scalar_dest, rhs);
> new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
> gimple_assign_set_lhs (epilog_stmt, new_temp);
> @@ -5179,14 +5207,16 @@ vect_create_epilog_for_reduction (vec<tr
> scalar_results.safe_push (new_temp);
> }
>
> - if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> - == INTEGER_INDUC_COND_REDUCTION)
> - {
> - /* Earlier we set the initial value to be zero. Check the result
> - and if it is zero then replace with the original initial
> - value. */
> - tree zero = build_zero_cst (scalar_type);
> - tree zcompare = build2 (EQ_EXPR, boolean_type_node, new_temp, zero);
> + if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> + == INTEGER_INDUC_COND_REDUCTION)
> + && !operand_equal_p (initial_def, induc_val, 0))
> + {
> + /* Earlier we set the initial value to be a vector if induc_val
> + values. Check the result and if it is induc_val then replace
> + with the original initial value, unless induc_val is
> + the same as initial_def already. */
> + tree zcompare = build2 (EQ_EXPR, boolean_type_node, new_temp,
> + induc_val);
>
> tree tmp = make_ssa_name (new_scalar_dest);
> epilog_stmt = gimple_build_assign (tmp, COND_EXPR, zcompare,
> @@ -5513,10 +5543,6 @@ is_nonwrapping_integer_induction (gimple
> || TREE_CODE (step) != INTEGER_CST)
> return false;
>
> - /* Check that the induction increments. */
> - if (tree_int_cst_sgn (step) == -1)
> - return false;
> -
> /* Check that the max size of the loop will not wrap. */
>
> if (TYPE_OVERFLOW_UNDEFINED (lhs_type))
> @@ -5608,6 +5634,8 @@ vectorizable_reduction (gimple *stmt, gi
> tree new_temp = NULL_TREE;
> gimple *def_stmt;
> enum vect_def_type dt, cond_reduc_dt = vect_unknown_def_type;
> + gimple *cond_reduc_def_stmt = NULL;
> + enum tree_code cond_reduc_op_code = ERROR_MARK;
> tree scalar_type;
> bool is_simple_use;
> gimple *orig_stmt;
> @@ -5879,9 +5907,13 @@ vectorizable_reduction (gimple *stmt, gi
> cond_reduc_dt = dt;
> cond_reduc_val = ops[i];
> }
> - if (dt == vect_induction_def && def_stmt != NULL
> + if (dt == vect_induction_def
> + && def_stmt != NULL
> && is_nonwrapping_integer_induction (def_stmt, loop))
> - cond_reduc_dt = dt;
> + {
> + cond_reduc_dt = dt;
> + cond_reduc_def_stmt = def_stmt;
> + }
> }
> }
>
> @@ -5928,12 +5960,46 @@ vectorizable_reduction (gimple *stmt, gi
> {
> if (cond_reduc_dt == vect_induction_def)
> {
> - if (dump_enabled_p ())
> - dump_printf_loc (MSG_NOTE, vect_location,
> - "condition expression based on "
> - "integer induction.\n");
> - STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> - = INTEGER_INDUC_COND_REDUCTION;
> + stmt_vec_info cond_stmt_vinfo = vinfo_for_stmt (cond_reduc_def_stmt);
> + tree base
> + = STMT_VINFO_LOOP_PHI_EVOLUTION_BASE_UNCHANGED (cond_stmt_vinfo);
> + tree step = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (cond_stmt_vinfo);
> +
> + gcc_assert (TREE_CODE (base) == INTEGER_CST
> + && TREE_CODE (step) == INTEGER_CST);
> + cond_reduc_val = NULL_TREE;
> + /* Find a suitable value, for MAX_EXPR below base, for MIN_EXPR
> + above base; punt if base is the minimum value of the type for
> + MAX_EXPR or maximum value of the type for MIN_EXPR for now. */
> + if (tree_int_cst_sgn (step) == -1)
> + {
> + cond_reduc_op_code = MIN_EXPR;
> + if (tree_int_cst_sgn (base) == -1)
> + cond_reduc_val = build_int_cst (TREE_TYPE (base), 0);
> + else if (tree_int_cst_lt (base,
> + TYPE_MAX_VALUE (TREE_TYPE (base))))
> + cond_reduc_val
> + = int_const_binop (PLUS_EXPR, base, integer_one_node);
> + }
> + else
> + {
> + cond_reduc_op_code = MAX_EXPR;
> + if (tree_int_cst_sgn (base) == 1)
> + cond_reduc_val = build_int_cst (TREE_TYPE (base), 0);
> + else if (tree_int_cst_lt (TYPE_MIN_VALUE (TREE_TYPE (base)),
> + base))
> + cond_reduc_val
> + = int_const_binop (MINUS_EXPR, base, integer_one_node);
> + }
> + if (cond_reduc_val)
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_NOTE, vect_location,
> + "condition expression based on "
> + "integer induction.\n");
> + STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> + = INTEGER_INDUC_COND_REDUCTION;
> + }
> }
>
> /* Loop peeling modifies initial value of reduction PHI, which
> @@ -6127,8 +6193,8 @@ vectorizable_reduction (gimple *stmt, gi
> gcc_assert (orig_code == MAX_EXPR || orig_code == MIN_EXPR);
> }
> else if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> - == INTEGER_INDUC_COND_REDUCTION)
> - orig_code = MAX_EXPR;
> + == INTEGER_INDUC_COND_REDUCTION)
> + orig_code = cond_reduc_op_code;
> }
>
> if (nested_cycle)
> @@ -6464,7 +6530,8 @@ vectorizable_reduction (gimple *stmt, gi
>
> vect_create_epilog_for_reduction (vect_defs, stmt, reduc_def_stmt,
> epilog_copies, reduc_fn, phis,
> - double_reduc, slp_node, slp_node_instance);
> + double_reduc, slp_node, slp_node_instance,
> + cond_reduc_val, cond_reduc_op_code);
>
> return true;
> }
> --- gcc/testsuite/gcc.dg/vect/pr80631-1.c.jj 2017-12-11 16:38:40.452891617 +0100
> +++ gcc/testsuite/gcc.dg/vect/pr80631-1.c 2017-12-11 16:38:20.000000000 +0100
> @@ -0,0 +1,76 @@
> +/* PR tree-optimization/80631 */
> +/* { dg-do run } */
> +
> +#include "tree-vect.h"
> +
> +int v[8] = { 77, 1, 79, 3, 4, 3, 6, 7 };
> +
> +__attribute__((noipa)) void
> +f1 (void)
> +{
> + int k, r = -1;
> + for (k = 0; k < 8; k++)
> + if (v[k] == 77)
> + r = k;
> + if (r != 0)
> + abort ();
> +}
> +
> +__attribute__((noipa)) void
> +f2 (void)
> +{
> + int k, r = 4;
> + for (k = 0; k < 8; k++)
> + if (v[k] == 79)
> + r = k;
> + if (r != 2)
> + abort ();
> +}
> +
> +__attribute__((noipa)) void
> +f3 (void)
> +{
> + int k, r = -17;
> + for (k = 0; k < 8; k++)
> + if (v[k] == 78)
> + r = k;
> + if (r != -17)
> + abort ();
> +}
> +
> +__attribute__((noipa)) void
> +f4 (void)
> +{
> + int k, r = 7;
> + for (k = 0; k < 8; k++)
> + if (v[k] == 78)
> + r = k;
> + if (r != 7)
> + abort ();
> +}
> +
> +__attribute__((noipa)) void
> +f5 (void)
> +{
> + int k, r = -1;
> + for (k = 0; k < 8; k++)
> + if (v[k] == 3)
> + r = k;
> + if (r != 5)
> + abort ();
> +}
> +
> +int
> +main ()
> +{
> + check_vect ();
> + f1 ();
> + f2 ();
> + f3 ();
> + f4 ();
> + f5 ();
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 5 "vect" { target vect_condition } } } */
> +/* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 10 "vect" { target vect_condition } } } */
> --- gcc/testsuite/gcc.dg/vect/pr80631-2.c.jj 2017-12-11 16:39:35.394210969 +0100
> +++ gcc/testsuite/gcc.dg/vect/pr80631-2.c 2017-12-11 16:41:14.420984162 +0100
> @@ -0,0 +1,76 @@
> +/* PR tree-optimization/80631 */
> +/* { dg-do run } */
> +
> +#include "tree-vect.h"
> +
> +int v[8] = { 77, 1, 79, 3, 4, 3, 6, 7 };
> +
> +__attribute__((noipa)) void
> +f1 (void)
> +{
> + int k, r = -1;
> + for (k = 7; k >= 0; k--)
> + if (v[k] == 77)
> + r = k;
> + if (r != 0)
> + abort ();
> +}
> +
> +__attribute__((noipa)) void
> +f2 (void)
> +{
> + int k, r = 4;
> + for (k = 7; k >= 0; k--)
> + if (v[k] == 79)
> + r = k;
> + if (r != 2)
> + abort ();
> +}
> +
> +__attribute__((noipa)) void
> +f3 (void)
> +{
> + int k, r = -17;
> + for (k = 7; k >= 0; k--)
> + if (v[k] == 78)
> + r = k;
> + if (r != -17)
> + abort ();
> +}
> +
> +__attribute__((noipa)) void
> +f4 (void)
> +{
> + int k, r = 7;
> + for (k = 7; k >= 0; k--)
> + if (v[k] == 78)
> + r = k;
> + if (r != 7)
> + abort ();
> +}
> +
> +__attribute__((noipa)) void
> +f5 (void)
> +{
> + int k, r = -1;
> + for (k = 7; k >= 0; k--)
> + if (v[k] == 3)
> + r = k;
> + if (r != 3)
> + abort ();
> +}
> +
> +int
> +main ()
> +{
> + check_vect ();
> + f1 ();
> + f2 ();
> + f3 ();
> + f4 ();
> + f5 ();
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 5 "vect" { target vect_condition } } } */
> +/* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 10 "vect" { target vect_condition } } } */
> --- gcc/testsuite/gcc.dg/vect/pr65947-13.c.jj 2017-06-23 17:04:40.000000000 +0200
> +++ gcc/testsuite/gcc.dg/vect/pr65947-13.c 2017-12-11 21:04:15.822886161 +0100
> @@ -6,8 +6,7 @@ extern void abort (void) __attribute__ (
>
> #define N 32
>
> -/* Simple condition reduction with a reversed loop.
> - Will fail to vectorize to a simple case. */
> +/* Simple condition reduction with a reversed loop. */
>
> int
> condition_reduction (int *a, int min_v)
> @@ -42,4 +41,4 @@ main (void)
> }
>
> /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
> -/* { dg-final { scan-tree-dump-not "condition expression based on integer induction." "vect" } } */
> +/* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 4 "vect" } } */
>
>
> Jakub
>
>
--
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
prev parent reply other threads:[~2017-12-12 7:57 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-11-21 11:41 [PATCH] Fix result for conditional reductions matching at index 0 Kilian Verhetsel
2017-11-21 14:06 ` Richard Biener
2017-11-21 16:49 ` Kilian Verhetsel
2017-11-21 17:09 ` Alan Hayward
2017-11-22 9:17 ` Richard Biener
2017-11-22 11:15 ` Alan Hayward
2017-11-22 15:07 ` Richard Biener
2017-11-22 17:23 ` Kilian Verhetsel
2017-11-23 10:30 ` Alan Hayward
2017-11-23 12:39 ` Richard Biener
2017-12-08 18:15 ` Jakub Jelinek
2017-12-11 10:57 ` Kilian Verhetsel
2017-12-11 13:11 ` Jakub Jelinek
2017-12-11 13:51 ` Jakub Jelinek
2017-12-11 17:00 ` Kilian Verhetsel
2017-12-11 17:06 ` Jakub Jelinek
2017-12-11 21:22 ` [PATCH] Fix result for conditional reductions matching at index 0 (PR tree-optimization/80631) Jakub Jelinek
2017-12-12 7:57 ` Richard Biener [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LSU.2.20.1712120856190.12252@zhemvz.fhfr.qr \
--to=rguenther@suse.de \
--cc=Alan.Hayward@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=jakub@redhat.com \
--cc=kilian.verhetsel@uclouvain.be \
--cc=nd@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).