On Mon, 11 Dec 2017, Jakub Jelinek wrote:

> On Mon, Dec 11, 2017 at 06:00:11PM +0100, Kilian Verhetsel wrote:
> > Jakub Jelinek <jakub@redhat.com> writes:
> > > Of course it can be done efficiently, what we care most is that the body of
> > > the vectorized loop is efficient.
> > 
> > That's fair, I was looking at the x86 assembly being generated when a single
> > vectorized iteration was enough (because that is the context in which I
> > first encountered this bug):
> > 
> >     int f(unsigned int *x, unsigned int k) {
> >       unsigned int result = 8;
> >       for (unsigned int i = 0; i < 8; i++) {
> >         if (x[i] == k) result = i;
> >       }
> >       return result;
> >     }
> > 
> > where the vpand instruction this generates would have to be replaced
> > with a variable blend if the default value weren't 0 â although I had
> > not realized even SSE4.1 on x86 includes such an instruction, making
> > this point less relevant.
> 
> So, here is my version of the patch, independent from your change.
> As I said, your patch is still highly valueable if it will be another
> STMT_VINFO_VEC_REDUCTION_TYPE kind to be used for the cases like the
> above testcase, where base is equal to TYPE_MIN_VALUE, or future improvement
> of base being variable, but TYPE_OVERFLOW_UNDEFINED iterator, where all we
> need is that the maximum number of iterations is smaller than the maximum
> of the type we use for the reduction phi.
> 
> The patch handles also negative steps, though for now only on signed type
> (for unsigned it can't be really negative, but perhaps we could treat
> unsigned values with the msb set as if they were negative and consider
> overflows in that direction).
> 
> Bootstrapped/regtested on x86_64-linux, i686-linux, powerpc64le-linux,
> bootstrapped on powerpc64-linux, regtest there ongoing.  Ok for trunk?
> 
> The patch prefers to emit what we were emitting if possible (i.e. zero
> value for the COND_EXPR never hit) - building a zero vector is usually
> cheaper than any other; if that is not possible, checks if initial_def
> can be used for that value - then we can avoid the
> res == induc_val ? initial_def : res;
> conditional move; if even that is not possible, attempts to use any other
> value.  If no value can be found, it for now uses COND_REDUCTION, which is
> more expensive, but correct.

Ok.

Thanks,
Richard.

> 2017-12-11  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR tree-optimization/80631
> 	* tree-vect-loop.c (get_initial_def_for_reduction): Fix comment typo.
> 	(vect_create_epilog_for_reduction): Add INDUC_VAL and INDUC_CODE
> 	arguments, for INTEGER_INDUC_COND_REDUCTION use INDUC_VAL instead of
> 	hardcoding zero as the value if COND_EXPR is never true.  For
> 	INTEGER_INDUC_COND_REDUCTION don't emit the final COND_EXPR if
> 	INDUC_VAL is equal to INITIAL_DEF, and use INDUC_CODE instead of
> 	hardcoding MAX_EXPR as the reduction operation.
> 	(is_nonwrapping_integer_induction): Allow negative step.
> 	(vectorizable_reduction): Compute INDUC_VAL and INDUC_CODE for
> 	vect_create_epilog_for_reduction, if no value is suitable, don't
> 	use INTEGER_INDUC_COND_REDUCTION for now.  Formatting fixes.
> 
> 	* gcc.dg/vect/pr80631-1.c: New test.
> 	* gcc.dg/vect/pr80631-2.c: New test.
> 	* gcc.dg/vect/pr65947-13.c: Expect integer induc cond reduction
> 	vectorization.
> 
> --- gcc/tree-vect-loop.c.jj	2017-12-11 14:57:38.000000000 +0100
> +++ gcc/tree-vect-loop.c	2017-12-11 16:59:06.930720928 +0100
> @@ -4034,7 +4034,7 @@ get_initial_def_for_reduction (gimple *s
>      case MULT_EXPR:
>      case BIT_AND_EXPR:
>        {
> -        /* ADJUSMENT_DEF is NULL when called from
> +        /* ADJUSTMENT_DEF is NULL when called from
>             vect_create_epilog_for_reduction to vectorize double reduction.  */
>          if (adjustment_def)
>  	  *adjustment_def = init_val;
> @@ -4283,6 +4283,11 @@ get_initial_defs_for_reduction (slp_tree
>     DOUBLE_REDUC is TRUE if double reduction phi nodes should be handled.
>     SLP_NODE is an SLP node containing a group of reduction statements. The 
>       first one in this group is STMT.
> +   INDUC_VAL is for INTEGER_INDUC_COND_REDUCTION the value to use for the case
> +     when the COND_EXPR is never true in the loop.  For MAX_EXPR, it needs to
> +     be smaller than any value of the IV in the loop, for MIN_EXPR larger than
> +     any value of the IV in the loop.
> +   INDUC_CODE is the code for epilog reduction if INTEGER_INDUC_COND_REDUCTION.
>  
>     This function:
>     1. Creates the reduction def-use cycles: sets the arguments for 
> @@ -4330,7 +4335,8 @@ vect_create_epilog_for_reduction (vec<tr
>  				  vec<gimple *> reduction_phis,
>                                    bool double_reduc, 
>  				  slp_tree slp_node,
> -				  slp_instance slp_node_instance)
> +				  slp_instance slp_node_instance,
> +				  tree induc_val, enum tree_code induc_code)
>  {
>    stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>    stmt_vec_info prev_phi_info;
> @@ -4419,6 +4425,18 @@ vect_create_epilog_for_reduction (vec<tr
>        gimple *def_stmt;
>        initial_def = PHI_ARG_DEF_FROM_EDGE (reduc_def_stmt,
>  					   loop_preheader_edge (loop));
> +      /* Optimize: if initial_def is for REDUC_MAX smaller than the base
> +	 and we can't use zero for induc_val, use initial_def.  Similarly
> +	 for REDUC_MIN and initial_def larger than the base.  */
> +      if (TREE_CODE (initial_def) == INTEGER_CST
> +	  && (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> +	      == INTEGER_INDUC_COND_REDUCTION)
> +	  && !integer_zerop (induc_val)
> +	  && ((reduc_fn == IFN_REDUC_MAX
> +	       && tree_int_cst_lt (initial_def, induc_val))
> +	      || (reduc_fn == IFN_REDUC_MIN
> +		  && tree_int_cst_lt (induc_val, initial_def))))
> +	induc_val = initial_def;
>        vect_is_simple_use (initial_def, loop_vinfo, &def_stmt, &initial_def_dt);
>        vec_initial_def = get_initial_def_for_reduction (stmt, initial_def,
>  						       &adjustment_def);
> @@ -4453,9 +4471,10 @@ vect_create_epilog_for_reduction (vec<tr
>  	      gcc_assert (i == 0);
>  
>  	      tree vec_init_def_type = TREE_TYPE (vec_init_def);
> -	      tree zero_vec = build_zero_cst (vec_init_def_type);
> +	      tree induc_val_vec
> +		= build_vector_from_val (vec_init_def_type, induc_val);
>  
> -	      add_phi_arg (as_a <gphi *> (phi), zero_vec,
> +	      add_phi_arg (as_a <gphi *> (phi), induc_val_vec,
>  			   loop_preheader_edge (loop), UNKNOWN_LOCATION);
>  	    }
>  	  else
> @@ -4983,14 +5002,16 @@ vect_create_epilog_for_reduction (vec<tr
>        gimple_set_lhs (epilog_stmt, new_temp);
>        gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
>  
> -      if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> -	  == INTEGER_INDUC_COND_REDUCTION)
> -	{
> -	  /* Earlier we set the initial value to be zero.  Check the result
> -	     and if it is zero then replace with the original initial
> -	     value.  */
> -	  tree zero = build_zero_cst (scalar_type);
> -	  tree zcompare = build2 (EQ_EXPR, boolean_type_node, new_temp, zero);
> +      if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> +	   == INTEGER_INDUC_COND_REDUCTION)
> +	  && !operand_equal_p (initial_def, induc_val, 0))
> +	{
> +	  /* Earlier we set the initial value to be a vector if induc_val
> +	     values.  Check the result and if it is induc_val then replace
> +	     with the original initial value, unless induc_val is
> +	     the same as initial_def already.  */
> +	  tree zcompare = build2 (EQ_EXPR, boolean_type_node, new_temp,
> +				  induc_val);
>  
>  	  tmp = make_ssa_name (new_scalar_dest);
>  	  epilog_stmt = gimple_build_assign (tmp, COND_EXPR, zcompare,
> @@ -5008,9 +5029,16 @@ vect_create_epilog_for_reduction (vec<tr
>        int vec_size_in_bits = tree_to_uhwi (TYPE_SIZE (vectype));
>        tree vec_temp;
>  
> -      /* COND reductions all do the final reduction with MAX_EXPR.  */
> +      /* COND reductions all do the final reduction with MAX_EXPR
> +	 or MIN_EXPR.  */
>        if (code == COND_EXPR)
> -	code = MAX_EXPR;
> +	{
> +	  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> +	      == INTEGER_INDUC_COND_REDUCTION)
> +	    code = induc_code;
> +	  else
> +	    code = MAX_EXPR;
> +	}
>  
>        /* Regardless of whether we have a whole vector shift, if we're
>           emulating the operation via tree-vect-generic, we don't want
> @@ -5110,7 +5138,7 @@ vect_create_epilog_for_reduction (vec<tr
>                else
>                  vec_temp = gimple_assign_lhs (new_phi);
>                tree rhs = build3 (BIT_FIELD_REF, scalar_type, vec_temp, bitsize,
> -                            bitsize_zero_node);
> +				 bitsize_zero_node);
>                epilog_stmt = gimple_build_assign (new_scalar_dest, rhs);
>                new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
>                gimple_assign_set_lhs (epilog_stmt, new_temp);
> @@ -5179,14 +5207,16 @@ vect_create_epilog_for_reduction (vec<tr
>              scalar_results.safe_push (new_temp);
>          }
>  
> -      if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> -	  == INTEGER_INDUC_COND_REDUCTION)
> -	{
> -	  /* Earlier we set the initial value to be zero.  Check the result
> -	     and if it is zero then replace with the original initial
> -	     value.  */
> -	  tree zero = build_zero_cst (scalar_type);
> -	  tree zcompare = build2 (EQ_EXPR, boolean_type_node, new_temp, zero);
> +      if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> +	   == INTEGER_INDUC_COND_REDUCTION)
> +	  && !operand_equal_p (initial_def, induc_val, 0))
> +	{
> +	  /* Earlier we set the initial value to be a vector if induc_val
> +	     values.  Check the result and if it is induc_val then replace
> +	     with the original initial value, unless induc_val is
> +	     the same as initial_def already.  */
> +	  tree zcompare = build2 (EQ_EXPR, boolean_type_node, new_temp,
> +				  induc_val);
>  
>  	  tree tmp = make_ssa_name (new_scalar_dest);
>  	  epilog_stmt = gimple_build_assign (tmp, COND_EXPR, zcompare,
> @@ -5513,10 +5543,6 @@ is_nonwrapping_integer_induction (gimple
>        || TREE_CODE (step) != INTEGER_CST)
>      return false;
>  
> -  /* Check that the induction increments.  */
> -  if (tree_int_cst_sgn (step) == -1)
> -    return false;
> -
>    /* Check that the max size of the loop will not wrap.  */
>  
>    if (TYPE_OVERFLOW_UNDEFINED (lhs_type))
> @@ -5608,6 +5634,8 @@ vectorizable_reduction (gimple *stmt, gi
>    tree new_temp = NULL_TREE;
>    gimple *def_stmt;
>    enum vect_def_type dt, cond_reduc_dt = vect_unknown_def_type;
> +  gimple *cond_reduc_def_stmt = NULL;
> +  enum tree_code cond_reduc_op_code = ERROR_MARK;
>    tree scalar_type;
>    bool is_simple_use;
>    gimple *orig_stmt;
> @@ -5879,9 +5907,13 @@ vectorizable_reduction (gimple *stmt, gi
>  	      cond_reduc_dt = dt;
>  	      cond_reduc_val = ops[i];
>  	    }
> -	  if (dt == vect_induction_def && def_stmt != NULL
> +	  if (dt == vect_induction_def
> +	      && def_stmt != NULL
>  	      && is_nonwrapping_integer_induction (def_stmt, loop))
> -	    cond_reduc_dt = dt;
> +	    {
> +	      cond_reduc_dt = dt;
> +	      cond_reduc_def_stmt = def_stmt;
> +	    }
>  	}
>      }
>  
> @@ -5928,12 +5960,46 @@ vectorizable_reduction (gimple *stmt, gi
>      {
>        if (cond_reduc_dt == vect_induction_def)
>  	{
> -	  if (dump_enabled_p ())
> -	    dump_printf_loc (MSG_NOTE, vect_location,
> -			     "condition expression based on "
> -			     "integer induction.\n");
> -	  STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> -	    = INTEGER_INDUC_COND_REDUCTION;
> +	  stmt_vec_info cond_stmt_vinfo = vinfo_for_stmt (cond_reduc_def_stmt);
> +	  tree base
> +	    = STMT_VINFO_LOOP_PHI_EVOLUTION_BASE_UNCHANGED (cond_stmt_vinfo);
> +	  tree step = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (cond_stmt_vinfo);
> +
> +	  gcc_assert (TREE_CODE (base) == INTEGER_CST
> +		      && TREE_CODE (step) == INTEGER_CST);
> +	  cond_reduc_val = NULL_TREE;
> +	  /* Find a suitable value, for MAX_EXPR below base, for MIN_EXPR
> +	     above base; punt if base is the minimum value of the type for
> +	     MAX_EXPR or maximum value of the type for MIN_EXPR for now.  */
> +	  if (tree_int_cst_sgn (step) == -1)
> +	    {
> +	      cond_reduc_op_code = MIN_EXPR;
> +	      if (tree_int_cst_sgn (base) == -1)
> +		cond_reduc_val = build_int_cst (TREE_TYPE (base), 0);
> +	      else if (tree_int_cst_lt (base,
> +					TYPE_MAX_VALUE (TREE_TYPE (base))))
> +		cond_reduc_val
> +		  = int_const_binop (PLUS_EXPR, base, integer_one_node);
> +	    }
> +	  else
> +	    {
> +	      cond_reduc_op_code = MAX_EXPR;
> +	      if (tree_int_cst_sgn (base) == 1)
> +		cond_reduc_val = build_int_cst (TREE_TYPE (base), 0);
> +	      else if (tree_int_cst_lt (TYPE_MIN_VALUE (TREE_TYPE (base)),
> +					base))
> +		cond_reduc_val
> +		  = int_const_binop (MINUS_EXPR, base, integer_one_node);
> +	    }
> +	  if (cond_reduc_val)
> +	    {
> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_NOTE, vect_location,
> +				 "condition expression based on "
> +				 "integer induction.\n");
> +	      STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> +		= INTEGER_INDUC_COND_REDUCTION;
> +	    }
>  	}
>  
>        /* Loop peeling modifies initial value of reduction PHI, which
> @@ -6127,8 +6193,8 @@ vectorizable_reduction (gimple *stmt, gi
>  	  gcc_assert (orig_code == MAX_EXPR || orig_code == MIN_EXPR);
>  	}
>        else if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> -		 == INTEGER_INDUC_COND_REDUCTION)
> -	orig_code = MAX_EXPR;
> +	       == INTEGER_INDUC_COND_REDUCTION)
> +	orig_code = cond_reduc_op_code;
>      }
>  
>    if (nested_cycle)
> @@ -6464,7 +6530,8 @@ vectorizable_reduction (gimple *stmt, gi
>  
>    vect_create_epilog_for_reduction (vect_defs, stmt, reduc_def_stmt,
>  				    epilog_copies, reduc_fn, phis,
> -				    double_reduc, slp_node, slp_node_instance);
> +				    double_reduc, slp_node, slp_node_instance,
> +				    cond_reduc_val, cond_reduc_op_code);
>  
>    return true;
>  }
> --- gcc/testsuite/gcc.dg/vect/pr80631-1.c.jj	2017-12-11 16:38:40.452891617 +0100
> +++ gcc/testsuite/gcc.dg/vect/pr80631-1.c	2017-12-11 16:38:20.000000000 +0100
> @@ -0,0 +1,76 @@
> +/* PR tree-optimization/80631 */
> +/* { dg-do run } */
> +
> +#include "tree-vect.h"
> +
> +int v[8] = { 77, 1, 79, 3, 4, 3, 6, 7 };
> +
> +__attribute__((noipa)) void
> +f1 (void)
> +{
> +  int k, r = -1;
> +  for (k = 0; k < 8; k++)
> +    if (v[k] == 77)
> +      r = k;
> +  if (r != 0)
> +    abort ();
> +}
> +
> +__attribute__((noipa)) void
> +f2 (void)
> +{
> +  int k, r = 4;
> +  for (k = 0; k < 8; k++)
> +    if (v[k] == 79)
> +      r = k;
> +  if (r != 2)
> +    abort ();
> +}
> +
> +__attribute__((noipa)) void
> +f3 (void)
> +{
> +  int k, r = -17;
> +  for (k = 0; k < 8; k++)
> +    if (v[k] == 78)
> +      r = k;
> +  if (r != -17)
> +    abort ();
> +}
> +
> +__attribute__((noipa)) void
> +f4 (void)
> +{
> +  int k, r = 7;
> +  for (k = 0; k < 8; k++)
> +    if (v[k] == 78)
> +      r = k;
> +  if (r != 7)
> +    abort ();
> +}
> +
> +__attribute__((noipa)) void
> +f5 (void)
> +{
> +  int k, r = -1;
> +  for (k = 0; k < 8; k++)
> +    if (v[k] == 3)
> +      r = k;
> +  if (r != 5)
> +    abort ();
> +}
> +
> +int
> +main ()
> +{
> +  check_vect ();
> +  f1 ();
> +  f2 ();
> +  f3 ();
> +  f4 ();
> +  f5 ();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 5 "vect" { target vect_condition } } } */
> +/* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 10 "vect" { target vect_condition } } } */
> --- gcc/testsuite/gcc.dg/vect/pr80631-2.c.jj	2017-12-11 16:39:35.394210969 +0100
> +++ gcc/testsuite/gcc.dg/vect/pr80631-2.c	2017-12-11 16:41:14.420984162 +0100
> @@ -0,0 +1,76 @@
> +/* PR tree-optimization/80631 */
> +/* { dg-do run } */
> +
> +#include "tree-vect.h"
> +
> +int v[8] = { 77, 1, 79, 3, 4, 3, 6, 7 };
> +
> +__attribute__((noipa)) void
> +f1 (void)
> +{
> +  int k, r = -1;
> +  for (k = 7; k >= 0; k--)
> +    if (v[k] == 77)
> +      r = k;
> +  if (r != 0)
> +    abort ();
> +}
> +
> +__attribute__((noipa)) void
> +f2 (void)
> +{
> +  int k, r = 4;
> +  for (k = 7; k >= 0; k--)
> +    if (v[k] == 79)
> +      r = k;
> +  if (r != 2)
> +    abort ();
> +}
> +
> +__attribute__((noipa)) void
> +f3 (void)
> +{
> +  int k, r = -17;
> +  for (k = 7; k >= 0; k--)
> +    if (v[k] == 78)
> +      r = k;
> +  if (r != -17)
> +    abort ();
> +}
> +
> +__attribute__((noipa)) void
> +f4 (void)
> +{
> +  int k, r = 7;
> +  for (k = 7; k >= 0; k--)
> +    if (v[k] == 78)
> +      r = k;
> +  if (r != 7)
> +    abort ();
> +}
> +
> +__attribute__((noipa)) void
> +f5 (void)
> +{
> +  int k, r = -1;
> +  for (k = 7; k >= 0; k--)
> +    if (v[k] == 3)
> +      r = k;
> +  if (r != 3)
> +    abort ();
> +}
> +
> +int
> +main ()
> +{
> +  check_vect ();
> +  f1 ();
> +  f2 ();
> +  f3 ();
> +  f4 ();
> +  f5 ();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 5 "vect" { target vect_condition } } } */
> +/* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 10 "vect" { target vect_condition } } } */
> --- gcc/testsuite/gcc.dg/vect/pr65947-13.c.jj	2017-06-23 17:04:40.000000000 +0200
> +++ gcc/testsuite/gcc.dg/vect/pr65947-13.c	2017-12-11 21:04:15.822886161 +0100
> @@ -6,8 +6,7 @@ extern void abort (void) __attribute__ (
>  
>  #define N 32
>  
> -/* Simple condition reduction with a reversed loop.
> -   Will fail to vectorize to a simple case.  */
> +/* Simple condition reduction with a reversed loop.  */
>  
>  int
>  condition_reduction (int *a, int min_v)
> @@ -42,4 +41,4 @@ main (void)
>  }
>  
>  /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
> -/* { dg-final { scan-tree-dump-not "condition expression based on integer induction." "vect" } } */
> +/* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 4 "vect" } } */
> 
> 
> 	Jakub
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)