From: Richard Sandiford <richard.sandiford@arm.com>
To: "Andre Vieira \(lists\)" <andre.simoesdiasvieira@arm.com>
Cc: Richard Biener <rguenther@suse.de>,
	 gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops
Date: Tue, 22 Oct 2019 18:07:00 -0000
Message-ID: <mpto8y8fz0v.fsf@arm.com>
In-Reply-To: <ff99c128-251a-cc77-cdaf-da5e9cca2c42@arm.com> (Andre Vieira's	message of "Tue, 22 Oct 2019 13:48:18 +0100")

Thanks for doing this.  Hope this message doesn't cover too much old
ground or duplicate too much...

"Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
> @@ -2466,15 +2476,65 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>    else
>      niters_prolog = build_int_cst (type, 0);
>  
> +  loop_vec_info epilogue_vinfo = NULL;
> +  if (vect_epilogues)
> +    {
> +      /* Take the next epilogue_vinfo to vectorize for.  */
> +      epilogue_vinfo = loop_vinfo->epilogue_vinfos[0];
> +      loop_vinfo->epilogue_vinfos.ordered_remove (0);
> +
> +      /* Don't vectorize epilogues if this is not the most inner loop or if
> +	 the epilogue may need peeling for alignment as the vectorizer doesn't
> +	 know how to handle these situations properly yet.  */
> +      if (loop->inner != NULL
> +	  || LOOP_VINFO_PEELING_FOR_ALIGNMENT (epilogue_vinfo))
> +	vect_epilogues = false;
> +
> +    }

Nit: excess blank line before "}".  Sorry if this was discussed before,
but what's the reason for delaying the check for "loop->inner" to
this point, rather than doing it in vect_analyze_loop?

> +
> +  tree niters_vector_mult_vf;
> +  unsigned int lowest_vf = constant_lower_bound (vf);
> +  /* Note LOOP_VINFO_NITERS_KNOWN_P and LOOP_VINFO_INT_NITERS work
> +     on niters already ajusted for the iterations of the prologue.  */

Pre-existing typo: adjusted.  But...

> +  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> +      && known_eq (vf, lowest_vf))
> +    {
> +      loop_vec_info orig_loop_vinfo;
> +      if (LOOP_VINFO_EPILOGUE_P (loop_vinfo))
> +	orig_loop_vinfo = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
> +      else
> +	orig_loop_vinfo = loop_vinfo;
> +      vector_sizes vector_sizes = LOOP_VINFO_EPILOGUE_SIZES (orig_loop_vinfo);
> +      unsigned next_size = 0;
> +      unsigned HOST_WIDE_INT eiters
> +	= (LOOP_VINFO_INT_NITERS (loop_vinfo)
> +	   - LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo));
> +
> +      if (prolog_peeling > 0)
> +	eiters -= prolog_peeling;

...is that comment still true?  We're now subtracting the peeling
amount here.

Might be worth asserting prolog_peeling >= 0, just to emphasise
that we can't get here for variable peeling amounts, and then subtract
prolog_peeling unconditionally (assuming that's the right thing to do).
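E.g. (untested sketch, and only if unconditional subtraction really is OK):

      /* Variable peeling amounts shouldn't reach this point.  */
      gcc_assert (prolog_peeling >= 0);
      eiters -= prolog_peeling;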

> +      eiters
> +	= eiters % lowest_vf + LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo);
> +
> +      unsigned int ratio;
> +      while (next_size < vector_sizes.length ()
> +	     && !(constant_multiple_p (current_vector_size,
> +				       vector_sizes[next_size], &ratio)
> +		  && eiters >= lowest_vf / ratio))
> +	next_size += 1;
> +
> +      if (next_size == vector_sizes.length ())
> +	vect_epilogues = false;
> +    }
> +
>    /* Prolog loop may be skipped.  */
>    bool skip_prolog = (prolog_peeling != 0);
>    /* Skip to epilog if scalar loop may be preferred.  It's only needed
> -     when we peel for epilog loop and when it hasn't been checked with
> -     loop versioning.  */
> +     when we peel for epilog loop or when we loop version.  */
>    bool skip_vector = (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>  		      ? maybe_lt (LOOP_VINFO_INT_NITERS (loop_vinfo),
>  				  bound_prolog + bound_epilog)
> -		      : !LOOP_REQUIRES_VERSIONING (loop_vinfo));
> +		      : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
> +			 || vect_epilogues));

The comment update looks wrong here: without epilogues, we don't need
the skip when loop versioning, because loop versioning ensures that we
have at least one vector iteration.

(I think "it" was supposed to mean "skipping to the epilogue" rather
than the epilogue loop itself, in case that's the confusion.)

It'd be good to mention the epilogue condition in the comment too.
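Maybe something like (just a suggestion, reword as you see fit):

      /* Skip to epilog if scalar loop may be preferred.  It's only needed
         when we peel for the epilog loop and either the niters check hasn't
         been done via loop versioning, or we're vectorizing the epilogue,
         since then versioning only guarantees enough iterations for the
         epilogue rather than for the main vector loop.  */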

> @@ -2504,6 +2564,13 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>  
>    dump_user_location_t loop_loc = find_loop_location (loop);
>    class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
> +  if (vect_epilogues)
> +    /* Make sure to set the epilogue's epilogue scalar loop, such that we can
> +       we can use the original scalar loop as remaining epilogue if
> +       necessary.  */

Double "we can".

> +    LOOP_VINFO_SCALAR_LOOP (epilogue_vinfo)
> +      = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
> +
>    if (prolog_peeling)
>      {
>        e = loop_preheader_edge (loop);
> @@ -2584,14 +2651,22 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>  			   "loop can't be duplicated to exit edge.\n");
>  	  gcc_unreachable ();
>  	}
> -      /* Peel epilog and put it on exit edge of loop.  */
> -      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e);
> +      /* Peel epilog and put it on exit edge of loop.  If we are vectorizing
> +	 said epilog then we should use a copy of the main loop as a starting
> +	 point.  This loop may have been already had some preliminary

s/been//

> +	 transformations to allow for more optimal vectorizationg, for example

typo: vectorizationg

> +	 if-conversion.  If we are not vectorizing the epilog then we should
> +	 use the scalar loop as the transformations mentioned above make less
> +	 or no sense when not vectorizing.  */
> +      epilog = vect_epilogues ? get_loop_copy (loop) : scalar_loop;
> +      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e);
>        if (!epilog)
>  	{
>  	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, loop_loc,
>  			   "slpeel_tree_duplicate_loop_to_edge_cfg failed.\n");
>  	  gcc_unreachable ();
>  	}
> +
>        epilog->force_vectorize = false;
>        slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog, false);
>  
> [...]
> @@ -2699,10 +2774,163 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>        adjust_vec_debug_stmts ();
>        scev_reset ();
>      }
> +
> +  if (vect_epilogues)
> +    {
> +      epilog->aux = epilogue_vinfo;
> +      LOOP_VINFO_LOOP (epilogue_vinfo) = epilog;
> +
> +      loop_constraint_clear (epilog, LOOP_C_INFINITE);
> +
> +      /* We now must calculate the number of iterations for our epilogue.  */
> +      tree cond_niters, niters;
> +
> +      /* Depending on whether we peel for gaps we take niters or niters - 1,
> +	 we will refer to this as N - G, where N and G are the NITERS and
> +	 GAP for the original loop.  */
> +      niters = LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> +	? LOOP_VINFO_NITERSM1 (loop_vinfo)
> +	: LOOP_VINFO_NITERS (loop_vinfo);
> +
> +      /* Here we build a vector factorization mask:
> +	 vf_mask = ~(VF - 1), where VF is the Vectorization Factor.  */
> +      tree vf_mask = build_int_cst (TREE_TYPE (niters),
> +				    LOOP_VINFO_VECT_FACTOR (loop_vinfo));
> +      vf_mask = fold_build2 (MINUS_EXPR, TREE_TYPE (vf_mask),
> +			     vf_mask,
> +			     build_one_cst (TREE_TYPE (vf_mask)));
> +      vf_mask = fold_build1 (BIT_NOT_EXPR, TREE_TYPE (niters), vf_mask);
> +
> +      /* Here we calculate:
> +	 niters = N - ((N-G) & ~(VF -1)) */
> +      niters = fold_build2 (MINUS_EXPR, TREE_TYPE (niters),
> +			    LOOP_VINFO_NITERS (loop_vinfo),
> +			    fold_build2 (BIT_AND_EXPR, TREE_TYPE (niters),
> +					 niters,
> +					 vf_mask));

Might be a daft question, sorry, but why does this need to be so
complicated?  Couldn't we just use the final value of the main loop's
IV to calculate how many iterations are left?

The current code wouldn't, for example, work for non-power-of-2 SVE vectors.
vect_set_loop_condition_unmasked is structured to cope with that case
(in length-agnostic mode only), even when an epilogue is needed.
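FWIW, what I had in mind was something along these lines (untested, and
assuming that niters at this point has already been adjusted for the
prologue and that niters_vector_mult_vf is still to hand; the name is just
a placeholder):

      /* Iterations left over for the epilogue: the (prologue-adjusted)
         total minus the iterations consumed by the vectorized main loop.  */
      tree epilogue_niters
        = fold_build2 (MINUS_EXPR, TREE_TYPE (niters), niters,
                       fold_convert (TREE_TYPE (niters),
                                     niters_vector_mult_vf));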

> [...]
> -  return epilog;
> +  if (vect_epilogues)
> +    {
> +      basic_block *bbs = get_loop_body (loop);
> +      loop_vec_info epilogue_vinfo = loop_vec_info_for_loop (epilog);
> +
> +      LOOP_VINFO_UP_STMTS (epilogue_vinfo).create (0);
> +      LOOP_VINFO_UP_GT_DRS (epilogue_vinfo).create (0);
> +      LOOP_VINFO_UP_DRS (epilogue_vinfo).create (0);
> +
> +      gimple_stmt_iterator gsi;
> +      gphi_iterator phi_gsi;
> +      gimple *stmt;
> +      stmt_vec_info stmt_vinfo;
> +      dr_vec_info *dr_vinfo;
> +
> +      /* The stmt_vec_info's of the epilogue were constructed for the main loop
> +	 and need to be updated to refer to the cloned variables used in the
> +	 epilogue loop.  We do this by assuming the original main loop and the
> +	 epilogue loop are identical (aside the different SSA names).  This
> +	 means we assume we can go through each BB in the loop and each STMT in
> +	 each BB and map them 1:1, replacing the STMT_VINFO_STMT of each
> +	 stmt_vec_info in the epilogue's loop_vec_info.  Here we only keep
> +	 track of the original state of the main loop, before vectorization.
> +	 After vectorization we proceed to update the epilogue's stmt_vec_infos
> +	 information.  We also update the references in PATTERN_DEF_SEQ's,
> +	 RELATED_STMT's and data_references.  Mainly the latter has to be
> +	 updated after we are done vectorizing the main loop, as the
> +	 data_references are shared between main and epilogue.  */
> +      for (unsigned i = 0; i < loop->num_nodes; ++i)
> +	{
> +	  for (phi_gsi = gsi_start_phis (bbs[i]);
> +	       !gsi_end_p (phi_gsi); gsi_next (&phi_gsi))
> +	    LOOP_VINFO_UP_STMTS (epilogue_vinfo).safe_push (phi_gsi.phi ());
> +	  for (gsi = gsi_start_bb (bbs[i]);
> +	       !gsi_end_p (gsi); gsi_next (&gsi))
> +	    {
> +	      stmt = gsi_stmt (gsi);
> +	      LOOP_VINFO_UP_STMTS (epilogue_vinfo).safe_push (stmt);
> +	      stmt_vinfo  = epilogue_vinfo->lookup_stmt (stmt);

Nit: double space before "=".

> +	      if (stmt_vinfo != NULL
> +		  && stmt_vinfo->dr_aux.stmt == stmt_vinfo)
> +		{
> +		  dr_vinfo = STMT_VINFO_DR_INFO (stmt_vinfo);
> +		  /* Data references pointing to gather loads and scatter stores
> +		     require special treatment because the address computation
> +		     happens in a different gimple node, pointed to by DR_REF.
> +		     In contrast to normal loads and stores where we only need
> +		     to update the offset of the data reference.  */
> +		  if (STMT_VINFO_GATHER_SCATTER_P (dr_vinfo->stmt))
> +		    LOOP_VINFO_UP_GT_DRS (epilogue_vinfo).safe_push (dr_vinfo);
> +		  LOOP_VINFO_UP_DRS (epilogue_vinfo).safe_push (dr_vinfo);
> +		}
> +	    }
> +	}
> +    }
> +
> +  return vect_epilogues ? epilog : NULL;
>  }
>  
>  /* Function vect_create_cond_for_niters_checks.
> [...]
> @@ -2151,8 +2176,18 @@ start_over:
>    /* During peeling, we need to check if number of loop iterations is
>       enough for both peeled prolog loop and vector loop.  This check
>       can be merged along with threshold check of loop versioning, so
> -     increase threshold for this case if necessary.  */
> -  if (LOOP_REQUIRES_VERSIONING (loop_vinfo))
> +     increase threshold for this case if necessary.
> +
> +     If we are analyzing an epilogue we still want to check what it's

s/it's/its/

> +     versioning threshold would be.  If we decide to vectorize the epilogues we
> +     will want to use the lowest versioning threshold of all epilogues and main
> +     loop.  This will enable us to enter a vectorized epilogue even when
> +     versioning the loop.  We can't simply check whether the epilogue requires
> +     versioning though since we may have skipped some versioning checks when
> +     analyzing the epilogue. For instance, checks for alias versioning will be

Nit: should be two spaces after ".".

> +     skipped when dealing with epilogues as we assume we already checked them
> +     for the main loop.  So instead we always check the 'orig_loop_vinfo'.  */
> +  if (LOOP_REQUIRES_VERSIONING (orig_loop_vinfo))
>      {
>        poly_uint64 niters_th = 0;
>        unsigned int th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
> @@ -2307,14 +2342,8 @@ again:
>     be vectorized.  */
>  opt_loop_vec_info
>  vect_analyze_loop (class loop *loop, loop_vec_info orig_loop_vinfo,
> -		   vec_info_shared *shared)
> +		   vec_info_shared *shared, vector_sizes vector_sizes)
>  {
> -  auto_vector_sizes vector_sizes;
> -
> -  /* Autodetect first vector size we try.  */
> -  current_vector_size = 0;
> -  targetm.vectorize.autovectorize_vector_sizes (&vector_sizes,
> -						loop->simdlen != 0);
>    unsigned int next_size = 0;
>  
>    DUMP_VECT_SCOPE ("analyze_loop_nest");
> @@ -2335,6 +2364,9 @@ vect_analyze_loop (class loop *loop, loop_vec_info orig_loop_vinfo,
>    poly_uint64 autodetected_vector_size = 0;
>    opt_loop_vec_info first_loop_vinfo = opt_loop_vec_info::success (NULL);
>    poly_uint64 first_vector_size = 0;
> +  poly_uint64 lowest_th = 0;
> +  unsigned vectorized_loops = 0;
> +  bool vect_epilogues = !loop->simdlen && PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK);
>    while (1)
>      {
>        /* Check the CFG characteristics of the loop (nesting, entry/exit).  */
> @@ -2353,24 +2385,52 @@ vect_analyze_loop (class loop *loop, loop_vec_info orig_loop_vinfo,
>  
>        if (orig_loop_vinfo)
>  	LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = orig_loop_vinfo;
> +      else if (vect_epilogues && first_loop_vinfo)
> +	LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = first_loop_vinfo;
>  
>        opt_result res = vect_analyze_loop_2 (loop_vinfo, fatal, &n_stmts);
>        if (res)
>  	{
>  	  LOOP_VINFO_VECTORIZABLE_P (loop_vinfo) = 1;
> +	  vectorized_loops++;
>  
> -	  if (loop->simdlen
> -	      && maybe_ne (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
> -			   (unsigned HOST_WIDE_INT) loop->simdlen))
> +	  if ((loop->simdlen
> +	       && maybe_ne (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
> +			    (unsigned HOST_WIDE_INT) loop->simdlen))
> +	      || vect_epilogues)
>  	    {
>  	      if (first_loop_vinfo == NULL)
>  		{
>  		  first_loop_vinfo = loop_vinfo;
> +		  lowest_th
> +		    = LOOP_VINFO_VERSIONING_THRESHOLD (first_loop_vinfo);
>  		  first_vector_size = current_vector_size;
>  		  loop->aux = NULL;
>  		}
>  	      else
> -		delete loop_vinfo;
> +		{
> +		  /* Keep track of vector sizes that we know we can vectorize
> +		     the epilogue with.  */
> +		  if (vect_epilogues)
> +		    {
> +		      loop->aux = NULL;
> +		      first_loop_vinfo->epilogue_vsizes.reserve (1);
> +		      first_loop_vinfo->epilogue_vsizes.quick_push (current_vector_size);
> +		      first_loop_vinfo->epilogue_vinfos.reserve (1);
> +		      first_loop_vinfo->epilogue_vinfos.quick_push (loop_vinfo);

I've messed you around, sorry, but the patches I committed this weekend
mean we now store the vector size in the loop_vinfo.  It'd be good to
avoid a separate epilogue_vsizes array if possible.

> +		      LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = first_loop_vinfo;
> +		      poly_uint64 th
> +			= LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo);
> +		      gcc_assert (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
> +				  || maybe_ne (lowest_th, 0U));
> +		      /* Keep track of the known smallest versioning
> +			 threshold.  */
> +		      if (ordered_p (lowest_th, th))
> +			lowest_th = ordered_min (lowest_th, th);
> +		    }
> +		  else
> +		    delete loop_vinfo;
> +		}
>  	    }
>  	  else
>  	    {
> @@ -2408,6 +2468,8 @@ vect_analyze_loop (class loop *loop, loop_vec_info orig_loop_vinfo,
>  		  dump_dec (MSG_NOTE, current_vector_size);
>  		  dump_printf (MSG_NOTE, "\n");
>  		}
> +	      LOOP_VINFO_VERSIONING_THRESHOLD (first_loop_vinfo) = lowest_th;
> +
>  	      return first_loop_vinfo;
>  	    }
>  	  else
> @@ -8128,6 +8190,188 @@ vect_transform_loop_stmt (loop_vec_info loop_vinfo, stmt_vec_info stmt_info,
>      *seen_store = stmt_info;
>  }
>  
> +/* Helper function to pass to simplify_replace_tree to enable replacing tree's
> +   in the hash_map with its corresponding values.  */
> +static tree
> +find_in_mapping (tree t, void *context)
> +{
> +  hash_map<tree,tree>* mapping = (hash_map<tree, tree>*) context;
> +
> +  tree *value = mapping->get (t);
> +  return value ? *value : t;
> +}
> +
> +static void
> +update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
> +{
> +  loop_vec_info epilogue_vinfo = loop_vec_info_for_loop (epilogue);
> +  auto_vec<stmt_vec_info> pattern_worklist, related_worklist;
> +  hash_map<tree,tree> mapping;
> +  gimple *orig_stmt, *new_stmt;
> +  gimple_stmt_iterator epilogue_gsi;
> +  gphi_iterator epilogue_phi_gsi;
> +  stmt_vec_info stmt_vinfo = NULL, related_vinfo;
> +  basic_block *epilogue_bbs = get_loop_body (epilogue);
> +
> +  LOOP_VINFO_BBS (epilogue_vinfo) = epilogue_bbs;
> +
> +  vect_update_inits_of_drs (epilogue_vinfo, advance, PLUS_EXPR);
> +
> +
> +  /* We are done vectorizing the main loop, so now we update the epilogues
> +     stmt_vec_info's.  At the same time we set the gimple UID of each

"epilogue's stmt_vec_infos"

> +     statement in the epilogue, as these are used to look them up in the
> +     epilogues loop_vec_info later.  We also keep track of what

epilogue's

> +     stmt_vec_info's have PATTERN_DEF_SEQ's and RELATED_STMT's that might

PATTERN_DEF_SEQs and RELATED_STMTs

> +     need updating and we construct a mapping between variables defined in
> +     the main loop and their corresponding names in epilogue.  */
> +  for (unsigned i = 0; i < epilogue->num_nodes; ++i)
> +    {
> +      for (epilogue_phi_gsi = gsi_start_phis (epilogue_bbs[i]);
> +	   !gsi_end_p (epilogue_phi_gsi); gsi_next (&epilogue_phi_gsi))
> +	{
> +	  orig_stmt = LOOP_VINFO_UP_STMTS (epilogue_vinfo)[0];
> +	  LOOP_VINFO_UP_STMTS (epilogue_vinfo).ordered_remove (0);
> +	  new_stmt = epilogue_phi_gsi.phi ();
> +
> +	  stmt_vinfo
> +	    = epilogue_vinfo->lookup_stmt (orig_stmt);

Nit: fits one line.

> +
> +	  STMT_VINFO_STMT (stmt_vinfo) = new_stmt;
> +	  gimple_set_uid (new_stmt, gimple_uid (orig_stmt));
> +
> +	  mapping.put (gimple_phi_result (orig_stmt),
> +			gimple_phi_result (new_stmt));

Nit: indented too far.

> +
> +	  if (STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo))
> +	    pattern_worklist.safe_push (stmt_vinfo);
> +
> +	  related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo);
> +	  while (related_vinfo && related_vinfo != stmt_vinfo)
> +	    {
> +	      related_worklist.safe_push (related_vinfo);
> +	      /* Set BB such that the assert in
> +		'get_initial_def_for_reduction' is able to determine that
> +		the BB of the related stmt is inside this loop.  */
> +	      gimple_set_bb (STMT_VINFO_STMT (related_vinfo),
> +			     gimple_bb (new_stmt));
> +	      related_vinfo = STMT_VINFO_RELATED_STMT (related_vinfo);
> +	    }
> +	}
> +
> +      for (epilogue_gsi = gsi_start_bb (epilogue_bbs[i]);
> +	   !gsi_end_p (epilogue_gsi); gsi_next (&epilogue_gsi))
> +	{
> +	  orig_stmt = LOOP_VINFO_UP_STMTS (epilogue_vinfo)[0];
> +	  LOOP_VINFO_UP_STMTS (epilogue_vinfo).ordered_remove (0);
> +	  new_stmt = gsi_stmt (epilogue_gsi);
> +
> +	  stmt_vinfo
> +	    = epilogue_vinfo->lookup_stmt (orig_stmt);

Fits on one line.

> +
> +	  STMT_VINFO_STMT (stmt_vinfo) = new_stmt;
> +	  gimple_set_uid (new_stmt, gimple_uid (orig_stmt));
> +
> +	  if (is_gimple_assign (orig_stmt))
> +	    {
> +	      gcc_assert (is_gimple_assign (new_stmt));
> +	      mapping.put (gimple_assign_lhs (orig_stmt),
> +			  gimple_assign_lhs (new_stmt));
> +	    }

Why just assigns?  Don't we need to handle calls too?

Maybe just use gimple_get_lhs here.
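I.e. something like (untested):

      /* gimple_get_lhs handles both assignments and calls, and returns
         NULL_TREE for statements that have no lhs.  */
      if (tree lhs = gimple_get_lhs (orig_stmt))
        mapping.put (lhs, gimple_get_lhs (new_stmt));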

> +
> +	  if (STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo))
> +	    pattern_worklist.safe_push (stmt_vinfo);
> +
> +	  related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo);
> +	  related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo);
> +	  while (related_vinfo && related_vinfo != stmt_vinfo)
> +	    {
> +	      related_worklist.safe_push (related_vinfo);
> +	      /* Set BB such that the assert in
> +		'get_initial_def_for_reduction' is able to determine that
> +		the BB of the related stmt is inside this loop.  */
> +	      gimple_set_bb (STMT_VINFO_STMT (related_vinfo),
> +			     gimple_bb (new_stmt));
> +	      related_vinfo = STMT_VINFO_RELATED_STMT (related_vinfo);
> +	    }
> +	}
> +      gcc_assert (LOOP_VINFO_UP_STMTS (epilogue_vinfo).length () == 0);
> +    }
> +
> +  /* The PATTERN_DEF_SEQ's in the epilogue were constructed using the

PATTERN_DEF_SEQs

> +     original main loop and thus need to be updated to refer to the cloned
> +     variables used in the epilogue.  */
> +  for (unsigned i = 0; i < pattern_worklist.length (); ++i)
> +    {
> +      gimple_seq seq = STMT_VINFO_PATTERN_DEF_SEQ (pattern_worklist[i]);
> +      tree *new_op;
> +
> +      while (seq)
> +	{
> +	  for (unsigned j = 1; j < gimple_num_ops (seq); ++j)
> +	    {
> +	      tree op = gimple_op (seq, j);
> +	      if ((new_op = mapping.get(op)))
> +		gimple_set_op (seq, j, *new_op);
> +	      else
> +		{
> +		  op = simplify_replace_tree (op, NULL_TREE, NULL_TREE,
> +					 &find_in_mapping, &mapping);
> +		  gimple_set_op (seq, j, op);
> +		}
> +	    }
> +	  seq = seq->next;
> +	}
> +    }
> +
> +  /* Just like the PATTERN_DEF_SEQ's the RELATED_STMT's also need to be

as above

> +     updated.  */
> +  for (unsigned i = 0; i < related_worklist.length (); ++i)
> +    {
> +      tree *new_t;
> +      gimple * stmt = STMT_VINFO_STMT (related_worklist[i]);
> +      for (unsigned j = 1; j < gimple_num_ops (stmt); ++j)
> +	if ((new_t = mapping.get(gimple_op (stmt, j))))

These days I think:

	if (tree *new_t = mapping.get(gimple_op (stmt, j)))

is preferred.

> +	  gimple_set_op (stmt, j, *new_t);
> +    }
> +
> +  tree *new_op;
> +  /* Data references for gather loads and scatter stores do not use the
> +     updated offset we set using ADVANCE.  Instead we have to make sure the
> +     reference in the data references point to the corresponding copy of
> +     the original in the epilogue.  */
> +  for (unsigned i = 0; i < LOOP_VINFO_UP_GT_DRS (epilogue_vinfo).length (); ++i)
> +    {
> +      dr_vec_info *dr_vinfo = LOOP_VINFO_UP_GT_DRS (epilogue_vinfo)[i];
> +      data_reference *dr = dr_vinfo->dr;
> +      gcc_assert (dr);
> +      gcc_assert (TREE_CODE (DR_REF (dr)) == MEM_REF);
> +      new_op = mapping.get (TREE_OPERAND (DR_REF (dr), 0));
> +
> +      if (new_op)

Likewise:

      if (tree *new_op = mapping.get (TREE_OPERAND (DR_REF (dr), 0)))

here.

> +	{
> +	  DR_REF (dr) = unshare_expr (DR_REF (dr));
> +	  TREE_OPERAND (DR_REF (dr), 0) = *new_op;
> +	  DR_STMT (dr_vinfo->dr) = SSA_NAME_DEF_STMT (*new_op);
> +	}
> +    }
> +
> +  /* The vector size of the epilogue is smaller than that of the main loop
> +     so the alignment is either the same or lower. This means the dr will
> +     thus by definition be aligned.  */
> +  for (unsigned i = 0; i < LOOP_VINFO_UP_DRS (epilogue_vinfo).length (); ++i)
> +    LOOP_VINFO_UP_DRS (epilogue_vinfo)[i]->base_misaligned = false;
> +
> +
> +  LOOP_VINFO_UP_STMTS (epilogue_vinfo).release ();
> +  LOOP_VINFO_UP_GT_DRS (epilogue_vinfo).release ();
> +  LOOP_VINFO_UP_DRS (epilogue_vinfo).release ();
> +
> +  epilogue_vinfo->shared->datarefs_copy.release ();
> +  epilogue_vinfo->shared->save_datarefs ();
> +}
> +
> +
>  /* Function vect_transform_loop.
>  
>     The analysis phase has determined that the loop is vectorizable.
> [...]
> @@ -882,10 +886,35 @@ try_vectorize_loop_1 (hash_table<simduid_to_vf> *&simduid_to_vf_htab,
>  		 LOCATION_FILE (vect_location.get_location_t ()),
>  		 LOCATION_LINE (vect_location.get_location_t ()));
>  
> +  /* If this is an epilogue, we already know what vector sizes we will use for
> +     vectorization as the analyzis was part of the main vectorized loop.  Use
> +     these instead of going through all vector sizes again.  */
> +  if (orig_loop_vinfo
> +      && !LOOP_VINFO_EPILOGUE_SIZES (orig_loop_vinfo).is_empty ())
> +    {
> +      vector_sizes = LOOP_VINFO_EPILOGUE_SIZES (orig_loop_vinfo);
> +      assert_versioning = LOOP_REQUIRES_VERSIONING (orig_loop_vinfo);
> +      current_vector_size = vector_sizes[0];
> +    }
> +  else
> +    {
> +      /* Autodetect first vector size we try.  */
> +      current_vector_size = 0;
> +
> +      targetm.vectorize.autovectorize_vector_sizes (&auto_vector_sizes,
> +						    loop->simdlen != 0);
> +      vector_sizes = auto_vector_sizes;
> +    }
> +
>    /* Try to analyze the loop, retaining an opt_problem if dump_enabled_p.  */
> -  opt_loop_vec_info loop_vinfo
> -    = vect_analyze_loop (loop, orig_loop_vinfo, &shared);
> -  loop->aux = loop_vinfo;
> +  opt_loop_vec_info loop_vinfo = opt_loop_vec_info::success (NULL);
> +  if (loop_vec_info_for_loop (loop))
> +    loop_vinfo = opt_loop_vec_info::success (loop_vec_info_for_loop (loop));
> +  else
> +    {
> +      loop_vinfo = vect_analyze_loop (loop, orig_loop_vinfo, &shared, vector_sizes);
> +      loop->aux = loop_vinfo;
> +    }

I don't really understand what this is doing for the epilogue case.
Do we call vect_analyze_loop again?  Are vector_sizes[1:] significant
for epilogues?

Thanks,
Richard

