From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=LMmU=FY=suse.de=rguenther@sourceware.org>
Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28])
	by sourceware.org (Postfix) with ESMTPS id 4342A3858C33
	for <gcc-patches@gcc.gnu.org>; Tue, 10 Oct 2023 11:13:59 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 4342A3858C33
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de
Received: from relay2.suse.de (relay2.suse.de [149.44.160.134])
	by smtp-out1.suse.de (Postfix) with ESMTP id 4114F211C5;
	Tue, 10 Oct 2023 11:13:58 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa;
	t=1696936438; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=3x2PddZonGBMzV41l+ASlSwO50s7x4Fwj1sxSAPs77w=;
	b=rJss3CSFmGSB1zi4OLTbk+48go5waLy6MA0wi10NSY8pc3ZAGwq/VsXTFNAL6/ohBn+mFr
	t+btJNK4y+TkMYfbcwPov/I3KUCI4AFzMTzPMc2uD9YtvxMoJChRBlvZ0EP7qWiyJy43hm
	mfmMrV1ihwGAbOwziC4tt+zD/kDa8Go=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
	s=susede2_ed25519; t=1696936438;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=3x2PddZonGBMzV41l+ASlSwO50s7x4Fwj1sxSAPs77w=;
	b=tPH9EOMKEF3WVVA5c6O3aA6My/GniaQUm2fQ6IbCCzJS1ALAIUP376yFSPjLl8AUwVu7Rm
	NG3usLsmtgS/uwBw==
Received: from wotan.suse.de (wotan.suse.de [10.160.0.1])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by relay2.suse.de (Postfix) with ESMTPS id 09D6D2C417;
	Tue, 10 Oct 2023 11:13:57 +0000 (UTC)
Date: Tue, 10 Oct 2023 11:13:58 +0000 (UTC)
From: Richard Biener <rguenther@suse.de>
To: Tamar Christina <tamar.christina@arm.com>
cc: gcc-patches@gcc.gnu.org, nd@arm.com, jlaw@ventanamicro.com
Subject: Re: [PATCH 2/3]middle-end: updated niters analysis to handle multiple
 exits.
In-Reply-To: <ZRp0Q7ovA3FweWk4@arm.com>
Message-ID: <nycvar.YFH.7.77.849.2310101105020.5561@jbgna.fhfr.qr>
References: <ZRp0Q7ovA3FweWk4@arm.com>
User-Agent: Alpine 2.22 (LSU 394 2020-01-19)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Spam-Status: No, score=-10.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_LOTSOFHASH,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Mon, 2 Oct 2023, Tamar Christina wrote:

> Hi All,
> 
> This second part updates niters analysis to be able to analyze any number of
> exits.  If we have multiple exits we determine the main exit by finding the
> first counting IV.
> 
> The change allows the vectorizer to pass analysis for multiple loops, but we
> later gracefully reject them.  It does however allow us to test if the exit
> handling is using the right exit everywhere.
> 
> Additionally since we analyze all exits, we now return all conditions for them
> and determine which condition belongs to the main exit.
> 
> The main condition is needed because the vectorizer needs to ignore the main IV
> condition during vectorization as it will replace it during codegen.
> 
> To track versioned loops we extend the contract between ifcvt and the vectorizer
> to store the exit number in aux so that we can match it up again during peeling.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-linux-gnu, and
> no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-if-conv.cc (tree_if_conversion): Record exits in aux.
> 	* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg): Use
> 	it.
> 	* tree-vect-loop.cc (vect_get_loop_niters): Determine main exit.
> 	(vec_init_loop_exit_info): Extend analysis when multiple exits.
> 	(vect_analyze_loop_form): Record conds and determine main cond.
> 	(vect_create_loop_vinfo): Extend bookkeeping of conds.
> 	(vect_analyze_loop): Release conds.
> 	* tree-vectorizer.h (LOOP_VINFO_LOOP_CONDS,
> 	LOOP_VINFO_LOOP_IV_COND):  New.
> 	(struct vect_loop_form_info): Add conds, alt_loop_conds;
> 	(struct loop_vec_info): Add conds, loop_iv_cond.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index 799f071965e5c41eb352b5530cf1d9c7ecf7bf25..3dc2290467797ebbfcef55903531b22829f4fdbd 100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -3795,6 +3795,13 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
>      }
>    if (need_to_ifcvt)
>      {
> +      /* Before we rewrite edges we'll record their original position in the
> +	 edge map such that we can map the edges between the ifcvt and the
> +	 non-ifcvt loop during peeling.  */
> +      uintptr_t idx = 0;
> +      for (edge exit : get_loop_exit_edges (loop))
> +	exit->aux = (void*)idx++;
> +
>        /* Now all statements are if-convertible.  Combine all the basic
>  	 blocks into one huge basic block doing the if-conversion
>  	 on-the-fly.  */
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index e06717272aafc6d31cbdcb94840ac25de616da6d..77f8e668bcc8beca99ba4052e1b12e0d17300262 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -1470,6 +1470,18 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>        scalar_loop = loop;
>        scalar_exit = loop_exit;
>      }
> +  else if (scalar_loop == loop)
> +    scalar_exit = loop_exit;
> +  else
> +    {
> +      /* Loop has been version, match exits up using the aux index.  */
> +      for (edge exit : get_loop_exit_edges (scalar_loop))
> +	if (exit->aux == loop_exit->aux)
> +	  {
> +	    scalar_exit	= exit;
> +	    break;
> +	  }
> +    }
>  
>    bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
>    pbbs = bbs + 1;
> @@ -1501,6 +1513,8 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>    exit = loop_exit;
>    basic_block new_preheader = new_bbs[0];
>  
> +  /* Record the new loop exit information.  new_loop doesn't have SCEV data and
> +     so we must initialize the exit information.  */
>    if (new_e)
>      *new_e = new_exit;
>  
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 6e60d84143626a8e1d801bb580f4dcebc73c7ba7..f1caa5f207d3b13da58c3a313b11d1ef98374349 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -851,79 +851,106 @@ vect_fixup_scalar_cycles_with_patterns (loop_vec_info loop_vinfo)
>     in NUMBER_OF_ITERATIONSM1.  Place the condition under which the
>     niter information holds in ASSUMPTIONS.
>  
> -   Return the loop exit condition.  */
> +   Return the loop exit conditions.  */
>  
>  
> -static gcond *
> -vect_get_loop_niters (class loop *loop, edge exit, tree *assumptions,
> +static vec<gcond *>
> +vect_get_loop_niters (class loop *loop, tree *assumptions, const_edge main_exit,
>  		      tree *number_of_iterations, tree *number_of_iterationsm1)

Any reason you swap exit and main_exit?  IMHO the input better pairs with
the other input 'loop'.


>  {
> +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> +  vec<gcond *> conds;
> +  conds.create (exits.length ());
>    class tree_niter_desc niter_desc;
>    tree niter_assumptions, niter, may_be_zero;
> -  gcond *cond = get_loop_exit_condition (loop);
>  
>    *assumptions = boolean_true_node;
>    *number_of_iterationsm1 = chrec_dont_know;
>    *number_of_iterations = chrec_dont_know;
> +
>    DUMP_VECT_SCOPE ("get_loop_niters");
>  
> -  if (!exit)
> -    return cond;
> +  if (exits.is_empty ())
> +    return conds;
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location, "Loop has %d exits.\n",
> +		     exits.length ());
> +
> +  edge exit;
> +  unsigned int i;
> +  FOR_EACH_VEC_ELT (exits, i, exit)
> +    {
> +      gcond *cond = get_loop_exit_condition (exit);
> +      if (cond)
> +	conds.safe_push (cond);
> +
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_NOTE, vect_location, "Analyzing exit %d...\n", i);
>  
> -  may_be_zero = NULL_TREE;
> -  if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
> -      || chrec_contains_undetermined (niter_desc.niter))
> -    return cond;
> +      may_be_zero = NULL_TREE;
> +      if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
> +          || chrec_contains_undetermined (niter_desc.niter))
> +	continue;
>  
> -  niter_assumptions = niter_desc.assumptions;
> -  may_be_zero = niter_desc.may_be_zero;
> -  niter = niter_desc.niter;
> +      niter_assumptions = niter_desc.assumptions;
> +      may_be_zero = niter_desc.may_be_zero;
> +      niter = niter_desc.niter;
>  
> -  if (may_be_zero && integer_zerop (may_be_zero))
> -    may_be_zero = NULL_TREE;
> +      if (may_be_zero && integer_zerop (may_be_zero))
> +	may_be_zero = NULL_TREE;
>  
> -  if (may_be_zero)
> -    {
> -      if (COMPARISON_CLASS_P (may_be_zero))
> +      if (may_be_zero)
>  	{
> -	  /* Try to combine may_be_zero with assumptions, this can simplify
> -	     computation of niter expression.  */
> -	  if (niter_assumptions && !integer_nonzerop (niter_assumptions))
> -	    niter_assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
> -					     niter_assumptions,
> -					     fold_build1 (TRUTH_NOT_EXPR,
> -							  boolean_type_node,
> -							  may_be_zero));
> +	  if (COMPARISON_CLASS_P (may_be_zero))
> +	    {
> +	      /* Try to combine may_be_zero with assumptions, this can simplify
> +		 computation of niter expression.  */
> +	      if (niter_assumptions && !integer_nonzerop (niter_assumptions))
> +		niter_assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
> +						 niter_assumptions,
> +						 fold_build1 (TRUTH_NOT_EXPR,
> +							      boolean_type_node,
> +							      may_be_zero));
> +	      else
> +		niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero,
> +				     build_int_cst (TREE_TYPE (niter), 0),
> +				     rewrite_to_non_trapping_overflow (niter));
> +
> +	      may_be_zero = NULL_TREE;
> +	    }
> +	  else if (integer_nonzerop (may_be_zero) && exit == main_exit)
> +	    {
> +	      *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0);
> +	      *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1);
> +	      continue;
> +	    }
>  	  else
> -	    niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero,
> -				 build_int_cst (TREE_TYPE (niter), 0),
> -				 rewrite_to_non_trapping_overflow (niter));
> +	    continue;
> +       }
>  
> -	  may_be_zero = NULL_TREE;
> -	}
> -      else if (integer_nonzerop (may_be_zero))
> +      /* Loop assumptions are based off the normal exit.  */
> +      if (exit == main_exit)

It's a bit hard to follow in patch form but I wonder why you even
analyze the number of iterations of the non-main exits riskying
possibly clobbering the *number_* outputs which we later assume
to be for the main exit?

>  	{
> -	  *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0);
> -	  *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1);
> -	  return cond;
> +	  *assumptions = niter_assumptions;
> +	  *number_of_iterationsm1 = niter;
> +
> +	  /* We want the number of loop header executions which is the number
> +	     of latch executions plus one.
> +	     ???  For UINT_MAX latch executions this number overflows to zero
> +	     for loops like do { n++; } while (n != 0);  */
> +	  if (niter && !chrec_contains_undetermined (niter))
> +	    niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter),
> +				 unshare_expr (niter),
> +				 build_int_cst (TREE_TYPE (niter), 1));
> +	  *number_of_iterations = niter;
>  	}
> -      else
> -	return cond;
>      }
>  
> -  *assumptions = niter_assumptions;
> -  *number_of_iterationsm1 = niter;
> -
> -  /* We want the number of loop header executions which is the number
> -     of latch executions plus one.
> -     ???  For UINT_MAX latch executions this number overflows to zero
> -     for loops like do { n++; } while (n != 0);  */
> -  if (niter && !chrec_contains_undetermined (niter))
> -    niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter), unshare_expr (niter),
> -			  build_int_cst (TREE_TYPE (niter), 1));
> -  *number_of_iterations = niter;
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location, "All loop exits successfully analyzed.\n");
>  
> -  return cond;
> +  return conds;
>  }
>  
>  /*  Determine the main loop exit for the vectorizer.  */
> @@ -936,8 +963,25 @@ vec_init_loop_exit_info (class loop *loop)
>    auto_vec<edge> exits = get_loop_exit_edges (loop);
>    if (exits.length () == 1)
>      return exits[0];
> -  else
> -    return NULL;
> +
> +  /* If we have multiple exits we only support counting IV at the moment.  Analyze
> +     all exits and return one */
> +  class tree_niter_desc niter_desc;
> +  edge candidate = NULL;
> +  for (edge exit : exits)
> +    {
> +      if (!get_loop_exit_condition (exit))
> +	continue;
> +
> +      if (number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
> +	  && !chrec_contains_undetermined (niter_desc.niter))
> +	{
> +	  if (!niter_desc.may_be_zero || !candidate)
> +	    candidate = exit;
> +	}
> +    }
> +
> +  return candidate;
>  }
>  
>  /* Function bb_in_loop_p
> @@ -1788,21 +1832,31 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
>  				   "not vectorized: latch block not empty.\n");
>  
>    /* Make sure the exit is not abnormal.  */
> -  edge e = single_exit (loop);
> -  if (e->flags & EDGE_ABNORMAL)
> +  if (exit_e->flags & EDGE_ABNORMAL)
>      return opt_result::failure_at (vect_location,
>  				   "not vectorized:"
>  				   " abnormal loop exit edge.\n");
>  
> -  info->loop_cond
> -    = vect_get_loop_niters (loop, e, &info->assumptions,
> +  info->conds
> +    = vect_get_loop_niters (loop, &info->assumptions, exit_e,
>  			    &info->number_of_iterations,
>  			    &info->number_of_iterationsm1);
> -  if (!info->loop_cond)
> +
> +  if (info->conds.is_empty ())
>      return opt_result::failure_at
>        (vect_location,
>         "not vectorized: complicated exit condition.\n");
>  
> +  /* Determine what the primary and alternate exit conds are.  */
> +  info->alt_loop_conds.create (info->conds.length () - 1);
> +  for (gcond *cond : info->conds)
> +    {
> +      if (exit_e->src != gimple_bb (cond))
> +	info->alt_loop_conds.quick_push (cond);
> +      else
> +	info->loop_cond = cond;
> +    }
> +

IMHO it would be simpler to have the primary exit condition in
info->conds[0] and the rest after that?  That avoids having two
arrays and one scalar in vect_loop_form_info.

>    if (integer_zerop (info->assumptions)
>        || !info->number_of_iterations
>        || chrec_contains_undetermined (info->number_of_iterations))
> @@ -1847,8 +1901,13 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
>    if (!integer_onep (info->assumptions) && !main_loop_info)
>      LOOP_VINFO_NITERS_ASSUMPTIONS (loop_vinfo) = info->assumptions;
>  
> -  stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (info->loop_cond);
> -  STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
> +  for (gcond *cond : info->conds)
> +    {
> +      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (cond);
> +      STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
> +    }
> +  LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->alt_loop_conds);
> +  LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond;
>    LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
>  
> @@ -3594,7 +3653,11 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
>  			 && LOOP_VINFO_PEELING_FOR_NITER (first_loop_vinfo)
>  			 && !loop->simduid);
>    if (!vect_epilogues)
> -    return first_loop_vinfo;
> +    {
> +      loop_form_info.conds.release ();
> +      loop_form_info.alt_loop_conds.release ();
> +      return first_loop_vinfo;
> +    }

I think there's 'inner' where you leak these.  Maybe use auto_vec<>
in vect_loop_form_info instead?

Otherwise looks OK.

Thanks,
Richard.

>    /* Now analyze first_loop_vinfo for epilogue vectorization.  */
>    poly_uint64 lowest_th = LOOP_VINFO_VERSIONING_THRESHOLD (first_loop_vinfo);
> @@ -3694,6 +3757,9 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
>  			   (first_loop_vinfo->epilogue_vinfos[0]->vector_mode));
>      }
>  
> +  loop_form_info.conds.release ();
> +  loop_form_info.alt_loop_conds.release ();
> +
>    return first_loop_vinfo;
>  }
>  
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index afa7a8e30891c782a0e5e3740ecc4377f5a31e54..55b6771b271d5072fa1327d595e1dddb112cfdf6 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -882,6 +882,12 @@ public:
>       we need to peel off iterations at the end to form an epilogue loop.  */
>    bool peeling_for_niter;
>  
> +  /* List of loop additional IV conditionals found in the loop.  */
> +  auto_vec<gcond *> conds;
> +
> +  /* Main loop IV cond.  */
> +  gcond* loop_iv_cond;
> +
>    /* True if there are no loop carried data dependencies in the loop.
>       If loop->safelen <= 1, then this is always true, either the loop
>       didn't have any loop carried data dependencies, or the loop is being
> @@ -984,6 +990,8 @@ public:
>  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
>  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
>  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> +#define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
> +#define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
>  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
>  #define LOOP_VINFO_SCALAR_LOOP(L)	   (L)->scalar_loop
>  #define LOOP_VINFO_SCALAR_LOOP_SCALING(L)  (L)->scalar_loop_scaling
> @@ -2373,7 +2381,9 @@ struct vect_loop_form_info
>    tree number_of_iterations;
>    tree number_of_iterationsm1;
>    tree assumptions;
> +  vec<gcond *> conds;
>    gcond *loop_cond;
> +  vec<gcond *> alt_loop_conds;
>    gcond *inner_loop_cond;
>    edge loop_exit;
>  };
> 
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)