public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Fix profile update in scale_profile_for_vect_loop
@ 2023-07-17 10:35 Jan Hubicka
  2023-07-18 19:29 ` Thiago Jung Bauermann
  0 siblings, 1 reply; 2+ messages in thread
From: Jan Hubicka @ 2023-07-17 10:35 UTC (permalink / raw)
  To: gcc-patches, rguenther

Hi,
when vectorizing 4 times, we sometimes do
  for
    <4x vectorized body>
  for
    <2x vectorized body>
  for
    <1x vectorized body>

Here the second two fors handling epilogue never iterates.
Currently vecotrizer thinks that the middle for itrates twice.
This turns out to be scale_profile_for_vect_loop that uses 
niter_for_unrolled_loop.

At that time we know epilogue will iterate at most 2 times
but niter_for_unrolled_loop does not know that the last iteration
will be taken by the epilogue-of-epilogue and thus it think
that the loop may iterate once and exit in middle of second
iteration.

We already do correct job updating niter bounds and this is
just ordering issue.  This patch makes us to first update
the bounds and then do updating of the loop.  I re-implemented
the function more correctly and precisely.

The loop reducing iteration factor for overly flat profiles is bit funny, but
only other method I can think of is to compute sreal scale that would have
similar overhead I think.

Bootstrapped/regtested x86_64-linux, comitted.

gcc/ChangeLog:

	PR middle-end/110649
	* tree-vect-loop.cc (scale_profile_for_vect_loop):
	(vect_transform_loop):
	(optimize_mask_stores):

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 7d917bfd72c..b44fb9c7712 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10842,31 +10842,30 @@ vect_get_loop_len (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi,
 static void
 scale_profile_for_vect_loop (class loop *loop, unsigned vf)
 {
-  edge preheader = loop_preheader_edge (loop);
-  /* Reduce loop iterations by the vectorization factor.  */
-  gcov_type new_est_niter = niter_for_unrolled_loop (loop, vf);
-  profile_count freq_h = loop->header->count, freq_e = preheader->count ();
-
-  if (freq_h.nonzero_p ())
-    {
-      profile_probability p;
-
-      /* Avoid dropping loop body profile counter to 0 because of zero count
-	 in loop's preheader.  */
-      if (!(freq_e == profile_count::zero ()))
-        freq_e = freq_e.force_nonzero ();
-      p = (freq_e * (new_est_niter + 1)).probability_in (freq_h);
-      scale_loop_frequencies (loop, p);
-    }
-
+  /* Loop body executes VF fewer times and exit increases VF times.  */
   edge exit_e = single_exit (loop);
-  exit_e->probability = profile_probability::always () / (new_est_niter + 1);
-
-  edge exit_l = single_pred_edge (loop->latch);
-  profile_probability prob = exit_l->probability;
-  exit_l->probability = exit_e->probability.invert ();
-  if (prob.initialized_p () && exit_l->probability.initialized_p ())
-    scale_bbs_frequencies (&loop->latch, 1, exit_l->probability / prob);
+  profile_count entry_count = loop_preheader_edge (loop)->count ();
+
+  /* If we have unreliable loop profile avoid dropping entry
+     count bellow header count.  This can happen since loops
+     has unrealistically low trip counts.  */
+  while (vf > 1
+	 && loop->header->count > entry_count
+	 && loop->header->count < entry_count * vf)
+    vf /= 2;
+
+  if (entry_count.nonzero_p ())
+    set_edge_probability_and_rescale_others
+	    (exit_e,
+	     entry_count.probability_in (loop->header->count / vf));
+  /* Avoid producing very large exit probability when we do not have
+     sensible profile.  */
+  else if (exit_e->probability < profile_probability::always () / (vf * 2))
+    set_edge_probability_and_rescale_others (exit_e, exit_e->probability * vf);
+  loop->latch->count = single_pred_edge (loop->latch)->count ();
+
+  scale_loop_profile (loop, profile_probability::always () / vf,
+		      get_likely_max_loop_iterations_int (loop));
 }
 
 /* For a vectorized stmt DEF_STMT_INFO adjust all vectorized PHI
@@ -11476,7 +11475,6 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
 			   niters_vector_mult_vf, !niters_no_overflow);
 
   unsigned int assumed_vf = vect_vf_for_cost (loop_vinfo);
-  scale_profile_for_vect_loop (loop, assumed_vf);
 
   /* True if the final iteration might not handle a full vector's
      worth of scalar iterations.  */
@@ -11547,6 +11545,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
 			  assumed_vf) - 1
 	 : wi::udiv_floor (loop->nb_iterations_estimate + bias_for_assumed,
 			   assumed_vf) - 1);
+  scale_profile_for_vect_loop (loop, assumed_vf);
 
   if (dump_enabled_p ())
     {

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: Fix profile update in scale_profile_for_vect_loop
  2023-07-17 10:35 Fix profile update in scale_profile_for_vect_loop Jan Hubicka
@ 2023-07-18 19:29 ` Thiago Jung Bauermann
  0 siblings, 0 replies; 2+ messages in thread
From: Thiago Jung Bauermann @ 2023-07-18 19:29 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: rguenther, gcc-patches


Hello,

Jan Hubicka via Gcc-patches <gcc-patches@gcc.gnu.org> writes:

> Hi,
> when vectorizing 4 times, we sometimes do
>   for
>     <4x vectorized body>
>   for
>     <2x vectorized body>
>   for
>     <1x vectorized body>
>
> Here the second two fors handling epilogue never iterates.
> Currently vecotrizer thinks that the middle for itrates twice.
> This turns out to be scale_profile_for_vect_loop that uses 
> niter_for_unrolled_loop.
>
> At that time we know epilogue will iterate at most 2 times
> but niter_for_unrolled_loop does not know that the last iteration
> will be taken by the epilogue-of-epilogue and thus it think
> that the loop may iterate once and exit in middle of second
> iteration.
>
> We already do correct job updating niter bounds and this is
> just ordering issue.  This patch makes us to first update
> the bounds and then do updating of the loop.  I re-implemented
> the function more correctly and precisely.
>
> The loop reducing iteration factor for overly flat profiles is bit funny, but
> only other method I can think of is to compute sreal scale that would have
> similar overhead I think.
>
> Bootstrapped/regtested x86_64-linux, comitted.
>
> gcc/ChangeLog:
>
> 	PR middle-end/110649
> 	* tree-vect-loop.cc (scale_profile_for_vect_loop):
> 	(vect_transform_loop):
> 	(optimize_mask_stores):

Our CI detected regressions on aarch64-linux-gnu with this commit in
gcc.target/aarch64/sve/aarch64-sve.exp. I checked today's trunk and it
still fails. I filed the following bug report with the details:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110727

Could you please check?

-- 
Thiago

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2023-07-18 19:29 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-17 10:35 Fix profile update in scale_profile_for_vect_loop Jan Hubicka
2023-07-18 19:29 ` Thiago Jung Bauermann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).