public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Do not give realistic estimates for loop with array accesses
@ 2016-03-30 11:02 Jan Hubicka
  2016-03-30 12:09 ` Richard Biener
  2016-03-30 13:50 ` Bin.Cheng
  0 siblings, 2 replies; 9+ messages in thread
From: Jan Hubicka @ 2016-03-30 11:02 UTC (permalink / raw)
  To: gcc-patches, rguenther

Hi,
while looking into sudoku solving benchark, I noticed that we incorrectly
estimate loop to iterate 10 times just because the array it traverses is of
dimension 10. This of course is just upper bound and not realistic bound.
Realistically those loops iterates once most of time.

It turns out this bug was introduced by me in
https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00444.html
While I do not recall doing that patch, it seems like a thinko about reliable
(name of the variable) and realistic (name of the parameter it is passed to).

Fixing this caused one testsuite fallout in predictive commoning testcase
because loop unswitching previously disabled itself having an estimated number
of iterations 1 (I am not sure if that is not supposed to be 0, with 1
iteration unswithcing may pay back, little bit, I suppose it wants to test
number of stmt executions of the condtional to be at least 2 which depends on
whether the conditional is before or after the loop exits). This made me notice
that some loop passes that want to give up on low trip count uses combination
of estimated number of iterations and max number of iterations while other
don't which is fixed here. The code combining both realistic and max counts
is same as i.e. in unroller and other passes already.

I also wonder if predictive comming is a win for loops with very low
trip count, but that is for separate patch, too, anyway.

Bootstrapped/regtested x86_64-linux, OK?

Honza

	* tree-ssa-loop-niter.c (idx_infer_loop_bounds): We can't get realistic
	estimates here.
	* tree-ssa-loop-unswitch.c (tree_unswitch_single_loop): Use also
	max_loop_iterations_int.
	* tree-ssa-loop-ivopts.c (avg_loop_niter): Likewise.
	* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
Index: tree-ssa-loop-niter.c
===================================================================
--- tree-ssa-loop-niter.c	(revision 234516)
+++ tree-ssa-loop-niter.c	(working copy)
@@ -3115,7 +3115,6 @@ idx_infer_loop_bounds (tree base, tree *
   tree low, high, type, next;
   bool sign, upper = true, at_end = false;
   struct loop *loop = data->loop;
-  bool reliable = true;
 
   if (TREE_CODE (base) != ARRAY_REF)
     return true;
@@ -3187,14 +3186,14 @@ idx_infer_loop_bounds (tree base, tree *
       && tree_int_cst_compare (next, high) <= 0)
     return true;
 
-  /* If access is not executed on every iteration, we must ensure that overlow may
-     not make the access valid later.  */
+  /* If access is not executed on every iteration, we must ensure that overlow
+     may not make the access valid later.  */
   if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
       && scev_probably_wraps_p (initial_condition_in_loop_num (ev, loop->num),
 				step, data->stmt, loop, true))
-    reliable = false;
+    upper = false;
 
-  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, reliable, upper);
+  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
   return true;
 }
 
Index: tree-ssa-loop-unswitch.c
===================================================================
--- tree-ssa-loop-unswitch.c	(revision 234516)
+++ tree-ssa-loop-unswitch.c	(working copy)
@@ -223,6 +223,8 @@ tree_unswitch_single_loop (struct loop *
       /* If the loop is not expected to iterate, there is no need
 	 for unswitching.  */
       iterations = estimated_loop_iterations_int (loop);
+      if (iterations < 0)
+        iterations = max_loop_iterations_int (loop);
       if (iterations >= 0 && iterations <= 1)
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
Index: tree-ssa-loop-ivopts.c
===================================================================
--- tree-ssa-loop-ivopts.c	(revision 234516)
+++ tree-ssa-loop-ivopts.c	(working copy)
@@ -121,7 +121,11 @@ avg_loop_niter (struct loop *loop)
 {
   HOST_WIDE_INT niter = estimated_stmt_executions_int (loop);
   if (niter == -1)
-    return AVG_LOOP_NITER (loop);
+    {
+      niter = max_stmt_executions_int (loop);
+      if (niter == -1 || niter > AVG_LOOP_NITER (loop))
+        return AVG_LOOP_NITER (loop);
+    }
 
   return niter;
 }
Index: tree-vect-loop.c
===================================================================
--- tree-vect-loop.c	(revision 234516)
+++ tree-vect-loop.c	(working copy)
@@ -2063,6 +2063,9 @@ start_over:
 
   estimated_niter
     = estimated_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
+  if (estimated_niter != -1)
+    estimated_niter
+      = max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
   if (estimated_niter != -1
       && ((unsigned HOST_WIDE_INT) estimated_niter
           <= MAX (th, (unsigned)min_profitable_estimate)))

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Do not give realistic estimates for loop with array accesses
  2016-03-30 11:02 Do not give realistic estimates for loop with array accesses Jan Hubicka
@ 2016-03-30 12:09 ` Richard Biener
  2016-03-30 12:36   ` Jan Hubicka
  2016-03-30 13:50 ` Bin.Cheng
  1 sibling, 1 reply; 9+ messages in thread
From: Richard Biener @ 2016-03-30 12:09 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches

On Wed, 30 Mar 2016, Jan Hubicka wrote:

> Hi,
> while looking into sudoku solving benchark, I noticed that we incorrectly
> estimate loop to iterate 10 times just because the array it traverses is of
> dimension 10. This of course is just upper bound and not realistic bound.
> Realistically those loops iterates once most of time.
> 
> It turns out this bug was introduced by me in
> https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00444.html
> While I do not recall doing that patch, it seems like a thinko about reliable
> (name of the variable) and realistic (name of the parameter it is passed to).
> 
> Fixing this caused one testsuite fallout in predictive commoning testcase
> because loop unswitching previously disabled itself having an estimated number
> of iterations 1 (I am not sure if that is not supposed to be 0, with 1
> iteration unswithcing may pay back, little bit, I suppose it wants to test
> number of stmt executions of the condtional to be at least 2 which depends on
> whether the conditional is before or after the loop exits). This made me notice
> that some loop passes that want to give up on low trip count uses combination
> of estimated number of iterations and max number of iterations while other
> don't which is fixed here. The code combining both realistic and max counts
> is same as i.e. in unroller and other passes already.
> 
> I also wonder if predictive comming is a win for loops with very low
> trip count, but that is for separate patch, too, anyway.
> 
> Bootstrapped/regtested x86_64-linux, OK?
> 
> Honza
> 
> 	* tree-ssa-loop-niter.c (idx_infer_loop_bounds): We can't get realistic
> 	estimates here.
> 	* tree-ssa-loop-unswitch.c (tree_unswitch_single_loop): Use also
> 	max_loop_iterations_int.
> 	* tree-ssa-loop-ivopts.c (avg_loop_niter): Likewise.
> 	* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
> Index: tree-ssa-loop-niter.c
> ===================================================================
> --- tree-ssa-loop-niter.c	(revision 234516)
> +++ tree-ssa-loop-niter.c	(working copy)
> @@ -3115,7 +3115,6 @@ idx_infer_loop_bounds (tree base, tree *
>    tree low, high, type, next;
>    bool sign, upper = true, at_end = false;
>    struct loop *loop = data->loop;
> -  bool reliable = true;
>  
>    if (TREE_CODE (base) != ARRAY_REF)
>      return true;
> @@ -3187,14 +3186,14 @@ idx_infer_loop_bounds (tree base, tree *
>        && tree_int_cst_compare (next, high) <= 0)
>      return true;
>  
> -  /* If access is not executed on every iteration, we must ensure that overlow may
> -     not make the access valid later.  */
> +  /* If access is not executed on every iteration, we must ensure that overlow
> +     may not make the access valid later.  */
>    if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
>        && scev_probably_wraps_p (initial_condition_in_loop_num (ev, loop->num),
>  				step, data->stmt, loop, true))
> -    reliable = false;
> +    upper = false;
>  
> -  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, reliable, upper);
> +  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
>    return true;
>  }
>  
> Index: tree-ssa-loop-unswitch.c
> ===================================================================
> --- tree-ssa-loop-unswitch.c	(revision 234516)
> +++ tree-ssa-loop-unswitch.c	(working copy)
> @@ -223,6 +223,8 @@ tree_unswitch_single_loop (struct loop *
>        /* If the loop is not expected to iterate, there is no need
>  	 for unswitching.  */
>        iterations = estimated_loop_iterations_int (loop);
> +      if (iterations < 0)
> +        iterations = max_loop_iterations_int (loop);

You are only changing one place in this file.

>        if (iterations >= 0 && iterations <= 1)
>  	{
>  	  if (dump_file && (dump_flags & TDF_DETAILS))
> Index: tree-ssa-loop-ivopts.c
> ===================================================================
> --- tree-ssa-loop-ivopts.c	(revision 234516)
> +++ tree-ssa-loop-ivopts.c	(working copy)
> @@ -121,7 +121,11 @@ avg_loop_niter (struct loop *loop)
>  {
>    HOST_WIDE_INT niter = estimated_stmt_executions_int (loop);
>    if (niter == -1)
> -    return AVG_LOOP_NITER (loop);
> +    {
> +      niter = max_stmt_executions_int (loop);
> +      if (niter == -1 || niter > AVG_LOOP_NITER (loop))
> +        return AVG_LOOP_NITER (loop);
> +    }
>  
>    return niter;
>  }
> Index: tree-vect-loop.c
> ===================================================================
> --- tree-vect-loop.c	(revision 234516)
> +++ tree-vect-loop.c	(working copy)
> @@ -2063,6 +2063,9 @@ start_over:
>  
>    estimated_niter
>      = estimated_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
> +  if (estimated_niter != -1)
> +    estimated_niter
> +      = max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
>    if (estimated_niter != -1
>        && ((unsigned HOST_WIDE_INT) estimated_niter
>            <= MAX (th, (unsigned)min_profitable_estimate)))

The vectorizer already checks this (albeit indirectly):

  HOST_WIDE_INT max_niter
    = max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
  if ((LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
       && (LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor))
      || (max_niter != -1
          && (unsigned HOST_WIDE_INT) max_niter < vectorization_factor))
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "not vectorized: iteration count smaller than "
                         "vectorization factor.\n");
      return false;
    }

Richard.

> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Do not give realistic estimates for loop with array accesses
  2016-03-30 12:09 ` Richard Biener
@ 2016-03-30 12:36   ` Jan Hubicka
  2016-03-30 12:49     ` Richard Biener
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Hubicka @ 2016-03-30 12:36 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jan Hubicka, gcc-patches

> 
> You are only changing one place in this file.

You are right. I am attaching the updated patch which I am re-testing now.
> 
> The vectorizer already checks this (albeit indirectly):
> 
>   HOST_WIDE_INT max_niter
>     = max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
>   if ((LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>        && (LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor))
>       || (max_niter != -1
>           && (unsigned HOST_WIDE_INT) max_niter < vectorization_factor))
>     {
>       if (dump_enabled_p ())
>         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                          "not vectorized: iteration count smaller than "
>                          "vectorization factor.\n");
>       return false;
>     }

Yes, but one tests only vectorization_factor and other min_profitable_estimate
which probably should be greater than vectorization_factor.

The check above should therefore become redundant.  My reading of the code is
that min_profiltable_estimate is computed after the check above, so it is
probably an useful shortcut and the message is also bit more informative.
I updated the later test to use max_niter variable once it is computed.

OK with those changes assuming testing passes?

Honza

	* tree-ssa-loop-niter.c (idx_infer_loop_bounds): We can't get realistic
	estimates here.
	* tree-ssa-loop-unswitch.c (tree_unswitch_single_loop): Use also
	max_loop_iterations_int.
	(tree_unswitch_outer_loop): Likewise.
	* tree-ssa-loop-ivopts.c (avg_loop_niter): Likewise.
	* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
Index: tree-ssa-loop-ivopts.c
===================================================================
--- tree-ssa-loop-ivopts.c	(revision 234516)
+++ tree-ssa-loop-ivopts.c	(working copy)
@@ -121,7 +121,11 @@ avg_loop_niter (struct loop *loop)
 {
   HOST_WIDE_INT niter = estimated_stmt_executions_int (loop);
   if (niter == -1)
-    return AVG_LOOP_NITER (loop);
+    {
+      niter = max_stmt_executions_int (loop);
+      if (niter == -1 || niter > AVG_LOOP_NITER (loop))
+        return AVG_LOOP_NITER (loop);
+    }
 
   return niter;
 }
Index: tree-ssa-loop-niter.c
===================================================================
--- tree-ssa-loop-niter.c	(revision 234516)
+++ tree-ssa-loop-niter.c	(working copy)
@@ -3115,7 +3115,6 @@ idx_infer_loop_bounds (tree base, tree *
   tree low, high, type, next;
   bool sign, upper = true, at_end = false;
   struct loop *loop = data->loop;
-  bool reliable = true;
 
   if (TREE_CODE (base) != ARRAY_REF)
     return true;
@@ -3187,14 +3186,14 @@ idx_infer_loop_bounds (tree base, tree *
       && tree_int_cst_compare (next, high) <= 0)
     return true;
 
-  /* If access is not executed on every iteration, we must ensure that overlow may
-     not make the access valid later.  */
+  /* If access is not executed on every iteration, we must ensure that overlow
+     may not make the access valid later.  */
   if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
       && scev_probably_wraps_p (initial_condition_in_loop_num (ev, loop->num),
 				step, data->stmt, loop, true))
-    reliable = false;
+    upper = false;
 
-  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, reliable, upper);
+  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
   return true;
 }
 
Index: tree-ssa-loop-unswitch.c
===================================================================
--- tree-ssa-loop-unswitch.c	(revision 234516)
+++ tree-ssa-loop-unswitch.c	(working copy)
@@ -223,6 +223,8 @@ tree_unswitch_single_loop (struct loop *
       /* If the loop is not expected to iterate, there is no need
 	 for unswitching.  */
       iterations = estimated_loop_iterations_int (loop);
+      if (iterations < 0)
+        iterations = max_loop_iterations_int (loop);
       if (iterations >= 0 && iterations <= 1)
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
@@ -439,6 +441,8 @@ tree_unswitch_outer_loop (struct loop *l
   /* If the loop is not expected to iterate, there is no need
       for unswitching.  */
   iterations = estimated_loop_iterations_int (loop);
+  if (iterations < 0)
+    iterations = max_loop_iterations_int (loop);
   if (iterations >= 0 && iterations <= 1)
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
Index: tree-vect-loop.c
===================================================================
--- tree-vect-loop.c	(revision 234516)
+++ tree-vect-loop.c	(working copy)
@@ -2063,6 +2063,8 @@ start_over:
 
   estimated_niter
     = estimated_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
+  if (estimated_niter != -1)
+    estimated_niter = max_niter;
   if (estimated_niter != -1
       && ((unsigned HOST_WIDE_INT) estimated_niter
           <= MAX (th, (unsigned)min_profitable_estimate)))

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Do not give realistic estimates for loop with array accesses
  2016-03-30 12:36   ` Jan Hubicka
@ 2016-03-30 12:49     ` Richard Biener
  2016-03-30 18:52       ` Bernhard Reutner-Fischer
  2016-04-07 14:52       ` Tom de Vries
  0 siblings, 2 replies; 9+ messages in thread
From: Richard Biener @ 2016-03-30 12:49 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches

On Wed, 30 Mar 2016, Jan Hubicka wrote:

> > 
> > You are only changing one place in this file.
> 
> You are right. I am attaching the updated patch which I am re-testing now.
> > 
> > The vectorizer already checks this (albeit indirectly):
> > 
> >   HOST_WIDE_INT max_niter
> >     = max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
> >   if ((LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> >        && (LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor))
> >       || (max_niter != -1
> >           && (unsigned HOST_WIDE_INT) max_niter < vectorization_factor))
> >     {
> >       if (dump_enabled_p ())
> >         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >                          "not vectorized: iteration count smaller than "
> >                          "vectorization factor.\n");
> >       return false;
> >     }
> 
> Yes, but one tests only vectorization_factor and other min_profitable_estimate
> which probably should be greater than vectorization_factor.
> 
> The check above should therefore become redundant.  My reading of the code is
> that min_profiltable_estimate is computed after the check above, so it is
> probably an useful shortcut and the message is also bit more informative.
> I updated the later test to use max_niter variable once it is computed.
> 
> OK with those changes assuming testing passes?

Ok.

Richard.

> Honza
> 
> 	* tree-ssa-loop-niter.c (idx_infer_loop_bounds): We can't get realistic
> 	estimates here.
> 	* tree-ssa-loop-unswitch.c (tree_unswitch_single_loop): Use also
> 	max_loop_iterations_int.
> 	(tree_unswitch_outer_loop): Likewise.
> 	* tree-ssa-loop-ivopts.c (avg_loop_niter): Likewise.
> 	* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
> Index: tree-ssa-loop-ivopts.c
> ===================================================================
> --- tree-ssa-loop-ivopts.c	(revision 234516)
> +++ tree-ssa-loop-ivopts.c	(working copy)
> @@ -121,7 +121,11 @@ avg_loop_niter (struct loop *loop)
>  {
>    HOST_WIDE_INT niter = estimated_stmt_executions_int (loop);
>    if (niter == -1)
> -    return AVG_LOOP_NITER (loop);
> +    {
> +      niter = max_stmt_executions_int (loop);
> +      if (niter == -1 || niter > AVG_LOOP_NITER (loop))
> +        return AVG_LOOP_NITER (loop);
> +    }
>  
>    return niter;
>  }
> Index: tree-ssa-loop-niter.c
> ===================================================================
> --- tree-ssa-loop-niter.c	(revision 234516)
> +++ tree-ssa-loop-niter.c	(working copy)
> @@ -3115,7 +3115,6 @@ idx_infer_loop_bounds (tree base, tree *
>    tree low, high, type, next;
>    bool sign, upper = true, at_end = false;
>    struct loop *loop = data->loop;
> -  bool reliable = true;
>  
>    if (TREE_CODE (base) != ARRAY_REF)
>      return true;
> @@ -3187,14 +3186,14 @@ idx_infer_loop_bounds (tree base, tree *
>        && tree_int_cst_compare (next, high) <= 0)
>      return true;
>  
> -  /* If access is not executed on every iteration, we must ensure that overlow may
> -     not make the access valid later.  */
> +  /* If access is not executed on every iteration, we must ensure that overlow
> +     may not make the access valid later.  */
>    if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
>        && scev_probably_wraps_p (initial_condition_in_loop_num (ev, loop->num),
>  				step, data->stmt, loop, true))
> -    reliable = false;
> +    upper = false;
>  
> -  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, reliable, upper);
> +  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
>    return true;
>  }
>  
> Index: tree-ssa-loop-unswitch.c
> ===================================================================
> --- tree-ssa-loop-unswitch.c	(revision 234516)
> +++ tree-ssa-loop-unswitch.c	(working copy)
> @@ -223,6 +223,8 @@ tree_unswitch_single_loop (struct loop *
>        /* If the loop is not expected to iterate, there is no need
>  	 for unswitching.  */
>        iterations = estimated_loop_iterations_int (loop);
> +      if (iterations < 0)
> +        iterations = max_loop_iterations_int (loop);
>        if (iterations >= 0 && iterations <= 1)
>  	{
>  	  if (dump_file && (dump_flags & TDF_DETAILS))
> @@ -439,6 +441,8 @@ tree_unswitch_outer_loop (struct loop *l
>    /* If the loop is not expected to iterate, there is no need
>        for unswitching.  */
>    iterations = estimated_loop_iterations_int (loop);
> +  if (iterations < 0)
> +    iterations = max_loop_iterations_int (loop);
>    if (iterations >= 0 && iterations <= 1)
>      {
>        if (dump_file && (dump_flags & TDF_DETAILS))
> Index: tree-vect-loop.c
> ===================================================================
> --- tree-vect-loop.c	(revision 234516)
> +++ tree-vect-loop.c	(working copy)
> @@ -2063,6 +2063,8 @@ start_over:
>  
>    estimated_niter
>      = estimated_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
> +  if (estimated_niter != -1)
> +    estimated_niter = max_niter;
>    if (estimated_niter != -1
>        && ((unsigned HOST_WIDE_INT) estimated_niter
>            <= MAX (th, (unsigned)min_profitable_estimate)))
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Do not give realistic estimates for loop with array accesses
  2016-03-30 11:02 Do not give realistic estimates for loop with array accesses Jan Hubicka
  2016-03-30 12:09 ` Richard Biener
@ 2016-03-30 13:50 ` Bin.Cheng
  2016-03-30 14:41   ` Jan Hubicka
  1 sibling, 1 reply; 9+ messages in thread
From: Bin.Cheng @ 2016-03-30 13:50 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches List, Richard Biener

On Wed, Mar 30, 2016 at 11:00 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> Hi,
> while looking into sudoku solving benchark, I noticed that we incorrectly
> estimate loop to iterate 10 times just because the array it traverses is of
> dimension 10. This of course is just upper bound and not realistic bound.
> Realistically those loops iterates once most of time.
>
> It turns out this bug was introduced by me in
> https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00444.html
> While I do not recall doing that patch, it seems like a thinko about reliable
> (name of the variable) and realistic (name of the parameter it is passed to).
>
> Fixing this caused one testsuite fallout in predictive commoning testcase
> because loop unswitching previously disabled itself having an estimated number
> of iterations 1 (I am not sure if that is not supposed to be 0, with 1
> iteration unswithcing may pay back, little bit, I suppose it wants to test
> number of stmt executions of the condtional to be at least 2 which depends on
> whether the conditional is before or after the loop exits). This made me notice
> that some loop passes that want to give up on low trip count uses combination
> of estimated number of iterations and max number of iterations while other
> don't which is fixed here. The code combining both realistic and max counts
> is same as i.e. in unroller and other passes already.
>
> I also wonder if predictive comming is a win for loops with very low
> trip count, but that is for separate patch, too, anyway.
>
> Bootstrapped/regtested x86_64-linux, OK?
>
> Honza
>
>         * tree-ssa-loop-niter.c (idx_infer_loop_bounds): We can't get realistic
>         estimates here.
>         * tree-ssa-loop-unswitch.c (tree_unswitch_single_loop): Use also
>         max_loop_iterations_int.
>         * tree-ssa-loop-ivopts.c (avg_loop_niter): Likewise.
>         * tree-vect-loop.c (vect_analyze_loop_2): Likewise.
> Index: tree-ssa-loop-niter.c
> ===================================================================
> --- tree-ssa-loop-niter.c       (revision 234516)
> +++ tree-ssa-loop-niter.c       (working copy)
> @@ -3115,7 +3115,6 @@ idx_infer_loop_bounds (tree base, tree *
>    tree low, high, type, next;
>    bool sign, upper = true, at_end = false;
>    struct loop *loop = data->loop;
> -  bool reliable = true;
>
>    if (TREE_CODE (base) != ARRAY_REF)
>      return true;
> @@ -3187,14 +3186,14 @@ idx_infer_loop_bounds (tree base, tree *
>        && tree_int_cst_compare (next, high) <= 0)
>      return true;
>
> -  /* If access is not executed on every iteration, we must ensure that overlow may
> -     not make the access valid later.  */
> +  /* If access is not executed on every iteration, we must ensure that overlow
> +     may not make the access valid later.  */
>    if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
>        && scev_probably_wraps_p (initial_condition_in_loop_num (ev, loop->num),
>                                 step, data->stmt, loop, true))
> -    reliable = false;
> +    upper = false;
>
> -  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, reliable, upper);
> +  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
>    return true;
>  }
Hi,
I have a concern that GCC records bound information from basic blocks
even that don't dominate loop latch.  Given below example:

extern int b[];
void bar (int *);
int foo (int x, int n)
{
  int i;
  int arr[128] = {0};

  for (i = 0; i < n; i++)
    {
      if (x > i)
        {
          a[i] = i;
          b[i] = i;
        }
    }
  bar (arr);
  return 0;
}
The upper bound inferred from a[i] is 127.  This information is
recorded along with the stmt itself in loop->bounds.  Afterwards, this
information is also used in a flow-sensitive way.  In this example, we
are sure that &b[i] won't overflow (thus a SCEV) because it's in the
same basic block as a[i].  GCC currently relies on such information in
overflow detection for scev, i.e., loop_exits_before_overflow.

But with this change, we won't record upper bound information in
record_estimate because the parameter is set to false?

Thanks,
bin

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Do not give realistic estimates for loop with array accesses
  2016-03-30 13:50 ` Bin.Cheng
@ 2016-03-30 14:41   ` Jan Hubicka
  2016-03-30 15:30     ` Bin.Cheng
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Hubicka @ 2016-03-30 14:41 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Jan Hubicka, gcc-patches List, Richard Biener

> > -  /* If access is not executed on every iteration, we must ensure that overlow may
> > -     not make the access valid later.  */
> > +  /* If access is not executed on every iteration, we must ensure that overlow
> > +     may not make the access valid later.  */
> >    if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
> >        && scev_probably_wraps_p (initial_condition_in_loop_num (ev, loop->num),
> >                                 step, data->stmt, loop, true))
> > -    reliable = false;
> > +    upper = false;
> >
> > -  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, reliable, upper);
> > +  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
> >    return true;
> >  }
> Hi,
> I have a concern that GCC records bound information from basic blocks
> even that don't dominate loop latch.  Given below example:
> 
> extern int b[];
> void bar (int *);
> int foo (int x, int n)
> {
>   int i;
>   int arr[128] = {0};
> 
>   for (i = 0; i < n; i++)
>     {
>       if (x > i)
>         {
>           a[i] = i;
>           b[i] = i;
>         }
>     }
>   bar (arr);
>   return 0;
> }

This testcase is not affected, becase scev_probably_wraps_p returns false in this case.
In the wrapping case, we can't derive upper bound - this is indeed a correctness issue.

I am experiemtning with enabling loop peeling by default. For that I extended the code
to record likely upper bounds, which can be used to test if the loop most probably has
low trip count. This is also useful to trottle other transformations.

In this case we can probably assume that no sane person would count
on wrapping and record this as likely bound.

Honza

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Do not give realistic estimates for loop with array accesses
  2016-03-30 14:41   ` Jan Hubicka
@ 2016-03-30 15:30     ` Bin.Cheng
  0 siblings, 0 replies; 9+ messages in thread
From: Bin.Cheng @ 2016-03-30 15:30 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches List, Richard Biener

On Wed, Mar 30, 2016 at 3:22 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> > -  /* If access is not executed on every iteration, we must ensure that overlow may
>> > -     not make the access valid later.  */
>> > +  /* If access is not executed on every iteration, we must ensure that overlow
>> > +     may not make the access valid later.  */
>> >    if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
>> >        && scev_probably_wraps_p (initial_condition_in_loop_num (ev, loop->num),
>> >                                 step, data->stmt, loop, true))
>> > -    reliable = false;
>> > +    upper = false;
>> >
>> > -  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, reliable, upper);
>> > +  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
>> >    return true;
>> >  }
>> Hi,
>> I have a concern that GCC records bound information from basic blocks
>> even that don't dominate loop latch.  Given below example:
>>
>> extern int b[];
>> void bar (int *);
>> int foo (int x, int n)
>> {
>>   int i;
>>   int arr[128] = {0};
>>
>>   for (i = 0; i < n; i++)
>>     {
>>       if (x > i)
>>         {
>>           a[i] = i;
>>           b[i] = i;
>>         }
>>     }
>>   bar (arr);
>>   return 0;
>> }
>
> This testcase is not affected, becase scev_probably_wraps_p returns false in this case.
> In the wrapping case, we can't derive upper bound - this is indeed a correctness issue.
In the wrapping case, we still can derive upper bound if the index's
wrapping range is larger than array bound.  But I agree it looks like
very corner case and not likely be useful in practice.

Thanks,
bin
>
> I am experiemtning with enabling loop peeling by default. For that I extended the code
> to record likely upper bounds, which can be used to test if the loop most probably has
> low trip count. This is also useful to trottle other transformations.
>
> In this case we can probably assume that no sane person would count
> on wrapping and record this as likely bound.
>
> Honza

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Do not give realistic estimates for loop with array accesses
  2016-03-30 12:49     ` Richard Biener
@ 2016-03-30 18:52       ` Bernhard Reutner-Fischer
  2016-04-07 14:52       ` Tom de Vries
  1 sibling, 0 replies; 9+ messages in thread
From: Bernhard Reutner-Fischer @ 2016-03-30 18:52 UTC (permalink / raw)
  To: Richard Biener, Jan Hubicka; +Cc: gcc-patches

On March 30, 2016 2:36:14 PM GMT+02:00, Richard Biener <rguenther@suse.de> wrote:
>On Wed, 30 Mar 2016, Jan Hubicka wrote:
>
>> > 
>> > You are only changing one place in this file.
>> 
>> You are right. I am attaching the updated patch which I am re-testing
>now.
>> > 
>> > The vectorizer already checks this (albeit indirectly):
>> > 
>> >   HOST_WIDE_INT max_niter
>> >     = max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
>> >   if ((LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>> >        && (LOOP_VINFO_INT_NITERS (loop_vinfo) <
>vectorization_factor))
>> >       || (max_niter != -1
>> >           && (unsigned HOST_WIDE_INT) max_niter <
>vectorization_factor))
>> >     {
>> >       if (dump_enabled_p ())
>> >         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> >                          "not vectorized: iteration count smaller
>than "
>> >                          "vectorization factor.\n");
>> >       return false;
>> >     }
>> 
>> Yes, but one tests only vectorization_factor and other
>min_profitable_estimate
>> which probably should be greater than vectorization_factor.
>> 
>> The check above should therefore become redundant.  My reading of the
>code is
>> that min_profiltable_estimate is computed after the check above, so
>it is
>> probably an useful shortcut and the message is also bit more
>informative.
>> I updated the later test to use max_niter variable once it is
>computed.
>> 
>> OK with those changes assuming testing passes?
>
>Ok.

Maybe s/overlow/overflow/g while at it..

TIA,
>
>Richard.
>
>> Honza
>> 
>> 	* tree-ssa-loop-niter.c (idx_infer_loop_bounds): We can't get
>realistic
>> 	estimates here.
>> 	* tree-ssa-loop-unswitch.c (tree_unswitch_single_loop): Use also
>> 	max_loop_iterations_int.
>> 	(tree_unswitch_outer_loop): Likewise.
>> 	* tree-ssa-loop-ivopts.c (avg_loop_niter): Likewise.
>> 	* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
>> Index: tree-ssa-loop-ivopts.c
>> ===================================================================
>> --- tree-ssa-loop-ivopts.c	(revision 234516)
>> +++ tree-ssa-loop-ivopts.c	(working copy)
>> @@ -121,7 +121,11 @@ avg_loop_niter (struct loop *loop)
>>  {
>>    HOST_WIDE_INT niter = estimated_stmt_executions_int (loop);
>>    if (niter == -1)
>> -    return AVG_LOOP_NITER (loop);
>> +    {
>> +      niter = max_stmt_executions_int (loop);
>> +      if (niter == -1 || niter > AVG_LOOP_NITER (loop))
>> +        return AVG_LOOP_NITER (loop);
>> +    }
>>  
>>    return niter;
>>  }
>> Index: tree-ssa-loop-niter.c
>> ===================================================================
>> --- tree-ssa-loop-niter.c	(revision 234516)
>> +++ tree-ssa-loop-niter.c	(working copy)
>> @@ -3115,7 +3115,6 @@ idx_infer_loop_bounds (tree base, tree *
>>    tree low, high, type, next;
>>    bool sign, upper = true, at_end = false;
>>    struct loop *loop = data->loop;
>> -  bool reliable = true;
>>  
>>    if (TREE_CODE (base) != ARRAY_REF)
>>      return true;
>> @@ -3187,14 +3186,14 @@ idx_infer_loop_bounds (tree base, tree *
>>        && tree_int_cst_compare (next, high) <= 0)
>>      return true;
>>  
>> -  /* If access is not executed on every iteration, we must ensure
>that overlow may
>> -     not make the access valid later.  */
>> +  /* If access is not executed on every iteration, we must ensure
>that overlow
>> +     may not make the access valid later.  */
>>    if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb
>(data->stmt))
>>        && scev_probably_wraps_p (initial_condition_in_loop_num (ev,
>loop->num),
>>  				step, data->stmt, loop, true))
>> -    reliable = false;
>> +    upper = false;
>>  
>> -  record_nonwrapping_iv (loop, init, step, data->stmt, low, high,
>reliable, upper);
>> +  record_nonwrapping_iv (loop, init, step, data->stmt, low, high,
>false, upper);
>>    return true;
>>  }
>>  
>> Index: tree-ssa-loop-unswitch.c
>> ===================================================================
>> --- tree-ssa-loop-unswitch.c	(revision 234516)
>> +++ tree-ssa-loop-unswitch.c	(working copy)
>> @@ -223,6 +223,8 @@ tree_unswitch_single_loop (struct loop *
>>        /* If the loop is not expected to iterate, there is no need
>>  	 for unswitching.  */
>>        iterations = estimated_loop_iterations_int (loop);
>> +      if (iterations < 0)
>> +        iterations = max_loop_iterations_int (loop);
>>        if (iterations >= 0 && iterations <= 1)
>>  	{
>>  	  if (dump_file && (dump_flags & TDF_DETAILS))
>> @@ -439,6 +441,8 @@ tree_unswitch_outer_loop (struct loop *l
>>    /* If the loop is not expected to iterate, there is no need
>>        for unswitching.  */
>>    iterations = estimated_loop_iterations_int (loop);
>> +  if (iterations < 0)
>> +    iterations = max_loop_iterations_int (loop);
>>    if (iterations >= 0 && iterations <= 1)
>>      {
>>        if (dump_file && (dump_flags & TDF_DETAILS))
>> Index: tree-vect-loop.c
>> ===================================================================
>> --- tree-vect-loop.c	(revision 234516)
>> +++ tree-vect-loop.c	(working copy)
>> @@ -2063,6 +2063,8 @@ start_over:
>>  
>>    estimated_niter
>>      = estimated_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
>> +  if (estimated_niter != -1)
>> +    estimated_niter = max_niter;
>>    if (estimated_niter != -1
>>        && ((unsigned HOST_WIDE_INT) estimated_niter
>>            <= MAX (th, (unsigned)min_profitable_estimate)))
>> 
>> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Do not give realistic estimates for loop with array accesses
  2016-03-30 12:49     ` Richard Biener
  2016-03-30 18:52       ` Bernhard Reutner-Fischer
@ 2016-04-07 14:52       ` Tom de Vries
  1 sibling, 0 replies; 9+ messages in thread
From: Tom de Vries @ 2016-04-07 14:52 UTC (permalink / raw)
  To: Richard Biener, Jan Hubicka; +Cc: gcc-patches

On 30/03/16 14:36, Richard Biener wrote:
> On Wed, 30 Mar 2016, Jan Hubicka wrote:
>
>>> > >
>>> > >You are only changing one place in this file.
>> >
>> >You are right. I am attaching the updated patch which I am re-testing now.
>>> > >
>>> > >The vectorizer already checks this (albeit indirectly):
>>> > >
>>> > >   HOST_WIDE_INT max_niter
>>> > >     = max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
>>> > >   if ((LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>>> > >        && (LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor))
>>> > >       || (max_niter != -1
>>> > >           && (unsigned HOST_WIDE_INT) max_niter < vectorization_factor))
>>> > >     {
>>> > >       if (dump_enabled_p ())
>>> > >         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>> > >                          "not vectorized: iteration count smaller than "
>>> > >                          "vectorization factor.\n");
>>> > >       return false;
>>> > >     }
>> >
>> >Yes, but one tests only vectorization_factor and other min_profitable_estimate
>> >which probably should be greater than vectorization_factor.
>> >
>> >The check above should therefore become redundant.  My reading of the code is
>> >that min_profiltable_estimate is computed after the check above, so it is
>> >probably an useful shortcut and the message is also bit more informative.
>> >I updated the later test to use max_niter variable once it is computed.
>> >
>> >OK with those changes assuming testing passes?
> Ok.

This patch caused PR70577 - 'tree-ssa/prefetch-5.c scan-tree-dump-times 
aprefetch failures' ( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70577 ).

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-04-07 14:52 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-30 11:02 Do not give realistic estimates for loop with array accesses Jan Hubicka
2016-03-30 12:09 ` Richard Biener
2016-03-30 12:36   ` Jan Hubicka
2016-03-30 12:49     ` Richard Biener
2016-03-30 18:52       ` Bernhard Reutner-Fischer
2016-04-07 14:52       ` Tom de Vries
2016-03-30 13:50 ` Bin.Cheng
2016-03-30 14:41   ` Jan Hubicka
2016-03-30 15:30     ` Bin.Cheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).