public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 1/2] Check negative combined step
@ 2022-01-13  1:48 Jiufu Guo
  2022-01-13  1:48 ` [PATCH 2/2] Add assumption combining iv Jiufu Guo
  2022-01-24 10:37 ` [PATCH 1/2] Check negative combined step Richard Biener
  0 siblings, 2 replies; 11+ messages in thread
From: Jiufu Guo @ 2022-01-13  1:48 UTC (permalink / raw)
  To: gcc-patches
  Cc: rguenther, amker.cheng, guojiufu, wschmidt, segher, dje.gcc, jlaw

Hi,

Previously, there is discussion in:
https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586460.html
I seperate it as two patches.

This first patch is to avoid negative step when combining two ivs.
The second patch is adding more accurate assumptions.

This patch pass bootstrap and regtest on ppc64, ppc64le and x86_64.
Is this ok for trunk?

BR,
Jiufu

	PR tree-optimization/100740

gcc/ChangeLog:

	* tree-ssa-loop-niter.c (number_of_iterations_cond): Check
	sign of combined step.

gcc/testsuite/ChangeLog:

	* gcc.c-torture/execute/pr100740.c: New test.



---
 gcc/tree-ssa-loop-niter.c                      |  6 ++++--
 gcc/testsuite/gcc.c-torture/execute/pr100740.c | 13 +++++++++++++
 2 files changed, 17 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr100740.c

diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index b767056aeb0..439d595a79f 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -1890,8 +1890,10 @@ number_of_iterations_cond (class loop *loop,
       tree step = fold_binary_to_constant (MINUS_EXPR, step_type,
 					   iv0->step, iv1->step);
 
-      /* No need to check sign of the new step since below code takes care
-	 of this well.  */
+      /* Like cases shown in PR100740/102131, negtive step is not safe.  */
+      if (tree_int_cst_sign_bit (step))
+	return false;
+
       if (code != NE_EXPR
 	  && (TREE_CODE (step) != INTEGER_CST
 	      || !iv0->no_overflow || !iv1->no_overflow))
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr100740.c b/gcc/testsuite/gcc.c-torture/execute/pr100740.c
new file mode 100644
index 00000000000..381cdeb947a
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr100740.c
@@ -0,0 +1,13 @@
+/* PR tree-optimization/100740 */
+
+unsigned a, b;
+int
+main ()
+{
+  unsigned c = 0;
+  for (a = 0; a < 2; a++)
+    for (b = 0; b < 2; b++)
+      if (++c < a)
+	__builtin_abort ();
+  return 0;
+}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 2/2] Add assumption combining iv
  2022-01-13  1:48 [PATCH 1/2] Check negative combined step Jiufu Guo
@ 2022-01-13  1:48 ` Jiufu Guo
  2022-01-24 10:37 ` [PATCH 1/2] Check negative combined step Richard Biener
  1 sibling, 0 replies; 11+ messages in thread
From: Jiufu Guo @ 2022-01-13  1:48 UTC (permalink / raw)
  To: gcc-patches
  Cc: rguenther, amker.cheng, guojiufu, wschmidt, segher, dje.gcc, jlaw

This is the second patch for two IVs combining.
When one IV is chasing another one, to make it safe, we should check if
there is wrap/overflow for either IV.

With the assumption, which computed as this patch, the number of iterations
can be caculated, even the no_overflow flag is not updated for unsigned IVs,
like the test case of this patch.

This patch pass bootstrap and regtest on ppc64le and x86_64.
Is this ok for trunk, or it may more suitable for stage1.

BR,
Jiufu

	PR tree-optimization/102131

gcc/ChangeLog:

	* tree-ssa-loop-niter.c (get_step_count): New function.
	(iv_chase_assumption): New function.
	(number_of_iterations_cond): Call iv_chase_assumption.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/pr102131.c: New test.


---
 gcc/tree-ssa-loop-niter.c            | 73 ++++++++++++++++++++++++----
 gcc/testsuite/gcc.dg/vect/pr102131.c | 47 ++++++++++++++++++
 2 files changed, 110 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr102131.c

diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 439d595a79f..56261164f28 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -1788,6 +1788,63 @@ dump_affine_iv (FILE *file, affine_iv *iv)
     }
 }
 
+/* Generate expr: (HIGH - LOW) / STEP, under UTYPE. */
+
+static tree
+get_step_count (tree high, tree low, tree step, tree utype,
+		bool end_inclusive = false)
+{
+  tree delta = fold_build2 (MINUS_EXPR, TREE_TYPE (low), high, low);
+  delta = fold_convert (utype, delta);
+  if (end_inclusive)
+    delta = fold_build2 (PLUS_EXPR, utype, delta, build_one_cst (utype));
+
+  if (tree_int_cst_sign_bit (step))
+    step = fold_build1 (NEGATE_EXPR, TREE_TYPE (step), step);
+  step = fold_convert (utype, step);
+
+  return fold_build2 (FLOOR_DIV_EXPR, utype, delta, step);
+}
+
+/*  Get the additional assumption if both two steps are not zero.
+    Assumptions satisfy that there is no overflow or wrap during
+    v0 and v1 chasing.  */
+
+static tree
+iv_chase_assumption (affine_iv *iv0, affine_iv *iv1, tree step,
+		     enum tree_code code)
+{
+  /* Here, no need additional assumptions for NE.  */
+  if (code == NE_EXPR)
+    return boolean_true_node;
+
+  /* No need addition assumption for pointer.  */
+  tree type = TREE_TYPE (iv0->base);
+  if (POINTER_TYPE_P (type))
+    return boolean_true_node;
+
+  bool positive0 = !tree_int_cst_sign_bit (iv0->step);
+  bool positive1 = !tree_int_cst_sign_bit (iv1->step);
+  tree utype = unsigned_type_for (type);
+  bool add1 = code == LE_EXPR;
+  tree niter = get_step_count (iv1->base, iv0->base, step, utype, add1);
+
+  int prec = TYPE_PRECISION (type);
+  signop sgn = TYPE_SIGN (type);
+  tree max = wide_int_to_tree (type, wi::max_value (prec, sgn));
+  tree min = wide_int_to_tree (type, wi::min_value (prec, sgn));
+  tree valid_niter0, valid_niter1;
+
+  valid_niter0 = positive0 ? get_step_count (max, iv0->base, iv0->step, utype)
+			   : get_step_count (iv0->base, min, iv0->step, utype);
+  valid_niter1 = positive1 ? get_step_count (max, iv1->base, iv1->step, utype)
+			   : get_step_count (iv1->base, min, iv1->step, utype);
+
+  tree e0 = fold_build2 (LT_EXPR, boolean_type_node, niter, valid_niter0);
+  tree e1 = fold_build2 (LT_EXPR, boolean_type_node, niter, valid_niter1);
+  return fold_build2 (TRUTH_AND_EXPR, boolean_type_node, e0, e1);
+}
+
 /* Determine the number of iterations according to condition (for staying
    inside loop) which compares two induction variables using comparison
    operator CODE.  The induction variable on left side of the comparison
@@ -1879,11 +1936,10 @@ number_of_iterations_cond (class loop *loop,
        {iv0.base, iv0.step - iv1.step} cmp_code {iv1.base, 0}
 
      provided that either below condition is satisfied:
+     a. iv0.step and iv1.step are integer.
+     b. Additional condition: before iv0 chase up v1, iv0 and iv1 should not
+     step over min or max of the type.  */
 
-       a) the test is NE_EXPR;
-       b) iv0.step - iv1.step is integer and iv0/iv1 don't overflow.
-
-     This rarely occurs in practice, but it is simple enough to manage.  */
   if (!integer_zerop (iv0->step) && !integer_zerop (iv1->step))
     {
       tree step_type = POINTER_TYPE_P (type) ? sizetype : type;
@@ -1894,15 +1950,12 @@ number_of_iterations_cond (class loop *loop,
       if (tree_int_cst_sign_bit (step))
 	return false;
 
-      if (code != NE_EXPR
-	  && (TREE_CODE (step) != INTEGER_CST
-	      || !iv0->no_overflow || !iv1->no_overflow))
+      niter->assumptions = iv_chase_assumption (iv0, iv1, step, code);
+      if (integer_zerop (niter->assumptions))
 	return false;
 
       iv0->step = step;
-      if (!POINTER_TYPE_P (type))
-	iv0->no_overflow = false;
-
+      iv0->no_overflow = true;
       iv1->step = build_int_cst (step_type, 0);
       iv1->no_overflow = true;
     }
diff --git a/gcc/testsuite/gcc.dg/vect/pr102131.c b/gcc/testsuite/gcc.dg/vect/pr102131.c
new file mode 100644
index 00000000000..23975cfeadb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr102131.c
@@ -0,0 +1,47 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+#define MAX ((unsigned int) 0xffffffff)
+#define MIN ((unsigned int) (0))
+
+int arr[512];
+
+#define FUNC(NAME, CODE, S0, S1)                                               \
+  unsigned __attribute__ ((noinline)) NAME (unsigned int b0, unsigned int b1)  \
+  {                                                                            \
+    unsigned int n = 0;                                                        \
+    unsigned int i0, i1;                                                       \
+    int *p = arr;                                                              \
+    for (i0 = b0, i1 = b1; i0 CODE i1; i0 += S0, i1 += S1)                     \
+      {                                                                        \
+	n++;                                                                   \
+	*p++ = i0 + i1;                                                        \
+      }                                                                        \
+    return n;                                                                  \
+  }
+
+FUNC (lt_5_1, <, 5, 1);
+FUNC (le_1_m5, <=, 1, -5);
+FUNC (lt_1_10, <, 1, 10);
+
+int
+main ()
+{
+  int fail = 0;
+  if (lt_5_1 (MAX - 124, MAX - 27) != 28)
+    fail++;
+
+  /* to save time, do not run this. */
+  /*
+  if (le_1_m5 (MIN + 1, MIN + 9) != 715827885)
+    fail++;  */
+
+  if (lt_1_10 (MAX - 1000, MAX - 500) != 51)
+    fail++;
+
+  if (fail)
+    __builtin_abort ();
+  
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] Check negative combined step
  2022-01-13  1:48 [PATCH 1/2] Check negative combined step Jiufu Guo
  2022-01-13  1:48 ` [PATCH 2/2] Add assumption combining iv Jiufu Guo
@ 2022-01-24 10:37 ` Richard Biener
  2022-01-25  4:25   ` Jiufu Guo
  1 sibling, 1 reply; 11+ messages in thread
From: Richard Biener @ 2022-01-24 10:37 UTC (permalink / raw)
  To: Jiufu Guo; +Cc: gcc-patches, amker.cheng, wschmidt, segher, dje.gcc, jlaw

On Thu, 13 Jan 2022, Jiufu Guo wrote:

> Hi,
> 
> Previously, there is discussion in:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586460.html
> I seperate it as two patches.
> 
> This first patch is to avoid negative step when combining two ivs.
> The second patch is adding more accurate assumptions.
> 
> This patch pass bootstrap and regtest on ppc64, ppc64le and x86_64.
> Is this ok for trunk?
> 
> BR,
> Jiufu
> 
> 	PR tree-optimization/100740
> 
> gcc/ChangeLog:
> 
> 	* tree-ssa-loop-niter.c (number_of_iterations_cond): Check
> 	sign of combined step.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.c-torture/execute/pr100740.c: New test.
> 
> 
> 
> ---
>  gcc/tree-ssa-loop-niter.c                      |  6 ++++--
>  gcc/testsuite/gcc.c-torture/execute/pr100740.c | 13 +++++++++++++
>  2 files changed, 17 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr100740.c
> 
> diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
> index b767056aeb0..439d595a79f 100644
> --- a/gcc/tree-ssa-loop-niter.c
> +++ b/gcc/tree-ssa-loop-niter.c
> @@ -1890,8 +1890,10 @@ number_of_iterations_cond (class loop *loop,
>        tree step = fold_binary_to_constant (MINUS_EXPR, step_type,
>  					   iv0->step, iv1->step);
>  
> -      /* No need to check sign of the new step since below code takes care
> -	 of this well.  */
> +      /* Like cases shown in PR100740/102131, negtive step is not safe.  */
> +      if (tree_int_cst_sign_bit (step))
> +	return false;
> +
>        if (code != NE_EXPR
>  	  && (TREE_CODE (step) != INTEGER_CST
>  	      || !iv0->no_overflow || !iv1->no_overflow))

I think for NE_EXPR the sign is not relevant.  I think the key is
that we require that iv0->no_overflow and for non-equality checks
adjusting X + C1 < Y + C2 to X + C1 - C2 < Y is only valid if that
does not cause any overflow on either side.  On the LHS this is
only guaranteed if the absolute value of C1 - C2 is smaller than
that of C1 and if it has the same sign.

With the same reasoning we then know the new IV0 doesn't overflow.

So something like the following.  IIRC I've proposed sth similar
a while back.  I'm going to give it some testing, collecting
testcases from related PRs.

diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
index b767056aeb0..74fa4f66ee2 100644
--- a/gcc/tree-ssa-loop-niter.cc
+++ b/gcc/tree-ssa-loop-niter.cc
@@ -1890,17 +1890,28 @@ number_of_iterations_cond (class loop *loop,
       tree step = fold_binary_to_constant (MINUS_EXPR, step_type,
                                           iv0->step, iv1->step);
 
-      /* No need to check sign of the new step since below code takes 
care
-        of this well.  */
-      if (code != NE_EXPR
-         && (TREE_CODE (step) != INTEGER_CST
-             || !iv0->no_overflow || !iv1->no_overflow))
-       return false;
+      /* For code other than NE_EXPR we have to ensure moving the 
evolution
+        of IV1 to that of IV0 does not introduce overflow.  */
+      if (TREE_CODE (step) != INTEGER_CST
+         || !iv0->no_overflow || !iv1->no_overflow)
+       {
+         if (code != NE_EXPR)
+           return false;
+         iv0->no_overflow = false;
+       }
+      /* If the new step of IV0 has changed sign or is of greater
+        magnitude then we do not know whether IV0 does overflow
+        and thus the transform is not valid for code other than NE_EXPR  
*/
+      else if (tree_int_cst_sign_bit (step) != tree_int_cst_sign_bit 
(iv0->step)
+              || wi::gtu_p (wi::abs (wi::to_widest (step)),
+                            wi::abs (wi::to_widest (iv0->step))))
+       {
+         if (code != NE_EXPR)
+           return false;
+         iv0->no_overflow = false;
+       }
 
       iv0->step = step;
-      if (!POINTER_TYPE_P (type))
-       iv0->no_overflow = false;
-
       iv1->step = build_int_cst (step_type, 0);
       iv1->no_overflow = true;
     }


> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr100740.c b/gcc/testsuite/gcc.c-torture/execute/pr100740.c
> new file mode 100644
> index 00000000000..381cdeb947a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr100740.c
> @@ -0,0 +1,13 @@
> +/* PR tree-optimization/100740 */
> +
> +unsigned a, b;
> +int
> +main ()
> +{
> +  unsigned c = 0;
> +  for (a = 0; a < 2; a++)
> +    for (b = 0; b < 2; b++)
> +      if (++c < a)
> +	__builtin_abort ();
> +  return 0;
> +}
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] Check negative combined step
  2022-01-24 10:37 ` [PATCH 1/2] Check negative combined step Richard Biener
@ 2022-01-25  4:25   ` Jiufu Guo
  2022-01-25  5:15     ` Jiufu Guo
  0 siblings, 1 reply; 11+ messages in thread
From: Jiufu Guo @ 2022-01-25  4:25 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, amker.cheng, wschmidt, segher, dje.gcc, jlaw

Richard Biener <rguenther@suse.de> writes:

> On Thu, 13 Jan 2022, Jiufu Guo wrote:
...
>
>> -      /* No need to check sign of the new step since below code takes care
>> -	 of this well.  */
>> +      /* Like cases shown in PR100740/102131, negtive step is not safe.  */
>> +      if (tree_int_cst_sign_bit (step))
>> +	return false;
>> +
>>        if (code != NE_EXPR
>>  	  && (TREE_CODE (step) != INTEGER_CST
>>  	      || !iv0->no_overflow || !iv1->no_overflow))
>
> I think for NE_EXPR the sign is not relevant.  I think the key is
> that we require that iv0->no_overflow and for non-equality checks
> adjusting X + C1 < Y + C2 to X + C1 - C2 < Y is only valid if that
> does not cause any overflow on either side.  On the LHS this is

Hi Richard,

Thanks a lot for your comments and ideas!

Right! The adjusting is safe only if we can make sure there is
no overflow/wrap on either side or say there is no overflow/wrap
on three 'iv's: {X,C1}, {Y,C2} and {X, C1 - C2}.

Or it may also ok if we can compute an assumption, under which
all three ivs are not overflowed/wrapped.

> only guaranteed if the absolute value of C1 - C2 is smaller than
> that of C1 and if it has the same sign.
I'm thinking this in another way:
When trying to do this transform in number_of_iterations_cond,
GT/GE is inverted to LT/LE, then the compare code would be:
LT/LE or NE.

For LT/LE, like {X, C1} < {Y, C2}, we can look it as iv0 is
chasing iv1.  We would able to assume X < Y (may_be_zero would
be set later via number_of_iterations_lt/le).
1. If C1 < C2, iv0 can never catch up iv1. For examples:
{X, 1} < {Y, 2}; {X, -2} < {Y, -1}; {X, -2} < {Y, 1}.
And there must be at least one overflow/wrap in iv0,iv1, or iv.
This indicates, if the sign of (C1 - C1) is negative, then the
transform would be incorrect.
2. If C1 > C2, we still need to make sure all the ivs (iv0,
iv1 and combined iv) are not wrapped.
For C2 > 0, {Y,C2} should not cross MAX before {X, C1} catch up.
   the assumption may like : (MAX-Y)/C2 > (Y-X)/(C1-C1)
For C1 < 0, {X,C1} should not down cross MIN
   the assumption may like : (X-MIN)/-C1 > (Y-X)/(C1-C1)
For C1 > 0 and C2 < 0, iv0 and iv1 are walking to each other,
it would be almost safe.

For NE, it seems more interesting. The transformation depends
on 3 things: 1. the relation between X and Y; 2 the sign
of (C1-C2); 3. if iv0 and iv1 can be equal finally.
The 3rd one may be more special.
The good news is, number_of_iterations_ne seems able to handle NE.

>
> With the same reasoning we then know the new IV0 doesn't overflow.
>
> So something like the following.  IIRC I've proposed sth similar
> a while back.  I'm going to give it some testing, collecting
> testcases from related PRs.
>
> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> index b767056aeb0..74fa4f66ee2 100644
> --- a/gcc/tree-ssa-loop-niter.cc
> +++ b/gcc/tree-ssa-loop-niter.cc
> @@ -1890,17 +1890,28 @@ number_of_iterations_cond (class loop *loop,
>        tree step = fold_binary_to_constant (MINUS_EXPR, step_type,
>                                            iv0->step, iv1->step);
>  
> -      /* No need to check sign of the new step since below code takes 
> care
> -        of this well.  */
> -      if (code != NE_EXPR
> -         && (TREE_CODE (step) != INTEGER_CST
> -             || !iv0->no_overflow || !iv1->no_overflow))
> -       return false;
> +      /* For code other than NE_EXPR we have to ensure moving the 
> evolution
> +        of IV1 to that of IV0 does not introduce overflow.  */
> +      if (TREE_CODE (step) != INTEGER_CST
> +         || !iv0->no_overflow || !iv1->no_overflow)
> +       {
I was also trying to leverage no_overflow of iv0 and iv1. While it seems
the computation logic of no_overflow is related to the type of IV.  If the
type of IV is signed, the C semantics may be used, overflow in signed
IV are treated UB, and then no_overflow would be true.

For unsigned IV, no_overflow would be false, even for the cases which
looks like:
"{10, 2} < {20, 1}", which would be ok to compute niter.

BR,
Jiufu

> +         if (code != NE_EXPR)
> +           return false;
> +         iv0->no_overflow = false;
> +       }
> +      /* If the new step of IV0 has changed sign or is of greater
> +        magnitude then we do not know whether IV0 does overflow
> +        and thus the transform is not valid for code other than NE_EXPR  
> */
> +      else if (tree_int_cst_sign_bit (step) != tree_int_cst_sign_bit 
> (iv0->step)
> +              || wi::gtu_p (wi::abs (wi::to_widest (step)),
> +                            wi::abs (wi::to_widest (iv0->step))))
> +       {
> +         if (code != NE_EXPR)
> +           return false;
> +         iv0->no_overflow = false;
> +       }
>  
>        iv0->step = step;
> -      if (!POINTER_TYPE_P (type))
> -       iv0->no_overflow = false;
> -
>        iv1->step = build_int_cst (step_type, 0);
>        iv1->no_overflow = true;
>      }
>
>
>> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr100740.c
>> b/gcc/testsuite/gcc.c-torture/execute/pr100740.c
>> new file mode 100644
>> index 00000000000..381cdeb947a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.c-torture/execute/pr100740.c
>> @@ -0,0 +1,13 @@
>> +/* PR tree-optimization/100740 */
>> +
>> +unsigned a, b;
>> +int
>> +main ()
>> +{
>> +  unsigned c = 0;
>> +  for (a = 0; a < 2; a++)
>> +    for (b = 0; b < 2; b++)
>> +      if (++c < a)
>> +	__builtin_abort ();
>> +  return 0;
>> +}
>> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] Check negative combined step
  2022-01-25  4:25   ` Jiufu Guo
@ 2022-01-25  5:15     ` Jiufu Guo
  2022-01-25  7:47       ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Jiufu Guo @ 2022-01-25  5:15 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, amker.cheng, wschmidt, segher, dje.gcc, jlaw

Jiufu Guo <guojiufu@linux.ibm.com> writes:

> Richard Biener <rguenther@suse.de> writes:
>
>> On Thu, 13 Jan 2022, Jiufu Guo wrote:
> ...
>>
>>> -      /* No need to check sign of the new step since below code takes care
>>> -	 of this well.  */
>>> +      /* Like cases shown in PR100740/102131, negtive step is not safe.  */
>>> +      if (tree_int_cst_sign_bit (step))
>>> +	return false;
>>> +
>>>        if (code != NE_EXPR
>>>  	  && (TREE_CODE (step) != INTEGER_CST
>>>  	      || !iv0->no_overflow || !iv1->no_overflow))
>>
>> I think for NE_EXPR the sign is not relevant.  I think the key is
>> that we require that iv0->no_overflow and for non-equality checks
>> adjusting X + C1 < Y + C2 to X + C1 - C2 < Y is only valid if that
>> does not cause any overflow on either side.  On the LHS this is
>
> Hi Richard,
>
> Thanks a lot for your comments and ideas!
>
> Right! The adjusting is safe only if we can make sure there is
> no overflow/wrap on either side or say there is no overflow/wrap
> on three 'iv's: {X,C1}, {Y,C2} and {X, C1 - C2}.
>
> Or it may also ok if we can compute an assumption, under which
> all three ivs are not overflowed/wrapped.
>
>> only guaranteed if the absolute value of C1 - C2 is smaller than
>> that of C1 and if it has the same sign.
> I'm thinking this in another way:
> When trying to do this transform in number_of_iterations_cond,
> GT/GE is inverted to LT/LE, then the compare code would be:
> LT/LE or NE.
>
> For LT/LE, like {X, C1} < {Y, C2}, we can look it as iv0 is
> chasing iv1.  We would able to assume X < Y (may_be_zero would
> be set later via number_of_iterations_lt/le).
> 1. If C1 < C2, iv0 can never catch up iv1. For examples:
> {X, 1} < {Y, 2}; {X, -2} < {Y, -1}; {X, -2} < {Y, 1}.
> And there must be at least one overflow/wrap in iv0,iv1, or iv.
> This indicates, if the sign of (C1 - C1) is negative, then the
> transform would be incorrect.
> 2. If C1 > C2, we still need to make sure all the ivs (iv0,
> iv1 and combined iv) are not wrapped.
> For C2 > 0, {Y,C2} should not cross MAX before {X, C1} catch up.
>    the assumption may like : (MAX-Y)/C2 > (Y-X)/(C1-C1)
There is still some cases: iv0 step is too large, then iv0 wraps
first, e.g. {MAX-5, 10} < {MAX-3, 1}. For this, the assumption
would need to and with (MAX-X)/C1 > (Y-X)/(C1-C1).

> For C1 < 0, {X,C1} should not down cross MIN
>    the assumption may like : (X-MIN)/-C1 > (Y-X)/(C1-C1)
  Also add the assumption: (Y-MIN)/-C2 > (Y-X)/(C1-C1)

> For C1 > 0 and C2 < 0, iv0 and iv1 are walking to each other,
> it would be almost safe.
For this case, we may still add the assumption to avoid wraping
at the first iteration.

BR,
Jiufu

>
> For NE, it seems more interesting. The transformation depends
> on 3 things: 1. the relation between X and Y; 2 the sign
> of (C1-C2); 3. if iv0 and iv1 can be equal finally.
> The 3rd one may be more special.
> The good news is, number_of_iterations_ne seems able to handle NE.
>
>>
>> With the same reasoning we then know the new IV0 doesn't overflow.
>>
>> So something like the following.  IIRC I've proposed sth similar
>> a while back.  I'm going to give it some testing, collecting
>> testcases from related PRs.
>>
>> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
>> index b767056aeb0..74fa4f66ee2 100644
>> --- a/gcc/tree-ssa-loop-niter.cc
>> +++ b/gcc/tree-ssa-loop-niter.cc
>> @@ -1890,17 +1890,28 @@ number_of_iterations_cond (class loop *loop,
>>        tree step = fold_binary_to_constant (MINUS_EXPR, step_type,
>>                                            iv0->step, iv1->step);
>>  
>> -      /* No need to check sign of the new step since below code takes 
>> care
>> -        of this well.  */
>> -      if (code != NE_EXPR
>> -         && (TREE_CODE (step) != INTEGER_CST
>> -             || !iv0->no_overflow || !iv1->no_overflow))
>> -       return false;
>> +      /* For code other than NE_EXPR we have to ensure moving the 
>> evolution
>> +        of IV1 to that of IV0 does not introduce overflow.  */
>> +      if (TREE_CODE (step) != INTEGER_CST
>> +         || !iv0->no_overflow || !iv1->no_overflow)
>> +       {
> I was also trying to leverage no_overflow of iv0 and iv1. While it seems
> the computation logic of no_overflow is related to the type of IV.  If the
> type of IV is signed, the C semantics may be used, overflow in signed
> IV are treated UB, and then no_overflow would be true.
>
> For unsigned IV, no_overflow would be false, even for the cases which
> looks like:
> "{10, 2} < {20, 1}", which would be ok to compute niter.
>
> BR,
> Jiufu
>
>> +         if (code != NE_EXPR)
>> +           return false;
>> +         iv0->no_overflow = false;
>> +       }
>> +      /* If the new step of IV0 has changed sign or is of greater
>> +        magnitude then we do not know whether IV0 does overflow
>> +        and thus the transform is not valid for code other than NE_EXPR  
>> */
>> +      else if (tree_int_cst_sign_bit (step) != tree_int_cst_sign_bit 
>> (iv0->step)
>> +              || wi::gtu_p (wi::abs (wi::to_widest (step)),
>> +                            wi::abs (wi::to_widest (iv0->step))))
>> +       {
>> +         if (code != NE_EXPR)
>> +           return false;
>> +         iv0->no_overflow = false;
>> +       }
>>  
>>        iv0->step = step;
>> -      if (!POINTER_TYPE_P (type))
>> -       iv0->no_overflow = false;
>> -
>>        iv1->step = build_int_cst (step_type, 0);
>>        iv1->no_overflow = true;
>>      }
>>
>>
>>> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr100740.c
>>> b/gcc/testsuite/gcc.c-torture/execute/pr100740.c
>>> new file mode 100644
>>> index 00000000000..381cdeb947a
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.c-torture/execute/pr100740.c
>>> @@ -0,0 +1,13 @@
>>> +/* PR tree-optimization/100740 */
>>> +
>>> +unsigned a, b;
>>> +int
>>> +main ()
>>> +{
>>> +  unsigned c = 0;
>>> +  for (a = 0; a < 2; a++)
>>> +    for (b = 0; b < 2; b++)
>>> +      if (++c < a)
>>> +	__builtin_abort ();
>>> +  return 0;
>>> +}
>>> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] Check negative combined step
  2022-01-25  5:15     ` Jiufu Guo
@ 2022-01-25  7:47       ` Richard Biener
  2022-01-25 11:42         ` Jiufu Guo
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Biener @ 2022-01-25  7:47 UTC (permalink / raw)
  To: Jiufu Guo; +Cc: gcc-patches, amker.cheng, wschmidt, segher, dje.gcc, jlaw

On Tue, 25 Jan 2022, Jiufu Guo wrote:

> Jiufu Guo <guojiufu@linux.ibm.com> writes:
> 
> > Richard Biener <rguenther@suse.de> writes:
> >
> >> On Thu, 13 Jan 2022, Jiufu Guo wrote:
> > ...
> >>
> >>> -      /* No need to check sign of the new step since below code takes care
> >>> -	 of this well.  */
> >>> +      /* Like cases shown in PR100740/102131, negtive step is not safe.  */
> >>> +      if (tree_int_cst_sign_bit (step))
> >>> +	return false;
> >>> +
> >>>        if (code != NE_EXPR
> >>>  	  && (TREE_CODE (step) != INTEGER_CST
> >>>  	      || !iv0->no_overflow || !iv1->no_overflow))
> >>
> >> I think for NE_EXPR the sign is not relevant.  I think the key is
> >> that we require that iv0->no_overflow and for non-equality checks
> >> adjusting X + C1 < Y + C2 to X + C1 - C2 < Y is only valid if that
> >> does not cause any overflow on either side.  On the LHS this is
> >
> > Hi Richard,
> >
> > Thanks a lot for your comments and ideas!
> >
> > Right! The adjusting is safe only if we can make sure there is
> > no overflow/wrap on either side or say there is no overflow/wrap
> > on three 'iv's: {X,C1}, {Y,C2} and {X, C1 - C2}.

The point is that we may not change the iteration number at which
overflow occurs since that alters the result of the < compare.
Only if we know there is no overflow with the present expression
during the loop evaluation we can do the transform and then only
if we do not introduce overflow.

We are basically doing the transform that fold_comparison
in fold-const.cc does:

  /* Transform comparisons of the form X +- C1 CMP Y +- C2 to
     X CMP Y +- C2 +- C1 for signed X, Y.  This is valid if
     the resulting offset is smaller in absolute value than the
     original one and has the same sign.  */
  if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg0))
      && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))
...

> > Or it may also ok if we can compute an assumption, under which
> > all three ivs are not overflowed/wrapped.
> >
> >> only guaranteed if the absolute value of C1 - C2 is smaller than
> >> that of C1 and if it has the same sign.
> > I'm thinking this in another way:
> > When trying to do this transform in number_of_iterations_cond,
> > GT/GE is inverted to LT/LE, then the compare code would be:
> > LT/LE or NE.
> >
> > For LT/LE, like {X, C1} < {Y, C2}, we can look it as iv0 is
> > chasing iv1.  We would able to assume X < Y (may_be_zero would
> > be set later via number_of_iterations_lt/le).
> > 1. If C1 < C2, iv0 can never catch up iv1. For examples:
> > {X, 1} < {Y, 2}; {X, -2} < {Y, -1}; {X, -2} < {Y, 1}.
> > And there must be at least one overflow/wrap in iv0,iv1, or iv.
> > This indicates, if the sign of (C1 - C1) is negative, then the
> > transform would be incorrect.
> > 2. If C1 > C2, we still need to make sure all the ivs (iv0,
> > iv1 and combined iv) are not wrapped.
> > For C2 > 0, {Y,C2} should not cross MAX before {X, C1} catch up.
> >    the assumption may like : (MAX-Y)/C2 > (Y-X)/(C1-C1)
> There is still some cases: iv0 step is too large, then iv0 wraps
> first, e.g. {MAX-5, 10} < {MAX-3, 1}. For this, the assumption
> would need to and with (MAX-X)/C1 > (Y-X)/(C1-C1).
> 
> > For C1 < 0, {X,C1} should not down cross MIN
> >    the assumption may like : (X-MIN)/-C1 > (Y-X)/(C1-C1)
>   Also add the assumption: (Y-MIN)/-C2 > (Y-X)/(C1-C1)
> 
> > For C1 > 0 and C2 < 0, iv0 and iv1 are walking to each other,
> > it would be almost safe.
> For this case, we may still add the assumption to avoid wraping
> at the first iteration.
> 
> BR,
> Jiufu
> 
> >
> > For NE, it seems more interesting. The transformation depends
> > on 3 things: 1. the relation between X and Y; 2 the sign
> > of (C1-C2); 3. if iv0 and iv1 can be equal finally.
> > The 3rd one may be more special.
> > The good news is, number_of_iterations_ne seems able to handle NE.
> >
> >>
> >> With the same reasoning we then know the new IV0 doesn't overflow.
> >>
> >> So something like the following.  IIRC I've proposed sth similar
> >> a while back.  I'm going to give it some testing, collecting
> >> testcases from related PRs.
> >>
> >> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> >> index b767056aeb0..74fa4f66ee2 100644
> >> --- a/gcc/tree-ssa-loop-niter.cc
> >> +++ b/gcc/tree-ssa-loop-niter.cc
> >> @@ -1890,17 +1890,28 @@ number_of_iterations_cond (class loop *loop,
> >>        tree step = fold_binary_to_constant (MINUS_EXPR, step_type,
> >>                                            iv0->step, iv1->step);
> >>  
> >> -      /* No need to check sign of the new step since below code takes 
> >> care
> >> -        of this well.  */
> >> -      if (code != NE_EXPR
> >> -         && (TREE_CODE (step) != INTEGER_CST
> >> -             || !iv0->no_overflow || !iv1->no_overflow))
> >> -       return false;
> >> +      /* For code other than NE_EXPR we have to ensure moving the 
> >> evolution
> >> +        of IV1 to that of IV0 does not introduce overflow.  */
> >> +      if (TREE_CODE (step) != INTEGER_CST
> >> +         || !iv0->no_overflow || !iv1->no_overflow)
> >> +       {
> > I was also trying to leverage no_overflow of iv0 and iv1. While it seems
> > the computation logic of no_overflow is related to the type of IV.  If the
> > type of IV is signed, the C semantics may be used, overflow in signed
> > IV are treated UB, and then no_overflow would be true.
> >
> > For unsigned IV, no_overflow would be false, even for the cases which
> > looks like:
> > "{10, 2} < {20, 1}", which would be ok to compute niter.

IIRC no_overflow is determined by SCEV which might also use niter
analysis.  For the case of {10, +2} < {20, +1} there is no need to
compute it as {10, +1} < 20 and we hopefully deal with this in
other code paths (in fact with base and step all constant we
can simply solve the linear equation for 'n' - maybe that's a
capability we should add to number_of_iterations_cond).

Running a small example through the debugger I can indeed see
no_overflow = true for this case (after a few times == false),
so we do have some ways of determining there is no overflow for this
case.

I do not think we should try to add special-casings to this very
generic transform.

I will look at the regression that was pointed out now.

Richard.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] Check negative combined step
  2022-01-25  7:47       ` Richard Biener
@ 2022-01-25 11:42         ` Jiufu Guo
  2022-01-25 12:02           ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Jiufu Guo @ 2022-01-25 11:42 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, amker.cheng, wschmidt, segher, dje.gcc, jlaw

Richard Biener <rguenther@suse.de> writes:

> On Tue, 25 Jan 2022, Jiufu Guo wrote:
>
>> Jiufu Guo <guojiufu@linux.ibm.com> writes:
>> 
>> > Richard Biener <rguenther@suse.de> writes:
>> >
>> >> On Thu, 13 Jan 2022, Jiufu Guo wrote:
>> > ...
>> >>
>> >>> -      /* No need to check sign of the new step since below code takes care
>> >>> -	 of this well.  */
>> >>> +      /* Like cases shown in PR100740/102131, negtive step is not safe.  */
>> >>> +      if (tree_int_cst_sign_bit (step))
>> >>> +	return false;
>> >>> +
>> >>>        if (code != NE_EXPR
>> >>>  	  && (TREE_CODE (step) != INTEGER_CST
>> >>>  	      || !iv0->no_overflow || !iv1->no_overflow))
>> >>
>> >> I think for NE_EXPR the sign is not relevant.  I think the key is
>> >> that we require that iv0->no_overflow and for non-equality checks
>> >> adjusting X + C1 < Y + C2 to X + C1 - C2 < Y is only valid if that
>> >> does not cause any overflow on either side.  On the LHS this is
>> >
>> > Hi Richard,
>> >
>> > Thanks a lot for your comments and ideas!
>> >
>> > Right! The adjusting is safe only if we can make sure there is
>> > no overflow/wrap on either side or say there is no overflow/wrap
>> > on three 'iv's: {X,C1}, {Y,C2} and {X, C1 - C2}.
Hi Richard,

Thanks for your quickly reply, and patient review!
>
> The point is that we may not change the iteration number at which
> overflow occurs since that alters the result of the < compare.
> Only if we know there is no overflow with the present expression
> during the loop evaluation we can do the transform and then only
> if we do not introduce overflow.
Exactly, this is also what I mean :)

>
> We are basically doing the transform that fold_comparison
> in fold-const.cc does:
>
>   /* Transform comparisons of the form X +- C1 CMP Y +- C2 to
>      X CMP Y +- C2 +- C1 for signed X, Y.  This is valid if
>      the resulting offset is smaller in absolute value than the
>      original one and has the same sign.  */
>   if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg0))
>       && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))
> ...
>
This transform seems not the scenario which we are care about in
number_of_iterations_cond.
For example, 'X + 1 < Y + 4' ==> 'X < Y + 3' would be correct if
no overflow happen.
But for loop, the niter for '{X, 1} < {Y, 4}' would be totally
different with niter for '{X, 0} < {Y, 3}'.
for (iv0 = X, iv1 = Y; iv0 < iv1; iv0 += 1, iv1 += 4)
   in this loop, iv1 walks to overflow faster, step is 4.
vs.
for (iv0 = X, iv1 = Y; iv0 < iv1; iv1 += 3) (iv1 overflow slow)
   in this loop, iv1 overflows slower, step is 3.

Actually, the transformation 'X + 1 < Y + 4' ==> 'X < Y + 3',
may not always correct.  e.g. for below code, X=6, and Y=2147483645
it may output "b0 + 1 < b1 + 4 is true".
```c
int __attribute__ ((noinline))
foo (int b0, int b1)
{
  return __builtin_printf ("b0 + 1 < b1 + 4 is %s\n",
			   b0 + 1 < b1 + 4 ? "true" : "false");
}

int main(int argc, char **argv)
{
  if (argc < 2)
    return -1;
  int b0 = atoi(argv[1]);
  int b1 = atoi(argv[2]);  
  return foo (b0, b1);
}
```
>> > Or it may also ok if we can compute an assumption, under which
>> > all three ivs are not overflowed/wrapped.
>> >
>> >> only guaranteed if the absolute value of C1 - C2 is smaller than
>> >> that of C1 and if it has the same sign.
>> > I'm thinking this in another way:
>> > When trying to do this transform in number_of_iterations_cond,
>> > GT/GE is inverted to LT/LE, then the compare code would be:
>> > LT/LE or NE.
>> >
>> > For LT/LE, like {X, C1} < {Y, C2}, we can look it as iv0 is
>> > chasing iv1.  We would able to assume X < Y (may_be_zero would
>> > be set later via number_of_iterations_lt/le).
>> > 1. If C1 < C2, iv0 can never catch up iv1. For examples:
>> > {X, 1} < {Y, 2}; {X, -2} < {Y, -1}; {X, -2} < {Y, 1}.
>> > And there must be at least one overflow/wrap in iv0,iv1, or iv.
>> > This indicates, if the sign of (C1 - C1) is negative, then the
>> > transform would be incorrect.
>> > 2. If C1 > C2, we still need to make sure all the ivs (iv0,
>> > iv1 and combined iv) are not wrapped.
>> > For C2 > 0, {Y,C2} should not cross MAX before {X, C1} catch up.
>> >    the assumption may like : (MAX-Y)/C2 > (Y-X)/(C1-C1)
>> There is still some cases: iv0 step is too large, then iv0 wraps
>> first, e.g. {MAX-5, 10} < {MAX-3, 1}. For this, the assumption
>> would need to and with (MAX-X)/C1 > (Y-X)/(C1-C1).
>> 
>> > For C1 < 0, {X,C1} should not down cross MIN
>> >    the assumption may like : (X-MIN)/-C1 > (Y-X)/(C1-C1)
>>   Also add the assumption: (Y-MIN)/-C2 > (Y-X)/(C1-C1)
>> 
>> > For C1 > 0 and C2 < 0, iv0 and iv1 are walking to each other,
>> > it would be almost safe.
>> For this case, we may still add the assumption to avoid wraping
>> at the first iteration.
>> 
>> BR,
>> Jiufu
>> 
>> >
>> > For NE, it seems more interesting. The transformation depends
>> > on 3 things: 1. the relation between X and Y; 2 the sign
>> > of (C1-C2); 3. if iv0 and iv1 can be equal finally.
>> > The 3rd one may be more special.
>> > The good news is, number_of_iterations_ne seems able to handle NE.
>> >
>> >>
>> >> With the same reasoning we then know the new IV0 doesn't overflow.
>> >>
>> >> So something like the following.  IIRC I've proposed sth similar
>> >> a while back.  I'm going to give it some testing, collecting
>> >> testcases from related PRs.
>> >>
>> >> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
>> >> index b767056aeb0..74fa4f66ee2 100644
...
>> >> +      if (TREE_CODE (step) != INTEGER_CST
>> >> +         || !iv0->no_overflow || !iv1->no_overflow)
>> >> +       {
>> > I was also trying to leverage no_overflow of iv0 and iv1. While it seems
>> > the computation logic of no_overflow is related to the type of IV.  If the
>> > type of IV is signed, the C semantics may be used, overflow in signed
>> > IV are treated UB, and then no_overflow would be true.
>> >
>> > For unsigned IV, no_overflow would be false, even for the cases which
>> > looks like:
>> > "{10, 2} < {20, 1}", which would be ok to compute niter.
>
> IIRC no_overflow is determined by SCEV which might also use niter
> analysis.  For the case of {10, +2} < {20, +1} there is no need to
> compute it as {10, +1} < 20 and we hopefully deal with this in
> other code paths (in fact with base and step all constant we
> can simply solve the linear equation for 'n' - maybe that's a
> capability we should add to number_of_iterations_cond).

Thanks for point this out.
Yes, for const base(s) and step(s), we have other code path
to deal with (e.g. loop_niter_by_eval).

For {10, +2}, what I really mean is about the no_overflow iv(s)
on unsigned.  Sorry the misleading words.
For no_overflow, it is set at some places, including
number_of_iterations_xxx. :),  Before number_of_iterations_xxx,
no_overflow could be calculated in simple_iv_with_niters:
```c
  iv->no_overflow = !folded_casts && nowrap_type_p (type);
```
nowrap_type_p checks if overflow is UB on type through macro
TYPE_OVERFLOW_UNDEFINED.  For signed, it is UB; for unsigned
it is different.

For example as below code, no_overflow is set as false for iv0
and iv1, and then niter was not computed quickly.
```c
unsigned __attribute__ ((noinline))
foo (unsigned b0, unsigned b1)
{
  unsigned n = 0;
  for (; b0 < b1; b0 += 2, b1 += 1)
    n++;
  return n;
}

int main()
{
  return foo (10, 20);
}
```

>
> Running a small example through the debugger I can indeed see
> no_overflow = true for this case (after a few times == false),
> so we do have some ways of determining there is no overflow for this
> case.
>
> I do not think we should try to add special-casings to this very
> generic transform.
>
> I will look at the regression that was pointed out now.
You may talking about pr81196.c, it seems relate to
"{p0, 1} < {p1, -1}".  It is not transformed to
"{p0, 2} < {p1,0}" anymore, and niter is not computed, then
vectorization does not happen.

BR,
Jiufu

>
> Richard.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] Check negative combined step
  2022-01-25 11:42         ` Jiufu Guo
@ 2022-01-25 12:02           ` Richard Biener
  2022-01-25 12:21             ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Biener @ 2022-01-25 12:02 UTC (permalink / raw)
  To: Jiufu Guo; +Cc: gcc-patches, amker.cheng, wschmidt, segher, dje.gcc, jlaw

On Tue, 25 Jan 2022, Jiufu Guo wrote:

> Richard Biener <rguenther@suse.de> writes:
> 
> > On Tue, 25 Jan 2022, Jiufu Guo wrote:
> >
> >> Jiufu Guo <guojiufu@linux.ibm.com> writes:
> >> 
> >> > Richard Biener <rguenther@suse.de> writes:
> >> >
> >> >> On Thu, 13 Jan 2022, Jiufu Guo wrote:
> >> > ...
> >> >>
> >> >>> -      /* No need to check sign of the new step since below code takes care
> >> >>> -	 of this well.  */
> >> >>> +      /* Like cases shown in PR100740/102131, negtive step is not safe.  */
> >> >>> +      if (tree_int_cst_sign_bit (step))
> >> >>> +	return false;
> >> >>> +
> >> >>>        if (code != NE_EXPR
> >> >>>  	  && (TREE_CODE (step) != INTEGER_CST
> >> >>>  	      || !iv0->no_overflow || !iv1->no_overflow))
> >> >>
> >> >> I think for NE_EXPR the sign is not relevant.  I think the key is
> >> >> that we require that iv0->no_overflow and for non-equality checks
> >> >> adjusting X + C1 < Y + C2 to X + C1 - C2 < Y is only valid if that
> >> >> does not cause any overflow on either side.  On the LHS this is
> >> >
> >> > Hi Richard,
> >> >
> >> > Thanks a lot for your comments and ideas!
> >> >
> >> > Right! The adjusting is safe only if we can make sure there is
> >> > no overflow/wrap on either side or say there is no overflow/wrap
> >> > on three 'iv's: {X,C1}, {Y,C2} and {X, C1 - C2}.
> Hi Richard,
> 
> Thanks for your quickly reply, and patient review!
> >
> > The point is that we may not change the iteration number at which
> > overflow occurs since that alters the result of the < compare.
> > Only if we know there is no overflow with the present expression
> > during the loop evaluation we can do the transform and then only
> > if we do not introduce overflow.
> Exactly, this is also what I mean :)
> 
> >
> > We are basically doing the transform that fold_comparison
> > in fold-const.cc does:
> >
> >   /* Transform comparisons of the form X +- C1 CMP Y +- C2 to
> >      X CMP Y +- C2 +- C1 for signed X, Y.  This is valid if
> >      the resulting offset is smaller in absolute value than the
> >      original one and has the same sign.  */
> >   if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg0))
> >       && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))
> > ...
> >
> This transform seems not the scenario which we are care about in
> number_of_iterations_cond.
> For example, 'X + 1 < Y + 4' ==> 'X < Y + 3' would be correct if
> no overflow happen.
> But for loop, the niter for '{X, 1} < {Y, 4}' would be totally
> different with niter for '{X, 0} < {Y, 3}'.
> for (iv0 = X, iv1 = Y; iv0 < iv1; iv0 += 1, iv1 += 4)
>    in this loop, iv1 walks to overflow faster, step is 4.
> vs.
> for (iv0 = X, iv1 = Y; iv0 < iv1; iv1 += 3) (iv1 overflow slow)
>    in this loop, iv1 overflows slower, step is 3.

Huh?  But we are _exactly_ doing this, analyzing {X, + 1} < {Y, + 4}
as X < {Y, + 3} (well, OK, we're only trying {X, -3} which now
fails - we should try the other way around as well).

> Actually, the transformation 'X + 1 < Y + 4' ==> 'X < Y + 3',
> may not always correct.  e.g. for below code, X=6, and Y=2147483645
> it may output "b0 + 1 < b1 + 4 is true".

But Y + 4 overflows with 2147483645 so X + 1 < Y + 4 invokes UB
and we can ignore this situation.

> ```c
> int __attribute__ ((noinline))
> foo (int b0, int b1)
> {
>   return __builtin_printf ("b0 + 1 < b1 + 4 is %s\n",
> 			   b0 + 1 < b1 + 4 ? "true" : "false");
> }
> 
> int main(int argc, char **argv)
> {
>   if (argc < 2)
>     return -1;
>   int b0 = atoi(argv[1]);
>   int b1 = atoi(argv[2]);  
>   return foo (b0, b1);
> }
> ```
> >> > Or it may also ok if we can compute an assumption, under which
> >> > all three ivs are not overflowed/wrapped.
> >> >
> >> >> only guaranteed if the absolute value of C1 - C2 is smaller than
> >> >> that of C1 and if it has the same sign.
> >> > I'm thinking this in another way:
> >> > When trying to do this transform in number_of_iterations_cond,
> >> > GT/GE is inverted to LT/LE, then the compare code would be:
> >> > LT/LE or NE.
> >> >
> >> > For LT/LE, like {X, C1} < {Y, C2}, we can look it as iv0 is
> >> > chasing iv1.  We would able to assume X < Y (may_be_zero would
> >> > be set later via number_of_iterations_lt/le).
> >> > 1. If C1 < C2, iv0 can never catch up iv1. For examples:
> >> > {X, 1} < {Y, 2}; {X, -2} < {Y, -1}; {X, -2} < {Y, 1}.
> >> > And there must be at least one overflow/wrap in iv0,iv1, or iv.
> >> > This indicates, if the sign of (C1 - C1) is negative, then the
> >> > transform would be incorrect.
> >> > 2. If C1 > C2, we still need to make sure all the ivs (iv0,
> >> > iv1 and combined iv) are not wrapped.
> >> > For C2 > 0, {Y,C2} should not cross MAX before {X, C1} catch up.
> >> >    the assumption may like : (MAX-Y)/C2 > (Y-X)/(C1-C1)
> >> There is still some cases: iv0 step is too large, then iv0 wraps
> >> first, e.g. {MAX-5, 10} < {MAX-3, 1}. For this, the assumption
> >> would need to and with (MAX-X)/C1 > (Y-X)/(C1-C1).
> >> 
> >> > For C1 < 0, {X,C1} should not down cross MIN
> >> >    the assumption may like : (X-MIN)/-C1 > (Y-X)/(C1-C1)
> >>   Also add the assumption: (Y-MIN)/-C2 > (Y-X)/(C1-C1)
> >> 
> >> > For C1 > 0 and C2 < 0, iv0 and iv1 are walking to each other,
> >> > it would be almost safe.
> >> For this case, we may still add the assumption to avoid wraping
> >> at the first iteration.
> >> 
> >> BR,
> >> Jiufu
> >> 
> >> >
> >> > For NE, it seems more interesting. The transformation depends
> >> > on 3 things: 1. the relation between X and Y; 2 the sign
> >> > of (C1-C2); 3. if iv0 and iv1 can be equal finally.
> >> > The 3rd one may be more special.
> >> > The good news is, number_of_iterations_ne seems able to handle NE.
> >> >
> >> >>
> >> >> With the same reasoning we then know the new IV0 doesn't overflow.
> >> >>
> >> >> So something like the following.  IIRC I've proposed sth similar
> >> >> a while back.  I'm going to give it some testing, collecting
> >> >> testcases from related PRs.
> >> >>
> >> >> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> >> >> index b767056aeb0..74fa4f66ee2 100644
> ...
> >> >> +      if (TREE_CODE (step) != INTEGER_CST
> >> >> +         || !iv0->no_overflow || !iv1->no_overflow)
> >> >> +       {
> >> > I was also trying to leverage no_overflow of iv0 and iv1. While it seems
> >> > the computation logic of no_overflow is related to the type of IV.  If the
> >> > type of IV is signed, the C semantics may be used, overflow in signed
> >> > IV are treated UB, and then no_overflow would be true.
> >> >
> >> > For unsigned IV, no_overflow would be false, even for the cases which
> >> > looks like:
> >> > "{10, 2} < {20, 1}", which would be ok to compute niter.
> >
> > IIRC no_overflow is determined by SCEV which might also use niter
> > analysis.  For the case of {10, +2} < {20, +1} there is no need to
> > compute it as {10, +1} < 20 and we hopefully deal with this in
> > other code paths (in fact with base and step all constant we
> > can simply solve the linear equation for 'n' - maybe that's a
> > capability we should add to number_of_iterations_cond).
> 
> Thanks for point this out.
> Yes, for const base(s) and step(s), we have other code path
> to deal with (e.g. loop_niter_by_eval).
> 
> For {10, +2}, what I really mean is about the no_overflow iv(s)
> on unsigned.  Sorry the misleading words.
> For no_overflow, it is set at some places, including
> number_of_iterations_xxx. :),  Before number_of_iterations_xxx,
> no_overflow could be calculated in simple_iv_with_niters:
> ```c
>   iv->no_overflow = !folded_casts && nowrap_type_p (type);
> ```
> nowrap_type_p checks if overflow is UB on type through macro
> TYPE_OVERFLOW_UNDEFINED.  For signed, it is UB; for unsigned
> it is different.

Yes.

> For example as below code, no_overflow is set as false for iv0
> and iv1, and then niter was not computed quickly.
> ```c
> unsigned __attribute__ ((noinline))
> foo (unsigned b0, unsigned b1)
> {
>   unsigned n = 0;
>   for (; b0 < b1; b0 += 2, b1 += 1)
>     n++;
>   return n;
> }

There's no difference to before my patch of course.  There are
some code paths in number_of_iterations_lt that use assumptions
to prove the combined IV does not wrap, just for this case
we give up too early.  I'm currently looking at rectifying this
with small incremental changes.

> int main()
> {
>   return foo (10, 20);
> }
> ```
> 
> >
> > Running a small example through the debugger I can indeed see
> > no_overflow = true for this case (after a few times == false),
> > so we do have some ways of determining there is no overflow for this
> > case.
> >
> > I do not think we should try to add special-casings to this very
> > generic transform.
> >
> > I will look at the regression that was pointed out now.
> You may talking about pr81196.c, it seems relate to
> "{p0, 1} < {p1, -1}".  It is not transformed to
> "{p0, 2} < {p1,0}" anymore, and niter is not computed, then
> vectorization does not happen.

Yes, I've pushed a fix for that - there's special pointer type
guarnatees we can use here.

Richard.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] Check negative combined step
  2022-01-25 12:02           ` Richard Biener
@ 2022-01-25 12:21             ` Richard Biener
  2022-01-27  9:12               ` Jiufu Guo
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Biener @ 2022-01-25 12:21 UTC (permalink / raw)
  To: Jiufu Guo; +Cc: gcc-patches, amker.cheng, wschmidt, segher, dje.gcc, jlaw

On Tue, 25 Jan 2022, Richard Biener wrote:

> On Tue, 25 Jan 2022, Jiufu Guo wrote:
> 
> > Richard Biener <rguenther@suse.de> writes:
> > 
> > > On Tue, 25 Jan 2022, Jiufu Guo wrote:
> > >
> > >> Jiufu Guo <guojiufu@linux.ibm.com> writes:
> > >> 
> > >> > Richard Biener <rguenther@suse.de> writes:
> > >> >
> > >> >> On Thu, 13 Jan 2022, Jiufu Guo wrote:
> > >> > ...
> > >> >>
> > >> >>> -      /* No need to check sign of the new step since below code takes care
> > >> >>> -	 of this well.  */
> > >> >>> +      /* Like cases shown in PR100740/102131, negtive step is not safe.  */
> > >> >>> +      if (tree_int_cst_sign_bit (step))
> > >> >>> +	return false;
> > >> >>> +
> > >> >>>        if (code != NE_EXPR
> > >> >>>  	  && (TREE_CODE (step) != INTEGER_CST
> > >> >>>  	      || !iv0->no_overflow || !iv1->no_overflow))
> > >> >>
> > >> >> I think for NE_EXPR the sign is not relevant.  I think the key is
> > >> >> that we require that iv0->no_overflow and for non-equality checks
> > >> >> adjusting X + C1 < Y + C2 to X + C1 - C2 < Y is only valid if that
> > >> >> does not cause any overflow on either side.  On the LHS this is
> > >> >
> > >> > Hi Richard,
> > >> >
> > >> > Thanks a lot for your comments and ideas!
> > >> >
> > >> > Right! The adjusting is safe only if we can make sure there is
> > >> > no overflow/wrap on either side or say there is no overflow/wrap
> > >> > on three 'iv's: {X,C1}, {Y,C2} and {X, C1 - C2}.
> > Hi Richard,
> > 
> > Thanks for your quickly reply, and patient review!
> > >
> > > The point is that we may not change the iteration number at which
> > > overflow occurs since that alters the result of the < compare.
> > > Only if we know there is no overflow with the present expression
> > > during the loop evaluation we can do the transform and then only
> > > if we do not introduce overflow.
> > Exactly, this is also what I mean :)
> > 
> > >
> > > We are basically doing the transform that fold_comparison
> > > in fold-const.cc does:
> > >
> > >   /* Transform comparisons of the form X +- C1 CMP Y +- C2 to
> > >      X CMP Y +- C2 +- C1 for signed X, Y.  This is valid if
> > >      the resulting offset is smaller in absolute value than the
> > >      original one and has the same sign.  */
> > >   if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg0))
> > >       && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))
> > > ...
> > >
> > This transform seems not the scenario which we are care about in
> > number_of_iterations_cond.
> > For example, 'X + 1 < Y + 4' ==> 'X < Y + 3' would be correct if
> > no overflow happen.
> > But for loop, the niter for '{X, 1} < {Y, 4}' would be totally
> > different with niter for '{X, 0} < {Y, 3}'.
> > for (iv0 = X, iv1 = Y; iv0 < iv1; iv0 += 1, iv1 += 4)
> >    in this loop, iv1 walks to overflow faster, step is 4.
> > vs.
> > for (iv0 = X, iv1 = Y; iv0 < iv1; iv1 += 3) (iv1 overflow slow)
> >    in this loop, iv1 overflows slower, step is 3.
> 
> Huh?  But we are _exactly_ doing this, analyzing {X, + 1} < {Y, + 4}
> as X < {Y, + 3} (well, OK, we're only trying {X, -3} which now
> fails - we should try the other way around as well).
> 
> > Actually, the transformation 'X + 1 < Y + 4' ==> 'X < Y + 3',
> > may not always correct.  e.g. for below code, X=6, and Y=2147483645
> > it may output "b0 + 1 < b1 + 4 is true".
> 
> But Y + 4 overflows with 2147483645 so X + 1 < Y + 4 invokes UB
> and we can ignore this situation.
> 
> > ```c
> > int __attribute__ ((noinline))
> > foo (int b0, int b1)
> > {
> >   return __builtin_printf ("b0 + 1 < b1 + 4 is %s\n",
> > 			   b0 + 1 < b1 + 4 ? "true" : "false");
> > }
> > 
> > int main(int argc, char **argv)
> > {
> >   if (argc < 2)
> >     return -1;
> >   int b0 = atoi(argv[1]);
> >   int b1 = atoi(argv[2]);  
> >   return foo (b0, b1);
> > }
> > ```
> > >> > Or it may also ok if we can compute an assumption, under which
> > >> > all three ivs are not overflowed/wrapped.
> > >> >
> > >> >> only guaranteed if the absolute value of C1 - C2 is smaller than
> > >> >> that of C1 and if it has the same sign.
> > >> > I'm thinking this in another way:
> > >> > When trying to do this transform in number_of_iterations_cond,
> > >> > GT/GE is inverted to LT/LE, then the compare code would be:
> > >> > LT/LE or NE.
> > >> >
> > >> > For LT/LE, like {X, C1} < {Y, C2}, we can look it as iv0 is
> > >> > chasing iv1.  We would able to assume X < Y (may_be_zero would
> > >> > be set later via number_of_iterations_lt/le).
> > >> > 1. If C1 < C2, iv0 can never catch up iv1. For examples:
> > >> > {X, 1} < {Y, 2}; {X, -2} < {Y, -1}; {X, -2} < {Y, 1}.
> > >> > And there must be at least one overflow/wrap in iv0,iv1, or iv.
> > >> > This indicates, if the sign of (C1 - C1) is negative, then the
> > >> > transform would be incorrect.
> > >> > 2. If C1 > C2, we still need to make sure all the ivs (iv0,
> > >> > iv1 and combined iv) are not wrapped.
> > >> > For C2 > 0, {Y,C2} should not cross MAX before {X, C1} catch up.
> > >> >    the assumption may like : (MAX-Y)/C2 > (Y-X)/(C1-C1)
> > >> There is still some cases: iv0 step is too large, then iv0 wraps
> > >> first, e.g. {MAX-5, 10} < {MAX-3, 1}. For this, the assumption
> > >> would need to and with (MAX-X)/C1 > (Y-X)/(C1-C1).
> > >> 
> > >> > For C1 < 0, {X,C1} should not down cross MIN
> > >> >    the assumption may like : (X-MIN)/-C1 > (Y-X)/(C1-C1)
> > >>   Also add the assumption: (Y-MIN)/-C2 > (Y-X)/(C1-C1)
> > >> 
> > >> > For C1 > 0 and C2 < 0, iv0 and iv1 are walking to each other,
> > >> > it would be almost safe.
> > >> For this case, we may still add the assumption to avoid wraping
> > >> at the first iteration.
> > >> 
> > >> BR,
> > >> Jiufu
> > >> 
> > >> >
> > >> > For NE, it seems more interesting. The transformation depends
> > >> > on 3 things: 1. the relation between X and Y; 2 the sign
> > >> > of (C1-C2); 3. if iv0 and iv1 can be equal finally.
> > >> > The 3rd one may be more special.
> > >> > The good news is, number_of_iterations_ne seems able to handle NE.
> > >> >
> > >> >>
> > >> >> With the same reasoning we then know the new IV0 doesn't overflow.
> > >> >>
> > >> >> So something like the following.  IIRC I've proposed sth similar
> > >> >> a while back.  I'm going to give it some testing, collecting
> > >> >> testcases from related PRs.
> > >> >>
> > >> >> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> > >> >> index b767056aeb0..74fa4f66ee2 100644
> > ...
> > >> >> +      if (TREE_CODE (step) != INTEGER_CST
> > >> >> +         || !iv0->no_overflow || !iv1->no_overflow)
> > >> >> +       {
> > >> > I was also trying to leverage no_overflow of iv0 and iv1. While it seems
> > >> > the computation logic of no_overflow is related to the type of IV.  If the
> > >> > type of IV is signed, the C semantics may be used, overflow in signed
> > >> > IV are treated UB, and then no_overflow would be true.
> > >> >
> > >> > For unsigned IV, no_overflow would be false, even for the cases which
> > >> > looks like:
> > >> > "{10, 2} < {20, 1}", which would be ok to compute niter.
> > >
> > > IIRC no_overflow is determined by SCEV which might also use niter
> > > analysis.  For the case of {10, +2} < {20, +1} there is no need to
> > > compute it as {10, +1} < 20 and we hopefully deal with this in
> > > other code paths (in fact with base and step all constant we
> > > can simply solve the linear equation for 'n' - maybe that's a
> > > capability we should add to number_of_iterations_cond).
> > 
> > Thanks for point this out.
> > Yes, for const base(s) and step(s), we have other code path
> > to deal with (e.g. loop_niter_by_eval).
> > 
> > For {10, +2}, what I really mean is about the no_overflow iv(s)
> > on unsigned.  Sorry the misleading words.
> > For no_overflow, it is set at some places, including
> > number_of_iterations_xxx. :),  Before number_of_iterations_xxx,
> > no_overflow could be calculated in simple_iv_with_niters:
> > ```c
> >   iv->no_overflow = !folded_casts && nowrap_type_p (type);
> > ```
> > nowrap_type_p checks if overflow is UB on type through macro
> > TYPE_OVERFLOW_UNDEFINED.  For signed, it is UB; for unsigned
> > it is different.
> 
> Yes.
> 
> > For example as below code, no_overflow is set as false for iv0
> > and iv1, and then niter was not computed quickly.
> > ```c
> > unsigned __attribute__ ((noinline))
> > foo (unsigned b0, unsigned b1)
> > {
> >   unsigned n = 0;
> >   for (; b0 < b1; b0 += 2, b1 += 1)
> >     n++;
> >   return n;
> > }
> 
> There's no difference to before my patch of course.  There are
> some code paths in number_of_iterations_lt that use assumptions
> to prove the combined IV does not wrap, just for this case
> we give up too early.  I'm currently looking at rectifying this
> with small incremental changes.

Like the one below which should handle the PR81196 case when
integer IVs are used.  I suspect we can do something similar
for IVs where we do not know the original overflow status
(we have to register assumptions for both original IVs _and_
the new adjusted one).

And as noted we can try rewriting to the other IV.

I do wonder how important these are and what improvements we need
to include in backports (I think we do want to fix the original issue
on branches).

Richard.

From f46855709dd45603d18f2dcd8403f5b060c164f0 Mon Sep 17 00:00:00 2001
From: Richard Biener <rguenther@suse.de>
Date: Tue, 25 Jan 2022 13:11:57 +0100
Subject: [PATCH] tree-optimization/104214 - improve IV analysis with integer
 IV compares
To: gcc-patches@gcc.gnu.org

For rewriting BASE0 + STEP0 cmp BASE1 + STEP1 as
BASE0 + STEP0 - STEP1 cmp BASE1 for signed integers we can use
niter assumptions to ensure that BASE0 + STEP0 - STEP1 does not
overflow instead of giving up when we cannot prove this
statically.

We can use the existing assert_no_overflow_lt for this and make it
efficient for LE_EXPR also by rewriting LE_EXPR IV compares to LT_EXPR earlier.

2022-01-25  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/104214
	* tree-ssa-loop-niter.cc (number_of_iterations_le): Refactor
	into ...
	(number_of_iterations_le_to_lt): ... this, just doing
	the iv->base rewriting and assumption registering.
	(number_of_iterations_cond): Rewrite LE_EXPR into LT_EXPR
	earlier.  When rewriting BASE0 + STEP0 cmp BASE1 + STEP1
	as BASE0 + STEP0 - STEP1 cmp BASE1 would fail for LT_EXPR
	because of possible overflow register assumptions instead.

	* gcc.dg/vect/pr81196-3.c: New testcase variant.
---
 gcc/testsuite/gcc.dg/vect/pr81196-3.c | 12 +++++
 gcc/tree-ssa-loop-niter.cc            | 67 +++++++++++++++------------
 2 files changed, 49 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr81196-3.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr81196-3.c b/gcc/testsuite/gcc.dg/vect/pr81196-3.c
new file mode 100644
index 00000000000..bcdd815dc5d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr81196-3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+void b (int *p, int j, int k)
+{
+  p = (int *)__builtin_assume_aligned(p, __BIGGEST_ALIGNMENT__);
+  int i = 0;
+  for(; j < k; ++j, --k)
+    p[i++] = 1;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
index d33095b8e03..b5f3d4b4a8d 100644
--- a/gcc/tree-ssa-loop-niter.cc
+++ b/gcc/tree-ssa-loop-niter.cc
@@ -1721,17 +1721,17 @@ number_of_iterations_lt (class loop *loop, tree type, affine_iv *iv0,
   return true;
 }
 
-/* Determines number of iterations of loop whose ending condition
-   is IV0 <= IV1.  TYPE is the type of the iv.  The number of
-   iterations is stored to NITER.  EXIT_MUST_BE_TAKEN is true if
-   we know that this condition must eventually become false (we derived this
+/* Rewrite the IV0 <= IV1 condition to IV0 < IV1 by adjusting one of
+   the IVs bases.  TYPE is the type of the iv.  Assumptions are
+   recorded to NITER.  EXIT_MUST_BE_TAKEN is true if we know that this
+   condition must eventually become false (we derived this
    earlier, and possibly set NITER->assumptions to make sure this
-   is the case).  BNDS bounds the difference IV1->base - IV0->base.  */
+   is the case).  */
 
 static bool
-number_of_iterations_le (class loop *loop, tree type, affine_iv *iv0,
-			 affine_iv *iv1, class tree_niter_desc *niter,
-			 bool exit_must_be_taken, bounds *bnds)
+number_of_iterations_le_to_lt (tree type, affine_iv *iv0, affine_iv *iv1,
+			       class tree_niter_desc *niter,
+			       bool exit_must_be_taken)
 {
   tree assumption;
   tree type1 = type;
@@ -1777,10 +1777,7 @@ number_of_iterations_le (class loop *loop, tree type, affine_iv *iv0,
     iv0->base = fold_build2 (MINUS_EXPR, type1,
 			     iv0->base, build_int_cst (type1, 1));
 
-  bounds_add (bnds, 1, type1);
-
-  return number_of_iterations_lt (loop, type, iv0, iv1, niter, exit_must_be_taken,
-				  bnds);
+  return true;
 }
 
 /* Dumps description of affine induction variable IV to FILE.  */
@@ -1862,6 +1859,17 @@ number_of_iterations_cond (class loop *loop,
       code = swap_tree_comparison (code);
     }
 
+  /* If the loop exits immediately, there is nothing to do.  */
+  tree tem = fold_binary (code, boolean_type_node, iv0->base, iv1->base);
+  if (tem && integer_zerop (tem))
+    {
+      if (!every_iteration)
+	return false;
+      niter->niter = build_int_cst (unsigned_type_for (type), 0);
+      niter->max = 0;
+      return true;
+    }
+
   if (POINTER_TYPE_P (type))
     {
       /* Comparison of pointers is undefined unless both iv0 and iv1 point
@@ -1884,6 +1892,15 @@ number_of_iterations_cond (class loop *loop,
 	exit_must_be_taken = true;
     }
 
+  /* Turn LE_EXPR to LT_EXPR, registering required assumptions.  */
+  if (code == LE_EXPR)
+    {
+      if (!number_of_iterations_le_to_lt (type, iv0, iv1, niter,
+					  exit_must_be_taken))
+	return false;
+      code = LT_EXPR;
+    }
+
   /* We can handle cases which neither of the sides of the comparison is
      invariant:
 
@@ -1928,15 +1945,21 @@ number_of_iterations_cond (class loop *loop,
 	       pointer compares, we also know the resulting IV does not
 	       overflow.  */
 	    ;
-	  else if (code != NE_EXPR)
-	    return false;
 	  else
+	    /* For LT_EXPR we register the assumptions necessary for
+	       the adjusted IV0 to not overflow.  */
 	    iv0->no_overflow = false;
 	}
 
       iv0->step = step;
       iv1->step = build_int_cst (step_type, 0);
       iv1->no_overflow = true;
+      if (code == LT_EXPR && !iv0->no_overflow)
+	{
+	  if (!assert_no_overflow_lt (type, iv0, iv1, niter, step))
+	    return false;
+	  /* We will now have iv0->no_overflow == true again.  */
+	}
     }
 
   /* If the result of the comparison is a constant,  the loop is weird.  More
@@ -1945,17 +1968,6 @@ number_of_iterations_cond (class loop *loop,
   if (integer_zerop (iv0->step) && integer_zerop (iv1->step))
     return false;
 
-  /* If the loop exits immediately, there is nothing to do.  */
-  tree tem = fold_binary (code, boolean_type_node, iv0->base, iv1->base);
-  if (tem && integer_zerop (tem))
-    {
-      if (!every_iteration)
-	return false;
-      niter->niter = build_int_cst (unsigned_type_for (type), 0);
-      niter->max = 0;
-      return true;
-    }
-
   /* OK, now we know we have a senseful loop.  Handle several cases, depending
      on what comparison operator is used.  */
   bound_difference (loop, iv1->base, iv0->base, &bnds);
@@ -1994,11 +2006,6 @@ number_of_iterations_cond (class loop *loop,
 				     exit_must_be_taken, &bnds);
       break;
 
-    case LE_EXPR:
-      ret = number_of_iterations_le (loop, type, iv0, iv1, niter,
-				     exit_must_be_taken, &bnds);
-      break;
-
     default:
       gcc_unreachable ();
     }
-- 
2.31.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] Check negative combined step
  2022-01-25 12:21             ` Richard Biener
@ 2022-01-27  9:12               ` Jiufu Guo
  2022-01-27  9:45                 ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Jiufu Guo @ 2022-01-27  9:12 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, amker.cheng, wschmidt, segher, dje.gcc, jlaw

Hi Richard,

Richard Biener <rguenther@suse.de> writes:

> On Tue, 25 Jan 2022, Richard Biener wrote:
>
>> On Tue, 25 Jan 2022, Jiufu Guo wrote:
>>
...
>> > >
>> > > The point is that we may not change the iteration number at which
>> > > overflow occurs since that alters the result of the < compare.
>> > > Only if we know there is no overflow with the present expression
>> > > during the loop evaluation we can do the transform and then only
>> > > if we do not introduce overflow.
>> > Exactly, this is also what I mean :)
>> > 
>> > >
>> > > We are basically doing the transform that fold_comparison
>> > > in fold-const.cc does:
>> > >
>> > >   /* Transform comparisons of the form X +- C1 CMP Y +- C2 to
>> > >      X CMP Y +- C2 +- C1 for signed X, Y.  This is valid if
>> > >      the resulting offset is smaller in absolute value than the
>> > >      original one and has the same sign.  */
>> > >   if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg0))
>> > >       && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))
>> > > ...
>> > >
>> > This transform seems not the scenario which we are care about in
>> > number_of_iterations_cond.
>> > For example, 'X + 1 < Y + 4' ==> 'X < Y + 3' would be correct if
>> > no overflow happen.
>> > But for loop, the niter for '{X, 1} < {Y, 4}' would be totally
>> > different with niter for '{X, 0} < {Y, 3}'.
>> > for (iv0 = X, iv1 = Y; iv0 < iv1; iv0 += 1, iv1 += 4)
>> >    in this loop, iv1 walks to overflow faster, step is 4.
>> > vs.
>> > for (iv0 = X, iv1 = Y; iv0 < iv1; iv1 += 3) (iv1 overflow slow)
>> >    in this loop, iv1 overflows slower, step is 3.
>> 
>> Huh?  But we are _exactly_ doing this, analyzing {X, + 1} < {Y, + 4}
>> as X < {Y, + 3} (well, OK, we're only trying {X, -3} which now
>> fails - we should try the other way around as well).
>> 
>> > Actually, the transformation 'X + 1 < Y + 4' ==> 'X < Y + 3',
>> > may not always correct.  e.g. for below code, X=6, and Y=2147483645
>> > it may output "b0 + 1 < b1 + 4 is true".
>> 
>> But Y + 4 overflows with 2147483645 so X + 1 < Y + 4 invokes UB
>> and we can ignore this situation.
>> 
...
>> > >> > Or it may also ok if we can compute an assumption, under which
>> > >> > all three ivs are not overflowed/wrapped.
>> > >> >
>> > >> >> only guaranteed if the absolute value of C1 - C2 is smaller than
>> > >> >> that of C1 and if it has the same sign.
>> > >> > I'm thinking this in another way:
>> > >> > When trying to do this transform in number_of_iterations_cond,
>> > >> > GT/GE is inverted to LT/LE, then the compare code would be:
>> > >> > LT/LE or NE.
>> > >> >
>> > >> > For LT/LE, like {X, C1} < {Y, C2}, we can look it as iv0 is
>> > >> > chasing iv1.  We would able to assume X < Y (may_be_zero would
>> > >> > be set later via number_of_iterations_lt/le).
>> > >> > 1. If C1 < C2, iv0 can never catch up iv1. For examples:
>> > >> > {X, 1} < {Y, 2}; {X, -2} < {Y, -1}; {X, -2} < {Y, 1}.
>> > >> > And there must be at least one overflow/wrap in iv0,iv1, or iv.
>> > >> > This indicates, if the sign of (C1 - C1) is negative, then the
>> > >> > transform would be incorrect.
>> > >> > 2. If C1 > C2, we still need to make sure all the ivs (iv0,
>> > >> > iv1 and combined iv) are not wrapped.
>> > >> > For C2 > 0, {Y,C2} should not cross MAX before {X, C1} catch up.
>> > >> >    the assumption may like : (MAX-Y)/C2 > (Y-X)/(C1-C1)
>> > >> There is still some cases: iv0 step is too large, then iv0 wraps
>> > >> first, e.g. {MAX-5, 10} < {MAX-3, 1}. For this, the assumption
>> > >> would need to and with (MAX-X)/C1 > (Y-X)/(C1-C1).
>> > >> 
>> > >> > For C1 < 0, {X,C1} should not down cross MIN
>> > >> >    the assumption may like : (X-MIN)/-C1 > (Y-X)/(C1-C1)
>> > >>   Also add the assumption: (Y-MIN)/-C2 > (Y-X)/(C1-C1)
>> > >> 
>> > >> > For C1 > 0 and C2 < 0, iv0 and iv1 are walking to each other,
>> > >> > it would be almost safe.
>> > >> For this case, we may still add the assumption to avoid wraping
>> > >> at the first iteration.
>> > >> 
>> > >> BR,
>> > >> Jiufu
>> > >> 
>> > >> >
>> > >> > For NE, it seems more interesting. The transformation depends
>> > >> > on 3 things: 1. the relation between X and Y; 2 the sign
>> > >> > of (C1-C2); 3. if iv0 and iv1 can be equal finally.
>> > >> > The 3rd one may be more special.
>> > >> > The good news is, number_of_iterations_ne seems able to handle NE.
>> > >> >
>> > >> >>
>> > >> >> With the same reasoning we then know the new IV0 doesn't overflow.
>> > >> >>
>> > >> >> So something like the following.  IIRC I've proposed sth similar
>> > >> >> a while back.  I'm going to give it some testing, collecting
>> > >> >> testcases from related PRs.
>> > >> >>
>> > >> >> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
>> > >> >> index b767056aeb0..74fa4f66ee2 100644
>> > ...
>> > >> >> +      if (TREE_CODE (step) != INTEGER_CST
>> > >> >> +         || !iv0->no_overflow || !iv1->no_overflow)
>> > >> >> +       {
>> > >> > I was also trying to leverage no_overflow of iv0 and iv1. While it seems
>> > >> > the computation logic of no_overflow is related to the type of IV.  If the
>> > >> > type of IV is signed, the C semantics may be used, overflow in signed
>> > >> > IV are treated UB, and then no_overflow would be true.
>> > >> >
>> > >> > For unsigned IV, no_overflow would be false, even for the cases which
>> > >> > looks like:
>> > >> > "{10, 2} < {20, 1}", which would be ok to compute niter.
>> > >
>> > > IIRC no_overflow is determined by SCEV which might also use niter
>> > > analysis.  For the case of {10, +2} < {20, +1} there is no need to
>> > > compute it as {10, +1} < 20 and we hopefully deal with this in
>> > > other code paths (in fact with base and step all constant we
>> > > can simply solve the linear equation for 'n' - maybe that's a
>> > > capability we should add to number_of_iterations_cond).
>> > 
>> > Thanks for point this out.
>> > Yes, for const base(s) and step(s), we have other code path
>> > to deal with (e.g. loop_niter_by_eval).
>> > 
>> > For {10, +2}, what I really mean is about the no_overflow iv(s)
>> > on unsigned.  Sorry the misleading words.
>> > For no_overflow, it is set at some places, including
>> > number_of_iterations_xxx. :),  Before number_of_iterations_xxx,
>> > no_overflow could be calculated in simple_iv_with_niters:
>> > ```c
>> >   iv->no_overflow = !folded_casts && nowrap_type_p (type);
>> > ```
>> > nowrap_type_p checks if overflow is UB on type through macro
>> > TYPE_OVERFLOW_UNDEFINED.  For signed, it is UB; for unsigned
>> > it is different.
>> 
>> Yes.
>> 
>> > For example as below code, no_overflow is set as false for iv0
>> > and iv1, and then niter was not computed quickly.
>> > ```c
>> > unsigned __attribute__ ((noinline))
>> > foo (unsigned b0, unsigned b1)
>> > {
>> >   unsigned n = 0;
>> >   for (; b0 < b1; b0 += 2, b1 += 1)
>> >     n++;
>> >   return n;
>> > }
>> 
>> There's no difference to before my patch of course.  There are
>> some code paths in number_of_iterations_lt that use assumptions
>> to prove the combined IV does not wrap, just for this case
>> we give up too early.  I'm currently looking at rectifying this
>> with small incremental changes.
>
> Like the one below which should handle the PR81196 case when
> integer IVs are used.  I suspect we can do something similar
> for IVs where we do not know the original overflow status
> (we have to register assumptions for both original IVs _and_
> the new adjusted one).
Agree with you, that without accurate overflow status, it may
not safe to combine the two iv steps directly.

For those cases where integer IVs are used, one kind of case,
as shown in your patch, is IV0 increasing and IV1 descreasing.
looks like "{b0, +C1} < {b1, -C2}".  For this kind of cases,
because "abs(step) > abs(iv0->step)", so more assumptions are
needed.  Maybe simple assumption is ok for most cases:
`(b0 > MIN + C2) && (b1 < MAX - C1 - C2).`
'b0 > MIN + C2' would help to make sure IV1 does not overflow,
and 'b1 < MAX - C1 - C2' guard that IV0 and combined IV do not
walk cross MAX. 

In a previous patch, I tried to add more assumptions. While
the assumption contains a few expensive expressions(e.g. DIV),
because it tries to handle more cases accurately even for
unsigned integer whose 'no_overflow' information is not known.

>
> And as noted we can try rewriting to the other IV.
>
> I do wonder how important these are and what improvements we need
> to include in backports (I think we do want to fix the original issue
> on branches).
I also have the feeling that combining two IVs are rare in cases.
It may not benefit benchmarks too much.  It seems this kind of
case may be not be used widely and is not on a hot path in spec.

Thanks for your thoughts and sugguestions!

BR,
Jiufu
>
> Richard.
>
> From f46855709dd45603d18f2dcd8403f5b060c164f0 Mon Sep 17 00:00:00 2001
> From: Richard Biener <rguenther@suse.de>
> Date: Tue, 25 Jan 2022 13:11:57 +0100
> Subject: [PATCH] tree-optimization/104214 - improve IV analysis with integer
>  IV compares
> To: gcc-patches@gcc.gnu.org
>
> For rewriting BASE0 + STEP0 cmp BASE1 + STEP1 as
> BASE0 + STEP0 - STEP1 cmp BASE1 for signed integers we can use
> niter assumptions to ensure that BASE0 + STEP0 - STEP1 does not
> overflow instead of giving up when we cannot prove this
> statically.
>
> We can use the existing assert_no_overflow_lt for this and make it
> efficient for LE_EXPR also by rewriting LE_EXPR IV compares to LT_EXPR earlier.
>
> 2022-01-25  Richard Biener  <rguenther@suse.de>
>
> 	PR tree-optimization/104214
> 	* tree-ssa-loop-niter.cc (number_of_iterations_le): Refactor
> 	into ...
> 	(number_of_iterations_le_to_lt): ... this, just doing
> 	the iv->base rewriting and assumption registering.
> 	(number_of_iterations_cond): Rewrite LE_EXPR into LT_EXPR
> 	earlier.  When rewriting BASE0 + STEP0 cmp BASE1 + STEP1
> 	as BASE0 + STEP0 - STEP1 cmp BASE1 would fail for LT_EXPR
> 	because of possible overflow register assumptions instead.
>
> 	* gcc.dg/vect/pr81196-3.c: New testcase variant.
> ---
>  gcc/testsuite/gcc.dg/vect/pr81196-3.c | 12 +++++
>  gcc/tree-ssa-loop-niter.cc            | 67 +++++++++++++++------------
>  2 files changed, 49 insertions(+), 30 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr81196-3.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr81196-3.c b/gcc/testsuite/gcc.dg/vect/pr81196-3.c
> new file mode 100644
> index 00000000000..bcdd815dc5d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr81196-3.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +
> +void b (int *p, int j, int k)
> +{
> +  p = (int *)__builtin_assume_aligned(p, __BIGGEST_ALIGNMENT__);
> +  int i = 0;
> +  for(; j < k; ++j, --k)
> +    p[i++] = 1;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> index d33095b8e03..b5f3d4b4a8d 100644
> --- a/gcc/tree-ssa-loop-niter.cc
> +++ b/gcc/tree-ssa-loop-niter.cc
> @@ -1721,17 +1721,17 @@ number_of_iterations_lt (class loop *loop, tree type, affine_iv *iv0,
>    return true;
>  }
>  
> -/* Determines number of iterations of loop whose ending condition
> -   is IV0 <= IV1.  TYPE is the type of the iv.  The number of
> -   iterations is stored to NITER.  EXIT_MUST_BE_TAKEN is true if
> -   we know that this condition must eventually become false (we derived this
> +/* Rewrite the IV0 <= IV1 condition to IV0 < IV1 by adjusting one of
> +   the IVs bases.  TYPE is the type of the iv.  Assumptions are
> +   recorded to NITER.  EXIT_MUST_BE_TAKEN is true if we know that this
> +   condition must eventually become false (we derived this
>     earlier, and possibly set NITER->assumptions to make sure this
> -   is the case).  BNDS bounds the difference IV1->base - IV0->base.  */
> +   is the case).  */
>  
>  static bool
> -number_of_iterations_le (class loop *loop, tree type, affine_iv *iv0,
> -			 affine_iv *iv1, class tree_niter_desc *niter,
> -			 bool exit_must_be_taken, bounds *bnds)
> +number_of_iterations_le_to_lt (tree type, affine_iv *iv0, affine_iv *iv1,
> +			       class tree_niter_desc *niter,
> +			       bool exit_must_be_taken)
>  {
>    tree assumption;
>    tree type1 = type;
> @@ -1777,10 +1777,7 @@ number_of_iterations_le (class loop *loop, tree type, affine_iv *iv0,
>      iv0->base = fold_build2 (MINUS_EXPR, type1,
>  			     iv0->base, build_int_cst (type1, 1));
>  
> -  bounds_add (bnds, 1, type1);
> -
> -  return number_of_iterations_lt (loop, type, iv0, iv1, niter, exit_must_be_taken,
> -				  bnds);
> +  return true;
>  }
>  
>  /* Dumps description of affine induction variable IV to FILE.  */
> @@ -1862,6 +1859,17 @@ number_of_iterations_cond (class loop *loop,
>        code = swap_tree_comparison (code);
>      }
>  
> +  /* If the loop exits immediately, there is nothing to do.  */
> +  tree tem = fold_binary (code, boolean_type_node, iv0->base, iv1->base);
> +  if (tem && integer_zerop (tem))
> +    {
> +      if (!every_iteration)
> +	return false;
> +      niter->niter = build_int_cst (unsigned_type_for (type), 0);
> +      niter->max = 0;
> +      return true;
> +    }
> +
>    if (POINTER_TYPE_P (type))
>      {
>        /* Comparison of pointers is undefined unless both iv0 and iv1 point
> @@ -1884,6 +1892,15 @@ number_of_iterations_cond (class loop *loop,
>  	exit_must_be_taken = true;
>      }
>  
> +  /* Turn LE_EXPR to LT_EXPR, registering required assumptions.  */
> +  if (code == LE_EXPR)
> +    {
> +      if (!number_of_iterations_le_to_lt (type, iv0, iv1, niter,
> +					  exit_must_be_taken))
> +	return false;
> +      code = LT_EXPR;
> +    }
> +
>    /* We can handle cases which neither of the sides of the comparison is
>       invariant:
>  
> @@ -1928,15 +1945,21 @@ number_of_iterations_cond (class loop *loop,
>  	       pointer compares, we also know the resulting IV does not
>  	       overflow.  */
>  	    ;
> -	  else if (code != NE_EXPR)
> -	    return false;
If two lines are removed, some cases in original PR may not work.
And as you also mentioned above, we have to add more assumptions, then.
I saw you are try to use  number_of_iterations_le_to_lt and
assert_no_overflow_lt.  I still think it would be ok to use
negative step on combined IV to indicate the IV combining
is invalid.

Thanks a gain for your comments!

BR,
Jiufu

>  	  else
> +	    /* For LT_EXPR we register the assumptions necessary for
> +	       the adjusted IV0 to not overflow.  */
>  	    iv0->no_overflow = false;
>  	}
>  
>        iv0->step = step;
>        iv1->step = build_int_cst (step_type, 0);
>        iv1->no_overflow = true;
> +      if (code == LT_EXPR && !iv0->no_overflow)
> +	{
> +	  if (!assert_no_overflow_lt (type, iv0, iv1, niter, step))
> +	    return false;
> +	  /* We will now have iv0->no_overflow == true again.  */
> +	}
>      }
>  
>    /* If the result of the comparison is a constant,  the loop is weird.  More
> @@ -1945,17 +1968,6 @@ number_of_iterations_cond (class loop *loop,
>    if (integer_zerop (iv0->step) && integer_zerop (iv1->step))
>      return false;
>  
> -  /* If the loop exits immediately, there is nothing to do.  */
> -  tree tem = fold_binary (code, boolean_type_node, iv0->base, iv1->base);
> -  if (tem && integer_zerop (tem))
> -    {
> -      if (!every_iteration)
> -	return false;
> -      niter->niter = build_int_cst (unsigned_type_for (type), 0);
> -      niter->max = 0;
> -      return true;
> -    }
> -
>    /* OK, now we know we have a senseful loop.  Handle several cases, depending
>       on what comparison operator is used.  */
>    bound_difference (loop, iv1->base, iv0->base, &bnds);
> @@ -1994,11 +2006,6 @@ number_of_iterations_cond (class loop *loop,
>  				     exit_must_be_taken, &bnds);
>        break;
>  
> -    case LE_EXPR:
> -      ret = number_of_iterations_le (loop, type, iv0, iv1, niter,
> -				     exit_must_be_taken, &bnds);
> -      break;
> -
>      default:
>        gcc_unreachable ();
>      }

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] Check negative combined step
  2022-01-27  9:12               ` Jiufu Guo
@ 2022-01-27  9:45                 ` Richard Biener
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Biener @ 2022-01-27  9:45 UTC (permalink / raw)
  To: Jiufu Guo; +Cc: gcc-patches, amker.cheng, wschmidt, segher, dje.gcc, jlaw

On Thu, 27 Jan 2022, Jiufu Guo wrote:

> Hi Richard,
> 
> Richard Biener <rguenther@suse.de> writes:
> 
> > On Tue, 25 Jan 2022, Richard Biener wrote:
> >
> >> On Tue, 25 Jan 2022, Jiufu Guo wrote:
> >>
> ...
> >> > >
> >> > > The point is that we may not change the iteration number at which
> >> > > overflow occurs since that alters the result of the < compare.
> >> > > Only if we know there is no overflow with the present expression
> >> > > during the loop evaluation we can do the transform and then only
> >> > > if we do not introduce overflow.
> >> > Exactly, this is also what I mean :)
> >> > 
> >> > >
> >> > > We are basically doing the transform that fold_comparison
> >> > > in fold-const.cc does:
> >> > >
> >> > >   /* Transform comparisons of the form X +- C1 CMP Y +- C2 to
> >> > >      X CMP Y +- C2 +- C1 for signed X, Y.  This is valid if
> >> > >      the resulting offset is smaller in absolute value than the
> >> > >      original one and has the same sign.  */
> >> > >   if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg0))
> >> > >       && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))
> >> > > ...
> >> > >
> >> > This transform seems not the scenario which we are care about in
> >> > number_of_iterations_cond.
> >> > For example, 'X + 1 < Y + 4' ==> 'X < Y + 3' would be correct if
> >> > no overflow happen.
> >> > But for loop, the niter for '{X, 1} < {Y, 4}' would be totally
> >> > different with niter for '{X, 0} < {Y, 3}'.
> >> > for (iv0 = X, iv1 = Y; iv0 < iv1; iv0 += 1, iv1 += 4)
> >> >    in this loop, iv1 walks to overflow faster, step is 4.
> >> > vs.
> >> > for (iv0 = X, iv1 = Y; iv0 < iv1; iv1 += 3) (iv1 overflow slow)
> >> >    in this loop, iv1 overflows slower, step is 3.
> >> 
> >> Huh?  But we are _exactly_ doing this, analyzing {X, + 1} < {Y, + 4}
> >> as X < {Y, + 3} (well, OK, we're only trying {X, -3} which now
> >> fails - we should try the other way around as well).
> >> 
> >> > Actually, the transformation 'X + 1 < Y + 4' ==> 'X < Y + 3',
> >> > may not always correct.  e.g. for below code, X=6, and Y=2147483645
> >> > it may output "b0 + 1 < b1 + 4 is true".
> >> 
> >> But Y + 4 overflows with 2147483645 so X + 1 < Y + 4 invokes UB
> >> and we can ignore this situation.
> >> 
> ...
> >> > >> > Or it may also ok if we can compute an assumption, under which
> >> > >> > all three ivs are not overflowed/wrapped.
> >> > >> >
> >> > >> >> only guaranteed if the absolute value of C1 - C2 is smaller than
> >> > >> >> that of C1 and if it has the same sign.
> >> > >> > I'm thinking this in another way:
> >> > >> > When trying to do this transform in number_of_iterations_cond,
> >> > >> > GT/GE is inverted to LT/LE, then the compare code would be:
> >> > >> > LT/LE or NE.
> >> > >> >
> >> > >> > For LT/LE, like {X, C1} < {Y, C2}, we can look it as iv0 is
> >> > >> > chasing iv1.  We would able to assume X < Y (may_be_zero would
> >> > >> > be set later via number_of_iterations_lt/le).
> >> > >> > 1. If C1 < C2, iv0 can never catch up iv1. For examples:
> >> > >> > {X, 1} < {Y, 2}; {X, -2} < {Y, -1}; {X, -2} < {Y, 1}.
> >> > >> > And there must be at least one overflow/wrap in iv0,iv1, or iv.
> >> > >> > This indicates, if the sign of (C1 - C1) is negative, then the
> >> > >> > transform would be incorrect.
> >> > >> > 2. If C1 > C2, we still need to make sure all the ivs (iv0,
> >> > >> > iv1 and combined iv) are not wrapped.
> >> > >> > For C2 > 0, {Y,C2} should not cross MAX before {X, C1} catch up.
> >> > >> >    the assumption may like : (MAX-Y)/C2 > (Y-X)/(C1-C1)
> >> > >> There is still some cases: iv0 step is too large, then iv0 wraps
> >> > >> first, e.g. {MAX-5, 10} < {MAX-3, 1}. For this, the assumption
> >> > >> would need to and with (MAX-X)/C1 > (Y-X)/(C1-C1).
> >> > >> 
> >> > >> > For C1 < 0, {X,C1} should not down cross MIN
> >> > >> >    the assumption may like : (X-MIN)/-C1 > (Y-X)/(C1-C1)
> >> > >>   Also add the assumption: (Y-MIN)/-C2 > (Y-X)/(C1-C1)
> >> > >> 
> >> > >> > For C1 > 0 and C2 < 0, iv0 and iv1 are walking to each other,
> >> > >> > it would be almost safe.
> >> > >> For this case, we may still add the assumption to avoid wraping
> >> > >> at the first iteration.
> >> > >> 
> >> > >> BR,
> >> > >> Jiufu
> >> > >> 
> >> > >> >
> >> > >> > For NE, it seems more interesting. The transformation depends
> >> > >> > on 3 things: 1. the relation between X and Y; 2 the sign
> >> > >> > of (C1-C2); 3. if iv0 and iv1 can be equal finally.
> >> > >> > The 3rd one may be more special.
> >> > >> > The good news is, number_of_iterations_ne seems able to handle NE.
> >> > >> >
> >> > >> >>
> >> > >> >> With the same reasoning we then know the new IV0 doesn't overflow.
> >> > >> >>
> >> > >> >> So something like the following.  IIRC I've proposed sth similar
> >> > >> >> a while back.  I'm going to give it some testing, collecting
> >> > >> >> testcases from related PRs.
> >> > >> >>
> >> > >> >> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> >> > >> >> index b767056aeb0..74fa4f66ee2 100644
> >> > ...
> >> > >> >> +      if (TREE_CODE (step) != INTEGER_CST
> >> > >> >> +         || !iv0->no_overflow || !iv1->no_overflow)
> >> > >> >> +       {
> >> > >> > I was also trying to leverage no_overflow of iv0 and iv1. While it seems
> >> > >> > the computation logic of no_overflow is related to the type of IV.  If the
> >> > >> > type of IV is signed, the C semantics may be used, overflow in signed
> >> > >> > IV are treated UB, and then no_overflow would be true.
> >> > >> >
> >> > >> > For unsigned IV, no_overflow would be false, even for the cases which
> >> > >> > looks like:
> >> > >> > "{10, 2} < {20, 1}", which would be ok to compute niter.
> >> > >
> >> > > IIRC no_overflow is determined by SCEV which might also use niter
> >> > > analysis.  For the case of {10, +2} < {20, +1} there is no need to
> >> > > compute it as {10, +1} < 20 and we hopefully deal with this in
> >> > > other code paths (in fact with base and step all constant we
> >> > > can simply solve the linear equation for 'n' - maybe that's a
> >> > > capability we should add to number_of_iterations_cond).
> >> > 
> >> > Thanks for point this out.
> >> > Yes, for const base(s) and step(s), we have other code path
> >> > to deal with (e.g. loop_niter_by_eval).
> >> > 
> >> > For {10, +2}, what I really mean is about the no_overflow iv(s)
> >> > on unsigned.  Sorry the misleading words.
> >> > For no_overflow, it is set at some places, including
> >> > number_of_iterations_xxx. :),  Before number_of_iterations_xxx,
> >> > no_overflow could be calculated in simple_iv_with_niters:
> >> > ```c
> >> >   iv->no_overflow = !folded_casts && nowrap_type_p (type);
> >> > ```
> >> > nowrap_type_p checks if overflow is UB on type through macro
> >> > TYPE_OVERFLOW_UNDEFINED.  For signed, it is UB; for unsigned
> >> > it is different.
> >> 
> >> Yes.
> >> 
> >> > For example as below code, no_overflow is set as false for iv0
> >> > and iv1, and then niter was not computed quickly.
> >> > ```c
> >> > unsigned __attribute__ ((noinline))
> >> > foo (unsigned b0, unsigned b1)
> >> > {
> >> >   unsigned n = 0;
> >> >   for (; b0 < b1; b0 += 2, b1 += 1)
> >> >     n++;
> >> >   return n;
> >> > }
> >> 
> >> There's no difference to before my patch of course.  There are
> >> some code paths in number_of_iterations_lt that use assumptions
> >> to prove the combined IV does not wrap, just for this case
> >> we give up too early.  I'm currently looking at rectifying this
> >> with small incremental changes.
> >
> > Like the one below which should handle the PR81196 case when
> > integer IVs are used.  I suspect we can do something similar
> > for IVs where we do not know the original overflow status
> > (we have to register assumptions for both original IVs _and_
> > the new adjusted one).
> Agree with you, that without accurate overflow status, it may
> not safe to combine the two iv steps directly.
> 
> For those cases where integer IVs are used, one kind of case,
> as shown in your patch, is IV0 increasing and IV1 descreasing.
> looks like "{b0, +C1} < {b1, -C2}".  For this kind of cases,
> because "abs(step) > abs(iv0->step)", so more assumptions are
> needed.  Maybe simple assumption is ok for most cases:
> `(b0 > MIN + C2) && (b1 < MAX - C1 - C2).`
> 'b0 > MIN + C2' would help to make sure IV1 does not overflow,
> and 'b1 < MAX - C1 - C2' guard that IV0 and combined IV do not
> walk cross MAX. 
> 
> In a previous patch, I tried to add more assumptions. While
> the assumption contains a few expensive expressions(e.g. DIV),
> because it tries to handle more cases accurately even for
> unsigned integer whose 'no_overflow' information is not known.

Yes, I've seen this.  Btw, my attempt failed and will regress
the miscompile testcases again - I have not yet analyzed why.

> >
> > And as noted we can try rewriting to the other IV.
> >
> > I do wonder how important these are and what improvements we need
> > to include in backports (I think we do want to fix the original issue
> > on branches).
> I also have the feeling that combining two IVs are rare in cases.
> It may not benefit benchmarks too much.  It seems this kind of
> case may be not be used widely and is not on a hot path in spec.

Indeed.  So my plan is to have the current state settle a bit
(you've seen I've moved over to the other niter correctness issue)
to see whether there's any further bad fallout also to assess
safeness of backporting.  For the next stage1 we can see how to
improve these cases, I'd also like to add some statistics so we
can see how many loops we have where we are not able to compute
a symbolic number of iterations (and why), possibly at loop_init
pass time.

Richard.

> Thanks for your thoughts and sugguestions!
> 
> BR,
> Jiufu
> >
> > Richard.
> >
> > From f46855709dd45603d18f2dcd8403f5b060c164f0 Mon Sep 17 00:00:00 2001
> > From: Richard Biener <rguenther@suse.de>
> > Date: Tue, 25 Jan 2022 13:11:57 +0100
> > Subject: [PATCH] tree-optimization/104214 - improve IV analysis with integer
> >  IV compares
> > To: gcc-patches@gcc.gnu.org
> >
> > For rewriting BASE0 + STEP0 cmp BASE1 + STEP1 as
> > BASE0 + STEP0 - STEP1 cmp BASE1 for signed integers we can use
> > niter assumptions to ensure that BASE0 + STEP0 - STEP1 does not
> > overflow instead of giving up when we cannot prove this
> > statically.
> >
> > We can use the existing assert_no_overflow_lt for this and make it
> > efficient for LE_EXPR also by rewriting LE_EXPR IV compares to LT_EXPR earlier.
> >
> > 2022-01-25  Richard Biener  <rguenther@suse.de>
> >
> > 	PR tree-optimization/104214
> > 	* tree-ssa-loop-niter.cc (number_of_iterations_le): Refactor
> > 	into ...
> > 	(number_of_iterations_le_to_lt): ... this, just doing
> > 	the iv->base rewriting and assumption registering.
> > 	(number_of_iterations_cond): Rewrite LE_EXPR into LT_EXPR
> > 	earlier.  When rewriting BASE0 + STEP0 cmp BASE1 + STEP1
> > 	as BASE0 + STEP0 - STEP1 cmp BASE1 would fail for LT_EXPR
> > 	because of possible overflow register assumptions instead.
> >
> > 	* gcc.dg/vect/pr81196-3.c: New testcase variant.
> > ---
> >  gcc/testsuite/gcc.dg/vect/pr81196-3.c | 12 +++++
> >  gcc/tree-ssa-loop-niter.cc            | 67 +++++++++++++++------------
> >  2 files changed, 49 insertions(+), 30 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/pr81196-3.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/pr81196-3.c b/gcc/testsuite/gcc.dg/vect/pr81196-3.c
> > new file mode 100644
> > index 00000000000..bcdd815dc5d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/pr81196-3.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +void b (int *p, int j, int k)
> > +{
> > +  p = (int *)__builtin_assume_aligned(p, __BIGGEST_ALIGNMENT__);
> > +  int i = 0;
> > +  for(; j < k; ++j, --k)
> > +    p[i++] = 1;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> > diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> > index d33095b8e03..b5f3d4b4a8d 100644
> > --- a/gcc/tree-ssa-loop-niter.cc
> > +++ b/gcc/tree-ssa-loop-niter.cc
> > @@ -1721,17 +1721,17 @@ number_of_iterations_lt (class loop *loop, tree type, affine_iv *iv0,
> >    return true;
> >  }
> >  
> > -/* Determines number of iterations of loop whose ending condition
> > -   is IV0 <= IV1.  TYPE is the type of the iv.  The number of
> > -   iterations is stored to NITER.  EXIT_MUST_BE_TAKEN is true if
> > -   we know that this condition must eventually become false (we derived this
> > +/* Rewrite the IV0 <= IV1 condition to IV0 < IV1 by adjusting one of
> > +   the IVs bases.  TYPE is the type of the iv.  Assumptions are
> > +   recorded to NITER.  EXIT_MUST_BE_TAKEN is true if we know that this
> > +   condition must eventually become false (we derived this
> >     earlier, and possibly set NITER->assumptions to make sure this
> > -   is the case).  BNDS bounds the difference IV1->base - IV0->base.  */
> > +   is the case).  */
> >  
> >  static bool
> > -number_of_iterations_le (class loop *loop, tree type, affine_iv *iv0,
> > -			 affine_iv *iv1, class tree_niter_desc *niter,
> > -			 bool exit_must_be_taken, bounds *bnds)
> > +number_of_iterations_le_to_lt (tree type, affine_iv *iv0, affine_iv *iv1,
> > +			       class tree_niter_desc *niter,
> > +			       bool exit_must_be_taken)
> >  {
> >    tree assumption;
> >    tree type1 = type;
> > @@ -1777,10 +1777,7 @@ number_of_iterations_le (class loop *loop, tree type, affine_iv *iv0,
> >      iv0->base = fold_build2 (MINUS_EXPR, type1,
> >  			     iv0->base, build_int_cst (type1, 1));
> >  
> > -  bounds_add (bnds, 1, type1);
> > -
> > -  return number_of_iterations_lt (loop, type, iv0, iv1, niter, exit_must_be_taken,
> > -				  bnds);
> > +  return true;
> >  }
> >  
> >  /* Dumps description of affine induction variable IV to FILE.  */
> > @@ -1862,6 +1859,17 @@ number_of_iterations_cond (class loop *loop,
> >        code = swap_tree_comparison (code);
> >      }
> >  
> > +  /* If the loop exits immediately, there is nothing to do.  */
> > +  tree tem = fold_binary (code, boolean_type_node, iv0->base, iv1->base);
> > +  if (tem && integer_zerop (tem))
> > +    {
> > +      if (!every_iteration)
> > +	return false;
> > +      niter->niter = build_int_cst (unsigned_type_for (type), 0);
> > +      niter->max = 0;
> > +      return true;
> > +    }
> > +
> >    if (POINTER_TYPE_P (type))
> >      {
> >        /* Comparison of pointers is undefined unless both iv0 and iv1 point
> > @@ -1884,6 +1892,15 @@ number_of_iterations_cond (class loop *loop,
> >  	exit_must_be_taken = true;
> >      }
> >  
> > +  /* Turn LE_EXPR to LT_EXPR, registering required assumptions.  */
> > +  if (code == LE_EXPR)
> > +    {
> > +      if (!number_of_iterations_le_to_lt (type, iv0, iv1, niter,
> > +					  exit_must_be_taken))
> > +	return false;
> > +      code = LT_EXPR;
> > +    }
> > +
> >    /* We can handle cases which neither of the sides of the comparison is
> >       invariant:
> >  
> > @@ -1928,15 +1945,21 @@ number_of_iterations_cond (class loop *loop,
> >  	       pointer compares, we also know the resulting IV does not
> >  	       overflow.  */
> >  	    ;
> > -	  else if (code != NE_EXPR)
> > -	    return false;
> If two lines are removed, some cases in original PR may not work.
> And as you also mentioned above, we have to add more assumptions, then.
> I saw you are try to use  number_of_iterations_le_to_lt and
> assert_no_overflow_lt.  I still think it would be ok to use
> negative step on combined IV to indicate the IV combining
> is invalid.
> 
> Thanks a gain for your comments!
> 
> BR,
> Jiufu
> 
> >  	  else
> > +	    /* For LT_EXPR we register the assumptions necessary for
> > +	       the adjusted IV0 to not overflow.  */
> >  	    iv0->no_overflow = false;
> >  	}
> >  
> >        iv0->step = step;
> >        iv1->step = build_int_cst (step_type, 0);
> >        iv1->no_overflow = true;
> > +      if (code == LT_EXPR && !iv0->no_overflow)
> > +	{
> > +	  if (!assert_no_overflow_lt (type, iv0, iv1, niter, step))
> > +	    return false;
> > +	  /* We will now have iv0->no_overflow == true again.  */
> > +	}
> >      }
> >  
> >    /* If the result of the comparison is a constant,  the loop is weird.  More
> > @@ -1945,17 +1968,6 @@ number_of_iterations_cond (class loop *loop,
> >    if (integer_zerop (iv0->step) && integer_zerop (iv1->step))
> >      return false;
> >  
> > -  /* If the loop exits immediately, there is nothing to do.  */
> > -  tree tem = fold_binary (code, boolean_type_node, iv0->base, iv1->base);
> > -  if (tem && integer_zerop (tem))
> > -    {
> > -      if (!every_iteration)
> > -	return false;
> > -      niter->niter = build_int_cst (unsigned_type_for (type), 0);
> > -      niter->max = 0;
> > -      return true;
> > -    }
> > -
> >    /* OK, now we know we have a senseful loop.  Handle several cases, depending
> >       on what comparison operator is used.  */
> >    bound_difference (loop, iv1->base, iv0->base, &bnds);
> > @@ -1994,11 +2006,6 @@ number_of_iterations_cond (class loop *loop,
> >  				     exit_must_be_taken, &bnds);
> >        break;
> >  
> > -    case LE_EXPR:
> > -      ret = number_of_iterations_le (loop, type, iv0, iv1, niter,
> > -				     exit_must_be_taken, &bnds);
> > -      break;
> > -
> >      default:
> >        gcc_unreachable ();
> >      }
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-01-27  9:45 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-13  1:48 [PATCH 1/2] Check negative combined step Jiufu Guo
2022-01-13  1:48 ` [PATCH 2/2] Add assumption combining iv Jiufu Guo
2022-01-24 10:37 ` [PATCH 1/2] Check negative combined step Richard Biener
2022-01-25  4:25   ` Jiufu Guo
2022-01-25  5:15     ` Jiufu Guo
2022-01-25  7:47       ` Richard Biener
2022-01-25 11:42         ` Jiufu Guo
2022-01-25 12:02           ` Richard Biener
2022-01-25 12:21             ` Richard Biener
2022-01-27  9:12               ` Jiufu Guo
2022-01-27  9:45                 ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).