From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jiufu Guo <guojiufu@linux.ibm.com>
To: Richard Biener <rguenther@suse.de>
Cc: gcc-patches@gcc.gnu.org, amker.cheng@gmail.com, wschmidt@linux.ibm.com,
 segher@kernel.crashing.org, dje.gcc@gmail.com, jlaw@tachyum.com
Subject: Re: [RFC] Overflow check in simplifying exit cond comparing two IVs.
References: <20211018133757.3960-1-guojiufu@linux.ibm.com>
 <27rnp025-8812-n4n8-oqp8-527311121ps3@fhfr.qr>
Date: Thu, 09 Dec 2021 14:53:30 +0800
In-Reply-To: <27rnp025-8812-n4n8-oqp8-527311121ps3@fhfr.qr> (Richard Biener's
 message of "Thu, 28 Oct 2021 11:13:30 +0200 (CEST)")
Message-ID: <7eh7bimfpx.fsf@pike.rch.stglabs.ibm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux)
Content-Type: text/plain
MIME-Version: 1.0
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 09 Dec 2021 06:53:51 -0000

Richard Biener <rguenther@suse.de> writes:

> On Mon, 18 Oct 2021, Jiufu Guo wrote:
>
>> With reference the discussions in:
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574334.html
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572006.html
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578672.html
>> 
>> Base on the patches in above discussion, we may draft a patch to fix the
>> issue.
>> 
>> In this patch, to make sure it is ok to change '{b0,s0} op {b1,s1}' to
>> '{b0,s0-s1} op {b1,0}', we also compute the condition which could assume
>> both 2 ivs are not overflow/wrap: the niter "of '{b0,s0-s1} op {b1,0}'"
>> < the niter "of untill wrap for iv0 or iv1".
>> 
>> Does this patch make sense?
>
> Hum, the patch is mightly complex :/  I'm not sure we can throw
> artficial IVs at number_of_iterations_cond and expect a meaningful
> result.
>
> ISTR the problem is with number_of_iterations_ne[_max], but I would
> have to go and dig in myself again for a full recap of the problem.
> I did plan to do that, but not before stage3 starts.
>
> Thanks,
> Richard.

Hi Richard,

Thanks for your comment!  The previous patch was indeed complex: it used
artificial IVs and called number_of_iterations_cond recursively.  A simpler
approach is possible.  I am not sure whether you have already started
digging into the problem, so I refined the patch in the hope that it is
helpful.  The refined patch improves the conditions in some respects, and
the two attached test cases are handled by it.
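
To make the assumption easier to check outside of GCC, here is a minimal
standalone sketch of the idea for 32-bit unsigned IVs with constant steps.
It is only a model, not the code in the patch: the function names are
hypothetical, steps are passed as signed values, and a zero or negative
combined step is simply rejected.  chase_assumption_holds models the check
"niter of '{b0,s0-s1} < {b1,0}'" < "niter until iv0 or iv1 wraps";
count_two_iv_loop runs the original loop with wrapping arithmetic, and
count_one_iv_loop is the idealized (non-wrapping) count of the combined IV.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of steps an unsigned 32-bit IV starting at BASE can take with
   nonzero STEP before it would leave [0, UINT32_MAX].  */
uint32_t
steps_until_wrap (uint32_t base, int32_t step)
{
  assert (step != 0);
  if (step > 0)
    return (UINT32_MAX - base) / (uint32_t) step;
  return base / (uint32_t) (-(int64_t) step);
}

/* Model of the extra assumption for `i0 < i1' (or `<=' if INCLUSIVE):
   the combined IV {b0, s0 - s1} must reach b1 before either original IV
   wraps.  Simplification: a combined step that is zero, negative, or has
   the 32-bit sign bit set is rejected.  */
bool
chase_assumption_holds (uint32_t b0, int32_t s0, uint32_t b1, int32_t s1,
			bool inclusive)
{
  int64_t step = (int64_t) s0 - s1;
  if (step <= 0 || step > INT32_MAX)
    return false;

  /* The subtraction wraps on purpose, like arithmetic in the unsigned type.  */
  uint32_t delta = b1 - b0 + (inclusive ? 1u : 0u);
  uint32_t niter = delta / (uint32_t) step;

  return niter < steps_until_wrap (b0, s0)
	 && niter < steps_until_wrap (b1, s1);
}

/* Run the original two-IV loop with wrapping arithmetic and count the
   iterations.  Only call this with inputs for which the loop ends.  */
uint32_t
count_two_iv_loop (uint32_t b0, int32_t s0, uint32_t b1, int32_t s1)
{
  uint32_t n = 0;
  for (uint32_t i0 = b0, i1 = b1; i0 < i1;
       i0 += (uint32_t) s0, i1 += (uint32_t) s1)
    n++;
  return n;
}

/* Idealized count for `{b0, step} < {b1, 0}' with STEP > 0, ignoring
   wrapping of the combined IV.  */
uint32_t
count_one_iv_loop (uint32_t b0, uint32_t b1, uint32_t step)
{
  return b0 < b1 ? (uint32_t) (((uint64_t) b1 - b0 + step - 1) / step) : 0;
}
```

With b0 = 0, s0 = 5, b1 = 100, s1 = 1 the check passes and both counts
are 25.  With the lt_5_1 inputs from pr102131.c (b0 = MAX - 124,
b1 = MAX - 27) the check fails: the real loop runs 28 times because iv0
wraps, while the idealized combined IV would give 25.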

---
 gcc/tree-ssa-loop-niter.c                     | 92 +++++++++++++++----
 .../gcc.c-torture/execute/pr100740.c          | 11 +++
 gcc/testsuite/gcc.dg/vect/pr102131.c          | 47 ++++++++++
 3 files changed, 134 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr100740.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr102131.c

diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 06954e437f5..ee1d7293c5c 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -1788,6 +1788,70 @@ dump_affine_iv (FILE *file, affine_iv *iv)
     }
 }
 
+/* Build (HIGH - LOW) / |STEP| in UTYPE; add 1 to delta if END_INCLUSIVE.  */
+
+static tree
+get_step_count (tree high, tree low, tree step, tree utype,
+		bool end_inclusive = false)
+{
+  tree delta = fold_build2 (MINUS_EXPR, TREE_TYPE (low), high, low);
+  delta = fold_convert (utype, delta);
+  if (end_inclusive)
+    delta = fold_build2 (PLUS_EXPR, utype, delta, build_one_cst (utype));
+
+  if (tree_int_cst_sign_bit (step))
+    step = fold_build1 (NEGATE_EXPR, TREE_TYPE (step), step);
+  step = fold_convert (utype, step);
+
+  return fold_build2 (FLOOR_DIV_EXPR, utype, delta, step);
+}
+
+/* Return the additional assumption needed when both steps are nonzero.
+   The assumption guarantees that neither iv0 nor iv1 overflows or wraps
+   before one IV catches up with the other.  */
+
+static tree
+extra_iv_chase_assumption (affine_iv *iv0, affine_iv *iv1, tree step,
+			   enum tree_code code)
+{
+  /* No additional assumption is needed.  */
+  if (code == NE_EXPR)
+    return boolean_true_node;
+
+  /* It is not safe to transform {b0, 1} < {b1, 2}.  */
+  if (tree_int_cst_sign_bit (step))
+    return boolean_false_node;
+
+  /* No additional assumption is needed for pointers.  */
+  tree type = TREE_TYPE (iv0->base);
+  if (POINTER_TYPE_P (type))
+    return boolean_true_node;
+
+  bool positive0 = !tree_int_cst_sign_bit (iv0->step);
+  bool positive1 = !tree_int_cst_sign_bit (iv1->step);
+  bool positive = !tree_int_cst_sign_bit (step);
+  tree utype = unsigned_type_for (type);
+  bool add1 = code == LE_EXPR;
+  tree niter = positive
+		 ? get_step_count (iv1->base, iv0->base, step, utype, add1)
+		 : get_step_count (iv0->base, iv1->base, step, utype, add1);
+
+  int prec = TYPE_PRECISION (type);
+  signop sgn = TYPE_SIGN (type);
+  tree max = wide_int_to_tree (type, wi::max_value (prec, sgn));
+  tree min = wide_int_to_tree (type, wi::min_value (prec, sgn));
+  tree valid_niter0, valid_niter1;
+
+  valid_niter0 = positive0 ? get_step_count (max, iv0->base, iv0->step, utype)
+			   : get_step_count (iv0->base, min, iv0->step, utype);
+  valid_niter1 = positive1 ? get_step_count (max, iv1->base, iv1->step, utype)
+			   : get_step_count (iv1->base, min, iv1->step, utype);
+
+  tree e0 = fold_build2 (LT_EXPR, boolean_type_node, niter, valid_niter0);
+  tree e1 = fold_build2 (LT_EXPR, boolean_type_node, niter, valid_niter1);
+  return fold_build2 (TRUTH_AND_EXPR, boolean_type_node, e0, e1);
+}
+
 /* Determine the number of iterations according to condition (for staying
    inside loop) which compares two induction variables using comparison
    operator CODE.  The induction variable on left side of the comparison
@@ -1879,30 +1943,26 @@ number_of_iterations_cond (class loop *loop,
        {iv0.base, iv0.step - iv1.step} cmp_code {iv1.base, 0}
 
      provided that either below condition is satisfied:
+     a. iv0.step and iv1.step are integer constants.
+     b. Additional condition: before iv0 catches up with iv1, neither iv0
+     nor iv1 steps over the min or max of the type.  */
 
-       a) the test is NE_EXPR;
-       b) iv0.step - iv1.step is integer and iv0/iv1 don't overflow.
-
-     This rarely occurs in practice, but it is simple enough to manage.  */
   if (!integer_zerop (iv0->step) && !integer_zerop (iv1->step))
     {
+      if (TREE_CODE (iv0->step) != INTEGER_CST
+	  || TREE_CODE (iv1->step) != INTEGER_CST)
+	return false;
+
       tree step_type = POINTER_TYPE_P (type) ? sizetype : type;
-      tree step = fold_binary_to_constant (MINUS_EXPR, step_type,
-					   iv0->step, iv1->step);
-
-      /* No need to check sign of the new step since below code takes care
-	 of this well.  */
-      if (code != NE_EXPR
-	  && (TREE_CODE (step) != INTEGER_CST
-	      || !iv0->no_overflow || !iv1->no_overflow))
+      tree step
+	= fold_binary_to_constant (MINUS_EXPR, step_type, iv0->step, iv1->step);
+
+      niter->assumptions = extra_iv_chase_assumption (iv0, iv1, step, code);
+      if (integer_zerop (niter->assumptions))
 	return false;
 
       iv0->step = step;
-      if (!POINTER_TYPE_P (type))
-	iv0->no_overflow = false;
-
       iv1->step = build_int_cst (step_type, 0);
-      iv1->no_overflow = true;
     }
 
   /* If the result of the comparison is a constant,  the loop is weird.  More
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr100740.c b/gcc/testsuite/gcc.c-torture/execute/pr100740.c
new file mode 100644
index 00000000000..8fcdaffef3b
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr100740.c
@@ -0,0 +1,11 @@
+/* PR tree-optimization/100740 */
+
+unsigned a, b;
+int main() {
+  unsigned c = 0;
+  for (a = 0; a < 2; a++)
+    for (b = 0; b < 2; b++)
+      if (++c < a)
+        __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr102131.c b/gcc/testsuite/gcc.dg/vect/pr102131.c
new file mode 100644
index 00000000000..23975cfeadb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr102131.c
@@ -0,0 +1,47 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+#define MAX ((unsigned int) 0xffffffff)
+#define MIN ((unsigned int) (0))
+
+int arr[512];
+
+#define FUNC(NAME, CODE, S0, S1)                                               \
+  unsigned __attribute__ ((noinline)) NAME (unsigned int b0, unsigned int b1)  \
+  {                                                                            \
+    unsigned int n = 0;                                                        \
+    unsigned int i0, i1;                                                       \
+    int *p = arr;                                                              \
+    for (i0 = b0, i1 = b1; i0 CODE i1; i0 += S0, i1 += S1)                     \
+      {                                                                        \
+	n++;                                                                   \
+	*p++ = i0 + i1;                                                        \
+      }                                                                        \
+    return n;                                                                  \
+  }
+
+FUNC (lt_5_1, <, 5, 1);
+FUNC (le_1_m5, <=, 1, -5);
+FUNC (lt_1_10, <, 1, 10);
+
+int
+main ()
+{
+  int fail = 0;
+  if (lt_5_1 (MAX - 124, MAX - 27) != 28)
+    fail++;
+
+  /* To save time, do not run this.  */
+  /*
+  if (le_1_m5 (MIN + 1, MIN + 9) != 715827885)
+    fail++;  */
+
+  if (lt_1_10 (MAX - 1000, MAX - 500) != 51)
+    fail++;
+
+  if (fail)
+    __builtin_abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
-- 
2.17.1


>
>
>> BR,
>> Jiufu Guo
>> 
>> gcc/ChangeLog:
>> 
>> 	PR tree-optimization/100740
>> 	* tree-ssa-loop-niter.c (number_of_iterations_cond): Add
>> 	assume condition for combining of two IVs
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> 	* gcc.c-torture/execute/pr100740.c: New test.
>> ---
>>  gcc/tree-ssa-loop-niter.c                     | 103 +++++++++++++++---
>>  .../gcc.c-torture/execute/pr100740.c          |  11 ++
>>  2 files changed, 99 insertions(+), 15 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr100740.c
>> 
>> diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
>> index 75109407124..f2987a4448d 100644
>> --- a/gcc/tree-ssa-loop-niter.c
>> +++ b/gcc/tree-ssa-loop-niter.c
>> @@ -1863,29 +1863,102 @@ number_of_iterations_cond (class loop *loop,
>>  
>>       provided that either below condition is satisfied:
>>  
>> -       a) the test is NE_EXPR;
>> -       b) iv0.step - iv1.step is integer and iv0/iv1 don't overflow.
>> +       a) iv0.step - iv1.step is integer and iv0/iv1 don't overflow.
>> +       b) assumptions in below table also need to be satisfied.
>> +
>> +	| iv0     | iv1     | assum (iv0<iv1)     | assum (iv0!=iv1)    |
>> +	|---------+---------+---------------------+---------------------|
>> +	| (b0,2)  | (b1,1)  | before iv1 overflow | before iv1 overflow |
>> +	| (b0,2)  | (b1,-1) | true                | true                |
>> +	| (b0,-1) | (b1,-2) | before iv0 overflow | before iv0 overflow |
>> +	|         |         |                     |                     |
>> +	| (b0,1)  | (b1,2)  | false               | before iv0 overflow |
>> +	| (b0,-1) | (b1,2)  | false               | true                |
>> +	| (b0,-2) | (b1,-1) | false               | before iv1 overflow |
>> +       'true' in above table means no need additional condition.
>> +       'false' means this case can not satify the transform.
>> +       The first three rows: iv0->step > iv1->step;
>> +       The second three rows: iv0->step < iv1->step.
>>  
>>       This rarely occurs in practice, but it is simple enough to manage.  */
>>    if (!integer_zerop (iv0->step) && !integer_zerop (iv1->step))
>>      {
>> +      if (TREE_CODE (iv0->step) != INTEGER_CST
>> +	  || TREE_CODE (iv1->step) != INTEGER_CST)
>> +	return false;
>> +      if (!iv0->no_overflow || !iv1->no_overflow)
>> +	return false;
>> +
>>        tree step_type = POINTER_TYPE_P (type) ? sizetype : type;
>> -      tree step = fold_binary_to_constant (MINUS_EXPR, step_type,
>> -					   iv0->step, iv1->step);
>> -
>> -      /* No need to check sign of the new step since below code takes care
>> -	 of this well.  */
>> -      if (code != NE_EXPR
>> -	  && (TREE_CODE (step) != INTEGER_CST
>> -	      || !iv0->no_overflow || !iv1->no_overflow))
>> +      tree step
>> +	= fold_binary_to_constant (MINUS_EXPR, step_type, iv0->step, iv1->step);
>> +
>> +      if (code != NE_EXPR && tree_int_cst_sign_bit (step))
>>  	return false;
>>  
>> -      iv0->step = step;
>> -      if (!POINTER_TYPE_P (type))
>> -	iv0->no_overflow = false;
>> +      bool positive0 = !tree_int_cst_sign_bit (iv0->step);
>> +      bool positive1 = !tree_int_cst_sign_bit (iv1->step);
>>  
>> -      iv1->step = build_int_cst (step_type, 0);
>> -      iv1->no_overflow = true;
>> +      /* Cases in rows 2 and 4 of above table.  */
>> +      if ((positive0 && !positive1) || (!positive0 && positive1))
>> +	{
>> +	  iv0->step = step;
>> +	  iv1->step = build_int_cst (step_type, 0);
>> +	  return number_of_iterations_cond (loop, type, iv0, code, iv1,
>> +					    niter, only_exit, every_iteration);
>> +	}
>> +
>> +      affine_iv i_0, i_1;
>> +      class tree_niter_desc num;
>> +      i_0 = *iv0;
>> +      i_1 = *iv1;
>> +      i_0.step = step;
>> +      i_1.step = build_int_cst (step_type, 0);
>> +      if (!number_of_iterations_cond (loop, type, &i_0, code, &i_1, &num,
>> +				      only_exit, every_iteration))
>> +	return false;
>> +
>> +      affine_iv i0, i1;
>> +      class tree_niter_desc num_wrap;
>> +      i0 = *iv0;
>> +      i1 = *iv1;
>> +
>> +      /* Reset iv0 and iv1 to calculate the niter which cause overflow.  */
>> +      if (tree_int_cst_lt (i1.step, i0.step))
>> +	{
>> +	  if (positive0 && positive1)
>> +	    i0.step = build_int_cst (step_type, 0);
>> +	  else if (!positive0 && !positive1)
>> +	    i1.step = build_int_cst (step_type, 0);
>> +	  if (code == NE_EXPR)
>> +	    code = LT_EXPR;
>> +	}
>> +      else
>> +	{
>> +	  if (positive0 && positive1)
>> +	    i1.step = build_int_cst (step_type, 0);
>> +	  else if (!positive0 && !positive1)
>> +	    i0.step = build_int_cst (step_type, 0);
>> +	  gcc_assert (code == NE_EXPR);
>> +	  code = GT_EXPR;
>> +	}
>> +
>> +      /* Calculate the niter which cause overflow.  */
>> +      if (!number_of_iterations_cond (loop, type, &i0, code, &i1, &num_wrap,
>> +				      only_exit, every_iteration))
>> +	return false;
>> +
>> +      /* Make assumption there is no overflow. */
>> +      tree assum
>> +	= fold_build2 (LE_EXPR, boolean_type_node, num.niter,
>> +		       fold_convert (TREE_TYPE (num.niter), num_wrap.niter));
>> +      num.assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
>> +				      num.assumptions, assum);
>> +
>> +      *iv0 = i_0;
>> +      *iv1 = i_1;
>> +      *niter = num;
>> +      return true;
>>      }
>>  
>>    /* If the result of the comparison is a constant,  the loop is weird.  More
>> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr100740.c b/gcc/testsuite/gcc.c-torture/execute/pr100740.c
>> new file mode 100644
>> index 00000000000..8fcdaffef3b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.c-torture/execute/pr100740.c
>> @@ -0,0 +1,11 @@
>> +/* PR tree-optimization/100740 */
>> +
>> +unsigned a, b;
>> +int main() {
>> +  unsigned c = 0;
>> +  for (a = 0; a < 2; a++)
>> +    for (b = 0; b < 2; b++)
>> +      if (++c < a)
>> +        __builtin_abort ();
>> +  return 0;
>> +}
>>