public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Hao Liu OS <hliu@os.amperecomputing.com>
To: "GCC-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag
Date: Mon, 4 Dec 2023 10:05:05 +0000	[thread overview]
Message-ID: <SN6PR0102MB348749DE7F642DFC46298CEBE186A@SN6PR0102MB3487.prod.exchangelabs.com> (raw)

Loop vecotorization can not optimize following case due to SCEV is not affine
failure (i+offset may overflow):

    int A[1024 * 2];
    
    int foo (unsigned offset, unsigned N) 
    {
      int sum = 0;
      for (unsigned i = 0; i < N; i++)
        sum += A[i + offset];
      return sum;
    }

Actually, niter pass can find nonwrapping induction variables (i.e., i + offset
can not overflow) by inferring from undefined behaviors like array access (see
record_nonwrapping_iv). But this information is not shared to SCEV yet. This
patch adds a nonwrapping flag to the chrec tree, which allows SCEV to re-use it 
to pass the nonwrapping checks like scev_probably_wraps_p, and finaly loop 
vectorization could succeed.

The new flag is defined as CHREC_NOWRAP(tree), and the dump info is changed from
"{offset, +, 1}_1" -> "{offset, +, 1}<nw>_1" (nw is short for nonwrapping). Two
SCEV interfaces record_nonwrapping_chrec and nonwrapping_chrec_p are added to 
set and check the flag respectively.

However, an extra problem is caused by resetting SCEV cache (i.e., scev_reset or
reset_scev_htab), which may not be synchronized with the calling of
free_numbers_of_iterations_estimates, which set the loop->estimate_state to
EST_NOT_COMPUTED and make sure the above inferring from array access is called.
In other words, the nonwrapping flag could be cleared and lost by resetting SCEV
cache if the loop->estimate_state is not reset.
E.g., gimple_duplicate_loop_body_to_header_edge/flush_ssaname_freelist,
which calls scev_reset/scev_reset_htab to clear the SCEV cache, but the 
estimate_state is still kept as EST_AVAILABLE and the flag will not be set in
loop vectorization.

This patch uses a simple fix by calling free_numbers_of_iterations_estimates in
vect_analyze_loop, which will make sure the flag is always set propriately in
loop vectorization. This fix is a bit ad-hoc (works for loop vectorization
only), if there is more reasonable method, I will revert the simple fix and try
that.

This patch is bootstrapped and tested on aarch64-linux-gnu with no regressions. 
OK for trunk?

---
The patch is as following:

[PATCH] SCEV: extend the chrec tree with a nonwrapping flag
 [PR112774]

The flag is defined as CHREC_NOWRAP(tree), and will be dumped from
"{offset, +, 1}_1" to "{offset, +, 1}<nw>_1" (nw is short for nonwrapping).
Two SCEV interfaces record_nonwrapping_chrec and nonwrapping_chrec_p are
added to set and check the flag respectively.

As resetting the SCEV cache (i.e., the chrec trees) may not reset the
loop->estimate_state, free_numbers_of_iterations_estimates is called
explicitly in loop vectorization to make sure the flag can be
calculated propriately by niter.

gcc/ChangeLog:

	PR tree-optimization/112774
	* tree-pretty-print.cc: if nonwrapping flag is set, chrec will be
	printed with additional <nw> info.
	* tree-scalar-evolution.cc: add record_nonwrapping_chrec and
	nonwrapping_chrec_p to set and check the new flag respectively.
	* tree-scalar-evolution.h: Likewise.
	* tree-ssa-loop-niter.cc (idx_infer_loop_bounds,
	infer_loop_bounds_from_pointer_arith, infer_loop_bounds_from_signedness,
	scev_probably_wraps_p): call record_nonwrapping_chrec before
	record_nonwrapping_iv, call nonwrapping_chrec_p to check the flag is set and
	return false from scev_probably_wraps_p.
	* tree-vect-loop.cc (vect_analyze_loop): call
	free_numbers_of_iterations_estimates explicitly.
	* gcc/tree.h: add CHREC_NOWRAP(NODE), base.nothrow_flag is used
	to represent the nonwrapping info.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/scev-16.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 17 +++++++++++++++++
 gcc/tree-pretty-print.cc                |  2 +-
 gcc/tree-scalar-evolution.cc            | 24 ++++++++++++++++++++++++
 gcc/tree-scalar-evolution.h             |  2 ++
 gcc/tree-ssa-loop-niter.cc              | 21 ++++++++++++++++-----
 gcc/tree-vect-loop.cc                   |  4 ++++
 gcc/tree.h                              |  8 +++++---
 7 files changed, 69 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
new file mode 100644
index 00000000000..96ea36e4c65
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-vect-scev" } */
+
+int A[1024 * 2];
+
+int foo (unsigned offset, unsigned N)
+{
+  int sum = 0;
+
+  for (unsigned i = 0; i < N; i++)
+    sum += A[i + offset];
+
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump "vec_transform_loop" "vect" } } */
+/* { dg-final { scan-tree-dump-not "missed:  failed: evolution of offset is not affine" "vect" } } */
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 1fadd752d05..0dabb6d1580 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -3488,7 +3488,7 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
       dump_generic_node (pp, CHREC_LEFT (node), spc, flags, false);
       pp_string (pp, ", +, ");
       dump_generic_node (pp, CHREC_RIGHT (node), spc, flags, false);
-      pp_string (pp, "}_");
+      pp_string (pp, !CHREC_NOWRAP (node) ? "}_" : "}<nw>_");
       pp_scalar (pp, "%u", CHREC_VARIABLE (node));
       is_stmt = false;
       break;
diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
index 065bcd0743d..a9112572e0c 100644
--- a/gcc/tree-scalar-evolution.cc
+++ b/gcc/tree-scalar-evolution.cc
@@ -2050,6 +2050,30 @@ analyze_scalar_evolution (class loop *loop, tree var)
   return res;
 }
 
+/* If CHREC doesn't overflow, set the nonwrapping flag.  */
+
+void record_nonwrapping_chrec (tree chrec)
+{
+  CHREC_NOWRAP(chrec) = 1;
+
+  if (dump_file && (dump_flags & TDF_SCEV))
+    {
+      fprintf (dump_file, "(record_nonwrapping_chrec: ");
+      print_generic_expr (dump_file, chrec);
+      fprintf (dump_file, ")\n");
+    }
+}
+
+/* Return true if CHREC's nonwrapping flag is set.  */
+
+bool nonwrapping_chrec_p (tree chrec)
+{
+  if (!chrec || TREE_CODE(chrec) != POLYNOMIAL_CHREC)
+    return false;
+
+  return CHREC_NOWRAP(chrec);
+}
+
 /* Analyzes and returns the scalar evolution of VAR address in LOOP.  */
 
 static tree
diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
index a64ed78fe63..f57fde12ee2 100644
--- a/gcc/tree-scalar-evolution.h
+++ b/gcc/tree-scalar-evolution.h
@@ -43,6 +43,8 @@ extern bool simple_iv (class loop *, class loop *, tree, struct affine_iv *,
 		       bool);
 extern bool iv_can_overflow_p (class loop *, tree, tree, tree);
 extern tree compute_overall_effect_of_inner_loop (class loop *, tree);
+extern void record_nonwrapping_chrec (tree);
+extern bool nonwrapping_chrec_p (tree);
 
 /* Returns the basic block preceding LOOP, or the CFG entry block when
    the loop is function's body.  */
diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
index 2098bef9a97..d465e0ed7e1 100644
--- a/gcc/tree-ssa-loop-niter.cc
+++ b/gcc/tree-ssa-loop-niter.cc
@@ -4206,11 +4206,15 @@ idx_infer_loop_bounds (tree base, tree *idx, void *dta)
 
   /* If access is not executed on every iteration, we must ensure that overlow
      may not make the access valid later.  */
-  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
-      && scev_probably_wraps_p (NULL_TREE,
-				initial_condition_in_loop_num (ev, loop->num),
-				step, data->stmt, loop, true))
-    upper = false;
+  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt)))
+    {
+      if (scev_probably_wraps_p (NULL_TREE,
+				 initial_condition_in_loop_num (ev, loop->num),
+				 step, data->stmt, loop, true))
+	upper = false;
+    }
+  else
+    record_nonwrapping_chrec (ev);
 
   record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
   return true;
@@ -4324,6 +4328,7 @@ infer_loop_bounds_from_pointer_arith (class loop *loop, gimple *stmt)
   if (flag_delete_null_pointer_checks && int_cst_value (low) == 0)
     low = build_int_cstu (TREE_TYPE (low), TYPE_ALIGN_UNIT (TREE_TYPE (type)));
 
+  record_nonwrapping_chrec (scev);
   record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
 }
 
@@ -4371,6 +4376,7 @@ infer_loop_bounds_from_signedness (class loop *loop, gimple *stmt)
       high = wide_int_to_tree (type, r.upper_bound ());
     }
 
+  record_nonwrapping_chrec (scev);
   record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
 }
 
@@ -5505,6 +5511,11 @@ scev_probably_wraps_p (tree var, tree base, tree step,
   if (loop_exits_before_overflow (base, step, at_stmt, loop))
     return false;
 
+  /* Check the nonwrapping flag, which may be set by niter analysis (e.g., the
+     above loop exits before overflow).  */
+  if (var && nonwrapping_chrec_p (analyze_scalar_evolution (loop, var)))
+    return false;
+
   /* At this point we still don't have a proof that the iv does not
      overflow: give up.  */
   return true;
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index dd584ab4a42..6261cd1be1d 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3570,6 +3570,10 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
 	 analysis are done under the assumptions.  */
       loop_constraint_set (loop, LOOP_C_FINITE);
     }
+  else
+    /* Clear the existing niter information to make sure the nonwrapping flag
+       will be calculated and set propriately.  */
+    free_numbers_of_iterations_estimates (loop);
 
   auto_vector_modes vector_modes;
   /* Autodetect first vector size we try.  */
diff --git a/gcc/tree.h b/gcc/tree.h
index 086b55f0375..59af8920f02 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1438,9 +1438,11 @@ class auto_suppress_location_wrappers
 #define COND_EXPR_ELSE(NODE)	(TREE_OPERAND (COND_EXPR_CHECK (NODE), 2))
 
 /* Accessors for the chains of recurrences.  */
-#define CHREC_LEFT(NODE)          TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
-#define CHREC_RIGHT(NODE)         TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
-#define CHREC_VARIABLE(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
+#define CHREC_LEFT(NODE)        TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
+#define CHREC_RIGHT(NODE)       TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
+#define CHREC_VARIABLE(NODE)    POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
+/* Nonzero if this chrec doesn't overflow (i.e., nonwrapping).  */
+#define CHREC_NOWRAP(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.nothrow_flag
 
 /* LABEL_EXPR accessor. This gives access to the label associated with
    the given label expression.  */
-- 
2.34.1


             reply	other threads:[~2023-12-04 10:05 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-04 10:05 Hao Liu OS [this message]
2023-12-06  9:46 ` Hao Liu OS
2023-12-06 11:49   ` Richard Biener
2023-12-07  8:54     ` Hao Liu OS
2023-12-07 14:12       ` Richard Biener
2023-12-08  3:34         ` Hao Liu OS

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=SN6PR0102MB348749DE7F642DFC46298CEBE186A@SN6PR0102MB3487.prod.exchangelabs.com \
    --to=hliu@os.amperecomputing.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).