From: Hao Liu OS <hliu@os.amperecomputing.com>
To: "GCC-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag
Date: Mon, 4 Dec 2023 10:05:05 +0000 [thread overview]
Message-ID: <SN6PR0102MB348749DE7F642DFC46298CEBE186A@SN6PR0102MB3487.prod.exchangelabs.com> (raw)
Loop vecotorization can not optimize following case due to SCEV is not affine
failure (i+offset may overflow):
int A[1024 * 2];
int foo (unsigned offset, unsigned N)
{
int sum = 0;
for (unsigned i = 0; i < N; i++)
sum += A[i + offset];
return sum;
}
Actually, niter pass can find nonwrapping induction variables (i.e., i + offset
can not overflow) by inferring from undefined behaviors like array access (see
record_nonwrapping_iv). But this information is not shared to SCEV yet. This
patch adds a nonwrapping flag to the chrec tree, which allows SCEV to re-use it
to pass the nonwrapping checks like scev_probably_wraps_p, and finaly loop
vectorization could succeed.
The new flag is defined as CHREC_NOWRAP(tree), and the dump info is changed from
"{offset, +, 1}_1" -> "{offset, +, 1}<nw>_1" (nw is short for nonwrapping). Two
SCEV interfaces record_nonwrapping_chrec and nonwrapping_chrec_p are added to
set and check the flag respectively.
However, an extra problem is caused by resetting SCEV cache (i.e., scev_reset or
reset_scev_htab), which may not be synchronized with the calling of
free_numbers_of_iterations_estimates, which set the loop->estimate_state to
EST_NOT_COMPUTED and make sure the above inferring from array access is called.
In other words, the nonwrapping flag could be cleared and lost by resetting SCEV
cache if the loop->estimate_state is not reset.
E.g., gimple_duplicate_loop_body_to_header_edge/flush_ssaname_freelist,
which calls scev_reset/scev_reset_htab to clear the SCEV cache, but the
estimate_state is still kept as EST_AVAILABLE and the flag will not be set in
loop vectorization.
This patch uses a simple fix by calling free_numbers_of_iterations_estimates in
vect_analyze_loop, which will make sure the flag is always set propriately in
loop vectorization. This fix is a bit ad-hoc (works for loop vectorization
only), if there is more reasonable method, I will revert the simple fix and try
that.
This patch is bootstrapped and tested on aarch64-linux-gnu with no regressions.
OK for trunk?
---
The patch is as following:
[PATCH] SCEV: extend the chrec tree with a nonwrapping flag
[PR112774]
The flag is defined as CHREC_NOWRAP(tree), and will be dumped from
"{offset, +, 1}_1" to "{offset, +, 1}<nw>_1" (nw is short for nonwrapping).
Two SCEV interfaces record_nonwrapping_chrec and nonwrapping_chrec_p are
added to set and check the flag respectively.
As resetting the SCEV cache (i.e., the chrec trees) may not reset the
loop->estimate_state, free_numbers_of_iterations_estimates is called
explicitly in loop vectorization to make sure the flag can be
calculated propriately by niter.
gcc/ChangeLog:
PR tree-optimization/112774
* tree-pretty-print.cc: if nonwrapping flag is set, chrec will be
printed with additional <nw> info.
* tree-scalar-evolution.cc: add record_nonwrapping_chrec and
nonwrapping_chrec_p to set and check the new flag respectively.
* tree-scalar-evolution.h: Likewise.
* tree-ssa-loop-niter.cc (idx_infer_loop_bounds,
infer_loop_bounds_from_pointer_arith, infer_loop_bounds_from_signedness,
scev_probably_wraps_p): call record_nonwrapping_chrec before
record_nonwrapping_iv, call nonwrapping_chrec_p to check the flag is set and
return false from scev_probably_wraps_p.
* tree-vect-loop.cc (vect_analyze_loop): call
free_numbers_of_iterations_estimates explicitly.
* gcc/tree.h: add CHREC_NOWRAP(NODE), base.nothrow_flag is used
to represent the nonwrapping info.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/scev-16.c: New test.
---
gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 17 +++++++++++++++++
gcc/tree-pretty-print.cc | 2 +-
gcc/tree-scalar-evolution.cc | 24 ++++++++++++++++++++++++
gcc/tree-scalar-evolution.h | 2 ++
gcc/tree-ssa-loop-niter.cc | 21 ++++++++++++++++-----
gcc/tree-vect-loop.cc | 4 ++++
gcc/tree.h | 8 +++++---
7 files changed, 69 insertions(+), 9 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
new file mode 100644
index 00000000000..96ea36e4c65
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-vect-scev" } */
+
+int A[1024 * 2];
+
+int foo (unsigned offset, unsigned N)
+{
+ int sum = 0;
+
+ for (unsigned i = 0; i < N; i++)
+ sum += A[i + offset];
+
+ return sum;
+}
+
+/* { dg-final { scan-tree-dump "vec_transform_loop" "vect" } } */
+/* { dg-final { scan-tree-dump-not "missed: failed: evolution of offset is not affine" "vect" } } */
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 1fadd752d05..0dabb6d1580 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -3488,7 +3488,7 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
dump_generic_node (pp, CHREC_LEFT (node), spc, flags, false);
pp_string (pp, ", +, ");
dump_generic_node (pp, CHREC_RIGHT (node), spc, flags, false);
- pp_string (pp, "}_");
+ pp_string (pp, !CHREC_NOWRAP (node) ? "}_" : "}<nw>_");
pp_scalar (pp, "%u", CHREC_VARIABLE (node));
is_stmt = false;
break;
diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
index 065bcd0743d..a9112572e0c 100644
--- a/gcc/tree-scalar-evolution.cc
+++ b/gcc/tree-scalar-evolution.cc
@@ -2050,6 +2050,30 @@ analyze_scalar_evolution (class loop *loop, tree var)
return res;
}
+/* If CHREC doesn't overflow, set the nonwrapping flag. */
+
+void record_nonwrapping_chrec (tree chrec)
+{
+ CHREC_NOWRAP(chrec) = 1;
+
+ if (dump_file && (dump_flags & TDF_SCEV))
+ {
+ fprintf (dump_file, "(record_nonwrapping_chrec: ");
+ print_generic_expr (dump_file, chrec);
+ fprintf (dump_file, ")\n");
+ }
+}
+
+/* Return true if CHREC's nonwrapping flag is set. */
+
+bool nonwrapping_chrec_p (tree chrec)
+{
+ if (!chrec || TREE_CODE(chrec) != POLYNOMIAL_CHREC)
+ return false;
+
+ return CHREC_NOWRAP(chrec);
+}
+
/* Analyzes and returns the scalar evolution of VAR address in LOOP. */
static tree
diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
index a64ed78fe63..f57fde12ee2 100644
--- a/gcc/tree-scalar-evolution.h
+++ b/gcc/tree-scalar-evolution.h
@@ -43,6 +43,8 @@ extern bool simple_iv (class loop *, class loop *, tree, struct affine_iv *,
bool);
extern bool iv_can_overflow_p (class loop *, tree, tree, tree);
extern tree compute_overall_effect_of_inner_loop (class loop *, tree);
+extern void record_nonwrapping_chrec (tree);
+extern bool nonwrapping_chrec_p (tree);
/* Returns the basic block preceding LOOP, or the CFG entry block when
the loop is function's body. */
diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
index 2098bef9a97..d465e0ed7e1 100644
--- a/gcc/tree-ssa-loop-niter.cc
+++ b/gcc/tree-ssa-loop-niter.cc
@@ -4206,11 +4206,15 @@ idx_infer_loop_bounds (tree base, tree *idx, void *dta)
/* If access is not executed on every iteration, we must ensure that overlow
may not make the access valid later. */
- if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
- && scev_probably_wraps_p (NULL_TREE,
- initial_condition_in_loop_num (ev, loop->num),
- step, data->stmt, loop, true))
- upper = false;
+ if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt)))
+ {
+ if (scev_probably_wraps_p (NULL_TREE,
+ initial_condition_in_loop_num (ev, loop->num),
+ step, data->stmt, loop, true))
+ upper = false;
+ }
+ else
+ record_nonwrapping_chrec (ev);
record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
return true;
@@ -4324,6 +4328,7 @@ infer_loop_bounds_from_pointer_arith (class loop *loop, gimple *stmt)
if (flag_delete_null_pointer_checks && int_cst_value (low) == 0)
low = build_int_cstu (TREE_TYPE (low), TYPE_ALIGN_UNIT (TREE_TYPE (type)));
+ record_nonwrapping_chrec (scev);
record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
}
@@ -4371,6 +4376,7 @@ infer_loop_bounds_from_signedness (class loop *loop, gimple *stmt)
high = wide_int_to_tree (type, r.upper_bound ());
}
+ record_nonwrapping_chrec (scev);
record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
}
@@ -5505,6 +5511,11 @@ scev_probably_wraps_p (tree var, tree base, tree step,
if (loop_exits_before_overflow (base, step, at_stmt, loop))
return false;
+ /* Check the nonwrapping flag, which may be set by niter analysis (e.g., the
+ above loop exits before overflow). */
+ if (var && nonwrapping_chrec_p (analyze_scalar_evolution (loop, var)))
+ return false;
+
/* At this point we still don't have a proof that the iv does not
overflow: give up. */
return true;
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index dd584ab4a42..6261cd1be1d 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3570,6 +3570,10 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
analysis are done under the assumptions. */
loop_constraint_set (loop, LOOP_C_FINITE);
}
+ else
+ /* Clear the existing niter information to make sure the nonwrapping flag
+ will be calculated and set propriately. */
+ free_numbers_of_iterations_estimates (loop);
auto_vector_modes vector_modes;
/* Autodetect first vector size we try. */
diff --git a/gcc/tree.h b/gcc/tree.h
index 086b55f0375..59af8920f02 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1438,9 +1438,11 @@ class auto_suppress_location_wrappers
#define COND_EXPR_ELSE(NODE) (TREE_OPERAND (COND_EXPR_CHECK (NODE), 2))
/* Accessors for the chains of recurrences. */
-#define CHREC_LEFT(NODE) TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
-#define CHREC_RIGHT(NODE) TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
-#define CHREC_VARIABLE(NODE) POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
+#define CHREC_LEFT(NODE) TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
+#define CHREC_RIGHT(NODE) TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
+#define CHREC_VARIABLE(NODE) POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
+/* Nonzero if this chrec doesn't overflow (i.e., nonwrapping). */
+#define CHREC_NOWRAP(NODE) POLYNOMIAL_CHREC_CHECK (NODE)->base.nothrow_flag
/* LABEL_EXPR accessor. This gives access to the label associated with
the given label expression. */
--
2.34.1
next reply other threads:[~2023-12-04 10:05 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-04 10:05 Hao Liu OS [this message]
2023-12-06 9:46 ` Hao Liu OS
2023-12-06 11:49 ` Richard Biener
2023-12-07 8:54 ` Hao Liu OS
2023-12-07 14:12 ` Richard Biener
2023-12-08 3:34 ` Hao Liu OS
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=SN6PR0102MB348749DE7F642DFC46298CEBE186A@SN6PR0102MB3487.prod.exchangelabs.com \
--to=hliu@os.amperecomputing.com \
--cc=gcc-patches@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).