From: Jiufu Guo <guojiufu@linux.ibm.com>
To: Richard Biener <rguenther@suse.de>
Cc: gcc-patches@gcc.gnu.org, amker.cheng@gmail.com,
wschmidt@linux.ibm.com, segher@kernel.crashing.org,
dje.gcc@gmail.com, jlaw@tachyum.com
Subject: Re: [RFC] Overflow check in simplifying exit cond comparing two IVs.
Date: Fri, 10 Dec 2021 12:28:44 +0800
Message-ID: <7e7dcdm6bn.fsf@pike.rch.stglabs.ibm.com>
In-Reply-To: <7eh7bimfpx.fsf@pike.rch.stglabs.ibm.com> (Jiufu Guo's message of "Thu, 09 Dec 2021 14:53:30 +0800")
Jiufu Guo <guojiufu@linux.ibm.com> writes:
> Richard Biener <rguenther@suse.de> writes:
>
>> On Mon, 18 Oct 2021, Jiufu Guo wrote:
>>
>>> With reference to the discussions in:
>>> https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574334.html
>>> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572006.html
>>> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578672.html
>>>
>>> Based on the patches in the above discussions, we may draft a patch to
>>> fix the issue.
>>>
>>> In this patch, to make sure it is OK to change '{b0,s0} op {b1,s1}' to
>>> '{b0,s0-s1} op {b1,0}', we also compute a condition under which neither
>>> IV overflows or wraps: the niter of '{b0,s0-s1} op {b1,0}' must be
>>> less than the niter until iv0 or iv1 wraps.
>>>
>>> Does this patch make sense?
>>
>> Hum, the patch is mightily complex :/ I'm not sure we can throw
>> artificial IVs at number_of_iterations_cond and expect a meaningful
>> result.
>>
>> ISTR the problem is with number_of_iterations_ne[_max], but I would
>> have to go and dig in myself again for a full recap of the problem.
>> I did plan to do that, but not before stage3 starts.
>>
>> Thanks,
>> Richard.
>
> Hi Richard,
>
> Thanks for your comment! It is indeed complex to use artificial IVs and
> call number_of_iterations_cond recursively. We may use a simpler way.
> I'm not sure whether you have started to dig into the problem; I have
> refined the patch and hope it is helpful. It improves the conditions in
> some respects. Attached are two test cases that it can handle.
I have some questions to ask here; the answers may help make the patch
work better.
- 1. For signed types, I'm wondering whether we could rely on "UB on
  signed overflow" at the point where number_of_iterations_cond is
  called, which may be far from the user's source code.
  If we can, we may simply skip the assumption for signed types.
  But then there would be inconsistent behavior between unoptimized
  (-O0) and optimized (e.g. -O2/-O3) code. For example:
  "{INT_MAX-124, +5} < {INT_MAX-27, +1}".
  At -O0, 'niter' would be 28, while at -O3 it may come out as 26.
- 2. For NEQ, which may also concern you, the assumption
  "delta % step == 0" would make it safe. It seems that currently we
  handle NEQ where no_overflow is true for both iv0 and iv1.
- 3. The current patch uses DIV_EXPR, whose cost may be high in some
  cases. I'm wondering whether the idea below is workable:
  extend to a longer type and use MULT instead of DIV, for example:
  a < b/c ===> a*c < b. a*c may need a longer type than 'a'.
-- 3.1 For some special cases, e.g. "{b0, 5} < {b1, -5}", the
  assumption may be simplified. For the general case, I am still
  thinking about how to reduce the runtime cost of the assumption.
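As a rough illustration of the idea in item 3 (not part of the patch):
for unsigned operands with c > 0 and floor division, the test a < b/c
is exactly equivalent to (a+1)*c <= b when the product is computed in a
type twice as wide, whereas a*c < b corresponds to exact (real-valued)
division. The function names below are hypothetical, and the choice of
32-bit operands widened to 64 bits is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reference form: compare against a floor division.  C must be nonzero.  */
bool
lt_floor_div (uint32_t a, uint32_t b, uint32_t c)
{
  return a < b / c;
}

/* Division-free form: a < floor (b / c)  <=>  (a + 1) * c <= b.
   The product is formed in 64 bits, so it cannot wrap for any
   32-bit A and C (at most 2^32 * (2^32 - 1) < 2^64).  */
bool
lt_floor_div_by_mult (uint32_t a, uint32_t b, uint32_t c)
{
  return ((uint64_t) a + 1) * c <= b;
}
```

Both functions should agree for every a, b and nonzero c; the second
trades the division for a widening multiply, which is the cost trade-off
raised in item 3.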
Thanks again!
BR,
Jiufu
>
> ---
> gcc/tree-ssa-loop-niter.c | 92 +++++++++++++++----
> .../gcc.c-torture/execute/pr100740.c | 11 +++
> gcc/testsuite/gcc.dg/vect/pr102131.c | 47 ++++++++++
> 3 files changed, 134 insertions(+), 16 deletions(-)
> create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr100740.c
> create mode 100644 gcc/testsuite/gcc.dg/vect/pr102131.c
>
> diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
> index 06954e437f5..ee1d7293c5c 100644
> --- a/gcc/tree-ssa-loop-niter.c
> +++ b/gcc/tree-ssa-loop-niter.c
> @@ -1788,6 +1788,70 @@ dump_affine_iv (FILE *file, affine_iv *iv)
> }
> }
>
> +/* Generate expr: (HIGH - LOW) / STEP, under UTYPE. */
> +
> +static tree
> +get_step_count (tree high, tree low, tree step, tree utype,
> + bool end_inclusive = false)
> +{
> + tree delta = fold_build2 (MINUS_EXPR, TREE_TYPE (low), high, low);
> + delta = fold_convert (utype,delta);
> + if (end_inclusive)
> + delta = fold_build2 (PLUS_EXPR, utype, delta, build_one_cst (utype));
> +
> + if (tree_int_cst_sign_bit (step))
> + step = fold_build1 (NEGATE_EXPR, TREE_TYPE (step), step);
> + step = fold_convert (utype, step);
> +
> + return fold_build2 (FLOOR_DIV_EXPR, utype, delta, step);
> +}
> +
> +/* Get the additional assumption if both two steps are not zero.
> + Assumptions satisfy that there is no overflow or wrap during
> + v0 and v1 chasing. */
> +
> +static tree
> +extra_iv_chase_assumption (affine_iv *iv0, affine_iv *iv1, tree step,
> + enum tree_code code)
> +{
> + /* No additional assumptions are needed. */
> + if (code == NE_EXPR)
> + return boolean_true_node;
> +
> + /* It is not safe to transform {b0, 1} < {b1, 2}. */
> + if (tree_int_cst_sign_bit (step))
> + return boolean_false_node;
> +
> + /* No additional assumption is needed for pointers. */
> + tree type = TREE_TYPE (iv0->base);
> + if (POINTER_TYPE_P (type))
> + return boolean_true_node;
> +
> + bool positive0 = !tree_int_cst_sign_bit (iv0->step);
> + bool positive1 = !tree_int_cst_sign_bit (iv1->step);
> + bool positive = !tree_int_cst_sign_bit (step);
> + tree utype = unsigned_type_for (type);
> + bool add1 = code == LE_EXPR;
> + tree niter = positive
> + ? get_step_count (iv1->base, iv0->base, step, utype, add1)
> + : get_step_count (iv0->base, iv1->base, step, utype, add1);
> +
> + int prec = TYPE_PRECISION (type);
> + signop sgn = TYPE_SIGN (type);
> + tree max = wide_int_to_tree (type, wi::max_value (prec, sgn));
> + tree min = wide_int_to_tree (type, wi::min_value (prec, sgn));
> + tree valid_niter0, valid_niter1;
> +
> + valid_niter0 = positive0 ? get_step_count (max, iv0->base, iv0->step, utype)
> + : get_step_count (iv0->base, min, iv0->step, utype);
> + valid_niter1 = positive1 ? get_step_count (max, iv1->base, iv1->step, utype)
> + : get_step_count (iv1->base, min, iv1->step, utype);
> +
> + tree e0 = fold_build2 (LT_EXPR, boolean_type_node, niter, valid_niter0);
> + tree e1 = fold_build2 (LT_EXPR, boolean_type_node, niter, valid_niter1);
> + return fold_build2 (TRUTH_AND_EXPR, boolean_type_node, e0, e1);
> +}
> +
> /* Determine the number of iterations according to condition (for staying
> inside loop) which compares two induction variables using comparison
> operator CODE. The induction variable on left side of the comparison
> @@ -1879,30 +1943,26 @@ number_of_iterations_cond (class loop *loop,
> {iv0.base, iv0.step - iv1.step} cmp_code {iv1.base, 0}
>
> provided that either below condition is satisfied:
> + a. iv0.step and iv1.step are integer.
> + b. Additional condition: before iv0 chase up v1, iv0 and iv1 should not
> + step over min or max of the type. */
>
> - a) the test is NE_EXPR;
> - b) iv0.step - iv1.step is integer and iv0/iv1 don't overflow.
> -
> - This rarely occurs in practice, but it is simple enough to manage. */
> if (!integer_zerop (iv0->step) && !integer_zerop (iv1->step))
> {
> + if (TREE_CODE (iv0->step) != INTEGER_CST
> + || TREE_CODE (iv1->step) != INTEGER_CST)
> + return false;
> +
> tree step_type = POINTER_TYPE_P (type) ? sizetype : type;
> - tree step = fold_binary_to_constant (MINUS_EXPR, step_type,
> - iv0->step, iv1->step);
> -
> - /* No need to check sign of the new step since below code takes care
> - of this well. */
> - if (code != NE_EXPR
> - && (TREE_CODE (step) != INTEGER_CST
> - || !iv0->no_overflow || !iv1->no_overflow))
> + tree step
> + = fold_binary_to_constant (MINUS_EXPR, step_type, iv0->step, iv1->step);
> +
> + niter->assumptions = extra_iv_chase_assumption (iv0, iv1, step, code);
> + if (integer_zerop (niter->assumptions))
> return false;
>
> iv0->step = step;
> - if (!POINTER_TYPE_P (type))
> - iv0->no_overflow = false;
> -
> iv1->step = build_int_cst (step_type, 0);
> - iv1->no_overflow = true;
> }
>
> /* If the result of the comparison is a constant, the loop is weird. More
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr100740.c b/gcc/testsuite/gcc.c-torture/execute/pr100740.c
> new file mode 100644
> index 00000000000..8fcdaffef3b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr100740.c
> @@ -0,0 +1,11 @@
> +/* PR tree-optimization/100740 */
> +
> +unsigned a, b;
> +int main() {
> + unsigned c = 0;
> + for (a = 0; a < 2; a++)
> + for (b = 0; b < 2; b++)
> + if (++c < a)
> + __builtin_abort ();
> + return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/pr102131.c b/gcc/testsuite/gcc.dg/vect/pr102131.c
> new file mode 100644
> index 00000000000..23975cfeadb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr102131.c
> @@ -0,0 +1,47 @@
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +#define MAX ((unsigned int) 0xffffffff)
> +#define MIN ((unsigned int) (0))
> +
> +int arr[512];
> +
> +#define FUNC(NAME, CODE, S0, S1) \
> + unsigned __attribute__ ((noinline)) NAME (unsigned int b0, unsigned int b1) \
> + { \
> + unsigned int n = 0; \
> + unsigned int i0, i1; \
> + int *p = arr; \
> + for (i0 = b0, i1 = b1; i0 CODE i1; i0 += S0, i1 += S1) \
> + { \
> + n++; \
> + *p++ = i0 + i1; \
> + } \
> + return n; \
> + }
> +
> +FUNC (lt_5_1, <, 5, 1);
> +FUNC (le_1_m5, <=, 1, -5);
> +FUNC (lt_1_10, <, 1, 10);
> +
> +int
> +main ()
> +{
> + int fail = 0;
> + if (lt_5_1 (MAX - 124, MAX - 27) != 28)
> + fail++;
> +
> + /* to save time, do not run this. */
> + /*
> + if (le_1_m5 (MIN + 1, MIN + 9) != 715827885)
> + fail++; */
> +
> + if (lt_1_10 (MAX - 1000, MAX - 500) != 51)
> + fail++;
> +
> + if (fail)
> + __builtin_abort ();
> +
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
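
To illustrate the wrap behavior that the assumption is meant to guard
against, here is a small standalone sketch (not part of the patch).
count_two_ivs runs the loop shape from pr102131.c in 32-bit unsigned
arithmetic. chase_assumption_holds is a hypothetical, simplified model
of the check "niter < valid_niter0 && niter < valid_niter1", restricted
to unsigned IVs with positive steps where s0 > s1 and a '<' comparison;
it is not GCC's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Count iterations of
     for (i0 = b0, i1 = b1; i0 < i1; i0 += s0, i1 += s1)
   in 32-bit unsigned arithmetic, so both IVs wrap modulo 2^32.  */
uint32_t
count_two_ivs (uint32_t b0, uint32_t s0, uint32_t b1, uint32_t s1)
{
  uint32_t n = 0;
  for (uint32_t i0 = b0, i1 = b1; i0 < i1; i0 += s0, i1 += s1)
    n++;
  return n;
}

/* Simplified model of the extra assumption for '<' (assumed setting:
   unsigned type, both steps positive, s0 > s1, b0 <= b1).  Returns
   true when the number of steps iv0 needs to catch up with iv1 is
   smaller than the number of steps either IV can take before it
   passes the type maximum.  */
bool
chase_assumption_holds (uint32_t b0, uint32_t s0, uint32_t b1, uint32_t s1)
{
  if (s1 == 0 || s0 <= s1 || b1 < b0)
    return false;	/* Outside what this sketch models.  */

  uint32_t niter = (b1 - b0) / (s0 - s1);	/* Steps to catch up.  */
  uint32_t valid0 = (UINT32_MAX - b0) / s0;	/* Steps before iv0 wraps.  */
  uint32_t valid1 = (UINT32_MAX - b1) / s1;	/* Steps before iv1 wraps.  */
  return niter < valid0 && niter < valid1;
}
```

With the lt_5_1 inputs (b0 = MAX - 124, s0 = 5, b1 = MAX - 27, s1 = 1),
count_two_ivs returns 28 because iv0 wraps before it catches up with
iv1, and the modeled assumption is false (24 < 24 fails), so under this
model the rewrite to '{b0, s0-s1} < {b1, 0}' would be rejected for
these inputs.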