* [PATCH] Improve debug info in ivopts optimized loops (PR debug/90231)
@ 2019-10-19 7:06 Jakub Jelinek
2019-10-19 7:49 ` Jakub Jelinek
2019-10-21 10:45 ` Bin.Cheng
0 siblings, 2 replies; 6+ messages in thread
From: Jakub Jelinek @ 2019-10-19 7:06 UTC (permalink / raw)
To: Bin.Cheng, Richard Biener, Alexandre Oliva; +Cc: gcc-patches
Hi!
As mentioned in the PR, the following patch attempts to address two issues.
In remove_unused_ivs we first find the best iv_cand (we prefer primarily the
same step, next same mode and lastly constant base) and only then call
get_computation_at to determine the replacement expression. Unfortunately,
in various cases get_computation_at can return NULL_TREE and in that case we
don't try any other candidate and just leave the vars for debug as is, which
results in => NULL and the IVs <optimized away>.
The following patch will only consider candidates for which
get_computation_at succeeds, while it can be slower, it can handle more
cases. Perhaps alternative would be to have two passes, pick up the best
candidate without calling get_computation_at, calling it on the best
candidate and if that fails, retry the best candidate search with calling
get_computation_at.
Another thing is that get_computation_at can only handle cases where
use can be expressed as ubase + (var - cbase) * ratio for integral ratio.
In the PR testcase we don't have any, one (the one we pick as best)
has a ratio 1/15 and the other 1/4. Using a division in ivopts at runtime
is hardly ever desirable, but for debug info we don't mind that, all we need
to ensure is that we don't result in wrong-debug.
The patch implements expressing use as ubase + (var - cbase) / ratio for
these debug info only uses, but so far requires that if the use IV couldn't
wrap around, that the candidate (var - cbase) can't wrap around either,
which should be true whenever the candidate type is IMHO at least
ceil_log2 (ratio) bits larger than the use type. Do I need to punt if
!use->iv->no_overflow, or is
ubase + (utype) ((var - cbase) / ratio) computation safe even if it wraps?
And as questioned in the PR, are there other cases where we can safely
assume no wrap (e.g. use it if use->iv->no_overflow && cand->iv->no_overflow
without those extra precision checks)?
Anyway, bootstrapped/regtested successfully on x86_64-linux and i686-linux.
Comparing bootstrapped cc1plus without and with this patch (the former with
the patch applied after bootstrap and stage3 rebuilt, so that .text etc. is
identical) shows:
- [33] .debug_info PROGBITS 0000000000000000 22a4788 749275e 00 0 0 1
- [34] .debug_abbrev PROGBITS 0000000000000000 9736ee6 204aad 00 0 0 1
- [35] .debug_line PROGBITS 0000000000000000 993b993 1688464 00 0 0 1
- [36] .debug_str PROGBITS 0000000000000000 afc3df7 6f65aa 01 MS 0 0 1
- [37] .debug_loc PROGBITS 0000000000000000 b6ba3a1 71a2dde 00 0 0 1
- [38] .debug_ranges PROGBITS 0000000000000000 1285d17f 16414d0 00 0 0 1
- [39] .symtab SYMTAB 0000000000000000 13e9e650 166ff8 18 40 38193 8
- [40] .strtab STRTAB 0000000000000000 14005648 2ad809 00 0 0 1
- [41] .shstrtab STRTAB 0000000000000000 142b2e51 0001a0 00 0 0 1
+ [33] .debug_info PROGBITS 0000000000000000 22a4788 749365e 00 0 0 1
+ [34] .debug_abbrev PROGBITS 0000000000000000 9737de6 204a9f 00 0 0 1
+ [35] .debug_line PROGBITS 0000000000000000 993c885 1688f0c 00 0 0 1
+ [36] .debug_str PROGBITS 0000000000000000 afc5791 6f65aa 01 MS 0 0 1
+ [37] .debug_loc PROGBITS 0000000000000000 b6bbd3b 71cd404 00 0 0 1
+ [38] .debug_ranges PROGBITS 0000000000000000 1288913f 16414b0 00 0 0 1
+ [39] .symtab SYMTAB 0000000000000000 13eca5f0 166ff8 18 40 38193 8
+ [40] .strtab STRTAB 0000000000000000 140315e8 2ad809 00 0 0 1
+ [41] .shstrtab STRTAB 0000000000000000 142dedf1 0001a0 00 0 0 1
so .debug_info is 3840 bytes larger and .debug_loc is 173606 bytes larger
(0.15%), so there are some changes with this, but not huge amount of them,
though .debug_loc size changed in 217 gcc/*.o files out of 474.
2019-10-18 Jakub Jelinek <jakub@redhat.com>
PR debug/90231
* tree-ssa-loop-ivopts.c (get_debug_computation_at): New function.
(remove_unused_ivs): Use it instead of get_computation_at. When
choosing best candidate, only consider candidates where
get_debug_computation_at actually returns non-NULL.
--- gcc/tree-ssa-loop-ivopts.c.jj 2019-09-20 12:25:26.810718338 +0200
+++ gcc/tree-ssa-loop-ivopts.c 2019-10-18 23:55:10.054026219 +0200
@@ -4089,6 +4089,83 @@ get_computation_at (class loop *loop, gi
return fold_convert (type, aff_combination_to_tree (&aff));
}
+/* Like get_computation_at, but try harder, even if the computation
+ is more expensive. Intended for debug stmts. */
+
+static tree
+get_debug_computation_at (class loop *loop, gimple *at,
+ struct iv_use *use, struct iv_cand *cand)
+{
+ if (tree ret = get_computation_at (loop, at, use, cand))
+ return ret;
+
+ tree ubase = use->iv->base, ustep = use->iv->step;
+ tree cbase = cand->iv->base, cstep = cand->iv->step;
+ tree var;
+ tree utype = TREE_TYPE (ubase), ctype = TREE_TYPE (cbase);
+ widest_int rat;
+
+ /* We must have a precision to express the values of use. */
+ if (TYPE_PRECISION (utype) >= TYPE_PRECISION (ctype))
+ return NULL_TREE;
+
+ /* Try to handle the case that get_computation_at doesn't,
+ try to express
+ use = ubase + (var - cbase) / ratio. */
+ if (!constant_multiple_of (cstep, fold_convert (TREE_TYPE (cstep), ustep),
+ &rat))
+ return NULL_TREE;
+
+ bool neg_p = false;
+ if (wi::neg_p (rat))
+ {
+ if (TYPE_UNSIGNED (ctype))
+ return NULL_TREE;
+ neg_p = true;
+ rat = wi::neg (rat);
+ }
+
+ int bits = wi::exact_log2 (rat);
+ if (bits == -1)
+ bits = wi::floor_log2 (rat) + 1;
+ if (TYPE_PRECISION (utype) + bits > TYPE_PRECISION (ctype))
+ return NULL_TREE;
+
+ var = var_at_stmt (loop, cand, at);
+
+ if (POINTER_TYPE_P (ctype))
+ {
+ ctype = unsigned_type_for (ctype);
+ cbase = fold_convert (ctype, cbase);
+ cstep = fold_convert (ctype, cstep);
+ var = fold_convert (ctype, var);
+ }
+
+ ubase = unshare_expr (ubase);
+ cbase = unshare_expr (cbase);
+ if (stmt_after_increment (loop, cand, at))
+ var = fold_build2 (MINUS_EXPR, TREE_TYPE (var), var,
+ unshare_expr (cstep));
+
+ var = fold_build2 (MINUS_EXPR, TREE_TYPE (var), var, cbase);
+ var = fold_build2 (EXACT_DIV_EXPR, TREE_TYPE (var), var,
+ wide_int_to_tree (TREE_TYPE (var), rat));
+ if (POINTER_TYPE_P (utype))
+ {
+ var = fold_convert (sizetype, var);
+ if (neg_p)
+ var = fold_build1 (NEGATE_EXPR, sizetype, var);
+ var = fold_build2 (POINTER_PLUS_EXPR, utype, ubase, var);
+ }
+ else
+ {
+ var = fold_convert (utype, var);
+ var = fold_build2 (neg_p ? MINUS_EXPR : PLUS_EXPR, utype,
+ ubase, var);
+ }
+ return var;
+}
+
/* Adjust the cost COST for being in loop setup rather than loop body.
If we're optimizing for space, the loop setup overhead is constant;
if we're optimizing for speed, amortize it over the per-iteration cost.
@@ -7523,6 +7600,7 @@ remove_unused_ivs (struct ivopts_data *d
struct iv_use dummy_use;
struct iv_cand *best_cand = NULL, *cand;
unsigned i, best_pref = 0, cand_pref;
+ tree comp = NULL_TREE;
memset (&dummy_use, 0, sizeof (dummy_use));
dummy_use.iv = info->iv;
@@ -7543,20 +7621,22 @@ remove_unused_ivs (struct ivopts_data *d
? 1 : 0;
if (best_cand == NULL || best_pref < cand_pref)
{
- best_cand = cand;
- best_pref = cand_pref;
+ tree this_comp
+ = get_debug_computation_at (data->current_loop,
+ SSA_NAME_DEF_STMT (def),
+ &dummy_use, cand);
+ if (this_comp)
+ {
+ best_cand = cand;
+ best_pref = cand_pref;
+ comp = this_comp;
+ }
}
}
if (!best_cand)
continue;
- tree comp = get_computation_at (data->current_loop,
- SSA_NAME_DEF_STMT (def),
- &dummy_use, best_cand);
- if (!comp)
- continue;
-
if (count > 1)
{
tree vexpr = make_node (DEBUG_EXPR_DECL);
Jakub
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Improve debug info in ivopts optimized loops (PR debug/90231)
2019-10-19 7:06 [PATCH] Improve debug info in ivopts optimized loops (PR debug/90231) Jakub Jelinek
@ 2019-10-19 7:49 ` Jakub Jelinek
2019-10-21 10:45 ` Bin.Cheng
1 sibling, 0 replies; 6+ messages in thread
From: Jakub Jelinek @ 2019-10-19 7:49 UTC (permalink / raw)
To: Bin.Cheng, Richard Biener, Alexandre Oliva; +Cc: gcc-patches
On Sat, Oct 19, 2019 at 08:27:31AM +0200, Jakub Jelinek wrote:
> And as questioned in the PR, are there other cases where we can safely
> assume no wrap (e.g. use it if use->iv->no_overflow && cand->iv->no_overflow
> without those extra precision checks)?
Like, is there a way to find out if an iv_cand has been created from pointer
arithmetics or similar and thus at least with -fno-wrapv-pointer shouldn't wrap
around?
Jakub
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Improve debug info in ivopts optimized loops (PR debug/90231)
2019-10-19 7:06 [PATCH] Improve debug info in ivopts optimized loops (PR debug/90231) Jakub Jelinek
2019-10-19 7:49 ` Jakub Jelinek
@ 2019-10-21 10:45 ` Bin.Cheng
2019-10-21 11:27 ` Jakub Jelinek
1 sibling, 1 reply; 6+ messages in thread
From: Bin.Cheng @ 2019-10-21 10:45 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: Richard Biener, Alexandre Oliva, gcc-patches List
On Sat, Oct 19, 2019 at 2:27 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> Hi!
>
> As mentioned in the PR, the following patch attempts to address two issues.
> In remove_unused_ivs we first find the best iv_cand (we prefer primarily the
> same step, next same mode and lastly constant base) and only then call
> get_computation_at to determine the replacement expression. Unfortunately,
> in various cases get_computation_at can return NULL_TREE and in that case we
> don't try any other candidate and just leave the vars for debug as is, which
> results in => NULL and the IVs <optimized away>.
>
> The following patch will only consider candidates for which
> get_computation_at succeeds, while it can be slower, it can handle more
> cases. Perhaps alternative would be to have two passes, pick up the best
> candidate without calling get_computation_at, calling it on the best
> candidate and if that fails, retry the best candidate search with calling
> get_computation_at.
>
> Another thing is that get_computation_at can only handle cases where
> use can be expressed as ubase + (var - cbase) * ratio for integral ratio.
> In the PR testcase we don't have any, one (the one we pick as best)
> has a ratio 1/15 and the other 1/4. Using a division in ivopts at runtime
> is hardly ever desirable, but for debug info we don't mind that, all we need
> to ensure is that we don't result in wrong-debug.
> The patch implements expressing use as ubase + (var - cbase) / ratio for
> these debug info only uses, but so far requires that if the use IV couldn't
> wrap around, that the candidate (var - cbase) can't wrap around either,
> which should be true whenever the candidate type is IMHO at least
> ceil_log2 (ratio) bits larger than the use type. Do I need to punt if
> !use->iv->no_overflow, or is
> ubase + (utype) ((var - cbase) / ratio) computation safe even if it wraps?
> And as questioned in the PR, are there other cases where we can safely
> assume no wrap (e.g. use it if use->iv->no_overflow && cand->iv->no_overflow
> without those extra precision checks)?
>
> Anyway, bootstrapped/regtested successfully on x86_64-linux and i686-linux.
Hi Jakub, thanks very much for working on this.
As for choosing the best candidate, I was thinking to reuse existing
get_computation_cost interface, but looks like it requires non-trivial
modification. Your patch is simpler.
The overflow check is difficult, IIUC, checks on bit precision in the
patch is not enough? Considering an unsigned 64-bit candidate with
unknown compilation start value. My original idea was using
no_overflow flag, apparently this information is not well computed and
inaccurate in most cases? At last, for now there is no link between
candidate and its original iv. IIRC, when a candidate is derived from
a no_overflow pointer candidate, it's not safe to mark the candidate
as no_overflow. I am not sure if I remember wrong, unfortunately, I
don't have examples either.
Thanks,
bin
>
> Comparing bootstrapped cc1plus without and with this patch (the former with
> the patch applied after bootstrap and stage3 rebuilt, so that .text etc. is
> identical) shows:
> - [33] .debug_info PROGBITS 0000000000000000 22a4788 749275e 00 0 0 1
> - [34] .debug_abbrev PROGBITS 0000000000000000 9736ee6 204aad 00 0 0 1
> - [35] .debug_line PROGBITS 0000000000000000 993b993 1688464 00 0 0 1
> - [36] .debug_str PROGBITS 0000000000000000 afc3df7 6f65aa 01 MS 0 0 1
> - [37] .debug_loc PROGBITS 0000000000000000 b6ba3a1 71a2dde 00 0 0 1
> - [38] .debug_ranges PROGBITS 0000000000000000 1285d17f 16414d0 00 0 0 1
> - [39] .symtab SYMTAB 0000000000000000 13e9e650 166ff8 18 40 38193 8
> - [40] .strtab STRTAB 0000000000000000 14005648 2ad809 00 0 0 1
> - [41] .shstrtab STRTAB 0000000000000000 142b2e51 0001a0 00 0 0 1
> + [33] .debug_info PROGBITS 0000000000000000 22a4788 749365e 00 0 0 1
> + [34] .debug_abbrev PROGBITS 0000000000000000 9737de6 204a9f 00 0 0 1
> + [35] .debug_line PROGBITS 0000000000000000 993c885 1688f0c 00 0 0 1
> + [36] .debug_str PROGBITS 0000000000000000 afc5791 6f65aa 01 MS 0 0 1
> + [37] .debug_loc PROGBITS 0000000000000000 b6bbd3b 71cd404 00 0 0 1
> + [38] .debug_ranges PROGBITS 0000000000000000 1288913f 16414b0 00 0 0 1
> + [39] .symtab SYMTAB 0000000000000000 13eca5f0 166ff8 18 40 38193 8
> + [40] .strtab STRTAB 0000000000000000 140315e8 2ad809 00 0 0 1
> + [41] .shstrtab STRTAB 0000000000000000 142dedf1 0001a0 00 0 0 1
> so .debug_info is 3840 bytes larger and .debug_loc is 173606 bytes larger
> (0.15%), so there are some changes with this, but not huge amount of them,
> though .debug_loc size changed in 217 gcc/*.o files out of 474.
>
> 2019-10-18 Jakub Jelinek <jakub@redhat.com>
>
> PR debug/90231
> * tree-ssa-loop-ivopts.c (get_debug_computation_at): New function.
> (remove_unused_ivs): Use it instead of get_computation_at. When
> choosing best candidate, only consider candidates where
> get_debug_computation_at actually returns non-NULL.
>
> --- gcc/tree-ssa-loop-ivopts.c.jj 2019-09-20 12:25:26.810718338 +0200
> +++ gcc/tree-ssa-loop-ivopts.c 2019-10-18 23:55:10.054026219 +0200
> @@ -4089,6 +4089,83 @@ get_computation_at (class loop *loop, gi
> return fold_convert (type, aff_combination_to_tree (&aff));
> }
>
> +/* Like get_computation_at, but try harder, even if the computation
> + is more expensive. Intended for debug stmts. */
> +
> +static tree
> +get_debug_computation_at (class loop *loop, gimple *at,
> + struct iv_use *use, struct iv_cand *cand)
> +{
> + if (tree ret = get_computation_at (loop, at, use, cand))
> + return ret;
> +
> + tree ubase = use->iv->base, ustep = use->iv->step;
> + tree cbase = cand->iv->base, cstep = cand->iv->step;
> + tree var;
> + tree utype = TREE_TYPE (ubase), ctype = TREE_TYPE (cbase);
> + widest_int rat;
> +
> + /* We must have a precision to express the values of use. */
> + if (TYPE_PRECISION (utype) >= TYPE_PRECISION (ctype))
> + return NULL_TREE;
> +
> + /* Try to handle the case that get_computation_at doesn't,
> + try to express
> + use = ubase + (var - cbase) / ratio. */
> + if (!constant_multiple_of (cstep, fold_convert (TREE_TYPE (cstep), ustep),
> + &rat))
> + return NULL_TREE;
> +
> + bool neg_p = false;
> + if (wi::neg_p (rat))
> + {
> + if (TYPE_UNSIGNED (ctype))
> + return NULL_TREE;
> + neg_p = true;
> + rat = wi::neg (rat);
> + }
> +
> + int bits = wi::exact_log2 (rat);
> + if (bits == -1)
> + bits = wi::floor_log2 (rat) + 1;
> + if (TYPE_PRECISION (utype) + bits > TYPE_PRECISION (ctype))
> + return NULL_TREE;
> +
> + var = var_at_stmt (loop, cand, at);
> +
> + if (POINTER_TYPE_P (ctype))
> + {
> + ctype = unsigned_type_for (ctype);
> + cbase = fold_convert (ctype, cbase);
> + cstep = fold_convert (ctype, cstep);
> + var = fold_convert (ctype, var);
> + }
> +
> + ubase = unshare_expr (ubase);
> + cbase = unshare_expr (cbase);
> + if (stmt_after_increment (loop, cand, at))
> + var = fold_build2 (MINUS_EXPR, TREE_TYPE (var), var,
> + unshare_expr (cstep));
> +
> + var = fold_build2 (MINUS_EXPR, TREE_TYPE (var), var, cbase);
> + var = fold_build2 (EXACT_DIV_EXPR, TREE_TYPE (var), var,
> + wide_int_to_tree (TREE_TYPE (var), rat));
> + if (POINTER_TYPE_P (utype))
> + {
> + var = fold_convert (sizetype, var);
> + if (neg_p)
> + var = fold_build1 (NEGATE_EXPR, sizetype, var);
> + var = fold_build2 (POINTER_PLUS_EXPR, utype, ubase, var);
> + }
> + else
> + {
> + var = fold_convert (utype, var);
> + var = fold_build2 (neg_p ? MINUS_EXPR : PLUS_EXPR, utype,
> + ubase, var);
> + }
> + return var;
> +}
> +
> /* Adjust the cost COST for being in loop setup rather than loop body.
> If we're optimizing for space, the loop setup overhead is constant;
> if we're optimizing for speed, amortize it over the per-iteration cost.
> @@ -7523,6 +7600,7 @@ remove_unused_ivs (struct ivopts_data *d
> struct iv_use dummy_use;
> struct iv_cand *best_cand = NULL, *cand;
> unsigned i, best_pref = 0, cand_pref;
> + tree comp = NULL_TREE;
>
> memset (&dummy_use, 0, sizeof (dummy_use));
> dummy_use.iv = info->iv;
> @@ -7543,20 +7621,22 @@ remove_unused_ivs (struct ivopts_data *d
> ? 1 : 0;
> if (best_cand == NULL || best_pref < cand_pref)
> {
> - best_cand = cand;
> - best_pref = cand_pref;
> + tree this_comp
> + = get_debug_computation_at (data->current_loop,
> + SSA_NAME_DEF_STMT (def),
> + &dummy_use, cand);
> + if (this_comp)
> + {
> + best_cand = cand;
> + best_pref = cand_pref;
> + comp = this_comp;
> + }
> }
> }
>
> if (!best_cand)
> continue;
>
> - tree comp = get_computation_at (data->current_loop,
> - SSA_NAME_DEF_STMT (def),
> - &dummy_use, best_cand);
> - if (!comp)
> - continue;
> -
> if (count > 1)
> {
> tree vexpr = make_node (DEBUG_EXPR_DECL);
>
> Jakub
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Improve debug info in ivopts optimized loops (PR debug/90231)
2019-10-21 10:45 ` Bin.Cheng
@ 2019-10-21 11:27 ` Jakub Jelinek
2019-10-22 7:43 ` Jakub Jelinek
0 siblings, 1 reply; 6+ messages in thread
From: Jakub Jelinek @ 2019-10-21 11:27 UTC (permalink / raw)
To: Bin.Cheng; +Cc: Richard Biener, Alexandre Oliva, gcc-patches List
On Mon, Oct 21, 2019 at 06:41:46PM +0800, Bin.Cheng wrote:
> The overflow check is difficult, IIUC, checks on bit precision in the
> patch is not enough? Considering an unsigned 64-bit candidate with
I think it can cover some cases, especially on 64-bit targets, but will
leave other cases out.
I guess the first question is what iv->no_overflow means, the IVs
are base + iteration * step, does no_overflow mean that neither iteration *
step, nor base + iteration * step overflows?
For the debug info and division, guess we care primarily if just the
iteration * step part doesn't overflow.
There are still cases where I fear even the precision check might not be
enough. If ustep is a power of two and so is rat, then I think we should be
fine, or if use->iv->no_overflow, but otherwise I fear of say unsigned char
use and unsigned short candidate, with ustep 3 and cstep 9 (i.e. rat 3).
If the loop has more than 65536 / 9 iterations, the var - cbase
will overflow, so will have values 0, 9, ..., 0xfff0, 0xfff9, 2, 11, ...
and corresponding use computed using division (assuming ubase is 0)
0 + (unsigned char) ((var - cbase) / 3):
0, 3, ..., 0x50, 0x53, 0, 3, ...
which is incorrect.
Simple testcase -O3 -g -fno-tree-dce:
void
foo (unsigned short c, unsigned char *p)
{
unsigned char a = 0;
unsigned short b = 0;
for (; b != c; a += 3, b += 9)
p[b]++;
}
where I think with the patch it will be wrong-debug on 7282th iteration etc.
So I wonder if for correctness I don't need to add:
if (!use->iv->no_overflow
&& !cand->iv->no_overflow
&& !integer_pow2p (cstep))
return NULL_TREE;
with some of the above as comment explaining why.
On the other side, if cand->iv->no_overflow, couldn't we bypass the extra
precision test?
And related to the first question above, perhaps incrementally, couldn't we
track separately whether the whole base + iteration * step overflows (i.e.
what we track right now) and also just whether iteration * step overflows in
a separate bool? Because when we subtract the base from the value, all we
care about is iteration * step.
Jakub
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Improve debug info in ivopts optimized loops (PR debug/90231)
2019-10-21 11:27 ` Jakub Jelinek
@ 2019-10-22 7:43 ` Jakub Jelinek
2019-10-23 9:43 ` Bin.Cheng
0 siblings, 1 reply; 6+ messages in thread
From: Jakub Jelinek @ 2019-10-22 7:43 UTC (permalink / raw)
To: Bin.Cheng; +Cc: Richard Biener, Alexandre Oliva, gcc-patches List
On Mon, Oct 21, 2019 at 01:24:30PM +0200, Jakub Jelinek wrote:
> So I wonder if for correctness I don't need to add:
>
> if (!use->iv->no_overflow
> && !cand->iv->no_overflow
> && !integer_pow2p (cstep))
> return NULL_TREE;
>
> with some of the above as comment explaining why.
>
> On the other side, if cand->iv->no_overflow, couldn't we bypass the extra
> precision test?
Here are these two in patch form.
2019-10-22 Jakub Jelinek <jakub@redhat.com>
PR debug/90231
* tree-ssa-loop-ivopts.c (get_debug_computation_at): New function.
(remove_unused_ivs): Use it instead of get_computation_at. When
choosing best candidate, only consider candidates where
get_debug_computation_at actually returns non-NULL.
--- gcc/tree-ssa-loop-ivopts.c.jj 2019-10-21 14:17:57.598198162 +0200
+++ gcc/tree-ssa-loop-ivopts.c 2019-10-22 09:30:09.782238157 +0200
@@ -4089,6 +4089,94 @@ get_computation_at (class loop *loop, gi
return fold_convert (type, aff_combination_to_tree (&aff));
}
+/* Like get_computation_at, but try harder, even if the computation
+ is more expensive. Intended for debug stmts. */
+
+static tree
+get_debug_computation_at (class loop *loop, gimple *at,
+ struct iv_use *use, struct iv_cand *cand)
+{
+ if (tree ret = get_computation_at (loop, at, use, cand))
+ return ret;
+
+ tree ubase = use->iv->base, ustep = use->iv->step;
+ tree cbase = cand->iv->base, cstep = cand->iv->step;
+ tree var;
+ tree utype = TREE_TYPE (ubase), ctype = TREE_TYPE (cbase);
+ widest_int rat;
+
+ /* We must have a precision to express the values of use. */
+ if (TYPE_PRECISION (utype) >= TYPE_PRECISION (ctype))
+ return NULL_TREE;
+
+ /* Try to handle the case that get_computation_at doesn't,
+ try to express
+ use = ubase + (var - cbase) / ratio. */
+ if (!constant_multiple_of (cstep, fold_convert (TREE_TYPE (cstep), ustep),
+ &rat))
+ return NULL_TREE;
+
+ bool neg_p = false;
+ if (wi::neg_p (rat))
+ {
+ if (TYPE_UNSIGNED (ctype))
+ return NULL_TREE;
+ neg_p = true;
+ rat = wi::neg (rat);
+ }
+
+ /* If both IVs can wrap around and CAND doesn't have a power of two step,
+ it is unsafe. Consider uint16_t CAND with step 9, when wrapping around,
+ the values will be ... 0xfff0, 0xfff9, 2, 11 ... and when use is say
+ uint8_t with step 3, those values divided by 3 cast to uint8_t will be
+ ... 0x50, 0x53, 0, 3 ... rather than expected 0x50, 0x53, 0x56, 0x59. */
+ if (!use->iv->no_overflow
+ && !cand->iv->no_overflow
+ && !integer_pow2p (cstep))
+ return NULL_TREE;
+
+ int bits = wi::exact_log2 (rat);
+ if (bits == -1)
+ bits = wi::floor_log2 (rat) + 1;
+ if (!cand->iv->no_overflow
+ && TYPE_PRECISION (utype) + bits > TYPE_PRECISION (ctype))
+ return NULL_TREE;
+
+ var = var_at_stmt (loop, cand, at);
+
+ if (POINTER_TYPE_P (ctype))
+ {
+ ctype = unsigned_type_for (ctype);
+ cbase = fold_convert (ctype, cbase);
+ cstep = fold_convert (ctype, cstep);
+ var = fold_convert (ctype, var);
+ }
+
+ ubase = unshare_expr (ubase);
+ cbase = unshare_expr (cbase);
+ if (stmt_after_increment (loop, cand, at))
+ var = fold_build2 (MINUS_EXPR, TREE_TYPE (var), var,
+ unshare_expr (cstep));
+
+ var = fold_build2 (MINUS_EXPR, TREE_TYPE (var), var, cbase);
+ var = fold_build2 (EXACT_DIV_EXPR, TREE_TYPE (var), var,
+ wide_int_to_tree (TREE_TYPE (var), rat));
+ if (POINTER_TYPE_P (utype))
+ {
+ var = fold_convert (sizetype, var);
+ if (neg_p)
+ var = fold_build1 (NEGATE_EXPR, sizetype, var);
+ var = fold_build2 (POINTER_PLUS_EXPR, utype, ubase, var);
+ }
+ else
+ {
+ var = fold_convert (utype, var);
+ var = fold_build2 (neg_p ? MINUS_EXPR : PLUS_EXPR, utype,
+ ubase, var);
+ }
+ return var;
+}
+
/* Adjust the cost COST for being in loop setup rather than loop body.
If we're optimizing for space, the loop setup overhead is constant;
if we're optimizing for speed, amortize it over the per-iteration cost.
@@ -7523,6 +7611,7 @@ remove_unused_ivs (struct ivopts_data *d
struct iv_use dummy_use;
struct iv_cand *best_cand = NULL, *cand;
unsigned i, best_pref = 0, cand_pref;
+ tree comp = NULL_TREE;
memset (&dummy_use, 0, sizeof (dummy_use));
dummy_use.iv = info->iv;
@@ -7543,20 +7632,22 @@ remove_unused_ivs (struct ivopts_data *d
? 1 : 0;
if (best_cand == NULL || best_pref < cand_pref)
{
- best_cand = cand;
- best_pref = cand_pref;
+ tree this_comp
+ = get_debug_computation_at (data->current_loop,
+ SSA_NAME_DEF_STMT (def),
+ &dummy_use, cand);
+ if (this_comp)
+ {
+ best_cand = cand;
+ best_pref = cand_pref;
+ comp = this_comp;
+ }
}
}
if (!best_cand)
continue;
- tree comp = get_computation_at (data->current_loop,
- SSA_NAME_DEF_STMT (def),
- &dummy_use, best_cand);
- if (!comp)
- continue;
-
if (count > 1)
{
tree vexpr = make_node (DEBUG_EXPR_DECL);
Jakub
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Improve debug info in ivopts optimized loops (PR debug/90231)
2019-10-22 7:43 ` Jakub Jelinek
@ 2019-10-23 9:43 ` Bin.Cheng
0 siblings, 0 replies; 6+ messages in thread
From: Bin.Cheng @ 2019-10-23 9:43 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: Richard Biener, Alexandre Oliva, gcc-patches List
On Tue, Oct 22, 2019 at 3:32 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Oct 21, 2019 at 01:24:30PM +0200, Jakub Jelinek wrote:
> > So I wonder if for correctness I don't need to add:
> >
> > if (!use->iv->no_overflow
> > && !cand->iv->no_overflow
> > && !integer_pow2p (cstep))
> > return NULL_TREE;
> >
> > with some of the above as comment explaining why.
> >
> > On the other side, if cand->iv->no_overflow, couldn't we bypass the extra
> > precision test?
>
> Here are these two in patch form.
>
> 2019-10-22 Jakub Jelinek <jakub@redhat.com>
>
> PR debug/90231
> * tree-ssa-loop-ivopts.c (get_debug_computation_at): New function.
> (remove_unused_ivs): Use it instead of get_computation_at. When
> choosing best candidate, only consider candidates where
> get_debug_computation_at actually returns non-NULL.
>
> --- gcc/tree-ssa-loop-ivopts.c.jj 2019-10-21 14:17:57.598198162 +0200
> +++ gcc/tree-ssa-loop-ivopts.c 2019-10-22 09:30:09.782238157 +0200
> @@ -4089,6 +4089,94 @@ get_computation_at (class loop *loop, gi
> return fold_convert (type, aff_combination_to_tree (&aff));
> }
>
> +/* Like get_computation_at, but try harder, even if the computation
> + is more expensive. Intended for debug stmts. */
> +
> +static tree
> +get_debug_computation_at (class loop *loop, gimple *at,
> + struct iv_use *use, struct iv_cand *cand)
> +{
> + if (tree ret = get_computation_at (loop, at, use, cand))
> + return ret;
> +
> + tree ubase = use->iv->base, ustep = use->iv->step;
> + tree cbase = cand->iv->base, cstep = cand->iv->step;
> + tree var;
> + tree utype = TREE_TYPE (ubase), ctype = TREE_TYPE (cbase);
> + widest_int rat;
> +
> + /* We must have a precision to express the values of use. */
> + if (TYPE_PRECISION (utype) >= TYPE_PRECISION (ctype))
> + return NULL_TREE;
> +
> + /* Try to handle the case that get_computation_at doesn't,
> + try to express
> + use = ubase + (var - cbase) / ratio. */
> + if (!constant_multiple_of (cstep, fold_convert (TREE_TYPE (cstep), ustep),
> + &rat))
> + return NULL_TREE;
> +
> + bool neg_p = false;
> + if (wi::neg_p (rat))
> + {
> + if (TYPE_UNSIGNED (ctype))
> + return NULL_TREE;
> + neg_p = true;
> + rat = wi::neg (rat);
> + }
> +
> + /* If both IVs can wrap around and CAND doesn't have a power of two step,
> + it is unsafe. Consider uint16_t CAND with step 9, when wrapping around,
> + the values will be ... 0xfff0, 0xfff9, 2, 11 ... and when use is say
> + uint8_t with step 3, those values divided by 3 cast to uint8_t will be
> + ... 0x50, 0x53, 0, 3 ... rather than expected 0x50, 0x53, 0x56, 0x59. */
Interesting, so we can still get correct debug info for iter in such
special cases.
> + if (!use->iv->no_overflow
> + && !cand->iv->no_overflow
> + && !integer_pow2p (cstep))
> + return NULL_TREE;
> +
> + int bits = wi::exact_log2 (rat);
> + if (bits == -1)
> + bits = wi::floor_log2 (rat) + 1;
> + if (!cand->iv->no_overflow
> + && TYPE_PRECISION (utype) + bits > TYPE_PRECISION (ctype))
> + return NULL_TREE;
The patch is fine for me.
Just for the record, guess we may try to find (by recording
information in early phase) the correct/bijection candidate in
computing the iv in debuginfo in the future, then those checks would
be unnecessary.
Thanks,
bin
> +
> + var = var_at_stmt (loop, cand, at);
> +
> + if (POINTER_TYPE_P (ctype))
> + {
> + ctype = unsigned_type_for (ctype);
> + cbase = fold_convert (ctype, cbase);
> + cstep = fold_convert (ctype, cstep);
> + var = fold_convert (ctype, var);
> + }
> +
> + ubase = unshare_expr (ubase);
> + cbase = unshare_expr (cbase);
> + if (stmt_after_increment (loop, cand, at))
> + var = fold_build2 (MINUS_EXPR, TREE_TYPE (var), var,
> + unshare_expr (cstep));
> +
> + var = fold_build2 (MINUS_EXPR, TREE_TYPE (var), var, cbase);
> + var = fold_build2 (EXACT_DIV_EXPR, TREE_TYPE (var), var,
> + wide_int_to_tree (TREE_TYPE (var), rat));
> + if (POINTER_TYPE_P (utype))
> + {
> + var = fold_convert (sizetype, var);
> + if (neg_p)
> + var = fold_build1 (NEGATE_EXPR, sizetype, var);
> + var = fold_build2 (POINTER_PLUS_EXPR, utype, ubase, var);
> + }
> + else
> + {
> + var = fold_convert (utype, var);
> + var = fold_build2 (neg_p ? MINUS_EXPR : PLUS_EXPR, utype,
> + ubase, var);
> + }
> + return var;
> +}
> +
> /* Adjust the cost COST for being in loop setup rather than loop body.
> If we're optimizing for space, the loop setup overhead is constant;
> if we're optimizing for speed, amortize it over the per-iteration cost.
> @@ -7523,6 +7611,7 @@ remove_unused_ivs (struct ivopts_data *d
> struct iv_use dummy_use;
> struct iv_cand *best_cand = NULL, *cand;
> unsigned i, best_pref = 0, cand_pref;
> + tree comp = NULL_TREE;
>
> memset (&dummy_use, 0, sizeof (dummy_use));
> dummy_use.iv = info->iv;
> @@ -7543,20 +7632,22 @@ remove_unused_ivs (struct ivopts_data *d
> ? 1 : 0;
> if (best_cand == NULL || best_pref < cand_pref)
> {
> - best_cand = cand;
> - best_pref = cand_pref;
> + tree this_comp
> + = get_debug_computation_at (data->current_loop,
> + SSA_NAME_DEF_STMT (def),
> + &dummy_use, cand);
> + if (this_comp)
> + {
> + best_cand = cand;
> + best_pref = cand_pref;
> + comp = this_comp;
> + }
> }
> }
>
> if (!best_cand)
> continue;
>
> - tree comp = get_computation_at (data->current_loop,
> - SSA_NAME_DEF_STMT (def),
> - &dummy_use, best_cand);
> - if (!comp)
> - continue;
> -
> if (count > 1)
> {
> tree vexpr = make_node (DEBUG_EXPR_DECL);
>
>
> Jakub
>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2019-10-23 9:42 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-19 7:06 [PATCH] Improve debug info in ivopts optimized loops (PR debug/90231) Jakub Jelinek
2019-10-19 7:49 ` Jakub Jelinek
2019-10-21 10:45 ` Bin.Cheng
2019-10-21 11:27 ` Jakub Jelinek
2019-10-22 7:43 ` Jakub Jelinek
2019-10-23 9:43 ` Bin.Cheng
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).