From: "Martin Liška" <mliska@suse.cz>
To: Jeff Law <law@redhat.com>, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH][RFC] Radically simplify emission of balanced tree for switch statements.
Date: Wed, 20 Sep 2017 07:24:00 -0000 [thread overview]
Message-ID: <a385e3cb-4a33-55ea-5c3b-1d159a040cf0@suse.cz> (raw)
In-Reply-To: <c0fc51df-4794-be1e-e139-578464623e57@redhat.com>
On 09/16/2017 12:19 AM, Jeff Law wrote:
> On 09/14/2017 06:17 AM, Martin Liška wrote:
>> Hello.
>>
>> As mentioned at Cauldron 2017, second step in switch lowering should be massive
>> simplification in code that does expansion of balanced tree. Basically it includes
>> VRP and DCE, which we can for obvious reason do by our own.
>>
>> The patch does that, and introduces a separate pass for -O0 that's responsible
>> for lowering at the end of tree pass.
>>
>> There's a small fallback that I would like to discuss:
>>
>> 1) vrp105.c - there's no longer catches threading opportunity in between default cases:
>> adding Patrick who can probably help why is the opportunity skipped with expanded tree
> Well, VRP knows how to analyze a switch statement and determine when
> paths out of one switch imply a particular case will be taken in a later
> switch. In this particular case we're trying to verify that the default
> case in the first switch threads directly to the default case in the
> second (though it seems we ought to be able to totally thread the cases).
>
> Obviously we don't have switches anymore after your lowering pass. But
> ISTM we still ought to be able to evaluate how your lowering affects
> jump threading on this case. In theory the lowered switch statements
> should be easier to thread.
>
> Reality is sadly different. There's two cases to consider. One when i
> < 0 (should go to first default, then directly to second default). The
> other when i > 1 which should also go to the first default, then
> directly to the second default).
>
> When i < 0 we do in fact thread through the second switch and go from
> the first default case directly to the second default case.
>
> When i > 1 we still go through the test for the second switch.
>
> These are the key blocks:
>
> ;; basic block 2, loop depth 0, freq 10000, maybe hot
> ;; prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED)
> ;; pred: ENTRY [always] (FALLTHRU,EXECUTABLE)
> i.0_1 = i;
> if (i.0_1 > 0)
> goto <bb 4>; [50.01%] [count: INV]
> else
> goto <bb 3>; [49.99%] [count: INV]
> ;; succ: 3 [50.0% (guessed)] (FALSE_VALUE,EXECUTABLE)
> ;; 4 [50.0% (guessed)] (TRUE_VALUE,EXECUTABLE)
>
> ;; basic block 3, loop depth 0, freq 10000, maybe hot
> ;; Invalid sum of incoming frequencies 4999, should be 10000
> ;; prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
> ;; pred: 2 [50.0% (guessed)] (FALSE_VALUE,EXECUTABLE)
> if (i.0_1 == 0)
> goto <bb 5>; [40.00%] [count: INV]
> else
> goto <bb 14>; [60.00%] [count: INV]
> ;; succ: 14 [60.0% (guessed)] (FALSE_VALUE,EXECUTABLE)
> ;; 5 [40.0% (guessed)] (TRUE_VALUE,EXECUTABLE)
>
> ;; basic block 4, loop depth 0, freq 10000, maybe hot
> ;; Invalid sum of incoming frequencies 5001, should be 10000
> ;; prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
> ;; pred: 2 [50.0% (guessed)] (TRUE_VALUE,EXECUTABLE)
> _13 = i.0_1 != 1;
> if (_13 != 0)
> goto <bb 13>; [16.68%] [count: INV]
> else
> goto <bb 6>; [83.32%] [count: INV]
> ;; succ: 6 [83.3% (guessed)] (FALSE_VALUE,EXECUTABLE)
> ;; 13 [16.7% (guessed)] (TRUE_VALUE,EXECUTABLE)
>
> [ ... ]
>
> ;; basic block 9, loop depth 0, freq 9999, maybe hot
> ;; Invalid sum of incoming frequencies 3999, should be 9999
> ;; prev block 8, next block 10, flags: (NEW, REACHABLE, VISITED)
> ;; pred: 13 [always] (FALLTHRU,EXECUTABLE)
> ;; 7 [always (guessed)] (TRUE_VALUE,EXECUTABLE)
> # prephitmp_19 = PHI <prephitmp_15(13), prephitmp_12(7)>
> _2 = prephitmp_19 != 1;
> if (_2 != 0)
> goto <bb 12>; [16.68%] [count: INV]
> else
> goto <bb 11>; [83.32%] [count: INV]
> ;; succ: 11 [83.3% (guessed)] (FALSE_VALUE,EXECUTABLE)
> ;; 12 [16.7% (guessed)] (TRUE_VALUE,EXECUTABLE)
>
> [ ... ]
> ;; basic block 13, loop depth 0, freq 1668, maybe hot
> ;; prev block 12, next block 14, flags: (NEW, REACHABLE, VISITED)
> ;; pred: 4 [16.7% (guessed)] (TRUE_VALUE,EXECUTABLE)
> # prephitmp_15 = PHI <i.0_1(4)>
> goto <bb 9>; [100.00%] [count: INV]
> ;; succ: 9 [always] (FALLTHRU,EXECUTABLE)
>
>
>
> WHen bb9 is reached by bb13 we know by back substitution that the assignemnt
>
> _2 = prehitmp_19 != 1
> _2 = prehitmp_15 != 1
> _2 = i.0_1 != 1
>
> And we should know enough about the range of i.0_1 to determine that is
> true which would allow us to thread the jump. But a few things get in
> the way.
>
> First, the VRP thread doesn't try hard at all to simplify gimple
> assignments. It really just tries to simplify gimple_cond and
> gimple_switch. This could be trivially improved.
>
> So if I do the right things by hand we end up trying to evaluate i.0_1
> != 1. So that's good. But we don't get a useful result back from
> vrp_evaluate_conditional. WHy? Because the recorded range for i.0_1 is
> [1,INF] -- that's the global range for i.0_1 not the range on the path.
>
> Now it turns out this is precisely the problem that I've got code to
> address in my queue which fixes a regression I was working on earlier
> this year in the gcc-7 cycle. It's backed up behind some improvements
> to tree-ssa-uninit.c that Richi and I still need to hash through.
>
> This would likely also be fixed by Andrew's work on ranges. It's
> precisely the kind of query its designed to answer.
>
> In summary, your switch lowering does regress the code we get for this
> test. But the regression is more a failing of the jump threader than
> anything.
>
>>
>> 2) uninit-18.c where we currently report:
>>
>> /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/uninit-18.c:13:12: warning: âtmpâ may be used uninitialized in this function [-Wmaybe-uninitialized]
>> tmp[5] = 7; /* { dg-bogus "may be used uninitialized" } */
>>
>> Am I right that the pass uses VRP?
> No, it doesn't use VRP. If you look at the history of uninit-18 it shows:
> commit cac06b024306772e2be2d357064f7ca2aa82bde8
> Author: rguenth <rguenth@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date: Fri Jan 16 13:26:10 2015 +0000
>
> 2015-01-16 Richard Biener <rguenther@suse.de>
>
> PR middle-end/64614
> * tree-ssa-uninit.c: Include tree-cfg.h.
> (MAX_SWITCH_CASES): New define.
> (convert_control_dep_chain_into_preds): Handle switch
> statements.
> (is_pred_expr_subset_of): Handle x == CST vs. (x & CST) != 0.
> (normalize_one_pred_1): Do not split bit-manipulations.
> Record (x & CST).
>
> * gcc.dg/uninit-18.c: New testcase.
>
> Essentially tree-ssa-uninit.c has some specialized code to handle switch
> statements in predicates with BIT_AND_EXPR and uninit-18.c tests that
> capability.
>
> Your new switch lowering breaks that relatively specialized code.
>
> You could further enhance uninit to detect this. From the .uninit dump:
>
> MEM[(char *)tmp_2 + 5B] = 7;
> is guarded by :
>
> bar_5(D) > 0 (.AND.) (.NOT.) bar_5(D) > 1
>
>
> [CHECK]: Found unguarded use: MEM[(char *)tmp_2 + 5B] = 7;
>
>
> So that check is really bar_5 == 1 and in theory if bar_5 == 1, then we
> would have assigned a value to tmp. VRP is capable of making that
> determination and I think when VRP does that, things simplify enough
> that the general predicate code in tree-ssa-uninit.c will DTRT.
>
> I wonder if we could get the tighter test coming out of switch lowering,
> or discover it in DOM, then there's a reasonable change
> tree-ssa-uninit.c would just DTRT.
>
> Looking at the DOM dumps we have:
>
> Optimizing block #5
>
> 1>>> STMT 1 = bar_5(D) le_expr 1
> 1>>> STMT 0 = bar_5(D) gt_expr 1
> Optimizing statement if (bar_5(D) >= 1)
> LKUP STMT bar_5(D) ge_expr 1
>
>
> DOM doesn't usually try to do any kind of range optimization. But this
> is a case where it might make sense.
>
> So the first two lines indicate that we know bar5 <= 1 is true and that
> bar5 > 1 must be false. Those are expressions we can look up in the
> hash table.
>
> So when the lookup bar5 >= 1 fails we could in theory lookup bar5 <= 1
> and if that's true, then we know the test could be simplified to bar5 == 1
>
> I don't know how often this would hit in general and when it hits how
> often the resulting code is actually better. I can try to do some
> instrumentation on that... I wonder if that might help the first case
> as well.
>
>
>>
>> Last question is whether the pass is properly moved in optimization pipeline?
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> Well, that's a *real* interesting question. THe two cases above argue
> that it's too early. However, I could easily make an argument that the
> spot you chose was reasonable -- namely it's after the first set of
> threading/vrp/dom passes, but before the second set of threading/vrp/dom
> passes.
>
> Let me poke a bit with DOM and see if it can reasonably clean up this
> stuff and whether or not the transformation is generally useful.
>
> Jeff
>
Hello.
Thank you Jeff for very verbose explanation what's happening. I'm planning to do
follow-up of this patch that will include clustering for bit-tests and jump tables.
Maybe that will make aforementioned issues even more difficult, but we'll see.
Martin
next prev parent reply other threads:[~2017-09-20 7:24 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-09-14 12:17 Martin Liška
2017-09-15 22:19 ` Jeff Law
2017-09-20 7:24 ` Martin Liška [this message]
2017-09-20 15:00 ` Jeff Law
2017-09-21 23:12 ` Bernhard Reutner-Fischer
2018-01-09 14:46 ` Martin Liška
2018-01-09 18:31 ` Jeff Law
2018-01-10 13:14 ` Richard Biener
2018-01-10 15:04 ` Martin Liška
2018-01-14 23:23 ` Bernhard Reutner-Fischer
2018-05-17 10:39 ` Martin Liška
2018-05-18 13:57 ` Rainer Orth
2018-05-18 14:19 ` Martin Liška
2018-05-21 11:29 ` Rainer Orth
2018-05-21 14:00 ` Martin Liška
2018-05-21 14:10 ` Rainer Orth
2018-05-21 14:54 ` Sudakshina Das
2018-05-24 19:59 ` Martin Liška
2018-05-25 10:01 ` Martin Liška
2018-05-25 10:12 ` Sudakshina Das
2018-05-24 12:52 ` Rainer Orth
2018-05-24 20:10 ` Martin Liška
2017-10-06 12:26 ` [PATCH] Implement smart multiple switch expansion algorithms Martin Liška
2017-10-06 16:20 ` [PATCH] Add selftest for vec::reverse David Malcolm
2017-12-29 14:39 ` Martin Liška
2017-12-29 17:09 ` David Malcolm
2017-10-06 17:24 ` [PATCH] Implement smart multiple switch expansion algorithms David Malcolm
2017-12-29 14:38 ` Martin Liška
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a385e3cb-4a33-55ea-5c3b-1d159a040cf0@suse.cz \
--to=mliska@suse.cz \
--cc=gcc-patches@gcc.gnu.org \
--cc=law@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).