Re: [PATCH][RFC] Radically simplify emission of balanced tree for switch statements.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: "Martin Liška" <mliska@suse.cz>
To: Jeff Law <law@redhat.com>, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH][RFC] Radically simplify emission of balanced tree for switch statements.
Date: Wed, 20 Sep 2017 07:24:00 -0000	[thread overview]
Message-ID: <a385e3cb-4a33-55ea-5c3b-1d159a040cf0@suse.cz> (raw)
In-Reply-To: <c0fc51df-4794-be1e-e139-578464623e57@redhat.com>

On 09/16/2017 12:19 AM, Jeff Law wrote:
> On 09/14/2017 06:17 AM, Martin LiÅ¡ka wrote:
>> Hello.
>>
>> As mentioned at Cauldron 2017, second step in switch lowering should be massive
>> simplification in code that does expansion of balanced tree. Basically it includes
>> VRP and DCE, which we can for obvious reason do by our own.
>>
>> The patch does that, and introduces a separate pass for -O0 that's responsible
>> for lowering at the end of tree pass.
>>
>> There's a small fallback that I would like to discuss:
>>
>> 1) vrp105.c - there's no longer catches threading opportunity in between default cases:
>> adding Patrick who can probably help why is the opportunity skipped with expanded tree
> Well, VRP knows how to analyze a switch statement and determine when
> paths out of one switch imply a particular case will be taken in a later
> switch.  In this particular case we're trying to verify that the default
> case in the first switch threads directly to the default case in the
> second (though it seems we ought to be able to totally thread the cases).
> 
> Obviously we don't have switches anymore after your lowering pass.  But
> ISTM we still ought to be able to evaluate how your lowering affects
> jump threading on this case.  In theory the lowered switch statements
> should be easier to thread.
> 
> Reality is sadly different.  There's two cases to consider.  One when i
> < 0 (should go to first default, then directly to second default).  The
> other when i > 1 which should also go to the first default, then
> directly to the second default).
> 
> When i < 0 we do in fact thread through the second switch and go from
> the first default case directly to the second default case.
> 
> When i > 1 we still go through the test for the second switch.
> 
> These are the key blocks:
> 
> ;;   basic block 2, loop depth 0, freq 10000, maybe hot
> ;;    prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED)
> ;;    pred:       ENTRY [always]  (FALLTHRU,EXECUTABLE)
>   i.0_1 = i;
>   if (i.0_1 > 0)
>     goto <bb 4>; [50.01%] [count: INV]
>   else
>     goto <bb 3>; [49.99%] [count: INV]
> ;;    succ:       3 [50.0% (guessed)]  (FALSE_VALUE,EXECUTABLE)
> ;;                4 [50.0% (guessed)]  (TRUE_VALUE,EXECUTABLE)
> 
> ;;   basic block 3, loop depth 0, freq 10000, maybe hot
> ;;   Invalid sum of incoming frequencies 4999, should be 10000
> ;;    prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
> ;;    pred:       2 [50.0% (guessed)]  (FALSE_VALUE,EXECUTABLE)
>   if (i.0_1 == 0)
>     goto <bb 5>; [40.00%] [count: INV]
>   else
>     goto <bb 14>; [60.00%] [count: INV]
> ;;    succ:       14 [60.0% (guessed)]  (FALSE_VALUE,EXECUTABLE)
> ;;                5 [40.0% (guessed)]  (TRUE_VALUE,EXECUTABLE)
> 
> ;;   basic block 4, loop depth 0, freq 10000, maybe hot
> ;;   Invalid sum of incoming frequencies 5001, should be 10000
> ;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
> ;;    pred:       2 [50.0% (guessed)]  (TRUE_VALUE,EXECUTABLE)
>   _13 = i.0_1 != 1;
>   if (_13 != 0)
>     goto <bb 13>; [16.68%] [count: INV]
>   else
>     goto <bb 6>; [83.32%] [count: INV]
> ;;    succ:       6 [83.3% (guessed)]  (FALSE_VALUE,EXECUTABLE)
> ;;                13 [16.7% (guessed)]  (TRUE_VALUE,EXECUTABLE)
> 
> [ ... ]
> 
> ;;   basic block 9, loop depth 0, freq 9999, maybe hot
> ;;   Invalid sum of incoming frequencies 3999, should be 9999
> ;;    prev block 8, next block 10, flags: (NEW, REACHABLE, VISITED)
> ;;    pred:       13 [always]  (FALLTHRU,EXECUTABLE)
> ;;                7 [always (guessed)]  (TRUE_VALUE,EXECUTABLE)
>   # prephitmp_19 = PHI <prephitmp_15(13), prephitmp_12(7)>
>   _2 = prephitmp_19 != 1;
>   if (_2 != 0)
>     goto <bb 12>; [16.68%] [count: INV]
>   else
>     goto <bb 11>; [83.32%] [count: INV]
> ;;    succ:       11 [83.3% (guessed)]  (FALSE_VALUE,EXECUTABLE)
> ;;                12 [16.7% (guessed)]  (TRUE_VALUE,EXECUTABLE)
> 
> [ ... ]
> ;;   basic block 13, loop depth 0, freq 1668, maybe hot
> ;;    prev block 12, next block 14, flags: (NEW, REACHABLE, VISITED)
> ;;    pred:       4 [16.7% (guessed)]  (TRUE_VALUE,EXECUTABLE)
>   # prephitmp_15 = PHI <i.0_1(4)>
>   goto <bb 9>; [100.00%] [count: INV]
> ;;    succ:       9 [always]  (FALLTHRU,EXECUTABLE)
> 
> 
> 
> WHen bb9 is reached by bb13 we know by back substitution that the assignemnt
> 
>  _2 = prehitmp_19 != 1
>  _2 = prehitmp_15 != 1
>  _2 = i.0_1 != 1
> 
> And we should know enough about the range of i.0_1 to determine that is
> true which would allow us to thread the jump.  But a few things get in
> the way.
> 
> First, the VRP thread doesn't try hard at all to simplify gimple
> assignments.  It really just tries to simplify gimple_cond and
> gimple_switch.  This could be trivially improved.
> 
> So if I do the right things by hand we end up trying to evaluate i.0_1
> != 1.  So that's good.  But we don't get a useful result back from
> vrp_evaluate_conditional.  WHy?  Because the recorded range for i.0_1 is
> [1,INF] -- that's the global range for i.0_1 not the range on the path.
> 
> Now it turns out this is precisely the problem that I've got code to
> address in my queue which fixes a regression I was working on earlier
> this year in the gcc-7 cycle.  It's backed up behind some improvements
> to tree-ssa-uninit.c that Richi and I still need to hash through.
> 
> This would likely also be fixed by Andrew's work on ranges.  It's
> precisely the kind of query its designed to answer.
> 
> In summary, your switch lowering does regress the code we get for this
> test.  But the regression is more a failing of the jump threader than
> anything.
> 
>>
>> 2) uninit-18.c where we currently report:
>>
>> /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/uninit-18.c:13:12: warning: â€˜tmpâ€™ may be used uninitialized in this function [-Wmaybe-uninitialized]
>>      tmp[5] = 7;    /* { dg-bogus "may be used uninitialized" } */
>>
>> Am I right that the pass uses VRP?
> No, it doesn't use VRP.   If you look at the history of uninit-18 it shows:
> commit cac06b024306772e2be2d357064f7ca2aa82bde8
> Author: rguenth <rguenth@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Fri Jan 16 13:26:10 2015 +0000
> 
>     2015-01-16  Richard Biener  <rguenther@suse.de>
> 
>             PR middle-end/64614
>             * tree-ssa-uninit.c: Include tree-cfg.h.
>             (MAX_SWITCH_CASES): New define.
>             (convert_control_dep_chain_into_preds): Handle switch
> statements.
>             (is_pred_expr_subset_of): Handle x == CST vs. (x & CST) != 0.
>             (normalize_one_pred_1): Do not split bit-manipulations.
>             Record (x & CST).
> 
>             * gcc.dg/uninit-18.c: New testcase.
> 
> Essentially tree-ssa-uninit.c has some specialized code to handle switch
> statements in predicates with BIT_AND_EXPR and uninit-18.c tests that
> capability.
> 
> Your new switch lowering breaks that relatively specialized code.
> 
> You could further enhance uninit to detect this.   From the .uninit dump:
> 
> MEM[(char *)tmp_2 + 5B] = 7;
> is guarded by :
> 
> bar_5(D) > 0 (.AND.)  (.NOT.) bar_5(D) > 1
> 
> 
> [CHECK]: Found unguarded use: MEM[(char *)tmp_2 + 5B] = 7;
> 
> 
> So that check is really bar_5 == 1 and in theory if bar_5 == 1, then we
> would have assigned a value to tmp.   VRP is capable of making that
> determination and I think when VRP does that, things simplify enough
> that the general predicate code in tree-ssa-uninit.c will DTRT.
> 
> I wonder if we could get the tighter test coming out of switch lowering,
> or discover it in DOM, then there's a reasonable change
> tree-ssa-uninit.c would just DTRT.
> 
> Looking at the DOM dumps we have:
> 
> Optimizing block #5
> 
> 1>>> STMT 1 = bar_5(D) le_expr 1
> 1>>> STMT 0 = bar_5(D) gt_expr 1
> Optimizing statement if (bar_5(D) >= 1)
> LKUP STMT bar_5(D) ge_expr 1
> 
> 
> DOM doesn't usually try to do any kind of range optimization.  But this
> is a case where it might make sense.
> 
> So the first two lines indicate that we know bar5 <= 1 is true and that
> bar5 > 1 must be false.  Those are expressions we can look up in the
> hash table.
> 
> So when the lookup bar5 >= 1 fails we could in theory lookup bar5 <= 1
> and if that's true, then we know the test could be simplified to bar5 == 1
> 
> I don't know how often this would hit in general and when it hits how
> often the resulting code is actually better.   I can try to do some
> instrumentation on that...  I wonder if that might help the first case
> as well.
> 
> 
>>
>> Last question is whether the pass is properly moved in optimization pipeline?
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> Well, that's a *real* interesting question.  THe two cases above argue
> that it's too early.  However, I could easily make an argument that the
> spot you chose was reasonable -- namely it's after the first set of
> threading/vrp/dom passes, but before the second set of threading/vrp/dom
> passes.
> 
> Let me poke a bit with DOM and see if it can reasonably clean up this
> stuff and whether or not the transformation is generally useful.
> 
> Jeff
> 

Hello.

Thank you Jeff for very verbose explanation what's happening. I'm planning to do
follow-up of this patch that will include clustering for bit-tests and jump tables.
Maybe that will make aforementioned issues even more difficult, but we'll see.

Martin

next prev parent reply	other threads:[~2017-09-20  7:24 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-14 12:17 Martin Liška
2017-09-15 22:19 ` Jeff Law
2017-09-20  7:24   ` Martin Liška [this message]
2017-09-20 15:00     ` Jeff Law
2017-09-21 23:12       ` Bernhard Reutner-Fischer
2018-01-09 14:46       ` Martin Liška
2018-01-09 18:31         ` Jeff Law
2018-01-10 13:14           ` Richard Biener
2018-01-10 15:04             ` Martin Liška
2018-01-14 23:23               ` Bernhard Reutner-Fischer
2018-05-17 10:39                 ` Martin Liška
2018-05-18 13:57                   ` Rainer Orth
2018-05-18 14:19                     ` Martin Liška
2018-05-21 11:29                       ` Rainer Orth
2018-05-21 14:00                         ` Martin Liška
2018-05-21 14:10                           ` Rainer Orth
2018-05-21 14:54                             ` Sudakshina Das
2018-05-24 19:59                               ` Martin Liška
2018-05-25 10:01                               ` Martin Liška
2018-05-25 10:12                                 ` Sudakshina Das
2018-05-24 12:52                           ` Rainer Orth
2018-05-24 20:10                             ` Martin Liška
2017-10-06 12:26   ` [PATCH] Implement smart multiple switch expansion algorithms Martin Liška
2017-10-06 16:20     ` [PATCH] Add selftest for vec::reverse David Malcolm
2017-12-29 14:39       ` Martin Liška
2017-12-29 17:09         ` David Malcolm
2017-10-06 17:24     ` [PATCH] Implement smart multiple switch expansion algorithms David Malcolm
2017-12-29 14:38       ` Martin Liška

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a385e3cb-4a33-55ea-5c3b-1d159a040cf0@suse.cz \
    --to=mliska@suse.cz \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=law@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).