From: Aldy Hernandez <aldyh@redhat.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>, Jeff Law <law@redhat.com>
Subject: Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)
Date: Fri, 13 Jan 2017 18:48:00 -0000 [thread overview]
Message-ID: <0ff3a68e-3e2c-c0df-8c48-b02dac40d621@redhat.com> (raw)
In-Reply-To: <CAFiYyc2vY-gBo_VDGC=VMYWanyhC_+gRYRGOzcV_fwMVUK40LQ@mail.gmail.com>
[Sorry for the delay, I was sick.]
On 01/09/2017 04:30 AM, Richard Biener wrote:
> On Sat, Jan 7, 2017 at 1:54 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>> On 01/04/2017 07:11 AM, Richard Biener wrote:
>>>
>>> On Tue, Jan 3, 2017 at 6:36 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>>>>
>>>> On 12/20/2016 09:16 AM, Richard Biener wrote:
>>>>>
>>>>>
>>>>> On Fri, Dec 16, 2016 at 3:41 PM, Aldy Hernandez <aldyh@redhat.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi folks.
>>>>>>
>>>>>> This is a follow-up on Jeff and Richi's interaction on the
>>>>>> aforementioned
>>>>>> PR
>>>>>> here:
>>>>>>
>>>>>> https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html
>>>>>>
>>>>>> I decided to explore the idea of analyzing may-undefness on-demand,
>>>>>> which
>>>>>> actually looks rather cheap.
>>>>>>
>>>>>> BTW, I don't understand why we don't have auto_bitmap's, as we already
>>>>>> have
>>>>>> auto_sbitmap's. I've implemented the former based on auto_sbitmap's
>>>>>> code
>>>>>> we
>>>>>> already have.
>>>>>>
>>>>>> The attached patch fixes the bug without introducing any regressions.
>>>>>>
>>>>>> I also tested the patch by compiling 242 .ii files with -O3. These
>>>>>> were
>>>>>> gathered from a stage1 build with -save-temps. There is a slight time
>>>>>> degradation of 4 seconds within 27 minutes of user time:
>>>>>>
>>>>>> tainted: 26:52
>>>>>> orig: 26:48
>>>>>>
>>>>>> This was the average aggregate time of two runs compiling all 242 .ii
>>>>>> files.
>>>>>> IMO, this looks reasonable. It is after all, -O3. Is it acceptable?
>>>>>
>>>>>
>>>>>
>>>>> + while (!worklist.is_empty ())
>>>>> + {
>>>>> + name = worklist.pop ();
>>>>> + gcc_assert (TREE_CODE (name) == SSA_NAME);
>>>>> +
>>>>> + if (ssa_undefined_value_p (name, true))
>>>>> + return true;
>>>>> +
>>>>> + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>>>>>
>>>>> it should be already set as we use visited_ssa as "was it ever on the
>>>>> worklist",
>>>>> so maybe renaming it would be a good thing as well.
>>>>
>>>>
>>>>
>>>> I don't understand what you would prefer here.
>>>
>>>
>>> Set the bit when you put name on the worklist (and do not do that if the
>>> bit is set). Thus simply remove the above and add a bitmap_set_bit
>>> for the initial name you put on the worklist.
>>>
>>>>>
>>>>> + if (TREE_CODE (name) == SSA_NAME)
>>>>> + {
>>>>> + /* If an SSA has already been seen, this may be a
>>>>> loop.
>>>>> + Fail conservatively. */
>>>>> + if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION
>>>>> (name)))
>>>>> + return false;
>>>>>
>>>>> so to me "conservative" is returning true, not false.
>>>>
>>>>
>>>>
>>>> OK
>>>>
>>>>>
>>>>> + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>>>>> + worklist.safe_push (name);
>>>>>
>>>>> but for loops we can just continue and ignore this use. And
>>>>> bitmap_set_bit
>>>>> returns whether it set a bit, thus
>>>>>
>>>>> if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION
>>>>> (name)))
>>>>> worklist.safe_push (name);
>>>>>
>>>>> should work?
>>>>
>>>>
>>>>
>>>> Fixed.
>>>>
>>>>>
>>>>> + /* Check that any SSA names used to define NAME is also fully
>>>>> + defined. */
>>>>> + use_operand_p use_p;
>>>>> + ssa_op_iter iter;
>>>>> + FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE)
>>>>> + {
>>>>> + name = USE_FROM_PTR (use_p);
>>>>> + if (TREE_CODE (name) == SSA_NAME)
>>>>>
>>>>> always true.
>>>>>
>>>>> + {
>>>>> + /* If an SSA has already been seen, this may be a loop.
>>>>> + Fail conservatively. */
>>>>> + if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
>>>>> + return false;
>>>>> + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>>>>> + worklist.safe_push (name);
>>>>>
>>>>> See above.
>>>>>
>>>>> In principle the thing is sound but I'd like to be able to pass in a
>>>>> bitmap of
>>>>> known maybe-undefined/must-defined SSA names to have a cache for
>>>>> multiple invocations from within a pass (like this unswitching case).
>>>>
>>>>
>>>>
>>>> Done, though bitmaps are now calculated as part of the instantiation.
>>>>
>>>>>
>>>>> Also once you hit defs that are in a post-dominated region of the loop
>>>>> entry
>>>>> you can treat them as not undefined (as their use invokes undefined
>>>>> behavior). This is also how you treat function parameters (well,
>>>>> ssa_undefined_value_p does), where the call site invokes undefined
>>>>> behavior
>>>>> when passing in undefined values. So we need an extra parameter
>>>>> specifying
>>>>> the post-dominance region.
>>>>
>>>>
>>>>
>>>> Done.
>>>>
>>>>>
>>>>> You do not handle memory or calls conservatively which means the
>>>>> existing
>>>>> testcase only needs some obfuscation to become a problem again. To fix
>>>>> that before /* Check that any SSA names used to define NAME is also
>>>>> fully
>>>>> defined. */ bail out conservatively, like
>>>>>
>>>>> if (! is_gimple_assign (def)
>>>>> || gimple_assign_single_p (def))
>>>>> return true;
>>>>
>>>>
>>>>
>>>> As I asked previously, I understand the !is_gimple_assign, which will
>>>> skip
>>>> over GIMPLE_CALLs setting a value, but the "gimple_assign_single_p(def)
>>>> ==
>>>> true"??
>>>>
>>>> Won't this last one bail on things like e.3_7 = e, or x_77 = y_88? Don't
>>>> we
>>>> want to follow up the def chain precisely on these?
>>>>
>>>> The attached implementation uses a cache, and a pre-computed
>>>> post-dominance
>>>> region. A previous incantation of this patch (sans the post-dominance
>>>> stuff) had performance within the noise of the previous implementation.
>>>>
>>>> I am testing again, and will do some performance comparisons in a bit,
>>>> but
>>>> for now-- are we on the same page here? Is this what you wanted?
>>>
>>>
>>> + /* DEFs in the post-dominated region of the loop entry invoke
>>> + undefined behavior. Adding any use won't make things any
>>> + worse. */
>>> + for (unsigned i = 0; i < postdom.length (); ++i)
>>> + if (def->bb == postdom[i])
>>>
>>> gimple_bb (def)
>>>
>>> + {
>>> + set_defined (t);
>>> + return false;
>>> + }
>>>
>>> I think you can't really return false here but need to continue processing
>>> the worklist. I'd say rather than a linear walk you can as well use
>>> dominated_by_p (CDI_POST_DOMINATORS, ...) and record the entry
>>> block instead?
>>>
>>> Note that the way you compute post-dominators doesn't work -- nothing
>>> keeps them up-to-date when unswitching does CFG manipulations :/
>>> Fortunately we have a way to compute them per region, see how
>>> tree-if-conv.c
>>> does that (build_region, calculate_dominance_info_for_region).
>>
>>
>> If I understand correctly, we could compute them per region in
>> tree_unswitch_single_loop() for each individual loop with what
>> tree-if-conv.c uses.
>>
>> I could certainly abstract build_region/calculate_dominance_info_for_region
>> into something new, say calculate_dominance_info_for_loop_region(). Then we
>> could use dominated_by_p(CDI_POST_DOMINATORS, ...) in my class as you
>> suggest.
>>
>> Question... computing the post-dom tree per region as I've just described
>> will only create post-dom information for the loop region, which means that
>> any definition outside of the loop will have zero dominance info.
>
> Yes.
>
>> What if the definition in question post-dominates the loop entry but is
>> outside/after of the loop?
>
> How would we ever arrive at such def? Once we reach a def before the
> region/loop
> we know it's fine to use (the missing "stop" point I pointed out).
Hmmm, it looks like we can't even build the post-dom region
appropriately in the presence of infinite loops.
First, build_region() fails if it can't find a loop post-header:
/* The last element is loop post-header. */
gcc_assert (exit_bb);
which we won't have in an infinite loop like:
bb2: //loop header
|
V
loop 1:
bb3: <-------+
bar(); |
| |
V |
bb4: |
goto bb3; >---+
We could just build the region without the post-header, but then we fail
while building the DFS dominance region:
dom_info::calc_dfs_tree_nonrec (basic_block bb)
....
gcc_assert (bn != m_start_block);
Presumably because we've looped back to the start block (bb4 in a post
dominator world). This doesn't happen calculating going forward because
we always have a start block outside of the region (the function entry
block).
I tried to fake edge my way out of it, but things get really messed up
while putting/removing fake edges in the presence of loop info. I'm
assuming all this is by design.
Would you prefer me to continue down this path, trying to build a
post-dom region with infinite loops and fixing build_region /
calc_dfs_tree_nonrec, or could we do without this dominance
optimization? Pretty please?
>
>> We will have no post-dom information for this.
>> In this case, could we just ignore and continue evaluating the SSA_NAME with
>> the rest of our heuristics, or did you have something else in mind? Another
>> option would be to calculate the post-dom information for the entire
>> function on every loop (calculate_dominance_info(CDI_POST_DOMINATORS)), but
>> that seems rather expensive.
>>
>> As a side note, at this point I just want to fix/close the PR in an
>> acceptable manner, not come up with the end-all catch-all most-efficient
>> patch for an unlikely scenario ;-).
>
> Yeah, which is why I wonder whether the caching is worth the trouble (in it's
> current form) for the unswitching use (given it's other restrictions
> on conditions
> to unswitch).
We could go back to my original, no caching version (with the other
suggestions). That solves the problem quite simply, with no regressions.
Thanks again.
Aldy
next prev parent reply other threads:[~2017-01-13 18:48 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-12-16 15:20 Aldy Hernandez
2016-12-19 23:30 ` Jeff Law
2016-12-20 14:39 ` Richard Biener
2016-12-20 16:01 ` Jeff Law
2016-12-20 17:40 ` Richard Biener
2016-12-21 18:00 ` Jeff Law
2016-12-21 18:52 ` Aldy Hernandez
2017-01-04 11:43 ` Richard Biener
2017-01-03 17:37 ` Aldy Hernandez
2017-01-04 12:12 ` Richard Biener
2017-01-07 12:54 ` Aldy Hernandez
2017-01-09 9:30 ` Richard Biener
2017-01-13 18:48 ` Aldy Hernandez [this message]
2017-01-18 15:17 ` Richard Biener
2017-01-23 17:28 ` Aldy Hernandez
2017-01-24 12:25 ` Richard Biener
2017-01-26 12:04 ` Aldy Hernandez
2017-01-26 12:48 ` Richard Biener
2017-01-27 11:32 ` Aldy Hernandez
2017-01-30 15:05 ` Richard Biener
2017-01-30 17:24 ` Aldy Hernandez
2017-01-31 8:11 ` Richard Biener
2017-01-05 4:13 ` Trevor Saunders
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0ff3a68e-3e2c-c0df-8c48-b02dac40d621@redhat.com \
--to=aldyh@redhat.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=law@redhat.com \
--cc=richard.guenther@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).