From: Aldy Hernandez <aldyh@redhat.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Michael Matz <matz@suse.de>, Jeff Law <jeffreyalaw@gmail.com>,
GCC Mailing List <gcc@gcc.gnu.org>,
Andrew MacLeod <amacleod@redhat.com>
Subject: Re: More aggressive threading causing loop-interchange-9.c regression
Date: Thu, 9 Sep 2021 11:21:13 +0200
Message-ID: <8c49db8d-3119-0dc2-2bbb-4062c8d5d53b@redhat.com>
In-Reply-To: <CAFiYyc1=Bj3yBhvKJtYFHWqHU7WgW-4RWwZ6CXSwQiP-QuYGXw@mail.gmail.com>
On 9/9/21 10:58 AM, Richard Biener wrote:
> On Thu, Sep 9, 2021 at 10:36 AM Aldy Hernandez <aldyh@redhat.com> wrote:
>>
>>
>>
>> On 9/9/21 9:45 AM, Richard Biener wrote:
>>> On Thu, Sep 9, 2021 at 9:37 AM Aldy Hernandez <aldyh@redhat.com> wrote:
>>>>
>>>>
>>>>
>>>> On 9/9/21 8:57 AM, Richard Biener wrote:
>>>>> On Wed, Sep 8, 2021 at 8:13 PM Michael Matz <matz@suse.de> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> [lame answer to self]
>>>>>>
>>>>>> On Wed, 8 Sep 2021, Michael Matz wrote:
>>>>>>
>>>>>>>>> The forward threader guards against this by simply disallowing
>>>>>>>>> threadings that involve different loops. As I see
>>>>>>>>
>>>>>>>> The thread in question (5->9->3) is all within the same outer loop,
>>>>>>>> though. BTW, the backward threader also disallows threading across
>>>>>>>> different loops (see path_crosses_loops variable).
>>>>>> ...
>>>>>>> Maybe it's possible to not disable threading over latches altogether in
>>>>>>> the backward threader (like it's tried now), but I haven't looked at the
>>>>>>> specific situation here in depth, so take my view only as opinion from a
>>>>>>> large distance :-)
>>>>>>
>>>>>> I've now looked at the concrete situation. So yeah, the whole path is in
>>>>>> the same loop, crosses the latch, _and there's code following the latch
>>>>>> on that path_. (I.e. the latch isn't the last block in the path). In
>>>>>> particular, after loop_optimizer_init() (before any threading) we have:
>>>>>>
>>>>>>   <bb 3> [local count: 118111600]:
>>>>>>   # j_19 = PHI <j_13(9), 0(7)>
>>>>>>   sum_11 = c[j_19];
>>>>>>   if (n_10(D) > 0)
>>>>>>     goto <bb 8>; [89.00%]
>>>>>>   else
>>>>>>     goto <bb 5>; [11.00%]
>>>>>>
>>>>>>   <bb 8> [local count: 105119324]:
>>>>>>   ...
>>>>>>
>>>>>>   <bb 5> [local count: 118111600]:
>>>>>>   # sum_21 = PHI <sum_14(4), sum_11(3)>
>>>>>>   c[j_19] = sum_21;
>>>>>>   j_13 = j_19 + 1;
>>>>>>   if (n_10(D) > j_13)
>>>>>>     goto <bb 9>; [89.00%]
>>>>>>   else
>>>>>>     goto <bb 6>; [11.00%]
>>>>>>
>>>>>>   <bb 9> [local count: 105119324]:
>>>>>>   goto <bb 3>; [100.00%]
>>>>>>
>>>>>> With bb9 the outer (empty) latch, bb3 the outer header, and bb8 the
>>>>>> pre-header of inner loop, but more importantly something that's not at the
>>>>>> start of the outer loop.
>>>>>>
>>>>>> Now, any thread that includes the backedge 9->3 _including_ its
>>>>>> destination (i.e. where the backedge isn't the last to-be-redirected edge)
>>>>>> necessarily duplicates all code from that destination onto the back edge.
>>>>>> Here it's the load from c[j] into sum_11.
>>>>>>
>>>>>> The important part is the code is emitted onto the back edge,
>>>>>> conceptually; in reality it's simply included into the (new) latch block
>>>>>> (the duplicate of bb9, which is bb12 intermediately, then named bb7 after
>>>>>> cfg_cleanup).
>>>>>>
>>>>>> That's what we can't have for some of our structural loop optimizers:
>>>>>> there must be no code executed after the exit test (e.g. in the latch
>>>>>> block). (This requirement makes reasoning about which code is or isn't
>>>>>> executed completely for an iteration trivial; simply everything in the
>>>>>> body is always executed; e.g. loop interchange uses this to check that
>>>>>> there are no memory references after the exit test, because those would
>>>>>> then be only conditional and hence make loop interchange very awkward).
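For illustration, here is a hypothetical source-level analogue of the loop above and of the threaded result. The inner-loop body, the array shapes and all names are assumptions; this is not the actual loop-interchange-9.c testcase. `original` mirrors the quoted GIMPLE. `threaded` is roughly what threading 5->9->3 into the inner-loop pre-header produces: the load of c[j] now sits on the back edge, after the exit test `n > j`. The `if (n <= 0) return;` guard is an assumed stand-in for the rotated outer loop's entry test, which the dump does not show. Both versions compute the same values; only the placement of the load changes.

```c
#include <string.h>

#define N 4

/* Hypothetical loop nest with the shape of the quoted GIMPLE
   (not the actual loop-interchange-9.c testcase).  */
static void
original (int n, int a[N][N], const int b[N], int c[N])
{
  for (int j = 0; j < n; j++)
    {
      int sum = c[j];               /* bb3: load at the top of the body */
      for (int i = 0; i < n; i++)   /* bb8: inner loop, entered if n > 0 */
        sum += a[i][j] * b[i];
      c[j] = sum;                   /* bb5: store, increment, exit test */
    }
}

/* Roughly the same loop after threading 5->9->3 into the inner loop:
   the copy of bb3 now lives on the back edge (the latch), so the load
   of c[j] executes after the exit test "n > j".  */
static void
threaded (int n, int a[N][N], const int b[N], int c[N])
{
  int j = 0;
  int sum;

  if (n <= 0)                       /* assumed outer-loop entry guard */
    return;
  sum = c[0];
 body:
  for (int i = 0; i < n; i++)
    sum += a[i][j] * b[i];
  c[j] = sum;
  j++;
  if (n > j)                        /* exit test ...                      */
    {
      sum = c[j];                   /* ... followed by the duplicated load */
      goto body;
    }
}

/* Return 1 if both versions leave the same values in c[] for this n
   (n must not exceed N).  */
static int
same_result (int n)
{
  int a[N][N], b[N], c1[N], c2[N];

  for (int i = 0; i < N; i++)
    {
      b[i] = i + 1;
      c1[i] = c2[i] = 10 * i;
      for (int k = 0; k < N; k++)
        a[i][k] = i * N + k;
    }
  original (n, a, b, c1);
  threaded (n, a, b, c2);
  return memcmp (c1, c2, sizeof c1) == 0;
}
```

The `goto` form is used only to make the new latch contents visible at source level. A compiler never produces it from the source; it is the CFG shape the threader leaves behind.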
>>>>>>
>>>>>> Note that this situation can't be later rectified anymore: the duplicated
>>>>>> instructions (because they are memory refs) must remain after the exit
>>>>>> test. Only by rerolling/unrotating the loop (i.e. noticing that the
>>>>>> memory refs on the loop-entry path and on the back edge are equivalent)
>>>>>> would that be possible, but that's something we aren't capable of. Even
>>>>>> if we were that would simply just revert the whole work that the threader
>>>>>> did, so it's better to not even do that to start with.
>>>>>>
>>>>>> I believe something like below would be appropriate, it disables threading
>>>>>> if the path contains a latch at the non-last position (due to being
>>>>>> backwards on the non-first position in the array). I.e. it disables
>>>>>> rotating the loop if there's danger of polluting the back edge. It might
>>>>>> be improved if the blocks following (preceding!) the latch are themselves
>>>>>> empty because then no code is duplicated. It might also be improved if
>>>>>> the latch is already non-empty. That code should probably only be active
>>>>>> before the loop optimizers, but currently the backward threader isn't
>>>>>> differentiating between before/after loop-optims.
>>>>>>
>>>>>> I haven't tested this patch at all, except that it fixes the testcase :)
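The patch itself is not reproduced in this quote. The following is a minimal sketch of the check as described, assuming a hypothetical `struct block` rather than GCC's basic_block. It also assumes the backward threader's reversed path order, in which path[0] is the last block of the thread. `latch_in_middle_p` is the plain check. `latch_in_middle_with_code_p` adds the first suggested refinement: it tolerates a mid-path latch when every block after it in thread order is empty, so nothing would be copied onto the back edge.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified block descriptor (not GCC's basic_block).  */
struct block
{
  int index;        /* block number, e.g. 3 for bb3 */
  bool is_latch;    /* true if this block is a loop latch */
  int n_stmts;      /* statements that would be duplicated */
};

/* PATH is stored backwards: path[0] is the last block of the thread and
   path[len - 1] the first.  A latch at any non-first array index is a
   latch at a non-last position in the thread.  */
static bool
latch_in_middle_p (struct block *const *path, size_t len)
{
  for (size_t k = 1; k < len; k++)
    if (path[k]->is_latch)
      return true;
  return false;
}

/* Refinement: a mid-path latch only matters if some block after it in
   thread order (before it in the array) has code to duplicate.  */
static bool
latch_in_middle_with_code_p (struct block *const *path, size_t len)
{
  for (size_t k = 1; k < len; k++)
    if (path[k]->is_latch)
      for (size_t m = 0; m < k; m++)
        if (path[m]->n_stmts > 0)
          return true;
  return false;
}
```

For the 5->9->3 thread, stored as {bb3, bb9, bb5}, both functions reject the path because bb3 carries the load. If bb3 were empty, only the plain check would reject it.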
>>>>>
>>>>> Lame comment at the current end of the thread - it's not threading through the
>>>>
>>>> I don't know why y'all keep using the word "lame". On the contrary,
>>>> these are incredibly useful explanations. Thanks.
>>>>
>>>>> latch but threading through the loop header that's problematic, at least if the
>>>>> end of the threading path ends within the loop (threading through the header
>>>>> to the loop exit is fine). Because in that situation you effectively created an
>>>>> alternate loop entry. Threading through the latch into the loop header is
>>>>> fine but with simple latches that likely will never happen (if there are no
>>>>> simple latches then the latch can contain the loop exit test).
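The following is a hypothetical model of the rule stated above. It uses simplified loop and block structs that are assumptions rather than GCC's data structures. The rule rejects a thread that passes through a loop header and still ends inside that loop. It allows threads that leave through an exit or stop at the header. The path is listed in thread order, and its last element is the destination of the final edge.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified loop tree and blocks (not GCC's data structures).  */
struct loop
{
  int header;                /* index of the header block, -1 for the root */
  const struct loop *outer;  /* enclosing loop, NULL for the root */
};

struct bb
{
  int index;
  const struct loop *loop;   /* innermost loop containing the block */
};

/* True if block B lies inside loop L, directly or in a nested loop.  */
static bool
bb_in_loop_p (const struct bb *b, const struct loop *l)
{
  for (const struct loop *x = b->loop; x != NULL; x = x->outer)
    if (x == l)
      return true;
  return false;
}

/* PATH[0] -> ... -> PATH[LEN - 1] in thread order.  PATH[1] .. PATH[LEN - 2]
   are threaded through; PATH[LEN - 1] is where the thread ends.  */
static bool
threads_through_header_into_loop_p (const struct bb *const *path, size_t len)
{
  if (len < 3)
    return false;              /* no intermediate blocks */

  const struct bb *dest = path[len - 1];
  for (size_t k = 1; k + 1 < len; k++)
    {
      const struct loop *l = path[k]->loop;
      if (path[k]->index == l->header && bb_in_loop_p (dest, l))
        return true;           /* would form an alternate entry into L */
    }
  return false;
}
```

With the blocks from the dump, 5->9->3->8 is rejected. Two paths pass: 9->3->5->6, which goes out through the exit, and 5->9->3, which ends at the header.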
>>>>>
>>>>> See tree-ssa-threadupdate.c:thread_block_1
>>>>>
>>>>>   e2 = path->last ()->e;
>>>>>   if (!e2 || noloop_only)
>>>>>     {
>>>>>       /* If NOLOOP_ONLY is true, we only allow threading through the
>>>>>          header of a loop to exit edges. */
>>>>>
>>>>>       /* One case occurs when there was loop header buried in a jump
>>>>>          threading path that crosses loop boundaries. We do not try
>>>>>          and thread this elsewhere, so just cancel the jump threading
>>>>>          request by clearing the AUX field now. */
>>>>>       if (bb->loop_father != e2->src->loop_father
>>>>>           && (!loop_exit_edge_p (e2->src->loop_father, e2)
>>>>>               || flow_loop_nested_p (bb->loop_father,
>>>>>                                      e2->dest->loop_father)))
>>>>>         {
>>>>>           /* Since this case is not handled by our special code
>>>>>              to thread through a loop header, we must explicitly
>>>>>              cancel the threading request here. */
>>>>>           delete_jump_thread_path (path);
>>>>>           e->aux = NULL;
>>>>>           continue;
>>>>>         }
>>>>
>>>> But this is for a threading path that crosses loop boundaries, which is
>>>> not the case. Perhaps we should restrict this further to threads within
>>>> a loop?
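The following is a self-contained model of the quoted condition that ignores the enclosing `if (!e2 || noloop_only)`. The structs and the `*_model` helpers are simplified stand-ins written for illustration. The helpers follow the meaning of GCC's `loop_exit_edge_p` (E leaves LOOP) and `flow_loop_nested_p` (LOOP is strictly nested in OUTER), but this is not GCC code. Consider a path such as 5->9->3->8. The block it starts threading through (bb9) and the source of its final edge (bb3) both have the outer loop as `loop_father`. The first conjunct is therefore already false, and the request is never cancelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for GCC's loop and CFG structures.  */
struct loop
{
  const struct loop *outer;   /* enclosing loop, NULL for the root */
};

struct bb
{
  int index;
  const struct loop *loop_father;
};

struct edge
{
  const struct bb *src, *dest;
};

/* True if LOOP is strictly nested inside OUTER.  */
static bool
flow_loop_nested_p_model (const struct loop *outer, const struct loop *loop)
{
  for (const struct loop *x = loop->outer; x != NULL; x = x->outer)
    if (x == outer)
      return true;
  return false;
}

/* True if B is inside LOOP, directly or in a subloop.  */
static bool
bb_inside_loop_p_model (const struct loop *loop, const struct bb *b)
{
  return b->loop_father == loop
         || flow_loop_nested_p_model (loop, b->loop_father);
}

/* True if E leaves LOOP.  */
static bool
loop_exit_edge_p_model (const struct loop *loop, const struct edge *e)
{
  return bb_inside_loop_p_model (loop, e->src)
         && !bb_inside_loop_p_model (loop, e->dest);
}

/* The quoted condition: BB is the destination of the path's first edge,
   E2 is the path's last edge.  True means the request is cancelled.  */
static bool
cancels_thread_p (const struct bb *bb, const struct edge *e2)
{
  return (bb->loop_father != e2->src->loop_father
          && (!loop_exit_edge_p_model (e2->src->loop_father, e2)
              || flow_loop_nested_p_model (bb->loop_father,
                                           e2->dest->loop_father)));
}
```

Only paths whose first threaded block sits in a different loop from the final edge's source are affected. Among those, a path is cancelled when its final edge stays inside that loop or leaves into a loop nested within the first block's loop.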
>>>>
>>>>>
>>>>> there are a lot of "useful" checks in this function and the backwards threader
>>>>> should adopt those. Note the backwards threader originally only did
>>>>> FSM style threadings which are exactly those possibly "harmful" ones, forming
>>>>> irreducible regions at worst or sub-loops at best. That might explain the
>>>>> lack of those checks.
>>>>
>>>> Also, the aforementioned checks are in jump_thread_path_registry, which
>>>> is also shared by the backward threader. These are thread discards
>>>> _after_ a thread has been registered.
>>>
>>> Yeah, that's indeed unfortunate.
>>>
>>>> The backward threader should also
>>>> be using these restrictions. Unless I'm missing some interaction with
>>>> the FSM/etc threading types as per the preamble to the snippet you provided:
>>>>
>>>>   if (((*path)[1]->type == EDGE_COPY_SRC_JOINER_BLOCK && !joiners)
>>>>       || ((*path)[1]->type == EDGE_COPY_SRC_BLOCK && joiners))
>>>>     continue;
>>>
>>> Indeed. But I understand the backwards threader does not (only) do FSM
>>> threading now.
>>
>> If it does, it was not part of my rewrite. I was careful to not touch
>> anything dealing with either path profitability or low-level path
>> registering.
>>
>> The path registering is in back_threader_registry::register_path(). We
>> only use EDGE_FSM_THREADs and then a final EDGE_NO_COPY. ISTM that
>> those are only EDGE_FSM_THREADs??
>
> Well, if the backwards threader classifies everything as FSM that's probably
> inaccurate since only threads through the loop latch are "FSM". There is
> the comment
>
>   /* If this path does not thread through the loop latch, then we are
>      using the FSM threader to find old style jump threads. This
>      is good, except the FSM threader does not re-use an existing
>      threading path to reduce code duplication.
>
>      So for that case, drastically reduce the number of statements
>      we are allowed to copy. */

*blink*

Whoa. The backward threader has been using FSM threads indiscriminately
as far as I can remember. I wonder what would break if we "fixed it".

>
> so these cases should use the "old style" validity/costing metrics and thus
> classify threading opportunities in a different way?
Jeff, do you have any insight here?
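The following is a hypothetical sketch of that classification. It treats a path as FSM only when it crosses a loop latch, that is, when it uses the back edge (assuming simple latches, whose only successor is the header). Otherwise it applies the tighter copy budget that the quoted comment calls for. The enum, the struct, the function names and both limits are illustrative placeholders, not GCC's code, parameters or defaults.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical thread kinds; not GCC's jump_thread_edge_type.  */
enum thread_kind { THREAD_OLD_STYLE, THREAD_FSM };

/* Placeholder copy budgets in statements; not GCC parameter defaults.  */
#define MAX_FSM_COPY_STMTS 100
#define MAX_OLD_STYLE_COPY_STMTS 15

struct path_block
{
  bool is_latch;   /* block is a (simple) loop latch */
  int n_stmts;     /* statements that would be copied */
};

/* PATH is in thread order.  A latch followed by another block means the
   path takes the back edge, which is what makes it an FSM thread.  */
static enum thread_kind
classify_path (const struct path_block *path, size_t len)
{
  for (size_t k = 0; k + 1 < len; k++)
    if (path[k].is_latch)
      return THREAD_FSM;
  return THREAD_OLD_STYLE;
}

/* PATH[0]'s outgoing edge is redirected; PATH[1] .. PATH[LEN - 1] are
   copied.  Old-style threads get the smaller budget.  */
static bool
within_copy_budget_p (const struct path_block *path, size_t len)
{
  int copied = 0;
  for (size_t k = 1; k < len; k++)
    copied += path[k].n_stmts;

  int limit = (classify_path (path, len) == THREAD_FSM
               ? MAX_FSM_COPY_STMTS : MAX_OLD_STYLE_COPY_STMTS);
  return copied <= limit;
}
```

Under these placeholder limits, the same 20 copied statements pass as an FSM thread through a latch but fail as an old-style thread.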
>
> I think today "backwards" vs. "forwards" only refers to the way we find
> threading opportunities.

Yes, it's a mess.

I ran some experiments a while back: with my current work on the
enhanced solver/threader, I can fold virtually everything the DOM/threader
gets (even with its use of const_and_copies, avail_exprs, and
evrp_range_analyzer), while getting 5% more DOM threads and 1% more
overall threads. That is, I've been testing whether the path solver can
solve everything the DOM threader needs (the hybrid approach I mentioned).

Unfortunately, replacing the forward threader right now is not feasible
for a few reasons:

a) The const_and_copies/avail_exprs relation framework can handle floats,
whereas floating-point support in the ranger is next year's work.

b) Even though we can seemingly fold everything DOM/threader does, in
order to replace it with a backward threader instance we'd have to merge
the cost/profitability code scattered throughout the forward threader,
as well as the EDGE_FSM* / EDGE_COPY* business.

c) DOM changes the IL as it goes, though we could conceivably divorce
the two and do the threading after DOM is done.

But I digress...
Aldy