From: Richard Biener
Date: Thu, 9 Sep 2021 09:45:29 +0200
Subject: Re: More aggressive threading causing loop-interchange-9.c regression
To: Aldy Hernandez
Cc: Michael Matz, Jeff Law, GCC Mailing List, Andrew MacLeod
List-Id: Gcc mailing list
On Thu, Sep 9, 2021 at 9:37 AM Aldy Hernandez wrote:
>
> On 9/9/21 8:57 AM, Richard Biener wrote:
> > On Wed, Sep 8, 2021 at 8:13 PM Michael Matz wrote:
> >>
> >> Hello,
> >>
> >> [lame answer to self]
> >>
> >> On Wed, 8 Sep 2021, Michael Matz wrote:
> >>
> >>>>> The forward threader guards against this by simply disallowing
> >>>>> threadings that involve different loops.  As I see
> >>>>
> >>>> The thread in question (5->9->3) is all within the same outer loop,
> >>>> though.  BTW, the backward threader also disallows threading across
> >>>> different loops (see path_crosses_loops variable).
> >> ...
> >>> Maybe it's possible to not disable threading over latches altogether in
> >>> the backward threader (like it's tried now), but I haven't looked at the
> >>> specific situation here in depth, so take my view only as opinion from a
> >>> large distance :-)
> >>
> >> I've now looked at the concrete situation.  So yeah, the whole path is in
> >> the same loop, crosses the latch, _and there's code following the latch
> >> on that path_.  (I.e. the latch isn't the last block in the path.)  In
> >> particular, after loop_optimizer_init() (before any threading) we have:
> >>
> >>   [local count: 118111600]:
> >>   # j_19 = PHI
> >>   sum_11 = c[j_19];
> >>   if (n_10(D) > 0)
> >>     goto ;  [89.00%]
> >>   else
> >>     goto ;  [11.00%]
> >>
> >>   [local count: 105119324]:
> >> ...
> >>
> >>   [local count: 118111600]:
> >>   # sum_21 = PHI
> >>   c[j_19] = sum_21;
> >>   j_13 = j_19 + 1;
> >>   if (n_10(D) > j_13)
> >>     goto ;  [89.00%]
> >>   else
> >>     goto ;  [11.00%]
> >>
> >>   [local count: 105119324]:
> >>   goto ;  [100.00%]
> >>
> >> With bb9 the outer (empty) latch, bb3 the outer header, and bb8 the
> >> pre-header of the inner loop, but more importantly something that's not
> >> at the start of the outer loop.
> >>
> >> Now, any thread that includes the backedge 9->3 _including_ its
> >> destination (i.e. where the backedge isn't the last to-be-redirected
> >> edge) necessarily duplicates all code from that destination onto the
> >> back edge.  Here it's the load from c[j] into sum_11.
> >>
> >> The important part is that the code is emitted onto the back edge,
> >> conceptually; in reality it's simply included in the (new) latch block
> >> (the duplicate of bb9, which is bb12 intermediately, then named bb7
> >> after cfg_cleanup).
> >>
> >> That's what we can't have in some of our structural loop optimizers:
> >> there must be no code executed after the exit test (e.g. in the latch
> >> block).  (This requirement makes reasoning about which code is or isn't
> >> executed completely for an iteration trivial: simply everything in the
> >> body is always executed.  E.g. loop interchange uses this to check that
> >> there are no memory references after the exit test, because those would
> >> then be only conditionally executed and hence make loop interchange very
> >> awkward.)
> >>
> >> Note that this situation can't be rectified later anymore: the
> >> duplicated instructions (because they are memory refs) must remain after
> >> the exit test.  Only by rerolling/unrotating the loop (i.e. noticing
> >> that the memory refs on the loop-entry path and on the back edge are
> >> equivalent) would that be possible, but that's something we aren't
> >> capable of.
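The rotation Michael describes can be sketched at the source level in plain C. This is a hypothetical reconstruction (the function names, the 3x3 array, and the column-sum kernel are mine, only loosely modeled on a loop-interchange-9.c-style loop nest, not taken from the actual testcase), but it shows where the duplicated c[j] load ends up:

```c
#include <assert.h>

#define N 3
static int a[N][N] = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };

/* Shape before threading: the c[j] load sits in the loop header,
   so it executes before any exit test (the sum_11 load above).  */
static void
kernel_header (int n, int *c)
{
  for (int j = 0; j < n; j++)
    {
      int sum = c[j];               /* load in the header */
      for (int i = 0; i < n; i++)
        sum += a[i][j];
      c[j] = sum;
    }
}

/* Shape after a thread like 5->9->3: the loop is rotated and the
   duplicated c[j] load now executes after the "j < n" exit test,
   i.e. conceptually on the back edge / in the latch.  */
static void
kernel_rotated (int n, int *c)
{
  if (n <= 0)
    return;
  int j = 0;
  int sum = c[0];                   /* peeled copy of the load */
  for (;;)
    {
      for (int i = 0; i < n; i++)
        sum += a[i][j];
      c[j] = sum;
      j++;
      if (j >= n)
        break;                      /* exit test */
      sum = c[j];                   /* memory ref after the exit test */
    }
}
```

Both variants compute the same column sums; the difference is purely structural, but the second shape is the one loop interchange refuses to handle, because the load is now only conditionally executed per iteration.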
Even
> >> if we were, that would simply revert the whole work that the threader
> >> did, so it's better not to even do that to start with.
> >>
> >> I believe something like the patch below would be appropriate: it
> >> disables threading if the path contains a latch at a non-last position
> >> (due to being backwards, at a non-first position in the array).  I.e. it
> >> disables rotating the loop if there's danger of polluting the back edge.
> >> It might be improved if the blocks following (preceding!) the latch are
> >> themselves empty, because then no code is duplicated.  It might also be
> >> improved if the latch is already non-empty.  That code should probably
> >> only be active before the loop optimizers, but currently the backward
> >> threader isn't differentiating between before/after loop-optims.
> >>
> >> I haven't tested this patch at all, except that it fixes the testcase :)
> >
> > Lame comment at the current end of the thread - it's not threading through the
>
> I don't know why y'all keep using the word "lame".  On the contrary,
> these are incredibly useful explanations.  Thanks.
>
> > latch but threading through the loop header that's problematic, at least
> > if the threading path ends within the loop (threading through the header
> > to the loop exit is fine).  Because in that situation you effectively
> > create an alternate loop entry.  Threading through the latch into the
> > loop header is fine, but with simple latches that likely will never
> > happen (if there are no simple latches then the latch can contain the
> > loop exit test).
> >
> > See tree-ssa-threadupdate.c:thread_block_1
> >
> >       e2 = path->last ()->e;
> >       if (!e2 || noloop_only)
> >         {
> >           /* If NOLOOP_ONLY is true, we only allow threading through the
> >              header of a loop to exit edges.  */
> >
> >           /* One case occurs when there was loop header buried in a jump
> >              threading path that crosses loop boundaries.
We do not try
> >              and thread this elsewhere, so just cancel the jump threading
> >              request by clearing the AUX field now.  */
> >           if (bb->loop_father != e2->src->loop_father
> >               && (!loop_exit_edge_p (e2->src->loop_father, e2)
> >                   || flow_loop_nested_p (bb->loop_father,
> >                                          e2->dest->loop_father)))
> >             {
> >               /* Since this case is not handled by our special code
> >                  to thread through a loop header, we must explicitly
> >                  cancel the threading request here.  */
> >               delete_jump_thread_path (path);
> >               e->aux = NULL;
> >               continue;
> >             }
>
> But this is for a threading path that crosses loop boundaries, which is
> not the case here.  Perhaps we should restrict this further to threads
> within a loop?
>
> > there are a lot of "useful" checks in this function and the backwards
> > threader should adopt those.  Note the backwards threader originally
> > only did FSM-style threadings, which are exactly those possibly
> > "harmful" ones, forming irreducible regions at worst or sub-loops at
> > best.  That might explain the lack of those checks.
>
> Also, the aforementioned checks are in jump_thread_path_registry, which
> is also shared by the backward threader.  These are thread discards
> _after_ a thread has been registered.

Yeah, that's indeed unfortunate.

> The backward threader should also
> be using these restrictions.  Unless I'm missing some interaction with
> the FSM/etc. threading types as per the preamble to the snippet you
> provided:
>
>        if (((*path)[1]->type == EDGE_COPY_SRC_JOINER_BLOCK && !joiners)
>            || ((*path)[1]->type == EDGE_COPY_SRC_BLOCK && joiners))
>          continue;

Indeed.  But I understand the backwards threader does not (only) do FSM
threading now.

> You are right though, there are a lot of checks throughout the entire
> forward threader that should be audited and somehow shared.  It's on my
> back burner, but I'm running out of cycles here :-/.
Yeah, it's quite a mess indeed, and merging the path
validity/costing/code-transform phases of both threaders would be
incredibly useful.

Richard.

> Thanks.
> Aldy
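Richard's "alternate loop entry" point earlier in the thread can be made concrete with a toy CFG. The sketch below is entirely hypothetical (the block numbering and helper names are mine, not taken from GCC): a natural loop has exactly one entry, its header, and a thread that copies the header and jumps straight into the body adds an edge from outside the loop to a non-header loop block.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy CFG: block 0 is the preheader, block 1 the loop header,
   block 2 the body/latch, block 3 the exit, and block 4 a copy
   of the header created by jump threading.  */
struct edge { int src, dst; };

/* The structural loop, as the loop optimizers see it.  */
static bool
in_loop (int bb)
{
  return bb == 1 || bb == 2;
}

/* Count side entries: edges from outside the loop into a loop block
   other than the header.  A natural (reducible) loop has zero.  */
static int
side_entries (const struct edge *e, int n)
{
  int count = 0;
  for (int i = 0; i < n; i++)
    if (!in_loop (e[i].src) && in_loop (e[i].dst) && e[i].dst != 1)
      count++;
  return count;
}
```

Before threading, the edges {0->1, 1->2, 2->1, 1->3} have no side entries. After a thread 0->1->2 duplicates the header into block 4, the edges {0->4, 4->2, 1->2, 2->1, 1->3} contain one (4->2): that is the alternate entry, which is why threading through the header is only safe when the path leaves through a loop exit.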