From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ej1-x630.google.com (mail-ej1-x630.google.com [IPv6:2a00:1450:4864:20::630]) by sourceware.org (Postfix) with ESMTPS id 066ED384C002 for ; Thu, 9 Sep 2021 06:57:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 066ED384C002 Received: by mail-ej1-x630.google.com with SMTP id x11so1556277ejv.0 for ; Wed, 08 Sep 2021 23:57:54 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=naRyW9n273tb3CoLDX6csHZpW6OzUKmb0ZSZHjnMO8Y=; b=OcglFnoohP+FgmayL8Z6f931Lxfcx8xBf2PXQ49KoTC11sA8opAfblG0fOKRNDJ9ks MMRZv947RBR78R+fGRLKMT0ZB7QzIaMbNumoWWs62QgbBUDsaxQc14IG6rGSJ0zg9t6M ZxKCmInnyw22r1j8IgDXhVqHgZqUUZ3HxGWxaFLoBjLr7/tkWI8+cUHQKMF/ccqNoNc4 3ZyXew5SqO43zxeOgTtNtGon3wLDDjRXPT4nD7evY//LQ0JMe3A9MOHl0e0qAZICafWe EPsIaF2DOW1Xvjj+gygtrORr/qtL1IIB5BxJymDZlbeiXj5TA2q8Jo2Jug3JZrRLaSad LRBw== X-Gm-Message-State: AOAM533II2DqOb+lTTHw9nDjRxLyxQ1oxjBELhkCvjnayVrau3tNGx3k nIf3oJpA5Bz7Pf1jS9nM8AFEEZY7qU2WvLB8Aso= X-Google-Smtp-Source: ABdhPJxEdVwDHYA+91tZDzTotYYW/cKk3ZKuZighhmyU7bH2I7kex6AK49XxvufhQbAo3T7NYqjE68p3kmf5gtCKBqE= X-Received: by 2002:a17:906:584:: with SMTP id 4mr1835749ejn.56.1631170673832; Wed, 08 Sep 2021 23:57:53 -0700 (PDT) MIME-Version: 1.0 References: <09e48b82-bc51-405e-7680-89a5f08e4e8f@redhat.com> <6d5695e4-e4eb-14a5-46a6-f425d1514008@redhat.com> In-Reply-To: From: Richard Biener Date: Thu, 9 Sep 2021 08:57:42 +0200 Message-ID: Subject: Re: More aggressive threading causing loop-interchange-9.c regression To: Michael Matz Cc: Aldy Hernandez , Jeff Law , GCC Mailing List , Andrew MacLeod Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-8.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Sep 2021 06:57:56 -0000 On Wed, Sep 8, 2021 at 8:13 PM Michael Matz wrote: > > Hello, > > [lame answer to self] > > On Wed, 8 Sep 2021, Michael Matz wrote: > > > > > The forward threader guards against this by simply disallowing > > > > threadings that involve different loops. As I see > > > > > > The thread in question (5->9->3) is all within the same outer loop, > > > though. BTW, the backward threader also disallows threading across > > > different loops (see path_crosses_loops variable). > ... > > Maybe it's possible to not disable threading over latches alltogether in > > the backward threader (like it's tried now), but I haven't looked at the > > specific situation here in depth, so take my view only as opinion from a > > large distance :-) > > I've now looked at the concrete situation. So yeah, the whole path is in > the same loop, crosses the latch, _and there's code following the latch > on that path_. (I.e. the latch isn't the last block in the path). In > particular, after loop_optimizer_init() (before any threading) we have: > > [local count: 118111600]: > # j_19 = PHI > sum_11 = c[j_19]; > if (n_10(D) > 0) > goto ; [89.00%] > else > goto ; [11.00%] > > [local count: 105119324]: > ... > > [local count: 118111600]: > # sum_21 = PHI > c[j_19] = sum_21; > j_13 = j_19 + 1; > if (n_10(D) > j_13) > goto ; [89.00%] > else > goto ; [11.00%] > > [local count: 105119324]: > goto ; [100.00%] > > With bb9 the outer (empty) latch, bb3 the outer header, and bb8 the > pre-header of inner loop, but more importantly something that's not at the > start of the outer loop. > > Now, any thread that includes the backedge 9->3 _including_ its > destination (i.e. where the backedge isn't the last to-be-redirected edge) > necessarily duplicates all code from that destination onto the back edge. > Here it's the load from c[j] into sum_11. > > The important part is the code is emitted onto the back edge, > conceptually; in reality it's simply included into the (new) latch block > (the duplicate of bb9, which is bb12 intermediately, then named bb7 after > cfg_cleanup). > > That's what we can't have for some of our structural loop optimizers: > there must be no code executed after the exit test (e.g. in the latch > block). (This requirement makes reasoning about which code is or isn't > executed completely for an iteration trivial; simply everything in the > body is always executed; e.g. loop interchange uses this to check that > there are no memory references after the exit test, because those would > then be only conditional and hence make loop interchange very awkward). > > Note that this situation can't be later rectified anymore: the duplicated > instructions (because they are memory refs) must remain after the exit > test. Only by rerolling/unrotating the loop (i.e. noticing that the > memory refs on the loop-entry path and on the back edge are equivalent) > would that be possible, but that's something we aren't capable of. Even > if we were that would simply just revert the whole work that the threader > did, so it's better to not even do that to start with. > > I believe something like below would be appropriate, it disables threading > if the path contains a latch at the non-last position (due to being > backwards on the non-first position in the array). I.e. it disables > rotating the loop if there's danger of polluting the back edge. It might > be improved if the blocks following (preceding!) the latch are themself > empty because then no code is duplicated. It might also be improved if > the latch is already non-empty. That code should probably only be active > before the loop optimizers, but currently the backward threader isn't > differentiating between before/after loop-optims. > > I haven't tested this patch at all, except that it fixes the testcase :) Lame comment at the current end of the thread - it's not threading through the latch but threading through the loop header that's problematic, at least if the end of the threading path ends within the loop (threading through the header to the loop exit is fine). Because in that situation you effectively created an alternate loop entry. Threading through the latch into the loop header is fine but with simple latches that likely will never happen (if there are no simple latches then the latch can contain the loop exit test). See tree-ssa-threadupdate.c:thread_block_1 e2 = path->last ()->e; if (!e2 || noloop_only) { /* If NOLOOP_ONLY is true, we only allow threading through the header of a loop to exit edges. */ /* One case occurs when there was loop header buried in a jump threading path that crosses loop boundaries. We do not try and thread this elsewhere, so just cancel the jump threading request by clearing the AUX field now. */ if (bb->loop_father != e2->src->loop_father && (!loop_exit_edge_p (e2->src->loop_father, e2) || flow_loop_nested_p (bb->loop_father, e2->dest->loop_father))) { /* Since this case is not handled by our special code to thread through a loop header, we must explicitly cancel the threading request here. */ delete_jump_thread_path (path); e->aux = NULL; continue; } there are a lot of "useful" checks in this function and the backwards threader should adopt those. Note the backwards threader originally only did FSM style threadings which are exactly those possibly "harmful" ones, forming irreducible regions at worst or sub-loops at best. That might explain the lack of those checks. Richard. > > Ciao, > Michael. > > diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c > index 449232c7715..528a753b886 100644 > --- a/gcc/tree-ssa-threadbackward.c > +++ b/gcc/tree-ssa-threadbackward.c > @@ -600,6 +600,7 @@ back_threader_profitability::profitable_path_p (const vec &m_path, > loop_p loop = m_path[0]->loop_father; > bool path_crosses_loops = false; > bool threaded_through_latch = false; > + bool latch_within_path = false; > bool multiway_branch_in_path = false; > bool threaded_multiway_branch = false; > bool contains_hot_bb = false; > @@ -725,7 +726,13 @@ back_threader_profitability::profitable_path_p (const vec &m_path, > the last entry in the array when determining if we thread > through the loop latch. */ > if (loop->latch == bb) > - threaded_through_latch = true; > + { > + threaded_through_latch = true; > + if (j != 0) > + latch_within_path = true; > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, " (latch)"); > + } > } > > gimple *stmt = get_gimple_control_stmt (m_path[0]); > @@ -845,6 +852,15 @@ back_threader_profitability::profitable_path_p (const vec &m_path, > "a multiway branch.\n"); > return false; > } > + > + if (latch_within_path) > + { > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, > + " FAIL: FSM Thread through latch would create non-empty latch\n"); > + return false; > + > + } > return true; > } >