Date: Tue, 7 Sep 2021 14:45:34 +0000 (UTC)
From: Michael Matz
To: Aldy Hernandez
Cc: Jeff Law, Richard Biener, GCC Mailing List
Subject: Re: More aggressive threading causing loop-interchange-9.c regression

Hello,

On Tue, 7 Sep 2021, Aldy Hernandez via Gcc wrote:

> The regression comes from the simple_reduc_1() function in
> tree-ssa/loop-interchange-9.c, and it involves the following path:
>
> =========== BB 5 ============
> Imports: n_10(D)  j_19
> Exports: n_10(D)  j_13  j_19
>          j_13 : j_19(I)
> n_10(D)  int [1, 257]
> j_19     int [0, 256]
> Relational : (j_13 > j_19)
> <bb 5> [local count: 118111600]:
>     # sum_21 = PHI
>     c[j_19] = sum_21;
>     j_13 = j_19 + 1;
>     if (n_10(D) > j_13)
>       goto <bb 9>; [89.00%]
>     else
>       goto ; [11.00%]

So, this is the outer loop's exit block ...

> =========== BB 9 ============
> n_10(D)  int [2, 257]
> j_13     int [1, 256]
> Relational : (n_10(D) > j_19)
> Relational : (n_10(D) > j_13)
> <bb 9> [local count: 105119324]:
>     goto <bb 3>; [100.00%]

... this the outer loop's latch block ...

> =========== BB 3 ============
> Imports: n_10(D)
> Exports: n_10(D)
> n_10(D)  int [1, +INF]
> <bb 3> [local count: 118111600]:
>     # j_19 = PHI
>     sum_11 = c[j_19];
>     if (n_10(D) > 0)
>       goto <bb 8>; [89.00%]
>     else
>       goto ; [11.00%]

... and this the outer loop's header, as well as the inner loop's
pre-header.  The attempted thread hence involves a back edge (of the
outer loop) and a loop-entry edge (bb3->bb8) of the inner loop.  Doing
that threading would destroy some properties that our loop optimizers
rely on, e.g. that the loop header of a natural loop is reached by only
two edges (the entry edge and the back edge), that latch blocks are
empty, and that there's only a single latch.  (What exactly we require
depends on several flags in loops_state_satisfies_p.)
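(For reference, reconstructed from the GIMPLE above, the loop nest in
simple_reduc_1 has roughly this shape; array sizes and exact details
are my assumptions, not a quote of the testcase:)

  #define N 256
  int a[N][N], c[N];

  void
  simple_reduc_1 (int n)
  {
    for (int j = 0; j < n; j++)      /* outer loop; its header is BB 3 */
      {
	int sum = c[j];              /* sum_11 = c[j_19] in BB 3 */
	for (int i = 0; i < n; i++)  /* inner loop, entered via bb3->bb8 */
	  sum = sum + a[i][j];
	c[j] = sum;                  /* BB 5: c[j_19] = sum_21 */
      }
  }

Interchange wants to swap the i and j loops so that a[i][j] is walked
along rows instead of down columns, and for that it needs to see a
perfect nest of natural loops.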
> With knowledge from previous passes (SSA_NAME_RANGE_INFO), we know that
> the loop cannot legally iterate outside the size of c[256].  So, j_13
> lies within [1, 257] and n_10 is [2, +INF] at the end of the path.
> This allows us to thread BB3 to BB8.

So, IIUC doing this threading would create a new entry into BB8: it
would then be entered by its natural entry (bb3), by its natural back
edge (whatever bb that is now), and by the new entry from the threaded
outer back edge (bb9 or bb5?).  The inner loop then wouldn't be
recognized as natural anymore, the whole nest not as perfect, and hence
loop interchange couldn't easily happen anymore.  Most of our other
structural loop optimizers would have problems with that situation as
well.

> All the blocks lie within the same loop, and the path passes all the
> tests in path_profitable_p().
>
> Is there something about the shape of this path that should make it
> invalid in the backward threader, or should the test be marked with
> -fdisable-tree-threadN (or something else entirely)?

This is a functionality test checking that the interchange this testcase
requires does happen with the default options plus -floop-interchange
(with the intention of that flag being enabled by -O3 or with profile
info).  So no additional flags can be added without changing the meaning
of this test.

> Note that the backward threader is allowed to peel through loop
> headers.

Something needs to give way in the path threaders before loop
optimizations: either threading through back edges, through loop
latches, or through loop headers needs to be disabled.  I think
traditionally at least threading through latches should be disabled,
because doing so usually destroys simple loop structure.  I see that
profitable_path_p() of the backward threader wants to be careful in
some situations involving loops and latches; possibly it's not careful
enough yet for the additional power brought by ranger.

See also the additional checks that
tree-cfgcleanup.c:tree_forwarder_block_p does when loops are active.

After the structural loop optimizers have run, the threader can go wild
and thread everything it finds.

Ciao,
Michael.
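P.S.: To illustrate the kind of restriction I mean, here's a rough
sketch against GCC's internal loop representation.  The helper name,
its placement, and the exact conditions are made up; it's an idea, not
the actual threader code:

  /* Hypothetical helper (name and checks are assumptions): reject a
     candidate threading path that goes through a loop latch or that
     would create a new entry into a loop header.  */
  static bool
  path_crosses_loop_boundary_p (const vec<basic_block> &path)
  {
    for (unsigned i = 1; i < path.length (); ++i)
      {
	basic_block prev = path[i - 1];
	basic_block bb = path[i];

	/* Threading through a latch destroys the empty, single-latch
	   property that passes checking loops_state_satisfies_p
	   depend on.  */
	if (prev == prev->loop_father->latch)
	  return true;

	/* A threaded edge into a header coming from outside its loop
	   would be a third entry besides the natural entry edge and
	   the back edge.  */
	if (bb == bb->loop_father->header
	    && prev->loop_father != bb->loop_father)
	  return true;
      }
    return false;
  }

Something like profitable_path_p() could then reject such paths
outright while the loop pipeline still needs the structure, and skip
the check after the structural loop optimizers have run.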