From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from rock.gnat.com (rock.gnat.com [IPv6:2620:20:4000:0:a9e:1ff:fe9b:1d1]) by sourceware.org (Postfix) with ESMTPS id D3AF83858402; Fri, 18 Feb 2022 23:27:37 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org D3AF83858402 Received: from localhost (localhost.localdomain [127.0.0.1]) by filtered-rock.gnat.com (Postfix) with ESMTP id F19F3116113; Fri, 18 Feb 2022 18:27:35 -0500 (EST) X-Virus-Scanned: Debian amavisd-new at gnat.com X-Amavis-Alert: BAD HEADER SECTION, Duplicate header field: "Cc" Received: from rock.gnat.com ([127.0.0.1]) by localhost (rock.gnat.com [127.0.0.1]) (amavisd-new, port 10024) with LMTP id UEOM9kCiu14s; Fri, 18 Feb 2022 18:27:35 -0500 (EST) Received: from free.home (tron.gnat.com [IPv6:2620:20:4000:0:46a8:42ff:fe0e:e294]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by rock.gnat.com (Postfix) with ESMTPS id 21B54116107; Fri, 18 Feb 2022 18:27:34 -0500 (EST) Received: from livre (livre.home [172.31.160.2]) by free.home (8.15.2/8.15.2) with ESMTPS id 21INRL29581636 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 18 Feb 2022 20:27:25 -0300 From: Alexandre Oliva To: Jeff Law , Vladimir Makarov Cc: gcc-patches@gcc.gnu.org, zsojka@seznam.cz, wilson@gcc.gnu.org Cc: Sebastian Huber Subject: Re: [PR103302] skip multi-word pre-move clobber during lra Organization: Free thinker, does not speak for AdaCore References: <74d2af86-c1c4-0842-b319-09e76b3bf633@gmail.com> <4bf89538-8c3c-59a7-378b-d02982e6a5ea@gmail.com> <03226f85-0334-1840-94f9-6c618175f974@gmail.com> Errors-To: aoliva@lxoliva.fsfla.org Date: Fri, 18 Feb 2022 20:27:21 -0300 In-Reply-To: <03226f85-0334-1840-94f9-6c618175f974@gmail.com> (Jeff Law's message of "Wed, 15 Dec 2021 09:00:01 -0700") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Scanned-By: MIMEDefang 2.84 X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Feb 2022 23:27:40 -0000 On Dec 15, 2021, Jeff Law wrote: >> * expr.c (emit_move_complex_parts): Skip clobbers during lra. > OK for the next cycle. Thanks, but having looked into PR 104121, I withdraw this patch and also the already-installed patch for PR 103302. As I found out, LRA does worse without the clobbers for multi-word moves, not only because the clobbers shorten live ranges and enable earlier and better allocations, but also because find_reload_regno_insns implicitly, possibly unknowingly, relied on the clobbers to avoid the risk of an infinite loop. As noted in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104121#c11 with the clobber, a multi-word reload, and the insn the reload applies to, we get 4 insns, so find_reload_regno_insns avoids the loop. Without the clobber, a multi-word reload for either input or output makes for n==3, so we enter the loop and don't ever exit it: we'll find first_insn (input) or second_insn (output), but then we'll loop forever because we won't iterate again on {prev,next}_insn, respectively, and the other iterator won't find the other word reload. We advance the other till the end, but that's not enough for us to terminate the loop. With the proposed patch reversal, we no longer hit the problem with the v850 testcase in 104121, but I'm concerned we might still get an infinite loop on ports whose input or output reloads might emit a pair of insns without a clobber. I see two ways to robustify it. One is to find a complete reload sequence: diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc index c1d40ea2a14bd..ff1688917cbce 100644 --- a/gcc/lra-assigns.cc +++ b/gcc/lra-assigns.cc @@ -1716,9 +1716,18 @@ find_reload_regno_insns (int regno, rtx_insn * &start, rtx_insn * &finish) start_insn = lra_insn_recog_data[uid]->insn; n++; } - /* For reload pseudo we should have at most 3 insns referring for - it: input/output reload insns and the original insn. */ - if (n > 3) + /* For reload pseudo we should have at most 3 (sequences of) insns + referring for it: input/output reload insn sequences and the + original insn. Each reload insn sequence may amount to multiple + insns, but we expect to find each of them contiguous, one before + start_insn, one after it. We know start_insn is between the + sequences because it's the lowest-numbered insn, thus the first + we'll have found above. The reload insns, emitted later, will + have been assigned higher insn uids. If this assumption doesn't + hold, and there happen to be intervening reload insns for other + pseudos, we may end up returning false after searching to the end + in the wrong direction. */ + if (n > 1 + 2 * CEIL (lra_reg_info[regno].biggest_mode, UNITS_PER_WORD)) return false; if (n > 1) { @@ -1726,26 +1735,52 @@ find_reload_regno_insns (int regno, rtx_insn * &start, rtx_insn * &finish) next_insn = NEXT_INSN (start_insn); n != 1 && (prev_insn != NULL || next_insn != NULL); ) { - if (prev_insn != NULL && first_insn == NULL) + if (prev_insn != NULL) { if (! bitmap_bit_p (&lra_reg_info[regno].insn_bitmap, INSN_UID (prev_insn))) prev_insn = PREV_INSN (prev_insn); else { - first_insn = prev_insn; - n--; + /* A reload sequence may have multiple insns, but + they must be contiguous. */ + do + { + first_insn = prev_insn; + n--; + prev_insn = PREV_INSN (prev_insn); + } + while (n > 1 && prev_insn + && bitmap_bit_p (&lra_reg_info[regno].insn_bitmap, + INSN_UID (prev_insn))); + /* After finding first_insn, we don't want to search + backward any more, so set prev_insn to NULL so as + to not loop indefinitely. */ + prev_insn = NULL; } } - if (next_insn != NULL && second_insn == NULL) + else if (next_insn != NULL) { if (! bitmap_bit_p (&lra_reg_info[regno].insn_bitmap, INSN_UID (next_insn))) next_insn = NEXT_INSN (next_insn); else { - second_insn = next_insn; - n--; + /* A reload sequence may have multiple insns, but + they must be contiguous. */ + do + { + second_insn = next_insn; + n--; + next_insn = NEXT_INSN (next_insn); + } + while (n > 1 && next_insn + && bitmap_bit_p (&lra_reg_info[regno].insn_bitmap, + INSN_UID (next_insn))); + /* After finding second_insn, we don't want to + search forward any more, so set next_insn to NULL + so as to not loop indefinitely. */ + next_insn = NULL; } } } the other is to just prevent the infinite loop, that will then return false because n > 1 after the loop ends: diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc index c1d40ea2a14bd..efd5f764a37a5 100644 --- a/gcc/lra-assigns.cc +++ b/gcc/lra-assigns.cc @@ -1726,7 +1726,7 @@ find_reload_regno_insns (int regno, rtx_insn * &start, rtx_insn * &finish) next_insn = NEXT_INSN (start_insn); n != 1 && (prev_insn != NULL || next_insn != NULL); ) { - if (prev_insn != NULL && first_insn == NULL) + if (prev_insn != NULL) { if (! bitmap_bit_p (&lra_reg_info[regno].insn_bitmap, INSN_UID (prev_insn))) @@ -1735,9 +1735,10 @@ find_reload_regno_insns (int regno, rtx_insn * &start, rtx_insn * &finish) { first_insn = prev_insn; n--; + prev_insn = NULL; } } - if (next_insn != NULL && second_insn == NULL) + if (next_insn != NULL) { if (! bitmap_bit_p (&lra_reg_info[regno].insn_bitmap, INSN_UID (next_insn))) @@ -1746,6 +1747,7 @@ find_reload_regno_insns (int regno, rtx_insn * &start, rtx_insn * &finish) { second_insn = next_insn; n--; + next_insn = NULL; } } } When it comes to the v850 testcase, one of them just moves the infinite loop to lra(), as we never get past while(fails_p); the other hits lra-assigns.cc:lra_assign (bool&)'s if (flag_checking && (lra_assignment_iter_after_spill > LRA_MAX_ASSIGNMENT_ITERATION_NUMBER)) internal_error ("maximum number of LRA assignment passes is achieved (%d)", LRA_MAX_ASSIGNMENT_ITERATION_NUMBER); which would loop indefinitely too without flag_checking. Neither solves the v850 problem, only restoring the clobber does, because then, with shorter live ranges, allocation succeeds for the reloads, and we don't even try to split -> spill their pseudos. Would any of these patchlets make sense to pursue regardless? Ok to revert commit r12-5852-g50e8b0c9bca6cdc57804f860ec5311b641753fbb I'm going to get back to the drawing board as to pr103302, since the problem there will likely resurface, possibly also on v850. diff --git a/gcc/expr.cc b/gcc/expr.cc index 63a4aa838dec0..b6ed54983fabf 100644 --- a/gcc/expr.cc +++ b/gcc/expr.cc @@ -3929,7 +3929,7 @@ emit_move_multi_word (machine_mode mode, rtx x, rtx y) hard regs shouldn't appear here except as return values. We never want to emit such a clobber after reload. */ if (x != y - && ! (lra_in_progress || reload_in_progress || reload_completed) + && ! (reload_in_progress || reload_completed) && need_clobber != 0) emit_clobber (x); -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Disinformation flourishes because many people care deeply about injustice but very few check the facts. Ask me about