From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69]) by sourceware.org (Postfix) with ESMTPS id 14FDE385E833 for ; Fri, 3 Jun 2022 17:27:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 14FDE385E833 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID: Date:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=oYG7rFPrQj5j8Ze6YKUPjhV1PadX09vFU09AnKxNaDQ=; b=hfHkMl9oPE5yXx/wVkEyBphBDl rFQv/tKBB5tPXzMIHHkSqaMLL9pJnwDsiYIhJDKfqo12i+dcHu28o5RHlZnRSfMboXZkW2cADxlWT fBAxRMC7Umk7aG4dr5ZmtorwZ2tgQi/Xm989Rc9bDXd0XLqqJBfEE2e8JdAPfRYouEL+0pgGSD/46 BFbuI5mloyAAmtlOnztQbWo09woSN+dpCZInF7RyAJzG5ztFh5VpkgUDC5M+ZmqCQWqzfRQfNP1NQ V9a2qUa7UeFKwCYrwaRRBgFxgWYDL6Er+n7ab1oAlz4mKhbsiJ4bqfjC2ZT6xNZZxXmeOrPJ8kDHK 0vWf4Kaw==; Received: from [185.62.158.67] (port=56127 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1nxB4v-0002aP-F2; Fri, 03 Jun 2022 13:27:17 -0400 From: "Roger Sayle" To: "'Richard Biener'" Cc: "'GCC Patches'" Subject: [PATCH/RFC take #2] cprop_hardreg... Third time's a charm. Date: Fri, 3 Jun 2022 18:27:16 +0100 Message-ID: <006c01d8776f$2c67b960$85372c20$@nextmovesoftware.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_NextPart_000_006D_01D87777.8E2E9260" X-Mailer: Microsoft Outlook 16.0 Content-Language: en-gb Thread-Index: Adh3bP1s6aGyQrZKQL+VFclegzpz5Q== X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 03 Jun 2022 17:27:19 -0000 This is a multipart message in MIME format. ------=_NextPart_000_006D_01D87777.8E2E9260 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi Richard, Here's a revised version of my patch incorporating both your = suggestions. The algorithm now uses two worklist vectors, and pointers to them, alternating between them on each iteration, which allows the code to handle an arbitrary number of passes without the previous code = duplication. This then allows a "for (pass=3D2; pass<=3Dpasses; pass++)"-like idiom, = that allows it to try two passes (the current behaviour) at -O1, but three passes = with -O2 and higher. This also provides a convenient point, should someone wish to control the number of cprop_hardreg pass iterations from the command line (or a backend) in future. This patch has been retested on x86_64-pc-linux-gnu, both with an = without the proposed i386 backend patch for reduced register shuffling, with make bootstrap and make -k check, both with/without target_board=3D unix{-m32}, with no new failures. Is this what you had in mind? 2022-06-03 Roger Sayle Richard Biener gcc/ChangeLog * regcprop.cc (pass_cprop_hardreg::execute): Perform a third iteration over each basic block that was updated by the second iteration. Cheers, Roger -- > -----Original Message----- > From: Richard Biener > Sent: 02 June 2022 11:56 > To: Roger Sayle > Cc: GCC Patches > Subject: Re: [PATCH/RFC] cprop_hardreg... Third time's a charm. >=20 > On Thu, Jun 2, 2022 at 12:20 PM Roger Sayle = > wrote: > > > > > > > > This middle-end patch proposes the "hard register constant = propagation" > > > > pass be performed up to three times on each basic block (up from the > > > > current two times) if the second pass successfully made changes. > > > > > > > > The motivation for three passes is to handle the "swap idiom" (i.e. > > > > t =3D x; x =3D y; y =3D t;" sequences) that get generated by = register > > allocation > > > > (reload). > > > > > > > > Consider the x86_64 test case for __int128 addition recently = discussed > > > > on gcc-patches. With that proposed patch, the input to the > > cprop_hardreg > > > > pass looks like: > > > > > > > > movq %rdi, %r8 > > > > movq %rsi, %rdi > > > > movq %r8, %rsi > > > > movq %rdx, %rax > > > > movq %rcx, %rdx > > > > addq %rsi %rax > > > > adcq %rdi, %rdx > > > > ret > > > > > > > > where the first three instructions effectively swap %rsi and %rdi. > > > > > > > > On the first pass of cprop_hardreg, we notice that the third insn, > > > > %rsi :=3D %r8, is redundant and can eliminated/propagated to = produce: > > > > > > > > movq %rdi, %r8 > > > > movq %rsi, %rdi > > > > movq %rdx, %rax > > > > movq %rcx, %rdx > > > > addq %r8 %rax > > > > adcq %rdi, %rdx > > > > ret > > > > > > > > Because a successful propagation was found, cprop_hardreg then runs > > > > a second pass/sweep on affected basic blocks (using worklist), and > > > > on this second pass notices that the second instruction, %rdi :=3D = %rsi, > > > > may now be propagated (%rsi was killed in the before the first > > transform), > > > > and after a second pass, we now end up with: > > > > > > > > movq %rdi, %r8 > > > > movq %rdx, %rax > > > > movq %rcx, %rdx > > > > addq %r8, %rax > > > > adcq %rsi, %rdx > > > > ret > > > > > > > > which is the current behaviour on mainline. However, a third and > > final > > > > pass would now notice that the first insn, "%r8 :=3D %rdi" is also = now > > > > eliminable, and a third iteration would produce optimal code: > > > > > > > > movq %rdx, %rax > > > > movq %rcx, %rdx > > > > addq %rdi, %rax > > > > adcq %rsi, %rdx > > > > ret > > > > > > > > The patch below creates an additional worklist, third_pass, that is > > > > populated with the basic block id's of blocks that were updated = during > > > > the second pass. Does the motivation for three passes (reload = doesn't > > > > generate more or less than three instructions to swap a pair of > > registers) > > > > seem reasonable for all targets? If cprop_hardreg is considered an > > > > expensive pass, this change could be gated based on basic block = count > > > > or similar. Finally, I should point out that this a regression fix; > > > > GCC 4.8 generated optimal code with two moves (whereas GCC 12 = required > > > > 5 moves, up from GCC 11's 4 moves). > > > > > > > > This patch has been tested on x86_64-pc-linux-gnu with make = bootstrap > > > > and make -k check, both with and without --target_board=3Dunix{-m32} > > with > > > > no new failures. Thoughts? Ok for mainline? >=20 > Can you instead refactor the code to have >=20 > auto_vec worklist1, worklist2; > vec *worklist =3D &worklist1; >=20 > and alternate between worklist1 and worklist2 N times (const int N =3D = 2 is OK for > now I guess, OTOH for -O1 we might want to stick to 1). >=20 > it might be tempting to maintain the number of times we visited a = block and just > directly iterate on changed BBs. There's also the comment in > cprop_hardreg_bb: >=20 > /* If a block has a single predecessor, that we've already > processed, begin with the value data that was live at > the end of the predecessor block. */ > /* ??? Ought to use more intelligent queuing of blocks. */ >=20 > which suggests we want to improve on its dataflow (like iterating in = RPO order > so we can intersect the live value data of predecessors?) That said, = calling > df_analyze () between iterations is what makes iterating expensive I = guess, as > well as cprop_hardreg_debug. >=20 > Richard. >=20 > > > > > > > > > > 2022-06-02 Roger Sayle > > > > > > > > gcc/ChangeLog > > > > * regcprop.cc (pass_cprop_hardreg::execute): Perform a third > > > > iteration over each basic block that was updated by the = second > > > > iteration. > > > > > > > > > > > > Thanks in advance, > > > > Roger > > > > -- > > > > > > ------=_NextPart_000_006D_01D87777.8E2E9260 Content-Type: text/plain; name="patchch2.txt" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="patchch2.txt" diff --git a/gcc/regcprop.cc b/gcc/regcprop.cc=0A= index 1fdc367..eacc59f 100644=0A= --- a/gcc/regcprop.cc=0A= +++ b/gcc/regcprop.cc=0A= @@ -1383,7 +1383,9 @@ pass_cprop_hardreg::execute (function *fun)=0A= auto_sbitmap visited (last_basic_block_for_fn (fun));=0A= bitmap_clear (visited);=0A= =0A= - auto_vec worklist;=0A= + auto_vec worklist1, worklist2;=0A= + auto_vec *curr =3D &worklist1;=0A= + auto_vec *next =3D &worklist2;=0A= bool any_debug_changes =3D false;=0A= =0A= /* We need accurate notes. Earlier passes such as if-conversion may=0A= @@ -1404,7 +1406,7 @@ pass_cprop_hardreg::execute (function *fun)=0A= FOR_EACH_BB_FN (bb, fun)=0A= {=0A= if (cprop_hardreg_bb (bb, all_vd, visited))=0A= - worklist.safe_push (bb->index);=0A= + curr->safe_push (bb->index);=0A= if (all_vd[bb->index].n_debug_insn_changes)=0A= any_debug_changes =3D true;=0A= }=0A= @@ -1416,16 +1418,22 @@ pass_cprop_hardreg::execute (function *fun)=0A= if (MAY_HAVE_DEBUG_BIND_INSNS && any_debug_changes)=0A= cprop_hardreg_debug (fun, all_vd);=0A= =0A= - /* Second pass if we've changed anything, only for the bbs where we = have=0A= - changed anything though. */=0A= - if (!worklist.is_empty ())=0A= + /* Repeat pass up to PASSES times, but only processing basic blocks=0A= + that have changed on the previous iteration. CURR points to the=0A= + current worklist, and each iteration populates the NEXT worklist,=0A= + swapping pointers after each cycle. */=0A= +=0A= + unsigned int passes =3D optimize > 1 ? 3 : 2;=0A= + for (unsigned int pass =3D 2; pass <=3D passes && !curr->is_empty (); = pass++)=0A= {=0A= any_debug_changes =3D false;=0A= bitmap_clear (visited);=0A= - for (int index : worklist)=0A= + next->truncate (0);=0A= + for (int index : *curr)=0A= {=0A= bb =3D BASIC_BLOCK_FOR_FN (fun, index);=0A= - cprop_hardreg_bb (bb, all_vd, visited);=0A= + if (cprop_hardreg_bb (bb, all_vd, visited))=0A= + next->safe_push (bb->index);=0A= if (all_vd[bb->index].n_debug_insn_changes)=0A= any_debug_changes =3D true;=0A= }=0A= @@ -1433,6 +1441,7 @@ pass_cprop_hardreg::execute (function *fun)=0A= df_analyze ();=0A= if (MAY_HAVE_DEBUG_BIND_INSNS && any_debug_changes)=0A= cprop_hardreg_debug (fun, all_vd);=0A= + std::swap (curr, next);=0A= }=0A= =0A= free (all_vd);=0A= ------=_NextPart_000_006D_01D87777.8E2E9260--