From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <roger@nextmovesoftware.com>
Received: from server.nextmovesoftware.com (server.nextmovesoftware.com
 [162.254.253.69])
 by sourceware.org (Postfix) with ESMTPS id 14FDE385E833
 for <gcc-patches@gcc.gnu.org>; Fri,  3 Jun 2022 17:27:18 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 14FDE385E833
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=nextmovesoftware.com
Authentication-Results: sourceware.org;
 spf=pass smtp.mailfrom=nextmovesoftware.com
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
 d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID:
 Date:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID:
 Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc
 :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe:
 List-Subscribe:List-Post:List-Owner:List-Archive;
 bh=oYG7rFPrQj5j8Ze6YKUPjhV1PadX09vFU09AnKxNaDQ=; b=hfHkMl9oPE5yXx/wVkEyBphBDl
 rFQv/tKBB5tPXzMIHHkSqaMLL9pJnwDsiYIhJDKfqo12i+dcHu28o5RHlZnRSfMboXZkW2cADxlWT
 fBAxRMC7Umk7aG4dr5ZmtorwZ2tgQi/Xm989Rc9bDXd0XLqqJBfEE2e8JdAPfRYouEL+0pgGSD/46
 BFbuI5mloyAAmtlOnztQbWo09woSN+dpCZInF7RyAJzG5ztFh5VpkgUDC5M+ZmqCQWqzfRQfNP1NQ
 V9a2qUa7UeFKwCYrwaRRBgFxgWYDL6Er+n7ab1oAlz4mKhbsiJ4bqfjC2ZT6xNZZxXmeOrPJ8kDHK
 0vWf4Kaw==;
Received: from [185.62.158.67] (port=56127 helo=Dell)
 by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls
 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2)
 (envelope-from <roger@nextmovesoftware.com>)
 id 1nxB4v-0002aP-F2; Fri, 03 Jun 2022 13:27:17 -0400
From: "Roger Sayle" <roger@nextmovesoftware.com>
To: "'Richard Biener'" <richard.guenther@gmail.com>
Cc: "'GCC Patches'" <gcc-patches@gcc.gnu.org>
Subject: [PATCH/RFC take #2] cprop_hardreg... Third time's a charm.
Date: Fri, 3 Jun 2022 18:27:16 +0100
Message-ID: <006c01d8776f$2c67b960$85372c20$@nextmovesoftware.com>
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="----=_NextPart_000_006D_01D87777.8E2E9260"
X-Mailer: Microsoft Outlook 16.0
Content-Language: en-gb
Thread-Index: Adh3bP1s6aGyQrZKQL+VFclegzpz5Q==
X-AntiAbuse: This header was added to track abuse,
 please include it with any abuse report
X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com
X-AntiAbuse: Original Domain - gcc.gnu.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - nextmovesoftware.com
X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id:
 roger@nextmovesoftware.com
X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com
X-Source: 
X-Source-Args: 
X-Source-Dir: 
X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_PASS,
 TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Jun 2022 17:27:19 -0000

This is a multipart message in MIME format.

------=_NextPart_000_006D_01D87777.8E2E9260
Content-Type: text/plain;
	charset="UTF-8"
Content-Transfer-Encoding: quoted-printable


Hi Richard,
Here's a revised version of my patch incorporating both your =
suggestions.
The algorithm now uses two worklist vectors, and pointers to them,
alternating between them on each iteration, which allows the code to
handle an arbitrary number of passes without the previous code =
duplication.
This then allows a "for (pass=3D2; pass<=3Dpasses; pass++)"-like idiom, =
that allows
it to try two passes (the current behaviour) at -O1, but three passes =
with
-O2 and higher.  This also provides a convenient point, should someone
wish to control the number of cprop_hardreg pass iterations from the
command line (or a backend) in future.

This patch has been retested on x86_64-pc-linux-gnu, both with an =
without
the proposed i386 backend patch for reduced register shuffling, with
make bootstrap and make -k check, both with/without target_board=3D
unix{-m32}, with no new failures.  Is this what you had in mind?

2022-06-03  Roger Sayle  <roger@nextmovesoftware.com>
            Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
        * regcprop.cc (pass_cprop_hardreg::execute): Perform a third
        iteration over each basic block that was updated by the second
        iteration.

Cheers,
Roger
--

> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: 02 June 2022 11:56
> To: Roger Sayle <roger@nextmovesoftware.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> Subject: Re: [PATCH/RFC] cprop_hardreg... Third time's a charm.
>=20
> On Thu, Jun 2, 2022 at 12:20 PM Roger Sayle =
<roger@nextmovesoftware.com>
> wrote:
> >
> >
> >
> > This middle-end patch proposes the "hard register constant =
propagation"
> >
> > pass be performed up to three times on each basic block (up from the
> >
> > current two times) if the second pass successfully made changes.
> >
> >
> >
> > The motivation for three passes is to handle the "swap idiom" (i.e.
> >
> > t =3D x; x =3D y; y =3D t;" sequences) that get generated by =
register
> > allocation
> >
> > (reload).
> >
> >
> >
> > Consider the x86_64 test case for __int128 addition recently =
discussed
> >
> > on gcc-patches.  With that proposed patch, the input to the
> > cprop_hardreg
> >
> > pass looks like:
> >
> >
> >
> >         movq    %rdi, %r8
> >
> >         movq    %rsi, %rdi
> >
> >         movq    %r8, %rsi
> >
> >         movq    %rdx, %rax
> >
> >         movq    %rcx, %rdx
> >
> >         addq    %rsi %rax
> >
> >         adcq    %rdi, %rdx
> >
> >         ret
> >
> >
> >
> > where the first three instructions effectively swap %rsi and %rdi.
> >
> >
> >
> > On the first pass of cprop_hardreg, we notice that the third insn,
> >
> > %rsi :=3D %r8, is redundant and can eliminated/propagated to =
produce:
> >
> >
> >
> >         movq    %rdi, %r8
> >
> >         movq    %rsi, %rdi
> >
> >         movq    %rdx, %rax
> >
> >         movq    %rcx, %rdx
> >
> >         addq    %r8 %rax
> >
> >         adcq    %rdi, %rdx
> >
> >         ret
> >
> >
> >
> > Because a successful propagation was found, cprop_hardreg then runs
> >
> > a second pass/sweep on affected basic blocks (using worklist), and
> >
> > on this second pass notices that the second instruction, %rdi :=3D =
%rsi,
> >
> > may now be propagated (%rsi was killed in the before the first
> > transform),
> >
> > and after a second pass, we now end up with:
> >
> >
> >
> >         movq    %rdi, %r8
> >
> >         movq    %rdx, %rax
> >
> >         movq    %rcx, %rdx
> >
> >         addq    %r8, %rax
> >
> >         adcq    %rsi, %rdx
> >
> >         ret
> >
> >
> >
> > which is the current behaviour on mainline.  However, a third and
> > final
> >
> > pass would now notice that the first insn, "%r8 :=3D %rdi" is also =
now
> >
> > eliminable, and a third iteration would produce optimal code:
> >
> >
> >
> >         movq    %rdx, %rax
> >
> >         movq    %rcx, %rdx
> >
> >         addq    %rdi, %rax
> >
> >         adcq    %rsi, %rdx
> >
> >         ret
> >
> >
> >
> > The patch below creates an additional worklist, third_pass, that is
> >
> > populated with the basic block id's of blocks that were updated =
during
> >
> > the second pass.  Does the motivation for three passes (reload =
doesn't
> >
> > generate more or less than three instructions to swap a pair of
> > registers)
> >
> > seem reasonable for all targets?  If cprop_hardreg is considered an
> >
> > expensive pass, this change could be gated based on basic block =
count
> >
> > or similar.  Finally, I should point out that this a regression fix;
> >
> > GCC 4.8 generated optimal code with two moves (whereas GCC 12 =
required
> >
> > 5 moves, up from GCC 11's 4 moves).
> >
> >
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make =
bootstrap
> >
> > and make -k check, both with and without --target_board=3Dunix{-m32}
> > with
> >
> > no new failures.  Thoughts?  Ok for mainline?
>=20
> Can you instead refactor the code to have
>=20
> auto_vec<int> worklist1, worklist2;
> vec<int> *worklist =3D &worklist1;
>=20
> and alternate between worklist1 and worklist2 N times (const int N =3D =
2 is OK for
> now I guess, OTOH for -O1 we might want to stick to 1).
>=20
> it might be tempting to maintain the number of times we visited a =
block and just
> directly iterate on changed BBs.  There's also the comment in
> cprop_hardreg_bb:
>=20
>   /* If a block has a single predecessor, that we've already
>      processed, begin with the value data that was live at
>      the end of the predecessor block.  */
>   /* ??? Ought to use more intelligent queuing of blocks.  */
>=20
> which suggests we want to improve on its dataflow (like iterating in =
RPO order
> so we can intersect the live value data of predecessors?) That said, =
calling
> df_analyze () between iterations is what makes iterating expensive I =
guess, as
> well as cprop_hardreg_debug.
>=20
> Richard.
>=20
> >
> >
> >
> >
> > 2022-06-02  Roger Sayle  <roger@nextmovesoftware.com>
> >
> >
> >
> > gcc/ChangeLog
> >
> >         * regcprop.cc (pass_cprop_hardreg::execute): Perform a third
> >
> >         iteration over each basic block that was updated by the =
second
> >
> >         iteration.
> >
> >
> >
> >
> >
> > Thanks in advance,
> >
> > Roger
> >
> > --
> >
> >
> >

------=_NextPart_000_006D_01D87777.8E2E9260
Content-Type: text/plain;
	name="patchch2.txt"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
	filename="patchch2.txt"

diff --git a/gcc/regcprop.cc b/gcc/regcprop.cc=0A=
index 1fdc367..eacc59f 100644=0A=
--- a/gcc/regcprop.cc=0A=
+++ b/gcc/regcprop.cc=0A=
@@ -1383,7 +1383,9 @@ pass_cprop_hardreg::execute (function *fun)=0A=
   auto_sbitmap visited (last_basic_block_for_fn (fun));=0A=
   bitmap_clear (visited);=0A=
 =0A=
-  auto_vec<int> worklist;=0A=
+  auto_vec<int> worklist1, worklist2;=0A=
+  auto_vec<int> *curr =3D &worklist1;=0A=
+  auto_vec<int> *next =3D &worklist2;=0A=
   bool any_debug_changes =3D false;=0A=
 =0A=
   /* We need accurate notes.  Earlier passes such as if-conversion may=0A=
@@ -1404,7 +1406,7 @@ pass_cprop_hardreg::execute (function *fun)=0A=
   FOR_EACH_BB_FN (bb, fun)=0A=
     {=0A=
       if (cprop_hardreg_bb (bb, all_vd, visited))=0A=
-	worklist.safe_push (bb->index);=0A=
+	curr->safe_push (bb->index);=0A=
       if (all_vd[bb->index].n_debug_insn_changes)=0A=
 	any_debug_changes =3D true;=0A=
     }=0A=
@@ -1416,16 +1418,22 @@ pass_cprop_hardreg::execute (function *fun)=0A=
   if (MAY_HAVE_DEBUG_BIND_INSNS && any_debug_changes)=0A=
     cprop_hardreg_debug (fun, all_vd);=0A=
 =0A=
-  /* Second pass if we've changed anything, only for the bbs where we =
have=0A=
-     changed anything though.  */=0A=
-  if (!worklist.is_empty ())=0A=
+  /* Repeat pass up to PASSES times, but only processing basic blocks=0A=
+     that have changed on the previous iteration.  CURR points to the=0A=
+     current worklist, and each iteration populates the NEXT worklist,=0A=
+     swapping pointers after each cycle.  */=0A=
+=0A=
+  unsigned int passes =3D optimize > 1 ? 3 : 2;=0A=
+  for (unsigned int pass =3D 2; pass <=3D passes && !curr->is_empty (); =
pass++)=0A=
     {=0A=
       any_debug_changes =3D false;=0A=
       bitmap_clear (visited);=0A=
-      for (int index : worklist)=0A=
+      next->truncate (0);=0A=
+      for (int index : *curr)=0A=
 	{=0A=
 	  bb =3D BASIC_BLOCK_FOR_FN (fun, index);=0A=
-	  cprop_hardreg_bb (bb, all_vd, visited);=0A=
+          if (cprop_hardreg_bb (bb, all_vd, visited))=0A=
+	    next->safe_push (bb->index);=0A=
 	  if (all_vd[bb->index].n_debug_insn_changes)=0A=
 	    any_debug_changes =3D true;=0A=
 	}=0A=
@@ -1433,6 +1441,7 @@ pass_cprop_hardreg::execute (function *fun)=0A=
       df_analyze ();=0A=
       if (MAY_HAVE_DEBUG_BIND_INSNS && any_debug_changes)=0A=
 	cprop_hardreg_debug (fun, all_vd);=0A=
+      std::swap (curr, next);=0A=
     }=0A=
 =0A=
   free (all_vd);=0A=

------=_NextPart_000_006D_01D87777.8E2E9260--