From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 7395A3858C2D; Wed,  3 Aug 2022 20:14:11 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 7395A3858C2D
From: "roger at nextmovesoftware dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/106518] New: Exchange/swap aware register
 allocation (generate xchg in reload)
Date: Wed, 03 Aug 2022 20:14:11 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: unknown
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: roger at nextmovesoftware dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status
 bug_severity priority component assigned_to reporter target_milestone
Message-ID: <bug-106518-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 03 Aug 2022 20:14:11 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106518

            Bug ID: 106518
           Summary: Exchange/swap aware register allocation (generate xchg
                    in reload)
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: roger at nextmovesoftware dot com
  Target Milestone: ---

This enhancement request is a proposal for improving/tweaking GCC's register
allocation by assuming/making use of a register exchange/swap operation as a
useful abstraction.  Currently reload/lra is (solely) "move"-based, so when the
contents of regA need to be placed in regB and the original contents of regB
need to be placed in regA, they make use of a temporary register (or a spill)
and generate the classic sequence: tmp=regA; regA=regB; regB=tmp.

A small improvement is to tweak register allocation to assume, as a higher
level abstraction, the existence of an exchange/swap instruction, like x86's
xchg, much as is assumed/used during the reg-stack pass (with i387's fxch).
[https://gcc.gnu.org/legacy-ml/gcc-patches/2004-12/msg00815.html]

During early register allocation, we introduce virtual exchange operations
that can be lowered in a later pass, either to real exchange operations on
targets that support them, or to the standard three-move shuffle sequence
above, if there's a suitable spare temporary register, or alternatively to the
sequence regA^=regB; regB^=regA; regA^=regB, which implements an exchange using
three fast instructions without requiring an additional register.  These three
alternatives guarantee that register allocation is no worse than at present,
but has the flexibility to use fewer registers and perhaps fewer instructions.
On modern hardware, xchg is sometimes zero latency (using register renaming),
and on older architectures, a three-xor sequence has the same latency as three
moves, but requires one less register, helpfully reducing register pressure.
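
The three lowerings described above can be sketched in C on a simulated
register file.  This is only an illustration, not GCC code: the names
swap_lowering, choose_lowering, swap_three_moves and swap_three_xors are
hypothetical, and the preference order (real exchange, then three moves, then
three xors) is an assumption read off the order in which the alternatives are
listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lowering strategies for a virtual exchange operation.  */
enum swap_lowering { SWAP_XCHG, SWAP_THREE_MOVES, SWAP_THREE_XORS };

/* Assumed policy: use a real exchange if the target has one, else three
   moves if a scratch register is free, else the three-xor sequence.  */
static enum swap_lowering
choose_lowering (bool target_has_xchg, bool scratch_available)
{
  if (target_has_xchg)
    return SWAP_XCHG;
  if (scratch_available)
    return SWAP_THREE_MOVES;
  return SWAP_THREE_XORS;
}

/* tmp=regA; regA=regB; regB=tmp.  Needs one scratch register.  */
static void
swap_three_moves (uint64_t *regs, int a, int b, int tmp)
{
  regs[tmp] = regs[a];
  regs[a] = regs[b];
  regs[b] = regs[tmp];
}

/* regA^=regB; regB^=regA; regA^=regB.  Needs no scratch register, but
   a and b must be distinct: xor-swapping a register with itself would
   zero it.  */
static void
swap_three_xors (uint64_t *regs, int a, int b)
{
  assert (a != b);
  regs[a] ^= regs[b];
  regs[b] ^= regs[a];
  regs[a] ^= regs[b];
}
```

Calling swap_three_moves (regs, 0, 1, 2) exchanges regs[0] and regs[1] while
clobbering regs[2]; swap_three_xors (regs, 0, 1) performs the same exchange
and leaves every other register untouched.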

An example application/benefit of this is PR rtl-optimization/97756, which
demonstrates that the x86_64 ABI frequently places (TImode double word)
registers in locations that then need the high and low parts to be swapped
(or moved) to place them in the (reg X) and (reg X+1) locations required by
GCC's multi-word register allocation requirements.

Interestingly, GCC's middle-end doesn't have a standard named pattern for an
exchange/swap instruction, i.e. an optab, so currently it has no (easy) way of
deciding whether a target has an xchg-like instruction, which helps explain why
it doesn't currently use/generate them.