public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/106518] New: Exchange/swap aware register allocation (generate xchg in reload)
@ 2022-08-03 20:14 roger at nextmovesoftware dot com
  2022-08-04  9:32 ` [Bug rtl-optimization/106518] " rguenth at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: roger at nextmovesoftware dot com @ 2022-08-03 20:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106518

            Bug ID: 106518
           Summary: Exchange/swap aware register allocation (generate xchg
                    in reload)
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: roger at nextmovesoftware dot com
  Target Milestone: ---

This enhancement request is a proposal for improving/tweaking GCC's register
allocation by assuming/making use of a register exchange/swap operation as a
useful abstraction.  Currently reload/lra is (solely) "move"-based, so when the
contents of regA need to be placed in regB and the original contents of regB
need to be placed in regA, they make use of a temporary register (or a spill)
and generate the classic sequence: tmp=regA; regA=regB; regB=tmp.

A small improvement is to tweak register allocation to assume, as a higher
level abstraction, the existence of an exchange/swap instruction, like x86's
xchg, much as is assumed/used during the reg-stack pass (with i387's fxch).
[https://gcc.gnu.org/legacy-ml/gcc-patches/2004-12/msg00815.html]

During early register allocation, we introduce virtual exchange operations
that can be lowered in a later pass, either to real exchange operations on
targets that support them, or to the standard three-move shuffle sequence
above, if there's a suitable spare temporary register, or alternatively to the
sequence regA^=regB; regB^=regA; regA^=regB, which implements an exchange using
three fast instructions without requiring an additional register.  These three
alternatives guarantee that register allocation is no worse than it is
currently, but has the flexibility to use fewer registers and perhaps fewer
instructions.
On modern hardware, xchg is sometimes zero latency (using register renaming),
and on older architectures, a three xor sequence has the same latency as three
moves, but requires one less register, helpfully reducing register pressure.
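
As a rough illustration of the two software lowerings described above, here is
a minimal C sketch (not GCC code).  The function names swap_via_temp and
swap_via_xor are hypothetical, and memory locations stand in for the two
registers.  Note that the xor form is only valid when the two operands are
distinct locations, which is always the case for two different hard registers.

```c
#include <assert.h>
#include <stdint.h>

/* Three-move shuffle: tmp=regA; regA=regB; regB=tmp.
   Needs one scratch location (tmp).  */
static void swap_via_temp(uint32_t *a, uint32_t *b)
{
    uint32_t tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Three-xor exchange: regA^=regB; regB^=regA; regA^=regB.
   Needs no scratch location, but a and b must not alias:
   if they did, the first xor would zero the value.  */
static void swap_via_xor(uint32_t *a, uint32_t *b)
{
    assert(a != b);
    *a ^= *b;   /* a = A ^ B */
    *b ^= *a;   /* b = B ^ (A ^ B) = A */
    *a ^= *b;   /* a = (A ^ B) ^ A = B */
}
```

Both functions leave the two values exchanged; the difference is only in how
many locations are live while the exchange is in progress.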

An example application/benefit of this is PR rtl-optimization/97756, which
demonstrates that the x86_64 ABI frequently places (TImode double word)
registers in locations that then need the high and low parts to be swapped
(or moved) to place them in the (reg X) and (reg X+1) locations required by
GCC's multi-word register allocation requirements.

Interestingly, GCC's middle-end doesn't have a standard named pattern for an
exchange/swap instruction, i.e. an optab, so currently it has no (easy) way of
deciding whether a target has an xchg-like instruction, which helps explain why
it doesn't currently use/generate them.


* [Bug rtl-optimization/106518] Exchange/swap aware register allocation (generate xchg in reload)
  2022-08-03 20:14 [Bug rtl-optimization/106518] New: Exchange/swap aware register allocation (generate xchg in reload) roger at nextmovesoftware dot com
@ 2022-08-04  9:32 ` rguenth at gcc dot gnu.org
  2022-08-04  9:53 ` hubicka at gcc dot gnu.org
  2023-04-26  8:11 ` cvs-commit at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-08-04  9:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106518

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization, ra

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
one could try to recog

 (parallel
   (set (reg A) (reg B))
   (set (reg B) (reg A)))

?  But yes, having a standard name for this would be nice.


* [Bug rtl-optimization/106518] Exchange/swap aware register allocation (generate xchg in reload)
  2022-08-03 20:14 [Bug rtl-optimization/106518] New: Exchange/swap aware register allocation (generate xchg in reload) roger at nextmovesoftware dot com
  2022-08-04  9:32 ` [Bug rtl-optimization/106518] " rguenth at gcc dot gnu.org
@ 2022-08-04  9:53 ` hubicka at gcc dot gnu.org
  2023-04-26  8:11 ` cvs-commit at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: hubicka at gcc dot gnu.org @ 2022-08-04  9:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106518

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #2 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
We have xchg patterns in i386.md and a corresponding peephole. I played with
this a long time ago and it gave no performance benefit, because xchg at that
time was not well optimized in CPUs.
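
To make the idea of such a peephole concrete, here is a toy C sketch that
rewrites the three-move shuffle into a single exchange.  It is not the i386.md
peephole and does not use GCC's RTL; the toy_insn structure, the src_dies flag
and the function name toy_peephole_xchg are all hypothetical.  The rewrite is
only safe when the temporary is dead after the third move, which the sketch
models with the src_dies flag.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_op { TOY_MOVE, TOY_XCHG, TOY_NOP };

struct toy_insn {
    enum toy_op op;
    int dst;        /* destination register number */
    int src;        /* source register number */
    bool src_dies;  /* true if src is not used after this instruction */
};

/* Scan for  tmp=regA; regA=regB; regB=tmp  with tmp dead afterwards and
   replace it with one exchange of regA and regB plus two no-ops.
   Returns the number of sequences rewritten.  */
static size_t toy_peephole_xchg(struct toy_insn *insns, size_t n)
{
    size_t rewritten = 0;
    for (size_t i = 0; i + 2 < n; i++) {
        struct toy_insn *a = &insns[i];
        struct toy_insn *b = &insns[i + 1];
        struct toy_insn *c = &insns[i + 2];
        if (a->op != TOY_MOVE || b->op != TOY_MOVE || c->op != TOY_MOVE)
            continue;
        int tmp = a->dst, reg_a = a->src, reg_b = b->src;
        /* The moves must form the shuffle on three distinct registers.  */
        if (b->dst != reg_a || c->dst != reg_b || c->src != tmp)
            continue;
        if (tmp == reg_a || tmp == reg_b || reg_a == reg_b)
            continue;
        /* If tmp is still live, it must keep the old value of regA.  */
        if (!c->src_dies)
            continue;
        a->op = TOY_XCHG;
        a->dst = reg_a;
        a->src = reg_b;
        a->src_dies = false;
        b->op = TOY_NOP;
        c->op = TOY_NOP;
        rewritten++;
        i += 2;     /* skip the two instructions just turned into no-ops */
    }
    return rewritten;
}
```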

With regstack the main problem is that RTL after reg-stack is inconsistent
since we have no way to explicitly express push/pop operations that renumber
the registers.  Some years ago I made a patch for that:
https://gcc.gnu.org/pipermail/gcc-patches/1999-November/021921.html

Even if you make the representation correct, register allocation for a
stack-based CPU is quite different from normal register allocation.

These days I would rather see x87 silently die.


* [Bug rtl-optimization/106518] Exchange/swap aware register allocation (generate xchg in reload)
  2022-08-03 20:14 [Bug rtl-optimization/106518] New: Exchange/swap aware register allocation (generate xchg in reload) roger at nextmovesoftware dot com
  2022-08-04  9:32 ` [Bug rtl-optimization/106518] " rguenth at gcc dot gnu.org
  2022-08-04  9:53 ` hubicka at gcc dot gnu.org
@ 2023-04-26  8:11 ` cvs-commit at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-04-26  8:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106518

--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:1f0bfbb26e532cef7347a91439008114fd88173a

commit r14-245-g1f0bfbb26e532cef7347a91439008114fd88173a
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Wed Apr 26 09:10:06 2023 +0100

    [xstormy16] Add support for byte and word swapping instructions.

    This patch adds support for xstormy16's swpb (swap bytes) and swpw (swap
    words) instructions.  The most obvious application of these is to implement
    the __builtin_bswap16 and __builtin_bswap32 intrinsics.

    Currently, __builtin_bswap16 is implemented as:
    foo:    mov r7,r2
            shl r7,#8
            shr r2,#8
            or r2,r7
            ret

    but with this patch becomes:
    foo:    swpb r2
            ret

    Likewise, __builtin_bswap32 now becomes:
    foo:    swpb r2 | swpb r3 | swpw r2,r3
            ret

    Finally, the swpw instruction on its own can be used to exchange
    two word mode registers without a temporary, so a new pattern and
    peephole2 have been added to catch this.  As described in the
    PR rtl-optimization/106518, register allocation can (in theory)
    be more efficient on targets that provide a swap/exchange instruction.
    The slightly unusual swap<mode> naming matches that used in i386.md.

    2023-04-26  Roger Sayle  <roger@nextmovesoftware.com>

    gcc/ChangeLog
            * config/stormy16/stormy16.md (bswaphi2): New define_insn.
            (bswapsi2): New define_insn.
            (swaphi): New define_insn to exchange two registers (swpw).
            (define_peephole2): Recognize exchange of registers as swaphi.

    gcc/testsuite/ChangeLog
            * gcc.target/xstormy16/bswap16.c: New test case.
            * gcc.target/xstormy16/bswap32.c: Likewise.
            * gcc.target/xstormy16/swpb.c: Likewise.
            * gcc.target/xstormy16/swpw-1.c: Likewise.
            * gcc.target/xstormy16/swpw-2.c: Likewise.
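
The byte-swap sequences in the commit message above can be modelled in
portable C.  The sketch below is an illustration only, not code from the
patch: the function names are hypothetical, and the mapping of the 32-bit
value's halves onto r2 and r3 is not specified here.  It shows that the
shift/or sequence computes a 16-bit byte swap, and that a 32-bit byte swap is
two 16-bit byte swaps followed by an exchange of the two words.

```c
#include <assert.h>
#include <stdint.h>

/* The pre-patch shift/or sequence for a 16-bit byte swap,
   which the patch replaces with a single swpb.  */
static uint16_t bswap16_shift_or(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* A 32-bit byte swap as: byte-swap each 16-bit half (two swpb),
   then exchange the halves (one swpw).  */
static uint32_t bswap32_by_halves(uint32_t x)
{
    uint16_t lo = bswap16_shift_or((uint16_t)x);
    uint16_t hi = bswap16_shift_or((uint16_t)(x >> 16));
    return ((uint32_t)lo << 16) | hi;   /* halves exchanged */
}

/* Compare against the compiler builtins named in the commit message
   (available in GCC and Clang).  */
static void bswap_sketch_selfcheck(void)
{
    assert(bswap16_shift_or(0x1234) == __builtin_bswap16(0x1234));
    assert(bswap32_by_halves(0x11223344u) == __builtin_bswap32(0x11223344u));
}
```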
