public inbox for gcc-patches@gcc.gnu.org
* [x86 PATCH] Add additional variant of bswaphisi2_lowpart peephole2.
@ 2024-07-01 13:20 Roger Sayle
  2024-07-01 18:46 ` Uros Bizjak
  0 siblings, 1 reply; 2+ messages in thread
From: Roger Sayle @ 2024-07-01 13:20 UTC
  To: gcc-patches; +Cc: 'Uros Bizjak'

[-- Attachment #1: Type: text/plain, Size: 1361 bytes --]


This patch adds an additional variant of the peephole2 that converts
bswaphisi2_lowpart into rotlhi3_1_slp, i.e. that replaces xchgb %ah,%al
with rolw $8,%ax when the flags register isn't live.  The motivating
example is:

void ext(int x);
void foo(int x)
{
  ext((x&~0xffff)|((x>>8)&0xff)|((x&0xff)<<8));
}

where GCC with -O2 currently produces:

foo:    movl    %edi, %eax
        rolw    $8, %ax
        movl    %eax, %edi
        jmp     ext

The issue is that the original xchgb (bswaphisi2_lowpart) can only be
performed in "Q" registers, those that have an addressable %?h high-byte
register, so reload generates the two movl instructions above.  It is
only later, in peephole2, that we see that the flags register (FLAGS_REG)
can be clobbered, so that we can use a rotate word, which is more
forgiving about register allocation.  With the additional peephole2
proposed here, we now generate:

foo:    rolw    $8, %di
        jmp     ext
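
For anyone wanting to check the equivalence that the peephole2 relies on,
here is a minimal C sketch; it is not part of the patch.  The function
names are hypothetical, unsigned types are used in place of the int in
the example to keep the shifts well defined, and __builtin_bswap32 is the
GCC/Clang builtin standing in for the RTL bswap:SI.  It compares the
source expression, the shape matched by the pattern, and a 16-bit rotate
by 8 that leaves the upper half untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Source-level form from the motivating example: keep the upper 16 bits
   and swap the two low bytes.  */
static uint32_t swap_low_bytes_expr(uint32_t x)
{
    return (x & ~0xffffu) | ((x >> 8) & 0xff) | ((x & 0xff) << 8);
}

/* Shape matched by the peephole2:
   (ior (and x -65536) (lshiftrt (bswap x) 16)).  */
static uint32_t swap_low_bytes_bswap(uint32_t x)
{
    return (x & 0xffff0000u) | (__builtin_bswap32(x) >> 16);
}

/* What rolw $8 computes: rotate the low 16 bits by 8 and leave the
   upper 16 bits unchanged (the strict_low_part rotate:HI).  */
static uint32_t swap_low_bytes_rotate(uint32_t x)
{
    uint16_t lo = (uint16_t)x;
    lo = (uint16_t)((lo << 8) | (lo >> 8));
    return (x & 0xffff0000u) | lo;
}

/* Assert that all three forms agree on a handful of sample values.  */
static void check_equivalence(void)
{
    static const uint32_t samples[] = {
        0x00000000u, 0x000000ffu, 0x0000ff00u, 0x12345678u,
        0x8000007fu, 0xffff00ffu, 0xffffffffu
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        uint32_t x = samples[i];
        assert(swap_low_bytes_expr(x) == swap_low_bytes_bswap(x));
        assert(swap_low_bytes_expr(x) == swap_low_bytes_rotate(x));
    }
}
```

Calling check_equivalence() from a test driver should pass silently.  The
sketch only illustrates why a rotate can stand in for the byte exchange;
whether the rotate is actually usable still depends on the flags register
being dead at that point, as described above.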


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2024-07-01  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        * config/i386/i386.md (bswaphisi2_lowpart peephole2): New
        peephole2 variant to eliminate register shuffling.

gcc/testsuite/ChangeLog
        * gcc.target/i386/xchg-4.c: New test case.


Thanks again,
Roger
--


[-- Attachment #2: patchbs.txt --]
[-- Type: text/plain, Size: 1815 bytes --]

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index b6ccb1e..9bc0eb7 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -21459,6 +21459,30 @@
 	      (clobber (reg:CC FLAGS_REG))])]
   "operands[0] = gen_lowpart (HImode, operands[0]);")
 
+;; Variant of above peephole2 to improve register allocation.
+(define_peephole2
+  [(set (match_operand:SI 0 "general_reg_operand")
+        (match_operand:SI 1 "register_operand"))
+   (set (match_dup 0)
+	(ior:SI (and:SI (match_dup 0)
+			(const_int -65536))
+		(lshiftrt:SI (bswap:SI (match_dup 0))
+			     (const_int 16))))
+   (set (match_operand:SI 2 "general_reg_operand") (match_dup 0))]
+  "!(TARGET_USE_XCHGB ||
+     TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
+   && peep2_regno_dead_p (0, FLAGS_REG)
+   && peep2_reg_dead_p (3, operands[0])"
+  [(parallel
+    [(set (strict_low_part (match_dup 3))
+	  (rotate:HI (match_dup 3) (const_int 8)))
+     (clobber (reg:CC FLAGS_REG))])]
+{
+  if (!rtx_equal_p (operands[1], operands[2]))
+    emit_move_insn (operands[2], operands[1]);
+  operands[3] = gen_lowpart (HImode, operands[2]);
+})
+
 (define_expand "paritydi2"
   [(set (match_operand:DI 0 "register_operand")
 	(parity:DI (match_operand:DI 1 "register_operand")))]
diff --git a/gcc/testsuite/gcc.target/i386/xchg-4.c b/gcc/testsuite/gcc.target/i386/xchg-4.c
new file mode 100644
index 0000000..de099e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/xchg-4.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+
+void ext(int x);
+void foo(int x) 
+{
+    ext((x&~0xffff)|((x>>8)&0xff)|((x&0xff)<<8));
+}
+
+/* { dg-final { scan-assembler "rolw" } } */
+/* { dg-final { scan-assembler-not "mov" } } */


* Re: [x86 PATCH] Add additional variant of bswaphisi2_lowpart peephole2.
  2024-07-01 13:20 [x86 PATCH] Add additional variant of bswaphisi2_lowpart peephole2 Roger Sayle
@ 2024-07-01 18:46 ` Uros Bizjak
  0 siblings, 0 replies; 2+ messages in thread
From: Uros Bizjak @ 2024-07-01 18:46 UTC
  To: Roger Sayle; +Cc: gcc-patches

On Mon, Jul 1, 2024 at 3:20 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch adds an additional variant of the peephole2 that converts
> bswaphisi2_lowpart into rotlhi3_1_slp, i.e. that replaces xchgb %ah,%al
> with rolw $8,%ax when the flags register isn't live.  The motivating
> example is:
>
> void ext(int x);
> void foo(int x)
> {
>   ext((x&~0xffff)|((x>>8)&0xff)|((x&0xff)<<8));
> }
>
> where GCC with -O2 currently produces:
>
> foo:    movl    %edi, %eax
>         rolw    $8, %ax
>         movl    %eax, %edi
>         jmp     ext
>
> The issue is that the original xchgb (bswaphisi2_lowpart) can only be
> performed in "Q" registers, those that have an addressable %?h high-byte
> register, so reload generates the two movl instructions above.  It is
> only later, in peephole2, that we see that the flags register (FLAGS_REG)
> can be clobbered, so that we can use a rotate word, which is more
> forgiving about register allocation.  With the additional peephole2
> proposed here, we now generate:
>
> foo:    rolw    $8, %di
>         jmp     ext
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2024-07-01  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386.md (bswaphisi2_lowpart peephole2): New
>         peephole2 variant to eliminate register shuffling.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/xchg-4.c: New test case.

OK.

Thanks,
Uros.

>
>
> Thanks again,
> Roger
> --
>
