public inbox for gcc-patches@gcc.gnu.org
* [x86 PATCH] Use xchg for DImode double word rotate by 32 bits with -m32.
@ 2022-06-26 15:54 Roger Sayle
  2022-06-26 18:27 ` Uros Bizjak
From: Roger Sayle @ 2022-06-26 15:54 UTC
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2288 bytes --]


This patch was motivated by the investigation of Linus Torvalds'
spill-heavy cryptography kernels in PR 105930.  The <any_rotate>di3
expander handles all rotations by an immediate constant of 1..63 bits
with the exception of 32 bits, for which it FAILs, leaving the rotation
to be split by the middle-end.  This patch makes these 32-bit doubleword
rotations consistent with the other DImode rotations during reload,
which results in reduced register pressure, fewer instructions and the
use of x86's xchg instruction when appropriate.  In theory, xchg can be
handled by register renaming, but even on micro-architectures where it
is implemented as 3 uops (no worse than a three-instruction shuffle),
not having to nominate a "temporary" register reduces user-visible
register pressure (and has obvious code-size benefits).
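Why a rotation by exactly 32 bits collapses to a swap can be checked in a few lines of plain C. This is a sketch only, not part of the patch; the helper names rotl64, rotr64, swap_halves and check_rotate_by_32 are hypothetical. It shows that rotating a 64-bit value left or right by 32 (the testcase's (x>>32) | (x<<32) is the right-rotate form) gives the same result as exchanging its two 32-bit halves. On ia32, where the value lives in a pair of 32-bit registers, that is one xchgl.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: rotate a 64-bit value by n bits, 0 < n < 64.  */
static uint64_t rotl64 (uint64_t x, unsigned n)
{
  return (x << n) | (x >> (64 - n));
}

static uint64_t rotr64 (uint64_t x, unsigned n)
{
  return (x >> n) | (x << (64 - n));
}

/* Exchange the low and high 32-bit words, as a single xchgl does
   when the two halves live in a pair of ia32 registers.  */
static uint64_t swap_halves (uint64_t x)
{
  uint32_t lo = (uint32_t) x;
  uint32_t hi = (uint32_t) (x >> 32);
  return ((uint64_t) lo << 32) | hi;
}

/* Rotating by 32 in either direction is the same as swapping halves,
   which is why one split serves both rotate and rotatert.  */
static void check_rotate_by_32 (uint64_t x)
{
  assert (rotl64 (x, 32) == rotr64 (x, 32));
  assert (rotr64 (x, 32) == swap_halves (x));
}
```

For example, swap_halves (0x0123456789abcdefULL) yields 0x89abcdef01234567ULL.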

The effects are best shown with the new testcase:

unsigned long long bar();
unsigned long long foo()
{
  unsigned long long x = bar();
  return (x>>32) | (x<<32);
}

for which GCC with -m32 -O2 currently generates:

        subl    $12, %esp
        call    bar
        addl    $12, %esp
        movl    %eax, %ecx
        movl    %edx, %eax
        movl    %ecx, %edx
        ret

but with this patch now generates:

        subl    $12, %esp
        call    bar
        addl    $12, %esp
        xchgl   %edx, %eax
        ret
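The two sequences above can be compared on a toy register model. This is a hypothetical sketch (struct regs and the two functions are invented for illustration), relying only on the ia32 convention that a 64-bit return value arrives in %edx:%eax with the low word in %eax. Both versions leave the same values in %eax and %edx, but the three-move version also overwrites %ecx, which is the "temporary" register the patch avoids nominating.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the three ia32 registers involved.  On ia32 a 64-bit
   return value arrives in %edx:%eax, low word in %eax.  */
struct regs
{
  uint32_t eax, ecx, edx;
};

/* Before the patch: three moves, borrowing %ecx as a temporary.  */
static struct regs rotate32_via_movs (struct regs r)
{
  r.ecx = r.eax;  /* movl %eax, %ecx */
  r.eax = r.edx;  /* movl %edx, %eax */
  r.edx = r.ecx;  /* movl %ecx, %edx */
  return r;
}

/* After the patch: one exchange; %ecx is never touched.  */
static struct regs rotate32_via_xchg (struct regs r)
{
  uint32_t old_edx = r.edx;  /* C needs a local here; xchgl does not */
  r.edx = r.eax;             /* xchgl %edx, %eax ...                 */
  r.eax = old_edx;           /* ... updates both registers at once   */
  return r;
}
```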

With this patch, the number of lines of assembly language generated
for the blake2b kernel (from the attachment to PR105930) decreases
from 5626 to 5404, i.e. 222 fewer lines (about 4%).  Despite this
reduction in instruction count, the stack frame size is unchanged.
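For context only: BLAKE2b's G mixing function, as specified in RFC 7693, rotates 64-bit words right by 32, 24, 16 and 63 bits, so every G invocation contains one 64-bit rotate by 32, which on ia32 is a DImode rotation of exactly the kind this patch changes. The sketch below follows the RFC's description; the names rotr64 and blake2b_g are chosen here for illustration, and whether the PR105930 kernel is written in precisely this form is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n bits; n must be in 1..63.  */
static uint64_t rotr64 (uint64_t x, unsigned n)
{
  return (x >> n) | (x << (64 - n));
}

/* BLAKE2b G mixing function (RFC 7693, section 3.1) operating on
   the 16-word working vector v with message words x and y.  */
static void blake2b_g (uint64_t v[16], int a, int b, int c, int d,
                       uint64_t x, uint64_t y)
{
  v[a] = v[a] + v[b] + x;
  v[d] = rotr64 (v[d] ^ v[a], 32);  /* the rotate by 32 discussed above */
  v[c] = v[c] + v[d];
  v[b] = rotr64 (v[b] ^ v[c], 24);
  v[a] = v[a] + v[b] + y;
  v[d] = rotr64 (v[d] ^ v[a], 16);
  v[c] = v[c] + v[d];
  v[b] = rotr64 (v[b] ^ v[c], 63);
}
```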

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-06-26  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/i386/i386.md (swap<mode>): Rename from *swap<mode> to
	provide gen_swapsi.
	(<any_rotate>di3): Handle !TARGET_64BIT rotations by 32 bits via
	the new gen_ix86_<insn>32di2_doubleword below.
	(ix86_<insn>32di2_doubleword): New define_insn_and_split
	that splits after reload as either a pair of move instructions
	or an xchgl (using gen_swapsi).

gcc/testsuite/ChangeLog
	* gcc.target/i386/xchg-3.c: New test case.
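The post-reload split described in the ChangeLog can be modelled in plain C to see why the in-place case is the only one that needs xchgl. This is a hypothetical sketch, not GCC code: registers are an array, and split_rotate32 and check_all_pairs are invented names. It covers only the register alternative (not the "o" memory one) and assumes, as GCC does for multi-word hard registers, that a DImode value in hard register N occupies registers N and N+1, low word first. Under that assumption the two moves clobber a still-needed input only when the low destination word coincides with the low source word, which corresponds to the rtx_equal_p test in the pattern.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of splitting
   (set (reg:DI dst) (rotate:DI (reg:DI src) (const_int 32)))
   on ia32.  Assumption: a DImode value in hard register N occupies
   registers N (low word) and N + 1 (high word).  */
enum { NREGS = 8 };

static void split_rotate32 (uint32_t r[NREGS], int dst, int src)
{
  int dst_lo = dst, dst_hi = dst + 1;
  int src_lo = src, src_hi = src + 1;

  if (dst_lo == src_lo)
    {
      /* In-place rotation: the gen_swapsi path, i.e. one xchgl.  */
      uint32_t t = r[dst_lo];
      r[dst_lo] = r[dst_hi];
      r[dst_hi] = t;
    }
  else
    {
      /* Otherwise two moves, in the order the split pattern lists them.  */
      r[dst_lo] = r[src_hi];
      r[dst_hi] = r[src_lo];
    }
}

/* Check every (dst, src) register pair against the expected result.  */
static int check_all_pairs (void)
{
  for (int dst = 0; dst + 1 < NREGS; dst++)
    for (int src = 0; src + 1 < NREGS; src++)
      {
        uint32_t r[NREGS];
        for (int i = 0; i < NREGS; i++)
          r[i] = 0x11111111u * (uint32_t) (i + 1);  /* distinct values */
        uint32_t want_lo = r[src + 1], want_hi = r[src];
        split_rotate32 (r, dst, src);
        if (r[dst] != want_lo || r[dst + 1] != want_hi)
          return 0;
      }
  return 1;
}
```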


Thanks in advance,
Roger
--


[-- Attachment #2: patchrd2.txt --]
[-- Type: text/plain, Size: 2069 bytes --]

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index dd173f7..ab94866 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2966,7 +2966,7 @@
    (set_attr "memory" "load")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*swap<mode>"
+(define_insn "swap<mode>"
   [(set (match_operand:SWI48 0 "register_operand" "+r")
 	(match_operand:SWI48 1 "register_operand" "+r"))
    (set (match_dup 1)
@@ -13648,6 +13648,8 @@
   else if (const_1_to_31_operand (operands[2], VOIDmode))
     emit_insn (gen_ix86_<insn>di3_doubleword
 		(operands[0], operands[1], operands[2]));
+  else if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 32)
+    emit_insn (gen_ix86_<insn>32di2_doubleword (operands[0], operands[1]));
   else
     FAIL;
 
@@ -13820,6 +13822,24 @@
   split_double_mode (<DWI>mode, &operands[0], 1, &operands[4], &operands[5]);
 })
 
+(define_insn_and_split "ix86_<insn>32di2_doubleword"
+ [(set (match_operand:DI 0 "register_operand" "=r,r")
+       (any_rotate:DI (match_operand:DI 1 "nonimmediate_operand" "r,o")
+                      (const_int 32)))]
+ "!TARGET_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0) (match_dup 3))
+  (set (match_dup 2) (match_dup 1))]
+{
+  split_double_mode (DImode, &operands[0], 2, &operands[0], &operands[2]);
+  if (rtx_equal_p (operands[0], operands[1]))
+    {
+      emit_insn (gen_swapsi (operands[0], operands[2]));
+      DONE;
+    }
+})
+
 (define_mode_attr rorx_immediate_operand
 	[(SI "const_0_to_31_operand")
 	 (DI "const_0_to_63_operand")])
diff --git a/gcc/testsuite/gcc.target/i386/xchg-3.c b/gcc/testsuite/gcc.target/i386/xchg-3.c
new file mode 100644
index 0000000..eec05f0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/xchg-3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2" } */
+
+unsigned long long bar();
+
+unsigned long long foo()
+{
+  unsigned long long x = bar();
+  return (x>>32) | (x<<32);
+}
+
+/*{ dg-final { scan-assembler "xchgl" } } */


* Re: [x86 PATCH] Use xchg for DImode double word rotate by 32 bits with -m32.
  2022-06-26 15:54 [x86 PATCH] Use xchg for DImode double word rotate by 32 bits with -m32 Roger Sayle
@ 2022-06-26 18:27 ` Uros Bizjak
From: Uros Bizjak @ 2022-06-26 18:27 UTC
  To: Roger Sayle; +Cc: gcc-patches

On Sun, Jun 26, 2022 at 5:54 PM Roger Sayle <roger@nextmovesoftware.com> wrote:

+(define_insn_and_split "ix86_<insn>32di2_doubleword"

We don't encode the target in the insn name; <insn>32di2_doubleword
should be OK.

+ [(set (match_operand:DI 0 "register_operand" "=r,r")
+       (any_rotate:DI (match_operand:DI 1 "nonimmediate_operand" "r,o")
+                      (const_int 32)))]

Please use "=r,r,r"/"0,r,o" constraints here.

Uros.
