public inbox for gcc-patches@gcc.gnu.org
* [X86 PATCH] Implement doubleword shift left by 1 bit using add+adc.
From: Roger Sayle @ 2023-10-05 11:43 UTC
  To: gcc-patches; +Cc: 'Uros Bizjak'


This patch tweaks the i386 back-end's ix86_split_ashl to implement
doubleword left shifts by 1 bit using an add followed by an add-with-carry
(i.e. a doubleword x+x) instead of x86's shld instruction.
The replacement sequence requires fewer bytes and is faster on both
Intel and AMD architectures (according to Agner Fog's latency tables,
and confirmed by my own microbenchmarking).

For the test case:
__int128 foo(__int128 x) { return x << 1; }

with -O2 we previously generated:

foo:    movq    %rdi, %rax
        movq    %rsi, %rdx
        shldq   $1, %rdi, %rdx
        addq    %rdi, %rax
        ret

with this patch we now generate:

foo:    movq    %rdi, %rax
        movq    %rsi, %rdx
        addq    %rdi, %rax
        adcq    %rsi, %rdx
        ret
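
To see why the two sequences compute the same result, here is a minimal
C sketch (not part of the patch) that models a doubleword as two 64-bit
halves. The names struct dw, shl1_shld, shl1_addadc and shl1_agree are
hypothetical, chosen only for illustration, and the carry flag is modelled
as an unsigned overflow check on the low-half addition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a doubleword value as two 64-bit halves.  */
struct dw { uint64_t lo, hi; };

/* Old sequence: shld $1 shifts the top bit of lo into hi,
   then the low half is doubled.  */
static struct dw shl1_shld (struct dw x)
{
  struct dw r;
  r.hi = (x.hi << 1) | (x.lo >> 63);   /* shldq $1, lo, hi */
  r.lo = x.lo + x.lo;                  /* addq  lo, lo     */
  return r;
}

/* New sequence: the add sets the carry flag, the adc consumes it.  */
static struct dw shl1_addadc (struct dw x)
{
  struct dw r;
  r.lo = x.lo + x.lo;                  /* addq: low half doubled          */
  uint64_t carry = r.lo < x.lo;        /* CF: unsigned overflow of the add */
  r.hi = x.hi + x.hi + carry;          /* adcq: high half doubled plus CF */
  return r;
}

/* Returns 1 if both models agree for the given halves.  */
static int shl1_agree (uint64_t lo, uint64_t hi)
{
  struct dw x = { lo, hi };
  struct dw a = shl1_shld (x);
  struct dw b = shl1_addadc (x);
  return a.lo == b.lo && a.hi == b.hi;
}
```

The interesting inputs are those with the top bit of the low half set,
since that is the only case in which the carry is non-zero.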


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2023-10-05  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        * config/i386/i386-expand.cc (ix86_split_ashl): Split shifts by
        one into add3_cc_overflow_1 followed by add3_carry.
        * config/i386/i386.md (@add<mode>3_cc_overflow_1): Renamed from
        "*add<mode>3_cc_overflow_1" to provide generator function.

gcc/testsuite/ChangeLog
        * gcc.target/i386/ashldi3-2.c: New 32-bit test case.
        * gcc.target/i386/ashlti3-3.c: New 64-bit test case.


Thanks in advance,
Roger
--




* Re: [X86 PATCH] Implement doubleword shift left by 1 bit using add+adc.
From: Uros Bizjak @ 2023-10-05 12:06 UTC
  To: Roger Sayle; +Cc: gcc-patches

On Thu, Oct 5, 2023 at 1:45 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
> Doh! ENOPATCH.
>

OK.

Thanks,
Uros.


* RE: [X86 PATCH] Implement doubleword shift left by 1 bit using add+adc.
From: Roger Sayle @ 2023-10-05 11:45 UTC
  To: gcc-patches; +Cc: 'Uros Bizjak'

[-- Attachment #1: Type: text/plain, Size: 1871 bytes --]

Doh! ENOPATCH.



[-- Attachment #2: patchrr.txt --]
[-- Type: text/plain, Size: 2249 bytes --]

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index e42ff27..09e41c8 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -6342,6 +6342,18 @@ ix86_split_ashl (rtx *operands, rtx scratch, machine_mode mode)
 	  if (count > half_width)
 	    ix86_expand_ashl_const (high[0], count - half_width, mode);
 	}
+      else if (count == 1)
+	{
+	  if (!rtx_equal_p (operands[0], operands[1]))
+	    emit_move_insn (operands[0], operands[1]);
+	  rtx x3 = gen_rtx_REG (CCCmode, FLAGS_REG);
+	  rtx x4 = gen_rtx_LTU (mode, x3, const0_rtx);
+	  half_mode = mode == DImode ? SImode : DImode;
+	  emit_insn (gen_add3_cc_overflow_1 (half_mode, low[0],
+					     low[0], low[0]));
+	  emit_insn (gen_add3_carry (half_mode, high[0], high[0], high[0],
+				     x3, x4));
+	}
       else
 	{
 	  gen_shld = mode == DImode ? gen_x86_shld : gen_x86_64_shld;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index eef8a0e..6a5bc16 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -8864,7 +8864,7 @@
   [(set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*add<mode>3_cc_overflow_1"
+(define_insn "@add<mode>3_cc_overflow_1"
   [(set (reg:CCC FLAGS_REG)
 	(compare:CCC
 	    (plus:SWI
diff --git a/gcc/testsuite/gcc.target/i386/ashldi3-2.c b/gcc/testsuite/gcc.target/i386/ashldi3-2.c
new file mode 100644
index 0000000..053389d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ashldi3-2.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2 -mno-stv" } */
+
+long long foo(long long x)
+{
+  return x << 1;
+}
+
+/* { dg-final { scan-assembler "adcl" } } */
+/* { dg-final { scan-assembler-not "shldl" } } */
diff --git a/gcc/testsuite/gcc.target/i386/ashlti3-3.c b/gcc/testsuite/gcc.target/i386/ashlti3-3.c
new file mode 100644
index 0000000..4f14ca0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ashlti3-3.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+
+__int128 foo(__int128 x)
+{
+  return x << 1;
+}
+
+/* { dg-final { scan-assembler "adcq" } } */
+/* { dg-final { scan-assembler-not "shldq" } } */
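
The two new tests above only scan the generated assembly. A run-time
check along the following lines could additionally confirm that the carry
propagates between the two halves. This is a sketch and not part of the
patch: shl1 is a hypothetical helper, an unsigned type is used to keep the
shift well defined, and the noinline attribute is only an attempt to keep
the shift from being folded away at the call site.

```c
#include <assert.h>

/* Hypothetical helper: a shift by one that the compiler must emit as
   real code.  On ia32, unsigned long long is a doubleword type, so this
   exercises the low-to-high carry.  */
__attribute__((noinline))
static unsigned long long shl1 (unsigned long long x)
{
  return x << 1;
}

/* Returns 1 if the shift produces the expected values, including the
   cases where bit 31 of the input crosses into the high 32-bit half.  */
static int check_shl1 (void)
{
  return shl1 (0x80000000ULL) == 0x100000000ULL
	 && shl1 (0xffffffffULL) == 0x1fffffffeULL
	 && shl1 (0x8000000000000000ULL) == 0
	 && shl1 (1) == 2;
}
```

Calling check_shl1 from main and aborting on a zero result would turn
this into an executable test.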
