* [x86 PATCH] PR 106245: Split (x<<31)>>31 as -(x&1) in i386.md
@ 2023-10-17 13:08 Roger Sayle
2023-10-17 16:37 ` Uros Bizjak
0 siblings, 1 reply; 4+ messages in thread
From: Roger Sayle @ 2023-10-17 13:08 UTC (permalink / raw)
To: gcc-patches; +Cc: 'Uros Bizjak'
[-- Attachment #1: Type: text/plain, Size: 1421 bytes --]
This patch is the backend piece of a solution to PRs 101955 and 106245.
It adds a define_insn_and_split to the i386 backend to perform sign
extension of a single (least significant) bit using AND $1 then NEG.
Previously, (x<<31)>>31 would be generated as
sall $31, %eax // 3 bytes
sarl $31, %eax // 3 bytes
with this patch the backend now generates:
andl $1, %eax // 3 bytes
negl %eax // 2 bytes
Not only is this smaller in size, but microbenchmarking confirms
that it's a performance win on both Intel and AMD; Intel sees only a
2% improvement (perhaps just a size effect), but AMD sees a 7% win.
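The identity behind the transformation can be sanity-checked in plain C. The following is only a minimal sketch, not part of the patch: the helper names sext_shift, sext_and_neg and check_low_bit_identity are hypothetical, and it assumes GCC's documented behaviour for converting out-of-range values to a signed type (modulo 2^32) and for right-shifting negative values (arithmetic shift).

```c
#include <assert.h>
#include <stdint.h>

/* Shift form: move bit 0 into the sign bit, then shift it back down.
   The left shift is done on uint32_t to avoid signed-overflow UB.  The
   conversion back to int32_t and the right shift of a negative value
   are implementation-defined in ISO C; this sketch assumes GCC's
   choices (wraparound and arithmetic shift).  */
static int32_t sext_shift(int32_t x)
{
    return (int32_t)((uint32_t)x << 31) >> 31;
}

/* AND/NEG form: isolate bit 0, then negate, giving 0 or -1.  */
static int32_t sext_and_neg(int32_t x)
{
    return -(x & 1);
}

/* Compare both forms on a few representative inputs.  */
static void check_low_bit_identity(void)
{
    static const int32_t samples[] = {
        0, 1, 2, 3, -1, -2, INT32_MAX, INT32_MIN
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(sext_shift(samples[i]) == sext_and_neg(samples[i]));
}
```

Calling check_low_bit_identity() from a small main should complete without tripping an assertion if the two forms agree on the sampled inputs.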
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures. Ok for mainline?
2023-10-17 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR middle-end/101955
PR tree-optimization/106245
* config/i386/i386.md (*extv<mode>_1_0): New define_insn_and_split.
gcc/testsuite/ChangeLog
PR middle-end/101955
PR tree-optimization/106245
* gcc.target/i386/pr106245-2.c: New test case.
* gcc.target/i386/pr106245-3.c: New 32-bit test case.
* gcc.target/i386/pr106245-4.c: New 64-bit test case.
* gcc.target/i386/pr106245-5.c: Likewise.
Thanks in advance,
Roger
--
[-- Attachment #2: patchsb.txt --]
[-- Type: text/plain, Size: 2918 bytes --]
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 2a60df5..b7309be0 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3414,6 +3414,21 @@
[(set_attr "type" "imovx")
(set_attr "mode" "SI")])
+;; Split sign-extension of single least significant bit as and x,$1;neg x
+(define_insn_and_split "*extv<mode>_1_0"
+ [(set (match_operand:SWI48 0 "register_operand" "=r")
+ (sign_extract:SWI48 (match_operand:SWI48 1 "register_operand" "0")
+ (const_int 1)
+ (const_int 0)))
+ (clobber (reg:CC FLAGS_REG))]
+ ""
+ "#"
+ "&& 1"
+ [(parallel [(set (match_dup 0) (and:SWI48 (match_dup 1) (const_int 1)))
+ (clobber (reg:CC FLAGS_REG))])
+ (parallel [(set (match_dup 0) (neg:SWI48 (match_dup 0)))
+ (clobber (reg:CC FLAGS_REG))])])
+
(define_expand "extzv<mode>"
[(set (match_operand:SWI248 0 "register_operand")
(zero_extract:SWI248 (match_operand:SWI248 1 "register_operand")
diff --git a/gcc/testsuite/gcc.target/i386/pr106245-2.c b/gcc/testsuite/gcc.target/i386/pr106245-2.c
new file mode 100644
index 0000000..47b0d27
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr106245-2.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int f(int a)
+{
+ return (a << 31) >> 31;
+}
+
+/* { dg-final { scan-assembler "andl" } } */
+/* { dg-final { scan-assembler "negl" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr106245-3.c b/gcc/testsuite/gcc.target/i386/pr106245-3.c
new file mode 100644
index 0000000..4ec6342
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr106245-3.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2" } */
+
+long long f(long long a)
+{
+ return (a << 63) >> 63;
+}
+
+/* { dg-final { scan-assembler "andl" } } */
+/* { dg-final { scan-assembler "negl" } } */
+/* { dg-final { scan-assembler "cltd" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr106245-4.c b/gcc/testsuite/gcc.target/i386/pr106245-4.c
new file mode 100644
index 0000000..ef77ee5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr106245-4.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+
+long long f(long long a)
+{
+ return (a << 63) >> 63;
+}
+
+/* { dg-final { scan-assembler "andl" } } */
+/* { dg-final { scan-assembler "negq" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr106245-5.c b/gcc/testsuite/gcc.target/i386/pr106245-5.c
new file mode 100644
index 0000000..0351866
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr106245-5.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+
+__int128 f(__int128 a)
+{
+ return (a << 127) >> 127;
+}
+
+/* { dg-final { scan-assembler "andl" } } */
+/* { dg-final { scan-assembler "negq" } } */
+/* { dg-final { scan-assembler "cqto" } } */
* Re: [x86 PATCH] PR 106245: Split (x<<31)>>31 as -(x&1) in i386.md
2023-10-17 13:08 [x86 PATCH] PR 106245: Split (x<<31)>>31 as -(x&1) in i386.md Roger Sayle
@ 2023-10-17 16:37 ` Uros Bizjak
2023-10-17 17:54 ` Roger Sayle
0 siblings, 1 reply; 4+ messages in thread
From: Uros Bizjak @ 2023-10-17 16:37 UTC (permalink / raw)
To: Roger Sayle; +Cc: gcc-patches
On Tue, Oct 17, 2023 at 3:08 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch is the backend piece of a solution to PRs 101955 and 106245,
> that adds a define_insn_and_split to the i386 backend, to perform sign
> extension of a single (least significant) bit using AND $1 then NEG.
>
> Previously, (x<<31)>>31 would be generated as
>
> sall $31, %eax // 3 bytes
> sarl $31, %eax // 3 bytes
>
> with this patch the backend now generates:
>
> andl $1, %eax // 3 bytes
> negl %eax // 2 bytes
>
> Not only is this smaller in size, but microbenchmarking confirms
> that it's a performance win on both Intel and AMD; Intel sees only a
> 2% improvement (perhaps just a size effect), but AMD sees a 7% win.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures. Ok for mainline?
>
>
> 2023-10-17 Roger Sayle <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
> PR middle-end/101955
> PR tree-optimization/106245
> * config/i386/i386.md (*extv<mode>_1_0): New define_insn_and_split.
>
> gcc/testsuite/ChangeLog
> PR middle-end/101955
> PR tree-optimization/106245
> * gcc.target/i386/pr106245-2.c: New test case.
> * gcc.target/i386/pr106245-3.c: New 32-bit test case.
> * gcc.target/i386/pr106245-4.c: New 64-bit test case.
> * gcc.target/i386/pr106245-5.c: Likewise.
+;; Split sign-extension of single least significant bit as and x,$1;neg x
+(define_insn_and_split "*extv<mode>_1_0"
+ [(set (match_operand:SWI48 0 "register_operand" "=r")
+ (sign_extract:SWI48 (match_operand:SWI48 1 "register_operand" "0")
+ (const_int 1)
+ (const_int 0)))
+ (clobber (reg:CC FLAGS_REG))]
+ ""
+ "#"
+ "&& 1"
No need to use "&&" for an empty insn constraint. Just use
"reload_completed" in this case.
+ [(parallel [(set (match_dup 0) (and:SWI48 (match_dup 1) (const_int 1)))
+ (clobber (reg:CC FLAGS_REG))])
+ (parallel [(set (match_dup 0) (neg:SWI48 (match_dup 0)))
+ (clobber (reg:CC FLAGS_REG))])])
Did you intend to split this after reload? If this is the case, then
reload_completed is missing.
Uros.
* RE: [x86 PATCH] PR 106245: Split (x<<31)>>31 as -(x&1) in i386.md
2023-10-17 16:37 ` Uros Bizjak
@ 2023-10-17 17:54 ` Roger Sayle
2023-10-18 17:43 ` Uros Bizjak
0 siblings, 1 reply; 4+ messages in thread
From: Roger Sayle @ 2023-10-17 17:54 UTC (permalink / raw)
To: 'Uros Bizjak'; +Cc: gcc-patches
Hi Uros,
Thanks for the speedy review.
> From: Uros Bizjak <ubizjak@gmail.com>
> Sent: 17 October 2023 17:38
>
> On Tue, Oct 17, 2023 at 3:08 PM Roger Sayle <roger@nextmovesoftware.com>
> wrote:
> >
> >
> > This patch is the backend piece of a solution to PRs 101955 and
> > 106245, that adds a define_insn_and_split to the i386 backend, to
> > perform sign extension of a single (least significant) bit using AND $1 then NEG.
> >
> > Previously, (x<<31)>>31 would be generated as
> >
> > sall $31, %eax // 3 bytes
> > sarl $31, %eax // 3 bytes
> >
> > with this patch the backend now generates:
> >
> > andl $1, %eax // 3 bytes
> > negl %eax // 2 bytes
> >
> > Not only is this smaller in size, but microbenchmarking confirms that
> > it's a performance win on both Intel and AMD; Intel sees only a 2%
> > improvement (perhaps just a size effect), but AMD sees a 7% win.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures. Ok for mainline?
> >
> >
> > 2023-10-17 Roger Sayle <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> > PR middle-end/101955
> > PR tree-optimization/106245
> > * config/i386/i386.md (*extv<mode>_1_0): New define_insn_and_split.
> >
> > gcc/testsuite/ChangeLog
> > PR middle-end/101955
> > PR tree-optimization/106245
> > * gcc.target/i386/pr106245-2.c: New test case.
> > * gcc.target/i386/pr106245-3.c: New 32-bit test case.
> > * gcc.target/i386/pr106245-4.c: New 64-bit test case.
> > * gcc.target/i386/pr106245-5.c: Likewise.
>
> +;; Split sign-extension of single least significant bit as and x,$1;neg x
> +(define_insn_and_split "*extv<mode>_1_0"
> + [(set (match_operand:SWI48 0 "register_operand" "=r")
> + (sign_extract:SWI48 (match_operand:SWI48 1 "register_operand" "0")
> + (const_int 1)
> + (const_int 0)))
> + (clobber (reg:CC FLAGS_REG))]
> + ""
> + "#"
> + "&& 1"
>
> No need to use "&&" for an empty insn constraint. Just use "reload_completed" in
> this case.
>
> + [(parallel [(set (match_dup 0) (and:SWI48 (match_dup 1) (const_int 1)))
> + (clobber (reg:CC FLAGS_REG))])
> + (parallel [(set (match_dup 0) (neg:SWI48 (match_dup 0)))
> + (clobber (reg:CC FLAGS_REG))])])
>
> Did you intend to split this after reload? If this is the case, then reload_completed
> is missing.
Because this splitter requires neither the allocation of a new pseudo nor a
hard register assignment, i.e. it's a splitter that can be run before or after
reload, it's written to split "whenever". If you'd prefer it to split only after
reload, I agree a "reload_completed" can be added (alternatively, adding
"ix86_pre_reload_split ()" would also work).
I now see from "*load_tp_<mode>" that "" is perhaps preferred over "&& 1"
in these cases. Please let me know which you prefer.
Cheers,
Roger
* Re: [x86 PATCH] PR 106245: Split (x<<31)>>31 as -(x&1) in i386.md
2023-10-17 17:54 ` Roger Sayle
@ 2023-10-18 17:43 ` Uros Bizjak
0 siblings, 0 replies; 4+ messages in thread
From: Uros Bizjak @ 2023-10-18 17:43 UTC (permalink / raw)
To: Roger Sayle; +Cc: gcc-patches
On Tue, Oct 17, 2023 at 7:54 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Hi Uros,
> Thanks for the speedy review.
>
> > From: Uros Bizjak <ubizjak@gmail.com>
> > Sent: 17 October 2023 17:38
> >
> > On Tue, Oct 17, 2023 at 3:08 PM Roger Sayle <roger@nextmovesoftware.com>
> > wrote:
> > >
> > >
> > > This patch is the backend piece of a solution to PRs 101955 and
> > > 106245, that adds a define_insn_and_split to the i386 backend, to
> > > perform sign extension of a single (least significant) bit using AND $1 then NEG.
> > >
> > > Previously, (x<<31)>>31 would be generated as
> > >
> > > sall $31, %eax // 3 bytes
> > > sarl $31, %eax // 3 bytes
> > >
> > > with this patch the backend now generates:
> > >
> > > andl $1, %eax // 3 bytes
> > > negl %eax // 2 bytes
> > >
> > > Not only is this smaller in size, but microbenchmarking confirms that
> > > it's a performance win on both Intel and AMD; Intel sees only a 2%
> > > improvement (perhaps just a size effect), but AMD sees a 7% win.
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check, both with and without --target_board=unix{-m32}
> > > with no new failures. Ok for mainline?
> > >
> > >
> > > 2023-10-17 Roger Sayle <roger@nextmovesoftware.com>
> > >
> > > gcc/ChangeLog
> > > PR middle-end/101955
> > > PR tree-optimization/106245
> > > * config/i386/i386.md (*extv<mode>_1_0): New define_insn_and_split.
> > >
> > > gcc/testsuite/ChangeLog
> > > PR middle-end/101955
> > > PR tree-optimization/106245
> > > * gcc.target/i386/pr106245-2.c: New test case.
> > > * gcc.target/i386/pr106245-3.c: New 32-bit test case.
> > > * gcc.target/i386/pr106245-4.c: New 64-bit test case.
> > > * gcc.target/i386/pr106245-5.c: Likewise.
> >
> > +;; Split sign-extension of single least significant bit as and x,$1;neg x
> > +(define_insn_and_split "*extv<mode>_1_0"
> > + [(set (match_operand:SWI48 0 "register_operand" "=r")
> > + (sign_extract:SWI48 (match_operand:SWI48 1 "register_operand" "0")
> > + (const_int 1)
> > + (const_int 0)))
> > + (clobber (reg:CC FLAGS_REG))]
> > + ""
> > + "#"
> > + "&& 1"
> >
> > No need to use "&&" for an empty insn constraint. Just use "reload_completed" in
> > this case.
> >
> > + [(parallel [(set (match_dup 0) (and:SWI48 (match_dup 1) (const_int 1)))
> > + (clobber (reg:CC FLAGS_REG))])
> > + (parallel [(set (match_dup 0) (neg:SWI48 (match_dup 0)))
> > + (clobber (reg:CC FLAGS_REG))])])
> >
> > Did you intend to split this after reload? If this is the case, then reload_completed
> > is missing.
>
> Because this splitter requires neither the allocation of a new pseudo nor a
> hard register assignment, i.e. it's a splitter that can be run before or after
> reload, it's written to split "whenever". If you'd prefer it to split only after
> reload, I agree a "reload_completed" can be added (alternatively, adding
> "ix86_pre_reload_split ()" would also work).
No, this part is OK. I just forgot that we have universal splitters ;)
> I now see from "*load_tp_<mode>" that "" is perhaps preferred over "&& 1"
> in these cases. Please let me know which you prefer.
"" please for an empty insn constraint.
OK otherwise.
Thanks,
Uros.