public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]
@ 2021-01-04  5:56 Hongtao Liu
  2021-01-04  7:40 ` Uros Bizjak
  2021-01-04  8:49 ` Jakub Jelinek
  0 siblings, 2 replies; 11+ messages in thread
From: Hongtao Liu @ 2021-01-04  5:56 UTC (permalink / raw)
  To: GCC Patches, Kirill Yukhin, Uros Bizjak; +Cc: H. J. Lu, Jakub Jelinek

Hi:
  The following patch adds define_insn_and_split to optimize

       vpmovmskb       %xmm0, %eax
-       movzwl  %ax, %eax
        notl    %eax

  Bootstrapped/regtested on x86_64-linux-gnu {,-m32}.
  Ok for trunk?

gcc/ChangeLog
        PR target/98461
        * config/i386/sse.md (*sse2_pmovskb_zexthisi): New
        define_insn_and_split for zero_extend of subreg HI of pmovskb
        result.

gcc/testsuite/ChangeLog
        * gcc.target/i386/sse-pr98461-2.c: New test.
---
 gcc/config/i386/sse.md                         | 11 +++++++++++
 gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c | 13 +++++++++++++
 2 files changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d84103807ff..4ed6b9ae476 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -16099,6 +16099,17 @@ (define_insn "*sse2_pmovmskb_ext"
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "SI")])

+(define_insn_and_split "*sse2_pmovskb_zexthisi"
+  [(set (match_operand:SI 0 "register_operand")
+       (zero_extend:SI (subreg:HI (unspec:SI
+         [(match_operand:V16QI 1 "register_operand")]
+          UNSPEC_MOVMSK) 0)))]
+  "TARGET_SSE2"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+       (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))])
+
 (define_split
   [(set (match_operand:SI 0 "register_operand")
        (unspec:SI
diff --git a/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
new file mode 100644
index 00000000000..60fc1f3e9c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
@@ -0,0 +1,13 @@
+/* PR target/98461 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-sse3 -masm=att" } */
+/* { dg-final { scan-assembler-times "\tpmovmskb\t" 1 } } */
+/* { dg-final { scan-assembler-not "\tmovzwl" } } */
+/* { dg-final { scan-assembler-times "\tnotl" 1 } } */
+
+#include <immintrin.h>
+
+unsigned int movemask_not1(__m128i logical) {
+  unsigned short res = (unsigned short)(_mm_movemask_epi8(logical));
+  return ~res;
+}
-- 
2.18.1


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]
  2021-01-04  5:56 [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461] Hongtao Liu
@ 2021-01-04  7:40 ` Uros Bizjak
  2021-01-04  7:54   ` Hongtao Liu
  2021-01-04  8:49 ` Jakub Jelinek
  1 sibling, 1 reply; 11+ messages in thread
From: Uros Bizjak @ 2021-01-04  7:40 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: GCC Patches, Kirill Yukhin, H. J. Lu, Jakub Jelinek

On Mon, Jan 4, 2021 at 6:54 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> Hi:
>   The following patch adds define_insn_and_split to optimize
>
>        vpmovmskb       %xmm0, %eax
> -       movzwl  %ax, %eax
>         notl    %eax
>
>   Bootstrapped/regtested on x86_64-linux-gnu {,-m32}.
>   Ok for trunk?
>
> gcc/ChangeLog
>         PR target/98461
>         * config/i386/sse.md (*sse2_pmovskb_zexthisi): New
>         define_insn_and_split for zero_extend of subreg HI of pmovskb
>         result.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/sse-pr98461-2.c: New test.
> ---
>  gcc/config/i386/sse.md                         | 11 +++++++++++
>  gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c | 13 +++++++++++++
>  2 files changed, 24 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index d84103807ff..4ed6b9ae476 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -16099,6 +16099,17 @@ (define_insn "*sse2_pmovmskb_ext"
>     (set_attr "prefix" "maybe_vex")
>     (set_attr "mode" "SI")])
>
> +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> +  [(set (match_operand:SI 0 "register_operand")
> +       (zero_extend:SI (subreg:HI (unspec:SI
> +         [(match_operand:V16QI 1 "register_operand")]
> +          UNSPEC_MOVMSK) 0)))]
> +  "TARGET_SSE2"

This needs ix86_pre_reload_split () in insn predicate.

Uros.

> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +       (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))])
> +
>  (define_split
>    [(set (match_operand:SI 0 "register_operand")
>         (unspec:SI
> diff --git a/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> new file mode 100644
> index 00000000000..60fc1f3e9c1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> @@ -0,0 +1,13 @@
> +/* PR target/98461 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -mno-sse3 -masm=att" } */
> +/* { dg-final { scan-assembler-times "\tpmovmskb\t" 1 } } */
> +/* { dg-final { scan-assembler-not "\tmovzwl" } } */
> +/* { dg-final { scan-assembler-times "\tnotl" 1 } } */
> +
> +#include <immintrin.h>
> +
> +unsigned int movemask_not1(__m128i logical) {
> +  unsigned short res = (unsigned short)(_mm_movemask_epi8(logical));
> +  return ~res;
> +}
> --
> 2.18.1
>
>
> --
> BR,
> Hongtao

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]
  2021-01-04  7:40 ` Uros Bizjak
@ 2021-01-04  7:54   ` Hongtao Liu
  2021-01-04  8:42     ` Uros Bizjak
  0 siblings, 1 reply; 11+ messages in thread
From: Hongtao Liu @ 2021-01-04  7:54 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: GCC Patches, Kirill Yukhin, H. J. Lu, Jakub Jelinek

On Mon, Jan 4, 2021 at 3:40 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Mon, Jan 4, 2021 at 6:54 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > Hi:
> >   The following patch adds define_insn_and_split to optimize
> >
> >        vpmovmskb       %xmm0, %eax
> > -       movzwl  %ax, %eax
> >         notl    %eax
> >
> >   Bootstrapped/regtested on x86_64-linux-gnu {,-m32}.
> >   Ok for trunk?
> >
> > gcc/ChangeLog
> >         PR target/98461
> >         * config/i386/sse.md (*sse2_pmovskb_zexthisi): New
> >         define_insn_and_split for zero_extend of subreg HI of pmovskb
> >         result.
> >
> > gcc/testsuite/ChangeLog
> >         * gcc.target/i386/sse-pr98461-2.c: New test.
> > ---
> >  gcc/config/i386/sse.md                         | 11 +++++++++++
> >  gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c | 13 +++++++++++++
> >  2 files changed, 24 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> >
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index d84103807ff..4ed6b9ae476 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -16099,6 +16099,17 @@ (define_insn "*sse2_pmovmskb_ext"
> >     (set_attr "prefix" "maybe_vex")
> >     (set_attr "mode" "SI")])
> >
> > +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> > +  [(set (match_operand:SI 0 "register_operand")
> > +       (zero_extend:SI (subreg:HI (unspec:SI
> > +         [(match_operand:V16QI 1 "register_operand")]
> > +          UNSPEC_MOVMSK) 0)))]
> > +  "TARGET_SSE2"
>
> This needs ix86_pre_reload_split () in insn predicate.
>

Yes, there's subreg in the pattern.
Assume patch is pre-approved with that change and
regtested/bootstrapped on x86_64-linux-gnu{-m32,}.

> Uros.
>
> > +  "#"
> > +  "&& 1"
> > +  [(set (match_dup 0)
> > +       (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))])
> > +
> >  (define_split
> >    [(set (match_operand:SI 0 "register_operand")
> >         (unspec:SI
> > diff --git a/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> > b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> > new file mode 100644
> > index 00000000000..60fc1f3e9c1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> > @@ -0,0 +1,13 @@
> > +/* PR target/98461 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -msse2 -mno-sse3 -masm=att" } */
> > +/* { dg-final { scan-assembler-times "\tpmovmskb\t" 1 } } */
> > +/* { dg-final { scan-assembler-not "\tmovzwl" } } */
> > +/* { dg-final { scan-assembler-times "\tnotl" 1 } } */
> > +
> > +#include <immintrin.h>
> > +
> > +unsigned int movemask_not1(__m128i logical) {
> > +  unsigned short res = (unsigned short)(_mm_movemask_epi8(logical));
> > +  return ~res;
> > +}
> > --
> > 2.18.1
> >
> >
> > --
> > BR,
> > Hongtao



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]
  2021-01-04  7:54   ` Hongtao Liu
@ 2021-01-04  8:42     ` Uros Bizjak
  0 siblings, 0 replies; 11+ messages in thread
From: Uros Bizjak @ 2021-01-04  8:42 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: GCC Patches, Kirill Yukhin, H. J. Lu, Jakub Jelinek

On Mon, Jan 4, 2021 at 8:52 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Mon, Jan 4, 2021 at 3:40 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Mon, Jan 4, 2021 at 6:54 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > Hi:
> > >   The following patch adds define_insn_and_split to optimize
> > >
> > >        vpmovmskb       %xmm0, %eax
> > > -       movzwl  %ax, %eax
> > >         notl    %eax
> > >
> > >   Bootstrapped/regtested on x86_64-linux-gnu {,-m32}.
> > >   Ok for trunk?
> > >
> > > gcc/ChangeLog
> > >         PR target/98461
> > >         * config/i386/sse.md (*sse2_pmovskb_zexthisi): New
> > >         define_insn_and_split for zero_extend of subreg HI of pmovskb
> > >         result.
> > >
> > > gcc/testsuite/ChangeLog
> > >         * gcc.target/i386/sse-pr98461-2.c: New test.
> > > ---
> > >  gcc/config/i386/sse.md                         | 11 +++++++++++
> > >  gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c | 13 +++++++++++++
> > >  2 files changed, 24 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> > >
> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > index d84103807ff..4ed6b9ae476 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -16099,6 +16099,17 @@ (define_insn "*sse2_pmovmskb_ext"
> > >     (set_attr "prefix" "maybe_vex")
> > >     (set_attr "mode" "SI")])
> > >
> > > +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> > > +  [(set (match_operand:SI 0 "register_operand")
> > > +       (zero_extend:SI (subreg:HI (unspec:SI
> > > +         [(match_operand:V16QI 1 "register_operand")]
> > > +          UNSPEC_MOVMSK) 0)))]
> > > +  "TARGET_SSE2"
> >
> > This needs ix86_pre_reload_split () in insn predicate.
> >
>
> Yes, there's subreg in the pattern.

Also the insn pattern does not have operand constraints.

> Assume patch is pre-approved with that change and
> regtested/bootstrapped on x86_64-linux-gnu{-m32,}.

LGTM with the above addition.

Uros.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]
  2021-01-04  5:56 [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461] Hongtao Liu
  2021-01-04  7:40 ` Uros Bizjak
@ 2021-01-04  8:49 ` Jakub Jelinek
  2021-01-04  8:59   ` Hongtao Liu
  1 sibling, 1 reply; 11+ messages in thread
From: Jakub Jelinek @ 2021-01-04  8:49 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: GCC Patches, Kirill Yukhin, Uros Bizjak

On Mon, Jan 04, 2021 at 01:56:44PM +0800, Hongtao Liu via Gcc-patches wrote:
> +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> +  [(set (match_operand:SI 0 "register_operand")
> +       (zero_extend:SI (subreg:HI (unspec:SI
> +         [(match_operand:V16QI 1 "register_operand")]
> +          UNSPEC_MOVMSK) 0)))]

Also, please fix up formatting.  Should be:
	(zero_extend:SI
	  (subreg:HI
	    (unspec:SI
	      [(match_operand:V16QI 1 "register_operand")]
	      UNSPEC_MOVMSK) 0)))]
I think.

	Jakub


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]
  2021-01-04  8:49 ` Jakub Jelinek
@ 2021-01-04  8:59   ` Hongtao Liu
  2021-01-05  6:32     ` Hongtao Liu
  0 siblings, 1 reply; 11+ messages in thread
From: Hongtao Liu @ 2021-01-04  8:59 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches, Kirill Yukhin, Uros Bizjak

On Mon, Jan 4, 2021 at 4:49 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Jan 04, 2021 at 01:56:44PM +0800, Hongtao Liu via Gcc-patches wrote:
> > +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> > +  [(set (match_operand:SI 0 "register_operand")
> > +       (zero_extend:SI (subreg:HI (unspec:SI
> > +         [(match_operand:V16QI 1 "register_operand")]
> > +          UNSPEC_MOVMSK) 0)))]
>
> Also, please fix up formatting.  Should be:
>         (zero_extend:SI
>           (subreg:HI
>             (unspec:SI
>               [(match_operand:V16QI 1 "register_operand")]
>               UNSPEC_MOVMSK) 0)))]
> I think.
>
>         Jakub
>

Yes, thanks for the review both, and happy new year!

-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]
  2021-01-04  8:59   ` Hongtao Liu
@ 2021-01-05  6:32     ` Hongtao Liu
  2021-01-05  7:04       ` Uros Bizjak
  0 siblings, 1 reply; 11+ messages in thread
From: Hongtao Liu @ 2021-01-05  6:32 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches, Kirill Yukhin, Uros Bizjak

On Mon, Jan 4, 2021 at 4:59 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Mon, Jan 4, 2021 at 4:49 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Mon, Jan 04, 2021 at 01:56:44PM +0800, Hongtao Liu via Gcc-patches wrote:
> > > +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> > > +  [(set (match_operand:SI 0 "register_operand")
> > > +       (zero_extend:SI (subreg:HI (unspec:SI
> > > +         [(match_operand:V16QI 1 "register_operand")]
> > > +          UNSPEC_MOVMSK) 0)))]
> >
> > Also, please fix up formatting.  Should be:
> >         (zero_extend:SI
> >           (subreg:HI
> >             (unspec:SI
> >               [(match_operand:V16QI 1 "register_operand")]
> >               UNSPEC_MOVMSK) 0)))]
> > I think.
> >
> >         Jakub
> >
>
> Yes, thanks for the review both, and happy new year!
>
> --
> BR,
> Hongtao

Sorry for the bother, this is an incremental patch to split
(zero_extend:SI (not:HI (subreg:HI (pmovmskb result:SI)))) to

        pmovmskb        %xmm0, %eax
-       notl    %eax
-       movzwl  %ax, %eax
+       xorl    $65535, %eax


The patch is below, regtestes and bootstrapped on x86_64-linux-gnu{-m32,}.
  Ok for trunk?

The following patch adds define_insn_and_split to optimize

       vpmovmskb       %xmm0, %eax
-       movzwl  %ax, %eax
        notl    %eax

and combine splitter to optimize

        pmovmskb        %xmm0, %eax
-       notl    %eax
-       movzwl  %ax, %eax
+       xorl    $65535, %eax

gcc/ChangeLog
        PR target/98461
        * config/i386/sse.md (*sse2_pmovskb_zexthisi): New
        define_insn_and_split for zero_extend of subreg HI of pmovskb
        result.
        (*sse2_pmovskb_zexthisi): Add new combine splitters for
        zero_extend of not of subreg HI of pmovskb result.

gcc/testsuite/ChangeLog
        * gcc.target/i386/sse-pr98461-2.c: New test.
---
 gcc/config/i386/sse.md                        | 32 +++++++++++++++++++
 .../gcc.target/i386/sse2-pr98461-2.c          | 25 +++++++++++++++
 2 files changed, 57 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d84103807ff..4fcff0800c0 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -16099,6 +16099,38 @@ (define_insn "*sse2_pmovmskb_ext"
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "SI")])

+(define_insn_and_split "*sse2_pmovskb_zexthisi"
+  [(set (match_operand:SI 0 "register_operand")
+        (zero_extend:SI
+          (subreg:HI
+            (unspec:SI
+              [(match_operand:V16QI 1 "register_operand")]
+              UNSPEC_MOVMSK) 0)))]
+  "TARGET_SSE2 && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+        (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))])
+
+(define_split
+  [(set (match_operand:SI 0 "register_operand")
+        (zero_extend:SI
+          (not:HI
+            (subreg:HI
+              (unspec:SI
+                [(match_operand:V16QI 1 "register_operand")]
+                UNSPEC_MOVMSK) 0))))]
+  "TARGET_SSE2"
+  [(set (match_dup 2)
+        (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))
+   (set (match_dup 0)
+        (match_dup 3))]
+{
+  operands[2] = gen_reg_rtx (SImode);
+  operands[3] = gen_int_mode ((HOST_WIDE_INT_1 << 16) - 1, SImode);
+  operands[3] = gen_rtx_XOR (SImode, operands[2], operands[3]);
+})
+
 (define_split
   [(set (match_operand:SI 0 "register_operand")
         (unspec:SI
diff --git a/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
new file mode 100644
index 00000000000..330272c69bc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
@@ -0,0 +1,25 @@
+/* PR target/98461 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-sse3 -masm=att" } */
+/* { dg-final { scan-assembler-times "\tpmovmskb\t" 3 } } */
+/* { dg-final { scan-assembler-not "\tmovzwl" } } */
+/* { dg-final { scan-assembler-times "\tnotl" 1 } } *
+/* { dg-final { scan-assembler-times "\txorl" 1 } } */
+
+#include <immintrin.h>
+
+unsigned int movemask_not1(__m128i logical) {
+  unsigned short res = (unsigned short)(_mm_movemask_epi8(logical));
+  return ~res;
+}
+
+unsigned int movemask_not2(__m128i logical) {
+  unsigned short res = (unsigned short)(_mm_movemask_epi8(logical));
+  res = ~res;
+  return res;
+}
+
+unsigned int movemask_zero_extend(__m128i logical) {
+  unsigned int res = _mm_movemask_epi8(logical);
+  return res & 0xffff;
+}
-- 
2.18.1


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]
  2021-01-05  6:32     ` Hongtao Liu
@ 2021-01-05  7:04       ` Uros Bizjak
  2021-01-05  7:19         ` Uros Bizjak
  0 siblings, 1 reply; 11+ messages in thread
From: Uros Bizjak @ 2021-01-05  7:04 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Jakub Jelinek, GCC Patches, Kirill Yukhin

On Tue, Jan 5, 2021 at 7:30 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Mon, Jan 4, 2021 at 4:59 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Mon, Jan 4, 2021 at 4:49 PM Jakub Jelinek <jakub@redhat.com> wrote:
> > >
> > > On Mon, Jan 04, 2021 at 01:56:44PM +0800, Hongtao Liu via Gcc-patches wrote:
> > > > +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> > > > +  [(set (match_operand:SI 0 "register_operand")
> > > > +       (zero_extend:SI (subreg:HI (unspec:SI
> > > > +         [(match_operand:V16QI 1 "register_operand")]
> > > > +          UNSPEC_MOVMSK) 0)))]
> > >
> > > Also, please fix up formatting.  Should be:
> > >         (zero_extend:SI
> > >           (subreg:HI
> > >             (unspec:SI
> > >               [(match_operand:V16QI 1 "register_operand")]
> > >               UNSPEC_MOVMSK) 0)))]
> > > I think.
> > >
> > >         Jakub
> > >
> >
> > Yes, thanks for the review both, and happy new year!
> >
> > --
> > BR,
> > Hongtao
>
> Sorry for the bother, this is an incremental patch to split
> (zero_extend:SI (not:HI (subreg:HI (pmovmskb result:SI)))) to
>
>         pmovmskb        %xmm0, %eax
> -       notl    %eax
> -       movzwl  %ax, %eax
> +       xorl    $65535, %eax
>
>
> The patch is below, regtestes and bootstrapped on x86_64-linux-gnu{-m32,}.
>   Ok for trunk?
>
> The following patch adds define_insn_and_split to optimize
>
>        vpmovmskb       %xmm0, %eax
> -       movzwl  %ax, %eax
>         notl    %eax
>
> and combine splitter to optimize
>
>         pmovmskb        %xmm0, %eax
> -       notl    %eax
> -       movzwl  %ax, %eax
> +       xorl    $65535, %eax
>
> gcc/ChangeLog
>         PR target/98461
>         * config/i386/sse.md (*sse2_pmovskb_zexthisi): New
>         define_insn_and_split for zero_extend of subreg HI of pmovskb
>         result.
>         (*sse2_pmovskb_zexthisi): Add new combine splitters for
>         zero_extend of not of subreg HI of pmovskb result.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/sse-pr98461-2.c: New test.
> ---
>  gcc/config/i386/sse.md                        | 32 +++++++++++++++++++
>  .../gcc.target/i386/sse2-pr98461-2.c          | 25 +++++++++++++++
>  2 files changed, 57 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index d84103807ff..4fcff0800c0 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -16099,6 +16099,38 @@ (define_insn "*sse2_pmovmskb_ext"
>     (set_attr "prefix" "maybe_vex")
>     (set_attr "mode" "SI")])
>
> +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> +  [(set (match_operand:SI 0 "register_operand")
> +        (zero_extend:SI
> +          (subreg:HI
> +            (unspec:SI
> +              [(match_operand:V16QI 1 "register_operand")]
> +              UNSPEC_MOVMSK) 0)))]
> +  "TARGET_SSE2 && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +        (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))])
> +
> +(define_split
> +  [(set (match_operand:SI 0 "register_operand")
> +        (zero_extend:SI
> +          (not:HI
> +            (subreg:HI
> +              (unspec:SI
> +                [(match_operand:V16QI 1 "register_operand")]
> +                UNSPEC_MOVMSK) 0))))]
> +  "TARGET_SSE2"
> +  [(set (match_dup 2)
> +        (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))
> +   (set (match_dup 0)
> +        (match_dup 3))]

Just write:

(set (match_dup 0)
    (xor:SI (match_dup 2)(const_int 65535))

Uros.

>
> +{
> +  operands[2] = gen_reg_rtx (SImode);
> +  operands[3] = gen_int_mode ((HOST_WIDE_INT_1 << 16) - 1, SImode);
> +  operands[3] = gen_rtx_XOR (SImode, operands[2], operands[3]);
> +})
> +
>  (define_split
>    [(set (match_operand:SI 0 "register_operand")
>          (unspec:SI
> diff --git a/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> new file mode 100644
> index 00000000000..330272c69bc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> @@ -0,0 +1,25 @@
> +/* PR target/98461 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -mno-sse3 -masm=att" } */
> +/* { dg-final { scan-assembler-times "\tpmovmskb\t" 3 } } */
> +/* { dg-final { scan-assembler-not "\tmovzwl" } } */
> +/* { dg-final { scan-assembler-times "\tnotl" 1 } } *
> +/* { dg-final { scan-assembler-times "\txorl" 1 } } */
> +
> +#include <immintrin.h>
> +
> +unsigned int movemask_not1(__m128i logical) {
> +  unsigned short res = (unsigned short)(_mm_movemask_epi8(logical));
> +  return ~res;
> +}
> +
> +unsigned int movemask_not2(__m128i logical) {
> +  unsigned short res = (unsigned short)(_mm_movemask_epi8(logical));
> +  res = ~res;
> +  return res;
> +}
> +
> +unsigned int movemask_zero_extend(__m128i logical) {
> +  unsigned int res = _mm_movemask_epi8(logical);
> +  return res & 0xffff;
> +}
> --
> 2.18.1
>
>
> --
> BR,
> Hongtao

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]
  2021-01-05  7:04       ` Uros Bizjak
@ 2021-01-05  7:19         ` Uros Bizjak
  2021-01-05 10:28           ` Hongtao Liu
  0 siblings, 1 reply; 11+ messages in thread
From: Uros Bizjak @ 2021-01-05  7:19 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Jakub Jelinek, GCC Patches, Kirill Yukhin

On Tue, Jan 5, 2021 at 8:04 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > +(define_split
> > +  [(set (match_operand:SI 0 "register_operand")
> > +        (zero_extend:SI
> > +          (not:HI
> > +            (subreg:HI
> > +              (unspec:SI
> > +                [(match_operand:V16QI 1 "register_operand")]
> > +                UNSPEC_MOVMSK) 0))))]
> > +  "TARGET_SSE2"
> > +  [(set (match_dup 2)
> > +        (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))
> > +   (set (match_dup 0)
> > +        (match_dup 3))]
>
> Just write:
>
> (set (match_dup 0)
>     (xor:SI (match_dup 2)(const_int 65535))

BTW: This could be a universal combine splitter to simplify

unsigned int foo (unsigned short z)
{
    return (unsigned short)~z;
}

Trying 7 -> 8:
   7: r87:HI=~r88:SI#0
     REG_DEAD r88:SI
   8: r86:SI=zero_extend(r87:HI)
     REG_DEAD r87:HI
Failed to match this instruction:
(set (reg:SI 86)
   (zero_extend:SI (not:HI (subreg:HI (reg:SI 88) 0))))

But combine does not "split" to one insns.

Uros.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]
  2021-01-05  7:19         ` Uros Bizjak
@ 2021-01-05 10:28           ` Hongtao Liu
  2021-01-05 10:30             ` Uros Bizjak
  0 siblings, 1 reply; 11+ messages in thread
From: Hongtao Liu @ 2021-01-05 10:28 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Jakub Jelinek, GCC Patches, Kirill Yukhin

On Tue, Jan 5, 2021 at 3:20 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Tue, Jan 5, 2021 at 8:04 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > +(define_split
> > > +  [(set (match_operand:SI 0 "register_operand")
> > > +        (zero_extend:SI
> > > +          (not:HI
> > > +            (subreg:HI
> > > +              (unspec:SI
> > > +                [(match_operand:V16QI 1 "register_operand")]
> > > +                UNSPEC_MOVMSK) 0))))]
> > > +  "TARGET_SSE2"
> > > +  [(set (match_dup 2)
> > > +        (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))
> > > +   (set (match_dup 0)
> > > +        (match_dup 3))]
> >
> > Just write:
> >
> > (set (match_dup 0)
> >     (xor:SI (match_dup 2)(const_int 65535))
>

Yes, changed.

> BTW: This could be a universal combine splitter to simplify
>
> unsigned int foo (unsigned short z)
> {
>     return (unsigned short)~z;
> }
>
> Trying 7 -> 8:
>    7: r87:HI=~r88:SI#0
>      REG_DEAD r88:SI
>    8: r86:SI=zero_extend(r87:HI)
>      REG_DEAD r87:HI
> Failed to match this instruction:
> (set (reg:SI 86)
>    (zero_extend:SI (not:HI (subreg:HI (reg:SI 88) 0))))
>
> But combine does not "split" to one insns.

Yes, according to PSabi, the top half of the register is not
necessarily 0, so if you add the splitter, it just changes from notl +
movzwl to xor + movzwl, which doesn't look better?

>
> Uros.



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]
  2021-01-05 10:28           ` Hongtao Liu
@ 2021-01-05 10:30             ` Uros Bizjak
  0 siblings, 0 replies; 11+ messages in thread
From: Uros Bizjak @ 2021-01-05 10:30 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Jakub Jelinek, GCC Patches, Kirill Yukhin

On Tue, Jan 5, 2021 at 11:25 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Tue, Jan 5, 2021 at 3:20 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Tue, Jan 5, 2021 at 8:04 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > +(define_split
> > > > +  [(set (match_operand:SI 0 "register_operand")
> > > > +        (zero_extend:SI
> > > > +          (not:HI
> > > > +            (subreg:HI
> > > > +              (unspec:SI
> > > > +                [(match_operand:V16QI 1 "register_operand")]
> > > > +                UNSPEC_MOVMSK) 0))))]
> > > > +  "TARGET_SSE2"
> > > > +  [(set (match_dup 2)
> > > > +        (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))
> > > > +   (set (match_dup 0)
> > > > +        (match_dup 3))]
> > >
> > > Just write:
> > >
> > > (set (match_dup 0)
> > >     (xor:SI (match_dup 2)(const_int 65535))
> >
>
> Yes, changed.
>
> > BTW: This could be a universal combine splitter to simplify
> >
> > unsigned int foo (unsigned short z)
> > {
> >     return (unsigned short)~z;
> > }
> >
> > Trying 7 -> 8:
> >    7: r87:HI=~r88:SI#0
> >      REG_DEAD r88:SI
> >    8: r86:SI=zero_extend(r87:HI)
> >      REG_DEAD r87:HI
> > Failed to match this instruction:
> > (set (reg:SI 86)
> >    (zero_extend:SI (not:HI (subreg:HI (reg:SI 88) 0))))
> >
> > But combine does not "split" to one insns.
>
> Yes, according to PSabi, the top half of the register is not
> necessarily 0, so if you add the splitter, it just changes from notl +
> movzwl to xor + movzwl, which doesn't look better?

Indeed.

The patch is OK.

Uros.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-01-05 10:30 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-04  5:56 [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461] Hongtao Liu
2021-01-04  7:40 ` Uros Bizjak
2021-01-04  7:54   ` Hongtao Liu
2021-01-04  8:42     ` Uros Bizjak
2021-01-04  8:49 ` Jakub Jelinek
2021-01-04  8:59   ` Hongtao Liu
2021-01-05  6:32     ` Hongtao Liu
2021-01-05  7:04       ` Uros Bizjak
2021-01-05  7:19         ` Uros Bizjak
2021-01-05 10:28           ` Hongtao Liu
2021-01-05 10:30             ` Uros Bizjak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).