public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector
@ 2020-09-02  9:34 Hongtao Liu
  2020-09-02 12:20 ` H.J. Lu
  2020-11-17  0:05 ` Jeff Law
  0 siblings, 2 replies; 7+ messages in thread
From: Hongtao Liu @ 2020-09-02  9:34 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 592 bytes --]

Hi:
  Add define_peephole2 to eliminate potential redundant conversion
from mask to vector.
  Bootstrap is ok, regression test is ok for i386/x86-64 backend.
  Ok for trunk?

gcc/ChangeLog:
        PR target/96891
        * config/i386/sse.md (VI_128_256): New mode iterator.
        (define_peephole2): Lower avx512 vector compare to avx version
        when dest is vector.

gcc/testsuite/ChangeLog:

        * gcc.target/i386/avx512bw-pr96891-1.c: New test.
        * gcc.target/i386/avx512f-pr96891-1.c: New test.
        * gcc.target/i386/avx512f-pr96891-2.c: New test.

-- 
BR,
Hongtao

[-- Attachment #2: 0001-Lower-AVX512-vector-compare-to-AVX-version-when-dest.patch --]
[-- Type: text/x-patch, Size: 9491 bytes --]

From ba76432c08f47e4ecc1f355c0dfdea8908aaf9f4 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Wed, 2 Sep 2020 17:14:39 +0800
Subject: [PATCH] Lower AVX512 vector compare to AVX version when dest is
 vector.

gcc/ChangeLog:
	PR target/96891
	* config/i386/sse.md (VI_128_256): New mode iterator.
	(define_peephole2): Lower avx512 vector compare to avx version
	when dest is vector.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512bw-pr96891-1.c: New test.
	* gcc.target/i386/avx512f-pr96891-1.c: New test.
	* gcc.target/i386/avx512f-pr96891-2.c: New test.
---
 gcc/config/i386/sse.md                        | 93 +++++++++++++++++++
 .../gcc.target/i386/avx512bw-pr96891-1.c      | 36 +++++++
 .../gcc.target/i386/avx512f-pr96891-1.c       | 40 ++++++++
 .../gcc.target/i386/avx512f-pr96891-2.c       | 30 ++++++
 4 files changed, 199 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr96891-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr96891-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr96891-2.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 8250325e1a3..31e0dc2a600 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -629,6 +629,9 @@ (define_mode_iterator VI_128 [V16QI V8HI V4SI V2DI])
 ;; All 256bit vector integer modes
 (define_mode_iterator VI_256 [V32QI V16HI V8SI V4DI])
 
+;; All 128 and 256bit vector integer modes
+(define_mode_iterator VI_128_256 [V16QI V8HI V4SI V2DI V32QI V16HI V8SI V4DI])
+
 ;; Various 128bit vector integer mode combinations
 (define_mode_iterator VI12_128 [V16QI V8HI])
 (define_mode_iterator VI14_128 [V16QI V4SI])
@@ -6703,6 +6706,96 @@ (define_insn "*<avx512>_cvtmask2<ssemodesuffix><mode>"
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+/* Lower avx512 parallel floating compare to avx compare when dst is vector.  */
+(define_peephole2
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(unspec:<avx512fmaskmode>
+	  [(match_operand:VF_128_256 1 "register_operand")
+	   (match_operand:VF_128_256 2 "nonimmediate_operand")
+	   (match_operand:SI 3 "const_0_to_31_operand")]
+	  UNSPEC_PCMP))
+   (set (match_operand:<sseintvecmode> 4 "register_operand")
+	(vec_merge:<sseintvecmode>
+	  (match_operand:<sseintvecmode> 5 "vector_all_ones_operand")
+	  (match_operand:<sseintvecmode> 6 "const0_operand")
+	  (match_dup 0)))]
+  "!EXT_REX_SSE_REGNO_P (REGNO (operands[4]))
+  && !EXT_REX_SSE_REGNO_P (REGNO (operands[1]))
+  && !(REG_P (operands[2]) && EXT_REX_SSE_REGNO_P (REGNO (operands[2])))
+  && peep2_reg_dead_p (2, operands[0])"
+  [(set (match_dup 7)
+	(unspec:VF_128_256
+	  [(match_dup 1)
+	   (match_dup 2)
+	   (match_dup 3)] UNSPEC_PCMP))]
+  "operands[7] = gen_rtx_REG (<MODE>mode, REGNO (operands[4]));")
+
+/* Lower avx512 parallel integral compare to avx compare when dst is vector.  */
+(define_peephole2
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(unspec:<avx512fmaskmode>
+	  [(match_operand:VI_128_256 1 "register_operand")
+	   (match_operand:VI_128_256 2 "nonimmediate_operand")]
+	  UNSPEC_MASKED_EQ))
+   (set (match_operand:VI_128_256 4 "register_operand")
+	(vec_merge:VI_128_256
+	  (match_operand:VI_128_256 5 "vector_all_ones_operand")
+	  (match_operand:VI_128_256 6 "const0_operand")
+	  (match_dup 0)))]
+  "!EXT_REX_SSE_REGNO_P (REGNO (operands[4]))
+  && !EXT_REX_SSE_REGNO_P (REGNO (operands[1]))
+  && !(REG_P (operands[2]) && EXT_REX_SSE_REGNO_P (REGNO (operands[2])))
+  && peep2_reg_dead_p (2, operands[0])"
+  [(set (match_dup 4)
+  	(eq:VI_128_256
+	  (match_dup 1)
+	  (match_dup 2)))])
+
+(define_peephole2
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(unspec:<avx512fmaskmode>
+	  [(match_operand:VI_128_256 1 "register_operand")
+	   (match_operand:VI_128_256 2 "nonimmediate_operand")]
+	  UNSPEC_MASKED_GT))
+   (set (match_operand:VI_128_256 4 "register_operand")
+	(vec_merge:VI_128_256
+	  (match_operand:VI_128_256 5 "vector_all_ones_operand")
+	  (match_operand:VI_128_256 6 "const0_operand")
+	  (match_dup 0)))]
+  "!EXT_REX_SSE_REGNO_P (REGNO (operands[4]))
+  && !EXT_REX_SSE_REGNO_P (REGNO (operands[1]))
+  && !(REG_P (operands[2]) && EXT_REX_SSE_REGNO_P (REGNO (operands[2])))
+  && peep2_reg_dead_p (2, operands[0])"
+  [(set (match_dup 4)
+  	(gt:VI_128_256
+	  (match_dup 1)
+	  (match_dup 2)))])
+
+(define_peephole2
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(unspec:<avx512fmaskmode>
+	  [(match_operand:VI_128_256 1 "register_operand")
+	   (match_operand:VI_128_256 2 "nonimmediate_operand")
+	   (match_operand:SI 3 "const_0_to_7_operand")]
+	  UNSPEC_PCMP))
+   (set (match_operand:VI_128_256 4 "register_operand")
+	(vec_merge:VI_128_256
+	  (match_operand:VI_128_256 5 "vector_all_ones_operand")
+	  (match_operand:VI_128_256 6 "const0_operand")
+	  (match_dup 0)))]
+  "(INTVAL (operands[3]) == 0 || INTVAL (operands[3]) == 6)
+  && !EXT_REX_SSE_REGNO_P (REGNO (operands[4]))
+  && !EXT_REX_SSE_REGNO_P (REGNO (operands[1]))
+  && !(REG_P (operands[2]) && EXT_REX_SSE_REGNO_P (REGNO (operands[2])))
+  && peep2_reg_dead_p (2, operands[0])"
+  [(const_int 0)]
+{
+  enum rtx_code code = INTVAL (operands[3]) ? GT : EQ;
+  emit_move_insn (operands[4], gen_rtx_fmt_ee (code, <MODE>mode,
+  		 	       		       operands[1], operands[2]));
+  DONE;
+})
+
 (define_insn "sse2_cvtps2pd<mask_name>"
   [(set (match_operand:V2DF 0 "register_operand" "=v")
 	(float_extend:V2DF
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-pr96891-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-pr96891-1.c
new file mode 100644
index 00000000000..45efff4e0f0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-pr96891-1.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-not "%k\[0-7\]" } } */
+
+typedef char v16qi __attribute__ ((vector_size (16)));
+typedef char v32qi __attribute__ ((vector_size (32)));
+typedef short v8hi __attribute__ ((vector_size (16)));
+typedef short v16hi __attribute__ ((vector_size (32)));
+typedef int v4si __attribute__ ((vector_size (16)));
+typedef int v8si __attribute__ ((vector_size (32)));
+typedef long long v2di __attribute__ ((vector_size (16)));
+typedef long long v4di __attribute__ ((vector_size (32)));
+
+#define FOO(VTYPE, OPNAME, OP)			\
+  VTYPE						\
+  foo_##VTYPE##_##OPNAME (VTYPE a, VTYPE b)	\
+  {						\
+    return a OP b;				\
+  }						\
+
+FOO (v16qi, eq, ==)
+FOO (v16qi, gt, >)
+FOO (v32qi, eq, ==)
+FOO (v32qi, gt, >)
+FOO (v8hi, eq, ==)
+FOO (v8hi, gt, >)
+FOO (v16hi, eq, ==)
+FOO (v16hi, gt, >)
+FOO (v4si, eq, ==)
+FOO (v4si, gt, >)
+FOO (v8si, eq, ==)
+FOO (v8si, gt, >)
+FOO (v2di, eq, ==)
+FOO (v2di, gt, >)
+FOO (v4di, eq, ==)
+FOO (v4di, gt, >)
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr96891-1.c b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-1.c
new file mode 100644
index 00000000000..48ba943e151
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-1.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -O2" } */
+/* { dg-final { scan-assembler-not "%k\[0-7\]" } } */
+
+typedef float v4sf __attribute__ ((vector_size (16)));
+typedef float v8sf __attribute__ ((vector_size (32)));
+typedef double v2df __attribute__ ((vector_size (16)));
+typedef double v4df __attribute__ ((vector_size (32)));
+
+#define FOO(VTYPE, OPNAME, OP)			\
+  VTYPE						\
+  foo_##VTYPE##_##OPNAME (VTYPE a, VTYPE b)	\
+  {						\
+    return a OP b;				\
+  }						\
+
+FOO (v4sf, eq, ==)
+FOO (v4sf, neq, !=)
+FOO (v4sf, gt, >)
+FOO (v4sf, ge, >=)
+FOO (v4sf, lt, <)
+FOO (v4sf, le, <=)
+FOO (v8sf, eq, ==)
+FOO (v8sf, neq, !=)
+FOO (v8sf, gt, >)
+FOO (v8sf, ge, >=)
+FOO (v8sf, lt, <)
+FOO (v8sf, le, <=)
+FOO (v2df, eq, ==)
+FOO (v2df, neq, !=)
+FOO (v2df, gt, >)
+FOO (v2df, ge, >=)
+FOO (v2df, lt, <)
+FOO (v2df, le, <=)
+FOO (v4df, eq, ==)
+FOO (v4df, neq, !=)
+FOO (v4df, gt, >)
+FOO (v4df, ge, >=)
+FOO (v4df, lt, <)
+FOO (v4df, le, <=)
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr96891-2.c b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-2.c
new file mode 100644
index 00000000000..5192a00e0f4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-2.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -mavx512bw -mavx512dq -O2" } */
+/* { dg-final { scan-assembler-not "%k\[0-7\]" } } */
+
+#include<immintrin.h>
+
+#define FOO(VTYPE,PREFIX,SUFFIX,OPNAME,MASK,LEN)			\
+  VTYPE								\
+  foo_##LEN##_##SUFFIX##_##OPNAME (VTYPE a, VTYPE b)		\
+  {									\
+    MASK m = _mm##PREFIX##_cmp##OPNAME##_##SUFFIX##_mask (a, b);	\
+    return _mm##PREFIX##_movm_##SUFFIX (m);				\
+  }									\
+
+FOO (__m128i,, epi8, eq, __mmask16, 128);
+FOO (__m128i,, epi16, eq, __mmask8, 128);
+FOO (__m128i,, epi32, eq, __mmask8, 128);
+FOO (__m128i,, epi64, eq, __mmask8, 128);
+FOO (__m128i,, epi8, gt, __mmask16, 128);
+FOO (__m128i,, epi16, gt, __mmask8, 128);
+FOO (__m128i,, epi32, gt, __mmask8, 128);
+FOO (__m128i,, epi64, gt, __mmask8, 128);
+FOO (__m256i, 256, epi8, eq, __mmask32, 256);
+FOO (__m256i, 256, epi16, eq, __mmask16, 256);
+FOO (__m256i, 256, epi32, eq, __mmask8, 256);
+FOO (__m256i, 256, epi64, eq, __mmask8, 256);
+FOO (__m256i, 256, epi8, gt, __mmask32, 256);
+FOO (__m256i, 256, epi16, gt, __mmask16, 256);
+FOO (__m256i, 256, epi32, gt, __mmask8, 256);
+FOO (__m256i, 256, epi64, gt, __mmask8, 256);
-- 
2.18.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector
  2020-09-02  9:34 [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector Hongtao Liu
@ 2020-09-02 12:20 ` H.J. Lu
  2020-11-17  0:05 ` Jeff Law
  1 sibling, 0 replies; 7+ messages in thread
From: H.J. Lu @ 2020-09-02 12:20 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: GCC Patches

On Wed, Sep 2, 2020 at 2:33 AM Hongtao Liu via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi:
>   Add define_peephole2 to eliminate potential redundant conversion
> from mask to vector.
>   Bootstrap is ok, regression test is ok for i386/x86-64 backend.
>   Ok for trunk?
>
> gcc/ChangeLog:
>         PR target/96891
>         * config/i386/sse.md (VI_128_256): New mode iterator.
>         (define_peephole2): Lower avx512 vector compare to avx version
>         when dest is vector.
>
> gcc/testsuite/ChangeLog:

Missing PR target/96891

>         * gcc.target/i386/avx512bw-pr96891-1.c: New test.
>         * gcc.target/i386/avx512f-pr96891-1.c: New test.
>         * gcc.target/i386/avx512f-pr96891-2.c: New test.
>
> --
> BR,
> Hongtao



-- 
H.J.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector
  2020-09-02  9:34 [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector Hongtao Liu
  2020-09-02 12:20 ` H.J. Lu
@ 2020-11-17  0:05 ` Jeff Law
  2020-11-17  3:10   ` Hongtao Liu
  1 sibling, 1 reply; 7+ messages in thread
From: Jeff Law @ 2020-11-17  0:05 UTC (permalink / raw)
  To: Hongtao Liu, GCC Patches


On 9/2/20 3:34 AM, Hongtao Liu via Gcc-patches wrote:
> Hi:
>   Add define_peephole2 to eliminate potential redundant conversion
> from mask to vector.
>   Bootstrap is ok, regression test is ok for i386/x86-64 backend.
>   Ok for trunk?
>
> gcc/ChangeLog:
>         PR target/96891
>         * config/i386/sse.md (VI_128_256): New mode iterator.
>         (define_peephole2): Lower avx512 vector compare to avx version
>         when dest is vector.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/avx512bw-pr96891-1.c: New test.
>         * gcc.target/i386/avx512f-pr96891-1.c: New test.
>         * gcc.target/i386/avx512f-pr96891-2.c: New test.

Aren't these the two insns in question:


(insn 7 4 8 2 (set (reg:QI 86)
        (unspec:QI [
                (reg:V8SF 90)
                (reg:V8SF 89)
                (const_int 2 [0x2])
            ] UNSPEC_PCMP)) "j.c":4:14 1911 {avx512vl_cmpv8sf3}
     (expr_list:REG_DEAD (reg:V8SF 90)
        (expr_list:REG_DEAD (reg:V8SF 89)
            (nil))))
(note 8 7 9 2 NOTE_INSN_DELETED)
(insn 9 8 14 2 (set (reg:V8SI 82 [ _2 ])
        (vec_merge:V8SI (const_vector:V8SI [
                    (const_int -1 [0xffffffffffffffff]) repeated x8
                ])
            (const_vector:V8SI [
                    (const_int 0 [0]) repeated x8
                ])
            (reg:QI 86))) "j.c":4:14 2705 {*avx512vl_cvtmask2dv8si}
     (expr_list:REG_DEAD (reg:QI 86)
        (nil)))


Note there's a data dependency between them.  insn 7 feeds insn 9.  When
there's a data dependency, combiner patterns are usually the better
choice than peepholes.  I think you'd be looking to match something
likethis (from the . combine dump):

(set (reg:V8SI 82 [ _2 ])
    (vec_merge:V8SI (const_vector:V8SI [
                (const_int -1 [0xffffffffffffffff]) repeated x8
            ])
        (const_vector:V8SI [
                (const_int 0 [0]) repeated x8
            ])
        (unspec:QI [
                (reg:V8SF 90)
                (reg:V8SF 89)
                (const_int 2 [0x2])
            ] UNSPEC_PCMP)))


Jeff


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector
  2020-11-17  0:05 ` Jeff Law
@ 2020-11-17  3:10   ` Hongtao Liu
  2020-11-30 16:38     ` Jeff Law
  0 siblings, 1 reply; 7+ messages in thread
From: Hongtao Liu @ 2020-11-17  3:10 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches

On Tue, Nov 17, 2020 at 8:05 AM Jeff Law <law@redhat.com> wrote:
>
>
> On 9/2/20 3:34 AM, Hongtao Liu via Gcc-patches wrote:
> > Hi:
> >   Add define_peephole2 to eliminate potential redundant conversion
> > from mask to vector.
> >   Bootstrap is ok, regression test is ok for i386/x86-64 backend.
> >   Ok for trunk?
> >
> > gcc/ChangeLog:
> >         PR target/96891
> >         * config/i386/sse.md (VI_128_256): New mode iterator.
> >         (define_peephole2): Lower avx512 vector compare to avx version
> >         when dest is vector.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/i386/avx512bw-pr96891-1.c: New test.
> >         * gcc.target/i386/avx512f-pr96891-1.c: New test.
> >         * gcc.target/i386/avx512f-pr96891-2.c: New test.
>
> Aren't these the two insns in question:
>
>
> (insn 7 4 8 2 (set (reg:QI 86)
>         (unspec:QI [
>                 (reg:V8SF 90)
>                 (reg:V8SF 89)
>                 (const_int 2 [0x2])
>             ] UNSPEC_PCMP)) "j.c":4:14 1911 {avx512vl_cmpv8sf3}
>      (expr_list:REG_DEAD (reg:V8SF 90)
>         (expr_list:REG_DEAD (reg:V8SF 89)
>             (nil))))
> (note 8 7 9 2 NOTE_INSN_DELETED)
> (insn 9 8 14 2 (set (reg:V8SI 82 [ _2 ])
>         (vec_merge:V8SI (const_vector:V8SI [
>                     (const_int -1 [0xffffffffffffffff]) repeated x8
>                 ])
>             (const_vector:V8SI [
>                     (const_int 0 [0]) repeated x8
>                 ])
>             (reg:QI 86))) "j.c":4:14 2705 {*avx512vl_cvtmask2dv8si}
>      (expr_list:REG_DEAD (reg:QI 86)
>         (nil)))
>
>
> Note there's a data dependency between them.  insn 7 feeds insn 9.  When
> there's a data dependency, combiner patterns are usually the better
> choice than peepholes.  I think you'd be looking to match something
> likethis (from the . combine dump):
>
> (set (reg:V8SI 82 [ _2 ])
>     (vec_merge:V8SI (const_vector:V8SI [
>                 (const_int -1 [0xffffffffffffffff]) repeated x8
>             ])
>         (const_vector:V8SI [
>                 (const_int 0 [0]) repeated x8
>             ])
>         (unspec:QI [
>                 (reg:V8SF 90)
>                 (reg:V8SF 89)
>                 (const_int 2 [0x2])
>             ] UNSPEC_PCMP)))
>
>
> Jeff
>

Yes, as discussed in [1], maybe it's better to refactor avx512 integer
mask with VnBImode. Then unspec_pcmp could be dropped and simplify_rtx
could handle vector comparison more effectively.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521#c4
-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector
  2020-11-17  3:10   ` Hongtao Liu
@ 2020-11-30 16:38     ` Jeff Law
  2021-01-06  3:34       ` Hongtao Liu
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff Law @ 2020-11-30 16:38 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: GCC Patches



On 11/16/20 8:10 PM, Hongtao Liu wrote:
> On Tue, Nov 17, 2020 at 8:05 AM Jeff Law <law@redhat.com> wrote:
>>
>> On 9/2/20 3:34 AM, Hongtao Liu via Gcc-patches wrote:
>>> Hi:
>>>   Add define_peephole2 to eliminate potential redundant conversion
>>> from mask to vector.
>>>   Bootstrap is ok, regression test is ok for i386/x86-64 backend.
>>>   Ok for trunk?
>>>
>>> gcc/ChangeLog:
>>>         PR target/96891
>>>         * config/i386/sse.md (VI_128_256): New mode iterator.
>>>         (define_peephole2): Lower avx512 vector compare to avx version
>>>         when dest is vector.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>         * gcc.target/i386/avx512bw-pr96891-1.c: New test.
>>>         * gcc.target/i386/avx512f-pr96891-1.c: New test.
>>>         * gcc.target/i386/avx512f-pr96891-2.c: New test.
>> Aren't these the two insns in question:
>>
>>
>> (insn 7 4 8 2 (set (reg:QI 86)
>>         (unspec:QI [
>>                 (reg:V8SF 90)
>>                 (reg:V8SF 89)
>>                 (const_int 2 [0x2])
>>             ] UNSPEC_PCMP)) "j.c":4:14 1911 {avx512vl_cmpv8sf3}
>>      (expr_list:REG_DEAD (reg:V8SF 90)
>>         (expr_list:REG_DEAD (reg:V8SF 89)
>>             (nil))))
>> (note 8 7 9 2 NOTE_INSN_DELETED)
>> (insn 9 8 14 2 (set (reg:V8SI 82 [ _2 ])
>>         (vec_merge:V8SI (const_vector:V8SI [
>>                     (const_int -1 [0xffffffffffffffff]) repeated x8
>>                 ])
>>             (const_vector:V8SI [
>>                     (const_int 0 [0]) repeated x8
>>                 ])
>>             (reg:QI 86))) "j.c":4:14 2705 {*avx512vl_cvtmask2dv8si}
>>      (expr_list:REG_DEAD (reg:QI 86)
>>         (nil)))
>>
>>
>> Note there's a data dependency between them.  insn 7 feeds insn 9.  When
>> there's a data dependency, combiner patterns are usually the better
>> choice than peepholes.  I think you'd be looking to match something
>> likethis (from the . combine dump):
>>
>> (set (reg:V8SI 82 [ _2 ])
>>     (vec_merge:V8SI (const_vector:V8SI [
>>                 (const_int -1 [0xffffffffffffffff]) repeated x8
>>             ])
>>         (const_vector:V8SI [
>>                 (const_int 0 [0]) repeated x8
>>             ])
>>         (unspec:QI [
>>                 (reg:V8SF 90)
>>                 (reg:V8SF 89)
>>                 (const_int 2 [0x2])
>>             ] UNSPEC_PCMP)))
>>
>>
>> Jeff
>>
> Yes, as discussed in [1], maybe it's better to refactor avx512 integer
> mask with VnBImode. Then unspec_pcmp could be dropped and simplify_rtx
> could handle vector comparison more effectively.
>
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521#c4
Thanks for the pointer.   I didn't realize this patch was essentially
abandoned.

Jeff


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector
  2020-11-30 16:38     ` Jeff Law
@ 2021-01-06  3:34       ` Hongtao Liu
  2021-01-21 13:47         ` Jakub Jelinek
  0 siblings, 1 reply; 7+ messages in thread
From: Hongtao Liu @ 2021-01-06  3:34 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1636 bytes --]

> >>
> >> Note there's a data dependency between them.  insn 7 feeds insn 9.  When
> >> there's a data dependency, combiner patterns are usually the better
> >> choice than peepholes.  I think you'd be looking to match something
> >> likethis (from the . combine dump):
> >>

Using combiner patterns, details is discussed in PR98348

Boottrapped and regtested on x86_64-linux-gnu{-m32,} for both GCC10 and trunk.
gcc/ChangeLog:

        PR target/96891
        PR target/98348
        * config/i386/sse.md (VI_128_256): New mode iterator.
        (*avx_cmp<mode>3_1, *avx_cmp<mode>3_2, *avx_cmp<mode>3_3,
         *avx_cmp<mode>3_4, *avx2_eq<mode>3, *avx2_pcmp<mode>3_1,
         *avx2_pcmp<mode>3_2, *avx2_gt<mode>3): New
        define_insn_and_split to lower avx512 vector comparison to avx
        version when dest is vector.
        (*<avx512>_cmp<mode>3,*<avx512>_cmp<mode>3,*<avx512>_ucmp<mode>3):
        define_insn_and_split for negating the comparison result.
        * config/i386/predicates.md (float_vector_all_ones_operand):
        New predicate.
        * config/i386/i386-expand.c (ix86_expand_sse_movcc): Use
        general NOT operator without UNSPEC_MASKOP.

gcc/testsuite/ChangeLog:

        PR target/96891
        PR target/98348
        * gcc.target/i386/avx512bw-pr96891-1.c: New test.
        * gcc.target/i386/avx512f-pr96891-1.c: New test.
        * gcc.target/i386/avx512f-pr96891-2.c: New test.
        * gcc.target/i386/avx512f-pr96891-3.c: New test.
        * g++.target/i386/avx512f-pr96891-1.C: New test.
        * gcc.target/i386/bitwise_mask_op-3.c: Adjust testcase.

>
> Jeff
>



--
BR,
Hongtao

[-- Attachment #2: 0001-Lower-AVX512-vector-comparison-to-AVX-version-when-d.patch --]
[-- Type: text/x-patch, Size: 24177 bytes --]

From 240c830b3d35f7571da876a21aa71e263c3abe80 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Fri, 18 Dec 2020 15:56:06 +0800
Subject: [PATCH] Lower AVX512 vector comparison to AVX version when dest is
 vector.

gcc/ChangeLog:

	PR target/96891
	PR target/98348
	* config/i386/sse.md (VI_128_256): New mode iterator.
	(*avx_cmp<mode>3_1, *avx_cmp<mode>3_2, *avx_cmp<mode>3_3,
	 *avx_cmp<mode>3_4, *avx2_eq<mode>3, *avx2_pcmp<mode>3_1,
	 *avx2_pcmp<mode>3_2, *avx2_gt<mode>3): New
	define_insn_and_split to lower avx512 vector comparison to avx
	version when dest is vector.
	(*<avx512>_cmp<mode>3,*<avx512>_cmp<mode>3,*<avx512>_ucmp<mode>3):
	define_insn_and_split for negating the comparison result.
	* config/i386/predicates.md (float_vector_all_ones_operand):
	New predicate.
	* config/i386/i386-expand.c (ix86_expand_sse_movcc): Use
	general NOT operator without UNSPEC_MASKOP.

gcc/testsuite/ChangeLog:

	PR target/96891
	PR target/98348
	* gcc.target/i386/avx512bw-pr96891-1.c: New test.
	* gcc.target/i386/avx512f-pr96891-1.c: New test.
	* gcc.target/i386/avx512f-pr96891-2.c: New test.
	* gcc.target/i386/avx512f-pr96891-3.c: New test.
	* g++.target/i386/avx512f-pr96891-1.C: New test.
	* gcc.target/i386/bitwise_mask_op-3.c: Adjust testcase.
---
 gcc/config/i386/i386-expand.c                 |  14 +-
 gcc/config/i386/predicates.md                 |  47 ++++
 gcc/config/i386/sse.md                        | 261 +++++++++++++++++-
 .../g++.target/i386/avx512f-pr96891-1.C       |  37 +++
 .../gcc.target/i386/avx512bw-pr96891-1.c      |  75 +++++
 .../gcc.target/i386/avx512f-pr96891-1.c       |  40 +++
 .../gcc.target/i386/avx512f-pr96891-2.c       |  30 ++
 .../gcc.target/i386/avx512f-pr96891-3.c       |  39 +++
 .../gcc.target/i386/bitwise_mask_op-3.c       |   1 -
 9 files changed, 531 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/avx512f-pr96891-1.C
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr96891-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr96891-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr96891-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr96891-3.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 6e08fd32726..b4f8b275718 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -3568,17 +3568,11 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
 		  ? force_reg (mode, op_false) : op_false);
       if (op_true == CONST0_RTX (mode))
 	{
-	  rtx (*gen_not) (rtx, rtx);
-	  switch (cmpmode)
-	    {
-	    case E_QImode: gen_not = gen_knotqi; break;
-	    case E_HImode: gen_not = gen_knothi; break;
-	    case E_SImode: gen_not = gen_knotsi; break;
-	    case E_DImode: gen_not = gen_knotdi; break;
-	    default: gcc_unreachable ();
-	    }
 	  rtx n = gen_reg_rtx (cmpmode);
-	  emit_insn (gen_not (n, cmp));
+	  if (cmpmode == E_DImode && !TARGET_64BIT)
+	    emit_insn (gen_knotdi (n, cmp));
+	  else
+	    emit_insn (gen_rtx_SET (n, gen_rtx_fmt_e (NOT, cmpmode, cmp)));
 	  cmp = n;
 	  /* Reverse op_true op_false.  */
 	  std::swap (op_true, op_false);
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index be5aaa4d76f..0bb0729e933 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -1069,6 +1069,53 @@ (define_predicate "zero_extended_scalar_load_operand"
   return true;
 })
 
+/* Return true if operand is a float vector constant that is all ones. */
+(define_predicate "float_vector_all_ones_operand"
+  (match_code "const_vector,mem")
+{
+  mode = GET_MODE (op);
+  if (!FLOAT_MODE_P (mode)
+      || (MEM_P (op)
+	  && (!SYMBOL_REF_P (XEXP (op, 0))
+	      || !CONSTANT_POOL_ADDRESS_P (XEXP (op, 0)))))
+    return false;
+
+  if (MEM_P (op))
+    {
+      op = get_pool_constant (XEXP (op, 0));
+      if (GET_CODE (op) != CONST_VECTOR)
+	return false;
+
+      if (GET_MODE (op) != mode
+	 && INTEGRAL_MODE_P (GET_MODE (op))
+	 && op == CONSTM1_RTX (GET_MODE (op)))
+	return true;
+    }
+
+  rtx first = XVECEXP (op, 0, 0);
+  for (int i = 1; i != GET_MODE_NUNITS (GET_MODE (op)); i++)
+    {
+      rtx tmp = XVECEXP (op, 0, i);
+      if (!rtx_equal_p (tmp, first))
+	return false;
+    }
+  if (GET_MODE (first) == E_SFmode)
+    {
+      long l;
+      REAL_VALUE_TO_TARGET_SINGLE (*CONST_DOUBLE_REAL_VALUE (first), l);
+      return (l & 0xffffffff) == 0xffffffff;
+    }
+  else if (GET_MODE (first) == E_DFmode)
+    {
+      long l[2];
+      REAL_VALUE_TO_TARGET_DOUBLE (*CONST_DOUBLE_REAL_VALUE (first), l);
+      return ((l[0] & 0xffffffff) == 0xffffffff
+	     && (l[1] & 0xffffffff) == 0xffffffff);
+    }
+  else
+    return false;
+})
+
 /* Return true if operand is a vector constant that is all ones. */
 (define_predicate "vector_all_ones_operand"
   (and (match_code "const_vector")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d84103807ff..a7ac8e8ae5e 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -648,6 +648,9 @@ (define_mode_iterator VI_128 [V16QI V8HI V4SI V2DI])
 ;; All 256bit vector integer modes
 (define_mode_iterator VI_256 [V32QI V16HI V8SI V4DI])
 
+;; All 128 and 256bit vector integer modes
+(define_mode_iterator VI_128_256 [V16QI V8HI V4SI V2DI V32QI V16HI V8SI V4DI])
+
 ;; Various 128bit vector integer mode combinations
 (define_mode_iterator VI12_128 [V16QI V8HI])
 (define_mode_iterator VI14_128 [V16QI V4SI])
@@ -2965,6 +2968,102 @@ (define_insn "avx_cmp<mode>3"
    (set_attr "prefix" "vex")
    (set_attr "mode" "<MODE>")])
 
+(define_insn_and_split "*avx_cmp<mode>3_1"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(vec_merge:<sseintvecmode>
+	  (match_operand:<sseintvecmode> 1 "vector_all_ones_operand")
+	  (match_operand:<sseintvecmode> 2 "const0_operand")
+	  (unspec:<avx512fmaskmode>
+	    [(match_operand:VF_128_256 3 "register_operand")
+	     (match_operand:VF_128_256 4 "nonimmediate_operand")
+	     (match_operand:SI 5 "const_0_to_31_operand")]
+	     UNSPEC_PCMP)))]
+  "TARGET_AVX512VL && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 6)
+	(unspec:VF_128_256
+	  [(match_dup 3)
+	   (match_dup 4)
+	   (match_dup 5)]
+	  UNSPEC_PCMP))
+   (set (match_dup 0) (match_dup 7))]
+{
+  operands[6] = gen_reg_rtx (<MODE>mode);
+  operands[7]
+    = lowpart_subreg (GET_MODE (operands[0]), operands[6], <MODE>mode);
+})
+
+(define_insn_and_split "*avx_cmp<mode>3_2"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(vec_merge:<sseintvecmode>
+	  (match_operand:<sseintvecmode> 1 "vector_all_ones_operand")
+	  (match_operand:<sseintvecmode> 2 "const0_operand")
+	  (not:<avx512fmaskmode>
+	    (unspec:<avx512fmaskmode>
+	      [(match_operand:VF_128_256 3 "register_operand")
+	       (match_operand:VF_128_256 4 "nonimmediate_operand")
+	       (match_operand:SI 5 "const_0_to_31_operand")]
+	       UNSPEC_PCMP))))]
+  "TARGET_AVX512VL && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 6)
+	(unspec:VF_128_256
+	  [(match_dup 3)
+	   (match_dup 4)
+	   (match_dup 5)]
+	  UNSPEC_PCMP))
+   (set (match_dup 0) (match_dup 7))]
+{
+  operands[5] = GEN_INT (INTVAL (operands[5]) ^ 4);
+  operands[6] = gen_reg_rtx (<MODE>mode);
+  operands[7]
+    = lowpart_subreg (GET_MODE (operands[0]), operands[6], <MODE>mode);
+})
+
+(define_insn_and_split "*avx_cmp<mode>3_3"
+  [(set (match_operand:VF_128_256 0 "register_operand")
+	(vec_merge:VF_128_256
+	  (match_operand:VF_128_256 1 "float_vector_all_ones_operand")
+	  (match_operand:VF_128_256 2 "const0_operand")
+	  (unspec:<avx512fmaskmode>
+	    [(match_operand:VF_128_256 3 "register_operand")
+	     (match_operand:VF_128_256 4 "nonimmediate_operand")
+	     (match_operand:SI 5 "const_0_to_31_operand")]
+	     UNSPEC_PCMP)))]
+  "TARGET_AVX512VL && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:VF_128_256
+	  [(match_dup 3)
+	   (match_dup 4)
+	   (match_dup 5)]
+	  UNSPEC_PCMP))])
+
+(define_insn_and_split "*avx_cmp<mode>3_4"
+  [(set (match_operand:VF_128_256 0 "register_operand")
+	(vec_merge:VF_128_256
+	  (match_operand:VF_128_256 1 "float_vector_all_ones_operand")
+	  (match_operand:VF_128_256 2 "const0_operand")
+	  (not:<avx512fmaskmode>
+	    (unspec:<avx512fmaskmode>
+	      [(match_operand:VF_128_256 3 "register_operand")
+	       (match_operand:VF_128_256 4 "nonimmediate_operand")
+	       (match_operand:SI 5 "const_0_to_31_operand")]
+	       UNSPEC_PCMP))))]
+  "TARGET_AVX512VL && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:VF_128_256
+	  [(match_dup 3)
+	   (match_dup 4)
+	   (match_dup 5)]
+	  UNSPEC_PCMP))]
+  "operands[5] = GEN_INT (INTVAL (operands[5]) ^ 4);")
+
 (define_insn "avx_vmcmp<mode>3"
   [(set (match_operand:VF_128 0 "register_operand" "=x")
 	(vec_merge:VF_128
@@ -3056,6 +3155,25 @@ (define_insn "<avx512>_cmp<mode>3<mask_scalar_merge_name><round_saeonly_name>"
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_insn_and_split "*<avx512>_cmp<mode>3"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(not:<avx512fmaskmode>
+	  (unspec:<avx512fmaskmode>
+	    [(match_operand:V48_AVX512VL 1 "register_operand")
+	     (match_operand:V48_AVX512VL 2 "nonimmediate_operand")
+	     (match_operand:SI 3 "<cmp_imm_predicate>" "n")]
+	    UNSPEC_PCMP)))]
+  "TARGET_AVX512F && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+  	(unspec:<avx512fmaskmode>
+	  [(match_dup 1)
+	   (match_dup 2)
+	   (match_dup 4)]
+	   UNSPEC_PCMP))]
+  "operands[4] = GEN_INT (INTVAL (operands[3]) ^ 4);")
+
 (define_insn "<avx512>_cmp<mode>3<mask_scalar_merge_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(unspec:<avx512fmaskmode>
@@ -3070,6 +3188,28 @@ (define_insn "<avx512>_cmp<mode>3<mask_scalar_merge_name>"
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_int_iterator UNSPEC_PCMP_ITER
+  [UNSPEC_PCMP UNSPEC_UNSIGNED_PCMP])
+
+(define_insn_and_split "*<avx512>_cmp<mode>3"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(not:<avx512fmaskmode>
+	  (unspec:<avx512fmaskmode>
+	    [(match_operand:VI12_AVX512VL 1 "register_operand")
+	     (match_operand:VI12_AVX512VL 2 "nonimmediate_operand")
+	     (match_operand:SI 3 "<cmp_imm_predicate>")]
+	    UNSPEC_PCMP_ITER)))]
+  "TARGET_AVX512BW && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:<avx512fmaskmode>
+	  [(match_dup 1)
+	   (match_dup 2)
+	   (match_dup 4)]
+	   UNSPEC_PCMP_ITER))]
+  "operands[4] = GEN_INT (INTVAL (operands[3]) ^ 4);")
+
 (define_insn "<avx512>_ucmp<mode>3<mask_scalar_merge_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(unspec:<avx512fmaskmode>
@@ -3098,8 +3238,24 @@ (define_insn "<avx512>_ucmp<mode>3<mask_scalar_merge_name>"
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_int_iterator UNSPEC_PCMP_ITER
-  [UNSPEC_PCMP UNSPEC_UNSIGNED_PCMP])
+(define_insn_and_split "*<avx512>_ucmp<mode>3"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(not:<avx512fmaskmode>
+	  (unspec:<avx512fmaskmode>
+	    [(match_operand:VI48_AVX512VL 1 "register_operand")
+	     (match_operand:VI48_AVX512VL 2 "nonimmediate_operand")
+	     (match_operand:SI 3 "const_0_to_7_operand")]
+	    UNSPEC_UNSIGNED_PCMP)))]
+  "TARGET_AVX512F && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:<avx512fmaskmode>
+	  [(match_dup 1)
+	   (match_dup 2)
+	   (match_dup 4)]
+	   UNSPEC_UNSIGNED_PCMP))]
+  "operands[4] = GEN_INT (INTVAL (operands[3]) ^ 4);")
 
 (define_int_attr pcmp_signed_mask
   [(UNSPEC_PCMP "3") (UNSPEC_UNSIGNED_PCMP "1")])
@@ -12699,6 +12855,89 @@ (define_insn "*avx2_eq<mode>3"
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
+(define_insn_and_split "*avx2_eq<mode>3"
+ [(set (match_operand:VI_128_256  0 "register_operand")
+	(vec_merge:VI_128_256
+	  (match_operand:VI_128_256 1 "vector_all_ones_operand")
+	  (match_operand:VI_128_256 2 "const0_operand")
+	  (unspec:<avx512fmaskmode>
+	    [(match_operand:VI_128_256 3 "nonimmediate_operand")
+	     (match_operand:VI_128_256 4 "nonimmediate_operand")]
+	     UNSPEC_MASKED_EQ)))]
+  "TARGET_AVX512VL && ix86_pre_reload_split ()
+  && !(MEM_P (operands[3]) && MEM_P (operands[4]))"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(eq:VI_128_256
+	   (match_dup 3)
+	   (match_dup 4)))])
+
+(define_insn_and_split "*avx2_pcmp<mode>3_1"
+ [(set (match_operand:VI_128_256  0 "register_operand")
+	(vec_merge:VI_128_256
+	  (match_operand:VI_128_256 1 "vector_all_ones_operand")
+	  (match_operand:VI_128_256 2 "const0_operand")
+	  (unspec:<avx512fmaskmode>
+	    [(match_operand:VI_128_256 3 "nonimmediate_operand")
+	     (match_operand:VI_128_256 4 "nonimmediate_operand")
+	     (match_operand:SI 5 "const_0_to_7_operand")]
+	     UNSPEC_PCMP)))]
+  "TARGET_AVX512VL && ix86_pre_reload_split ()
+  /* EQ is commutative.  */
+   && ((INTVAL (operands[5]) == 0
+	&& !(MEM_P (operands[3]) && MEM_P (operands[4])))
+	  /* NLE aka GT, 3 must be register.  */
+       || (INTVAL (operands[5]) == 6
+	   && !MEM_P (operands[3]))
+	  /* LT, 4 must be register and we swap operands.  */
+       || (INTVAL (operands[5]) == 1
+	   && !MEM_P (operands[4])))"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  if (INTVAL (operands[5]) == 1)
+    std::swap (operands[3], operands[4]);
+  enum rtx_code code = INTVAL (operands[5]) ? GT : EQ;
+  emit_move_insn (operands[0], gen_rtx_fmt_ee (code, <MODE>mode,
+					       operands[3], operands[4]));
+  DONE;
+})
+
+(define_insn_and_split "*avx2_pcmp<mode>3_2"
+ [(set (match_operand:VI_128_256  0 "register_operand")
+	(vec_merge:VI_128_256
+	  (match_operand:VI_128_256 1 "vector_all_ones_operand")
+	  (match_operand:VI_128_256 2 "const0_operand")
+	  (not:<avx512fmaskmode>
+	    (unspec:<avx512fmaskmode>
+	      [(match_operand:VI_128_256 3 "nonimmediate_operand")
+	       (match_operand:VI_128_256 4 "nonimmediate_operand")
+	       (match_operand:SI 5 "const_0_to_7_operand")]
+	       UNSPEC_PCMP))))]
+  "TARGET_AVX512VL && ix86_pre_reload_split ()
+   /* NE is commutative.  */
+   && ((INTVAL (operands[5]) == 4
+	&& !(MEM_P (operands[3]) && MEM_P (operands[4])))
+	  /* LE, 3 must be register.  */
+       || (INTVAL (operands[5]) == 2
+	   && !MEM_P (operands[3]))
+	  /* NLT aka GE, 4 must be register and we swap operands.  */
+       || (INTVAL (operands[5]) == 5
+	   && !MEM_P (operands[4])))"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  if (INTVAL (operands[5]) == 5)
+    std::swap (operands[3], operands[4]);
+  enum rtx_code code = INTVAL (operands[5]) != 4 ? GT : EQ;
+  emit_move_insn (operands[0], gen_rtx_fmt_ee (code, <MODE>mode,
+					       operands[3], operands[4]));
+  DONE;
+})
+
 (define_expand "<avx512>_eq<mode>3<mask_scalar_merge_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
 	(unspec:<avx512fmaskmode>
@@ -12823,6 +13062,24 @@ (define_insn "avx2_gt<mode>3"
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
+(define_insn_and_split "*avx2_gt<mode>3"
+ [(set (match_operand:VI_128_256  0 "register_operand")
+	(vec_merge:VI_128_256
+	  (match_operand:VI_128_256 1 "vector_all_ones_operand")
+	  (match_operand:VI_128_256 2 "const0_operand")
+	  (unspec:<avx512fmaskmode>
+	    [(match_operand:VI_128_256 3 "register_operand")
+	     (match_operand:VI_128_256 4 "nonimmediate_operand")]
+	     UNSPEC_MASKED_GT)))]
+  "TARGET_AVX512VL
+  && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(gt:VI_128_256
+	   (match_dup 3)
+	   (match_dup 4)))])
+
 (define_insn "<avx512>_gt<mode>3<mask_scalar_merge_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(unspec:<avx512fmaskmode>
diff --git a/gcc/testsuite/g++.target/i386/avx512f-pr96891-1.C b/gcc/testsuite/g++.target/i386/avx512f-pr96891-1.C
new file mode 100644
index 00000000000..969a085b900
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/avx512f-pr96891-1.C
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -O2" } */
+/* { dg-final { scan-assembler-not "%k\[0-7\]" } } */
+
+typedef float v4sf __attribute__ ((vector_size (16)));
+typedef float v8sf __attribute__ ((vector_size (32)));
+typedef double v2df __attribute__ ((vector_size (16)));
+typedef double v4df __attribute__ ((vector_size (32)));
+
+
+v4sf
+foo_v4sf (v4sf x)
+{
+  const union U { unsigned u; float f; } u = { -1U };
+  return x > 0.0f ? u.f : 0.0f;
+}
+
+v8sf
+foo_v8sf (v8sf x)
+{
+  const union U { unsigned u; float f; } u = { -1U };
+  return x > 0.0f ? u.f : 0.0f;
+}
+
+v2df
+foo_v2df (v2df x)
+{
+  const union U { unsigned long long u; double df; } u = { -1ULL };
+  return x > 0.0 ? u.df : 0.0;
+}
+
+v4df
+foo_v4df (v4df x)
+{
+  const union U { unsigned long long u; double df; } u = { -1ULL };
+  return x > 0.0 ? u.df : 0.0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-pr96891-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-pr96891-1.c
new file mode 100644
index 00000000000..d899cebd0d6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-pr96891-1.c
@@ -0,0 +1,75 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-not "%k\[0-7\]" } } */
+
+typedef char v16qi __attribute__ ((vector_size (16)));
+typedef char v32qi __attribute__ ((vector_size (32)));
+typedef short v8hi __attribute__ ((vector_size (16)));
+typedef short v16hi __attribute__ ((vector_size (32)));
+typedef int v4si __attribute__ ((vector_size (16)));
+typedef int v8si __attribute__ ((vector_size (32)));
+typedef long long v2di __attribute__ ((vector_size (16)));
+typedef long long v4di __attribute__ ((vector_size (32)));
+
+#define FOO(VTYPE, OPNAME, OP)			\
+  VTYPE						\
+  foo_##VTYPE##_##OPNAME (VTYPE a, VTYPE b)	\
+  {						\
+    return a OP b;				\
+  }						\
+
+#define FOO1(VTYPE, OPNAME, OP)			\
+  VTYPE						\
+  foo1_##VTYPE##_##OPNAME (VTYPE a, VTYPE b)	\
+  {						\
+    return ~(a OP b);				\
+  }						\
+
+FOO (v16qi, eq, ==)
+FOO1 (v16qi, neq, !=)
+FOO (v16qi, gt, >)
+FOO (v16qi, lt, <)
+FOO1 (v16qi, le, <=)
+FOO1 (v16qi, ge, >=)
+FOO (v32qi, eq, ==)
+FOO1 (v32qi, neq, !=)
+FOO (v32qi, gt, >)
+FOO (v32qi, lt, <)
+FOO1 (v32qi, le, <=)
+FOO1 (v32qi, ge, >=)
+FOO (v8hi, eq, ==)
+FOO1 (v8hi, neq, !=)
+FOO (v8hi, gt, >)
+FOO (v8hi, lt, <)
+FOO1 (v8hi, le, <=)
+FOO1 (v8hi, ge, >=)
+FOO (v16hi, eq, ==)
+FOO1 (v16hi, neq, !=)
+FOO (v16hi, gt, >)
+FOO (v16hi, lt, <)
+FOO1 (v16hi, le, <=)
+FOO1 (v16hi, ge, >=)
+FOO (v4si, eq, ==)
+FOO1 (v4si, neq, !=)
+FOO (v4si, gt, >)
+FOO (v4si, lt, <)
+FOO1 (v4si, le, <=)
+FOO1 (v4si, ge, >=)
+FOO (v8si, eq, ==)
+FOO1 (v8si, neq, !=)
+FOO (v8si, gt, >)
+FOO (v8si, lt, <)
+FOO1 (v8si, le, <=)
+FOO1 (v8si, ge, >=)
+FOO (v2di, eq, ==)
+FOO1 (v2di, neq, !=)
+FOO (v2di, gt, >)
+FOO (v2di, lt, <)
+FOO1 (v2di, le, <=)
+FOO1 (v2di, ge, >=)
+FOO (v4di, eq, ==)
+FOO1 (v4di, neq, !=)
+FOO (v4di, gt, >)
+FOO (v4di, lt, >)
+FOO1 (v4di, le, <=)
+FOO1 (v4di, ge, >=)
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr96891-1.c b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-1.c
new file mode 100644
index 00000000000..48ba943e151
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-1.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -O2" } */
+/* { dg-final { scan-assembler-not "%k\[0-7\]" } } */
+
+typedef float v4sf __attribute__ ((vector_size (16)));
+typedef float v8sf __attribute__ ((vector_size (32)));
+typedef double v2df __attribute__ ((vector_size (16)));
+typedef double v4df __attribute__ ((vector_size (32)));
+
+#define FOO(VTYPE, OPNAME, OP)			\
+  VTYPE						\
+  foo_##VTYPE##_##OPNAME (VTYPE a, VTYPE b)	\
+  {						\
+    return a OP b;				\
+  }						\
+
+FOO (v4sf, eq, ==)
+FOO (v4sf, neq, !=)
+FOO (v4sf, gt, >)
+FOO (v4sf, ge, >=)
+FOO (v4sf, lt, <)
+FOO (v4sf, le, <=)
+FOO (v8sf, eq, ==)
+FOO (v8sf, neq, !=)
+FOO (v8sf, gt, >)
+FOO (v8sf, ge, >=)
+FOO (v8sf, lt, <)
+FOO (v8sf, le, <=)
+FOO (v2df, eq, ==)
+FOO (v2df, neq, !=)
+FOO (v2df, gt, >)
+FOO (v2df, ge, >=)
+FOO (v2df, lt, <)
+FOO (v2df, le, <=)
+FOO (v4df, eq, ==)
+FOO (v4df, neq, !=)
+FOO (v4df, gt, >)
+FOO (v4df, ge, >=)
+FOO (v4df, lt, <)
+FOO (v4df, le, <=)
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr96891-2.c b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-2.c
new file mode 100644
index 00000000000..5192a00e0f4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-2.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -mavx512bw -mavx512dq -O2" } */
+/* { dg-final { scan-assembler-not "%k\[0-7\]" } } */
+
+#include<immintrin.h>
+
+#define FOO(VTYPE,PREFIX,SUFFIX,OPNAME,MASK,LEN)			\
+  VTYPE								\
+  foo_##LEN##_##SUFFIX##_##OPNAME (VTYPE a, VTYPE b)		\
+  {									\
+    MASK m = _mm##PREFIX##_cmp##OPNAME##_##SUFFIX##_mask (a, b);	\
+    return _mm##PREFIX##_movm_##SUFFIX (m);				\
+  }									\
+
+FOO (__m128i,, epi8, eq, __mmask16, 128);
+FOO (__m128i,, epi16, eq, __mmask8, 128);
+FOO (__m128i,, epi32, eq, __mmask8, 128);
+FOO (__m128i,, epi64, eq, __mmask8, 128);
+FOO (__m128i,, epi8, gt, __mmask16, 128);
+FOO (__m128i,, epi16, gt, __mmask8, 128);
+FOO (__m128i,, epi32, gt, __mmask8, 128);
+FOO (__m128i,, epi64, gt, __mmask8, 128);
+FOO (__m256i, 256, epi8, eq, __mmask32, 256);
+FOO (__m256i, 256, epi16, eq, __mmask16, 256);
+FOO (__m256i, 256, epi32, eq, __mmask8, 256);
+FOO (__m256i, 256, epi64, eq, __mmask8, 256);
+FOO (__m256i, 256, epi8, gt, __mmask32, 256);
+FOO (__m256i, 256, epi16, gt, __mmask16, 256);
+FOO (__m256i, 256, epi32, gt, __mmask8, 256);
+FOO (__m256i, 256, epi64, gt, __mmask8, 256);
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr96891-3.c b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-3.c
new file mode 100644
index 00000000000..1cf18f2407b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-3.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -mavx512bw -mavx512dq -O2 -masm=att" } */
+/* { dg-final { scan-assembler-not {not[bwlqd]\]} } } */
+/* { dg-final { scan-assembler-times {(?n)vpcmp[bwdq][ \t]*\$5} 4} } */
+/* { dg-final { scan-assembler-times {(?n)vpcmp[bwdq][ \t]*\$6} 4} } */
+/* { dg-final { scan-assembler-times {(?n)vpcmp[bwdq][ \t]*\$7} 4} } */
+/* { dg-final { scan-assembler-times {(?n)vcmpp[sd][ \t]*\$5} 2} } */
+/* { dg-final { scan-assembler-times {(?n)vcmpp[sd][ \t]*\$6} 2} } */
+/* { dg-final { scan-assembler-times {(?n)vcmpp[sd][ \t]*\$7} 2} } */
+
+#include<immintrin.h>
+
+#define FOO(VTYPE,PREFIX,SUFFIX,MASK,LEN,CMPIMM)			\
+  MASK								\
+  foo_##LEN##_##SUFFIX##_##CMPIMM (VTYPE a, VTYPE b)				\
+  {									\
+    MASK m = _mm##PREFIX##_cmp_##SUFFIX##_mask (a, b, CMPIMM);		\
+    return ~m;								\
+  }									\
+
+FOO (__m128i,, epi8, __mmask16, 128, 1);
+FOO (__m128i,, epi16, __mmask8, 128, 1);
+FOO (__m128i,, epi32, __mmask8, 128, 1);
+FOO (__m128i,, epi64, __mmask8, 128, 1);
+FOO (__m256i, 256, epi8, __mmask32, 256, 2);
+FOO (__m256i, 256, epi16, __mmask16, 256, 2);
+FOO (__m256i, 256, epi32, __mmask8, 256, 2);
+FOO (__m256i, 256, epi64, __mmask8, 256, 2);
+FOO (__m512i, 512, epi8, __mmask64, 512, 3);
+FOO (__m512i, 512, epi16, __mmask32, 512, 3);
+FOO (__m512i, 512, epi32, __mmask16, 512, 3);
+FOO (__m512i, 512, epi64, __mmask8, 512, 3);
+
+FOO (__m128,, ps, __mmask8, 128, 1);
+FOO (__m128d,, pd, __mmask8, 128, 1);
+FOO (__m256, 256, ps, __mmask8, 256, 2);
+FOO (__m256d, 256, pd, __mmask8, 256, 2);
+FOO (__m512, 512, ps, __mmask16, 512, 3);
+FOO (__m512d, 512, pd, __mmask8, 512, 3);
diff --git a/gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c b/gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c
index 18bf4f0d768..4a9078615aa 100644
--- a/gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c
+++ b/gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c
@@ -40,5 +40,4 @@ foo_andnb (__m512i a, __m512i b)
   foo = m1 & ~m2;
 }
 
-/* { dg-final { scan-assembler-times "knotb\[\t \]" "1" } }  */
 /* { dg-final { scan-assembler-times "kmovb\[\t \]" "4"} }  */
-- 
2.18.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector
  2021-01-06  3:34       ` Hongtao Liu
@ 2021-01-21 13:47         ` Jakub Jelinek
  0 siblings, 0 replies; 7+ messages in thread
From: Jakub Jelinek @ 2021-01-21 13:47 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Jeff Law, GCC Patches

On Wed, Jan 06, 2021 at 11:34:32AM +0800, Hongtao Liu via Gcc-patches wrote:
> > >>
> > >> Note there's a data dependency between them.  insn 7 feeds insn 9.  When
> > >> there's a data dependency, combiner patterns are usually the better
> > >> choice than peepholes.  I think you'd be looking to match something
> > >> likethis (from the . combine dump):
> > >>
> 
> Using combiner patterns, details is discussed in PR98348
> 
> Boottrapped and regtested on x86_64-linux-gnu{-m32,} for both GCC10 and trunk.
> gcc/ChangeLog:
> 
>         PR target/96891
>         PR target/98348
>         * config/i386/sse.md (VI_128_256): New mode iterator.
>         (*avx_cmp<mode>3_1, *avx_cmp<mode>3_2, *avx_cmp<mode>3_3,
>          *avx_cmp<mode>3_4, *avx2_eq<mode>3, *avx2_pcmp<mode>3_1,
>          *avx2_pcmp<mode>3_2, *avx2_gt<mode>3): New
>         define_insn_and_split to lower avx512 vector comparison to avx
>         version when dest is vector.
>         (*<avx512>_cmp<mode>3,*<avx512>_cmp<mode>3,*<avx512>_ucmp<mode>3):
>         define_insn_and_split for negating the comparison result.
>         * config/i386/predicates.md (float_vector_all_ones_operand):
>         New predicate.
>         * config/i386/i386-expand.c (ix86_expand_sse_movcc): Use
>         general NOT operator without UNSPEC_MASKOP.
> 
> gcc/testsuite/ChangeLog:
> 
>         PR target/96891
>         PR target/98348
>         * gcc.target/i386/avx512bw-pr96891-1.c: New test.
>         * gcc.target/i386/avx512f-pr96891-1.c: New test.
>         * gcc.target/i386/avx512f-pr96891-2.c: New test.
>         * gcc.target/i386/avx512f-pr96891-3.c: New test.
>         * g++.target/i386/avx512f-pr96891-1.C: New test.
>         * gcc.target/i386/bitwise_mask_op-3.c: Adjust testcase.

Ok for trunk.  I'd prefer not to backport it to GCC 10.

	Jakub


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-01-21 13:48 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-02  9:34 [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector Hongtao Liu
2020-09-02 12:20 ` H.J. Lu
2020-11-17  0:05 ` Jeff Law
2020-11-17  3:10   ` Hongtao Liu
2020-11-30 16:38     ` Jeff Law
2021-01-06  3:34       ` Hongtao Liu
2021-01-21 13:47         ` Jakub Jelinek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).