[PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Hongtao Liu <crazylht@gmail.com>
To: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector
Date: Wed, 2 Sep 2020 17:34:27 +0800	[thread overview]
Message-ID: <CAMZc-bz3nqJmZ-042cYAzTw7tquvwE0fc-MOG73s-r+C+BKX6Q@mail.gmail.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 592 bytes --]

Hi:
  Add define_peephole2 to eliminate potential redundant conversion
from mask to vector.
  Bootstrap is ok, regression test is ok for i386/x86-64 backend.
  Ok for trunk?

gcc/ChangeLog:
        PR target/96891
        * config/i386/sse.md (VI_128_256): New mode iterator.
        (define_peephole2): Lower avx512 vector compare to avx version
        when dest is vector.

gcc/testsuite/ChangeLog:

        * gcc.target/i386/avx512bw-pr96891-1.c: New test.
        * gcc.target/i386/avx512f-pr96891-1.c: New test.
        * gcc.target/i386/avx512f-pr96891-2.c: New test.

-- 
BR,
Hongtao

[-- Attachment #2: 0001-Lower-AVX512-vector-compare-to-AVX-version-when-dest.patch --]
[-- Type: text/x-patch, Size: 9491 bytes --]

From ba76432c08f47e4ecc1f355c0dfdea8908aaf9f4 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Wed, 2 Sep 2020 17:14:39 +0800
Subject: [PATCH] Lower AVX512 vector compare to AVX version when dest is
 vector.

gcc/ChangeLog:
	PR target/96891
	* config/i386/sse.md (VI_128_256): New mode iterator.
	(define_peephole2): Lower avx512 vector compare to avx version
	when dest is vector.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512bw-pr96891-1.c: New test.
	* gcc.target/i386/avx512f-pr96891-1.c: New test.
	* gcc.target/i386/avx512f-pr96891-2.c: New test.
---
 gcc/config/i386/sse.md                        | 93 +++++++++++++++++++
 .../gcc.target/i386/avx512bw-pr96891-1.c      | 36 +++++++
 .../gcc.target/i386/avx512f-pr96891-1.c       | 40 ++++++++
 .../gcc.target/i386/avx512f-pr96891-2.c       | 30 ++++++
 4 files changed, 199 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr96891-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr96891-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr96891-2.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 8250325e1a3..31e0dc2a600 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -629,6 +629,9 @@ (define_mode_iterator VI_128 [V16QI V8HI V4SI V2DI])
 ;; All 256bit vector integer modes
 (define_mode_iterator VI_256 [V32QI V16HI V8SI V4DI])
 
+;; All 128 and 256bit vector integer modes
+(define_mode_iterator VI_128_256 [V16QI V8HI V4SI V2DI V32QI V16HI V8SI V4DI])
+
 ;; Various 128bit vector integer mode combinations
 (define_mode_iterator VI12_128 [V16QI V8HI])
 (define_mode_iterator VI14_128 [V16QI V4SI])
@@ -6703,6 +6706,96 @@ (define_insn "*<avx512>_cvtmask2<ssemodesuffix><mode>"
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+/* Lower avx512 parallel floating compare to avx compare when dst is vector.  */
+(define_peephole2
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(unspec:<avx512fmaskmode>
+	  [(match_operand:VF_128_256 1 "register_operand")
+	   (match_operand:VF_128_256 2 "nonimmediate_operand")
+	   (match_operand:SI 3 "const_0_to_31_operand")]
+	  UNSPEC_PCMP))
+   (set (match_operand:<sseintvecmode> 4 "register_operand")
+	(vec_merge:<sseintvecmode>
+	  (match_operand:<sseintvecmode> 5 "vector_all_ones_operand")
+	  (match_operand:<sseintvecmode> 6 "const0_operand")
+	  (match_dup 0)))]
+  "!EXT_REX_SSE_REGNO_P (REGNO (operands[4]))
+  && !EXT_REX_SSE_REGNO_P (REGNO (operands[1]))
+  && !(REG_P (operands[2]) && EXT_REX_SSE_REGNO_P (REGNO (operands[2])))
+  && peep2_reg_dead_p (2, operands[0])"
+  [(set (match_dup 7)
+	(unspec:VF_128_256
+	  [(match_dup 1)
+	   (match_dup 2)
+	   (match_dup 3)] UNSPEC_PCMP))]
+  "operands[7] = gen_rtx_REG (<MODE>mode, REGNO (operands[4]));")
+
+/* Lower avx512 parallel integral compare to avx compare when dst is vector.  */
+(define_peephole2
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(unspec:<avx512fmaskmode>
+	  [(match_operand:VI_128_256 1 "register_operand")
+	   (match_operand:VI_128_256 2 "nonimmediate_operand")]
+	  UNSPEC_MASKED_EQ))
+   (set (match_operand:VI_128_256 4 "register_operand")
+	(vec_merge:VI_128_256
+	  (match_operand:VI_128_256 5 "vector_all_ones_operand")
+	  (match_operand:VI_128_256 6 "const0_operand")
+	  (match_dup 0)))]
+  "!EXT_REX_SSE_REGNO_P (REGNO (operands[4]))
+  && !EXT_REX_SSE_REGNO_P (REGNO (operands[1]))
+  && !(REG_P (operands[2]) && EXT_REX_SSE_REGNO_P (REGNO (operands[2])))
+  && peep2_reg_dead_p (2, operands[0])"
+  [(set (match_dup 4)
+  	(eq:VI_128_256
+	  (match_dup 1)
+	  (match_dup 2)))])
+
+(define_peephole2
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(unspec:<avx512fmaskmode>
+	  [(match_operand:VI_128_256 1 "register_operand")
+	   (match_operand:VI_128_256 2 "nonimmediate_operand")]
+	  UNSPEC_MASKED_GT))
+   (set (match_operand:VI_128_256 4 "register_operand")
+	(vec_merge:VI_128_256
+	  (match_operand:VI_128_256 5 "vector_all_ones_operand")
+	  (match_operand:VI_128_256 6 "const0_operand")
+	  (match_dup 0)))]
+  "!EXT_REX_SSE_REGNO_P (REGNO (operands[4]))
+  && !EXT_REX_SSE_REGNO_P (REGNO (operands[1]))
+  && !(REG_P (operands[2]) && EXT_REX_SSE_REGNO_P (REGNO (operands[2])))
+  && peep2_reg_dead_p (2, operands[0])"
+  [(set (match_dup 4)
+  	(gt:VI_128_256
+	  (match_dup 1)
+	  (match_dup 2)))])
+
+(define_peephole2
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(unspec:<avx512fmaskmode>
+	  [(match_operand:VI_128_256 1 "register_operand")
+	   (match_operand:VI_128_256 2 "nonimmediate_operand")
+	   (match_operand:SI 3 "const_0_to_7_operand")]
+	  UNSPEC_PCMP))
+   (set (match_operand:VI_128_256 4 "register_operand")
+	(vec_merge:VI_128_256
+	  (match_operand:VI_128_256 5 "vector_all_ones_operand")
+	  (match_operand:VI_128_256 6 "const0_operand")
+	  (match_dup 0)))]
+  "(INTVAL (operands[3]) == 0 || INTVAL (operands[3]) == 6)
+  && !EXT_REX_SSE_REGNO_P (REGNO (operands[4]))
+  && !EXT_REX_SSE_REGNO_P (REGNO (operands[1]))
+  && !(REG_P (operands[2]) && EXT_REX_SSE_REGNO_P (REGNO (operands[2])))
+  && peep2_reg_dead_p (2, operands[0])"
+  [(const_int 0)]
+{
+  enum rtx_code code = INTVAL (operands[3]) ? GT : EQ;
+  emit_move_insn (operands[4], gen_rtx_fmt_ee (code, <MODE>mode,
+  		 	       		       operands[1], operands[2]));
+  DONE;
+})
+
 (define_insn "sse2_cvtps2pd<mask_name>"
   [(set (match_operand:V2DF 0 "register_operand" "=v")
 	(float_extend:V2DF
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-pr96891-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-pr96891-1.c
new file mode 100644
index 00000000000..45efff4e0f0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-pr96891-1.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-not "%k\[0-7\]" } } */
+
+typedef char v16qi __attribute__ ((vector_size (16)));
+typedef char v32qi __attribute__ ((vector_size (32)));
+typedef short v8hi __attribute__ ((vector_size (16)));
+typedef short v16hi __attribute__ ((vector_size (32)));
+typedef int v4si __attribute__ ((vector_size (16)));
+typedef int v8si __attribute__ ((vector_size (32)));
+typedef long long v2di __attribute__ ((vector_size (16)));
+typedef long long v4di __attribute__ ((vector_size (32)));
+
+#define FOO(VTYPE, OPNAME, OP)			\
+  VTYPE						\
+  foo_##VTYPE##_##OPNAME (VTYPE a, VTYPE b)	\
+  {						\
+    return a OP b;				\
+  }						\
+
+FOO (v16qi, eq, ==)
+FOO (v16qi, gt, >)
+FOO (v32qi, eq, ==)
+FOO (v32qi, gt, >)
+FOO (v8hi, eq, ==)
+FOO (v8hi, gt, >)
+FOO (v16hi, eq, ==)
+FOO (v16hi, gt, >)
+FOO (v4si, eq, ==)
+FOO (v4si, gt, >)
+FOO (v8si, eq, ==)
+FOO (v8si, gt, >)
+FOO (v2di, eq, ==)
+FOO (v2di, gt, >)
+FOO (v4di, eq, ==)
+FOO (v4di, gt, >)
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr96891-1.c b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-1.c
new file mode 100644
index 00000000000..48ba943e151
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-1.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -O2" } */
+/* { dg-final { scan-assembler-not "%k\[0-7\]" } } */
+
+typedef float v4sf __attribute__ ((vector_size (16)));
+typedef float v8sf __attribute__ ((vector_size (32)));
+typedef double v2df __attribute__ ((vector_size (16)));
+typedef double v4df __attribute__ ((vector_size (32)));
+
+#define FOO(VTYPE, OPNAME, OP)			\
+  VTYPE						\
+  foo_##VTYPE##_##OPNAME (VTYPE a, VTYPE b)	\
+  {						\
+    return a OP b;				\
+  }						\
+
+FOO (v4sf, eq, ==)
+FOO (v4sf, neq, !=)
+FOO (v4sf, gt, >)
+FOO (v4sf, ge, >=)
+FOO (v4sf, lt, <)
+FOO (v4sf, le, <=)
+FOO (v8sf, eq, ==)
+FOO (v8sf, neq, !=)
+FOO (v8sf, gt, >)
+FOO (v8sf, ge, >=)
+FOO (v8sf, lt, <)
+FOO (v8sf, le, <=)
+FOO (v2df, eq, ==)
+FOO (v2df, neq, !=)
+FOO (v2df, gt, >)
+FOO (v2df, ge, >=)
+FOO (v2df, lt, <)
+FOO (v2df, le, <=)
+FOO (v4df, eq, ==)
+FOO (v4df, neq, !=)
+FOO (v4df, gt, >)
+FOO (v4df, ge, >=)
+FOO (v4df, lt, <)
+FOO (v4df, le, <=)
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr96891-2.c b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-2.c
new file mode 100644
index 00000000000..5192a00e0f4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-2.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -mavx512bw -mavx512dq -O2" } */
+/* { dg-final { scan-assembler-not "%k\[0-7\]" } } */
+
+#include<immintrin.h>
+
+#define FOO(VTYPE,PREFIX,SUFFIX,OPNAME,MASK,LEN)			\
+  VTYPE								\
+  foo_##LEN##_##SUFFIX##_##OPNAME (VTYPE a, VTYPE b)		\
+  {									\
+    MASK m = _mm##PREFIX##_cmp##OPNAME##_##SUFFIX##_mask (a, b);	\
+    return _mm##PREFIX##_movm_##SUFFIX (m);				\
+  }									\
+
+FOO (__m128i,, epi8, eq, __mmask16, 128);
+FOO (__m128i,, epi16, eq, __mmask8, 128);
+FOO (__m128i,, epi32, eq, __mmask8, 128);
+FOO (__m128i,, epi64, eq, __mmask8, 128);
+FOO (__m128i,, epi8, gt, __mmask16, 128);
+FOO (__m128i,, epi16, gt, __mmask8, 128);
+FOO (__m128i,, epi32, gt, __mmask8, 128);
+FOO (__m128i,, epi64, gt, __mmask8, 128);
+FOO (__m256i, 256, epi8, eq, __mmask32, 256);
+FOO (__m256i, 256, epi16, eq, __mmask16, 256);
+FOO (__m256i, 256, epi32, eq, __mmask8, 256);
+FOO (__m256i, 256, epi64, eq, __mmask8, 256);
+FOO (__m256i, 256, epi8, gt, __mmask32, 256);
+FOO (__m256i, 256, epi16, gt, __mmask16, 256);
+FOO (__m256i, 256, epi32, gt, __mmask8, 256);
+FOO (__m256i, 256, epi64, gt, __mmask8, 256);
-- 
2.18.1

next             reply	other threads:[~2020-09-02  9:33 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-02  9:34 Hongtao Liu [this message]
2020-09-02 12:20 ` H.J. Lu
2020-11-17  0:05 ` Jeff Law
2020-11-17  3:10   ` Hongtao Liu
2020-11-30 16:38     ` Jeff Law
2021-01-06  3:34       ` Hongtao Liu
2021-01-21 13:47         ` Jakub Jelinek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMZc-bz3nqJmZ-042cYAzTw7tquvwE0fc-MOG73s-r+C+BKX6Q@mail.gmail.com \
    --to=crazylht@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).