public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] x86: Add pmovzx/pmovsx patterns with SI/DI operands
@ 2018-09-29 22:11 H.J. Lu
  2018-09-30 10:55 ` Marc Glisse
  0 siblings, 1 reply; 3+ messages in thread
From: H.J. Lu @ 2018-09-29 22:11 UTC (permalink / raw)
  To: gcc-patches; +Cc: Uros Bizjak

Add pmovzx/pmovsx patterns with SI and DI operands for pmovzx/pmovsx
instructions which only read the low 4 or 8 bytes from the source.

gcc/

	PR target/87317
	* config/i386/sse.md (*sse4_1_<code>v8qiv8hi2<mask_name>): New
	pattern.
	(*sse4_1_<code>v4qiv4si2<mask_name>): Likewise.
	(*sse4_1_<code>v4hiv4si2<mask_name>): Likewise.
	(*sse4_1_<code>v2hiv2di2<mask_name>): Likewise.
	(*sse4_1_<code>v2siv2di2<mask_name>): Likewise.

gcc/testsuite/

	PR target/87317
	* gcc.target/i386/pr87317-1.c: New file.
	* gcc.target/i386/pr87317-2.c: Likewise.
	* gcc.target/i386/pr87317-3.c: Likewise.
	* gcc.target/i386/pr87317-4.c: Likewise.
	* gcc.target/i386/pr87317-5.c: Likewise.
	* gcc.target/i386/pr87317-6.c: Likewise.
	* gcc.target/i386/pr87317-7.c: Likewise.
	* gcc.target/i386/pr87317-8.c: Likewise.
	* gcc.target/i386/pr87317-9.c: Likewise.
	* gcc.target/i386/pr87317-10.c: Likewise.
---
 gcc/config/i386/sse.md                     | 98 ++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr87317-1.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-10.c | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-2.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-3.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-4.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-5.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-6.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-7.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-8.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-9.c  | 13 +++
 11 files changed, 228 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-9.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d2722fdfcd0..c8ff35b125c 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15521,6 +15521,26 @@
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "TI")])
 
+(define_insn "*sse4_1_<code>v8qiv8hi2<mask_name>"
+  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
+	(any_extend:V8HI
+	  (vec_select:V8QI
+	    (subreg:V16QI
+	      (vec_concat:V2DI
+	        (match_operand:DI 1 "nonimmediate_operand" "Yrm,*xm,vm")
+		(const_int 0)) 0)
+	    (parallel [(const_int 0) (const_int 1)
+		       (const_int 2) (const_int 3)
+		       (const_int 4) (const_int 5)
+		       (const_int 6) (const_int 7)]))))]
+  "TARGET_SSE4_1 && <mask_avx512bw_condition> && <mask_avx512vl_condition>"
+  "%vpmov<extsuffix>bw\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "orig,orig,maybe_evex")
+   (set_attr "mode" "TI")])
+
 (define_insn "<mask_codefor>avx512f_<code>v16qiv16si2<mask_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(any_extend:V16SI
@@ -15562,6 +15582,28 @@
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "TI")])
 
+(define_insn "*sse4_1_<code>v4qiv4si2<mask_name>"
+  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
+	(any_extend:V4SI
+	  (vec_select:V4QI
+	    (subreg:V16QI
+	      (vec_merge:V4SI
+	        (vec_duplicate:V4SI
+		  (match_operand:SI 1 "nonimmediate_operand" "m,*m,m"))
+		(const_vector:V4SI
+		   [(const_int 0) (const_int 0)
+		    (const_int 0) (const_int 0)])
+		(const_int 1)) 0)
+	    (parallel [(const_int 0) (const_int 1)
+		       (const_int 2) (const_int 3)]))))]
+  "TARGET_SSE4_1 && <mask_avx512vl_condition>"
+  "%vpmov<extsuffix>bd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %k1}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "orig,orig,maybe_evex")
+   (set_attr "mode" "TI")])
+
 (define_insn "avx512f_<code>v16hiv16si2<mask_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(any_extend:V16SI
@@ -15598,6 +15640,24 @@
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "TI")])
 
+(define_insn "*sse4_1_<code>v4hiv4si2<mask_name>"
+  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
+	(any_extend:V4SI
+	  (vec_select:V4HI
+	    (subreg:V8HI
+	      (vec_concat:V2DI
+		(match_operand:DI 1 "nonimmediate_operand" "Yrm,*xm,vm")
+		(const_int 0)) 0)
+	    (parallel [(const_int 0) (const_int 1)
+		       (const_int 2) (const_int 3)]))))]
+  "TARGET_SSE4_1 && <mask_avx512vl_condition>"
+  "%vpmov<extsuffix>wd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "orig,orig,maybe_evex")
+   (set_attr "mode" "TI")])
+
 (define_insn "avx512f_<code>v8qiv8di2<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
 	(any_extend:V8DI
@@ -15679,6 +15739,27 @@
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "TI")])
 
+(define_insn "*sse4_1_<code>v2hiv2di2<mask_name>"
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,v")
+	(any_extend:V2DI
+	  (vec_select:V2HI
+	    (subreg:V8HI
+	      (vec_merge:V4SI
+	        (vec_duplicate:V4SI
+	          (match_operand:SI 1 "nonimmediate_operand" "m,*m,m"))
+		(const_vector:V4SI
+		   [(const_int 0) (const_int 0)
+		    (const_int 0) (const_int 0)])
+		(const_int 1)) 0)
+	    (parallel [(const_int 0) (const_int 1)]))))]
+  "TARGET_SSE4_1 && <mask_avx512vl_condition>"
+  "%vpmov<extsuffix>wq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %k1}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "orig,orig,maybe_evex")
+   (set_attr "mode" "TI")])
+
 (define_insn "avx512f_<code>v8siv8di2<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
 	(any_extend:V8DI
@@ -15714,6 +15795,23 @@
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "TI")])
 
+(define_insn "*sse4_1_<code>v2siv2di2<mask_name>"
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,v")
+	(any_extend:V2DI
+	  (vec_select:V2SI
+	    (subreg:V4SI
+	      (vec_concat:V2DI
+		(match_operand:DI 1 "nonimmediate_operand" "Yrm,*xm,vm")
+		(const_int 0)) 0)
+	    (parallel [(const_int 0) (const_int 1)]))))]
+  "TARGET_SSE4_1 && <mask_avx512vl_condition>"
+  "%vpmov<extsuffix>dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "orig,orig,maybe_evex")
+   (set_attr "mode" "TI")])
+
 ;; ptestps/ptestpd are very similar to comiss and ucomiss when
 ;; setting FLAGS_REG. But it is not a really compare instruction.
 (define_insn "avx_vtest<ssemodesuffix><avxsizesuffix>"
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-1.c b/gcc/testsuite/gcc.target/i386/pr87317-1.c
new file mode 100644
index 00000000000..91f00368293
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include <immintrin.h>
+
+void
+f (void *dst, void *ptr)
+{
+  __m128i data = _mm_loadl_epi64((__m128i *)ptr);
+  data = _mm_cvtepu8_epi16(data);
+  _mm_storeu_si128((__m128i*)dst, data);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-10.c b/gcc/testsuite/gcc.target/i386/pr87317-10.c
new file mode 100644
index 00000000000..99c657e9df8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-10.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include <immintrin.h>
+
+void
+f (void *dst, void *ptr)
+{
+  __m128i data = _mm_cvtsi32_si128(*(int*)ptr);
+  data = _mm_cvtepu8_epi32(data);
+  _mm_storeu_si128((__m128i*)dst, data);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-2.c b/gcc/testsuite/gcc.target/i386/pr87317-2.c
new file mode 100644
index 00000000000..e21f00334e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-2.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include <immintrin.h>
+
+void
+f (void *dst, void *ptr)
+{
+  __m128i data = _mm_loadl_epi64((__m128i *)ptr);
+  data = _mm_cvtepi16_epi32(data);
+  _mm_storeu_si128((__m128i*)dst, data);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-3.c b/gcc/testsuite/gcc.target/i386/pr87317-3.c
new file mode 100644
index 00000000000..d4483f9c134
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-3.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include <immintrin.h>
+
+void
+f (void *dst, void *ptr)
+{
+  __m128i data = _mm_loadl_epi64((__m128i *)ptr);
+  data = _mm_cvtepi32_epi64(data);
+  _mm_storeu_si128((__m128i*)dst, data);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-4.c b/gcc/testsuite/gcc.target/i386/pr87317-4.c
new file mode 100644
index 00000000000..dff24a9d657
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-4.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include <immintrin.h>
+
+void
+f (void *dst, __m64 x)
+{
+  __m128i y = _mm_movpi64_epi64(x);
+  __m128i z = _mm_cvtepu8_epi16(y);
+  _mm_storeu_si128((__m128i*)dst, z);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-5.c b/gcc/testsuite/gcc.target/i386/pr87317-5.c
new file mode 100644
index 00000000000..574395894b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-5.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include <immintrin.h>
+
+void
+f (void *dst, __m64 x)
+{
+  __m128i y = _mm_movpi64_epi64(x);
+  __m128i z = _mm_cvtepi16_epi32(y);
+  _mm_storeu_si128((__m128i*)dst, z);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-6.c b/gcc/testsuite/gcc.target/i386/pr87317-6.c
new file mode 100644
index 00000000000..9d27648d433
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-6.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include <immintrin.h>
+
+void
+f (void *dst, __m64 x)
+{
+  __m128i y = _mm_movpi64_epi64(x);
+  __m128i z = _mm_cvtepi32_epi64 (y);
+  _mm_storeu_si128((__m128i*)dst, z);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-7.c b/gcc/testsuite/gcc.target/i386/pr87317-7.c
new file mode 100644
index 00000000000..99c657e9df8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-7.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include <immintrin.h>
+
+void
+f (void *dst, void *ptr)
+{
+  __m128i data = _mm_cvtsi32_si128(*(int*)ptr);
+  data = _mm_cvtepu8_epi32(data);
+  _mm_storeu_si128((__m128i*)dst, data);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-8.c b/gcc/testsuite/gcc.target/i386/pr87317-8.c
new file mode 100644
index 00000000000..c688e3e6d08
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-8.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include <immintrin.h>
+
+void
+f (void *dst, void *ptr)
+{
+  __m128i data = _mm_cvtsi32_si128(*(int*)ptr);
+  data = _mm_cvtepu16_epi64(data);
+  _mm_storeu_si128((__m128i*)dst, data);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-9.c b/gcc/testsuite/gcc.target/i386/pr87317-9.c
new file mode 100644
index 00000000000..4311ed3ceb5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-9.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmovq" } } */
+
+#include <immintrin.h>
+
+int
+f (void *ptr)
+{
+  __m128i data = _mm_loadl_epi64((__m128i *)ptr);
+  data = _mm_cvtepu8_epi16(data);
+  return _mm_cvtsi128_si32(data);
+}
-- 
2.17.1

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] x86: Add pmovzx/pmovsx patterns with SI/DI operands
  2018-09-29 22:11 [PATCH] x86: Add pmovzx/pmovsx patterns with SI/DI operands H.J. Lu
@ 2018-09-30 10:55 ` Marc Glisse
  2018-09-30 15:11   ` H.J. Lu
  0 siblings, 1 reply; 3+ messages in thread
From: Marc Glisse @ 2018-09-30 10:55 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, Uros Bizjak

On Sat, 29 Sep 2018, H.J. Lu wrote:

> Add pmovzx/pmovsx patterns with SI and DI operands for pmovzx/pmovsx
> instructions which only read the low 4 or 8 bytes from the source.

Hello,

I am wondering a few things (these are questions, I am not asking for 
changes):

Should we change the builtin and make it take a shorter argument, so it is 
visible to gimple optimizers that the high part is unused? But then would 
that shorter type be v8qi (we don't really have that type) or di (risks 
trying to use general regs?)?

> +(define_insn "*sse4_1_<code>v8qiv8hi2<mask_name>"
> +  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
> +	(any_extend:V8HI
> +	  (vec_select:V8QI
> +	    (subreg:V16QI
> +	      (vec_concat:V2DI
> +	        (match_operand:DI 1 "nonimmediate_operand" "Yrm,*xm,vm")
> +		(const_int 0)) 0)
> +	    (parallel [(const_int 0) (const_int 1)
> +		       (const_int 2) (const_int 3)
> +		       (const_int 4) (const_int 5)
> +		       (const_int 6) (const_int 7)]))))]

There is code in simplify-rtx.c that handles (vec_select (vec_concat x
y) z) when vec_select only picks from x. We could extend it to handle an
intermediate subreg/cast, which would yield something like:
(any_extend:V8HI (subreg:V8QI (match_operand:DI)))
or maybe even
(any_extend:V8HI (match_operand:V8QI))
Would this be likely to work? Is it desirable?

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] x86: Add pmovzx/pmovsx patterns with SI/DI operands
  2018-09-30 10:55 ` Marc Glisse
@ 2018-09-30 15:11   ` H.J. Lu
  0 siblings, 0 replies; 3+ messages in thread
From: H.J. Lu @ 2018-09-30 15:11 UTC (permalink / raw)
  To: GCC Patches; +Cc: Uros Bizjak

On Sun, Sep 30, 2018 at 1:53 AM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> On Sat, 29 Sep 2018, H.J. Lu wrote:
>
> > Add pmovzx/pmovsx patterns with SI and DI operands for pmovzx/pmovsx
> > instructions which only read the low 4 or 8 bytes from the source.
>
> Hello,
>
> I am wondering a few things (these are questions, I am not asking for
> changes):
>
> Should we change the builtin and make it take a shorter argument, so it is
> visible to gimple optimizers that the high part is unused? But then would
> that shorter type be v8qi (we don't really have that type) or di (risks
> trying to use general regs?)?

I think this may lead to other issues you pointed out.

> > +(define_insn "*sse4_1_<code>v8qiv8hi2<mask_name>"
> > +  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
> > +     (any_extend:V8HI
> > +       (vec_select:V8QI
> > +         (subreg:V16QI
> > +           (vec_concat:V2DI
> > +             (match_operand:DI 1 "nonimmediate_operand" "Yrm,*xm,vm")
> > +             (const_int 0)) 0)
> > +         (parallel [(const_int 0) (const_int 1)
> > +                    (const_int 2) (const_int 3)
> > +                    (const_int 4) (const_int 5)
> > +                    (const_int 6) (const_int 7)]))))]
>
> There is code in simplify-rtx.c that handles (vec_select (vec_concat x
> y) z) when vec_select only picks from x. We could extend it to handle an
> intermediate subreg/cast, which would yield something like:
> (any_extend:V8HI (subreg:V8QI (match_operand:DI)))
> or maybe even
> (any_extend:V8HI (match_operand:V8QI))
> Would this be likely to work? Is it desirable?
>

We need vector instructions with source memory or XMM operand in SI/DImode
only for these patterns.  Exposed it to simplify-rtx.c may lead to unexpected
results.  But we don't know for sure unless we try.

-- 
H.J.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2018-09-30 15:01 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-29 22:11 [PATCH] x86: Add pmovzx/pmovsx patterns with SI/DI operands H.J. Lu
2018-09-30 10:55 ` Marc Glisse
2018-09-30 15:11   ` H.J. Lu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).