public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH i386 11/8] [AVX512] Add missing packed PF gathers/scatters, rename load/store.
@ 2014-01-14  6:13 Kirill Yukhin
  2014-01-15 19:53 ` Uros Bizjak
  0 siblings, 1 reply; 8+ messages in thread
From: Kirill Yukhin @ 2014-01-14  6:13 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Jakub Jelinek, GCC Patches

Hello,
This patch introduces missing AVX-512PF intrinsics and tests.
It also renames store/load intrinsics according to EAS.

gcc/
        * config/i386/avx512fintrin.h (_mm512_loadu_si512): Rename.
        (_mm512_storeu_si512): Ditto.
        * config/i386/avx512pfintrin.h (_mm512_mask_prefetch_i32gather_pd): New.
        (_mm512_mask_prefetch_i64gather_pd): Ditto.
        (_mm512_prefetch_i32scatter_pd): Ditto.
        (_mm512_mask_prefetch_i32scatter_pd): Ditto.
        (_mm512_prefetch_i64scatter_pd): Ditto.
        (_mm512_mask_prefetch_i64scatter_pd): Ditto.
        (_mm512_mask_prefetch_i32gather_ps): Fix operand type.
        (_mm512_mask_prefetch_i64gather_ps): Ditto.
        (_mm512_prefetch_i32scatter_ps): Ditto.
        (_mm512_mask_prefetch_i32scatter_ps): Ditto.
        (_mm512_prefetch_i64scatter_ps): Ditto.
        (_mm512_mask_prefetch_i64scatter_ps): Ditto.
        * config/i386/i386-builtin-types.def: Define
        VOID_FTYPE_QI_V8SI_PCINT64_INT_INT and VOID_FTYPE_QI_V8DI_PCINT64_INT_INT.
        * config/i386/i386.c (ix86_builtins): Define IX86_BUILTIN_GATHERPFQPD,
        IX86_BUILTIN_GATHERPFDPD, IX86_BUILTIN_SCATTERPFDPD,
        IX86_BUILTIN_SCATTERPFQPD.
        (ix86_init_mmx_sse_builtins): Define __builtin_ia32_gatherpfdpd,
        __builtin_ia32_gatherpfdps, __builtin_ia32_gatherpfqpd,
        __builtin_ia32_gatherpfqps, __builtin_ia32_scatterpfdpd,
        __builtin_ia32_scatterpfdps, __builtin_ia32_scatterpfqpd,
        __builtin_ia32_scatterpfqps.
        (ix86_expand_builtin): Expand new built-ins.
        * config/i386/sse.md (avx512pf_gatherpf<mode>): Add SF suffix,
        fix memory access data type.
        (*avx512pf_gatherpf<mode>_mask): Ditto.
        (*avx512pf_gatherpf<mode>): Ditto.
        (avx512pf_scatterpf<mode>): Ditto.
        (*avx512pf_scatterpf<mode>_mask): Ditto.
        (*avx512pf_scatterpf<mode>): Ditto.
        (avx512pf_gatherpf<mode>df): New.
        (*avx512pf_gatherpf<mode>df_mask): Ditto.
        (*avx512pf_gatherpf<mode>df): Ditto.
        (avx512pf_scatterpf<mode>df): Ditto.
        (*avx512pf_scatterpf<mode>df_mask): Ditto.
        (*avx512pf_scatterpf<mode>df): Ditto.

testsuite/
        * gcc.target/i386/avx512f-vmovdqu32-1.c: Fix intrinsic name.
        * gcc.target/i386/avx512f-vmovdqu32-2.c: Ditto.
        * gcc.target/i386/avx512f-vpcmpd-2.c: Ditto.
        * gcc.target/i386/avx512f-vpcmpq-2.c: Ditto.
        * gcc.target/i386/avx512f-vpcmpud-2.c: Ditto.
        * gcc.target/i386/avx512f-vpcmpuq-2.c: Ditto.
        * gcc.target/i386/avx512pf-vgatherpf0dpd-1.c: Ditto.
        * gcc.target/i386/avx512pf-vgatherpf0qpd-1.c: Ditto.
        * gcc.target/i386/avx512pf-vgatherpf1dpd-1.c: Ditto.
        * gcc.target/i386/avx512pf-vgatherpf1qpd-1.c: Ditto.
        * gcc.target/i386/avx512pf-vscatterpf0dpd-1.c: Ditto.
        * gcc.target/i386/avx512pf-vscatterpf0qpd-1.c: Ditto.
        * gcc.target/i386/avx512pf-vscatterpf1dpd-1.c: Ditto.
        * gcc.target/i386/avx512pf-vscatterpf1qpd-1.c: Ditto.
        * gcc.target/i386/sse-14.c: Add new built-ins, fix AVX-512ER
        built-ins roudning immediate.
        * gcc.target/i386/sse-22.c: Add new built-ins.
        * gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/avx-1.c: Ditto.

I have a doubts about changes to sse.md.
I've splitted existing (SF-only) patterns into 2: DF and SF.
As far as insn operands and final instruction have no such data
type discrimination I set this data type to (mem:..) part.
Having this (for SF):
  (define_expand "avx512pf_scatterpf<mode>sf"
    [(unspec
       [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
        (mem:SF
  ...

instead of this:
  (define_expand "avx512pf_scatterpf<mode>"
    [(unspec
       [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
        (mem:<ssescalarmode>
  ...

Not sure if this (DI/SI) mode for mem is needed. Moreover, not sure what
that data type represents.

Patch in the bottom. AVX* and SSE* tests pass.

Comments or it is ok for trunk?

--
Thanks, K

---
 gcc/config/i386/avx512fintrin.h                    |   4 +-
 gcc/config/i386/avx512pfintrin.h                   | 113 ++++++++++++--
 gcc/config/i386/i386-builtin-types.def             |   2 +
 gcc/config/i386/i386.c                             |  37 ++++-
 gcc/config/i386/sse.md                             | 171 +++++++++++++++++++--
 gcc/testsuite/gcc.target/i386/avx-1.c              |   4 +
 .../gcc.target/i386/avx512f-vmovdqu32-1.c          |   4 +-
 .../gcc.target/i386/avx512f-vmovdqu32-2.c          |   4 +-
 gcc/testsuite/gcc.target/i386/avx512f-vpcmpd-2.c   |   4 +-
 gcc/testsuite/gcc.target/i386/avx512f-vpcmpq-2.c   |   4 +-
 gcc/testsuite/gcc.target/i386/avx512f-vpcmpud-2.c  |   4 +-
 gcc/testsuite/gcc.target/i386/avx512f-vpcmpuq-2.c  |   4 +-
 .../gcc.target/i386/avx512pf-vgatherpf0dpd-1.c     |  15 ++
 .../gcc.target/i386/avx512pf-vgatherpf0qpd-1.c     |  15 ++
 .../gcc.target/i386/avx512pf-vgatherpf1dpd-1.c     |  15 ++
 .../gcc.target/i386/avx512pf-vgatherpf1qpd-1.c     |  15 ++
 .../gcc.target/i386/avx512pf-vscatterpf0dpd-1.c    |  17 ++
 .../gcc.target/i386/avx512pf-vscatterpf0qpd-1.c    |  17 ++
 .../gcc.target/i386/avx512pf-vscatterpf1dpd-1.c    |  17 ++
 .../gcc.target/i386/avx512pf-vscatterpf1qpd-1.c    |  17 ++
 gcc/testsuite/gcc.target/i386/sse-14.c             |  40 ++---
 gcc/testsuite/gcc.target/i386/sse-22.c             |   5 +
 gcc/testsuite/gcc.target/i386/sse-23.c             |   4 +
 23 files changed, 469 insertions(+), 63 deletions(-)

diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index 26f8cb6..4e94174 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -5570,7 +5570,7 @@ _mm512_mask_storeu_epi64 (void *__P, __mmask8 __U, __m512i __A)
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_loadu_si512 (void const *__P)
+_mm512_loadu_epi32 (void const *__P)
 {
   return (__m512i) __builtin_ia32_loaddqusi512_mask ((const __v16si *) __P,
 						     (__v16si)
@@ -5599,7 +5599,7 @@ _mm512_maskz_loadu_epi32 (__mmask16 __U, void const *__P)
 
 extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_storeu_si512 (void *__P, __m512i __A)
+_mm512_storeu_epi32 (void *__P, __m512i __A)
 {
   __builtin_ia32_storedqusi512_mask ((__v16si *) __P, (__v16si) __A,
 				     (__mmask16) -1);
diff --git a/gcc/config/i386/avx512pfintrin.h b/gcc/config/i386/avx512pfintrin.h
index b8c0110..bc7598e 100644
--- a/gcc/config/i386/avx512pfintrin.h
+++ b/gcc/config/i386/avx512pfintrin.h
@@ -48,74 +48,157 @@ typedef unsigned short __mmask16;
 #ifdef __OPTIMIZE__
 extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_prefetch_i32gather_pd (__m256i index, __mmask8 mask,
+				   void *addr, int scale, int hint)
+{
+  __builtin_ia32_gatherpfdpd (mask, (__v8si) index, (long long const *) addr,
+			      scale, hint);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_prefetch_i32gather_ps (__m512i index, __mmask16 mask,
-				   int const *addr, int scale, int hint)
+				   void *addr, int scale, int hint)
+{
+  __builtin_ia32_gatherpfdps (mask, (__v16si) index, (int const *) addr,
+			      scale, hint);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_prefetch_i64gather_pd (__m512i index, __mmask8 mask,
+				   void *addr, int scale, int hint)
 {
-  __builtin_ia32_gatherpfdps (mask, (__v16si) index, addr, scale, hint);
+  __builtin_ia32_gatherpfqpd (mask, (__v8di) index, (long long const *) addr,
+			      scale, hint);
 }
 
 extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_prefetch_i64gather_ps (__m512i index, __mmask8 mask,
-				   int const *addr, int scale, int hint)
+				   void *addr, int scale, int hint)
 {
-  __builtin_ia32_gatherpfqps (mask, (__v8di) index, addr, scale, hint);
+  __builtin_ia32_gatherpfqps (mask, (__v8di) index, (int const *) addr,
+			      scale, hint);
 }
 
 extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_prefetch_i32scatter_ps (int const *addr, __m512i index, int scale,
+_mm512_prefetch_i32scatter_pd (void *addr, __m256i index, int scale,
 			       int hint)
 {
-  __builtin_ia32_scatterpfdps ((__mmask16) 0xFFFF, (__v16si) index, addr, scale,
-			       hint);
+  __builtin_ia32_scatterpfdpd ((__mmask8) 0xFF, (__v8si) index, 
+			       (long long const *)addr, scale, hint);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_prefetch_i32scatter_ps (void *addr, __m512i index, int scale,
+			       int hint)
+{
+  __builtin_ia32_scatterpfdps ((__mmask16) 0xFFFF, (__v16si) index, (int const *) addr,
+			       scale, hint);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_prefetch_i32scatter_pd (void *addr, __mmask8 mask,
+				    __m256i index, int scale, int hint)
+{
+  __builtin_ia32_scatterpfdpd (mask, (__v8si) index, (long long const *) addr,
+			       scale, hint);
 }
 
 extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_prefetch_i32scatter_ps (int const *addr, __mmask16 mask,
+_mm512_mask_prefetch_i32scatter_ps (void *addr, __mmask16 mask,
 				    __m512i index, int scale, int hint)
 {
-  __builtin_ia32_scatterpfdps (mask, (__v16si) index, addr, scale, hint);
+  __builtin_ia32_scatterpfdps (mask, (__v16si) index, (int const *) addr,
+			       scale, hint);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_prefetch_i64scatter_pd (void *addr, __m512i index, int scale,
+			       int hint)
+{
+  __builtin_ia32_scatterpfqpd ((__mmask8) 0xFF, (__v8di) index, (long long const *) addr,
+			       scale, hint);
 }
 
 extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_prefetch_i64scatter_ps (int const *addr, __m512i index, int scale,
+_mm512_prefetch_i64scatter_ps (void *addr, __m512i index, int scale,
 			       int hint)
 {
-  __builtin_ia32_scatterpfqps ((__mmask8) 0xFF, (__v8di) index, addr, scale,
-			       hint);
+  __builtin_ia32_scatterpfqps ((__mmask8) 0xFF, (__v8di) index, (int const *) addr,
+			       scale, hint);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_prefetch_i64scatter_pd (void *addr, __mmask16 mask,
+				    __m512i index, int scale, int hint)
+{
+  __builtin_ia32_scatterpfqpd (mask, (__v8di) index, (long long const *) addr,
+			       scale, hint);
 }
 
 extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_prefetch_i64scatter_ps (int const *addr, __mmask16 mask,
+_mm512_mask_prefetch_i64scatter_ps (void *addr, __mmask16 mask,
 				    __m512i index, int scale, int hint)
 {
-  __builtin_ia32_scatterpfqps (mask, (__v8di) index, addr, scale, hint);
+  __builtin_ia32_scatterpfqps (mask, (__v8di) index, (int const *) addr,
+			       scale, hint);
 }
+
 #else
+#define _mm512_mask_prefetch_i32gather_pd(INDEX, MASK, ADDR, SCALE, HINT)    \
+  __builtin_ia32_gatherpfdpd ((__mmask8)MASK, (__v8si)(__m256i)INDEX,	     \
+			      (long long const *)ADDR, (int)SCALE, (int)HINT)
+
 #define _mm512_mask_prefetch_i32gather_ps(INDEX, MASK, ADDR, SCALE, HINT)    \
-  __builtin_ia32_gatherpfdps ((__mmask16)MASK, (__v16si)(__m512i)INDEX,	     \
+  __builtin_ia32_gatherpfdps ((__mmask16)MASK, (__v16si)(__m512i)INDEX,      \
 			      (int const *)ADDR, (int)SCALE, (int)HINT)
 
+#define _mm512_mask_prefetch_i64gather_pd(INDEX, MASK, ADDR, SCALE, HINT)    \
+  __builtin_ia32_gatherpfqpd ((__mmask8)MASK, (__v8di)(__m512i)INDEX,	     \
+			      (long long const *)ADDR, (int)SCALE, (int)HINT)
+
 #define _mm512_mask_prefetch_i64gather_ps(INDEX, MASK, ADDR, SCALE, HINT)    \
   __builtin_ia32_gatherpfqps ((__mmask8)MASK, (__v8di)(__m512i)INDEX,	     \
 			      (int const *)ADDR, (int)SCALE, (int)HINT)
 
+#define _mm512_prefetch_i32scatter_pd(ADDR, INDEX, SCALE, HINT)              \
+  __builtin_ia32_scatterpfdpd ((__mmask8)0xFF, (__v8si)(__m256i)INDEX,       \
+			       (long long const *)ADDR, (int)SCALE, (int)HINT)
+
 #define _mm512_prefetch_i32scatter_ps(ADDR, INDEX, SCALE, HINT)              \
   __builtin_ia32_scatterpfdps ((__mmask16)0xFFFF, (__v16si)(__m512i)INDEX,   \
 			       (int const *)ADDR, (int)SCALE, (int)HINT)
 
+#define _mm512_mask_prefetch_i32scatter_pd(ADDR, MASK, INDEX, SCALE, HINT)   \
+  __builtin_ia32_scatterpfdpd ((__mmask8)MASK, (__v8si)(__m256i)INDEX,       \
+			       (long long const *)ADDR, (int)SCALE, (int)HINT)
+
 #define _mm512_mask_prefetch_i32scatter_ps(ADDR, MASK, INDEX, SCALE, HINT)   \
   __builtin_ia32_scatterpfdps ((__mmask16)MASK, (__v16si)(__m512i)INDEX,     \
 			       (int const *)ADDR, (int)SCALE, (int)HINT)
 
+#define _mm512_prefetch_i64scatter_pd(ADDR, INDEX, SCALE, HINT)              \
+  __builtin_ia32_scatterpfqpd ((__mmask8)0xFF, (__v8di)(__m512i)INDEX,	     \
+			       (long long const *)ADDR, (int)SCALE, (int)HINT)
+
 #define _mm512_prefetch_i64scatter_ps(ADDR, INDEX, SCALE, HINT)              \
   __builtin_ia32_scatterpfqps ((__mmask8)0xFF, (__v8di)(__m512i)INDEX,	     \
 			       (int const *)ADDR, (int)SCALE, (int)HINT)
 
+#define _mm512_mask_prefetch_i64scatter_pd(ADDR, MASK, INDEX, SCALE, HINT)   \
+  __builtin_ia32_scatterpfqpd ((__mmask8)MASK, (__v8di)(__m512i)INDEX,	     \
+			       (long long const *)ADDR, (int)SCALE, (int)HINT)
+
 #define _mm512_mask_prefetch_i64scatter_ps(ADDR, MASK, INDEX, SCALE, HINT)   \
   __builtin_ia32_scatterpfqps ((__mmask8)MASK, (__v8di)(__m512i)INDEX,	     \
 			       (int const *)ADDR, (int)SCALE, (int)HINT)
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index acf2f32..f3c658b 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -733,7 +733,9 @@ DEF_FUNCTION_TYPE (VOID, PLONGLONG, QI, V8SI, V8DI, INT)
 DEF_FUNCTION_TYPE (VOID, PINT, QI, V8DI, V8SI, INT)
 DEF_FUNCTION_TYPE (VOID, PLONGLONG, QI, V8DI, V8DI, INT)
 
+DEF_FUNCTION_TYPE (VOID, QI, V8SI, PCINT64, INT, INT)
 DEF_FUNCTION_TYPE (VOID, HI, V16SI, PCINT, INT, INT)
+DEF_FUNCTION_TYPE (VOID, QI, V8DI, PCINT64, INT, INT)
 DEF_FUNCTION_TYPE (VOID, QI, V8DI, PCINT, INT, INT)
 
 DEF_FUNCTION_TYPE_ALIAS (V2DF_FTYPE_V2DF, ROUND)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 3cda147..547c1f1 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -28323,9 +28323,13 @@ enum ix86_builtins
   IX86_BUILTIN_SCATTERSIV8DI,
 
   /* AVX512PF */
+  IX86_BUILTIN_GATHERPFQPD,
   IX86_BUILTIN_GATHERPFDPS,
+  IX86_BUILTIN_GATHERPFDPD,
   IX86_BUILTIN_GATHERPFQPS,
+  IX86_BUILTIN_SCATTERPFDPD,
   IX86_BUILTIN_SCATTERPFDPS,
+  IX86_BUILTIN_SCATTERPFQPD,
   IX86_BUILTIN_SCATTERPFQPS,
 
   /* AVX-512ER */
@@ -30855,15 +30859,27 @@ ix86_init_mmx_sse_builtins (void)
 	       IX86_BUILTIN_SCATTERDIV8DI);
 
   /* AVX512PF */
+  def_builtin (OPTION_MASK_ISA_AVX512PF, "__builtin_ia32_gatherpfdpd",
+	       VOID_FTYPE_QI_V8SI_PCINT64_INT_INT,
+	       IX86_BUILTIN_GATHERPFDPD);
   def_builtin (OPTION_MASK_ISA_AVX512PF, "__builtin_ia32_gatherpfdps",
 	       VOID_FTYPE_HI_V16SI_PCINT_INT_INT,
 	       IX86_BUILTIN_GATHERPFDPS);
+  def_builtin (OPTION_MASK_ISA_AVX512PF, "__builtin_ia32_gatherpfqpd",
+	       VOID_FTYPE_QI_V8DI_PCINT64_INT_INT,
+	       IX86_BUILTIN_GATHERPFQPD);
   def_builtin (OPTION_MASK_ISA_AVX512PF, "__builtin_ia32_gatherpfqps",
 	       VOID_FTYPE_QI_V8DI_PCINT_INT_INT,
 	       IX86_BUILTIN_GATHERPFQPS);
+  def_builtin (OPTION_MASK_ISA_AVX512PF, "__builtin_ia32_scatterpfdpd",
+	       VOID_FTYPE_QI_V8SI_PCINT64_INT_INT,
+	       IX86_BUILTIN_SCATTERPFDPD);
   def_builtin (OPTION_MASK_ISA_AVX512PF, "__builtin_ia32_scatterpfdps",
 	       VOID_FTYPE_HI_V16SI_PCINT_INT_INT,
 	       IX86_BUILTIN_SCATTERPFDPS);
+  def_builtin (OPTION_MASK_ISA_AVX512PF, "__builtin_ia32_scatterpfqpd",
+	       VOID_FTYPE_QI_V8DI_PCINT64_INT_INT,
+	       IX86_BUILTIN_SCATTERPFQPD);
   def_builtin (OPTION_MASK_ISA_AVX512PF, "__builtin_ia32_scatterpfqps",
 	       VOID_FTYPE_QI_V8DI_PCINT_INT_INT,
 	       IX86_BUILTIN_SCATTERPFQPS);
@@ -35509,17 +35525,30 @@ addcarryx:
     case IX86_BUILTIN_SCATTERDIV8DI:
       icode = CODE_FOR_avx512f_scatterdiv8di;
       goto scatter_gen;
+
+    case IX86_BUILTIN_GATHERPFDPD:
+      icode = CODE_FOR_avx512pf_gatherpfv8sidf;
+      goto vec_prefetch_gen;
     case IX86_BUILTIN_GATHERPFDPS:
-      icode = CODE_FOR_avx512pf_gatherpfv16si;
+      icode = CODE_FOR_avx512pf_gatherpfv16sisf;
+      goto vec_prefetch_gen;
+    case IX86_BUILTIN_GATHERPFQPD:
+      icode = CODE_FOR_avx512pf_gatherpfv8didf;
       goto vec_prefetch_gen;
     case IX86_BUILTIN_GATHERPFQPS:
-      icode = CODE_FOR_avx512pf_gatherpfv8di;
+      icode = CODE_FOR_avx512pf_gatherpfv8disf;
+      goto vec_prefetch_gen;
+    case IX86_BUILTIN_SCATTERPFDPD:
+      icode = CODE_FOR_avx512pf_scatterpfv8sidf;
       goto vec_prefetch_gen;
     case IX86_BUILTIN_SCATTERPFDPS:
-      icode = CODE_FOR_avx512pf_scatterpfv16si;
+      icode = CODE_FOR_avx512pf_scatterpfv16sisf;
+      goto vec_prefetch_gen;
+    case IX86_BUILTIN_SCATTERPFQPD:
+      icode = CODE_FOR_avx512pf_scatterpfv8didf;
       goto vec_prefetch_gen;
     case IX86_BUILTIN_SCATTERPFQPS:
-      icode = CODE_FOR_avx512pf_scatterpfv8di;
+      icode = CODE_FOR_avx512pf_scatterpfv8disf;
       goto vec_prefetch_gen;
 
     gather_gen:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 31e94fe..419c33a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -417,6 +417,7 @@
   [V32QI V16HI V8SI (V8DI "TARGET_AVX512F") (V16SI "TARGET_AVX512F")])
 (define_mode_iterator VI48_256 [V8SI V4DI])
 (define_mode_iterator VI48_512 [V16SI V8DI])
+(define_mode_iterator VI4_256_8_512 [V8SI V8DI])
 
 ;; Int-float size matches
 (define_mode_iterator VI4F_128 [V4SI V4SF])
@@ -12492,10 +12493,11 @@
    (set_attr "btver2_decode" "vector,vector,vector,vector")
    (set_attr "mode" "TI")])
 
-(define_expand "avx512pf_gatherpf<mode>"
+;; Packed float variants
+(define_expand "avx512pf_gatherpf<mode>sf"
   [(unspec
      [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
-      (mem:<ssescalarmode>
+      (mem:SF
 	(match_par_dup 5
 	  [(match_operand 2 "vsib_address_operand")
 	   (match_operand:VI48_512 1 "register_operand")
@@ -12509,10 +12511,10 @@
 					operands[3]), UNSPEC_VSIBADDR);
 })
 
-(define_insn "*avx512pf_gatherpf<mode>_mask"
+(define_insn "*avx512pf_gatherpf<mode>sf_mask"
   [(unspec
      [(match_operand:<avx512fmaskmode> 0 "register_operand" "k")
-      (match_operator:<ssescalarmode> 5 "vsib_mem_operator"
+      (match_operator:SF 5 "vsib_mem_operator"
 	[(unspec:P
 	   [(match_operand:P 2 "vsib_address_operand" "Tv")
 	    (match_operand:VI48_512 1 "register_operand" "v")
@@ -12536,10 +12538,10 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "*avx512pf_gatherpf<mode>"
+(define_insn "*avx512pf_gatherpf<mode>sf"
   [(unspec
      [(const_int -1)
-      (match_operator:<ssescalarmode> 4 "vsib_mem_operator"
+      (match_operator:SF 4 "vsib_mem_operator"
 	[(unspec:P
 	   [(match_operand:P 1 "vsib_address_operand" "Tv")
 	    (match_operand:VI48_512 0 "register_operand" "v")
@@ -12563,10 +12565,83 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_expand "avx512pf_scatterpf<mode>"
+;; Packed double variants
+(define_expand "avx512pf_gatherpf<mode>df"
   [(unspec
      [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
-      (mem:<ssescalarmode>
+      (mem:DF
+	(match_par_dup 5
+	  [(match_operand 2 "vsib_address_operand")
+	   (match_operand:VI4_256_8_512 1 "register_operand")
+	   (match_operand:SI 3 "const1248_operand")]))
+      (match_operand:SI 4 "const_0_to_1_operand")]
+     UNSPEC_GATHER_PREFETCH)]
+  "TARGET_AVX512PF"
+{
+  operands[5]
+    = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[2], operands[1],
+					operands[3]), UNSPEC_VSIBADDR);
+})
+
+(define_insn "*avx512pf_gatherpf<mode>df_mask"
+  [(unspec
+     [(match_operand:<avx512fmaskmode> 0 "register_operand" "k")
+      (match_operator:DF 5 "vsib_mem_operator"
+	[(unspec:P
+	   [(match_operand:P 2 "vsib_address_operand" "Tv")
+	    (match_operand:VI4_256_8_512 1 "register_operand" "v")
+	    (match_operand:SI 3 "const1248_operand" "n")]
+	   UNSPEC_VSIBADDR)])
+      (match_operand:SI 4 "const_0_to_1_operand" "n")]
+     UNSPEC_GATHER_PREFETCH)]
+  "TARGET_AVX512PF"
+{
+  switch (INTVAL (operands[4]))
+    {
+    case 0:
+      return "vgatherpf0<ssemodesuffix>pd\t{%5%{%0%}|%5%{%0%}}";
+    case 1:
+      return "vgatherpf1<ssemodesuffix>pd\t{%5%{%0%}|%5%{%0%}}";
+    default:
+      gcc_unreachable ();
+    }
+}
+  [(set_attr "type" "sse")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "XI")])
+
+(define_insn "*avx512pf_gatherpf<mode>df"
+  [(unspec
+     [(const_int -1)
+      (match_operator:DF 4 "vsib_mem_operator"
+	[(unspec:P
+	   [(match_operand:P 1 "vsib_address_operand" "Tv")
+	    (match_operand:VI4_256_8_512 0 "register_operand" "v")
+	    (match_operand:SI 2 "const1248_operand" "n")]
+	   UNSPEC_VSIBADDR)])
+      (match_operand:SI 3 "const_0_to_1_operand" "n")]
+     UNSPEC_GATHER_PREFETCH)]
+  "TARGET_AVX512PF"
+{
+  switch (INTVAL (operands[3]))
+    {
+    case 0:
+      return "vgatherpf0<ssemodesuffix>pd\t{%4|%4}";
+    case 1:
+      return "vgatherpf1<ssemodesuffix>pd\t{%4|%4}";
+    default:
+      gcc_unreachable ();
+    }
+}
+  [(set_attr "type" "sse")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "XI")])
+
+;; Packed float variants
+(define_expand "avx512pf_scatterpf<mode>sf"
+  [(unspec
+     [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
+      (mem:SF
 	(match_par_dup 5
 	  [(match_operand 2 "vsib_address_operand")
 	   (match_operand:VI48_512 1 "register_operand")
@@ -12580,10 +12655,10 @@
 					operands[3]), UNSPEC_VSIBADDR);
 })
 
-(define_insn "*avx512pf_scatterpf<mode>_mask"
+(define_insn "*avx512pf_scatterpf<mode>sf_mask"
   [(unspec
      [(match_operand:<avx512fmaskmode> 0 "register_operand" "k")
-      (match_operator:<ssescalarmode> 5 "vsib_mem_operator"
+      (match_operator:SF 5 "vsib_mem_operator"
 	[(unspec:P
 	   [(match_operand:P 2 "vsib_address_operand" "Tv")
 	    (match_operand:VI48_512 1 "register_operand" "v")
@@ -12607,10 +12682,10 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "*avx512pf_scatterpf<mode>"
+(define_insn "*avx512pf_scatterpf<mode>sf"
   [(unspec
      [(const_int -1)
-      (match_operator:<ssescalarmode> 4 "vsib_mem_operator"
+      (match_operator:SF 4 "vsib_mem_operator"
 	[(unspec:P
 	   [(match_operand:P 1 "vsib_address_operand" "Tv")
 	    (match_operand:VI48_512 0 "register_operand" "v")
@@ -12634,6 +12709,78 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
+;; Packed double variants
+(define_expand "avx512pf_scatterpf<mode>df"
+  [(unspec
+     [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
+      (mem:DF
+	(match_par_dup 5
+	  [(match_operand 2 "vsib_address_operand")
+	   (match_operand:VI4_256_8_512 1 "register_operand")
+	   (match_operand:SI 3 "const1248_operand")]))
+      (match_operand:SI 4 "const_0_to_1_operand")]
+     UNSPEC_SCATTER_PREFETCH)]
+  "TARGET_AVX512PF"
+{
+  operands[5]
+    = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[2], operands[1],
+					operands[3]), UNSPEC_VSIBADDR);
+})
+
+(define_insn "*avx512pf_scatterpf<mode>df_mask"
+  [(unspec
+     [(match_operand:<avx512fmaskmode> 0 "register_operand" "k")
+      (match_operator:DF 5 "vsib_mem_operator"
+	[(unspec:P
+	   [(match_operand:P 2 "vsib_address_operand" "Tv")
+	    (match_operand:VI4_256_8_512 1 "register_operand" "v")
+	    (match_operand:SI 3 "const1248_operand" "n")]
+	   UNSPEC_VSIBADDR)])
+      (match_operand:SI 4 "const_0_to_1_operand" "n")]
+     UNSPEC_SCATTER_PREFETCH)]
+  "TARGET_AVX512PF"
+{
+  switch (INTVAL (operands[4]))
+    {
+    case 0:
+      return "vscatterpf0<ssemodesuffix>pd\t{%5%{%0%}|%5%{%0%}}";
+    case 1:
+      return "vscatterpf1<ssemodesuffix>pd\t{%5%{%0%}|%5%{%0%}}";
+    default:
+      gcc_unreachable ();
+    }
+}
+  [(set_attr "type" "sse")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "XI")])
+
+(define_insn "*avx512pf_scatterpf<mode>df"
+  [(unspec
+     [(const_int -1)
+      (match_operator:DF 4 "vsib_mem_operator"
+	[(unspec:P
+	   [(match_operand:P 1 "vsib_address_operand" "Tv")
+	    (match_operand:VI4_256_8_512 0 "register_operand" "v")
+	    (match_operand:SI 2 "const1248_operand" "n")]
+	   UNSPEC_VSIBADDR)])
+      (match_operand:SI 3 "const_0_to_1_operand" "n")]
+     UNSPEC_SCATTER_PREFETCH)]
+  "TARGET_AVX512PF"
+{
+  switch (INTVAL (operands[3]))
+    {
+    case 0:
+      return "vscatterpf0<ssemodesuffix>pd\t{%4|%4}";
+    case 1:
+      return "vscatterpf1<ssemodesuffix>pd\t{%4|%4}";
+    default:
+      gcc_unreachable ();
+    }
+}
+  [(set_attr "type" "sse")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "XI")])
+
 (define_insn "avx512er_exp2<mode><mask_name><round_saeonly_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 12674ad..8fb6fb88 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -362,6 +362,10 @@
 #define __builtin_ia32_gatherpfqps(A, B, C, D, E) __builtin_ia32_gatherpfqps(A, B, C, 1, 1)
 #define __builtin_ia32_scatterpfdps(A, B, C, D, E) __builtin_ia32_scatterpfdps(A, B, C, 1, 1)
 #define __builtin_ia32_scatterpfqps(A, B, C, D, E) __builtin_ia32_scatterpfqps(A, B, C, 1, 1)
+#define __builtin_ia32_gatherpfdpd(A, B, C, D, E) __builtin_ia32_gatherpfdpd(A, B, C, 1, 1)
+#define __builtin_ia32_gatherpfqpd(A, B, C, D, E) __builtin_ia32_gatherpfqpd(A, B, C, 1, 1)
+#define __builtin_ia32_scatterpfdpd(A, B, C, D, E) __builtin_ia32_scatterpfdpd(A, B, C, 1, 1)
+#define __builtin_ia32_scatterpfqpd(A, B, C, D, E) __builtin_ia32_scatterpfqpd(A, B, C, 1, 1)
 
 /* shaintrin.h */
 #define __builtin_ia32_sha1rnds4(A, B, C) __builtin_ia32_sha1rnds4(A, B, 1)
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vmovdqu32-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vmovdqu32-1.c
index 79dbf9d..66e358a 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vmovdqu32-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vmovdqu32-1.c
@@ -15,10 +15,10 @@ volatile __mmask16 m;
 void extern
 avx512f_test (void)
 {
-  x = _mm512_loadu_si512 (p);
+  x = _mm512_loadu_epi32 (p);
   x = _mm512_mask_loadu_epi32 (x, m, p);
   x = _mm512_maskz_loadu_epi32 (m, p);
 
-  _mm512_storeu_si512 (p, x);
+  _mm512_storeu_epi32 (p, x);
   _mm512_mask_storeu_epi32 (p, m, x);
 }
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vmovdqu32-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vmovdqu32-2.c
index f1ae73c..0333d31 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vmovdqu32-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vmovdqu32-2.c
@@ -33,8 +33,8 @@ TEST (void)
     }
 
 #if AVX512F_LEN == 512
-  res1.x = _mm512_loadu_si512 (s1.a);
-  _mm512_storeu_si512 (res2.a, s2.x);
+  res1.x = _mm512_loadu_epi32 (s1.a);
+  _mm512_storeu_epi32 (res2.a, s2.x);
 #endif
   res3.x = INTRINSIC (_mask_loadu_epi32) (res3.x, mask, s1.a);
   res4.x = INTRINSIC (_maskz_loadu_epi32) (mask, s1.a);
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vpcmpd-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vpcmpd-2.c
index 600dfd2..c044f42 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vpcmpd-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vpcmpd-2.c
@@ -17,8 +17,8 @@
     {							\
       dst_ref = ((rel) << i) | dst_ref;			\
     }							\
-    source1.x = _mm512_loadu_si512 (s1);		\
-    source2.x = _mm512_loadu_si512 (s2);		\
+    source1.x = _mm512_loadu_epi32 (s1);		\
+    source2.x = _mm512_loadu_epi32 (s2);		\
     dst1 = _mm512_cmp_epi32_mask (source1.x, source2.x, imm);\
     dst2 = _mm512_mask_cmp_epi32_mask (mask, source1.x, source2.x, imm);\
     if (dst_ref != dst1) abort();			\
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vpcmpq-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vpcmpq-2.c
index 2a9ceb6..e3a90d8 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vpcmpq-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vpcmpq-2.c
@@ -18,8 +18,8 @@ __mmask8 dst_ref;
     {							\
       dst_ref = ((rel) << i) | dst_ref;			\
     }							\
-    source1.x = _mm512_loadu_si512 (s1);		\
-    source2.x = _mm512_loadu_si512 (s2);		\
+    source1.x = _mm512_loadu_epi32 (s1);		\
+    source2.x = _mm512_loadu_epi32 (s2);		\
     dst1 = _mm512_cmp_epi64_mask (source1.x, source2.x, imm);\
     dst2 = _mm512_mask_cmp_epi64_mask (mask, source1.x, source2.x, imm);\
     if (dst_ref != dst1) abort();			\
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vpcmpud-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vpcmpud-2.c
index c0bb978..a90baf9 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vpcmpud-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vpcmpud-2.c
@@ -17,8 +17,8 @@
     {							\
       dst_ref = ((rel) << i) | dst_ref;			\
     }							\
-    source1.x = _mm512_loadu_si512 (s1);		\
-    source2.x = _mm512_loadu_si512 (s2);		\
+    source1.x = _mm512_loadu_epi32 (s1);		\
+    source2.x = _mm512_loadu_epi32 (s2);		\
     dst1 = _mm512_cmp_epu32_mask (source1.x, source2.x, imm);\
     dst2 = _mm512_mask_cmp_epu32_mask (mask, source1.x, source2.x, imm);\
     if (dst_ref != dst1) abort();			\
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vpcmpuq-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vpcmpuq-2.c
index 3bd1b86..c49f5e4 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vpcmpuq-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vpcmpuq-2.c
@@ -17,8 +17,8 @@
     {							\
       dst_ref = ((rel) << i) | dst_ref;			\
     }							\
-    source1.x = _mm512_loadu_si512 (s1);		\
-    source2.x = _mm512_loadu_si512 (s2);		\
+    source1.x = _mm512_loadu_epi32 (s1);		\
+    source2.x = _mm512_loadu_epi32 (s2);		\
     dst1 = _mm512_cmp_epu64_mask (source1.x, source2.x, imm);\
     dst2 = _mm512_mask_cmp_epu64_mask (mask, source1.x, source2.x, imm);\
     if (dst_ref != dst1) abort();			\
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf0dpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf0dpd-1.c
new file mode 100644
index 0000000..1368b7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf0dpd-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512pf -O2" } */
+/* { dg-final { scan-assembler-times "vgatherpf0dpd\[ \\t\]+\[^\n\]*\{%k\[1-7\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i idx;
+volatile __mmask8 m8;
+void *base;
+
+void extern
+avx512pf_test (void)
+{
+  _mm512_mask_prefetch_i32gather_pd (idx, m8, base, 8, 0);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf0qpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf0qpd-1.c
new file mode 100644
index 0000000..61a81bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf0qpd-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512pf -O2" } */
+/* { dg-final { scan-assembler-times "vgatherpf0qpd\[ \\t\]+\[^\n\]*\{%k\[1-7\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512i idx;
+volatile __mmask8 m8;
+int *base;
+
+void extern
+avx512pf_test (void)
+{
+  _mm512_mask_prefetch_i64gather_pd (idx, m8, base, 8, 0);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf1dpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf1dpd-1.c
new file mode 100644
index 0000000..5bc7599
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf1dpd-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512pf -O2" } */
+/* { dg-final { scan-assembler-times "vgatherpf1dpd\[ \\t\]+\[^\n\]*\{%k\[1-7\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i idx;
+volatile __mmask8 m8;
+int *base;
+
+void extern
+avx512pf_test (void)
+{
+  _mm512_mask_prefetch_i32gather_pd (idx, m8, base, 8, 1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf1qpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf1qpd-1.c
new file mode 100644
index 0000000..96610db
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf1qpd-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512pf -O2" } */
+/* { dg-final { scan-assembler-times "vgatherpf1qpd\[ \\t\]+\[^\n\]*\{%k\[1-7\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512i idx;
+volatile __mmask8 m8;
+int *base;
+
+void extern
+avx512pf_test (void)
+{
+  _mm512_mask_prefetch_i64gather_pd (idx, m8, base, 8, 1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0dpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0dpd-1.c
new file mode 100644
index 0000000..83c31cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0dpd-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512pf -O2" } */
+/* { dg-final { scan-assembler-times "vscatterpf0dpd\[ \\t\]+\[^\n\]*%ymm\[0-9\]" 2 } } */
+/* { dg-final { scan-assembler-times "vscatterpf0dpd\[ \\t\]+\[^\n\]*\{%k\[1-7\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i idx;
+volatile __mmask8 m8;
+void *base;
+
+void extern
+avx512pf_test (void)
+{
+  _mm512_prefetch_i32scatter_pd (base, idx, 8, 0);
+  _mm512_mask_prefetch_i32scatter_pd (base, m8, idx, 8, 0);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0qpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0qpd-1.c
new file mode 100644
index 0000000..31172f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0qpd-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512pf -O2" } */
+/* { dg-final { scan-assembler-times "vscatterpf0qpd\[ \\t\]+\[^\n\]*%zmm\[0-9\]" 2 } } */
+/* { dg-final { scan-assembler-times "vscatterpf0qpd\[ \\t\]+\[^\n\]*\{%k\[1-7\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512i idx;
+volatile __mmask8 m8;
+void *base;
+
+void extern
+avx512pf_test (void)
+{
+  _mm512_prefetch_i64scatter_pd (base, idx, 8, 0);
+  _mm512_mask_prefetch_i64scatter_pd (base, m8, idx, 8, 0);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1dpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1dpd-1.c
new file mode 100644
index 0000000..205505b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1dpd-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512pf -O2" } */
+/* { dg-final { scan-assembler-times "vscatterpf1dpd\[ \\t\]+\[^\n\]*%ymm\[0-9\]" 2 } } */
+/* { dg-final { scan-assembler-times "vscatterpf1dpd\[ \\t\]+\[^\n\]*\{%k\[1-7\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i idx;
+volatile __mmask8 m8;
+void *base;
+
+void extern
+avx512pf_test (void)
+{
+  _mm512_prefetch_i32scatter_pd (base, idx, 8, 1);
+  _mm512_mask_prefetch_i32scatter_pd (base, m8, idx, 8, 1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1qpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1qpd-1.c
new file mode 100644
index 0000000..64d7dfa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1qpd-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512pf -O2" } */
+/* { dg-final { scan-assembler-times "vscatterpf1qpd\[ \\t\]+\[^\n\]*%zmm\[0-9\]" 2 } } */
+/* { dg-final { scan-assembler-times "vscatterpf1qpd\[ \\t\]+\[^\n\]*\{%k\[1-7\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512i idx;
+volatile __mmask8 m8;
+int *base;
+
+void extern
+avx512pf_test (void)
+{
+  _mm512_prefetch_i64scatter_pd (base, idx, 8, 1);
+  _mm512_mask_prefetch_i64scatter_pd (base, m8, idx, 8, 1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index c5d8876..643eb99 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -523,26 +523,30 @@ test_3vx (_mm512_mask_prefetch_i32gather_ps, __m512i, __mmask16, void const *, 1
 test_3vx (_mm512_mask_prefetch_i32scatter_ps, void const *, __mmask16, __m512i, 1, 1)
 test_3vx (_mm512_mask_prefetch_i64gather_ps, __m512i, __mmask8, void const *, 1, 1)
 test_3vx (_mm512_mask_prefetch_i64scatter_ps, void const *, __mmask8, __m512i, 1, 1)
+test_3vx (_mm512_mask_prefetch_i32gather_pd, __m256i, __mmask8, void const *, 1, 1)
+test_3vx (_mm512_mask_prefetch_i32scatter_pd, void const *, __mmask8, __m256i, 1, 1)
+test_3vx (_mm512_mask_prefetch_i64gather_pd, __m512i, __mmask8, void const *, 1, 1)
+test_3vx (_mm512_mask_prefetch_i64scatter_pd, void const *, __mmask8, __m512i, 1, 1)
 
 /* avx512erintrin.h */
-test_1 (_mm512_exp2a23_round_pd, __m512d, __m512d, 1)
-test_1 (_mm512_exp2a23_round_ps, __m512, __m512, 1)
-test_1 (_mm512_rcp28_round_pd, __m512d, __m512d, 1)
-test_1 (_mm512_rcp28_round_ps, __m512, __m512, 1)
-test_1 (_mm512_rsqrt28_round_pd, __m512d, __m512d, 1)
-test_1 (_mm512_rsqrt28_round_ps, __m512, __m512, 1)
-test_2 (_mm512_maskz_exp2a23_round_pd, __m512d, __mmask8, __m512d, 1)
-test_2 (_mm512_maskz_exp2a23_round_ps, __m512, __mmask16, __m512, 1)
-test_2 (_mm512_maskz_rcp28_round_pd, __m512d, __mmask8, __m512d, 1)
-test_2 (_mm512_maskz_rcp28_round_ps, __m512, __mmask16, __m512, 1)
-test_2 (_mm512_maskz_rsqrt28_round_pd, __m512d, __mmask8, __m512d, 1)
-test_2 (_mm512_maskz_rsqrt28_round_ps, __m512, __mmask16, __m512, 1)
-test_3 (_mm512_mask_exp2a23_round_pd, __m512d, __m512d, __mmask8, __m512d, 1)
-test_3 (_mm512_mask_exp2a23_round_ps, __m512, __m512, __mmask16, __m512, 1)
-test_3 (_mm512_mask_rcp28_round_pd, __m512d, __m512d, __mmask8, __m512d, 1)
-test_3 (_mm512_mask_rcp28_round_ps, __m512, __m512, __mmask16, __m512, 1)
-test_3 (_mm512_mask_rsqrt28_round_pd, __m512d, __m512d, __mmask8, __m512d, 1)
-test_3 (_mm512_mask_rsqrt28_round_ps, __m512, __m512, __mmask16, __m512, 1)
+test_1 (_mm512_exp2a23_round_pd, __m512d, __m512d, 5)
+test_1 (_mm512_exp2a23_round_ps, __m512, __m512, 5)
+test_1 (_mm512_rcp28_round_pd, __m512d, __m512d, 5)
+test_1 (_mm512_rcp28_round_ps, __m512, __m512, 5)
+test_1 (_mm512_rsqrt28_round_pd, __m512d, __m512d, 5)
+test_1 (_mm512_rsqrt28_round_ps, __m512, __m512, 5)
+test_2 (_mm512_maskz_exp2a23_round_pd, __m512d, __mmask8, __m512d, 5)
+test_2 (_mm512_maskz_exp2a23_round_ps, __m512, __mmask16, __m512, 5)
+test_2 (_mm512_maskz_rcp28_round_pd, __m512d, __mmask8, __m512d, 5)
+test_2 (_mm512_maskz_rcp28_round_ps, __m512, __mmask16, __m512, 5)
+test_2 (_mm512_maskz_rsqrt28_round_pd, __m512d, __mmask8, __m512d, 5)
+test_2 (_mm512_maskz_rsqrt28_round_ps, __m512, __mmask16, __m512, 5)
+test_3 (_mm512_mask_exp2a23_round_pd, __m512d, __m512d, __mmask8, __m512d, 5)
+test_3 (_mm512_mask_exp2a23_round_ps, __m512, __m512, __mmask16, __m512, 5)
+test_3 (_mm512_mask_rcp28_round_pd, __m512d, __m512d, __mmask8, __m512d, 5)
+test_3 (_mm512_mask_rcp28_round_ps, __m512, __m512, __mmask16, __m512, 5)
+test_3 (_mm512_mask_rsqrt28_round_pd, __m512d, __m512d, __mmask8, __m512d, 5)
+test_3 (_mm512_mask_rsqrt28_round_ps, __m512, __m512, __mmask16, __m512, 5)
 
 /* shaintrin.h */
 test_2 (_mm_sha1rnds4_epu32, __m128i, __m128i, __m128i, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 630c952..7d68be1 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -646,6 +646,11 @@ test_3vx (_mm512_mask_prefetch_i32scatter_ps, void const *, __mmask16, __m512i,
 test_3vx (_mm512_mask_prefetch_i64gather_ps, __m512i, __mmask8, void const *, 1, 1)
 test_3vx (_mm512_mask_prefetch_i64scatter_ps, void const *, __mmask8, __m512i, 1, 1)
 
+test_3vx (_mm512_mask_prefetch_i32gather_pd, __m256i, __mmask8, void const *, 1, 1)
+test_3vx (_mm512_mask_prefetch_i32scatter_pd, void const *, __mmask8, __m256i, 1, 1)
+test_3vx (_mm512_mask_prefetch_i64gather_pd, __m512i, __mmask8, long long *, 1, 1)
+test_3vx (_mm512_mask_prefetch_i64scatter_pd, void const *, __mmask8, __m512i, 1, 1)
+
 /* avx512erintrin.h */
 test_1 (_mm512_exp2a23_round_pd, __m512d, __m512d, 5)
 test_1 (_mm512_exp2a23_round_ps, __m512, __m512, 5)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 309cd73..77c8d67 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -365,6 +365,10 @@
 #define __builtin_ia32_gatherpfqps(A, B, C, D, E) __builtin_ia32_gatherpfqps(A, B, C, 1, 1)
 #define __builtin_ia32_scatterpfdps(A, B, C, D, E) __builtin_ia32_scatterpfdps(A, B, C, 1, 1)
 #define __builtin_ia32_scatterpfqps(A, B, C, D, E) __builtin_ia32_scatterpfqps(A, B, C, 1, 1)
+#define __builtin_ia32_gatherpfdpd(A, B, C, D, E) __builtin_ia32_gatherpfdpd(A, B, C, 1, 1)
+#define __builtin_ia32_gatherpfqpd(A, B, C, D, E) __builtin_ia32_gatherpfqpd(A, B, C, 1, 1)
+#define __builtin_ia32_scatterpfdpd(A, B, C, D, E) __builtin_ia32_scatterpfdpd(A, B, C, 1, 1)
+#define __builtin_ia32_scatterpfqpd(A, B, C, D, E) __builtin_ia32_scatterpfqpd(A, B, C, 1, 1)
 
 /* avx512erintrin.h */
 #define __builtin_ia32_exp2pd_mask(A, B, C, D) __builtin_ia32_exp2pd_mask (A, B, C, 5)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH i386 11/8] [AVX512] Add missing packed PF gathers/scatters, rename load/store.
  2014-01-14  6:13 [PATCH i386 11/8] [AVX512] Add missing packed PF gathers/scatters, rename load/store Kirill Yukhin
@ 2014-01-15 19:53 ` Uros Bizjak
  2014-01-21 15:07   ` [PATCH i386 11/8] [AVX512] [1/2] Rename vmov* intrinsics according to EAS Kirill Yukhin
       [not found]   ` <20140116045745.GA31714@msticlxl57.ims.intel.com>
  0 siblings, 2 replies; 8+ messages in thread
From: Uros Bizjak @ 2014-01-15 19:53 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: Jakub Jelinek, GCC Patches

On Tue, Jan 14, 2014 at 7:13 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> Hello,
> This patch introduces missing AVX-512PF intrinsics and tests.
> It also renames store/load intrinsics according to EAS.
>
> gcc/
>         * config/i386/avx512fintrin.h (_mm512_loadu_si512): Rename.
>         (_mm512_storeu_si512): Ditto.
>         * config/i386/avx512pfintrin.h (_mm512_mask_prefetch_i32gather_pd): New.
>         (_mm512_mask_prefetch_i64gather_pd): Ditto.
>         (_mm512_prefetch_i32scatter_pd): Ditto.
>         (_mm512_mask_prefetch_i32scatter_pd): Ditto.
>         (_mm512_prefetch_i64scatter_pd): Ditto.
>         (_mm512_mask_prefetch_i64scatter_pd): Ditto.
>         (_mm512_mask_prefetch_i32gather_ps): Fix operand type.
>         (_mm512_mask_prefetch_i64gather_ps): Ditto.
>         (_mm512_prefetch_i32scatter_ps): Ditto.
>         (_mm512_mask_prefetch_i32scatter_ps): Ditto.
>         (_mm512_prefetch_i64scatter_ps): Ditto.
>         (_mm512_mask_prefetch_i64scatter_ps): Ditto.
>         * config/i386/i386-builtin-types.def: Define
>         VOID_FTYPE_QI_V8SI_PCINT64_INT_INT and VOID_FTYPE_QI_V8DI_PCINT64_INT_INT.
>         * config/i386/i386.c (ix86_builtins): Define IX86_BUILTIN_GATHERPFQPD,
>         IX86_BUILTIN_GATHERPFDPD, IX86_BUILTIN_SCATTERPFDPD,
>         IX86_BUILTIN_SCATTERPFQPD.
>         (ix86_init_mmx_sse_builtins): Define __builtin_ia32_gatherpfdpd,
>         __builtin_ia32_gatherpfdps, __builtin_ia32_gatherpfqpd,
>         __builtin_ia32_gatherpfqps, __builtin_ia32_scatterpfdpd,
>         __builtin_ia32_scatterpfdps, __builtin_ia32_scatterpfqpd,
>         __builtin_ia32_scatterpfqps.
>         (ix86_expand_builtin): Expand new built-ins.
>         * config/i386/sse.md (avx512pf_gatherpf<mode>): Add SF suffix,
>         fix memory access data type.
>         (*avx512pf_gatherpf<mode>_mask): Ditto.
>         (*avx512pf_gatherpf<mode>): Ditto.
>         (avx512pf_scatterpf<mode>): Ditto.
>         (*avx512pf_scatterpf<mode>_mask): Ditto.
>         (*avx512pf_scatterpf<mode>): Ditto.
>         (avx512pf_gatherpf<mode>df): New.
>         (*avx512pf_gatherpf<mode>df_mask): Ditto.
>         (*avx512pf_gatherpf<mode>df): Ditto.
>         (avx512pf_scatterpf<mode>df): Ditto.
>         (*avx512pf_scatterpf<mode>df_mask): Ditto.
>         (*avx512pf_scatterpf<mode>df): Ditto.
>
> testsuite/
>         * gcc.target/i386/avx512f-vmovdqu32-1.c: Fix intrinsic name.
>         * gcc.target/i386/avx512f-vmovdqu32-2.c: Ditto.
>         * gcc.target/i386/avx512f-vpcmpd-2.c: Ditto.
>         * gcc.target/i386/avx512f-vpcmpq-2.c: Ditto.
>         * gcc.target/i386/avx512f-vpcmpud-2.c: Ditto.
>         * gcc.target/i386/avx512f-vpcmpuq-2.c: Ditto.
>         * gcc.target/i386/avx512pf-vgatherpf0dpd-1.c: Ditto.
>         * gcc.target/i386/avx512pf-vgatherpf0qpd-1.c: Ditto.
>         * gcc.target/i386/avx512pf-vgatherpf1dpd-1.c: Ditto.
>         * gcc.target/i386/avx512pf-vgatherpf1qpd-1.c: Ditto.
>         * gcc.target/i386/avx512pf-vscatterpf0dpd-1.c: Ditto.
>         * gcc.target/i386/avx512pf-vscatterpf0qpd-1.c: Ditto.
>         * gcc.target/i386/avx512pf-vscatterpf1dpd-1.c: Ditto.
>         * gcc.target/i386/avx512pf-vscatterpf1qpd-1.c: Ditto.
>         * gcc.target/i386/sse-14.c: Add new built-ins, fix AVX-512ER
>         built-ins roudning immediate.
>         * gcc.target/i386/sse-22.c: Add new built-ins.
>         * gcc.target/i386/sse-23.c: Ditto.
>         * gcc.target/i386/avx-1.c: Ditto.
>
> I have a doubts about changes to sse.md.
> I've splitted existing (SF-only) patterns into 2: DF and SF.
> As far as insn operands and final instruction have no such data
> type discrimination I set this data type to (mem:..) part.
> Having this (for SF):
>   (define_expand "avx512pf_scatterpf<mode>sf"
>     [(unspec
>        [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
>         (mem:SF
>   ...
>
> instead of this:
>   (define_expand "avx512pf_scatterpf<mode>"
>     [(unspec
>        [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
>         (mem:<ssescalarmode>
>   ...
>
> Not sure if this (DI/SI) mode for mem is needed. Moreover, not sure what
> that data type represents.

Did you try to add DF/SF mode to the unspec? I am not familiar with
this insn, but shouldn't the mode of mem access be somehow similar to
the avx512f_scattersi<mode> access?

Also, you can use double macroization with MODEF iterator for SF and DFmode.

Uros.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH i386 11/8] [AVX512] [1/2] Rename vmov* intrinsics according to EAS.
  2014-01-15 19:53 ` Uros Bizjak
@ 2014-01-21 15:07   ` Kirill Yukhin
  2014-01-21 15:15     ` Uros Bizjak
       [not found]   ` <20140116045745.GA31714@msticlxl57.ims.intel.com>
  1 sibling, 1 reply; 8+ messages in thread
From: Kirill Yukhin @ 2014-01-21 15:07 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Jakub Jelinek, GCC Patches

Hello,
On 15 Jan 20:53, Uros Bizjak wrote:
> On Tue, Jan 14, 2014 at 7:13 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> > I have a doubts about changes to sse.md.
> > I've splitted existing (SF-only) patterns into 2: DF and SF.
> > As far as insn operands and final instruction have no such data
> > type discrimination I set this data type to (mem:..) part.
> > Having this (for SF):
> >   (define_expand "avx512pf_scatterpf<mode>sf"
> >     [(unspec
> >        [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
> >         (mem:SF
> >   ...
> >
> > instead of this:
> >   (define_expand "avx512pf_scatterpf<mode>"
> >     [(unspec
> >        [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
> >         (mem:<ssescalarmode>
> >   ...
> >
> > Not sure if this (DI/SI) mode for mem is needed. Moreover, not sure what
> > that data type represents.
> 
> Did you try to add DF/SF mode to the unspec? I am not familiar with
> this insn, but shouldn't the mode of mem access be somehow similar to
> the avx512f_scattersi<mode> access?
> 
> Also, you can use double macroization with MODEF iterator for SF and DFmode.

It seems that patch is (at last!) non-trivial, so I am splitting out trivial part
in order to reduce volume.

It is in the bottom.

Bootstrapped, avx-512* tests pass, sse-* tests pass. Ok for trunk?

--
Thanks, K

---
 gcc/config/i386/avx512fintrin.h                     | 4 ++--
 gcc/testsuite/gcc.target/i386/avx512f-vmovdqu32-1.c | 4 ++--
 gcc/testsuite/gcc.target/i386/avx512f-vmovdqu32-2.c | 4 ++--
 gcc/testsuite/gcc.target/i386/avx512f-vpcmpd-2.c    | 4 ++--
 gcc/testsuite/gcc.target/i386/avx512f-vpcmpq-2.c    | 4 ++--
 gcc/testsuite/gcc.target/i386/avx512f-vpcmpud-2.c   | 4 ++--
 gcc/testsuite/gcc.target/i386/avx512f-vpcmpuq-2.c   | 4 ++--
 7 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index 26f8cb6..4e94174 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -5570,7 +5570,7 @@ _mm512_mask_storeu_epi64 (void *__P, __mmask8 __U, __m512i __A)
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_loadu_si512 (void const *__P)
+_mm512_loadu_epi32 (void const *__P)
 {
   return (__m512i) __builtin_ia32_loaddqusi512_mask ((const __v16si *) __P,
 						     (__v16si)
@@ -5599,7 +5599,7 @@ _mm512_maskz_loadu_epi32 (__mmask16 __U, void const *__P)
 
 extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_storeu_si512 (void *__P, __m512i __A)
+_mm512_storeu_epi32 (void *__P, __m512i __A)
 {
   __builtin_ia32_storedqusi512_mask ((__v16si *) __P, (__v16si) __A,
 				     (__mmask16) -1);
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vmovdqu32-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vmovdqu32-1.c
index 79dbf9d..66e358a 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vmovdqu32-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vmovdqu32-1.c
@@ -15,10 +15,10 @@ volatile __mmask16 m;
 void extern
 avx512f_test (void)
 {
-  x = _mm512_loadu_si512 (p);
+  x = _mm512_loadu_epi32 (p);
   x = _mm512_mask_loadu_epi32 (x, m, p);
   x = _mm512_maskz_loadu_epi32 (m, p);
 
-  _mm512_storeu_si512 (p, x);
+  _mm512_storeu_epi32 (p, x);
   _mm512_mask_storeu_epi32 (p, m, x);
 }
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vmovdqu32-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vmovdqu32-2.c
index f1ae73c..0333d31 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vmovdqu32-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vmovdqu32-2.c
@@ -33,8 +33,8 @@ TEST (void)
     }
 
 #if AVX512F_LEN == 512
-  res1.x = _mm512_loadu_si512 (s1.a);
-  _mm512_storeu_si512 (res2.a, s2.x);
+  res1.x = _mm512_loadu_epi32 (s1.a);
+  _mm512_storeu_epi32 (res2.a, s2.x);
 #endif
   res3.x = INTRINSIC (_mask_loadu_epi32) (res3.x, mask, s1.a);
   res4.x = INTRINSIC (_maskz_loadu_epi32) (mask, s1.a);
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vpcmpd-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vpcmpd-2.c
index 600dfd2..c044f42 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vpcmpd-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vpcmpd-2.c
@@ -17,8 +17,8 @@
     {							\
       dst_ref = ((rel) << i) | dst_ref;			\
     }							\
-    source1.x = _mm512_loadu_si512 (s1);		\
-    source2.x = _mm512_loadu_si512 (s2);		\
+    source1.x = _mm512_loadu_epi32 (s1);		\
+    source2.x = _mm512_loadu_epi32 (s2);		\
     dst1 = _mm512_cmp_epi32_mask (source1.x, source2.x, imm);\
     dst2 = _mm512_mask_cmp_epi32_mask (mask, source1.x, source2.x, imm);\
     if (dst_ref != dst1) abort();			\
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vpcmpq-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vpcmpq-2.c
index 2a9ceb6..e3a90d8 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vpcmpq-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vpcmpq-2.c
@@ -18,8 +18,8 @@ __mmask8 dst_ref;
     {							\
       dst_ref = ((rel) << i) | dst_ref;			\
     }							\
-    source1.x = _mm512_loadu_si512 (s1);		\
-    source2.x = _mm512_loadu_si512 (s2);		\
+    source1.x = _mm512_loadu_epi32 (s1);		\
+    source2.x = _mm512_loadu_epi32 (s2);		\
     dst1 = _mm512_cmp_epi64_mask (source1.x, source2.x, imm);\
     dst2 = _mm512_mask_cmp_epi64_mask (mask, source1.x, source2.x, imm);\
     if (dst_ref != dst1) abort();			\
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vpcmpud-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vpcmpud-2.c
index c0bb978..a90baf9 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vpcmpud-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vpcmpud-2.c
@@ -17,8 +17,8 @@
     {							\
       dst_ref = ((rel) << i) | dst_ref;			\
     }							\
-    source1.x = _mm512_loadu_si512 (s1);		\
-    source2.x = _mm512_loadu_si512 (s2);		\
+    source1.x = _mm512_loadu_epi32 (s1);		\
+    source2.x = _mm512_loadu_epi32 (s2);		\
     dst1 = _mm512_cmp_epu32_mask (source1.x, source2.x, imm);\
     dst2 = _mm512_mask_cmp_epu32_mask (mask, source1.x, source2.x, imm);\
     if (dst_ref != dst1) abort();			\
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vpcmpuq-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vpcmpuq-2.c
index 3bd1b86..c49f5e4 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vpcmpuq-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vpcmpuq-2.c
@@ -17,8 +17,8 @@
     {							\
       dst_ref = ((rel) << i) | dst_ref;			\
     }							\
-    source1.x = _mm512_loadu_si512 (s1);		\
-    source2.x = _mm512_loadu_si512 (s2);		\
+    source1.x = _mm512_loadu_epi32 (s1);		\
+    source2.x = _mm512_loadu_epi32 (s2);		\
     dst1 = _mm512_cmp_epu64_mask (source1.x, source2.x, imm);\
     dst2 = _mm512_mask_cmp_epu64_mask (mask, source1.x, source2.x, imm);\
     if (dst_ref != dst1) abort();			\

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH i386 11/8] [AVX512] [1/2] Rename vmov* intrinsics according to EAS.
  2014-01-21 15:07   ` [PATCH i386 11/8] [AVX512] [1/2] Rename vmov* intrinsics according to EAS Kirill Yukhin
@ 2014-01-21 15:15     ` Uros Bizjak
  0 siblings, 0 replies; 8+ messages in thread
From: Uros Bizjak @ 2014-01-21 15:15 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: Jakub Jelinek, GCC Patches

On Tue, Jan 21, 2014 at 4:07 PM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:

>> > I have a doubts about changes to sse.md.
>> > I've splitted existing (SF-only) patterns into 2: DF and SF.
>> > As far as insn operands and final instruction have no such data
>> > type discrimination I set this data type to (mem:..) part.
>> > Having this (for SF):
>> >   (define_expand "avx512pf_scatterpf<mode>sf"
>> >     [(unspec
>> >        [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
>> >         (mem:SF
>> >   ...
>> >
>> > instead of this:
>> >   (define_expand "avx512pf_scatterpf<mode>"
>> >     [(unspec
>> >        [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
>> >         (mem:<ssescalarmode>
>> >   ...
>> >
>> > Not sure if this (DI/SI) mode for mem is needed. Moreover, not sure what
>> > that data type represents.
>>
>> Did you try to add DF/SF mode to the unspec? I am not familiar with
>> this insn, but shouldn't the mode of mem access be somehow similar to
>> the avx512f_scattersi<mode> access?
>>
>> Also, you can use double macroization with MODEF iterator for SF and DFmode.
>
> It seems that patch is (at last!) non-trivial, so I am splitting out trivial part
> in order to reduce volume.
>
> It is in the bottom.
>
> Bootstrapped, avx-512* tests pass, sse-* tests pass. Ok for trunk?

OK with a proper ChangeLog.

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH i386 11/8] [AVX512] [2/2] Add missing packed PF gathers/scatters.
       [not found]     ` <CAFULd4ZsBn3gPViBRRE43VLV2VzCmi4XGGr-bWaz1=fKBKLfbA@mail.gmail.com>
@ 2014-01-21 18:52       ` Kirill Yukhin
  2014-01-23 13:22         ` Uros Bizjak
  0 siblings, 1 reply; 8+ messages in thread
From: Kirill Yukhin @ 2014-01-21 18:52 UTC (permalink / raw)
  To: Uros Bizjak, Jakub Jelinek; +Cc: GCC Patches

Hello,
This is non-trivial part of the patch.

> On 15 Jan 20:53, Uros Bizjak wrote:
> On Tue, Jan 14, 2014 at 7:13 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> Did you try to add DF/SF mode to the unspec? I am not familiar with
> this insn, but shouldn't the mode of mem access be somehow similar to
> the avx512f_scattersi<mode> access?
avx512f_scattersi<mode> is different in its appearence.
It has explicit type of destination which discriminates SF/DF modes. Prefetches
has no such.
 
> Also, you can use double macroization with MODEF iterator for SF and DFmode.
I think I cannot. Because DF/SF types of the insn incurs different vidx iterators.
E.g.:
Currently we have for SF:
(define_expand "avx512pf_scatterpf<VI48_512:mode>sf"
  [(unspec
     [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
      (mem:SF
        (match_par_dup 5
          [(match_operand 2 "vsib_address_operand")
           (match_operand:VI48_512 1 "register_operand")
           (match_operand:SI 3 "const1248_operand")]))
      (match_operand:SI 4 "const_0_to_1_operand")]
     UNSPEC_SCATTER_PREFETCH)]

and for DF:
(define_expand "avx512pf_scatterpf<mode>df"
  [(unspec
     [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
      (mem:DF
       (match_par_dup 5
         [(match_operand 2 "vsib_address_operand")
          (match_operand:VI4_256_8_512 1 "register_operand")
          (match_operand:SI 3 "const1248_operand")]))
      (match_operand:SI 4 "const_0_to_1_operand")]
     UNSPEC_SCATTER_PREFETCH)]

We have this correspondence between, say, main and index modes:
  SF -> (V16SI, V8DI)
  DF -> (V8SI , V8DI)


I think we should hear from Jaku about sse.md changes and expansion.

Bootstrapped, avx512* and sse-* tests pass.

Comments?

--
Thanks, K

---
 gcc/config/i386/avx512pfintrin.h                   | 113 ++++++++++++--
 gcc/config/i386/i386-builtin-types.def             |   2 +
 gcc/config/i386/i386.c                             |  37 ++++-
 gcc/config/i386/sse.md                             | 171 +++++++++++++++++++--
 gcc/testsuite/gcc.target/i386/avx-1.c              |   4 +
 .../gcc.target/i386/avx512pf-vgatherpf0dpd-1.c     |  15 ++
 .../gcc.target/i386/avx512pf-vgatherpf0qpd-1.c     |  15 ++
 .../gcc.target/i386/avx512pf-vgatherpf1dpd-1.c     |  15 ++
 .../gcc.target/i386/avx512pf-vgatherpf1qpd-1.c     |  15 ++
 .../gcc.target/i386/avx512pf-vscatterpf0dpd-1.c    |  17 ++
 .../gcc.target/i386/avx512pf-vscatterpf0qpd-1.c    |  17 ++
 .../gcc.target/i386/avx512pf-vscatterpf1dpd-1.c    |  17 ++
 .../gcc.target/i386/avx512pf-vscatterpf1qpd-1.c    |  17 ++
 gcc/testsuite/gcc.target/i386/sse-14.c             |   4 +
 gcc/testsuite/gcc.target/i386/sse-22.c             |   5 +
 gcc/testsuite/gcc.target/i386/sse-23.c             |   4 +
 16 files changed, 437 insertions(+), 31 deletions(-)

diff --git a/gcc/config/i386/avx512pfintrin.h b/gcc/config/i386/avx512pfintrin.h
index b8c0110..bc7598e 100644
--- a/gcc/config/i386/avx512pfintrin.h
+++ b/gcc/config/i386/avx512pfintrin.h
@@ -48,74 +48,157 @@ typedef unsigned short __mmask16;
 #ifdef __OPTIMIZE__
 extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_prefetch_i32gather_pd (__m256i index, __mmask8 mask,
+				   void *addr, int scale, int hint)
+{
+  __builtin_ia32_gatherpfdpd (mask, (__v8si) index, (long long const *) addr,
+			      scale, hint);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_prefetch_i32gather_ps (__m512i index, __mmask16 mask,
-				   int const *addr, int scale, int hint)
+				   void *addr, int scale, int hint)
+{
+  __builtin_ia32_gatherpfdps (mask, (__v16si) index, (int const *) addr,
+			      scale, hint);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_prefetch_i64gather_pd (__m512i index, __mmask8 mask,
+				   void *addr, int scale, int hint)
 {
-  __builtin_ia32_gatherpfdps (mask, (__v16si) index, addr, scale, hint);
+  __builtin_ia32_gatherpfqpd (mask, (__v8di) index, (long long const *) addr,
+			      scale, hint);
 }
 
 extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_prefetch_i64gather_ps (__m512i index, __mmask8 mask,
-				   int const *addr, int scale, int hint)
+				   void *addr, int scale, int hint)
 {
-  __builtin_ia32_gatherpfqps (mask, (__v8di) index, addr, scale, hint);
+  __builtin_ia32_gatherpfqps (mask, (__v8di) index, (int const *) addr,
+			      scale, hint);
 }
 
 extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_prefetch_i32scatter_ps (int const *addr, __m512i index, int scale,
+_mm512_prefetch_i32scatter_pd (void *addr, __m256i index, int scale,
 			       int hint)
 {
-  __builtin_ia32_scatterpfdps ((__mmask16) 0xFFFF, (__v16si) index, addr, scale,
-			       hint);
+  __builtin_ia32_scatterpfdpd ((__mmask8) 0xFF, (__v8si) index, 
+			       (long long const *)addr, scale, hint);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_prefetch_i32scatter_ps (void *addr, __m512i index, int scale,
+			       int hint)
+{
+  __builtin_ia32_scatterpfdps ((__mmask16) 0xFFFF, (__v16si) index, (int const *) addr,
+			       scale, hint);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_prefetch_i32scatter_pd (void *addr, __mmask8 mask,
+				    __m256i index, int scale, int hint)
+{
+  __builtin_ia32_scatterpfdpd (mask, (__v8si) index, (long long const *) addr,
+			       scale, hint);
 }
 
 extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_prefetch_i32scatter_ps (int const *addr, __mmask16 mask,
+_mm512_mask_prefetch_i32scatter_ps (void *addr, __mmask16 mask,
 				    __m512i index, int scale, int hint)
 {
-  __builtin_ia32_scatterpfdps (mask, (__v16si) index, addr, scale, hint);
+  __builtin_ia32_scatterpfdps (mask, (__v16si) index, (int const *) addr,
+			       scale, hint);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_prefetch_i64scatter_pd (void *addr, __m512i index, int scale,
+			       int hint)
+{
+  __builtin_ia32_scatterpfqpd ((__mmask8) 0xFF, (__v8di) index, (long long const *) addr,
+			       scale, hint);
 }
 
 extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_prefetch_i64scatter_ps (int const *addr, __m512i index, int scale,
+_mm512_prefetch_i64scatter_ps (void *addr, __m512i index, int scale,
 			       int hint)
 {
-  __builtin_ia32_scatterpfqps ((__mmask8) 0xFF, (__v8di) index, addr, scale,
-			       hint);
+  __builtin_ia32_scatterpfqps ((__mmask8) 0xFF, (__v8di) index, (int const *) addr,
+			       scale, hint);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_prefetch_i64scatter_pd (void *addr, __mmask16 mask,
+				    __m512i index, int scale, int hint)
+{
+  __builtin_ia32_scatterpfqpd (mask, (__v8di) index, (long long const *) addr,
+			       scale, hint);
 }
 
 extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_prefetch_i64scatter_ps (int const *addr, __mmask16 mask,
+_mm512_mask_prefetch_i64scatter_ps (void *addr, __mmask16 mask,
 				    __m512i index, int scale, int hint)
 {
-  __builtin_ia32_scatterpfqps (mask, (__v8di) index, addr, scale, hint);
+  __builtin_ia32_scatterpfqps (mask, (__v8di) index, (int const *) addr,
+			       scale, hint);
 }
+
 #else
+#define _mm512_mask_prefetch_i32gather_pd(INDEX, MASK, ADDR, SCALE, HINT)    \
+  __builtin_ia32_gatherpfdpd ((__mmask8)MASK, (__v8si)(__m256i)INDEX,	     \
+			      (long long const *)ADDR, (int)SCALE, (int)HINT)
+
 #define _mm512_mask_prefetch_i32gather_ps(INDEX, MASK, ADDR, SCALE, HINT)    \
-  __builtin_ia32_gatherpfdps ((__mmask16)MASK, (__v16si)(__m512i)INDEX,	     \
+  __builtin_ia32_gatherpfdps ((__mmask16)MASK, (__v16si)(__m512i)INDEX,      \
 			      (int const *)ADDR, (int)SCALE, (int)HINT)
 
+#define _mm512_mask_prefetch_i64gather_pd(INDEX, MASK, ADDR, SCALE, HINT)    \
+  __builtin_ia32_gatherpfqpd ((__mmask8)MASK, (__v8di)(__m512i)INDEX,	     \
+			      (long long const *)ADDR, (int)SCALE, (int)HINT)
+
 #define _mm512_mask_prefetch_i64gather_ps(INDEX, MASK, ADDR, SCALE, HINT)    \
   __builtin_ia32_gatherpfqps ((__mmask8)MASK, (__v8di)(__m512i)INDEX,	     \
 			      (int const *)ADDR, (int)SCALE, (int)HINT)
 
+#define _mm512_prefetch_i32scatter_pd(ADDR, INDEX, SCALE, HINT)              \
+  __builtin_ia32_scatterpfdpd ((__mmask8)0xFF, (__v8si)(__m256i)INDEX,       \
+			       (long long const *)ADDR, (int)SCALE, (int)HINT)
+
 #define _mm512_prefetch_i32scatter_ps(ADDR, INDEX, SCALE, HINT)              \
   __builtin_ia32_scatterpfdps ((__mmask16)0xFFFF, (__v16si)(__m512i)INDEX,   \
 			       (int const *)ADDR, (int)SCALE, (int)HINT)
 
+#define _mm512_mask_prefetch_i32scatter_pd(ADDR, MASK, INDEX, SCALE, HINT)   \
+  __builtin_ia32_scatterpfdpd ((__mmask8)MASK, (__v8si)(__m256i)INDEX,       \
+			       (long long const *)ADDR, (int)SCALE, (int)HINT)
+
 #define _mm512_mask_prefetch_i32scatter_ps(ADDR, MASK, INDEX, SCALE, HINT)   \
   __builtin_ia32_scatterpfdps ((__mmask16)MASK, (__v16si)(__m512i)INDEX,     \
 			       (int const *)ADDR, (int)SCALE, (int)HINT)
 
+#define _mm512_prefetch_i64scatter_pd(ADDR, INDEX, SCALE, HINT)              \
+  __builtin_ia32_scatterpfqpd ((__mmask8)0xFF, (__v8di)(__m512i)INDEX,	     \
+			       (long long const *)ADDR, (int)SCALE, (int)HINT)
+
 #define _mm512_prefetch_i64scatter_ps(ADDR, INDEX, SCALE, HINT)              \
   __builtin_ia32_scatterpfqps ((__mmask8)0xFF, (__v8di)(__m512i)INDEX,	     \
 			       (int const *)ADDR, (int)SCALE, (int)HINT)
 
+#define _mm512_mask_prefetch_i64scatter_pd(ADDR, MASK, INDEX, SCALE, HINT)   \
+  __builtin_ia32_scatterpfqpd ((__mmask8)MASK, (__v8di)(__m512i)INDEX,	     \
+			       (long long const *)ADDR, (int)SCALE, (int)HINT)
+
 #define _mm512_mask_prefetch_i64scatter_ps(ADDR, MASK, INDEX, SCALE, HINT)   \
   __builtin_ia32_scatterpfqps ((__mmask8)MASK, (__v8di)(__m512i)INDEX,	     \
 			       (int const *)ADDR, (int)SCALE, (int)HINT)
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index acf2f32..f3c658b 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -733,7 +733,9 @@ DEF_FUNCTION_TYPE (VOID, PLONGLONG, QI, V8SI, V8DI, INT)
 DEF_FUNCTION_TYPE (VOID, PINT, QI, V8DI, V8SI, INT)
 DEF_FUNCTION_TYPE (VOID, PLONGLONG, QI, V8DI, V8DI, INT)
 
+DEF_FUNCTION_TYPE (VOID, QI, V8SI, PCINT64, INT, INT)
 DEF_FUNCTION_TYPE (VOID, HI, V16SI, PCINT, INT, INT)
+DEF_FUNCTION_TYPE (VOID, QI, V8DI, PCINT64, INT, INT)
 DEF_FUNCTION_TYPE (VOID, QI, V8DI, PCINT, INT, INT)
 
 DEF_FUNCTION_TYPE_ALIAS (V2DF_FTYPE_V2DF, ROUND)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1a4d568..49e153c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -28397,9 +28397,13 @@ enum ix86_builtins
   IX86_BUILTIN_SCATTERSIV8DI,
 
   /* AVX512PF */
+  IX86_BUILTIN_GATHERPFQPD,
   IX86_BUILTIN_GATHERPFDPS,
+  IX86_BUILTIN_GATHERPFDPD,
   IX86_BUILTIN_GATHERPFQPS,
+  IX86_BUILTIN_SCATTERPFDPD,
   IX86_BUILTIN_SCATTERPFDPS,
+  IX86_BUILTIN_SCATTERPFQPD,
   IX86_BUILTIN_SCATTERPFQPS,
 
   /* AVX-512ER */
@@ -30929,15 +30933,27 @@ ix86_init_mmx_sse_builtins (void)
 	       IX86_BUILTIN_SCATTERDIV8DI);
 
   /* AVX512PF */
+  def_builtin (OPTION_MASK_ISA_AVX512PF, "__builtin_ia32_gatherpfdpd",
+	       VOID_FTYPE_QI_V8SI_PCINT64_INT_INT,
+	       IX86_BUILTIN_GATHERPFDPD);
   def_builtin (OPTION_MASK_ISA_AVX512PF, "__builtin_ia32_gatherpfdps",
 	       VOID_FTYPE_HI_V16SI_PCINT_INT_INT,
 	       IX86_BUILTIN_GATHERPFDPS);
+  def_builtin (OPTION_MASK_ISA_AVX512PF, "__builtin_ia32_gatherpfqpd",
+	       VOID_FTYPE_QI_V8DI_PCINT64_INT_INT,
+	       IX86_BUILTIN_GATHERPFQPD);
   def_builtin (OPTION_MASK_ISA_AVX512PF, "__builtin_ia32_gatherpfqps",
 	       VOID_FTYPE_QI_V8DI_PCINT_INT_INT,
 	       IX86_BUILTIN_GATHERPFQPS);
+  def_builtin (OPTION_MASK_ISA_AVX512PF, "__builtin_ia32_scatterpfdpd",
+	       VOID_FTYPE_QI_V8SI_PCINT64_INT_INT,
+	       IX86_BUILTIN_SCATTERPFDPD);
   def_builtin (OPTION_MASK_ISA_AVX512PF, "__builtin_ia32_scatterpfdps",
 	       VOID_FTYPE_HI_V16SI_PCINT_INT_INT,
 	       IX86_BUILTIN_SCATTERPFDPS);
+  def_builtin (OPTION_MASK_ISA_AVX512PF, "__builtin_ia32_scatterpfqpd",
+	       VOID_FTYPE_QI_V8DI_PCINT64_INT_INT,
+	       IX86_BUILTIN_SCATTERPFQPD);
   def_builtin (OPTION_MASK_ISA_AVX512PF, "__builtin_ia32_scatterpfqps",
 	       VOID_FTYPE_QI_V8DI_PCINT_INT_INT,
 	       IX86_BUILTIN_SCATTERPFQPS);
@@ -35583,17 +35599,30 @@ addcarryx:
     case IX86_BUILTIN_SCATTERDIV8DI:
       icode = CODE_FOR_avx512f_scatterdiv8di;
       goto scatter_gen;
+
+    case IX86_BUILTIN_GATHERPFDPD:
+      icode = CODE_FOR_avx512pf_gatherpfv8sidf;
+      goto vec_prefetch_gen;
     case IX86_BUILTIN_GATHERPFDPS:
-      icode = CODE_FOR_avx512pf_gatherpfv16si;
+      icode = CODE_FOR_avx512pf_gatherpfv16sisf;
+      goto vec_prefetch_gen;
+    case IX86_BUILTIN_GATHERPFQPD:
+      icode = CODE_FOR_avx512pf_gatherpfv8didf;
       goto vec_prefetch_gen;
     case IX86_BUILTIN_GATHERPFQPS:
-      icode = CODE_FOR_avx512pf_gatherpfv8di;
+      icode = CODE_FOR_avx512pf_gatherpfv8disf;
+      goto vec_prefetch_gen;
+    case IX86_BUILTIN_SCATTERPFDPD:
+      icode = CODE_FOR_avx512pf_scatterpfv8sidf;
       goto vec_prefetch_gen;
     case IX86_BUILTIN_SCATTERPFDPS:
-      icode = CODE_FOR_avx512pf_scatterpfv16si;
+      icode = CODE_FOR_avx512pf_scatterpfv16sisf;
+      goto vec_prefetch_gen;
+    case IX86_BUILTIN_SCATTERPFQPD:
+      icode = CODE_FOR_avx512pf_scatterpfv8didf;
       goto vec_prefetch_gen;
     case IX86_BUILTIN_SCATTERPFQPS:
-      icode = CODE_FOR_avx512pf_scatterpfv8di;
+      icode = CODE_FOR_avx512pf_scatterpfv8disf;
       goto vec_prefetch_gen;
 
     gather_gen:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 2e68fb6..7a2097d 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -417,6 +417,7 @@
   [V32QI V16HI V8SI (V8DI "TARGET_AVX512F") (V16SI "TARGET_AVX512F")])
 (define_mode_iterator VI48_256 [V8SI V4DI])
 (define_mode_iterator VI48_512 [V16SI V8DI])
+(define_mode_iterator VI4_256_8_512 [V8SI V8DI])
 
 ;; Int-float size matches
 (define_mode_iterator VI4F_128 [V4SI V4SF])
@@ -12495,10 +12496,11 @@
    (set_attr "btver2_decode" "vector,vector,vector,vector")
    (set_attr "mode" "TI")])
 
-(define_expand "avx512pf_gatherpf<mode>"
+;; Packed float variants
+(define_expand "avx512pf_gatherpf<mode>sf"
   [(unspec
      [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
-      (mem:<ssescalarmode>
+      (mem:SF
 	(match_par_dup 5
 	  [(match_operand 2 "vsib_address_operand")
 	   (match_operand:VI48_512 1 "register_operand")
@@ -12512,10 +12514,10 @@
 					operands[3]), UNSPEC_VSIBADDR);
 })
 
-(define_insn "*avx512pf_gatherpf<mode>_mask"
+(define_insn "*avx512pf_gatherpf<mode>sf_mask"
   [(unspec
      [(match_operand:<avx512fmaskmode> 0 "register_operand" "k")
-      (match_operator:<ssescalarmode> 5 "vsib_mem_operator"
+      (match_operator:SF 5 "vsib_mem_operator"
 	[(unspec:P
 	   [(match_operand:P 2 "vsib_address_operand" "Tv")
 	    (match_operand:VI48_512 1 "register_operand" "v")
@@ -12539,10 +12541,10 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "*avx512pf_gatherpf<mode>"
+(define_insn "*avx512pf_gatherpf<mode>sf"
   [(unspec
      [(const_int -1)
-      (match_operator:<ssescalarmode> 4 "vsib_mem_operator"
+      (match_operator:SF 4 "vsib_mem_operator"
 	[(unspec:P
 	   [(match_operand:P 1 "vsib_address_operand" "Tv")
 	    (match_operand:VI48_512 0 "register_operand" "v")
@@ -12566,10 +12568,83 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_expand "avx512pf_scatterpf<mode>"
+;; Packed double variants
+(define_expand "avx512pf_gatherpf<mode>df"
   [(unspec
      [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
-      (mem:<ssescalarmode>
+      (mem:DF
+	(match_par_dup 5
+	  [(match_operand 2 "vsib_address_operand")
+	   (match_operand:VI4_256_8_512 1 "register_operand")
+	   (match_operand:SI 3 "const1248_operand")]))
+      (match_operand:SI 4 "const_0_to_1_operand")]
+     UNSPEC_GATHER_PREFETCH)]
+  "TARGET_AVX512PF"
+{
+  operands[5]
+    = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[2], operands[1],
+					operands[3]), UNSPEC_VSIBADDR);
+})
+
+(define_insn "*avx512pf_gatherpf<mode>df_mask"
+  [(unspec
+     [(match_operand:<avx512fmaskmode> 0 "register_operand" "k")
+      (match_operator:DF 5 "vsib_mem_operator"
+	[(unspec:P
+	   [(match_operand:P 2 "vsib_address_operand" "Tv")
+	    (match_operand:VI4_256_8_512 1 "register_operand" "v")
+	    (match_operand:SI 3 "const1248_operand" "n")]
+	   UNSPEC_VSIBADDR)])
+      (match_operand:SI 4 "const_0_to_1_operand" "n")]
+     UNSPEC_GATHER_PREFETCH)]
+  "TARGET_AVX512PF"
+{
+  switch (INTVAL (operands[4]))
+    {
+    case 0:
+      return "vgatherpf0<ssemodesuffix>pd\t{%5%{%0%}|%5%{%0%}}";
+    case 1:
+      return "vgatherpf1<ssemodesuffix>pd\t{%5%{%0%}|%5%{%0%}}";
+    default:
+      gcc_unreachable ();
+    }
+}
+  [(set_attr "type" "sse")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "XI")])
+
+(define_insn "*avx512pf_gatherpf<mode>df"
+  [(unspec
+     [(const_int -1)
+      (match_operator:DF 4 "vsib_mem_operator"
+	[(unspec:P
+	   [(match_operand:P 1 "vsib_address_operand" "Tv")
+	    (match_operand:VI4_256_8_512 0 "register_operand" "v")
+	    (match_operand:SI 2 "const1248_operand" "n")]
+	   UNSPEC_VSIBADDR)])
+      (match_operand:SI 3 "const_0_to_1_operand" "n")]
+     UNSPEC_GATHER_PREFETCH)]
+  "TARGET_AVX512PF"
+{
+  switch (INTVAL (operands[3]))
+    {
+    case 0:
+      return "vgatherpf0<ssemodesuffix>pd\t{%4|%4}";
+    case 1:
+      return "vgatherpf1<ssemodesuffix>pd\t{%4|%4}";
+    default:
+      gcc_unreachable ();
+    }
+}
+  [(set_attr "type" "sse")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "XI")])
+
+;; Packed float variants
+(define_expand "avx512pf_scatterpf<mode>sf"
+  [(unspec
+     [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
+      (mem:SF
 	(match_par_dup 5
 	  [(match_operand 2 "vsib_address_operand")
 	   (match_operand:VI48_512 1 "register_operand")
@@ -12583,10 +12658,10 @@
 					operands[3]), UNSPEC_VSIBADDR);
 })
 
-(define_insn "*avx512pf_scatterpf<mode>_mask"
+(define_insn "*avx512pf_scatterpf<mode>sf_mask"
   [(unspec
      [(match_operand:<avx512fmaskmode> 0 "register_operand" "k")
-      (match_operator:<ssescalarmode> 5 "vsib_mem_operator"
+      (match_operator:SF 5 "vsib_mem_operator"
 	[(unspec:P
 	   [(match_operand:P 2 "vsib_address_operand" "Tv")
 	    (match_operand:VI48_512 1 "register_operand" "v")
@@ -12610,10 +12685,10 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "*avx512pf_scatterpf<mode>"
+(define_insn "*avx512pf_scatterpf<mode>sf"
   [(unspec
      [(const_int -1)
-      (match_operator:<ssescalarmode> 4 "vsib_mem_operator"
+      (match_operator:SF 4 "vsib_mem_operator"
 	[(unspec:P
 	   [(match_operand:P 1 "vsib_address_operand" "Tv")
 	    (match_operand:VI48_512 0 "register_operand" "v")
@@ -12637,6 +12712,78 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
+;; Packed double variants
+(define_expand "avx512pf_scatterpf<mode>df"
+  [(unspec
+     [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
+      (mem:DF
+	(match_par_dup 5
+	  [(match_operand 2 "vsib_address_operand")
+	   (match_operand:VI4_256_8_512 1 "register_operand")
+	   (match_operand:SI 3 "const1248_operand")]))
+      (match_operand:SI 4 "const_0_to_1_operand")]
+     UNSPEC_SCATTER_PREFETCH)]
+  "TARGET_AVX512PF"
+{
+  operands[5]
+    = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[2], operands[1],
+					operands[3]), UNSPEC_VSIBADDR);
+})
+
+(define_insn "*avx512pf_scatterpf<mode>df_mask"
+  [(unspec
+     [(match_operand:<avx512fmaskmode> 0 "register_operand" "k")
+      (match_operator:DF 5 "vsib_mem_operator"
+	[(unspec:P
+	   [(match_operand:P 2 "vsib_address_operand" "Tv")
+	    (match_operand:VI4_256_8_512 1 "register_operand" "v")
+	    (match_operand:SI 3 "const1248_operand" "n")]
+	   UNSPEC_VSIBADDR)])
+      (match_operand:SI 4 "const_0_to_1_operand" "n")]
+     UNSPEC_SCATTER_PREFETCH)]
+  "TARGET_AVX512PF"
+{
+  switch (INTVAL (operands[4]))
+    {
+    case 0:
+      return "vscatterpf0<ssemodesuffix>pd\t{%5%{%0%}|%5%{%0%}}";
+    case 1:
+      return "vscatterpf1<ssemodesuffix>pd\t{%5%{%0%}|%5%{%0%}}";
+    default:
+      gcc_unreachable ();
+    }
+}
+  [(set_attr "type" "sse")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "XI")])
+
+(define_insn "*avx512pf_scatterpf<mode>df"
+  [(unspec
+     [(const_int -1)
+      (match_operator:DF 4 "vsib_mem_operator"
+	[(unspec:P
+	   [(match_operand:P 1 "vsib_address_operand" "Tv")
+	    (match_operand:VI4_256_8_512 0 "register_operand" "v")
+	    (match_operand:SI 2 "const1248_operand" "n")]
+	   UNSPEC_VSIBADDR)])
+      (match_operand:SI 3 "const_0_to_1_operand" "n")]
+     UNSPEC_SCATTER_PREFETCH)]
+  "TARGET_AVX512PF"
+{
+  switch (INTVAL (operands[3]))
+    {
+    case 0:
+      return "vscatterpf0<ssemodesuffix>pd\t{%4|%4}";
+    case 1:
+      return "vscatterpf1<ssemodesuffix>pd\t{%4|%4}";
+    default:
+      gcc_unreachable ();
+    }
+}
+  [(set_attr "type" "sse")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "XI")])
+
 (define_insn "avx512er_exp2<mode><mask_name><round_saeonly_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 12674ad..8fb6fb88 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -362,6 +362,10 @@
 #define __builtin_ia32_gatherpfqps(A, B, C, D, E) __builtin_ia32_gatherpfqps(A, B, C, 1, 1)
 #define __builtin_ia32_scatterpfdps(A, B, C, D, E) __builtin_ia32_scatterpfdps(A, B, C, 1, 1)
 #define __builtin_ia32_scatterpfqps(A, B, C, D, E) __builtin_ia32_scatterpfqps(A, B, C, 1, 1)
+#define __builtin_ia32_gatherpfdpd(A, B, C, D, E) __builtin_ia32_gatherpfdpd(A, B, C, 1, 1)
+#define __builtin_ia32_gatherpfqpd(A, B, C, D, E) __builtin_ia32_gatherpfqpd(A, B, C, 1, 1)
+#define __builtin_ia32_scatterpfdpd(A, B, C, D, E) __builtin_ia32_scatterpfdpd(A, B, C, 1, 1)
+#define __builtin_ia32_scatterpfqpd(A, B, C, D, E) __builtin_ia32_scatterpfqpd(A, B, C, 1, 1)
 
 /* shaintrin.h */
 #define __builtin_ia32_sha1rnds4(A, B, C) __builtin_ia32_sha1rnds4(A, B, 1)
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf0dpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf0dpd-1.c
new file mode 100644
index 0000000..1368b7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf0dpd-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512pf -O2" } */
+/* { dg-final { scan-assembler-times "vgatherpf0dpd\[ \\t\]+\[^\n\]*\{%k\[1-7\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i idx;
+volatile __mmask8 m8;
+void *base;
+
+void extern
+avx512pf_test (void)
+{
+  _mm512_mask_prefetch_i32gather_pd (idx, m8, base, 8, 0);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf0qpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf0qpd-1.c
new file mode 100644
index 0000000..61a81bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf0qpd-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512pf -O2" } */
+/* { dg-final { scan-assembler-times "vgatherpf0qpd\[ \\t\]+\[^\n\]*\{%k\[1-7\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512i idx;
+volatile __mmask8 m8;
+int *base;
+
+void extern
+avx512pf_test (void)
+{
+  _mm512_mask_prefetch_i64gather_pd (idx, m8, base, 8, 0);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf1dpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf1dpd-1.c
new file mode 100644
index 0000000..5bc7599
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf1dpd-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512pf -O2" } */
+/* { dg-final { scan-assembler-times "vgatherpf1dpd\[ \\t\]+\[^\n\]*\{%k\[1-7\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i idx;
+volatile __mmask8 m8;
+int *base;
+
+void extern
+avx512pf_test (void)
+{
+  _mm512_mask_prefetch_i32gather_pd (idx, m8, base, 8, 1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf1qpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf1qpd-1.c
new file mode 100644
index 0000000..96610db
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf1qpd-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512pf -O2" } */
+/* { dg-final { scan-assembler-times "vgatherpf1qpd\[ \\t\]+\[^\n\]*\{%k\[1-7\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512i idx;
+volatile __mmask8 m8;
+int *base;
+
+void extern
+avx512pf_test (void)
+{
+  _mm512_mask_prefetch_i64gather_pd (idx, m8, base, 8, 1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0dpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0dpd-1.c
new file mode 100644
index 0000000..83c31cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0dpd-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512pf -O2" } */
+/* { dg-final { scan-assembler-times "vscatterpf0dpd\[ \\t\]+\[^\n\]*%ymm\[0-9\]" 2 } } */
+/* { dg-final { scan-assembler-times "vscatterpf0dpd\[ \\t\]+\[^\n\]*\{%k\[1-7\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i idx;
+volatile __mmask8 m8;
+void *base;
+
+void extern
+avx512pf_test (void)
+{
+  _mm512_prefetch_i32scatter_pd (base, idx, 8, 0);
+  _mm512_mask_prefetch_i32scatter_pd (base, m8, idx, 8, 0);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0qpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0qpd-1.c
new file mode 100644
index 0000000..31172f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0qpd-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512pf -O2" } */
+/* { dg-final { scan-assembler-times "vscatterpf0qpd\[ \\t\]+\[^\n\]*%zmm\[0-9\]" 2 } } */
+/* { dg-final { scan-assembler-times "vscatterpf0qpd\[ \\t\]+\[^\n\]*\{%k\[1-7\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512i idx;
+volatile __mmask8 m8;
+void *base;
+
+void extern
+avx512pf_test (void)
+{
+  _mm512_prefetch_i64scatter_pd (base, idx, 8, 0);
+  _mm512_mask_prefetch_i64scatter_pd (base, m8, idx, 8, 0);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1dpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1dpd-1.c
new file mode 100644
index 0000000..205505b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1dpd-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512pf -O2" } */
+/* { dg-final { scan-assembler-times "vscatterpf1dpd\[ \\t\]+\[^\n\]*%ymm\[0-9\]" 2 } } */
+/* { dg-final { scan-assembler-times "vscatterpf1dpd\[ \\t\]+\[^\n\]*\{%k\[1-7\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i idx;
+volatile __mmask8 m8;
+void *base;
+
+void extern
+avx512pf_test (void)
+{
+  _mm512_prefetch_i32scatter_pd (base, idx, 8, 1);
+  _mm512_mask_prefetch_i32scatter_pd (base, m8, idx, 8, 1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1qpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1qpd-1.c
new file mode 100644
index 0000000..64d7dfa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1qpd-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512pf -O2" } */
+/* { dg-final { scan-assembler-times "vscatterpf1qpd\[ \\t\]+\[^\n\]*%zmm\[0-9\]" 2 } } */
+/* { dg-final { scan-assembler-times "vscatterpf1qpd\[ \\t\]+\[^\n\]*\{%k\[1-7\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512i idx;
+volatile __mmask8 m8;
+int *base;
+
+void extern
+avx512pf_test (void)
+{
+  _mm512_prefetch_i64scatter_pd (base, idx, 8, 1);
+  _mm512_mask_prefetch_i64scatter_pd (base, m8, idx, 8, 1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index ad7ca76..643eb99 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -523,6 +523,10 @@ test_3vx (_mm512_mask_prefetch_i32gather_ps, __m512i, __mmask16, void const *, 1
 test_3vx (_mm512_mask_prefetch_i32scatter_ps, void const *, __mmask16, __m512i, 1, 1)
 test_3vx (_mm512_mask_prefetch_i64gather_ps, __m512i, __mmask8, void const *, 1, 1)
 test_3vx (_mm512_mask_prefetch_i64scatter_ps, void const *, __mmask8, __m512i, 1, 1)
+test_3vx (_mm512_mask_prefetch_i32gather_pd, __m256i, __mmask8, void const *, 1, 1)
+test_3vx (_mm512_mask_prefetch_i32scatter_pd, void const *, __mmask8, __m256i, 1, 1)
+test_3vx (_mm512_mask_prefetch_i64gather_pd, __m512i, __mmask8, void const *, 1, 1)
+test_3vx (_mm512_mask_prefetch_i64scatter_pd, void const *, __mmask8, __m512i, 1, 1)
 
 /* avx512erintrin.h */
 test_1 (_mm512_exp2a23_round_pd, __m512d, __m512d, 5)
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 630c952..7d68be1 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -646,6 +646,11 @@ test_3vx (_mm512_mask_prefetch_i32scatter_ps, void const *, __mmask16, __m512i,
 test_3vx (_mm512_mask_prefetch_i64gather_ps, __m512i, __mmask8, void const *, 1, 1)
 test_3vx (_mm512_mask_prefetch_i64scatter_ps, void const *, __mmask8, __m512i, 1, 1)
 
+test_3vx (_mm512_mask_prefetch_i32gather_pd, __m256i, __mmask8, void const *, 1, 1)
+test_3vx (_mm512_mask_prefetch_i32scatter_pd, void const *, __mmask8, __m256i, 1, 1)
+test_3vx (_mm512_mask_prefetch_i64gather_pd, __m512i, __mmask8, long long *, 1, 1)
+test_3vx (_mm512_mask_prefetch_i64scatter_pd, void const *, __mmask8, __m512i, 1, 1)
+
 /* avx512erintrin.h */
 test_1 (_mm512_exp2a23_round_pd, __m512d, __m512d, 5)
 test_1 (_mm512_exp2a23_round_ps, __m512, __m512, 5)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 309cd73..77c8d67 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -365,6 +365,10 @@
 #define __builtin_ia32_gatherpfqps(A, B, C, D, E) __builtin_ia32_gatherpfqps(A, B, C, 1, 1)
 #define __builtin_ia32_scatterpfdps(A, B, C, D, E) __builtin_ia32_scatterpfdps(A, B, C, 1, 1)
 #define __builtin_ia32_scatterpfqps(A, B, C, D, E) __builtin_ia32_scatterpfqps(A, B, C, 1, 1)
+#define __builtin_ia32_gatherpfdpd(A, B, C, D, E) __builtin_ia32_gatherpfdpd(A, B, C, 1, 1)
+#define __builtin_ia32_gatherpfqpd(A, B, C, D, E) __builtin_ia32_gatherpfqpd(A, B, C, 1, 1)
+#define __builtin_ia32_scatterpfdpd(A, B, C, D, E) __builtin_ia32_scatterpfdpd(A, B, C, 1, 1)
+#define __builtin_ia32_scatterpfqpd(A, B, C, D, E) __builtin_ia32_scatterpfqpd(A, B, C, 1, 1)
 
 /* avx512erintrin.h */
 #define __builtin_ia32_exp2pd_mask(A, B, C, D) __builtin_ia32_exp2pd_mask (A, B, C, 5)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH i386 11/8] [AVX512] [2/2] Add missing packed PF gathers/scatters.
  2014-01-21 18:52       ` [PATCH i386 11/8] [AVX512] [2/2] Add missing packed PF gathers/scatters Kirill Yukhin
@ 2014-01-23 13:22         ` Uros Bizjak
  2014-01-27 10:09           ` Kirill Yukhin
  0 siblings, 1 reply; 8+ messages in thread
From: Uros Bizjak @ 2014-01-23 13:22 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: Jakub Jelinek, GCC Patches

On Tue, Jan 21, 2014 at 7:52 PM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> Hello,
> This is non-trivial part of the patch.
>
>> On 15 Jan 20:53, Uros Bizjak wrote:
>> On Tue, Jan 14, 2014 at 7:13 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
>> Did you try to add DF/SF mode to the unspec? I am not familiar with
>> this insn, but shouldn't the mode of mem access be somehow similar to
>> the avx512f_scattersi<mode> access?
> avx512f_scattersi<mode> is different in its appearence.
> It has explicit type of destination which discriminates SF/DF modes. Prefetches
> has no such.
>
>> Also, you can use double macroization with MODEF iterator for SF and DFmode.
> I think I cannot. Because DF/SF types of the insn incurs different vidx iterators.
> E.g.:
> Currently we have for SF:
> (define_expand "avx512pf_scatterpf<VI48_512:mode>sf"
>   [(unspec
>      [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
>       (mem:SF
>         (match_par_dup 5
>           [(match_operand 2 "vsib_address_operand")
>            (match_operand:VI48_512 1 "register_operand")
>            (match_operand:SI 3 "const1248_operand")]))
>       (match_operand:SI 4 "const_0_to_1_operand")]
>      UNSPEC_SCATTER_PREFETCH)]
>
> and for DF:
> (define_expand "avx512pf_scatterpf<mode>df"
>   [(unspec
>      [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
>       (mem:DF
>        (match_par_dup 5
>          [(match_operand 2 "vsib_address_operand")
>           (match_operand:VI4_256_8_512 1 "register_operand")
>           (match_operand:SI 3 "const1248_operand")]))
>       (match_operand:SI 4 "const_0_to_1_operand")]
>      UNSPEC_SCATTER_PREFETCH)]
>
> We have this correspondence between, say, main and index modes:
>   SF -> (V16SI, V8DI)
>   DF -> (V8SI , V8DI)

It looks to me that you should use V16SF and V8DF instead of SF and DF
modes here.

Other than this, the patch looks OK to me. Please wait a day if Jakub
has any remark here.

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH i386 11/8] [AVX512] [2/2] Add missing packed PF gathers/scatters.
  2014-01-23 13:22         ` Uros Bizjak
@ 2014-01-27 10:09           ` Kirill Yukhin
  2014-01-27 10:25             ` Uros Bizjak
  0 siblings, 1 reply; 8+ messages in thread
From: Kirill Yukhin @ 2014-01-27 10:09 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Jakub Jelinek, GCC Patches

Hello,
On 23 Jan 14:22, Uros Bizjak wrote:
> > (define_expand "avx512pf_scatterpf<mode>df"
> >   [(unspec
> >      [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
> >       (mem:DF
> >        (match_par_dup 5
> >          [(match_operand 2 "vsib_address_operand")
> >           (match_operand:VI4_256_8_512 1 "register_operand")
> >           (match_operand:SI 3 "const1248_operand")]))
> >       (match_operand:SI 4 "const_0_to_1_operand")]
> >      UNSPEC_SCATTER_PREFETCH)]
> >
> > We have this correspondence between, say, main and index modes:
> >   SF -> (V16SI, V8DI)
> >   DF -> (V8SI , V8DI)
> 
> It looks to me that you should use V16SF and V8DF instead of SF and DF
> modes here.
I didn't find existing attributes with necessary mapping, so I invented new.

> Other than this, the patch looks OK to me. Please wait a day if Jakub
> has any remark here.

Patch in the bottom and I'll check it in this evening (MS time) if no objections.
(will update ChangeLog adding new mode attributes)

--
Thanks, K

 gcc/config/i386/avx512pfintrin.h                   | 113 +++++++++++--
 gcc/config/i386/i386-builtin-types.def             |   2 +
 gcc/config/i386/i386.c                             |  37 ++++-
 gcc/config/i386/sse.md                             | 176 +++++++++++++++++++--
 gcc/testsuite/gcc.target/i386/avx-1.c              |   4 +
 .../gcc.target/i386/avx512pf-vgatherpf0dpd-1.c     |  15 ++
 .../gcc.target/i386/avx512pf-vgatherpf0qpd-1.c     |  15 ++
 .../gcc.target/i386/avx512pf-vgatherpf1dpd-1.c     |  15 ++
 .../gcc.target/i386/avx512pf-vgatherpf1qpd-1.c     |  15 ++
 .../gcc.target/i386/avx512pf-vscatterpf0dpd-1.c    |  17 ++
 .../gcc.target/i386/avx512pf-vscatterpf0qpd-1.c    |  17 ++
 .../gcc.target/i386/avx512pf-vscatterpf1dpd-1.c    |  17 ++
 .../gcc.target/i386/avx512pf-vscatterpf1qpd-1.c    |  17 ++
 gcc/testsuite/gcc.target/i386/sse-14.c             |   4 +
 gcc/testsuite/gcc.target/i386/sse-22.c             |   5 +
 gcc/testsuite/gcc.target/i386/sse-23.c             |   4 +
 16 files changed, 442 insertions(+), 31 deletions(-)

diff --git a/gcc/config/i386/avx512pfintrin.h b/gcc/config/i386/avx512pfintrin.h
index b8c0110..bc7598e 100644
--- a/gcc/config/i386/avx512pfintrin.h
+++ b/gcc/config/i386/avx512pfintrin.h
@@ -48,74 +48,157 @@ typedef unsigned short __mmask16;
 #ifdef __OPTIMIZE__
 extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_prefetch_i32gather_pd (__m256i index, __mmask8 mask,
+				   void *addr, int scale, int hint)
+{
+  __builtin_ia32_gatherpfdpd (mask, (__v8si) index, (long long const *) addr,
+			      scale, hint);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_prefetch_i32gather_ps (__m512i index, __mmask16 mask,
-				   int const *addr, int scale, int hint)
+				   void *addr, int scale, int hint)
+{
+  __builtin_ia32_gatherpfdps (mask, (__v16si) index, (int const *) addr,
+			      scale, hint);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_prefetch_i64gather_pd (__m512i index, __mmask8 mask,
+				   void *addr, int scale, int hint)
 {
-  __builtin_ia32_gatherpfdps (mask, (__v16si) index, addr, scale, hint);
+  __builtin_ia32_gatherpfqpd (mask, (__v8di) index, (long long const *) addr,
+			      scale, hint);
 }
 
 extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_prefetch_i64gather_ps (__m512i index, __mmask8 mask,
-				   int const *addr, int scale, int hint)
+				   void *addr, int scale, int hint)
 {
-  __builtin_ia32_gatherpfqps (mask, (__v8di) index, addr, scale, hint);
+  __builtin_ia32_gatherpfqps (mask, (__v8di) index, (int const *) addr,
+			      scale, hint);
 }
 
 extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_prefetch_i32scatter_ps (int const *addr, __m512i index, int scale,
+_mm512_prefetch_i32scatter_pd (void *addr, __m256i index, int scale,
 			       int hint)
 {
-  __builtin_ia32_scatterpfdps ((__mmask16) 0xFFFF, (__v16si) index, addr, scale,
-			       hint);
+  __builtin_ia32_scatterpfdpd ((__mmask8) 0xFF, (__v8si) index, 
+			       (long long const *)addr, scale, hint);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_prefetch_i32scatter_ps (void *addr, __m512i index, int scale,
+			       int hint)
+{
+  __builtin_ia32_scatterpfdps ((__mmask16) 0xFFFF, (__v16si) index, (int const *) addr,
+			       scale, hint);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_prefetch_i32scatter_pd (void *addr, __mmask8 mask,
+				    __m256i index, int scale, int hint)
+{
+  __builtin_ia32_scatterpfdpd (mask, (__v8si) index, (long long const *) addr,
+			       scale, hint);
 }
 
 extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_prefetch_i32scatter_ps (int const *addr, __mmask16 mask,
+_mm512_mask_prefetch_i32scatter_ps (void *addr, __mmask16 mask,
 				    __m512i index, int scale, int hint)
 {
-  __builtin_ia32_scatterpfdps (mask, (__v16si) index, addr, scale, hint);
+  __builtin_ia32_scatterpfdps (mask, (__v16si) index, (int const *) addr,
+			       scale, hint);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_prefetch_i64scatter_pd (void *addr, __m512i index, int scale,
+			       int hint)
+{
+  __builtin_ia32_scatterpfqpd ((__mmask8) 0xFF, (__v8di) index, (long long const *) addr,
+			       scale, hint);
 }
 
 extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_prefetch_i64scatter_ps (int const *addr, __m512i index, int scale,
+_mm512_prefetch_i64scatter_ps (void *addr, __m512i index, int scale,
 			       int hint)
 {
-  __builtin_ia32_scatterpfqps ((__mmask8) 0xFF, (__v8di) index, addr, scale,
-			       hint);
+  __builtin_ia32_scatterpfqps ((__mmask8) 0xFF, (__v8di) index, (int const *) addr,
+			       scale, hint);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_prefetch_i64scatter_pd (void *addr, __mmask16 mask,
+				    __m512i index, int scale, int hint)
+{
+  __builtin_ia32_scatterpfqpd (mask, (__v8di) index, (long long const *) addr,
+			       scale, hint);
 }
 
 extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_prefetch_i64scatter_ps (int const *addr, __mmask16 mask,
+_mm512_mask_prefetch_i64scatter_ps (void *addr, __mmask16 mask,
 				    __m512i index, int scale, int hint)
 {
-  __builtin_ia32_scatterpfqps (mask, (__v8di) index, addr, scale, hint);
+  __builtin_ia32_scatterpfqps (mask, (__v8di) index, (int const *) addr,
+			       scale, hint);
 }
+
 #else
+#define _mm512_mask_prefetch_i32gather_pd(INDEX, MASK, ADDR, SCALE, HINT)    \
+  __builtin_ia32_gatherpfdpd ((__mmask8)MASK, (__v8si)(__m256i)INDEX,	     \
+			      (long long const *)ADDR, (int)SCALE, (int)HINT)
+
 #define _mm512_mask_prefetch_i32gather_ps(INDEX, MASK, ADDR, SCALE, HINT)    \
-  __builtin_ia32_gatherpfdps ((__mmask16)MASK, (__v16si)(__m512i)INDEX,	     \
+  __builtin_ia32_gatherpfdps ((__mmask16)MASK, (__v16si)(__m512i)INDEX,      \
 			      (int const *)ADDR, (int)SCALE, (int)HINT)
 
+#define _mm512_mask_prefetch_i64gather_pd(INDEX, MASK, ADDR, SCALE, HINT)    \
+  __builtin_ia32_gatherpfqpd ((__mmask8)MASK, (__v8di)(__m512i)INDEX,	     \
+			      (long long const *)ADDR, (int)SCALE, (int)HINT)
+
 #define _mm512_mask_prefetch_i64gather_ps(INDEX, MASK, ADDR, SCALE, HINT)    \
   __builtin_ia32_gatherpfqps ((__mmask8)MASK, (__v8di)(__m512i)INDEX,	     \
 			      (int const *)ADDR, (int)SCALE, (int)HINT)
 
+#define _mm512_prefetch_i32scatter_pd(ADDR, INDEX, SCALE, HINT)              \
+  __builtin_ia32_scatterpfdpd ((__mmask8)0xFF, (__v8si)(__m256i)INDEX,       \
+			       (long long const *)ADDR, (int)SCALE, (int)HINT)
+
 #define _mm512_prefetch_i32scatter_ps(ADDR, INDEX, SCALE, HINT)              \
   __builtin_ia32_scatterpfdps ((__mmask16)0xFFFF, (__v16si)(__m512i)INDEX,   \
 			       (int const *)ADDR, (int)SCALE, (int)HINT)
 
+#define _mm512_mask_prefetch_i32scatter_pd(ADDR, MASK, INDEX, SCALE, HINT)   \
+  __builtin_ia32_scatterpfdpd ((__mmask8)MASK, (__v8si)(__m256i)INDEX,       \
+			       (long long const *)ADDR, (int)SCALE, (int)HINT)
+
 #define _mm512_mask_prefetch_i32scatter_ps(ADDR, MASK, INDEX, SCALE, HINT)   \
   __builtin_ia32_scatterpfdps ((__mmask16)MASK, (__v16si)(__m512i)INDEX,     \
 			       (int const *)ADDR, (int)SCALE, (int)HINT)
 
+#define _mm512_prefetch_i64scatter_pd(ADDR, INDEX, SCALE, HINT)              \
+  __builtin_ia32_scatterpfqpd ((__mmask8)0xFF, (__v8di)(__m512i)INDEX,	     \
+			       (long long const *)ADDR, (int)SCALE, (int)HINT)
+
 #define _mm512_prefetch_i64scatter_ps(ADDR, INDEX, SCALE, HINT)              \
   __builtin_ia32_scatterpfqps ((__mmask8)0xFF, (__v8di)(__m512i)INDEX,	     \
 			       (int const *)ADDR, (int)SCALE, (int)HINT)
 
+#define _mm512_mask_prefetch_i64scatter_pd(ADDR, MASK, INDEX, SCALE, HINT)   \
+  __builtin_ia32_scatterpfqpd ((__mmask8)MASK, (__v8di)(__m512i)INDEX,	     \
+			       (long long const *)ADDR, (int)SCALE, (int)HINT)
+
 #define _mm512_mask_prefetch_i64scatter_ps(ADDR, MASK, INDEX, SCALE, HINT)   \
   __builtin_ia32_scatterpfqps ((__mmask8)MASK, (__v8di)(__m512i)INDEX,	     \
 			       (int const *)ADDR, (int)SCALE, (int)HINT)
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index acf2f32..f3c658b 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -733,7 +733,9 @@ DEF_FUNCTION_TYPE (VOID, PLONGLONG, QI, V8SI, V8DI, INT)
 DEF_FUNCTION_TYPE (VOID, PINT, QI, V8DI, V8SI, INT)
 DEF_FUNCTION_TYPE (VOID, PLONGLONG, QI, V8DI, V8DI, INT)
 
+DEF_FUNCTION_TYPE (VOID, QI, V8SI, PCINT64, INT, INT)
 DEF_FUNCTION_TYPE (VOID, HI, V16SI, PCINT, INT, INT)
+DEF_FUNCTION_TYPE (VOID, QI, V8DI, PCINT64, INT, INT)
 DEF_FUNCTION_TYPE (VOID, QI, V8DI, PCINT, INT, INT)
 
 DEF_FUNCTION_TYPE_ALIAS (V2DF_FTYPE_V2DF, ROUND)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1a4d568..49e153c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -28397,9 +28397,13 @@ enum ix86_builtins
   IX86_BUILTIN_SCATTERSIV8DI,
 
   /* AVX512PF */
+  IX86_BUILTIN_GATHERPFQPD,
   IX86_BUILTIN_GATHERPFDPS,
+  IX86_BUILTIN_GATHERPFDPD,
   IX86_BUILTIN_GATHERPFQPS,
+  IX86_BUILTIN_SCATTERPFDPD,
   IX86_BUILTIN_SCATTERPFDPS,
+  IX86_BUILTIN_SCATTERPFQPD,
   IX86_BUILTIN_SCATTERPFQPS,
 
   /* AVX-512ER */
@@ -30929,15 +30933,27 @@ ix86_init_mmx_sse_builtins (void)
 	       IX86_BUILTIN_SCATTERDIV8DI);
 
   /* AVX512PF */
+  def_builtin (OPTION_MASK_ISA_AVX512PF, "__builtin_ia32_gatherpfdpd",
+	       VOID_FTYPE_QI_V8SI_PCINT64_INT_INT,
+	       IX86_BUILTIN_GATHERPFDPD);
   def_builtin (OPTION_MASK_ISA_AVX512PF, "__builtin_ia32_gatherpfdps",
 	       VOID_FTYPE_HI_V16SI_PCINT_INT_INT,
 	       IX86_BUILTIN_GATHERPFDPS);
+  def_builtin (OPTION_MASK_ISA_AVX512PF, "__builtin_ia32_gatherpfqpd",
+	       VOID_FTYPE_QI_V8DI_PCINT64_INT_INT,
+	       IX86_BUILTIN_GATHERPFQPD);
   def_builtin (OPTION_MASK_ISA_AVX512PF, "__builtin_ia32_gatherpfqps",
 	       VOID_FTYPE_QI_V8DI_PCINT_INT_INT,
 	       IX86_BUILTIN_GATHERPFQPS);
+  def_builtin (OPTION_MASK_ISA_AVX512PF, "__builtin_ia32_scatterpfdpd",
+	       VOID_FTYPE_QI_V8SI_PCINT64_INT_INT,
+	       IX86_BUILTIN_SCATTERPFDPD);
   def_builtin (OPTION_MASK_ISA_AVX512PF, "__builtin_ia32_scatterpfdps",
 	       VOID_FTYPE_HI_V16SI_PCINT_INT_INT,
 	       IX86_BUILTIN_SCATTERPFDPS);
+  def_builtin (OPTION_MASK_ISA_AVX512PF, "__builtin_ia32_scatterpfqpd",
+	       VOID_FTYPE_QI_V8DI_PCINT64_INT_INT,
+	       IX86_BUILTIN_SCATTERPFQPD);
   def_builtin (OPTION_MASK_ISA_AVX512PF, "__builtin_ia32_scatterpfqps",
 	       VOID_FTYPE_QI_V8DI_PCINT_INT_INT,
 	       IX86_BUILTIN_SCATTERPFQPS);
@@ -35583,17 +35599,30 @@ addcarryx:
     case IX86_BUILTIN_SCATTERDIV8DI:
       icode = CODE_FOR_avx512f_scatterdiv8di;
       goto scatter_gen;
+
+    case IX86_BUILTIN_GATHERPFDPD:
+      icode = CODE_FOR_avx512pf_gatherpfv8sidf;
+      goto vec_prefetch_gen;
     case IX86_BUILTIN_GATHERPFDPS:
-      icode = CODE_FOR_avx512pf_gatherpfv16si;
+      icode = CODE_FOR_avx512pf_gatherpfv16sisf;
+      goto vec_prefetch_gen;
+    case IX86_BUILTIN_GATHERPFQPD:
+      icode = CODE_FOR_avx512pf_gatherpfv8didf;
       goto vec_prefetch_gen;
     case IX86_BUILTIN_GATHERPFQPS:
-      icode = CODE_FOR_avx512pf_gatherpfv8di;
+      icode = CODE_FOR_avx512pf_gatherpfv8disf;
+      goto vec_prefetch_gen;
+    case IX86_BUILTIN_SCATTERPFDPD:
+      icode = CODE_FOR_avx512pf_scatterpfv8sidf;
       goto vec_prefetch_gen;
     case IX86_BUILTIN_SCATTERPFDPS:
-      icode = CODE_FOR_avx512pf_scatterpfv16si;
+      icode = CODE_FOR_avx512pf_scatterpfv16sisf;
+      goto vec_prefetch_gen;
+    case IX86_BUILTIN_SCATTERPFQPD:
+      icode = CODE_FOR_avx512pf_scatterpfv8didf;
       goto vec_prefetch_gen;
     case IX86_BUILTIN_SCATTERPFQPS:
-      icode = CODE_FOR_avx512pf_scatterpfv8di;
+      icode = CODE_FOR_avx512pf_scatterpfv8disf;
       goto vec_prefetch_gen;
 
     gather_gen:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 2e68fb6..24eec40 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -417,6 +417,7 @@
   [V32QI V16HI V8SI (V8DI "TARGET_AVX512F") (V16SI "TARGET_AVX512F")])
 (define_mode_iterator VI48_256 [V8SI V4DI])
 (define_mode_iterator VI48_512 [V16SI V8DI])
+(define_mode_iterator VI4_256_8_512 [V8SI V8DI])
 
 ;; Int-float size matches
 (define_mode_iterator VI4F_128 [V4SI V4SF])
@@ -12495,10 +12496,16 @@
    (set_attr "btver2_decode" "vector,vector,vector,vector")
    (set_attr "mode" "TI")])
 
-(define_expand "avx512pf_gatherpf<mode>"
+;; Packed float variants
+(define_mode_attr GATHER_SCATTER_SF_MEM_MODE
+		      [(V8DI "V8SF") (V16SI "V16SF")])
+(define_mode_attr GATHER_SCATTER_DF_MEM_MODE
+		      [(V8DI "V8DF") (V8SI "V8DF")])
+
+(define_expand "avx512pf_gatherpf<mode>sf"
   [(unspec
      [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
-      (mem:<ssescalarmode>
+      (mem:<GATHER_SCATTER_SF_MEM_MODE>
 	(match_par_dup 5
 	  [(match_operand 2 "vsib_address_operand")
 	   (match_operand:VI48_512 1 "register_operand")
@@ -12512,10 +12519,10 @@
 					operands[3]), UNSPEC_VSIBADDR);
 })
 
-(define_insn "*avx512pf_gatherpf<mode>_mask"
+(define_insn "*avx512pf_gatherpf<mode>sf_mask"
   [(unspec
      [(match_operand:<avx512fmaskmode> 0 "register_operand" "k")
-      (match_operator:<ssescalarmode> 5 "vsib_mem_operator"
+      (match_operator:<GATHER_SCATTER_SF_MEM_MODE> 5 "vsib_mem_operator"
 	[(unspec:P
 	   [(match_operand:P 2 "vsib_address_operand" "Tv")
 	    (match_operand:VI48_512 1 "register_operand" "v")
@@ -12539,10 +12546,10 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "*avx512pf_gatherpf<mode>"
+(define_insn "*avx512pf_gatherpf<mode>sf"
   [(unspec
      [(const_int -1)
-      (match_operator:<ssescalarmode> 4 "vsib_mem_operator"
+      (match_operator:<GATHER_SCATTER_SF_MEM_MODE> 4 "vsib_mem_operator"
 	[(unspec:P
 	   [(match_operand:P 1 "vsib_address_operand" "Tv")
 	    (match_operand:VI48_512 0 "register_operand" "v")
@@ -12566,10 +12573,83 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_expand "avx512pf_scatterpf<mode>"
+;; Packed double variants
+(define_expand "avx512pf_gatherpf<mode>df"
+  [(unspec
+     [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
+      (mem:<GATHER_SCATTER_DF_MEM_MODE>
+	(match_par_dup 5
+	  [(match_operand 2 "vsib_address_operand")
+	   (match_operand:VI4_256_8_512 1 "register_operand")
+	   (match_operand:SI 3 "const1248_operand")]))
+      (match_operand:SI 4 "const_0_to_1_operand")]
+     UNSPEC_GATHER_PREFETCH)]
+  "TARGET_AVX512PF"
+{
+  operands[5]
+    = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[2], operands[1],
+					operands[3]), UNSPEC_VSIBADDR);
+})
+
+(define_insn "*avx512pf_gatherpf<mode>df_mask"
+  [(unspec
+     [(match_operand:<avx512fmaskmode> 0 "register_operand" "k")
+      (match_operator:<GATHER_SCATTER_DF_MEM_MODE> 5 "vsib_mem_operator"
+	[(unspec:P
+	   [(match_operand:P 2 "vsib_address_operand" "Tv")
+	    (match_operand:VI4_256_8_512 1 "register_operand" "v")
+	    (match_operand:SI 3 "const1248_operand" "n")]
+	   UNSPEC_VSIBADDR)])
+      (match_operand:SI 4 "const_0_to_1_operand" "n")]
+     UNSPEC_GATHER_PREFETCH)]
+  "TARGET_AVX512PF"
+{
+  switch (INTVAL (operands[4]))
+    {
+    case 0:
+      return "vgatherpf0<ssemodesuffix>pd\t{%5%{%0%}|%5%{%0%}}";
+    case 1:
+      return "vgatherpf1<ssemodesuffix>pd\t{%5%{%0%}|%5%{%0%}}";
+    default:
+      gcc_unreachable ();
+    }
+}
+  [(set_attr "type" "sse")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "XI")])
+
+(define_insn "*avx512pf_gatherpf<mode>df"
+  [(unspec
+     [(const_int -1)
+      (match_operator:<GATHER_SCATTER_DF_MEM_MODE> 4 "vsib_mem_operator"
+	[(unspec:P
+	   [(match_operand:P 1 "vsib_address_operand" "Tv")
+	    (match_operand:VI4_256_8_512 0 "register_operand" "v")
+	    (match_operand:SI 2 "const1248_operand" "n")]
+	   UNSPEC_VSIBADDR)])
+      (match_operand:SI 3 "const_0_to_1_operand" "n")]
+     UNSPEC_GATHER_PREFETCH)]
+  "TARGET_AVX512PF"
+{
+  switch (INTVAL (operands[3]))
+    {
+    case 0:
+      return "vgatherpf0<ssemodesuffix>pd\t{%4|%4}";
+    case 1:
+      return "vgatherpf1<ssemodesuffix>pd\t{%4|%4}";
+    default:
+      gcc_unreachable ();
+    }
+}
+  [(set_attr "type" "sse")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "XI")])
+
+;; Packed float variants
+(define_expand "avx512pf_scatterpf<mode>sf"
   [(unspec
      [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
-      (mem:<ssescalarmode>
+      (mem:<GATHER_SCATTER_SF_MEM_MODE>
 	(match_par_dup 5
 	  [(match_operand 2 "vsib_address_operand")
 	   (match_operand:VI48_512 1 "register_operand")
@@ -12583,10 +12663,10 @@
 					operands[3]), UNSPEC_VSIBADDR);
 })
 
-(define_insn "*avx512pf_scatterpf<mode>_mask"
+(define_insn "*avx512pf_scatterpf<mode>sf_mask"
   [(unspec
      [(match_operand:<avx512fmaskmode> 0 "register_operand" "k")
-      (match_operator:<ssescalarmode> 5 "vsib_mem_operator"
+      (match_operator:<GATHER_SCATTER_SF_MEM_MODE> 5 "vsib_mem_operator"
 	[(unspec:P
 	   [(match_operand:P 2 "vsib_address_operand" "Tv")
 	    (match_operand:VI48_512 1 "register_operand" "v")
@@ -12610,10 +12690,10 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "*avx512pf_scatterpf<mode>"
+(define_insn "*avx512pf_scatterpf<mode>sf"
   [(unspec
      [(const_int -1)
-      (match_operator:<ssescalarmode> 4 "vsib_mem_operator"
+      (match_operator:<GATHER_SCATTER_SF_MEM_MODE> 4 "vsib_mem_operator"
 	[(unspec:P
 	   [(match_operand:P 1 "vsib_address_operand" "Tv")
 	    (match_operand:VI48_512 0 "register_operand" "v")
@@ -12637,6 +12717,78 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
+;; Packed double variants
+(define_expand "avx512pf_scatterpf<mode>df"
+  [(unspec
+     [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
+      (mem:<GATHER_SCATTER_DF_MEM_MODE>
+	(match_par_dup 5
+	  [(match_operand 2 "vsib_address_operand")
+	   (match_operand:VI4_256_8_512 1 "register_operand")
+	   (match_operand:SI 3 "const1248_operand")]))
+      (match_operand:SI 4 "const_0_to_1_operand")]
+     UNSPEC_SCATTER_PREFETCH)]
+  "TARGET_AVX512PF"
+{
+  operands[5]
+    = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[2], operands[1],
+					operands[3]), UNSPEC_VSIBADDR);
+})
+
+(define_insn "*avx512pf_scatterpf<mode>df_mask"
+  [(unspec
+     [(match_operand:<avx512fmaskmode> 0 "register_operand" "k")
+      (match_operator:<GATHER_SCATTER_DF_MEM_MODE> 5 "vsib_mem_operator"
+	[(unspec:P
+	   [(match_operand:P 2 "vsib_address_operand" "Tv")
+	    (match_operand:VI4_256_8_512 1 "register_operand" "v")
+	    (match_operand:SI 3 "const1248_operand" "n")]
+	   UNSPEC_VSIBADDR)])
+      (match_operand:SI 4 "const_0_to_1_operand" "n")]
+     UNSPEC_SCATTER_PREFETCH)]
+  "TARGET_AVX512PF"
+{
+  switch (INTVAL (operands[4]))
+    {
+    case 0:
+      return "vscatterpf0<ssemodesuffix>pd\t{%5%{%0%}|%5%{%0%}}";
+    case 1:
+      return "vscatterpf1<ssemodesuffix>pd\t{%5%{%0%}|%5%{%0%}}";
+    default:
+      gcc_unreachable ();
+    }
+}
+  [(set_attr "type" "sse")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "XI")])
+
+(define_insn "*avx512pf_scatterpf<mode>df"
+  [(unspec
+     [(const_int -1)
+      (match_operator:<GATHER_SCATTER_DF_MEM_MODE> 4 "vsib_mem_operator"
+	[(unspec:P
+	   [(match_operand:P 1 "vsib_address_operand" "Tv")
+	    (match_operand:VI4_256_8_512 0 "register_operand" "v")
+	    (match_operand:SI 2 "const1248_operand" "n")]
+	   UNSPEC_VSIBADDR)])
+      (match_operand:SI 3 "const_0_to_1_operand" "n")]
+     UNSPEC_SCATTER_PREFETCH)]
+  "TARGET_AVX512PF"
+{
+  switch (INTVAL (operands[3]))
+    {
+    case 0:
+      return "vscatterpf0<ssemodesuffix>pd\t{%4|%4}";
+    case 1:
+      return "vscatterpf1<ssemodesuffix>pd\t{%4|%4}";
+    default:
+      gcc_unreachable ();
+    }
+}
+  [(set_attr "type" "sse")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "XI")])
+
 (define_insn "avx512er_exp2<mode><mask_name><round_saeonly_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 12674ad..8fb6fb88 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -362,6 +362,10 @@
 #define __builtin_ia32_gatherpfqps(A, B, C, D, E) __builtin_ia32_gatherpfqps(A, B, C, 1, 1)
 #define __builtin_ia32_scatterpfdps(A, B, C, D, E) __builtin_ia32_scatterpfdps(A, B, C, 1, 1)
 #define __builtin_ia32_scatterpfqps(A, B, C, D, E) __builtin_ia32_scatterpfqps(A, B, C, 1, 1)
+#define __builtin_ia32_gatherpfdpd(A, B, C, D, E) __builtin_ia32_gatherpfdpd(A, B, C, 1, 1)
+#define __builtin_ia32_gatherpfqpd(A, B, C, D, E) __builtin_ia32_gatherpfqpd(A, B, C, 1, 1)
+#define __builtin_ia32_scatterpfdpd(A, B, C, D, E) __builtin_ia32_scatterpfdpd(A, B, C, 1, 1)
+#define __builtin_ia32_scatterpfqpd(A, B, C, D, E) __builtin_ia32_scatterpfqpd(A, B, C, 1, 1)
 
 /* shaintrin.h */
 #define __builtin_ia32_sha1rnds4(A, B, C) __builtin_ia32_sha1rnds4(A, B, 1)
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf0dpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf0dpd-1.c
new file mode 100644
index 0000000..1368b7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf0dpd-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512pf -O2" } */
+/* { dg-final { scan-assembler-times "vgatherpf0dpd\[ \\t\]+\[^\n\]*\{%k\[1-7\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i idx;
+volatile __mmask8 m8;
+void *base;
+
+void extern
+avx512pf_test (void)
+{
+  _mm512_mask_prefetch_i32gather_pd (idx, m8, base, 8, 0);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf0qpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf0qpd-1.c
new file mode 100644
index 0000000..61a81bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf0qpd-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512pf -O2" } */
+/* { dg-final { scan-assembler-times "vgatherpf0qpd\[ \\t\]+\[^\n\]*\{%k\[1-7\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512i idx;
+volatile __mmask8 m8;
+int *base;
+
+void extern
+avx512pf_test (void)
+{
+  _mm512_mask_prefetch_i64gather_pd (idx, m8, base, 8, 0);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf1dpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf1dpd-1.c
new file mode 100644
index 0000000..5bc7599
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf1dpd-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512pf -O2" } */
+/* { dg-final { scan-assembler-times "vgatherpf1dpd\[ \\t\]+\[^\n\]*\{%k\[1-7\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i idx;
+volatile __mmask8 m8;
+int *base;
+
+void extern
+avx512pf_test (void)
+{
+  _mm512_mask_prefetch_i32gather_pd (idx, m8, base, 8, 1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf1qpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf1qpd-1.c
new file mode 100644
index 0000000..96610db
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vgatherpf1qpd-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512pf -O2" } */
+/* { dg-final { scan-assembler-times "vgatherpf1qpd\[ \\t\]+\[^\n\]*\{%k\[1-7\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512i idx;
+volatile __mmask8 m8;
+int *base;
+
+void extern
+avx512pf_test (void)
+{
+  _mm512_mask_prefetch_i64gather_pd (idx, m8, base, 8, 1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0dpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0dpd-1.c
new file mode 100644
index 0000000..83c31cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0dpd-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512pf -O2" } */
+/* { dg-final { scan-assembler-times "vscatterpf0dpd\[ \\t\]+\[^\n\]*%ymm\[0-9\]" 2 } } */
+/* { dg-final { scan-assembler-times "vscatterpf0dpd\[ \\t\]+\[^\n\]*\{%k\[1-7\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i idx;
+volatile __mmask8 m8;
+void *base;
+
+void extern
+avx512pf_test (void)
+{
+  _mm512_prefetch_i32scatter_pd (base, idx, 8, 0);
+  _mm512_mask_prefetch_i32scatter_pd (base, m8, idx, 8, 0);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0qpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0qpd-1.c
new file mode 100644
index 0000000..31172f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0qpd-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512pf -O2" } */
+/* { dg-final { scan-assembler-times "vscatterpf0qpd\[ \\t\]+\[^\n\]*%zmm\[0-9\]" 2 } } */
+/* { dg-final { scan-assembler-times "vscatterpf0qpd\[ \\t\]+\[^\n\]*\{%k\[1-7\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512i idx;
+volatile __mmask8 m8;
+void *base;
+
+void extern
+avx512pf_test (void)
+{
+  _mm512_prefetch_i64scatter_pd (base, idx, 8, 0);
+  _mm512_mask_prefetch_i64scatter_pd (base, m8, idx, 8, 0);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1dpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1dpd-1.c
new file mode 100644
index 0000000..205505b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1dpd-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512pf -O2" } */
+/* { dg-final { scan-assembler-times "vscatterpf1dpd\[ \\t\]+\[^\n\]*%ymm\[0-9\]" 2 } } */
+/* { dg-final { scan-assembler-times "vscatterpf1dpd\[ \\t\]+\[^\n\]*\{%k\[1-7\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i idx;
+volatile __mmask8 m8;
+void *base;
+
+void extern
+avx512pf_test (void)
+{
+  _mm512_prefetch_i32scatter_pd (base, idx, 8, 1);
+  _mm512_mask_prefetch_i32scatter_pd (base, m8, idx, 8, 1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1qpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1qpd-1.c
new file mode 100644
index 0000000..64d7dfa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1qpd-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512pf -O2" } */
+/* { dg-final { scan-assembler-times "vscatterpf1qpd\[ \\t\]+\[^\n\]*%zmm\[0-9\]" 2 } } */
+/* { dg-final { scan-assembler-times "vscatterpf1qpd\[ \\t\]+\[^\n\]*\{%k\[1-7\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512i idx;
+volatile __mmask8 m8;
+int *base;
+
+void extern
+avx512pf_test (void)
+{
+  _mm512_prefetch_i64scatter_pd (base, idx, 8, 1);
+  _mm512_mask_prefetch_i64scatter_pd (base, m8, idx, 8, 1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index ad7ca76..643eb99 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -523,6 +523,10 @@ test_3vx (_mm512_mask_prefetch_i32gather_ps, __m512i, __mmask16, void const *, 1
 test_3vx (_mm512_mask_prefetch_i32scatter_ps, void const *, __mmask16, __m512i, 1, 1)
 test_3vx (_mm512_mask_prefetch_i64gather_ps, __m512i, __mmask8, void const *, 1, 1)
 test_3vx (_mm512_mask_prefetch_i64scatter_ps, void const *, __mmask8, __m512i, 1, 1)
+test_3vx (_mm512_mask_prefetch_i32gather_pd, __m256i, __mmask8, void const *, 1, 1)
+test_3vx (_mm512_mask_prefetch_i32scatter_pd, void const *, __mmask8, __m256i, 1, 1)
+test_3vx (_mm512_mask_prefetch_i64gather_pd, __m512i, __mmask8, void const *, 1, 1)
+test_3vx (_mm512_mask_prefetch_i64scatter_pd, void const *, __mmask8, __m512i, 1, 1)
 
 /* avx512erintrin.h */
 test_1 (_mm512_exp2a23_round_pd, __m512d, __m512d, 5)
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 630c952..7d68be1 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -646,6 +646,11 @@ test_3vx (_mm512_mask_prefetch_i32scatter_ps, void const *, __mmask16, __m512i,
 test_3vx (_mm512_mask_prefetch_i64gather_ps, __m512i, __mmask8, void const *, 1, 1)
 test_3vx (_mm512_mask_prefetch_i64scatter_ps, void const *, __mmask8, __m512i, 1, 1)
 
+test_3vx (_mm512_mask_prefetch_i32gather_pd, __m256i, __mmask8, void const *, 1, 1)
+test_3vx (_mm512_mask_prefetch_i32scatter_pd, void const *, __mmask8, __m256i, 1, 1)
+test_3vx (_mm512_mask_prefetch_i64gather_pd, __m512i, __mmask8, long long *, 1, 1)
+test_3vx (_mm512_mask_prefetch_i64scatter_pd, void const *, __mmask8, __m512i, 1, 1)
+
 /* avx512erintrin.h */
 test_1 (_mm512_exp2a23_round_pd, __m512d, __m512d, 5)
 test_1 (_mm512_exp2a23_round_ps, __m512, __m512, 5)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 309cd73..77c8d67 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -365,6 +365,10 @@
 #define __builtin_ia32_gatherpfqps(A, B, C, D, E) __builtin_ia32_gatherpfqps(A, B, C, 1, 1)
 #define __builtin_ia32_scatterpfdps(A, B, C, D, E) __builtin_ia32_scatterpfdps(A, B, C, 1, 1)
 #define __builtin_ia32_scatterpfqps(A, B, C, D, E) __builtin_ia32_scatterpfqps(A, B, C, 1, 1)
+#define __builtin_ia32_gatherpfdpd(A, B, C, D, E) __builtin_ia32_gatherpfdpd(A, B, C, 1, 1)
+#define __builtin_ia32_gatherpfqpd(A, B, C, D, E) __builtin_ia32_gatherpfqpd(A, B, C, 1, 1)
+#define __builtin_ia32_scatterpfdpd(A, B, C, D, E) __builtin_ia32_scatterpfdpd(A, B, C, 1, 1)
+#define __builtin_ia32_scatterpfqpd(A, B, C, D, E) __builtin_ia32_scatterpfqpd(A, B, C, 1, 1)
 
 /* avx512erintrin.h */
 #define __builtin_ia32_exp2pd_mask(A, B, C, D) __builtin_ia32_exp2pd_mask (A, B, C, 5)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH i386 11/8] [AVX512] [2/2] Add missing packed PF gathers/scatters.
  2014-01-27 10:09           ` Kirill Yukhin
@ 2014-01-27 10:25             ` Uros Bizjak
  0 siblings, 0 replies; 8+ messages in thread
From: Uros Bizjak @ 2014-01-27 10:25 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: Jakub Jelinek, GCC Patches

On Mon, Jan 27, 2014 at 11:09 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:

>> > (define_expand "avx512pf_scatterpf<mode>df"
>> >   [(unspec
>> >      [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
>> >       (mem:DF
>> >        (match_par_dup 5
>> >          [(match_operand 2 "vsib_address_operand")
>> >           (match_operand:VI4_256_8_512 1 "register_operand")
>> >           (match_operand:SI 3 "const1248_operand")]))
>> >       (match_operand:SI 4 "const_0_to_1_operand")]
>> >      UNSPEC_SCATTER_PREFETCH)]
>> >
>> > We have this correspondence between, say, main and index modes:
>> >   SF -> (V16SI, V8DI)
>> >   DF -> (V8SI , V8DI)
>>
>> It looks to me that you should use V16SF and V8DF instead of SF and DF
>> modes here.
> I didn't find existing attributes with necessary mapping, so I invented new.
>
>> Other than this, the patch looks OK to me. Please wait a day if Jakub
>> has any remark here.
>
> Patch in the bottom and I'll check it in this evening (MS time) if no objections.
> (will update ChangeLog adding new mode attributes)
>
> --
> Thanks, K
>
>  gcc/config/i386/avx512pfintrin.h                   | 113 +++++++++++--
>  gcc/config/i386/i386-builtin-types.def             |   2 +
>  gcc/config/i386/i386.c                             |  37 ++++-
>  gcc/config/i386/sse.md                             | 176 +++++++++++++++++++--
>  gcc/testsuite/gcc.target/i386/avx-1.c              |   4 +
>  .../gcc.target/i386/avx512pf-vgatherpf0dpd-1.c     |  15 ++
>  .../gcc.target/i386/avx512pf-vgatherpf0qpd-1.c     |  15 ++
>  .../gcc.target/i386/avx512pf-vgatherpf1dpd-1.c     |  15 ++
>  .../gcc.target/i386/avx512pf-vgatherpf1qpd-1.c     |  15 ++
>  .../gcc.target/i386/avx512pf-vscatterpf0dpd-1.c    |  17 ++
>  .../gcc.target/i386/avx512pf-vscatterpf0qpd-1.c    |  17 ++
>  .../gcc.target/i386/avx512pf-vscatterpf1dpd-1.c    |  17 ++
>  .../gcc.target/i386/avx512pf-vscatterpf1qpd-1.c    |  17 ++
>  gcc/testsuite/gcc.target/i386/sse-14.c             |   4 +
>  gcc/testsuite/gcc.target/i386/sse-22.c             |   5 +
>  gcc/testsuite/gcc.target/i386/sse-23.c             |   4 +
>  16 files changed, 442 insertions(+), 31 deletions(-)

> -(define_expand "avx512pf_gatherpf<mode>"
> +;; Packed float variants
> +(define_mode_attr GATHER_SCATTER_SF_MEM_MODE
> +                     [(V8DI "V8SF") (V16SI "V16SF")])
> +(define_mode_attr GATHER_SCATTER_DF_MEM_MODE
> +                     [(V8DI "V8DF") (V8SI "V8DF")])

You actually don't need this attribute, since it always declares V8DF.
Just use V8DF mode in the patterns instead.

(no need to repost the patch due to this trivial removal).

Uros.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2014-01-27 10:25 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-14  6:13 [PATCH i386 11/8] [AVX512] Add missing packed PF gathers/scatters, rename load/store Kirill Yukhin
2014-01-15 19:53 ` Uros Bizjak
2014-01-21 15:07   ` [PATCH i386 11/8] [AVX512] [1/2] Rename vmov* intrinsics according to EAS Kirill Yukhin
2014-01-21 15:15     ` Uros Bizjak
     [not found]   ` <20140116045745.GA31714@msticlxl57.ims.intel.com>
     [not found]     ` <CAFULd4ZsBn3gPViBRRE43VLV2VzCmi4XGGr-bWaz1=fKBKLfbA@mail.gmail.com>
2014-01-21 18:52       ` [PATCH i386 11/8] [AVX512] [2/2] Add missing packed PF gathers/scatters Kirill Yukhin
2014-01-23 13:22         ` Uros Bizjak
2014-01-27 10:09           ` Kirill Yukhin
2014-01-27 10:25             ` Uros Bizjak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).