public inbox for gcc-patches@gcc.gnu.org
* [PATCH 00/18] Support -mevex512 for AVX512
@ 2023-09-21  7:19 Hu, Lin1
From: Hu, Lin1 @ 2023-09-21  7:19 UTC
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang

Hi all,

After previous discussion, instead of supporting the option -mavx10.1,
we will first introduce the option -m[no-]evex512, which enables or
disables the 512-bit registers and the 64-bit mask registers.
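
As a quick illustration of the intended behavior (a minimal sketch;
the exact diagnostics are an assumption, not taken from this series):

    /* zmm-demo.c: uses a 512-bit vector intrinsic.  */
    #include <immintrin.h>

    __m512d
    add8 (__m512d a, __m512d b)
    {
      return _mm512_add_pd (a, b);
    }

    /* gcc -O2 -mavx512f zmm-demo.c              # OK, evex512 implied
       gcc -O2 -mavx512f -mno-evex512 zmm-demo.c # expected to be
          rejected once patch 18 makes -mno-evex512 usable  */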

This does not change the current behavior: if AVX512F is enabled and
no evex512 option is specified, the 512-bit registers and 64-bit mask
registers are enabled automatically.
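
Since patch 1 also defines a __EVEX512__ macro (see the i386-c.cc hunk
below), code can probe the default at preprocessing time. A minimal
sketch, with the 256-bit fallback being purely illustrative:

    /* Pick a vector width based on whether EVEX512 is in effect.  */
    #ifdef __EVEX512__
    typedef double vecd __attribute__ ((vector_size (64)));  /* zmm */
    #else
    typedef double vecd __attribute__ ((vector_size (32)));  /* ymm */
    #endif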

The patches are organized as follows:

Patch 1 adds initial support for the option -mevex512.

Patches 2-6 refine the current intrin files to push the evex512 target
for all 512-bit intrins. Scalar intrins remain untouched.

Patches 7-11 add OPTION_MASK_ISA2_EVEX512 for all related builtins.

Patch 12 disables the zmm registers and 512-bit libmvec calls under
no-evex512, and also requests evex512 for vectorization when 512-bit
registers are used.

Patches 13-17 support evex512 in the related patterns.

Patch 18 adds testcases for -mno-evex512 and allows its usage.

The patches currently cause scan-asm failures for pr89229-{5,6,7}b.c,
since we now emit a scalar vmovss there. When using xmm/ymm registers
16 and above without AVX512VL but with AVX512F plus evex512, I suppose
we could emit either scalar or zmm instructions (a reduced sketch
follows after this paragraph). It is quite a rare case on hardware,
since there is no hardware with AVX512F but without AVX512VL, so I
prefer not to add maintenance effort here for a slight performance
improvement. But it could be changed back to the former behavior.
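
For reference, a reduced sketch of the kind of code those tests cover
(modeled on pr89229-5b.c, not the verbatim testcase; assume
-O2 -mavx512f -mno-avx512vl):

    extern float d;

    void
    foo (float x)
    {
      register float f __asm__ ("xmm16") = x; /* pin to an EVEX-only reg */
      __asm__ ("" : "+v" (f));                /* keep the value in xmm16 */
      d = f;  /* may become a scalar vmovss or a 512-bit (zmm) move */
    }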

Discussions are welcome for all the patches.

Thx,
Haochen

Haochen Jiang (18):
  Initial support for -mevex512
  Push evex512 target for 512 bit intrins
  Push evex512 target for 512 bit intrins
  Push evex512 target for 512 bit intrins
  Push evex512 target for 512 bit intrins
  Push evex512 target for 512 bit intrins
  Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
  Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
  Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
  Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
  Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
  Disable zmm register and 512 bit libmvec call when !TARGET_EVEX512
  Support -mevex512 for AVX512F intrins
  Support -mevex512 for AVX512DQ intrins
  Support -mevex512 for AVX512BW intrins
  Support -mevex512 for
    AVX512{IFMA,VBMI,VNNI,BF16,VPOPCNTDQ,VBMI2,BITALG,VP2INTERSECT},VAES,GFNI,VPCLMULQDQ
    intrins
  Support -mevex512 for AVX512FP16 intrins
  Allow -mno-evex512 usage

 gcc/common/config/i386/i386-common.cc       |    15 +
 gcc/config.gcc                              |    19 +-
 gcc/config/i386/avx5124fmapsintrin.h        |     2 +-
 gcc/config/i386/avx5124vnniwintrin.h        |     2 +-
 gcc/config/i386/avx512bf16intrin.h          |    31 +-
 gcc/config/i386/avx512bitalgintrin.h        |   155 +-
 gcc/config/i386/avx512bitalgvlintrin.h      |   180 +
 gcc/config/i386/avx512bwintrin.h            |   291 +-
 gcc/config/i386/avx512dqintrin.h            |  1840 +-
 gcc/config/i386/avx512erintrin.h            |     2 +-
 gcc/config/i386/avx512fintrin.h             | 19663 +++++++++---------
 gcc/config/i386/avx512fp16intrin.h          |  8925 ++++----
 gcc/config/i386/avx512ifmaintrin.h          |     4 +-
 gcc/config/i386/avx512pfintrin.h            |     2 +-
 gcc/config/i386/avx512vbmi2intrin.h         |     4 +-
 gcc/config/i386/avx512vbmiintrin.h          |     4 +-
 gcc/config/i386/avx512vnniintrin.h          |     4 +-
 gcc/config/i386/avx512vp2intersectintrin.h  |     4 +-
 gcc/config/i386/avx512vpopcntdqintrin.h     |     4 +-
 gcc/config/i386/gfniintrin.h                |    76 +-
 gcc/config/i386/i386-builtin.def            |  1312 +-
 gcc/config/i386/i386-builtins.cc            |    96 +-
 gcc/config/i386/i386-c.cc                   |     2 +
 gcc/config/i386/i386-expand.cc              |    18 +-
 gcc/config/i386/i386-options.cc             |    33 +-
 gcc/config/i386/i386.cc                     |   168 +-
 gcc/config/i386/i386.h                      |     7 +-
 gcc/config/i386/i386.md                     |   127 +-
 gcc/config/i386/i386.opt                    |     4 +
 gcc/config/i386/immintrin.h                 |     2 +
 gcc/config/i386/predicates.md               |     3 +-
 gcc/config/i386/sse.md                      |   854 +-
 gcc/config/i386/vaesintrin.h                |     4 +-
 gcc/config/i386/vpclmulqdqintrin.h          |     4 +-
 gcc/testsuite/gcc.target/i386/noevex512-1.c |    13 +
 gcc/testsuite/gcc.target/i386/noevex512-2.c |    13 +
 gcc/testsuite/gcc.target/i386/noevex512-3.c |    13 +
 gcc/testsuite/gcc.target/i386/pr89229-5b.c  |     2 +-
 gcc/testsuite/gcc.target/i386/pr89229-6b.c  |     2 +-
 gcc/testsuite/gcc.target/i386/pr89229-7b.c  |     2 +-
 gcc/testsuite/gcc.target/i386/pr90096.c     |     2 +-
 41 files changed, 17170 insertions(+), 16738 deletions(-)
 create mode 100644 gcc/config/i386/avx512bitalgvlintrin.h
 create mode 100644 gcc/testsuite/gcc.target/i386/noevex512-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/noevex512-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/noevex512-3.c

-- 
2.31.1


* [PATCH 01/18] Initial support for -mevex512
From: Hu, Lin1 @ 2023-09-21  7:19 UTC
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang

From: Haochen Jiang <haochen.jiang@intel.com>

gcc/ChangeLog:

	* common/config/i386/i386-common.cc
	(OPTION_MASK_ISA2_EVEX512_SET): New.
	(OPTION_MASK_ISA2_EVEX512_UNSET): Ditto.
	(ix86_handle_option): Handle EVEX512.
	* config/i386/i386-c.cc (ix86_target_macros_internal): Ditto.
	* config/i386/i386-options.cc (isa2_opts): Ditto.
	(ix86_valid_target_attribute_inner_p): Ditto.
	(ix86_option_override_internal): Set EVEX512 target if it is not
	explicitly set when AVX512 is enabled. Disable
	AVX512{PF,ER,4VNNIW,4FMAPS} for -mno-evex512.
	* config/i386/i386.opt: Add mevex512. Temporarily RejectNegative.
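
With the IX86_ATTR_ISA entry below, the ISA flag can also be toggled
per function via the target attribute. A minimal sketch (the function
itself is illustrative only):

    #include <immintrin.h>

    __attribute__ ((target ("avx512f,evex512")))
    __m512d
    add8 (__m512d a, __m512d b)
    {
      return _mm512_add_pd (a, b);
    }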
---
 gcc/common/config/i386/i386-common.cc | 15 +++++++++++++++
 gcc/config/i386/i386-c.cc             |  2 ++
 gcc/config/i386/i386-options.cc       | 19 ++++++++++++++++++-
 gcc/config/i386/i386.opt              |  4 ++++
 4 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index 95468b7c405..8cc59e08d06 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -123,6 +123,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_SM3_SET OPTION_MASK_ISA2_SM3
 #define OPTION_MASK_ISA2_SHA512_SET OPTION_MASK_ISA2_SHA512
 #define OPTION_MASK_ISA2_SM4_SET OPTION_MASK_ISA2_SM4
+#define OPTION_MASK_ISA2_EVEX512_SET OPTION_MASK_ISA2_EVEX512
 
 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
    as -msse4.2.  */
@@ -309,6 +310,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_SM3_UNSET OPTION_MASK_ISA2_SM3
 #define OPTION_MASK_ISA2_SHA512_UNSET OPTION_MASK_ISA2_SHA512
 #define OPTION_MASK_ISA2_SM4_UNSET OPTION_MASK_ISA2_SM4
+#define OPTION_MASK_ISA2_EVEX512_UNSET OPTION_MASK_ISA2_EVEX512
 
 /* SSE4 includes both SSE4.1 and SSE4.2.  -mno-sse4 should the same
    as -mno-sse4.1. */
@@ -1341,6 +1343,19 @@ ix86_handle_option (struct gcc_options *opts,
 	}
       return true;
 
+    case OPT_mevex512:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_EVEX512_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_EVEX512_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_EVEX512_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_EVEX512_UNSET;
+	}
+      return true;
+
     case OPT_mfma:
       if (value)
 	{
diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc
index 47768fa0940..93154efa7ff 100644
--- a/gcc/config/i386/i386-c.cc
+++ b/gcc/config/i386/i386-c.cc
@@ -707,6 +707,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     def_or_undef (parse_in, "__SHA512__");
   if (isa_flag2 & OPTION_MASK_ISA2_SM4)
     def_or_undef (parse_in, "__SM4__");
+  if (isa_flag2 & OPTION_MASK_ISA2_EVEX512)
+    def_or_undef (parse_in, "__EVEX512__");
   if (TARGET_IAMCU)
     {
       def_or_undef (parse_in, "__iamcu");
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index e47f9ed5d5f..a1a7a92da9f 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -250,7 +250,8 @@ static struct ix86_target_opts isa2_opts[] =
   { "-mavxvnniint16",	OPTION_MASK_ISA2_AVXVNNIINT16 },
   { "-msm3",		OPTION_MASK_ISA2_SM3 },
   { "-msha512",		OPTION_MASK_ISA2_SHA512 },
-  { "-msm4",            OPTION_MASK_ISA2_SM4 }
+  { "-msm4",            OPTION_MASK_ISA2_SM4 },
+  { "-mevex512",        OPTION_MASK_ISA2_EVEX512 }
 };
 static struct ix86_target_opts isa_opts[] =
 {
@@ -1109,6 +1110,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
     IX86_ATTR_ISA ("sm3", OPT_msm3),
     IX86_ATTR_ISA ("sha512", OPT_msha512),
     IX86_ATTR_ISA ("sm4", OPT_msm4),
+    IX86_ATTR_ISA ("evex512", OPT_mevex512),
 
     /* enum options */
     IX86_ATTR_ENUM ("fpmath=",	OPT_mfpmath_),
@@ -2559,6 +2561,21 @@ ix86_option_override_internal (bool main_args_p,
       &= ~((OPTION_MASK_ISA_BMI | OPTION_MASK_ISA_BMI2 | OPTION_MASK_ISA_TBM)
 	   & ~opts->x_ix86_isa_flags_explicit);
 
+  /* Set EVEX512 target if it is not explicitly set
+     when AVX512 is enabled.  */
+  if (TARGET_AVX512F_P (opts->x_ix86_isa_flags)
+      && !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA2_EVEX512))
+    opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_EVEX512;
+
+  /* Disable AVX512{PF,ER,4VNNIW,4FMAPS} for -mno-evex512.  */
+  if (!TARGET_EVEX512_P (opts->x_ix86_isa_flags2))
+    {
+      opts->x_ix86_isa_flags
+	&= ~(OPTION_MASK_ISA_AVX512PF | OPTION_MASK_ISA_AVX512ER);
+      opts->x_ix86_isa_flags2
+	&= ~(OPTION_MASK_ISA2_AVX5124FMAPS | OPTION_MASK_ISA2_AVX5124VNNIW);
+    }
+
   /* Validate -mpreferred-stack-boundary= value or default it to
      PREFERRED_STACK_BOUNDARY_DEFAULT.  */
   ix86_preferred_stack_boundary = PREFERRED_STACK_BOUNDARY_DEFAULT;
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 78b499304a4..6d8601b1f75 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1310,3 +1310,7 @@ Enable vectorization for gather instruction.
 mscatter
 Target Alias(mtune-ctrl=, use_scatter, ^use_scatter)
 Enable vectorization for scatter instruction.
+
+mevex512
+Target RejectNegative Mask(ISA2_EVEX512) Var(ix86_isa_flags2) Save
+Support 512 bit vector built-in functions and code generation.
-- 
2.31.1


* [PATCH 02/18] [PATCH 1/5] Push evex512 target for 512 bit intrins
From: Hu, Lin1 @ 2023-09-21  7:19 UTC
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang

From: Haochen Jiang <haochen.jiang@intel.com>

gcc/ChangeLog:

	* config/i386/avx512fintrin.h: Add evex512 target for 512 bit intrins.
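
The mechanics follow the header's existing guard pattern: the 512-bit
portion is wrapped so that it pushes an evex512-enabled target. A
minimal sketch of that pattern (the exact guard macro name is
illustrative):

    #if !defined (__AVX512F__) || !defined (__EVEX512__)
    #pragma GCC push_options
    #pragma GCC target("avx512f,evex512")
    #define __DISABLE_AVX512F_512__
    #endif

    /* ... 512-bit intrinsics ... */

    #ifdef __DISABLE_AVX512F_512__
    #undef __DISABLE_AVX512F_512__
    #pragma GCC pop_options
    #endif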
---
 gcc/config/i386/avx512fintrin.h | 19663 +++++++++++++++---------------
 1 file changed, 9871 insertions(+), 9792 deletions(-)

diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index 517e7878d8c..85bf72d9fae 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -34,4457 +34,4421 @@
 #define __DISABLE_AVX512F__
 #endif /* __AVX512F__ */
 
-/* Internal data types for implementing the intrinsics.  */
-typedef double __v8df __attribute__ ((__vector_size__ (64)));
-typedef float __v16sf __attribute__ ((__vector_size__ (64)));
-typedef long long __v8di __attribute__ ((__vector_size__ (64)));
-typedef unsigned long long __v8du __attribute__ ((__vector_size__ (64)));
-typedef int __v16si __attribute__ ((__vector_size__ (64)));
-typedef unsigned int __v16su __attribute__ ((__vector_size__ (64)));
-typedef short __v32hi __attribute__ ((__vector_size__ (64)));
-typedef unsigned short __v32hu __attribute__ ((__vector_size__ (64)));
-typedef char __v64qi __attribute__ ((__vector_size__ (64)));
-typedef unsigned char __v64qu __attribute__ ((__vector_size__ (64)));
-
-/* The Intel API is flexible enough that we must allow aliasing with other
-   vector types, and their scalar components.  */
-typedef float __m512 __attribute__ ((__vector_size__ (64), __may_alias__));
-typedef long long __m512i __attribute__ ((__vector_size__ (64), __may_alias__));
-typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__));
-
-/* Unaligned version of the same type.  */
-typedef float __m512_u __attribute__ ((__vector_size__ (64), __may_alias__, __aligned__ (1)));
-typedef long long __m512i_u __attribute__ ((__vector_size__ (64), __may_alias__, __aligned__ (1)));
-typedef double __m512d_u __attribute__ ((__vector_size__ (64), __may_alias__, __aligned__ (1)));
-
 typedef unsigned char  __mmask8;
 typedef unsigned short __mmask16;
+typedef unsigned int __mmask32;
 
-extern __inline __mmask16
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_int2mask (int __M)
+/* Constants for mantissa extraction */
+typedef enum
 {
-  return (__mmask16) __M;
-}
+  _MM_MANT_NORM_1_2,		/* interval [1, 2)      */
+  _MM_MANT_NORM_p5_2,		/* interval [0.5, 2)    */
+  _MM_MANT_NORM_p5_1,		/* interval [0.5, 1)    */
+  _MM_MANT_NORM_p75_1p5		/* interval [0.75, 1.5) */
+} _MM_MANTISSA_NORM_ENUM;
 
-extern __inline int
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask2int (__mmask16 __M)
+typedef enum
 {
-  return (int) __M;
-}
+  _MM_MANT_SIGN_src,		/* sign = sign(SRC)     */
+  _MM_MANT_SIGN_zero,		/* sign = 0             */
+  _MM_MANT_SIGN_nan		/* DEST = NaN if sign(SRC) = 1 */
+} _MM_MANTISSA_SIGN_ENUM;
 
-extern __inline __m512i
+#ifdef __OPTIMIZE__
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set_epi64 (long long __A, long long __B, long long __C,
-		  long long __D, long long __E, long long __F,
-		  long long __G, long long __H)
+_mm_add_round_sd (__m128d __A, __m128d __B, const int __R)
 {
-  return __extension__ (__m512i) (__v8di)
-	 { __H, __G, __F, __E, __D, __C, __B, __A };
+  return (__m128d) __builtin_ia32_addsd_round ((__v2df) __A,
+					       (__v2df) __B,
+					       __R);
 }
 
-/* Create the vector [A B C D E F G H I J K L M N O P].  */
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set_epi32 (int __A, int __B, int __C, int __D,
-		  int __E, int __F, int __G, int __H,
-		  int __I, int __J, int __K, int __L,
-		  int __M, int __N, int __O, int __P)
-{
-  return __extension__ (__m512i)(__v16si)
-	 { __P, __O, __N, __M, __L, __K, __J, __I,
-	   __H, __G, __F, __E, __D, __C, __B, __A };
-}
-
-extern __inline __m512i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set_epi16 (short __q31, short __q30, short __q29, short __q28,
-		  short __q27, short __q26, short __q25, short __q24,
-		  short __q23, short __q22, short __q21, short __q20,
-		  short __q19, short __q18, short __q17, short __q16,
-		  short __q15, short __q14, short __q13, short __q12,
-		  short __q11, short __q10, short __q09, short __q08,
-		  short __q07, short __q06, short __q05, short __q04,
-		  short __q03, short __q02, short __q01, short __q00)
+_mm_mask_add_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
+			  __m128d __B, const int __R)
 {
-  return __extension__ (__m512i)(__v32hi){
-    __q00, __q01, __q02, __q03, __q04, __q05, __q06, __q07,
-    __q08, __q09, __q10, __q11, __q12, __q13, __q14, __q15,
-    __q16, __q17, __q18, __q19, __q20, __q21, __q22, __q23,
-    __q24, __q25, __q26, __q27, __q28, __q29, __q30, __q31
-  };
+  return (__m128d) __builtin_ia32_addsd_mask_round ((__v2df) __A,
+						 (__v2df) __B,
+						 (__v2df) __W,
+						 (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set_epi8 (char __q63, char __q62, char __q61, char __q60,
-		 char __q59, char __q58, char __q57, char __q56,
-		 char __q55, char __q54, char __q53, char __q52,
-		 char __q51, char __q50, char __q49, char __q48,
-		 char __q47, char __q46, char __q45, char __q44,
-		 char __q43, char __q42, char __q41, char __q40,
-		 char __q39, char __q38, char __q37, char __q36,
-		 char __q35, char __q34, char __q33, char __q32,
-		 char __q31, char __q30, char __q29, char __q28,
-		 char __q27, char __q26, char __q25, char __q24,
-		 char __q23, char __q22, char __q21, char __q20,
-		 char __q19, char __q18, char __q17, char __q16,
-		 char __q15, char __q14, char __q13, char __q12,
-		 char __q11, char __q10, char __q09, char __q08,
-		 char __q07, char __q06, char __q05, char __q04,
-		 char __q03, char __q02, char __q01, char __q00)
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_add_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
+			   const int __R)
 {
-  return __extension__ (__m512i)(__v64qi){
-    __q00, __q01, __q02, __q03, __q04, __q05, __q06, __q07,
-    __q08, __q09, __q10, __q11, __q12, __q13, __q14, __q15,
-    __q16, __q17, __q18, __q19, __q20, __q21, __q22, __q23,
-    __q24, __q25, __q26, __q27, __q28, __q29, __q30, __q31,
-    __q32, __q33, __q34, __q35, __q36, __q37, __q38, __q39,
-    __q40, __q41, __q42, __q43, __q44, __q45, __q46, __q47,
-    __q48, __q49, __q50, __q51, __q52, __q53, __q54, __q55,
-    __q56, __q57, __q58, __q59, __q60, __q61, __q62, __q63
-  };
+  return (__m128d) __builtin_ia32_addsd_mask_round ((__v2df) __A,
+						 (__v2df) __B,
+						 (__v2df)
+						 _mm_setzero_pd (),
+						 (__mmask8) __U, __R);
 }
 
-extern __inline __m512d
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set_pd (double __A, double __B, double __C, double __D,
-	       double __E, double __F, double __G, double __H)
+_mm_add_round_ss (__m128 __A, __m128 __B, const int __R)
 {
-  return __extension__ (__m512d)
-	 { __H, __G, __F, __E, __D, __C, __B, __A };
+  return (__m128) __builtin_ia32_addss_round ((__v4sf) __A,
+					      (__v4sf) __B,
+					      __R);
 }
 
-extern __inline __m512
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set_ps (float __A, float __B, float __C, float __D,
-	       float __E, float __F, float __G, float __H,
-	       float __I, float __J, float __K, float __L,
-	       float __M, float __N, float __O, float __P)
+_mm_mask_add_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
+			  __m128 __B, const int __R)
 {
-  return __extension__ (__m512)
-	 { __P, __O, __N, __M, __L, __K, __J, __I,
-	   __H, __G, __F, __E, __D, __C, __B, __A };
+  return (__m128) __builtin_ia32_addss_mask_round ((__v4sf) __A,
+						 (__v4sf) __B,
+						 (__v4sf) __W,
+						 (__mmask8) __U, __R);
 }
 
-#define _mm512_setr_epi64(e0,e1,e2,e3,e4,e5,e6,e7)			      \
-  _mm512_set_epi64(e7,e6,e5,e4,e3,e2,e1,e0)
-
-#define _mm512_setr_epi32(e0,e1,e2,e3,e4,e5,e6,e7,			      \
-			  e8,e9,e10,e11,e12,e13,e14,e15)		      \
-  _mm512_set_epi32(e15,e14,e13,e12,e11,e10,e9,e8,e7,e6,e5,e4,e3,e2,e1,e0)
-
-#define _mm512_setr_pd(e0,e1,e2,e3,e4,e5,e6,e7)				      \
-  _mm512_set_pd(e7,e6,e5,e4,e3,e2,e1,e0)
-
-#define _mm512_setr_ps(e0,e1,e2,e3,e4,e5,e6,e7,e8,e9,e10,e11,e12,e13,e14,e15) \
-  _mm512_set_ps(e15,e14,e13,e12,e11,e10,e9,e8,e7,e6,e5,e4,e3,e2,e1,e0)
-
-extern __inline __m512
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_undefined_ps (void)
+_mm_maskz_add_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
+			   const int __R)
 {
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Winit-self"
-  __m512 __Y = __Y;
-#pragma GCC diagnostic pop
-  return __Y;
+  return (__m128) __builtin_ia32_addss_mask_round ((__v4sf) __A,
+						 (__v4sf) __B,
+						 (__v4sf)
+						 _mm_setzero_ps (),
+						 (__mmask8) __U, __R);
 }
 
-#define _mm512_undefined _mm512_undefined_ps
-
-extern __inline __m512d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_undefined_pd (void)
+_mm_sub_round_sd (__m128d __A, __m128d __B, const int __R)
 {
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Winit-self"
-  __m512d __Y = __Y;
-#pragma GCC diagnostic pop
-  return __Y;
+  return (__m128d) __builtin_ia32_subsd_round ((__v2df) __A,
+					       (__v2df) __B,
+					       __R);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_undefined_epi32 (void)
+_mm_mask_sub_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
+			  __m128d __B, const int __R)
 {
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Winit-self"
-  __m512i __Y = __Y;
-#pragma GCC diagnostic pop
-  return __Y;
+  return (__m128d) __builtin_ia32_subsd_mask_round ((__v2df) __A,
+						 (__v2df) __B,
+						 (__v2df) __W,
+						 (__mmask8) __U, __R);
 }
 
-#define _mm512_undefined_si512 _mm512_undefined_epi32
-
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set1_epi8 (char __A)
+_mm_maskz_sub_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
+			   const int __R)
 {
-  return __extension__ (__m512i)(__v64qi)
-	 { __A, __A, __A, __A, __A, __A, __A, __A,
-	   __A, __A, __A, __A, __A, __A, __A, __A,
-	   __A, __A, __A, __A, __A, __A, __A, __A,
-	   __A, __A, __A, __A, __A, __A, __A, __A,
-	   __A, __A, __A, __A, __A, __A, __A, __A,
-	   __A, __A, __A, __A, __A, __A, __A, __A,
-	   __A, __A, __A, __A, __A, __A, __A, __A,
-	   __A, __A, __A, __A, __A, __A, __A, __A };
+  return (__m128d) __builtin_ia32_subsd_mask_round ((__v2df) __A,
+						 (__v2df) __B,
+						 (__v2df)
+						 _mm_setzero_pd (),
+						 (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set1_epi16 (short __A)
+_mm_sub_round_ss (__m128 __A, __m128 __B, const int __R)
 {
-  return __extension__ (__m512i)(__v32hi)
-	 { __A, __A, __A, __A, __A, __A, __A, __A,
-	   __A, __A, __A, __A, __A, __A, __A, __A,
-	   __A, __A, __A, __A, __A, __A, __A, __A,
-	   __A, __A, __A, __A, __A, __A, __A, __A };
+  return (__m128) __builtin_ia32_subss_round ((__v4sf) __A,
+					      (__v4sf) __B,
+					      __R);
 }
 
-extern __inline __m512d
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set1_pd (double __A)
+_mm_mask_sub_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
+			  __m128 __B, const int __R)
 {
-  return __extension__ (__m512d)(__v8df)
-    { __A, __A, __A, __A, __A, __A, __A, __A };
+  return (__m128) __builtin_ia32_subss_mask_round ((__v4sf) __A,
+						 (__v4sf) __B,
+						 (__v4sf) __W,
+						 (__mmask8) __U, __R);
 }
 
-extern __inline __m512
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set1_ps (float __A)
+_mm_maskz_sub_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
+			   const int __R)
 {
-  return __extension__ (__m512)(__v16sf)
-    { __A, __A, __A, __A, __A, __A, __A, __A,
-      __A, __A, __A, __A, __A, __A, __A, __A };
+  return (__m128) __builtin_ia32_subss_mask_round ((__v4sf) __A,
+						 (__v4sf) __B,
+						 (__v4sf)
+						 _mm_setzero_ps (),
+						 (__mmask8) __U, __R);
 }
 
-/* Create the vector [A B C D A B C D A B C D A B C D].  */
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set4_epi32 (int __A, int __B, int __C, int __D)
-{
-  return __extension__ (__m512i)(__v16si)
-	 { __D, __C, __B, __A, __D, __C, __B, __A,
-	   __D, __C, __B, __A, __D, __C, __B, __A };
-}
+#else
+#define _mm_add_round_sd(A, B, C)            \
+    (__m128d)__builtin_ia32_addsd_round(A, B, C)
 
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set4_epi64 (long long __A, long long __B, long long __C,
-		   long long __D)
-{
-  return __extension__ (__m512i) (__v8di)
-	 { __D, __C, __B, __A, __D, __C, __B, __A };
-}
+#define _mm_mask_add_round_sd(W, U, A, B, C) \
+    (__m128d)__builtin_ia32_addsd_mask_round(A, B, W, U, C)
 
-extern __inline __m512d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set4_pd (double __A, double __B, double __C, double __D)
-{
-  return __extension__ (__m512d)
-	 { __D, __C, __B, __A, __D, __C, __B, __A };
-}
+#define _mm_maskz_add_round_sd(U, A, B, C)   \
+    (__m128d)__builtin_ia32_addsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
 
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set4_ps (float __A, float __B, float __C, float __D)
-{
-  return __extension__ (__m512)
-	 { __D, __C, __B, __A, __D, __C, __B, __A,
-	   __D, __C, __B, __A, __D, __C, __B, __A };
-}
+#define _mm_add_round_ss(A, B, C)            \
+    (__m128)__builtin_ia32_addss_round(A, B, C)
 
-#define _mm512_setr4_epi64(e0,e1,e2,e3)					      \
-  _mm512_set4_epi64(e3,e2,e1,e0)
+#define _mm_mask_add_round_ss(W, U, A, B, C) \
+    (__m128)__builtin_ia32_addss_mask_round(A, B, W, U, C)
 
-#define _mm512_setr4_epi32(e0,e1,e2,e3)					      \
-  _mm512_set4_epi32(e3,e2,e1,e0)
+#define _mm_maskz_add_round_ss(U, A, B, C)   \
+    (__m128)__builtin_ia32_addss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
 
-#define _mm512_setr4_pd(e0,e1,e2,e3)					      \
-  _mm512_set4_pd(e3,e2,e1,e0)
+#define _mm_sub_round_sd(A, B, C)            \
+    (__m128d)__builtin_ia32_subsd_round(A, B, C)
 
-#define _mm512_setr4_ps(e0,e1,e2,e3)					      \
-  _mm512_set4_ps(e3,e2,e1,e0)
+#define _mm_mask_sub_round_sd(W, U, A, B, C) \
+    (__m128d)__builtin_ia32_subsd_mask_round(A, B, W, U, C)
 
-extern __inline __m512
+#define _mm_maskz_sub_round_sd(U, A, B, C)   \
+    (__m128d)__builtin_ia32_subsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
+
+#define _mm_sub_round_ss(A, B, C)            \
+    (__m128)__builtin_ia32_subss_round(A, B, C)
+
+#define _mm_mask_sub_round_ss(W, U, A, B, C) \
+    (__m128)__builtin_ia32_subss_mask_round(A, B, W, U, C)
+
+#define _mm_maskz_sub_round_ss(U, A, B, C)   \
+    (__m128)__builtin_ia32_subss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
+
+#endif
+
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_setzero_ps (void)
+_mm_rcp14_sd (__m128d __A, __m128d __B)
 {
-  return __extension__ (__m512){ 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
-				 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 };
+  return (__m128d) __builtin_ia32_rcp14sd ((__v2df) __B,
+					   (__v2df) __A);
 }
 
-extern __inline __m512
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_setzero (void)
+_mm_mask_rcp14_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
 {
-  return _mm512_setzero_ps ();
+  return (__m128d) __builtin_ia32_rcp14sd_mask ((__v2df) __B,
+						(__v2df) __A,
+						(__v2df) __W,
+						(__mmask8) __U);
 }
 
-extern __inline __m512d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_setzero_pd (void)
+_mm_maskz_rcp14_sd (__mmask8 __U, __m128d __A, __m128d __B)
 {
-  return __extension__ (__m512d) { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 };
+  return (__m128d) __builtin_ia32_rcp14sd_mask ((__v2df) __B,
+						(__v2df) __A,
+						(__v2df) _mm_setzero_ps (),
+						(__mmask8) __U);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_setzero_epi32 (void)
+_mm_rcp14_ss (__m128 __A, __m128 __B)
 {
-  return __extension__ (__m512i)(__v8di){ 0, 0, 0, 0, 0, 0, 0, 0 };
+  return (__m128) __builtin_ia32_rcp14ss ((__v4sf) __B,
+					  (__v4sf) __A);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_setzero_si512 (void)
+_mm_mask_rcp14_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
 {
-  return __extension__ (__m512i)(__v8di){ 0, 0, 0, 0, 0, 0, 0, 0 };
+  return (__m128) __builtin_ia32_rcp14ss_mask ((__v4sf) __B,
+						(__v4sf) __A,
+						(__v4sf) __W,
+						(__mmask8) __U);
 }
 
-extern __inline __m512d
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mov_pd (__m512d __W, __mmask8 __U, __m512d __A)
+_mm_maskz_rcp14_ss (__mmask8 __U, __m128 __A, __m128 __B)
 {
-  return (__m512d) __builtin_ia32_movapd512_mask ((__v8df) __A,
-						  (__v8df) __W,
-						  (__mmask8) __U);
+  return (__m128) __builtin_ia32_rcp14ss_mask ((__v4sf) __B,
+						(__v4sf) __A,
+						(__v4sf) _mm_setzero_ps (),
+						(__mmask8) __U);
 }
 
-extern __inline __m512d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mov_pd (__mmask8 __U, __m512d __A)
+_mm_rsqrt14_sd (__m128d __A, __m128d __B)
 {
-  return (__m512d) __builtin_ia32_movapd512_mask ((__v8df) __A,
-						  (__v8df)
-						  _mm512_setzero_pd (),
-						  (__mmask8) __U);
+  return (__m128d) __builtin_ia32_rsqrt14sd ((__v2df) __B,
+					     (__v2df) __A);
 }
 
-extern __inline __m512
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mov_ps (__m512 __W, __mmask16 __U, __m512 __A)
+_mm_mask_rsqrt14_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
 {
-  return (__m512) __builtin_ia32_movaps512_mask ((__v16sf) __A,
-						 (__v16sf) __W,
-						 (__mmask16) __U);
+  return (__m128d) __builtin_ia32_rsqrt14sd_mask ((__v2df) __B,
+						 (__v2df) __A,
+						 (__v2df) __W,
+						 (__mmask8) __U);
 }
 
-extern __inline __m512
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mov_ps (__mmask16 __U, __m512 __A)
+_mm_maskz_rsqrt14_sd (__mmask8 __U, __m128d __A, __m128d __B)
 {
-  return (__m512) __builtin_ia32_movaps512_mask ((__v16sf) __A,
-						 (__v16sf)
-						 _mm512_setzero_ps (),
-						 (__mmask16) __U);
+  return (__m128d) __builtin_ia32_rsqrt14sd_mask ((__v2df) __B,
+						 (__v2df) __A,
+						 (__v2df) _mm_setzero_pd (),
+						 (__mmask8) __U);
 }
 
-extern __inline __m512d
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_load_pd (void const *__P)
+_mm_rsqrt14_ss (__m128 __A, __m128 __B)
 {
-  return *(__m512d *) __P;
+  return (__m128) __builtin_ia32_rsqrt14ss ((__v4sf) __B,
+					    (__v4sf) __A);
 }
 
-extern __inline __m512d
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_load_pd (__m512d __W, __mmask8 __U, void const *__P)
+_mm_mask_rsqrt14_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
 {
-  return (__m512d) __builtin_ia32_loadapd512_mask ((const __v8df *) __P,
-						   (__v8df) __W,
-						   (__mmask8) __U);
+  return (__m128) __builtin_ia32_rsqrt14ss_mask ((__v4sf) __B,
+						 (__v4sf) __A,
+						 (__v4sf) __W,
+						 (__mmask8) __U);
 }
 
-extern __inline __m512d
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_load_pd (__mmask8 __U, void const *__P)
+_mm_maskz_rsqrt14_ss (__mmask8 __U, __m128 __A, __m128 __B)
 {
-  return (__m512d) __builtin_ia32_loadapd512_mask ((const __v8df *) __P,
-						   (__v8df)
-						   _mm512_setzero_pd (),
-						   (__mmask8) __U);
+  return (__m128) __builtin_ia32_rsqrt14ss_mask ((__v4sf) __B,
+						(__v4sf) __A,
+						(__v4sf) _mm_setzero_ps (),
+						(__mmask8) __U);
 }
 
-extern __inline void
+#ifdef __OPTIMIZE__
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_store_pd (void *__P, __m512d __A)
+_mm_sqrt_round_sd (__m128d __A, __m128d __B, const int __R)
 {
-  *(__m512d *) __P = __A;
+  return (__m128d) __builtin_ia32_sqrtsd_mask_round ((__v2df) __B,
+						     (__v2df) __A,
+						     (__v2df)
+						     _mm_setzero_pd (),
+						     (__mmask8) -1, __R);
 }
 
-extern __inline void
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_store_pd (void *__P, __mmask8 __U, __m512d __A)
+_mm_mask_sqrt_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
+			const int __R)
 {
-  __builtin_ia32_storeapd512_mask ((__v8df *) __P, (__v8df) __A,
-				   (__mmask8) __U);
+  return (__m128d) __builtin_ia32_sqrtsd_mask_round ((__v2df) __B,
+						     (__v2df) __A,
+						     (__v2df) __W,
+						     (__mmask8) __U, __R);
 }
 
-extern __inline __m512
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_load_ps (void const *__P)
+_mm_maskz_sqrt_round_sd (__mmask8 __U, __m128d __A, __m128d __B, const int __R)
 {
-  return *(__m512 *) __P;
+  return (__m128d) __builtin_ia32_sqrtsd_mask_round ((__v2df) __B,
+						     (__v2df) __A,
+						     (__v2df)
+						     _mm_setzero_pd (),
+						     (__mmask8) __U, __R);
 }
 
-extern __inline __m512
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_load_ps (__m512 __W, __mmask16 __U, void const *__P)
+_mm_sqrt_round_ss (__m128 __A, __m128 __B, const int __R)
 {
-  return (__m512) __builtin_ia32_loadaps512_mask ((const __v16sf *) __P,
-						  (__v16sf) __W,
-						  (__mmask16) __U);
+  return (__m128) __builtin_ia32_sqrtss_mask_round ((__v4sf) __B,
+						    (__v4sf) __A,
+						    (__v4sf)
+						    _mm_setzero_ps (),
+						    (__mmask8) -1, __R);
 }
 
-extern __inline __m512
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_load_ps (__mmask16 __U, void const *__P)
+_mm_mask_sqrt_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
+			const int __R)
 {
-  return (__m512) __builtin_ia32_loadaps512_mask ((const __v16sf *) __P,
-						  (__v16sf)
-						  _mm512_setzero_ps (),
-						  (__mmask16) __U);
+  return (__m128) __builtin_ia32_sqrtss_mask_round ((__v4sf) __B,
+						    (__v4sf) __A,
+						    (__v4sf) __W,
+						    (__mmask8) __U, __R);
 }
 
-extern __inline void
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_store_ps (void *__P, __m512 __A)
+_mm_maskz_sqrt_round_ss (__mmask8 __U, __m128 __A, __m128 __B, const int __R)
 {
-  *(__m512 *) __P = __A;
+  return (__m128) __builtin_ia32_sqrtss_mask_round ((__v4sf) __B,
+						    (__v4sf) __A,
+						    (__v4sf)
+						    _mm_setzero_ps (),
+						    (__mmask8) __U, __R);
 }
 
-extern __inline void
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_store_ps (void *__P, __mmask16 __U, __m512 __A)
+_mm_mul_round_sd (__m128d __A, __m128d __B, const int __R)
 {
-  __builtin_ia32_storeaps512_mask ((__v16sf *) __P, (__v16sf) __A,
-				   (__mmask16) __U);
+  return (__m128d) __builtin_ia32_mulsd_round ((__v2df) __A,
+					       (__v2df) __B,
+					       __R);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mov_epi64 (__m512i __W, __mmask8 __U, __m512i __A)
+_mm_mask_mul_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
+			  __m128d __B, const int __R)
 {
-  return (__m512i) __builtin_ia32_movdqa64_512_mask ((__v8di) __A,
-						     (__v8di) __W,
-						     (__mmask8) __U);
+  return (__m128d) __builtin_ia32_mulsd_mask_round ((__v2df) __A,
+						 (__v2df) __B,
+						 (__v2df) __W,
+						 (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mov_epi64 (__mmask8 __U, __m512i __A)
+_mm_maskz_mul_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
+			   const int __R)
 {
-  return (__m512i) __builtin_ia32_movdqa64_512_mask ((__v8di) __A,
-						     (__v8di)
-						     _mm512_setzero_si512 (),
-						     (__mmask8) __U);
+  return (__m128d) __builtin_ia32_mulsd_mask_round ((__v2df) __A,
+						 (__v2df) __B,
+						 (__v2df)
+						 _mm_setzero_pd (),
+						 (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_load_epi64 (void const *__P)
+_mm_mul_round_ss (__m128 __A, __m128 __B, const int __R)
 {
-  return *(__m512i *) __P;
+  return (__m128) __builtin_ia32_mulss_round ((__v4sf) __A,
+					      (__v4sf) __B,
+					      __R);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_load_epi64 (__m512i __W, __mmask8 __U, void const *__P)
+_mm_mask_mul_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
+			  __m128 __B, const int __R)
 {
-  return (__m512i) __builtin_ia32_movdqa64load512_mask ((const __v8di *) __P,
-							(__v8di) __W,
-							(__mmask8) __U);
+  return (__m128) __builtin_ia32_mulss_mask_round ((__v4sf) __A,
+						 (__v4sf) __B,
+						 (__v4sf) __W,
+						 (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_load_epi64 (__mmask8 __U, void const *__P)
+_mm_maskz_mul_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
+			   const int __R)
 {
-  return (__m512i) __builtin_ia32_movdqa64load512_mask ((const __v8di *) __P,
-							(__v8di)
-							_mm512_setzero_si512 (),
-							(__mmask8) __U);
+  return (__m128) __builtin_ia32_mulss_mask_round ((__v4sf) __A,
+						 (__v4sf) __B,
+						 (__v4sf)
+						 _mm_setzero_ps (),
+						 (__mmask8) __U, __R);
 }
 
-extern __inline void
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_store_epi64 (void *__P, __m512i __A)
+_mm_div_round_sd (__m128d __A, __m128d __B, const int __R)
 {
-  *(__m512i *) __P = __A;
+  return (__m128d) __builtin_ia32_divsd_round ((__v2df) __A,
+					       (__v2df) __B,
+					       __R);
 }
 
-extern __inline void
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_store_epi64 (void *__P, __mmask8 __U, __m512i __A)
+_mm_mask_div_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
+			  __m128d __B, const int __R)
 {
-  __builtin_ia32_movdqa64store512_mask ((__v8di *) __P, (__v8di) __A,
-					(__mmask8) __U);
+  return (__m128d) __builtin_ia32_divsd_mask_round ((__v2df) __A,
+						 (__v2df) __B,
+						 (__v2df) __W,
+						 (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mov_epi32 (__m512i __W, __mmask16 __U, __m512i __A)
+_mm_maskz_div_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
+			   const int __R)
 {
-  return (__m512i) __builtin_ia32_movdqa32_512_mask ((__v16si) __A,
-						     (__v16si) __W,
-						     (__mmask16) __U);
+  return (__m128d) __builtin_ia32_divsd_mask_round ((__v2df) __A,
+						 (__v2df) __B,
+						 (__v2df)
+						 _mm_setzero_pd (),
+						 (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mov_epi32 (__mmask16 __U, __m512i __A)
+_mm_div_round_ss (__m128 __A, __m128 __B, const int __R)
 {
-  return (__m512i) __builtin_ia32_movdqa32_512_mask ((__v16si) __A,
-						     (__v16si)
-						     _mm512_setzero_si512 (),
-						     (__mmask16) __U);
+  return (__m128) __builtin_ia32_divss_round ((__v4sf) __A,
+					      (__v4sf) __B,
+					      __R);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_load_si512 (void const *__P)
+_mm_mask_div_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
+			  __m128 __B, const int __R)
 {
-  return *(__m512i *) __P;
+  return (__m128) __builtin_ia32_divss_mask_round ((__v4sf) __A,
+						 (__v4sf) __B,
+						 (__v4sf) __W,
+						 (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_load_epi32 (void const *__P)
+_mm_maskz_div_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
+			   const int __R)
 {
-  return *(__m512i *) __P;
+  return (__m128) __builtin_ia32_divss_mask_round ((__v4sf) __A,
+						 (__v4sf) __B,
+						 (__v4sf)
+						 _mm_setzero_ps (),
+						 (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_load_epi32 (__m512i __W, __mmask16 __U, void const *__P)
+_mm_scalef_round_sd (__m128d __A, __m128d __B, const int __R)
 {
-  return (__m512i) __builtin_ia32_movdqa32load512_mask ((const __v16si *) __P,
-							(__v16si) __W,
-							(__mmask16) __U);
+  return (__m128d) __builtin_ia32_scalefsd_mask_round ((__v2df) __A,
+						       (__v2df) __B,
+						       (__v2df)
+						       _mm_setzero_pd (),
+						       (__mmask8) -1, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_load_epi32 (__mmask16 __U, void const *__P)
+_mm_mask_scalef_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
+			  const int __R)
 {
-  return (__m512i) __builtin_ia32_movdqa32load512_mask ((const __v16si *) __P,
-							(__v16si)
-							_mm512_setzero_si512 (),
-							(__mmask16) __U);
+  return (__m128d) __builtin_ia32_scalefsd_mask_round ((__v2df) __A,
+						       (__v2df) __B,
+						       (__v2df) __W,
+						       (__mmask8) __U, __R);
 }
 
-extern __inline void
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_store_si512 (void *__P, __m512i __A)
+_mm_maskz_scalef_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
+			   const int __R)
 {
-  *(__m512i *) __P = __A;
+  return (__m128d) __builtin_ia32_scalefsd_mask_round ((__v2df) __A,
+						       (__v2df) __B,
+						       (__v2df)
+						       _mm_setzero_pd (),
+						       (__mmask8) __U, __R);
 }
 
-extern __inline void
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_store_epi32 (void *__P, __m512i __A)
+_mm_scalef_round_ss (__m128 __A, __m128 __B, const int __R)
 {
-  *(__m512i *) __P = __A;
+  return (__m128) __builtin_ia32_scalefss_mask_round ((__v4sf) __A,
+						      (__v4sf) __B,
+						      (__v4sf)
+						      _mm_setzero_ps (),
+						      (__mmask8) -1, __R);
 }
 
-extern __inline void
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_store_epi32 (void *__P, __mmask16 __U, __m512i __A)
+_mm_mask_scalef_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
+			 const int __R)
 {
-  __builtin_ia32_movdqa32store512_mask ((__v16si *) __P, (__v16si) __A,
-					(__mmask16) __U);
+  return (__m128) __builtin_ia32_scalefss_mask_round ((__v4sf) __A,
+						      (__v4sf) __B,
+						      (__v4sf) __W,
+						      (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mullo_epi32 (__m512i __A, __m512i __B)
+_mm_maskz_scalef_round_ss (__mmask8 __U, __m128 __A, __m128 __B, const int __R)
 {
-  return (__m512i) ((__v16su) __A * (__v16su) __B);
+  return (__m128) __builtin_ia32_scalefss_mask_round ((__v4sf) __A,
+						      (__v4sf) __B,
+						      (__v4sf)
+						      _mm_setzero_ps (),
+						      (__mmask8) __U, __R);
 }
+#else
+#define _mm_sqrt_round_sd(A, B, C)	      \
+    (__m128d)__builtin_ia32_sqrtsd_mask_round (B, A, \
+	(__v2df) _mm_setzero_pd (), -1, C)
 
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mullo_epi32 (__mmask16 __M, __m512i __A, __m512i __B)
-{
-  return (__m512i) __builtin_ia32_pmulld512_mask ((__v16si) __A,
-						  (__v16si) __B,
-						  (__v16si)
-						  _mm512_setzero_si512 (),
-						  __M);
-}
+#define _mm_mask_sqrt_round_sd(W, U, A, B, C) \
+    (__m128d)__builtin_ia32_sqrtsd_mask_round (B, A, W, U, C)
 
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mullo_epi32 (__m512i __W, __mmask16 __M, __m512i __A, __m512i __B)
-{
-  return (__m512i) __builtin_ia32_pmulld512_mask ((__v16si) __A,
-						  (__v16si) __B,
-						  (__v16si) __W, __M);
-}
+#define _mm_maskz_sqrt_round_sd(U, A, B, C)   \
+    (__m128d)__builtin_ia32_sqrtsd_mask_round (B, A, \
+	(__v2df) _mm_setzero_pd (), U, C)
 
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mullox_epi64 (__m512i __A, __m512i __B)
-{
-  return (__m512i) ((__v8du) __A * (__v8du) __B);
-}
+#define _mm_sqrt_round_ss(A, B, C)	      \
+    (__m128)__builtin_ia32_sqrtss_mask_round (B, A, \
+	(__v4sf) _mm_setzero_ps (), -1, C)
 
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mullox_epi64 (__m512i __W, __mmask8 __M, __m512i __A, __m512i __B)
-{
-  return _mm512_mask_mov_epi64 (__W, __M, _mm512_mullox_epi64 (__A, __B));
-}
+#define _mm_mask_sqrt_round_ss(W, U, A, B, C) \
+    (__m128)__builtin_ia32_sqrtss_mask_round (B, A, W, U, C)
 
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sllv_epi32 (__m512i __X, __m512i __Y)
+#define _mm_maskz_sqrt_round_ss(U, A, B, C)   \
+    (__m128)__builtin_ia32_sqrtss_mask_round (B, A, \
+	(__v4sf) _mm_setzero_ps (), U, C)
+
+#define _mm_mul_round_sd(A, B, C)            \
+    (__m128d)__builtin_ia32_mulsd_round(A, B, C)
+
+#define _mm_mask_mul_round_sd(W, U, A, B, C) \
+    (__m128d)__builtin_ia32_mulsd_mask_round(A, B, W, U, C)
+
+#define _mm_maskz_mul_round_sd(U, A, B, C)   \
+    (__m128d)__builtin_ia32_mulsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
+
+#define _mm_mul_round_ss(A, B, C)            \
+    (__m128)__builtin_ia32_mulss_round(A, B, C)
+
+#define _mm_mask_mul_round_ss(W, U, A, B, C) \
+    (__m128)__builtin_ia32_mulss_mask_round(A, B, W, U, C)
+
+#define _mm_maskz_mul_round_ss(U, A, B, C)   \
+    (__m128)__builtin_ia32_mulss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
+
+#define _mm_div_round_sd(A, B, C)            \
+    (__m128d)__builtin_ia32_divsd_round(A, B, C)
+
+#define _mm_mask_div_round_sd(W, U, A, B, C) \
+    (__m128d)__builtin_ia32_divsd_mask_round(A, B, W, U, C)
+
+#define _mm_maskz_div_round_sd(U, A, B, C)   \
+    (__m128d)__builtin_ia32_divsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
+
+#define _mm_div_round_ss(A, B, C)            \
+    (__m128)__builtin_ia32_divss_round(A, B, C)
+
+#define _mm_mask_div_round_ss(W, U, A, B, C) \
+    (__m128)__builtin_ia32_divss_mask_round(A, B, W, U, C)
+
+#define _mm_maskz_div_round_ss(U, A, B, C)   \
+    (__m128)__builtin_ia32_divss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
+
+#define _mm_scalef_round_sd(A, B, C)					\
+  ((__m128d)								\
+   __builtin_ia32_scalefsd_mask_round ((A), (B),			\
+				       (__v2df) _mm_undefined_pd (),	\
+				       -1, (C)))
+
+#define _mm_scalef_round_ss(A, B, C)					\
+  ((__m128)								\
+   __builtin_ia32_scalefss_mask_round ((A), (B),			\
+				       (__v4sf) _mm_undefined_ps (),	\
+				       -1, (C)))
+
+#define _mm_mask_scalef_round_sd(W, U, A, B, C)				\
+  ((__m128d)								\
+   __builtin_ia32_scalefsd_mask_round ((A), (B), (W), (U), (C)))
+
+#define _mm_mask_scalef_round_ss(W, U, A, B, C)				\
+  ((__m128)								\
+   __builtin_ia32_scalefss_mask_round ((A), (B), (W), (U), (C)))
+
+#define _mm_maskz_scalef_round_sd(U, A, B, C)				\
+  ((__m128d)								\
+   __builtin_ia32_scalefsd_mask_round ((A), (B),			\
+				       (__v2df) _mm_setzero_pd (),	\
+				       (U), (C)))
+
+#define _mm_maskz_scalef_round_ss(U, A, B, C)				\
+  ((__m128)								\
+   __builtin_ia32_scalefss_mask_round ((A), (B),			\
+				       (__v4sf) _mm_setzero_ps (),	\
+				       (U), (C)))
+#endif
+
+#define _mm_mask_sqrt_sd(W, U, A, B) \
+    _mm_mask_sqrt_round_sd ((W), (U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+
+#define _mm_maskz_sqrt_sd(U, A, B) \
+    _mm_maskz_sqrt_round_sd ((U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+
+#define _mm_mask_sqrt_ss(W, U, A, B) \
+    _mm_mask_sqrt_round_ss ((W), (U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+
+#define _mm_maskz_sqrt_ss(U, A, B) \
+    _mm_maskz_sqrt_round_ss ((U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+
+#define _mm_mask_scalef_sd(W, U, A, B) \
+    _mm_mask_scalef_round_sd ((W), (U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+
+#define _mm_maskz_scalef_sd(U, A, B) \
+    _mm_maskz_scalef_round_sd ((U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+
+#define _mm_mask_scalef_ss(W, U, A, B) \
+    _mm_mask_scalef_round_ss ((W), (U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+
+#define _mm_maskz_scalef_ss(U, A, B) \
+    _mm_maskz_scalef_round_ss ((U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtu32_sd (__m128d __A, unsigned __B)
 {
-  return (__m512i) __builtin_ia32_psllv16si_mask ((__v16si) __X,
-						  (__v16si) __Y,
-						  (__v16si)
-						  _mm512_undefined_epi32 (),
-						  (__mmask16) -1);
+  return (__m128d) __builtin_ia32_cvtusi2sd32 ((__v2df) __A, __B);
 }
 
-extern __inline __m512i
+#ifdef __x86_64__
+#ifdef __OPTIMIZE__
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sllv_epi32 (__m512i __W, __mmask16 __U, __m512i __X, __m512i __Y)
+_mm_cvt_roundu64_sd (__m128d __A, unsigned long long __B, const int __R)
 {
-  return (__m512i) __builtin_ia32_psllv16si_mask ((__v16si) __X,
-						  (__v16si) __Y,
-						  (__v16si) __W,
-						  (__mmask16) __U);
+  return (__m128d) __builtin_ia32_cvtusi2sd64 ((__v2df) __A, __B, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sllv_epi32 (__mmask16 __U, __m512i __X, __m512i __Y)
+_mm_cvt_roundi64_sd (__m128d __A, long long __B, const int __R)
 {
-  return (__m512i) __builtin_ia32_psllv16si_mask ((__v16si) __X,
-						  (__v16si) __Y,
-						  (__v16si)
-						  _mm512_setzero_si512 (),
-						  (__mmask16) __U);
+  return (__m128d) __builtin_ia32_cvtsi2sd64 ((__v2df) __A, __B, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_srav_epi32 (__m512i __X, __m512i __Y)
+_mm_cvt_roundsi64_sd (__m128d __A, long long __B, const int __R)
 {
-  return (__m512i) __builtin_ia32_psrav16si_mask ((__v16si) __X,
-						  (__v16si) __Y,
-						  (__v16si)
-						  _mm512_undefined_epi32 (),
-						  (__mmask16) -1);
+  return (__m128d) __builtin_ia32_cvtsi2sd64 ((__v2df) __A, __B, __R);
 }
+#else
+#define _mm_cvt_roundu64_sd(A, B, C)   \
+    (__m128d)__builtin_ia32_cvtusi2sd64(A, B, C)
 
-extern __inline __m512i
+#define _mm_cvt_roundi64_sd(A, B, C)   \
+    (__m128d)__builtin_ia32_cvtsi2sd64(A, B, C)
+
+#define _mm_cvt_roundsi64_sd(A, B, C)   \
+    (__m128d)__builtin_ia32_cvtsi2sd64(A, B, C)
+#endif
+
+#endif
+
+#ifdef __OPTIMIZE__
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_srav_epi32 (__m512i __W, __mmask16 __U, __m512i __X, __m512i __Y)
+_mm_cvt_roundu32_ss (__m128 __A, unsigned __B, const int __R)
 {
-  return (__m512i) __builtin_ia32_psrav16si_mask ((__v16si) __X,
-						  (__v16si) __Y,
-						  (__v16si) __W,
-						  (__mmask16) __U);
+  return (__m128) __builtin_ia32_cvtusi2ss32 ((__v4sf) __A, __B, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_srav_epi32 (__mmask16 __U, __m512i __X, __m512i __Y)
+_mm_cvt_roundsi32_ss (__m128 __A, int __B, const int __R)
 {
-  return (__m512i) __builtin_ia32_psrav16si_mask ((__v16si) __X,
-						  (__v16si) __Y,
-						  (__v16si)
-						  _mm512_setzero_si512 (),
-						  (__mmask16) __U);
+  return (__m128) __builtin_ia32_cvtsi2ss32 ((__v4sf) __A, __B, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_srlv_epi32 (__m512i __X, __m512i __Y)
+_mm_cvt_roundi32_ss (__m128 __A, int __B, const int __R)
 {
-  return (__m512i) __builtin_ia32_psrlv16si_mask ((__v16si) __X,
-						  (__v16si) __Y,
-						  (__v16si)
-						  _mm512_undefined_epi32 (),
-						  (__mmask16) -1);
+  return (__m128) __builtin_ia32_cvtsi2ss32 ((__v4sf) __A, __B, __R);
 }
+#else
+#define _mm_cvt_roundu32_ss(A, B, C)   \
+    (__m128)__builtin_ia32_cvtusi2ss32(A, B, C)
 
-extern __inline __m512i
+#define _mm_cvt_roundi32_ss(A, B, C)   \
+    (__m128)__builtin_ia32_cvtsi2ss32(A, B, C)
+
+#define _mm_cvt_roundsi32_ss(A, B, C)   \
+    (__m128)__builtin_ia32_cvtsi2ss32(A, B, C)
+#endif
+
+#ifdef __x86_64__
+#ifdef __OPTIMIZE__
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_srlv_epi32 (__m512i __W, __mmask16 __U, __m512i __X, __m512i __Y)
+_mm_cvt_roundu64_ss (__m128 __A, unsigned long long __B, const int __R)
 {
-  return (__m512i) __builtin_ia32_psrlv16si_mask ((__v16si) __X,
-						  (__v16si) __Y,
-						  (__v16si) __W,
-						  (__mmask16) __U);
+  return (__m128) __builtin_ia32_cvtusi2ss64 ((__v4sf) __A, __B, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_srlv_epi32 (__mmask16 __U, __m512i __X, __m512i __Y)
+_mm_cvt_roundsi64_ss (__m128 __A, long long __B, const int __R)
 {
-  return (__m512i) __builtin_ia32_psrlv16si_mask ((__v16si) __X,
-						  (__v16si) __Y,
-						  (__v16si)
-						  _mm512_setzero_si512 (),
-						  (__mmask16) __U);
+  return (__m128) __builtin_ia32_cvtsi2ss64 ((__v4sf) __A, __B, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_add_epi64 (__m512i __A, __m512i __B)
+_mm_cvt_roundi64_ss (__m128 __A, long long __B, const int __R)
 {
-  return (__m512i) ((__v8du) __A + (__v8du) __B);
+  return (__m128) __builtin_ia32_cvtsi2ss64 ((__v4sf) __A, __B, __R);
 }
+#else
+#define _mm_cvt_roundu64_ss(A, B, C)   \
+    (__m128)__builtin_ia32_cvtusi2ss64(A, B, C)
 
-extern __inline __m512i
+#define _mm_cvt_roundi64_ss(A, B, C)   \
+    (__m128)__builtin_ia32_cvtsi2ss64(A, B, C)
+
+#define _mm_cvt_roundsi64_ss(A, B, C)   \
+    (__m128)__builtin_ia32_cvtsi2ss64(A, B, C)
+#endif
+
+#endif
+
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_add_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
+_mm_mask_load_ss (__m128 __W, __mmask8 __U, const float *__P)
 {
-  return (__m512i) __builtin_ia32_paddq512_mask ((__v8di) __A,
-						 (__v8di) __B,
-						 (__v8di) __W,
-						 (__mmask8) __U);
+  return (__m128) __builtin_ia32_loadss_mask (__P, (__v4sf) __W, __U);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_add_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
+_mm_maskz_load_ss (__mmask8 __U, const float *__P)
 {
-  return (__m512i) __builtin_ia32_paddq512_mask ((__v8di) __A,
-						 (__v8di) __B,
-						 (__v8di)
-						 _mm512_setzero_si512 (),
-						 (__mmask8) __U);
+  return (__m128) __builtin_ia32_loadss_mask (__P, (__v4sf) _mm_setzero_ps (),
+					      __U);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sub_epi64 (__m512i __A, __m512i __B)
+_mm_mask_load_sd (__m128d __W, __mmask8 __U, const double *__P)
 {
-  return (__m512i) ((__v8du) __A - (__v8du) __B);
+  return (__m128d) __builtin_ia32_loadsd_mask (__P, (__v2df) __W, __U);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sub_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
+_mm_maskz_load_sd (__mmask8 __U, const double *__P)
 {
-  return (__m512i) __builtin_ia32_psubq512_mask ((__v8di) __A,
-						 (__v8di) __B,
-						 (__v8di) __W,
-						 (__mmask8) __U);
+  return (__m128d) __builtin_ia32_loadsd_mask (__P, (__v2df) _mm_setzero_pd (),
+					       __U);
 }
 
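+/* Masked scalar moves: the low element is taken from __B when the
+   low bit of __U is set, otherwise from __W (merge) or zero (maskz);
+   the upper elements come from __A.  */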
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sub_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
+_mm_mask_move_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
 {
-  return (__m512i) __builtin_ia32_psubq512_mask ((__v8di) __A,
-						 (__v8di) __B,
-						 (__v8di)
-						 _mm512_setzero_si512 (),
-						 (__mmask8) __U);
+  return (__m128) __builtin_ia32_movess_mask ((__v4sf) __A, (__v4sf) __B,
+					      (__v4sf) __W, __U);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sllv_epi64 (__m512i __X, __m512i __Y)
+_mm_maskz_move_ss (__mmask8 __U, __m128 __A, __m128 __B)
 {
-  return (__m512i) __builtin_ia32_psllv8di_mask ((__v8di) __X,
-						 (__v8di) __Y,
-						 (__v8di)
-						 _mm512_undefined_pd (),
-						 (__mmask8) -1);
+  return (__m128) __builtin_ia32_movess_mask ((__v4sf) __A, (__v4sf) __B,
+					      (__v4sf) _mm_setzero_ps (), __U);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sllv_epi64 (__m512i __W, __mmask8 __U, __m512i __X, __m512i __Y)
+_mm_mask_move_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
 {
-  return (__m512i) __builtin_ia32_psllv8di_mask ((__v8di) __X,
-						 (__v8di) __Y,
-						 (__v8di) __W,
-						 (__mmask8) __U);
+  return (__m128d) __builtin_ia32_movesd_mask ((__v2df) __A, (__v2df) __B,
+					       (__v2df) __W, __U);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sllv_epi64 (__mmask8 __U, __m512i __X, __m512i __Y)
+_mm_maskz_move_sd (__mmask8 __U, __m128d __A, __m128d __B)
 {
-  return (__m512i) __builtin_ia32_psllv8di_mask ((__v8di) __X,
-						 (__v8di) __Y,
-						 (__v8di)
-						 _mm512_setzero_si512 (),
-						 (__mmask8) __U);
+  return (__m128d) __builtin_ia32_movesd_mask ((__v2df) __A, (__v2df) __B,
+					       (__v2df) _mm_setzero_pd (),
+					       __U);
 }
 
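+/* Masked scalar stores: write the low element of __A to *__P only
+   when the low bit of __U is set.  */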
-extern __inline __m512i
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_srav_epi64 (__m512i __X, __m512i __Y)
+_mm_mask_store_ss (float *__P, __mmask8 __U, __m128 __A)
 {
-  return (__m512i) __builtin_ia32_psrav8di_mask ((__v8di) __X,
-						 (__v8di) __Y,
-						 (__v8di)
-						 _mm512_undefined_epi32 (),
-						 (__mmask8) -1);
+  __builtin_ia32_storess_mask (__P, (__v4sf) __A, (__mmask8) __U);
 }
 
-extern __inline __m512i
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_srav_epi64 (__m512i __W, __mmask8 __U, __m512i __X, __m512i __Y)
+_mm_mask_store_sd (double *__P, __mmask8 __U, __m128d __A)
 {
-  return (__m512i) __builtin_ia32_psrav8di_mask ((__v8di) __X,
-						 (__v8di) __Y,
-						 (__v8di) __W,
-						 (__mmask8) __U);
+  __builtin_ia32_storesd_mask (__P, (__v2df) __A, (__mmask8) __U);
 }
 
-extern __inline __m512i
+#ifdef __OPTIMIZE__
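+/* Fix up special values in the low element of __A/__B using the
+   response table in __C; __imm selects which exceptions are signalled
+   and __R gives the SAE control.  */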
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_srav_epi64 (__mmask8 __U, __m512i __X, __m512i __Y)
+_mm_fixupimm_round_sd (__m128d __A, __m128d __B, __m128i __C,
+		       const int __imm, const int __R)
 {
-  return (__m512i) __builtin_ia32_psrav8di_mask ((__v8di) __X,
-						 (__v8di) __Y,
-						 (__v8di)
-						 _mm512_setzero_si512 (),
-						 (__mmask8) __U);
+  return (__m128d) __builtin_ia32_fixupimmsd_mask ((__v2df) __A,
+						   (__v2df) __B,
+						   (__v2di) __C, __imm,
+						   (__mmask8) -1, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_srlv_epi64 (__m512i __X, __m512i __Y)
+_mm_mask_fixupimm_round_sd (__m128d __A, __mmask8 __U, __m128d __B,
+			    __m128i __C, const int __imm, const int __R)
 {
-  return (__m512i) __builtin_ia32_psrlv8di_mask ((__v8di) __X,
-						 (__v8di) __Y,
-						 (__v8di)
-						 _mm512_undefined_epi32 (),
-						 (__mmask8) -1);
+  return (__m128d) __builtin_ia32_fixupimmsd_mask ((__v2df) __A,
+						   (__v2df) __B,
+						   (__v2di) __C, __imm,
+						   (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_srlv_epi64 (__m512i __W, __mmask8 __U, __m512i __X, __m512i __Y)
+_mm_maskz_fixupimm_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
+			     __m128i __C, const int __imm, const int __R)
 {
-  return (__m512i) __builtin_ia32_psrlv8di_mask ((__v8di) __X,
-						 (__v8di) __Y,
-						 (__v8di) __W,
-						 (__mmask8) __U);
+  return (__m128d) __builtin_ia32_fixupimmsd_maskz ((__v2df) __A,
+						    (__v2df) __B,
+						    (__v2di) __C,
+						    __imm,
+						    (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_srlv_epi64 (__mmask8 __U, __m512i __X, __m512i __Y)
+_mm_fixupimm_round_ss (__m128 __A, __m128 __B, __m128i __C,
+		       const int __imm, const int __R)
 {
-  return (__m512i) __builtin_ia32_psrlv8di_mask ((__v8di) __X,
-						 (__v8di) __Y,
-						 (__v8di)
-						 _mm512_setzero_si512 (),
-						 (__mmask8) __U);
+  return (__m128) __builtin_ia32_fixupimmss_mask ((__v4sf) __A,
+						  (__v4sf) __B,
+						  (__v4si) __C, __imm,
+						  (__mmask8) -1, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_add_epi32 (__m512i __A, __m512i __B)
+_mm_mask_fixupimm_round_ss (__m128 __A, __mmask8 __U, __m128 __B,
+			    __m128i __C, const int __imm, const int __R)
 {
-  return (__m512i) ((__v16su) __A + (__v16su) __B);
+  return (__m128) __builtin_ia32_fixupimmss_mask ((__v4sf) __A,
+						  (__v4sf) __B,
+						  (__v4si) __C, __imm,
+						  (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_add_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
+_mm_maskz_fixupimm_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
+			     __m128i __C, const int __imm, const int __R)
 {
-  return (__m512i) __builtin_ia32_paddd512_mask ((__v16si) __A,
-						 (__v16si) __B,
-						 (__v16si) __W,
-						 (__mmask16) __U);
+  return (__m128) __builtin_ia32_fixupimmss_maskz ((__v4sf) __A,
+						   (__v4sf) __B,
+						   (__v4si) __C, __imm,
+						   (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_add_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
-{
-  return (__m512i) __builtin_ia32_paddd512_mask ((__v16si) __A,
-						 (__v16si) __B,
-						 (__v16si)
-						 _mm512_setzero_si512 (),
-						 (__mmask16) __U);
-}
+#else
+#define _mm_fixupimm_round_sd(X, Y, Z, C, R)					\
+    ((__m128d)__builtin_ia32_fixupimmsd_mask ((__v2df)(__m128d)(X),	\
+      (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C),		\
+      (__mmask8)(-1), (R)))
 
-extern __inline __m512i
+#define _mm_mask_fixupimm_round_sd(X, U, Y, Z, C, R)				\
+    ((__m128d)__builtin_ia32_fixupimmsd_mask ((__v2df)(__m128d)(X),	\
+      (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C),		\
+      (__mmask8)(U), (R)))
+
+#define _mm_maskz_fixupimm_round_sd(U, X, Y, Z, C, R)				\
+    ((__m128d)__builtin_ia32_fixupimmsd_maskz ((__v2df)(__m128d)(X),	\
+      (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C),		\
+      (__mmask8)(U), (R)))
+
+#define _mm_fixupimm_round_ss(X, Y, Z, C, R)					\
+    ((__m128)__builtin_ia32_fixupimmss_mask ((__v4sf)(__m128)(X),	\
+      (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C),		\
+      (__mmask8)(-1), (R)))
+
+#define _mm_mask_fixupimm_round_ss(X, U, Y, Z, C, R)				\
+    ((__m128)__builtin_ia32_fixupimmss_mask ((__v4sf)(__m128)(X),	\
+      (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C),		\
+      (__mmask8)(U), (R)))
+
+#define _mm_maskz_fixupimm_round_ss(U, X, Y, Z, C, R)				\
+    ((__m128)__builtin_ia32_fixupimmss_maskz ((__v4sf)(__m128)(X),	\
+      (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C),		\
+      (__mmask8)(U), (R)))
+
+#endif
+
+#ifdef __x86_64__
+#ifdef __OPTIMIZE__
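+/* Convert the low SP element to a 64-bit integer, either under an
+   explicit rounding mode (_mm_cvt_*) or with truncation (_mm_cvtt_*).  */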
+extern __inline unsigned long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mul_epi32 (__m512i __X, __m512i __Y)
+_mm_cvt_roundss_u64 (__m128 __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_pmuldq512_mask ((__v16si) __X,
-						  (__v16si) __Y,
-						  (__v8di)
-						  _mm512_undefined_epi32 (),
-						  (__mmask8) -1);
+  return (unsigned long long) __builtin_ia32_vcvtss2usi64 ((__v4sf) __A, __R);
 }
 
-extern __inline __m512i
+extern __inline long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mul_epi32 (__m512i __W, __mmask8 __M, __m512i __X, __m512i __Y)
+_mm_cvt_roundss_si64 (__m128 __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_pmuldq512_mask ((__v16si) __X,
-						  (__v16si) __Y,
-						  (__v8di) __W, __M);
+  return (long long) __builtin_ia32_vcvtss2si64 ((__v4sf) __A, __R);
 }
 
-extern __inline __m512i
+extern __inline long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mul_epi32 (__mmask8 __M, __m512i __X, __m512i __Y)
+_mm_cvt_roundss_i64 (__m128 __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_pmuldq512_mask ((__v16si) __X,
-						  (__v16si) __Y,
-						  (__v8di)
-						  _mm512_setzero_si512 (),
-						  __M);
+  return (long long) __builtin_ia32_vcvtss2si64 ((__v4sf) __A, __R);
 }
 
-extern __inline __m512i
+extern __inline unsigned long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sub_epi32 (__m512i __A, __m512i __B)
+_mm_cvtt_roundss_u64 (__m128 __A, const int __R)
 {
-  return (__m512i) ((__v16su) __A - (__v16su) __B);
+  return (unsigned long long) __builtin_ia32_vcvttss2usi64 ((__v4sf) __A, __R);
 }
 
-extern __inline __m512i
+extern __inline long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sub_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
+_mm_cvtt_roundss_i64 (__m128 __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_psubd512_mask ((__v16si) __A,
-						 (__v16si) __B,
-						 (__v16si) __W,
-						 (__mmask16) __U);
+  return (long long) __builtin_ia32_vcvttss2si64 ((__v4sf) __A, __R);
 }
 
-extern __inline __m512i
+extern __inline long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sub_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
+_mm_cvtt_roundss_si64 (__m128 __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_psubd512_mask ((__v16si) __A,
-						 (__v16si) __B,
-						 (__v16si)
-						 _mm512_setzero_si512 (),
-						 (__mmask16) __U);
+  return (long long) __builtin_ia32_vcvttss2si64 ((__v4sf) __A, __R);
 }
+#else
+#define _mm_cvt_roundss_u64(A, B)   \
+    ((unsigned long long)__builtin_ia32_vcvtss2usi64(A, B))
 
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mul_epu32 (__m512i __X, __m512i __Y)
-{
-  return (__m512i) __builtin_ia32_pmuludq512_mask ((__v16si) __X,
-						   (__v16si) __Y,
-						   (__v8di)
-						   _mm512_undefined_epi32 (),
-						   (__mmask8) -1);
+#define _mm_cvt_roundss_si64(A, B)   \
+    ((long long)__builtin_ia32_vcvtss2si64(A, B))
+
+#define _mm_cvt_roundss_i64(A, B)   \
+    ((long long)__builtin_ia32_vcvtss2si64(A, B))
+
+#define _mm_cvtt_roundss_u64(A, B)  \
+    ((unsigned long long)__builtin_ia32_vcvttss2usi64(A, B))
+
+#define _mm_cvtt_roundss_i64(A, B)  \
+    ((long long)__builtin_ia32_vcvttss2si64(A, B))
+
+#define _mm_cvtt_roundss_si64(A, B)  \
+    ((long long)__builtin_ia32_vcvttss2si64(A, B))
+#endif
+#endif
+
+#ifdef __OPTIMIZE__
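+/* Convert the low SP element to a 32-bit integer, with rounding or
+   truncation.  */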
+extern __inline unsigned
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvt_roundss_u32 (__m128 __A, const int __R)
+{
+  return (unsigned) __builtin_ia32_vcvtss2usi32 ((__v4sf) __A, __R);
 }
 
-extern __inline __m512i
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mul_epu32 (__m512i __W, __mmask8 __M, __m512i __X, __m512i __Y)
+_mm_cvt_roundss_si32 (__m128 __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_pmuludq512_mask ((__v16si) __X,
-						   (__v16si) __Y,
-						   (__v8di) __W, __M);
+  return (int) __builtin_ia32_vcvtss2si32 ((__v4sf) __A, __R);
 }
 
-extern __inline __m512i
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mul_epu32 (__mmask8 __M, __m512i __X, __m512i __Y)
+_mm_cvt_roundss_i32 (__m128 __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_pmuludq512_mask ((__v16si) __X,
-						   (__v16si) __Y,
-						   (__v8di)
-						   _mm512_setzero_si512 (),
-						   __M);
+  return (int) __builtin_ia32_vcvtss2si32 ((__v4sf) __A, __R);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline unsigned
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_slli_epi64 (__m512i __A, unsigned int __B)
+_mm_cvtt_roundss_u32 (__m128 __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_psllqi512_mask ((__v8di) __A, __B,
-						  (__v8di)
-						  _mm512_undefined_epi32 (),
-						  (__mmask8) -1);
+  return (unsigned) __builtin_ia32_vcvttss2usi32 ((__v4sf) __A, __R);
 }
 
-extern __inline __m512i
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_slli_epi64 (__m512i __W, __mmask8 __U, __m512i __A,
-			unsigned int __B)
+_mm_cvtt_roundss_i32 (__m128 __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_psllqi512_mask ((__v8di) __A, __B,
-						  (__v8di) __W,
-						  (__mmask8) __U);
+  return (int) __builtin_ia32_vcvttss2si32 ((__v4sf) __A, __R);
 }
 
-extern __inline __m512i
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_slli_epi64 (__mmask8 __U, __m512i __A, unsigned int __B)
+_mm_cvtt_roundss_si32 (__m128 __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_psllqi512_mask ((__v8di) __A, __B,
-						  (__v8di)
-						  _mm512_setzero_si512 (),
-						  (__mmask8) __U);
+  return (int) __builtin_ia32_vcvttss2si32 ((__v4sf) __A, __R);
 }
 #else
-#define _mm512_slli_epi64(X, C)						\
-  ((__m512i) __builtin_ia32_psllqi512_mask ((__v8di)(__m512i)(X),	\
-    (unsigned int)(C),							\
-    (__v8di)(__m512i)_mm512_undefined_epi32 (),				\
-    (__mmask8)-1))
+#define _mm_cvt_roundss_u32(A, B)   \
+    ((unsigned)__builtin_ia32_vcvtss2usi32(A, B))
 
-#define _mm512_mask_slli_epi64(W, U, X, C)				\
-  ((__m512i) __builtin_ia32_psllqi512_mask ((__v8di)(__m512i)(X),	\
-    (unsigned int)(C),							\
-    (__v8di)(__m512i)(W),						\
-    (__mmask8)(U)))
+#define _mm_cvt_roundss_si32(A, B)   \
+    ((int)__builtin_ia32_vcvtss2si32(A, B))
 
-#define _mm512_maskz_slli_epi64(U, X, C)				\
-  ((__m512i) __builtin_ia32_psllqi512_mask ((__v8di)(__m512i)(X),	\
-    (unsigned int)(C),							\
-    (__v8di)(__m512i)_mm512_setzero_si512 (),				\
-    (__mmask8)(U)))
+#define _mm_cvt_roundss_i32(A, B)   \
+    ((int)__builtin_ia32_vcvtss2si32(A, B))
+
+#define _mm_cvtt_roundss_u32(A, B)  \
+    ((unsigned)__builtin_ia32_vcvttss2usi32(A, B))
+
+#define _mm_cvtt_roundss_si32(A, B)  \
+    ((int)__builtin_ia32_vcvttss2si32(A, B))
+
+#define _mm_cvtt_roundss_i32(A, B)  \
+    ((int)__builtin_ia32_vcvttss2si32(A, B))
 #endif
 
-extern __inline __m512i
+#ifdef __x86_64__
+#ifdef __OPTIMIZE__
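+/* Convert the low DP element to a 64-bit integer, with rounding or
+   truncation.  */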
+extern __inline unsigned long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sll_epi64 (__m512i __A, __m128i __B)
+_mm_cvt_roundsd_u64 (__m128d __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_psllq512_mask ((__v8di) __A,
-						 (__v2di) __B,
-						 (__v8di)
-						 _mm512_undefined_epi32 (),
-						 (__mmask8) -1);
+  return (unsigned long long) __builtin_ia32_vcvtsd2usi64 ((__v2df) __A, __R);
 }
 
-extern __inline __m512i
+extern __inline long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sll_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m128i __B)
+_mm_cvt_roundsd_si64 (__m128d __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_psllq512_mask ((__v8di) __A,
-						 (__v2di) __B,
-						 (__v8di) __W,
-						 (__mmask8) __U);
+  return (long long) __builtin_ia32_vcvtsd2si64 ((__v2df) __A, __R);
 }
 
-extern __inline __m512i
+extern __inline long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sll_epi64 (__mmask8 __U, __m512i __A, __m128i __B)
+_mm_cvt_roundsd_i64 (__m128d __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_psllq512_mask ((__v8di) __A,
-						 (__v2di) __B,
-						 (__v8di)
-						 _mm512_setzero_si512 (),
-						 (__mmask8) __U);
+  return (long long) __builtin_ia32_vcvtsd2si64 ((__v2df) __A, __R);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline unsigned long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_srli_epi64 (__m512i __A, unsigned int __B)
+_mm_cvtt_roundsd_u64 (__m128d __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_psrlqi512_mask ((__v8di) __A, __B,
-						  (__v8di)
-						  _mm512_undefined_epi32 (),
-						  (__mmask8) -1);
+  return (unsigned long long) __builtin_ia32_vcvttsd2usi64 ((__v2df) __A, __R);
 }
 
-extern __inline __m512i
+extern __inline long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_srli_epi64 (__m512i __W, __mmask8 __U,
-			__m512i __A, unsigned int __B)
+_mm_cvtt_roundsd_si64 (__m128d __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_psrlqi512_mask ((__v8di) __A, __B,
-						  (__v8di) __W,
-						  (__mmask8) __U);
+  return (long long) __builtin_ia32_vcvttsd2si64 ((__v2df) __A, __R);
 }
 
-extern __inline __m512i
+extern __inline long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_srli_epi64 (__mmask8 __U, __m512i __A, unsigned int __B)
+_mm_cvtt_roundsd_i64 (__m128d __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_psrlqi512_mask ((__v8di) __A, __B,
-						  (__v8di)
-						  _mm512_setzero_si512 (),
-						  (__mmask8) __U);
+  return (long long) __builtin_ia32_vcvttsd2si64 ((__v2df) __A, __R);
 }
 #else
-#define _mm512_srli_epi64(X, C)						\
-  ((__m512i) __builtin_ia32_psrlqi512_mask ((__v8di)(__m512i)(X),	\
-    (unsigned int)(C),							\
-    (__v8di)(__m512i)_mm512_undefined_epi32 (),				\
-    (__mmask8)-1))
+#define _mm_cvt_roundsd_u64(A, B)   \
+    ((unsigned long long)__builtin_ia32_vcvtsd2usi64(A, B))
 
-#define _mm512_mask_srli_epi64(W, U, X, C)				\
-  ((__m512i) __builtin_ia32_psrlqi512_mask ((__v8di)(__m512i)(X),	\
-    (unsigned int)(C),							\
-    (__v8di)(__m512i)(W),						\
-    (__mmask8)(U)))
+#define _mm_cvt_roundsd_si64(A, B)   \
+    ((long long)__builtin_ia32_vcvtsd2si64(A, B))
 
-#define _mm512_maskz_srli_epi64(U, X, C)				\
-  ((__m512i) __builtin_ia32_psrlqi512_mask ((__v8di)(__m512i)(X),	\
-    (unsigned int)(C),							\
-    (__v8di)(__m512i)_mm512_setzero_si512 (),				\
-    (__mmask8)(U)))
+#define _mm_cvt_roundsd_i64(A, B)   \
+    ((long long)__builtin_ia32_vcvtsd2si64(A, B))
+
+#define _mm_cvtt_roundsd_u64(A, B)   \
+    ((unsigned long long)__builtin_ia32_vcvttsd2usi64(A, B))
+
+#define _mm_cvtt_roundsd_si64(A, B)   \
+    ((long long)__builtin_ia32_vcvttsd2si64(A, B))
+
+#define _mm_cvtt_roundsd_i64(A, B)   \
+    ((long long)__builtin_ia32_vcvttsd2si64(A, B))
+#endif
 #endif
 
-extern __inline __m512i
+#ifdef __OPTIMIZE__
+extern __inline unsigned
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_srl_epi64 (__m512i __A, __m128i __B)
+_mm_cvt_roundsd_u32 (__m128d __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_psrlq512_mask ((__v8di) __A,
-						 (__v2di) __B,
-						 (__v8di)
-						 _mm512_undefined_epi32 (),
-						 (__mmask8) -1);
+  return (unsigned) __builtin_ia32_vcvtsd2usi32 ((__v2df) __A, __R);
 }
 
-extern __inline __m512i
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_srl_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m128i __B)
+_mm_cvt_roundsd_si32 (__m128d __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_psrlq512_mask ((__v8di) __A,
-						 (__v2di) __B,
-						 (__v8di) __W,
-						 (__mmask8) __U);
+  return (int) __builtin_ia32_vcvtsd2si32 ((__v2df) __A, __R);
 }
 
-extern __inline __m512i
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_srl_epi64 (__mmask8 __U, __m512i __A, __m128i __B)
+_mm_cvt_roundsd_i32 (__m128d __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_psrlq512_mask ((__v8di) __A,
-						 (__v2di) __B,
-						 (__v8di)
-						 _mm512_setzero_si512 (),
-						 (__mmask8) __U);
+  return (int) __builtin_ia32_vcvtsd2si32 ((__v2df) __A, __R);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline unsigned
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_srai_epi64 (__m512i __A, unsigned int __B)
+_mm_cvtt_roundsd_u32 (__m128d __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_psraqi512_mask ((__v8di) __A, __B,
-						  (__v8di)
-						  _mm512_undefined_epi32 (),
-						  (__mmask8) -1);
+  return (unsigned) __builtin_ia32_vcvttsd2usi32 ((__v2df) __A, __R);
 }
 
-extern __inline __m512i
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_srai_epi64 (__m512i __W, __mmask8 __U, __m512i __A,
-			unsigned int __B)
+_mm_cvtt_roundsd_i32 (__m128d __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_psraqi512_mask ((__v8di) __A, __B,
-						  (__v8di) __W,
-						  (__mmask8) __U);
+  return (int) __builtin_ia32_vcvttsd2si32 ((__v2df) __A, __R);
 }
 
-extern __inline __m512i
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_srai_epi64 (__mmask8 __U, __m512i __A, unsigned int __B)
+_mm_cvtt_roundsd_si32 (__m128d __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_psraqi512_mask ((__v8di) __A, __B,
-						  (__v8di)
-						  _mm512_setzero_si512 (),
-						  (__mmask8) __U);
+  return (int) __builtin_ia32_vcvttsd2si32 ((__v2df) __A, __R);
 }
-#else
-#define _mm512_srai_epi64(X, C)						\
-  ((__m512i) __builtin_ia32_psraqi512_mask ((__v8di)(__m512i)(X),	\
-    (unsigned int)(C),							\
-    (__v8di)(__m512i)_mm512_undefined_epi32 (),				\
-    (__mmask8)-1))
-
-#define _mm512_mask_srai_epi64(W, U, X, C)				\
-  ((__m512i) __builtin_ia32_psraqi512_mask ((__v8di)(__m512i)(X),	\
-    (unsigned int)(C),							\
-    (__v8di)(__m512i)(W),						\
-    (__mmask8)(U)))
-
-#define _mm512_maskz_srai_epi64(U, X, C)				\
-  ((__m512i) __builtin_ia32_psraqi512_mask ((__v8di)(__m512i)(X),	\
-    (unsigned int)(C),							\
-    (__v8di)(__m512i)_mm512_setzero_si512 (),				\
-    (__mmask8)(U)))
-#endif
 
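+/* Scalar conversions between SP and DP with rounding/SAE control; the
+   masked forms merge the low result element from __W or zero it.  */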
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sra_epi64 (__m512i __A, __m128i __B)
+_mm_cvt_roundsd_ss (__m128 __A, __m128d __B, const int __R)
 {
-  return (__m512i) __builtin_ia32_psraq512_mask ((__v8di) __A,
-						 (__v2di) __B,
-						 (__v8di)
-						 _mm512_undefined_epi32 (),
-						 (__mmask8) -1);
+  return (__m128) __builtin_ia32_cvtsd2ss_round ((__v4sf) __A,
+						 (__v2df) __B,
+						 __R);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sra_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m128i __B)
+_mm_mask_cvt_roundsd_ss (__m128 __W, __mmask8 __U, __m128 __A,
+			 __m128d __B, const int __R)
 {
-  return (__m512i) __builtin_ia32_psraq512_mask ((__v8di) __A,
-						 (__v2di) __B,
-						 (__v8di) __W,
-						 (__mmask8) __U);
+  return (__m128) __builtin_ia32_cvtsd2ss_mask_round ((__v4sf) __A,
+						      (__v2df) __B,
+						      (__v4sf) __W,
+						      __U,
+						      __R);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sra_epi64 (__mmask8 __U, __m512i __A, __m128i __B)
+_mm_maskz_cvt_roundsd_ss (__mmask8 __U, __m128 __A,
+			 __m128d __B, const int __R)
 {
-  return (__m512i) __builtin_ia32_psraq512_mask ((__v8di) __A,
-						 (__v2di) __B,
-						 (__v8di)
-						 _mm512_setzero_si512 (),
-						 (__mmask8) __U);
+  return (__m128) __builtin_ia32_cvtsd2ss_mask_round ((__v4sf) __A,
+						      (__v2df) __B,
+						      _mm_setzero_ps (),
+						      __U,
+						      __R);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_slli_epi32 (__m512i __A, unsigned int __B)
+_mm_cvt_roundss_sd (__m128d __A, __m128 __B, const int __R)
 {
-  return (__m512i) __builtin_ia32_pslldi512_mask ((__v16si) __A, __B,
-						  (__v16si)
-						  _mm512_undefined_epi32 (),
-						  (__mmask16) -1);
+  return (__m128d) __builtin_ia32_cvtss2sd_round ((__v2df) __A,
+						  (__v4sf) __B,
+						  __R);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_slli_epi32 (__m512i __W, __mmask16 __U, __m512i __A,
-			unsigned int __B)
+_mm_mask_cvt_roundss_sd (__m128d __W, __mmask8 __U, __m128d __A,
+			 __m128 __B, const int __R)
 {
-  return (__m512i) __builtin_ia32_pslldi512_mask ((__v16si) __A, __B,
-						  (__v16si) __W,
-						  (__mmask16) __U);
+  return (__m128d) __builtin_ia32_cvtss2sd_mask_round ((__v2df) __A,
+						       (__v4sf) __B,
+						       (__v2df) __W,
+						       __U,
+						       __R);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_slli_epi32 (__mmask16 __U, __m512i __A, unsigned int __B)
+_mm_maskz_cvt_roundss_sd (__mmask8 __U, __m128d __A,
+			  __m128 __B, const int __R)
 {
-  return (__m512i) __builtin_ia32_pslldi512_mask ((__v16si) __A, __B,
-						  (__v16si)
-						  _mm512_setzero_si512 (),
-						  (__mmask16) __U);
+  return (__m128d) __builtin_ia32_cvtss2sd_mask_round ((__v2df) __A,
+						       (__v4sf) __B,
+						       _mm_setzero_pd (),
+						       __U,
+						       __R);
 }
-#else
-#define _mm512_slli_epi32(X, C)						\
-  ((__m512i) __builtin_ia32_pslldi512_mask ((__v16si)(__m512i)(X),	\
-    (unsigned int)(C),							\
-    (__v16si)(__m512i)_mm512_undefined_epi32 (),			\
-    (__mmask16)-1))
-
-#define _mm512_mask_slli_epi32(W, U, X, C)				\
-  ((__m512i) __builtin_ia32_pslldi512_mask ((__v16si)(__m512i)(X),	\
-    (unsigned int)(C),							\
-    (__v16si)(__m512i)(W),						\
-    (__mmask16)(U)))
-
-#define _mm512_maskz_slli_epi32(U, X, C)				\
-  ((__m512i) __builtin_ia32_pslldi512_mask ((__v16si)(__m512i)(X),	\
-    (unsigned int)(C),							\
-    (__v16si)(__m512i)_mm512_setzero_si512 (),				\
-    (__mmask16)(U)))
-#endif
 
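+/* getexp: extract the unbiased exponent of the low element of __B as
+   a FP value; the upper elements of the result come from __A.  */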
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sll_epi32 (__m512i __A, __m128i __B)
+_mm_getexp_round_ss (__m128 __A, __m128 __B, const int __R)
 {
-  return (__m512i) __builtin_ia32_pslld512_mask ((__v16si) __A,
-						 (__v4si) __B,
-						 (__v16si)
-						 _mm512_undefined_epi32 (),
-						 (__mmask16) -1);
+  return (__m128) __builtin_ia32_getexpss128_round ((__v4sf) __A,
+						    (__v4sf) __B,
+						    __R);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sll_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m128i __B)
+_mm_mask_getexp_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
+			  __m128 __B, const int __R)
 {
-  return (__m512i) __builtin_ia32_pslld512_mask ((__v16si) __A,
-						 (__v4si) __B,
-						 (__v16si) __W,
-						 (__mmask16) __U);
+  return (__m128) __builtin_ia32_getexpss_mask_round ((__v4sf) __A,
+						 (__v4sf) __B,
+						 (__v4sf) __W,
+						 (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sll_epi32 (__mmask16 __U, __m512i __A, __m128i __B)
+_mm_maskz_getexp_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
+			   const int __R)
 {
-  return (__m512i) __builtin_ia32_pslld512_mask ((__v16si) __A,
-						 (__v4si) __B,
-						 (__v16si)
-						 _mm512_setzero_si512 (),
-						 (__mmask16) __U);
+  return (__m128) __builtin_ia32_getexpss_mask_round ((__v4sf) __A,
+						 (__v4sf) __B,
+						 (__v4sf)
+						 _mm_setzero_ps (),
+						 (__mmask8) __U, __R);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_srli_epi32 (__m512i __A, unsigned int __B)
+_mm_getexp_round_sd (__m128d __A, __m128d __B, const int __R)
 {
-  return (__m512i) __builtin_ia32_psrldi512_mask ((__v16si) __A, __B,
-						  (__v16si)
-						  _mm512_undefined_epi32 (),
-						  (__mmask16) -1);
+  return (__m128d) __builtin_ia32_getexpsd128_round ((__v2df) __A,
+						     (__v2df) __B,
+						     __R);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_srli_epi32 (__m512i __W, __mmask16 __U,
-			__m512i __A, unsigned int __B)
+_mm_mask_getexp_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
+			  __m128d __B, const int __R)
 {
-  return (__m512i) __builtin_ia32_psrldi512_mask ((__v16si) __A, __B,
-						  (__v16si) __W,
-						  (__mmask16) __U);
+  return (__m128d) __builtin_ia32_getexpsd_mask_round ((__v2df) __A,
+						 (__v2df) __B,
+						 (__v2df) __W,
+						 (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_srli_epi32 (__mmask16 __U, __m512i __A, unsigned int __B)
+_mm_maskz_getexp_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
+			   const int __R)
 {
-  return (__m512i) __builtin_ia32_psrldi512_mask ((__v16si) __A, __B,
-						  (__v16si)
-						  _mm512_setzero_si512 (),
-						  (__mmask16) __U);
+  return (__m128d) __builtin_ia32_getexpsd_mask_round ((__v2df) __A,
+						 (__v2df) __B,
+						 (__v2df)
+						 _mm_setzero_pd (),
+						 (__mmask8) __U, __R);
 }
-#else
-#define _mm512_srli_epi32(X, C)						  \
-  ((__m512i) __builtin_ia32_psrldi512_mask ((__v16si)(__m512i)(X),	  \
-    (unsigned int)(C),							  \
-    (__v16si)(__m512i)_mm512_undefined_epi32 (),\
-    (__mmask16)-1))
-
-#define _mm512_mask_srli_epi32(W, U, X, C)				  \
-  ((__m512i) __builtin_ia32_psrldi512_mask ((__v16si)(__m512i)(X),	  \
-    (unsigned int)(C),							  \
-    (__v16si)(__m512i)(W),						  \
-    (__mmask16)(U)))
-
-#define _mm512_maskz_srli_epi32(U, X, C)				  \
-  ((__m512i) __builtin_ia32_psrldi512_mask ((__v16si)(__m512i)(X),	  \
-    (unsigned int)(C),							  \
-    (__v16si)(__m512i)_mm512_setzero_si512 (),				  \
-    (__mmask16)(U)))
-#endif
 
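+/* getmant: extract the normalized mantissa of the low element of __B;
+   __C selects the normalization interval, __D the sign treatment.  */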
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_srl_epi32 (__m512i __A, __m128i __B)
+_mm_getmant_round_sd (__m128d __A, __m128d __B,
+		      _MM_MANTISSA_NORM_ENUM __C,
+		      _MM_MANTISSA_SIGN_ENUM __D, const int __R)
 {
-  return (__m512i) __builtin_ia32_psrld512_mask ((__v16si) __A,
-						 (__v4si) __B,
-						 (__v16si)
-						 _mm512_undefined_epi32 (),
-						 (__mmask16) -1);
+  return (__m128d) __builtin_ia32_getmantsd_round ((__v2df) __A,
+						    (__v2df) __B,
+						    (__D << 2) | __C,
+						    __R);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_srl_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m128i __B)
+_mm_mask_getmant_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
+			      __m128d __B, _MM_MANTISSA_NORM_ENUM __C,
+			      _MM_MANTISSA_SIGN_ENUM __D, const int __R)
 {
-  return (__m512i) __builtin_ia32_psrld512_mask ((__v16si) __A,
-						 (__v4si) __B,
-						 (__v16si) __W,
-						 (__mmask16) __U);
+  return (__m128d) __builtin_ia32_getmantsd_mask_round ((__v2df) __A,
+							 (__v2df) __B,
+							 (__D << 2) | __C,
+							 (__v2df) __W,
+							 __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_srl_epi32 (__mmask16 __U, __m512i __A, __m128i __B)
+_mm_maskz_getmant_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
+			       _MM_MANTISSA_NORM_ENUM __C,
+			       _MM_MANTISSA_SIGN_ENUM __D, const int __R)
 {
-  return (__m512i) __builtin_ia32_psrld512_mask ((__v16si) __A,
-						 (__v4si) __B,
-						 (__v16si)
-						 _mm512_setzero_si512 (),
-						 (__mmask16) __U);
-}
-
-#ifdef __OPTIMIZE__
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_srai_epi32 (__m512i __A, unsigned int __B)
-{
-  return (__m512i) __builtin_ia32_psradi512_mask ((__v16si) __A, __B,
-						  (__v16si)
-						  _mm512_undefined_epi32 (),
-						  (__mmask16) -1);
-}
-
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_srai_epi32 (__m512i __W, __mmask16 __U, __m512i __A,
-			unsigned int __B)
-{
-  return (__m512i) __builtin_ia32_psradi512_mask ((__v16si) __A, __B,
-						  (__v16si) __W,
-						  (__mmask16) __U);
-}
-
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_srai_epi32 (__mmask16 __U, __m512i __A, unsigned int __B)
-{
-  return (__m512i) __builtin_ia32_psradi512_mask ((__v16si) __A, __B,
-						  (__v16si)
-						  _mm512_setzero_si512 (),
-						  (__mmask16) __U);
-}
-#else
-#define _mm512_srai_epi32(X, C)						\
-  ((__m512i) __builtin_ia32_psradi512_mask ((__v16si)(__m512i)(X),	\
-    (unsigned int)(C),							\
-    (__v16si)(__m512i)_mm512_undefined_epi32 (),\
-    (__mmask16)-1))
-
-#define _mm512_mask_srai_epi32(W, U, X, C)				\
-  ((__m512i) __builtin_ia32_psradi512_mask ((__v16si)(__m512i)(X),	\
-    (unsigned int)(C),							\
-    (__v16si)(__m512i)(W),						\
-    (__mmask16)(U)))
-
-#define _mm512_maskz_srai_epi32(U, X, C)				\
-  ((__m512i) __builtin_ia32_psradi512_mask ((__v16si)(__m512i)(X),	\
-    (unsigned int)(C),							\
-    (__v16si)(__m512i)_mm512_setzero_si512 (),				\
-    (__mmask16)(U)))
-#endif
-
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sra_epi32 (__m512i __A, __m128i __B)
-{
-  return (__m512i) __builtin_ia32_psrad512_mask ((__v16si) __A,
-						 (__v4si) __B,
-						 (__v16si)
-						 _mm512_undefined_epi32 (),
-						 (__mmask16) -1);
-}
-
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sra_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m128i __B)
-{
-  return (__m512i) __builtin_ia32_psrad512_mask ((__v16si) __A,
-						 (__v4si) __B,
-						 (__v16si) __W,
-						 (__mmask16) __U);
-}
-
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sra_epi32 (__mmask16 __U, __m512i __A, __m128i __B)
-{
-  return (__m512i) __builtin_ia32_psrad512_mask ((__v16si) __A,
-						 (__v4si) __B,
-						 (__v16si)
-						 _mm512_setzero_si512 (),
-						 (__mmask16) __U);
+  return (__m128d) __builtin_ia32_getmantsd_mask_round ((__v2df) __A,
+							 (__v2df) __B,
+							 (__D << 2) | __C,
+							 (__v2df)
+							 _mm_setzero_pd (),
+							 __U, __R);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m128d
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_add_round_sd (__m128d __A, __m128d __B, const int __R)
+_mm_getmant_round_ss (__m128 __A, __m128 __B,
+		      _MM_MANTISSA_NORM_ENUM __C,
+		      _MM_MANTISSA_SIGN_ENUM __D, const int __R)
 {
-  return (__m128d) __builtin_ia32_addsd_round ((__v2df) __A,
-					       (__v2df) __B,
-					       __R);
+  return (__m128) __builtin_ia32_getmantss_round ((__v4sf) __A,
+						  (__v4sf) __B,
+						  (__D << 2) | __C,
+						  __R);
 }
 
-extern __inline __m128d
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_add_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
-			  __m128d __B, const int __R)
+_mm_mask_getmant_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
+			      __m128 __B, _MM_MANTISSA_NORM_ENUM __C,
+			      _MM_MANTISSA_SIGN_ENUM __D, const int __R)
 {
-  return (__m128d) __builtin_ia32_addsd_mask_round ((__v2df) __A,
-						 (__v2df) __B,
-						 (__v2df) __W,
-						 (__mmask8) __U, __R);
+  return (__m128) __builtin_ia32_getmantss_mask_round ((__v4sf) __A,
+							(__v4sf) __B,
+							(__D << 2) | __C,
+							(__v4sf) __W,
+							__U, __R);
 }
 
-extern __inline __m128d
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_add_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
-			   const int __R)
+_mm_maskz_getmant_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
+			       _MM_MANTISSA_NORM_ENUM __C,
+			       _MM_MANTISSA_SIGN_ENUM __D, const int __R)
 {
-  return (__m128d) __builtin_ia32_addsd_mask_round ((__v2df) __A,
-						 (__v2df) __B,
-						 (__v2df)
-						 _mm_setzero_pd (),
-						 (__mmask8) __U, __R);
+  return (__m128) __builtin_ia32_getmantss_mask_round ((__v4sf) __A,
+							(__v4sf) __B,
+							(__D << 2) | __C,
+							(__v4sf)
+							_mm_setzero_ps (),
+							__U, __R);
 }
 
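+/* roundscale: round the low element of __B, keeping the number of
+   fraction bits given in __imm; the upper elements come from __A.  */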
 extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_add_round_ss (__m128 __A, __m128 __B, const int __R)
+_mm_roundscale_round_ss (__m128 __A, __m128 __B, const int __imm,
+			 const int __R)
 {
-  return (__m128) __builtin_ia32_addss_round ((__v4sf) __A,
-					      (__v4sf) __B,
-					      __R);
+  return (__m128)
+    __builtin_ia32_rndscaless_mask_round ((__v4sf) __A,
+					  (__v4sf) __B, __imm,
+					  (__v4sf)
+					  _mm_setzero_ps (),
+					  (__mmask8) -1,
+					  __R);
 }
 
 extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_add_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
-			  __m128 __B, const int __R)
+_mm_mask_roundscale_round_ss (__m128 __A, __mmask8 __B, __m128 __C,
+			      __m128 __D, const int __imm, const int __R)
 {
-  return (__m128) __builtin_ia32_addss_mask_round ((__v4sf) __A,
-						 (__v4sf) __B,
-						 (__v4sf) __W,
-						 (__mmask8) __U, __R);
+  return (__m128)
+    __builtin_ia32_rndscaless_mask_round ((__v4sf) __C,
+					  (__v4sf) __D, __imm,
+					  (__v4sf) __A,
+					  (__mmask8) __B,
+					  __R);
 }
 
 extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_add_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
-			   const int __R)
+_mm_maskz_roundscale_round_ss (__mmask8 __A, __m128 __B, __m128 __C,
+			       const int __imm, const int __R)
 {
-  return (__m128) __builtin_ia32_addss_mask_round ((__v4sf) __A,
-						 (__v4sf) __B,
-						 (__v4sf)
-						 _mm_setzero_ps (),
-						 (__mmask8) __U, __R);
+  return (__m128)
+    __builtin_ia32_rndscaless_mask_round ((__v4sf) __B,
+					  (__v4sf) __C, __imm,
+					  (__v4sf)
+					  _mm_setzero_ps (),
+					  (__mmask8) __A,
+					  __R);
 }
 
 extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_sub_round_sd (__m128d __A, __m128d __B, const int __R)
+_mm_roundscale_round_sd (__m128d __A, __m128d __B, const int __imm,
+			 const int __R)
 {
-  return (__m128d) __builtin_ia32_subsd_round ((__v2df) __A,
-					       (__v2df) __B,
-					       __R);
+  return (__m128d)
+    __builtin_ia32_rndscalesd_mask_round ((__v2df) __A,
+					  (__v2df) __B, __imm,
+					  (__v2df)
+					  _mm_setzero_pd (),
+					  (__mmask8) -1,
+					  __R);
 }
 
 extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_sub_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
-			  __m128d __B, const int __R)
+_mm_mask_roundscale_round_sd (__m128d __A, __mmask8 __B, __m128d __C,
+			      __m128d __D, const int __imm, const int __R)
 {
-  return (__m128d) __builtin_ia32_subsd_mask_round ((__v2df) __A,
-						 (__v2df) __B,
-						 (__v2df) __W,
-						 (__mmask8) __U, __R);
+  return (__m128d)
+    __builtin_ia32_rndscalesd_mask_round ((__v2df) __C,
+					  (__v2df) __D, __imm,
+					  (__v2df) __A,
+					  (__mmask8) __B,
+					  __R);
 }
 
 extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_sub_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
-			   const int __R)
+_mm_maskz_roundscale_round_sd (__mmask8 __A, __m128d __B, __m128d __C,
+			       const int __imm, const int __R)
 {
-  return (__m128d) __builtin_ia32_subsd_mask_round ((__v2df) __A,
-						 (__v2df) __B,
-						 (__v2df)
-						 _mm_setzero_pd (),
-						 (__mmask8) __U, __R);
+  return (__m128d)
+    __builtin_ia32_rndscalesd_mask_round ((__v2df) __B,
+					  (__v2df) __C, __imm,
+					  (__v2df)
+					  _mm_setzero_pd (),
+					  (__mmask8) __A,
+					  __R);
 }
 
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_sub_round_ss (__m128 __A, __m128 __B, const int __R)
-{
-  return (__m128) __builtin_ia32_subss_round ((__v4sf) __A,
-					      (__v4sf) __B,
-					      __R);
-}
+#else
+#define _mm_cvt_roundsd_u32(A, B)   \
+    ((unsigned)__builtin_ia32_vcvtsd2usi32(A, B))
 
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_sub_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
-			  __m128 __B, const int __R)
-{
-  return (__m128) __builtin_ia32_subss_mask_round ((__v4sf) __A,
-						 (__v4sf) __B,
-						 (__v4sf) __W,
-						 (__mmask8) __U, __R);
-}
+#define _mm_cvt_roundsd_si32(A, B)   \
+    ((int)__builtin_ia32_vcvtsd2si32(A, B))
 
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_sub_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
-			   const int __R)
-{
-  return (__m128) __builtin_ia32_subss_mask_round ((__v4sf) __A,
-						 (__v4sf) __B,
-						 (__v4sf)
-						 _mm_setzero_ps (),
-						 (__mmask8) __U, __R);
-}
+#define _mm_cvt_roundsd_i32(A, B)   \
+    ((int)__builtin_ia32_vcvtsd2si32(A, B))
 
-#else
-#define _mm_add_round_sd(A, B, C)            \
-    (__m128d)__builtin_ia32_addsd_round(A, B, C)
+#define _mm_cvtt_roundsd_u32(A, B)   \
+    ((unsigned)__builtin_ia32_vcvttsd2usi32(A, B))
 
-#define _mm_mask_add_round_sd(W, U, A, B, C) \
-    (__m128d)__builtin_ia32_addsd_mask_round(A, B, W, U, C)
+#define _mm_cvtt_roundsd_si32(A, B)   \
+    ((int)__builtin_ia32_vcvttsd2si32(A, B))
 
-#define _mm_maskz_add_round_sd(U, A, B, C)   \
-    (__m128d)__builtin_ia32_addsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
+#define _mm_cvtt_roundsd_i32(A, B)   \
+    ((int)__builtin_ia32_vcvttsd2si32(A, B))
 
-#define _mm_add_round_ss(A, B, C)            \
-    (__m128)__builtin_ia32_addss_round(A, B, C)
+#define _mm_cvt_roundsd_ss(A, B, C)		 \
+    (__m128)__builtin_ia32_cvtsd2ss_round(A, B, C)
 
-#define _mm_mask_add_round_ss(W, U, A, B, C) \
-    (__m128)__builtin_ia32_addss_mask_round(A, B, W, U, C)
+#define _mm_mask_cvt_roundsd_ss(W, U, A, B, C)	\
+    (__m128)__builtin_ia32_cvtsd2ss_mask_round ((A), (B), (W), (U), (C))
 
-#define _mm_maskz_add_round_ss(U, A, B, C)   \
-    (__m128)__builtin_ia32_addss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
+#define _mm_maskz_cvt_roundsd_ss(U, A, B, C)	\
+    (__m128)__builtin_ia32_cvtsd2ss_mask_round ((A), (B), _mm_setzero_ps (), \
+						(U), (C))
 
-#define _mm_sub_round_sd(A, B, C)            \
-    (__m128d)__builtin_ia32_subsd_round(A, B, C)
+#define _mm_cvt_roundss_sd(A, B, C)		 \
+    (__m128d)__builtin_ia32_cvtss2sd_round(A, B, C)
 
-#define _mm_mask_sub_round_sd(W, U, A, B, C) \
-    (__m128d)__builtin_ia32_subsd_mask_round(A, B, W, U, C)
+#define _mm_mask_cvt_roundss_sd(W, U, A, B, C)	\
+    (__m128d)__builtin_ia32_cvtss2sd_mask_round ((A), (B), (W), (U), (C))
 
-#define _mm_maskz_sub_round_sd(U, A, B, C)   \
-    (__m128d)__builtin_ia32_subsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
+#define _mm_maskz_cvt_roundss_sd(U, A, B, C)	\
+    (__m128d)__builtin_ia32_cvtss2sd_mask_round ((A), (B), _mm_setzero_pd (), \
+						 (U), (C))
 
-#define _mm_sub_round_ss(A, B, C)            \
-    (__m128)__builtin_ia32_subss_round(A, B, C)
+#define _mm_getmant_round_sd(X, Y, C, D, R)				\
+  ((__m128d)__builtin_ia32_getmantsd_round ((__v2df)(__m128d)(X),	\
+					    (__v2df)(__m128d)(Y),	\
+					    (int)(((D)<<2) | (C)),	\
+					    (R)))
 
-#define _mm_mask_sub_round_ss(W, U, A, B, C) \
-    (__m128)__builtin_ia32_subss_mask_round(A, B, W, U, C)
+#define _mm_mask_getmant_round_sd(W, U, X, Y, C, D, R)			\
+  ((__m128d)__builtin_ia32_getmantsd_mask_round ((__v2df)(__m128d)(X),	\
+						 (__v2df)(__m128d)(Y),	\
+						 (int)(((D)<<2) | (C)),	\
+						 (__v2df)(__m128d)(W),	\
+						 (__mmask8)(U),		\
+						 (R)))
 
-#define _mm_maskz_sub_round_ss(U, A, B, C)   \
-    (__m128)__builtin_ia32_subss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
+#define _mm_maskz_getmant_round_sd(U, X, Y, C, D, R)			\
+  ((__m128d)__builtin_ia32_getmantsd_mask_round ((__v2df)(__m128d)(X),	\
+						 (__v2df)(__m128d)(Y),	\
+						 (int)(((D)<<2) | (C)),	\
+						 (__v2df)(__m128d)	\
+						 _mm_setzero_pd (),	\
+						 (__mmask8)(U),		\
+						 (R)))
+
+#define _mm_getmant_round_ss(X, Y, C, D, R)				\
+  ((__m128)__builtin_ia32_getmantss_round ((__v4sf)(__m128)(X),	\
+					   (__v4sf)(__m128)(Y),		\
+					   (int)(((D)<<2) | (C)),	\
+					   (R)))
+
+#define _mm_mask_getmant_round_ss(W, U, X, Y, C, D, R)			\
+  ((__m128)__builtin_ia32_getmantss_mask_round ((__v4sf)(__m128)(X),	\
+						(__v4sf)(__m128)(Y),	\
+						(int)(((D)<<2) | (C)),	\
+						(__v4sf)(__m128)(W),	\
+						(__mmask8)(U),		\
+						(R)))
+
+#define _mm_maskz_getmant_round_ss(U, X, Y, C, D, R)			\
+  ((__m128)__builtin_ia32_getmantss_mask_round ((__v4sf)(__m128)(X),	\
+						(__v4sf)(__m128)(Y),	\
+						(int)(((D)<<2) | (C)),	\
+						(__v4sf)(__m128)	\
+						_mm_setzero_ps (),	\
+						(__mmask8)(U),		\
+						(R)))
+
+#define _mm_getexp_round_ss(A, B, R)						      \
+  ((__m128)__builtin_ia32_getexpss128_round((__v4sf)(__m128)(A), (__v4sf)(__m128)(B), R))
+
+#define _mm_mask_getexp_round_ss(W, U, A, B, C) \
+    (__m128)__builtin_ia32_getexpss_mask_round(A, B, W, U, C)
+
+#define _mm_maskz_getexp_round_ss(U, A, B, C)   \
+    (__m128)__builtin_ia32_getexpss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
+
+#define _mm_getexp_round_sd(A, B, R)						       \
+  ((__m128d)__builtin_ia32_getexpsd128_round((__v2df)(__m128d)(A), (__v2df)(__m128d)(B), R))
+
+#define _mm_mask_getexp_round_sd(W, U, A, B, C) \
+    (__m128d)__builtin_ia32_getexpsd_mask_round(A, B, W, U, C)
+
+#define _mm_maskz_getexp_round_sd(U, A, B, C)   \
+    (__m128d)__builtin_ia32_getexpsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
+
+#define _mm_roundscale_round_ss(A, B, I, R)				\
+  ((__m128)								\
+   __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (A),		\
+					 (__v4sf) (__m128) (B),		\
+					 (int) (I),			\
+					 (__v4sf) _mm_setzero_ps (),	\
+					 (__mmask8) (-1),		\
+					 (int) (R)))
+#define _mm_mask_roundscale_round_ss(A, U, B, C, I, R)		\
+  ((__m128)							\
+   __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (B),	\
+					 (__v4sf) (__m128) (C),	\
+					 (int) (I),		\
+					 (__v4sf) (__m128) (A),	\
+					 (__mmask8) (U),	\
+					 (int) (R)))
+#define _mm_maskz_roundscale_round_ss(U, A, B, I, R)			\
+  ((__m128)								\
+   __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (A),		\
+					 (__v4sf) (__m128) (B),		\
+					 (int) (I),			\
+					 (__v4sf) _mm_setzero_ps (),	\
+					 (__mmask8) (U),		\
+					 (int) (R)))
+#define _mm_roundscale_round_sd(A, B, I, R)				\
+  ((__m128d)								\
+   __builtin_ia32_rndscalesd_mask_round ((__v2df) (__m128d) (A),	\
+					 (__v2df) (__m128d) (B),	\
+					 (int) (I),			\
+					 (__v2df) _mm_setzero_pd (),	\
+					 (__mmask8) (-1),		\
+					 (int) (R)))
+#define _mm_mask_roundscale_round_sd(A, U, B, C, I, R)			\
+  ((__m128d)								\
+   __builtin_ia32_rndscalesd_mask_round ((__v2df) (__m128d) (B),	\
+					 (__v2df) (__m128d) (C),	\
+					 (int) (I),			\
+					 (__v2df) (__m128d) (A),	\
+					 (__mmask8) (U),		\
+					 (int) (R)))
+#define _mm_maskz_roundscale_round_sd(U, A, B, I, R)			\
+  ((__m128d)								\
+   __builtin_ia32_rndscalesd_mask_round ((__v2df) (__m128d) (A),	\
+					 (__v2df) (__m128d) (B),	\
+					 (int) (I),			\
+					 (__v2df) _mm_setzero_pd (),	\
+					 (__mmask8) (U),		\
+					 (int) (R)))
 
 #endif
 
-/* Constant helper to represent the ternary logic operations among
-   vector A, B and C.  */
-typedef enum
-{
-  _MM_TERNLOG_A = 0xF0,
-  _MM_TERNLOG_B = 0xCC,
-  _MM_TERNLOG_C = 0xAA
-} _MM_TERNLOG_ENUM;
+#define _mm_mask_cvtss_sd(W, U, A, B) \
+    _mm_mask_cvt_roundss_sd ((W), (U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+
+#define _mm_maskz_cvtss_sd(U, A, B) \
+    _mm_maskz_cvt_roundss_sd ((U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+
+#define _mm_mask_cvtsd_ss(W, U, A, B) \
+    _mm_mask_cvt_roundsd_ss ((W), (U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+
+#define _mm_maskz_cvtsd_ss(U, A, B) \
+    _mm_maskz_cvt_roundsd_ss ((U), (A), (B), _MM_FROUND_CUR_DIRECTION)
 
 #ifdef __OPTIMIZE__
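+/* Shift the 16-bit mask __A left/right by __B bit positions.  */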
-extern __inline __m512i
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_ternarylogic_epi64 (__m512i __A, __m512i __B, __m512i __C,
-			   const int __imm)
+_kshiftli_mask16 (__mmask16 __A, unsigned int __B)
 {
-  return (__m512i)
-    __builtin_ia32_pternlogq512_mask ((__v8di) __A,
-				      (__v8di) __B,
-				      (__v8di) __C,
-				      (unsigned char) __imm,
-				      (__mmask8) -1);
+  return (__mmask16) __builtin_ia32_kshiftlihi ((__mmask16) __A,
+						(__mmask8) __B);
 }
 
-extern __inline __m512i
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_ternarylogic_epi64 (__m512i __A, __mmask8 __U, __m512i __B,
-				__m512i __C, const int __imm)
+_kshiftri_mask16 (__mmask16 __A, unsigned int __B)
 {
-  return (__m512i)
-    __builtin_ia32_pternlogq512_mask ((__v8di) __A,
-				      (__v8di) __B,
-				      (__v8di) __C,
-				      (unsigned char) __imm,
-				      (__mmask8) __U);
+  return (__mmask16) __builtin_ia32_kshiftrihi ((__mmask16) __A,
+						(__mmask8) __B);
 }
 
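+/* Compare the low elements of __X and __Y using predicate __P, giving
+   the result as a one-bit mask; __R supplies the SAE control.  */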
-extern __inline __m512i
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_ternarylogic_epi64 (__mmask8 __U, __m512i __A, __m512i __B,
-				 __m512i __C, const int __imm)
+_mm_cmp_round_sd_mask (__m128d __X, __m128d __Y, const int __P, const int __R)
 {
-  return (__m512i)
-    __builtin_ia32_pternlogq512_maskz ((__v8di) __A,
-				       (__v8di) __B,
-				       (__v8di) __C,
-				       (unsigned char) __imm,
-				       (__mmask8) __U);
+  return (__mmask8) __builtin_ia32_cmpsd_mask ((__v2df) __X,
+					       (__v2df) __Y, __P,
+					       (__mmask8) -1, __R);
 }
 
-extern __inline __m512i
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_ternarylogic_epi32 (__m512i __A, __m512i __B, __m512i __C,
-			   const int __imm)
+_mm_mask_cmp_round_sd_mask (__mmask8 __M, __m128d __X, __m128d __Y,
+			    const int __P, const int __R)
 {
-  return (__m512i)
-    __builtin_ia32_pternlogd512_mask ((__v16si) __A,
-				      (__v16si) __B,
-				      (__v16si) __C,
-				      (unsigned char) __imm,
-				      (__mmask16) -1);
+  return (__mmask8) __builtin_ia32_cmpsd_mask ((__v2df) __X,
+					       (__v2df) __Y, __P,
+					       (__mmask8) __M, __R);
 }
 
-extern __inline __m512i
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_ternarylogic_epi32 (__m512i __A, __mmask16 __U, __m512i __B,
-				__m512i __C, const int __imm)
+_mm_cmp_round_ss_mask (__m128 __X, __m128 __Y, const int __P, const int __R)
 {
-  return (__m512i)
-    __builtin_ia32_pternlogd512_mask ((__v16si) __A,
-				      (__v16si) __B,
-				      (__v16si) __C,
-				      (unsigned char) __imm,
-				      (__mmask16) __U);
+  return (__mmask8) __builtin_ia32_cmpss_mask ((__v4sf) __X,
+					       (__v4sf) __Y, __P,
+					       (__mmask8) -1, __R);
 }
 
-extern __inline __m512i
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_ternarylogic_epi32 (__mmask16 __U, __m512i __A, __m512i __B,
-				 __m512i __C, const int __imm)
+_mm_mask_cmp_round_ss_mask (__mmask8 __M, __m128 __X, __m128 __Y,
+			    const int __P, const int __R)
 {
-  return (__m512i)
-    __builtin_ia32_pternlogd512_maskz ((__v16si) __A,
-				       (__v16si) __B,
-				       (__v16si) __C,
-				       (unsigned char) __imm,
-				       (__mmask16) __U);
+  return (__mmask8) __builtin_ia32_cmpss_mask ((__v4sf) __X,
+					       (__v4sf) __Y, __P,
+					       (__mmask8) __M, __R);
 }
+
 #else
-#define _mm512_ternarylogic_epi64(A, B, C, I)			\
-  ((__m512i)							\
-   __builtin_ia32_pternlogq512_mask ((__v8di) (__m512i) (A),	\
-				     (__v8di) (__m512i) (B),	\
-				     (__v8di) (__m512i) (C),	\
-				     (unsigned char) (I),	\
-				     (__mmask8) -1))
-#define _mm512_mask_ternarylogic_epi64(A, U, B, C, I)		\
-  ((__m512i)							\
-   __builtin_ia32_pternlogq512_mask ((__v8di) (__m512i) (A),	\
-				     (__v8di) (__m512i) (B),	\
-				     (__v8di) (__m512i) (C),	\
-				     (unsigned char)(I),	\
-				     (__mmask8) (U)))
-#define _mm512_maskz_ternarylogic_epi64(U, A, B, C, I)		\
-  ((__m512i)							\
-   __builtin_ia32_pternlogq512_maskz ((__v8di) (__m512i) (A),	\
-				      (__v8di) (__m512i) (B),	\
-				      (__v8di) (__m512i) (C),	\
-				      (unsigned char) (I),	\
-				      (__mmask8) (U)))
-#define _mm512_ternarylogic_epi32(A, B, C, I)			\
-  ((__m512i)							\
-   __builtin_ia32_pternlogd512_mask ((__v16si) (__m512i) (A),	\
-				     (__v16si) (__m512i) (B),	\
-				     (__v16si) (__m512i) (C),	\
-				     (unsigned char) (I),	\
-				     (__mmask16) -1))
-#define _mm512_mask_ternarylogic_epi32(A, U, B, C, I)		\
-  ((__m512i)							\
-   __builtin_ia32_pternlogd512_mask ((__v16si) (__m512i) (A),	\
-				     (__v16si) (__m512i) (B),	\
-				     (__v16si) (__m512i) (C),	\
-				     (unsigned char) (I),	\
-				     (__mmask16) (U)))
-#define _mm512_maskz_ternarylogic_epi32(U, A, B, C, I)		\
-  ((__m512i)							\
-   __builtin_ia32_pternlogd512_maskz ((__v16si) (__m512i) (A),	\
-				      (__v16si) (__m512i) (B),	\
-				      (__v16si) (__m512i) (C),	\
-				      (unsigned char) (I),	\
-				      (__mmask16) (U)))
-#endif
+#define _kshiftli_mask16(X, Y)						\
+  ((__mmask16) __builtin_ia32_kshiftlihi ((__mmask16)(X), (__mmask8)(Y)))
 
-extern __inline __m512d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_rcp14_pd (__m512d __A)
-{
-  return (__m512d) __builtin_ia32_rcp14pd512_mask ((__v8df) __A,
-						   (__v8df)
-						   _mm512_undefined_pd (),
-						   (__mmask8) -1);
-}
+#define _kshiftri_mask16(X, Y)						\
+  ((__mmask16) __builtin_ia32_kshiftrihi ((__mmask16)(X), (__mmask8)(Y)))
 
-extern __inline __m512d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_rcp14_pd (__m512d __W, __mmask8 __U, __m512d __A)
-{
-  return (__m512d) __builtin_ia32_rcp14pd512_mask ((__v8df) __A,
-						   (__v8df) __W,
-						   (__mmask8) __U);
-}
+#define _mm_cmp_round_sd_mask(X, Y, P, R)				\
+  ((__mmask8) __builtin_ia32_cmpsd_mask ((__v2df)(__m128d)(X),		\
+					 (__v2df)(__m128d)(Y), (int)(P),\
+					 (__mmask8)-1, R))
 
-extern __inline __m512d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_rcp14_pd (__mmask8 __U, __m512d __A)
-{
-  return (__m512d) __builtin_ia32_rcp14pd512_mask ((__v8df) __A,
-						   (__v8df)
-						   _mm512_setzero_pd (),
-						   (__mmask8) __U);
-}
+#define _mm_mask_cmp_round_sd_mask(M, X, Y, P, R)			\
+  ((__mmask8) __builtin_ia32_cmpsd_mask ((__v2df)(__m128d)(X),		\
+					 (__v2df)(__m128d)(Y), (int)(P),\
+					 (M), R))
 
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_rcp14_ps (__m512 __A)
-{
-  return (__m512) __builtin_ia32_rcp14ps512_mask ((__v16sf) __A,
-						  (__v16sf)
-						  _mm512_undefined_ps (),
-						  (__mmask16) -1);
-}
+#define _mm_cmp_round_ss_mask(X, Y, P, R)				\
+  ((__mmask8) __builtin_ia32_cmpss_mask ((__v4sf)(__m128)(X),		\
+					 (__v4sf)(__m128)(Y), (int)(P), \
+					 (__mmask8)-1, R))
 
-extern __inline __m512
+#define _mm_mask_cmp_round_ss_mask(M, X, Y, P, R)			\
+  ((__mmask8) __builtin_ia32_cmpss_mask ((__v4sf)(__m128)(X),		\
+					 (__v4sf)(__m128)(Y), (int)(P), \
+					 (M), R))
+
+#endif
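+
+/* Both forms require __P and __R to be integer constants: under
+   __OPTIMIZE__ the always_inline functions fold them before the builtin
+   is expanded, while without optimization the macro fallbacks pass them
+   through textually.  A minimal usage sketch, assuming __x and __y are
+   __m128d values:
+
+     __mmask8 __k = _mm_cmp_round_sd_mask (__x, __y, _CMP_LT_OS,
+					   _MM_FROUND_NO_EXC);  */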
+
+extern __inline unsigned char
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_rcp14_ps (__m512 __W, __mmask16 __U, __m512 __A)
+_kortest_mask16_u8 (__mmask16 __A, __mmask16 __B, unsigned char *__CF)
 {
-  return (__m512) __builtin_ia32_rcp14ps512_mask ((__v16sf) __A,
-						  (__v16sf) __W,
-						  (__mmask16) __U);
+  *__CF = (unsigned char) __builtin_ia32_kortestchi (__A, __B);
+  return (unsigned char) __builtin_ia32_kortestzhi (__A, __B);
 }
 
-extern __inline __m512
+extern __inline unsigned char
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_rcp14_ps (__mmask16 __U, __m512 __A)
+_kortestz_mask16_u8 (__mmask16 __A, __mmask16 __B)
 {
-  return (__m512) __builtin_ia32_rcp14ps512_mask ((__v16sf) __A,
-						  (__v16sf)
-						  _mm512_setzero_ps (),
-						  (__mmask16) __U);
+  return (unsigned char) __builtin_ia32_kortestzhi ((__mmask16) __A,
+						    (__mmask16) __B);
 }
 
-extern __inline __m128d
+extern __inline unsigned char
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_rcp14_sd (__m128d __A, __m128d __B)
+_kortestc_mask16_u8 (__mmask16 __A, __mmask16 __B)
 {
-  return (__m128d) __builtin_ia32_rcp14sd ((__v2df) __B,
-					   (__v2df) __A);
+  return (unsigned char) __builtin_ia32_kortestchi ((__mmask16) __A,
+						    (__mmask16) __B);
 }
 
-extern __inline __m128d
+extern __inline unsigned int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_rcp14_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
+_cvtmask16_u32 (__mmask16 __A)
 {
-  return (__m128d) __builtin_ia32_rcp14sd_mask ((__v2df) __B,
-						(__v2df) __A,
-						(__v2df) __W,
-						(__mmask8) __U);
+  return (unsigned int) __builtin_ia32_kmovw ((__mmask16) __A);
 }
 
-extern __inline __m128d
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_rcp14_sd (__mmask8 __U, __m128d __A, __m128d __B)
+_cvtu32_mask16 (unsigned int __A)
 {
-  return (__m128d) __builtin_ia32_rcp14sd_mask ((__v2df) __B,
-						(__v2df) __A,
-						(__v2df) _mm_setzero_ps (),
-						(__mmask8) __U);
+  return (__mmask16) __builtin_ia32_kmovw ((__mmask16) __A);
 }
 
-extern __inline __m128
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_rcp14_ss (__m128 __A, __m128 __B)
+_load_mask16 (__mmask16 *__A)
 {
-  return (__m128) __builtin_ia32_rcp14ss ((__v4sf) __B,
-					  (__v4sf) __A);
+  return (__mmask16) __builtin_ia32_kmovw (*(__mmask16 *) __A);
 }
 
-extern __inline __m128
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_rcp14_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
+_store_mask16 (__mmask16 *__A, __mmask16 __B)
 {
-  return (__m128) __builtin_ia32_rcp14ss_mask ((__v4sf) __B,
-						(__v4sf) __A,
-						(__v4sf) __W,
-						(__mmask8) __U);
+  *(__mmask16 *) __A = __builtin_ia32_kmovw (__B);
 }
 
-extern __inline __m128
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_rcp14_ss (__mmask8 __U, __m128 __A, __m128 __B)
+_kand_mask16 (__mmask16 __A, __mmask16 __B)
 {
-  return (__m128) __builtin_ia32_rcp14ss_mask ((__v4sf) __B,
-						(__v4sf) __A,
-						(__v4sf) _mm_setzero_ps (),
-						(__mmask8) __U);
+  return (__mmask16) __builtin_ia32_kandhi ((__mmask16) __A, (__mmask16) __B);
 }
 
-extern __inline __m512d
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_rsqrt14_pd (__m512d __A)
+_kandn_mask16 (__mmask16 __A, __mmask16 __B)
 {
-  return (__m512d) __builtin_ia32_rsqrt14pd512_mask ((__v8df) __A,
-						     (__v8df)
-						     _mm512_undefined_pd (),
-						     (__mmask8) -1);
+  return (__mmask16) __builtin_ia32_kandnhi ((__mmask16) __A,
+					     (__mmask16) __B);
 }
 
-extern __inline __m512d
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_rsqrt14_pd (__m512d __W, __mmask8 __U, __m512d __A)
+_kor_mask16 (__mmask16 __A, __mmask16 __B)
 {
-  return (__m512d) __builtin_ia32_rsqrt14pd512_mask ((__v8df) __A,
-						     (__v8df) __W,
-						     (__mmask8) __U);
+  return (__mmask16) __builtin_ia32_korhi ((__mmask16) __A, (__mmask16) __B);
 }
 
-extern __inline __m512d
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_rsqrt14_pd (__mmask8 __U, __m512d __A)
+_kxnor_mask16 (__mmask16 __A, __mmask16 __B)
 {
-  return (__m512d) __builtin_ia32_rsqrt14pd512_mask ((__v8df) __A,
-						     (__v8df)
-						     _mm512_setzero_pd (),
-						     (__mmask8) __U);
+  return (__mmask16) __builtin_ia32_kxnorhi ((__mmask16) __A, (__mmask16) __B);
 }
 
-extern __inline __m512
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_rsqrt14_ps (__m512 __A)
+_kxor_mask16 (__mmask16 __A, __mmask16 __B)
 {
-  return (__m512) __builtin_ia32_rsqrt14ps512_mask ((__v16sf) __A,
-						    (__v16sf)
-						    _mm512_undefined_ps (),
-						    (__mmask16) -1);
+  return (__mmask16) __builtin_ia32_kxorhi ((__mmask16) __A, (__mmask16) __B);
 }
 
-extern __inline __m512
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_rsqrt14_ps (__m512 __W, __mmask16 __U, __m512 __A)
+_knot_mask16 (__mmask16 __A)
 {
-  return (__m512) __builtin_ia32_rsqrt14ps512_mask ((__v16sf) __A,
-						    (__v16sf) __W,
-						    (__mmask16) __U);
+  return (__mmask16) __builtin_ia32_knothi ((__mmask16) __A);
 }
 
-extern __inline __m512
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_rsqrt14_ps (__mmask16 __U, __m512 __A)
+_kunpackb_mask16 (__mmask8 __A, __mmask8 __B)
 {
-  return (__m512) __builtin_ia32_rsqrt14ps512_mask ((__v16sf) __A,
-						    (__v16sf)
-						    _mm512_setzero_ps (),
-						    (__mmask16) __U);
+  return (__mmask16) __builtin_ia32_kunpckhi ((__mmask16) __A, (__mmask16) __B);
 }
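+
+/* The mask16 helpers above map directly onto the AVX512F mask-register
+   instructions (KORTESTW, KMOVW, KANDW, KUNPCKBW, ...), so mask values
+   can stay in k-registers instead of round-tripping through GPRs.  A
+   minimal usage sketch, assuming __a and __b are __mmask16 values:
+
+     __mmask16 __m = _kor_mask16 (_knot_mask16 (__a), __b);
+     unsigned char __all_zero = _kortestz_mask16_u8 (__m, __m);  */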
 
+#ifdef __OPTIMIZE__
 extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_rsqrt14_sd (__m128d __A, __m128d __B)
+_mm_max_round_sd (__m128d __A, __m128d __B, const int __R)
 {
-  return (__m128d) __builtin_ia32_rsqrt14sd ((__v2df) __B,
-					     (__v2df) __A);
+  return (__m128d) __builtin_ia32_maxsd_round ((__v2df) __A,
+					       (__v2df) __B,
+					       __R);
 }
 
 extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_rsqrt14_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
+_mm_mask_max_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
+			  __m128d __B, const int __R)
 {
-  return (__m128d) __builtin_ia32_rsqrt14sd_mask ((__v2df) __B,
-						 (__v2df) __A,
+  return (__m128d) __builtin_ia32_maxsd_mask_round ((__v2df) __A,
+						 (__v2df) __B,
 						 (__v2df) __W,
-						 (__mmask8) __U);
+						 (__mmask8) __U, __R);
 }
 
 extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_rsqrt14_sd (__mmask8 __U, __m128d __A, __m128d __B)
+_mm_maskz_max_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
+			   const int __R)
 {
-  return (__m128d) __builtin_ia32_rsqrt14sd_mask ((__v2df) __B,
-						 (__v2df) __A,
-						 (__v2df) _mm_setzero_pd (),
-						 (__mmask8) __U);
+  return (__m128d) __builtin_ia32_maxsd_mask_round ((__v2df) __A,
+						 (__v2df) __B,
+						 (__v2df)
+						 _mm_setzero_pd (),
+						 (__mmask8) __U, __R);
 }
 
 extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_rsqrt14_ss (__m128 __A, __m128 __B)
+_mm_max_round_ss (__m128 __A, __m128 __B, const int __R)
 {
-  return (__m128) __builtin_ia32_rsqrt14ss ((__v4sf) __B,
-					    (__v4sf) __A);
+  return (__m128) __builtin_ia32_maxss_round ((__v4sf) __A,
+					      (__v4sf) __B,
+					      __R);
 }
 
 extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_rsqrt14_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
-{
-  return (__m128) __builtin_ia32_rsqrt14ss_mask ((__v4sf) __B,
-						 (__v4sf) __A,
+_mm_mask_max_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
+			  __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_maxss_mask_round ((__v4sf) __A,
+						 (__v4sf) __B,
 						 (__v4sf) __W,
-						 (__mmask8) __U);
+						 (__mmask8) __U, __R);
 }
 
 extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_rsqrt14_ss (__mmask8 __U, __m128 __A, __m128 __B)
+_mm_maskz_max_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
+			   const int __R)
 {
-  return (__m128) __builtin_ia32_rsqrt14ss_mask ((__v4sf) __B,
-						(__v4sf) __A,
-						(__v4sf) _mm_setzero_ps (),
-						(__mmask8) __U);
+  return (__m128) __builtin_ia32_maxss_mask_round ((__v4sf) __A,
+						 (__v4sf) __B,
+						 (__v4sf)
+						 _mm_setzero_ps (),
+						 (__mmask8) __U, __R);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m512d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sqrt_round_pd (__m512d __A, const int __R)
+_mm_min_round_sd (__m128d __A, __m128d __B, const int __R)
 {
-  return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A,
-						  (__v8df)
-						  _mm512_undefined_pd (),
-						  (__mmask8) -1, __R);
+  return (__m128d) __builtin_ia32_minsd_round ((__v2df) __A,
+					       (__v2df) __B,
+					       __R);
 }
 
-extern __inline __m512d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sqrt_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
-			   const int __R)
+_mm_mask_min_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
+			  __m128d __B, const int __R)
 {
-  return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A,
-						  (__v8df) __W,
-						  (__mmask8) __U, __R);
+  return (__m128d) __builtin_ia32_minsd_mask_round ((__v2df) __A,
+						 (__v2df) __B,
+						 (__v2df) __W,
+						 (__mmask8) __U, __R);
 }
 
-extern __inline __m512d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sqrt_round_pd (__mmask8 __U, __m512d __A, const int __R)
+_mm_maskz_min_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
+			   const int __R)
 {
-  return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A,
-						  (__v8df)
-						  _mm512_setzero_pd (),
-						  (__mmask8) __U, __R);
+  return (__m128d) __builtin_ia32_minsd_mask_round ((__v2df) __A,
+						 (__v2df) __B,
+						 (__v2df)
+						 _mm_setzero_pd (),
+						 (__mmask8) __U, __R);
 }
 
-extern __inline __m512
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sqrt_round_ps (__m512 __A, const int __R)
+_mm_min_round_ss (__m128 __A, __m128 __B, const int __R)
 {
-  return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A,
-						 (__v16sf)
-						 _mm512_undefined_ps (),
-						 (__mmask16) -1, __R);
+  return (__m128) __builtin_ia32_minss_round ((__v4sf) __A,
+					      (__v4sf) __B,
+					      __R);
 }
 
-extern __inline __m512
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sqrt_round_ps (__m512 __W, __mmask16 __U, __m512 __A, const int __R)
+_mm_mask_min_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
+			  __m128 __B, const int __R)
 {
-  return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A,
-						 (__v16sf) __W,
-						 (__mmask16) __U, __R);
+  return (__m128) __builtin_ia32_minss_mask_round ((__v4sf) __A,
+						 (__v4sf) __B,
+						 (__v4sf) __W,
+						 (__mmask8) __U, __R);
 }
 
-extern __inline __m512
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sqrt_round_ps (__mmask16 __U, __m512 __A, const int __R)
+_mm_maskz_min_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
+			   const int __R)
 {
-  return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A,
-						 (__v16sf)
-						 _mm512_setzero_ps (),
-						 (__mmask16) __U, __R);
+  return (__m128) __builtin_ia32_minss_mask_round ((__v4sf) __A,
+						 (__v4sf) __B,
+						 (__v4sf)
+						 _mm_setzero_ps (),
+						 (__mmask8) __U, __R);
 }
 
+#else
+#define _mm_max_round_sd(A, B, C)            \
+    (__m128d)__builtin_ia32_maxsd_round(A, B, C)
+
+#define _mm_mask_max_round_sd(W, U, A, B, C) \
+    (__m128d)__builtin_ia32_maxsd_mask_round(A, B, W, U, C)
+
+#define _mm_maskz_max_round_sd(U, A, B, C)   \
+    (__m128d)__builtin_ia32_maxsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
+
+#define _mm_max_round_ss(A, B, C)            \
+    (__m128)__builtin_ia32_maxss_round(A, B, C)
+
+#define _mm_mask_max_round_ss(W, U, A, B, C) \
+    (__m128)__builtin_ia32_maxss_mask_round(A, B, W, U, C)
+
+#define _mm_maskz_max_round_ss(U, A, B, C)   \
+    (__m128)__builtin_ia32_maxss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
+
+#define _mm_min_round_sd(A, B, C)            \
+    (__m128d)__builtin_ia32_minsd_round(A, B, C)
+
+#define _mm_mask_min_round_sd(W, U, A, B, C) \
+    (__m128d)__builtin_ia32_minsd_mask_round(A, B, W, U, C)
+
+#define _mm_maskz_min_round_sd(U, A, B, C)   \
+    (__m128d)__builtin_ia32_minsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
+
+#define _mm_min_round_ss(A, B, C)            \
+    (__m128)__builtin_ia32_minss_round(A, B, C)
+
+#define _mm_mask_min_round_ss(W, U, A, B, C) \
+    (__m128)__builtin_ia32_minss_mask_round(A, B, W, U, C)
+
+#define _mm_maskz_min_round_ss(U, A, B, C)   \
+    (__m128)__builtin_ia32_minss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
+
+#endif
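+
+/* VMINS[SD]/VMAXS[SD] never round their result, so the __R argument of
+   the min/max wrappers above only controls exception behaviour: use
+   _MM_FROUND_CUR_DIRECTION for normal reporting or _MM_FROUND_NO_EXC to
+   suppress exceptions (SAE).  A minimal sketch, assuming __a and __b
+   are __m128d values:
+
+     __m128d __hi = _mm_max_round_sd (__a, __b, _MM_FROUND_NO_EXC);  */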
+
+#ifdef __OPTIMIZE__
 extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_sqrt_round_sd (__m128d __A, __m128d __B, const int __R)
+_mm_fmadd_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R)
 {
-  return (__m128d) __builtin_ia32_sqrtsd_mask_round ((__v2df) __B,
-						     (__v2df) __A,
-						     (__v2df)
-						     _mm_setzero_pd (),
-						     (__mmask8) -1, __R);
+  return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W,
+						   (__v2df) __A,
+						   (__v2df) __B,
+						   __R);
 }
 
-extern __inline __m128d
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_sqrt_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
-			const int __R)
+_mm_fmadd_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R)
 {
-  return (__m128d) __builtin_ia32_sqrtsd_mask_round ((__v2df) __B,
-						     (__v2df) __A,
-						     (__v2df) __W,
-						     (__mmask8) __U, __R);
+  return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W,
+						  (__v4sf) __A,
+						  (__v4sf) __B,
+						  __R);
 }
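+
+/* All four scalar FMA flavours in this block are composed from the
+   single vfmadd builtin by flipping operand signs: fmsub negates the
+   addend (-__B), fnmadd negates the multiplier (-__A), and fnmsub
+   negates both.  */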
 
 extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_sqrt_round_sd (__mmask8 __U, __m128d __A, __m128d __B, const int __R)
+_mm_fmsub_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R)
 {
-  return (__m128d) __builtin_ia32_sqrtsd_mask_round ((__v2df) __B,
-						     (__v2df) __A,
-						     (__v2df)
-						     _mm_setzero_pd (),
-						     (__mmask8) __U, __R);
+  return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W,
+						   (__v2df) __A,
+						   -(__v2df) __B,
+						   __R);
 }
 
 extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_sqrt_round_ss (__m128 __A, __m128 __B, const int __R)
+_mm_fmsub_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R)
 {
-  return (__m128) __builtin_ia32_sqrtss_mask_round ((__v4sf) __B,
-						    (__v4sf) __A,
-						    (__v4sf)
-						    _mm_setzero_ps (),
-						    (__mmask8) -1, __R);
+  return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W,
+						  (__v4sf) __A,
+						  -(__v4sf) __B,
+						  __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fnmadd_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W,
+						   -(__v2df) __A,
+						   (__v2df) __B,
+						   __R);
 }
 
 extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_sqrt_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
-			const int __R)
+_mm_fnmadd_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R)
 {
-  return (__m128) __builtin_ia32_sqrtss_mask_round ((__v4sf) __B,
-						    (__v4sf) __A,
-						    (__v4sf) __W,
-						    (__mmask8) __U, __R);
+  return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W,
+						  -(__v4sf) __A,
+						  (__v4sf) __B,
+						  __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fnmsub_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W,
+						   -(__v2df) __A,
+						   -(__v2df) __B,
+						   __R);
 }
 
 extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_sqrt_round_ss (__mmask8 __U, __m128 __A, __m128 __B, const int __R)
+_mm_fnmsub_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R)
 {
-  return (__m128) __builtin_ia32_sqrtss_mask_round ((__v4sf) __B,
-						    (__v4sf) __A,
-						    (__v4sf)
-						    _mm_setzero_ps (),
-						    (__mmask8) __U, __R);
+  return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W,
+						  -(__v4sf) __A,
+						  -(__v4sf) __B,
+						  __R);
 }
 #else
-#define _mm512_sqrt_round_pd(A, C)            \
-    (__m512d)__builtin_ia32_sqrtpd512_mask(A, (__v8df)_mm512_undefined_pd(), -1, C)
+#define _mm_fmadd_round_sd(A, B, C, R)            \
+    (__m128d)__builtin_ia32_vfmaddsd3_round(A, B, C, R)
 
-#define _mm512_mask_sqrt_round_pd(W, U, A, C) \
-    (__m512d)__builtin_ia32_sqrtpd512_mask(A, W, U, C)
+#define _mm_fmadd_round_ss(A, B, C, R)            \
+    (__m128)__builtin_ia32_vfmaddss3_round(A, B, C, R)
 
-#define _mm512_maskz_sqrt_round_pd(U, A, C)   \
-    (__m512d)__builtin_ia32_sqrtpd512_mask(A, (__v8df)_mm512_setzero_pd(), U, C)
+#define _mm_fmsub_round_sd(A, B, C, R)            \
+    (__m128d)__builtin_ia32_vfmaddsd3_round(A, B, -(C), R)
 
-#define _mm512_sqrt_round_ps(A, C)            \
-    (__m512)__builtin_ia32_sqrtps512_mask(A, (__v16sf)_mm512_undefined_ps(), -1, C)
+#define _mm_fmsub_round_ss(A, B, C, R)            \
+    (__m128)__builtin_ia32_vfmaddss3_round(A, B, -(C), R)
 
-#define _mm512_mask_sqrt_round_ps(W, U, A, C) \
-    (__m512)__builtin_ia32_sqrtps512_mask(A, W, U, C)
+#define _mm_fnmadd_round_sd(A, B, C, R)            \
+    (__m128d)__builtin_ia32_vfmaddsd3_round(A, -(B), C, R)
 
-#define _mm512_maskz_sqrt_round_ps(U, A, C)   \
-    (__m512)__builtin_ia32_sqrtps512_mask(A, (__v16sf)_mm512_setzero_ps(), U, C)
+#define _mm_fnmadd_round_ss(A, B, C, R)            \
+    (__m128)__builtin_ia32_vfmaddss3_round(A, -(B), C, R)
 
-#define _mm_sqrt_round_sd(A, B, C)	      \
-    (__m128d)__builtin_ia32_sqrtsd_mask_round (B, A, \
-	(__v2df) _mm_setzero_pd (), -1, C)
+#define _mm_fnmsub_round_sd(A, B, C, R)            \
+    (__m128d)__builtin_ia32_vfmaddsd3_round(A, -(B), -(C), R)
 
-#define _mm_mask_sqrt_round_sd(W, U, A, B, C) \
-    (__m128d)__builtin_ia32_sqrtsd_mask_round (B, A, W, U, C)
-
-#define _mm_maskz_sqrt_round_sd(U, A, B, C)   \
-    (__m128d)__builtin_ia32_sqrtsd_mask_round (B, A, \
-	(__v2df) _mm_setzero_pd (), U, C)
-
-#define _mm_sqrt_round_ss(A, B, C)	      \
-    (__m128)__builtin_ia32_sqrtss_mask_round (B, A, \
-	(__v4sf) _mm_setzero_ps (), -1, C)
-
-#define _mm_mask_sqrt_round_ss(W, U, A, B, C) \
-    (__m128)__builtin_ia32_sqrtss_mask_round (B, A, W, U, C)
-
-#define _mm_maskz_sqrt_round_ss(U, A, B, C)   \
-    (__m128)__builtin_ia32_sqrtss_mask_round (B, A, \
-	(__v4sf) _mm_setzero_ps (), U, C)
+#define _mm_fnmsub_round_ss(A, B, C, R)            \
+    (__m128)__builtin_ia32_vfmaddss3_round(A, -(B), -(C), R)
 #endif
 
-#define _mm_mask_sqrt_sd(W, U, A, B) \
-    _mm_mask_sqrt_round_sd ((W), (U), (A), (B), _MM_FROUND_CUR_DIRECTION)
-
-#define _mm_maskz_sqrt_sd(U, A, B) \
-    _mm_maskz_sqrt_round_sd ((U), (A), (B), _MM_FROUND_CUR_DIRECTION)
-
-#define _mm_mask_sqrt_ss(W, U, A, B) \
-    _mm_mask_sqrt_round_ss ((W), (U), (A), (B), _MM_FROUND_CUR_DIRECTION)
-
-#define _mm_maskz_sqrt_ss(U, A, B) \
-    _mm_maskz_sqrt_round_ss ((U), (A), (B), _MM_FROUND_CUR_DIRECTION)
-
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi8_epi32 (__m128i __A)
+_mm_mask_fmadd_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
 {
-  return (__m512i) __builtin_ia32_pmovsxbd512_mask ((__v16qi) __A,
-						    (__v16si)
-						    _mm512_undefined_epi32 (),
-						    (__mmask16) -1);
+  return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
+						  (__v2df) __A,
+						  (__v2df) __B,
+						  (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi8_epi32 (__m512i __W, __mmask16 __U, __m128i __A)
+_mm_mask_fmadd_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
 {
-  return (__m512i) __builtin_ia32_pmovsxbd512_mask ((__v16qi) __A,
-						    (__v16si) __W,
-						    (__mmask16) __U);
+  return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
+						 (__v4sf) __A,
+						 (__v4sf) __B,
+						 (__mmask8) __U,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi8_epi32 (__mmask16 __U, __m128i __A)
+_mm_mask3_fmadd_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U)
 {
-  return (__m512i) __builtin_ia32_pmovsxbd512_mask ((__v16qi) __A,
-						    (__v16si)
-						    _mm512_setzero_si512 (),
-						    (__mmask16) __U);
+  return (__m128d) __builtin_ia32_vfmaddsd3_mask3 ((__v2df) __W,
+						   (__v2df) __A,
+						   (__v2df) __B,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi8_epi64 (__m128i __A)
+_mm_mask3_fmadd_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U)
 {
-  return (__m512i) __builtin_ia32_pmovsxbq512_mask ((__v16qi) __A,
-						    (__v8di)
-						    _mm512_undefined_epi32 (),
-						    (__mmask8) -1);
+  return (__m128) __builtin_ia32_vfmaddss3_mask3 ((__v4sf) __W,
+						  (__v4sf) __A,
+						  (__v4sf) __B,
+						  (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi8_epi64 (__m512i __W, __mmask8 __U, __m128i __A)
+_mm_maskz_fmadd_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B)
 {
-  return (__m512i) __builtin_ia32_pmovsxbq512_mask ((__v16qi) __A,
-						    (__v8di) __W,
-						    (__mmask8) __U);
+  return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
+						   (__v2df) __A,
+						   (__v2df) __B,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi8_epi64 (__mmask8 __U, __m128i __A)
+_mm_maskz_fmadd_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B)
 {
-  return (__m512i) __builtin_ia32_pmovsxbq512_mask ((__v16qi) __A,
-						    (__v8di)
-						    _mm512_setzero_si512 (),
-						    (__mmask8) __U);
+  return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
+						  (__v4sf) __A,
+						  (__v4sf) __B,
+						  (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi16_epi32 (__m256i __A)
+_mm_mask_fmsub_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
 {
-  return (__m512i) __builtin_ia32_pmovsxwd512_mask ((__v16hi) __A,
-						    (__v16si)
-						    _mm512_undefined_epi32 (),
-						    (__mmask16) -1);
+  return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
+						  (__v2df) __A,
+						  -(__v2df) __B,
+						  (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi16_epi32 (__m512i __W, __mmask16 __U, __m256i __A)
+_mm_mask_fmsub_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
 {
-  return (__m512i) __builtin_ia32_pmovsxwd512_mask ((__v16hi) __A,
-						    (__v16si) __W,
-						    (__mmask16) __U);
+  return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
+						 (__v4sf) __A,
+						 -(__v4sf) __B,
+						 (__mmask8) __U,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi16_epi32 (__mmask16 __U, __m256i __A)
+_mm_mask3_fmsub_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U)
 {
-  return (__m512i) __builtin_ia32_pmovsxwd512_mask ((__v16hi) __A,
-						    (__v16si)
-						    _mm512_setzero_si512 (),
-						    (__mmask16) __U);
+  return (__m128d) __builtin_ia32_vfmsubsd3_mask3 ((__v2df) __W,
+						   (__v2df) __A,
+						   (__v2df) __B,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi16_epi64 (__m128i __A)
+_mm_mask3_fmsub_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U)
 {
-  return (__m512i) __builtin_ia32_pmovsxwq512_mask ((__v8hi) __A,
-						    (__v8di)
-						    _mm512_undefined_epi32 (),
-						    (__mmask8) -1);
+  return (__m128) __builtin_ia32_vfmsubss3_mask3 ((__v4sf) __W,
+						  (__v4sf) __A,
+						  (__v4sf) __B,
+						  (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
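+
+/* The mask3 forms merge masked-off elements from the third source (__B
+   here), so the subtraction cannot be expressed by handing the builtin
+   -__B: a clear mask bit would then produce -__B instead of __B.  A
+   dedicated vfmsub*_mask3 builtin keeps __B intact, whereas the maskz
+   forms zero masked-off elements and can therefore reuse vfmadd with a
+   negated __B.  */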
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi16_epi64 (__m512i __W, __mmask8 __U, __m128i __A)
+_mm_maskz_fmsub_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B)
 {
-  return (__m512i) __builtin_ia32_pmovsxwq512_mask ((__v8hi) __A,
-						    (__v8di) __W,
-						    (__mmask8) __U);
+  return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
+						   (__v2df) __A,
+						   -(__v2df) __B,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi16_epi64 (__mmask8 __U, __m128i __A)
+_mm_maskz_fmsub_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B)
 {
-  return (__m512i) __builtin_ia32_pmovsxwq512_mask ((__v8hi) __A,
-						    (__v8di)
-						    _mm512_setzero_si512 (),
-						    (__mmask8) __U);
+  return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
+						  (__v4sf) __A,
+						  -(__v4sf) __B,
+						  (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi32_epi64 (__m256i __X)
+_mm_mask_fnmadd_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
 {
-  return (__m512i) __builtin_ia32_pmovsxdq512_mask ((__v8si) __X,
-						    (__v8di)
-						    _mm512_undefined_epi32 (),
-						    (__mmask8) -1);
+  return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
+						  -(__v2df) __A,
+						  (__v2df) __B,
+						  (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi32_epi64 (__m512i __W, __mmask8 __U, __m256i __X)
+_mm_mask_fnmadd_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
 {
-  return (__m512i) __builtin_ia32_pmovsxdq512_mask ((__v8si) __X,
-						    (__v8di) __W,
-						    (__mmask8) __U);
+  return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
+						 -(__v4sf) __A,
+						 (__v4sf) __B,
+						 (__mmask8) __U,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi32_epi64 (__mmask8 __U, __m256i __X)
+_mm_mask3_fnmadd_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U)
 {
-  return (__m512i) __builtin_ia32_pmovsxdq512_mask ((__v8si) __X,
-						    (__v8di)
-						    _mm512_setzero_si512 (),
-						    (__mmask8) __U);
+  return (__m128d) __builtin_ia32_vfmaddsd3_mask3 ((__v2df) __W,
+						   -(__v2df) __A,
+						   (__v2df) __B,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepu8_epi32 (__m128i __A)
+_mm_mask3_fnmadd_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U)
 {
-  return (__m512i) __builtin_ia32_pmovzxbd512_mask ((__v16qi) __A,
-						    (__v16si)
-						    _mm512_undefined_epi32 (),
-						    (__mmask16) -1);
+  return (__m128) __builtin_ia32_vfmaddss3_mask3 ((__v4sf) __W,
+						  -(__v4sf) __A,
+						  (__v4sf) __B,
+						  (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepu8_epi32 (__m512i __W, __mmask16 __U, __m128i __A)
+_mm_maskz_fnmadd_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B)
 {
-  return (__m512i) __builtin_ia32_pmovzxbd512_mask ((__v16qi) __A,
-						    (__v16si) __W,
-						    (__mmask16) __U);
+  return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
+						   -(__v2df) __A,
+						   (__v2df) __B,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepu8_epi32 (__mmask16 __U, __m128i __A)
+_mm_maskz_fnmadd_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B)
 {
-  return (__m512i) __builtin_ia32_pmovzxbd512_mask ((__v16qi) __A,
-						    (__v16si)
-						    _mm512_setzero_si512 (),
-						    (__mmask16) __U);
-}
-
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepu8_epi64 (__m128i __A)
-{
-  return (__m512i) __builtin_ia32_pmovzxbq512_mask ((__v16qi) __A,
-						    (__v8di)
-						    _mm512_undefined_epi32 (),
-						    (__mmask8) -1);
+  return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
+						  -(__v4sf) __A,
+						  (__v4sf) __B,
+						  (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepu8_epi64 (__m512i __W, __mmask8 __U, __m128i __A)
+_mm_mask_fnmsub_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
 {
-  return (__m512i) __builtin_ia32_pmovzxbq512_mask ((__v16qi) __A,
-						    (__v8di) __W,
-						    (__mmask8) __U);
+  return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
+						  -(__v2df) __A,
+						  -(__v2df) __B,
+						  (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepu8_epi64 (__mmask8 __U, __m128i __A)
+_mm_mask_fnmsub_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
 {
-  return (__m512i) __builtin_ia32_pmovzxbq512_mask ((__v16qi) __A,
-						    (__v8di)
-						    _mm512_setzero_si512 (),
-						    (__mmask8) __U);
+  return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
+						 -(__v4sf) __A,
+						 -(__v4sf) __B,
+						 (__mmask8) __U,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepu16_epi32 (__m256i __A)
+_mm_mask3_fnmsub_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U)
 {
-  return (__m512i) __builtin_ia32_pmovzxwd512_mask ((__v16hi) __A,
-						    (__v16si)
-						    _mm512_undefined_epi32 (),
-						    (__mmask16) -1);
+  return (__m128d) __builtin_ia32_vfmsubsd3_mask3 ((__v2df) __W,
+						   -(__v2df) __A,
+						   (__v2df) __B,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepu16_epi32 (__m512i __W, __mmask16 __U, __m256i __A)
+_mm_mask3_fnmsub_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U)
 {
-  return (__m512i) __builtin_ia32_pmovzxwd512_mask ((__v16hi) __A,
-						    (__v16si) __W,
-						    (__mmask16) __U);
+  return (__m128) __builtin_ia32_vfmsubss3_mask3 ((__v4sf) __W,
+						  -(__v4sf) __A,
+						  (__v4sf) __B,
+						  (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepu16_epi32 (__mmask16 __U, __m256i __A)
+_mm_maskz_fnmsub_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B)
 {
-  return (__m512i) __builtin_ia32_pmovzxwd512_mask ((__v16hi) __A,
-						    (__v16si)
-						    _mm512_setzero_si512 (),
-						    (__mmask16) __U);
+  return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
+						   -(__v2df) __A,
+						   -(__v2df) __B,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepu16_epi64 (__m128i __A)
+_mm_maskz_fnmsub_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B)
 {
-  return (__m512i) __builtin_ia32_pmovzxwq512_mask ((__v8hi) __A,
-						    (__v8di)
-						    _mm512_undefined_epi32 (),
-						    (__mmask8) -1);
+  return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
+						  -(__v4sf) __A,
+						  -(__v4sf) __B,
+						  (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
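+
+/* The non-round masked FMA forms above are simply the _round_ variants
+   invoked with _MM_FROUND_CUR_DIRECTION, i.e. they round according to
+   the current MXCSR rounding mode and report exceptions as usual.  */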
 
-extern __inline __m512i
+#ifdef __OPTIMIZE__
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepu16_epi64 (__m512i __W, __mmask8 __U, __m128i __A)
+_mm_mask_fmadd_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
+			 const int __R)
 {
-  return (__m512i) __builtin_ia32_pmovzxwq512_mask ((__v8hi) __A,
-						    (__v8di) __W,
-						    (__mmask8) __U);
+  return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
+						  (__v2df) __A,
+						  (__v2df) __B,
+						  (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepu16_epi64 (__mmask8 __U, __m128i __A)
+_mm_mask_fmadd_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
+			 const int __R)
 {
-  return (__m512i) __builtin_ia32_pmovzxwq512_mask ((__v8hi) __A,
-						    (__v8di)
-						    _mm512_setzero_si512 (),
-						    (__mmask8) __U);
+  return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
+						 (__v4sf) __A,
+						 (__v4sf) __B,
+						 (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepu32_epi64 (__m256i __X)
+_mm_mask3_fmadd_round_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U,
+			  const int __R)
 {
-  return (__m512i) __builtin_ia32_pmovzxdq512_mask ((__v8si) __X,
-						    (__v8di)
-						    _mm512_undefined_epi32 (),
-						    (__mmask8) -1);
+  return (__m128d) __builtin_ia32_vfmaddsd3_mask3 ((__v2df) __W,
+						   (__v2df) __A,
+						   (__v2df) __B,
+						   (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepu32_epi64 (__m512i __W, __mmask8 __U, __m256i __X)
+_mm_mask3_fmadd_round_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U,
+			  const int __R)
 {
-  return (__m512i) __builtin_ia32_pmovzxdq512_mask ((__v8si) __X,
-						    (__v8di) __W,
-						    (__mmask8) __U);
+  return (__m128) __builtin_ia32_vfmaddss3_mask3 ((__v4sf) __W,
+						  (__v4sf) __A,
+						  (__v4sf) __B,
+						  (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepu32_epi64 (__mmask8 __U, __m256i __X)
+_mm_maskz_fmadd_round_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B,
+			  const int __R)
 {
-  return (__m512i) __builtin_ia32_pmovzxdq512_mask ((__v8si) __X,
-						    (__v8di)
-						    _mm512_setzero_si512 (),
-						    (__mmask8) __U);
+  return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
+						   (__v2df) __A,
+						   (__v2df) __B,
+						   (__mmask8) __U, __R);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m512d
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_add_round_pd (__m512d __A, __m512d __B, const int __R)
+_mm_maskz_fmadd_round_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B,
+			  const int __R)
 {
-  return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df)
-						 _mm512_undefined_pd (),
-						 (__mmask8) -1, __R);
+  return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
+						  (__v4sf) __A,
+						  (__v4sf) __B,
+						  (__mmask8) __U, __R);
 }
 
-extern __inline __m512d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_add_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
-			  __m512d __B, const int __R)
+_mm_mask_fmsub_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
+			 const int __R)
 {
-  return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df) __W,
-						 (__mmask8) __U, __R);
+  return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
+						  (__v2df) __A,
+						  -(__v2df) __B,
+						  (__mmask8) __U, __R);
 }
 
-extern __inline __m512d
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_add_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
-			   const int __R)
+_mm_mask_fmsub_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
+			 const int __R)
 {
-  return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df)
-						 _mm512_setzero_pd (),
+  return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
+						 (__v4sf) __A,
+						 -(__v4sf) __B,
 						 (__mmask8) __U, __R);
 }
 
-extern __inline __m512
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_add_round_ps (__m512 __A, __m512 __B, const int __R)
+_mm_mask3_fmsub_round_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U,
+			  const int __R)
 {
-  return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf)
-						_mm512_undefined_ps (),
-						(__mmask16) -1, __R);
+  return (__m128d) __builtin_ia32_vfmsubsd3_mask3 ((__v2df) __W,
+						   (__v2df) __A,
+						   (__v2df) __B,
+						   (__mmask8) __U, __R);
 }
 
-extern __inline __m512
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_add_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
-			  __m512 __B, const int __R)
+_mm_mask3_fmsub_round_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U,
+			  const int __R)
 {
-  return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf) __W,
-						(__mmask16) __U, __R);
+  return (__m128) __builtin_ia32_vfmsubss3_mask3 ((__v4sf) __W,
+						  (__v4sf) __A,
+						  (__v4sf) __B,
+						  (__mmask8) __U, __R);
 }
 
-extern __inline __m512
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_add_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
+_mm_maskz_fmsub_round_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B,
+			  const int __R)
 {
-  return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf)
-						_mm512_setzero_ps (),
-						(__mmask16) __U, __R);
+  return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
+						   (__v2df) __A,
+						   -(__v2df) __B,
+						   (__mmask8) __U, __R);
 }
 
-extern __inline __m512d
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sub_round_pd (__m512d __A, __m512d __B, const int __R)
+_mm_maskz_fmsub_round_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B,
+			  const int __R)
 {
-  return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df)
-						 _mm512_undefined_pd (),
-						 (__mmask8) -1, __R);
+  return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
+						  (__v4sf) __A,
+						  -(__v4sf) __B,
+						  (__mmask8) __U, __R);
 }
 
-extern __inline __m512d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sub_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
-			  __m512d __B, const int __R)
+_mm_mask_fnmadd_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
+			 const int __R)
 {
-  return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df) __W,
-						 (__mmask8) __U, __R);
+  return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
+						  -(__v2df) __A,
+						  (__v2df) __B,
+						  (__mmask8) __U, __R);
 }
 
-extern __inline __m512d
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sub_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
-			   const int __R)
+_mm_mask_fnmadd_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
+			 const int __R)
 {
-  return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df)
-						 _mm512_setzero_pd (),
+  return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
+						 -(__v4sf) __A,
+						 (__v4sf) __B,
 						 (__mmask8) __U, __R);
 }
 
-extern __inline __m512
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sub_round_ps (__m512 __A, __m512 __B, const int __R)
+_mm_mask3_fnmadd_round_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U,
+			  const int __R)
 {
-  return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf)
-						_mm512_undefined_ps (),
-						(__mmask16) -1, __R);
+  return (__m128d) __builtin_ia32_vfmaddsd3_mask3 ((__v2df) __W,
+						   -(__v2df) __A,
+						   (__v2df) __B,
+						   (__mmask8) __U, __R);
 }
 
-extern __inline __m512
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sub_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
-			  __m512 __B, const int __R)
+_mm_mask3_fnmadd_round_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U,
+			  const int __R)
 {
-  return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf) __W,
-						(__mmask16) __U, __R);
+  return (__m128) __builtin_ia32_vfmaddss3_mask3 ((__v4sf) __W,
+						  -(__v4sf) __A,
+						  (__v4sf) __B,
+						  (__mmask8) __U, __R);
 }
 
-extern __inline __m512
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sub_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
+_mm_maskz_fnmadd_round_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B,
+			  const int __R)
 {
-  return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf)
-						_mm512_setzero_ps (),
-						(__mmask16) __U, __R);
+  return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
+						   -(__v2df) __A,
+						   (__v2df) __B,
+						   (__mmask8) __U, __R);
 }
-#else
-#define _mm512_add_round_pd(A, B, C)            \
-    (__m512d)__builtin_ia32_addpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, C)
-
-#define _mm512_mask_add_round_pd(W, U, A, B, C) \
-    (__m512d)__builtin_ia32_addpd512_mask(A, B, W, U, C)
-
-#define _mm512_maskz_add_round_pd(U, A, B, C)   \
-    (__m512d)__builtin_ia32_addpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, C)
-
-#define _mm512_add_round_ps(A, B, C)            \
-    (__m512)__builtin_ia32_addps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, C)
-
-#define _mm512_mask_add_round_ps(W, U, A, B, C) \
-    (__m512)__builtin_ia32_addps512_mask(A, B, W, U, C)
-
-#define _mm512_maskz_add_round_ps(U, A, B, C)   \
-    (__m512)__builtin_ia32_addps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, C)
-
-#define _mm512_sub_round_pd(A, B, C)            \
-    (__m512d)__builtin_ia32_subpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, C)
-
-#define _mm512_mask_sub_round_pd(W, U, A, B, C) \
-    (__m512d)__builtin_ia32_subpd512_mask(A, B, W, U, C)
-
-#define _mm512_maskz_sub_round_pd(U, A, B, C)   \
-    (__m512d)__builtin_ia32_subpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, C)
-
-#define _mm512_sub_round_ps(A, B, C)            \
-    (__m512)__builtin_ia32_subps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, C)
-
-#define _mm512_mask_sub_round_ps(W, U, A, B, C) \
-    (__m512)__builtin_ia32_subps512_mask(A, B, W, U, C)
-
-#define _mm512_maskz_sub_round_ps(U, A, B, C)   \
-    (__m512)__builtin_ia32_subps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, C)
-#endif
 
-#ifdef __OPTIMIZE__
-extern __inline __m512d
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mul_round_pd (__m512d __A, __m512d __B, const int __R)
+_mm_maskz_fnmadd_round_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B,
+			  const int __R)
 {
-  return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df)
-						 _mm512_undefined_pd (),
-						 (__mmask8) -1, __R);
+  return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
+						  -(__v4sf) __A,
+						  (__v4sf) __B,
+						  (__mmask8) __U, __R);
 }
 
-extern __inline __m512d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mul_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
-			  __m512d __B, const int __R)
+_mm_mask_fnmsub_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
+			 const int __R)
 {
-  return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df) __W,
-						 (__mmask8) __U, __R);
+  return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
+						  -(__v2df) __A,
+						  -(__v2df) __B,
+						  (__mmask8) __U, __R);
 }
 
-extern __inline __m512d
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mul_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
-			   const int __R)
+_mm_mask_fnmsub_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
+			 const int __R)
 {
-  return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df)
-						 _mm512_setzero_pd (),
+  return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
+						 -(__v4sf) __A,
+						 -(__v4sf) __B,
 						 (__mmask8) __U, __R);
 }
 
-extern __inline __m512
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mul_round_ps (__m512 __A, __m512 __B, const int __R)
+_mm_mask3_fnmsub_round_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U,
+			  const int __R)
 {
-  return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf)
-						_mm512_undefined_ps (),
-						(__mmask16) -1, __R);
+  return (__m128d) __builtin_ia32_vfmsubsd3_mask3 ((__v2df) __W,
+						   -(__v2df) __A,
+						   (__v2df) __B,
+						   (__mmask8) __U, __R);
 }
 
-extern __inline __m512
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mul_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
-			  __m512 __B, const int __R)
+_mm_mask3_fnmsub_round_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U,
+			  const int __R)
 {
-  return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf) __W,
-						(__mmask16) __U, __R);
+  return (__m128) __builtin_ia32_vfmsubss3_mask3 ((__v4sf) __W,
+						  -(__v4sf) __A,
+						  (__v4sf) __B,
+						  (__mmask8) __U, __R);
 }
 
-extern __inline __m512
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mul_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
+_mm_maskz_fnmsub_round_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B,
+			  const int __R)
 {
-  return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf)
-						_mm512_setzero_ps (),
-						(__mmask16) __U, __R);
+  return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
+						   -(__v2df) __A,
+						   -(__v2df) __B,
+						   (__mmask8) __U, __R);
 }
 
-extern __inline __m512d
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_div_round_pd (__m512d __M, __m512d __V, const int __R)
+_mm_maskz_fnmsub_round_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B,
+			  const int __R)
 {
-  return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M,
-						 (__v8df) __V,
-						 (__v8df)
-						 _mm512_undefined_pd (),
-						 (__mmask8) -1, __R);
+  return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
+						  -(__v4sf) __A,
+						  -(__v4sf) __B,
+						  (__mmask8) __U, __R);
 }
+#else
+#define _mm_mask_fmadd_round_sd(A, U, B, C, R)            \
+    (__m128d) __builtin_ia32_vfmaddsd3_mask (A, B, C, U, R)
 
-extern __inline __m512d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_div_round_pd (__m512d __W, __mmask8 __U, __m512d __M,
-			  __m512d __V, const int __R)
-{
-  return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M,
-						 (__v8df) __V,
-						 (__v8df) __W,
-						 (__mmask8) __U, __R);
-}
+#define _mm_mask_fmadd_round_ss(A, U, B, C, R)            \
+    (__m128) __builtin_ia32_vfmaddss3_mask (A, B, C, U, R)
 
-extern __inline __m512d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_div_round_pd (__mmask8 __U, __m512d __M, __m512d __V,
-			   const int __R)
-{
-  return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M,
-						 (__v8df) __V,
-						 (__v8df)
-						 _mm512_setzero_pd (),
-						 (__mmask8) __U, __R);
-}
+#define _mm_mask3_fmadd_round_sd(A, B, C, U, R)            \
+    (__m128d) __builtin_ia32_vfmaddsd3_mask3 (A, B, C, U, R)
 
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_div_round_ps (__m512 __A, __m512 __B, const int __R)
-{
-  return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf)
-						_mm512_undefined_ps (),
-						(__mmask16) -1, __R);
-}
+#define _mm_mask3_fmadd_round_ss(A, B, C, U, R)            \
+    (__m128) __builtin_ia32_vfmaddss3_mask3 (A, B, C, U, R)
 
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_div_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
-			  __m512 __B, const int __R)
-{
-  return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf) __W,
-						(__mmask16) __U, __R);
-}
+#define _mm_maskz_fmadd_round_sd(U, A, B, C, R)            \
+    (__m128d) __builtin_ia32_vfmaddsd3_maskz (A, B, C, U, R)
 
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_div_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
-{
-  return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf)
-						_mm512_setzero_ps (),
-						(__mmask16) __U, __R);
+#define _mm_maskz_fmadd_round_ss(U, A, B, C, R)            \
+    (__m128) __builtin_ia32_vfmaddss3_maskz (A, B, C, U, R)
+
+#define _mm_mask_fmsub_round_sd(A, U, B, C, R)            \
+    (__m128d) __builtin_ia32_vfmaddsd3_mask (A, B, -(C), U, R)
+
+#define _mm_mask_fmsub_round_ss(A, U, B, C, R)            \
+    (__m128) __builtin_ia32_vfmaddss3_mask (A, B, -(C), U, R)
+
+#define _mm_mask3_fmsub_round_sd(A, B, C, U, R)            \
+    (__m128d) __builtin_ia32_vfmsubsd3_mask3 (A, B, C, U, R)
+
+#define _mm_mask3_fmsub_round_ss(A, B, C, U, R)            \
+    (__m128) __builtin_ia32_vfmsubss3_mask3 (A, B, C, U, R)
+
+#define _mm_maskz_fmsub_round_sd(U, A, B, C, R)            \
+    (__m128d) __builtin_ia32_vfmaddsd3_maskz (A, B, -(C), U, R)
+
+#define _mm_maskz_fmsub_round_ss(U, A, B, C, R)            \
+    (__m128) __builtin_ia32_vfmaddss3_maskz (A, B, -(C), U, R)
+
+#define _mm_mask_fnmadd_round_sd(A, U, B, C, R)            \
+    (__m128d) __builtin_ia32_vfmaddsd3_mask (A, -(B), C, U, R)
+
+#define _mm_mask_fnmadd_round_ss(A, U, B, C, R)            \
+    (__m128) __builtin_ia32_vfmaddss3_mask (A, -(B), C, U, R)
+
+#define _mm_mask3_fnmadd_round_sd(A, B, C, U, R)            \
+    (__m128d) __builtin_ia32_vfmaddsd3_mask3 (A, -(B), C, U, R)
+
+#define _mm_mask3_fnmadd_round_ss(A, B, C, U, R)            \
+    (__m128) __builtin_ia32_vfmaddss3_mask3 (A, -(B), C, U, R)
+
+#define _mm_maskz_fnmadd_round_sd(U, A, B, C, R)            \
+    (__m128d) __builtin_ia32_vfmaddsd3_maskz (A, -(B), C, U, R)
+
+#define _mm_maskz_fnmadd_round_ss(U, A, B, C, R)            \
+    (__m128) __builtin_ia32_vfmaddss3_maskz (A, -(B), C, U, R)
+
+#define _mm_mask_fnmsub_round_sd(A, U, B, C, R)            \
+    (__m128d) __builtin_ia32_vfmaddsd3_mask (A, -(B), -(C), U, R)
+
+#define _mm_mask_fnmsub_round_ss(A, U, B, C, R)            \
+    (__m128) __builtin_ia32_vfmaddss3_mask (A, -(B), -(C), U, R)
+
+#define _mm_mask3_fnmsub_round_sd(A, B, C, U, R)            \
+    (__m128d) __builtin_ia32_vfmsubsd3_mask3 (A, -(B), C, U, R)
+
+#define _mm_mask3_fnmsub_round_ss(A, B, C, U, R)            \
+    (__m128) __builtin_ia32_vfmsubss3_mask3 (A, -(B), C, U, R)
+
+#define _mm_maskz_fnmsub_round_sd(U, A, B, C, R)            \
+    (__m128d) __builtin_ia32_vfmaddsd3_maskz (A, -(B), -(C), U, R)
+
+#define _mm_maskz_fnmsub_round_ss(U, A, B, C, R)            \
+    (__m128) __builtin_ia32_vfmaddss3_maskz (A, -(B), -(C), U, R)
+#endif
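/* A minimal usage sketch for the masked scalar FMA round macros
   above, assuming <immintrin.h> is included and -mavx512f; fma_lo is
   a hypothetical helper.  With mask bit 0 set the low element becomes
   a[0] * b[0] + c[0]; with it clear, a[0] passes through.  The upper
   element is always taken from a.  */
__m128d
fma_lo (__m128d a, __mmask8 k, __m128d b, __m128d c)
{
  return _mm_mask_fmadd_round_sd (a, k, b, c,
				  _MM_FROUND_TO_NEAREST_INT
				  | _MM_FROUND_NO_EXC);
}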
+
+#ifdef __OPTIMIZE__
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comi_round_ss (__m128 __A, __m128 __B, const int __P, const int __R)
+{
+  return __builtin_ia32_vcomiss ((__v4sf) __A, (__v4sf) __B, __P, __R);
+}
+
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comi_round_sd (__m128d __A, __m128d __B, const int __P, const int __R)
+{
+  return __builtin_ia32_vcomisd ((__v2df) __A, (__v2df) __B, __P, __R);
 }
+#else
+#define _mm_comi_round_ss(A, B, C, D) \
+  __builtin_ia32_vcomiss (A, B, C, D)
+#define _mm_comi_round_sd(A, B, C, D) \
+  __builtin_ia32_vcomisd (A, B, C, D)
+#endif
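/* A minimal usage sketch for _mm_comi_round_ss, assuming
   <immintrin.h> is included (for _CMP_GT_OQ) and -mavx512f; gt_lo is
   a hypothetical helper.  The predicate compares only the low
   elements; _MM_FROUND_NO_EXC suppresses exceptions on NaN inputs.  */
int
gt_lo (__m128 a, __m128 b)
{
  return _mm_comi_round_ss (a, b, _CMP_GT_OQ, _MM_FROUND_NO_EXC);
}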
 
 extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mul_round_sd (__m128d __A, __m128d __B, const int __R)
+_mm_mask_add_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
 {
-  return (__m128d) __builtin_ia32_mulsd_round ((__v2df) __A,
-					       (__v2df) __B,
-					       __R);
+  return (__m128d) __builtin_ia32_addsd_mask_round ((__v2df) __A,
+						(__v2df) __B,
+						(__v2df) __W,
+						(__mmask8) __U,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_mul_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
-			  __m128d __B, const int __R)
+_mm_maskz_add_sd (__mmask8 __U, __m128d __A, __m128d __B)
+{
+  return (__m128d) __builtin_ia32_addsd_mask_round ((__v2df) __A,
+						(__v2df) __B,
+						(__v2df)
+						_mm_setzero_pd (),
+						(__mmask8) __U,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_add_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
+{
+  return (__m128) __builtin_ia32_addss_mask_round ((__v4sf) __A,
+						(__v4sf) __B,
+						(__v4sf) __W,
+						(__mmask8) __U,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_add_ss (__mmask8 __U, __m128 __A, __m128 __B)
+{
+  return (__m128) __builtin_ia32_addss_mask_round ((__v4sf) __A,
+						(__v4sf) __B,
+						(__v4sf)
+						_mm_setzero_ps (),
+						(__mmask8) __U,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_sub_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
+{
+  return (__m128d) __builtin_ia32_subsd_mask_round ((__v2df) __A,
+						(__v2df) __B,
+						(__v2df) __W,
+						(__mmask8) __U,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_sub_sd (__mmask8 __U, __m128d __A, __m128d __B)
+{
+  return (__m128d) __builtin_ia32_subsd_mask_round ((__v2df) __A,
+						(__v2df) __B,
+						(__v2df)
+						_mm_setzero_pd (),
+						(__mmask8) __U,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_sub_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
+{
+  return (__m128) __builtin_ia32_subss_mask_round ((__v4sf) __A,
+						(__v4sf) __B,
+						(__v4sf) __W,
+						(__mmask8) __U,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_sub_ss (__mmask8 __U, __m128 __A, __m128 __B)
+{
+  return (__m128) __builtin_ia32_subss_mask_round ((__v4sf) __A,
+						(__v4sf) __B,
+						(__v4sf)
+						_mm_setzero_ps (),
+						(__mmask8) __U,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_mul_sd (__m128d __W, __mmask8 __U, __m128d __A,
+			  __m128d __B)
 {
   return (__m128d) __builtin_ia32_mulsd_mask_round ((__v2df) __A,
 						 (__v2df) __B,
 						 (__v2df) __W,
-						 (__mmask8) __U, __R);
+						 (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_mul_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
-			   const int __R)
+_mm_maskz_mul_sd (__mmask8 __U, __m128d __A, __m128d __B)
 {
   return (__m128d) __builtin_ia32_mulsd_mask_round ((__v2df) __A,
 						 (__v2df) __B,
 						 (__v2df)
 						 _mm_setzero_pd (),
-						 (__mmask8) __U, __R);
-}
-
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mul_round_ss (__m128 __A, __m128 __B, const int __R)
-{
-  return (__m128) __builtin_ia32_mulss_round ((__v4sf) __A,
-					      (__v4sf) __B,
-					      __R);
+						 (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_mul_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
-			  __m128 __B, const int __R)
+_mm_mask_mul_ss (__m128 __W, __mmask8 __U, __m128 __A,
+			  __m128 __B)
 {
   return (__m128) __builtin_ia32_mulss_mask_round ((__v4sf) __A,
 						 (__v4sf) __B,
 						 (__v4sf) __W,
-						 (__mmask8) __U, __R);
+						 (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_mul_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
-			   const int __R)
+_mm_maskz_mul_ss (__mmask8 __U, __m128 __A, __m128 __B)
 {
   return (__m128) __builtin_ia32_mulss_mask_round ((__v4sf) __A,
 						 (__v4sf) __B,
 						 (__v4sf)
 						 _mm_setzero_ps (),
-						 (__mmask8) __U, __R);
-}
-
-extern __inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_div_round_sd (__m128d __A, __m128d __B, const int __R)
-{
-  return (__m128d) __builtin_ia32_divsd_round ((__v2df) __A,
-					       (__v2df) __B,
-					       __R);
+						 (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_div_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
-			  __m128d __B, const int __R)
+_mm_mask_div_sd (__m128d __W, __mmask8 __U, __m128d __A,
+			  __m128d __B)
 {
   return (__m128d) __builtin_ia32_divsd_mask_round ((__v2df) __A,
 						 (__v2df) __B,
 						 (__v2df) __W,
-						 (__mmask8) __U, __R);
+						 (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_div_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
-			   const int __R)
+_mm_maskz_div_sd (__mmask8 __U, __m128d __A, __m128d __B)
 {
   return (__m128d) __builtin_ia32_divsd_mask_round ((__v2df) __A,
 						 (__v2df) __B,
 						 (__v2df)
 						 _mm_setzero_pd (),
-						 (__mmask8) __U, __R);
-}
-
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_div_round_ss (__m128 __A, __m128 __B, const int __R)
-{
-  return (__m128) __builtin_ia32_divss_round ((__v4sf) __A,
-					      (__v4sf) __B,
-					      __R);
+						 (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_div_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
-			  __m128 __B, const int __R)
+_mm_mask_div_ss (__m128 __W, __mmask8 __U, __m128 __A,
+			  __m128 __B)
 {
   return (__m128) __builtin_ia32_divss_mask_round ((__v4sf) __A,
 						 (__v4sf) __B,
 						 (__v4sf) __W,
-						 (__mmask8) __U, __R);
+						 (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_div_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
-			   const int __R)
+_mm_maskz_div_ss (__mmask8 __U, __m128 __A, __m128 __B)
 {
   return (__m128) __builtin_ia32_divss_mask_round ((__v4sf) __A,
 						 (__v4sf) __B,
 						 (__v4sf)
 						 _mm_setzero_ps (),
-						 (__mmask8) __U, __R);
+						 (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
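/* A minimal usage sketch for the masked scalar arithmetic wrappers
   above, assuming <immintrin.h> is included and -mavx512f; div_lo is
   a hypothetical helper.  With bit 0 of k clear, the zeroing form
   stores 0.0 in the low element instead of a quotient; the upper
   element is always copied from a.  Rounding follows the current
   MXCSR direction.  */
__m128d
div_lo (__mmask8 k, __m128d a, __m128d b)
{
  return _mm_maskz_div_sd (k, a, b);
}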
 
-#else
-#define _mm512_mul_round_pd(A, B, C)            \
-    (__m512d)__builtin_ia32_mulpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, C)
-
-#define _mm512_mask_mul_round_pd(W, U, A, B, C) \
-    (__m512d)__builtin_ia32_mulpd512_mask(A, B, W, U, C)
-
-#define _mm512_maskz_mul_round_pd(U, A, B, C)   \
-    (__m512d)__builtin_ia32_mulpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, C)
-
-#define _mm512_mul_round_ps(A, B, C)            \
-    (__m512)__builtin_ia32_mulps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, C)
-
-#define _mm512_mask_mul_round_ps(W, U, A, B, C) \
-    (__m512)__builtin_ia32_mulps512_mask(A, B, W, U, C)
-
-#define _mm512_maskz_mul_round_ps(U, A, B, C)   \
-    (__m512)__builtin_ia32_mulps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, C)
-
-#define _mm512_div_round_pd(A, B, C)            \
-    (__m512d)__builtin_ia32_divpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, C)
-
-#define _mm512_mask_div_round_pd(W, U, A, B, C) \
-    (__m512d)__builtin_ia32_divpd512_mask(A, B, W, U, C)
-
-#define _mm512_maskz_div_round_pd(U, A, B, C)   \
-    (__m512d)__builtin_ia32_divpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, C)
-
-#define _mm512_div_round_ps(A, B, C)            \
-    (__m512)__builtin_ia32_divps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, C)
-
-#define _mm512_mask_div_round_ps(W, U, A, B, C) \
-    (__m512)__builtin_ia32_divps512_mask(A, B, W, U, C)
-
-#define _mm512_maskz_div_round_ps(U, A, B, C)   \
-    (__m512)__builtin_ia32_divps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, C)
-
-#define _mm_mul_round_sd(A, B, C)            \
-    (__m128d)__builtin_ia32_mulsd_round(A, B, C)
-
-#define _mm_mask_mul_round_sd(W, U, A, B, C) \
-    (__m128d)__builtin_ia32_mulsd_mask_round(A, B, W, U, C)
-
-#define _mm_maskz_mul_round_sd(U, A, B, C)   \
-    (__m128d)__builtin_ia32_mulsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
-
-#define _mm_mul_round_ss(A, B, C)            \
-    (__m128)__builtin_ia32_mulss_round(A, B, C)
-
-#define _mm_mask_mul_round_ss(W, U, A, B, C) \
-    (__m128)__builtin_ia32_mulss_mask_round(A, B, W, U, C)
-
-#define _mm_maskz_mul_round_ss(U, A, B, C)   \
-    (__m128)__builtin_ia32_mulss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
-
-#define _mm_div_round_sd(A, B, C)            \
-    (__m128d)__builtin_ia32_divsd_round(A, B, C)
-
-#define _mm_mask_div_round_sd(W, U, A, B, C) \
-    (__m128d)__builtin_ia32_divsd_mask_round(A, B, W, U, C)
-
-#define _mm_maskz_div_round_sd(U, A, B, C)   \
-    (__m128d)__builtin_ia32_divsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
-
-#define _mm_div_round_ss(A, B, C)            \
-    (__m128)__builtin_ia32_divss_round(A, B, C)
-
-#define _mm_mask_div_round_ss(W, U, A, B, C) \
-    (__m128)__builtin_ia32_divss_mask_round(A, B, W, U, C)
-
-#define _mm_maskz_div_round_ss(U, A, B, C)   \
-    (__m128)__builtin_ia32_divss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
-
-#endif
-
-#ifdef __OPTIMIZE__
-extern __inline __m512d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_max_round_pd (__m512d __A, __m512d __B, const int __R)
+_mm_mask_max_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
 {
-  return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df)
-						 _mm512_undefined_pd (),
-						 (__mmask8) -1, __R);
+  return (__m128d) __builtin_ia32_maxsd_mask_round ((__v2df) __A,
+						 (__v2df) __B,
+						 (__v2df) __W,
+						 (__mmask8) __U,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_max_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
-			  __m512d __B, const int __R)
+_mm_maskz_max_sd (__mmask8 __U, __m128d __A, __m128d __B)
 {
-  return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df) __W,
-						 (__mmask8) __U, __R);
+  return (__m128d) __builtin_ia32_maxsd_mask_round ((__v2df) __A,
+						 (__v2df) __B,
+						 (__v2df)
+						 _mm_setzero_pd (),
+						 (__mmask8) __U,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_max_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
-			   const int __R)
+_mm_mask_max_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
 {
-  return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df)
-						 _mm512_setzero_pd (),
-						 (__mmask8) __U, __R);
+  return (__m128) __builtin_ia32_maxss_mask_round ((__v4sf) __A,
+						(__v4sf) __B,
+						(__v4sf) __W,
+						(__mmask8) __U,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_max_round_ps (__m512 __A, __m512 __B, const int __R)
+_mm_maskz_max_ss (__mmask8 __U, __m128 __A, __m128 __B)
 {
-  return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf)
-						_mm512_undefined_ps (),
-						(__mmask16) -1, __R);
+  return (__m128) __builtin_ia32_maxss_mask_round ((__v4sf) __A,
+						(__v4sf) __B,
+						(__v4sf)
+						_mm_setzero_ps (),
+						(__mmask8) __U,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_max_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
-			  __m512 __B, const int __R)
+_mm_mask_min_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
 {
-  return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf) __W,
-						(__mmask16) __U, __R);
+  return (__m128d) __builtin_ia32_minsd_mask_round ((__v2df) __A,
+						 (__v2df) __B,
+						 (__v2df) __W,
+						 (__mmask8) __U,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_max_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
+_mm_maskz_min_sd (__mmask8 __U, __m128d __A, __m128d __B)
 {
-  return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf)
-						_mm512_setzero_ps (),
-						(__mmask16) __U, __R);
+  return (__m128d) __builtin_ia32_minsd_mask_round ((__v2df) __A,
+						 (__v2df) __B,
+						 (__v2df)
+						 _mm_setzero_pd (),
+						 (__mmask8) __U,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_min_round_pd (__m512d __A, __m512d __B, const int __R)
+_mm_mask_min_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
 {
-  return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df)
-						 _mm512_undefined_pd (),
-						 (__mmask8) -1, __R);
+  return (__m128) __builtin_ia32_minss_mask_round ((__v4sf) __A,
+						(__v4sf) __B,
+						(__v4sf) __W,
+						(__mmask8) __U,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_min_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
-			  __m512d __B, const int __R)
+_mm_maskz_min_ss (__mmask8 __U, __m128 __A, __m128 __B)
 {
-  return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df) __W,
-						 (__mmask8) __U, __R);
+  return (__m128) __builtin_ia32_minss_mask_round ((__v4sf) __A,
+						(__v4sf) __B,
+						(__v4sf)
+						_mm_setzero_ps (),
+						(__mmask8) __U,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_min_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
-			   const int __R)
+_mm_scalef_sd (__m128d __A, __m128d __B)
 {
-  return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df)
-						 _mm512_setzero_pd (),
-						 (__mmask8) __U, __R);
+  return (__m128d) __builtin_ia32_scalefsd_mask_round ((__v2df) __A,
+						    (__v2df) __B,
+						    (__v2df)
+						    _mm_setzero_pd (),
+						    (__mmask8) -1,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_min_round_ps (__m512 __A, __m512 __B, const int __R)
+_mm_scalef_ss (__m128 __A, __m128 __B)
 {
-  return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf)
-						_mm512_undefined_ps (),
-						(__mmask16) -1, __R);
+  return (__m128) __builtin_ia32_scalefss_mask_round ((__v4sf) __A,
+						   (__v4sf) __B,
+						   (__v4sf)
+						   _mm_setzero_ps (),
+						   (__mmask8) -1,
+						   _MM_FROUND_CUR_DIRECTION);
 }
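/* A minimal usage sketch for the scalar scalef wrappers above,
   assuming <immintrin.h> is included and -mavx512f; scalef_demo is a
   hypothetical helper.  vscalefsd computes a * 2**floor(b) on the low
   element, so the result's low element here is 3.0 * 2**4 == 48.0.  */
__m128d
scalef_demo (void)
{
  return _mm_scalef_sd (_mm_set_sd (3.0), _mm_set_sd (4.0));
}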
 
-extern __inline __m512
+#ifdef __x86_64__
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_min_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
-			  __m512 __B, const int __R)
+_mm_cvtu64_ss (__m128 __A, unsigned long long __B)
 {
-  return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf) __W,
-						(__mmask16) __U, __R);
+  return (__m128) __builtin_ia32_cvtusi2ss64 ((__v4sf) __A, __B,
+					      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_min_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
+_mm_cvtu64_sd (__m128d __A, unsigned long long __B)
 {
-  return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf)
-						_mm512_setzero_ps (),
-						(__mmask16) __U, __R);
+  return (__m128d) __builtin_ia32_cvtusi2sd64 ((__v2df) __A, __B,
+					       _MM_FROUND_CUR_DIRECTION);
 }
-#else
-#define _mm512_max_round_pd(A, B,  R) \
-    (__m512d)__builtin_ia32_maxpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, R)
-
-#define _mm512_mask_max_round_pd(W, U,  A, B, R) \
-    (__m512d)__builtin_ia32_maxpd512_mask(A, B, W, U, R)
+#endif
 
-#define _mm512_maskz_max_round_pd(U, A,  B, R) \
-    (__m512d)__builtin_ia32_maxpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, R)
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtu32_ss (__m128 __A, unsigned __B)
+{
+  return (__m128) __builtin_ia32_cvtusi2ss32 ((__v4sf) __A, __B,
+					      _MM_FROUND_CUR_DIRECTION);
+}
 
-#define _mm512_max_round_ps(A, B,  R) \
-    (__m512)__builtin_ia32_maxps512_mask(A, B, (__v16sf)_mm512_undefined_pd(), -1, R)
+#ifdef __OPTIMIZE__
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fixupimm_sd (__m128d __A, __m128d __B, __m128i __C, const int __imm)
+{
+  return (__m128d) __builtin_ia32_fixupimmsd_mask ((__v2df) __A,
+						   (__v2df) __B,
+						   (__v2di) __C, __imm,
+						   (__mmask8) -1,
+						   _MM_FROUND_CUR_DIRECTION);
+}
 
-#define _mm512_mask_max_round_ps(W, U,  A, B, R) \
-    (__m512)__builtin_ia32_maxps512_mask(A, B, W, U, R)
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fixupimm_sd (__m128d __A, __mmask8 __U, __m128d __B,
+		      __m128i __C, const int __imm)
+{
+  return (__m128d) __builtin_ia32_fixupimmsd_mask ((__v2df) __A,
+						   (__v2df) __B,
+						   (__v2di) __C, __imm,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
+}
 
-#define _mm512_maskz_max_round_ps(U, A,  B, R) \
-    (__m512)__builtin_ia32_maxps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, R)
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fixupimm_sd (__mmask8 __U, __m128d __A, __m128d __B,
+		       __m128i __C, const int __imm)
+{
+  return (__m128d) __builtin_ia32_fixupimmsd_maskz ((__v2df) __A,
+						    (__v2df) __B,
+						    (__v2di) __C,
+						    __imm,
+						    (__mmask8) __U,
+						    _MM_FROUND_CUR_DIRECTION);
+}
 
-#define _mm512_min_round_pd(A, B,  R) \
-    (__m512d)__builtin_ia32_minpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, R)
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fixupimm_ss (__m128 __A, __m128 __B, __m128i __C, const int __imm)
+{
+  return (__m128) __builtin_ia32_fixupimmss_mask ((__v4sf) __A,
+						  (__v4sf) __B,
+						  (__v4si) __C, __imm,
+						  (__mmask8) -1,
+						  _MM_FROUND_CUR_DIRECTION);
+}
 
-#define _mm512_mask_min_round_pd(W, U,  A, B, R) \
-    (__m512d)__builtin_ia32_minpd512_mask(A, B, W, U, R)
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fixupimm_ss (__m128 __A, __mmask8 __U, __m128 __B,
+		      __m128i __C, const int __imm)
+{
+  return (__m128) __builtin_ia32_fixupimmss_mask ((__v4sf) __A,
+						  (__v4sf) __B,
+						  (__v4si) __C, __imm,
+						  (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
+}
 
-#define _mm512_maskz_min_round_pd(U, A,  B, R) \
-    (__m512d)__builtin_ia32_minpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, R)
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fixupimm_ss (__mmask8 __U, __m128 __A, __m128 __B,
+		       __m128i __C, const int __imm)
+{
+  return (__m128) __builtin_ia32_fixupimmss_maskz ((__v4sf) __A,
+						   (__v4sf) __B,
+						   (__v4si) __C, __imm,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
+}
 
-#define _mm512_min_round_ps(A, B, R) \
-    (__m512)__builtin_ia32_minps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, R)
+#else
+#define _mm_fixupimm_sd(X, Y, Z, C)					\
+    ((__m128d)__builtin_ia32_fixupimmsd_mask ((__v2df)(__m128d)(X),	\
+      (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C),		\
+      (__mmask8)(-1), _MM_FROUND_CUR_DIRECTION))
 
-#define _mm512_mask_min_round_ps(W, U,  A, B, R) \
-    (__m512)__builtin_ia32_minps512_mask(A, B, W, U, R)
+#define _mm_mask_fixupimm_sd(X, U, Y, Z, C)				\
+    ((__m128d)__builtin_ia32_fixupimmsd_mask ((__v2df)(__m128d)(X),	\
+      (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C),		\
+      (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_maskz_fixupimm_sd(U, X, Y, Z, C)				\
+    ((__m128d)__builtin_ia32_fixupimmsd_maskz ((__v2df)(__m128d)(X),	\
+      (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C),		\
+      (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_fixupimm_ss(X, Y, Z, C)					\
+    ((__m128)__builtin_ia32_fixupimmss_mask ((__v4sf)(__m128)(X),	\
+      (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C),		\
+      (__mmask8)(-1), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_mask_fixupimm_ss(X, U, Y, Z, C)				\
+    ((__m128)__builtin_ia32_fixupimmss_mask ((__v4sf)(__m128)(X),	\
+      (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C),		\
+      (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_maskz_fixupimm_ss(U, X, Y, Z, C)				\
+    ((__m128)__builtin_ia32_fixupimmss_maskz ((__v4sf)(__m128)(X),	\
+      (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C),		\
+      (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
 
-#define _mm512_maskz_min_round_ps(U, A,  B, R) \
-    (__m512)__builtin_ia32_minps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, R)
 #endif
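/* A call-shape sketch for the scalar fixupimm wrappers above,
   assuming <immintrin.h> is included and -mavx512f; fixup_demo and
   its arguments are hypothetical.  The __m128i operand carries the
   4-bit token table consulted per input class, and an immediate of 0
   requests no extra fault reporting.  */
__m128d
fixup_demo (__m128d a, __m128d b, __m128i table)
{
  return _mm_fixupimm_sd (a, b, table, 0);
}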
 
-#ifdef __OPTIMIZE__
-extern __inline __m512d
+#ifdef __x86_64__
+extern __inline unsigned long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_scalef_round_pd (__m512d __A, __m512d __B, const int __R)
+_mm_cvtss_u64 (__m128 __A)
 {
-  return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A,
-						    (__v8df) __B,
-						    (__v8df)
-						    _mm512_undefined_pd (),
-						    (__mmask8) -1, __R);
+  return (unsigned long long) __builtin_ia32_vcvtss2usi64 ((__v4sf)
+							   __A,
+							   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline unsigned long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_scalef_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
-			     __m512d __B, const int __R)
+_mm_cvttss_u64 (__m128 __A)
 {
-  return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A,
-						    (__v8df) __B,
-						    (__v8df) __W,
-						    (__mmask8) __U, __R);
+  return (unsigned long long) __builtin_ia32_vcvttss2usi64 ((__v4sf)
+							    __A,
+							    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_scalef_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
-			      const int __R)
+_mm_cvttss_i64 (__m128 __A)
 {
-  return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A,
-						    (__v8df) __B,
-						    (__v8df)
-						    _mm512_setzero_pd (),
-						    (__mmask8) __U, __R);
+  return (long long) __builtin_ia32_vcvttss2si64 ((__v4sf) __A,
+						  _MM_FROUND_CUR_DIRECTION);
 }
+#endif /* __x86_64__ */
 
-extern __inline __m512
+extern __inline unsigned
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_scalef_round_ps (__m512 __A, __m512 __B, const int __R)
+_mm_cvtss_u32 (__m128 __A)
 {
-  return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A,
-						   (__v16sf) __B,
-						   (__v16sf)
-						   _mm512_undefined_ps (),
-						   (__mmask16) -1, __R);
+  return (unsigned) __builtin_ia32_vcvtss2usi32 ((__v4sf) __A,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline unsigned
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_scalef_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
-			     __m512 __B, const int __R)
+_mm_cvttss_u32 (__m128 __A)
 {
-  return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A,
-						   (__v16sf) __B,
-						   (__v16sf) __W,
-						   (__mmask16) __U, __R);
+  return (unsigned) __builtin_ia32_vcvttss2usi32 ((__v4sf) __A,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_scalef_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
-			      const int __R)
+_mm_cvttss_i32 (__m128 __A)
 {
-  return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A,
-						   (__v16sf) __B,
-						   (__v16sf)
-						   _mm512_setzero_ps (),
-						   (__mmask16) __U, __R);
+  return (int) __builtin_ia32_vcvttss2si32 ((__v4sf) __A,
+					    _MM_FROUND_CUR_DIRECTION);
 }
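/* A minimal usage sketch contrasting the conversions above, assuming
   <immintrin.h> is included and -mavx512f; cvt_demo is hypothetical.
   _mm_cvtss_u32 rounds with the current MXCSR mode while
   _mm_cvttss_u32 always truncates toward zero, so 2.7f yields 3 and
   2 respectively under the default round-to-nearest setting.  */
unsigned
cvt_demo (void)
{
  __m128 x = _mm_set_ss (2.7f);
  return _mm_cvtss_u32 (x) - _mm_cvttss_u32 (x);
}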
 
-extern __inline __m128d
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_scalef_round_sd (__m128d __A, __m128d __B, const int __R)
+_mm_cvtsd_i32 (__m128d __A)
 {
-  return (__m128d) __builtin_ia32_scalefsd_mask_round ((__v2df) __A,
-						       (__v2df) __B,
-						       (__v2df)
-						       _mm_setzero_pd (),
-						       (__mmask8) -1, __R);
+  return (int) __builtin_ia32_cvtsd2si ((__v2df) __A);
 }
 
-extern __inline __m128d
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_scalef_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
-			  const int __R)
+_mm_cvtss_i32 (__m128 __A)
 {
-  return (__m128d) __builtin_ia32_scalefsd_mask_round ((__v2df) __A,
-						       (__v2df) __B,
-						       (__v2df) __W,
-						       (__mmask8) __U, __R);
+  return (int) __builtin_ia32_cvtss2si ((__v4sf) __A);
 }
 
 extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_scalef_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
-			   const int __R)
+_mm_cvti32_sd (__m128d __A, int __B)
 {
-  return (__m128d) __builtin_ia32_scalefsd_mask_round ((__v2df) __A,
-						       (__v2df) __B,
-						       (__v2df)
-						       _mm_setzero_pd (),
-						       (__mmask8) __U, __R);
+  return (__m128d) __builtin_ia32_cvtsi2sd ((__v2df) __A, __B);
 }
 
 extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_scalef_round_ss (__m128 __A, __m128 __B, const int __R)
+_mm_cvti32_ss (__m128 __A, int __B)
 {
-  return (__m128) __builtin_ia32_scalefss_mask_round ((__v4sf) __A,
-						      (__v4sf) __B,
-						      (__v4sf)
-						      _mm_setzero_ps (),
-						      (__mmask8) -1, __R);
+  return (__m128) __builtin_ia32_cvtsi2ss ((__v4sf) __A, __B);
 }
 
-extern __inline __m128
+#ifdef __x86_64__
+extern __inline unsigned long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_scalef_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
-			 const int __R)
+_mm_cvtsd_u64 (__m128d __A)
 {
-  return (__m128) __builtin_ia32_scalefss_mask_round ((__v4sf) __A,
-						      (__v4sf) __B,
-						      (__v4sf) __W,
-						      (__mmask8) __U, __R);
+  return (unsigned long long) __builtin_ia32_vcvtsd2usi64 ((__v2df)
+							   __A,
+							   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128
+extern __inline unsigned long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_scalef_round_ss (__mmask8 __U, __m128 __A, __m128 __B, const int __R)
+_mm_cvttsd_u64 (__m128d __A)
 {
-  return (__m128) __builtin_ia32_scalefss_mask_round ((__v4sf) __A,
-						      (__v4sf) __B,
-						      (__v4sf)
-						      _mm_setzero_ps (),
-						      (__mmask8) __U, __R);
+  return (unsigned long long) __builtin_ia32_vcvttsd2usi64 ((__v2df)
+							    __A,
+							    _MM_FROUND_CUR_DIRECTION);
 }
-#else
-#define _mm512_scalef_round_pd(A, B, C)					\
-  ((__m512d)								\
-   __builtin_ia32_scalefpd512_mask((A), (B),				\
-				   (__v8df) _mm512_undefined_pd(),	\
-				   -1, (C)))
 
-#define _mm512_mask_scalef_round_pd(W, U, A, B, C)			\
-  ((__m512d) __builtin_ia32_scalefpd512_mask((A), (B), (W), (U), (C)))
-
-#define _mm512_maskz_scalef_round_pd(U, A, B, C)			\
-  ((__m512d)								\
-   __builtin_ia32_scalefpd512_mask((A), (B),				\
-				   (__v8df) _mm512_setzero_pd(),	\
-				   (U), (C)))
-
-#define _mm512_scalef_round_ps(A, B, C)					\
-  ((__m512)								\
-   __builtin_ia32_scalefps512_mask((A), (B),				\
-				   (__v16sf) _mm512_undefined_ps(),	\
-				   -1, (C)))
-
-#define _mm512_mask_scalef_round_ps(W, U, A, B, C)			\
-  ((__m512) __builtin_ia32_scalefps512_mask((A), (B), (W), (U), (C)))
-
-#define _mm512_maskz_scalef_round_ps(U, A, B, C)			\
-  ((__m512)								\
-   __builtin_ia32_scalefps512_mask((A), (B),				\
-				   (__v16sf) _mm512_setzero_ps(),	\
-				   (U), (C)))
-
-#define _mm_scalef_round_sd(A, B, C)					\
-  ((__m128d)								\
-   __builtin_ia32_scalefsd_mask_round ((A), (B),			\
-				       (__v2df) _mm_undefined_pd (),	\
-				       -1, (C)))
-
-#define _mm_scalef_round_ss(A, B, C)					\
-  ((__m128)								\
-   __builtin_ia32_scalefss_mask_round ((A), (B),			\
-				       (__v4sf) _mm_undefined_ps (),	\
-				       -1, (C)))
-
-#define _mm_mask_scalef_round_sd(W, U, A, B, C)				\
-  ((__m128d)								\
-   __builtin_ia32_scalefsd_mask_round ((A), (B), (W), (U), (C)))
-
-#define _mm_mask_scalef_round_ss(W, U, A, B, C)				\
-  ((__m128)								\
-   __builtin_ia32_scalefss_mask_round ((A), (B), (W), (U), (C)))
-
-#define _mm_maskz_scalef_round_sd(U, A, B, C)				\
-  ((__m128d)								\
-   __builtin_ia32_scalefsd_mask_round ((A), (B),			\
-				       (__v2df) _mm_setzero_pd (),	\
-				       (U), (C)))
-
-#define _mm_maskz_scalef_round_ss(U, A, B, C)				\
-  ((__m128)								\
-   __builtin_ia32_scalefss_mask_round ((A), (B),			\
-				       (__v4sf) _mm_setzero_ps (),	\
-				       (U), (C)))
-#endif
-
-#define _mm_mask_scalef_sd(W, U, A, B) \
-    _mm_mask_scalef_round_sd ((W), (U), (A), (B), _MM_FROUND_CUR_DIRECTION)
-
-#define _mm_maskz_scalef_sd(U, A, B) \
-    _mm_maskz_scalef_round_sd ((U), (A), (B), _MM_FROUND_CUR_DIRECTION)
-
-#define _mm_mask_scalef_ss(W, U, A, B) \
-    _mm_mask_scalef_round_ss ((W), (U), (A), (B), _MM_FROUND_CUR_DIRECTION)
-
-#define _mm_maskz_scalef_ss(U, A, B) \
-    _mm_maskz_scalef_round_ss ((U), (A), (B), _MM_FROUND_CUR_DIRECTION)
-
-#ifdef __OPTIMIZE__
-extern __inline __m512d
+extern __inline long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmadd_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
+_mm_cvttsd_i64 (__m128d __A)
 {
-  return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A,
-						    (__v8df) __B,
-						    (__v8df) __C,
-						    (__mmask8) -1, __R);
+  return (long long) __builtin_ia32_vcvttsd2si64 ((__v2df) __A,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmadd_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
-			    __m512d __C, const int __R)
+_mm_cvtsd_i64 (__m128d __A)
 {
-  return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A,
-						    (__v8df) __B,
-						    (__v8df) __C,
-						    (__mmask8) __U, __R);
+  return (long long) __builtin_ia32_cvtsd2si64 ((__v2df) __A);
 }
 
-extern __inline __m512d
+extern __inline long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmadd_round_pd (__m512d __A, __m512d __B, __m512d __C,
-			     __mmask8 __U, const int __R)
+_mm_cvtss_i64 (__m128 __A)
 {
-  return (__m512d) __builtin_ia32_vfmaddpd512_mask3 ((__v8df) __A,
-						     (__v8df) __B,
-						     (__v8df) __C,
-						     (__mmask8) __U, __R);
+  return (long long) __builtin_ia32_cvtss2si64 ((__v4sf) __A);
 }
 
-extern __inline __m512d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmadd_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
-			     __m512d __C, const int __R)
+_mm_cvti64_sd (__m128d __A, long long __B)
 {
-  return (__m512d) __builtin_ia32_vfmaddpd512_maskz ((__v8df) __A,
-						     (__v8df) __B,
-						     (__v8df) __C,
-						     (__mmask8) __U, __R);
+  return (__m128d) __builtin_ia32_cvtsi642sd ((__v2df) __A, __B);
 }
 
-extern __inline __m512
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmadd_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
+_mm_cvti64_ss (__m128 __A, long long __B)
 {
-  return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A,
-						   (__v16sf) __B,
-						   (__v16sf) __C,
-						   (__mmask16) -1, __R);
+  return (__m128) __builtin_ia32_cvtsi642ss ((__v4sf) __A, __B);
 }
+#endif /* __x86_64__ */
 
-extern __inline __m512
+extern __inline unsigned
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmadd_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
-			    __m512 __C, const int __R)
+_mm_cvtsd_u32 (__m128d __A)
 {
-  return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A,
-						   (__v16sf) __B,
-						   (__v16sf) __C,
-						   (__mmask16) __U, __R);
+  return (unsigned) __builtin_ia32_vcvtsd2usi32 ((__v2df) __A,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline unsigned
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmadd_round_ps (__m512 __A, __m512 __B, __m512 __C,
-			     __mmask16 __U, const int __R)
+_mm_cvttsd_u32 (__m128d __A)
 {
-  return (__m512) __builtin_ia32_vfmaddps512_mask3 ((__v16sf) __A,
-						    (__v16sf) __B,
-						    (__v16sf) __C,
-						    (__mmask16) __U, __R);
+  return (unsigned) __builtin_ia32_vcvttsd2usi32 ((__v2df) __A,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmadd_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
-			     __m512 __C, const int __R)
+_mm_cvttsd_i32 (__m128d __A)
 {
-  return (__m512) __builtin_ia32_vfmaddps512_maskz ((__v16sf) __A,
-						    (__v16sf) __B,
-						    (__v16sf) __C,
-						    (__mmask16) __U, __R);
+  return (int) __builtin_ia32_vcvttsd2si32 ((__v2df) __A,
+					    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+#ifdef __OPTIMIZE__
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmsub_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
+_mm_getexp_ss (__m128 __A, __m128 __B)
 {
-  return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A,
-						    (__v8df) __B,
-						    (__v8df) __C,
-						    (__mmask8) -1, __R);
+  return (__m128) __builtin_ia32_getexpss128_round ((__v4sf) __A,
+						    (__v4sf) __B,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmsub_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
-			    __m512d __C, const int __R)
+_mm_mask_getexp_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
 {
-  return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A,
-						    (__v8df) __B,
-						    (__v8df) __C,
-						    (__mmask8) __U, __R);
+  return (__m128) __builtin_ia32_getexpss_mask_round ((__v4sf) __A,
+						(__v4sf) __B,
+						(__v4sf) __W,
+						(__mmask8) __U,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmsub_round_pd (__m512d __A, __m512d __B, __m512d __C,
-			     __mmask8 __U, const int __R)
+_mm_maskz_getexp_ss (__mmask8 __U, __m128 __A, __m128 __B)
 {
-  return (__m512d) __builtin_ia32_vfmsubpd512_mask3 ((__v8df) __A,
-						     (__v8df) __B,
-						     (__v8df) __C,
-						     (__mmask8) __U, __R);
+  return (__m128) __builtin_ia32_getexpss_mask_round ((__v4sf) __A,
+						(__v4sf) __B,
+						(__v4sf)
+						_mm_setzero_ps (),
+						(__mmask8) __U,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmsub_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
-			     __m512d __C, const int __R)
+_mm_getexp_sd (__m128d __A, __m128d __B)
 {
-  return (__m512d) __builtin_ia32_vfmsubpd512_maskz ((__v8df) __A,
-						     (__v8df) __B,
-						     (__v8df) __C,
-						     (__mmask8) __U, __R);
+  return (__m128d) __builtin_ia32_getexpsd128_round ((__v2df) __A,
+						     (__v2df) __B,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmsub_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
+_mm_mask_getexp_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
 {
-  return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A,
-						   (__v16sf) __B,
-						   (__v16sf) __C,
-						   (__mmask16) -1, __R);
+  return (__m128d) __builtin_ia32_getexpsd_mask_round ((__v2df) __A,
+						(__v2df) __B,
+						(__v2df) __W,
+						(__mmask8) __U,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmsub_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
-			    __m512 __C, const int __R)
+_mm_maskz_getexp_sd (__mmask8 __U, __m128d __A, __m128d __B)
 {
-  return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A,
-						   (__v16sf) __B,
-						   (__v16sf) __C,
-						   (__mmask16) __U, __R);
+  return (__m128d) __builtin_ia32_getexpsd_mask_round ((__v2df) __A,
+						(__v2df) __B,
+						(__v2df)
+						_mm_setzero_pd (),
+						(__mmask8) __U,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmsub_round_ps (__m512 __A, __m512 __B, __m512 __C,
-			     __mmask16 __U, const int __R)
+_mm_getmant_sd (__m128d __A, __m128d __B, _MM_MANTISSA_NORM_ENUM __C,
+		_MM_MANTISSA_SIGN_ENUM __D)
 {
-  return (__m512) __builtin_ia32_vfmsubps512_mask3 ((__v16sf) __A,
-						    (__v16sf) __B,
-						    (__v16sf) __C,
-						    (__mmask16) __U, __R);
+  return (__m128d) __builtin_ia32_getmantsd_round ((__v2df) __A,
+						   (__v2df) __B,
+						   (__D << 2) | __C,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
-			     __m512 __C, const int __R)
+_mm_mask_getmant_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
+			_MM_MANTISSA_NORM_ENUM __C, _MM_MANTISSA_SIGN_ENUM __D)
 {
-  return (__m512) __builtin_ia32_vfmsubps512_maskz ((__v16sf) __A,
-						    (__v16sf) __B,
-						    (__v16sf) __C,
-						    (__mmask16) __U, __R);
-}
-
-extern __inline __m512d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmaddsub_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
-{
-  return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
-						       (__v8df) __B,
-						       (__v8df) __C,
-						       (__mmask8) -1, __R);
+  return (__m128d) __builtin_ia32_getmantsd_mask_round ((__v2df) __A,
+						(__v2df) __B,
+						(__D << 2) | __C,
+						(__v2df) __W,
+						__U,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmaddsub_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
-			       __m512d __C, const int __R)
+_mm_maskz_getmant_sd (__mmask8 __U, __m128d __A, __m128d __B,
+			 _MM_MANTISSA_NORM_ENUM __C, _MM_MANTISSA_SIGN_ENUM __D)
 {
-  return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
-						       (__v8df) __B,
-						       (__v8df) __C,
-						       (__mmask8) __U, __R);
+  return (__m128d) __builtin_ia32_getmantsd_mask_round ((__v2df) __A,
+						(__v2df) __B,
+						(__D << 2) | __C,
+						(__v2df)
+						_mm_setzero_pd (),
+						__U,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmaddsub_round_pd (__m512d __A, __m512d __B, __m512d __C,
-				__mmask8 __U, const int __R)
+_mm_getmant_ss (__m128 __A, __m128 __B, _MM_MANTISSA_NORM_ENUM __C,
+		_MM_MANTISSA_SIGN_ENUM __D)
 {
-  return (__m512d) __builtin_ia32_vfmaddsubpd512_mask3 ((__v8df) __A,
-							(__v8df) __B,
-							(__v8df) __C,
-							(__mmask8) __U, __R);
+  return (__m128) __builtin_ia32_getmantss_round ((__v4sf) __A,
+						  (__v4sf) __B,
+						  (__D << 2) | __C,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmaddsub_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
-				__m512d __C, const int __R)
+_mm_mask_getmant_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
+			_MM_MANTISSA_NORM_ENUM __C, _MM_MANTISSA_SIGN_ENUM __D)
 {
-  return (__m512d) __builtin_ia32_vfmaddsubpd512_maskz ((__v8df) __A,
-							(__v8df) __B,
-							(__v8df) __C,
-							(__mmask8) __U, __R);
+  return (__m128) __builtin_ia32_getmantss_mask_round ((__v4sf) __A,
+						(__v4sf) __B,
+						(__D << 2) | __C,
+						(__v4sf) __W,
+						__U,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmaddsub_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
+_mm_maskz_getmant_ss (__mmask8 __U, __m128 __A, __m128 __B,
+			 _MM_MANTISSA_NORM_ENUM __C, _MM_MANTISSA_SIGN_ENUM __D)
 {
-  return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
-						      (__v16sf) __B,
-						      (__v16sf) __C,
-						      (__mmask16) -1, __R);
+  return (__m128) __builtin_ia32_getmantss_mask_round ((__v4sf) __A,
+						(__v4sf) __B,
+						(__D << 2) | __C,
+						(__v4sf)
+						_mm_setzero_ps (),
+						__U,
+						_MM_FROUND_CUR_DIRECTION);
 }
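/* A minimal usage sketch for the scalar getexp/getmant wrappers
   above, assuming <immintrin.h> is included and -mavx512f; exp_of_low
   is a hypothetical helper.  vgetexpsd extracts floor(log2(|x|)) of
   the low element as a double (3.0 for x == 8.0); vgetmantsd returns
   the mantissa normalized to the interval selected by the
   _MM_MANT_NORM_* enumerator.  */
__m128d
exp_of_low (__m128d a)
{
  return _mm_getexp_sd (a, a);
}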
 
-extern __inline __m512
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmaddsub_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
-			       __m512 __C, const int __R)
+_mm_roundscale_ss (__m128 __A, __m128 __B, const int __imm)
 {
-  return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
-						      (__v16sf) __B,
-						      (__v16sf) __C,
-						      (__mmask16) __U, __R);
+  return (__m128)
+    __builtin_ia32_rndscaless_mask_round ((__v4sf) __A,
+					  (__v4sf) __B, __imm,
+					  (__v4sf)
+					  _mm_setzero_ps (),
+					  (__mmask8) -1,
+					  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmaddsub_round_ps (__m512 __A, __m512 __B, __m512 __C,
-				__mmask16 __U, const int __R)
-{
-  return (__m512) __builtin_ia32_vfmaddsubps512_mask3 ((__v16sf) __A,
-						       (__v16sf) __B,
-						       (__v16sf) __C,
-						       (__mmask16) __U, __R);
-}
 
-extern __inline __m512
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmaddsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
-				__m512 __C, const int __R)
+_mm_mask_roundscale_ss (__m128 __A, __mmask8 __B, __m128 __C, __m128 __D,
+			const int __imm)
 {
-  return (__m512) __builtin_ia32_vfmaddsubps512_maskz ((__v16sf) __A,
-						       (__v16sf) __B,
-						       (__v16sf) __C,
-						       (__mmask16) __U, __R);
+  return (__m128)
+    __builtin_ia32_rndscaless_mask_round ((__v4sf) __C,
+					  (__v4sf) __D, __imm,
+					  (__v4sf) __A,
+					  (__mmask8) __B,
+					  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmsubadd_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
+_mm_maskz_roundscale_ss (__mmask8 __A, __m128 __B, __m128 __C,
+			 const int __imm)
 {
-  return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
-						       (__v8df) __B,
-						       -(__v8df) __C,
-						       (__mmask8) -1, __R);
+  return (__m128)
+    __builtin_ia32_rndscaless_mask_round ((__v4sf) __B,
+					  (__v4sf) __C, __imm,
+					  (__v4sf)
+					  _mm_setzero_ps (),
+					  (__mmask8) __A,
+					  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmsubadd_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
-			       __m512d __C, const int __R)
+_mm_roundscale_sd (__m128d __A, __m128d __B, const int __imm)
 {
-  return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
-						       (__v8df) __B,
-						       -(__v8df) __C,
-						       (__mmask8) __U, __R);
+  return (__m128d)
+    __builtin_ia32_rndscalesd_mask_round ((__v2df) __A,
+					  (__v2df) __B, __imm,
+					  (__v2df)
+					  _mm_setzero_pd (),
+					  (__mmask8) -1,
+					  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmsubadd_round_pd (__m512d __A, __m512d __B, __m512d __C,
-				__mmask8 __U, const int __R)
+_mm_mask_roundscale_sd (__m128d __A, __mmask8 __B, __m128d __C, __m128d __D,
+			const int __imm)
 {
-  return (__m512d) __builtin_ia32_vfmsubaddpd512_mask3 ((__v8df) __A,
-							(__v8df) __B,
-							(__v8df) __C,
-							(__mmask8) __U, __R);
+  return (__m128d)
+    __builtin_ia32_rndscalesd_mask_round ((__v2df) __C,
+					  (__v2df) __D, __imm,
+					  (__v2df) __A,
+					  (__mmask8) __B,
+					  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmsubadd_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
-				__m512d __C, const int __R)
+_mm_maskz_roundscale_sd (__mmask8 __A, __m128d __B, __m128d __C,
+			 const int __imm)
 {
-  return (__m512d) __builtin_ia32_vfmaddsubpd512_maskz ((__v8df) __A,
-							(__v8df) __B,
-							-(__v8df) __C,
-							(__mmask8) __U, __R);
+  return (__m128d)
+    __builtin_ia32_rndscalesd_mask_round ((__v2df) __B,
+					  (__v2df) __C, __imm,
+					  (__v2df)
+					  _mm_setzero_pd (),
+					  (__mmask8) __A,
+					  _MM_FROUND_CUR_DIRECTION);
 }
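/* A minimal usage sketch for the scalar roundscale wrappers above,
   assuming <immintrin.h> is included and -mavx512f; round_lo is a
   hypothetical helper.  The immediate's high nibble M keeps M
   fraction bits (the result is 2**-M * round(2**M * x)), so an
   immediate of 0 rounds the low element of b to the nearest integer,
   ties to even.  */
__m128d
round_lo (__m128d a, __m128d b)
{
  return _mm_roundscale_sd (a, b, 0);
}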
 
-extern __inline __m512
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmsubadd_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
+_mm_cmp_sd_mask (__m128d __X, __m128d __Y, const int __P)
 {
-  return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
-						      (__v16sf) __B,
-						      -(__v16sf) __C,
-						      (__mmask16) -1, __R);
+  return (__mmask8) __builtin_ia32_cmpsd_mask ((__v2df) __X,
+					       (__v2df) __Y, __P,
+					       (__mmask8) -1,
+					       _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmsubadd_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
-			       __m512 __C, const int __R)
+_mm_mask_cmp_sd_mask (__mmask8 __M, __m128d __X, __m128d __Y, const int __P)
 {
-  return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
-						      (__v16sf) __B,
-						      -(__v16sf) __C,
-						      (__mmask16) __U, __R);
+  return (__mmask8) __builtin_ia32_cmpsd_mask ((__v2df) __X,
+					       (__v2df) __Y, __P,
+					       (__mmask8) __M,
+					       _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmsubadd_round_ps (__m512 __A, __m512 __B, __m512 __C,
-				__mmask16 __U, const int __R)
+_mm_cmp_ss_mask (__m128 __X, __m128 __Y, const int __P)
 {
-  return (__m512) __builtin_ia32_vfmsubaddps512_mask3 ((__v16sf) __A,
-						       (__v16sf) __B,
-						       (__v16sf) __C,
-						       (__mmask16) __U, __R);
+  return (__mmask8) __builtin_ia32_cmpss_mask ((__v4sf) __X,
+					       (__v4sf) __Y, __P,
+					       (__mmask8) -1,
+					       _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmsubadd_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
-				__m512 __C, const int __R)
+_mm_mask_cmp_ss_mask (__mmask8 __M, __m128 __X, __m128 __Y, const int __P)
 {
-  return (__m512) __builtin_ia32_vfmaddsubps512_maskz ((__v16sf) __A,
-						       (__v16sf) __B,
-						       -(__v16sf) __C,
-						       (__mmask16) __U, __R);
+  return (__mmask8) __builtin_ia32_cmpss_mask ((__v4sf) __X,
+					       (__v4sf) __Y, __P,
+					       (__mmask8) __M,
+					       _MM_FROUND_CUR_DIRECTION);
 }
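/* A minimal usage sketch for the scalar compare-into-mask wrappers
   above, assuming <immintrin.h> is included (for _CMP_LT_OS) and
   -mavx512f; lt_lo is a hypothetical helper.  Bit 0 of the returned
   mask holds the outcome; the remaining bits are zero.  */
__mmask8
lt_lo (__m128d x, __m128d y)
{
  return _mm_cmp_sd_mask (x, y, _CMP_LT_OS);
}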
 
-extern __inline __m512d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fnmadd_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
-{
-  return (__m512d) __builtin_ia32_vfnmaddpd512_mask ((__v8df) __A,
-						     (__v8df) __B,
-						     (__v8df) __C,
-						     (__mmask8) -1, __R);
-}
+#else
+#define _mm_getmant_sd(X, Y, C, D)					\
+  ((__m128d)__builtin_ia32_getmantsd_round ((__v2df)(__m128d)(X),	\
+					    (__v2df)(__m128d)(Y),	\
+					    (int)(((D)<<2) | (C)),	\
+					    _MM_FROUND_CUR_DIRECTION))
 
-extern __inline __m512d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fnmadd_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
-			     __m512d __C, const int __R)
-{
-  return (__m512d) __builtin_ia32_vfnmaddpd512_mask ((__v8df) __A,
-						     (__v8df) __B,
-						     (__v8df) __C,
-						     (__mmask8) __U, __R);
-}
+#define _mm_mask_getmant_sd(W, U, X, Y, C, D)				\
+  ((__m128d)__builtin_ia32_getmantsd_mask_round ((__v2df)(__m128d)(X),	\
+						 (__v2df)(__m128d)(Y),	\
+						 (int)(((D)<<2) | (C)),	\
+						 (__v2df)(__m128d)(W),	\
+						 (__mmask8)(U),		\
+						 _MM_FROUND_CUR_DIRECTION))
 
-extern __inline __m512d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fnmadd_round_pd (__m512d __A, __m512d __B, __m512d __C,
-			      __mmask8 __U, const int __R)
-{
-  return (__m512d) __builtin_ia32_vfnmaddpd512_mask3 ((__v8df) __A,
-						      (__v8df) __B,
-						      (__v8df) __C,
-						      (__mmask8) __U, __R);
-}
+#define _mm_maskz_getmant_sd(U, X, Y, C, D)				\
+  ((__m128d)__builtin_ia32_getmantsd_mask_round ((__v2df)(__m128d)(X),	\
+						 (__v2df)(__m128d)(Y),	\
+						 (int)(((D)<<2) | (C)),	\
+						 (__v2df)_mm_setzero_pd(), \
+						 (__mmask8)(U),		\
+						 _MM_FROUND_CUR_DIRECTION))
 
-extern __inline __m512d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fnmadd_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
-			      __m512d __C, const int __R)
-{
-  return (__m512d) __builtin_ia32_vfnmaddpd512_maskz ((__v8df) __A,
-						      (__v8df) __B,
-						      (__v8df) __C,
-						      (__mmask8) __U, __R);
-}
+#define _mm_getmant_ss(X, Y, C, D)					\
+  ((__m128)__builtin_ia32_getmantss_round ((__v4sf)(__m128)(X),	\
+					   (__v4sf)(__m128)(Y),		\
+					   (int)(((D)<<2) | (C)),	\
+					   _MM_FROUND_CUR_DIRECTION))
 
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fnmadd_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
-{
-  return (__m512) __builtin_ia32_vfnmaddps512_mask ((__v16sf) __A,
-						    (__v16sf) __B,
-						    (__v16sf) __C,
-						    (__mmask16) -1, __R);
-}
+#define _mm_mask_getmant_ss(W, U, X, Y, C, D)				\
+  ((__m128)__builtin_ia32_getmantss_mask_round ((__v4sf)(__m128)(X),	\
+						(__v4sf)(__m128)(Y),	\
+						(int)(((D)<<2) | (C)),	\
+						(__v4sf)(__m128)(W),	\
+						(__mmask8)(U),		\
+						_MM_FROUND_CUR_DIRECTION))
 
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fnmadd_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
-			     __m512 __C, const int __R)
-{
-  return (__m512) __builtin_ia32_vfnmaddps512_mask ((__v16sf) __A,
-						    (__v16sf) __B,
-						    (__v16sf) __C,
-						    (__mmask16) __U, __R);
-}
+#define _mm_maskz_getmant_ss(U, X, Y, C, D)                                        \
+  ((__m128)__builtin_ia32_getmantss_mask_round ((__v4sf)(__m128)(X),               \
+                                                (__v4sf)(__m128)(Y),               \
+                                                (int)(((D)<<2) | (C)),             \
+                                                (__v4sf)_mm_setzero_ps (),         \
+                                                (__mmask8)(U),                     \
+                                                _MM_FROUND_CUR_DIRECTION))
 
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fnmadd_round_ps (__m512 __A, __m512 __B, __m512 __C,
-			      __mmask16 __U, const int __R)
-{
-  return (__m512) __builtin_ia32_vfnmaddps512_mask3 ((__v16sf) __A,
-						     (__v16sf) __B,
-						     (__v16sf) __C,
-						     (__mmask16) __U, __R);
-}
+#define _mm_getexp_ss(A, B)						      \
+  ((__m128)__builtin_ia32_getexpss128_round((__v4sf)(__m128)(A), (__v4sf)(__m128)(B),  \
+					   _MM_FROUND_CUR_DIRECTION))
 
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fnmadd_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
-			      __m512 __C, const int __R)
-{
-  return (__m512) __builtin_ia32_vfnmaddps512_maskz ((__v16sf) __A,
-						     (__v16sf) __B,
-						     (__v16sf) __C,
-						     (__mmask16) __U, __R);
-}
+#define _mm_mask_getexp_ss(W, U, A, B)                                  \
+  ((__m128)__builtin_ia32_getexpss_mask_round ((__v4sf)(__m128)(A),     \
+                                               (__v4sf)(__m128)(B),     \
+                                               (__v4sf)(__m128)(W),     \
+                                               (__mmask8)(U),           \
+                                               _MM_FROUND_CUR_DIRECTION))
 
-extern __inline __m512d
+#define _mm_maskz_getexp_ss(U, A, B)                                    \
+  ((__m128)__builtin_ia32_getexpss_mask_round ((__v4sf)(__m128)(A),     \
+                                               (__v4sf)(__m128)(B),     \
+                                               (__v4sf)_mm_setzero_ps (),\
+                                               (__mmask8)(U),           \
+                                               _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_getexp_sd(A, B)						       \
+  ((__m128d)__builtin_ia32_getexpsd128_round((__v2df)(__m128d)(A), (__v2df)(__m128d)(B),\
+					    _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_mask_getexp_sd(W, U, A, B)                                   \
+  ((__m128d)__builtin_ia32_getexpsd_mask_round ((__v2df)(__m128d)(A),    \
+                                                (__v2df)(__m128d)(B),    \
+                                                (__v2df)(__m128d)(W),    \
+                                                (__mmask8)(U),           \
+                                                _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_maskz_getexp_sd(U, A, B)                                     \
+  ((__m128d)__builtin_ia32_getexpsd_mask_round ((__v2df)(__m128d)(A),    \
+                                                (__v2df)(__m128d)(B),    \
+                                                (__v2df)_mm_setzero_pd (),\
+                                                (__mmask8)(U),           \
+                                                _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_roundscale_ss(A, B, I)					\
+  ((__m128)								\
+   __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (A),		\
+					 (__v4sf) (__m128) (B),		\
+					 (int) (I),			\
+					 (__v4sf) _mm_setzero_ps (),	\
+					 (__mmask8) (-1),		\
+					 _MM_FROUND_CUR_DIRECTION))
+#define _mm_mask_roundscale_ss(A, U, B, C, I)				\
+  ((__m128)								\
+   __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (B),		\
+					 (__v4sf) (__m128) (C),		\
+					 (int) (I),			\
+					 (__v4sf) (__m128) (A),		\
+					 (__mmask8) (U),		\
+					 _MM_FROUND_CUR_DIRECTION))
+#define _mm_maskz_roundscale_ss(U, A, B, I)				\
+  ((__m128)								\
+   __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (A),		\
+					 (__v4sf) (__m128) (B),		\
+					 (int) (I),			\
+					 (__v4sf) _mm_setzero_ps (),	\
+					 (__mmask8) (U),		\
+					 _MM_FROUND_CUR_DIRECTION))
+#define _mm_roundscale_sd(A, B, I)					\
+  ((__m128d)								\
+   __builtin_ia32_rndscalesd_mask_round ((__v2df) (__m128d) (A),	\
+					 (__v2df) (__m128d) (B),	\
+					 (int) (I),			\
+					 (__v2df) _mm_setzero_pd (),	\
+					 (__mmask8) (-1),		\
+					 _MM_FROUND_CUR_DIRECTION))
+#define _mm_mask_roundscale_sd(A, U, B, C, I)				\
+  ((__m128d)								\
+   __builtin_ia32_rndscalesd_mask_round ((__v2df) (__m128d) (B),	\
+					 (__v2df) (__m128d) (C),	\
+					 (int) (I),			\
+					 (__v2df) (__m128d) (A),	\
+					 (__mmask8) (U),		\
+					 _MM_FROUND_CUR_DIRECTION))
+#define _mm_maskz_roundscale_sd(U, A, B, I)				\
+  ((__m128d)								\
+   __builtin_ia32_rndscalesd_mask_round ((__v2df) (__m128d) (A),	\
+					 (__v2df) (__m128d) (B),	\
+					 (int) (I),			\
+					 (__v2df) _mm_setzero_pd (),	\
+					 (__mmask8) (U),		\
+					 _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_cmp_sd_mask(X, Y, P)					\
+  ((__mmask8) __builtin_ia32_cmpsd_mask ((__v2df)(__m128d)(X),		\
+					 (__v2df)(__m128d)(Y), (int)(P),\
+					 (__mmask8)-1,			\
+					 _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_mask_cmp_sd_mask(M, X, Y, P)				\
+  ((__mmask8) __builtin_ia32_cmpsd_mask ((__v2df)(__m128d)(X),		\
+					 (__v2df)(__m128d)(Y), (int)(P),\
+					 (__mmask8)(M),			\
+					 _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_cmp_ss_mask(X, Y, P)					\
+  ((__mmask8) __builtin_ia32_cmpss_mask ((__v4sf)(__m128)(X),		\
+					 (__v4sf)(__m128)(Y), (int)(P),	\
+					 (__mmask8)-1,			\
+					 _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_mask_cmp_ss_mask(M, X, Y, P)				\
+  ((__mmask8) __builtin_ia32_cmpss_mask ((__v4sf)(__m128)(X),		\
+					 (__v4sf)(__m128)(Y), (int)(P),	\
+					 (__mmask8)(M),			\
+					 _MM_FROUND_CUR_DIRECTION))
+
+#endif
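The four compare macros above return their result through a mask register
rather than a vector. As an illustrative usage sketch (hypothetical helper
lt_sd, assuming <immintrin.h> and an -mavx512f compile):

#include <immintrin.h>

/* Return 1 iff the low double of __A is strictly less than the low
   double of __B (ordered, signaling).  Bit 0 of the mask holds the
   scalar compare result.  */
static int
lt_sd (__m128d __A, __m128d __B)
{
  return (int) (_mm_cmp_sd_mask (__A, __B, _CMP_LT_OS) & 1);
}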
+
+#ifdef __DISABLE_AVX512F__
+#undef __DISABLE_AVX512F__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX512F__ */
+
+#if !defined (__AVX512F__) || !defined (__EVEX512__)
+#pragma GCC push_options
+#pragma GCC target("avx512f,evex512")
+#define __DISABLE_AVX512F_512__
+#endif /* __AVX512F__ && __EVEX512__ */
+
+/* Internal data types for implementing the intrinsics.  */
+typedef double __v8df __attribute__ ((__vector_size__ (64)));
+typedef float __v16sf __attribute__ ((__vector_size__ (64)));
+typedef long long __v8di __attribute__ ((__vector_size__ (64)));
+typedef unsigned long long __v8du __attribute__ ((__vector_size__ (64)));
+typedef int __v16si __attribute__ ((__vector_size__ (64)));
+typedef unsigned int __v16su __attribute__ ((__vector_size__ (64)));
+typedef short __v32hi __attribute__ ((__vector_size__ (64)));
+typedef unsigned short __v32hu __attribute__ ((__vector_size__ (64)));
+typedef char __v64qi __attribute__ ((__vector_size__ (64)));
+typedef unsigned char __v64qu __attribute__ ((__vector_size__ (64)));
+
+/* The Intel API is flexible enough that we must allow aliasing with other
+   vector types, and their scalar components.  */
+typedef float __m512 __attribute__ ((__vector_size__ (64), __may_alias__));
+typedef long long __m512i __attribute__ ((__vector_size__ (64), __may_alias__));
+typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__));
+
+/* Unaligned version of the same type.  */
+typedef float __m512_u __attribute__ ((__vector_size__ (64), __may_alias__, __aligned__ (1)));
+typedef long long __m512i_u __attribute__ ((__vector_size__ (64), __may_alias__, __aligned__ (1)));
+typedef double __m512d_u __attribute__ ((__vector_size__ (64), __may_alias__, __aligned__ (1)));
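The _u variants differ only in the __aligned__ (1) attribute, which is what
lets an ordinary dereference compile to an unaligned 64-byte access. A
minimal sketch of the idiom (hypothetical helper; the loadu/storeu
intrinsics elsewhere in the headers are built the same way):

/* Unaligned 64-byte load: the __m512i_u pointee drops the 64-byte
   alignment requirement, so this compiles to an unaligned vmovdqu
   form instead of an aligned vmovdqa64.  */
static __inline __m512i
load_si512_unaligned (void const *__P)
{
  return *(const __m512i_u *) __P;
}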
+
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fnmsub_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
+_mm512_int2mask (int __M)
 {
-  return (__m512d) __builtin_ia32_vfnmsubpd512_mask ((__v8df) __A,
-						     (__v8df) __B,
-						     (__v8df) __C,
-						     (__mmask8) -1, __R);
+  return (__mmask16) __M;
 }
 
-extern __inline __m512d
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fnmsub_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
-			     __m512d __C, const int __R)
+_mm512_mask2int (__mmask16 __M)
 {
-  return (__m512d) __builtin_ia32_vfnmsubpd512_mask ((__v8df) __A,
-						     (__v8df) __B,
-						     (__v8df) __C,
-						     (__mmask8) __U, __R);
+  return (int) __M;
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fnmsub_round_pd (__m512d __A, __m512d __B, __m512d __C,
-			      __mmask8 __U, const int __R)
+_mm512_set_epi64 (long long __A, long long __B, long long __C,
+		  long long __D, long long __E, long long __F,
+		  long long __G, long long __H)
 {
-  return (__m512d) __builtin_ia32_vfnmsubpd512_mask3 ((__v8df) __A,
-						      (__v8df) __B,
-						      (__v8df) __C,
-						      (__mmask8) __U, __R);
+  return __extension__ (__m512i) (__v8di)
+	 { __H, __G, __F, __E, __D, __C, __B, __A };
 }
 
-extern __inline __m512d
+/* Create the vector [A B C D E F G H I J K L M N O P].  */
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fnmsub_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
-			      __m512d __C, const int __R)
+_mm512_set_epi32 (int __A, int __B, int __C, int __D,
+		  int __E, int __F, int __G, int __H,
+		  int __I, int __J, int __K, int __L,
+		  int __M, int __N, int __O, int __P)
 {
-  return (__m512d) __builtin_ia32_vfnmsubpd512_maskz ((__v8df) __A,
-						      (__v8df) __B,
-						      (__v8df) __C,
-						      (__mmask8) __U, __R);
+  return __extension__ (__m512i)(__v16si)
+	 { __P, __O, __N, __M, __L, __K, __J, __I,
+	   __H, __G, __F, __E, __D, __C, __B, __A };
 }
 
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fnmsub_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set_epi16 (short __q31, short __q30, short __q29, short __q28,
+		  short __q27, short __q26, short __q25, short __q24,
+		  short __q23, short __q22, short __q21, short __q20,
+		  short __q19, short __q18, short __q17, short __q16,
+		  short __q15, short __q14, short __q13, short __q12,
+		  short __q11, short __q10, short __q09, short __q08,
+		  short __q07, short __q06, short __q05, short __q04,
+		  short __q03, short __q02, short __q01, short __q00)
 {
-  return (__m512) __builtin_ia32_vfnmsubps512_mask ((__v16sf) __A,
-						    (__v16sf) __B,
-						    (__v16sf) __C,
-						    (__mmask16) -1, __R);
+  return __extension__ (__m512i)(__v32hi){
+    __q00, __q01, __q02, __q03, __q04, __q05, __q06, __q07,
+    __q08, __q09, __q10, __q11, __q12, __q13, __q14, __q15,
+    __q16, __q17, __q18, __q19, __q20, __q21, __q22, __q23,
+    __q24, __q25, __q26, __q27, __q28, __q29, __q30, __q31
+  };
 }
 
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fnmsub_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
-			     __m512 __C, const int __R)
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set_epi8 (char __q63, char __q62, char __q61, char __q60,
+		 char __q59, char __q58, char __q57, char __q56,
+		 char __q55, char __q54, char __q53, char __q52,
+		 char __q51, char __q50, char __q49, char __q48,
+		 char __q47, char __q46, char __q45, char __q44,
+		 char __q43, char __q42, char __q41, char __q40,
+		 char __q39, char __q38, char __q37, char __q36,
+		 char __q35, char __q34, char __q33, char __q32,
+		 char __q31, char __q30, char __q29, char __q28,
+		 char __q27, char __q26, char __q25, char __q24,
+		 char __q23, char __q22, char __q21, char __q20,
+		 char __q19, char __q18, char __q17, char __q16,
+		 char __q15, char __q14, char __q13, char __q12,
+		 char __q11, char __q10, char __q09, char __q08,
+		 char __q07, char __q06, char __q05, char __q04,
+		 char __q03, char __q02, char __q01, char __q00)
 {
-  return (__m512) __builtin_ia32_vfnmsubps512_mask ((__v16sf) __A,
-						    (__v16sf) __B,
-						    (__v16sf) __C,
-						    (__mmask16) __U, __R);
+  return __extension__ (__m512i)(__v64qi){
+    __q00, __q01, __q02, __q03, __q04, __q05, __q06, __q07,
+    __q08, __q09, __q10, __q11, __q12, __q13, __q14, __q15,
+    __q16, __q17, __q18, __q19, __q20, __q21, __q22, __q23,
+    __q24, __q25, __q26, __q27, __q28, __q29, __q30, __q31,
+    __q32, __q33, __q34, __q35, __q36, __q37, __q38, __q39,
+    __q40, __q41, __q42, __q43, __q44, __q45, __q46, __q47,
+    __q48, __q49, __q50, __q51, __q52, __q53, __q54, __q55,
+    __q56, __q57, __q58, __q59, __q60, __q61, __q62, __q63
+  };
 }
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fnmsub_round_ps (__m512 __A, __m512 __B, __m512 __C,
-			      __mmask16 __U, const int __R)
+_mm512_set_pd (double __A, double __B, double __C, double __D,
+	       double __E, double __F, double __G, double __H)
 {
-  return (__m512) __builtin_ia32_vfnmsubps512_mask3 ((__v16sf) __A,
-						     (__v16sf) __B,
-						     (__v16sf) __C,
-						     (__mmask16) __U, __R);
+  return __extension__ (__m512d)
+	 { __H, __G, __F, __E, __D, __C, __B, __A };
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fnmsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
-			      __m512 __C, const int __R)
+_mm512_set_ps (float __A, float __B, float __C, float __D,
+	       float __E, float __F, float __G, float __H,
+	       float __I, float __J, float __K, float __L,
+	       float __M, float __N, float __O, float __P)
 {
-  return (__m512) __builtin_ia32_vfnmsubps512_maskz ((__v16sf) __A,
-						     (__v16sf) __B,
-						     (__v16sf) __C,
-						     (__mmask16) __U, __R);
+  return __extension__ (__m512)
+	 { __P, __O, __N, __M, __L, __K, __J, __I,
+	   __H, __G, __F, __E, __D, __C, __B, __A };
 }
-#else
-#define _mm512_fmadd_round_pd(A, B, C, R)            \
-    (__m512d)__builtin_ia32_vfmaddpd512_mask(A, B, C, -1, R)
-
-#define _mm512_mask_fmadd_round_pd(A, U, B, C, R)    \
-    (__m512d)__builtin_ia32_vfmaddpd512_mask(A, B, C, U, R)
-
-#define _mm512_mask3_fmadd_round_pd(A, B, C, U, R)   \
-    (__m512d)__builtin_ia32_vfmaddpd512_mask3(A, B, C, U, R)
-
-#define _mm512_maskz_fmadd_round_pd(U, A, B, C, R)   \
-    (__m512d)__builtin_ia32_vfmaddpd512_maskz(A, B, C, U, R)
-
-#define _mm512_fmadd_round_ps(A, B, C, R)            \
-    (__m512)__builtin_ia32_vfmaddps512_mask(A, B, C, -1, R)
-
-#define _mm512_mask_fmadd_round_ps(A, U, B, C, R)    \
-    (__m512)__builtin_ia32_vfmaddps512_mask(A, B, C, U, R)
-
-#define _mm512_mask3_fmadd_round_ps(A, B, C, U, R)   \
-    (__m512)__builtin_ia32_vfmaddps512_mask3(A, B, C, U, R)
-
-#define _mm512_maskz_fmadd_round_ps(U, A, B, C, R)   \
-    (__m512)__builtin_ia32_vfmaddps512_maskz(A, B, C, U, R)
-
-#define _mm512_fmsub_round_pd(A, B, C, R)            \
-    (__m512d)__builtin_ia32_vfmsubpd512_mask(A, B, C, -1, R)
-
-#define _mm512_mask_fmsub_round_pd(A, U, B, C, R)    \
-    (__m512d)__builtin_ia32_vfmsubpd512_mask(A, B, C, U, R)
-
-#define _mm512_mask3_fmsub_round_pd(A, B, C, U, R)   \
-    (__m512d)__builtin_ia32_vfmsubpd512_mask3(A, B, C, U, R)
-
-#define _mm512_maskz_fmsub_round_pd(U, A, B, C, R)   \
-    (__m512d)__builtin_ia32_vfmsubpd512_maskz(A, B, C, U, R)
-
-#define _mm512_fmsub_round_ps(A, B, C, R)            \
-    (__m512)__builtin_ia32_vfmsubps512_mask(A, B, C, -1, R)
-
-#define _mm512_mask_fmsub_round_ps(A, U, B, C, R)    \
-    (__m512)__builtin_ia32_vfmsubps512_mask(A, B, C, U, R)
-
-#define _mm512_mask3_fmsub_round_ps(A, B, C, U, R)   \
-    (__m512)__builtin_ia32_vfmsubps512_mask3(A, B, C, U, R)
-
-#define _mm512_maskz_fmsub_round_ps(U, A, B, C, R)   \
-    (__m512)__builtin_ia32_vfmsubps512_maskz(A, B, C, U, R)
 
-#define _mm512_fmaddsub_round_pd(A, B, C, R)            \
-    (__m512d)__builtin_ia32_vfmaddsubpd512_mask(A, B, C, -1, R)
-
-#define _mm512_mask_fmaddsub_round_pd(A, U, B, C, R)    \
-    (__m512d)__builtin_ia32_vfmaddsubpd512_mask(A, B, C, U, R)
-
-#define _mm512_mask3_fmaddsub_round_pd(A, B, C, U, R)   \
-    (__m512d)__builtin_ia32_vfmaddsubpd512_mask3(A, B, C, U, R)
-
-#define _mm512_maskz_fmaddsub_round_pd(U, A, B, C, R)   \
-    (__m512d)__builtin_ia32_vfmaddsubpd512_maskz(A, B, C, U, R)
-
-#define _mm512_fmaddsub_round_ps(A, B, C, R)            \
-    (__m512)__builtin_ia32_vfmaddsubps512_mask(A, B, C, -1, R)
+#define _mm512_setr_epi64(e0,e1,e2,e3,e4,e5,e6,e7)			      \
+  _mm512_set_epi64(e7,e6,e5,e4,e3,e2,e1,e0)
 
-#define _mm512_mask_fmaddsub_round_ps(A, U, B, C, R)    \
-    (__m512)__builtin_ia32_vfmaddsubps512_mask(A, B, C, U, R)
+#define _mm512_setr_epi32(e0,e1,e2,e3,e4,e5,e6,e7,			      \
+			  e8,e9,e10,e11,e12,e13,e14,e15)		      \
+  _mm512_set_epi32(e15,e14,e13,e12,e11,e10,e9,e8,e7,e6,e5,e4,e3,e2,e1,e0)
 
-#define _mm512_mask3_fmaddsub_round_ps(A, B, C, U, R)   \
-    (__m512)__builtin_ia32_vfmaddsubps512_mask3(A, B, C, U, R)
+#define _mm512_setr_pd(e0,e1,e2,e3,e4,e5,e6,e7)				      \
+  _mm512_set_pd(e7,e6,e5,e4,e3,e2,e1,e0)
 
-#define _mm512_maskz_fmaddsub_round_ps(U, A, B, C, R)   \
-    (__m512)__builtin_ia32_vfmaddsubps512_maskz(A, B, C, U, R)
+#define _mm512_setr_ps(e0,e1,e2,e3,e4,e5,e6,e7,e8,e9,e10,e11,e12,e13,e14,e15) \
+  _mm512_set_ps(e15,e14,e13,e12,e11,e10,e9,e8,e7,e6,e5,e4,e3,e2,e1,e0)
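The set/setr pairs differ only in argument order: _mm512_set_* takes the
highest element first, while _mm512_setr_* takes elements in memory (lane)
order. A sketch where both calls build the same vector (hypothetical check,
same compile assumptions as above):

/* Both vectors hold 0 in lane 0 (the lowest) up through 7 in lane 7,
   so the compare mask is all ones and the function returns 1.  */
static __inline int
set_vs_setr_agree (void)
{
  __m512i __a = _mm512_set_epi64 (7, 6, 5, 4, 3, 2, 1, 0);
  __m512i __b = _mm512_setr_epi64 (0, 1, 2, 3, 4, 5, 6, 7);
  return _mm512_cmpeq_epi64_mask (__a, __b) == 0xFF;
}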
 
-#define _mm512_fmsubadd_round_pd(A, B, C, R)            \
-    (__m512d)__builtin_ia32_vfmaddsubpd512_mask(A, B, -(C), -1, R)
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_undefined_ps (void)
+{
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Winit-self"
+  __m512 __Y = __Y;
+#pragma GCC diagnostic pop
+  return __Y;
+}
 
-#define _mm512_mask_fmsubadd_round_pd(A, U, B, C, R)    \
-    (__m512d)__builtin_ia32_vfmaddsubpd512_mask(A, B, -(C), U, R)
+#define _mm512_undefined _mm512_undefined_ps
 
-#define _mm512_mask3_fmsubadd_round_pd(A, B, C, U, R)   \
-    (__m512d)__builtin_ia32_vfmsubaddpd512_mask3(A, B, C, U, R)
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_undefined_pd (void)
+{
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Winit-self"
+  __m512d __Y = __Y;
+#pragma GCC diagnostic pop
+  return __Y;
+}
 
-#define _mm512_maskz_fmsubadd_round_pd(U, A, B, C, R)   \
-    (__m512d)__builtin_ia32_vfmaddsubpd512_maskz(A, B, -(C), U, R)
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_undefined_epi32 (void)
+{
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Winit-self"
+  __m512i __Y = __Y;
+#pragma GCC diagnostic pop
+  return __Y;
+}
 
-#define _mm512_fmsubadd_round_ps(A, B, C, R)            \
-    (__m512)__builtin_ia32_vfmaddsubps512_mask(A, B, -(C), -1, R)
+#define _mm512_undefined_si512 _mm512_undefined_epi32
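The self-assignment above only silences -Winit-self inside the header; the
returned vector still has indeterminate contents, so it is only safe as a
don't-care operand that is guaranteed to be fully overwritten. A sketch of
that pattern (hypothetical wrapper, mirroring how the unmasked intrinsics
in this file use it):

/* With an all-ones mask every destination lane is written, so the
   indeterminate pass-through operand is never observed.  */
static __inline __m512i
shift_all_lanes (__m512i __x, __m512i __c)
{
  return _mm512_mask_sllv_epi32 (_mm512_undefined_epi32 (),
				 (__mmask16) -1, __x, __c);
}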
 
-#define _mm512_mask_fmsubadd_round_ps(A, U, B, C, R)    \
-    (__m512)__builtin_ia32_vfmaddsubps512_mask(A, B, -(C), U, R)
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set1_epi8 (char __A)
+{
+  return __extension__ (__m512i)(__v64qi)
+	 { __A, __A, __A, __A, __A, __A, __A, __A,
+	   __A, __A, __A, __A, __A, __A, __A, __A,
+	   __A, __A, __A, __A, __A, __A, __A, __A,
+	   __A, __A, __A, __A, __A, __A, __A, __A,
+	   __A, __A, __A, __A, __A, __A, __A, __A,
+	   __A, __A, __A, __A, __A, __A, __A, __A,
+	   __A, __A, __A, __A, __A, __A, __A, __A,
+	   __A, __A, __A, __A, __A, __A, __A, __A };
+}
 
-#define _mm512_mask3_fmsubadd_round_ps(A, B, C, U, R)   \
-    (__m512)__builtin_ia32_vfmsubaddps512_mask3(A, B, C, U, R)
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set1_epi16 (short __A)
+{
+  return __extension__ (__m512i)(__v32hi)
+	 { __A, __A, __A, __A, __A, __A, __A, __A,
+	   __A, __A, __A, __A, __A, __A, __A, __A,
+	   __A, __A, __A, __A, __A, __A, __A, __A,
+	   __A, __A, __A, __A, __A, __A, __A, __A };
+}
 
-#define _mm512_maskz_fmsubadd_round_ps(U, A, B, C, R)   \
-    (__m512)__builtin_ia32_vfmaddsubps512_maskz(A, B, -(C), U, R)
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set1_pd (double __A)
+{
+  return __extension__ (__m512d)(__v8df)
+    { __A, __A, __A, __A, __A, __A, __A, __A };
+}
 
-#define _mm512_fnmadd_round_pd(A, B, C, R)            \
-    (__m512d)__builtin_ia32_vfnmaddpd512_mask(A, B, C, -1, R)
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set1_ps (float __A)
+{
+  return __extension__ (__m512)(__v16sf)
+    { __A, __A, __A, __A, __A, __A, __A, __A,
+      __A, __A, __A, __A, __A, __A, __A, __A };
+}
 
-#define _mm512_mask_fnmadd_round_pd(A, U, B, C, R)    \
-    (__m512d)__builtin_ia32_vfnmaddpd512_mask(A, B, C, U, R)
+/* Create the vector [A B C D A B C D A B C D A B C D].  */
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set4_epi32 (int __A, int __B, int __C, int __D)
+{
+  return __extension__ (__m512i)(__v16si)
+	 { __D, __C, __B, __A, __D, __C, __B, __A,
+	   __D, __C, __B, __A, __D, __C, __B, __A };
+}
 
-#define _mm512_mask3_fnmadd_round_pd(A, B, C, U, R)   \
-    (__m512d)__builtin_ia32_vfnmaddpd512_mask3(A, B, C, U, R)
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set4_epi64 (long long __A, long long __B, long long __C,
+		   long long __D)
+{
+  return __extension__ (__m512i) (__v8di)
+	 { __D, __C, __B, __A, __D, __C, __B, __A };
+}
 
-#define _mm512_maskz_fnmadd_round_pd(U, A, B, C, R)   \
-    (__m512d)__builtin_ia32_vfnmaddpd512_maskz(A, B, C, U, R)
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set4_pd (double __A, double __B, double __C, double __D)
+{
+  return __extension__ (__m512d)
+	 { __D, __C, __B, __A, __D, __C, __B, __A };
+}
 
-#define _mm512_fnmadd_round_ps(A, B, C, R)            \
-    (__m512)__builtin_ia32_vfnmaddps512_mask(A, B, C, -1, R)
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set4_ps (float __A, float __B, float __C, float __D)
+{
+  return __extension__ (__m512)
+	 { __D, __C, __B, __A, __D, __C, __B, __A,
+	   __D, __C, __B, __A, __D, __C, __B, __A };
+}
 
-#define _mm512_mask_fnmadd_round_ps(A, U, B, C, R)    \
-    (__m512)__builtin_ia32_vfnmaddps512_mask(A, B, C, U, R)
+#define _mm512_setr4_epi64(e0,e1,e2,e3)					      \
+  _mm512_set4_epi64(e3,e2,e1,e0)
 
-#define _mm512_mask3_fnmadd_round_ps(A, B, C, U, R)   \
-    (__m512)__builtin_ia32_vfnmaddps512_mask3(A, B, C, U, R)
+#define _mm512_setr4_epi32(e0,e1,e2,e3)					      \
+  _mm512_set4_epi32(e3,e2,e1,e0)
 
-#define _mm512_maskz_fnmadd_round_ps(U, A, B, C, R)   \
-    (__m512)__builtin_ia32_vfnmaddps512_maskz(A, B, C, U, R)
+#define _mm512_setr4_pd(e0,e1,e2,e3)					      \
+  _mm512_set4_pd(e3,e2,e1,e0)
 
-#define _mm512_fnmsub_round_pd(A, B, C, R)            \
-    (__m512d)__builtin_ia32_vfnmsubpd512_mask(A, B, C, -1, R)
+#define _mm512_setr4_ps(e0,e1,e2,e3)					      \
+  _mm512_set4_ps(e3,e2,e1,e0)
 
-#define _mm512_mask_fnmsub_round_pd(A, U, B, C, R)    \
-    (__m512d)__builtin_ia32_vfnmsubpd512_mask(A, B, C, U, R)
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_setzero_ps (void)
+{
+  return __extension__ (__m512){ 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
+				 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 };
+}
 
-#define _mm512_mask3_fnmsub_round_pd(A, B, C, U, R)   \
-    (__m512d)__builtin_ia32_vfnmsubpd512_mask3(A, B, C, U, R)
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_setzero (void)
+{
+  return _mm512_setzero_ps ();
+}
 
-#define _mm512_maskz_fnmsub_round_pd(U, A, B, C, R)   \
-    (__m512d)__builtin_ia32_vfnmsubpd512_maskz(A, B, C, U, R)
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_setzero_pd (void)
+{
+  return __extension__ (__m512d) { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 };
+}
 
-#define _mm512_fnmsub_round_ps(A, B, C, R)            \
-    (__m512)__builtin_ia32_vfnmsubps512_mask(A, B, C, -1, R)
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_setzero_epi32 (void)
+{
+  return __extension__ (__m512i)(__v8di){ 0, 0, 0, 0, 0, 0, 0, 0 };
+}
 
-#define _mm512_mask_fnmsub_round_ps(A, U, B, C, R)    \
-    (__m512)__builtin_ia32_vfnmsubps512_mask(A, B, C, U, R)
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_setzero_si512 (void)
+{
+  return __extension__ (__m512i)(__v8di){ 0, 0, 0, 0, 0, 0, 0, 0 };
+}
 
-#define _mm512_mask3_fnmsub_round_ps(A, B, C, U, R)   \
-    (__m512)__builtin_ia32_vfnmsubps512_mask3(A, B, C, U, R)
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_mov_pd (__m512d __W, __mmask8 __U, __m512d __A)
+{
+  return (__m512d) __builtin_ia32_movapd512_mask ((__v8df) __A,
+						  (__v8df) __W,
+						  (__mmask8) __U);
+}
 
-#define _mm512_maskz_fnmsub_round_ps(U, A, B, C, R)   \
-    (__m512)__builtin_ia32_vfnmsubps512_maskz(A, B, C, U, R)
-#endif
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_mov_pd (__mmask8 __U, __m512d __A)
+{
+  return (__m512d) __builtin_ia32_movapd512_mask ((__v8df) __A,
+						  (__v8df)
+						  _mm512_setzero_pd (),
+						  (__mmask8) __U);
+}
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_abs_epi64 (__m512i __A)
+_mm512_mask_mov_ps (__m512 __W, __mmask16 __U, __m512 __A)
 {
-  return (__m512i) __builtin_ia32_pabsq512_mask ((__v8di) __A,
-						 (__v8di)
-						 _mm512_undefined_epi32 (),
-						 (__mmask8) -1);
+  return (__m512) __builtin_ia32_movaps512_mask ((__v16sf) __A,
+						 (__v16sf) __W,
+						 (__mmask16) __U);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_abs_epi64 (__m512i __W, __mmask8 __U, __m512i __A)
+_mm512_maskz_mov_ps (__mmask16 __U, __m512 __A)
 {
-  return (__m512i) __builtin_ia32_pabsq512_mask ((__v8di) __A,
-						 (__v8di) __W,
-						 (__mmask8) __U);
+  return (__m512) __builtin_ia32_movaps512_mask ((__v16sf) __A,
+						 (__v16sf)
+						 _mm512_setzero_ps (),
+						 (__mmask16) __U);
 }
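The mask_mov/maskz_mov pair differ only in where unselected lanes come
from: the explicit __W source versus zero. A short sketch (hypothetical
demo, same compile assumptions as above):

static __inline void
mov_variants_demo (void)
{
  __m512 __src = _mm512_set1_ps (1.0f);
  __m512 __w = _mm512_set1_ps (-1.0f);
  __mmask16 __k = 0x00FF;

  /* Low 8 lanes 1.0f; high 8 lanes keep -1.0f from __w.  */
  __m512 __a = _mm512_mask_mov_ps (__w, __k, __src);
  /* Low 8 lanes 1.0f; high 8 lanes forced to 0.0f.  */
  __m512 __b = _mm512_maskz_mov_ps (__k, __src);
  (void) __a; (void) __b;
}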
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_abs_epi64 (__mmask8 __U, __m512i __A)
+_mm512_load_pd (void const *__P)
 {
-  return (__m512i) __builtin_ia32_pabsq512_mask ((__v8di) __A,
-						 (__v8di)
-						 _mm512_setzero_si512 (),
-						 (__mmask8) __U);
+  return *(__m512d *) __P;
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_abs_epi32 (__m512i __A)
+_mm512_mask_load_pd (__m512d __W, __mmask8 __U, void const *__P)
 {
-  return (__m512i) __builtin_ia32_pabsd512_mask ((__v16si) __A,
-						 (__v16si)
-						 _mm512_undefined_epi32 (),
-						 (__mmask16) -1);
+  return (__m512d) __builtin_ia32_loadapd512_mask ((const __v8df *) __P,
+						   (__v8df) __W,
+						   (__mmask8) __U);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_abs_epi32 (__m512i __W, __mmask16 __U, __m512i __A)
+_mm512_maskz_load_pd (__mmask8 __U, void const *__P)
 {
-  return (__m512i) __builtin_ia32_pabsd512_mask ((__v16si) __A,
-						 (__v16si) __W,
-						 (__mmask16) __U);
+  return (__m512d) __builtin_ia32_loadapd512_mask ((const __v8df *) __P,
+						   (__v8df)
+						   _mm512_setzero_pd (),
+						   (__mmask8) __U);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_store_pd (void *__P, __m512d __A)
+{
+  *(__m512d *) __P = __A;
 }
 
-extern __inline __m512i
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_abs_epi32 (__mmask16 __U, __m512i __A)
+_mm512_mask_store_pd (void *__P, __mmask8 __U, __m512d __A)
 {
-  return (__m512i) __builtin_ia32_pabsd512_mask ((__v16si) __A,
-						 (__v16si)
-						 _mm512_setzero_si512 (),
-						 (__mmask16) __U);
+  __builtin_ia32_storeapd512_mask ((__v8df *) __P, (__v8df) __A,
+				   (__mmask8) __U);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcastss_ps (__m128 __A)
+_mm512_load_ps (void const *__P)
 {
-  return (__m512) __builtin_ia32_broadcastss512 ((__v4sf) __A,
-						 (__v16sf)
-						 _mm512_undefined_ps (),
-						 (__mmask16) -1);
+  return *(__m512 *) __P;
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcastss_ps (__m512 __O, __mmask16 __M, __m128 __A)
+_mm512_mask_load_ps (__m512 __W, __mmask16 __U, void const *__P)
 {
-  return (__m512) __builtin_ia32_broadcastss512 ((__v4sf) __A,
-						 (__v16sf) __O, __M);
+  return (__m512) __builtin_ia32_loadaps512_mask ((const __v16sf *) __P,
+						  (__v16sf) __W,
+						  (__mmask16) __U);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcastss_ps (__mmask16 __M, __m128 __A)
+_mm512_maskz_load_ps (__mmask16 __U, void const *__P)
 {
-  return (__m512) __builtin_ia32_broadcastss512 ((__v4sf) __A,
-						 (__v16sf)
-						 _mm512_setzero_ps (),
-						 __M);
+  return (__m512) __builtin_ia32_loadaps512_mask ((const __v16sf *) __P,
+						  (__v16sf)
+						  _mm512_setzero_ps (),
+						  (__mmask16) __U);
 }
 
-extern __inline __m512d
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcastsd_pd (__m128d __A)
+_mm512_store_ps (void *__P, __m512 __A)
 {
-  return (__m512d) __builtin_ia32_broadcastsd512 ((__v2df) __A,
-						  (__v8df)
-						  _mm512_undefined_pd (),
-						  (__mmask8) -1);
+  *(__m512 *) __P = __A;
 }
 
-extern __inline __m512d
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcastsd_pd (__m512d __O, __mmask8 __M, __m128d __A)
+_mm512_mask_store_ps (void *__P, __mmask16 __U, __m512 __A)
 {
-  return (__m512d) __builtin_ia32_broadcastsd512 ((__v2df) __A,
-						  (__v8df) __O, __M);
+  __builtin_ia32_storeaps512_mask ((__v16sf *) __P, (__v16sf) __A,
+				   (__mmask16) __U);
 }
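_mm512_load_ps/_mm512_store_ps compile to a plain aligned __m512
dereference, and the masked forms use the vmovaps-class builtins, so in
all cases __P must be 64-byte aligned (the _u types above are the escape
hatch for unaligned data). A sketch with an explicitly aligned buffer
(hypothetical demo):

static __inline __m512
masked_load_demo (void)
{
  float __buf[16] __attribute__ ((aligned (64)));
  _mm512_store_ps (__buf, _mm512_setzero_ps ());
  /* Lane 0 comes from __buf (0.0f); lanes 1..15 keep 9.0f from the
     pass-through source.  */
  return _mm512_mask_load_ps (_mm512_set1_ps (9.0f),
			      (__mmask16) 0x0001, __buf);
}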
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcastsd_pd (__mmask8 __M, __m128d __A)
+_mm512_mask_mov_epi64 (__m512i __W, __mmask8 __U, __m512i __A)
 {
-  return (__m512d) __builtin_ia32_broadcastsd512 ((__v2df) __A,
-						  (__v8df)
-						  _mm512_setzero_pd (),
-						  __M);
+  return (__m512i) __builtin_ia32_movdqa64_512_mask ((__v8di) __A,
+						     (__v8di) __W,
+						     (__mmask8) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcastd_epi32 (__m128i __A)
+_mm512_maskz_mov_epi64 (__mmask8 __U, __m512i __A)
 {
-  return (__m512i) __builtin_ia32_pbroadcastd512 ((__v4si) __A,
-						  (__v16si)
-						  _mm512_undefined_epi32 (),
-						  (__mmask16) -1);
+  return (__m512i) __builtin_ia32_movdqa64_512_mask ((__v8di) __A,
+						     (__v8di)
+						     _mm512_setzero_si512 (),
+						     (__mmask8) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcastd_epi32 (__m512i __O, __mmask16 __M, __m128i __A)
+_mm512_load_epi64 (void const *__P)
 {
-  return (__m512i) __builtin_ia32_pbroadcastd512 ((__v4si) __A,
-						  (__v16si) __O, __M);
+  return *(__m512i *) __P;
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcastd_epi32 (__mmask16 __M, __m128i __A)
+_mm512_mask_load_epi64 (__m512i __W, __mmask8 __U, void const *__P)
 {
-  return (__m512i) __builtin_ia32_pbroadcastd512 ((__v4si) __A,
-						  (__v16si)
-						  _mm512_setzero_si512 (),
-						  __M);
+  return (__m512i) __builtin_ia32_movdqa64load512_mask ((const __v8di *) __P,
+							(__v8di) __W,
+							(__mmask8) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set1_epi32 (int __A)
+_mm512_maskz_load_epi64 (__mmask8 __U, void const *__P)
 {
-  return (__m512i)(__v16si)
-    { __A, __A, __A, __A, __A, __A, __A, __A,
-      __A, __A, __A, __A, __A, __A, __A, __A };
+  return (__m512i) __builtin_ia32_movdqa64load512_mask ((const __v8di *) __P,
+							(__v8di)
+							_mm512_setzero_si512 (),
+							(__mmask8) __U);
 }
 
-extern __inline __m512i
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_set1_epi32 (__m512i __O, __mmask16 __M, int __A)
+_mm512_store_epi64 (void *__P, __m512i __A)
 {
-  return (__m512i) __builtin_ia32_pbroadcastd512_gpr_mask (__A, (__v16si) __O,
-							   __M);
+  *(__m512i *) __P = __A;
 }
 
-extern __inline __m512i
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_set1_epi32 (__mmask16 __M, int __A)
+_mm512_mask_store_epi64 (void *__P, __mmask8 __U, __m512i __A)
 {
-  return (__m512i)
-	 __builtin_ia32_pbroadcastd512_gpr_mask (__A,
-						 (__v16si) _mm512_setzero_si512 (),
-						 __M);
+  __builtin_ia32_movdqa64store512_mask ((__v8di *) __P, (__v8di) __A,
+					(__mmask8) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcastq_epi64 (__m128i __A)
+_mm512_mask_mov_epi32 (__m512i __W, __mmask16 __U, __m512i __A)
 {
-  return (__m512i) __builtin_ia32_pbroadcastq512 ((__v2di) __A,
-						  (__v8di)
-						  _mm512_undefined_epi32 (),
-						  (__mmask8) -1);
+  return (__m512i) __builtin_ia32_movdqa32_512_mask ((__v16si) __A,
+						     (__v16si) __W,
+						     (__mmask16) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcastq_epi64 (__m512i __O, __mmask8 __M, __m128i __A)
+_mm512_maskz_mov_epi32 (__mmask16 __U, __m512i __A)
 {
-  return (__m512i) __builtin_ia32_pbroadcastq512 ((__v2di) __A,
-						  (__v8di) __O, __M);
+  return (__m512i) __builtin_ia32_movdqa32_512_mask ((__v16si) __A,
+						     (__v16si)
+						     _mm512_setzero_si512 (),
+						     (__mmask16) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcastq_epi64 (__mmask8 __M, __m128i __A)
+_mm512_load_si512 (void const *__P)
 {
-  return (__m512i) __builtin_ia32_pbroadcastq512 ((__v2di) __A,
-						  (__v8di)
-						  _mm512_setzero_si512 (),
-						  __M);
+  return *(__m512i *) __P;
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set1_epi64 (long long __A)
+_mm512_load_epi32 (void const *__P)
 {
-  return (__m512i)(__v8di) { __A, __A, __A, __A, __A, __A, __A, __A };
+  return *(__m512i *) __P;
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_set1_epi64 (__m512i __O, __mmask8 __M, long long __A)
+_mm512_mask_load_epi32 (__m512i __W, __mmask16 __U, void const *__P)
 {
-  return (__m512i) __builtin_ia32_pbroadcastq512_gpr_mask (__A, (__v8di) __O,
-							   __M);
+  return (__m512i) __builtin_ia32_movdqa32load512_mask ((const __v16si *) __P,
+							(__v16si) __W,
+							(__mmask16) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_set1_epi64 (__mmask8 __M, long long __A)
+_mm512_maskz_load_epi32 (__mmask16 __U, void const *__P)
 {
-  return (__m512i)
-	 __builtin_ia32_pbroadcastq512_gpr_mask (__A,
-						 (__v8di) _mm512_setzero_si512 (),
-						 __M);
+  return (__m512i) __builtin_ia32_movdqa32load512_mask ((const __v16si *) __P,
+							(__v16si)
+							_mm512_setzero_si512 (),
+							(__mmask16) __U);
 }
 
-extern __inline __m512
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcast_f32x4 (__m128 __A)
+_mm512_store_si512 (void *__P, __m512i __A)
 {
-  return (__m512) __builtin_ia32_broadcastf32x4_512 ((__v4sf) __A,
-						     (__v16sf)
-						     _mm512_undefined_ps (),
-						     (__mmask16) -1);
+  *(__m512i *) __P = __A;
 }
 
-extern __inline __m512
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcast_f32x4 (__m512 __O, __mmask16 __M, __m128 __A)
+_mm512_store_epi32 (void *__P, __m512i __A)
 {
-  return (__m512) __builtin_ia32_broadcastf32x4_512 ((__v4sf) __A,
-						     (__v16sf) __O,
-						     __M);
+  *(__m512i *) __P = __A;
 }
 
-extern __inline __m512
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcast_f32x4 (__mmask16 __M, __m128 __A)
+_mm512_mask_store_epi32 (void *__P, __mmask16 __U, __m512i __A)
 {
-  return (__m512) __builtin_ia32_broadcastf32x4_512 ((__v4sf) __A,
-						     (__v16sf)
-						     _mm512_setzero_ps (),
-						     __M);
+  __builtin_ia32_movdqa32store512_mask ((__v16si *) __P, (__v16si) __A,
+					(__mmask16) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcast_i32x4 (__m128i __A)
+_mm512_mullo_epi32 (__m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_broadcasti32x4_512 ((__v4si) __A,
-						      (__v16si)
-						      _mm512_undefined_epi32 (),
-						      (__mmask16) -1);
+  return (__m512i) ((__v16su) __A * (__v16su) __B);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcast_i32x4 (__m512i __O, __mmask16 __M, __m128i __A)
+_mm512_maskz_mullo_epi32 (__mmask16 __M, __m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_broadcasti32x4_512 ((__v4si) __A,
-						      (__v16si) __O,
-						      __M);
+  return (__m512i) __builtin_ia32_pmulld512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si)
+						  _mm512_setzero_si512 (),
+						  __M);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcast_i32x4 (__mmask16 __M, __m128i __A)
+_mm512_mask_mullo_epi32 (__m512i __W, __mmask16 __M, __m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_broadcasti32x4_512 ((__v4si) __A,
-						      (__v16si)
-						      _mm512_setzero_si512 (),
-						      __M);
+  return (__m512i) __builtin_ia32_pmulld512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si) __W, __M);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcast_f64x4 (__m256d __A)
+_mm512_mullox_epi64 (__m512i __A, __m512i __B)
 {
-  return (__m512d) __builtin_ia32_broadcastf64x4_512 ((__v4df) __A,
-						      (__v8df)
-						      _mm512_undefined_pd (),
-						      (__mmask8) -1);
+  return (__m512i) ((__v8du) __A * (__v8du) __B);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcast_f64x4 (__m512d __O, __mmask8 __M, __m256d __A)
+_mm512_mask_mullox_epi64 (__m512i __W, __mmask8 __M, __m512i __A, __m512i __B)
 {
-  return (__m512d) __builtin_ia32_broadcastf64x4_512 ((__v4df) __A,
-						      (__v8df) __O,
-						      __M);
+  return _mm512_mask_mov_epi64 (__W, __M, _mm512_mullox_epi64 (__A, __B));
 }
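Note that _mm512_mullox_epi64 has no single-instruction AVX512F lowering
(vpmullq needs AVX512DQ), so it is expressed as a generic 64-bit vector
multiply and the masked form is composed from the full multiply plus
mask_mov_epi64. Usage sketch (hypothetical demo):

static __inline __m512i
mullox_demo (void)
{
  __m512i __x = _mm512_set1_epi64 (3);
  __m512i __y = _mm512_set1_epi64 (5);
  /* Low 4 lanes hold 15; high 4 lanes keep 0 from the __W source.  */
  return _mm512_mask_mullox_epi64 (_mm512_setzero_si512 (),
				   (__mmask8) 0x0F, __x, __y);
}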
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcast_f64x4 (__mmask8 __M, __m256d __A)
+_mm512_sllv_epi32 (__m512i __X, __m512i __Y)
 {
-  return (__m512d) __builtin_ia32_broadcastf64x4_512 ((__v4df) __A,
-						      (__v8df)
-						      _mm512_setzero_pd (),
-						      __M);
+  return (__m512i) __builtin_ia32_psllv16si_mask ((__v16si) __X,
+						  (__v16si) __Y,
+						  (__v16si)
+						  _mm512_undefined_epi32 (),
+						  (__mmask16) -1);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_sllv_epi32 (__m512i __W, __mmask16 __U, __m512i __X, __m512i __Y)
+{
+  return (__m512i) __builtin_ia32_psllv16si_mask ((__v16si) __X,
+						  (__v16si) __Y,
+						  (__v16si) __W,
+						  (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_sllv_epi32 (__mmask16 __U, __m512i __X, __m512i __Y)
+{
+  return (__m512i) __builtin_ia32_psllv16si_mask ((__v16si) __X,
+						  (__v16si) __Y,
+						  (__v16si)
+						  _mm512_setzero_si512 (),
+						  (__mmask16) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcast_i64x4 (__m256i __A)
+_mm512_srav_epi32 (__m512i __X, __m512i __Y)
 {
-  return (__m512i) __builtin_ia32_broadcasti64x4_512 ((__v4di) __A,
-						      (__v8di)
-						      _mm512_undefined_epi32 (),
-						      (__mmask8) -1);
+  return (__m512i) __builtin_ia32_psrav16si_mask ((__v16si) __X,
+						  (__v16si) __Y,
+						  (__v16si)
+						  _mm512_undefined_epi32 (),
+						  (__mmask16) -1);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcast_i64x4 (__m512i __O, __mmask8 __M, __m256i __A)
+_mm512_mask_srav_epi32 (__m512i __W, __mmask16 __U, __m512i __X, __m512i __Y)
 {
-  return (__m512i) __builtin_ia32_broadcasti64x4_512 ((__v4di) __A,
-						      (__v8di) __O,
-						      __M);
+  return (__m512i) __builtin_ia32_psrav16si_mask ((__v16si) __X,
+						  (__v16si) __Y,
+						  (__v16si) __W,
+						  (__mmask16) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcast_i64x4 (__mmask8 __M, __m256i __A)
+_mm512_maskz_srav_epi32 (__mmask16 __U, __m512i __X, __m512i __Y)
 {
-  return (__m512i) __builtin_ia32_broadcasti64x4_512 ((__v4di) __A,
-						      (__v8di)
-						      _mm512_setzero_si512 (),
-						      __M);
+  return (__m512i) __builtin_ia32_psrav16si_mask ((__v16si) __X,
+						  (__v16si) __Y,
+						  (__v16si)
+						  _mm512_setzero_si512 (),
+						  (__mmask16) __U);
 }
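The sllv/srav/srlv family shifts each lane by the count held in the
corresponding lane of __Y, with srav sign-extending and srlv (defined just
below) zero-filling. A sketch contrasting the two right shifts on a
negative input (hypothetical demo):

static __inline void
variable_shift_demo (void)
{
  __m512i __v = _mm512_set1_epi32 (-8);
  __m512i __c = _mm512_set1_epi32 (1);

  __m512i __a = _mm512_srav_epi32 (__v, __c);	/* every lane -4 */
  __m512i __l = _mm512_srlv_epi32 (__v, __c);	/* every lane 0x7ffffffc */
  (void) __a; (void) __l;
}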
 
-typedef enum
-{
-  _MM_PERM_AAAA = 0x00, _MM_PERM_AAAB = 0x01, _MM_PERM_AAAC = 0x02,
-  _MM_PERM_AAAD = 0x03, _MM_PERM_AABA = 0x04, _MM_PERM_AABB = 0x05,
-  _MM_PERM_AABC = 0x06, _MM_PERM_AABD = 0x07, _MM_PERM_AACA = 0x08,
-  _MM_PERM_AACB = 0x09, _MM_PERM_AACC = 0x0A, _MM_PERM_AACD = 0x0B,
-  _MM_PERM_AADA = 0x0C, _MM_PERM_AADB = 0x0D, _MM_PERM_AADC = 0x0E,
-  _MM_PERM_AADD = 0x0F, _MM_PERM_ABAA = 0x10, _MM_PERM_ABAB = 0x11,
-  _MM_PERM_ABAC = 0x12, _MM_PERM_ABAD = 0x13, _MM_PERM_ABBA = 0x14,
-  _MM_PERM_ABBB = 0x15, _MM_PERM_ABBC = 0x16, _MM_PERM_ABBD = 0x17,
-  _MM_PERM_ABCA = 0x18, _MM_PERM_ABCB = 0x19, _MM_PERM_ABCC = 0x1A,
-  _MM_PERM_ABCD = 0x1B, _MM_PERM_ABDA = 0x1C, _MM_PERM_ABDB = 0x1D,
-  _MM_PERM_ABDC = 0x1E, _MM_PERM_ABDD = 0x1F, _MM_PERM_ACAA = 0x20,
-  _MM_PERM_ACAB = 0x21, _MM_PERM_ACAC = 0x22, _MM_PERM_ACAD = 0x23,
-  _MM_PERM_ACBA = 0x24, _MM_PERM_ACBB = 0x25, _MM_PERM_ACBC = 0x26,
-  _MM_PERM_ACBD = 0x27, _MM_PERM_ACCA = 0x28, _MM_PERM_ACCB = 0x29,
-  _MM_PERM_ACCC = 0x2A, _MM_PERM_ACCD = 0x2B, _MM_PERM_ACDA = 0x2C,
-  _MM_PERM_ACDB = 0x2D, _MM_PERM_ACDC = 0x2E, _MM_PERM_ACDD = 0x2F,
-  _MM_PERM_ADAA = 0x30, _MM_PERM_ADAB = 0x31, _MM_PERM_ADAC = 0x32,
-  _MM_PERM_ADAD = 0x33, _MM_PERM_ADBA = 0x34, _MM_PERM_ADBB = 0x35,
-  _MM_PERM_ADBC = 0x36, _MM_PERM_ADBD = 0x37, _MM_PERM_ADCA = 0x38,
-  _MM_PERM_ADCB = 0x39, _MM_PERM_ADCC = 0x3A, _MM_PERM_ADCD = 0x3B,
-  _MM_PERM_ADDA = 0x3C, _MM_PERM_ADDB = 0x3D, _MM_PERM_ADDC = 0x3E,
-  _MM_PERM_ADDD = 0x3F, _MM_PERM_BAAA = 0x40, _MM_PERM_BAAB = 0x41,
-  _MM_PERM_BAAC = 0x42, _MM_PERM_BAAD = 0x43, _MM_PERM_BABA = 0x44,
-  _MM_PERM_BABB = 0x45, _MM_PERM_BABC = 0x46, _MM_PERM_BABD = 0x47,
-  _MM_PERM_BACA = 0x48, _MM_PERM_BACB = 0x49, _MM_PERM_BACC = 0x4A,
-  _MM_PERM_BACD = 0x4B, _MM_PERM_BADA = 0x4C, _MM_PERM_BADB = 0x4D,
-  _MM_PERM_BADC = 0x4E, _MM_PERM_BADD = 0x4F, _MM_PERM_BBAA = 0x50,
-  _MM_PERM_BBAB = 0x51, _MM_PERM_BBAC = 0x52, _MM_PERM_BBAD = 0x53,
-  _MM_PERM_BBBA = 0x54, _MM_PERM_BBBB = 0x55, _MM_PERM_BBBC = 0x56,
-  _MM_PERM_BBBD = 0x57, _MM_PERM_BBCA = 0x58, _MM_PERM_BBCB = 0x59,
-  _MM_PERM_BBCC = 0x5A, _MM_PERM_BBCD = 0x5B, _MM_PERM_BBDA = 0x5C,
-  _MM_PERM_BBDB = 0x5D, _MM_PERM_BBDC = 0x5E, _MM_PERM_BBDD = 0x5F,
-  _MM_PERM_BCAA = 0x60, _MM_PERM_BCAB = 0x61, _MM_PERM_BCAC = 0x62,
-  _MM_PERM_BCAD = 0x63, _MM_PERM_BCBA = 0x64, _MM_PERM_BCBB = 0x65,
-  _MM_PERM_BCBC = 0x66, _MM_PERM_BCBD = 0x67, _MM_PERM_BCCA = 0x68,
-  _MM_PERM_BCCB = 0x69, _MM_PERM_BCCC = 0x6A, _MM_PERM_BCCD = 0x6B,
-  _MM_PERM_BCDA = 0x6C, _MM_PERM_BCDB = 0x6D, _MM_PERM_BCDC = 0x6E,
-  _MM_PERM_BCDD = 0x6F, _MM_PERM_BDAA = 0x70, _MM_PERM_BDAB = 0x71,
-  _MM_PERM_BDAC = 0x72, _MM_PERM_BDAD = 0x73, _MM_PERM_BDBA = 0x74,
-  _MM_PERM_BDBB = 0x75, _MM_PERM_BDBC = 0x76, _MM_PERM_BDBD = 0x77,
-  _MM_PERM_BDCA = 0x78, _MM_PERM_BDCB = 0x79, _MM_PERM_BDCC = 0x7A,
-  _MM_PERM_BDCD = 0x7B, _MM_PERM_BDDA = 0x7C, _MM_PERM_BDDB = 0x7D,
-  _MM_PERM_BDDC = 0x7E, _MM_PERM_BDDD = 0x7F, _MM_PERM_CAAA = 0x80,
-  _MM_PERM_CAAB = 0x81, _MM_PERM_CAAC = 0x82, _MM_PERM_CAAD = 0x83,
-  _MM_PERM_CABA = 0x84, _MM_PERM_CABB = 0x85, _MM_PERM_CABC = 0x86,
-  _MM_PERM_CABD = 0x87, _MM_PERM_CACA = 0x88, _MM_PERM_CACB = 0x89,
-  _MM_PERM_CACC = 0x8A, _MM_PERM_CACD = 0x8B, _MM_PERM_CADA = 0x8C,
-  _MM_PERM_CADB = 0x8D, _MM_PERM_CADC = 0x8E, _MM_PERM_CADD = 0x8F,
-  _MM_PERM_CBAA = 0x90, _MM_PERM_CBAB = 0x91, _MM_PERM_CBAC = 0x92,
-  _MM_PERM_CBAD = 0x93, _MM_PERM_CBBA = 0x94, _MM_PERM_CBBB = 0x95,
-  _MM_PERM_CBBC = 0x96, _MM_PERM_CBBD = 0x97, _MM_PERM_CBCA = 0x98,
-  _MM_PERM_CBCB = 0x99, _MM_PERM_CBCC = 0x9A, _MM_PERM_CBCD = 0x9B,
-  _MM_PERM_CBDA = 0x9C, _MM_PERM_CBDB = 0x9D, _MM_PERM_CBDC = 0x9E,
-  _MM_PERM_CBDD = 0x9F, _MM_PERM_CCAA = 0xA0, _MM_PERM_CCAB = 0xA1,
-  _MM_PERM_CCAC = 0xA2, _MM_PERM_CCAD = 0xA3, _MM_PERM_CCBA = 0xA4,
-  _MM_PERM_CCBB = 0xA5, _MM_PERM_CCBC = 0xA6, _MM_PERM_CCBD = 0xA7,
-  _MM_PERM_CCCA = 0xA8, _MM_PERM_CCCB = 0xA9, _MM_PERM_CCCC = 0xAA,
-  _MM_PERM_CCCD = 0xAB, _MM_PERM_CCDA = 0xAC, _MM_PERM_CCDB = 0xAD,
-  _MM_PERM_CCDC = 0xAE, _MM_PERM_CCDD = 0xAF, _MM_PERM_CDAA = 0xB0,
-  _MM_PERM_CDAB = 0xB1, _MM_PERM_CDAC = 0xB2, _MM_PERM_CDAD = 0xB3,
-  _MM_PERM_CDBA = 0xB4, _MM_PERM_CDBB = 0xB5, _MM_PERM_CDBC = 0xB6,
-  _MM_PERM_CDBD = 0xB7, _MM_PERM_CDCA = 0xB8, _MM_PERM_CDCB = 0xB9,
-  _MM_PERM_CDCC = 0xBA, _MM_PERM_CDCD = 0xBB, _MM_PERM_CDDA = 0xBC,
-  _MM_PERM_CDDB = 0xBD, _MM_PERM_CDDC = 0xBE, _MM_PERM_CDDD = 0xBF,
-  _MM_PERM_DAAA = 0xC0, _MM_PERM_DAAB = 0xC1, _MM_PERM_DAAC = 0xC2,
-  _MM_PERM_DAAD = 0xC3, _MM_PERM_DABA = 0xC4, _MM_PERM_DABB = 0xC5,
-  _MM_PERM_DABC = 0xC6, _MM_PERM_DABD = 0xC7, _MM_PERM_DACA = 0xC8,
-  _MM_PERM_DACB = 0xC9, _MM_PERM_DACC = 0xCA, _MM_PERM_DACD = 0xCB,
-  _MM_PERM_DADA = 0xCC, _MM_PERM_DADB = 0xCD, _MM_PERM_DADC = 0xCE,
-  _MM_PERM_DADD = 0xCF, _MM_PERM_DBAA = 0xD0, _MM_PERM_DBAB = 0xD1,
-  _MM_PERM_DBAC = 0xD2, _MM_PERM_DBAD = 0xD3, _MM_PERM_DBBA = 0xD4,
-  _MM_PERM_DBBB = 0xD5, _MM_PERM_DBBC = 0xD6, _MM_PERM_DBBD = 0xD7,
-  _MM_PERM_DBCA = 0xD8, _MM_PERM_DBCB = 0xD9, _MM_PERM_DBCC = 0xDA,
-  _MM_PERM_DBCD = 0xDB, _MM_PERM_DBDA = 0xDC, _MM_PERM_DBDB = 0xDD,
-  _MM_PERM_DBDC = 0xDE, _MM_PERM_DBDD = 0xDF, _MM_PERM_DCAA = 0xE0,
-  _MM_PERM_DCAB = 0xE1, _MM_PERM_DCAC = 0xE2, _MM_PERM_DCAD = 0xE3,
-  _MM_PERM_DCBA = 0xE4, _MM_PERM_DCBB = 0xE5, _MM_PERM_DCBC = 0xE6,
-  _MM_PERM_DCBD = 0xE7, _MM_PERM_DCCA = 0xE8, _MM_PERM_DCCB = 0xE9,
-  _MM_PERM_DCCC = 0xEA, _MM_PERM_DCCD = 0xEB, _MM_PERM_DCDA = 0xEC,
-  _MM_PERM_DCDB = 0xED, _MM_PERM_DCDC = 0xEE, _MM_PERM_DCDD = 0xEF,
-  _MM_PERM_DDAA = 0xF0, _MM_PERM_DDAB = 0xF1, _MM_PERM_DDAC = 0xF2,
-  _MM_PERM_DDAD = 0xF3, _MM_PERM_DDBA = 0xF4, _MM_PERM_DDBB = 0xF5,
-  _MM_PERM_DDBC = 0xF6, _MM_PERM_DDBD = 0xF7, _MM_PERM_DDCA = 0xF8,
-  _MM_PERM_DDCB = 0xF9, _MM_PERM_DDCC = 0xFA, _MM_PERM_DDCD = 0xFB,
-  _MM_PERM_DDDA = 0xFC, _MM_PERM_DDDB = 0xFD, _MM_PERM_DDDC = 0xFE,
-  _MM_PERM_DDDD = 0xFF
-} _MM_PERM_ENUM;
-
-#ifdef __OPTIMIZE__
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_shuffle_epi32 (__m512i __A, _MM_PERM_ENUM __mask)
+_mm512_srlv_epi32 (__m512i __X, __m512i __Y)
 {
-  return (__m512i) __builtin_ia32_pshufd512_mask ((__v16si) __A,
-						  __mask,
+  return (__m512i) __builtin_ia32_psrlv16si_mask ((__v16si) __X,
+						  (__v16si) __Y,
 						  (__v16si)
 						  _mm512_undefined_epi32 (),
 						  (__mmask16) -1);
@@ -4492,21 +4456,20 @@ _mm512_shuffle_epi32 (__m512i __A, _MM_PERM_ENUM __mask)
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_shuffle_epi32 (__m512i __W, __mmask16 __U, __m512i __A,
-			   _MM_PERM_ENUM __mask)
+_mm512_mask_srlv_epi32 (__m512i __W, __mmask16 __U, __m512i __X, __m512i __Y)
 {
-  return (__m512i) __builtin_ia32_pshufd512_mask ((__v16si) __A,
-						  __mask,
+  return (__m512i) __builtin_ia32_psrlv16si_mask ((__v16si) __X,
+						  (__v16si) __Y,
 						  (__v16si) __W,
 						  (__mmask16) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_shuffle_epi32 (__mmask16 __U, __m512i __A, _MM_PERM_ENUM __mask)
+_mm512_maskz_srlv_epi32 (__mmask16 __U, __m512i __X, __m512i __Y)
 {
-  return (__m512i) __builtin_ia32_pshufd512_mask ((__v16si) __A,
-						  __mask,
+  return (__m512i) __builtin_ia32_psrlv16si_mask ((__v16si) __X,
+						  (__v16si) __Y,
 						  (__v16si)
 						  _mm512_setzero_si512 (),
 						  (__mmask16) __U);
@@ -4514,302 +4477,190 @@ _mm512_maskz_shuffle_epi32 (__mmask16 __U, __m512i __A, _MM_PERM_ENUM __mask)
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_shuffle_i64x2 (__m512i __A, __m512i __B, const int __imm)
+_mm512_add_epi64 (__m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_shuf_i64x2_mask ((__v8di) __A,
-						   (__v8di) __B, __imm,
-						   (__v8di)
-						   _mm512_undefined_epi32 (),
-						   (__mmask8) -1);
+  return (__m512i) ((__v8du) __A + (__v8du) __B);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_shuffle_i64x2 (__m512i __W, __mmask8 __U, __m512i __A,
-			   __m512i __B, const int __imm)
+_mm512_mask_add_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_shuf_i64x2_mask ((__v8di) __A,
-						   (__v8di) __B, __imm,
-						   (__v8di) __W,
-						   (__mmask8) __U);
+  return (__m512i) __builtin_ia32_paddq512_mask ((__v8di) __A,
+						 (__v8di) __B,
+						 (__v8di) __W,
+						 (__mmask8) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_shuffle_i64x2 (__mmask8 __U, __m512i __A, __m512i __B,
-			    const int __imm)
+_mm512_maskz_add_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_shuf_i64x2_mask ((__v8di) __A,
-						   (__v8di) __B, __imm,
-						   (__v8di)
-						   _mm512_setzero_si512 (),
-						   (__mmask8) __U);
+  return (__m512i) __builtin_ia32_paddq512_mask ((__v8di) __A,
+						 (__v8di) __B,
+						 (__v8di)
+						 _mm512_setzero_si512 (),
+						 (__mmask8) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_shuffle_i32x4 (__m512i __A, __m512i __B, const int __imm)
+_mm512_sub_epi64 (__m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_shuf_i32x4_mask ((__v16si) __A,
-						   (__v16si) __B,
-						   __imm,
-						   (__v16si)
-						   _mm512_undefined_epi32 (),
-						   (__mmask16) -1);
+  return (__m512i) ((__v8du) __A - (__v8du) __B);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_shuffle_i32x4 (__m512i __W, __mmask16 __U, __m512i __A,
-			   __m512i __B, const int __imm)
+_mm512_mask_sub_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_shuf_i32x4_mask ((__v16si) __A,
-						   (__v16si) __B,
-						   __imm,
-						   (__v16si) __W,
-						   (__mmask16) __U);
+  return (__m512i) __builtin_ia32_psubq512_mask ((__v8di) __A,
+						 (__v8di) __B,
+						 (__v8di) __W,
+						 (__mmask8) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_shuffle_i32x4 (__mmask16 __U, __m512i __A, __m512i __B,
-			    const int __imm)
+_mm512_maskz_sub_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_shuf_i32x4_mask ((__v16si) __A,
-						   (__v16si) __B,
-						   __imm,
-						   (__v16si)
-						   _mm512_setzero_si512 (),
-						   (__mmask16) __U);
+  return (__m512i) __builtin_ia32_psubq512_mask ((__v8di) __A,
+						 (__v8di) __B,
+						 (__v8di)
+						 _mm512_setzero_si512 (),
+						 (__mmask8) __U);
 }
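add/sub_epi64 are written as generic vector arithmetic on the unsigned
element type __v8du, so lane overflow wraps modulo 2^64 instead of being
signed-overflow undefined behavior. For example (hypothetical demo):

static __inline __m512i
wraparound_demo (void)
{
  __m512i __a = _mm512_set1_epi64 (0x7fffffffffffffffLL);
  __m512i __b = _mm512_set1_epi64 (1);
  /* Each lane wraps to 0x8000000000000000.  */
  return _mm512_add_epi64 (__a, __b);
}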
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_shuffle_f64x2 (__m512d __A, __m512d __B, const int __imm)
+_mm512_sllv_epi64 (__m512i __X, __m512i __Y)
 {
-  return (__m512d) __builtin_ia32_shuf_f64x2_mask ((__v8df) __A,
-						   (__v8df) __B, __imm,
-						   (__v8df)
-						   _mm512_undefined_pd (),
-						   (__mmask8) -1);
+  return (__m512i) __builtin_ia32_psllv8di_mask ((__v8di) __X,
+						 (__v8di) __Y,
+						 (__v8di)
+						 _mm512_undefined_epi32 (),
+						 (__mmask8) -1);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_shuffle_f64x2 (__m512d __W, __mmask8 __U, __m512d __A,
-			   __m512d __B, const int __imm)
+_mm512_mask_sllv_epi64 (__m512i __W, __mmask8 __U, __m512i __X, __m512i __Y)
 {
-  return (__m512d) __builtin_ia32_shuf_f64x2_mask ((__v8df) __A,
-						   (__v8df) __B, __imm,
-						   (__v8df) __W,
-						   (__mmask8) __U);
+  return (__m512i) __builtin_ia32_psllv8di_mask ((__v8di) __X,
+						 (__v8di) __Y,
+						 (__v8di) __W,
+						 (__mmask8) __U);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_shuffle_f64x2 (__mmask8 __U, __m512d __A, __m512d __B,
-			    const int __imm)
+_mm512_maskz_sllv_epi64 (__mmask8 __U, __m512i __X, __m512i __Y)
 {
-  return (__m512d) __builtin_ia32_shuf_f64x2_mask ((__v8df) __A,
-						   (__v8df) __B, __imm,
-						   (__v8df)
-						   _mm512_setzero_pd (),
-						   (__mmask8) __U);
+  return (__m512i) __builtin_ia32_psllv8di_mask ((__v8di) __X,
+						 (__v8di) __Y,
+						 (__v8di)
+						 _mm512_setzero_si512 (),
+						 (__mmask8) __U);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_shuffle_f32x4 (__m512 __A, __m512 __B, const int __imm)
+_mm512_srav_epi64 (__m512i __X, __m512i __Y)
 {
-  return (__m512) __builtin_ia32_shuf_f32x4_mask ((__v16sf) __A,
-						  (__v16sf) __B, __imm,
-						  (__v16sf)
-						  _mm512_undefined_ps (),
-						  (__mmask16) -1);
+  return (__m512i) __builtin_ia32_psrav8di_mask ((__v8di) __X,
+						 (__v8di) __Y,
+						 (__v8di)
+						 _mm512_undefined_epi32 (),
+						 (__mmask8) -1);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_shuffle_f32x4 (__m512 __W, __mmask16 __U, __m512 __A,
-			   __m512 __B, const int __imm)
+_mm512_mask_srav_epi64 (__m512i __W, __mmask8 __U, __m512i __X, __m512i __Y)
 {
-  return (__m512) __builtin_ia32_shuf_f32x4_mask ((__v16sf) __A,
-						  (__v16sf) __B, __imm,
-						  (__v16sf) __W,
-						  (__mmask16) __U);
+  return (__m512i) __builtin_ia32_psrav8di_mask ((__v8di) __X,
+						 (__v8di) __Y,
+						 (__v8di) __W,
+						 (__mmask8) __U);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_shuffle_f32x4 (__mmask16 __U, __m512 __A, __m512 __B,
-			    const int __imm)
+_mm512_maskz_srav_epi64 (__mmask8 __U, __m512i __X, __m512i __Y)
 {
-  return (__m512) __builtin_ia32_shuf_f32x4_mask ((__v16sf) __A,
-						  (__v16sf) __B, __imm,
-						  (__v16sf)
-						  _mm512_setzero_ps (),
-						  (__mmask16) __U);
+  return (__m512i) __builtin_ia32_psrav8di_mask ((__v8di) __X,
+						 (__v8di) __Y,
+						 (__v8di)
+						 _mm512_setzero_si512 (),
+						 (__mmask8) __U);
 }
 
-#else
-#define _mm512_shuffle_epi32(X, C)                                      \
-  ((__m512i)  __builtin_ia32_pshufd512_mask ((__v16si)(__m512i)(X), (int)(C),\
-    (__v16si)(__m512i)_mm512_undefined_epi32 (),\
-    (__mmask16)-1))
-
-#define _mm512_mask_shuffle_epi32(W, U, X, C)                           \
-  ((__m512i)  __builtin_ia32_pshufd512_mask ((__v16si)(__m512i)(X), (int)(C),\
-    (__v16si)(__m512i)(W),\
-    (__mmask16)(U)))
-
-#define _mm512_maskz_shuffle_epi32(U, X, C)                             \
-  ((__m512i)  __builtin_ia32_pshufd512_mask ((__v16si)(__m512i)(X), (int)(C),\
-    (__v16si)(__m512i)_mm512_setzero_si512 (),\
-    (__mmask16)(U)))
-
-#define _mm512_shuffle_i64x2(X, Y, C)                                   \
-  ((__m512i)  __builtin_ia32_shuf_i64x2_mask ((__v8di)(__m512i)(X),     \
-      (__v8di)(__m512i)(Y), (int)(C),\
-    (__v8di)(__m512i)_mm512_undefined_epi32 (),\
-    (__mmask8)-1))
-
-#define _mm512_mask_shuffle_i64x2(W, U, X, Y, C)                        \
-  ((__m512i)  __builtin_ia32_shuf_i64x2_mask ((__v8di)(__m512i)(X),     \
-      (__v8di)(__m512i)(Y), (int)(C),\
-    (__v8di)(__m512i)(W),\
-    (__mmask8)(U)))
-
-#define _mm512_maskz_shuffle_i64x2(U, X, Y, C)                          \
-  ((__m512i)  __builtin_ia32_shuf_i64x2_mask ((__v8di)(__m512i)(X),     \
-      (__v8di)(__m512i)(Y), (int)(C),\
-    (__v8di)(__m512i)_mm512_setzero_si512 (),\
-    (__mmask8)(U)))
-
-#define _mm512_shuffle_i32x4(X, Y, C)                                   \
-  ((__m512i)  __builtin_ia32_shuf_i32x4_mask ((__v16si)(__m512i)(X),    \
-      (__v16si)(__m512i)(Y), (int)(C),\
-    (__v16si)(__m512i)_mm512_undefined_epi32 (),\
-    (__mmask16)-1))
-
-#define _mm512_mask_shuffle_i32x4(W, U, X, Y, C)                        \
-  ((__m512i)  __builtin_ia32_shuf_i32x4_mask ((__v16si)(__m512i)(X),    \
-      (__v16si)(__m512i)(Y), (int)(C),\
-    (__v16si)(__m512i)(W),\
-    (__mmask16)(U)))
-
-#define _mm512_maskz_shuffle_i32x4(U, X, Y, C)                          \
-  ((__m512i)  __builtin_ia32_shuf_i32x4_mask ((__v16si)(__m512i)(X),    \
-      (__v16si)(__m512i)(Y), (int)(C),\
-    (__v16si)(__m512i)_mm512_setzero_si512 (),\
-    (__mmask16)(U)))
-
-#define _mm512_shuffle_f64x2(X, Y, C)                                   \
-  ((__m512d)  __builtin_ia32_shuf_f64x2_mask ((__v8df)(__m512d)(X),     \
-      (__v8df)(__m512d)(Y), (int)(C),\
-    (__v8df)(__m512d)_mm512_undefined_pd(),\
-    (__mmask8)-1))
-
-#define _mm512_mask_shuffle_f64x2(W, U, X, Y, C)                        \
-  ((__m512d)  __builtin_ia32_shuf_f64x2_mask ((__v8df)(__m512d)(X),     \
-      (__v8df)(__m512d)(Y), (int)(C),\
-    (__v8df)(__m512d)(W),\
-    (__mmask8)(U)))
-
-#define _mm512_maskz_shuffle_f64x2(U, X, Y, C)                         \
-  ((__m512d)  __builtin_ia32_shuf_f64x2_mask ((__v8df)(__m512d)(X),    \
-      (__v8df)(__m512d)(Y), (int)(C),\
-    (__v8df)(__m512d)_mm512_setzero_pd(),\
-    (__mmask8)(U)))
-
-#define _mm512_shuffle_f32x4(X, Y, C)                                  \
-  ((__m512)  __builtin_ia32_shuf_f32x4_mask ((__v16sf)(__m512)(X),     \
-      (__v16sf)(__m512)(Y), (int)(C),\
-    (__v16sf)(__m512)_mm512_undefined_ps(),\
-    (__mmask16)-1))
-
-#define _mm512_mask_shuffle_f32x4(W, U, X, Y, C)                       \
-  ((__m512)  __builtin_ia32_shuf_f32x4_mask ((__v16sf)(__m512)(X),     \
-      (__v16sf)(__m512)(Y), (int)(C),\
-    (__v16sf)(__m512)(W),\
-    (__mmask16)(U)))
-
-#define _mm512_maskz_shuffle_f32x4(U, X, Y, C)                         \
-  ((__m512)  __builtin_ia32_shuf_f32x4_mask ((__v16sf)(__m512)(X),     \
-      (__v16sf)(__m512)(Y), (int)(C),\
-    (__v16sf)(__m512)_mm512_setzero_ps(),\
-    (__mmask16)(U)))
-#endif
-
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_rolv_epi32 (__m512i __A, __m512i __B)
+_mm512_srlv_epi64 (__m512i __X, __m512i __Y)
 {
-  return (__m512i) __builtin_ia32_prolvd512_mask ((__v16si) __A,
-						  (__v16si) __B,
-						  (__v16si)
-						  _mm512_undefined_epi32 (),
-						  (__mmask16) -1);
+  return (__m512i) __builtin_ia32_psrlv8di_mask ((__v8di) __X,
+						 (__v8di) __Y,
+						 (__v8di)
+						 _mm512_undefined_epi32 (),
+						 (__mmask8) -1);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_rolv_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
+_mm512_mask_srlv_epi64 (__m512i __W, __mmask8 __U, __m512i __X, __m512i __Y)
 {
-  return (__m512i) __builtin_ia32_prolvd512_mask ((__v16si) __A,
-						  (__v16si) __B,
-						  (__v16si) __W,
-						  (__mmask16) __U);
+  return (__m512i) __builtin_ia32_psrlv8di_mask ((__v8di) __X,
+						 (__v8di) __Y,
+						 (__v8di) __W,
+						 (__mmask8) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_rolv_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
+_mm512_maskz_srlv_epi64 (__mmask8 __U, __m512i __X, __m512i __Y)
 {
-  return (__m512i) __builtin_ia32_prolvd512_mask ((__v16si) __A,
-						  (__v16si) __B,
-						  (__v16si)
-						  _mm512_setzero_si512 (),
-						  (__mmask16) __U);
+  return (__m512i) __builtin_ia32_psrlv8di_mask ((__v8di) __X,
+						 (__v8di) __Y,
+						 (__v8di)
+						 _mm512_setzero_si512 (),
+						 (__mmask8) __U);
 }
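
For context, how the variable-count shifts above differ is easiest to
see side by side: _mm512_srlv_epi64 fills vacated bits with zeros,
_mm512_srav_epi64 replicates the sign bit, and counts of 64 or more
behave as full-width shifts.  A minimal sketch, illustrative rather
than part of the change (assumes <immintrin.h> and -mavx512f):

#include <immintrin.h>

/* XOR of the logical and arithmetic shifts: nonzero lanes mark the
   elements where sign bits were shifted in, i.e. negative inputs
   with a nonzero count.  */
__m512i
shift_demo (__m512i v, __m512i counts)
{
  __m512i logical    = _mm512_srlv_epi64 (v, counts);   /* zero fill */
  __m512i arithmetic = _mm512_srav_epi64 (v, counts);   /* sign fill */
  return _mm512_xor_si512 (logical, arithmetic);
}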
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_rorv_epi32 (__m512i __A, __m512i __B)
+_mm512_add_epi32 (__m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_prorvd512_mask ((__v16si) __A,
-						  (__v16si) __B,
-						  (__v16si)
-						  _mm512_undefined_epi32 (),
-						  (__mmask16) -1);
+  return (__m512i) ((__v16su) __A + (__v16su) __B);
 }
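
Note the __v16su casts in the plain vector-extension form above:
doing the arithmetic on unsigned elements makes overflow wrap rather
than invoke signed-overflow undefined behaviour, while the resulting
bit pattern matches vpaddd either way.  A trivial sketch (same
assumptions as the earlier one):

__m512i
incr_wrap (__m512i v)
{
  /* INT_MAX + 1 wraps to INT_MIN in every lane, well defined because
     the addition is performed on unsigned elements.  */
  return _mm512_add_epi32 (v, _mm512_set1_epi32 (1));
}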
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_rorv_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
+_mm512_mask_add_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_prorvd512_mask ((__v16si) __A,
-						  (__v16si) __B,
-						  (__v16si) __W,
-						  (__mmask16) __U);
+  return (__m512i) __builtin_ia32_paddd512_mask ((__v16si) __A,
+						 (__v16si) __B,
+						 (__v16si) __W,
+						 (__mmask16) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_rorv_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
+_mm512_maskz_add_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_prorvd512_mask ((__v16si) __A,
-						  (__v16si) __B,
-						  (__v16si)
-						  _mm512_setzero_si512 (),
-						  (__mmask16) __U);
+  return (__m512i) __builtin_ia32_paddd512_mask ((__v16si) __A,
+						 (__v16si) __B,
+						 (__v16si)
+						 _mm512_setzero_si512 (),
+						 (__mmask16) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_rolv_epi64 (__m512i __A, __m512i __B)
+_mm512_mul_epi32 (__m512i __X, __m512i __Y)
 {
-  return (__m512i) __builtin_ia32_prolvq512_mask ((__v8di) __A,
-						  (__v8di) __B,
+  return (__m512i) __builtin_ia32_pmuldq512_mask ((__v16si) __X,
+						  (__v16si) __Y,
 						  (__v8di)
 						  _mm512_undefined_epi32 (),
 						  (__mmask8) -1);
@@ -4817,2133 +4668,2567 @@ _mm512_rolv_epi64 (__m512i __A, __m512i __B)
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_rolv_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
+_mm512_mask_mul_epi32 (__m512i __W, __mmask8 __M, __m512i __X, __m512i __Y)
 {
-  return (__m512i) __builtin_ia32_prolvq512_mask ((__v8di) __A,
-						  (__v8di) __B,
-						  (__v8di) __W,
-						  (__mmask8) __U);
+  return (__m512i) __builtin_ia32_pmuldq512_mask ((__v16si) __X,
+						  (__v16si) __Y,
+						  (__v8di) __W, __M);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_rolv_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
+_mm512_maskz_mul_epi32 (__mmask8 __M, __m512i __X, __m512i __Y)
 {
-  return (__m512i) __builtin_ia32_prolvq512_mask ((__v8di) __A,
-						  (__v8di) __B,
+  return (__m512i) __builtin_ia32_pmuldq512_mask ((__v16si) __X,
+						  (__v16si) __Y,
 						  (__v8di)
 						  _mm512_setzero_si512 (),
-						  (__mmask8) __U);
+						  __M);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_rorv_epi64 (__m512i __A, __m512i __B)
+_mm512_sub_epi32 (__m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_prorvq512_mask ((__v8di) __A,
-						  (__v8di) __B,
-						  (__v8di)
-						  _mm512_undefined_epi32 (),
-						  (__mmask8) -1);
+  return (__m512i) ((__v16su) __A - (__v16su) __B);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_rorv_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
+_mm512_mask_sub_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_prorvq512_mask ((__v8di) __A,
-						  (__v8di) __B,
-						  (__v8di) __W,
-						  (__mmask8) __U);
+  return (__m512i) __builtin_ia32_psubd512_mask ((__v16si) __A,
+						 (__v16si) __B,
+						 (__v16si) __W,
+						 (__mmask16) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_rorv_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
+_mm512_maskz_sub_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_prorvq512_mask ((__v8di) __A,
-						  (__v8di) __B,
-						  (__v8di)
-						  _mm512_setzero_si512 (),
-						  (__mmask8) __U);
+  return (__m512i) __builtin_ia32_psubd512_mask ((__v16si) __A,
+						 (__v16si) __B,
+						 (__v16si)
+						 _mm512_setzero_si512 (),
+						 (__mmask16) __U);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m256i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtt_roundpd_epi32 (__m512d __A, const int __R)
-{
-  return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A,
-						     (__v8si)
-						     _mm256_undefined_si256 (),
-						     (__mmask8) -1, __R);
+_mm512_mul_epu32 (__m512i __X, __m512i __Y)
+{
+  return (__m512i) __builtin_ia32_pmuludq512_mask ((__v16si) __X,
+						   (__v16si) __Y,
+						   (__v8di)
+						   _mm512_undefined_epi32 (),
+						   (__mmask8) -1);
 }
 
-extern __inline __m256i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtt_roundpd_epi32 (__m256i __W, __mmask8 __U, __m512d __A,
-				const int __R)
+_mm512_mask_mul_epu32 (__m512i __W, __mmask8 __M, __m512i __X, __m512i __Y)
 {
-  return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A,
-						     (__v8si) __W,
-						     (__mmask8) __U, __R);
+  return (__m512i) __builtin_ia32_pmuludq512_mask ((__v16si) __X,
+						   (__v16si) __Y,
+						   (__v8di) __W, __M);
 }
 
-extern __inline __m256i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtt_roundpd_epi32 (__mmask8 __U, __m512d __A, const int __R)
+_mm512_maskz_mul_epu32 (__mmask8 __M, __m512i __X, __m512i __Y)
 {
-  return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A,
-						     (__v8si)
-						     _mm256_setzero_si256 (),
-						     (__mmask8) __U, __R);
+  return (__m512i) __builtin_ia32_pmuludq512_mask ((__v16si) __X,
+						   (__v16si) __Y,
+						   (__v8di)
+						   _mm512_setzero_si512 (),
+						   __M);
 }
 
-extern __inline __m256i
+#ifdef __OPTIMIZE__
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtt_roundpd_epu32 (__m512d __A, const int __R)
+_mm512_slli_epi64 (__m512i __A, unsigned int __B)
 {
-  return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A,
-						      (__v8si)
-						      _mm256_undefined_si256 (),
-						      (__mmask8) -1, __R);
+  return (__m512i) __builtin_ia32_psllqi512_mask ((__v8di) __A, __B,
+						  (__v8di)
+						  _mm512_undefined_epi32 (),
+						  (__mmask8) -1);
 }
 
-extern __inline __m256i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtt_roundpd_epu32 (__m256i __W, __mmask8 __U, __m512d __A,
-				const int __R)
+_mm512_mask_slli_epi64 (__m512i __W, __mmask8 __U, __m512i __A,
+			unsigned int __B)
 {
-  return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A,
-						      (__v8si) __W,
-						      (__mmask8) __U, __R);
+  return (__m512i) __builtin_ia32_psllqi512_mask ((__v8di) __A, __B,
+						  (__v8di) __W,
+						  (__mmask8) __U);
 }
 
-extern __inline __m256i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtt_roundpd_epu32 (__mmask8 __U, __m512d __A, const int __R)
+_mm512_maskz_slli_epi64 (__mmask8 __U, __m512i __A, unsigned int __B)
 {
-  return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A,
-						      (__v8si)
-						      _mm256_setzero_si256 (),
-						      (__mmask8) __U, __R);
+  return (__m512i) __builtin_ia32_psllqi512_mask ((__v8di) __A, __B,
+						  (__v8di)
+						  _mm512_setzero_si512 (),
+						  (__mmask8) __U);
 }
 #else
-#define _mm512_cvtt_roundpd_epi32(A, B)		     \
-    ((__m256i)__builtin_ia32_cvttpd2dq512_mask(A, (__v8si)_mm256_undefined_si256(), -1, B))
-
-#define _mm512_mask_cvtt_roundpd_epi32(W, U, A, B)   \
-    ((__m256i)__builtin_ia32_cvttpd2dq512_mask(A, (__v8si)(W), U, B))
-
-#define _mm512_maskz_cvtt_roundpd_epi32(U, A, B)     \
-    ((__m256i)__builtin_ia32_cvttpd2dq512_mask(A, (__v8si)_mm256_setzero_si256(), U, B))
-
-#define _mm512_cvtt_roundpd_epu32(A, B)		     \
-    ((__m256i)__builtin_ia32_cvttpd2udq512_mask(A, (__v8si)_mm256_undefined_si256(), -1, B))
+#define _mm512_slli_epi64(X, C)						\
+  ((__m512i) __builtin_ia32_psllqi512_mask ((__v8di)(__m512i)(X),	\
+    (unsigned int)(C),							\
+    (__v8di)(__m512i)_mm512_undefined_epi32 (),				\
+    (__mmask8)-1))
 
-#define _mm512_mask_cvtt_roundpd_epu32(W, U, A, B)   \
-    ((__m256i)__builtin_ia32_cvttpd2udq512_mask(A, (__v8si)(W), U, B))
+#define _mm512_mask_slli_epi64(W, U, X, C)				\
+  ((__m512i) __builtin_ia32_psllqi512_mask ((__v8di)(__m512i)(X),	\
+    (unsigned int)(C),							\
+    (__v8di)(__m512i)(W),						\
+    (__mmask8)(U)))
 
-#define _mm512_maskz_cvtt_roundpd_epu32(U, A, B)     \
-    ((__m256i)__builtin_ia32_cvttpd2udq512_mask(A, (__v8si)_mm256_setzero_si256(), U, B))
+#define _mm512_maskz_slli_epi64(U, X, C)				\
+  ((__m512i) __builtin_ia32_psllqi512_mask ((__v8di)(__m512i)(X),	\
+    (unsigned int)(C),							\
+    (__v8di)(__m512i)_mm512_setzero_si512 (),				\
+    (__mmask8)(U)))
 #endif
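
The #ifdef __OPTIMIZE__ split above (repeated for each
immediate-count shift group below) exists because the builtin's count
operand must fold to an immediate: without optimization the
always_inline wrapper's parameter remains a variable, so the macro
form pastes the literal straight into the builtin.  Either way the
count has to be a compile-time constant:

__m512i
shl3 (__m512i v)
{
  return _mm512_slli_epi64 (v, 3);	/* OK: literal immediate.  */
}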
 
-#ifdef __OPTIMIZE__
-extern __inline __m256i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundpd_epi32 (__m512d __A, const int __R)
+_mm512_sll_epi64 (__m512i __A, __m128i __B)
 {
-  return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A,
-						    (__v8si)
-						    _mm256_undefined_si256 (),
-						    (__mmask8) -1, __R);
+  return (__m512i) __builtin_ia32_psllq512_mask ((__v8di) __A,
+						 (__v2di) __B,
+						 (__v8di)
+						 _mm512_undefined_epi32 (),
+						 (__mmask8) -1);
 }
 
-extern __inline __m256i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundpd_epi32 (__m256i __W, __mmask8 __U, __m512d __A,
-			       const int __R)
+_mm512_mask_sll_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m128i __B)
 {
-  return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A,
-						    (__v8si) __W,
-						    (__mmask8) __U, __R);
+  return (__m512i) __builtin_ia32_psllq512_mask ((__v8di) __A,
+						 (__v2di) __B,
+						 (__v8di) __W,
+						 (__mmask8) __U);
 }
 
-extern __inline __m256i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundpd_epi32 (__mmask8 __U, __m512d __A, const int __R)
+_mm512_maskz_sll_epi64 (__mmask8 __U, __m512i __A, __m128i __B)
 {
-  return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A,
-						    (__v8si)
-						    _mm256_setzero_si256 (),
-						    (__mmask8) __U, __R);
+  return (__m512i) __builtin_ia32_psllq512_mask ((__v8di) __A,
+						 (__v2di) __B,
+						 (__v8di)
+						 _mm512_setzero_si512 (),
+						 (__mmask8) __U);
 }
 
-extern __inline __m256i
+#ifdef __OPTIMIZE__
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundpd_epu32 (__m512d __A, const int __R)
+_mm512_srli_epi64 (__m512i __A, unsigned int __B)
 {
-  return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A,
-						     (__v8si)
-						     _mm256_undefined_si256 (),
-						     (__mmask8) -1, __R);
+  return (__m512i) __builtin_ia32_psrlqi512_mask ((__v8di) __A, __B,
+						  (__v8di)
+						  _mm512_undefined_epi32 (),
+						  (__mmask8) -1);
 }
 
-extern __inline __m256i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundpd_epu32 (__m256i __W, __mmask8 __U, __m512d __A,
-			       const int __R)
+_mm512_mask_srli_epi64 (__m512i __W, __mmask8 __U,
+			__m512i __A, unsigned int __B)
 {
-  return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A,
-						     (__v8si) __W,
-						     (__mmask8) __U, __R);
+  return (__m512i) __builtin_ia32_psrlqi512_mask ((__v8di) __A, __B,
+						  (__v8di) __W,
+						  (__mmask8) __U);
 }
 
-extern __inline __m256i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundpd_epu32 (__mmask8 __U, __m512d __A, const int __R)
+_mm512_maskz_srli_epi64 (__mmask8 __U, __m512i __A, unsigned int __B)
 {
-  return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A,
-						     (__v8si)
-						     _mm256_setzero_si256 (),
-						     (__mmask8) __U, __R);
+  return (__m512i) __builtin_ia32_psrlqi512_mask ((__v8di) __A, __B,
+						  (__v8di)
+						  _mm512_setzero_si512 (),
+						  (__mmask8) __U);
 }
 #else
-#define _mm512_cvt_roundpd_epi32(A, B)		    \
-    ((__m256i)__builtin_ia32_cvtpd2dq512_mask(A, (__v8si)_mm256_undefined_si256(), -1, B))
-
-#define _mm512_mask_cvt_roundpd_epi32(W, U, A, B)   \
-    ((__m256i)__builtin_ia32_cvtpd2dq512_mask(A, (__v8si)(W), U, B))
-
-#define _mm512_maskz_cvt_roundpd_epi32(U, A, B)     \
-    ((__m256i)__builtin_ia32_cvtpd2dq512_mask(A, (__v8si)_mm256_setzero_si256(), U, B))
-
-#define _mm512_cvt_roundpd_epu32(A, B)		    \
-    ((__m256i)__builtin_ia32_cvtpd2udq512_mask(A, (__v8si)_mm256_undefined_si256(), -1, B))
+#define _mm512_srli_epi64(X, C)						\
+  ((__m512i) __builtin_ia32_psrlqi512_mask ((__v8di)(__m512i)(X),	\
+    (unsigned int)(C),							\
+    (__v8di)(__m512i)_mm512_undefined_epi32 (),				\
+    (__mmask8)-1))
 
-#define _mm512_mask_cvt_roundpd_epu32(W, U, A, B)   \
-    ((__m256i)__builtin_ia32_cvtpd2udq512_mask(A, (__v8si)(W), U, B))
+#define _mm512_mask_srli_epi64(W, U, X, C)				\
+  ((__m512i) __builtin_ia32_psrlqi512_mask ((__v8di)(__m512i)(X),	\
+    (unsigned int)(C),							\
+    (__v8di)(__m512i)(W),						\
+    (__mmask8)(U)))
 
-#define _mm512_maskz_cvt_roundpd_epu32(U, A, B)     \
-    ((__m256i)__builtin_ia32_cvtpd2udq512_mask(A, (__v8si)_mm256_setzero_si256(), U, B))
+#define _mm512_maskz_srli_epi64(U, X, C)				\
+  ((__m512i) __builtin_ia32_psrlqi512_mask ((__v8di)(__m512i)(X),	\
+    (unsigned int)(C),							\
+    (__v8di)(__m512i)_mm512_setzero_si512 (),				\
+    (__mmask8)(U)))
 #endif
 
-#ifdef __OPTIMIZE__
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtt_roundps_epi32 (__m512 __A, const int __R)
+_mm512_srl_epi64 (__m512i __A, __m128i __B)
 {
-  return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A,
-						     (__v16si)
-						     _mm512_undefined_epi32 (),
-						     (__mmask16) -1, __R);
+  return (__m512i) __builtin_ia32_psrlq512_mask ((__v8di) __A,
+						 (__v2di) __B,
+						 (__v8di)
+						 _mm512_undefined_epi32 (),
+						 (__mmask8) -1);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtt_roundps_epi32 (__m512i __W, __mmask16 __U, __m512 __A,
-				const int __R)
+_mm512_mask_srl_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m128i __B)
 {
-  return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A,
-						     (__v16si) __W,
-						     (__mmask16) __U, __R);
+  return (__m512i) __builtin_ia32_psrlq512_mask ((__v8di) __A,
+						 (__v2di) __B,
+						 (__v8di) __W,
+						 (__mmask8) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtt_roundps_epi32 (__mmask16 __U, __m512 __A, const int __R)
+_mm512_maskz_srl_epi64 (__mmask8 __U, __m512i __A, __m128i __B)
 {
-  return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A,
-						     (__v16si)
-						     _mm512_setzero_si512 (),
-						     (__mmask16) __U, __R);
+  return (__m512i) __builtin_ia32_psrlq512_mask ((__v8di) __A,
+						 (__v2di) __B,
+						 (__v8di)
+						 _mm512_setzero_si512 (),
+						 (__mmask8) __U);
 }
 
+#ifdef __OPTIMIZE__
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtt_roundps_epu32 (__m512 __A, const int __R)
+_mm512_srai_epi64 (__m512i __A, unsigned int __B)
 {
-  return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A,
-						      (__v16si)
-						      _mm512_undefined_epi32 (),
-						      (__mmask16) -1, __R);
+  return (__m512i) __builtin_ia32_psraqi512_mask ((__v8di) __A, __B,
+						  (__v8di)
+						  _mm512_undefined_epi32 (),
+						  (__mmask8) -1);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtt_roundps_epu32 (__m512i __W, __mmask16 __U, __m512 __A,
-				const int __R)
+_mm512_mask_srai_epi64 (__m512i __W, __mmask8 __U, __m512i __A,
+			unsigned int __B)
 {
-  return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A,
-						      (__v16si) __W,
-						      (__mmask16) __U, __R);
+  return (__m512i) __builtin_ia32_psraqi512_mask ((__v8di) __A, __B,
+						  (__v8di) __W,
+						  (__mmask8) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtt_roundps_epu32 (__mmask16 __U, __m512 __A, const int __R)
+_mm512_maskz_srai_epi64 (__mmask8 __U, __m512i __A, unsigned int __B)
 {
-  return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A,
-						      (__v16si)
-						      _mm512_setzero_si512 (),
-						      (__mmask16) __U, __R);
+  return (__m512i) __builtin_ia32_psraqi512_mask ((__v8di) __A, __B,
+						  (__v8di)
+						  _mm512_setzero_si512 (),
+						  (__mmask8) __U);
 }
 #else
-#define _mm512_cvtt_roundps_epi32(A, B)		     \
-    ((__m512i)__builtin_ia32_cvttps2dq512_mask(A, (__v16si)_mm512_undefined_epi32 (), -1, B))
-
-#define _mm512_mask_cvtt_roundps_epi32(W, U, A, B)   \
-    ((__m512i)__builtin_ia32_cvttps2dq512_mask(A, (__v16si)(W), U, B))
-
-#define _mm512_maskz_cvtt_roundps_epi32(U, A, B)     \
-    ((__m512i)__builtin_ia32_cvttps2dq512_mask(A, (__v16si)_mm512_setzero_si512 (), U, B))
-
-#define _mm512_cvtt_roundps_epu32(A, B)		     \
-    ((__m512i)__builtin_ia32_cvttps2udq512_mask(A, (__v16si)_mm512_undefined_epi32 (), -1, B))
+#define _mm512_srai_epi64(X, C)						\
+  ((__m512i) __builtin_ia32_psraqi512_mask ((__v8di)(__m512i)(X),	\
+    (unsigned int)(C),							\
+    (__v8di)(__m512i)_mm512_undefined_epi32 (),				\
+    (__mmask8)-1))
 
-#define _mm512_mask_cvtt_roundps_epu32(W, U, A, B)   \
-    ((__m512i)__builtin_ia32_cvttps2udq512_mask(A, (__v16si)(W), U, B))
+#define _mm512_mask_srai_epi64(W, U, X, C)				\
+  ((__m512i) __builtin_ia32_psraqi512_mask ((__v8di)(__m512i)(X),	\
+    (unsigned int)(C),							\
+    (__v8di)(__m512i)(W),						\
+    (__mmask8)(U)))
 
-#define _mm512_maskz_cvtt_roundps_epu32(U, A, B)     \
-    ((__m512i)__builtin_ia32_cvttps2udq512_mask(A, (__v16si)_mm512_setzero_si512 (), U, B))
+#define _mm512_maskz_srai_epi64(U, X, C)				\
+  ((__m512i) __builtin_ia32_psraqi512_mask ((__v8di)(__m512i)(X),	\
+    (unsigned int)(C),							\
+    (__v8di)(__m512i)_mm512_setzero_si512 (),				\
+    (__mmask8)(U)))
 #endif
 
-#ifdef __OPTIMIZE__
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundps_epi32 (__m512 __A, const int __R)
+_mm512_sra_epi64 (__m512i __A, __m128i __B)
 {
-  return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
-						    (__v16si)
-						    _mm512_undefined_epi32 (),
-						    (__mmask16) -1, __R);
+  return (__m512i) __builtin_ia32_psraq512_mask ((__v8di) __A,
+						 (__v2di) __B,
+						 (__v8di)
+						 _mm512_undefined_epi32 (),
+						 (__mmask8) -1);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundps_epi32 (__m512i __W, __mmask16 __U, __m512 __A,
-			       const int __R)
+_mm512_mask_sra_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m128i __B)
 {
-  return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
-						    (__v16si) __W,
-						    (__mmask16) __U, __R);
+  return (__m512i) __builtin_ia32_psraq512_mask ((__v8di) __A,
+						 (__v2di) __B,
+						 (__v8di) __W,
+						 (__mmask8) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundps_epi32 (__mmask16 __U, __m512 __A, const int __R)
+_mm512_maskz_sra_epi64 (__mmask8 __U, __m512i __A, __m128i __B)
 {
-  return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
-						    (__v16si)
-						    _mm512_setzero_si512 (),
-						    (__mmask16) __U, __R);
+  return (__m512i) __builtin_ia32_psraq512_mask ((__v8di) __A,
+						 (__v2di) __B,
+						 (__v8di)
+						 _mm512_setzero_si512 (),
+						 (__mmask8) __U);
 }
 
+#ifdef __OPTIMIZE__
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundps_epu32 (__m512 __A, const int __R)
+_mm512_slli_epi32 (__m512i __A, unsigned int __B)
 {
-  return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A,
-						     (__v16si)
-						     _mm512_undefined_epi32 (),
-						     (__mmask16) -1, __R);
+  return (__m512i) __builtin_ia32_pslldi512_mask ((__v16si) __A, __B,
+						  (__v16si)
+						  _mm512_undefined_epi32 (),
+						  (__mmask16) -1);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundps_epu32 (__m512i __W, __mmask16 __U, __m512 __A,
-			       const int __R)
+_mm512_mask_slli_epi32 (__m512i __W, __mmask16 __U, __m512i __A,
+			unsigned int __B)
 {
-  return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A,
-						     (__v16si) __W,
-						     (__mmask16) __U, __R);
+  return (__m512i) __builtin_ia32_pslldi512_mask ((__v16si) __A, __B,
+						  (__v16si) __W,
+						  (__mmask16) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundps_epu32 (__mmask16 __U, __m512 __A, const int __R)
+_mm512_maskz_slli_epi32 (__mmask16 __U, __m512i __A, unsigned int __B)
 {
-  return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A,
-						     (__v16si)
-						     _mm512_setzero_si512 (),
-						     (__mmask16) __U, __R);
+  return (__m512i) __builtin_ia32_pslldi512_mask ((__v16si) __A, __B,
+						  (__v16si)
+						  _mm512_setzero_si512 (),
+						  (__mmask16) __U);
 }
 #else
-#define _mm512_cvt_roundps_epi32(A, B)		    \
-    ((__m512i)__builtin_ia32_cvtps2dq512_mask(A, (__v16si)_mm512_undefined_epi32 (), -1, B))
-
-#define _mm512_mask_cvt_roundps_epi32(W, U, A, B)   \
-    ((__m512i)__builtin_ia32_cvtps2dq512_mask(A, (__v16si)(W), U, B))
+#define _mm512_slli_epi32(X, C)						\
+  ((__m512i) __builtin_ia32_pslldi512_mask ((__v16si)(__m512i)(X),	\
+    (unsigned int)(C),							\
+    (__v16si)(__m512i)_mm512_undefined_epi32 (),			\
+    (__mmask16)-1))
 
-#define _mm512_maskz_cvt_roundps_epi32(U, A, B)     \
-    ((__m512i)__builtin_ia32_cvtps2dq512_mask(A, (__v16si)_mm512_setzero_si512 (), U, B))
+#define _mm512_mask_slli_epi32(W, U, X, C)				\
+  ((__m512i) __builtin_ia32_pslldi512_mask ((__v16si)(__m512i)(X),	\
+    (unsigned int)(C),							\
+    (__v16si)(__m512i)(W),						\
+    (__mmask16)(U)))
 
-#define _mm512_cvt_roundps_epu32(A, B)		    \
-    ((__m512i)__builtin_ia32_cvtps2udq512_mask(A, (__v16si)_mm512_undefined_epi32 (), -1, B))
+#define _mm512_maskz_slli_epi32(U, X, C)				\
+  ((__m512i) __builtin_ia32_pslldi512_mask ((__v16si)(__m512i)(X),	\
+    (unsigned int)(C),							\
+    (__v16si)(__m512i)_mm512_setzero_si512 (),				\
+    (__mmask16)(U)))
+#endif
 
-#define _mm512_mask_cvt_roundps_epu32(W, U, A, B)   \
-    ((__m512i)__builtin_ia32_cvtps2udq512_mask(A, (__v16si)(W), U, B))
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_sll_epi32 (__m512i __A, __m128i __B)
+{
+  return (__m512i) __builtin_ia32_pslld512_mask ((__v16si) __A,
+						 (__v4si) __B,
+						 (__v16si)
+						 _mm512_undefined_epi32 (),
+						 (__mmask16) -1);
+}
 
-#define _mm512_maskz_cvt_roundps_epu32(U, A, B)     \
-    ((__m512i)__builtin_ia32_cvtps2udq512_mask(A, (__v16si)_mm512_setzero_si512 (), U, B))
-#endif
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_sll_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m128i __B)
+{
+  return (__m512i) __builtin_ia32_pslld512_mask ((__v16si) __A,
+						 (__v4si) __B,
+						 (__v16si) __W,
+						 (__mmask16) __U);
+}
 
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtu32_sd (__m128d __A, unsigned __B)
+_mm512_maskz_sll_epi32 (__mmask16 __U, __m512i __A, __m128i __B)
 {
-  return (__m128d) __builtin_ia32_cvtusi2sd32 ((__v2df) __A, __B);
+  return (__m512i) __builtin_ia32_pslld512_mask ((__v16si) __A,
+						 (__v4si) __B,
+						 (__v16si)
+						 _mm512_setzero_si512 (),
+						 (__mmask16) __U);
 }
 
-#ifdef __x86_64__
 #ifdef __OPTIMIZE__
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundu64_sd (__m128d __A, unsigned long long __B, const int __R)
+_mm512_srli_epi32 (__m512i __A, unsigned int __B)
 {
-  return (__m128d) __builtin_ia32_cvtusi2sd64 ((__v2df) __A, __B, __R);
+  return (__m512i) __builtin_ia32_psrldi512_mask ((__v16si) __A, __B,
+						  (__v16si)
+						  _mm512_undefined_epi32 (),
+						  (__mmask16) -1);
 }
 
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundi64_sd (__m128d __A, long long __B, const int __R)
+_mm512_mask_srli_epi32 (__m512i __W, __mmask16 __U,
+			__m512i __A, unsigned int __B)
 {
-  return (__m128d) __builtin_ia32_cvtsi2sd64 ((__v2df) __A, __B, __R);
+  return (__m512i) __builtin_ia32_psrldi512_mask ((__v16si) __A, __B,
+						  (__v16si) __W,
+						  (__mmask16) __U);
 }
 
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsi64_sd (__m128d __A, long long __B, const int __R)
+_mm512_maskz_srli_epi32 (__mmask16 __U, __m512i __A, unsigned int __B)
 {
-  return (__m128d) __builtin_ia32_cvtsi2sd64 ((__v2df) __A, __B, __R);
+  return (__m512i) __builtin_ia32_psrldi512_mask ((__v16si) __A, __B,
+						  (__v16si)
+						  _mm512_setzero_si512 (),
+						  (__mmask16) __U);
 }
 #else
-#define _mm_cvt_roundu64_sd(A, B, C)   \
-    (__m128d)__builtin_ia32_cvtusi2sd64(A, B, C)
-
-#define _mm_cvt_roundi64_sd(A, B, C)   \
-    (__m128d)__builtin_ia32_cvtsi2sd64(A, B, C)
+#define _mm512_srli_epi32(X, C)						  \
+  ((__m512i) __builtin_ia32_psrldi512_mask ((__v16si)(__m512i)(X),	  \
+    (unsigned int)(C),							  \
+    (__v16si)(__m512i)_mm512_undefined_epi32 (),\
+    (__mmask16)-1))
 
-#define _mm_cvt_roundsi64_sd(A, B, C)   \
-    (__m128d)__builtin_ia32_cvtsi2sd64(A, B, C)
-#endif
+#define _mm512_mask_srli_epi32(W, U, X, C)				  \
+  ((__m512i) __builtin_ia32_psrldi512_mask ((__v16si)(__m512i)(X),	  \
+    (unsigned int)(C),							  \
+    (__v16si)(__m512i)(W),						  \
+    (__mmask16)(U)))
 
+#define _mm512_maskz_srli_epi32(U, X, C)				  \
+  ((__m512i) __builtin_ia32_psrldi512_mask ((__v16si)(__m512i)(X),	  \
+    (unsigned int)(C),							  \
+    (__v16si)(__m512i)_mm512_setzero_si512 (),				  \
+    (__mmask16)(U)))
 #endif
 
-#ifdef __OPTIMIZE__
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundu32_ss (__m128 __A, unsigned __B, const int __R)
+_mm512_srl_epi32 (__m512i __A, __m128i __B)
 {
-  return (__m128) __builtin_ia32_cvtusi2ss32 ((__v4sf) __A, __B, __R);
+  return (__m512i) __builtin_ia32_psrld512_mask ((__v16si) __A,
+						 (__v4si) __B,
+						 (__v16si)
+						 _mm512_undefined_epi32 (),
+						 (__mmask16) -1);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsi32_ss (__m128 __A, int __B, const int __R)
+_mm512_mask_srl_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m128i __B)
 {
-  return (__m128) __builtin_ia32_cvtsi2ss32 ((__v4sf) __A, __B, __R);
+  return (__m512i) __builtin_ia32_psrld512_mask ((__v16si) __A,
+						 (__v4si) __B,
+						 (__v16si) __W,
+						 (__mmask16) __U);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundi32_ss (__m128 __A, int __B, const int __R)
+_mm512_maskz_srl_epi32 (__mmask16 __U, __m512i __A, __m128i __B)
 {
-  return (__m128) __builtin_ia32_cvtsi2ss32 ((__v4sf) __A, __B, __R);
+  return (__m512i) __builtin_ia32_psrld512_mask ((__v16si) __A,
+						 (__v4si) __B,
+						 (__v16si)
+						 _mm512_setzero_si512 (),
+						 (__mmask16) __U);
 }
-#else
-#define _mm_cvt_roundu32_ss(A, B, C)   \
-    (__m128)__builtin_ia32_cvtusi2ss32(A, B, C)
-
-#define _mm_cvt_roundi32_ss(A, B, C)   \
-    (__m128)__builtin_ia32_cvtsi2ss32(A, B, C)
-
-#define _mm_cvt_roundsi32_ss(A, B, C)   \
-    (__m128)__builtin_ia32_cvtsi2ss32(A, B, C)
-#endif
 
-#ifdef __x86_64__
 #ifdef __OPTIMIZE__
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundu64_ss (__m128 __A, unsigned long long __B, const int __R)
+_mm512_srai_epi32 (__m512i __A, unsigned int __B)
 {
-  return (__m128) __builtin_ia32_cvtusi2ss64 ((__v4sf) __A, __B, __R);
+  return (__m512i) __builtin_ia32_psradi512_mask ((__v16si) __A, __B,
+						  (__v16si)
+						  _mm512_undefined_epi32 (),
+						  (__mmask16) -1);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsi64_ss (__m128 __A, long long __B, const int __R)
+_mm512_mask_srai_epi32 (__m512i __W, __mmask16 __U, __m512i __A,
+			unsigned int __B)
 {
-  return (__m128) __builtin_ia32_cvtsi2ss64 ((__v4sf) __A, __B, __R);
+  return (__m512i) __builtin_ia32_psradi512_mask ((__v16si) __A, __B,
+						  (__v16si) __W,
+						  (__mmask16) __U);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundi64_ss (__m128 __A, long long __B, const int __R)
+_mm512_maskz_srai_epi32 (__mmask16 __U, __m512i __A, unsigned int __B)
 {
-  return (__m128) __builtin_ia32_cvtsi2ss64 ((__v4sf) __A, __B, __R);
+  return (__m512i) __builtin_ia32_psradi512_mask ((__v16si) __A, __B,
+						  (__v16si)
+						  _mm512_setzero_si512 (),
+						  (__mmask16) __U);
 }
 #else
-#define _mm_cvt_roundu64_ss(A, B, C)   \
-    (__m128)__builtin_ia32_cvtusi2ss64(A, B, C)
-
-#define _mm_cvt_roundi64_ss(A, B, C)   \
-    (__m128)__builtin_ia32_cvtsi2ss64(A, B, C)
+#define _mm512_srai_epi32(X, C)						\
+  ((__m512i) __builtin_ia32_psradi512_mask ((__v16si)(__m512i)(X),	\
+    (unsigned int)(C),							\
+    (__v16si)(__m512i)_mm512_undefined_epi32 (),\
+    (__mmask16)-1))
 
-#define _mm_cvt_roundsi64_ss(A, B, C)   \
-    (__m128)__builtin_ia32_cvtsi2ss64(A, B, C)
-#endif
+#define _mm512_mask_srai_epi32(W, U, X, C)				\
+  ((__m512i) __builtin_ia32_psradi512_mask ((__v16si)(__m512i)(X),	\
+    (unsigned int)(C),							\
+    (__v16si)(__m512i)(W),						\
+    (__mmask16)(U)))
 
+#define _mm512_maskz_srai_epi32(U, X, C)				\
+  ((__m512i) __builtin_ia32_psradi512_mask ((__v16si)(__m512i)(X),	\
+    (unsigned int)(C),							\
+    (__v16si)(__m512i)_mm512_setzero_si512 (),				\
+    (__mmask16)(U)))
 #endif
 
-extern __inline __m128i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi32_epi8 (__m512i __A)
-{
-  return (__m128i) __builtin_ia32_pmovdb512_mask ((__v16si) __A,
-						  (__v16qi)
-						  _mm_undefined_si128 (),
-						  (__mmask16) -1);
-}
-
-extern __inline void
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi32_storeu_epi8 (void * __P, __mmask16 __M, __m512i __A)
+_mm512_sra_epi32 (__m512i __A, __m128i __B)
 {
-  __builtin_ia32_pmovdb512mem_mask ((__v16qi *) __P, (__v16si) __A, __M);
+  return (__m512i) __builtin_ia32_psrad512_mask ((__v16si) __A,
+						 (__v4si) __B,
+						 (__v16si)
+						 _mm512_undefined_epi32 (),
+						 (__mmask16) -1);
 }
 
-extern __inline __m128i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi32_epi8 (__m128i __O, __mmask16 __M, __m512i __A)
+_mm512_mask_sra_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m128i __B)
 {
-  return (__m128i) __builtin_ia32_pmovdb512_mask ((__v16si) __A,
-						  (__v16qi) __O, __M);
+  return (__m512i) __builtin_ia32_psrad512_mask ((__v16si) __A,
+						 (__v4si) __B,
+						 (__v16si) __W,
+						 (__mmask16) __U);
 }
 
-extern __inline __m128i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi32_epi8 (__mmask16 __M, __m512i __A)
+_mm512_maskz_sra_epi32 (__mmask16 __U, __m512i __A, __m128i __B)
 {
-  return (__m128i) __builtin_ia32_pmovdb512_mask ((__v16si) __A,
-						  (__v16qi)
-						  _mm_setzero_si128 (),
-						  __M);
+  return (__m512i) __builtin_ia32_psrad512_mask ((__v16si) __A,
+						 (__v4si) __B,
+						 (__v16si)
+						 _mm512_setzero_si512 (),
+						 (__mmask16) __U);
 }
 
-extern __inline __m128i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtsepi32_epi8 (__m512i __A)
+/* Constant helpers to represent the ternary logic operations among
+   vectors A, B and C.  */
+typedef enum
 {
-  return (__m128i) __builtin_ia32_pmovsdb512_mask ((__v16si) __A,
-						   (__v16qi)
-						   _mm_undefined_si128 (),
-						   (__mmask16) -1);
-}
+  _MM_TERNLOG_A = 0xF0,
+  _MM_TERNLOG_B = 0xCC,
+  _MM_TERNLOG_C = 0xAA
+} _MM_TERNLOG_ENUM;
 
-extern __inline void
+#ifdef __OPTIMIZE__
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtsepi32_storeu_epi8 (void * __P, __mmask16 __M, __m512i __A)
+_mm512_ternarylogic_epi64 (__m512i __A, __m512i __B, __m512i __C,
+			   const int __imm)
 {
-  __builtin_ia32_pmovsdb512mem_mask ((__v16qi *) __P, (__v16si) __A, __M);
+  return (__m512i)
+    __builtin_ia32_pternlogq512_mask ((__v8di) __A,
+				      (__v8di) __B,
+				      (__v8di) __C,
+				      (unsigned char) __imm,
+				      (__mmask8) -1);
 }
 
-extern __inline __m128i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtsepi32_epi8 (__m128i __O, __mmask16 __M, __m512i __A)
+_mm512_mask_ternarylogic_epi64 (__m512i __A, __mmask8 __U, __m512i __B,
+				__m512i __C, const int __imm)
 {
-  return (__m128i) __builtin_ia32_pmovsdb512_mask ((__v16si) __A,
-						   (__v16qi) __O, __M);
+  return (__m512i)
+    __builtin_ia32_pternlogq512_mask ((__v8di) __A,
+				      (__v8di) __B,
+				      (__v8di) __C,
+				      (unsigned char) __imm,
+				      (__mmask8) __U);
 }
 
-extern __inline __m128i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtsepi32_epi8 (__mmask16 __M, __m512i __A)
+_mm512_maskz_ternarylogic_epi64 (__mmask8 __U, __m512i __A, __m512i __B,
+				 __m512i __C, const int __imm)
 {
-  return (__m128i) __builtin_ia32_pmovsdb512_mask ((__v16si) __A,
-						   (__v16qi)
-						   _mm_setzero_si128 (),
-						   __M);
+  return (__m512i)
+    __builtin_ia32_pternlogq512_maskz ((__v8di) __A,
+				       (__v8di) __B,
+				       (__v8di) __C,
+				       (unsigned char) __imm,
+				       (__mmask8) __U);
 }
 
-extern __inline __m128i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtusepi32_epi8 (__m512i __A)
+_mm512_ternarylogic_epi32 (__m512i __A, __m512i __B, __m512i __C,
+			   const int __imm)
 {
-  return (__m128i) __builtin_ia32_pmovusdb512_mask ((__v16si) __A,
-						    (__v16qi)
-						    _mm_undefined_si128 (),
-						    (__mmask16) -1);
+  return (__m512i)
+    __builtin_ia32_pternlogd512_mask ((__v16si) __A,
+				      (__v16si) __B,
+				      (__v16si) __C,
+				      (unsigned char) __imm,
+				      (__mmask16) -1);
 }
 
-extern __inline void
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtusepi32_storeu_epi8 (void * __P, __mmask16 __M, __m512i __A)
+_mm512_mask_ternarylogic_epi32 (__m512i __A, __mmask16 __U, __m512i __B,
+				__m512i __C, const int __imm)
 {
-  __builtin_ia32_pmovusdb512mem_mask ((__v16qi *) __P, (__v16si) __A, __M);
+  return (__m512i)
+    __builtin_ia32_pternlogd512_mask ((__v16si) __A,
+				      (__v16si) __B,
+				      (__v16si) __C,
+				      (unsigned char) __imm,
+				      (__mmask16) __U);
 }
 
-extern __inline __m128i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtusepi32_epi8 (__m128i __O, __mmask16 __M, __m512i __A)
+_mm512_maskz_ternarylogic_epi32 (__mmask16 __U, __m512i __A, __m512i __B,
+				 __m512i __C, const int __imm)
 {
-  return (__m128i) __builtin_ia32_pmovusdb512_mask ((__v16si) __A,
-						    (__v16qi) __O,
-						    __M);
+  return (__m512i)
+    __builtin_ia32_pternlogd512_maskz ((__v16si) __A,
+				       (__v16si) __B,
+				       (__v16si) __C,
+				       (unsigned char) __imm,
+				       (__mmask16) __U);
 }
+#else
+#define _mm512_ternarylogic_epi64(A, B, C, I)			\
+  ((__m512i)							\
+   __builtin_ia32_pternlogq512_mask ((__v8di) (__m512i) (A),	\
+				     (__v8di) (__m512i) (B),	\
+				     (__v8di) (__m512i) (C),	\
+				     (unsigned char) (I),	\
+				     (__mmask8) -1))
+#define _mm512_mask_ternarylogic_epi64(A, U, B, C, I)		\
+  ((__m512i)							\
+   __builtin_ia32_pternlogq512_mask ((__v8di) (__m512i) (A),	\
+				     (__v8di) (__m512i) (B),	\
+				     (__v8di) (__m512i) (C),	\
+				     (unsigned char)(I),	\
+				     (__mmask8) (U)))
+#define _mm512_maskz_ternarylogic_epi64(U, A, B, C, I)		\
+  ((__m512i)							\
+   __builtin_ia32_pternlogq512_maskz ((__v8di) (__m512i) (A),	\
+				      (__v8di) (__m512i) (B),	\
+				      (__v8di) (__m512i) (C),	\
+				      (unsigned char) (I),	\
+				      (__mmask8) (U)))
+#define _mm512_ternarylogic_epi32(A, B, C, I)			\
+  ((__m512i)							\
+   __builtin_ia32_pternlogd512_mask ((__v16si) (__m512i) (A),	\
+				     (__v16si) (__m512i) (B),	\
+				     (__v16si) (__m512i) (C),	\
+				     (unsigned char) (I),	\
+				     (__mmask16) -1))
+#define _mm512_mask_ternarylogic_epi32(A, U, B, C, I)		\
+  ((__m512i)							\
+   __builtin_ia32_pternlogd512_mask ((__v16si) (__m512i) (A),	\
+				     (__v16si) (__m512i) (B),	\
+				     (__v16si) (__m512i) (C),	\
+				     (unsigned char) (I),	\
+				     (__mmask16) (U)))
+#define _mm512_maskz_ternarylogic_epi32(U, A, B, C, I)		\
+  ((__m512i)							\
+   __builtin_ia32_pternlogd512_maskz ((__v16si) (__m512i) (A),	\
+				      (__v16si) (__m512i) (B),	\
+				      (__v16si) (__m512i) (C),	\
+				      (unsigned char) (I),	\
+				      (__mmask16) (U)))
+#endif
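
The _MM_TERNLOG_* values are the truth-table columns of the three
operands (A = 0xF0, B = 0xCC, C = 0xAA), so any bitwise expression
over them evaluates to the immediate of the matching per-bit
operation.  A sketch of the classic bit-select, whose immediate works
out to 0xCA (illustrative only, same assumptions as before):

/* Per-bit select: result = (a & b) | (~a & c).  */
__m512i
bit_select (__m512i a, __m512i b, __m512i c)
{
  return _mm512_ternarylogic_epi64
    (a, b, c,
     (_MM_TERNLOG_A & _MM_TERNLOG_B) | (~_MM_TERNLOG_A & _MM_TERNLOG_C));
}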
 
-extern __inline __m128i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtusepi32_epi8 (__mmask16 __M, __m512i __A)
+_mm512_rcp14_pd (__m512d __A)
 {
-  return (__m128i) __builtin_ia32_pmovusdb512_mask ((__v16si) __A,
-						    (__v16qi)
-						    _mm_setzero_si128 (),
-						    __M);
+  return (__m512d) __builtin_ia32_rcp14pd512_mask ((__v8df) __A,
+						   (__v8df)
+						   _mm512_undefined_pd (),
+						   (__mmask8) -1);
 }
 
-extern __inline __m256i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi32_epi16 (__m512i __A)
+_mm512_mask_rcp14_pd (__m512d __W, __mmask8 __U, __m512d __A)
 {
-  return (__m256i) __builtin_ia32_pmovdw512_mask ((__v16si) __A,
-						  (__v16hi)
-						  _mm256_undefined_si256 (),
-						  (__mmask16) -1);
+  return (__m512d) __builtin_ia32_rcp14pd512_mask ((__v8df) __A,
+						   (__v8df) __W,
+						   (__mmask8) __U);
 }
 
-extern __inline void
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi32_storeu_epi16 (void * __P, __mmask16 __M, __m512i __A)
+_mm512_maskz_rcp14_pd (__mmask8 __U, __m512d __A)
 {
-  __builtin_ia32_pmovdw512mem_mask ((__v16hi *) __P, (__v16si) __A, __M);
+  return (__m512d) __builtin_ia32_rcp14pd512_mask ((__v8df) __A,
+						   (__v8df)
+						   _mm512_setzero_pd (),
+						   (__mmask8) __U);
 }
 
-extern __inline __m256i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi32_epi16 (__m256i __O, __mmask16 __M, __m512i __A)
+_mm512_rcp14_ps (__m512 __A)
 {
-  return (__m256i) __builtin_ia32_pmovdw512_mask ((__v16si) __A,
-						  (__v16hi) __O, __M);
+  return (__m512) __builtin_ia32_rcp14ps512_mask ((__v16sf) __A,
+						  (__v16sf)
+						  _mm512_undefined_ps (),
+						  (__mmask16) -1);
 }
 
-extern __inline __m256i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi32_epi16 (__mmask16 __M, __m512i __A)
+_mm512_mask_rcp14_ps (__m512 __W, __mmask16 __U, __m512 __A)
 {
-  return (__m256i) __builtin_ia32_pmovdw512_mask ((__v16si) __A,
-						  (__v16hi)
-						  _mm256_setzero_si256 (),
-						  __M);
+  return (__m512) __builtin_ia32_rcp14ps512_mask ((__v16sf) __A,
+						  (__v16sf) __W,
+						  (__mmask16) __U);
 }
 
-extern __inline __m256i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtsepi32_epi16 (__m512i __A)
+_mm512_maskz_rcp14_ps (__mmask16 __U, __m512 __A)
 {
-  return (__m256i) __builtin_ia32_pmovsdw512_mask ((__v16si) __A,
-						   (__v16hi)
-						   _mm256_undefined_si256 (),
-						   (__mmask16) -1);
+  return (__m512) __builtin_ia32_rcp14ps512_mask ((__v16sf) __A,
+						  (__v16sf)
+						  _mm512_setzero_ps (),
+						  (__mmask16) __U);
 }
 
-extern __inline void
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtsepi32_storeu_epi16 (void *__P, __mmask16 __M, __m512i __A)
+_mm512_rsqrt14_pd (__m512d __A)
 {
-  __builtin_ia32_pmovsdw512mem_mask ((__v16hi*) __P, (__v16si) __A, __M);
+  return (__m512d) __builtin_ia32_rsqrt14pd512_mask ((__v8df) __A,
+						     (__v8df)
+						     _mm512_undefined_pd (),
+						     (__mmask8) -1);
 }
 
-extern __inline __m256i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtsepi32_epi16 (__m256i __O, __mmask16 __M, __m512i __A)
+_mm512_mask_rsqrt14_pd (__m512d __W, __mmask8 __U, __m512d __A)
 {
-  return (__m256i) __builtin_ia32_pmovsdw512_mask ((__v16si) __A,
-						   (__v16hi) __O, __M);
+  return (__m512d) __builtin_ia32_rsqrt14pd512_mask ((__v8df) __A,
+						     (__v8df) __W,
+						     (__mmask8) __U);
 }
 
-extern __inline __m256i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtsepi32_epi16 (__mmask16 __M, __m512i __A)
+_mm512_maskz_rsqrt14_pd (__mmask8 __U, __m512d __A)
 {
-  return (__m256i) __builtin_ia32_pmovsdw512_mask ((__v16si) __A,
-						   (__v16hi)
-						   _mm256_setzero_si256 (),
-						   __M);
+  return (__m512d) __builtin_ia32_rsqrt14pd512_mask ((__v8df) __A,
+						     (__v8df)
+						     _mm512_setzero_pd (),
+						     (__mmask8) __U);
 }
 
-extern __inline __m256i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtusepi32_epi16 (__m512i __A)
+_mm512_rsqrt14_ps (__m512 __A)
 {
-  return (__m256i) __builtin_ia32_pmovusdw512_mask ((__v16si) __A,
-						    (__v16hi)
-						    _mm256_undefined_si256 (),
+  return (__m512) __builtin_ia32_rsqrt14ps512_mask ((__v16sf) __A,
+						    (__v16sf)
+						    _mm512_undefined_ps (),
 						    (__mmask16) -1);
 }
 
-extern __inline void
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtusepi32_storeu_epi16 (void *__P, __mmask16 __M, __m512i __A)
+_mm512_mask_rsqrt14_ps (__m512 __W, __mmask16 __U, __m512 __A)
 {
-  __builtin_ia32_pmovusdw512mem_mask ((__v16hi*) __P, (__v16si) __A, __M);
+  return (__m512) __builtin_ia32_rsqrt14ps512_mask ((__v16sf) __A,
+						    (__v16sf) __W,
+						    (__mmask16) __U);
 }
 
-extern __inline __m256i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtusepi32_epi16 (__m256i __O, __mmask16 __M, __m512i __A)
+_mm512_maskz_rsqrt14_ps (__mmask16 __U, __m512 __A)
 {
-  return (__m256i) __builtin_ia32_pmovusdw512_mask ((__v16si) __A,
-						    (__v16hi) __O,
-						    __M);
+  return (__m512) __builtin_ia32_rsqrt14ps512_mask ((__v16sf) __A,
+						    (__v16sf)
+						    _mm512_setzero_ps (),
+						    (__mmask16) __U);
 }
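
rcp14/rsqrt14 guarantee roughly 14 correct bits; callers needing full
single precision usually add one Newton-Raphson step,
x1 = x0 * (2 - a * x0), which about doubles the accuracy.  A sketch
for the reciprocal (illustrative only, same assumptions as before):

__m512
rcp_refined (__m512 a)
{
  __m512 x = _mm512_rcp14_ps (a);
  /* _mm512_fnmadd_ps (a, x, two) computes 2 - a * x.  */
  return _mm512_mul_ps (x, _mm512_fnmadd_ps (a, x, _mm512_set1_ps (2.0f)));
}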
 
-extern __inline __m256i
+#ifdef __OPTIMIZE__
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtusepi32_epi16 (__mmask16 __M, __m512i __A)
+_mm512_sqrt_round_pd (__m512d __A, const int __R)
 {
-  return (__m256i) __builtin_ia32_pmovusdw512_mask ((__v16si) __A,
-						    (__v16hi)
-						    _mm256_setzero_si256 (),
-						    __M);
+  return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A,
+						  (__v8df)
+						  _mm512_undefined_pd (),
+						  (__mmask8) -1, __R);
 }
 
-extern __inline __m256i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi64_epi32 (__m512i __A)
+_mm512_mask_sqrt_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
+			   const int __R)
 {
-  return (__m256i) __builtin_ia32_pmovqd512_mask ((__v8di) __A,
-						  (__v8si)
-						  _mm256_undefined_si256 (),
-						  (__mmask8) -1);
+  return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A,
+						  (__v8df) __W,
+						  (__mmask8) __U, __R);
 }
 
-extern __inline void
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi64_storeu_epi32 (void* __P, __mmask8 __M, __m512i __A)
+_mm512_maskz_sqrt_round_pd (__mmask8 __U, __m512d __A, const int __R)
 {
-  __builtin_ia32_pmovqd512mem_mask ((__v8si *) __P, (__v8di) __A, __M);
+  return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A,
+						  (__v8df)
+						  _mm512_setzero_pd (),
+						  (__mmask8) __U, __R);
 }
 
-extern __inline __m256i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi64_epi32 (__m256i __O, __mmask8 __M, __m512i __A)
+_mm512_sqrt_round_ps (__m512 __A, const int __R)
 {
-  return (__m256i) __builtin_ia32_pmovqd512_mask ((__v8di) __A,
-						  (__v8si) __O, __M);
+  return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A,
+						 (__v16sf)
+						 _mm512_undefined_ps (),
+						 (__mmask16) -1, __R);
 }
 
-extern __inline __m256i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi64_epi32 (__mmask8 __M, __m512i __A)
+_mm512_mask_sqrt_round_ps (__m512 __W, __mmask16 __U, __m512 __A, const int __R)
 {
-  return (__m256i) __builtin_ia32_pmovqd512_mask ((__v8di) __A,
-						  (__v8si)
-						  _mm256_setzero_si256 (),
-						  __M);
+  return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A,
+						 (__v16sf) __W,
+						 (__mmask16) __U, __R);
 }
 
-extern __inline __m256i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtsepi64_epi32 (__m512i __A)
+_mm512_maskz_sqrt_round_ps (__mmask16 __U, __m512 __A, const int __R)
 {
-  return (__m256i) __builtin_ia32_pmovsqd512_mask ((__v8di) __A,
-						   (__v8si)
-						   _mm256_undefined_si256 (),
-						   (__mmask8) -1);
+  return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A,
+						 (__v16sf)
+						 _mm512_setzero_ps (),
+						 (__mmask16) __U, __R);
 }
 
-extern __inline void
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtsepi64_storeu_epi32 (void *__P, __mmask8 __M, __m512i __A)
-{
-  __builtin_ia32_pmovsqd512mem_mask ((__v8si *) __P, (__v8di) __A, __M);
-}
+#else
+#define _mm512_sqrt_round_pd(A, C)            \
+    (__m512d)__builtin_ia32_sqrtpd512_mask(A, (__v8df)_mm512_undefined_pd(), -1, C)
 
-extern __inline __m256i
+#define _mm512_mask_sqrt_round_pd(W, U, A, C) \
+    (__m512d)__builtin_ia32_sqrtpd512_mask(A, W, U, C)
+
+#define _mm512_maskz_sqrt_round_pd(U, A, C)   \
+    (__m512d)__builtin_ia32_sqrtpd512_mask(A, (__v8df)_mm512_setzero_pd(), U, C)
+
+#define _mm512_sqrt_round_ps(A, C)            \
+    (__m512)__builtin_ia32_sqrtps512_mask(A, (__v16sf)_mm512_undefined_ps(), -1, C)
+
+#define _mm512_mask_sqrt_round_ps(W, U, A, C) \
+    (__m512)__builtin_ia32_sqrtps512_mask(A, W, U, C)
+
+#define _mm512_maskz_sqrt_round_ps(U, A, C)   \
+    (__m512)__builtin_ia32_sqrtps512_mask(A, (__v16sf)_mm512_setzero_ps(), U, C)
+
+#endif
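
The __R operand of these *_round_* forms takes the _MM_FROUND_*
controls: either _MM_FROUND_CUR_DIRECTION alone, or an explicit
rounding mode combined with _MM_FROUND_NO_EXC.  For example:

__m512d
sqrt_toward_zero (__m512d a)
{
  return _mm512_sqrt_round_pd (a, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
}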
+
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtsepi64_epi32 (__m256i __O, __mmask8 __M, __m512i __A)
+_mm512_cvtepi8_epi32 (__m128i __A)
 {
-  return (__m256i) __builtin_ia32_pmovsqd512_mask ((__v8di) __A,
-						   (__v8si) __O, __M);
+  return (__m512i) __builtin_ia32_pmovsxbd512_mask ((__v16qi) __A,
+						    (__v16si)
+						    _mm512_undefined_epi32 (),
+						    (__mmask16) -1);
 }
 
-extern __inline __m256i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtsepi64_epi32 (__mmask8 __M, __m512i __A)
+_mm512_mask_cvtepi8_epi32 (__m512i __W, __mmask16 __U, __m128i __A)
 {
-  return (__m256i) __builtin_ia32_pmovsqd512_mask ((__v8di) __A,
-						   (__v8si)
-						   _mm256_setzero_si256 (),
-						   __M);
+  return (__m512i) __builtin_ia32_pmovsxbd512_mask ((__v16qi) __A,
+						    (__v16si) __W,
+						    (__mmask16) __U);
 }
 
-extern __inline __m256i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtusepi64_epi32 (__m512i __A)
+_mm512_maskz_cvtepi8_epi32 (__mmask16 __U, __m128i __A)
 {
-  return (__m256i) __builtin_ia32_pmovusqd512_mask ((__v8di) __A,
-						    (__v8si)
-						    _mm256_undefined_si256 (),
-						    (__mmask8) -1);
+  return (__m512i) __builtin_ia32_pmovsxbd512_mask ((__v16qi) __A,
+						    (__v16si)
+						    _mm512_setzero_si512 (),
+						    (__mmask16) __U);
 }
 
-extern __inline void
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtusepi64_storeu_epi32 (void* __P, __mmask8 __M, __m512i __A)
+_mm512_cvtepi8_epi64 (__m128i __A)
 {
-  __builtin_ia32_pmovusqd512mem_mask ((__v8si*) __P, (__v8di) __A, __M);
+  return (__m512i) __builtin_ia32_pmovsxbq512_mask ((__v16qi) __A,
+						    (__v8di)
+						    _mm512_undefined_epi32 (),
+						    (__mmask8) -1);
 }
 
-extern __inline __m256i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtusepi64_epi32 (__m256i __O, __mmask8 __M, __m512i __A)
+_mm512_mask_cvtepi8_epi64 (__m512i __W, __mmask8 __U, __m128i __A)
 {
-  return (__m256i) __builtin_ia32_pmovusqd512_mask ((__v8di) __A,
-						    (__v8si) __O, __M);
+  return (__m512i) __builtin_ia32_pmovsxbq512_mask ((__v16qi) __A,
+						    (__v8di) __W,
+						    (__mmask8) __U);
 }
 
-extern __inline __m256i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtusepi64_epi32 (__mmask8 __M, __m512i __A)
-{
-  return (__m256i) __builtin_ia32_pmovusqd512_mask ((__v8di) __A,
-						    (__v8si)
-						    _mm256_setzero_si256 (),
-						    __M);
+_mm512_maskz_cvtepi8_epi64 (__mmask8 __U, __m128i __A)
+{
+  return (__m512i) __builtin_ia32_pmovsxbq512_mask ((__v16qi) __A,
+						    (__v8di)
+						    _mm512_setzero_si512 (),
+						    (__mmask8) __U);
 }
 
-extern __inline __m128i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi64_epi16 (__m512i __A)
+_mm512_cvtepi16_epi32 (__m256i __A)
 {
-  return (__m128i) __builtin_ia32_pmovqw512_mask ((__v8di) __A,
-						  (__v8hi)
-						  _mm_undefined_si128 (),
-						  (__mmask8) -1);
+  return (__m512i) __builtin_ia32_pmovsxwd512_mask ((__v16hi) __A,
+						    (__v16si)
+						    _mm512_undefined_epi32 (),
+						    (__mmask16) -1);
 }
 
-extern __inline void
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi64_storeu_epi16 (void *__P, __mmask8 __M, __m512i __A)
+_mm512_mask_cvtepi16_epi32 (__m512i __W, __mmask16 __U, __m256i __A)
 {
-  __builtin_ia32_pmovqw512mem_mask ((__v8hi *) __P, (__v8di) __A, __M);
+  return (__m512i) __builtin_ia32_pmovsxwd512_mask ((__v16hi) __A,
+						    (__v16si) __W,
+						    (__mmask16) __U);
 }
 
-extern __inline __m128i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi64_epi16 (__m128i __O, __mmask8 __M, __m512i __A)
+_mm512_maskz_cvtepi16_epi32 (__mmask16 __U, __m256i __A)
 {
-  return (__m128i) __builtin_ia32_pmovqw512_mask ((__v8di) __A,
-						  (__v8hi) __O, __M);
+  return (__m512i) __builtin_ia32_pmovsxwd512_mask ((__v16hi) __A,
+						    (__v16si)
+						    _mm512_setzero_si512 (),
+						    (__mmask16) __U);
 }
 
-extern __inline __m128i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi64_epi16 (__mmask8 __M, __m512i __A)
+_mm512_cvtepi16_epi64 (__m128i __A)
 {
-  return (__m128i) __builtin_ia32_pmovqw512_mask ((__v8di) __A,
-						  (__v8hi)
-						  _mm_setzero_si128 (),
-						  __M);
+  return (__m512i) __builtin_ia32_pmovsxwq512_mask ((__v8hi) __A,
+						    (__v8di)
+						    _mm512_undefined_epi32 (),
+						    (__mmask8) -1);
 }
 
-extern __inline __m128i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtsepi64_epi16 (__m512i __A)
+_mm512_mask_cvtepi16_epi64 (__m512i __W, __mmask8 __U, __m128i __A)
 {
-  return (__m128i) __builtin_ia32_pmovsqw512_mask ((__v8di) __A,
-						   (__v8hi)
-						   _mm_undefined_si128 (),
-						   (__mmask8) -1);
+  return (__m512i) __builtin_ia32_pmovsxwq512_mask ((__v8hi) __A,
+						    (__v8di) __W,
+						    (__mmask8) __U);
 }
 
-extern __inline void
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtsepi64_storeu_epi16 (void * __P, __mmask8 __M, __m512i __A)
+_mm512_maskz_cvtepi16_epi64 (__mmask8 __U, __m128i __A)
 {
-  __builtin_ia32_pmovsqw512mem_mask ((__v8hi *) __P, (__v8di) __A, __M);
+  return (__m512i) __builtin_ia32_pmovsxwq512_mask ((__v8hi) __A,
+						    (__v8di)
+						    _mm512_setzero_si512 (),
+						    (__mmask8) __U);
 }
 
-extern __inline __m128i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtsepi64_epi16 (__m128i __O, __mmask8 __M, __m512i __A)
+_mm512_cvtepi32_epi64 (__m256i __X)
 {
-  return (__m128i) __builtin_ia32_pmovsqw512_mask ((__v8di) __A,
-						   (__v8hi) __O, __M);
+  return (__m512i) __builtin_ia32_pmovsxdq512_mask ((__v8si) __X,
+						    (__v8di)
+						    _mm512_undefined_epi32 (),
+						    (__mmask8) -1);
 }
 
-extern __inline __m128i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtsepi64_epi16 (__mmask8 __M, __m512i __A)
+_mm512_mask_cvtepi32_epi64 (__m512i __W, __mmask8 __U, __m256i __X)
 {
-  return (__m128i) __builtin_ia32_pmovsqw512_mask ((__v8di) __A,
-						   (__v8hi)
-						   _mm_setzero_si128 (),
-						   __M);
+  return (__m512i) __builtin_ia32_pmovsxdq512_mask ((__v8si) __X,
+						    (__v8di) __W,
+						    (__mmask8) __U);
 }
 
-extern __inline __m128i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtusepi64_epi16 (__m512i __A)
+_mm512_maskz_cvtepi32_epi64 (__mmask8 __U, __m256i __X)
 {
-  return (__m128i) __builtin_ia32_pmovusqw512_mask ((__v8di) __A,
-						    (__v8hi)
-						    _mm_undefined_si128 (),
-						    (__mmask8) -1);
+  return (__m512i) __builtin_ia32_pmovsxdq512_mask ((__v8si) __X,
+						    (__v8di)
+						    _mm512_setzero_si512 (),
+						    (__mmask8) __U);
 }
 
-extern __inline void
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtusepi64_storeu_epi16 (void *__P, __mmask8 __M, __m512i __A)
+_mm512_cvtepu8_epi32 (__m128i __A)
 {
-  __builtin_ia32_pmovusqw512mem_mask ((__v8hi*) __P, (__v8di) __A, __M);
+  return (__m512i) __builtin_ia32_pmovzxbd512_mask ((__v16qi) __A,
+						    (__v16si)
+						    _mm512_undefined_epi32 (),
+						    (__mmask16) -1);
 }
 
-extern __inline __m128i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtusepi64_epi16 (__m128i __O, __mmask8 __M, __m512i __A)
+_mm512_mask_cvtepu8_epi32 (__m512i __W, __mmask16 __U, __m128i __A)
 {
-  return (__m128i) __builtin_ia32_pmovusqw512_mask ((__v8di) __A,
-						    (__v8hi) __O, __M);
+  return (__m512i) __builtin_ia32_pmovzxbd512_mask ((__v16qi) __A,
+						    (__v16si) __W,
+						    (__mmask16) __U);
 }
 
-extern __inline __m128i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtusepi64_epi16 (__mmask8 __M, __m512i __A)
+_mm512_maskz_cvtepu8_epi32 (__mmask16 __U, __m128i __A)
 {
-  return (__m128i) __builtin_ia32_pmovusqw512_mask ((__v8di) __A,
-						    (__v8hi)
-						    _mm_setzero_si128 (),
-						    __M);
+  return (__m512i) __builtin_ia32_pmovzxbd512_mask ((__v16qi) __A,
+						    (__v16si)
+						    _mm512_setzero_si512 (),
+						    (__mmask16) __U);
 }
 
-extern __inline __m128i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi64_epi8 (__m512i __A)
+_mm512_cvtepu8_epi64 (__m128i __A)
 {
-  return (__m128i) __builtin_ia32_pmovqb512_mask ((__v8di) __A,
-						  (__v16qi)
-						  _mm_undefined_si128 (),
-						  (__mmask8) -1);
+  return (__m512i) __builtin_ia32_pmovzxbq512_mask ((__v16qi) __A,
+						    (__v8di)
+						    _mm512_undefined_epi32 (),
+						    (__mmask8) -1);
 }
 
-extern __inline void
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi64_storeu_epi8 (void * __P, __mmask8 __M, __m512i __A)
+_mm512_mask_cvtepu8_epi64 (__m512i __W, __mmask8 __U, __m128i __A)
 {
-  __builtin_ia32_pmovqb512mem_mask ((unsigned long long *) __P,
-				    (__v8di) __A, __M);
+  return (__m512i) __builtin_ia32_pmovzxbq512_mask ((__v16qi) __A,
+						    (__v8di) __W,
+						    (__mmask8) __U);
 }
 
-extern __inline __m128i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi64_epi8 (__m128i __O, __mmask8 __M, __m512i __A)
+_mm512_maskz_cvtepu8_epi64 (__mmask8 __U, __m128i __A)
 {
-  return (__m128i) __builtin_ia32_pmovqb512_mask ((__v8di) __A,
-						  (__v16qi) __O, __M);
+  return (__m512i) __builtin_ia32_pmovzxbq512_mask ((__v16qi) __A,
+						    (__v8di)
+						    _mm512_setzero_si512 (),
+						    (__mmask8) __U);
 }
 
-extern __inline __m128i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi64_epi8 (__mmask8 __M, __m512i __A)
+_mm512_cvtepu16_epi32 (__m256i __A)
 {
-  return (__m128i) __builtin_ia32_pmovqb512_mask ((__v8di) __A,
-						  (__v16qi)
-						  _mm_setzero_si128 (),
-						  __M);
+  return (__m512i) __builtin_ia32_pmovzxwd512_mask ((__v16hi) __A,
+						    (__v16si)
+						    _mm512_undefined_epi32 (),
+						    (__mmask16) -1);
 }
 
-extern __inline __m128i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtsepi64_epi8 (__m512i __A)
+_mm512_mask_cvtepu16_epi32 (__m512i __W, __mmask16 __U, __m256i __A)
 {
-  return (__m128i) __builtin_ia32_pmovsqb512_mask ((__v8di) __A,
-						   (__v16qi)
-						   _mm_undefined_si128 (),
-						   (__mmask8) -1);
+  return (__m512i) __builtin_ia32_pmovzxwd512_mask ((__v16hi) __A,
+						    (__v16si) __W,
+						    (__mmask16) __U);
 }
 
-extern __inline void
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtsepi64_storeu_epi8 (void * __P, __mmask8 __M, __m512i __A)
+_mm512_maskz_cvtepu16_epi32 (__mmask16 __U, __m256i __A)
 {
-  __builtin_ia32_pmovsqb512mem_mask ((unsigned long long *) __P, (__v8di) __A, __M);
+  return (__m512i) __builtin_ia32_pmovzxwd512_mask ((__v16hi) __A,
+						    (__v16si)
+						    _mm512_setzero_si512 (),
+						    (__mmask16) __U);
 }
 
-extern __inline __m128i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtsepi64_epi8 (__m128i __O, __mmask8 __M, __m512i __A)
+_mm512_cvtepu16_epi64 (__m128i __A)
 {
-  return (__m128i) __builtin_ia32_pmovsqb512_mask ((__v8di) __A,
-						   (__v16qi) __O, __M);
+  return (__m512i) __builtin_ia32_pmovzxwq512_mask ((__v8hi) __A,
+						    (__v8di)
+						    _mm512_undefined_epi32 (),
+						    (__mmask8) -1);
 }
 
-extern __inline __m128i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtsepi64_epi8 (__mmask8 __M, __m512i __A)
+_mm512_mask_cvtepu16_epi64 (__m512i __W, __mmask8 __U, __m128i __A)
 {
-  return (__m128i) __builtin_ia32_pmovsqb512_mask ((__v8di) __A,
-						   (__v16qi)
-						   _mm_setzero_si128 (),
-						   __M);
+  return (__m512i) __builtin_ia32_pmovzxwq512_mask ((__v8hi) __A,
+						    (__v8di) __W,
+						    (__mmask8) __U);
 }
 
-extern __inline __m128i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtusepi64_epi8 (__m512i __A)
+_mm512_maskz_cvtepu16_epi64 (__mmask8 __U, __m128i __A)
 {
-  return (__m128i) __builtin_ia32_pmovusqb512_mask ((__v8di) __A,
-						    (__v16qi)
-						    _mm_undefined_si128 (),
-						    (__mmask8) -1);
+  return (__m512i) __builtin_ia32_pmovzxwq512_mask ((__v8hi) __A,
+						    (__v8di)
+						    _mm512_setzero_si512 (),
+						    (__mmask8) __U);
 }
 
-extern __inline void
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtusepi64_storeu_epi8 (void * __P, __mmask8 __M, __m512i __A)
+_mm512_cvtepu32_epi64 (__m256i __X)
 {
-  __builtin_ia32_pmovusqb512mem_mask ((unsigned long long *) __P, (__v8di) __A, __M);
+  return (__m512i) __builtin_ia32_pmovzxdq512_mask ((__v8si) __X,
+						    (__v8di)
+						    _mm512_undefined_epi32 (),
+						    (__mmask8) -1);
 }
 
-extern __inline __m128i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtusepi64_epi8 (__m128i __O, __mmask8 __M, __m512i __A)
+_mm512_mask_cvtepu32_epi64 (__m512i __W, __mmask8 __U, __m256i __X)
 {
-  return (__m128i) __builtin_ia32_pmovusqb512_mask ((__v8di) __A,
-						    (__v16qi) __O,
-						    __M);
+  return (__m512i) __builtin_ia32_pmovzxdq512_mask ((__v8si) __X,
+						    (__v8di) __W,
+						    (__mmask8) __U);
 }
 
-extern __inline __m128i
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtusepi64_epi8 (__mmask8 __M, __m512i __A)
+_mm512_maskz_cvtepu32_epi64 (__mmask8 __U, __m256i __X)
 {
-  return (__m128i) __builtin_ia32_pmovusqb512_mask ((__v8di) __A,
-						    (__v16qi)
-						    _mm_setzero_si128 (),
-						    __M);
+  return (__m512i) __builtin_ia32_pmovzxdq512_mask ((__v8si) __X,
+						    (__v8di)
+						    _mm512_setzero_si512 (),
+						    (__mmask8) __U);
 }
 
+#ifdef __OPTIMIZE__
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi32_pd (__m256i __A)
+_mm512_add_round_pd (__m512d __A, __m512d __B, const int __R)
 {
-  return (__m512d) __builtin_ia32_cvtdq2pd512_mask ((__v8si) __A,
-						    (__v8df)
-						    _mm512_undefined_pd (),
-						    (__mmask8) -1);
+  return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df)
+						 _mm512_undefined_pd (),
+						 (__mmask8) -1, __R);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi32_pd (__m512d __W, __mmask8 __U, __m256i __A)
+_mm512_mask_add_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
+			  __m512d __B, const int __R)
 {
-  return (__m512d) __builtin_ia32_cvtdq2pd512_mask ((__v8si) __A,
-						    (__v8df) __W,
-						    (__mmask8) __U);
+  return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df) __W,
+						 (__mmask8) __U, __R);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi32_pd (__mmask8 __U, __m256i __A)
+_mm512_maskz_add_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+			   const int __R)
 {
-  return (__m512d) __builtin_ia32_cvtdq2pd512_mask ((__v8si) __A,
-						    (__v8df)
-						    _mm512_setzero_pd (),
-						    (__mmask8) __U);
+  return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df)
+						 _mm512_setzero_pd (),
+						 (__mmask8) __U, __R);
 }
 
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepu32_pd (__m256i __A)
+_mm512_add_round_ps (__m512 __A, __m512 __B, const int __R)
 {
-  return (__m512d) __builtin_ia32_cvtudq2pd512_mask ((__v8si) __A,
-						     (__v8df)
-						     _mm512_undefined_pd (),
-						     (__mmask8) -1);
+  return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf)
+						_mm512_undefined_ps (),
+						(__mmask16) -1, __R);
 }
 
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepu32_pd (__m512d __W, __mmask8 __U, __m256i __A)
+_mm512_mask_add_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
+			  __m512 __B, const int __R)
 {
-  return (__m512d) __builtin_ia32_cvtudq2pd512_mask ((__v8si) __A,
-						     (__v8df) __W,
-						     (__mmask8) __U);
+  return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf) __W,
+						(__mmask16) __U, __R);
 }
 
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepu32_pd (__mmask8 __U, __m256i __A)
+_mm512_maskz_add_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
 {
-  return (__m512d) __builtin_ia32_cvtudq2pd512_mask ((__v8si) __A,
-						     (__v8df)
-						     _mm512_setzero_pd (),
-						     (__mmask8) __U);
+  return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf)
+						_mm512_setzero_ps (),
+						(__mmask16) __U, __R);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundepi32_ps (__m512i __A, const int __R)
+_mm512_sub_round_pd (__m512d __A, __m512d __B, const int __R)
 {
-  return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
-						   (__v16sf)
-						   _mm512_undefined_ps (),
-						   (__mmask16) -1, __R);
+  return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df)
+						 _mm512_undefined_pd (),
+						 (__mmask8) -1, __R);
 }
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundepi32_ps (__m512 __W, __mmask16 __U, __m512i __A,
-			       const int __R)
+_mm512_mask_sub_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
+			  __m512d __B, const int __R)
 {
-  return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
-						   (__v16sf) __W,
-						   (__mmask16) __U, __R);
+  return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df) __W,
+						 (__mmask8) __U, __R);
 }
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundepi32_ps (__mmask16 __U, __m512i __A, const int __R)
+_mm512_maskz_sub_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+			   const int __R)
 {
-  return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
-						   (__v16sf)
-						   _mm512_setzero_ps (),
-						   (__mmask16) __U, __R);
+  return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df)
+						 _mm512_setzero_pd (),
+						 (__mmask8) __U, __R);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundepu32_ps (__m512i __A, const int __R)
+_mm512_sub_round_ps (__m512 __A, __m512 __B, const int __R)
 {
-  return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A,
-						    (__v16sf)
-						    _mm512_undefined_ps (),
-						    (__mmask16) -1, __R);
+  return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf)
+						_mm512_undefined_ps (),
+						(__mmask16) -1, __R);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundepu32_ps (__m512 __W, __mmask16 __U, __m512i __A,
-			       const int __R)
+_mm512_mask_sub_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
+			  __m512 __B, const int __R)
 {
-  return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A,
-						    (__v16sf) __W,
-						    (__mmask16) __U, __R);
+  return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf) __W,
+						(__mmask16) __U, __R);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundepu32_ps (__mmask16 __U, __m512i __A, const int __R)
+_mm512_maskz_sub_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
 {
-  return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A,
-						    (__v16sf)
-						    _mm512_setzero_ps (),
-						    (__mmask16) __U, __R);
+  return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf)
+						_mm512_setzero_ps (),
+						(__mmask16) __U, __R);
 }
-
 #else
-#define _mm512_cvt_roundepi32_ps(A, B)        \
-    (__m512)__builtin_ia32_cvtdq2ps512_mask((__v16si)(A), (__v16sf)_mm512_undefined_ps(), -1, B)
+#define _mm512_add_round_pd(A, B, C)            \
+    (__m512d)__builtin_ia32_addpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, C)
 
-#define _mm512_mask_cvt_roundepi32_ps(W, U, A, B)   \
-    (__m512)__builtin_ia32_cvtdq2ps512_mask((__v16si)(A), W, U, B)
+#define _mm512_mask_add_round_pd(W, U, A, B, C) \
+    (__m512d)__builtin_ia32_addpd512_mask(A, B, W, U, C)
 
-#define _mm512_maskz_cvt_roundepi32_ps(U, A, B)      \
-    (__m512)__builtin_ia32_cvtdq2ps512_mask((__v16si)(A), (__v16sf)_mm512_setzero_ps(), U, B)
+#define _mm512_maskz_add_round_pd(U, A, B, C)   \
+    (__m512d)__builtin_ia32_addpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, C)
 
-#define _mm512_cvt_roundepu32_ps(A, B)        \
-    (__m512)__builtin_ia32_cvtudq2ps512_mask((__v16si)(A), (__v16sf)_mm512_undefined_ps(), -1, B)
+#define _mm512_add_round_ps(A, B, C)            \
+    (__m512)__builtin_ia32_addps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, C)
 
-#define _mm512_mask_cvt_roundepu32_ps(W, U, A, B)   \
-    (__m512)__builtin_ia32_cvtudq2ps512_mask((__v16si)(A), W, U, B)
+#define _mm512_mask_add_round_ps(W, U, A, B, C) \
+    (__m512)__builtin_ia32_addps512_mask(A, B, W, U, C)
 
-#define _mm512_maskz_cvt_roundepu32_ps(U, A, B)      \
-    (__m512)__builtin_ia32_cvtudq2ps512_mask((__v16si)(A), (__v16sf)_mm512_setzero_ps(), U, B)
+#define _mm512_maskz_add_round_ps(U, A, B, C)   \
+    (__m512)__builtin_ia32_addps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, C)
+
+#define _mm512_sub_round_pd(A, B, C)            \
+    (__m512d)__builtin_ia32_subpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, C)
+
+#define _mm512_mask_sub_round_pd(W, U, A, B, C) \
+    (__m512d)__builtin_ia32_subpd512_mask(A, B, W, U, C)
+
+#define _mm512_maskz_sub_round_pd(U, A, B, C)   \
+    (__m512d)__builtin_ia32_subpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, C)
+
+#define _mm512_sub_round_ps(A, B, C)            \
+    (__m512)__builtin_ia32_subps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, C)
+
+#define _mm512_mask_sub_round_ps(W, U, A, B, C) \
+    (__m512)__builtin_ia32_subps512_mask(A, B, W, U, C)
+
+#define _mm512_maskz_sub_round_ps(U, A, B, C)   \
+    (__m512)__builtin_ia32_subps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, C)
 #endif
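
The rounding argument must be a compile-time constant, which is why the
round intrins are inline functions under __OPTIMIZE__ and plain macros
otherwise.  A masked-add sketch (same assumptions):

  /* Lanes with a zero mask bit keep the value from w.  */
  __m512d
  add_rz (__m512d w, __mmask8 m, __m512d a, __m512d b)
  {
    return _mm512_mask_add_round_pd (w, m, a, b,
                                     _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
  }
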
 
 #ifdef __OPTIMIZE__
-extern __inline __m256d
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_extractf64x4_pd (__m512d __A, const int __imm)
+_mm512_mul_round_pd (__m512d __A, __m512d __B, const int __R)
 {
-  return (__m256d) __builtin_ia32_extractf64x4_mask ((__v8df) __A,
-						     __imm,
-						     (__v4df)
-						     _mm256_undefined_pd (),
-						     (__mmask8) -1);
+  return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df)
+						 _mm512_undefined_pd (),
+						 (__mmask8) -1, __R);
 }
 
-extern __inline __m256d
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_extractf64x4_pd (__m256d __W, __mmask8 __U, __m512d __A,
-			     const int __imm)
+_mm512_mask_mul_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
+			  __m512d __B, const int __R)
 {
-  return (__m256d) __builtin_ia32_extractf64x4_mask ((__v8df) __A,
-						     __imm,
-						     (__v4df) __W,
-						     (__mmask8) __U);
+  return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df) __W,
+						 (__mmask8) __U, __R);
 }
 
-extern __inline __m256d
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_extractf64x4_pd (__mmask8 __U, __m512d __A, const int __imm)
+_mm512_maskz_mul_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+			   const int __R)
 {
-  return (__m256d) __builtin_ia32_extractf64x4_mask ((__v8df) __A,
-						     __imm,
-						     (__v4df)
-						     _mm256_setzero_pd (),
-						     (__mmask8) __U);
+  return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df)
+						 _mm512_setzero_pd (),
+						 (__mmask8) __U, __R);
 }
 
-extern __inline __m128
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_extractf32x4_ps (__m512 __A, const int __imm)
+_mm512_mul_round_ps (__m512 __A, __m512 __B, const int __R)
 {
-  return (__m128) __builtin_ia32_extractf32x4_mask ((__v16sf) __A,
-						    __imm,
-						    (__v4sf)
-						    _mm_undefined_ps (),
-						    (__mmask8) -1);
+  return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf)
+						_mm512_undefined_ps (),
+						(__mmask16) -1, __R);
 }
 
-extern __inline __m128
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_extractf32x4_ps (__m128 __W, __mmask8 __U, __m512 __A,
-			     const int __imm)
-{
-  return (__m128) __builtin_ia32_extractf32x4_mask ((__v16sf) __A,
-						    __imm,
-						    (__v4sf) __W,
-						    (__mmask8) __U);
+_mm512_mask_mul_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
+			  __m512 __B, const int __R)
+{
+  return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf) __W,
+						(__mmask16) __U, __R);
 }
 
-extern __inline __m128
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_extractf32x4_ps (__mmask8 __U, __m512 __A, const int __imm)
+_mm512_maskz_mul_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
 {
-  return (__m128) __builtin_ia32_extractf32x4_mask ((__v16sf) __A,
-						    __imm,
-						    (__v4sf)
-						    _mm_setzero_ps (),
-						    (__mmask8) __U);
+  return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf)
+						_mm512_setzero_ps (),
+						(__mmask16) __U, __R);
 }
 
-extern __inline __m256i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_extracti64x4_epi64 (__m512i __A, const int __imm)
+_mm512_div_round_pd (__m512d __M, __m512d __V, const int __R)
 {
-  return (__m256i) __builtin_ia32_extracti64x4_mask ((__v8di) __A,
-						     __imm,
-						     (__v4di)
-						     _mm256_undefined_si256 (),
-						     (__mmask8) -1);
+  return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M,
+						 (__v8df) __V,
+						 (__v8df)
+						 _mm512_undefined_pd (),
+						 (__mmask8) -1, __R);
 }
 
-extern __inline __m256i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_extracti64x4_epi64 (__m256i __W, __mmask8 __U, __m512i __A,
-				const int __imm)
+_mm512_mask_div_round_pd (__m512d __W, __mmask8 __U, __m512d __M,
+			  __m512d __V, const int __R)
 {
-  return (__m256i) __builtin_ia32_extracti64x4_mask ((__v8di) __A,
-						     __imm,
-						     (__v4di) __W,
-						     (__mmask8) __U);
+  return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M,
+						 (__v8df) __V,
+						 (__v8df) __W,
+						 (__mmask8) __U, __R);
 }
 
-extern __inline __m256i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_extracti64x4_epi64 (__mmask8 __U, __m512i __A, const int __imm)
+_mm512_maskz_div_round_pd (__mmask8 __U, __m512d __M, __m512d __V,
+			   const int __R)
 {
-  return (__m256i) __builtin_ia32_extracti64x4_mask ((__v8di) __A,
-						     __imm,
-						     (__v4di)
-						     _mm256_setzero_si256 (),
-						     (__mmask8) __U);
+  return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M,
+						 (__v8df) __V,
+						 (__v8df)
+						 _mm512_setzero_pd (),
+						 (__mmask8) __U, __R);
 }
 
-extern __inline __m128i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_extracti32x4_epi32 (__m512i __A, const int __imm)
+_mm512_div_round_ps (__m512 __A, __m512 __B, const int __R)
 {
-  return (__m128i) __builtin_ia32_extracti32x4_mask ((__v16si) __A,
-						     __imm,
-						     (__v4si)
-						     _mm_undefined_si128 (),
-						     (__mmask8) -1);
+  return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf)
+						_mm512_undefined_ps (),
+						(__mmask16) -1, __R);
 }
 
-extern __inline __m128i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_extracti32x4_epi32 (__m128i __W, __mmask8 __U, __m512i __A,
-				const int __imm)
+_mm512_mask_div_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
+			  __m512 __B, const int __R)
 {
-  return (__m128i) __builtin_ia32_extracti32x4_mask ((__v16si) __A,
-						     __imm,
-						     (__v4si) __W,
-						     (__mmask8) __U);
+  return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf) __W,
+						(__mmask16) __U, __R);
 }
 
-extern __inline __m128i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_extracti32x4_epi32 (__mmask8 __U, __m512i __A, const int __imm)
+_mm512_maskz_div_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
 {
-  return (__m128i) __builtin_ia32_extracti32x4_mask ((__v16si) __A,
-						     __imm,
-						     (__v4si)
-						     _mm_setzero_si128 (),
-						     (__mmask8) __U);
+  return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf)
+						_mm512_setzero_ps (),
+						(__mmask16) __U, __R);
 }
+
 #else
+#define _mm512_mul_round_pd(A, B, C)            \
+    (__m512d)__builtin_ia32_mulpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, C)
 
-#define _mm512_extractf64x4_pd(X, C)                                    \
-  ((__m256d) __builtin_ia32_extractf64x4_mask ((__v8df)(__m512d) (X),   \
-    (int) (C),\
-    (__v4df)(__m256d)_mm256_undefined_pd(),\
-    (__mmask8)-1))
+#define _mm512_mask_mul_round_pd(W, U, A, B, C) \
+    (__m512d)__builtin_ia32_mulpd512_mask(A, B, W, U, C)
 
-#define _mm512_mask_extractf64x4_pd(W, U, X, C)                         \
-  ((__m256d) __builtin_ia32_extractf64x4_mask ((__v8df)(__m512d) (X),   \
-    (int) (C),\
-    (__v4df)(__m256d)(W),\
-    (__mmask8)(U)))
+#define _mm512_maskz_mul_round_pd(U, A, B, C)   \
+    (__m512d)__builtin_ia32_mulpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, C)
 
-#define _mm512_maskz_extractf64x4_pd(U, X, C)                           \
-  ((__m256d) __builtin_ia32_extractf64x4_mask ((__v8df)(__m512d) (X),   \
-    (int) (C),\
-    (__v4df)(__m256d)_mm256_setzero_pd(),\
-    (__mmask8)(U)))
+#define _mm512_mul_round_ps(A, B, C)            \
+    (__m512)__builtin_ia32_mulps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, C)
 
-#define _mm512_extractf32x4_ps(X, C)                                    \
-  ((__m128) __builtin_ia32_extractf32x4_mask ((__v16sf)(__m512) (X),    \
-    (int) (C),\
-    (__v4sf)(__m128)_mm_undefined_ps(),\
-    (__mmask8)-1))
+#define _mm512_mask_mul_round_ps(W, U, A, B, C) \
+    (__m512)__builtin_ia32_mulps512_mask(A, B, W, U, C)
 
-#define _mm512_mask_extractf32x4_ps(W, U, X, C)                         \
-  ((__m128) __builtin_ia32_extractf32x4_mask ((__v16sf)(__m512) (X),    \
-    (int) (C),\
-    (__v4sf)(__m128)(W),\
-    (__mmask8)(U)))
+#define _mm512_maskz_mul_round_ps(U, A, B, C)   \
+    (__m512)__builtin_ia32_mulps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, C)
 
-#define _mm512_maskz_extractf32x4_ps(U, X, C)                           \
-  ((__m128) __builtin_ia32_extractf32x4_mask ((__v16sf)(__m512) (X),    \
-    (int) (C),\
-    (__v4sf)(__m128)_mm_setzero_ps(),\
-    (__mmask8)(U)))
+#define _mm512_div_round_pd(A, B, C)            \
+    (__m512d)__builtin_ia32_divpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, C)
 
-#define _mm512_extracti64x4_epi64(X, C)                                 \
-  ((__m256i) __builtin_ia32_extracti64x4_mask ((__v8di)(__m512i) (X),   \
-    (int) (C),\
-    (__v4di)(__m256i)_mm256_undefined_si256 (),\
-    (__mmask8)-1))
+#define _mm512_mask_div_round_pd(W, U, A, B, C) \
+    (__m512d)__builtin_ia32_divpd512_mask(A, B, W, U, C)
 
-#define _mm512_mask_extracti64x4_epi64(W, U, X, C)                      \
-  ((__m256i) __builtin_ia32_extracti64x4_mask ((__v8di)(__m512i) (X),   \
-    (int) (C),\
-    (__v4di)(__m256i)(W),\
-    (__mmask8)(U)))
+#define _mm512_maskz_div_round_pd(U, A, B, C)   \
+    (__m512d)__builtin_ia32_divpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, C)
 
-#define _mm512_maskz_extracti64x4_epi64(U, X, C)                        \
-  ((__m256i) __builtin_ia32_extracti64x4_mask ((__v8di)(__m512i) (X),   \
-    (int) (C),\
-    (__v4di)(__m256i)_mm256_setzero_si256 (),\
-    (__mmask8)(U)))
+#define _mm512_div_round_ps(A, B, C)            \
+    (__m512)__builtin_ia32_divps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, C)
 
-#define _mm512_extracti32x4_epi32(X, C)                                 \
-  ((__m128i) __builtin_ia32_extracti32x4_mask ((__v16si)(__m512i) (X),  \
-    (int) (C),\
-    (__v4si)(__m128i)_mm_undefined_si128 (),\
-    (__mmask8)-1))
+#define _mm512_mask_div_round_ps(W, U, A, B, C) \
+    (__m512)__builtin_ia32_divps512_mask(A, B, W, U, C)
 
-#define _mm512_mask_extracti32x4_epi32(W, U, X, C)                      \
-  ((__m128i) __builtin_ia32_extracti32x4_mask ((__v16si)(__m512i) (X),  \
-    (int) (C),\
-    (__v4si)(__m128i)(W),\
-    (__mmask8)(U)))
+#define _mm512_maskz_div_round_ps(U, A, B, C)   \
+    (__m512)__builtin_ia32_divps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, C)
 
-#define _mm512_maskz_extracti32x4_epi32(U, X, C)                        \
-  ((__m128i) __builtin_ia32_extracti32x4_mask ((__v16si)(__m512i) (X),  \
-    (int) (C),\
-    (__v4si)(__m128i)_mm_setzero_si128 (),\
-    (__mmask8)(U)))
 #endif
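
Mul/div follow the same pattern; e.g. a zero-masking divide (a sketch,
same assumptions):

  /* Lanes with a zero mask bit are zeroed rather than computed.  */
  __m512
  div_rn (__mmask16 m, __m512 a, __m512 b)
  {
    return _mm512_maskz_div_round_ps (m, a, b,
                                      _MM_FROUND_TO_NEAREST_INT
                                      | _MM_FROUND_NO_EXC);
  }
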
 
 #ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_inserti32x4 (__m512i __A, __m128i __B, const int __imm)
+_mm512_max_round_pd (__m512d __A, __m512d __B, const int __R)
 {
-  return (__m512i) __builtin_ia32_inserti32x4_mask ((__v16si) __A,
-						    (__v4si) __B,
-						    __imm,
-						    (__v16si) __A, -1);
+  return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df)
+						 _mm512_undefined_pd (),
+						 (__mmask8) -1, __R);
+}
+
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_max_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
+			  __m512d __B, const int __R)
+{
+  return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df) __W,
+						 (__mmask8) __U, __R);
+}
+
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_max_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+			   const int __R)
+{
+  return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df)
+						 _mm512_setzero_pd (),
+						 (__mmask8) __U, __R);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_insertf32x4 (__m512 __A, __m128 __B, const int __imm)
+_mm512_max_round_ps (__m512 __A, __m512 __B, const int __R)
 {
-  return (__m512) __builtin_ia32_insertf32x4_mask ((__v16sf) __A,
-						   (__v4sf) __B,
-						   __imm,
-						   (__v16sf) __A, -1);
+  return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf)
+						_mm512_undefined_ps (),
+						(__mmask16) -1, __R);
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_max_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
+			  __m512 __B, const int __R)
+{
+  return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf) __W,
+						(__mmask16) __U, __R);
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_max_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
+{
+  return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf)
+						_mm512_setzero_ps (),
+						(__mmask16) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_inserti64x4 (__m512i __A, __m256i __B, const int __imm)
+_mm512_min_round_pd (__m512d __A, __m512d __B, const int __R)
 {
-  return (__m512i) __builtin_ia32_inserti64x4_mask ((__v8di) __A,
-						    (__v4di) __B,
-						    __imm,
-						    (__v8di)
-						    _mm512_undefined_epi32 (),
-						    (__mmask8) -1);
+  return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df)
+						 _mm512_undefined_pd (),
+						 (__mmask8) -1, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_inserti64x4 (__m512i __W, __mmask8 __U, __m512i __A,
-			 __m256i __B, const int __imm)
+_mm512_mask_min_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
+			  __m512d __B, const int __R)
 {
-  return (__m512i) __builtin_ia32_inserti64x4_mask ((__v8di) __A,
-						    (__v4di) __B,
-						    __imm,
-						    (__v8di) __W,
-						    (__mmask8) __U);
+  return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df) __W,
+						 (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_inserti64x4 (__mmask8 __U, __m512i __A, __m256i __B,
-			  const int __imm)
+_mm512_maskz_min_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+			   const int __R)
 {
-  return (__m512i) __builtin_ia32_inserti64x4_mask ((__v8di) __A,
-						    (__v4di) __B,
-						    __imm,
-						    (__v8di)
-						    _mm512_setzero_si512 (),
-						    (__mmask8) __U);
+  return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df)
+						 _mm512_setzero_pd (),
+						 (__mmask8) __U, __R);
 }
 
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_insertf64x4 (__m512d __A, __m256d __B, const int __imm)
+_mm512_min_round_ps (__m512 __A, __m512 __B, const int __R)
 {
-  return (__m512d) __builtin_ia32_insertf64x4_mask ((__v8df) __A,
-						    (__v4df) __B,
-						    __imm,
-						    (__v8df)
-						    _mm512_undefined_pd (),
-						    (__mmask8) -1);
+  return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf)
+						_mm512_undefined_ps (),
+						(__mmask16) -1, __R);
 }
 
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_insertf64x4 (__m512d __W, __mmask8 __U, __m512d __A,
-			 __m256d __B, const int __imm)
+_mm512_mask_min_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
+			  __m512 __B, const int __R)
 {
-  return (__m512d) __builtin_ia32_insertf64x4_mask ((__v8df) __A,
-						    (__v4df) __B,
-						    __imm,
-						    (__v8df) __W,
-						    (__mmask8) __U);
+  return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf) __W,
+						(__mmask16) __U, __R);
 }
 
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_insertf64x4 (__mmask8 __U, __m512d __A, __m256d __B,
-			  const int __imm)
+_mm512_maskz_min_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
 {
-  return (__m512d) __builtin_ia32_insertf64x4_mask ((__v8df) __A,
-						    (__v4df) __B,
-						    __imm,
-						    (__v8df)
-						    _mm512_setzero_pd (),
-						    (__mmask8) __U);
+  return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf)
+						_mm512_setzero_ps (),
+						(__mmask16) __U, __R);
 }
 #else
-#define _mm512_insertf32x4(X, Y, C)                                     \
-  ((__m512) __builtin_ia32_insertf32x4_mask ((__v16sf)(__m512) (X),     \
-    (__v4sf)(__m128) (Y), (int) (C), (__v16sf)(__m512) (X), (__mmask16)(-1)))
+#define _mm512_max_round_pd(A, B, R) \
+    (__m512d)__builtin_ia32_maxpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, R)
 
-#define _mm512_inserti32x4(X, Y, C)                                     \
-  ((__m512i) __builtin_ia32_inserti32x4_mask ((__v16si)(__m512i) (X),   \
-    (__v4si)(__m128i) (Y), (int) (C), (__v16si)(__m512i) (X), (__mmask16)(-1)))
+#define _mm512_mask_max_round_pd(W, U, A, B, R) \
+    (__m512d)__builtin_ia32_maxpd512_mask(A, B, W, U, R)
 
-#define _mm512_insertf64x4(X, Y, C)                                     \
-  ((__m512d) __builtin_ia32_insertf64x4_mask ((__v8df)(__m512d) (X),    \
-    (__v4df)(__m256d) (Y), (int) (C),					\
-    (__v8df)(__m512d)_mm512_undefined_pd(),				\
-    (__mmask8)-1))
+#define _mm512_maskz_max_round_pd(U, A, B, R) \
+    (__m512d)__builtin_ia32_maxpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, R)
 
-#define _mm512_mask_insertf64x4(W, U, X, Y, C)                          \
-  ((__m512d) __builtin_ia32_insertf64x4_mask ((__v8df)(__m512d) (X),    \
-    (__v4df)(__m256d) (Y), (int) (C),					\
-    (__v8df)(__m512d)(W),						\
-    (__mmask8)(U)))
+#define _mm512_max_round_ps(A, B, R) \
+    (__m512)__builtin_ia32_maxps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, R)
 
-#define _mm512_maskz_insertf64x4(U, X, Y, C)                            \
-  ((__m512d) __builtin_ia32_insertf64x4_mask ((__v8df)(__m512d) (X),    \
-    (__v4df)(__m256d) (Y), (int) (C),					\
-    (__v8df)(__m512d)_mm512_setzero_pd(),				\
-    (__mmask8)(U)))
+#define _mm512_mask_max_round_ps(W, U, A, B, R) \
+    (__m512)__builtin_ia32_maxps512_mask(A, B, W, U, R)
 
-#define _mm512_inserti64x4(X, Y, C)                                     \
-  ((__m512i) __builtin_ia32_inserti64x4_mask ((__v8di)(__m512i) (X),    \
-    (__v4di)(__m256i) (Y), (int) (C),					\
-    (__v8di)(__m512i)_mm512_undefined_epi32 (),				\
-    (__mmask8)-1))
+#define _mm512_maskz_max_round_ps(U, A, B, R) \
+    (__m512)__builtin_ia32_maxps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, R)
 
-#define _mm512_mask_inserti64x4(W, U, X, Y, C)                          \
-  ((__m512i) __builtin_ia32_inserti64x4_mask ((__v8di)(__m512i) (X),    \
-    (__v4di)(__m256i) (Y), (int) (C),\
-    (__v8di)(__m512i)(W),\
-    (__mmask8)(U)))
+#define _mm512_min_round_pd(A, B, R) \
+    (__m512d)__builtin_ia32_minpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, R)
 
-#define _mm512_maskz_inserti64x4(U, X, Y, C)                            \
-  ((__m512i) __builtin_ia32_inserti64x4_mask ((__v8di)(__m512i) (X),    \
-    (__v4di)(__m256i) (Y), (int) (C),					\
-    (__v8di)(__m512i)_mm512_setzero_si512 (),				\
-    (__mmask8)(U)))
+#define _mm512_mask_min_round_pd(W, U, A, B, R) \
+    (__m512d)__builtin_ia32_minpd512_mask(A, B, W, U, R)
+
+#define _mm512_maskz_min_round_pd(U, A, B, R) \
+    (__m512d)__builtin_ia32_minpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, R)
+
+#define _mm512_min_round_ps(A, B, R) \
+    (__m512)__builtin_ia32_minps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, R)
+
+#define _mm512_mask_min_round_ps(W, U, A, B, R) \
+    (__m512)__builtin_ia32_minps512_mask(A, B, W, U, R)
+
+#define _mm512_maskz_min_round_ps(U, A, B, R) \
+    (__m512)__builtin_ia32_minps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, R)
 #endif
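
Max/min never round, so the last argument only selects exception
behaviour: _MM_FROUND_NO_EXC for SAE, or _MM_FROUND_CUR_DIRECTION.
A sketch:

  __m512d
  max_sae (__m512d a, __m512d b)
  {
    /* Compare without raising floating-point exceptions.  */
    return _mm512_max_round_pd (a, b, _MM_FROUND_NO_EXC);
  }
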
 
+#ifdef __OPTIMIZE__
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_loadu_pd (void const *__P)
+_mm512_scalef_round_pd (__m512d __A, __m512d __B, const int __R)
 {
-  return *(__m512d_u *)__P;
+  return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A,
+						    (__v8df) __B,
+						    (__v8df)
+						    _mm512_undefined_pd (),
+						    (__mmask8) -1, __R);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_loadu_pd (__m512d __W, __mmask8 __U, void const *__P)
+_mm512_mask_scalef_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
+			     __m512d __B, const int __R)
 {
-  return (__m512d) __builtin_ia32_loadupd512_mask ((const double *) __P,
-						   (__v8df) __W,
-						   (__mmask8) __U);
+  return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A,
+						    (__v8df) __B,
+						    (__v8df) __W,
+						    (__mmask8) __U, __R);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_loadu_pd (__mmask8 __U, void const *__P)
+_mm512_maskz_scalef_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+			      const int __R)
 {
-  return (__m512d) __builtin_ia32_loadupd512_mask ((const double *) __P,
-						   (__v8df)
-						   _mm512_setzero_pd (),
-						   (__mmask8) __U);
+  return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A,
+						    (__v8df) __B,
+						    (__v8df)
+						    _mm512_setzero_pd (),
+						    (__mmask8) __U, __R);
 }
 
-extern __inline void
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_storeu_pd (void *__P, __m512d __A)
+_mm512_scalef_round_ps (__m512 __A, __m512 __B, const int __R)
 {
-  *(__m512d_u *)__P = __A;
+  return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A,
+						   (__v16sf) __B,
+						   (__v16sf)
+						   _mm512_undefined_ps (),
+						   (__mmask16) -1, __R);
 }
 
-extern __inline void
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_storeu_pd (void *__P, __mmask8 __U, __m512d __A)
+_mm512_mask_scalef_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
+			     __m512 __B, const int __R)
 {
-  __builtin_ia32_storeupd512_mask ((double *) __P, (__v8df) __A,
-				   (__mmask8) __U);
+  return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A,
+						   (__v16sf) __B,
+						   (__v16sf) __W,
+						   (__mmask16) __U, __R);
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_scalef_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
+			      const int __R)
+{
+  return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A,
+						   (__v16sf) __B,
+						   (__v16sf)
+						   _mm512_setzero_ps (),
+						   (__mmask16) __U, __R);
 }
 
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_loadu_ps (void const *__P)
-{
-  return *(__m512_u *)__P;
-}
+#else
+#define _mm512_scalef_round_pd(A, B, C)					\
+  ((__m512d)								\
+   __builtin_ia32_scalefpd512_mask((A), (B),				\
+				   (__v8df) _mm512_undefined_pd(),	\
+				   -1, (C)))
+
+#define _mm512_mask_scalef_round_pd(W, U, A, B, C)			\
+  ((__m512d) __builtin_ia32_scalefpd512_mask((A), (B), (W), (U), (C)))
+
+#define _mm512_maskz_scalef_round_pd(U, A, B, C)			\
+  ((__m512d)								\
+   __builtin_ia32_scalefpd512_mask((A), (B),				\
+				   (__v8df) _mm512_setzero_pd(),	\
+				   (U), (C)))
+
+#define _mm512_scalef_round_ps(A, B, C)					\
+  ((__m512)								\
+   __builtin_ia32_scalefps512_mask((A), (B),				\
+				   (__v16sf) _mm512_undefined_ps(),	\
+				   -1, (C)))
+
+#define _mm512_mask_scalef_round_ps(W, U, A, B, C)			\
+  ((__m512) __builtin_ia32_scalefps512_mask((A), (B), (W), (U), (C)))
+
+#define _mm512_maskz_scalef_round_ps(U, A, B, C)			\
+  ((__m512)								\
+   __builtin_ia32_scalefps512_mask((A), (B),				\
+				   (__v16sf) _mm512_setzero_ps(),	\
+				   (U), (C)))
+
+#endif
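
vscalef computes a * 2^floor(b) per lane, i.e. a vector ldexp when b
holds integral values.  A sketch (same assumptions):

  __m512d
  ldexp_pd (__m512d a, __m512d e)
  {
    return _mm512_scalef_round_pd (a, e, _MM_FROUND_TO_NEAREST_INT
                                         | _MM_FROUND_NO_EXC);
  }
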
 
-extern __inline __m512
+#ifdef __OPTIMIZE__
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_loadu_ps (__m512 __W, __mmask16 __U, void const *__P)
+_mm512_fmadd_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
 {
-  return (__m512) __builtin_ia32_loadups512_mask ((const float *) __P,
-						  (__v16sf) __W,
-						  (__mmask16) __U);
+  return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A,
+						    (__v8df) __B,
+						    (__v8df) __C,
+						    (__mmask8) -1, __R);
 }
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_loadu_ps (__mmask16 __U, void const *__P)
+_mm512_mask_fmadd_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
+			    __m512d __C, const int __R)
 {
-  return (__m512) __builtin_ia32_loadups512_mask ((const float *) __P,
-						  (__v16sf)
-						  _mm512_setzero_ps (),
-						  (__mmask16) __U);
+  return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A,
+						    (__v8df) __B,
+						    (__v8df) __C,
+						    (__mmask8) __U, __R);
 }
 
-extern __inline void
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_storeu_ps (void *__P, __m512 __A)
+_mm512_mask3_fmadd_round_pd (__m512d __A, __m512d __B, __m512d __C,
+			     __mmask8 __U, const int __R)
 {
-  *(__m512_u *)__P = __A;
+  return (__m512d) __builtin_ia32_vfmaddpd512_mask3 ((__v8df) __A,
+						     (__v8df) __B,
+						     (__v8df) __C,
+						     (__mmask8) __U, __R);
 }
 
-extern __inline void
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_storeu_ps (void *__P, __mmask16 __U, __m512 __A)
+_mm512_maskz_fmadd_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+			     __m512d __C, const int __R)
 {
-  __builtin_ia32_storeups512_mask ((float *) __P, (__v16sf) __A,
-				   (__mmask16) __U);
+  return (__m512d) __builtin_ia32_vfmaddpd512_maskz ((__v8df) __A,
+						     (__v8df) __B,
+						     (__v8df) __C,
+						     (__mmask8) __U, __R);
 }
 
-extern __inline __m128
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_load_ss (__m128 __W, __mmask8 __U, const float *__P)
+_mm512_fmadd_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
 {
-  return (__m128) __builtin_ia32_loadss_mask (__P, (__v4sf) __W, __U);
+  return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A,
+						   (__v16sf) __B,
+						   (__v16sf) __C,
+						   (__mmask16) -1, __R);
 }
 
-extern __inline __m128
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_load_ss (__mmask8 __U, const float *__P)
+_mm512_mask_fmadd_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
+			    __m512 __C, const int __R)
 {
-  return (__m128) __builtin_ia32_loadss_mask (__P, (__v4sf) _mm_setzero_ps (),
-					      __U);
+  return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A,
+						   (__v16sf) __B,
+						   (__v16sf) __C,
+						   (__mmask16) __U, __R);
 }
 
-extern __inline __m128d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_load_sd (__m128d __W, __mmask8 __U, const double *__P)
+_mm512_mask3_fmadd_round_ps (__m512 __A, __m512 __B, __m512 __C,
+			     __mmask16 __U, const int __R)
 {
-  return (__m128d) __builtin_ia32_loadsd_mask (__P, (__v2df) __W, __U);
+  return (__m512) __builtin_ia32_vfmaddps512_mask3 ((__v16sf) __A,
+						    (__v16sf) __B,
+						    (__v16sf) __C,
+						    (__mmask16) __U, __R);
 }
 
-extern __inline __m128d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_load_sd (__mmask8 __U, const double *__P)
+_mm512_maskz_fmadd_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
+			     __m512 __C, const int __R)
 {
-  return (__m128d) __builtin_ia32_loadsd_mask (__P, (__v2df) _mm_setzero_pd (),
-					       __U);
+  return (__m512) __builtin_ia32_vfmaddps512_maskz ((__v16sf) __A,
+						    (__v16sf) __B,
+						    (__v16sf) __C,
+						    (__mmask16) __U, __R);
 }
 
-extern __inline __m128
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_move_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
+_mm512_fmsub_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
 {
-  return (__m128) __builtin_ia32_movess_mask ((__v4sf) __A, (__v4sf) __B,
-					      (__v4sf) __W, __U);
+  return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A,
+						    (__v8df) __B,
+						    (__v8df) __C,
+						    (__mmask8) -1, __R);
 }
 
-extern __inline __m128
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_move_ss (__mmask8 __U, __m128 __A, __m128 __B)
+_mm512_mask_fmsub_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
+			    __m512d __C, const int __R)
 {
-  return (__m128) __builtin_ia32_movess_mask ((__v4sf) __A, (__v4sf) __B,
-					      (__v4sf) _mm_setzero_ps (), __U);
+  return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A,
+						    (__v8df) __B,
+						    (__v8df) __C,
+						    (__mmask8) __U, __R);
 }
 
-extern __inline __m128d
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_move_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
+_mm512_mask3_fmsub_round_pd (__m512d __A, __m512d __B, __m512d __C,
+			     __mmask8 __U, const int __R)
 {
-  return (__m128d) __builtin_ia32_movesd_mask ((__v2df) __A, (__v2df) __B,
-					       (__v2df) __W, __U);
+  return (__m512d) __builtin_ia32_vfmsubpd512_mask3 ((__v8df) __A,
+						     (__v8df) __B,
+						     (__v8df) __C,
+						     (__mmask8) __U, __R);
 }
 
-extern __inline __m128d
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_move_sd (__mmask8 __U, __m128d __A, __m128d __B)
+_mm512_maskz_fmsub_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+			     __m512d __C, const int __R)
 {
-  return (__m128d) __builtin_ia32_movesd_mask ((__v2df) __A, (__v2df) __B,
-					       (__v2df) _mm_setzero_pd (),
-					       __U);
+  return (__m512d) __builtin_ia32_vfmsubpd512_maskz ((__v8df) __A,
+						     (__v8df) __B,
+						     (__v8df) __C,
+						     (__mmask8) __U, __R);
 }
 
-extern __inline void
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_store_ss (float *__P, __mmask8 __U, __m128 __A)
+_mm512_fmsub_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
 {
-  __builtin_ia32_storess_mask (__P, (__v4sf) __A, (__mmask8) __U);
+  return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A,
+						   (__v16sf) __B,
+						   (__v16sf) __C,
+						   (__mmask16) -1, __R);
 }
 
-extern __inline void
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_store_sd (double *__P, __mmask8 __U, __m128d __A)
+_mm512_mask_fmsub_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
+			    __m512 __C, const int __R)
 {
-  __builtin_ia32_storesd_mask (__P, (__v2df) __A, (__mmask8) __U);
+  return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A,
+						   (__v16sf) __B,
+						   (__v16sf) __C,
+						   (__mmask16) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_loadu_epi64 (void const *__P)
+_mm512_mask3_fmsub_round_ps (__m512 __A, __m512 __B, __m512 __C,
+			     __mmask16 __U, const int __R)
 {
-  return *(__m512i_u *) __P;
+  return (__m512) __builtin_ia32_vfmsubps512_mask3 ((__v16sf) __A,
+						    (__v16sf) __B,
+						    (__v16sf) __C,
+						    (__mmask16) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_loadu_epi64 (__m512i __W, __mmask8 __U, void const *__P)
+_mm512_maskz_fmsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
+			     __m512 __C, const int __R)
 {
-  return (__m512i) __builtin_ia32_loaddqudi512_mask ((const long long *) __P,
-						     (__v8di) __W,
-						     (__mmask8) __U);
+  return (__m512) __builtin_ia32_vfmsubps512_maskz ((__v16sf) __A,
+						    (__v16sf) __B,
+						    (__v16sf) __C,
+						    (__mmask16) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_loadu_epi64 (__mmask8 __U, void const *__P)
+_mm512_fmaddsub_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
 {
-  return (__m512i) __builtin_ia32_loaddqudi512_mask ((const long long *) __P,
-						     (__v8di)
-						     _mm512_setzero_si512 (),
-						     (__mmask8) __U);
+  return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
+						       (__v8df) __B,
+						       (__v8df) __C,
+						       (__mmask8) -1, __R);
 }
 
-extern __inline void
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_storeu_epi64 (void *__P, __m512i __A)
+_mm512_mask_fmaddsub_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
+			       __m512d __C, const int __R)
 {
-  *(__m512i_u *) __P = (__m512i_u) __A;
+  return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
+						       (__v8df) __B,
+						       (__v8df) __C,
+						       (__mmask8) __U, __R);
 }
 
-extern __inline void
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_storeu_epi64 (void *__P, __mmask8 __U, __m512i __A)
+_mm512_mask3_fmaddsub_round_pd (__m512d __A, __m512d __B, __m512d __C,
+				__mmask8 __U, const int __R)
 {
-  __builtin_ia32_storedqudi512_mask ((long long *) __P, (__v8di) __A,
-				     (__mmask8) __U);
+  return (__m512d) __builtin_ia32_vfmaddsubpd512_mask3 ((__v8df) __A,
+							(__v8df) __B,
+							(__v8df) __C,
+							(__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_loadu_si512 (void const *__P)
+_mm512_maskz_fmaddsub_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+				__m512d __C, const int __R)
 {
-  return *(__m512i_u *)__P;
+  return (__m512d) __builtin_ia32_vfmaddsubpd512_maskz ((__v8df) __A,
+							(__v8df) __B,
+							(__v8df) __C,
+							(__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_loadu_epi32 (void const *__P)
+_mm512_fmaddsub_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
 {
-  return *(__m512i_u *) __P;
+  return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
+						      (__v16sf) __B,
+						      (__v16sf) __C,
+						      (__mmask16) -1, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_loadu_epi32 (__m512i __W, __mmask16 __U, void const *__P)
+_mm512_mask_fmaddsub_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
+			       __m512 __C, const int __R)
 {
-  return (__m512i) __builtin_ia32_loaddqusi512_mask ((const int *) __P,
-						     (__v16si) __W,
-						     (__mmask16) __U);
+  return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
+						      (__v16sf) __B,
+						      (__v16sf) __C,
+						      (__mmask16) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_loadu_epi32 (__mmask16 __U, void const *__P)
+_mm512_mask3_fmaddsub_round_ps (__m512 __A, __m512 __B, __m512 __C,
+				__mmask16 __U, const int __R)
 {
-  return (__m512i) __builtin_ia32_loaddqusi512_mask ((const int *) __P,
-						     (__v16si)
-						     _mm512_setzero_si512 (),
-						     (__mmask16) __U);
+  return (__m512) __builtin_ia32_vfmaddsubps512_mask3 ((__v16sf) __A,
+						       (__v16sf) __B,
+						       (__v16sf) __C,
+						       (__mmask16) __U, __R);
 }
 
-extern __inline void
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_storeu_si512 (void *__P, __m512i __A)
+_mm512_maskz_fmaddsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
+				__m512 __C, const int __R)
 {
-  *(__m512i_u *)__P = __A;
+  return (__m512) __builtin_ia32_vfmaddsubps512_maskz ((__v16sf) __A,
+						       (__v16sf) __B,
+						       (__v16sf) __C,
+						       (__mmask16) __U, __R);
 }
 
-extern __inline void
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_storeu_epi32 (void *__P, __m512i __A)
+_mm512_fmsubadd_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
 {
-  *(__m512i_u *) __P = (__m512i_u) __A;
+  return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
+						       (__v8df) __B,
+						       -(__v8df) __C,
+						       (__mmask8) -1, __R);
 }
 
-extern __inline void
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_storeu_epi32 (void *__P, __mmask16 __U, __m512i __A)
+_mm512_mask_fmsubadd_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
+			       __m512d __C, const int __R)
 {
-  __builtin_ia32_storedqusi512_mask ((int *) __P, (__v16si) __A,
-				     (__mmask16) __U);
+  return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
+						       (__v8df) __B,
+						       -(__v8df) __C,
+						       (__mmask8) __U, __R);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permutevar_pd (__m512d __A, __m512i __C)
+_mm512_mask3_fmsubadd_round_pd (__m512d __A, __m512d __B, __m512d __C,
+				__mmask8 __U, const int __R)
 {
-  return (__m512d) __builtin_ia32_vpermilvarpd512_mask ((__v8df) __A,
-							(__v8di) __C,
-							(__v8df)
-							_mm512_undefined_pd (),
-							(__mmask8) -1);
+  return (__m512d) __builtin_ia32_vfmsubaddpd512_mask3 ((__v8df) __A,
+							(__v8df) __B,
+							(__v8df) __C,
+							(__mmask8) __U, __R);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permutevar_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512i __C)
+_mm512_maskz_fmsubadd_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+				__m512d __C, const int __R)
 {
-  return (__m512d) __builtin_ia32_vpermilvarpd512_mask ((__v8df) __A,
-							(__v8di) __C,
-							(__v8df) __W,
-							(__mmask8) __U);
+  return (__m512d) __builtin_ia32_vfmaddsubpd512_maskz ((__v8df) __A,
+							(__v8df) __B,
+							-(__v8df) __C,
+							(__mmask8) __U, __R);
 }
 
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permutevar_pd (__mmask8 __U, __m512d __A, __m512i __C)
+_mm512_fmsubadd_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
 {
-  return (__m512d) __builtin_ia32_vpermilvarpd512_mask ((__v8df) __A,
-							(__v8di) __C,
-							(__v8df)
-							_mm512_setzero_pd (),
-							(__mmask8) __U);
+  return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
+						      (__v16sf) __B,
+						      -(__v16sf) __C,
+						      (__mmask16) -1, __R);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permutevar_ps (__m512 __A, __m512i __C)
+_mm512_mask_fmsubadd_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
+			       __m512 __C, const int __R)
 {
-  return (__m512) __builtin_ia32_vpermilvarps512_mask ((__v16sf) __A,
-						       (__v16si) __C,
-						       (__v16sf)
-						       _mm512_undefined_ps (),
-						       (__mmask16) -1);
+  return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
+						      (__v16sf) __B,
+						      -(__v16sf) __C,
+						      (__mmask16) __U, __R);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permutevar_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512i __C)
+_mm512_mask3_fmsubadd_round_ps (__m512 __A, __m512 __B, __m512 __C,
+				__mmask16 __U, const int __R)
 {
-  return (__m512) __builtin_ia32_vpermilvarps512_mask ((__v16sf) __A,
-						       (__v16si) __C,
-						       (__v16sf) __W,
-						       (__mmask16) __U);
+  return (__m512) __builtin_ia32_vfmsubaddps512_mask3 ((__v16sf) __A,
+						       (__v16sf) __B,
+						       (__v16sf) __C,
+						       (__mmask16) __U, __R);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permutevar_ps (__mmask16 __U, __m512 __A, __m512i __C)
+_mm512_maskz_fmsubadd_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
+				__m512 __C, const int __R)
 {
-  return (__m512) __builtin_ia32_vpermilvarps512_mask ((__v16sf) __A,
-						       (__v16si) __C,
-						       (__v16sf)
-						       _mm512_setzero_ps (),
-						       (__mmask16) __U);
+  return (__m512) __builtin_ia32_vfmaddsubps512_maskz ((__v16sf) __A,
+						       (__v16sf) __B,
+						       -(__v16sf) __C,
+						       (__mmask16) __U, __R);
 }
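
The fmsubadd wrappers above reuse the fmaddsub builtins with a negated
addend; only the mask3 forms get dedicated vfmsubadd*_mask3 builtins.
A scalar sketch of why, from my reading of the instruction semantics
rather than anything in this patch:

/* Illustrative per-element model (not code from the patch):
     fmaddsub: r[i] = a[i]*b[i] + (i & 1 ? c[i] : -c[i])
     fmsubadd: r[i] = a[i]*b[i] + (i & 1 ? -c[i] : c[i])
   so fmsubadd (a, b, c) == fmaddsub (a, b, -c).  The mask3 forms
   cannot use that trick: lanes cleared in __U must be copied from
   __C unchanged, and negating __C up front would corrupt them.  */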
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permutex2var_epi64 (__m512i __A, __m512i __I, __m512i __B)
+_mm512_fnmadd_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
 {
-  return (__m512i) __builtin_ia32_vpermt2varq512_mask ((__v8di) __I
-						       /* idx */ ,
-						       (__v8di) __A,
-						       (__v8di) __B,
-						       (__mmask8) -1);
+  return (__m512d) __builtin_ia32_vfnmaddpd512_mask ((__v8df) __A,
+						     (__v8df) __B,
+						     (__v8df) __C,
+						     (__mmask8) -1, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permutex2var_epi64 (__m512i __A, __mmask8 __U, __m512i __I,
-				__m512i __B)
+_mm512_mask_fnmadd_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
+			     __m512d __C, const int __R)
 {
-  return (__m512i) __builtin_ia32_vpermt2varq512_mask ((__v8di) __I
-						       /* idx */ ,
-						       (__v8di) __A,
-						       (__v8di) __B,
-						       (__mmask8) __U);
+  return (__m512d) __builtin_ia32_vfnmaddpd512_mask ((__v8df) __A,
+						     (__v8df) __B,
+						     (__v8df) __C,
+						     (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask2_permutex2var_epi64 (__m512i __A, __m512i __I,
-				 __mmask8 __U, __m512i __B)
+_mm512_mask3_fnmadd_round_pd (__m512d __A, __m512d __B, __m512d __C,
+			      __mmask8 __U, const int __R)
 {
-  return (__m512i) __builtin_ia32_vpermi2varq512_mask ((__v8di) __A,
-						       (__v8di) __I
-						       /* idx */ ,
-						       (__v8di) __B,
-						       (__mmask8) __U);
+  return (__m512d) __builtin_ia32_vfnmaddpd512_mask3 ((__v8df) __A,
+						      (__v8df) __B,
+						      (__v8df) __C,
+						      (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permutex2var_epi64 (__mmask8 __U, __m512i __A,
-				 __m512i __I, __m512i __B)
+_mm512_maskz_fnmadd_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+			      __m512d __C, const int __R)
 {
-  return (__m512i) __builtin_ia32_vpermt2varq512_maskz ((__v8di) __I
-							/* idx */ ,
-							(__v8di) __A,
-							(__v8di) __B,
-							(__mmask8) __U);
+  return (__m512d) __builtin_ia32_vfnmaddpd512_maskz ((__v8df) __A,
+						      (__v8df) __B,
+						      (__v8df) __C,
+						      (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permutex2var_epi32 (__m512i __A, __m512i __I, __m512i __B)
+_mm512_fnmadd_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
 {
-  return (__m512i) __builtin_ia32_vpermt2vard512_mask ((__v16si) __I
-						       /* idx */ ,
-						       (__v16si) __A,
-						       (__v16si) __B,
-						       (__mmask16) -1);
+  return (__m512) __builtin_ia32_vfnmaddps512_mask ((__v16sf) __A,
+						    (__v16sf) __B,
+						    (__v16sf) __C,
+						    (__mmask16) -1, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permutex2var_epi32 (__m512i __A, __mmask16 __U,
-				__m512i __I, __m512i __B)
+_mm512_mask_fnmadd_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
+			     __m512 __C, const int __R)
 {
-  return (__m512i) __builtin_ia32_vpermt2vard512_mask ((__v16si) __I
-						       /* idx */ ,
-						       (__v16si) __A,
-						       (__v16si) __B,
-						       (__mmask16) __U);
+  return (__m512) __builtin_ia32_vfnmaddps512_mask ((__v16sf) __A,
+						    (__v16sf) __B,
+						    (__v16sf) __C,
+						    (__mmask16) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask2_permutex2var_epi32 (__m512i __A, __m512i __I,
-				 __mmask16 __U, __m512i __B)
+_mm512_mask3_fnmadd_round_ps (__m512 __A, __m512 __B, __m512 __C,
+			      __mmask16 __U, const int __R)
 {
-  return (__m512i) __builtin_ia32_vpermi2vard512_mask ((__v16si) __A,
-						       (__v16si) __I
-						       /* idx */ ,
-						       (__v16si) __B,
-						       (__mmask16) __U);
+  return (__m512) __builtin_ia32_vfnmaddps512_mask3 ((__v16sf) __A,
+						     (__v16sf) __B,
+						     (__v16sf) __C,
+						     (__mmask16) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permutex2var_epi32 (__mmask16 __U, __m512i __A,
-				 __m512i __I, __m512i __B)
+_mm512_maskz_fnmadd_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
+			      __m512 __C, const int __R)
 {
-  return (__m512i) __builtin_ia32_vpermt2vard512_maskz ((__v16si) __I
-							/* idx */ ,
-							(__v16si) __A,
-							(__v16si) __B,
-							(__mmask16) __U);
+  return (__m512) __builtin_ia32_vfnmaddps512_maskz ((__v16sf) __A,
+						     (__v16sf) __B,
+						     (__v16sf) __C,
+						     (__mmask16) __U, __R);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permutex2var_pd (__m512d __A, __m512i __I, __m512d __B)
+_mm512_fnmsub_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
 {
-  return (__m512d) __builtin_ia32_vpermt2varpd512_mask ((__v8di) __I
-							/* idx */ ,
-							(__v8df) __A,
-							(__v8df) __B,
-							(__mmask8) -1);
+  return (__m512d) __builtin_ia32_vfnmsubpd512_mask ((__v8df) __A,
+						     (__v8df) __B,
+						     (__v8df) __C,
+						     (__mmask8) -1, __R);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permutex2var_pd (__m512d __A, __mmask8 __U, __m512i __I,
-			     __m512d __B)
+_mm512_mask_fnmsub_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
+			     __m512d __C, const int __R)
 {
-  return (__m512d) __builtin_ia32_vpermt2varpd512_mask ((__v8di) __I
-							/* idx */ ,
-							(__v8df) __A,
-							(__v8df) __B,
-							(__mmask8) __U);
+  return (__m512d) __builtin_ia32_vfnmsubpd512_mask ((__v8df) __A,
+						     (__v8df) __B,
+						     (__v8df) __C,
+						     (__mmask8) __U, __R);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask2_permutex2var_pd (__m512d __A, __m512i __I, __mmask8 __U,
-			      __m512d __B)
+_mm512_mask3_fnmsub_round_pd (__m512d __A, __m512d __B, __m512d __C,
+			      __mmask8 __U, const int __R)
 {
-  return (__m512d) __builtin_ia32_vpermi2varpd512_mask ((__v8df) __A,
-							(__v8di) __I
-							/* idx */ ,
-							(__v8df) __B,
-							(__mmask8) __U);
+  return (__m512d) __builtin_ia32_vfnmsubpd512_mask3 ((__v8df) __A,
+						      (__v8df) __B,
+						      (__v8df) __C,
+						      (__mmask8) __U, __R);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permutex2var_pd (__mmask8 __U, __m512d __A, __m512i __I,
-			      __m512d __B)
+_mm512_maskz_fnmsub_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+			      __m512d __C, const int __R)
 {
-  return (__m512d) __builtin_ia32_vpermt2varpd512_maskz ((__v8di) __I
-							 /* idx */ ,
-							 (__v8df) __A,
-							 (__v8df) __B,
-							 (__mmask8) __U);
+  return (__m512d) __builtin_ia32_vfnmsubpd512_maskz ((__v8df) __A,
+						      (__v8df) __B,
+						      (__v8df) __C,
+						      (__mmask8) __U, __R);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permutex2var_ps (__m512 __A, __m512i __I, __m512 __B)
+_mm512_fnmsub_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
 {
-  return (__m512) __builtin_ia32_vpermt2varps512_mask ((__v16si) __I
-						       /* idx */ ,
-						       (__v16sf) __A,
-						       (__v16sf) __B,
-						       (__mmask16) -1);
+  return (__m512) __builtin_ia32_vfnmsubps512_mask ((__v16sf) __A,
+						    (__v16sf) __B,
+						    (__v16sf) __C,
+						    (__mmask16) -1, __R);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permutex2var_ps (__m512 __A, __mmask16 __U, __m512i __I, __m512 __B)
+_mm512_mask_fnmsub_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
+			     __m512 __C, const int __R)
 {
-  return (__m512) __builtin_ia32_vpermt2varps512_mask ((__v16si) __I
-						       /* idx */ ,
-						       (__v16sf) __A,
-						       (__v16sf) __B,
-						       (__mmask16) __U);
+  return (__m512) __builtin_ia32_vfnmsubps512_mask ((__v16sf) __A,
+						    (__v16sf) __B,
+						    (__v16sf) __C,
+						    (__mmask16) __U, __R);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask2_permutex2var_ps (__m512 __A, __m512i __I, __mmask16 __U,
-			      __m512 __B)
+_mm512_mask3_fnmsub_round_ps (__m512 __A, __m512 __B, __m512 __C,
+			      __mmask16 __U, const int __R)
 {
-  return (__m512) __builtin_ia32_vpermi2varps512_mask ((__v16sf) __A,
-						       (__v16si) __I
-						       /* idx */ ,
-						       (__v16sf) __B,
-						       (__mmask16) __U);
+  return (__m512) __builtin_ia32_vfnmsubps512_mask3 ((__v16sf) __A,
+						     (__v16sf) __B,
+						     (__v16sf) __C,
+						     (__mmask16) __U, __R);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permutex2var_ps (__mmask16 __U, __m512 __A, __m512i __I,
-			      __m512 __B)
+_mm512_maskz_fnmsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
+			      __m512 __C, const int __R)
 {
-  return (__m512) __builtin_ia32_vpermt2varps512_maskz ((__v16si) __I
-							/* idx */ ,
-							(__v16sf) __A,
-							(__v16sf) __B,
-							(__mmask16) __U);
+  return (__m512) __builtin_ia32_vfnmsubps512_maskz ((__v16sf) __A,
+						     (__v16sf) __B,
+						     (__v16sf) __C,
+						     (__mmask16) __U, __R);
 }
+#else
+#define _mm512_fmadd_round_pd(A, B, C, R)            \
+    (__m512d)__builtin_ia32_vfmaddpd512_mask(A, B, C, -1, R)
 
-#ifdef __OPTIMIZE__
-extern __inline __m512d
+#define _mm512_mask_fmadd_round_pd(A, U, B, C, R)    \
+    (__m512d)__builtin_ia32_vfmaddpd512_mask(A, B, C, U, R)
+
+#define _mm512_mask3_fmadd_round_pd(A, B, C, U, R)   \
+    (__m512d)__builtin_ia32_vfmaddpd512_mask3(A, B, C, U, R)
+
+#define _mm512_maskz_fmadd_round_pd(U, A, B, C, R)   \
+    (__m512d)__builtin_ia32_vfmaddpd512_maskz(A, B, C, U, R)
+
+#define _mm512_fmadd_round_ps(A, B, C, R)            \
+    (__m512)__builtin_ia32_vfmaddps512_mask(A, B, C, -1, R)
+
+#define _mm512_mask_fmadd_round_ps(A, U, B, C, R)    \
+    (__m512)__builtin_ia32_vfmaddps512_mask(A, B, C, U, R)
+
+#define _mm512_mask3_fmadd_round_ps(A, B, C, U, R)   \
+    (__m512)__builtin_ia32_vfmaddps512_mask3(A, B, C, U, R)
+
+#define _mm512_maskz_fmadd_round_ps(U, A, B, C, R)   \
+    (__m512)__builtin_ia32_vfmaddps512_maskz(A, B, C, U, R)
+
+#define _mm512_fmsub_round_pd(A, B, C, R)            \
+    (__m512d)__builtin_ia32_vfmsubpd512_mask(A, B, C, -1, R)
+
+#define _mm512_mask_fmsub_round_pd(A, U, B, C, R)    \
+    (__m512d)__builtin_ia32_vfmsubpd512_mask(A, B, C, U, R)
+
+#define _mm512_mask3_fmsub_round_pd(A, B, C, U, R)   \
+    (__m512d)__builtin_ia32_vfmsubpd512_mask3(A, B, C, U, R)
+
+#define _mm512_maskz_fmsub_round_pd(U, A, B, C, R)   \
+    (__m512d)__builtin_ia32_vfmsubpd512_maskz(A, B, C, U, R)
+
+#define _mm512_fmsub_round_ps(A, B, C, R)            \
+    (__m512)__builtin_ia32_vfmsubps512_mask(A, B, C, -1, R)
+
+#define _mm512_mask_fmsub_round_ps(A, U, B, C, R)    \
+    (__m512)__builtin_ia32_vfmsubps512_mask(A, B, C, U, R)
+
+#define _mm512_mask3_fmsub_round_ps(A, B, C, U, R)   \
+    (__m512)__builtin_ia32_vfmsubps512_mask3(A, B, C, U, R)
+
+#define _mm512_maskz_fmsub_round_ps(U, A, B, C, R)   \
+    (__m512)__builtin_ia32_vfmsubps512_maskz(A, B, C, U, R)
+
+#define _mm512_fmaddsub_round_pd(A, B, C, R)            \
+    (__m512d)__builtin_ia32_vfmaddsubpd512_mask(A, B, C, -1, R)
+
+#define _mm512_mask_fmaddsub_round_pd(A, U, B, C, R)    \
+    (__m512d)__builtin_ia32_vfmaddsubpd512_mask(A, B, C, U, R)
+
+#define _mm512_mask3_fmaddsub_round_pd(A, B, C, U, R)   \
+    (__m512d)__builtin_ia32_vfmaddsubpd512_mask3(A, B, C, U, R)
+
+#define _mm512_maskz_fmaddsub_round_pd(U, A, B, C, R)   \
+    (__m512d)__builtin_ia32_vfmaddsubpd512_maskz(A, B, C, U, R)
+
+#define _mm512_fmaddsub_round_ps(A, B, C, R)            \
+    (__m512)__builtin_ia32_vfmaddsubps512_mask(A, B, C, -1, R)
+
+#define _mm512_mask_fmaddsub_round_ps(A, U, B, C, R)    \
+    (__m512)__builtin_ia32_vfmaddsubps512_mask(A, B, C, U, R)
+
+#define _mm512_mask3_fmaddsub_round_ps(A, B, C, U, R)   \
+    (__m512)__builtin_ia32_vfmaddsubps512_mask3(A, B, C, U, R)
+
+#define _mm512_maskz_fmaddsub_round_ps(U, A, B, C, R)   \
+    (__m512)__builtin_ia32_vfmaddsubps512_maskz(A, B, C, U, R)
+
+#define _mm512_fmsubadd_round_pd(A, B, C, R)            \
+    (__m512d)__builtin_ia32_vfmaddsubpd512_mask(A, B, -(C), -1, R)
+
+#define _mm512_mask_fmsubadd_round_pd(A, U, B, C, R)    \
+    (__m512d)__builtin_ia32_vfmaddsubpd512_mask(A, B, -(C), U, R)
+
+#define _mm512_mask3_fmsubadd_round_pd(A, B, C, U, R)   \
+    (__m512d)__builtin_ia32_vfmsubaddpd512_mask3(A, B, C, U, R)
+
+#define _mm512_maskz_fmsubadd_round_pd(U, A, B, C, R)   \
+    (__m512d)__builtin_ia32_vfmaddsubpd512_maskz(A, B, -(C), U, R)
+
+#define _mm512_fmsubadd_round_ps(A, B, C, R)            \
+    (__m512)__builtin_ia32_vfmaddsubps512_mask(A, B, -(C), -1, R)
+
+#define _mm512_mask_fmsubadd_round_ps(A, U, B, C, R)    \
+    (__m512)__builtin_ia32_vfmaddsubps512_mask(A, B, -(C), U, R)
+
+#define _mm512_mask3_fmsubadd_round_ps(A, B, C, U, R)   \
+    (__m512)__builtin_ia32_vfmsubaddps512_mask3(A, B, C, U, R)
+
+#define _mm512_maskz_fmsubadd_round_ps(U, A, B, C, R)   \
+    (__m512)__builtin_ia32_vfmaddsubps512_maskz(A, B, -(C), U, R)
+
+#define _mm512_fnmadd_round_pd(A, B, C, R)            \
+    (__m512d)__builtin_ia32_vfnmaddpd512_mask(A, B, C, -1, R)
+
+#define _mm512_mask_fnmadd_round_pd(A, U, B, C, R)    \
+    (__m512d)__builtin_ia32_vfnmaddpd512_mask(A, B, C, U, R)
+
+#define _mm512_mask3_fnmadd_round_pd(A, B, C, U, R)   \
+    (__m512d)__builtin_ia32_vfnmaddpd512_mask3(A, B, C, U, R)
+
+#define _mm512_maskz_fnmadd_round_pd(U, A, B, C, R)   \
+    (__m512d)__builtin_ia32_vfnmaddpd512_maskz(A, B, C, U, R)
+
+#define _mm512_fnmadd_round_ps(A, B, C, R)            \
+    (__m512)__builtin_ia32_vfnmaddps512_mask(A, B, C, -1, R)
+
+#define _mm512_mask_fnmadd_round_ps(A, U, B, C, R)    \
+    (__m512)__builtin_ia32_vfnmaddps512_mask(A, B, C, U, R)
+
+#define _mm512_mask3_fnmadd_round_ps(A, B, C, U, R)   \
+    (__m512)__builtin_ia32_vfnmaddps512_mask3(A, B, C, U, R)
+
+#define _mm512_maskz_fnmadd_round_ps(U, A, B, C, R)   \
+    (__m512)__builtin_ia32_vfnmaddps512_maskz(A, B, C, U, R)
+
+#define _mm512_fnmsub_round_pd(A, B, C, R)            \
+    (__m512d)__builtin_ia32_vfnmsubpd512_mask(A, B, C, -1, R)
+
+#define _mm512_mask_fnmsub_round_pd(A, U, B, C, R)    \
+    (__m512d)__builtin_ia32_vfnmsubpd512_mask(A, B, C, U, R)
+
+#define _mm512_mask3_fnmsub_round_pd(A, B, C, U, R)   \
+    (__m512d)__builtin_ia32_vfnmsubpd512_mask3(A, B, C, U, R)
+
+#define _mm512_maskz_fnmsub_round_pd(U, A, B, C, R)   \
+    (__m512d)__builtin_ia32_vfnmsubpd512_maskz(A, B, C, U, R)
+
+#define _mm512_fnmsub_round_ps(A, B, C, R)            \
+    (__m512)__builtin_ia32_vfnmsubps512_mask(A, B, C, -1, R)
+
+#define _mm512_mask_fnmsub_round_ps(A, U, B, C, R)    \
+    (__m512)__builtin_ia32_vfnmsubps512_mask(A, B, C, U, R)
+
+#define _mm512_mask3_fnmsub_round_ps(A, B, C, U, R)   \
+    (__m512)__builtin_ia32_vfnmsubps512_mask3(A, B, C, U, R)
+
+#define _mm512_maskz_fnmsub_round_ps(U, A, B, C, R)   \
+    (__m512)__builtin_ia32_vfnmsubps512_maskz(A, B, C, U, R)
+#endif
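
Whether they come from the inline definitions or from the fallback
macros above, the _round forms need the rounding/SAE operand to be a
compile-time constant.  A minimal usage sketch (the function name and
flag choice are mine; only the intrinsic comes from this header):

#include <immintrin.h>

__m512d
fma_round_to_zero (__m512d a, __m512d b, __m512d c)
{
  /* Fused multiply-add with explicit round-toward-zero and
     suppressed exceptions.  */
  return _mm512_fmadd_round_pd (a, b, c,
				_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
}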
+
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permute_pd (__m512d __X, const int __C)
+_mm512_abs_epi64 (__m512i __A)
 {
-  return (__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df) __X, __C,
-						     (__v8df)
-						     _mm512_undefined_pd (),
-						     (__mmask8) -1);
+  return (__m512i) __builtin_ia32_pabsq512_mask ((__v8di) __A,
+						 (__v8di)
+						 _mm512_undefined_epi32 (),
+						 (__mmask8) -1);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permute_pd (__m512d __W, __mmask8 __U, __m512d __X, const int __C)
+_mm512_mask_abs_epi64 (__m512i __W, __mmask8 __U, __m512i __A)
 {
-  return (__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df) __X, __C,
-						     (__v8df) __W,
-						     (__mmask8) __U);
+  return (__m512i) __builtin_ia32_pabsq512_mask ((__v8di) __A,
+						 (__v8di) __W,
+						 (__mmask8) __U);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permute_pd (__mmask8 __U, __m512d __X, const int __C)
+_mm512_maskz_abs_epi64 (__mmask8 __U, __m512i __A)
 {
-  return (__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df) __X, __C,
-						     (__v8df)
-						     _mm512_setzero_pd (),
-						     (__mmask8) __U);
+  return (__m512i) __builtin_ia32_pabsq512_mask ((__v8di) __A,
+						 (__v8di)
+						 _mm512_setzero_si512 (),
+						 (__mmask8) __U);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permute_ps (__m512 __X, const int __C)
+_mm512_abs_epi32 (__m512i __A)
 {
-  return (__m512) __builtin_ia32_vpermilps512_mask ((__v16sf) __X, __C,
-						    (__v16sf)
-						    _mm512_undefined_ps (),
-						    (__mmask16) -1);
+  return (__m512i) __builtin_ia32_pabsd512_mask ((__v16si) __A,
+						 (__v16si)
+						 _mm512_undefined_epi32 (),
+						 (__mmask16) -1);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permute_ps (__m512 __W, __mmask16 __U, __m512 __X, const int __C)
+_mm512_mask_abs_epi32 (__m512i __W, __mmask16 __U, __m512i __A)
 {
-  return (__m512) __builtin_ia32_vpermilps512_mask ((__v16sf) __X, __C,
-						    (__v16sf) __W,
-						    (__mmask16) __U);
+  return (__m512i) __builtin_ia32_pabsd512_mask ((__v16si) __A,
+						 (__v16si) __W,
+						 (__mmask16) __U);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permute_ps (__mmask16 __U, __m512 __X, const int __C)
+_mm512_maskz_abs_epi32 (__mmask16 __U, __m512i __A)
 {
-  return (__m512) __builtin_ia32_vpermilps512_mask ((__v16sf) __X, __C,
-						    (__v16sf)
-						    _mm512_setzero_ps (),
-						    (__mmask16) __U);
+  return (__m512i) __builtin_ia32_pabsd512_mask ((__v16si) __A,
+						 (__v16si)
+						 _mm512_setzero_si512 (),
+						 (__mmask16) __U);
 }
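
The three abs variants differ only in what masked-off lanes receive:
undefined contents, the pass-through operand __W, or zero.  A minimal
sketch of the zero-masking form (function name and mask are mine):

#include <immintrin.h>

__m512i
abs_even_lanes (__m512i v)
{
  /* |v| in the even lanes; odd lanes are zeroed, not passed through.  */
  return _mm512_maskz_abs_epi32 ((__mmask16) 0x5555, v);
}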
-#else
-#define _mm512_permute_pd(X, C)							    \
-  ((__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df)(__m512d)(X), (int)(C),	    \
-					      (__v8df)(__m512d)_mm512_undefined_pd(),\
-					      (__mmask8)(-1)))
-
-#define _mm512_mask_permute_pd(W, U, X, C)					    \
-  ((__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df)(__m512d)(X), (int)(C),	    \
-					      (__v8df)(__m512d)(W),		    \
-					      (__mmask8)(U)))
-
-#define _mm512_maskz_permute_pd(U, X, C)					    \
-  ((__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df)(__m512d)(X), (int)(C),	    \
-					      (__v8df)(__m512d)_mm512_setzero_pd(), \
-					      (__mmask8)(U)))
-
-#define _mm512_permute_ps(X, C)							    \
-  ((__m512) __builtin_ia32_vpermilps512_mask ((__v16sf)(__m512)(X), (int)(C),	    \
-					      (__v16sf)(__m512)_mm512_undefined_ps(),\
-					      (__mmask16)(-1)))
-
-#define _mm512_mask_permute_ps(W, U, X, C)					    \
-  ((__m512) __builtin_ia32_vpermilps512_mask ((__v16sf)(__m512)(X), (int)(C),	    \
-					      (__v16sf)(__m512)(W),		    \
-					      (__mmask16)(U)))
-
-#define _mm512_maskz_permute_ps(U, X, C)					    \
-  ((__m512) __builtin_ia32_vpermilps512_mask ((__v16sf)(__m512)(X), (int)(C),	    \
-					      (__v16sf)(__m512)_mm512_setzero_ps(), \
-					      (__mmask16)(U)))
-#endif
 
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permutex_epi64 (__m512i __X, const int __I)
+_mm512_broadcastss_ps (__m128 __A)
 {
-  return (__m512i) __builtin_ia32_permdi512_mask ((__v8di) __X, __I,
-						  (__v8di)
-						  _mm512_undefined_epi32 (),
-						  (__mmask8) (-1));
+  return (__m512) __builtin_ia32_broadcastss512 ((__v4sf) __A,
+						 (__v16sf)
+						 _mm512_undefined_ps (),
+						 (__mmask16) -1);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permutex_epi64 (__m512i __W, __mmask8 __M,
-			    __m512i __X, const int __I)
+_mm512_mask_broadcastss_ps (__m512 __O, __mmask16 __M, __m128 __A)
 {
-  return (__m512i) __builtin_ia32_permdi512_mask ((__v8di) __X, __I,
-						  (__v8di) __W,
-						  (__mmask8) __M);
+  return (__m512) __builtin_ia32_broadcastss512 ((__v4sf) __A,
+						 (__v16sf) __O, __M);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permutex_epi64 (__mmask8 __M, __m512i __X, const int __I)
+_mm512_maskz_broadcastss_ps (__mmask16 __M, __m128 __A)
 {
-  return (__m512i) __builtin_ia32_permdi512_mask ((__v8di) __X, __I,
-						  (__v8di)
-						  _mm512_setzero_si512 (),
-						  (__mmask8) __M);
+  return (__m512) __builtin_ia32_broadcastss512 ((__v4sf) __A,
+						 (__v16sf)
+						 _mm512_setzero_ps (),
+						 __M);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permutex_pd (__m512d __X, const int __M)
+_mm512_broadcastsd_pd (__m128d __A)
 {
-  return (__m512d) __builtin_ia32_permdf512_mask ((__v8df) __X, __M,
+  return (__m512d) __builtin_ia32_broadcastsd512 ((__v2df) __A,
 						  (__v8df)
 						  _mm512_undefined_pd (),
 						  (__mmask8) -1);
@@ -6951,1543 +7236,1580 @@ _mm512_permutex_pd (__m512d __X, const int __M)
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permutex_pd (__m512d __W, __mmask8 __U, __m512d __X, const int __M)
+_mm512_mask_broadcastsd_pd (__m512d __O, __mmask8 __M, __m128d __A)
 {
-  return (__m512d) __builtin_ia32_permdf512_mask ((__v8df) __X, __M,
-						  (__v8df) __W,
-						  (__mmask8) __U);
+  return (__m512d) __builtin_ia32_broadcastsd512 ((__v2df) __A,
+						  (__v8df) __O, __M);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permutex_pd (__mmask8 __U, __m512d __X, const int __M)
+_mm512_maskz_broadcastsd_pd (__mmask8 __M, __m128d __A)
 {
-  return (__m512d) __builtin_ia32_permdf512_mask ((__v8df) __X, __M,
+  return (__m512d) __builtin_ia32_broadcastsd512 ((__v2df) __A,
 						  (__v8df)
 						  _mm512_setzero_pd (),
-						  (__mmask8) __U);
+						  __M);
 }
-#else
-#define _mm512_permutex_pd(X, M)						\
-  ((__m512d) __builtin_ia32_permdf512_mask ((__v8df)(__m512d)(X), (int)(M),	\
-					    (__v8df)(__m512d)_mm512_undefined_pd(),\
-					    (__mmask8)-1))
-
-#define _mm512_mask_permutex_pd(W, U, X, M)					\
-  ((__m512d) __builtin_ia32_permdf512_mask ((__v8df)(__m512d)(X), (int)(M),	\
-					    (__v8df)(__m512d)(W), (__mmask8)(U)))
 
-#define _mm512_maskz_permutex_pd(U, X, M)					\
-  ((__m512d) __builtin_ia32_permdf512_mask ((__v8df)(__m512d)(X), (int)(M),	\
-					    (__v8df)(__m512d)_mm512_setzero_pd(),\
-					    (__mmask8)(U)))
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_broadcastd_epi32 (__m128i __A)
+{
+  return (__m512i) __builtin_ia32_pbroadcastd512 ((__v4si) __A,
+						  (__v16si)
+						  _mm512_undefined_epi32 (),
+						  (__mmask16) -1);
+}
 
-#define _mm512_permutex_epi64(X, I)			          \
-  ((__m512i) __builtin_ia32_permdi512_mask ((__v8di)(__m512i)(X), \
-					    (int)(I),             \
-					    (__v8di)(__m512i)	  \
-					    (_mm512_undefined_epi32 ()),\
-					    (__mmask8)(-1)))
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_broadcastd_epi32 (__m512i __O, __mmask16 __M, __m128i __A)
+{
+  return (__m512i) __builtin_ia32_pbroadcastd512 ((__v4si) __A,
+						  (__v16si) __O, __M);
+}
 
-#define _mm512_maskz_permutex_epi64(M, X, I)                 \
-  ((__m512i) __builtin_ia32_permdi512_mask ((__v8di)(__m512i)(X), \
-					    (int)(I),             \
-					    (__v8di)(__m512i)     \
-					    (_mm512_setzero_si512 ()),\
-					    (__mmask8)(M)))
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_broadcastd_epi32 (__mmask16 __M, __m128i __A)
+{
+  return (__m512i) __builtin_ia32_pbroadcastd512 ((__v4si) __A,
+						  (__v16si)
+						  _mm512_setzero_si512 (),
+						  __M);
+}
 
-#define _mm512_mask_permutex_epi64(W, M, X, I)               \
-  ((__m512i) __builtin_ia32_permdi512_mask ((__v8di)(__m512i)(X), \
-					    (int)(I),             \
-					    (__v8di)(__m512i)(W), \
-					    (__mmask8)(M)))
-#endif
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set1_epi32 (int __A)
+{
+  return (__m512i)(__v16si)
+    { __A, __A, __A, __A, __A, __A, __A, __A,
+      __A, __A, __A, __A, __A, __A, __A, __A };
+}
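
Note that the unmasked _mm512_set1_epi32 above is a plain vector
constructor, so calls with constant arguments can fold at compile time;
the masked variants below go through the GPR-broadcast builtin.  A
small sketch of the merge-masking form (names are mine):

#include <immintrin.h>

__m512i
splat_into_selected (__m512i old, __mmask16 m, int x)
{
  /* Broadcast X into the lanes selected by M, keeping OLD elsewhere.  */
  return _mm512_mask_set1_epi32 (old, m, x);
}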
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permutexvar_epi64 (__mmask8 __M, __m512i __X, __m512i __Y)
+_mm512_mask_set1_epi32 (__m512i __O, __mmask16 __M, int __A)
 {
-  return (__m512i) __builtin_ia32_permvardi512_mask ((__v8di) __Y,
-						     (__v8di) __X,
-						     (__v8di)
-						     _mm512_setzero_si512 (),
-						     __M);
+  return (__m512i) __builtin_ia32_pbroadcastd512_gpr_mask (__A, (__v16si) __O,
+							   __M);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permutexvar_epi64 (__m512i __X, __m512i __Y)
+_mm512_maskz_set1_epi32 (__mmask16 __M, int __A)
 {
-  return (__m512i) __builtin_ia32_permvardi512_mask ((__v8di) __Y,
-						     (__v8di) __X,
-						     (__v8di)
-						     _mm512_undefined_epi32 (),
-						     (__mmask8) -1);
+  return (__m512i)
+	 __builtin_ia32_pbroadcastd512_gpr_mask (__A,
+						 (__v16si) _mm512_setzero_si512 (),
+						 __M);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permutexvar_epi64 (__m512i __W, __mmask8 __M, __m512i __X,
-			       __m512i __Y)
+_mm512_broadcastq_epi64 (__m128i __A)
 {
-  return (__m512i) __builtin_ia32_permvardi512_mask ((__v8di) __Y,
-						     (__v8di) __X,
-						     (__v8di) __W,
-						     __M);
+  return (__m512i) __builtin_ia32_pbroadcastq512 ((__v2di) __A,
+						  (__v8di)
+						  _mm512_undefined_epi32 (),
+						  (__mmask8) -1);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permutexvar_epi32 (__mmask16 __M, __m512i __X, __m512i __Y)
+_mm512_mask_broadcastq_epi64 (__m512i __O, __mmask8 __M, __m128i __A)
 {
-  return (__m512i) __builtin_ia32_permvarsi512_mask ((__v16si) __Y,
-						     (__v16si) __X,
-						     (__v16si)
-						     _mm512_setzero_si512 (),
-						     __M);
+  return (__m512i) __builtin_ia32_pbroadcastq512 ((__v2di) __A,
+						  (__v8di) __O, __M);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permutexvar_epi32 (__m512i __X, __m512i __Y)
+_mm512_maskz_broadcastq_epi64 (__mmask8 __M, __m128i __A)
 {
-  return (__m512i) __builtin_ia32_permvarsi512_mask ((__v16si) __Y,
-						     (__v16si) __X,
-						     (__v16si)
-						     _mm512_undefined_epi32 (),
-						     (__mmask16) -1);
+  return (__m512i) __builtin_ia32_pbroadcastq512 ((__v2di) __A,
+						  (__v8di)
+						  _mm512_setzero_si512 (),
+						  __M);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permutexvar_epi32 (__m512i __W, __mmask16 __M, __m512i __X,
-			       __m512i __Y)
+_mm512_set1_epi64 (long long __A)
 {
-  return (__m512i) __builtin_ia32_permvarsi512_mask ((__v16si) __Y,
-						     (__v16si) __X,
-						     (__v16si) __W,
-						     __M);
+  return (__m512i)(__v8di) { __A, __A, __A, __A, __A, __A, __A, __A };
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permutexvar_pd (__m512i __X, __m512d __Y)
+_mm512_mask_set1_epi64 (__m512i __O, __mmask8 __M, long long __A)
 {
-  return (__m512d) __builtin_ia32_permvardf512_mask ((__v8df) __Y,
-						     (__v8di) __X,
-						     (__v8df)
-						     _mm512_undefined_pd (),
-						     (__mmask8) -1);
+  return (__m512i) __builtin_ia32_pbroadcastq512_gpr_mask (__A, (__v8di) __O,
+							   __M);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permutexvar_pd (__m512d __W, __mmask8 __U, __m512i __X, __m512d __Y)
+_mm512_maskz_set1_epi64 (__mmask8 __M, long long __A)
 {
-  return (__m512d) __builtin_ia32_permvardf512_mask ((__v8df) __Y,
-						     (__v8di) __X,
-						     (__v8df) __W,
-						     (__mmask8) __U);
+  return (__m512i)
+	 __builtin_ia32_pbroadcastq512_gpr_mask (__A,
+						 (__v8di) _mm512_setzero_si512 (),
+						 __M);
 }
 
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permutexvar_pd (__mmask8 __U, __m512i __X, __m512d __Y)
+_mm512_broadcast_f32x4 (__m128 __A)
 {
-  return (__m512d) __builtin_ia32_permvardf512_mask ((__v8df) __Y,
-						     (__v8di) __X,
-						     (__v8df)
-						     _mm512_setzero_pd (),
-						     (__mmask8) __U);
+  return (__m512) __builtin_ia32_broadcastf32x4_512 ((__v4sf) __A,
+						     (__v16sf)
+						     _mm512_undefined_ps (),
+						     (__mmask16) -1);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permutexvar_ps (__m512i __X, __m512 __Y)
+_mm512_mask_broadcast_f32x4 (__m512 __O, __mmask16 __M, __m128 __A)
 {
-  return (__m512) __builtin_ia32_permvarsf512_mask ((__v16sf) __Y,
-						    (__v16si) __X,
-						    (__v16sf)
-						    _mm512_undefined_ps (),
-						    (__mmask16) -1);
+  return (__m512) __builtin_ia32_broadcastf32x4_512 ((__v4sf) __A,
+						     (__v16sf) __O,
+						     __M);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permutexvar_ps (__m512 __W, __mmask16 __U, __m512i __X, __m512 __Y)
+_mm512_maskz_broadcast_f32x4 (__mmask16 __M, __m128 __A)
 {
-  return (__m512) __builtin_ia32_permvarsf512_mask ((__v16sf) __Y,
-						    (__v16si) __X,
-						    (__v16sf) __W,
-						    (__mmask16) __U);
+  return (__m512) __builtin_ia32_broadcastf32x4_512 ((__v4sf) __A,
+						     (__v16sf)
+						     _mm512_setzero_ps (),
+						     __M);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permutexvar_ps (__mmask16 __U, __m512i __X, __m512 __Y)
+_mm512_broadcast_i32x4 (__m128i __A)
 {
-  return (__m512) __builtin_ia32_permvarsf512_mask ((__v16sf) __Y,
-						    (__v16si) __X,
-						    (__v16sf)
-						    _mm512_setzero_ps (),
-						    (__mmask16) __U);
+  return (__m512i) __builtin_ia32_broadcasti32x4_512 ((__v4si) __A,
+						      (__v16si)
+						      _mm512_undefined_epi32 (),
+						      (__mmask16) -1);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_shuffle_ps (__m512 __M, __m512 __V, const int __imm)
+_mm512_mask_broadcast_i32x4 (__m512i __O, __mmask16 __M, __m128i __A)
 {
-  return (__m512) __builtin_ia32_shufps512_mask ((__v16sf) __M,
-						 (__v16sf) __V, __imm,
-						 (__v16sf)
-						 _mm512_undefined_ps (),
-						 (__mmask16) -1);
+  return (__m512i) __builtin_ia32_broadcasti32x4_512 ((__v4si) __A,
+						      (__v16si) __O,
+						      __M);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_shuffle_ps (__m512 __W, __mmask16 __U, __m512 __M,
-			__m512 __V, const int __imm)
+_mm512_maskz_broadcast_i32x4 (__mmask16 __M, __m128i __A)
 {
-  return (__m512) __builtin_ia32_shufps512_mask ((__v16sf) __M,
-						 (__v16sf) __V, __imm,
-						 (__v16sf) __W,
-						 (__mmask16) __U);
+  return (__m512i) __builtin_ia32_broadcasti32x4_512 ((__v4si) __A,
+						      (__v16si)
+						      _mm512_setzero_si512 (),
+						      __M);
 }
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_shuffle_ps (__mmask16 __U, __m512 __M, __m512 __V, const int __imm)
+_mm512_broadcast_f64x4 (__m256d __A)
 {
-  return (__m512) __builtin_ia32_shufps512_mask ((__v16sf) __M,
-						 (__v16sf) __V, __imm,
-						 (__v16sf)
-						 _mm512_setzero_ps (),
-						 (__mmask16) __U);
+  return (__m512d) __builtin_ia32_broadcastf64x4_512 ((__v4df) __A,
+						      (__v8df)
+						      _mm512_undefined_pd (),
+						      (__mmask8) -1);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_shuffle_pd (__m512d __M, __m512d __V, const int __imm)
+_mm512_mask_broadcast_f64x4 (__m512d __O, __mmask8 __M, __m256d __A)
 {
-  return (__m512d) __builtin_ia32_shufpd512_mask ((__v8df) __M,
-						  (__v8df) __V, __imm,
-						  (__v8df)
-						  _mm512_undefined_pd (),
-						  (__mmask8) -1);
+  return (__m512d) __builtin_ia32_broadcastf64x4_512 ((__v4df) __A,
+						      (__v8df) __O,
+						      __M);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_shuffle_pd (__m512d __W, __mmask8 __U, __m512d __M,
-			__m512d __V, const int __imm)
+_mm512_maskz_broadcast_f64x4 (__mmask8 __M, __m256d __A)
 {
-  return (__m512d) __builtin_ia32_shufpd512_mask ((__v8df) __M,
-						  (__v8df) __V, __imm,
-						  (__v8df) __W,
-						  (__mmask8) __U);
+  return (__m512d) __builtin_ia32_broadcastf64x4_512 ((__v4df) __A,
+						      (__v8df)
+						      _mm512_setzero_pd (),
+						      __M);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_shuffle_pd (__mmask8 __U, __m512d __M, __m512d __V,
-			 const int __imm)
+_mm512_broadcast_i64x4 (__m256i __A)
 {
-  return (__m512d) __builtin_ia32_shufpd512_mask ((__v8df) __M,
-						  (__v8df) __V, __imm,
-						  (__v8df)
-						  _mm512_setzero_pd (),
-						  (__mmask8) __U);
+  return (__m512i) __builtin_ia32_broadcasti64x4_512 ((__v4di) __A,
+						      (__v8di)
+						      _mm512_undefined_epi32 (),
+						      (__mmask8) -1);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fixupimm_round_pd (__m512d __A, __m512d __B, __m512i __C,
-			  const int __imm, const int __R)
+_mm512_mask_broadcast_i64x4 (__m512i __O, __mmask8 __M, __m256i __A)
 {
-  return (__m512d) __builtin_ia32_fixupimmpd512_mask ((__v8df) __A,
-						      (__v8df) __B,
-						      (__v8di) __C,
-						      __imm,
-						      (__mmask8) -1, __R);
+  return (__m512i) __builtin_ia32_broadcasti64x4_512 ((__v4di) __A,
+						      (__v8di) __O,
+						      __M);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fixupimm_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
-			       __m512i __C, const int __imm, const int __R)
+_mm512_maskz_broadcast_i64x4 (__mmask8 __M, __m256i __A)
 {
-  return (__m512d) __builtin_ia32_fixupimmpd512_mask ((__v8df) __A,
-						      (__v8df) __B,
-						      (__v8di) __C,
-						      __imm,
-						      (__mmask8) __U, __R);
+  return (__m512i) __builtin_ia32_broadcasti64x4_512 ((__v4di) __A,
+						      (__v8di)
+						      _mm512_setzero_si512 (),
+						      __M);
 }
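
The broadcast_*x4 intrinsics replicate a 128- or 256-bit block across
the whole zmm register.  A minimal sketch (function name is mine):

#include <immintrin.h>

__m512d
tile_quad (__m256d q)
{
  /* Result holds Q twice: { q[0..3], q[0..3] }.  */
  return _mm512_broadcast_f64x4 (q);
}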
 
-extern __inline __m512d
+typedef enum
+{
+  _MM_PERM_AAAA = 0x00, _MM_PERM_AAAB = 0x01, _MM_PERM_AAAC = 0x02,
+  _MM_PERM_AAAD = 0x03, _MM_PERM_AABA = 0x04, _MM_PERM_AABB = 0x05,
+  _MM_PERM_AABC = 0x06, _MM_PERM_AABD = 0x07, _MM_PERM_AACA = 0x08,
+  _MM_PERM_AACB = 0x09, _MM_PERM_AACC = 0x0A, _MM_PERM_AACD = 0x0B,
+  _MM_PERM_AADA = 0x0C, _MM_PERM_AADB = 0x0D, _MM_PERM_AADC = 0x0E,
+  _MM_PERM_AADD = 0x0F, _MM_PERM_ABAA = 0x10, _MM_PERM_ABAB = 0x11,
+  _MM_PERM_ABAC = 0x12, _MM_PERM_ABAD = 0x13, _MM_PERM_ABBA = 0x14,
+  _MM_PERM_ABBB = 0x15, _MM_PERM_ABBC = 0x16, _MM_PERM_ABBD = 0x17,
+  _MM_PERM_ABCA = 0x18, _MM_PERM_ABCB = 0x19, _MM_PERM_ABCC = 0x1A,
+  _MM_PERM_ABCD = 0x1B, _MM_PERM_ABDA = 0x1C, _MM_PERM_ABDB = 0x1D,
+  _MM_PERM_ABDC = 0x1E, _MM_PERM_ABDD = 0x1F, _MM_PERM_ACAA = 0x20,
+  _MM_PERM_ACAB = 0x21, _MM_PERM_ACAC = 0x22, _MM_PERM_ACAD = 0x23,
+  _MM_PERM_ACBA = 0x24, _MM_PERM_ACBB = 0x25, _MM_PERM_ACBC = 0x26,
+  _MM_PERM_ACBD = 0x27, _MM_PERM_ACCA = 0x28, _MM_PERM_ACCB = 0x29,
+  _MM_PERM_ACCC = 0x2A, _MM_PERM_ACCD = 0x2B, _MM_PERM_ACDA = 0x2C,
+  _MM_PERM_ACDB = 0x2D, _MM_PERM_ACDC = 0x2E, _MM_PERM_ACDD = 0x2F,
+  _MM_PERM_ADAA = 0x30, _MM_PERM_ADAB = 0x31, _MM_PERM_ADAC = 0x32,
+  _MM_PERM_ADAD = 0x33, _MM_PERM_ADBA = 0x34, _MM_PERM_ADBB = 0x35,
+  _MM_PERM_ADBC = 0x36, _MM_PERM_ADBD = 0x37, _MM_PERM_ADCA = 0x38,
+  _MM_PERM_ADCB = 0x39, _MM_PERM_ADCC = 0x3A, _MM_PERM_ADCD = 0x3B,
+  _MM_PERM_ADDA = 0x3C, _MM_PERM_ADDB = 0x3D, _MM_PERM_ADDC = 0x3E,
+  _MM_PERM_ADDD = 0x3F, _MM_PERM_BAAA = 0x40, _MM_PERM_BAAB = 0x41,
+  _MM_PERM_BAAC = 0x42, _MM_PERM_BAAD = 0x43, _MM_PERM_BABA = 0x44,
+  _MM_PERM_BABB = 0x45, _MM_PERM_BABC = 0x46, _MM_PERM_BABD = 0x47,
+  _MM_PERM_BACA = 0x48, _MM_PERM_BACB = 0x49, _MM_PERM_BACC = 0x4A,
+  _MM_PERM_BACD = 0x4B, _MM_PERM_BADA = 0x4C, _MM_PERM_BADB = 0x4D,
+  _MM_PERM_BADC = 0x4E, _MM_PERM_BADD = 0x4F, _MM_PERM_BBAA = 0x50,
+  _MM_PERM_BBAB = 0x51, _MM_PERM_BBAC = 0x52, _MM_PERM_BBAD = 0x53,
+  _MM_PERM_BBBA = 0x54, _MM_PERM_BBBB = 0x55, _MM_PERM_BBBC = 0x56,
+  _MM_PERM_BBBD = 0x57, _MM_PERM_BBCA = 0x58, _MM_PERM_BBCB = 0x59,
+  _MM_PERM_BBCC = 0x5A, _MM_PERM_BBCD = 0x5B, _MM_PERM_BBDA = 0x5C,
+  _MM_PERM_BBDB = 0x5D, _MM_PERM_BBDC = 0x5E, _MM_PERM_BBDD = 0x5F,
+  _MM_PERM_BCAA = 0x60, _MM_PERM_BCAB = 0x61, _MM_PERM_BCAC = 0x62,
+  _MM_PERM_BCAD = 0x63, _MM_PERM_BCBA = 0x64, _MM_PERM_BCBB = 0x65,
+  _MM_PERM_BCBC = 0x66, _MM_PERM_BCBD = 0x67, _MM_PERM_BCCA = 0x68,
+  _MM_PERM_BCCB = 0x69, _MM_PERM_BCCC = 0x6A, _MM_PERM_BCCD = 0x6B,
+  _MM_PERM_BCDA = 0x6C, _MM_PERM_BCDB = 0x6D, _MM_PERM_BCDC = 0x6E,
+  _MM_PERM_BCDD = 0x6F, _MM_PERM_BDAA = 0x70, _MM_PERM_BDAB = 0x71,
+  _MM_PERM_BDAC = 0x72, _MM_PERM_BDAD = 0x73, _MM_PERM_BDBA = 0x74,
+  _MM_PERM_BDBB = 0x75, _MM_PERM_BDBC = 0x76, _MM_PERM_BDBD = 0x77,
+  _MM_PERM_BDCA = 0x78, _MM_PERM_BDCB = 0x79, _MM_PERM_BDCC = 0x7A,
+  _MM_PERM_BDCD = 0x7B, _MM_PERM_BDDA = 0x7C, _MM_PERM_BDDB = 0x7D,
+  _MM_PERM_BDDC = 0x7E, _MM_PERM_BDDD = 0x7F, _MM_PERM_CAAA = 0x80,
+  _MM_PERM_CAAB = 0x81, _MM_PERM_CAAC = 0x82, _MM_PERM_CAAD = 0x83,
+  _MM_PERM_CABA = 0x84, _MM_PERM_CABB = 0x85, _MM_PERM_CABC = 0x86,
+  _MM_PERM_CABD = 0x87, _MM_PERM_CACA = 0x88, _MM_PERM_CACB = 0x89,
+  _MM_PERM_CACC = 0x8A, _MM_PERM_CACD = 0x8B, _MM_PERM_CADA = 0x8C,
+  _MM_PERM_CADB = 0x8D, _MM_PERM_CADC = 0x8E, _MM_PERM_CADD = 0x8F,
+  _MM_PERM_CBAA = 0x90, _MM_PERM_CBAB = 0x91, _MM_PERM_CBAC = 0x92,
+  _MM_PERM_CBAD = 0x93, _MM_PERM_CBBA = 0x94, _MM_PERM_CBBB = 0x95,
+  _MM_PERM_CBBC = 0x96, _MM_PERM_CBBD = 0x97, _MM_PERM_CBCA = 0x98,
+  _MM_PERM_CBCB = 0x99, _MM_PERM_CBCC = 0x9A, _MM_PERM_CBCD = 0x9B,
+  _MM_PERM_CBDA = 0x9C, _MM_PERM_CBDB = 0x9D, _MM_PERM_CBDC = 0x9E,
+  _MM_PERM_CBDD = 0x9F, _MM_PERM_CCAA = 0xA0, _MM_PERM_CCAB = 0xA1,
+  _MM_PERM_CCAC = 0xA2, _MM_PERM_CCAD = 0xA3, _MM_PERM_CCBA = 0xA4,
+  _MM_PERM_CCBB = 0xA5, _MM_PERM_CCBC = 0xA6, _MM_PERM_CCBD = 0xA7,
+  _MM_PERM_CCCA = 0xA8, _MM_PERM_CCCB = 0xA9, _MM_PERM_CCCC = 0xAA,
+  _MM_PERM_CCCD = 0xAB, _MM_PERM_CCDA = 0xAC, _MM_PERM_CCDB = 0xAD,
+  _MM_PERM_CCDC = 0xAE, _MM_PERM_CCDD = 0xAF, _MM_PERM_CDAA = 0xB0,
+  _MM_PERM_CDAB = 0xB1, _MM_PERM_CDAC = 0xB2, _MM_PERM_CDAD = 0xB3,
+  _MM_PERM_CDBA = 0xB4, _MM_PERM_CDBB = 0xB5, _MM_PERM_CDBC = 0xB6,
+  _MM_PERM_CDBD = 0xB7, _MM_PERM_CDCA = 0xB8, _MM_PERM_CDCB = 0xB9,
+  _MM_PERM_CDCC = 0xBA, _MM_PERM_CDCD = 0xBB, _MM_PERM_CDDA = 0xBC,
+  _MM_PERM_CDDB = 0xBD, _MM_PERM_CDDC = 0xBE, _MM_PERM_CDDD = 0xBF,
+  _MM_PERM_DAAA = 0xC0, _MM_PERM_DAAB = 0xC1, _MM_PERM_DAAC = 0xC2,
+  _MM_PERM_DAAD = 0xC3, _MM_PERM_DABA = 0xC4, _MM_PERM_DABB = 0xC5,
+  _MM_PERM_DABC = 0xC6, _MM_PERM_DABD = 0xC7, _MM_PERM_DACA = 0xC8,
+  _MM_PERM_DACB = 0xC9, _MM_PERM_DACC = 0xCA, _MM_PERM_DACD = 0xCB,
+  _MM_PERM_DADA = 0xCC, _MM_PERM_DADB = 0xCD, _MM_PERM_DADC = 0xCE,
+  _MM_PERM_DADD = 0xCF, _MM_PERM_DBAA = 0xD0, _MM_PERM_DBAB = 0xD1,
+  _MM_PERM_DBAC = 0xD2, _MM_PERM_DBAD = 0xD3, _MM_PERM_DBBA = 0xD4,
+  _MM_PERM_DBBB = 0xD5, _MM_PERM_DBBC = 0xD6, _MM_PERM_DBBD = 0xD7,
+  _MM_PERM_DBCA = 0xD8, _MM_PERM_DBCB = 0xD9, _MM_PERM_DBCC = 0xDA,
+  _MM_PERM_DBCD = 0xDB, _MM_PERM_DBDA = 0xDC, _MM_PERM_DBDB = 0xDD,
+  _MM_PERM_DBDC = 0xDE, _MM_PERM_DBDD = 0xDF, _MM_PERM_DCAA = 0xE0,
+  _MM_PERM_DCAB = 0xE1, _MM_PERM_DCAC = 0xE2, _MM_PERM_DCAD = 0xE3,
+  _MM_PERM_DCBA = 0xE4, _MM_PERM_DCBB = 0xE5, _MM_PERM_DCBC = 0xE6,
+  _MM_PERM_DCBD = 0xE7, _MM_PERM_DCCA = 0xE8, _MM_PERM_DCCB = 0xE9,
+  _MM_PERM_DCCC = 0xEA, _MM_PERM_DCCD = 0xEB, _MM_PERM_DCDA = 0xEC,
+  _MM_PERM_DCDB = 0xED, _MM_PERM_DCDC = 0xEE, _MM_PERM_DCDD = 0xEF,
+  _MM_PERM_DDAA = 0xF0, _MM_PERM_DDAB = 0xF1, _MM_PERM_DDAC = 0xF2,
+  _MM_PERM_DDAD = 0xF3, _MM_PERM_DDBA = 0xF4, _MM_PERM_DDBB = 0xF5,
+  _MM_PERM_DDBC = 0xF6, _MM_PERM_DDBD = 0xF7, _MM_PERM_DDCA = 0xF8,
+  _MM_PERM_DDCB = 0xF9, _MM_PERM_DDCC = 0xFA, _MM_PERM_DDCD = 0xFB,
+  _MM_PERM_DDDA = 0xFC, _MM_PERM_DDDB = 0xFD, _MM_PERM_DDDC = 0xFE,
+  _MM_PERM_DDDD = 0xFF
+} _MM_PERM_ENUM;
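
Each _MM_PERM_ENUM value packs four 2-bit source indices, with A..D
naming elements 0..3 and the last letter of the name selecting
destination element 0 (so _MM_PERM_DCBA, 0xE4, is the identity).  A
usage sketch against the shuffle below (function name is mine):

#include <immintrin.h>

__m512i
reverse_within_lanes (__m512i v)
{
  /* 0x1B: element 0 takes source element 3, element 1 takes 2, and
     so on, independently in each 128-bit lane.  */
  return _mm512_shuffle_epi32 (v, _MM_PERM_ABCD);
}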
+
+#ifdef __OPTIMIZE__
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fixupimm_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
-				__m512i __C, const int __imm, const int __R)
+_mm512_shuffle_epi32 (__m512i __A, _MM_PERM_ENUM __mask)
 {
-  return (__m512d) __builtin_ia32_fixupimmpd512_maskz ((__v8df) __A,
-						       (__v8df) __B,
-						       (__v8di) __C,
-						       __imm,
-						       (__mmask8) __U, __R);
+  return (__m512i) __builtin_ia32_pshufd512_mask ((__v16si) __A,
+						  __mask,
+						  (__v16si)
+						  _mm512_undefined_epi32 (),
+						  (__mmask16) -1);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fixupimm_round_ps (__m512 __A, __m512 __B, __m512i __C,
-			  const int __imm, const int __R)
+_mm512_mask_shuffle_epi32 (__m512i __W, __mmask16 __U, __m512i __A,
+			   _MM_PERM_ENUM __mask)
 {
-  return (__m512) __builtin_ia32_fixupimmps512_mask ((__v16sf) __A,
-						     (__v16sf) __B,
-						     (__v16si) __C,
-						     __imm,
-						     (__mmask16) -1, __R);
+  return (__m512i) __builtin_ia32_pshufd512_mask ((__v16si) __A,
+						  __mask,
+						  (__v16si) __W,
+						  (__mmask16) __U);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fixupimm_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
-			       __m512i __C, const int __imm, const int __R)
+_mm512_maskz_shuffle_epi32 (__mmask16 __U, __m512i __A, _MM_PERM_ENUM __mask)
 {
-  return (__m512) __builtin_ia32_fixupimmps512_mask ((__v16sf) __A,
-						     (__v16sf) __B,
-						     (__v16si) __C,
-						     __imm,
-						     (__mmask16) __U, __R);
+  return (__m512i) __builtin_ia32_pshufd512_mask ((__v16si) __A,
+						  __mask,
+						  (__v16si)
+						  _mm512_setzero_si512 (),
+						  (__mmask16) __U);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fixupimm_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
-				__m512i __C, const int __imm, const int __R)
+_mm512_shuffle_i64x2 (__m512i __A, __m512i __B, const int __imm)
 {
-  return (__m512) __builtin_ia32_fixupimmps512_maskz ((__v16sf) __A,
-						      (__v16sf) __B,
-						      (__v16si) __C,
-						      __imm,
-						      (__mmask16) __U, __R);
+  return (__m512i) __builtin_ia32_shuf_i64x2_mask ((__v8di) __A,
+						   (__v8di) __B, __imm,
+						   (__v8di)
+						   _mm512_undefined_epi32 (),
+						   (__mmask8) -1);
 }
 
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fixupimm_round_sd (__m128d __A, __m128d __B, __m128i __C,
-		       const int __imm, const int __R)
+_mm512_mask_shuffle_i64x2 (__m512i __W, __mmask8 __U, __m512i __A,
+			   __m512i __B, const int __imm)
 {
-  return (__m128d) __builtin_ia32_fixupimmsd_mask ((__v2df) __A,
-						   (__v2df) __B,
-						   (__v2di) __C, __imm,
-						   (__mmask8) -1, __R);
+  return (__m512i) __builtin_ia32_shuf_i64x2_mask ((__v8di) __A,
+						   (__v8di) __B, __imm,
+						   (__v8di) __W,
+						   (__mmask8) __U);
 }
 
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fixupimm_round_sd (__m128d __A, __mmask8 __U, __m128d __B,
-			    __m128i __C, const int __imm, const int __R)
+_mm512_maskz_shuffle_i64x2 (__mmask8 __U, __m512i __A, __m512i __B,
+			    const int __imm)
 {
-  return (__m128d) __builtin_ia32_fixupimmsd_mask ((__v2df) __A,
-						   (__v2df) __B,
-						   (__v2di) __C, __imm,
-						   (__mmask8) __U, __R);
+  return (__m512i) __builtin_ia32_shuf_i64x2_mask ((__v8di) __A,
+						   (__v8di) __B, __imm,
+						   (__v8di)
+						   _mm512_setzero_si512 (),
+						   (__mmask8) __U);
 }
 
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fixupimm_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
-			     __m128i __C, const int __imm, const int __R)
+_mm512_shuffle_i32x4 (__m512i __A, __m512i __B, const int __imm)
 {
-  return (__m128d) __builtin_ia32_fixupimmsd_maskz ((__v2df) __A,
-						    (__v2df) __B,
-						    (__v2di) __C,
-						    __imm,
-						    (__mmask8) __U, __R);
+  return (__m512i) __builtin_ia32_shuf_i32x4_mask ((__v16si) __A,
+						   (__v16si) __B,
+						   __imm,
+						   (__v16si)
+						   _mm512_undefined_epi32 (),
+						   (__mmask16) -1);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fixupimm_round_ss (__m128 __A, __m128 __B, __m128i __C,
-		       const int __imm, const int __R)
+_mm512_mask_shuffle_i32x4 (__m512i __W, __mmask16 __U, __m512i __A,
+			   __m512i __B, const int __imm)
 {
-  return (__m128) __builtin_ia32_fixupimmss_mask ((__v4sf) __A,
-						  (__v4sf) __B,
-						  (__v4si) __C, __imm,
-						  (__mmask8) -1, __R);
+  return (__m512i) __builtin_ia32_shuf_i32x4_mask ((__v16si) __A,
+						   (__v16si) __B,
+						   __imm,
+						   (__v16si) __W,
+						   (__mmask16) __U);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fixupimm_round_ss (__m128 __A, __mmask8 __U, __m128 __B,
-			    __m128i __C, const int __imm, const int __R)
+_mm512_maskz_shuffle_i32x4 (__mmask16 __U, __m512i __A, __m512i __B,
+			    const int __imm)
 {
-  return (__m128) __builtin_ia32_fixupimmss_mask ((__v4sf) __A,
-						  (__v4sf) __B,
-						  (__v4si) __C, __imm,
-						  (__mmask8) __U, __R);
+  return (__m512i) __builtin_ia32_shuf_i32x4_mask ((__v16si) __A,
+						   (__v16si) __B,
+						   __imm,
+						   (__v16si)
+						   _mm512_setzero_si512 (),
+						   (__mmask16) __U);
 }
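
The *_i64x2/_i32x4 shuffles (and the f64x2 form below) move whole
128-bit blocks: the two low imm8 fields select blocks from __A for the
low half of the result, the two high fields select from __B for the
high half.  A sketch, assuming my reading of that encoding:

#include <immintrin.h>

__m512i
concat_low_halves (__m512i a, __m512i b)
{
  /* 0x44: blocks {A0, A1, B0, B1}, i.e. the low 256 bits of A
     followed by the low 256 bits of B.  */
  return _mm512_shuffle_i32x4 (a, b, 0x44);
}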
 
-extern __inline __m128
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fixupimm_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
-			     __m128i __C, const int __imm, const int __R)
+_mm512_shuffle_f64x2 (__m512d __A, __m512d __B, const int __imm)
 {
-  return (__m128) __builtin_ia32_fixupimmss_maskz ((__v4sf) __A,
-						   (__v4sf) __B,
-						   (__v4si) __C, __imm,
-						   (__mmask8) __U, __R);
+  return (__m512d) __builtin_ia32_shuf_f64x2_mask ((__v8df) __A,
+						   (__v8df) __B, __imm,
+						   (__v8df)
+						   _mm512_undefined_pd (),
+						   (__mmask8) -1);
 }
 
-#else
-#define _mm512_shuffle_pd(X, Y, C)                                      \
-    ((__m512d)__builtin_ia32_shufpd512_mask ((__v8df)(__m512d)(X),           \
-        (__v8df)(__m512d)(Y), (int)(C),\
-    (__v8df)(__m512d)_mm512_undefined_pd(),\
-    (__mmask8)-1))
-
-#define _mm512_mask_shuffle_pd(W, U, X, Y, C)                           \
-    ((__m512d)__builtin_ia32_shufpd512_mask ((__v8df)(__m512d)(X),           \
-        (__v8df)(__m512d)(Y), (int)(C),\
-    (__v8df)(__m512d)(W),\
-    (__mmask8)(U)))
-
-#define _mm512_maskz_shuffle_pd(U, X, Y, C)                             \
-    ((__m512d)__builtin_ia32_shufpd512_mask ((__v8df)(__m512d)(X),           \
-        (__v8df)(__m512d)(Y), (int)(C),\
-    (__v8df)(__m512d)_mm512_setzero_pd(),\
-    (__mmask8)(U)))
-
-#define _mm512_shuffle_ps(X, Y, C)                                      \
-    ((__m512)__builtin_ia32_shufps512_mask ((__v16sf)(__m512)(X),            \
-        (__v16sf)(__m512)(Y), (int)(C),\
-    (__v16sf)(__m512)_mm512_undefined_ps(),\
-    (__mmask16)-1))
-
-#define _mm512_mask_shuffle_ps(W, U, X, Y, C)                           \
-    ((__m512)__builtin_ia32_shufps512_mask ((__v16sf)(__m512)(X),            \
-        (__v16sf)(__m512)(Y), (int)(C),\
-    (__v16sf)(__m512)(W),\
-    (__mmask16)(U)))
-
-#define _mm512_maskz_shuffle_ps(U, X, Y, C)                             \
-    ((__m512)__builtin_ia32_shufps512_mask ((__v16sf)(__m512)(X),            \
-        (__v16sf)(__m512)(Y), (int)(C),\
-    (__v16sf)(__m512)_mm512_setzero_ps(),\
-    (__mmask16)(U)))
-
-#define _mm512_fixupimm_round_pd(X, Y, Z, C, R)					\
-  ((__m512d)__builtin_ia32_fixupimmpd512_mask ((__v8df)(__m512d)(X),	\
-      (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C),		\
-      (__mmask8)(-1), (R)))
-
-#define _mm512_mask_fixupimm_round_pd(X, U, Y, Z, C, R)                          \
-  ((__m512d)__builtin_ia32_fixupimmpd512_mask ((__v8df)(__m512d)(X),    \
-      (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C),             \
-      (__mmask8)(U), (R)))
-
-#define _mm512_maskz_fixupimm_round_pd(U, X, Y, Z, C, R)                         \
-  ((__m512d)__builtin_ia32_fixupimmpd512_maskz ((__v8df)(__m512d)(X),   \
-      (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C),             \
-      (__mmask8)(U), (R)))
-
-#define _mm512_fixupimm_round_ps(X, Y, Z, C, R)					\
-  ((__m512)__builtin_ia32_fixupimmps512_mask ((__v16sf)(__m512)(X),	\
-    (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C),		\
-    (__mmask16)(-1), (R)))
-
-#define _mm512_mask_fixupimm_round_ps(X, U, Y, Z, C, R)                          \
-  ((__m512)__builtin_ia32_fixupimmps512_mask ((__v16sf)(__m512)(X),     \
-    (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C),              \
-    (__mmask16)(U), (R)))
-
-#define _mm512_maskz_fixupimm_round_ps(U, X, Y, Z, C, R)                         \
-  ((__m512)__builtin_ia32_fixupimmps512_maskz ((__v16sf)(__m512)(X),    \
-    (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C),              \
-    (__mmask16)(U), (R)))
-
-#define _mm_fixupimm_round_sd(X, Y, Z, C, R)					\
-    ((__m128d)__builtin_ia32_fixupimmsd_mask ((__v2df)(__m128d)(X),	\
-      (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C),		\
-      (__mmask8)(-1), (R)))
-
-#define _mm_mask_fixupimm_round_sd(X, U, Y, Z, C, R)				\
-    ((__m128d)__builtin_ia32_fixupimmsd_mask ((__v2df)(__m128d)(X),	\
-      (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C),		\
-      (__mmask8)(U), (R)))
-
-#define _mm_maskz_fixupimm_round_sd(U, X, Y, Z, C, R)				\
-    ((__m128d)__builtin_ia32_fixupimmsd_maskz ((__v2df)(__m128d)(X),	\
-      (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C),		\
-      (__mmask8)(U), (R)))
-
-#define _mm_fixupimm_round_ss(X, Y, Z, C, R)					\
-    ((__m128)__builtin_ia32_fixupimmss_mask ((__v4sf)(__m128)(X),	\
-      (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C),		\
-      (__mmask8)(-1), (R)))
-
-#define _mm_mask_fixupimm_round_ss(X, U, Y, Z, C, R)				\
-    ((__m128)__builtin_ia32_fixupimmss_mask ((__v4sf)(__m128)(X),	\
-      (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C),		\
-      (__mmask8)(U), (R)))
-
-#define _mm_maskz_fixupimm_round_ss(U, X, Y, Z, C, R)				\
-    ((__m128)__builtin_ia32_fixupimmss_maskz ((__v4sf)(__m128)(X),	\
-      (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C),		\
-      (__mmask8)(U), (R)))
-#endif
-
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_movehdup_ps (__m512 __A)
+_mm512_mask_shuffle_f64x2 (__m512d __W, __mmask8 __U, __m512d __A,
+			   __m512d __B, const int __imm)
 {
-  return (__m512) __builtin_ia32_movshdup512_mask ((__v16sf) __A,
-						   (__v16sf)
-						   _mm512_undefined_ps (),
-						   (__mmask16) -1);
+  return (__m512d) __builtin_ia32_shuf_f64x2_mask ((__v8df) __A,
+						   (__v8df) __B, __imm,
+						   (__v8df) __W,
+						   (__mmask8) __U);
 }
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_movehdup_ps (__m512 __W, __mmask16 __U, __m512 __A)
+_mm512_maskz_shuffle_f64x2 (__mmask8 __U, __m512d __A, __m512d __B,
+			    const int __imm)
 {
-  return (__m512) __builtin_ia32_movshdup512_mask ((__v16sf) __A,
-						   (__v16sf) __W,
-						   (__mmask16) __U);
+  return (__m512d) __builtin_ia32_shuf_f64x2_mask ((__v8df) __A,
+						   (__v8df) __B, __imm,
+						   (__v8df)
+						   _mm512_setzero_pd (),
+						   (__mmask8) __U);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_movehdup_ps (__mmask16 __U, __m512 __A)
+_mm512_shuffle_f32x4 (__m512 __A, __m512 __B, const int __imm)
 {
-  return (__m512) __builtin_ia32_movshdup512_mask ((__v16sf) __A,
-						   (__v16sf)
-						   _mm512_setzero_ps (),
-						   (__mmask16) __U);
+  return (__m512) __builtin_ia32_shuf_f32x4_mask ((__v16sf) __A,
+						  (__v16sf) __B, __imm,
+						  (__v16sf)
+						  _mm512_undefined_ps (),
+						  (__mmask16) -1);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_moveldup_ps (__m512 __A)
+_mm512_mask_shuffle_f32x4 (__m512 __W, __mmask16 __U, __m512 __A,
+			   __m512 __B, const int __imm)
 {
-  return (__m512) __builtin_ia32_movsldup512_mask ((__v16sf) __A,
-						   (__v16sf)
-						   _mm512_undefined_ps (),
-						   (__mmask16) -1);
+  return (__m512) __builtin_ia32_shuf_f32x4_mask ((__v16sf) __A,
+						  (__v16sf) __B, __imm,
+						  (__v16sf) __W,
+						  (__mmask16) __U);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_moveldup_ps (__m512 __W, __mmask16 __U, __m512 __A)
+_mm512_maskz_shuffle_f32x4 (__mmask16 __U, __m512 __A, __m512 __B,
+			    const int __imm)
 {
-  return (__m512) __builtin_ia32_movsldup512_mask ((__v16sf) __A,
-						   (__v16sf) __W,
-						   (__mmask16) __U);
+  return (__m512) __builtin_ia32_shuf_f32x4_mask ((__v16sf) __A,
+						  (__v16sf) __B, __imm,
+						  (__v16sf)
+						  _mm512_setzero_ps (),
+						  (__mmask16) __U);
 }
 
-extern __inline __m512
+#else
+#define _mm512_shuffle_epi32(X, C)                                      \
+  ((__m512i)  __builtin_ia32_pshufd512_mask ((__v16si)(__m512i)(X), (int)(C),\
+    (__v16si)(__m512i)_mm512_undefined_epi32 (),\
+    (__mmask16)-1))
+
+#define _mm512_mask_shuffle_epi32(W, U, X, C)                           \
+  ((__m512i)  __builtin_ia32_pshufd512_mask ((__v16si)(__m512i)(X), (int)(C),\
+    (__v16si)(__m512i)(W),\
+    (__mmask16)(U)))
+
+#define _mm512_maskz_shuffle_epi32(U, X, C)                             \
+  ((__m512i)  __builtin_ia32_pshufd512_mask ((__v16si)(__m512i)(X), (int)(C),\
+    (__v16si)(__m512i)_mm512_setzero_si512 (),\
+    (__mmask16)(U)))
+
+#define _mm512_shuffle_i64x2(X, Y, C)                                   \
+  ((__m512i)  __builtin_ia32_shuf_i64x2_mask ((__v8di)(__m512i)(X),     \
+      (__v8di)(__m512i)(Y), (int)(C),\
+    (__v8di)(__m512i)_mm512_undefined_epi32 (),\
+    (__mmask8)-1))
+
+#define _mm512_mask_shuffle_i64x2(W, U, X, Y, C)                        \
+  ((__m512i)  __builtin_ia32_shuf_i64x2_mask ((__v8di)(__m512i)(X),     \
+      (__v8di)(__m512i)(Y), (int)(C),\
+    (__v8di)(__m512i)(W),\
+    (__mmask8)(U)))
+
+#define _mm512_maskz_shuffle_i64x2(U, X, Y, C)                          \
+  ((__m512i)  __builtin_ia32_shuf_i64x2_mask ((__v8di)(__m512i)(X),     \
+      (__v8di)(__m512i)(Y), (int)(C),\
+    (__v8di)(__m512i)_mm512_setzero_si512 (),\
+    (__mmask8)(U)))
+
+#define _mm512_shuffle_i32x4(X, Y, C)                                   \
+  ((__m512i)  __builtin_ia32_shuf_i32x4_mask ((__v16si)(__m512i)(X),    \
+      (__v16si)(__m512i)(Y), (int)(C),\
+    (__v16si)(__m512i)_mm512_undefined_epi32 (),\
+    (__mmask16)-1))
+
+#define _mm512_mask_shuffle_i32x4(W, U, X, Y, C)                        \
+  ((__m512i)  __builtin_ia32_shuf_i32x4_mask ((__v16si)(__m512i)(X),    \
+      (__v16si)(__m512i)(Y), (int)(C),\
+    (__v16si)(__m512i)(W),\
+    (__mmask16)(U)))
+
+#define _mm512_maskz_shuffle_i32x4(U, X, Y, C)                          \
+  ((__m512i)  __builtin_ia32_shuf_i32x4_mask ((__v16si)(__m512i)(X),    \
+      (__v16si)(__m512i)(Y), (int)(C),\
+    (__v16si)(__m512i)_mm512_setzero_si512 (),\
+    (__mmask16)(U)))
+
+#define _mm512_shuffle_f64x2(X, Y, C)                                   \
+  ((__m512d)  __builtin_ia32_shuf_f64x2_mask ((__v8df)(__m512d)(X),     \
+      (__v8df)(__m512d)(Y), (int)(C),\
+    (__v8df)(__m512d)_mm512_undefined_pd(),\
+    (__mmask8)-1))
+
+#define _mm512_mask_shuffle_f64x2(W, U, X, Y, C)                        \
+  ((__m512d)  __builtin_ia32_shuf_f64x2_mask ((__v8df)(__m512d)(X),     \
+      (__v8df)(__m512d)(Y), (int)(C),\
+    (__v8df)(__m512d)(W),\
+    (__mmask8)(U)))
+
+#define _mm512_maskz_shuffle_f64x2(U, X, Y, C)                         \
+  ((__m512d)  __builtin_ia32_shuf_f64x2_mask ((__v8df)(__m512d)(X),    \
+      (__v8df)(__m512d)(Y), (int)(C),\
+    (__v8df)(__m512d)_mm512_setzero_pd(),\
+    (__mmask8)(U)))
+
+#define _mm512_shuffle_f32x4(X, Y, C)                                  \
+  ((__m512)  __builtin_ia32_shuf_f32x4_mask ((__v16sf)(__m512)(X),     \
+      (__v16sf)(__m512)(Y), (int)(C),\
+    (__v16sf)(__m512)_mm512_undefined_ps(),\
+    (__mmask16)-1))
+
+#define _mm512_mask_shuffle_f32x4(W, U, X, Y, C)                       \
+  ((__m512)  __builtin_ia32_shuf_f32x4_mask ((__v16sf)(__m512)(X),     \
+      (__v16sf)(__m512)(Y), (int)(C),\
+    (__v16sf)(__m512)(W),\
+    (__mmask16)(U)))
+
+#define _mm512_maskz_shuffle_f32x4(U, X, Y, C)                         \
+  ((__m512)  __builtin_ia32_shuf_f32x4_mask ((__v16sf)(__m512)(X),     \
+      (__v16sf)(__m512)(Y), (int)(C),\
+    (__v16sf)(__m512)_mm512_setzero_ps(),\
+    (__mmask16)(U)))
+#endif
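
A minimal sketch of why both forms exist (assuming -mavx512f; the
function name is illustrative): the lane selector must be a
compile-time constant, which the inline functions can only guarantee
once inlining runs, hence the macro fallback above when __OPTIMIZE__
is not defined:

    __m512i
    swap_outer_lanes (__m512i a, __m512i b)
    {
      /* imm 0x4e selects { a.lane2, a.lane3, b.lane0, b.lane1 },
         where each lane is 128 bits wide.  */
      return _mm512_shuffle_i32x4 (a, b, 0x4e);
    }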
+
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_moveldup_ps (__mmask16 __U, __m512 __A)
+_mm512_rolv_epi32 (__m512i __A, __m512i __B)
 {
-  return (__m512) __builtin_ia32_movsldup512_mask ((__v16sf) __A,
-						   (__v16sf)
-						   _mm512_setzero_ps (),
-						   (__mmask16) __U);
+  return (__m512i) __builtin_ia32_prolvd512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si)
+						  _mm512_undefined_epi32 (),
+						  (__mmask16) -1);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_or_si512 (__m512i __A, __m512i __B)
+_mm512_mask_rolv_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
 {
-  return (__m512i) ((__v16su) __A | (__v16su) __B);
+  return (__m512i) __builtin_ia32_prolvd512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si) __W,
+						  (__mmask16) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_or_epi32 (__m512i __A, __m512i __B)
+_mm512_maskz_rolv_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
 {
-  return (__m512i) ((__v16su) __A | (__v16su) __B);
+  return (__m512i) __builtin_ia32_prolvd512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si)
+						  _mm512_setzero_si512 (),
+						  (__mmask16) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_or_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
+_mm512_rorv_epi32 (__m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_pord512_mask ((__v16si) __A,
-						(__v16si) __B,
-						(__v16si) __W,
-						(__mmask16) __U);
+  return (__m512i) __builtin_ia32_prorvd512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si)
+						  _mm512_undefined_epi32 (),
+						  (__mmask16) -1);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_or_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
+_mm512_mask_rorv_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_pord512_mask ((__v16si) __A,
-						(__v16si) __B,
-						(__v16si)
-						_mm512_setzero_si512 (),
-						(__mmask16) __U);
+  return (__m512i) __builtin_ia32_prorvd512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si) __W,
+						  (__mmask16) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_or_epi64 (__m512i __A, __m512i __B)
+_mm512_maskz_rorv_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
 {
-  return (__m512i) ((__v8du) __A | (__v8du) __B);
+  return (__m512i) __builtin_ia32_prorvd512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si)
+						  _mm512_setzero_si512 (),
+						  (__mmask16) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_or_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
+_mm512_rolv_epi64 (__m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_porq512_mask ((__v8di) __A,
-						(__v8di) __B,
-						(__v8di) __W,
-						(__mmask8) __U);
+  return (__m512i) __builtin_ia32_prolvq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di)
+						  _mm512_undefined_epi32 (),
+						  (__mmask8) -1);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_or_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
+_mm512_mask_rolv_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_porq512_mask ((__v8di) __A,
-						(__v8di) __B,
-						(__v8di)
-						_mm512_setzero_si512 (),
-						(__mmask8) __U);
+  return (__m512i) __builtin_ia32_prolvq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di) __W,
+						  (__mmask8) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_xor_si512 (__m512i __A, __m512i __B)
+_mm512_maskz_rolv_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
 {
-  return (__m512i) ((__v16su) __A ^ (__v16su) __B);
+  return (__m512i) __builtin_ia32_prolvq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di)
+						  _mm512_setzero_si512 (),
+						  (__mmask8) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_xor_epi32 (__m512i __A, __m512i __B)
+_mm512_rorv_epi64 (__m512i __A, __m512i __B)
 {
-  return (__m512i) ((__v16su) __A ^ (__v16su) __B);
+  return (__m512i) __builtin_ia32_prorvq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di)
+						  _mm512_undefined_epi32 (),
+						  (__mmask8) -1);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_xor_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
+_mm512_mask_rorv_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_pxord512_mask ((__v16si) __A,
-						 (__v16si) __B,
-						 (__v16si) __W,
-						 (__mmask16) __U);
+  return (__m512i) __builtin_ia32_prorvq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di) __W,
+						  (__mmask8) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_xor_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
+_mm512_maskz_rorv_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_pxord512_mask ((__v16si) __A,
-						 (__v16si) __B,
-						 (__v16si)
-						 _mm512_setzero_si512 (),
-						 (__mmask16) __U);
+  return (__m512i) __builtin_ia32_prorvq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di)
+						  _mm512_setzero_si512 (),
+						  (__mmask8) __U);
 }
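
A minimal sketch (assuming -mavx512f; name illustrative): unlike the
immediate-count _mm512_rol_epi32 family, the rolv/rorv intrinsics take
a per-element count vector, each count reduced modulo the element
width:

    __m512i
    rotl_each_dword (__m512i v, __m512i counts)
    {
      /* Element i of the result is element i of v rotated left by
         element i of counts (mod 32).  */
      return _mm512_rolv_epi32 (v, counts);
    }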
 
-extern __inline __m512i
+#ifdef __OPTIMIZE__
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_xor_epi64 (__m512i __A, __m512i __B)
+_mm512_cvtt_roundpd_epi32 (__m512d __A, const int __R)
 {
-  return (__m512i) ((__v8du) __A ^ (__v8du) __B);
+  return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A,
+						     (__v8si)
+						     _mm256_undefined_si256 (),
+						     (__mmask8) -1, __R);
 }
 
-extern __inline __m512i
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_xor_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
+_mm512_mask_cvtt_roundpd_epi32 (__m256i __W, __mmask8 __U, __m512d __A,
+				const int __R)
 {
-  return (__m512i) __builtin_ia32_pxorq512_mask ((__v8di) __A,
-						 (__v8di) __B,
-						 (__v8di) __W,
-						 (__mmask8) __U);
+  return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A,
+						     (__v8si) __W,
+						     (__mmask8) __U, __R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtt_roundpd_epi32 (__mmask8 __U, __m512d __A, const int __R)
+{
+  return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A,
+						     (__v8si)
+						     _mm256_setzero_si256 (),
+						     (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_xor_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
+_mm512_cvtt_roundpd_epu32 (__m512d __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_pxorq512_mask ((__v8di) __A,
-						 (__v8di) __B,
-						 (__v8di)
-						 _mm512_setzero_si512 (),
-						 (__mmask8) __U);
+  return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A,
+						      (__v8si)
+						      _mm256_undefined_si256 (),
+						      (__mmask8) -1, __R);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_rol_epi32 (__m512i __A, const int __B)
+_mm512_mask_cvtt_roundpd_epu32 (__m256i __W, __mmask8 __U, __m512d __A,
+				const int __R)
 {
-  return (__m512i) __builtin_ia32_prold512_mask ((__v16si) __A, __B,
-						 (__v16si)
-						 _mm512_undefined_epi32 (),
-						 (__mmask16) -1);
+  return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A,
+						      (__v8si) __W,
+						      (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_rol_epi32 (__m512i __W, __mmask16 __U, __m512i __A, const int __B)
+_mm512_maskz_cvtt_roundpd_epu32 (__mmask8 __U, __m512d __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_prold512_mask ((__v16si) __A, __B,
-						 (__v16si) __W,
-						 (__mmask16) __U);
+  return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A,
+						      (__v8si)
+						      _mm256_setzero_si256 (),
+						      (__mmask8) __U, __R);
 }
+#else
+#define _mm512_cvtt_roundpd_epi32(A, B)		     \
+    ((__m256i)__builtin_ia32_cvttpd2dq512_mask(A, (__v8si)_mm256_undefined_si256(), -1, B))
 
-extern __inline __m512i
+#define _mm512_mask_cvtt_roundpd_epi32(W, U, A, B)   \
+    ((__m256i)__builtin_ia32_cvttpd2dq512_mask(A, (__v8si)(W), U, B))
+
+#define _mm512_maskz_cvtt_roundpd_epi32(U, A, B)     \
+    ((__m256i)__builtin_ia32_cvttpd2dq512_mask(A, (__v8si)_mm256_setzero_si256(), U, B))
+
+#define _mm512_cvtt_roundpd_epu32(A, B)		     \
+    ((__m256i)__builtin_ia32_cvttpd2udq512_mask(A, (__v8si)_mm256_undefined_si256(), -1, B))
+
+#define _mm512_mask_cvtt_roundpd_epu32(W, U, A, B)   \
+    ((__m256i)__builtin_ia32_cvttpd2udq512_mask(A, (__v8si)(W), U, B))
+
+#define _mm512_maskz_cvtt_roundpd_epu32(U, A, B)     \
+    ((__m256i)__builtin_ia32_cvttpd2udq512_mask(A, (__v8si)_mm256_setzero_si256(), U, B))
+#endif
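
A minimal sketch (assuming -mavx512f; name illustrative): for the
truncating conversions the rounding direction is fixed, so the __R
operand only controls exception suppression:

    __m256i
    trunc_pd_to_epi32_noexc (__m512d v)
    {
      /* _MM_FROUND_NO_EXC suppresses floating-point exceptions;
         _MM_FROUND_CUR_DIRECTION would use the MXCSR behavior.  */
      return _mm512_cvtt_roundpd_epi32 (v, _MM_FROUND_NO_EXC);
    }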
+
+#ifdef __OPTIMIZE__
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_rol_epi32 (__mmask16 __U, __m512i __A, const int __B)
+_mm512_cvt_roundpd_epi32 (__m512d __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_prold512_mask ((__v16si) __A, __B,
-						 (__v16si)
-						 _mm512_setzero_si512 (),
-						 (__mmask16) __U);
+  return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A,
+						    (__v8si)
+						    _mm256_undefined_si256 (),
+						    (__mmask8) -1, __R);
 }
 
-extern __inline __m512i
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_ror_epi32 (__m512i __A, int __B)
+_mm512_mask_cvt_roundpd_epi32 (__m256i __W, __mmask8 __U, __m512d __A,
+			       const int __R)
 {
-  return (__m512i) __builtin_ia32_prord512_mask ((__v16si) __A, __B,
-						 (__v16si)
-						 _mm512_undefined_epi32 (),
-						 (__mmask16) -1);
+  return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A,
+						    (__v8si) __W,
+						    (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_ror_epi32 (__m512i __W, __mmask16 __U, __m512i __A, int __B)
+_mm512_maskz_cvt_roundpd_epi32 (__mmask8 __U, __m512d __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_prord512_mask ((__v16si) __A, __B,
-						 (__v16si) __W,
-						 (__mmask16) __U);
+  return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A,
+						    (__v8si)
+						    _mm256_setzero_si256 (),
+						    (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_ror_epi32 (__mmask16 __U, __m512i __A, int __B)
+_mm512_cvt_roundpd_epu32 (__m512d __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_prord512_mask ((__v16si) __A, __B,
-						 (__v16si)
-						 _mm512_setzero_si512 (),
-						 (__mmask16) __U);
+  return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A,
+						     (__v8si)
+						     _mm256_undefined_si256 (),
+						     (__mmask8) -1, __R);
 }
 
-extern __inline __m512i
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_rol_epi64 (__m512i __A, const int __B)
+_mm512_mask_cvt_roundpd_epu32 (__m256i __W, __mmask8 __U, __m512d __A,
+			       const int __R)
 {
-  return (__m512i) __builtin_ia32_prolq512_mask ((__v8di) __A, __B,
-						 (__v8di)
-						 _mm512_undefined_epi32 (),
-						 (__mmask8) -1);
+  return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A,
+						     (__v8si) __W,
+						     (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_rol_epi64 (__m512i __W, __mmask8 __U, __m512i __A, const int __B)
+_mm512_maskz_cvt_roundpd_epu32 (__mmask8 __U, __m512d __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_prolq512_mask ((__v8di) __A, __B,
-						 (__v8di) __W,
-						 (__mmask8) __U);
+  return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A,
+						     (__v8si)
+						     _mm256_setzero_si256 (),
+						     (__mmask8) __U, __R);
 }
+#else
+#define _mm512_cvt_roundpd_epi32(A, B)		    \
+    ((__m256i)__builtin_ia32_cvtpd2dq512_mask(A, (__v8si)_mm256_undefined_si256(), -1, B))
+
+#define _mm512_mask_cvt_roundpd_epi32(W, U, A, B)   \
+    ((__m256i)__builtin_ia32_cvtpd2dq512_mask(A, (__v8si)(W), U, B))
+
+#define _mm512_maskz_cvt_roundpd_epi32(U, A, B)     \
+    ((__m256i)__builtin_ia32_cvtpd2dq512_mask(A, (__v8si)_mm256_setzero_si256(), U, B))
+
+#define _mm512_cvt_roundpd_epu32(A, B)		    \
+    ((__m256i)__builtin_ia32_cvtpd2udq512_mask(A, (__v8si)_mm256_undefined_si256(), -1, B))
+
+#define _mm512_mask_cvt_roundpd_epu32(W, U, A, B)   \
+    ((__m256i)__builtin_ia32_cvtpd2udq512_mask(A, (__v8si)(W), U, B))
+
+#define _mm512_maskz_cvt_roundpd_epu32(U, A, B)     \
+    ((__m256i)__builtin_ia32_cvtpd2udq512_mask(A, (__v8si)_mm256_setzero_si256(), U, B))
+#endif
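
A minimal sketch (assuming -mavx512f; name illustrative): the
non-truncating forms embed an explicit rounding mode, which GCC expects
to be combined with _MM_FROUND_NO_EXC when overriding the MXCSR
direction:

    __m256i
    round_pd_to_epi32_nearest (__m512d v)
    {
      return _mm512_cvt_roundpd_epi32 (v, _MM_FROUND_TO_NEAREST_INT
                                          | _MM_FROUND_NO_EXC);
    }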
 
+#ifdef __OPTIMIZE__
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_rol_epi64 (__mmask8 __U, __m512i __A, const int __B)
+_mm512_cvtt_roundps_epi32 (__m512 __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_prolq512_mask ((__v8di) __A, __B,
-						 (__v8di)
-						 _mm512_setzero_si512 (),
-						 (__mmask8) __U);
+  return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A,
+						     (__v16si)
+						     _mm512_undefined_epi32 (),
+						     (__mmask16) -1, __R);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_ror_epi64 (__m512i __A, int __B)
+_mm512_mask_cvtt_roundps_epi32 (__m512i __W, __mmask16 __U, __m512 __A,
+				const int __R)
 {
-  return (__m512i) __builtin_ia32_prorq512_mask ((__v8di) __A, __B,
-						 (__v8di)
-						 _mm512_undefined_epi32 (),
-						 (__mmask8) -1);
+  return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A,
+						     (__v16si) __W,
+						     (__mmask16) __U, __R);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_ror_epi64 (__m512i __W, __mmask8 __U, __m512i __A, int __B)
+_mm512_maskz_cvtt_roundps_epi32 (__mmask16 __U, __m512 __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_prorq512_mask ((__v8di) __A, __B,
-						 (__v8di) __W,
-						 (__mmask8) __U);
+  return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A,
+						     (__v16si)
+						     _mm512_setzero_si512 (),
+						     (__mmask16) __U, __R);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_ror_epi64 (__mmask8 __U, __m512i __A, int __B)
+_mm512_cvtt_roundps_epu32 (__m512 __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_prorq512_mask ((__v8di) __A, __B,
-						 (__v8di)
-						 _mm512_setzero_si512 (),
-						 (__mmask8) __U);
-}
-
-#else
-#define _mm512_rol_epi32(A, B)						  \
-    ((__m512i)__builtin_ia32_prold512_mask ((__v16si)(__m512i)(A),	  \
-					    (int)(B),			  \
-					    (__v16si)_mm512_undefined_epi32 (), \
-					    (__mmask16)(-1)))
-#define _mm512_mask_rol_epi32(W, U, A, B)				  \
-    ((__m512i)__builtin_ia32_prold512_mask ((__v16si)(__m512i)(A),	  \
-					    (int)(B),			  \
-					    (__v16si)(__m512i)(W),	  \
-					    (__mmask16)(U)))
-#define _mm512_maskz_rol_epi32(U, A, B)					  \
-    ((__m512i)__builtin_ia32_prold512_mask ((__v16si)(__m512i)(A),	  \
-					    (int)(B),			  \
-					    (__v16si)_mm512_setzero_si512 (), \
-					    (__mmask16)(U)))
-#define _mm512_ror_epi32(A, B)						  \
-    ((__m512i)__builtin_ia32_prord512_mask ((__v16si)(__m512i)(A),	  \
-					    (int)(B),			  \
-					    (__v16si)_mm512_undefined_epi32 (), \
-					    (__mmask16)(-1)))
-#define _mm512_mask_ror_epi32(W, U, A, B)				  \
-    ((__m512i)__builtin_ia32_prord512_mask ((__v16si)(__m512i)(A),	  \
-					    (int)(B),			  \
-					    (__v16si)(__m512i)(W),	  \
-					    (__mmask16)(U)))
-#define _mm512_maskz_ror_epi32(U, A, B)					  \
-    ((__m512i)__builtin_ia32_prord512_mask ((__v16si)(__m512i)(A),	  \
-					    (int)(B),			  \
-					    (__v16si)_mm512_setzero_si512 (), \
-					    (__mmask16)(U)))
-#define _mm512_rol_epi64(A, B)						  \
-    ((__m512i)__builtin_ia32_prolq512_mask ((__v8di)(__m512i)(A),	  \
-					    (int)(B),			  \
-					    (__v8di)_mm512_undefined_epi32 (),  \
-					    (__mmask8)(-1)))
-#define _mm512_mask_rol_epi64(W, U, A, B)				  \
-    ((__m512i)__builtin_ia32_prolq512_mask ((__v8di)(__m512i)(A),	  \
-					    (int)(B),			  \
-					    (__v8di)(__m512i)(W),	  \
-					    (__mmask8)(U)))
-#define _mm512_maskz_rol_epi64(U, A, B)					  \
-    ((__m512i)__builtin_ia32_prolq512_mask ((__v8di)(__m512i)(A),	  \
-					    (int)(B),			  \
-					    (__v8di)_mm512_setzero_si512 (),  \
-					    (__mmask8)(U)))
-
-#define _mm512_ror_epi64(A, B)						  \
-    ((__m512i)__builtin_ia32_prorq512_mask ((__v8di)(__m512i)(A),	  \
-					    (int)(B),			  \
-					    (__v8di)_mm512_undefined_epi32 (),  \
-					    (__mmask8)(-1)))
-#define _mm512_mask_ror_epi64(W, U, A, B)				  \
-    ((__m512i)__builtin_ia32_prorq512_mask ((__v8di)(__m512i)(A),	  \
-					    (int)(B),			  \
-					    (__v8di)(__m512i)(W),	  \
-					    (__mmask8)(U)))
-#define _mm512_maskz_ror_epi64(U, A, B)					  \
-    ((__m512i)__builtin_ia32_prorq512_mask ((__v8di)(__m512i)(A),	  \
-					    (int)(B),			  \
-					    (__v8di)_mm512_setzero_si512 (),  \
-					    (__mmask8)(U)))
-#endif
+  return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A,
+						      (__v16si)
+						      _mm512_undefined_epi32 (),
+						      (__mmask16) -1, __R);
+}
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_and_si512 (__m512i __A, __m512i __B)
+_mm512_mask_cvtt_roundps_epu32 (__m512i __W, __mmask16 __U, __m512 __A,
+				const int __R)
 {
-  return (__m512i) ((__v16su) __A & (__v16su) __B);
+  return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A,
+						      (__v16si) __W,
+						      (__mmask16) __U, __R);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_and_epi32 (__m512i __A, __m512i __B)
+_mm512_maskz_cvtt_roundps_epu32 (__mmask16 __U, __m512 __A, const int __R)
 {
-  return (__m512i) ((__v16su) __A & (__v16su) __B);
+  return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A,
+						      (__v16si)
+						      _mm512_setzero_si512 (),
+						      (__mmask16) __U, __R);
 }
+#else
+#define _mm512_cvtt_roundps_epi32(A, B)		     \
+    ((__m512i)__builtin_ia32_cvttps2dq512_mask(A, (__v16si)_mm512_undefined_epi32 (), -1, B))
+
+#define _mm512_mask_cvtt_roundps_epi32(W, U, A, B)   \
+    ((__m512i)__builtin_ia32_cvttps2dq512_mask(A, (__v16si)(W), U, B))
+
+#define _mm512_maskz_cvtt_roundps_epi32(U, A, B)     \
+    ((__m512i)__builtin_ia32_cvttps2dq512_mask(A, (__v16si)_mm512_setzero_si512 (), U, B))
+
+#define _mm512_cvtt_roundps_epu32(A, B)		     \
+    ((__m512i)__builtin_ia32_cvttps2udq512_mask(A, (__v16si)_mm512_undefined_epi32 (), -1, B))
+
+#define _mm512_mask_cvtt_roundps_epu32(W, U, A, B)   \
+    ((__m512i)__builtin_ia32_cvttps2udq512_mask(A, (__v16si)(W), U, B))
 
+#define _mm512_maskz_cvtt_roundps_epu32(U, A, B)     \
+    ((__m512i)__builtin_ia32_cvttps2udq512_mask(A, (__v16si)_mm512_setzero_si512 (), U, B))
+#endif
+
+#ifdef __OPTIMIZE__
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_and_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
+_mm512_cvt_roundps_epi32 (__m512 __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_pandd512_mask ((__v16si) __A,
-						 (__v16si) __B,
-						 (__v16si) __W,
-						 (__mmask16) __U);
+  return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
+						    (__v16si)
+						    _mm512_undefined_epi32 (),
+						    (__mmask16) -1, __R);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_and_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
+_mm512_mask_cvt_roundps_epi32 (__m512i __W, __mmask16 __U, __m512 __A,
+			       const int __R)
 {
-  return (__m512i) __builtin_ia32_pandd512_mask ((__v16si) __A,
-						 (__v16si) __B,
-						 (__v16si)
-						 _mm512_setzero_si512 (),
-						 (__mmask16) __U);
+  return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
+						    (__v16si) __W,
+						    (__mmask16) __U, __R);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_and_epi64 (__m512i __A, __m512i __B)
+_mm512_maskz_cvt_roundps_epi32 (__mmask16 __U, __m512 __A, const int __R)
 {
-  return (__m512i) ((__v8du) __A & (__v8du) __B);
+  return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
+						    (__v16si)
+						    _mm512_setzero_si512 (),
+						    (__mmask16) __U, __R);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_and_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
+_mm512_cvt_roundps_epu32 (__m512 __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_pandq512_mask ((__v8di) __A,
-						 (__v8di) __B,
-						 (__v8di) __W, __U);
+  return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A,
+						     (__v16si)
+						     _mm512_undefined_epi32 (),
+						     (__mmask16) -1, __R);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_and_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
+_mm512_mask_cvt_roundps_epu32 (__m512i __W, __mmask16 __U, __m512 __A,
+			       const int __R)
 {
-  return (__m512i) __builtin_ia32_pandq512_mask ((__v8di) __A,
-						 (__v8di) __B,
-						 (__v8di)
-						 _mm512_setzero_pd (),
-						 __U);
+  return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A,
+						     (__v16si) __W,
+						     (__mmask16) __U, __R);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_andnot_si512 (__m512i __A, __m512i __B)
+_mm512_maskz_cvt_roundps_epu32 (__mmask16 __U, __m512 __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_pandnd512_mask ((__v16si) __A,
-						  (__v16si) __B,
-						  (__v16si)
-						  _mm512_undefined_epi32 (),
-						  (__mmask16) -1);
+  return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A,
+						     (__v16si)
+						     _mm512_setzero_si512 (),
+						     (__mmask16) __U, __R);
 }
+#else
+#define _mm512_cvt_roundps_epi32(A, B)		    \
+    ((__m512i)__builtin_ia32_cvtps2dq512_mask(A, (__v16si)_mm512_undefined_epi32 (), -1, B))
 
-extern __inline __m512i
+#define _mm512_mask_cvt_roundps_epi32(W, U, A, B)   \
+    ((__m512i)__builtin_ia32_cvtps2dq512_mask(A, (__v16si)(W), U, B))
+
+#define _mm512_maskz_cvt_roundps_epi32(U, A, B)     \
+    ((__m512i)__builtin_ia32_cvtps2dq512_mask(A, (__v16si)_mm512_setzero_si512 (), U, B))
+
+#define _mm512_cvt_roundps_epu32(A, B)		    \
+    ((__m512i)__builtin_ia32_cvtps2udq512_mask(A, (__v16si)_mm512_undefined_epi32 (), -1, B))
+
+#define _mm512_mask_cvt_roundps_epu32(W, U, A, B)   \
+    ((__m512i)__builtin_ia32_cvtps2udq512_mask(A, (__v16si)(W), U, B))
+
+#define _mm512_maskz_cvt_roundps_epu32(U, A, B)     \
+    ((__m512i)__builtin_ia32_cvtps2udq512_mask(A, (__v16si)_mm512_setzero_si512 (), U, B))
+#endif
+
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_andnot_epi32 (__m512i __A, __m512i __B)
+_mm512_cvtepi32_epi8 (__m512i __A)
 {
-  return (__m512i) __builtin_ia32_pandnd512_mask ((__v16si) __A,
-						  (__v16si) __B,
-						  (__v16si)
-						  _mm512_undefined_epi32 (),
+  return (__m128i) __builtin_ia32_pmovdb512_mask ((__v16si) __A,
+						  (__v16qi)
+						  _mm_undefined_si128 (),
 						  (__mmask16) -1);
 }
 
-extern __inline __m512i
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_andnot_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
+_mm512_mask_cvtepi32_storeu_epi8 (void * __P, __mmask16 __M, __m512i __A)
 {
-  return (__m512i) __builtin_ia32_pandnd512_mask ((__v16si) __A,
-						  (__v16si) __B,
-						  (__v16si) __W,
-						  (__mmask16) __U);
+  __builtin_ia32_pmovdb512mem_mask ((__v16qi *) __P, (__v16si) __A, __M);
 }
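
A minimal sketch (assuming a destination buffer of at least 16 bytes;
name illustrative): the storeu forms write with byte-granularity
masking, leaving unselected destination bytes untouched:

    void
    store_low_eight_bytes (void *p, __m512i v)
    {
      /* Truncate each dword to a byte; only the low eight byte
         positions of p are written.  */
      _mm512_mask_cvtepi32_storeu_epi8 (p, (__mmask16) 0xff, v);
    }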
 
-extern __inline __m512i
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_andnot_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
+_mm512_mask_cvtepi32_epi8 (__m128i __O, __mmask16 __M, __m512i __A)
 {
-  return (__m512i) __builtin_ia32_pandnd512_mask ((__v16si) __A,
-						  (__v16si) __B,
-						  (__v16si)
-						  _mm512_setzero_si512 (),
-						  (__mmask16) __U);
+  return (__m128i) __builtin_ia32_pmovdb512_mask ((__v16si) __A,
+						  (__v16qi) __O, __M);
 }
 
-extern __inline __m512i
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_andnot_epi64 (__m512i __A, __m512i __B)
+_mm512_maskz_cvtepi32_epi8 (__mmask16 __M, __m512i __A)
 {
-  return (__m512i) __builtin_ia32_pandnq512_mask ((__v8di) __A,
-						  (__v8di) __B,
-						  (__v8di)
-						  _mm512_undefined_epi32 (),
-						  (__mmask8) -1);
+  return (__m128i) __builtin_ia32_pmovdb512_mask ((__v16si) __A,
+						  (__v16qi)
+						  _mm_setzero_si128 (),
+						  __M);
 }
 
-extern __inline __m512i
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_andnot_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
+_mm512_cvtsepi32_epi8 (__m512i __A)
 {
-  return (__m512i) __builtin_ia32_pandnq512_mask ((__v8di) __A,
-						  (__v8di) __B,
-						  (__v8di) __W, __U);
+  return (__m128i) __builtin_ia32_pmovsdb512_mask ((__v16si) __A,
+						   (__v16qi)
+						   _mm_undefined_si128 (),
+						   (__mmask16) -1);
 }
 
-extern __inline __m512i
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_andnot_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
+_mm512_mask_cvtsepi32_storeu_epi8 (void * __P, __mmask16 __M, __m512i __A)
 {
-  return (__m512i) __builtin_ia32_pandnq512_mask ((__v8di) __A,
-						  (__v8di) __B,
-						  (__v8di)
-						  _mm512_setzero_pd (),
-						  __U);
+  __builtin_ia32_pmovsdb512mem_mask ((__v16qi *) __P, (__v16si) __A, __M);
 }
 
-extern __inline __mmask16
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_test_epi32_mask (__m512i __A, __m512i __B)
+_mm512_mask_cvtsepi32_epi8 (__m128i __O, __mmask16 __M, __m512i __A)
 {
-  return (__mmask16) __builtin_ia32_ptestmd512 ((__v16si) __A,
-						(__v16si) __B,
-						(__mmask16) -1);
+  return (__m128i) __builtin_ia32_pmovsdb512_mask ((__v16si) __A,
+						   (__v16qi) __O, __M);
 }
 
-extern __inline __mmask16
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_test_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B)
+_mm512_maskz_cvtsepi32_epi8 (__mmask16 __M, __m512i __A)
 {
-  return (__mmask16) __builtin_ia32_ptestmd512 ((__v16si) __A,
-						(__v16si) __B, __U);
+  return (__m128i) __builtin_ia32_pmovsdb512_mask ((__v16si) __A,
+						   (__v16qi)
+						   _mm_setzero_si128 (),
+						   __M);
 }
 
-extern __inline __mmask8
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_test_epi64_mask (__m512i __A, __m512i __B)
+_mm512_cvtusepi32_epi8 (__m512i __A)
 {
-  return (__mmask8) __builtin_ia32_ptestmq512 ((__v8di) __A,
-					       (__v8di) __B,
-					       (__mmask8) -1);
+  return (__m128i) __builtin_ia32_pmovusdb512_mask ((__v16si) __A,
+						    (__v16qi)
+						    _mm_undefined_si128 (),
+						    (__mmask16) -1);
 }
 
-extern __inline __mmask8
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_test_epi64_mask (__mmask8 __U, __m512i __A, __m512i __B)
+_mm512_mask_cvtusepi32_storeu_epi8 (void * __P, __mmask16 __M, __m512i __A)
 {
-  return (__mmask8) __builtin_ia32_ptestmq512 ((__v8di) __A, (__v8di) __B, __U);
+  __builtin_ia32_pmovusdb512mem_mask ((__v16qi *) __P, (__v16si) __A, __M);
 }
 
-extern __inline __mmask16
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_testn_epi32_mask (__m512i __A, __m512i __B)
+_mm512_mask_cvtusepi32_epi8 (__m128i __O, __mmask16 __M, __m512i __A)
 {
-  return (__mmask16) __builtin_ia32_ptestnmd512 ((__v16si) __A,
-						 (__v16si) __B,
-						 (__mmask16) -1);
+  return (__m128i) __builtin_ia32_pmovusdb512_mask ((__v16si) __A,
+						    (__v16qi) __O,
+						    __M);
 }
 
-extern __inline __mmask16
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_testn_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B)
+_mm512_maskz_cvtusepi32_epi8 (__mmask16 __M, __m512i __A)
 {
-  return (__mmask16) __builtin_ia32_ptestnmd512 ((__v16si) __A,
-						 (__v16si) __B, __U);
+  return (__m128i) __builtin_ia32_pmovusdb512_mask ((__v16si) __A,
+						    (__v16qi)
+						    _mm_setzero_si128 (),
+						    __M);
 }
 
-extern __inline __mmask8
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_testn_epi64_mask (__m512i __A, __m512i __B)
+_mm512_cvtepi32_epi16 (__m512i __A)
 {
-  return (__mmask8) __builtin_ia32_ptestnmq512 ((__v8di) __A,
-						(__v8di) __B,
-						(__mmask8) -1);
+  return (__m256i) __builtin_ia32_pmovdw512_mask ((__v16si) __A,
+						  (__v16hi)
+						  _mm256_undefined_si256 (),
+						  (__mmask16) -1);
 }
 
-extern __inline __mmask8
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_testn_epi64_mask (__mmask8 __U, __m512i __A, __m512i __B)
+_mm512_mask_cvtepi32_storeu_epi16 (void * __P, __mmask16 __M, __m512i __A)
 {
-  return (__mmask8) __builtin_ia32_ptestnmq512 ((__v8di) __A,
-						(__v8di) __B, __U);
+  __builtin_ia32_pmovdw512mem_mask ((__v16hi *) __P, (__v16si) __A, __M);
 }
 
-extern __inline __m512
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_abs_ps (__m512 __A)
+_mm512_mask_cvtepi32_epi16 (__m256i __O, __mmask16 __M, __m512i __A)
 {
-  return (__m512) _mm512_and_epi32 ((__m512i) __A,
-				    _mm512_set1_epi32 (0x7fffffff));
+  return (__m256i) __builtin_ia32_pmovdw512_mask ((__v16si) __A,
+						  (__v16hi) __O, __M);
 }
 
-extern __inline __m512
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_abs_ps (__m512 __W, __mmask16 __U, __m512 __A)
+_mm512_maskz_cvtepi32_epi16 (__mmask16 __M, __m512i __A)
 {
-  return (__m512) _mm512_mask_and_epi32 ((__m512i) __W, __U, (__m512i) __A,
-					 _mm512_set1_epi32 (0x7fffffff));
+  return (__m256i) __builtin_ia32_pmovdw512_mask ((__v16si) __A,
+						  (__v16hi)
+						  _mm256_setzero_si256 (),
+						  __M);
 }
 
-extern __inline __m512d
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_abs_pd (__m512d __A)
+_mm512_cvtsepi32_epi16 (__m512i __A)
 {
-  return (__m512d) _mm512_and_epi64 ((__m512i) __A,
-				     _mm512_set1_epi64 (0x7fffffffffffffffLL));
+  return (__m256i) __builtin_ia32_pmovsdw512_mask ((__v16si) __A,
+						   (__v16hi)
+						   _mm256_undefined_si256 (),
+						   (__mmask16) -1);
 }
 
-extern __inline __m512d
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_abs_pd (__m512d __W, __mmask8 __U, __m512d __A)
+_mm512_mask_cvtsepi32_storeu_epi16 (void *__P, __mmask16 __M, __m512i __A)
 {
-  return (__m512d)
-	 _mm512_mask_and_epi64 ((__m512i) __W, __U, (__m512i) __A,
-				_mm512_set1_epi64 (0x7fffffffffffffffLL));
+  __builtin_ia32_pmovsdw512mem_mask ((__v16hi*) __P, (__v16si) __A, __M);
 }
 
-extern __inline __m512i
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_unpackhi_epi32 (__m512i __A, __m512i __B)
+_mm512_mask_cvtsepi32_epi16 (__m256i __O, __mmask16 __M, __m512i __A)
 {
-  return (__m512i) __builtin_ia32_punpckhdq512_mask ((__v16si) __A,
-						     (__v16si) __B,
-						     (__v16si)
-						     _mm512_undefined_epi32 (),
-						     (__mmask16) -1);
+  return (__m256i) __builtin_ia32_pmovsdw512_mask ((__v16si) __A,
+						   (__v16hi) __O, __M);
 }
 
-extern __inline __m512i
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_unpackhi_epi32 (__m512i __W, __mmask16 __U, __m512i __A,
-			    __m512i __B)
+_mm512_maskz_cvtsepi32_epi16 (__mmask16 __M, __m512i __A)
 {
-  return (__m512i) __builtin_ia32_punpckhdq512_mask ((__v16si) __A,
-						     (__v16si) __B,
-						     (__v16si) __W,
-						     (__mmask16) __U);
+  return (__m256i) __builtin_ia32_pmovsdw512_mask ((__v16si) __A,
+						   (__v16hi)
+						   _mm256_setzero_si256 (),
+						   __M);
 }
 
-extern __inline __m512i
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_unpackhi_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
+_mm512_cvtusepi32_epi16 (__m512i __A)
 {
-  return (__m512i) __builtin_ia32_punpckhdq512_mask ((__v16si) __A,
-						     (__v16si) __B,
-						     (__v16si)
-						     _mm512_setzero_si512 (),
-						     (__mmask16) __U);
+  return (__m256i) __builtin_ia32_pmovusdw512_mask ((__v16si) __A,
+						    (__v16hi)
+						    _mm256_undefined_si256 (),
+						    (__mmask16) -1);
 }
 
-extern __inline __m512i
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_unpackhi_epi64 (__m512i __A, __m512i __B)
+_mm512_mask_cvtusepi32_storeu_epi16 (void *__P, __mmask16 __M, __m512i __A)
 {
-  return (__m512i) __builtin_ia32_punpckhqdq512_mask ((__v8di) __A,
-						      (__v8di) __B,
-						      (__v8di)
-						      _mm512_undefined_epi32 (),
-						      (__mmask8) -1);
+  __builtin_ia32_pmovusdw512mem_mask ((__v16hi*) __P, (__v16si) __A, __M);
 }
 
-extern __inline __m512i
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_unpackhi_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
+_mm512_mask_cvtusepi32_epi16 (__m256i __O, __mmask16 __M, __m512i __A)
 {
-  return (__m512i) __builtin_ia32_punpckhqdq512_mask ((__v8di) __A,
-						      (__v8di) __B,
-						      (__v8di) __W,
-						      (__mmask8) __U);
+  return (__m256i) __builtin_ia32_pmovusdw512_mask ((__v16si) __A,
+						    (__v16hi) __O,
+						    __M);
 }
 
-extern __inline __m512i
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_unpackhi_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
+_mm512_maskz_cvtusepi32_epi16 (__mmask16 __M, __m512i __A)
 {
-  return (__m512i) __builtin_ia32_punpckhqdq512_mask ((__v8di) __A,
-						      (__v8di) __B,
-						      (__v8di)
-						      _mm512_setzero_si512 (),
-						      (__mmask8) __U);
+  return (__m256i) __builtin_ia32_pmovusdw512_mask ((__v16si) __A,
+						    (__v16hi)
+						    _mm256_setzero_si256 (),
+						    __M);
 }
 
-extern __inline __m512i
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_unpacklo_epi32 (__m512i __A, __m512i __B)
+_mm512_cvtepi64_epi32 (__m512i __A)
 {
-  return (__m512i) __builtin_ia32_punpckldq512_mask ((__v16si) __A,
-						     (__v16si) __B,
-						     (__v16si)
-						     _mm512_undefined_epi32 (),
-						     (__mmask16) -1);
+  return (__m256i) __builtin_ia32_pmovqd512_mask ((__v8di) __A,
+						  (__v8si)
+						  _mm256_undefined_si256 (),
+						  (__mmask8) -1);
 }
 
-extern __inline __m512i
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_unpacklo_epi32 (__m512i __W, __mmask16 __U, __m512i __A,
-			    __m512i __B)
+_mm512_mask_cvtepi64_storeu_epi32 (void* __P, __mmask8 __M, __m512i __A)
 {
-  return (__m512i) __builtin_ia32_punpckldq512_mask ((__v16si) __A,
-						     (__v16si) __B,
-						     (__v16si) __W,
-						     (__mmask16) __U);
+  __builtin_ia32_pmovqd512mem_mask ((__v8si *) __P, (__v8di) __A, __M);
 }
 
-extern __inline __m512i
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_unpacklo_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
+_mm512_mask_cvtepi64_epi32 (__m256i __O, __mmask8 __M, __m512i __A)
 {
-  return (__m512i) __builtin_ia32_punpckldq512_mask ((__v16si) __A,
-						     (__v16si) __B,
-						     (__v16si)
-						     _mm512_setzero_si512 (),
-						     (__mmask16) __U);
+  return (__m256i) __builtin_ia32_pmovqd512_mask ((__v8di) __A,
+						  (__v8si) __O, __M);
 }
 
-extern __inline __m512i
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_unpacklo_epi64 (__m512i __A, __m512i __B)
+_mm512_maskz_cvtepi64_epi32 (__mmask8 __M, __m512i __A)
 {
-  return (__m512i) __builtin_ia32_punpcklqdq512_mask ((__v8di) __A,
-						      (__v8di) __B,
-						      (__v8di)
-						      _mm512_undefined_epi32 (),
-						      (__mmask8) -1);
+  return (__m256i) __builtin_ia32_pmovqd512_mask ((__v8di) __A,
+						  (__v8si)
+						  _mm256_setzero_si256 (),
+						  __M);
 }
 
-extern __inline __m512i
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_unpacklo_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
+_mm512_cvtsepi64_epi32 (__m512i __A)
 {
-  return (__m512i) __builtin_ia32_punpcklqdq512_mask ((__v8di) __A,
-						      (__v8di) __B,
-						      (__v8di) __W,
-						      (__mmask8) __U);
+  return (__m256i) __builtin_ia32_pmovsqd512_mask ((__v8di) __A,
+						   (__v8si)
+						   _mm256_undefined_si256 (),
+						   (__mmask8) -1);
 }
 
-extern __inline __m512i
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_unpacklo_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
+_mm512_mask_cvtsepi64_storeu_epi32 (void *__P, __mmask8 __M, __m512i __A)
 {
-  return (__m512i) __builtin_ia32_punpcklqdq512_mask ((__v8di) __A,
-						      (__v8di) __B,
-						      (__v8di)
-						      _mm512_setzero_si512 (),
-						      (__mmask8) __U);
+  __builtin_ia32_pmovsqd512mem_mask ((__v8si *) __P, (__v8di) __A, __M);
 }
 
-#ifdef __x86_64__
-#ifdef __OPTIMIZE__
-extern __inline unsigned long long
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundss_u64 (__m128 __A, const int __R)
+_mm512_mask_cvtsepi64_epi32 (__m256i __O, __mmask8 __M, __m512i __A)
 {
-  return (unsigned long long) __builtin_ia32_vcvtss2usi64 ((__v4sf) __A, __R);
+  return (__m256i) __builtin_ia32_pmovsqd512_mask ((__v8di) __A,
+						   (__v8si) __O, __M);
 }
 
-extern __inline long long
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundss_si64 (__m128 __A, const int __R)
+_mm512_maskz_cvtsepi64_epi32 (__mmask8 __M, __m512i __A)
 {
-  return (long long) __builtin_ia32_vcvtss2si64 ((__v4sf) __A, __R);
+  return (__m256i) __builtin_ia32_pmovsqd512_mask ((__v8di) __A,
+						   (__v8si)
+						   _mm256_setzero_si256 (),
+						   __M);
 }
 
-extern __inline long long
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundss_i64 (__m128 __A, const int __R)
+_mm512_cvtusepi64_epi32 (__m512i __A)
 {
-  return (long long) __builtin_ia32_vcvtss2si64 ((__v4sf) __A, __R);
+  return (__m256i) __builtin_ia32_pmovusqd512_mask ((__v8di) __A,
+						    (__v8si)
+						    _mm256_undefined_si256 (),
+						    (__mmask8) -1);
 }
 
-extern __inline unsigned long long
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundss_u64 (__m128 __A, const int __R)
+_mm512_mask_cvtusepi64_storeu_epi32 (void* __P, __mmask8 __M, __m512i __A)
 {
-  return (unsigned long long) __builtin_ia32_vcvttss2usi64 ((__v4sf) __A, __R);
+  __builtin_ia32_pmovusqd512mem_mask ((__v8si*) __P, (__v8di) __A, __M);
 }
 
-extern __inline long long
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundss_i64 (__m128 __A, const int __R)
+_mm512_mask_cvtusepi64_epi32 (__m256i __O, __mmask8 __M, __m512i __A)
 {
-  return (long long) __builtin_ia32_vcvttss2si64 ((__v4sf) __A, __R);
+  return (__m256i) __builtin_ia32_pmovusqd512_mask ((__v8di) __A,
+						    (__v8si) __O, __M);
 }
 
-extern __inline long long
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundss_si64 (__m128 __A, const int __R)
+_mm512_maskz_cvtusepi64_epi32 (__mmask8 __M, __m512i __A)
 {
-  return (long long) __builtin_ia32_vcvttss2si64 ((__v4sf) __A, __R);
+  return (__m256i) __builtin_ia32_pmovusqd512_mask ((__v8di) __A,
+						    (__v8si)
+						    _mm256_setzero_si256 (),
+						    __M);
 }
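
A minimal sketch contrasting the three narrowing flavors (input value
assumed): plain cvt truncates each element, cvtsepi saturates as
signed, and cvtusepi saturates as unsigned:

    __m512i v = _mm512_set1_epi64 (0x1ffffffffLL); /* 2^33 - 1 */
    __m256i t = _mm512_cvtepi64_epi32 (v);   /* 0xffffffff, truncated */
    __m256i s = _mm512_cvtsepi64_epi32 (v);  /* 0x7fffffff, INT32_MAX */
    __m256i u = _mm512_cvtusepi64_epi32 (v); /* 0xffffffff, UINT32_MAX */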
-#else
-#define _mm_cvt_roundss_u64(A, B)   \
-    ((unsigned long long)__builtin_ia32_vcvtss2usi64(A, B))
-
-#define _mm_cvt_roundss_si64(A, B)   \
-    ((long long)__builtin_ia32_vcvtss2si64(A, B))
-
-#define _mm_cvt_roundss_i64(A, B)   \
-    ((long long)__builtin_ia32_vcvtss2si64(A, B))
-
-#define _mm_cvtt_roundss_u64(A, B)  \
-    ((unsigned long long)__builtin_ia32_vcvttss2usi64(A, B))
-
-#define _mm_cvtt_roundss_i64(A, B)  \
-    ((long long)__builtin_ia32_vcvttss2si64(A, B))
-
-#define _mm_cvtt_roundss_si64(A, B)  \
-    ((long long)__builtin_ia32_vcvttss2si64(A, B))
-#endif
-#endif
 
-#ifdef __OPTIMIZE__
-extern __inline unsigned
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundss_u32 (__m128 __A, const int __R)
+_mm512_cvtepi64_epi16 (__m512i __A)
 {
-  return (unsigned) __builtin_ia32_vcvtss2usi32 ((__v4sf) __A, __R);
+  return (__m128i) __builtin_ia32_pmovqw512_mask ((__v8di) __A,
+						  (__v8hi)
+						  _mm_undefined_si128 (),
+						  (__mmask8) -1);
 }
 
-extern __inline int
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundss_si32 (__m128 __A, const int __R)
+_mm512_mask_cvtepi64_storeu_epi16 (void *__P, __mmask8 __M, __m512i __A)
 {
-  return (int) __builtin_ia32_vcvtss2si32 ((__v4sf) __A, __R);
+  __builtin_ia32_pmovqw512mem_mask ((__v8hi *) __P, (__v8di) __A, __M);
 }
 
-extern __inline int
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundss_i32 (__m128 __A, const int __R)
+_mm512_mask_cvtepi64_epi16 (__m128i __O, __mmask8 __M, __m512i __A)
 {
-  return (int) __builtin_ia32_vcvtss2si32 ((__v4sf) __A, __R);
+  return (__m128i) __builtin_ia32_pmovqw512_mask ((__v8di) __A,
+						  (__v8hi) __O, __M);
 }
 
-extern __inline unsigned
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundss_u32 (__m128 __A, const int __R)
+_mm512_maskz_cvtepi64_epi16 (__mmask8 __M, __m512i __A)
 {
-  return (unsigned) __builtin_ia32_vcvttss2usi32 ((__v4sf) __A, __R);
+  return (__m128i) __builtin_ia32_pmovqw512_mask ((__v8di) __A,
+						  (__v8hi)
+						  _mm_setzero_si128 (),
+						  __M);
 }
 
-extern __inline int
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundss_i32 (__m128 __A, const int __R)
+_mm512_cvtsepi64_epi16 (__m512i __A)
 {
-  return (int) __builtin_ia32_vcvttss2si32 ((__v4sf) __A, __R);
+  return (__m128i) __builtin_ia32_pmovsqw512_mask ((__v8di) __A,
+						   (__v8hi)
+						   _mm_undefined_si128 (),
+						   (__mmask8) -1);
 }
 
-extern __inline int
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundss_si32 (__m128 __A, const int __R)
+_mm512_mask_cvtsepi64_storeu_epi16 (void * __P, __mmask8 __M, __m512i __A)
 {
-  return (int) __builtin_ia32_vcvttss2si32 ((__v4sf) __A, __R);
+  __builtin_ia32_pmovsqw512mem_mask ((__v8hi *) __P, (__v8di) __A, __M);
 }
-#else
-#define _mm_cvt_roundss_u32(A, B)   \
-    ((unsigned)__builtin_ia32_vcvtss2usi32(A, B))
-
-#define _mm_cvt_roundss_si32(A, B)   \
-    ((int)__builtin_ia32_vcvtss2si32(A, B))
-
-#define _mm_cvt_roundss_i32(A, B)   \
-    ((int)__builtin_ia32_vcvtss2si32(A, B))
-
-#define _mm_cvtt_roundss_u32(A, B)  \
-    ((unsigned)__builtin_ia32_vcvttss2usi32(A, B))
-
-#define _mm_cvtt_roundss_si32(A, B)  \
-    ((int)__builtin_ia32_vcvttss2si32(A, B))
-
-#define _mm_cvtt_roundss_i32(A, B)  \
-    ((int)__builtin_ia32_vcvttss2si32(A, B))
-#endif
 
-#ifdef __x86_64__
-#ifdef __OPTIMIZE__
-extern __inline unsigned long long
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsd_u64 (__m128d __A, const int __R)
+_mm512_mask_cvtsepi64_epi16 (__m128i __O, __mmask8 __M, __m512i __A)
 {
-  return (unsigned long long) __builtin_ia32_vcvtsd2usi64 ((__v2df) __A, __R);
+  return (__m128i) __builtin_ia32_pmovsqw512_mask ((__v8di) __A,
+						   (__v8hi) __O, __M);
 }
 
-extern __inline long long
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsd_si64 (__m128d __A, const int __R)
+_mm512_maskz_cvtsepi64_epi16 (__mmask8 __M, __m512i __A)
 {
-  return (long long) __builtin_ia32_vcvtsd2si64 ((__v2df) __A, __R);
+  return (__m128i) __builtin_ia32_pmovsqw512_mask ((__v8di) __A,
+						   (__v8hi)
+						   _mm_setzero_si128 (),
+						   __M);
 }
 
-extern __inline long long
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsd_i64 (__m128d __A, const int __R)
+_mm512_cvtusepi64_epi16 (__m512i __A)
 {
-  return (long long) __builtin_ia32_vcvtsd2si64 ((__v2df) __A, __R);
+  return (__m128i) __builtin_ia32_pmovusqw512_mask ((__v8di) __A,
+						    (__v8hi)
+						    _mm_undefined_si128 (),
+						    (__mmask8) -1);
 }
 
-extern __inline unsigned long long
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundsd_u64 (__m128d __A, const int __R)
+_mm512_mask_cvtusepi64_storeu_epi16 (void *__P, __mmask8 __M, __m512i __A)
 {
-  return (unsigned long long) __builtin_ia32_vcvttsd2usi64 ((__v2df) __A, __R);
+  __builtin_ia32_pmovusqw512mem_mask ((__v8hi *) __P, (__v8di) __A, __M);
 }
 
-extern __inline long long
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundsd_si64 (__m128d __A, const int __R)
+_mm512_mask_cvtusepi64_epi16 (__m128i __O, __mmask8 __M, __m512i __A)
 {
-  return (long long) __builtin_ia32_vcvttsd2si64 ((__v2df) __A, __R);
+  return (__m128i) __builtin_ia32_pmovusqw512_mask ((__v8di) __A,
+						    (__v8hi) __O, __M);
 }
 
-extern __inline long long
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundsd_i64 (__m128d __A, const int __R)
+_mm512_maskz_cvtusepi64_epi16 (__mmask8 __M, __m512i __A)
 {
-  return (long long) __builtin_ia32_vcvttsd2si64 ((__v2df) __A, __R);
+  return (__m128i) __builtin_ia32_pmovusqw512_mask ((__v8di) __A,
+						    (__v8hi)
+						    _mm_setzero_si128 (),
+						    __M);
 }
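
For reference, a usage sketch of the narrowing converts above
(illustrative code, not part of the patch; the wrapper name is made up):

  #include <immintrin.h>

  /* Truncate eight 64-bit lanes to 16 bits each; the cvtsepi64 and
     cvtusepi64 forms saturate (signed/unsigned) instead.  */
  __m128i
  pack_qw (__m512i v)
  {
    return _mm512_cvtepi64_epi16 (v);
  }
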
-#else
-#define _mm_cvt_roundsd_u64(A, B)   \
-    ((unsigned long long)__builtin_ia32_vcvtsd2usi64(A, B))
-
-#define _mm_cvt_roundsd_si64(A, B)   \
-    ((long long)__builtin_ia32_vcvtsd2si64(A, B))
-
-#define _mm_cvt_roundsd_i64(A, B)   \
-    ((long long)__builtin_ia32_vcvtsd2si64(A, B))
-
-#define _mm_cvtt_roundsd_u64(A, B)   \
-    ((unsigned long long)__builtin_ia32_vcvttsd2usi64(A, B))
-
-#define _mm_cvtt_roundsd_si64(A, B)   \
-    ((long long)__builtin_ia32_vcvttsd2si64(A, B))
-
-#define _mm_cvtt_roundsd_i64(A, B)   \
-    ((long long)__builtin_ia32_vcvttsd2si64(A, B))
-#endif
-#endif
 
-#ifdef __OPTIMIZE__
-extern __inline unsigned
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsd_u32 (__m128d __A, const int __R)
+_mm512_cvtepi64_epi8 (__m512i __A)
 {
-  return (unsigned) __builtin_ia32_vcvtsd2usi32 ((__v2df) __A, __R);
+  return (__m128i) __builtin_ia32_pmovqb512_mask ((__v8di) __A,
+						  (__v16qi)
+						  _mm_undefined_si128 (),
+						  (__mmask8) -1);
 }
 
-extern __inline int
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsd_si32 (__m128d __A, const int __R)
+_mm512_mask_cvtepi64_storeu_epi8 (void *__P, __mmask8 __M, __m512i __A)
 {
-  return (int) __builtin_ia32_vcvtsd2si32 ((__v2df) __A, __R);
+  __builtin_ia32_pmovqb512mem_mask ((unsigned long long *) __P,
+				    (__v8di) __A, __M);
 }
 
-extern __inline int
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsd_i32 (__m128d __A, const int __R)
+_mm512_mask_cvtepi64_epi8 (__m128i __O, __mmask8 __M, __m512i __A)
 {
-  return (int) __builtin_ia32_vcvtsd2si32 ((__v2df) __A, __R);
+  return (__m128i) __builtin_ia32_pmovqb512_mask ((__v8di) __A,
+						  (__v16qi) __O, __M);
 }
 
-extern __inline unsigned
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundsd_u32 (__m128d __A, const int __R)
+_mm512_maskz_cvtepi64_epi8 (__mmask8 __M, __m512i __A)
 {
-  return (unsigned) __builtin_ia32_vcvttsd2usi32 ((__v2df) __A, __R);
+  return (__m128i) __builtin_ia32_pmovqb512_mask ((__v8di) __A,
+						  (__v16qi)
+						  _mm_setzero_si128 (),
+						  __M);
 }
 
-extern __inline int
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundsd_i32 (__m128d __A, const int __R)
+_mm512_cvtsepi64_epi8 (__m512i __A)
 {
-  return (int) __builtin_ia32_vcvttsd2si32 ((__v2df) __A, __R);
+  return (__m128i) __builtin_ia32_pmovsqb512_mask ((__v8di) __A,
+						   (__v16qi)
+						   _mm_undefined_si128 (),
+						   (__mmask8) -1);
 }
 
-extern __inline int
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundsd_si32 (__m128d __A, const int __R)
+_mm512_mask_cvtsepi64_storeu_epi8 (void *__P, __mmask8 __M, __m512i __A)
 {
-  return (int) __builtin_ia32_vcvttsd2si32 ((__v2df) __A, __R);
+  __builtin_ia32_pmovsqb512mem_mask ((unsigned long long *) __P,
+				     (__v8di) __A, __M);
 }
-#else
-#define _mm_cvt_roundsd_u32(A, B)   \
-    ((unsigned)__builtin_ia32_vcvtsd2usi32(A, B))
-
-#define _mm_cvt_roundsd_si32(A, B)   \
-    ((int)__builtin_ia32_vcvtsd2si32(A, B))
 
-#define _mm_cvt_roundsd_i32(A, B)   \
-    ((int)__builtin_ia32_vcvtsd2si32(A, B))
-
-#define _mm_cvtt_roundsd_u32(A, B)   \
-    ((unsigned)__builtin_ia32_vcvttsd2usi32(A, B))
-
-#define _mm_cvtt_roundsd_si32(A, B)   \
-    ((int)__builtin_ia32_vcvttsd2si32(A, B))
-
-#define _mm_cvtt_roundsd_i32(A, B)   \
-    ((int)__builtin_ia32_vcvttsd2si32(A, B))
-#endif
-
-extern __inline __m512d
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_movedup_pd (__m512d __A)
+_mm512_mask_cvtsepi64_epi8 (__m128i __O, __mmask8 __M, __m512i __A)
 {
-  return (__m512d) __builtin_ia32_movddup512_mask ((__v8df) __A,
-						   (__v8df)
-						   _mm512_undefined_pd (),
-						   (__mmask8) -1);
+  return (__m128i) __builtin_ia32_pmovsqb512_mask ((__v8di) __A,
+						   (__v16qi) __O, __M);
 }
 
-extern __inline __m512d
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_movedup_pd (__m512d __W, __mmask8 __U, __m512d __A)
+_mm512_maskz_cvtsepi64_epi8 (__mmask8 __M, __m512i __A)
 {
-  return (__m512d) __builtin_ia32_movddup512_mask ((__v8df) __A,
-						   (__v8df) __W,
-						   (__mmask8) __U);
+  return (__m128i) __builtin_ia32_pmovsqb512_mask ((__v8di) __A,
+						   (__v16qi)
+						   _mm_setzero_si128 (),
+						   __M);
 }
 
-extern __inline __m512d
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_movedup_pd (__mmask8 __U, __m512d __A)
+_mm512_cvtusepi64_epi8 (__m512i __A)
 {
-  return (__m512d) __builtin_ia32_movddup512_mask ((__v8df) __A,
-						   (__v8df)
-						   _mm512_setzero_pd (),
-						   (__mmask8) __U);
+  return (__m128i) __builtin_ia32_pmovusqb512_mask ((__v8di) __A,
+						    (__v16qi)
+						    _mm_undefined_si128 (),
+						    (__mmask8) -1);
 }
 
-extern __inline __m512d
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_unpacklo_pd (__m512d __A, __m512d __B)
+_mm512_mask_cvtusepi64_storeu_epi8 (void *__P, __mmask8 __M, __m512i __A)
 {
-  return (__m512d) __builtin_ia32_unpcklpd512_mask ((__v8df) __A,
-						    (__v8df) __B,
-						    (__v8df)
-						    _mm512_undefined_pd (),
-						    (__mmask8) -1);
+  __builtin_ia32_pmovusqb512mem_mask ((unsigned long long *) __P,
+				      (__v8di) __A, __M);
 }
 
-extern __inline __m512d
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_unpacklo_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
+_mm512_mask_cvtusepi64_epi8 (__m128i __O, __mmask8 __M, __m512i __A)
 {
-  return (__m512d) __builtin_ia32_unpcklpd512_mask ((__v8df) __A,
-						    (__v8df) __B,
-						    (__v8df) __W,
-						    (__mmask8) __U);
+  return (__m128i) __builtin_ia32_pmovusqb512_mask ((__v8di) __A,
+						    (__v16qi) __O,
+						    __M);
 }
 
-extern __inline __m512d
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_unpacklo_pd (__mmask8 __U, __m512d __A, __m512d __B)
+_mm512_maskz_cvtusepi64_epi8 (__mmask8 __M, __m512i __A)
 {
-  return (__m512d) __builtin_ia32_unpcklpd512_mask ((__v8df) __A,
-						    (__v8df) __B,
-						    (__v8df)
-						    _mm512_setzero_pd (),
-						    (__mmask8) __U);
+  return (__m128i) __builtin_ia32_pmovusqb512_mask ((__v8di) __A,
+						    (__v16qi)
+						    _mm_setzero_si128 (),
+						    __M);
 }
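
Similarly, a sketch of the masked down-converting stores (hypothetical
wrapper, not from the patch):

  #include <immintrin.h>

  /* Write the saturated byte for each lane selected by m to an
     unaligned buffer; bytes for unselected lanes are not touched.  */
  void
  store_sat_bytes (void *dst, __mmask8 m, __m512i v)
  {
    _mm512_mask_cvtsepi64_storeu_epi8 (dst, m, v);
  }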
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_unpackhi_pd (__m512d __A, __m512d __B)
+_mm512_cvtepi32_pd (__m256i __A)
 {
-  return (__m512d) __builtin_ia32_unpckhpd512_mask ((__v8df) __A,
-						    (__v8df) __B,
+  return (__m512d) __builtin_ia32_cvtdq2pd512_mask ((__v8si) __A,
 						    (__v8df)
 						    _mm512_undefined_pd (),
 						    (__mmask8) -1);
@@ -8495,93 +8817,88 @@ _mm512_unpackhi_pd (__m512d __A, __m512d __B)
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_unpackhi_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
+_mm512_mask_cvtepi32_pd (__m512d __W, __mmask8 __U, __m256i __A)
 {
-  return (__m512d) __builtin_ia32_unpckhpd512_mask ((__v8df) __A,
-						    (__v8df) __B,
+  return (__m512d) __builtin_ia32_cvtdq2pd512_mask ((__v8si) __A,
 						    (__v8df) __W,
 						    (__mmask8) __U);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_unpackhi_pd (__mmask8 __U, __m512d __A, __m512d __B)
+_mm512_maskz_cvtepi32_pd (__mmask8 __U, __m256i __A)
 {
-  return (__m512d) __builtin_ia32_unpckhpd512_mask ((__v8df) __A,
-						    (__v8df) __B,
+  return (__m512d) __builtin_ia32_cvtdq2pd512_mask ((__v8si) __A,
 						    (__v8df)
 						    _mm512_setzero_pd (),
 						    (__mmask8) __U);
 }
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_unpackhi_ps (__m512 __A, __m512 __B)
+_mm512_cvtepu32_pd (__m256i __A)
 {
-  return (__m512) __builtin_ia32_unpckhps512_mask ((__v16sf) __A,
-						   (__v16sf) __B,
-						   (__v16sf)
-						   _mm512_undefined_ps (),
-						   (__mmask16) -1);
+  return (__m512d) __builtin_ia32_cvtudq2pd512_mask ((__v8si) __A,
+						     (__v8df)
+						     _mm512_undefined_pd (),
+						     (__mmask8) -1);
 }
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_unpackhi_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
+_mm512_mask_cvtepu32_pd (__m512d __W, __mmask8 __U, __m256i __A)
 {
-  return (__m512) __builtin_ia32_unpckhps512_mask ((__v16sf) __A,
-						   (__v16sf) __B,
-						   (__v16sf) __W,
-						   (__mmask16) __U);
+  return (__m512d) __builtin_ia32_cvtudq2pd512_mask ((__v8si) __A,
+						     (__v8df) __W,
+						     (__mmask8) __U);
 }
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_unpackhi_ps (__mmask16 __U, __m512 __A, __m512 __B)
+_mm512_maskz_cvtepu32_pd (__mmask8 __U, __m256i __A)
 {
-  return (__m512) __builtin_ia32_unpckhps512_mask ((__v16sf) __A,
-						   (__v16sf) __B,
-						   (__v16sf)
-						   _mm512_setzero_ps (),
-						   (__mmask16) __U);
+  return (__m512d) __builtin_ia32_cvtudq2pd512_mask ((__v8si) __A,
+						     (__v8df)
+						     _mm512_setzero_pd (),
+						     (__mmask8) __U);
 }
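
A sketch of the widening int-to-double converts (illustrative only):

  #include <immintrin.h>

  /* Widen eight signed 32-bit ints to doubles; _mm512_cvtepu32_pd
     is the unsigned counterpart.  */
  __m512d
  widen8 (__m256i a)
  {
    return _mm512_cvtepi32_pd (a);
  }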
 
 #ifdef __OPTIMIZE__
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundps_pd (__m256 __A, const int __R)
+_mm512_cvt_roundepi32_ps (__m512i __A, const int __R)
 {
-  return (__m512d) __builtin_ia32_cvtps2pd512_mask ((__v8sf) __A,
-						    (__v8df)
-						    _mm512_undefined_pd (),
-						    (__mmask8) -1, __R);
+  return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
+						   (__v16sf)
+						   _mm512_undefined_ps (),
+						   (__mmask16) -1, __R);
 }
 
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundps_pd (__m512d __W, __mmask8 __U, __m256 __A,
-			    const int __R)
+_mm512_mask_cvt_roundepi32_ps (__m512 __W, __mmask16 __U, __m512i __A,
+			       const int __R)
 {
-  return (__m512d) __builtin_ia32_cvtps2pd512_mask ((__v8sf) __A,
-						    (__v8df) __W,
-						    (__mmask8) __U, __R);
+  return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
+						   (__v16sf) __W,
+						   (__mmask16) __U, __R);
 }
 
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundps_pd (__mmask8 __U, __m256 __A, const int __R)
+_mm512_maskz_cvt_roundepi32_ps (__mmask16 __U, __m512i __A, const int __R)
 {
-  return (__m512d) __builtin_ia32_cvtps2pd512_mask ((__v8sf) __A,
-						    (__v8df)
-						    _mm512_setzero_pd (),
-						    (__mmask8) __U, __R);
+  return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
+						   (__v16sf)
+						   _mm512_setzero_ps (),
+						   (__mmask16) __U, __R);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundph_ps (__m256i __A, const int __R)
+_mm512_cvt_roundepu32_ps (__m512i __A, const int __R)
 {
-  return (__m512) __builtin_ia32_vcvtph2ps512_mask ((__v16hi) __A,
+  return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A,
 						    (__v16sf)
 						    _mm512_undefined_ps (),
 						    (__mmask16) -1, __R);
@@ -8589,2829 +8906,2850 @@ _mm512_cvt_roundph_ps (__m256i __A, const int __R)
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundph_ps (__m512 __W, __mmask16 __U, __m256i __A,
-			    const int __R)
+_mm512_mask_cvt_roundepu32_ps (__m512 __W, __mmask16 __U, __m512i __A,
+			       const int __R)
 {
-  return (__m512) __builtin_ia32_vcvtph2ps512_mask ((__v16hi) __A,
+  return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A,
 						    (__v16sf) __W,
 						    (__mmask16) __U, __R);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundph_ps (__mmask16 __U, __m256i __A, const int __R)
+_mm512_maskz_cvt_roundepu32_ps (__mmask16 __U, __m512i __A, const int __R)
 {
-  return (__m512) __builtin_ia32_vcvtph2ps512_mask ((__v16hi) __A,
+  return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A,
 						    (__v16sf)
 						    _mm512_setzero_ps (),
 						    (__mmask16) __U, __R);
 }
 
-extern __inline __m256i
+#else
+#define _mm512_cvt_roundepi32_ps(A, B)        \
+    (__m512)__builtin_ia32_cvtdq2ps512_mask((__v16si)(A), (__v16sf)_mm512_undefined_ps(), -1, B)
+
+#define _mm512_mask_cvt_roundepi32_ps(W, U, A, B)   \
+    (__m512)__builtin_ia32_cvtdq2ps512_mask((__v16si)(A), (__v16sf)(W), U, B)
+
+#define _mm512_maskz_cvt_roundepi32_ps(U, A, B)      \
+    (__m512)__builtin_ia32_cvtdq2ps512_mask((__v16si)(A), (__v16sf)_mm512_setzero_ps(), U, B)
+
+#define _mm512_cvt_roundepu32_ps(A, B)        \
+    (__m512)__builtin_ia32_cvtudq2ps512_mask((__v16si)(A), (__v16sf)_mm512_undefined_ps(), -1, B)
+
+#define _mm512_mask_cvt_roundepu32_ps(W, U, A, B)   \
+    (__m512)__builtin_ia32_cvtudq2ps512_mask((__v16si)(A), (__v16sf)(W), U, B)
+
+#define _mm512_maskz_cvt_roundepu32_ps(U, A, B)      \
+    (__m512)__builtin_ia32_cvtudq2ps512_mask((__v16si)(A), (__v16sf)_mm512_setzero_ps(), U, B)
+#endif
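
A sketch of the explicit-rounding converts (illustrative only); __R has
to be a compile-time constant, which is why the !__OPTIMIZE__ path
falls back to macros:

  #include <immintrin.h>

  /* int -> float, round to nearest, exceptions suppressed.  */
  __m512
  to_ps_rn (__m512i a)
  {
    return _mm512_cvt_roundepi32_ps (a, _MM_FROUND_TO_NEAREST_INT
					| _MM_FROUND_NO_EXC);
  }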
+
+#ifdef __OPTIMIZE__
+extern __inline __m256d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundps_ph (__m512 __A, const int __I)
+_mm512_extractf64x4_pd (__m512d __A, const int __imm)
 {
-  return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A,
-						     __I,
-						     (__v16hi)
-						     _mm256_undefined_si256 (),
-						     -1);
+  return (__m256d) __builtin_ia32_extractf64x4_mask ((__v8df) __A,
+						     __imm,
+						     (__v4df)
+						     _mm256_undefined_pd (),
+						     (__mmask8) -1);
 }
 
-extern __inline __m256i
+extern __inline __m256d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtps_ph (__m512 __A, const int __I)
+_mm512_mask_extractf64x4_pd (__m256d __W, __mmask8 __U, __m512d __A,
+			     const int __imm)
 {
-  return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A,
-						     __I,
-						     (__v16hi)
-						     _mm256_undefined_si256 (),
-						     -1);
+  return (__m256d) __builtin_ia32_extractf64x4_mask ((__v8df) __A,
+						     __imm,
+						     (__v4df) __W,
+						     (__mmask8) __U);
 }
 
-extern __inline __m256i
+extern __inline __m256d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundps_ph (__m256i __U, __mmask16 __W, __m512 __A,
-			    const int __I)
+_mm512_maskz_extractf64x4_pd (__mmask8 __U, __m512d __A, const int __imm)
 {
-  return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A,
-						     __I,
-						     (__v16hi) __U,
-						     (__mmask16) __W);
+  return (__m256d) __builtin_ia32_extractf64x4_mask ((__v8df) __A,
+						     __imm,
+						     (__v4df)
+						     _mm256_setzero_pd (),
+						     (__mmask8) __U);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_extractf32x4_ps (__m512 __A, const int __imm)
+{
+  return (__m128) __builtin_ia32_extractf32x4_mask ((__v16sf) __A,
+						    __imm,
+						    (__v4sf)
+						    _mm_undefined_ps (),
+						    (__mmask8) -1);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_extractf32x4_ps (__m128 __W, __mmask8 __U, __m512 __A,
+			     const int __imm)
+{
+  return (__m128) __builtin_ia32_extractf32x4_mask ((__v16sf) __A,
+						    __imm,
+						    (__v4sf) __W,
+						    (__mmask8) __U);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_extractf32x4_ps (__mmask8 __U, __m512 __A, const int __imm)
+{
+  return (__m128) __builtin_ia32_extractf32x4_mask ((__v16sf) __A,
+						    __imm,
+						    (__v4sf)
+						    _mm_setzero_ps (),
+						    (__mmask8) __U);
 }
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtps_ph (__m256i __U, __mmask16 __W, __m512 __A, const int __I)
+_mm512_extracti64x4_epi64 (__m512i __A, const int __imm)
 {
-  return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A,
-						     __I,
-						     (__v16hi) __U,
-						     (__mmask16) __W);
+  return (__m256i) __builtin_ia32_extracti64x4_mask ((__v8di) __A,
+						     __imm,
+						     (__v4di)
+						     _mm256_undefined_si256 (),
+						     (__mmask8) -1);
 }
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundps_ph (__mmask16 __W, __m512 __A, const int __I)
+_mm512_mask_extracti64x4_epi64 (__m256i __W, __mmask8 __U, __m512i __A,
+				const int __imm)
 {
-  return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A,
-						     __I,
-						     (__v16hi)
-						     _mm256_setzero_si256 (),
-						     (__mmask16) __W);
+  return (__m256i) __builtin_ia32_extracti64x4_mask ((__v8di) __A,
+						     __imm,
+						     (__v4di) __W,
+						     (__mmask8) __U);
 }
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtps_ph (__mmask16 __W, __m512 __A, const int __I)
+_mm512_maskz_extracti64x4_epi64 (__mmask8 __U, __m512i __A, const int __imm)
 {
-  return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A,
-						     __I,
-						     (__v16hi)
+  return (__m256i) __builtin_ia32_extracti64x4_mask ((__v8di) __A,
+						     __imm,
+						     (__v4di)
 						     _mm256_setzero_si256 (),
-						     (__mmask16) __W);
+						     (__mmask8) __U);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_extracti32x4_epi32 (__m512i __A, const int __imm)
+{
+  return (__m128i) __builtin_ia32_extracti32x4_mask ((__v16si) __A,
+						     __imm,
+						     (__v4si)
+						     _mm_undefined_si128 (),
+						     (__mmask8) -1);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_extracti32x4_epi32 (__m128i __W, __mmask8 __U, __m512i __A,
+				const int __imm)
+{
+  return (__m128i) __builtin_ia32_extracti32x4_mask ((__v16si) __A,
+						     __imm,
+						     (__v4si) __W,
+						     (__mmask8) __U);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_extracti32x4_epi32 (__mmask8 __U, __m512i __A, const int __imm)
+{
+  return (__m128i) __builtin_ia32_extracti32x4_mask ((__v16si) __A,
+						     __imm,
+						     (__v4si)
+						     _mm_setzero_si128 (),
+						     (__mmask8) __U);
 }
 #else
-#define _mm512_cvt_roundps_pd(A, B)		 \
-    (__m512d)__builtin_ia32_cvtps2pd512_mask(A, (__v8df)_mm512_undefined_pd(), -1, B)
 
-#define _mm512_mask_cvt_roundps_pd(W, U, A, B)   \
-    (__m512d)__builtin_ia32_cvtps2pd512_mask(A, (__v8df)(W), U, B)
+#define _mm512_extractf64x4_pd(X, C)                                    \
+  ((__m256d) __builtin_ia32_extractf64x4_mask ((__v8df)(__m512d) (X),   \
+    (int) (C),\
+    (__v4df)(__m256d)_mm256_undefined_pd(),\
+    (__mmask8)-1))
 
-#define _mm512_maskz_cvt_roundps_pd(U, A, B)     \
-    (__m512d)__builtin_ia32_cvtps2pd512_mask(A, (__v8df)_mm512_setzero_pd(), U, B)
+#define _mm512_mask_extractf64x4_pd(W, U, X, C)                         \
+  ((__m256d) __builtin_ia32_extractf64x4_mask ((__v8df)(__m512d) (X),   \
+    (int) (C),\
+    (__v4df)(__m256d)(W),\
+    (__mmask8)(U)))
 
-#define _mm512_cvt_roundph_ps(A, B)		 \
-    (__m512)__builtin_ia32_vcvtph2ps512_mask((__v16hi)(A), (__v16sf)_mm512_undefined_ps(), -1, B)
+#define _mm512_maskz_extractf64x4_pd(U, X, C)                           \
+  ((__m256d) __builtin_ia32_extractf64x4_mask ((__v8df)(__m512d) (X),   \
+    (int) (C),\
+    (__v4df)(__m256d)_mm256_setzero_pd(),\
+    (__mmask8)(U)))
 
-#define _mm512_mask_cvt_roundph_ps(W, U, A, B)   \
-    (__m512)__builtin_ia32_vcvtph2ps512_mask((__v16hi)(A), (__v16sf)(W), U, B)
+#define _mm512_extractf32x4_ps(X, C)                                    \
+  ((__m128) __builtin_ia32_extractf32x4_mask ((__v16sf)(__m512) (X),    \
+    (int) (C),\
+    (__v4sf)(__m128)_mm_undefined_ps(),\
+    (__mmask8)-1))
 
-#define _mm512_maskz_cvt_roundph_ps(U, A, B)     \
-    (__m512)__builtin_ia32_vcvtph2ps512_mask((__v16hi)(A), (__v16sf)_mm512_setzero_ps(), U, B)
+#define _mm512_mask_extractf32x4_ps(W, U, X, C)                         \
+  ((__m128) __builtin_ia32_extractf32x4_mask ((__v16sf)(__m512) (X),    \
+    (int) (C),\
+    (__v4sf)(__m128)(W),\
+    (__mmask8)(U)))
 
-#define _mm512_cvt_roundps_ph(A, I)						 \
-  ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) (A), (int) (I),\
-    (__v16hi)_mm256_undefined_si256 (), -1))
-#define _mm512_cvtps_ph(A, I)						 \
-  ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) (A), (int) (I),\
-    (__v16hi)_mm256_undefined_si256 (), -1))
-#define _mm512_mask_cvt_roundps_ph(U, W, A, I)				 \
-  ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) (A), (int) (I),\
-    (__v16hi)(__m256i)(U), (__mmask16) (W)))
-#define _mm512_mask_cvtps_ph(U, W, A, I)				 \
-  ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) (A), (int) (I),\
-    (__v16hi)(__m256i)(U), (__mmask16) (W)))
-#define _mm512_maskz_cvt_roundps_ph(W, A, I)					 \
-  ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) (A), (int) (I),\
-    (__v16hi)_mm256_setzero_si256 (), (__mmask16) (W)))
-#define _mm512_maskz_cvtps_ph(W, A, I)					 \
-  ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) (A), (int) (I),\
-    (__v16hi)_mm256_setzero_si256 (), (__mmask16) (W)))
+#define _mm512_maskz_extractf32x4_ps(U, X, C)                           \
+  ((__m128) __builtin_ia32_extractf32x4_mask ((__v16sf)(__m512) (X),    \
+    (int) (C),\
+    (__v4sf)(__m128)_mm_setzero_ps(),\
+    (__mmask8)(U)))
+
+#define _mm512_extracti64x4_epi64(X, C)                                 \
+  ((__m256i) __builtin_ia32_extracti64x4_mask ((__v8di)(__m512i) (X),   \
+    (int) (C),\
+    (__v4di)(__m256i)_mm256_undefined_si256 (),\
+    (__mmask8)-1))
+
+#define _mm512_mask_extracti64x4_epi64(W, U, X, C)                      \
+  ((__m256i) __builtin_ia32_extracti64x4_mask ((__v8di)(__m512i) (X),   \
+    (int) (C),\
+    (__v4di)(__m256i)(W),\
+    (__mmask8)(U)))
+
+#define _mm512_maskz_extracti64x4_epi64(U, X, C)                        \
+  ((__m256i) __builtin_ia32_extracti64x4_mask ((__v8di)(__m512i) (X),   \
+    (int) (C),\
+    (__v4di)(__m256i)_mm256_setzero_si256 (),\
+    (__mmask8)(U)))
+
+#define _mm512_extracti32x4_epi32(X, C)                                 \
+  ((__m128i) __builtin_ia32_extracti32x4_mask ((__v16si)(__m512i) (X),  \
+    (int) (C),\
+    (__v4si)(__m128i)_mm_undefined_si128 (),\
+    (__mmask8)-1))
+
+#define _mm512_mask_extracti32x4_epi32(W, U, X, C)                      \
+  ((__m128i) __builtin_ia32_extracti32x4_mask ((__v16si)(__m512i) (X),  \
+    (int) (C),\
+    (__v4si)(__m128i)(W),\
+    (__mmask8)(U)))
+
+#define _mm512_maskz_extracti32x4_epi32(U, X, C)                        \
+  ((__m128i) __builtin_ia32_extracti32x4_mask ((__v16si)(__m512i) (X),  \
+    (int) (C),\
+    (__v4si)(__m128i)_mm_setzero_si128 (),\
+    (__mmask8)(U)))
 #endif
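
A sketch of the 128/256-bit extracts (illustrative only); the selector
is an immediate, hence the macro forms when !__OPTIMIZE__:

  #include <immintrin.h>

  /* Pull out the third 128-bit group of floats (index 2 of 0..3).  */
  __m128
  third_lane (__m512 v)
  {
    return _mm512_extractf32x4_ps (v, 2);
  }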
 
 #ifdef __OPTIMIZE__
-extern __inline __m256
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundpd_ps (__m512d __A, const int __R)
-{
-  return (__m256) __builtin_ia32_cvtpd2ps512_mask ((__v8df) __A,
-						   (__v8sf)
-						   _mm256_undefined_ps (),
-						   (__mmask8) -1, __R);
-}
-
-extern __inline __m256
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundpd_ps (__m256 __W, __mmask8 __U, __m512d __A,
-			    const int __R)
+_mm512_inserti32x4 (__m512i __A, __m128i __B, const int __imm)
 {
-  return (__m256) __builtin_ia32_cvtpd2ps512_mask ((__v8df) __A,
-						   (__v8sf) __W,
-						   (__mmask8) __U, __R);
+  return (__m512i) __builtin_ia32_inserti32x4_mask ((__v16si) __A,
+						    (__v4si) __B,
+						    __imm,
+						    (__v16si) __A, -1);
 }
 
-extern __inline __m256
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundpd_ps (__mmask8 __U, __m512d __A, const int __R)
+_mm512_insertf32x4 (__m512 __A, __m128 __B, const int __imm)
 {
-  return (__m256) __builtin_ia32_cvtpd2ps512_mask ((__v8df) __A,
-						   (__v8sf)
-						   _mm256_setzero_ps (),
-						   (__mmask8) __U, __R);
+  return (__m512) __builtin_ia32_insertf32x4_mask ((__v16sf) __A,
+						   (__v4sf) __B,
+						   __imm,
+						   (__v16sf) __A, -1);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsd_ss (__m128 __A, __m128d __B, const int __R)
+_mm512_inserti64x4 (__m512i __A, __m256i __B, const int __imm)
 {
-  return (__m128) __builtin_ia32_cvtsd2ss_round ((__v4sf) __A,
-						 (__v2df) __B,
-						 __R);
+  return (__m512i) __builtin_ia32_inserti64x4_mask ((__v8di) __A,
+						    (__v4di) __B,
+						    __imm,
+						    (__v8di)
+						    _mm512_undefined_epi32 (),
+						    (__mmask8) -1);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cvt_roundsd_ss (__m128 __W, __mmask8 __U, __m128 __A,
-			 __m128d __B, const int __R)
+_mm512_mask_inserti64x4 (__m512i __W, __mmask8 __U, __m512i __A,
+			 __m256i __B, const int __imm)
 {
-  return (__m128) __builtin_ia32_cvtsd2ss_mask_round ((__v4sf) __A,
-						      (__v2df) __B,
-						      (__v4sf) __W,
-						      __U,
-						      __R);
+  return (__m512i) __builtin_ia32_inserti64x4_mask ((__v8di) __A,
+						    (__v4di) __B,
+						    __imm,
+						    (__v8di) __W,
+						    (__mmask8) __U);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_cvt_roundsd_ss (__mmask8 __U, __m128 __A,
-			 __m128d __B, const int __R)
+_mm512_maskz_inserti64x4 (__mmask8 __U, __m512i __A, __m256i __B,
+			  const int __imm)
 {
-  return (__m128) __builtin_ia32_cvtsd2ss_mask_round ((__v4sf) __A,
-						      (__v2df) __B,
-						      _mm_setzero_ps (),
-						      __U,
-						      __R);
+  return (__m512i) __builtin_ia32_inserti64x4_mask ((__v8di) __A,
+						    (__v4di) __B,
+						    __imm,
+						    (__v8di)
+						    _mm512_setzero_si512 (),
+						    (__mmask8) __U);
 }
 
-extern __inline __m128d
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundss_sd (__m128d __A, __m128 __B, const int __R)
+_mm512_insertf64x4 (__m512d __A, __m256d __B, const int __imm)
 {
-  return (__m128d) __builtin_ia32_cvtss2sd_round ((__v2df) __A,
-						  (__v4sf) __B,
-						  __R);
+  return (__m512d) __builtin_ia32_insertf64x4_mask ((__v8df) __A,
+						    (__v4df) __B,
+						    __imm,
+						    (__v8df)
+						    _mm512_undefined_pd (),
+						    (__mmask8) -1);
 }
 
-extern __inline __m128d
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cvt_roundss_sd (__m128d __W, __mmask8 __U, __m128d __A,
-			 __m128 __B, const int __R)
+_mm512_mask_insertf64x4 (__m512d __W, __mmask8 __U, __m512d __A,
+			 __m256d __B, const int __imm)
 {
-  return (__m128d) __builtin_ia32_cvtss2sd_mask_round ((__v2df) __A,
-						       (__v4sf) __B,
-						       (__v2df) __W,
-						       __U,
-						       __R);
+  return (__m512d) __builtin_ia32_insertf64x4_mask ((__v8df) __A,
+						    (__v4df) __B,
+						    __imm,
+						    (__v8df) __W,
+						    (__mmask8) __U);
 }
 
-extern __inline __m128d
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_cvt_roundss_sd (__mmask8 __U, __m128d __A,
-			  __m128 __B, const int __R)
+_mm512_maskz_insertf64x4 (__mmask8 __U, __m512d __A, __m256d __B,
+			  const int __imm)
 {
-  return (__m128d) __builtin_ia32_cvtss2sd_mask_round ((__v2df) __A,
-						       (__v4sf) __B,
-						       _mm_setzero_pd (),
-						       __U,
-						       __R);
+  return (__m512d) __builtin_ia32_insertf64x4_mask ((__v8df) __A,
+						    (__v4df) __B,
+						    __imm,
+						    (__v8df)
+						    _mm512_setzero_pd (),
+						    (__mmask8) __U);
 }
 #else
-#define _mm512_cvt_roundpd_ps(A, B)		 \
-    (__m256)__builtin_ia32_cvtpd2ps512_mask(A, (__v8sf)_mm256_undefined_ps(), -1, B)
-
-#define _mm512_mask_cvt_roundpd_ps(W, U, A, B)   \
-    (__m256)__builtin_ia32_cvtpd2ps512_mask(A, (__v8sf)(W), U, B)
-
-#define _mm512_maskz_cvt_roundpd_ps(U, A, B)     \
-    (__m256)__builtin_ia32_cvtpd2ps512_mask(A, (__v8sf)_mm256_setzero_ps(), U, B)
+#define _mm512_insertf32x4(X, Y, C)                                     \
+  ((__m512) __builtin_ia32_insertf32x4_mask ((__v16sf)(__m512) (X),     \
+    (__v4sf)(__m128) (Y), (int) (C), (__v16sf)(__m512) (X), (__mmask16)(-1)))
 
-#define _mm_cvt_roundsd_ss(A, B, C)		 \
-    (__m128)__builtin_ia32_cvtsd2ss_round(A, B, C)
+#define _mm512_inserti32x4(X, Y, C)                                     \
+  ((__m512i) __builtin_ia32_inserti32x4_mask ((__v16si)(__m512i) (X),   \
+    (__v4si)(__m128i) (Y), (int) (C), (__v16si)(__m512i) (X), (__mmask16)(-1)))
 
-#define _mm_mask_cvt_roundsd_ss(W, U, A, B, C)	\
-    (__m128)__builtin_ia32_cvtsd2ss_mask_round ((A), (B), (W), (U), (C))
+#define _mm512_insertf64x4(X, Y, C)                                     \
+  ((__m512d) __builtin_ia32_insertf64x4_mask ((__v8df)(__m512d) (X),    \
+    (__v4df)(__m256d) (Y), (int) (C),					\
+    (__v8df)(__m512d)_mm512_undefined_pd(),				\
+    (__mmask8)-1))
 
-#define _mm_maskz_cvt_roundsd_ss(U, A, B, C)	\
-    (__m128)__builtin_ia32_cvtsd2ss_mask_round ((A), (B), _mm_setzero_ps (), \
-						(U), (C))
+#define _mm512_mask_insertf64x4(W, U, X, Y, C)                          \
+  ((__m512d) __builtin_ia32_insertf64x4_mask ((__v8df)(__m512d) (X),    \
+    (__v4df)(__m256d) (Y), (int) (C),					\
+    (__v8df)(__m512d)(W),						\
+    (__mmask8)(U)))
 
-#define _mm_cvt_roundss_sd(A, B, C)		 \
-    (__m128d)__builtin_ia32_cvtss2sd_round(A, B, C)
+#define _mm512_maskz_insertf64x4(U, X, Y, C)                            \
+  ((__m512d) __builtin_ia32_insertf64x4_mask ((__v8df)(__m512d) (X),    \
+    (__v4df)(__m256d) (Y), (int) (C),					\
+    (__v8df)(__m512d)_mm512_setzero_pd(),				\
+    (__mmask8)(U)))
 
-#define _mm_mask_cvt_roundss_sd(W, U, A, B, C)	\
-    (__m128d)__builtin_ia32_cvtss2sd_mask_round ((A), (B), (W), (U), (C))
+#define _mm512_inserti64x4(X, Y, C)                                     \
+  ((__m512i) __builtin_ia32_inserti64x4_mask ((__v8di)(__m512i) (X),    \
+    (__v4di)(__m256i) (Y), (int) (C),					\
+    (__v8di)(__m512i)_mm512_undefined_epi32 (),				\
+    (__mmask8)-1))
 
-#define _mm_maskz_cvt_roundss_sd(U, A, B, C)	\
-    (__m128d)__builtin_ia32_cvtss2sd_mask_round ((A), (B), _mm_setzero_pd (), \
-						 (U), (C))
+#define _mm512_mask_inserti64x4(W, U, X, Y, C)                          \
+  ((__m512i) __builtin_ia32_inserti64x4_mask ((__v8di)(__m512i) (X),    \
+    (__v4di)(__m256i) (Y), (int) (C),\
+    (__v8di)(__m512i)(W),\
+    (__mmask8)(U)))
 
+#define _mm512_maskz_inserti64x4(U, X, Y, C)                            \
+  ((__m512i) __builtin_ia32_inserti64x4_mask ((__v8di)(__m512i) (X),    \
+    (__v4di)(__m256i) (Y), (int) (C),					\
+    (__v8di)(__m512i)_mm512_setzero_si512 (),				\
+    (__mmask8)(U)))
 #endif
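
And the matching insert, as a sketch (illustrative only):

  #include <immintrin.h>

  /* Replace the upper 256-bit half of v with h (index 1 of 0..1).  */
  __m512d
  set_high (__m512d v, __m256d h)
  {
    return _mm512_insertf64x4 (v, h, 1);
  }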
 
-#define _mm_mask_cvtss_sd(W, U, A, B) \
-    _mm_mask_cvt_roundss_sd ((W), (U), (A), (B), _MM_FROUND_CUR_DIRECTION)
-
-#define _mm_maskz_cvtss_sd(U, A, B) \
-    _mm_maskz_cvt_roundss_sd ((U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_loadu_pd (void const *__P)
+{
+  return *(__m512d_u *)__P;
+}
 
-#define _mm_mask_cvtsd_ss(W, U, A, B) \
-    _mm_mask_cvt_roundsd_ss ((W), (U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_loadu_pd (__m512d __W, __mmask8 __U, void const *__P)
+{
+  return (__m512d) __builtin_ia32_loadupd512_mask ((const double *) __P,
+						   (__v8df) __W,
+						   (__mmask8) __U);
+}
 
-#define _mm_maskz_cvtsd_ss(U, A, B) \
-    _mm_maskz_cvt_roundsd_ss ((U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_loadu_pd (__mmask8 __U, void const *__P)
+{
+  return (__m512d) __builtin_ia32_loadupd512_mask ((const double *) __P,
+						   (__v8df)
+						   _mm512_setzero_pd (),
+						   (__mmask8) __U);
+}
 
 extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_stream_si512 (__m512i * __P, __m512i __A)
+_mm512_storeu_pd (void *__P, __m512d __A)
 {
-  __builtin_ia32_movntdq512 ((__v8di *) __P, (__v8di) __A);
+  *(__m512d_u *)__P = __A;
 }
 
 extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_stream_ps (float *__P, __m512 __A)
+_mm512_mask_storeu_pd (void *__P, __mmask8 __U, __m512d __A)
 {
-  __builtin_ia32_movntps512 (__P, (__v16sf) __A);
+  __builtin_ia32_storeupd512_mask ((double *) __P, (__v8df) __A,
+				   (__mmask8) __U);
 }
 
-extern __inline void
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_stream_pd (double *__P, __m512d __A)
+_mm512_loadu_ps (void const *__P)
 {
-  __builtin_ia32_movntpd512 (__P, (__v8df) __A);
+  return *(__m512_u *)__P;
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_stream_load_si512 (void *__P)
+_mm512_mask_loadu_ps (__m512 __W, __mmask16 __U, void const *__P)
 {
-  return __builtin_ia32_movntdqa512 ((__v8di *)__P);
+  return (__m512) __builtin_ia32_loadups512_mask ((const float *) __P,
+						  (__v16sf) __W,
+						  (__mmask16) __U);
 }
 
-/* Constants for mantissa extraction */
-typedef enum
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_loadu_ps (__mmask16 __U, void const *__P)
 {
-  _MM_MANT_NORM_1_2,		/* interval [1, 2)      */
-  _MM_MANT_NORM_p5_2,		/* interval [0.5, 2)    */
-  _MM_MANT_NORM_p5_1,		/* interval [0.5, 1)    */
-  _MM_MANT_NORM_p75_1p5		/* interval [0.75, 1.5) */
-} _MM_MANTISSA_NORM_ENUM;
+  return (__m512) __builtin_ia32_loadups512_mask ((const float *) __P,
+						  (__v16sf)
+						  _mm512_setzero_ps (),
+						  (__mmask16) __U);
+}
 
-typedef enum
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_storeu_ps (void *__P, __m512 __A)
 {
-  _MM_MANT_SIGN_src,		/* sign = sign(SRC)     */
-  _MM_MANT_SIGN_zero,		/* sign = 0             */
-  _MM_MANT_SIGN_nan		/* DEST = NaN if sign(SRC) = 1 */
-} _MM_MANTISSA_SIGN_ENUM;
+  *(__m512_u *)__P = __A;
+}
 
-#ifdef __OPTIMIZE__
-extern __inline __m128
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getexp_round_ss (__m128 __A, __m128 __B, const int __R)
+_mm512_mask_storeu_ps (void *__P, __mmask16 __U, __m512 __A)
 {
-  return (__m128) __builtin_ia32_getexpss128_round ((__v4sf) __A,
-						    (__v4sf) __B,
-						    __R);
+  __builtin_ia32_storeups512_mask ((float *) __P, (__v16sf) __A,
+				   (__mmask16) __U);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getexp_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
-			  __m128 __B, const int __R)
+_mm512_loadu_epi64 (void const *__P)
 {
-  return (__m128) __builtin_ia32_getexpss_mask_round ((__v4sf) __A,
-						 (__v4sf) __B,
-						 (__v4sf) __W,
-						 (__mmask8) __U, __R);
+  return *(__m512i_u *) __P;
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getexp_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
-			   const int __R)
+_mm512_mask_loadu_epi64 (__m512i __W, __mmask8 __U, void const *__P)
 {
-  return (__m128) __builtin_ia32_getexpss_mask_round ((__v4sf) __A,
-						 (__v4sf) __B,
-						 (__v4sf)
-						 _mm_setzero_ps (),
-						 (__mmask8) __U, __R);
+  return (__m512i) __builtin_ia32_loaddqudi512_mask ((const long long *) __P,
+						     (__v8di) __W,
+						     (__mmask8) __U);
 }
 
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getexp_round_sd (__m128d __A, __m128d __B, const int __R)
+_mm512_maskz_loadu_epi64 (__mmask8 __U, void const *__P)
 {
-  return (__m128d) __builtin_ia32_getexpsd128_round ((__v2df) __A,
-						     (__v2df) __B,
-						     __R);
+  return (__m512i) __builtin_ia32_loaddqudi512_mask ((const long long *) __P,
+						     (__v8di)
+						     _mm512_setzero_si512 (),
+						     (__mmask8) __U);
 }
 
-extern __inline __m128d
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getexp_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
-			  __m128d __B, const int __R)
+_mm512_storeu_epi64 (void *__P, __m512i __A)
 {
-  return (__m128d) __builtin_ia32_getexpsd_mask_round ((__v2df) __A,
-						 (__v2df) __B,
-						 (__v2df) __W,
-						 (__mmask8) __U, __R);
+  *(__m512i_u *) __P = (__m512i_u) __A;
 }
 
-extern __inline __m128d
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getexp_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
-			   const int __R)
+_mm512_mask_storeu_epi64 (void *__P, __mmask8 __U, __m512i __A)
 {
-  return (__m128d) __builtin_ia32_getexpsd_mask_round ((__v2df) __A,
-						 (__v2df) __B,
-						 (__v2df)
-						 _mm_setzero_pd (),
-						 (__mmask8) __U, __R);
+  __builtin_ia32_storedqudi512_mask ((long long *) __P, (__v8di) __A,
+				     (__mmask8) __U);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_getexp_round_ps (__m512 __A, const int __R)
+_mm512_loadu_si512 (void const *__P)
 {
-  return (__m512) __builtin_ia32_getexpps512_mask ((__v16sf) __A,
-						   (__v16sf)
-						   _mm512_undefined_ps (),
-						   (__mmask16) -1, __R);
+  return *(__m512i_u *)__P;
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_getexp_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
-			     const int __R)
+_mm512_loadu_epi32 (void const *__P)
 {
-  return (__m512) __builtin_ia32_getexpps512_mask ((__v16sf) __A,
-						   (__v16sf) __W,
-						   (__mmask16) __U, __R);
+  return *(__m512i_u *) __P;
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_getexp_round_ps (__mmask16 __U, __m512 __A, const int __R)
+_mm512_mask_loadu_epi32 (__m512i __W, __mmask16 __U, void const *__P)
 {
-  return (__m512) __builtin_ia32_getexpps512_mask ((__v16sf) __A,
-						   (__v16sf)
-						   _mm512_setzero_ps (),
-						   (__mmask16) __U, __R);
+  return (__m512i) __builtin_ia32_loaddqusi512_mask ((const int *) __P,
+						     (__v16si) __W,
+						     (__mmask16) __U);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_getexp_round_pd (__m512d __A, const int __R)
+_mm512_maskz_loadu_epi32 (__mmask16 __U, void const *__P)
 {
-  return (__m512d) __builtin_ia32_getexppd512_mask ((__v8df) __A,
-						    (__v8df)
-						    _mm512_undefined_pd (),
-						    (__mmask8) -1, __R);
+  return (__m512i) __builtin_ia32_loaddqusi512_mask ((const int *) __P,
+						     (__v16si)
+						     _mm512_setzero_si512 (),
+						     (__mmask16) __U);
 }
 
-extern __inline __m512d
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_getexp_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
-			     const int __R)
+_mm512_storeu_si512 (void *__P, __m512i __A)
 {
-  return (__m512d) __builtin_ia32_getexppd512_mask ((__v8df) __A,
-						    (__v8df) __W,
-						    (__mmask8) __U, __R);
+  *(__m512i_u *)__P = __A;
 }
 
-extern __inline __m512d
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_getexp_round_pd (__mmask8 __U, __m512d __A, const int __R)
+_mm512_storeu_epi32 (void *__P, __m512i __A)
 {
-  return (__m512d) __builtin_ia32_getexppd512_mask ((__v8df) __A,
-						    (__v8df)
-						    _mm512_setzero_pd (),
-						    (__mmask8) __U, __R);
+  *(__m512i_u *) __P = (__m512i_u) __A;
 }
 
-extern __inline __m512d
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_getmant_round_pd (__m512d __A, _MM_MANTISSA_NORM_ENUM __B,
-			 _MM_MANTISSA_SIGN_ENUM __C, const int __R)
+_mm512_mask_storeu_epi32 (void *__P, __mmask16 __U, __m512i __A)
 {
-  return (__m512d) __builtin_ia32_getmantpd512_mask ((__v8df) __A,
-						     (__C << 2) | __B,
-						     _mm512_undefined_pd (),
-						     (__mmask8) -1, __R);
+  __builtin_ia32_storedqusi512_mask ((int *) __P, (__v16si) __A,
+				     (__mmask16) __U);
 }
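
A sketch of the unaligned load/store forms (illustrative only):

  #include <immintrin.h>

  /* Unaligned copy of 16 floats; the mask/maskz variants above touch
     only the selected lanes on both sides.  */
  void
  copy16 (float *dst, const float *src)
  {
    _mm512_storeu_ps (dst, _mm512_loadu_ps (src));
  }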
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_getmant_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
-			      _MM_MANTISSA_NORM_ENUM __B,
-			      _MM_MANTISSA_SIGN_ENUM __C, const int __R)
+_mm512_permutevar_pd (__m512d __A, __m512i __C)
 {
-  return (__m512d) __builtin_ia32_getmantpd512_mask ((__v8df) __A,
-						     (__C << 2) | __B,
-						     (__v8df) __W, __U,
-						     __R);
+  return (__m512d) __builtin_ia32_vpermilvarpd512_mask ((__v8df) __A,
+							(__v8di) __C,
+							(__v8df)
+							_mm512_undefined_pd (),
+							(__mmask8) -1);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_getmant_round_pd (__mmask8 __U, __m512d __A,
-			       _MM_MANTISSA_NORM_ENUM __B,
-			       _MM_MANTISSA_SIGN_ENUM __C, const int __R)
+_mm512_mask_permutevar_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512i __C)
 {
-  return (__m512d) __builtin_ia32_getmantpd512_mask ((__v8df) __A,
-						     (__C << 2) | __B,
-						     (__v8df)
-						     _mm512_setzero_pd (),
-						     __U, __R);
+  return (__m512d) __builtin_ia32_vpermilvarpd512_mask ((__v8df) __A,
+							(__v8di) __C,
+							(__v8df) __W,
+							(__mmask8) __U);
 }
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_getmant_round_ps (__m512 __A, _MM_MANTISSA_NORM_ENUM __B,
-			 _MM_MANTISSA_SIGN_ENUM __C, const int __R)
+_mm512_maskz_permutevar_pd (__mmask8 __U, __m512d __A, __m512i __C)
 {
-  return (__m512) __builtin_ia32_getmantps512_mask ((__v16sf) __A,
-						    (__C << 2) | __B,
-						    _mm512_undefined_ps (),
-						    (__mmask16) -1, __R);
+  return (__m512d) __builtin_ia32_vpermilvarpd512_mask ((__v8df) __A,
+							(__v8di) __C,
+							(__v8df)
+							_mm512_setzero_pd (),
+							(__mmask8) __U);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_getmant_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
-			      _MM_MANTISSA_NORM_ENUM __B,
-			      _MM_MANTISSA_SIGN_ENUM __C, const int __R)
+_mm512_permutevar_ps (__m512 __A, __m512i __C)
 {
-  return (__m512) __builtin_ia32_getmantps512_mask ((__v16sf) __A,
-						    (__C << 2) | __B,
-						    (__v16sf) __W, __U,
-						    __R);
+  return (__m512) __builtin_ia32_vpermilvarps512_mask ((__v16sf) __A,
+						       (__v16si) __C,
+						       (__v16sf)
+						       _mm512_undefined_ps (),
+						       (__mmask16) -1);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_getmant_round_ps (__mmask16 __U, __m512 __A,
-			       _MM_MANTISSA_NORM_ENUM __B,
-			       _MM_MANTISSA_SIGN_ENUM __C, const int __R)
+_mm512_mask_permutevar_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512i __C)
 {
-  return (__m512) __builtin_ia32_getmantps512_mask ((__v16sf) __A,
-						    (__C << 2) | __B,
-						    (__v16sf)
-						    _mm512_setzero_ps (),
-						    __U, __R);
+  return (__m512) __builtin_ia32_vpermilvarps512_mask ((__v16sf) __A,
+						       (__v16si) __C,
+						       (__v16sf) __W,
+						       (__mmask16) __U);
 }
 
-extern __inline __m128d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getmant_round_sd (__m128d __A, __m128d __B,
-		      _MM_MANTISSA_NORM_ENUM __C,
-		      _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+_mm512_maskz_permutevar_ps (__mmask16 __U, __m512 __A, __m512i __C)
 {
-  return (__m128d) __builtin_ia32_getmantsd_round ((__v2df) __A,
-						  (__v2df) __B,
-						  (__D << 2) | __C,
-						   __R);
+  return (__m512) __builtin_ia32_vpermilvarps512_mask ((__v16sf) __A,
+						       (__v16si) __C,
+						       (__v16sf)
+						       _mm512_setzero_ps (),
+						       (__mmask16) __U);
 }
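
A sketch of the in-lane variable permute (illustrative only):

  #include <immintrin.h>

  /* vpermilps: each index selects within its own 128-bit lane, so
     only the low two bits of every 32-bit index are used.  */
  __m512
  shuffle_in_lane (__m512 v, __m512i idx)
  {
    return _mm512_permutevar_ps (v, idx);
  }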
 
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getmant_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
-			      __m128d __B, _MM_MANTISSA_NORM_ENUM __C,
-			      _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+_mm512_permutex2var_epi64 (__m512i __A, __m512i __I, __m512i __B)
 {
-  return (__m128d) __builtin_ia32_getmantsd_mask_round ((__v2df) __A,
-						    (__v2df) __B,
-						    (__D << 2) | __C,
-                                                    (__v2df) __W,
-						     __U, __R);
+  return (__m512i) __builtin_ia32_vpermt2varq512_mask ((__v8di) __I
+						       /* idx */ ,
+						       (__v8di) __A,
+						       (__v8di) __B,
+						       (__mmask8) -1);
 }
 
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getmant_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
-			       _MM_MANTISSA_NORM_ENUM __C,
-			       _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+_mm512_mask_permutex2var_epi64 (__m512i __A, __mmask8 __U, __m512i __I,
+				__m512i __B)
 {
-  return (__m128d) __builtin_ia32_getmantsd_mask_round ((__v2df) __A,
-							(__v2df) __B,
-						        (__D << 2) | __C,
-                                                        (__v2df)
-                                                        _mm_setzero_pd(),
-						        __U, __R);
+  return (__m512i) __builtin_ia32_vpermt2varq512_mask ((__v8di) __I
+						       /* idx */ ,
+						       (__v8di) __A,
+						       (__v8di) __B,
+						       (__mmask8) __U);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getmant_round_ss (__m128 __A, __m128 __B,
-		      _MM_MANTISSA_NORM_ENUM __C,
-		      _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+_mm512_mask2_permutex2var_epi64 (__m512i __A, __m512i __I,
+				 __mmask8 __U, __m512i __B)
 {
-  return (__m128) __builtin_ia32_getmantss_round ((__v4sf) __A,
-						  (__v4sf) __B,
-						  (__D << 2) | __C,
-						  __R);
+  return (__m512i) __builtin_ia32_vpermi2varq512_mask ((__v8di) __A,
+						       (__v8di) __I
+						       /* idx */ ,
+						       (__v8di) __B,
+						       (__mmask8) __U);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getmant_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
-			      __m128 __B, _MM_MANTISSA_NORM_ENUM __C,
-			      _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+_mm512_maskz_permutex2var_epi64 (__mmask8 __U, __m512i __A,
+				 __m512i __I, __m512i __B)
 {
-  return (__m128) __builtin_ia32_getmantss_mask_round ((__v4sf) __A,
-						    (__v4sf) __B,
-						    (__D << 2) | __C,
-                                                    (__v4sf) __W,
-						     __U, __R);
+  return (__m512i) __builtin_ia32_vpermt2varq512_maskz ((__v8di) __I
+							/* idx */ ,
+							(__v8di) __A,
+							(__v8di) __B,
+							(__mmask8) __U);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getmant_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
-			       _MM_MANTISSA_NORM_ENUM __C,
-			       _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+_mm512_permutex2var_epi32 (__m512i __A, __m512i __I, __m512i __B)
 {
-  return (__m128) __builtin_ia32_getmantss_mask_round ((__v4sf) __A,
-							(__v4sf) __B,
-						        (__D << 2) | __C,
-                                                        (__v4sf)
-                                                        _mm_setzero_ps(),
-						        __U, __R);
-}
-
-#else
-#define _mm512_getmant_round_pd(X, B, C, R)                                                  \
-  ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X),                 \
-                                              (int)(((C)<<2) | (B)),                \
-                                              (__v8df)(__m512d)_mm512_undefined_pd(), \
-                                              (__mmask8)-1,\
-					      (R)))
-
-#define _mm512_mask_getmant_round_pd(W, U, X, B, C, R)                                       \
-  ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X),                 \
-                                              (int)(((C)<<2) | (B)),                \
-                                              (__v8df)(__m512d)(W),                 \
-                                              (__mmask8)(U),\
-					      (R)))
-
-#define _mm512_maskz_getmant_round_pd(U, X, B, C, R)                                         \
-  ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X),                 \
-                                              (int)(((C)<<2) | (B)),                \
-                                              (__v8df)(__m512d)_mm512_setzero_pd(), \
-                                              (__mmask8)(U),\
-					      (R)))
-#define _mm512_getmant_round_ps(X, B, C, R)                                                  \
-  ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X),                  \
-                                             (int)(((C)<<2) | (B)),                 \
-                                             (__v16sf)(__m512)_mm512_undefined_ps(), \
-                                             (__mmask16)-1,\
-					     (R)))
-
-#define _mm512_mask_getmant_round_ps(W, U, X, B, C, R)                                       \
-  ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X),                  \
-                                             (int)(((C)<<2) | (B)),                 \
-                                             (__v16sf)(__m512)(W),                  \
-                                             (__mmask16)(U),\
-					     (R)))
-
-#define _mm512_maskz_getmant_round_ps(U, X, B, C, R)                                         \
-  ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X),                  \
-                                             (int)(((C)<<2) | (B)),                 \
-                                             (__v16sf)(__m512)_mm512_setzero_ps(),  \
-                                             (__mmask16)(U),\
-					     (R)))
-#define _mm_getmant_round_sd(X, Y, C, D, R)                                                  \
-  ((__m128d)__builtin_ia32_getmantsd_round ((__v2df)(__m128d)(X),                    \
-					    (__v2df)(__m128d)(Y),	\
-					    (int)(((D)<<2) | (C)),	\
-					    (R)))
-
-#define _mm_mask_getmant_round_sd(W, U, X, Y, C, D, R)                                       \
-  ((__m128d)__builtin_ia32_getmantsd_mask_round ((__v2df)(__m128d)(X),                  \
-					     (__v2df)(__m128d)(Y),                  \
-                                             (int)(((D)<<2) | (C)),                 \
-                                             (__v2df)(__m128d)(W),                   \
-                                             (__mmask8)(U),\
-					     (R)))
-
-#define _mm_maskz_getmant_round_sd(U, X, Y, C, D, R)                                         \
-  ((__m128d)__builtin_ia32_getmantsd_mask_round ((__v2df)(__m128d)(X),                  \
-                                                 (__v2df)(__m128d)(Y),                  \
-                                             (int)(((D)<<2) | (C)),              \
-                                             (__v2df)(__m128d)_mm_setzero_pd(),  \
-                                             (__mmask8)(U),\
-					     (R)))
-
-#define _mm_getmant_round_ss(X, Y, C, D, R)                                                  \
-  ((__m128)__builtin_ia32_getmantss_round ((__v4sf)(__m128)(X),                      \
-					   (__v4sf)(__m128)(Y),		\
-					   (int)(((D)<<2) | (C)),	\
-					   (R)))
-
-#define _mm_mask_getmant_round_ss(W, U, X, Y, C, D, R)                                       \
-  ((__m128)__builtin_ia32_getmantss_mask_round ((__v4sf)(__m128)(X),                  \
-					     (__v4sf)(__m128)(Y),                  \
-                                             (int)(((D)<<2) | (C)),                 \
-                                             (__v4sf)(__m128)(W),                   \
-                                             (__mmask8)(U),\
-					     (R)))
-
-#define _mm_maskz_getmant_round_ss(U, X, Y, C, D, R)                                         \
-  ((__m128)__builtin_ia32_getmantss_mask_round ((__v4sf)(__m128)(X),                  \
-                                                 (__v4sf)(__m128)(Y),                  \
-                                             (int)(((D)<<2) | (C)),              \
-                                             (__v4sf)(__m128)_mm_setzero_ps(),  \
-                                             (__mmask8)(U),\
-					     (R)))
-
-#define _mm_getexp_round_ss(A, B, R)						      \
-  ((__m128)__builtin_ia32_getexpss128_round((__v4sf)(__m128)(A), (__v4sf)(__m128)(B), R))
-
-#define _mm_mask_getexp_round_ss(W, U, A, B, C) \
-    (__m128)__builtin_ia32_getexpss_mask_round(A, B, W, U, C)
-
-#define _mm_maskz_getexp_round_ss(U, A, B, C)   \
-    (__m128)__builtin_ia32_getexpss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
-
-#define _mm_getexp_round_sd(A, B, R)						       \
-  ((__m128d)__builtin_ia32_getexpsd128_round((__v2df)(__m128d)(A), (__v2df)(__m128d)(B), R))
-
-#define _mm_mask_getexp_round_sd(W, U, A, B, C) \
-    (__m128d)__builtin_ia32_getexpsd_mask_round(A, B, W, U, C)
-
-#define _mm_maskz_getexp_round_sd(U, A, B, C)   \
-    (__m128d)__builtin_ia32_getexpsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
+  return (__m512i) __builtin_ia32_vpermt2vard512_mask ((__v16si) __I
+						       /* idx */ ,
+						       (__v16si) __A,
+						       (__v16si) __B,
+						       (__mmask16) -1);
+}
 
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_permutex2var_epi32 (__m512i __A, __mmask16 __U,
+				__m512i __I, __m512i __B)
+{
+  return (__m512i) __builtin_ia32_vpermt2vard512_mask ((__v16si) __I
+						       /* idx */ ,
+						       (__v16si) __A,
+						       (__v16si) __B,
+						       (__mmask16) __U);
+}
 
-#define _mm512_getexp_round_ps(A, R)						\
-  ((__m512)__builtin_ia32_getexpps512_mask((__v16sf)(__m512)(A),		\
-  (__v16sf)_mm512_undefined_ps(), (__mmask16)-1, R))
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask2_permutex2var_epi32 (__m512i __A, __m512i __I,
+				 __mmask16 __U, __m512i __B)
+{
+  return (__m512i) __builtin_ia32_vpermi2vard512_mask ((__v16si) __A,
+						       (__v16si) __I
+						       /* idx */ ,
+						       (__v16si) __B,
+						       (__mmask16) __U);
+}
 
-#define _mm512_mask_getexp_round_ps(W, U, A, R)					\
-  ((__m512)__builtin_ia32_getexpps512_mask((__v16sf)(__m512)(A),		\
-  (__v16sf)(__m512)(W), (__mmask16)(U), R))
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_permutex2var_epi32 (__mmask16 __U, __m512i __A,
+				 __m512i __I, __m512i __B)
+{
+  return (__m512i) __builtin_ia32_vpermt2vard512_maskz ((__v16si) __I
+							/* idx */ ,
+							(__v16si) __A,
+							(__v16si) __B,
+							(__mmask16) __U);
+}
 
-#define _mm512_maskz_getexp_round_ps(U, A, R)					\
-  ((__m512)__builtin_ia32_getexpps512_mask((__v16sf)(__m512)(A),		\
-  (__v16sf)_mm512_setzero_ps(), (__mmask16)(U), R))
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_permutex2var_pd (__m512d __A, __m512i __I, __m512d __B)
+{
+  return (__m512d) __builtin_ia32_vpermt2varpd512_mask ((__v8di) __I
+							/* idx */ ,
+							(__v8df) __A,
+							(__v8df) __B,
+							(__mmask8) -1);
+}
 
-#define _mm512_getexp_round_pd(A, R)						\
-  ((__m512d)__builtin_ia32_getexppd512_mask((__v8df)(__m512d)(A),		\
-  (__v8df)_mm512_undefined_pd(), (__mmask8)-1, R))
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_permutex2var_pd (__m512d __A, __mmask8 __U, __m512i __I,
+			     __m512d __B)
+{
+  return (__m512d) __builtin_ia32_vpermt2varpd512_mask ((__v8di) __I
+							/* idx */ ,
+							(__v8df) __A,
+							(__v8df) __B,
+							(__mmask8) __U);
+}
 
-#define _mm512_mask_getexp_round_pd(W, U, A, R)					\
-  ((__m512d)__builtin_ia32_getexppd512_mask((__v8df)(__m512d)(A),		\
-  (__v8df)(__m512d)(W), (__mmask8)(U), R))
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask2_permutex2var_pd (__m512d __A, __m512i __I, __mmask8 __U,
+			      __m512d __B)
+{
+  return (__m512d) __builtin_ia32_vpermi2varpd512_mask ((__v8df) __A,
+							(__v8di) __I
+							/* idx */ ,
+							(__v8df) __B,
+							(__mmask8) __U);
+}
 
-#define _mm512_maskz_getexp_round_pd(U, A, R)					\
-  ((__m512d)__builtin_ia32_getexppd512_mask((__v8df)(__m512d)(A),		\
-  (__v8df)_mm512_setzero_pd(), (__mmask8)(U), R))
-#endif
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_permutex2var_pd (__mmask8 __U, __m512d __A, __m512i __I,
+			      __m512d __B)
+{
+  return (__m512d) __builtin_ia32_vpermt2varpd512_maskz ((__v8di) __I
+							 /* idx */ ,
+							 (__v8df) __A,
+							 (__v8df) __B,
+							 (__mmask8) __U);
+}
 
-#ifdef __OPTIMIZE__
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_roundscale_round_ps (__m512 __A, const int __imm, const int __R)
+_mm512_permutex2var_ps (__m512 __A, __m512i __I, __m512 __B)
 {
-  return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A, __imm,
-						  (__v16sf)
-						  _mm512_undefined_ps (),
-						  -1, __R);
+  return (__m512) __builtin_ia32_vpermt2varps512_mask ((__v16si) __I
+						       /* idx */ ,
+						       (__v16sf) __A,
+						       (__v16sf) __B,
+						       (__mmask16) -1);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_roundscale_round_ps (__m512 __A, __mmask16 __B, __m512 __C,
-				 const int __imm, const int __R)
+_mm512_mask_permutex2var_ps (__m512 __A, __mmask16 __U, __m512i __I, __m512 __B)
 {
-  return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __C, __imm,
-						  (__v16sf) __A,
-						  (__mmask16) __B, __R);
+  return (__m512) __builtin_ia32_vpermt2varps512_mask ((__v16si) __I
+						       /* idx */ ,
+						       (__v16sf) __A,
+						       (__v16sf) __B,
+						       (__mmask16) __U);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_roundscale_round_ps (__mmask16 __A, __m512 __B,
-				  const int __imm, const int __R)
+_mm512_mask2_permutex2var_ps (__m512 __A, __m512i __I, __mmask16 __U,
+			      __m512 __B)
 {
-  return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __B,
-						  __imm,
-						  (__v16sf)
-						  _mm512_setzero_ps (),
-						  (__mmask16) __A, __R);
+  return (__m512) __builtin_ia32_vpermi2varps512_mask ((__v16sf) __A,
+						       (__v16si) __I
+						       /* idx */ ,
+						       (__v16sf) __B,
+						       (__mmask16) __U);
 }
 
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_roundscale_round_pd (__m512d __A, const int __imm, const int __R)
+_mm512_maskz_permutex2var_ps (__mmask16 __U, __m512 __A, __m512i __I,
+			      __m512 __B)
 {
-  return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A, __imm,
-						   (__v8df)
-						   _mm512_undefined_pd (),
-						   -1, __R);
+  return (__m512) __builtin_ia32_vpermt2varps512_maskz ((__v16si) __I
+							/* idx */ ,
+							(__v16sf) __A,
+							(__v16sf) __B,
+							(__mmask16) __U);
 }
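
[ As a reference point for review, a minimal usage sketch of the
  permutex2var family above; it is illustrative only (the helper name
  is made up, not from the patch) and assumes -mavx512f plus the usual
  <immintrin.h> entry point:

    #include <immintrin.h>

    /* Interleave the even dword lanes of two vectors.  For epi32 each
       index element uses bits [4:0]: bit 4 selects the second table, so
       16 + k reads b[k] and plain k reads a[k].  */
    __m512i
    interleave_even (__m512i a, __m512i b)
    {
      const __m512i idx = _mm512_set_epi32 (30, 14, 28, 12, 26, 10, 24, 8,
                                            22, 6, 20, 4, 18, 2, 16, 0);
      return _mm512_permutex2var_epi32 (a, idx, b);
    }
]
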
 
+#ifdef __OPTIMIZE__
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_roundscale_round_pd (__m512d __A, __mmask8 __B,
-				 __m512d __C, const int __imm, const int __R)
+_mm512_permute_pd (__m512d __X, const int __C)
 {
-  return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __C, __imm,
-						   (__v8df) __A,
-						   (__mmask8) __B, __R);
+  return (__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df) __X, __C,
+						     (__v8df)
+						     _mm512_undefined_pd (),
+						     (__mmask8) -1);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_roundscale_round_pd (__mmask8 __A, __m512d __B,
-				  const int __imm, const int __R)
+_mm512_mask_permute_pd (__m512d __W, __mmask8 __U, __m512d __X, const int __C)
 {
-  return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __B,
-						   __imm,
-						   (__v8df)
-						   _mm512_setzero_pd (),
-						   (__mmask8) __A, __R);
+  return (__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df) __X, __C,
+						     (__v8df) __W,
+						     (__mmask8) __U);
 }
 
-extern __inline __m128
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_roundscale_round_ss (__m128 __A, __m128 __B, const int __imm,
-			 const int __R)
+_mm512_maskz_permute_pd (__mmask8 __U, __m512d __X, const int __C)
 {
-  return (__m128)
-    __builtin_ia32_rndscaless_mask_round ((__v4sf) __A,
-					  (__v4sf) __B, __imm,
-					  (__v4sf)
-					  _mm_setzero_ps (),
-					  (__mmask8) -1,
-					  __R);
+  return (__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df) __X, __C,
+						     (__v8df)
+						     _mm512_setzero_pd (),
+						     (__mmask8) __U);
 }
 
-extern __inline __m128
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_roundscale_round_ss (__m128 __A, __mmask8 __B, __m128 __C,
-			      __m128 __D, const int __imm, const int __R)
+_mm512_permute_ps (__m512 __X, const int __C)
 {
-  return (__m128)
-    __builtin_ia32_rndscaless_mask_round ((__v4sf) __C,
-					  (__v4sf) __D, __imm,
-					  (__v4sf) __A,
-					  (__mmask8) __B,
-					  __R);
+  return (__m512) __builtin_ia32_vpermilps512_mask ((__v16sf) __X, __C,
+						    (__v16sf)
+						    _mm512_undefined_ps (),
+						    (__mmask16) -1);
 }
 
-extern __inline __m128
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_roundscale_round_ss (__mmask8 __A, __m128 __B, __m128 __C,
-			       const int __imm, const int __R)
+_mm512_mask_permute_ps (__m512 __W, __mmask16 __U, __m512 __X, const int __C)
 {
-  return (__m128)
-    __builtin_ia32_rndscaless_mask_round ((__v4sf) __B,
-					  (__v4sf) __C, __imm,
-					  (__v4sf)
-					  _mm_setzero_ps (),
-					  (__mmask8) __A,
-					  __R);
+  return (__m512) __builtin_ia32_vpermilps512_mask ((__v16sf) __X, __C,
+						    (__v16sf) __W,
+						    (__mmask16) __U);
 }
 
-extern __inline __m128d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_roundscale_round_sd (__m128d __A, __m128d __B, const int __imm,
-			 const int __R)
+_mm512_maskz_permute_ps (__mmask16 __U, __m512 __X, const int __C)
 {
-  return (__m128d)
-    __builtin_ia32_rndscalesd_mask_round ((__v2df) __A,
-					  (__v2df) __B, __imm,
-					  (__v2df)
-					  _mm_setzero_pd (),
-					  (__mmask8) -1,
-					  __R);
+  return (__m512) __builtin_ia32_vpermilps512_mask ((__v16sf) __X, __C,
+						    (__v16sf)
+						    _mm512_setzero_ps (),
+						    (__mmask16) __U);
 }
+#else
+#define _mm512_permute_pd(X, C)							    \
+  ((__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df)(__m512d)(X), (int)(C),	    \
+					      (__v8df)(__m512d)_mm512_undefined_pd(),\
+					      (__mmask8)(-1)))
+
+#define _mm512_mask_permute_pd(W, U, X, C)					    \
+  ((__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df)(__m512d)(X), (int)(C),	    \
+					      (__v8df)(__m512d)(W),		    \
+					      (__mmask8)(U)))
+
+#define _mm512_maskz_permute_pd(U, X, C)					    \
+  ((__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df)(__m512d)(X), (int)(C),	    \
+					      (__v8df)(__m512d)_mm512_setzero_pd(), \
+					      (__mmask8)(U)))
+
+#define _mm512_permute_ps(X, C)							    \
+  ((__m512) __builtin_ia32_vpermilps512_mask ((__v16sf)(__m512)(X), (int)(C),	    \
+					      (__v16sf)(__m512)_mm512_undefined_ps(),\
+					      (__mmask16)(-1)))
 
-extern __inline __m128d
+#define _mm512_mask_permute_ps(W, U, X, C)					    \
+  ((__m512) __builtin_ia32_vpermilps512_mask ((__v16sf)(__m512)(X), (int)(C),	    \
+					      (__v16sf)(__m512)(W),		    \
+					      (__mmask16)(U)))
+
+#define _mm512_maskz_permute_ps(U, X, C)					    \
+  ((__m512) __builtin_ia32_vpermilps512_mask ((__v16sf)(__m512)(X), (int)(C),	    \
+					      (__v16sf)(__m512)_mm512_setzero_ps(), \
+					      (__mmask16)(U)))
+#endif
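
[ A quick sketch of the in-lane permute forms above, assuming
  -mavx512f; both helper names are hypothetical:

    #include <immintrin.h>

    /* vpermilpd: immediate bit i picks the source double for position i
       within its own 128-bit lane, so 0x55 swaps every pair.  */
    __m512d
    swap_pairs (__m512d x)
    {
      return _mm512_permute_pd (x, 0x55);
    }

    /* vpermilps: one 2-bit selector per float, applied identically to
       every 128-bit lane; this reverses each group of four floats.  */
    __m512
    reverse_quads (__m512 x)
    {
      return _mm512_permute_ps (x, _MM_SHUFFLE (0, 1, 2, 3));
    }
]
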
+
+#ifdef __OPTIMIZE__
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_roundscale_round_sd (__m128d __A, __mmask8 __B, __m128d __C,
-			      __m128d __D, const int __imm, const int __R)
+_mm512_permutex_epi64 (__m512i __X, const int __I)
 {
-  return (__m128d)
-    __builtin_ia32_rndscalesd_mask_round ((__v2df) __C,
-					  (__v2df) __D, __imm,
-					  (__v2df) __A,
-					  (__mmask8) __B,
-					  __R);
+  return (__m512i) __builtin_ia32_permdi512_mask ((__v8di) __X, __I,
+						  (__v8di)
+						  _mm512_undefined_epi32 (),
+						  (__mmask8) (-1));
 }
 
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_roundscale_round_sd (__mmask8 __A, __m128d __B, __m128d __C,
-			       const int __imm, const int __R)
+_mm512_mask_permutex_epi64 (__m512i __W, __mmask8 __M,
+			    __m512i __X, const int __I)
 {
-  return (__m128d)
-    __builtin_ia32_rndscalesd_mask_round ((__v2df) __B,
-					  (__v2df) __C, __imm,
-					  (__v2df)
-					  _mm_setzero_pd (),
-					  (__mmask8) __A,
-					  __R);
+  return (__m512i) __builtin_ia32_permdi512_mask ((__v8di) __X, __I,
+						  (__v8di) __W,
+						  (__mmask8) __M);
 }
 
-#else
-#define _mm512_roundscale_round_ps(A, B, R) \
-  ((__m512) __builtin_ia32_rndscaleps_mask ((__v16sf)(__m512)(A), (int)(B),\
-    (__v16sf)_mm512_undefined_ps(), (__mmask16)(-1), R))
-#define _mm512_mask_roundscale_round_ps(A, B, C, D, R)				\
-  ((__m512) __builtin_ia32_rndscaleps_mask ((__v16sf)(__m512)(C),	\
-					    (int)(D),			\
-					    (__v16sf)(__m512)(A),	\
-					    (__mmask16)(B), R))
-#define _mm512_maskz_roundscale_round_ps(A, B, C, R)				\
-  ((__m512) __builtin_ia32_rndscaleps_mask ((__v16sf)(__m512)(B),	\
-					    (int)(C),			\
-					    (__v16sf)_mm512_setzero_ps(),\
-					    (__mmask16)(A), R))
-#define _mm512_roundscale_round_pd(A, B, R) \
-  ((__m512d) __builtin_ia32_rndscalepd_mask ((__v8df)(__m512d)(A), (int)(B),\
-    (__v8df)_mm512_undefined_pd(), (__mmask8)(-1), R))
-#define _mm512_mask_roundscale_round_pd(A, B, C, D, R)				\
-  ((__m512d) __builtin_ia32_rndscalepd_mask ((__v8df)(__m512d)(C),	\
-					     (int)(D),			\
-					     (__v8df)(__m512d)(A),	\
-					     (__mmask8)(B), R))
-#define _mm512_maskz_roundscale_round_pd(A, B, C, R)				\
-  ((__m512d) __builtin_ia32_rndscalepd_mask ((__v8df)(__m512d)(B),	\
-					     (int)(C),			\
-					     (__v8df)_mm512_setzero_pd(),\
-					     (__mmask8)(A), R))
-#define _mm_roundscale_round_ss(A, B, I, R)				\
-  ((__m128)								\
-   __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (A),		\
-					 (__v4sf) (__m128) (B),		\
-					 (int) (I),			\
-					 (__v4sf) _mm_setzero_ps (),	\
-					 (__mmask8) (-1),		\
-					 (int) (R)))
-#define _mm_mask_roundscale_round_ss(A, U, B, C, I, R)		\
-  ((__m128)							\
-   __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (B),	\
-					 (__v4sf) (__m128) (C),	\
-					 (int) (I),		\
-					 (__v4sf) (__m128) (A),	\
-					 (__mmask8) (U),	\
-					 (int) (R)))
-#define _mm_maskz_roundscale_round_ss(U, A, B, I, R)			\
-  ((__m128)								\
-   __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (A),		\
-					 (__v4sf) (__m128) (B),		\
-					 (int) (I),			\
-					 (__v4sf) _mm_setzero_ps (),	\
-					 (__mmask8) (U),		\
-					 (int) (R)))
-#define _mm_roundscale_round_sd(A, B, I, R)				\
-  ((__m128d)								\
-   __builtin_ia32_rndscalesd_mask_round ((__v2df) (__m128d) (A),	\
-					 (__v2df) (__m128d) (B),	\
-					 (int) (I),			\
-					 (__v2df) _mm_setzero_pd (),	\
-					 (__mmask8) (-1),		\
-					 (int) (R)))
-#define _mm_mask_roundscale_round_sd(A, U, B, C, I, R)			\
-  ((__m128d)								\
-   __builtin_ia32_rndscalesd_mask_round ((__v2df) (__m128d) (B),	\
-					 (__v2df) (__m128d) (C),	\
-					 (int) (I),			\
-					 (__v2df) (__m128d) (A),	\
-					 (__mmask8) (U),		\
-					 (int) (R)))
-#define _mm_maskz_roundscale_round_sd(U, A, B, I, R)			\
-  ((__m128d)								\
-   __builtin_ia32_rndscalesd_mask_round ((__v2df) (__m128d) (A),	\
-					 (__v2df) (__m128d) (B),	\
-					 (int) (I),			\
-					 (__v2df) _mm_setzero_pd (),	\
-					 (__mmask8) (U),		\
-					 (int) (R)))
-#endif
-
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_floor_ps (__m512 __A)
+_mm512_maskz_permutex_epi64 (__mmask8 __M, __m512i __X, const int __I)
 {
-  return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A,
-						  _MM_FROUND_FLOOR,
-						  (__v16sf) __A, -1,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_permdi512_mask ((__v8di) __X, __I,
+						  (__v8di)
+						  _mm512_setzero_si512 (),
+						  (__mmask8) __M);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_floor_pd (__m512d __A)
+_mm512_permutex_pd (__m512d __X, const int __M)
 {
-  return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A,
-						   _MM_FROUND_FLOOR,
-						   (__v8df) __A, -1,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_permdf512_mask ((__v8df) __X, __M,
+						  (__v8df)
+						  _mm512_undefined_pd (),
+						  (__mmask8) -1);
 }
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_ceil_ps (__m512 __A)
+_mm512_mask_permutex_pd (__m512d __W, __mmask8 __U, __m512d __X, const int __M)
 {
-  return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A,
-						  _MM_FROUND_CEIL,
-						  (__v16sf) __A, -1,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_permdf512_mask ((__v8df) __X, __M,
+						  (__v8df) __W,
+						  (__mmask8) __U);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_ceil_pd (__m512d __A)
+_mm512_maskz_permutex_pd (__mmask8 __U, __m512d __X, const int __M)
 {
-  return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A,
-						   _MM_FROUND_CEIL,
-						   (__v8df) __A, -1,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_permdf512_mask ((__v8df) __X, __M,
+						  (__v8df)
+						  _mm512_setzero_pd (),
+						  (__mmask8) __U);
 }
+#else
+#define _mm512_permutex_pd(X, M)						\
+  ((__m512d) __builtin_ia32_permdf512_mask ((__v8df)(__m512d)(X), (int)(M),	\
+					    (__v8df)(__m512d)_mm512_undefined_pd(),\
+					    (__mmask8)-1))
 
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_floor_ps (__m512 __W, __mmask16 __U, __m512 __A)
-{
-  return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A,
-						  _MM_FROUND_FLOOR,
-						  (__v16sf) __W, __U,
-						  _MM_FROUND_CUR_DIRECTION);
-}
+#define _mm512_mask_permutex_pd(W, U, X, M)					\
+  ((__m512d) __builtin_ia32_permdf512_mask ((__v8df)(__m512d)(X), (int)(M),	\
+					    (__v8df)(__m512d)(W), (__mmask8)(U)))
 
-extern __inline __m512d
+#define _mm512_maskz_permutex_pd(U, X, M)					\
+  ((__m512d) __builtin_ia32_permdf512_mask ((__v8df)(__m512d)(X), (int)(M),	\
+					    (__v8df)(__m512d)_mm512_setzero_pd(),\
+					    (__mmask8)(U)))
+
+#define _mm512_permutex_epi64(X, I)			          \
+  ((__m512i) __builtin_ia32_permdi512_mask ((__v8di)(__m512i)(X), \
+					    (int)(I),             \
+					    (__v8di)(__m512i)	  \
+					    (_mm512_undefined_epi32 ()),\
+					    (__mmask8)(-1)))
+
+#define _mm512_maskz_permutex_epi64(M, X, I)                 \
+  ((__m512i) __builtin_ia32_permdi512_mask ((__v8di)(__m512i)(X), \
+					    (int)(I),             \
+					    (__v8di)(__m512i)     \
+					    (_mm512_setzero_si512 ()),\
+					    (__mmask8)(M)))
+
+#define _mm512_mask_permutex_epi64(W, M, X, I)               \
+  ((__m512i) __builtin_ia32_permdi512_mask ((__v8di)(__m512i)(X), \
+					    (int)(I),             \
+					    (__v8di)(__m512i)(W), \
+					    (__mmask8)(M)))
+#endif
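
[ For comparison with the in-lane forms, a sketch of the immediate
  cross-lane permute above (illustrative helper, assumes -mavx512f):

    #include <immintrin.h>

    /* vpermq: the 2-bit selectors are applied independently to each
       256-bit half, so this reverses the four qwords of each half.  */
    __m512i
    reverse_halves (__m512i x)
    {
      return _mm512_permutex_epi64 (x, _MM_SHUFFLE (0, 1, 2, 3));
    }
]
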
+
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_floor_pd (__m512d __W, __mmask8 __U, __m512d __A)
+_mm512_maskz_permutexvar_epi64 (__mmask8 __M, __m512i __X, __m512i __Y)
 {
-  return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A,
-						   _MM_FROUND_FLOOR,
-						   (__v8df) __W, __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_permvardi512_mask ((__v8di) __Y,
+						     (__v8di) __X,
+						     (__v8di)
+						     _mm512_setzero_si512 (),
+						     __M);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_ceil_ps (__m512 __W, __mmask16 __U, __m512 __A)
+_mm512_permutexvar_epi64 (__m512i __X, __m512i __Y)
 {
-  return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A,
-						  _MM_FROUND_CEIL,
-						  (__v16sf) __W, __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_permvardi512_mask ((__v8di) __Y,
+						     (__v8di) __X,
+						     (__v8di)
+						     _mm512_undefined_epi32 (),
+						     (__mmask8) -1);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_ceil_pd (__m512d __W, __mmask8 __U, __m512d __A)
+_mm512_mask_permutexvar_epi64 (__m512i __W, __mmask8 __M, __m512i __X,
+			       __m512i __Y)
 {
-  return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A,
-						   _MM_FROUND_CEIL,
-						   (__v8df) __W, __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_permvardi512_mask ((__v8di) __Y,
+						     (__v8di) __X,
+						     (__v8di) __W,
+						     __M);
 }
 
-#ifdef __OPTIMIZE__
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_alignr_epi32 (__m512i __A, __m512i __B, const int __imm)
+_mm512_maskz_permutexvar_epi32 (__mmask16 __M, __m512i __X, __m512i __Y)
 {
-  return (__m512i) __builtin_ia32_alignd512_mask ((__v16si) __A,
-						  (__v16si) __B, __imm,
-						  (__v16si)
-						  _mm512_undefined_epi32 (),
-						  (__mmask16) -1);
+  return (__m512i) __builtin_ia32_permvarsi512_mask ((__v16si) __Y,
+						     (__v16si) __X,
+						     (__v16si)
+						     _mm512_setzero_si512 (),
+						     __M);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_alignr_epi32 (__m512i __W, __mmask16 __U, __m512i __A,
-			  __m512i __B, const int __imm)
+_mm512_permutexvar_epi32 (__m512i __X, __m512i __Y)
 {
-  return (__m512i) __builtin_ia32_alignd512_mask ((__v16si) __A,
-						  (__v16si) __B, __imm,
-						  (__v16si) __W,
-						  (__mmask16) __U);
+  return (__m512i) __builtin_ia32_permvarsi512_mask ((__v16si) __Y,
+						     (__v16si) __X,
+						     (__v16si)
+						     _mm512_undefined_epi32 (),
+						     (__mmask16) -1);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_alignr_epi32 (__mmask16 __U, __m512i __A, __m512i __B,
-			   const int __imm)
+_mm512_mask_permutexvar_epi32 (__m512i __W, __mmask16 __M, __m512i __X,
+			       __m512i __Y)
 {
-  return (__m512i) __builtin_ia32_alignd512_mask ((__v16si) __A,
-						  (__v16si) __B, __imm,
-						  (__v16si)
-						  _mm512_setzero_si512 (),
-						  (__mmask16) __U);
+  return (__m512i) __builtin_ia32_permvarsi512_mask ((__v16si) __Y,
+						     (__v16si) __X,
+						     (__v16si) __W,
+						     __M);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_alignr_epi64 (__m512i __A, __m512i __B, const int __imm)
+_mm512_permutexvar_pd (__m512i __X, __m512d __Y)
 {
-  return (__m512i) __builtin_ia32_alignq512_mask ((__v8di) __A,
-						  (__v8di) __B, __imm,
-						  (__v8di)
-						  _mm512_undefined_epi32 (),
-						  (__mmask8) -1);
+  return (__m512d) __builtin_ia32_permvardf512_mask ((__v8df) __Y,
+						     (__v8di) __X,
+						     (__v8df)
+						     _mm512_undefined_pd (),
+						     (__mmask8) -1);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_alignr_epi64 (__m512i __W, __mmask8 __U, __m512i __A,
-			  __m512i __B, const int __imm)
+_mm512_mask_permutexvar_pd (__m512d __W, __mmask8 __U, __m512i __X, __m512d __Y)
 {
-  return (__m512i) __builtin_ia32_alignq512_mask ((__v8di) __A,
-						  (__v8di) __B, __imm,
-						  (__v8di) __W,
-						  (__mmask8) __U);
+  return (__m512d) __builtin_ia32_permvardf512_mask ((__v8df) __Y,
+						     (__v8di) __X,
+						     (__v8df) __W,
+						     (__mmask8) __U);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_alignr_epi64 (__mmask8 __U, __m512i __A, __m512i __B,
-			   const int __imm)
+_mm512_maskz_permutexvar_pd (__mmask8 __U, __m512i __X, __m512d __Y)
 {
-  return (__m512i) __builtin_ia32_alignq512_mask ((__v8di) __A,
-						  (__v8di) __B, __imm,
-						  (__v8di)
-						  _mm512_setzero_si512 (),
-						  (__mmask8) __U);
+  return (__m512d) __builtin_ia32_permvardf512_mask ((__v8df) __Y,
+						     (__v8di) __X,
+						     (__v8df)
+						     _mm512_setzero_pd (),
+						     (__mmask8) __U);
 }
-#else
-#define _mm512_alignr_epi32(X, Y, C)                                        \
-    ((__m512i)__builtin_ia32_alignd512_mask ((__v16si)(__m512i)(X),         \
-        (__v16si)(__m512i)(Y), (int)(C), (__v16si)_mm512_undefined_epi32 (),\
-        (__mmask16)-1))
-
-#define _mm512_mask_alignr_epi32(W, U, X, Y, C)                             \
-    ((__m512i)__builtin_ia32_alignd512_mask ((__v16si)(__m512i)(X),         \
-        (__v16si)(__m512i)(Y), (int)(C), (__v16si)(__m512i)(W),             \
-        (__mmask16)(U)))
-
-#define _mm512_maskz_alignr_epi32(U, X, Y, C)                               \
-    ((__m512i)__builtin_ia32_alignd512_mask ((__v16si)(__m512i)(X),         \
-        (__v16si)(__m512i)(Y), (int)(C), (__v16si)_mm512_setzero_si512 (),\
-        (__mmask16)(U)))
-
-#define _mm512_alignr_epi64(X, Y, C)                                        \
-    ((__m512i)__builtin_ia32_alignq512_mask ((__v8di)(__m512i)(X),          \
-        (__v8di)(__m512i)(Y), (int)(C), (__v8di)_mm512_undefined_epi32 (),  \
-	(__mmask8)-1))
-
-#define _mm512_mask_alignr_epi64(W, U, X, Y, C)                             \
-    ((__m512i)__builtin_ia32_alignq512_mask ((__v8di)(__m512i)(X),          \
-        (__v8di)(__m512i)(Y), (int)(C), (__v8di)(__m512i)(W), (__mmask8)(U)))
-
-#define _mm512_maskz_alignr_epi64(U, X, Y, C)                               \
-    ((__m512i)__builtin_ia32_alignq512_mask ((__v8di)(__m512i)(X),          \
-        (__v8di)(__m512i)(Y), (int)(C), (__v8di)_mm512_setzero_si512 (),\
-        (__mmask8)(U)))
-#endif
 
-extern __inline __mmask16
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmpeq_epi32_mask (__m512i __A, __m512i __B)
+_mm512_permutexvar_ps (__m512i __X, __m512 __Y)
 {
-  return (__mmask16) __builtin_ia32_pcmpeqd512_mask ((__v16si) __A,
-						     (__v16si) __B,
-						     (__mmask16) -1);
+  return (__m512) __builtin_ia32_permvarsf512_mask ((__v16sf) __Y,
+						    (__v16si) __X,
+						    (__v16sf)
+						    _mm512_undefined_ps (),
+						    (__mmask16) -1);
 }
 
-extern __inline __mmask16
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmpeq_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B)
+_mm512_mask_permutexvar_ps (__m512 __W, __mmask16 __U, __m512i __X, __m512 __Y)
 {
-  return (__mmask16) __builtin_ia32_pcmpeqd512_mask ((__v16si) __A,
-						     (__v16si) __B, __U);
+  return (__m512) __builtin_ia32_permvarsf512_mask ((__v16sf) __Y,
+						    (__v16si) __X,
+						    (__v16sf) __W,
+						    (__mmask16) __U);
 }
 
-extern __inline __mmask8
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmpeq_epi64_mask (__mmask8 __U, __m512i __A, __m512i __B)
+_mm512_maskz_permutexvar_ps (__mmask16 __U, __m512i __X, __m512 __Y)
 {
-  return (__mmask8) __builtin_ia32_pcmpeqq512_mask ((__v8di) __A,
-						    (__v8di) __B, __U);
+  return (__m512) __builtin_ia32_permvarsf512_mask ((__v16sf) __Y,
+						    (__v16si) __X,
+						    (__v16sf)
+						    _mm512_setzero_ps (),
+						    (__mmask16) __U);
 }
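
[ A usage sketch for the variable-index forms above (helper name made
  up, assumes -mavx512f).  Note the intrinsic takes the index vector as
  its *first* argument, while the builtin above takes the data operand
  first; indices may cross the whole register:

    #include <immintrin.h>

    /* Reverse all 16 dword lanes of x.  */
    __m512i
    reverse_epi32 (__m512i x)
    {
      const __m512i rev = _mm512_set_epi32 (0, 1, 2, 3, 4, 5, 6, 7,
                                            8, 9, 10, 11, 12, 13, 14, 15);
      return _mm512_permutexvar_epi32 (rev, x);
    }
]
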
 
-extern __inline __mmask8
+#ifdef __OPTIMIZE__
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmpeq_epi64_mask (__m512i __A, __m512i __B)
+_mm512_shuffle_ps (__m512 __M, __m512 __V, const int __imm)
 {
-  return (__mmask8) __builtin_ia32_pcmpeqq512_mask ((__v8di) __A,
-						    (__v8di) __B,
-						    (__mmask8) -1);
+  return (__m512) __builtin_ia32_shufps512_mask ((__v16sf) __M,
+						 (__v16sf) __V, __imm,
+						 (__v16sf)
+						 _mm512_undefined_ps (),
+						 (__mmask16) -1);
 }
 
-extern __inline __mmask16
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmpgt_epi32_mask (__m512i __A, __m512i __B)
+_mm512_mask_shuffle_ps (__m512 __W, __mmask16 __U, __m512 __M,
+			__m512 __V, const int __imm)
 {
-  return (__mmask16) __builtin_ia32_pcmpgtd512_mask ((__v16si) __A,
-						     (__v16si) __B,
-						     (__mmask16) -1);
+  return (__m512) __builtin_ia32_shufps512_mask ((__v16sf) __M,
+						 (__v16sf) __V, __imm,
+						 (__v16sf) __W,
+						 (__mmask16) __U);
 }
 
-extern __inline __mmask16
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmpgt_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B)
+_mm512_maskz_shuffle_ps (__mmask16 __U, __m512 __M, __m512 __V, const int __imm)
 {
-  return (__mmask16) __builtin_ia32_pcmpgtd512_mask ((__v16si) __A,
-						     (__v16si) __B, __U);
+  return (__m512) __builtin_ia32_shufps512_mask ((__v16sf) __M,
+						 (__v16sf) __V, __imm,
+						 (__v16sf)
+						 _mm512_setzero_ps (),
+						 (__mmask16) __U);
 }
 
-extern __inline __mmask8
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmpgt_epi64_mask (__mmask8 __U, __m512i __A, __m512i __B)
+_mm512_shuffle_pd (__m512d __M, __m512d __V, const int __imm)
 {
-  return (__mmask8) __builtin_ia32_pcmpgtq512_mask ((__v8di) __A,
-						    (__v8di) __B, __U);
+  return (__m512d) __builtin_ia32_shufpd512_mask ((__v8df) __M,
+						  (__v8df) __V, __imm,
+						  (__v8df)
+						  _mm512_undefined_pd (),
+						  (__mmask8) -1);
 }
 
-extern __inline __mmask8
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmpgt_epi64_mask (__m512i __A, __m512i __B)
+_mm512_mask_shuffle_pd (__m512d __W, __mmask8 __U, __m512d __M,
+			__m512d __V, const int __imm)
 {
-  return (__mmask8) __builtin_ia32_pcmpgtq512_mask ((__v8di) __A,
-						    (__v8di) __B,
-						    (__mmask8) -1);
+  return (__m512d) __builtin_ia32_shufpd512_mask ((__v8df) __M,
+						  (__v8df) __V, __imm,
+						  (__v8df) __W,
+						  (__mmask8) __U);
 }
 
-extern __inline __mmask16
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmpge_epi32_mask (__m512i __X, __m512i __Y)
+_mm512_maskz_shuffle_pd (__mmask8 __U, __m512d __M, __m512d __V,
+			 const int __imm)
 {
-  return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
-						    (__v16si) __Y, 5,
-						    (__mmask16) -1);
+  return (__m512d) __builtin_ia32_shufpd512_mask ((__v8df) __M,
+						  (__v8df) __V, __imm,
+						  (__v8df)
+						  _mm512_setzero_pd (),
+						  (__mmask8) __U);
 }
 
-extern __inline __mmask16
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmpge_epi32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
+_mm512_fixupimm_round_pd (__m512d __A, __m512d __B, __m512i __C,
+			  const int __imm, const int __R)
 {
-  return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
-						    (__v16si) __Y, 5,
-						    (__mmask16) __M);
+  return (__m512d) __builtin_ia32_fixupimmpd512_mask ((__v8df) __A,
+						      (__v8df) __B,
+						      (__v8di) __C,
+						      __imm,
+						      (__mmask8) -1, __R);
 }
 
-extern __inline __mmask16
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmpge_epu32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
+_mm512_mask_fixupimm_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
+			       __m512i __C, const int __imm, const int __R)
 {
-  return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
-						    (__v16si) __Y, 5,
-						    (__mmask16) __M);
+  return (__m512d) __builtin_ia32_fixupimmpd512_mask ((__v8df) __A,
+						      (__v8df) __B,
+						      (__v8di) __C,
+						      __imm,
+						      (__mmask8) __U, __R);
 }
 
-extern __inline __mmask16
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmpge_epu32_mask (__m512i __X, __m512i __Y)
+_mm512_maskz_fixupimm_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+				__m512i __C, const int __imm, const int __R)
 {
-  return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
-						    (__v16si) __Y, 5,
-						    (__mmask16) -1);
+  return (__m512d) __builtin_ia32_fixupimmpd512_maskz ((__v8df) __A,
+						       (__v8df) __B,
+						       (__v8di) __C,
+						       __imm,
+						       (__mmask8) __U, __R);
 }
 
-extern __inline __mmask8
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmpge_epi64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
+_mm512_fixupimm_round_ps (__m512 __A, __m512 __B, __m512i __C,
+			  const int __imm, const int __R)
 {
-  return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
-						    (__v8di) __Y, 5,
-						    (__mmask8) __M);
+  return (__m512) __builtin_ia32_fixupimmps512_mask ((__v16sf) __A,
+						     (__v16sf) __B,
+						     (__v16si) __C,
+						     __imm,
+						     (__mmask16) -1, __R);
 }
 
-extern __inline __mmask8
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmpge_epi64_mask (__m512i __X, __m512i __Y)
+_mm512_mask_fixupimm_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
+			       __m512i __C, const int __imm, const int __R)
 {
-  return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
-						    (__v8di) __Y, 5,
-						    (__mmask8) -1);
+  return (__m512) __builtin_ia32_fixupimmps512_mask ((__v16sf) __A,
+						     (__v16sf) __B,
+						     (__v16si) __C,
+						     __imm,
+						     (__mmask16) __U, __R);
 }
 
-extern __inline __mmask8
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmpge_epu64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
+_mm512_maskz_fixupimm_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
+				__m512i __C, const int __imm, const int __R)
 {
-  return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
-						    (__v8di) __Y, 5,
-						    (__mmask8) __M);
+  return (__m512) __builtin_ia32_fixupimmps512_maskz ((__v16sf) __A,
+						      (__v16sf) __B,
+						      (__v16si) __C,
+						      __imm,
+						      (__mmask16) __U, __R);
 }
 
-extern __inline __mmask8
+#else
+#define _mm512_shuffle_pd(X, Y, C)                                      \
+    ((__m512d)__builtin_ia32_shufpd512_mask ((__v8df)(__m512d)(X),           \
+        (__v8df)(__m512d)(Y), (int)(C),\
+    (__v8df)(__m512d)_mm512_undefined_pd(),\
+    (__mmask8)-1))
+
+#define _mm512_mask_shuffle_pd(W, U, X, Y, C)                           \
+    ((__m512d)__builtin_ia32_shufpd512_mask ((__v8df)(__m512d)(X),           \
+        (__v8df)(__m512d)(Y), (int)(C),\
+    (__v8df)(__m512d)(W),\
+    (__mmask8)(U)))
+
+#define _mm512_maskz_shuffle_pd(U, X, Y, C)                             \
+    ((__m512d)__builtin_ia32_shufpd512_mask ((__v8df)(__m512d)(X),           \
+        (__v8df)(__m512d)(Y), (int)(C),\
+    (__v8df)(__m512d)_mm512_setzero_pd(),\
+    (__mmask8)(U)))
+
+#define _mm512_shuffle_ps(X, Y, C)                                      \
+    ((__m512)__builtin_ia32_shufps512_mask ((__v16sf)(__m512)(X),            \
+        (__v16sf)(__m512)(Y), (int)(C),\
+    (__v16sf)(__m512)_mm512_undefined_ps(),\
+    (__mmask16)-1))
+
+#define _mm512_mask_shuffle_ps(W, U, X, Y, C)                           \
+    ((__m512)__builtin_ia32_shufps512_mask ((__v16sf)(__m512)(X),            \
+        (__v16sf)(__m512)(Y), (int)(C),\
+    (__v16sf)(__m512)(W),\
+    (__mmask16)(U)))
+
+#define _mm512_maskz_shuffle_ps(U, X, Y, C)                             \
+    ((__m512)__builtin_ia32_shufps512_mask ((__v16sf)(__m512)(X),            \
+        (__v16sf)(__m512)(Y), (int)(C),\
+    (__v16sf)(__m512)_mm512_setzero_ps(),\
+    (__mmask16)(U)))
+
+#define _mm512_fixupimm_round_pd(X, Y, Z, C, R)					\
+  ((__m512d)__builtin_ia32_fixupimmpd512_mask ((__v8df)(__m512d)(X),	\
+      (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C),		\
+      (__mmask8)(-1), (R)))
+
+#define _mm512_mask_fixupimm_round_pd(X, U, Y, Z, C, R)                          \
+  ((__m512d)__builtin_ia32_fixupimmpd512_mask ((__v8df)(__m512d)(X),    \
+      (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C),             \
+      (__mmask8)(U), (R)))
+
+#define _mm512_maskz_fixupimm_round_pd(U, X, Y, Z, C, R)                         \
+  ((__m512d)__builtin_ia32_fixupimmpd512_maskz ((__v8df)(__m512d)(X),   \
+      (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C),             \
+      (__mmask8)(U), (R)))
+
+#define _mm512_fixupimm_round_ps(X, Y, Z, C, R)					\
+  ((__m512)__builtin_ia32_fixupimmps512_mask ((__v16sf)(__m512)(X),	\
+    (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C),		\
+    (__mmask16)(-1), (R)))
+
+#define _mm512_mask_fixupimm_round_ps(X, U, Y, Z, C, R)                          \
+  ((__m512)__builtin_ia32_fixupimmps512_mask ((__v16sf)(__m512)(X),     \
+    (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C),              \
+    (__mmask16)(U), (R)))
+
+#define _mm512_maskz_fixupimm_round_ps(U, X, Y, Z, C, R)                         \
+  ((__m512)__builtin_ia32_fixupimmps512_maskz ((__v16sf)(__m512)(X),    \
+    (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C),              \
+    (__mmask16)(U), (R)))
+
+#endif
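
[ For the shuffle forms above, a minimal sketch assuming -mavx512f
  (hypothetical helper; the fixupimm token table is omitted here):

    #include <immintrin.h>

    /* shufps semantics per 128-bit lane: the low two selectors read
       from the first operand and the high two from the second, so this
       yields { a0, a1, b0, b1 } in every lane.  */
    __m512
    lo_pairs (__m512 a, __m512 b)
    {
      return _mm512_shuffle_ps (a, b, _MM_SHUFFLE (1, 0, 1, 0));
    }
]
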
+
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmpge_epu64_mask (__m512i __X, __m512i __Y)
+_mm512_movehdup_ps (__m512 __A)
 {
-  return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
-						    (__v8di) __Y, 5,
-						    (__mmask8) -1);
+  return (__m512) __builtin_ia32_movshdup512_mask ((__v16sf) __A,
+						   (__v16sf)
+						   _mm512_undefined_ps (),
+						   (__mmask16) -1);
 }
 
-extern __inline __mmask16
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmple_epi32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
+_mm512_mask_movehdup_ps (__m512 __W, __mmask16 __U, __m512 __A)
 {
-  return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
-						    (__v16si) __Y, 2,
-						    (__mmask16) __M);
+  return (__m512) __builtin_ia32_movshdup512_mask ((__v16sf) __A,
+						   (__v16sf) __W,
+						   (__mmask16) __U);
 }
 
-extern __inline __mmask16
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmple_epi32_mask (__m512i __X, __m512i __Y)
+_mm512_maskz_movehdup_ps (__mmask16 __U, __m512 __A)
 {
-  return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
-						    (__v16si) __Y, 2,
-						    (__mmask16) -1);
+  return (__m512) __builtin_ia32_movshdup512_mask ((__v16sf) __A,
+						   (__v16sf)
+						   _mm512_setzero_ps (),
+						   (__mmask16) __U);
 }
 
-extern __inline __mmask16
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmple_epu32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
+_mm512_moveldup_ps (__m512 __A)
 {
-  return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
-						    (__v16si) __Y, 2,
-						    (__mmask16) __M);
+  return (__m512) __builtin_ia32_movsldup512_mask ((__v16sf) __A,
+						   (__v16sf)
+						   _mm512_undefined_ps (),
+						   (__mmask16) -1);
 }
 
-extern __inline __mmask16
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmple_epu32_mask (__m512i __X, __m512i __Y)
+_mm512_mask_moveldup_ps (__m512 __W, __mmask16 __U, __m512 __A)
 {
-  return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
-						    (__v16si) __Y, 2,
-						    (__mmask16) -1);
+  return (__m512) __builtin_ia32_movsldup512_mask ((__v16sf) __A,
+						   (__v16sf) __W,
+						   (__mmask16) __U);
 }
 
-extern __inline __mmask8
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmple_epi64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
+_mm512_maskz_moveldup_ps (__mmask16 __U, __m512 __A)
 {
-  return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
-						    (__v8di) __Y, 2,
-						    (__mmask8) __M);
+  return (__m512) __builtin_ia32_movsldup512_mask ((__v16sf) __A,
+						   (__v16sf)
+						   _mm512_setzero_ps (),
+						   (__mmask16) __U);
 }
 
-extern __inline __mmask8
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmple_epi64_mask (__m512i __X, __m512i __Y)
+_mm512_or_si512 (__m512i __A, __m512i __B)
 {
-  return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
-						    (__v8di) __Y, 2,
-						    (__mmask8) -1);
+  return (__m512i) ((__v16su) __A | (__v16su) __B);
 }
 
-extern __inline __mmask8
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmple_epu64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
+_mm512_or_epi32 (__m512i __A, __m512i __B)
 {
-  return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
-						    (__v8di) __Y, 2,
-						    (__mmask8) __M);
+  return (__m512i) ((__v16su) __A | (__v16su) __B);
 }
 
-extern __inline __mmask8
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmple_epu64_mask (__m512i __X, __m512i __Y)
+_mm512_mask_or_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
 {
-  return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
-						    (__v8di) __Y, 2,
-						    (__mmask8) -1);
+  return (__m512i) __builtin_ia32_pord512_mask ((__v16si) __A,
+						(__v16si) __B,
+						(__v16si) __W,
+						(__mmask16) __U);
 }
 
-extern __inline __mmask16
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmplt_epi32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
+_mm512_maskz_or_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
 {
-  return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
-						    (__v16si) __Y, 1,
-						    (__mmask16) __M);
+  return (__m512i) __builtin_ia32_pord512_mask ((__v16si) __A,
+						(__v16si) __B,
+						(__v16si)
+						_mm512_setzero_si512 (),
+						(__mmask16) __U);
 }
 
-extern __inline __mmask16
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmplt_epi32_mask (__m512i __X, __m512i __Y)
+_mm512_or_epi64 (__m512i __A, __m512i __B)
 {
-  return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
-						    (__v16si) __Y, 1,
-						    (__mmask16) -1);
+  return (__m512i) ((__v8du) __A | (__v8du) __B);
 }
 
-extern __inline __mmask16
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmplt_epu32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
+_mm512_mask_or_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
 {
-  return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
-						    (__v16si) __Y, 1,
-						    (__mmask16) __M);
+  return (__m512i) __builtin_ia32_porq512_mask ((__v8di) __A,
+						(__v8di) __B,
+						(__v8di) __W,
+						(__mmask8) __U);
 }
 
-extern __inline __mmask16
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmplt_epu32_mask (__m512i __X, __m512i __Y)
+_mm512_maskz_or_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
 {
-  return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
-						    (__v16si) __Y, 1,
-						    (__mmask16) -1);
+  return (__m512i) __builtin_ia32_porq512_mask ((__v8di) __A,
+						(__v8di) __B,
+						(__v8di)
+						_mm512_setzero_si512 (),
+						(__mmask8) __U);
 }
 
-extern __inline __mmask8
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmplt_epi64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
+_mm512_xor_si512 (__m512i __A, __m512i __B)
 {
-  return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
-						    (__v8di) __Y, 1,
-						    (__mmask8) __M);
+  return (__m512i) ((__v16su) __A ^ (__v16su) __B);
 }
 
-extern __inline __mmask8
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmplt_epi64_mask (__m512i __X, __m512i __Y)
+_mm512_xor_epi32 (__m512i __A, __m512i __B)
 {
-  return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
-						    (__v8di) __Y, 1,
-						    (__mmask8) -1);
+  return (__m512i) ((__v16su) __A ^ (__v16su) __B);
 }
 
-extern __inline __mmask8
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmplt_epu64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
+_mm512_mask_xor_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
 {
-  return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
-						    (__v8di) __Y, 1,
-						    (__mmask8) __M);
+  return (__m512i) __builtin_ia32_pxord512_mask ((__v16si) __A,
+						 (__v16si) __B,
+						 (__v16si) __W,
+						 (__mmask16) __U);
 }
 
-extern __inline __mmask8
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmplt_epu64_mask (__m512i __X, __m512i __Y)
+_mm512_maskz_xor_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
 {
-  return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
-						    (__v8di) __Y, 1,
-						    (__mmask8) -1);
+  return (__m512i) __builtin_ia32_pxord512_mask ((__v16si) __A,
+						 (__v16si) __B,
+						 (__v16si)
+						 _mm512_setzero_si512 (),
+						 (__mmask16) __U);
 }
 
-extern __inline __mmask16
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmpneq_epi32_mask (__m512i __X, __m512i __Y)
+_mm512_xor_epi64 (__m512i __A, __m512i __B)
 {
-  return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
-						    (__v16si) __Y, 4,
-						    (__mmask16) -1);
+  return (__m512i) ((__v8du) __A ^ (__v8du) __B);
 }
 
-extern __inline __mmask16
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmpneq_epi32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
+_mm512_mask_xor_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
 {
-  return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
-						    (__v16si) __Y, 4,
-						    (__mmask16) __M);
+  return (__m512i) __builtin_ia32_pxorq512_mask ((__v8di) __A,
+						 (__v8di) __B,
+						 (__v8di) __W,
+						 (__mmask8) __U);
 }
 
-extern __inline __mmask16
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmpneq_epu32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
+_mm512_maskz_xor_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
 {
-  return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
-						    (__v16si) __Y, 4,
-						    (__mmask16) __M);
+  return (__m512i) __builtin_ia32_pxorq512_mask ((__v8di) __A,
+						 (__v8di) __B,
+						 (__v8di)
+						 _mm512_setzero_si512 (),
+						 (__mmask8) __U);
 }
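
[ The unmasked logic intrinsics above lower to plain vector |, ^ and &;
  only the masked forms need builtins.  A merge-masking sketch,
  assuming -mavx512f (helper name is illustrative):

    #include <immintrin.h>

    /* Lanes whose bit in u is clear keep the value from w;
       the rest receive a | b.  */
    __m512i
    or_into (__m512i w, __mmask16 u, __m512i a, __m512i b)
    {
      return _mm512_mask_or_epi32 (w, u, a, b);
    }
]
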
 
-extern __inline __mmask16
+#ifdef __OPTIMIZE__
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmpneq_epu32_mask (__m512i __X, __m512i __Y)
+_mm512_rol_epi32 (__m512i __A, const int __B)
 {
-  return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
-						    (__v16si) __Y, 4,
-						    (__mmask16) -1);
+  return (__m512i) __builtin_ia32_prold512_mask ((__v16si) __A, __B,
+						 (__v16si)
+						 _mm512_undefined_epi32 (),
+						 (__mmask16) -1);
 }
 
-extern __inline __mmask8
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmpneq_epi64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
+_mm512_mask_rol_epi32 (__m512i __W, __mmask16 __U, __m512i __A, const int __B)
 {
-  return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
-						    (__v8di) __Y, 4,
-						    (__mmask8) __M);
+  return (__m512i) __builtin_ia32_prold512_mask ((__v16si) __A, __B,
+						 (__v16si) __W,
+						 (__mmask16) __U);
 }
 
-extern __inline __mmask8
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmpneq_epi64_mask (__m512i __X, __m512i __Y)
+_mm512_maskz_rol_epi32 (__mmask16 __U, __m512i __A, const int __B)
 {
-  return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
-						    (__v8di) __Y, 4,
-						    (__mmask8) -1);
+  return (__m512i) __builtin_ia32_prold512_mask ((__v16si) __A, __B,
+						 (__v16si)
+						 _mm512_setzero_si512 (),
+						 (__mmask16) __U);
 }
 
-extern __inline __mmask8
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmpneq_epu64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
+_mm512_ror_epi32 (__m512i __A, int __B)
 {
-  return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
-						    (__v8di) __Y, 4,
-						    (__mmask8) __M);
+  return (__m512i) __builtin_ia32_prord512_mask ((__v16si) __A, __B,
+						 (__v16si)
+						 _mm512_undefined_epi32 (),
+						 (__mmask16) -1);
 }
 
-extern __inline __mmask8
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmpneq_epu64_mask (__m512i __X, __m512i __Y)
+_mm512_mask_ror_epi32 (__m512i __W, __mmask16 __U, __m512i __A, int __B)
 {
-  return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
-						    (__v8di) __Y, 4,
-						    (__mmask8) -1);
+  return (__m512i) __builtin_ia32_prord512_mask ((__v16si) __A, __B,
+						 (__v16si) __W,
+						 (__mmask16) __U);
 }
 
-#define _MM_CMPINT_EQ	    0x0
-#define _MM_CMPINT_LT	    0x1
-#define _MM_CMPINT_LE	    0x2
-#define _MM_CMPINT_UNUSED   0x3
-#define _MM_CMPINT_NE	    0x4
-#define _MM_CMPINT_NLT	    0x5
-#define _MM_CMPINT_GE	    0x5
-#define _MM_CMPINT_NLE	    0x6
-#define _MM_CMPINT_GT	    0x6
-
-#ifdef __OPTIMIZE__
-extern __inline __mmask16
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kshiftli_mask16 (__mmask16 __A, unsigned int __B)
+_mm512_maskz_ror_epi32 (__mmask16 __U, __m512i __A, int __B)
 {
-  return (__mmask16) __builtin_ia32_kshiftlihi ((__mmask16) __A,
-						(__mmask8) __B);
+  return (__m512i) __builtin_ia32_prord512_mask ((__v16si) __A, __B,
+						 (__v16si)
+						 _mm512_setzero_si512 (),
+						 (__mmask16) __U);
 }
 
-extern __inline __mmask16
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kshiftri_mask16 (__mmask16 __A, unsigned int __B)
+_mm512_rol_epi64 (__m512i __A, const int __B)
 {
-  return (__mmask16) __builtin_ia32_kshiftrihi ((__mmask16) __A,
-						(__mmask8) __B);
+  return (__m512i) __builtin_ia32_prolq512_mask ((__v8di) __A, __B,
+						 (__v8di)
+						 _mm512_undefined_epi32 (),
+						 (__mmask8) -1);
 }
 
-extern __inline __mmask8
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmp_epi64_mask (__m512i __X, __m512i __Y, const int __P)
+_mm512_mask_rol_epi64 (__m512i __W, __mmask8 __U, __m512i __A, const int __B)
 {
-  return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
-						 (__v8di) __Y, __P,
-						 (__mmask8) -1);
+  return (__m512i) __builtin_ia32_prolq512_mask ((__v8di) __A, __B,
+						 (__v8di) __W,
+						 (__mmask8) __U);
 }
 
-extern __inline __mmask16
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmp_epi32_mask (__m512i __X, __m512i __Y, const int __P)
+_mm512_maskz_rol_epi64 (__mmask8 __U, __m512i __A, const int __B)
 {
-  return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
-						  (__v16si) __Y, __P,
-						  (__mmask16) -1);
+  return (__m512i) __builtin_ia32_prolq512_mask ((__v8di) __A, __B,
+						 (__v8di)
+						 _mm512_setzero_si512 (),
+						 (__mmask8) __U);
 }
 
-extern __inline __mmask8
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmp_epu64_mask (__m512i __X, __m512i __Y, const int __P)
+_mm512_ror_epi64 (__m512i __A, int __B)
 {
-  return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
-						  (__v8di) __Y, __P,
-						  (__mmask8) -1);
+  return (__m512i) __builtin_ia32_prorq512_mask ((__v8di) __A, __B,
+						 (__v8di)
+						 _mm512_undefined_epi32 (),
+						 (__mmask8) -1);
 }
 
-extern __inline __mmask16
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmp_epu32_mask (__m512i __X, __m512i __Y, const int __P)
+_mm512_mask_ror_epi64 (__m512i __W, __mmask8 __U, __m512i __A, int __B)
 {
-  return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
-						   (__v16si) __Y, __P,
-						   (__mmask16) -1);
+  return (__m512i) __builtin_ia32_prorq512_mask ((__v8di) __A, __B,
+						 (__v8di) __W,
+						 (__mmask8) __U);
 }
 
-extern __inline __mmask8
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmp_round_pd_mask (__m512d __X, __m512d __Y, const int __P,
-			  const int __R)
+_mm512_maskz_ror_epi64 (__mmask8 __U, __m512i __A, int __B)
 {
-  return (__mmask8) __builtin_ia32_cmppd512_mask ((__v8df) __X,
-						  (__v8df) __Y, __P,
-						  (__mmask8) -1, __R);
+  return (__m512i) __builtin_ia32_prorq512_mask ((__v8di) __A, __B,
+						 (__v8di)
+						 _mm512_setzero_si512 (),
+						 (__mmask8) __U);
 }
 
-extern __inline __mmask16
+#else
+#define _mm512_rol_epi32(A, B)						  \
+    ((__m512i)__builtin_ia32_prold512_mask ((__v16si)(__m512i)(A),	  \
+					    (int)(B),			  \
+					    (__v16si)_mm512_undefined_epi32 (), \
+					    (__mmask16)(-1)))
+#define _mm512_mask_rol_epi32(W, U, A, B)				  \
+    ((__m512i)__builtin_ia32_prold512_mask ((__v16si)(__m512i)(A),	  \
+					    (int)(B),			  \
+					    (__v16si)(__m512i)(W),	  \
+					    (__mmask16)(U)))
+#define _mm512_maskz_rol_epi32(U, A, B)					  \
+    ((__m512i)__builtin_ia32_prold512_mask ((__v16si)(__m512i)(A),	  \
+					    (int)(B),			  \
+					    (__v16si)_mm512_setzero_si512 (), \
+					    (__mmask16)(U)))
+#define _mm512_ror_epi32(A, B)						  \
+    ((__m512i)__builtin_ia32_prord512_mask ((__v16si)(__m512i)(A),	  \
+					    (int)(B),			  \
+					    (__v16si)_mm512_undefined_epi32 (), \
+					    (__mmask16)(-1)))
+#define _mm512_mask_ror_epi32(W, U, A, B)				  \
+    ((__m512i)__builtin_ia32_prord512_mask ((__v16si)(__m512i)(A),	  \
+					    (int)(B),			  \
+					    (__v16si)(__m512i)(W),	  \
+					    (__mmask16)(U)))
+#define _mm512_maskz_ror_epi32(U, A, B)					  \
+    ((__m512i)__builtin_ia32_prord512_mask ((__v16si)(__m512i)(A),	  \
+					    (int)(B),			  \
+					    (__v16si)_mm512_setzero_si512 (), \
+					    (__mmask16)(U)))
+#define _mm512_rol_epi64(A, B)						  \
+    ((__m512i)__builtin_ia32_prolq512_mask ((__v8di)(__m512i)(A),	  \
+					    (int)(B),			  \
+					    (__v8di)_mm512_undefined_epi32 (),  \
+					    (__mmask8)(-1)))
+#define _mm512_mask_rol_epi64(W, U, A, B)				  \
+    ((__m512i)__builtin_ia32_prolq512_mask ((__v8di)(__m512i)(A),	  \
+					    (int)(B),			  \
+					    (__v8di)(__m512i)(W),	  \
+					    (__mmask8)(U)))
+#define _mm512_maskz_rol_epi64(U, A, B)					  \
+    ((__m512i)__builtin_ia32_prolq512_mask ((__v8di)(__m512i)(A),	  \
+					    (int)(B),			  \
+					    (__v8di)_mm512_setzero_si512 (),  \
+					    (__mmask8)(U)))
+
+#define _mm512_ror_epi64(A, B)						  \
+    ((__m512i)__builtin_ia32_prorq512_mask ((__v8di)(__m512i)(A),	  \
+					    (int)(B),			  \
+					    (__v8di)_mm512_undefined_epi32 (),  \
+					    (__mmask8)(-1)))
+#define _mm512_mask_ror_epi64(W, U, A, B)				  \
+    ((__m512i)__builtin_ia32_prorq512_mask ((__v8di)(__m512i)(A),	  \
+					    (int)(B),			  \
+					    (__v8di)(__m512i)(W),	  \
+					    (__mmask8)(U)))
+#define _mm512_maskz_ror_epi64(U, A, B)					  \
+    ((__m512i)__builtin_ia32_prorq512_mask ((__v8di)(__m512i)(A),	  \
+					    (int)(B),			  \
+					    (__v8di)_mm512_setzero_si512 (),  \
+					    (__mmask8)(U)))
+#endif
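
As an illustration (separate from the patch itself): the #ifdef __OPTIMIZE__
split above exists because the rotate count must reach the builtin as a
compile-time constant, which the always_inline wrappers can only guarantee
when optimizing; the macro forms cover the -O0 case.  A minimal usage
sketch, assuming an -mavx512f target (the helper name is illustrative):

  #include <immintrin.h>

  /* Rotate each 32-bit lane left by 3 bits; with a constant count
     this compiles to a single vprold instruction.  */
  __m512i
  rol3 (__m512i v)
  {
    return _mm512_rol_epi32 (v, 3);
  }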
+
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmp_round_ps_mask (__m512 __X, __m512 __Y, const int __P, const int __R)
+_mm512_and_si512 (__m512i __A, __m512i __B)
 {
-  return (__mmask16) __builtin_ia32_cmpps512_mask ((__v16sf) __X,
-						   (__v16sf) __Y, __P,
-						   (__mmask16) -1, __R);
+  return (__m512i) ((__v16su) __A & (__v16su) __B);
 }
 
-extern __inline __mmask8
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmp_epi64_mask (__mmask8 __U, __m512i __X, __m512i __Y,
-			    const int __P)
+_mm512_and_epi32 (__m512i __A, __m512i __B)
 {
-  return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
-						 (__v8di) __Y, __P,
-						 (__mmask8) __U);
+  return (__m512i) ((__v16su) __A & (__v16su) __B);
 }
 
-extern __inline __mmask16
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmp_epi32_mask (__mmask16 __U, __m512i __X, __m512i __Y,
-			    const int __P)
+_mm512_mask_and_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
 {
-  return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
-						  (__v16si) __Y, __P,
-						  (__mmask16) __U);
+  return (__m512i) __builtin_ia32_pandd512_mask ((__v16si) __A,
+						 (__v16si) __B,
+						 (__v16si) __W,
+						 (__mmask16) __U);
 }
 
-extern __inline __mmask8
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmp_epu64_mask (__mmask8 __U, __m512i __X, __m512i __Y,
-			    const int __P)
+_mm512_maskz_and_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
 {
-  return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
-						  (__v8di) __Y, __P,
-						  (__mmask8) __U);
+  return (__m512i) __builtin_ia32_pandd512_mask ((__v16si) __A,
+						 (__v16si) __B,
+						 (__v16si)
+						 _mm512_setzero_si512 (),
+						 (__mmask16) __U);
 }
 
-extern __inline __mmask16
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmp_epu32_mask (__mmask16 __U, __m512i __X, __m512i __Y,
-			    const int __P)
+_mm512_and_epi64 (__m512i __A, __m512i __B)
 {
-  return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
-						   (__v16si) __Y, __P,
-						   (__mmask16) __U);
+  return (__m512i) ((__v8du) __A & (__v8du) __B);
 }
 
-extern __inline __mmask8
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmp_round_pd_mask (__mmask8 __U, __m512d __X, __m512d __Y,
-			       const int __P, const int __R)
+_mm512_mask_and_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
 {
-  return (__mmask8) __builtin_ia32_cmppd512_mask ((__v8df) __X,
-						  (__v8df) __Y, __P,
-						  (__mmask8) __U, __R);
+  return (__m512i) __builtin_ia32_pandq512_mask ((__v8di) __A,
+						 (__v8di) __B,
+						 (__v8di) __W, __U);
 }
 
-extern __inline __mmask16
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmp_round_ps_mask (__mmask16 __U, __m512 __X, __m512 __Y,
-			       const int __P, const int __R)
+_mm512_maskz_and_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
 {
-  return (__mmask16) __builtin_ia32_cmpps512_mask ((__v16sf) __X,
-						   (__v16sf) __Y, __P,
-						   (__mmask16) __U, __R);
+  return (__m512i) __builtin_ia32_pandq512_mask ((__v8di) __A,
+						 (__v8di) __B,
+						 (__v8di)
+						 _mm512_setzero_si512 (),
+						 __U);
 }
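
For reference, the mask and maskz forms merge from __W or from zero in the
lanes whose mask bit is clear.  A small sketch, assuming -mavx512f (the
helper name and mask value are illustrative only):

  #include <immintrin.h>

  __m512i
  and_low_half (__m512i w, __m512i a, __m512i b)
  {
    /* Lanes 0-7 receive a & b; lanes 8-15 keep w unchanged.  */
    return _mm512_mask_and_epi32 (w, (__mmask16) 0x00ff, a, b);
  }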
 
-extern __inline __mmask8
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cmp_round_sd_mask (__m128d __X, __m128d __Y, const int __P, const int __R)
+_mm512_andnot_si512 (__m512i __A, __m512i __B)
 {
-  return (__mmask8) __builtin_ia32_cmpsd_mask ((__v2df) __X,
-					       (__v2df) __Y, __P,
-					       (__mmask8) -1, __R);
+  return (__m512i) __builtin_ia32_pandnd512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si)
+						  _mm512_undefined_epi32 (),
+						  (__mmask16) -1);
 }
 
-extern __inline __mmask8
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cmp_round_sd_mask (__mmask8 __M, __m128d __X, __m128d __Y,
-			    const int __P, const int __R)
+_mm512_andnot_epi32 (__m512i __A, __m512i __B)
 {
-  return (__mmask8) __builtin_ia32_cmpsd_mask ((__v2df) __X,
-					       (__v2df) __Y, __P,
-					       (__mmask8) __M, __R);
+  return (__m512i) __builtin_ia32_pandnd512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si)
+						  _mm512_undefined_epi32 (),
+						  (__mmask16) -1);
 }
 
-extern __inline __mmask8
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cmp_round_ss_mask (__m128 __X, __m128 __Y, const int __P, const int __R)
+_mm512_mask_andnot_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
 {
-  return (__mmask8) __builtin_ia32_cmpss_mask ((__v4sf) __X,
-					       (__v4sf) __Y, __P,
-					       (__mmask8) -1, __R);
+  return (__m512i) __builtin_ia32_pandnd512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si) __W,
+						  (__mmask16) __U);
 }
 
-extern __inline __mmask8
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cmp_round_ss_mask (__mmask8 __M, __m128 __X, __m128 __Y,
-			    const int __P, const int __R)
+_mm512_maskz_andnot_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
 {
-  return (__mmask8) __builtin_ia32_cmpss_mask ((__v4sf) __X,
-					       (__v4sf) __Y, __P,
-					       (__mmask8) __M, __R);
+  return (__m512i) __builtin_ia32_pandnd512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si)
+						  _mm512_setzero_si512 (),
+						  (__mmask16) __U);
 }
 
-#else
-#define _kshiftli_mask16(X, Y)						\
-  ((__mmask16) __builtin_ia32_kshiftlihi ((__mmask16)(X), (__mmask8)(Y)))
-
-#define _kshiftri_mask16(X, Y)						\
-  ((__mmask16) __builtin_ia32_kshiftrihi ((__mmask16)(X), (__mmask8)(Y)))
-
-#define _mm512_cmp_epi64_mask(X, Y, P)					\
-  ((__mmask8) __builtin_ia32_cmpq512_mask ((__v8di)(__m512i)(X),	\
-					   (__v8di)(__m512i)(Y), (int)(P),\
-					   (__mmask8)-1))
-
-#define _mm512_cmp_epi32_mask(X, Y, P)					\
-  ((__mmask16) __builtin_ia32_cmpd512_mask ((__v16si)(__m512i)(X),	\
-					    (__v16si)(__m512i)(Y), (int)(P), \
-					    (__mmask16)-1))
-
-#define _mm512_cmp_epu64_mask(X, Y, P)					\
-  ((__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di)(__m512i)(X),	\
-					    (__v8di)(__m512i)(Y), (int)(P),\
-					    (__mmask8)-1))
-
-#define _mm512_cmp_epu32_mask(X, Y, P)					\
-  ((__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si)(__m512i)(X),	\
-					     (__v16si)(__m512i)(Y), (int)(P), \
-					     (__mmask16)-1))
-
-#define _mm512_cmp_round_pd_mask(X, Y, P, R)				\
-  ((__mmask8) __builtin_ia32_cmppd512_mask ((__v8df)(__m512d)(X),	\
-					    (__v8df)(__m512d)(Y), (int)(P),\
-					    (__mmask8)-1, R))
-
-#define _mm512_cmp_round_ps_mask(X, Y, P, R)				\
-  ((__mmask16) __builtin_ia32_cmpps512_mask ((__v16sf)(__m512)(X),	\
-					     (__v16sf)(__m512)(Y), (int)(P),\
-					     (__mmask16)-1, R))
-
-#define _mm512_mask_cmp_epi64_mask(M, X, Y, P)				\
-  ((__mmask8) __builtin_ia32_cmpq512_mask ((__v8di)(__m512i)(X),	\
-					   (__v8di)(__m512i)(Y), (int)(P),\
-					   (__mmask8)(M)))
-
-#define _mm512_mask_cmp_epi32_mask(M, X, Y, P)				\
-  ((__mmask16) __builtin_ia32_cmpd512_mask ((__v16si)(__m512i)(X),	\
-					    (__v16si)(__m512i)(Y), (int)(P), \
-					    (__mmask16)(M)))
-
-#define _mm512_mask_cmp_epu64_mask(M, X, Y, P)				\
-  ((__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di)(__m512i)(X),	\
-					    (__v8di)(__m512i)(Y), (int)(P),\
-					    (__mmask8)(M)))
-
-#define _mm512_mask_cmp_epu32_mask(M, X, Y, P)				\
-  ((__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si)(__m512i)(X),	\
-					     (__v16si)(__m512i)(Y), (int)(P), \
-					     (__mmask16)(M)))
-
-#define _mm512_mask_cmp_round_pd_mask(M, X, Y, P, R)			\
-  ((__mmask8) __builtin_ia32_cmppd512_mask ((__v8df)(__m512d)(X),	\
-					    (__v8df)(__m512d)(Y), (int)(P),\
-					    (__mmask8)(M), R))
-
-#define _mm512_mask_cmp_round_ps_mask(M, X, Y, P, R)			\
-  ((__mmask16) __builtin_ia32_cmpps512_mask ((__v16sf)(__m512)(X),	\
-					     (__v16sf)(__m512)(Y), (int)(P),\
-					     (__mmask16)(M), R))
-
-#define _mm_cmp_round_sd_mask(X, Y, P, R)				\
-  ((__mmask8) __builtin_ia32_cmpsd_mask ((__v2df)(__m128d)(X),		\
-					 (__v2df)(__m128d)(Y), (int)(P),\
-					 (__mmask8)-1, R))
-
-#define _mm_mask_cmp_round_sd_mask(M, X, Y, P, R)			\
-  ((__mmask8) __builtin_ia32_cmpsd_mask ((__v2df)(__m128d)(X),		\
-					 (__v2df)(__m128d)(Y), (int)(P),\
-					 (M), R))
-
-#define _mm_cmp_round_ss_mask(X, Y, P, R)				\
-  ((__mmask8) __builtin_ia32_cmpss_mask ((__v4sf)(__m128)(X),		\
-					 (__v4sf)(__m128)(Y), (int)(P), \
-					 (__mmask8)-1, R))
-
-#define _mm_mask_cmp_round_ss_mask(M, X, Y, P, R)			\
-  ((__mmask8) __builtin_ia32_cmpss_mask ((__v4sf)(__m128)(X),		\
-					 (__v4sf)(__m128)(Y), (int)(P), \
-					 (M), R))
-#endif
-
-#ifdef __OPTIMIZE__
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i32gather_ps (__m512i __index, void const *__addr, int __scale)
+_mm512_andnot_epi64 (__m512i __A, __m512i __B)
 {
-  __m512 __v1_old = _mm512_undefined_ps ();
-  __mmask16 __mask = 0xFFFF;
-
-  return (__m512) __builtin_ia32_gathersiv16sf ((__v16sf) __v1_old,
-						__addr,
-						(__v16si) __index,
-						__mask, __scale);
+  return (__m512i) __builtin_ia32_pandnq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di)
+						  _mm512_undefined_epi32 (),
+						  (__mmask8) -1);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i32gather_ps (__m512 __v1_old, __mmask16 __mask,
-			  __m512i __index, void const *__addr, int __scale)
+_mm512_mask_andnot_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
 {
-  return (__m512) __builtin_ia32_gathersiv16sf ((__v16sf) __v1_old,
-						__addr,
-						(__v16si) __index,
-						__mask, __scale);
+  return (__m512i) __builtin_ia32_pandnq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di) __W, __U);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i32gather_pd (__m256i __index, void const *__addr, int __scale)
+_mm512_maskz_andnot_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
 {
-  __m512d __v1_old = _mm512_undefined_pd ();
-  __mmask8 __mask = 0xFF;
-
-  return (__m512d) __builtin_ia32_gathersiv8df ((__v8df) __v1_old,
-						__addr,
-						(__v8si) __index, __mask,
-						__scale);
+  return (__m512i) __builtin_ia32_pandnq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di)
+						  _mm512_setzero_si512 (),
+						  __U);
 }
 
-extern __inline __m512d
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i32gather_pd (__m512d __v1_old, __mmask8 __mask,
-			  __m256i __index, void const *__addr, int __scale)
+_mm512_test_epi32_mask (__m512i __A, __m512i __B)
 {
-  return (__m512d) __builtin_ia32_gathersiv8df ((__v8df) __v1_old,
-						__addr,
-						(__v8si) __index,
-						__mask, __scale);
+  return (__mmask16) __builtin_ia32_ptestmd512 ((__v16si) __A,
+						(__v16si) __B,
+						(__mmask16) -1);
 }
 
-extern __inline __m256
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i64gather_ps (__m512i __index, void const *__addr, int __scale)
+_mm512_mask_test_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B)
 {
-  __m256 __v1_old = _mm256_undefined_ps ();
-  __mmask8 __mask = 0xFF;
-
-  return (__m256) __builtin_ia32_gatherdiv16sf ((__v8sf) __v1_old,
-						__addr,
-						(__v8di) __index, __mask,
-						__scale);
+  return (__mmask16) __builtin_ia32_ptestmd512 ((__v16si) __A,
+						(__v16si) __B, __U);
 }
 
-extern __inline __m256
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i64gather_ps (__m256 __v1_old, __mmask8 __mask,
-			  __m512i __index, void const *__addr, int __scale)
+_mm512_test_epi64_mask (__m512i __A, __m512i __B)
 {
-  return (__m256) __builtin_ia32_gatherdiv16sf ((__v8sf) __v1_old,
-						__addr,
-						(__v8di) __index,
-						__mask, __scale);
+  return (__mmask8) __builtin_ia32_ptestmq512 ((__v8di) __A,
+					       (__v8di) __B,
+					       (__mmask8) -1);
 }
 
-extern __inline __m512d
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i64gather_pd (__m512i __index, void const *__addr, int __scale)
+_mm512_mask_test_epi64_mask (__mmask8 __U, __m512i __A, __m512i __B)
 {
-  __m512d __v1_old = _mm512_undefined_pd ();
-  __mmask8 __mask = 0xFF;
-
-  return (__m512d) __builtin_ia32_gatherdiv8df ((__v8df) __v1_old,
-						__addr,
-						(__v8di) __index, __mask,
-						__scale);
+  return (__mmask8) __builtin_ia32_ptestmq512 ((__v8di) __A, (__v8di) __B, __U);
 }
 
-extern __inline __m512d
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i64gather_pd (__m512d __v1_old, __mmask8 __mask,
-			  __m512i __index, void const *__addr, int __scale)
+_mm512_testn_epi32_mask (__m512i __A, __m512i __B)
 {
-  return (__m512d) __builtin_ia32_gatherdiv8df ((__v8df) __v1_old,
-						__addr,
-						(__v8di) __index,
-						__mask, __scale);
+  return (__mmask16) __builtin_ia32_ptestnmd512 ((__v16si) __A,
+						 (__v16si) __B,
+						 (__mmask16) -1);
 }
 
-extern __inline __m512i
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i32gather_epi32 (__m512i __index, void const *__addr, int __scale)
+_mm512_mask_testn_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B)
 {
-  __m512i __v1_old = _mm512_undefined_epi32 ();
-  __mmask16 __mask = 0xFFFF;
-
-  return (__m512i) __builtin_ia32_gathersiv16si ((__v16si) __v1_old,
-						 __addr,
-						 (__v16si) __index,
-						 __mask, __scale);
+  return (__mmask16) __builtin_ia32_ptestnmd512 ((__v16si) __A,
+						 (__v16si) __B, __U);
 }
 
-extern __inline __m512i
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i32gather_epi32 (__m512i __v1_old, __mmask16 __mask,
-			     __m512i __index, void const *__addr, int __scale)
+_mm512_testn_epi64_mask (__m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_gathersiv16si ((__v16si) __v1_old,
-						 __addr,
-						 (__v16si) __index,
-						 __mask, __scale);
+  return (__mmask8) __builtin_ia32_ptestnmq512 ((__v8di) __A,
+						(__v8di) __B,
+						(__mmask8) -1);
 }
 
-extern __inline __m512i
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i32gather_epi64 (__m256i __index, void const *__addr, int __scale)
+_mm512_mask_testn_epi64_mask (__mmask8 __U, __m512i __A, __m512i __B)
 {
-  __m512i __v1_old = _mm512_undefined_epi32 ();
-  __mmask8 __mask = 0xFF;
-
-  return (__m512i) __builtin_ia32_gathersiv8di ((__v8di) __v1_old,
-						__addr,
-						(__v8si) __index, __mask,
-						__scale);
+  return (__mmask8) __builtin_ia32_ptestnmq512 ((__v8di) __A,
+						(__v8di) __B, __U);
 }
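
The test/testn intrinsics map to vptestm/vptestnm: bit i of the result mask
is set when (__A[i] & __B[i]) is nonzero (test) or zero (testn).  A minimal
sketch, assuming -mavx512f (the helper name is illustrative):

  #include <immintrin.h>

  /* One result bit per dword lane: set where v has any of the
     low four bits on.  */
  __mmask16
  lanes_with_low_bits (__m512i v)
  {
    return _mm512_test_epi32_mask (v, _mm512_set1_epi32 (0xf));
  }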
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i32gather_epi64 (__m512i __v1_old, __mmask8 __mask,
-			     __m256i __index, void const *__addr,
-			     int __scale)
+_mm512_abs_ps (__m512 __A)
 {
-  return (__m512i) __builtin_ia32_gathersiv8di ((__v8di) __v1_old,
-						__addr,
-						(__v8si) __index,
-						__mask, __scale);
+  return (__m512) _mm512_and_epi32 ((__m512i) __A,
+				    _mm512_set1_epi32 (0x7fffffff));
 }
 
-extern __inline __m256i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i64gather_epi32 (__m512i __index, void const *__addr, int __scale)
+_mm512_mask_abs_ps (__m512 __W, __mmask16 __U, __m512 __A)
 {
-  __m256i __v1_old = _mm256_undefined_si256 ();
-  __mmask8 __mask = 0xFF;
-
-  return (__m256i) __builtin_ia32_gatherdiv16si ((__v8si) __v1_old,
-						 __addr,
-						 (__v8di) __index,
-						 __mask, __scale);
+  return (__m512) _mm512_mask_and_epi32 ((__m512i) __W, __U, (__m512i) __A,
+					 _mm512_set1_epi32 (0x7fffffff));
 }
 
-extern __inline __m256i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i64gather_epi32 (__m256i __v1_old, __mmask8 __mask,
-			     __m512i __index, void const *__addr, int __scale)
+_mm512_abs_pd (__m512d __A)
 {
-  return (__m256i) __builtin_ia32_gatherdiv16si ((__v8si) __v1_old,
-						 __addr,
-						 (__v8di) __index,
-						 __mask, __scale);
+  return (__m512d) _mm512_and_epi64 ((__m512i) __A,
+				     _mm512_set1_epi64 (0x7fffffffffffffffLL));
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i64gather_epi64 (__m512i __index, void const *__addr, int __scale)
+_mm512_mask_abs_pd (__m512d __W, __mmask8 __U, __m512d __A)
 {
-  __m512i __v1_old = _mm512_undefined_epi32 ();
-  __mmask8 __mask = 0xFF;
-
-  return (__m512i) __builtin_ia32_gatherdiv8di ((__v8di) __v1_old,
-						__addr,
-						(__v8di) __index, __mask,
-						__scale);
+  return (__m512d)
+	 _mm512_mask_and_epi64 ((__m512i) __W, __U, (__m512i) __A,
+				_mm512_set1_epi64 (0x7fffffffffffffffLL));
 }
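
Note that _mm512_abs_ps/_mm512_abs_pd are implemented purely as a bitwise
AND that clears the sign bit, so they raise no FP exceptions and treat NaNs
bit-wise.  A scalar sketch of the same trick (illustrative only):

  #include <stdint.h>
  #include <string.h>

  /* Scalar equivalent of what _mm512_abs_ps does per lane.  */
  float
  abs_by_mask (float x)
  {
    uint32_t u;
    memcpy (&u, &x, sizeof u);
    u &= 0x7fffffffu;		/* clear the sign bit */
    memcpy (&x, &u, sizeof u);
    return x;
  }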
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i64gather_epi64 (__m512i __v1_old, __mmask8 __mask,
-			     __m512i __index, void const *__addr,
-			     int __scale)
+_mm512_unpackhi_epi32 (__m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_gatherdiv8di ((__v8di) __v1_old,
-						__addr,
-						(__v8di) __index,
-						__mask, __scale);
+  return (__m512i) __builtin_ia32_punpckhdq512_mask ((__v16si) __A,
+						     (__v16si) __B,
+						     (__v16si)
+						     _mm512_undefined_epi32 (),
+						     (__mmask16) -1);
 }
 
-extern __inline void
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i32scatter_ps (void *__addr, __m512i __index, __m512 __v1, int __scale)
+_mm512_mask_unpackhi_epi32 (__m512i __W, __mmask16 __U, __m512i __A,
+			    __m512i __B)
 {
-  __builtin_ia32_scattersiv16sf (__addr, (__mmask16) 0xFFFF,
-				 (__v16si) __index, (__v16sf) __v1, __scale);
+  return (__m512i) __builtin_ia32_punpckhdq512_mask ((__v16si) __A,
+						     (__v16si) __B,
+						     (__v16si) __W,
+						     (__mmask16) __U);
 }
 
-extern __inline void
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i32scatter_ps (void *__addr, __mmask16 __mask,
-			   __m512i __index, __m512 __v1, int __scale)
+_mm512_maskz_unpackhi_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
 {
-  __builtin_ia32_scattersiv16sf (__addr, __mask, (__v16si) __index,
-				 (__v16sf) __v1, __scale);
+  return (__m512i) __builtin_ia32_punpckhdq512_mask ((__v16si) __A,
+						     (__v16si) __B,
+						     (__v16si)
+						     _mm512_setzero_si512 (),
+						     (__mmask16) __U);
 }
 
-extern __inline void
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i32scatter_pd (void *__addr, __m256i __index, __m512d __v1,
-		      int __scale)
+_mm512_unpackhi_epi64 (__m512i __A, __m512i __B)
 {
-  __builtin_ia32_scattersiv8df (__addr, (__mmask8) 0xFF,
-				(__v8si) __index, (__v8df) __v1, __scale);
+  return (__m512i) __builtin_ia32_punpckhqdq512_mask ((__v8di) __A,
+						      (__v8di) __B,
+						      (__v8di)
+						      _mm512_undefined_epi32 (),
+						      (__mmask8) -1);
 }
 
-extern __inline void
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i32scatter_pd (void *__addr, __mmask8 __mask,
-			   __m256i __index, __m512d __v1, int __scale)
+_mm512_mask_unpackhi_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
 {
-  __builtin_ia32_scattersiv8df (__addr, __mask, (__v8si) __index,
-				(__v8df) __v1, __scale);
+  return (__m512i) __builtin_ia32_punpckhqdq512_mask ((__v8di) __A,
+						      (__v8di) __B,
+						      (__v8di) __W,
+						      (__mmask8) __U);
 }
 
-extern __inline void
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i64scatter_ps (void *__addr, __m512i __index, __m256 __v1, int __scale)
+_mm512_maskz_unpackhi_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
 {
-  __builtin_ia32_scatterdiv16sf (__addr, (__mmask8) 0xFF,
-				 (__v8di) __index, (__v8sf) __v1, __scale);
+  return (__m512i) __builtin_ia32_punpckhqdq512_mask ((__v8di) __A,
+						      (__v8di) __B,
+						      (__v8di)
+						      _mm512_setzero_si512 (),
+						      (__mmask8) __U);
 }
 
-extern __inline void
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i64scatter_ps (void *__addr, __mmask8 __mask,
-			   __m512i __index, __m256 __v1, int __scale)
+_mm512_unpacklo_epi32 (__m512i __A, __m512i __B)
 {
-  __builtin_ia32_scatterdiv16sf (__addr, __mask, (__v8di) __index,
-				 (__v8sf) __v1, __scale);
+  return (__m512i) __builtin_ia32_punpckldq512_mask ((__v16si) __A,
+						     (__v16si) __B,
+						     (__v16si)
+						     _mm512_undefined_epi32 (),
+						     (__mmask16) -1);
 }
 
-extern __inline void
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i64scatter_pd (void *__addr, __m512i __index, __m512d __v1,
-		      int __scale)
+_mm512_mask_unpacklo_epi32 (__m512i __W, __mmask16 __U, __m512i __A,
+			    __m512i __B)
 {
-  __builtin_ia32_scatterdiv8df (__addr, (__mmask8) 0xFF,
-				(__v8di) __index, (__v8df) __v1, __scale);
+  return (__m512i) __builtin_ia32_punpckldq512_mask ((__v16si) __A,
+						     (__v16si) __B,
+						     (__v16si) __W,
+						     (__mmask16) __U);
 }
 
-extern __inline void
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i64scatter_pd (void *__addr, __mmask8 __mask,
-			   __m512i __index, __m512d __v1, int __scale)
+_mm512_maskz_unpacklo_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
 {
-  __builtin_ia32_scatterdiv8df (__addr, __mask, (__v8di) __index,
-				(__v8df) __v1, __scale);
+  return (__m512i) __builtin_ia32_punpckldq512_mask ((__v16si) __A,
+						     (__v16si) __B,
+						     (__v16si)
+						     _mm512_setzero_si512 (),
+						     (__mmask16) __U);
 }
 
-extern __inline void
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i32scatter_epi32 (void *__addr, __m512i __index,
-			 __m512i __v1, int __scale)
+_mm512_unpacklo_epi64 (__m512i __A, __m512i __B)
 {
-  __builtin_ia32_scattersiv16si (__addr, (__mmask16) 0xFFFF,
-				 (__v16si) __index, (__v16si) __v1, __scale);
+  return (__m512i) __builtin_ia32_punpcklqdq512_mask ((__v8di) __A,
+						      (__v8di) __B,
+						      (__v8di)
+						      _mm512_undefined_epi32 (),
+						      (__mmask8) -1);
 }
 
-extern __inline void
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i32scatter_epi32 (void *__addr, __mmask16 __mask,
-			      __m512i __index, __m512i __v1, int __scale)
+_mm512_mask_unpacklo_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
 {
-  __builtin_ia32_scattersiv16si (__addr, __mask, (__v16si) __index,
-				 (__v16si) __v1, __scale);
+  return (__m512i) __builtin_ia32_punpcklqdq512_mask ((__v8di) __A,
+						      (__v8di) __B,
+						      (__v8di) __W,
+						      (__mmask8) __U);
 }
 
-extern __inline void
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i32scatter_epi64 (void *__addr, __m256i __index,
-			 __m512i __v1, int __scale)
+_mm512_maskz_unpacklo_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
 {
-  __builtin_ia32_scattersiv8di (__addr, (__mmask8) 0xFF,
-				(__v8si) __index, (__v8di) __v1, __scale);
+  return (__m512i) __builtin_ia32_punpcklqdq512_mask ((__v8di) __A,
+						      (__v8di) __B,
+						      (__v8di)
+						      _mm512_setzero_si512 (),
+						      (__mmask8) __U);
 }
 
-extern __inline void
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i32scatter_epi64 (void *__addr, __mmask8 __mask,
-			      __m256i __index, __m512i __v1, int __scale)
+_mm512_movedup_pd (__m512d __A)
 {
-  __builtin_ia32_scattersiv8di (__addr, __mask, (__v8si) __index,
-				(__v8di) __v1, __scale);
+  return (__m512d) __builtin_ia32_movddup512_mask ((__v8df) __A,
+						   (__v8df)
+						   _mm512_undefined_pd (),
+						   (__mmask8) -1);
 }
 
-extern __inline void
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i64scatter_epi32 (void *__addr, __m512i __index,
-			 __m256i __v1, int __scale)
+_mm512_mask_movedup_pd (__m512d __W, __mmask8 __U, __m512d __A)
 {
-  __builtin_ia32_scatterdiv16si (__addr, (__mmask8) 0xFF,
-				 (__v8di) __index, (__v8si) __v1, __scale);
+  return (__m512d) __builtin_ia32_movddup512_mask ((__v8df) __A,
+						   (__v8df) __W,
+						   (__mmask8) __U);
 }
 
-extern __inline void
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i64scatter_epi32 (void *__addr, __mmask8 __mask,
-			      __m512i __index, __m256i __v1, int __scale)
+_mm512_maskz_movedup_pd (__mmask8 __U, __m512d __A)
 {
-  __builtin_ia32_scatterdiv16si (__addr, __mask, (__v8di) __index,
-				 (__v8si) __v1, __scale);
+  return (__m512d) __builtin_ia32_movddup512_mask ((__v8df) __A,
+						   (__v8df)
+						   _mm512_setzero_pd (),
+						   (__mmask8) __U);
 }
 
-extern __inline void
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i64scatter_epi64 (void *__addr, __m512i __index,
-			 __m512i __v1, int __scale)
+_mm512_unpacklo_pd (__m512d __A, __m512d __B)
 {
-  __builtin_ia32_scatterdiv8di (__addr, (__mmask8) 0xFF,
-				(__v8di) __index, (__v8di) __v1, __scale);
+  return (__m512d) __builtin_ia32_unpcklpd512_mask ((__v8df) __A,
+						    (__v8df) __B,
+						    (__v8df)
+						    _mm512_undefined_pd (),
+						    (__mmask8) -1);
 }
 
-extern __inline void
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i64scatter_epi64 (void *__addr, __mmask8 __mask,
-			      __m512i __index, __m512i __v1, int __scale)
+_mm512_mask_unpacklo_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
 {
-  __builtin_ia32_scatterdiv8di (__addr, __mask, (__v8di) __index,
-				(__v8di) __v1, __scale);
-}
-#else
-#define _mm512_i32gather_ps(INDEX, ADDR, SCALE)				\
-  (__m512) __builtin_ia32_gathersiv16sf ((__v16sf)_mm512_undefined_ps(),\
-					 (void const *) (ADDR),		\
-					 (__v16si)(__m512i) (INDEX),	\
-					 (__mmask16)0xFFFF,		\
-					 (int) (SCALE))
-
-#define _mm512_mask_i32gather_ps(V1OLD, MASK, INDEX, ADDR, SCALE)	\
-  (__m512) __builtin_ia32_gathersiv16sf ((__v16sf)(__m512) (V1OLD),	\
-					 (void const *) (ADDR),		\
-					 (__v16si)(__m512i) (INDEX),	\
-					 (__mmask16) (MASK),		\
-					 (int) (SCALE))
-
-#define _mm512_i32gather_pd(INDEX, ADDR, SCALE)				\
-  (__m512d) __builtin_ia32_gathersiv8df ((__v8df)_mm512_undefined_pd(),	\
-					 (void const *) (ADDR),		\
-					 (__v8si)(__m256i) (INDEX),	\
-					 (__mmask8)0xFF, (int) (SCALE))
-
-#define _mm512_mask_i32gather_pd(V1OLD, MASK, INDEX, ADDR, SCALE)	\
-  (__m512d) __builtin_ia32_gathersiv8df ((__v8df)(__m512d) (V1OLD),	\
-					 (void const *) (ADDR),		\
-					 (__v8si)(__m256i) (INDEX),	\
-					 (__mmask8) (MASK),		\
-					 (int) (SCALE))
-
-#define _mm512_i64gather_ps(INDEX, ADDR, SCALE)				\
-  (__m256) __builtin_ia32_gatherdiv16sf ((__v8sf)_mm256_undefined_ps(),	\
-					 (void const *) (ADDR),		\
-					 (__v8di)(__m512i) (INDEX),	\
-					 (__mmask8)0xFF, (int) (SCALE))
-
-#define _mm512_mask_i64gather_ps(V1OLD, MASK, INDEX, ADDR, SCALE)	\
-  (__m256) __builtin_ia32_gatherdiv16sf ((__v8sf)(__m256) (V1OLD),	\
-					 (void const *) (ADDR),		\
-					 (__v8di)(__m512i) (INDEX),	\
-					 (__mmask8) (MASK),		\
-					 (int) (SCALE))
-
-#define _mm512_i64gather_pd(INDEX, ADDR, SCALE)				\
-  (__m512d) __builtin_ia32_gatherdiv8df ((__v8df)_mm512_undefined_pd(),	\
-					 (void const *) (ADDR),		\
-					 (__v8di)(__m512i) (INDEX),	\
-					 (__mmask8)0xFF, (int) (SCALE))
-
-#define _mm512_mask_i64gather_pd(V1OLD, MASK, INDEX, ADDR, SCALE)	\
-  (__m512d) __builtin_ia32_gatherdiv8df ((__v8df)(__m512d) (V1OLD),	\
-					 (void const *) (ADDR),		\
-					 (__v8di)(__m512i) (INDEX),	\
-					 (__mmask8) (MASK),		\
-					 (int) (SCALE))
-
-#define _mm512_i32gather_epi32(INDEX, ADDR, SCALE)			\
-  (__m512i) __builtin_ia32_gathersiv16si ((__v16si)_mm512_undefined_epi32 (),\
-					  (void const *) (ADDR),	\
-					  (__v16si)(__m512i) (INDEX),	\
-					  (__mmask16)0xFFFF,		\
-					  (int) (SCALE))
-
-#define _mm512_mask_i32gather_epi32(V1OLD, MASK, INDEX, ADDR, SCALE)	\
-  (__m512i) __builtin_ia32_gathersiv16si ((__v16si)(__m512i) (V1OLD),	\
-					  (void const *) (ADDR),	\
-					  (__v16si)(__m512i) (INDEX),	\
-					  (__mmask16) (MASK),		\
-					  (int) (SCALE))
-
-#define _mm512_i32gather_epi64(INDEX, ADDR, SCALE)			\
-  (__m512i) __builtin_ia32_gathersiv8di ((__v8di)_mm512_undefined_epi32 (),\
-					 (void const *) (ADDR),		\
-					 (__v8si)(__m256i) (INDEX),	\
-					 (__mmask8)0xFF, (int) (SCALE))
-
-#define _mm512_mask_i32gather_epi64(V1OLD, MASK, INDEX, ADDR, SCALE)	\
-  (__m512i) __builtin_ia32_gathersiv8di ((__v8di)(__m512i) (V1OLD),	\
-					 (void const *) (ADDR),		\
-					 (__v8si)(__m256i) (INDEX),	\
-					 (__mmask8) (MASK),		\
-					 (int) (SCALE))
-
-#define _mm512_i64gather_epi32(INDEX, ADDR, SCALE)			   \
-  (__m256i) __builtin_ia32_gatherdiv16si ((__v8si)_mm256_undefined_si256(),\
-					  (void const *) (ADDR),	   \
-					  (__v8di)(__m512i) (INDEX),	   \
-					  (__mmask8)0xFF, (int) (SCALE))
-
-#define _mm512_mask_i64gather_epi32(V1OLD, MASK, INDEX, ADDR, SCALE)	\
-  (__m256i) __builtin_ia32_gatherdiv16si ((__v8si)(__m256i) (V1OLD),	\
-					  (void const *) (ADDR),	\
-					  (__v8di)(__m512i) (INDEX),	\
-					  (__mmask8) (MASK),		\
-					  (int) (SCALE))
-
-#define _mm512_i64gather_epi64(INDEX, ADDR, SCALE)			\
-  (__m512i) __builtin_ia32_gatherdiv8di ((__v8di)_mm512_undefined_epi32 (),\
-					 (void const *) (ADDR),		\
-					 (__v8di)(__m512i) (INDEX),	\
-					 (__mmask8)0xFF, (int) (SCALE))
-
-#define _mm512_mask_i64gather_epi64(V1OLD, MASK, INDEX, ADDR, SCALE)	\
-  (__m512i) __builtin_ia32_gatherdiv8di ((__v8di)(__m512i) (V1OLD),	\
-					 (void const *) (ADDR),		\
-					 (__v8di)(__m512i) (INDEX),	\
-					 (__mmask8) (MASK),		\
-					 (int) (SCALE))
-
-#define _mm512_i32scatter_ps(ADDR, INDEX, V1, SCALE)			\
-  __builtin_ia32_scattersiv16sf ((void *) (ADDR), (__mmask16)0xFFFF,	\
-				 (__v16si)(__m512i) (INDEX),		\
-				 (__v16sf)(__m512) (V1), (int) (SCALE))
-
-#define _mm512_mask_i32scatter_ps(ADDR, MASK, INDEX, V1, SCALE)		\
-  __builtin_ia32_scattersiv16sf ((void *) (ADDR), (__mmask16) (MASK),	\
-				 (__v16si)(__m512i) (INDEX),		\
-				 (__v16sf)(__m512) (V1), (int) (SCALE))
-
-#define _mm512_i32scatter_pd(ADDR, INDEX, V1, SCALE)			\
-  __builtin_ia32_scattersiv8df ((void *) (ADDR), (__mmask8)0xFF,	\
-				(__v8si)(__m256i) (INDEX),		\
-				(__v8df)(__m512d) (V1), (int) (SCALE))
-
-#define _mm512_mask_i32scatter_pd(ADDR, MASK, INDEX, V1, SCALE)		\
-  __builtin_ia32_scattersiv8df ((void *) (ADDR), (__mmask8) (MASK),	\
-				(__v8si)(__m256i) (INDEX),		\
-				(__v8df)(__m512d) (V1), (int) (SCALE))
-
-#define _mm512_i64scatter_ps(ADDR, INDEX, V1, SCALE)			\
-  __builtin_ia32_scatterdiv16sf ((void *) (ADDR), (__mmask8)0xFF,	\
-				 (__v8di)(__m512i) (INDEX),		\
-				 (__v8sf)(__m256) (V1), (int) (SCALE))
-
-#define _mm512_mask_i64scatter_ps(ADDR, MASK, INDEX, V1, SCALE)		\
-  __builtin_ia32_scatterdiv16sf ((void *) (ADDR), (__mmask16) (MASK),	\
-				 (__v8di)(__m512i) (INDEX),		\
-				 (__v8sf)(__m256) (V1), (int) (SCALE))
-
-#define _mm512_i64scatter_pd(ADDR, INDEX, V1, SCALE)			\
-  __builtin_ia32_scatterdiv8df ((void *) (ADDR), (__mmask8)0xFF,	\
-				(__v8di)(__m512i) (INDEX),		\
-				(__v8df)(__m512d) (V1), (int) (SCALE))
-
-#define _mm512_mask_i64scatter_pd(ADDR, MASK, INDEX, V1, SCALE)		\
-  __builtin_ia32_scatterdiv8df ((void *) (ADDR), (__mmask8) (MASK),	\
-				(__v8di)(__m512i) (INDEX),		\
-				(__v8df)(__m512d) (V1), (int) (SCALE))
-
-#define _mm512_i32scatter_epi32(ADDR, INDEX, V1, SCALE)			\
-  __builtin_ia32_scattersiv16si ((void *) (ADDR), (__mmask16)0xFFFF,	\
-				 (__v16si)(__m512i) (INDEX),		\
-				 (__v16si)(__m512i) (V1), (int) (SCALE))
-
-#define _mm512_mask_i32scatter_epi32(ADDR, MASK, INDEX, V1, SCALE)	\
-  __builtin_ia32_scattersiv16si ((void *) (ADDR), (__mmask16) (MASK),	\
-				 (__v16si)(__m512i) (INDEX),		\
-				 (__v16si)(__m512i) (V1), (int) (SCALE))
-
-#define _mm512_i32scatter_epi64(ADDR, INDEX, V1, SCALE)			\
-  __builtin_ia32_scattersiv8di ((void *) (ADDR), (__mmask8)0xFF,	\
-				(__v8si)(__m256i) (INDEX),		\
-				(__v8di)(__m512i) (V1), (int) (SCALE))
-
-#define _mm512_mask_i32scatter_epi64(ADDR, MASK, INDEX, V1, SCALE)	\
-  __builtin_ia32_scattersiv8di ((void *) (ADDR), (__mmask8) (MASK),	\
-				(__v8si)(__m256i) (INDEX),		\
-				(__v8di)(__m512i) (V1), (int) (SCALE))
-
-#define _mm512_i64scatter_epi32(ADDR, INDEX, V1, SCALE)			\
-  __builtin_ia32_scatterdiv16si ((void *) (ADDR), (__mmask8)0xFF,	\
-				 (__v8di)(__m512i) (INDEX),		\
-				 (__v8si)(__m256i) (V1), (int) (SCALE))
-
-#define _mm512_mask_i64scatter_epi32(ADDR, MASK, INDEX, V1, SCALE)	\
-  __builtin_ia32_scatterdiv16si ((void *) (ADDR), (__mmask8) (MASK),	\
-				 (__v8di)(__m512i) (INDEX),		\
-				 (__v8si)(__m256i) (V1), (int) (SCALE))
+  return (__m512d) __builtin_ia32_unpcklpd512_mask ((__v8df) __A,
+						    (__v8df) __B,
+						    (__v8df) __W,
+						    (__mmask8) __U);
+}
 
-#define _mm512_i64scatter_epi64(ADDR, INDEX, V1, SCALE)			\
-  __builtin_ia32_scatterdiv8di ((void *) (ADDR), (__mmask8)0xFF,	\
-				(__v8di)(__m512i) (INDEX),		\
-				(__v8di)(__m512i) (V1), (int) (SCALE))
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_unpacklo_pd (__mmask8 __U, __m512d __A, __m512d __B)
+{
+  return (__m512d) __builtin_ia32_unpcklpd512_mask ((__v8df) __A,
+						    (__v8df) __B,
+						    (__v8df)
+						    _mm512_setzero_pd (),
+						    (__mmask8) __U);
+}
 
-#define _mm512_mask_i64scatter_epi64(ADDR, MASK, INDEX, V1, SCALE)	\
-  __builtin_ia32_scatterdiv8di ((void *) (ADDR), (__mmask8) (MASK),	\
-				(__v8di)(__m512i) (INDEX),		\
-				(__v8di)(__m512i) (V1), (int) (SCALE))
-#endif
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_unpackhi_pd (__m512d __A, __m512d __B)
+{
+  return (__m512d) __builtin_ia32_unpckhpd512_mask ((__v8df) __A,
+						    (__v8df) __B,
+						    (__v8df)
+						    _mm512_undefined_pd (),
+						    (__mmask8) -1);
+}
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_compress_pd (__m512d __W, __mmask8 __U, __m512d __A)
+_mm512_mask_unpackhi_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
 {
-  return (__m512d) __builtin_ia32_compressdf512_mask ((__v8df) __A,
-						      (__v8df) __W,
-						      (__mmask8) __U);
+  return (__m512d) __builtin_ia32_unpckhpd512_mask ((__v8df) __A,
+						    (__v8df) __B,
+						    (__v8df) __W,
+						    (__mmask8) __U);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_compress_pd (__mmask8 __U, __m512d __A)
+_mm512_maskz_unpackhi_pd (__mmask8 __U, __m512d __A, __m512d __B)
 {
-  return (__m512d) __builtin_ia32_compressdf512_mask ((__v8df) __A,
-						      (__v8df)
-						      _mm512_setzero_pd (),
-						      (__mmask8) __U);
+  return (__m512d) __builtin_ia32_unpckhpd512_mask ((__v8df) __A,
+						    (__v8df) __B,
+						    (__v8df)
+						    _mm512_setzero_pd (),
+						    (__mmask8) __U);
 }
 
-extern __inline void
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_compressstoreu_pd (void *__P, __mmask8 __U, __m512d __A)
+_mm512_unpackhi_ps (__m512 __A, __m512 __B)
 {
-  __builtin_ia32_compressstoredf512_mask ((__v8df *) __P, (__v8df) __A,
-					  (__mmask8) __U);
+  return (__m512) __builtin_ia32_unpckhps512_mask ((__v16sf) __A,
+						   (__v16sf) __B,
+						   (__v16sf)
+						   _mm512_undefined_ps (),
+						   (__mmask16) -1);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_compress_ps (__m512 __W, __mmask16 __U, __m512 __A)
+_mm512_mask_unpackhi_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
 {
-  return (__m512) __builtin_ia32_compresssf512_mask ((__v16sf) __A,
-						     (__v16sf) __W,
-						     (__mmask16) __U);
+  return (__m512) __builtin_ia32_unpckhps512_mask ((__v16sf) __A,
+						   (__v16sf) __B,
+						   (__v16sf) __W,
+						   (__mmask16) __U);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_compress_ps (__mmask16 __U, __m512 __A)
+_mm512_maskz_unpackhi_ps (__mmask16 __U, __m512 __A, __m512 __B)
 {
-  return (__m512) __builtin_ia32_compresssf512_mask ((__v16sf) __A,
-						     (__v16sf)
-						     _mm512_setzero_ps (),
-						     (__mmask16) __U);
+  return (__m512) __builtin_ia32_unpckhps512_mask ((__v16sf) __A,
+						   (__v16sf) __B,
+						   (__v16sf)
+						   _mm512_setzero_ps (),
+						   (__mmask16) __U);
 }
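
As with the integer forms above, the unpack operations interleave within
each 128-bit lane rather than across the full 512-bit register.  A minimal
sketch, assuming -mavx512f (the helper name is illustrative):

  #include <immintrin.h>

  /* Per 128-bit lane, produces { a2, b2, a3, b3 } from the two
     high dwords of each input lane (vpunpckhdq).  */
  __m512i
  interleave_high_dwords (__m512i a, __m512i b)
  {
    return _mm512_unpackhi_epi32 (a, b);
  }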
 
-extern __inline void
+#ifdef __OPTIMIZE__
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_compressstoreu_ps (void *__P, __mmask16 __U, __m512 __A)
+_mm512_cvt_roundps_pd (__m256 __A, const int __R)
 {
-  __builtin_ia32_compressstoresf512_mask ((__v16sf *) __P, (__v16sf) __A,
-					  (__mmask16) __U);
+  return (__m512d) __builtin_ia32_cvtps2pd512_mask ((__v8sf) __A,
+						    (__v8df)
+						    _mm512_undefined_pd (),
+						    (__mmask8) -1, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_compress_epi64 (__m512i __W, __mmask8 __U, __m512i __A)
+_mm512_mask_cvt_roundps_pd (__m512d __W, __mmask8 __U, __m256 __A,
+			    const int __R)
 {
-  return (__m512i) __builtin_ia32_compressdi512_mask ((__v8di) __A,
-						      (__v8di) __W,
-						      (__mmask8) __U);
+  return (__m512d) __builtin_ia32_cvtps2pd512_mask ((__v8sf) __A,
+						    (__v8df) __W,
+						    (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_compress_epi64 (__mmask8 __U, __m512i __A)
+_mm512_maskz_cvt_roundps_pd (__mmask8 __U, __m256 __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_compressdi512_mask ((__v8di) __A,
-						      (__v8di)
-						      _mm512_setzero_si512 (),
-						      (__mmask8) __U);
+  return (__m512d) __builtin_ia32_cvtps2pd512_mask ((__v8sf) __A,
+						    (__v8df)
+						    _mm512_setzero_pd (),
+						    (__mmask8) __U, __R);
 }
 
-extern __inline void
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_compressstoreu_epi64 (void *__P, __mmask8 __U, __m512i __A)
+_mm512_cvt_roundph_ps (__m256i __A, const int __R)
 {
-  __builtin_ia32_compressstoredi512_mask ((__v8di *) __P, (__v8di) __A,
-					  (__mmask8) __U);
+  return (__m512) __builtin_ia32_vcvtph2ps512_mask ((__v16hi) __A,
+						    (__v16sf)
+						    _mm512_undefined_ps (),
+						    (__mmask16) -1, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_compress_epi32 (__m512i __W, __mmask16 __U, __m512i __A)
+_mm512_mask_cvt_roundph_ps (__m512 __W, __mmask16 __U, __m256i __A,
+			    const int __R)
 {
-  return (__m512i) __builtin_ia32_compresssi512_mask ((__v16si) __A,
-						      (__v16si) __W,
-						      (__mmask16) __U);
+  return (__m512) __builtin_ia32_vcvtph2ps512_mask ((__v16hi) __A,
+						    (__v16sf) __W,
+						    (__mmask16) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_compress_epi32 (__mmask16 __U, __m512i __A)
+_mm512_maskz_cvt_roundph_ps (__mmask16 __U, __m256i __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_compresssi512_mask ((__v16si) __A,
-						      (__v16si)
-						      _mm512_setzero_si512 (),
-						      (__mmask16) __U);
+  return (__m512) __builtin_ia32_vcvtph2ps512_mask ((__v16hi) __A,
+						    (__v16sf)
+						    _mm512_setzero_ps (),
+						    (__mmask16) __U, __R);
 }
 
-extern __inline void
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_compressstoreu_epi32 (void *__P, __mmask16 __U, __m512i __A)
+_mm512_cvt_roundps_ph (__m512 __A, const int __I)
 {
-  __builtin_ia32_compressstoresi512_mask ((__v16si *) __P, (__v16si) __A,
-					  (__mmask16) __U);
+  return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A,
+						     __I,
+						     (__v16hi)
+						     _mm256_undefined_si256 (),
+						     -1);
 }
 
-extern __inline __m512d
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_expand_pd (__m512d __W, __mmask8 __U, __m512d __A)
+_mm512_cvtps_ph (__m512 __A, const int __I)
 {
-  return (__m512d) __builtin_ia32_expanddf512_mask ((__v8df) __A,
-						    (__v8df) __W,
-						    (__mmask8) __U);
+  return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A,
+						     __I,
+						     (__v16hi)
+						     _mm256_undefined_si256 (),
+						     -1);
 }
 
-extern __inline __m512d
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_expand_pd (__mmask8 __U, __m512d __A)
+_mm512_mask_cvt_roundps_ph (__m256i __U, __mmask16 __W, __m512 __A,
+			    const int __I)
 {
-  return (__m512d) __builtin_ia32_expanddf512_maskz ((__v8df) __A,
-						     (__v8df)
-						     _mm512_setzero_pd (),
-						     (__mmask8) __U);
+  return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A,
+						     __I,
+						     (__v16hi) __U,
+						     (__mmask16) __W);
 }
 
-extern __inline __m512d
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtps_ph (__m256i __U, __mmask16 __W, __m512 __A, const int __I)
+{
+  return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A,
+						     __I,
+						     (__v16hi) __U,
+						     (__mmask16) __W);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvt_roundps_ph (__mmask16 __W, __m512 __A, const int __I)
+{
+  return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A,
+						     __I,
+						     (__v16hi)
+						     _mm256_setzero_si256 (),
+						     (__mmask16) __W);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtps_ph (__mmask16 __W, __m512 __A, const int __I)
+{
+  return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A,
+						     __I,
+						     (__v16hi)
+						     _mm256_setzero_si256 (),
+						     (__mmask16) __W);
+}
+#else
+#define _mm512_cvt_roundps_pd(A, B)		 \
+    (__m512d)__builtin_ia32_cvtps2pd512_mask(A, (__v8df)_mm512_undefined_pd(), -1, B)
+
+#define _mm512_mask_cvt_roundps_pd(W, U, A, B)   \
+    (__m512d)__builtin_ia32_cvtps2pd512_mask(A, (__v8df)(W), U, B)
+
+#define _mm512_maskz_cvt_roundps_pd(U, A, B)     \
+    (__m512d)__builtin_ia32_cvtps2pd512_mask(A, (__v8df)_mm512_setzero_pd(), U, B)
+
+#define _mm512_cvt_roundph_ps(A, B)		 \
+    (__m512)__builtin_ia32_vcvtph2ps512_mask((__v16hi)(A), (__v16sf)_mm512_undefined_ps(), -1, B)
+
+#define _mm512_mask_cvt_roundph_ps(W, U, A, B)   \
+    (__m512)__builtin_ia32_vcvtph2ps512_mask((__v16hi)(A), (__v16sf)(W), U, B)
+
+#define _mm512_maskz_cvt_roundph_ps(U, A, B)     \
+    (__m512)__builtin_ia32_vcvtph2ps512_mask((__v16hi)(A), (__v16sf)_mm512_setzero_ps(), U, B)
+
+#define _mm512_cvt_roundps_ph(A, I)						 \
+  ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) (A), (int) (I),\
+    (__v16hi)_mm256_undefined_si256 (), -1))
+#define _mm512_cvtps_ph(A, I)						 \
+  ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) (A), (int) (I),\
+    (__v16hi)_mm256_undefined_si256 (), -1))
+#define _mm512_mask_cvt_roundps_ph(U, W, A, I)				 \
+  ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) (A), (int) (I),\
+    (__v16hi)(__m256i)(U), (__mmask16) (W)))
+#define _mm512_mask_cvtps_ph(U, W, A, I)				 \
+  ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) (A), (int) (I),\
+    (__v16hi)(__m256i)(U), (__mmask16) (W)))
+#define _mm512_maskz_cvt_roundps_ph(W, A, I)					 \
+  ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) (A), (int) (I),\
+    (__v16hi)_mm256_setzero_si256 (), (__mmask16) (W)))
+#define _mm512_maskz_cvtps_ph(W, A, I)					 \
+  ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) (A), (int) (I),\
+    (__v16hi)_mm256_setzero_si256 (), (__mmask16) (W)))
+#endif
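
For the float16 conversions the immediate selects the rounding behaviour of
the narrowing.  A small usage sketch, assuming -mavx512f (the helper name
is illustrative):

  #include <immintrin.h>

  /* Narrow 16 floats to 16 IEEE binary16 values, rounding to
     nearest even with exceptions suppressed (vcvtps2ph).  */
  __m256i
  to_fp16 (__m512 v)
  {
    return _mm512_cvtps_ph (v, _MM_FROUND_TO_NEAREST_INT
			       | _MM_FROUND_NO_EXC);
  }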
+
+#ifdef __OPTIMIZE__
+extern __inline __m256
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_expandloadu_pd (__m512d __W, __mmask8 __U, void const *__P)
+_mm512_cvt_roundpd_ps (__m512d __A, const int __R)
 {
-  return (__m512d) __builtin_ia32_expandloaddf512_mask ((const __v8df *) __P,
-							(__v8df) __W,
-							(__mmask8) __U);
+  return (__m256) __builtin_ia32_cvtpd2ps512_mask ((__v8df) __A,
+						   (__v8sf)
+						   _mm256_undefined_ps (),
+						   (__mmask8) -1, __R);
 }
 
-extern __inline __m512d
+extern __inline __m256
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_expandloadu_pd (__mmask8 __U, void const *__P)
+_mm512_mask_cvt_roundpd_ps (__m256 __W, __mmask8 __U, __m512d __A,
+			    const int __R)
 {
-  return (__m512d) __builtin_ia32_expandloaddf512_maskz ((const __v8df *) __P,
-							 (__v8df)
-							 _mm512_setzero_pd (),
-							 (__mmask8) __U);
+  return (__m256) __builtin_ia32_cvtpd2ps512_mask ((__v8df) __A,
+						   (__v8sf) __W,
+						   (__mmask8) __U, __R);
 }
 
-extern __inline __m512
+extern __inline __m256
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_expand_ps (__m512 __W, __mmask16 __U, __m512 __A)
+_mm512_maskz_cvt_roundpd_ps (__mmask8 __U, __m512d __A, const int __R)
 {
-  return (__m512) __builtin_ia32_expandsf512_mask ((__v16sf) __A,
-						   (__v16sf) __W,
-						   (__mmask16) __U);
+  return (__m256) __builtin_ia32_cvtpd2ps512_mask ((__v8df) __A,
+						   (__v8sf)
+						   _mm256_setzero_ps (),
+						   (__mmask8) __U, __R);
 }
 
-extern __inline __m512
+#else
+#define _mm512_cvt_roundpd_ps(A, B)		 \
+    (__m256)__builtin_ia32_cvtpd2ps512_mask(A, (__v8sf)_mm256_undefined_ps(), -1, B)
+
+#define _mm512_mask_cvt_roundpd_ps(W, U, A, B)   \
+    (__m256)__builtin_ia32_cvtpd2ps512_mask(A, (__v8sf)(W), U, B)
+
+#define _mm512_maskz_cvt_roundpd_ps(U, A, B)     \
+    (__m256)__builtin_ia32_cvtpd2ps512_mask(A, (__v8sf)_mm256_setzero_ps(), U, B)
+
+#endif
+
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_expand_ps (__mmask16 __U, __m512 __A)
+_mm512_stream_si512 (__m512i * __P, __m512i __A)
 {
-  return (__m512) __builtin_ia32_expandsf512_maskz ((__v16sf) __A,
-						    (__v16sf)
-						    _mm512_setzero_ps (),
-						    (__mmask16) __U);
+  __builtin_ia32_movntdq512 ((__v8di *) __P, (__v8di) __A);
 }
 
-extern __inline __m512
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_expandloadu_ps (__m512 __W, __mmask16 __U, void const *__P)
+_mm512_stream_ps (float *__P, __m512 __A)
 {
-  return (__m512) __builtin_ia32_expandloadsf512_mask ((const __v16sf *) __P,
-						       (__v16sf) __W,
-						       (__mmask16) __U);
+  __builtin_ia32_movntps512 (__P, (__v16sf) __A);
 }
 
-extern __inline __m512
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_expandloadu_ps (__mmask16 __U, void const *__P)
+_mm512_stream_pd (double *__P, __m512d __A)
 {
-  return (__m512) __builtin_ia32_expandloadsf512_maskz ((const __v16sf *) __P,
-							(__v16sf)
-							_mm512_setzero_ps (),
-							(__mmask16) __U);
+  __builtin_ia32_movntpd512 (__P, (__v8df) __A);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_expand_epi64 (__m512i __W, __mmask8 __U, __m512i __A)
+_mm512_stream_load_si512 (void *__P)
 {
-  return (__m512i) __builtin_ia32_expanddi512_mask ((__v8di) __A,
-						    (__v8di) __W,
-						    (__mmask8) __U);
+  return __builtin_ia32_movntdqa512 ((__v8di *)__P);
 }
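
These streaming intrinsics emit the non-temporal
vmovntps/vmovntpd/vmovntdq/vmovntdqa forms and require 64-byte-aligned
addresses.  A copy-loop sketch, assuming cache-bypassing stores are
actually wanted (the helper name is illustrative):

  #include <immintrin.h>

  /* Copy n 64-byte blocks without polluting the cache.  Both
     pointers must be 64-byte aligned; the sfence orders the
     weakly-ordered non-temporal stores.  */
  void
  stream_copy (__m512i *dst, const __m512i *src, long n)
  {
    for (long i = 0; i < n; i++)
      _mm512_stream_si512 (dst + i, src[i]);
    _mm_sfence ();
  }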
 
-extern __inline __m512i
+#ifdef __OPTIMIZE__
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_expand_epi64 (__mmask8 __U, __m512i __A)
+_mm512_getexp_round_ps (__m512 __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_expanddi512_maskz ((__v8di) __A,
-						     (__v8di)
-						     _mm512_setzero_si512 (),
-						     (__mmask8) __U);
+  return (__m512) __builtin_ia32_getexpps512_mask ((__v16sf) __A,
+						   (__v16sf)
+						   _mm512_undefined_ps (),
+						   (__mmask16) -1, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_expandloadu_epi64 (__m512i __W, __mmask8 __U, void const *__P)
+_mm512_mask_getexp_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
+			     const int __R)
 {
-  return (__m512i) __builtin_ia32_expandloaddi512_mask ((const __v8di *) __P,
-							(__v8di) __W,
-							(__mmask8) __U);
+  return (__m512) __builtin_ia32_getexpps512_mask ((__v16sf) __A,
+						   (__v16sf) __W,
+						   (__mmask16) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_expandloadu_epi64 (__mmask8 __U, void const *__P)
+_mm512_maskz_getexp_round_ps (__mmask16 __U, __m512 __A, const int __R)
 {
-  return (__m512i)
-	 __builtin_ia32_expandloaddi512_maskz ((const __v8di *) __P,
-					       (__v8di)
-					       _mm512_setzero_si512 (),
-					       (__mmask8) __U);
+  return (__m512) __builtin_ia32_getexpps512_mask ((__v16sf) __A,
+						   (__v16sf)
+						   _mm512_setzero_ps (),
+						   (__mmask16) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_expand_epi32 (__m512i __W, __mmask16 __U, __m512i __A)
+_mm512_getexp_round_pd (__m512d __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_expandsi512_mask ((__v16si) __A,
-						    (__v16si) __W,
-						    (__mmask16) __U);
+  return (__m512d) __builtin_ia32_getexppd512_mask ((__v8df) __A,
+						    (__v8df)
+						    _mm512_undefined_pd (),
+						    (__mmask8) -1, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_expand_epi32 (__mmask16 __U, __m512i __A)
+_mm512_mask_getexp_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
+			     const int __R)
 {
-  return (__m512i) __builtin_ia32_expandsi512_maskz ((__v16si) __A,
-						     (__v16si)
-						     _mm512_setzero_si512 (),
-						     (__mmask16) __U);
+  return (__m512d) __builtin_ia32_getexppd512_mask ((__v8df) __A,
+						    (__v8df) __W,
+						    (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_expandloadu_epi32 (__m512i __W, __mmask16 __U, void const *__P)
+_mm512_maskz_getexp_round_pd (__mmask8 __U, __m512d __A, const int __R)
 {
-  return (__m512i) __builtin_ia32_expandloadsi512_mask ((const __v16si *) __P,
-							(__v16si) __W,
-							(__mmask16) __U);
+  return (__m512d) __builtin_ia32_getexppd512_mask ((__v8df) __A,
+						    (__v8df)
+						    _mm512_setzero_pd (),
+						    (__mmask8) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_expandloadu_epi32 (__mmask16 __U, void const *__P)
+_mm512_getmant_round_pd (__m512d __A, _MM_MANTISSA_NORM_ENUM __B,
+			 _MM_MANTISSA_SIGN_ENUM __C, const int __R)
 {
-  return (__m512i) __builtin_ia32_expandloadsi512_maskz ((const __v16si *) __P,
-							 (__v16si)
-							 _mm512_setzero_si512
-							 (), (__mmask16) __U);
+  return (__m512d) __builtin_ia32_getmantpd512_mask ((__v8df) __A,
+						     (__C << 2) | __B,
+						     _mm512_undefined_pd (),
+						     (__mmask8) -1, __R);
 }
 
-/* Mask arithmetic operations */
-#define _kand_mask16 _mm512_kand
-#define _kandn_mask16 _mm512_kandn
-#define _knot_mask16 _mm512_knot
-#define _kor_mask16 _mm512_kor
-#define _kxnor_mask16 _mm512_kxnor
-#define _kxor_mask16 _mm512_kxor
-
-extern __inline unsigned char
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kortest_mask16_u8  (__mmask16 __A,  __mmask16 __B, unsigned char *__CF)
+_mm512_mask_getmant_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
+			      _MM_MANTISSA_NORM_ENUM __B,
+			      _MM_MANTISSA_SIGN_ENUM __C, const int __R)
 {
-  *__CF = (unsigned char) __builtin_ia32_kortestchi (__A, __B);
-  return (unsigned char) __builtin_ia32_kortestzhi (__A, __B);
+  return (__m512d) __builtin_ia32_getmantpd512_mask ((__v8df) __A,
+						     (__C << 2) | __B,
+						     (__v8df) __W, __U,
+						     __R);
 }
 
-extern __inline unsigned char
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kortestz_mask16_u8 (__mmask16 __A, __mmask16 __B)
+_mm512_maskz_getmant_round_pd (__mmask8 __U, __m512d __A,
+			       _MM_MANTISSA_NORM_ENUM __B,
+			       _MM_MANTISSA_SIGN_ENUM __C, const int __R)
 {
-  return (unsigned char) __builtin_ia32_kortestzhi ((__mmask16) __A,
-						    (__mmask16) __B);
+  return (__m512d) __builtin_ia32_getmantpd512_mask ((__v8df) __A,
+						     (__C << 2) | __B,
+						     (__v8df)
+						     _mm512_setzero_pd (),
+						     __U, __R);
 }
 
-extern __inline unsigned char
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kortestc_mask16_u8 (__mmask16 __A, __mmask16 __B)
+_mm512_getmant_round_ps (__m512 __A, _MM_MANTISSA_NORM_ENUM __B,
+			 _MM_MANTISSA_SIGN_ENUM __C, const int __R)
 {
-  return (unsigned char) __builtin_ia32_kortestchi ((__mmask16) __A,
-						    (__mmask16) __B);
+  return (__m512) __builtin_ia32_getmantps512_mask ((__v16sf) __A,
+						    (__C << 2) | __B,
+						    _mm512_undefined_ps (),
+						    (__mmask16) -1, __R);
 }
 
-extern __inline unsigned int
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_cvtmask16_u32 (__mmask16 __A)
+_mm512_mask_getmant_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
+			      _MM_MANTISSA_NORM_ENUM __B,
+			      _MM_MANTISSA_SIGN_ENUM __C, const int __R)
 {
-  return (unsigned int) __builtin_ia32_kmovw ((__mmask16 ) __A);
+  return (__m512) __builtin_ia32_getmantps512_mask ((__v16sf) __A,
+						    (__C << 2) | __B,
+						    (__v16sf) __W, __U,
+						    __R);
 }
 
-extern __inline __mmask16
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_cvtu32_mask16 (unsigned int __A)
+_mm512_maskz_getmant_round_ps (__mmask16 __U, __m512 __A,
+			       _MM_MANTISSA_NORM_ENUM __B,
+			       _MM_MANTISSA_SIGN_ENUM __C, const int __R)
 {
-  return (__mmask16) __builtin_ia32_kmovw ((__mmask16 ) __A);
+  return (__m512) __builtin_ia32_getmantps512_mask ((__v16sf) __A,
+						    (__C << 2) | __B,
+						    (__v16sf)
+						    _mm512_setzero_ps (),
+						    __U, __R);
 }
 
-extern __inline __mmask16
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_load_mask16 (__mmask16 *__A)
-{
-  return (__mmask16) __builtin_ia32_kmovw (*(__mmask16 *) __A);
-}
+#else
+#define _mm512_getmant_round_pd(X, B, C, R)                                                  \
+  ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X),                 \
+                                              (int)(((C)<<2) | (B)),                \
+                                              (__v8df)(__m512d)_mm512_undefined_pd(), \
+                                              (__mmask8)-1,\
+					      (R)))
+
+#define _mm512_mask_getmant_round_pd(W, U, X, B, C, R)                                       \
+  ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X),                 \
+                                              (int)(((C)<<2) | (B)),                \
+                                              (__v8df)(__m512d)(W),                 \
+                                              (__mmask8)(U),\
+					      (R)))
+
+#define _mm512_maskz_getmant_round_pd(U, X, B, C, R)                                         \
+  ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X),                 \
+                                              (int)(((C)<<2) | (B)),                \
+                                              (__v8df)(__m512d)_mm512_setzero_pd(), \
+                                              (__mmask8)(U),\
+					      (R)))
+#define _mm512_getmant_round_ps(X, B, C, R)                                                  \
+  ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X),                  \
+                                             (int)(((C)<<2) | (B)),                 \
+                                             (__v16sf)(__m512)_mm512_undefined_ps(), \
+                                             (__mmask16)-1,\
+					     (R)))
+
+#define _mm512_mask_getmant_round_ps(W, U, X, B, C, R)                                       \
+  ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X),                  \
+                                             (int)(((C)<<2) | (B)),                 \
+                                             (__v16sf)(__m512)(W),                  \
+                                             (__mmask16)(U),\
+					     (R)))
+
+#define _mm512_maskz_getmant_round_ps(U, X, B, C, R)                                         \
+  ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X),                  \
+                                             (int)(((C)<<2) | (B)),                 \
+                                             (__v16sf)(__m512)_mm512_setzero_ps(),  \
+                                             (__mmask16)(U),\
+					     (R)))
 
-extern __inline void
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_store_mask16 (__mmask16 *__A, __mmask16 __B)
-{
-  *(__mmask16 *) __A = __builtin_ia32_kmovw (__B);
-}
+#define _mm512_getexp_round_ps(A, R)						\
+  ((__m512)__builtin_ia32_getexpps512_mask((__v16sf)(__m512)(A),		\
+  (__v16sf)_mm512_undefined_ps(), (__mmask16)-1, R))
 
-extern __inline __mmask16
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_kand (__mmask16 __A, __mmask16 __B)
-{
-  return (__mmask16) __builtin_ia32_kandhi ((__mmask16) __A, (__mmask16) __B);
-}
+#define _mm512_mask_getexp_round_ps(W, U, A, R)					\
+  ((__m512)__builtin_ia32_getexpps512_mask((__v16sf)(__m512)(A),		\
+  (__v16sf)(__m512)(W), (__mmask16)(U), R))
 
-extern __inline __mmask16
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_kandn (__mmask16 __A, __mmask16 __B)
-{
-  return (__mmask16) __builtin_ia32_kandnhi ((__mmask16) __A,
-					     (__mmask16) __B);
-}
+#define _mm512_maskz_getexp_round_ps(U, A, R)					\
+  ((__m512)__builtin_ia32_getexpps512_mask((__v16sf)(__m512)(A),		\
+  (__v16sf)_mm512_setzero_ps(), (__mmask16)(U), R))
 
-extern __inline __mmask16
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_kor (__mmask16 __A, __mmask16 __B)
-{
-  return (__mmask16) __builtin_ia32_korhi ((__mmask16) __A, (__mmask16) __B);
-}
+#define _mm512_getexp_round_pd(A, R)						\
+  ((__m512d)__builtin_ia32_getexppd512_mask((__v8df)(__m512d)(A),		\
+  (__v8df)_mm512_undefined_pd(), (__mmask8)-1, R))
 
-extern __inline int
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_kortestz (__mmask16 __A, __mmask16 __B)
-{
-  return (__mmask16) __builtin_ia32_kortestzhi ((__mmask16) __A,
-						(__mmask16) __B);
-}
+#define _mm512_mask_getexp_round_pd(W, U, A, R)					\
+  ((__m512d)__builtin_ia32_getexppd512_mask((__v8df)(__m512d)(A),		\
+  (__v8df)(__m512d)(W), (__mmask8)(U), R))
 
-extern __inline int
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_kortestc (__mmask16 __A, __mmask16 __B)
-{
-  return (__mmask16) __builtin_ia32_kortestchi ((__mmask16) __A,
-						(__mmask16) __B);
-}
+#define _mm512_maskz_getexp_round_pd(U, A, R)					\
+  ((__m512d)__builtin_ia32_getexppd512_mask((__v8df)(__m512d)(A),		\
+  (__v8df)_mm512_setzero_pd(), (__mmask8)(U), R))
+#endif
 
-extern __inline __mmask16
+#ifdef __OPTIMIZE__
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_kxnor (__mmask16 __A, __mmask16 __B)
+_mm512_roundscale_round_ps (__m512 __A, const int __imm, const int __R)
 {
-  return (__mmask16) __builtin_ia32_kxnorhi ((__mmask16) __A, (__mmask16) __B);
+  return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A, __imm,
+						  (__v16sf)
+						  _mm512_undefined_ps (),
+						  -1, __R);
 }
 
-extern __inline __mmask16
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_kxor (__mmask16 __A, __mmask16 __B)
+_mm512_mask_roundscale_round_ps (__m512 __A, __mmask16 __B, __m512 __C,
+				 const int __imm, const int __R)
 {
-  return (__mmask16) __builtin_ia32_kxorhi ((__mmask16) __A, (__mmask16) __B);
+  return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __C, __imm,
+						  (__v16sf) __A,
+						  (__mmask16) __B, __R);
 }
 
-extern __inline __mmask16
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_knot (__mmask16 __A)
+_mm512_maskz_roundscale_round_ps (__mmask16 __A, __m512 __B,
+				  const int __imm, const int __R)
 {
-  return (__mmask16) __builtin_ia32_knothi ((__mmask16) __A);
+  return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __B,
+						  __imm,
+						  (__v16sf)
+						  _mm512_setzero_ps (),
+						  (__mmask16) __A, __R);
 }
 
-extern __inline __mmask16
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_kunpackb (__mmask16 __A, __mmask16 __B)
+_mm512_roundscale_round_pd (__m512d __A, const int __imm, const int __R)
 {
-  return (__mmask16) __builtin_ia32_kunpckhi ((__mmask16) __A, (__mmask16) __B);
+  return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A, __imm,
+						   (__v8df)
+						   _mm512_undefined_pd (),
+						   -1, __R);
 }
 
-extern __inline __mmask16
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kunpackb_mask16 (__mmask8 __A, __mmask8 __B)
+_mm512_mask_roundscale_round_pd (__m512d __A, __mmask8 __B,
+				 __m512d __C, const int __imm, const int __R)
 {
-  return (__mmask16) __builtin_ia32_kunpckhi ((__mmask16) __A, (__mmask16) __B);
+  return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __C, __imm,
+						   (__v8df) __A,
+						   (__mmask8) __B, __R);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_inserti32x4 (__mmask16 __B, __m512i __C, __m128i __D,
-			  const int __imm)
+_mm512_maskz_roundscale_round_pd (__mmask8 __A, __m512d __B,
+				  const int __imm, const int __R)
 {
-  return (__m512i) __builtin_ia32_inserti32x4_mask ((__v16si) __C,
-						    (__v4si) __D,
-						    __imm,
-						    (__v16si)
-						    _mm512_setzero_si512 (),
-						    __B);
+  return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __B,
+						   __imm,
+						   (__v8df)
+						   _mm512_setzero_pd (),
+						   (__mmask8) __A, __R);
 }
 
+#else
+#define _mm512_roundscale_round_ps(A, B, R) \
+  ((__m512) __builtin_ia32_rndscaleps_mask ((__v16sf)(__m512)(A), (int)(B),\
+    (__v16sf)_mm512_undefined_ps(), (__mmask16)(-1), R))
+#define _mm512_mask_roundscale_round_ps(A, B, C, D, R)				\
+  ((__m512) __builtin_ia32_rndscaleps_mask ((__v16sf)(__m512)(C),	\
+					    (int)(D),			\
+					    (__v16sf)(__m512)(A),	\
+					    (__mmask16)(B), R))
+#define _mm512_maskz_roundscale_round_ps(A, B, C, R)				\
+  ((__m512) __builtin_ia32_rndscaleps_mask ((__v16sf)(__m512)(B),	\
+					    (int)(C),			\
+					    (__v16sf)_mm512_setzero_ps(),\
+					    (__mmask16)(A), R))
+#define _mm512_roundscale_round_pd(A, B, R) \
+  ((__m512d) __builtin_ia32_rndscalepd_mask ((__v8df)(__m512d)(A), (int)(B),\
+    (__v8df)_mm512_undefined_pd(), (__mmask8)(-1), R))
+#define _mm512_mask_roundscale_round_pd(A, B, C, D, R)				\
+  ((__m512d) __builtin_ia32_rndscalepd_mask ((__v8df)(__m512d)(C),	\
+					     (int)(D),			\
+					     (__v8df)(__m512d)(A),	\
+					     (__mmask8)(B), R))
+#define _mm512_maskz_roundscale_round_pd(A, B, C, R)				\
+  ((__m512d) __builtin_ia32_rndscalepd_mask ((__v8df)(__m512d)(B),	\
+					     (int)(C),			\
+					     (__v8df)_mm512_setzero_pd(),\
+					     (__mmask8)(A), R))
+#endif
+
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_insertf32x4 (__mmask16 __B, __m512 __C, __m128 __D,
-			  const int __imm)
+_mm512_floor_ps (__m512 __A)
 {
-  return (__m512) __builtin_ia32_insertf32x4_mask ((__v16sf) __C,
-						   (__v4sf) __D,
-						   __imm,
-						   (__v16sf)
-						   _mm512_setzero_ps (), __B);
+  return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A,
+						  _MM_FROUND_FLOOR,
+						  (__v16sf) __A, -1,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_inserti32x4 (__m512i __A, __mmask16 __B, __m512i __C,
-			 __m128i __D, const int __imm)
+_mm512_floor_pd (__m512d __A)
 {
-  return (__m512i) __builtin_ia32_inserti32x4_mask ((__v16si) __C,
-						    (__v4si) __D,
-						    __imm,
-						    (__v16si) __A,
-						    __B);
+  return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A,
+						   _MM_FROUND_FLOOR,
+						   (__v8df) __A, -1,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_insertf32x4 (__m512 __A, __mmask16 __B, __m512 __C,
-			 __m128 __D, const int __imm)
-{
-  return (__m512) __builtin_ia32_insertf32x4_mask ((__v16sf) __C,
-						   (__v4sf) __D,
-						   __imm,
-						   (__v16sf) __A, __B);
-}
-#else
-#define _mm512_maskz_insertf32x4(A, X, Y, C)                            \
-  ((__m512) __builtin_ia32_insertf32x4_mask ((__v16sf)(__m512) (X),     \
-    (__v4sf)(__m128) (Y), (int) (C), (__v16sf)_mm512_setzero_ps(),      \
-    (__mmask16)(A)))
-
-#define _mm512_maskz_inserti32x4(A, X, Y, C)                            \
-  ((__m512i) __builtin_ia32_inserti32x4_mask ((__v16si)(__m512i) (X),   \
-    (__v4si)(__m128i) (Y), (int) (C), (__v16si)_mm512_setzero_si512 (),     \
-    (__mmask16)(A)))
-
-#define _mm512_mask_insertf32x4(A, B, X, Y, C)                          \
-  ((__m512) __builtin_ia32_insertf32x4_mask ((__v16sf)(__m512) (X),     \
-    (__v4sf)(__m128) (Y), (int) (C), (__v16sf)(__m512) (A),             \
-					     (__mmask16)(B)))
-
-#define _mm512_mask_inserti32x4(A, B, X, Y, C)                          \
-  ((__m512i) __builtin_ia32_inserti32x4_mask ((__v16si)(__m512i) (X),   \
-    (__v4si)(__m128i) (Y), (int) (C), (__v16si)(__m512i) (A),           \
-					      (__mmask16)(B)))
-#endif
-
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_max_epi64 (__m512i __A, __m512i __B)
+_mm512_ceil_ps (__m512 __A)
 {
-  return (__m512i) __builtin_ia32_pmaxsq512_mask ((__v8di) __A,
-						  (__v8di) __B,
-						  (__v8di)
-						  _mm512_undefined_epi32 (),
-						  (__mmask8) -1);
+  return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A,
+						  _MM_FROUND_CEIL,
+						  (__v16sf) __A, -1,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_max_epi64 (__mmask8 __M, __m512i __A, __m512i __B)
+_mm512_ceil_pd (__m512d __A)
 {
-  return (__m512i) __builtin_ia32_pmaxsq512_mask ((__v8di) __A,
-						  (__v8di) __B,
-						  (__v8di)
-						  _mm512_setzero_si512 (),
-						  __M);
+  return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A,
+						   _MM_FROUND_CEIL,
+						   (__v8df) __A, -1,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_max_epi64 (__m512i __W, __mmask8 __M, __m512i __A, __m512i __B)
+_mm512_mask_floor_ps (__m512 __W, __mmask16 __U, __m512 __A)
 {
-  return (__m512i) __builtin_ia32_pmaxsq512_mask ((__v8di) __A,
-						  (__v8di) __B,
-						  (__v8di) __W, __M);
+  return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A,
+						  _MM_FROUND_FLOOR,
+						  (__v16sf) __W, __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_min_epi64 (__m512i __A, __m512i __B)
+_mm512_mask_floor_pd (__m512d __W, __mmask8 __U, __m512d __A)
 {
-  return (__m512i) __builtin_ia32_pminsq512_mask ((__v8di) __A,
-						  (__v8di) __B,
-						  (__v8di)
-						  _mm512_undefined_epi32 (),
-						  (__mmask8) -1);
+  return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A,
+						   _MM_FROUND_FLOOR,
+						   (__v8df) __W, __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_min_epi64 (__m512i __W, __mmask8 __M, __m512i __A, __m512i __B)
+_mm512_mask_ceil_ps (__m512 __W, __mmask16 __U, __m512 __A)
 {
-  return (__m512i) __builtin_ia32_pminsq512_mask ((__v8di) __A,
-						  (__v8di) __B,
-						  (__v8di) __W, __M);
+  return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A,
+						  _MM_FROUND_CEIL,
+						  (__v16sf) __W, __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_min_epi64 (__mmask8 __M, __m512i __A, __m512i __B)
+_mm512_mask_ceil_pd (__m512d __W, __mmask8 __U, __m512d __A)
 {
-  return (__m512i) __builtin_ia32_pminsq512_mask ((__v8di) __A,
-						  (__v8di) __B,
-						  (__v8di)
-						  _mm512_setzero_si512 (),
-						  __M);
+  return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A,
+						   _MM_FROUND_CEIL,
+						   (__v8df) __W, __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
+#ifdef __OPTIMIZE__
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_max_epu64 (__m512i __A, __m512i __B)
+_mm512_alignr_epi32 (__m512i __A, __m512i __B, const int __imm)
 {
-  return (__m512i) __builtin_ia32_pmaxuq512_mask ((__v8di) __A,
-						  (__v8di) __B,
-						  (__v8di)
+  return (__m512i) __builtin_ia32_alignd512_mask ((__v16si) __A,
+						  (__v16si) __B, __imm,
+						  (__v16si)
 						  _mm512_undefined_epi32 (),
-						  (__mmask8) -1);
+						  (__mmask16) -1);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_max_epu64 (__mmask8 __M, __m512i __A, __m512i __B)
+_mm512_mask_alignr_epi32 (__m512i __W, __mmask16 __U, __m512i __A,
+			  __m512i __B, const int __imm)
 {
-  return (__m512i) __builtin_ia32_pmaxuq512_mask ((__v8di) __A,
-						  (__v8di) __B,
-						  (__v8di)
-						  _mm512_setzero_si512 (),
-						  __M);
+  return (__m512i) __builtin_ia32_alignd512_mask ((__v16si) __A,
+						  (__v16si) __B, __imm,
+						  (__v16si) __W,
+						  (__mmask16) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_max_epu64 (__m512i __W, __mmask8 __M, __m512i __A, __m512i __B)
+_mm512_maskz_alignr_epi32 (__mmask16 __U, __m512i __A, __m512i __B,
+			   const int __imm)
 {
-  return (__m512i) __builtin_ia32_pmaxuq512_mask ((__v8di) __A,
-						  (__v8di) __B,
-						  (__v8di) __W, __M);
+  return (__m512i) __builtin_ia32_alignd512_mask ((__v16si) __A,
+						  (__v16si) __B, __imm,
+						  (__v16si)
+						  _mm512_setzero_si512 (),
+						  (__mmask16) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_min_epu64 (__m512i __A, __m512i __B)
+_mm512_alignr_epi64 (__m512i __A, __m512i __B, const int __imm)
 {
-  return (__m512i) __builtin_ia32_pminuq512_mask ((__v8di) __A,
-						  (__v8di) __B,
+  return (__m512i) __builtin_ia32_alignq512_mask ((__v8di) __A,
+						  (__v8di) __B, __imm,
 						  (__v8di)
 						  _mm512_undefined_epi32 (),
 						  (__mmask8) -1);
@@ -11419,3194 +11757,3331 @@ _mm512_min_epu64 (__m512i __A, __m512i __B)
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_min_epu64 (__m512i __W, __mmask8 __M, __m512i __A, __m512i __B)
+_mm512_mask_alignr_epi64 (__m512i __W, __mmask8 __U, __m512i __A,
+			  __m512i __B, const int __imm)
 {
-  return (__m512i) __builtin_ia32_pminuq512_mask ((__v8di) __A,
-						  (__v8di) __B,
-						  (__v8di) __W, __M);
+  return (__m512i) __builtin_ia32_alignq512_mask ((__v8di) __A,
+						  (__v8di) __B, __imm,
+						  (__v8di) __W,
+						  (__mmask8) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_min_epu64 (__mmask8 __M, __m512i __A, __m512i __B)
+_mm512_maskz_alignr_epi64 (__mmask8 __U, __m512i __A, __m512i __B,
+			   const int __imm)
 {
-  return (__m512i) __builtin_ia32_pminuq512_mask ((__v8di) __A,
-						  (__v8di) __B,
+  return (__m512i) __builtin_ia32_alignq512_mask ((__v8di) __A,
+						  (__v8di) __B, __imm,
 						  (__v8di)
 						  _mm512_setzero_si512 (),
-						  __M);
+						  (__mmask8) __U);
 }
+#else
+#define _mm512_alignr_epi32(X, Y, C)                                        \
+    ((__m512i)__builtin_ia32_alignd512_mask ((__v16si)(__m512i)(X),         \
+        (__v16si)(__m512i)(Y), (int)(C), (__v16si)_mm512_undefined_epi32 (),\
+        (__mmask16)-1))
 
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_max_epi32 (__m512i __A, __m512i __B)
-{
-  return (__m512i) __builtin_ia32_pmaxsd512_mask ((__v16si) __A,
-						  (__v16si) __B,
-						  (__v16si)
-						  _mm512_undefined_epi32 (),
-						  (__mmask16) -1);
-}
+#define _mm512_mask_alignr_epi32(W, U, X, Y, C)                             \
+    ((__m512i)__builtin_ia32_alignd512_mask ((__v16si)(__m512i)(X),         \
+        (__v16si)(__m512i)(Y), (int)(C), (__v16si)(__m512i)(W),             \
+        (__mmask16)(U)))
 
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_max_epi32 (__mmask16 __M, __m512i __A, __m512i __B)
-{
-  return (__m512i) __builtin_ia32_pmaxsd512_mask ((__v16si) __A,
-						  (__v16si) __B,
-						  (__v16si)
-						  _mm512_setzero_si512 (),
-						  __M);
-}
+#define _mm512_maskz_alignr_epi32(U, X, Y, C)                               \
+    ((__m512i)__builtin_ia32_alignd512_mask ((__v16si)(__m512i)(X),         \
+        (__v16si)(__m512i)(Y), (int)(C), (__v16si)_mm512_setzero_si512 (),\
+        (__mmask16)(U)))
 
-extern __inline __m512i
+#define _mm512_alignr_epi64(X, Y, C)                                        \
+    ((__m512i)__builtin_ia32_alignq512_mask ((__v8di)(__m512i)(X),          \
+        (__v8di)(__m512i)(Y), (int)(C), (__v8di)_mm512_undefined_epi32 (),  \
+	(__mmask8)-1))
+
+#define _mm512_mask_alignr_epi64(W, U, X, Y, C)                             \
+    ((__m512i)__builtin_ia32_alignq512_mask ((__v8di)(__m512i)(X),          \
+        (__v8di)(__m512i)(Y), (int)(C), (__v8di)(__m512i)(W), (__mmask8)(U)))
+
+#define _mm512_maskz_alignr_epi64(U, X, Y, C)                               \
+    ((__m512i)__builtin_ia32_alignq512_mask ((__v8di)(__m512i)(X),          \
+        (__v8di)(__m512i)(Y), (int)(C), (__v8di)_mm512_setzero_si512 (),\
+        (__mmask8)(U)))
+#endif
+
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_max_epi32 (__m512i __W, __mmask16 __M, __m512i __A, __m512i __B)
+_mm512_cmpeq_epi32_mask (__m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_pmaxsd512_mask ((__v16si) __A,
-						  (__v16si) __B,
-						  (__v16si) __W, __M);
+  return (__mmask16) __builtin_ia32_pcmpeqd512_mask ((__v16si) __A,
+						     (__v16si) __B,
+						     (__mmask16) -1);
 }
 
-extern __inline __m512i
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_min_epi32 (__m512i __A, __m512i __B)
+_mm512_mask_cmpeq_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_pminsd512_mask ((__v16si) __A,
-						  (__v16si) __B,
-						  (__v16si)
-						  _mm512_undefined_epi32 (),
-						  (__mmask16) -1);
+  return (__mmask16) __builtin_ia32_pcmpeqd512_mask ((__v16si) __A,
+						     (__v16si) __B, __U);
 }
 
-extern __inline __m512i
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_min_epi32 (__mmask16 __M, __m512i __A, __m512i __B)
+_mm512_mask_cmpeq_epi64_mask (__mmask8 __U, __m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_pminsd512_mask ((__v16si) __A,
-						  (__v16si) __B,
-						  (__v16si)
-						  _mm512_setzero_si512 (),
-						  __M);
+  return (__mmask8) __builtin_ia32_pcmpeqq512_mask ((__v8di) __A,
+						    (__v8di) __B, __U);
 }
 
-extern __inline __m512i
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_min_epi32 (__m512i __W, __mmask16 __M, __m512i __A, __m512i __B)
+_mm512_cmpeq_epi64_mask (__m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_pminsd512_mask ((__v16si) __A,
-						  (__v16si) __B,
-						  (__v16si) __W, __M);
+  return (__mmask8) __builtin_ia32_pcmpeqq512_mask ((__v8di) __A,
+						    (__v8di) __B,
+						    (__mmask8) -1);
 }
 
-extern __inline __m512i
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_max_epu32 (__m512i __A, __m512i __B)
+_mm512_cmpgt_epi32_mask (__m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_pmaxud512_mask ((__v16si) __A,
-						  (__v16si) __B,
-						  (__v16si)
-						  _mm512_undefined_epi32 (),
-						  (__mmask16) -1);
+  return (__mmask16) __builtin_ia32_pcmpgtd512_mask ((__v16si) __A,
+						     (__v16si) __B,
+						     (__mmask16) -1);
 }
 
-extern __inline __m512i
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_max_epu32 (__mmask16 __M, __m512i __A, __m512i __B)
+_mm512_mask_cmpgt_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_pmaxud512_mask ((__v16si) __A,
-						  (__v16si) __B,
-						  (__v16si)
-						  _mm512_setzero_si512 (),
-						  __M);
+  return (__mmask16) __builtin_ia32_pcmpgtd512_mask ((__v16si) __A,
+						     (__v16si) __B, __U);
 }
 
-extern __inline __m512i
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_max_epu32 (__m512i __W, __mmask16 __M, __m512i __A, __m512i __B)
+_mm512_mask_cmpgt_epi64_mask (__mmask8 __U, __m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_pmaxud512_mask ((__v16si) __A,
-						  (__v16si) __B,
-						  (__v16si) __W, __M);
+  return (__mmask8) __builtin_ia32_pcmpgtq512_mask ((__v8di) __A,
+						    (__v8di) __B, __U);
 }
 
-extern __inline __m512i
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_min_epu32 (__m512i __A, __m512i __B)
+_mm512_cmpgt_epi64_mask (__m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_pminud512_mask ((__v16si) __A,
-						  (__v16si) __B,
-						  (__v16si)
-						  _mm512_undefined_epi32 (),
-						  (__mmask16) -1);
+  return (__mmask8) __builtin_ia32_pcmpgtq512_mask ((__v8di) __A,
+						    (__v8di) __B,
+						    (__mmask8) -1);
 }
 
-extern __inline __m512i
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_min_epu32 (__mmask16 __M, __m512i __A, __m512i __B)
+_mm512_cmpge_epi32_mask (__m512i __X, __m512i __Y)
 {
-  return (__m512i) __builtin_ia32_pminud512_mask ((__v16si) __A,
-						  (__v16si) __B,
-						  (__v16si)
-						  _mm512_setzero_si512 (),
-						  __M);
+  return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
+						    (__v16si) __Y, 5,
+						    (__mmask16) -1);
 }
 
-extern __inline __m512i
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_min_epu32 (__m512i __W, __mmask16 __M, __m512i __A, __m512i __B)
+_mm512_mask_cmpge_epi32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
 {
-  return (__m512i) __builtin_ia32_pminud512_mask ((__v16si) __A,
-						  (__v16si) __B,
-						  (__v16si) __W, __M);
+  return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
+						    (__v16si) __Y, 5,
+						    (__mmask16) __M);
 }
 
-extern __inline __m512
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_unpacklo_ps (__m512 __A, __m512 __B)
+_mm512_mask_cmpge_epu32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
 {
-  return (__m512) __builtin_ia32_unpcklps512_mask ((__v16sf) __A,
-						   (__v16sf) __B,
-						   (__v16sf)
-						   _mm512_undefined_ps (),
-						   (__mmask16) -1);
+  return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
+						    (__v16si) __Y, 5,
+						    (__mmask16) __M);
 }
 
-extern __inline __m512
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_unpacklo_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
+_mm512_cmpge_epu32_mask (__m512i __X, __m512i __Y)
 {
-  return (__m512) __builtin_ia32_unpcklps512_mask ((__v16sf) __A,
-						   (__v16sf) __B,
-						   (__v16sf) __W,
-						   (__mmask16) __U);
+  return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
+						    (__v16si) __Y, 5,
+						    (__mmask16) -1);
 }
 
-extern __inline __m512
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_unpacklo_ps (__mmask16 __U, __m512 __A, __m512 __B)
+_mm512_mask_cmpge_epi64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
 {
-  return (__m512) __builtin_ia32_unpcklps512_mask ((__v16sf) __A,
-						   (__v16sf) __B,
-						   (__v16sf)
-						   _mm512_setzero_ps (),
-						   (__mmask16) __U);
+  return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
+						    (__v8di) __Y, 5,
+						    (__mmask8) __M);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m128d
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_max_round_sd (__m128d __A, __m128d __B, const int __R)
+_mm512_cmpge_epi64_mask (__m512i __X, __m512i __Y)
 {
-  return (__m128d) __builtin_ia32_maxsd_round ((__v2df) __A,
-					       (__v2df) __B,
-					       __R);
+  return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
+						    (__v8di) __Y, 5,
+						    (__mmask8) -1);
 }
 
-extern __inline __m128d
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_max_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
-			  __m128d __B, const int __R)
+_mm512_mask_cmpge_epu64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
 {
-  return (__m128d) __builtin_ia32_maxsd_mask_round ((__v2df) __A,
-						 (__v2df) __B,
-						 (__v2df) __W,
-						 (__mmask8) __U, __R);
+  return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
+						    (__v8di) __Y, 5,
+						    (__mmask8) __M);
 }
 
-extern __inline __m128d
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_max_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
-			   const int __R)
+_mm512_cmpge_epu64_mask (__m512i __X, __m512i __Y)
 {
-  return (__m128d) __builtin_ia32_maxsd_mask_round ((__v2df) __A,
-						 (__v2df) __B,
-						 (__v2df)
-						 _mm_setzero_pd (),
-						 (__mmask8) __U, __R);
+  return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
+						    (__v8di) __Y, 5,
+						    (__mmask8) -1);
 }
 
-extern __inline __m128
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_max_round_ss (__m128 __A, __m128 __B, const int __R)
+_mm512_mask_cmple_epi32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
 {
-  return (__m128) __builtin_ia32_maxss_round ((__v4sf) __A,
-					      (__v4sf) __B,
-					      __R);
+  return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
+						    (__v16si) __Y, 2,
+						    (__mmask16) __M);
 }
 
-extern __inline __m128
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_max_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
-			  __m128 __B, const int __R)
+_mm512_cmple_epi32_mask (__m512i __X, __m512i __Y)
 {
-  return (__m128) __builtin_ia32_maxss_mask_round ((__v4sf) __A,
-						 (__v4sf) __B,
-						 (__v4sf) __W,
-						 (__mmask8) __U, __R);
+  return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
+						    (__v16si) __Y, 2,
+						    (__mmask16) -1);
 }
 
-extern __inline __m128
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_max_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
-			   const int __R)
+_mm512_mask_cmple_epu32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
 {
-  return (__m128) __builtin_ia32_maxss_mask_round ((__v4sf) __A,
-						 (__v4sf) __B,
-						 (__v4sf)
-						 _mm_setzero_ps (),
-						 (__mmask8) __U, __R);
+  return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
+						    (__v16si) __Y, 2,
+						    (__mmask16) __M);
 }
 
-extern __inline __m128d
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_min_round_sd (__m128d __A, __m128d __B, const int __R)
+_mm512_cmple_epu32_mask (__m512i __X, __m512i __Y)
 {
-  return (__m128d) __builtin_ia32_minsd_round ((__v2df) __A,
-					       (__v2df) __B,
-					       __R);
+  return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
+						    (__v16si) __Y, 2,
+						    (__mmask16) -1);
 }
 
-extern __inline __m128d
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_min_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
-			  __m128d __B, const int __R)
+_mm512_mask_cmple_epi64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
 {
-  return (__m128d) __builtin_ia32_minsd_mask_round ((__v2df) __A,
-						 (__v2df) __B,
-						 (__v2df) __W,
-						 (__mmask8) __U, __R);
+  return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
+						    (__v8di) __Y, 2,
+						    (__mmask8) __M);
 }
 
-extern __inline __m128d
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_min_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
-			   const int __R)
+_mm512_cmple_epi64_mask (__m512i __X, __m512i __Y)
 {
-  return (__m128d) __builtin_ia32_minsd_mask_round ((__v2df) __A,
-						 (__v2df) __B,
-						 (__v2df)
-						 _mm_setzero_pd (),
-						 (__mmask8) __U, __R);
+  return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
+						    (__v8di) __Y, 2,
+						    (__mmask8) -1);
 }
 
-extern __inline __m128
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_min_round_ss (__m128 __A, __m128 __B, const int __R)
+_mm512_mask_cmple_epu64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
 {
-  return (__m128) __builtin_ia32_minss_round ((__v4sf) __A,
-					      (__v4sf) __B,
-					      __R);
+  return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
+						    (__v8di) __Y, 2,
+						    (__mmask8) __M);
 }
 
-extern __inline __m128
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_min_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
-			  __m128 __B, const int __R)
+_mm512_cmple_epu64_mask (__m512i __X, __m512i __Y)
 {
-  return (__m128) __builtin_ia32_minss_mask_round ((__v4sf) __A,
-						 (__v4sf) __B,
-						 (__v4sf) __W,
-						 (__mmask8) __U, __R);
+  return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
+						    (__v8di) __Y, 2,
+						    (__mmask8) -1);
 }
 
-extern __inline __m128
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_min_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
-			   const int __R)
+_mm512_mask_cmplt_epi32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
 {
-  return (__m128) __builtin_ia32_minss_mask_round ((__v4sf) __A,
-						 (__v4sf) __B,
-						 (__v4sf)
-						 _mm_setzero_ps (),
-						 (__mmask8) __U, __R);
+  return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
+						    (__v16si) __Y, 1,
+						    (__mmask16) __M);
 }
 
-#else
-#define _mm_max_round_sd(A, B, C)            \
-    (__m128d)__builtin_ia32_maxsd_round(A, B, C)
-
-#define _mm_mask_max_round_sd(W, U, A, B, C) \
-    (__m128d)__builtin_ia32_maxsd_mask_round(A, B, W, U, C)
-
-#define _mm_maskz_max_round_sd(U, A, B, C)   \
-    (__m128d)__builtin_ia32_maxsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
-
-#define _mm_max_round_ss(A, B, C)            \
-    (__m128)__builtin_ia32_maxss_round(A, B, C)
-
-#define _mm_mask_max_round_ss(W, U, A, B, C) \
-    (__m128)__builtin_ia32_maxss_mask_round(A, B, W, U, C)
-
-#define _mm_maskz_max_round_ss(U, A, B, C)   \
-    (__m128)__builtin_ia32_maxss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
-
-#define _mm_min_round_sd(A, B, C)            \
-    (__m128d)__builtin_ia32_minsd_round(A, B, C)
-
-#define _mm_mask_min_round_sd(W, U, A, B, C) \
-    (__m128d)__builtin_ia32_minsd_mask_round(A, B, W, U, C)
-
-#define _mm_maskz_min_round_sd(U, A, B, C)   \
-    (__m128d)__builtin_ia32_minsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
-
-#define _mm_min_round_ss(A, B, C)            \
-    (__m128)__builtin_ia32_minss_round(A, B, C)
-
-#define _mm_mask_min_round_ss(W, U, A, B, C) \
-    (__m128)__builtin_ia32_minss_mask_round(A, B, W, U, C)
-
-#define _mm_maskz_min_round_ss(U, A, B, C)   \
-    (__m128)__builtin_ia32_minss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
-
-#endif
-
-extern __inline __m512d
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_blend_pd (__mmask8 __U, __m512d __A, __m512d __W)
+_mm512_cmplt_epi32_mask (__m512i __X, __m512i __Y)
 {
-  return (__m512d) __builtin_ia32_blendmpd_512_mask ((__v8df) __A,
-						     (__v8df) __W,
-						     (__mmask8) __U);
+  return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
+						    (__v16si) __Y, 1,
+						    (__mmask16) -1);
 }
 
-extern __inline __m512
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_blend_ps (__mmask16 __U, __m512 __A, __m512 __W)
+_mm512_mask_cmplt_epu32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
 {
-  return (__m512) __builtin_ia32_blendmps_512_mask ((__v16sf) __A,
-						    (__v16sf) __W,
-						    (__mmask16) __U);
+  return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
+						    (__v16si) __Y, 1,
+						    (__mmask16) __M);
 }
 
-extern __inline __m512i
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_blend_epi64 (__mmask8 __U, __m512i __A, __m512i __W)
+_mm512_cmplt_epu32_mask (__m512i __X, __m512i __Y)
 {
-  return (__m512i) __builtin_ia32_blendmq_512_mask ((__v8di) __A,
-						    (__v8di) __W,
-						    (__mmask8) __U);
+  return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
+						    (__v16si) __Y, 1,
+						    (__mmask16) -1);
 }
 
-extern __inline __m512i
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_blend_epi32 (__mmask16 __U, __m512i __A, __m512i __W)
+_mm512_mask_cmplt_epi64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
 {
-  return (__m512i) __builtin_ia32_blendmd_512_mask ((__v16si) __A,
-						    (__v16si) __W,
-						    (__mmask16) __U);
+  return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
+						    (__v8di) __Y, 1,
+						    (__mmask8) __M);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m128d
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fmadd_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R)
+_mm512_cmplt_epi64_mask (__m512i __X, __m512i __Y)
 {
-  return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W,
-						   (__v2df) __A,
-						   (__v2df) __B,
-						   __R);
+  return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
+						    (__v8di) __Y, 1,
+						    (__mmask8) -1);
 }
 
-extern __inline __m128
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fmadd_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R)
+_mm512_mask_cmplt_epu64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
 {
-  return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W,
-						  (__v4sf) __A,
-						  (__v4sf) __B,
-						  __R);
+  return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
+						    (__v8di) __Y, 1,
+						    (__mmask8) __M);
 }
 
-extern __inline __m128d
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fmsub_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R)
+_mm512_cmplt_epu64_mask (__m512i __X, __m512i __Y)
 {
-  return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W,
-						   (__v2df) __A,
-						   -(__v2df) __B,
-						   __R);
+  return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
+						    (__v8di) __Y, 1,
+						    (__mmask8) -1);
 }
 
-extern __inline __m128
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fmsub_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R)
+_mm512_cmpneq_epi32_mask (__m512i __X, __m512i __Y)
 {
-  return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W,
-						  (__v4sf) __A,
-						  -(__v4sf) __B,
-						  __R);
+  return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
+						    (__v16si) __Y, 4,
+						    (__mmask16) -1);
 }
 
-extern __inline __m128d
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fnmadd_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R)
+_mm512_mask_cmpneq_epi32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
 {
-  return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W,
-						   -(__v2df) __A,
-						   (__v2df) __B,
-						   __R);
+  return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
+						    (__v16si) __Y, 4,
+						    (__mmask16) __M);
 }
 
-extern __inline __m128
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fnmadd_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R)
+_mm512_mask_cmpneq_epu32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
 {
-  return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W,
-						  -(__v4sf) __A,
-						  (__v4sf) __B,
-						  __R);
+  return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
+						    (__v16si) __Y, 4,
+						    (__mmask16) __M);
 }
 
-extern __inline __m128d
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fnmsub_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R)
+_mm512_cmpneq_epu32_mask (__m512i __X, __m512i __Y)
 {
-  return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W,
-						   -(__v2df) __A,
-						   -(__v2df) __B,
-						   __R);
+  return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
+						    (__v16si) __Y, 4,
+						    (__mmask16) -1);
 }
 
-extern __inline __m128
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fnmsub_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R)
+_mm512_mask_cmpneq_epi64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
 {
-  return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W,
-						  -(__v4sf) __A,
-						  -(__v4sf) __B,
-						  __R);
+  return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
+						    (__v8di) __Y, 4,
+						    (__mmask8) __M);
 }
-#else
-#define _mm_fmadd_round_sd(A, B, C, R)            \
-    (__m128d)__builtin_ia32_vfmaddsd3_round(A, B, C, R)
-
-#define _mm_fmadd_round_ss(A, B, C, R)            \
-    (__m128)__builtin_ia32_vfmaddss3_round(A, B, C, R)
-
-#define _mm_fmsub_round_sd(A, B, C, R)            \
-    (__m128d)__builtin_ia32_vfmaddsd3_round(A, B, -(C), R)
 
-#define _mm_fmsub_round_ss(A, B, C, R)            \
-    (__m128)__builtin_ia32_vfmaddss3_round(A, B, -(C), R)
-
-#define _mm_fnmadd_round_sd(A, B, C, R)            \
-    (__m128d)__builtin_ia32_vfmaddsd3_round(A, -(B), C, R)
-
-#define _mm_fnmadd_round_ss(A, B, C, R)            \
-   (__m128)__builtin_ia32_vfmaddss3_round(A, -(B), C, R)
-
-#define _mm_fnmsub_round_sd(A, B, C, R)            \
-    (__m128d)__builtin_ia32_vfmaddsd3_round(A, -(B), -(C), R)
-
-#define _mm_fnmsub_round_ss(A, B, C, R)            \
-    (__m128)__builtin_ia32_vfmaddss3_round(A, -(B), -(C), R)
-#endif
-
-extern __inline __m128d
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmadd_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
+_mm512_cmpneq_epi64_mask (__m512i __X, __m512i __Y)
 {
-  return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
-						  (__v2df) __A,
-						  (__v2df) __B,
-						  (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
+						    (__v8di) __Y, 4,
+						    (__mmask8) -1);
 }
 
-extern __inline __m128
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmadd_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
+_mm512_mask_cmpneq_epu64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
 {
-  return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
-						 (__v4sf) __A,
-						 (__v4sf) __B,
-						 (__mmask8) __U,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
+						    (__v8di) __Y, 4,
+						    (__mmask8) __M);
 }
 
-extern __inline __m128d
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmadd_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U)
+_mm512_cmpneq_epu64_mask (__m512i __X, __m512i __Y)
 {
-  return (__m128d) __builtin_ia32_vfmaddsd3_mask3 ((__v2df) __W,
-						   (__v2df) __A,
-						   (__v2df) __B,
-						   (__mmask8) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
+						    (__v8di) __Y, 4,
+						    (__mmask8) -1);
 }
 
-extern __inline __m128
+#define _MM_CMPINT_EQ	    0x0
+#define _MM_CMPINT_LT	    0x1
+#define _MM_CMPINT_LE	    0x2
+#define _MM_CMPINT_UNUSED   0x3
+#define _MM_CMPINT_NE	    0x4
+#define _MM_CMPINT_NLT	    0x5
+#define _MM_CMPINT_GE	    0x5
+#define _MM_CMPINT_NLE	    0x6
+#define _MM_CMPINT_GT	    0x6
+
+#ifdef __OPTIMIZE__
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmadd_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U)
+_mm512_cmp_epi64_mask (__m512i __X, __m512i __Y, const int __P)
 {
-  return (__m128) __builtin_ia32_vfmaddss3_mask3 ((__v4sf) __W,
-						  (__v4sf) __A,
-						  (__v4sf) __B,
-						  (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
+						 (__v8di) __Y, __P,
+						 (__mmask8) -1);
 }
 
-extern __inline __m128d
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmadd_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B)
+_mm512_cmp_epi32_mask (__m512i __X, __m512i __Y, const int __P)
 {
-  return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
-						   (__v2df) __A,
-						   (__v2df) __B,
-						   (__mmask8) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
+						  (__v16si) __Y, __P,
+						  (__mmask16) -1);
 }
 
-extern __inline __m128
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmadd_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B)
+_mm512_cmp_epu64_mask (__m512i __X, __m512i __Y, const int __P)
 {
-  return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
-						  (__v4sf) __A,
-						  (__v4sf) __B,
-						  (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
+						  (__v8di) __Y, __P,
+						  (__mmask8) -1);
 }
 
-extern __inline __m128d
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmsub_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
+_mm512_cmp_epu32_mask (__m512i __X, __m512i __Y, const int __P)
 {
-  return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
-						  (__v2df) __A,
-						  -(__v2df) __B,
-						  (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
+						   (__v16si) __Y, __P,
+						   (__mmask16) -1);
 }
 
-extern __inline __m128
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmsub_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
+_mm512_cmp_round_pd_mask (__m512d __X, __m512d __Y, const int __P,
+			  const int __R)
 {
-  return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
-						 (__v4sf) __A,
-						 -(__v4sf) __B,
-						 (__mmask8) __U,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__mmask8) __builtin_ia32_cmppd512_mask ((__v8df) __X,
+						  (__v8df) __Y, __P,
+						  (__mmask8) -1, __R);
 }
 
-extern __inline __m128d
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmsub_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U)
+_mm512_cmp_round_ps_mask (__m512 __X, __m512 __Y, const int __P, const int __R)
 {
-  return (__m128d) __builtin_ia32_vfmsubsd3_mask3 ((__v2df) __W,
-						   (__v2df) __A,
-						   (__v2df) __B,
-						   (__mmask8) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__mmask16) __builtin_ia32_cmpps512_mask ((__v16sf) __X,
+						   (__v16sf) __Y, __P,
+						   (__mmask16) -1, __R);
 }
 
-extern __inline __m128
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmsub_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U)
+_mm512_mask_cmp_epi64_mask (__mmask8 __U, __m512i __X, __m512i __Y,
+			    const int __P)
 {
-  return (__m128) __builtin_ia32_vfmsubss3_mask3 ((__v4sf) __W,
-						  (__v4sf) __A,
-						  (__v4sf) __B,
-						  (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
+						 (__v8di) __Y, __P,
+						 (__mmask8) __U);
 }
 
-extern __inline __m128d
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmsub_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B)
-{
-  return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
-						   (__v2df) __A,
-						   -(__v2df) __B,
-						   (__mmask8) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+_mm512_mask_cmp_epi32_mask (__mmask16 __U, __m512i __X, __m512i __Y,
+			    const int __P)
+{
+  return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
+						  (__v16si) __Y, __P,
+						  (__mmask16) __U);
 }
 
-extern __inline __m128
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmsub_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B)
+_mm512_mask_cmp_epu64_mask (__mmask8 __U, __m512i __X, __m512i __Y,
+			    const int __P)
 {
-  return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
-						  (__v4sf) __A,
-						  -(__v4sf) __B,
-						  (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
+						  (__v8di) __Y, __P,
+						  (__mmask8) __U);
 }
 
-extern __inline __m128d
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fnmadd_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
+_mm512_mask_cmp_epu32_mask (__mmask16 __U, __m512i __X, __m512i __Y,
+			    const int __P)
 {
-  return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
-						  -(__v2df) __A,
-						  (__v2df) __B,
-						  (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
+						   (__v16si) __Y, __P,
+						   (__mmask16) __U);
 }
 
-extern __inline __m128
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fnmadd_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
+_mm512_mask_cmp_round_pd_mask (__mmask8 __U, __m512d __X, __m512d __Y,
+			       const int __P, const int __R)
 {
-  return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
-						 -(__v4sf) __A,
-						 (__v4sf) __B,
-						 (__mmask8) __U,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__mmask8) __builtin_ia32_cmppd512_mask ((__v8df) __X,
+						  (__v8df) __Y, __P,
+						  (__mmask8) __U, __R);
 }
 
-extern __inline __m128d
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fnmadd_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U)
+_mm512_mask_cmp_round_ps_mask (__mmask16 __U, __m512 __X, __m512 __Y,
+			       const int __P, const int __R)
 {
-  return (__m128d) __builtin_ia32_vfmaddsd3_mask3 ((__v2df) __W,
-						   -(__v2df) __A,
-						   (__v2df) __B,
-						   (__mmask8) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__mmask16) __builtin_ia32_cmpps512_mask ((__v16sf) __X,
+						   (__v16sf) __Y, __P,
+						   (__mmask16) __U, __R);
 }
 
-extern __inline __m128
+#else
+#define _mm512_cmp_epi64_mask(X, Y, P)					\
+  ((__mmask8) __builtin_ia32_cmpq512_mask ((__v8di)(__m512i)(X),	\
+					   (__v8di)(__m512i)(Y), (int)(P),\
+					   (__mmask8)-1))
+
+#define _mm512_cmp_epi32_mask(X, Y, P)					\
+  ((__mmask16) __builtin_ia32_cmpd512_mask ((__v16si)(__m512i)(X),	\
+					    (__v16si)(__m512i)(Y), (int)(P), \
+					    (__mmask16)-1))
+
+#define _mm512_cmp_epu64_mask(X, Y, P)					\
+  ((__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di)(__m512i)(X),	\
+					    (__v8di)(__m512i)(Y), (int)(P),\
+					    (__mmask8)-1))
+
+#define _mm512_cmp_epu32_mask(X, Y, P)					\
+  ((__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si)(__m512i)(X),	\
+					     (__v16si)(__m512i)(Y), (int)(P), \
+					     (__mmask16)-1))
+
+#define _mm512_cmp_round_pd_mask(X, Y, P, R)				\
+  ((__mmask8) __builtin_ia32_cmppd512_mask ((__v8df)(__m512d)(X),	\
+					    (__v8df)(__m512d)(Y), (int)(P),\
+					    (__mmask8)-1, R))
+
+#define _mm512_cmp_round_ps_mask(X, Y, P, R)				\
+  ((__mmask16) __builtin_ia32_cmpps512_mask ((__v16sf)(__m512)(X),	\
+					     (__v16sf)(__m512)(Y), (int)(P),\
+					     (__mmask16)-1, R))
+
+#define _mm512_mask_cmp_epi64_mask(M, X, Y, P)				\
+  ((__mmask8) __builtin_ia32_cmpq512_mask ((__v8di)(__m512i)(X),	\
+					   (__v8di)(__m512i)(Y), (int)(P),\
+					   (__mmask8)(M)))
+
+#define _mm512_mask_cmp_epi32_mask(M, X, Y, P)				\
+  ((__mmask16) __builtin_ia32_cmpd512_mask ((__v16si)(__m512i)(X),	\
+					    (__v16si)(__m512i)(Y), (int)(P), \
+					    (__mmask16)(M)))
+
+#define _mm512_mask_cmp_epu64_mask(M, X, Y, P)				\
+  ((__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di)(__m512i)(X),	\
+					    (__v8di)(__m512i)(Y), (int)(P),\
+					    (__mmask8)(M)))
+
+#define _mm512_mask_cmp_epu32_mask(M, X, Y, P)				\
+  ((__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si)(__m512i)(X),	\
+					     (__v16si)(__m512i)(Y), (int)(P), \
+					     (__mmask16)(M)))
+
+#define _mm512_mask_cmp_round_pd_mask(M, X, Y, P, R)			\
+  ((__mmask8) __builtin_ia32_cmppd512_mask ((__v8df)(__m512d)(X),	\
+					    (__v8df)(__m512d)(Y), (int)(P),\
+					    (__mmask8)(M), R))
+
+#define _mm512_mask_cmp_round_ps_mask(M, X, Y, P, R)			\
+  ((__mmask16) __builtin_ia32_cmpps512_mask ((__v16sf)(__m512)(X),	\
+					     (__v16sf)(__m512)(Y), (int)(P),\
+					     (__mmask16)(M), R))
+
+#endif
+
+#ifdef __OPTIMIZE__
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fnmadd_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U)
+_mm512_i32gather_ps (__m512i __index, void const *__addr, int __scale)
 {
-  return (__m128) __builtin_ia32_vfmaddss3_mask3 ((__v4sf) __W,
-						  -(__v4sf) __A,
-						  (__v4sf) __B,
-						  (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  __m512 __v1_old = _mm512_undefined_ps ();
+  __mmask16 __mask = 0xFFFF;
+
+  return (__m512) __builtin_ia32_gathersiv16sf ((__v16sf) __v1_old,
+						__addr,
+						(__v16si) __index,
+						__mask, __scale);
 }
 
-extern __inline __m128d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fnmadd_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B)
+_mm512_mask_i32gather_ps (__m512 __v1_old, __mmask16 __mask,
+			  __m512i __index, void const *__addr, int __scale)
 {
-  return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
-						   -(__v2df) __A,
-						   (__v2df) __B,
-						   (__mmask8) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_gathersiv16sf ((__v16sf) __v1_old,
+						__addr,
+						(__v16si) __index,
+						__mask, __scale);
 }
 
-extern __inline __m128
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fnmadd_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B)
+_mm512_i32gather_pd (__m256i __index, void const *__addr, int __scale)
 {
-  return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
-						  -(__v4sf) __A,
-						  (__v4sf) __B,
-						  (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  __m512d __v1_old = _mm512_undefined_pd ();
+  __mmask8 __mask = 0xFF;
+
+  return (__m512d) __builtin_ia32_gathersiv8df ((__v8df) __v1_old,
+						__addr,
+						(__v8si) __index, __mask,
+						__scale);
 }
 
-extern __inline __m128d
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fnmsub_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
+_mm512_mask_i32gather_pd (__m512d __v1_old, __mmask8 __mask,
+			  __m256i __index, void const *__addr, int __scale)
 {
-  return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
-						  -(__v2df) __A,
-						  -(__v2df) __B,
-						  (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_gathersiv8df ((__v8df) __v1_old,
+						__addr,
+						(__v8si) __index,
+						__mask, __scale);
 }
 
-extern __inline __m128
+extern __inline __m256
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fnmsub_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
+_mm512_i64gather_ps (__m512i __index, void const *__addr, int __scale)
 {
-  return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
-						 -(__v4sf) __A,
-						 -(__v4sf) __B,
-						 (__mmask8) __U,
-						 _MM_FROUND_CUR_DIRECTION);
+  __m256 __v1_old = _mm256_undefined_ps ();
+  __mmask8 __mask = 0xFF;
+
+  return (__m256) __builtin_ia32_gatherdiv16sf ((__v8sf) __v1_old,
+						__addr,
+						(__v8di) __index, __mask,
+						__scale);
 }
 
-extern __inline __m128d
+extern __inline __m256
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fnmsub_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U)
+_mm512_mask_i64gather_ps (__m256 __v1_old, __mmask8 __mask,
+			  __m512i __index, void const *__addr, int __scale)
 {
-  return (__m128d) __builtin_ia32_vfmsubsd3_mask3 ((__v2df) __W,
-						   -(__v2df) __A,
-						   (__v2df) __B,
-						   (__mmask8) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m256) __builtin_ia32_gatherdiv16sf ((__v8sf) __v1_old,
+						__addr,
+						(__v8di) __index,
+						__mask, __scale);
 }
 
-extern __inline __m128
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fnmsub_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U)
+_mm512_i64gather_pd (__m512i __index, void const *__addr, int __scale)
 {
-  return (__m128) __builtin_ia32_vfmsubss3_mask3 ((__v4sf) __W,
-						  -(__v4sf) __A,
-						  (__v4sf) __B,
-						  (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  __m512d __v1_old = _mm512_undefined_pd ();
+  __mmask8 __mask = 0xFF;
+
+  return (__m512d) __builtin_ia32_gatherdiv8df ((__v8df) __v1_old,
+						__addr,
+						(__v8di) __index, __mask,
+						__scale);
 }
 
-extern __inline __m128d
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fnmsub_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B)
+_mm512_mask_i64gather_pd (__m512d __v1_old, __mmask8 __mask,
+			  __m512i __index, void const *__addr, int __scale)
 {
-  return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
-						   -(__v2df) __A,
-						   -(__v2df) __B,
-						   (__mmask8) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_gatherdiv8df ((__v8df) __v1_old,
+						__addr,
+						(__v8di) __index,
+						__mask, __scale);
 }
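
(A similarly hedged sketch for the qword-index forms, showing the
merge-masking contract; names are illustrative.)

#include <immintrin.h>

/* Lanes whose mask bit is clear keep the matching lane of FALLBACK
   and do not touch memory; set lanes load table[idx[i]].  */
static __m512d
gather_f64_masked (__m512d fallback, __mmask8 m,
		   const double *table, const long long idx[8])
{
  __m512i vindex = _mm512_loadu_si512 (idx);
  return _mm512_mask_i64gather_pd (fallback, m, vindex, table, 8);
}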
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fnmsub_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B)
+_mm512_i32gather_epi32 (__m512i __index, void const *__addr, int __scale)
 {
-  return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
-						  -(__v4sf) __A,
-						  -(__v4sf) __B,
-						  (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  __m512i __v1_old = _mm512_undefined_epi32 ();
+  __mmask16 __mask = 0xFFFF;
+
+  return (__m512i) __builtin_ia32_gathersiv16si ((__v16si) __v1_old,
+						 __addr,
+						 (__v16si) __index,
+						 __mask, __scale);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmadd_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
-			 const int __R)
+_mm512_mask_i32gather_epi32 (__m512i __v1_old, __mmask16 __mask,
+			     __m512i __index, void const *__addr, int __scale)
 {
-  return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
-						  (__v2df) __A,
-						  (__v2df) __B,
-						  (__mmask8) __U, __R);
+  return (__m512i) __builtin_ia32_gathersiv16si ((__v16si) __v1_old,
+						 __addr,
+						 (__v16si) __index,
+						 __mask, __scale);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmadd_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
-			 const int __R)
+_mm512_i32gather_epi64 (__m256i __index, void const *__addr, int __scale)
 {
-  return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
-						 (__v4sf) __A,
-						 (__v4sf) __B,
-						 (__mmask8) __U, __R);
-}
+  __m512i __v1_old = _mm512_undefined_epi32 ();
+  __mmask8 __mask = 0xFF;
 
-extern __inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmadd_round_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U,
-			  const int __R)
-{
-  return (__m128d) __builtin_ia32_vfmaddsd3_mask3 ((__v2df) __W,
-						   (__v2df) __A,
-						   (__v2df) __B,
-						   (__mmask8) __U, __R);
+  return (__m512i) __builtin_ia32_gathersiv8di ((__v8di) __v1_old,
+						__addr,
+						(__v8si) __index, __mask,
+						__scale);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmadd_round_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U,
-			  const int __R)
+_mm512_mask_i32gather_epi64 (__m512i __v1_old, __mmask8 __mask,
+			     __m256i __index, void const *__addr,
+			     int __scale)
 {
-  return (__m128) __builtin_ia32_vfmaddss3_mask3 ((__v4sf) __W,
-						  (__v4sf) __A,
-						  (__v4sf) __B,
-						  (__mmask8) __U, __R);
+  return (__m512i) __builtin_ia32_gathersiv8di ((__v8di) __v1_old,
+						__addr,
+						(__v8si) __index,
+						__mask, __scale);
 }
 
-extern __inline __m128d
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmadd_round_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B,
-			  const int __R)
+_mm512_i64gather_epi32 (__m512i __index, void const *__addr, int __scale)
 {
-  return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
-						   (__v2df) __A,
-						   (__v2df) __B,
-						   (__mmask8) __U, __R);
+  __m256i __v1_old = _mm256_undefined_si256 ();
+  __mmask8 __mask = 0xFF;
+
+  return (__m256i) __builtin_ia32_gatherdiv16si ((__v8si) __v1_old,
+						 __addr,
+						 (__v8di) __index,
+						 __mask, __scale);
 }
 
-extern __inline __m128
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmadd_round_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B,
-			  const int __R)
+_mm512_mask_i64gather_epi32 (__m256i __v1_old, __mmask8 __mask,
+			     __m512i __index, void const *__addr, int __scale)
 {
-  return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
-						  (__v4sf) __A,
-						  (__v4sf) __B,
-						  (__mmask8) __U, __R);
+  return (__m256i) __builtin_ia32_gatherdiv16si ((__v8si) __v1_old,
+						 __addr,
+						 (__v8di) __index,
+						 __mask, __scale);
 }
 
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmsub_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
-			 const int __R)
+_mm512_i64gather_epi64 (__m512i __index, void const *__addr, int __scale)
 {
-  return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
-						  (__v2df) __A,
-						  -(__v2df) __B,
-						  (__mmask8) __U, __R);
+  __m512i __v1_old = _mm512_undefined_epi32 ();
+  __mmask8 __mask = 0xFF;
+
+  return (__m512i) __builtin_ia32_gatherdiv8di ((__v8di) __v1_old,
+						__addr,
+						(__v8di) __index, __mask,
+						__scale);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmsub_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
-			 const int __R)
+_mm512_mask_i64gather_epi64 (__m512i __v1_old, __mmask8 __mask,
+			     __m512i __index, void const *__addr,
+			     int __scale)
 {
-  return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
-						 (__v4sf) __A,
-						 -(__v4sf) __B,
-						 (__mmask8) __U, __R);
+  return (__m512i) __builtin_ia32_gatherdiv8di ((__v8di) __v1_old,
+						__addr,
+						(__v8di) __index,
+						__mask, __scale);
 }
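
(The integer gathers mirror the FP ones exactly; one more small
sketch, same assumptions as above.)

#include <immintrin.h>

/* Eight qword loads through qword indices, scale 8 == sizeof (long long).  */
static __m512i
gather_i64 (const long long *table, const long long idx[8])
{
  return _mm512_i64gather_epi64 (_mm512_loadu_si512 (idx), table, 8);
}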
 
-extern __inline __m128d
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmsub_round_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U,
-			  const int __R)
+_mm512_i32scatter_ps (void *__addr, __m512i __index, __m512 __v1, int __scale)
 {
-  return (__m128d) __builtin_ia32_vfmsubsd3_mask3 ((__v2df) __W,
-						   (__v2df) __A,
-						   (__v2df) __B,
-						   (__mmask8) __U, __R);
+  __builtin_ia32_scattersiv16sf (__addr, (__mmask16) 0xFFFF,
+				 (__v16si) __index, (__v16sf) __v1, __scale);
 }
 
-extern __inline __m128
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmsub_round_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U,
-			  const int __R)
+_mm512_mask_i32scatter_ps (void *__addr, __mmask16 __mask,
+			   __m512i __index, __m512 __v1, int __scale)
 {
-  return (__m128) __builtin_ia32_vfmsubss3_mask3 ((__v4sf) __W,
-						  (__v4sf) __A,
-						  (__v4sf) __B,
-						  (__mmask8) __U, __R);
+  __builtin_ia32_scattersiv16sf (__addr, __mask, (__v16si) __index,
+				 (__v16sf) __v1, __scale);
 }
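
(Sketch of the inverse direction, again only for review orientation.)

#include <immintrin.h>

/* Store v[i] to table[idx[i]] for every set mask bit.  Overlapping
   indices are written in ascending element order, so the highest
   active lane wins.  */
static void
scatter_f32 (float *table, const int idx[16], __m512 v, __mmask16 m)
{
  _mm512_mask_i32scatter_ps (table, m, _mm512_loadu_si512 (idx), v, 4);
}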
 
-extern __inline __m128d
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmsub_round_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B,
-			  const int __R)
+_mm512_i32scatter_pd (void *__addr, __m256i __index, __m512d __v1,
+		      int __scale)
 {
-  return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
-						   (__v2df) __A,
-						   -(__v2df) __B,
-						   (__mmask8) __U, __R);
+  __builtin_ia32_scattersiv8df (__addr, (__mmask8) 0xFF,
+				(__v8si) __index, (__v8df) __v1, __scale);
 }
 
-extern __inline __m128
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmsub_round_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B,
-			  const int __R)
+_mm512_mask_i32scatter_pd (void *__addr, __mmask8 __mask,
+			   __m256i __index, __m512d __v1, int __scale)
 {
-  return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
-						  (__v4sf) __A,
-						  -(__v4sf) __B,
-						  (__mmask8) __U, __R);
+  __builtin_ia32_scattersiv8df (__addr, __mask, (__v8si) __index,
+				(__v8df) __v1, __scale);
 }
 
-extern __inline __m128d
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fnmadd_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
-			 const int __R)
+_mm512_i64scatter_ps (void *__addr, __m512i __index, __m256 __v1, int __scale)
 {
-  return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
-						  -(__v2df) __A,
-						  (__v2df) __B,
-						  (__mmask8) __U, __R);
+  __builtin_ia32_scatterdiv16sf (__addr, (__mmask8) 0xFF,
+				 (__v8di) __index, (__v8sf) __v1, __scale);
 }
 
-extern __inline __m128
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fnmadd_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
-			 const int __R)
+_mm512_mask_i64scatter_ps (void *__addr, __mmask8 __mask,
+			   __m512i __index, __m256 __v1, int __scale)
 {
-  return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
-						 -(__v4sf) __A,
-						 (__v4sf) __B,
-						 (__mmask8) __U, __R);
+  __builtin_ia32_scatterdiv16sf (__addr, __mask, (__v8di) __index,
+				 (__v8sf) __v1, __scale);
 }
 
-extern __inline __m128d
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fnmadd_round_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U,
-			  const int __R)
+_mm512_i64scatter_pd (void *__addr, __m512i __index, __m512d __v1,
+		      int __scale)
 {
-  return (__m128d) __builtin_ia32_vfmaddsd3_mask3 ((__v2df) __W,
-						   -(__v2df) __A,
-						   (__v2df) __B,
-						   (__mmask8) __U, __R);
+  __builtin_ia32_scatterdiv8df (__addr, (__mmask8) 0xFF,
+				(__v8di) __index, (__v8df) __v1, __scale);
 }
 
-extern __inline __m128
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fnmadd_round_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U,
-			  const int __R)
+_mm512_mask_i64scatter_pd (void *__addr, __mmask8 __mask,
+			   __m512i __index, __m512d __v1, int __scale)
 {
-  return (__m128) __builtin_ia32_vfmaddss3_mask3 ((__v4sf) __W,
-						  -(__v4sf) __A,
-						  (__v4sf) __B,
-						  (__mmask8) __U, __R);
+  __builtin_ia32_scatterdiv8df (__addr, __mask, (__v8di) __index,
+				(__v8df) __v1, __scale);
 }
 
-extern __inline __m128d
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fnmadd_round_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B,
-			  const int __R)
+_mm512_i32scatter_epi32 (void *__addr, __m512i __index,
+			 __m512i __v1, int __scale)
 {
-  return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
-						   -(__v2df) __A,
-						   (__v2df) __B,
-						   (__mmask8) __U, __R);
+  __builtin_ia32_scattersiv16si (__addr, (__mmask16) 0xFFFF,
+				 (__v16si) __index, (__v16si) __v1, __scale);
 }
 
-extern __inline __m128
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fnmadd_round_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B,
-			  const int __R)
+_mm512_mask_i32scatter_epi32 (void *__addr, __mmask16 __mask,
+			      __m512i __index, __m512i __v1, int __scale)
 {
-  return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
-						  -(__v4sf) __A,
-						  (__v4sf) __B,
-						  (__mmask8) __U, __R);
+  __builtin_ia32_scattersiv16si (__addr, __mask, (__v16si) __index,
+				 (__v16si) __v1, __scale);
 }
 
-extern __inline __m128d
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fnmsub_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
-			 const int __R)
+_mm512_i32scatter_epi64 (void *__addr, __m256i __index,
+			 __m512i __v1, int __scale)
 {
-  return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
-						  -(__v2df) __A,
-						  -(__v2df) __B,
-						  (__mmask8) __U, __R);
+  __builtin_ia32_scattersiv8di (__addr, (__mmask8) 0xFF,
+				(__v8si) __index, (__v8di) __v1, __scale);
 }
 
-extern __inline __m128
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fnmsub_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
-			 const int __R)
-{
-  return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
-						 -(__v4sf) __A,
-						 -(__v4sf) __B,
-						 (__mmask8) __U, __R);
+_mm512_mask_i32scatter_epi64 (void *__addr, __mmask8 __mask,
+			      __m256i __index, __m512i __v1, int __scale)
+{
+  __builtin_ia32_scattersiv8di (__addr, __mask, (__v8si) __index,
+				(__v8di) __v1, __scale);
 }
 
-extern __inline __m128d
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fnmsub_round_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U,
-			  const int __R)
+_mm512_i64scatter_epi32 (void *__addr, __m512i __index,
+			 __m256i __v1, int __scale)
 {
-  return (__m128d) __builtin_ia32_vfmsubsd3_mask3 ((__v2df) __W,
-						   -(__v2df) __A,
-						   (__v2df) __B,
-						   (__mmask8) __U, __R);
+  __builtin_ia32_scatterdiv16si (__addr, (__mmask8) 0xFF,
+				 (__v8di) __index, (__v8si) __v1, __scale);
 }
 
-extern __inline __m128
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fnmsub_round_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U,
-			  const int __R)
+_mm512_mask_i64scatter_epi32 (void *__addr, __mmask8 __mask,
+			      __m512i __index, __m256i __v1, int __scale)
 {
-  return (__m128) __builtin_ia32_vfmsubss3_mask3 ((__v4sf) __W,
-						  -(__v4sf) __A,
-						  (__v4sf) __B,
-						  (__mmask8) __U, __R);
+  __builtin_ia32_scatterdiv16si (__addr, __mask, (__v8di) __index,
+				 (__v8si) __v1, __scale);
 }
 
-extern __inline __m128d
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fnmsub_round_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B,
-			  const int __R)
+_mm512_i64scatter_epi64 (void *__addr, __m512i __index,
+			 __m512i __v1, int __scale)
 {
-  return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
-						   -(__v2df) __A,
-						   -(__v2df) __B,
-						   (__mmask8) __U, __R);
+  __builtin_ia32_scatterdiv8di (__addr, (__mmask8) 0xFF,
+				(__v8di) __index, (__v8di) __v1, __scale);
 }
 
-extern __inline __m128
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fnmsub_round_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B,
-			  const int __R)
+_mm512_mask_i64scatter_epi64 (void *__addr, __mmask8 __mask,
+			      __m512i __index, __m512i __v1, int __scale)
 {
-  return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
-						  -(__v4sf) __A,
-						  -(__v4sf) __B,
-						  (__mmask8) __U, __R);
+  __builtin_ia32_scatterdiv8di (__addr, __mask, (__v8di) __index,
+				(__v8di) __v1, __scale);
 }
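
(Putting the two halves together, a hypothetical permuted-copy sketch;
no aliasing or bounds checking, purely illustrative.)

#include <immintrin.h>

/* out[dst_idx[i]] = in[src_idx[i]] for eight qwords.  */
static void
permute_copy_i64 (long long *out, const long long *in,
		  const long long src_idx[8], const long long dst_idx[8])
{
  __m512i v = _mm512_i64gather_epi64 (_mm512_loadu_si512 (src_idx), in, 8);
  _mm512_i64scatter_epi64 (out, _mm512_loadu_si512 (dst_idx), v, 8);
}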
 #else
-#define _mm_mask_fmadd_round_sd(A, U, B, C, R)            \
-    (__m128d) __builtin_ia32_vfmaddsd3_mask (A, B, C, U, R)
+#define _mm512_i32gather_ps(INDEX, ADDR, SCALE)				\
+  (__m512) __builtin_ia32_gathersiv16sf ((__v16sf)_mm512_undefined_ps(),\
+					 (void const *) (ADDR),		\
+					 (__v16si)(__m512i) (INDEX),	\
+					 (__mmask16)0xFFFF,		\
+					 (int) (SCALE))
 
-#define _mm_mask_fmadd_round_ss(A, U, B, C, R)            \
-    (__m128) __builtin_ia32_vfmaddss3_mask (A, B, C, U, R)
+#define _mm512_mask_i32gather_ps(V1OLD, MASK, INDEX, ADDR, SCALE)	\
+  (__m512) __builtin_ia32_gathersiv16sf ((__v16sf)(__m512) (V1OLD),	\
+					 (void const *) (ADDR),		\
+					 (__v16si)(__m512i) (INDEX),	\
+					 (__mmask16) (MASK),		\
+					 (int) (SCALE))
 
-#define _mm_mask3_fmadd_round_sd(A, B, C, U, R)            \
-    (__m128d) __builtin_ia32_vfmaddsd3_mask3 (A, B, C, U, R)
+#define _mm512_i32gather_pd(INDEX, ADDR, SCALE)				\
+  (__m512d) __builtin_ia32_gathersiv8df ((__v8df)_mm512_undefined_pd(),	\
+					 (void const *) (ADDR),		\
+					 (__v8si)(__m256i) (INDEX),	\
+					 (__mmask8)0xFF, (int) (SCALE))
 
-#define _mm_mask3_fmadd_round_ss(A, B, C, U, R)            \
-    (__m128) __builtin_ia32_vfmaddss3_mask3 (A, B, C, U, R)
+#define _mm512_mask_i32gather_pd(V1OLD, MASK, INDEX, ADDR, SCALE)	\
+  (__m512d) __builtin_ia32_gathersiv8df ((__v8df)(__m512d) (V1OLD),	\
+					 (void const *) (ADDR),		\
+					 (__v8si)(__m256i) (INDEX),	\
+					 (__mmask8) (MASK),		\
+					 (int) (SCALE))
 
-#define _mm_maskz_fmadd_round_sd(U, A, B, C, R)            \
-    (__m128d) __builtin_ia32_vfmaddsd3_maskz (A, B, C, U, R)
+#define _mm512_i64gather_ps(INDEX, ADDR, SCALE)				\
+  (__m256) __builtin_ia32_gatherdiv16sf ((__v8sf)_mm256_undefined_ps(),	\
+					 (void const *) (ADDR),		\
+					 (__v8di)(__m512i) (INDEX),	\
+					 (__mmask8)0xFF, (int) (SCALE))
 
-#define _mm_maskz_fmadd_round_ss(U, A, B, C, R)            \
-    (__m128) __builtin_ia32_vfmaddss3_maskz (A, B, C, U, R)
+#define _mm512_mask_i64gather_ps(V1OLD, MASK, INDEX, ADDR, SCALE)	\
+  (__m256) __builtin_ia32_gatherdiv16sf ((__v8sf)(__m256) (V1OLD),	\
+					 (void const *) (ADDR),		\
+					 (__v8di)(__m512i) (INDEX),	\
+					 (__mmask8) (MASK),		\
+					 (int) (SCALE))
 
-#define _mm_mask_fmsub_round_sd(A, U, B, C, R)            \
-    (__m128d) __builtin_ia32_vfmaddsd3_mask (A, B, -(C), U, R)
+#define _mm512_i64gather_pd(INDEX, ADDR, SCALE)				\
+  (__m512d) __builtin_ia32_gatherdiv8df ((__v8df)_mm512_undefined_pd(),	\
+					 (void const *) (ADDR),		\
+					 (__v8di)(__m512i) (INDEX),	\
+					 (__mmask8)0xFF, (int) (SCALE))
 
-#define _mm_mask_fmsub_round_ss(A, U, B, C, R)            \
-    (__m128) __builtin_ia32_vfmaddss3_mask (A, B, -(C), U, R)
+#define _mm512_mask_i64gather_pd(V1OLD, MASK, INDEX, ADDR, SCALE)	\
+  (__m512d) __builtin_ia32_gatherdiv8df ((__v8df)(__m512d) (V1OLD),	\
+					 (void const *) (ADDR),		\
+					 (__v8di)(__m512i) (INDEX),	\
+					 (__mmask8) (MASK),		\
+					 (int) (SCALE))
 
-#define _mm_mask3_fmsub_round_sd(A, B, C, U, R)            \
-    (__m128d) __builtin_ia32_vfmsubsd3_mask3 (A, B, C, U, R)
+#define _mm512_i32gather_epi32(INDEX, ADDR, SCALE)			\
+  (__m512i) __builtin_ia32_gathersiv16si ((__v16si)_mm512_undefined_epi32 (),\
+					  (void const *) (ADDR),	\
+					  (__v16si)(__m512i) (INDEX),	\
+					  (__mmask16)0xFFFF,		\
+					  (int) (SCALE))
 
-#define _mm_mask3_fmsub_round_ss(A, B, C, U, R)            \
-    (__m128) __builtin_ia32_vfmsubss3_mask3 (A, B, C, U, R)
+#define _mm512_mask_i32gather_epi32(V1OLD, MASK, INDEX, ADDR, SCALE)	\
+  (__m512i) __builtin_ia32_gathersiv16si ((__v16si)(__m512i) (V1OLD),	\
+					  (void const *) (ADDR),	\
+					  (__v16si)(__m512i) (INDEX),	\
+					  (__mmask16) (MASK),		\
+					  (int) (SCALE))
 
-#define _mm_maskz_fmsub_round_sd(U, A, B, C, R)            \
-    (__m128d) __builtin_ia32_vfmaddsd3_maskz (A, B, -(C), U, R)
+#define _mm512_i32gather_epi64(INDEX, ADDR, SCALE)			\
+  (__m512i) __builtin_ia32_gathersiv8di ((__v8di)_mm512_undefined_epi32 (),\
+					 (void const *) (ADDR),		\
+					 (__v8si)(__m256i) (INDEX),	\
+					 (__mmask8)0xFF, (int) (SCALE))
 
-#define _mm_maskz_fmsub_round_ss(U, A, B, C, R)            \
-    (__m128) __builtin_ia32_vfmaddss3_maskz (A, B, -(C), U, R)
+#define _mm512_mask_i32gather_epi64(V1OLD, MASK, INDEX, ADDR, SCALE)	\
+  (__m512i) __builtin_ia32_gathersiv8di ((__v8di)(__m512i) (V1OLD),	\
+					 (void const *) (ADDR),		\
+					 (__v8si)(__m256i) (INDEX),	\
+					 (__mmask8) (MASK),		\
+					 (int) (SCALE))
 
-#define _mm_mask_fnmadd_round_sd(A, U, B, C, R)            \
-    (__m128d) __builtin_ia32_vfmaddsd3_mask (A, -(B), C, U, R)
+#define _mm512_i64gather_epi32(INDEX, ADDR, SCALE)			   \
+  (__m256i) __builtin_ia32_gatherdiv16si ((__v8si)_mm256_undefined_si256(),\
+					  (void const *) (ADDR),	   \
+					  (__v8di)(__m512i) (INDEX),	   \
+					  (__mmask8)0xFF, (int) (SCALE))
 
-#define _mm_mask_fnmadd_round_ss(A, U, B, C, R)            \
-    (__m128) __builtin_ia32_vfmaddss3_mask (A, -(B), C, U, R)
+#define _mm512_mask_i64gather_epi32(V1OLD, MASK, INDEX, ADDR, SCALE)	\
+  (__m256i) __builtin_ia32_gatherdiv16si ((__v8si)(__m256i) (V1OLD),	\
+					  (void const *) (ADDR),	\
+					  (__v8di)(__m512i) (INDEX),	\
+					  (__mmask8) (MASK),		\
+					  (int) (SCALE))
 
-#define _mm_mask3_fnmadd_round_sd(A, B, C, U, R)            \
-    (__m128d) __builtin_ia32_vfmaddsd3_mask3 (A, -(B), C, U, R)
+#define _mm512_i64gather_epi64(INDEX, ADDR, SCALE)			\
+  (__m512i) __builtin_ia32_gatherdiv8di ((__v8di)_mm512_undefined_epi32 (),\
+					 (void const *) (ADDR),		\
+					 (__v8di)(__m512i) (INDEX),	\
+					 (__mmask8)0xFF, (int) (SCALE))
 
-#define _mm_mask3_fnmadd_round_ss(A, B, C, U, R)            \
-    (__m128) __builtin_ia32_vfmaddss3_mask3 (A, -(B), C, U, R)
+#define _mm512_mask_i64gather_epi64(V1OLD, MASK, INDEX, ADDR, SCALE)	\
+  (__m512i) __builtin_ia32_gatherdiv8di ((__v8di)(__m512i) (V1OLD),	\
+					 (void const *) (ADDR),		\
+					 (__v8di)(__m512i) (INDEX),	\
+					 (__mmask8) (MASK),		\
+					 (int) (SCALE))
 
-#define _mm_maskz_fnmadd_round_sd(U, A, B, C, R)            \
-    (__m128d) __builtin_ia32_vfmaddsd3_maskz (A, -(B), C, U, R)
+#define _mm512_i32scatter_ps(ADDR, INDEX, V1, SCALE)			\
+  __builtin_ia32_scattersiv16sf ((void *) (ADDR), (__mmask16)0xFFFF,	\
+				 (__v16si)(__m512i) (INDEX),		\
+				 (__v16sf)(__m512) (V1), (int) (SCALE))
 
-#define _mm_maskz_fnmadd_round_ss(U, A, B, C, R)            \
-    (__m128) __builtin_ia32_vfmaddss3_maskz (A, -(B), C, U, R)
+#define _mm512_mask_i32scatter_ps(ADDR, MASK, INDEX, V1, SCALE)		\
+  __builtin_ia32_scattersiv16sf ((void *) (ADDR), (__mmask16) (MASK),	\
+				 (__v16si)(__m512i) (INDEX),		\
+				 (__v16sf)(__m512) (V1), (int) (SCALE))
 
-#define _mm_mask_fnmsub_round_sd(A, U, B, C, R)            \
-    (__m128d) __builtin_ia32_vfmaddsd3_mask (A, -(B), -(C), U, R)
+#define _mm512_i32scatter_pd(ADDR, INDEX, V1, SCALE)			\
+  __builtin_ia32_scattersiv8df ((void *) (ADDR), (__mmask8)0xFF,	\
+				(__v8si)(__m256i) (INDEX),		\
+				(__v8df)(__m512d) (V1), (int) (SCALE))
 
-#define _mm_mask_fnmsub_round_ss(A, U, B, C, R)            \
-    (__m128) __builtin_ia32_vfmaddss3_mask (A, -(B), -(C), U, R)
+#define _mm512_mask_i32scatter_pd(ADDR, MASK, INDEX, V1, SCALE)		\
+  __builtin_ia32_scattersiv8df ((void *) (ADDR), (__mmask8) (MASK),	\
+				(__v8si)(__m256i) (INDEX),		\
+				(__v8df)(__m512d) (V1), (int) (SCALE))
+
+#define _mm512_i64scatter_ps(ADDR, INDEX, V1, SCALE)			\
+  __builtin_ia32_scatterdiv16sf ((void *) (ADDR), (__mmask8)0xFF,	\
+				 (__v8di)(__m512i) (INDEX),		\
+				 (__v8sf)(__m256) (V1), (int) (SCALE))
+
+#define _mm512_mask_i64scatter_ps(ADDR, MASK, INDEX, V1, SCALE)		\
+  __builtin_ia32_scatterdiv16sf ((void *) (ADDR), (__mmask16) (MASK),	\
+				 (__v8di)(__m512i) (INDEX),		\
+				 (__v8sf)(__m256) (V1), (int) (SCALE))
+
+#define _mm512_i64scatter_pd(ADDR, INDEX, V1, SCALE)			\
+  __builtin_ia32_scatterdiv8df ((void *) (ADDR), (__mmask8)0xFF,	\
+				(__v8di)(__m512i) (INDEX),		\
+				(__v8df)(__m512d) (V1), (int) (SCALE))
+
+#define _mm512_mask_i64scatter_pd(ADDR, MASK, INDEX, V1, SCALE)		\
+  __builtin_ia32_scatterdiv8df ((void *) (ADDR), (__mmask8) (MASK),	\
+				(__v8di)(__m512i) (INDEX),		\
+				(__v8df)(__m512d) (V1), (int) (SCALE))
+
+#define _mm512_i32scatter_epi32(ADDR, INDEX, V1, SCALE)			\
+  __builtin_ia32_scattersiv16si ((void *) (ADDR), (__mmask16)0xFFFF,	\
+				 (__v16si)(__m512i) (INDEX),		\
+				 (__v16si)(__m512i) (V1), (int) (SCALE))
+
+#define _mm512_mask_i32scatter_epi32(ADDR, MASK, INDEX, V1, SCALE)	\
+  __builtin_ia32_scattersiv16si ((void *) (ADDR), (__mmask16) (MASK),	\
+				 (__v16si)(__m512i) (INDEX),		\
+				 (__v16si)(__m512i) (V1), (int) (SCALE))
+
+#define _mm512_i32scatter_epi64(ADDR, INDEX, V1, SCALE)			\
+  __builtin_ia32_scattersiv8di ((void *) (ADDR), (__mmask8)0xFF,	\
+				(__v8si)(__m256i) (INDEX),		\
+				(__v8di)(__m512i) (V1), (int) (SCALE))
+
+#define _mm512_mask_i32scatter_epi64(ADDR, MASK, INDEX, V1, SCALE)	\
+  __builtin_ia32_scattersiv8di ((void *) (ADDR), (__mmask8) (MASK),	\
+				(__v8si)(__m256i) (INDEX),		\
+				(__v8di)(__m512i) (V1), (int) (SCALE))
+
+#define _mm512_i64scatter_epi32(ADDR, INDEX, V1, SCALE)			\
+  __builtin_ia32_scatterdiv16si ((void *) (ADDR), (__mmask8)0xFF,	\
+				 (__v8di)(__m512i) (INDEX),		\
+				 (__v8si)(__m256i) (V1), (int) (SCALE))
+
+#define _mm512_mask_i64scatter_epi32(ADDR, MASK, INDEX, V1, SCALE)	\
+  __builtin_ia32_scatterdiv16si ((void *) (ADDR), (__mmask8) (MASK),	\
+				 (__v8di)(__m512i) (INDEX),		\
+				 (__v8si)(__m256i) (V1), (int) (SCALE))
+
+#define _mm512_i64scatter_epi64(ADDR, INDEX, V1, SCALE)			\
+  __builtin_ia32_scatterdiv8di ((void *) (ADDR), (__mmask8)0xFF,	\
+				(__v8di)(__m512i) (INDEX),		\
+				(__v8di)(__m512i) (V1), (int) (SCALE))
+
+#define _mm512_mask_i64scatter_epi64(ADDR, MASK, INDEX, V1, SCALE)	\
+  __builtin_ia32_scatterdiv8di ((void *) (ADDR), (__mmask8) (MASK),	\
+				(__v8di)(__m512i) (INDEX),		\
+				(__v8di)(__m512i) (V1), (int) (SCALE))
+#endif
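
(Why the __OPTIMIZE__ split exists: the gather/scatter builtins demand
a literal scale, which the inline functions only provide once the
compiler folds constants; without optimization the macros pass the
literal through directly.  A hedged illustration:)

#include <immintrin.h>

/* Fine: 4 reaches the builtin as an immediate in either variant.  */
static __m512
gather_ok (const float *p, __m512i i)
{
  return _mm512_i32gather_ps (i, p, 4);
}

/* Rejected in either variant, since a runtime scale can never become
   the required immediate:
     return _mm512_i32gather_ps (i, p, s);  */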
 
-#define _mm_mask3_fnmsub_round_sd(A, B, C, U, R)            \
-    (__m128d) __builtin_ia32_vfmsubsd3_mask3 (A, -(B), C, U, R)
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_compress_pd (__m512d __W, __mmask8 __U, __m512d __A)
+{
+  return (__m512d) __builtin_ia32_compressdf512_mask ((__v8df) __A,
+						      (__v8df) __W,
+						      (__mmask8) __U);
+}
 
-#define _mm_mask3_fnmsub_round_ss(A, B, C, U, R)            \
-    (__m128) __builtin_ia32_vfmsubss3_mask3 (A, -(B), C, U, R)
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_compress_pd (__mmask8 __U, __m512d __A)
+{
+  return (__m512d) __builtin_ia32_compressdf512_mask ((__v8df) __A,
+						      (__v8df)
+						      _mm512_setzero_pd (),
+						      (__mmask8) __U);
+}
 
-#define _mm_maskz_fnmsub_round_sd(U, A, B, C, R)            \
-    (__m128d) __builtin_ia32_vfmaddsd3_maskz (A, -(B), -(C), U, R)
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_compressstoreu_pd (void *__P, __mmask8 __U, __m512d __A)
+{
+  __builtin_ia32_compressstoredf512_mask ((__v8df *) __P, (__v8df) __A,
+					  (__mmask8) __U);
+}
 
-#define _mm_maskz_fnmsub_round_ss(U, A, B, C, R)            \
-    (__m128) __builtin_ia32_vfmaddss3_maskz (A, -(B), -(C), U, R)
-#endif
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_compress_ps (__m512 __W, __mmask16 __U, __m512 __A)
+{
+  return (__m512) __builtin_ia32_compresssf512_mask ((__v16sf) __A,
+						     (__v16sf) __W,
+						     (__mmask16) __U);
+}
 
-#ifdef __OPTIMIZE__
-extern __inline int
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comi_round_ss (__m128 __A, __m128 __B, const int __P, const int __R)
+_mm512_maskz_compress_ps (__mmask16 __U, __m512 __A)
 {
-  return __builtin_ia32_vcomiss ((__v4sf) __A, (__v4sf) __B, __P, __R);
+  return (__m512) __builtin_ia32_compresssf512_mask ((__v16sf) __A,
+						     (__v16sf)
+						     _mm512_setzero_ps (),
+						     (__mmask16) __U);
 }
 
-extern __inline int
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comi_round_sd (__m128d __A, __m128d __B, const int __P, const int __R)
+_mm512_mask_compressstoreu_ps (void *__P, __mmask16 __U, __m512 __A)
 {
-  return __builtin_ia32_vcomisd ((__v2df) __A, (__v2df) __B, __P, __R);
+  __builtin_ia32_compressstoresf512_mask ((__v16sf *) __P, (__v16sf) __A,
+					  (__mmask16) __U);
 }
-#else
-#define _mm_comi_round_ss(A, B, C, D)\
-__builtin_ia32_vcomiss(A, B, C, D)
-#define _mm_comi_round_sd(A, B, C, D)\
-__builtin_ia32_vcomisd(A, B, C, D)
-#endif
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sqrt_pd (__m512d __A)
+_mm512_mask_compress_epi64 (__m512i __W, __mmask8 __U, __m512i __A)
 {
-  return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A,
-						  (__v8df)
-						  _mm512_undefined_pd (),
-						  (__mmask8) -1,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_compressdi512_mask ((__v8di) __A,
+						      (__v8di) __W,
+						      (__mmask8) __U);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sqrt_pd (__m512d __W, __mmask8 __U, __m512d __A)
+_mm512_maskz_compress_epi64 (__mmask8 __U, __m512i __A)
 {
-  return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A,
-						  (__v8df) __W,
-						  (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_compressdi512_mask ((__v8di) __A,
+						      (__v8di)
+						      _mm512_setzero_si512 (),
+						      (__mmask8) __U);
 }
 
-extern __inline __m512d
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sqrt_pd (__mmask8 __U, __m512d __A)
+_mm512_mask_compressstoreu_epi64 (void *__P, __mmask8 __U, __m512i __A)
 {
-  return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A,
-						  (__v8df)
-						  _mm512_setzero_pd (),
-						  (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  __builtin_ia32_compressstoredi512_mask ((__v8di *) __P, (__v8di) __A,
+					  (__mmask8) __U);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sqrt_ps (__m512 __A)
+_mm512_mask_compress_epi32 (__m512i __W, __mmask16 __U, __m512i __A)
 {
-  return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A,
-						 (__v16sf)
-						 _mm512_undefined_ps (),
-						 (__mmask16) -1,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_compresssi512_mask ((__v16si) __A,
+						      (__v16si) __W,
+						      (__mmask16) __U);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sqrt_ps (__m512 __W, __mmask16 __U, __m512 __A)
+_mm512_maskz_compress_epi32 (__mmask16 __U, __m512i __A)
 {
-  return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A,
-						 (__v16sf) __W,
-						 (__mmask16) __U,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_compresssi512_mask ((__v16si) __A,
+						      (__v16si)
+						      _mm512_setzero_si512 (),
+						      (__mmask16) __U);
 }
 
-extern __inline __m512
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sqrt_ps (__mmask16 __U, __m512 __A)
+_mm512_mask_compressstoreu_epi32 (void *__P, __mmask16 __U, __m512i __A)
 {
-  return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A,
-						 (__v16sf)
-						 _mm512_setzero_ps (),
-						 (__mmask16) __U,
-						 _MM_FROUND_CUR_DIRECTION);
+  __builtin_ia32_compressstoresi512_mask ((__v16si *) __P, (__v16si) __A,
+					  (__mmask16) __U);
 }
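
(The compress forms are the usual stream-compaction primitive; a short
sketch, with made-up names, of left-packing the positive lanes.)

#include <immintrin.h>

/* Store the ints in V that are > 0 contiguously at DST and return
   how many were written.  */
static int
keep_positive (int *dst, __m512i v)
{
  __mmask16 m = _mm512_cmpgt_epi32_mask (v, _mm512_setzero_si512 ());
  _mm512_mask_compressstoreu_epi32 (dst, m, v);
  return __builtin_popcount ((unsigned) m);
}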
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_add_pd (__m512d __A, __m512d __B)
+_mm512_mask_expand_pd (__m512d __W, __mmask8 __U, __m512d __A)
 {
-  return (__m512d) ((__v8df)__A + (__v8df)__B);
+  return (__m512d) __builtin_ia32_expanddf512_mask ((__v8df) __A,
+						    (__v8df) __W,
+						    (__mmask8) __U);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_add_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
+_mm512_maskz_expand_pd (__mmask8 __U, __m512d __A)
 {
-  return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df) __W,
-						 (__mmask8) __U,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_expanddf512_maskz ((__v8df) __A,
+						     (__v8df)
+						     _mm512_setzero_pd (),
+						     (__mmask8) __U);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_add_pd (__mmask8 __U, __m512d __A, __m512d __B)
+_mm512_mask_expandloadu_pd (__m512d __W, __mmask8 __U, void const *__P)
 {
-  return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df)
-						 _mm512_setzero_pd (),
-						 (__mmask8) __U,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_expandloaddf512_mask ((const __v8df *) __P,
+							(__v8df) __W,
+							(__mmask8) __U);
 }
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_add_ps (__m512 __A, __m512 __B)
+_mm512_maskz_expandloadu_pd (__mmask8 __U, void const *__P)
 {
-  return (__m512) ((__v16sf)__A + (__v16sf)__B);
+  return (__m512d) __builtin_ia32_expandloaddf512_maskz ((const __v8df *) __P,
+							 (__v8df)
+							 _mm512_setzero_pd (),
+							 (__mmask8) __U);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_add_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
+_mm512_mask_expand_ps (__m512 __W, __mmask16 __U, __m512 __A)
 {
-  return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf) __W,
-						(__mmask16) __U,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_expandsf512_mask ((__v16sf) __A,
+						   (__v16sf) __W,
+						   (__mmask16) __U);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_add_ps (__mmask16 __U, __m512 __A, __m512 __B)
+_mm512_maskz_expand_ps (__mmask16 __U, __m512 __A)
 {
-  return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf)
-						_mm512_setzero_ps (),
-						(__mmask16) __U,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_expandsf512_maskz ((__v16sf) __A,
+						    (__v16sf)
+						    _mm512_setzero_ps (),
+						    (__mmask16) __U);
 }
 
-extern __inline __m128d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_add_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
+_mm512_mask_expandloadu_ps (__m512 __W, __mmask16 __U, void const *__P)
 {
-  return (__m128d) __builtin_ia32_addsd_mask_round ((__v2df) __A,
-						(__v2df) __B,
-						(__v2df) __W,
-						(__mmask8) __U,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_expandloadsf512_mask ((const __v16sf *) __P,
+						       (__v16sf) __W,
+						       (__mmask16) __U);
 }
 
-extern __inline __m128d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_add_sd (__mmask8 __U, __m128d __A, __m128d __B)
+_mm512_maskz_expandloadu_ps (__mmask16 __U, void const *__P)
 {
-  return (__m128d) __builtin_ia32_addsd_mask_round ((__v2df) __A,
-						(__v2df) __B,
-						(__v2df)
-						_mm_setzero_pd (),
-						(__mmask8) __U,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_expandloadsf512_maskz ((const __v16sf *) __P,
+							(__v16sf)
+							_mm512_setzero_ps (),
+							(__mmask16) __U);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_add_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
+_mm512_mask_expand_epi64 (__m512i __W, __mmask8 __U, __m512i __A)
 {
-  return (__m128) __builtin_ia32_addss_mask_round ((__v4sf) __A,
-						(__v4sf) __B,
-						(__v4sf) __W,
-						(__mmask8) __U,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_expanddi512_mask ((__v8di) __A,
+						    (__v8di) __W,
+						    (__mmask8) __U);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_add_ss (__mmask8 __U, __m128 __A, __m128 __B)
+_mm512_maskz_expand_epi64 (__mmask8 __U, __m512i __A)
 {
-  return (__m128) __builtin_ia32_addss_mask_round ((__v4sf) __A,
-						(__v4sf) __B,
-						(__v4sf)
-						_mm_setzero_ps (),
-						(__mmask8) __U,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_expanddi512_maskz ((__v8di) __A,
+						     (__v8di)
+						     _mm512_setzero_si512 (),
+						     (__mmask8) __U);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sub_pd (__m512d __A, __m512d __B)
+_mm512_mask_expandloadu_epi64 (__m512i __W, __mmask8 __U, void const *__P)
 {
-  return (__m512d) ((__v8df)__A - (__v8df)__B);
+  return (__m512i) __builtin_ia32_expandloaddi512_mask ((const __v8di *) __P,
+							(__v8di) __W,
+							(__mmask8) __U);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sub_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
+_mm512_maskz_expandloadu_epi64 (__mmask8 __U, void const *__P)
 {
-  return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df) __W,
-						 (__mmask8) __U,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__m512i)
+	 __builtin_ia32_expandloaddi512_maskz ((const __v8di *) __P,
+					       (__v8di)
+					       _mm512_setzero_si512 (),
+					       (__mmask8) __U);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sub_pd (__mmask8 __U, __m512d __A, __m512d __B)
+_mm512_mask_expand_epi32 (__m512i __W, __mmask16 __U, __m512i __A)
 {
-  return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df)
-						 _mm512_setzero_pd (),
-						 (__mmask8) __U,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_expandsi512_mask ((__v16si) __A,
+						    (__v16si) __W,
+						    (__mmask16) __U);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sub_ps (__m512 __A, __m512 __B)
+_mm512_maskz_expand_epi32 (__mmask16 __U, __m512i __A)
 {
-  return (__m512) ((__v16sf)__A - (__v16sf)__B);
+  return (__m512i) __builtin_ia32_expandsi512_maskz ((__v16si) __A,
+						     (__v16si)
+						     _mm512_setzero_si512 (),
+						     (__mmask16) __U);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sub_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
+_mm512_mask_expandloadu_epi32 (__m512i __W, __mmask16 __U, void const *__P)
 {
-  return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf) __W,
-						(__mmask16) __U,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_expandloadsi512_mask ((const __v16si *) __P,
+							(__v16si) __W,
+							(__mmask16) __U);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sub_ps (__mmask16 __U, __m512 __A, __m512 __B)
+_mm512_maskz_expandloadu_epi32 (__mmask16 __U, void const *__P)
 {
-  return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf)
-						_mm512_setzero_ps (),
-						(__mmask16) __U,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_expandloadsi512_maskz ((const __v16si *) __P,
+							 (__v16si)
+							 _mm512_setzero_si512
+							 (), (__mmask16) __U);
 }
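
(Expand is the exact inverse of compress; a sketch under the same
assumptions as the compaction example above.)

#include <immintrin.h>

/* Read popcount (m) contiguous ints from SRC and spread them into the
   lanes selected by M; unselected lanes come from FILL.  */
static __m512i
expand_i32 (__m512i fill, __mmask16 m, const int *src)
{
  return _mm512_mask_expandloadu_epi32 (fill, m, src);
}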
 
-extern __inline __m128d
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_sub_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
+_mm512_kand (__mmask16 __A, __mmask16 __B)
 {
-  return (__m128d) __builtin_ia32_subsd_mask_round ((__v2df) __A,
-						(__v2df) __B,
-						(__v2df) __W,
-						(__mmask8) __U,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__mmask16) __builtin_ia32_kandhi ((__mmask16) __A, (__mmask16) __B);
 }
 
-extern __inline __m128d
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_sub_sd (__mmask8 __U, __m128d __A, __m128d __B)
+_mm512_kandn (__mmask16 __A, __mmask16 __B)
 {
-  return (__m128d) __builtin_ia32_subsd_mask_round ((__v2df) __A,
-						(__v2df) __B,
-						(__v2df)
-						_mm_setzero_pd (),
-						(__mmask8) __U,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__mmask16) __builtin_ia32_kandnhi ((__mmask16) __A,
+					     (__mmask16) __B);
 }
 
-extern __inline __m128
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_sub_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
+_mm512_kor (__mmask16 __A, __mmask16 __B)
 {
-  return (__m128) __builtin_ia32_subss_mask_round ((__v4sf) __A,
-						(__v4sf) __B,
-						(__v4sf) __W,
-						(__mmask8) __U,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__mmask16) __builtin_ia32_korhi ((__mmask16) __A, (__mmask16) __B);
 }
 
-extern __inline __m128
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_sub_ss (__mmask8 __U, __m128 __A, __m128 __B)
+_mm512_kortestz (__mmask16 __A, __mmask16 __B)
 {
-  return (__m128) __builtin_ia32_subss_mask_round ((__v4sf) __A,
-						(__v4sf) __B,
-						(__v4sf)
-						_mm_setzero_ps (),
-						(__mmask8) __U,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__mmask16) __builtin_ia32_kortestzhi ((__mmask16) __A,
+						(__mmask16) __B);
 }
 
-extern __inline __m512d
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mul_pd (__m512d __A, __m512d __B)
+_mm512_kortestc (__mmask16 __A, __mmask16 __B)
 {
-  return (__m512d) ((__v8df)__A * (__v8df)__B);
+  return (__mmask16) __builtin_ia32_kortestchi ((__mmask16) __A,
+						(__mmask16) __B);
 }
 
-extern __inline __m512d
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mul_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
+_mm512_kxnor (__mmask16 __A, __mmask16 __B)
 {
-  return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df) __W,
-						 (__mmask8) __U,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__mmask16) __builtin_ia32_kxnorhi ((__mmask16) __A, (__mmask16) __B);
 }
 
-extern __inline __m512d
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mul_pd (__mmask8 __U, __m512d __A, __m512d __B)
+_mm512_kxor (__mmask16 __A, __mmask16 __B)
 {
-  return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df)
-						 _mm512_setzero_pd (),
-						 (__mmask8) __U,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__mmask16) __builtin_ia32_kxorhi ((__mmask16) __A, (__mmask16) __B);
 }
 
-extern __inline __m512
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mul_ps (__m512 __A, __m512 __B)
+_mm512_knot (__mmask16 __A)
 {
-  return (__m512) ((__v16sf)__A * (__v16sf)__B);
+  return (__mmask16) __builtin_ia32_knothi ((__mmask16) __A);
 }
 
-extern __inline __m512
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mul_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
+_mm512_kunpackb (__mmask16 __A, __mmask16 __B)
 {
-  return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf) __W,
-						(__mmask16) __U,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__mmask16) __builtin_ia32_kunpckhi ((__mmask16) __A, (__mmask16) __B);
 }
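
(Typical use of the mask ops above: kortest as a loop-exit test.
Illustrative only; _mm512_kortestz returns 1 iff (A | B) == 0.)

#include <immintrin.h>

/* Bump every still-active lane by 1 until all lanes reach 100.  */
static void
step_until_done (__m512 *x)
{
  __mmask16 active = (__mmask16) -1;
  while (!_mm512_kortestz (active, active))
    {
      *x = _mm512_mask_add_ps (*x, active, *x, _mm512_set1_ps (1.0f));
      active = _mm512_cmp_ps_mask (*x, _mm512_set1_ps (100.0f), _CMP_LT_OS);
    }
}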
 
-extern __inline __m512
+#ifdef __OPTIMIZE__
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mul_ps (__mmask16 __U, __m512 __A, __m512 __B)
+_mm512_maskz_inserti32x4 (__mmask16 __B, __m512i __C, __m128i __D,
+			  const int __imm)
 {
-  return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf)
-						_mm512_setzero_ps (),
-						(__mmask16) __U,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_inserti32x4_mask ((__v16si) __C,
+						    (__v4si) __D,
+						    __imm,
+						    (__v16si)
+						    _mm512_setzero_si512 (),
+						    __B);
 }
 
-extern __inline __m128d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_mul_sd (__m128d __W, __mmask8 __U, __m128d __A,
-			  __m128d __B)
+_mm512_maskz_insertf32x4 (__mmask16 __B, __m512 __C, __m128 __D,
+			  const int __imm)
 {
-  return (__m128d) __builtin_ia32_mulsd_mask_round ((__v2df) __A,
-						 (__v2df) __B,
-						 (__v2df) __W,
-						 (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_insertf32x4_mask ((__v16sf) __C,
+						   (__v4sf) __D,
+						   __imm,
+						   (__v16sf)
+						   _mm512_setzero_ps (), __B);
 }
 
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_mul_sd (__mmask8 __U, __m128d __A, __m128d __B)
+_mm512_mask_inserti32x4 (__m512i __A, __mmask16 __B, __m512i __C,
+			 __m128i __D, const int __imm)
 {
-  return (__m128d) __builtin_ia32_mulsd_mask_round ((__v2df) __A,
-						 (__v2df) __B,
-						 (__v2df)
-						 _mm_setzero_pd (),
-						 (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_inserti32x4_mask ((__v16si) __C,
+						    (__v4si) __D,
+						    __imm,
+						    (__v16si) __A,
+						    __B);
 }
 
-extern __inline __m128
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_mul_ss (__m128 __W, __mmask8 __U, __m128 __A,
-			  __m128 __B)
+_mm512_mask_insertf32x4 (__m512 __A, __mmask16 __B, __m512 __C,
+			 __m128 __D, const int __imm)
 {
-  return (__m128) __builtin_ia32_mulss_mask_round ((__v4sf) __A,
-						 (__v4sf) __B,
-						 (__v4sf) __W,
-						 (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_insertf32x4_mask ((__v16sf) __C,
+						   (__v4sf) __D,
+						   __imm,
+						   (__v16sf) __A, __B);
 }
+#else
+#define _mm512_maskz_insertf32x4(A, X, Y, C)                            \
+  ((__m512) __builtin_ia32_insertf32x4_mask ((__v16sf)(__m512) (X),     \
+    (__v4sf)(__m128) (Y), (int) (C), (__v16sf)_mm512_setzero_ps(),      \
+    (__mmask16)(A)))
 
-extern __inline __m128
+#define _mm512_maskz_inserti32x4(A, X, Y, C)                            \
+  ((__m512i) __builtin_ia32_inserti32x4_mask ((__v16si)(__m512i) (X),   \
+    (__v4si)(__m128i) (Y), (int) (C), (__v16si)_mm512_setzero_si512 (),     \
+    (__mmask16)(A)))
+
+#define _mm512_mask_insertf32x4(A, B, X, Y, C)                          \
+  ((__m512) __builtin_ia32_insertf32x4_mask ((__v16sf)(__m512) (X),     \
+    (__v4sf)(__m128) (Y), (int) (C), (__v16sf)(__m512) (A),             \
+					     (__mmask16)(B)))
+
+#define _mm512_mask_inserti32x4(A, B, X, Y, C)                          \
+  ((__m512i) __builtin_ia32_inserti32x4_mask ((__v16si)(__m512i) (X),   \
+    (__v4si)(__m128i) (Y), (int) (C), (__v16si)(__m512i) (A),           \
+					      (__mmask16)(B)))
+#endif
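
(The insert lane selector is likewise immediate-only, hence the same
__OPTIMIZE__ split; a small sketch, with a full mask standing in for
a plain insert.)

#include <immintrin.h>

/* Replace 128-bit lane 2 (bits 383:256) of V with PART.  */
static __m512
put_lane2 (__m512 v, __m128 part)
{
  return _mm512_mask_insertf32x4 (v, (__mmask16) -1, v, part, 2);
}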
+
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_mul_ss (__mmask8 __U, __m128 __A, __m128 __B)
+_mm512_max_epi64 (__m512i __A, __m512i __B)
 {
-  return (__m128) __builtin_ia32_mulss_mask_round ((__v4sf) __A,
-						 (__v4sf) __B,
-						 (__v4sf)
-						 _mm_setzero_ps (),
-						 (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_pmaxsq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di)
+						  _mm512_undefined_epi32 (),
+						  (__mmask8) -1);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_div_pd (__m512d __M, __m512d __V)
+_mm512_maskz_max_epi64 (__mmask8 __M, __m512i __A, __m512i __B)
 {
-  return (__m512d) ((__v8df)__M / (__v8df)__V);
+  return (__m512i) __builtin_ia32_pmaxsq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di)
+						  _mm512_setzero_si512 (),
+						  __M);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_div_pd (__m512d __W, __mmask8 __U, __m512d __M, __m512d __V)
+_mm512_mask_max_epi64 (__m512i __W, __mmask8 __M, __m512i __A, __m512i __B)
 {
-  return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M,
-						 (__v8df) __V,
-						 (__v8df) __W,
-						 (__mmask8) __U,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_pmaxsq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di) __W, __M);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_div_pd (__mmask8 __U, __m512d __M, __m512d __V)
+_mm512_min_epi64 (__m512i __A, __m512i __B)
 {
-  return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M,
-						 (__v8df) __V,
-						 (__v8df)
-						 _mm512_setzero_pd (),
-						 (__mmask8) __U,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_pminsq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di)
+						  _mm512_undefined_epi32 (),
+						  (__mmask8) -1);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_div_ps (__m512 __A, __m512 __B)
+_mm512_mask_min_epi64 (__m512i __W, __mmask8 __M, __m512i __A, __m512i __B)
 {
-  return (__m512) ((__v16sf)__A / (__v16sf)__B);
+  return (__m512i) __builtin_ia32_pminsq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di) __W, __M);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_div_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
+_mm512_maskz_min_epi64 (__mmask8 __M, __m512i __A, __m512i __B)
 {
-  return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf) __W,
-						(__mmask16) __U,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_pminsq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di)
+						  _mm512_setzero_si512 (),
+						  __M);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_div_ps (__mmask16 __U, __m512 __A, __m512 __B)
+_mm512_max_epu64 (__m512i __A, __m512i __B)
 {
-  return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf)
-						_mm512_setzero_ps (),
-						(__mmask16) __U,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_pmaxuq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di)
+						  _mm512_undefined_epi32 (),
+						  (__mmask8) -1);
 }
 
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_div_sd (__m128d __W, __mmask8 __U, __m128d __A,
-			  __m128d __B)
+_mm512_maskz_max_epu64 (__mmask8 __M, __m512i __A, __m512i __B)
 {
-  return (__m128d) __builtin_ia32_divsd_mask_round ((__v2df) __A,
-						 (__v2df) __B,
-						 (__v2df) __W,
-						 (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_pmaxuq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di)
+						  _mm512_setzero_si512 (),
+						  __M);
 }
 
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_div_sd (__mmask8 __U, __m128d __A, __m128d __B)
+_mm512_mask_max_epu64 (__m512i __W, __mmask8 __M, __m512i __A, __m512i __B)
 {
-  return (__m128d) __builtin_ia32_divsd_mask_round ((__v2df) __A,
-						 (__v2df) __B,
-						 (__v2df)
-						 _mm_setzero_pd (),
-						 (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_pmaxuq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di) __W, __M);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_div_ss (__m128 __W, __mmask8 __U, __m128 __A,
-			  __m128 __B)
+_mm512_min_epu64 (__m512i __A, __m512i __B)
 {
-  return (__m128) __builtin_ia32_divss_mask_round ((__v4sf) __A,
-						 (__v4sf) __B,
-						 (__v4sf) __W,
-						 (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_pminuq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di)
+						  _mm512_undefined_epi32 (),
+						  (__mmask8) -1);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_div_ss (__mmask8 __U, __m128 __A, __m128 __B)
+_mm512_mask_min_epu64 (__m512i __W, __mmask8 __M, __m512i __A, __m512i __B)
 {
-  return (__m128) __builtin_ia32_divss_mask_round ((__v4sf) __A,
-						 (__v4sf) __B,
-						 (__v4sf)
-						 _mm_setzero_ps (),
-						 (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_pminuq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di) __W, __M);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_max_pd (__m512d __A, __m512d __B)
+_mm512_maskz_min_epu64 (__mmask8 __M, __m512i __A, __m512i __B)
 {
-  return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df)
-						 _mm512_undefined_pd (),
-						 (__mmask8) -1,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_pminuq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di)
+						  _mm512_setzero_si512 (),
+						  __M);
 }
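
(AVX512F is the first ISA level with native 64-bit integer min/max; a
closing sketch, e.g. an unsigned clamp of eight qwords into [lo, hi].)

#include <immintrin.h>

static __m512i
clamp_u64 (__m512i v, __m512i lo, __m512i hi)
{
  return _mm512_min_epu64 (_mm512_max_epu64 (v, lo), hi);
}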
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_max_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
+_mm512_max_epi32 (__m512i __A, __m512i __B)
 {
-  return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df) __W,
-						 (__mmask8) __U,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_pmaxsd512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si)
+						  _mm512_undefined_epi32 (),
+						  (__mmask16) -1);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_max_pd (__mmask8 __U, __m512d __A, __m512d __B)
+_mm512_maskz_max_epi32 (__mmask16 __M, __m512i __A, __m512i __B)
 {
-  return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df)
-						 _mm512_setzero_pd (),
-						 (__mmask8) __U,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_pmaxsd512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si)
+						  _mm512_setzero_si512 (),
+						  __M);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_max_ps (__m512 __A, __m512 __B)
+_mm512_mask_max_epi32 (__m512i __W, __mmask16 __M, __m512i __A, __m512i __B)
 {
-  return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf)
-						_mm512_undefined_ps (),
-						(__mmask16) -1,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_pmaxsd512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si) __W, __M);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_max_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
+_mm512_min_epi32 (__m512i __A, __m512i __B)
 {
-  return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf) __W,
-						(__mmask16) __U,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_pminsd512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si)
+						  _mm512_undefined_epi32 (),
+						  (__mmask16) -1);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_max_ps (__mmask16 __U, __m512 __A, __m512 __B)
+_mm512_maskz_min_epi32 (__mmask16 __M, __m512i __A, __m512i __B)
 {
-  return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf)
-						_mm512_setzero_ps (),
-						(__mmask16) __U,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_pminsd512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si)
+						  _mm512_setzero_si512 (),
+						  __M);
 }
 
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_max_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
+_mm512_mask_min_epi32 (__m512i __W, __mmask16 __M, __m512i __A, __m512i __B)
 {
-  return (__m128d) __builtin_ia32_maxsd_mask_round ((__v2df) __A,
-						 (__v2df) __B,
-						 (__v2df) __W,
-						 (__mmask8) __U,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_pminsd512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si) __W, __M);
 }
 
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_max_sd (__mmask8 __U, __m128d __A, __m128d __B)
+_mm512_max_epu32 (__m512i __A, __m512i __B)
 {
-  return (__m128d) __builtin_ia32_maxsd_mask_round ((__v2df) __A,
-						 (__v2df) __B,
-						 (__v2df)
-						 _mm_setzero_pd (),
-						 (__mmask8) __U,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_pmaxud512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si)
+						  _mm512_undefined_epi32 (),
+						  (__mmask16) -1);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_max_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
+_mm512_maskz_max_epu32 (__mmask16 __M, __m512i __A, __m512i __B)
 {
-  return (__m128) __builtin_ia32_maxss_mask_round ((__v4sf) __A,
-						(__v4sf) __B,
-						(__v4sf) __W,
-						(__mmask8) __U,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_pmaxud512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si)
+						  _mm512_setzero_si512 (),
+						  __M);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_max_ss (__mmask8 __U, __m128 __A, __m128 __B)
+_mm512_mask_max_epu32 (__m512i __W, __mmask16 __M, __m512i __A, __m512i __B)
 {
-  return (__m128) __builtin_ia32_maxss_mask_round ((__v4sf) __A,
-						(__v4sf) __B,
-						(__v4sf)
-						_mm_setzero_ps (),
-						(__mmask8) __U,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_pmaxud512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si) __W, __M);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_min_pd (__m512d __A, __m512d __B)
+_mm512_min_epu32 (__m512i __A, __m512i __B)
 {
-  return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df)
-						 _mm512_undefined_pd (),
-						 (__mmask8) -1,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_pminud512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si)
+						  _mm512_undefined_epi32 (),
+						  (__mmask16) -1);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_min_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
+_mm512_maskz_min_epu32 (__mmask16 __M, __m512i __A, __m512i __B)
 {
-  return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df) __W,
-						 (__mmask8) __U,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_pminud512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si)
+						  _mm512_setzero_si512 (),
+						  __M);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_min_pd (__mmask8 __U, __m512d __A, __m512d __B)
+_mm512_mask_min_epu32 (__m512i __W, __mmask16 __M, __m512i __A, __m512i __B)
 {
-  return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df)
-						 _mm512_setzero_pd (),
-						 (__mmask8) __U,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_pminud512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si) __W, __M);
 }
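
To illustrate the masking convention used by the epi32/epu32 min/max
intrins above (sketch only, not part of the patch): merge-masking
keeps the destination lanes whose mask bit is clear, zero-masking
clears them.

#include <immintrin.h>

/* min(__a, __b) in lanes 0-7; lanes 8-15 copied from __src.  */
static inline __m512i
min_low8_epi32 (__m512i __src, __m512i __a, __m512i __b)
{
  return _mm512_mask_min_epi32 (__src, (__mmask16) 0x00ff, __a, __b);
}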
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_min_ps (__m512 __A, __m512 __B)
+_mm512_unpacklo_ps (__m512 __A, __m512 __B)
 {
-  return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf)
-						_mm512_undefined_ps (),
-						(__mmask16) -1,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_unpcklps512_mask ((__v16sf) __A,
+						   (__v16sf) __B,
+						   (__v16sf)
+						   _mm512_undefined_ps (),
+						   (__mmask16) -1);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_min_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
+_mm512_mask_unpacklo_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
 {
-  return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf) __W,
-						(__mmask16) __U,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_unpcklps512_mask ((__v16sf) __A,
+						   (__v16sf) __B,
+						   (__v16sf) __W,
+						   (__mmask16) __U);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_min_ps (__mmask16 __U, __m512 __A, __m512 __B)
+_mm512_maskz_unpacklo_ps (__mmask16 __U, __m512 __A, __m512 __B)
 {
-  return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf)
-						_mm512_setzero_ps (),
-						(__mmask16) __U,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_unpcklps512_mask ((__v16sf) __A,
+						   (__v16sf) __B,
+						   (__v16sf)
+						   _mm512_setzero_ps (),
+						   (__mmask16) __U);
 }
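
For reference (not part of the patch), vunpcklps operates per 128-bit
lane, so the intrin above interleaves the two low floats of each lane
of __A and __B:

#include <immintrin.h>

static inline __m512
interleave_low_ps (__m512 __a, __m512 __b)
{
  /* Each 128-bit lane becomes { a0, b0, a1, b1 }.  */
  return _mm512_unpacklo_ps (__a, __b);
}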
 
-extern __inline __m128d
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_min_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
+_mm512_mask_blend_pd (__mmask8 __U, __m512d __A, __m512d __W)
 {
-  return (__m128d) __builtin_ia32_minsd_mask_round ((__v2df) __A,
-						 (__v2df) __B,
-						 (__v2df) __W,
-						 (__mmask8) __U,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_blendmpd_512_mask ((__v8df) __A,
+						     (__v8df) __W,
+						     (__mmask8) __U);
 }
 
-extern __inline __m128d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_min_sd (__mmask8 __U, __m128d __A, __m128d __B)
+_mm512_mask_blend_ps (__mmask16 __U, __m512 __A, __m512 __W)
 {
-  return (__m128d) __builtin_ia32_minsd_mask_round ((__v2df) __A,
-						 (__v2df) __B,
-						 (__v2df)
-						 _mm_setzero_pd (),
-						 (__mmask8) __U,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_blendmps_512_mask ((__v16sf) __A,
+						    (__v16sf) __W,
+						    (__mmask16) __U);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_min_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
+_mm512_mask_blend_epi64 (__mmask8 __U, __m512i __A, __m512i __W)
 {
-  return (__m128) __builtin_ia32_minss_mask_round ((__v4sf) __A,
-						(__v4sf) __B,
-						(__v4sf) __W,
-						(__mmask8) __U,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_blendmq_512_mask ((__v8di) __A,
+						    (__v8di) __W,
+						    (__mmask8) __U);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_min_ss (__mmask8 __U, __m128 __A, __m128 __B)
+_mm512_mask_blend_epi32 (__mmask16 __U, __m512i __A, __m512i __W)
 {
-  return (__m128) __builtin_ia32_minss_mask_round ((__v4sf) __A,
-						(__v4sf) __B,
-						(__v4sf)
-						_mm_setzero_ps (),
-						(__mmask8) __U,
-						_MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_blendmd_512_mask ((__v16si) __A,
+						    (__v16si) __W,
+						    (__mmask16) __U);
 }
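
A sketch of the blend convention (not part of the patch): a set mask
bit selects the element of the second vector operand, a clear bit the
first.

#include <immintrin.h>

/* Per-lane select: __if_set where __k has a 1, else __if_clear.  */
static inline __m512d
select_pd (__mmask8 __k, __m512d __if_clear, __m512d __if_set)
{
  return _mm512_mask_blend_pd (__k, __if_clear, __if_set);
}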
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_scalef_pd (__m512d __A, __m512d __B)
+_mm512_sqrt_pd (__m512d __A)
 {
-  return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A,
-						    (__v8df) __B,
-						    (__v8df)
-						    _mm512_undefined_pd (),
-						    (__mmask8) -1,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A,
+						  (__v8df)
+						  _mm512_undefined_pd (),
+						  (__mmask8) -1,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_scalef_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
+_mm512_mask_sqrt_pd (__m512d __W, __mmask8 __U, __m512d __A)
 {
-  return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A,
-						    (__v8df) __B,
-						    (__v8df) __W,
-						    (__mmask8) __U,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A,
+						  (__v8df) __W,
+						  (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_scalef_pd (__mmask8 __U, __m512d __A, __m512d __B)
+_mm512_maskz_sqrt_pd (__mmask8 __U, __m512d __A)
 {
-  return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A,
-						    (__v8df) __B,
-						    (__v8df)
-						    _mm512_setzero_pd (),
-						    (__mmask8) __U,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A,
+						  (__v8df)
+						  _mm512_setzero_pd (),
+						  (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_scalef_ps (__m512 __A, __m512 __B)
+_mm512_sqrt_ps (__m512 __A)
 {
-  return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A,
-						   (__v16sf) __B,
-						   (__v16sf)
-						   _mm512_undefined_ps (),
-						   (__mmask16) -1,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A,
+						 (__v16sf)
+						 _mm512_undefined_ps (),
+						 (__mmask16) -1,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_scalef_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
+_mm512_mask_sqrt_ps (__m512 __W, __mmask16 __U, __m512 __A)
 {
-  return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A,
-						   (__v16sf) __B,
-						   (__v16sf) __W,
-						   (__mmask16) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A,
+						 (__v16sf) __W,
+						 (__mmask16) __U,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_scalef_ps (__mmask16 __U, __m512 __A, __m512 __B)
+_mm512_maskz_sqrt_ps (__mmask16 __U, __m512 __A)
 {
-  return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A,
-						   (__v16sf) __B,
-						   (__v16sf)
-						   _mm512_setzero_ps (),
-						   (__mmask16) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A,
+						 (__v16sf)
+						 _mm512_setzero_ps (),
+						 (__mmask16) __U,
+						 _MM_FROUND_CUR_DIRECTION);
 }
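
A hypothetical use of the zero-masked form above (not part of the
patch): take square roots only of the non-negative lanes.

#include <immintrin.h>

static inline __m512d
sqrt_nonneg_pd (__m512d __x)
{
  __mmask8 __k = _mm512_cmp_pd_mask (__x, _mm512_setzero_pd (),
				     _CMP_GE_OQ);
  return _mm512_maskz_sqrt_pd (__k, __x);  /* NaN/negative lanes -> 0.  */
}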
 
-extern __inline __m128d
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_scalef_sd (__m128d __A, __m128d __B)
+_mm512_add_pd (__m512d __A, __m512d __B)
 {
-  return (__m128d) __builtin_ia32_scalefsd_mask_round ((__v2df) __A,
-						    (__v2df) __B,
-						    (__v2df)
-						    _mm_setzero_pd (),
-						    (__mmask8) -1,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) ((__v8df)__A + (__v8df)__B);
 }
 
-extern __inline __m128
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_scalef_ss (__m128 __A, __m128 __B)
+_mm512_mask_add_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
 {
-  return (__m128) __builtin_ia32_scalefss_mask_round ((__v4sf) __A,
-						   (__v4sf) __B,
-						   (__v4sf)
-						   _mm_setzero_ps (),
-						   (__mmask8) -1,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df) __W,
+						 (__mmask8) __U,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmadd_pd (__m512d __A, __m512d __B, __m512d __C)
+_mm512_maskz_add_pd (__mmask8 __U, __m512d __A, __m512d __B)
+{
+  return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df)
+						 _mm512_setzero_pd (),
+						 (__mmask8) __U,
+						 _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_add_ps (__m512 __A, __m512 __B)
+{
+  return (__m512) ((__v16sf)__A + (__v16sf)__B);
+}
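
Note that the unmasked add/sub/mul/div intrins here are plain GNU
vector-extension operators rather than builtin calls, so they fold
like ordinary arithmetic; only the masked forms go through the
builtins.  A sketch (not part of the patch):

#include <immintrin.h>

/* __a * __x + __y via the operator-based intrins.  */
static inline __m512d
axpy_pd (__m512d __a, __m512d __x, __m512d __y)
{
  return _mm512_add_pd (_mm512_mul_pd (__a, __x), __y);
}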
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_add_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
 {
-  return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A,
-						    (__v8df) __B,
-						    (__v8df) __C,
-						    (__mmask8) -1,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf) __W,
+						(__mmask16) __U,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmadd_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
+_mm512_maskz_add_ps (__mmask16 __U, __m512 __A, __m512 __B)
 {
-  return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A,
-						    (__v8df) __B,
-						    (__v8df) __C,
-						    (__mmask8) __U,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf)
+						_mm512_setzero_ps (),
+						(__mmask16) __U,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmadd_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U)
+_mm512_sub_pd (__m512d __A, __m512d __B)
 {
-  return (__m512d) __builtin_ia32_vfmaddpd512_mask3 ((__v8df) __A,
-						     (__v8df) __B,
-						     (__v8df) __C,
-						     (__mmask8) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) ((__v8df)__A - (__v8df)__B);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmadd_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
+_mm512_mask_sub_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
 {
-  return (__m512d) __builtin_ia32_vfmaddpd512_maskz ((__v8df) __A,
-						     (__v8df) __B,
-						     (__v8df) __C,
-						     (__mmask8) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df) __W,
+						 (__mmask8) __U,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmadd_ps (__m512 __A, __m512 __B, __m512 __C)
+_mm512_maskz_sub_pd (__mmask8 __U, __m512d __A, __m512d __B)
 {
-  return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A,
-						   (__v16sf) __B,
-						   (__v16sf) __C,
-						   (__mmask16) -1,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df)
+						 _mm512_setzero_pd (),
+						 (__mmask8) __U,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmadd_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
+_mm512_sub_ps (__m512 __A, __m512 __B)
 {
-  return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A,
-						   (__v16sf) __B,
-						   (__v16sf) __C,
-						   (__mmask16) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512) ((__v16sf)__A - (__v16sf)__B);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmadd_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U)
+_mm512_mask_sub_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
 {
-  return (__m512) __builtin_ia32_vfmaddps512_mask3 ((__v16sf) __A,
-						    (__v16sf) __B,
-						    (__v16sf) __C,
-						    (__mmask16) __U,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf) __W,
+						(__mmask16) __U,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmadd_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
+_mm512_maskz_sub_ps (__mmask16 __U, __m512 __A, __m512 __B)
 {
-  return (__m512) __builtin_ia32_vfmaddps512_maskz ((__v16sf) __A,
-						    (__v16sf) __B,
-						    (__v16sf) __C,
-						    (__mmask16) __U,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf)
+						_mm512_setzero_ps (),
+						(__mmask16) __U,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmsub_pd (__m512d __A, __m512d __B, __m512d __C)
+_mm512_mul_pd (__m512d __A, __m512d __B)
 {
-  return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A,
-						    (__v8df) __B,
-						    (__v8df) __C,
-						    (__mmask8) -1,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) ((__v8df)__A * (__v8df)__B);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmsub_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
+_mm512_mask_mul_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
 {
-  return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A,
-						    (__v8df) __B,
-						    (__v8df) __C,
-						    (__mmask8) __U,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df) __W,
+						 (__mmask8) __U,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmsub_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U)
+_mm512_maskz_mul_pd (__mmask8 __U, __m512d __A, __m512d __B)
 {
-  return (__m512d) __builtin_ia32_vfmsubpd512_mask3 ((__v8df) __A,
-						     (__v8df) __B,
-						     (__v8df) __C,
-						     (__mmask8) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df)
+						 _mm512_setzero_pd (),
+						 (__mmask8) __U,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmsub_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
+_mm512_mul_ps (__m512 __A, __m512 __B)
 {
-  return (__m512d) __builtin_ia32_vfmsubpd512_maskz ((__v8df) __A,
-						     (__v8df) __B,
-						     (__v8df) __C,
-						     (__mmask8) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512) ((__v16sf)__A * (__v16sf)__B);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmsub_ps (__m512 __A, __m512 __B, __m512 __C)
+_mm512_mask_mul_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
 {
-  return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A,
-						   (__v16sf) __B,
-						   (__v16sf) __C,
-						   (__mmask16) -1,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf) __W,
+						(__mmask16) __U,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmsub_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
+_mm512_maskz_mul_ps (__mmask16 __U, __m512 __A, __m512 __B)
 {
-  return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A,
-						   (__v16sf) __B,
-						   (__v16sf) __C,
-						   (__mmask16) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf)
+						_mm512_setzero_ps (),
+						(__mmask16) __U,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmsub_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U)
+_mm512_div_pd (__m512d __M, __m512d __V)
 {
-  return (__m512) __builtin_ia32_vfmsubps512_mask3 ((__v16sf) __A,
-						    (__v16sf) __B,
-						    (__v16sf) __C,
-						    (__mmask16) __U,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) ((__v8df)__M / (__v8df)__V);
 }
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmsub_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
+_mm512_mask_div_pd (__m512d __W, __mmask8 __U, __m512d __M, __m512d __V)
 {
-  return (__m512) __builtin_ia32_vfmsubps512_maskz ((__v16sf) __A,
-						    (__v16sf) __B,
-						    (__v16sf) __C,
-						    (__mmask16) __U,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M,
+						 (__v8df) __V,
+						 (__v8df) __W,
+						 (__mmask8) __U,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmaddsub_pd (__m512d __A, __m512d __B, __m512d __C)
+_mm512_maskz_div_pd (__mmask8 __U, __m512d __M, __m512d __V)
 {
-  return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
-						       (__v8df) __B,
-						       (__v8df) __C,
-						       (__mmask8) -1,
-						       _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M,
+						 (__v8df) __V,
+						 (__v8df)
+						 _mm512_setzero_pd (),
+						 (__mmask8) __U,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmaddsub_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
+_mm512_div_ps (__m512 __A, __m512 __B)
 {
-  return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
-						       (__v8df) __B,
-						       (__v8df) __C,
-						       (__mmask8) __U,
-						       _MM_FROUND_CUR_DIRECTION);
+  return (__m512) ((__v16sf)__A / (__v16sf)__B);
 }
 
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmaddsub_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U)
+_mm512_mask_div_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
 {
-  return (__m512d) __builtin_ia32_vfmaddsubpd512_mask3 ((__v8df) __A,
-							(__v8df) __B,
-							(__v8df) __C,
-							(__mmask8) __U,
-							_MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf) __W,
+						(__mmask16) __U,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmaddsub_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
+_mm512_maskz_div_ps (__mmask16 __U, __m512 __A, __m512 __B)
 {
-  return (__m512d) __builtin_ia32_vfmaddsubpd512_maskz ((__v8df) __A,
-							(__v8df) __B,
-							(__v8df) __C,
-							(__mmask8) __U,
-							_MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf)
+						_mm512_setzero_ps (),
+						(__mmask16) __U,
+						_MM_FROUND_CUR_DIRECTION);
 }
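
A sketch of merge-masked division (not part of the patch): divide
only where the divisor is non-zero and keep the numerator elsewhere.

#include <immintrin.h>

static inline __m512
div_nonzero_ps (__m512 __num, __m512 __den)
{
  __mmask16 __k = _mm512_cmp_ps_mask (__den, _mm512_setzero_ps (),
				      _CMP_NEQ_OQ);
  return _mm512_mask_div_ps (__num, __k, __num, __den);
}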
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmaddsub_ps (__m512 __A, __m512 __B, __m512 __C)
+_mm512_max_pd (__m512d __A, __m512d __B)
 {
-  return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
-						      (__v16sf) __B,
-						      (__v16sf) __C,
-						      (__mmask16) -1,
-						      _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df)
+						 _mm512_undefined_pd (),
+						 (__mmask8) -1,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmaddsub_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
+_mm512_mask_max_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
 {
-  return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
-						      (__v16sf) __B,
-						      (__v16sf) __C,
-						      (__mmask16) __U,
-						      _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df) __W,
+						 (__mmask8) __U,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmaddsub_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U)
+_mm512_maskz_max_pd (__mmask8 __U, __m512d __A, __m512d __B)
 {
-  return (__m512) __builtin_ia32_vfmaddsubps512_mask3 ((__v16sf) __A,
-						       (__v16sf) __B,
-						       (__v16sf) __C,
-						       (__mmask16) __U,
-						       _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df)
+						 _mm512_setzero_pd (),
+						 (__mmask8) __U,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmaddsub_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
+_mm512_max_ps (__m512 __A, __m512 __B)
 {
-  return (__m512) __builtin_ia32_vfmaddsubps512_maskz ((__v16sf) __A,
-						       (__v16sf) __B,
-						       (__v16sf) __C,
-						       (__mmask16) __U,
-						       _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf)
+						_mm512_undefined_ps (),
+						(__mmask16) -1,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmsubadd_pd (__m512d __A, __m512d __B, __m512d __C)
+_mm512_mask_max_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
 {
-  return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
-						       (__v8df) __B,
-						       -(__v8df) __C,
-						       (__mmask8) -1,
-						       _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf) __W,
+						(__mmask16) __U,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmsubadd_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
+_mm512_maskz_max_ps (__mmask16 __U, __m512 __A, __m512 __B)
 {
-  return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
-						       (__v8df) __B,
-						       -(__v8df) __C,
-						       (__mmask8) __U,
-						       _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf)
+						_mm512_setzero_ps (),
+						(__mmask16) __U,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmsubadd_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U)
+_mm512_min_pd (__m512d __A, __m512d __B)
 {
-  return (__m512d) __builtin_ia32_vfmsubaddpd512_mask3 ((__v8df) __A,
-							(__v8df) __B,
-							(__v8df) __C,
-							(__mmask8) __U,
-							_MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df)
+						 _mm512_undefined_pd (),
+						 (__mmask8) -1,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmsubadd_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
+_mm512_mask_min_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
 {
-  return (__m512d) __builtin_ia32_vfmaddsubpd512_maskz ((__v8df) __A,
-							(__v8df) __B,
-							-(__v8df) __C,
-							(__mmask8) __U,
-							_MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df) __W,
+						 (__mmask8) __U,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmsubadd_ps (__m512 __A, __m512 __B, __m512 __C)
+_mm512_maskz_min_pd (__mmask8 __U, __m512d __A, __m512d __B)
 {
-  return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
-						      (__v16sf) __B,
-						      -(__v16sf) __C,
-						      (__mmask16) -1,
-						      _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df)
+						 _mm512_setzero_pd (),
+						 (__mmask8) __U,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmsubadd_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
+_mm512_min_ps (__m512 __A, __m512 __B)
 {
-  return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
-						      (__v16sf) __B,
-						      -(__v16sf) __C,
-						      (__mmask16) __U,
-						      _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf)
+						_mm512_undefined_ps (),
+						(__mmask16) -1,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmsubadd_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U)
+_mm512_mask_min_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
 {
-  return (__m512) __builtin_ia32_vfmsubaddps512_mask3 ((__v16sf) __A,
-						       (__v16sf) __B,
-						       (__v16sf) __C,
-						       (__mmask16) __U,
-						       _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf) __W,
+						(__mmask16) __U,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmsubadd_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
+_mm512_maskz_min_ps (__mmask16 __U, __m512 __A, __m512 __B)
 {
-  return (__m512) __builtin_ia32_vfmaddsubps512_maskz ((__v16sf) __A,
-						       (__v16sf) __B,
-						       -(__v16sf) __C,
-						       (__mmask16) __U,
-						       _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf)
+						_mm512_setzero_ps (),
+						(__mmask16) __U,
+						_MM_FROUND_CUR_DIRECTION);
 }
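
As with the scalar SSE forms, vmaxpd/vminpd return the second source
operand when an input is NaN, which makes the intrins above usable
for clamping (sketch, not part of the patch):

#include <immintrin.h>

static inline __m512d
clamp_pd (__m512d __x, __m512d __lo, __m512d __hi)
{
  /* NaN lanes of __x come out as the corresponding __lo bound.  */
  return _mm512_min_pd (_mm512_max_pd (__x, __lo), __hi);
}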
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fnmadd_pd (__m512d __A, __m512d __B, __m512d __C)
+_mm512_scalef_pd (__m512d __A, __m512d __B)
 {
-  return (__m512d) __builtin_ia32_vfnmaddpd512_mask ((__v8df) __A,
-						     (__v8df) __B,
-						     (__v8df) __C,
-						     (__mmask8) -1,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A,
+						    (__v8df) __B,
+						    (__v8df)
+						    _mm512_undefined_pd (),
+						    (__mmask8) -1,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fnmadd_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
+_mm512_mask_scalef_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
 {
-  return (__m512d) __builtin_ia32_vfnmaddpd512_mask ((__v8df) __A,
-						     (__v8df) __B,
-						     (__v8df) __C,
-						     (__mmask8) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A,
+						    (__v8df) __B,
+						    (__v8df) __W,
+						    (__mmask8) __U,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fnmadd_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U)
+_mm512_maskz_scalef_pd (__mmask8 __U, __m512d __A, __m512d __B)
 {
-  return (__m512d) __builtin_ia32_vfnmaddpd512_mask3 ((__v8df) __A,
-						      (__v8df) __B,
-						      (__v8df) __C,
-						      (__mmask8) __U,
-						      _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A,
+						    (__v8df) __B,
+						    (__v8df)
+						    _mm512_setzero_pd (),
+						    (__mmask8) __U,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fnmadd_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
+_mm512_scalef_ps (__m512 __A, __m512 __B)
 {
-  return (__m512d) __builtin_ia32_vfnmaddpd512_maskz ((__v8df) __A,
-						      (__v8df) __B,
-						      (__v8df) __C,
-						      (__mmask8) __U,
-						      _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A,
+						   (__v16sf) __B,
+						   (__v16sf)
+						   _mm512_undefined_ps (),
+						   (__mmask16) -1,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fnmadd_ps (__m512 __A, __m512 __B, __m512 __C)
+_mm512_mask_scalef_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
 {
-  return (__m512) __builtin_ia32_vfnmaddps512_mask ((__v16sf) __A,
-						    (__v16sf) __B,
-						    (__v16sf) __C,
-						    (__mmask16) -1,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A,
+						   (__v16sf) __B,
+						   (__v16sf) __W,
+						   (__mmask16) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fnmadd_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
+_mm512_maskz_scalef_ps (__mmask16 __U, __m512 __A, __m512 __B)
 {
-  return (__m512) __builtin_ia32_vfnmaddps512_mask ((__v16sf) __A,
-						    (__v16sf) __B,
-						    (__v16sf) __C,
-						    (__mmask16) __U,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A,
+						   (__v16sf) __B,
+						   (__v16sf)
+						   _mm512_setzero_ps (),
+						   (__mmask16) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
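
For reference (not part of the patch), vscalefpd computes
__A * 2^floor(__B), so the intrin above acts as a vector ldexp when
__B holds integral values:

#include <immintrin.h>

static inline __m512d
ldexp_pd (__m512d __x, __m512d __e)
{
  return _mm512_scalef_pd (__x, __e);  /* __x * 2^floor(__e).  */
}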
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fnmadd_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U)
+_mm512_fmadd_pd (__m512d __A, __m512d __B, __m512d __C)
 {
-  return (__m512) __builtin_ia32_vfnmaddps512_mask3 ((__v16sf) __A,
-						     (__v16sf) __B,
-						     (__v16sf) __C,
-						     (__mmask16) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A,
+						    (__v8df) __B,
+						    (__v8df) __C,
+						    (__mmask8) -1,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fnmadd_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
+_mm512_mask_fmadd_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
 {
-  return (__m512) __builtin_ia32_vfnmaddps512_maskz ((__v16sf) __A,
-						     (__v16sf) __B,
-						     (__v16sf) __C,
-						     (__mmask16) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A,
+						    (__v8df) __B,
+						    (__v8df) __C,
+						    (__mmask8) __U,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fnmsub_pd (__m512d __A, __m512d __B, __m512d __C)
+_mm512_mask3_fmadd_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U)
 {
-  return (__m512d) __builtin_ia32_vfnmsubpd512_mask ((__v8df) __A,
+  return (__m512d) __builtin_ia32_vfmaddpd512_mask3 ((__v8df) __A,
 						     (__v8df) __B,
 						     (__v8df) __C,
-						     (__mmask8) -1,
+						     (__mmask8) __U,
 						     _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fnmsub_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
+_mm512_maskz_fmadd_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
 {
-  return (__m512d) __builtin_ia32_vfnmsubpd512_mask ((__v8df) __A,
+  return (__m512d) __builtin_ia32_vfmaddpd512_maskz ((__v8df) __A,
 						     (__v8df) __B,
 						     (__v8df) __C,
 						     (__mmask8) __U,
 						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fnmsub_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U)
+_mm512_fmadd_ps (__m512 __A, __m512 __B, __m512 __C)
 {
-  return (__m512d) __builtin_ia32_vfnmsubpd512_mask3 ((__v8df) __A,
-						      (__v8df) __B,
-						      (__v8df) __C,
-						      (__mmask8) __U,
-						      _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A,
+						   (__v16sf) __B,
+						   (__v16sf) __C,
+						   (__mmask16) -1,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fnmsub_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
+_mm512_mask_fmadd_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
 {
-  return (__m512d) __builtin_ia32_vfnmsubpd512_maskz ((__v8df) __A,
-						      (__v8df) __B,
-						      (__v8df) __C,
-						      (__mmask8) __U,
-						      _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A,
+						   (__v16sf) __B,
+						   (__v16sf) __C,
+						   (__mmask16) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fnmsub_ps (__m512 __A, __m512 __B, __m512 __C)
+_mm512_mask3_fmadd_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U)
 {
-  return (__m512) __builtin_ia32_vfnmsubps512_mask ((__v16sf) __A,
+  return (__m512) __builtin_ia32_vfmaddps512_mask3 ((__v16sf) __A,
 						    (__v16sf) __B,
 						    (__v16sf) __C,
-						    (__mmask16) -1,
+						    (__mmask16) __U,
 						    _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fnmsub_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
+_mm512_maskz_fmadd_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
 {
-  return (__m512) __builtin_ia32_vfnmsubps512_mask ((__v16sf) __A,
+  return (__m512) __builtin_ia32_vfmaddps512_maskz ((__v16sf) __A,
 						    (__v16sf) __B,
 						    (__v16sf) __C,
 						    (__mmask16) __U,
 						    _MM_FROUND_CUR_DIRECTION);
 }
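
The masked FMA flavours differ only in which operand survives in the
unselected lanes: __A for _mask, __C for _mask3, zero for _maskz.  A
sketch of the unmasked form (not part of the patch):

#include <immintrin.h>

/* One Horner step, __acc * __x + __c, with a single rounding.  */
static inline __m512d
horner_step_pd (__m512d __acc, __m512d __x, __m512d __c)
{
  return _mm512_fmadd_pd (__acc, __x, __c);
}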
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fnmsub_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U)
+_mm512_fmsub_pd (__m512d __A, __m512d __B, __m512d __C)
 {
-  return (__m512) __builtin_ia32_vfnmsubps512_mask3 ((__v16sf) __A,
-						     (__v16sf) __B,
-						     (__v16sf) __C,
-						     (__mmask16) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A,
+						    (__v8df) __B,
+						    (__v8df) __C,
+						    (__mmask8) -1,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fnmsub_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
+_mm512_mask_fmsub_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
 {
-  return (__m512) __builtin_ia32_vfnmsubps512_maskz ((__v16sf) __A,
-						     (__v16sf) __B,
-						     (__v16sf) __C,
-						     (__mmask16) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A,
+						    (__v8df) __B,
+						    (__v8df) __C,
+						    (__mmask8) __U,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m256i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttpd_epi32 (__m512d __A)
+_mm512_mask3_fmsub_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U)
 {
-  return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A,
-						     (__v8si)
-						     _mm256_undefined_si256 (),
-						     (__mmask8) -1,
+  return (__m512d) __builtin_ia32_vfmsubpd512_mask3 ((__v8df) __A,
+						     (__v8df) __B,
+						     (__v8df) __C,
+						     (__mmask8) __U,
 						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m256i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttpd_epi32 (__m256i __W, __mmask8 __U, __m512d __A)
+_mm512_maskz_fmsub_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
 {
-  return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A,
-						     (__v8si) __W,
+  return (__m512d) __builtin_ia32_vfmsubpd512_maskz ((__v8df) __A,
+						     (__v8df) __B,
+						     (__v8df) __C,
 						     (__mmask8) __U,
 						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m256i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttpd_epi32 (__mmask8 __U, __m512d __A)
+_mm512_fmsub_ps (__m512 __A, __m512 __B, __m512 __C)
 {
-  return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A,
-						     (__v8si)
-						     _mm256_setzero_si256 (),
-						     (__mmask8) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A,
+						   (__v16sf) __B,
+						   (__v16sf) __C,
+						   (__mmask16) -1,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m256i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttpd_epu32 (__m512d __A)
+_mm512_mask_fmsub_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
 {
-  return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A,
-						      (__v8si)
-						      _mm256_undefined_si256 (),
-						      (__mmask8) -1,
-						      _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A,
+						   (__v16sf) __B,
+						   (__v16sf) __C,
+						   (__mmask16) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m256i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttpd_epu32 (__m256i __W, __mmask8 __U, __m512d __A)
+_mm512_mask3_fmsub_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U)
 {
-  return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A,
-						      (__v8si) __W,
-						      (__mmask8) __U,
-						      _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_vfmsubps512_mask3 ((__v16sf) __A,
+						    (__v16sf) __B,
+						    (__v16sf) __C,
+						    (__mmask16) __U,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m256i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttpd_epu32 (__mmask8 __U, __m512d __A)
+_mm512_maskz_fmsub_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
 {
-  return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A,
-						      (__v8si)
-						      _mm256_setzero_si256 (),
-						      (__mmask8) __U,
-						      _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_vfmsubps512_maskz ((__v16sf) __A,
+						    (__v16sf) __B,
+						    (__v16sf) __C,
+						    (__mmask16) __U,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m256i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtpd_epi32 (__m512d __A)
+_mm512_fmaddsub_pd (__m512d __A, __m512d __B, __m512d __C)
 {
-  return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A,
-						    (__v8si)
-						    _mm256_undefined_si256 (),
-						    (__mmask8) -1,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
+						       (__v8df) __B,
+						       (__v8df) __C,
+						       (__mmask8) -1,
+						       _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m256i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtpd_epi32 (__m256i __W, __mmask8 __U, __m512d __A)
+_mm512_mask_fmaddsub_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
 {
-  return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A,
-						    (__v8si) __W,
-						    (__mmask8) __U,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
+						       (__v8df) __B,
+						       (__v8df) __C,
+						       (__mmask8) __U,
+						       _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m256i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtpd_epi32 (__mmask8 __U, __m512d __A)
+_mm512_mask3_fmaddsub_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U)
 {
-  return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A,
-						    (__v8si)
-						    _mm256_setzero_si256 (),
-						    (__mmask8) __U,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_vfmaddsubpd512_mask3 ((__v8df) __A,
+							(__v8df) __B,
+							(__v8df) __C,
+							(__mmask8) __U,
+							_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m256i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtpd_epu32 (__m512d __A)
+_mm512_maskz_fmaddsub_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
 {
-  return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A,
-						     (__v8si)
-						     _mm256_undefined_si256 (),
-						     (__mmask8) -1,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_vfmaddsubpd512_maskz ((__v8df) __A,
+							(__v8df) __B,
+							(__v8df) __C,
+							(__mmask8) __U,
+							_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m256i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtpd_epu32 (__m256i __W, __mmask8 __U, __m512d __A)
+_mm512_fmaddsub_ps (__m512 __A, __m512 __B, __m512 __C)
 {
-  return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A,
-						     (__v8si) __W,
-						     (__mmask8) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
+						      (__v16sf) __B,
+						      (__v16sf) __C,
+						      (__mmask16) -1,
+						      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m256i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtpd_epu32 (__mmask8 __U, __m512d __A)
+_mm512_mask_fmaddsub_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
 {
-  return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A,
-						     (__v8si)
-						     _mm256_setzero_si256 (),
-						     (__mmask8) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
+						      (__v16sf) __B,
+						      (__v16sf) __C,
+						      (__mmask16) __U,
+						      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask3_fmaddsub_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U)
+{
+  return (__m512) __builtin_ia32_vfmaddsubps512_mask3 ((__v16sf) __A,
+						       (__v16sf) __B,
+						       (__v16sf) __C,
+						       (__mmask16) __U,
+						       _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fmaddsub_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
+{
+  return (__m512) __builtin_ia32_vfmaddsubps512_maskz ((__v16sf) __A,
+						       (__v16sf) __B,
+						       (__v16sf) __C,
+						       (__mmask16) __U,
+						       _MM_FROUND_CUR_DIRECTION);
+}
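
fmaddsub subtracts __C in the even lanes and adds it in the odd lanes,
the building block of interleaved complex kernels.  Sketch, not part
of the patch:

#include <immintrin.h>

static inline __m512d
fmaddsub_demo (__m512d __a, __m512d __b, __m512d __c)
{
  /* Even lanes: a*b - c; odd lanes: a*b + c, single rounding.  */
  return _mm512_fmaddsub_pd (__a, __b, __c);
}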
+
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttps_epi32 (__m512 __A)
+_mm512_fmsubadd_pd (__m512d __A, __m512d __B, __m512d __C)
 {
-  return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A,
-						     (__v16si)
-						     _mm512_undefined_epi32 (),
-						     (__mmask16) -1,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
+						       (__v8df) __B,
+						       -(__v8df) __C,
+						       (__mmask8) -1,
+						       _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttps_epi32 (__m512i __W, __mmask16 __U, __m512 __A)
+_mm512_mask_fmsubadd_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
 {
-  return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A,
-						     (__v16si) __W,
-						     (__mmask16) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
+						       (__v8df) __B,
+						       -(__v8df) __C,
+						       (__mmask8) __U,
+						       _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttps_epi32 (__mmask16 __U, __m512 __A)
+_mm512_mask3_fmsubadd_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U)
 {
-  return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A,
-						     (__v16si)
-						     _mm512_setzero_si512 (),
-						     (__mmask16) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_vfmsubaddpd512_mask3 ((__v8df) __A,
+							(__v8df) __B,
+							(__v8df) __C,
+							(__mmask8) __U,
+							_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttps_epu32 (__m512 __A)
+_mm512_maskz_fmsubadd_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
 {
-  return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A,
-						      (__v16si)
-						      _mm512_undefined_epi32 (),
-						      (__mmask16) -1,
-						      _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_vfmaddsubpd512_maskz ((__v8df) __A,
+							(__v8df) __B,
+							-(__v8df) __C,
+							(__mmask8) __U,
+							_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttps_epu32 (__m512i __W, __mmask16 __U, __m512 __A)
+_mm512_fmsubadd_ps (__m512 __A, __m512 __B, __m512 __C)
 {
-  return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A,
-						      (__v16si) __W,
-						      (__mmask16) __U,
+  return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
+						      (__v16sf) __B,
+						      -(__v16sf) __C,
+						      (__mmask16) -1,
 						      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttps_epu32 (__mmask16 __U, __m512 __A)
+_mm512_mask_fmsubadd_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
 {
-  return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A,
-						      (__v16si)
-						      _mm512_setzero_si512 (),
+  return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
+						      (__v16sf) __B,
+						      -(__v16sf) __C,
 						      (__mmask16) __U,
 						      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtps_epi32 (__m512 __A)
+_mm512_mask3_fmsubadd_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U)
 {
-  return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
-						    (__v16si)
-						    _mm512_undefined_epi32 (),
-						    (__mmask16) -1,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_vfmsubaddps512_mask3 ((__v16sf) __A,
+						       (__v16sf) __B,
+						       (__v16sf) __C,
+						       (__mmask16) __U,
+						       _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtps_epi32 (__m512i __W, __mmask16 __U, __m512 __A)
+_mm512_maskz_fmsubadd_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
 {
-  return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
-						    (__v16si) __W,
-						    (__mmask16) __U,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_vfmaddsubps512_maskz ((__v16sf) __A,
+						       (__v16sf) __B,
+						       -(__v16sf) __C,
+						       (__mmask16) __U,
+						       _MM_FROUND_CUR_DIRECTION);
 }
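
Note how fmsubadd is implemented above: fmsubadd (a, b, c) is just
fmaddsub (a, b, -c), so the unmasked, _mask and _maskz forms reuse the
fmaddsub builtins with a negated __C.  Only _mask3 needs the dedicated
vfmsubadd builtin, since there __C doubles as the passthrough operand
and must not be negated.  Sketch (not part of the patch):

#include <immintrin.h>

static inline __m512d
fmsubadd_demo (__m512d __a, __m512d __b, __m512d __c)
{
  /* Even lanes: a*b + c; odd lanes: a*b - c, single rounding.  */
  return _mm512_fmsubadd_pd (__a, __b, __c);
}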
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtps_epi32 (__mmask16 __U, __m512 __A)
+_mm512_fnmadd_pd (__m512d __A, __m512d __B, __m512d __C)
 {
-  return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
-						    (__v16si)
-						    _mm512_setzero_si512 (),
-						    (__mmask16) __U,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_vfnmaddpd512_mask ((__v8df) __A,
+						     (__v8df) __B,
+						     (__v8df) __C,
+						     (__mmask8) -1,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtps_epu32 (__m512 __A)
+_mm512_mask_fnmadd_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
 {
-  return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A,
-						     (__v16si)
-						     _mm512_undefined_epi32 (),
-						     (__mmask16) -1,
+  return (__m512d) __builtin_ia32_vfnmaddpd512_mask ((__v8df) __A,
+						     (__v8df) __B,
+						     (__v8df) __C,
+						     (__mmask8) __U,
 						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtps_epu32 (__m512i __W, __mmask16 __U, __m512 __A)
+_mm512_mask3_fnmadd_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U)
 {
-  return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A,
-						     (__v16si) __W,
-						     (__mmask16) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_vfnmaddpd512_mask3 ((__v8df) __A,
+						      (__v8df) __B,
+						      (__v8df) __C,
+						      (__mmask8) __U,
+						      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtps_epu32 (__mmask16 __U, __m512 __A)
+_mm512_maskz_fnmadd_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
 {
-  return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A,
-						     (__v16si)
-						     _mm512_setzero_si512 (),
-						     (__mmask16) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_vfnmaddpd512_maskz ((__v8df) __A,
+						      (__v8df) __B,
+						      (__v8df) __C,
+						      (__mmask8) __U,
+						      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline double
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtsd_f64 (__m512d __A)
+_mm512_fnmadd_ps (__m512 __A, __m512 __B, __m512 __C)
 {
-  return __A[0];
+  return (__m512) __builtin_ia32_vfnmaddps512_mask ((__v16sf) __A,
+						    (__v16sf) __B,
+						    (__v16sf) __C,
+						    (__mmask16) -1,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline float
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtss_f32 (__m512 __A)
+_mm512_mask_fnmadd_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
 {
-  return __A[0];
+  return (__m512) __builtin_ia32_vfnmaddps512_mask ((__v16sf) __A,
+						    (__v16sf) __B,
+						    (__v16sf) __C,
+						    (__mmask16) __U,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
-#ifdef __x86_64__
-extern __inline __m128
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtu64_ss (__m128 __A, unsigned long long __B)
+_mm512_mask3_fnmadd_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U)
 {
-  return (__m128) __builtin_ia32_cvtusi2ss64 ((__v4sf) __A, __B,
-					      _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_vfnmaddps512_mask3 ((__v16sf) __A,
+						     (__v16sf) __B,
+						     (__v16sf) __C,
+						     (__mmask16) __U,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtu64_sd (__m128d __A, unsigned long long __B)
+_mm512_maskz_fnmadd_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
 {
-  return (__m128d) __builtin_ia32_cvtusi2sd64 ((__v2df) __A, __B,
-					       _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_vfnmaddps512_maskz ((__v16sf) __A,
+						     (__v16sf) __B,
+						     (__v16sf) __C,
+						     (__mmask16) __U,
+						     _MM_FROUND_CUR_DIRECTION);
 }
-#endif
 
-extern __inline __m128
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtu32_ss (__m128 __A, unsigned __B)
+_mm512_fnmsub_pd (__m512d __A, __m512d __B, __m512d __C)
 {
-  return (__m128) __builtin_ia32_cvtusi2ss32 ((__v4sf) __A, __B,
-					      _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_vfnmsubpd512_mask ((__v8df) __A,
+						     (__v8df) __B,
+						     (__v8df) __C,
+						     (__mmask8) -1,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi32_ps (__m512i __A)
+_mm512_mask_fnmsub_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
 {
-  return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
-						   (__v16sf)
-						   _mm512_undefined_ps (),
-						   (__mmask16) -1,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_vfnmsubpd512_mask ((__v8df) __A,
+						     (__v8df) __B,
+						     (__v8df) __C,
+						     (__mmask8) __U,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi32_ps (__m512 __W, __mmask16 __U, __m512i __A)
+_mm512_mask3_fnmsub_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U)
 {
-  return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
-						   (__v16sf) __W,
-						   (__mmask16) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_vfnmsubpd512_mask3 ((__v8df) __A,
+						      (__v8df) __B,
+						      (__v8df) __C,
+						      (__mmask8) __U,
+						      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi32_ps (__mmask16 __U, __m512i __A)
+_mm512_maskz_fnmsub_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
 {
-  return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
-						   (__v16sf)
-						   _mm512_setzero_ps (),
-						   (__mmask16) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_vfnmsubpd512_maskz ((__v8df) __A,
+						      (__v8df) __B,
+						      (__v8df) __C,
+						      (__mmask8) __U,
+						      _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepu32_ps (__m512i __A)
+_mm512_fnmsub_ps (__m512 __A, __m512 __B, __m512 __C)
 {
-  return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A,
-						    (__v16sf)
-						    _mm512_undefined_ps (),
+  return (__m512) __builtin_ia32_vfnmsubps512_mask ((__v16sf) __A,
+						    (__v16sf) __B,
+						    (__v16sf) __C,
 						    (__mmask16) -1,
 						    _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepu32_ps (__m512 __W, __mmask16 __U, __m512i __A)
+_mm512_mask_fnmsub_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
 {
-  return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A,
-						    (__v16sf) __W,
+  return (__m512) __builtin_ia32_vfnmsubps512_mask ((__v16sf) __A,
+						    (__v16sf) __B,
+						    (__v16sf) __C,
 						    (__mmask16) __U,
 						    _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepu32_ps (__mmask16 __U, __m512i __A)
+_mm512_mask3_fnmsub_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U)
 {
-  return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A,
-						    (__v16sf)
-						    _mm512_setzero_ps (),
-						    (__mmask16) __U,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_vfnmsubps512_mask3 ((__v16sf) __A,
+						     (__v16sf) __B,
+						     (__v16sf) __C,
+						     (__mmask16) __U,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fixupimm_pd (__m512d __A, __m512d __B, __m512i __C, const int __imm)
+_mm512_maskz_fnmsub_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
 {
-  return (__m512d) __builtin_ia32_fixupimmpd512_mask ((__v8df) __A,
-						      (__v8df) __B,
-						      (__v8di) __C,
-						      __imm,
-						      (__mmask8) -1,
-						      _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_vfnmsubps512_maskz ((__v16sf) __A,
+						     (__v16sf) __B,
+						     (__v16sf) __C,
+						     (__mmask16) __U,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fixupimm_pd (__m512d __A, __mmask8 __U, __m512d __B,
-			 __m512i __C, const int __imm)
+_mm512_cvttpd_epi32 (__m512d __A)
 {
-  return (__m512d) __builtin_ia32_fixupimmpd512_mask ((__v8df) __A,
-						      (__v8df) __B,
-						      (__v8di) __C,
-						      __imm,
-						      (__mmask8) __U,
-						      _MM_FROUND_CUR_DIRECTION);
+  return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A,
+						     (__v8si)
+						     _mm256_undefined_si256 (),
+						     (__mmask8) -1,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fixupimm_pd (__mmask8 __U, __m512d __A, __m512d __B,
-			  __m512i __C, const int __imm)
+_mm512_mask_cvttpd_epi32 (__m256i __W, __mmask8 __U, __m512d __A)
 {
-  return (__m512d) __builtin_ia32_fixupimmpd512_maskz ((__v8df) __A,
-						       (__v8df) __B,
-						       (__v8di) __C,
-						       __imm,
-						       (__mmask8) __U,
-						       _MM_FROUND_CUR_DIRECTION);
+  return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A,
+						     (__v8si) __W,
+						     (__mmask8) __U,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fixupimm_ps (__m512 __A, __m512 __B, __m512i __C, const int __imm)
+_mm512_maskz_cvttpd_epi32 (__mmask8 __U, __m512d __A)
 {
-  return (__m512) __builtin_ia32_fixupimmps512_mask ((__v16sf) __A,
-						     (__v16sf) __B,
-						     (__v16si) __C,
-						     __imm,
-						     (__mmask16) -1,
+  return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A,
+						     (__v8si)
+						     _mm256_setzero_si256 (),
+						     (__mmask8) __U,
 						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fixupimm_ps (__m512 __A, __mmask16 __U, __m512 __B,
-			 __m512i __C, const int __imm)
+_mm512_cvttpd_epu32 (__m512d __A)
 {
-  return (__m512) __builtin_ia32_fixupimmps512_mask ((__v16sf) __A,
-						     (__v16sf) __B,
-						     (__v16si) __C,
-						     __imm,
-						     (__mmask16) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A,
+						      (__v8si)
+						      _mm256_undefined_si256 (),
+						      (__mmask8) -1,
+						      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fixupimm_ps (__mmask16 __U, __m512 __A, __m512 __B,
-			  __m512i __C, const int __imm)
+_mm512_mask_cvttpd_epu32 (__m256i __W, __mmask8 __U, __m512d __A)
 {
-  return (__m512) __builtin_ia32_fixupimmps512_maskz ((__v16sf) __A,
-						      (__v16sf) __B,
-						      (__v16si) __C,
-						      __imm,
-						      (__mmask16) __U,
+  return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A,
+						      (__v8si) __W,
+						      (__mmask8) __U,
 						      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128d
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fixupimm_sd (__m128d __A, __m128d __B, __m128i __C, const int __imm)
+_mm512_maskz_cvttpd_epu32 (__mmask8 __U, __m512d __A)
 {
-  return (__m128d) __builtin_ia32_fixupimmsd_mask ((__v2df) __A,
-						   (__v2df) __B,
-						   (__v2di) __C, __imm,
-						   (__mmask8) -1,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A,
+						      (__v8si)
+						      _mm256_setzero_si256 (),
+						      (__mmask8) __U,
+						      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128d
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fixupimm_sd (__m128d __A, __mmask8 __U, __m128d __B,
-		      __m128i __C, const int __imm)
+_mm512_cvtpd_epi32 (__m512d __A)
 {
-  return (__m128d) __builtin_ia32_fixupimmsd_mask ((__v2df) __A,
-						   (__v2df) __B,
-						   (__v2di) __C, __imm,
-						   (__mmask8) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A,
+						    (__v8si)
+						    _mm256_undefined_si256 (),
+						    (__mmask8) -1,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128d
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fixupimm_sd (__mmask8 __U, __m128d __A, __m128d __B,
-		       __m128i __C, const int __imm)
+_mm512_mask_cvtpd_epi32 (__m256i __W, __mmask8 __U, __m512d __A)
 {
-  return (__m128d) __builtin_ia32_fixupimmsd_maskz ((__v2df) __A,
-						    (__v2df) __B,
-						    (__v2di) __C,
-						    __imm,
+  return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A,
+						    (__v8si) __W,
 						    (__mmask8) __U,
 						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fixupimm_ss (__m128 __A, __m128 __B, __m128i __C, const int __imm)
+_mm512_maskz_cvtpd_epi32 (__mmask8 __U, __m512d __A)
 {
-  return (__m128) __builtin_ia32_fixupimmss_mask ((__v4sf) __A,
-						  (__v4sf) __B,
-						  (__v4si) __C, __imm,
-						  (__mmask8) -1,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A,
+						    (__v8si)
+						    _mm256_setzero_si256 (),
+						    (__mmask8) __U,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fixupimm_ss (__m128 __A, __mmask8 __U, __m128 __B,
-		      __m128i __C, const int __imm)
+_mm512_cvtpd_epu32 (__m512d __A)
 {
-  return (__m128) __builtin_ia32_fixupimmss_mask ((__v4sf) __A,
-						  (__v4sf) __B,
-						  (__v4si) __C, __imm,
-						  (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A,
+						     (__v8si)
+						     _mm256_undefined_si256 (),
+						     (__mmask8) -1,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128
+extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fixupimm_ss (__mmask8 __U, __m128 __A, __m128 __B,
-		       __m128i __C, const int __imm)
+_mm512_mask_cvtpd_epu32 (__m256i __W, __mmask8 __U, __m512d __A)
 {
-  return (__m128) __builtin_ia32_fixupimmss_maskz ((__v4sf) __A,
-						   (__v4sf) __B,
-						   (__v4si) __C, __imm,
-						   (__mmask8) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A,
+						     (__v8si) __W,
+						     (__mmask8) __U,
+						     _MM_FROUND_CUR_DIRECTION);
 }
-#else
-#define _mm512_fixupimm_pd(X, Y, Z, C)					\
-  ((__m512d)__builtin_ia32_fixupimmpd512_mask ((__v8df)(__m512d)(X),	\
-      (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C),		\
-      (__mmask8)(-1), _MM_FROUND_CUR_DIRECTION))
-
-#define _mm512_mask_fixupimm_pd(X, U, Y, Z, C)                          \
-  ((__m512d)__builtin_ia32_fixupimmpd512_mask ((__v8df)(__m512d)(X),    \
-      (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C),             \
-      (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
-
-#define _mm512_maskz_fixupimm_pd(U, X, Y, Z, C)                         \
-  ((__m512d)__builtin_ia32_fixupimmpd512_maskz ((__v8df)(__m512d)(X),   \
-      (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C),             \
-      (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
 
-#define _mm512_fixupimm_ps(X, Y, Z, C)					\
-  ((__m512)__builtin_ia32_fixupimmps512_mask ((__v16sf)(__m512)(X),	\
-    (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C),		\
-    (__mmask16)(-1), _MM_FROUND_CUR_DIRECTION))
-
-#define _mm512_mask_fixupimm_ps(X, U, Y, Z, C)                          \
-  ((__m512)__builtin_ia32_fixupimmps512_mask ((__v16sf)(__m512)(X),     \
-    (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C),              \
-    (__mmask16)(U), _MM_FROUND_CUR_DIRECTION))
-
-#define _mm512_maskz_fixupimm_ps(U, X, Y, Z, C)                         \
-  ((__m512)__builtin_ia32_fixupimmps512_maskz ((__v16sf)(__m512)(X),    \
-    (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C),              \
-    (__mmask16)(U), _MM_FROUND_CUR_DIRECTION))
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtpd_epu32 (__mmask8 __U, __m512d __A)
+{
+  return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A,
+						     (__v8si)
+						     _mm256_setzero_si256 (),
+						     (__mmask8) __U,
+						     _MM_FROUND_CUR_DIRECTION);
+}
 
-#define _mm_fixupimm_sd(X, Y, Z, C)					\
-    ((__m128d)__builtin_ia32_fixupimmsd_mask ((__v2df)(__m128d)(X),	\
-      (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C),		\
-      (__mmask8)(-1), _MM_FROUND_CUR_DIRECTION))
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvttps_epi32 (__m512 __A)
+{
+  return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A,
+						     (__v16si)
+						     _mm512_undefined_epi32 (),
+						     (__mmask16) -1,
+						     _MM_FROUND_CUR_DIRECTION);
+}
 
-#define _mm_mask_fixupimm_sd(X, U, Y, Z, C)				\
-    ((__m128d)__builtin_ia32_fixupimmsd_mask ((__v2df)(__m128d)(X),	\
-      (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C),		\
-      (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvttps_epi32 (__m512i __W, __mmask16 __U, __m512 __A)
+{
+  return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A,
+						     (__v16si) __W,
+						     (__mmask16) __U,
+						     _MM_FROUND_CUR_DIRECTION);
+}
 
-#define _mm_maskz_fixupimm_sd(U, X, Y, Z, C)				\
-    ((__m128d)__builtin_ia32_fixupimmsd_maskz ((__v2df)(__m128d)(X),	\
-      (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C),		\
-      (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvttps_epi32 (__mmask16 __U, __m512 __A)
+{
+  return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A,
+						     (__v16si)
+						     _mm512_setzero_si512 (),
+						     (__mmask16) __U,
+						     _MM_FROUND_CUR_DIRECTION);
+}
 
-#define _mm_fixupimm_ss(X, Y, Z, C)					\
-    ((__m128)__builtin_ia32_fixupimmss_mask ((__v4sf)(__m128)(X),	\
-      (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C),		\
-      (__mmask8)(-1), _MM_FROUND_CUR_DIRECTION))
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvttps_epu32 (__m512 __A)
+{
+  return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A,
+						      (__v16si)
+						      _mm512_undefined_epi32 (),
+						      (__mmask16) -1,
+						      _MM_FROUND_CUR_DIRECTION);
+}
 
-#define _mm_mask_fixupimm_ss(X, U, Y, Z, C)				\
-    ((__m128)__builtin_ia32_fixupimmss_mask ((__v4sf)(__m128)(X),	\
-      (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C),		\
-      (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvttps_epu32 (__m512i __W, __mmask16 __U, __m512 __A)
+{
+  return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A,
+						      (__v16si) __W,
+						      (__mmask16) __U,
+						      _MM_FROUND_CUR_DIRECTION);
+}
 
-#define _mm_maskz_fixupimm_ss(U, X, Y, Z, C)				\
-    ((__m128)__builtin_ia32_fixupimmss_maskz ((__v4sf)(__m128)(X),	\
-      (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C),		\
-      (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
-#endif
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvttps_epu32 (__mmask16 __U, __m512 __A)
+{
+  return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A,
+						      (__v16si)
+						      _mm512_setzero_si512 (),
+						      (__mmask16) __U,
+						      _MM_FROUND_CUR_DIRECTION);
+}
 
-#ifdef __x86_64__
-extern __inline unsigned long long
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtss_u64 (__m128 __A)
+_mm512_cvtps_epi32 (__m512 __A)
 {
-  return (unsigned long long) __builtin_ia32_vcvtss2usi64 ((__v4sf)
-							   __A,
-							   _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
+						    (__v16si)
+						    _mm512_undefined_epi32 (),
+						    (__mmask16) -1,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline unsigned long long
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvttss_u64 (__m128 __A)
+_mm512_mask_cvtps_epi32 (__m512i __W, __mmask16 __U, __m512 __A)
 {
-  return (unsigned long long) __builtin_ia32_vcvttss2usi64 ((__v4sf)
-							    __A,
-							    _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
+						    (__v16si) __W,
+						    (__mmask16) __U,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline long long
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvttss_i64 (__m128 __A)
+_mm512_maskz_cvtps_epi32 (__mmask16 __U, __m512 __A)
 {
-  return (long long) __builtin_ia32_vcvttss2si64 ((__v4sf) __A,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
+						    (__v16si)
+						    _mm512_setzero_si512 (),
+						    (__mmask16) __U,
+						    _MM_FROUND_CUR_DIRECTION);
 }
-#endif /* __x86_64__ */
 
-extern __inline int
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtsi512_si32 (__m512i __A)
+_mm512_cvtps_epu32 (__m512 __A)
 {
-  __v16si __B = (__v16si) __A;
-  return __B[0];
+  return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A,
+						     (__v16si)
+						     _mm512_undefined_epi32 (),
+						     (__mmask16) -1,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline unsigned
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtss_u32 (__m128 __A)
+_mm512_mask_cvtps_epu32 (__m512i __W, __mmask16 __U, __m512 __A)
 {
-  return (unsigned) __builtin_ia32_vcvtss2usi32 ((__v4sf) __A,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A,
+						     (__v16si) __W,
+						     (__mmask16) __U,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline unsigned
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvttss_u32 (__m128 __A)
+_mm512_maskz_cvtps_epu32 (__mmask16 __U, __m512 __A)
 {
-  return (unsigned) __builtin_ia32_vcvttss2usi32 ((__v4sf) __A,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A,
+						     (__v16si)
+						     _mm512_setzero_si512 (),
+						     (__mmask16) __U,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline int
+extern __inline double
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvttss_i32 (__m128 __A)
+_mm512_cvtsd_f64 (__m512d __A)
 {
-  return (int) __builtin_ia32_vcvttss2si32 ((__v4sf) __A,
-					    _MM_FROUND_CUR_DIRECTION);
+  return __A[0];
 }
 
-extern __inline int
+extern __inline float
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsd_i32 (__m128d __A)
+_mm512_cvtss_f32 (__m512 __A)
 {
-  return (int) __builtin_ia32_cvtsd2si ((__v2df) __A);
+  return __A[0];
 }
 
-extern __inline int
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtss_i32 (__m128 __A)
+_mm512_cvtepi32_ps (__m512i __A)
 {
-  return (int) __builtin_ia32_cvtss2si ((__v4sf) __A);
+  return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
+						   (__v16sf)
+						   _mm512_undefined_ps (),
+						   (__mmask16) -1,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvti32_sd (__m128d __A, int __B)
+_mm512_mask_cvtepi32_ps (__m512 __W, __mmask16 __U, __m512i __A)
 {
-  return (__m128d) __builtin_ia32_cvtsi2sd ((__v2df) __A, __B);
+  return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
+						   (__v16sf) __W,
+						   (__mmask16) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvti32_ss (__m128 __A, int __B)
+_mm512_maskz_cvtepi32_ps (__mmask16 __U, __m512i __A)
 {
-  return (__m128) __builtin_ia32_cvtsi2ss ((__v4sf) __A, __B);
+  return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
+						   (__v16sf)
+						   _mm512_setzero_ps (),
+						   (__mmask16) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-#ifdef __x86_64__
-extern __inline unsigned long long
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsd_u64 (__m128d __A)
+_mm512_cvtepu32_ps (__m512i __A)
 {
-  return (unsigned long long) __builtin_ia32_vcvtsd2usi64 ((__v2df)
-							   __A,
-							   _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A,
+						    (__v16sf)
+						    _mm512_undefined_ps (),
+						    (__mmask16) -1,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline unsigned long long
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvttsd_u64 (__m128d __A)
+_mm512_mask_cvtepu32_ps (__m512 __W, __mmask16 __U, __m512i __A)
 {
-  return (unsigned long long) __builtin_ia32_vcvttsd2usi64 ((__v2df)
-							    __A,
-							    _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A,
+						    (__v16sf) __W,
+						    (__mmask16) __U,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline long long
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvttsd_i64 (__m128d __A)
+_mm512_maskz_cvtepu32_ps (__mmask16 __U, __m512i __A)
 {
-  return (long long) __builtin_ia32_vcvttsd2si64 ((__v2df) __A,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A,
+						    (__v16sf)
+						    _mm512_setzero_ps (),
+						    (__mmask16) __U,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline long long
+#ifdef __OPTIMIZE__
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsd_i64 (__m128d __A)
+_mm512_fixupimm_pd (__m512d __A, __m512d __B, __m512i __C, const int __imm)
 {
-  return (long long) __builtin_ia32_cvtsd2si64 ((__v2df) __A);
+  return (__m512d) __builtin_ia32_fixupimmpd512_mask ((__v8df) __A,
+						      (__v8df) __B,
+						      (__v8di) __C,
+						      __imm,
+						      (__mmask8) -1,
+						      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline long long
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtss_i64 (__m128 __A)
+_mm512_mask_fixupimm_pd (__m512d __A, __mmask8 __U, __m512d __B,
+			 __m512i __C, const int __imm)
 {
-  return (long long) __builtin_ia32_cvtss2si64 ((__v4sf) __A);
+  return (__m512d) __builtin_ia32_fixupimmpd512_mask ((__v8df) __A,
+						      (__v8df) __B,
+						      (__v8di) __C,
+						      __imm,
+						      (__mmask8) __U,
+						      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128d
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvti64_sd (__m128d __A, long long __B)
+_mm512_maskz_fixupimm_pd (__mmask8 __U, __m512d __A, __m512d __B,
+			  __m512i __C, const int __imm)
 {
-  return (__m128d) __builtin_ia32_cvtsi642sd ((__v2df) __A, __B);
+  return (__m512d) __builtin_ia32_fixupimmpd512_maskz ((__v8df) __A,
+						       (__v8df) __B,
+						       (__v8di) __C,
+						       __imm,
+						       (__mmask8) __U,
+						       _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvti64_ss (__m128 __A, long long __B)
+_mm512_fixupimm_ps (__m512 __A, __m512 __B, __m512i __C, const int __imm)
 {
-  return (__m128) __builtin_ia32_cvtsi642ss ((__v4sf) __A, __B);
+  return (__m512) __builtin_ia32_fixupimmps512_mask ((__v16sf) __A,
+						     (__v16sf) __B,
+						     (__v16si) __C,
+						     __imm,
+						     (__mmask16) -1,
+						     _MM_FROUND_CUR_DIRECTION);
 }
-#endif /* __x86_64__ */
 
-extern __inline unsigned
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsd_u32 (__m128d __A)
+_mm512_mask_fixupimm_ps (__m512 __A, __mmask16 __U, __m512 __B,
+			 __m512i __C, const int __imm)
 {
-  return (unsigned) __builtin_ia32_vcvtsd2usi32 ((__v2df) __A,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_fixupimmps512_mask ((__v16sf) __A,
+						     (__v16sf) __B,
+						     (__v16si) __C,
+						     __imm,
+						     (__mmask16) __U,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline unsigned
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvttsd_u32 (__m128d __A)
+_mm512_maskz_fixupimm_ps (__mmask16 __U, __m512 __A, __m512 __B,
+			  __m512i __C, const int __imm)
 {
-  return (unsigned) __builtin_ia32_vcvttsd2usi32 ((__v2df) __A,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_fixupimmps512_maskz ((__v16sf) __A,
+						      (__v16sf) __B,
+						      (__v16si) __C,
+						      __imm,
+						      (__mmask16) __U,
+						      _MM_FROUND_CUR_DIRECTION);
 }
 
+#else
+#define _mm512_fixupimm_pd(X, Y, Z, C)					\
+  ((__m512d)__builtin_ia32_fixupimmpd512_mask ((__v8df)(__m512d)(X),	\
+      (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C),		\
+      (__mmask8)(-1), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm512_mask_fixupimm_pd(X, U, Y, Z, C)                          \
+  ((__m512d)__builtin_ia32_fixupimmpd512_mask ((__v8df)(__m512d)(X),    \
+      (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C),             \
+      (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm512_maskz_fixupimm_pd(U, X, Y, Z, C)                         \
+  ((__m512d)__builtin_ia32_fixupimmpd512_maskz ((__v8df)(__m512d)(X),   \
+      (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C),             \
+      (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm512_fixupimm_ps(X, Y, Z, C)					\
+  ((__m512)__builtin_ia32_fixupimmps512_mask ((__v16sf)(__m512)(X),	\
+    (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C),		\
+    (__mmask16)(-1), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm512_mask_fixupimm_ps(X, U, Y, Z, C)                          \
+  ((__m512)__builtin_ia32_fixupimmps512_mask ((__v16sf)(__m512)(X),     \
+    (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C),              \
+    (__mmask16)(U), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm512_maskz_fixupimm_ps(U, X, Y, Z, C)                         \
+  ((__m512)__builtin_ia32_fixupimmps512_maskz ((__v16sf)(__m512)(X),    \
+    (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C),              \
+    (__mmask16)(U), _MM_FROUND_CUR_DIRECTION))
+
+#endif
+
 extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvttsd_i32 (__m128d __A)
+_mm512_cvtsi512_si32 (__m512i __A)
 {
-  return (int) __builtin_ia32_vcvttsd2si32 ((__v2df) __A,
-					    _MM_FROUND_CUR_DIRECTION);
+  __v16si __B = (__v16si) __A;
+  return __B[0];
 }
 
 extern __inline __m512d
@@ -14770,70 +15245,6 @@ _mm512_maskz_getexp_pd (__mmask8 __U, __m512d __A)
 						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getexp_ss (__m128 __A, __m128 __B)
-{
-  return (__m128) __builtin_ia32_getexpss128_round ((__v4sf) __A,
-						    (__v4sf) __B,
-						    _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getexp_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
-{
-  return (__m128) __builtin_ia32_getexpss_mask_round ((__v4sf) __A,
-						(__v4sf) __B,
-						(__v4sf) __W,
-						(__mmask8) __U,
-						_MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getexp_ss (__mmask8 __U, __m128 __A, __m128 __B)
-{
-  return (__m128) __builtin_ia32_getexpss_mask_round ((__v4sf) __A,
-						(__v4sf) __B,
-						(__v4sf)
-						_mm_setzero_ps (),
-						(__mmask8) __U,
-						_MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getexp_sd (__m128d __A, __m128d __B)
-{
-  return (__m128d) __builtin_ia32_getexpsd128_round ((__v2df) __A,
-						     (__v2df) __B,
-						     _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getexp_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
-{
-  return (__m128d) __builtin_ia32_getexpsd_mask_round ((__v2df) __A,
-						(__v2df) __B,
-						(__v2df) __W,
-						(__mmask8) __U,
-						_MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getexp_sd (__mmask8 __U, __m128d __A, __m128d __B)
-{
-  return (__m128d) __builtin_ia32_getexpsd_mask_round ((__v2df) __A,
-						(__v2df) __B,
-						(__v2df)
-						_mm_setzero_pd (),
-						(__mmask8) __U,
-						_MM_FROUND_CUR_DIRECTION);
-}
-
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_getmant_pd (__m512d __A, _MM_MANTISSA_NORM_ENUM __B,
@@ -14906,82 +15317,6 @@ _mm512_maskz_getmant_ps (__mmask16 __U, __m512 __A,
 						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getmant_sd (__m128d __A, __m128d __B, _MM_MANTISSA_NORM_ENUM __C,
-		_MM_MANTISSA_SIGN_ENUM __D)
-{
-  return (__m128d) __builtin_ia32_getmantsd_round ((__v2df) __A,
-						   (__v2df) __B,
-						   (__D << 2) | __C,
-						   _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getmant_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
-			_MM_MANTISSA_NORM_ENUM __C, _MM_MANTISSA_SIGN_ENUM __D)
-{
-  return (__m128d) __builtin_ia32_getmantsd_mask_round ((__v2df) __A,
-							(__v2df) __B,
-						        (__D << 2) | __C,
-                                                        (__v2df) __W,
-						       __U,
-						     _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getmant_sd (__mmask8 __U, __m128d __A, __m128d __B,
-			 _MM_MANTISSA_NORM_ENUM __C, _MM_MANTISSA_SIGN_ENUM __D)
-{
-  return (__m128d) __builtin_ia32_getmantsd_mask_round ((__v2df) __A,
-                                                        (__v2df) __B,
-						        (__D << 2) | __C,
-                                                        (__v2df)
-							_mm_setzero_pd(),
-						        __U,
-						     _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getmant_ss (__m128 __A, __m128 __B, _MM_MANTISSA_NORM_ENUM __C,
-		_MM_MANTISSA_SIGN_ENUM __D)
-{
-  return (__m128) __builtin_ia32_getmantss_round ((__v4sf) __A,
-						  (__v4sf) __B,
-						  (__D << 2) | __C,
-						  _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getmant_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
-			_MM_MANTISSA_NORM_ENUM __C, _MM_MANTISSA_SIGN_ENUM __D)
-{
-  return (__m128) __builtin_ia32_getmantss_mask_round ((__v4sf) __A,
-							(__v4sf) __B,
-						        (__D << 2) | __C,
-                                                        (__v4sf) __W,
-						       __U,
-						     _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getmant_ss (__mmask8 __U, __m128 __A, __m128 __B,
-			 _MM_MANTISSA_NORM_ENUM __C, _MM_MANTISSA_SIGN_ENUM __D)
-{
-  return (__m128) __builtin_ia32_getmantss_mask_round ((__v4sf) __A,
-                                                        (__v4sf) __B,
-						        (__D << 2) | __C,
-                                                        (__v4sf)
-							_mm_setzero_ps(),
-						        __U,
-						     _MM_FROUND_CUR_DIRECTION);
-}
-
 #else
 #define _mm512_getmant_pd(X, B, C)                                                  \
   ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X),                 \
@@ -14990,107 +15325,39 @@ _mm_maskz_getmant_ss (__mmask8 __U, __m128 __A, __m128 __B,
                                               (__mmask8)-1,\
 					      _MM_FROUND_CUR_DIRECTION))
 
-#define _mm512_mask_getmant_pd(W, U, X, B, C)                                       \
-  ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X),                 \
-                                              (int)(((C)<<2) | (B)),                \
-                                              (__v8df)(__m512d)(W),                 \
-                                              (__mmask8)(U),\
-					      _MM_FROUND_CUR_DIRECTION))
-
-#define _mm512_maskz_getmant_pd(U, X, B, C)                                         \
-  ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X),                 \
-                                              (int)(((C)<<2) | (B)),                \
-                                              (__v8df)_mm512_setzero_pd(),          \
-                                              (__mmask8)(U),\
-					      _MM_FROUND_CUR_DIRECTION))
-#define _mm512_getmant_ps(X, B, C)                                                  \
-  ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X),                  \
-                                             (int)(((C)<<2) | (B)),                 \
-                                             (__v16sf)_mm512_undefined_ps(),        \
-                                             (__mmask16)-1,\
-					     _MM_FROUND_CUR_DIRECTION))
-
-#define _mm512_mask_getmant_ps(W, U, X, B, C)                                       \
-  ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X),                  \
-                                             (int)(((C)<<2) | (B)),                 \
-                                             (__v16sf)(__m512)(W),                  \
-                                             (__mmask16)(U),\
-					     _MM_FROUND_CUR_DIRECTION))
-
-#define _mm512_maskz_getmant_ps(U, X, B, C)                                         \
-  ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X),                  \
-                                             (int)(((C)<<2) | (B)),                 \
-                                             (__v16sf)_mm512_setzero_ps(),          \
-                                             (__mmask16)(U),\
-					     _MM_FROUND_CUR_DIRECTION))
-#define _mm_getmant_sd(X, Y, C, D)                                                  \
-  ((__m128d)__builtin_ia32_getmantsd_round ((__v2df)(__m128d)(X),                    \
-                                           (__v2df)(__m128d)(Y),                    \
-                                           (int)(((D)<<2) | (C)),                   \
-					   _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_mask_getmant_sd(W, U, X, Y, C, D)                                       \
-  ((__m128d)__builtin_ia32_getmantsd_mask_round ((__v2df)(__m128d)(X),                 \
-                                                 (__v2df)(__m128d)(Y),                 \
-                                                 (int)(((D)<<2) | (C)),                \
-                                                (__v2df)(__m128d)(W),                 \
-                                              (__mmask8)(U),\
-					      _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_maskz_getmant_sd(U, X, Y, C, D)                                         \
-  ((__m128d)__builtin_ia32_getmantsd_mask_round ((__v2df)(__m128d)(X),                 \
-                                           (__v2df)(__m128d)(Y),                     \
-                                              (int)(((D)<<2) | (C)),                \
-                                           (__v2df)_mm_setzero_pd(),             \
-                                              (__mmask8)(U),\
-					      _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_getmant_ss(X, Y, C, D)                                                  \
-  ((__m128)__builtin_ia32_getmantss_round ((__v4sf)(__m128)(X),                      \
-                                          (__v4sf)(__m128)(Y),                      \
-                                          (int)(((D)<<2) | (C)),                    \
-					  _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_mask_getmant_ss(W, U, X, Y, C, D)                                       \
-  ((__m128)__builtin_ia32_getmantss_mask_round ((__v4sf)(__m128)(X),                 \
-                                                 (__v4sf)(__m128)(Y),                 \
-                                                 (int)(((D)<<2) | (C)),                \
-                                                (__v4sf)(__m128)(W),                 \
+#define _mm512_mask_getmant_pd(W, U, X, B, C)                                       \
+  ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X),                 \
+                                              (int)(((C)<<2) | (B)),                \
+                                              (__v8df)(__m512d)(W),                 \
                                               (__mmask8)(U),\
 					      _MM_FROUND_CUR_DIRECTION))
 
-#define _mm_maskz_getmant_ss(U, X, Y, C, D)                                         \
-  ((__m128)__builtin_ia32_getmantss_mask_round ((__v4sf)(__m128)(X),                 \
-                                           (__v4sf)(__m128)(Y),                     \
-                                              (int)(((D)<<2) | (C)),                \
-                                           (__v4sf)_mm_setzero_ps(),             \
+#define _mm512_maskz_getmant_pd(U, X, B, C)                                         \
+  ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X),                 \
+                                              (int)(((C)<<2) | (B)),                \
+                                              (__v8df)_mm512_setzero_pd(),          \
                                               (__mmask8)(U),\
 					      _MM_FROUND_CUR_DIRECTION))
+#define _mm512_getmant_ps(X, B, C)                                                  \
+  ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X),                  \
+                                             (int)(((C)<<2) | (B)),                 \
+                                             (__v16sf)_mm512_undefined_ps(),        \
+                                             (__mmask16)-1,\
+					     _MM_FROUND_CUR_DIRECTION))
 
-#define _mm_getexp_ss(A, B)						      \
-  ((__m128)__builtin_ia32_getexpss128_round((__v4sf)(__m128)(A), (__v4sf)(__m128)(B),  \
-					   _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_mask_getexp_ss(W, U, A, B) \
-    (__m128)__builtin_ia32_getexpss_mask_round(A, B, W, U,\
-                                             _MM_FROUND_CUR_DIRECTION)
-
-#define _mm_maskz_getexp_ss(U, A, B)   \
-    (__m128)__builtin_ia32_getexpss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U,\
-					      _MM_FROUND_CUR_DIRECTION)
-
-#define _mm_getexp_sd(A, B)						       \
-  ((__m128d)__builtin_ia32_getexpsd128_round((__v2df)(__m128d)(A), (__v2df)(__m128d)(B),\
-					    _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_mask_getexp_sd(W, U, A, B) \
-    (__m128d)__builtin_ia32_getexpsd_mask_round(A, B, W, U,\
-                                             _MM_FROUND_CUR_DIRECTION)
-
-#define _mm_maskz_getexp_sd(U, A, B)   \
-    (__m128d)__builtin_ia32_getexpsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U,\
-					      _MM_FROUND_CUR_DIRECTION)
+#define _mm512_mask_getmant_ps(W, U, X, B, C)                                       \
+  ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X),                  \
+                                             (int)(((C)<<2) | (B)),                 \
+                                             (__v16sf)(__m512)(W),                  \
+                                             (__mmask16)(U),\
+					     _MM_FROUND_CUR_DIRECTION))
 
+#define _mm512_maskz_getmant_ps(U, X, B, C)                                         \
+  ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X),                  \
+                                             (int)(((C)<<2) | (B)),                 \
+                                             (__v16sf)_mm512_setzero_ps(),          \
+                                             (__mmask16)(U),\
+					     _MM_FROUND_CUR_DIRECTION))
 #define _mm512_getexp_ps(A)						\
   ((__m512)__builtin_ia32_getexpps512_mask((__v16sf)(__m512)(A),		\
   (__v16sf)_mm512_undefined_ps(), (__mmask16)-1, _MM_FROUND_CUR_DIRECTION))
@@ -15185,87 +15452,6 @@ _mm512_maskz_roundscale_pd (__mmask8 __A, __m512d __B, const int __imm)
 						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_roundscale_ss (__m128 __A, __m128 __B, const int __imm)
-{
-  return (__m128)
-    __builtin_ia32_rndscaless_mask_round ((__v4sf) __A,
-					  (__v4sf) __B, __imm,
-					  (__v4sf)
-					  _mm_setzero_ps (),
-					  (__mmask8) -1,
-					  _MM_FROUND_CUR_DIRECTION);
-}
-
-
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_roundscale_ss (__m128 __A, __mmask8 __B, __m128 __C, __m128 __D,
-			const int __imm)
-{
-  return (__m128)
-    __builtin_ia32_rndscaless_mask_round ((__v4sf) __C,
-					  (__v4sf) __D, __imm,
-					  (__v4sf) __A,
-					  (__mmask8) __B,
-					  _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_roundscale_ss (__mmask8 __A, __m128 __B, __m128 __C,
-			 const int __imm)
-{
-  return (__m128)
-    __builtin_ia32_rndscaless_mask_round ((__v4sf) __B,
-					  (__v4sf) __C, __imm,
-					  (__v4sf)
-					  _mm_setzero_ps (),
-					  (__mmask8) __A,
-					  _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_roundscale_sd (__m128d __A, __m128d __B, const int __imm)
-{
-  return (__m128d)
-    __builtin_ia32_rndscalesd_mask_round ((__v2df) __A,
-					  (__v2df) __B, __imm,
-					  (__v2df)
-					  _mm_setzero_pd (),
-					  (__mmask8) -1,
-					  _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_roundscale_sd (__m128d __A, __mmask8 __B, __m128d __C, __m128d __D,
-			const int __imm)
-{
-  return (__m128d)
-    __builtin_ia32_rndscalesd_mask_round ((__v2df) __C,
-					  (__v2df) __D, __imm,
-					  (__v2df) __A,
-					  (__mmask8) __B,
-					  _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_roundscale_sd (__mmask8 __A, __m128d __B, __m128d __C,
-			 const int __imm)
-{
-  return (__m128d)
-    __builtin_ia32_rndscalesd_mask_round ((__v2df) __B,
-					  (__v2df) __C, __imm,
-					  (__v2df)
-					  _mm_setzero_pd (),
-					  (__mmask8) __A,
-					  _MM_FROUND_CUR_DIRECTION);
-}
-
 #else
 #define _mm512_roundscale_ps(A, B) \
   ((__m512) __builtin_ia32_rndscaleps_mask ((__v16sf)(__m512)(A), (int)(B),\
@@ -15293,54 +15479,6 @@ _mm_maskz_roundscale_sd (__mmask8 __A, __m128d __B, __m128d __C,
 					     (int)(C),			\
 					     (__v8df)_mm512_setzero_pd(),\
 					     (__mmask8)(A), _MM_FROUND_CUR_DIRECTION))
-#define _mm_roundscale_ss(A, B, I)					\
-  ((__m128)								\
-   __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (A),		\
-					 (__v4sf) (__m128) (B),		\
-					 (int) (I),			\
-					 (__v4sf) _mm_setzero_ps (),	\
-					 (__mmask8) (-1),		\
-					 _MM_FROUND_CUR_DIRECTION))
-#define _mm_mask_roundscale_ss(A, U, B, C, I)				\
-  ((__m128)								\
-   __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (B),		\
-					 (__v4sf) (__m128) (C),		\
-					 (int) (I),			\
-					 (__v4sf) (__m128) (A),		\
-					 (__mmask8) (U),		\
-					 _MM_FROUND_CUR_DIRECTION))
-#define _mm_maskz_roundscale_ss(U, A, B, I)				\
-  ((__m128)								\
-   __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (A),		\
-					 (__v4sf) (__m128) (B),		\
-					 (int) (I),			\
-					 (__v4sf) _mm_setzero_ps (),	\
-					 (__mmask8) (U),		\
-					 _MM_FROUND_CUR_DIRECTION))
-#define _mm_roundscale_sd(A, B, I)					\
-  ((__m128d)								\
-   __builtin_ia32_rndscalesd_mask_round ((__v2df) (__m128d) (A),	\
-					 (__v2df) (__m128d) (B),	\
-					 (int) (I),			\
-					 (__v2df) _mm_setzero_pd (),	\
-					 (__mmask8) (-1),		\
-					 _MM_FROUND_CUR_DIRECTION))
-#define _mm_mask_roundscale_sd(A, U, B, C, I)				\
-  ((__m128d)								\
-   __builtin_ia32_rndscalesd_mask_round ((__v2df) (__m128d) (B),	\
-					 (__v2df) (__m128d) (C),	\
-					 (int) (I),			\
-					 (__v2df) (__m128d) (A),	\
-					 (__mmask8) (U),		\
-					 _MM_FROUND_CUR_DIRECTION))
-#define _mm_maskz_roundscale_sd(U, A, B, I)				\
-  ((__m128d)								\
-   __builtin_ia32_rndscalesd_mask_round ((__v2df) (__m128d) (A),	\
-					 (__v2df) (__m128d) (B),	\
-					 (int) (I),			\
-					 (__v2df) _mm_setzero_pd (),	\
-					 (__mmask8) (U),		\
-					 _MM_FROUND_CUR_DIRECTION))
 #endif
 
 #ifdef __OPTIMIZE__
@@ -15384,46 +15522,6 @@ _mm512_mask_cmp_pd_mask (__mmask8 __U, __m512d __X, __m512d __Y, const int __P)
 						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __mmask8
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cmp_sd_mask (__m128d __X, __m128d __Y, const int __P)
-{
-  return (__mmask8) __builtin_ia32_cmpsd_mask ((__v2df) __X,
-					       (__v2df) __Y, __P,
-					       (__mmask8) -1,
-					       _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __mmask8
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cmp_sd_mask (__mmask8 __M, __m128d __X, __m128d __Y, const int __P)
-{
-  return (__mmask8) __builtin_ia32_cmpsd_mask ((__v2df) __X,
-					       (__v2df) __Y, __P,
-					       (__mmask8) __M,
-					       _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __mmask8
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cmp_ss_mask (__m128 __X, __m128 __Y, const int __P)
-{
-  return (__mmask8) __builtin_ia32_cmpss_mask ((__v4sf) __X,
-					       (__v4sf) __Y, __P,
-					       (__mmask8) -1,
-					       _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __mmask8
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cmp_ss_mask (__mmask8 __M, __m128 __X, __m128 __Y, const int __P)
-{
-  return (__mmask8) __builtin_ia32_cmpss_mask ((__v4sf) __X,
-					       (__v4sf) __Y, __P,
-					       (__mmask8) __M,
-					       _MM_FROUND_CUR_DIRECTION);
-}
-
 #else
 #define _mm512_cmp_pd_mask(X, Y, P)					\
   ((__mmask8) __builtin_ia32_cmppd512_mask ((__v8df)(__m512d)(X),	\
@@ -15445,25 +15543,6 @@ _mm_mask_cmp_ss_mask (__mmask8 __M, __m128 __X, __m128 __Y, const int __P)
 					     (__v16sf)(__m512)(Y), (int)(P),\
 					     (__mmask16)(M),_MM_FROUND_CUR_DIRECTION))
 
-#define _mm_cmp_sd_mask(X, Y, P)					\
-  ((__mmask8) __builtin_ia32_cmpsd_mask ((__v2df)(__m128d)(X),		\
-					 (__v2df)(__m128d)(Y), (int)(P),\
-					 (__mmask8)-1,_MM_FROUND_CUR_DIRECTION))
-
-#define _mm_mask_cmp_sd_mask(M, X, Y, P)					\
-  ((__mmask8) __builtin_ia32_cmpsd_mask ((__v2df)(__m128d)(X),		\
-					 (__v2df)(__m128d)(Y), (int)(P),\
-					 M,_MM_FROUND_CUR_DIRECTION))
-
-#define _mm_cmp_ss_mask(X, Y, P)					\
-  ((__mmask8) __builtin_ia32_cmpss_mask ((__v4sf)(__m128)(X),		\
-					 (__v4sf)(__m128)(Y), (int)(P), \
-					 (__mmask8)-1,_MM_FROUND_CUR_DIRECTION))
-
-#define _mm_mask_cmp_ss_mask(M, X, Y, P)					\
-  ((__mmask8) __builtin_ia32_cmpss_mask ((__v4sf)(__m128)(X),		\
-					 (__v4sf)(__m128)(Y), (int)(P), \
-					 M,_MM_FROUND_CUR_DIRECTION))
 #endif
 
 extern __inline __mmask8
@@ -16493,9 +16572,9 @@ _mm512_mask_reduce_max_pd (__mmask8 __U, __m512d __A)
 
 #undef __MM512_REDUCE_OP
 
-#ifdef __DISABLE_AVX512F__
-#undef __DISABLE_AVX512F__
+#ifdef __DISABLE_AVX512F_512__
+#undef __DISABLE_AVX512F_512__
 #pragma GCC pop_options
-#endif /* __DISABLE_AVX512F__ */
+#endif /* __DISABLE_AVX512F_512__ */
 
 #endif /* _AVX512FINTRIN_H_INCLUDED */
-- 
2.31.1
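
For readers tracing the last hunk above: the renamed
__DISABLE_AVX512F_512__ pop guard pairs with a push_options preamble at
the head of the 512-bit section of avx512fintrin.h.  A minimal sketch of
that pairing, assuming the preamble mirrors the pop shown above (the
exact target string and the __EVEX512__ macro are assumptions, not
quoted from the patch):

  /* Assumed preamble: open a region where both AVX512F and the 512-bit
     EVEX registers/masks are enabled for the intrins defined inside.  */
  #if !defined (__AVX512F__) || !defined (__EVEX512__)
  #pragma GCC push_options
  #pragma GCC target ("avx512f,evex512")
  #define __DISABLE_AVX512F_512__
  #endif

  /* ... 512-bit intrins such as _mm512_cvtps_epi32 ... */

  /* The pop side, matching the hunk above: restore the caller's ISA
     options once the 512-bit section ends.  */
  #ifdef __DISABLE_AVX512F_512__
  #undef __DISABLE_AVX512F_512__
  #pragma GCC pop_options
  #endif /* __DISABLE_AVX512F_512__ */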



* [PATCH 03/18] [PATCH 2/5] Push evex512 target for 512 bit intrins
  2023-09-21  7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
  2023-09-21  7:19 ` [PATCH 01/18] Initial support for -mevex512 Hu, Lin1
  2023-09-21  7:19 ` [PATCH 02/18] [PATCH 1/5] Push evex512 target for 512 bit intrins Hu, Lin1
@ 2023-09-21  7:19 ` Hu, Lin1
  2023-09-21  7:19 ` [PATCH 04/18] [PATCH 3/5] " Hu, Lin1
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21  7:19 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang

From: Haochen Jiang <haochen.jiang@intel.com>

gcc/ChangeLog:

	* config/i386/avx512dqintrin.h: Add evex512 target for 512 bit
	intrins.
---
 gcc/config/i386/avx512dqintrin.h | 1840 +++++++++++++++---------------
 1 file changed, 926 insertions(+), 914 deletions(-)
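
As a usage sketch of what the new guards mean for users (a hedged
example, not taken from the patch itself; the target attribute string
is an assumption based on the option name):

  #include <immintrin.h>

  /* Expected to keep working under -mavx512dq -mno-evex512: the 8-bit
     mask intrins do not touch 512-bit state.  */
  __mmask8
  f (__mmask8 a, __mmask8 b)
  {
    return _kandn_mask8 (a, b);
  }

  /* A 512-bit intrin from this file; with -mno-evex512 on the command
     line it would presumably need evex512 re-enabled per function.  */
  __attribute__ ((target ("avx512dq,evex512")))
  __m512d
  g (__m128d x)
  {
    return _mm512_broadcast_f64x2 (x);
  }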

diff --git a/gcc/config/i386/avx512dqintrin.h b/gcc/config/i386/avx512dqintrin.h
index 93900a0b5c7..b6a1d499e25 100644
--- a/gcc/config/i386/avx512dqintrin.h
+++ b/gcc/config/i386/avx512dqintrin.h
@@ -184,1275 +184,1426 @@ _kandn_mask8 (__mmask8 __A, __mmask8 __B)
   return (__mmask8) __builtin_ia32_kandnqi ((__mmask8) __A, (__mmask8) __B);
 }
 
-extern __inline __m512d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcast_f64x2 (__m128d __A)
-{
-  return (__m512d)
-	 __builtin_ia32_broadcastf64x2_512_mask ((__v2df) __A,
-						 _mm512_undefined_pd (),
-						 (__mmask8) -1);
-}
-
-extern __inline __m512d
+#ifdef __OPTIMIZE__
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcast_f64x2 (__m512d __O, __mmask8 __M, __m128d __A)
+_kshiftli_mask8 (__mmask8 __A, unsigned int __B)
 {
-  return (__m512d) __builtin_ia32_broadcastf64x2_512_mask ((__v2df)
-							   __A,
-							   (__v8df)
-							   __O, __M);
+  return (__mmask8) __builtin_ia32_kshiftliqi ((__mmask8) __A, (__mmask8) __B);
 }
 
-extern __inline __m512d
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcast_f64x2 (__mmask8 __M, __m128d __A)
+_kshiftri_mask8 (__mmask8 __A, unsigned int __B)
 {
-  return (__m512d) __builtin_ia32_broadcastf64x2_512_mask ((__v2df)
-							   __A,
-							   (__v8df)
-							   _mm512_setzero_ps (),
-							   __M);
+  return (__mmask8) __builtin_ia32_kshiftriqi ((__mmask8) __A, (__mmask8) __B);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcast_i64x2 (__m128i __A)
+_mm_reduce_sd (__m128d __A, __m128d __B, int __C)
 {
-  return (__m512i)
-	 __builtin_ia32_broadcasti64x2_512_mask ((__v2di) __A,
-						 _mm512_undefined_epi32 (),
+  return (__m128d) __builtin_ia32_reducesd_mask ((__v2df) __A,
+						 (__v2df) __B, __C,
+						 (__v2df) _mm_setzero_pd (),
 						 (__mmask8) -1);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcast_i64x2 (__m512i __O, __mmask8 __M, __m128i __A)
+_mm_reduce_round_sd (__m128d __A, __m128d __B, int __C, const int __R)
 {
-  return (__m512i) __builtin_ia32_broadcasti64x2_512_mask ((__v2di)
-							   __A,
-							   (__v8di)
-							   __O, __M);
+  return (__m128d) __builtin_ia32_reducesd_mask_round ((__v2df) __A,
+						       (__v2df) __B, __C,
+						       (__v2df)
+						       _mm_setzero_pd (),
+						       (__mmask8) -1, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcast_i64x2 (__mmask8 __M, __m128i __A)
+_mm_mask_reduce_sd (__m128d __W,  __mmask8 __U, __m128d __A,
+		    __m128d __B, int __C)
 {
-  return (__m512i) __builtin_ia32_broadcasti64x2_512_mask ((__v2di)
-							   __A,
-							   (__v8di)
-							   _mm512_setzero_si512 (),
-							   __M);
+  return (__m128d) __builtin_ia32_reducesd_mask ((__v2df) __A,
+						 (__v2df) __B, __C,
+						 (__v2df) __W,
+						 (__mmask8) __U);
 }
 
-extern __inline __m512
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcast_f32x2 (__m128 __A)
+_mm_mask_reduce_round_sd (__m128d __W,  __mmask8 __U, __m128d __A,
+			  __m128d __B, int __C, const int __R)
 {
-  return (__m512)
-	 __builtin_ia32_broadcastf32x2_512_mask ((__v4sf) __A,
-						 (__v16sf)_mm512_undefined_ps (),
-						 (__mmask16) -1);
+  return (__m128d) __builtin_ia32_reducesd_mask_round ((__v2df) __A,
+						       (__v2df) __B, __C,
+						       (__v2df) __W,
+						       __U, __R);
 }
 
-extern __inline __m512
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcast_f32x2 (__m512 __O, __mmask16 __M, __m128 __A)
+_mm_maskz_reduce_sd (__mmask8 __U, __m128d __A, __m128d __B, int __C)
 {
-  return (__m512) __builtin_ia32_broadcastf32x2_512_mask ((__v4sf) __A,
-							  (__v16sf)
-							  __O, __M);
+  return (__m128d) __builtin_ia32_reducesd_mask ((__v2df) __A,
+						 (__v2df) __B, __C,
+						 (__v2df) _mm_setzero_pd (),
+						 (__mmask8) __U);
 }
 
-extern __inline __m512
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcast_f32x2 (__mmask16 __M, __m128 __A)
+_mm_maskz_reduce_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
+			   int __C, const int __R)
 {
-  return (__m512) __builtin_ia32_broadcastf32x2_512_mask ((__v4sf) __A,
-							  (__v16sf)
-							  _mm512_setzero_ps (),
-							  __M);
+  return (__m128d) __builtin_ia32_reducesd_mask_round ((__v2df) __A,
+						       (__v2df) __B, __C,
+						       (__v2df)
+						       _mm_setzero_pd (),
+						       __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcast_i32x2 (__m128i __A)
+_mm_reduce_ss (__m128 __A, __m128 __B, int __C)
 {
-  return (__m512i)
-	 __builtin_ia32_broadcasti32x2_512_mask ((__v4si) __A,
-						 (__v16si)
-						 _mm512_undefined_epi32 (),
-						 (__mmask16) -1);
+  return (__m128) __builtin_ia32_reducess_mask ((__v4sf) __A,
+						(__v4sf) __B, __C,
+						(__v4sf) _mm_setzero_ps (),
+						(__mmask8) -1);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcast_i32x2 (__m512i __O, __mmask16 __M, __m128i __A)
+_mm_reduce_round_ss (__m128 __A, __m128 __B, int __C, const int __R)
 {
-  return (__m512i) __builtin_ia32_broadcasti32x2_512_mask ((__v4si)
-							   __A,
-							   (__v16si)
-							   __O, __M);
+  return (__m128) __builtin_ia32_reducess_mask_round ((__v4sf) __A,
+						      (__v4sf) __B, __C,
+						      (__v4sf)
+						      _mm_setzero_ps (),
+						      (__mmask8) -1, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcast_i32x2 (__mmask16 __M, __m128i __A)
+_mm_mask_reduce_ss (__m128 __W,  __mmask8 __U, __m128 __A,
+		    __m128 __B, int __C)
 {
-  return (__m512i) __builtin_ia32_broadcasti32x2_512_mask ((__v4si)
-							   __A,
-							   (__v16si)
-							   _mm512_setzero_si512 (),
-							   __M);
+  return (__m128) __builtin_ia32_reducess_mask ((__v4sf) __A,
+						(__v4sf) __B, __C,
+						(__v4sf) __W,
+						(__mmask8) __U);
 }
 
-extern __inline __m512
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcast_f32x8 (__m256 __A)
+_mm_mask_reduce_round_ss (__m128 __W,  __mmask8 __U, __m128 __A,
+			  __m128 __B, int __C, const int __R)
 {
-  return (__m512)
-	 __builtin_ia32_broadcastf32x8_512_mask ((__v8sf) __A,
-						 _mm512_undefined_ps (),
-						 (__mmask16) -1);
+  return (__m128) __builtin_ia32_reducess_mask_round ((__v4sf) __A,
+						      (__v4sf) __B, __C,
+						      (__v4sf) __W,
+						      __U, __R);
 }
 
-extern __inline __m512
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcast_f32x8 (__m512 __O, __mmask16 __M, __m256 __A)
+_mm_maskz_reduce_ss (__mmask8 __U, __m128 __A, __m128 __B, int __C)
 {
-  return (__m512) __builtin_ia32_broadcastf32x8_512_mask ((__v8sf) __A,
-							  (__v16sf)__O,
-							  __M);
+  return (__m128) __builtin_ia32_reducess_mask ((__v4sf) __A,
+						(__v4sf) __B, __C,
+						(__v4sf) _mm_setzero_ps (),
+						(__mmask8) __U);
 }
 
-extern __inline __m512
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcast_f32x8 (__mmask16 __M, __m256 __A)
+_mm_maskz_reduce_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
+			   int __C, const int __R)
 {
-  return (__m512) __builtin_ia32_broadcastf32x8_512_mask ((__v8sf) __A,
-							  (__v16sf)
-							  _mm512_setzero_ps (),
-							  __M);
+  return (__m128) __builtin_ia32_reducess_mask_round ((__v4sf) __A,
+						      (__v4sf) __B, __C,
+						      (__v4sf)
+						      _mm_setzero_ps (),
+						      __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcast_i32x8 (__m256i __A)
+_mm_range_sd (__m128d __A, __m128d __B, int __C)
 {
-  return (__m512i)
-	 __builtin_ia32_broadcasti32x8_512_mask ((__v8si) __A,
-						 (__v16si)
-						 _mm512_undefined_epi32 (),
-						 (__mmask16) -1);
+  return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
+						   (__v2df) __B, __C,
+						   (__v2df)
+						   _mm_setzero_pd (),
+						   (__mmask8) -1,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcast_i32x8 (__m512i __O, __mmask16 __M, __m256i __A)
+_mm_mask_range_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B, int __C)
 {
-  return (__m512i) __builtin_ia32_broadcasti32x8_512_mask ((__v8si)
-							   __A,
-							   (__v16si)__O,
-							   __M);
+  return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
+						   (__v2df) __B, __C,
+						   (__v2df) __W,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcast_i32x8 (__mmask16 __M, __m256i __A)
+_mm_maskz_range_sd (__mmask8 __U, __m128d __A, __m128d __B, int __C)
 {
-  return (__m512i) __builtin_ia32_broadcasti32x8_512_mask ((__v8si)
-							   __A,
-							   (__v16si)
-							   _mm512_setzero_si512 (),
-							   __M);
+  return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
+						   (__v2df) __B, __C,
+						   (__v2df)
+						   _mm_setzero_pd (),
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mullo_epi64 (__m512i __A, __m512i __B)
+_mm_range_ss (__m128 __A, __m128 __B, int __C)
 {
-  return (__m512i) ((__v8du) __A * (__v8du) __B);
+  return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
+						  (__v4sf) __B, __C,
+						  (__v4sf)
+						  _mm_setzero_ps (),
+						  (__mmask8) -1,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mullo_epi64 (__m512i __W, __mmask8 __U, __m512i __A,
-			 __m512i __B)
+_mm_mask_range_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B, int __C)
 {
-  return (__m512i) __builtin_ia32_pmullq512_mask ((__v8di) __A,
-						  (__v8di) __B,
-						  (__v8di) __W,
-						  (__mmask8) __U);
+  return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
+						  (__v4sf) __B, __C,
+						  (__v4sf) __W,
+						  (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mullo_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
+_mm_maskz_range_ss (__mmask8 __U, __m128 __A, __m128 __B, int __C)
 {
-  return (__m512i) __builtin_ia32_pmullq512_mask ((__v8di) __A,
-						  (__v8di) __B,
-						  (__v8di)
-						  _mm512_setzero_si512 (),
-						  (__mmask8) __U);
+  return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
+						  (__v4sf) __B, __C,
+						  (__v4sf)
+						  _mm_setzero_ps (),
+						  (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_xor_pd (__m512d __A, __m512d __B)
+_mm_range_round_sd (__m128d __A, __m128d __B, int __C, const int __R)
 {
-  return (__m512d) __builtin_ia32_xorpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df)
-						 _mm512_setzero_pd (),
-						 (__mmask8) -1);
+  return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
+						   (__v2df) __B, __C,
+						   (__v2df)
+						   _mm_setzero_pd (),
+						   (__mmask8) -1, __R);
 }
 
-extern __inline __m512d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_xor_pd (__m512d __W, __mmask8 __U, __m512d __A,
-		    __m512d __B)
+_mm_mask_range_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
+			 int __C, const int __R)
 {
-  return (__m512d) __builtin_ia32_xorpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df) __W,
-						 (__mmask8) __U);
+  return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
+						   (__v2df) __B, __C,
+						   (__v2df) __W,
+						   (__mmask8) __U, __R);
 }
 
-extern __inline __m512d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_xor_pd (__mmask8 __U, __m512d __A, __m512d __B)
+_mm_maskz_range_round_sd (__mmask8 __U, __m128d __A, __m128d __B, int __C,
+			  const int __R)
 {
-  return (__m512d) __builtin_ia32_xorpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df)
-						 _mm512_setzero_pd (),
-						 (__mmask8) __U);
+  return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
+						   (__v2df) __B, __C,
+						   (__v2df)
+						   _mm_setzero_pd (),
+						   (__mmask8) __U, __R);
 }
 
-extern __inline __m512
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_xor_ps (__m512 __A, __m512 __B)
+_mm_range_round_ss (__m128 __A, __m128 __B, int __C, const int __R)
 {
-  return (__m512) __builtin_ia32_xorps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf)
-						_mm512_setzero_ps (),
-						(__mmask16) -1);
+  return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
+						  (__v4sf) __B, __C,
+						  (__v4sf)
+						  _mm_setzero_ps (),
+						  (__mmask8) -1, __R);
 }
 
-extern __inline __m512
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_xor_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
+_mm_mask_range_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
+			 int __C, const int __R)
 {
-  return (__m512) __builtin_ia32_xorps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf) __W,
-						(__mmask16) __U);
+  return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
+						  (__v4sf) __B, __C,
+						  (__v4sf) __W,
+						  (__mmask8) __U, __R);
 }
 
-extern __inline __m512
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_xor_ps (__mmask16 __U, __m512 __A, __m512 __B)
+_mm_maskz_range_round_ss (__mmask8 __U, __m128 __A, __m128 __B, int __C,
+			  const int __R)
 {
-  return (__m512) __builtin_ia32_xorps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf)
-						_mm512_setzero_ps (),
-						(__mmask16) __U);
+  return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
+						  (__v4sf) __B, __C,
+						  (__v4sf)
+						  _mm_setzero_ps (),
+						  (__mmask8) __U, __R);
 }
 
-extern __inline __m512d
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_or_pd (__m512d __A, __m512d __B)
+_mm_fpclass_ss_mask (__m128 __A, const int __imm)
 {
-  return (__m512d) __builtin_ia32_orpd512_mask ((__v8df) __A,
-						(__v8df) __B,
-						(__v8df)
-						_mm512_setzero_pd (),
-						(__mmask8) -1);
+  return (__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) __A, __imm,
+						   (__mmask8) -1);
 }
 
-extern __inline __m512d
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_or_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
+_mm_fpclass_sd_mask (__m128d __A, const int __imm)
 {
-  return (__m512d) __builtin_ia32_orpd512_mask ((__v8df) __A,
-						(__v8df) __B,
-						(__v8df) __W,
-						(__mmask8) __U);
+  return (__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) __A, __imm,
+						   (__mmask8) -1);
 }
 
-extern __inline __m512d
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_or_pd (__mmask8 __U, __m512d __A, __m512d __B)
+_mm_mask_fpclass_ss_mask (__mmask8 __U, __m128 __A, const int __imm)
 {
-  return (__m512d) __builtin_ia32_orpd512_mask ((__v8df) __A,
-						(__v8df) __B,
-						(__v8df)
-						_mm512_setzero_pd (),
-						(__mmask8) __U);
+  return (__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) __A, __imm, __U);
 }
 
-extern __inline __m512
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_or_ps (__m512 __A, __m512 __B)
+_mm_mask_fpclass_sd_mask (__mmask8 __U, __m128d __A, const int __imm)
 {
-  return (__m512) __builtin_ia32_orps512_mask ((__v16sf) __A,
-					       (__v16sf) __B,
-					       (__v16sf)
-					       _mm512_setzero_ps (),
-					       (__mmask16) -1);
+  return (__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) __A, __imm, __U);
 }
 
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_or_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
-{
-  return (__m512) __builtin_ia32_orps512_mask ((__v16sf) __A,
-					       (__v16sf) __B,
-					       (__v16sf) __W,
-					       (__mmask16) __U);
-}
+#else
+#define _kshiftli_mask8(X, Y)                                           \
+  ((__mmask8) __builtin_ia32_kshiftliqi ((__mmask8)(X), (__mmask8)(Y)))
 
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_or_ps (__mmask16 __U, __m512 __A, __m512 __B)
-{
-  return (__m512) __builtin_ia32_orps512_mask ((__v16sf) __A,
-					       (__v16sf) __B,
-					       (__v16sf)
-					       _mm512_setzero_ps (),
-					       (__mmask16) __U);
-}
+#define _kshiftri_mask8(X, Y)                                           \
+  ((__mmask8) __builtin_ia32_kshiftriqi ((__mmask8)(X), (__mmask8)(Y)))
+
+#define _mm_range_sd(A, B, C)						 \
+  ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
+    (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), 	 \
+    (__mmask8) -1, _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_mask_range_sd(W, U, A, B, C)				 \
+  ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
+    (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W), 		 \
+    (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_maskz_range_sd(U, A, B, C)					 \
+  ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
+    (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), 	 \
+    (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_range_ss(A, B, C)						\
+  ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A),	\
+    (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (),		\
+    (__mmask8) -1, _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_mask_range_ss(W, U, A, B, C)				\
+  ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A),	\
+    (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W),			\
+    (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_maskz_range_ss(U, A, B, C)					\
+  ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A),	\
+    (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (),		\
+    (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_range_round_sd(A, B, C, R)					 \
+  ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
+    (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (),		 \
+    (__mmask8) -1, (R)))
+
+#define _mm_mask_range_round_sd(W, U, A, B, C, R)			 \
+  ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
+    (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W),		 \
+    (__mmask8)(U), (R)))
+
+#define _mm_maskz_range_round_sd(U, A, B, C, R)				 \
+  ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
+    (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (),		 \
+    (__mmask8)(U), (R)))
+
+#define _mm_range_round_ss(A, B, C, R)					\
+  ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A),	\
+    (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (),		\
+    (__mmask8) -1, (R)))
+
+#define _mm_mask_range_round_ss(W, U, A, B, C, R)			\
+  ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A),	\
+    (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W),			\
+    (__mmask8)(U), (R)))
+
+#define _mm_maskz_range_round_ss(U, A, B, C, R)				\
+  ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A),	\
+    (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (),		\
+    (__mmask8)(U), (R)))
+
+#define _mm_fpclass_ss_mask(X, C)					\
+  ((__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) (__m128) (X),	\
+					     (int) (C), (__mmask8) (-1)))
+
+#define _mm_fpclass_sd_mask(X, C)					\
+  ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X),	\
+					     (int) (C), (__mmask8) (-1)))
+
+#define _mm_mask_fpclass_ss_mask(X, C, U)				\
+  ((__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) (__m128) (X),	\
+					     (int) (C), (__mmask8) (U)))
+
+#define _mm_mask_fpclass_sd_mask(X, C, U)				\
+  ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X),	\
+					     (int) (C), (__mmask8) (U)))
+
+#define _mm_reduce_sd(A, B, C)						\
+  ((__m128d) __builtin_ia32_reducesd_mask ((__v2df)(__m128d)(A),	\
+    (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (),		\
+    (__mmask8)-1))
+
+#define _mm_mask_reduce_sd(W, U, A, B, C)				\
+  ((__m128d) __builtin_ia32_reducesd_mask ((__v2df)(__m128d)(A),	\
+    (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W), (__mmask8)(U)))
+
+#define _mm_maskz_reduce_sd(U, A, B, C)					\
+  ((__m128d) __builtin_ia32_reducesd_mask ((__v2df)(__m128d)(A),	\
+    (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (),		\
+    (__mmask8)(U)))
+
+#define _mm_reduce_round_sd(A, B, C, R)				       \
+  ((__m128d) __builtin_ia32_reducesd_mask_round ((__v2df)(__m128d)(A), \
+    (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (),	       \
+    (__mmask8)(-1), (int)(R)))
+
+#define _mm_mask_reduce_round_sd(W, U, A, B, C, R)		       \
+  ((__m128d) __builtin_ia32_reducesd_mask_round ((__v2df)(__m128d)(A), \
+    (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W),	       \
+    (__mmask8)(U), (int)(R)))
+
+#define _mm_maskz_reduce_round_sd(U, A, B, C, R)		       \
+  ((__m128d) __builtin_ia32_reducesd_mask_round ((__v2df)(__m128d)(A), \
+    (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (),	       \
+    (__mmask8)(U), (int)(R)))
+
+#define _mm_reduce_ss(A, B, C)						\
+  ((__m128) __builtin_ia32_reducess_mask ((__v4sf)(__m128)(A),		\
+    (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (),		\
+    (__mmask8)-1))
+
+#define _mm_mask_reduce_ss(W, U, A, B, C)				\
+  ((__m128) __builtin_ia32_reducess_mask ((__v4sf)(__m128)(A),		\
+    (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W), (__mmask8)(U)))
+
+#define _mm_maskz_reduce_ss(U, A, B, C)					\
+  ((__m128) __builtin_ia32_reducess_mask ((__v4sf)(__m128)(A),		\
+    (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (),		\
+    (__mmask8)(U)))
+
+#define _mm_reduce_round_ss(A, B, C, R)				       \
+  ((__m128) __builtin_ia32_reducess_mask_round ((__v4sf)(__m128)(A),   \
+    (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (),	       \
+    (__mmask8)(-1), (int)(R)))
+
+#define _mm_mask_reduce_round_ss(W, U, A, B, C, R)		       \
+  ((__m128) __builtin_ia32_reducess_mask_round ((__v4sf)(__m128)(A),   \
+    (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W),		       \
+    (__mmask8)(U), (int)(R)))
+
+#define _mm_maskz_reduce_round_ss(U, A, B, C, R)		       \
+  ((__m128) __builtin_ia32_reducess_mask_round ((__v4sf)(__m128)(A),  \
+    (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (),	       \
+    (__mmask8)(U), (int)(R)))
+
+#endif
+
+#ifdef __DISABLE_AVX512DQ__
+#undef __DISABLE_AVX512DQ__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX512DQ__ */
+
+#if !defined (__AVX512DQ__) || !defined (__EVEX512__)
+#pragma GCC push_options
+#pragma GCC target("avx512dq,evex512")
+#define __DISABLE_AVX512DQ_512__
+#endif /* __AVX512DQ_512__ */
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_and_pd (__m512d __A, __m512d __B)
+_mm512_broadcast_f64x2 (__m128d __A)
 {
-  return (__m512d) __builtin_ia32_andpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df)
-						 _mm512_setzero_pd (),
+  return (__m512d)
+	 __builtin_ia32_broadcastf64x2_512_mask ((__v2df) __A,
+						 _mm512_undefined_pd (),
 						 (__mmask8) -1);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_and_pd (__m512d __W, __mmask8 __U, __m512d __A,
-		    __m512d __B)
+_mm512_mask_broadcast_f64x2 (__m512d __O, __mmask8 __M, __m128d __A)
 {
-  return (__m512d) __builtin_ia32_andpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df) __W,
-						 (__mmask8) __U);
+  return (__m512d) __builtin_ia32_broadcastf64x2_512_mask ((__v2df)
+							   __A,
+							   (__v8df)
+							   __O, __M);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_and_pd (__mmask8 __U, __m512d __A, __m512d __B)
+_mm512_maskz_broadcast_f64x2 (__mmask8 __M, __m128d __A)
 {
-  return (__m512d) __builtin_ia32_andpd512_mask ((__v8df) __A,
-						 (__v8df) __B,
-						 (__v8df)
-						 _mm512_setzero_pd (),
-						 (__mmask8) __U);
+  return (__m512d) __builtin_ia32_broadcastf64x2_512_mask ((__v2df)
+							   __A,
+							   (__v8df)
+							   _mm512_setzero_ps (),
+							   __M);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_and_ps (__m512 __A, __m512 __B)
+_mm512_broadcast_i64x2 (__m128i __A)
 {
-  return (__m512) __builtin_ia32_andps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf)
-						_mm512_setzero_ps (),
-						(__mmask16) -1);
+  return (__m512i)
+	 __builtin_ia32_broadcasti64x2_512_mask ((__v2di) __A,
+						 _mm512_undefined_epi32 (),
+						 (__mmask8) -1);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_and_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
+_mm512_mask_broadcast_i64x2 (__m512i __O, __mmask8 __M, __m128i __A)
 {
-  return (__m512) __builtin_ia32_andps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf) __W,
-						(__mmask16) __U);
+  return (__m512i) __builtin_ia32_broadcasti64x2_512_mask ((__v2di)
+							   __A,
+							   (__v8di)
+							   __O, __M);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_and_ps (__mmask16 __U, __m512 __A, __m512 __B)
+_mm512_maskz_broadcast_i64x2 (__mmask8 __M, __m128i __A)
 {
-  return (__m512) __builtin_ia32_andps512_mask ((__v16sf) __A,
-						(__v16sf) __B,
-						(__v16sf)
-						_mm512_setzero_ps (),
-						(__mmask16) __U);
+  return (__m512i) __builtin_ia32_broadcasti64x2_512_mask ((__v2di)
+							   __A,
+							   (__v8di)
+							   _mm512_setzero_si512 (),
+							   __M);
 }
 
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_andnot_pd (__m512d __A, __m512d __B)
+_mm512_broadcast_f32x2 (__m128 __A)
 {
-  return (__m512d) __builtin_ia32_andnpd512_mask ((__v8df) __A,
-						  (__v8df) __B,
-						  (__v8df)
-						  _mm512_setzero_pd (),
-						  (__mmask8) -1);
+  return (__m512)
+	 __builtin_ia32_broadcastf32x2_512_mask ((__v4sf) __A,
+						 (__v16sf)_mm512_undefined_ps (),
+						 (__mmask16) -1);
 }
 
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_andnot_pd (__m512d __W, __mmask8 __U, __m512d __A,
-		       __m512d __B)
+_mm512_mask_broadcast_f32x2 (__m512 __O, __mmask16 __M, __m128 __A)
 {
-  return (__m512d) __builtin_ia32_andnpd512_mask ((__v8df) __A,
-						  (__v8df) __B,
-						  (__v8df) __W,
-						  (__mmask8) __U);
+  return (__m512) __builtin_ia32_broadcastf32x2_512_mask ((__v4sf) __A,
+							  (__v16sf)
+							  __O, __M);
 }
 
-extern __inline __m512d
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_andnot_pd (__mmask8 __U, __m512d __A, __m512d __B)
+_mm512_maskz_broadcast_f32x2 (__mmask16 __M, __m128 __A)
 {
-  return (__m512d) __builtin_ia32_andnpd512_mask ((__v8df) __A,
-						  (__v8df) __B,
-						  (__v8df)
-						  _mm512_setzero_pd (),
-						  (__mmask8) __U);
+  return (__m512) __builtin_ia32_broadcastf32x2_512_mask ((__v4sf) __A,
+							  (__v16sf)
+							  _mm512_setzero_ps (),
+							  __M);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_andnot_ps (__m512 __A, __m512 __B)
+_mm512_broadcast_i32x2 (__m128i __A)
 {
-  return (__m512) __builtin_ia32_andnps512_mask ((__v16sf) __A,
-						 (__v16sf) __B,
-						 (__v16sf)
-						 _mm512_setzero_ps (),
+  return (__m512i)
+	 __builtin_ia32_broadcasti32x2_512_mask ((__v4si) __A,
+						 (__v16si)
+						 _mm512_undefined_epi32 (),
 						 (__mmask16) -1);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_andnot_ps (__m512 __W, __mmask16 __U, __m512 __A,
-		       __m512 __B)
+_mm512_mask_broadcast_i32x2 (__m512i __O, __mmask16 __M, __m128i __A)
 {
-  return (__m512) __builtin_ia32_andnps512_mask ((__v16sf) __A,
-						 (__v16sf) __B,
-						 (__v16sf) __W,
-						 (__mmask16) __U);
+  return (__m512i) __builtin_ia32_broadcasti32x2_512_mask ((__v4si)
+							   __A,
+							   (__v16si)
+							   __O, __M);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_andnot_ps (__mmask16 __U, __m512 __A, __m512 __B)
+_mm512_maskz_broadcast_i32x2 (__mmask16 __M, __m128i __A)
 {
-  return (__m512) __builtin_ia32_andnps512_mask ((__v16sf) __A,
-						 (__v16sf) __B,
-						 (__v16sf)
-						 _mm512_setzero_ps (),
-						 (__mmask16) __U);
+  return (__m512i) __builtin_ia32_broadcasti32x2_512_mask ((__v4si)
+							   __A,
+							   (__v16si)
+							   _mm512_setzero_si512 (),
+							   __M);
 }
 
-extern __inline __mmask16
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_movepi32_mask (__m512i __A)
+_mm512_broadcast_f32x8 (__m256 __A)
 {
-  return (__mmask16) __builtin_ia32_cvtd2mask512 ((__v16si) __A);
+  return (__m512)
+	 __builtin_ia32_broadcastf32x8_512_mask ((__v8sf) __A,
+						 _mm512_undefined_ps (),
+						 (__mmask16) -1);
 }
 
-extern __inline __mmask8
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_movepi64_mask (__m512i __A)
+_mm512_mask_broadcast_f32x8 (__m512 __O, __mmask16 __M, __m256 __A)
 {
-  return (__mmask8) __builtin_ia32_cvtq2mask512 ((__v8di) __A);
+  return (__m512) __builtin_ia32_broadcastf32x8_512_mask ((__v8sf) __A,
+							  (__v16sf)__O,
+							  __M);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_movm_epi32 (__mmask16 __A)
+_mm512_maskz_broadcast_f32x8 (__mmask16 __M, __m256 __A)
 {
-  return (__m512i) __builtin_ia32_cvtmask2d512 (__A);
+  return (__m512) __builtin_ia32_broadcastf32x8_512_mask ((__v8sf) __A,
+							  (__v16sf)
+							  _mm512_setzero_ps (),
+							  __M);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_movm_epi64 (__mmask8 __A)
+_mm512_broadcast_i32x8 (__m256i __A)
 {
-  return (__m512i) __builtin_ia32_cvtmask2q512 (__A);
+  return (__m512i)
+	 __builtin_ia32_broadcasti32x8_512_mask ((__v8si) __A,
+						 (__v16si)
+						 _mm512_undefined_epi32 (),
+						 (__mmask16) -1);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttpd_epi64 (__m512d __A)
+_mm512_mask_broadcast_i32x8 (__m512i __O, __mmask16 __M, __m256i __A)
 {
-  return (__m512i) __builtin_ia32_cvttpd2qq512_mask ((__v8df) __A,
-						     (__v8di)
-						     _mm512_setzero_si512 (),
-						     (__mmask8) -1,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_broadcasti32x8_512_mask ((__v8si)
+							   __A,
+							   (__v16si)__O,
+							   __M);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttpd_epi64 (__m512i __W, __mmask8 __U, __m512d __A)
+_mm512_maskz_broadcast_i32x8 (__mmask16 __M, __m256i __A)
 {
-  return (__m512i) __builtin_ia32_cvttpd2qq512_mask ((__v8df) __A,
-						     (__v8di) __W,
-						     (__mmask8) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_broadcasti32x8_512_mask ((__v8si)
+							   __A,
+							   (__v16si)
+							   _mm512_setzero_si512 (),
+							   __M);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttpd_epi64 (__mmask8 __U, __m512d __A)
+_mm512_mullo_epi64 (__m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_cvttpd2qq512_mask ((__v8df) __A,
-						     (__v8di)
-						     _mm512_setzero_si512 (),
-						     (__mmask8) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) ((__v8du) __A * (__v8du) __B);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttpd_epu64 (__m512d __A)
+_mm512_mask_mullo_epi64 (__m512i __W, __mmask8 __U, __m512i __A,
+			 __m512i __B)
 {
-  return (__m512i) __builtin_ia32_cvttpd2uqq512_mask ((__v8df) __A,
-						      (__v8di)
-						      _mm512_setzero_si512 (),
-						      (__mmask8) -1,
-						      _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_pmullq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di) __W,
+						  (__mmask8) __U);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttpd_epu64 (__m512i __W, __mmask8 __U, __m512d __A)
+_mm512_maskz_mullo_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
 {
-  return (__m512i) __builtin_ia32_cvttpd2uqq512_mask ((__v8df) __A,
-						      (__v8di) __W,
-						      (__mmask8) __U,
-						      _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_pmullq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di)
+						  _mm512_setzero_si512 (),
+						  (__mmask8) __U);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttpd_epu64 (__mmask8 __U, __m512d __A)
+_mm512_xor_pd (__m512d __A, __m512d __B)
 {
-  return (__m512i) __builtin_ia32_cvttpd2uqq512_mask ((__v8df) __A,
-						      (__v8di)
-						      _mm512_setzero_si512 (),
-						      (__mmask8) __U,
-						      _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_xorpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df)
+						 _mm512_setzero_pd (),
+						 (__mmask8) -1);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttps_epi64 (__m256 __A)
+_mm512_mask_xor_pd (__m512d __W, __mmask8 __U, __m512d __A,
+		    __m512d __B)
 {
-  return (__m512i) __builtin_ia32_cvttps2qq512_mask ((__v8sf) __A,
-						     (__v8di)
-						     _mm512_setzero_si512 (),
-						     (__mmask8) -1,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_xorpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df) __W,
+						 (__mmask8) __U);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttps_epi64 (__m512i __W, __mmask8 __U, __m256 __A)
+_mm512_maskz_xor_pd (__mmask8 __U, __m512d __A, __m512d __B)
 {
-  return (__m512i) __builtin_ia32_cvttps2qq512_mask ((__v8sf) __A,
-						     (__v8di) __W,
-						     (__mmask8) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_xorpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df)
+						 _mm512_setzero_pd (),
+						 (__mmask8) __U);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttps_epi64 (__mmask8 __U, __m256 __A)
+_mm512_xor_ps (__m512 __A, __m512 __B)
 {
-  return (__m512i) __builtin_ia32_cvttps2qq512_mask ((__v8sf) __A,
-						     (__v8di)
-						     _mm512_setzero_si512 (),
-						     (__mmask8) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_xorps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf)
+						_mm512_setzero_ps (),
+						(__mmask16) -1);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttps_epu64 (__m256 __A)
+_mm512_mask_xor_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
 {
-  return (__m512i) __builtin_ia32_cvttps2uqq512_mask ((__v8sf) __A,
-						      (__v8di)
-						      _mm512_setzero_si512 (),
-						      (__mmask8) -1,
-						      _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_xorps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf) __W,
+						(__mmask16) __U);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttps_epu64 (__m512i __W, __mmask8 __U, __m256 __A)
+_mm512_maskz_xor_ps (__mmask16 __U, __m512 __A, __m512 __B)
 {
-  return (__m512i) __builtin_ia32_cvttps2uqq512_mask ((__v8sf) __A,
-						      (__v8di) __W,
-						      (__mmask8) __U,
-						      _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_xorps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf)
+						_mm512_setzero_ps (),
+						(__mmask16) __U);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttps_epu64 (__mmask8 __U, __m256 __A)
+_mm512_or_pd (__m512d __A, __m512d __B)
 {
-  return (__m512i) __builtin_ia32_cvttps2uqq512_mask ((__v8sf) __A,
-						      (__v8di)
-						      _mm512_setzero_si512 (),
-						      (__mmask8) __U,
-						      _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_orpd512_mask ((__v8df) __A,
+						(__v8df) __B,
+						(__v8df)
+						_mm512_setzero_pd (),
+						(__mmask8) -1);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtpd_epi64 (__m512d __A)
+_mm512_mask_or_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
 {
-  return (__m512i) __builtin_ia32_cvtpd2qq512_mask ((__v8df) __A,
-						    (__v8di)
-						    _mm512_setzero_si512 (),
-						    (__mmask8) -1,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_orpd512_mask ((__v8df) __A,
+						(__v8df) __B,
+						(__v8df) __W,
+						(__mmask8) __U);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtpd_epi64 (__m512i __W, __mmask8 __U, __m512d __A)
+_mm512_maskz_or_pd (__mmask8 __U, __m512d __A, __m512d __B)
 {
-  return (__m512i) __builtin_ia32_cvtpd2qq512_mask ((__v8df) __A,
-						    (__v8di) __W,
-						    (__mmask8) __U,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_orpd512_mask ((__v8df) __A,
+						(__v8df) __B,
+						(__v8df)
+						_mm512_setzero_pd (),
+						(__mmask8) __U);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtpd_epi64 (__mmask8 __U, __m512d __A)
+_mm512_or_ps (__m512 __A, __m512 __B)
 {
-  return (__m512i) __builtin_ia32_cvtpd2qq512_mask ((__v8df) __A,
-						    (__v8di)
-						    _mm512_setzero_si512 (),
-						    (__mmask8) __U,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_orps512_mask ((__v16sf) __A,
+					       (__v16sf) __B,
+					       (__v16sf)
+					       _mm512_setzero_ps (),
+					       (__mmask16) -1);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtpd_epu64 (__m512d __A)
+_mm512_mask_or_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
 {
-  return (__m512i) __builtin_ia32_cvtpd2uqq512_mask ((__v8df) __A,
-						     (__v8di)
-						     _mm512_setzero_si512 (),
-						     (__mmask8) -1,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_orps512_mask ((__v16sf) __A,
+					       (__v16sf) __B,
+					       (__v16sf) __W,
+					       (__mmask16) __U);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtpd_epu64 (__m512i __W, __mmask8 __U, __m512d __A)
+_mm512_maskz_or_ps (__mmask16 __U, __m512 __A, __m512 __B)
 {
-  return (__m512i) __builtin_ia32_cvtpd2uqq512_mask ((__v8df) __A,
-						     (__v8di) __W,
-						     (__mmask8) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_orps512_mask ((__v16sf) __A,
+					       (__v16sf) __B,
+					       (__v16sf)
+					       _mm512_setzero_ps (),
+					       (__mmask16) __U);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtpd_epu64 (__mmask8 __U, __m512d __A)
+_mm512_and_pd (__m512d __A, __m512d __B)
 {
-  return (__m512i) __builtin_ia32_cvtpd2uqq512_mask ((__v8df) __A,
-						     (__v8di)
-						     _mm512_setzero_si512 (),
-						     (__mmask8) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_andpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df)
+						 _mm512_setzero_pd (),
+						 (__mmask8) -1);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtps_epi64 (__m256 __A)
+_mm512_mask_and_pd (__m512d __W, __mmask8 __U, __m512d __A,
+		    __m512d __B)
 {
-  return (__m512i) __builtin_ia32_cvtps2qq512_mask ((__v8sf) __A,
-						    (__v8di)
-						    _mm512_setzero_si512 (),
-						    (__mmask8) -1,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_andpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df) __W,
+						 (__mmask8) __U);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtps_epi64 (__m512i __W, __mmask8 __U, __m256 __A)
+_mm512_maskz_and_pd (__mmask8 __U, __m512d __A, __m512d __B)
 {
-  return (__m512i) __builtin_ia32_cvtps2qq512_mask ((__v8sf) __A,
-						    (__v8di) __W,
-						    (__mmask8) __U,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_andpd512_mask ((__v8df) __A,
+						 (__v8df) __B,
+						 (__v8df)
+						 _mm512_setzero_pd (),
+						 (__mmask8) __U);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtps_epi64 (__mmask8 __U, __m256 __A)
+_mm512_and_ps (__m512 __A, __m512 __B)
 {
-  return (__m512i) __builtin_ia32_cvtps2qq512_mask ((__v8sf) __A,
-						    (__v8di)
-						    _mm512_setzero_si512 (),
-						    (__mmask8) __U,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_andps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf)
+						_mm512_setzero_ps (),
+						(__mmask16) -1);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtps_epu64 (__m256 __A)
+_mm512_mask_and_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
 {
-  return (__m512i) __builtin_ia32_cvtps2uqq512_mask ((__v8sf) __A,
-						     (__v8di)
-						     _mm512_setzero_si512 (),
-						     (__mmask8) -1,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_andps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf) __W,
+						(__mmask16) __U);
 }
 
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtps_epu64 (__m512i __W, __mmask8 __U, __m256 __A)
+_mm512_maskz_and_ps (__mmask16 __U, __m512 __A, __m512 __B)
 {
-  return (__m512i) __builtin_ia32_cvtps2uqq512_mask ((__v8sf) __A,
-						     (__v8di) __W,
-						     (__mmask8) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_andps512_mask ((__v16sf) __A,
+						(__v16sf) __B,
+						(__v16sf)
+						_mm512_setzero_ps (),
+						(__mmask16) __U);
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtps_epu64 (__mmask8 __U, __m256 __A)
+_mm512_andnot_pd (__m512d __A, __m512d __B)
 {
-  return (__m512i) __builtin_ia32_cvtps2uqq512_mask ((__v8sf) __A,
-						     (__v8di)
-						     _mm512_setzero_si512 (),
-						     (__mmask8) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_andnpd512_mask ((__v8df) __A,
+						  (__v8df) __B,
+						  (__v8df)
+						  _mm512_setzero_pd (),
+						  (__mmask8) -1);
 }
 
-extern __inline __m256
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi64_ps (__m512i __A)
+_mm512_mask_andnot_pd (__m512d __W, __mmask8 __U, __m512d __A,
+		       __m512d __B)
 {
-  return (__m256) __builtin_ia32_cvtqq2ps512_mask ((__v8di) __A,
-						   (__v8sf)
-						   _mm256_setzero_ps (),
-						   (__mmask8) -1,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_andnpd512_mask ((__v8df) __A,
+						  (__v8df) __B,
+						  (__v8df) __W,
+						  (__mmask8) __U);
 }
 
-extern __inline __m256
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi64_ps (__m256 __W, __mmask8 __U, __m512i __A)
+_mm512_maskz_andnot_pd (__mmask8 __U, __m512d __A, __m512d __B)
 {
-  return (__m256) __builtin_ia32_cvtqq2ps512_mask ((__v8di) __A,
-						   (__v8sf) __W,
-						   (__mmask8) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_andnpd512_mask ((__v8df) __A,
+						  (__v8df) __B,
+						  (__v8df)
+						  _mm512_setzero_pd (),
+						  (__mmask8) __U);
 }
 
-extern __inline __m256
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi64_ps (__mmask8 __U, __m512i __A)
+_mm512_andnot_ps (__m512 __A, __m512 __B)
 {
-  return (__m256) __builtin_ia32_cvtqq2ps512_mask ((__v8di) __A,
-						   (__v8sf)
-						   _mm256_setzero_ps (),
-						   (__mmask8) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_andnps512_mask ((__v16sf) __A,
+						 (__v16sf) __B,
+						 (__v16sf)
+						 _mm512_setzero_ps (),
+						 (__mmask16) -1);
 }
 
-extern __inline __m256
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepu64_ps (__m512i __A)
+_mm512_mask_andnot_ps (__m512 __W, __mmask16 __U, __m512 __A,
+		       __m512 __B)
 {
-  return (__m256) __builtin_ia32_cvtuqq2ps512_mask ((__v8di) __A,
-						    (__v8sf)
-						    _mm256_setzero_ps (),
-						    (__mmask8) -1,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_andnps512_mask ((__v16sf) __A,
+						 (__v16sf) __B,
+						 (__v16sf) __W,
+						 (__mmask16) __U);
 }
 
-extern __inline __m256
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepu64_ps (__m256 __W, __mmask8 __U, __m512i __A)
+_mm512_maskz_andnot_ps (__mmask16 __U, __m512 __A, __m512 __B)
 {
-  return (__m256) __builtin_ia32_cvtuqq2ps512_mask ((__v8di) __A,
-						    (__v8sf) __W,
-						    (__mmask8) __U,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_andnps512_mask ((__v16sf) __A,
+						 (__v16sf) __B,
+						 (__v16sf)
+						 _mm512_setzero_ps (),
+						 (__mmask16) __U);
 }
 
-extern __inline __m256
+extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepu64_ps (__mmask8 __U, __m512i __A)
+_mm512_movepi32_mask (__m512i __A)
 {
-  return (__m256) __builtin_ia32_cvtuqq2ps512_mask ((__v8di) __A,
-						    (__v8sf)
-						    _mm256_setzero_ps (),
-						    (__mmask8) __U,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__mmask16) __builtin_ia32_cvtd2mask512 ((__v16si) __A);
 }
 
-extern __inline __m512d
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi64_pd (__m512i __A)
+_mm512_movepi64_mask (__m512i __A)
 {
-  return (__m512d) __builtin_ia32_cvtqq2pd512_mask ((__v8di) __A,
-						    (__v8df)
-						    _mm512_setzero_pd (),
-						    (__mmask8) -1,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__mmask8) __builtin_ia32_cvtq2mask512 ((__v8di) __A);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi64_pd (__m512d __W, __mmask8 __U, __m512i __A)
+_mm512_movm_epi32 (__mmask16 __A)
 {
-  return (__m512d) __builtin_ia32_cvtqq2pd512_mask ((__v8di) __A,
-						    (__v8df) __W,
-						    (__mmask8) __U,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_cvtmask2d512 (__A);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi64_pd (__mmask8 __U, __m512i __A)
+_mm512_movm_epi64 (__mmask8 __A)
 {
-  return (__m512d) __builtin_ia32_cvtqq2pd512_mask ((__v8di) __A,
-						    (__v8df)
-						    _mm512_setzero_pd (),
-						    (__mmask8) __U,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_cvtmask2q512 (__A);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepu64_pd (__m512i __A)
+_mm512_cvttpd_epi64 (__m512d __A)
 {
-  return (__m512d) __builtin_ia32_cvtuqq2pd512_mask ((__v8di) __A,
-						     (__v8df)
-						     _mm512_setzero_pd (),
+  return (__m512i) __builtin_ia32_cvttpd2qq512_mask ((__v8df) __A,
+						     (__v8di)
+						     _mm512_setzero_si512 (),
 						     (__mmask8) -1,
 						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepu64_pd (__m512d __W, __mmask8 __U, __m512i __A)
+_mm512_mask_cvttpd_epi64 (__m512i __W, __mmask8 __U, __m512d __A)
 {
-  return (__m512d) __builtin_ia32_cvtuqq2pd512_mask ((__v8di) __A,
-						     (__v8df) __W,
+  return (__m512i) __builtin_ia32_cvttpd2qq512_mask ((__v8df) __A,
+						     (__v8di) __W,
 						     (__mmask8) __U,
 						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepu64_pd (__mmask8 __U, __m512i __A)
+_mm512_maskz_cvttpd_epi64 (__mmask8 __U, __m512d __A)
 {
-  return (__m512d) __builtin_ia32_cvtuqq2pd512_mask ((__v8di) __A,
-						     (__v8df)
-						     _mm512_setzero_pd (),
+  return (__m512i) __builtin_ia32_cvttpd2qq512_mask ((__v8df) __A,
+						     (__v8di)
+						     _mm512_setzero_si512 (),
 						     (__mmask8) __U,
 						     _MM_FROUND_CUR_DIRECTION);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __mmask8
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kshiftli_mask8 (__mmask8 __A, unsigned int __B)
+_mm512_cvttpd_epu64 (__m512d __A)
 {
-  return (__mmask8) __builtin_ia32_kshiftliqi ((__mmask8) __A, (__mmask8) __B);
+  return (__m512i) __builtin_ia32_cvttpd2uqq512_mask ((__v8df) __A,
+						      (__v8di)
+						      _mm512_setzero_si512 (),
+						      (__mmask8) -1,
+						      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __mmask8
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kshiftri_mask8 (__mmask8 __A, unsigned int __B)
+_mm512_mask_cvttpd_epu64 (__m512i __W, __mmask8 __U, __m512d __A)
 {
-  return (__mmask8) __builtin_ia32_kshiftriqi ((__mmask8) __A, (__mmask8) __B);
+  return (__m512i) __builtin_ia32_cvttpd2uqq512_mask ((__v8df) __A,
+						      (__v8di) __W,
+						      (__mmask8) __U,
+						      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_range_pd (__m512d __A, __m512d __B, int __C)
+_mm512_maskz_cvttpd_epu64 (__mmask8 __U, __m512d __A)
 {
-  return (__m512d) __builtin_ia32_rangepd512_mask ((__v8df) __A,
-						   (__v8df) __B, __C,
-						   (__v8df)
-						   _mm512_setzero_pd (),
-						   (__mmask8) -1,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_cvttpd2uqq512_mask ((__v8df) __A,
+						      (__v8di)
+						      _mm512_setzero_si512 (),
+						      (__mmask8) __U,
+						      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_range_pd (__m512d __W, __mmask8 __U,
-		      __m512d __A, __m512d __B, int __C)
+_mm512_cvttps_epi64 (__m256 __A)
 {
-  return (__m512d) __builtin_ia32_rangepd512_mask ((__v8df) __A,
-						   (__v8df) __B, __C,
-						   (__v8df) __W,
-						   (__mmask8) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_cvttps2qq512_mask ((__v8sf) __A,
+						     (__v8di)
+						     _mm512_setzero_si512 (),
+						     (__mmask8) -1,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_range_pd (__mmask8 __U, __m512d __A, __m512d __B, int __C)
+_mm512_mask_cvttps_epi64 (__m512i __W, __mmask8 __U, __m256 __A)
 {
-  return (__m512d) __builtin_ia32_rangepd512_mask ((__v8df) __A,
-						   (__v8df) __B, __C,
-						   (__v8df)
-						   _mm512_setzero_pd (),
-						   (__mmask8) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_cvttps2qq512_mask ((__v8sf) __A,
+						     (__v8di) __W,
+						     (__mmask8) __U,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_range_ps (__m512 __A, __m512 __B, int __C)
+_mm512_maskz_cvttps_epi64 (__mmask8 __U, __m256 __A)
 {
-  return (__m512) __builtin_ia32_rangeps512_mask ((__v16sf) __A,
-						  (__v16sf) __B, __C,
-						  (__v16sf)
-						  _mm512_setzero_ps (),
-						  (__mmask16) -1,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_cvttps2qq512_mask ((__v8sf) __A,
+						     (__v8di)
+						     _mm512_setzero_si512 (),
+						     (__mmask8) __U,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_range_ps (__m512 __W, __mmask16 __U,
-		      __m512 __A, __m512 __B, int __C)
+_mm512_cvttps_epu64 (__m256 __A)
 {
-  return (__m512) __builtin_ia32_rangeps512_mask ((__v16sf) __A,
-						  (__v16sf) __B, __C,
-						  (__v16sf) __W,
-						  (__mmask16) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_cvttps2uqq512_mask ((__v8sf) __A,
+						      (__v8di)
+						      _mm512_setzero_si512 (),
+						      (__mmask8) -1,
+						      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_range_ps (__mmask16 __U, __m512 __A, __m512 __B, int __C)
+_mm512_mask_cvttps_epu64 (__m512i __W, __mmask8 __U, __m256 __A)
 {
-  return (__m512) __builtin_ia32_rangeps512_mask ((__v16sf) __A,
-						  (__v16sf) __B, __C,
-						  (__v16sf)
-						  _mm512_setzero_ps (),
-						  (__mmask16) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __builtin_ia32_cvttps2uqq512_mask ((__v8sf) __A,
+						      (__v8di) __W,
+						      (__mmask8) __U,
+						      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_reduce_sd (__m128d __A, __m128d __B, int __C)
+_mm512_maskz_cvttps_epu64 (__mmask8 __U, __m256 __A)
 {
-  return (__m128d) __builtin_ia32_reducesd_mask ((__v2df) __A,
-						 (__v2df) __B, __C,
-						 (__v2df) _mm_setzero_pd (),
-						 (__mmask8) -1);
+  return (__m512i) __builtin_ia32_cvttps2uqq512_mask ((__v8sf) __A,
+						      (__v8di)
+						      _mm512_setzero_si512 (),
+						      (__mmask8) __U,
+						      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_reduce_round_sd (__m128d __A, __m128d __B, int __C, const int __R)
+_mm512_cvtpd_epi64 (__m512d __A)
 {
-  return (__m128d) __builtin_ia32_reducesd_mask_round ((__v2df) __A,
-						       (__v2df) __B, __C,
-						       (__v2df)
-						       _mm_setzero_pd (),
-						       (__mmask8) -1, __R);
+  return (__m512i) __builtin_ia32_cvtpd2qq512_mask ((__v8df) __A,
+						    (__v8di)
+						    _mm512_setzero_si512 (),
+						    (__mmask8) -1,
+						    _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtpd_epi64 (__m512i __W, __mmask8 __U, __m512d __A)
+{
+  return (__m512i) __builtin_ia32_cvtpd2qq512_mask ((__v8df) __A,
+						    (__v8di) __W,
+						    (__mmask8) __U,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_reduce_sd (__m128d __W,  __mmask8 __U, __m128d __A,
-		    __m128d __B, int __C)
+_mm512_maskz_cvtpd_epi64 (__mmask8 __U, __m512d __A)
 {
-  return (__m128d) __builtin_ia32_reducesd_mask ((__v2df) __A,
-						 (__v2df) __B, __C,
-						 (__v2df) __W,
-						 (__mmask8) __U);
+  return (__m512i) __builtin_ia32_cvtpd2qq512_mask ((__v8df) __A,
+						    (__v8di)
+						    _mm512_setzero_si512 (),
+						    (__mmask8) __U,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_reduce_round_sd (__m128d __W,  __mmask8 __U, __m128d __A,
-			  __m128d __B, int __C, const int __R)
+_mm512_cvtpd_epu64 (__m512d __A)
 {
-  return (__m128d) __builtin_ia32_reducesd_mask_round ((__v2df) __A,
-						       (__v2df) __B, __C,
-						       (__v2df) __W,
-						       __U, __R);
+  return (__m512i) __builtin_ia32_cvtpd2uqq512_mask ((__v8df) __A,
+						     (__v8di)
+						     _mm512_setzero_si512 (),
+						     (__mmask8) -1,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_reduce_sd (__mmask8 __U, __m128d __A, __m128d __B, int __C)
+_mm512_mask_cvtpd_epu64 (__m512i __W, __mmask8 __U, __m512d __A)
 {
-  return (__m128d) __builtin_ia32_reducesd_mask ((__v2df) __A,
-						 (__v2df) __B, __C,
-						 (__v2df) _mm_setzero_pd (),
-						 (__mmask8) __U);
+  return (__m512i) __builtin_ia32_cvtpd2uqq512_mask ((__v8df) __A,
+						     (__v8di) __W,
+						     (__mmask8) __U,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_reduce_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
-			   int __C, const int __R)
+_mm512_maskz_cvtpd_epu64 (__mmask8 __U, __m512d __A)
 {
-  return (__m128d) __builtin_ia32_reducesd_mask_round ((__v2df) __A,
-						       (__v2df) __B, __C,
-						       (__v2df)
-						       _mm_setzero_pd (),
-						       __U, __R);
+  return (__m512i) __builtin_ia32_cvtpd2uqq512_mask ((__v8df) __A,
+						     (__v8di)
+						     _mm512_setzero_si512 (),
+						     (__mmask8) __U,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_reduce_ss (__m128 __A, __m128 __B, int __C)
+_mm512_cvtps_epi64 (__m256 __A)
 {
-  return (__m128) __builtin_ia32_reducess_mask ((__v4sf) __A,
-						(__v4sf) __B, __C,
-						(__v4sf) _mm_setzero_ps (),
-						(__mmask8) -1);
+  return (__m512i) __builtin_ia32_cvtps2qq512_mask ((__v8sf) __A,
+						    (__v8di)
+						    _mm512_setzero_si512 (),
+						    (__mmask8) -1,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_reduce_round_ss (__m128 __A, __m128 __B, int __C, const int __R)
+_mm512_mask_cvtps_epi64 (__m512i __W, __mmask8 __U, __m256 __A)
 {
-  return (__m128) __builtin_ia32_reducess_mask_round ((__v4sf) __A,
-						      (__v4sf) __B, __C,
-						      (__v4sf)
-						      _mm_setzero_ps (),
-						      (__mmask8) -1, __R);
+  return (__m512i) __builtin_ia32_cvtps2qq512_mask ((__v8sf) __A,
+						    (__v8di) __W,
+						    (__mmask8) __U,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_reduce_ss (__m128 __W,  __mmask8 __U, __m128 __A,
-		    __m128 __B, int __C)
+_mm512_maskz_cvtps_epi64 (__mmask8 __U, __m256 __A)
 {
-  return (__m128) __builtin_ia32_reducess_mask ((__v4sf) __A,
-						(__v4sf) __B, __C,
-						(__v4sf) __W,
-						(__mmask8) __U);
+  return (__m512i) __builtin_ia32_cvtps2qq512_mask ((__v8sf) __A,
+						    (__v8di)
+						    _mm512_setzero_si512 (),
+						    (__mmask8) __U,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_reduce_round_ss (__m128 __W,  __mmask8 __U, __m128 __A,
-			  __m128 __B, int __C, const int __R)
+_mm512_cvtps_epu64 (__m256 __A)
 {
-  return (__m128) __builtin_ia32_reducess_mask_round ((__v4sf) __A,
-						      (__v4sf) __B, __C,
-						      (__v4sf) __W,
-						      __U, __R);
+  return (__m512i) __builtin_ia32_cvtps2uqq512_mask ((__v8sf) __A,
+						     (__v8di)
+						     _mm512_setzero_si512 (),
+						     (__mmask8) -1,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_reduce_ss (__mmask8 __U, __m128 __A, __m128 __B, int __C)
+_mm512_mask_cvtps_epu64 (__m512i __W, __mmask8 __U, __m256 __A)
 {
-  return (__m128) __builtin_ia32_reducess_mask ((__v4sf) __A,
-						(__v4sf) __B, __C,
-						(__v4sf) _mm_setzero_ps (),
-						(__mmask8) __U);
+  return (__m512i) __builtin_ia32_cvtps2uqq512_mask ((__v8sf) __A,
+						     (__v8di) __W,
+						     (__mmask8) __U,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_reduce_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
-			   int __C, const int __R)
+_mm512_maskz_cvtps_epu64 (__mmask8 __U, __m256 __A)
 {
-  return (__m128) __builtin_ia32_reducess_mask_round ((__v4sf) __A,
-						      (__v4sf) __B, __C,
-						      (__v4sf)
-						      _mm_setzero_ps (),
-						      __U, __R);
+  return (__m512i) __builtin_ia32_cvtps2uqq512_mask ((__v8sf) __A,
+						     (__v8di)
+						     _mm512_setzero_si512 (),
+						     (__mmask8) __U,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128d
+extern __inline __m256
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_range_sd (__m128d __A, __m128d __B, int __C)
+_mm512_cvtepi64_ps (__m512i __A)
 {
-  return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
-						   (__v2df) __B, __C,
-						   (__v2df)
-						   _mm_setzero_pd (),
+  return (__m256) __builtin_ia32_cvtqq2ps512_mask ((__v8di) __A,
+						   (__v8sf)
+						   _mm256_setzero_ps (),
 						   (__mmask8) -1,
 						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128d
+extern __inline __m256
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_range_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B, int __C)
+_mm512_mask_cvtepi64_ps (__m256 __W, __mmask8 __U, __m512i __A)
 {
-  return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
-						   (__v2df) __B, __C,
-						   (__v2df) __W,
+  return (__m256) __builtin_ia32_cvtqq2ps512_mask ((__v8di) __A,
+						   (__v8sf) __W,
 						   (__mmask8) __U,
 						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128d
+extern __inline __m256
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_range_sd (__mmask8 __U, __m128d __A, __m128d __B, int __C)
+_mm512_maskz_cvtepi64_ps (__mmask8 __U, __m512i __A)
 {
-  return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
-						   (__v2df) __B, __C,
-						   (__v2df)
-						   _mm_setzero_pd (),
+  return (__m256) __builtin_ia32_cvtqq2ps512_mask ((__v8di) __A,
+						   (__v8sf)
+						   _mm256_setzero_ps (),
 						   (__mmask8) __U,
 						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128
+extern __inline __m256
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_range_ss (__m128 __A, __m128 __B, int __C)
+_mm512_cvtepu64_ps (__m512i __A)
 {
-  return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
-						  (__v4sf) __B, __C,
-						  (__v4sf)
-						  _mm_setzero_ps (),
-						  (__mmask8) -1,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m256) __builtin_ia32_cvtuqq2ps512_mask ((__v8di) __A,
+						    (__v8sf)
+						    _mm256_setzero_ps (),
+						    (__mmask8) -1,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128
+extern __inline __m256
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_range_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B, int __C)
+_mm512_mask_cvtepu64_ps (__m256 __W, __mmask8 __U, __m512i __A)
 {
-  return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
-						  (__v4sf) __B, __C,
-						  (__v4sf) __W,
-						  (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m256) __builtin_ia32_cvtuqq2ps512_mask ((__v8di) __A,
+						    (__v8sf) __W,
+						    (__mmask8) __U,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
+extern __inline __m256
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtepu64_ps (__mmask8 __U, __m512i __A)
+{
+  return (__m256) __builtin_ia32_cvtuqq2ps512_mask ((__v8di) __A,
+						    (__v8sf)
+						    _mm256_setzero_ps (),
+						    (__mmask8) __U,
+						    _MM_FROUND_CUR_DIRECTION);
+}
 
-extern __inline __m128
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_range_ss (__mmask8 __U, __m128 __A, __m128 __B, int __C)
+_mm512_cvtepi64_pd (__m512i __A)
 {
-  return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
-						  (__v4sf) __B, __C,
-						  (__v4sf)
-						  _mm_setzero_ps (),
-						  (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_cvtqq2pd512_mask ((__v8di) __A,
+						    (__v8df)
+						    _mm512_setzero_pd (),
+						    (__mmask8) -1,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128d
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_range_round_sd (__m128d __A, __m128d __B, int __C, const int __R)
+_mm512_mask_cvtepi64_pd (__m512d __W, __mmask8 __U, __m512i __A)
 {
-  return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
-						   (__v2df) __B, __C,
-						   (__v2df)
-						   _mm_setzero_pd (),
-						   (__mmask8) -1, __R);
+  return (__m512d) __builtin_ia32_cvtqq2pd512_mask ((__v8di) __A,
+						    (__v8df) __W,
+						    (__mmask8) __U,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128d
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_range_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
-			 int __C, const int __R)
+_mm512_maskz_cvtepi64_pd (__mmask8 __U, __m512i __A)
+{
+  return (__m512d) __builtin_ia32_cvtqq2pd512_mask ((__v8di) __A,
+						    (__v8df)
+						    _mm512_setzero_pd (),
+						    (__mmask8) __U,
+						    _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtepu64_pd (__m512i __A)
 {
-  return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
-						   (__v2df) __B, __C,
-						   (__v2df) __W,
-						   (__mmask8) __U, __R);
+  return (__m512d) __builtin_ia32_cvtuqq2pd512_mask ((__v8di) __A,
+						     (__v8df)
+						     _mm512_setzero_pd (),
+						     (__mmask8) -1,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128d
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_range_round_sd (__mmask8 __U, __m128d __A, __m128d __B, int __C,
-			  const int __R)
+_mm512_mask_cvtepu64_pd (__m512d __W, __mmask8 __U, __m512i __A)
 {
-  return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
-						   (__v2df) __B, __C,
-						   (__v2df)
-						   _mm_setzero_pd (),
-						   (__mmask8) __U, __R);
+  return (__m512d) __builtin_ia32_cvtuqq2pd512_mask ((__v8di) __A,
+						     (__v8df) __W,
+						     (__mmask8) __U,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_range_round_ss (__m128 __A, __m128 __B, int __C, const int __R)
+_mm512_maskz_cvtepu64_pd (__mmask8 __U, __m512i __A)
 {
-  return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
-						  (__v4sf) __B, __C,
-						  (__v4sf)
-						  _mm_setzero_ps (),
-						  (__mmask8) -1, __R);
+  return (__m512d) __builtin_ia32_cvtuqq2pd512_mask ((__v8di) __A,
+						     (__v8df)
+						     _mm512_setzero_pd (),
+						     (__mmask8) __U,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128
+#ifdef __OPTIMIZE__
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_range_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
-			 int __C, const int __R)
+_mm512_range_pd (__m512d __A, __m512d __B, int __C)
 {
-  return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
-						  (__v4sf) __B, __C,
-						  (__v4sf) __W,
-						  (__mmask8) __U, __R);
+  return (__m512d) __builtin_ia32_rangepd512_mask ((__v8df) __A,
+						   (__v8df) __B, __C,
+						   (__v8df)
+						   _mm512_setzero_pd (),
+						   (__mmask8) -1,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_range_round_ss (__mmask8 __U, __m128 __A, __m128 __B, int __C,
-			  const int __R)
+_mm512_mask_range_pd (__m512d __W, __mmask8 __U,
+		      __m512d __A, __m512d __B, int __C)
 {
-  return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
-						  (__v4sf) __B, __C,
-						  (__v4sf)
-						  _mm_setzero_ps (),
-						  (__mmask8) __U, __R);
+  return (__m512d) __builtin_ia32_rangepd512_mask ((__v8df) __A,
+						   (__v8df) __B, __C,
+						   (__v8df) __W,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __mmask8
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fpclass_ss_mask (__m128 __A, const int __imm)
+_mm512_maskz_range_pd (__mmask8 __U, __m512d __A, __m512d __B, int __C)
 {
-  return (__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) __A, __imm,
-						   (__mmask8) -1);
+  return (__m512d) __builtin_ia32_rangepd512_mask ((__v8df) __A,
+						   (__v8df) __B, __C,
+						   (__v8df)
+						   _mm512_setzero_pd (),
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __mmask8
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fpclass_sd_mask (__m128d __A, const int __imm)
+_mm512_range_ps (__m512 __A, __m512 __B, int __C)
 {
-  return (__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) __A, __imm,
-						   (__mmask8) -1);
+  return (__m512) __builtin_ia32_rangeps512_mask ((__v16sf) __A,
+						  (__v16sf) __B, __C,
+						  (__v16sf)
+						  _mm512_setzero_ps (),
+						  (__mmask16) -1,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __mmask8
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fpclass_ss_mask (__mmask8 __U, __m128 __A, const int __imm)
+_mm512_mask_range_ps (__m512 __W, __mmask16 __U,
+		      __m512 __A, __m512 __B, int __C)
 {
-  return (__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) __A, __imm, __U);
+  return (__m512) __builtin_ia32_rangeps512_mask ((__v16sf) __A,
+						  (__v16sf) __B, __C,
+						  (__v16sf) __W,
+						  (__mmask16) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __mmask8
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fpclass_sd_mask (__mmask8 __U, __m128d __A, const int __imm)
+_mm512_maskz_range_ps (__mmask16 __U, __m512 __A, __m512 __B, int __C)
 {
-  return (__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) __A, __imm, __U);
+  return (__m512) __builtin_ia32_rangeps512_mask ((__v16sf) __A,
+						  (__v16sf) __B, __C,
+						  (__v16sf)
+						  _mm512_setzero_ps (),
+						  (__mmask16) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512i
@@ -2395,72 +2546,6 @@ _mm512_fpclass_ps_mask (__m512 __A, const int __imm)
 }
 
 #else
-#define _kshiftli_mask8(X, Y)						\
-  ((__mmask8) __builtin_ia32_kshiftliqi ((__mmask8)(X), (__mmask8)(Y)))
-
-#define _kshiftri_mask8(X, Y)						\
-  ((__mmask8) __builtin_ia32_kshiftriqi ((__mmask8)(X), (__mmask8)(Y)))
-
-#define _mm_range_sd(A, B, C)						 \
-  ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
-    (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), 	 \
-    (__mmask8) -1, _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_mask_range_sd(W, U, A, B, C)				 \
-  ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
-    (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W), 		 \
-    (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_maskz_range_sd(U, A, B, C)					 \
-  ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
-    (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), 	 \
-    (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_range_ss(A, B, C)						\
-  ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A),	\
-    (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (),		\
-    (__mmask8) -1, _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_mask_range_ss(W, U, A, B, C)				\
-  ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A),	\
-    (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W),			\
-    (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_maskz_range_ss(U, A, B, C)					\
-  ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A),	\
-    (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (),		\
-    (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_range_round_sd(A, B, C, R)					 \
-  ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
-    (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (),		 \
-    (__mmask8) -1, (R)))
-
-#define _mm_mask_range_round_sd(W, U, A, B, C, R)			 \
-  ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
-    (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W),		 \
-    (__mmask8)(U), (R)))
-
-#define _mm_maskz_range_round_sd(U, A, B, C, R)				 \
-  ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
-    (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (),		 \
-    (__mmask8)(U), (R)))
-
-#define _mm_range_round_ss(A, B, C, R)					\
-  ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A),	\
-    (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (),		\
-    (__mmask8) -1, (R)))
-
-#define _mm_mask_range_round_ss(W, U, A, B, C, R)			\
-  ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A),	\
-    (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W),			\
-    (__mmask8)(U), (R)))
-
-#define _mm_maskz_range_round_ss(U, A, B, C, R)				\
-  ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A),	\
-    (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (),		\
-    (__mmask8)(U), (R)))
-
 #define _mm512_cvtt_roundpd_epi64(A, B)		    \
   ((__m512i)__builtin_ia32_cvttpd2qq512_mask ((A), (__v8di)		\
 					      _mm512_setzero_si512 (),	\
@@ -2792,22 +2877,6 @@ _mm512_fpclass_ps_mask (__m512 __A, const int __imm)
     (__v16si)(__m512i)_mm512_setzero_si512 (),\
     (__mmask16)(U)))
 
-#define _mm_fpclass_ss_mask(X, C)					\
-  ((__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) (__m128) (X),	\
-					     (int) (C), (__mmask8) (-1))) \
-
-#define _mm_fpclass_sd_mask(X, C)					\
-  ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X),	\
-					     (int) (C), (__mmask8) (-1))) \
-
-#define _mm_mask_fpclass_ss_mask(X, C, U)				\
-  ((__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) (__m128) (X),	\
-					     (int) (C), (__mmask8) (U)))
-
-#define _mm_mask_fpclass_sd_mask(X, C, U)				\
-  ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X),	\
-					     (int) (C), (__mmask8) (U)))
-
 #define _mm512_mask_fpclass_pd_mask(u, X, C)                            \
   ((__mmask8) __builtin_ia32_fpclasspd512_mask ((__v8df) (__m512d) (X), \
 						(int) (C), (__mmask8)(u)))
@@ -2824,68 +2893,11 @@ _mm512_fpclass_ps_mask (__m512 __A, const int __imm)
   ((__mmask16) __builtin_ia32_fpclassps512_mask ((__v16sf) (__m512) (x),\
 						 (int) (c),(__mmask16)-1))
 
-#define _mm_reduce_sd(A, B, C)						\
-  ((__m128d) __builtin_ia32_reducesd_mask ((__v2df)(__m128d)(A),	\
-    (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (),		\
-    (__mmask8)-1))
-
-#define _mm_mask_reduce_sd(W, U, A, B, C)				\
-  ((__m128d) __builtin_ia32_reducesd_mask ((__v2df)(__m128d)(A),	\
-    (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W), (__mmask8)(U)))
-
-#define _mm_maskz_reduce_sd(U, A, B, C)					\
-  ((__m128d) __builtin_ia32_reducesd_mask ((__v2df)(__m128d)(A),	\
-    (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (),		\
-    (__mmask8)(U)))
-
-#define _mm_reduce_round_sd(A, B, C, R)				       \
-  ((__m128d) __builtin_ia32_reducesd_round ((__v2df)(__m128d)(A),      \
-    (__v2df)(__m128d)(B), (int)(C), (__mmask8)(U), (int)(R)))
-
-#define _mm_mask_reduce_round_sd(W, U, A, B, C, R)		       \
-  ((__m128d) __builtin_ia32_reducesd_mask_round ((__v2df)(__m128d)(A), \
-    (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W),	       \
-    (__mmask8)(U), (int)(R)))
-
-#define _mm_maskz_reduce_round_sd(U, A, B, C, R)		       \
-  ((__m128d) __builtin_ia32_reducesd_mask_round ((__v2df)(__m128d)(A), \
-    (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (),	       \
-    (__mmask8)(U), (int)(R)))
-
-#define _mm_reduce_ss(A, B, C)						\
-  ((__m128) __builtin_ia32_reducess_mask ((__v4sf)(__m128)(A),		\
-    (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (),		\
-    (__mmask8)-1))
-
-#define _mm_mask_reduce_ss(W, U, A, B, C)				\
-  ((__m128) __builtin_ia32_reducess_mask ((__v4sf)(__m128)(A),		\
-    (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W), (__mmask8)(U)))
-
-#define _mm_maskz_reduce_ss(U, A, B, C)					\
-  ((__m128) __builtin_ia32_reducess_mask ((__v4sf)(__m128)(A),		\
-    (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (),		\
-    (__mmask8)(U)))
-
-#define _mm_reduce_round_ss(A, B, C, R)				       \
-  ((__m128) __builtin_ia32_reducess_round ((__v4sf)(__m128)(A),	       \
-    (__v4sf)(__m128)(B), (int)(C), (__mmask8)(U), (int)(R)))
-
-#define _mm_mask_reduce_round_ss(W, U, A, B, C, R)		       \
-  ((__m128) __builtin_ia32_reducess_mask_round ((__v4sf)(__m128)(A),   \
-    (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W),		       \
-    (__mmask8)(U), (int)(R)))
-
-#define _mm_maskz_reduce_round_ss(U, A, B, C, R)		       \
-  ((__m128) __builtin_ia32_reducesd_mask_round ((__v4sf)(__m128)(A),   \
-    (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (),	       \
-    (__mmask8)(U), (int)(R)))
-
-
 #endif
 
-#ifdef __DISABLE_AVX512DQ__
-#undef __DISABLE_AVX512DQ__
+#ifdef __DISABLE_AVX512DQ_512__
+#undef __DISABLE_AVX512DQ_512__
 #pragma GCC pop_options
-#endif /* __DISABLE_AVX512DQ__ */
+#endif /* __DISABLE_AVX512DQ_512__ */
 
 #endif /* _AVX512DQINTRIN_H_INCLUDED */
-- 
2.31.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 04/18] [PATCH 3/5] Push evex512 target for 512 bit intrins
  2023-09-21  7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
                   ` (2 preceding siblings ...)
  2023-09-21  7:19 ` [PATCH 03/18] [PATCH 2/5] " Hu, Lin1
@ 2023-09-21  7:19 ` Hu, Lin1
  2023-09-21  7:20 ` [PATCH 05/18] [PATCH 4/5] " Hu, Lin1
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21  7:19 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang

From: Haochen Jiang <haochen.jiang@intel.com>

gcc/ChangeLog:

	* config/i386/avx512bwintrin.h: Add evex512 target for 512 bit
	intrins.
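
As an editorial illustration of the regrouping (not part of the patch;
the wrapper names below are hypothetical, and this assumes the target
attribute accepts the evex512/no-evex512 flags added in patch 1): the
32 bit mask intrinsics stay in the plain AVX512BW section, while the
64 bit mask intrinsics move behind the new avx512bw,evex512 push:

#include <immintrin.h>

/* Stays in the plain AVX512BW section, so it still compiles with
   -mavx512bw -mno-evex512.  */
__attribute__ ((target ("avx512bw,no-evex512")))
__mmask32
kand32 (__mmask32 __A, __mmask32 __B)
{
  return _kand_mask32 (__A, __B);
}

/* Moved behind the avx512bw,evex512 push_options, so it now also
   needs evex512 (which is the default when AVX512BW is enabled).  */
__attribute__ ((target ("avx512bw,evex512")))
__mmask64
kand64 (__mmask64 __A, __mmask64 __B)
{
  return _kand_mask64 (__A, __B);
}

Under this sketch, using _kand_mask64 in a no-evex512 context should
now be rejected, which is the intended effect of the new guard.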
---
 gcc/config/i386/avx512bwintrin.h | 291 ++++++++++++++++---------------
 1 file changed, 153 insertions(+), 138 deletions(-)

diff --git a/gcc/config/i386/avx512bwintrin.h b/gcc/config/i386/avx512bwintrin.h
index d1cd549ce18..925bae1457c 100644
--- a/gcc/config/i386/avx512bwintrin.h
+++ b/gcc/config/i386/avx512bwintrin.h
@@ -34,16 +34,6 @@
 #define __DISABLE_AVX512BW__
 #endif /* __AVX512BW__ */
 
-/* Internal data types for implementing the intrinsics.  */
-typedef short __v32hi __attribute__ ((__vector_size__ (64)));
-typedef short __v32hi_u __attribute__ ((__vector_size__ (64),	\
-					__may_alias__, __aligned__ (1)));
-typedef char __v64qi __attribute__ ((__vector_size__ (64)));
-typedef char __v64qi_u __attribute__ ((__vector_size__ (64),	\
-				       __may_alias__, __aligned__ (1)));
-
-typedef unsigned long long __mmask64;
-
 extern __inline unsigned char
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _ktest_mask32_u8  (__mmask32 __A,  __mmask32 __B, unsigned char *__CF)
@@ -54,229 +44,292 @@ _ktest_mask32_u8  (__mmask32 __A,  __mmask32 __B, unsigned char *__CF)
 
 extern __inline unsigned char
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_ktest_mask64_u8  (__mmask64 __A,  __mmask64 __B, unsigned char *__CF)
+_ktestz_mask32_u8 (__mmask32 __A, __mmask32 __B)
 {
-  *__CF = (unsigned char) __builtin_ia32_ktestcdi (__A, __B);
-  return (unsigned char) __builtin_ia32_ktestzdi (__A, __B);
+  return (unsigned char) __builtin_ia32_ktestzsi (__A, __B);
 }
 
 extern __inline unsigned char
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_ktestz_mask32_u8 (__mmask32 __A, __mmask32 __B)
+_ktestc_mask32_u8 (__mmask32 __A, __mmask32 __B)
 {
-  return (unsigned char) __builtin_ia32_ktestzsi (__A, __B);
+  return (unsigned char) __builtin_ia32_ktestcsi (__A, __B);
 }
 
 extern __inline unsigned char
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_ktestz_mask64_u8 (__mmask64 __A, __mmask64 __B)
+_kortest_mask32_u8  (__mmask32 __A,  __mmask32 __B, unsigned char *__CF)
 {
-  return (unsigned char) __builtin_ia32_ktestzdi (__A, __B);
+  *__CF = (unsigned char) __builtin_ia32_kortestcsi (__A, __B);
+  return (unsigned char) __builtin_ia32_kortestzsi (__A, __B);
 }
 
 extern __inline unsigned char
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_ktestc_mask32_u8 (__mmask32 __A, __mmask32 __B)
+_kortestz_mask32_u8 (__mmask32 __A, __mmask32 __B)
 {
-  return (unsigned char) __builtin_ia32_ktestcsi (__A, __B);
+  return (unsigned char) __builtin_ia32_kortestzsi (__A, __B);
 }
 
 extern __inline unsigned char
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_ktestc_mask64_u8 (__mmask64 __A, __mmask64 __B)
+_kortestc_mask32_u8 (__mmask32 __A, __mmask32 __B)
 {
-  return (unsigned char) __builtin_ia32_ktestcdi (__A, __B);
+  return (unsigned char) __builtin_ia32_kortestcsi (__A, __B);
 }
 
-extern __inline unsigned char
+extern __inline __mmask32
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kortest_mask32_u8  (__mmask32 __A,  __mmask32 __B, unsigned char *__CF)
+_kadd_mask32 (__mmask32 __A, __mmask32 __B)
 {
-  *__CF = (unsigned char) __builtin_ia32_kortestcsi (__A, __B);
-  return (unsigned char) __builtin_ia32_kortestzsi (__A, __B);
+  return (__mmask32) __builtin_ia32_kaddsi ((__mmask32) __A, (__mmask32) __B);
 }
 
-extern __inline unsigned char
+extern __inline unsigned int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kortest_mask64_u8  (__mmask64 __A,  __mmask64 __B, unsigned char *__CF)
+_cvtmask32_u32 (__mmask32 __A)
 {
-  *__CF = (unsigned char) __builtin_ia32_kortestcdi (__A, __B);
-  return (unsigned char) __builtin_ia32_kortestzdi (__A, __B);
+  return (unsigned int) __builtin_ia32_kmovd ((__mmask32) __A);
 }
 
-extern __inline unsigned char
+extern __inline __mmask32
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kortestz_mask32_u8 (__mmask32 __A, __mmask32 __B)
+_cvtu32_mask32 (unsigned int __A)
 {
-  return (unsigned char) __builtin_ia32_kortestzsi (__A, __B);
+  return (__mmask32) __builtin_ia32_kmovd ((__mmask32) __A);
 }
 
-extern __inline unsigned char
+extern __inline __mmask32
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kortestz_mask64_u8 (__mmask64 __A, __mmask64 __B)
+_load_mask32 (__mmask32 *__A)
 {
-  return (unsigned char) __builtin_ia32_kortestzdi (__A, __B);
+  return (__mmask32) __builtin_ia32_kmovd (*__A);
 }
 
-extern __inline unsigned char
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kortestc_mask32_u8 (__mmask32 __A, __mmask32 __B)
+_store_mask32 (__mmask32 *__A, __mmask32 __B)
 {
-  return (unsigned char) __builtin_ia32_kortestcsi (__A, __B);
+  *(__mmask32 *) __A = __builtin_ia32_kmovd (__B);
 }
 
-extern __inline unsigned char
+extern __inline __mmask32
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kortestc_mask64_u8 (__mmask64 __A, __mmask64 __B)
+_knot_mask32 (__mmask32 __A)
 {
-  return (unsigned char) __builtin_ia32_kortestcdi (__A, __B);
+  return (__mmask32) __builtin_ia32_knotsi ((__mmask32) __A);
 }
 
 extern __inline __mmask32
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kadd_mask32 (__mmask32 __A, __mmask32 __B)
+_kor_mask32 (__mmask32 __A, __mmask32 __B)
 {
-  return (__mmask32) __builtin_ia32_kaddsi ((__mmask32) __A, (__mmask32) __B);
+  return (__mmask32) __builtin_ia32_korsi ((__mmask32) __A, (__mmask32) __B);
 }
 
-extern __inline __mmask64
+extern __inline __mmask32
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kadd_mask64 (__mmask64 __A, __mmask64 __B)
+_kxnor_mask32 (__mmask32 __A, __mmask32 __B)
 {
-  return (__mmask64) __builtin_ia32_kadddi ((__mmask64) __A, (__mmask64) __B);
+  return (__mmask32) __builtin_ia32_kxnorsi ((__mmask32) __A, (__mmask32) __B);
 }
 
-extern __inline unsigned int
+extern __inline __mmask32
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_cvtmask32_u32 (__mmask32 __A)
+_kxor_mask32 (__mmask32 __A, __mmask32 __B)
 {
-  return (unsigned int) __builtin_ia32_kmovd ((__mmask32) __A);
+  return (__mmask32) __builtin_ia32_kxorsi ((__mmask32) __A, (__mmask32) __B);
 }
 
-extern __inline unsigned long long
+extern __inline __mmask32
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_cvtmask64_u64 (__mmask64 __A)
+_kand_mask32 (__mmask32 __A, __mmask32 __B)
 {
-  return (unsigned long long) __builtin_ia32_kmovq ((__mmask64) __A);
+  return (__mmask32) __builtin_ia32_kandsi ((__mmask32) __A, (__mmask32) __B);
 }
 
 extern __inline __mmask32
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_cvtu32_mask32 (unsigned int __A)
+_kandn_mask32 (__mmask32 __A, __mmask32 __B)
 {
-  return (__mmask32) __builtin_ia32_kmovd ((__mmask32) __A);
+  return (__mmask32) __builtin_ia32_kandnsi ((__mmask32) __A, (__mmask32) __B);
 }
 
-extern __inline __mmask64
+extern __inline __mmask32
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_cvtu64_mask64 (unsigned long long __A)
+_mm512_kunpackw (__mmask32 __A, __mmask32 __B)
 {
-  return (__mmask64) __builtin_ia32_kmovq ((__mmask64) __A);
+  return (__mmask32) __builtin_ia32_kunpcksi ((__mmask32) __A,
+					      (__mmask32) __B);
 }
 
 extern __inline __mmask32
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_load_mask32 (__mmask32 *__A)
+_kunpackw_mask32 (__mmask16 __A, __mmask16 __B)
 {
-  return (__mmask32) __builtin_ia32_kmovd (*__A);
+  return (__mmask32) __builtin_ia32_kunpcksi ((__mmask32) __A,
+					      (__mmask32) __B);
 }
 
-extern __inline __mmask64
+#ifdef __OPTIMIZE__
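+/* The immediate shift counts must be compile-time constants, so the
+   inline functions are only usable when optimizing; the macro forms
+   below cover the non-optimizing case.  */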
+extern __inline __mmask32
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_load_mask64 (__mmask64 *__A)
+_kshiftli_mask32 (__mmask32 __A, unsigned int __B)
 {
-  return (__mmask64) __builtin_ia32_kmovq (*(__mmask64 *) __A);
+  return (__mmask32) __builtin_ia32_kshiftlisi ((__mmask32) __A,
+						(__mmask8) __B);
 }
 
-extern __inline void
+extern __inline __mmask32
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_store_mask32 (__mmask32 *__A, __mmask32 __B)
+_kshiftri_mask32 (__mmask32 __A, unsigned int __B)
 {
-  *(__mmask32 *) __A = __builtin_ia32_kmovd (__B);
+  return (__mmask32) __builtin_ia32_kshiftrisi ((__mmask32) __A,
+						(__mmask8) __B);
 }
 
-extern __inline void
+#else
+#define _kshiftli_mask32(X, Y)							\
+  ((__mmask32) __builtin_ia32_kshiftlisi ((__mmask32)(X), (__mmask8)(Y)))
+
+#define _kshiftri_mask32(X, Y)							\
+  ((__mmask32) __builtin_ia32_kshiftrisi ((__mmask32)(X), (__mmask8)(Y)))
+
+#endif
+
+#ifdef __DISABLE_AVX512BW__
+#undef __DISABLE_AVX512BW__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX512BW__ */
+
+#if !defined (__AVX512BW__) || !defined (__EVEX512__)
+#pragma GCC push_options
+#pragma GCC target("avx512bw,evex512")
+#define __DISABLE_AVX512BW_512__
+#endif /* __AVX512BW_512__ */
+
+/* Internal data types for implementing the intrinsics.  */
+typedef short __v32hi __attribute__ ((__vector_size__ (64)));
+typedef short __v32hi_u __attribute__ ((__vector_size__ (64),	\
+					__may_alias__, __aligned__ (1)));
+typedef char __v64qi __attribute__ ((__vector_size__ (64)));
+typedef char __v64qi_u __attribute__ ((__vector_size__ (64),	\
+				       __may_alias__, __aligned__ (1)));
+
+typedef unsigned long long __mmask64;
+
+extern __inline unsigned char
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_store_mask64 (__mmask64 *__A, __mmask64 __B)
+_ktest_mask64_u8  (__mmask64 __A,  __mmask64 __B, unsigned char *__CF)
 {
-  *(__mmask64 *) __A = __builtin_ia32_kmovq (__B);
+  *__CF = (unsigned char) __builtin_ia32_ktestcdi (__A, __B);
+  return (unsigned char) __builtin_ia32_ktestzdi (__A, __B);
 }
 
-extern __inline __mmask32
+extern __inline unsigned char
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_knot_mask32 (__mmask32 __A)
+_ktestz_mask64_u8 (__mmask64 __A, __mmask64 __B)
 {
-  return (__mmask32) __builtin_ia32_knotsi ((__mmask32) __A);
+  return (unsigned char) __builtin_ia32_ktestzdi (__A, __B);
 }
 
-extern __inline __mmask64
+extern __inline unsigned char
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_knot_mask64 (__mmask64 __A)
+_ktestc_mask64_u8 (__mmask64 __A, __mmask64 __B)
 {
-  return (__mmask64) __builtin_ia32_knotdi ((__mmask64) __A);
+  return (unsigned char) __builtin_ia32_ktestcdi (__A, __B);
 }
 
-extern __inline __mmask32
+extern __inline unsigned char
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kor_mask32 (__mmask32 __A, __mmask32 __B)
+_kortest_mask64_u8  (__mmask64 __A,  __mmask64 __B, unsigned char *__CF)
 {
-  return (__mmask32) __builtin_ia32_korsi ((__mmask32) __A, (__mmask32) __B);
+  *__CF = (unsigned char) __builtin_ia32_kortestcdi (__A, __B);
+  return (unsigned char) __builtin_ia32_kortestzdi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask64_u8 (__mmask64 __A, __mmask64 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestzdi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask64_u8 (__mmask64 __A, __mmask64 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestcdi (__A, __B);
 }
 
 extern __inline __mmask64
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kor_mask64 (__mmask64 __A, __mmask64 __B)
+_kadd_mask64 (__mmask64 __A, __mmask64 __B)
 {
-  return (__mmask64) __builtin_ia32_kordi ((__mmask64) __A, (__mmask64) __B);
+  return (__mmask64) __builtin_ia32_kadddi ((__mmask64) __A, (__mmask64) __B);
 }
 
-extern __inline __mmask32
+extern __inline unsigned long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kxnor_mask32 (__mmask32 __A, __mmask32 __B)
+_cvtmask64_u64 (__mmask64 __A)
 {
-  return (__mmask32) __builtin_ia32_kxnorsi ((__mmask32) __A, (__mmask32) __B);
+  return (unsigned long long) __builtin_ia32_kmovq ((__mmask64) __A);
 }
 
 extern __inline __mmask64
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kxnor_mask64 (__mmask64 __A, __mmask64 __B)
+_cvtu64_mask64 (unsigned long long __A)
 {
-  return (__mmask64) __builtin_ia32_kxnordi ((__mmask64) __A, (__mmask64) __B);
+  return (__mmask64) __builtin_ia32_kmovq ((__mmask64) __A);
 }
 
-extern __inline __mmask32
+extern __inline __mmask64
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kxor_mask32 (__mmask32 __A, __mmask32 __B)
+_load_mask64 (__mmask64 *__A)
 {
-  return (__mmask32) __builtin_ia32_kxorsi ((__mmask32) __A, (__mmask32) __B);
+  return (__mmask64) __builtin_ia32_kmovq (*(__mmask64 *) __A);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask64 (__mmask64 *__A, __mmask64 __B)
+{
+  *(__mmask64 *) __A = __builtin_ia32_kmovq (__B);
 }
 
 extern __inline __mmask64
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kxor_mask64 (__mmask64 __A, __mmask64 __B)
+_knot_mask64 (__mmask64 __A)
 {
-  return (__mmask64) __builtin_ia32_kxordi ((__mmask64) __A, (__mmask64) __B);
+  return (__mmask64) __builtin_ia32_knotdi ((__mmask64) __A);
 }
 
-extern __inline __mmask32
+extern __inline __mmask64
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kand_mask32 (__mmask32 __A, __mmask32 __B)
+_kor_mask64 (__mmask64 __A, __mmask64 __B)
 {
-  return (__mmask32) __builtin_ia32_kandsi ((__mmask32) __A, (__mmask32) __B);
+  return (__mmask64) __builtin_ia32_kordi ((__mmask64) __A, (__mmask64) __B);
 }
 
 extern __inline __mmask64
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kand_mask64 (__mmask64 __A, __mmask64 __B)
+_kxnor_mask64 (__mmask64 __A, __mmask64 __B)
 {
-  return (__mmask64) __builtin_ia32_kanddi ((__mmask64) __A, (__mmask64) __B);
+  return (__mmask64) __builtin_ia32_kxnordi ((__mmask64) __A, (__mmask64) __B);
 }
 
-extern __inline __mmask32
+extern __inline __mmask64
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kandn_mask32 (__mmask32 __A, __mmask32 __B)
+_kxor_mask64 (__mmask64 __A, __mmask64 __B)
 {
-  return (__mmask32) __builtin_ia32_kandnsi ((__mmask32) __A, (__mmask32) __B);
+  return (__mmask64) __builtin_ia32_kxordi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kand_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kanddi ((__mmask64) __A, (__mmask64) __B);
 }
 
 extern __inline __mmask64
@@ -366,22 +419,6 @@ _mm512_maskz_mov_epi8 (__mmask64 __U, __m512i __A)
 						    (__mmask64) __U);
 }
 
-extern __inline __mmask32
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_kunpackw (__mmask32 __A, __mmask32 __B)
-{
-  return (__mmask32) __builtin_ia32_kunpcksi ((__mmask32) __A,
-					      (__mmask32) __B);
-}
-
-extern __inline __mmask32
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kunpackw_mask32 (__mmask16 __A, __mmask16 __B)
-{
-  return (__mmask32) __builtin_ia32_kunpcksi ((__mmask32) __A,
-					      (__mmask32) __B);
-}
-
 extern __inline __mmask64
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_kunpackd (__mmask64 __A, __mmask64 __B)
@@ -2776,14 +2813,6 @@ _mm512_mask_packus_epi32 (__m512i __W, __mmask32 __M, __m512i __A,
 }
 
 #ifdef __OPTIMIZE__
-extern __inline __mmask32
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kshiftli_mask32 (__mmask32 __A, unsigned int __B)
-{
-  return (__mmask32) __builtin_ia32_kshiftlisi ((__mmask32) __A,
-						(__mmask8) __B);
-}
-
 extern __inline __mmask64
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _kshiftli_mask64 (__mmask64 __A, unsigned int __B)
@@ -2792,14 +2821,6 @@ _kshiftli_mask64 (__mmask64 __A, unsigned int __B)
 						(__mmask8) __B);
 }
 
-extern __inline __mmask32
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kshiftri_mask32 (__mmask32 __A, unsigned int __B)
-{
-  return (__mmask32) __builtin_ia32_kshiftrisi ((__mmask32) __A,
-						(__mmask8) __B);
-}
-
 extern __inline __mmask64
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _kshiftri_mask64 (__mmask64 __A, unsigned int __B)
@@ -3145,15 +3166,9 @@ _mm512_bsrli_epi128 (__m512i __A, const int __N)
 }
 
 #else
-#define _kshiftli_mask32(X, Y)							\
-  ((__mmask32) __builtin_ia32_kshiftlisi ((__mmask32)(X), (__mmask8)(Y)))
-
 #define _kshiftli_mask64(X, Y)							\
   ((__mmask64) __builtin_ia32_kshiftlidi ((__mmask64)(X), (__mmask8)(Y)))
 
-#define _kshiftri_mask32(X, Y)							\
-  ((__mmask32) __builtin_ia32_kshiftrisi ((__mmask32)(X), (__mmask8)(Y)))
-
 #define _kshiftri_mask64(X, Y)							\
   ((__mmask64) __builtin_ia32_kshiftridi ((__mmask64)(X), (__mmask8)(Y)))
 
@@ -3328,9 +3343,9 @@ _mm512_bsrli_epi128 (__m512i __A, const int __N)
 
 #endif
 
-#ifdef __DISABLE_AVX512BW__
-#undef __DISABLE_AVX512BW__
+#ifdef __DISABLE_AVX512BW_512__
+#undef __DISABLE_AVX512BW_512__
 #pragma GCC pop_options
-#endif /* __DISABLE_AVX512BW__ */
+#endif /* __DISABLE_AVX512BW_512__ */
 
 #endif /* _AVX512BWINTRIN_H_INCLUDED */
-- 
2.31.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 05/18] [PATCH 4/5] Push evex512 target for 512 bit intrins
  2023-09-21  7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
                   ` (3 preceding siblings ...)
  2023-09-21  7:19 ` [PATCH 04/18] [PATCH 3/5] " Hu, Lin1
@ 2023-09-21  7:20 ` Hu, Lin1
  2023-09-21  7:20 ` [PATCH 06/18] [PATCH 5/5] " Hu, Lin1
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21  7:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang

From: Haochen Jiang <haochen.jiang@intel.com>

gcc/ChangeLog:

	* config.gcc: Add avx512bitalgvlintrin.h.
	* config/i386/avx5124fmapsintrin.h: Add evex512 target for 512 bit
	intrins.
	* config/i386/avx5124vnniwintrin.h: Ditto.
	* config/i386/avx512bf16intrin.h: Ditto.
	* config/i386/avx512bitalgintrin.h: Add evex512 target for 512 bit
	intrins. Split 128/256 bit intrins to avx512bitalgvlintrin.h.
	* config/i386/avx512erintrin.h: Add evex512 target for 512 bit
	intrins.
	* config/i386/avx512ifmaintrin.h: Ditto.
	* config/i386/avx512pfintrin.h: Ditto.
	* config/i386/avx512vbmi2intrin.h: Ditto.
	* config/i386/avx512vbmiintrin.h: Ditto.
	* config/i386/avx512vnniintrin.h: Ditto.
	* config/i386/avx512vp2intersectintrin.h: Ditto.
	* config/i386/avx512vpopcntdqintrin.h: Ditto.
	* config/i386/gfniintrin.h: Ditto.
	* config/i386/immintrin.h: Add avx512bitalgvlintrin.h.
	* config/i386/vaesintrin.h: Add evex512 target for 512 bit intrins.
	* config/i386/vpclmulqdqintrin.h: Ditto.
	* config/i386/avx512bitalgvlintrin.h: New.
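
A similar editorial sketch for the BITALG split (again not part of the
patch; the function names are assumed, as is no-evex512 in the target
attribute): the 128/256 bit forms, now provided by
avx512bitalgvlintrin.h via immintrin.h, keep working with avx512vl
alone, while the 512 bit forms require the new evex512 gate:

#include <immintrin.h>

/* VL forms only need avx512bitalg + avx512vl.  */
__attribute__ ((target ("avx512bitalg,avx512vl,no-evex512")))
__m256i
popcnt8_256 (__m256i __A)
{
  return _mm256_popcnt_epi8 (__A);
}

/* 512 bit forms stay in avx512bitalgintrin.h, gated on
   avx512bitalg,evex512.  */
__attribute__ ((target ("avx512bitalg,evex512")))
__m512i
popcnt8_512 (__m512i __A)
{
  return _mm512_popcnt_epi8 (__A);
}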
---
 gcc/config.gcc                             |  19 +--
 gcc/config/i386/avx5124fmapsintrin.h       |   2 +-
 gcc/config/i386/avx5124vnniwintrin.h       |   2 +-
 gcc/config/i386/avx512bf16intrin.h         |  31 ++--
 gcc/config/i386/avx512bitalgintrin.h       | 155 +-----------------
 gcc/config/i386/avx512bitalgvlintrin.h     | 180 +++++++++++++++++++++
 gcc/config/i386/avx512erintrin.h           |   2 +-
 gcc/config/i386/avx512ifmaintrin.h         |   4 +-
 gcc/config/i386/avx512pfintrin.h           |   2 +-
 gcc/config/i386/avx512vbmi2intrin.h        |   4 +-
 gcc/config/i386/avx512vbmiintrin.h         |   4 +-
 gcc/config/i386/avx512vnniintrin.h         |   4 +-
 gcc/config/i386/avx512vp2intersectintrin.h |   4 +-
 gcc/config/i386/avx512vpopcntdqintrin.h    |   4 +-
 gcc/config/i386/gfniintrin.h               |  76 +++++----
 gcc/config/i386/immintrin.h                |   2 +
 gcc/config/i386/vaesintrin.h               |   4 +-
 gcc/config/i386/vpclmulqdqintrin.h         |   4 +-
 18 files changed, 282 insertions(+), 221 deletions(-)
 create mode 100644 gcc/config/i386/avx512bitalgvlintrin.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index ce5def08e2e..e47e6893e1d 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -425,15 +425,16 @@ i[34567]86-*-* | x86_64-*-*)
 		       avx512vbmi2vlintrin.h avx512vnniintrin.h
 		       avx512vnnivlintrin.h vaesintrin.h vpclmulqdqintrin.h
 		       avx512vpopcntdqvlintrin.h avx512bitalgintrin.h
-		       pconfigintrin.h wbnoinvdintrin.h movdirintrin.h
-		       waitpkgintrin.h cldemoteintrin.h avx512bf16vlintrin.h
-		       avx512bf16intrin.h enqcmdintrin.h serializeintrin.h
-		       avx512vp2intersectintrin.h avx512vp2intersectvlintrin.h
-		       tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
-		       amxbf16intrin.h x86gprintrin.h uintrintrin.h
-		       hresetintrin.h keylockerintrin.h avxvnniintrin.h
-		       mwaitintrin.h avx512fp16intrin.h avx512fp16vlintrin.h
-		       avxifmaintrin.h avxvnniint8intrin.h avxneconvertintrin.h
+		       avx512bitalgvlintrin.h pconfigintrin.h wbnoinvdintrin.h
+		       movdirintrin.h waitpkgintrin.h cldemoteintrin.h
+		       avx512bf16vlintrin.h avx512bf16intrin.h enqcmdintrin.h
+		       serializeintrin.h avx512vp2intersectintrin.h
+		       avx512vp2intersectvlintrin.h tsxldtrkintrin.h
+		       amxtileintrin.h amxint8intrin.h amxbf16intrin.h
+		       x86gprintrin.h uintrintrin.h hresetintrin.h
+		       keylockerintrin.h avxvnniintrin.h mwaitintrin.h
+		       avx512fp16intrin.h avx512fp16vlintrin.h avxifmaintrin.h
+		       avxvnniint8intrin.h avxneconvertintrin.h
 		       cmpccxaddintrin.h amxfp16intrin.h prfchiintrin.h
 		       raointintrin.h amxcomplexintrin.h avxvnniint16intrin.h
 		       sm3intrin.h sha512intrin.h sm4intrin.h"
diff --git a/gcc/config/i386/avx5124fmapsintrin.h b/gcc/config/i386/avx5124fmapsintrin.h
index 97dd77c9235..4c884a5c203 100644
--- a/gcc/config/i386/avx5124fmapsintrin.h
+++ b/gcc/config/i386/avx5124fmapsintrin.h
@@ -30,7 +30,7 @@
 
 #ifndef __AVX5124FMAPS__
 #pragma GCC push_options
-#pragma GCC target("avx5124fmaps")
+#pragma GCC target("avx5124fmaps,evex512")
 #define __DISABLE_AVX5124FMAPS__
 #endif /* __AVX5124FMAPS__ */
 
diff --git a/gcc/config/i386/avx5124vnniwintrin.h b/gcc/config/i386/avx5124vnniwintrin.h
index fd129589798..795e4814f28 100644
--- a/gcc/config/i386/avx5124vnniwintrin.h
+++ b/gcc/config/i386/avx5124vnniwintrin.h
@@ -30,7 +30,7 @@
 
 #ifndef __AVX5124VNNIW__
 #pragma GCC push_options
-#pragma GCC target("avx5124vnniw")
+#pragma GCC target("avx5124vnniw,evex512")
 #define __DISABLE_AVX5124VNNIW__
 #endif /* __AVX5124VNNIW__ */
 
diff --git a/gcc/config/i386/avx512bf16intrin.h b/gcc/config/i386/avx512bf16intrin.h
index 107f4a448f6..94ccbf6389f 100644
--- a/gcc/config/i386/avx512bf16intrin.h
+++ b/gcc/config/i386/avx512bf16intrin.h
@@ -34,13 +34,6 @@
 #define __DISABLE_AVX512BF16__
 #endif /* __AVX512BF16__ */
 
-/* Internal data types for implementing the intrinsics.  */
-typedef __bf16 __v32bf __attribute__ ((__vector_size__ (64)));
-
-/* The Intel API is flexible enough that we must allow aliasing with other
-   vector types, and their scalar components.  */
-typedef __bf16 __m512bh __attribute__ ((__vector_size__ (64), __may_alias__));
-
 /* Convert One BF16 Data to One Single Float Data.  */
 extern __inline float
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
@@ -49,6 +42,24 @@ _mm_cvtsbh_ss (__bf16 __A)
   return __builtin_ia32_cvtbf2sf (__A);
 }
 
+#ifdef __DISABLE_AVX512BF16__
+#undef __DISABLE_AVX512BF16__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX512BF16__ */
+
+#if !defined (__AVX512BF16__) || !defined (__EVEX512__)
+#pragma GCC push_options
+#pragma GCC target("avx512bf16,evex512")
+#define __DISABLE_AVX512BF16_512__
+#endif /* __AVX512BF16_512__ */
+
+/* Internal data types for implementing the intrinsics.  */
+typedef __bf16 __v32bf __attribute__ ((__vector_size__ (64)));
+
+/* The Intel API is flexible enough that we must allow aliasing with other
+   vector types, and their scalar components.  */
+typedef __bf16 __m512bh __attribute__ ((__vector_size__ (64), __may_alias__));
+
 /* vcvtne2ps2bf16 */
 
 extern __inline __m512bh
@@ -144,9 +155,9 @@ _mm512_mask_cvtpbh_ps (__m512 __S, __mmask16 __U, __m256bh __A)
 	 (__m512i)_mm512_cvtepi16_epi32 ((__m256i)__A), 16)));
 }
 
-#ifdef __DISABLE_AVX512BF16__
-#undef __DISABLE_AVX512BF16__
+#ifdef __DISABLE_AVX512BF16_512__
+#undef __DISABLE_AVX512BF16_512__
 #pragma GCC pop_options
-#endif /* __DISABLE_AVX512BF16__ */
+#endif /* __DISABLE_AVX512BF16_512__ */
 
 #endif /* _AVX512BF16INTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/avx512bitalgintrin.h b/gcc/config/i386/avx512bitalgintrin.h
index a1c7be109a9..af8514f5838 100644
--- a/gcc/config/i386/avx512bitalgintrin.h
+++ b/gcc/config/i386/avx512bitalgintrin.h
@@ -22,15 +22,15 @@
    <http://www.gnu.org/licenses/>.  */
 
 #if !defined _IMMINTRIN_H_INCLUDED
-# error "Never use <avx512bitalgintrin.h> directly; include <x86intrin.h> instead."
+# error "Never use <avx512bitalgintrin.h> directly; include <immintrin.h> instead."
 #endif
 
 #ifndef _AVX512BITALGINTRIN_H_INCLUDED
 #define _AVX512BITALGINTRIN_H_INCLUDED
 
-#ifndef __AVX512BITALG__
+#if !defined (__AVX512BITALG__) || !defined (__EVEX512__)
 #pragma GCC push_options
-#pragma GCC target("avx512bitalg")
+#pragma GCC target("avx512bitalg,evex512")
 #define __DISABLE_AVX512BITALG__
 #endif /* __AVX512BITALG__ */
 
@@ -108,153 +108,4 @@ _mm512_mask_bitshuffle_epi64_mask (__mmask64 __M, __m512i __A, __m512i __B)
 #pragma GCC pop_options
 #endif /* __DISABLE_AVX512BITALG__ */
 
-#if !defined(__AVX512BITALG__) || !defined(__AVX512VL__)
-#pragma GCC push_options
-#pragma GCC target("avx512bitalg,avx512vl")
-#define __DISABLE_AVX512BITALGVL__
-#endif /* __AVX512BITALGVL__ */
-
-extern __inline __m256i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_mask_popcnt_epi8 (__m256i __W, __mmask32 __U, __m256i __A)
-{
-  return (__m256i) __builtin_ia32_vpopcountb_v32qi_mask ((__v32qi) __A,
-							 (__v32qi) __W,
-							 (__mmask32) __U);
-}
-
-extern __inline __m256i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_maskz_popcnt_epi8 (__mmask32 __U, __m256i __A)
-{
-  return (__m256i) __builtin_ia32_vpopcountb_v32qi_mask ((__v32qi) __A,
-						(__v32qi)
-						 _mm256_setzero_si256 (),
-						(__mmask32) __U);
-}
-
-extern __inline __mmask32
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_bitshuffle_epi64_mask (__m256i __A, __m256i __B)
-{
-  return (__mmask32) __builtin_ia32_vpshufbitqmb256_mask ((__v32qi) __A,
-						 (__v32qi) __B,
-						 (__mmask32) -1);
-}
-
-extern __inline __mmask32
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_mask_bitshuffle_epi64_mask (__mmask32 __M, __m256i __A, __m256i __B)
-{
-  return (__mmask32) __builtin_ia32_vpshufbitqmb256_mask ((__v32qi) __A,
-						 (__v32qi) __B,
-						 (__mmask32) __M);
-}
-
-extern __inline __mmask16
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_bitshuffle_epi64_mask (__m128i __A, __m128i __B)
-{
-  return (__mmask16) __builtin_ia32_vpshufbitqmb128_mask ((__v16qi) __A,
-						 (__v16qi) __B,
-						 (__mmask16) -1);
-}
-
-extern __inline __mmask16
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_bitshuffle_epi64_mask (__mmask16 __M, __m128i __A, __m128i __B)
-{
-  return (__mmask16) __builtin_ia32_vpshufbitqmb128_mask ((__v16qi) __A,
-						 (__v16qi) __B,
-						 (__mmask16) __M);
-}
-
-extern __inline __m256i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_popcnt_epi8 (__m256i __A)
-{
-  return (__m256i) __builtin_ia32_vpopcountb_v32qi ((__v32qi) __A);
-}
-
-extern __inline __m256i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_popcnt_epi16 (__m256i __A)
-{
-  return (__m256i) __builtin_ia32_vpopcountw_v16hi ((__v16hi) __A);
-}
-
-extern __inline __m128i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_popcnt_epi8 (__m128i __A)
-{
-  return (__m128i) __builtin_ia32_vpopcountb_v16qi ((__v16qi) __A);
-}
-
-extern __inline __m128i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_popcnt_epi16 (__m128i __A)
-{
-  return (__m128i) __builtin_ia32_vpopcountw_v8hi ((__v8hi) __A);
-}
-
-extern __inline __m256i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_mask_popcnt_epi16 (__m256i __W, __mmask16 __U, __m256i __A)
-{
-  return (__m256i) __builtin_ia32_vpopcountw_v16hi_mask ((__v16hi) __A,
-							(__v16hi) __W,
-							(__mmask16) __U);
-}
-
-extern __inline __m256i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_maskz_popcnt_epi16 (__mmask16 __U, __m256i __A)
-{
-  return (__m256i) __builtin_ia32_vpopcountw_v16hi_mask ((__v16hi) __A,
-						(__v16hi)
-						_mm256_setzero_si256 (),
-						(__mmask16) __U);
-}
-
-extern __inline __m128i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_popcnt_epi8 (__m128i __W, __mmask16 __U, __m128i __A)
-{
-  return (__m128i) __builtin_ia32_vpopcountb_v16qi_mask ((__v16qi) __A,
-							 (__v16qi) __W,
-							 (__mmask16) __U);
-}
-
-extern __inline __m128i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_popcnt_epi8 (__mmask16 __U, __m128i __A)
-{
-  return (__m128i) __builtin_ia32_vpopcountb_v16qi_mask ((__v16qi) __A,
-							 (__v16qi)
-							 _mm_setzero_si128 (),
-							 (__mmask16) __U);
-}
-extern __inline __m128i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_popcnt_epi16 (__m128i __W, __mmask8 __U, __m128i __A)
-{
-  return (__m128i) __builtin_ia32_vpopcountw_v8hi_mask ((__v8hi) __A,
-							(__v8hi) __W,
-							(__mmask8) __U);
-}
-
-extern __inline __m128i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_popcnt_epi16 (__mmask8 __U, __m128i __A)
-{
-  return (__m128i) __builtin_ia32_vpopcountw_v8hi_mask ((__v8hi) __A,
-							(__v8hi)
-							_mm_setzero_si128 (),
-							(__mmask8) __U);
-}
-#ifdef __DISABLE_AVX512BITALGVL__
-#undef __DISABLE_AVX512BITALGVL__
-#pragma GCC pop_options
-#endif /* __DISABLE_AVX512BITALGVL__ */
-
 #endif /* _AVX512BITALGINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/avx512bitalgvlintrin.h b/gcc/config/i386/avx512bitalgvlintrin.h
new file mode 100644
index 00000000000..36d697dea8a
--- /dev/null
+++ b/gcc/config/i386/avx512bitalgvlintrin.h
@@ -0,0 +1,180 @@
+/* Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if !defined _IMMINTRIN_H_INCLUDED
+# error "Never use <avx512bitalgvlintrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _AVX512BITALGVLINTRIN_H_INCLUDED
+#define _AVX512BITALGVLINTRIN_H_INCLUDED
+
+#if !defined(__AVX512BITALG__) || !defined(__AVX512VL__)
+#pragma GCC push_options
+#pragma GCC target("avx512bitalg,avx512vl")
+#define __DISABLE_AVX512BITALGVL__
+#endif /* __AVX512BITALGVL__ */
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_popcnt_epi8 (__m256i __W, __mmask32 __U, __m256i __A)
+{
+  return (__m256i) __builtin_ia32_vpopcountb_v32qi_mask ((__v32qi) __A,
+							 (__v32qi) __W,
+							 (__mmask32) __U);
+}
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_popcnt_epi8 (__mmask32 __U, __m256i __A)
+{
+  return (__m256i) __builtin_ia32_vpopcountb_v32qi_mask ((__v32qi) __A,
+						(__v32qi)
+						 _mm256_setzero_si256 (),
+						(__mmask32) __U);
+}
+
+extern __inline __mmask32
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_bitshuffle_epi64_mask (__m256i __A, __m256i __B)
+{
+  return (__mmask32) __builtin_ia32_vpshufbitqmb256_mask ((__v32qi) __A,
+						 (__v32qi) __B,
+						 (__mmask32) -1);
+}
+
+extern __inline __mmask32
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_bitshuffle_epi64_mask (__mmask32 __M, __m256i __A, __m256i __B)
+{
+  return (__mmask32) __builtin_ia32_vpshufbitqmb256_mask ((__v32qi) __A,
+						 (__v32qi) __B,
+						 (__mmask32) __M);
+}
+
+extern __inline __mmask16
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_bitshuffle_epi64_mask (__m128i __A, __m128i __B)
+{
+  return (__mmask16) __builtin_ia32_vpshufbitqmb128_mask ((__v16qi) __A,
+						 (__v16qi) __B,
+						 (__mmask16) -1);
+}
+
+extern __inline __mmask16
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_bitshuffle_epi64_mask (__mmask16 __M, __m128i __A, __m128i __B)
+{
+  return (__mmask16) __builtin_ia32_vpshufbitqmb128_mask ((__v16qi) __A,
+						 (__v16qi) __B,
+						 (__mmask16) __M);
+}
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_popcnt_epi8 (__m256i __A)
+{
+  return (__m256i) __builtin_ia32_vpopcountb_v32qi ((__v32qi) __A);
+}
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_popcnt_epi16 (__m256i __A)
+{
+  return (__m256i) __builtin_ia32_vpopcountw_v16hi ((__v16hi) __A);
+}
+
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_popcnt_epi8 (__m128i __A)
+{
+  return (__m128i) __builtin_ia32_vpopcountb_v16qi ((__v16qi) __A);
+}
+
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_popcnt_epi16 (__m128i __A)
+{
+  return (__m128i) __builtin_ia32_vpopcountw_v8hi ((__v8hi) __A);
+}
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_popcnt_epi16 (__m256i __W, __mmask16 __U, __m256i __A)
+{
+  return (__m256i) __builtin_ia32_vpopcountw_v16hi_mask ((__v16hi) __A,
+							(__v16hi) __W,
+							(__mmask16) __U);
+}
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_popcnt_epi16 (__mmask16 __U, __m256i __A)
+{
+  return (__m256i) __builtin_ia32_vpopcountw_v16hi_mask ((__v16hi) __A,
+						(__v16hi)
+						_mm256_setzero_si256 (),
+						(__mmask16) __U);
+}
+
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_popcnt_epi8 (__m128i __W, __mmask16 __U, __m128i __A)
+{
+  return (__m128i) __builtin_ia32_vpopcountb_v16qi_mask ((__v16qi) __A,
+							 (__v16qi) __W,
+							 (__mmask16) __U);
+}
+
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_popcnt_epi8 (__mmask16 __U, __m128i __A)
+{
+  return (__m128i) __builtin_ia32_vpopcountb_v16qi_mask ((__v16qi) __A,
+							 (__v16qi)
+							 _mm_setzero_si128 (),
+							 (__mmask16) __U);
+}
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_popcnt_epi16 (__m128i __W, __mmask8 __U, __m128i __A)
+{
+  return (__m128i) __builtin_ia32_vpopcountw_v8hi_mask ((__v8hi) __A,
+							(__v8hi) __W,
+							(__mmask8) __U);
+}
+
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_popcnt_epi16 (__mmask8 __U, __m128i __A)
+{
+  return (__m128i) __builtin_ia32_vpopcountw_v8hi_mask ((__v8hi) __A,
+							(__v8hi)
+							_mm_setzero_si128 (),
+							(__mmask8) __U);
+}
+#ifdef __DISABLE_AVX512BITALGVL__
+#undef __DISABLE_AVX512BITALGVL__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX512BITALGVL__ */
+
+#endif /* _AVX512BITALGVLINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/avx512erintrin.h b/gcc/config/i386/avx512erintrin.h
index bd83b7fbbc6..5c7be9c47ac 100644
--- a/gcc/config/i386/avx512erintrin.h
+++ b/gcc/config/i386/avx512erintrin.h
@@ -30,7 +30,7 @@
 
 #ifndef __AVX512ER__
 #pragma GCC push_options
-#pragma GCC target("avx512er")
+#pragma GCC target("avx512er,evex512")
 #define __DISABLE_AVX512ER__
 #endif /* __AVX512ER__ */
 
diff --git a/gcc/config/i386/avx512ifmaintrin.h b/gcc/config/i386/avx512ifmaintrin.h
index fc97f1defe8..e08078b2725 100644
--- a/gcc/config/i386/avx512ifmaintrin.h
+++ b/gcc/config/i386/avx512ifmaintrin.h
@@ -28,9 +28,9 @@
 #ifndef _AVX512IFMAINTRIN_H_INCLUDED
 #define _AVX512IFMAINTRIN_H_INCLUDED
 
-#ifndef __AVX512IFMA__
+#if !defined (__AVX512IFMA__) || !defined (__EVEX512__)
 #pragma GCC push_options
-#pragma GCC target("avx512ifma")
+#pragma GCC target("avx512ifma,evex512")
 #define __DISABLE_AVX512IFMA__
 #endif /* __AVX512IFMA__ */
 
diff --git a/gcc/config/i386/avx512pfintrin.h b/gcc/config/i386/avx512pfintrin.h
index a547610660a..58af26ff02e 100644
--- a/gcc/config/i386/avx512pfintrin.h
+++ b/gcc/config/i386/avx512pfintrin.h
@@ -30,7 +30,7 @@
 
 #ifndef __AVX512PF__
 #pragma GCC push_options
-#pragma GCC target("avx512pf")
+#pragma GCC target("avx512pf,evex512")
 #define __DISABLE_AVX512PF__
 #endif /* __AVX512PF__ */
 
diff --git a/gcc/config/i386/avx512vbmi2intrin.h b/gcc/config/i386/avx512vbmi2intrin.h
index ca00f8a5f14..b7ff07b2d11 100644
--- a/gcc/config/i386/avx512vbmi2intrin.h
+++ b/gcc/config/i386/avx512vbmi2intrin.h
@@ -28,9 +28,9 @@
 #ifndef __AVX512VBMI2INTRIN_H_INCLUDED
 #define __AVX512VBMI2INTRIN_H_INCLUDED
 
-#if !defined(__AVX512VBMI2__)
+#if !defined(__AVX512VBMI2__) || !defined (__EVEX512__)
 #pragma GCC push_options
-#pragma GCC target("avx512vbmi2")
+#pragma GCC target("avx512vbmi2,evex512")
 #define __DISABLE_AVX512VBMI2__
 #endif /* __AVX512VBMI2__ */
 
diff --git a/gcc/config/i386/avx512vbmiintrin.h b/gcc/config/i386/avx512vbmiintrin.h
index 502586090ae..1a7ab4edca3 100644
--- a/gcc/config/i386/avx512vbmiintrin.h
+++ b/gcc/config/i386/avx512vbmiintrin.h
@@ -28,9 +28,9 @@
 #ifndef _AVX512VBMIINTRIN_H_INCLUDED
 #define _AVX512VBMIINTRIN_H_INCLUDED
 
-#ifndef __AVX512VBMI__
+#if !defined (__AVX512VBMI__) || !defined (__EVEX512__)
 #pragma GCC push_options
-#pragma GCC target("avx512vbmi")
+#pragma GCC target("avx512vbmi,evex512")
 #define __DISABLE_AVX512VBMI__
 #endif /* __AVX512VBMI__ */
 
diff --git a/gcc/config/i386/avx512vnniintrin.h b/gcc/config/i386/avx512vnniintrin.h
index e36e2e57f21..1090703ec48 100644
--- a/gcc/config/i386/avx512vnniintrin.h
+++ b/gcc/config/i386/avx512vnniintrin.h
@@ -28,9 +28,9 @@
 #ifndef __AVX512VNNIINTRIN_H_INCLUDED
 #define __AVX512VNNIINTRIN_H_INCLUDED
 
-#if !defined(__AVX512VNNI__)
+#if !defined(__AVX512VNNI__) || !defined (__EVEX512__)
 #pragma GCC push_options
-#pragma GCC target("avx512vnni")
+#pragma GCC target("avx512vnni,evex512")
 #define __DISABLE_AVX512VNNI__
 #endif /* __AVX512VNNI__ */
 
diff --git a/gcc/config/i386/avx512vp2intersectintrin.h b/gcc/config/i386/avx512vp2intersectintrin.h
index 65e2fb1abf5..bf68245155d 100644
--- a/gcc/config/i386/avx512vp2intersectintrin.h
+++ b/gcc/config/i386/avx512vp2intersectintrin.h
@@ -28,9 +28,9 @@
 #ifndef _AVX512VP2INTERSECTINTRIN_H_INCLUDED
 #define _AVX512VP2INTERSECTINTRIN_H_INCLUDED
 
-#if !defined(__AVX512VP2INTERSECT__)
+#if !defined(__AVX512VP2INTERSECT__) || !defined (__EVEX512__)
 #pragma GCC push_options
-#pragma GCC target("avx512vp2intersect")
+#pragma GCC target("avx512vp2intersect,evex512")
 #define __DISABLE_AVX512VP2INTERSECT__
 #endif /* __AVX512VP2INTERSECT__ */
 
diff --git a/gcc/config/i386/avx512vpopcntdqintrin.h b/gcc/config/i386/avx512vpopcntdqintrin.h
index 47897fbd8d7..9470a403f8e 100644
--- a/gcc/config/i386/avx512vpopcntdqintrin.h
+++ b/gcc/config/i386/avx512vpopcntdqintrin.h
@@ -28,9 +28,9 @@
 #ifndef _AVX512VPOPCNTDQINTRIN_H_INCLUDED
 #define _AVX512VPOPCNTDQINTRIN_H_INCLUDED
 
-#ifndef __AVX512VPOPCNTDQ__
+#if !defined (__AVX512VPOPCNTDQ__) || !defined (__EVEX512__)
 #pragma GCC push_options
-#pragma GCC target("avx512vpopcntdq")
+#pragma GCC target("avx512vpopcntdq,evex512")
 #define __DISABLE_AVX512VPOPCNTDQ__
 #endif /* __AVX512VPOPCNTDQ__ */
 
diff --git a/gcc/config/i386/gfniintrin.h b/gcc/config/i386/gfniintrin.h
index ef3dc225b40..907e7a0cf7a 100644
--- a/gcc/config/i386/gfniintrin.h
+++ b/gcc/config/i386/gfniintrin.h
@@ -297,9 +297,53 @@ _mm256_maskz_gf2p8affine_epi64_epi8 (__mmask32 __A, __m256i __B,
 #pragma GCC pop_options
 #endif /* __GFNIAVX512VLBW__ */
 
-#if !defined(__GFNI__) || !defined(__AVX512F__) || !defined(__AVX512BW__)
+#if !defined(__GFNI__) || !defined(__EVEX512__) || !defined(__AVX512F__)
 #pragma GCC push_options
-#pragma GCC target("gfni,avx512f,avx512bw")
+#pragma GCC target("gfni,avx512f,evex512")
+#define __DISABLE_GFNIAVX512F__
+#endif /* __GFNIAVX512F__ */
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_gf2p8mul_epi8 (__m512i __A, __m512i __B)
+{
+  return (__m512i) __builtin_ia32_vgf2p8mulb_v64qi ((__v64qi) __A,
+						    (__v64qi) __B);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_gf2p8affineinv_epi64_epi8 (__m512i __A, __m512i __B, const int __C)
+{
+  return (__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi ((__v64qi) __A,
+							   (__v64qi) __B, __C);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_gf2p8affine_epi64_epi8 (__m512i __A, __m512i __B, const int __C)
+{
+  return (__m512i) __builtin_ia32_vgf2p8affineqb_v64qi ((__v64qi) __A,
+							(__v64qi) __B, __C);
+}
+#else
+#define _mm512_gf2p8affineinv_epi64_epi8(A, B, C)			\
+  ((__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi (			\
+	(__v64qi)(__m512i)(A), (__v64qi)(__m512i)(B), (int)(C)))
+#define _mm512_gf2p8affine_epi64_epi8(A, B, C)				    \
+  ((__m512i) __builtin_ia32_vgf2p8affineqb_v64qi ((__v64qi)(__m512i)(A),    \
+	 (__v64qi)(__m512i)(B), (int)(C)))
+#endif
+
+#ifdef __DISABLE_GFNIAVX512F__
+#undef __DISABLE_GFNIAVX512F__
+#pragma GCC pop_options
+#endif /* __GFNIAVX512F__ */
+
+#if !defined(__GFNI__) || !defined(__EVEX512__) || !defined(__AVX512BW__)
+#pragma GCC push_options
+#pragma GCC target("gfni,avx512bw,evex512")
 #define __DISABLE_GFNIAVX512FBW__
 #endif /* __GFNIAVX512FBW__ */
 
@@ -319,13 +363,6 @@ _mm512_maskz_gf2p8mul_epi8 (__mmask64 __A, __m512i __B, __m512i __C)
   return (__m512i) __builtin_ia32_vgf2p8mulb_v64qi_mask ((__v64qi) __B,
 			(__v64qi) __C, (__v64qi) _mm512_setzero_si512 (), __A);
 }
-extern __inline __m512i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_gf2p8mul_epi8 (__m512i __A, __m512i __B)
-{
-  return (__m512i) __builtin_ia32_vgf2p8mulb_v64qi ((__v64qi) __A,
-						    (__v64qi) __B);
-}
 
 #ifdef __OPTIMIZE__
 extern __inline __m512i
@@ -350,14 +387,6 @@ _mm512_maskz_gf2p8affineinv_epi64_epi8 (__mmask64 __A, __m512i __B,
 				(__v64qi) _mm512_setzero_si512 (), __A);
 }
 
-extern __inline __m512i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_gf2p8affineinv_epi64_epi8 (__m512i __A, __m512i __B, const int __C)
-{
-  return (__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi ((__v64qi) __A,
-							   (__v64qi) __B, __C);
-}
-
 extern __inline __m512i
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_gf2p8affine_epi64_epi8 (__m512i __A, __mmask64 __B, __m512i __C,
@@ -375,13 +404,6 @@ _mm512_maskz_gf2p8affine_epi64_epi8 (__mmask64 __A, __m512i __B, __m512i __C,
   return (__m512i) __builtin_ia32_vgf2p8affineqb_v64qi_mask ((__v64qi) __B,
 		  (__v64qi) __C, __D, (__v64qi) _mm512_setzero_si512 (), __A);
 }
-extern __inline __m512i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_gf2p8affine_epi64_epi8 (__m512i __A, __m512i __B, const int __C)
-{
-  return (__m512i) __builtin_ia32_vgf2p8affineqb_v64qi ((__v64qi) __A,
-							(__v64qi) __B, __C);
-}
 #else
 #define _mm512_mask_gf2p8affineinv_epi64_epi8(A, B, C, D, E) 		\
   ((__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi_mask(		\
@@ -391,9 +413,6 @@ _mm512_gf2p8affine_epi64_epi8 (__m512i __A, __m512i __B, const int __C)
   ((__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi_mask(		\
 	(__v64qi)(__m512i)(B), (__v64qi)(__m512i)(C), (int)(D),		\
 	(__v64qi)(__m512i) _mm512_setzero_si512 (), (__mmask64)(A)))
-#define _mm512_gf2p8affineinv_epi64_epi8(A, B, C)			\
-  ((__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi (			\
-	(__v64qi)(__m512i)(A), (__v64qi)(__m512i)(B), (int)(C)))
 #define _mm512_mask_gf2p8affine_epi64_epi8(A, B, C, D, E)		    \
   ((__m512i) __builtin_ia32_vgf2p8affineqb_v64qi_mask((__v64qi)(__m512i)(C),\
      (__v64qi)(__m512i)(D), (int)(E), (__v64qi)(__m512i)(A), (__mmask64)(B)))
@@ -401,9 +420,6 @@ _mm512_gf2p8affine_epi64_epi8 (__m512i __A, __m512i __B, const int __C)
   ((__m512i) __builtin_ia32_vgf2p8affineqb_v64qi_mask((__v64qi)(__m512i)(B),\
 	 (__v64qi)(__m512i)(C), (int)(D),				    \
 	 (__v64qi)(__m512i) _mm512_setzero_si512 (), (__mmask64)(A)))
-#define _mm512_gf2p8affine_epi64_epi8(A, B, C)				    \
-  ((__m512i) __builtin_ia32_vgf2p8affineqb_v64qi ((__v64qi)(__m512i)(A),    \
-	 (__v64qi)(__m512i)(B), (int)(C)))
 #endif
 
 #ifdef __DISABLE_GFNIAVX512FBW__
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
index 29b4dbbda24..4e17901db15 100644
--- a/gcc/config/i386/immintrin.h
+++ b/gcc/config/i386/immintrin.h
@@ -96,6 +96,8 @@
 
 #include <avx512bitalgintrin.h>
 
+#include <avx512bitalgvlintrin.h>
+
 #include <avx512vp2intersectintrin.h>
 
 #include <avx512vp2intersectvlintrin.h>
diff --git a/gcc/config/i386/vaesintrin.h b/gcc/config/i386/vaesintrin.h
index 58fc19c9eb3..b2bcdbe5bd1 100644
--- a/gcc/config/i386/vaesintrin.h
+++ b/gcc/config/i386/vaesintrin.h
@@ -66,9 +66,9 @@ _mm256_aesenclast_epi128 (__m256i __A, __m256i __B)
 #endif /* __DISABLE_VAES__ */
 
 
-#if !defined(__VAES__) || !defined(__AVX512F__)
+#if !defined(__VAES__) || !defined(__AVX512F__) || !defined(__EVEX512__)
 #pragma GCC push_options
-#pragma GCC target("vaes,avx512f")
+#pragma GCC target("vaes,avx512f,evex512")
 #define __DISABLE_VAESF__
 #endif /* __VAES__ */
 
diff --git a/gcc/config/i386/vpclmulqdqintrin.h b/gcc/config/i386/vpclmulqdqintrin.h
index 2c83b6037a0..c8c2c19d33f 100644
--- a/gcc/config/i386/vpclmulqdqintrin.h
+++ b/gcc/config/i386/vpclmulqdqintrin.h
@@ -28,9 +28,9 @@
 #ifndef _VPCLMULQDQINTRIN_H_INCLUDED
 #define _VPCLMULQDQINTRIN_H_INCLUDED
 
-#if !defined(__VPCLMULQDQ__) || !defined(__AVX512F__)
+#if !defined(__VPCLMULQDQ__) || !defined(__AVX512F__) || !defined(__EVEX512__)
 #pragma GCC push_options
-#pragma GCC target("vpclmulqdq,avx512f")
+#pragma GCC target("vpclmulqdq,avx512f,evex512")
 #define __DISABLE_VPCLMULQDQF__
 #endif /* __VPCLMULQDQF__ */
 
-- 
2.31.1


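As a quick usage sketch for the new avx512bitalgvlintrin.h above (the
-mno-evex512 flag behavior is assumed from patch 1 of this series, and
t.c is a made-up file name), the 128/256-bit BITALG intrinsics stay
available without 512-bit register support:

/* t.c: gcc -O2 -mavx512bitalg -mavx512vl -mno-evex512 -S t.c  */
#include <immintrin.h>

__m256i
popcnt_bytes (__m256i x)
{
  /* 256-bit vpopcntb; only needs avx512bitalg + avx512vl.  */
  return _mm256_popcnt_epi8 (x);
}

__mmask16
shuffle_bits (__m128i a, __m128i b)
{
  /* 128-bit vpshufbitqmb.  */
  return _mm_bitshuffle_epi64_mask (a, b);
}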

* [PATCH 06/18] [PATCH 5/5] Push evex512 target for 512 bit intrins
  2023-09-21  7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
                   ` (4 preceding siblings ...)
  2023-09-21  7:20 ` [PATCH 05/18] [PATCH 4/5] " Hu, Lin1
@ 2023-09-21  7:20 ` Hu, Lin1
  2023-09-21  7:20 ` [PATCH 07/18] [PATCH 1/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins Hu, Lin1
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21  7:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang

From: Haochen Jiang <haochen.jiang@intel.com>

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h: Add evex512 target for 512 bit
	intrins.

Co-authored-by: Hu, Lin1 <lin1.hu@intel.com>
---
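Note for review: the _mm512_* intrinsics removed below are not dropped;
they move into a second guarded region later in the header, following
the same split used for the other headers in this series. Roughly (a
sketch only — the exact __DISABLE_* macro name for the 512-bit region
is assumed, not taken from this hunk):

#ifndef __AVX512FP16__
#pragma GCC push_options
#pragma GCC target ("avx512fp16")
#define __DISABLE_AVX512FP16__
#endif
/* ... scalar and 128/256-bit intrinsics ... */
#ifdef __DISABLE_AVX512FP16__
#undef __DISABLE_AVX512FP16__
#pragma GCC pop_options
#endif

#if !defined (__AVX512FP16__) || !defined (__EVEX512__)
#pragma GCC push_options
#pragma GCC target ("avx512fp16,evex512")
#define __DISABLE_AVX512FP16_512__
#endif
/* ... 512-bit (_mm512_*) intrinsics ... */
#ifdef __DISABLE_AVX512FP16_512__
#undef __DISABLE_AVX512FP16_512__
#pragma GCC pop_options
#endif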
 gcc/config/i386/avx512fp16intrin.h | 8925 ++++++++++++++--------------
 1 file changed, 4476 insertions(+), 4449 deletions(-)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index dd083e5ed67..92c0c24e9bd 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -25,8 +25,8 @@
 #error "Never use <avx512fp16intrin.h> directly; include <immintrin.h> instead."
 #endif
 
-#ifndef __AVX512FP16INTRIN_H_INCLUDED
-#define __AVX512FP16INTRIN_H_INCLUDED
+#ifndef _AVX512FP16INTRIN_H_INCLUDED
+#define _AVX512FP16INTRIN_H_INCLUDED
 
 #ifndef __AVX512FP16__
 #pragma GCC push_options
@@ -37,21 +37,17 @@
 /* Internal data types for implementing the intrinsics.  */
 typedef _Float16 __v8hf __attribute__ ((__vector_size__ (16)));
 typedef _Float16 __v16hf __attribute__ ((__vector_size__ (32)));
-typedef _Float16 __v32hf __attribute__ ((__vector_size__ (64)));
 
 /* The Intel API is flexible enough that we must allow aliasing with other
    vector types, and their scalar components.  */
 typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__));
 typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__));
-typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__));
 
 /* Unaligned version of the same type.  */
 typedef _Float16 __m128h_u __attribute__ ((__vector_size__ (16),	\
 					   __may_alias__, __aligned__ (1)));
 typedef _Float16 __m256h_u __attribute__ ((__vector_size__ (32),	\
 					   __may_alias__, __aligned__ (1)));
-typedef _Float16 __m512h_u __attribute__ ((__vector_size__ (64),	\
-					   __may_alias__, __aligned__ (1)));
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
@@ -78,33 +74,8 @@ _mm256_set_ph (_Float16 __A15, _Float16 __A14, _Float16 __A13,
 					   __A12, __A13, __A14, __A15 };
 }
 
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set_ph (_Float16 __A31, _Float16 __A30, _Float16 __A29,
-	       _Float16 __A28, _Float16 __A27, _Float16 __A26,
-	       _Float16 __A25, _Float16 __A24, _Float16 __A23,
-	       _Float16 __A22, _Float16 __A21, _Float16 __A20,
-	       _Float16 __A19, _Float16 __A18, _Float16 __A17,
-	       _Float16 __A16, _Float16 __A15, _Float16 __A14,
-	       _Float16 __A13, _Float16 __A12, _Float16 __A11,
-	       _Float16 __A10, _Float16 __A9, _Float16 __A8,
-	       _Float16 __A7, _Float16 __A6, _Float16 __A5,
-	       _Float16 __A4, _Float16 __A3, _Float16 __A2,
-	       _Float16 __A1, _Float16 __A0)
-{
-  return __extension__ (__m512h)(__v32hf){ __A0, __A1, __A2, __A3,
-					   __A4, __A5, __A6, __A7,
-					   __A8, __A9, __A10, __A11,
-					   __A12, __A13, __A14, __A15,
-					   __A16, __A17, __A18, __A19,
-					   __A20, __A21, __A22, __A23,
-					   __A24, __A25, __A26, __A27,
-					   __A28, __A29, __A30, __A31 };
-}
-
-/* Create vectors of elements in the reversed order from _mm_set_ph,
-   _mm256_set_ph and _mm512_set_ph functions.  */
-
+/* Create vectors of elements in the reversed order from _mm_set_ph
+   and _mm256_set_ph functions.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2,
@@ -128,30 +99,7 @@ _mm256_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2,
 			__A0);
 }
 
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2,
-		_Float16 __A3, _Float16 __A4, _Float16 __A5,
-		_Float16 __A6, _Float16 __A7, _Float16 __A8,
-		_Float16 __A9, _Float16 __A10, _Float16 __A11,
-		_Float16 __A12, _Float16 __A13, _Float16 __A14,
-		_Float16 __A15, _Float16 __A16, _Float16 __A17,
-		_Float16 __A18, _Float16 __A19, _Float16 __A20,
-		_Float16 __A21, _Float16 __A22, _Float16 __A23,
-		_Float16 __A24, _Float16 __A25, _Float16 __A26,
-		_Float16 __A27, _Float16 __A28, _Float16 __A29,
-		_Float16 __A30, _Float16 __A31)
-
-{
-  return _mm512_set_ph (__A31, __A30, __A29, __A28, __A27, __A26, __A25,
-			__A24, __A23, __A22, __A21, __A20, __A19, __A18,
-			__A17, __A16, __A15, __A14, __A13, __A12, __A11,
-			__A10, __A9, __A8, __A7, __A6, __A5, __A4, __A3,
-			__A2, __A1, __A0);
-}
-
 /* Broadcast _Float16 to vector.  */
-
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_set1_ph (_Float16 __A)
@@ -167,18 +115,7 @@ _mm256_set1_ph (_Float16 __A)
 			__A, __A, __A, __A, __A, __A, __A, __A);
 }
 
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set1_ph (_Float16 __A)
-{
-  return _mm512_set_ph (__A, __A, __A, __A, __A, __A, __A, __A,
-			__A, __A, __A, __A, __A, __A, __A, __A,
-			__A, __A, __A, __A, __A, __A, __A, __A,
-			__A, __A, __A, __A, __A, __A, __A, __A);
-}
-
 /* Create a vector with all zeros.  */
-
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_setzero_ph (void)
@@ -193,13 +130,6 @@ _mm256_setzero_ph (void)
   return _mm256_set1_ph (0.0f16);
 }
 
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_setzero_ph (void)
-{
-  return _mm512_set1_ph (0.0f16);
-}
-
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_undefined_ph (void)
@@ -222,3815 +152,4056 @@ _mm256_undefined_ph (void)
   return __Y;
 }
 
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_undefined_ph (void)
-{
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Winit-self"
-  __m512h __Y = __Y;
-#pragma GCC diagnostic pop
-  return __Y;
-}
-
 extern __inline _Float16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsh_h (__m128h __A)
+_mm256_cvtsh_h (__m256h __A)
 {
   return __A[0];
 }
 
-extern __inline _Float16
+extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_cvtsh_h (__m256h __A)
+_mm256_load_ph (void const *__P)
 {
-  return __A[0];
+  return *(const __m256h *) __P;
 }
 
-extern __inline _Float16
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtsh_h (__m512h __A)
+_mm_load_ph (void const *__P)
 {
-  return __A[0];
+  return *(const __m128h *) __P;
 }
 
-extern __inline __m512
+extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_castph_ps (__m512h __a)
+_mm256_loadu_ph (void const *__P)
 {
-  return (__m512) __a;
+  return *(const __m256h_u *) __P;
 }
 
-extern __inline __m512d
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_castph_pd (__m512h __a)
+_mm_loadu_ph (void const *__P)
 {
-  return (__m512d) __a;
+  return *(const __m128h_u *) __P;
 }
 
-extern __inline __m512i
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_castph_si512 (__m512h __a)
+_mm256_store_ph (void *__P, __m256h __A)
 {
-  return (__m512i) __a;
+   *(__m256h *) __P = __A;
 }
 
-extern __inline __m128h
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_castph512_ph128 (__m512h __A)
+_mm_store_ph (void *__P, __m128h __A)
 {
-  union
-  {
-    __m128h __a[4];
-    __m512h __v;
-  } __u = { .__v = __A };
-  return __u.__a[0];
+   *(__m128h *) __P = __A;
 }
 
-extern __inline __m256h
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_castph512_ph256 (__m512h __A)
+_mm256_storeu_ph (void *__P, __m256h __A)
 {
-  union
-  {
-    __m256h __a[2];
-    __m512h __v;
-  } __u = { .__v = __A };
-  return __u.__a[0];
+   *(__m256h_u *) __P = __A;
 }
 
-extern __inline __m512h
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_castph128_ph512 (__m128h __A)
+_mm_storeu_ph (void *__P, __m128h __A)
 {
-  union
-  {
-    __m128h __a[4];
-    __m512h __v;
-  } __u;
-  __u.__a[0] = __A;
-  return __u.__v;
+   *(__m128h_u *) __P = __A;
 }
 
-extern __inline __m512h
+/* Create a vector with element 0 as F and the rest zero.  */
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_castph256_ph512 (__m256h __A)
+_mm_set_sh (_Float16 __F)
 {
-  union
-  {
-    __m256h __a[2];
-    __m512h __v;
-  } __u;
-  __u.__a[0] = __A;
-  return __u.__v;
+  return _mm_set_ph (0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16,
+		     __F);
 }
 
-extern __inline __m512h
+/* Create a vector with element 0 as *P and the rest zero.  */
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_zextph128_ph512 (__m128h __A)
+_mm_load_sh (void const *__P)
 {
-  return (__m512h) _mm512_insertf32x4 (_mm512_setzero_ps (),
-				       (__m128) __A, 0);
+  return _mm_set_ph (0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16,
+		     *(_Float16 const *) __P);
 }
 
-extern __inline __m512h
+/* Stores the lower _Float16 value.  */
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_zextph256_ph512 (__m256h __A)
+_mm_store_sh (void *__P, __m128h __A)
 {
-  return (__m512h) _mm512_insertf64x4 (_mm512_setzero_pd (),
-				       (__m256d) __A, 0);
+  *(_Float16 *) __P = ((__v8hf)__A)[0];
 }
 
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_castps_ph (__m512 __a)
+/* Intrinsics of v[add,sub,mul,div]sh.  */
+extern __inline __m128h
+  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_add_sh (__m128h __A, __m128h __B)
 {
-  return (__m512h) __a;
+  __A[0] += __B[0];
+  return __A;
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_castpd_ph (__m512d __a)
+_mm_mask_add_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
 {
-  return (__m512h) __a;
+  return __builtin_ia32_addsh_mask (__C, __D, __A, __B);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_castsi512_ph (__m512i __a)
+_mm_maskz_add_sh (__mmask8 __A, __m128h __B, __m128h __C)
 {
-  return (__m512h) __a;
+  return __builtin_ia32_addsh_mask (__B, __C, _mm_setzero_ph (),
+				    __A);
 }
 
-/* Create a vector with element 0 as F and the rest zero.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_set_sh (_Float16 __F)
+_mm_sub_sh (__m128h __A, __m128h __B)
 {
-  return _mm_set_ph (0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16,
-		     __F);
+  __A[0] -= __B[0];
+  return __A;
 }
 
-/* Create a vector with element 0 as *P and the rest zero.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_load_sh (void const *__P)
+_mm_mask_sub_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
 {
-  return _mm_set_ph (0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16,
-		     *(_Float16 const *) __P);
+  return __builtin_ia32_subsh_mask (__C, __D, __A, __B);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_load_ph (void const *__P)
+_mm_maskz_sub_sh (__mmask8 __A, __m128h __B, __m128h __C)
 {
-  return *(const __m512h *) __P;
+  return __builtin_ia32_subsh_mask (__B, __C, _mm_setzero_ph (),
+				    __A);
 }
 
-extern __inline __m256h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_load_ph (void const *__P)
+_mm_mul_sh (__m128h __A, __m128h __B)
 {
-  return *(const __m256h *) __P;
+  __A[0] *= __B[0];
+  return __A;
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_load_ph (void const *__P)
+_mm_mask_mul_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
 {
-  return *(const __m128h *) __P;
+  return __builtin_ia32_mulsh_mask (__C, __D, __A, __B);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_loadu_ph (void const *__P)
+_mm_maskz_mul_sh (__mmask8 __A, __m128h __B, __m128h __C)
 {
-  return *(const __m512h_u *) __P;
+  return __builtin_ia32_mulsh_mask (__B, __C, _mm_setzero_ph (), __A);
 }
 
-extern __inline __m256h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_loadu_ph (void const *__P)
+_mm_div_sh (__m128h __A, __m128h __B)
 {
-  return *(const __m256h_u *) __P;
+  __A[0] /= __B[0];
+  return __A;
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_loadu_ph (void const *__P)
+_mm_mask_div_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
 {
-  return *(const __m128h_u *) __P;
+  return __builtin_ia32_divsh_mask (__C, __D, __A, __B);
 }
 
-/* Stores the lower _Float16 value.  */
-extern __inline void
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_store_sh (void *__P, __m128h __A)
+_mm_maskz_div_sh (__mmask8 __A, __m128h __B, __m128h __C)
 {
-  *(_Float16 *) __P = ((__v8hf)__A)[0];
-}
-
-extern __inline void
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_store_ph (void *__P, __m512h __A)
-{
-   *(__m512h *) __P = __A;
+  return __builtin_ia32_divsh_mask (__B, __C, _mm_setzero_ph (),
+				    __A);
 }
 
-extern __inline void
+#ifdef __OPTIMIZE__
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_store_ph (void *__P, __m256h __A)
+_mm_add_round_sh (__m128h __A, __m128h __B, const int __C)
 {
-   *(__m256h *) __P = __A;
+  return __builtin_ia32_addsh_mask_round (__A, __B,
+					  _mm_setzero_ph (),
+					  (__mmask8) -1, __C);
 }
 
-extern __inline void
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_store_ph (void *__P, __m128h __A)
+_mm_mask_add_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+		       __m128h __D, const int __E)
 {
-   *(__m128h *) __P = __A;
+  return __builtin_ia32_addsh_mask_round (__C, __D, __A, __B, __E);
 }
 
-extern __inline void
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_storeu_ph (void *__P, __m512h __A)
+_mm_maskz_add_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+			const int __D)
 {
-   *(__m512h_u *) __P = __A;
+  return __builtin_ia32_addsh_mask_round (__B, __C,
+					  _mm_setzero_ph (),
+					  __A, __D);
 }
 
-extern __inline void
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_storeu_ph (void *__P, __m256h __A)
+_mm_sub_round_sh (__m128h __A, __m128h __B, const int __C)
 {
-   *(__m256h_u *) __P = __A;
+  return __builtin_ia32_subsh_mask_round (__A, __B,
+					  _mm_setzero_ph (),
+					  (__mmask8) -1, __C);
 }
 
-extern __inline void
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_storeu_ph (void *__P, __m128h __A)
+_mm_mask_sub_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+		       __m128h __D, const int __E)
 {
-   *(__m128h_u *) __P = __A;
+  return __builtin_ia32_subsh_mask_round (__C, __D, __A, __B, __E);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_abs_ph (__m512h __A)
+_mm_maskz_sub_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+			const int __D)
 {
-  return (__m512h) _mm512_and_epi32 ( _mm512_set1_epi32 (0x7FFF7FFF),
-				      (__m512i) __A);
+  return __builtin_ia32_subsh_mask_round (__B, __C,
+					  _mm_setzero_ph (),
+					  __A, __D);
 }
 
-/* Intrinsics v[add,sub,mul,div]ph.  */
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_add_ph (__m512h __A, __m512h __B)
+_mm_mul_round_sh (__m128h __A, __m128h __B, const int __C)
 {
-  return (__m512h) ((__v32hf) __A + (__v32hf) __B);
+  return __builtin_ia32_mulsh_mask_round (__A, __B,
+					  _mm_setzero_ph (),
+					  (__mmask8) -1, __C);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_add_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+_mm_mask_mul_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+		       __m128h __D, const int __E)
 {
-  return __builtin_ia32_addph512_mask (__C, __D, __A, __B);
+  return __builtin_ia32_mulsh_mask_round (__C, __D, __A, __B, __E);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_add_ph (__mmask32 __A, __m512h __B, __m512h __C)
+_mm_maskz_mul_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+			const int __D)
 {
-  return __builtin_ia32_addph512_mask (__B, __C,
-				       _mm512_setzero_ph (), __A);
+  return __builtin_ia32_mulsh_mask_round (__B, __C,
+					  _mm_setzero_ph (),
+					  __A, __D);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sub_ph (__m512h __A, __m512h __B)
+_mm_div_round_sh (__m128h __A, __m128h __B, const int __C)
 {
-  return (__m512h) ((__v32hf) __A - (__v32hf) __B);
+  return __builtin_ia32_divsh_mask_round (__A, __B,
+					  _mm_setzero_ph (),
+					  (__mmask8) -1, __C);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sub_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+_mm_mask_div_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+		       __m128h __D, const int __E)
 {
-  return __builtin_ia32_subph512_mask (__C, __D, __A, __B);
+  return __builtin_ia32_divsh_mask_round (__C, __D, __A, __B, __E);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sub_ph (__mmask32 __A, __m512h __B, __m512h __C)
+_mm_maskz_div_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+			const int __D)
 {
-  return __builtin_ia32_subph512_mask (__B, __C,
-				       _mm512_setzero_ph (), __A);
+  return __builtin_ia32_divsh_mask_round (__B, __C,
+					  _mm_setzero_ph (),
+					  __A, __D);
 }
+#else
+#define _mm_add_round_sh(A, B, C)					\
+  ((__m128h)__builtin_ia32_addsh_mask_round ((A), (B),			\
+					     _mm_setzero_ph (),		\
+					     (__mmask8)-1, (C)))
 
-extern __inline __m512h
+#define _mm_mask_add_round_sh(A, B, C, D, E)				\
+  ((__m128h)__builtin_ia32_addsh_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm_maskz_add_round_sh(A, B, C, D)			\
+  ((__m128h)__builtin_ia32_addsh_mask_round ((B), (C),		\
+					     _mm_setzero_ph (),	\
+					     (A), (D)))
+
+#define _mm_sub_round_sh(A, B, C)					\
+  ((__m128h)__builtin_ia32_subsh_mask_round ((A), (B),			\
+					     _mm_setzero_ph (),		\
+					     (__mmask8)-1, (C)))
+
+#define _mm_mask_sub_round_sh(A, B, C, D, E)				\
+  ((__m128h)__builtin_ia32_subsh_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm_maskz_sub_round_sh(A, B, C, D)			\
+  ((__m128h)__builtin_ia32_subsh_mask_round ((B), (C),		\
+					     _mm_setzero_ph (),	\
+					     (A), (D)))
+
+#define _mm_mul_round_sh(A, B, C)					\
+  ((__m128h)__builtin_ia32_mulsh_mask_round ((A), (B),			\
+					     _mm_setzero_ph (),		\
+					     (__mmask8)-1, (C)))
+
+#define _mm_mask_mul_round_sh(A, B, C, D, E)				\
+  ((__m128h)__builtin_ia32_mulsh_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm_maskz_mul_round_sh(A, B, C, D)			\
+  ((__m128h)__builtin_ia32_mulsh_mask_round ((B), (C),		\
+					     _mm_setzero_ph (),	\
+					     (A), (D)))
+
+#define _mm_div_round_sh(A, B, C)					\
+  ((__m128h)__builtin_ia32_divsh_mask_round ((A), (B),			\
+					     _mm_setzero_ph (),		\
+					     (__mmask8)-1, (C)))
+
+#define _mm_mask_div_round_sh(A, B, C, D, E)				\
+  ((__m128h)__builtin_ia32_divsh_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm_maskz_div_round_sh(A, B, C, D)			\
+  ((__m128h)__builtin_ia32_divsh_mask_round ((B), (C),		\
+					     _mm_setzero_ph (),	\
+					     (A), (D)))
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsic vmaxsh vminsh.  */
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mul_ph (__m512h __A, __m512h __B)
+_mm_max_sh (__m128h __A, __m128h __B)
 {
-  return (__m512h) ((__v32hf) __A * (__v32hf) __B);
+  __A[0] = __A[0] > __B[0] ? __A[0] : __B[0];
+  return __A;
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mul_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+_mm_mask_max_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
 {
-  return __builtin_ia32_mulph512_mask (__C, __D, __A, __B);
+  return __builtin_ia32_maxsh_mask (__C, __D, __A, __B);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mul_ph (__mmask32 __A, __m512h __B, __m512h __C)
+_mm_maskz_max_sh (__mmask8 __A, __m128h __B, __m128h __C)
 {
-  return __builtin_ia32_mulph512_mask (__B, __C,
-				       _mm512_setzero_ph (), __A);
+  return __builtin_ia32_maxsh_mask (__B, __C, _mm_setzero_ph (),
+				    __A);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_div_ph (__m512h __A, __m512h __B)
+_mm_min_sh (__m128h __A, __m128h __B)
 {
-  return (__m512h) ((__v32hf) __A / (__v32hf) __B);
+  __A[0] = __A[0] < __B[0] ? __A[0] : __B[0];
+  return __A;
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_div_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+_mm_mask_min_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
 {
-  return __builtin_ia32_divph512_mask (__C, __D, __A, __B);
+  return __builtin_ia32_minsh_mask (__C, __D, __A, __B);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_div_ph (__mmask32 __A, __m512h __B, __m512h __C)
+_mm_maskz_min_sh (__mmask8 __A, __m128h __B, __m128h __C)
 {
-  return __builtin_ia32_divph512_mask (__B, __C,
-				       _mm512_setzero_ph (), __A);
+  return __builtin_ia32_minsh_mask (__B, __C, _mm_setzero_ph (),
+				    __A);
 }
 
 #ifdef __OPTIMIZE__
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_add_round_ph (__m512h __A, __m512h __B, const int __C)
+_mm_max_round_sh (__m128h __A, __m128h __B, const int __C)
 {
-  return __builtin_ia32_addph512_mask_round (__A, __B,
-					     _mm512_setzero_ph (),
-					     (__mmask32) -1, __C);
+  return __builtin_ia32_maxsh_mask_round (__A, __B,
+					  _mm_setzero_ph (),
+					  (__mmask8) -1, __C);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_add_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
-			  __m512h __D, const int __E)
+_mm_mask_max_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+		       __m128h __D, const int __E)
 {
-  return __builtin_ia32_addph512_mask_round (__C, __D, __A, __B, __E);
+  return __builtin_ia32_maxsh_mask_round (__C, __D, __A, __B, __E);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_add_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
-			   const int __D)
+_mm_maskz_max_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+			const int __D)
 {
-  return __builtin_ia32_addph512_mask_round (__B, __C,
-					     _mm512_setzero_ph (),
-					     __A, __D);
+  return __builtin_ia32_maxsh_mask_round (__B, __C,
+					  _mm_setzero_ph (),
+					  __A, __D);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sub_round_ph (__m512h __A, __m512h __B, const int __C)
+_mm_min_round_sh (__m128h __A, __m128h __B, const int __C)
 {
-  return __builtin_ia32_subph512_mask_round (__A, __B,
-					     _mm512_setzero_ph (),
-					     (__mmask32) -1, __C);
+  return __builtin_ia32_minsh_mask_round (__A, __B,
+					  _mm_setzero_ph (),
+					  (__mmask8) -1, __C);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sub_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
-			  __m512h __D, const int __E)
+_mm_mask_min_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+		       __m128h __D, const int __E)
 {
-  return __builtin_ia32_subph512_mask_round (__C, __D, __A, __B, __E);
+  return __builtin_ia32_minsh_mask_round (__C, __D, __A, __B, __E);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sub_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
-			   const int __D)
+_mm_maskz_min_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+			const int __D)
 {
-  return __builtin_ia32_subph512_mask_round (__B, __C,
-					     _mm512_setzero_ph (),
-					     __A, __D);
+  return __builtin_ia32_minsh_mask_round (__B, __C,
+					  _mm_setzero_ph (),
+					  __A, __D);
 }
 
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mul_round_ph (__m512h __A, __m512h __B, const int __C)
-{
-  return __builtin_ia32_mulph512_mask_round (__A, __B,
-					     _mm512_setzero_ph (),
-					     (__mmask32) -1, __C);
-}
+#else
+#define _mm_max_round_sh(A, B, C)			\
+  (__builtin_ia32_maxsh_mask_round ((A), (B),		\
+				    _mm_setzero_ph (),	\
+				    (__mmask8)-1, (C)))
 
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mul_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
-			  __m512h __D, const int __E)
-{
-  return __builtin_ia32_mulph512_mask_round (__C, __D, __A, __B, __E);
-}
+#define _mm_mask_max_round_sh(A, B, C, D, E)			\
+  (__builtin_ia32_maxsh_mask_round ((C), (D), (A), (B), (E)))
 
-extern __inline __m512h
+#define _mm_maskz_max_round_sh(A, B, C, D)		\
+  (__builtin_ia32_maxsh_mask_round ((B), (C),		\
+				    _mm_setzero_ph (),	\
+				    (A), (D)))
+
+#define _mm_min_round_sh(A, B, C)			\
+  (__builtin_ia32_minsh_mask_round ((A), (B),		\
+				    _mm_setzero_ph (),	\
+				    (__mmask8)-1, (C)))
+
+#define _mm_mask_min_round_sh(A, B, C, D, E)			\
+  (__builtin_ia32_minsh_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm_maskz_min_round_sh(A, B, C, D)		\
+  (__builtin_ia32_minsh_mask_round ((B), (C),		\
+				    _mm_setzero_ph (),	\
+				    (A), (D)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcmpsh.  */
+#ifdef __OPTIMIZE__
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mul_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
-			   const int __D)
+_mm_cmp_sh_mask (__m128h __A, __m128h __B, const int __C)
 {
-  return __builtin_ia32_mulph512_mask_round (__B, __C,
-					     _mm512_setzero_ph (),
-					     __A, __D);
+  return (__mmask8)
+    __builtin_ia32_cmpsh_mask_round (__A, __B,
+				     __C, (__mmask8) -1,
+				     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_div_round_ph (__m512h __A, __m512h __B, const int __C)
+_mm_mask_cmp_sh_mask (__mmask8 __A, __m128h __B, __m128h __C,
+		      const int __D)
 {
-  return __builtin_ia32_divph512_mask_round (__A, __B,
-					     _mm512_setzero_ph (),
-					     (__mmask32) -1, __C);
+  return (__mmask8)
+    __builtin_ia32_cmpsh_mask_round (__B, __C,
+				     __D, __A,
+				     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_div_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
-			  __m512h __D, const int __E)
+_mm_cmp_round_sh_mask (__m128h __A, __m128h __B, const int __C,
+		       const int __D)
 {
-  return __builtin_ia32_divph512_mask_round (__C, __D, __A, __B, __E);
+  return (__mmask8) __builtin_ia32_cmpsh_mask_round (__A, __B,
+						     __C, (__mmask8) -1,
+						     __D);
 }
 
-extern __inline __m512h
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_div_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
-			   const int __D)
+_mm_mask_cmp_round_sh_mask (__mmask8 __A, __m128h __B, __m128h __C,
+			    const int __D, const int __E)
 {
-  return __builtin_ia32_divph512_mask_round (__B, __C,
-					     _mm512_setzero_ph (),
-					     __A, __D);
+  return (__mmask8) __builtin_ia32_cmpsh_mask_round (__B, __C,
+						     __D, __A,
+						     __E);
 }
-#else
-#define _mm512_add_round_ph(A, B, C)					\
-  ((__m512h)__builtin_ia32_addph512_mask_round((A), (B),		\
-					       _mm512_setzero_ph (),	\
-					       (__mmask32)-1, (C)))
-
-#define _mm512_mask_add_round_ph(A, B, C, D, E)				\
-  ((__m512h)__builtin_ia32_addph512_mask_round((C), (D), (A), (B), (E)))
-
-#define _mm512_maskz_add_round_ph(A, B, C, D)				\
-  ((__m512h)__builtin_ia32_addph512_mask_round((B), (C),		\
-					       _mm512_setzero_ph (),	\
-					       (A), (D)))
-
-#define _mm512_sub_round_ph(A, B, C)					\
-  ((__m512h)__builtin_ia32_subph512_mask_round((A), (B),		\
-					       _mm512_setzero_ph (),	\
-					       (__mmask32)-1, (C)))
-
-#define _mm512_mask_sub_round_ph(A, B, C, D, E)				\
-  ((__m512h)__builtin_ia32_subph512_mask_round((C), (D), (A), (B), (E)))
-
-#define _mm512_maskz_sub_round_ph(A, B, C, D)				\
-  ((__m512h)__builtin_ia32_subph512_mask_round((B), (C),		\
-					       _mm512_setzero_ph (),	\
-					       (A), (D)))
-
-#define _mm512_mul_round_ph(A, B, C)					\
-  ((__m512h)__builtin_ia32_mulph512_mask_round((A), (B),		\
-					       _mm512_setzero_ph (),	\
-					       (__mmask32)-1, (C)))
 
-#define _mm512_mask_mul_round_ph(A, B, C, D, E)				\
-  ((__m512h)__builtin_ia32_mulph512_mask_round((C), (D), (A), (B), (E)))
+#else
+#define _mm_cmp_sh_mask(A, B, C)					\
+  (__builtin_ia32_cmpsh_mask_round ((A), (B), (C), (-1),		\
+				    (_MM_FROUND_CUR_DIRECTION)))
 
-#define _mm512_maskz_mul_round_ph(A, B, C, D)				\
-  ((__m512h)__builtin_ia32_mulph512_mask_round((B), (C),		\
-					       _mm512_setzero_ph (),	\
-					       (A), (D)))
+#define _mm_mask_cmp_sh_mask(A, B, C, D)				\
+  (__builtin_ia32_cmpsh_mask_round ((B), (C), (D), (A),			\
+				    (_MM_FROUND_CUR_DIRECTION)))
 
-#define _mm512_div_round_ph(A, B, C)					\
-  ((__m512h)__builtin_ia32_divph512_mask_round((A), (B),		\
-					       _mm512_setzero_ph (),	\
-					       (__mmask32)-1, (C)))
+#define _mm_cmp_round_sh_mask(A, B, C, D)			\
+  (__builtin_ia32_cmpsh_mask_round ((A), (B), (C), (-1), (D)))
 
-#define _mm512_mask_div_round_ph(A, B, C, D, E)				\
-  ((__m512h)__builtin_ia32_divph512_mask_round((C), (D), (A), (B), (E)))
+#define _mm_mask_cmp_round_sh_mask(A, B, C, D, E)		\
+  (__builtin_ia32_cmpsh_mask_round ((B), (C), (D), (A), (E)))
 
-#define _mm512_maskz_div_round_ph(A, B, C, D)				\
-  ((__m512h)__builtin_ia32_divph512_mask_round((B), (C),		\
-					       _mm512_setzero_ph (),	\
-					       (A), (D)))
-#endif  /* __OPTIMIZE__  */
+#endif /* __OPTIMIZE__ */
 
-extern __inline __m512h
+/* Intrinsics vcomish.  */
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_conj_pch (__m512h __A)
+_mm_comieq_sh (__m128h __A, __m128h __B)
 {
-  return (__m512h) _mm512_xor_epi32 ((__m512i) __A, _mm512_set1_epi32 (1<<31));
+  return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_EQ_OS,
+					  (__mmask8) -1,
+					  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_conj_pch (__m512h __W, __mmask16 __U, __m512h __A)
+_mm_comilt_sh (__m128h __A, __m128h __B)
 {
-  return (__m512h)
-    __builtin_ia32_movaps512_mask ((__v16sf) _mm512_conj_pch (__A),
-				   (__v16sf) __W,
-				   (__mmask16) __U);
+  return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_LT_OS,
+					  (__mmask8) -1,
+					  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_conj_pch (__mmask16 __U, __m512h __A)
+_mm_comile_sh (__m128h __A, __m128h __B)
 {
-  return (__m512h)
-    __builtin_ia32_movaps512_mask ((__v16sf) _mm512_conj_pch (__A),
-				   (__v16sf) _mm512_setzero_ps (),
-				   (__mmask16) __U);
+  return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_LE_OS,
+					  (__mmask8) -1,
+					  _MM_FROUND_CUR_DIRECTION);
 }
 
-/* Intrinsics of v[add,sub,mul,div]sh.  */
-extern __inline __m128h
-  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_add_sh (__m128h __A, __m128h __B)
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comigt_sh (__m128h __A, __m128h __B)
 {
-  __A[0] += __B[0];
-  return __A;
+  return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_GT_OS,
+					  (__mmask8) -1,
+					  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_add_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm_comige_sh (__m128h __A, __m128h __B)
 {
-  return __builtin_ia32_addsh_mask (__C, __D, __A, __B);
+  return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_GE_OS,
+					  (__mmask8) -1,
+					  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_add_sh (__mmask8 __A, __m128h __B, __m128h __C)
+_mm_comineq_sh (__m128h __A, __m128h __B)
 {
-  return __builtin_ia32_addsh_mask (__B, __C, _mm_setzero_ph (),
-				    __A);
+  return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_NEQ_US,
+					  (__mmask8) -1,
+					  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_sub_sh (__m128h __A, __m128h __B)
+_mm_ucomieq_sh (__m128h __A, __m128h __B)
 {
-  __A[0] -= __B[0];
-  return __A;
+  return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_EQ_OQ,
+					  (__mmask8) -1,
+					  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_sub_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm_ucomilt_sh (__m128h __A, __m128h __B)
 {
-  return __builtin_ia32_subsh_mask (__C, __D, __A, __B);
+  return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_LT_OQ,
+					  (__mmask8) -1,
+					  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_sub_sh (__mmask8 __A, __m128h __B, __m128h __C)
+_mm_ucomile_sh (__m128h __A, __m128h __B)
 {
-  return __builtin_ia32_subsh_mask (__B, __C, _mm_setzero_ph (),
-				    __A);
+  return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_LE_OQ,
+					  (__mmask8) -1,
+					  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mul_sh (__m128h __A, __m128h __B)
+_mm_ucomigt_sh (__m128h __A, __m128h __B)
 {
-  __A[0] *= __B[0];
-  return __A;
+  return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_GT_OQ,
+					  (__mmask8) -1,
+					  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_mul_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm_ucomige_sh (__m128h __A, __m128h __B)
 {
-  return __builtin_ia32_mulsh_mask (__C, __D, __A, __B);
+  return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_GE_OQ,
+					  (__mmask8) -1,
+					  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_mul_sh (__mmask8 __A, __m128h __B, __m128h __C)
+_mm_ucomineq_sh (__m128h __A, __m128h __B)
 {
-  return __builtin_ia32_mulsh_mask (__B, __C, _mm_setzero_ph (), __A);
+  return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_NEQ_UQ,
+					  (__mmask8) -1,
+					  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+#ifdef __OPTIMIZE__
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_div_sh (__m128h __A, __m128h __B)
+_mm_comi_sh (__m128h __A, __m128h __B, const int __P)
 {
-  __A[0] /= __B[0];
-  return __A;
+  return __builtin_ia32_cmpsh_mask_round (__A, __B, __P,
+					  (__mmask8) -1,
+					  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_div_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm_comi_round_sh (__m128h __A, __m128h __B, const int __P, const int __R)
 {
-  return __builtin_ia32_divsh_mask (__C, __D, __A, __B);
+  return __builtin_ia32_cmpsh_mask_round (__A, __B, __P,
+					  (__mmask8) -1,__R);
 }
 
+#else
+#define _mm_comi_round_sh(A, B, P, R)					\
+  (__builtin_ia32_cmpsh_mask_round ((A), (B), (P), (__mmask8) (-1), (R)))
+#define _mm_comi_sh(A, B, P)						\
+  (__builtin_ia32_cmpsh_mask_round ((A), (B), (P), (__mmask8) (-1),	\
+				    _MM_FROUND_CUR_DIRECTION))
+
+#endif /* __OPTIMIZE__  */
+
+/* Intrinsics vsqrtsh.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_div_sh (__mmask8 __A, __m128h __B, __m128h __C)
+_mm_sqrt_sh (__m128h __A, __m128h __B)
 {
-  return __builtin_ia32_divsh_mask (__B, __C, _mm_setzero_ph (),
-				    __A);
+  return __builtin_ia32_sqrtsh_mask_round (__B, __A,
+					   _mm_setzero_ph (),
+					   (__mmask8) -1,
+					   _MM_FROUND_CUR_DIRECTION);
 }
 
-#ifdef __OPTIMIZE__
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_add_round_sh (__m128h __A, __m128h __B, const int __C)
+_mm_mask_sqrt_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
 {
-  return __builtin_ia32_addsh_mask_round (__A, __B,
-					  _mm_setzero_ph (),
-					  (__mmask8) -1, __C);
+  return __builtin_ia32_sqrtsh_mask_round (__D, __C, __A, __B,
+					   _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_add_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
-		       __m128h __D, const int __E)
+_mm_maskz_sqrt_sh (__mmask8 __A, __m128h __B, __m128h __C)
 {
-  return __builtin_ia32_addsh_mask_round (__C, __D, __A, __B, __E);
+  return __builtin_ia32_sqrtsh_mask_round (__C, __B,
+					   _mm_setzero_ph (),
+					   __A, _MM_FROUND_CUR_DIRECTION);
 }
 
+#ifdef __OPTIMIZE__
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_add_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
-			const int __D)
+_mm_sqrt_round_sh (__m128h __A, __m128h __B, const int __C)
 {
-  return __builtin_ia32_addsh_mask_round (__B, __C,
-					  _mm_setzero_ph (),
-					  __A, __D);
+  return __builtin_ia32_sqrtsh_mask_round (__B, __A,
+					   _mm_setzero_ph (),
+					   (__mmask8) -1, __C);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_sub_round_sh (__m128h __A, __m128h __B, const int __C)
+_mm_mask_sqrt_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+			__m128h __D, const int __E)
 {
-  return __builtin_ia32_subsh_mask_round (__A, __B,
-					  _mm_setzero_ph (),
-					  (__mmask8) -1, __C);
+  return __builtin_ia32_sqrtsh_mask_round (__D, __C, __A, __B,
+					   __E);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_sub_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
-		       __m128h __D, const int __E)
+_mm_maskz_sqrt_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+			 const int __D)
 {
-  return __builtin_ia32_subsh_mask_round (__C, __D, __A, __B, __E);
+  return __builtin_ia32_sqrtsh_mask_round (__C, __B,
+					   _mm_setzero_ph (),
+					   __A, __D);
 }
 
+#else
+#define _mm_sqrt_round_sh(A, B, C)				\
+  (__builtin_ia32_sqrtsh_mask_round ((B), (A),			\
+				     _mm_setzero_ph (),		\
+				     (__mmask8)-1, (C)))
+
+#define _mm_mask_sqrt_round_sh(A, B, C, D, E)			\
+  (__builtin_ia32_sqrtsh_mask_round ((D), (C), (A), (B), (E)))
+
+#define _mm_maskz_sqrt_round_sh(A, B, C, D)		\
+  (__builtin_ia32_sqrtsh_mask_round ((C), (B),		\
+				     _mm_setzero_ph (),	\
+				     (A), (D)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vrsqrtsh.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_sub_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
-			const int __D)
+_mm_rsqrt_sh (__m128h __A, __m128h __B)
 {
-  return __builtin_ia32_subsh_mask_round (__B, __C,
-					  _mm_setzero_ph (),
-					  __A, __D);
+  return __builtin_ia32_rsqrtsh_mask (__B, __A, _mm_setzero_ph (),
+				      (__mmask8) -1);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mul_round_sh (__m128h __A, __m128h __B, const int __C)
+_mm_mask_rsqrt_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
 {
-  return __builtin_ia32_mulsh_mask_round (__A, __B,
-					  _mm_setzero_ph (),
-					  (__mmask8) -1, __C);
+  return __builtin_ia32_rsqrtsh_mask (__D, __C, __A, __B);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_mul_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
-		       __m128h __D, const int __E)
+_mm_maskz_rsqrt_sh (__mmask8 __A, __m128h __B, __m128h __C)
 {
-  return __builtin_ia32_mulsh_mask_round (__C, __D, __A, __B, __E);
+  return __builtin_ia32_rsqrtsh_mask (__C, __B, _mm_setzero_ph (),
+				      __A);
 }
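
[Likewise a sketch for vrsqrtsh, which gives an approximate reciprocal
square root of the low element.]

  /* r[0] ~ 1/sqrt(v[0]); upper elements come from the first operand.  */
  static inline __m128h
  rsqrt_low (__m128h v)
  {
    return _mm_rsqrt_sh (v, v);
  }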
 
+/* Intrinsics vrcpsh.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_mul_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
-			const int __D)
+_mm_rcp_sh (__m128h __A, __m128h __B)
 {
-  return __builtin_ia32_mulsh_mask_round (__B, __C,
-					  _mm_setzero_ph (),
-					  __A, __D);
+  return __builtin_ia32_rcpsh_mask (__B, __A, _mm_setzero_ph (),
+				    (__mmask8) -1);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_div_round_sh (__m128h __A, __m128h __B, const int __C)
+_mm_mask_rcp_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
 {
-  return __builtin_ia32_divsh_mask_round (__A, __B,
-					  _mm_setzero_ph (),
-					  (__mmask8) -1, __C);
+  return __builtin_ia32_rcpsh_mask (__D, __C, __A, __B);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_div_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
-		       __m128h __D, const int __E)
+_mm_maskz_rcp_sh (__mmask8 __A, __m128h __B, __m128h __C)
 {
-  return __builtin_ia32_divsh_mask_round (__C, __D, __A, __B, __E);
+  return __builtin_ia32_rcpsh_mask (__C, __B, _mm_setzero_ph (),
+				    __A);
 }
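
[A sketch for the masked vrcpsh form: with merge masking, element 0 of
the result keeps the first operand's value when mask bit 0 is clear.]

  static inline __m128h
  rcp_low_or_keep (__m128h v, __mmask8 k)
  {
    return _mm_mask_rcp_sh (v, k, v, v);
  }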
 
+/* Intrinsics vscalefsh.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_div_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
-			const int __D)
+_mm_scalef_sh (__m128h __A, __m128h __B)
 {
-  return __builtin_ia32_divsh_mask_round (__B, __C,
-					  _mm_setzero_ph (),
-					  __A, __D);
+  return __builtin_ia32_scalefsh_mask_round (__A, __B,
+					     _mm_setzero_ph (),
+					     (__mmask8) -1,
+					     _MM_FROUND_CUR_DIRECTION);
 }
-#else
-#define _mm_add_round_sh(A, B, C)					\
-  ((__m128h)__builtin_ia32_addsh_mask_round ((A), (B),			\
-					     _mm_setzero_ph (),		\
-					     (__mmask8)-1, (C)))
-
-#define _mm_mask_add_round_sh(A, B, C, D, E)				\
-  ((__m128h)__builtin_ia32_addsh_mask_round ((C), (D), (A), (B), (E)))
-
-#define _mm_maskz_add_round_sh(A, B, C, D)			\
-  ((__m128h)__builtin_ia32_addsh_mask_round ((B), (C),		\
-					     _mm_setzero_ph (),	\
-					     (A), (D)))
-
-#define _mm_sub_round_sh(A, B, C)					\
-  ((__m128h)__builtin_ia32_subsh_mask_round ((A), (B),			\
-					     _mm_setzero_ph (),		\
-					     (__mmask8)-1, (C)))
-
-#define _mm_mask_sub_round_sh(A, B, C, D, E)				\
-  ((__m128h)__builtin_ia32_subsh_mask_round ((C), (D), (A), (B), (E)))
-
-#define _mm_maskz_sub_round_sh(A, B, C, D)			\
-  ((__m128h)__builtin_ia32_subsh_mask_round ((B), (C),		\
-					     _mm_setzero_ph (),	\
-					     (A), (D)))
-
-#define _mm_mul_round_sh(A, B, C)					\
-  ((__m128h)__builtin_ia32_mulsh_mask_round ((A), (B),			\
-					     _mm_setzero_ph (),		\
-					     (__mmask8)-1, (C)))
-
-#define _mm_mask_mul_round_sh(A, B, C, D, E)				\
-  ((__m128h)__builtin_ia32_mulsh_mask_round ((C), (D), (A), (B), (E)))
-
-#define _mm_maskz_mul_round_sh(A, B, C, D)			\
-  ((__m128h)__builtin_ia32_mulsh_mask_round ((B), (C),		\
-					     _mm_setzero_ph (),	\
-					     (A), (D)))
-
-#define _mm_div_round_sh(A, B, C)					\
-  ((__m128h)__builtin_ia32_divsh_mask_round ((A), (B),			\
-					     _mm_setzero_ph (),		\
-					     (__mmask8)-1, (C)))
-
-#define _mm_mask_div_round_sh(A, B, C, D, E)				\
-  ((__m128h)__builtin_ia32_divsh_mask_round ((C), (D), (A), (B), (E)))
-
-#define _mm_maskz_div_round_sh(A, B, C, D)			\
-  ((__m128h)__builtin_ia32_divsh_mask_round ((B), (C),		\
-					     _mm_setzero_ph (),	\
-					     (A), (D)))
-#endif /* __OPTIMIZE__ */
 
-/* Intrinsic vmaxph vminph.  */
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_max_ph (__m512h __A, __m512h __B)
+_mm_mask_scalef_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
 {
-  return __builtin_ia32_maxph512_mask (__A, __B,
-				       _mm512_setzero_ph (),
-				       (__mmask32) -1);
+  return __builtin_ia32_scalefsh_mask_round (__C, __D, __A, __B,
+					     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_max_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+_mm_maskz_scalef_sh (__mmask8 __A, __m128h __B, __m128h __C)
 {
-  return __builtin_ia32_maxph512_mask (__C, __D, __A, __B);
+  return __builtin_ia32_scalefsh_mask_round (__B, __C,
+					     _mm_setzero_ph (),
+					     __A,
+					     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+#ifdef __OPTIMIZE__
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_max_ph (__mmask32 __A, __m512h __B, __m512h __C)
+_mm_scalef_round_sh (__m128h __A, __m128h __B, const int __C)
 {
-  return __builtin_ia32_maxph512_mask (__B, __C,
-				       _mm512_setzero_ph (), __A);
+  return __builtin_ia32_scalefsh_mask_round (__A, __B,
+					     _mm_setzero_ph (),
+					     (__mmask8) -1, __C);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_min_ph (__m512h __A, __m512h __B)
+_mm_mask_scalef_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+			  __m128h __D, const int __E)
 {
-  return __builtin_ia32_minph512_mask (__A, __B,
-				       _mm512_setzero_ph (),
-				       (__mmask32) -1);
+  return __builtin_ia32_scalefsh_mask_round (__C, __D, __A, __B,
+					     __E);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_min_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+_mm_maskz_scalef_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+			   const int __D)
 {
-  return __builtin_ia32_minph512_mask (__C, __D, __A, __B);
+  return __builtin_ia32_scalefsh_mask_round (__B, __C,
+					     _mm_setzero_ph (),
+					     __A, __D);
 }
 
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_min_ph (__mmask32 __A, __m512h __B, __m512h __C)
-{
-  return __builtin_ia32_minph512_mask (__B, __C,
-				       _mm512_setzero_ph (), __A);
-}
+#else
+#define _mm_scalef_round_sh(A, B, C)				\
+  (__builtin_ia32_scalefsh_mask_round ((A), (B),		\
+				       _mm_setzero_ph (),	\
+				       (__mmask8)-1, (C)))
 
+#define _mm_mask_scalef_round_sh(A, B, C, D, E)				\
+  (__builtin_ia32_scalefsh_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm_maskz_scalef_round_sh(A, B, C, D)				\
+  (__builtin_ia32_scalefsh_mask_round ((B), (C), _mm_setzero_ph (),	\
+				       (A), (D)))
+
+#endif /* __OPTIMIZE__ */
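
[Sketch for vscalefsh, which multiplies the low element by
2^floor(e[0]) -- effectively an FP16 ldexp.]

  _Float16
  ldexp_f16 (_Float16 x, int n)
  {
    return _mm_cvtsh_h (_mm_scalef_sh (_mm_set_sh (x),
				       _mm_set_sh ((_Float16) n)));
  }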
+
+/* Intrinsics vreducesh.  */
 #ifdef __OPTIMIZE__
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_max_round_ph (__m512h __A, __m512h __B, const int __C)
+_mm_reduce_sh (__m128h __A, __m128h __B, int __C)
 {
-  return __builtin_ia32_maxph512_mask_round (__A, __B,
-					     _mm512_setzero_ph (),
-					     (__mmask32) -1, __C);
+  return __builtin_ia32_reducesh_mask_round (__A, __B, __C,
+					     _mm_setzero_ph (),
+					     (__mmask8) -1,
+					     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_max_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
-			  __m512h __D, const int __E)
+_mm_mask_reduce_sh (__m128h __A, __mmask8 __B, __m128h __C,
+		    __m128h __D, int __E)
 {
-  return __builtin_ia32_maxph512_mask_round (__C, __D, __A, __B, __E);
+  return __builtin_ia32_reducesh_mask_round (__C, __D, __E, __A, __B,
+					     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_max_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
-			   const int __D)
+_mm_maskz_reduce_sh (__mmask8 __A, __m128h __B, __m128h __C, int __D)
 {
-  return __builtin_ia32_maxph512_mask_round (__B, __C,
-					     _mm512_setzero_ph (),
-					     __A, __D);
+  return __builtin_ia32_reducesh_mask_round (__B, __C, __D,
+					     _mm_setzero_ph (), __A,
+					     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_min_round_ph (__m512h __A, __m512h __B, const int __C)
+_mm_reduce_round_sh (__m128h __A, __m128h __B, int __C, const int __D)
 {
-  return __builtin_ia32_minph512_mask_round (__A, __B,
-					     _mm512_setzero_ph (),
-					     (__mmask32) -1, __C);
+  return __builtin_ia32_reducesh_mask_round (__A, __B, __C,
+					     _mm_setzero_ph (),
+					     (__mmask8) -1, __D);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_min_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
-			  __m512h __D, const int __E)
+_mm_mask_reduce_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+			  __m128h __D, int __E, const int __F)
 {
-  return __builtin_ia32_minph512_mask_round (__C, __D, __A, __B, __E);
+  return __builtin_ia32_reducesh_mask_round (__C, __D, __E, __A,
+					     __B, __F);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_min_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
-			   const int __D)
+_mm_maskz_reduce_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+			   int __D, const int __E)
 {
-  return __builtin_ia32_minph512_mask_round (__B, __C,
-					     _mm512_setzero_ph (),
-					     __A, __D);
+  return __builtin_ia32_reducesh_mask_round (__B, __C, __D,
+					     _mm_setzero_ph (),
+					     __A, __E);
 }
 
 #else
-#define _mm512_max_round_ph(A, B, C)				\
-  (__builtin_ia32_maxph512_mask_round ((A), (B),		\
-				       _mm512_setzero_ph (),	\
-				       (__mmask32)-1, (C)))
+#define _mm_reduce_sh(A, B, C)						\
+  (__builtin_ia32_reducesh_mask_round ((A), (B), (C),			\
+				       _mm_setzero_ph (),		\
+				       (__mmask8)-1,			\
+				       _MM_FROUND_CUR_DIRECTION))
 
-#define _mm512_mask_max_round_ph(A, B, C, D, E)				\
-  (__builtin_ia32_maxph512_mask_round ((C), (D), (A), (B), (E)))
+#define _mm_mask_reduce_sh(A, B, C, D, E)				\
+  (__builtin_ia32_reducesh_mask_round ((C), (D), (E), (A), (B),		\
+				       _MM_FROUND_CUR_DIRECTION))
 
-#define _mm512_maskz_max_round_ph(A, B, C, D)			\
-  (__builtin_ia32_maxph512_mask_round ((B), (C),		\
-				       _mm512_setzero_ph (),	\
-				       (A), (D)))
+#define _mm_maskz_reduce_sh(A, B, C, D)					\
+  (__builtin_ia32_reducesh_mask_round ((B), (C), (D),			\
+				       _mm_setzero_ph (),		\
+				       (A), _MM_FROUND_CUR_DIRECTION))
 
-#define _mm512_min_round_ph(A, B, C)				\
-  (__builtin_ia32_minph512_mask_round ((A), (B),		\
-				       _mm512_setzero_ph (),	\
-				       (__mmask32)-1, (C)))
+#define _mm_reduce_round_sh(A, B, C, D)				\
+  (__builtin_ia32_reducesh_mask_round ((A), (B), (C),		\
+				       _mm_setzero_ph (),	\
+				       (__mmask8)-1, (D)))
 
-#define _mm512_mask_min_round_ph(A, B, C, D, E)				\
-  (__builtin_ia32_minph512_mask_round ((C), (D), (A), (B), (E)))
+#define _mm_mask_reduce_round_sh(A, B, C, D, E, F)			\
+  (__builtin_ia32_reducesh_mask_round ((C), (D), (E), (A), (B), (F)))
+
+#define _mm_maskz_reduce_round_sh(A, B, C, D, E)		\
+  (__builtin_ia32_reducesh_mask_round ((B), (C), (D),		\
+				       _mm_setzero_ph (),	\
+				       (A), (E)))
 
-#define _mm512_maskz_min_round_ph(A, B, C, D)			\
-  (__builtin_ia32_minph512_mask_round ((B), (C),		\
-				       _mm512_setzero_ph (),	\
-				       (A), (D)))
 #endif /* __OPTIMIZE__ */
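
[Sketch for vreducesh: imm8 0x01 keeps zero fraction bits and rounds
toward -inf, so the reduce yields x - floor(x), the fractional part of
the low element.]

  _Float16
  fract_f16 (_Float16 x)
  {
    __m128h v = _mm_set_sh (x);
    return _mm_cvtsh_h (_mm_reduce_sh (v, v, 0x01));
  }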
 
-/* Intrinsic vmaxsh vminsh.  */
+/* Intrinsics vrndscalesh.  */
+#ifdef __OPTIMIZE__
 extern __inline __m128h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_max_sh (__m128h __A, __m128h __B)
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_roundscale_sh (__m128h __A, __m128h __B, int __C)
 {
-  __A[0] = __A[0] > __B[0] ? __A[0] : __B[0];
-  return __A;
+  return __builtin_ia32_rndscalesh_mask_round (__A, __B, __C,
+					       _mm_setzero_ph (),
+					       (__mmask8) -1,
+					       _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_max_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm_mask_roundscale_sh (__m128h __A, __mmask8 __B, __m128h __C,
+			__m128h __D, int __E)
 {
-  return __builtin_ia32_maxsh_mask (__C, __D, __A, __B);
+  return __builtin_ia32_rndscalesh_mask_round (__C, __D, __E, __A, __B,
+					       _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_max_sh (__mmask8 __A, __m128h __B, __m128h __C)
+_mm_maskz_roundscale_sh (__mmask8 __A, __m128h __B, __m128h __C, int __D)
 {
-  return __builtin_ia32_maxsh_mask (__B, __C, _mm_setzero_ph (),
-				    __A);
+  return __builtin_ia32_rndscalesh_mask_round (__B, __C, __D,
+					       _mm_setzero_ph (), __A,
+					       _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_min_sh (__m128h __A, __m128h __B)
+_mm_roundscale_round_sh (__m128h __A, __m128h __B, int __C, const int __D)
 {
-  __A[0] = __A[0] < __B[0] ? __A[0] : __B[0];
-  return __A;
+  return __builtin_ia32_rndscalesh_mask_round (__A, __B, __C,
+					       _mm_setzero_ph (),
+					       (__mmask8) -1,
+					       __D);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_min_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm_mask_roundscale_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+			      __m128h __D, int __E, const int __F)
 {
-  return __builtin_ia32_minsh_mask (__C, __D, __A, __B);
+  return __builtin_ia32_rndscalesh_mask_round (__C, __D, __E,
+					       __A, __B, __F);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_min_sh (__mmask8 __A, __m128h __B, __m128h __C)
+_mm_maskz_roundscale_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+			       int __D, const int __E)
 {
-  return __builtin_ia32_minsh_mask (__B, __C, _mm_setzero_ph (),
-				    __A);
+  return __builtin_ia32_rndscalesh_mask_round (__B, __C, __D,
+					       _mm_setzero_ph (),
+					       __A, __E);
 }
 
+#else
+#define _mm_roundscale_sh(A, B, C)					\
+  (__builtin_ia32_rndscalesh_mask_round ((A), (B), (C),			\
+					 _mm_setzero_ph (),		\
+					 (__mmask8)-1,			\
+					 _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_mask_roundscale_sh(A, B, C, D, E)				\
+  (__builtin_ia32_rndscalesh_mask_round ((C), (D), (E), (A), (B),	\
+					 _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_maskz_roundscale_sh(A, B, C, D)				\
+  (__builtin_ia32_rndscalesh_mask_round ((B), (C), (D),			\
+					 _mm_setzero_ph (),		\
+					 (A), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_roundscale_round_sh(A, B, C, D)			\
+  (__builtin_ia32_rndscalesh_mask_round ((A), (B), (C),		\
+					 _mm_setzero_ph (),	\
+					 (__mmask8)-1, (D)))
+
+#define _mm_mask_roundscale_round_sh(A, B, C, D, E, F)			\
+  (__builtin_ia32_rndscalesh_mask_round ((C), (D), (E), (A), (B), (F)))
+
+#define _mm_maskz_roundscale_round_sh(A, B, C, D, E)		\
+  (__builtin_ia32_rndscalesh_mask_round ((B), (C), (D),		\
+					 _mm_setzero_ph (),	\
+					 (A), (E)))
+
+#endif /* __OPTIMIZE__ */
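
[Sketch for vrndscalesh: the same imm8 0x01 (round toward -inf, no
fraction bits) used directly gives floor of the low element.]

  _Float16
  floor_f16 (_Float16 x)
  {
    __m128h v = _mm_set_sh (x);
    return _mm_cvtsh_h (_mm_roundscale_sh (v, v, 0x01));
  }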
+
+/* Intrinsics vfpclasssh.  */
 #ifdef __OPTIMIZE__
-extern __inline __m128h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_max_round_sh (__m128h __A, __m128h __B, const int __C)
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fpclass_sh_mask (__m128h __A, const int __imm)
 {
-  return __builtin_ia32_maxsh_mask_round (__A, __B,
-					  _mm_setzero_ph (),
-					  (__mmask8) -1, __C);
+  return (__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) __A, __imm,
+						   (__mmask8) -1);
 }
 
-extern __inline __m128h
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_max_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
-		       __m128h __D, const int __E)
+_mm_mask_fpclass_sh_mask (__mmask8 __U, __m128h __A, const int __imm)
 {
-  return __builtin_ia32_maxsh_mask_round (__C, __D, __A, __B, __E);
+  return (__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) __A, __imm, __U);
 }
 
+#else
+#define _mm_fpclass_sh_mask(X, C)					\
+  ((__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) (__m128h) (X),	\
+					     (int) (C), (__mmask8) (-1)))
+
+#define _mm_mask_fpclass_sh_mask(U, X, C)				\
+  ((__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) (__m128h) (X),	\
+					     (int) (C), (__mmask8) (U)))
+
+#endif /* __OPTIMIZE__ */
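
[Sketch for vfpclasssh: the immediate selects category bits; 0x81 is
QNaN|SNaN, giving an isnan test that raises no FP exceptions.]

  int
  isnan_f16 (_Float16 x)
  {
    return _mm_fpclass_sh_mask (_mm_set_sh (x), 0x81) != 0;
  }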
+
+/* Intrinsics vgetexpsh.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_max_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
-			const int __D)
+_mm_getexp_sh (__m128h __A, __m128h __B)
 {
-  return __builtin_ia32_maxsh_mask_round (__B, __C,
-					  _mm_setzero_ph (),
-					  __A, __D);
+  return (__m128h)
+    __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, (__v8hf) __B,
+					(__v8hf) _mm_setzero_ph (),
+					(__mmask8) -1,
+					_MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_min_round_sh (__m128h __A, __m128h __B, const int __C)
+_mm_mask_getexp_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
 {
-  return __builtin_ia32_minsh_mask_round (__A, __B,
-					  _mm_setzero_ph (),
-					  (__mmask8) -1, __C);
+  return (__m128h)
+    __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, (__v8hf) __B,
+					(__v8hf) __W, (__mmask8) __U,
+					_MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_min_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
-		       __m128h __D, const int __E)
+_mm_maskz_getexp_sh (__mmask8 __U, __m128h __A, __m128h __B)
 {
-  return __builtin_ia32_minsh_mask_round (__C, __D, __A, __B, __E);
+  return (__m128h)
+    __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, (__v8hf) __B,
+					(__v8hf) _mm_setzero_ph (),
+					(__mmask8) __U,
+					_MM_FROUND_CUR_DIRECTION);
 }
 
+#ifdef __OPTIMIZE__
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_min_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
-			const int __D)
+_mm_getexp_round_sh (__m128h __A, __m128h __B, const int __R)
 {
-  return __builtin_ia32_minsh_mask_round (__B, __C,
-					  _mm_setzero_ph (),
-					  __A, __D);
+  return (__m128h) __builtin_ia32_getexpsh_mask_round ((__v8hf) __A,
+						       (__v8hf) __B,
+						       _mm_setzero_ph (),
+						       (__mmask8) -1,
+						       __R);
 }
 
-#else
-#define _mm_max_round_sh(A, B, C)			\
-  (__builtin_ia32_maxsh_mask_round ((A), (B),		\
-				    _mm_setzero_ph (),	\
-				    (__mmask8)-1, (C)))
-
-#define _mm_mask_max_round_sh(A, B, C, D, E)			\
-  (__builtin_ia32_maxsh_mask_round ((C), (D), (A), (B), (E)))
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_getexp_round_sh (__m128h __W, __mmask8 __U, __m128h __A,
+			  __m128h __B, const int __R)
+{
+  return (__m128h) __builtin_ia32_getexpsh_mask_round ((__v8hf) __A,
+						       (__v8hf) __B,
+						       (__v8hf) __W,
+						       (__mmask8) __U, __R);
+}
 
-#define _mm_maskz_max_round_sh(A, B, C, D)		\
-  (__builtin_ia32_maxsh_mask_round ((B), (C),		\
-				    _mm_setzero_ph (),	\
-				    (A), (D)))
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_getexp_round_sh (__mmask8 __U, __m128h __A, __m128h __B,
+			   const int __R)
+{
+  return (__m128h) __builtin_ia32_getexpsh_mask_round ((__v8hf) __A,
+						       (__v8hf) __B,
+						       (__v8hf)
+						       _mm_setzero_ph (),
+						       (__mmask8) __U, __R);
+}
 
-#define _mm_min_round_sh(A, B, C)			\
-  (__builtin_ia32_minsh_mask_round ((A), (B),		\
-				    _mm_setzero_ph (),	\
-				    (__mmask8)-1, (C)))
+#else
+#define _mm_getexp_round_sh(A, B, R)					\
+  ((__m128h)__builtin_ia32_getexpsh_mask_round((__v8hf)(__m128h)(A),	\
+					       (__v8hf)(__m128h)(B),	\
+					       (__v8hf)_mm_setzero_ph (), \
+					       (__mmask8)-1, (R)))
 
-#define _mm_mask_min_round_sh(A, B, C, D, E)			\
-  (__builtin_ia32_minsh_mask_round ((C), (D), (A), (B), (E)))
+#define _mm_mask_getexp_round_sh(W, U, A, B, C)			\
+  ((__m128h)__builtin_ia32_getexpsh_mask_round ((A), (B), (W), (U), (C)))
 
-#define _mm_maskz_min_round_sh(A, B, C, D)		\
-  (__builtin_ia32_minsh_mask_round ((B), (C),		\
-				    _mm_setzero_ph (),	\
-				    (A), (D)))
+#define _mm_maskz_getexp_round_sh(U, A, B, C)				\
+  ((__m128h)__builtin_ia32_getexpsh_mask_round ((A), (B),		\
+						(__v8hf)_mm_setzero_ph (), \
+						(U), (C)))
 
 #endif /* __OPTIMIZE__ */
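
[Sketch for vgetexpsh: the low element of the result is the unbiased
exponent floor(log2(|x|)), returned as a _Float16.]

  _Float16
  logb_f16 (_Float16 x)
  {
    __m128h v = _mm_set_sh (x);
    return _mm_cvtsh_h (_mm_getexp_sh (v, v));
  }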
 
-/* vcmpph */
-#ifdef __OPTIMIZE
-extern __inline __mmask32
+/* Intrinsics vgetmantsh.  */
+#ifdef __OPTIMIZE__
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmp_ph_mask (__m512h __A, __m512h __B, const int __C)
+_mm_getmant_sh (__m128h __A, __m128h __B,
+		_MM_MANTISSA_NORM_ENUM __C,
+		_MM_MANTISSA_SIGN_ENUM __D)
 {
-  return (__mmask32) __builtin_ia32_cmpph512_mask (__A, __B, __C,
-						   (__mmask32) -1);
+  return (__m128h)
+    __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, (__v8hf) __B,
+					 (__D << 2) | __C, _mm_setzero_ph (),
+					 (__mmask8) -1,
+					 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __mmask32
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmp_ph_mask (__mmask32 __A, __m512h __B, __m512h __C,
-			 const int __D)
+_mm_mask_getmant_sh (__m128h __W, __mmask8 __U, __m128h __A,
+		     __m128h __B, _MM_MANTISSA_NORM_ENUM __C,
+		     _MM_MANTISSA_SIGN_ENUM __D)
 {
-  return (__mmask32) __builtin_ia32_cmpph512_mask (__B, __C, __D,
-						   __A);
+  return (__m128h)
+    __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, (__v8hf) __B,
+					 (__D << 2) | __C, (__v8hf) __W,
+					 __U, _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __mmask32
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmp_round_ph_mask (__m512h __A, __m512h __B, const int __C,
-			  const int __D)
+_mm_maskz_getmant_sh (__mmask8 __U, __m128h __A, __m128h __B,
+		      _MM_MANTISSA_NORM_ENUM __C,
+		      _MM_MANTISSA_SIGN_ENUM __D)
 {
-  return (__mmask32) __builtin_ia32_cmpph512_mask_round (__A, __B,
-							 __C, (__mmask32) -1,
-							 __D);
+  return (__m128h)
+    __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, (__v8hf) __B,
+					 (__D << 2) | __C,
+					 (__v8hf) _mm_setzero_ph (),
+					 __U, _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __mmask32
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmp_round_ph_mask (__mmask32 __A, __m512h __B, __m512h __C,
-			       const int __D, const int __E)
+_mm_getmant_round_sh (__m128h __A, __m128h __B,
+		      _MM_MANTISSA_NORM_ENUM __C,
+		      _MM_MANTISSA_SIGN_ENUM __D, const int __R)
 {
-  return (__mmask32) __builtin_ia32_cmpph512_mask_round (__B, __C,
-							 __D, __A,
-							 __E);
+  return (__m128h) __builtin_ia32_getmantsh_mask_round ((__v8hf) __A,
+							(__v8hf) __B,
+							(__D << 2) | __C,
+							_mm_setzero_ph (),
+							(__mmask8) -1,
+							__R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_getmant_round_sh (__m128h __W, __mmask8 __U, __m128h __A,
+			   __m128h __B, _MM_MANTISSA_NORM_ENUM __C,
+			   _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+{
+  return (__m128h) __builtin_ia32_getmantsh_mask_round ((__v8hf) __A,
+							(__v8hf) __B,
+							(__D << 2) | __C,
+							(__v8hf) __W,
+							__U, __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_getmant_round_sh (__mmask8 __U, __m128h __A, __m128h __B,
+			    _MM_MANTISSA_NORM_ENUM __C,
+			    _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+{
+  return (__m128h) __builtin_ia32_getmantsh_mask_round ((__v8hf) __A,
+							(__v8hf) __B,
+							(__D << 2) | __C,
+							(__v8hf)
+							_mm_setzero_ph (),
+							__U, __R);
 }
 
 #else
-#define _mm512_cmp_ph_mask(A, B, C)			\
-  (__builtin_ia32_cmpph512_mask ((A), (B), (C), (-1)))
+#define _mm_getmant_sh(X, Y, C, D)					\
+  ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X),	\
+						 (__v8hf)(__m128h)(Y),	\
+						 (int)(((D)<<2) | (C)),	\
+						 (__v8hf)(__m128h)	\
+						 _mm_setzero_ph (),	\
+						 (__mmask8)-1,		\
+						 _MM_FROUND_CUR_DIRECTION))
 
-#define _mm512_mask_cmp_ph_mask(A, B, C, D)		\
-  (__builtin_ia32_cmpph512_mask ((B), (C), (D), (A)))
+#define _mm_mask_getmant_sh(W, U, X, Y, C, D)				\
+  ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X),	\
+						 (__v8hf)(__m128h)(Y),	\
+						 (int)(((D)<<2) | (C)),	\
+						 (__v8hf)(__m128h)(W),	\
+						 (__mmask8)(U),		\
+						 _MM_FROUND_CUR_DIRECTION))
 
-#define _mm512_cmp_round_ph_mask(A, B, C, D)				\
-  (__builtin_ia32_cmpph512_mask_round ((A), (B), (C), (-1), (D)))
+#define _mm_maskz_getmant_sh(U, X, Y, C, D)				\
+  ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X),	\
+						 (__v8hf)(__m128h)(Y),	\
+						 (int)(((D)<<2) | (C)),	\
+						 (__v8hf)(__m128h)	\
+						 _mm_setzero_ph (),	\
+						 (__mmask8)(U),		\
+						 _MM_FROUND_CUR_DIRECTION))
 
-#define _mm512_mask_cmp_round_ph_mask(A, B, C, D, E)			\
-  (__builtin_ia32_cmpph512_mask_round ((B), (C), (D), (A), (E)))
+#define _mm_getmant_round_sh(X, Y, C, D, R)				\
+  ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X),	\
+						 (__v8hf)(__m128h)(Y),	\
+						 (int)(((D)<<2) | (C)),	\
+						 (__v8hf)(__m128h)	\
+						 _mm_setzero_ph (),	\
+						 (__mmask8)-1,		\
+						 (R)))
+
+#define _mm_mask_getmant_round_sh(W, U, X, Y, C, D, R)			\
+  ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X),	\
+						 (__v8hf)(__m128h)(Y),	\
+						 (int)(((D)<<2) | (C)),	\
+						 (__v8hf)(__m128h)(W),	\
+						 (__mmask8)(U),		\
+						 (R)))
+
+#define _mm_maskz_getmant_round_sh(U, X, Y, C, D, R)			\
+  ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X),	\
+						 (__v8hf)(__m128h)(Y),	\
+						 (int)(((D)<<2) | (C)),	\
+						 (__v8hf)(__m128h)	\
+						 _mm_setzero_ph (),	\
+						 (__mmask8)(U),		\
+						 (R)))
 
 #endif /* __OPTIMIZE__ */
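
[Sketch for vgetmantsh: paired with vgetexpsh above this decomposes the
low element into mantissa and exponent, x == mant * 2^exp.]

  /* Mantissa normalized to [1, 2), sign taken from the source.  */
  _Float16
  mant_f16 (_Float16 x)
  {
    __m128h v = _mm_set_sh (x);
    return _mm_cvtsh_h (_mm_getmant_sh (v, v, _MM_MANT_NORM_1_2,
					_MM_MANT_SIGN_src));
  }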
 
-/* Intrinsics vcmpsh.  */
-#ifdef __OPTIMIZE__
-extern __inline __mmask8
+/* Intrinsics vmovw.  */
+extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cmp_sh_mask (__m128h __A, __m128h __B, const int __C)
+_mm_cvtsi16_si128 (short __A)
 {
-  return (__mmask8)
-    __builtin_ia32_cmpsh_mask_round (__A, __B,
-				     __C, (__mmask8) -1,
-				     _MM_FROUND_CUR_DIRECTION);
+  return _mm_set_epi16 (0, 0, 0, 0, 0, 0, 0, __A);
 }
 
-extern __inline __mmask8
+extern __inline short
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cmp_sh_mask (__mmask8 __A, __m128h __B, __m128h __C,
-		      const int __D)
+_mm_cvtsi128_si16 (__m128i __A)
 {
-  return (__mmask8)
-    __builtin_ia32_cmpsh_mask_round (__B, __C,
-				     __D, __A,
-				     _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vec_ext_v8hi ((__v8hi)__A, 0);
 }
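
[Sketch: the vmovw pair round-trips a 16-bit value through element 0 of
a vector, e.g. for moving _Float16 bit patterns in and out of GPRs.]

  short
  roundtrip_i16 (short x)
  {
    return _mm_cvtsi128_si16 (_mm_cvtsi16_si128 (x));  /* == x */
  }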
 
-extern __inline __mmask8
+/* Intrinsics vmovsh.  */
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cmp_round_sh_mask (__m128h __A, __m128h __B, const int __C,
-		       const int __D)
+_mm_mask_load_sh (__m128h __A, __mmask8 __B, _Float16 const* __C)
 {
-  return (__mmask8) __builtin_ia32_cmpsh_mask_round (__A, __B,
-						     __C, (__mmask8) -1,
-						     __D);
+  return __builtin_ia32_loadsh_mask (__C, __A, __B);
 }
 
-extern __inline __mmask8
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cmp_round_sh_mask (__mmask8 __A, __m128h __B, __m128h __C,
-			    const int __D, const int __E)
+_mm_maskz_load_sh (__mmask8 __A, _Float16 const* __B)
 {
-  return (__mmask8) __builtin_ia32_cmpsh_mask_round (__B, __C,
-						     __D, __A,
-						     __E);
+  return __builtin_ia32_loadsh_mask (__B, _mm_setzero_ph (), __A);
 }
 
-#else
-#define _mm_cmp_sh_mask(A, B, C)					\
-  (__builtin_ia32_cmpsh_mask_round ((A), (B), (C), (-1),		\
-				    (_MM_FROUND_CUR_DIRECTION)))
-
-#define _mm_mask_cmp_sh_mask(A, B, C, D)				\
-  (__builtin_ia32_cmpsh_mask_round ((B), (C), (D), (A),			\
-				    (_MM_FROUND_CUR_DIRECTION)))
-
-#define _mm_cmp_round_sh_mask(A, B, C, D)			\
-  (__builtin_ia32_cmpsh_mask_round ((A), (B), (C), (-1), (D)))
-
-#define _mm_mask_cmp_round_sh_mask(A, B, C, D, E)		\
-  (__builtin_ia32_cmpsh_mask_round ((B), (C), (D), (A), (E)))
-
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vcomish.  */
-extern __inline int
+extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comieq_sh (__m128h __A, __m128h __B)
+_mm_mask_store_sh (_Float16 *__A, __mmask8 __B, __m128h __C)
 {
-  return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_EQ_OS,
-					  (__mmask8) -1,
-					  _MM_FROUND_CUR_DIRECTION);
+  __builtin_ia32_storesh_mask (__A, __C, __B);
 }
 
-extern __inline int
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comilt_sh (__m128h __A, __m128h __B)
+_mm_move_sh (__m128h __A, __m128h __B)
 {
-  return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_LT_OS,
-					  (__mmask8) -1,
-					  _MM_FROUND_CUR_DIRECTION);
+  __A[0] = __B[0];
+  return __A;
 }
 
-extern __inline int
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comile_sh (__m128h __A, __m128h __B)
+_mm_mask_move_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
 {
-  return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_LE_OS,
-					  (__mmask8) -1,
-					  _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vmovsh_mask (__C, __D, __A, __B);
 }
 
-extern __inline int
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comigt_sh (__m128h __A, __m128h __B)
+_mm_maskz_move_sh (__mmask8 __A, __m128h __B, __m128h __C)
 {
-  return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_GT_OS,
-					  (__mmask8) -1,
-					  _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vmovsh_mask (__B, __C, _mm_setzero_ph (), __A);
 }
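
[Sketch for the masked vmovsh load: one _Float16 is read from memory
only when mask bit 0 is set, otherwise the source element is kept.]

  _Float16
  load_or_default (_Float16 const *p, _Float16 dflt, _Bool valid)
  {
    __m128h src = _mm_set_sh (dflt);
    return _mm_cvtsh_h (_mm_mask_load_sh (src, valid, p));
  }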
 
+/* Intrinsics vcvtsh2si, vcvtsh2us.  */
 extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comige_sh (__m128h __A, __m128h __B)
+_mm_cvtsh_i32 (__m128h __A)
 {
-  return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_GE_OS,
-					  (__mmask8) -1,
-					  _MM_FROUND_CUR_DIRECTION);
+  return (int) __builtin_ia32_vcvtsh2si32_round (__A, _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline int
+extern __inline unsigned
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comineq_sh (__m128h __A, __m128h __B)
+_mm_cvtsh_u32 (__m128h __A)
 {
-  return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_NEQ_US,
-					  (__mmask8) -1,
-					  _MM_FROUND_CUR_DIRECTION);
+  return (unsigned)
+    __builtin_ia32_vcvtsh2usi32_round (__A, _MM_FROUND_CUR_DIRECTION);
 }
 
+#ifdef __OPTIMIZE__
 extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_ucomieq_sh (__m128h __A, __m128h __B)
+_mm_cvt_roundsh_i32 (__m128h __A, const int __R)
 {
-  return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_EQ_OQ,
-					  (__mmask8) -1,
-					  _MM_FROUND_CUR_DIRECTION);
+  return (int) __builtin_ia32_vcvtsh2si32_round (__A, __R);
 }
 
-extern __inline int
+extern __inline unsigned
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_ucomilt_sh (__m128h __A, __m128h __B)
+_mm_cvt_roundsh_u32 (__m128h __A, const int __R)
 {
-  return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_LT_OQ,
-					  (__mmask8) -1,
-					  _MM_FROUND_CUR_DIRECTION);
+  return (unsigned) __builtin_ia32_vcvtsh2usi32_round (__A, __R);
 }
 
-extern __inline int
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_ucomile_sh (__m128h __A, __m128h __B)
-{
-  return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_LE_OQ,
-					  (__mmask8) -1,
-					  _MM_FROUND_CUR_DIRECTION);
-}
+#else
+#define _mm_cvt_roundsh_i32(A, B)		\
+  ((int)__builtin_ia32_vcvtsh2si32_round ((A), (B)))
+#define _mm_cvt_roundsh_u32(A, B)		\
+  ((unsigned)__builtin_ia32_vcvtsh2usi32_round ((A), (B)))
 
-extern __inline int
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_ucomigt_sh (__m128h __A, __m128h __B)
-{
-  return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_GT_OQ,
-					  (__mmask8) -1,
-					  _MM_FROUND_CUR_DIRECTION);
-}
+#endif /* __OPTIMIZE__ */
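
[Sketch for vcvtsh2si with explicit rounding; the round variants need a
compile-time rounding immediate.]

  int
  f16_to_i32_nearest (_Float16 x)
  {
    return _mm_cvt_roundsh_i32 (_mm_set_sh (x),
				_MM_FROUND_TO_NEAREST_INT
				| _MM_FROUND_NO_EXC);
  }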
 
-extern __inline int
+#ifdef __x86_64__
+extern __inline long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_ucomige_sh (__m128h __A, __m128h __B)
+_mm_cvtsh_i64 (__m128h __A)
 {
-  return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_GE_OQ,
-					  (__mmask8) -1,
-					  _MM_FROUND_CUR_DIRECTION);
+  return (long long)
+    __builtin_ia32_vcvtsh2si64_round (__A, _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline int
+extern __inline unsigned long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_ucomineq_sh (__m128h __A, __m128h __B)
+_mm_cvtsh_u64 (__m128h __A)
 {
-  return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_NEQ_UQ,
-					  (__mmask8) -1,
-					  _MM_FROUND_CUR_DIRECTION);
+  return (unsigned long long)
+    __builtin_ia32_vcvtsh2usi64_round (__A, _MM_FROUND_CUR_DIRECTION);
 }
 
 #ifdef __OPTIMIZE__
-extern __inline int
+extern __inline long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comi_sh (__m128h __A, __m128h __B, const int __P)
+_mm_cvt_roundsh_i64 (__m128h __A, const int __R)
 {
-  return __builtin_ia32_cmpsh_mask_round (__A, __B, __P,
-					  (__mmask8) -1,
-					  _MM_FROUND_CUR_DIRECTION);
+  return (long long) __builtin_ia32_vcvtsh2si64_round (__A, __R);
 }
 
-extern __inline int
+extern __inline unsigned long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comi_round_sh (__m128h __A, __m128h __B, const int __P, const int __R)
+_mm_cvt_roundsh_u64 (__m128h __A, const int __R)
 {
-  return __builtin_ia32_cmpsh_mask_round (__A, __B, __P,
-					  (__mmask8) -1,__R);
+  return (unsigned long long) __builtin_ia32_vcvtsh2usi64_round (__A, __R);
 }
 
 #else
-#define _mm_comi_round_sh(A, B, P, R)					\
-  (__builtin_ia32_cmpsh_mask_round ((A), (B), (P), (__mmask8) (-1), (R)))
-#define _mm_comi_sh(A, B, P)						\
-  (__builtin_ia32_cmpsh_mask_round ((A), (B), (P), (__mmask8) (-1),	\
-				    _MM_FROUND_CUR_DIRECTION))
-
-#endif /* __OPTIMIZE__  */
+#define _mm_cvt_roundsh_i64(A, B)			\
+  ((long long)__builtin_ia32_vcvtsh2si64_round ((A), (B)))
+#define _mm_cvt_roundsh_u64(A, B)			\
+  ((unsigned long long)__builtin_ia32_vcvtsh2usi64_round ((A), (B)))
 
-/* Intrinsics vsqrtph.  */
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sqrt_ph (__m512h __A)
-{
-  return __builtin_ia32_sqrtph512_mask_round (__A,
-					      _mm512_setzero_ph(),
-					      (__mmask32) -1,
-					      _MM_FROUND_CUR_DIRECTION);
-}
+#endif /* __OPTIMIZE__ */
+#endif /* __x86_64__ */
 
-extern __inline __m512h
+/* Intrinsics vcvtsi2sh, vcvtusi2sh.  */
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sqrt_ph (__m512h __A, __mmask32 __B, __m512h __C)
+_mm_cvti32_sh (__m128h __A, int __B)
 {
-  return __builtin_ia32_sqrtph512_mask_round (__C, __A, __B,
-					      _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtsi2sh32_round (__A, __B, _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sqrt_ph (__mmask32 __A, __m512h __B)
+_mm_cvtu32_sh (__m128h __A, unsigned int __B)
 {
-  return __builtin_ia32_sqrtph512_mask_round (__B,
-					      _mm512_setzero_ph (),
-					      __A,
-					      _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtusi2sh32_round (__A, __B, _MM_FROUND_CUR_DIRECTION);
 }
 
 #ifdef __OPTIMIZE__
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sqrt_round_ph (__m512h __A, const int __B)
-{
-  return __builtin_ia32_sqrtph512_mask_round (__A,
-					      _mm512_setzero_ph(),
-					      (__mmask32) -1, __B);
-}
-
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sqrt_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
-			   const int __D)
+_mm_cvt_roundi32_sh (__m128h __A, int __B, const int __R)
 {
-  return __builtin_ia32_sqrtph512_mask_round (__C, __A, __B, __D);
+  return __builtin_ia32_vcvtsi2sh32_round (__A, __B, __R);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sqrt_round_ph (__mmask32 __A, __m512h __B, const int __C)
+_mm_cvt_roundu32_sh (__m128h __A, unsigned int __B, const int __R)
 {
-  return __builtin_ia32_sqrtph512_mask_round (__B,
-					      _mm512_setzero_ph (),
-					      __A, __C);
+  return __builtin_ia32_vcvtusi2sh32_round (__A, __B, __R);
 }
 
 #else
-#define _mm512_sqrt_round_ph(A, B)				\
-  (__builtin_ia32_sqrtph512_mask_round ((A),			\
-					_mm512_setzero_ph (),	\
-					(__mmask32)-1, (B)))
-
-#define _mm512_mask_sqrt_round_ph(A, B, C, D)			\
-  (__builtin_ia32_sqrtph512_mask_round ((C), (A), (B), (D)))
-
-#define _mm512_maskz_sqrt_round_ph(A, B, C)			\
-  (__builtin_ia32_sqrtph512_mask_round ((B),			\
-					_mm512_setzero_ph (),	\
-					(A), (C)))
+#define _mm_cvt_roundi32_sh(A, B, C)		\
+  (__builtin_ia32_vcvtsi2sh32_round ((A), (B), (C)))
+#define _mm_cvt_roundu32_sh(A, B, C)		\
+  (__builtin_ia32_vcvtusi2sh32_round ((A), (B), (C)))
 
 #endif /* __OPTIMIZE__ */
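
[Sketch for vcvtsi2sh: the integer lands in element 0, the upper
elements are copied from the first operand (zeros here).]

  _Float16
  i32_to_f16 (int n)
  {
    return _mm_cvtsh_h (_mm_cvti32_sh (_mm_setzero_ph (), n));
  }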
 
-/* Intrinsics vrsqrtph.  */
-extern __inline __m512h
+#ifdef __x86_64__
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_rsqrt_ph (__m512h __A)
+_mm_cvti64_sh (__m128h __A, long long __B)
 {
-  return __builtin_ia32_rsqrtph512_mask (__A, _mm512_setzero_ph (),
-					 (__mmask32) -1);
+  return __builtin_ia32_vcvtsi2sh64_round (__A, __B, _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_rsqrt_ph (__m512h __A, __mmask32 __B, __m512h __C)
+_mm_cvtu64_sh (__m128h __A, unsigned long long __B)
 {
-  return __builtin_ia32_rsqrtph512_mask (__C, __A, __B);
+  return __builtin_ia32_vcvtusi2sh64_round (__A, __B, _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+#ifdef __OPTIMIZE__
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_rsqrt_ph (__mmask32 __A, __m512h __B)
+_mm_cvt_roundi64_sh (__m128h __A, long long __B, const int __R)
 {
-  return __builtin_ia32_rsqrtph512_mask (__B, _mm512_setzero_ph (),
-					 __A);
+  return __builtin_ia32_vcvtsi2sh64_round (__A, __B, __R);
 }
 
-/* Intrinsics vrsqrtsh.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_rsqrt_sh (__m128h __A, __m128h __B)
+_mm_cvt_roundu64_sh (__m128h __A, unsigned long long __B, const int __R)
 {
-  return __builtin_ia32_rsqrtsh_mask (__B, __A, _mm_setzero_ph (),
-				      (__mmask8) -1);
+  return __builtin_ia32_vcvtusi2sh64_round (__A, __B, __R);
 }
 
-extern __inline __m128h
+#else
+#define _mm_cvt_roundi64_sh(A, B, C)		\
+  (__builtin_ia32_vcvtsi2sh64_round ((A), (B), (C)))
+#define _mm_cvt_roundu64_sh(A, B, C)		\
+  (__builtin_ia32_vcvtusi2sh64_round ((A), (B), (C)))
+
+#endif /* __OPTIMIZE__ */
+#endif /* __x86_64__ */
+
+/* Intrinsics vcvttsh2si, vcvttsh2us.  */
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_rsqrt_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm_cvttsh_i32 (__m128h __A)
 {
-  return __builtin_ia32_rsqrtsh_mask (__D, __C, __A, __B);
+  return (int)
+    __builtin_ia32_vcvttsh2si32_round (__A, _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline unsigned
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_rsqrt_sh (__mmask8 __A, __m128h __B, __m128h __C)
+_mm_cvttsh_u32 (__m128h __A)
 {
-  return __builtin_ia32_rsqrtsh_mask (__C, __B, _mm_setzero_ph (),
-				      __A);
+  return (unsigned)
+    __builtin_ia32_vcvttsh2usi32_round (__A, _MM_FROUND_CUR_DIRECTION);
 }
 
-/* Intrinsics vsqrtsh.  */
-extern __inline __m128h
+#ifdef __OPTIMIZE__
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_sqrt_sh (__m128h __A, __m128h __B)
+_mm_cvtt_roundsh_i32 (__m128h __A, const int __R)
 {
-  return __builtin_ia32_sqrtsh_mask_round (__B, __A,
-					   _mm_setzero_ph (),
-					   (__mmask8) -1,
-					   _MM_FROUND_CUR_DIRECTION);
+  return (int) __builtin_ia32_vcvttsh2si32_round (__A, __R);
 }
 
-extern __inline __m128h
+extern __inline unsigned
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_sqrt_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm_cvtt_roundsh_u32 (__m128h __A, const int __R)
 {
-  return __builtin_ia32_sqrtsh_mask_round (__D, __C, __A, __B,
-					   _MM_FROUND_CUR_DIRECTION);
+  return (unsigned) __builtin_ia32_vcvttsh2usi32_round (__A, __R);
 }
 
-extern __inline __m128h
+#else
+#define _mm_cvtt_roundsh_i32(A, B)		\
+  ((int)__builtin_ia32_vcvttsh2si32_round ((A), (B)))
+#define _mm_cvtt_roundsh_u32(A, B)		\
+  ((unsigned)__builtin_ia32_vcvttsh2usi32_round ((A), (B)))
+
+#endif /* __OPTIMIZE__ */
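
[Sketch for vcvttsh2si, which always truncates -- the same result as a
plain C cast of the low element.]

  int
  f16_to_i32_trunc (_Float16 x)
  {
    return _mm_cvttsh_i32 (_mm_set_sh (x));  /* == (int) x */
  }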
+
+#ifdef __x86_64__
+extern __inline long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_sqrt_sh (__mmask8 __A, __m128h __B, __m128h __C)
+_mm_cvttsh_i64 (__m128h __A)
 {
-  return __builtin_ia32_sqrtsh_mask_round (__C, __B,
-					   _mm_setzero_ph (),
-					   __A, _MM_FROUND_CUR_DIRECTION);
+  return (long long)
+    __builtin_ia32_vcvttsh2si64_round (__A, _MM_FROUND_CUR_DIRECTION);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m128h
+extern __inline unsigned long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_sqrt_round_sh (__m128h __A, __m128h __B, const int __C)
+_mm_cvttsh_u64 (__m128h __A)
 {
-  return __builtin_ia32_sqrtsh_mask_round (__B, __A,
-					   _mm_setzero_ph (),
-					   (__mmask8) -1, __C);
+  return (unsigned long long)
+    __builtin_ia32_vcvttsh2usi64_round (__A, _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+#ifdef __OPTIMIZE__
+extern __inline long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_sqrt_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
-			__m128h __D, const int __E)
+_mm_cvtt_roundsh_i64 (__m128h __A, const int __R)
 {
-  return __builtin_ia32_sqrtsh_mask_round (__D, __C, __A, __B,
-					   __E);
+  return (long long) __builtin_ia32_vcvttsh2si64_round (__A, __R);
 }
 
-extern __inline __m128h
+extern __inline unsigned long long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_sqrt_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
-			 const int __D)
+_mm_cvtt_roundsh_u64 (__m128h __A, const int __R)
 {
-  return __builtin_ia32_sqrtsh_mask_round (__C, __B,
-					   _mm_setzero_ph (),
-					   __A, __D);
+  return (unsigned long long) __builtin_ia32_vcvttsh2usi64_round (__A, __R);
 }
 
 #else
-#define _mm_sqrt_round_sh(A, B, C)				\
-  (__builtin_ia32_sqrtsh_mask_round ((B), (A),			\
-				     _mm_setzero_ph (),		\
-				     (__mmask8)-1, (C)))
-
-#define _mm_mask_sqrt_round_sh(A, B, C, D, E)			\
-  (__builtin_ia32_sqrtsh_mask_round ((D), (C), (A), (B), (E)))
-
-#define _mm_maskz_sqrt_round_sh(A, B, C, D)		\
-  (__builtin_ia32_sqrtsh_mask_round ((C), (B),		\
-				     _mm_setzero_ph (),	\
-				     (A), (D)))
+#define _mm_cvtt_roundsh_i64(A, B)			\
+  ((long long)__builtin_ia32_vcvttsh2si64_round ((A), (B)))
+#define _mm_cvtt_roundsh_u64(A, B)			\
+  ((unsigned long long)__builtin_ia32_vcvttsh2usi64_round ((A), (B)))
 
 #endif /* __OPTIMIZE__ */
+#endif /* __x86_64__ */
 
-/* Intrinsics vrcpph.  */
-extern __inline __m512h
+/* Intrinsics vcvtsh2ss, vcvtsh2sd.  */
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_rcp_ph (__m512h __A)
+_mm_cvtsh_ss (__m128 __A, __m128h __B)
 {
-  return __builtin_ia32_rcpph512_mask (__A, _mm512_setzero_ph (),
-				       (__mmask32) -1);
+  return __builtin_ia32_vcvtsh2ss_mask_round (__B, __A,
+					      _mm_setzero_ps (),
+					      (__mmask8) -1,
+					      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_rcp_ph (__m512h __A, __mmask32 __B, __m512h __C)
+_mm_mask_cvtsh_ss (__m128 __A, __mmask8 __B, __m128 __C, __m128h __D)
 {
-  return __builtin_ia32_rcpph512_mask (__C, __A, __B);
+  return __builtin_ia32_vcvtsh2ss_mask_round (__D, __C, __A, __B,
+					      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_rcp_ph (__mmask32 __A, __m512h __B)
+_mm_maskz_cvtsh_ss (__mmask8 __A, __m128 __B, __m128h __C)
 {
-  return __builtin_ia32_rcpph512_mask (__B, _mm512_setzero_ph (),
-				       __A);
+  return __builtin_ia32_vcvtsh2ss_mask_round (__C, __B,
+					      _mm_setzero_ps (),
+					      __A, _MM_FROUND_CUR_DIRECTION);
 }
 
-/* Intrinsics vrcpsh.  */
-extern __inline __m128h
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_rcp_sh (__m128h __A, __m128h __B)
+_mm_cvtsh_sd (__m128d __A, __m128h __B)
 {
-  return __builtin_ia32_rcpsh_mask (__B, __A, _mm_setzero_ph (),
-				    (__mmask8) -1);
+  return __builtin_ia32_vcvtsh2sd_mask_round (__B, __A,
+					      _mm_setzero_pd (),
+					      (__mmask8) -1,
+					      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_rcp_sh (__m128h __A, __mmask32 __B, __m128h __C, __m128h __D)
+_mm_mask_cvtsh_sd (__m128d __A, __mmask8 __B, __m128d __C, __m128h __D)
 {
-  return __builtin_ia32_rcpsh_mask (__D, __C, __A, __B);
+  return __builtin_ia32_vcvtsh2sd_mask_round (__D, __C, __A, __B,
+					      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_rcp_sh (__mmask32 __A, __m128h __B, __m128h __C)
+_mm_maskz_cvtsh_sd (__mmask8 __A, __m128d __B, __m128h __C)
 {
-  return __builtin_ia32_rcpsh_mask (__C, __B, _mm_setzero_ph (),
-				    __A);
+  return __builtin_ia32_vcvtsh2sd_mask_round (__C, __B,
+					      _mm_setzero_pd (),
+					      __A, _MM_FROUND_CUR_DIRECTION);
 }
 
-/* Intrinsics vscalefph.  */
-extern __inline __m512h
+#ifdef __OPTIMIZE__
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_scalef_ph (__m512h __A, __m512h __B)
+_mm_cvt_roundsh_ss (__m128 __A, __m128h __B, const int __R)
 {
-  return __builtin_ia32_scalefph512_mask_round (__A, __B,
-						_mm512_setzero_ph (),
-						(__mmask32) -1,
-						_MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtsh2ss_mask_round (__B, __A,
+					      _mm_setzero_ps (),
+					      (__mmask8) -1, __R);
 }
 
-extern __inline __m512h
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_scalef_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+_mm_mask_cvt_roundsh_ss (__m128 __A, __mmask8 __B, __m128 __C,
+			 __m128h __D, const int __R)
 {
-  return __builtin_ia32_scalefph512_mask_round (__C, __D, __A, __B,
-						_MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtsh2ss_mask_round (__D, __C, __A, __B, __R);
 }
 
-extern __inline __m512h
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_scalef_ph (__mmask32 __A, __m512h __B, __m512h __C)
+_mm_maskz_cvt_roundsh_ss (__mmask8 __A, __m128 __B,
+			  __m128h __C, const int __R)
 {
-  return __builtin_ia32_scalefph512_mask_round (__B, __C,
-						_mm512_setzero_ph (),
-						__A,
-						_MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtsh2ss_mask_round (__C, __B,
+					      _mm_setzero_ps (),
+					      __A, __R);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m512h
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_scalef_round_ph (__m512h __A, __m512h __B, const int __C)
+_mm_cvt_roundsh_sd (__m128d __A, __m128h __B, const int __R)
 {
-  return __builtin_ia32_scalefph512_mask_round (__A, __B,
-						_mm512_setzero_ph (),
-						(__mmask32) -1, __C);
+  return __builtin_ia32_vcvtsh2sd_mask_round (__B, __A,
+					      _mm_setzero_pd (),
+					      (__mmask8) -1, __R);
 }
 
-extern __inline __m512h
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_scalef_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
-			     __m512h __D, const int __E)
+_mm_mask_cvt_roundsh_sd (__m128d __A, __mmask8 __B, __m128d __C,
+			 __m128h __D, const int __R)
 {
-  return __builtin_ia32_scalefph512_mask_round (__C, __D, __A, __B,
-						__E);
+  return __builtin_ia32_vcvtsh2sd_mask_round (__D, __C, __A, __B, __R);
 }
 
-extern __inline __m512h
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_scalef_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
-			      const int __D)
+_mm_maskz_cvt_roundsh_sd (__mmask8 __A, __m128d __B, __m128h __C, const int __R)
 {
-  return __builtin_ia32_scalefph512_mask_round (__B, __C,
-						_mm512_setzero_ph (),
-						__A, __D);
+  return __builtin_ia32_vcvtsh2sd_mask_round (__C, __B,
+					      _mm_setzero_pd (),
+					      __A, __R);
 }
 
 #else
-#define _mm512_scalef_round_ph(A, B, C)				\
-  (__builtin_ia32_scalefph512_mask_round ((A), (B),		\
-					  _mm512_setzero_ph (),	\
-					  (__mmask32)-1, (C)))
+#define _mm_cvt_roundsh_ss(A, B, R)				\
+  (__builtin_ia32_vcvtsh2ss_mask_round ((B), (A),		\
+					_mm_setzero_ps (),	\
+					(__mmask8) -1, (R)))
 
-#define _mm512_mask_scalef_round_ph(A, B, C, D, E)			\
-  (__builtin_ia32_scalefph512_mask_round ((C), (D), (A), (B), (E)))
+#define _mm_mask_cvt_roundsh_ss(A, B, C, D, R)				\
+  (__builtin_ia32_vcvtsh2ss_mask_round ((D), (C), (A), (B), (R)))
 
-#define _mm512_maskz_scalef_round_ph(A, B, C, D)		\
-  (__builtin_ia32_scalefph512_mask_round ((B), (C),		\
-					  _mm512_setzero_ph (),	\
-					  (A), (D)))
+#define _mm_maskz_cvt_roundsh_ss(A, B, C, R)			\
+  (__builtin_ia32_vcvtsh2ss_mask_round ((C), (B),		\
+					_mm_setzero_ps (),	\
+					(A), (R)))
 
-#endif  /* __OPTIMIZE__ */
+#define _mm_cvt_roundsh_sd(A, B, R)				\
+  (__builtin_ia32_vcvtsh2sd_mask_round ((B), (A),		\
+					_mm_setzero_pd (),	\
+					(__mmask8) -1, (R)))
 
-/* Intrinsics vscalefsh.  */
-extern __inline __m128h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_scalef_sh (__m128h __A, __m128h __B)
+#define _mm_mask_cvt_roundsh_sd(A, B, C, D, R)				\
+  (__builtin_ia32_vcvtsh2sd_mask_round ((D), (C), (A), (B), (R)))
+
+#define _mm_maskz_cvt_roundsh_sd(A, B, C, R)			\
+  (__builtin_ia32_vcvtsh2sd_mask_round ((C), (B),		\
+					_mm_setzero_pd (),	\
+					(A), (R)))
+
+#endif /* __OPTIMIZE__ */
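
[Sketch for vcvtsh2ss: widens the low _Float16 to float; upper result
elements come from the first (float) operand.]

  float
  f16_to_f32 (_Float16 x)
  {
    __m128 r = _mm_cvtsh_ss (_mm_setzero_ps (), _mm_set_sh (x));
    return _mm_cvtss_f32 (r);
  }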
+
+/* Intrinsics vcvtss2sh, vcvtsd2sh.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtss_sh (__m128h __A, __m128 __B)
 {
-  return __builtin_ia32_scalefsh_mask_round (__A, __B,
-					     _mm_setzero_ph (),
-					     (__mmask8) -1,
-					     _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtss2sh_mask_round (__B, __A,
+					      _mm_setzero_ph (),
+					      (__mmask8) -1,
+					      _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_scalef_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm_mask_cvtss_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128 __D)
 {
-  return __builtin_ia32_scalefsh_mask_round (__C, __D, __A, __B,
-					     _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtss2sh_mask_round (__D, __C, __A, __B,
+					      _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_scalef_sh (__mmask8 __A, __m128h __B, __m128h __C)
+_mm_maskz_cvtss_sh (__mmask8 __A, __m128h __B, __m128 __C)
 {
-  return __builtin_ia32_scalefsh_mask_round (__B, __C,
-					     _mm_setzero_ph (),
-					     __A,
-					     _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtss2sh_mask_round (__C, __B,
+					      _mm_setzero_ph (),
+					      __A, _MM_FROUND_CUR_DIRECTION);
 }
 
-#ifdef __OPTIMIZE__
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_scalef_round_sh (__m128h __A, __m128h __B, const int __C)
+_mm_cvtsd_sh (__m128h __A, __m128d __B)
 {
-  return __builtin_ia32_scalefsh_mask_round (__A, __B,
-					     _mm_setzero_ph (),
-					     (__mmask8) -1, __C);
+  return __builtin_ia32_vcvtsd2sh_mask_round (__B, __A,
+					      _mm_setzero_ph (),
+					      (__mmask8) -1,
+					      _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_scalef_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
-			  __m128h __D, const int __E)
+_mm_mask_cvtsd_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128d __D)
 {
-  return __builtin_ia32_scalefsh_mask_round (__C, __D, __A, __B,
-					     __E);
+  return __builtin_ia32_vcvtsd2sh_mask_round (__D, __C, __A, __B,
+					      _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_scalef_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
-			   const int __D)
+_mm_maskz_cvtsd_sh (__mmask8 __A, __m128h __B, __m128d __C)
 {
-  return __builtin_ia32_scalefsh_mask_round (__B, __C,
-					     _mm_setzero_ph (),
-					     __A, __D);
+  return __builtin_ia32_vcvtsd2sh_mask_round (__C, __B,
+					      _mm_setzero_ph (),
+					      __A, _MM_FROUND_CUR_DIRECTION);
 }
 
-#else
-#define _mm_scalef_round_sh(A, B, C)				\
-  (__builtin_ia32_scalefsh_mask_round ((A), (B),		\
-				       _mm_setzero_ph (),	\
-				       (__mmask8)-1, (C)))
-
-#define _mm_mask_scalef_round_sh(A, B, C, D, E)				\
-  (__builtin_ia32_scalefsh_mask_round ((C), (D), (A), (B), (E)))
-
-#define _mm_maskz_scalef_round_sh(A, B, C, D)				\
-  (__builtin_ia32_scalefsh_mask_round ((B), (C), _mm_setzero_ph (),	\
-				       (A), (D)))
-
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vreduceph.  */
 #ifdef __OPTIMIZE__
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_reduce_ph (__m512h __A, int __B)
+_mm_cvt_roundss_sh (__m128h __A, __m128 __B, const int __R)
 {
-  return __builtin_ia32_reduceph512_mask_round (__A, __B,
-						_mm512_setzero_ph (),
-						(__mmask32) -1,
-						_MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtss2sh_mask_round (__B, __A,
+					      _mm_setzero_ph (),
+					      (__mmask8) -1, __R);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_reduce_ph (__m512h __A, __mmask32 __B, __m512h __C, int __D)
+_mm_mask_cvt_roundss_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128 __D,
+			 const int __R)
 {
-  return __builtin_ia32_reduceph512_mask_round (__C, __D, __A, __B,
-						_MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtss2sh_mask_round (__D, __C, __A, __B, __R);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_reduce_ph (__mmask32 __A, __m512h __B, int __C)
+_mm_maskz_cvt_roundss_sh (__mmask8 __A, __m128h __B, __m128 __C,
+			  const int __R)
 {
-  return __builtin_ia32_reduceph512_mask_round (__B, __C,
-						_mm512_setzero_ph (),
-						__A,
-						_MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtss2sh_mask_round (__C, __B,
+					      _mm_setzero_ph (),
+					      __A, __R);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_reduce_round_ph (__m512h __A, int __B, const int __C)
+_mm_cvt_roundsd_sh (__m128h __A, __m128d __B, const int __R)
 {
-  return __builtin_ia32_reduceph512_mask_round (__A, __B,
-						_mm512_setzero_ph (),
-						(__mmask32) -1, __C);
+  return __builtin_ia32_vcvtsd2sh_mask_round (__B, __A,
+					      _mm_setzero_ph (),
+					      (__mmask8) -1, __R);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_reduce_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
-			     int __D, const int __E)
+_mm_mask_cvt_roundsd_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128d __D,
+			 const int __R)
 {
-  return __builtin_ia32_reduceph512_mask_round (__C, __D, __A, __B,
-						__E);
+  return __builtin_ia32_vcvtsd2sh_mask_round (__D, __C, __A, __B, __R);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_reduce_round_ph (__mmask32 __A, __m512h __B, int __C,
-			      const int __D)
+_mm_maskz_cvt_roundsd_sh (__mmask8 __A, __m128h __B, __m128d __C,
+			  const int __R)
 {
-  return __builtin_ia32_reduceph512_mask_round (__B, __C,
-						_mm512_setzero_ph (),
-						__A, __D);
+  return __builtin_ia32_vcvtsd2sh_mask_round (__C, __B,
+					      _mm_setzero_ph (),
+					      __A, __R);
 }
 
 #else
-#define _mm512_reduce_ph(A, B)						\
-  (__builtin_ia32_reduceph512_mask_round ((A), (B),			\
-					  _mm512_setzero_ph (),		\
-					  (__mmask32)-1,		\
-					  _MM_FROUND_CUR_DIRECTION))
+#define _mm_cvt_roundss_sh(A, B, R)				\
+  (__builtin_ia32_vcvtss2sh_mask_round ((B), (A),		\
+					_mm_setzero_ph (),	\
+					(__mmask8) -1, (R)))
 
-#define _mm512_mask_reduce_ph(A, B, C, D)				\
-  (__builtin_ia32_reduceph512_mask_round ((C), (D), (A), (B),		\
-					  _MM_FROUND_CUR_DIRECTION))
+#define _mm_mask_cvt_roundss_sh(A, B, C, D, R)				\
+  (__builtin_ia32_vcvtss2sh_mask_round ((D), (C), (A), (B), (R)))
 
-#define _mm512_maskz_reduce_ph(A, B, C)					\
-  (__builtin_ia32_reduceph512_mask_round ((B), (C),			\
-					  _mm512_setzero_ph (),		\
-					  (A), _MM_FROUND_CUR_DIRECTION))
+#define _mm_maskz_cvt_roundss_sh(A, B, C, R)			\
+  (__builtin_ia32_vcvtss2sh_mask_round ((C), (B),		\
+					_mm_setzero_ph (),	\
+					(A), (R)))
 
-#define _mm512_reduce_round_ph(A, B, C)				\
-  (__builtin_ia32_reduceph512_mask_round ((A), (B),		\
-					  _mm512_setzero_ph (),	\
-					  (__mmask32)-1, (C)))
+#define _mm_cvt_roundsd_sh(A, B, R)				\
+  (__builtin_ia32_vcvtsd2sh_mask_round ((B), (A),		\
+					_mm_setzero_ph (),	\
+					(__mmask8) -1, (R)))
 
-#define _mm512_mask_reduce_round_ph(A, B, C, D, E)			\
-  (__builtin_ia32_reduceph512_mask_round ((C), (D), (A), (B), (E)))
+#define _mm_mask_cvt_roundsd_sh(A, B, C, D, R)				\
+  (__builtin_ia32_vcvtsd2sh_mask_round ((D), (C), (A), (B), (R)))
 
-#define _mm512_maskz_reduce_round_ph(A, B, C, D)		\
-  (__builtin_ia32_reduceph512_mask_round ((B), (C),		\
-					  _mm512_setzero_ph (),	\
-					  (A), (D)))
+#define _mm_maskz_cvt_roundsd_sh(A, B, C, R)			\
+  (__builtin_ia32_vcvtsd2sh_mask_round ((C), (B),		\
+					_mm_setzero_ph (),	\
+					(A), (R)))
 
 #endif /* __OPTIMIZE__ */
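
A quick usage sketch for the convert-low semantics above (not part of
the patch; cvt_low and cvt_low_d are hypothetical names, -mavx512fp16
is assumed):

#include <immintrin.h>

/* Convert the low float/double of __b to _Float16; the remaining
   _Float16 elements are copied from __a.  */
__m128h
cvt_low (__m128h __a, __m128 __b)
{
  return _mm_cvtss_sh (__a, __b);
}

__m128h
cvt_low_d (__m128h __a, __m128d __b)
{
  return _mm_cvtsd_sh (__a, __b);
}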
 
-/* Intrinsics vreducesh.  */
-#ifdef __OPTIMIZE__
-extern __inline __m128h
+extern __inline _Float16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_reduce_sh (__m128h __A, __m128h __B, int __C)
+_mm_cvtsh_h (__m128h __A)
 {
-  return __builtin_ia32_reducesh_mask_round (__A, __B, __C,
-					     _mm_setzero_ph (),
-					     (__mmask8) -1,
-					     _MM_FROUND_CUR_DIRECTION);
+  return __A[0];
 }
 
+/* Intrinsics vfmadd[132,213,231]sh.  */
 extern __inline __m128h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_reduce_sh (__m128h __A, __mmask8 __B, __m128h __C,
-		    __m128h __D, int __E)
+  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmadd_sh (__m128h __W, __m128h __A, __m128h __B)
 {
-  return __builtin_ia32_reducesh_mask_round (__C, __D, __E, __A, __B,
-					     _MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+						  (__v8hf) __A,
+						  (__v8hf) __B,
+						  (__mmask8) -1,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_reduce_sh (__mmask8 __A, __m128h __B, __m128h __C, int __D)
+_mm_mask_fmadd_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
 {
-  return __builtin_ia32_reducesh_mask_round (__B, __C, __D,
-					     _mm_setzero_ph (), __A,
-					     _MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+						  (__v8hf) __A,
+						  (__v8hf) __B,
+						  (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_reduce_round_sh (__m128h __A, __m128h __B, int __C, const int __D)
+_mm_mask3_fmadd_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U)
 {
-  return __builtin_ia32_reducesh_mask_round (__A, __B, __C,
-					     _mm_setzero_ph (),
-					     (__mmask8) -1, __D);
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask3 ((__v8hf) __W,
+						   (__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_reduce_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
-			  __m128h __D, int __E, const int __F)
+_mm_maskz_fmadd_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B)
 {
-  return __builtin_ia32_reducesh_mask_round (__C, __D, __E, __A,
-					     __B, __F);
+  return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
+						   (__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
+
+#ifdef __OPTIMIZE__
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_reduce_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
-			   int __D, const int __E)
+_mm_fmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R)
 {
-  return __builtin_ia32_reducesh_mask_round (__B, __C, __D,
-					     _mm_setzero_ph (),
-					     __A, __E);
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+						  (__v8hf) __A,
+						  (__v8hf) __B,
+						  (__mmask8) -1,
+						  __R);
 }
 
-#else
-#define _mm_reduce_sh(A, B, C)						\
-  (__builtin_ia32_reducesh_mask_round ((A), (B), (C),			\
-				       _mm_setzero_ph (),		\
-				       (__mmask8)-1,			\
-				       _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_mask_reduce_sh(A, B, C, D, E)				\
-  (__builtin_ia32_reducesh_mask_round ((C), (D), (E), (A), (B),		\
-				       _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_maskz_reduce_sh(A, B, C, D)					\
-  (__builtin_ia32_reducesh_mask_round ((B), (C), (D),			\
-				       _mm_setzero_ph (),		\
-				       (A), _MM_FROUND_CUR_DIRECTION))
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fmadd_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B,
+			 const int __R)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+						  (__v8hf) __A,
+						  (__v8hf) __B,
+						  (__mmask8) __U, __R);
+}
 
-#define _mm_reduce_round_sh(A, B, C, D)				\
-  (__builtin_ia32_reducesh_mask_round ((A), (B), (C),		\
-				       _mm_setzero_ph (),	\
-				       (__mmask8)-1, (D)))
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U,
+			  const int __R)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask3 ((__v8hf) __W,
+						   (__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) __U, __R);
+}
 
-#define _mm_mask_reduce_round_sh(A, B, C, D, E, F)			\
-  (__builtin_ia32_reducesh_mask_round ((C), (D), (E), (A), (B), (F)))
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fmadd_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
+			  __m128h __B, const int __R)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
+						   (__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) __U, __R);
+}
 
-#define _mm_maskz_reduce_round_sh(A, B, C, D, E)		\
-  (__builtin_ia32_reducesh_mask_round ((B), (C), (D),		\
-				       _mm_setzero_ph (),	\
-				       (A), (E)))
+#else
+#define _mm_fmadd_round_sh(A, B, C, R)					\
+  ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), (C), (-1), (R)))
+#define _mm_mask_fmadd_round_sh(A, U, B, C, R)				\
+  ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), (C), (U), (R)))
+#define _mm_mask3_fmadd_round_sh(A, B, C, U, R)				\
+  ((__m128h) __builtin_ia32_vfmaddsh3_mask3 ((A), (B), (C), (U), (R)))
+#define _mm_maskz_fmadd_round_sh(U, A, B, C, R)				\
+  ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), (B), (C), (U), (R)))
 
 #endif /* __OPTIMIZE__ */
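
The _round forms are only inline functions under __OPTIMIZE__ because
the rounding immediate must fold to a constant; a minimal sketch,
assuming -mavx512fp16 (fma_low is a hypothetical name):

#include <immintrin.h>

/* r[0] = w[0] * a[0] + b[0] in one vfmadd*sh; r[1..7] come from w.
   Round-to-nearest with exceptions suppressed (SAE).  */
__m128h
fma_low (__m128h w, __m128h a, __m128h b)
{
  return _mm_fmadd_round_sh (w, a, b,
                             _MM_FROUND_TO_NEAREST_INT
                             | _MM_FROUND_NO_EXC);
}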
 
-/* Intrinsics vrndscaleph.  */
-#ifdef __OPTIMIZE__
-extern __inline __m512h
+/* Intrinsics vfnmadd[132,213,231]sh.  */
+extern __inline __m128h
   __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_roundscale_ph (__m512h __A, int __B)
+_mm_fnmadd_sh (__m128h __W, __m128h __A, __m128h __B)
 {
-  return __builtin_ia32_rndscaleph512_mask_round (__A, __B,
-						  _mm512_setzero_ph (),
-						  (__mmask32) -1,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W,
+						   (__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) -1,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_roundscale_ph (__m512h __A, __mmask32 __B,
-			   __m512h __C, int __D)
+_mm_mask_fnmadd_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
 {
-  return __builtin_ia32_rndscaleph512_mask_round (__C, __D, __A, __B,
+  return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W,
+						  (__v8hf) __A,
+						  (__v8hf) __B,
+						  (__mmask8) __U,
 						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_roundscale_ph (__mmask32 __A, __m512h __B, int __C)
+_mm_mask3_fnmadd_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U)
 {
-  return __builtin_ia32_rndscaleph512_mask_round (__B, __C,
-						  _mm512_setzero_ph (),
-						  __A,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((__v8hf) __W,
+						   (__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_roundscale_round_ph (__m512h __A, int __B, const int __C)
+_mm_maskz_fnmadd_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B)
 {
-  return __builtin_ia32_rndscaleph512_mask_round (__A, __B,
-						  _mm512_setzero_ph (),
-						  (__mmask32) -1,
-						  __C);
+  return (__m128h) __builtin_ia32_vfnmaddsh3_maskz ((__v8hf) __W,
+						   (__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+
+#ifdef __OPTIMIZE__
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_roundscale_round_ph (__m512h __A, __mmask32 __B,
-				 __m512h __C, int __D, const int __E)
+_mm_fnmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R)
 {
-  return __builtin_ia32_rndscaleph512_mask_round (__C, __D, __A,
-						  __B, __E);
+  return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W,
+						   (__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) -1,
+						   __R);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_roundscale_round_ph (__mmask32 __A, __m512h __B, int __C,
-				  const int __D)
+_mm_mask_fnmadd_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B,
+			 const int __R)
 {
-  return __builtin_ia32_rndscaleph512_mask_round (__B, __C,
-						  _mm512_setzero_ph (),
-						  __A, __D);
+  return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W,
+						  (__v8hf) __A,
+						  (__v8hf) __B,
+						  (__mmask8) __U, __R);
 }
 
-#else
-#define _mm512_roundscale_ph(A, B)					\
-  (__builtin_ia32_rndscaleph512_mask_round ((A), (B),			\
-					    _mm512_setzero_ph (),	\
-					    (__mmask32)-1,		\
-					    _MM_FROUND_CUR_DIRECTION))
-
-#define _mm512_mask_roundscale_ph(A, B, C, D)				\
-  (__builtin_ia32_rndscaleph512_mask_round ((C), (D), (A), (B),		\
-					    _MM_FROUND_CUR_DIRECTION))
-
-#define _mm512_maskz_roundscale_ph(A, B, C)				\
-  (__builtin_ia32_rndscaleph512_mask_round ((B), (C),			\
-					    _mm512_setzero_ph (),	\
-					    (A),			\
-					    _MM_FROUND_CUR_DIRECTION))
-#define _mm512_roundscale_round_ph(A, B, C)				\
-  (__builtin_ia32_rndscaleph512_mask_round ((A), (B),			\
-					    _mm512_setzero_ph (),	\
-					    (__mmask32)-1, (C)))
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fnmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U,
+			  const int __R)
+{
+  return (__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((__v8hf) __W,
+						   (__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) __U, __R);
+}
 
-#define _mm512_mask_roundscale_round_ph(A, B, C, D, E)			\
-  (__builtin_ia32_rndscaleph512_mask_round ((C), (D), (A), (B), (E)))
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fnmadd_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
+			  __m128h __B, const int __R)
+{
+  return (__m128h) __builtin_ia32_vfnmaddsh3_maskz ((__v8hf) __W,
+						   (__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) __U, __R);
+}
 
-#define _mm512_maskz_roundscale_round_ph(A, B, C, D)			\
-  (__builtin_ia32_rndscaleph512_mask_round ((B), (C),			\
-					    _mm512_setzero_ph (),	\
-					    (A), (D)))
+#else
+#define _mm_fnmadd_round_sh(A, B, C, R)					\
+  ((__m128h) __builtin_ia32_vfnmaddsh3_mask ((A), (B), (C), (-1), (R)))
+#define _mm_mask_fnmadd_round_sh(A, U, B, C, R)				\
+  ((__m128h) __builtin_ia32_vfnmaddsh3_mask ((A), (B), (C), (U), (R)))
+#define _mm_mask3_fnmadd_round_sh(A, B, C, U, R)			\
+  ((__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((A), (B), (C), (U), (R)))
+#define _mm_maskz_fnmadd_round_sh(U, A, B, C, R)			\
+  ((__m128h) __builtin_ia32_vfnmaddsh3_maskz ((A), (B), (C), (U), (R)))
 
 #endif /* __OPTIMIZE__ */
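
Masking on these scalar forms only affects element 0; a sketch of the
maskz behavior, assuming -mavx512fp16 (fnma_masked is a hypothetical
name):

#include <immintrin.h>

/* r[0] = (u & 1) ? -(w[0] * a[0]) + b[0] : 0.0f16; r[1..7] = w[1..7].  */
__m128h
fnma_masked (__mmask8 u, __m128h w, __m128h a, __m128h b)
{
  return _mm_maskz_fnmadd_sh (u, w, a, b);
}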
 
-/* Intrinsics vrndscalesh.  */
-#ifdef __OPTIMIZE__
+/* Intrinsics vfmsub[132,213,231]sh.  */
 extern __inline __m128h
   __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_roundscale_sh (__m128h __A, __m128h __B, int __C)
+_mm_fmsub_sh (__m128h __W, __m128h __A, __m128h __B)
 {
-  return __builtin_ia32_rndscalesh_mask_round (__A, __B, __C,
-					       _mm_setzero_ph (),
-					       (__mmask8) -1,
-					       _MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+						  (__v8hf) __A,
+						  -(__v8hf) __B,
+						  (__mmask8) -1,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_roundscale_sh (__m128h __A, __mmask8 __B, __m128h __C,
-			__m128h __D, int __E)
+_mm_mask_fmsub_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
 {
-  return __builtin_ia32_rndscalesh_mask_round (__C, __D, __E, __A, __B,
-					       _MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+						  (__v8hf) __A,
+						  -(__v8hf) __B,
+						  (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_roundscale_sh (__mmask8 __A, __m128h __B, __m128h __C, int __D)
+_mm_mask3_fmsub_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U)
 {
-  return __builtin_ia32_rndscalesh_mask_round (__B, __C, __D,
-					       _mm_setzero_ph (), __A,
-					       _MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W,
+						   (__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_roundscale_round_sh (__m128h __A, __m128h __B, int __C, const int __D)
+_mm_maskz_fmsub_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B)
 {
-  return __builtin_ia32_rndscalesh_mask_round (__A, __B, __C,
-					       _mm_setzero_ph (),
-					       (__mmask8) -1,
-					       __D);
+  return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
+						   (__v8hf) __A,
+						   -(__v8hf) __B,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
+
+#ifdef __OPTIMIZE__
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_roundscale_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
-			      __m128h __D, int __E, const int __F)
+_mm_fmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R)
 {
-  return __builtin_ia32_rndscalesh_mask_round (__C, __D, __E,
-					       __A, __B, __F);
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+						  (__v8hf) __A,
+						  -(__v8hf) __B,
+						  (__mmask8) -1,
+						  __R);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_roundscale_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
-			       int __D, const int __E)
+_mm_mask_fmsub_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B,
+			 const int __R)
 {
-  return __builtin_ia32_rndscalesh_mask_round (__B, __C, __D,
-					       _mm_setzero_ph (),
-					       __A, __E);
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+						  (__v8hf) __A,
+						  -(__v8hf) __B,
+						  (__mmask8) __U, __R);
 }
 
-#else
-#define _mm_roundscale_sh(A, B, C)					\
-  (__builtin_ia32_rndscalesh_mask_round ((A), (B), (C),			\
-					 _mm_setzero_ph (),		\
-					 (__mmask8)-1,			\
-					 _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_mask_roundscale_sh(A, B, C, D, E)				\
-  (__builtin_ia32_rndscalesh_mask_round ((C), (D), (E), (A), (B),	\
-					 _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_maskz_roundscale_sh(A, B, C, D)				\
-  (__builtin_ia32_rndscalesh_mask_round ((B), (C), (D),			\
-					 _mm_setzero_ph (),		\
-					 (A), _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_roundscale_round_sh(A, B, C, D)			\
-  (__builtin_ia32_rndscalesh_mask_round ((A), (B), (C),		\
-					 _mm_setzero_ph (),	\
-					 (__mmask8)-1, (D)))
-
-#define _mm_mask_roundscale_round_sh(A, B, C, D, E, F)			\
-  (__builtin_ia32_rndscalesh_mask_round ((C), (D), (E), (A), (B), (F)))
-
-#define _mm_maskz_roundscale_round_sh(A, B, C, D, E)		\
-  (__builtin_ia32_rndscalesh_mask_round ((B), (C), (D),		\
-					 _mm_setzero_ph (),	\
-					 (A), (E)))
-
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vfpclasssh.  */
-#ifdef __OPTIMIZE__
-extern __inline __mmask8
-  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fpclass_sh_mask (__m128h __A, const int __imm)
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U,
+			  const int __R)
 {
-  return (__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) __A, __imm,
-						   (__mmask8) -1);
+  return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W,
+						   (__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) __U, __R);
 }
 
-extern __inline __mmask8
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fpclass_sh_mask (__mmask8 __U, __m128h __A, const int __imm)
+_mm_maskz_fmsub_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
+			  __m128h __B, const int __R)
 {
-  return (__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) __A, __imm, __U);
+  return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
+						   (__v8hf) __A,
+						   -(__v8hf) __B,
+						   (__mmask8) __U, __R);
 }
 
 #else
-#define _mm_fpclass_sh_mask(X, C)					\
-  ((__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) (__m128h) (X),	\
-					     (int) (C), (__mmask8) (-1))) \
+#define _mm_fmsub_round_sh(A, B, C, R)					\
+  ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), -(C), (-1), (R)))
+#define _mm_mask_fmsub_round_sh(A, U, B, C, R)				\
+  ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), -(C), (U), (R)))
+#define _mm_mask3_fmsub_round_sh(A, B, C, U, R)				\
+  ((__m128h) __builtin_ia32_vfmsubsh3_mask3 ((A), (B), (C), (U), (R)))
+#define _mm_maskz_fmsub_round_sh(U, A, B, C, R)				\
+  ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), (B), -(C), (U), (R)))
 
-#define _mm_mask_fpclass_sh_mask(U, X, C)				\
-  ((__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) (__m128h) (X),	\
-					     (int) (C), (__mmask8) (U)))
 #endif /* __OPTIMIZE__ */
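
Note the asymmetry above: the plain/mask/maskz fmsub forms fold into
__builtin_ia32_vfmaddsh3_* with a negated __B, while mask3 calls the
dedicated __builtin_ia32_vfmsubsh3_mask3, presumably because the third
operand doubles as the merge source and must reach the builtin
unnegated.  For the unmasked low element the result is simply
(fms_low is a hypothetical name; assumes -mavx512fp16):

#include <immintrin.h>

__m128h
fms_low (__m128h w, __m128h a, __m128h b)
{
  return _mm_fmsub_sh (w, a, b);  /* r[0] = w[0] * a[0] - b[0] */
}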
 
-/* Intrinsics vfpclassph.  */
-#ifdef __OPTIMIZE__
-extern __inline __mmask32
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fpclass_ph_mask (__mmask32 __U, __m512h __A,
-			     const int __imm)
+/* Intrinsics vfnmsub[132,213,231]sh.  */
+extern __inline __m128h
+  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fnmsub_sh (__m128h __W, __m128h __A, __m128h __B)
 {
-  return (__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) __A,
-						       __imm, __U);
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+						  -(__v8hf) __A,
+						  -(__v8hf) __B,
+						  (__mmask8) -1,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __mmask32
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fpclass_ph_mask (__m512h __A, const int __imm)
+_mm_mask_fnmsub_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
 {
-  return (__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) __A,
-						       __imm,
-						       (__mmask32) -1);
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+						  -(__v8hf) __A,
+						  -(__v8hf) __B,
+						  (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-#else
-#define _mm512_mask_fpclass_ph_mask(u, x, c)				\
-  ((__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) (__m512h) (x), \
-						 (int) (c),(__mmask8)(u)))
-
-#define _mm512_fpclass_ph_mask(x, c)                                    \
-  ((__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) (__m512h) (x), \
-						 (int) (c),(__mmask8)-1))
-#endif /* __OPIMTIZE__ */
-
-/* Intrinsics vgetexpph, vgetexpsh.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getexp_sh (__m128h __A, __m128h __B)
+_mm_mask3_fnmsub_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U)
 {
-  return (__m128h)
-    __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, (__v8hf) __B,
-					(__v8hf) _mm_setzero_ph (),
-					(__mmask8) -1,
-					_MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W,
+						   -(__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getexp_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
+_mm_maskz_fnmsub_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B)
 {
-  return (__m128h)
-    __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, (__v8hf) __B,
-					(__v8hf) __W, (__mmask8) __U,
-					_MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
+						   -(__v8hf) __A,
+						   -(__v8hf) __B,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
+
+#ifdef __OPTIMIZE__
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getexp_sh (__mmask8 __U, __m128h __A, __m128h __B)
+_mm_fnmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R)
 {
-  return (__m128h)
-    __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, (__v8hf) __B,
-					(__v8hf) _mm_setzero_ph (),
-					(__mmask8) __U,
-					_MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+						  -(__v8hf) __A,
+						  -(__v8hf) __B,
+						  (__mmask8) -1,
+						  __R);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_getexp_ph (__m512h __A)
+_mm_mask_fnmsub_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B,
+			 const int __R)
 {
-  return (__m512h)
-    __builtin_ia32_getexpph512_mask ((__v32hf) __A,
-				     (__v32hf) _mm512_setzero_ph (),
-				     (__mmask32) -1, _MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+						  -(__v8hf) __A,
+						  -(__v8hf) __B,
+						  (__mmask8) __U, __R);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_getexp_ph (__m512h __W, __mmask32 __U, __m512h __A)
+_mm_mask3_fnmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U,
+			  const int __R)
 {
-  return (__m512h)
-    __builtin_ia32_getexpph512_mask ((__v32hf) __A, (__v32hf) __W,
-				     (__mmask32) __U, _MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W,
+						   -(__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) __U, __R);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_getexp_ph (__mmask32 __U, __m512h __A)
+_mm_maskz_fnmsub_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
+			  __m128h __B, const int __R)
 {
-  return (__m512h)
-    __builtin_ia32_getexpph512_mask ((__v32hf) __A,
-				     (__v32hf) _mm512_setzero_ph (),
-				     (__mmask32) __U, _MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
+						   -(__v8hf) __A,
+						   -(__v8hf) __B,
+						   (__mmask8) __U, __R);
 }
 
-#ifdef __OPTIMIZE__
+#else
+#define _mm_fnmsub_round_sh(A, B, C, R)					\
+  ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), -(B), -(C), (-1), (R)))
+#define _mm_mask_fnmsub_round_sh(A, U, B, C, R)				\
+  ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), -(B), -(C), (U), (R)))
+#define _mm_mask3_fnmsub_round_sh(A, B, C, U, R)			\
+  ((__m128h) __builtin_ia32_vfmsubsh3_mask3 ((A), -(B), (C), (U), (R)))
+#define _mm_maskz_fnmsub_round_sh(U, A, B, C, R)			\
+  ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), -(B), -(C), (U), (R)))
+
+#endif /* __OPTIMIZE__ */
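
fnmsub negates one multiplicand and the addend, so the four flavors
differ only in signs: fmadd w*a+b, fmsub w*a-b, fnmadd -(w*a)+b,
fnmsub -(w*a)-b.  A sketch assuming -mavx512fp16 (fnms_low is a
hypothetical name):

#include <immintrin.h>

__m128h
fnms_low (__m128h w, __m128h a, __m128h b)
{
  return _mm_fnmsub_sh (w, a, b);  /* r[0] = -(w[0] * a[0]) - b[0] */
}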
+
+/* Intrinsics vf[,c]maddcsh.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getexp_round_sh (__m128h __A, __m128h __B, const int __R)
+_mm_mask_fcmadd_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
 {
-  return (__m128h) __builtin_ia32_getexpsh_mask_round ((__v8hf) __A,
-						       (__v8hf) __B,
-						       _mm_setzero_ph (),
-						       (__mmask8) -1,
-						       __R);
+  return (__m128h)
+    __builtin_ia32_vfcmaddcsh_mask_round ((__v8hf) __A,
+					  (__v8hf) __C,
+					  (__v8hf) __D, __B,
+					  _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getexp_round_sh (__m128h __W, __mmask8 __U, __m128h __A,
-			  __m128h __B, const int __R)
+_mm_mask3_fcmadd_sch (__m128h __A, __m128h __B, __m128h __C, __mmask8 __D)
 {
-  return (__m128h) __builtin_ia32_getexpsh_mask_round ((__v8hf) __A,
-						       (__v8hf) __B,
-						       (__v8hf) __W,
-						       (__mmask8) __U, __R);
+  return (__m128h)
+    __builtin_ia32_vfcmaddcsh_mask3_round ((__v8hf) __A,
+					   (__v8hf) __B,
+					   (__v8hf) __C, __D,
+					   _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getexp_round_sh (__mmask8 __U, __m128h __A, __m128h __B,
-			   const int __R)
+_mm_maskz_fcmadd_sch (__mmask8 __A, __m128h __B, __m128h __C, __m128h __D)
 {
-  return (__m128h) __builtin_ia32_getexpsh_mask_round ((__v8hf) __A,
-						       (__v8hf) __B,
-						       (__v8hf)
-						       _mm_setzero_ph (),
-						       (__mmask8) __U, __R);
+  return (__m128h)
+    __builtin_ia32_vfcmaddcsh_maskz_round ((__v8hf) __B,
+					   (__v8hf) __C,
+					   (__v8hf) __D,
+					   __A, _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_getexp_round_ph (__m512h __A, const int __R)
+_mm_fcmadd_sch (__m128h __A, __m128h __B, __m128h __C)
 {
-  return (__m512h) __builtin_ia32_getexpph512_mask ((__v32hf) __A,
-						    (__v32hf)
-						    _mm512_setzero_ph (),
-						    (__mmask32) -1, __R);
+  return (__m128h)
+    __builtin_ia32_vfcmaddcsh_round ((__v8hf) __A,
+				     (__v8hf) __B,
+				     (__v8hf) __C,
+				     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_getexp_round_ph (__m512h __W, __mmask32 __U, __m512h __A,
-			     const int __R)
-{
-  return (__m512h) __builtin_ia32_getexpph512_mask ((__v32hf) __A,
-						    (__v32hf) __W,
-						    (__mmask32) __U, __R);
-}
-
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_getexp_round_ph (__mmask32 __U, __m512h __A, const int __R)
+_mm_mask_fmadd_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
 {
-  return (__m512h) __builtin_ia32_getexpph512_mask ((__v32hf) __A,
-						    (__v32hf)
-						    _mm512_setzero_ph (),
-						    (__mmask32) __U, __R);
+  return (__m128h)
+    __builtin_ia32_vfmaddcsh_mask_round ((__v8hf) __A,
+					 (__v8hf) __C,
+					 (__v8hf) __D, __B,
+					 _MM_FROUND_CUR_DIRECTION);
 }
 
-#else
-#define _mm_getexp_round_sh(A, B, R)					\
-  ((__m128h)__builtin_ia32_getexpsh_mask_round((__v8hf)(__m128h)(A),	\
-					       (__v8hf)(__m128h)(B),	\
-					       (__v8hf)_mm_setzero_ph(), \
-					       (__mmask8)-1, R))
-
-#define _mm_mask_getexp_round_sh(W, U, A, B, C)			\
-  (__m128h)__builtin_ia32_getexpsh_mask_round(A, B, W, U, C)
-
-#define _mm_maskz_getexp_round_sh(U, A, B, C)				\
-  (__m128h)__builtin_ia32_getexpsh_mask_round(A, B,			\
-					      (__v8hf)_mm_setzero_ph(),	\
-					      U, C)
-
-#define _mm512_getexp_round_ph(A, R)					\
-  ((__m512h)__builtin_ia32_getexpph512_mask((__v32hf)(__m512h)(A),	\
-					    (__v32hf)_mm512_setzero_ph(), (__mmask32)-1, R))
-
-#define _mm512_mask_getexp_round_ph(W, U, A, R)				\
-  ((__m512h)__builtin_ia32_getexpph512_mask((__v32hf)(__m512h)(A),	\
-					    (__v32hf)(__m512h)(W), (__mmask32)(U), R))
-
-#define _mm512_maskz_getexp_round_ph(U, A, R)				\
-  ((__m512h)__builtin_ia32_getexpph512_mask((__v32hf)(__m512h)(A),	\
-					    (__v32hf)_mm512_setzero_ph(), (__mmask32)(U), R))
-
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vgetmantph, vgetmantsh.  */
-#ifdef __OPTIMIZE__
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getmant_sh (__m128h __A, __m128h __B,
-		_MM_MANTISSA_NORM_ENUM __C,
-		_MM_MANTISSA_SIGN_ENUM __D)
+_mm_mask3_fmadd_sch (__m128h __A, __m128h __B, __m128h __C, __mmask8 __D)
 {
   return (__m128h)
-    __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, (__v8hf) __B,
-					 (__D << 2) | __C, _mm_setzero_ph (),
-					 (__mmask8) -1,
-					 _MM_FROUND_CUR_DIRECTION);
+    __builtin_ia32_vfmaddcsh_mask3_round ((__v8hf) __A,
+					  (__v8hf) __B,
+					  (__v8hf) __C, __D,
+					  _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getmant_sh (__m128h __W, __mmask8 __U, __m128h __A,
-		     __m128h __B, _MM_MANTISSA_NORM_ENUM __C,
-		     _MM_MANTISSA_SIGN_ENUM __D)
+_mm_maskz_fmadd_sch (__mmask8 __A, __m128h __B, __m128h __C, __m128h __D)
 {
   return (__m128h)
-    __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, (__v8hf) __B,
-					 (__D << 2) | __C, (__v8hf) __W,
-					 __U, _MM_FROUND_CUR_DIRECTION);
+    __builtin_ia32_vfmaddcsh_maskz_round ((__v8hf) __B,
+					  (__v8hf) __C,
+					  (__v8hf) __D,
+					  __A, _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getmant_sh (__mmask8 __U, __m128h __A, __m128h __B,
-		      _MM_MANTISSA_NORM_ENUM __C,
-		      _MM_MANTISSA_SIGN_ENUM __D)
+_mm_fmadd_sch (__m128h __A, __m128h __B, __m128h __C)
 {
   return (__m128h)
-    __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, (__v8hf) __B,
-					 (__D << 2) | __C,
-					 (__v8hf) _mm_setzero_ph(),
-					 __U, _MM_FROUND_CUR_DIRECTION);
+    __builtin_ia32_vfmaddcsh_round ((__v8hf) __A,
+				    (__v8hf) __B,
+				    (__v8hf) __C,
+				    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+#ifdef __OPTIMIZE__
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_getmant_ph (__m512h __A, _MM_MANTISSA_NORM_ENUM __B,
-		   _MM_MANTISSA_SIGN_ENUM __C)
+_mm_mask_fcmadd_round_sch (__m128h __A, __mmask8 __B, __m128h __C,
+			   __m128h __D, const int __E)
 {
-  return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
-						     (__C << 2) | __B,
-						     _mm512_setzero_ph (),
-						     (__mmask32) -1,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m128h)
+    __builtin_ia32_vfcmaddcsh_mask_round ((__v8hf) __A,
+					  (__v8hf) __C,
+					  (__v8hf) __D,
+					  __B, __E);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_getmant_ph (__m512h __W, __mmask32 __U, __m512h __A,
-			_MM_MANTISSA_NORM_ENUM __B,
-			_MM_MANTISSA_SIGN_ENUM __C)
+_mm_mask3_fcmadd_round_sch (__m128h __A, __m128h __B, __m128h __C,
+			    __mmask8 __D, const int __E)
 {
-  return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
-						     (__C << 2) | __B,
-						     (__v32hf) __W, __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m128h)
+    __builtin_ia32_vfcmaddcsh_mask3_round ((__v8hf) __A,
+					   (__v8hf) __B,
+					   (__v8hf) __C,
+					   __D, __E);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_getmant_ph (__mmask32 __U, __m512h __A,
-			 _MM_MANTISSA_NORM_ENUM __B,
-			 _MM_MANTISSA_SIGN_ENUM __C)
+_mm_maskz_fcmadd_round_sch (__mmask8 __A, __m128h __B, __m128h __C,
+			    __m128h __D, const int __E)
 {
-  return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
-						     (__C << 2) | __B,
-						     (__v32hf)
-						     _mm512_setzero_ph (),
-						     __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m128h)
+    __builtin_ia32_vfcmaddcsh_maskz_round ((__v8hf) __B,
+					   (__v8hf) __C,
+					   (__v8hf) __D,
+					   __A, __E);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getmant_round_sh (__m128h __A, __m128h __B,
-		      _MM_MANTISSA_NORM_ENUM __C,
-		      _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+_mm_fcmadd_round_sch (__m128h __A, __m128h __B, __m128h __C, const int __D)
 {
-  return (__m128h) __builtin_ia32_getmantsh_mask_round ((__v8hf) __A,
-							(__v8hf) __B,
-							(__D << 2) | __C,
-							_mm_setzero_ph (),
-							(__mmask8) -1,
-							__R);
+  return (__m128h)
+    __builtin_ia32_vfcmaddcsh_round ((__v8hf) __A,
+				     (__v8hf) __B,
+				     (__v8hf) __C,
+				     __D);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getmant_round_sh (__m128h __W, __mmask8 __U, __m128h __A,
-			   __m128h __B, _MM_MANTISSA_NORM_ENUM __C,
-			   _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+_mm_mask_fmadd_round_sch (__m128h __A, __mmask8 __B, __m128h __C,
+			  __m128h __D, const int __E)
 {
-  return (__m128h) __builtin_ia32_getmantsh_mask_round ((__v8hf) __A,
-							(__v8hf) __B,
-							(__D << 2) | __C,
-							(__v8hf) __W,
-							__U, __R);
+  return (__m128h)
+    __builtin_ia32_vfmaddcsh_mask_round ((__v8hf) __A,
+					 (__v8hf) __C,
+					 (__v8hf) __D,
+					 __B, __E);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getmant_round_sh (__mmask8 __U, __m128h __A, __m128h __B,
-			    _MM_MANTISSA_NORM_ENUM __C,
-			    _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+_mm_mask3_fmadd_round_sch (__m128h __A, __m128h __B, __m128h __C,
+			   __mmask8 __D, const int __E)
 {
-  return (__m128h) __builtin_ia32_getmantsh_mask_round ((__v8hf) __A,
-							(__v8hf) __B,
-							(__D << 2) | __C,
-							(__v8hf)
-							_mm_setzero_ph(),
-							__U, __R);
+  return (__m128h)
+    __builtin_ia32_vfmaddcsh_mask3_round ((__v8hf) __A,
+					  (__v8hf) __B,
+					  (__v8hf) __C,
+					  __D, __E);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_getmant_round_ph (__m512h __A, _MM_MANTISSA_NORM_ENUM __B,
-			 _MM_MANTISSA_SIGN_ENUM __C, const int __R)
+_mm_maskz_fmadd_round_sch (__mmask8 __A, __m128h __B, __m128h __C,
+			   __m128h __D, const int __E)
 {
-  return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
-						     (__C << 2) | __B,
-						     _mm512_setzero_ph (),
-						     (__mmask32) -1, __R);
+  return (__m128h)
+    __builtin_ia32_vfmaddcsh_maskz_round ((__v8hf) __B,
+					  (__v8hf) __C,
+					  (__v8hf) __D,
+					  __A, __E);
 }
 
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_getmant_round_ph (__m512h __W, __mmask32 __U, __m512h __A,
-			      _MM_MANTISSA_NORM_ENUM __B,
-			      _MM_MANTISSA_SIGN_ENUM __C, const int __R)
+_mm_fmadd_round_sch (__m128h __A, __m128h __B, __m128h __C, const int __D)
 {
-  return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
-						     (__C << 2) | __B,
-						     (__v32hf) __W, __U,
-						     __R);
+  return (__m128h)
+    __builtin_ia32_vfmaddcsh_round ((__v8hf) __A,
+				    (__v8hf) __B,
+				    (__v8hf) __C,
+				    __D);
 }
+#else
+#define _mm_mask_fcmadd_round_sch(A, B, C, D, E)			\
+    ((__m128h)								\
+     __builtin_ia32_vfcmaddcsh_mask_round ((__v8hf) (A),		\
+					   (__v8hf) (C),		\
+					   (__v8hf) (D),		\
+					   (B), (E)))
 
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_getmant_round_ph (__mmask32 __U, __m512h __A,
-			       _MM_MANTISSA_NORM_ENUM __B,
-			       _MM_MANTISSA_SIGN_ENUM __C, const int __R)
-{
-  return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
-						     (__C << 2) | __B,
-						     (__v32hf)
-						     _mm512_setzero_ph (),
-						     __U, __R);
-}
 
-#else
-#define _mm512_getmant_ph(X, B, C)					\
-  ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X),	\
-					      (int)(((C)<<2) | (B)),	\
-					      (__v32hf)(__m512h)	\
-					      _mm512_setzero_ph(),	\
-					      (__mmask32)-1,		\
-					      _MM_FROUND_CUR_DIRECTION))
+#define _mm_mask3_fcmadd_round_sch(A, B, C, D, E)			\
+  ((__m128h)								\
+   __builtin_ia32_vfcmaddcsh_mask3_round ((__v8hf) (A),		\
+					  (__v8hf) (B),		\
+					  (__v8hf) (C),		\
+					  (D), (E)))
 
-#define _mm512_mask_getmant_ph(W, U, X, B, C)				\
-  ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X),	\
-					      (int)(((C)<<2) | (B)),	\
-					      (__v32hf)(__m512h)(W),	\
-					      (__mmask32)(U),		\
-					      _MM_FROUND_CUR_DIRECTION))
+#define _mm_maskz_fcmadd_round_sch(A, B, C, D, E)		\
+  __builtin_ia32_vfcmaddcsh_maskz_round ((B), (C), (D), (A), (E))
 
+#define _mm_fcmadd_round_sch(A, B, C, D)		\
+  __builtin_ia32_vfcmaddcsh_round ((A), (B), (C), (D))
 
-#define _mm512_maskz_getmant_ph(U, X, B, C)				\
-  ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X),	\
-					      (int)(((C)<<2) | (B)),	\
-					      (__v32hf)(__m512h)	\
-					      _mm512_setzero_ph(),	\
-					      (__mmask32)(U),		\
-					      _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_getmant_sh(X, Y, C, D)					\
-  ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X),	\
-						 (__v8hf)(__m128h)(Y),	\
-						 (int)(((D)<<2) | (C)),	\
-						 (__v8hf)(__m128h)	\
-						 _mm_setzero_ph (),	\
-						 (__mmask8)-1,		\
-						 _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_mask_getmant_sh(W, U, X, Y, C, D)				\
-  ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X),	\
-						 (__v8hf)(__m128h)(Y),	\
-						 (int)(((D)<<2) | (C)),	\
-						 (__v8hf)(__m128h)(W),	\
-						 (__mmask8)(U),		\
-						 _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_maskz_getmant_sh(U, X, Y, C, D)				\
-  ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X),	\
-						 (__v8hf)(__m128h)(Y),	\
-						 (int)(((D)<<2) | (C)),	\
-						 (__v8hf)(__m128h)	\
-						 _mm_setzero_ph(),	\
-						 (__mmask8)(U),		\
-						 _MM_FROUND_CUR_DIRECTION))
-
-#define _mm512_getmant_round_ph(X, B, C, R)				\
-  ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X),	\
-					      (int)(((C)<<2) | (B)),	\
-					      (__v32hf)(__m512h)	\
-					      _mm512_setzero_ph(),	\
-					      (__mmask32)-1,		\
-					      (R)))
-
-#define _mm512_mask_getmant_round_ph(W, U, X, B, C, R)			\
-  ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X),	\
-					      (int)(((C)<<2) | (B)),	\
-					      (__v32hf)(__m512h)(W),	\
-					      (__mmask32)(U),		\
-					      (R)))
-
-
-#define _mm512_maskz_getmant_round_ph(U, X, B, C, R)			\
-  ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X),	\
-					      (int)(((C)<<2) | (B)),	\
-					      (__v32hf)(__m512h)	\
-					      _mm512_setzero_ph(),	\
-					      (__mmask32)(U),		\
-					      (R)))
+#define _mm_mask_fmadd_round_sch(A, B, C, D, E)				\
+    ((__m128h)								\
+     __builtin_ia32_vfmaddcsh_mask_round ((__v8hf) (A),		\
+					  (__v8hf) (C),		\
+					  (__v8hf) (D),		\
+					  (B), (E)))
 
-#define _mm_getmant_round_sh(X, Y, C, D, R)				\
-  ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X),	\
-						 (__v8hf)(__m128h)(Y),	\
-						 (int)(((D)<<2) | (C)),	\
-						 (__v8hf)(__m128h)	\
-						 _mm_setzero_ph (),	\
-						 (__mmask8)-1,		\
-						 (R)))
+#define _mm_mask3_fmadd_round_sch(A, B, C, D, E)			\
+  ((__m128h)								\
+   __builtin_ia32_vfmaddcsh_mask3_round ((__v8hf) (A),		\
+					 (__v8hf) (B),		\
+					 (__v8hf) (C),		\
+					 (D), (E)))
 
-#define _mm_mask_getmant_round_sh(W, U, X, Y, C, D, R)			\
-  ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X),	\
-						 (__v8hf)(__m128h)(Y),	\
-						 (int)(((D)<<2) | (C)),	\
-						 (__v8hf)(__m128h)(W),	\
-						 (__mmask8)(U),		\
-						 (R)))
+#define _mm_maskz_fmadd_round_sch(A, B, C, D, E)		\
+  __builtin_ia32_vfmaddcsh_maskz_round ((B), (C), (D), (A), (E))
 
-#define _mm_maskz_getmant_round_sh(U, X, Y, C, D, R)			\
-  ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X),	\
-						 (__v8hf)(__m128h)(Y),	\
-						 (int)(((D)<<2) | (C)),	\
-						 (__v8hf)(__m128h)	\
-						 _mm_setzero_ph(),	\
-						 (__mmask8)(U),		\
-						 (R)))
+#define _mm_fmadd_round_sch(A, B, C, D)		\
+  __builtin_ia32_vfmaddcsh_round ((A), (B), (C), (D))
 
 #endif /* __OPTIMIZE__ */
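
For the complex scalar forms, elements 0 and 1 of each operand hold one
_Float16 complex value (real, imaginary), and, as I read the vfcmaddcsh
semantics, the "c" variants multiply by the conjugate of the second
source.  A sketch assuming -mavx512fp16 (cfma_low is a hypothetical
name):

#include <immintrin.h>

/* Low complex pair: r = a * conj(b) + c.  */
__m128h
cfma_low (__m128h a, __m128h b, __m128h c)
{
  return _mm_fcmadd_sch (a, b, c);
}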
 
-/* Intrinsics vmovw.  */
-extern __inline __m128i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsi16_si128 (short __A)
-{
-  return _mm_set_epi16 (0, 0, 0, 0, 0, 0, 0, __A);
-}
-
-extern __inline short
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsi128_si16 (__m128i __A)
-{
-  return __builtin_ia32_vec_ext_v8hi ((__v8hi)__A, 0);
-}
-
-/* Intrinsics vmovsh.  */
+/* Intrinsics vf[,c]mulcsh.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_load_sh (__m128h __A, __mmask8 __B, _Float16 const* __C)
+_mm_fcmul_sch (__m128h __A, __m128h __B)
 {
-  return __builtin_ia32_loadsh_mask (__C, __A, __B);
+  return (__m128h)
+    __builtin_ia32_vfcmulcsh_round ((__v8hf) __A,
+				    (__v8hf) __B,
+				    _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_load_sh (__mmask8 __A, _Float16 const* __B)
+_mm_mask_fcmul_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
 {
-  return __builtin_ia32_loadsh_mask (__B, _mm_setzero_ph (), __A);
+  return (__m128h)
+    __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __C,
+					 (__v8hf) __D,
+					 (__v8hf) __A,
+					 __B, _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline void
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_store_sh (_Float16 const* __A, __mmask8 __B, __m128h __C)
+_mm_maskz_fcmul_sch (__mmask8 __A, __m128h __B, __m128h __C)
 {
-  __builtin_ia32_storesh_mask (__A,  __C, __B);
+  return (__m128h)
+    __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __B,
+					 (__v8hf) __C,
+					 _mm_setzero_ph (),
+					 __A, _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_move_sh (__m128h __A, __m128h  __B)
+_mm_fmul_sch (__m128h __A, __m128h __B)
 {
-  __A[0] = __B[0];
-  return __A;
+  return (__m128h)
+    __builtin_ia32_vfmulcsh_round ((__v8hf) __A,
+				   (__v8hf) __B,
+				   _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_move_sh (__m128h __A, __mmask8 __B, __m128h  __C, __m128h __D)
+_mm_mask_fmul_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
 {
-  return __builtin_ia32_vmovsh_mask (__C, __D, __A, __B);
+  return (__m128h)
+    __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __C,
+					(__v8hf) __D,
+					(__v8hf) __A,
+					__B, _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_move_sh (__mmask8 __A, __m128h  __B, __m128h __C)
+_mm_maskz_fmul_sch (__mmask8 __A, __m128h __B, __m128h __C)
 {
-  return __builtin_ia32_vmovsh_mask (__B, __C, _mm_setzero_ph (), __A);
+  return (__m128h)
+    __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __B,
+					(__v8hf) __C,
+					_mm_setzero_ph (),
+					__A, _MM_FROUND_CUR_DIRECTION);
 }
 
-/* Intrinsics vcvtph2dq.  */
-extern __inline __m512i
+#ifdef __OPTIMIZE__
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtph_epi32 (__m256h __A)
+_mm_fcmul_round_sch (__m128h __A, __m128h __B, const int __D)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2dq512_mask_round (__A,
-					    (__v16si)
-					    _mm512_setzero_si512 (),
-					    (__mmask16) -1,
-					    _MM_FROUND_CUR_DIRECTION);
+  return (__m128h)
+    __builtin_ia32_vfcmulcsh_round ((__v8hf) __A,
+				    (__v8hf) __B,
+				    __D);
 }
 
-extern __inline __m512i
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtph_epi32 (__m512i __A, __mmask16 __B, __m256h __C)
+_mm_mask_fcmul_round_sch (__m128h __A, __mmask8 __B, __m128h __C,
+			  __m128h __D, const int __E)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2dq512_mask_round (__C,
-					    (__v16si) __A,
-					    __B,
-					    _MM_FROUND_CUR_DIRECTION);
+  return (__m128h)
+    __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __C,
+					 (__v8hf) __D,
+					 (__v8hf) __A,
+					 __B, __E);
 }
 
-extern __inline __m512i
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtph_epi32 (__mmask16 __A, __m256h __B)
+_mm_maskz_fcmul_round_sch (__mmask8 __A, __m128h __B, __m128h __C,
+			   const int __E)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2dq512_mask_round (__B,
-					    (__v16si)
-					    _mm512_setzero_si512 (),
-					    __A,
-					    _MM_FROUND_CUR_DIRECTION);
+  return (__m128h)
+    __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __B,
+					 (__v8hf) __C,
+					 _mm_setzero_ph (),
+					 __A, __E);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundph_epi32 (__m256h __A, int __B)
+_mm_fmul_round_sch (__m128h __A, __m128h __B, const int __D)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2dq512_mask_round (__A,
-					    (__v16si)
-					    _mm512_setzero_si512 (),
-					    (__mmask16) -1,
-					    __B);
+  return (__m128h)
+    __builtin_ia32_vfmulcsh_round ((__v8hf) __A,
+				   (__v8hf) __B, __D);
 }
 
-extern __inline __m512i
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundph_epi32 (__m512i __A, __mmask16 __B, __m256h __C, int __D)
+_mm_mask_fmul_round_sch (__m128h __A, __mmask8 __B, __m128h __C,
+			 __m128h __D, const int __E)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2dq512_mask_round (__C,
-					    (__v16si) __A,
-					    __B,
-					    __D);
+  return (__m128h)
+    __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __C,
+					(__v8hf) __D,
+					(__v8hf) __A,
+					__B, __E);
 }
 
-extern __inline __m512i
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundph_epi32 (__mmask16 __A, __m256h __B, int __C)
+_mm_maskz_fmul_round_sch (__mmask8 __A, __m128h __B, __m128h __C, const int __E)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2dq512_mask_round (__B,
-					    (__v16si)
-					    _mm512_setzero_si512 (),
-					    __A,
-					    __C);
+  return (__m128h)
+    __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __B,
+					(__v8hf) __C,
+					_mm_setzero_ph (),
+					__A, __E);
 }
 
 #else
-#define _mm512_cvt_roundph_epi32(A, B)					\
-  ((__m512i)								\
-   __builtin_ia32_vcvtph2dq512_mask_round ((A),				\
-					   (__v16si)			\
-					   _mm512_setzero_si512 (),	\
-					   (__mmask16)-1,		\
-					   (B)))
+#define _mm_fcmul_round_sch(__A, __B, __D)				\
+  (__m128h) __builtin_ia32_vfcmulcsh_round ((__v8hf) __A,		\
+					    (__v8hf) __B, __D)
 
-#define _mm512_mask_cvt_roundph_epi32(A, B, C, D)			\
-  ((__m512i)								\
-   __builtin_ia32_vcvtph2dq512_mask_round ((C), (__v16si)(A), (B), (D)))
+#define _mm_mask_fcmul_round_sch(__A, __B, __C, __D, __E)		\
+  (__m128h) __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __C,		\
+						 (__v8hf) __D,		\
+						 (__v8hf) __A,		\
+						 __B, __E)
 
-#define _mm512_maskz_cvt_roundph_epi32(A, B, C)				\
-  ((__m512i)								\
-   __builtin_ia32_vcvtph2dq512_mask_round ((B),				\
-					   (__v16si)			\
-					   _mm512_setzero_si512 (),	\
-					   (A),				\
-					   (C)))
+#define _mm_maskz_fcmul_round_sch(__A, __B, __C, __E)			\
+  (__m128h) __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __B,		\
+						 (__v8hf) __C,		\
+						 _mm_setzero_ph (),	\
+						 __A, __E)
+
+#define _mm_fmul_round_sch(__A, __B, __D)				\
+  (__m128h) __builtin_ia32_vfmulcsh_round ((__v8hf) __A,		\
+					   (__v8hf) __B, __D)
+
+#define _mm_mask_fmul_round_sch(__A, __B, __C, __D, __E)		\
+  (__m128h) __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __C,		\
+						(__v8hf) __D,		\
+						(__v8hf) __A,		\
+						__B, __E)
+
+#define _mm_maskz_fmul_round_sch(__A, __B, __C, __E)			\
+  (__m128h) __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __B,		\
+						(__v8hf) __C,		\
+						_mm_setzero_ph (),	\
+						__A, __E)
 
 #endif /* __OPTIMIZE__ */
 
-/* Intrinsics vcvtph2udq.  */
-extern __inline __m512i
+#define _mm_mul_sch(A, B) _mm_fmul_sch ((A), (B))
+#define _mm_mask_mul_sch(W, U, A, B) _mm_mask_fmul_sch ((W), (U), (A), (B))
+#define _mm_maskz_mul_sch(U, A, B) _mm_maskz_fmul_sch ((U), (A), (B))
+#define _mm_mul_round_sch(A, B, R) _mm_fmul_round_sch ((A), (B), (R))
+#define _mm_mask_mul_round_sch(W, U, A, B, R)			      \
+  _mm_mask_fmul_round_sch ((W), (U), (A), (B), (R))
+#define _mm_maskz_mul_round_sch(U, A, B, R)			      \
+  _mm_maskz_fmul_round_sch ((U), (A), (B), (R))
+
+#define _mm_cmul_sch(A, B) _mm_fcmul_sch ((A), (B))
+#define _mm_mask_cmul_sch(W, U, A, B) _mm_mask_fcmul_sch ((W), (U), (A), (B))
+#define _mm_maskz_cmul_sch(U, A, B) _mm_maskz_fcmul_sch ((U), (A), (B))
+#define _mm_cmul_round_sch(A, B, R) _mm_fcmul_round_sch ((A), (B), (R))
+#define _mm_mask_cmul_round_sch(W, U, A, B, R)			      \
+  _mm_mask_fcmul_round_sch ((W), (U), (A), (B), (R))
+#define _mm_maskz_cmul_round_sch(U, A, B, R)			      \
+  _mm_maskz_fcmul_round_sch ((U), (A), (B), (R))
+
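A sketch of the aliases above, assuming -mavx512fp16 (cmul_rn is a
hypothetical name):

#include <immintrin.h>

/* r = a * conj(b) on the low fp16 complex pair, round-to-nearest
   with SAE, via the _mm_cmul_round_sch alias.  */
__m128h
cmul_rn (__m128h a, __m128h b)
{
  return _mm_cmul_round_sch (a, b,
                             _MM_FROUND_TO_NEAREST_INT
                             | _MM_FROUND_NO_EXC);
}
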
+#ifdef __DISABLE_AVX512FP16__
+#undef __DISABLE_AVX512FP16__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX512FP16__ */
+
+#if !defined (__AVX512FP16__) || !defined (__EVEX512__)
+#pragma GCC push_options
+#pragma GCC target("avx512fp16,evex512")
+#define __DISABLE_AVX512FP16_512__
+#endif /* __AVX512FP16_512__ */
+
+typedef _Float16 __v32hf __attribute__ ((__vector_size__ (64)));
+typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__));
+typedef _Float16 __m512h_u __attribute__ ((__vector_size__ (64),
+					   __may_alias__, __aligned__ (1)));
+
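With the 512-bit half of the header now guarded by
target("avx512fp16,evex512"), a TU built with -mno-evex512 can still
opt in per function; a sketch using that target attribute (zero512 is
a hypothetical name):

#include <immintrin.h>

__attribute__ ((target ("avx512fp16,evex512")))
__m512h
zero512 (void)
{
  return _mm512_setzero_ph ();
}
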
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtph_epu32 (__m256h __A)
+_mm512_set_ph (_Float16 __A31, _Float16 __A30, _Float16 __A29,
+	       _Float16 __A28, _Float16 __A27, _Float16 __A26,
+	       _Float16 __A25, _Float16 __A24, _Float16 __A23,
+	       _Float16 __A22, _Float16 __A21, _Float16 __A20,
+	       _Float16 __A19, _Float16 __A18, _Float16 __A17,
+	       _Float16 __A16, _Float16 __A15, _Float16 __A14,
+	       _Float16 __A13, _Float16 __A12, _Float16 __A11,
+	       _Float16 __A10, _Float16 __A9, _Float16 __A8,
+	       _Float16 __A7, _Float16 __A6, _Float16 __A5,
+	       _Float16 __A4, _Float16 __A3, _Float16 __A2,
+	       _Float16 __A1, _Float16 __A0)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2udq512_mask_round (__A,
-					     (__v16si)
-					     _mm512_setzero_si512 (),
-					     (__mmask16) -1,
-					     _MM_FROUND_CUR_DIRECTION);
+  return __extension__ (__m512h)(__v32hf){ __A0, __A1, __A2, __A3,
+					   __A4, __A5, __A6, __A7,
+					   __A8, __A9, __A10, __A11,
+					   __A12, __A13, __A14, __A15,
+					   __A16, __A17, __A18, __A19,
+					   __A20, __A21, __A22, __A23,
+					   __A24, __A25, __A26, __A27,
+					   __A28, __A29, __A30, __A31 };
 }
 
-extern __inline __m512i
+/* Create vectors of elements in the reversed order from
+   _mm512_set_ph functions.  */
+
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtph_epu32 (__m512i __A, __mmask16 __B, __m256h __C)
+_mm512_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2,
+		_Float16 __A3, _Float16 __A4, _Float16 __A5,
+		_Float16 __A6, _Float16 __A7, _Float16 __A8,
+		_Float16 __A9, _Float16 __A10, _Float16 __A11,
+		_Float16 __A12, _Float16 __A13, _Float16 __A14,
+		_Float16 __A15, _Float16 __A16, _Float16 __A17,
+		_Float16 __A18, _Float16 __A19, _Float16 __A20,
+		_Float16 __A21, _Float16 __A22, _Float16 __A23,
+		_Float16 __A24, _Float16 __A25, _Float16 __A26,
+		_Float16 __A27, _Float16 __A28, _Float16 __A29,
+		_Float16 __A30, _Float16 __A31)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2udq512_mask_round (__C,
-					     (__v16si) __A,
-					     __B,
-					     _MM_FROUND_CUR_DIRECTION);
+  return _mm512_set_ph (__A31, __A30, __A29, __A28, __A27, __A26, __A25,
+			__A24, __A23, __A22, __A21, __A20, __A19, __A18,
+			__A17, __A16, __A15, __A14, __A13, __A12, __A11,
+			__A10, __A9, __A8, __A7, __A6, __A5, __A4, __A3,
+			__A2, __A1, __A0);
 }
 
-extern __inline __m512i
+/* Broadcast _Float16 to vector.  */
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtph_epu32 (__mmask16 __A, __m256h __B)
+_mm512_set1_ph (_Float16 __A)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2udq512_mask_round (__B,
-					     (__v16si)
-					     _mm512_setzero_si512 (),
-					     __A,
-					     _MM_FROUND_CUR_DIRECTION);
+  return _mm512_set_ph (__A, __A, __A, __A, __A, __A, __A, __A,
+			__A, __A, __A, __A, __A, __A, __A, __A,
+			__A, __A, __A, __A, __A, __A, __A, __A,
+			__A, __A, __A, __A, __A, __A, __A, __A);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+/* Create a vector with all zeros.  */
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundph_epu32 (__m256h __A, int __B)
+_mm512_setzero_ph (void)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2udq512_mask_round (__A,
-					     (__v16si)
-					     _mm512_setzero_si512 (),
-					     (__mmask16) -1,
-					     __B);
+  return _mm512_set1_ph (0.0f16);
 }
 
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundph_epu32 (__m512i __A, __mmask16 __B, __m256h __C, int __D)
+_mm512_undefined_ph (void)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2udq512_mask_round (__C,
-					     (__v16si) __A,
-					     __B,
-					     __D);
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Winit-self"
+  __m512h __Y = __Y;
+#pragma GCC diagnostic pop
+  return __Y;
 }
 
-extern __inline __m512i
+extern __inline _Float16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundph_epu32 (__mmask16 __A, __m256h __B, int __C)
+_mm512_cvtsh_h (__m512h __A)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2udq512_mask_round (__B,
-					     (__v16si)
-					     _mm512_setzero_si512 (),
-					     __A,
-					     __C);
+  return __A[0];
 }
 
-#else
-#define _mm512_cvt_roundph_epu32(A, B)					\
-  ((__m512i)								\
-   __builtin_ia32_vcvtph2udq512_mask_round ((A),			\
-					    (__v16si)			\
-					    _mm512_setzero_si512 (),	\
-					    (__mmask16)-1,		\
-					    (B)))
-
-#define _mm512_mask_cvt_roundph_epu32(A, B, C, D)			\
-  ((__m512i)								\
-   __builtin_ia32_vcvtph2udq512_mask_round ((C), (__v16si)(A), (B), (D)))
-
-#define _mm512_maskz_cvt_roundph_epu32(A, B, C)				\
-  ((__m512i)								\
-   __builtin_ia32_vcvtph2udq512_mask_round ((B),			\
-					    (__v16si)			\
-					    _mm512_setzero_si512 (),	\
-					    (A),			\
-					    (C)))
-
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vcvttph2dq.  */
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttph_epi32 (__m256h __A)
+_mm512_castph_ps (__m512h __a)
 {
-  return (__m512i)
-    __builtin_ia32_vcvttph2dq512_mask_round (__A,
-					     (__v16si)
-					     _mm512_setzero_si512 (),
-					     (__mmask16) -1,
-					     _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __a;
 }
 
-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttph_epi32 (__m512i __A, __mmask16 __B, __m256h __C)
+_mm512_castph_pd (__m512h __a)
 {
-  return (__m512i)
-    __builtin_ia32_vcvttph2dq512_mask_round (__C,
-					     (__v16si) __A,
-					     __B,
-					     _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __a;
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttph_epi32 (__mmask16 __A, __m256h __B)
+_mm512_castph_si512 (__m512h __a)
 {
-  return (__m512i)
-    __builtin_ia32_vcvttph2dq512_mask_round (__B,
-					     (__v16si)
-					     _mm512_setzero_si512 (),
-					     __A,
-					     _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __a;
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtt_roundph_epi32 (__m256h __A, int __B)
+_mm512_castph512_ph128 (__m512h __A)
 {
-  return (__m512i)
-    __builtin_ia32_vcvttph2dq512_mask_round (__A,
-					     (__v16si)
-					     _mm512_setzero_si512 (),
-					     (__mmask16) -1,
-					     __B);
+  union
+  {
+    __m128h __a[4];
+    __m512h __v;
+  } __u = { .__v = __A };
+  return __u.__a[0];
 }
 
-extern __inline __m512i
+extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtt_roundph_epi32 (__m512i __A, __mmask16 __B,
-				__m256h __C, int __D)
+_mm512_castph512_ph256 (__m512h __A)
 {
-  return (__m512i)
-    __builtin_ia32_vcvttph2dq512_mask_round (__C,
-					     (__v16si) __A,
-					     __B,
-					     __D);
+  union
+  {
+    __m256h __a[2];
+    __m512h __v;
+  } __u = { .__v = __A };
+  return __u.__a[0];
 }
 
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtt_roundph_epi32 (__mmask16 __A, __m256h __B, int __C)
+_mm512_castph128_ph512 (__m128h __A)
 {
-  return (__m512i)
-    __builtin_ia32_vcvttph2dq512_mask_round (__B,
-					     (__v16si)
-					     _mm512_setzero_si512 (),
-					     __A,
-					     __C);
+  union
+  {
+    __m128h __a[4];
+    __m512h __v;
+  } __u;
+  __u.__a[0] = __A;
+  return __u.__v;
 }
 
-#else
-#define _mm512_cvtt_roundph_epi32(A, B)					\
-  ((__m512i)								\
-   __builtin_ia32_vcvttph2dq512_mask_round ((A),			\
-					    (__v16si)			\
-					    (_mm512_setzero_si512 ()),	\
-					    (__mmask16)(-1), (B)))
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_castph256_ph512 (__m256h __A)
+{
+  union
+  {
+    __m256h __a[2];
+    __m512h __v;
+  } __u;
+  __u.__a[0] = __A;
+  return __u.__v;
+}
 
-#define _mm512_mask_cvtt_roundph_epi32(A, B, C, D)		\
-  ((__m512i)							\
-   __builtin_ia32_vcvttph2dq512_mask_round ((C),		\
-					    (__v16si)(A),	\
-					    (B),		\
-					    (D)))
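+/* Zero-extend a narrower vector to 512 bits by inserting it into a
+   zeroed vector.  */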
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_zextph128_ph512 (__m128h __A)
+{
+  return (__m512h) _mm512_insertf32x4 (_mm512_setzero_ps (),
+				       (__m128) __A, 0);
+}
 
-#define _mm512_maskz_cvtt_roundph_epi32(A, B, C)			\
-  ((__m512i)								\
-   __builtin_ia32_vcvttph2dq512_mask_round ((B),			\
-					    (__v16si)			\
-					    _mm512_setzero_si512 (),	\
-					    (A),			\
-					    (C)))
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_zextph256_ph512 (__m256h __A)
+{
+  return (__m512h) _mm512_insertf64x4 (_mm512_setzero_pd (),
+				       (__m256d) __A, 0);
+}
 
-#endif /* __OPTIMIZE__ */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_castps_ph (__m512 __a)
+{
+  return (__m512h) __a;
+}
 
-/* Intrinsics vcvttph2udq.  */
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttph_epu32 (__m256h __A)
+_mm512_castpd_ph (__m512d __a)
 {
-  return (__m512i)
-    __builtin_ia32_vcvttph2udq512_mask_round (__A,
-					      (__v16si)
-					      _mm512_setzero_si512 (),
-					      (__mmask16) -1,
-					      _MM_FROUND_CUR_DIRECTION);
+  return (__m512h) __a;
 }
 
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttph_epu32 (__m512i __A, __mmask16 __B, __m256h __C)
+_mm512_castsi512_ph (__m512i __a)
 {
-  return (__m512i)
-    __builtin_ia32_vcvttph2udq512_mask_round (__C,
-					      (__v16si) __A,
-					      __B,
-					      _MM_FROUND_CUR_DIRECTION);
+  return (__m512h) __a;
 }
 
-extern __inline __m512i
+/* Load a 512-bit vector of _Float16 values from a 64-byte aligned
+   address.  */
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttph_epu32 (__mmask16 __A, __m256h __B)
+_mm512_load_ph (void const *__P)
 {
-  return (__m512i)
-    __builtin_ia32_vcvttph2udq512_mask_round (__B,
-					      (__v16si)
-					      _mm512_setzero_si512 (),
-					      __A,
-					      _MM_FROUND_CUR_DIRECTION);
+  return *(const __m512h *) __P;
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_loadu_ph (void const *__P)
+{
+  return *(const __m512h_u *) __P;
+}
+
+/* Store a 512-bit vector of _Float16 values to a 64-byte aligned
+   address.  */
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_store_ph (void *__P, __m512h __A)
+{
+  *(__m512h *) __P = __A;
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_storeu_ph (void *__P, __m512h __A)
+{
+  *(__m512h_u *) __P = __A;
+}
+
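+/* Clearing bit 15 of a _Float16 clears its sign; the 0x7FFF7FFF mask
+   covers both halves of each 32-bit lane at once.  */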
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_abs_ph (__m512h __A)
+{
+  return (__m512h) _mm512_and_epi32 (_mm512_set1_epi32 (0x7FFF7FFF),
+				     (__m512i) __A);
+}
+
+/* Intrinsics v[add,sub,mul,div]ph.  */
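+/* For the _mask forms, result elements whose mask bit is clear are
+   copied from the first operand; the _maskz forms zero them instead.  */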
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_add_ph (__m512h __A, __m512h __B)
+{
+  return (__m512h) ((__v32hf) __A + (__v32hf) __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_add_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+{
+  return __builtin_ia32_addph512_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_add_ph (__mmask32 __A, __m512h __B, __m512h __C)
+{
+  return __builtin_ia32_addph512_mask (__B, __C,
+				       _mm512_setzero_ph (), __A);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_sub_ph (__m512h __A, __m512h __B)
+{
+  return (__m512h) ((__v32hf) __A - (__v32hf) __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_sub_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+{
+  return __builtin_ia32_subph512_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_sub_ph (__mmask32 __A, __m512h __B, __m512h __C)
+{
+  return __builtin_ia32_subph512_mask (__B, __C,
+				       _mm512_setzero_ph (), __A);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mul_ph (__m512h __A, __m512h __B)
+{
+  return (__m512h) ((__v32hf) __A * (__v32hf) __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_mul_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+{
+  return __builtin_ia32_mulph512_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_mul_ph (__mmask32 __A, __m512h __B, __m512h __C)
+{
+  return __builtin_ia32_mulph512_mask (__B, __C,
+				       _mm512_setzero_ph (), __A);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_div_ph (__m512h __A, __m512h __B)
+{
+  return (__m512h) ((__v32hf) __A / (__v32hf) __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_div_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+{
+  return __builtin_ia32_divph512_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_div_ph (__mmask32 __A, __m512h __B, __m512h __C)
+{
+  return __builtin_ia32_divph512_mask (__B, __C,
+				       _mm512_setzero_ph (), __A);
 }
 
 #ifdef __OPTIMIZE__
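+/* The *_round forms take an _MM_FROUND_* rounding immediate that must
+   fold to a compile-time constant; without __OPTIMIZE__ the macro
+   forms below are used instead.  */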
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtt_roundph_epu32 (__m256h __A, int __B)
+_mm512_add_round_ph (__m512h __A, __m512h __B, const int __C)
 {
-  return (__m512i)
-    __builtin_ia32_vcvttph2udq512_mask_round (__A,
-					      (__v16si)
-					      _mm512_setzero_si512 (),
-					      (__mmask16) -1,
-					      __B);
+  return __builtin_ia32_addph512_mask_round (__A, __B,
+					     _mm512_setzero_ph (),
+					     (__mmask32) -1, __C);
 }
 
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtt_roundph_epu32 (__m512i __A, __mmask16 __B,
-				__m256h __C, int __D)
+_mm512_mask_add_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+			  __m512h __D, const int __E)
 {
-  return (__m512i)
-    __builtin_ia32_vcvttph2udq512_mask_round (__C,
-					      (__v16si) __A,
-					      __B,
-					      __D);
+  return __builtin_ia32_addph512_mask_round (__C, __D, __A, __B, __E);
 }
 
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtt_roundph_epu32 (__mmask16 __A, __m256h __B, int __C)
+_mm512_maskz_add_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
+			   const int __D)
 {
-  return (__m512i)
-    __builtin_ia32_vcvttph2udq512_mask_round (__B,
-					      (__v16si)
-					      _mm512_setzero_si512 (),
-					      __A,
-					      __C);
+  return __builtin_ia32_addph512_mask_round (__B, __C,
+					     _mm512_setzero_ph (),
+					     __A, __D);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_sub_round_ph (__m512h __A, __m512h __B, const int __C)
+{
+  return __builtin_ia32_subph512_mask_round (__A, __B,
+					     _mm512_setzero_ph (),
+					     (__mmask32) -1, __C);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_sub_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+			  __m512h __D, const int __E)
+{
+  return __builtin_ia32_subph512_mask_round (__C, __D, __A, __B, __E);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_sub_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
+			   const int __D)
+{
+  return __builtin_ia32_subph512_mask_round (__B, __C,
+					     _mm512_setzero_ph (),
+					     __A, __D);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mul_round_ph (__m512h __A, __m512h __B, const int __C)
+{
+  return __builtin_ia32_mulph512_mask_round (__A, __B,
+					     _mm512_setzero_ph (),
+					     (__mmask32) -1, __C);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_mul_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+			  __m512h __D, const int __E)
+{
+  return __builtin_ia32_mulph512_mask_round (__C, __D, __A, __B, __E);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_mul_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
+			   const int __D)
+{
+  return __builtin_ia32_mulph512_mask_round (__B, __C,
+					     _mm512_setzero_ph (),
+					     __A, __D);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_div_round_ph (__m512h __A, __m512h __B, const int __C)
+{
+  return __builtin_ia32_divph512_mask_round (__A, __B,
+					     _mm512_setzero_ph (),
+					     (__mmask32) -1, __C);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_div_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+			  __m512h __D, const int __E)
+{
+  return __builtin_ia32_divph512_mask_round (__C, __D, __A, __B, __E);
 }
 
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_div_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
+			   const int __D)
+{
+  return __builtin_ia32_divph512_mask_round (__B, __C,
+					     _mm512_setzero_ph (),
+					     __A, __D);
+}
 #else
-#define _mm512_cvtt_roundph_epu32(A, B)					\
-  ((__m512i)								\
-   __builtin_ia32_vcvttph2udq512_mask_round ((A),			\
-					     (__v16si)			\
-					     _mm512_setzero_si512 (),	\
-					     (__mmask16)-1,		\
-					     (B)))
+#define _mm512_add_round_ph(A, B, C)					\
+  ((__m512h)__builtin_ia32_addph512_mask_round((A), (B),		\
+					       _mm512_setzero_ph (),	\
+					       (__mmask32)-1, (C)))
 
-#define _mm512_mask_cvtt_roundph_epu32(A, B, C, D)		\
-  ((__m512i)							\
-   __builtin_ia32_vcvttph2udq512_mask_round ((C),		\
-					     (__v16si)(A),	\
-					     (B),		\
-					     (D)))
+#define _mm512_mask_add_round_ph(A, B, C, D, E)				\
+  ((__m512h)__builtin_ia32_addph512_mask_round((C), (D), (A), (B), (E)))
 
-#define _mm512_maskz_cvtt_roundph_epu32(A, B, C)			\
-  ((__m512i)								\
-   __builtin_ia32_vcvttph2udq512_mask_round ((B),			\
-					     (__v16si)			\
-					     _mm512_setzero_si512 (),	\
-					     (A),			\
-					     (C)))
+#define _mm512_maskz_add_round_ph(A, B, C, D)				\
+  ((__m512h)__builtin_ia32_addph512_mask_round((B), (C),		\
+					       _mm512_setzero_ph (),	\
+					       (A), (D)))
 
-#endif /* __OPTIMIZE__ */
+#define _mm512_sub_round_ph(A, B, C)					\
+  ((__m512h)__builtin_ia32_subph512_mask_round((A), (B),		\
+					       _mm512_setzero_ph (),	\
+					       (__mmask32)-1, (C)))
 
-/* Intrinsics vcvtdq2ph.  */
-extern __inline __m256h
+#define _mm512_mask_sub_round_ph(A, B, C, D, E)				\
+  ((__m512h)__builtin_ia32_subph512_mask_round((C), (D), (A), (B), (E)))
+
+#define _mm512_maskz_sub_round_ph(A, B, C, D)				\
+  ((__m512h)__builtin_ia32_subph512_mask_round((B), (C),		\
+					       _mm512_setzero_ph (),	\
+					       (A), (D)))
+
+#define _mm512_mul_round_ph(A, B, C)					\
+  ((__m512h)__builtin_ia32_mulph512_mask_round((A), (B),		\
+					       _mm512_setzero_ph (),	\
+					       (__mmask32)-1, (C)))
+
+#define _mm512_mask_mul_round_ph(A, B, C, D, E)				\
+  ((__m512h)__builtin_ia32_mulph512_mask_round((C), (D), (A), (B), (E)))
+
+#define _mm512_maskz_mul_round_ph(A, B, C, D)				\
+  ((__m512h)__builtin_ia32_mulph512_mask_round((B), (C),		\
+					       _mm512_setzero_ph (),	\
+					       (A), (D)))
+
+#define _mm512_div_round_ph(A, B, C)					\
+  ((__m512h)__builtin_ia32_divph512_mask_round((A), (B),		\
+					       _mm512_setzero_ph (),	\
+					       (__mmask32)-1, (C)))
+
+#define _mm512_mask_div_round_ph(A, B, C, D, E)				\
+  ((__m512h)__builtin_ia32_divph512_mask_round((C), (D), (A), (B), (E)))
+
+#define _mm512_maskz_div_round_ph(A, B, C, D)				\
+  ((__m512h)__builtin_ia32_divph512_mask_round((B), (C),		\
+					       _mm512_setzero_ph (),	\
+					       (A), (D)))
+#endif  /* __OPTIMIZE__  */
+
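+/* Each 32-bit lane holds one _Float16 complex value; flipping bit 31
+   negates the imaginary (high) half.  */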
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_conj_pch (__m512h __A)
+{
+  return (__m512h) _mm512_xor_epi32 ((__m512i) __A, _mm512_set1_epi32 (1<<31));
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_conj_pch (__m512h __W, __mmask16 __U, __m512h __A)
+{
+  return (__m512h)
+    __builtin_ia32_movaps512_mask ((__v16sf) _mm512_conj_pch (__A),
+				   (__v16sf) __W,
+				   (__mmask16) __U);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_conj_pch (__mmask16 __U, __m512h __A)
+{
+  return (__m512h)
+    __builtin_ia32_movaps512_mask ((__v16sf) _mm512_conj_pch (__A),
+				   (__v16sf) _mm512_setzero_ps (),
+				   (__mmask16) __U);
+}
+
+/* Intrinsics vmaxph vminph.  */
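+/* As with vmaxps/vminps, the second source operand is returned when
+   the inputs are unordered or both zero.  */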
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi32_ph (__m512i __A)
+_mm512_max_ph (__m512h __A, __m512h __B)
 {
-  return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __A,
-						 _mm256_setzero_ph (),
-						 (__mmask16) -1,
-						 _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_maxph512_mask (__A, __B,
+				       _mm512_setzero_ph (),
+				       (__mmask32) -1);
 }
 
-extern __inline __m256h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi32_ph (__m256h __A, __mmask16 __B, __m512i __C)
+_mm512_mask_max_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
 {
-  return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __C,
-						 __A,
-						 __B,
-						 _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_maxph512_mask (__C, __D, __A, __B);
 }
 
-extern __inline __m256h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi32_ph (__mmask16 __A, __m512i __B)
+_mm512_maskz_max_ph (__mmask32 __A, __m512h __B, __m512h __C)
 {
-  return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __B,
-						 _mm256_setzero_ph (),
-						 __A,
-						 _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_maxph512_mask (__B, __C,
+				       _mm512_setzero_ph (), __A);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m256h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundepi32_ph (__m512i __A, int __B)
+_mm512_min_ph (__m512h __A, __m512h __B)
 {
-  return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __A,
-						 _mm256_setzero_ph (),
-						 (__mmask16) -1,
-						 __B);
+  return __builtin_ia32_minph512_mask (__A, __B,
+				       _mm512_setzero_ph (),
+				       (__mmask32) -1);
 }
 
-extern __inline __m256h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundepi32_ph (__m256h __A, __mmask16 __B, __m512i __C, int __D)
+_mm512_mask_min_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
 {
-  return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __C,
-						 __A,
-						 __B,
-						 __D);
+  return __builtin_ia32_minph512_mask (__C, __D, __A, __B);
 }
 
-extern __inline __m256h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundepi32_ph (__mmask16 __A, __m512i __B, int __C)
+_mm512_maskz_min_ph (__mmask32 __A, __m512h __B, __m512h __C)
 {
-  return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __B,
-						 _mm256_setzero_ph (),
-						 __A,
-						 __C);
+  return __builtin_ia32_minph512_mask (__B, __C,
+				       _mm512_setzero_ph (), __A);
 }
 
-#else
-#define _mm512_cvt_roundepi32_ph(A, B)					\
-  (__builtin_ia32_vcvtdq2ph512_mask_round ((__v16si)(A),		\
-					   _mm256_setzero_ph (),	\
-					   (__mmask16)-1,		\
-					   (B)))
-
-#define _mm512_mask_cvt_roundepi32_ph(A, B, C, D)		\
-  (__builtin_ia32_vcvtdq2ph512_mask_round ((__v16si)(C),	\
-					   (A),			\
-					   (B),			\
-					   (D)))
-
-#define _mm512_maskz_cvt_roundepi32_ph(A, B, C)				\
-  (__builtin_ia32_vcvtdq2ph512_mask_round ((__v16si)(B),		\
-					   _mm256_setzero_ph (),	\
-					   (A),				\
-					   (C)))
-
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vcvtudq2ph.  */
-extern __inline __m256h
+#ifdef __OPTIMIZE__
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepu32_ph (__m512i __A)
+_mm512_max_round_ph (__m512h __A, __m512h __B, const int __C)
 {
-  return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __A,
-						  _mm256_setzero_ph (),
-						  (__mmask16) -1,
-						  _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_maxph512_mask_round (__A, __B,
+					     _mm512_setzero_ph (),
+					     (__mmask32) -1, __C);
 }
 
-extern __inline __m256h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepu32_ph (__m256h __A, __mmask16 __B, __m512i __C)
+_mm512_mask_max_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+			  __m512h __D, const int __E)
 {
-  return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __C,
-						  __A,
-						  __B,
-						  _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_maxph512_mask_round (__C, __D, __A, __B, __E);
 }
 
-extern __inline __m256h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepu32_ph (__mmask16 __A, __m512i __B)
+_mm512_maskz_max_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
+			   const int __D)
 {
-  return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __B,
-						  _mm256_setzero_ph (),
-						  __A,
-						  _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_maxph512_mask_round (__B, __C,
+					     _mm512_setzero_ph (),
+					     __A, __D);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m256h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundepu32_ph (__m512i __A, int __B)
+_mm512_min_round_ph (__m512h __A, __m512h __B, const int __C)
 {
-  return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __A,
-						  _mm256_setzero_ph (),
-						  (__mmask16) -1,
-						  __B);
+  return __builtin_ia32_minph512_mask_round (__A, __B,
+					     _mm512_setzero_ph (),
+					     (__mmask32) -1, __C);
 }
 
-extern __inline __m256h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundepu32_ph (__m256h __A, __mmask16 __B, __m512i __C, int __D)
+_mm512_mask_min_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+			  __m512h __D, const int __E)
 {
-  return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __C,
-						  __A,
-						  __B,
-						  __D);
+  return __builtin_ia32_minph512_mask_round (__C, __D, __A, __B, __E);
 }
 
-extern __inline __m256h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundepu32_ph (__mmask16 __A, __m512i __B, int __C)
+_mm512_maskz_min_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
+			   const int __D)
 {
-  return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __B,
-						  _mm256_setzero_ph (),
-						  __A,
-						  __C);
+  return __builtin_ia32_minph512_mask_round (__B, __C,
+					     _mm512_setzero_ph (),
+					     __A, __D);
 }
 
 #else
-#define _mm512_cvt_roundepu32_ph(A, B)					\
-  (__builtin_ia32_vcvtudq2ph512_mask_round ((__v16si)(A),		\
-					    _mm256_setzero_ph (),	\
-					    (__mmask16)-1,		\
-					    B))
+#define _mm512_max_round_ph(A, B, C)				\
+  (__builtin_ia32_maxph512_mask_round ((A), (B),		\
+				       _mm512_setzero_ph (),	\
+				       (__mmask32)-1, (C)))
 
-#define _mm512_mask_cvt_roundepu32_ph(A, B, C, D)	\
-  (__builtin_ia32_vcvtudq2ph512_mask_round ((__v16si)C,	\
-					    A,		\
-					    B,		\
-					    D))
+#define _mm512_mask_max_round_ph(A, B, C, D, E)				\
+  (__builtin_ia32_maxph512_mask_round ((C), (D), (A), (B), (E)))
 
-#define _mm512_maskz_cvt_roundepu32_ph(A, B, C)				\
-  (__builtin_ia32_vcvtudq2ph512_mask_round ((__v16si)B,			\
-					    _mm256_setzero_ph (),	\
-					    A,				\
-					    C))
+#define _mm512_maskz_max_round_ph(A, B, C, D)			\
+  (__builtin_ia32_maxph512_mask_round ((B), (C),		\
+				       _mm512_setzero_ph (),	\
+				       (A), (D)))
 
-#endif /* __OPTIMIZE__ */
+#define _mm512_min_round_ph(A, B, C)				\
+  (__builtin_ia32_minph512_mask_round ((A), (B),		\
+				       _mm512_setzero_ph (),	\
+				       (__mmask32)-1, (C)))
 
-/* Intrinsics vcvtph2qq.  */
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtph_epi64 (__m128h __A)
-{
-  return __builtin_ia32_vcvtph2qq512_mask_round (__A,
-						 _mm512_setzero_si512 (),
-						 (__mmask8) -1,
-						 _MM_FROUND_CUR_DIRECTION);
-}
+#define _mm512_mask_min_round_ph(A, B, C, D, E)				\
+  (__builtin_ia32_minph512_mask_round ((C), (D), (A), (B), (E)))
 
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtph_epi64 (__m512i __A, __mmask8 __B, __m128h __C)
-{
-  return __builtin_ia32_vcvtph2qq512_mask_round (__C, __A, __B,
-						 _MM_FROUND_CUR_DIRECTION);
-}
+#define _mm512_maskz_min_round_ph(A, B, C, D)			\
+  (__builtin_ia32_minph512_mask_round ((B), (C),		\
+				       _mm512_setzero_ph (),	\
+				       (A), (D)))
+#endif /* __OPTIMIZE__ */
 
-extern __inline __m512i
+/* Intrinsics vcmpph.  */
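+/* __C selects the comparison predicate, one of the _CMP_* constants
+   (e.g. _CMP_EQ_OQ).  */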
+#ifdef __OPTIMIZE__
+extern __inline __mmask32
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtph_epi64 (__mmask8 __A, __m128h __B)
+_mm512_cmp_ph_mask (__m512h __A, __m512h __B, const int __C)
 {
-  return __builtin_ia32_vcvtph2qq512_mask_round (__B,
-						 _mm512_setzero_si512 (),
-						 __A,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__mmask32) __builtin_ia32_cmpph512_mask (__A, __B, __C,
+						   (__mmask32) -1);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __mmask32
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundph_epi64 (__m128h __A, int __B)
+_mm512_mask_cmp_ph_mask (__mmask32 __A, __m512h __B, __m512h __C,
+			 const int __D)
 {
-  return __builtin_ia32_vcvtph2qq512_mask_round (__A,
-						 _mm512_setzero_si512 (),
-						 (__mmask8) -1,
-						 __B);
+  return (__mmask32) __builtin_ia32_cmpph512_mask (__B, __C, __D,
+						   __A);
 }
 
-extern __inline __m512i
+extern __inline __mmask32
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundph_epi64 (__m512i __A, __mmask8 __B, __m128h __C, int __D)
+_mm512_cmp_round_ph_mask (__m512h __A, __m512h __B, const int __C,
+			  const int __D)
 {
-  return __builtin_ia32_vcvtph2qq512_mask_round (__C, __A, __B, __D);
+  return (__mmask32) __builtin_ia32_cmpph512_mask_round (__A, __B,
+							 __C, (__mmask32) -1,
+							 __D);
 }
 
-extern __inline __m512i
+extern __inline __mmask32
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundph_epi64 (__mmask8 __A, __m128h __B, int __C)
+_mm512_mask_cmp_round_ph_mask (__mmask32 __A, __m512h __B, __m512h __C,
+			       const int __D, const int __E)
 {
-  return __builtin_ia32_vcvtph2qq512_mask_round (__B,
-						 _mm512_setzero_si512 (),
-						 __A,
-						 __C);
+  return (__mmask32) __builtin_ia32_cmpph512_mask_round (__B, __C,
+							 __D, __A,
+							 __E);
 }
 
-#else
-#define _mm512_cvt_roundph_epi64(A, B)					\
-  (__builtin_ia32_vcvtph2qq512_mask_round ((A),				\
-					   _mm512_setzero_si512 (),	\
-					   (__mmask8)-1,		\
-					   (B)))
-
-#define _mm512_mask_cvt_roundph_epi64(A, B, C, D)		\
-  (__builtin_ia32_vcvtph2qq512_mask_round ((C), (A), (B), (D)))
+#else
+#define _mm512_cmp_ph_mask(A, B, C)			\
+  (__builtin_ia32_cmpph512_mask ((A), (B), (C), (-1)))
 
-#define _mm512_maskz_cvt_roundph_epi64(A, B, C)				\
-  (__builtin_ia32_vcvtph2qq512_mask_round ((B),				\
-					   _mm512_setzero_si512 (),	\
-					   (A),				\
-					   (C)))
+#define _mm512_mask_cmp_ph_mask(A, B, C, D)		\
+  (__builtin_ia32_cmpph512_mask ((B), (C), (D), (A)))
+
+#define _mm512_cmp_round_ph_mask(A, B, C, D)				\
+  (__builtin_ia32_cmpph512_mask_round ((A), (B), (C), (-1), (D)))
+
+#define _mm512_mask_cmp_round_ph_mask(A, B, C, D, E)			\
+  (__builtin_ia32_cmpph512_mask_round ((B), (C), (D), (A), (E)))
 
 #endif /* __OPTIMIZE__ */
 
-/* Intrinsics vcvtph2uqq.  */
-extern __inline __m512i
+/* Intrinsics vsqrtph.  */
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtph_epu64 (__m128h __A)
+_mm512_sqrt_ph (__m512h __A)
 {
-  return __builtin_ia32_vcvtph2uqq512_mask_round (__A,
-						  _mm512_setzero_si512 (),
-						  (__mmask8) -1,
-						  _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_sqrtph512_mask_round (__A,
+					      _mm512_setzero_ph (),
+					      (__mmask32) -1,
+					      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtph_epu64 (__m512i __A, __mmask8 __B, __m128h __C)
+_mm512_mask_sqrt_ph (__m512h __A, __mmask32 __B, __m512h __C)
 {
-  return __builtin_ia32_vcvtph2uqq512_mask_round (__C, __A, __B,
-						  _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_sqrtph512_mask_round (__C, __A, __B,
+					      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtph_epu64 (__mmask8 __A, __m128h __B)
+_mm512_maskz_sqrt_ph (__mmask32 __A, __m512h __B)
 {
-  return __builtin_ia32_vcvtph2uqq512_mask_round (__B,
-						  _mm512_setzero_si512 (),
-						  __A,
-						  _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_sqrtph512_mask_round (__B,
+					      _mm512_setzero_ph (),
+					      __A,
+					      _MM_FROUND_CUR_DIRECTION);
 }
 
 #ifdef __OPTIMIZE__
-
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundph_epu64 (__m128h __A, int __B)
+_mm512_sqrt_round_ph (__m512h __A, const int __B)
 {
-  return __builtin_ia32_vcvtph2uqq512_mask_round (__A,
-						  _mm512_setzero_si512 (),
-						  (__mmask8) -1,
-						  __B);
+  return __builtin_ia32_sqrtph512_mask_round (__A,
+					      _mm512_setzero_ph (),
+					      (__mmask32) -1, __B);
 }
 
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundph_epu64 (__m512i __A, __mmask8 __B, __m128h __C, int __D)
+_mm512_mask_sqrt_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+			   const int __D)
 {
-  return __builtin_ia32_vcvtph2uqq512_mask_round (__C, __A, __B, __D);
+  return __builtin_ia32_sqrtph512_mask_round (__C, __A, __B, __D);
 }
 
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundph_epu64 (__mmask8 __A, __m128h __B, int __C)
+_mm512_maskz_sqrt_round_ph (__mmask32 __A, __m512h __B, const int __C)
 {
-  return __builtin_ia32_vcvtph2uqq512_mask_round (__B,
-						  _mm512_setzero_si512 (),
-						  __A,
-						  __C);
+  return __builtin_ia32_sqrtph512_mask_round (__B,
+					      _mm512_setzero_ph (),
+					      __A, __C);
 }
 
 #else
-#define _mm512_cvt_roundph_epu64(A, B)					\
-  (__builtin_ia32_vcvtph2uqq512_mask_round ((A),			\
-					    _mm512_setzero_si512 (),	\
-					    (__mmask8)-1,		\
-					    (B)))
+#define _mm512_sqrt_round_ph(A, B)				\
+  (__builtin_ia32_sqrtph512_mask_round ((A),			\
+					_mm512_setzero_ph (),	\
+					(__mmask32)-1, (B)))
 
-#define _mm512_mask_cvt_roundph_epu64(A, B, C, D)			\
-  (__builtin_ia32_vcvtph2uqq512_mask_round ((C), (A), (B), (D)))
+#define _mm512_mask_sqrt_round_ph(A, B, C, D)			\
+  (__builtin_ia32_sqrtph512_mask_round ((C), (A), (B), (D)))
 
-#define _mm512_maskz_cvt_roundph_epu64(A, B, C)				\
-  (__builtin_ia32_vcvtph2uqq512_mask_round ((B),			\
-					    _mm512_setzero_si512 (),	\
-					    (A),			\
-					    (C)))
+#define _mm512_maskz_sqrt_round_ph(A, B, C)			\
+  (__builtin_ia32_sqrtph512_mask_round ((B),			\
+					_mm512_setzero_ph (),	\
+					(A), (C)))
 
 #endif /* __OPTIMIZE__ */
 
-/* Intrinsics vcvttph2qq.  */
-extern __inline __m512i
+/* Intrinsics vrsqrtph.  */
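+/* vrsqrtph and vrcpph below compute hardware approximations, not
+   correctly rounded results.  */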
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttph_epi64 (__m128h __A)
+_mm512_rsqrt_ph (__m512h __A)
 {
-  return __builtin_ia32_vcvttph2qq512_mask_round (__A,
-						  _mm512_setzero_si512 (),
-						  (__mmask8) -1,
-						  _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_rsqrtph512_mask (__A, _mm512_setzero_ph (),
+					 (__mmask32) -1);
 }
 
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttph_epi64 (__m512i __A, __mmask8 __B, __m128h __C)
+_mm512_mask_rsqrt_ph (__m512h __A, __mmask32 __B, __m512h __C)
 {
-  return __builtin_ia32_vcvttph2qq512_mask_round (__C, __A, __B,
-						  _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_rsqrtph512_mask (__C, __A, __B);
 }
 
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttph_epi64 (__mmask8 __A, __m128h __B)
+_mm512_maskz_rsqrt_ph (__mmask32 __A, __m512h __B)
 {
-  return __builtin_ia32_vcvttph2qq512_mask_round (__B,
-						  _mm512_setzero_si512 (),
-						  __A,
-						  _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_rsqrtph512_mask (__B, _mm512_setzero_ph (),
+					 __A);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+/* Intrinsics vrcpph.  */
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtt_roundph_epi64 (__m128h __A, int __B)
+_mm512_rcp_ph (__m512h __A)
 {
-  return __builtin_ia32_vcvttph2qq512_mask_round (__A,
-						  _mm512_setzero_si512 (),
-						  (__mmask8) -1,
-						  __B);
+  return __builtin_ia32_rcpph512_mask (__A, _mm512_setzero_ph (),
+				       (__mmask32) -1);
 }
 
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtt_roundph_epi64 (__m512i __A, __mmask8 __B, __m128h __C, int __D)
+_mm512_mask_rcp_ph (__m512h __A, __mmask32 __B, __m512h __C)
 {
-  return __builtin_ia32_vcvttph2qq512_mask_round (__C, __A, __B, __D);
+  return __builtin_ia32_rcpph512_mask (__C, __A, __B);
 }
 
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtt_roundph_epi64 (__mmask8 __A, __m128h __B, int __C)
+_mm512_maskz_rcp_ph (__mmask32 __A, __m512h __B)
 {
-  return __builtin_ia32_vcvttph2qq512_mask_round (__B,
-						  _mm512_setzero_si512 (),
-						  __A,
-						  __C);
+  return __builtin_ia32_rcpph512_mask (__B, _mm512_setzero_ph (),
+				       __A);
 }
 
-#else
-#define _mm512_cvtt_roundph_epi64(A, B)					\
-  (__builtin_ia32_vcvttph2qq512_mask_round ((A),			\
-					    _mm512_setzero_si512 (),	\
-					    (__mmask8)-1,		\
-					    (B)))
-
-#define _mm512_mask_cvtt_roundph_epi64(A, B, C, D)			\
-  __builtin_ia32_vcvttph2qq512_mask_round ((C), (A), (B), (D))
-
-#define _mm512_maskz_cvtt_roundph_epi64(A, B, C)			\
-  (__builtin_ia32_vcvttph2qq512_mask_round ((B),			\
-					    _mm512_setzero_si512 (),	\
-					    (A),			\
-					    (C)))
-
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vcvttph2uqq.  */
-extern __inline __m512i
+/* Intrinsics vscalefph.  */
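+/* vscalefph computes __A * 2^floor(__B) element-wise.  */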
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttph_epu64 (__m128h __A)
+_mm512_scalef_ph (__m512h __A, __m512h __B)
 {
-  return __builtin_ia32_vcvttph2uqq512_mask_round (__A,
-						   _mm512_setzero_si512 (),
-						   (__mmask8) -1,
-						   _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_scalefph512_mask_round (__A, __B,
+						_mm512_setzero_ph (),
+						(__mmask32) -1,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttph_epu64 (__m512i __A, __mmask8 __B, __m128h __C)
+_mm512_mask_scalef_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
 {
-  return __builtin_ia32_vcvttph2uqq512_mask_round (__C, __A, __B,
-						   _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_scalefph512_mask_round (__C, __D, __A, __B,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttph_epu64 (__mmask8 __A, __m128h __B)
+_mm512_maskz_scalef_ph (__mmask32 __A, __m512h __B, __m512h __C)
 {
-  return __builtin_ia32_vcvttph2uqq512_mask_round (__B,
-						   _mm512_setzero_si512 (),
-						   __A,
-						   _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_scalefph512_mask_round (__B, __C,
+						_mm512_setzero_ph (),
+						__A,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
 #ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtt_roundph_epu64 (__m128h __A, int __B)
+_mm512_scalef_round_ph (__m512h __A, __m512h __B, const int __C)
 {
-  return __builtin_ia32_vcvttph2uqq512_mask_round (__A,
-						   _mm512_setzero_si512 (),
-						   (__mmask8) -1,
-						   __B);
+  return __builtin_ia32_scalefph512_mask_round (__A, __B,
+						_mm512_setzero_ph (),
+						(__mmask32) -1, __C);
 }
 
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtt_roundph_epu64 (__m512i __A, __mmask8 __B, __m128h __C, int __D)
+_mm512_mask_scalef_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+			     __m512h __D, const int __E)
 {
-  return __builtin_ia32_vcvttph2uqq512_mask_round (__C, __A, __B, __D);
+  return __builtin_ia32_scalefph512_mask_round (__C, __D, __A, __B,
+						__E);
 }
 
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtt_roundph_epu64 (__mmask8 __A, __m128h __B, int __C)
+_mm512_maskz_scalef_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
+			      const int __D)
 {
-  return __builtin_ia32_vcvttph2uqq512_mask_round (__B,
-						   _mm512_setzero_si512 (),
-						   __A,
-						   __C);
+  return __builtin_ia32_scalefph512_mask_round (__B, __C,
+						_mm512_setzero_ph (),
+						__A, __D);
 }
 
 #else
-#define _mm512_cvtt_roundph_epu64(A, B)					\
-  (__builtin_ia32_vcvttph2uqq512_mask_round ((A),			\
-					     _mm512_setzero_si512 (),	\
-					     (__mmask8)-1,		\
-					     (B)))
-
-#define _mm512_mask_cvtt_roundph_epu64(A, B, C, D)			\
-  __builtin_ia32_vcvttph2uqq512_mask_round ((C), (A), (B), (D))
+#define _mm512_scalef_round_ph(A, B, C)				\
+  (__builtin_ia32_scalefph512_mask_round ((A), (B),		\
+					  _mm512_setzero_ph (),	\
+					  (__mmask32)-1, (C)))
 
-#define _mm512_maskz_cvtt_roundph_epu64(A, B, C)			\
-  (__builtin_ia32_vcvttph2uqq512_mask_round ((B),			\
-					     _mm512_setzero_si512 (),	\
-					     (A),			\
-					     (C)))
+#define _mm512_mask_scalef_round_ph(A, B, C, D, E)			\
+  (__builtin_ia32_scalefph512_mask_round ((C), (D), (A), (B), (E)))
 
-#endif /* __OPTIMIZE__ */
+#define _mm512_maskz_scalef_round_ph(A, B, C, D)		\
+  (__builtin_ia32_scalefph512_mask_round ((B), (C),		\
+					  _mm512_setzero_ph (),	\
+					  (A), (D)))
 
-/* Intrinsics vcvtqq2ph.  */
-extern __inline __m128h
-  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi64_ph (__m512i __A)
+#endif  /* __OPTIMIZE__ */
+
+/* Intrinsics vreduceph.  */
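+/* vreduceph subtracts from each element its value rounded as directed
+   by the immediate __B, leaving the reduced fraction.  */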
+#ifdef __OPTIMIZE__
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_reduce_ph (__m512h __A, int __B)
 {
-  return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __A,
-						 _mm_setzero_ph (),
-						 (__mmask8) -1,
-						 _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_reduceph512_mask_round (__A, __B,
+						_mm512_setzero_ph (),
+						(__mmask32) -1,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi64_ph (__m128h __A, __mmask8 __B, __m512i __C)
+_mm512_mask_reduce_ph (__m512h __A, __mmask32 __B, __m512h __C, int __D)
 {
-  return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __C,
-						 __A,
-						 __B,
-						 _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_reduceph512_mask_round (__C, __D, __A, __B,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi64_ph (__mmask8 __A, __m512i __B)
+_mm512_maskz_reduce_ph (__mmask32 __A, __m512h __B, int __C)
 {
-  return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __B,
-						 _mm_setzero_ph (),
-						 __A,
-						 _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_reduceph512_mask_round (__B, __C,
+						_mm512_setzero_ph (),
+						__A,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundepi64_ph (__m512i __A, int __B)
+_mm512_reduce_round_ph (__m512h __A, int __B, const int __C)
 {
-  return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __A,
-						 _mm_setzero_ph (),
-						 (__mmask8) -1,
-						 __B);
+  return __builtin_ia32_reduceph512_mask_round (__A, __B,
+						_mm512_setzero_ph (),
+						(__mmask32) -1, __C);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundepi64_ph (__m128h __A, __mmask8 __B, __m512i __C, int __D)
+_mm512_mask_reduce_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+			     int __D, const int __E)
 {
-  return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __C,
-						 __A,
-						 __B,
-						 __D);
+  return __builtin_ia32_reduceph512_mask_round (__C, __D, __A, __B,
+						__E);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundepi64_ph (__mmask8 __A, __m512i __B, int __C)
+_mm512_maskz_reduce_round_ph (__mmask32 __A, __m512h __B, int __C,
+			      const int __D)
 {
-  return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __B,
-						 _mm_setzero_ph (),
-						 __A,
-						 __C);
+  return __builtin_ia32_reduceph512_mask_round (__B, __C,
+						_mm512_setzero_ph (),
+						__A, __D);
 }
 
 #else
-#define _mm512_cvt_roundepi64_ph(A, B)				\
-  (__builtin_ia32_vcvtqq2ph512_mask_round ((__v8di)(A),		\
-					   _mm_setzero_ph (),	\
-					   (__mmask8)-1,	\
-					   (B)))
+#define _mm512_reduce_ph(A, B)						\
+  (__builtin_ia32_reduceph512_mask_round ((A), (B),			\
+					  _mm512_setzero_ph (),		\
+					  (__mmask32)-1,		\
+					  _MM_FROUND_CUR_DIRECTION))
 
-#define _mm512_mask_cvt_roundepi64_ph(A, B, C, D)			\
-  (__builtin_ia32_vcvtqq2ph512_mask_round ((__v8di)(C), (A), (B), (D)))
+#define _mm512_mask_reduce_ph(A, B, C, D)				\
+  (__builtin_ia32_reduceph512_mask_round ((C), (D), (A), (B),		\
+					  _MM_FROUND_CUR_DIRECTION))
 
-#define _mm512_maskz_cvt_roundepi64_ph(A, B, C)			\
-  (__builtin_ia32_vcvtqq2ph512_mask_round ((__v8di)(B),		\
-					   _mm_setzero_ph (),	\
-					   (A),			\
-					   (C)))
+#define _mm512_maskz_reduce_ph(A, B, C)					\
+  (__builtin_ia32_reduceph512_mask_round ((B), (C),			\
+					  _mm512_setzero_ph (),		\
+					  (A), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm512_reduce_round_ph(A, B, C)				\
+  (__builtin_ia32_reduceph512_mask_round ((A), (B),		\
+					  _mm512_setzero_ph (),	\
+					  (__mmask32)-1, (C)))
+
+#define _mm512_mask_reduce_round_ph(A, B, C, D, E)			\
+  (__builtin_ia32_reduceph512_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm512_maskz_reduce_round_ph(A, B, C, D)		\
+  (__builtin_ia32_reduceph512_mask_round ((B), (C),		\
+					  _mm512_setzero_ph (),	\
+					  (A), (D)))
 
 #endif /* __OPTIMIZE__ */
 
-/* Intrinsics vcvtuqq2ph.  */
-extern __inline __m128h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepu64_ph (__m512i __A)
+/* Intrinsics vrndscaleph.  */
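+/* vrndscaleph rounds each element to the number of fraction bits
+   encoded in the upper half of the immediate.  */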
+#ifdef __OPTIMIZE__
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_roundscale_ph (__m512h __A, int __B)
 {
-  return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __A,
-						  _mm_setzero_ph (),
-						  (__mmask8) -1,
+  return __builtin_ia32_rndscaleph512_mask_round (__A, __B,
+						  _mm512_setzero_ph (),
+						  (__mmask32) -1,
 						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepu64_ph (__m128h __A, __mmask8 __B, __m512i __C)
+_mm512_mask_roundscale_ph (__m512h __A, __mmask32 __B,
+			   __m512h __C, int __D)
 {
-  return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __C,
-						  __A,
-						  __B,
+  return __builtin_ia32_rndscaleph512_mask_round (__C, __D, __A, __B,
 						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepu64_ph (__mmask8 __A, __m512i __B)
+_mm512_maskz_roundscale_ph (__mmask32 __A, __m512h __B, int __C)
 {
-  return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __B,
-						  _mm_setzero_ph (),
+  return __builtin_ia32_rndscaleph512_mask_round (__B, __C,
+						  _mm512_setzero_ph (),
 						  __A,
 						  _MM_FROUND_CUR_DIRECTION);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundepu64_ph (__m512i __A, int __B)
+_mm512_roundscale_round_ph (__m512h __A, int __B, const int __C)
 {
-  return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __A,
-						  _mm_setzero_ph (),
-						  (__mmask8) -1,
-						  __B);
+  return __builtin_ia32_rndscaleph512_mask_round (__A, __B,
+						  _mm512_setzero_ph (),
+						  (__mmask32) -1,
+						  __C);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundepu64_ph (__m128h __A, __mmask8 __B, __m512i __C, int __D)
+_mm512_mask_roundscale_round_ph (__m512h __A, __mmask32 __B,
+				 __m512h __C, int __D, const int __E)
 {
-  return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __C,
-						  __A,
-						  __B,
-						  __D);
+  return __builtin_ia32_rndscaleph512_mask_round (__C, __D, __A,
+						  __B, __E);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundepu64_ph (__mmask8 __A, __m512i __B, int __C)
+_mm512_maskz_roundscale_round_ph (__mmask32 __A, __m512h __B, int __C,
+				  const int __D)
 {
-  return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __B,
-						  _mm_setzero_ph (),
-						  __A,
-						  __C);
+  return __builtin_ia32_rndscaleph512_mask_round (__B, __C,
+						  _mm512_setzero_ph (),
+						  __A, __D);
 }
 
 #else
-#define _mm512_cvt_roundepu64_ph(A, B)				\
-  (__builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di)(A),	\
-					    _mm_setzero_ph (),	\
-					    (__mmask8)-1,	\
-					    (B)))
+#define _mm512_roundscale_ph(A, B)					\
+  (__builtin_ia32_rndscaleph512_mask_round ((A), (B),			\
+					    _mm512_setzero_ph (),	\
+					    (__mmask32)-1,		\
+					    _MM_FROUND_CUR_DIRECTION))
 
-#define _mm512_mask_cvt_roundepu64_ph(A, B, C, D)			\
-  (__builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di)(C), (A), (B), (D)))
+#define _mm512_mask_roundscale_ph(A, B, C, D)				\
+  (__builtin_ia32_rndscaleph512_mask_round ((C), (D), (A), (B),		\
+					    _MM_FROUND_CUR_DIRECTION))
 
-#define _mm512_maskz_cvt_roundepu64_ph(A, B, C)			\
-  (__builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di)(B),	\
-					    _mm_setzero_ph (),	\
-					    (A),		\
-					    (C)))
+#define _mm512_maskz_roundscale_ph(A, B, C)				\
+  (__builtin_ia32_rndscaleph512_mask_round ((B), (C),			\
+					    _mm512_setzero_ph (),	\
+					    (A),			\
+					    _MM_FROUND_CUR_DIRECTION))
+
+#define _mm512_roundscale_round_ph(A, B, C)				\
+  (__builtin_ia32_rndscaleph512_mask_round ((A), (B),			\
+					    _mm512_setzero_ph (),	\
+					    (__mmask32)-1, (C)))
+
+#define _mm512_mask_roundscale_round_ph(A, B, C, D, E)			\
+  (__builtin_ia32_rndscaleph512_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm512_maskz_roundscale_round_ph(A, B, C, D)			\
+  (__builtin_ia32_rndscaleph512_mask_round ((B), (C),			\
+					    _mm512_setzero_ph (),	\
+					    (A), (D)))
 
 #endif /* __OPTIMIZE__ */
 
-/* Intrinsics vcvtph2w.  */
-extern __inline __m512i
+/* Intrinsics vfpclassph.  */
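+/* Each bit of the immediate enables one category test (QNaN, +0, -0,
+   +Inf, -Inf, denormal, finite negative, SNaN, from bit 0 up).  */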
+#ifdef __OPTIMIZE__
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fpclass_ph_mask (__mmask32 __U, __m512h __A,
+			     const int __imm)
+{
+  return (__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) __A,
+						       __imm, __U);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fpclass_ph_mask (__m512h __A, const int __imm)
+{
+  return (__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) __A,
+						       __imm,
+						       (__mmask32) -1);
+}
+
+#else
+#define _mm512_mask_fpclass_ph_mask(u, x, c)				\
+  ((__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) (__m512h) (x), \
+						 (int) (c), (__mmask32) (u)))
+
+#define _mm512_fpclass_ph_mask(x, c)					\
+  ((__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) (__m512h) (x), \
+						 (int) (c), (__mmask32) -1))
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vgetexpph.  */
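+/* vgetexpph extracts floor(log2(|x|)) of each element as a _Float16
+   value.  */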
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtph_epi16 (__m512h __A)
+_mm512_getexp_ph (__m512h __A)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2w512_mask_round (__A,
-					      (__v32hi)
-					      _mm512_setzero_si512 (),
-					      (__mmask32) -1,
-					      _MM_FROUND_CUR_DIRECTION);
+  return (__m512h)
+    __builtin_ia32_getexpph512_mask ((__v32hf) __A,
+				     (__v32hf) _mm512_setzero_ph (),
+				     (__mmask32) -1, _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtph_epi16 (__m512i __A, __mmask32 __B, __m512h __C)
+_mm512_mask_getexp_ph (__m512h __W, __mmask32 __U, __m512h __A)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2w512_mask_round (__C,
-					      (__v32hi) __A,
-					      __B,
-					      _MM_FROUND_CUR_DIRECTION);
+  return (__m512h)
+    __builtin_ia32_getexpph512_mask ((__v32hf) __A, (__v32hf) __W,
+				     (__mmask32) __U, _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtph_epi16 (__mmask32 __A, __m512h __B)
+_mm512_maskz_getexp_ph (__mmask32 __U, __m512h __A)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2w512_mask_round (__B,
-					      (__v32hi)
-					      _mm512_setzero_si512 (),
-					      __A,
-					      _MM_FROUND_CUR_DIRECTION);
+  return (__m512h)
+    __builtin_ia32_getexpph512_mask ((__v32hf) __A,
+				     (__v32hf) _mm512_setzero_ph (),
+				     (__mmask32) __U, _MM_FROUND_CUR_DIRECTION);
 }
 
 #ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundph_epi16 (__m512h __A, int __B)
+_mm512_getexp_round_ph (__m512h __A, const int __R)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2w512_mask_round (__A,
-					      (__v32hi)
-					      _mm512_setzero_si512 (),
-					      (__mmask32) -1,
-					      __B);
+  return (__m512h) __builtin_ia32_getexpph512_mask ((__v32hf) __A,
+						    (__v32hf)
+						    _mm512_setzero_ph (),
+						    (__mmask32) -1, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundph_epi16 (__m512i __A, __mmask32 __B, __m512h __C, int __D)
+_mm512_mask_getexp_round_ph (__m512h __W, __mmask32 __U, __m512h __A,
+			     const int __R)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2w512_mask_round (__C,
-					      (__v32hi) __A,
-					      __B,
-					      __D);
+  return (__m512h) __builtin_ia32_getexpph512_mask ((__v32hf) __A,
+						    (__v32hf) __W,
+						    (__mmask32) __U, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundph_epi16 (__mmask32 __A, __m512h __B, int __C)
+_mm512_maskz_getexp_round_ph (__mmask32 __U, __m512h __A, const int __R)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2w512_mask_round (__B,
-					      (__v32hi)
-					      _mm512_setzero_si512 (),
-					      __A,
-					      __C);
+  return (__m512h) __builtin_ia32_getexpph512_mask ((__v32hf) __A,
+						    (__v32hf)
+						    _mm512_setzero_ph (),
+						    (__mmask32) __U, __R);
 }
 
 #else
-#define _mm512_cvt_roundph_epi16(A, B)					\
-  ((__m512i)__builtin_ia32_vcvtph2w512_mask_round ((A),		\
-						      (__v32hi)		\
-						      _mm512_setzero_si512 (), \
-						      (__mmask32)-1,	\
-						      (B)))
+#define _mm512_getexp_round_ph(A, R)					\
+  ((__m512h)__builtin_ia32_getexpph512_mask((__v32hf)(__m512h)(A),	\
+					    (__v32hf)_mm512_setzero_ph(), (__mmask32)-1, (R)))
 
-#define _mm512_mask_cvt_roundph_epi16(A, B, C, D)			\
-  ((__m512i)__builtin_ia32_vcvtph2w512_mask_round ((C),		\
-						      (__v32hi)(A),	\
-						      (B),		\
-						      (D)))
+#define _mm512_mask_getexp_round_ph(W, U, A, R)				\
+  ((__m512h)__builtin_ia32_getexpph512_mask((__v32hf)(__m512h)(A),	\
+					    (__v32hf)(__m512h)(W), (__mmask32)(U), (R)))
 
-#define _mm512_maskz_cvt_roundph_epi16(A, B, C)				\
-  ((__m512i)__builtin_ia32_vcvtph2w512_mask_round ((B),		\
-						      (__v32hi)		\
-						      _mm512_setzero_si512 (), \
-						      (A),		\
-						      (C)))
+#define _mm512_maskz_getexp_round_ph(U, A, R)				\
+  ((__m512h)__builtin_ia32_getexpph512_mask((__v32hf)(__m512h)(A),	\
+					    (__v32hf)_mm512_setzero_ph(), (__mmask32)(U), (R)))
 
 #endif /* __OPTIMIZE__ */
 
-/* Intrinsics vcvtph2uw.  */
-extern __inline __m512i
+/* Intrinsics vgetmantph.  */
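+/* The getmant immediate packs the sign-control enum into bits [3:2]
+   and the normalization-interval enum into bits [1:0], hence the
+   (__C << 2) | __B operand below.  */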
+#ifdef __OPTIMIZE__
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtph_epu16 (__m512h __A)
+_mm512_getmant_ph (__m512h __A, _MM_MANTISSA_NORM_ENUM __B,
+		   _MM_MANTISSA_SIGN_ENUM __C)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2uw512_mask_round (__A,
-					       (__v32hi)
-					       _mm512_setzero_si512 (),
-					       (__mmask32) -1,
-					       _MM_FROUND_CUR_DIRECTION);
+  return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
+						     (__C << 2) | __B,
+						     _mm512_setzero_ph (),
+						     (__mmask32) -1,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtph_epu16 (__m512i __A, __mmask32 __B, __m512h __C)
+_mm512_mask_getmant_ph (__m512h __W, __mmask32 __U, __m512h __A,
+			_MM_MANTISSA_NORM_ENUM __B,
+			_MM_MANTISSA_SIGN_ENUM __C)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2uw512_mask_round (__C, (__v32hi) __A, __B,
-					       _MM_FROUND_CUR_DIRECTION);
+  return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
+						     (__C << 2) | __B,
+						     (__v32hf) __W, __U,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtph_epu16 (__mmask32 __A, __m512h __B)
+_mm512_maskz_getmant_ph (__mmask32 __U, __m512h __A,
+			 _MM_MANTISSA_NORM_ENUM __B,
+			 _MM_MANTISSA_SIGN_ENUM __C)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2uw512_mask_round (__B,
-					       (__v32hi)
-					       _mm512_setzero_si512 (),
-					       __A,
-					       _MM_FROUND_CUR_DIRECTION);
+  return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
+						     (__C << 2) | __B,
+						     (__v32hf)
+						     _mm512_setzero_ph (),
+						     __U,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundph_epu16 (__m512h __A, int __B)
+_mm512_getmant_round_ph (__m512h __A, _MM_MANTISSA_NORM_ENUM __B,
+			 _MM_MANTISSA_SIGN_ENUM __C, const int __R)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2uw512_mask_round (__A,
-					       (__v32hi)
-					       _mm512_setzero_si512 (),
-					       (__mmask32) -1,
-					       __B);
+  return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
+						     (__C << 2) | __B,
+						     _mm512_setzero_ph (),
+						     (__mmask32) -1, __R);
 }
 
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundph_epu16 (__m512i __A, __mmask32 __B, __m512h __C, int __D)
+_mm512_mask_getmant_round_ph (__m512h __W, __mmask32 __U, __m512h __A,
+			      _MM_MANTISSA_NORM_ENUM __B,
+			      _MM_MANTISSA_SIGN_ENUM __C, const int __R)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2uw512_mask_round (__C, (__v32hi) __A, __B, __D);
+  return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
+						     (__C << 2) | __B,
+						     (__v32hf) __W, __U,
+						     __R);
 }
 
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundph_epu16 (__mmask32 __A, __m512h __B, int __C)
+_mm512_maskz_getmant_round_ph (__mmask32 __U, __m512h __A,
+			       _MM_MANTISSA_NORM_ENUM __B,
+			       _MM_MANTISSA_SIGN_ENUM __C, const int __R)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2uw512_mask_round (__B,
-					       (__v32hi)
-					       _mm512_setzero_si512 (),
-					       __A,
-					       __C);
+  return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
+						     (__C << 2) | __B,
+						     (__v32hf)
+						     _mm512_setzero_ph (),
+						     __U, __R);
 }
 
 #else
-#define _mm512_cvt_roundph_epu16(A, B)					\
-  ((__m512i)								\
-   __builtin_ia32_vcvtph2uw512_mask_round ((A),			\
-					      (__v32hi)			\
-					      _mm512_setzero_si512 (),	\
-					      (__mmask32)-1, (B)))
+#define _mm512_getmant_ph(X, B, C)					\
+  ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X),	\
+					      (int)(((C)<<2) | (B)),	\
+					      (__v32hf)(__m512h)	\
+					      _mm512_setzero_ph(),	\
+					      (__mmask32)-1,		\
+					      _MM_FROUND_CUR_DIRECTION))
 
-#define _mm512_mask_cvt_roundph_epu16(A, B, C, D)			\
-  ((__m512i)								\
-   __builtin_ia32_vcvtph2uw512_mask_round ((C), (__v32hi)(A), (B), (D)))
+#define _mm512_mask_getmant_ph(W, U, X, B, C)				\
+  ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X),	\
+					      (int)(((C)<<2) | (B)),	\
+					      (__v32hf)(__m512h)(W),	\
+					      (__mmask32)(U),		\
+					      _MM_FROUND_CUR_DIRECTION))
 
-#define _mm512_maskz_cvt_roundph_epu16(A, B, C)				\
-  ((__m512i)								\
-   __builtin_ia32_vcvtph2uw512_mask_round ((B),			\
-					      (__v32hi)			\
-					      _mm512_setzero_si512 (),	\
-					      (A),			\
-					      (C)))
+#define _mm512_maskz_getmant_ph(U, X, B, C)				\
+  ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X),	\
+					      (int)(((C)<<2) | (B)),	\
+					      (__v32hf)(__m512h)	\
+					      _mm512_setzero_ph(),	\
+					      (__mmask32)(U),		\
+					      _MM_FROUND_CUR_DIRECTION))
+
+#define _mm512_getmant_round_ph(X, B, C, R)				\
+  ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X),	\
+					      (int)(((C)<<2) | (B)),	\
+					      (__v32hf)(__m512h)	\
+					      _mm512_setzero_ph(),	\
+					      (__mmask32)-1,		\
+					      (R)))
+
+#define _mm512_mask_getmant_round_ph(W, U, X, B, C, R)			\
+  ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X),	\
+					      (int)(((C)<<2) | (B)),	\
+					      (__v32hf)(__m512h)(W),	\
+					      (__mmask32)(U),		\
+					      (R)))
+
+#define _mm512_maskz_getmant_round_ph(U, X, B, C, R)			\
+  ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X),	\
+					      (int)(((C)<<2) | (B)),	\
+					      (__v32hf)(__m512h)	\
+					      _mm512_setzero_ph(),	\
+					      (__mmask32)(U),		\
+					      (R)))
 
 #endif /* __OPTIMIZE__ */
 
-/* Intrinsics vcvttph2w.  */
+/* Intrinsics vcvtph2dq.  */
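+/* Widening FP16 -> 32-bit integer conversions take 16 elements from a
+   256-bit source and fill a 512-bit result under an __mmask16 mask.  */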
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttph_epi16 (__m512h __A)
+_mm512_cvtph_epi32 (__m256h __A)
 {
   return (__m512i)
-    __builtin_ia32_vcvttph2w512_mask_round (__A,
-					    (__v32hi)
+    __builtin_ia32_vcvtph2dq512_mask_round (__A,
+					    (__v16si)
 					    _mm512_setzero_si512 (),
-					    (__mmask32) -1,
+					    (__mmask16) -1,
 					    _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttph_epi16 (__m512i __A, __mmask32 __B, __m512h __C)
+_mm512_mask_cvtph_epi32 (__m512i __A, __mmask16 __B, __m256h __C)
 {
   return (__m512i)
-    __builtin_ia32_vcvttph2w512_mask_round (__C,
-					    (__v32hi) __A,
+    __builtin_ia32_vcvtph2dq512_mask_round (__C,
+					    (__v16si) __A,
 					    __B,
 					    _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttph_epi16 (__mmask32 __A, __m512h __B)
+_mm512_maskz_cvtph_epi32 (__mmask16 __A, __m256h __B)
 {
   return (__m512i)
-    __builtin_ia32_vcvttph2w512_mask_round (__B,
-					    (__v32hi)
+    __builtin_ia32_vcvtph2dq512_mask_round (__B,
+					    (__v16si)
 					    _mm512_setzero_si512 (),
 					    __A,
 					    _MM_FROUND_CUR_DIRECTION);
@@ -4039,2994 +4210,2868 @@ _mm512_maskz_cvttph_epi16 (__mmask32 __A, __m512h __B)
 #ifdef __OPTIMIZE__
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtt_roundph_epi16 (__m512h __A, int __B)
+_mm512_cvt_roundph_epi32 (__m256h __A, int __B)
 {
   return (__m512i)
-    __builtin_ia32_vcvttph2w512_mask_round (__A,
-					    (__v32hi)
+    __builtin_ia32_vcvtph2dq512_mask_round (__A,
+					    (__v16si)
 					    _mm512_setzero_si512 (),
-					    (__mmask32) -1,
+					    (__mmask16) -1,
 					    __B);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtt_roundph_epi16 (__m512i __A, __mmask32 __B,
-				__m512h __C, int __D)
+_mm512_mask_cvt_roundph_epi32 (__m512i __A, __mmask16 __B, __m256h __C, int __D)
 {
   return (__m512i)
-    __builtin_ia32_vcvttph2w512_mask_round (__C,
-					    (__v32hi) __A,
+    __builtin_ia32_vcvtph2dq512_mask_round (__C,
+					    (__v16si) __A,
 					    __B,
 					    __D);
 }
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtt_roundph_epi16 (__mmask32 __A, __m512h __B, int __C)
+_mm512_maskz_cvt_roundph_epi32 (__mmask16 __A, __m256h __B, int __C)
 {
   return (__m512i)
-    __builtin_ia32_vcvttph2w512_mask_round (__B,
-					    (__v32hi)
+    __builtin_ia32_vcvtph2dq512_mask_round (__B,
+					    (__v16si)
 					    _mm512_setzero_si512 (),
 					    __A,
 					    __C);
 }
 
 #else
-#define _mm512_cvtt_roundph_epi16(A, B)				    \
-  ((__m512i)							    \
-   __builtin_ia32_vcvttph2w512_mask_round ((A),			    \
-					   (__v32hi)		    \
-					   _mm512_setzero_si512 (), \
-					   (__mmask32)-1,	    \
+#define _mm512_cvt_roundph_epi32(A, B)					\
+  ((__m512i)								\
+   __builtin_ia32_vcvtph2dq512_mask_round ((A),				\
+					   (__v16si)			\
+					   _mm512_setzero_si512 (),	\
+					   (__mmask16)-1,		\
 					   (B)))
 
-#define _mm512_mask_cvtt_roundph_epi16(A, B, C, D)		\
-  ((__m512i)							\
-   __builtin_ia32_vcvttph2w512_mask_round ((C),			\
-					   (__v32hi)(A),	\
-					   (B),			\
-					   (D)))
-
-#define _mm512_maskz_cvtt_roundph_epi16(A, B, C)		    \
-  ((__m512i)							    \
-   __builtin_ia32_vcvttph2w512_mask_round ((B),			    \
-					   (__v32hi)		    \
-					   _mm512_setzero_si512 (), \
-					   (A),			    \
-					   (C)))
-
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vcvttph2uw.  */
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttph_epu16 (__m512h __A)
-{
-  return (__m512i)
-    __builtin_ia32_vcvttph2uw512_mask_round (__A,
-					     (__v32hi)
-					     _mm512_setzero_si512 (),
-					     (__mmask32) -1,
-					     _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttph_epu16 (__m512i __A, __mmask32 __B, __m512h __C)
-{
-  return (__m512i)
-    __builtin_ia32_vcvttph2uw512_mask_round (__C,
-					     (__v32hi) __A,
-					     __B,
-					     _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttph_epu16 (__mmask32 __A, __m512h __B)
-{
-  return (__m512i)
-    __builtin_ia32_vcvttph2uw512_mask_round (__B,
-					     (__v32hi)
-					     _mm512_setzero_si512 (),
-					     __A,
-					     _MM_FROUND_CUR_DIRECTION);
-}
-
-#ifdef __OPTIMIZE__
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtt_roundph_epu16 (__m512h __A, int __B)
-{
-  return (__m512i)
-    __builtin_ia32_vcvttph2uw512_mask_round (__A,
-					     (__v32hi)
-					     _mm512_setzero_si512 (),
-					     (__mmask32) -1,
-					     __B);
-}
-
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtt_roundph_epu16 (__m512i __A, __mmask32 __B,
-				__m512h __C, int __D)
-{
-  return (__m512i)
-    __builtin_ia32_vcvttph2uw512_mask_round (__C,
-					     (__v32hi) __A,
-					     __B,
-					     __D);
-}
-
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtt_roundph_epu16 (__mmask32 __A, __m512h __B, int __C)
-{
-  return (__m512i)
-    __builtin_ia32_vcvttph2uw512_mask_round (__B,
-					     (__v32hi)
-					     _mm512_setzero_si512 (),
-					     __A,
-					     __C);
-}
-
-#else
-#define _mm512_cvtt_roundph_epu16(A, B)				     \
-  ((__m512i)							     \
-   __builtin_ia32_vcvttph2uw512_mask_round ((A),		     \
-					    (__v32hi)		     \
-					    _mm512_setzero_si512 (), \
-					    (__mmask32)-1,	     \
-					    (B)))
-
-#define _mm512_mask_cvtt_roundph_epu16(A, B, C, D)		\
-  ((__m512i)							\
-   __builtin_ia32_vcvttph2uw512_mask_round ((C),		\
-					    (__v32hi)(A),	\
-					    (B),		\
-					    (D)))
-
-#define _mm512_maskz_cvtt_roundph_epu16(A, B, C)		     \
-  ((__m512i)							     \
-   __builtin_ia32_vcvttph2uw512_mask_round ((B),		     \
-					    (__v32hi)		     \
-					    _mm512_setzero_si512 (), \
-					    (A),		     \
-					    (C)))
-
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vcvtw2ph.  */
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi16_ph (__m512i __A)
-{
-  return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __A,
-						_mm512_setzero_ph (),
-						(__mmask32) -1,
-						_MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi16_ph (__m512h __A, __mmask32 __B, __m512i __C)
-{
-  return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __C,
-						__A,
-						__B,
-						_MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi16_ph (__mmask32 __A, __m512i __B)
-{
-  return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __B,
-						_mm512_setzero_ph (),
-						__A,
-						_MM_FROUND_CUR_DIRECTION);
-}
-
-#ifdef __OPTIMIZE__
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundepi16_ph (__m512i __A, int __B)
-{
-  return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __A,
-						_mm512_setzero_ph (),
-						(__mmask32) -1,
-						__B);
-}
-
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundepi16_ph (__m512h __A, __mmask32 __B, __m512i __C, int __D)
-{
-  return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __C,
-						__A,
-						__B,
-						__D);
-}
-
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundepi16_ph (__mmask32 __A, __m512i __B, int __C)
-{
-  return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __B,
-						_mm512_setzero_ph (),
-						__A,
-						__C);
-}
-
-#else
-#define _mm512_cvt_roundepi16_ph(A, B)				\
-  (__builtin_ia32_vcvtw2ph512_mask_round ((__v32hi)(A),		\
-					  _mm512_setzero_ph (),	\
-					  (__mmask32)-1,	\
-					  (B)))
-
-#define _mm512_mask_cvt_roundepi16_ph(A, B, C, D)	\
-  (__builtin_ia32_vcvtw2ph512_mask_round ((__v32hi)(C),	\
-					  (A),		\
-					  (B),		\
-					  (D)))
+#define _mm512_mask_cvt_roundph_epi32(A, B, C, D)			\
+  ((__m512i)								\
+   __builtin_ia32_vcvtph2dq512_mask_round ((C), (__v16si)(A), (B), (D)))
 
-#define _mm512_maskz_cvt_roundepi16_ph(A, B, C)			\
-  (__builtin_ia32_vcvtw2ph512_mask_round ((__v32hi)(B),		\
-					  _mm512_setzero_ph (),	\
-					  (A),			\
-					  (C)))
+#define _mm512_maskz_cvt_roundph_epi32(A, B, C)				\
+  ((__m512i)								\
+   __builtin_ia32_vcvtph2dq512_mask_round ((B),				\
+					   (__v16si)			\
+					   _mm512_setzero_si512 (),	\
+					   (A),				\
+					   (C)))
 
 #endif /* __OPTIMIZE__ */
 
-/* Intrinsics vcvtuw2ph.  */
-  extern __inline __m512h
-  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-  _mm512_cvtepu16_ph (__m512i __A)
-  {
-    return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __A,
-						   _mm512_setzero_ph (),
-						   (__mmask32) -1,
-						   _MM_FROUND_CUR_DIRECTION);
-  }
+/* Intrinsics vcvtph2udq.  */
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtph_epu32 (__m256h __A)
+{
+  return (__m512i)
+    __builtin_ia32_vcvtph2udq512_mask_round (__A,
+					     (__v16si)
+					     _mm512_setzero_si512 (),
+					     (__mmask16) -1,
+					     _MM_FROUND_CUR_DIRECTION);
+}
 
-extern __inline __m512h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepu16_ph (__m512h __A, __mmask32 __B, __m512i __C)
+_mm512_mask_cvtph_epu32 (__m512i __A, __mmask16 __B, __m256h __C)
 {
-  return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __C,
-						 __A,
-						 __B,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__m512i)
+    __builtin_ia32_vcvtph2udq512_mask_round (__C,
+					     (__v16si) __A,
+					     __B,
+					     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepu16_ph (__mmask32 __A, __m512i __B)
+_mm512_maskz_cvtph_epu32 (__mmask16 __A, __m256h __B)
 {
-  return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __B,
-						 _mm512_setzero_ph (),
-						 __A,
-						 _MM_FROUND_CUR_DIRECTION);
+  return (__m512i)
+    __builtin_ia32_vcvtph2udq512_mask_round (__B,
+					     (__v16si)
+					     _mm512_setzero_si512 (),
+					     __A,
+					     _MM_FROUND_CUR_DIRECTION);
 }
 
 #ifdef __OPTIMIZE__
-extern __inline __m512h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundepu16_ph (__m512i __A, int __B)
+_mm512_cvt_roundph_epu32 (__m256h __A, int __B)
 {
-  return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __A,
-						 _mm512_setzero_ph (),
-						 (__mmask32) -1,
-						 __B);
+  return (__m512i)
+    __builtin_ia32_vcvtph2udq512_mask_round (__A,
+					     (__v16si)
+					     _mm512_setzero_si512 (),
+					     (__mmask16) -1,
+					     __B);
 }
 
-extern __inline __m512h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundepu16_ph (__m512h __A, __mmask32 __B, __m512i __C, int __D)
+_mm512_mask_cvt_roundph_epu32 (__m512i __A, __mmask16 __B, __m256h __C, int __D)
 {
-  return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __C,
-						 __A,
-						 __B,
-						 __D);
+  return (__m512i)
+    __builtin_ia32_vcvtph2udq512_mask_round (__C,
+					     (__v16si) __A,
+					     __B,
+					     __D);
 }
 
-extern __inline __m512h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundepu16_ph (__mmask32 __A, __m512i __B, int __C)
+_mm512_maskz_cvt_roundph_epu32 (__mmask16 __A, __m256h __B, int __C)
 {
-  return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __B,
-						 _mm512_setzero_ph (),
-						 __A,
-						 __C);
+  return (__m512i)
+    __builtin_ia32_vcvtph2udq512_mask_round (__B,
+					     (__v16si)
+					     _mm512_setzero_si512 (),
+					     __A,
+					     __C);
 }
 
 #else
-#define _mm512_cvt_roundepu16_ph(A, B)					\
-  (__builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi)(A),		\
-					   _mm512_setzero_ph (),	\
-					   (__mmask32)-1,		\
-					   (B)))
+#define _mm512_cvt_roundph_epu32(A, B)					\
+  ((__m512i)								\
+   __builtin_ia32_vcvtph2udq512_mask_round ((A),			\
+					    (__v16si)			\
+					    _mm512_setzero_si512 (),	\
+					    (__mmask16)-1,		\
+					    (B)))
 
-#define _mm512_mask_cvt_roundepu16_ph(A, B, C, D)		\
-  (__builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi)(C),	\
-					   (A),			\
-					   (B),			\
-					   (D)))
+#define _mm512_mask_cvt_roundph_epu32(A, B, C, D)			\
+  ((__m512i)								\
+   __builtin_ia32_vcvtph2udq512_mask_round ((C), (__v16si)(A), (B), (D)))
 
-#define _mm512_maskz_cvt_roundepu16_ph(A, B, C)				\
-  (__builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi)(B),		\
-					   _mm512_setzero_ph (),	\
-					   (A),				\
-					   (C)))
+#define _mm512_maskz_cvt_roundph_epu32(A, B, C)				\
+  ((__m512i)								\
+   __builtin_ia32_vcvtph2udq512_mask_round ((B),			\
+					    (__v16si)			\
+					    _mm512_setzero_si512 (),	\
+					    (A),			\
+					    (C)))
 
 #endif /* __OPTIMIZE__ */
 
-/* Intrinsics vcvtsh2si, vcvtsh2us.  */
-extern __inline int
+/* Intrinsics vcvttph2dq.  */
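+/* The vcvtt* forms convert with truncation (round toward zero); the
+   rounding operand of their _round variants only controls SAE.  */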
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsh_i32 (__m128h __A)
+_mm512_cvttph_epi32 (__m256h __A)
 {
-  return (int) __builtin_ia32_vcvtsh2si32_round (__A, _MM_FROUND_CUR_DIRECTION);
+  return (__m512i)
+    __builtin_ia32_vcvttph2dq512_mask_round (__A,
+					     (__v16si)
+					     _mm512_setzero_si512 (),
+					     (__mmask16) -1,
+					     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline unsigned
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsh_u32 (__m128h __A)
+_mm512_mask_cvttph_epi32 (__m512i __A, __mmask16 __B, __m256h __C)
 {
-  return (int) __builtin_ia32_vcvtsh2usi32_round (__A,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512i)
+    __builtin_ia32_vcvttph2dq512_mask_round (__C,
+					     (__v16si) __A,
+					     __B,
+					     _MM_FROUND_CUR_DIRECTION);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline int
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsh_i32 (__m128h __A, const int __R)
+_mm512_maskz_cvttph_epi32 (__mmask16 __A, __m256h __B)
 {
-  return (int) __builtin_ia32_vcvtsh2si32_round (__A, __R);
+  return (__m512i)
+    __builtin_ia32_vcvttph2dq512_mask_round (__B,
+					     (__v16si)
+					     _mm512_setzero_si512 (),
+					     __A,
+					     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline unsigned
+#ifdef __OPTIMIZE__
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsh_u32 (__m128h __A, const int __R)
+_mm512_cvtt_roundph_epi32 (__m256h __A, int __B)
 {
-  return (int) __builtin_ia32_vcvtsh2usi32_round (__A, __R);
+  return (__m512i)
+    __builtin_ia32_vcvttph2dq512_mask_round (__A,
+					     (__v16si)
+					     _mm512_setzero_si512 (),
+					     (__mmask16) -1,
+					     __B);
 }
 
-#else
-#define _mm_cvt_roundsh_i32(A, B)		\
-  ((int)__builtin_ia32_vcvtsh2si32_round ((A), (B)))
-#define _mm_cvt_roundsh_u32(A, B)		\
-  ((int)__builtin_ia32_vcvtsh2usi32_round ((A), (B)))
-
-#endif /* __OPTIMIZE__ */
-
-#ifdef __x86_64__
-extern __inline long long
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsh_i64 (__m128h __A)
+_mm512_mask_cvtt_roundph_epi32 (__m512i __A, __mmask16 __B,
+				__m256h __C, int __D)
 {
-  return (long long)
-    __builtin_ia32_vcvtsh2si64_round (__A, _MM_FROUND_CUR_DIRECTION);
+  return (__m512i)
+    __builtin_ia32_vcvttph2dq512_mask_round (__C,
+					     (__v16si) __A,
+					     __B,
+					     __D);
 }
 
-extern __inline unsigned long long
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsh_u64 (__m128h __A)
+_mm512_maskz_cvtt_roundph_epi32 (__mmask16 __A, __m256h __B, int __C)
 {
-  return (long long)
-    __builtin_ia32_vcvtsh2usi64_round (__A, _MM_FROUND_CUR_DIRECTION);
+  return (__m512i)
+    __builtin_ia32_vcvttph2dq512_mask_round (__B,
+					     (__v16si)
+					     _mm512_setzero_si512 (),
+					     __A,
+					     __C);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline long long
+#else
+#define _mm512_cvtt_roundph_epi32(A, B)					\
+  ((__m512i)								\
+   __builtin_ia32_vcvttph2dq512_mask_round ((A),			\
+					    (__v16si)			\
+					    _mm512_setzero_si512 (),	\
+					    (__mmask16)-1, (B)))
+
+#define _mm512_mask_cvtt_roundph_epi32(A, B, C, D)		\
+  ((__m512i)							\
+   __builtin_ia32_vcvttph2dq512_mask_round ((C),		\
+					    (__v16si)(A),	\
+					    (B),		\
+					    (D)))
+
+#define _mm512_maskz_cvtt_roundph_epi32(A, B, C)			\
+  ((__m512i)								\
+   __builtin_ia32_vcvttph2dq512_mask_round ((B),			\
+					    (__v16si)			\
+					    _mm512_setzero_si512 (),	\
+					    (A),			\
+					    (C)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvttph2udq.  */
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsh_i64 (__m128h __A, const int __R)
+_mm512_cvttph_epu32 (__m256h __A)
 {
-  return (long long) __builtin_ia32_vcvtsh2si64_round (__A, __R);
+  return (__m512i)
+    __builtin_ia32_vcvttph2udq512_mask_round (__A,
+					      (__v16si)
+					      _mm512_setzero_si512 (),
+					      (__mmask16) -1,
+					      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline unsigned long long
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsh_u64 (__m128h __A, const int __R)
+_mm512_mask_cvttph_epu32 (__m512i __A, __mmask16 __B, __m256h __C)
 {
-  return (long long) __builtin_ia32_vcvtsh2usi64_round (__A, __R);
+  return (__m512i)
+    __builtin_ia32_vcvttph2udq512_mask_round (__C,
+					      (__v16si) __A,
+					      __B,
+					      _MM_FROUND_CUR_DIRECTION);
 }
 
-#else
-#define _mm_cvt_roundsh_i64(A, B)			\
-  ((long long)__builtin_ia32_vcvtsh2si64_round ((A), (B)))
-#define _mm_cvt_roundsh_u64(A, B)			\
-  ((long long)__builtin_ia32_vcvtsh2usi64_round ((A), (B)))
-
-#endif /* __OPTIMIZE__ */
-#endif /* __x86_64__ */
-
-/* Intrinsics vcvttsh2si, vcvttsh2us.  */
-extern __inline int
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvttsh_i32 (__m128h __A)
+_mm512_maskz_cvttph_epu32 (__mmask16 __A, __m256h __B)
 {
-  return (int)
-    __builtin_ia32_vcvttsh2si32_round (__A, _MM_FROUND_CUR_DIRECTION);
+  return (__m512i)
+    __builtin_ia32_vcvttph2udq512_mask_round (__B,
+					      (__v16si)
+					      _mm512_setzero_si512 (),
+					      __A,
+					      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline unsigned
+#ifdef __OPTIMIZE__
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvttsh_u32 (__m128h __A)
+_mm512_cvtt_roundph_epu32 (__m256h __A, int __B)
 {
-  return (int)
-    __builtin_ia32_vcvttsh2usi32_round (__A, _MM_FROUND_CUR_DIRECTION);
+  return (__m512i)
+    __builtin_ia32_vcvttph2udq512_mask_round (__A,
+					      (__v16si)
+					      _mm512_setzero_si512 (),
+					      (__mmask16) -1,
+					      __B);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline int
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundsh_i32 (__m128h __A, const int __R)
+_mm512_mask_cvtt_roundph_epu32 (__m512i __A, __mmask16 __B,
+				__m256h __C, int __D)
 {
-  return (int) __builtin_ia32_vcvttsh2si32_round (__A, __R);
+  return (__m512i)
+    __builtin_ia32_vcvttph2udq512_mask_round (__C,
+					      (__v16si) __A,
+					      __B,
+					      __D);
 }
 
-extern __inline unsigned
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundsh_u32 (__m128h __A, const int __R)
+_mm512_maskz_cvtt_roundph_epu32 (__mmask16 __A, __m256h __B, int __C)
 {
-  return (int) __builtin_ia32_vcvttsh2usi32_round (__A, __R);
+  return (__m512i)
+    __builtin_ia32_vcvttph2udq512_mask_round (__B,
+					      (__v16si)
+					      _mm512_setzero_si512 (),
+					      __A,
+					      __C);
 }
 
 #else
-#define _mm_cvtt_roundsh_i32(A, B)		\
-  ((int)__builtin_ia32_vcvttsh2si32_round ((A), (B)))
-#define _mm_cvtt_roundsh_u32(A, B)		\
-  ((int)__builtin_ia32_vcvttsh2usi32_round ((A), (B)))
+#define _mm512_cvtt_roundph_epu32(A, B)					\
+  ((__m512i)								\
+   __builtin_ia32_vcvttph2udq512_mask_round ((A),			\
+					     (__v16si)			\
+					     _mm512_setzero_si512 (),	\
+					     (__mmask16)-1,		\
+					     (B)))
+
+#define _mm512_mask_cvtt_roundph_epu32(A, B, C, D)		\
+  ((__m512i)							\
+   __builtin_ia32_vcvttph2udq512_mask_round ((C),		\
+					     (__v16si)(A),	\
+					     (B),		\
+					     (D)))
+
+#define _mm512_maskz_cvtt_roundph_epu32(A, B, C)			\
+  ((__m512i)								\
+   __builtin_ia32_vcvttph2udq512_mask_round ((B),			\
+					     (__v16si)			\
+					     _mm512_setzero_si512 (),	\
+					     (A),			\
+					     (C)))
 
 #endif /* __OPTIMIZE__ */
 
-#ifdef __x86_64__
-extern __inline long long
+/* Intrinsics vcvtdq2ph.  */
+extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvttsh_i64 (__m128h __A)
+_mm512_cvtepi32_ph (__m512i __A)
 {
-  return (long long)
-    __builtin_ia32_vcvttsh2si64_round (__A, _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __A,
+						 _mm256_setzero_ph (),
+						 (__mmask16) -1,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline unsigned long long
+extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvttsh_u64 (__m128h __A)
+_mm512_mask_cvtepi32_ph (__m256h __A, __mmask16 __B, __m512i __C)
 {
-  return (long long)
-    __builtin_ia32_vcvttsh2usi64_round (__A, _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __C,
+						 __A,
+						 __B,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline long long
+extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundsh_i64 (__m128h __A, const int __R)
+_mm512_maskz_cvtepi32_ph (__mmask16 __A, __m512i __B)
 {
-  return (long long) __builtin_ia32_vcvttsh2si64_round (__A, __R);
+  return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __B,
+						 _mm256_setzero_ph (),
+						 __A,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline unsigned long long
+#ifdef __OPTIMIZE__
+extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundsh_u64 (__m128h __A, const int __R)
+_mm512_cvt_roundepi32_ph (__m512i __A, int __B)
 {
-  return (long long) __builtin_ia32_vcvttsh2usi64_round (__A, __R);
+  return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __A,
+						 _mm256_setzero_ph (),
+						 (__mmask16) -1,
+						 __B);
 }
 
-#else
-#define _mm_cvtt_roundsh_i64(A, B)			\
-  ((long long)__builtin_ia32_vcvttsh2si64_round ((A), (B)))
-#define _mm_cvtt_roundsh_u64(A, B)			\
-  ((long long)__builtin_ia32_vcvttsh2usi64_round ((A), (B)))
-
-#endif /* __OPTIMIZE__ */
-#endif /* __x86_64__ */
-
-/* Intrinsics vcvtsi2sh, vcvtusi2sh.  */
-extern __inline __m128h
+extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvti32_sh (__m128h __A, int __B)
+_mm512_mask_cvt_roundepi32_ph (__m256h __A, __mmask16 __B, __m512i __C, int __D)
 {
-  return __builtin_ia32_vcvtsi2sh32_round (__A, __B, _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __C,
+						 __A,
+						 __B,
+						 __D);
 }
 
-extern __inline __m128h
+extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtu32_sh (__m128h __A, unsigned int __B)
+_mm512_maskz_cvt_roundepi32_ph (__mmask16 __A, __m512i __B, int __C)
 {
-  return __builtin_ia32_vcvtusi2sh32_round (__A, __B, _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __B,
+						 _mm256_setzero_ph (),
+						 __A,
+						 __C);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m128h
+#else
+#define _mm512_cvt_roundepi32_ph(A, B)					\
+  (__builtin_ia32_vcvtdq2ph512_mask_round ((__v16si)(A),		\
+					   _mm256_setzero_ph (),	\
+					   (__mmask16)-1,		\
+					   (B)))
+
+#define _mm512_mask_cvt_roundepi32_ph(A, B, C, D)		\
+  (__builtin_ia32_vcvtdq2ph512_mask_round ((__v16si)(C),	\
+					   (A),			\
+					   (B),			\
+					   (D)))
+
+#define _mm512_maskz_cvt_roundepi32_ph(A, B, C)				\
+  (__builtin_ia32_vcvtdq2ph512_mask_round ((__v16si)(B),		\
+					   _mm256_setzero_ph (),	\
+					   (A),				\
+					   (C)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvtudq2ph.  */
+extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundi32_sh (__m128h __A, int __B, const int __R)
+_mm512_cvtepu32_ph (__m512i __A)
 {
-  return __builtin_ia32_vcvtsi2sh32_round (__A, __B, __R);
+  return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __A,
+						  _mm256_setzero_ph (),
+						  (__mmask16) -1,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundu32_sh (__m128h __A, unsigned int __B, const int __R)
+_mm512_mask_cvtepu32_ph (__m256h __A, __mmask16 __B, __m512i __C)
 {
-  return __builtin_ia32_vcvtusi2sh32_round (__A, __B, __R);
+  return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __C,
+						  __A,
+						  __B,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-#else
-#define _mm_cvt_roundi32_sh(A, B, C)		\
-  (__builtin_ia32_vcvtsi2sh32_round ((A), (B), (C)))
-#define _mm_cvt_roundu32_sh(A, B, C)		\
-  (__builtin_ia32_vcvtusi2sh32_round ((A), (B), (C)))
-
-#endif /* __OPTIMIZE__ */
-
-#ifdef __x86_64__
-extern __inline __m128h
+extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvti64_sh (__m128h __A, long long __B)
+_mm512_maskz_cvtepu32_ph (__mmask16 __A, __m512i __B)
 {
-  return __builtin_ia32_vcvtsi2sh64_round (__A, __B, _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __B,
+						  _mm256_setzero_ph (),
+						  __A,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+#ifdef __OPTIMIZE__
+extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtu64_sh (__m128h __A, unsigned long long __B)
+_mm512_cvt_roundepu32_ph (__m512i __A, int __B)
 {
-  return __builtin_ia32_vcvtusi2sh64_round (__A, __B, _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __A,
+						  _mm256_setzero_ph (),
+						  (__mmask16) -1,
+						  __B);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m128h
+extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundi64_sh (__m128h __A, long long __B, const int __R)
+_mm512_mask_cvt_roundepu32_ph (__m256h __A, __mmask16 __B, __m512i __C, int __D)
 {
-  return __builtin_ia32_vcvtsi2sh64_round (__A, __B, __R);
+  return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __C,
+						  __A,
+						  __B,
+						  __D);
 }
 
-extern __inline __m128h
+extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundu64_sh (__m128h __A, unsigned long long __B, const int __R)
+_mm512_maskz_cvt_roundepu32_ph (__mmask16 __A, __m512i __B, int __C)
 {
-  return __builtin_ia32_vcvtusi2sh64_round (__A, __B, __R);
+  return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __B,
+						  _mm256_setzero_ph (),
+						  __A,
+						  __C);
 }
 
-#else
-#define _mm_cvt_roundi64_sh(A, B, C)		\
-  (__builtin_ia32_vcvtsi2sh64_round ((A), (B), (C)))
-#define _mm_cvt_roundu64_sh(A, B, C)		\
-  (__builtin_ia32_vcvtusi2sh64_round ((A), (B), (C)))
+#else
+#define _mm512_cvt_roundepu32_ph(A, B)					\
+  (__builtin_ia32_vcvtudq2ph512_mask_round ((__v16si)(A),		\
+					    _mm256_setzero_ph (),	\
+					    (__mmask16)-1,		\
+					    (B)))
+
+#define _mm512_mask_cvt_roundepu32_ph(A, B, C, D)	\
+  (__builtin_ia32_vcvtudq2ph512_mask_round ((__v16si)(C),	\
+					    (A),	\
+					    (B),	\
+					    (D)))
+
+#define _mm512_maskz_cvt_roundepu32_ph(A, B, C)				\
+  (__builtin_ia32_vcvtudq2ph512_mask_round ((__v16si)(B),		\
+					    _mm256_setzero_ph (),	\
+					    (A),			\
+					    (C)))
 
 #endif /* __OPTIMIZE__ */
-#endif /* __x86_64__ */
 
-/* Intrinsics vcvtph2pd.  */
-extern __inline __m512d
+/* Intrinsics vcvtph2qq.  */
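+/* FP16 -> 64-bit integer conversions widen eight elements from a
+   128-bit source into a 512-bit result, so the mask narrows to
+   __mmask8.  */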
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtph_pd (__m128h __A)
+_mm512_cvtph_epi64 (__m128h __A)
 {
-  return __builtin_ia32_vcvtph2pd512_mask_round (__A,
-						 _mm512_setzero_pd (),
+  return __builtin_ia32_vcvtph2qq512_mask_round (__A,
+						 _mm512_setzero_si512 (),
 						 (__mmask8) -1,
 						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtph_pd (__m512d __A, __mmask8 __B, __m128h __C)
+_mm512_mask_cvtph_epi64 (__m512i __A, __mmask8 __B, __m128h __C)
 {
-  return __builtin_ia32_vcvtph2pd512_mask_round (__C, __A, __B,
+  return __builtin_ia32_vcvtph2qq512_mask_round (__C, __A, __B,
 						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtph_pd (__mmask8 __A, __m128h __B)
+_mm512_maskz_cvtph_epi64 (__mmask8 __A, __m128h __B)
 {
-  return __builtin_ia32_vcvtph2pd512_mask_round (__B,
-						 _mm512_setzero_pd (),
+  return __builtin_ia32_vcvtph2qq512_mask_round (__B,
+						 _mm512_setzero_si512 (),
 						 __A,
 						 _MM_FROUND_CUR_DIRECTION);
 }
 
 #ifdef __OPTIMIZE__
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundph_pd (__m128h __A, int __B)
+_mm512_cvt_roundph_epi64 (__m128h __A, int __B)
 {
-  return __builtin_ia32_vcvtph2pd512_mask_round (__A,
-						 _mm512_setzero_pd (),
+  return __builtin_ia32_vcvtph2qq512_mask_round (__A,
+						 _mm512_setzero_si512 (),
 						 (__mmask8) -1,
 						 __B);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundph_pd (__m512d __A, __mmask8 __B, __m128h __C, int __D)
+_mm512_mask_cvt_roundph_epi64 (__m512i __A, __mmask8 __B, __m128h __C, int __D)
 {
-  return __builtin_ia32_vcvtph2pd512_mask_round (__C, __A, __B, __D);
+  return __builtin_ia32_vcvtph2qq512_mask_round (__C, __A, __B, __D);
 }
 
-extern __inline __m512d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundph_pd (__mmask8 __A, __m128h __B, int __C)
+_mm512_maskz_cvt_roundph_epi64 (__mmask8 __A, __m128h __B, int __C)
 {
-  return __builtin_ia32_vcvtph2pd512_mask_round (__B,
-						 _mm512_setzero_pd (),
+  return __builtin_ia32_vcvtph2qq512_mask_round (__B,
+						 _mm512_setzero_si512 (),
 						 __A,
 						 __C);
 }
 
 #else
-#define _mm512_cvt_roundph_pd(A, B)					\
-  (__builtin_ia32_vcvtph2pd512_mask_round ((A),			\
-					   _mm512_setzero_pd (),	\
+#define _mm512_cvt_roundph_epi64(A, B)					\
+  (__builtin_ia32_vcvtph2qq512_mask_round ((A),				\
+					   _mm512_setzero_si512 (),	\
 					   (__mmask8)-1,		\
 					   (B)))
 
-#define _mm512_mask_cvt_roundph_pd(A, B, C, D)				\
-  (__builtin_ia32_vcvtph2pd512_mask_round ((C), (A), (B), (D)))
+#define _mm512_mask_cvt_roundph_epi64(A, B, C, D)		\
+  (__builtin_ia32_vcvtph2qq512_mask_round ((C), (A), (B), (D)))
 
-#define _mm512_maskz_cvt_roundph_pd(A, B, C)				\
-  (__builtin_ia32_vcvtph2pd512_mask_round ((B),			\
-					   _mm512_setzero_pd (),	\
-					   (A),			\
+#define _mm512_maskz_cvt_roundph_epi64(A, B, C)				\
+  (__builtin_ia32_vcvtph2qq512_mask_round ((B),				\
+					   _mm512_setzero_si512 (),	\
+					   (A),				\
 					   (C)))
 
 #endif /* __OPTIMIZE__ */
 
-/* Intrinsics vcvtph2psx.  */
-extern __inline __m512
+/* Intrinsics vcvtph2uqq.  */
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtxph_ps (__m256h __A)
+_mm512_cvtph_epu64 (__m128h __A)
 {
-  return __builtin_ia32_vcvtph2psx512_mask_round (__A,
-						  _mm512_setzero_ps (),
-						  (__mmask16) -1,
+  return __builtin_ia32_vcvtph2uqq512_mask_round (__A,
+						  _mm512_setzero_si512 (),
+						  (__mmask8) -1,
 						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtxph_ps (__m512 __A, __mmask16 __B, __m256h __C)
+_mm512_mask_cvtph_epu64 (__m512i __A, __mmask8 __B, __m128h __C)
 {
-  return __builtin_ia32_vcvtph2psx512_mask_round (__C, __A, __B,
+  return __builtin_ia32_vcvtph2uqq512_mask_round (__C, __A, __B,
 						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtxph_ps (__mmask16 __A, __m256h __B)
+_mm512_maskz_cvtph_epu64 (__mmask8 __A, __m128h __B)
 {
-  return __builtin_ia32_vcvtph2psx512_mask_round (__B,
-						  _mm512_setzero_ps (),
+  return __builtin_ia32_vcvtph2uqq512_mask_round (__B,
+						  _mm512_setzero_si512 (),
 						  __A,
 						  _MM_FROUND_CUR_DIRECTION);
 }
 
 #ifdef __OPTIMIZE__
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtx_roundph_ps (__m256h __A, int __B)
+_mm512_cvt_roundph_epu64 (__m128h __A, int __B)
 {
-  return __builtin_ia32_vcvtph2psx512_mask_round (__A,
-						  _mm512_setzero_ps (),
-						  (__mmask16) -1,
+  return __builtin_ia32_vcvtph2uqq512_mask_round (__A,
+						  _mm512_setzero_si512 (),
+						  (__mmask8) -1,
 						  __B);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtx_roundph_ps (__m512 __A, __mmask16 __B, __m256h __C, int __D)
+_mm512_mask_cvt_roundph_epu64 (__m512i __A, __mmask8 __B, __m128h __C, int __D)
 {
-  return __builtin_ia32_vcvtph2psx512_mask_round (__C, __A, __B, __D);
+  return __builtin_ia32_vcvtph2uqq512_mask_round (__C, __A, __B, __D);
 }
 
-extern __inline __m512
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtx_roundph_ps (__mmask16 __A, __m256h __B, int __C)
+_mm512_maskz_cvt_roundph_epu64 (__mmask8 __A, __m128h __B, int __C)
 {
-  return __builtin_ia32_vcvtph2psx512_mask_round (__B,
-						  _mm512_setzero_ps (),
+  return __builtin_ia32_vcvtph2uqq512_mask_round (__B,
+						  _mm512_setzero_si512 (),
 						  __A,
 						  __C);
 }
 
 #else
-#define _mm512_cvtx_roundph_ps(A, B)					\
-  (__builtin_ia32_vcvtph2psx512_mask_round ((A),			\
-					    _mm512_setzero_ps (),	\
-					    (__mmask16)-1,		\
+#define _mm512_cvt_roundph_epu64(A, B)					\
+  (__builtin_ia32_vcvtph2uqq512_mask_round ((A),			\
+					    _mm512_setzero_si512 (),	\
+					    (__mmask8)-1,		\
 					    (B)))
 
-#define _mm512_mask_cvtx_roundph_ps(A, B, C, D)				\
-  (__builtin_ia32_vcvtph2psx512_mask_round ((C), (A), (B), (D)))
+#define _mm512_mask_cvt_roundph_epu64(A, B, C, D)			\
+  (__builtin_ia32_vcvtph2uqq512_mask_round ((C), (A), (B), (D)))
 
-#define _mm512_maskz_cvtx_roundph_ps(A, B, C)				\
-  (__builtin_ia32_vcvtph2psx512_mask_round ((B),			\
-					    _mm512_setzero_ps (),	\
+#define _mm512_maskz_cvt_roundph_epu64(A, B, C)				\
+  (__builtin_ia32_vcvtph2uqq512_mask_round ((B),			\
+					    _mm512_setzero_si512 (),	\
 					    (A),			\
 					    (C)))
+
 #endif /* __OPTIMIZE__ */
 
-/* Intrinsics vcvtps2ph.  */
-extern __inline __m256h
+/* Intrinsics vcvttph2qq.  */
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtxps_ph (__m512 __A)
+_mm512_cvttph_epi64 (__m128h __A)
 {
-  return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __A,
-						  _mm256_setzero_ph (),
-						  (__mmask16) -1,
+  return __builtin_ia32_vcvttph2qq512_mask_round (__A,
+						  _mm512_setzero_si512 (),
+						  (__mmask8) -1,
 						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m256h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtxps_ph (__m256h __A, __mmask16 __B, __m512 __C)
+_mm512_mask_cvttph_epi64 (__m512i __A, __mmask8 __B, __m128h __C)
 {
-  return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __C,
-						  __A, __B,
+  return __builtin_ia32_vcvttph2qq512_mask_round (__C, __A, __B,
 						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m256h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtxps_ph (__mmask16 __A, __m512 __B)
+_mm512_maskz_cvttph_epi64 (__mmask8 __A, __m128h __B)
 {
-  return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __B,
-						  _mm256_setzero_ph (),
+  return __builtin_ia32_vcvttph2qq512_mask_round (__B,
+						  _mm512_setzero_si512 (),
 						  __A,
-						  _MM_FROUND_CUR_DIRECTION);
-}
-
-#ifdef __OPTIMIZE__
-extern __inline __m256h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtx_roundps_ph (__m512 __A, int __B)
-{
-  return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __A,
-						  _mm256_setzero_ph (),
-						  (__mmask16) -1,
-						  __B);
-}
-
-extern __inline __m256h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtx_roundps_ph (__m256h __A, __mmask16 __B, __m512 __C, int __D)
-{
-  return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __C,
-						  __A, __B, __D);
-}
-
-extern __inline __m256h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtx_roundps_ph (__mmask16 __A, __m512 __B, int __C)
-{
-  return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __B,
-						  _mm256_setzero_ph (),
-						  __A, __C);
-}
-
-#else
-#define _mm512_cvtx_roundps_ph(A, B)				\
-  (__builtin_ia32_vcvtps2phx512_mask_round ((__v16sf)(A),	\
-					    _mm256_setzero_ph (),\
-					    (__mmask16)-1, (B)))
-
-#define _mm512_mask_cvtx_roundps_ph(A, B, C, D)			\
-  (__builtin_ia32_vcvtps2phx512_mask_round ((__v16sf)(C),	\
-					    (A), (B), (D)))
-
-#define _mm512_maskz_cvtx_roundps_ph(A, B, C)			\
-  (__builtin_ia32_vcvtps2phx512_mask_round ((__v16sf)(B),	\
-					    _mm256_setzero_ph (),\
-					    (A), (C)))
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vcvtpd2ph.  */
-extern __inline __m128h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtpd_ph (__m512d __A)
-{
-  return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __A,
-						 _mm_setzero_ph (),
-						 (__mmask8) -1,
-						 _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtpd_ph (__m128h __A, __mmask8 __B, __m512d __C)
-{
-  return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __C,
-						 __A, __B,
-						 _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtpd_ph (__mmask8 __A, __m512d __B)
-{
-  return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __B,
-						 _mm_setzero_ph (),
-						 __A,
-						 _MM_FROUND_CUR_DIRECTION);
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
 #ifdef __OPTIMIZE__
-extern __inline __m128h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundpd_ph (__m512d __A, int __B)
+_mm512_cvtt_roundph_epi64 (__m128h __A, int __B)
 {
-  return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __A,
-						 _mm_setzero_ph (),
-						 (__mmask8) -1,
-						 __B);
+  return __builtin_ia32_vcvttph2qq512_mask_round (__A,
+						  _mm512_setzero_si512 (),
+						  (__mmask8) -1,
+						  __B);
 }
 
-extern __inline __m128h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundpd_ph (__m128h __A, __mmask8 __B, __m512d __C, int __D)
+_mm512_mask_cvtt_roundph_epi64 (__m512i __A, __mmask8 __B, __m128h __C, int __D)
 {
-  return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __C,
-						 __A, __B, __D);
+  return __builtin_ia32_vcvttph2qq512_mask_round (__C, __A, __B, __D);
 }
 
-extern __inline __m128h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundpd_ph (__mmask8 __A, __m512d __B, int __C)
+_mm512_maskz_cvtt_roundph_epi64 (__mmask8 __A, __m128h __B, int __C)
 {
-  return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __B,
-						 _mm_setzero_ph (),
-						 __A, __C);
+  return __builtin_ia32_vcvttph2qq512_mask_round (__B,
+						  _mm512_setzero_si512 (),
+						  __A,
+						  __C);
 }
 
 #else
-#define _mm512_cvt_roundpd_ph(A, B)				\
-  (__builtin_ia32_vcvtpd2ph512_mask_round ((__v8df)(A),		\
-					   _mm_setzero_ph (),	\
-					   (__mmask8)-1, (B)))
+#define _mm512_cvtt_roundph_epi64(A, B)					\
+  (__builtin_ia32_vcvttph2qq512_mask_round ((A),			\
+					    _mm512_setzero_si512 (),	\
+					    (__mmask8)-1,		\
+					    (B)))
 
-#define _mm512_mask_cvt_roundpd_ph(A, B, C, D)			\
-  (__builtin_ia32_vcvtpd2ph512_mask_round ((__v8df)(C),		\
-					   (A), (B), (D)))
+#define _mm512_mask_cvtt_roundph_epi64(A, B, C, D)			\
+  (__builtin_ia32_vcvttph2qq512_mask_round ((C), (A), (B), (D)))
 
-#define _mm512_maskz_cvt_roundpd_ph(A, B, C)			\
-  (__builtin_ia32_vcvtpd2ph512_mask_round ((__v8df)(B),		\
-					   _mm_setzero_ph (),	\
-					   (A), (C)))
+#define _mm512_maskz_cvtt_roundph_epi64(A, B, C)			\
+  (__builtin_ia32_vcvttph2qq512_mask_round ((B),			\
+					    _mm512_setzero_si512 (),	\
+					    (A),			\
+					    (C)))
 
 #endif /* __OPTIMIZE__ */
 
-/* Intrinsics vcvtsh2ss, vcvtsh2sd.  */
-extern __inline __m128
+/* Intrinsics vcvttph2uqq.  */
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsh_ss (__m128 __A, __m128h __B)
+_mm512_cvttph_epu64 (__m128h __A)
 {
-  return __builtin_ia32_vcvtsh2ss_mask_round (__B, __A,
-					      _mm_setzero_ps (),
-					      (__mmask8) -1,
-					      _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvttph2uqq512_mask_round (__A,
+						   _mm512_setzero_si512 (),
+						   (__mmask8) -1,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cvtsh_ss (__m128 __A, __mmask8 __B, __m128 __C,
-			 __m128h __D)
+_mm512_mask_cvttph_epu64 (__m512i __A, __mmask8 __B, __m128h __C)
 {
-  return __builtin_ia32_vcvtsh2ss_mask_round (__D, __C, __A, __B,
-					      _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvttph2uqq512_mask_round (__C, __A, __B,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_cvtsh_ss (__mmask8 __A, __m128 __B,
-			  __m128h __C)
+_mm512_maskz_cvttph_epu64 (__mmask8 __A, __m128h __B)
 {
-  return __builtin_ia32_vcvtsh2ss_mask_round (__C, __B,
-					      _mm_setzero_ps (),
-					      __A, _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvttph2uqq512_mask_round (__B,
+						   _mm512_setzero_si512 (),
+						   __A,
+						   _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128d
+#ifdef __OPTIMIZE__
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsh_sd (__m128d __A, __m128h __B)
+_mm512_cvtt_roundph_epu64 (__m128h __A, int __B)
 {
-  return __builtin_ia32_vcvtsh2sd_mask_round (__B, __A,
-					      _mm_setzero_pd (),
-					      (__mmask8) -1,
-					      _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvttph2uqq512_mask_round (__A,
+						   _mm512_setzero_si512 (),
+						   (__mmask8) -1,
+						   __B);
 }
 
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cvtsh_sd (__m128d __A, __mmask8 __B, __m128d __C,
-			 __m128h __D)
+_mm512_mask_cvtt_roundph_epu64 (__m512i __A, __mmask8 __B, __m128h __C, int __D)
 {
-  return __builtin_ia32_vcvtsh2sd_mask_round (__D, __C, __A, __B,
-					      _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvttph2uqq512_mask_round (__C, __A, __B, __D);
 }
 
-extern __inline __m128d
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_cvtsh_sd (__mmask8 __A, __m128d __B, __m128h __C)
+_mm512_maskz_cvtt_roundph_epu64 (__mmask8 __A, __m128h __B, int __C)
 {
-  return __builtin_ia32_vcvtsh2sd_mask_round (__C, __B,
-					      _mm_setzero_pd (),
-					      __A, _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvttph2uqq512_mask_round (__B,
+						   _mm512_setzero_si512 (),
+						   __A,
+						   __C);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsh_ss (__m128 __A, __m128h __B, const int __R)
+#else
+#define _mm512_cvtt_roundph_epu64(A, B)					\
+  (__builtin_ia32_vcvttph2uqq512_mask_round ((A),			\
+					     _mm512_setzero_si512 (),	\
+					     (__mmask8)-1,		\
+					     (B)))
+
+#define _mm512_mask_cvtt_roundph_epu64(A, B, C, D)			\
+  (__builtin_ia32_vcvttph2uqq512_mask_round ((C), (A), (B), (D)))
+
+#define _mm512_maskz_cvtt_roundph_epu64(A, B, C)			\
+  (__builtin_ia32_vcvttph2uqq512_mask_round ((B),			\
+					     _mm512_setzero_si512 (),	\
+					     (A),			\
+					     (C)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvtqq2ph.  */
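+/* The narrowing 64-bit integer -> FP16 conversions yield only eight
+   FP16 elements, which fit in a 128-bit __m128h result.  */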
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtepi64_ph (__m512i __A)
 {
-  return __builtin_ia32_vcvtsh2ss_mask_round (__B, __A,
-					      _mm_setzero_ps (),
-					      (__mmask8) -1, __R);
+  return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __A,
+						 _mm_setzero_ph (),
+						 (__mmask8) -1,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cvt_roundsh_ss (__m128 __A, __mmask8 __B, __m128 __C,
-			 __m128h __D, const int __R)
+_mm512_mask_cvtepi64_ph (__m128h __A, __mmask8 __B, __m512i __C)
 {
-  return __builtin_ia32_vcvtsh2ss_mask_round (__D, __C, __A, __B, __R);
+  return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __C,
+						 __A,
+						 __B,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_cvt_roundsh_ss (__mmask8 __A, __m128 __B,
-			  __m128h __C, const int __R)
+_mm512_maskz_cvtepi64_ph (__mmask8 __A, __m512i __B)
 {
-  return __builtin_ia32_vcvtsh2ss_mask_round (__C, __B,
-					      _mm_setzero_ps (),
-					      __A, __R);
+  return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __B,
+						 _mm_setzero_ph (),
+						 __A,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128d
+#ifdef __OPTIMIZE__
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsh_sd (__m128d __A, __m128h __B, const int __R)
+_mm512_cvt_roundepi64_ph (__m512i __A, int __B)
 {
-  return __builtin_ia32_vcvtsh2sd_mask_round (__B, __A,
-					      _mm_setzero_pd (),
-					      (__mmask8) -1, __R);
+  return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __A,
+						 _mm_setzero_ph (),
+						 (__mmask8) -1,
+						 __B);
 }
 
-extern __inline __m128d
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cvt_roundsh_sd (__m128d __A, __mmask8 __B, __m128d __C,
-			 __m128h __D, const int __R)
+_mm512_mask_cvt_roundepi64_ph (__m128h __A, __mmask8 __B, __m512i __C, int __D)
 {
-  return __builtin_ia32_vcvtsh2sd_mask_round (__D, __C, __A, __B, __R);
+  return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __C,
+						 __A,
+						 __B,
+						 __D);
 }
 
-extern __inline __m128d
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_cvt_roundsh_sd (__mmask8 __A, __m128d __B, __m128h __C, const int __R)
+_mm512_maskz_cvt_roundepi64_ph (__mmask8 __A, __m512i __B, int __C)
 {
-  return __builtin_ia32_vcvtsh2sd_mask_round (__C, __B,
-					      _mm_setzero_pd (),
-					      __A, __R);
+  return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __B,
+						 _mm_setzero_ph (),
+						 __A,
+						 __C);
 }
 
 #else
-#define _mm_cvt_roundsh_ss(A, B, R)				\
-  (__builtin_ia32_vcvtsh2ss_mask_round ((B), (A),		\
-					_mm_setzero_ps (),	\
-					(__mmask8) -1, (R)))
-
-#define _mm_mask_cvt_roundsh_ss(A, B, C, D, R)				\
-  (__builtin_ia32_vcvtsh2ss_mask_round ((D), (C), (A), (B), (R)))
-
-#define _mm_maskz_cvt_roundsh_ss(A, B, C, R)			\
-  (__builtin_ia32_vcvtsh2ss_mask_round ((C), (B),		\
-					_mm_setzero_ps (),	\
-					(A), (R)))
-
-#define _mm_cvt_roundsh_sd(A, B, R)				\
-  (__builtin_ia32_vcvtsh2sd_mask_round ((B), (A),		\
-					_mm_setzero_pd (),	\
-					(__mmask8) -1, (R)))
+#define _mm512_cvt_roundepi64_ph(A, B)				\
+  (__builtin_ia32_vcvtqq2ph512_mask_round ((__v8di)(A),		\
+					   _mm_setzero_ph (),	\
+					   (__mmask8)-1,	\
+					   (B)))
 
-#define _mm_mask_cvt_roundsh_sd(A, B, C, D, R)				\
-  (__builtin_ia32_vcvtsh2sd_mask_round ((D), (C), (A), (B), (R)))
+#define _mm512_mask_cvt_roundepi64_ph(A, B, C, D)			\
+  (__builtin_ia32_vcvtqq2ph512_mask_round ((__v8di)(C), (A), (B), (D)))
 
-#define _mm_maskz_cvt_roundsh_sd(A, B, C, R)			\
-  (__builtin_ia32_vcvtsh2sd_mask_round ((C), (B),		\
-					_mm_setzero_pd (),	\
-					(A), (R)))
+#define _mm512_maskz_cvt_roundepi64_ph(A, B, C)			\
+  (__builtin_ia32_vcvtqq2ph512_mask_round ((__v8di)(B),		\
+					   _mm_setzero_ph (),	\
+					   (A),			\
+					   (C)))
 
 #endif /* __OPTIMIZE__ */
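
For reference, a hedged example of the explicit-rounding form of the
vcvtqq2ph intrinsics above, assuming -mavx512fp16 (the helper name is
invented):

  #include <immintrin.h>

  /* Hypothetical helper: 8 x int64 -> the 8 _Float16 lanes of an
     __m128h, rounding to nearest even without raising exceptions.  */
  static __m128h
  i64_to_ph (__m512i v)
  {
    return _mm512_cvt_roundepi64_ph (v, _MM_FROUND_TO_NEAREST_INT
					| _MM_FROUND_NO_EXC);
  }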
 
-/* Intrinsics vcvtss2sh, vcvtsd2sh.  */
+/* Intrinsics vcvtuqq2ph.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtss_sh (__m128h __A, __m128 __B)
+_mm512_cvtepu64_ph (__m512i __A)
 {
-  return __builtin_ia32_vcvtss2sh_mask_round (__B, __A,
-					      _mm_setzero_ph (),
-					      (__mmask8) -1,
-					      _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __A,
+						  _mm_setzero_ph (),
+						  (__mmask8) -1,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cvtss_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128 __D)
+_mm512_mask_cvtepu64_ph (__m128h __A, __mmask8 __B, __m512i __C)
 {
-  return __builtin_ia32_vcvtss2sh_mask_round (__D, __C, __A, __B,
-					      _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __C,
+						  __A,
+						  __B,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_cvtss_sh (__mmask8 __A, __m128h __B, __m128 __C)
+_mm512_maskz_cvtepu64_ph (__mmask8 __A, __m512i __B)
 {
-  return __builtin_ia32_vcvtss2sh_mask_round (__C, __B,
-					      _mm_setzero_ph (),
-					      __A, _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __B,
+						  _mm_setzero_ph (),
+						  __A,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
+#ifdef __OPTIMIZE__
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsd_sh (__m128h __A, __m128d __B)
+_mm512_cvt_roundepu64_ph (__m512i __A, int __B)
 {
-  return __builtin_ia32_vcvtsd2sh_mask_round (__B, __A,
-					      _mm_setzero_ph (),
-					      (__mmask8) -1,
-					      _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __A,
+						  _mm_setzero_ph (),
+						  (__mmask8) -1,
+						  __B);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cvtsd_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128d __D)
+_mm512_mask_cvt_roundepu64_ph (__m128h __A, __mmask8 __B, __m512i __C, int __D)
 {
-  return __builtin_ia32_vcvtsd2sh_mask_round (__D, __C, __A, __B,
-					      _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __C,
+						  __A,
+						  __B,
+						  __D);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_cvtsd_sh (__mmask8 __A, __m128h __B, __m128d __C)
+_mm512_maskz_cvt_roundepu64_ph (__mmask8 __A, __m512i __B, int __C)
 {
-  return __builtin_ia32_vcvtsd2sh_mask_round (__C, __B,
-					      _mm_setzero_ph (),
-					      __A, _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __B,
+						  _mm_setzero_ph (),
+						  __A,
+						  __C);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m128h
+#else
+#define _mm512_cvt_roundepu64_ph(A, B)				\
+  (__builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di)(A),	\
+					    _mm_setzero_ph (),	\
+					    (__mmask8)-1,	\
+					    (B)))
+
+#define _mm512_mask_cvt_roundepu64_ph(A, B, C, D)			\
+  (__builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di)(C), (A), (B), (D)))
+
+#define _mm512_maskz_cvt_roundepu64_ph(A, B, C)			\
+  (__builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di)(B),	\
+					    _mm_setzero_ph (),	\
+					    (A),		\
+					    (C)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvtph2w.  */
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundss_sh (__m128h __A, __m128 __B, const int __R)
+_mm512_cvtph_epi16 (__m512h __A)
 {
-  return __builtin_ia32_vcvtss2sh_mask_round (__B, __A,
-					      _mm_setzero_ph (),
-					      (__mmask8) -1, __R);
+  return (__m512i)
+    __builtin_ia32_vcvtph2w512_mask_round (__A,
+					      (__v32hi)
+					      _mm512_setzero_si512 (),
+					      (__mmask32) -1,
+					      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cvt_roundss_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128 __D,
-			 const int __R)
+_mm512_mask_cvtph_epi16 (__m512i __A, __mmask32 __B, __m512h __C)
 {
-  return __builtin_ia32_vcvtss2sh_mask_round (__D, __C, __A, __B, __R);
+  return (__m512i)
+    __builtin_ia32_vcvtph2w512_mask_round (__C,
+					      (__v32hi) __A,
+					      __B,
+					      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_cvt_roundss_sh (__mmask8 __A, __m128h __B, __m128 __C,
-			  const int __R)
+_mm512_maskz_cvtph_epi16 (__mmask32 __A, __m512h __B)
 {
-  return __builtin_ia32_vcvtss2sh_mask_round (__C, __B,
-					      _mm_setzero_ph (),
-					      __A, __R);
+  return (__m512i)
+    __builtin_ia32_vcvtph2w512_mask_round (__B,
+					      (__v32hi)
+					      _mm512_setzero_si512 (),
+					      __A,
+					      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+#ifdef __OPTIMIZE__
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsd_sh (__m128h __A, __m128d __B, const int __R)
+_mm512_cvt_roundph_epi16 (__m512h __A, int __B)
 {
-  return __builtin_ia32_vcvtsd2sh_mask_round (__B, __A,
-					      _mm_setzero_ph (),
-					      (__mmask8) -1, __R);
+  return (__m512i)
+    __builtin_ia32_vcvtph2w512_mask_round (__A,
+					      (__v32hi)
+					      _mm512_setzero_si512 (),
+					      (__mmask32) -1,
+					      __B);
 }
 
-extern __inline __m128h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cvt_roundsd_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128d __D,
-			 const int __R)
+_mm512_mask_cvt_roundph_epi16 (__m512i __A, __mmask32 __B, __m512h __C, int __D)
 {
-  return __builtin_ia32_vcvtsd2sh_mask_round (__D, __C, __A, __B, __R);
+  return (__m512i)
+    __builtin_ia32_vcvtph2w512_mask_round (__C,
+					      (__v32hi) __A,
+					      __B,
+					      __D);
 }
 
-extern __inline __m128h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_cvt_roundsd_sh (__mmask8 __A, __m128h __B, __m128d __C,
-			  const int __R)
+_mm512_maskz_cvt_roundph_epi16 (__mmask32 __A, __m512h __B, int __C)
 {
-  return __builtin_ia32_vcvtsd2sh_mask_round (__C, __B,
-					      _mm_setzero_ph (),
-					      __A, __R);
+  return (__m512i)
+    __builtin_ia32_vcvtph2w512_mask_round (__B,
+					      (__v32hi)
+					      _mm512_setzero_si512 (),
+					      __A,
+					      __C);
 }
 
 #else
-#define _mm_cvt_roundss_sh(A, B, R)				\
-  (__builtin_ia32_vcvtss2sh_mask_round ((B), (A),		\
-					_mm_setzero_ph (),	\
-					(__mmask8) -1, R))
-
-#define _mm_mask_cvt_roundss_sh(A, B, C, D, R)				\
-  (__builtin_ia32_vcvtss2sh_mask_round ((D), (C), (A), (B), (R)))
-
-#define _mm_maskz_cvt_roundss_sh(A, B, C, R)			\
-  (__builtin_ia32_vcvtss2sh_mask_round ((C), (B),		\
-					_mm_setzero_ph (),	\
-					A, R))
-
-#define _mm_cvt_roundsd_sh(A, B, R)				\
-  (__builtin_ia32_vcvtsd2sh_mask_round ((B), (A),		\
-					_mm_setzero_ph (),	\
-					(__mmask8) -1, R))
+#define _mm512_cvt_roundph_epi16(A, B)					\
+  ((__m512i)__builtin_ia32_vcvtph2w512_mask_round ((A),		\
+						      (__v32hi)		\
+						      _mm512_setzero_si512 (), \
+						      (__mmask32)-1,	\
+						      (B)))
 
-#define _mm_mask_cvt_roundsd_sh(A, B, C, D, R)				\
-  (__builtin_ia32_vcvtsd2sh_mask_round ((D), (C), (A), (B), (R)))
+#define _mm512_mask_cvt_roundph_epi16(A, B, C, D)			\
+  ((__m512i)__builtin_ia32_vcvtph2w512_mask_round ((C),		\
+						      (__v32hi)(A),	\
+						      (B),		\
+						      (D)))
 
-#define _mm_maskz_cvt_roundsd_sh(A, B, C, R)			\
-  (__builtin_ia32_vcvtsd2sh_mask_round ((C), (B),		\
-					_mm_setzero_ph (),	\
-					(A), (R)))
+#define _mm512_maskz_cvt_roundph_epi16(A, B, C)				\
+  ((__m512i)__builtin_ia32_vcvtph2w512_mask_round ((B),		\
+						      (__v32hi)		\
+						      _mm512_setzero_si512 (), \
+						      (A),		\
+						      (C)))
 
 #endif /* __OPTIMIZE__ */
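
A small masking sketch for the vcvtph2w family, again illustrative
only and assuming -mavx512fp16 (which pulls in AVX512BW):

  #include <immintrin.h>

  /* Hypothetical helper: convert 32 halves to 32 x int16, keeping
     only the even lanes; the zero-mask variant clears the rest.  */
  static __m512i
  even_ph_to_i16 (__m512h v)
  {
    return _mm512_maskz_cvtph_epi16 ((__mmask32) 0x55555555, v);
  }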
 
-/* Intrinsics vfmaddsub[132,213,231]ph.  */
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmaddsub_ph (__m512h __A, __m512h __B, __m512h __C)
-{
-  return (__m512h)
-    __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A,
-					(__v32hf) __B,
-					(__v32hf) __C,
-					(__mmask32) -1,
-					_MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m512h
+/* Intrinsics vcvtph2uw.  */
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmaddsub_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
+_mm512_cvtph_epu16 (__m512h __A)
 {
-  return (__m512h)
-    __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A,
-					(__v32hf) __B,
-					(__v32hf) __C,
-					(__mmask32) __U,
-					_MM_FROUND_CUR_DIRECTION);
+  return (__m512i)
+    __builtin_ia32_vcvtph2uw512_mask_round (__A,
+					       (__v32hi)
+					       _mm512_setzero_si512 (),
+					       (__mmask32) -1,
+					       _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmaddsub_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
+_mm512_mask_cvtph_epu16 (__m512i __A, __mmask32 __B, __m512h __C)
 {
-  return (__m512h)
-    __builtin_ia32_vfmaddsubph512_mask3 ((__v32hf) __A,
-					 (__v32hf) __B,
-					 (__v32hf) __C,
-					 (__mmask32) __U,
-					 _MM_FROUND_CUR_DIRECTION);
+  return (__m512i)
+    __builtin_ia32_vcvtph2uw512_mask_round (__C, (__v32hi) __A, __B,
+					       _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmaddsub_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
+_mm512_maskz_cvtph_epu16 (__mmask32 __A, __m512h __B)
 {
-  return (__m512h)
-    __builtin_ia32_vfmaddsubph512_maskz ((__v32hf) __A,
-					 (__v32hf) __B,
-					 (__v32hf) __C,
-					 (__mmask32) __U,
-					 _MM_FROUND_CUR_DIRECTION);
+  return (__m512i)
+    __builtin_ia32_vcvtph2uw512_mask_round (__B,
+					       (__v32hi)
+					       _mm512_setzero_si512 (),
+					       __A,
+					       _MM_FROUND_CUR_DIRECTION);
 }
 
 #ifdef __OPTIMIZE__
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmaddsub_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
-{
-  return (__m512h)
-    __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A,
-					(__v32hf) __B,
-					(__v32hf) __C,
-					(__mmask32) -1, __R);
-}
-
-extern __inline __m512h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmaddsub_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
-			       __m512h __C, const int __R)
+_mm512_cvt_roundph_epu16 (__m512h __A, int __B)
 {
-  return (__m512h)
-    __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A,
-					(__v32hf) __B,
-					(__v32hf) __C,
-					(__mmask32) __U, __R);
+  return (__m512i)
+    __builtin_ia32_vcvtph2uw512_mask_round (__A,
+					       (__v32hi)
+					       _mm512_setzero_si512 (),
+					       (__mmask32) -1,
+					       __B);
 }
 
-extern __inline __m512h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmaddsub_round_ph (__m512h __A, __m512h __B, __m512h __C,
-				__mmask32 __U, const int __R)
+_mm512_mask_cvt_roundph_epu16 (__m512i __A, __mmask32 __B, __m512h __C, int __D)
 {
-  return (__m512h)
-    __builtin_ia32_vfmaddsubph512_mask3 ((__v32hf) __A,
-					 (__v32hf) __B,
-					 (__v32hf) __C,
-					 (__mmask32) __U, __R);
+  return (__m512i)
+    __builtin_ia32_vcvtph2uw512_mask_round (__C, (__v32hi) __A, __B, __D);
 }
 
-extern __inline __m512h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmaddsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
-				__m512h __C, const int __R)
+_mm512_maskz_cvt_roundph_epu16 (__mmask32 __A, __m512h __B, int __C)
 {
-  return (__m512h)
-    __builtin_ia32_vfmaddsubph512_maskz ((__v32hf) __A,
-					 (__v32hf) __B,
-					 (__v32hf) __C,
-					 (__mmask32) __U, __R);
+  return (__m512i)
+    __builtin_ia32_vcvtph2uw512_mask_round (__B,
+					       (__v32hi)
+					       _mm512_setzero_si512 (),
+					       __A,
+					       __C);
 }
 
 #else
-#define _mm512_fmaddsub_round_ph(A, B, C, R)				\
-  ((__m512h)__builtin_ia32_vfmaddsubph512_mask ((A), (B), (C), -1, (R)))
-
-#define _mm512_mask_fmaddsub_round_ph(A, U, B, C, R)			\
-  ((__m512h)__builtin_ia32_vfmaddsubph512_mask ((A), (B), (C), (U), (R)))
+#define _mm512_cvt_roundph_epu16(A, B)					\
+  ((__m512i)								\
+   __builtin_ia32_vcvtph2uw512_mask_round ((A),			\
+					      (__v32hi)			\
+					      _mm512_setzero_si512 (),	\
+					      (__mmask32)-1, (B)))
 
-#define _mm512_mask3_fmaddsub_round_ph(A, B, C, U, R)			\
-  ((__m512h)__builtin_ia32_vfmaddsubph512_mask3 ((A), (B), (C), (U), (R)))
+#define _mm512_mask_cvt_roundph_epu16(A, B, C, D)			\
+  ((__m512i)								\
+   __builtin_ia32_vcvtph2uw512_mask_round ((C), (__v32hi)(A), (B), (D)))
 
-#define _mm512_maskz_fmaddsub_round_ph(U, A, B, C, R)			\
-  ((__m512h)__builtin_ia32_vfmaddsubph512_maskz ((A), (B), (C), (U), (R)))
+#define _mm512_maskz_cvt_roundph_epu16(A, B, C)				\
+  ((__m512i)								\
+   __builtin_ia32_vcvtph2uw512_mask_round ((B),			\
+					      (__v32hi)			\
+					      _mm512_setzero_si512 (),	\
+					      (A),			\
+					      (C)))
 
 #endif /* __OPTIMIZE__ */
 
-/* Intrinsics vfmsubadd[132,213,231]ph.  */
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-  _mm512_fmsubadd_ph (__m512h __A, __m512h __B, __m512h __C)
-{
-  return (__m512h)
-    __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A,
-					(__v32hf) __B,
-					(__v32hf) __C,
-					(__mmask32) -1,
-					_MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m512h
+/* Intrinsics vcvttph2w.  */
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmsubadd_ph (__m512h __A, __mmask32 __U,
-			 __m512h __B, __m512h __C)
+_mm512_cvttph_epi16 (__m512h __A)
 {
-  return (__m512h)
-    __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A,
-					(__v32hf) __B,
-					(__v32hf) __C,
-					(__mmask32) __U,
-					_MM_FROUND_CUR_DIRECTION);
+  return (__m512i)
+    __builtin_ia32_vcvttph2w512_mask_round (__A,
+					    (__v32hi)
+					    _mm512_setzero_si512 (),
+					    (__mmask32) -1,
+					    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmsubadd_ph (__m512h __A, __m512h __B,
-			  __m512h __C, __mmask32 __U)
+_mm512_mask_cvttph_epi16 (__m512i __A, __mmask32 __B, __m512h __C)
 {
-  return (__m512h)
-    __builtin_ia32_vfmsubaddph512_mask3 ((__v32hf) __A,
-					 (__v32hf) __B,
-					 (__v32hf) __C,
-					 (__mmask32) __U,
-					 _MM_FROUND_CUR_DIRECTION);
+  return (__m512i)
+    __builtin_ia32_vcvttph2w512_mask_round (__C,
+					    (__v32hi) __A,
+					    __B,
+					    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmsubadd_ph (__mmask32 __U, __m512h __A,
-			  __m512h __B, __m512h __C)
+_mm512_maskz_cvttph_epi16 (__mmask32 __A, __m512h __B)
 {
-  return (__m512h)
-    __builtin_ia32_vfmsubaddph512_maskz ((__v32hf) __A,
-					 (__v32hf) __B,
-					 (__v32hf) __C,
-					 (__mmask32) __U,
-					 _MM_FROUND_CUR_DIRECTION);
+  return (__m512i)
+    __builtin_ia32_vcvttph2w512_mask_round (__B,
+					    (__v32hi)
+					    _mm512_setzero_si512 (),
+					    __A,
+					    _MM_FROUND_CUR_DIRECTION);
 }
 
 #ifdef __OPTIMIZE__
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmsubadd_round_ph (__m512h __A, __m512h __B,
-			  __m512h __C, const int __R)
-{
-  return (__m512h)
-    __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A,
-					(__v32hf) __B,
-					(__v32hf) __C,
-					(__mmask32) -1, __R);
-}
-
-extern __inline __m512h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmsubadd_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
-			       __m512h __C, const int __R)
+_mm512_cvtt_roundph_epi16 (__m512h __A, int __B)
 {
-  return (__m512h)
-    __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A,
-					(__v32hf) __B,
-					(__v32hf) __C,
-					(__mmask32) __U, __R);
+  return (__m512i)
+    __builtin_ia32_vcvttph2w512_mask_round (__A,
+					    (__v32hi)
+					    _mm512_setzero_si512 (),
+					    (__mmask32) -1,
+					    __B);
 }
 
-extern __inline __m512h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmsubadd_round_ph (__m512h __A, __m512h __B, __m512h __C,
-				__mmask32 __U, const int __R)
+_mm512_mask_cvtt_roundph_epi16 (__m512i __A, __mmask32 __B,
+				__m512h __C, int __D)
 {
-  return (__m512h)
-    __builtin_ia32_vfmsubaddph512_mask3 ((__v32hf) __A,
-					 (__v32hf) __B,
-					 (__v32hf) __C,
-					 (__mmask32) __U, __R);
+  return (__m512i)
+    __builtin_ia32_vcvttph2w512_mask_round (__C,
+					    (__v32hi) __A,
+					    __B,
+					    __D);
 }
 
-extern __inline __m512h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmsubadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
-				__m512h __C, const int __R)
+_mm512_maskz_cvtt_roundph_epi16 (__mmask32 __A, __m512h __B, int __C)
 {
-  return (__m512h)
-    __builtin_ia32_vfmsubaddph512_maskz ((__v32hf) __A,
-					 (__v32hf) __B,
-					 (__v32hf) __C,
-					 (__mmask32) __U, __R);
+  return (__m512i)
+    __builtin_ia32_vcvttph2w512_mask_round (__B,
+					    (__v32hi)
+					    _mm512_setzero_si512 (),
+					    __A,
+					    __C);
 }
 
 #else
-#define _mm512_fmsubadd_round_ph(A, B, C, R)				\
-  ((__m512h)__builtin_ia32_vfmsubaddph512_mask ((A), (B), (C), -1, (R)))
-
-#define _mm512_mask_fmsubadd_round_ph(A, U, B, C, R)			\
-  ((__m512h)__builtin_ia32_vfmsubaddph512_mask ((A), (B), (C), (U), (R)))
+#define _mm512_cvtt_roundph_epi16(A, B)				    \
+  ((__m512i)							    \
+   __builtin_ia32_vcvttph2w512_mask_round ((A),			    \
+					   (__v32hi)		    \
+					   _mm512_setzero_si512 (), \
+					   (__mmask32)-1,	    \
+					   (B)))
 
-#define _mm512_mask3_fmsubadd_round_ph(A, B, C, U, R)			\
-  ((__m512h)__builtin_ia32_vfmsubaddph512_mask3 ((A), (B), (C), (U), (R)))
+#define _mm512_mask_cvtt_roundph_epi16(A, B, C, D)		\
+  ((__m512i)							\
+   __builtin_ia32_vcvttph2w512_mask_round ((C),			\
+					   (__v32hi)(A),	\
+					   (B),			\
+					   (D)))
 
-#define _mm512_maskz_fmsubadd_round_ph(U, A, B, C, R)			\
-  ((__m512h)__builtin_ia32_vfmsubaddph512_maskz ((A), (B), (C), (U), (R)))
+#define _mm512_maskz_cvtt_roundph_epi16(A, B, C)		    \
+  ((__m512i)							    \
+   __builtin_ia32_vcvttph2w512_mask_round ((B),			    \
+					   (__v32hi)		    \
+					   _mm512_setzero_si512 (), \
+					   (A),			    \
+					   (C)))
 
 #endif /* __OPTIMIZE__ */
 
-/* Intrinsics vfmadd[132,213,231]ph.  */
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-  _mm512_fmadd_ph (__m512h __A, __m512h __B, __m512h __C)
-{
-  return (__m512h)
-    __builtin_ia32_vfmaddph512_mask ((__v32hf) __A,
-				     (__v32hf) __B,
-				     (__v32hf) __C,
-				     (__mmask32) -1,
-				     _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmadd_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
-{
-  return (__m512h)
-    __builtin_ia32_vfmaddph512_mask ((__v32hf) __A,
-				     (__v32hf) __B,
-				     (__v32hf) __C,
-				     (__mmask32) __U,
-				     _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m512h
+/* Intrinsics vcvttph2uw.  */
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmadd_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
+_mm512_cvttph_epu16 (__m512h __A)
 {
-  return (__m512h)
-    __builtin_ia32_vfmaddph512_mask3 ((__v32hf) __A,
-				      (__v32hf) __B,
-				      (__v32hf) __C,
-				      (__mmask32) __U,
-				      _MM_FROUND_CUR_DIRECTION);
+  return (__m512i)
+    __builtin_ia32_vcvttph2uw512_mask_round (__A,
+					     (__v32hi)
+					     _mm512_setzero_si512 (),
+					     (__mmask32) -1,
+					     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmadd_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
+_mm512_mask_cvttph_epu16 (__m512i __A, __mmask32 __B, __m512h __C)
 {
-  return (__m512h)
-    __builtin_ia32_vfmaddph512_maskz ((__v32hf) __A,
-				      (__v32hf) __B,
-				      (__v32hf) __C,
-				      (__mmask32) __U,
-				      _MM_FROUND_CUR_DIRECTION);
+  return (__m512i)
+    __builtin_ia32_vcvttph2uw512_mask_round (__C,
+					     (__v32hi) __A,
+					     __B,
+					     _MM_FROUND_CUR_DIRECTION);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m512h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmadd_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
+_mm512_maskz_cvttph_epu16 (__mmask32 __A, __m512h __B)
 {
-  return (__m512h) __builtin_ia32_vfmaddph512_mask ((__v32hf) __A,
-						       (__v32hf) __B,
-						       (__v32hf) __C,
-						       (__mmask32) -1, __R);
+  return (__m512i)
+    __builtin_ia32_vcvttph2uw512_mask_round (__B,
+					     (__v32hi)
+					     _mm512_setzero_si512 (),
+					     __A,
+					     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+#ifdef __OPTIMIZE__
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmadd_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
-			       __m512h __C, const int __R)
+_mm512_cvtt_roundph_epu16 (__m512h __A, int __B)
 {
-  return (__m512h) __builtin_ia32_vfmaddph512_mask ((__v32hf) __A,
-						       (__v32hf) __B,
-						       (__v32hf) __C,
-						       (__mmask32) __U, __R);
+  return (__m512i)
+    __builtin_ia32_vcvttph2uw512_mask_round (__A,
+					     (__v32hi)
+					     _mm512_setzero_si512 (),
+					     (__mmask32) -1,
+					     __B);
 }
 
-extern __inline __m512h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmadd_round_ph (__m512h __A, __m512h __B, __m512h __C,
-				__mmask32 __U, const int __R)
+_mm512_mask_cvtt_roundph_epu16 (__m512i __A, __mmask32 __B,
+				__m512h __C, int __D)
 {
-  return (__m512h) __builtin_ia32_vfmaddph512_mask3 ((__v32hf) __A,
-							(__v32hf) __B,
-							(__v32hf) __C,
-							(__mmask32) __U, __R);
+  return (__m512i)
+    __builtin_ia32_vcvttph2uw512_mask_round (__C,
+					     (__v32hi) __A,
+					     __B,
+					     __D);
 }
 
-extern __inline __m512h
+extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
-				__m512h __C, const int __R)
+_mm512_maskz_cvtt_roundph_epu16 (__mmask32 __A, __m512h __B, int __C)
 {
-  return (__m512h) __builtin_ia32_vfmaddph512_maskz ((__v32hf) __A,
-							(__v32hf) __B,
-							(__v32hf) __C,
-							(__mmask32) __U, __R);
+  return (__m512i)
+    __builtin_ia32_vcvttph2uw512_mask_round (__B,
+					     (__v32hi)
+					     _mm512_setzero_si512 (),
+					     __A,
+					     __C);
 }
 
 #else
-#define _mm512_fmadd_round_ph(A, B, C, R)				\
-  ((__m512h)__builtin_ia32_vfmaddph512_mask ((A), (B), (C), -1, (R)))
-
-#define _mm512_mask_fmadd_round_ph(A, U, B, C, R)			\
-  ((__m512h)__builtin_ia32_vfmaddph512_mask ((A), (B), (C), (U), (R)))
+#define _mm512_cvtt_roundph_epu16(A, B)				     \
+  ((__m512i)							     \
+   __builtin_ia32_vcvttph2uw512_mask_round ((A),		     \
+					    (__v32hi)		     \
+					    _mm512_setzero_si512 (), \
+					    (__mmask32)-1,	     \
+					    (B)))
 
-#define _mm512_mask3_fmadd_round_ph(A, B, C, U, R)			\
-  ((__m512h)__builtin_ia32_vfmaddph512_mask3 ((A), (B), (C), (U), (R)))
+#define _mm512_mask_cvtt_roundph_epu16(A, B, C, D)		\
+  ((__m512i)							\
+   __builtin_ia32_vcvttph2uw512_mask_round ((C),		\
+					    (__v32hi)(A),	\
+					    (B),		\
+					    (D)))
 
-#define _mm512_maskz_fmadd_round_ph(U, A, B, C, R)			\
-  ((__m512h)__builtin_ia32_vfmaddph512_maskz ((A), (B), (C), (U), (R)))
+#define _mm512_maskz_cvtt_roundph_epu16(A, B, C)		     \
+  ((__m512i)							     \
+   __builtin_ia32_vcvttph2uw512_mask_round ((B),		     \
+					    (__v32hi)		     \
+					    _mm512_setzero_si512 (), \
+					    (A),		     \
+					    (C)))
 
 #endif /* __OPTIMIZE__ */
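
To make the cvt/cvtt distinction concrete: the vcvttph2{w,uw} forms
truncate toward zero regardless of MXCSR, so their _round variants
only take SAE, while the vcvtph2{w,uw} forms honor a full rounding
mode.  A sketch under the same -mavx512fp16 assumption:

  #include <immintrin.h>

  /* Hypothetical helper contrasting the two conversion flavors.  */
  static void
  cvt_vs_cvtt (__m512h v, __m512i *rounded, __m512i *truncated)
  {
    *rounded = _mm512_cvt_roundph_epu16 (v, _MM_FROUND_TO_NEAREST_INT
					    | _MM_FROUND_NO_EXC);
    *truncated = _mm512_cvtt_roundph_epu16 (v, _MM_FROUND_NO_EXC);
  }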
 
-/* Intrinsics vfnmadd[132,213,231]ph.  */
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fnmadd_ph (__m512h __A, __m512h __B, __m512h __C)
-{
-  return (__m512h)
-    __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A,
-				      (__v32hf) __B,
-				      (__v32hf) __C,
-				      (__mmask32) -1,
-				      _MM_FROUND_CUR_DIRECTION);
-}
-
+/* Intrinsics vcvtw2ph.  */
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fnmadd_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
+_mm512_cvtepi16_ph (__m512i __A)
 {
-  return (__m512h)
-    __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A,
-				      (__v32hf) __B,
-				      (__v32hf) __C,
-				      (__mmask32) __U,
-				      _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __A,
+						_mm512_setzero_ph (),
+						(__mmask32) -1,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fnmadd_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
+_mm512_mask_cvtepi16_ph (__m512h __A, __mmask32 __B, __m512i __C)
 {
-  return (__m512h)
-    __builtin_ia32_vfnmaddph512_mask3 ((__v32hf) __A,
-				       (__v32hf) __B,
-				       (__v32hf) __C,
-				       (__mmask32) __U,
-				       _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __C,
+						__A,
+						__B,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fnmadd_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
+_mm512_maskz_cvtepi16_ph (__mmask32 __A, __m512i __B)
 {
-  return (__m512h)
-    __builtin_ia32_vfnmaddph512_maskz ((__v32hf) __A,
-				       (__v32hf) __B,
-				       (__v32hf) __C,
-				       (__mmask32) __U,
-				       _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __B,
+						_mm512_setzero_ph (),
+						__A,
+						_MM_FROUND_CUR_DIRECTION);
 }
 
 #ifdef __OPTIMIZE__
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fnmadd_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
-{
-  return (__m512h) __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A,
-						       (__v32hf) __B,
-						       (__v32hf) __C,
-						       (__mmask32) -1, __R);
-}
-
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fnmadd_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
-			       __m512h __C, const int __R)
+_mm512_cvt_roundepi16_ph (__m512i __A, int __B)
 {
-  return (__m512h) __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A,
-						       (__v32hf) __B,
-						       (__v32hf) __C,
-						       (__mmask32) __U, __R);
+  return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __A,
+						_mm512_setzero_ph (),
+						(__mmask32) -1,
+						__B);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fnmadd_round_ph (__m512h __A, __m512h __B, __m512h __C,
-				__mmask32 __U, const int __R)
+_mm512_mask_cvt_roundepi16_ph (__m512h __A, __mmask32 __B, __m512i __C, int __D)
 {
-  return (__m512h) __builtin_ia32_vfnmaddph512_mask3 ((__v32hf) __A,
-							(__v32hf) __B,
-							(__v32hf) __C,
-							(__mmask32) __U, __R);
+  return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __C,
+						__A,
+						__B,
+						__D);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fnmadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
-				__m512h __C, const int __R)
+_mm512_maskz_cvt_roundepi16_ph (__mmask32 __A, __m512i __B, int __C)
 {
-  return (__m512h) __builtin_ia32_vfnmaddph512_maskz ((__v32hf) __A,
-							(__v32hf) __B,
-							(__v32hf) __C,
-							(__mmask32) __U, __R);
+  return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __B,
+						_mm512_setzero_ph (),
+						__A,
+						__C);
 }
 
 #else
-#define _mm512_fnmadd_round_ph(A, B, C, R)				\
-  ((__m512h)__builtin_ia32_vfnmaddph512_mask ((A), (B), (C), -1, (R)))
-
-#define _mm512_mask_fnmadd_round_ph(A, U, B, C, R)			\
-  ((__m512h)__builtin_ia32_vfnmaddph512_mask ((A), (B), (C), (U), (R)))
+#define _mm512_cvt_roundepi16_ph(A, B)				\
+  (__builtin_ia32_vcvtw2ph512_mask_round ((__v32hi)(A),		\
+					  _mm512_setzero_ph (),	\
+					  (__mmask32)-1,	\
+					  (B)))
 
-#define _mm512_mask3_fnmadd_round_ph(A, B, C, U, R)			\
-  ((__m512h)__builtin_ia32_vfnmaddph512_mask3 ((A), (B), (C), (U), (R)))
+#define _mm512_mask_cvt_roundepi16_ph(A, B, C, D)	\
+  (__builtin_ia32_vcvtw2ph512_mask_round ((__v32hi)(C),	\
+					  (A),		\
+					  (B),		\
+					  (D)))
 
-#define _mm512_maskz_fnmadd_round_ph(U, A, B, C, R)			\
-  ((__m512h)__builtin_ia32_vfnmaddph512_maskz ((A), (B), (C), (U), (R)))
+#define _mm512_maskz_cvt_roundepi16_ph(A, B, C)			\
+  (__builtin_ia32_vcvtw2ph512_mask_round ((__v32hi)(B),		\
+					  _mm512_setzero_ph (),	\
+					  (A),			\
+					  (C)))
 
 #endif /* __OPTIMIZE__ */
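
Since vcvtw2ph and vcvtph2w are exact inverses for values a _Float16
can represent, a round trip makes a handy smoke test (illustrative,
-mavx512fp16 assumed):

  #include <immintrin.h>

  /* Hypothetical helper: int16 -> _Float16 -> int16.  Exact for
     |x| <= 2048; larger magnitudes round in the half format.  */
  static __m512i
  round_trip_i16 (__m512i v)
  {
    return _mm512_cvtph_epi16 (_mm512_cvtepi16_ph (v));
  }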
 
-/* Intrinsics vfmsub[132,213,231]ph.  */
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmsub_ph (__m512h __A, __m512h __B, __m512h __C)
-{
-  return (__m512h)
-    __builtin_ia32_vfmsubph512_mask ((__v32hf) __A,
-				     (__v32hf) __B,
-				     (__v32hf) __C,
-				     (__mmask32) -1,
-				     _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmsub_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
-{
-  return (__m512h)
-    __builtin_ia32_vfmsubph512_mask ((__v32hf) __A,
-				     (__v32hf) __B,
-				     (__v32hf) __C,
-				     (__mmask32) __U,
-				     _MM_FROUND_CUR_DIRECTION);
-}
+/* Intrinsics vcvtuw2ph.  */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtepu16_ph (__m512i __A)
+{
+  return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __A,
+						 _mm512_setzero_ph (),
+						 (__mmask32) -1,
+						 _MM_FROUND_CUR_DIRECTION);
+}
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmsub_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
+_mm512_mask_cvtepu16_ph (__m512h __A, __mmask32 __B, __m512i __C)
 {
-  return (__m512h)
-    __builtin_ia32_vfmsubph512_mask3 ((__v32hf) __A,
-				      (__v32hf) __B,
-				      (__v32hf) __C,
-				      (__mmask32) __U,
-				      _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __C,
+						 __A,
+						 __B,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmsub_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
+_mm512_maskz_cvtepu16_ph (__mmask32 __A, __m512i __B)
 {
-  return (__m512h)
-    __builtin_ia32_vfmsubph512_maskz ((__v32hf) __A,
-				      (__v32hf) __B,
-				      (__v32hf) __C,
-				      (__mmask32) __U,
-				      _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __B,
+						 _mm512_setzero_ph (),
+						 __A,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
 #ifdef __OPTIMIZE__
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmsub_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
-{
-  return (__m512h) __builtin_ia32_vfmsubph512_mask ((__v32hf) __A,
-						       (__v32hf) __B,
-						       (__v32hf) __C,
-						       (__mmask32) -1, __R);
-}
-
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmsub_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
-			       __m512h __C, const int __R)
+_mm512_cvt_roundepu16_ph (__m512i __A, int __B)
 {
-  return (__m512h) __builtin_ia32_vfmsubph512_mask ((__v32hf) __A,
-						       (__v32hf) __B,
-						       (__v32hf) __C,
-						       (__mmask32) __U, __R);
+  return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __A,
+						 _mm512_setzero_ph (),
+						 (__mmask32) -1,
+						 __B);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmsub_round_ph (__m512h __A, __m512h __B, __m512h __C,
-				__mmask32 __U, const int __R)
+_mm512_mask_cvt_roundepu16_ph (__m512h __A, __mmask32 __B, __m512i __C, int __D)
 {
-  return (__m512h) __builtin_ia32_vfmsubph512_mask3 ((__v32hf) __A,
-							(__v32hf) __B,
-							(__v32hf) __C,
-							(__mmask32) __U, __R);
+  return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __C,
+						 __A,
+						 __B,
+						 __D);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
-				__m512h __C, const int __R)
+_mm512_maskz_cvt_roundepu16_ph (__mmask32 __A, __m512i __B, int __C)
 {
-  return (__m512h) __builtin_ia32_vfmsubph512_maskz ((__v32hf) __A,
-							(__v32hf) __B,
-							(__v32hf) __C,
-							(__mmask32) __U, __R);
+  return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __B,
+						 _mm512_setzero_ph (),
+						 __A,
+						 __C);
 }
 
 #else
-#define _mm512_fmsub_round_ph(A, B, C, R)				\
-  ((__m512h)__builtin_ia32_vfmsubph512_mask ((A), (B), (C), -1, (R)))
-
-#define _mm512_mask_fmsub_round_ph(A, U, B, C, R)			\
-  ((__m512h)__builtin_ia32_vfmsubph512_mask ((A), (B), (C), (U), (R)))
+#define _mm512_cvt_roundepu16_ph(A, B)					\
+  (__builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi)(A),		\
+					   _mm512_setzero_ph (),	\
+					   (__mmask32)-1,		\
+					   (B)))
 
-#define _mm512_mask3_fmsub_round_ph(A, B, C, U, R)			\
-  ((__m512h)__builtin_ia32_vfmsubph512_mask3 ((A), (B), (C), (U), (R)))
+#define _mm512_mask_cvt_roundepu16_ph(A, B, C, D)		\
+  (__builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi)(C),	\
+					   (A),			\
+					   (B),			\
+					   (D)))
 
-#define _mm512_maskz_fmsub_round_ph(U, A, B, C, R)			\
-  ((__m512h)__builtin_ia32_vfmsubph512_maskz ((A), (B), (C), (U), (R)))
+#define _mm512_maskz_cvt_roundepu16_ph(A, B, C)				\
+  (__builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi)(B),		\
+					   _mm512_setzero_ph (),	\
+					   (A),				\
+					   (C)))
 
 #endif /* __OPTIMIZE__ */
 
-/* Intrinsics vfnmsub[132,213,231]ph.  */
-extern __inline __m512h
+/* Intrinsics vcvtph2pd.  */
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fnmsub_ph (__m512h __A, __m512h __B, __m512h __C)
+_mm512_cvtph_pd (__m128h __A)
 {
-  return (__m512h)
-    __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A,
-				      (__v32hf) __B,
-				      (__v32hf) __C,
-				      (__mmask32) -1,
-				      _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtph2pd512_mask_round (__A,
+						 _mm512_setzero_pd (),
+						 (__mmask8) -1,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fnmsub_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
+_mm512_mask_cvtph_pd (__m512d __A, __mmask8 __B, __m128h __C)
 {
-  return (__m512h)
-    __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A,
-				      (__v32hf) __B,
-				      (__v32hf) __C,
-				      (__mmask32) __U,
-				      _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtph2pd512_mask_round (__C, __A, __B,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fnmsub_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
+_mm512_maskz_cvtph_pd (__mmask8 __A, __m128h __B)
 {
-  return (__m512h)
-    __builtin_ia32_vfnmsubph512_mask3 ((__v32hf) __A,
-				       (__v32hf) __B,
-				       (__v32hf) __C,
-				       (__mmask32) __U,
-				       _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtph2pd512_mask_round (__B,
+						 _mm512_setzero_pd (),
+						 __A,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+#ifdef __OPTIMIZE__
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fnmsub_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
+_mm512_cvt_roundph_pd (__m128h __A, int __B)
 {
-  return (__m512h)
-    __builtin_ia32_vfnmsubph512_maskz ((__v32hf) __A,
-				       (__v32hf) __B,
-				       (__v32hf) __C,
-				       (__mmask32) __U,
-				       _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtph2pd512_mask_round (__A,
+						 _mm512_setzero_pd (),
+						 (__mmask8) -1,
+						 __B);
 }
 
-#ifdef __OPTIMIZE__
-extern __inline __m512h
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fnmsub_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
+_mm512_mask_cvt_roundph_pd (__m512d __A, __mmask8 __B, __m128h __C, int __D)
 {
-  return (__m512h) __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A,
-						       (__v32hf) __B,
-						       (__v32hf) __C,
-						       (__mmask32) -1, __R);
+  return __builtin_ia32_vcvtph2pd512_mask_round (__C, __A, __B, __D);
 }
 
-extern __inline __m512h
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fnmsub_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
-			       __m512h __C, const int __R)
+_mm512_maskz_cvt_roundph_pd (__mmask8 __A, __m128h __B, int __C)
 {
-  return (__m512h) __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A,
-						       (__v32hf) __B,
-						       (__v32hf) __C,
-						       (__mmask32) __U, __R);
+  return __builtin_ia32_vcvtph2pd512_mask_round (__B,
+						 _mm512_setzero_pd (),
+						 __A,
+						 __C);
 }
 
-extern __inline __m512h
+#else
+#define _mm512_cvt_roundph_pd(A, B)					\
+  (__builtin_ia32_vcvtph2pd512_mask_round ((A),			\
+					   _mm512_setzero_pd (),	\
+					   (__mmask8)-1,		\
+					   (B)))
+
+#define _mm512_mask_cvt_roundph_pd(A, B, C, D)				\
+  (__builtin_ia32_vcvtph2pd512_mask_round ((C), (A), (B), (D)))
+
+#define _mm512_maskz_cvt_roundph_pd(A, B, C)				\
+  (__builtin_ia32_vcvtph2pd512_mask_round ((B),			\
+					   _mm512_setzero_pd (),	\
+					   (A),			\
+					   (C)))
+
+#endif /* __OPTIMIZE__ */
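
One more hedged sketch, this time for the widening direction, using
the merge-masked variant (names invented, -mavx512fp16 assumed):

  #include <immintrin.h>

  /* Hypothetical helper: widen the 8 _Float16 lanes of an __m128h
     to 8 doubles where the mask bit is set; the remaining lanes
     keep the values from OLD.  */
  static __m512d
  widen_low4 (__m512d old, __m128h x)
  {
    return _mm512_mask_cvtph_pd (old, (__mmask8) 0x0f, x);
  }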
+
+/* Intrinsics vcvtph2psx.  */
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fnmsub_round_ph (__m512h __A, __m512h __B, __m512h __C,
-				__mmask32 __U, const int __R)
+_mm512_cvtxph_ps (__m256h __A)
 {
-  return (__m512h) __builtin_ia32_vfnmsubph512_mask3 ((__v32hf) __A,
-							(__v32hf) __B,
-							(__v32hf) __C,
-							(__mmask32) __U, __R);
+  return __builtin_ia32_vcvtph2psx512_mask_round (__A,
+						  _mm512_setzero_ps (),
+						  (__mmask16) -1,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m512h
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fnmsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
-				__m512h __C, const int __R)
+_mm512_mask_cvtxph_ps (__m512 __A, __mmask16 __B, __m256h __C)
 {
-  return (__m512h) __builtin_ia32_vfnmsubph512_maskz ((__v32hf) __A,
-							(__v32hf) __B,
-							(__v32hf) __C,
-							(__mmask32) __U, __R);
+  return __builtin_ia32_vcvtph2psx512_mask_round (__C, __A, __B,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-#else
-#define _mm512_fnmsub_round_ph(A, B, C, R)				\
-  ((__m512h)__builtin_ia32_vfnmsubph512_mask ((A), (B), (C), -1, (R)))
-
-#define _mm512_mask_fnmsub_round_ph(A, U, B, C, R)			\
-  ((__m512h)__builtin_ia32_vfnmsubph512_mask ((A), (B), (C), (U), (R)))
-
-#define _mm512_mask3_fnmsub_round_ph(A, B, C, U, R)			\
-  ((__m512h)__builtin_ia32_vfnmsubph512_mask3 ((A), (B), (C), (U), (R)))
-
-#define _mm512_maskz_fnmsub_round_ph(U, A, B, C, R)			\
-  ((__m512h)__builtin_ia32_vfnmsubph512_maskz ((A), (B), (C), (U), (R)))
-
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vfmadd[132,213,231]sh.  */
-extern __inline __m128h
-  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fmadd_sh (__m128h __W, __m128h __A, __m128h __B)
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtxph_ps (__mmask16 __A, __m256h __B)
 {
-  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
-						  (__v8hf) __A,
-						  (__v8hf) __B,
-						  (__mmask8) -1,
+  return __builtin_ia32_vcvtph2psx512_mask_round (__B,
+						  _mm512_setzero_ps (),
+						  __A,
 						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+#ifdef __OPTIMIZE__
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtx_roundph_ps (__m256h __A, int __B)
+{
+  return __builtin_ia32_vcvtph2psx512_mask_round (__A,
+						  _mm512_setzero_ps (),
+						  (__mmask16) -1,
+						  __B);
+}
+
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmadd_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
+_mm512_mask_cvtx_roundph_ps (__m512 __A, __mmask16 __B, __m256h __C, int __D)
 {
-  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
-						  (__v8hf) __A,
-						  (__v8hf) __B,
-						  (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtph2psx512_mask_round (__C, __A, __B, __D);
 }
 
-extern __inline __m128h
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmadd_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U)
+_mm512_maskz_cvtx_roundph_ps (__mmask16 __A, __m256h __B, int __C)
 {
-  return (__m128h) __builtin_ia32_vfmaddsh3_mask3 ((__v8hf) __W,
-						   (__v8hf) __A,
-						   (__v8hf) __B,
-						   (__mmask8) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtph2psx512_mask_round (__B,
+						  _mm512_setzero_ps (),
+						  __A,
+						  __C);
 }
 
-extern __inline __m128h
+#else
+#define _mm512_cvtx_roundph_ps(A, B)					\
+  (__builtin_ia32_vcvtph2psx512_mask_round ((A),			\
+					    _mm512_setzero_ps (),	\
+					    (__mmask16)-1,		\
+					    (B)))
+
+#define _mm512_mask_cvtx_roundph_ps(A, B, C, D)				\
+  (__builtin_ia32_vcvtph2psx512_mask_round ((C), (A), (B), (D)))
+
+#define _mm512_maskz_cvtx_roundph_ps(A, B, C)				\
+  (__builtin_ia32_vcvtph2psx512_mask_round ((B),			\
+					    _mm512_setzero_ps (),	\
+					    (A),			\
+					    (C)))
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvtps2phx.  */
+extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmadd_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B)
+_mm512_cvtxps_ph (__m512 __A)
 {
-  return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
-						   (__v8hf) __A,
-						   (__v8hf) __B,
-						   (__mmask8) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __A,
+						  _mm256_setzero_ph (),
+						  (__mmask16) -1,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtxps_ph (__m256h __A, __mmask16 __B, __m512 __C)
+{
+  return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __C,
+						  __A, __B,
+						  _MM_FROUND_CUR_DIRECTION);
+}
 
-#ifdef __OPTIMIZE__
-extern __inline __m128h
+extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R)
+_mm512_maskz_cvtxps_ph (__mmask16 __A, __m512 __B)
 {
-  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
-						  (__v8hf) __A,
-						  (__v8hf) __B,
-						  (__mmask8) -1,
-						  __R);
+  return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __B,
+						  _mm256_setzero_ph (),
+						  __A,
+						  _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+#ifdef __OPTIMIZE__
+extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmadd_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B,
-			 const int __R)
+_mm512_cvtx_roundps_ph (__m512 __A, int __B)
 {
-  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
-						  (__v8hf) __A,
-						  (__v8hf) __B,
-						  (__mmask8) __U, __R);
+  return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __A,
+						  _mm256_setzero_ph (),
+						  (__mmask16) -1,
+						  __B);
 }
 
-extern __inline __m128h
+extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U,
-			  const int __R)
+_mm512_mask_cvtx_roundps_ph (__m256h __A, __mmask16 __B, __m512 __C, int __D)
 {
-  return (__m128h) __builtin_ia32_vfmaddsh3_mask3 ((__v8hf) __W,
-						   (__v8hf) __A,
-						   (__v8hf) __B,
-						   (__mmask8) __U, __R);
+  return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __C,
+						  __A, __B, __D);
 }
 
-extern __inline __m128h
+extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmadd_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
-			  __m128h __B, const int __R)
+_mm512_maskz_cvtx_roundps_ph (__mmask16 __A, __m512 __B, int __C)
 {
-  return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
-						   (__v8hf) __A,
-						   (__v8hf) __B,
-						   (__mmask8) __U, __R);
+  return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __B,
+						  _mm256_setzero_ph (),
+						  __A, __C);
 }
 
 #else
-#define _mm_fmadd_round_sh(A, B, C, R)					\
-  ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), (C), (-1), (R)))
-#define _mm_mask_fmadd_round_sh(A, U, B, C, R)				\
-  ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), (C), (U), (R)))
-#define _mm_mask3_fmadd_round_sh(A, B, C, U, R)				\
-  ((__m128h) __builtin_ia32_vfmaddsh3_mask3 ((A), (B), (C), (U), (R)))
-#define _mm_maskz_fmadd_round_sh(U, A, B, C, R)				\
-  ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), (B), (C), (U), (R)))
+#define _mm512_cvtx_roundps_ph(A, B)				\
+  (__builtin_ia32_vcvtps2phx512_mask_round ((__v16sf)(A),	\
+					    _mm256_setzero_ph (),\
+					    (__mmask16)-1, (B)))
 
-#endif /* __OPTIMIZE__ */
+#define _mm512_mask_cvtx_roundps_ph(A, B, C, D)			\
+  (__builtin_ia32_vcvtps2phx512_mask_round ((__v16sf)(C),	\
+					    (A), (B), (D)))
 
-/* Intrinsics vfnmadd[132,213,231]sh.  */
-extern __inline __m128h
-  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fnmadd_sh (__m128h __W, __m128h __A, __m128h __B)
-{
-  return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W,
-						   (__v8hf) __A,
-						   (__v8hf) __B,
-						   (__mmask8) -1,
-						   _MM_FROUND_CUR_DIRECTION);
-}
+#define _mm512_maskz_cvtx_roundps_ph(A, B, C)			\
+  (__builtin_ia32_vcvtps2phx512_mask_round ((__v16sf)(B),	\
+					    _mm256_setzero_ph (),\
+					    (A), (C)))
+#endif /* __OPTIMIZE__ */
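
Worth noting while reading this hunk: as I understand it, the "x"
intrinsics here (_mm512_cvtxps_ph and friends, emitting vcvtps2phx)
return a genuine __m256h vector of _Float16, unlike the older
F16C-style _mm512_cvtps_ph, which packs the bits into an __m256i.  An
illustrative use, -mavx512fp16 assumed:

  #include <immintrin.h>

  /* Hypothetical helper: narrow 16 floats to 16 halves with an
     explicit rounding mode.  */
  static __m256h
  ps_to_ph (__m512 v)
  {
    return _mm512_cvtx_roundps_ph (v, _MM_FROUND_TO_NEAREST_INT
				      | _MM_FROUND_NO_EXC);
  }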
 
+/* Intrinsics vcvtpd2ph.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fnmadd_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
+_mm512_cvtpd_ph (__m512d __A)
 {
-  return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W,
-						  (__v8hf) __A,
-						  (__v8hf) __B,
-						  (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __A,
+						 _mm_setzero_ph (),
+						 (__mmask8) -1,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fnmadd_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U)
+_mm512_mask_cvtpd_ph (__m128h __A, __mmask8 __B, __m512d __C)
 {
-  return (__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((__v8hf) __W,
-						   (__v8hf) __A,
-						   (__v8hf) __B,
-						   (__mmask8) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __C,
+						 __A, __B,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fnmadd_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B)
+_mm512_maskz_cvtpd_ph (__mmask8 __A, __m512d __B)
 {
-  return (__m128h) __builtin_ia32_vfnmaddsh3_maskz ((__v8hf) __W,
-						   (__v8hf) __A,
-						   (__v8hf) __B,
-						   (__mmask8) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __B,
+						 _mm_setzero_ph (),
+						 __A,
+						 _MM_FROUND_CUR_DIRECTION);
 }
 
-
 #ifdef __OPTIMIZE__
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fnmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R)
-{
-  return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W,
-						   (__v8hf) __A,
-						   (__v8hf) __B,
-						   (__mmask8) -1,
-						   __R);
-}
-
-extern __inline __m128h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fnmadd_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B,
-			 const int __R)
+_mm512_cvt_roundpd_ph (__m512d __A, int __B)
 {
-  return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W,
-						  (__v8hf) __A,
-						  (__v8hf) __B,
-						  (__mmask8) __U, __R);
+  return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __A,
+						 _mm_setzero_ph (),
+						 (__mmask8) -1,
+						 __B);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fnmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U,
-			  const int __R)
+_mm512_mask_cvt_roundpd_ph (__m128h __A, __mmask8 __B, __m512d __C, int __D)
 {
-  return (__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((__v8hf) __W,
-						   (__v8hf) __A,
-						   (__v8hf) __B,
-						   (__mmask8) __U, __R);
+  return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __C,
+						 __A, __B, __D);
 }
 
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fnmadd_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
-			  __m128h __B, const int __R)
+_mm512_maskz_cvt_roundpd_ph (__mmask8 __A, __m512d __B, int __C)
 {
-  return (__m128h) __builtin_ia32_vfnmaddsh3_maskz ((__v8hf) __W,
-						   (__v8hf) __A,
-						   (__v8hf) __B,
-						   (__mmask8) __U, __R);
+  return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __B,
+						 _mm_setzero_ph (),
+						 __A, __C);
 }
 
 #else
-#define _mm_fnmadd_round_sh(A, B, C, R)					\
-  ((__m128h) __builtin_ia32_vfnmaddsh3_mask ((A), (B), (C), (-1), (R)))
-#define _mm_mask_fnmadd_round_sh(A, U, B, C, R)				\
-  ((__m128h) __builtin_ia32_vfnmaddsh3_mask ((A), (B), (C), (U), (R)))
-#define _mm_mask3_fnmadd_round_sh(A, B, C, U, R)			\
-  ((__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((A), (B), (C), (U), (R)))
-#define _mm_maskz_fnmadd_round_sh(U, A, B, C, R)			\
-  ((__m128h) __builtin_ia32_vfnmaddsh3_maskz ((A), (B), (C), (U), (R)))
+#define _mm512_cvt_roundpd_ph(A, B)				\
+  (__builtin_ia32_vcvtpd2ph512_mask_round ((__v8df)(A),		\
+					   _mm_setzero_ph (),	\
+					   (__mmask8)-1, (B)))
+
+#define _mm512_mask_cvt_roundpd_ph(A, B, C, D)			\
+  (__builtin_ia32_vcvtpd2ph512_mask_round ((__v8df)(C),		\
+					   (A), (B), (D)))
+
+#define _mm512_maskz_cvt_roundpd_ph(A, B, C)			\
+  (__builtin_ia32_vcvtpd2ph512_mask_round ((__v8df)(B),		\
+					   _mm_setzero_ph (),	\
+					   (A), (C)))
 
 #endif /* __OPTIMIZE__ */
 
-/* Intrinsics vfmsub[132,213,231]sh.  */
-extern __inline __m128h
-  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fmsub_sh (__m128h __W, __m128h __A, __m128h __B)
-{
-  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
-						  (__v8hf) __A,
-						  -(__v8hf) __B,
-						  (__mmask8) -1,
-						  _MM_FROUND_CUR_DIRECTION);
+/* Intrinsics vfmaddsub[132,213,231]ph.  */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fmaddsub_ph (__m512h __A, __m512h __B, __m512h __C)
+{
+  return (__m512h)
+    __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A,
+					(__v32hf) __B,
+					(__v32hf) __C,
+					(__mmask32) -1,
+					_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmsub_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
+_mm512_mask_fmaddsub_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
 {
-  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
-						  (__v8hf) __A,
-						  -(__v8hf) __B,
-						  (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512h)
+    __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A,
+					(__v32hf) __B,
+					(__v32hf) __C,
+					(__mmask32) __U,
+					_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmsub_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U)
+_mm512_mask3_fmaddsub_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
 {
-  return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W,
-						   (__v8hf) __A,
-						   (__v8hf) __B,
-						   (__mmask8) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512h)
+    __builtin_ia32_vfmaddsubph512_mask3 ((__v32hf) __A,
+					 (__v32hf) __B,
+					 (__v32hf) __C,
+					 (__mmask32) __U,
+					 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmsub_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B)
+_mm512_maskz_fmaddsub_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
 {
-  return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
-						   (__v8hf) __A,
-						   -(__v8hf) __B,
-						   (__mmask8) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512h)
+    __builtin_ia32_vfmaddsubph512_maskz ((__v32hf) __A,
+					 (__v32hf) __B,
+					 (__v32hf) __C,
+					 (__mmask32) __U,
+					 _MM_FROUND_CUR_DIRECTION);
 }
 
-
 #ifdef __OPTIMIZE__
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R)
+_mm512_fmaddsub_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
 {
-  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
-						  (__v8hf) __A,
-						  -(__v8hf) __B,
-						  (__mmask8) -1,
-						  __R);
+  return (__m512h)
+    __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A,
+					(__v32hf) __B,
+					(__v32hf) __C,
+					(__mmask32) -1, __R);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmsub_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B,
-			 const int __R)
+_mm512_mask_fmaddsub_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
+			       __m512h __C, const int __R)
 {
-  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
-						  (__v8hf) __A,
-						  -(__v8hf) __B,
-						  (__mmask8) __U, __R);
+  return (__m512h)
+    __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A,
+					(__v32hf) __B,
+					(__v32hf) __C,
+					(__mmask32) __U, __R);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U,
-			  const int __R)
+_mm512_mask3_fmaddsub_round_ph (__m512h __A, __m512h __B, __m512h __C,
+				__mmask32 __U, const int __R)
 {
-  return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W,
-						   (__v8hf) __A,
-						   (__v8hf) __B,
-						   (__mmask8) __U, __R);
+  return (__m512h)
+    __builtin_ia32_vfmaddsubph512_mask3 ((__v32hf) __A,
+					 (__v32hf) __B,
+					 (__v32hf) __C,
+					 (__mmask32) __U, __R);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmsub_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
-			  __m128h __B, const int __R)
+_mm512_maskz_fmaddsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
+				__m512h __C, const int __R)
 {
-  return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
-						   (__v8hf) __A,
-						   -(__v8hf) __B,
-						   (__mmask8) __U, __R);
+  return (__m512h)
+    __builtin_ia32_vfmaddsubph512_maskz ((__v32hf) __A,
+					 (__v32hf) __B,
+					 (__v32hf) __C,
+					 (__mmask32) __U, __R);
 }
 
 #else
-#define _mm_fmsub_round_sh(A, B, C, R)					\
-  ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), -(C), (-1), (R)))
-#define _mm_mask_fmsub_round_sh(A, U, B, C, R)				\
-  ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), -(C), (U), (R)))
-#define _mm_mask3_fmsub_round_sh(A, B, C, U, R)				\
-  ((__m128h) __builtin_ia32_vfmsubsh3_mask3 ((A), (B), (C), (U), (R)))
-#define _mm_maskz_fmsub_round_sh(U, A, B, C, R)				\
-  ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), (B), -(C), (U), (R)))
+#define _mm512_fmaddsub_round_ph(A, B, C, R)				\
+  ((__m512h)__builtin_ia32_vfmaddsubph512_mask ((A), (B), (C), -1, (R)))
+
+#define _mm512_mask_fmaddsub_round_ph(A, U, B, C, R)			\
+  ((__m512h)__builtin_ia32_vfmaddsubph512_mask ((A), (B), (C), (U), (R)))
+
+#define _mm512_mask3_fmaddsub_round_ph(A, B, C, U, R)			\
+  ((__m512h)__builtin_ia32_vfmaddsubph512_mask3 ((A), (B), (C), (U), (R)))
+
+#define _mm512_maskz_fmaddsub_round_ph(U, A, B, C, R)			\
+  ((__m512h)__builtin_ia32_vfmaddsubph512_maskz ((A), (B), (C), (U), (R)))
 
 #endif /* __OPTIMIZE__ */
 
-/* Intrinsics vfnmsub[132,213,231]sh.  */
-extern __inline __m128h
-  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fnmsub_sh (__m128h __W, __m128h __A, __m128h __B)
+/* Intrinsics vfmsubadd[132,213,231]ph.  */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+  _mm512_fmsubadd_ph (__m512h __A, __m512h __B, __m512h __C)
 {
-  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
-						  -(__v8hf) __A,
-						  -(__v8hf) __B,
-						  (__mmask8) -1,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512h)
+    __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A,
+					(__v32hf) __B,
+					(__v32hf) __C,
+					(__mmask32) -1,
+					_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fnmsub_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
+_mm512_mask_fmsubadd_ph (__m512h __A, __mmask32 __U,
+			 __m512h __B, __m512h __C)
 {
-  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
-						  -(__v8hf) __A,
-						  -(__v8hf) __B,
-						  (__mmask8) __U,
-						  _MM_FROUND_CUR_DIRECTION);
+  return (__m512h)
+    __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A,
+					(__v32hf) __B,
+					(__v32hf) __C,
+					(__mmask32) __U,
+					_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fnmsub_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U)
+_mm512_mask3_fmsubadd_ph (__m512h __A, __m512h __B,
+			  __m512h __C, __mmask32 __U)
 {
-  return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W,
-						   -(__v8hf) __A,
-						   (__v8hf) __B,
-						   (__mmask8) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512h)
+    __builtin_ia32_vfmsubaddph512_mask3 ((__v32hf) __A,
+					 (__v32hf) __B,
+					 (__v32hf) __C,
+					 (__mmask32) __U,
+					 _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fnmsub_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B)
+_mm512_maskz_fmsubadd_ph (__mmask32 __U, __m512h __A,
+			  __m512h __B, __m512h __C)
 {
-  return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
-						   -(__v8hf) __A,
-						   -(__v8hf) __B,
-						   (__mmask8) __U,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512h)
+    __builtin_ia32_vfmsubaddph512_maskz ((__v32hf) __A,
+					 (__v32hf) __B,
+					 (__v32hf) __C,
+					 (__mmask32) __U,
+					 _MM_FROUND_CUR_DIRECTION);
 }
 
-
 #ifdef __OPTIMIZE__
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fnmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R)
+_mm512_fmsubadd_round_ph (__m512h __A, __m512h __B,
+			  __m512h __C, const int __R)
 {
-  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
-						  -(__v8hf) __A,
-						  -(__v8hf) __B,
-						  (__mmask8) -1,
-						  __R);
+  return (__m512h)
+    __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A,
+					(__v32hf) __B,
+					(__v32hf) __C,
+					(__mmask32) -1, __R);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fnmsub_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B,
-			 const int __R)
+_mm512_mask_fmsubadd_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
+			       __m512h __C, const int __R)
 {
-  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
-						  -(__v8hf) __A,
-						  -(__v8hf) __B,
-						  (__mmask8) __U, __R);
+  return (__m512h)
+    __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A,
+					(__v32hf) __B,
+					(__v32hf) __C,
+					(__mmask32) __U, __R);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fnmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U,
-			  const int __R)
+_mm512_mask3_fmsubadd_round_ph (__m512h __A, __m512h __B, __m512h __C,
+				__mmask32 __U, const int __R)
 {
-  return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W,
-						   -(__v8hf) __A,
-						   (__v8hf) __B,
-						   (__mmask8) __U, __R);
+  return (__m512h)
+    __builtin_ia32_vfmsubaddph512_mask3 ((__v32hf) __A,
+					 (__v32hf) __B,
+					 (__v32hf) __C,
+					 (__mmask32) __U, __R);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fnmsub_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
-			  __m128h __B, const int __R)
-{
-  return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
-						   -(__v8hf) __A,
-						   -(__v8hf) __B,
-						   (__mmask8) __U, __R);
+_mm512_maskz_fmsubadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
+				__m512h __C, const int __R)
+{
+  return (__m512h)
+    __builtin_ia32_vfmsubaddph512_maskz ((__v32hf) __A,
+					 (__v32hf) __B,
+					 (__v32hf) __C,
+					 (__mmask32) __U, __R);
 }
 
 #else
-#define _mm_fnmsub_round_sh(A, B, C, R)					\
-  ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), -(B), -(C), (-1), (R)))
-#define _mm_mask_fnmsub_round_sh(A, U, B, C, R)				\
-  ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), -(B), -(C), (U), (R)))
-#define _mm_mask3_fnmsub_round_sh(A, B, C, U, R)			\
-  ((__m128h) __builtin_ia32_vfmsubsh3_mask3 ((A), -(B), (C), (U), (R)))
-#define _mm_maskz_fnmsub_round_sh(U, A, B, C, R)			\
-  ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), -(B), -(C), (U), (R)))
+#define _mm512_fmsubadd_round_ph(A, B, C, R)				\
+  ((__m512h)__builtin_ia32_vfmsubaddph512_mask ((A), (B), (C), -1, (R)))
+
+#define _mm512_mask_fmsubadd_round_ph(A, U, B, C, R)			\
+  ((__m512h)__builtin_ia32_vfmsubaddph512_mask ((A), (B), (C), (U), (R)))
+
+#define _mm512_mask3_fmsubadd_round_ph(A, B, C, U, R)			\
+  ((__m512h)__builtin_ia32_vfmsubaddph512_mask3 ((A), (B), (C), (U), (R)))
+
+#define _mm512_maskz_fmsubadd_round_ph(U, A, B, C, R)			\
+  ((__m512h)__builtin_ia32_vfmsubaddph512_maskz ((A), (B), (C), (U), (R)))
 
 #endif /* __OPTIMIZE__ */
 
-/* Intrinsics vf[,c]maddcph.  */
+/* Intrinsics vfmadd[132,213,231]ph.  */
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fcmadd_pch (__m512h __A, __m512h __B, __m512h __C)
+  _mm512_fmadd_ph (__m512h __A, __m512h __B, __m512h __C)
 {
   return (__m512h)
-    __builtin_ia32_vfcmaddcph512_round ((__v32hf) __A,
-					(__v32hf) __B,
-					(__v32hf) __C,
-					_MM_FROUND_CUR_DIRECTION);
+    __builtin_ia32_vfmaddph512_mask ((__v32hf) __A,
+				     (__v32hf) __B,
+				     (__v32hf) __C,
+				     (__mmask32) -1,
+				     _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fcmadd_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
+_mm512_mask_fmadd_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
 {
   return (__m512h)
-    __builtin_ia32_vfcmaddcph512_mask_round ((__v32hf) __A,
-					     (__v32hf) __C,
-					     (__v32hf) __D, __B,
-					     _MM_FROUND_CUR_DIRECTION);
+    __builtin_ia32_vfmaddph512_mask ((__v32hf) __A,
+				     (__v32hf) __B,
+				     (__v32hf) __C,
+				     (__mmask32) __U,
+				     _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fcmadd_pch (__m512h __A, __m512h __B, __m512h __C, __mmask16 __D)
+_mm512_mask3_fmadd_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
 {
   return (__m512h)
-    __builtin_ia32_vfcmaddcph512_mask3_round ((__v32hf) __A,
-					      (__v32hf) __B,
-					      (__v32hf) __C,
-					      __D, _MM_FROUND_CUR_DIRECTION);
+    __builtin_ia32_vfmaddph512_mask3 ((__v32hf) __A,
+				      (__v32hf) __B,
+				      (__v32hf) __C,
+				      (__mmask32) __U,
+				      _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fcmadd_pch (__mmask16 __A, __m512h __B, __m512h __C, __m512h __D)
+_mm512_maskz_fmadd_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
 {
   return (__m512h)
-    __builtin_ia32_vfcmaddcph512_maskz_round ((__v32hf) __B,
-					      (__v32hf) __C,
-					      (__v32hf) __D,
-					      __A, _MM_FROUND_CUR_DIRECTION);
+    __builtin_ia32_vfmaddph512_maskz ((__v32hf) __A,
+				      (__v32hf) __B,
+				      (__v32hf) __C,
+				      (__mmask32) __U,
+				      _MM_FROUND_CUR_DIRECTION);
 }
 
+#ifdef __OPTIMIZE__
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmadd_pch (__m512h __A, __m512h __B, __m512h __C)
+_mm512_fmadd_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
 {
-  return (__m512h)
-    __builtin_ia32_vfmaddcph512_round ((__v32hf) __A,
-				       (__v32hf) __B,
-				       (__v32hf) __C,
-				       _MM_FROUND_CUR_DIRECTION);
+  return (__m512h) __builtin_ia32_vfmaddph512_mask ((__v32hf) __A,
+						       (__v32hf) __B,
+						       (__v32hf) __C,
+						       (__mmask32) -1, __R);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmadd_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
+_mm512_mask_fmadd_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
+			       __m512h __C, const int __R)
 {
-  return (__m512h)
-    __builtin_ia32_vfmaddcph512_mask_round ((__v32hf) __A,
-					    (__v32hf) __C,
-					    (__v32hf) __D, __B,
-					    _MM_FROUND_CUR_DIRECTION);
+  return (__m512h) __builtin_ia32_vfmaddph512_mask ((__v32hf) __A,
+						       (__v32hf) __B,
+						       (__v32hf) __C,
+						       (__mmask32) __U, __R);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmadd_pch (__m512h __A, __m512h __B, __m512h __C, __mmask16 __D)
+_mm512_mask3_fmadd_round_ph (__m512h __A, __m512h __B, __m512h __C,
+				__mmask32 __U, const int __R)
 {
-  return (__m512h)
-    __builtin_ia32_vfmaddcph512_mask3_round ((__v32hf) __A,
-					     (__v32hf) __B,
-					     (__v32hf) __C,
-					     __D, _MM_FROUND_CUR_DIRECTION);
+  return (__m512h) __builtin_ia32_vfmaddph512_mask3 ((__v32hf) __A,
+							(__v32hf) __B,
+							(__v32hf) __C,
+							(__mmask32) __U, __R);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmadd_pch (__mmask16 __A, __m512h __B, __m512h __C, __m512h __D)
+_mm512_maskz_fmadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
+				__m512h __C, const int __R)
 {
-  return (__m512h)
-    __builtin_ia32_vfmaddcph512_maskz_round ((__v32hf) __B,
-					     (__v32hf) __C,
-					     (__v32hf) __D,
-					     __A, _MM_FROUND_CUR_DIRECTION);
+  return (__m512h) __builtin_ia32_vfmaddph512_maskz ((__v32hf) __A,
+							(__v32hf) __B,
+							(__v32hf) __C,
+							(__mmask32) __U, __R);
 }
 
-#ifdef __OPTIMIZE__
+#else
+#define _mm512_fmadd_round_ph(A, B, C, R)				\
+  ((__m512h)__builtin_ia32_vfmaddph512_mask ((A), (B), (C), -1, (R)))
+
+#define _mm512_mask_fmadd_round_ph(A, U, B, C, R)			\
+  ((__m512h)__builtin_ia32_vfmaddph512_mask ((A), (B), (C), (U), (R)))
+
+#define _mm512_mask3_fmadd_round_ph(A, B, C, U, R)			\
+  ((__m512h)__builtin_ia32_vfmaddph512_mask3 ((A), (B), (C), (U), (R)))
+
+#define _mm512_maskz_fmadd_round_ph(U, A, B, C, R)			\
+  ((__m512h)__builtin_ia32_vfmaddph512_maskz ((A), (B), (C), (U), (R)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vfnmadd[132,213,231]ph.  */
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fcmadd_round_pch (__m512h __A, __m512h __B, __m512h __C, const int __D)
+_mm512_fnmadd_ph (__m512h __A, __m512h __B, __m512h __C)
 {
   return (__m512h)
-    __builtin_ia32_vfcmaddcph512_round ((__v32hf) __A,
-					(__v32hf) __B,
-					(__v32hf) __C,
-					__D);
+    __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A,
+				      (__v32hf) __B,
+				      (__v32hf) __C,
+				      (__mmask32) -1,
+				      _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fcmadd_round_pch (__m512h __A, __mmask16 __B, __m512h __C,
-			      __m512h __D, const int __E)
+_mm512_mask_fnmadd_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
 {
   return (__m512h)
-    __builtin_ia32_vfcmaddcph512_mask_round ((__v32hf) __A,
-					     (__v32hf) __C,
-					     (__v32hf) __D, __B,
-					     __E);
+    __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A,
+				      (__v32hf) __B,
+				      (__v32hf) __C,
+				      (__mmask32) __U,
+				      _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fcmadd_round_pch (__m512h __A, __m512h __B, __m512h __C,
-			       __mmask16 __D, const int __E)
+_mm512_mask3_fnmadd_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
 {
   return (__m512h)
-    __builtin_ia32_vfcmaddcph512_mask3_round ((__v32hf) __A,
-					      (__v32hf) __B,
-					      (__v32hf) __C,
-					      __D, __E);
+    __builtin_ia32_vfnmaddph512_mask3 ((__v32hf) __A,
+				       (__v32hf) __B,
+				       (__v32hf) __C,
+				       (__mmask32) __U,
+				       _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fcmadd_round_pch (__mmask16 __A, __m512h __B, __m512h __C,
-			       __m512h __D, const int __E)
+_mm512_maskz_fnmadd_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
 {
   return (__m512h)
-    __builtin_ia32_vfcmaddcph512_maskz_round ((__v32hf) __B,
-					      (__v32hf) __C,
-					      (__v32hf) __D,
-					      __A, __E);
+    __builtin_ia32_vfnmaddph512_maskz ((__v32hf) __A,
+				       (__v32hf) __B,
+				       (__v32hf) __C,
+				       (__mmask32) __U,
+				       _MM_FROUND_CUR_DIRECTION);
 }
 
+#ifdef __OPTIMIZE__
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmadd_round_pch (__m512h __A, __m512h __B, __m512h __C, const int __D)
+_mm512_fnmadd_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
 {
-  return (__m512h)
-    __builtin_ia32_vfmaddcph512_round ((__v32hf) __A,
-				       (__v32hf) __B,
-				       (__v32hf) __C,
-				       __D);
+  return (__m512h) __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A,
+						       (__v32hf) __B,
+						       (__v32hf) __C,
+						       (__mmask32) -1, __R);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmadd_round_pch (__m512h __A, __mmask16 __B, __m512h __C,
-			     __m512h __D, const int __E)
+_mm512_mask_fnmadd_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
+			       __m512h __C, const int __R)
 {
-  return (__m512h)
-    __builtin_ia32_vfmaddcph512_mask_round ((__v32hf) __A,
-					    (__v32hf) __C,
-					    (__v32hf) __D, __B,
-					    __E);
+  return (__m512h) __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A,
+						       (__v32hf) __B,
+						       (__v32hf) __C,
+						       (__mmask32) __U, __R);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmadd_round_pch (__m512h __A, __m512h __B, __m512h __C,
-			      __mmask16 __D, const int __E)
+_mm512_mask3_fnmadd_round_ph (__m512h __A, __m512h __B, __m512h __C,
+				__mmask32 __U, const int __R)
 {
-  return (__m512h)
-    __builtin_ia32_vfmaddcph512_mask3_round ((__v32hf) __A,
-					     (__v32hf) __B,
-					     (__v32hf) __C,
-					     __D, __E);
+  return (__m512h) __builtin_ia32_vfnmaddph512_mask3 ((__v32hf) __A,
+							(__v32hf) __B,
+							(__v32hf) __C,
+							(__mmask32) __U, __R);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmadd_round_pch (__mmask16 __A, __m512h __B, __m512h __C,
-			      __m512h __D, const int __E)
+_mm512_maskz_fnmadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
+				__m512h __C, const int __R)
 {
-  return (__m512h)
-    __builtin_ia32_vfmaddcph512_maskz_round ((__v32hf) __B,
-					     (__v32hf) __C,
-					     (__v32hf) __D,
-					     __A, __E);
+  return (__m512h) __builtin_ia32_vfnmaddph512_maskz ((__v32hf) __A,
+							(__v32hf) __B,
+							(__v32hf) __C,
+							(__mmask32) __U, __R);
 }
 
 #else
-#define _mm512_fcmadd_round_pch(A, B, C, D)			\
-  (__m512h) __builtin_ia32_vfcmaddcph512_round ((A), (B), (C), (D))
-
-#define _mm512_mask_fcmadd_round_pch(A, B, C, D, E)			\
-  ((__m512h) 								\
-    __builtin_ia32_vfcmaddcph512_mask_round ((__v32hf) (A),		\
-					     (__v32hf) (C),		\
-					     (__v32hf) (D),		\
-					     (B), (E)))
-
-
-#define _mm512_mask3_fcmadd_round_pch(A, B, C, D, E)			\
-  ((__m512h)								\
-   __builtin_ia32_vfcmaddcph512_mask3_round ((A), (B), (C), (D), (E)))
-
-#define _mm512_maskz_fcmadd_round_pch(A, B, C, D, E)			\
-  (__m512h)								\
-   __builtin_ia32_vfcmaddcph512_maskz_round ((B), (C), (D), (A), (E))
-
-#define _mm512_fmadd_round_pch(A, B, C, D)			\
-  (__m512h) __builtin_ia32_vfmaddcph512_round ((A), (B), (C), (D))
+#define _mm512_fnmadd_round_ph(A, B, C, R)				\
+  ((__m512h)__builtin_ia32_vfnmaddph512_mask ((A), (B), (C), -1, (R)))
 
-#define _mm512_mask_fmadd_round_pch(A, B, C, D, E)			\
-  ((__m512h)								\
-    __builtin_ia32_vfmaddcph512_mask_round ((__v32hf) (A),		\
-					    (__v32hf) (C),		\
-					    (__v32hf) (D),		\
-					    (B), (E)))
+#define _mm512_mask_fnmadd_round_ph(A, U, B, C, R)			\
+  ((__m512h)__builtin_ia32_vfnmaddph512_mask ((A), (B), (C), (U), (R)))
 
-#define _mm512_mask3_fmadd_round_pch(A, B, C, D, E)			\
-  (__m512h)								\
-   __builtin_ia32_vfmaddcph512_mask3_round ((A), (B), (C), (D), (E))
+#define _mm512_mask3_fnmadd_round_ph(A, B, C, U, R)			\
+  ((__m512h)__builtin_ia32_vfnmaddph512_mask3 ((A), (B), (C), (U), (R)))
 
-#define _mm512_maskz_fmadd_round_pch(A, B, C, D, E)			\
-  (__m512h)								\
-   __builtin_ia32_vfmaddcph512_maskz_round ((B), (C), (D), (A), (E))
+#define _mm512_maskz_fnmadd_round_ph(U, A, B, C, R)			\
+  ((__m512h)__builtin_ia32_vfnmaddph512_maskz ((A), (B), (C), (U), (R)))
 
 #endif /* __OPTIMIZE__ */
 
-/* Intrinsics vf[,c]mulcph.  */
+/* Intrinsics vfmsub[132,213,231]ph.  */
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fcmul_pch (__m512h __A, __m512h __B)
+_mm512_fmsub_ph (__m512h __A, __m512h __B, __m512h __C)
 {
   return (__m512h)
-    __builtin_ia32_vfcmulcph512_round ((__v32hf) __A,
-				       (__v32hf) __B,
-				       _MM_FROUND_CUR_DIRECTION);
+    __builtin_ia32_vfmsubph512_mask ((__v32hf) __A,
+				     (__v32hf) __B,
+				     (__v32hf) __C,
+				     (__mmask32) -1,
+				     _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fcmul_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
+_mm512_mask_fmsub_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
 {
   return (__m512h)
-    __builtin_ia32_vfcmulcph512_mask_round ((__v32hf) __C,
-					    (__v32hf) __D,
-					    (__v32hf) __A,
-					    __B, _MM_FROUND_CUR_DIRECTION);
+    __builtin_ia32_vfmsubph512_mask ((__v32hf) __A,
+				     (__v32hf) __B,
+				     (__v32hf) __C,
+				     (__mmask32) __U,
+				     _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fcmul_pch (__mmask16 __A, __m512h __B, __m512h __C)
+_mm512_mask3_fmsub_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
 {
   return (__m512h)
-    __builtin_ia32_vfcmulcph512_mask_round ((__v32hf) __B,
-					    (__v32hf) __C,
-					    _mm512_setzero_ph (),
-					    __A, _MM_FROUND_CUR_DIRECTION);
+    __builtin_ia32_vfmsubph512_mask3 ((__v32hf) __A,
+				      (__v32hf) __B,
+				      (__v32hf) __C,
+				      (__mmask32) __U,
+				      _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmul_pch (__m512h __A, __m512h __B)
+_mm512_maskz_fmsub_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
 {
   return (__m512h)
-    __builtin_ia32_vfmulcph512_round ((__v32hf) __A,
+    __builtin_ia32_vfmsubph512_maskz ((__v32hf) __A,
 				      (__v32hf) __B,
+				      (__v32hf) __C,
+				      (__mmask32) __U,
 				      _MM_FROUND_CUR_DIRECTION);
 }
 
+#ifdef __OPTIMIZE__
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmul_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
+_mm512_fmsub_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
 {
-  return (__m512h)
-    __builtin_ia32_vfmulcph512_mask_round ((__v32hf) __C,
-					   (__v32hf) __D,
-					   (__v32hf) __A,
-					   __B, _MM_FROUND_CUR_DIRECTION);
+  return (__m512h) __builtin_ia32_vfmsubph512_mask ((__v32hf) __A,
+						       (__v32hf) __B,
+						       (__v32hf) __C,
+						       (__mmask32) -1, __R);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmul_pch (__mmask16 __A, __m512h __B, __m512h __C)
+_mm512_mask_fmsub_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
+			       __m512h __C, const int __R)
 {
-  return (__m512h)
-    __builtin_ia32_vfmulcph512_mask_round ((__v32hf) __B,
-					   (__v32hf) __C,
-					   _mm512_setzero_ph (),
-					   __A, _MM_FROUND_CUR_DIRECTION);
+  return (__m512h) __builtin_ia32_vfmsubph512_mask ((__v32hf) __A,
+						       (__v32hf) __B,
+						       (__v32hf) __C,
+						       (__mmask32) __U, __R);
 }
 
-#ifdef __OPTIMIZE__
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fcmul_round_pch (__m512h __A, __m512h __B, const int __D)
+_mm512_mask3_fmsub_round_ph (__m512h __A, __m512h __B, __m512h __C,
+				__mmask32 __U, const int __R)
 {
-  return (__m512h)
-    __builtin_ia32_vfcmulcph512_round ((__v32hf) __A,
-				       (__v32hf) __B, __D);
+  return (__m512h) __builtin_ia32_vfmsubph512_mask3 ((__v32hf) __A,
+							(__v32hf) __B,
+							(__v32hf) __C,
+							(__mmask32) __U, __R);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fcmul_round_pch (__m512h __A, __mmask16 __B, __m512h __C,
-			     __m512h __D, const int __E)
+_mm512_maskz_fmsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
+				__m512h __C, const int __R)
 {
-  return (__m512h)
-    __builtin_ia32_vfcmulcph512_mask_round ((__v32hf) __C,
-					    (__v32hf) __D,
-					    (__v32hf) __A,
-					    __B, __E);
+  return (__m512h) __builtin_ia32_vfmsubph512_maskz ((__v32hf) __A,
+							(__v32hf) __B,
+							(__v32hf) __C,
+							(__mmask32) __U, __R);
 }
 
+#else
+#define _mm512_fmsub_round_ph(A, B, C, R)				\
+  ((__m512h)__builtin_ia32_vfmsubph512_mask ((A), (B), (C), -1, (R)))
+
+#define _mm512_mask_fmsub_round_ph(A, U, B, C, R)			\
+  ((__m512h)__builtin_ia32_vfmsubph512_mask ((A), (B), (C), (U), (R)))
+
+#define _mm512_mask3_fmsub_round_ph(A, B, C, U, R)			\
+  ((__m512h)__builtin_ia32_vfmsubph512_mask3 ((A), (B), (C), (U), (R)))
+
+#define _mm512_maskz_fmsub_round_ph(U, A, B, C, R)			\
+  ((__m512h)__builtin_ia32_vfmsubph512_maskz ((A), (B), (C), (U), (R)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vfnmsub[132,213,231]ph.  */
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fcmul_round_pch (__mmask16 __A, __m512h __B,
-			      __m512h __C, const int __E)
+_mm512_fnmsub_ph (__m512h __A, __m512h __B, __m512h __C)
 {
   return (__m512h)
-    __builtin_ia32_vfcmulcph512_mask_round ((__v32hf) __B,
-					    (__v32hf) __C,
-					    _mm512_setzero_ph (),
-					    __A, __E);
+    __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A,
+				      (__v32hf) __B,
+				      (__v32hf) __C,
+				      (__mmask32) -1,
+				      _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmul_round_pch (__m512h __A, __m512h __B, const int __D)
+_mm512_mask_fnmsub_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
 {
   return (__m512h)
-    __builtin_ia32_vfmulcph512_round ((__v32hf) __A,
+    __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A,
 				      (__v32hf) __B,
-				      __D);
+				      (__v32hf) __C,
+				      (__mmask32) __U,
+				      _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmul_round_pch (__m512h __A, __mmask16 __B, __m512h __C,
-			    __m512h __D, const int __E)
+_mm512_mask3_fnmsub_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
 {
   return (__m512h)
-    __builtin_ia32_vfmulcph512_mask_round ((__v32hf) __C,
-					   (__v32hf) __D,
-					   (__v32hf) __A,
-					   __B, __E);
+    __builtin_ia32_vfnmsubph512_mask3 ((__v32hf) __A,
+				       (__v32hf) __B,
+				       (__v32hf) __C,
+				       (__mmask32) __U,
+				       _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmul_round_pch (__mmask16 __A, __m512h __B,
-			     __m512h __C, const int __E)
+_mm512_maskz_fnmsub_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
 {
   return (__m512h)
-    __builtin_ia32_vfmulcph512_mask_round ((__v32hf) __B,
-					   (__v32hf) __C,
-					   _mm512_setzero_ph (),
-					   __A, __E);
+    __builtin_ia32_vfnmsubph512_maskz ((__v32hf) __A,
+				       (__v32hf) __B,
+				       (__v32hf) __C,
+				       (__mmask32) __U,
+				       _MM_FROUND_CUR_DIRECTION);
 }
 
-#else
-#define _mm512_fcmul_round_pch(A, B, D)				\
-  (__m512h) __builtin_ia32_vfcmulcph512_round ((A), (B), (D))
+#ifdef __OPTIMIZE__
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fnmsub_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
+{
+  return (__m512h) __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A,
+						       (__v32hf) __B,
+						       (__v32hf) __C,
+						       (__mmask32) -1, __R);
+}
 
-#define _mm512_mask_fcmul_round_pch(A, B, C, D, E)			\
-  (__m512h) __builtin_ia32_vfcmulcph512_mask_round ((C), (D), (A), (B), (E))
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fnmsub_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
+			       __m512h __C, const int __R)
+{
+  return (__m512h) __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A,
+						       (__v32hf) __B,
+						       (__v32hf) __C,
+						       (__mmask32) __U, __R);
+}
 
-#define _mm512_maskz_fcmul_round_pch(A, B, C, E)			\
-  (__m512h) __builtin_ia32_vfcmulcph512_mask_round ((B), (C),		\
-						    (__v32hf)		\
-						    _mm512_setzero_ph (), \
-						    (A), (E))
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask3_fnmsub_round_ph (__m512h __A, __m512h __B, __m512h __C,
+				__mmask32 __U, const int __R)
+{
+  return (__m512h) __builtin_ia32_vfnmsubph512_mask3 ((__v32hf) __A,
+							(__v32hf) __B,
+							(__v32hf) __C,
+							(__mmask32) __U, __R);
+}
 
-#define _mm512_fmul_round_pch(A, B, D)			\
-  (__m512h) __builtin_ia32_vfmulcph512_round ((A), (B), (D))
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fnmsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
+				__m512h __C, const int __R)
+{
+  return (__m512h) __builtin_ia32_vfnmsubph512_maskz ((__v32hf) __A,
+							(__v32hf) __B,
+							(__v32hf) __C,
+							(__mmask32) __U, __R);
+}
 
-#define _mm512_mask_fmul_round_pch(A, B, C, D, E)			  \
-  (__m512h) __builtin_ia32_vfmulcph512_mask_round ((C), (D), (A), (B), (E))
+#else
+#define _mm512_fnmsub_round_ph(A, B, C, R)				\
+  ((__m512h)__builtin_ia32_vfnmsubph512_mask ((A), (B), (C), -1, (R)))
 
-#define _mm512_maskz_fmul_round_pch(A, B, C, E)				  \
-  (__m512h) __builtin_ia32_vfmulcph512_mask_round ((B), (C),		  \
-						   (__v32hf)		  \
-						   _mm512_setzero_ph (),  \
-						   (A), (E))
+#define _mm512_mask_fnmsub_round_ph(A, U, B, C, R)			\
+  ((__m512h)__builtin_ia32_vfnmsubph512_mask ((A), (B), (C), (U), (R)))
+
+#define _mm512_mask3_fnmsub_round_ph(A, B, C, U, R)			\
+  ((__m512h)__builtin_ia32_vfnmsubph512_mask3 ((A), (B), (C), (U), (R)))
+
+#define _mm512_maskz_fnmsub_round_ph(U, A, B, C, R)			\
+  ((__m512h)__builtin_ia32_vfnmsubph512_maskz ((A), (B), (C), (U), (R)))
 
 #endif /* __OPTIMIZE__ */
 
-/* Intrinsics vf[,c]maddcsh.  */
-extern __inline __m128h
+/* Intrinsics vf[,c]maddcph.  */
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fcmadd_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm512_fcmadd_pch (__m512h __A, __m512h __B, __m512h __C)
 {
-  return (__m128h)
-    __builtin_ia32_vfcmaddcsh_mask_round ((__v8hf) __A,
-					  (__v8hf) __C,
-					  (__v8hf) __D, __B,
-					  _MM_FROUND_CUR_DIRECTION);
+  return (__m512h)
+    __builtin_ia32_vfcmaddcph512_round ((__v32hf) __A,
+					(__v32hf) __B,
+					(__v32hf) __C,
+					_MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fcmadd_sch (__m128h __A, __m128h __B, __m128h __C, __mmask8 __D)
+_mm512_mask_fcmadd_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
 {
-  return (__m128h)
-    __builtin_ia32_vfcmaddcsh_mask3_round ((__v8hf) __A,
-					   (__v8hf) __B,
-					   (__v8hf) __C, __D,
-					   _MM_FROUND_CUR_DIRECTION);
+  return (__m512h)
+    __builtin_ia32_vfcmaddcph512_mask_round ((__v32hf) __A,
+					     (__v32hf) __C,
+					     (__v32hf) __D, __B,
+					     _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fcmadd_sch (__mmask8 __A, __m128h __B, __m128h __C, __m128h __D)
+_mm512_mask3_fcmadd_pch (__m512h __A, __m512h __B, __m512h __C, __mmask16 __D)
 {
-  return (__m128h)
-    __builtin_ia32_vfcmaddcsh_maskz_round ((__v8hf) __B,
-					   (__v8hf) __C,
-					   (__v8hf) __D,
-					   __A, _MM_FROUND_CUR_DIRECTION);
+  return (__m512h)
+    __builtin_ia32_vfcmaddcph512_mask3_round ((__v32hf) __A,
+					      (__v32hf) __B,
+					      (__v32hf) __C,
+					      __D, _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fcmadd_sch (__m128h __A, __m128h __B, __m128h __C)
+_mm512_maskz_fcmadd_pch (__mmask16 __A, __m512h __B, __m512h __C, __m512h __D)
 {
-  return (__m128h)
-    __builtin_ia32_vfcmaddcsh_round ((__v8hf) __A,
-				     (__v8hf) __B,
-				     (__v8hf) __C,
-				     _MM_FROUND_CUR_DIRECTION);
+  return (__m512h)
+    __builtin_ia32_vfcmaddcph512_maskz_round ((__v32hf) __B,
+					      (__v32hf) __C,
+					      (__v32hf) __D,
+					      __A, _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmadd_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm512_fmadd_pch (__m512h __A, __m512h __B, __m512h __C)
 {
-  return (__m128h)
-    __builtin_ia32_vfmaddcsh_mask_round ((__v8hf) __A,
-					 (__v8hf) __C,
-					 (__v8hf) __D, __B,
-					 _MM_FROUND_CUR_DIRECTION);
+  return (__m512h)
+    __builtin_ia32_vfmaddcph512_round ((__v32hf) __A,
+				       (__v32hf) __B,
+				       (__v32hf) __C,
+				       _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmadd_sch (__m128h __A, __m128h __B, __m128h __C, __mmask8 __D)
+_mm512_mask_fmadd_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
 {
-  return (__m128h)
-    __builtin_ia32_vfmaddcsh_mask3_round ((__v8hf) __A,
-					  (__v8hf) __B,
-					  (__v8hf) __C, __D,
-					  _MM_FROUND_CUR_DIRECTION);
+  return (__m512h)
+    __builtin_ia32_vfmaddcph512_mask_round ((__v32hf) __A,
+					    (__v32hf) __C,
+					    (__v32hf) __D, __B,
+					    _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmadd_sch (__mmask8 __A, __m128h __B, __m128h __C, __m128h __D)
+_mm512_mask3_fmadd_pch (__m512h __A, __m512h __B, __m512h __C, __mmask16 __D)
 {
-  return (__m128h)
-    __builtin_ia32_vfmaddcsh_maskz_round ((__v8hf) __B,
-					  (__v8hf) __C,
-					  (__v8hf) __D,
-					  __A, _MM_FROUND_CUR_DIRECTION);
+  return (__m512h)
+    __builtin_ia32_vfmaddcph512_mask3_round ((__v32hf) __A,
+					     (__v32hf) __B,
+					     (__v32hf) __C,
+					     __D, _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fmadd_sch (__m128h __A, __m128h __B, __m128h __C)
+_mm512_maskz_fmadd_pch (__mmask16 __A, __m512h __B, __m512h __C, __m512h __D)
 {
-  return (__m128h)
-    __builtin_ia32_vfmaddcsh_round ((__v8hf) __A,
-				    (__v8hf) __B,
-				    (__v8hf) __C,
-				    _MM_FROUND_CUR_DIRECTION);
+  return (__m512h)
+    __builtin_ia32_vfmaddcph512_maskz_round ((__v32hf) __B,
+					     (__v32hf) __C,
+					     (__v32hf) __D,
+					     __A, _MM_FROUND_CUR_DIRECTION);
 }
 
 #ifdef __OPTIMIZE__
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fcmadd_round_sch (__m128h __A, __mmask8 __B, __m128h __C,
-			   __m128h __D, const int __E)
+_mm512_fcmadd_round_pch (__m512h __A, __m512h __B, __m512h __C, const int __D)
 {
-  return (__m128h)
-    __builtin_ia32_vfcmaddcsh_mask_round ((__v8hf) __A,
-					  (__v8hf) __C,
-					  (__v8hf) __D,
-					  __B, __E);
+  return (__m512h)
+    __builtin_ia32_vfcmaddcph512_round ((__v32hf) __A,
+					(__v32hf) __B,
+					(__v32hf) __C,
+					__D);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fcmadd_round_sch (__m128h __A, __m128h __B, __m128h __C,
-			    __mmask8 __D, const int __E)
+_mm512_mask_fcmadd_round_pch (__m512h __A, __mmask16 __B, __m512h __C,
+			      __m512h __D, const int __E)
 {
-  return (__m128h)
-    __builtin_ia32_vfcmaddcsh_mask3_round ((__v8hf) __A,
-					   (__v8hf) __B,
-					   (__v8hf) __C,
-					   __D, __E);
+  return (__m512h)
+    __builtin_ia32_vfcmaddcph512_mask_round ((__v32hf) __A,
+					     (__v32hf) __C,
+					     (__v32hf) __D, __B,
+					     __E);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fcmadd_round_sch (__mmask8 __A, __m128h __B, __m128h __C,
-			    __m128h __D, const int __E)
+_mm512_mask3_fcmadd_round_pch (__m512h __A, __m512h __B, __m512h __C,
+			       __mmask16 __D, const int __E)
 {
-  return (__m128h)
-    __builtin_ia32_vfcmaddcsh_maskz_round ((__v8hf) __B,
-					   (__v8hf) __C,
-					   (__v8hf) __D,
-					   __A, __E);
+  return (__m512h)
+    __builtin_ia32_vfcmaddcph512_mask3_round ((__v32hf) __A,
+					      (__v32hf) __B,
+					      (__v32hf) __C,
+					      __D, __E);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fcmadd_round_sch (__m128h __A, __m128h __B, __m128h __C, const int __D)
+_mm512_maskz_fcmadd_round_pch (__mmask16 __A, __m512h __B, __m512h __C,
+			       __m512h __D, const int __E)
 {
-  return (__m128h)
-    __builtin_ia32_vfcmaddcsh_round ((__v8hf) __A,
-				     (__v8hf) __B,
-				     (__v8hf) __C,
-				     __D);
+  return (__m512h)
+    __builtin_ia32_vfcmaddcph512_maskz_round ((__v32hf) __B,
+					      (__v32hf) __C,
+					      (__v32hf) __D,
+					      __A, __E);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmadd_round_sch (__m128h __A, __mmask8 __B, __m128h __C,
-			  __m128h __D, const int __E)
+_mm512_fmadd_round_pch (__m512h __A, __m512h __B, __m512h __C, const int __D)
 {
-  return (__m128h)
-    __builtin_ia32_vfmaddcsh_mask_round ((__v8hf) __A,
-					 (__v8hf) __C,
-					 (__v8hf) __D,
-					 __B, __E);
+  return (__m512h)
+    __builtin_ia32_vfmaddcph512_round ((__v32hf) __A,
+				       (__v32hf) __B,
+				       (__v32hf) __C,
+				       __D);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmadd_round_sch (__m128h __A, __m128h __B, __m128h __C,
-			   __mmask8 __D, const int __E)
+_mm512_mask_fmadd_round_pch (__m512h __A, __mmask16 __B, __m512h __C,
+			     __m512h __D, const int __E)
 {
-  return (__m128h)
-    __builtin_ia32_vfmaddcsh_mask3_round ((__v8hf) __A,
-					  (__v8hf) __B,
-					  (__v8hf) __C,
-					  __D, __E);
+  return (__m512h)
+    __builtin_ia32_vfmaddcph512_mask_round ((__v32hf) __A,
+					    (__v32hf) __C,
+					    (__v32hf) __D, __B,
+					    __E);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmadd_round_sch (__mmask8 __A, __m128h __B, __m128h __C,
-			   __m128h __D, const int __E)
+_mm512_mask3_fmadd_round_pch (__m512h __A, __m512h __B, __m512h __C,
+			      __mmask16 __D, const int __E)
 {
-  return (__m128h)
-    __builtin_ia32_vfmaddcsh_maskz_round ((__v8hf) __B,
-					  (__v8hf) __C,
-					  (__v8hf) __D,
-					  __A, __E);
+  return (__m512h)
+    __builtin_ia32_vfmaddcph512_mask3_round ((__v32hf) __A,
+					     (__v32hf) __B,
+					     (__v32hf) __C,
+					     __D, __E);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fmadd_round_sch (__m128h __A, __m128h __B, __m128h __C, const int __D)
+_mm512_maskz_fmadd_round_pch (__mmask16 __A, __m512h __B, __m512h __C,
+			      __m512h __D, const int __E)
 {
-  return (__m128h)
-    __builtin_ia32_vfmaddcsh_round ((__v8hf) __A,
-				    (__v8hf) __B,
-				    (__v8hf) __C,
-				    __D);
+  return (__m512h)
+    __builtin_ia32_vfmaddcph512_maskz_round ((__v32hf) __B,
+					     (__v32hf) __C,
+					     (__v32hf) __D,
+					     __A, __E);
 }
+
 #else
-#define _mm_mask_fcmadd_round_sch(A, B, C, D, E)			\
-    ((__m128h)								\
-     __builtin_ia32_vfcmaddcsh_mask_round ((__v8hf) (A),		\
-					   (__v8hf) (C),		\
-					   (__v8hf) (D),		\
-					   (B), (E)))
+#define _mm512_fcmadd_round_pch(A, B, C, D)			\
+  (__m512h) __builtin_ia32_vfcmaddcph512_round ((A), (B), (C), (D))
 
+#define _mm512_mask_fcmadd_round_pch(A, B, C, D, E)			\
+  ((__m512h) 								\
+    __builtin_ia32_vfcmaddcph512_mask_round ((__v32hf) (A),		\
+					     (__v32hf) (C),		\
+					     (__v32hf) (D),		\
+					     (B), (E)))
 
-#define _mm_mask3_fcmadd_round_sch(A, B, C, D, E)			\
-  ((__m128h)								\
-   __builtin_ia32_vfcmaddcsh_mask3_round ((__v8hf) (A),		\
-					  (__v8hf) (B),		\
-					  (__v8hf) (C),		\
-					  (D), (E)))
 
-#define _mm_maskz_fcmadd_round_sch(A, B, C, D, E)		\
-  __builtin_ia32_vfcmaddcsh_maskz_round ((B), (C), (D), (A), (E))
+#define _mm512_mask3_fcmadd_round_pch(A, B, C, D, E)			\
+  ((__m512h)								\
+   __builtin_ia32_vfcmaddcph512_mask3_round ((A), (B), (C), (D), (E)))
 
-#define _mm_fcmadd_round_sch(A, B, C, D)		\
-  __builtin_ia32_vfcmaddcsh_round ((A), (B), (C), (D))
+#define _mm512_maskz_fcmadd_round_pch(A, B, C, D, E)			\
+  (__m512h)								\
+   __builtin_ia32_vfcmaddcph512_maskz_round ((B), (C), (D), (A), (E))
 
-#define _mm_mask_fmadd_round_sch(A, B, C, D, E)				\
-    ((__m128h)								\
-     __builtin_ia32_vfmaddcsh_mask_round ((__v8hf) (A),		\
-					  (__v8hf) (C),		\
-					  (__v8hf) (D),		\
-					  (B), (E)))
+#define _mm512_fmadd_round_pch(A, B, C, D)			\
+  (__m512h) __builtin_ia32_vfmaddcph512_round ((A), (B), (C), (D))
 
-#define _mm_mask3_fmadd_round_sch(A, B, C, D, E)			\
-  ((__m128h)								\
-   __builtin_ia32_vfmaddcsh_mask3_round ((__v8hf) (A),		\
-					 (__v8hf) (B),		\
-					 (__v8hf) (C),		\
-					 (D), (E)))
+#define _mm512_mask_fmadd_round_pch(A, B, C, D, E)			\
+  ((__m512h)								\
+    __builtin_ia32_vfmaddcph512_mask_round ((__v32hf) (A),		\
+					    (__v32hf) (C),		\
+					    (__v32hf) (D),		\
+					    (B), (E)))
 
-#define _mm_maskz_fmadd_round_sch(A, B, C, D, E)		\
-  __builtin_ia32_vfmaddcsh_maskz_round ((B), (C), (D), (A), (E))
+#define _mm512_mask3_fmadd_round_pch(A, B, C, D, E)			\
+  (__m512h)								\
+   __builtin_ia32_vfmaddcph512_mask3_round ((A), (B), (C), (D), (E))
 
-#define _mm_fmadd_round_sch(A, B, C, D)		\
-  __builtin_ia32_vfmaddcsh_round ((A), (B), (C), (D))
+#define _mm512_maskz_fmadd_round_pch(A, B, C, D, E)			\
+  (__m512h)								\
+   __builtin_ia32_vfmaddcph512_maskz_round ((B), (C), (D), (A), (E))
 
 #endif /* __OPTIMIZE__ */
 
-/* Intrinsics vf[,c]mulcsh.  */
-extern __inline __m128h
+/* Intrinsics vf[,c]mulcph.  */
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fcmul_sch (__m128h __A, __m128h __B)
+_mm512_fcmul_pch (__m512h __A, __m512h __B)
 {
-  return (__m128h)
-    __builtin_ia32_vfcmulcsh_round ((__v8hf) __A,
-				    (__v8hf) __B,
-				    _MM_FROUND_CUR_DIRECTION);
+  return (__m512h)
+    __builtin_ia32_vfcmulcph512_round ((__v32hf) __A,
+				       (__v32hf) __B,
+				       _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fcmul_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm512_mask_fcmul_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
 {
-  return (__m128h)
-    __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __C,
-					 (__v8hf) __D,
-					 (__v8hf) __A,
-					 __B, _MM_FROUND_CUR_DIRECTION);
+  return (__m512h)
+    __builtin_ia32_vfcmulcph512_mask_round ((__v32hf) __C,
+					    (__v32hf) __D,
+					    (__v32hf) __A,
+					    __B, _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fcmul_sch (__mmask8 __A, __m128h __B, __m128h __C)
+_mm512_maskz_fcmul_pch (__mmask16 __A, __m512h __B, __m512h __C)
 {
-  return (__m128h)
-    __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __B,
-					 (__v8hf) __C,
-					 _mm_setzero_ph (),
-					 __A, _MM_FROUND_CUR_DIRECTION);
+  return (__m512h)
+    __builtin_ia32_vfcmulcph512_mask_round ((__v32hf) __B,
+					    (__v32hf) __C,
+					    _mm512_setzero_ph (),
+					    __A, _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fmul_sch (__m128h __A, __m128h __B)
+_mm512_fmul_pch (__m512h __A, __m512h __B)
 {
-  return (__m128h)
-    __builtin_ia32_vfmulcsh_round ((__v8hf) __A,
-				   (__v8hf) __B,
-				   _MM_FROUND_CUR_DIRECTION);
+  return (__m512h)
+    __builtin_ia32_vfmulcph512_round ((__v32hf) __A,
+				      (__v32hf) __B,
+				      _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmul_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm512_mask_fmul_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
 {
-  return (__m128h)
-    __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __C,
-					(__v8hf) __D,
-					(__v8hf) __A,
-					__B, _MM_FROUND_CUR_DIRECTION);
+  return (__m512h)
+    __builtin_ia32_vfmulcph512_mask_round ((__v32hf) __C,
+					   (__v32hf) __D,
+					   (__v32hf) __A,
+					   __B, _MM_FROUND_CUR_DIRECTION);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmul_sch (__mmask8 __A, __m128h __B, __m128h __C)
+_mm512_maskz_fmul_pch (__mmask16 __A, __m512h __B, __m512h __C)
 {
-  return (__m128h)
-    __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __B,
-					(__v8hf) __C,
-					_mm_setzero_ph (),
-					__A, _MM_FROUND_CUR_DIRECTION);
+  return (__m512h)
+    __builtin_ia32_vfmulcph512_mask_round ((__v32hf) __B,
+					   (__v32hf) __C,
+					   _mm512_setzero_ph (),
+					   __A, _MM_FROUND_CUR_DIRECTION);
 }
 
 #ifdef __OPTIMIZE__
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fcmul_round_sch (__m128h __A, __m128h __B, const int __D)
+_mm512_fcmul_round_pch (__m512h __A, __m512h __B, const int __D)
 {
-  return (__m128h)
-    __builtin_ia32_vfcmulcsh_round ((__v8hf) __A,
-				    (__v8hf) __B,
-				    __D);
+  return (__m512h)
+    __builtin_ia32_vfcmulcph512_round ((__v32hf) __A,
+				       (__v32hf) __B, __D);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fcmul_round_sch (__m128h __A, __mmask8 __B, __m128h __C,
-			  __m128h __D, const int __E)
+_mm512_mask_fcmul_round_pch (__m512h __A, __mmask16 __B, __m512h __C,
+			     __m512h __D, const int __E)
 {
-  return (__m128h)
-    __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __C,
-					 (__v8hf) __D,
-					 (__v8hf) __A,
-					 __B, __E);
+  return (__m512h)
+    __builtin_ia32_vfcmulcph512_mask_round ((__v32hf) __C,
+					    (__v32hf) __D,
+					    (__v32hf) __A,
+					    __B, __E);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fcmul_round_sch (__mmask8 __A, __m128h __B, __m128h __C,
-			   const int __E)
+_mm512_maskz_fcmul_round_pch (__mmask16 __A, __m512h __B,
+			      __m512h __C, const int __E)
 {
-  return (__m128h)
-    __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __B,
-					 (__v8hf) __C,
-					 _mm_setzero_ph (),
-					 __A, __E);
+  return (__m512h)
+    __builtin_ia32_vfcmulcph512_mask_round ((__v32hf) __B,
+					    (__v32hf) __C,
+					    _mm512_setzero_ph (),
+					    __A, __E);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fmul_round_sch (__m128h __A, __m128h __B, const int __D)
+_mm512_fmul_round_pch (__m512h __A, __m512h __B, const int __D)
 {
-  return (__m128h)
-    __builtin_ia32_vfmulcsh_round ((__v8hf) __A,
-				   (__v8hf) __B, __D);
+  return (__m512h)
+    __builtin_ia32_vfmulcph512_round ((__v32hf) __A,
+				      (__v32hf) __B,
+				      __D);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmul_round_sch (__m128h __A, __mmask8 __B, __m128h __C,
-			 __m128h __D, const int __E)
+_mm512_mask_fmul_round_pch (__m512h __A, __mmask16 __B, __m512h __C,
+			    __m512h __D, const int __E)
 {
-  return (__m128h)
-    __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __C,
-					(__v8hf) __D,
-					(__v8hf) __A,
-					__B, __E);
+  return (__m512h)
+    __builtin_ia32_vfmulcph512_mask_round ((__v32hf) __C,
+					   (__v32hf) __D,
+					   (__v32hf) __A,
+					   __B, __E);
 }
 
-extern __inline __m128h
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmul_round_sch (__mmask8 __A, __m128h __B, __m128h __C, const int __E)
+_mm512_maskz_fmul_round_pch (__mmask16 __A, __m512h __B,
+			     __m512h __C, const int __E)
 {
-  return (__m128h)
-    __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __B,
-					(__v8hf) __C,
-					_mm_setzero_ph (),
-					__A, __E);
+  return (__m512h)
+    __builtin_ia32_vfmulcph512_mask_round ((__v32hf) __B,
+					   (__v32hf) __C,
+					   _mm512_setzero_ph (),
+					   __A, __E);
 }
 
 #else
-#define _mm_fcmul_round_sch(__A, __B, __D)				\
-  (__m128h) __builtin_ia32_vfcmulcsh_round ((__v8hf) __A,		\
-					    (__v8hf) __B, __D)
+#define _mm512_fcmul_round_pch(A, B, D)				\
+  (__m512h) __builtin_ia32_vfcmulcph512_round ((A), (B), (D))
 
-#define _mm_mask_fcmul_round_sch(__A, __B, __C, __D, __E)		\
-  (__m128h) __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __C,		\
-						 (__v8hf) __D,		\
-						 (__v8hf) __A,		\
-						 __B, __E)
+#define _mm512_mask_fcmul_round_pch(A, B, C, D, E)			\
+  (__m512h) __builtin_ia32_vfcmulcph512_mask_round ((C), (D), (A), (B), (E))
 
-#define _mm_maskz_fcmul_round_sch(__A, __B, __C, __E)			\
-  (__m128h) __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __B,		\
-						 (__v8hf) __C,		\
-						 _mm_setzero_ph (),	\
-						 __A, __E)
+#define _mm512_maskz_fcmul_round_pch(A, B, C, E)			\
+  (__m512h) __builtin_ia32_vfcmulcph512_mask_round ((B), (C),		\
+						    (__v32hf)		\
+						    _mm512_setzero_ph (), \
+						    (A), (E))
 
-#define _mm_fmul_round_sch(__A, __B, __D)				\
-  (__m128h) __builtin_ia32_vfmulcsh_round ((__v8hf) __A,		\
-					   (__v8hf) __B, __D)
+#define _mm512_fmul_round_pch(A, B, D)			\
+  (__m512h) __builtin_ia32_vfmulcph512_round ((A), (B), (D))
 
-#define _mm_mask_fmul_round_sch(__A, __B, __C, __D, __E)		\
-  (__m128h) __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __C,		\
-						(__v8hf) __D,		\
-						(__v8hf) __A,		\
-						__B, __E)
+#define _mm512_mask_fmul_round_pch(A, B, C, D, E)			  \
+  (__m512h) __builtin_ia32_vfmulcph512_mask_round ((C), (D), (A), (B), (E))
 
-#define _mm_maskz_fmul_round_sch(__A, __B, __C, __E)			\
-  (__m128h) __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __B,		\
-						(__v8hf) __C,		\
-						_mm_setzero_ph (),	\
-						__A, __E)
+#define _mm512_maskz_fmul_round_pch(A, B, C, E)				  \
+  (__m512h) __builtin_ia32_vfmulcph512_mask_round ((B), (C),		  \
+						   (__v32hf)		  \
+						   _mm512_setzero_ph (),  \
+						   (A), (E))
 
 #endif /* __OPTIMIZE__ */
 
@@ -7193,27 +7238,9 @@ _mm512_set1_pch (_Float16 _Complex __A)
 #define _mm512_maskz_cmul_round_pch(U, A, B, R)			      \
   _mm512_maskz_fcmul_round_pch ((U), (A), (B), (R))
 
-#define _mm_mul_sch(A, B) _mm_fmul_sch ((A), (B))
-#define _mm_mask_mul_sch(W, U, A, B) _mm_mask_fmul_sch ((W), (U), (A), (B))
-#define _mm_maskz_mul_sch(U, A, B) _mm_maskz_fmul_sch ((U), (A), (B))
-#define _mm_mul_round_sch(A, B, R) _mm_fmul_round_sch ((A), (B), (R))
-#define _mm_mask_mul_round_sch(W, U, A, B, R)			      \
-  _mm_mask_fmul_round_sch ((W), (U), (A), (B), (R))
-#define _mm_maskz_mul_round_sch(U, A, B, R)			      \
-  _mm_maskz_fmul_round_sch ((U), (A), (B), (R))
-
-#define _mm_cmul_sch(A, B) _mm_fcmul_sch ((A), (B))
-#define _mm_mask_cmul_sch(W, U, A, B) _mm_mask_fcmul_sch ((W), (U), (A), (B))
-#define _mm_maskz_cmul_sch(U, A, B) _mm_maskz_fcmul_sch ((U), (A), (B))
-#define _mm_cmul_round_sch(A, B, R) _mm_fcmul_round_sch ((A), (B), (R))
-#define _mm_mask_cmul_round_sch(W, U, A, B, R)			      \
-  _mm_mask_fcmul_round_sch ((W), (U), (A), (B), (R))
-#define _mm_maskz_cmul_round_sch(U, A, B, R)			      \
-  _mm_maskz_fcmul_round_sch ((U), (A), (B), (R))
-
-#ifdef __DISABLE_AVX512FP16__
-#undef __DISABLE_AVX512FP16__
+#ifdef __DISABLE_AVX512FP16_512__
+#undef __DISABLE_AVX512FP16_512__
 #pragma GCC pop_options
-#endif /* __DISABLE_AVX512FP16__ */
+#endif /* __DISABLE_AVX512FP16_512__ */
 
-#endif /* __AVX512FP16INTRIN_H_INCLUDED */
+#endif /* _AVX512FP16INTRIN_H_INCLUDED */
-- 
2.31.1
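
[Editorial aside, not part of the patch: a minimal usage sketch of the
512-bit FP16 FMA intrinsics reworked in the hunk above, using only
signatures that appear in the diff itself.  It assumes the option
behavior from the cover letter (evex512 is enabled by default when
AVX512FP16 is on); the function name is hypothetical.]

/* Build sketch: gcc -O2 -mavx512fp16 example.c */
#include <immintrin.h>

__m512h
fma_sketch (__m512h a, __m512h b, __m512h c, __mmask32 m)
{
  /* Alternating add/sub FMA; lanes cleared in m are zeroed.  */
  __m512h t = _mm512_maskz_fmaddsub_ph (m, a, b, c);
  /* Embedded-rounding form; the rounding argument must be a
     compile-time constant.  */
  return _mm512_fmadd_round_ph (t, b, c,
				_MM_FROUND_TO_NEAREST_INT
				| _MM_FROUND_NO_EXC);
}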



* [PATCH 07/18] [PATCH 1/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
  2023-09-21  7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
                   ` (5 preceding siblings ...)
  2023-09-21  7:20 ` [PATCH 06/18] [PATCH 5/5] " Hu, Lin1
@ 2023-09-21  7:20 ` Hu, Lin1
  2023-09-21  7:20 ` [PATCH 08/18] [PATCH 2/5] " Hu, Lin1
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21  7:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang

From: Haochen Jiang <haochen.jiang@intel.com>

gcc/ChangeLog:

	* config/i386/i386-builtin.def (BDESC): Add
	OPTION_MASK_ISA2_EVEX512.
	* config/i386/i386-builtins.cc
	(ix86_init_mmx_sse_builtins): Ditto.
---
 gcc/config/i386/i386-builtin.def | 648 +++++++++++++++----------------
 gcc/config/i386/i386-builtins.cc |  72 ++--
 2 files changed, 372 insertions(+), 348 deletions(-)

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 8738b3b6a8a..0cc526383db 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -200,53 +200,53 @@ BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_maskstored256, "__builtin_ia32_mas
 BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_maskstoreq256, "__builtin_ia32_maskstoreq256", IX86_BUILTIN_MASKSTOREQ256, UNKNOWN, (int) VOID_FTYPE_PV4DI_V4DI_V4DI)
 
 /* AVX512F */
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressstorev16sf_mask, "__builtin_ia32_compressstoresf512_mask", IX86_BUILTIN_COMPRESSPSSTORE512, UNKNOWN, (int) VOID_FTYPE_PV16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressstorev16si_mask, "__builtin_ia32_compressstoresi512_mask", IX86_BUILTIN_PCOMPRESSDSTORE512, UNKNOWN, (int) VOID_FTYPE_PV16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressstorev8df_mask, "__builtin_ia32_compressstoredf512_mask", IX86_BUILTIN_COMPRESSPDSTORE512, UNKNOWN, (int) VOID_FTYPE_PV8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressstorev8di_mask, "__builtin_ia32_compressstoredi512_mask", IX86_BUILTIN_PCOMPRESSQSTORE512, UNKNOWN, (int) VOID_FTYPE_PV8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv16sf_mask, "__builtin_ia32_expandloadsf512_mask", IX86_BUILTIN_EXPANDPSLOAD512, UNKNOWN, (int) V16SF_FTYPE_PCV16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16sf_maskz, "__builtin_ia32_expandloadsf512_maskz", IX86_BUILTIN_EXPANDPSLOAD512Z, UNKNOWN, (int) V16SF_FTYPE_PCV16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv16si_mask, "__builtin_ia32_expandloadsi512_mask", IX86_BUILTIN_PEXPANDDLOAD512, UNKNOWN, (int) V16SI_FTYPE_PCV16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16si_maskz, "__builtin_ia32_expandloadsi512_maskz", IX86_BUILTIN_PEXPANDDLOAD512Z, UNKNOWN, (int) V16SI_FTYPE_PCV16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv8df_mask, "__builtin_ia32_expandloaddf512_mask", IX86_BUILTIN_EXPANDPDLOAD512, UNKNOWN, (int) V8DF_FTYPE_PCV8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8df_maskz, "__builtin_ia32_expandloaddf512_maskz", IX86_BUILTIN_EXPANDPDLOAD512Z, UNKNOWN, (int) V8DF_FTYPE_PCV8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv8di_mask, "__builtin_ia32_expandloaddi512_mask", IX86_BUILTIN_PEXPANDQLOAD512, UNKNOWN, (int) V8DI_FTYPE_PCV8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8di_maskz, "__builtin_ia32_expandloaddi512_maskz", IX86_BUILTIN_PEXPANDQLOAD512Z, UNKNOWN, (int) V8DI_FTYPE_PCV8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv16si_mask, "__builtin_ia32_loaddqusi512_mask", IX86_BUILTIN_LOADDQUSI512, UNKNOWN, (int) V16SI_FTYPE_PCINT_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv8di_mask, "__builtin_ia32_loaddqudi512_mask", IX86_BUILTIN_LOADDQUDI512, UNKNOWN, (int) V8DI_FTYPE_PCINT64_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv8df_mask, "__builtin_ia32_loadupd512_mask", IX86_BUILTIN_LOADUPD512, UNKNOWN, (int) V8DF_FTYPE_PCDOUBLE_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv16sf_mask, "__builtin_ia32_loadups512_mask", IX86_BUILTIN_LOADUPS512, UNKNOWN, (int) V16SF_FTYPE_PCFLOAT_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv16sf_mask, "__builtin_ia32_loadaps512_mask", IX86_BUILTIN_LOADAPS512, UNKNOWN, (int) V16SF_FTYPE_PCV16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv16si_mask, "__builtin_ia32_movdqa32load512_mask", IX86_BUILTIN_MOVDQA32LOAD512, UNKNOWN, (int) V16SI_FTYPE_PCV16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv8df_mask, "__builtin_ia32_loadapd512_mask", IX86_BUILTIN_LOADAPD512, UNKNOWN, (int) V8DF_FTYPE_PCV8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv8di_mask, "__builtin_ia32_movdqa64load512_mask", IX86_BUILTIN_MOVDQA64LOAD512, UNKNOWN, (int) V8DI_FTYPE_PCV8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movntv16sf, "__builtin_ia32_movntps512", IX86_BUILTIN_MOVNTPS512, UNKNOWN, (int) VOID_FTYPE_PFLOAT_V16SF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movntv8df, "__builtin_ia32_movntpd512", IX86_BUILTIN_MOVNTPD512, UNKNOWN, (int) VOID_FTYPE_PDOUBLE_V8DF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movntv8di, "__builtin_ia32_movntdq512", IX86_BUILTIN_MOVNTDQ512, UNKNOWN, (int) VOID_FTYPE_PV8DI_V8DI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movntdqa, "__builtin_ia32_movntdqa512", IX86_BUILTIN_MOVNTDQA512, UNKNOWN, (int) V8DI_FTYPE_PV8DI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storev16si_mask, "__builtin_ia32_storedqusi512_mask", IX86_BUILTIN_STOREDQUSI512, UNKNOWN, (int) VOID_FTYPE_PINT_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storev8di_mask, "__builtin_ia32_storedqudi512_mask", IX86_BUILTIN_STOREDQUDI512, UNKNOWN, (int) VOID_FTYPE_PINT64_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storev8df_mask, "__builtin_ia32_storeupd512_mask", IX86_BUILTIN_STOREUPD512, UNKNOWN, (int) VOID_FTYPE_PDOUBLE_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev8div8si2_mask_store, "__builtin_ia32_pmovusqd512mem_mask", IX86_BUILTIN_PMOVUSQD512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8SI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev8div8si2_mask_store, "__builtin_ia32_pmovsqd512mem_mask", IX86_BUILTIN_PMOVSQD512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8SI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev8div8si2_mask_store, "__builtin_ia32_pmovqd512mem_mask", IX86_BUILTIN_PMOVQD512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8SI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev8div8hi2_mask_store, "__builtin_ia32_pmovusqw512mem_mask", IX86_BUILTIN_PMOVUSQW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8HI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev8div8hi2_mask_store, "__builtin_ia32_pmovsqw512mem_mask", IX86_BUILTIN_PMOVSQW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8HI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev8div8hi2_mask_store, "__builtin_ia32_pmovqw512mem_mask", IX86_BUILTIN_PMOVQW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8HI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev16siv16hi2_mask_store, "__builtin_ia32_pmovusdw512mem_mask", IX86_BUILTIN_PMOVUSDW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16HI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev16siv16hi2_mask_store, "__builtin_ia32_pmovsdw512mem_mask", IX86_BUILTIN_PMOVSDW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16HI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev16siv16hi2_mask_store, "__builtin_ia32_pmovdw512mem_mask", IX86_BUILTIN_PMOVDW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16HI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev8div16qi2_mask_store_2, "__builtin_ia32_pmovqb512mem_mask", IX86_BUILTIN_PMOVQB512_MEM, UNKNOWN, (int) VOID_FTYPE_PUDI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev8div16qi2_mask_store_2, "__builtin_ia32_pmovusqb512mem_mask", IX86_BUILTIN_PMOVUSQB512_MEM, UNKNOWN, (int) VOID_FTYPE_PUDI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev8div16qi2_mask_store_2, "__builtin_ia32_pmovsqb512mem_mask", IX86_BUILTIN_PMOVSQB512_MEM, UNKNOWN, (int) VOID_FTYPE_PUDI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev16siv16qi2_mask_store, "__builtin_ia32_pmovusdb512mem_mask", IX86_BUILTIN_PMOVUSDB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev16siv16qi2_mask_store, "__builtin_ia32_pmovsdb512mem_mask", IX86_BUILTIN_PMOVSDB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev16siv16qi2_mask_store, "__builtin_ia32_pmovdb512mem_mask", IX86_BUILTIN_PMOVDB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storev16sf_mask, "__builtin_ia32_storeups512_mask", IX86_BUILTIN_STOREUPS512, UNKNOWN, (int) VOID_FTYPE_PFLOAT_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storev16sf_mask, "__builtin_ia32_storeaps512_mask", IX86_BUILTIN_STOREAPS512, UNKNOWN, (int) VOID_FTYPE_PV16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storev16si_mask, "__builtin_ia32_movdqa32store512_mask", IX86_BUILTIN_MOVDQA32STORE512, UNKNOWN, (int) VOID_FTYPE_PV16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storev8df_mask, "__builtin_ia32_storeapd512_mask", IX86_BUILTIN_STOREAPD512, UNKNOWN, (int) VOID_FTYPE_PV8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storev8di_mask, "__builtin_ia32_movdqa64store512_mask", IX86_BUILTIN_MOVDQA64STORE512, UNKNOWN, (int) VOID_FTYPE_PV8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_compressstorev16sf_mask, "__builtin_ia32_compressstoresf512_mask", IX86_BUILTIN_COMPRESSPSSTORE512, UNKNOWN, (int) VOID_FTYPE_PV16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_compressstorev16si_mask, "__builtin_ia32_compressstoresi512_mask", IX86_BUILTIN_PCOMPRESSDSTORE512, UNKNOWN, (int) VOID_FTYPE_PV16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_compressstorev8df_mask, "__builtin_ia32_compressstoredf512_mask", IX86_BUILTIN_COMPRESSPDSTORE512, UNKNOWN, (int) VOID_FTYPE_PV8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_compressstorev8di_mask, "__builtin_ia32_compressstoredi512_mask", IX86_BUILTIN_PCOMPRESSQSTORE512, UNKNOWN, (int) VOID_FTYPE_PV8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv16sf_mask, "__builtin_ia32_expandloadsf512_mask", IX86_BUILTIN_EXPANDPSLOAD512, UNKNOWN, (int) V16SF_FTYPE_PCV16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_expandv16sf_maskz, "__builtin_ia32_expandloadsf512_maskz", IX86_BUILTIN_EXPANDPSLOAD512Z, UNKNOWN, (int) V16SF_FTYPE_PCV16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv16si_mask, "__builtin_ia32_expandloadsi512_mask", IX86_BUILTIN_PEXPANDDLOAD512, UNKNOWN, (int) V16SI_FTYPE_PCV16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_expandv16si_maskz, "__builtin_ia32_expandloadsi512_maskz", IX86_BUILTIN_PEXPANDDLOAD512Z, UNKNOWN, (int) V16SI_FTYPE_PCV16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv8df_mask, "__builtin_ia32_expandloaddf512_mask", IX86_BUILTIN_EXPANDPDLOAD512, UNKNOWN, (int) V8DF_FTYPE_PCV8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_expandv8df_maskz, "__builtin_ia32_expandloaddf512_maskz", IX86_BUILTIN_EXPANDPDLOAD512Z, UNKNOWN, (int) V8DF_FTYPE_PCV8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv8di_mask, "__builtin_ia32_expandloaddi512_mask", IX86_BUILTIN_PEXPANDQLOAD512, UNKNOWN, (int) V8DI_FTYPE_PCV8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_expandv8di_maskz, "__builtin_ia32_expandloaddi512_maskz", IX86_BUILTIN_PEXPANDQLOAD512Z, UNKNOWN, (int) V8DI_FTYPE_PCV8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv16si_mask, "__builtin_ia32_loaddqusi512_mask", IX86_BUILTIN_LOADDQUSI512, UNKNOWN, (int) V16SI_FTYPE_PCINT_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv8di_mask, "__builtin_ia32_loaddqudi512_mask", IX86_BUILTIN_LOADDQUDI512, UNKNOWN, (int) V8DI_FTYPE_PCINT64_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv8df_mask, "__builtin_ia32_loadupd512_mask", IX86_BUILTIN_LOADUPD512, UNKNOWN, (int) V8DF_FTYPE_PCDOUBLE_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv16sf_mask, "__builtin_ia32_loadups512_mask", IX86_BUILTIN_LOADUPS512, UNKNOWN, (int) V16SF_FTYPE_PCFLOAT_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv16sf_mask, "__builtin_ia32_loadaps512_mask", IX86_BUILTIN_LOADAPS512, UNKNOWN, (int) V16SF_FTYPE_PCV16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv16si_mask, "__builtin_ia32_movdqa32load512_mask", IX86_BUILTIN_MOVDQA32LOAD512, UNKNOWN, (int) V16SI_FTYPE_PCV16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv8df_mask, "__builtin_ia32_loadapd512_mask", IX86_BUILTIN_LOADAPD512, UNKNOWN, (int) V8DF_FTYPE_PCV8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv8di_mask, "__builtin_ia32_movdqa64load512_mask", IX86_BUILTIN_MOVDQA64LOAD512, UNKNOWN, (int) V8DI_FTYPE_PCV8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_movntv16sf, "__builtin_ia32_movntps512", IX86_BUILTIN_MOVNTPS512, UNKNOWN, (int) VOID_FTYPE_PFLOAT_V16SF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_movntv8df, "__builtin_ia32_movntpd512", IX86_BUILTIN_MOVNTPD512, UNKNOWN, (int) VOID_FTYPE_PDOUBLE_V8DF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_movntv8di, "__builtin_ia32_movntdq512", IX86_BUILTIN_MOVNTDQ512, UNKNOWN, (int) VOID_FTYPE_PV8DI_V8DI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_movntdqa, "__builtin_ia32_movntdqa512", IX86_BUILTIN_MOVNTDQA512, UNKNOWN, (int) V8DI_FTYPE_PV8DI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_storev16si_mask, "__builtin_ia32_storedqusi512_mask", IX86_BUILTIN_STOREDQUSI512, UNKNOWN, (int) VOID_FTYPE_PINT_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_storev8di_mask, "__builtin_ia32_storedqudi512_mask", IX86_BUILTIN_STOREDQUDI512, UNKNOWN, (int) VOID_FTYPE_PINT64_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_storev8df_mask, "__builtin_ia32_storeupd512_mask", IX86_BUILTIN_STOREUPD512, UNKNOWN, (int) VOID_FTYPE_PDOUBLE_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev8div8si2_mask_store, "__builtin_ia32_pmovusqd512mem_mask", IX86_BUILTIN_PMOVUSQD512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8SI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev8div8si2_mask_store, "__builtin_ia32_pmovsqd512mem_mask", IX86_BUILTIN_PMOVSQD512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8SI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev8div8si2_mask_store, "__builtin_ia32_pmovqd512mem_mask", IX86_BUILTIN_PMOVQD512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8SI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev8div8hi2_mask_store, "__builtin_ia32_pmovusqw512mem_mask", IX86_BUILTIN_PMOVUSQW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8HI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev8div8hi2_mask_store, "__builtin_ia32_pmovsqw512mem_mask", IX86_BUILTIN_PMOVSQW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8HI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev8div8hi2_mask_store, "__builtin_ia32_pmovqw512mem_mask", IX86_BUILTIN_PMOVQW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8HI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev16siv16hi2_mask_store, "__builtin_ia32_pmovusdw512mem_mask", IX86_BUILTIN_PMOVUSDW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16HI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev16siv16hi2_mask_store, "__builtin_ia32_pmovsdw512mem_mask", IX86_BUILTIN_PMOVSDW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16HI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev16siv16hi2_mask_store, "__builtin_ia32_pmovdw512mem_mask", IX86_BUILTIN_PMOVDW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16HI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev8div16qi2_mask_store_2, "__builtin_ia32_pmovqb512mem_mask", IX86_BUILTIN_PMOVQB512_MEM, UNKNOWN, (int) VOID_FTYPE_PUDI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev8div16qi2_mask_store_2, "__builtin_ia32_pmovusqb512mem_mask", IX86_BUILTIN_PMOVUSQB512_MEM, UNKNOWN, (int) VOID_FTYPE_PUDI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev8div16qi2_mask_store_2, "__builtin_ia32_pmovsqb512mem_mask", IX86_BUILTIN_PMOVSQB512_MEM, UNKNOWN, (int) VOID_FTYPE_PUDI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev16siv16qi2_mask_store, "__builtin_ia32_pmovusdb512mem_mask", IX86_BUILTIN_PMOVUSDB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev16siv16qi2_mask_store, "__builtin_ia32_pmovsdb512mem_mask", IX86_BUILTIN_PMOVSDB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev16siv16qi2_mask_store, "__builtin_ia32_pmovdb512mem_mask", IX86_BUILTIN_PMOVDB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_storev16sf_mask, "__builtin_ia32_storeups512_mask", IX86_BUILTIN_STOREUPS512, UNKNOWN, (int) VOID_FTYPE_PFLOAT_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_storev16sf_mask, "__builtin_ia32_storeaps512_mask", IX86_BUILTIN_STOREAPS512, UNKNOWN, (int) VOID_FTYPE_PV16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_storev16si_mask, "__builtin_ia32_movdqa32store512_mask", IX86_BUILTIN_MOVDQA32STORE512, UNKNOWN, (int) VOID_FTYPE_PV16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_storev8df_mask, "__builtin_ia32_storeapd512_mask", IX86_BUILTIN_STOREAPD512, UNKNOWN, (int) VOID_FTYPE_PV8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_storev8di_mask, "__builtin_ia32_movdqa64store512_mask", IX86_BUILTIN_MOVDQA64STORE512, UNKNOWN, (int) VOID_FTYPE_PV8DI_V8DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loaddf_mask, "__builtin_ia32_loadsd_mask", IX86_BUILTIN_LOADSD_MASK, UNKNOWN, (int) V2DF_FTYPE_PCDOUBLE_V2DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadsf_mask, "__builtin_ia32_loadss_mask", IX86_BUILTIN_LOADSS_MASK, UNKNOWN, (int) V4SF_FTYPE_PCFLOAT_V4SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storedf_mask, "__builtin_ia32_storesd_mask", IX86_BUILTIN_STORESD_MASK, UNKNOWN, (int) VOID_FTYPE_PDOUBLE_V2DF_UQI)
@@ -1360,231 +1360,231 @@ BDESC (OPTION_MASK_ISA_BMI2, 0, CODE_FOR_bmi2_pext_si3, "__builtin_ia32_pext_si"
 BDESC (OPTION_MASK_ISA_BMI2 | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_bmi2_pext_di3, "__builtin_ia32_pext_di", IX86_BUILTIN_PEXT64, UNKNOWN, (int) UINT64_FTYPE_UINT64_UINT64)
 
 /* AVX512F */
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_si512_256si, "__builtin_ia32_si512_256si", IX86_BUILTIN_SI512_SI256, UNKNOWN, (int) V16SI_FTYPE_V8SI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ps512_256ps, "__builtin_ia32_ps512_256ps", IX86_BUILTIN_PS512_PS256, UNKNOWN, (int) V16SF_FTYPE_V8SF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_pd512_256pd, "__builtin_ia32_pd512_256pd", IX86_BUILTIN_PD512_PD256, UNKNOWN, (int) V8DF_FTYPE_V4DF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_si512_si, "__builtin_ia32_si512_si", IX86_BUILTIN_SI512_SI, UNKNOWN, (int) V16SI_FTYPE_V4SI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ps512_ps, "__builtin_ia32_ps512_ps", IX86_BUILTIN_PS512_PS, UNKNOWN, (int) V16SF_FTYPE_V4SF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_pd512_pd, "__builtin_ia32_pd512_pd", IX86_BUILTIN_PD512_PD, UNKNOWN, (int) V8DF_FTYPE_V2DF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_alignv16si_mask, "__builtin_ia32_alignd512_mask", IX86_BUILTIN_ALIGND512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_alignv8di_mask, "__builtin_ia32_alignq512_mask", IX86_BUILTIN_ALIGNQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_blendmv16si, "__builtin_ia32_blendmd_512_mask", IX86_BUILTIN_BLENDMD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_blendmv8df, "__builtin_ia32_blendmpd_512_mask", IX86_BUILTIN_BLENDMPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_blendmv16sf, "__builtin_ia32_blendmps_512_mask", IX86_BUILTIN_BLENDMPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_blendmv8di, "__builtin_ia32_blendmq_512_mask", IX86_BUILTIN_BLENDMQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_broadcastv16sf_mask, "__builtin_ia32_broadcastf32x4_512", IX86_BUILTIN_BROADCASTF32X4_512, UNKNOWN, (int) V16SF_FTYPE_V4SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_broadcastv8df_mask, "__builtin_ia32_broadcastf64x4_512", IX86_BUILTIN_BROADCASTF64X4_512, UNKNOWN, (int) V8DF_FTYPE_V4DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_broadcastv16si_mask, "__builtin_ia32_broadcasti32x4_512", IX86_BUILTIN_BROADCASTI32X4_512, UNKNOWN, (int) V16SI_FTYPE_V4SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_broadcastv8di_mask, "__builtin_ia32_broadcasti64x4_512", IX86_BUILTIN_BROADCASTI64X4_512, UNKNOWN, (int) V8DI_FTYPE_V4DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vec_dupv8df_mask, "__builtin_ia32_broadcastsd512", IX86_BUILTIN_BROADCASTSD512, UNKNOWN, (int) V8DF_FTYPE_V2DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vec_dupv16sf_mask, "__builtin_ia32_broadcastss512", IX86_BUILTIN_BROADCASTSS512, UNKNOWN, (int) V16SF_FTYPE_V4SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_cmpv16si3_mask, "__builtin_ia32_cmpd512_mask", IX86_BUILTIN_CMPD512, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_INT_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_cmpv8di3_mask, "__builtin_ia32_cmpq512_mask", IX86_BUILTIN_CMPQ512, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_INT_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressv8df_mask, "__builtin_ia32_compressdf512_mask", IX86_BUILTIN_COMPRESSPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressv16sf_mask, "__builtin_ia32_compresssf512_mask", IX86_BUILTIN_COMPRESSPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_floatv8siv8df2_mask, "__builtin_ia32_cvtdq2pd512_mask", IX86_BUILTIN_CVTDQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SI_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vcvtps2ph512_mask_sae,  "__builtin_ia32_vcvtps2ph512_mask", IX86_BUILTIN_CVTPS2PH512, UNKNOWN, (int) V16HI_FTYPE_V16SF_INT_V16HI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_floatunsv8siv8df2_mask, "__builtin_ia32_cvtudq2pd512_mask", IX86_BUILTIN_CVTUDQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SI_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_si512_256si, "__builtin_ia32_si512_256si", IX86_BUILTIN_SI512_SI256, UNKNOWN, (int) V16SI_FTYPE_V8SI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ps512_256ps, "__builtin_ia32_ps512_256ps", IX86_BUILTIN_PS512_PS256, UNKNOWN, (int) V16SF_FTYPE_V8SF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_pd512_256pd, "__builtin_ia32_pd512_256pd", IX86_BUILTIN_PD512_PD256, UNKNOWN, (int) V8DF_FTYPE_V4DF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_si512_si, "__builtin_ia32_si512_si", IX86_BUILTIN_SI512_SI, UNKNOWN, (int) V16SI_FTYPE_V4SI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ps512_ps, "__builtin_ia32_ps512_ps", IX86_BUILTIN_PS512_PS, UNKNOWN, (int) V16SF_FTYPE_V4SF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_pd512_pd, "__builtin_ia32_pd512_pd", IX86_BUILTIN_PD512_PD, UNKNOWN, (int) V8DF_FTYPE_V2DF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_alignv16si_mask, "__builtin_ia32_alignd512_mask", IX86_BUILTIN_ALIGND512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_alignv8di_mask, "__builtin_ia32_alignq512_mask", IX86_BUILTIN_ALIGNQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_blendmv16si, "__builtin_ia32_blendmd_512_mask", IX86_BUILTIN_BLENDMD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_blendmv8df, "__builtin_ia32_blendmpd_512_mask", IX86_BUILTIN_BLENDMPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_blendmv16sf, "__builtin_ia32_blendmps_512_mask", IX86_BUILTIN_BLENDMPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_blendmv8di, "__builtin_ia32_blendmq_512_mask", IX86_BUILTIN_BLENDMQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_broadcastv16sf_mask, "__builtin_ia32_broadcastf32x4_512", IX86_BUILTIN_BROADCASTF32X4_512, UNKNOWN, (int) V16SF_FTYPE_V4SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_broadcastv8df_mask, "__builtin_ia32_broadcastf64x4_512", IX86_BUILTIN_BROADCASTF64X4_512, UNKNOWN, (int) V8DF_FTYPE_V4DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_broadcastv16si_mask, "__builtin_ia32_broadcasti32x4_512", IX86_BUILTIN_BROADCASTI32X4_512, UNKNOWN, (int) V16SI_FTYPE_V4SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_broadcastv8di_mask, "__builtin_ia32_broadcasti64x4_512", IX86_BUILTIN_BROADCASTI64X4_512, UNKNOWN, (int) V8DI_FTYPE_V4DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vec_dupv8df_mask, "__builtin_ia32_broadcastsd512", IX86_BUILTIN_BROADCASTSD512, UNKNOWN, (int) V8DF_FTYPE_V2DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vec_dupv16sf_mask, "__builtin_ia32_broadcastss512", IX86_BUILTIN_BROADCASTSS512, UNKNOWN, (int) V16SF_FTYPE_V4SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cmpv16si3_mask, "__builtin_ia32_cmpd512_mask", IX86_BUILTIN_CMPD512, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_INT_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cmpv8di3_mask, "__builtin_ia32_cmpq512_mask", IX86_BUILTIN_CMPQ512, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_INT_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_compressv8df_mask, "__builtin_ia32_compressdf512_mask", IX86_BUILTIN_COMPRESSPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_compressv16sf_mask, "__builtin_ia32_compresssf512_mask", IX86_BUILTIN_COMPRESSPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_floatv8siv8df2_mask, "__builtin_ia32_cvtdq2pd512_mask", IX86_BUILTIN_CVTDQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SI_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vcvtps2ph512_mask_sae,  "__builtin_ia32_vcvtps2ph512_mask", IX86_BUILTIN_CVTPS2PH512, UNKNOWN, (int) V16HI_FTYPE_V16SF_INT_V16HI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_floatunsv8siv8df2_mask, "__builtin_ia32_cvtudq2pd512_mask", IX86_BUILTIN_CVTUDQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SI_V8DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_cvtusi2sd32, "__builtin_ia32_cvtusi2sd32", IX86_BUILTIN_CVTUSI2SD32, UNKNOWN, (int) V2DF_FTYPE_V2DF_UINT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv8df_mask, "__builtin_ia32_expanddf512_mask", IX86_BUILTIN_EXPANDPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8df_maskz, "__builtin_ia32_expanddf512_maskz", IX86_BUILTIN_EXPANDPD512Z, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv16sf_mask, "__builtin_ia32_expandsf512_mask", IX86_BUILTIN_EXPANDPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16sf_maskz, "__builtin_ia32_expandsf512_maskz", IX86_BUILTIN_EXPANDPS512Z, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vextractf32x4_mask, "__builtin_ia32_extractf32x4_mask", IX86_BUILTIN_EXTRACTF32X4, UNKNOWN, (int) V4SF_FTYPE_V16SF_INT_V4SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vextractf64x4_mask, "__builtin_ia32_extractf64x4_mask", IX86_BUILTIN_EXTRACTF64X4, UNKNOWN, (int) V4DF_FTYPE_V8DF_INT_V4DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vextracti32x4_mask, "__builtin_ia32_extracti32x4_mask", IX86_BUILTIN_EXTRACTI32X4, UNKNOWN, (int) V4SI_FTYPE_V16SI_INT_V4SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vextracti64x4_mask, "__builtin_ia32_extracti64x4_mask", IX86_BUILTIN_EXTRACTI64X4, UNKNOWN, (int) V4DI_FTYPE_V8DI_INT_V4DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vinsertf32x4_mask, "__builtin_ia32_insertf32x4_mask", IX86_BUILTIN_INSERTF32X4, UNKNOWN, (int) V16SF_FTYPE_V16SF_V4SF_INT_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vinsertf64x4_mask, "__builtin_ia32_insertf64x4_mask", IX86_BUILTIN_INSERTF64X4, UNKNOWN, (int) V8DF_FTYPE_V8DF_V4DF_INT_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vinserti32x4_mask, "__builtin_ia32_inserti32x4_mask", IX86_BUILTIN_INSERTI32X4, UNKNOWN, (int) V16SI_FTYPE_V16SI_V4SI_INT_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vinserti64x4_mask, "__builtin_ia32_inserti64x4_mask", IX86_BUILTIN_INSERTI64X4, UNKNOWN, (int) V8DI_FTYPE_V8DI_V4DI_INT_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv8df_mask, "__builtin_ia32_movapd512_mask", IX86_BUILTIN_MOVAPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv16sf_mask, "__builtin_ia32_movaps512_mask", IX86_BUILTIN_MOVAPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movddup512_mask, "__builtin_ia32_movddup512_mask", IX86_BUILTIN_MOVDDUP512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv16si_mask, "__builtin_ia32_movdqa32_512_mask", IX86_BUILTIN_MOVDQA32_512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv8di_mask, "__builtin_ia32_movdqa64_512_mask", IX86_BUILTIN_MOVDQA64_512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movshdup512_mask, "__builtin_ia32_movshdup512_mask", IX86_BUILTIN_MOVSHDUP512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movsldup512_mask, "__builtin_ia32_movsldup512_mask", IX86_BUILTIN_MOVSLDUP512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_absv16si2_mask, "__builtin_ia32_pabsd512_mask", IX86_BUILTIN_PABSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_absv8di2_mask, "__builtin_ia32_pabsq512_mask", IX86_BUILTIN_PABSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_addv16si3_mask, "__builtin_ia32_paddd512_mask", IX86_BUILTIN_PADDD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_addv8di3_mask, "__builtin_ia32_paddq512_mask", IX86_BUILTIN_PADDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_andv16si3_mask, "__builtin_ia32_pandd512_mask", IX86_BUILTIN_PANDD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_andnotv16si3_mask, "__builtin_ia32_pandnd512_mask", IX86_BUILTIN_PANDND512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_andnotv8di3_mask, "__builtin_ia32_pandnq512_mask", IX86_BUILTIN_PANDNQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_andv8di3_mask, "__builtin_ia32_pandq512_mask", IX86_BUILTIN_PANDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vec_dupv16si_mask, "__builtin_ia32_pbroadcastd512", IX86_BUILTIN_PBROADCASTD512, UNKNOWN, (int) V16SI_FTYPE_V4SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vec_dup_gprv16si_mask, "__builtin_ia32_pbroadcastd512_gpr_mask", IX86_BUILTIN_PBROADCASTD512_GPR, UNKNOWN, (int) V16SI_FTYPE_SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512CD, 0, CODE_FOR_avx512cd_maskb_vec_dupv8di, "__builtin_ia32_broadcastmb512", IX86_BUILTIN_PBROADCASTMB512, UNKNOWN, (int) V8DI_FTYPE_UQI)
-BDESC (OPTION_MASK_ISA_AVX512CD, 0, CODE_FOR_avx512cd_maskw_vec_dupv16si, "__builtin_ia32_broadcastmw512", IX86_BUILTIN_PBROADCASTMW512, UNKNOWN, (int) V16SI_FTYPE_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vec_dupv8di_mask, "__builtin_ia32_pbroadcastq512", IX86_BUILTIN_PBROADCASTQ512, UNKNOWN, (int) V8DI_FTYPE_V2DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vec_dup_gprv8di_mask, "__builtin_ia32_pbroadcastq512_gpr_mask", IX86_BUILTIN_PBROADCASTQ512_GPR, UNKNOWN, (int) V8DI_FTYPE_DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_eqv16si3_mask, "__builtin_ia32_pcmpeqd512_mask", IX86_BUILTIN_PCMPEQD512_MASK, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_eqv8di3_mask, "__builtin_ia32_pcmpeqq512_mask", IX86_BUILTIN_PCMPEQQ512_MASK, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_gtv16si3_mask, "__builtin_ia32_pcmpgtd512_mask", IX86_BUILTIN_PCMPGTD512_MASK, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_gtv8di3_mask, "__builtin_ia32_pcmpgtq512_mask", IX86_BUILTIN_PCMPGTQ512_MASK, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressv16si_mask, "__builtin_ia32_compresssi512_mask", IX86_BUILTIN_PCOMPRESSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressv8di_mask, "__builtin_ia32_compressdi512_mask", IX86_BUILTIN_PCOMPRESSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv16si_mask, "__builtin_ia32_expandsi512_mask", IX86_BUILTIN_PEXPANDD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16si_maskz, "__builtin_ia32_expandsi512_maskz", IX86_BUILTIN_PEXPANDD512Z, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv8di_mask, "__builtin_ia32_expanddi512_mask", IX86_BUILTIN_PEXPANDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8di_maskz, "__builtin_ia32_expanddi512_maskz", IX86_BUILTIN_PEXPANDQ512Z, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_smaxv16si3_mask, "__builtin_ia32_pmaxsd512_mask", IX86_BUILTIN_PMAXSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_smaxv8di3_mask, "__builtin_ia32_pmaxsq512_mask", IX86_BUILTIN_PMAXSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_umaxv16si3_mask, "__builtin_ia32_pmaxud512_mask", IX86_BUILTIN_PMAXUD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_umaxv8di3_mask, "__builtin_ia32_pmaxuq512_mask", IX86_BUILTIN_PMAXUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sminv16si3_mask, "__builtin_ia32_pminsd512_mask", IX86_BUILTIN_PMINSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sminv8di3_mask, "__builtin_ia32_pminsq512_mask", IX86_BUILTIN_PMINSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_uminv16si3_mask, "__builtin_ia32_pminud512_mask", IX86_BUILTIN_PMINUD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_uminv8di3_mask, "__builtin_ia32_pminuq512_mask", IX86_BUILTIN_PMINUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev16siv16qi2_mask, "__builtin_ia32_pmovdb512_mask", IX86_BUILTIN_PMOVDB512, UNKNOWN, (int) V16QI_FTYPE_V16SI_V16QI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev16siv16hi2_mask, "__builtin_ia32_pmovdw512_mask", IX86_BUILTIN_PMOVDW512, UNKNOWN, (int) V16HI_FTYPE_V16SI_V16HI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev8div16qi2_mask, "__builtin_ia32_pmovqb512_mask", IX86_BUILTIN_PMOVQB512, UNKNOWN, (int) V16QI_FTYPE_V8DI_V16QI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev8div8si2_mask, "__builtin_ia32_pmovqd512_mask", IX86_BUILTIN_PMOVQD512, UNKNOWN, (int) V8SI_FTYPE_V8DI_V8SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev8div8hi2_mask, "__builtin_ia32_pmovqw512_mask", IX86_BUILTIN_PMOVQW512, UNKNOWN, (int) V8HI_FTYPE_V8DI_V8HI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev16siv16qi2_mask, "__builtin_ia32_pmovsdb512_mask", IX86_BUILTIN_PMOVSDB512, UNKNOWN, (int) V16QI_FTYPE_V16SI_V16QI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev16siv16hi2_mask, "__builtin_ia32_pmovsdw512_mask", IX86_BUILTIN_PMOVSDW512, UNKNOWN, (int) V16HI_FTYPE_V16SI_V16HI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev8div16qi2_mask, "__builtin_ia32_pmovsqb512_mask", IX86_BUILTIN_PMOVSQB512, UNKNOWN, (int) V16QI_FTYPE_V8DI_V16QI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev8div8si2_mask, "__builtin_ia32_pmovsqd512_mask", IX86_BUILTIN_PMOVSQD512, UNKNOWN, (int) V8SI_FTYPE_V8DI_V8SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev8div8hi2_mask, "__builtin_ia32_pmovsqw512_mask", IX86_BUILTIN_PMOVSQW512, UNKNOWN, (int) V8HI_FTYPE_V8DI_V8HI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sign_extendv16qiv16si2_mask, "__builtin_ia32_pmovsxbd512_mask", IX86_BUILTIN_PMOVSXBD512, UNKNOWN, (int) V16SI_FTYPE_V16QI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sign_extendv8qiv8di2_mask, "__builtin_ia32_pmovsxbq512_mask", IX86_BUILTIN_PMOVSXBQ512, UNKNOWN, (int) V8DI_FTYPE_V16QI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sign_extendv8siv8di2_mask, "__builtin_ia32_pmovsxdq512_mask", IX86_BUILTIN_PMOVSXDQ512, UNKNOWN, (int) V8DI_FTYPE_V8SI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sign_extendv16hiv16si2_mask, "__builtin_ia32_pmovsxwd512_mask", IX86_BUILTIN_PMOVSXWD512, UNKNOWN, (int) V16SI_FTYPE_V16HI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sign_extendv8hiv8di2_mask, "__builtin_ia32_pmovsxwq512_mask", IX86_BUILTIN_PMOVSXWQ512, UNKNOWN, (int) V8DI_FTYPE_V8HI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev16siv16qi2_mask, "__builtin_ia32_pmovusdb512_mask", IX86_BUILTIN_PMOVUSDB512, UNKNOWN, (int) V16QI_FTYPE_V16SI_V16QI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev16siv16hi2_mask, "__builtin_ia32_pmovusdw512_mask", IX86_BUILTIN_PMOVUSDW512, UNKNOWN, (int) V16HI_FTYPE_V16SI_V16HI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev8div16qi2_mask, "__builtin_ia32_pmovusqb512_mask", IX86_BUILTIN_PMOVUSQB512, UNKNOWN, (int) V16QI_FTYPE_V8DI_V16QI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev8div8si2_mask, "__builtin_ia32_pmovusqd512_mask", IX86_BUILTIN_PMOVUSQD512, UNKNOWN, (int) V8SI_FTYPE_V8DI_V8SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev8div8hi2_mask, "__builtin_ia32_pmovusqw512_mask", IX86_BUILTIN_PMOVUSQW512, UNKNOWN, (int) V8HI_FTYPE_V8DI_V8HI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_zero_extendv16qiv16si2_mask, "__builtin_ia32_pmovzxbd512_mask", IX86_BUILTIN_PMOVZXBD512, UNKNOWN, (int) V16SI_FTYPE_V16QI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_zero_extendv8qiv8di2_mask, "__builtin_ia32_pmovzxbq512_mask", IX86_BUILTIN_PMOVZXBQ512, UNKNOWN, (int) V8DI_FTYPE_V16QI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_zero_extendv8siv8di2_mask, "__builtin_ia32_pmovzxdq512_mask", IX86_BUILTIN_PMOVZXDQ512, UNKNOWN, (int) V8DI_FTYPE_V8SI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_zero_extendv16hiv16si2_mask, "__builtin_ia32_pmovzxwd512_mask", IX86_BUILTIN_PMOVZXWD512, UNKNOWN, (int) V16SI_FTYPE_V16HI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_zero_extendv8hiv8di2_mask, "__builtin_ia32_pmovzxwq512_mask", IX86_BUILTIN_PMOVZXWQ512, UNKNOWN, (int) V8DI_FTYPE_V8HI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_vec_widen_smult_even_v16si_mask, "__builtin_ia32_pmuldq512_mask", IX86_BUILTIN_PMULDQ512, UNKNOWN, (int) V8DI_FTYPE_V16SI_V16SI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_mulv16si3_mask, "__builtin_ia32_pmulld512_mask"  , IX86_BUILTIN_PMULLD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_vec_widen_umult_even_v16si_mask, "__builtin_ia32_pmuludq512_mask", IX86_BUILTIN_PMULUDQ512, UNKNOWN, (int) V8DI_FTYPE_V16SI_V16SI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_iorv16si3_mask, "__builtin_ia32_pord512_mask", IX86_BUILTIN_PORD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_iorv8di3_mask, "__builtin_ia32_porq512_mask", IX86_BUILTIN_PORQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rolv16si_mask, "__builtin_ia32_prold512_mask", IX86_BUILTIN_PROLD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rolv8di_mask, "__builtin_ia32_prolq512_mask", IX86_BUILTIN_PROLQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rolvv16si_mask, "__builtin_ia32_prolvd512_mask", IX86_BUILTIN_PROLVD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rolvv8di_mask, "__builtin_ia32_prolvq512_mask", IX86_BUILTIN_PROLVQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rorv16si_mask, "__builtin_ia32_prord512_mask", IX86_BUILTIN_PRORD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rorv8di_mask, "__builtin_ia32_prorq512_mask", IX86_BUILTIN_PRORQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rorvv16si_mask, "__builtin_ia32_prorvd512_mask", IX86_BUILTIN_PRORVD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rorvv8di_mask, "__builtin_ia32_prorvq512_mask", IX86_BUILTIN_PRORVQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_pshufdv3_mask, "__builtin_ia32_pshufd512_mask", IX86_BUILTIN_PSHUFD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ashlv16si3_mask, "__builtin_ia32_pslld512_mask", IX86_BUILTIN_PSLLD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V4SI_V16SI_UHI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ashlv16si3_mask, "__builtin_ia32_pslldi512_mask", IX86_BUILTIN_PSLLDI512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ashlv8di3_mask, "__builtin_ia32_psllq512_mask", IX86_BUILTIN_PSLLQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V2DI_V8DI_UQI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ashlv8di3_mask, "__builtin_ia32_psllqi512_mask", IX86_BUILTIN_PSLLQI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ashlvv16si_mask, "__builtin_ia32_psllv16si_mask", IX86_BUILTIN_PSLLVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ashlvv8di_mask, "__builtin_ia32_psllv8di_mask", IX86_BUILTIN_PSLLVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ashrv16si3_mask, "__builtin_ia32_psrad512_mask", IX86_BUILTIN_PSRAD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V4SI_V16SI_UHI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ashrv16si3_mask, "__builtin_ia32_psradi512_mask", IX86_BUILTIN_PSRADI512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ashrv8di3_mask, "__builtin_ia32_psraq512_mask", IX86_BUILTIN_PSRAQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V2DI_V8DI_UQI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ashrv8di3_mask, "__builtin_ia32_psraqi512_mask", IX86_BUILTIN_PSRAQI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ashrvv16si_mask, "__builtin_ia32_psrav16si_mask", IX86_BUILTIN_PSRAVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ashrvv8di_mask, "__builtin_ia32_psrav8di_mask", IX86_BUILTIN_PSRAVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_lshrv16si3_mask, "__builtin_ia32_psrld512_mask", IX86_BUILTIN_PSRLD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V4SI_V16SI_UHI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_lshrv16si3_mask, "__builtin_ia32_psrldi512_mask", IX86_BUILTIN_PSRLDI512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_lshrv8di3_mask, "__builtin_ia32_psrlq512_mask", IX86_BUILTIN_PSRLQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V2DI_V8DI_UQI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_lshrv8di3_mask, "__builtin_ia32_psrlqi512_mask", IX86_BUILTIN_PSRLQI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_lshrvv16si_mask, "__builtin_ia32_psrlv16si_mask", IX86_BUILTIN_PSRLVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_lshrvv8di_mask, "__builtin_ia32_psrlv8di_mask", IX86_BUILTIN_PSRLVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_subv16si3_mask, "__builtin_ia32_psubd512_mask", IX86_BUILTIN_PSUBD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_subv8di3_mask, "__builtin_ia32_psubq512_mask", IX86_BUILTIN_PSUBQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_testmv16si3_mask, "__builtin_ia32_ptestmd512", IX86_BUILTIN_PTESTMD512, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_testmv8di3_mask, "__builtin_ia32_ptestmq512", IX86_BUILTIN_PTESTMQ512, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_testnmv16si3_mask, "__builtin_ia32_ptestnmd512", IX86_BUILTIN_PTESTNMD512, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_testnmv8di3_mask, "__builtin_ia32_ptestnmq512", IX86_BUILTIN_PTESTNMQ512, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_interleave_highv16si_mask, "__builtin_ia32_punpckhdq512_mask", IX86_BUILTIN_PUNPCKHDQ512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_interleave_highv8di_mask, "__builtin_ia32_punpckhqdq512_mask", IX86_BUILTIN_PUNPCKHQDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_interleave_lowv16si_mask, "__builtin_ia32_punpckldq512_mask", IX86_BUILTIN_PUNPCKLDQ512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_interleave_lowv8di_mask, "__builtin_ia32_punpcklqdq512_mask", IX86_BUILTIN_PUNPCKLQDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_xorv16si3_mask, "__builtin_ia32_pxord512_mask", IX86_BUILTIN_PXORD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_xorv8di3_mask, "__builtin_ia32_pxorq512_mask", IX86_BUILTIN_PXORQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_rcp14v8df_mask, "__builtin_ia32_rcp14pd512_mask", IX86_BUILTIN_RCP14PD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_rcp14v16sf_mask, "__builtin_ia32_rcp14ps512_mask", IX86_BUILTIN_RCP14PS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv8df_mask, "__builtin_ia32_expanddf512_mask", IX86_BUILTIN_EXPANDPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_expandv8df_maskz, "__builtin_ia32_expanddf512_maskz", IX86_BUILTIN_EXPANDPD512Z, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv16sf_mask, "__builtin_ia32_expandsf512_mask", IX86_BUILTIN_EXPANDPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_expandv16sf_maskz, "__builtin_ia32_expandsf512_maskz", IX86_BUILTIN_EXPANDPS512Z, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vextractf32x4_mask, "__builtin_ia32_extractf32x4_mask", IX86_BUILTIN_EXTRACTF32X4, UNKNOWN, (int) V4SF_FTYPE_V16SF_INT_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vextractf64x4_mask, "__builtin_ia32_extractf64x4_mask", IX86_BUILTIN_EXTRACTF64X4, UNKNOWN, (int) V4DF_FTYPE_V8DF_INT_V4DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vextracti32x4_mask, "__builtin_ia32_extracti32x4_mask", IX86_BUILTIN_EXTRACTI32X4, UNKNOWN, (int) V4SI_FTYPE_V16SI_INT_V4SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vextracti64x4_mask, "__builtin_ia32_extracti64x4_mask", IX86_BUILTIN_EXTRACTI64X4, UNKNOWN, (int) V4DI_FTYPE_V8DI_INT_V4DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vinsertf32x4_mask, "__builtin_ia32_insertf32x4_mask", IX86_BUILTIN_INSERTF32X4, UNKNOWN, (int) V16SF_FTYPE_V16SF_V4SF_INT_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vinsertf64x4_mask, "__builtin_ia32_insertf64x4_mask", IX86_BUILTIN_INSERTF64X4, UNKNOWN, (int) V8DF_FTYPE_V8DF_V4DF_INT_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vinserti32x4_mask, "__builtin_ia32_inserti32x4_mask", IX86_BUILTIN_INSERTI32X4, UNKNOWN, (int) V16SI_FTYPE_V16SI_V4SI_INT_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vinserti64x4_mask, "__builtin_ia32_inserti64x4_mask", IX86_BUILTIN_INSERTI64X4, UNKNOWN, (int) V8DI_FTYPE_V8DI_V4DI_INT_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv8df_mask, "__builtin_ia32_movapd512_mask", IX86_BUILTIN_MOVAPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv16sf_mask, "__builtin_ia32_movaps512_mask", IX86_BUILTIN_MOVAPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_movddup512_mask, "__builtin_ia32_movddup512_mask", IX86_BUILTIN_MOVDDUP512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv16si_mask, "__builtin_ia32_movdqa32_512_mask", IX86_BUILTIN_MOVDQA32_512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv8di_mask, "__builtin_ia32_movdqa64_512_mask", IX86_BUILTIN_MOVDQA64_512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_movshdup512_mask, "__builtin_ia32_movshdup512_mask", IX86_BUILTIN_MOVSHDUP512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_movsldup512_mask, "__builtin_ia32_movsldup512_mask", IX86_BUILTIN_MOVSLDUP512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_absv16si2_mask, "__builtin_ia32_pabsd512_mask", IX86_BUILTIN_PABSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_absv8di2_mask, "__builtin_ia32_pabsq512_mask", IX86_BUILTIN_PABSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_addv16si3_mask, "__builtin_ia32_paddd512_mask", IX86_BUILTIN_PADDD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_addv8di3_mask, "__builtin_ia32_paddq512_mask", IX86_BUILTIN_PADDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_andv16si3_mask, "__builtin_ia32_pandd512_mask", IX86_BUILTIN_PANDD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_andnotv16si3_mask, "__builtin_ia32_pandnd512_mask", IX86_BUILTIN_PANDND512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_andnotv8di3_mask, "__builtin_ia32_pandnq512_mask", IX86_BUILTIN_PANDNQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_andv8di3_mask, "__builtin_ia32_pandq512_mask", IX86_BUILTIN_PANDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vec_dupv16si_mask, "__builtin_ia32_pbroadcastd512", IX86_BUILTIN_PBROADCASTD512, UNKNOWN, (int) V16SI_FTYPE_V4SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vec_dup_gprv16si_mask, "__builtin_ia32_pbroadcastd512_gpr_mask", IX86_BUILTIN_PBROADCASTD512_GPR, UNKNOWN, (int) V16SI_FTYPE_SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512CD, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512cd_maskb_vec_dupv8di, "__builtin_ia32_broadcastmb512", IX86_BUILTIN_PBROADCASTMB512, UNKNOWN, (int) V8DI_FTYPE_UQI)
+BDESC (OPTION_MASK_ISA_AVX512CD, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512cd_maskw_vec_dupv16si, "__builtin_ia32_broadcastmw512", IX86_BUILTIN_PBROADCASTMW512, UNKNOWN, (int) V16SI_FTYPE_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vec_dupv8di_mask, "__builtin_ia32_pbroadcastq512", IX86_BUILTIN_PBROADCASTQ512, UNKNOWN, (int) V8DI_FTYPE_V2DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vec_dup_gprv8di_mask, "__builtin_ia32_pbroadcastq512_gpr_mask", IX86_BUILTIN_PBROADCASTQ512_GPR, UNKNOWN, (int) V8DI_FTYPE_DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_eqv16si3_mask, "__builtin_ia32_pcmpeqd512_mask", IX86_BUILTIN_PCMPEQD512_MASK, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_eqv8di3_mask, "__builtin_ia32_pcmpeqq512_mask", IX86_BUILTIN_PCMPEQQ512_MASK, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_gtv16si3_mask, "__builtin_ia32_pcmpgtd512_mask", IX86_BUILTIN_PCMPGTD512_MASK, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_gtv8di3_mask, "__builtin_ia32_pcmpgtq512_mask", IX86_BUILTIN_PCMPGTQ512_MASK, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_compressv16si_mask, "__builtin_ia32_compresssi512_mask", IX86_BUILTIN_PCOMPRESSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_compressv8di_mask, "__builtin_ia32_compressdi512_mask", IX86_BUILTIN_PCOMPRESSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv16si_mask, "__builtin_ia32_expandsi512_mask", IX86_BUILTIN_PEXPANDD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_expandv16si_maskz, "__builtin_ia32_expandsi512_maskz", IX86_BUILTIN_PEXPANDD512Z, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv8di_mask, "__builtin_ia32_expanddi512_mask", IX86_BUILTIN_PEXPANDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_expandv8di_maskz, "__builtin_ia32_expanddi512_maskz", IX86_BUILTIN_PEXPANDQ512Z, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_smaxv16si3_mask, "__builtin_ia32_pmaxsd512_mask", IX86_BUILTIN_PMAXSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_smaxv8di3_mask, "__builtin_ia32_pmaxsq512_mask", IX86_BUILTIN_PMAXSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_umaxv16si3_mask, "__builtin_ia32_pmaxud512_mask", IX86_BUILTIN_PMAXUD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_umaxv8di3_mask, "__builtin_ia32_pmaxuq512_mask", IX86_BUILTIN_PMAXUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_sminv16si3_mask, "__builtin_ia32_pminsd512_mask", IX86_BUILTIN_PMINSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_sminv8di3_mask, "__builtin_ia32_pminsq512_mask", IX86_BUILTIN_PMINSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_uminv16si3_mask, "__builtin_ia32_pminud512_mask", IX86_BUILTIN_PMINUD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_uminv8di3_mask, "__builtin_ia32_pminuq512_mask", IX86_BUILTIN_PMINUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev16siv16qi2_mask, "__builtin_ia32_pmovdb512_mask", IX86_BUILTIN_PMOVDB512, UNKNOWN, (int) V16QI_FTYPE_V16SI_V16QI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev16siv16hi2_mask, "__builtin_ia32_pmovdw512_mask", IX86_BUILTIN_PMOVDW512, UNKNOWN, (int) V16HI_FTYPE_V16SI_V16HI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev8div16qi2_mask, "__builtin_ia32_pmovqb512_mask", IX86_BUILTIN_PMOVQB512, UNKNOWN, (int) V16QI_FTYPE_V8DI_V16QI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev8div8si2_mask, "__builtin_ia32_pmovqd512_mask", IX86_BUILTIN_PMOVQD512, UNKNOWN, (int) V8SI_FTYPE_V8DI_V8SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev8div8hi2_mask, "__builtin_ia32_pmovqw512_mask", IX86_BUILTIN_PMOVQW512, UNKNOWN, (int) V8HI_FTYPE_V8DI_V8HI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev16siv16qi2_mask, "__builtin_ia32_pmovsdb512_mask", IX86_BUILTIN_PMOVSDB512, UNKNOWN, (int) V16QI_FTYPE_V16SI_V16QI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev16siv16hi2_mask, "__builtin_ia32_pmovsdw512_mask", IX86_BUILTIN_PMOVSDW512, UNKNOWN, (int) V16HI_FTYPE_V16SI_V16HI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev8div16qi2_mask, "__builtin_ia32_pmovsqb512_mask", IX86_BUILTIN_PMOVSQB512, UNKNOWN, (int) V16QI_FTYPE_V8DI_V16QI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev8div8si2_mask, "__builtin_ia32_pmovsqd512_mask", IX86_BUILTIN_PMOVSQD512, UNKNOWN, (int) V8SI_FTYPE_V8DI_V8SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev8div8hi2_mask, "__builtin_ia32_pmovsqw512_mask", IX86_BUILTIN_PMOVSQW512, UNKNOWN, (int) V8HI_FTYPE_V8DI_V8HI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sign_extendv16qiv16si2_mask, "__builtin_ia32_pmovsxbd512_mask", IX86_BUILTIN_PMOVSXBD512, UNKNOWN, (int) V16SI_FTYPE_V16QI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sign_extendv8qiv8di2_mask, "__builtin_ia32_pmovsxbq512_mask", IX86_BUILTIN_PMOVSXBQ512, UNKNOWN, (int) V8DI_FTYPE_V16QI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sign_extendv8siv8di2_mask, "__builtin_ia32_pmovsxdq512_mask", IX86_BUILTIN_PMOVSXDQ512, UNKNOWN, (int) V8DI_FTYPE_V8SI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sign_extendv16hiv16si2_mask, "__builtin_ia32_pmovsxwd512_mask", IX86_BUILTIN_PMOVSXWD512, UNKNOWN, (int) V16SI_FTYPE_V16HI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sign_extendv8hiv8di2_mask, "__builtin_ia32_pmovsxwq512_mask", IX86_BUILTIN_PMOVSXWQ512, UNKNOWN, (int) V8DI_FTYPE_V8HI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev16siv16qi2_mask, "__builtin_ia32_pmovusdb512_mask", IX86_BUILTIN_PMOVUSDB512, UNKNOWN, (int) V16QI_FTYPE_V16SI_V16QI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev16siv16hi2_mask, "__builtin_ia32_pmovusdw512_mask", IX86_BUILTIN_PMOVUSDW512, UNKNOWN, (int) V16HI_FTYPE_V16SI_V16HI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev8div16qi2_mask, "__builtin_ia32_pmovusqb512_mask", IX86_BUILTIN_PMOVUSQB512, UNKNOWN, (int) V16QI_FTYPE_V8DI_V16QI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev8div8si2_mask, "__builtin_ia32_pmovusqd512_mask", IX86_BUILTIN_PMOVUSQD512, UNKNOWN, (int) V8SI_FTYPE_V8DI_V8SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev8div8hi2_mask, "__builtin_ia32_pmovusqw512_mask", IX86_BUILTIN_PMOVUSQW512, UNKNOWN, (int) V8HI_FTYPE_V8DI_V8HI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_zero_extendv16qiv16si2_mask, "__builtin_ia32_pmovzxbd512_mask", IX86_BUILTIN_PMOVZXBD512, UNKNOWN, (int) V16SI_FTYPE_V16QI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_zero_extendv8qiv8di2_mask, "__builtin_ia32_pmovzxbq512_mask", IX86_BUILTIN_PMOVZXBQ512, UNKNOWN, (int) V8DI_FTYPE_V16QI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_zero_extendv8siv8di2_mask, "__builtin_ia32_pmovzxdq512_mask", IX86_BUILTIN_PMOVZXDQ512, UNKNOWN, (int) V8DI_FTYPE_V8SI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_zero_extendv16hiv16si2_mask, "__builtin_ia32_pmovzxwd512_mask", IX86_BUILTIN_PMOVZXWD512, UNKNOWN, (int) V16SI_FTYPE_V16HI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_zero_extendv8hiv8di2_mask, "__builtin_ia32_pmovzxwq512_mask", IX86_BUILTIN_PMOVZXWQ512, UNKNOWN, (int) V8DI_FTYPE_V8HI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vec_widen_smult_even_v16si_mask, "__builtin_ia32_pmuldq512_mask", IX86_BUILTIN_PMULDQ512, UNKNOWN, (int) V8DI_FTYPE_V16SI_V16SI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_mulv16si3_mask, "__builtin_ia32_pmulld512_mask", IX86_BUILTIN_PMULLD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vec_widen_umult_even_v16si_mask, "__builtin_ia32_pmuludq512_mask", IX86_BUILTIN_PMULUDQ512, UNKNOWN, (int) V8DI_FTYPE_V16SI_V16SI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_iorv16si3_mask, "__builtin_ia32_pord512_mask", IX86_BUILTIN_PORD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_iorv8di3_mask, "__builtin_ia32_porq512_mask", IX86_BUILTIN_PORQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rolv16si_mask, "__builtin_ia32_prold512_mask", IX86_BUILTIN_PROLD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rolv8di_mask, "__builtin_ia32_prolq512_mask", IX86_BUILTIN_PROLQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rolvv16si_mask, "__builtin_ia32_prolvd512_mask", IX86_BUILTIN_PROLVD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rolvv8di_mask, "__builtin_ia32_prolvq512_mask", IX86_BUILTIN_PROLVQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rorv16si_mask, "__builtin_ia32_prord512_mask", IX86_BUILTIN_PRORD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rorv8di_mask, "__builtin_ia32_prorq512_mask", IX86_BUILTIN_PRORQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rorvv16si_mask, "__builtin_ia32_prorvd512_mask", IX86_BUILTIN_PRORVD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rorvv8di_mask, "__builtin_ia32_prorvq512_mask", IX86_BUILTIN_PRORVQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_pshufdv3_mask, "__builtin_ia32_pshufd512_mask", IX86_BUILTIN_PSHUFD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashlv16si3_mask, "__builtin_ia32_pslld512_mask", IX86_BUILTIN_PSLLD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V4SI_V16SI_UHI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashlv16si3_mask, "__builtin_ia32_pslldi512_mask", IX86_BUILTIN_PSLLDI512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashlv8di3_mask, "__builtin_ia32_psllq512_mask", IX86_BUILTIN_PSLLQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V2DI_V8DI_UQI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashlv8di3_mask, "__builtin_ia32_psllqi512_mask", IX86_BUILTIN_PSLLQI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ashlvv16si_mask, "__builtin_ia32_psllv16si_mask", IX86_BUILTIN_PSLLVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ashlvv8di_mask, "__builtin_ia32_psllv8di_mask", IX86_BUILTIN_PSLLVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashrv16si3_mask, "__builtin_ia32_psrad512_mask", IX86_BUILTIN_PSRAD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V4SI_V16SI_UHI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashrv16si3_mask, "__builtin_ia32_psradi512_mask", IX86_BUILTIN_PSRADI512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashrv8di3_mask, "__builtin_ia32_psraq512_mask", IX86_BUILTIN_PSRAQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V2DI_V8DI_UQI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashrv8di3_mask, "__builtin_ia32_psraqi512_mask", IX86_BUILTIN_PSRAQI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ashrvv16si_mask, "__builtin_ia32_psrav16si_mask", IX86_BUILTIN_PSRAVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ashrvv8di_mask, "__builtin_ia32_psrav8di_mask", IX86_BUILTIN_PSRAVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_lshrv16si3_mask, "__builtin_ia32_psrld512_mask", IX86_BUILTIN_PSRLD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V4SI_V16SI_UHI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_lshrv16si3_mask, "__builtin_ia32_psrldi512_mask", IX86_BUILTIN_PSRLDI512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_lshrv8di3_mask, "__builtin_ia32_psrlq512_mask", IX86_BUILTIN_PSRLQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V2DI_V8DI_UQI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_lshrv8di3_mask, "__builtin_ia32_psrlqi512_mask", IX86_BUILTIN_PSRLQI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_lshrvv16si_mask, "__builtin_ia32_psrlv16si_mask", IX86_BUILTIN_PSRLVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_lshrvv8di_mask, "__builtin_ia32_psrlv8di_mask", IX86_BUILTIN_PSRLVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_subv16si3_mask, "__builtin_ia32_psubd512_mask", IX86_BUILTIN_PSUBD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_subv8di3_mask, "__builtin_ia32_psubq512_mask", IX86_BUILTIN_PSUBQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_testmv16si3_mask, "__builtin_ia32_ptestmd512", IX86_BUILTIN_PTESTMD512, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_testmv8di3_mask, "__builtin_ia32_ptestmq512", IX86_BUILTIN_PTESTMQ512, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_testnmv16si3_mask, "__builtin_ia32_ptestnmd512", IX86_BUILTIN_PTESTNMD512, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_testnmv8di3_mask, "__builtin_ia32_ptestnmq512", IX86_BUILTIN_PTESTNMQ512, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_interleave_highv16si_mask, "__builtin_ia32_punpckhdq512_mask", IX86_BUILTIN_PUNPCKHDQ512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_interleave_highv8di_mask, "__builtin_ia32_punpckhqdq512_mask", IX86_BUILTIN_PUNPCKHQDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_interleave_lowv16si_mask, "__builtin_ia32_punpckldq512_mask", IX86_BUILTIN_PUNPCKLDQ512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_interleave_lowv8di_mask, "__builtin_ia32_punpcklqdq512_mask", IX86_BUILTIN_PUNPCKLQDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_xorv16si3_mask, "__builtin_ia32_pxord512_mask", IX86_BUILTIN_PXORD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_xorv8di3_mask, "__builtin_ia32_pxorq512_mask", IX86_BUILTIN_PXORQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_rcp14v8df_mask, "__builtin_ia32_rcp14pd512_mask", IX86_BUILTIN_RCP14PD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_rcp14v16sf_mask, "__builtin_ia32_rcp14ps512_mask", IX86_BUILTIN_RCP14PS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_srcp14v2df, "__builtin_ia32_rcp14sd", IX86_BUILTIN_RCP14SD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_srcp14v2df_mask, "__builtin_ia32_rcp14sd_mask", IX86_BUILTIN_RCP14SDMASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_srcp14v4sf, "__builtin_ia32_rcp14ss", IX86_BUILTIN_RCP14SS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_srcp14v4sf_mask, "__builtin_ia32_rcp14ss_mask", IX86_BUILTIN_RCP14SSMASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_rsqrt14v8df_mask, "__builtin_ia32_rsqrt14pd512_mask", IX86_BUILTIN_RSQRT14PD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_rsqrt14v16sf_mask, "__builtin_ia32_rsqrt14ps512_mask", IX86_BUILTIN_RSQRT14PS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_rsqrt14v8df_mask, "__builtin_ia32_rsqrt14pd512_mask", IX86_BUILTIN_RSQRT14PD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_rsqrt14v16sf_mask, "__builtin_ia32_rsqrt14ps512_mask", IX86_BUILTIN_RSQRT14PS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_rsqrt14v2df, "__builtin_ia32_rsqrt14sd", IX86_BUILTIN_RSQRT14SD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_rsqrt14_v2df_mask, "__builtin_ia32_rsqrt14sd_mask", IX86_BUILTIN_RSQRT14SDMASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_rsqrt14v4sf, "__builtin_ia32_rsqrt14ss", IX86_BUILTIN_RSQRT14SS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_rsqrt14_v4sf_mask, "__builtin_ia32_rsqrt14ss_mask", IX86_BUILTIN_RSQRT14SSMASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_shufpd512_mask, "__builtin_ia32_shufpd512_mask", IX86_BUILTIN_SHUFPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_shufps512_mask, "__builtin_ia32_shufps512_mask", IX86_BUILTIN_SHUFPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_shuf_f32x4_mask, "__builtin_ia32_shuf_f32x4_mask", IX86_BUILTIN_SHUF_F32x4, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_shuf_f64x2_mask, "__builtin_ia32_shuf_f64x2_mask", IX86_BUILTIN_SHUF_F64x2, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_shuf_i32x4_mask, "__builtin_ia32_shuf_i32x4_mask", IX86_BUILTIN_SHUF_I32x4, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_shuf_i64x2_mask, "__builtin_ia32_shuf_i64x2_mask", IX86_BUILTIN_SHUF_I64x2, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ucmpv16si3_mask, "__builtin_ia32_ucmpd512_mask", IX86_BUILTIN_UCMPD512, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_INT_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ucmpv8di3_mask, "__builtin_ia32_ucmpq512_mask", IX86_BUILTIN_UCMPQ512, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_INT_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_unpckhpd512_mask, "__builtin_ia32_unpckhpd512_mask", IX86_BUILTIN_UNPCKHPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_unpckhps512_mask, "__builtin_ia32_unpckhps512_mask", IX86_BUILTIN_UNPCKHPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_unpcklpd512_mask, "__builtin_ia32_unpcklpd512_mask", IX86_BUILTIN_UNPCKLPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_unpcklps512_mask,  "__builtin_ia32_unpcklps512_mask", IX86_BUILTIN_UNPCKLPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512CD, 0, CODE_FOR_clzv16si2_mask, "__builtin_ia32_vplzcntd_512_mask", IX86_BUILTIN_VPCLZCNTD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512CD, 0, CODE_FOR_clzv8di2_mask, "__builtin_ia32_vplzcntq_512_mask", IX86_BUILTIN_VPCLZCNTQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512CD, 0, CODE_FOR_conflictv16si_mask, "__builtin_ia32_vpconflictsi_512_mask", IX86_BUILTIN_VPCONFLICTD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512CD, 0, CODE_FOR_conflictv8di_mask, "__builtin_ia32_vpconflictdi_512_mask", IX86_BUILTIN_VPCONFLICTQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_permv8df_mask, "__builtin_ia32_permdf512_mask", IX86_BUILTIN_VPERMDF512, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_permv8di_mask, "__builtin_ia32_permdi512_mask", IX86_BUILTIN_VPERMDI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermi2varv16si3_mask, "__builtin_ia32_vpermi2vard512_mask", IX86_BUILTIN_VPERMI2VARD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermi2varv8df3_mask, "__builtin_ia32_vpermi2varpd512_mask", IX86_BUILTIN_VPERMI2VARPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DI_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermi2varv16sf3_mask, "__builtin_ia32_vpermi2varps512_mask", IX86_BUILTIN_VPERMI2VARPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SI_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermi2varv8di3_mask, "__builtin_ia32_vpermi2varq512_mask", IX86_BUILTIN_VPERMI2VARQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermilv8df_mask, "__builtin_ia32_vpermilpd512_mask", IX86_BUILTIN_VPERMILPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermilv16sf_mask, "__builtin_ia32_vpermilps512_mask", IX86_BUILTIN_VPERMILPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermilvarv8df3_mask, "__builtin_ia32_vpermilvarpd512_mask", IX86_BUILTIN_VPERMILVARPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DI_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermilvarv16sf3_mask, "__builtin_ia32_vpermilvarps512_mask", IX86_BUILTIN_VPERMILVARPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SI_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermt2varv16si3_mask, "__builtin_ia32_vpermt2vard512_mask", IX86_BUILTIN_VPERMT2VARD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermt2varv16si3_maskz, "__builtin_ia32_vpermt2vard512_maskz", IX86_BUILTIN_VPERMT2VARD512_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermt2varv8df3_mask, "__builtin_ia32_vpermt2varpd512_mask", IX86_BUILTIN_VPERMT2VARPD512, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermt2varv8df3_maskz, "__builtin_ia32_vpermt2varpd512_maskz", IX86_BUILTIN_VPERMT2VARPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermt2varv16sf3_mask, "__builtin_ia32_vpermt2varps512_mask", IX86_BUILTIN_VPERMT2VARPS512, UNKNOWN, (int) V16SF_FTYPE_V16SI_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermt2varv16sf3_maskz, "__builtin_ia32_vpermt2varps512_maskz", IX86_BUILTIN_VPERMT2VARPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SI_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermt2varv8di3_mask, "__builtin_ia32_vpermt2varq512_mask", IX86_BUILTIN_VPERMT2VARQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermt2varv8di3_maskz, "__builtin_ia32_vpermt2varq512_maskz", IX86_BUILTIN_VPERMT2VARQ512_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_permvarv8df_mask, "__builtin_ia32_permvardf512_mask", IX86_BUILTIN_VPERMVARDF512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DI_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_permvarv8di_mask, "__builtin_ia32_permvardi512_mask", IX86_BUILTIN_VPERMVARDI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_permvarv16sf_mask, "__builtin_ia32_permvarsf512_mask", IX86_BUILTIN_VPERMVARSF512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SI_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_permvarv16si_mask, "__builtin_ia32_permvarsi512_mask", IX86_BUILTIN_VPERMVARSI512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vternlogv16si_mask, "__builtin_ia32_pternlogd512_mask", IX86_BUILTIN_VTERNLOGD512_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_INT_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vternlogv16si_maskz, "__builtin_ia32_pternlogd512_maskz", IX86_BUILTIN_VTERNLOGD512_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_INT_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vternlogv8di_mask, "__builtin_ia32_pternlogq512_mask", IX86_BUILTIN_VTERNLOGQ512_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_INT_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vternlogv8di_maskz, "__builtin_ia32_pternlogq512_maskz", IX86_BUILTIN_VTERNLOGQ512_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_INT_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_shufpd512_mask, "__builtin_ia32_shufpd512_mask", IX86_BUILTIN_SHUFPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_shufps512_mask, "__builtin_ia32_shufps512_mask", IX86_BUILTIN_SHUFPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_shuf_f32x4_mask, "__builtin_ia32_shuf_f32x4_mask", IX86_BUILTIN_SHUF_F32x4, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_shuf_f64x2_mask, "__builtin_ia32_shuf_f64x2_mask", IX86_BUILTIN_SHUF_F64x2, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_shuf_i32x4_mask, "__builtin_ia32_shuf_i32x4_mask", IX86_BUILTIN_SHUF_I32x4, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_shuf_i64x2_mask, "__builtin_ia32_shuf_i64x2_mask", IX86_BUILTIN_SHUF_I64x2, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ucmpv16si3_mask, "__builtin_ia32_ucmpd512_mask", IX86_BUILTIN_UCMPD512, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_INT_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ucmpv8di3_mask, "__builtin_ia32_ucmpq512_mask", IX86_BUILTIN_UCMPQ512, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_INT_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_unpckhpd512_mask, "__builtin_ia32_unpckhpd512_mask", IX86_BUILTIN_UNPCKHPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_unpckhps512_mask, "__builtin_ia32_unpckhps512_mask", IX86_BUILTIN_UNPCKHPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_unpcklpd512_mask, "__builtin_ia32_unpcklpd512_mask", IX86_BUILTIN_UNPCKLPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_unpcklps512_mask, "__builtin_ia32_unpcklps512_mask", IX86_BUILTIN_UNPCKLPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512CD, OPTION_MASK_ISA2_EVEX512, CODE_FOR_clzv16si2_mask, "__builtin_ia32_vplzcntd_512_mask", IX86_BUILTIN_VPCLZCNTD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512CD, OPTION_MASK_ISA2_EVEX512, CODE_FOR_clzv8di2_mask, "__builtin_ia32_vplzcntq_512_mask", IX86_BUILTIN_VPCLZCNTQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512CD, OPTION_MASK_ISA2_EVEX512, CODE_FOR_conflictv16si_mask, "__builtin_ia32_vpconflictsi_512_mask", IX86_BUILTIN_VPCONFLICTD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512CD, OPTION_MASK_ISA2_EVEX512, CODE_FOR_conflictv8di_mask, "__builtin_ia32_vpconflictdi_512_mask", IX86_BUILTIN_VPCONFLICTQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_permv8df_mask, "__builtin_ia32_permdf512_mask", IX86_BUILTIN_VPERMDF512, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_permv8di_mask, "__builtin_ia32_permdi512_mask", IX86_BUILTIN_VPERMDI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermi2varv16si3_mask, "__builtin_ia32_vpermi2vard512_mask", IX86_BUILTIN_VPERMI2VARD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermi2varv8df3_mask, "__builtin_ia32_vpermi2varpd512_mask", IX86_BUILTIN_VPERMI2VARPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DI_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermi2varv16sf3_mask, "__builtin_ia32_vpermi2varps512_mask", IX86_BUILTIN_VPERMI2VARPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SI_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermi2varv8di3_mask, "__builtin_ia32_vpermi2varq512_mask", IX86_BUILTIN_VPERMI2VARQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermilv8df_mask, "__builtin_ia32_vpermilpd512_mask", IX86_BUILTIN_VPERMILPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermilv16sf_mask, "__builtin_ia32_vpermilps512_mask", IX86_BUILTIN_VPERMILPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermilvarv8df3_mask, "__builtin_ia32_vpermilvarpd512_mask", IX86_BUILTIN_VPERMILVARPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DI_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermilvarv16sf3_mask, "__builtin_ia32_vpermilvarps512_mask", IX86_BUILTIN_VPERMILVARPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SI_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermt2varv16si3_mask, "__builtin_ia32_vpermt2vard512_mask", IX86_BUILTIN_VPERMT2VARD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermt2varv16si3_maskz, "__builtin_ia32_vpermt2vard512_maskz", IX86_BUILTIN_VPERMT2VARD512_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermt2varv8df3_mask, "__builtin_ia32_vpermt2varpd512_mask", IX86_BUILTIN_VPERMT2VARPD512, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermt2varv8df3_maskz, "__builtin_ia32_vpermt2varpd512_maskz", IX86_BUILTIN_VPERMT2VARPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermt2varv16sf3_mask, "__builtin_ia32_vpermt2varps512_mask", IX86_BUILTIN_VPERMT2VARPS512, UNKNOWN, (int) V16SF_FTYPE_V16SI_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermt2varv16sf3_maskz, "__builtin_ia32_vpermt2varps512_maskz", IX86_BUILTIN_VPERMT2VARPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SI_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermt2varv8di3_mask, "__builtin_ia32_vpermt2varq512_mask", IX86_BUILTIN_VPERMT2VARQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermt2varv8di3_maskz, "__builtin_ia32_vpermt2varq512_maskz", IX86_BUILTIN_VPERMT2VARQ512_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_permvarv8df_mask, "__builtin_ia32_permvardf512_mask", IX86_BUILTIN_VPERMVARDF512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DI_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_permvarv8di_mask, "__builtin_ia32_permvardi512_mask", IX86_BUILTIN_VPERMVARDI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_permvarv16sf_mask, "__builtin_ia32_permvarsf512_mask", IX86_BUILTIN_VPERMVARSF512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SI_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_permvarv16si_mask, "__builtin_ia32_permvarsi512_mask", IX86_BUILTIN_VPERMVARSI512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vternlogv16si_mask, "__builtin_ia32_pternlogd512_mask", IX86_BUILTIN_VTERNLOGD512_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_INT_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vternlogv16si_maskz, "__builtin_ia32_pternlogd512_maskz", IX86_BUILTIN_VTERNLOGD512_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_INT_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vternlogv8di_mask, "__builtin_ia32_pternlogq512_mask", IX86_BUILTIN_VTERNLOGQ512_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_INT_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vternlogv8di_maskz, "__builtin_ia32_pternlogq512_maskz", IX86_BUILTIN_VTERNLOGQ512_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_INT_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movdf_mask, "__builtin_ia32_movesd_mask", IX86_BUILTIN_MOVSD_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movsf_mask, "__builtin_ia32_movess_mask", IX86_BUILTIN_MOVSS_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
 
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_copysignv16sf3,  "__builtin_ia32_copysignps512", IX86_BUILTIN_CPYSGNPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_copysignv8df3,  "__builtin_ia32_copysignpd512", IX86_BUILTIN_CPYSGNPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sqrtv8df2, "__builtin_ia32_sqrtpd512", IX86_BUILTIN_SQRTPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sqrtv16sf2, "__builtin_ia32_sqrtps512", IX86_BUILTIN_SQRTPS_NR512, UNKNOWN, (int) V16SF_FTYPE_V16SF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_copysignv16sf3, "__builtin_ia32_copysignps512", IX86_BUILTIN_CPYSGNPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_copysignv8df3, "__builtin_ia32_copysignpd512", IX86_BUILTIN_CPYSGNPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sqrtv8df2, "__builtin_ia32_sqrtpd512", IX86_BUILTIN_SQRTPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_sqrtv16sf2, "__builtin_ia32_sqrtps512", IX86_BUILTIN_SQRTPS_NR512, UNKNOWN, (int) V16SF_FTYPE_V16SF)
 BDESC (OPTION_MASK_ISA_AVX512ER, 0, CODE_FOR_avx512er_exp2v16sf, "__builtin_ia32_exp2ps", IX86_BUILTIN_EXP2PS, UNKNOWN, (int) V16SF_FTYPE_V16SF)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_floorph512", IX86_BUILTIN_FLOORPH512, (enum rtx_code) ROUND_FLOOR, (int) V32HF_FTYPE_V32HF_ROUND)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_ceilph512", IX86_BUILTIN_CEILPH512, (enum rtx_code) ROUND_CEIL, (int) V32HF_FTYPE_V32HF_ROUND)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_truncph512", IX86_BUILTIN_TRUNCPH512, (enum rtx_code) ROUND_TRUNC, (int) V32HF_FTYPE_V32HF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundps512, "__builtin_ia32_floorps512", IX86_BUILTIN_FLOORPS512, (enum rtx_code) ROUND_FLOOR, (int) V16SF_FTYPE_V16SF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundps512, "__builtin_ia32_ceilps512", IX86_BUILTIN_CEILPS512, (enum rtx_code) ROUND_CEIL, (int) V16SF_FTYPE_V16SF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundps512, "__builtin_ia32_truncps512", IX86_BUILTIN_TRUNCPS512, (enum rtx_code) ROUND_TRUNC, (int) V16SF_FTYPE_V16SF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundpd512, "__builtin_ia32_floorpd512", IX86_BUILTIN_FLOORPD512, (enum rtx_code) ROUND_FLOOR, (int) V8DF_FTYPE_V8DF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundpd512, "__builtin_ia32_ceilpd512", IX86_BUILTIN_CEILPD512, (enum rtx_code) ROUND_CEIL, (int) V8DF_FTYPE_V8DF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundpd512, "__builtin_ia32_truncpd512", IX86_BUILTIN_TRUNCPD512, (enum rtx_code) ROUND_TRUNC, (int) V8DF_FTYPE_V8DF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fix_notruncv16sfv16si, "__builtin_ia32_cvtps2dq512", IX86_BUILTIN_CVTPS2DQ512, UNKNOWN, (int) V16SI_FTYPE_V16SF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vec_pack_sfix_v8df, "__builtin_ia32_vec_pack_sfix512", IX86_BUILTIN_VEC_PACK_SFIX512, UNKNOWN, (int) V16SI_FTYPE_V8DF_V8DF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_roundv16sf2_sfix, "__builtin_ia32_roundps_az_sfix512", IX86_BUILTIN_ROUNDPS_AZ_SFIX512, UNKNOWN, (int) V16SI_FTYPE_V16SF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundps512_sfix, "__builtin_ia32_floorps_sfix512", IX86_BUILTIN_FLOORPS_SFIX512, (enum rtx_code) ROUND_FLOOR, (int) V16SI_FTYPE_V16SF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundps512_sfix, "__builtin_ia32_ceilps_sfix512", IX86_BUILTIN_CEILPS_SFIX512, (enum rtx_code) ROUND_CEIL, (int) V16SI_FTYPE_V16SF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_roundv8df2_vec_pack_sfix, "__builtin_ia32_roundpd_az_vec_pack_sfix512", IX86_BUILTIN_ROUNDPD_AZ_VEC_PACK_SFIX512, UNKNOWN, (int) V16SI_FTYPE_V8DF_V8DF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__builtin_ia32_floorpd_vec_pack_sfix512", IX86_BUILTIN_FLOORPD_VEC_PACK_SFIX512, (enum rtx_code) ROUND_FLOOR, (int) V16SI_FTYPE_V8DF_V8DF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__builtin_ia32_ceilpd_vec_pack_sfix512", IX86_BUILTIN_CEILPD_VEC_PACK_SFIX512, (enum rtx_code) ROUND_CEIL, (int) V16SI_FTYPE_V8DF_V8DF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundps512, "__builtin_ia32_floorps512", IX86_BUILTIN_FLOORPS512, (enum rtx_code) ROUND_FLOOR, (int) V16SF_FTYPE_V16SF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundps512, "__builtin_ia32_ceilps512", IX86_BUILTIN_CEILPS512, (enum rtx_code) ROUND_CEIL, (int) V16SF_FTYPE_V16SF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundps512, "__builtin_ia32_truncps512", IX86_BUILTIN_TRUNCPS512, (enum rtx_code) ROUND_TRUNC, (int) V16SF_FTYPE_V16SF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundpd512, "__builtin_ia32_floorpd512", IX86_BUILTIN_FLOORPD512, (enum rtx_code) ROUND_FLOOR, (int) V8DF_FTYPE_V8DF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundpd512, "__builtin_ia32_ceilpd512", IX86_BUILTIN_CEILPD512, (enum rtx_code) ROUND_CEIL, (int) V8DF_FTYPE_V8DF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundpd512, "__builtin_ia32_truncpd512", IX86_BUILTIN_TRUNCPD512, (enum rtx_code) ROUND_TRUNC, (int) V8DF_FTYPE_V8DF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fix_notruncv16sfv16si, "__builtin_ia32_cvtps2dq512", IX86_BUILTIN_CVTPS2DQ512, UNKNOWN, (int) V16SI_FTYPE_V16SF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vec_pack_sfix_v8df, "__builtin_ia32_vec_pack_sfix512", IX86_BUILTIN_VEC_PACK_SFIX512, UNKNOWN, (int) V16SI_FTYPE_V8DF_V8DF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_roundv16sf2_sfix, "__builtin_ia32_roundps_az_sfix512", IX86_BUILTIN_ROUNDPS_AZ_SFIX512, UNKNOWN, (int) V16SI_FTYPE_V16SF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundps512_sfix, "__builtin_ia32_floorps_sfix512", IX86_BUILTIN_FLOORPS_SFIX512, (enum rtx_code) ROUND_FLOOR, (int) V16SI_FTYPE_V16SF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundps512_sfix, "__builtin_ia32_ceilps_sfix512", IX86_BUILTIN_CEILPS_SFIX512, (enum rtx_code) ROUND_CEIL, (int) V16SI_FTYPE_V16SF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_roundv8df2_vec_pack_sfix, "__builtin_ia32_roundpd_az_vec_pack_sfix512", IX86_BUILTIN_ROUNDPD_AZ_VEC_PACK_SFIX512, UNKNOWN, (int) V16SI_FTYPE_V8DF_V8DF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__builtin_ia32_floorpd_vec_pack_sfix512", IX86_BUILTIN_FLOORPD_VEC_PACK_SFIX512, (enum rtx_code) ROUND_FLOOR, (int) V16SI_FTYPE_V8DF_V8DF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__builtin_ia32_ceilpd_vec_pack_sfix512", IX86_BUILTIN_CEILPD_VEC_PACK_SFIX512, (enum rtx_code) ROUND_CEIL, (int) V16SI_FTYPE_V8DF_V8DF_ROUND)
 
 /* Mask arithmetic operations */
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kashiftqi, "__builtin_ia32_kshiftliqi", IX86_BUILTIN_KSHIFTLI8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI_CONST)
@@ -3034,26 +3034,26 @@ BDESC_END (ARGS, ROUND_ARGS)
 
 /* AVX512F.  */
 BDESC_FIRST (round_args, ROUND_ARGS,
-       OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_addv8df3_mask_round, "__builtin_ia32_addpd512_mask", IX86_BUILTIN_ADDPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_addv16sf3_mask_round, "__builtin_ia32_addps512_mask", IX86_BUILTIN_ADDPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+       OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_addv8df3_mask_round, "__builtin_ia32_addpd512_mask", IX86_BUILTIN_ADDPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_addv16sf3_mask_round, "__builtin_ia32_addps512_mask", IX86_BUILTIN_ADDPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmaddv2df3_round, "__builtin_ia32_addsd_round", IX86_BUILTIN_ADDSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmaddv2df3_mask_round, "__builtin_ia32_addsd_mask_round", IX86_BUILTIN_ADDSD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmaddv4sf3_round, "__builtin_ia32_addss_round", IX86_BUILTIN_ADDSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmaddv4sf3_mask_round, "__builtin_ia32_addss_mask_round", IX86_BUILTIN_ADDSS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_cmpv8df3_mask_round, "__builtin_ia32_cmppd512_mask", IX86_BUILTIN_CMPPD512, UNKNOWN, (int) UQI_FTYPE_V8DF_V8DF_INT_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_cmpv16sf3_mask_round, "__builtin_ia32_cmpps512_mask", IX86_BUILTIN_CMPPS512, UNKNOWN, (int) UHI_FTYPE_V16SF_V16SF_INT_UHI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cmpv8df3_mask_round, "__builtin_ia32_cmppd512_mask", IX86_BUILTIN_CMPPD512, UNKNOWN, (int) UQI_FTYPE_V8DF_V8DF_INT_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cmpv16sf3_mask_round, "__builtin_ia32_cmpps512_mask", IX86_BUILTIN_CMPPS512, UNKNOWN, (int) UHI_FTYPE_V16SF_V16SF_INT_UHI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmcmpv2df3_mask_round, "__builtin_ia32_cmpsd_mask", IX86_BUILTIN_CMPSD_MASK, UNKNOWN, (int) UQI_FTYPE_V2DF_V2DF_INT_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmcmpv4sf3_mask_round, "__builtin_ia32_cmpss_mask", IX86_BUILTIN_CMPSS_MASK, UNKNOWN, (int) UQI_FTYPE_V4SF_V4SF_INT_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_comi_round, "__builtin_ia32_vcomisd", IX86_BUILTIN_COMIDF, UNKNOWN, (int) INT_FTYPE_V2DF_V2DF_INT_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_comi_round, "__builtin_ia32_vcomiss", IX86_BUILTIN_COMISF, UNKNOWN, (int) INT_FTYPE_V4SF_V4SF_INT_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_floatv16siv16sf2_mask_round, "__builtin_ia32_cvtdq2ps512_mask", IX86_BUILTIN_CVTDQ2PS512, UNKNOWN, (int) V16SF_FTYPE_V16SI_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_cvtpd2dq512_mask_round, "__builtin_ia32_cvtpd2dq512_mask", IX86_BUILTIN_CVTPD2DQ512, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_cvtpd2ps512_mask_round,  "__builtin_ia32_cvtpd2ps512_mask", IX86_BUILTIN_CVTPD2PS512, UNKNOWN, (int) V8SF_FTYPE_V8DF_V8SF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_fixuns_notruncv8dfv8si2_mask_round, "__builtin_ia32_cvtpd2udq512_mask", IX86_BUILTIN_CVTPD2UDQ512, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vcvtph2ps512_mask_round,  "__builtin_ia32_vcvtph2ps512_mask", IX86_BUILTIN_CVTPH2PS512, UNKNOWN, (int) V16SF_FTYPE_V16HI_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fix_notruncv16sfv16si_mask_round, "__builtin_ia32_cvtps2dq512_mask", IX86_BUILTIN_CVTPS2DQ512_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_cvtps2pd512_mask_round, "__builtin_ia32_cvtps2pd512_mask", IX86_BUILTIN_CVTPS2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SF_V8DF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fixuns_notruncv16sfv16si_mask_round, "__builtin_ia32_cvtps2udq512_mask", IX86_BUILTIN_CVTPS2UDQ512, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_floatv16siv16sf2_mask_round, "__builtin_ia32_cvtdq2ps512_mask", IX86_BUILTIN_CVTDQ2PS512, UNKNOWN, (int) V16SF_FTYPE_V16SI_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtpd2dq512_mask_round, "__builtin_ia32_cvtpd2dq512_mask", IX86_BUILTIN_CVTPD2DQ512, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtpd2ps512_mask_round, "__builtin_ia32_cvtpd2ps512_mask", IX86_BUILTIN_CVTPD2PS512, UNKNOWN, (int) V8SF_FTYPE_V8DF_V8SF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_fixuns_notruncv8dfv8si2_mask_round, "__builtin_ia32_cvtpd2udq512_mask", IX86_BUILTIN_CVTPD2UDQ512, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vcvtph2ps512_mask_round, "__builtin_ia32_vcvtph2ps512_mask", IX86_BUILTIN_CVTPH2PS512, UNKNOWN, (int) V16SF_FTYPE_V16HI_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fix_notruncv16sfv16si_mask_round, "__builtin_ia32_cvtps2dq512_mask", IX86_BUILTIN_CVTPS2DQ512_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtps2pd512_mask_round, "__builtin_ia32_cvtps2pd512_mask", IX86_BUILTIN_CVTPS2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SF_V8DF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fixuns_notruncv16sfv16si_mask_round, "__builtin_ia32_cvtps2udq512_mask", IX86_BUILTIN_CVTPS2UDQ512, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_cvtsd2ss_round, "__builtin_ia32_cvtsd2ss_round", IX86_BUILTIN_CVTSD2SS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V2DF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_cvtsd2ss_mask_round, "__builtin_ia32_cvtsd2ss_mask_round", IX86_BUILTIN_CVTSD2SS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V2DF_V4SF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_sse2_cvtsi2sdq_round, "__builtin_ia32_cvtsi2sd64", IX86_BUILTIN_CVTSI2SD64, UNKNOWN, (int) V2DF_FTYPE_V2DF_INT64_INT)
@@ -3069,64 +3069,64 @@ BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_floatunsv16siv16sf2_mask_round, "__b
 BDESC (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_cvtusi2sd64_round, "__builtin_ia32_cvtusi2sd64", IX86_BUILTIN_CVTUSI2SD64, UNKNOWN, (int) V2DF_FTYPE_V2DF_UINT64_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_cvtusi2ss32_round, "__builtin_ia32_cvtusi2ss32", IX86_BUILTIN_CVTUSI2SS32, UNKNOWN, (int) V4SF_FTYPE_V4SF_UINT_INT)
 BDESC (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_cvtusi2ss64_round, "__builtin_ia32_cvtusi2ss64", IX86_BUILTIN_CVTUSI2SS64, UNKNOWN, (int) V4SF_FTYPE_V4SF_UINT64_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_divv8df3_mask_round, "__builtin_ia32_divpd512_mask", IX86_BUILTIN_DIVPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_divv16sf3_mask_round, "__builtin_ia32_divps512_mask", IX86_BUILTIN_DIVPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_divv8df3_mask_round, "__builtin_ia32_divpd512_mask", IX86_BUILTIN_DIVPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_divv16sf3_mask_round, "__builtin_ia32_divps512_mask", IX86_BUILTIN_DIVPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmdivv2df3_round, "__builtin_ia32_divsd_round", IX86_BUILTIN_DIVSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmdivv2df3_mask_round, "__builtin_ia32_divsd_mask_round", IX86_BUILTIN_DIVSD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmdivv4sf3_round, "__builtin_ia32_divss_round", IX86_BUILTIN_DIVSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmdivv4sf3_mask_round, "__builtin_ia32_divss_mask_round", IX86_BUILTIN_DIVSS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fixupimmv8df_mask_round, "__builtin_ia32_fixupimmpd512_mask", IX86_BUILTIN_FIXUPIMMPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DI_INT_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fixupimmv8df_maskz_round, "__builtin_ia32_fixupimmpd512_maskz", IX86_BUILTIN_FIXUPIMMPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DI_INT_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fixupimmv16sf_mask_round, "__builtin_ia32_fixupimmps512_mask", IX86_BUILTIN_FIXUPIMMPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SI_INT_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fixupimmv16sf_maskz_round, "__builtin_ia32_fixupimmps512_maskz", IX86_BUILTIN_FIXUPIMMPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SI_INT_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fixupimmv8df_mask_round, "__builtin_ia32_fixupimmpd512_mask", IX86_BUILTIN_FIXUPIMMPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DI_INT_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fixupimmv8df_maskz_round, "__builtin_ia32_fixupimmpd512_maskz", IX86_BUILTIN_FIXUPIMMPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DI_INT_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fixupimmv16sf_mask_round, "__builtin_ia32_fixupimmps512_mask", IX86_BUILTIN_FIXUPIMMPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SI_INT_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fixupimmv16sf_maskz_round, "__builtin_ia32_fixupimmps512_maskz", IX86_BUILTIN_FIXUPIMMPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SI_INT_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sfixupimmv2df_mask_round, "__builtin_ia32_fixupimmsd_mask", IX86_BUILTIN_FIXUPIMMSD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DI_INT_QI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sfixupimmv2df_maskz_round, "__builtin_ia32_fixupimmsd_maskz", IX86_BUILTIN_FIXUPIMMSD128_MASKZ, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DI_INT_QI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sfixupimmv4sf_mask_round, "__builtin_ia32_fixupimmss_mask", IX86_BUILTIN_FIXUPIMMSS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SI_INT_QI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sfixupimmv4sf_maskz_round, "__builtin_ia32_fixupimmss_maskz", IX86_BUILTIN_FIXUPIMMSS128_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SI_INT_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_getexpv8df_mask_round, "__builtin_ia32_getexppd512_mask", IX86_BUILTIN_GETEXPPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_getexpv16sf_mask_round, "__builtin_ia32_getexpps512_mask", IX86_BUILTIN_GETEXPPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_getexpv8df_mask_round, "__builtin_ia32_getexppd512_mask", IX86_BUILTIN_GETEXPPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_getexpv16sf_mask_round, "__builtin_ia32_getexpps512_mask", IX86_BUILTIN_GETEXPPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sgetexpv2df_round, "__builtin_ia32_getexpsd128_round", IX86_BUILTIN_GETEXPSD128, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sgetexpv2df_mask_round, "__builtin_ia32_getexpsd_mask_round", IX86_BUILTIN_GETEXPSD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sgetexpv4sf_round, "__builtin_ia32_getexpss128_round", IX86_BUILTIN_GETEXPSS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sgetexpv4sf_mask_round, "__builtin_ia32_getexpss_mask_round", IX86_BUILTIN_GETEXPSS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_getmantv8df_mask_round, "__builtin_ia32_getmantpd512_mask", IX86_BUILTIN_GETMANTPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_getmantv16sf_mask_round, "__builtin_ia32_getmantps512_mask", IX86_BUILTIN_GETMANTPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_getmantv8df_mask_round, "__builtin_ia32_getmantpd512_mask", IX86_BUILTIN_GETMANTPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_getmantv16sf_mask_round, "__builtin_ia32_getmantps512_mask", IX86_BUILTIN_GETMANTPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vgetmantv2df_round, "__builtin_ia32_getmantsd_round", IX86_BUILTIN_GETMANTSD128, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vgetmantv2df_mask_round, "__builtin_ia32_getmantsd_mask_round", IX86_BUILTIN_GETMANTSD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vgetmantv4sf_round, "__builtin_ia32_getmantss_round", IX86_BUILTIN_GETMANTSS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vgetmantv4sf_mask_round, "__builtin_ia32_getmantss_mask_round", IX86_BUILTIN_GETMANTSS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_smaxv8df3_mask_round, "__builtin_ia32_maxpd512_mask", IX86_BUILTIN_MAXPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_smaxv16sf3_mask_round, "__builtin_ia32_maxps512_mask", IX86_BUILTIN_MAXPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_smaxv8df3_mask_round, "__builtin_ia32_maxpd512_mask", IX86_BUILTIN_MAXPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_smaxv16sf3_mask_round, "__builtin_ia32_maxps512_mask", IX86_BUILTIN_MAXPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmsmaxv2df3_round, "__builtin_ia32_maxsd_round", IX86_BUILTIN_MAXSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmsmaxv2df3_mask_round, "__builtin_ia32_maxsd_mask_round", IX86_BUILTIN_MAXSD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmsmaxv4sf3_round, "__builtin_ia32_maxss_round", IX86_BUILTIN_MAXSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmsmaxv4sf3_mask_round, "__builtin_ia32_maxss_mask_round", IX86_BUILTIN_MAXSS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sminv8df3_mask_round, "__builtin_ia32_minpd512_mask", IX86_BUILTIN_MINPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sminv16sf3_mask_round, "__builtin_ia32_minps512_mask", IX86_BUILTIN_MINPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_sminv8df3_mask_round, "__builtin_ia32_minpd512_mask", IX86_BUILTIN_MINPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_sminv16sf3_mask_round, "__builtin_ia32_minps512_mask", IX86_BUILTIN_MINPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmsminv2df3_round, "__builtin_ia32_minsd_round", IX86_BUILTIN_MINSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmsminv2df3_mask_round, "__builtin_ia32_minsd_mask_round", IX86_BUILTIN_MINSD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmsminv4sf3_round, "__builtin_ia32_minss_round", IX86_BUILTIN_MINSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmsminv4sf3_mask_round, "__builtin_ia32_minss_mask_round", IX86_BUILTIN_MINSS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_mulv8df3_mask_round, "__builtin_ia32_mulpd512_mask", IX86_BUILTIN_MULPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_mulv16sf3_mask_round, "__builtin_ia32_mulps512_mask", IX86_BUILTIN_MULPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_mulv8df3_mask_round, "__builtin_ia32_mulpd512_mask", IX86_BUILTIN_MULPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_mulv16sf3_mask_round, "__builtin_ia32_mulps512_mask", IX86_BUILTIN_MULPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmmulv2df3_round, "__builtin_ia32_mulsd_round", IX86_BUILTIN_MULSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmmulv2df3_mask_round, "__builtin_ia32_mulsd_mask_round", IX86_BUILTIN_MULSD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmmulv4sf3_round, "__builtin_ia32_mulss_round", IX86_BUILTIN_MULSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmmulv4sf3_mask_round, "__builtin_ia32_mulss_mask_round", IX86_BUILTIN_MULSS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rndscalev8df_mask_round, "__builtin_ia32_rndscalepd_mask", IX86_BUILTIN_RNDSCALEPD, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rndscalev16sf_mask_round, "__builtin_ia32_rndscaleps_mask", IX86_BUILTIN_RNDSCALEPS, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rndscalev8df_mask_round, "__builtin_ia32_rndscalepd_mask", IX86_BUILTIN_RNDSCALEPD, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rndscalev16sf_mask_round, "__builtin_ia32_rndscaleps_mask", IX86_BUILTIN_RNDSCALEPS, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rndscalev2df_mask_round, "__builtin_ia32_rndscalesd_mask_round", IX86_BUILTIN_RNDSCALESD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rndscalev4sf_mask_round, "__builtin_ia32_rndscaless_mask_round", IX86_BUILTIN_RNDSCALESS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_scalefv8df_mask_round, "__builtin_ia32_scalefpd512_mask", IX86_BUILTIN_SCALEFPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_scalefv16sf_mask_round, "__builtin_ia32_scalefps512_mask", IX86_BUILTIN_SCALEFPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_scalefv8df_mask_round, "__builtin_ia32_scalefpd512_mask", IX86_BUILTIN_SCALEFPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_scalefv16sf_mask_round, "__builtin_ia32_scalefps512_mask", IX86_BUILTIN_SCALEFPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmscalefv2df_mask_round, "__builtin_ia32_scalefsd_mask_round", IX86_BUILTIN_SCALEFSD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmscalefv4sf_mask_round, "__builtin_ia32_scalefss_mask_round", IX86_BUILTIN_SCALEFSS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sqrtv8df2_mask_round, "__builtin_ia32_sqrtpd512_mask", IX86_BUILTIN_SQRTPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sqrtv16sf2_mask_round, "__builtin_ia32_sqrtps512_mask", IX86_BUILTIN_SQRTPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sqrtv8df2_mask_round, "__builtin_ia32_sqrtpd512_mask", IX86_BUILTIN_SQRTPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sqrtv16sf2_mask_round, "__builtin_ia32_sqrtps512_mask", IX86_BUILTIN_SQRTPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmsqrtv2df2_mask_round, "__builtin_ia32_sqrtsd_mask_round", IX86_BUILTIN_SQRTSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmsqrtv4sf2_mask_round, "__builtin_ia32_sqrtss_mask_round", IX86_BUILTIN_SQRTSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_subv8df3_mask_round, "__builtin_ia32_subpd512_mask", IX86_BUILTIN_SUBPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_subv16sf3_mask_round, "__builtin_ia32_subps512_mask", IX86_BUILTIN_SUBPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_subv8df3_mask_round, "__builtin_ia32_subpd512_mask", IX86_BUILTIN_SUBPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_subv16sf3_mask_round, "__builtin_ia32_subps512_mask", IX86_BUILTIN_SUBPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmsubv2df3_round, "__builtin_ia32_subsd_round", IX86_BUILTIN_SUBSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmsubv2df3_mask_round, "__builtin_ia32_subsd_mask_round", IX86_BUILTIN_SUBSD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmsubv4sf3_round, "__builtin_ia32_subss_round", IX86_BUILTIN_SUBSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
@@ -3147,12 +3147,12 @@ BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_cvttss2si_round, "__builtin_ia32
 BDESC (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_sse_cvttss2siq_round, "__builtin_ia32_vcvttss2si64", IX86_BUILTIN_VCVTTSS2SI64, UNKNOWN, (int) INT64_FTYPE_V4SF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vcvttss2usi_round, "__builtin_ia32_vcvttss2usi32", IX86_BUILTIN_VCVTTSS2USI32, UNKNOWN, (int) UINT_FTYPE_V4SF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_avx512f_vcvttss2usiq_round, "__builtin_ia32_vcvttss2usi64", IX86_BUILTIN_VCVTTSS2USI64, UNKNOWN, (int) UINT64_FTYPE_V4SF_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmadd_v8df_mask_round, "__builtin_ia32_vfmaddpd512_mask", IX86_BUILTIN_VFMADDPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmadd_v8df_mask3_round, "__builtin_ia32_vfmaddpd512_mask3", IX86_BUILTIN_VFMADDPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmadd_v8df_maskz_round, "__builtin_ia32_vfmaddpd512_maskz", IX86_BUILTIN_VFMADDPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmadd_v16sf_mask_round, "__builtin_ia32_vfmaddps512_mask", IX86_BUILTIN_VFMADDPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmadd_v16sf_mask3_round, "__builtin_ia32_vfmaddps512_mask3", IX86_BUILTIN_VFMADDPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmadd_v16sf_maskz_round, "__builtin_ia32_vfmaddps512_maskz", IX86_BUILTIN_VFMADDPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmadd_v8df_mask_round, "__builtin_ia32_vfmaddpd512_mask", IX86_BUILTIN_VFMADDPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmadd_v8df_mask3_round, "__builtin_ia32_vfmaddpd512_mask3", IX86_BUILTIN_VFMADDPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmadd_v8df_maskz_round, "__builtin_ia32_vfmaddpd512_maskz", IX86_BUILTIN_VFMADDPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmadd_v16sf_mask_round, "__builtin_ia32_vfmaddps512_mask", IX86_BUILTIN_VFMADDPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmadd_v16sf_mask3_round, "__builtin_ia32_vfmaddps512_mask3", IX86_BUILTIN_VFMADDPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmadd_v16sf_maskz_round, "__builtin_ia32_vfmaddps512_maskz", IX86_BUILTIN_VFMADDPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_fmai_vmfmadd_v2df_round, "__builtin_ia32_vfmaddsd3_round", IX86_BUILTIN_VFMADDSD3_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_fmai_vmfmadd_v4sf_round, "__builtin_ia32_vfmaddss3_round", IX86_BUILTIN_VFMADDSS3_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmfmadd_v2df_mask_round, "__builtin_ia32_vfmaddsd3_mask", IX86_BUILTIN_VFMADDSD3_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
@@ -3163,32 +3163,32 @@ BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmfmadd_v4sf_mask_round, "__
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmfmadd_v4sf_mask3_round, "__builtin_ia32_vfmaddss3_mask3", IX86_BUILTIN_VFMADDSS3_MASK3, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmfmadd_v4sf_maskz_round, "__builtin_ia32_vfmaddss3_maskz", IX86_BUILTIN_VFMADDSS3_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmfmsub_v4sf_mask3_round, "__builtin_ia32_vfmsubss3_mask3", IX86_BUILTIN_VFMSUBSS3_MASK3, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmaddsub_v8df_mask_round, "__builtin_ia32_vfmaddsubpd512_mask", IX86_BUILTIN_VFMADDSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmaddsub_v8df_mask3_round, "__builtin_ia32_vfmaddsubpd512_mask3", IX86_BUILTIN_VFMADDSUBPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmaddsub_v8df_maskz_round, "__builtin_ia32_vfmaddsubpd512_maskz", IX86_BUILTIN_VFMADDSUBPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmaddsub_v16sf_mask_round, "__builtin_ia32_vfmaddsubps512_mask", IX86_BUILTIN_VFMADDSUBPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmaddsub_v16sf_mask3_round, "__builtin_ia32_vfmaddsubps512_mask3", IX86_BUILTIN_VFMADDSUBPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmaddsub_v16sf_maskz_round, "__builtin_ia32_vfmaddsubps512_maskz", IX86_BUILTIN_VFMADDSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmsubadd_v8df_mask3_round, "__builtin_ia32_vfmsubaddpd512_mask3", IX86_BUILTIN_VFMSUBADDPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmsubadd_v16sf_mask3_round, "__builtin_ia32_vfmsubaddps512_mask3", IX86_BUILTIN_VFMSUBADDPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmsub_v8df_mask_round, "__builtin_ia32_vfmsubpd512_mask", IX86_BUILTIN_VFMSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmsub_v8df_mask3_round, "__builtin_ia32_vfmsubpd512_mask3", IX86_BUILTIN_VFMSUBPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmsub_v8df_maskz_round, "__builtin_ia32_vfmsubpd512_maskz", IX86_BUILTIN_VFMSUBPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmsub_v16sf_mask_round, "__builtin_ia32_vfmsubps512_mask", IX86_BUILTIN_VFMSUBPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmsub_v16sf_mask3_round, "__builtin_ia32_vfmsubps512_mask3", IX86_BUILTIN_VFMSUBPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmsub_v16sf_maskz_round, "__builtin_ia32_vfmsubps512_maskz", IX86_BUILTIN_VFMSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmadd_v8df_mask_round, "__builtin_ia32_vfnmaddpd512_mask", IX86_BUILTIN_VFNMADDPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmadd_v8df_mask3_round, "__builtin_ia32_vfnmaddpd512_mask3", IX86_BUILTIN_VFNMADDPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmadd_v8df_maskz_round, "__builtin_ia32_vfnmaddpd512_maskz", IX86_BUILTIN_VFNMADDPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmadd_v16sf_mask_round, "__builtin_ia32_vfnmaddps512_mask", IX86_BUILTIN_VFNMADDPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmadd_v16sf_mask3_round, "__builtin_ia32_vfnmaddps512_mask3", IX86_BUILTIN_VFNMADDPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmadd_v16sf_maskz_round, "__builtin_ia32_vfnmaddps512_maskz", IX86_BUILTIN_VFNMADDPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmsub_v8df_mask_round, "__builtin_ia32_vfnmsubpd512_mask", IX86_BUILTIN_VFNMSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmsub_v8df_mask3_round, "__builtin_ia32_vfnmsubpd512_mask3", IX86_BUILTIN_VFNMSUBPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmsub_v8df_maskz_round, "__builtin_ia32_vfnmsubpd512_maskz", IX86_BUILTIN_VFNMSUBPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmsub_v16sf_mask_round, "__builtin_ia32_vfnmsubps512_mask", IX86_BUILTIN_VFNMSUBPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmsub_v16sf_mask3_round, "__builtin_ia32_vfnmsubps512_mask3", IX86_BUILTIN_VFNMSUBPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmsub_v16sf_maskz_round, "__builtin_ia32_vfnmsubps512_maskz", IX86_BUILTIN_VFNMSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmaddsub_v8df_mask_round, "__builtin_ia32_vfmaddsubpd512_mask", IX86_BUILTIN_VFMADDSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmaddsub_v8df_mask3_round, "__builtin_ia32_vfmaddsubpd512_mask3", IX86_BUILTIN_VFMADDSUBPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmaddsub_v8df_maskz_round, "__builtin_ia32_vfmaddsubpd512_maskz", IX86_BUILTIN_VFMADDSUBPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmaddsub_v16sf_mask_round, "__builtin_ia32_vfmaddsubps512_mask", IX86_BUILTIN_VFMADDSUBPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmaddsub_v16sf_mask3_round, "__builtin_ia32_vfmaddsubps512_mask3", IX86_BUILTIN_VFMADDSUBPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmaddsub_v16sf_maskz_round, "__builtin_ia32_vfmaddsubps512_maskz", IX86_BUILTIN_VFMADDSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmsubadd_v8df_mask3_round, "__builtin_ia32_vfmsubaddpd512_mask3", IX86_BUILTIN_VFMSUBADDPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmsubadd_v16sf_mask3_round, "__builtin_ia32_vfmsubaddps512_mask3", IX86_BUILTIN_VFMSUBADDPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmsub_v8df_mask_round, "__builtin_ia32_vfmsubpd512_mask", IX86_BUILTIN_VFMSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmsub_v8df_mask3_round, "__builtin_ia32_vfmsubpd512_mask3", IX86_BUILTIN_VFMSUBPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmsub_v8df_maskz_round, "__builtin_ia32_vfmsubpd512_maskz", IX86_BUILTIN_VFMSUBPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmsub_v16sf_mask_round, "__builtin_ia32_vfmsubps512_mask", IX86_BUILTIN_VFMSUBPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmsub_v16sf_mask3_round, "__builtin_ia32_vfmsubps512_mask3", IX86_BUILTIN_VFMSUBPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmsub_v16sf_maskz_round, "__builtin_ia32_vfmsubps512_maskz", IX86_BUILTIN_VFMSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmadd_v8df_mask_round, "__builtin_ia32_vfnmaddpd512_mask", IX86_BUILTIN_VFNMADDPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmadd_v8df_mask3_round, "__builtin_ia32_vfnmaddpd512_mask3", IX86_BUILTIN_VFNMADDPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmadd_v8df_maskz_round, "__builtin_ia32_vfnmaddpd512_maskz", IX86_BUILTIN_VFNMADDPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmadd_v16sf_mask_round, "__builtin_ia32_vfnmaddps512_mask", IX86_BUILTIN_VFNMADDPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmadd_v16sf_mask3_round, "__builtin_ia32_vfnmaddps512_mask3", IX86_BUILTIN_VFNMADDPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmadd_v16sf_maskz_round, "__builtin_ia32_vfnmaddps512_maskz", IX86_BUILTIN_VFNMADDPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmsub_v8df_mask_round, "__builtin_ia32_vfnmsubpd512_mask", IX86_BUILTIN_VFNMSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmsub_v8df_mask3_round, "__builtin_ia32_vfnmsubpd512_mask3", IX86_BUILTIN_VFNMSUBPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmsub_v8df_maskz_round, "__builtin_ia32_vfnmsubpd512_maskz", IX86_BUILTIN_VFNMSUBPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmsub_v16sf_mask_round, "__builtin_ia32_vfnmsubps512_mask", IX86_BUILTIN_VFNMSUBPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmsub_v16sf_mask3_round, "__builtin_ia32_vfnmsubps512_mask3", IX86_BUILTIN_VFNMSUBPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmsub_v16sf_maskz_round, "__builtin_ia32_vfnmsubps512_maskz", IX86_BUILTIN_VFNMSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 
 /* AVX512ER */
 BDESC (OPTION_MASK_ISA_AVX512ER, 0, CODE_FOR_avx512er_exp2v8df_mask_round, "__builtin_ia32_exp2pd_mask", IX86_BUILTIN_EXP2PD_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI_INT)
diff --git a/gcc/config/i386/i386-builtins.cc b/gcc/config/i386/i386-builtins.cc
index 8a0b8dfe073..e1d1dac2ba2 100644
--- a/gcc/config/i386/i386-builtins.cc
+++ b/gcc/config/i386/i386-builtins.cc
@@ -784,83 +784,103 @@ ix86_init_mmx_sse_builtins (void)
 		    IX86_BUILTIN_GATHERALTDIV8SI);
 
   /* AVX512F */
-  def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gathersiv16sf",
+  def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+		    "__builtin_ia32_gathersiv16sf",
 		    V16SF_FTYPE_V16SF_PCVOID_V16SI_HI_INT,
 		    IX86_BUILTIN_GATHER3SIV16SF);
 
-  def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gathersiv8df",
+  def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+		    "__builtin_ia32_gathersiv8df",
 		    V8DF_FTYPE_V8DF_PCVOID_V8SI_QI_INT,
 		    IX86_BUILTIN_GATHER3SIV8DF);
 
-  def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gatherdiv16sf",
+  def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+		    "__builtin_ia32_gatherdiv16sf",
 		    V8SF_FTYPE_V8SF_PCVOID_V8DI_QI_INT,
 		    IX86_BUILTIN_GATHER3DIV16SF);
 
-  def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gatherdiv8df",
+  def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+		    "__builtin_ia32_gatherdiv8df",
 		    V8DF_FTYPE_V8DF_PCVOID_V8DI_QI_INT,
 		    IX86_BUILTIN_GATHER3DIV8DF);
 
-  def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gathersiv16si",
+  def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+		    "__builtin_ia32_gathersiv16si",
 		    V16SI_FTYPE_V16SI_PCVOID_V16SI_HI_INT,
 		    IX86_BUILTIN_GATHER3SIV16SI);
 
-  def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gathersiv8di",
+  def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+		    "__builtin_ia32_gathersiv8di",
 		    V8DI_FTYPE_V8DI_PCVOID_V8SI_QI_INT,
 		    IX86_BUILTIN_GATHER3SIV8DI);
 
-  def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gatherdiv16si",
+  def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+		    "__builtin_ia32_gatherdiv16si",
 		    V8SI_FTYPE_V8SI_PCVOID_V8DI_QI_INT,
 		    IX86_BUILTIN_GATHER3DIV16SI);
 
-  def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gatherdiv8di",
+  def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+		    "__builtin_ia32_gatherdiv8di",
 		    V8DI_FTYPE_V8DI_PCVOID_V8DI_QI_INT,
 		    IX86_BUILTIN_GATHER3DIV8DI);
 
-  def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gather3altsiv8df ",
+  def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+		    "__builtin_ia32_gather3altsiv8df ",
 		    V8DF_FTYPE_V8DF_PCDOUBLE_V16SI_QI_INT,
 		    IX86_BUILTIN_GATHER3ALTSIV8DF);
 
-  def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gather3altdiv16sf ",
+  def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+		    "__builtin_ia32_gather3altdiv16sf ",
 		    V16SF_FTYPE_V16SF_PCFLOAT_V8DI_HI_INT,
 		    IX86_BUILTIN_GATHER3ALTDIV16SF);
 
-  def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gather3altsiv8di ",
+  def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+		    "__builtin_ia32_gather3altsiv8di ",
 		    V8DI_FTYPE_V8DI_PCINT64_V16SI_QI_INT,
 		    IX86_BUILTIN_GATHER3ALTSIV8DI);
 
-  def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gather3altdiv16si ",
+  def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+		    "__builtin_ia32_gather3altdiv16si ",
 		    V16SI_FTYPE_V16SI_PCINT_V8DI_HI_INT,
 		    IX86_BUILTIN_GATHER3ALTDIV16SI);
 
-  def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scattersiv16sf",
+  def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+	       "__builtin_ia32_scattersiv16sf",
 	       VOID_FTYPE_PVOID_HI_V16SI_V16SF_INT,
 	       IX86_BUILTIN_SCATTERSIV16SF);
 
-  def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scattersiv8df",
+  def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+	       "__builtin_ia32_scattersiv8df",
 	       VOID_FTYPE_PVOID_QI_V8SI_V8DF_INT,
 	       IX86_BUILTIN_SCATTERSIV8DF);
 
-  def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scatterdiv16sf",
+  def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+	       "__builtin_ia32_scatterdiv16sf",
 	       VOID_FTYPE_PVOID_QI_V8DI_V8SF_INT,
 	       IX86_BUILTIN_SCATTERDIV16SF);
 
-  def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scatterdiv8df",
+  def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+	       "__builtin_ia32_scatterdiv8df",
 	       VOID_FTYPE_PVOID_QI_V8DI_V8DF_INT,
 	       IX86_BUILTIN_SCATTERDIV8DF);
 
-  def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scattersiv16si",
+  def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+	       "__builtin_ia32_scattersiv16si",
 	       VOID_FTYPE_PVOID_HI_V16SI_V16SI_INT,
 	       IX86_BUILTIN_SCATTERSIV16SI);
 
-  def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scattersiv8di",
+  def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+	       "__builtin_ia32_scattersiv8di",
 	       VOID_FTYPE_PVOID_QI_V8SI_V8DI_INT,
 	       IX86_BUILTIN_SCATTERSIV8DI);
 
-  def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scatterdiv16si",
+  def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+	       "__builtin_ia32_scatterdiv16si",
 	       VOID_FTYPE_PVOID_QI_V8DI_V8SI_INT,
 	       IX86_BUILTIN_SCATTERDIV16SI);
 
-  def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scatterdiv8di",
+  def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+	       "__builtin_ia32_scatterdiv8di",
 	       VOID_FTYPE_PVOID_QI_V8DI_V8DI_INT,
 	       IX86_BUILTIN_SCATTERDIV8DI);
 
@@ -1009,19 +1029,23 @@ ix86_init_mmx_sse_builtins (void)
 	       VOID_FTYPE_PVOID_QI_V2DI_V2DI_INT,
 	       IX86_BUILTIN_SCATTERDIV2DI);
 
-  def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scatteraltsiv8df ",
+  def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+	       "__builtin_ia32_scatteraltsiv8df ",
 	       VOID_FTYPE_PDOUBLE_QI_V16SI_V8DF_INT,
 	       IX86_BUILTIN_SCATTERALTSIV8DF);
 
-  def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scatteraltdiv16sf ",
+  def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+	       "__builtin_ia32_scatteraltdiv16sf ",
 	       VOID_FTYPE_PFLOAT_HI_V8DI_V16SF_INT,
 	       IX86_BUILTIN_SCATTERALTDIV16SF);
 
-  def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scatteraltsiv8di ",
+  def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+	       "__builtin_ia32_scatteraltsiv8di ",
 	       VOID_FTYPE_PLONGLONG_QI_V16SI_V8DI_INT,
 	       IX86_BUILTIN_SCATTERALTSIV8DI);
 
-  def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scatteraltdiv16si ",
+  def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+	       "__builtin_ia32_scatteraltdiv16si ",
 	       VOID_FTYPE_PINT_HI_V8DI_V16SI_INT,
 	       IX86_BUILTIN_SCATTERALTDIV16SI);
 
-- 
2.31.1



* [PATCH 08/18] [PATCH 2/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
  2023-09-21  7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
                   ` (6 preceding siblings ...)
  2023-09-21  7:20 ` [PATCH 07/18] [PATCH 1/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins Hu, Lin1
@ 2023-09-21  7:20 ` Hu, Lin1
  2023-09-21  7:20 ` [PATCH 09/18] [PATCH 3/5] " Hu, Lin1
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21  7:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang

From: Haochen Jiang <haochen.jiang@intel.com>

gcc/ChangeLog:

	* config/i386/i386-builtin.def (BDESC): Add
	OPTION_MASK_ISA2_EVEX512.
---
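[Note, not part of the patch: the mechanical change below is that every
512-bit AVX512DQ descriptor now carries OPTION_MASK_ISA2_EVEX512 in its
second mask argument.  As a minimal sketch of the intended gating --
the helper name and the plain AND semantics are simplifications for
illustration, not the real def_builtin/BDESC handling in
i386-builtins.cc, which also deals with deferred registration:

  /* Sketch: a builtin described with two requirement mask words is
     only made available when every required bit is active in the
     corresponding ISA flags word.  Clearing OPTION_MASK_ISA2_EVEX512
     via -mno-evex512 therefore hides the 512-bit builtins tagged
     here while leaving the 128/256-bit ones untouched.  */
  #include <stdbool.h>
  #include <stdint.h>

  static bool
  builtin_enabled_p (uint64_t active_isa, uint64_t active_isa2,
		     uint64_t req_isa, uint64_t req_isa2)
  {
    return (active_isa & req_isa) == req_isa
	   && (active_isa2 & req_isa2) == req_isa2;
  }
]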
 gcc/config/i386/i386-builtin.def | 94 ++++++++++++++++----------------
 1 file changed, 47 insertions(+), 47 deletions(-)

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 0cc526383db..7a0dec9bc8b 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2408,37 +2408,37 @@ BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_cmpv2df3_mask, "__builtin_
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_cmpv4sf3_mask, "__builtin_ia32_cmpps128_mask", IX86_BUILTIN_CMPPS128_MASK, UNKNOWN, (int) UQI_FTYPE_V4SF_V4SF_INT_UQI)
 
 /* AVX512DQ.  */
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_broadcastv16sf_mask, "__builtin_ia32_broadcastf32x2_512_mask", IX86_BUILTIN_BROADCASTF32x2_512, UNKNOWN, (int) V16SF_FTYPE_V4SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_broadcastv16si_mask, "__builtin_ia32_broadcasti32x2_512_mask", IX86_BUILTIN_BROADCASTI32x2_512, UNKNOWN, (int) V16SI_FTYPE_V4SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_broadcastv8df_mask_1, "__builtin_ia32_broadcastf64x2_512_mask", IX86_BUILTIN_BROADCASTF64X2_512, UNKNOWN, (int) V8DF_FTYPE_V2DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_broadcastv8di_mask_1, "__builtin_ia32_broadcasti64x2_512_mask", IX86_BUILTIN_BROADCASTI64X2_512, UNKNOWN, (int) V8DI_FTYPE_V2DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_broadcastv16sf_mask_1, "__builtin_ia32_broadcastf32x8_512_mask", IX86_BUILTIN_BROADCASTF32X8_512, UNKNOWN, (int) V16SF_FTYPE_V8SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_broadcastv16si_mask_1, "__builtin_ia32_broadcasti32x8_512_mask", IX86_BUILTIN_BROADCASTI32X8_512, UNKNOWN, (int) V16SI_FTYPE_V8SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vextractf64x2_mask, "__builtin_ia32_extractf64x2_512_mask", IX86_BUILTIN_EXTRACTF64X2_512, UNKNOWN, (int) V2DF_FTYPE_V8DF_INT_V2DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vextractf32x8_mask, "__builtin_ia32_extractf32x8_mask", IX86_BUILTIN_EXTRACTF32X8, UNKNOWN, (int) V8SF_FTYPE_V16SF_INT_V8SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vextracti64x2_mask, "__builtin_ia32_extracti64x2_512_mask", IX86_BUILTIN_EXTRACTI64X2_512, UNKNOWN, (int) V2DI_FTYPE_V8DI_INT_V2DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vextracti32x8_mask, "__builtin_ia32_extracti32x8_mask", IX86_BUILTIN_EXTRACTI32X8, UNKNOWN, (int) V8SI_FTYPE_V16SI_INT_V8SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_reducepv8df_mask, "__builtin_ia32_reducepd512_mask", IX86_BUILTIN_REDUCEPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_reducepv16sf_mask, "__builtin_ia32_reduceps512_mask", IX86_BUILTIN_REDUCEPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_mulv8di3_mask, "__builtin_ia32_pmullq512_mask", IX86_BUILTIN_PMULLQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_xorv8df3_mask, "__builtin_ia32_xorpd512_mask", IX86_BUILTIN_XORPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_xorv16sf3_mask, "__builtin_ia32_xorps512_mask", IX86_BUILTIN_XORPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_iorv8df3_mask, "__builtin_ia32_orpd512_mask", IX86_BUILTIN_ORPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_iorv16sf3_mask, "__builtin_ia32_orps512_mask", IX86_BUILTIN_ORPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_andv8df3_mask, "__builtin_ia32_andpd512_mask", IX86_BUILTIN_ANDPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_andv16sf3_mask, "__builtin_ia32_andps512_mask", IX86_BUILTIN_ANDPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512f_andnotv8df3_mask, "__builtin_ia32_andnpd512_mask", IX86_BUILTIN_ANDNPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512f_andnotv16sf3_mask, "__builtin_ia32_andnps512_mask", IX86_BUILTIN_ANDNPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vinsertf32x8_mask, "__builtin_ia32_insertf32x8_mask", IX86_BUILTIN_INSERTF32X8, UNKNOWN, (int) V16SF_FTYPE_V16SF_V8SF_INT_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vinserti32x8_mask, "__builtin_ia32_inserti32x8_mask", IX86_BUILTIN_INSERTI32X8, UNKNOWN, (int) V16SI_FTYPE_V16SI_V8SI_INT_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vinsertf64x2_mask, "__builtin_ia32_insertf64x2_512_mask", IX86_BUILTIN_INSERTF64X2_512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V2DF_INT_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vinserti64x2_mask, "__builtin_ia32_inserti64x2_512_mask", IX86_BUILTIN_INSERTI64X2_512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V2DI_INT_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_fpclassv8df_mask, "__builtin_ia32_fpclasspd512_mask", IX86_BUILTIN_FPCLASSPD512, UNKNOWN, (int) QI_FTYPE_V8DF_INT_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_fpclassv16sf_mask, "__builtin_ia32_fpclassps512_mask", IX86_BUILTIN_FPCLASSPS512, UNKNOWN, (int) HI_FTYPE_V16SF_INT_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512f_cvtd2maskv16si, "__builtin_ia32_cvtd2mask512", IX86_BUILTIN_CVTD2MASK512, UNKNOWN, (int) UHI_FTYPE_V16SI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512f_cvtq2maskv8di, "__builtin_ia32_cvtq2mask512", IX86_BUILTIN_CVTQ2MASK512, UNKNOWN, (int) UQI_FTYPE_V8DI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512f_cvtmask2dv16si, "__builtin_ia32_cvtmask2d512", IX86_BUILTIN_CVTMASK2D512, UNKNOWN, (int) V16SI_FTYPE_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512f_cvtmask2qv8di, "__builtin_ia32_cvtmask2q512", IX86_BUILTIN_CVTMASK2Q512, UNKNOWN, (int) V8DI_FTYPE_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_broadcastv16sf_mask, "__builtin_ia32_broadcastf32x2_512_mask", IX86_BUILTIN_BROADCASTF32x2_512, UNKNOWN, (int) V16SF_FTYPE_V4SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_broadcastv16si_mask, "__builtin_ia32_broadcasti32x2_512_mask", IX86_BUILTIN_BROADCASTI32x2_512, UNKNOWN, (int) V16SI_FTYPE_V4SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_broadcastv8df_mask_1, "__builtin_ia32_broadcastf64x2_512_mask", IX86_BUILTIN_BROADCASTF64X2_512, UNKNOWN, (int) V8DF_FTYPE_V2DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_broadcastv8di_mask_1, "__builtin_ia32_broadcasti64x2_512_mask", IX86_BUILTIN_BROADCASTI64X2_512, UNKNOWN, (int) V8DI_FTYPE_V2DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_broadcastv16sf_mask_1, "__builtin_ia32_broadcastf32x8_512_mask", IX86_BUILTIN_BROADCASTF32X8_512, UNKNOWN, (int) V16SF_FTYPE_V8SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_broadcastv16si_mask_1, "__builtin_ia32_broadcasti32x8_512_mask", IX86_BUILTIN_BROADCASTI32X8_512, UNKNOWN, (int) V16SI_FTYPE_V8SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_vextractf64x2_mask, "__builtin_ia32_extractf64x2_512_mask", IX86_BUILTIN_EXTRACTF64X2_512, UNKNOWN, (int) V2DF_FTYPE_V8DF_INT_V2DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_vextractf32x8_mask, "__builtin_ia32_extractf32x8_mask", IX86_BUILTIN_EXTRACTF32X8, UNKNOWN, (int) V8SF_FTYPE_V16SF_INT_V8SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_vextracti64x2_mask, "__builtin_ia32_extracti64x2_512_mask", IX86_BUILTIN_EXTRACTI64X2_512, UNKNOWN, (int) V2DI_FTYPE_V8DI_INT_V2DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_vextracti32x8_mask, "__builtin_ia32_extracti32x8_mask", IX86_BUILTIN_EXTRACTI32X8, UNKNOWN, (int) V8SI_FTYPE_V16SI_INT_V8SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_reducepv8df_mask, "__builtin_ia32_reducepd512_mask", IX86_BUILTIN_REDUCEPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_reducepv16sf_mask, "__builtin_ia32_reduceps512_mask", IX86_BUILTIN_REDUCEPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_mulv8di3_mask, "__builtin_ia32_pmullq512_mask", IX86_BUILTIN_PMULLQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_xorv8df3_mask, "__builtin_ia32_xorpd512_mask", IX86_BUILTIN_XORPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_xorv16sf3_mask, "__builtin_ia32_xorps512_mask", IX86_BUILTIN_XORPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_iorv8df3_mask, "__builtin_ia32_orpd512_mask", IX86_BUILTIN_ORPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_iorv16sf3_mask, "__builtin_ia32_orps512_mask", IX86_BUILTIN_ORPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_andv8df3_mask, "__builtin_ia32_andpd512_mask", IX86_BUILTIN_ANDPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_andv16sf3_mask, "__builtin_ia32_andps512_mask", IX86_BUILTIN_ANDPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_andnotv8df3_mask, "__builtin_ia32_andnpd512_mask", IX86_BUILTIN_ANDNPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_andnotv16sf3_mask, "__builtin_ia32_andnps512_mask", IX86_BUILTIN_ANDNPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_vinsertf32x8_mask, "__builtin_ia32_insertf32x8_mask", IX86_BUILTIN_INSERTF32X8, UNKNOWN, (int) V16SF_FTYPE_V16SF_V8SF_INT_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_vinserti32x8_mask, "__builtin_ia32_inserti32x8_mask", IX86_BUILTIN_INSERTI32X8, UNKNOWN, (int) V16SI_FTYPE_V16SI_V8SI_INT_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_vinsertf64x2_mask, "__builtin_ia32_insertf64x2_512_mask", IX86_BUILTIN_INSERTF64X2_512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V2DF_INT_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_vinserti64x2_mask, "__builtin_ia32_inserti64x2_512_mask", IX86_BUILTIN_INSERTI64X2_512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V2DI_INT_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_fpclassv8df_mask, "__builtin_ia32_fpclasspd512_mask", IX86_BUILTIN_FPCLASSPD512, UNKNOWN, (int) QI_FTYPE_V8DF_INT_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_fpclassv16sf_mask, "__builtin_ia32_fpclassps512_mask", IX86_BUILTIN_FPCLASSPS512, UNKNOWN, (int) HI_FTYPE_V16SF_INT_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtd2maskv16si, "__builtin_ia32_cvtd2mask512", IX86_BUILTIN_CVTD2MASK512, UNKNOWN, (int) UHI_FTYPE_V16SI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtq2maskv8di, "__builtin_ia32_cvtq2mask512", IX86_BUILTIN_CVTQ2MASK512, UNKNOWN, (int) UQI_FTYPE_V8DI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtmask2dv16si, "__builtin_ia32_cvtmask2d512", IX86_BUILTIN_CVTMASK2D512, UNKNOWN, (int) V16SI_FTYPE_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtmask2qv8di, "__builtin_ia32_cvtmask2q512", IX86_BUILTIN_CVTMASK2Q512, UNKNOWN, (int) V8DI_FTYPE_UQI)
 
 /* AVX512BW.  */
 BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kunpcksi, "__builtin_ia32_kunpcksi", IX86_BUILTIN_KUNPCKWD, UNKNOWN, (int) USI_FTYPE_USI_USI)
@@ -3207,26 +3207,26 @@ BDESC (OPTION_MASK_ISA_AVX512ER, 0, CODE_FOR_avx512er_vmrsqrt28v4sf_round, "__bu
 BDESC (OPTION_MASK_ISA_AVX512ER, 0, CODE_FOR_avx512er_vmrsqrt28v4sf_mask_round, "__builtin_ia32_rsqrt28ss_mask_round", IX86_BUILTIN_RSQRT28SS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
 
 /* AVX512DQ.  */
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_reducepv8df_mask_round, "__builtin_ia32_reducepd512_mask_round", IX86_BUILTIN_REDUCEPD512_MASK_ROUND, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_reducepv16sf_mask_round, "__builtin_ia32_reduceps512_mask_round", IX86_BUILTIN_REDUCEPS512_MASK_ROUND, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_UHI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_reducepv8df_mask_round, "__builtin_ia32_reducepd512_mask_round", IX86_BUILTIN_REDUCEPD512_MASK_ROUND, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_reducepv16sf_mask_round, "__builtin_ia32_reduceps512_mask_round", IX86_BUILTIN_REDUCEPS512_MASK_ROUND, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_UHI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_reducesv2df_mask_round, "__builtin_ia32_reducesd_mask_round", IX86_BUILTIN_REDUCESD128_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_reducesv4sf_mask_round, "__builtin_ia32_reducess_mask_round", IX86_BUILTIN_REDUCESS128_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_V4SF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_rangesv2df_mask_round, "__builtin_ia32_rangesd128_mask_round", IX86_BUILTIN_RANGESD128, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_rangesv4sf_mask_round, "__builtin_ia32_rangess128_mask_round", IX86_BUILTIN_RANGESS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_fix_notruncv8dfv8di2_mask_round, "__builtin_ia32_cvtpd2qq512_mask", IX86_BUILTIN_CVTPD2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_cvtps2qqv8di_mask_round, "__builtin_ia32_cvtps2qq512_mask", IX86_BUILTIN_CVTPS2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_fixuns_notruncv8dfv8di2_mask_round, "__builtin_ia32_cvtpd2uqq512_mask", IX86_BUILTIN_CVTPD2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_cvtps2uqqv8di_mask_round, "__builtin_ia32_cvtps2uqq512_mask", IX86_BUILTIN_CVTPS2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_floatv8div8sf2_mask_round, "__builtin_ia32_cvtqq2ps512_mask", IX86_BUILTIN_CVTQQ2PS512, UNKNOWN, (int) V8SF_FTYPE_V8DI_V8SF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_floatunsv8div8sf2_mask_round, "__builtin_ia32_cvtuqq2ps512_mask", IX86_BUILTIN_CVTUQQ2PS512, UNKNOWN, (int) V8SF_FTYPE_V8DI_V8SF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_floatv8div8df2_mask_round, "__builtin_ia32_cvtqq2pd512_mask", IX86_BUILTIN_CVTQQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_floatunsv8div8df2_mask_round, "__builtin_ia32_cvtuqq2pd512_mask", IX86_BUILTIN_CVTUQQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_fix_truncv8sfv8di2_mask_round, "__builtin_ia32_cvttps2qq512_mask", IX86_BUILTIN_CVTTPS2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_fixuns_truncv8sfv8di2_mask_round, "__builtin_ia32_cvttps2uqq512_mask", IX86_BUILTIN_CVTTPS2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_fix_truncv8dfv8di2_mask_round, "__builtin_ia32_cvttpd2qq512_mask", IX86_BUILTIN_CVTTPD2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_fixuns_truncv8dfv8di2_mask_round, "__builtin_ia32_cvttpd2uqq512_mask", IX86_BUILTIN_CVTTPD2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_rangepv16sf_mask_round, "__builtin_ia32_rangeps512_mask", IX86_BUILTIN_RANGEPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_rangepv8df_mask_round, "__builtin_ia32_rangepd512_mask", IX86_BUILTIN_RANGEPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_fix_notruncv8dfv8di2_mask_round, "__builtin_ia32_cvtpd2qq512_mask", IX86_BUILTIN_CVTPD2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_cvtps2qqv8di_mask_round, "__builtin_ia32_cvtps2qq512_mask", IX86_BUILTIN_CVTPS2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_fixuns_notruncv8dfv8di2_mask_round, "__builtin_ia32_cvtpd2uqq512_mask", IX86_BUILTIN_CVTPD2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_cvtps2uqqv8di_mask_round, "__builtin_ia32_cvtps2uqq512_mask", IX86_BUILTIN_CVTPS2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_floatv8div8sf2_mask_round, "__builtin_ia32_cvtqq2ps512_mask", IX86_BUILTIN_CVTQQ2PS512, UNKNOWN, (int) V8SF_FTYPE_V8DI_V8SF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_floatunsv8div8sf2_mask_round, "__builtin_ia32_cvtuqq2ps512_mask", IX86_BUILTIN_CVTUQQ2PS512, UNKNOWN, (int) V8SF_FTYPE_V8DI_V8SF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_floatv8div8df2_mask_round, "__builtin_ia32_cvtqq2pd512_mask", IX86_BUILTIN_CVTQQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_floatunsv8div8df2_mask_round, "__builtin_ia32_cvtuqq2pd512_mask", IX86_BUILTIN_CVTUQQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_fix_truncv8sfv8di2_mask_round, "__builtin_ia32_cvttps2qq512_mask", IX86_BUILTIN_CVTTPS2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_fixuns_truncv8sfv8di2_mask_round, "__builtin_ia32_cvttps2uqq512_mask", IX86_BUILTIN_CVTTPS2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_fix_truncv8dfv8di2_mask_round, "__builtin_ia32_cvttpd2qq512_mask", IX86_BUILTIN_CVTTPD2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_fixuns_truncv8dfv8di2_mask_round, "__builtin_ia32_cvttpd2uqq512_mask", IX86_BUILTIN_CVTTPD2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_rangepv16sf_mask_round, "__builtin_ia32_rangeps512_mask", IX86_BUILTIN_RANGEPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_rangepv8df_mask_round, "__builtin_ia32_rangepd512_mask", IX86_BUILTIN_RANGEPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_QI_INT)
 
 /* AVX512FP16.  */
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv32hf3_mask_round, "__builtin_ia32_addph512_mask_round", IX86_BUILTIN_ADDPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-- 
2.31.1



* [PATCH 09/18] [PATCH 3/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
  2023-09-21  7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
                   ` (7 preceding siblings ...)
  2023-09-21  7:20 ` [PATCH 08/18] [PATCH 2/5] " Hu, Lin1
@ 2023-09-21  7:20 ` Hu, Lin1
  2023-09-21  7:20 ` [PATCH 10/18] [PATCH 4/5] " Hu, Lin1
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21  7:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang

From: Haochen Jiang <haochen.jiang@intel.com>

gcc/ChangeLog:

	* config/i386/i386-builtin.def (BDESC): Add
	OPTION_MASK_ISA2_EVEX512.
---
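[Note, not part of the patch: from the user side, the expected effect
of tagging the 512-bit AVX512BW descriptors below is that a TU like
the following hypothetical one is accepted with plain -mavx512bw
(where evex512 is implied) but should be rejected once -mno-evex512 is
added.  The file name and the shape of the diagnostic are assumptions
for illustration, not taken from this series:

  /* t.c: gcc -O2 -mavx512bw -c t.c               (accepted)
	  gcc -O2 -mavx512bw -mno-evex512 -c t.c  (expected: rejected)  */
  #include <immintrin.h>

  void
  store_words (short *p, __mmask32 m, __m512i v)
  {
    /* Expands to __builtin_ia32_storedquhi512_mask, which after this
       patch also requires OPTION_MASK_ISA2_EVEX512.  */
    _mm512_mask_storeu_epi16 (p, m, v);
  }
]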
 gcc/config/i386/i386-builtin.def | 226 +++++++++++++++----------------
 1 file changed, 113 insertions(+), 113 deletions(-)

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 7a0dec9bc8b..167d530a537 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -293,10 +293,10 @@ BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_CMPCCXADD, CODE_FOR_cmpccxadd_si,
 BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_CMPCCXADD, CODE_FOR_cmpccxadd_di, "__builtin_ia32_cmpccxadd64", IX86_BUILTIN_CMPCCXADD64, UNKNOWN, (int) LONGLONG_FTYPE_PLONGLONG_LONGLONG_LONGLONG_INT)
 
 /* AVX512BW */
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_loadv32hi_mask, "__builtin_ia32_loaddquhi512_mask", IX86_BUILTIN_LOADDQUHI512_MASK, UNKNOWN, (int) V32HI_FTYPE_PCSHORT_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_loadv64qi_mask, "__builtin_ia32_loaddquqi512_mask", IX86_BUILTIN_LOADDQUQI512_MASK, UNKNOWN, (int) V64QI_FTYPE_PCCHAR_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_storev32hi_mask, "__builtin_ia32_storedquhi512_mask", IX86_BUILTIN_STOREDQUHI512_MASK, UNKNOWN, (int) VOID_FTYPE_PSHORT_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_storev64qi_mask, "__builtin_ia32_storedquqi512_mask", IX86_BUILTIN_STOREDQUQI512_MASK, UNKNOWN, (int) VOID_FTYPE_PCHAR_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_loadv32hi_mask, "__builtin_ia32_loaddquhi512_mask", IX86_BUILTIN_LOADDQUHI512_MASK, UNKNOWN, (int) V32HI_FTYPE_PCSHORT_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_loadv64qi_mask, "__builtin_ia32_loaddquqi512_mask", IX86_BUILTIN_LOADDQUQI512_MASK, UNKNOWN, (int) V64QI_FTYPE_PCCHAR_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_storev32hi_mask, "__builtin_ia32_storedquhi512_mask", IX86_BUILTIN_STOREDQUHI512_MASK, UNKNOWN, (int) VOID_FTYPE_PSHORT_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_storev64qi_mask, "__builtin_ia32_storedquqi512_mask", IX86_BUILTIN_STOREDQUQI512_MASK, UNKNOWN, (int) VOID_FTYPE_PCHAR_V64QI_UDI)
 
 /* AVX512VP2INTERSECT */
 BDESC (0, OPTION_MASK_ISA2_AVX512VP2INTERSECT, CODE_FOR_nothing, "__builtin_ia32_2intersectd512", IX86_BUILTIN_2INTERSECTD512, UNKNOWN, (int) VOID_FTYPE_PUHI_PUHI_V16SI_V16SI)
@@ -407,9 +407,9 @@ BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_ss_truncatev16hiv16qi2_mask_store, "__builtin_ia32_pmovswb256mem_mask", IX86_BUILTIN_PMOVSWB256_MEM, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16HI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_us_truncatev8hiv8qi2_mask_store_2, "__builtin_ia32_pmovuswb128mem_mask", IX86_BUILTIN_PMOVUSWB128_MEM, UNKNOWN, (int) VOID_FTYPE_PUDI_V8HI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_us_truncatev16hiv16qi2_mask_store, "__builtin_ia32_pmovuswb256mem_mask", IX86_BUILTIN_PMOVUSWB256_MEM, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16HI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_us_truncatev32hiv32qi2_mask_store, "__builtin_ia32_pmovuswb512mem_mask", IX86_BUILTIN_PMOVUSWB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ss_truncatev32hiv32qi2_mask_store, "__builtin_ia32_pmovswb512mem_mask", IX86_BUILTIN_PMOVSWB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_truncatev32hiv32qi2_mask_store, "__builtin_ia32_pmovwb512mem_mask", IX86_BUILTIN_PMOVWB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_us_truncatev32hiv32qi2_mask_store, "__builtin_ia32_pmovuswb512mem_mask", IX86_BUILTIN_PMOVUSWB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ss_truncatev32hiv32qi2_mask_store, "__builtin_ia32_pmovswb512mem_mask", IX86_BUILTIN_PMOVSWB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_truncatev32hiv32qi2_mask_store, "__builtin_ia32_pmovwb512mem_mask", IX86_BUILTIN_PMOVWB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32HI_USI)
 
 /* AVX512FP16 */
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_loadhf_mask, "__builtin_ia32_loadsh_mask", IX86_BUILTIN_LOADSH_MASK, UNKNOWN, (int) V8HF_FTYPE_PCFLOAT16_V8HF_UQI)
@@ -1590,61 +1590,61 @@ BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_round
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kashiftqi, "__builtin_ia32_kshiftliqi", IX86_BUILTIN_KSHIFTLI8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI_CONST)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kashifthi, "__builtin_ia32_kshiftlihi", IX86_BUILTIN_KSHIFTLI16, UNKNOWN, (int) UHI_FTYPE_UHI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kashiftsi, "__builtin_ia32_kshiftlisi", IX86_BUILTIN_KSHIFTLI32, UNKNOWN, (int) USI_FTYPE_USI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kashiftdi, "__builtin_ia32_kshiftlidi", IX86_BUILTIN_KSHIFTLI64, UNKNOWN, (int) UDI_FTYPE_UDI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kashiftdi, "__builtin_ia32_kshiftlidi", IX86_BUILTIN_KSHIFTLI64, UNKNOWN, (int) UDI_FTYPE_UDI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_klshiftrtqi, "__builtin_ia32_kshiftriqi", IX86_BUILTIN_KSHIFTRI8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI_CONST)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_klshiftrthi, "__builtin_ia32_kshiftrihi", IX86_BUILTIN_KSHIFTRI16, UNKNOWN, (int) UHI_FTYPE_UHI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_klshiftrtsi, "__builtin_ia32_kshiftrisi", IX86_BUILTIN_KSHIFTRI32, UNKNOWN, (int) USI_FTYPE_USI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_klshiftrtdi, "__builtin_ia32_kshiftridi", IX86_BUILTIN_KSHIFTRI64, UNKNOWN, (int) UDI_FTYPE_UDI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_klshiftrtdi, "__builtin_ia32_kshiftridi", IX86_BUILTIN_KSHIFTRI64, UNKNOWN, (int) UDI_FTYPE_UDI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kandqi, "__builtin_ia32_kandqi", IX86_BUILTIN_KAND8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kandhi, "__builtin_ia32_kandhi", IX86_BUILTIN_KAND16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kandsi, "__builtin_ia32_kandsi", IX86_BUILTIN_KAND32, UNKNOWN, (int) USI_FTYPE_USI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kanddi, "__builtin_ia32_kanddi", IX86_BUILTIN_KAND64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kanddi, "__builtin_ia32_kanddi", IX86_BUILTIN_KAND64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kandnqi, "__builtin_ia32_kandnqi", IX86_BUILTIN_KANDN8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kandnhi, "__builtin_ia32_kandnhi", IX86_BUILTIN_KANDN16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kandnsi, "__builtin_ia32_kandnsi", IX86_BUILTIN_KANDN32, UNKNOWN, (int) USI_FTYPE_USI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kandndi, "__builtin_ia32_kandndi", IX86_BUILTIN_KANDN64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kandndi, "__builtin_ia32_kandndi", IX86_BUILTIN_KANDN64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_knotqi, "__builtin_ia32_knotqi", IX86_BUILTIN_KNOT8, UNKNOWN, (int) UQI_FTYPE_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_knothi, "__builtin_ia32_knothi", IX86_BUILTIN_KNOT16, UNKNOWN, (int) UHI_FTYPE_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_knotsi, "__builtin_ia32_knotsi", IX86_BUILTIN_KNOT32, UNKNOWN, (int) USI_FTYPE_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_knotdi, "__builtin_ia32_knotdi", IX86_BUILTIN_KNOT64, UNKNOWN, (int) UDI_FTYPE_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_knotdi, "__builtin_ia32_knotdi", IX86_BUILTIN_KNOT64, UNKNOWN, (int) UDI_FTYPE_UDI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kiorqi, "__builtin_ia32_korqi", IX86_BUILTIN_KOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kiorhi, "__builtin_ia32_korhi", IX86_BUILTIN_KOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kiorsi, "__builtin_ia32_korsi", IX86_BUILTIN_KOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kiordi, "__builtin_ia32_kordi", IX86_BUILTIN_KOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kiordi, "__builtin_ia32_kordi", IX86_BUILTIN_KOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_ktestqi, "__builtin_ia32_ktestcqi", IX86_BUILTIN_KTESTC8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_ktestqi, "__builtin_ia32_ktestzqi", IX86_BUILTIN_KTESTZ8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_ktesthi, "__builtin_ia32_ktestchi", IX86_BUILTIN_KTESTC16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_ktesthi, "__builtin_ia32_ktestzhi", IX86_BUILTIN_KTESTZ16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_ktestsi, "__builtin_ia32_ktestcsi", IX86_BUILTIN_KTESTC32, UNKNOWN, (int) USI_FTYPE_USI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_ktestsi, "__builtin_ia32_ktestzsi", IX86_BUILTIN_KTESTZ32, UNKNOWN, (int) USI_FTYPE_USI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_ktestdi, "__builtin_ia32_ktestcdi", IX86_BUILTIN_KTESTC64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_ktestdi, "__builtin_ia32_ktestzdi", IX86_BUILTIN_KTESTZ64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ktestsi, "__builtin_ia32_ktestcsi", IX86_BUILTIN_KTESTC32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ktestsi, "__builtin_ia32_ktestzsi", IX86_BUILTIN_KTESTZ32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ktestdi, "__builtin_ia32_ktestcdi", IX86_BUILTIN_KTESTC64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ktestdi, "__builtin_ia32_ktestzdi", IX86_BUILTIN_KTESTZ64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kortestqi, "__builtin_ia32_kortestcqi", IX86_BUILTIN_KORTESTC8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kortestqi, "__builtin_ia32_kortestzqi", IX86_BUILTIN_KORTESTZ8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kortesthi, "__builtin_ia32_kortestchi", IX86_BUILTIN_KORTESTC16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kortesthi, "__builtin_ia32_kortestzhi", IX86_BUILTIN_KORTESTZ16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kortestsi, "__builtin_ia32_kortestcsi", IX86_BUILTIN_KORTESTC32, UNKNOWN, (int) USI_FTYPE_USI_USI)
 BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kortestsi, "__builtin_ia32_kortestzsi", IX86_BUILTIN_KORTESTZ32, UNKNOWN, (int) USI_FTYPE_USI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kortestdi, "__builtin_ia32_kortestcdi", IX86_BUILTIN_KORTESTC64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kortestdi, "__builtin_ia32_kortestzdi", IX86_BUILTIN_KORTESTZ64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kortestdi, "__builtin_ia32_kortestcdi", IX86_BUILTIN_KORTESTC64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kortestdi, "__builtin_ia32_kortestzdi", IX86_BUILTIN_KORTESTZ64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
 
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kunpckhi, "__builtin_ia32_kunpckhi", IX86_BUILTIN_KUNPCKBW, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kxnorqi, "__builtin_ia32_kxnorqi", IX86_BUILTIN_KXNOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kxnorhi, "__builtin_ia32_kxnorhi", IX86_BUILTIN_KXNOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kxnorsi, "__builtin_ia32_kxnorsi", IX86_BUILTIN_KXNOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kxnordi, "__builtin_ia32_kxnordi", IX86_BUILTIN_KXNOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kxnordi, "__builtin_ia32_kxnordi", IX86_BUILTIN_KXNOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kxorqi, "__builtin_ia32_kxorqi", IX86_BUILTIN_KXOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kxorhi, "__builtin_ia32_kxorhi", IX86_BUILTIN_KXOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kxorsi, "__builtin_ia32_kxorsi", IX86_BUILTIN_KXOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kxordi, "__builtin_ia32_kxordi", IX86_BUILTIN_KXOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kxordi, "__builtin_ia32_kxordi", IX86_BUILTIN_KXOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kmovb, "__builtin_ia32_kmovb", IX86_BUILTIN_KMOV8, UNKNOWN, (int) UQI_FTYPE_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kmovw, "__builtin_ia32_kmovw", IX86_BUILTIN_KMOV16, UNKNOWN, (int) UHI_FTYPE_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kmovd, "__builtin_ia32_kmovd", IX86_BUILTIN_KMOV32, UNKNOWN, (int) USI_FTYPE_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kmovq, "__builtin_ia32_kmovq", IX86_BUILTIN_KMOV64, UNKNOWN, (int) UDI_FTYPE_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kmovq, "__builtin_ia32_kmovq", IX86_BUILTIN_KMOV64, UNKNOWN, (int) UDI_FTYPE_UDI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kaddqi, "__builtin_ia32_kaddqi", IX86_BUILTIN_KADD8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kaddhi, "__builtin_ia32_kaddhi", IX86_BUILTIN_KADD16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kaddsi, "__builtin_ia32_kaddsi", IX86_BUILTIN_KADD32, UNKNOWN, (int) USI_FTYPE_USI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kadddi, "__builtin_ia32_kadddi", IX86_BUILTIN_KADD64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kadddi, "__builtin_ia32_kadddi", IX86_BUILTIN_KADD64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
 
 /* SHA */
 BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_sha1msg1, 0, IX86_BUILTIN_SHA1MSG1, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI)
@@ -2442,96 +2442,96 @@ BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtm
 
 /* AVX512BW.  */
 BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kunpcksi, "__builtin_ia32_kunpcksi", IX86_BUILTIN_KUNPCKWD, UNKNOWN, (int) USI_FTYPE_USI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kunpckdi, "__builtin_ia32_kunpckdi", IX86_BUILTIN_KUNPCKDQ, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_packusdw_mask, "__builtin_ia32_packusdw512_mask",  IX86_BUILTIN_PACKUSDW512, UNKNOWN, (int) V32HI_FTYPE_V16SI_V16SI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ashlv4ti3, "__builtin_ia32_pslldq512", IX86_BUILTIN_PSLLDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_CONVERT)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_lshrv4ti3, "__builtin_ia32_psrldq512", IX86_BUILTIN_PSRLDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_CONVERT)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_packssdw_mask, "__builtin_ia32_packssdw512_mask",  IX86_BUILTIN_PACKSSDW512, UNKNOWN, (int) V32HI_FTYPE_V16SI_V16SI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_palignrv4ti, "__builtin_ia32_palignr512", IX86_BUILTIN_PALIGNR512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_CONVERT)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_palignrv64qi_mask, "__builtin_ia32_palignr512_mask", IX86_BUILTIN_PALIGNR512_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_UDI_CONVERT)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_loadv32hi_mask, "__builtin_ia32_movdquhi512_mask", IX86_BUILTIN_MOVDQUHI512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_loadv64qi_mask, "__builtin_ia32_movdquqi512_mask", IX86_BUILTIN_MOVDQUQI512_MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512f_psadbw, "__builtin_ia32_psadbw512", IX86_BUILTIN_PSADBW512, UNKNOWN, (int) V8DI_FTYPE_V64QI_V64QI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_dbpsadbwv32hi_mask, "__builtin_ia32_dbpsadbw512_mask", IX86_BUILTIN_DBPSADBW512, UNKNOWN, (int) V32HI_FTYPE_V64QI_V64QI_INT_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_vec_dupv64qi_mask, "__builtin_ia32_pbroadcastb512_mask", IX86_BUILTIN_PBROADCASTB512, UNKNOWN, (int) V64QI_FTYPE_V16QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_vec_dup_gprv64qi_mask, "__builtin_ia32_pbroadcastb512_gpr_mask", IX86_BUILTIN_PBROADCASTB512_GPR, UNKNOWN, (int) V64QI_FTYPE_QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_vec_dupv32hi_mask, "__builtin_ia32_pbroadcastw512_mask", IX86_BUILTIN_PBROADCASTW512, UNKNOWN, (int) V32HI_FTYPE_V8HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_vec_dup_gprv32hi_mask, "__builtin_ia32_pbroadcastw512_gpr_mask", IX86_BUILTIN_PBROADCASTW512_GPR, UNKNOWN, (int) V32HI_FTYPE_HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_sign_extendv32qiv32hi2_mask, "__builtin_ia32_pmovsxbw512_mask", IX86_BUILTIN_PMOVSXBW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32QI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_zero_extendv32qiv32hi2_mask, "__builtin_ia32_pmovzxbw512_mask", IX86_BUILTIN_PMOVZXBW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32QI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_permvarv32hi_mask, "__builtin_ia32_permvarhi512_mask", IX86_BUILTIN_VPERMVARHI512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_vpermt2varv32hi3_mask, "__builtin_ia32_vpermt2varhi512_mask", IX86_BUILTIN_VPERMT2VARHI512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_vpermt2varv32hi3_maskz, "__builtin_ia32_vpermt2varhi512_maskz", IX86_BUILTIN_VPERMT2VARHI512_MASKZ, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_vpermi2varv32hi3_mask, "__builtin_ia32_vpermi2varhi512_mask", IX86_BUILTIN_VPERMI2VARHI512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_uavgv64qi3_mask, "__builtin_ia32_pavgb512_mask", IX86_BUILTIN_PAVGB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_uavgv32hi3_mask, "__builtin_ia32_pavgw512_mask", IX86_BUILTIN_PAVGW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_addv64qi3_mask, "__builtin_ia32_paddb512_mask", IX86_BUILTIN_PADDB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_subv64qi3_mask, "__builtin_ia32_psubb512_mask", IX86_BUILTIN_PSUBB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_sssubv64qi3_mask, "__builtin_ia32_psubsb512_mask", IX86_BUILTIN_PSUBSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ssaddv64qi3_mask, "__builtin_ia32_paddsb512_mask", IX86_BUILTIN_PADDSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ussubv64qi3_mask, "__builtin_ia32_psubusb512_mask", IX86_BUILTIN_PSUBUSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_usaddv64qi3_mask, "__builtin_ia32_paddusb512_mask", IX86_BUILTIN_PADDUSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_subv32hi3_mask, "__builtin_ia32_psubw512_mask", IX86_BUILTIN_PSUBW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_addv32hi3_mask, "__builtin_ia32_paddw512_mask", IX86_BUILTIN_PADDW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_sssubv32hi3_mask, "__builtin_ia32_psubsw512_mask", IX86_BUILTIN_PSUBSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ssaddv32hi3_mask, "__builtin_ia32_paddsw512_mask", IX86_BUILTIN_PADDSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ussubv32hi3_mask, "__builtin_ia32_psubusw512_mask", IX86_BUILTIN_PSUBUSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_usaddv32hi3_mask, "__builtin_ia32_paddusw512_mask", IX86_BUILTIN_PADDUSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_umaxv32hi3_mask, "__builtin_ia32_pmaxuw512_mask", IX86_BUILTIN_PMAXUW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_smaxv32hi3_mask, "__builtin_ia32_pmaxsw512_mask", IX86_BUILTIN_PMAXSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_uminv32hi3_mask, "__builtin_ia32_pminuw512_mask", IX86_BUILTIN_PMINUW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_sminv32hi3_mask, "__builtin_ia32_pminsw512_mask", IX86_BUILTIN_PMINSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_umaxv64qi3_mask, "__builtin_ia32_pmaxub512_mask", IX86_BUILTIN_PMAXUB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_smaxv64qi3_mask, "__builtin_ia32_pmaxsb512_mask", IX86_BUILTIN_PMAXSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_uminv64qi3_mask, "__builtin_ia32_pminub512_mask", IX86_BUILTIN_PMINUB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_sminv64qi3_mask, "__builtin_ia32_pminsb512_mask", IX86_BUILTIN_PMINSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_truncatev32hiv32qi2_mask, "__builtin_ia32_pmovwb512_mask", IX86_BUILTIN_PMOVWB512, UNKNOWN, (int) V32QI_FTYPE_V32HI_V32QI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ss_truncatev32hiv32qi2_mask, "__builtin_ia32_pmovswb512_mask", IX86_BUILTIN_PMOVSWB512, UNKNOWN, (int) V32QI_FTYPE_V32HI_V32QI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_us_truncatev32hiv32qi2_mask, "__builtin_ia32_pmovuswb512_mask", IX86_BUILTIN_PMOVUSWB512, UNKNOWN, (int) V32QI_FTYPE_V32HI_V32QI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_umulhrswv32hi3_mask, "__builtin_ia32_pmulhrsw512_mask", IX86_BUILTIN_PMULHRSW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_umulv32hi3_highpart_mask, "__builtin_ia32_pmulhuw512_mask" , IX86_BUILTIN_PMULHUW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_smulv32hi3_highpart_mask, "__builtin_ia32_pmulhw512_mask"  , IX86_BUILTIN_PMULHW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_mulv32hi3_mask, "__builtin_ia32_pmullw512_mask", IX86_BUILTIN_PMULLW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_ashlv32hi3_mask, "__builtin_ia32_psllwi512_mask", IX86_BUILTIN_PSLLWI512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_ashlv32hi3_mask, "__builtin_ia32_psllw512_mask", IX86_BUILTIN_PSLLW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V8HI_V32HI_USI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_packsswb_mask, "__builtin_ia32_packsswb512_mask",  IX86_BUILTIN_PACKSSWB512, UNKNOWN, (int) V64QI_FTYPE_V32HI_V32HI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_packuswb_mask, "__builtin_ia32_packuswb512_mask",  IX86_BUILTIN_PACKUSWB512, UNKNOWN, (int) V64QI_FTYPE_V32HI_V32HI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ashrvv32hi_mask, "__builtin_ia32_psrav32hi_mask", IX86_BUILTIN_PSRAVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_pmaddubsw512v32hi_mask, "__builtin_ia32_pmaddubsw512_mask", IX86_BUILTIN_PMADDUBSW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V64QI_V64QI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_pmaddwd512v32hi_mask, "__builtin_ia32_pmaddwd512_mask", IX86_BUILTIN_PMADDWD512_MASK, UNKNOWN, (int) V16SI_FTYPE_V32HI_V32HI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_lshrvv32hi_mask, "__builtin_ia32_psrlv32hi_mask", IX86_BUILTIN_PSRLVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_interleave_highv64qi_mask, "__builtin_ia32_punpckhbw512_mask", IX86_BUILTIN_PUNPCKHBW512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_interleave_highv32hi_mask, "__builtin_ia32_punpckhwd512_mask", IX86_BUILTIN_PUNPCKHWD512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_interleave_lowv64qi_mask, "__builtin_ia32_punpcklbw512_mask", IX86_BUILTIN_PUNPCKLBW512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_interleave_lowv32hi_mask, "__builtin_ia32_punpcklwd512_mask", IX86_BUILTIN_PUNPCKLWD512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_pshufbv64qi3_mask, "__builtin_ia32_pshufb512_mask", IX86_BUILTIN_PSHUFB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_pshufhwv32hi_mask, "__builtin_ia32_pshufhw512_mask", IX86_BUILTIN_PSHUFHW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_pshuflwv32hi_mask, "__builtin_ia32_pshuflw512_mask", IX86_BUILTIN_PSHUFLW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_ashrv32hi3_mask, "__builtin_ia32_psrawi512_mask", IX86_BUILTIN_PSRAWI512, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_ashrv32hi3_mask, "__builtin_ia32_psraw512_mask", IX86_BUILTIN_PSRAW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V8HI_V32HI_USI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_lshrv32hi3_mask, "__builtin_ia32_psrlwi512_mask", IX86_BUILTIN_PSRLWI512, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_lshrv32hi3_mask, "__builtin_ia32_psrlw512_mask", IX86_BUILTIN_PSRLW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V8HI_V32HI_USI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_cvtb2maskv64qi, "__builtin_ia32_cvtb2mask512", IX86_BUILTIN_CVTB2MASK512, UNKNOWN, (int) UDI_FTYPE_V64QI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_cvtw2maskv32hi, "__builtin_ia32_cvtw2mask512", IX86_BUILTIN_CVTW2MASK512, UNKNOWN, (int) USI_FTYPE_V32HI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_cvtmask2bv64qi, "__builtin_ia32_cvtmask2b512", IX86_BUILTIN_CVTMASK2B512, UNKNOWN, (int) V64QI_FTYPE_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_cvtmask2wv32hi, "__builtin_ia32_cvtmask2w512", IX86_BUILTIN_CVTMASK2W512, UNKNOWN, (int) V32HI_FTYPE_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_eqv64qi3_mask, "__builtin_ia32_pcmpeqb512_mask", IX86_BUILTIN_PCMPEQB512_MASK, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_eqv32hi3_mask, "__builtin_ia32_pcmpeqw512_mask", IX86_BUILTIN_PCMPEQW512_MASK, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_gtv64qi3_mask, "__builtin_ia32_pcmpgtb512_mask", IX86_BUILTIN_PCMPGTB512_MASK, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_gtv32hi3_mask, "__builtin_ia32_pcmpgtw512_mask", IX86_BUILTIN_PCMPGTW512_MASK, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_testmv64qi3_mask, "__builtin_ia32_ptestmb512", IX86_BUILTIN_PTESTMB512, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_testmv32hi3_mask, "__builtin_ia32_ptestmw512", IX86_BUILTIN_PTESTMW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_testnmv64qi3_mask, "__builtin_ia32_ptestnmb512", IX86_BUILTIN_PTESTNMB512, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_testnmv32hi3_mask, "__builtin_ia32_ptestnmw512", IX86_BUILTIN_PTESTNMW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ashlvv32hi_mask, "__builtin_ia32_psllv32hi_mask", IX86_BUILTIN_PSLLVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_absv64qi2_mask, "__builtin_ia32_pabsb512_mask", IX86_BUILTIN_PABSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_absv32hi2_mask, "__builtin_ia32_pabsw512_mask", IX86_BUILTIN_PABSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_blendmv32hi, "__builtin_ia32_blendmw_512_mask", IX86_BUILTIN_BLENDMW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_blendmv64qi, "__builtin_ia32_blendmb_512_mask", IX86_BUILTIN_BLENDMB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_cmpv64qi3_mask, "__builtin_ia32_cmpb512_mask", IX86_BUILTIN_CMPB512, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_INT_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_cmpv32hi3_mask, "__builtin_ia32_cmpw512_mask", IX86_BUILTIN_CMPW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_INT_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ucmpv64qi3_mask, "__builtin_ia32_ucmpb512_mask", IX86_BUILTIN_UCMPB512, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_INT_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ucmpv32hi3_mask, "__builtin_ia32_ucmpw512_mask", IX86_BUILTIN_UCMPW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_INT_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kunpckdi, "__builtin_ia32_kunpckdi", IX86_BUILTIN_KUNPCKDQ, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_packusdw_mask, "__builtin_ia32_packusdw512_mask",  IX86_BUILTIN_PACKUSDW512, UNKNOWN, (int) V32HI_FTYPE_V16SI_V16SI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ashlv4ti3, "__builtin_ia32_pslldq512", IX86_BUILTIN_PSLLDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_CONVERT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_lshrv4ti3, "__builtin_ia32_psrldq512", IX86_BUILTIN_PSRLDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_CONVERT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_packssdw_mask, "__builtin_ia32_packssdw512_mask",  IX86_BUILTIN_PACKSSDW512, UNKNOWN, (int) V32HI_FTYPE_V16SI_V16SI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_palignrv4ti, "__builtin_ia32_palignr512", IX86_BUILTIN_PALIGNR512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_CONVERT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_palignrv64qi_mask, "__builtin_ia32_palignr512_mask", IX86_BUILTIN_PALIGNR512_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_UDI_CONVERT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_loadv32hi_mask, "__builtin_ia32_movdquhi512_mask", IX86_BUILTIN_MOVDQUHI512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_loadv64qi_mask, "__builtin_ia32_movdquqi512_mask", IX86_BUILTIN_MOVDQUQI512_MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_psadbw, "__builtin_ia32_psadbw512", IX86_BUILTIN_PSADBW512, UNKNOWN, (int) V8DI_FTYPE_V64QI_V64QI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_dbpsadbwv32hi_mask, "__builtin_ia32_dbpsadbw512_mask", IX86_BUILTIN_DBPSADBW512, UNKNOWN, (int) V32HI_FTYPE_V64QI_V64QI_INT_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vec_dupv64qi_mask, "__builtin_ia32_pbroadcastb512_mask", IX86_BUILTIN_PBROADCASTB512, UNKNOWN, (int) V64QI_FTYPE_V16QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vec_dup_gprv64qi_mask, "__builtin_ia32_pbroadcastb512_gpr_mask", IX86_BUILTIN_PBROADCASTB512_GPR, UNKNOWN, (int) V64QI_FTYPE_QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vec_dupv32hi_mask, "__builtin_ia32_pbroadcastw512_mask", IX86_BUILTIN_PBROADCASTW512, UNKNOWN, (int) V32HI_FTYPE_V8HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vec_dup_gprv32hi_mask, "__builtin_ia32_pbroadcastw512_gpr_mask", IX86_BUILTIN_PBROADCASTW512_GPR, UNKNOWN, (int) V32HI_FTYPE_HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_sign_extendv32qiv32hi2_mask, "__builtin_ia32_pmovsxbw512_mask", IX86_BUILTIN_PMOVSXBW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32QI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_zero_extendv32qiv32hi2_mask, "__builtin_ia32_pmovzxbw512_mask", IX86_BUILTIN_PMOVZXBW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32QI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_permvarv32hi_mask, "__builtin_ia32_permvarhi512_mask", IX86_BUILTIN_VPERMVARHI512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vpermt2varv32hi3_mask, "__builtin_ia32_vpermt2varhi512_mask", IX86_BUILTIN_VPERMT2VARHI512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vpermt2varv32hi3_maskz, "__builtin_ia32_vpermt2varhi512_maskz", IX86_BUILTIN_VPERMT2VARHI512_MASKZ, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vpermi2varv32hi3_mask, "__builtin_ia32_vpermi2varhi512_mask", IX86_BUILTIN_VPERMI2VARHI512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_uavgv64qi3_mask, "__builtin_ia32_pavgb512_mask", IX86_BUILTIN_PAVGB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_uavgv32hi3_mask, "__builtin_ia32_pavgw512_mask", IX86_BUILTIN_PAVGW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_addv64qi3_mask, "__builtin_ia32_paddb512_mask", IX86_BUILTIN_PADDB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_subv64qi3_mask, "__builtin_ia32_psubb512_mask", IX86_BUILTIN_PSUBB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_sssubv64qi3_mask, "__builtin_ia32_psubsb512_mask", IX86_BUILTIN_PSUBSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ssaddv64qi3_mask, "__builtin_ia32_paddsb512_mask", IX86_BUILTIN_PADDSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ussubv64qi3_mask, "__builtin_ia32_psubusb512_mask", IX86_BUILTIN_PSUBUSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_usaddv64qi3_mask, "__builtin_ia32_paddusb512_mask", IX86_BUILTIN_PADDUSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_subv32hi3_mask, "__builtin_ia32_psubw512_mask", IX86_BUILTIN_PSUBW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_addv32hi3_mask, "__builtin_ia32_paddw512_mask", IX86_BUILTIN_PADDW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_sssubv32hi3_mask, "__builtin_ia32_psubsw512_mask", IX86_BUILTIN_PSUBSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ssaddv32hi3_mask, "__builtin_ia32_paddsw512_mask", IX86_BUILTIN_PADDSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ussubv32hi3_mask, "__builtin_ia32_psubusw512_mask", IX86_BUILTIN_PSUBUSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_usaddv32hi3_mask, "__builtin_ia32_paddusw512_mask", IX86_BUILTIN_PADDUSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_umaxv32hi3_mask, "__builtin_ia32_pmaxuw512_mask", IX86_BUILTIN_PMAXUW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_smaxv32hi3_mask, "__builtin_ia32_pmaxsw512_mask", IX86_BUILTIN_PMAXSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_uminv32hi3_mask, "__builtin_ia32_pminuw512_mask", IX86_BUILTIN_PMINUW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_sminv32hi3_mask, "__builtin_ia32_pminsw512_mask", IX86_BUILTIN_PMINSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_umaxv64qi3_mask, "__builtin_ia32_pmaxub512_mask", IX86_BUILTIN_PMAXUB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_smaxv64qi3_mask, "__builtin_ia32_pmaxsb512_mask", IX86_BUILTIN_PMAXSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_uminv64qi3_mask, "__builtin_ia32_pminub512_mask", IX86_BUILTIN_PMINUB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_sminv64qi3_mask, "__builtin_ia32_pminsb512_mask", IX86_BUILTIN_PMINSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_truncatev32hiv32qi2_mask, "__builtin_ia32_pmovwb512_mask", IX86_BUILTIN_PMOVWB512, UNKNOWN, (int) V32QI_FTYPE_V32HI_V32QI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ss_truncatev32hiv32qi2_mask, "__builtin_ia32_pmovswb512_mask", IX86_BUILTIN_PMOVSWB512, UNKNOWN, (int) V32QI_FTYPE_V32HI_V32QI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_us_truncatev32hiv32qi2_mask, "__builtin_ia32_pmovuswb512_mask", IX86_BUILTIN_PMOVUSWB512, UNKNOWN, (int) V32QI_FTYPE_V32HI_V32QI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_umulhrswv32hi3_mask, "__builtin_ia32_pmulhrsw512_mask", IX86_BUILTIN_PMULHRSW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_umulv32hi3_highpart_mask, "__builtin_ia32_pmulhuw512_mask" , IX86_BUILTIN_PMULHUW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_smulv32hi3_highpart_mask, "__builtin_ia32_pmulhw512_mask"  , IX86_BUILTIN_PMULHW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_mulv32hi3_mask, "__builtin_ia32_pmullw512_mask", IX86_BUILTIN_PMULLW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashlv32hi3_mask, "__builtin_ia32_psllwi512_mask", IX86_BUILTIN_PSLLWI512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashlv32hi3_mask, "__builtin_ia32_psllw512_mask", IX86_BUILTIN_PSLLW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V8HI_V32HI_USI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_packsswb_mask, "__builtin_ia32_packsswb512_mask",  IX86_BUILTIN_PACKSSWB512, UNKNOWN, (int) V64QI_FTYPE_V32HI_V32HI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_packuswb_mask, "__builtin_ia32_packuswb512_mask",  IX86_BUILTIN_PACKUSWB512, UNKNOWN, (int) V64QI_FTYPE_V32HI_V32HI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ashrvv32hi_mask, "__builtin_ia32_psrav32hi_mask", IX86_BUILTIN_PSRAVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_pmaddubsw512v32hi_mask, "__builtin_ia32_pmaddubsw512_mask", IX86_BUILTIN_PMADDUBSW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V64QI_V64QI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_pmaddwd512v32hi_mask, "__builtin_ia32_pmaddwd512_mask", IX86_BUILTIN_PMADDWD512_MASK, UNKNOWN, (int) V16SI_FTYPE_V32HI_V32HI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_lshrvv32hi_mask, "__builtin_ia32_psrlv32hi_mask", IX86_BUILTIN_PSRLVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_interleave_highv64qi_mask, "__builtin_ia32_punpckhbw512_mask", IX86_BUILTIN_PUNPCKHBW512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_interleave_highv32hi_mask, "__builtin_ia32_punpckhwd512_mask", IX86_BUILTIN_PUNPCKHWD512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_interleave_lowv64qi_mask, "__builtin_ia32_punpcklbw512_mask", IX86_BUILTIN_PUNPCKLBW512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_interleave_lowv32hi_mask, "__builtin_ia32_punpcklwd512_mask", IX86_BUILTIN_PUNPCKLWD512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_pshufbv64qi3_mask, "__builtin_ia32_pshufb512_mask", IX86_BUILTIN_PSHUFB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_pshufhwv32hi_mask, "__builtin_ia32_pshufhw512_mask", IX86_BUILTIN_PSHUFHW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_pshuflwv32hi_mask, "__builtin_ia32_pshuflw512_mask", IX86_BUILTIN_PSHUFLW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashrv32hi3_mask, "__builtin_ia32_psrawi512_mask", IX86_BUILTIN_PSRAWI512, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashrv32hi3_mask, "__builtin_ia32_psraw512_mask", IX86_BUILTIN_PSRAW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V8HI_V32HI_USI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_lshrv32hi3_mask, "__builtin_ia32_psrlwi512_mask", IX86_BUILTIN_PSRLWI512, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_lshrv32hi3_mask, "__builtin_ia32_psrlw512_mask", IX86_BUILTIN_PSRLW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V8HI_V32HI_USI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_cvtb2maskv64qi, "__builtin_ia32_cvtb2mask512", IX86_BUILTIN_CVTB2MASK512, UNKNOWN, (int) UDI_FTYPE_V64QI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_cvtw2maskv32hi, "__builtin_ia32_cvtw2mask512", IX86_BUILTIN_CVTW2MASK512, UNKNOWN, (int) USI_FTYPE_V32HI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_cvtmask2bv64qi, "__builtin_ia32_cvtmask2b512", IX86_BUILTIN_CVTMASK2B512, UNKNOWN, (int) V64QI_FTYPE_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_cvtmask2wv32hi, "__builtin_ia32_cvtmask2w512", IX86_BUILTIN_CVTMASK2W512, UNKNOWN, (int) V32HI_FTYPE_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_eqv64qi3_mask, "__builtin_ia32_pcmpeqb512_mask", IX86_BUILTIN_PCMPEQB512_MASK, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_eqv32hi3_mask, "__builtin_ia32_pcmpeqw512_mask", IX86_BUILTIN_PCMPEQW512_MASK, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_gtv64qi3_mask, "__builtin_ia32_pcmpgtb512_mask", IX86_BUILTIN_PCMPGTB512_MASK, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_gtv32hi3_mask, "__builtin_ia32_pcmpgtw512_mask", IX86_BUILTIN_PCMPGTW512_MASK, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_testmv64qi3_mask, "__builtin_ia32_ptestmb512", IX86_BUILTIN_PTESTMB512, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_testmv32hi3_mask, "__builtin_ia32_ptestmw512", IX86_BUILTIN_PTESTMW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_testnmv64qi3_mask, "__builtin_ia32_ptestnmb512", IX86_BUILTIN_PTESTNMB512, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_testnmv32hi3_mask, "__builtin_ia32_ptestnmw512", IX86_BUILTIN_PTESTNMW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ashlvv32hi_mask, "__builtin_ia32_psllv32hi_mask", IX86_BUILTIN_PSLLVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_absv64qi2_mask, "__builtin_ia32_pabsb512_mask", IX86_BUILTIN_PABSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_absv32hi2_mask, "__builtin_ia32_pabsw512_mask", IX86_BUILTIN_PABSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_blendmv32hi, "__builtin_ia32_blendmw_512_mask", IX86_BUILTIN_BLENDMW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_blendmv64qi, "__builtin_ia32_blendmb_512_mask", IX86_BUILTIN_BLENDMB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_cmpv64qi3_mask, "__builtin_ia32_cmpb512_mask", IX86_BUILTIN_CMPB512, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_INT_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_cmpv32hi3_mask, "__builtin_ia32_cmpw512_mask", IX86_BUILTIN_CMPW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_INT_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ucmpv64qi3_mask, "__builtin_ia32_ucmpb512_mask", IX86_BUILTIN_UCMPB512, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_INT_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ucmpv32hi3_mask, "__builtin_ia32_ucmpw512_mask", IX86_BUILTIN_UCMPW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_INT_USI)
 
 /* AVX512IFMA */
 BDESC (OPTION_MASK_ISA_AVX512IFMA, 0, CODE_FOR_vpmadd52luqv8di_mask, "__builtin_ia32_vpmadd52luq512_mask", IX86_BUILTIN_VPMADD52LUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-- 
2.31.1



* [PATCH 10/18] [PATCH 4/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
  2023-09-21  7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
                   ` (8 preceding siblings ...)
  2023-09-21  7:20 ` [PATCH 09/18] [PATCH 3/5] " Hu, Lin1
@ 2023-09-21  7:20 ` Hu, Lin1
  2023-09-21  7:20 ` [PATCH 11/18] [PATCH 5/5] " Hu, Lin1
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21  7:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang

From: Haochen Jiang <haochen.jiang@intel.com>

gcc/ChangeLog:

	* config/i386/i386-builtin.def (BDESC): Add
	OPTION_MASK_ISA2_EVEX512.
---
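A note on this chunk: features like AVX512VP2INTERSECT keep their base
requirement in the ISA2 word already, so EVEX512 is OR'd into the same
field instead of occupying the first one.  A builtin is then usable
only when all bits of its mask pair are enabled; roughly (a sketch,
not the exact backend code; the real check lives in the i386 builtin
expander):

    /* Both ISA2 bits must be set for the 512-bit variant.  */
    HOST_WIDE_INT required2 = OPTION_MASK_ISA2_AVX512VP2INTERSECT
			      | OPTION_MASK_ISA2_EVEX512;
    bool usable = (ix86_isa_flags2 & required2) == required2;
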
 gcc/config/i386/i386-builtin.def | 188 +++++++++++++++----------------
 1 file changed, 94 insertions(+), 94 deletions(-)

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 167d530a537..8250e2998cd 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -299,8 +299,8 @@ BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_sto
 BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_storev64qi_mask, "__builtin_ia32_storedquqi512_mask", IX86_BUILTIN_STOREDQUQI512_MASK, UNKNOWN, (int) VOID_FTYPE_PCHAR_V64QI_UDI)
 
 /* AVX512VP2INTERSECT */
-BDESC (0, OPTION_MASK_ISA2_AVX512VP2INTERSECT, CODE_FOR_nothing, "__builtin_ia32_2intersectd512", IX86_BUILTIN_2INTERSECTD512, UNKNOWN, (int) VOID_FTYPE_PUHI_PUHI_V16SI_V16SI)
-BDESC (0, OPTION_MASK_ISA2_AVX512VP2INTERSECT, CODE_FOR_nothing, "__builtin_ia32_2intersectq512", IX86_BUILTIN_2INTERSECTQ512, UNKNOWN, (int) VOID_FTYPE_PUQI_PUQI_V8DI_V8DI)
+BDESC (0, OPTION_MASK_ISA2_AVX512VP2INTERSECT | OPTION_MASK_ISA2_EVEX512, CODE_FOR_nothing, "__builtin_ia32_2intersectd512", IX86_BUILTIN_2INTERSECTD512, UNKNOWN, (int) VOID_FTYPE_PUHI_PUHI_V16SI_V16SI)
+BDESC (0, OPTION_MASK_ISA2_AVX512VP2INTERSECT | OPTION_MASK_ISA2_EVEX512, CODE_FOR_nothing, "__builtin_ia32_2intersectq512", IX86_BUILTIN_2INTERSECTQ512, UNKNOWN, (int) VOID_FTYPE_PUQI_PUQI_V8DI_V8DI)
 BDESC (0, OPTION_MASK_ISA2_AVX512VP2INTERSECT, CODE_FOR_nothing, "__builtin_ia32_2intersectd256", IX86_BUILTIN_2INTERSECTD256, UNKNOWN, (int) VOID_FTYPE_PUQI_PUQI_V8SI_V8SI)
 BDESC (0, OPTION_MASK_ISA2_AVX512VP2INTERSECT, CODE_FOR_nothing, "__builtin_ia32_2intersectq256", IX86_BUILTIN_2INTERSECTQ256, UNKNOWN, (int) VOID_FTYPE_PUQI_PUQI_V4DI_V4DI)
 BDESC (0, OPTION_MASK_ISA2_AVX512VP2INTERSECT, CODE_FOR_nothing, "__builtin_ia32_2intersectd128", IX86_BUILTIN_2INTERSECTD128, UNKNOWN, (int) VOID_FTYPE_PUQI_PUQI_V4SI_V4SI)
@@ -430,17 +430,17 @@ BDESC (OPTION_MASK_ISA_PKU, 0, CODE_FOR_rdpkru,  "__builtin_ia32_rdpkru", IX86_B
 BDESC (OPTION_MASK_ISA_PKU, 0, CODE_FOR_wrpkru,  "__builtin_ia32_wrpkru", IX86_BUILTIN_WRPKRU, UNKNOWN, (int) VOID_FTYPE_UNSIGNED)
 
 /* VBMI2 */
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_compressstorev64qi_mask, "__builtin_ia32_compressstoreuqi512_mask", IX86_BUILTIN_PCOMPRESSBSTORE512, UNKNOWN, (int) VOID_FTYPE_PV64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_compressstorev32hi_mask, "__builtin_ia32_compressstoreuhi512_mask", IX86_BUILTIN_PCOMPRESSWSTORE512, UNKNOWN, (int) VOID_FTYPE_PV32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_compressstorev64qi_mask, "__builtin_ia32_compressstoreuqi512_mask", IX86_BUILTIN_PCOMPRESSBSTORE512, UNKNOWN, (int) VOID_FTYPE_PV64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_compressstorev32hi_mask, "__builtin_ia32_compressstoreuhi512_mask", IX86_BUILTIN_PCOMPRESSWSTORE512, UNKNOWN, (int) VOID_FTYPE_PV32HI_V32HI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_compressstorev32qi_mask, "__builtin_ia32_compressstoreuqi256_mask", IX86_BUILTIN_PCOMPRESSBSTORE256, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32QI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_compressstorev16qi_mask, "__builtin_ia32_compressstoreuqi128_mask", IX86_BUILTIN_PCOMPRESSBSTORE128, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16QI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_compressstorev16hi_mask, "__builtin_ia32_compressstoreuhi256_mask", IX86_BUILTIN_PCOMPRESSWSTORE256, UNKNOWN, (int) VOID_FTYPE_PV16HI_V16HI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_compressstorev8hi_mask, "__builtin_ia32_compressstoreuhi128_mask", IX86_BUILTIN_PCOMPRESSWSTORE128, UNKNOWN, (int) VOID_FTYPE_PV8HI_V8HI_UQI)
 
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_expandv64qi_mask, "__builtin_ia32_expandloadqi512_mask", IX86_BUILTIN_PEXPANDBLOAD512, UNKNOWN, (int) V64QI_FTYPE_PCV64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_expandv64qi_maskz, "__builtin_ia32_expandloadqi512_maskz", IX86_BUILTIN_PEXPANDBLOAD512Z, UNKNOWN, (int) V64QI_FTYPE_PCV64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_expandv32hi_mask, "__builtin_ia32_expandloadhi512_mask", IX86_BUILTIN_PEXPANDWLOAD512, UNKNOWN, (int) V32HI_FTYPE_PCV32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_expandv32hi_maskz, "__builtin_ia32_expandloadhi512_maskz", IX86_BUILTIN_PEXPANDWLOAD512Z, UNKNOWN, (int) V32HI_FTYPE_PCV32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv64qi_mask, "__builtin_ia32_expandloadqi512_mask", IX86_BUILTIN_PEXPANDBLOAD512, UNKNOWN, (int) V64QI_FTYPE_PCV64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv64qi_maskz, "__builtin_ia32_expandloadqi512_maskz", IX86_BUILTIN_PEXPANDBLOAD512Z, UNKNOWN, (int) V64QI_FTYPE_PCV64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv32hi_mask, "__builtin_ia32_expandloadhi512_mask", IX86_BUILTIN_PEXPANDWLOAD512, UNKNOWN, (int) V32HI_FTYPE_PCV32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv32hi_maskz, "__builtin_ia32_expandloadhi512_maskz", IX86_BUILTIN_PEXPANDWLOAD512Z, UNKNOWN, (int) V32HI_FTYPE_PCV32HI_V32HI_USI)
 
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv32qi_mask, "__builtin_ia32_expandloadqi256_mask", IX86_BUILTIN_PEXPANDBLOAD256, UNKNOWN, (int) V32QI_FTYPE_PCV32QI_V32QI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv32qi_maskz, "__builtin_ia32_expandloadqi256_maskz", IX86_BUILTIN_PEXPANDBLOAD256Z, UNKNOWN, (int) V32QI_FTYPE_PCV32QI_V32QI_USI)
@@ -2534,10 +2534,10 @@ BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ucm
 BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ucmpv32hi3_mask, "__builtin_ia32_ucmpw512_mask", IX86_BUILTIN_UCMPW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_INT_USI)
 
 /* AVX512IFMA */
-BDESC (OPTION_MASK_ISA_AVX512IFMA, 0, CODE_FOR_vpmadd52luqv8di_mask, "__builtin_ia32_vpmadd52luq512_mask", IX86_BUILTIN_VPMADD52LUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512IFMA, 0, CODE_FOR_vpmadd52luqv8di_maskz, "__builtin_ia32_vpmadd52luq512_maskz", IX86_BUILTIN_VPMADD52LUQ512_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512IFMA, 0, CODE_FOR_vpmadd52huqv8di_mask, "__builtin_ia32_vpmadd52huq512_mask", IX86_BUILTIN_VPMADD52HUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512IFMA, 0, CODE_FOR_vpmadd52huqv8di_maskz, "__builtin_ia32_vpmadd52huq512_maskz", IX86_BUILTIN_VPMADD52HUQ512_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512IFMA, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpmadd52luqv8di_mask, "__builtin_ia32_vpmadd52luq512_mask", IX86_BUILTIN_VPMADD52LUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512IFMA, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpmadd52luqv8di_maskz, "__builtin_ia32_vpmadd52luq512_maskz", IX86_BUILTIN_VPMADD52LUQ512_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512IFMA, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpmadd52huqv8di_mask, "__builtin_ia32_vpmadd52huq512_mask", IX86_BUILTIN_VPMADD52HUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512IFMA, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpmadd52huqv8di_maskz, "__builtin_ia32_vpmadd52huq512_maskz", IX86_BUILTIN_VPMADD52HUQ512_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpmadd52luqv4di_mask, "__builtin_ia32_vpmadd52luq256_mask", IX86_BUILTIN_VPMADD52LUQ256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpmadd52luqv4di_maskz, "__builtin_ia32_vpmadd52luq256_maskz", IX86_BUILTIN_VPMADD52LUQ256_MASKZ, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpmadd52huqv4di_mask, "__builtin_ia32_vpmadd52huq256_mask", IX86_BUILTIN_VPMADD52HUQ256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
@@ -2552,13 +2552,13 @@ BDESC (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_A
 BDESC (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVXIFMA, CODE_FOR_vpmadd52huqv2di, "__builtin_ia32_vpmadd52huq128", IX86_BUINTIN_VPMADD52HUQ128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI)
 
 /* AVX512VBMI */
-BDESC (OPTION_MASK_ISA_AVX512VBMI, 0, CODE_FOR_vpmultishiftqbv64qi_mask, "__builtin_ia32_vpmultishiftqb512_mask", IX86_BUILTIN_VPMULTISHIFTQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpmultishiftqbv64qi_mask, "__builtin_ia32_vpmultishiftqb512_mask", IX86_BUILTIN_VPMULTISHIFTQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpmultishiftqbv32qi_mask, "__builtin_ia32_vpmultishiftqb256_mask", IX86_BUILTIN_VPMULTISHIFTQB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpmultishiftqbv16qi_mask, "__builtin_ia32_vpmultishiftqb128_mask", IX86_BUILTIN_VPMULTISHIFTQB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI, 0, CODE_FOR_avx512bw_permvarv64qi_mask, "__builtin_ia32_permvarqi512_mask", IX86_BUILTIN_VPERMVARQI512_MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI, 0, CODE_FOR_avx512bw_vpermt2varv64qi3_mask, "__builtin_ia32_vpermt2varqi512_mask", IX86_BUILTIN_VPERMT2VARQI512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI, 0, CODE_FOR_avx512bw_vpermt2varv64qi3_maskz, "__builtin_ia32_vpermt2varqi512_maskz", IX86_BUILTIN_VPERMT2VARQI512_MASKZ, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI, 0, CODE_FOR_avx512bw_vpermi2varv64qi3_mask, "__builtin_ia32_vpermi2varqi512_mask", IX86_BUILTIN_VPERMI2VARQI512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_permvarv64qi_mask, "__builtin_ia32_permvarqi512_mask", IX86_BUILTIN_VPERMVARQI512_MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vpermt2varv64qi3_mask, "__builtin_ia32_vpermt2varqi512_mask", IX86_BUILTIN_VPERMT2VARQI512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vpermt2varv64qi3_maskz, "__builtin_ia32_vpermt2varqi512_maskz", IX86_BUILTIN_VPERMT2VARQI512_MASKZ, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vpermi2varv64qi3_mask, "__builtin_ia32_vpermi2varqi512_mask", IX86_BUILTIN_VPERMI2VARQI512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_permvarv32qi_mask, "__builtin_ia32_permvarqi256_mask", IX86_BUILTIN_VPERMVARQI256_MASK, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_permvarv16qi_mask, "__builtin_ia32_permvarqi128_mask", IX86_BUILTIN_VPERMVARQI128_MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vpermt2varv32qi3_mask, "__builtin_ia32_vpermt2varqi256_mask", IX86_BUILTIN_VPERMT2VARQI256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI_USI)
@@ -2569,16 +2569,16 @@ BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512
 BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vpermi2varv16qi3_mask, "__builtin_ia32_vpermi2varqi128_mask", IX86_BUILTIN_VPERMI2VARQI128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI_UHI)
 
 /* VBMI2 */
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_compressv64qi_mask, "__builtin_ia32_compressqi512_mask", IX86_BUILTIN_PCOMPRESSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_compressv32hi_mask, "__builtin_ia32_compresshi512_mask", IX86_BUILTIN_PCOMPRESSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_compressv64qi_mask, "__builtin_ia32_compressqi512_mask", IX86_BUILTIN_PCOMPRESSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_compressv32hi_mask, "__builtin_ia32_compresshi512_mask", IX86_BUILTIN_PCOMPRESSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_compressv32qi_mask, "__builtin_ia32_compressqi256_mask", IX86_BUILTIN_PCOMPRESSB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_compressv16qi_mask, "__builtin_ia32_compressqi128_mask", IX86_BUILTIN_PCOMPRESSB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_compressv16hi_mask, "__builtin_ia32_compresshi256_mask", IX86_BUILTIN_PCOMPRESSW256, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_compressv8hi_mask, "__builtin_ia32_compresshi128_mask", IX86_BUILTIN_PCOMPRESSW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_expandv64qi_mask, "__builtin_ia32_expandqi512_mask", IX86_BUILTIN_PEXPANDB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_expandv64qi_maskz, "__builtin_ia32_expandqi512_maskz", IX86_BUILTIN_PEXPANDB512Z, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_expandv32hi_mask, "__builtin_ia32_expandhi512_mask", IX86_BUILTIN_PEXPANDW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_expandv32hi_maskz, "__builtin_ia32_expandhi512_maskz", IX86_BUILTIN_PEXPANDW512Z, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv64qi_mask, "__builtin_ia32_expandqi512_mask", IX86_BUILTIN_PEXPANDB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv64qi_maskz, "__builtin_ia32_expandqi512_maskz", IX86_BUILTIN_PEXPANDB512Z, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv32hi_mask, "__builtin_ia32_expandhi512_mask", IX86_BUILTIN_PEXPANDW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv32hi_maskz, "__builtin_ia32_expandhi512_maskz", IX86_BUILTIN_PEXPANDW512Z, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv32qi_mask, "__builtin_ia32_expandqi256_mask", IX86_BUILTIN_PEXPANDB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv32qi_maskz, "__builtin_ia32_expandqi256_maskz", IX86_BUILTIN_PEXPANDB256Z, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv16qi_mask, "__builtin_ia32_expandqi128_mask", IX86_BUILTIN_PEXPANDB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_UHI)
@@ -2587,64 +2587,64 @@ BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expan
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv16hi_maskz, "__builtin_ia32_expandhi256_maskz", IX86_BUILTIN_PEXPANDW256Z, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv8hi_mask, "__builtin_ia32_expandhi128_mask", IX86_BUILTIN_PEXPANDW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv8hi_maskz, "__builtin_ia32_expandhi128_maskz", IX86_BUILTIN_PEXPANDW128Z, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrd_v32hi, "__builtin_ia32_vpshrd_v32hi", IX86_BUILTIN_VPSHRDV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrd_v32hi_mask, "__builtin_ia32_vpshrd_v32hi_mask", IX86_BUILTIN_VPSHRDV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_INT_V32HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrd_v32hi, "__builtin_ia32_vpshrd_v32hi", IX86_BUILTIN_VPSHRDV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrd_v32hi_mask, "__builtin_ia32_vpshrd_v32hi_mask", IX86_BUILTIN_VPSHRDV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_INT_V32HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v16hi, "__builtin_ia32_vpshrd_v16hi", IX86_BUILTIN_VPSHRDV16HI, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v16hi_mask, "__builtin_ia32_vpshrd_v16hi_mask", IX86_BUILTIN_VPSHRDV16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_INT_V16HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v8hi, "__builtin_ia32_vpshrd_v8hi", IX86_BUILTIN_VPSHRDV8HI, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v8hi_mask, "__builtin_ia32_vpshrd_v8hi_mask", IX86_BUILTIN_VPSHRDV8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_INT_V8HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrd_v16si, "__builtin_ia32_vpshrd_v16si", IX86_BUILTIN_VPSHRDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrd_v16si_mask, "__builtin_ia32_vpshrd_v16si_mask", IX86_BUILTIN_VPSHRDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT_V16SI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrd_v16si, "__builtin_ia32_vpshrd_v16si", IX86_BUILTIN_VPSHRDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrd_v16si_mask, "__builtin_ia32_vpshrd_v16si_mask", IX86_BUILTIN_VPSHRDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT_V16SI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v8si, "__builtin_ia32_vpshrd_v8si", IX86_BUILTIN_VPSHRDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v8si_mask, "__builtin_ia32_vpshrd_v8si_mask", IX86_BUILTIN_VPSHRDV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_INT_V8SI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v4si, "__builtin_ia32_vpshrd_v4si", IX86_BUILTIN_VPSHRDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v4si_mask, "__builtin_ia32_vpshrd_v4si_mask", IX86_BUILTIN_VPSHRDV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_INT_V4SI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrd_v8di, "__builtin_ia32_vpshrd_v8di", IX86_BUILTIN_VPSHRDV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrd_v8di_mask, "__builtin_ia32_vpshrd_v8di_mask", IX86_BUILTIN_VPSHRDV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrd_v8di, "__builtin_ia32_vpshrd_v8di", IX86_BUILTIN_VPSHRDV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrd_v8di_mask, "__builtin_ia32_vpshrd_v8di_mask", IX86_BUILTIN_VPSHRDV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v4di, "__builtin_ia32_vpshrd_v4di", IX86_BUILTIN_VPSHRDV4DI, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v4di_mask, "__builtin_ia32_vpshrd_v4di_mask", IX86_BUILTIN_VPSHRDV4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_INT_V4DI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v2di, "__builtin_ia32_vpshrd_v2di", IX86_BUILTIN_VPSHRDV2DI, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v2di_mask, "__builtin_ia32_vpshrd_v2di_mask", IX86_BUILTIN_VPSHRDV2DI_MASK, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_INT_V2DI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshld_v32hi, "__builtin_ia32_vpshld_v32hi", IX86_BUILTIN_VPSHLDV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshld_v32hi_mask, "__builtin_ia32_vpshld_v32hi_mask", IX86_BUILTIN_VPSHLDV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_INT_V32HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshld_v32hi, "__builtin_ia32_vpshld_v32hi", IX86_BUILTIN_VPSHLDV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshld_v32hi_mask, "__builtin_ia32_vpshld_v32hi_mask", IX86_BUILTIN_VPSHLDV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_INT_V32HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v16hi, "__builtin_ia32_vpshld_v16hi", IX86_BUILTIN_VPSHLDV16HI, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v16hi_mask, "__builtin_ia32_vpshld_v16hi_mask", IX86_BUILTIN_VPSHLDV16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_INT_V16HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v8hi, "__builtin_ia32_vpshld_v8hi", IX86_BUILTIN_VPSHLDV8HI, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v8hi_mask, "__builtin_ia32_vpshld_v8hi_mask", IX86_BUILTIN_VPSHLDV8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_INT_V8HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshld_v16si, "__builtin_ia32_vpshld_v16si", IX86_BUILTIN_VPSHLDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshld_v16si_mask, "__builtin_ia32_vpshld_v16si_mask", IX86_BUILTIN_VPSHLDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT_V16SI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshld_v16si, "__builtin_ia32_vpshld_v16si", IX86_BUILTIN_VPSHLDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshld_v16si_mask, "__builtin_ia32_vpshld_v16si_mask", IX86_BUILTIN_VPSHLDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT_V16SI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v8si, "__builtin_ia32_vpshld_v8si", IX86_BUILTIN_VPSHLDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v8si_mask, "__builtin_ia32_vpshld_v8si_mask", IX86_BUILTIN_VPSHLDV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_INT_V8SI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v4si, "__builtin_ia32_vpshld_v4si", IX86_BUILTIN_VPSHLDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v4si_mask, "__builtin_ia32_vpshld_v4si_mask", IX86_BUILTIN_VPSHLDV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_INT_V4SI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshld_v8di, "__builtin_ia32_vpshld_v8di", IX86_BUILTIN_VPSHLDV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshld_v8di_mask, "__builtin_ia32_vpshld_v8di_mask", IX86_BUILTIN_VPSHLDV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshld_v8di, "__builtin_ia32_vpshld_v8di", IX86_BUILTIN_VPSHLDV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshld_v8di_mask, "__builtin_ia32_vpshld_v8di_mask", IX86_BUILTIN_VPSHLDV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v4di, "__builtin_ia32_vpshld_v4di", IX86_BUILTIN_VPSHLDV4DI, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v4di_mask, "__builtin_ia32_vpshld_v4di_mask", IX86_BUILTIN_VPSHLDV4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_INT_V4DI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v2di, "__builtin_ia32_vpshld_v2di", IX86_BUILTIN_VPSHLDV2DI, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v2di_mask, "__builtin_ia32_vpshld_v2di_mask", IX86_BUILTIN_VPSHLDV2DI_MASK, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_INT_V2DI_INT)
 
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v32hi, "__builtin_ia32_vpshrdv_v32hi", IX86_BUILTIN_VPSHRDVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v32hi_mask, "__builtin_ia32_vpshrdv_v32hi_mask", IX86_BUILTIN_VPSHRDVV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v32hi_maskz, "__builtin_ia32_vpshrdv_v32hi_maskz", IX86_BUILTIN_VPSHRDVV32HI_MASKZ, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v32hi, "__builtin_ia32_vpshrdv_v32hi", IX86_BUILTIN_VPSHRDVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v32hi_mask, "__builtin_ia32_vpshrdv_v32hi_mask", IX86_BUILTIN_VPSHRDVV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v32hi_maskz, "__builtin_ia32_vpshrdv_v32hi_maskz", IX86_BUILTIN_VPSHRDVV32HI_MASKZ, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v16hi, "__builtin_ia32_vpshrdv_v16hi", IX86_BUILTIN_VPSHRDVV16HI, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_V16HI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v16hi_mask, "__builtin_ia32_vpshrdv_v16hi_mask", IX86_BUILTIN_VPSHRDVV16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_V16HI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v16hi_maskz, "__builtin_ia32_vpshrdv_v16hi_maskz", IX86_BUILTIN_VPSHRDVV16HI_MASKZ, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_V16HI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v8hi, "__builtin_ia32_vpshrdv_v8hi", IX86_BUILTIN_VPSHRDVV8HI, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_V8HI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v8hi_mask, "__builtin_ia32_vpshrdv_v8hi_mask", IX86_BUILTIN_VPSHRDVV8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_V8HI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v8hi_maskz, "__builtin_ia32_vpshrdv_v8hi_maskz", IX86_BUILTIN_VPSHRDVV8HI_MASKZ, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_V8HI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v16si, "__builtin_ia32_vpshrdv_v16si", IX86_BUILTIN_VPSHRDVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v16si_mask, "__builtin_ia32_vpshrdv_v16si_mask", IX86_BUILTIN_VPSHRDVV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v16si_maskz, "__builtin_ia32_vpshrdv_v16si_maskz", IX86_BUILTIN_VPSHRDVV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v16si, "__builtin_ia32_vpshrdv_v16si", IX86_BUILTIN_VPSHRDVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v16si_mask, "__builtin_ia32_vpshrdv_v16si_mask", IX86_BUILTIN_VPSHRDVV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v16si_maskz, "__builtin_ia32_vpshrdv_v16si_maskz", IX86_BUILTIN_VPSHRDVV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v8si, "__builtin_ia32_vpshrdv_v8si", IX86_BUILTIN_VPSHRDVV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v8si_mask, "__builtin_ia32_vpshrdv_v8si_mask", IX86_BUILTIN_VPSHRDVV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v8si_maskz, "__builtin_ia32_vpshrdv_v8si_maskz", IX86_BUILTIN_VPSHRDVV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v4si, "__builtin_ia32_vpshrdv_v4si", IX86_BUILTIN_VPSHRDVV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v4si_mask, "__builtin_ia32_vpshrdv_v4si_mask", IX86_BUILTIN_VPSHRDVV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v4si_maskz, "__builtin_ia32_vpshrdv_v4si_maskz", IX86_BUILTIN_VPSHRDVV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v8di, "__builtin_ia32_vpshrdv_v8di", IX86_BUILTIN_VPSHRDVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v8di_mask, "__builtin_ia32_vpshrdv_v8di_mask", IX86_BUILTIN_VPSHRDVV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v8di_maskz, "__builtin_ia32_vpshrdv_v8di_maskz", IX86_BUILTIN_VPSHRDVV8DI_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v8di, "__builtin_ia32_vpshrdv_v8di", IX86_BUILTIN_VPSHRDVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v8di_mask, "__builtin_ia32_vpshrdv_v8di_mask", IX86_BUILTIN_VPSHRDVV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v8di_maskz, "__builtin_ia32_vpshrdv_v8di_maskz", IX86_BUILTIN_VPSHRDVV8DI_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v4di, "__builtin_ia32_vpshrdv_v4di", IX86_BUILTIN_VPSHRDVV4DI, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v4di_mask, "__builtin_ia32_vpshrdv_v4di_mask", IX86_BUILTIN_VPSHRDVV4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v4di_maskz, "__builtin_ia32_vpshrdv_v4di_maskz", IX86_BUILTIN_VPSHRDVV4DI_MASKZ, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
@@ -2652,27 +2652,27 @@ BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshr
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v2di_mask, "__builtin_ia32_vpshrdv_v2di_mask", IX86_BUILTIN_VPSHRDVV2DI_MASK, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v2di_maskz, "__builtin_ia32_vpshrdv_v2di_maskz", IX86_BUILTIN_VPSHRDVV2DI_MASKZ, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI_UQI)
 
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v32hi, "__builtin_ia32_vpshldv_v32hi", IX86_BUILTIN_VPSHLDVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v32hi_mask, "__builtin_ia32_vpshldv_v32hi_mask", IX86_BUILTIN_VPSHLDVV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v32hi_maskz, "__builtin_ia32_vpshldv_v32hi_maskz", IX86_BUILTIN_VPSHLDVV32HI_MASKZ, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v32hi, "__builtin_ia32_vpshldv_v32hi", IX86_BUILTIN_VPSHLDVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v32hi_mask, "__builtin_ia32_vpshldv_v32hi_mask", IX86_BUILTIN_VPSHLDVV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v32hi_maskz, "__builtin_ia32_vpshldv_v32hi_maskz", IX86_BUILTIN_VPSHLDVV32HI_MASKZ, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v16hi, "__builtin_ia32_vpshldv_v16hi", IX86_BUILTIN_VPSHLDVV16HI, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_V16HI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v16hi_mask, "__builtin_ia32_vpshldv_v16hi_mask", IX86_BUILTIN_VPSHLDVV16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_V16HI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v16hi_maskz, "__builtin_ia32_vpshldv_v16hi_maskz", IX86_BUILTIN_VPSHLDVV16HI_MASKZ, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_V16HI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v8hi, "__builtin_ia32_vpshldv_v8hi", IX86_BUILTIN_VPSHLDVV8HI, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_V8HI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v8hi_mask, "__builtin_ia32_vpshldv_v8hi_mask", IX86_BUILTIN_VPSHLDVV8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_V8HI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v8hi_maskz, "__builtin_ia32_vpshldv_v8hi_maskz", IX86_BUILTIN_VPSHLDVV8HI_MASKZ, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_V8HI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v16si, "__builtin_ia32_vpshldv_v16si", IX86_BUILTIN_VPSHLDVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v16si_mask, "__builtin_ia32_vpshldv_v16si_mask", IX86_BUILTIN_VPSHLDVV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v16si_maskz, "__builtin_ia32_vpshldv_v16si_maskz", IX86_BUILTIN_VPSHLDVV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v16si, "__builtin_ia32_vpshldv_v16si", IX86_BUILTIN_VPSHLDVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v16si_mask, "__builtin_ia32_vpshldv_v16si_mask", IX86_BUILTIN_VPSHLDVV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v16si_maskz, "__builtin_ia32_vpshldv_v16si_maskz", IX86_BUILTIN_VPSHLDVV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v8si, "__builtin_ia32_vpshldv_v8si", IX86_BUILTIN_VPSHLDVV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v8si_mask, "__builtin_ia32_vpshldv_v8si_mask", IX86_BUILTIN_VPSHLDVV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v8si_maskz, "__builtin_ia32_vpshldv_v8si_maskz", IX86_BUILTIN_VPSHLDVV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v4si, "__builtin_ia32_vpshldv_v4si", IX86_BUILTIN_VPSHLDVV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v4si_mask, "__builtin_ia32_vpshldv_v4si_mask", IX86_BUILTIN_VPSHLDVV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v4si_maskz, "__builtin_ia32_vpshldv_v4si_maskz", IX86_BUILTIN_VPSHLDVV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v8di, "__builtin_ia32_vpshldv_v8di", IX86_BUILTIN_VPSHLDVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v8di_mask, "__builtin_ia32_vpshldv_v8di_mask", IX86_BUILTIN_VPSHLDVV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v8di_maskz, "__builtin_ia32_vpshldv_v8di_maskz", IX86_BUILTIN_VPSHLDVV8DI_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v8di, "__builtin_ia32_vpshldv_v8di", IX86_BUILTIN_VPSHLDVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v8di_mask, "__builtin_ia32_vpshldv_v8di_mask", IX86_BUILTIN_VPSHLDVV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v8di_maskz, "__builtin_ia32_vpshldv_v8di_maskz", IX86_BUILTIN_VPSHLDVV8DI_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v4di, "__builtin_ia32_vpshldv_v4di", IX86_BUILTIN_VPSHLDVV4DI, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v4di_mask, "__builtin_ia32_vpshldv_v4di_mask", IX86_BUILTIN_VPSHLDVV4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v4di_maskz, "__builtin_ia32_vpshldv_v4di_maskz", IX86_BUILTIN_VPSHLDVV4DI_MASKZ, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
@@ -2681,20 +2681,20 @@ BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshl
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v2di_maskz, "__builtin_ia32_vpshldv_v2di_maskz", IX86_BUILTIN_VPSHLDVV2DI_MASKZ, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI_UQI)
 
 /* GFNI */
-BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_vgf2p8affineinvqb_v64qi, "__builtin_ia32_vgf2p8affineinvqb_v64qi", IX86_BUILTIN_VGF2P8AFFINEINVQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT)
-BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_vgf2p8affineinvqb_v64qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v64qi_mask", IX86_BUILTIN_VGF2P8AFFINEINVQB512MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vgf2p8affineinvqb_v64qi, "__builtin_ia32_vgf2p8affineinvqb_v64qi", IX86_BUILTIN_VGF2P8AFFINEINVQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vgf2p8affineinvqb_v64qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v64qi_mask", IX86_BUILTIN_VGF2P8AFFINEINVQB512MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT_V64QI_UDI)
 BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX, 0, CODE_FOR_vgf2p8affineinvqb_v32qi, "__builtin_ia32_vgf2p8affineinvqb_v32qi", IX86_BUILTIN_VGF2P8AFFINEINVQB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_INT)
 BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512VL | OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_vgf2p8affineinvqb_v32qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v32qi_mask", IX86_BUILTIN_VGF2P8AFFINEINVQB256MASK, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_INT_V32QI_USI)
 BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_vgf2p8affineinvqb_v16qi, "__builtin_ia32_vgf2p8affineinvqb_v16qi", IX86_BUILTIN_VGF2P8AFFINEINVQB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_INT)
 BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vgf2p8affineinvqb_v16qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v16qi_mask", IX86_BUILTIN_VGF2P8AFFINEINVQB128MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_INT_V16QI_UHI)
-BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_vgf2p8affineqb_v64qi, "__builtin_ia32_vgf2p8affineqb_v64qi", IX86_BUILTIN_VGF2P8AFFINEQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT)
-BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_vgf2p8affineqb_v64qi_mask, "__builtin_ia32_vgf2p8affineqb_v64qi_mask", IX86_BUILTIN_VGF2P8AFFINEQB512MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vgf2p8affineqb_v64qi, "__builtin_ia32_vgf2p8affineqb_v64qi", IX86_BUILTIN_VGF2P8AFFINEQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vgf2p8affineqb_v64qi_mask, "__builtin_ia32_vgf2p8affineqb_v64qi_mask", IX86_BUILTIN_VGF2P8AFFINEQB512MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT_V64QI_UDI)
 BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX, 0, CODE_FOR_vgf2p8affineqb_v32qi, "__builtin_ia32_vgf2p8affineqb_v32qi", IX86_BUILTIN_VGF2P8AFFINEQB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_INT)
 BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512VL | OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_vgf2p8affineqb_v32qi_mask, "__builtin_ia32_vgf2p8affineqb_v32qi_mask", IX86_BUILTIN_VGF2P8AFFINEQB256MASK, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_INT_V32QI_USI)
 BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_vgf2p8affineqb_v16qi, "__builtin_ia32_vgf2p8affineqb_v16qi", IX86_BUILTIN_VGF2P8AFFINEQB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_INT)
 BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vgf2p8affineqb_v16qi_mask, "__builtin_ia32_vgf2p8affineqb_v16qi_mask", IX86_BUILTIN_VGF2P8AFFINEQB128MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_INT_V16QI_UHI)
-BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_vgf2p8mulb_v64qi, "__builtin_ia32_vgf2p8mulb_v64qi", IX86_BUILTIN_VGF2P8MULB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI)
-BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_vgf2p8mulb_v64qi_mask, "__builtin_ia32_vgf2p8mulb_v64qi_mask", IX86_BUILTIN_VGF2P8MULB512MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vgf2p8mulb_v64qi, "__builtin_ia32_vgf2p8mulb_v64qi", IX86_BUILTIN_VGF2P8MULB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vgf2p8mulb_v64qi_mask, "__builtin_ia32_vgf2p8mulb_v64qi_mask", IX86_BUILTIN_VGF2P8MULB512MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
 BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX, 0, CODE_FOR_vgf2p8mulb_v32qi, "__builtin_ia32_vgf2p8mulb_v32qi", IX86_BUILTIN_VGF2P8MULB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI)
 BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512VL | OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_vgf2p8mulb_v32qi_mask, "__builtin_ia32_vgf2p8mulb_v32qi_mask", IX86_BUILTIN_VGF2P8MULB256MASK, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI_USI)
 BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_vgf2p8mulb_v16qi, "__builtin_ia32_vgf2p8mulb_v16qi", IX86_BUILTIN_VGF2P8MULB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI)
@@ -2702,9 +2702,9 @@ BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vgf2p8mulb_v
 
 /* AVX512_VNNI */
 
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpbusd_v16si, "__builtin_ia32_vpdpbusd_v16si", IX86_BUILTIN_VPDPBUSDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpbusd_v16si_mask, "__builtin_ia32_vpdpbusd_v16si_mask", IX86_BUILTIN_VPDPBUSDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpbusd_v16si_maskz, "__builtin_ia32_vpdpbusd_v16si_maskz", IX86_BUILTIN_VPDPBUSDV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpbusd_v16si, "__builtin_ia32_vpdpbusd_v16si", IX86_BUILTIN_VPDPBUSDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpbusd_v16si_mask, "__builtin_ia32_vpdpbusd_v16si_mask", IX86_BUILTIN_VPDPBUSDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpbusd_v16si_maskz, "__builtin_ia32_vpdpbusd_v16si_maskz", IX86_BUILTIN_VPDPBUSDV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVXVNNI, CODE_FOR_vpdpbusd_v8si, "__builtin_ia32_vpdpbusd_v8si", IX86_BUILTIN_VPDPBUSDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpbusd_v8si_mask, "__builtin_ia32_vpdpbusd_v8si_mask", IX86_BUILTIN_VPDPBUSDV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpbusd_v8si_maskz, "__builtin_ia32_vpdpbusd_v8si_maskz", IX86_BUILTIN_VPDPBUSDV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
@@ -2712,9 +2712,9 @@ BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_A
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpbusd_v4si_mask, "__builtin_ia32_vpdpbusd_v4si_mask", IX86_BUILTIN_VPDPBUSDV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpbusd_v4si_maskz, "__builtin_ia32_vpdpbusd_v4si_maskz", IX86_BUILTIN_VPDPBUSDV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
 
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpbusds_v16si, "__builtin_ia32_vpdpbusds_v16si", IX86_BUILTIN_VPDPBUSDSV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpbusds_v16si_mask, "__builtin_ia32_vpdpbusds_v16si_mask", IX86_BUILTIN_VPDPBUSDSV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpbusds_v16si_maskz, "__builtin_ia32_vpdpbusds_v16si_maskz", IX86_BUILTIN_VPDPBUSDSV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpbusds_v16si, "__builtin_ia32_vpdpbusds_v16si", IX86_BUILTIN_VPDPBUSDSV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpbusds_v16si_mask, "__builtin_ia32_vpdpbusds_v16si_mask", IX86_BUILTIN_VPDPBUSDSV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpbusds_v16si_maskz, "__builtin_ia32_vpdpbusds_v16si_maskz", IX86_BUILTIN_VPDPBUSDSV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVXVNNI, CODE_FOR_vpdpbusds_v8si, "__builtin_ia32_vpdpbusds_v8si", IX86_BUILTIN_VPDPBUSDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpbusds_v8si_mask, "__builtin_ia32_vpdpbusds_v8si_mask", IX86_BUILTIN_VPDPBUSDSV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpbusds_v8si_maskz, "__builtin_ia32_vpdpbusds_v8si_maskz", IX86_BUILTIN_VPDPBUSDSV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
@@ -2722,9 +2722,9 @@ BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_A
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpbusds_v4si_mask, "__builtin_ia32_vpdpbusds_v4si_mask", IX86_BUILTIN_VPDPBUSDSV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpbusds_v4si_maskz, "__builtin_ia32_vpdpbusds_v4si_maskz", IX86_BUILTIN_VPDPBUSDSV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
 
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpwssd_v16si, "__builtin_ia32_vpdpwssd_v16si", IX86_BUILTIN_VPDPWSSDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpwssd_v16si_mask, "__builtin_ia32_vpdpwssd_v16si_mask", IX86_BUILTIN_VPDPWSSDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpwssd_v16si_maskz, "__builtin_ia32_vpdpwssd_v16si_maskz", IX86_BUILTIN_VPDPWSSDV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpwssd_v16si, "__builtin_ia32_vpdpwssd_v16si", IX86_BUILTIN_VPDPWSSDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpwssd_v16si_mask, "__builtin_ia32_vpdpwssd_v16si_mask", IX86_BUILTIN_VPDPWSSDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpwssd_v16si_maskz, "__builtin_ia32_vpdpwssd_v16si_maskz", IX86_BUILTIN_VPDPWSSDV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVXVNNI, CODE_FOR_vpdpwssd_v8si, "__builtin_ia32_vpdpwssd_v8si", IX86_BUILTIN_VPDPWSSDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpwssd_v8si_mask, "__builtin_ia32_vpdpwssd_v8si_mask", IX86_BUILTIN_VPDPWSSDV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpwssd_v8si_maskz, "__builtin_ia32_vpdpwssd_v8si_maskz", IX86_BUILTIN_VPDPWSSDV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
@@ -2732,9 +2732,9 @@ BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_A
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpwssd_v4si_mask, "__builtin_ia32_vpdpwssd_v4si_mask", IX86_BUILTIN_VPDPWSSDV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpwssd_v4si_maskz, "__builtin_ia32_vpdpwssd_v4si_maskz", IX86_BUILTIN_VPDPWSSDV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
 
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpwssds_v16si, "__builtin_ia32_vpdpwssds_v16si", IX86_BUILTIN_VPDPWSSDSV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpwssds_v16si_mask, "__builtin_ia32_vpdpwssds_v16si_mask", IX86_BUILTIN_VPDPWSSDSV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpwssds_v16si_maskz, "__builtin_ia32_vpdpwssds_v16si_maskz", IX86_BUILTIN_VPDPWSSDSV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpwssds_v16si, "__builtin_ia32_vpdpwssds_v16si", IX86_BUILTIN_VPDPWSSDSV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpwssds_v16si_mask, "__builtin_ia32_vpdpwssds_v16si_mask", IX86_BUILTIN_VPDPWSSDSV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpwssds_v16si_maskz, "__builtin_ia32_vpdpwssds_v16si_maskz", IX86_BUILTIN_VPDPWSSDSV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVXVNNI, CODE_FOR_vpdpwssds_v8si, "__builtin_ia32_vpdpwssds_v8si", IX86_BUILTIN_VPDPWSSDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpwssds_v8si_mask, "__builtin_ia32_vpdpwssds_v8si_mask", IX86_BUILTIN_VPDPWSSDSV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpwssds_v8si_maskz, "__builtin_ia32_vpdpwssds_v8si_maskz", IX86_BUILTIN_VPDPWSSDSV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
@@ -2773,13 +2773,13 @@ BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwuuds_v4si, "__builtin_ia3
 /* VPCLMULQDQ */
 BDESC (OPTION_MASK_ISA_VPCLMULQDQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpclmulqdq_v2di, "__builtin_ia32_vpclmulqdq_v2di", IX86_BUILTIN_VPCLMULQDQ2, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_INT)
 BDESC (OPTION_MASK_ISA_VPCLMULQDQ | OPTION_MASK_ISA_AVX, 0, CODE_FOR_vpclmulqdq_v4di, "__builtin_ia32_vpclmulqdq_v4di", IX86_BUILTIN_VPCLMULQDQ4, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_INT)
-BDESC (OPTION_MASK_ISA_VPCLMULQDQ | OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_vpclmulqdq_v8di, "__builtin_ia32_vpclmulqdq_v8di", IX86_BUILTIN_VPCLMULQDQ8, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT)
+BDESC (OPTION_MASK_ISA_VPCLMULQDQ | OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpclmulqdq_v8di, "__builtin_ia32_vpclmulqdq_v8di", IX86_BUILTIN_VPCLMULQDQ8, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT)
 
 /* VPOPCNTDQ */
-BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, 0, CODE_FOR_vpopcountv16si, "__builtin_ia32_vpopcountd_v16si", IX86_BUILTIN_VPOPCOUNTDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI)
-BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, 0, CODE_FOR_vpopcountv16si_mask, "__builtin_ia32_vpopcountd_v16si_mask", IX86_BUILTIN_VPOPCOUNTDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, 0, CODE_FOR_vpopcountv8di, "__builtin_ia32_vpopcountq_v8di", IX86_BUILTIN_VPOPCOUNTQV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI)
-BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, 0, CODE_FOR_vpopcountv8di_mask, "__builtin_ia32_vpopcountq_v8di_mask", IX86_BUILTIN_VPOPCOUNTQV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpopcountv16si, "__builtin_ia32_vpopcountd_v16si", IX86_BUILTIN_VPOPCOUNTDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI)
+BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpopcountv16si_mask, "__builtin_ia32_vpopcountd_v16si_mask", IX86_BUILTIN_VPOPCOUNTDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpopcountv8di, "__builtin_ia32_vpopcountq_v8di", IX86_BUILTIN_VPOPCOUNTQV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI)
+BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpopcountv8di_mask, "__builtin_ia32_vpopcountq_v8di_mask", IX86_BUILTIN_VPOPCOUNTQV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
 
 BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv4di, "__builtin_ia32_vpopcountq_v4di", IX86_BUILTIN_VPOPCOUNTQV4DI, UNKNOWN, (int) V4DI_FTYPE_V4DI)
 BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv4di_mask, "__builtin_ia32_vpopcountq_v4di_mask", IX86_BUILTIN_VPOPCOUNTQV4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_UQI)
@@ -2791,21 +2791,21 @@ BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_v
 BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv8si_mask, "__builtin_ia32_vpopcountd_v8si_mask", IX86_BUILTIN_VPOPCOUNTDV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_UHI)
 
 /* BITALG */
-BDESC (OPTION_MASK_ISA_AVX512BITALG, 0, CODE_FOR_vpopcountv64qi, "__builtin_ia32_vpopcountb_v64qi", IX86_BUILTIN_VPOPCOUNTBV64QI, UNKNOWN, (int) V64QI_FTYPE_V64QI)
-BDESC (OPTION_MASK_ISA_AVX512BITALG, 0, CODE_FOR_vpopcountv64qi_mask, "__builtin_ia32_vpopcountb_v64qi_mask", IX86_BUILTIN_VPOPCOUNTBV64QI_MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BITALG, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpopcountv64qi, "__builtin_ia32_vpopcountb_v64qi", IX86_BUILTIN_VPOPCOUNTBV64QI, UNKNOWN, (int) V64QI_FTYPE_V64QI)
+BDESC (OPTION_MASK_ISA_AVX512BITALG, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpopcountv64qi_mask, "__builtin_ia32_vpopcountb_v64qi_mask", IX86_BUILTIN_VPOPCOUNTBV64QI_MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
 BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv32qi, "__builtin_ia32_vpopcountb_v32qi", IX86_BUILTIN_VPOPCOUNTBV32QI, UNKNOWN, (int) V32QI_FTYPE_V32QI)
 BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv32qi_mask, "__builtin_ia32_vpopcountb_v32qi_mask", IX86_BUILTIN_VPOPCOUNTBV32QI_MASK, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_USI)
 BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv16qi, "__builtin_ia32_vpopcountb_v16qi", IX86_BUILTIN_VPOPCOUNTBV16QI, UNKNOWN, (int) V16QI_FTYPE_V16QI)
 BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv16qi_mask, "__builtin_ia32_vpopcountb_v16qi_mask", IX86_BUILTIN_VPOPCOUNTBV16QI_MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_UHI)
 
-BDESC (OPTION_MASK_ISA_AVX512BITALG, 0, CODE_FOR_vpopcountv32hi, "__builtin_ia32_vpopcountw_v32hi", IX86_BUILTIN_VPOPCOUNTWV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI)
-BDESC (OPTION_MASK_ISA_AVX512BITALG, 0, CODE_FOR_vpopcountv32hi_mask, "__builtin_ia32_vpopcountw_v32hi_mask", IX86_BUILTIN_VPOPCOUNTQV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BITALG, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpopcountv32hi, "__builtin_ia32_vpopcountw_v32hi", IX86_BUILTIN_VPOPCOUNTWV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI)
+BDESC (OPTION_MASK_ISA_AVX512BITALG, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpopcountv32hi_mask, "__builtin_ia32_vpopcountw_v32hi_mask", IX86_BUILTIN_VPOPCOUNTQV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
 BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv16hi, "__builtin_ia32_vpopcountw_v16hi", IX86_BUILTIN_VPOPCOUNTWV16HI, UNKNOWN, (int) V16HI_FTYPE_V16HI)
 BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv16hi_mask, "__builtin_ia32_vpopcountw_v16hi_mask", IX86_BUILTIN_VPOPCOUNTQV16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv8hi, "__builtin_ia32_vpopcountw_v8hi", IX86_BUILTIN_VPOPCOUNTWV8HI, UNKNOWN, (int) V8HI_FTYPE_V8HI)
 BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv8hi_mask, "__builtin_ia32_vpopcountw_v8hi_mask", IX86_BUILTIN_VPOPCOUNTQV8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_UQI)
 
-BDESC (OPTION_MASK_ISA_AVX512BITALG, 0, CODE_FOR_avx512vl_vpshufbitqmbv64qi_mask, "__builtin_ia32_vpshufbitqmb512_mask", IX86_BUILTIN_VPSHUFBITQMB512_MASK, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BITALG, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512vl_vpshufbitqmbv64qi_mask, "__builtin_ia32_vpshufbitqmb512_mask", IX86_BUILTIN_VPSHUFBITQMB512_MASK, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI)
 BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vpshufbitqmbv32qi_mask, "__builtin_ia32_vpshufbitqmb256_mask", IX86_BUILTIN_VPSHUFBITQMB256_MASK, UNKNOWN, (int) USI_FTYPE_V32QI_V32QI_USI)
 BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vpshufbitqmbv16qi_mask, "__builtin_ia32_vpshufbitqmb128_mask", IX86_BUILTIN_VPSHUFBITQMB128_MASK, UNKNOWN, (int) UHI_FTYPE_V16QI_V16QI_UHI)
 
@@ -2829,39 +2829,39 @@ BDESC (0, OPTION_MASK_ISA2_RDPID, CODE_FOR_rdpid, "__builtin_ia32_rdpid", IX86_B
 /* VAES.  */
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesdec_v16qi, "__builtin_ia32_vaesdec_v16qi", IX86_BUILTIN_VAESDEC16, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI)
 BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesdec_v32qi, "__builtin_ia32_vaesdec_v32qi", IX86_BUILTIN_VAESDEC32, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI)
-BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesdec_v64qi, "__builtin_ia32_vaesdec_v64qi", IX86_BUILTIN_VAESDEC64, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI)
+BDESC (0, OPTION_MASK_ISA2_VAES | OPTION_MASK_ISA2_EVEX512, CODE_FOR_vaesdec_v64qi, "__builtin_ia32_vaesdec_v64qi", IX86_BUILTIN_VAESDEC64, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesdeclast_v16qi, "__builtin_ia32_vaesdeclast_v16qi", IX86_BUILTIN_VAESDECLAST16, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI)
 BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesdeclast_v32qi, "__builtin_ia32_vaesdeclast_v32qi", IX86_BUILTIN_VAESDECLAST32, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI)
-BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesdeclast_v64qi, "__builtin_ia32_vaesdeclast_v64qi", IX86_BUILTIN_VAESDECLAST64, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI)
+BDESC (0, OPTION_MASK_ISA2_VAES | OPTION_MASK_ISA2_EVEX512, CODE_FOR_vaesdeclast_v64qi, "__builtin_ia32_vaesdeclast_v64qi", IX86_BUILTIN_VAESDECLAST64, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesenc_v16qi, "__builtin_ia32_vaesenc_v16qi", IX86_BUILTIN_VAESENC16, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI)
 BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesenc_v32qi, "__builtin_ia32_vaesenc_v32qi", IX86_BUILTIN_VAESENC32, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI)
-BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesenc_v64qi, "__builtin_ia32_vaesenc_v64qi", IX86_BUILTIN_VAESENC64, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI)
+BDESC (0, OPTION_MASK_ISA2_VAES | OPTION_MASK_ISA2_EVEX512, CODE_FOR_vaesenc_v64qi, "__builtin_ia32_vaesenc_v64qi", IX86_BUILTIN_VAESENC64, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesenclast_v16qi, "__builtin_ia32_vaesenclast_v16qi", IX86_BUILTIN_VAESENCLAST16, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI)
 BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesenclast_v32qi, "__builtin_ia32_vaesenclast_v32qi", IX86_BUILTIN_VAESENCLAST32, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI)
-BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesenclast_v64qi, "__builtin_ia32_vaesenclast_v64qi", IX86_BUILTIN_VAESENCLAST64, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI)
+BDESC (0, OPTION_MASK_ISA2_VAES | OPTION_MASK_ISA2_EVEX512, CODE_FOR_vaesenclast_v64qi, "__builtin_ia32_vaesenclast_v64qi", IX86_BUILTIN_VAESENCLAST64, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI)
 
 /* BF16 */
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v32bf, "__builtin_ia32_cvtne2ps2bf16_v32bf", IX86_BUILTIN_CVTNE2PS2BF16_V32BF, UNKNOWN, (int) V32BF_FTYPE_V16SF_V16SF)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v32bf_mask, "__builtin_ia32_cvtne2ps2bf16_v32bf_mask", IX86_BUILTIN_CVTNE2PS2BF16_V32BF_MASK, UNKNOWN, (int) V32BF_FTYPE_V16SF_V16SF_V32BF_USI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v32bf_maskz, "__builtin_ia32_cvtne2ps2bf16_v32bf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V32BF_MASKZ, UNKNOWN, (int) V32BF_FTYPE_V16SF_V16SF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtne2ps2bf16_v32bf, "__builtin_ia32_cvtne2ps2bf16_v32bf", IX86_BUILTIN_CVTNE2PS2BF16_V32BF, UNKNOWN, (int) V32BF_FTYPE_V16SF_V16SF)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtne2ps2bf16_v32bf_mask, "__builtin_ia32_cvtne2ps2bf16_v32bf_mask", IX86_BUILTIN_CVTNE2PS2BF16_V32BF_MASK, UNKNOWN, (int) V32BF_FTYPE_V16SF_V16SF_V32BF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtne2ps2bf16_v32bf_maskz, "__builtin_ia32_cvtne2ps2bf16_v32bf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V32BF_MASKZ, UNKNOWN, (int) V32BF_FTYPE_V16SF_V16SF_USI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v16bf, "__builtin_ia32_cvtne2ps2bf16_v16bf", IX86_BUILTIN_CVTNE2PS2BF16_V16BF, UNKNOWN, (int) V16BF_FTYPE_V8SF_V8SF)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v16bf_mask, "__builtin_ia32_cvtne2ps2bf16_v16bf_mask", IX86_BUILTIN_CVTNE2PS2BF16_V16BF_MASK, UNKNOWN, (int) V16BF_FTYPE_V8SF_V8SF_V16BF_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v16bf_maskz, "__builtin_ia32_cvtne2ps2bf16_v16bf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V16BF_MASKZ, UNKNOWN, (int) V16BF_FTYPE_V8SF_V8SF_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v8bf, "__builtin_ia32_cvtne2ps2bf16_v8bf", IX86_BUILTIN_CVTNE2PS2BF16_V8BF, UNKNOWN, (int) V8BF_FTYPE_V4SF_V4SF)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v8bf_mask, "__builtin_ia32_cvtne2ps2bf16_v8bf_mask", IX86_BUILTIN_CVTNE2PS2BF16_V8BF_MASK, UNKNOWN, (int) V8BF_FTYPE_V4SF_V4SF_V8BF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v8bf_maskz, "__builtin_ia32_cvtne2ps2bf16_v8bf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V8BF_MASKZ, UNKNOWN, (int) V8BF_FTYPE_V4SF_V4SF_UQI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v16sf, "__builtin_ia32_cvtneps2bf16_v16sf", IX86_BUILTIN_CVTNEPS2BF16_V16SF, UNKNOWN, (int) V16BF_FTYPE_V16SF)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v16sf_mask, "__builtin_ia32_cvtneps2bf16_v16sf_mask", IX86_BUILTIN_CVTNEPS2BF16_V16SF_MASK, UNKNOWN, (int) V16BF_FTYPE_V16SF_V16BF_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v16sf_maskz, "__builtin_ia32_cvtneps2bf16_v16sf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V16SF_MASKZ, UNKNOWN, (int) V16BF_FTYPE_V16SF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtneps2bf16_v16sf, "__builtin_ia32_cvtneps2bf16_v16sf", IX86_BUILTIN_CVTNEPS2BF16_V16SF, UNKNOWN, (int) V16BF_FTYPE_V16SF)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtneps2bf16_v16sf_mask, "__builtin_ia32_cvtneps2bf16_v16sf_mask", IX86_BUILTIN_CVTNEPS2BF16_V16SF_MASK, UNKNOWN, (int) V16BF_FTYPE_V16SF_V16BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtneps2bf16_v16sf_maskz, "__builtin_ia32_cvtneps2bf16_v16sf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V16SF_MASKZ, UNKNOWN, (int) V16BF_FTYPE_V16SF_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVXNECONVERT | OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_vcvtneps2bf16_v8sf, "__builtin_ia32_cvtneps2bf16_v8sf", IX86_BUILTIN_CVTNEPS2BF16_V8SF, UNKNOWN, (int) V8BF_FTYPE_V8SF)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v8sf_mask, "__builtin_ia32_cvtneps2bf16_v8sf_mask", IX86_BUILTIN_CVTNEPS2BF16_V8SF_MASK, UNKNOWN, (int) V8BF_FTYPE_V8SF_V8BF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v8sf_maskz, "__builtin_ia32_cvtneps2bf16_v8sf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V8SF_MASKZ, UNKNOWN, (int) V8BF_FTYPE_V8SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVXNECONVERT | OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_vcvtneps2bf16_v4sf, "__builtin_ia32_cvtneps2bf16_v4sf", IX86_BUILTIN_CVTNEPS2BF16_V4SF, UNKNOWN, (int) V8BF_FTYPE_V4SF)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v4sf_mask, "__builtin_ia32_cvtneps2bf16_v4sf_mask", IX86_BUILTIN_CVTNEPS2BF16_V4SF_MASK, UNKNOWN, (int) V8BF_FTYPE_V4SF_V8BF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v4sf_maskz, "__builtin_ia32_cvtneps2bf16_v4sf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V4SF_MASKZ, UNKNOWN, (int) V8BF_FTYPE_V4SF_UQI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v16sf, "__builtin_ia32_dpbf16ps_v16sf", IX86_BUILTIN_DPBF16PS_V16SF, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32BF_V32BF)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v16sf_mask, "__builtin_ia32_dpbf16ps_v16sf_mask", IX86_BUILTIN_DPBF16PS_V16SF_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32BF_V32BF_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v16sf_maskz, "__builtin_ia32_dpbf16ps_v16sf_maskz", IX86_BUILTIN_DPBF16PS_V16SF_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32BF_V32BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_dpbf16ps_v16sf, "__builtin_ia32_dpbf16ps_v16sf", IX86_BUILTIN_DPBF16PS_V16SF, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32BF_V32BF)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_dpbf16ps_v16sf_mask, "__builtin_ia32_dpbf16ps_v16sf_mask", IX86_BUILTIN_DPBF16PS_V16SF_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32BF_V32BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_dpbf16ps_v16sf_maskz, "__builtin_ia32_dpbf16ps_v16sf_maskz", IX86_BUILTIN_DPBF16PS_V16SF_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32BF_V32BF_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v8sf, "__builtin_ia32_dpbf16ps_v8sf", IX86_BUILTIN_DPBF16PS_V8SF, UNKNOWN, (int) V8SF_FTYPE_V8SF_V16BF_V16BF)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v8sf_mask, "__builtin_ia32_dpbf16ps_v8sf_mask", IX86_BUILTIN_DPBF16PS_V8SF_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V16BF_V16BF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v8sf_maskz, "__builtin_ia32_dpbf16ps_v8sf_maskz", IX86_BUILTIN_DPBF16PS_V8SF_MASKZ, UNKNOWN, (int) V8SF_FTYPE_V8SF_V16BF_V16BF_UQI)
-- 
2.31.1



* [PATCH 11/18] [PATCH 5/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
  2023-09-21  7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
                   ` (9 preceding siblings ...)
  2023-09-21  7:20 ` [PATCH 10/18] [PATCH 4/5] " Hu, Lin1
@ 2023-09-21  7:20 ` Hu, Lin1
  2023-09-21  7:20 ` [PATCH 12/18] Disable zmm register and 512 bit libmvec call when !TARGET_EVEX512 Hu, Lin1
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21  7:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang

From: Haochen Jiang <haochen.jiang@intel.com>

gcc/ChangeLog:

	* config/i386/i386-builtin.def (BDESC): Add
	OPTION_MASK_ISA2_EVEX512.
---
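
For reference, a minimal user-level sketch (not part of this patch) of
what the added OPTION_MASK_ISA2_EVEX512 gating means: the 512 bit
builtins behind intrinsics such as _mm512_add_ph are now only usable
when evex512 is enabled together with the base ISA, e.g.:

  #include <immintrin.h>

  /* OK: evex512 enables 512 bit registers, so the
     __builtin_ia32_addph512_mask builtin behind _mm512_add_ph is
     available.  */
  __attribute__ ((target ("avx512fp16,evex512")))
  __m512h
  add_ph (__m512h a, __m512h b)
  {
    return _mm512_add_ph (a, b);
  }

  /* With target ("avx512fp16,no-evex512") the same body would be
     rejected, since the builtin now also requires
     OPTION_MASK_ISA2_EVEX512.  */
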
 gcc/config/i386/i386-builtin.def | 156 +++++++++++++++----------------
 1 file changed, 78 insertions(+), 78 deletions(-)

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 8250e2998cd..b90d5ccc969 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1568,9 +1568,9 @@ BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_copysignv8df3
 BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sqrtv8df2, "__builtin_ia32_sqrtpd512", IX86_BUILTIN_SQRTPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF)
 BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_sqrtv16sf2, "__builtin_ia32_sqrtps512", IX86_BUILTIN_SQRTPS_NR512, UNKNOWN, (int) V16SF_FTYPE_V16SF)
 BDESC (OPTION_MASK_ISA_AVX512ER, 0, CODE_FOR_avx512er_exp2v16sf, "__builtin_ia32_exp2ps", IX86_BUILTIN_EXP2PS, UNKNOWN, (int) V16SF_FTYPE_V16SF)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_floorph512", IX86_BUILTIN_FLOORPH512, (enum rtx_code) ROUND_FLOOR, (int) V32HF_FTYPE_V32HF_ROUND)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_ceilph512", IX86_BUILTIN_CEILPH512, (enum rtx_code) ROUND_CEIL, (int) V32HF_FTYPE_V32HF_ROUND)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_truncph512", IX86_BUILTIN_TRUNCPH512, (enum rtx_code) ROUND_TRUNC, (int) V32HF_FTYPE_V32HF_ROUND)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_floorph512", IX86_BUILTIN_FLOORPH512, (enum rtx_code) ROUND_FLOOR, (int) V32HF_FTYPE_V32HF_ROUND)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_ceilph512", IX86_BUILTIN_CEILPH512, (enum rtx_code) ROUND_CEIL, (int) V32HF_FTYPE_V32HF_ROUND)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_truncph512", IX86_BUILTIN_TRUNCPH512, (enum rtx_code) ROUND_TRUNC, (int) V32HF_FTYPE_V32HF_ROUND)
 BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundps512, "__builtin_ia32_floorps512", IX86_BUILTIN_FLOORPS512, (enum rtx_code) ROUND_FLOOR, (int) V16SF_FTYPE_V16SF_ROUND)
 BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundps512, "__builtin_ia32_ceilps512", IX86_BUILTIN_CEILPS512, (enum rtx_code) ROUND_CEIL, (int) V16SF_FTYPE_V16SF_ROUND)
 BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundps512, "__builtin_ia32_truncps512", IX86_BUILTIN_TRUNCPS512, (enum rtx_code) ROUND_TRUNC, (int) V16SF_FTYPE_V16SF_ROUND)
@@ -2874,40 +2874,40 @@ BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_extendbfsf2_1, "__builtin_ia32_cvtbf2sf
 /* AVX512FP16.  */
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv8hf3_mask, "__builtin_ia32_addph128_mask", IX86_BUILTIN_ADDPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv16hf3_mask, "__builtin_ia32_addph256_mask", IX86_BUILTIN_ADDPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv32hf3_mask, "__builtin_ia32_addph512_mask", IX86_BUILTIN_ADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_addv32hf3_mask, "__builtin_ia32_addph512_mask", IX86_BUILTIN_ADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv8hf3_mask, "__builtin_ia32_subph128_mask", IX86_BUILTIN_SUBPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv16hf3_mask, "__builtin_ia32_subph256_mask", IX86_BUILTIN_SUBPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv32hf3_mask, "__builtin_ia32_subph512_mask", IX86_BUILTIN_SUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_subv32hf3_mask, "__builtin_ia32_subph512_mask", IX86_BUILTIN_SUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv8hf3_mask, "__builtin_ia32_mulph128_mask", IX86_BUILTIN_MULPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv16hf3_mask, "__builtin_ia32_mulph256_mask", IX86_BUILTIN_MULPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv32hf3_mask, "__builtin_ia32_mulph512_mask", IX86_BUILTIN_MULPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_mulv32hf3_mask, "__builtin_ia32_mulph512_mask", IX86_BUILTIN_MULPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv8hf3_mask, "__builtin_ia32_divph128_mask", IX86_BUILTIN_DIVPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv16hf3_mask, "__builtin_ia32_divph256_mask", IX86_BUILTIN_DIVPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv32hf3_mask, "__builtin_ia32_divph512_mask", IX86_BUILTIN_DIVPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_divv32hf3_mask, "__builtin_ia32_divph512_mask", IX86_BUILTIN_DIVPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmaddv8hf3_mask, "__builtin_ia32_addsh_mask", IX86_BUILTIN_ADDSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsubv8hf3_mask, "__builtin_ia32_subsh_mask", IX86_BUILTIN_SUBSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmmulv8hf3_mask, "__builtin_ia32_mulsh_mask", IX86_BUILTIN_MULSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmdivv8hf3_mask, "__builtin_ia32_divsh_mask", IX86_BUILTIN_DIVSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_smaxv8hf3_mask, "__builtin_ia32_maxph128_mask", IX86_BUILTIN_MAXPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_smaxv16hf3_mask, "__builtin_ia32_maxph256_mask", IX86_BUILTIN_MAXPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_smaxv32hf3_mask, "__builtin_ia32_maxph512_mask", IX86_BUILTIN_MAXPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_smaxv32hf3_mask, "__builtin_ia32_maxph512_mask", IX86_BUILTIN_MAXPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv8hf3_mask, "__builtin_ia32_minph128_mask", IX86_BUILTIN_MINPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv16hf3_mask, "__builtin_ia32_minph256_mask", IX86_BUILTIN_MINPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv32hf3_mask, "__builtin_ia32_minph512_mask", IX86_BUILTIN_MINPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_sminv32hf3_mask, "__builtin_ia32_minph512_mask", IX86_BUILTIN_MINPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsmaxv8hf3_mask, "__builtin_ia32_maxsh_mask", IX86_BUILTIN_MAXSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsminv8hf3_mask, "__builtin_ia32_minsh_mask", IX86_BUILTIN_MINSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_cmpv8hf3_mask, "__builtin_ia32_cmpph128_mask", IX86_BUILTIN_CMPPH128_MASK, UNKNOWN, (int) UQI_FTYPE_V8HF_V8HF_INT_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_cmpv16hf3_mask, "__builtin_ia32_cmpph256_mask", IX86_BUILTIN_CMPPH256_MASK, UNKNOWN, (int) UHI_FTYPE_V16HF_V16HF_INT_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_cmpv32hf3_mask, "__builtin_ia32_cmpph512_mask", IX86_BUILTIN_CMPPH512_MASK, UNKNOWN, (int) USI_FTYPE_V32HF_V32HF_INT_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_cmpv32hf3_mask, "__builtin_ia32_cmpph512_mask", IX86_BUILTIN_CMPPH512_MASK, UNKNOWN, (int) USI_FTYPE_V32HF_V32HF_INT_USI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_sqrtv8hf2_mask, "__builtin_ia32_sqrtph128_mask", IX86_BUILTIN_SQRTPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_sqrtv16hf2_mask, "__builtin_ia32_sqrtph256_mask", IX86_BUILTIN_SQRTPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rsqrtv8hf2_mask, "__builtin_ia32_rsqrtph128_mask", IX86_BUILTIN_RSQRTPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rsqrtv16hf2_mask, "__builtin_ia32_rsqrtph256_mask", IX86_BUILTIN_RSQRTPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rsqrtv32hf2_mask, "__builtin_ia32_rsqrtph512_mask", IX86_BUILTIN_RSQRTPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_rsqrtv32hf2_mask, "__builtin_ia32_rsqrtph512_mask", IX86_BUILTIN_RSQRTPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmrsqrtv8hf2_mask, "__builtin_ia32_rsqrtsh_mask", IX86_BUILTIN_RSQRTSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rcpv8hf2_mask, "__builtin_ia32_rcpph128_mask", IX86_BUILTIN_RCPPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rcpv16hf2_mask, "__builtin_ia32_rcpph256_mask", IX86_BUILTIN_RCPPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rcpv32hf2_mask, "__builtin_ia32_rcpph512_mask", IX86_BUILTIN_RCPPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_rcpv32hf2_mask, "__builtin_ia32_rcpph512_mask", IX86_BUILTIN_RCPPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmrcpv8hf2_mask, "__builtin_ia32_rcpsh_mask", IX86_BUILTIN_RCPSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_scalefv8hf_mask, "__builtin_ia32_scalefph128_mask", IX86_BUILTIN_SCALEFPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_scalefv16hf_mask, "__builtin_ia32_scalefph256_mask", IX86_BUILTIN_SCALEFPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
@@ -2917,7 +2917,7 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp1
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_rndscalev16hf_mask, "__builtin_ia32_rndscaleph256_mask", IX86_BUILTIN_RNDSCALEPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_INT_V16HF_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512dq_fpclassv16hf_mask, "__builtin_ia32_fpclassph256_mask", IX86_BUILTIN_FPCLASSPH256, UNKNOWN, (int) HI_FTYPE_V16HF_INT_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512dq_fpclassv8hf_mask, "__builtin_ia32_fpclassph128_mask", IX86_BUILTIN_FPCLASSPH128, UNKNOWN, (int) QI_FTYPE_V8HF_INT_UQI)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512dq_fpclassv32hf_mask, "__builtin_ia32_fpclassph512_mask", IX86_BUILTIN_FPCLASSPH512, UNKNOWN, (int) SI_FTYPE_V32HF_INT_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_fpclassv32hf_mask, "__builtin_ia32_fpclassph512_mask", IX86_BUILTIN_FPCLASSPH512, UNKNOWN, (int) SI_FTYPE_V32HF_INT_USI)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512dq_vmfpclassv8hf_mask, "__builtin_ia32_fpclasssh_mask", IX86_BUILTIN_FPCLASSSH_MASK, UNKNOWN, (int) QI_FTYPE_V8HF_INT_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_getexpv16hf_mask, "__builtin_ia32_getexpph256_mask", IX86_BUILTIN_GETEXPPH256, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_getexpv8hf_mask, "__builtin_ia32_getexpph128_mask", IX86_BUILTIN_GETEXPPH128, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_UQI)
@@ -3229,50 +3229,50 @@ BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_ran
 BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_rangepv8df_mask_round, "__builtin_ia32_rangepd512_mask", IX86_BUILTIN_RANGEPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_QI_INT)
 
 /* AVX512FP16.  */
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv32hf3_mask_round, "__builtin_ia32_addph512_mask_round", IX86_BUILTIN_ADDPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv32hf3_mask_round, "__builtin_ia32_subph512_mask_round", IX86_BUILTIN_SUBPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv32hf3_mask_round, "__builtin_ia32_mulph512_mask_round", IX86_BUILTIN_MULPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv32hf3_mask_round, "__builtin_ia32_divph512_mask_round", IX86_BUILTIN_DIVPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_addv32hf3_mask_round, "__builtin_ia32_addph512_mask_round", IX86_BUILTIN_ADDPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_subv32hf3_mask_round, "__builtin_ia32_subph512_mask_round", IX86_BUILTIN_SUBPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_mulv32hf3_mask_round, "__builtin_ia32_mulph512_mask_round", IX86_BUILTIN_MULPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_divv32hf3_mask_round, "__builtin_ia32_divph512_mask_round", IX86_BUILTIN_DIVPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmaddv8hf3_mask_round, "__builtin_ia32_addsh_mask_round", IX86_BUILTIN_ADDSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsubv8hf3_mask_round, "__builtin_ia32_subsh_mask_round", IX86_BUILTIN_SUBSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmmulv8hf3_mask_round, "__builtin_ia32_mulsh_mask_round", IX86_BUILTIN_MULSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmdivv8hf3_mask_round, "__builtin_ia32_divsh_mask_round", IX86_BUILTIN_DIVSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_smaxv32hf3_mask_round, "__builtin_ia32_maxph512_mask_round", IX86_BUILTIN_MAXPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv32hf3_mask_round, "__builtin_ia32_minph512_mask_round", IX86_BUILTIN_MINPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_smaxv32hf3_mask_round, "__builtin_ia32_maxph512_mask_round", IX86_BUILTIN_MAXPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_sminv32hf3_mask_round, "__builtin_ia32_minph512_mask_round", IX86_BUILTIN_MINPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsmaxv8hf3_mask_round, "__builtin_ia32_maxsh_mask_round", IX86_BUILTIN_MAXSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsminv8hf3_mask_round, "__builtin_ia32_minsh_mask_round", IX86_BUILTIN_MINSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_cmpv32hf3_mask_round, "__builtin_ia32_cmpph512_mask_round", IX86_BUILTIN_CMPPH512_MASK_ROUND, UNKNOWN, (int) USI_FTYPE_V32HF_V32HF_INT_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_cmpv32hf3_mask_round, "__builtin_ia32_cmpph512_mask_round", IX86_BUILTIN_CMPPH512_MASK_ROUND, UNKNOWN, (int) USI_FTYPE_V32HF_V32HF_INT_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmcmpv8hf3_mask_round, "__builtin_ia32_cmpsh_mask_round", IX86_BUILTIN_CMPSH_MASK_ROUND, UNKNOWN, (int) UQI_FTYPE_V8HF_V8HF_INT_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_sqrtv32hf2_mask_round, "__builtin_ia32_sqrtph512_mask_round", IX86_BUILTIN_SQRTPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_sqrtv32hf2_mask_round, "__builtin_ia32_sqrtph512_mask_round", IX86_BUILTIN_SQRTPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsqrtv8hf2_mask_round, "__builtin_ia32_sqrtsh_mask_round", IX86_BUILTIN_SQRTSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_scalefv32hf_mask_round, "__builtin_ia32_scalefph512_mask_round", IX86_BUILTIN_SCALEFPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_scalefv32hf_mask_round, "__builtin_ia32_scalefph512_mask_round", IX86_BUILTIN_SCALEFPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmscalefv8hf_mask_round, "__builtin_ia32_scalefsh_mask_round", IX86_BUILTIN_SCALEFSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_reducepv32hf_mask_round, "__builtin_ia32_reduceph512_mask_round", IX86_BUILTIN_REDUCEPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_reducepv32hf_mask_round, "__builtin_ia32_reduceph512_mask_round", IX86_BUILTIN_REDUCEPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_reducesv8hf_mask_round, "__builtin_ia32_reducesh_mask_round", IX86_BUILTIN_REDUCESH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_rndscalev32hf_mask_round, "__builtin_ia32_rndscaleph512_mask_round", IX86_BUILTIN_RNDSCALEPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_rndscalev32hf_mask_round, "__builtin_ia32_rndscaleph512_mask_round", IX86_BUILTIN_RNDSCALEPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_rndscalev8hf_mask_round, "__builtin_ia32_rndscalesh_mask_round", IX86_BUILTIN_RNDSCALESH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_getexpv32hf_mask_round, "__builtin_ia32_getexpph512_mask", IX86_BUILTIN_GETEXPPH512, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_getexpv32hf_mask_round, "__builtin_ia32_getexpph512_mask", IX86_BUILTIN_GETEXPPH512, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_sgetexpv8hf_mask_round, "__builtin_ia32_getexpsh_mask_round", IX86_BUILTIN_GETEXPSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_getmantv32hf_mask_round, "__builtin_ia32_getmantph512_mask", IX86_BUILTIN_GETMANTPH512, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_getmantv32hf_mask_round, "__builtin_ia32_getmantph512_mask", IX86_BUILTIN_GETMANTPH512, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vgetmantv8hf_mask_round, "__builtin_ia32_getmantsh_mask_round", IX86_BUILTIN_GETMANTSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2dq_v16si_mask_round, "__builtin_ia32_vcvtph2dq512_mask_round", IX86_BUILTIN_VCVTPH2DQ512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2udq_v16si_mask_round, "__builtin_ia32_vcvtph2udq512_mask_round", IX86_BUILTIN_VCVTPH2UDQ512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv16si2_mask_round, "__builtin_ia32_vcvttph2dq512_mask_round", IX86_BUILTIN_VCVTTPH2DQ512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv16si2_mask_round, "__builtin_ia32_vcvttph2udq512_mask_round", IX86_BUILTIN_VCVTTPH2UDQ512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2qq_v8di_mask_round, "__builtin_ia32_vcvtph2qq512_mask_round", IX86_BUILTIN_VCVTPH2QQ512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uqq_v8di_mask_round, "__builtin_ia32_vcvtph2uqq512_mask_round", IX86_BUILTIN_VCVTPH2UQQ512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv8di2_mask_round, "__builtin_ia32_vcvttph2qq512_mask_round", IX86_BUILTIN_VCVTTPH2QQ512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv8di2_mask_round, "__builtin_ia32_vcvttph2uqq512_mask_round", IX86_BUILTIN_VCVTTPH2UQQ512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2w_v32hi_mask_round, "__builtin_ia32_vcvtph2w512_mask_round", IX86_BUILTIN_VCVTPH2W512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uw_v32hi_mask_round, "__builtin_ia32_vcvtph2uw512_mask_round", IX86_BUILTIN_VCVTPH2UW512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv32hi2_mask_round, "__builtin_ia32_vcvttph2w512_mask_round", IX86_BUILTIN_VCVTTPH2W512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv32hi2_mask_round, "__builtin_ia32_vcvttph2uw512_mask_round", IX86_BUILTIN_VCVTTPH2UW512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtw2ph_v32hi_mask_round, "__builtin_ia32_vcvtw2ph512_mask_round", IX86_BUILTIN_VCVTW2PH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HI_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuw2ph_v32hi_mask_round, "__builtin_ia32_vcvtuw2ph512_mask_round", IX86_BUILTIN_VCVTUW2PH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HI_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtdq2ph_v16si_mask_round, "__builtin_ia32_vcvtdq2ph512_mask_round", IX86_BUILTIN_VCVTDQ2PH512_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SI_V16HF_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtudq2ph_v16si_mask_round, "__builtin_ia32_vcvtudq2ph512_mask_round", IX86_BUILTIN_VCVTUDQ2PH512_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SI_V16HF_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtqq2ph_v8di_mask_round, "__builtin_ia32_vcvtqq2ph512_mask_round", IX86_BUILTIN_VCVTQQ2PH512_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DI_V8HF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuqq2ph_v8di_mask_round, "__builtin_ia32_vcvtuqq2ph512_mask_round", IX86_BUILTIN_VCVTUQQ2PH512_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DI_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtph2dq_v16si_mask_round, "__builtin_ia32_vcvtph2dq512_mask_round", IX86_BUILTIN_VCVTPH2DQ512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtph2udq_v16si_mask_round, "__builtin_ia32_vcvtph2udq512_mask_round", IX86_BUILTIN_VCVTPH2UDQ512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_fix_truncv16si2_mask_round, "__builtin_ia32_vcvttph2dq512_mask_round", IX86_BUILTIN_VCVTTPH2DQ512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_fixuns_truncv16si2_mask_round, "__builtin_ia32_vcvttph2udq512_mask_round", IX86_BUILTIN_VCVTTPH2UDQ512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtph2qq_v8di_mask_round, "__builtin_ia32_vcvtph2qq512_mask_round", IX86_BUILTIN_VCVTPH2QQ512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtph2uqq_v8di_mask_round, "__builtin_ia32_vcvtph2uqq512_mask_round", IX86_BUILTIN_VCVTPH2UQQ512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_fix_truncv8di2_mask_round, "__builtin_ia32_vcvttph2qq512_mask_round", IX86_BUILTIN_VCVTTPH2QQ512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_fixuns_truncv8di2_mask_round, "__builtin_ia32_vcvttph2uqq512_mask_round", IX86_BUILTIN_VCVTTPH2UQQ512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtph2w_v32hi_mask_round, "__builtin_ia32_vcvtph2w512_mask_round", IX86_BUILTIN_VCVTPH2W512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtph2uw_v32hi_mask_round, "__builtin_ia32_vcvtph2uw512_mask_round", IX86_BUILTIN_VCVTPH2UW512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_fix_truncv32hi2_mask_round, "__builtin_ia32_vcvttph2w512_mask_round", IX86_BUILTIN_VCVTTPH2W512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_fixuns_truncv32hi2_mask_round, "__builtin_ia32_vcvttph2uw512_mask_round", IX86_BUILTIN_VCVTTPH2UW512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtw2ph_v32hi_mask_round, "__builtin_ia32_vcvtw2ph512_mask_round", IX86_BUILTIN_VCVTW2PH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HI_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtuw2ph_v32hi_mask_round, "__builtin_ia32_vcvtuw2ph512_mask_round", IX86_BUILTIN_VCVTUW2PH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HI_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtdq2ph_v16si_mask_round, "__builtin_ia32_vcvtdq2ph512_mask_round", IX86_BUILTIN_VCVTDQ2PH512_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SI_V16HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtudq2ph_v16si_mask_round, "__builtin_ia32_vcvtudq2ph512_mask_round", IX86_BUILTIN_VCVTUDQ2PH512_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SI_V16HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtqq2ph_v8di_mask_round, "__builtin_ia32_vcvtqq2ph512_mask_round", IX86_BUILTIN_VCVTQQ2PH512_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DI_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtuqq2ph_v8di_mask_round, "__builtin_ia32_vcvtuqq2ph512_mask_round", IX86_BUILTIN_VCVTUQQ2PH512_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DI_V8HF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2si_round, "__builtin_ia32_vcvtsh2si32_round", IX86_BUILTIN_VCVTSH2SI32_ROUND, UNKNOWN, (int) INT_FTYPE_V8HF_INT)
 BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2siq_round, "__builtin_ia32_vcvtsh2si64_round", IX86_BUILTIN_VCVTSH2SI64_ROUND, UNKNOWN, (int) INT64_FTYPE_V8HF_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2usi_round, "__builtin_ia32_vcvtsh2usi32_round", IX86_BUILTIN_VCVTSH2USI32_ROUND, UNKNOWN, (int) UINT_FTYPE_V8HF_INT)
@@ -3285,32 +3285,32 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsi2sh_round, "__b
 BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsi2shq_round, "__builtin_ia32_vcvtsi2sh64_round", IX86_BUILTIN_VCVTSI2SH64_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT64_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtusi2sh_round, "__builtin_ia32_vcvtusi2sh32_round", IX86_BUILTIN_VCVTUSI2SH32_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_UINT_INT)
 BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtusi2shq_round, "__builtin_ia32_vcvtusi2sh64_round", IX86_BUILTIN_VCVTUSI2SH64_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_UINT64_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_float_extend_phv8df2_mask_round, "__builtin_ia32_vcvtph2pd512_mask_round", IX86_BUILTIN_VCVTPH2PD512_MASK_ROUND, UNKNOWN, (int) V8DF_FTYPE_V8HF_V8DF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_float_extend_phv16sf2_mask_round, "__builtin_ia32_vcvtph2psx512_mask_round", IX86_BUILTIN_VCVTPH2PSX512_MASK_ROUND, UNKNOWN, (int) V16SF_FTYPE_V16HF_V16SF_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtpd2ph_v8df_mask_round, "__builtin_ia32_vcvtpd2ph512_mask_round", IX86_BUILTIN_VCVTPD2PH512_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DF_V8HF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtps2ph_v16sf_mask_round, "__builtin_ia32_vcvtps2phx512_mask_round", IX86_BUILTIN_VCVTPS2PHX512_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SF_V16HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_float_extend_phv8df2_mask_round, "__builtin_ia32_vcvtph2pd512_mask_round", IX86_BUILTIN_VCVTPH2PD512_MASK_ROUND, UNKNOWN, (int) V8DF_FTYPE_V8HF_V8DF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_float_extend_phv16sf2_mask_round, "__builtin_ia32_vcvtph2psx512_mask_round", IX86_BUILTIN_VCVTPH2PSX512_MASK_ROUND, UNKNOWN, (int) V16SF_FTYPE_V16HF_V16SF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtpd2ph_v8df_mask_round, "__builtin_ia32_vcvtpd2ph512_mask_round", IX86_BUILTIN_VCVTPD2PH512_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtps2ph_v16sf_mask_round, "__builtin_ia32_vcvtps2phx512_mask_round", IX86_BUILTIN_VCVTPS2PHX512_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SF_V16HF_UHI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2ss_mask_round, "__builtin_ia32_vcvtsh2ss_mask_round", IX86_BUILTIN_VCVTSH2SS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V8HF_V4SF_V4SF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2sd_mask_round, "__builtin_ia32_vcvtsh2sd_mask_round", IX86_BUILTIN_VCVTSH2SD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V8HF_V2DF_V2DF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtss2sh_mask_round, "__builtin_ia32_vcvtss2sh_mask_round", IX86_BUILTIN_VCVTSS2SH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V4SF_V8HF_V8HF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsd2sh_mask_round, "__builtin_ia32_vcvtsd2sh_mask_round", IX86_BUILTIN_VCVTSD2SH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V2DF_V8HF_V8HF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddsub_v32hf_mask_round, "__builtin_ia32_vfmaddsubph512_mask", IX86_BUILTIN_VFMADDSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddsub_v32hf_mask3_round, "__builtin_ia32_vfmaddsubph512_mask3", IX86_BUILTIN_VFMADDSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddsub_v32hf_maskz_round, "__builtin_ia32_vfmaddsubph512_maskz", IX86_BUILTIN_VFMADDSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsubadd_v32hf_mask_round, "__builtin_ia32_vfmsubaddph512_mask", IX86_BUILTIN_VFMSUBADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsubadd_v32hf_mask3_round, "__builtin_ia32_vfmsubaddph512_mask3", IX86_BUILTIN_VFMSUBADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsubadd_v32hf_maskz_round, "__builtin_ia32_vfmsubaddph512_maskz", IX86_BUILTIN_VFMSUBADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmadd_v32hf_mask_round, "__builtin_ia32_vfmaddph512_mask", IX86_BUILTIN_VFMADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmadd_v32hf_mask3_round, "__builtin_ia32_vfmaddph512_mask3", IX86_BUILTIN_VFMADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmadd_v32hf_maskz_round, "__builtin_ia32_vfmaddph512_maskz", IX86_BUILTIN_VFMADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmadd_v32hf_mask_round, "__builtin_ia32_vfnmaddph512_mask", IX86_BUILTIN_VFNMADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmadd_v32hf_mask3_round, "__builtin_ia32_vfnmaddph512_mask3", IX86_BUILTIN_VFNMADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmadd_v32hf_maskz_round, "__builtin_ia32_vfnmaddph512_maskz", IX86_BUILTIN_VFNMADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsub_v32hf_mask_round, "__builtin_ia32_vfmsubph512_mask", IX86_BUILTIN_VFMSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsub_v32hf_mask3_round, "__builtin_ia32_vfmsubph512_mask3", IX86_BUILTIN_VFMSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsub_v32hf_maskz_round, "__builtin_ia32_vfmsubph512_maskz", IX86_BUILTIN_VFMSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmsub_v32hf_mask_round, "__builtin_ia32_vfnmsubph512_mask", IX86_BUILTIN_VFNMSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmsub_v32hf_mask3_round, "__builtin_ia32_vfnmsubph512_mask3", IX86_BUILTIN_VFNMSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmsub_v32hf_maskz_round, "__builtin_ia32_vfnmsubph512_maskz", IX86_BUILTIN_VFNMSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmaddsub_v32hf_mask_round, "__builtin_ia32_vfmaddsubph512_mask", IX86_BUILTIN_VFMADDSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmaddsub_v32hf_mask3_round, "__builtin_ia32_vfmaddsubph512_mask3", IX86_BUILTIN_VFMADDSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmaddsub_v32hf_maskz_round, "__builtin_ia32_vfmaddsubph512_maskz", IX86_BUILTIN_VFMADDSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmsubadd_v32hf_mask_round, "__builtin_ia32_vfmsubaddph512_mask", IX86_BUILTIN_VFMSUBADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmsubadd_v32hf_mask3_round, "__builtin_ia32_vfmsubaddph512_mask3", IX86_BUILTIN_VFMSUBADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmsubadd_v32hf_maskz_round, "__builtin_ia32_vfmsubaddph512_maskz", IX86_BUILTIN_VFMSUBADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmadd_v32hf_mask_round, "__builtin_ia32_vfmaddph512_mask", IX86_BUILTIN_VFMADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmadd_v32hf_mask3_round, "__builtin_ia32_vfmaddph512_mask3", IX86_BUILTIN_VFMADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmadd_v32hf_maskz_round, "__builtin_ia32_vfmaddph512_maskz", IX86_BUILTIN_VFMADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fnmadd_v32hf_mask_round, "__builtin_ia32_vfnmaddph512_mask", IX86_BUILTIN_VFNMADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fnmadd_v32hf_mask3_round, "__builtin_ia32_vfnmaddph512_mask3", IX86_BUILTIN_VFNMADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fnmadd_v32hf_maskz_round, "__builtin_ia32_vfnmaddph512_maskz", IX86_BUILTIN_VFNMADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmsub_v32hf_mask_round, "__builtin_ia32_vfmsubph512_mask", IX86_BUILTIN_VFMSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmsub_v32hf_mask3_round, "__builtin_ia32_vfmsubph512_mask3", IX86_BUILTIN_VFMSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmsub_v32hf_maskz_round, "__builtin_ia32_vfmsubph512_maskz", IX86_BUILTIN_VFMSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fnmsub_v32hf_mask_round, "__builtin_ia32_vfnmsubph512_mask", IX86_BUILTIN_VFNMSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fnmsub_v32hf_mask3_round, "__builtin_ia32_vfnmsubph512_mask3", IX86_BUILTIN_VFNMSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fnmsub_v32hf_maskz_round, "__builtin_ia32_vfnmsubph512_maskz", IX86_BUILTIN_VFNMSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmadd_v8hf_mask_round, "__builtin_ia32_vfmaddsh3_mask", IX86_BUILTIN_VFMADDSH3_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmadd_v8hf_mask3_round, "__builtin_ia32_vfmaddsh3_mask3", IX86_BUILTIN_VFMADDSH3_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmadd_v8hf_maskz_round, "__builtin_ia32_vfmaddsh3_maskz", IX86_BUILTIN_VFMADDSH3_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
@@ -3318,18 +3318,18 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_mask_round
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_mask3_round, "__builtin_ia32_vfnmaddsh3_mask3", IX86_BUILTIN_VFNMADDSH3_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_maskz_round, "__builtin_ia32_vfnmaddsh3_maskz", IX86_BUILTIN_VFNMADDSH3_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmsub_v8hf_mask3_round, "__builtin_ia32_vfmsubsh3_mask3", IX86_BUILTIN_VFMSUBSH3_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_fma_fmaddc_v32hf_round, "__builtin_ia32_vfmaddcph512_round", IX86_BUILTIN_VFMADDCPH512_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddc_v32hf_mask1_round, "__builtin_ia32_vfmaddcph512_mask_round", IX86_BUILTIN_VFMADDCPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddc_v32hf_mask_round, "__builtin_ia32_vfmaddcph512_mask3_round", IX86_BUILTIN_VFMADDCPH512_MASK3_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddc_v32hf_maskz_round, "__builtin_ia32_vfmaddcph512_maskz_round", IX86_BUILTIN_VFMADDCPH512_MASKZ_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_fma_fcmaddc_v32hf_round, "__builtin_ia32_vfcmaddcph512_round", IX86_BUILTIN_VFCMADDCPH512_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmaddc_v32hf_mask1_round, "__builtin_ia32_vfcmaddcph512_mask_round", IX86_BUILTIN_VFCMADDCPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmaddc_v32hf_mask_round, "__builtin_ia32_vfcmaddcph512_mask3_round", IX86_BUILTIN_VFCMADDCPH512_MASK3_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmaddc_v32hf_maskz_round, "__builtin_ia32_vfcmaddcph512_maskz_round", IX86_BUILTIN_VFCMADDCPH512_MASKZ_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmulc_v32hf_round, "__builtin_ia32_vfcmulcph512_round", IX86_BUILTIN_VFCMULCPH512_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmulc_v32hf_mask_round, "__builtin_ia32_vfcmulcph512_mask_round", IX86_BUILTIN_VFCMULCPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmulc_v32hf_round, "__builtin_ia32_vfmulcph512_round", IX86_BUILTIN_VFMULCPH512_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmulc_v32hf_mask_round, "__builtin_ia32_vfmulcph512_mask_round", IX86_BUILTIN_VFMULCPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_fma_fmaddc_v32hf_round, "__builtin_ia32_vfmaddcph512_round", IX86_BUILTIN_VFMADDCPH512_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmaddc_v32hf_mask1_round, "__builtin_ia32_vfmaddcph512_mask_round", IX86_BUILTIN_VFMADDCPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmaddc_v32hf_mask_round, "__builtin_ia32_vfmaddcph512_mask3_round", IX86_BUILTIN_VFMADDCPH512_MASK3_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmaddc_v32hf_maskz_round, "__builtin_ia32_vfmaddcph512_maskz_round", IX86_BUILTIN_VFMADDCPH512_MASKZ_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_fma_fcmaddc_v32hf_round, "__builtin_ia32_vfcmaddcph512_round", IX86_BUILTIN_VFCMADDCPH512_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fcmaddc_v32hf_mask1_round, "__builtin_ia32_vfcmaddcph512_mask_round", IX86_BUILTIN_VFCMADDCPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fcmaddc_v32hf_mask_round, "__builtin_ia32_vfcmaddcph512_mask3_round", IX86_BUILTIN_VFCMADDCPH512_MASK3_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fcmaddc_v32hf_maskz_round, "__builtin_ia32_vfcmaddcph512_maskz_round", IX86_BUILTIN_VFCMADDCPH512_MASKZ_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fcmulc_v32hf_round, "__builtin_ia32_vfcmulcph512_round", IX86_BUILTIN_VFCMULCPH512_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fcmulc_v32hf_mask_round, "__builtin_ia32_vfcmulcph512_mask_round", IX86_BUILTIN_VFCMULCPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmulc_v32hf_round, "__builtin_ia32_vfmulcph512_round", IX86_BUILTIN_VFMULCPH512_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmulc_v32hf_mask_round, "__builtin_ia32_vfmulcph512_mask_round", IX86_BUILTIN_VFMULCPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fma_fcmaddcsh_v8hf_round, "__builtin_ia32_vfcmaddcsh_round", IX86_BUILTIN_VFCMADDCSH_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmaddcsh_v8hf_mask1_round, "__builtin_ia32_vfcmaddcsh_mask_round", IX86_BUILTIN_VFCMADDCSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmaddcsh_v8hf_mask3_round, "__builtin_ia32_vfcmaddcsh_mask3_round", IX86_BUILTIN_VFCMADDCSH_MASK3_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
-- 
2.31.1



* [PATCH 12/18] Disable zmm register and 512 bit libmvec call when !TARGET_EVEX512
  2023-09-21  7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
                   ` (10 preceding siblings ...)
  2023-09-21  7:20 ` [PATCH 11/18] [PATCH 5/5] " Hu, Lin1
@ 2023-09-21  7:20 ` Hu, Lin1
  2023-09-21  7:20 ` [PATCH 13/18] Support -mevex512 for AVX512F intrins Hu, Lin1
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21  7:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang

From: Haochen Jiang <haochen.jiang@intel.com>

gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_broadcast_from_constant):
	Disable zmm broadcast for !TARGET_EVEX512.
	* config/i386/i386-options.cc (ix86_option_override_internal):
	Do not use PVW_512 when -mno-evex512 is specified.
	(ix86_simd_clone_adjust): Add evex512 to the target string.
	* config/i386/i386.cc (type_natural_mode): Emit the ABI change
	warning when using zmm registers w/o evex512.
	(ix86_return_in_memory): Do not allow zmm when !TARGET_EVEX512.
	(ix86_hard_regno_mode_ok): Ditto.
	(ix86_set_reg_reg_cost): Ditto.
	(ix86_rtx_costs): Ditto.
	(ix86_vector_mode_supported_p): Ditto.
	(ix86_preferred_simd_mode): Ditto.
	(ix86_get_mask_mode): Ditto.
	(ix86_simd_clone_compute_vecsize_and_simdlen): Disable 512 bit
	libmvec call when !TARGET_EVEX512.
	(ix86_simd_clone_usable): Ditto.
	* config/i386/i386.h (BIGGEST_ALIGNMENT): Disable 512 bit
	alignment when !TARGET_EVEX512.
	(MOVE_MAX): Do not use PVW_512 when !TARGET_EVEX512.
	(STORE_MAX_PIECES): Ditto.
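
As a rough illustration of the intended effect (a sketch of mine, not
part of this patch's testsuite; exact codegen also depends on
prefer-vector-width and tuning): built with -O2 -mavx512f -mno-evex512,
the loop below should now be vectorized with at most 256 bit ymm
registers since zmm registers are disabled, while
-O2 -mavx512f -mevex512 may still pick 64 byte zmm vectors.

	/* Hypothetical example, not a committed test.  */
	void
	add_arrays (float *restrict a, float *restrict b, int n)
	{
	  for (int i = 0; i < n; i++)
	    a[i] += b[i];
	}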
---
 gcc/config/i386/i386-expand.cc  |  1 +
 gcc/config/i386/i386-options.cc | 14 +++++----
 gcc/config/i386/i386.cc         | 53 ++++++++++++++++++---------------
 gcc/config/i386/i386.h          |  7 +++--
 4 files changed, 42 insertions(+), 33 deletions(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index e42ff27c6ef..6eedcb384c0 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -611,6 +611,7 @@ ix86_broadcast_from_constant (machine_mode mode, rtx op)
      avx512 embed broadcast is available.  */
   if (GET_MODE_INNER (mode) == DImode && !TARGET_64BIT
       && (!TARGET_AVX512F
+	  || (GET_MODE_SIZE (mode) == 64 && !TARGET_EVEX512)
 	  || (GET_MODE_SIZE (mode) < 64 && !TARGET_AVX512VL)))
     return nullptr;
 
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index a1a7a92da9f..e2a90d7d9e2 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -2845,7 +2845,8 @@ ix86_option_override_internal (bool main_args_p,
 	  opts->x_ix86_move_max = opts->x_prefer_vector_width_type;
 	  if (opts_set->x_ix86_move_max == PVW_NONE)
 	    {
-	      if (TARGET_AVX512F_P (opts->x_ix86_isa_flags))
+	      if (TARGET_AVX512F_P (opts->x_ix86_isa_flags)
+		  && TARGET_EVEX512_P (opts->x_ix86_isa_flags2))
 		opts->x_ix86_move_max = PVW_AVX512;
 	      else
 		opts->x_ix86_move_max = PVW_AVX128;
@@ -2866,7 +2867,8 @@ ix86_option_override_internal (bool main_args_p,
 	  opts->x_ix86_store_max = opts->x_prefer_vector_width_type;
 	  if (opts_set->x_ix86_store_max == PVW_NONE)
 	    {
-	      if (TARGET_AVX512F_P (opts->x_ix86_isa_flags))
+	      if (TARGET_AVX512F_P (opts->x_ix86_isa_flags)
+		  && TARGET_EVEX512_P (opts->x_ix86_isa_flags2))
 		opts->x_ix86_store_max = PVW_AVX512;
 	      else
 		opts->x_ix86_store_max = PVW_AVX128;
@@ -3145,13 +3147,13 @@ ix86_simd_clone_adjust (struct cgraph_node *node)
     case 'e':
       if (TARGET_PREFER_AVX256)
 	{
-	  if (!TARGET_AVX512F)
-	    str = "avx512f,prefer-vector-width=512";
+	  if (!TARGET_AVX512F || !TARGET_EVEX512)
+	    str = "avx512f,evex512,prefer-vector-width=512";
 	  else
 	    str = "prefer-vector-width=512";
 	}
-      else if (!TARGET_AVX512F)
-	str = "avx512f";
+      else if (!TARGET_AVX512F || !TARGET_EVEX512)
+	str = "avx512f,evex512";
       break;
     default:
       gcc_unreachable ();
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 477e6cecc38..0df3bf10547 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -1924,7 +1924,8 @@ type_natural_mode (const_tree type, const CUMULATIVE_ARGS *cum,
 	    if (GET_MODE_NUNITS (mode) == TYPE_VECTOR_SUBPARTS (type)
 		&& GET_MODE_INNER (mode) == innermode)
 	      {
-		if (size == 64 && !TARGET_AVX512F && !TARGET_IAMCU)
+		if (size == 64 && (!TARGET_AVX512F || !TARGET_EVEX512)
+		    && !TARGET_IAMCU)
 		  {
 		    static bool warnedavx512f;
 		    static bool warnedavx512f_ret;
@@ -4347,7 +4348,7 @@ ix86_return_in_memory (const_tree type, const_tree fntype ATTRIBUTE_UNUSED)
 
 	  /* AVX512F values are returned in ZMM0 if available.  */
 	  if (size == 64)
-	    return !TARGET_AVX512F;
+	    return !TARGET_AVX512F || !TARGET_EVEX512;
 	}
 
       if (mode == XFmode)
@@ -20286,7 +20287,7 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 	  - any of 512-bit wide vector mode
 	  - any scalar mode.  */
       if (TARGET_AVX512F
-	  && (VALID_AVX512F_REG_OR_XI_MODE (mode)
+	  && ((VALID_AVX512F_REG_OR_XI_MODE (mode) && TARGET_EVEX512)
 	      || VALID_AVX512F_SCALAR_MODE (mode)))
 	return true;
 
@@ -20538,7 +20539,7 @@ ix86_set_reg_reg_cost (machine_mode mode)
 
     case MODE_VECTOR_INT:
     case MODE_VECTOR_FLOAT:
-      if ((TARGET_AVX512F && VALID_AVX512F_REG_MODE (mode))
+      if ((TARGET_AVX512F && TARGET_EVEX512 && VALID_AVX512F_REG_MODE (mode))
 	  || (TARGET_AVX && VALID_AVX256_REG_MODE (mode))
 	  || (TARGET_SSE2 && VALID_SSE2_REG_MODE (mode))
 	  || (TARGET_SSE && VALID_SSE_REG_MODE (mode))
@@ -21267,7 +21268,8 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
 	{
 	  /* (ior (not ...) ...) can be a single insn in AVX512.  */
 	  if (GET_CODE (XEXP (x, 0)) == NOT && TARGET_AVX512F
-	      && (GET_MODE_SIZE (mode) == 64
+	      && ((TARGET_EVEX512
+		   && GET_MODE_SIZE (mode) == 64)
 		  || (TARGET_AVX512VL
 		      && (GET_MODE_SIZE (mode) == 32
 			  || GET_MODE_SIZE (mode) == 16))))
@@ -21315,7 +21317,8 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
 
 	      /* (and (not ...) (not ...)) can be a single insn in AVX512.  */
 	      if (GET_CODE (right) == NOT && TARGET_AVX512F
-		  && (GET_MODE_SIZE (mode) == 64
+		  && ((TARGET_EVEX512
+		       && GET_MODE_SIZE (mode) == 64)
 		      || (TARGET_AVX512VL
 			  && (GET_MODE_SIZE (mode) == 32
 			      || GET_MODE_SIZE (mode) == 16))))
@@ -21385,7 +21388,8 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
 	{
 	  /* (not (xor ...)) can be a single insn in AVX512.  */
 	  if (GET_CODE (XEXP (x, 0)) == XOR && TARGET_AVX512F
-	      && (GET_MODE_SIZE (mode) == 64
+	      && ((TARGET_EVEX512
+		   && GET_MODE_SIZE (mode) == 64)
 		  || (TARGET_AVX512VL
 		      && (GET_MODE_SIZE (mode) == 32
 			  || GET_MODE_SIZE (mode) == 16))))
@@ -23000,7 +23004,7 @@ ix86_vector_mode_supported_p (machine_mode mode)
     return true;
   if (TARGET_AVX && VALID_AVX256_REG_MODE (mode))
     return true;
-  if (TARGET_AVX512F && VALID_AVX512F_REG_MODE (mode))
+  if (TARGET_AVX512F && TARGET_EVEX512 && VALID_AVX512F_REG_MODE (mode))
     return true;
   if ((TARGET_MMX || TARGET_MMX_WITH_SSE)
       && VALID_MMX_REG_MODE (mode))
@@ -23690,7 +23694,7 @@ ix86_preferred_simd_mode (scalar_mode mode)
   switch (mode)
     {
     case E_QImode:
-      if (TARGET_AVX512BW && !TARGET_PREFER_AVX256)
+      if (TARGET_AVX512BW && TARGET_EVEX512 && !TARGET_PREFER_AVX256)
 	return V64QImode;
       else if (TARGET_AVX && !TARGET_PREFER_AVX128)
 	return V32QImode;
@@ -23698,7 +23702,7 @@ ix86_preferred_simd_mode (scalar_mode mode)
 	return V16QImode;
 
     case E_HImode:
-      if (TARGET_AVX512BW && !TARGET_PREFER_AVX256)
+      if (TARGET_AVX512BW && TARGET_EVEX512 && !TARGET_PREFER_AVX256)
 	return V32HImode;
       else if (TARGET_AVX && !TARGET_PREFER_AVX128)
 	return V16HImode;
@@ -23706,7 +23710,7 @@ ix86_preferred_simd_mode (scalar_mode mode)
 	return V8HImode;
 
     case E_SImode:
-      if (TARGET_AVX512F && !TARGET_PREFER_AVX256)
+      if (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)
 	return V16SImode;
       else if (TARGET_AVX && !TARGET_PREFER_AVX128)
 	return V8SImode;
@@ -23714,7 +23718,7 @@ ix86_preferred_simd_mode (scalar_mode mode)
 	return V4SImode;
 
     case E_DImode:
-      if (TARGET_AVX512F && !TARGET_PREFER_AVX256)
+      if (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)
 	return V8DImode;
       else if (TARGET_AVX && !TARGET_PREFER_AVX128)
 	return V4DImode;
@@ -23728,15 +23732,16 @@ ix86_preferred_simd_mode (scalar_mode mode)
 	    {
 	      if (TARGET_PREFER_AVX128)
 		return V8HFmode;
-	      else if (TARGET_PREFER_AVX256)
+	      else if (TARGET_PREFER_AVX256 || !TARGET_EVEX512)
 		return V16HFmode;
 	    }
-	  return V32HFmode;
+	  if (TARGET_EVEX512)
+	    return V32HFmode;
 	}
       return word_mode;
 
     case E_SFmode:
-      if (TARGET_AVX512F && !TARGET_PREFER_AVX256)
+      if (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)
 	return V16SFmode;
       else if (TARGET_AVX && !TARGET_PREFER_AVX128)
 	return V8SFmode;
@@ -23744,7 +23749,7 @@ ix86_preferred_simd_mode (scalar_mode mode)
 	return V4SFmode;
 
     case E_DFmode:
-      if (TARGET_AVX512F && !TARGET_PREFER_AVX256)
+      if (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)
 	return V8DFmode;
       else if (TARGET_AVX && !TARGET_PREFER_AVX128)
 	return V4DFmode;
@@ -23764,13 +23769,13 @@ ix86_preferred_simd_mode (scalar_mode mode)
 static unsigned int
 ix86_autovectorize_vector_modes (vector_modes *modes, bool all)
 {
-  if (TARGET_AVX512F && !TARGET_PREFER_AVX256)
+  if (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)
     {
       modes->safe_push (V64QImode);
       modes->safe_push (V32QImode);
       modes->safe_push (V16QImode);
     }
-  else if (TARGET_AVX512F && all)
+  else if (TARGET_AVX512F && TARGET_EVEX512 && all)
     {
       modes->safe_push (V32QImode);
       modes->safe_push (V16QImode);
@@ -23808,7 +23813,7 @@ ix86_get_mask_mode (machine_mode data_mode)
   unsigned elem_size = vector_size / nunits;
 
   /* Scalar mask case.  */
-  if ((TARGET_AVX512F && vector_size == 64)
+  if ((TARGET_AVX512F && TARGET_EVEX512 && vector_size == 64)
       || (TARGET_AVX512VL && (vector_size == 32 || vector_size == 16)))
     {
       if (elem_size == 4
@@ -24306,7 +24311,7 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
     {
       /* If the function isn't exported, we can pick up just one ISA
 	 for the clones.  */
-      if (TARGET_AVX512F)
+      if (TARGET_AVX512F && TARGET_EVEX512)
 	clonei->vecsize_mangle = 'e';
       else if (TARGET_AVX2)
 	clonei->vecsize_mangle = 'd';
@@ -24398,17 +24403,17 @@ ix86_simd_clone_usable (struct cgraph_node *node)
 	return -1;
       if (!TARGET_AVX)
 	return 0;
-      return TARGET_AVX512F ? 3 : TARGET_AVX2 ? 2 : 1;
+      return (TARGET_AVX512F && TARGET_EVEX512) ? 3 : TARGET_AVX2 ? 2 : 1;
     case 'c':
       if (!TARGET_AVX)
 	return -1;
-      return TARGET_AVX512F ? 2 : TARGET_AVX2 ? 1 : 0;
+      return (TARGET_AVX512F && TARGET_EVEX512) ? 2 : TARGET_AVX2 ? 1 : 0;
     case 'd':
       if (!TARGET_AVX2)
 	return -1;
-      return TARGET_AVX512F ? 1 : 0;
+      return (TARGET_AVX512F && TARGET_EVEX512) ? 1 : 0;
     case 'e':
-      if (!TARGET_AVX512F)
+      if (!TARGET_AVX512F || !TARGET_EVEX512)
 	return -1;
       return 0;
     default:
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 3e8488f2ae8..aac972f5caf 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -770,7 +770,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
    TARGET_ABSOLUTE_BIGGEST_ALIGNMENT.  */
 
 #define BIGGEST_ALIGNMENT \
-  (TARGET_IAMCU ? 32 : (TARGET_AVX512F ? 512 : (TARGET_AVX ? 256 : 128)))
+  (TARGET_IAMCU ? 32 : ((TARGET_AVX512F && TARGET_EVEX512) \
+			? 512 : (TARGET_AVX ? 256 : 128)))
 
 /* Maximum stack alignment.  */
 #define MAX_STACK_ALIGNMENT MAX_OFILE_ALIGNMENT
@@ -1807,7 +1808,7 @@ typedef struct ix86_args {
    MOVE_MAX_PIECES defaults to MOVE_MAX.  */
 
 #define MOVE_MAX \
-  ((TARGET_AVX512F \
+  ((TARGET_AVX512F && TARGET_EVEX512 \
     && (ix86_move_max == PVW_AVX512 \
 	|| ix86_store_max == PVW_AVX512)) \
    ? 64 \
@@ -1826,7 +1827,7 @@ typedef struct ix86_args {
    store_by_pieces of 16/32/64 bytes.  */
 #define STORE_MAX_PIECES \
   (TARGET_INTER_UNIT_MOVES_TO_VEC \
-   ? ((TARGET_AVX512F && ix86_store_max == PVW_AVX512) \
+   ? ((TARGET_AVX512F && TARGET_EVEX512 && ix86_store_max == PVW_AVX512) \
       ? 64 \
       : ((TARGET_AVX \
 	  && ix86_store_max >= PVW_AVX256) \
-- 
2.31.1



* [PATCH 13/18] Support -mevex512 for AVX512F intrins
  2023-09-21  7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
                   ` (11 preceding siblings ...)
  2023-09-21  7:20 ` [PATCH 12/18] Disable zmm register and 512 bit libmvec call when !TARGET_EVEX512 Hu, Lin1
@ 2023-09-21  7:20 ` Hu, Lin1
  2023-09-21  7:20 ` [PATCH 14/18] Support -mevex512 for AVX512DQ intrins Hu, Lin1
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21  7:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang

From: Haochen Jiang <haochen.jiang@intel.com>

gcc/ChangeLog:

	* config/i386/i386-builtins.cc
	(ix86_vectorize_builtin_gather): Disable 512 bit gather
	when !TARGET_EVEX512.
	* config/i386/i386-expand.cc (ix86_valid_mask_cmp_mode):
	Add TARGET_EVEX512.
	(ix86_expand_int_sse_cmp): Ditto.
	(ix86_expand_vector_init_one_nonzero): Disable subroutine
	when !TARGET_EVEX512.
	(ix86_emit_swsqrtsf): Add TARGET_EVEX512.
	(ix86_vectorize_vec_perm_const): Disable subroutine when
	!TARGET_EVEX512.
	* config/i386/i386.cc
	(standard_sse_constant_p): Add TARGET_EVEX512.
	(standard_sse_constant_opcode): Ditto.
	(ix86_get_ssemov): Ditto.
	(ix86_legitimate_constant_p): Ditto.
	(ix86_vectorize_builtin_scatter): Disable 512 bit scatter
	when !TARGET_EVEX512.
	* config/i386/i386.md (avx512f_512): New.
	(movxi): Add TARGET_EVEX512.
	(*movxi_internal_avx512f): Ditto.
	(*movdi_internal): Change alternative 12 to ?Yv. Adjust mode
	for alternative 13.
	(*movsi_internal): Change alternative 8 to ?Yv. Adjust mode for
	alternative 9.
	(*movhi_internal): Change alternative 11 to *Yv.
	(*movdf_internal): Change alternative 12 to Yv.
	(*movsf_internal): Change alternative 5 to Yv. Adjust mode for
	alternatives 5 and 6.
	(*mov<mode>_internal): Change alternative 4 to Yv.
	(define_split for convert SF to DF): Add TARGET_EVEX512.
	(extendbfsf2_1): Ditto.
	* config/i386/predicates.md (bcst_mem_operand): Disable predicate
	for 512 bit when !TARGET_EVEX512.
	* config/i386/sse.md (VMOVE): Add TARGET_EVEX512.
	(V48_AVX512VL): Ditto.
	(V48_256_512_AVX512VL): Ditto.
	(V48H_AVX512VL): Ditto.
	(VI12_AVX512VL): Ditto.
	(V): Ditto.
	(V_512): Ditto.
	(V_256_512): Ditto.
	(VF): Ditto.
	(VF1_VF2_AVX512DQ): Ditto.
	(VFH): Ditto.
	(VFB): Ditto.
	(VF1): Ditto.
	(VF1_AVX2): Ditto.
	(VF2): Ditto.
	(VF2H): Ditto.
	(VF2_512_256): Ditto.
	(VF2_512_256VL): Ditto.
	(VF_512): Ditto.
	(VFB_512): Ditto.
	(VI48_AVX512VL): Ditto.
	(VI1248_AVX512VLBW): Ditto.
	(VF_AVX512VL): Ditto.
	(VFH_AVX512VL): Ditto.
	(VF1_AVX512VL): Ditto.
	(VI): Ditto.
	(VIHFBF): Ditto.
	(VI_AVX2): Ditto.
	(VI8): Ditto.
	(VI8_AVX512VL): Ditto.
	(VI2_AVX512F): Ditto.
	(VI4_AVX512F): Ditto.
	(VI4_AVX512VL): Ditto.
	(VI48_AVX512F_AVX512VL): Ditto.
	(VI8_AVX2_AVX512F): Ditto.
	(VI8_AVX_AVX512F): Ditto.
	(V8FI): Ditto.
	(V16FI): Ditto.
	(VI124_AVX2_24_AVX512F_1_AVX512BW): Ditto.
	(VI248_AVX512VLBW): Ditto.
	(VI248_AVX2_8_AVX512F_24_AVX512BW): Ditto.
	(VI248_AVX512BW): Ditto.
	(VI248_AVX512BW_AVX512VL): Ditto.
	(VI48_AVX512F): Ditto.
	(VI48_AVX_AVX512F): Ditto.
	(VI12_AVX_AVX512F): Ditto.
	(VI148_512): Ditto.
	(VI124_256_AVX512F_AVX512BW): Ditto.
	(VI48_512): Ditto.
	(VI_AVX512BW): Ditto.
	(VIHFBF_AVX512BW): Ditto.
	(VI4F_256_512): Ditto.
	(VI48F_256_512): Ditto.
	(VI48F): Ditto.
	(VI12_VI48F_AVX512VL): Ditto.
	(V32_512): Ditto.
	(AVX512MODE2P): Ditto.
	(STORENT_MODE): Ditto.
	(REDUC_PLUS_MODE): Ditto.
	(REDUC_SMINMAX_MODE): Ditto.
	(*andnot<mode>3): Change isa attribute to avx512f_512.
	(*andnot<mode>3): Ditto.
	(<code><mode>3): Ditto.
	(<code>tf3): Ditto.
	(FMAMODEM): Add TARGET_EVEX512.
	(FMAMODE_AVX512): Ditto.
	(VFH_SF_AVX512VL): Ditto.
	(avx512f_fix_notruncv16sfv16si<mask_name><round_name>): Ditto.
	(fix<fixunssuffix>_truncv16sfv16si2<mask_name><round_saeonly_name>):
	Ditto.
	(avx512f_cvtdq2pd512_2): Ditto.
	(avx512f_cvtpd2dq512<mask_name><round_name>): Ditto.
	(fix<fixunssuffix>_truncv8dfv8si2<mask_name><round_saeonly_name>):
	Ditto.
	(<mask_codefor>avx512f_cvtpd2ps512<mask_name><round_name>): Ditto.
	(vec_unpacks_lo_v16sf): Ditto.
	(vec_unpacks_hi_v16sf): Ditto.
	(vec_unpacks_float_hi_v16si): Ditto.
	(vec_unpacks_float_lo_v16si): Ditto.
	(vec_unpacku_float_hi_v16si): Ditto.
	(vec_unpacku_float_lo_v16si): Ditto.
	(vec_pack_sfix_trunc_v8df): Ditto.
	(avx512f_vec_pack_sfix_v8df): Ditto.
	(<mask_codefor>avx512f_unpckhps512<mask_name>): Ditto.
	(<mask_codefor>avx512f_unpcklps512<mask_name>): Ditto.
	(<mask_codefor>avx512f_movshdup512<mask_name>): Ditto.
	(<mask_codefor>avx512f_movsldup512<mask_name>): Ditto.
	(AVX512_VEC): Ditto.
	(AVX512_VEC_2): Ditto.
	(vec_extract_lo_v64qi): Ditto.
	(vec_extract_hi_v64qi): Ditto.
	(VEC_EXTRACT_MODE): Ditto.
	(<mask_codefor>avx512f_unpckhpd512<mask_name>): Ditto.
	(avx512f_movddup512<mask_name>): Ditto.
	(avx512f_unpcklpd512<mask_name>): Ditto.
	(*<avx512>_vternlog<mode>_all): Ditto.
	(*<avx512>_vpternlog<mode>_1): Ditto.
	(*<avx512>_vpternlog<mode>_2): Ditto.
	(*<avx512>_vpternlog<mode>_3): Ditto.
	(avx512f_shufps512_mask): Ditto.
	(avx512f_shufps512_1<mask_name>): Ditto.
	(avx512f_shufpd512_mask): Ditto.
	(avx512f_shufpd512_1<mask_name>): Ditto.
	(<mask_codefor>avx512f_interleave_highv8di<mask_name>): Ditto.
	(<mask_codefor>avx512f_interleave_lowv8di<mask_name>): Ditto.
	(vec_dupv2df<mask_name>): Ditto.
	(trunc<pmov_src_lower><mode>2): Ditto.
	(*avx512f_<code><pmov_src_lower><mode>2): Ditto.
	(*avx512f_vpermvar_truncv8div8si_1): Ditto.
	(avx512f_<code><pmov_src_lower><mode>2_mask): Ditto.
	(avx512f_<code><pmov_src_lower><mode>2_mask_store): Ditto.
	(truncv8div8qi2): Ditto.
	(avx512f_<code>v8div16qi2): Ditto.
	(*avx512f_<code>v8div16qi2_store_1): Ditto.
	(*avx512f_<code>v8div16qi2_store_2): Ditto.
	(avx512f_<code>v8div16qi2_mask): Ditto.
	(*avx512f_<code>v8div16qi2_mask_1): Ditto.
	(*avx512f_<code>v8div16qi2_mask_store_1): Ditto.
	(avx512f_<code>v8div16qi2_mask_store_2): Ditto.
	(vec_widen_umult_even_v16si<mask_name>): Ditto.
	(*vec_widen_umult_even_v16si<mask_name>): Ditto.
	(vec_widen_smult_even_v16si<mask_name>): Ditto.
	(*vec_widen_smult_even_v16si<mask_name>): Ditto.
	(VEC_PERM_AVX2): Ditto.
	(one_cmpl<mode>2): Ditto.
	(<mask_codefor>one_cmpl<mode>2<mask_name>): Ditto.
	(*one_cmpl<mode>2_pternlog_false_dep): Ditto.
	(define_split to xor): Ditto.
	(*andnot<mode>3): Ditto.
	(define_split for ior): Ditto.
	(*iornot<mode>3): Ditto.
	(*xnor<mode>3): Ditto.
	(*<nlogic><mode>3): Ditto.
	(<mask_codefor>avx512f_interleave_highv16si<mask_name>): Ditto.
	(<mask_codefor>avx512f_interleave_lowv16si<mask_name>): Ditto.
	(avx512f_pshufdv3_mask): Ditto.
	(avx512f_pshufd_1<mask_name>): Ditto.
	(*vec_extractv4ti): Ditto.
	(VEXTRACTI128_MODE): Ditto.
	(define_split to vec_extract): Ditto.
	(VI1248_AVX512VL_AVX512BW): Ditto.
	(<mask_codefor>avx512f_<code>v16qiv16si2<mask_name>): Ditto.
	(<insn>v16qiv16si2): Ditto.
	(avx512f_<code>v16hiv16si2<mask_name>): Ditto.
	(<insn>v16hiv16si2): Ditto.
	(avx512f_zero_extendv16hiv16si2_1): Ditto.
	(avx512f_<code>v8qiv8di2<mask_name>): Ditto.
	(*avx512f_<code>v8qiv8di2<mask_name>_1): Ditto.
	(*avx512f_<code>v8qiv8di2<mask_name>_2): Ditto.
	(<insn>v8qiv8di2): Ditto.
	(avx512f_<code>v8hiv8di2<mask_name>): Ditto.
	(<insn>v8hiv8di2): Ditto.
	(avx512f_<code>v8siv8di2<mask_name>): Ditto.
	(*avx512f_zero_extendv8siv8di2_1): Ditto.
	(*avx512f_zero_extendv8siv8di2_2): Ditto.
	(<insn>v8siv8di2): Ditto.
	(avx512f_roundps512_sfix): Ditto.
	(vashrv8di3): Ditto.
	(vashrv16si3): Ditto.
	(pbroadcast_evex_isa): Change isa attribute to avx512f_512.
	(vec_dupv4sf): Add TARGET_EVEX512.
	(*vec_dupv4si): Ditto.
	(*vec_dupv2di): Ditto.
	(vec_dup<mode>): Change isa attribute to avx512f_512.
	(VPERMI2): Add TARGET_EVEX512.
	(VPERMI2I): Ditto.
	(VEC_INIT_MODE): Ditto.
	(VEC_INIT_HALF_MODE): Ditto.
	(<mask_codefor>avx512f_vcvtph2ps512<mask_name><round_saeonly_name>):
	Ditto.
	(avx512f_vcvtps2ph512_mask_sae): Ditto.
	(<mask_codefor>avx512f_vcvtps2ph512<mask_name><round_saeonly_name>):
	Ditto.
	(*avx512f_vcvtps2ph512<merge_mask_name>): Ditto.
	(INT_BROADCAST_MODE): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr89229-5b.c: Adjust scan-assembler
	pattern.
	* gcc.target/i386/pr89229-6b.c: Ditto.
	* gcc.target/i386/pr89229-7b.c: Ditto.
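
As a hedged illustration of the user-visible effect (a hypothetical
snippet, not among the testsuite changes above): with 512 bit support
gated on evex512, a 512 bit intrinsic such as _mm512_add_ps should be
diagnosed under -mavx512f -mno-evex512, while the 128/256 bit
intrinsics remain usable.

	/* Hypothetical example; expected to be rejected when compiled
	   with -mavx512f -mno-evex512.  */
	#include <immintrin.h>

	__m512
	f (__m512 a, __m512 b)
	{
	  return _mm512_add_ps (a, b);
	}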
---
 gcc/config/i386/i386-builtins.cc           |  24 +-
 gcc/config/i386/i386-expand.cc             |  12 +-
 gcc/config/i386/i386.cc                    | 101 ++--
 gcc/config/i386/i386.md                    |  81 ++-
 gcc/config/i386/predicates.md              |   3 +-
 gcc/config/i386/sse.md                     | 553 +++++++++++----------
 gcc/testsuite/gcc.target/i386/pr89229-5b.c |   2 +-
 gcc/testsuite/gcc.target/i386/pr89229-6b.c |   2 +-
 gcc/testsuite/gcc.target/i386/pr89229-7b.c |   2 +-
 9 files changed, 445 insertions(+), 335 deletions(-)
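
As one more hypothetical sketch (mine, not from this patch) of the
gather change in the i386-builtins.cc hunk below: built with
-O2 -mavx512f -mno-evex512, a gather-friendly loop like this one should
no longer be vectorized via the 512 bit gather builtins, since
ix86_vectorize_builtin_gather now returns NULL_TREE for 64 byte vector
modes, while the 256 bit AVX2 gathers remain available.

	/* Hypothetical example, not a committed test.  */
	void
	gather_f (double *restrict dst, const double *restrict src,
		  const int *restrict idx, int n)
	{
	  for (int i = 0; i < n; i++)
	    dst[i] = src[idx[i]];
	}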

diff --git a/gcc/config/i386/i386-builtins.cc b/gcc/config/i386/i386-builtins.cc
index e1d1dac2ba2..70538fbe17b 100644
--- a/gcc/config/i386/i386-builtins.cc
+++ b/gcc/config/i386/i386-builtins.cc
@@ -1675,6 +1675,10 @@ ix86_vectorize_builtin_gather (const_tree mem_vectype,
 {
   bool si;
   enum ix86_builtins code;
+  const machine_mode mode = TYPE_MODE (mem_vectype);
+
+  if ((!TARGET_AVX512F || !TARGET_EVEX512) && GET_MODE_SIZE (mode) == 64)
+    return NULL_TREE;
 
   if (! TARGET_AVX2
       || (known_eq (TYPE_VECTOR_SUBPARTS (mem_vectype), 2u)
@@ -1755,28 +1759,16 @@ ix86_vectorize_builtin_gather (const_tree mem_vectype,
 	code = si ? IX86_BUILTIN_GATHERSIV8SI : IX86_BUILTIN_GATHERALTDIV8SI;
       break;
     case E_V8DFmode:
-      if (TARGET_AVX512F)
-	code = si ? IX86_BUILTIN_GATHER3ALTSIV8DF : IX86_BUILTIN_GATHER3DIV8DF;
-      else
-	return NULL_TREE;
+      code = si ? IX86_BUILTIN_GATHER3ALTSIV8DF : IX86_BUILTIN_GATHER3DIV8DF;
       break;
     case E_V8DImode:
-      if (TARGET_AVX512F)
-	code = si ? IX86_BUILTIN_GATHER3ALTSIV8DI : IX86_BUILTIN_GATHER3DIV8DI;
-      else
-	return NULL_TREE;
+      code = si ? IX86_BUILTIN_GATHER3ALTSIV8DI : IX86_BUILTIN_GATHER3DIV8DI;
       break;
     case E_V16SFmode:
-      if (TARGET_AVX512F)
-	code = si ? IX86_BUILTIN_GATHER3SIV16SF : IX86_BUILTIN_GATHER3ALTDIV16SF;
-      else
-	return NULL_TREE;
+      code = si ? IX86_BUILTIN_GATHER3SIV16SF : IX86_BUILTIN_GATHER3ALTDIV16SF;
       break;
     case E_V16SImode:
-      if (TARGET_AVX512F)
-	code = si ? IX86_BUILTIN_GATHER3SIV16SI : IX86_BUILTIN_GATHER3ALTDIV16SI;
-      else
-	return NULL_TREE;
+      code = si ? IX86_BUILTIN_GATHER3SIV16SI : IX86_BUILTIN_GATHER3ALTDIV16SI;
       break;
     default:
       return NULL_TREE;
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 6eedcb384c0..0705e08d38c 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -3943,7 +3943,7 @@ ix86_valid_mask_cmp_mode (machine_mode mode)
   if ((inner_mode == QImode || inner_mode == HImode) && !TARGET_AVX512BW)
     return false;
 
-  return vector_size == 64 || TARGET_AVX512VL;
+  return (vector_size == 64 && TARGET_EVEX512) || TARGET_AVX512VL;
 }
 
 /* Return true if integer mask comparison should be used.  */
@@ -4773,7 +4773,7 @@ ix86_expand_int_sse_cmp (rtx dest, enum rtx_code code, rtx cop0, rtx cop1,
 	      && GET_MODE_SIZE (GET_MODE_INNER (mode)) >= 4
 	      /* Don't do it if not using integer masks and we'd end up with
 		 the right values in the registers though.  */
-	      && (GET_MODE_SIZE (mode) == 64
+	      && ((GET_MODE_SIZE (mode) == 64 && TARGET_EVEX512)
 		  || !vector_all_ones_operand (optrue, data_mode)
 		  || opfalse != CONST0_RTX (data_mode))))
 	{
@@ -15668,6 +15668,9 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, machine_mode mode,
   bool use_vector_set = false;
   rtx (*gen_vec_set_0) (rtx, rtx, rtx) = NULL;
 
+  if (GET_MODE_SIZE (mode) == 64 && !TARGET_EVEX512)
+    return false;
+
   switch (mode)
     {
     case E_V2DImode:
@@ -18288,7 +18291,7 @@ ix86_emit_swsqrtsf (rtx res, rtx a, machine_mode mode, bool recip)
 
   unsigned vector_size = GET_MODE_SIZE (mode);
   if (TARGET_FMA
-      || (TARGET_AVX512F && vector_size == 64)
+      || (TARGET_AVX512F && TARGET_EVEX512 && vector_size == 64)
       || (TARGET_AVX512VL && (vector_size == 32 || vector_size == 16)))
     emit_insn (gen_rtx_SET (e2,
 			    gen_rtx_FMA (mode, e0, x0, mthree)));
@@ -23005,6 +23008,9 @@ ix86_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
   unsigned int i, nelt, which;
   bool two_args;
 
+  if (GET_MODE_SIZE (vmode) == 64 && !TARGET_EVEX512)
+    return false;
+
   /* For HF mode vector, convert it to HI using subreg.  */
   if (GET_MODE_INNER (vmode) == HFmode)
     {
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 0df3bf10547..635dd85e764 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -5263,7 +5263,7 @@ standard_sse_constant_p (rtx x, machine_mode pred_mode)
       switch (GET_MODE_SIZE (mode))
 	{
 	case 64:
-	  if (TARGET_AVX512F)
+	  if (TARGET_AVX512F && TARGET_EVEX512)
 	    return 2;
 	  break;
 	case 32:
@@ -5313,9 +5313,14 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands)
 	case MODE_XI:
 	case MODE_OI:
 	  if (EXT_REX_SSE_REG_P (operands[0]))
-	    return (TARGET_AVX512VL
-		    ? "vpxord\t%x0, %x0, %x0"
-		    : "vpxord\t%g0, %g0, %g0");
+	    {
+	      if (TARGET_AVX512VL)
+		return "vpxord\t%x0, %x0, %x0";
+	      else if (TARGET_EVEX512)
+		return "vpxord\t%g0, %g0, %g0";
+	      else
+		gcc_unreachable ();
+	    }
 	  return "vpxor\t%x0, %x0, %x0";
 
 	case MODE_V2DF:
@@ -5324,16 +5329,23 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands)
 	  /* FALLTHRU */
 	case MODE_V8DF:
 	case MODE_V4DF:
-	  if (!EXT_REX_SSE_REG_P (operands[0]))
-	    return "vxorpd\t%x0, %x0, %x0";
-	  else if (TARGET_AVX512DQ)
-	    return (TARGET_AVX512VL
-		    ? "vxorpd\t%x0, %x0, %x0"
-		    : "vxorpd\t%g0, %g0, %g0");
-	  else
-	    return (TARGET_AVX512VL
-		    ? "vpxorq\t%x0, %x0, %x0"
-		    : "vpxorq\t%g0, %g0, %g0");
+	  if (EXT_REX_SSE_REG_P (operands[0]))
+	    {
+	      if (TARGET_AVX512DQ)
+		return (TARGET_AVX512VL
+			? "vxorpd\t%x0, %x0, %x0"
+			: "vxorpd\t%g0, %g0, %g0");
+	      else
+		{
+		  if (TARGET_AVX512VL)
+		    return "vpxorq\t%x0, %x0, %x0";
+		  else if (TARGET_EVEX512)
+		    return "vpxorq\t%g0, %g0, %g0";
+		  else
+		    gcc_unreachable ();
+		}
+	    }
+	  return "vxorpd\t%x0, %x0, %x0";
 
 	case MODE_V4SF:
 	  if (!EXT_REX_SSE_REG_P (operands[0]))
@@ -5341,16 +5353,23 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands)
 	  /* FALLTHRU */
 	case MODE_V16SF:
 	case MODE_V8SF:
-	  if (!EXT_REX_SSE_REG_P (operands[0]))
-	    return "vxorps\t%x0, %x0, %x0";
-	  else if (TARGET_AVX512DQ)
-	    return (TARGET_AVX512VL
-		    ? "vxorps\t%x0, %x0, %x0"
-		    : "vxorps\t%g0, %g0, %g0");
-	  else
-	    return (TARGET_AVX512VL
-		    ? "vpxord\t%x0, %x0, %x0"
-		    : "vpxord\t%g0, %g0, %g0");
+	  if (EXT_REX_SSE_REG_P (operands[0]))
+	    {
+	      if (TARGET_AVX512DQ)
+		return (TARGET_AVX512VL
+			? "vxorps\t%x0, %x0, %x0"
+			: "vxorps\t%g0, %g0, %g0");
+	      else
+		{
+		  if (TARGET_AVX512VL)
+		    return "vpxord\t%x0, %x0, %x0";
+		  else if (TARGET_EVEX512)
+		    return "vpxord\t%g0, %g0, %g0";
+		  else
+		    gcc_unreachable ();
+		}
+	    }
+	  return "vxorps\t%x0, %x0, %x0";
 
 	default:
 	  gcc_unreachable ();
@@ -5368,7 +5387,7 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands)
 	case MODE_XI:
 	case MODE_V8DF:
 	case MODE_V16SF:
-	  gcc_assert (TARGET_AVX512F);
+	  gcc_assert (TARGET_AVX512F && TARGET_EVEX512);
 	  return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
 
 	case MODE_OI:
@@ -5380,14 +5399,18 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands)
 	case MODE_V2DF:
 	case MODE_V4SF:
 	  gcc_assert (TARGET_SSE2);
-	  if (!EXT_REX_SSE_REG_P (operands[0]))
-	    return (TARGET_AVX
-		    ? "vpcmpeqd\t%0, %0, %0"
-		    : "pcmpeqd\t%0, %0");
-	  else if (TARGET_AVX512VL)
-	    return "vpternlogd\t{$0xFF, %0, %0, %0|%0, %0, %0, 0xFF}";
-	  else
-	    return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
+	  if (EXT_REX_SSE_REG_P (operands[0]))
+	    {
+	      if (TARGET_AVX512VL)
+		return "vpternlogd\t{$0xFF, %0, %0, %0|%0, %0, %0, 0xFF}";
+	      else if (TARGET_EVEX512)
+		return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
+	      else
+		gcc_unreachable ();
+	    }
+	  return (TARGET_AVX
+		  ? "vpcmpeqd\t%0, %0, %0"
+		  : "pcmpeqd\t%0, %0");
 
 	default:
 	  gcc_unreachable ();
@@ -5397,7 +5420,7 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands)
     {
       if (GET_MODE_SIZE (mode) == 64)
 	{
-	  gcc_assert (TARGET_AVX512F);
+	  gcc_assert (TARGET_AVX512F && TARGET_EVEX512);
 	  return "vpcmpeqd\t%t0, %t0, %t0";
 	}
       else if (GET_MODE_SIZE (mode) == 32)
@@ -5409,7 +5432,7 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands)
     }
   else if (vector_all_ones_zero_extend_quarter_operand (x, mode))
     {
-      gcc_assert (TARGET_AVX512F);
+      gcc_assert (TARGET_AVX512F && TARGET_EVEX512);
       return "vpcmpeqd\t%x0, %x0, %x0";
     }
 
@@ -5511,6 +5534,8 @@ ix86_get_ssemov (rtx *operands, unsigned size,
 	  || memory_operand (operands[1], mode))
 	gcc_unreachable ();
       size = 64;
+      /* We need TARGET_EVEX512 to move into zmm register.  */
+      gcc_assert (TARGET_EVEX512);
       switch (type)
 	{
 	case opcode_int:
@@ -10727,7 +10752,7 @@ ix86_legitimate_constant_p (machine_mode mode, rtx x)
 	case E_OImode:
 	case E_XImode:
 	  if (!standard_sse_constant_p (x, mode)
-	      && GET_MODE_SIZE (TARGET_AVX512F
+	      && GET_MODE_SIZE (TARGET_AVX512F && TARGET_EVEX512
 				? XImode
 				: (TARGET_AVX
 				   ? OImode
@@ -19195,10 +19220,14 @@ ix86_vectorize_builtin_scatter (const_tree vectype,
 {
   bool si;
   enum ix86_builtins code;
+  const machine_mode mode = TYPE_MODE (vectype);
 
   if (!TARGET_AVX512F)
     return NULL_TREE;
 
+  if (!TARGET_EVEX512 && GET_MODE_SIZE (mode) == 64)
+    return NULL_TREE;
+
   if (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 2u)
       ? !TARGET_USE_SCATTER_2PARTS
       : (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 4u)
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index eef8a0e01eb..6eb4e540140 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -535,10 +535,11 @@
 (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
 		    x64_avx,x64_avx512bw,x64_avx512dq,aes,
 		    sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
-		    avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
-		    avx512bw,noavx512bw,avx512dq,noavx512dq,fma_or_avx512vl,
-		    avx512vl,noavx512vl,avxvnni,avx512vnnivl,avx512fp16,avxifma,
-		    avx512ifmavl,avxneconvert,avx512bf16vl,vpclmulqdqvl"
+		    avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,avx512f_512,
+		    noavx512f,avx512bw,noavx512bw,avx512dq,noavx512dq,
+		    fma_or_avx512vl,avx512vl,noavx512vl,avxvnni,avx512vnnivl,
+		    avx512fp16,avxifma,avx512ifmavl,avxneconvert,avx512bf16vl,
+		    vpclmulqdqvl"
   (const_string "base"))
 
 ;; The (bounding maximum) length of an instruction immediate.
@@ -899,6 +900,8 @@
 	 (eq_attr "isa" "fma_or_avx512vl")
 	   (symbol_ref "TARGET_FMA || TARGET_AVX512VL")
 	 (eq_attr "isa" "avx512f") (symbol_ref "TARGET_AVX512F")
+	 (eq_attr "isa" "avx512f_512")
+	   (symbol_ref "TARGET_AVX512F && TARGET_EVEX512")
 	 (eq_attr "isa" "noavx512f") (symbol_ref "!TARGET_AVX512F")
 	 (eq_attr "isa" "avx512bw") (symbol_ref "TARGET_AVX512BW")
 	 (eq_attr "isa" "noavx512bw") (symbol_ref "!TARGET_AVX512BW")
@@ -2281,7 +2284,7 @@
 (define_expand "movxi"
   [(set (match_operand:XI 0 "nonimmediate_operand")
 	(match_operand:XI 1 "general_operand"))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "ix86_expand_vector_move (XImode, operands); DONE;")
 
 (define_expand "movoi"
@@ -2357,7 +2360,7 @@
 (define_insn "*movxi_internal_avx512f"
   [(set (match_operand:XI 0 "nonimmediate_operand"		"=v,v ,v ,m")
 	(match_operand:XI 1 "nonimmediate_or_sse_const_operand" " C,BC,vm,v"))]
-  "TARGET_AVX512F
+  "TARGET_AVX512F && TARGET_EVEX512
    && (register_operand (operands[0], XImode)
        || register_operand (operands[1], XImode))"
 {
@@ -2485,9 +2488,9 @@
 
 (define_insn "*movdi_internal"
   [(set (match_operand:DI 0 "nonimmediate_operand"
-    "=r  ,o  ,r,r  ,r,m ,*y,*y,?*y,?m,?r,?*y,?v,?v,?v,m ,m,?r ,?*Yd,?r,?v,?*y,?*x,*k,*k  ,*r,*m,*k")
+    "=r  ,o  ,r,r  ,r,m ,*y,*y,?*y,?m,?r,?*y,?Yv,?v,?v,m ,m,?r ,?*Yd,?r,?v,?*y,?*x,*k,*k  ,*r,*m,*k")
 	(match_operand:DI 1 "general_operand"
-    "riFo,riF,Z,rem,i,re,C ,*y,Bk ,*y,*y,r  ,C ,?v,Bk,?v,v,*Yd,r   ,?v,r  ,*x ,*y ,*r,*kBk,*k,*k,CBC"))]
+    "riFo,riF,Z,rem,i,re,C ,*y,Bk ,*y,*y,r  ,C  ,?v,Bk,?v,v,*Yd,r   ,?v,r  ,*x ,*y ,*r,*kBk,*k,*k,CBC"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))
    && ix86_hardreg_mov_ok (operands[0], operands[1])"
 {
@@ -2605,7 +2608,7 @@
    (set (attr "mode")
      (cond [(eq_attr "alternative" "2")
 	      (const_string "SI")
-	    (eq_attr "alternative" "12,13")
+	    (eq_attr "alternative" "12")
 	      (cond [(match_test "TARGET_AVX")
 		       (const_string "TI")
 		     (ior (not (match_test "TARGET_SSE2"))
@@ -2613,6 +2616,18 @@
 		       (const_string "V4SF")
 		    ]
 		    (const_string "TI"))
+	    (eq_attr "alternative" "13")
+	      (cond [(match_test "TARGET_AVX512VL")
+		       (const_string "TI")
+		     (match_test "TARGET_AVX512F")
+		       (const_string "DF")
+		     (match_test "TARGET_AVX")
+		       (const_string "TI")
+		     (ior (not (match_test "TARGET_SSE2"))
+			  (match_test "optimize_function_for_size_p (cfun)"))
+		       (const_string "V4SF")
+		    ]
+		    (const_string "TI"))
 
 	    (and (eq_attr "alternative" "14,15,16")
 		 (not (match_test "TARGET_SSE2")))
@@ -2706,9 +2721,9 @@
 
 (define_insn "*movsi_internal"
   [(set (match_operand:SI 0 "nonimmediate_operand"
-    "=r,m ,*y,*y,?*y,?m,?r,?*y,?v,?v,?v,m ,?r,?v,*k,*k  ,*rm,*k")
+    "=r,m ,*y,*y,?*y,?m,?r,?*y,?Yv,?v,?v,m ,?r,?v,*k,*k  ,*rm,*k")
 	(match_operand:SI 1 "general_operand"
-    "g ,re,C ,*y,Bk ,*y,*y,r  ,C ,?v,Bk,?v,?v,r  ,*r,*kBk,*k ,CBC"))]
+    "g ,re,C ,*y,Bk ,*y,*y,r  ,C  ,?v,Bk,?v,?v,r  ,*r,*kBk,*k ,CBC"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))
    && ix86_hardreg_mov_ok (operands[0], operands[1])"
 {
@@ -2793,7 +2808,7 @@
    (set (attr "mode")
      (cond [(eq_attr "alternative" "2,3")
 	      (const_string "DI")
-	    (eq_attr "alternative" "8,9")
+	    (eq_attr "alternative" "8")
 	      (cond [(match_test "TARGET_AVX")
 		       (const_string "TI")
 		     (ior (not (match_test "TARGET_SSE2"))
@@ -2801,6 +2816,18 @@
 		       (const_string "V4SF")
 		    ]
 		    (const_string "TI"))
+	    (eq_attr "alternative" "9")
+	      (cond [(match_test "TARGET_AVX512VL")
+		       (const_string "TI")
+		     (match_test "TARGET_AVX512F")
+		       (const_string "SF")
+		     (match_test "TARGET_AVX")
+		       (const_string "TI")
+		     (ior (not (match_test "TARGET_SSE2"))
+			  (match_test "optimize_function_for_size_p (cfun)"))
+		       (const_string "V4SF")
+		    ]
+		    (const_string "TI"))
 
 	    (and (eq_attr "alternative" "10,11")
 	         (not (match_test "TARGET_SSE2")))
@@ -2849,9 +2876,9 @@
 
 (define_insn "*movhi_internal"
   [(set (match_operand:HI 0 "nonimmediate_operand"
-    "=r,r,r,m ,*k,*k ,r ,m ,*k ,?r,?*v,*v,*v,*v,m")
+    "=r,r,r,m ,*k,*k ,r ,m ,*k ,?r,?*v,*Yv,*v,*v,m")
 	(match_operand:HI 1 "general_operand"
-    "r ,n,m,rn,r ,*km,*k,*k,CBC,*v,r  ,C ,*v,m ,*v"))]
+    "r ,n,m,rn,r ,*km,*k,*k,CBC,*v,r  ,C  ,*v,m ,*v"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))
    && ix86_hardreg_mov_ok (operands[0], operands[1])"
 {
@@ -3993,9 +4020,9 @@
 ;; Possible store forwarding (partial memory) stall in alternatives 4, 6 and 7.
 (define_insn "*movdf_internal"
   [(set (match_operand:DF 0 "nonimmediate_operand"
-    "=Yf*f,m   ,Yf*f,?r ,!o,?*r ,!o,!o,?r,?m,?r,?r,v,v,v,m,*x,*x,*x,m ,?r,?v,r  ,o ,r  ,m")
+    "=Yf*f,m   ,Yf*f,?r ,!o,?*r ,!o,!o,?r,?m,?r,?r,Yv,v,v,m,*x,*x,*x,m ,?r,?v,r  ,o ,r  ,m")
 	(match_operand:DF 1 "general_operand"
-    "Yf*fm,Yf*f,G   ,roF,r ,*roF,*r,F ,rm,rC,C ,F ,C,v,m,v,C ,*x,m ,*x, v, r,roF,rF,rmF,rC"))]
+    "Yf*fm,Yf*f,G   ,roF,r ,*roF,*r,F ,rm,rC,C ,F ,C ,v,m,v,C ,*x,m ,*x, v, r,roF,rF,rmF,rC"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))
    && (lra_in_progress || reload_completed
        || !CONST_DOUBLE_P (operands[1])
@@ -4170,9 +4197,9 @@
 
 (define_insn "*movsf_internal"
   [(set (match_operand:SF 0 "nonimmediate_operand"
-	  "=Yf*f,m   ,Yf*f,?r ,?m,v,v,v,m,?r,?v,!*y,!*y,!m,!r,!*y,r  ,m")
+	  "=Yf*f,m   ,Yf*f,?r ,?m,Yv,v,v,m,?r,?v,!*y,!*y,!m,!r,!*y,r  ,m")
 	(match_operand:SF 1 "general_operand"
-	  "Yf*fm,Yf*f,G   ,rmF,rF,C,v,m,v,v ,r ,*y ,m  ,*y,*y,r  ,rmF,rF"))]
+	  "Yf*fm,Yf*f,G   ,rmF,rF,C ,v,m,v,v ,r ,*y ,m  ,*y,*y,r  ,rmF,rF"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))
    && (lra_in_progress || reload_completed
        || !CONST_DOUBLE_P (operands[1])
@@ -4247,7 +4274,7 @@
 	       (eq_attr "alternative" "11")
 		 (const_string "DI")
 	       (eq_attr "alternative" "5")
-		 (cond [(and (match_test "TARGET_AVX512F")
+		 (cond [(and (match_test "TARGET_AVX512F && TARGET_EVEX512")
 			     (not (match_test "TARGET_PREFER_AVX256")))
 			  (const_string "V16SF")
 			(match_test "TARGET_AVX")
@@ -4271,7 +4298,11 @@
 		  better to maintain the whole registers in single format
 		  to avoid problems on using packed logical operations.  */
 	       (eq_attr "alternative" "6")
-		 (cond [(ior (match_test "TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+		 (cond [(match_test "TARGET_AVX512VL")
+			  (const_string "V4SF")
+			(match_test "TARGET_AVX512F")
+			  (const_string "SF")
+			(ior (match_test "TARGET_SSE_PARTIAL_REG_DEPENDENCY")
 			     (match_test "TARGET_SSE_SPLIT_REGS"))
 			  (const_string "V4SF")
 		       ]
@@ -4301,9 +4332,9 @@
 
 (define_insn "*mov<mode>_internal"
  [(set (match_operand:HFBF 0 "nonimmediate_operand"
-	 "=?r,?r,?r,?m,v,v,?r,m,?v,v")
+	 "=?r,?r,?r,?m		 ,Yv,v,?r,m,?v,v")
        (match_operand:HFBF 1 "general_operand"
-	 "r  ,F ,m ,r<hfbfconstf>,C,v, v,v,r ,m"))]
+	 "r  ,F ,m ,r<hfbfconstf>,C ,v, v,v,r ,m"))]
  "!(MEM_P (operands[0]) && MEM_P (operands[1]))
   && (lra_in_progress
       || reload_completed
@@ -5144,7 +5175,7 @@
    && optimize_insn_for_speed_p ()
    && reload_completed
    && (!EXT_REX_SSE_REG_P (operands[0])
-       || TARGET_AVX512VL)"
+       || TARGET_AVX512VL || TARGET_EVEX512)"
    [(set (match_dup 2)
 	 (float_extend:V2DF
 	   (vec_select:V2SF
@@ -5287,8 +5318,8 @@
    (set_attr "memory" "none")
    (set (attr "enabled")
      (if_then_else (eq_attr "alternative" "2")
-       (symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL
-		    && !TARGET_PREFER_AVX256")
+       (symbol_ref "TARGET_AVX512F && TARGET_EVEX512
+		    && !TARGET_AVX512VL && !TARGET_PREFER_AVX256")
        (const_string "*")))])
 
 (define_expand "extend<mode>xf2"
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 37d20c6303a..ef49efdbde5 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -1276,7 +1276,8 @@
   (and (match_code "vec_duplicate")
        (and (match_test "TARGET_AVX512F")
 	    (ior (match_test "TARGET_AVX512VL")
-		 (match_test "GET_MODE_SIZE (GET_MODE (op)) == 64")))
+		 (and (match_test "GET_MODE_SIZE (GET_MODE (op)) == 64")
+		      (match_test "TARGET_EVEX512"))))
        (match_test "VALID_BCST_MODE_P (GET_MODE_INNER (GET_MODE (op)))")
        (match_test "GET_MODE (XEXP (op, 0))
 		    == GET_MODE_INNER (GET_MODE (op))")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 80b43fd7db7..8d1b75b43e0 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -253,43 +253,43 @@
 
 ;; All vector modes including V?TImode, used in move patterns.
 (define_mode_iterator VMOVE
-  [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
-   (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
-   (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
-   (V8DI "TARGET_AVX512F")  (V4DI "TARGET_AVX") V2DI
-   (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX") V1TI
-   (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF
-   (V32BF "TARGET_AVX512F") (V16BF "TARGET_AVX") V8BF
-   (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
-   (V8DF "TARGET_AVX512F")  (V4DF "TARGET_AVX") V2DF])
+  [(V64QI "TARGET_AVX512F && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI
+   (V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI
+   (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI
+   (V8DI "TARGET_AVX512F && TARGET_EVEX512")  (V4DI "TARGET_AVX") V2DI
+   (V4TI "TARGET_AVX512F && TARGET_EVEX512") (V2TI "TARGET_AVX") V1TI
+   (V32HF "TARGET_AVX512F && TARGET_EVEX512") (V16HF "TARGET_AVX") V8HF
+   (V32BF "TARGET_AVX512F && TARGET_EVEX512") (V16BF "TARGET_AVX") V8BF
+   (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
+   (V8DF "TARGET_AVX512F && TARGET_EVEX512")  (V4DF "TARGET_AVX") V2DF])
 
 ;; All AVX-512{F,VL} vector modes without HF. Supposed TARGET_AVX512F baseline.
 (define_mode_iterator V48_AVX512VL
-  [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
-   V8DI  (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")
-   V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
-   V8DF  (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
+  [(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
+   (V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")
+   (V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
+   (V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
 (define_mode_iterator V48_256_512_AVX512VL
-  [V16SI (V8SI "TARGET_AVX512VL")
-   V8DI  (V4DI "TARGET_AVX512VL")
-   V16SF (V8SF "TARGET_AVX512VL")
-   V8DF  (V4DF "TARGET_AVX512VL")])
+  [(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL")
+   (V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL")
+   (V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL")
+   (V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL")])
 
 ;; All AVX-512{F,VL} vector modes. Supposed TARGET_AVX512F baseline.
 (define_mode_iterator V48H_AVX512VL
-  [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
-   V8DI  (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")
+  [(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
+   (V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")
    (V32HF "TARGET_AVX512FP16")
    (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
-   V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
-   V8DF  (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
+   (V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
+   (V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
 ;; 1,2 byte AVX-512{BW,VL} vector modes. Supposed TARGET_AVX512BW baseline.
 (define_mode_iterator VI12_AVX512VL
-  [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
-   V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")])
+  [(V64QI "TARGET_EVEX512") (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
+   (V32HI "TARGET_EVEX512") (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")])
 
 (define_mode_iterator VI12HFBF_AVX512VL
   [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
@@ -302,13 +302,13 @@
 
 ;; All vector modes
 (define_mode_iterator V
-  [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
-   (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
-   (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
-   (V8DI "TARGET_AVX512F")  (V4DI "TARGET_AVX") V2DI
-   (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF
-   (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
-   (V8DF "TARGET_AVX512F")  (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
+  [(V64QI "TARGET_AVX512F && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI
+   (V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI
+   (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI
+   (V8DI "TARGET_AVX512F && TARGET_EVEX512")  (V4DI "TARGET_AVX") V2DI
+   (V32HF "TARGET_AVX512F && TARGET_EVEX512") (V16HF "TARGET_AVX") V8HF
+   (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
+   (V8DF "TARGET_AVX512F && TARGET_EVEX512")  (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
 
 ;; All 128bit vector modes
 (define_mode_iterator V_128
@@ -324,22 +324,32 @@
    V16HF V8HF V8SF V4SF V4DF V2DF])
 
 ;; All 512bit vector modes
-(define_mode_iterator V_512 [V64QI V32HI V16SI V8DI V16SF V8DF V32HF V32BF])
+(define_mode_iterator V_512
+  [(V64QI "TARGET_EVEX512") (V32HI "TARGET_EVEX512")
+   (V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")
+   (V16SF "TARGET_EVEX512") (V8DF "TARGET_EVEX512")
+   (V32HF "TARGET_EVEX512") (V32BF "TARGET_EVEX512")])
 
 ;; All 256bit and 512bit vector modes
 (define_mode_iterator V_256_512
   [V32QI V16HI V16HF V16BF V8SI V4DI V8SF V4DF
-   (V64QI "TARGET_AVX512F") (V32HI "TARGET_AVX512F") (V32HF "TARGET_AVX512F")
-   (V32BF "TARGET_AVX512F") (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
-   (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")])
+   (V64QI "TARGET_AVX512F && TARGET_EVEX512")
+   (V32HI "TARGET_AVX512F && TARGET_EVEX512")
+   (V32HF "TARGET_AVX512F && TARGET_EVEX512")
+   (V32BF "TARGET_AVX512F && TARGET_EVEX512")
+   (V16SI "TARGET_AVX512F && TARGET_EVEX512")
+   (V8DI "TARGET_AVX512F && TARGET_EVEX512")
+   (V16SF "TARGET_AVX512F && TARGET_EVEX512")
+   (V8DF "TARGET_AVX512F && TARGET_EVEX512")])
 
 ;; All vector float modes
 (define_mode_iterator VF
-  [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
-   (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
+  [(V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
+   (V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX")
+   (V2DF "TARGET_SSE2")])
 
 (define_mode_iterator VF1_VF2_AVX512DQ
-  [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
+  [(V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512DQ") (V4DF "TARGET_AVX512DQ && TARGET_AVX512VL")
    (V2DF "TARGET_AVX512DQ && TARGET_AVX512VL")])
 
@@ -347,14 +357,17 @@
   [(V32HF "TARGET_AVX512FP16")
    (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
-   (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
-   (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
+   (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
+   (V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX")
+   (V2DF "TARGET_SSE2")])
 
 ;; 128-, 256- and 512-bit float vector modes for bitwise operations
 (define_mode_iterator VFB
-  [(V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") (V8HF "TARGET_SSE2")
-   (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
-   (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
+  [(V32HF "TARGET_AVX512F && TARGET_EVEX512")
+   (V16HF "TARGET_AVX") (V8HF "TARGET_SSE2")
+   (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
+   (V8DF "TARGET_AVX512F && TARGET_EVEX512")
+   (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
 
 ;; 128- and 256-bit float vector modes
 (define_mode_iterator VF_128_256
@@ -369,10 +382,10 @@
 
 ;; All SFmode vector float modes
 (define_mode_iterator VF1
-  [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF])
+  [(V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF])
 
 (define_mode_iterator VF1_AVX2
-  [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX2") V4SF])
+  [(V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX2") V4SF])
 
 ;; 128- and 256-bit SF vector modes
 (define_mode_iterator VF1_128_256
@@ -383,24 +396,24 @@
 
 ;; All DFmode vector float modes
 (define_mode_iterator VF2
-  [(V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF])
+  [(V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX") V2DF])
 
 ;; All DFmode & HFmode vector float modes
 (define_mode_iterator VF2H
   [(V32HF "TARGET_AVX512FP16")
    (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
-   (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF])
+   (V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX") V2DF])
 
 ;; 128- and 256-bit DF vector modes
 (define_mode_iterator VF2_128_256
   [(V4DF "TARGET_AVX") V2DF])
 
 (define_mode_iterator VF2_512_256
-  [(V8DF "TARGET_AVX512F") V4DF])
+  [(V8DF "TARGET_AVX512F && TARGET_EVEX512") V4DF])
 
 (define_mode_iterator VF2_512_256VL
-  [V8DF (V4DF "TARGET_AVX512VL")])
+  [(V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL")])
 
 ;; All 128bit vector SF/DF modes
 (define_mode_iterator VF_128
@@ -417,30 +430,30 @@
 
 ;; All 512bit vector float modes
 (define_mode_iterator VF_512
-  [V16SF V8DF])
+  [(V16SF "TARGET_EVEX512") (V8DF "TARGET_EVEX512")])
 
 ;; All 512bit vector float modes for bitwise operations
 (define_mode_iterator VFB_512
-  [V32HF V16SF V8DF])
+  [(V32HF "TARGET_EVEX512") (V16SF "TARGET_EVEX512") (V8DF "TARGET_EVEX512")])
 
 (define_mode_iterator V4SF_V8HF
   [V4SF V8HF])
 
 (define_mode_iterator VI48_AVX512VL
-  [V16SI (V8SI  "TARGET_AVX512VL") (V4SI  "TARGET_AVX512VL")
-   V8DI  (V4DI  "TARGET_AVX512VL") (V2DI  "TARGET_AVX512VL")])
+  [(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
+   (V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
 
 (define_mode_iterator VI1248_AVX512VLBW
   [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX512VL && TARGET_AVX512BW")
    (V16QI "TARGET_AVX512VL && TARGET_AVX512BW")
    (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX512VL && TARGET_AVX512BW")
    (V8HI "TARGET_AVX512VL && TARGET_AVX512BW")
-   V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
-   V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
+   (V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
+   (V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
 
 (define_mode_iterator VF_AVX512VL
-  [V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
-   V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
+  [(V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
+   (V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
 ;; AVX512ER SF plus 128- and 256-bit SF vector modes
 (define_mode_iterator VF1_AVX512ER_128_256
@@ -450,14 +463,14 @@
   [(V32HF "TARGET_AVX512FP16")
    (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
-   V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
-   V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
+   (V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
+   (V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
 (define_mode_iterator VF2_AVX512VL
   [V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
 (define_mode_iterator VF1_AVX512VL
-  [V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")])
+  [(V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")])
 
 (define_mode_iterator VHFBF [V32HF V16HF V8HF V32BF V16BF V8BF])
 (define_mode_iterator VHFBF_256 [V16HF V16BF])
@@ -472,7 +485,8 @@
 
 ;; All vector integer modes
 (define_mode_iterator VI
-  [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
+  [(V16SI "TARGET_AVX512F && TARGET_EVEX512")
+   (V8DI "TARGET_AVX512F && TARGET_EVEX512")
    (V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
    (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
    (V8SI "TARGET_AVX") V4SI
@@ -480,7 +494,8 @@
 
 ;; All vector integer and HF modes
 (define_mode_iterator VIHFBF
-  [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
+  [(V16SI "TARGET_AVX512F && TARGET_EVEX512")
+   (V8DI "TARGET_AVX512F && TARGET_EVEX512")
    (V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
    (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
    (V8SI "TARGET_AVX") V4SI
@@ -491,8 +506,8 @@
 (define_mode_iterator VI_AVX2
   [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI
    (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI
-   (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX2") V4SI
-   (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX2") V2DI])
+   (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX2") V4SI
+   (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX2") V2DI])
 
 ;; All QImode vector integer modes
 (define_mode_iterator VI1
@@ -510,13 +525,13 @@
   (V8SI "TARGET_AVX") (V4DI "TARGET_AVX")])
 
 (define_mode_iterator VI8
-  [(V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI])
+  [(V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX") V2DI])
 
 (define_mode_iterator VI8_FVL
   [(V8DI "TARGET_AVX512F") V4DI (V2DI "TARGET_AVX512VL")])
 
 (define_mode_iterator VI8_AVX512VL
-  [V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
+  [(V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
 
 (define_mode_iterator VI8_256_512
   [V8DI (V4DI "TARGET_AVX512VL")])
@@ -544,7 +559,7 @@
   [(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI])
 
 (define_mode_iterator VI2_AVX512F
-  [(V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX2") V8HI])
+  [(V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI])
 
 (define_mode_iterator VI2_AVX512VNNIBW
   [(V32HI "TARGET_AVX512BW || TARGET_AVX512VNNI")
@@ -557,14 +572,15 @@
   [(V8SI "TARGET_AVX2") V4SI])
 
 (define_mode_iterator VI4_AVX512F
-  [(V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX2") V4SI])
+  [(V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX2") V4SI])
 
 (define_mode_iterator VI4_AVX512VL
-  [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")])
+  [(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")])
 
 (define_mode_iterator VI48_AVX512F_AVX512VL
-  [V4SI V8SI (V16SI "TARGET_AVX512F")
-   (V2DI "TARGET_AVX512VL") (V4DI "TARGET_AVX512VL") (V8DI "TARGET_AVX512F")])
+  [V4SI V8SI (V16SI "TARGET_AVX512F && TARGET_EVEX512")
+   (V2DI "TARGET_AVX512VL") (V4DI "TARGET_AVX512VL")
+   (V8DI "TARGET_AVX512F && TARGET_EVEX512")])
 
 (define_mode_iterator VI2_AVX512VL
   [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") V32HI])
@@ -589,21 +605,21 @@
   [(V4DI "TARGET_AVX2") V2DI])
 
 (define_mode_iterator VI8_AVX2_AVX512F
-  [(V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX2") V2DI])
+  [(V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX2") V2DI])
 
 (define_mode_iterator VI8_AVX_AVX512F
-  [(V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX")])
+  [(V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX")])
 
 (define_mode_iterator VI4_128_8_256
   [V4SI V4DI])
 
 ;; All V8D* modes
 (define_mode_iterator V8FI
-  [V8DF V8DI])
+  [(V8DF "TARGET_EVEX512") (V8DI "TARGET_EVEX512")])
 
 ;; All V16S* modes
 (define_mode_iterator V16FI
-  [V16SF V16SI])
+  [(V16SF "TARGET_EVEX512") (V16SI "TARGET_EVEX512")])
 
 ;; ??? We should probably use TImode instead.
 (define_mode_iterator VIMAX_AVX2_AVX512BW
@@ -630,8 +646,8 @@
 
 (define_mode_iterator VI124_AVX2_24_AVX512F_1_AVX512BW
   [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI
-   (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX2") V8HI
-   (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX2") V4SI])
+   (V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI
+   (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX2") V4SI])
 
 (define_mode_iterator VI124_AVX2
   [(V32QI "TARGET_AVX2") V16QI
@@ -648,8 +664,8 @@
   [(V32HI "TARGET_AVX512BW")
    (V16HI "TARGET_AVX512VL && TARGET_AVX512BW")
    (V8HI "TARGET_AVX512VL && TARGET_AVX512BW")
-   V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
-   V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
+   (V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
+   (V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
 
 (define_mode_iterator VI48_AVX2
   [(V8SI "TARGET_AVX2") V4SI
@@ -663,14 +679,15 @@
 (define_mode_iterator VI248_AVX2_8_AVX512F_24_AVX512BW
   [(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI
    (V16SI "TARGET_AVX512BW") (V8SI "TARGET_AVX2") V4SI
-   (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX2") V2DI])
+   (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX2") V2DI])
 
 (define_mode_iterator VI248_AVX512BW
-  [(V32HI "TARGET_AVX512BW") V16SI V8DI])
+  [(V32HI "TARGET_AVX512BW") (V16SI "TARGET_EVEX512")
+   (V8DI "TARGET_EVEX512")])
 
 (define_mode_iterator VI248_AVX512BW_AVX512VL
   [(V32HI "TARGET_AVX512BW") 
-   (V4DI "TARGET_AVX512VL") V16SI V8DI])
+   (V4DI "TARGET_AVX512VL") (V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")])
 
 ;; Suppose TARGET_AVX512VL as baseline
 (define_mode_iterator VI248_AVX512BW_1
@@ -684,16 +701,16 @@
   V4DI V2DI])
    
 (define_mode_iterator VI48_AVX512F
-  [(V16SI "TARGET_AVX512F") V8SI V4SI
-   (V8DI "TARGET_AVX512F") V4DI V2DI])
+  [(V16SI "TARGET_AVX512F && TARGET_EVEX512") V8SI V4SI
+   (V8DI "TARGET_AVX512F && TARGET_EVEX512") V4DI V2DI])
 
 (define_mode_iterator VI48_AVX_AVX512F
-  [(V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
-   (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI])
+  [(V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI
+   (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX") V2DI])
 
 (define_mode_iterator VI12_AVX_AVX512F
-  [ (V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
-    (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI])
+  [(V64QI "TARGET_AVX512F && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI
+   (V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI])
 
 (define_mode_iterator V48_128_256
   [V4SF V2DF
@@ -834,7 +851,8 @@
 (define_mode_iterator VI248_256 [V16HI V8SI V4DI])
 (define_mode_iterator VI248_512 [V32HI V16SI V8DI])
 (define_mode_iterator VI48_128 [V4SI V2DI])
-(define_mode_iterator VI148_512 [V64QI V16SI V8DI])
+(define_mode_iterator VI148_512
+  [(V64QI "TARGET_EVEX512") (V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")])
 (define_mode_iterator VI148_256 [V32QI V8SI V4DI])
 (define_mode_iterator VI148_128 [V16QI V4SI V2DI])
 
@@ -844,15 +862,18 @@
   [V32QI V16HI V8SI
    (V64QI "TARGET_AVX512BW")
    (V32HI "TARGET_AVX512BW")
-   (V16SI "TARGET_AVX512F")])
+   (V16SI "TARGET_AVX512F && TARGET_EVEX512")])
 (define_mode_iterator VI48_256 [V8SI V4DI])
-(define_mode_iterator VI48_512 [V16SI V8DI])
+(define_mode_iterator VI48_512
+  [(V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")])
 (define_mode_iterator VI4_256_8_512 [V8SI V8DI])
 (define_mode_iterator VI_AVX512BW
-  [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")])
+  [(V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")
+   (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")])
 (define_mode_iterator VIHFBF_AVX512BW
-  [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")
-  (V32HF "TARGET_AVX512BW") (V32BF "TARGET_AVX512BW")])
+  [(V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")
+   (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")
+   (V32HF "TARGET_AVX512BW") (V32BF "TARGET_AVX512BW")])
 
 ;; Int-float size matches
 (define_mode_iterator VI2F_256_512 [V16HI V32HI V16HF V32HF V16BF V32BF])
@@ -862,12 +883,15 @@
 (define_mode_iterator VI8F_256 [V4DI V4DF])
 (define_mode_iterator VI4F_256_512
   [V8SI V8SF
-   (V16SI "TARGET_AVX512F") (V16SF "TARGET_AVX512F")])
+   (V16SI "TARGET_AVX512F && TARGET_EVEX512")
+   (V16SF "TARGET_AVX512F && TARGET_EVEX512")])
 (define_mode_iterator VI48F_256_512
   [V8SI V8SF
-  (V16SI "TARGET_AVX512F") (V16SF "TARGET_AVX512F")
-  (V8DI  "TARGET_AVX512F") (V8DF  "TARGET_AVX512F")
-  (V4DI  "TARGET_AVX512VL") (V4DF  "TARGET_AVX512VL")])
+  (V16SI "TARGET_AVX512F && TARGET_EVEX512")
+  (V16SF "TARGET_AVX512F && TARGET_EVEX512")
+  (V8DI "TARGET_AVX512F && TARGET_EVEX512")
+  (V8DF "TARGET_AVX512F && TARGET_EVEX512")
+  (V4DI "TARGET_AVX512VL") (V4DF  "TARGET_AVX512VL")])
 (define_mode_iterator VF48_I1248
   [V16SI V16SF V8DI V8DF V32HI V64QI])
 (define_mode_iterator VF48H_AVX512VL
@@ -877,14 +901,17 @@
   [V2DF V4SF])
 
 (define_mode_iterator VI48F
-  [V16SI V16SF V8DI V8DF
+  [(V16SI "TARGET_EVEX512") (V16SF "TARGET_EVEX512")
+   (V8DI "TARGET_EVEX512") (V8DF "TARGET_EVEX512")
    (V8SI "TARGET_AVX512VL") (V8SF "TARGET_AVX512VL")
    (V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
    (V4SI "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
    (V2DI "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 (define_mode_iterator VI12_VI48F_AVX512VL
-  [(V16SI "TARGET_AVX512F") (V16SF "TARGET_AVX512F")
-   (V8DI "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
+  [(V16SI "TARGET_AVX512F && TARGET_EVEX512")
+   (V16SF "TARGET_AVX512F && TARGET_EVEX512")
+   (V8DI "TARGET_AVX512F && TARGET_EVEX512")
+   (V8DF "TARGET_AVX512F && TARGET_EVEX512")
    (V8SI "TARGET_AVX512VL") (V8SF "TARGET_AVX512VL")
    (V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
    (V4SI "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
@@ -901,7 +928,8 @@
 
 (define_mode_iterator V8_128 [V8HI V8HF V8BF])
 (define_mode_iterator V16_256 [V16HI V16HF V16BF])
-(define_mode_iterator V32_512 [V32HI V32HF V32BF])
+(define_mode_iterator V32_512
+ [(V32HI "TARGET_EVEX512") (V32HF "TARGET_EVEX512") (V32BF "TARGET_EVEX512")])
 
 ;; Mapping from float mode to required SSE level
 (define_mode_attr sse
@@ -1295,7 +1323,8 @@
 
 ;; Mix-n-match
 (define_mode_iterator AVX256MODE2P [V8SI V8SF V4DF])
-(define_mode_iterator AVX512MODE2P [V16SI V16SF V8DF])
+(define_mode_iterator AVX512MODE2P
+  [(V16SI "TARGET_EVEX512") (V16SF "TARGET_EVEX512") (V8DF "TARGET_EVEX512")])
 
 ;; Mapping for dbpsabbw modes
 (define_mode_attr dbpsadbwmode
@@ -1897,9 +1926,11 @@
 (define_mode_iterator STORENT_MODE
   [(DI "TARGET_SSE2 && TARGET_64BIT") (SI "TARGET_SSE2")
    (SF "TARGET_SSE4A") (DF "TARGET_SSE4A")
-   (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") (V2DI "TARGET_SSE2")
-   (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
-   (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
+   (V8DI "TARGET_AVX512F && TARGET_EVEX512")
+   (V4DI "TARGET_AVX") (V2DI "TARGET_SSE2")
+   (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
+   (V8DF "TARGET_AVX512F && TARGET_EVEX512")
+   (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
 
 (define_expand "storent<mode>"
   [(set (match_operand:STORENT_MODE 0 "memory_operand")
@@ -3377,9 +3408,11 @@
 (define_mode_iterator REDUC_PLUS_MODE
  [(V4DF "TARGET_AVX") (V8SF "TARGET_AVX")
   (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
-  (V8DF "TARGET_AVX512F") (V16SF "TARGET_AVX512F")
+  (V8DF "TARGET_AVX512F && TARGET_EVEX512")
+  (V16SF "TARGET_AVX512F && TARGET_EVEX512")
   (V32HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
-  (V32QI "TARGET_AVX") (V64QI "TARGET_AVX512F")])
+  (V32QI "TARGET_AVX")
+  (V64QI "TARGET_AVX512F && TARGET_EVEX512")])
 
 (define_expand "reduc_plus_scal_<mode>"
  [(plus:REDUC_PLUS_MODE
@@ -3423,9 +3456,11 @@
    (V8SF "TARGET_AVX") (V4DF "TARGET_AVX")
    (V64QI "TARGET_AVX512BW")
    (V32HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
-   (V32HI "TARGET_AVX512BW") (V16SI "TARGET_AVX512F")
-   (V8DI "TARGET_AVX512F") (V16SF "TARGET_AVX512F")
-   (V8DF "TARGET_AVX512F")])
+   (V32HI "TARGET_AVX512BW")
+   (V16SI "TARGET_AVX512F && TARGET_EVEX512")
+   (V8DI "TARGET_AVX512F && TARGET_EVEX512")
+   (V16SF "TARGET_AVX512F && TARGET_EVEX512")
+   (V8DF "TARGET_AVX512F && TARGET_EVEX512")])
 
 (define_expand "reduc_<code>_scal_<mode>"
   [(smaxmin:REDUC_SMINMAX_MODE
@@ -5035,7 +5070,7 @@
   output_asm_insn (buf, operands);
   return "";
 }
-  [(set_attr "isa" "noavx,avx,avx512vl,avx512f")
+  [(set_attr "isa" "noavx,avx,avx512vl,avx512f_512")
    (set_attr "type" "sselog")
    (set_attr "prefix" "orig,vex,evex,evex")
    (set (attr "mode")
@@ -5092,7 +5127,7 @@
   output_asm_insn (buf, operands);
   return "";
 }
-  [(set_attr "isa" "noavx,avx,avx512vl,avx512f")
+  [(set_attr "isa" "noavx,avx,avx512vl,avx512f_512")
    (set_attr "type" "sselog")
    (set (attr "prefix_data16")
      (if_then_else
@@ -5161,7 +5196,7 @@
   output_asm_insn (buf, operands);
   return "";
 }
-  [(set_attr "isa" "noavx,avx,avx512vl,avx512f")
+  [(set_attr "isa" "noavx,avx,avx512vl,avx512f_512")
    (set_attr "type" "sselog")
    (set_attr "prefix" "orig,vex,evex,evex")
    (set (attr "mode")
@@ -5223,7 +5258,7 @@
   output_asm_insn (buf, operands);
   return "";
 }
-  [(set_attr "isa" "noavx,avx,avx512vl,avx512f")
+  [(set_attr "isa" "noavx,avx,avx512vl,avx512f_512")
    (set_attr "type" "sselog")
    (set (attr "prefix_data16")
      (if_then_else
@@ -5269,8 +5304,8 @@
    (V2DF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL")
    (V8SF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL")
    (V4DF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL")
-   (V16SF "TARGET_AVX512F")
-   (V8DF "TARGET_AVX512F")
+   (V16SF "TARGET_AVX512F && TARGET_EVEX512")
+   (V8DF "TARGET_AVX512F && TARGET_EVEX512")
    (HF "TARGET_AVX512FP16")
    (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
@@ -5312,8 +5347,8 @@
   (V2DF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL")
   (V8SF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL")
   (V4DF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL")
-  (V16SF "TARGET_AVX512F")
-  (V8DF "TARGET_AVX512F")])
+  (V16SF "TARGET_AVX512F && TARGET_EVEX512")
+  (V8DF "TARGET_AVX512F && TARGET_EVEX512")])
 
 (define_mode_iterator FMAMODE
   [SF DF V4SF V2DF V8SF V4DF])
@@ -5387,8 +5422,10 @@
    (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (HF "TARGET_AVX512FP16")
-   SF V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
-   DF V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
+   SF (V16SF "TARGET_EVEX512")
+   (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
+   DF (V8DF "TARGET_EVEX512")
+   (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
 (define_insn "<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:VFH_SF_AVX512VL 0 "register_operand" "=v,v,v")
@@ -8028,7 +8065,7 @@
 	(unspec:V16SI
 	  [(match_operand:V16SF 1 "<round_nimm_predicate>" "<round_constraint>")]
 	  UNSPEC_FIX_NOTRUNC))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vcvtps2dq\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
@@ -8095,7 +8132,7 @@
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(any_fix:V16SI
 	  (match_operand:V16SF 1 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vcvttps2<fixsuffix>dq\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
@@ -8595,7 +8632,7 @@
 		       (const_int 2) (const_int 3)
 		       (const_int 4) (const_int 5)
 		       (const_int 6) (const_int 7)]))))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vcvtdq2pd\t{%t1, %0|%0, %t1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
@@ -8631,7 +8668,7 @@
 	(unspec:V8SI
 	  [(match_operand:V8DF 1 "<round_nimm_predicate>" "<round_constraint>")]
 	  UNSPEC_FIX_NOTRUNC))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vcvtpd2dq\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
@@ -8789,7 +8826,7 @@
   [(set (match_operand:V8SI 0 "register_operand" "=v")
 	(any_fix:V8SI
 	  (match_operand:V8DF 1 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vcvttpd2<fixsuffix>dq\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
@@ -9193,7 +9230,7 @@
   [(set (match_operand:V8SF 0 "register_operand" "=v")
 	(float_truncate:V8SF
 	  (match_operand:V8DF 1 "<round_nimm_predicate>" "<round_constraint>")))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vcvtpd2ps\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
@@ -9355,7 +9392,7 @@
 		       (const_int 2) (const_int 3)
 		       (const_int 4) (const_int 5)
 		       (const_int 6) (const_int 7)]))))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vcvtps2pd\t{%t1, %0|%0, %t1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
@@ -9540,7 +9577,7 @@
    (set (match_operand:V8DF 0 "register_operand")
 	(float_extend:V8DF
 	  (match_dup 2)))]
-"TARGET_AVX512F"
+"TARGET_AVX512F && TARGET_EVEX512"
 "operands[2] = gen_reg_rtx (V8SFmode);")
 
 (define_expand "vec_unpacks_lo_v4sf"
@@ -9678,7 +9715,7 @@
    (set (match_operand:V8DF 0 "register_operand")
 	(float:V8DF
 	  (match_dup 2)))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "operands[2] = gen_reg_rtx (V8SImode);")
 
 (define_expand "vec_unpacks_float_lo_v16si"
@@ -9690,7 +9727,7 @@
 		       (const_int 2) (const_int 3)
 		       (const_int 4) (const_int 5)
 		       (const_int 6) (const_int 7)]))))]
-  "TARGET_AVX512F")
+  "TARGET_AVX512F && TARGET_EVEX512")
 
 (define_expand "vec_unpacku_float_hi_v4si"
   [(set (match_dup 5)
@@ -9786,7 +9823,7 @@
 (define_expand "vec_unpacku_float_hi_v16si"
   [(match_operand:V8DF 0 "register_operand")
    (match_operand:V16SI 1 "register_operand")]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
 {
   REAL_VALUE_TYPE TWO32r;
   rtx k, x, tmp[4];
@@ -9835,7 +9872,7 @@
 (define_expand "vec_unpacku_float_lo_v16si"
   [(match_operand:V8DF 0 "register_operand")
    (match_operand:V16SI 1 "nonimmediate_operand")]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
 {
   REAL_VALUE_TYPE TWO32r;
   rtx k, x, tmp[3];
@@ -9929,7 +9966,7 @@
   [(match_operand:V16SI 0 "register_operand")
    (match_operand:V8DF 1 "nonimmediate_operand")
    (match_operand:V8DF 2 "nonimmediate_operand")]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
 {
   rtx r1, r2;
 
@@ -10044,7 +10081,7 @@
   [(match_operand:V16SI 0 "register_operand")
    (match_operand:V8DF 1 "nonimmediate_operand")
    (match_operand:V8DF 2 "nonimmediate_operand")]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
 {
   rtx r1, r2;
 
@@ -10237,7 +10274,7 @@
 		     (const_int 11) (const_int 27)
 		     (const_int 14) (const_int 30)
 		     (const_int 15) (const_int 31)])))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vunpckhps\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
@@ -10325,7 +10362,7 @@
 		     (const_int 9) (const_int 25)
 		     (const_int 12) (const_int 28)
 		     (const_int 13) (const_int 29)])))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vunpcklps\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
@@ -10465,7 +10502,7 @@
 		     (const_int 11) (const_int 11)
 		     (const_int 13) (const_int 13)
 		     (const_int 15) (const_int 15)])))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vmovshdup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
@@ -10518,7 +10555,7 @@
 		     (const_int 10) (const_int 10)
 		     (const_int 12) (const_int 12)
 		     (const_int 14) (const_int 14)])))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vmovsldup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
@@ -11429,7 +11466,8 @@
    (V8SF "32x4") (V8SI "32x4") (V4DF "64x2") (V4DI "64x2")])
 
 (define_mode_iterator AVX512_VEC
-  [(V8DF "TARGET_AVX512DQ") (V8DI "TARGET_AVX512DQ") V16SF V16SI])
+  [(V8DF "TARGET_AVX512DQ") (V8DI "TARGET_AVX512DQ")
+   (V16SF "TARGET_EVEX512") (V16SI "TARGET_EVEX512")])
 
 (define_expand "<extract_type>_vextract<shuffletype><extract_suf>_mask"
   [(match_operand:<ssequartermode> 0 "nonimmediate_operand")
@@ -11598,7 +11636,8 @@
   [(V16SF "32x8") (V16SI "32x8") (V8DF "64x4") (V8DI "64x4")])
 
 (define_mode_iterator AVX512_VEC_2
-  [(V16SF "TARGET_AVX512DQ") (V16SI "TARGET_AVX512DQ") V8DF V8DI])
+  [(V16SF "TARGET_AVX512DQ") (V16SI "TARGET_AVX512DQ")
+   (V8DF "TARGET_EVEX512") (V8DI "TARGET_EVEX512")])
 
 (define_expand "<extract_type_2>_vextract<shuffletype><extract_suf_2>_mask"
   [(match_operand:<ssehalfvecmode> 0 "nonimmediate_operand")
@@ -12155,7 +12194,8 @@
 		     (const_int 26) (const_int 27)
 		     (const_int 28) (const_int 29)
 		     (const_int 30) (const_int 31)])))]
-  "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+  "TARGET_AVX512F && TARGET_EVEX512
+   && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   if (TARGET_AVX512VL
       || REG_P (operands[0])
@@ -12203,7 +12243,7 @@
 		     (const_int 58) (const_int 59)
 		     (const_int 60) (const_int 61)
 		     (const_int 62) (const_int 63)])))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vextracti64x4\t{$0x1, %1, %0|%0, %1, 0x1}"
   [(set_attr "type" "sselog1")
    (set_attr "length_immediate" "1")
@@ -12299,13 +12339,13 @@
 (define_mode_iterator VEC_EXTRACT_MODE
   [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
    (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
-   (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
-   (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI
+   (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI
+   (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX") V2DI
    (V32HF "TARGET_AVX512BW") (V16HF "TARGET_AVX") V8HF
    (V32BF "TARGET_AVX512BW") (V16BF "TARGET_AVX") V8BF
-   (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
-   (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF
-   (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
+   (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
+   (V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX") V2DF
+   (V4TI "TARGET_AVX512F && TARGET_EVEX512") (V2TI "TARGET_AVX")])
 
 (define_expand "vec_extract<mode><ssescalarmodelower>"
   [(match_operand:<ssescalarmode> 0 "register_operand")
@@ -12347,7 +12387,7 @@
 		     (const_int 3) (const_int 11)
 		     (const_int 5) (const_int 13)
 		     (const_int 7) (const_int 15)])))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vunpckhpd\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
@@ -12461,7 +12501,7 @@
 		     (const_int 2) (const_int 10)
 		     (const_int 4) (const_int 12)
 		     (const_int 6) (const_int 14)])))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vmovddup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sselog1")
    (set_attr "prefix" "evex")
@@ -12477,7 +12517,7 @@
 		     (const_int 2) (const_int 10)
 		     (const_int 4) (const_int 12)
 		     (const_int 6) (const_int 14)])))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vunpcklpd\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
@@ -12689,7 +12729,7 @@
 	   (match_operand:SI 4 "const_0_to_255_operand")]
 	  UNSPEC_VTERNLOG))]
   "(<MODE_SIZE> == 64 || TARGET_AVX512VL
-    || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
+    || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256))
 /* Disallow embeded broadcast for vector HFmode since
    it's not real AVX512FP16 instruction.  */
   && (GET_MODE_SIZE (GET_MODE_INNER (<MODE>mode)) >= 4
@@ -12781,7 +12821,7 @@
 	    (match_operand:V 3 "regmem_or_bitnot_regmem_operand")
 	    (match_operand:V 4 "regmem_or_bitnot_regmem_operand"))))]
   "(<MODE_SIZE> == 64 || TARGET_AVX512VL
-    || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
+    || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256))
    && ix86_pre_reload_split ()
    && (rtx_equal_p (STRIP_UNARY (operands[1]),
 		    STRIP_UNARY (operands[4]))
@@ -12866,7 +12906,7 @@
 	    (match_operand:V 3 "regmem_or_bitnot_regmem_operand"))
 	  (match_operand:V 4 "regmem_or_bitnot_regmem_operand")))]
   "(<MODE_SIZE> == 64 || TARGET_AVX512VL
-    || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
+    || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256))
    && ix86_pre_reload_split ()
    && (rtx_equal_p (STRIP_UNARY (operands[1]),
 		    STRIP_UNARY (operands[4]))
@@ -12950,7 +12990,7 @@
 	    (match_operand:V 2 "regmem_or_bitnot_regmem_operand"))
 	  (match_operand:V 3 "regmem_or_bitnot_regmem_operand")))]
   "(<MODE_SIZE> == 64 || TARGET_AVX512VL
-    || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
+    || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256))
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"
@@ -13074,7 +13114,7 @@
    (match_operand:SI 3 "const_0_to_255_operand")
    (match_operand:V16SF 4 "register_operand")
    (match_operand:HI 5 "register_operand")]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
 {
   int mask = INTVAL (operands[3]);
   emit_insn (gen_avx512f_shufps512_1_mask (operands[0], operands[1], operands[2],
@@ -13261,7 +13301,7 @@
 		     (match_operand 16  "const_12_to_15_operand")
 		     (match_operand 17  "const_28_to_31_operand")
 		     (match_operand 18  "const_28_to_31_operand")])))]
-  "TARGET_AVX512F
+  "TARGET_AVX512F && TARGET_EVEX512
    && (INTVAL (operands[3]) == (INTVAL (operands[7]) - 4)
        && INTVAL (operands[4]) == (INTVAL (operands[8]) - 4)
        && INTVAL (operands[5]) == (INTVAL (operands[9]) - 4)
@@ -13296,7 +13336,7 @@
    (match_operand:SI 3 "const_0_to_255_operand")
    (match_operand:V8DF 4 "register_operand")
    (match_operand:QI 5 "register_operand")]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
 {
   int mask = INTVAL (operands[3]);
   emit_insn (gen_avx512f_shufpd512_1_mask (operands[0], operands[1], operands[2],
@@ -13326,7 +13366,7 @@
 		     (match_operand 8 "const_12_to_13_operand")
 		     (match_operand 9 "const_6_to_7_operand")
 		     (match_operand 10 "const_14_to_15_operand")])))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
 {
   int mask;
   mask = INTVAL (operands[3]);
@@ -13458,7 +13498,7 @@
 		     (const_int 3) (const_int 11)
 		     (const_int 5) (const_int 13)
 		     (const_int 7) (const_int 15)])))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vpunpckhqdq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
@@ -13508,7 +13548,7 @@
 		     (const_int 2) (const_int 10)
 		     (const_int 4) (const_int 12)
 		     (const_int 6) (const_int 14)])))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vpunpcklqdq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
@@ -13872,8 +13912,8 @@
    (set_attr "mode" "V2DF,DF,V8DF")
    (set (attr "enabled")
 	(cond [(eq_attr "alternative" "2")
-		 (symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL
-			      && !TARGET_PREFER_AVX256")
+		 (symbol_ref "TARGET_AVX512F && TARGET_EVEX512
+			      && !TARGET_AVX512VL && !TARGET_PREFER_AVX256")
 	       (match_test "<mask_avx512vl_condition>")
 	         (const_string "*")
 	      ]
@@ -13957,13 +13997,13 @@
   [(set (match_operand:PMOV_DST_MODE_1 0 "nonimmediate_operand")
 	(truncate:PMOV_DST_MODE_1
 	  (match_operand:<pmov_src_mode> 1 "register_operand")))]
-  "TARGET_AVX512F")
+  "TARGET_AVX512F && TARGET_EVEX512")
 
 (define_insn "*avx512f_<code><pmov_src_lower><mode>2"
   [(set (match_operand:PMOV_DST_MODE_1 0 "nonimmediate_operand" "=v,m")
 	(any_truncate:PMOV_DST_MODE_1
 	  (match_operand:<pmov_src_mode> 1 "register_operand" "v,v")))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vpmov<trunsuffix><pmov_suff_1>\t{%1, %0|%0, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "memory" "none,store")
@@ -14094,7 +14134,7 @@
 		     (const_int 2) (const_int 3)
 		     (const_int 4) (const_int 5)
 		     (const_int 6) (const_int 7)])))]
-  "TARGET_AVX512F && ix86_pre_reload_split ()"
+  "TARGET_AVX512F && TARGET_EVEX512 && ix86_pre_reload_split ()"
   "#"
   "&& 1"
   [(set (match_dup 0)
@@ -14110,7 +14150,7 @@
         (match_operand:<pmov_src_mode> 1 "register_operand" "v,v"))
       (match_operand:PMOV_DST_MODE_1 2 "nonimm_or_0_operand" "0C,0")
       (match_operand:<avx512fmaskmode> 3 "register_operand" "Yk,Yk")))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vpmov<trunsuffix><pmov_suff_1>\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "memory" "none,store")
@@ -14124,7 +14164,7 @@
         (match_operand:<pmov_src_mode> 1 "register_operand"))
       (match_dup 0)
       (match_operand:<avx512fmaskmode> 2 "register_operand")))]
-  "TARGET_AVX512F")
+  "TARGET_AVX512F && TARGET_EVEX512")
 
 (define_expand "truncv32hiv32qi2"
   [(set (match_operand:V32QI 0 "nonimmediate_operand")
@@ -15072,7 +15112,7 @@
   [(set (match_operand:V8QI 0 "register_operand")
 	(truncate:V8QI
 	    (match_operand:V8DI 1 "register_operand")))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
 {
   rtx op0 = gen_reg_rtx (V16QImode);
 
@@ -15092,7 +15132,7 @@
 			      (const_int 0) (const_int 0)
 			      (const_int 0) (const_int 0)
 			      (const_int 0) (const_int 0)])))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vpmov<trunsuffix>qb\t{%1, %0|%0, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
@@ -15102,7 +15142,7 @@
   [(set (match_operand:V8QI 0 "memory_operand" "=m")
 	(any_truncate:V8QI
 	  (match_operand:V8DI 1 "register_operand" "v")))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vpmov<trunsuffix>qb\t{%1, %0|%0, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "memory" "store")
@@ -15114,7 +15154,7 @@
 	(subreg:DI
 	  (any_truncate:V8QI
 	    (match_operand:V8DI 1 "register_operand")) 0))]
-  "TARGET_AVX512F && ix86_pre_reload_split ()"
+  "TARGET_AVX512F && TARGET_EVEX512 && ix86_pre_reload_split ()"
   "#"
   "&& 1"
   [(set (match_dup 0)
@@ -15138,7 +15178,7 @@
                           (const_int 0) (const_int 0)
                           (const_int 0) (const_int 0)
                           (const_int 0) (const_int 0)])))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vpmov<trunsuffix>qb\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
@@ -15159,7 +15199,7 @@
 			  (const_int 0) (const_int 0)
 			  (const_int 0) (const_int 0)
 			  (const_int 0) (const_int 0)])))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vpmov<trunsuffix>qb\t{%1, %0%{%2%}%{z%}|%0%{%2%}%{z%}, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
@@ -15172,7 +15212,7 @@
 	    (match_operand:V8DI 1 "register_operand" "v"))
 	(match_dup 0)
 	(match_operand:QI 2 "register_operand" "Yk")))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vpmov<trunsuffix>qb\t{%1, %0%{%2%}|%0%{%2%}, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "memory" "store")
@@ -15195,7 +15235,7 @@
 		       (const_int 4) (const_int 5)
 		       (const_int 6) (const_int 7)]))
 	  (match_operand:QI 2 "register_operand")) 0))]
-  "TARGET_AVX512F && ix86_pre_reload_split ()"
+  "TARGET_AVX512F && TARGET_EVEX512 && ix86_pre_reload_split ()"
   "#"
   "&& 1"
   [(set (match_dup 0)
@@ -15453,7 +15493,7 @@
                          (const_int 4) (const_int 6)
                          (const_int 8) (const_int 10)
                          (const_int 12) (const_int 14)])))))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "ix86_fixup_binary_operands_no_copy (MULT, V16SImode, operands);")
 
 (define_insn "*vec_widen_umult_even_v16si<mask_name>"
@@ -15473,7 +15513,8 @@
                          (const_int 4) (const_int 6)
                          (const_int 8) (const_int 10)
                          (const_int 12) (const_int 14)])))))]
-  "TARGET_AVX512F && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
+  "TARGET_AVX512F && TARGET_EVEX512
+   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "vpmuludq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sseimul")
    (set_attr "prefix" "evex")
@@ -15568,7 +15609,7 @@
                          (const_int 4) (const_int 6)
                          (const_int 8) (const_int 10)
                          (const_int 12) (const_int 14)])))))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "ix86_fixup_binary_operands_no_copy (MULT, V16SImode, operands);")
 
 (define_insn "*vec_widen_smult_even_v16si<mask_name>"
@@ -15588,7 +15629,8 @@
                          (const_int 4) (const_int 6)
                          (const_int 8) (const_int 10)
                          (const_int 12) (const_int 14)])))))]
-  "TARGET_AVX512F && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
+  "TARGET_AVX512F && TARGET_EVEX512
+   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "vpmuldq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sseimul")
    (set_attr "prefix" "evex")
@@ -17263,8 +17305,10 @@
    (V8SI "TARGET_AVX2") (V4DI "TARGET_AVX2")
    (V8SF "TARGET_AVX2") (V4DF "TARGET_AVX2")
    (V16HF "TARGET_AVX512FP16")
-   (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
-   (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
+   (V16SF "TARGET_AVX512F && TARGET_EVEX512")
+   (V8DF "TARGET_AVX512F && TARGET_EVEX512")
+   (V16SI "TARGET_AVX512F && TARGET_EVEX512")
+   (V8DI "TARGET_AVX512F && TARGET_EVEX512")
    (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512VBMI")
    (V32HF "TARGET_AVX512FP16")])
 
@@ -17293,7 +17337,7 @@
 {
   operands[2] = CONSTM1_RTX (<MODE>mode);
 
-  if (!TARGET_AVX512F)
+  if (!TARGET_AVX512F || (!TARGET_AVX512VL && !TARGET_EVEX512))
     operands[2] = force_reg (<MODE>mode, operands[2]);
 })
 
@@ -17302,6 +17346,7 @@
 	(xor:VI (match_operand:VI 1 "bcst_vector_operand"     " 0, m,Br")
 		(match_operand:VI 2 "vector_all_ones_operand" "BC,BC,BC")))]
   "TARGET_AVX512F
+   && (<MODE_SIZE> == 64 || TARGET_AVX512VL || TARGET_EVEX512)
    && (!<mask_applied>
        || <ssescalarmode>mode == SImode
        || <ssescalarmode>mode == DImode)"
@@ -17368,7 +17413,7 @@
 		(match_operand:VI 2 "vector_all_ones_operand" "BC,BC,BC")))
    (unspec [(match_operand:VI 3 "register_operand" "0,0,0")]
      UNSPEC_INSN_FALSE_DEP)]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && (<MODE_SIZE> == 64 || TARGET_AVX512VL || TARGET_EVEX512)"
 {
   if (TARGET_AVX512VL)
     return "vpternlog<ternlogsuffix>\t{$0x55, %1, %0, %0<mask_operand3>|%0<mask_operand3>, %0, %1, 0x55}";
@@ -17392,7 +17437,7 @@
 	  (not:<ssescalarmode>
 	    (match_operand:<ssescalarmode> 1 "nonimmediate_operand"))))]
   "<MODE_SIZE> == 64 || TARGET_AVX512VL
-   || (TARGET_AVX512F && !TARGET_PREFER_AVX256)"
+   || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)"
   [(set (match_dup 0)
 	(xor:VI48_AVX512F
 	  (vec_duplicate:VI48_AVX512F (match_dup 1))
@@ -17538,7 +17583,8 @@
 		 (symbol_ref "<MODE_SIZE> == 64 || TARGET_AVX512VL")
 	       (eq_attr "alternative" "4")
 		 (symbol_ref "<MODE_SIZE> == 64 || TARGET_AVX512VL
-			      || (TARGET_AVX512F && !TARGET_PREFER_AVX256)")
+			      || (TARGET_AVX512F && TARGET_EVEX512
+				  && !TARGET_PREFER_AVX256)")
 	      ]
 	      (const_string "*")))])
 
@@ -17582,7 +17628,7 @@
 	      (match_operand:<ssescalarmode> 1 "nonimmediate_operand")))
 	  (match_operand:VI 2 "vector_operand")))]
   "<MODE_SIZE> == 64 || TARGET_AVX512VL
-   || (TARGET_AVX512F && !TARGET_PREFER_AVX256)"
+   || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)"
   [(set (match_dup 3)
 	(vec_duplicate:VI (match_dup 1)))
    (set (match_dup 0)
@@ -17597,7 +17643,7 @@
 	      (match_operand:<ssescalarmode> 1 "nonimmediate_operand")))
 	  (match_operand:VI 2 "vector_operand")))]
   "<MODE_SIZE> == 64 || TARGET_AVX512VL
-   || (TARGET_AVX512F && !TARGET_PREFER_AVX256)"
+   || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)"
   [(set (match_dup 3)
 	(vec_duplicate:VI (match_dup 1)))
    (set (match_dup 0)
@@ -17883,7 +17929,7 @@
 	    (match_operand:VI 1 "bcst_vector_operand" "0,m,  0,vBr"))
 	  (match_operand:VI 2 "bcst_vector_operand"   "m,0,vBr,  0")))]
   "(<MODE_SIZE> == 64 || TARGET_AVX512VL
-    || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
+    || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256))
    && (register_operand (operands[1], <MODE>mode)
        || register_operand (operands[2], <MODE>mode))"
 {
@@ -17916,7 +17962,7 @@
 	    (match_operand:VI 1 "bcst_vector_operand" "%0, 0")
 	    (match_operand:VI 2 "bcst_vector_operand" " m,vBr"))))]
   "(<MODE_SIZE> == 64 || TARGET_AVX512VL
-    || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
+    || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256))
    && (register_operand (operands[1], <MODE>mode)
        || register_operand (operands[2], <MODE>mode))"
 {
@@ -17947,7 +17993,7 @@
 	  (not:VI (match_operand:VI 1 "bcst_vector_operand" "%0, 0"))
 	  (not:VI (match_operand:VI 2 "bcst_vector_operand" "m,vBr"))))]
   "(<MODE_SIZE> == 64 || TARGET_AVX512VL
-    || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
+    || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256))
    && (register_operand (operands[1], <MODE>mode)
        || register_operand (operands[2], <MODE>mode))"
 {
@@ -18669,7 +18715,7 @@
 		     (const_int 11) (const_int 27)
 		     (const_int 14) (const_int 30)
 		     (const_int 15) (const_int 31)])))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vpunpckhdq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
@@ -18724,7 +18770,7 @@
 		     (const_int 9) (const_int 25)
 		     (const_int 12) (const_int 28)
 		     (const_int 13) (const_int 29)])))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vpunpckldq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
@@ -19418,7 +19464,7 @@
    (match_operand:SI 2 "const_0_to_255_operand")
    (match_operand:V16SI 3 "register_operand")
    (match_operand:HI 4 "register_operand")]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
 {
   int mask = INTVAL (operands[2]);
   emit_insn (gen_avx512f_pshufd_1_mask (operands[0], operands[1],
@@ -19462,7 +19508,7 @@
 		     (match_operand 15 "const_12_to_15_operand")
 		     (match_operand 16 "const_12_to_15_operand")
 		     (match_operand 17 "const_12_to_15_operand")])))]
-  "TARGET_AVX512F
+  "TARGET_AVX512F && TARGET_EVEX512
    && INTVAL (operands[2]) + 4 == INTVAL (operands[6])
    && INTVAL (operands[3]) + 4 == INTVAL (operands[7])
    && INTVAL (operands[4]) + 4 == INTVAL (operands[8])
@@ -20315,7 +20361,7 @@
 	  (match_operand:V4TI 1 "register_operand" "v")
 	  (parallel
 	    [(match_operand:SI 2 "const_0_to_3_operand")])))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vextracti32x4\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "length_immediate" "1")
@@ -20323,7 +20369,7 @@
    (set_attr "mode" "XI")])
 
 (define_mode_iterator VEXTRACTI128_MODE
-  [(V4TI "TARGET_AVX512F") V2TI])
+  [(V4TI "TARGET_AVX512F && TARGET_EVEX512") V2TI])
 
 (define_split
   [(set (match_operand:TI 0 "nonimmediate_operand")
@@ -20346,7 +20392,8 @@
    && VECTOR_MODE_P (GET_MODE (operands[1]))
    && ((TARGET_SSE && GET_MODE_SIZE (GET_MODE (operands[1])) == 16)
        || (TARGET_AVX && GET_MODE_SIZE (GET_MODE (operands[1])) == 32)
-       || (TARGET_AVX512F && GET_MODE_SIZE (GET_MODE (operands[1])) == 64))
+       || (TARGET_AVX512F && TARGET_EVEX512
+	   && GET_MODE_SIZE (GET_MODE (operands[1])) == 64))
    && (<MODE>mode == SImode || TARGET_64BIT || MEM_P (operands[0]))"
   [(set (match_dup 0) (vec_select:SWI48x (match_dup 1)
 					 (parallel [(const_int 0)])))]
@@ -21994,8 +22041,9 @@
 (define_mode_iterator VI1248_AVX512VL_AVX512BW
   [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI
    (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI
-   (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX2") V4SI
-   (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
+   (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX2") V4SI
+   (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX512VL")
+   (V2DI "TARGET_AVX512VL")])
 
 (define_insn "*abs<mode>2"
   [(set (match_operand:VI1248_AVX512VL_AVX512BW 0 "register_operand" "=<v_Yw>")
@@ -22840,7 +22888,7 @@
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(any_extend:V16SI
 	  (match_operand:V16QI 1 "nonimmediate_operand" "vm")))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vpmov<extsuffix>bd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
@@ -22850,7 +22898,7 @@
   [(set (match_operand:V16SI 0 "register_operand")
 	(any_extend:V16SI
 	  (match_operand:V16QI 1 "nonimmediate_operand")))]
-  "TARGET_AVX512F")
+  "TARGET_AVX512F && TARGET_EVEX512")
 
 (define_insn "avx2_<code>v8qiv8si2<mask_name>"
   [(set (match_operand:V8SI 0 "register_operand" "=v")
@@ -22982,7 +23030,7 @@
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(any_extend:V16SI
 	  (match_operand:V16HI 1 "nonimmediate_operand" "vm")))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vpmov<extsuffix>wd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
@@ -22992,7 +23040,7 @@
   [(set (match_operand:V16SI 0 "register_operand")
 	(any_extend:V16SI
 	  (match_operand:V16HI 1 "nonimmediate_operand")))]
-  "TARGET_AVX512F")
+  "TARGET_AVX512F && TARGET_EVEX512")
 
 (define_insn_and_split "avx512f_zero_extendv16hiv16si2_1"
   [(set (match_operand:V32HI 0 "register_operand" "=v")
@@ -23002,7 +23050,7 @@
 	    (match_operand:V32HI 2 "const0_operand"))
 	  (match_parallel 3 "pmovzx_parallel"
 	    [(match_operand 4 "const_int_operand")])))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "#"
   "&& reload_completed"
   [(set (match_dup 0) (zero_extend:V16SI (match_dup 1)))]
@@ -23223,7 +23271,7 @@
 		       (const_int 2) (const_int 3)
 		       (const_int 4) (const_int 5)
 		       (const_int 6) (const_int 7)]))))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vpmov<extsuffix>bq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
@@ -23233,7 +23281,7 @@
   [(set (match_operand:V8DI 0 "register_operand" "=v")
 	(any_extend:V8DI
 	  (match_operand:V8QI 1 "memory_operand" "m")))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vpmov<extsuffix>bq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
@@ -23251,7 +23299,7 @@
 		       (const_int 2) (const_int 3)
 		       (const_int 4) (const_int 5)
 		       (const_int 6) (const_int 7)]))))]
-  "TARGET_AVX512F && ix86_pre_reload_split ()"
+  "TARGET_AVX512F && TARGET_EVEX512 && ix86_pre_reload_split ()"
   "#"
   "&& 1"
   [(set (match_dup 0)
@@ -23262,7 +23310,7 @@
   [(set (match_operand:V8DI 0 "register_operand")
 	(any_extend:V8DI
 	  (match_operand:V8QI 1 "nonimmediate_operand")))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
 {
   if (!MEM_P (operands[1]))
     {
@@ -23402,7 +23450,7 @@
   [(set (match_operand:V8DI 0 "register_operand" "=v")
 	(any_extend:V8DI
 	  (match_operand:V8HI 1 "nonimmediate_operand" "vm")))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vpmov<extsuffix>wq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
@@ -23412,7 +23460,7 @@
   [(set (match_operand:V8DI 0 "register_operand")
 	(any_extend:V8DI
 	  (match_operand:V8HI 1 "nonimmediate_operand")))]
-  "TARGET_AVX512F")
+  "TARGET_AVX512F && TARGET_EVEX512")
 
 (define_insn "avx2_<code>v4hiv4di2<mask_name>"
   [(set (match_operand:V4DI 0 "register_operand" "=v")
@@ -23538,7 +23586,7 @@
   [(set (match_operand:V8DI 0 "register_operand" "=v")
 	(any_extend:V8DI
 	  (match_operand:V8SI 1 "nonimmediate_operand" "vm")))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vpmov<extsuffix>dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
@@ -23552,7 +23600,7 @@
 	    (match_operand:V16SI 2 "const0_operand"))
 	  (match_parallel 3 "pmovzx_parallel"
 	    [(match_operand 4 "const_int_operand")])))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "#"
   "&& reload_completed"
   [(set (match_dup 0) (zero_extend:V8DI (match_dup 1)))]
@@ -23571,7 +23619,7 @@
 	    (match_operand:V16SI 3 "const0_operand"))
 	  (match_parallel 4 "pmovzx_parallel"
 	    [(match_operand 5 "const_int_operand")])))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "#"
   "&& reload_completed"
   [(set (match_dup 0) (zero_extend:V8DI (match_dup 1)))]
@@ -23583,7 +23631,7 @@
   [(set (match_operand:V8DI 0 "register_operand" "=v")
 	(any_extend:V8DI
 	  (match_operand:V8SI 1 "nonimmediate_operand" "vm")))]
-  "TARGET_AVX512F")
+  "TARGET_AVX512F && TARGET_EVEX512")
 
 (define_insn "avx2_<code>v4siv4di2<mask_name>"
   [(set (match_operand:V4DI 0 "register_operand" "=v")
@@ -23977,7 +24025,7 @@
   [(match_operand:V16SI 0 "register_operand")
    (match_operand:V16SF 1 "nonimmediate_operand")
    (match_operand:SI 2 "const_0_to_15_operand")]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
 {
   rtx tmp = gen_reg_rtx (V16SFmode);
   emit_insn (gen_avx512f_rndscalev16sf (tmp, operands[1], operands[2]));
@@ -25394,7 +25442,7 @@
 	(ashiftrt:V8DI
 	  (match_operand:V8DI 1 "register_operand")
 	  (match_operand:V8DI 2 "nonimmediate_operand")))]
-  "TARGET_AVX512F")
+  "TARGET_AVX512F && TARGET_EVEX512")
 
 (define_expand "vashrv4di3"
   [(set (match_operand:V4DI 0 "register_operand")
@@ -25485,7 +25533,7 @@
   [(set (match_operand:V16SI 0 "register_operand")
 	(ashiftrt:V16SI (match_operand:V16SI 1 "register_operand")
 		        (match_operand:V16SI 2 "nonimmediate_operand")))]
-  "TARGET_AVX512F")
+  "TARGET_AVX512F && TARGET_EVEX512")
 
 (define_expand "vashrv8si3"
   [(set (match_operand:V8SI 0 "register_operand")
@@ -26058,8 +26106,8 @@
 (define_mode_attr pbroadcast_evex_isa
   [(V64QI "avx512bw") (V32QI "avx512bw") (V16QI "avx512bw")
    (V32HI "avx512bw") (V16HI "avx512bw") (V8HI "avx512bw")
-   (V16SI "avx512f") (V8SI "avx512f") (V4SI "avx512f")
-   (V8DI "avx512f") (V4DI "avx512f") (V2DI "avx512f")
+   (V16SI "avx512f_512") (V8SI "avx512f") (V4SI "avx512f")
+   (V8DI "avx512f_512") (V4DI "avx512f") (V2DI "avx512f")
    (V32HF "avx512bw") (V16HF "avx512bw") (V8HF "avx512bw")
    (V32BF "avx512bw") (V16BF "avx512bw") (V8BF "avx512bw")])
 
@@ -26602,7 +26650,7 @@
    (set (attr "enabled")
 	(if_then_else (eq_attr "alternative" "1")
 		      (symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL
-				   && !TARGET_PREFER_AVX256")
+				   && TARGET_EVEX512 && !TARGET_PREFER_AVX256")
 		      (const_string "*")))])
 
 (define_insn "*vec_dupv4si"
@@ -26630,7 +26678,7 @@
    (set (attr "enabled")
 	(if_then_else (eq_attr "alternative" "1")
 		      (symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL
-				   && !TARGET_PREFER_AVX256")
+				   && TARGET_EVEX512 && !TARGET_PREFER_AVX256")
 		      (const_string "*")))])
 
 (define_insn "*vec_dupv2di"
@@ -26661,7 +26709,8 @@
 	(if_then_else
 	  (eq_attr "alternative" "2")
 	  (symbol_ref "TARGET_AVX512VL
-		       || (TARGET_AVX512F && !TARGET_PREFER_AVX256)")
+		       || (TARGET_AVX512F && TARGET_EVEX512
+			   && !TARGET_PREFER_AVX256)")
 	  (const_string "*")))])
 
 (define_insn "avx2_vbroadcasti128_<mode>"
@@ -26741,7 +26790,7 @@
   [(set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "maybe_evex")
-   (set_attr "isa" "avx2,noavx2,avx2,avx512f,noavx2")
+   (set_attr "isa" "avx2,noavx2,avx2,avx512f_512,noavx2")
    (set_attr "mode" "<sseinsnmode>,V8SF,<sseinsnmode>,<sseinsnmode>,V8SF")])
 
 (define_split
@@ -26908,7 +26957,8 @@
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_mode_iterator VPERMI2
-  [V16SI V16SF V8DI V8DF
+  [(V16SI "TARGET_EVEX512") (V16SF "TARGET_EVEX512")
+   (V8DI "TARGET_EVEX512") (V8DF "TARGET_EVEX512")
    (V8SI "TARGET_AVX512VL") (V8SF "TARGET_AVX512VL")
    (V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
    (V4SI "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
@@ -26919,7 +26969,7 @@
    (V16QI "TARGET_AVX512VBMI && TARGET_AVX512VL")])
 
 (define_mode_iterator VPERMI2I
-  [V16SI V8DI
+  [(V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")
    (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
    (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")
    (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX512BW && TARGET_AVX512VL")
@@ -27543,28 +27593,29 @@
 
 ;; Modes handled by vec_init expanders.
 (define_mode_iterator VEC_INIT_MODE
-  [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
-   (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
-   (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
-   (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI
-   (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF
-   (V32BF "TARGET_AVX512F") (V16BF "TARGET_AVX") V8BF
-   (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
-   (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")
-   (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
+  [(V64QI "TARGET_AVX512F && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI
+   (V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI
+   (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI
+   (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX") V2DI
+   (V32HF "TARGET_AVX512F && TARGET_EVEX512") (V16HF "TARGET_AVX") V8HF
+   (V32BF "TARGET_AVX512F && TARGET_EVEX512") (V16BF "TARGET_AVX") V8BF
+   (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
+   (V8DF "TARGET_AVX512F && TARGET_EVEX512")
+   (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")
+   (V4TI "TARGET_AVX512F && TARGET_EVEX512") (V2TI "TARGET_AVX")])
 
 ;; Likewise, but for initialization from half sized vectors.
 ;; Thus, these are all VEC_INIT_MODE modes except V2??.
 (define_mode_iterator VEC_INIT_HALF_MODE
-  [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
-   (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
-   (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
-   (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX")
-   (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF
-   (V32BF "TARGET_AVX512F") (V16BF "TARGET_AVX") V8BF
-   (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
-   (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX")
-   (V4TI "TARGET_AVX512F")])
+  [(V64QI "TARGET_AVX512F && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI
+   (V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI
+   (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI
+   (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX")
+   (V32HF "TARGET_AVX512F && TARGET_EVEX512") (V16HF "TARGET_AVX") V8HF
+   (V32BF "TARGET_AVX512F && TARGET_EVEX512") (V16BF "TARGET_AVX") V8BF
+   (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
+   (V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX")
+   (V4TI "TARGET_AVX512F && TARGET_EVEX512")])
 
 (define_expand "vec_init<mode><ssescalarmodelower>"
   [(match_operand:VEC_INIT_MODE 0 "register_operand")
@@ -27817,7 +27868,7 @@
 	(unspec:V16SF
 	  [(match_operand:V16HI 1 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")]
 	  UNSPEC_VCVTPH2PS))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vcvtph2ps\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
@@ -27907,7 +27958,7 @@
 	   UNSPEC_VCVTPS2PH)
 	 (match_operand:V16HI 3 "nonimm_or_0_operand")
 	 (match_operand:HI 4 "register_operand")))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
 {
   int round = INTVAL (operands[2]);
   /* Separate {sae} from rounding control imm,
@@ -27926,7 +27977,7 @@
 	  [(match_operand:V16SF 1 "register_operand" "v")
 	   (match_operand:SI 2 "const_0_to_255_operand")]
 	  UNSPEC_VCVTPS2PH))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vcvtps2ph\t{%2, <round_saeonly_mask_op3>%1, %0<mask_operand3>|%0<mask_operand3>, %1<round_saeonly_mask_op3>, %2}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
@@ -27938,7 +27989,7 @@
 	  [(match_operand:V16SF 1 "register_operand" "v")
 	   (match_operand:SI 2 "const_0_to_255_operand")]
 	  UNSPEC_VCVTPS2PH))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && TARGET_EVEX512"
   "vcvtps2ph\t{%2, %1, %0<merge_mask_operand3>|%0<merge_mask_operand3>, %1, %2}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
@@ -30285,10 +30336,10 @@
 ;;	vinserti64x4	$0x1, %ymm15, %zmm15, %zmm15
 
 (define_mode_iterator INT_BROADCAST_MODE
-  [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
-   (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
-   (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
-   (V8DI "TARGET_AVX512F && TARGET_64BIT")
+  [(V64QI "TARGET_AVX512F && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI
+   (V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI
+   (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI
+   (V8DI "TARGET_AVX512F && TARGET_EVEX512 && TARGET_64BIT")
    (V4DI "TARGET_AVX && TARGET_64BIT") (V2DI "TARGET_64BIT")])
 
 ;; Broadcast from an integer.  NB: Enable broadcast only if we can move
diff --git a/gcc/testsuite/gcc.target/i386/pr89229-5b.c b/gcc/testsuite/gcc.target/i386/pr89229-5b.c
index 261f2e12e8d..8a81585e790 100644
--- a/gcc/testsuite/gcc.target/i386/pr89229-5b.c
+++ b/gcc/testsuite/gcc.target/i386/pr89229-5b.c
@@ -3,4 +3,4 @@
 
 #include "pr89229-5a.c"
 
-/* { dg-final { scan-assembler-times "vmovdqa32\[^\n\r]*zmm1\[67]\[^\n\r]*zmm1\[67]" 1 } } */
+/* { dg-final { scan-assembler-times "vmovsd\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr89229-6b.c b/gcc/testsuite/gcc.target/i386/pr89229-6b.c
index a74f7169e6e..0c27daa4f74 100644
--- a/gcc/testsuite/gcc.target/i386/pr89229-6b.c
+++ b/gcc/testsuite/gcc.target/i386/pr89229-6b.c
@@ -3,4 +3,4 @@
 
 #include "pr89229-6a.c"
 
-/* { dg-final { scan-assembler-times "vmovaps\[^\n\r]*zmm1\[67]\[^\n\r]*zmm1\[67]" 1 } } */
+/* { dg-final { scan-assembler-times "vmovss\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr89229-7b.c b/gcc/testsuite/gcc.target/i386/pr89229-7b.c
index d3a56e6e2b7..baba99ec775 100644
--- a/gcc/testsuite/gcc.target/i386/pr89229-7b.c
+++ b/gcc/testsuite/gcc.target/i386/pr89229-7b.c
@@ -3,4 +3,4 @@
 
 #include "pr89229-7a.c"
 
-/* { dg-final { scan-assembler-times "vmovdqa32\[^\n\r]*zmm1\[67]\[^\n\r]*zmm1\[67]" 1 } } */
+/* { dg-final { scan-assembler-times "vmovss\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" 1 } } */
-- 
2.31.1
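
To make the iterator gating above concrete, here is a minimal sketch;
it is not part of the patch, and the file name and compile line are
hypothetical. One of the patterns gated above is the V16SF->V16SI
vcvtps2dq conversion, so with AVX512F enabled but EVEX512 disabled the
512-bit intrinsic below is expected to be rejected, while the xmm/ymm
forms stay available:

/* Sketch only; assumes -mno-evex512 behaves as described in this
   series.  Hypothetical compile line:
     gcc -O2 -mavx512f -mno-evex512 cvt.c  */
#include <immintrin.h>

__m512i
cvt_ps_to_epi32 (__m512 x)
{
  /* V16SF -> V16SI vcvtps2dq, now gated on TARGET_EVEX512.  */
  return _mm512_cvtps_epi32 (x);
}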



* [PATCH 14/18] Support -mevex512 for AVX512DQ intrins
  2023-09-21  7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
                   ` (12 preceding siblings ...)
  2023-09-21  7:20 ` [PATCH 13/18] Support -mevex512 for AVX512F intrins Hu, Lin1
@ 2023-09-21  7:20 ` Hu, Lin1
  2023-09-21  7:20 ` [PATCH 15/18] Support -mevex512 for AVX512BW intrins Hu, Lin1
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21  7:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang

From: Haochen Jiang <haochen.jiang@intel.com>

gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_expand_sse2_mulvxdi3):
	Add TARGET_EVEX512 for 512 bit usage.
	* config/i386/i386.cc (standard_sse_constant_opcode): Ditto.
	* config/i386/sse.md (VF1_VF2_AVX512DQ): Ditto.
	(VF1_128_256VL): Ditto.
	(VF2_AVX512VL): Ditto.
	(VI8_256_512): Ditto.
	(<mask_codefor>fixuns_trunc<mode><sseintvecmodelower>2<mask_name>):
	Ditto.
	(AVX512_VEC): Ditto.
	(AVX512_VEC_2): Ditto.
	(VI4F_BRCST32x2): Ditto.
	(VI8F_BRCST64x2): Ditto.
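
As a quick illustration of the ix86_expand_sse2_mulvxdi3 change, a
minimal sketch follows; it is not part of the patch, and the compile
line is hypothetical.  The AVX512DQ vpmullq path for V8DI is expected
to require EVEX512, while the VL forms are unaffected:

/* Sketch only.  Hypothetical compile line:
     gcc -O2 -mavx512dq -mavx512vl -mno-evex512 mulq.c  */
#include <immintrin.h>

__m512i
mul_q512 (__m512i a, __m512i b)
{
  /* vpmullq with zmm operands: needs AVX512DQ plus EVEX512.  */
  return _mm512_mullo_epi64 (a, b);
}

__m256i
mul_q256 (__m256i a, __m256i b)
{
  /* AVX512DQ+VL ymm form, still available under -mno-evex512.  */
  return _mm256_mullo_epi64 (a, b);
}
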
---
 gcc/config/i386/i386-expand.cc |  2 +-
 gcc/config/i386/i386.cc        | 22 ++++++++++++++++------
 gcc/config/i386/sse.md         | 24 ++++++++++++++----------
 3 files changed, 31 insertions(+), 17 deletions(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 0705e08d38c..063561e1265 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -24008,7 +24008,7 @@ ix86_expand_sse2_mulvxdi3 (rtx op0, rtx op1, rtx op2)
   machine_mode mode = GET_MODE (op0);
   rtx t1, t2, t3, t4, t5, t6;
 
-  if (TARGET_AVX512DQ && mode == V8DImode)
+  if (TARGET_AVX512DQ && TARGET_EVEX512 && mode == V8DImode)
     emit_insn (gen_avx512dq_mulv8di3 (op0, op1, op2));
   else if (TARGET_AVX512DQ && TARGET_AVX512VL && mode == V4DImode)
     emit_insn (gen_avx512dq_mulv4di3 (op0, op1, op2));
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 635dd85e764..589b29a324d 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -5332,9 +5332,14 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands)
 	  if (EXT_REX_SSE_REG_P (operands[0]))
 	    {
 	      if (TARGET_AVX512DQ)
-		return (TARGET_AVX512VL
-			? "vxorpd\t%x0, %x0, %x0"
-			: "vxorpd\t%g0, %g0, %g0");
+		{
+		  if (TARGET_AVX512VL)
+		    return "vxorpd\t%x0, %x0, %x0";
+		  else if (TARGET_EVEX512)
+		    return "vxorpd\t%g0, %g0, %g0";
+		  else
+		    gcc_unreachable ();
+		}
 	      else
 		{
 		  if (TARGET_AVX512VL)
@@ -5356,9 +5361,14 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands)
 	  if (EXT_REX_SSE_REG_P (operands[0]))
 	    {
 	      if (TARGET_AVX512DQ)
-		return (TARGET_AVX512VL
-			? "vxorps\t%x0, %x0, %x0"
-			: "vxorps\t%g0, %g0, %g0");
+		{
+		  if (TARGET_AVX512VL)
+		    return "vxorps\t%x0, %x0, %x0";
+		  else if (TARGET_EVEX512)
+		    return "vxorps\t%g0, %g0, %g0";
+		  else
+		    gcc_unreachable ();
+		}
 	      else
 		{
 		  if (TARGET_AVX512VL)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 8d1b75b43e0..a8f93ceddc5 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -350,7 +350,8 @@
 
 (define_mode_iterator VF1_VF2_AVX512DQ
   [(V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
-   (V8DF "TARGET_AVX512DQ") (V4DF "TARGET_AVX512DQ && TARGET_AVX512VL")
+   (V8DF "TARGET_AVX512DQ && TARGET_EVEX512")
+   (V4DF "TARGET_AVX512DQ && TARGET_AVX512VL")
    (V2DF "TARGET_AVX512DQ && TARGET_AVX512VL")])
 
 (define_mode_iterator VFH
@@ -392,7 +393,7 @@
   [(V8SF "TARGET_AVX") V4SF])
 
 (define_mode_iterator VF1_128_256VL
-  [V8SF (V4SF "TARGET_AVX512VL")])
+  [(V8SF "TARGET_EVEX512") (V4SF "TARGET_AVX512VL")])
 
 ;; All DFmode vector float modes
 (define_mode_iterator VF2
@@ -467,7 +468,7 @@
    (V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
 (define_mode_iterator VF2_AVX512VL
-  [V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
+  [(V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
 (define_mode_iterator VF1_AVX512VL
   [(V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")])
@@ -534,7 +535,7 @@
   [(V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
 
 (define_mode_iterator VI8_256_512
-  [V8DI (V4DI "TARGET_AVX512VL")])
+  [(V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL")])
 
 (define_mode_iterator VI1_AVX2
   [(V32QI "TARGET_AVX2") V16QI])
@@ -9075,7 +9076,7 @@
 (define_insn "<mask_codefor>fixuns_trunc<mode><sseintvecmodelower>2<mask_name>"
   [(set (match_operand:<sseintvecmode> 0 "register_operand" "=v")
 	(unsigned_fix:<sseintvecmode>
-	  (match_operand:VF1_128_256VL 1 "nonimmediate_operand" "vm")))]
+	  (match_operand:VF1_128_256 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512VL"
   "vcvttps2udq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
@@ -11466,7 +11467,8 @@
    (V8SF "32x4") (V8SI "32x4") (V4DF "64x2") (V4DI "64x2")])
 
 (define_mode_iterator AVX512_VEC
-  [(V8DF "TARGET_AVX512DQ") (V8DI "TARGET_AVX512DQ")
+  [(V8DF "TARGET_AVX512DQ && TARGET_EVEX512")
+   (V8DI "TARGET_AVX512DQ && TARGET_EVEX512")
    (V16SF "TARGET_EVEX512") (V16SI "TARGET_EVEX512")])
 
 (define_expand "<extract_type>_vextract<shuffletype><extract_suf>_mask"
@@ -11636,7 +11638,8 @@
   [(V16SF "32x8") (V16SI "32x8") (V8DF "64x4") (V8DI "64x4")])
 
 (define_mode_iterator AVX512_VEC_2
-  [(V16SF "TARGET_AVX512DQ") (V16SI "TARGET_AVX512DQ")
+  [(V16SF "TARGET_AVX512DQ && TARGET_EVEX512")
+   (V16SI "TARGET_AVX512DQ && TARGET_EVEX512")
    (V8DF "TARGET_EVEX512") (V8DI "TARGET_EVEX512")])
 
 (define_expand "<extract_type_2>_vextract<shuffletype><extract_suf_2>_mask"
@@ -26850,8 +26853,8 @@
 
 ;; For broadcast[i|f]32x2.  Yes there is no v4sf version, only v4si.
 (define_mode_iterator VI4F_BRCST32x2
-  [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
-   V16SF (V8SF "TARGET_AVX512VL")])
+  [(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
+   (V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL")])
 
 (define_mode_attr 64x2mode
   [(V8DF "V2DF") (V8DI "V2DI") (V4DI "V2DI") (V4DF "V2DF")])
@@ -26901,7 +26904,8 @@
 
 ;; For broadcast[i|f]64x2
 (define_mode_iterator VI8F_BRCST64x2
-  [V8DI V8DF (V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")])
+  [(V8DI "TARGET_EVEX512") (V8DF "TARGET_EVEX512")
+   (V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")])
 
 (define_insn "<mask_codefor>avx512dq_broadcast<mode><mask_name>_1"
   [(set (match_operand:VI8F_BRCST64x2 0 "register_operand" "=v,v")
-- 
2.31.1



* [PATCH 15/18] Support -mevex512 for AVX512BW intrins
  2023-09-21  7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
                   ` (13 preceding siblings ...)
  2023-09-21  7:20 ` [PATCH 14/18] Support -mevex512 for AVX512DQ intrins Hu, Lin1
@ 2023-09-21  7:20 ` Hu, Lin1
  2023-09-21  7:20 ` [PATCH 16/18] Support -mevex512 for AVX512{IFMA,VBMI,VNNI,BF16,VPOPCNTDQ,VBMI2,BITALG,VP2INTERSECT},VAES,GFNI,VPCLMULQDQ intrins Hu, Lin1
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21  7:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang

From: Haochen Jiang <haochen.jiang@intel.com>
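
As a rough illustration (not part of the patch itself): after this
change, 64-bit mask register operations are gated on EVEX512, so a
64-bit mask intrinsic that only needed -mavx512bw before should now
need -mevex512 as well.  A minimal sketch, with flags assumed from
the series:

#include <immintrin.h>

__mmask64
negate_mask64 (__mmask64 m)
{
  /* Accepted with -mavx512bw; with -mavx512bw -mno-evex512 this
     should be rejected, since DImode mask operations are disabled.  */
  return _knot_mask64 (m);
}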

gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate):
	Make sure EVEX512 is enabled.
	(ix86_expand_vecop_qihi2): Refuse V32QI->V32HI when no EVEX512.
	* config/i386/i386.cc (ix86_hard_regno_mode_ok): Disable 64 bit mask
	when !TARGET_EVEX512.
	* config/i386/i386.md (avx512bw_512): New.
	(SWI1248_AVX512BWDQ_64): Add TARGET_EVEX512.
	(*zero_extendsidi2): Change isa to avx512bw_512.
	(kmov_isa): Ditto.
	(*anddi_1): Ditto.
	(*andn<mode>_1): Change isa to kmov_isa.
	(*<code><mode>_1): Ditto.
	(*notxor<mode>_1): Ditto.
	(*one_cmpl<mode>2_1): Ditto.
	(*one_cmplsi2_1_zext): Change isa to avx512bw_512.
	(*ashl<mode>3_1): Change isa to kmov_isa.
	(*lshr<mode>3_1): Ditto.
	* config/i386/sse.md (VI12HFBF_AVX512VL): Add TARGET_EVEX512.
	(VI1248_AVX512VLBW): Ditto.
	(VHFBF_AVX512VL): Ditto.
	(VI): Ditto.
	(VIHFBF): Ditto.
	(VI_AVX2): Ditto.
	(VI1_AVX512): Ditto.
	(VI12_256_512_AVX512VL): Ditto.
	(VI2_AVX2_AVX512BW): Ditto.
	(VI2_AVX512VNNIBW): Ditto.
	(VI2_AVX512VL): Ditto.
	(VI2HFBF_AVX512VL): Ditto.
	(VI8_AVX2_AVX512BW): Ditto.
	(VIMAX_AVX2_AVX512BW): Ditto.
	(VIMAX_AVX512VL): Ditto.
	(VI12_AVX2_AVX512BW): Ditto.
	(VI124_AVX2_24_AVX512F_1_AVX512BW): Ditto.
	(VI248_AVX512VL): Ditto.
	(VI248_AVX512VLBW): Ditto.
	(VI248_AVX2_8_AVX512F_24_AVX512BW): Ditto.
	(VI248_AVX512BW): Ditto.
	(VI248_AVX512BW_AVX512VL): Ditto.
	(VI248_512): Ditto.
	(VI124_256_AVX512F_AVX512BW): Ditto.
	(VI_AVX512BW): Ditto.
	(VIHFBF_AVX512BW): Ditto.
	(SWI1248_AVX512BWDQ): Ditto.
	(SWI1248_AVX512BW): Ditto.
	(SWI1248_AVX512BWDQ2): Ditto.
	(*knotsi_1_zext): Ditto.
	(define_split for zero_extend + not): Ditto.
	(kunpckdi): Ditto.
	(REDUC_SMINMAX_MODE): Ditto.
	(VEC_EXTRACT_MODE): Ditto.
	(*avx512bw_permvar_truncv16siv16hi_1): Ditto.
	(*avx512bw_permvar_truncv16siv16hi_1_hf): Ditto.
	(truncv32hiv32qi2): Ditto.
	(avx512bw_<code>v32hiv32qi2): Ditto.
	(avx512bw_<code>v32hiv32qi2_mask): Ditto.
	(avx512bw_<code>v32hiv32qi2_mask_store): Ditto.
	(usadv64qi): Ditto.
	(VEC_PERM_AVX2): Ditto.
	(AVX512ZEXTMASK): Ditto.
	(SWI24_MASK): New.
	(vec_pack_trunc_<mode>): Change iterator to SWI24_MASK.
	(avx512bw_packsswb<mask_name>): Add TARGET_EVEX512.
	(avx512bw_packssdw<mask_name>): Ditto.
	(avx512bw_interleave_highv64qi<mask_name>): Ditto.
	(avx512bw_interleave_lowv64qi<mask_name>): Ditto.
	(<mask_codefor>avx512bw_pshuflwv32hi<mask_name>): Ditto.
	(<mask_codefor>avx512bw_pshufhwv32hi<mask_name>): Ditto.
	(vec_unpacks_lo_di): Ditto.
	(SWI48x_MASK): New.
	(vec_unpacks_hi_<mode>): Change iterator to SWI48x_MASK.
	(avx512bw_umulhrswv32hi3<mask_name>): Add TARGET_EVEX512.
	(VI1248_AVX512VL_AVX512BW): Ditto.
	(avx512bw_<code>v32qiv32hi2<mask_name>): Ditto.
	(*avx512bw_zero_extendv32qiv32hi2_1): Ditto.
	(*avx512bw_zero_extendv32qiv32hi2_2): Ditto.
	(<insn>v32qiv32hi2): Ditto.
	(pbroadcast_evex_isa): Change isa attribute to avx512bw_512.
	(VPERMI2): Add TARGET_EVEX512.
	(VPERMI2I): Ditto.
---
 gcc/config/i386/i386-expand.cc |   3 +-
 gcc/config/i386/i386.cc        |   4 +-
 gcc/config/i386/i386.md        |  54 ++++-----
 gcc/config/i386/sse.md         | 193 ++++++++++++++++++---------------
 4 files changed, 128 insertions(+), 126 deletions(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 063561e1265..ff2423f91ed 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -15617,6 +15617,7 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
     case E_V32HFmode:
     case E_V32BFmode:
     case E_V64QImode:
+      gcc_assert (TARGET_EVEX512);
       if (TARGET_AVX512BW)
 	return ix86_vector_duplicate_value (mode, target, val);
       else
@@ -23512,7 +23513,7 @@ ix86_expand_vecop_qihi2 (enum rtx_code code, rtx dest, rtx op1, rtx op2)
   bool uns_p = code != ASHIFTRT;
 
   if ((qimode == V16QImode && !TARGET_AVX2)
-      || (qimode == V32QImode && !TARGET_AVX512BW)
+      || (qimode == V32QImode && (!TARGET_AVX512BW || !TARGET_EVEX512))
       /* There are no V64HImode instructions.  */
       || qimode == V64QImode)
      return false;
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 589b29a324d..03c96ff048d 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -20308,8 +20308,8 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 	return MASK_PAIR_REGNO_P(regno);
 
       return ((TARGET_AVX512F && VALID_MASK_REG_MODE (mode))
-	      || (TARGET_AVX512BW
-		  && VALID_MASK_AVX512BW_MODE (mode)));
+	      || (TARGET_AVX512BW && mode == SImode)
+	      || (TARGET_AVX512BW && TARGET_EVEX512 && mode == DImode));
     }
 
   if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT)
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6eb4e540140..bdececc2309 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -536,10 +536,10 @@
 		    x64_avx,x64_avx512bw,x64_avx512dq,aes,
 		    sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
 		    avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,avx512f_512,
-		    noavx512f,avx512bw,noavx512bw,avx512dq,noavx512dq,
-		    fma_or_avx512vl,avx512vl,noavx512vl,avxvnni,avx512vnnivl,
-		    avx512fp16,avxifma,avx512ifmavl,avxneconvert,avx512bf16vl,
-		    vpclmulqdqvl"
+		    noavx512f,avx512bw,avx512bw_512,noavx512bw,avx512dq,
+		    noavx512dq,fma_or_avx512vl,avx512vl,noavx512vl,avxvnni,
+		    avx512vnnivl,avx512fp16,avxifma,avx512ifmavl,avxneconvert,
+		    avx512bf16vl,vpclmulqdqvl"
   (const_string "base"))
 
 ;; The (bounding maximum) length of an instruction immediate.
@@ -904,6 +904,8 @@
 	   (symbol_ref "TARGET_AVX512F && TARGET_EVEX512")
 	 (eq_attr "isa" "noavx512f") (symbol_ref "!TARGET_AVX512F")
 	 (eq_attr "isa" "avx512bw") (symbol_ref "TARGET_AVX512BW")
+	 (eq_attr "isa" "avx512bw_512")
+	   (symbol_ref "TARGET_AVX512BW && TARGET_EVEX512")
 	 (eq_attr "isa" "noavx512bw") (symbol_ref "!TARGET_AVX512BW")
 	 (eq_attr "isa" "avx512dq") (symbol_ref "TARGET_AVX512DQ")
 	 (eq_attr "isa" "noavx512dq") (symbol_ref "!TARGET_AVX512DQ")
@@ -1440,7 +1442,8 @@
 
 (define_mode_iterator SWI1248_AVX512BWDQ_64
   [(QI "TARGET_AVX512DQ") HI
-   (SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW && TARGET_64BIT")])
+   (SI "TARGET_AVX512BW")
+   (DI "TARGET_AVX512BW && TARGET_EVEX512 && TARGET_64BIT")])
 
 (define_insn "*cmp<mode>_ccz_1"
   [(set (reg FLAGS_REG)
@@ -4580,7 +4583,7 @@
 	    (eq_attr "alternative" "12")
 	      (const_string "x64_avx512bw")
 	    (eq_attr "alternative" "13")
-	      (const_string "avx512bw")
+	      (const_string "avx512bw_512")
 	   ]
 	   (const_string "*")))
    (set (attr "mmx_isa")
@@ -4657,7 +4660,7 @@
   "split_double_mode (DImode, &operands[0], 1, &operands[3], &operands[4]);")
 
 (define_mode_attr kmov_isa
-  [(QI "avx512dq") (HI "avx512f") (SI "avx512bw") (DI "avx512bw")])
+  [(QI "avx512dq") (HI "avx512f") (SI "avx512bw") (DI "avx512bw_512")])
 
 (define_insn "zero_extend<mode>di2"
   [(set (match_operand:DI 0 "register_operand" "=r,*r,*k")
@@ -11124,7 +11127,7 @@
    and{q}\t{%2, %0|%0, %2}
    #
    #"
-  [(set_attr "isa" "x64,x64,x64,x64,avx512bw")
+  [(set_attr "isa" "x64,x64,x64,x64,avx512bw_512")
    (set_attr "type" "alu,alu,alu,imovx,msklog")
    (set_attr "length_immediate" "*,*,*,0,*")
    (set (attr "prefix_rex")
@@ -11647,12 +11650,13 @@
 	  (not:SWI48 (match_operand:SWI48 1 "register_operand" "r,r,k"))
 	  (match_operand:SWI48 2 "nonimmediate_operand" "r,m,k")))
    (clobber (reg:CC FLAGS_REG))]
-  "TARGET_BMI || TARGET_AVX512BW"
+  "TARGET_BMI
+   || (TARGET_AVX512BW && (<MODE>mode == SImode || TARGET_EVEX512))"
   "@
    andn\t{%2, %1, %0|%0, %1, %2}
    andn\t{%2, %1, %0|%0, %1, %2}
    #"
-  [(set_attr "isa" "bmi,bmi,avx512bw")
+  [(set_attr "isa" "bmi,bmi,<kmov_isa>")
    (set_attr "type" "bitmanip,bitmanip,msklog")
    (set_attr "btver2_decode" "direct, double,*")
    (set_attr "mode" "<MODE>")])
@@ -11880,13 +11884,7 @@
    <logic>{<imodesuffix>}\t{%2, %0|%0, %2}
    <logic>{<imodesuffix>}\t{%2, %0|%0, %2}
    #"
-  [(set (attr "isa")
-	(cond [(eq_attr "alternative" "2")
-		 (if_then_else (eq_attr "mode" "SI,DI")
-		   (const_string "avx512bw")
-		   (const_string "avx512f"))
-	      ]
-	      (const_string "*")))
+  [(set_attr "isa" "*,*,<kmov_isa>")
    (set_attr "type" "alu, alu, msklog")
    (set_attr "mode" "<MODE>")])
 
@@ -11913,13 +11911,7 @@
       DONE;
     }
 }
-  [(set (attr "isa")
-	(cond [(eq_attr "alternative" "2")
-		 (if_then_else (eq_attr "mode" "SI,DI")
-		   (const_string "avx512bw")
-		   (const_string "avx512f"))
-	      ]
-	      (const_string "*")))
+  [(set_attr "isa" "*,*,<kmov_isa>")
    (set_attr "type" "alu, alu, msklog")
    (set_attr "mode" "<MODE>")])
 
@@ -13300,13 +13292,7 @@
   "@
    not{<imodesuffix>}\t%0
    #"
-  [(set (attr "isa")
-	(cond [(eq_attr "alternative" "1")
-		 (if_then_else (eq_attr "mode" "SI,DI")
-		   (const_string "avx512bw")
-		   (const_string "avx512f"))
-	      ]
-	      (const_string "*")))
+  [(set_attr "isa" "*,<kmov_isa>")
    (set_attr "type" "negnot,msklog")
    (set_attr "mode" "<MODE>")])
 
@@ -13318,7 +13304,7 @@
   "@
    not{l}\t%k0
    #"
-  [(set_attr "isa" "x64,avx512bw")
+  [(set_attr "isa" "x64,avx512bw_512")
    (set_attr "type" "negnot,msklog")
    (set_attr "mode" "SI,SI")])
 
@@ -13943,7 +13929,7 @@
 	return "sal{<imodesuffix>}\t{%2, %0|%0, %2}";
     }
 }
-  [(set_attr "isa" "*,*,bmi2,avx512bw")
+  [(set_attr "isa" "*,*,bmi2,<kmov_isa>")
    (set (attr "type")
      (cond [(eq_attr "alternative" "1")
 	      (const_string "lea")
@@ -14995,7 +14981,7 @@
 	return "shr{<imodesuffix>}\t{%2, %0|%0, %2}";
     }
 }
-  [(set_attr "isa" "*,bmi2,avx512bw")
+  [(set_attr "isa" "*,bmi2,<kmov_isa>")
    (set_attr "type" "ishift,ishiftx,msklog")
    (set (attr "length_immediate")
      (if_then_else
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index a8f93ceddc5..e59f6bf4410 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -292,10 +292,10 @@
    (V32HI "TARGET_EVEX512") (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")])
 
 (define_mode_iterator VI12HFBF_AVX512VL
-  [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
-   V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")
-   V32HF (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")
-   V32BF (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")])
+  [(V64QI "TARGET_EVEX512") (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
+   (V32HI "TARGET_EVEX512") (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")
+   (V32HF "TARGET_EVEX512") (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")
+   (V32BF "TARGET_EVEX512") (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")])
 
 (define_mode_iterator VI1_AVX512VL
   [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")])
@@ -445,9 +445,11 @@
    (V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
 
 (define_mode_iterator VI1248_AVX512VLBW
-  [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX512VL && TARGET_AVX512BW")
+  [(V64QI "TARGET_AVX512BW && TARGET_EVEX512")
+   (V32QI "TARGET_AVX512VL && TARGET_AVX512BW")
    (V16QI "TARGET_AVX512VL && TARGET_AVX512BW")
-   (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX512VL && TARGET_AVX512BW")
+   (V32HI "TARGET_AVX512BW && TARGET_EVEX512")
+   (V16HI "TARGET_AVX512VL && TARGET_AVX512BW")
    (V8HI "TARGET_AVX512VL && TARGET_AVX512BW")
    (V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
    (V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
@@ -481,15 +483,15 @@
   [V32HF (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")])
 
 (define_mode_iterator VHFBF_AVX512VL
-  [V32HF (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")
-   V32BF (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")])
+  [(V32HF "TARGET_EVEX512") (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")
+   (V32BF "TARGET_EVEX512") (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")])
 
 ;; All vector integer modes
 (define_mode_iterator VI
   [(V16SI "TARGET_AVX512F && TARGET_EVEX512")
    (V8DI "TARGET_AVX512F && TARGET_EVEX512")
-   (V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
-   (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
+   (V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI
+   (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI
    (V8SI "TARGET_AVX") V4SI
    (V4DI "TARGET_AVX") V2DI])
 
@@ -497,16 +499,16 @@
 (define_mode_iterator VIHFBF
   [(V16SI "TARGET_AVX512F && TARGET_EVEX512")
    (V8DI "TARGET_AVX512F && TARGET_EVEX512")
-   (V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
-   (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
+   (V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI
+   (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI
    (V8SI "TARGET_AVX") V4SI
    (V4DI "TARGET_AVX") V2DI
-   (V32HF "TARGET_AVX512BW") (V16HF "TARGET_AVX") V8HF
-   (V32BF "TARGET_AVX512BW") (V16BF "TARGET_AVX") V8BF])
+   (V32HF "TARGET_AVX512BW && TARGET_EVEX512") (V16HF "TARGET_AVX") V8HF
+   (V32BF "TARGET_AVX512BW && TARGET_EVEX512") (V16BF "TARGET_AVX") V8BF])
 
 (define_mode_iterator VI_AVX2
-  [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI
-   (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI
+  [(V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX2") V16QI
+   (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI
    (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX2") V4SI
    (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX2") V2DI])
 
@@ -541,7 +543,7 @@
   [(V32QI "TARGET_AVX2") V16QI])
 
 (define_mode_iterator VI1_AVX512
-  [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI])
+  [(V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX2") V16QI])
 
 (define_mode_iterator VI1_AVX512F
   [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI])
@@ -550,20 +552,20 @@
   [(V64QI "TARGET_AVX512VNNI") (V32QI "TARGET_AVX2") V16QI])
 
 (define_mode_iterator VI12_256_512_AVX512VL
-  [V64QI (V32QI "TARGET_AVX512VL")
-   V32HI (V16HI "TARGET_AVX512VL")])
+  [(V64QI "TARGET_EVEX512") (V32QI "TARGET_AVX512VL")
+   (V32HI "TARGET_EVEX512") (V16HI "TARGET_AVX512VL")])
 
 (define_mode_iterator VI2_AVX2
   [(V16HI "TARGET_AVX2") V8HI])
 
 (define_mode_iterator VI2_AVX2_AVX512BW
-  [(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI])
+  [(V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI])
 
 (define_mode_iterator VI2_AVX512F
   [(V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI])
 
 (define_mode_iterator VI2_AVX512VNNIBW
-  [(V32HI "TARGET_AVX512BW || TARGET_AVX512VNNI")
+  [(V32HI "(TARGET_AVX512BW || TARGET_AVX512VNNI) && TARGET_EVEX512")
    (V16HI "TARGET_AVX2") V8HI])
 
 (define_mode_iterator VI4_AVX
@@ -584,12 +586,12 @@
    (V8DI "TARGET_AVX512F && TARGET_EVEX512")])
 
 (define_mode_iterator VI2_AVX512VL
-  [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") V32HI])
+  [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") (V32HI "TARGET_EVEX512")])
 
 (define_mode_iterator VI2HFBF_AVX512VL
-  [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") V32HI
-   (V8HF "TARGET_AVX512VL") (V16HF "TARGET_AVX512VL") V32HF
-   (V8BF "TARGET_AVX512VL") (V16BF "TARGET_AVX512VL") V32BF])
+  [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") (V32HI "TARGET_EVEX512")
+   (V8HF "TARGET_AVX512VL") (V16HF "TARGET_AVX512VL") (V32HF "TARGET_EVEX512")
+   (V8BF "TARGET_AVX512VL") (V16BF "TARGET_AVX512VL") (V32BF "TARGET_EVEX512")])
 
 (define_mode_iterator VI2H_AVX512VL
   [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") V32HI
@@ -600,7 +602,7 @@
   [V32QI (V16QI "TARGET_AVX512VL") (V64QI "TARGET_AVX512F")])
 
 (define_mode_iterator VI8_AVX2_AVX512BW
-  [(V8DI "TARGET_AVX512BW") (V4DI "TARGET_AVX2") V2DI])
+  [(V8DI "TARGET_AVX512BW && TARGET_EVEX512") (V4DI "TARGET_AVX2") V2DI])
 
 (define_mode_iterator VI8_AVX2
   [(V4DI "TARGET_AVX2") V2DI])
@@ -624,11 +626,11 @@
 
 ;; ??? We should probably use TImode instead.
 (define_mode_iterator VIMAX_AVX2_AVX512BW
-  [(V4TI "TARGET_AVX512BW") (V2TI "TARGET_AVX2") V1TI])
+  [(V4TI "TARGET_AVX512BW && TARGET_EVEX512") (V2TI "TARGET_AVX2") V1TI])
 
 ;; Suppose TARGET_AVX512BW as baseline
 (define_mode_iterator VIMAX_AVX512VL
-  [V4TI (V2TI "TARGET_AVX512VL") (V1TI "TARGET_AVX512VL")])
+  [(V4TI "TARGET_EVEX512") (V2TI "TARGET_AVX512VL") (V1TI "TARGET_AVX512VL")])
 
 (define_mode_iterator VIMAX_AVX2
   [(V2TI "TARGET_AVX2") V1TI])
@@ -638,15 +640,15 @@
    (V16HI "TARGET_AVX2") V8HI])
 
 (define_mode_iterator VI12_AVX2_AVX512BW
-  [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI
-   (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI])
+  [(V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX2") V16QI
+   (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI])
 
 (define_mode_iterator VI24_AVX2
   [(V16HI "TARGET_AVX2") V8HI
    (V8SI "TARGET_AVX2") V4SI])
 
 (define_mode_iterator VI124_AVX2_24_AVX512F_1_AVX512BW
-  [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI
+  [(V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX2") V16QI
    (V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI
    (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX2") V4SI])
 
@@ -656,13 +658,13 @@
    (V8SI "TARGET_AVX2") V4SI])
 
 (define_mode_iterator VI248_AVX512VL
-  [V32HI V16SI V8DI
+  [(V32HI "TARGET_EVEX512") (V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")
    (V16HI "TARGET_AVX512VL") (V8SI "TARGET_AVX512VL")
    (V4DI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")
    (V4SI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
 
 (define_mode_iterator VI248_AVX512VLBW
-  [(V32HI "TARGET_AVX512BW")
+  [(V32HI "TARGET_AVX512BW && TARGET_EVEX512")
    (V16HI "TARGET_AVX512VL && TARGET_AVX512BW")
    (V8HI "TARGET_AVX512VL && TARGET_AVX512BW")
    (V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
@@ -678,16 +680,16 @@
    (V4DI "TARGET_AVX2") V2DI])
 
 (define_mode_iterator VI248_AVX2_8_AVX512F_24_AVX512BW
-  [(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI
-   (V16SI "TARGET_AVX512BW") (V8SI "TARGET_AVX2") V4SI
+  [(V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI
+   (V16SI "TARGET_AVX512BW && TARGET_EVEX512") (V8SI "TARGET_AVX2") V4SI
    (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX2") V2DI])
 
 (define_mode_iterator VI248_AVX512BW
-  [(V32HI "TARGET_AVX512BW") (V16SI "TARGET_EVEX512")
+  [(V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16SI "TARGET_EVEX512")
    (V8DI "TARGET_EVEX512")])
 
 (define_mode_iterator VI248_AVX512BW_AVX512VL
-  [(V32HI "TARGET_AVX512BW") 
+  [(V32HI "TARGET_AVX512BW && TARGET_EVEX512") 
    (V4DI "TARGET_AVX512VL") (V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")])
 
 ;; Suppose TARGET_AVX512VL as baseline
@@ -850,7 +852,8 @@
 (define_mode_iterator VI24_128 [V8HI V4SI])
 (define_mode_iterator VI248_128 [V8HI V4SI V2DI])
 (define_mode_iterator VI248_256 [V16HI V8SI V4DI])
-(define_mode_iterator VI248_512 [V32HI V16SI V8DI])
+(define_mode_iterator VI248_512
+  [(V32HI "TARGET_EVEX512") (V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")])
 (define_mode_iterator VI48_128 [V4SI V2DI])
 (define_mode_iterator VI148_512
   [(V64QI "TARGET_EVEX512") (V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")])
@@ -861,8 +864,8 @@
 (define_mode_iterator VI124_256 [V32QI V16HI V8SI])
 (define_mode_iterator VI124_256_AVX512F_AVX512BW
   [V32QI V16HI V8SI
-   (V64QI "TARGET_AVX512BW")
-   (V32HI "TARGET_AVX512BW")
+   (V64QI "TARGET_AVX512BW && TARGET_EVEX512")
+   (V32HI "TARGET_AVX512BW && TARGET_EVEX512")
    (V16SI "TARGET_AVX512F && TARGET_EVEX512")])
 (define_mode_iterator VI48_256 [V8SI V4DI])
 (define_mode_iterator VI48_512
@@ -870,11 +873,14 @@
 (define_mode_iterator VI4_256_8_512 [V8SI V8DI])
 (define_mode_iterator VI_AVX512BW
   [(V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")
-   (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")])
+   (V32HI "TARGET_AVX512BW && TARGET_EVEX512")
+   (V64QI "TARGET_AVX512BW && TARGET_EVEX512")])
 (define_mode_iterator VIHFBF_AVX512BW
   [(V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")
-   (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")
-   (V32HF "TARGET_AVX512BW") (V32BF "TARGET_AVX512BW")])
+   (V32HI "TARGET_AVX512BW && TARGET_EVEX512")
+   (V64QI "TARGET_AVX512BW && TARGET_EVEX512")
+   (V32HF "TARGET_AVX512BW && TARGET_EVEX512")
+   (V32BF "TARGET_AVX512BW && TARGET_EVEX512")])
 
 ;; Int-float size matches
 (define_mode_iterator VI2F_256_512 [V16HI V32HI V16HF V32HF V16BF V32BF])
@@ -1948,17 +1954,19 @@
 
 ;; All integer modes with AVX512BW/DQ.
 (define_mode_iterator SWI1248_AVX512BWDQ
-  [(QI "TARGET_AVX512DQ") HI (SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW")])
+  [(QI "TARGET_AVX512DQ") HI (SI "TARGET_AVX512BW")
+   (DI "TARGET_AVX512BW && TARGET_EVEX512")])
 
 ;; All integer modes with AVX512BW, where HImode operation
 ;; can be used instead of QImode.
 (define_mode_iterator SWI1248_AVX512BW
-  [QI HI (SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW")])
+  [QI HI (SI "TARGET_AVX512BW")
+   (DI "TARGET_AVX512BW && TARGET_EVEX512")])
 
 ;; All integer modes with AVX512BW/DQ, even HImode requires DQ.
 (define_mode_iterator SWI1248_AVX512BWDQ2
   [(QI "TARGET_AVX512DQ") (HI "TARGET_AVX512DQ")
-   (SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW")])
+   (SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW && TARGET_EVEX512")])
 
 (define_expand "kmov<mskmodesuffix>"
   [(set (match_operand:SWI1248_AVX512BWDQ 0 "nonimmediate_operand")
@@ -2097,7 +2105,7 @@
 	(zero_extend:DI
 	  (not:SI (match_operand:SI 1 "register_operand" "k"))))
    (unspec [(const_int 0)] UNSPEC_MASKOP)]
-  "TARGET_AVX512BW"
+  "TARGET_AVX512BW && TARGET_EVEX512"
   "knotd\t{%1, %0|%0, %1}";
   [(set_attr "type" "msklog")
    (set_attr "prefix" "vex")
@@ -2107,7 +2115,7 @@
   [(set (match_operand:DI 0 "mask_reg_operand")
 	(zero_extend:DI
 	  (not:SI (match_operand:SI 1 "mask_reg_operand"))))]
-  "TARGET_AVX512BW && reload_completed"
+  "TARGET_AVX512BW && TARGET_EVEX512 && reload_completed"
   [(parallel
      [(set (match_dup 0)
 	   (zero_extend:DI
@@ -2213,7 +2221,7 @@
 	    (const_int 32))
 	  (zero_extend:DI (match_operand:SI 2 "register_operand" "k"))))
    (unspec [(const_int 0)] UNSPEC_MASKOP)]
-  "TARGET_AVX512BW"
+  "TARGET_AVX512BW && TARGET_EVEX512"
   "kunpckdq\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "mode" "DI")])
 
@@ -3455,9 +3463,9 @@
    (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V8SI "TARGET_AVX2") (V4DI "TARGET_AVX2")
    (V8SF "TARGET_AVX") (V4DF "TARGET_AVX")
-   (V64QI "TARGET_AVX512BW")
+   (V64QI "TARGET_AVX512BW && TARGET_EVEX512")
    (V32HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
-   (V32HI "TARGET_AVX512BW")
+   (V32HI "TARGET_AVX512BW && TARGET_EVEX512")
    (V16SI "TARGET_AVX512F && TARGET_EVEX512")
    (V8DI "TARGET_AVX512F && TARGET_EVEX512")
    (V16SF "TARGET_AVX512F && TARGET_EVEX512")
@@ -12340,12 +12348,12 @@
 
 ;; Modes handled by vec_extract patterns.
 (define_mode_iterator VEC_EXTRACT_MODE
-  [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
-   (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
+  [(V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI
+   (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI
    (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX") V2DI
-   (V32HF "TARGET_AVX512BW") (V16HF "TARGET_AVX") V8HF
-   (V32BF "TARGET_AVX512BW") (V16BF "TARGET_AVX") V8BF
+   (V32HF "TARGET_AVX512BW && TARGET_EVEX512") (V16HF "TARGET_AVX") V8HF
+   (V32BF "TARGET_AVX512BW && TARGET_EVEX512") (V16BF "TARGET_AVX") V8BF
    (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX") V2DF
    (V4TI "TARGET_AVX512F && TARGET_EVEX512") (V2TI "TARGET_AVX")])
@@ -14028,7 +14036,7 @@
 		     (const_int 10) (const_int 11)
 		     (const_int 12) (const_int 13)
 		     (const_int 14) (const_int 15)])))]
-  "TARGET_AVX512BW && ix86_pre_reload_split ()"
+  "TARGET_AVX512BW && TARGET_EVEX512 && ix86_pre_reload_split ()"
   "#"
   "&& 1"
   [(set (match_dup 0)
@@ -14053,7 +14061,7 @@
 		     (const_int 10) (const_int 11)
 		     (const_int 12) (const_int 13)
 		     (const_int 14) (const_int 15)])))]
-  "TARGET_AVX512BW && ix86_pre_reload_split ()"
+  "TARGET_AVX512BW && TARGET_EVEX512 && ix86_pre_reload_split ()"
   "#"
   "&& 1"
   [(set (match_dup 0)
@@ -14173,13 +14181,13 @@
   [(set (match_operand:V32QI 0 "nonimmediate_operand")
 	(truncate:V32QI
 	  (match_operand:V32HI 1 "register_operand")))]
-  "TARGET_AVX512BW")
+  "TARGET_AVX512BW && TARGET_EVEX512")
 
 (define_insn "avx512bw_<code>v32hiv32qi2"
   [(set (match_operand:V32QI 0 "nonimmediate_operand" "=v,m")
 	(any_truncate:V32QI
 	    (match_operand:V32HI 1 "register_operand" "v,v")))]
-  "TARGET_AVX512BW"
+  "TARGET_AVX512BW && TARGET_EVEX512"
   "vpmov<trunsuffix>wb\t{%1, %0|%0, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "memory" "none,store")
@@ -14225,7 +14233,7 @@
         (match_operand:V32HI 1 "register_operand" "v,v"))
       (match_operand:V32QI 2 "nonimm_or_0_operand" "0C,0")
       (match_operand:SI 3 "register_operand" "Yk,Yk")))]
-  "TARGET_AVX512BW"
+  "TARGET_AVX512BW && TARGET_EVEX512"
   "vpmov<trunsuffix>wb\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "memory" "none,store")
@@ -14239,7 +14247,7 @@
         (match_operand:V32HI 1 "register_operand"))
       (match_dup 0)
       (match_operand:SI 2 "register_operand")))]
-  "TARGET_AVX512BW")
+  "TARGET_AVX512BW && TARGET_EVEX512")
 
 (define_mode_iterator PMOV_DST_MODE_2
   [V4SI V8HI (V16QI "TARGET_AVX512BW")])
@@ -16126,7 +16134,7 @@
    (match_operand:V64QI 1 "register_operand")
    (match_operand:V64QI 2 "nonimmediate_operand")
    (match_operand:V16SI 3 "nonimmediate_operand")]
-  "TARGET_AVX512BW"
+  "TARGET_AVX512BW && TARGET_EVEX512"
 {
   rtx t1 = gen_reg_rtx (V8DImode);
   rtx t2 = gen_reg_rtx (V16SImode);
@@ -17312,7 +17320,7 @@
    (V8DF "TARGET_AVX512F && TARGET_EVEX512")
    (V16SI "TARGET_AVX512F && TARGET_EVEX512")
    (V8DI "TARGET_AVX512F && TARGET_EVEX512")
-   (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512VBMI")
+   (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V64QI "TARGET_AVX512VBMI")
    (V32HF "TARGET_AVX512FP16")])
 
 (define_expand "vec_perm<mode>"
@@ -18018,7 +18026,7 @@
 		      (const_string "*")))])
 
 (define_mode_iterator AVX512ZEXTMASK
-  [(DI "TARGET_AVX512BW") (SI "TARGET_AVX512BW") HI])
+  [(DI "TARGET_AVX512BW && TARGET_EVEX512") (SI "TARGET_AVX512BW") HI])
 
 (define_insn "<avx512>_testm<mode>3<mask_scalar_merge_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
@@ -18130,16 +18138,18 @@
      (unspec [(const_int 0)] UNSPEC_MASKOP)])]
   "TARGET_AVX512F")
 
+(define_mode_iterator SWI24_MASK [HI (SI "TARGET_EVEX512")])
+
 (define_expand "vec_pack_trunc_<mode>"
   [(parallel
     [(set (match_operand:<DOUBLEMASKMODE> 0 "register_operand")
 	  (ior:<DOUBLEMASKMODE>
 	    (ashift:<DOUBLEMASKMODE>
 	      (zero_extend:<DOUBLEMASKMODE>
-	        (match_operand:SWI24 2 "register_operand"))
+	        (match_operand:SWI24_MASK 2 "register_operand"))
 	      (match_dup 3))
 	    (zero_extend:<DOUBLEMASKMODE>
-	      (match_operand:SWI24 1 "register_operand"))))
+	      (match_operand:SWI24_MASK 1 "register_operand"))))
      (unspec [(const_int 0)] UNSPEC_MASKOP)])]
   "TARGET_AVX512BW"
 {
@@ -18267,7 +18277,7 @@
 		     (const_int 60) (const_int 61)
 		     (const_int 62) (const_int 63)])))]
 
-  "TARGET_AVX512BW"
+  "TARGET_AVX512BW && TARGET_EVEX512"
   "vpacksswb\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "<mask_prefix>")
@@ -18336,7 +18346,7 @@
 		     (const_int 14)  (const_int 15)
 		     (const_int 28)  (const_int 29)
 		     (const_int 30)  (const_int 31)])))]
-  "TARGET_AVX512BW"
+  "TARGET_AVX512BW && TARGET_EVEX512"
   "vpackssdw\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "<mask_prefix>")
@@ -18398,7 +18408,7 @@
 		     (const_int 61) (const_int 125)
 		     (const_int 62) (const_int 126)
 		     (const_int 63) (const_int 127)])))]
-  "TARGET_AVX512BW"
+  "TARGET_AVX512BW && TARGET_EVEX512"
   "vpunpckhbw\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
@@ -18494,7 +18504,7 @@
 		     (const_int 53) (const_int 117)
 		     (const_int 54) (const_int 118)
 		     (const_int 55) (const_int 119)])))]
-  "TARGET_AVX512BW"
+  "TARGET_AVX512BW && TARGET_EVEX512"
   "vpunpcklbw\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
@@ -19677,7 +19687,7 @@
 	  [(match_operand:V32HI 1 "nonimmediate_operand" "vm")
 	   (match_operand:SI 2 "const_0_to_255_operand")]
 	  UNSPEC_PSHUFLW))]
-  "TARGET_AVX512BW"
+  "TARGET_AVX512BW && TARGET_EVEX512"
   "vpshuflw\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
@@ -19853,7 +19863,7 @@
 	  [(match_operand:V32HI 1 "nonimmediate_operand" "vm")
 	   (match_operand:SI 2 "const_0_to_255_operand")]
 	  UNSPEC_PSHUFHW))]
-  "TARGET_AVX512BW"
+  "TARGET_AVX512BW && TARGET_EVEX512"
   "vpshufhw\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
@@ -20735,7 +20745,7 @@
 (define_expand "vec_unpacks_lo_di"
   [(set (match_operand:SI 0 "register_operand")
         (subreg:SI (match_operand:DI 1 "register_operand") 0))]
-  "TARGET_AVX512BW")
+  "TARGET_AVX512BW && TARGET_EVEX512")
 
 (define_expand "vec_unpacku_hi_<mode>"
   [(match_operand:<sseunpackmode> 0 "register_operand")
@@ -20774,12 +20784,15 @@
       (unspec [(const_int 0)] UNSPEC_MASKOP)])]
   "TARGET_AVX512F")
 
+(define_mode_iterator SWI48x_MASK [SI (DI "TARGET_EVEX512")])
+
 (define_expand "vec_unpacks_hi_<mode>"
   [(parallel
-     [(set (subreg:SWI48x
+     [(set (subreg:SWI48x_MASK
 	     (match_operand:<HALFMASKMODE> 0 "register_operand") 0)
-	   (lshiftrt:SWI48x (match_operand:SWI48x 1 "register_operand")
-			    (match_dup 2)))
+	   (lshiftrt:SWI48x_MASK
+	     (match_operand:SWI48x_MASK 1 "register_operand")
+	     (match_dup 2)))
       (unspec [(const_int 0)] UNSPEC_MASKOP)])]
   "TARGET_AVX512BW"
   "operands[2] = GEN_INT (GET_MODE_BITSIZE (<HALFMASKMODE>mode));")
@@ -21534,7 +21547,7 @@
 				   (const_int 1) (const_int 1)
 				   (const_int 1) (const_int 1)]))
 	    (const_int 1))))]
-  "TARGET_AVX512BW"
+  "TARGET_AVX512BW && TARGET_EVEX512"
   "vpmulhrsw\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sseimul")
    (set_attr "prefix" "evex")
@@ -22042,8 +22055,8 @@
 ;; Mode iterator to handle singularity w/ absence of V2DI and V4DI
 ;; modes for abs instruction on pre AVX-512 targets.
 (define_mode_iterator VI1248_AVX512VL_AVX512BW
-  [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI
-   (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI
+  [(V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX2") V16QI
+   (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI
    (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX2") V4SI
    (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX512VL")
    (V2DI "TARGET_AVX512VL")])
@@ -22702,7 +22715,7 @@
   [(set (match_operand:V32HI 0 "register_operand" "=v")
 	(any_extend:V32HI
 	  (match_operand:V32QI 1 "nonimmediate_operand" "vm")))]
-  "TARGET_AVX512BW"
+  "TARGET_AVX512BW && TARGET_EVEX512"
   "vpmov<extsuffix>bw\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
@@ -22716,7 +22729,7 @@
 	    (match_operand:V64QI 2 "const0_operand"))
 	  (match_parallel 3 "pmovzx_parallel"
 	    [(match_operand 4 "const_int_operand")])))]
-  "TARGET_AVX512BW"
+  "TARGET_AVX512BW && TARGET_EVEX512"
   "#"
   "&& reload_completed"
   [(set (match_dup 0) (zero_extend:V32HI (match_dup 1)))]
@@ -22736,7 +22749,7 @@
 	    (match_operand:V64QI 3 "const0_operand"))
 	  (match_parallel 4 "pmovzx_parallel"
 	    [(match_operand 5 "const_int_operand")])))]
-  "TARGET_AVX512BW"
+  "TARGET_AVX512BW && TARGET_EVEX512"
   "#"
   "&& reload_completed"
   [(set (match_dup 0) (zero_extend:V32HI (match_dup 1)))]
@@ -22749,7 +22762,7 @@
   [(set (match_operand:V32HI 0 "register_operand")
 	(any_extend:V32HI
 	  (match_operand:V32QI 1 "nonimmediate_operand")))]
-  "TARGET_AVX512BW")
+  "TARGET_AVX512BW && TARGET_EVEX512")
 
 (define_insn "sse4_1_<code>v8qiv8hi2<mask_name>"
   [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,Yw")
@@ -26107,12 +26120,12 @@
    (set_attr "mode" "OI")])
 
 (define_mode_attr pbroadcast_evex_isa
-  [(V64QI "avx512bw") (V32QI "avx512bw") (V16QI "avx512bw")
-   (V32HI "avx512bw") (V16HI "avx512bw") (V8HI "avx512bw")
+  [(V64QI "avx512bw_512") (V32QI "avx512bw") (V16QI "avx512bw")
+   (V32HI "avx512bw_512") (V16HI "avx512bw") (V8HI "avx512bw")
    (V16SI "avx512f_512") (V8SI "avx512f") (V4SI "avx512f")
    (V8DI "avx512f_512") (V4DI "avx512f") (V2DI "avx512f")
-   (V32HF "avx512bw") (V16HF "avx512bw") (V8HF "avx512bw")
-   (V32BF "avx512bw") (V16BF "avx512bw") (V8BF "avx512bw")])
+   (V32HF "avx512bw_512") (V16HF "avx512bw") (V8HF "avx512bw")
+   (V32BF "avx512bw_512") (V16BF "avx512bw") (V8BF "avx512bw")])
 
 (define_insn "avx2_pbroadcast<mode>"
   [(set (match_operand:VIHFBF 0 "register_operand" "=x,v")
@@ -26967,7 +26980,8 @@
    (V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
    (V4SI "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
    (V2DI "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")
-   (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX512BW && TARGET_AVX512VL")
+   (V32HI "TARGET_AVX512BW && TARGET_EVEX512")
+   (V16HI "TARGET_AVX512BW && TARGET_AVX512VL")
    (V8HI "TARGET_AVX512BW && TARGET_AVX512VL")
    (V64QI "TARGET_AVX512VBMI") (V32QI "TARGET_AVX512VBMI && TARGET_AVX512VL")
    (V16QI "TARGET_AVX512VBMI && TARGET_AVX512VL")])
@@ -26976,7 +26990,8 @@
   [(V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")
    (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
    (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")
-   (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX512BW && TARGET_AVX512VL")
+   (V32HI "TARGET_AVX512BW && TARGET_EVEX512")
+   (V16HI "TARGET_AVX512BW && TARGET_AVX512VL")
    (V8HI "TARGET_AVX512BW && TARGET_AVX512VL")
    (V64QI "TARGET_AVX512VBMI") (V32QI "TARGET_AVX512VBMI && TARGET_AVX512VL")
    (V16QI "TARGET_AVX512VBMI && TARGET_AVX512VL")])
-- 
2.31.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 16/18] Support -mevex512 for AVX512{IFMA,VBMI,VNNI,BF16,VPOPCNTDQ,VBMI2,BITALG,VP2INTERSECT},VAES,GFNI,VPCLMULQDQ intrins
  2023-09-21  7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
                   ` (14 preceding siblings ...)
  2023-09-21  7:20 ` [PATCH 15/18] Support -mevex512 for AVX512BW intrins Hu, Lin1
@ 2023-09-21  7:20 ` Hu, Lin1
  2023-09-21  7:20 ` [PATCH 17/18] Support -mevex512 for AVX512FP16 intrins Hu, Lin1
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21  7:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang

From: Haochen Jiang <haochen.jiang@intel.com>
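
A rough illustration (not part of the patch itself): the 512-bit forms
of these ISAs are now gated on EVEX512.  For AVX512IFMA, for example,
a minimal sketch (flags assumed from the series):

#include <immintrin.h>

__m512i
madd52 (__m512i x, __m512i y, __m512i z)
{
  /* Accepted with -mavx512ifma; with -mavx512ifma -mno-evex512 the
     512-bit form should be rejected.  */
  return _mm512_madd52lo_epu64 (x, y, z);
}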

gcc/ChangeLog:

	* config/i386/sse.md (VI1_AVX512VL): Add TARGET_EVEX512.
	(VI8_FVL): Ditto.
	(VI1_AVX512F): Ditto.
	(VI1_AVX512VNNI): Ditto.
	(VI1_AVX512VL_F): Ditto.
	(VI12_VI48F_AVX512VL): Ditto.
	(*avx512f_permvar_truncv32hiv32qi_1): Ditto.
	(sdot_prod<mode>): Ditto.
	(VEC_PERM_AVX2): Ditto.
	(VPERMI2): Ditto.
	(VPERMI2I): Ditto.
	(vpmadd52<vpmadd52type>v8di): Ditto.
	(usdot_prod<mode>): Ditto.
	(vpdpbusd_v16si): Ditto.
	(vpdpbusds_v16si): Ditto.
	(vpdpwssd_v16si): Ditto.
	(vpdpwssds_v16si): Ditto.
	(VI48_AVX512VP2VL): Ditto.
	(avx512vp2intersect_2intersectv16si): Ditto.
	(VF_AVX512BF16VL): Ditto.
	(VF1_AVX512_256): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr90096.c: Adjust error message.

Co-authored-by: Hu, Lin1 <lin1.hu@intel.com>
---
 gcc/config/i386/sse.md                  | 56 +++++++++++++------------
 gcc/testsuite/gcc.target/i386/pr90096.c |  2 +-
 2 files changed, 31 insertions(+), 27 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index e59f6bf4410..a5a95b9de66 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -298,7 +298,7 @@
    (V32BF "TARGET_EVEX512") (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")])
 
 (define_mode_iterator VI1_AVX512VL
-  [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")])
+  [(V64QI "TARGET_EVEX512") (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")])
 
 ;; All vector modes
 (define_mode_iterator V
@@ -531,7 +531,7 @@
   [(V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX") V2DI])
 
 (define_mode_iterator VI8_FVL
-  [(V8DI "TARGET_AVX512F") V4DI (V2DI "TARGET_AVX512VL")])
+  [(V8DI "TARGET_AVX512F && TARGET_EVEX512") V4DI (V2DI "TARGET_AVX512VL")])
 
 (define_mode_iterator VI8_AVX512VL
   [(V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
@@ -546,10 +546,10 @@
   [(V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX2") V16QI])
 
 (define_mode_iterator VI1_AVX512F
-  [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI])
+  [(V64QI "TARGET_AVX512F && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI])
 
 (define_mode_iterator VI1_AVX512VNNI
-  [(V64QI "TARGET_AVX512VNNI") (V32QI "TARGET_AVX2") V16QI])
+  [(V64QI "TARGET_AVX512VNNI && TARGET_EVEX512") (V32QI "TARGET_AVX2") V16QI])
 
 (define_mode_iterator VI12_256_512_AVX512VL
   [(V64QI "TARGET_EVEX512") (V32QI "TARGET_AVX512VL")
@@ -599,7 +599,7 @@
    V8DI ])
 
 (define_mode_iterator VI1_AVX512VL_F
-  [V32QI (V16QI "TARGET_AVX512VL") (V64QI "TARGET_AVX512F")])
+  [V32QI (V16QI "TARGET_AVX512VL") (V64QI "TARGET_AVX512F && TARGET_EVEX512")])
 
 (define_mode_iterator VI8_AVX2_AVX512BW
   [(V8DI "TARGET_AVX512BW && TARGET_EVEX512") (V4DI "TARGET_AVX2") V2DI])
@@ -923,8 +923,8 @@
    (V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
    (V4SI "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
    (V2DI "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")
-   V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
-   V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")])
+   (V64QI "TARGET_EVEX512") (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
+   (V32HI "TARGET_EVEX512") (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")])
 
 (define_mode_iterator VI48F_256 [V8SI V8SF V4DI V4DF])
 
@@ -14217,7 +14217,7 @@
 		     (const_int 26) (const_int 27)
 		     (const_int 28) (const_int 29)
 		     (const_int 30) (const_int 31)])))]
-  "TARGET_AVX512VBMI && ix86_pre_reload_split ()"
+  "TARGET_AVX512VBMI && TARGET_EVEX512 && ix86_pre_reload_split ()"
   "#"
   "&& 1"
   [(set (match_dup 0)
@@ -16040,7 +16040,7 @@
   "TARGET_SSE2"
 {
   /* Try with vnni instructions.  */
-  if ((<MODE_SIZE> == 64 && TARGET_AVX512VNNI)
+  if ((<MODE_SIZE> == 64 && TARGET_AVX512VNNI && TARGET_EVEX512)
       || (<MODE_SIZE> < 64
 	  && ((TARGET_AVX512VNNI && TARGET_AVX512VL) || TARGET_AVXVNNI)))
     {
@@ -17320,7 +17320,8 @@
    (V8DF "TARGET_AVX512F && TARGET_EVEX512")
    (V16SI "TARGET_AVX512F && TARGET_EVEX512")
    (V8DI "TARGET_AVX512F && TARGET_EVEX512")
-   (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V64QI "TARGET_AVX512VBMI")
+   (V32HI "TARGET_AVX512BW && TARGET_EVEX512")
+   (V64QI "TARGET_AVX512VBMI && TARGET_EVEX512")
    (V32HF "TARGET_AVX512FP16")])
 
 (define_expand "vec_perm<mode>"
@@ -26983,7 +26984,8 @@
    (V32HI "TARGET_AVX512BW && TARGET_EVEX512")
    (V16HI "TARGET_AVX512BW && TARGET_AVX512VL")
    (V8HI "TARGET_AVX512BW && TARGET_AVX512VL")
-   (V64QI "TARGET_AVX512VBMI") (V32QI "TARGET_AVX512VBMI && TARGET_AVX512VL")
+   (V64QI "TARGET_AVX512VBMI && TARGET_EVEX512")
+   (V32QI "TARGET_AVX512VBMI && TARGET_AVX512VL")
    (V16QI "TARGET_AVX512VBMI && TARGET_AVX512VL")])
 
 (define_mode_iterator VPERMI2I
@@ -26993,7 +26995,8 @@
    (V32HI "TARGET_AVX512BW && TARGET_EVEX512")
    (V16HI "TARGET_AVX512BW && TARGET_AVX512VL")
    (V8HI "TARGET_AVX512BW && TARGET_AVX512VL")
-   (V64QI "TARGET_AVX512VBMI") (V32QI "TARGET_AVX512VBMI && TARGET_AVX512VL")
+   (V64QI "TARGET_AVX512VBMI && TARGET_EVEX512")
+   (V32QI "TARGET_AVX512VBMI && TARGET_AVX512VL")
    (V16QI "TARGET_AVX512VBMI && TARGET_AVX512VL")])
 
 (define_expand "<avx512>_vpermi2var<mode>3_mask"
@@ -28977,7 +28980,7 @@
 	   (match_operand:V8DI 2 "register_operand" "v")
 	   (match_operand:V8DI 3 "nonimmediate_operand" "vm")]
 	  VPMADD52))]
-  "TARGET_AVX512IFMA"
+  "TARGET_AVX512IFMA && TARGET_EVEX512"
   "vpmadd52<vpmadd52type>\t{%3, %2, %0|%0, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "prefix" "evex")
@@ -29579,9 +29582,9 @@
    (match_operand:VI1_AVX512VNNI 1 "register_operand")
    (match_operand:VI1_AVX512VNNI 2 "register_operand")
    (match_operand:<ssedvecmode> 3 "register_operand")]
-  "(<MODE_SIZE> == 64
-    ||((TARGET_AVX512VNNI && TARGET_AVX512VL)
-	    || TARGET_AVXVNNI))"
+  "((<MODE_SIZE> == 64 && TARGET_EVEX512)
+    || ((TARGET_AVX512VNNI && TARGET_AVX512VL)
+	|| TARGET_AVXVNNI))"
 {
   operands[1] = lowpart_subreg (<ssedvecmode>mode,
 				force_reg (<MODE>mode, operands[1]),
@@ -29602,7 +29605,7 @@
 	   (match_operand:V16SI 2 "register_operand" "v")
 	   (match_operand:V16SI 3 "nonimmediate_operand" "vm")]
 	  UNSPEC_VPDPBUSD))]
-  "TARGET_AVX512VNNI"
+  "TARGET_AVX512VNNI && TARGET_EVEX512"
   "vpdpbusd\t{%3, %2, %0|%0, %2, %3}"
   [(set_attr ("prefix") ("evex"))])
 
@@ -29670,7 +29673,7 @@
 	   (match_operand:V16SI 2 "register_operand" "v")
 	   (match_operand:V16SI 3 "nonimmediate_operand" "vm")]
 	  UNSPEC_VPDPBUSDS))]
-  "TARGET_AVX512VNNI"
+  "TARGET_AVX512VNNI && TARGET_EVEX512"
   "vpdpbusds\t{%3, %2, %0|%0, %2, %3}"
   [(set_attr ("prefix") ("evex"))])
 
@@ -29738,7 +29741,7 @@
 	   (match_operand:V16SI 2 "register_operand" "v")
 	   (match_operand:V16SI 3 "nonimmediate_operand" "vm")]
 	  UNSPEC_VPDPWSSD))]
-  "TARGET_AVX512VNNI"
+  "TARGET_AVX512VNNI && TARGET_EVEX512"
   "vpdpwssd\t{%3, %2, %0|%0, %2, %3}"
   [(set_attr ("prefix") ("evex"))])
 
@@ -29806,7 +29809,7 @@
 	   (match_operand:V16SI 2 "register_operand" "v")
 	   (match_operand:V16SI 3 "nonimmediate_operand" "vm")]
 	  UNSPEC_VPDPWSSDS))]
-  "TARGET_AVX512VNNI"
+  "TARGET_AVX512VNNI && TARGET_EVEX512"
   "vpdpwssds\t{%3, %2, %0|%0, %2, %3}"
   [(set_attr ("prefix") ("evex"))])
 
@@ -29929,9 +29932,9 @@
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_mode_iterator VI48_AVX512VP2VL
-  [V8DI
-  (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")
-  (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")])
+  [(V8DI "TARGET_EVEX512")
+   (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")
+   (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")])
 
 (define_mode_iterator MASK_DWI [P2QI P2HI])
 
@@ -29972,12 +29975,12 @@
 	(unspec:P2HI [(match_operand:V16SI 1 "register_operand" "v")
 		      (match_operand:V16SI 2 "vector_operand" "vm")]
 		UNSPEC_VP2INTERSECT))]
-  "TARGET_AVX512VP2INTERSECT"
+  "TARGET_AVX512VP2INTERSECT && TARGET_EVEX512"
   "vp2intersectd\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr ("prefix") ("evex"))])
 
 (define_mode_iterator VF_AVX512BF16VL
-  [V32BF (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")])
+  [(V32BF "TARGET_EVEX512") (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")])
 ;; Converting from BF to SF
 (define_mode_attr bf16_cvt_2sf
   [(V32BF  "V16SF") (V16BF  "V8SF") (V8BF  "V4SF")])
@@ -30070,7 +30073,8 @@
   "TARGET_AVX512BF16 && TARGET_AVX512VL"
   "vcvtneps2bf16{x}\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}")
 
-(define_mode_iterator VF1_AVX512_256 [V16SF (V8SF "TARGET_AVX512VL")])
+(define_mode_iterator VF1_AVX512_256
+  [(V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL")])
 
 (define_expand "avx512f_cvtneps2bf16_<mode>_maskz"
   [(match_operand:<sf_cvt_bf16> 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/i386/pr90096.c b/gcc/testsuite/gcc.target/i386/pr90096.c
index 871e0ffc691..74f052ea8e5 100644
--- a/gcc/testsuite/gcc.target/i386/pr90096.c
+++ b/gcc/testsuite/gcc.target/i386/pr90096.c
@@ -10,7 +10,7 @@ volatile __mmask64 m64;
 void
 foo (int i)
 {
-  x1 = _mm512_gf2p8affineinv_epi64_epi8 (x1, x2, 3);	/* { dg-error "needs isa option -mgfni -mavx512f" } */
+  x1 = _mm512_gf2p8affineinv_epi64_epi8 (x1, x2, 3);	/* { dg-error "needs isa option -mevex512 -mgfni -mavx512f" } */
 }
 
 #ifdef __x86_64__
-- 
2.31.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 17/18] Support -mevex512 for AVX512FP16 intrins
  2023-09-21  7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
                   ` (15 preceding siblings ...)
  2023-09-21  7:20 ` [PATCH 16/18] Support -mevex512 for AVX512{IFMA,VBMI,VNNI,BF16,VPOPCNTDQ,VBMI2,BITALG,VP2INTERSECT},VAES,GFNI,VPCLMULQDQ intrins Hu, Lin1
@ 2023-09-21  7:20 ` Hu, Lin1
  2023-09-21  7:20 ` [PATCH 18/18] Allow -mno-evex512 usage Hu, Lin1
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21  7:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang

From: Haochen Jiang <haochen.jiang@intel.com>
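
A rough illustration (not part of the patch itself): the V32HF/V32BF
modes in these iterators are now gated on EVEX512, so 512-bit FP16
arithmetic needs -mevex512 on top of -mavx512fp16.  Minimal sketch:

#include <immintrin.h>

__m512h
add_ph (__m512h a, __m512h b)
{
  /* Accepted with -mavx512fp16; should be rejected when
     -mno-evex512 is also given.  */
  return _mm512_add_ph (a, b);
}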

gcc/ChangeLog:

	* config/i386/sse.md (V48H_AVX512VL): Add TARGET_EVEX512.
	(VFH): Ditto.
	(VF2H): Ditto.
	(VFH_AVX512VL): Ditto.
	(VHFBF): Ditto.
	(VHF_AVX512VL): Ditto.
	(VI2H_AVX512VL): Ditto.
	(VI2F_256_512): Ditto.
	(VF48_I1248): Remove unused iterator.
	(VF48H_AVX512VL): Add TARGET_EVEX512.
	(VF_AVX512): Remove unused iterator.
	(REDUC_PLUS_MODE): Add TARGET_EVEX512.
	(REDUC_SMINMAX_MODE): Ditto.
	(FMAMODEM): Ditto.
	(VFH_SF_AVX512VL): Ditto.
	(VEC_PERM_AVX2): Ditto.

Co-authored-by: Hu, Lin1 <lin1.hu@intel.com>
---
 gcc/config/i386/sse.md | 44 ++++++++++++++++++++----------------------
 1 file changed, 21 insertions(+), 23 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index a5a95b9de66..25d53e15dce 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -280,7 +280,7 @@
 (define_mode_iterator V48H_AVX512VL
   [(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
    (V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")
-   (V32HF "TARGET_AVX512FP16")
+   (V32HF "TARGET_AVX512FP16 && TARGET_EVEX512")
    (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
@@ -355,7 +355,7 @@
    (V2DF "TARGET_AVX512DQ && TARGET_AVX512VL")])
 
 (define_mode_iterator VFH
-  [(V32HF "TARGET_AVX512FP16")
+  [(V32HF "TARGET_AVX512FP16 && TARGET_EVEX512")
    (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
@@ -401,7 +401,7 @@
 
 ;; All DFmode & HFmode vector float modes
 (define_mode_iterator VF2H
-  [(V32HF "TARGET_AVX512FP16")
+  [(V32HF "TARGET_AVX512FP16 && TARGET_EVEX512")
    (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX") V2DF])
@@ -463,7 +463,7 @@
   [(V16SF "TARGET_AVX512ER") (V8SF "TARGET_AVX") V4SF])
 
 (define_mode_iterator VFH_AVX512VL
-  [(V32HF "TARGET_AVX512FP16")
+  [(V32HF "TARGET_AVX512FP16 && TARGET_EVEX512")
    (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
@@ -475,12 +475,14 @@
 (define_mode_iterator VF1_AVX512VL
   [(V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")])
 
-(define_mode_iterator VHFBF [V32HF V16HF V8HF V32BF V16BF V8BF])
+(define_mode_iterator VHFBF
+  [(V32HF "TARGET_EVEX512") V16HF V8HF
+   (V32BF "TARGET_EVEX512") V16BF V8BF])
 (define_mode_iterator VHFBF_256 [V16HF V16BF])
 (define_mode_iterator VHFBF_128 [V8HF V8BF])
 
 (define_mode_iterator VHF_AVX512VL
-  [V32HF (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")])
+  [(V32HF "TARGET_EVEX512") (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")])
 
 (define_mode_iterator VHFBF_AVX512VL
   [(V32HF "TARGET_EVEX512") (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")
@@ -594,9 +596,9 @@
    (V8BF "TARGET_AVX512VL") (V16BF "TARGET_AVX512VL") (V32BF "TARGET_EVEX512")])
 
 (define_mode_iterator VI2H_AVX512VL
-  [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") V32HI
-   (V8SI "TARGET_AVX512VL") V16SI
-   V8DI ])
+  [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") (V32HI "TARGET_EVEX512")
+   (V8SI "TARGET_AVX512VL") (V16SI "TARGET_EVEX512")
+   (V8DI "TARGET_EVEX512")])
 
 (define_mode_iterator VI1_AVX512VL_F
   [V32QI (V16QI "TARGET_AVX512VL") (V64QI "TARGET_AVX512F && TARGET_EVEX512")])
@@ -883,7 +885,10 @@
    (V32BF "TARGET_AVX512BW && TARGET_EVEX512")])
 
 ;; Int-float size matches
-(define_mode_iterator VI2F_256_512 [V16HI V32HI V16HF V32HF V16BF V32BF])
+(define_mode_iterator VI2F_256_512
+  [V16HI (V32HI "TARGET_EVEX512")
+   V16HF (V32HF "TARGET_EVEX512")
+   V16BF (V32BF "TARGET_EVEX512")])
 (define_mode_iterator VI4F_128 [V4SI V4SF])
 (define_mode_iterator VI8F_128 [V2DI V2DF])
 (define_mode_iterator VI4F_256 [V8SI V8SF])
@@ -899,10 +904,8 @@
   (V8DI "TARGET_AVX512F && TARGET_EVEX512")
   (V8DF "TARGET_AVX512F && TARGET_EVEX512")
   (V4DI "TARGET_AVX512VL") (V4DF  "TARGET_AVX512VL")])
-(define_mode_iterator VF48_I1248
-  [V16SI V16SF V8DI V8DF V32HI V64QI])
 (define_mode_iterator VF48H_AVX512VL
-  [V8DF V16SF (V8SF "TARGET_AVX512VL")])
+  [(V8DF "TARGET_EVEX512") (V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL")])
 
 (define_mode_iterator VF48_128
   [V2DF V4SF])
@@ -928,11 +931,6 @@
 
 (define_mode_iterator VI48F_256 [V8SI V8SF V4DI V4DF])
 
-(define_mode_iterator VF_AVX512
-  [(V4SF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")
-   (V8SF "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
-   V16SF V8DF])
-
 (define_mode_iterator V8_128 [V8HI V8HF V8BF])
 (define_mode_iterator V16_256 [V16HI V16HF V16BF])
 (define_mode_iterator V32_512
@@ -3419,7 +3417,7 @@
   (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
   (V8DF "TARGET_AVX512F && TARGET_EVEX512")
   (V16SF "TARGET_AVX512F && TARGET_EVEX512")
-  (V32HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+  (V32HF "TARGET_AVX512FP16 && TARGET_AVX512VL && TARGET_EVEX512")
   (V32QI "TARGET_AVX")
   (V64QI "TARGET_AVX512F && TARGET_EVEX512")])
 
@@ -3464,7 +3462,7 @@
    (V8SI "TARGET_AVX2") (V4DI "TARGET_AVX2")
    (V8SF "TARGET_AVX") (V4DF "TARGET_AVX")
    (V64QI "TARGET_AVX512BW && TARGET_EVEX512")
-   (V32HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+   (V32HF "TARGET_AVX512FP16 && TARGET_AVX512VL && TARGET_EVEX512")
    (V32HI "TARGET_AVX512BW && TARGET_EVEX512")
    (V16SI "TARGET_AVX512F && TARGET_EVEX512")
    (V8DI "TARGET_AVX512F && TARGET_EVEX512")
@@ -5318,7 +5316,7 @@
    (HF "TARGET_AVX512FP16")
    (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
-   (V32HF "TARGET_AVX512FP16")])
+   (V32HF "TARGET_AVX512FP16 && TARGET_EVEX512")])
 
 (define_expand "fma<mode>4"
   [(set (match_operand:FMAMODEM 0 "register_operand")
@@ -5427,7 +5425,7 @@
 
 ;; Suppose AVX-512F as baseline
 (define_mode_iterator VFH_SF_AVX512VL
-  [(V32HF "TARGET_AVX512FP16")
+  [(V32HF "TARGET_AVX512FP16 && TARGET_EVEX512")
    (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (HF "TARGET_AVX512FP16")
@@ -17322,7 +17320,7 @@
    (V8DI "TARGET_AVX512F && TARGET_EVEX512")
    (V32HI "TARGET_AVX512BW && TARGET_EVEX512")
    (V64QI "TARGET_AVX512VBMI && TARGET_EVEX512")
-   (V32HF "TARGET_AVX512FP16")])
+   (V32HF "TARGET_AVX512FP16 && TARGET_EVEX512")])
 
 (define_expand "vec_perm<mode>"
   [(match_operand:VEC_PERM_AVX2 0 "register_operand")
-- 
2.31.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 18/18] Allow -mno-evex512 usage
  2023-09-21  7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
                   ` (16 preceding siblings ...)
  2023-09-21  7:20 ` [PATCH 17/18] Support -mevex512 for AVX512FP16 intrins Hu, Lin1
@ 2023-09-21  7:20 ` Hu, Lin1
  2023-09-22  3:30 ` [PATCH 00/18] Support -mevex512 for AVX512 Hongtao Liu
  2023-09-28  0:32 ` ZiNgA BuRgA
  19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21  7:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang

From: Haochen Jiang <haochen.jiang@intel.com>
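
For reference (not part of the patch itself), usage mirrors the new
tests below: -mno-evex512 can be given on the command line, e.g.
"gcc -O2 -march=x86-64 -mavx512f -mno-evex512", or per function via
the target attribute.  A minimal sketch along the lines of the tests:

typedef double __m512d __attribute__ ((__vector_size__ (64)));

__attribute__ ((target ("no-evex512"))) __m512d
sum (__m512d a, __m512d b)
{
  /* No zmm instructions should be emitted here; the new tests
     use scan-assembler-not to check for "%zmm".  */
  return a + b;
}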

gcc/ChangeLog:

	* config/i386/i386.opt: Allow -mno-evex512.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/noevex512-1.c: New test.
	* gcc.target/i386/noevex512-2.c: Ditto.
	* gcc.target/i386/noevex512-3.c: Ditto.
---
 gcc/config/i386/i386.opt                    |  2 +-
 gcc/testsuite/gcc.target/i386/noevex512-1.c | 13 +++++++++++++
 gcc/testsuite/gcc.target/i386/noevex512-2.c | 13 +++++++++++++
 gcc/testsuite/gcc.target/i386/noevex512-3.c | 13 +++++++++++++
 4 files changed, 40 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/noevex512-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/noevex512-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/noevex512-3.c

diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 6d8601b1f75..34fc167af82 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1312,5 +1312,5 @@ Target Alias(mtune-ctrl=, use_scatter, ^use_scatter)
 Enable vectorization for scatter instruction.
 
 mevex512
-Target RejectNegative Mask(ISA2_EVEX512) Var(ix86_isa_flags2) Save
+Target Mask(ISA2_EVEX512) Var(ix86_isa_flags2) Save
 Support 512 bit vector built-in functions and code generation.
diff --git a/gcc/testsuite/gcc.target/i386/noevex512-1.c b/gcc/testsuite/gcc.target/i386/noevex512-1.c
new file mode 100644
index 00000000000..7fd45f15be6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/noevex512-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -march=x86-64 -mavx512f -mno-evex512 -Wno-psabi" } */
+/* { dg-final { scan-assembler-not ".%zmm" } } */
+
+typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__));
+
+__m512d
+foo ()
+{
+  __m512d a, b;
+  a = a + b;
+  return a;
+}
diff --git a/gcc/testsuite/gcc.target/i386/noevex512-2.c b/gcc/testsuite/gcc.target/i386/noevex512-2.c
new file mode 100644
index 00000000000..1c206e385d0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/noevex512-2.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64 -mavx512bw -mno-evex512" } */
+
+#include <immintrin.h>
+
+long long
+foo (long long c)
+{
+  register long long a __asm ("k7") = c;
+  long long b = foo (a);
+  asm volatile ("" : "+k" (b)); /* { dg-error "inconsistent operand constraints in an 'asm'" } */
+  return b;
+}
diff --git a/gcc/testsuite/gcc.target/i386/noevex512-3.c b/gcc/testsuite/gcc.target/i386/noevex512-3.c
new file mode 100644
index 00000000000..10e00c2d61c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/noevex512-3.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=x86-64 -Wno-psabi -mavx512f" } */
+/* { dg-final { scan-assembler-not ".%zmm" } } */
+
+typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__));
+
+__attribute__ ((target ("no-evex512"))) __m512d
+foo ()
+{
+  __m512d a, b;
+  a = a + b;
+  return a;
+}
-- 
2.31.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 00/18] Support -mevex512 for AVX512
  2023-09-21  7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
                   ` (17 preceding siblings ...)
  2023-09-21  7:20 ` [PATCH 18/18] Allow -mno-evex512 usage Hu, Lin1
@ 2023-09-22  3:30 ` Hongtao Liu
  2023-09-28  0:32 ` ZiNgA BuRgA
  19 siblings, 0 replies; 25+ messages in thread
From: Hongtao Liu @ 2023-09-22  3:30 UTC (permalink / raw)
  To: Hu, Lin1; +Cc: gcc-patches, hongtao.liu, ubizjak, haochen.jiang

On Thu, Sep 21, 2023 at 3:22 PM Hu, Lin1 <lin1.hu@intel.com> wrote:
>
> Hi all,
>
> After previous discussion, instead of supporting option -mavx10.1, we
> will first introduce option -m[no-]evex512, which will enable/disable
> 512 bit register and 64 bit mask register.
>
> It will not change the current option behavior since if AVX512F is
> enabled with no evex512 option specified, it will automatically enable
> 512 bit register and 64 bit mask register.
>
> How the patches go comes following:
>
> Patch 1 added initial support for option -mevex512.
>
> Patch 2-6 refined current intrin file to push evex512 target for all
> 512 bit intrins. Those scalar intrins remained untouched.
>
> Patch 7-11 added OPTION_MASK_ISA2_EVEX512 for all related builtins.
>
> Patch 12 disabled zmm register, 512 bit libmvec call for no-evex512,
> also requested evex512 for vectorization when using 512 bit register.
>
> Patch 13-17 supported evex512 in related patterns.
>
> Patch 18 added testcases for -mno-evex512 and allowed its usage.
>
> The patches currently cause scan-asm fail for pr89229-{5,6,7}b.c since
> we will emit scalar vmovss here. When trying to use x/ymm 16+ w/o
> avx512vl but with avx512f+evex512, I suppose we could either emit scalar
> or zmm instructions. It is quite a rare case on HW since there is no
> HW w/o avx512vl but with avx512f, so I prefer to not to add maintainence
> effort here to get a slightly perf improvement. But it could be changed
> to former behavior.
To make it easier for people to test before it is committed, I have
pushed the series to the vendor branch
refs/vendors/ix86/heads/evex512.
You are welcome to try it out.

>
> Discussions are welcomed for all the patches.
>
> Thx,
> Haochen
>
> Haochen Jiang (18):
>   Initial support for -mevex512
>   Push evex512 target for 512 bit intrins
>   Push evex512 target for 512 bit intrins
>   Push evex512 target for 512 bit intrins
>   Push evex512 target for 512 bit intrins
>   Push evex512 target for 512 bit intrins
>   Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
>   Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
>   Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
>   Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
>   Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
>   Disable zmm register and 512 bit libmvec call when !TARGET_EVEX512
>   Support -mevex512 for AVX512F intrins
>   Support -mevex512 for AVX512DQ intrins
>   Support -mevex512 for AVX512BW intrins
>   Support -mevex512 for
>     AVX512{IFMA,VBMI,VNNI,BF16,VPOPCNTDQ,VBMI2,BITALG,VP2INTERSECT},VAES,GFNI,VPCLMULQDQ
>     intrins
>   Support -mevex512 for AVX512FP16 intrins
>   Allow -mno-evex512 usage
>
>  gcc/common/config/i386/i386-common.cc       |    15 +
>  gcc/config.gcc                              |    19 +-
>  gcc/config/i386/avx5124fmapsintrin.h        |     2 +-
>  gcc/config/i386/avx5124vnniwintrin.h        |     2 +-
>  gcc/config/i386/avx512bf16intrin.h          |    31 +-
>  gcc/config/i386/avx512bitalgintrin.h        |   155 +-
>  gcc/config/i386/avx512bitalgvlintrin.h      |   180 +
>  gcc/config/i386/avx512bwintrin.h            |   291 +-
>  gcc/config/i386/avx512dqintrin.h            |  1840 +-
>  gcc/config/i386/avx512erintrin.h            |     2 +-
>  gcc/config/i386/avx512fintrin.h             | 19663 +++++++++---------
>  gcc/config/i386/avx512fp16intrin.h          |  8925 ++++----
>  gcc/config/i386/avx512ifmaintrin.h          |     4 +-
>  gcc/config/i386/avx512pfintrin.h            |     2 +-
>  gcc/config/i386/avx512vbmi2intrin.h         |     4 +-
>  gcc/config/i386/avx512vbmiintrin.h          |     4 +-
>  gcc/config/i386/avx512vnniintrin.h          |     4 +-
>  gcc/config/i386/avx512vp2intersectintrin.h  |     4 +-
>  gcc/config/i386/avx512vpopcntdqintrin.h     |     4 +-
>  gcc/config/i386/gfniintrin.h                |    76 +-
>  gcc/config/i386/i386-builtin.def            |  1312 +-
>  gcc/config/i386/i386-builtins.cc            |    96 +-
>  gcc/config/i386/i386-c.cc                   |     2 +
>  gcc/config/i386/i386-expand.cc              |    18 +-
>  gcc/config/i386/i386-options.cc             |    33 +-
>  gcc/config/i386/i386.cc                     |   168 +-
>  gcc/config/i386/i386.h                      |     7 +-
>  gcc/config/i386/i386.md                     |   127 +-
>  gcc/config/i386/i386.opt                    |     4 +
>  gcc/config/i386/immintrin.h                 |     2 +
>  gcc/config/i386/predicates.md               |     3 +-
>  gcc/config/i386/sse.md                      |   854 +-
>  gcc/config/i386/vaesintrin.h                |     4 +-
>  gcc/config/i386/vpclmulqdqintrin.h          |     4 +-
>  gcc/testsuite/gcc.target/i386/noevex512-1.c |    13 +
>  gcc/testsuite/gcc.target/i386/noevex512-2.c |    13 +
>  gcc/testsuite/gcc.target/i386/noevex512-3.c |    13 +
>  gcc/testsuite/gcc.target/i386/pr89229-5b.c  |     2 +-
>  gcc/testsuite/gcc.target/i386/pr89229-6b.c  |     2 +-
>  gcc/testsuite/gcc.target/i386/pr89229-7b.c  |     2 +-
>  gcc/testsuite/gcc.target/i386/pr90096.c     |     2 +-
>  41 files changed, 17170 insertions(+), 16738 deletions(-)
>  create mode 100644 gcc/config/i386/avx512bitalgvlintrin.h
>  create mode 100644 gcc/testsuite/gcc.target/i386/noevex512-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/noevex512-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/noevex512-3.c
>
> --
> 2.31.1
>


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 00/18] Support -mevex512 for AVX512
  2023-09-21  7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
                   ` (18 preceding siblings ...)
  2023-09-22  3:30 ` [PATCH 00/18] Support -mevex512 for AVX512 Hongtao Liu
@ 2023-09-28  0:32 ` ZiNgA BuRgA
  2023-09-28  2:26   ` Hu, Lin1
  19 siblings, 1 reply; 25+ messages in thread
From: ZiNgA BuRgA @ 2023-09-28  0:32 UTC (permalink / raw)
  To: lin1.hu, gcc-patches

Thanks for the new patch!

I see that there's a new __EVEX512__ define.  Will there be some 
__EVEX256__ (or maybe some max EVEX width) define, so that code can 
detect whether the compiler supports AVX10.1/256 without resorting to 
version checks?



^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 00/18] Support -mevex512 for AVX512
  2023-09-28  0:32 ` ZiNgA BuRgA
@ 2023-09-28  2:26   ` Hu, Lin1
  2023-09-28  3:23     ` ZiNgA BuRgA
  0 siblings, 1 reply; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-28  2:26 UTC (permalink / raw)
  To: ZiNgA BuRgA, gcc-patches

Hi, 

Thanks for your reply.

I'd like to verify that our understanding of your requirement is correct: __EVEX256__ could be treated as a baseline macro for determining whether the compiler supports the __EVEX***__ series of switches at all.

For example:

I have a segment of code like:
#if defined(__EVEX512__)
  /* ... use _mm512_* intrinsics ...  */
#else
  /* ... use _mm256_* intrinsics ...  */
#endif

But __EVEX512__ being undefined doesn't mean that only 256 bit is wanted; the compiler might be GCC 13, where 512 bit is still usable.

So the code should be:
#if defined(__EVEX512__)
  /* ... use _mm512_* intrinsics ...  */
#elif defined(__EVEX256__)
  /* ... use _mm256_* intrinsics ...  */
#else
  /* Older compiler without the macros: 512 bit is still usable.  */
  /* ... use _mm512_* intrinsics ...  */
#endif

If we understand correctly, we'll consider the request. But since we're about to go on vacation, follow-up replies may be a bit slower.

BRs,
Lin

-----Original Message-----
From: ZiNgA BuRgA <zingaburga@hotmail.com> 
Sent: Thursday, September 28, 2023 8:32 AM
To: Hu, Lin1 <lin1.hu@intel.com>; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 00/18] Support -mevex512 for AVX512

Thanks for the new patch!

I see that there's a new __EVEX512__ define.  Will there be some __EVEX256__ (or maybe some max EVEX width) define, so that code can detect whether the compiler supports AVX10.1/256 without resorting to version checks?



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 00/18] Support -mevex512 for AVX512
  2023-09-28  2:26   ` Hu, Lin1
@ 2023-09-28  3:23     ` ZiNgA BuRgA
  2023-10-07  2:33       ` Hongtao Liu
  0 siblings, 1 reply; 25+ messages in thread
From: ZiNgA BuRgA @ 2023-09-28  3:23 UTC (permalink / raw)
  To: Hu, Lin1, gcc-patches

That sounds about right.  The code I had in mind would perhaps look like:


#if defined(__AVX512BW__) && defined(__AVX512VL__)
     #if defined(__EVEX256__) && !defined(__EVEX512__)
         // compiled code is AVX10.1/256 and AVX512 compatible
     #else
         // compiled code is only AVX512 compatible
     #endif

     // some code which only uses 256b instructions
     __m256i...
#endif


The '__EVEX256__' define would avoid needing to check compiler versions.
Hopefully you can align it with whatever Clang does: 
https://discourse.llvm.org/t/rfc-design-for-avx10-feature-support/72661/18

Thanks!

On 28/09/2023 12:26 pm, Hu, Lin1 wrote:
> Hi,
>
> Thanks for your reply.
>
> I'd like to verify that our understanding of your requirement is correct: __EVEX256__ could be treated as a baseline macro for determining whether the compiler supports the __EVEX***__ series of switches at all.
>
> For example:
>
> I have a segment of code like:
> #if defined(__EVEX512__)
>   /* ... use _mm512_* intrinsics ...  */
> #else
>   /* ... use _mm256_* intrinsics ...  */
> #endif
>
> But __EVEX512__ being undefined doesn't mean that only 256 bit is wanted; the compiler might be GCC 13, where 512 bit is still usable.
>
> So the code should be:
> #if defined(__EVEX512__)
>   /* ... use _mm512_* intrinsics ...  */
> #elif defined(__EVEX256__)
>   /* ... use _mm256_* intrinsics ...  */
> #else
>   /* Older compiler without the macros: 512 bit is still usable.  */
>   /* ... use _mm512_* intrinsics ...  */
> #endif
>
> If we understand correctly, we'll consider the request. But since we're about to go on vacation, follow-up replies may be a bit slower.
>
> BRs,
> Lin
>
> -----Original Message-----
> From: ZiNgA BuRgA <zingaburga@hotmail.com>
> Sent: Thursday, September 28, 2023 8:32 AM
> To: Hu, Lin1 <lin1.hu@intel.com>; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH 00/18] Support -mevex512 for AVX512
>
> Thanks for the new patch!
>
> I see that there's a new __EVEX512__ define.  Will there be some __EVEX256__ (or maybe some max EVEX width) define, so that code can detect whether the compiler supports AVX10.1/256 without resorting to version checks?
>
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 00/18] Support -mevex512 for AVX512
  2023-09-28  3:23     ` ZiNgA BuRgA
@ 2023-10-07  2:33       ` Hongtao Liu
  0 siblings, 0 replies; 25+ messages in thread
From: Hongtao Liu @ 2023-10-07  2:33 UTC (permalink / raw)
  To: ZiNgA BuRgA; +Cc: Hu, Lin1, gcc-patches

On Thu, Sep 28, 2023 at 11:23 AM ZiNgA BuRgA <zingaburga@hotmail.com> wrote:
>
> That sounds about right.  The code I had in mind would perhaps look like:
>
>
> #if defined(__AVX512BW__) && defined(__AVX512VL__)
>      #if defined(__EVEX256__) && !defined(__EVEX512__)
>          // compiled code is AVX10.1/256 and AVX512 compatible
>      #else
>          // compiled code is only AVX512 compatible
>      #endif
>
>      // some code which only uses 256b instructions
>      __m256i...
> #endif
>
>
> The '__EVEX256__' define would avoid needing to check compiler versions.
Sounds reasonable. Regarding how to set __EVEX256__, I think it should
be set/unset along with __AVX512VL__, and __EVEX512__ should not unset
__EVEX256__.
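
Concretely, the detection idiom would then look something like this (a
sketch only, assuming __EVEX256__ ends up tied to __AVX512VL__ as
described and is never unset by __EVEX512__):

#if defined(__AVX512VL__)
# if defined(__EVEX256__) && !defined(__EVEX512__)
  /* 256 bit max EVEX target, e.g. -mavx512vl -mno-evex512.  */
# else
  /* Either 512 bit is enabled, or this is an older compiler that
     predates the macros and always allows 512 bit vectors.  */
# endif
#endif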

> Hopefully you can align it with whatever Clang does:
> https://discourse.llvm.org/t/rfc-design-for-avx10-feature-support/72661/18

>
> Thanks!
>
> On 28/09/2023 12:26 pm, Hu, Lin1 wrote:
> > Hi,
> >
> > Thanks for your reply.
> >
> > I'd like to verify that our understanding of your requirement is correct: __EVEX256__ could be treated as a baseline macro for determining whether the compiler supports the __EVEX***__ series of switches at all.
> >
> > For example:
> >
> > I have a segment of code like:
> > #if defined(__EVEX512__)
> >   /* ... use _mm512_* intrinsics ...  */
> > #else
> >   /* ... use _mm256_* intrinsics ...  */
> > #endif
> >
> > But __EVEX512__ being undefined doesn't mean that only 256 bit is wanted; the compiler might be GCC 13, where 512 bit is still usable.
> >
> > So the code should be:
> > #if defined(__EVEX512__)
> >   /* ... use _mm512_* intrinsics ...  */
> > #elif defined(__EVEX256__)
> >   /* ... use _mm256_* intrinsics ...  */
> > #else
> >   /* Older compiler without the macros: 512 bit is still usable.  */
> >   /* ... use _mm512_* intrinsics ...  */
> > #endif
> >
> > If we understand correctly, we'll consider the request. But since we're about to go on vacation, follow-up replies may be a bit slower.
> >
> > BRs,
> > Lin
> >
> > -----Original Message-----
> > From: ZiNgA BuRgA <zingaburga@hotmail.com>
> > Sent: Thursday, September 28, 2023 8:32 AM
> > To: Hu, Lin1 <lin1.hu@intel.com>; gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH 00/18] Support -mevex512 for AVX512
> >
> > Thanks for the new patch!
> >
> > I see that there's a new __EVEX512__ define.  Will there be some __EVEX256__ (or maybe some max EVEX width) define, so that code can detect whether the compiler supports AVX10.1/256 without resorting to version checks?
> >
> >
>


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 01/18] Initial support for -mevex512
  2023-09-21  7:19 ` [PATCH 01/18] Initial support for -mevex512 Hu, Lin1
@ 2023-10-07  6:34   ` Haochen Jiang
  0 siblings, 0 replies; 25+ messages in thread
From: Haochen Jiang @ 2023-10-07  6:34 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, zingaburga

Hi all,

Sorry for the delay in revising the patch; I have just come back from vacation.

I have slightly revised this patch to address the __EVEX256__ request with the following change:

diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc
index 47768fa0940..9c44bd7fb63 100644
--- a/gcc/config/i386/i386-c.cc
+++ b/gcc/config/i386/i386-c.cc
@@ -546,7 +546,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
   if (isa_flag & OPTION_MASK_ISA_AVX512BW)
     def_or_undef (parse_in, "__AVX512BW__");
   if (isa_flag & OPTION_MASK_ISA_AVX512VL)
-    def_or_undef (parse_in, "__AVX512VL__");
+    {
+      def_or_undef (parse_in, "__AVX512VL__");
+      def_or_undef (parse_in, "__EVEX256__");
+    }
   if (isa_flag & OPTION_MASK_ISA_AVX512VBMI)
     def_or_undef (parse_in, "__AVX512VBMI__");
   if (isa_flag & OPTION_MASK_ISA_AVX512IFMA)

Please check whether it meets the need. If there are no concerns, I will commit all 18 patches
on Monday or Tuesday.

Thx,
Haochen

gcc/ChangeLog:

	* common/config/i386/i386-common.cc
	(OPTION_MASK_ISA2_EVEX512_SET): New.
	(OPTION_MASK_ISA2_EVEX512_UNSET): Ditto.
	(ix86_handle_option): Handle EVEX512.
	* config/i386/i386-c.cc
	(ix86_target_macros_internal): Handle EVEX512. Add __EVEX256__
	when AVX512VL is set.
	* config/i386/i386-options.cc (isa2_opts): Handle EVEX512.
	(ix86_valid_target_attribute_inner_p): Ditto.
	(ix86_option_override_internal): Set EVEX512 target if it is not
	explicitly set when AVX512 is enabled. Disable
	AVX512{PF,ER,4VNNIW,4FMAPS} for -mno-evex512.
	* config/i386/i386.opt: Add mevex512. Temporarily RejectNegative.
---
 gcc/common/config/i386/i386-common.cc | 15 +++++++++++++++
 gcc/config/i386/i386-c.cc             |  7 ++++++-
 gcc/config/i386/i386-options.cc       | 19 ++++++++++++++++++-
 gcc/config/i386/i386.opt              |  4 ++++
 4 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index 95468b7c405..8cc59e08d06 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -123,6 +123,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_SM3_SET OPTION_MASK_ISA2_SM3
 #define OPTION_MASK_ISA2_SHA512_SET OPTION_MASK_ISA2_SHA512
 #define OPTION_MASK_ISA2_SM4_SET OPTION_MASK_ISA2_SM4
+#define OPTION_MASK_ISA2_EVEX512_SET OPTION_MASK_ISA2_EVEX512
 
 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
    as -msse4.2.  */
@@ -309,6 +310,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_SM3_UNSET OPTION_MASK_ISA2_SM3
 #define OPTION_MASK_ISA2_SHA512_UNSET OPTION_MASK_ISA2_SHA512
 #define OPTION_MASK_ISA2_SM4_UNSET OPTION_MASK_ISA2_SM4
+#define OPTION_MASK_ISA2_EVEX512_UNSET OPTION_MASK_ISA2_EVEX512
 
 /* SSE4 includes both SSE4.1 and SSE4.2.  -mno-sse4 should the same
    as -mno-sse4.1. */
@@ -1341,6 +1343,19 @@ ix86_handle_option (struct gcc_options *opts,
 	}
       return true;
 
+    case OPT_mevex512:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_EVEX512_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_EVEX512_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_EVEX512_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_EVEX512_UNSET;
+	}
+      return true;
+
     case OPT_mfma:
       if (value)
 	{
diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc
index 47768fa0940..9c44bd7fb63 100644
--- a/gcc/config/i386/i386-c.cc
+++ b/gcc/config/i386/i386-c.cc
@@ -546,7 +546,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
   if (isa_flag & OPTION_MASK_ISA_AVX512BW)
     def_or_undef (parse_in, "__AVX512BW__");
   if (isa_flag & OPTION_MASK_ISA_AVX512VL)
-    def_or_undef (parse_in, "__AVX512VL__");
+    {
+      def_or_undef (parse_in, "__AVX512VL__");
+      def_or_undef (parse_in, "__EVEX256__");
+    }
   if (isa_flag & OPTION_MASK_ISA_AVX512VBMI)
     def_or_undef (parse_in, "__AVX512VBMI__");
   if (isa_flag & OPTION_MASK_ISA_AVX512IFMA)
@@ -707,6 +710,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     def_or_undef (parse_in, "__SHA512__");
   if (isa_flag2 & OPTION_MASK_ISA2_SM4)
     def_or_undef (parse_in, "__SM4__");
+  if (isa_flag2 & OPTION_MASK_ISA2_EVEX512)
+    def_or_undef (parse_in, "__EVEX512__");
   if (TARGET_IAMCU)
     {
       def_or_undef (parse_in, "__iamcu");
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index e47f9ed5d5f..a1a7a92da9f 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -250,7 +250,8 @@ static struct ix86_target_opts isa2_opts[] =
   { "-mavxvnniint16",	OPTION_MASK_ISA2_AVXVNNIINT16 },
   { "-msm3",		OPTION_MASK_ISA2_SM3 },
   { "-msha512",		OPTION_MASK_ISA2_SHA512 },
-  { "-msm4",            OPTION_MASK_ISA2_SM4 }
+  { "-msm4",            OPTION_MASK_ISA2_SM4 },
+  { "-mevex512",        OPTION_MASK_ISA2_EVEX512 }
 };
 static struct ix86_target_opts isa_opts[] =
 {
@@ -1109,6 +1110,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
     IX86_ATTR_ISA ("sm3", OPT_msm3),
     IX86_ATTR_ISA ("sha512", OPT_msha512),
     IX86_ATTR_ISA ("sm4", OPT_msm4),
+    IX86_ATTR_ISA ("evex512", OPT_mevex512),
 
     /* enum options */
     IX86_ATTR_ENUM ("fpmath=",	OPT_mfpmath_),
@@ -2559,6 +2561,21 @@ ix86_option_override_internal (bool main_args_p,
       &= ~((OPTION_MASK_ISA_BMI | OPTION_MASK_ISA_BMI2 | OPTION_MASK_ISA_TBM)
 	   & ~opts->x_ix86_isa_flags_explicit);
 
+  /* Set EVEX512 target if it is not explicitly set
+     when AVX512 is enabled.  */
+  if (TARGET_AVX512F_P (opts->x_ix86_isa_flags)
+      && !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA2_EVEX512))
+    opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_EVEX512;
+
+  /* Disable AVX512{PF,ER,4VNNIW,4FMAPS} for -mno-evex512.  */
+  if (!TARGET_EVEX512_P (opts->x_ix86_isa_flags2))
+    {
+      opts->x_ix86_isa_flags
+	&= ~(OPTION_MASK_ISA_AVX512PF | OPTION_MASK_ISA_AVX512ER);
+      opts->x_ix86_isa_flags2
+	&= ~(OPTION_MASK_ISA2_AVX5124FMAPS | OPTION_MASK_ISA2_AVX5124VNNIW);
+    }
+
   /* Validate -mpreferred-stack-boundary= value or default it to
      PREFERRED_STACK_BOUNDARY_DEFAULT.  */
   ix86_preferred_stack_boundary = PREFERRED_STACK_BOUNDARY_DEFAULT;
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 78b499304a4..6d8601b1f75 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1310,3 +1310,7 @@ Enable vectorization for gather instruction.
 mscatter
 Target Alias(mtune-ctrl=, use_scatter, ^use_scatter)
 Enable vectorization for scatter instruction.
+
+mevex512
+Target RejectNegative Mask(ISA2_EVEX512) Var(ix86_isa_flags2) Save
+Support 512 bit vector built-in functions and code generation.
-- 
2.31.1
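
As a per-function sketch (assuming the "evex512" attribute string from
the IX86_ATTR_ISA entry added above works like the no-evex512 form
used in the tests), the vector width can also be selected without
changing the command line:

#include <immintrin.h>

__attribute__ ((target ("avx512f,evex512"))) __m512d
wide_add (__m512d a, __m512d b)
{
  /* zmm registers and 512 bit built-ins are available here even if
     the rest of the translation unit is compiled without evex512.  */
  return _mm512_add_pd (a, b);
}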


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2023-10-07  6:36 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-21  7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
2023-09-21  7:19 ` [PATCH 01/18] Initial support for -mevex512 Hu, Lin1
2023-10-07  6:34   ` [PATCH v2 " Haochen Jiang
2023-09-21  7:19 ` [PATCH 02/18] [PATCH 1/5] Push evex512 target for 512 bit intrins Hu, Lin1
2023-09-21  7:19 ` [PATCH 03/18] [PATCH 2/5] " Hu, Lin1
2023-09-21  7:19 ` [PATCH 04/18] [PATCH 3/5] " Hu, Lin1
2023-09-21  7:20 ` [PATCH 05/18] [PATCH 4/5] " Hu, Lin1
2023-09-21  7:20 ` [PATCH 06/18] [PATCH 5/5] " Hu, Lin1
2023-09-21  7:20 ` [PATCH 07/18] [PATCH 1/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins Hu, Lin1
2023-09-21  7:20 ` [PATCH 08/18] [PATCH 2/5] " Hu, Lin1
2023-09-21  7:20 ` [PATCH 09/18] [PATCH 3/5] " Hu, Lin1
2023-09-21  7:20 ` [PATCH 10/18] [PATCH 4/5] " Hu, Lin1
2023-09-21  7:20 ` [PATCH 11/18] [PATCH 5/5] " Hu, Lin1
2023-09-21  7:20 ` [PATCH 12/18] Disable zmm register and 512 bit libmvec call when !TARGET_EVEX512 Hu, Lin1
2023-09-21  7:20 ` [PATCH 13/18] Support -mevex512 for AVX512F intrins Hu, Lin1
2023-09-21  7:20 ` [PATCH 14/18] Support -mevex512 for AVX512DQ intrins Hu, Lin1
2023-09-21  7:20 ` [PATCH 15/18] Support -mevex512 for AVX512BW intrins Hu, Lin1
2023-09-21  7:20 ` [PATCH 16/18] Support -mevex512 for AVX512{IFMA,VBMI,VNNI,BF16,VPOPCNTDQ,VBMI2,BITALG,VP2INTERSECT},VAES,GFNI,VPCLMULQDQ intrins Hu, Lin1
2023-09-21  7:20 ` [PATCH 17/18] Support -mevex512 for AVX512FP16 intrins Hu, Lin1
2023-09-21  7:20 ` [PATCH 18/18] Allow -mno-evex512 usage Hu, Lin1
2023-09-22  3:30 ` [PATCH 00/18] Support -mevex512 for AVX512 Hongtao Liu
2023-09-28  0:32 ` ZiNgA BuRgA
2023-09-28  2:26   ` Hu, Lin1
2023-09-28  3:23     ` ZiNgA BuRgA
2023-10-07  2:33       ` Hongtao Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).